Evaluate the query in streaming mode and write to a CSV file
Description
This allows streaming results that are larger than RAM to be written to disk.
Usage
<LazyFrame>$sink_csv(
path,
...,
include_bom = FALSE,
include_header = TRUE,
separator = ",",
line_terminator = "\n",
quote_char = "\"",
batch_size = 1024,
datetime_format = NULL,
date_format = NULL,
time_format = NULL,
float_scientific = NULL,
float_precision = NULL,
null_value = "",
quote_style = c("necessary", "always", "never", "non_numeric"),
maintain_order = TRUE,
type_coercion = TRUE,
`_type_check` = TRUE,
predicate_pushdown = TRUE,
projection_pushdown = TRUE,
simplify_expression = TRUE,
slice_pushdown = TRUE,
collapse_joins = TRUE,
no_optimization = FALSE,
storage_options = NULL,
retries = 2,
sync_on_close = c("none", "data", "all"),
mkdir = FALSE
)
Arguments
path
A character. File path to which the file should be written.
…
These dots are for future extensions and must be empty.
include_bom
Logical, whether to include a UTF-8 BOM in the CSV output.
include_header
Logical, whether to include a header in the CSV output.
separator
Separate CSV fields with this symbol.
line_terminator
String used to end each row.
quote_char
Byte to use as quoting character.
batch_size
Number of rows that will be processed per thread.
datetime_format
A format string, with the specifiers defined by the chrono Rust crate. If no format is specified, the default fractional-second precision is inferred from the maximum time unit found in the frame's Datetime columns (if any).
date_format
A format string, with the specifiers defined by the chrono Rust crate.
time_format
A format string, with the specifiers defined by the chrono Rust crate.
float_scientific
Whether to use scientific form always (TRUE), never (FALSE), or automatically (NULL) for Float32 and Float64 datatypes.
float_precision
Number of decimal places to write, applied to both Float32 and Float64 datatypes.
null_value
A string representing null values (defaulting to the empty string).
quote_style
Determines the quoting strategy used (a combined sketch of the formatting arguments appears after this argument list). Must be one of:

- "necessary" (default): quote fields only when necessary, i.e. when they contain the separator, a quote character, or a line terminator.
- "always": put quotes around every field.
- "never": never quote fields, even if this produces invalid CSV.
- "non_numeric": quote all fields that are non-numeric.
maintain_order
Maintain the order in which data is processed. Setting this to FALSE will be slightly faster.
type_coercion
A logical, indicating whether to apply type coercion optimization.
`_type_check`
For internal use only.
predicate_pushdown
A logical, indicating whether to apply predicate pushdown optimization.
projection_pushdown
A logical, indicating whether to apply projection pushdown optimization.
simplify_expression
A logical, indicating whether to apply simplify expression optimization.
slice_pushdown
A logical, indicating whether to apply slice pushdown optimization.
collapse_joins
Collapse a join and filters into a faster join.
no_optimization
A logical. If TRUE, turn off (certain) optimizations.
storage_options
Named vector containing options that indicate how to connect to a cloud provider. The cloud providers currently supported are AWS, GCP, and Azure. If storage_options is not provided, Polars will try to infer the information from environment variables.
retries
Number of retries if accessing a cloud instance fails.
sync_on_close
Sync to disk before closing the file. Must be one of:

- "none" (default): do not sync;
- "data": sync the file contents;
- "all": sync the file contents and metadata.
mkdir
Recursively create all the directories in the path.
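As a minimal sketch of how several of the formatting arguments above combine (the output path below is just a temporary placeholder; separator, null_value, and quote_style are all parameters documented on this page):

library("polars")
tmpf <- tempfile(fileext = ".csv")
as_polars_lf(mtcars)$sink_csv(
  tmpf,
  separator = ";",             # use semicolons instead of commas
  null_value = "NA",           # write missing values as the string "NA"
  quote_style = "non_numeric"  # quote every field that is not numeric
)
readLines(tmpf, n = 2)         # inspect the header and the first data row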
Value
Invisibly returns the input LazyFrame.
Examples
library("polars")
# sink table 'mtcars' from memory to CSV
tmpf <- tempfile(fileext = ".csv")
as_polars_lf(mtcars)$sink_csv(tmpf)
# stream a query end-to-end
tmpf2 <- tempfile(fileext = ".csv")
pl$scan_csv(tmpf)$select(pl$col("cyl") * 2)$sink_csv(tmpf2)
# load the CSV result directly into a DataFrame / memory
pl$scan_csv(tmpf2)$collect()
#> shape: (32, 1)
#> ┌──────┐
#> │ cyl │
#> │ --- │
#> │ f64 │
#> ╞══════╡
#> │ 12.0 │
#> │ 12.0 │
#> │ 8.0 │
#> │ 12.0 │
#> │ 16.0 │
#> │ … │
#> │ 8.0 │
#> │ 16.0 │
#> │ 12.0 │
#> │ 16.0 │
#> │ 8.0 │
#> └──────┘
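# a hedged sketch: with mkdir = TRUE, missing directories in the target path
# are created before writing (the nested path below is only an illustration)
tmpd <- file.path(tempdir(), "nested", "dir", "mtcars.csv")
as_polars_lf(mtcars)$sink_csv(tmpd, mkdir = TRUE)
file.exists(tmpd)

# because sink_csv() invisibly returns the input LazyFrame (see Value above),
# the result can be captured and reused for further queries
lf <- as_polars_lf(mtcars)
same_lf <- lf$sink_csv(tempfile(fileext = ".csv"))
same_lf$select(pl$col("mpg"))$collect()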