Materialize this LazyFrame into a DataFrame
Description
By default, all query optimizations are enabled. Individual
optimizations may be disabled by setting the corresponding parameter to
FALSE.
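As a minimal sketch of the above, the following runs one query twice: once with the defaults and once with two of the optimizations listed under Usage turned off. The toy data and the variable names are illustrative assumptions. The optimizations are expected to change only how the query is executed, not what it returns.

```r
library(polars)

# Toy data, for illustration only
lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6
)

# A lazy query: nothing is computed until $collect() is called
query <- lf$filter(pl$col("b") > 2)$select("a", "b")

# Default: all optimizations enabled
res_default <- query$collect()

# Same query with predicate and projection pushdown disabled
res_no_pushdown <- query$collect(
  predicate_pushdown = FALSE,
  projection_pushdown = FALSE
)

# Both calls should yield the same rows
identical(as.data.frame(res_default), as.data.frame(res_no_pushdown))
```

Disabling individual optimizations like this is mostly useful when debugging a query plan or checking whether a given optimization affects run time.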
Usage
<LazyFrame>$collect(
  ...,
  type_coercion = TRUE,
  `_type_check` = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  collapse_joins = TRUE,
  no_optimization = FALSE,
  engine = c("auto", "in-memory", "streaming", "old-streaming"),
  streaming = FALSE,
  `_check_order` = TRUE,
  `_eager` = FALSE
)
Arguments
…
|
These dots are for future extensions and must be empty.
|
type_coercion
|
A logical. If TRUE, apply the type coercion optimization.
|
predicate_pushdown
|
A logical. If TRUE, apply the predicate pushdown optimization.
|
projection_pushdown
|
A logical. If TRUE, apply the projection pushdown optimization.
|
simplify_expression
|
A logical. If TRUE, apply the simplify expression optimization.
|
slice_pushdown
|
A logical. If TRUE, apply the slice pushdown optimization.
|
comm_subplan_elim
|
A logical. If TRUE, try to cache branching subplans that occur on
self-joins or unions.
|
comm_subexpr_elim
|
A logical. If TRUE, try to cache common subexpressions.
|
cluster_with_columns
|
A logical. If TRUE, combine sequential independent calls to
with_columns.
|
collapse_joins
|
A logical. If TRUE, collapse a join and filters into a faster join.
|
no_optimization
|
A logical. If TRUE, turn off (certain) optimizations.
|
engine
|
The engine name to use for processing the query. One of the following:
- "auto" (default): Select the engine automatically. The "in-memory" engine will be selected for most cases.
- "in-memory": Use the in-memory engine.
- "streaming": Use the (new) streaming engine.
- "old-streaming": Use the old streaming engine.
|
streaming
|
A logical. If TRUE, process the query in batches to handle
larger-than-memory data. If FALSE (default), the entire
query is processed in a single batch. Note that streaming mode is
considered unstable. It may be changed at any point without it being
considered a breaking change.
|
_check_order, _type_check
|
For internal use only.
|
_eager
|
A logical. If TRUE, turn off multi-node optimizations and the other
optimizations. This option is intended for internal use only.
|
Value
A polars DataFrame
See Also
- $profile(): same as $collect(), but also returns a table with each operation profiled.
- $sink_parquet(): streams the query to a Parquet file.
- $sink_ipc(): streams the query to an Arrow IPC file.
Examples
library("polars")
lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1,
)
lf$group_by("a")$agg(pl$all()$sum())$collect()
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a ┆ b ┆ c │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ c ┆ 6 ┆ 1 │
#> │ a ┆ 4 ┆ 10 │
#> │ b ┆ 11 ┆ 10 │
#> └─────┴─────┴─────┘
# Collect in streaming mode
lf$group_by("a")$agg(pl$all()$sum())$collect(
  streaming = TRUE
)
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a ┆ b ┆ c │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ b ┆ 11 ┆ 10 │
#> │ a ┆ 4 ┆ 10 │
#> │ c ┆ 6 ┆ 1 │
#> └─────┴─────┴─────┘
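The engine argument can be passed in the same way. The sketch below assumes a polars version whose $collect() accepts engine as listed under Usage. It adds a $sort("a") step because, as the two outputs above suggest, the row order of a group_by result is not guaranteed to match between engines. The variable names are illustrative.

```r
library(polars)

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)

# Sort after aggregating so the row order is deterministic
query <- lf$group_by("a")$agg(pl$all()$sum())$sort("a")

# Explicitly request the in-memory engine
res_in_memory <- query$collect(engine = "in-memory")

# Request the (new) streaming engine for the same query
res_streaming <- query$collect(engine = "streaming")

# With the sort in place, both engines should agree
identical(as.data.frame(res_in_memory), as.data.frame(res_streaming))
```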