Materialize this LazyFrame into a DataFrame

Description

By default, all query optimizations are enabled. Individual optimizations may be disabled by setting the corresponding parameter to FALSE.

Usage

<LazyFrame>$collect(
  ...,
  type_coercion = TRUE,
  `_type_check` = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  collapse_joins = TRUE,
  no_optimization = FALSE,
  engine = c("auto", "in-memory", "streaming", "old-streaming"),
  streaming = FALSE,
  `_check_order` = TRUE,
  `_eager` = FALSE
)

Arguments

... These dots are for future extensions and must be empty.
type_coercion A logical indicating whether to run the type coercion optimization.
predicate_pushdown A logical indicating whether to run the predicate pushdown optimization.
projection_pushdown A logical indicating whether to run the projection pushdown optimization.
simplify_expression A logical indicating whether to run the simplify expression optimization.
slice_pushdown A logical indicating whether to run the slice pushdown optimization.
comm_subplan_elim A logical indicating whether to try to cache branching subplans that occur on self-joins or unions.
comm_subexpr_elim A logical indicating whether to try to cache common subexpressions.
cluster_with_columns A logical indicating whether to combine sequential independent calls to with_columns.
collapse_joins A logical indicating whether to collapse a join and filters into a faster join.
no_optimization A logical. If TRUE, turn off (certain) optimizations.
engine The name of the engine used to process the query. One of the following:
  • “auto” (default): Select the engine automatically. The “in-memory” engine will be selected for most cases.
  • “in-memory”: Use the in-memory engine.
  • “streaming”: [Experimental] Use the (new) streaming engine.
  • “old-streaming”: [Superseded] Use the old streaming engine.
streaming [Deprecated] A logical. If TRUE, process the query in batches to handle larger-than-memory data. If FALSE (default), the entire query is processed in a single batch. Note that streaming mode is considered unstable. It may be changed at any point without it being considered a breaking change.
_check_order, _type_check For internal use only.
_eager A logical indicating whether to turn off multi-node optimizations and the other optimizations. This option is intended for internal use only.
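As a rough illustration of the optimization arguments above, the sketch below collects the same query three ways: with the defaults, with two individual optimizations disabled, and with no_optimization = TRUE. The query itself (a filter followed by a select) is a made-up example, not taken from the package documentation. The flags should only change how the plan is executed, so the three results are expected to be identical.

```r
library(polars)

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)

# Hypothetical query used only for illustration
query <- lf$filter(pl$col("b") > 2)$select("a", "b")

# All optimizations enabled (the default)
optimized <- query$collect()

# Disable two individual optimizations
no_pushdown <- query$collect(
  predicate_pushdown = FALSE,
  projection_pushdown = FALSE
)

# Turn off (certain) optimizations in one go
unoptimized <- query$collect(no_optimization = TRUE)
```

Disabling optimizations is mostly useful for debugging or for comparing plans; for normal use the defaults are usually the right choice.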

Value

A polars DataFrame

See Also

  • $profile() - same as $collect() but also returns a table with each operation profiled.
  • $sink_parquet() - streams the query to a Parquet file.
  • $sink_ipc() - streams the query to an Arrow IPC file.

Examples

library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1,
)
lf$group_by("a")$agg(pl$all()$sum())$collect()
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a   ┆ b   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ c   ┆ 6   ┆ 1   │
#> │ a   ┆ 4   ┆ 10  │
#> │ b   ┆ 11  ┆ 10  │
#> └─────┴─────┴─────┘
# Collect in streaming mode (the `streaming` argument is deprecated; see `engine`)
lf$group_by("a")$agg(pl$all()$sum())$collect(
  streaming = TRUE
)
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a   ┆ b   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ b   ┆ 11  ┆ 10  │
#> │ a   ┆ 4   ┆ 10  │
#> │ c   ┆ 6   ┆ 1   │
#> └─────┴─────┴─────┘
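
Since the streaming argument is marked as deprecated, a version of the same query that selects the engine through the engine argument might look like the sketch below. It assumes the installed polars version accepts engine = "streaming" as listed under Arguments; that engine is flagged experimental, so behavior may change. The $sort("a") call is an addition for illustration, included because the row order after group_by is not guaranteed, as the two outputs above show.

```r
library(polars)

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)

# Select the (experimental) streaming engine explicitly;
# sort by the group key so the row order is reproducible
res <- lf$group_by("a")$agg(pl$all()$sum())$sort("a")$collect(
  engine = "streaming"
)
res
```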