Convert the Series of type List to a Series of type Struct

Description

Convert the Series of type List to a Series of type Struct

Usage

<Expr>$list$to_struct(
  n_field_strategy = c("first_non_null", "max_width"),
  fields = NULL,
  upper_bound = NULL
)

Arguments

n_field_strategy One of “first_non_null” or “max_width”. Strategy to determine the number of fields of the struct.
  • “first_non_null” (default): Set number of fields equal to the length of the first non zero-length sublist.
  • “max_width”: Set number of fields as max length of all sublists.
If the fields argument is character, this argument will be ignored.
fields [Experimental] NULL (default), a character vector of field names, or a function that takes an integer index and returns a character value. If the names and number of the desired fields are known in advance, a character vector of field names can be given, which will be assigned by index. Otherwise, to assign field names dynamically, a custom function can be used. If neither is set, fields will be named field_0, field_1, … See the examples for details.
upper_bound Single positive integer value or NULL (default). A LazyFrame needs to know the schema at all times, so the caller must provide an upper bound of the number of struct fields that will be created; if set incorrectly, subsequent operations may fail. When operating on a DataFrame, the schema does not need to be tracked or pre-determined, as the result will be eagerly evaluated, so this argument can be NULL. If the fields argument is character, this argument will be ignored.

Details

It is recommended to set upper_bound to the correct output size of the struct. If it is not set, Polars cannot determine the output type of this operation and sets it to Unknown, which can lead to errors because the query cannot be resolved.
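
The recommendation above can be sketched for a LazyFrame as follows. This is a minimal, untested sketch that assumes `$lazy()` and `$collect()` behave as in the rest of the package and that 3 is the true maximum sublist length; the variable names `lf` and `result` are illustrative only.

```r
library("polars")

# Hypothetical lazy version of the data used in the Examples section.
lf <- pl$DataFrame(n = list(c(0, 1), c(0, 1, 2)))$lazy()

# Declare the number of struct fields up front so the lazy schema is known.
# Assumption: the longest sublist has 3 elements, so upper_bound = 3.
result <- lf$select(
  pl$col("n")$list$to_struct(
    n_field_strategy = "max_width",
    upper_bound = 3
  )
)$collect()$unnest("n")

result
```

If upper_bound does not match the number of fields actually produced, subsequent operations on the LazyFrame may fail, as noted in the argument description.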

For performance reasons, the length of the first non-null sublist is used to determine the number of output fields by default. If the sublists can be of different lengths, n_field_strategy = “max_width” must be used to obtain the expected result.
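
To make the difference between the two strategies concrete, the field count each one would arrive at can be mimicked in plain R. This is only an illustration of the rule described above, not how Polars computes it internally; `sublists`, `n_first_non_null` and `n_max_width` are hypothetical names.

```r
# Plain-R illustration only; this does not call polars.
sublists <- list(c(0, 1), c(0, 1, 2))

# "first_non_null": length of the first sublist that is not empty.
non_empty <- Filter(function(x) length(x) > 0, sublists)
n_first_non_null <- length(non_empty[[1]])

# "max_width": length of the longest sublist (requires scanning all of them).
n_max_width <- max(lengths(sublists))

n_first_non_null  # 2
n_max_width       # 3
```

With these sublists the default strategy yields 2 fields and drops the third element of the second sublist, while “max_width” yields 3 fields and pads the shorter sublist with null.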

Value

A polars expression

Examples

library("polars")

df <- pl$DataFrame(n = list(c(0, 1), c(0, 1, 2)))

# Convert list to struct with default field name assignment:

# This will become a struct with 2 fields, because the first sublist
# has length 2; the third element of the second sublist is dropped.
df$select(pl$col("n")$list$to_struct())$unnest("n")
#> shape: (2, 2)
#> ┌─────────┬─────────┐
#> │ field_0 ┆ field_1 │
#> │ ---     ┆ ---     │
#> │ f64     ┆ f64     │
#> ╞═════════╪═════════╡
#> │ 0.0     ┆ 1.0     │
#> │ 0.0     ┆ 1.0     │
#> └─────────┴─────────┘
# As the shorter sublist comes first,
# we must use the max_width strategy to force a search for the longest.
# This will become a struct with 3 fields.
df$select(
  pl$col("n")$list$to_struct(n_field_strategy = "max_width")
)$unnest("n")
#> shape: (2, 3)
#> ┌─────────┬─────────┬─────────┐
#> │ field_0 ┆ field_1 ┆ field_2 │
#> │ ---     ┆ ---     ┆ ---     │
#> │ f64     ┆ f64     ┆ f64     │
#> ╞═════════╪═════════╪═════════╡
#> │ 0.0     ┆ 1.0     ┆ null    │
#> │ 0.0     ┆ 1.0     ┆ 2.0     │
#> └─────────┴─────────┴─────────┘
# Convert list to struct with field name assignment by
# function/index:
df$select(
  pl$col("n")$list$to_struct(
    fields = \(idx) paste0("n", idx + 1),
    n_field_strategy = "max_width"
  )
)$unnest("n")
#> shape: (2, 3)
#> ┌─────┬─────┬──────┐
#> │ n1  ┆ n2  ┆ n3   │
#> │ --- ┆ --- ┆ ---  │
#> │ f64 ┆ f64 ┆ f64  │
#> ╞═════╪═════╪══════╡
#> │ 0.0 ┆ 1.0 ┆ null │
#> │ 0.0 ┆ 1.0 ┆ 2.0  │
#> └─────┴─────┴──────┘
# Convert list to struct with field name assignment by
# index from a character vector of names:
df$select(
  pl$col("n")$list$to_struct(
    fields = c("one", "two", "three")
  )
)$unnest("n")
#> shape: (2, 3)
#> ┌─────┬─────┬───────┐
#> │ one ┆ two ┆ three │
#> │ --- ┆ --- ┆ ---   │
#> │ f64 ┆ f64 ┆ f64   │
#> ╞═════╪═════╪═══════╡
#> │ 0.0 ┆ 1.0 ┆ null  │
#> │ 0.0 ┆ 1.0 ┆ 2.0   │
#> └─────┴─────┴───────┘