
Polars DataFrame class (polars_data_frame)

Description

A DataFrame is a two-dimensional data structure that represents data as a table with rows and columns. Polars DataFrames are similar to R data frames: the columns of an R data frame are R vectors, while the columns of a Polars DataFrame are Polars Series.

Usage

pl$DataFrame(..., .schema_overrides = NULL, .strict = TRUE)

Arguments

... <dynamic-dots> Name-value pairs of objects to be converted to polars Series by the as_polars_series() function. Each Series will be used as a column of the DataFrame. All values must be the same length or length 1. Each name will be used as the column name. If the name is empty, the original name of the Series will be used.
.schema_overrides [Experimental] A list of polars data types or NULL (default). Passed to the $cast() method as dynamic-dots.
.strict [Experimental] A logical value. Passed to the $cast() method’s .strict argument.
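The length and naming rules for the dynamic dots can be illustrated with a minimal sketch. The column names and values are made up for illustration, and the recycling of the length-1 value is assumed to follow the "same length or length 1" rule stated above.

```r
library(polars)

# A length-1 value is combined with a length-3 column; per the rule above,
# it is expected to be repeated for every row.
df_recycled <- pl$DataFrame(id = 1:3, group = "x")
df_recycled

# An unnamed Series keeps its own name ("a") as the column name,
# while the named argument supplies the name "b".
df_named <- pl$DataFrame(pl$Series("a", 1:2), b = 3:4)
df_named$columns
```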

Details

The pl$DataFrame() function mimics the constructor of the DataFrame class of Python Polars. This function is essentially a shortcut for as_polars_df(list(...))$cast(!!!.schema_overrides, .strict = .strict), so each argument in ... is converted to a Polars Series by as_polars_series() and then passed to as_polars_df().
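As a rough sketch of that equivalence, the two calls below should produce the same result. The overrides list and column names are hypothetical, and both arguments are marked experimental, so the behaviour may change.

```r
library(polars)

# Hypothetical override: store column "b" as Float64 instead of the default Int32.
overrides <- list(b = pl$Float64)

# Constructor form, with the cast requested through .schema_overrides.
via_constructor <- pl$DataFrame(a = 1:2, b = 3:4, .schema_overrides = overrides)

# Spelled-out form described above: build the DataFrame first, then cast.
via_shortcut <- as_polars_df(list(a = 1:2, b = 3:4))$cast(!!!overrides, .strict = TRUE)

via_constructor$schema
via_shortcut$schema
```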

Value

A polars DataFrame

Active bindings

  • columns: $columns returns a character vector with the names of the columns.
  • dtypes: $dtypes returns a nameless list of the data type of each column.
  • schema: $schema returns a named list with the column names as names and the data types as values.
  • shape: $shape returns an integer vector of length two with the number of rows and columns of the DataFrame.
  • height: $height returns an integer with the number of rows of the DataFrame.
  • width: $width returns an integer with the number of columns of the DataFrame.
  • flags: $flags returns a list with column names as names and a named logical vector with the flags as values.

Flags

Flags are used internally to avoid unnecessary computations, such as sorting a column that is already known to be sorted. The number of flags depends on the column type: columns of type array and list have the flags SORTED_ASC, SORTED_DESC, and FAST_EXPLODE, while other column types only have the first two.

  • SORTED_ASC is set to TRUE when we sort a column in increasing order, so that we can use this information later on to avoid re-sorting it.
  • SORTED_DESC is the counterpart for sorting in decreasing order.
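A short sketch of how the flags can be inspected before and after sorting. The data is made up, and the use of $sort() to set SORTED_ASC on the sorted column is an assumption based on the description above.

```r
library(polars)

# Column "a" is deliberately unsorted at construction time.
df <- pl$DataFrame(a = c(3L, 1L, 2L), b = c("foo", "bar", "baz"))

# Flags per column before sorting.
df$flags

# After sorting by "a" in increasing order, SORTED_ASC is expected
# to be TRUE for that column.
sorted <- df$sort("a")
sorted$flags
```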

Examples

library("polars")

# Constructing a DataFrame from vectors:
pl$DataFrame(a = 1:2, b = 3:4)
#> shape: (2, 2)
#> ┌─────┬─────┐
#> │ a   ┆ b   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 1   ┆ 3   │
#> │ 2   ┆ 4   │
#> └─────┴─────┘
# Constructing a DataFrame from Series:
pl$DataFrame(pl$Series("a", 1:2), pl$Series("b", 3:4))
#> shape: (2, 2)
#> ┌─────┬─────┐
#> │ a   ┆ b   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 1   ┆ 3   │
#> │ 2   ┆ 4   │
#> └─────┴─────┘
# Constructing a DataFrame from a list:
data <- list(a = 1:2, b = 3:4)

# Using the as_polars_df function (recommended)
as_polars_df(data)
#> shape: (2, 2)
#> ┌─────┬─────┐
#> │ a   ┆ b   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 1   ┆ 3   │
#> │ 2   ┆ 4   │
#> └─────┴─────┘
# Using dynamic dots feature
pl$DataFrame(!!!data)
#> shape: (2, 2)
#> ┌─────┬─────┐
#> │ a   ┆ b   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 1   ┆ 3   │
#> │ 2   ┆ 4   │
#> └─────┴─────┘
# Active bindings:
df <- pl$DataFrame(a = 1:3, b = c("foo", "bar", "baz"))

df$columns
#> [1] "a" "b"
df$dtypes
#> [[1]]
#> Int32
#> 
#> [[2]]
#> String
df$schema
#> $a
#> Int32
#> 
#> $b
#> String
df$shape
#> [1] 3 2
df$height
#> [1] 3
df$width
#> [1] 2