
Bin continuous values into discrete categories

Description

[Experimental]

Usage

<Expr>$cut(
  breaks,
  ...,
  labels = NULL,
  left_closed = FALSE,
  include_breaks = FALSE
)

Arguments

breaks A vector of unique cut points.
... These dots are for future extensions and must be empty.
labels Names of the categories. The number of labels must equal the number of cut points plus one. If NULL (the default), labels are generated from the interval bounds, for example "(-inf, -1]".
left_closed If TRUE, make the intervals left-closed, [a, b), instead of the default right-closed, (a, b].
include_breaks If TRUE, also return the right endpoint of the bin each observation falls in. The output then changes from a Categorical to a Struct with the fields breakpoint and category.
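To illustrate what left_closed changes, the sketch below uses base R's cut() instead of polars. This is an assumption made so the example runs without polars installed. Base R's right = TRUE corresponds to the default right-closed intervals, and right = FALSE corresponds to left_closed = TRUE. Base R also needs the outer bounds -Inf and Inf spelled out, whereas polars adds them itself (see the (-inf, -1] and (1, inf] labels in the examples below).

```r
# Values and cut points matching the examples on this page.
x <- -2:2
breaks <- c(-1, 1)

# Base R needs the outer bounds explicitly; polars adds -inf/inf itself.
all_breaks <- c(-Inf, breaks, Inf)

# Right-closed intervals (-inf, -1], (-1, 1], (1, inf]: the polars default.
right_closed <- cut(x, all_breaks, labels = c("a", "b", "c"), right = TRUE)

# Left-closed intervals [-inf, -1), [-1, 1), [1, inf): like left_closed = TRUE.
left_closed <- cut(x, all_breaks, labels = c("a", "b", "c"), right = FALSE)

print(data.frame(x, right_closed, left_closed))
```

Only the values that sit exactly on a cut point (-1 and 1) move to the next bin when the intervals switch from right-closed to left-closed.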

Value

A polars expression. Its values are Categorical by default, or a Struct when include_breaks = TRUE.

Examples

library("polars")

# Divide a column into three categories.
df <- pl$DataFrame(foo = -2:2)
df$with_columns(
  cut = pl$col("foo")$cut(c(-1, 1), labels = c("a", "b", "c"))
)
#> shape: (5, 2)
#> ┌─────┬─────┐
#> │ foo ┆ cut │
#> │ --- ┆ --- │
#> │ i32 ┆ cat │
#> ╞═════╪═════╡
#> │ -2  ┆ a   │
#> │ -1  ┆ a   │
#> │ 0   ┆ b   │
#> │ 1   ┆ b   │
#> │ 2   ┆ c   │
#> └─────┴─────┘
# Add both the category and the breakpoint.
df$with_columns(
  cut = pl$col("foo")$cut(c(-1, 1), include_breaks = TRUE)
)$unnest("cut")
#> shape: (5, 3)
#> ┌─────┬────────────┬────────────┐
#> │ foo ┆ breakpoint ┆ category   │
#> │ --- ┆ ---        ┆ ---        │
#> │ i32 ┆ f64        ┆ cat        │
#> ╞═════╪════════════╪════════════╡
#> │ -2  ┆ -1.0       ┆ (-inf, -1] │
#> │ -1  ┆ -1.0       ┆ (-inf, -1] │
#> │ 0   ┆ 1.0        ┆ (-1, 1]    │
#> │ 1   ┆ 1.0        ┆ (-1, 1]    │
#> │ 2   ┆ inf        ┆ (1, inf]   │
#> └─────┴────────────┴────────────┘
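The breakpoint values above can be cross-checked without polars. The following sketch, again in base R and not part of the polars API, looks up the right endpoint of each observation's bin from the integer bin codes that cut() returns when labels = FALSE. The variable names are illustrative only.

```r
x <- -2:2
breaks <- c(-1, 1)

# Right endpoints of the three bins (-inf, -1], (-1, 1], (1, inf].
right_ends <- c(breaks, Inf)

# labels = FALSE makes cut() return integer bin codes instead of a factor.
bin <- cut(x, c(-Inf, breaks, Inf), labels = FALSE)

# Index the right endpoints by bin code to get each row's breakpoint.
breakpoint <- right_ends[bin]

print(data.frame(x, bin, breakpoint))
```

The resulting breakpoint column (-1, -1, 1, 1, Inf) matches the breakpoint field that include_breaks = TRUE produces in the example above.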