
Bin continuous values into discrete categories based on their quantiles

Description

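[Experimental] Bin the values of an expression into discrete categories whose boundaries are quantiles of the data.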

Usage

<Expr>$qcut(
  quantiles,
  ...,
  labels = NULL,
  left_closed = FALSE,
  allow_duplicates = FALSE,
  include_breaks = FALSE
)

Arguments

quantiles Either a numeric vector of quantile probabilities between 0 and 1, or a single positive integer giving the number of bins, each covering an equal share of the probability.
... These dots are for future extensions and must be empty.
labels Names of the categories. The number of labels must equal the number of categories. If NULL (the default), interval labels such as (-1, 1] are generated.
left_closed If TRUE, the intervals are left-closed, [a, b), instead of right-closed, (a, b].
allow_duplicates If TRUE, duplicate values among the computed quantiles are dropped instead of raising an error. Duplicates can occur even when the probabilities are unique, for example when the data contain many repeated values.
include_breaks If TRUE, also return the right endpoint of the bin each observation falls into. The output data type then changes from a Categorical to a Struct with the fields breakpoint and category.
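The arguments describe two steps: turn the probabilities into breakpoints, then assign each value to the bin between consecutive breakpoints. The base R sketch below mimics those steps without polars, so the mechanics can be inspected directly. It assumes the breakpoints are computed by linear interpolation between order statistics (R's default `quantile()` type 7), which matches the breakpoints in the examples for `-2:2`; it is an illustration, not the polars implementation.

```r
# Hedged sketch in base R (no polars) of the two steps behind qcut.
# Assumption: breakpoints use linear interpolation (quantile type 7).
x <- -2:2
probs <- c(0.25, 0.75)

# Step 1: quantile probabilities -> breakpoints.
breaks <- quantile(x, probs, names = FALSE, type = 7)  # -1, 1

# Step 2: right-closed bins (-Inf, -1], (-1, 1], (1, Inf],
# the analogue of left_closed = FALSE.
bins <- cut(x, breaks = c(-Inf, breaks, Inf), labels = c("a", "b", "c"), right = TRUE)
print(as.character(bins))  # "a" "a" "b" "b" "c"

# Duplicate breakpoints: unique probabilities can still give equal quantiles.
y <- c(1, 1, 1, 1, 2)
dup_breaks <- quantile(y, c(0.25, 0.5), names = FALSE, type = 7)  # 1, 1
print(anyDuplicated(dup_breaks) > 0)  # TRUE; cut() would stop with "'breaks' are not unique"

# Dropping the duplicates (the idea behind allow_duplicates = TRUE)
# leaves one breakpoint, and therefore one bin fewer.
kept <- unique(dup_breaks)
y_bins <- cut(y, breaks = c(-Inf, kept, Inf), right = TRUE)
print(table(y_bins))
```

Because duplicates are removed rather than reported, the number of categories can be smaller than requested, which matters if fixed `labels` are supplied.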

Value

A polars expression. By default it evaluates to a Categorical; with include_breaks = TRUE it evaluates to a Struct.

Examples

library("polars")

# Divide a column into three categories according to pre-defined quantile
# probabilities.
df <- pl$DataFrame(foo = -2:2)
df$with_columns(
  qcut = pl$col("foo")$qcut(c(0.25, 0.75), labels = c("a", "b", "c"))
)
#> shape: (5, 2)
#> ┌─────┬──────┐
#> │ foo ┆ qcut │
#> │ --- ┆ ---  │
#> │ i32 ┆ cat  │
#> ╞═════╪══════╡
#> │ -2  ┆ a    │
#> │ -1  ┆ a    │
#> │ 0   ┆ b    │
#> │ 1   ┆ b    │
#> │ 2   ┆ c    │
#> └─────┴──────┘
# Divide a column into two categories using uniform quantile probabilities.
df$with_columns(
  qcut = pl$col("foo")$qcut(2, labels = c("low", "high"), left_closed = TRUE)
)
#> shape: (5, 2)
#> ┌─────┬──────┐
#> │ foo ┆ qcut │
#> │ --- ┆ ---  │
#> │ i32 ┆ cat  │
#> ╞═════╪══════╡
#> │ -2  ┆ low  │
#> │ -1  ┆ low  │
#> │ 0   ┆ high │
#> │ 1   ┆ high │
#> │ 2   ┆ high │
#> └─────┴──────┘
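With an integer such as `2`, the probabilities are spaced uniformly, so for two bins the only interior breakpoint is the median. The base R sketch below reproduces the second example under two assumptions: `n` bins use the interior probabilities `1/n, ..., (n - 1)/n`, and breakpoints use linear interpolation (R's default `quantile()` type 7). Here `right = FALSE` plays the role of `left_closed = TRUE`.

```r
# Hedged base R sketch of qcut(2, ..., left_closed = TRUE); not the polars code.
x <- -2:2
n_bins <- 2

# Assumption: n bins use the interior probabilities 1/n, ..., (n - 1)/n.
probs <- seq_len(n_bins - 1) / n_bins  # 0.5
breaks <- quantile(x, probs, names = FALSE, type = 7)  # 0, the median

# right = FALSE gives left-closed bins [-Inf, 0) and [0, Inf),
# so the breakpoint value 0 itself lands in "high".
bins <- cut(x, breaks = c(-Inf, breaks, Inf), labels = c("low", "high"), right = FALSE)
print(as.character(bins))  # "low" "low" "high" "high" "high"
```

Switching to `right = TRUE` makes the first bin (-Inf, 0], which moves the value 0 into `low`; that boundary handling is exactly what `left_closed` controls.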
# Add both the category and the breakpoint, then unnest the struct into columns.
df$with_columns(
  qcut = pl$col("foo")$qcut(c(0.25, 0.75), include_breaks = TRUE)
)$unnest("qcut")
#> shape: (5, 3)
#> ┌─────┬────────────┬────────────┐
#> │ foo ┆ breakpoint ┆ category   │
#> │ --- ┆ ---        ┆ ---        │
#> │ i32 ┆ f64        ┆ cat        │
#> ╞═════╪════════════╪════════════╡
#> │ -2  ┆ -1.0       ┆ (-inf, -1] │
#> │ -1  ┆ -1.0       ┆ (-inf, -1] │
#> │ 0   ┆ 1.0        ┆ (-1, 1]    │
#> │ 1   ┆ 1.0        ┆ (-1, 1]    │
#> │ 2   ┆ inf        ┆ (1, inf]   │
#> └─────┴────────────┴────────────┘
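With `include_breaks = TRUE`, each row also carries the right endpoint of its bin. That lookup can be sketched in base R by indexing the vector of bin edges with each value's bin number. The sketch assumes linear-interpolation quantiles (R's default `quantile()` type 7) and only illustrates the `breakpoint` column shown above; it is not how polars computes it.

```r
# Hedged base R sketch of the breakpoint column; not the polars implementation.
x <- -2:2
breaks <- quantile(x, c(0.25, 0.75), names = FALSE, type = 7)  # -1, 1
edges <- c(-Inf, breaks, Inf)

# Bin number of each value for right-closed bins (1, 2 or 3).
bin_id <- as.integer(cut(x, breaks = edges, right = TRUE))

# The right endpoint of bin k is edges[k + 1]; the last bin ends at Inf.
breakpoint <- edges[bin_id + 1]
print(data.frame(foo = x, bin_id = bin_id, breakpoint = breakpoint))
```

The resulting `breakpoint` values are -1, -1, 1, 1 and Inf, matching the `breakpoint` column in the polars output.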