Get a distinct integer ID for each run of identical values

Description

The ID starts at 0 and increases by one each time the value of the column changes.

Usage

<Expr>$rle_id()

Details

This is especially useful for defining a new group every time a column’s value changes, rather than one group per distinct value of that column.
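
The difference between grouping by run and grouping by distinct value can be sketched in base R. This is only an illustration of the semantics, not how polars computes the result: `run_id` is a hypothetical helper introduced here, and it assumes the input has no missing values.

```r
# Base R sketch of the run-ID semantics; not the polars implementation.
# `run_id` is a hypothetical helper and assumes `x` contains no NA values.
run_id <- function(x) {
  if (length(x) == 0L) return(integer(0))
  # TRUE wherever an element differs from the one before it
  changed <- c(FALSE, x[-1L] != x[-length(x)])
  # The running count of changes is a zero-based ID for each run
  cumsum(changed)
}

a <- c(1, 2, 1, 1, 1)
ids <- run_id(a)
ids
#> [1] 0 1 2 2 2

# Grouping by run ID keeps the two separate runs of 1 apart (three groups) ...
by_run <- split(a, ids)
# ... whereas grouping by value merges them (two groups)
by_value <- split(a, a)
```

Here the value 1 appears in two separate runs, so splitting on the run ID yields three groups while splitting on the value itself yields only two.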

Value

A polars expression

Examples

library("polars")

df <- pl$DataFrame(
  a = c(1, 2, 1, 1, 1),
  b = c("x", "x", NA, "y", "y")
)

df$with_columns(
  rle_id_a = pl$col("a")$rle_id(),
  rle_id_ab = pl$struct("a", "b")$rle_id()
)
#> shape: (5, 4)
#> ┌─────┬──────┬──────────┬───────────┐
#> │ a   ┆ b    ┆ rle_id_a ┆ rle_id_ab │
#> │ --- ┆ ---  ┆ ---      ┆ ---       │
#> │ f64 ┆ str  ┆ u32      ┆ u32       │
#> ╞═════╪══════╪══════════╪═══════════╡
#> │ 1.0 ┆ x    ┆ 0        ┆ 0         │
#> │ 2.0 ┆ x    ┆ 1        ┆ 1         │
#> │ 1.0 ┆ null ┆ 2        ┆ 2         │
#> │ 1.0 ┆ y    ┆ 2        ┆ 3         │
#> │ 1.0 ┆ y    ┆ 2        ┆ 3         │
#> └─────┴──────┴──────────┴───────────┘