Approximate count of unique values

Description

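Estimates the number of unique values in the expression using the HyperLogLog++ algorithm for cardinality estimation. The result is an approximation and may differ slightly from the exact count returned by $n_unique().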

Usage

<Expr>$approx_n_unique()

Value

A polars expression

Examples

library("polars")

df <- pl$DataFrame(n = c(1, 1, 2))
df$select(pl$col("n")$approx_n_unique())
#> shape: (1, 1)
#> ┌─────┐
#> │ n   │
#> │ --- │
#> │ u32 │
#> ╞═════╡
#> │ 2   │
#> └─────┘

# Compare the exact and approximate counts on a larger column
df <- pl$DataFrame(n = 0:1000)
df$select(
  exact = pl$col("n")$n_unique(),
  approx = pl$col("n")$approx_n_unique()
)
#> shape: (1, 2)
#> ┌───────┬────────┐
#> │ exact ┆ approx │
#> │ ---   ┆ ---    │
#> │ u32   ┆ u32    │
#> ╞═══════╪════════╡
#> │ 1001  ┆ 1005   │
#> └───────┴────────┘
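
To gauge how far the estimate is from the exact count, the two values can be pulled into R and compared. The following is a minimal sketch rather than part of the documented API: the names `res` and `rel_error` are illustrative, and it assumes that `as.data.frame()` converts the one-row polars result to a base R data frame with numeric columns.

```r
library("polars")

df <- pl$DataFrame(n = 0:1000)

# Collect both counts into a one-row base R data frame
# (assumes as.data.frame() is available for polars DataFrames)
res <- as.data.frame(
  df$select(
    exact = pl$col("n")$n_unique(),
    approx = pl$col("n")$approx_n_unique()
  )
)

# Relative error of the HyperLogLog++ estimate against the exact count
rel_error <- abs(res$approx - res$exact) / res$exact
rel_error
```

For the counts shown in the example above (1001 exact, 1005 approximate), this works out to 4 / 1001, or roughly 0.4%.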