Drop duplicate rows

Description

Drop duplicate rows from the DataFrame.

Usage

<DataFrame>$unique(
  subset = NULL,
  ...,
  keep = c("any", "none", "first", "last"),
  maintain_order = FALSE
)

Arguments

subset Column name(s) or selector(s) to consider when identifying duplicate rows. If NULL (the default), all columns are used.

… These dots are for future extensions and must be empty.

keep

Which of the duplicate rows to keep. Must be one of:

“any” (default): keep one row from each group of duplicates, with no guarantee of which one. This allows more optimizations.
“none”: drop every row that has a duplicate.
“first”: keep the first row from each group of duplicates.
“last”: keep the last row from each group of duplicates.

maintain_order Keep the rows in the same order as in the original data. This is more expensive to compute. Setting this to TRUE prevents the query from running on the streaming engine.
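To make the keep options concrete, here is a minimal base-R sketch (no polars involved) of which rows each option retains. The helper keep_rows() is hypothetical: it only mirrors the documented semantics using base R's duplicated(), it is not how polars implements $unique(), and it omits “any” because polars makes no guarantee about which row that option keeps.

```r
# Hypothetical helper, base R only: mirrors which rows each `keep` option
# retains. Not the polars implementation; "any" is omitted on purpose.
keep_rows <- function(df, subset = NULL, keep = c("first", "last", "none")) {
  keep <- match.arg(keep)
  # Compare only the subset columns; NULL means all columns
  key <- if (is.null(subset)) df else df[subset]
  dup_fwd <- duplicated(key)                   # TRUE for 2nd, 3rd, ... occurrences
  dup_bwd <- duplicated(key, fromLast = TRUE)  # TRUE for all but the last occurrence
  rows <- switch(keep,
    first = !dup_fwd,
    last  = !dup_bwd,
    none  = !dup_fwd & !dup_bwd                # rows that occur exactly once
  )
  # Logical indexing preserves the original row order (like maintain_order = TRUE)
  df[rows, , drop = FALSE]
}

df <- data.frame(
  foo = c(1, 2, 3, 1),
  bar = c("a", "a", "a", "a"),
  ham = c("b", "b", "b", "b")
)
keep_rows(df, keep = "first")$foo                         # 1 2 3
keep_rows(df, keep = "last")$foo                          # 2 3 1
keep_rows(df, keep = "none")$foo                          # 2 3
keep_rows(df, subset = c("bar", "ham"), keep = "last")$foo  # 1
```

The “first” and “last” results match the polars examples below for the same data; “none” keeps only the rows whose foo value appears once.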

Value

A polars DataFrame

Examples

library("polars")

df <- pl$DataFrame(
  foo = c(1, 2, 3, 1),
  bar = c("a", "a", "a", "a"),
  ham = c("b", "b", "b", "b")
)
df$unique(maintain_order = TRUE)

#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ str ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ a   ┆ b   │
#> │ 2.0 ┆ a   ┆ b   │
#> │ 3.0 ┆ a   ┆ b   │
#> └─────┴─────┴─────┘

df$unique(subset = c("bar", "ham"), maintain_order = TRUE)

#> shape: (1, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ str ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ a   ┆ b   │
#> └─────┴─────┴─────┘

df$unique(keep = "last", maintain_order = TRUE)

#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ str ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 2.0 ┆ a   ┆ b   │
#> │ 3.0 ┆ a   ┆ b   │
#> │ 1.0 ┆ a   ┆ b   │
#> └─────┴─────┴─────┘