Drop duplicate rows

Description

Drop duplicate rows from this DataFrame, optionally considering only a subset of columns.

Usage

<DataFrame>$unique(
  subset = NULL,
  ...,
  keep = c("any", "none", "first", "last"),
  maintain_order = FALSE
)

Arguments

subset Column name(s) or selector(s) to consider when identifying duplicate rows. If NULL (default), use all columns.
... These dots are for future extensions and must be empty.
keep Which of the duplicate rows to keep. Must be one of:
  • “any”: no guarantee about which of the duplicate rows is kept. This allows more optimizations.
  • “none”: drop every row that has a duplicate.
  • “first”: keep the first occurrence of each set of duplicate rows.
  • “last”: keep the last occurrence of each set of duplicate rows.
maintain_order Keep the same order as the original data. This is more expensive to compute. Setting this to TRUE prevents the query from running on the streaming engine.

Value

A polars DataFrame

Examples

library("polars")

df <- pl$DataFrame(
  foo = c(1, 2, 3, 1),
  bar = c("a", "a", "a", "a"),
  ham = c("b", "b", "b", "b"),
)
df$unique(maintain_order = TRUE)
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ str ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ a   ┆ b   │
#> │ 2.0 ┆ a   ┆ b   │
#> │ 3.0 ┆ a   ┆ b   │
#> └─────┴─────┴─────┘
df$unique(subset = c("bar", "ham"), maintain_order = TRUE)
#> shape: (1, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ str ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ a   ┆ b   │
#> └─────┴─────┴─────┘
df$unique(keep = "last", maintain_order = TRUE)
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ str ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 2.0 ┆ a   ┆ b   │
#> │ 3.0 ┆ a   ┆ b   │
#> │ 1.0 ┆ a   ┆ b   │
#> └─────┴─────┴─────┘
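
The examples above do not cover keep = "none" or keep = "first" with a subset. The following is a minimal sketch of both, reusing the same data; the variable names no_dups and first_foo are only illustrative, and the expectations in the comments follow from the argument descriptions rather than from captured output.

```r
library("polars")

df <- pl$DataFrame(
  foo = c(1, 2, 3, 1),
  bar = c("a", "a", "a", "a"),
  ham = c("b", "b", "b", "b")
)

# keep = "none": every row that has a duplicate is dropped, so only the
# rows with foo = 2 and foo = 3 are expected to remain.
no_dups <- df$unique(keep = "none", maintain_order = TRUE)

# keep = "first" with a subset: rows are compared on `foo` only, and the
# first occurrence of each distinct `foo` value is kept.
first_foo <- df$unique(subset = "foo", keep = "first", maintain_order = TRUE)
```

Passing maintain_order = TRUE here keeps the result order stable, at the extra cost noted under Arguments.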