Assign ranks to data, dealing with ties appropriately

Description

Assign a rank to each value of the expression. Tied values are ranked according to method.

Usage

<Expr>$rank(
  method = c("average", "min", "max", "dense", "ordinal", "random"),
  ...,
  descending = FALSE,
  seed = NULL
)

Arguments

method The method used to assign ranks to tied elements. Must be one of the following:
  • "average" (default): The average of the ranks that would have been assigned to all the tied values is assigned to each value.
  • "min": The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as "competition" ranking.)
  • "max": The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
  • "dense": Like "min", but the next highest element is assigned the rank immediately after the rank of the tied elements, so no ranks are skipped.
  • "ordinal": All values are given a distinct rank; tied values are ranked in the order in which they occur in the Series.
  • "random": Like "ordinal", but the rank for ties does not depend on the order in which the values occur in the Series.
... These dots are for future extensions and must be empty.
descending If TRUE, rank in descending order, so that the largest value receives rank 1. Defaults to FALSE.
seed Integer. Random seed, only used if method = "random".
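
The difference between the tie-handling methods is easiest to see side by side. The following is a minimal sketch that applies "min", "max" and "dense" to one small column; the column names rank_min, rank_max and rank_dense are illustrative only, and the ranks in the comments follow from the definitions above.

```r
library("polars")

# The values 1 and 6 each appear twice, so both are tied
df <- pl$DataFrame(a = c(3, 6, 1, 1, 6))

ranks <- df$with_columns(
  # ties share the lowest rank of the group: 3, 4, 1, 1, 4
  rank_min = pl$col("a")$rank("min"),
  # ties share the highest rank of the group: 3, 5, 2, 2, 5
  rank_max = pl$col("a")$rank("max"),
  # like "min", but no ranks are skipped after a tie: 2, 3, 1, 1, 3
  rank_dense = pl$col("a")$rank("dense")
)
ranks
```

With "min" and "max" the ranks following a tie leave a gap (there is no rank 2 under "min"), whereas "dense" always produces consecutive ranks.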

Value

A polars expression

Examples

library("polars")

# Default is to use the "average" method to break ties
df <- pl$DataFrame(a = c(3, 6, 1, 1, 6))
df$with_columns(rank = pl$col("a")$rank())
#> shape: (5, 2)
#> ┌─────┬──────┐
#> │ a   ┆ rank │
#> │ --- ┆ ---  │
#> │ f64 ┆ f64  │
#> ╞═════╪══════╡
#> │ 3.0 ┆ 3.0  │
#> │ 6.0 ┆ 4.5  │
#> │ 1.0 ┆ 1.5  │
#> │ 1.0 ┆ 1.5  │
#> │ 6.0 ┆ 4.5  │
#> └─────┴──────┘
# Ordinal method
df$with_columns(rank = pl$col("a")$rank("ordinal"))
#> shape: (5, 2)
#> ┌─────┬──────┐
#> │ a   ┆ rank │
#> │ --- ┆ ---  │
#> │ f64 ┆ u32  │
#> ╞═════╪══════╡
#> │ 3.0 ┆ 3    │
#> │ 6.0 ┆ 4    │
#> │ 1.0 ┆ 1    │
#> │ 1.0 ┆ 2    │
#> │ 6.0 ┆ 5    │
#> └─────┴──────┘
# Use "rank" with "over" to rank within groups:
df <- pl$DataFrame(
  a = c(1, 1, 2, 2, 2),
  b = c(6, 7, 5, 14, 11)
)
df$with_columns(
  rank = pl$col("b")$rank()$over("a")
)
#> shape: (5, 3)
#> ┌─────┬──────┬──────┐
#> │ a   ┆ b    ┆ rank │
#> │ --- ┆ ---  ┆ ---  │
#> │ f64 ┆ f64  ┆ f64  │
#> ╞═════╪══════╪══════╡
#> │ 1.0 ┆ 6.0  ┆ 1.0  │
#> │ 1.0 ┆ 7.0  ┆ 2.0  │
#> │ 2.0 ┆ 5.0  ┆ 1.0  │
#> │ 2.0 ┆ 14.0 ┆ 3.0  │
#> │ 2.0 ┆ 11.0 ┆ 2.0  │
#> └─────┴──────┴──────┘
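
The descending and seed arguments are not covered by the examples above. The following is a small sketch of how they might be used; the column names rank_desc and rank_random and the seed value 1L are arbitrary choices for illustration. Because both arguments come after ... in the signature, they have to be passed by name.

```r
library("polars")

df <- pl$DataFrame(a = c(3, 6, 1, 1, 6))

out <- df$with_columns(
  # largest value gets rank 1; ties share the lowest rank: 3, 1, 4, 4, 1
  rank_desc = pl$col("a")$rank("min", descending = TRUE),
  # ties are broken at random; the seed is meant to make the result repeatable
  rank_random = pl$col("a")$rank("random", seed = 1L)
)
out
```

With method = "random" the untied value 3 still receives rank 3; only the order within each group of tied values (the two 1s and the two 6s) is left to chance.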