Assign ranks to data, dealing with ties appropriately

Description

Usage

<Expr>$rank(
  method = c("average", "min", "max", "dense", "ordinal", "random"),
  ...,
  descending = FALSE,
  seed = NULL
)

Arguments

method

The method used to assign ranks to tied elements. Must be one of the following:

“average” (default): Each tied value receives the average of the ranks that the tied values would otherwise occupy.
“min”: Each tied value receives the lowest of the ranks that the tied values would otherwise occupy. (This is also referred to as "competition" ranking.)
“max”: Each tied value receives the highest of the ranks that the tied values would otherwise occupy.
“dense”: Like “min”, but the next distinct value receives the rank immediately after the one assigned to the tied values, so no ranks are skipped.
“ordinal”: Every value receives a distinct rank; ties are broken by the order in which the values occur in the Series.
“random”: Like “ordinal”, but ties are broken at random rather than by the order in which the values occur in the Series.
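
The deterministic methods are easiest to compare side by side. The following is a minimal sketch that applies each of them to the column used in the Examples section; the output column names (`rank_average`, `rank_min`, and so on) are illustrative choices, not part of the API.

```r
library("polars")

# Same data as in the Examples section: 1 and 6 each occur twice.
df <- pl$DataFrame(a = c(3, 6, 1, 1, 6))

# One output column per tie-handling method (column names are illustrative).
ranked <- df$with_columns(
  rank_average = pl$col("a")$rank("average"),
  rank_min     = pl$col("a")$rank("min"),
  rank_max     = pl$col("a")$rank("max"),
  rank_dense   = pl$col("a")$rank("dense"),
  rank_ordinal = pl$col("a")$rank("ordinal")
)
ranked
```

The two 6s would occupy positions 4 and 5 in sorted order, so they should receive 4.5 under "average", 4 under "min", 5 under "max", 3 under "dense" (there are only two smaller distinct values), and 4 and 5 under "ordinal".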

...

These dots are for future extensions and must be empty.

descending

Logical. If TRUE, rank in descending order, so that the largest value receives rank 1. Defaults to FALSE.
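
As a short sketch of the effect of this argument, the snippet below ranks the same column in both directions. The "min" method is chosen here only to keep the ranks integral, and the column names `rank_asc` and `rank_desc` are illustrative. Because `descending` follows the dots in the signature, it has to be passed by name.

```r
library("polars")

df <- pl$DataFrame(a = c(3, 6, 1, 1, 6))

# Rank the same column in ascending (default) and descending order.
ranked_desc <- df$with_columns(
  rank_asc  = pl$col("a")$rank("min"),
  rank_desc = pl$col("a")$rank("min", descending = TRUE)
)
ranked_desc
```

In the descending column, the two 6s should share rank 1, the 3 should get rank 3, and the two 1s should share rank 4.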

seed

Integer. Seed for the random tie-breaking. Only used if method = “random”.
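
A fixed seed is meant to make the "random" method reproducible. The sketch below evaluates the same expression twice and compares the results; the seed value `42L` is an arbitrary choice for illustration, and `as.data.frame()` is used only to bring the results into base R for comparison.

```r
library("polars")

df <- pl$DataFrame(a = c(3, 6, 1, 1, 6))

# Arbitrary seed chosen for illustration.
expr <- pl$col("a")$rank("random", seed = 42L)

# Evaluate the same seeded expression twice.
first  <- as.data.frame(df$select(rank = expr))
second <- as.data.frame(df$select(rank = expr))

# With a fixed seed, both runs are expected to agree.
identical(first, second)
```

As with "ordinal", every value should still receive a distinct rank; only the order within each group of ties is randomized.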

Value

A polars expression

Examples

library("polars")

# Default is to use the "average" method to break ties
df <- pl$DataFrame(a = c(3, 6, 1, 1, 6))
df$with_columns(rank = pl$col("a")$rank())

#> shape: (5, 2)
#> ┌─────┬──────┐
#> │ a   ┆ rank │
#> │ --- ┆ ---  │
#> │ f64 ┆ f64  │
#> ╞═════╪══════╡
#> │ 3.0 ┆ 3.0  │
#> │ 6.0 ┆ 4.5  │
#> │ 1.0 ┆ 1.5  │
#> │ 1.0 ┆ 1.5  │
#> │ 6.0 ┆ 4.5  │
#> └─────┴──────┘

# Ordinal method
df$with_columns(rank = pl$col("a")$rank("ordinal"))

#> shape: (5, 2)
#> ┌─────┬──────┐
#> │ a   ┆ rank │
#> │ --- ┆ ---  │
#> │ f64 ┆ u32  │
#> ╞═════╪══════╡
#> │ 3.0 ┆ 3    │
#> │ 6.0 ┆ 4    │
#> │ 1.0 ┆ 1    │
#> │ 1.0 ┆ 2    │
#> │ 6.0 ┆ 5    │
#> └─────┴──────┘

# Use "rank" with "over" to rank within groups:
df <- pl$DataFrame(
  a = c(1, 1, 2, 2, 2),
  b = c(6, 7, 5, 14, 11)
)
df$with_columns(
  rank = pl$col("b")$rank()$over("a")
)

#> shape: (5, 3)
#> ┌─────┬──────┬──────┐
#> │ a   ┆ b    ┆ rank │
#> │ --- ┆ ---  ┆ ---  │
#> │ f64 ┆ f64  ┆ f64  │
#> ╞═════╪══════╪══════╡
#> │ 1.0 ┆ 6.0  ┆ 1.0  │
#> │ 1.0 ┆ 7.0  ┆ 2.0  │
#> │ 2.0 ┆ 5.0  ┆ 1.0  │
#> │ 2.0 ┆ 14.0 ┆ 3.0  │
#> │ 2.0 ┆ 11.0 ┆ 2.0  │
#> └─────┴──────┴──────┘