Creates a summary of statistics for a LazyFrame, returning a DataFrame.
Description
This method does not maintain the laziness of the frame, and will collect the final result. This could potentially be an expensive operation.
We do not guarantee the output of describe() to be stable. It will show statistics that we deem informative, and may be updated in the future. Using describe() programmatically (versus interactive exploration) is not recommended for this reason.
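Because the set of statistics returned by describe() may change, code that depends on particular values can request them explicitly instead. The following is a minimal sketch, assuming the expression methods $mean(), $quantile() and $null_count() are available in the installed polars version; the column names and aliases are illustrative only.

```r
library("polars")

lf <- pl$LazyFrame(
  int = 1:3,
  float = c(0.5, NA, 2.5)
)

# Ask for each statistic by name so the output columns are under our control
# and do not depend on what describe() chooses to report.
stats <- lf$select(
  pl$col("float")$null_count()$alias("float_null_count"),
  pl$col("int")$mean()$alias("int_mean"),
  pl$col("int")$quantile(0.5, interpolation = "linear")$alias("int_median")
)$collect()

stats
```

Unlike describe(), the query stays lazy until $collect() is called, so it can be combined with other lazy operations first.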
Usage
<LazyFrame>$describe(
  percentiles = c(0.25, 0.5, 0.75),
  ...,
  interpolation = c("nearest", "higher", "lower", "midpoint", "linear")
)
Arguments
percentiles
  One or more percentiles to include in the summary statistics. All values must be in the range [0, 1].
...
  These dots are for future extensions and must be empty.
interpolation
  Interpolation method for computing quantiles. Must be one of "nearest", "higher", "lower", "midpoint", or "linear".
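To illustrate how the interpolation choices differ, here is a base R sketch of a quantile computed at the fractional 0-based position q * (n - 1) of the sorted non-missing values. quantile_sketch() is a hypothetical helper, not a polars function, and the tie-breaking rule for "nearest" is an assumption chosen to agree with the output shown in the Examples section; the actual polars implementation may differ in edge cases.

```r
# Hypothetical helper (not part of polars): quantile of a numeric vector.
quantile_sketch <- function(x, q, interpolation = "nearest") {
  x <- sort(x[!is.na(x)])      # drop missing values, then sort ascending
  n <- length(x)
  pos <- q * (n - 1)           # fractional 0-based position of the quantile
  lo <- floor(pos)             # 0-based index of the neighbour below
  hi <- ceiling(pos)           # 0-based index of the neighbour above
  frac <- pos - lo             # distance from the lower neighbour
  # The "+ 1" below converts 0-based positions to R's 1-based indexing.
  switch(interpolation,
    lower = x[lo + 1],
    higher = x[hi + 1],
    midpoint = (x[lo + 1] + x[hi + 1]) / 2,
    linear = x[lo + 1] + frac * (x[hi + 1] - x[lo + 1]),
    # Assumption: a position exactly halfway between two values rounds up.
    nearest = x[floor(pos + 0.5) + 1],
    stop("unknown interpolation: ", interpolation)
  )
}

x <- 1:3
sapply(
  c("nearest", "higher", "lower", "midpoint", "linear"),
  function(m) quantile_sketch(x, 0.3, m)
)
```

For the values 1, 2, 3 and q = 0.3 the position is 0.6, so "lower" picks 1, "higher" picks 2, "midpoint" averages them to 1.5, and "linear" gives 1 + 0.6 * (2 - 1) = 1.6.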
Details
The median is included by default as the 50% percentile.
Value
A polars DataFrame
Examples
library("polars")
lf <- pl$LazyFrame(
  int = 1:3,
  float = c(0.5, NA, 2.5),
  string = c(letters[1:2], NA),
  date = c(as.Date("2024-01-20"), as.Date("2024-01-21"), NA),
  cat = factor(c(letters[1:2], NA)),
  bool = c(TRUE, FALSE, NA)
)
lf$collect()
#> shape: (3, 6)
#> ┌─────┬───────┬────────┬────────────┬──────┬───────┐
#> │ int ┆ float ┆ string ┆ date       ┆ cat  ┆ bool  │
#> │ --- ┆ ---   ┆ ---    ┆ ---        ┆ ---  ┆ ---   │
#> │ i32 ┆ f64   ┆ str    ┆ date       ┆ cat  ┆ bool  │
#> ╞═════╪═══════╪════════╪════════════╪══════╪═══════╡
#> │ 1   ┆ 0.5   ┆ a      ┆ 2024-01-20 ┆ a    ┆ true  │
#> │ 2   ┆ null  ┆ b      ┆ 2024-01-21 ┆ b    ┆ false │
#> │ 3   ┆ 2.5   ┆ null   ┆ null       ┆ null ┆ null  │
#> └─────┴───────┴────────┴────────────┴──────┴───────┘
# Show default frame statistics:
lf$describe()
#> shape: (9, 7)
#> ┌────────────┬─────┬──────────┬────────┬─────────────────────────┬──────┬──────┐
#> │ statistic  ┆ int ┆ float    ┆ string ┆ date                    ┆ cat  ┆ bool │
#> │ ---        ┆ --- ┆ ---      ┆ ---    ┆ ---                     ┆ ---  ┆ ---  │
#> │ str        ┆ f64 ┆ f64      ┆ str    ┆ str                     ┆ str  ┆ f64  │
#> ╞════════════╪═════╪══════════╪════════╪═════════════════════════╪══════╪══════╡
#> │ count      ┆ 3.0 ┆ 2.0      ┆ 2      ┆ 2                       ┆ 2    ┆ 2.0  │
#> │ null_count ┆ 0.0 ┆ 1.0      ┆ 1      ┆ 1                       ┆ 1    ┆ 1.0  │
#> │ mean       ┆ 2.0 ┆ 1.5      ┆ null   ┆ 2024-01-20 12:00:00.000 ┆ null ┆ 0.5  │
#> │ std        ┆ 1.0 ┆ 1.414214 ┆ null   ┆ null                    ┆ null ┆ null │
#> │ min        ┆ 1.0 ┆ 0.5      ┆ a      ┆ 2024-01-20              ┆ a    ┆ 0.0  │
#> │ 25%        ┆ 2.0 ┆ 0.5      ┆ null   ┆ 2024-01-20              ┆ null ┆ null │
#> │ 50%        ┆ 2.0 ┆ 2.5      ┆ null   ┆ 2024-01-21              ┆ null ┆ null │
#> │ 75%        ┆ 3.0 ┆ 2.5      ┆ null   ┆ 2024-01-21              ┆ null ┆ null │
#> │ max        ┆ 3.0 ┆ 2.5      ┆ b      ┆ 2024-01-21              ┆ b    ┆ 1.0  │
#> └────────────┴─────┴──────────┴────────┴─────────────────────────┴──────┴──────┘
# Customize which percentiles are displayed, applying linear interpolation:
lf$describe(
  percentiles = c(0.1, 0.3, 0.5, 0.7, 0.9),
  interpolation = "linear"
)
#> shape: (11, 7)
#> ┌────────────┬─────┬──────────┬────────┬─────────────────────────┬──────┬──────┐
#> │ statistic  ┆ int ┆ float    ┆ string ┆ date                    ┆ cat  ┆ bool │
#> │ ---        ┆ --- ┆ ---      ┆ ---    ┆ ---                     ┆ ---  ┆ ---  │
#> │ str        ┆ f64 ┆ f64      ┆ str    ┆ str                     ┆ str  ┆ f64  │
#> ╞════════════╪═════╪══════════╪════════╪═════════════════════════╪══════╪══════╡
#> │ count      ┆ 3.0 ┆ 2.0      ┆ 2      ┆ 2                       ┆ 2    ┆ 2.0  │
#> │ null_count ┆ 0.0 ┆ 1.0      ┆ 1      ┆ 1                       ┆ 1    ┆ 1.0  │
#> │ mean       ┆ 2.0 ┆ 1.5      ┆ null   ┆ 2024-01-20 12:00:00.000 ┆ null ┆ 0.5  │
#> │ std        ┆ 1.0 ┆ 1.414214 ┆ null   ┆ null                    ┆ null ┆ null │
#> │ min        ┆ 1.0 ┆ 0.5      ┆ a      ┆ 2024-01-20              ┆ a    ┆ 0.0  │
#> │ …          ┆ …   ┆ …        ┆ …      ┆ …                       ┆ …    ┆ …    │
#> │ 30%        ┆ 1.6 ┆ 1.1      ┆ null   ┆ 2024-01-20              ┆ null ┆ null │
#> │ 50%        ┆ 2.0 ┆ 1.5      ┆ null   ┆ 2024-01-20              ┆ null ┆ null │
#> │ 70%        ┆ 2.4 ┆ 1.9      ┆ null   ┆ 2024-01-20              ┆ null ┆ null │
#> │ 90%        ┆ 2.8 ┆ 2.3      ┆ null   ┆ 2024-01-20              ┆ null ┆ null │
#> │ max        ┆ 3.0 ┆ 2.5      ┆ b      ┆ 2024-01-21              ┆ b    ┆ 1.0  │
#> └────────────┴─────┴──────────┴────────┴─────────────────────────┴──────┴──────┘