Create rolling groups based on a date/time or integer column

Description

Unlike group_by_dynamic(), the windows are determined by the individual values of the index column rather than by constant intervals. For constant intervals, use group_by_dynamic().

If you have a time series <t_0, t_1, …, t_n>, then by default the windows created will be:

  • (t_0 - period, t_0]
  • (t_1 - period, t_1]
  • (t_n - period, t_n]

whereas if you pass a non-default offset, then the windows will be:

  • (t_0 + offset, t_0 + offset + period]
  • (t_1 + offset, t_1 + offset + period]
  • (t_n + offset, t_n + offset + period]
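
The window rule above can be checked by hand with a small base R sketch. The helper rolling_windows() is hypothetical and not part of polars. It assumes a numeric index and right-closed windows, and it only mirrors the bounds listed above; it does not claim to reproduce how polars computes them internally.

```r
# Hypothetical helper, not part of polars: lists which index values fall
# into each window (t + offset, t + offset + period].
rolling_windows <- function(t, period, offset = -period) {
  lower <- t + offset
  upper <- t + offset + period
  # Right-closed membership: lower bound excluded, upper bound included
  members <- lapply(seq_along(t), function(i) t[t > lower[i] & t <= upper[i]])
  data.frame(
    t = t,
    lower = lower,
    upper = upper,
    members = vapply(members, paste, character(1), collapse = ",")
  )
}

# Default offset (-period): each window ends at its own index value
default_windows <- rolling_windows(c(1, 2, 4, 7), period = 2)

# Non-default offset of 0: each window starts just after its own index value
shifted_windows <- rolling_windows(c(1, 2, 4, 7), period = 2, offset = 0)
```

With the default offset, the window for t = 2 is (0, 2] and contains both 1 and 2. With offset = 0, the window for t = 2 is (2, 4] and contains only 4.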

Usage

<LazyFrame>$rolling(
  index_column,
  ...,
  period,
  offset = NULL,
  closed = c("right", "left", "both", "none"),
  group_by = NULL
)

Arguments

index_column Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order (or, if group_by is specified, it must be sorted in ascending order within each group). For a rolling group by on indices, the data type needs to be either Int32 or Int64. Note that Int32 gets temporarily cast to Int64, so if performance matters, use an Int64 column.
... These dots are for future extensions and must be empty.
period Length of the window - must be non-negative.
offset Offset of the window. Default is -period.
closed Define which sides of the interval are closed (inclusive). Default is "right".
group_by Also group by this column/these columns. Can be expressions or objects coercible to expressions.
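
As a rough illustration of how these arguments combine, the sketch below rolls over an integer index within groups. The column names g, idx and x and the data are made up for the example. The period string "2i" assumes the integer-index suffix "i" of the polars duration language, so treat it as an assumption to verify against your installed version.

```r
library("polars")

# Illustrative data: `g`, `idx` and `x` are made-up names.
# `idx` is sorted in ascending order within each group of `g`.
lf <- pl$LazyFrame(
  g = c("a", "a", "a", "b", "b"),
  idx = c(1L, 2L, 4L, 1L, 3L),
  x = c(10, 20, 30, 40, 50)
)

out <- lf$rolling(
  index_column = "idx",
  period = "2i",    # window length of 2 index units (assumed "i" suffix)
  closed = "both",  # include both window boundaries
  group_by = "g"    # windows are built separately for each value of `g`
)$agg(
  sum_x = pl$col("x")$sum()
)$collect()

out
```

R integer vectors are stored as Int32, so per the note on index_column an Int64 index may be preferable when performance matters.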

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

An object of class polars_lazy_group_by

See Also

  • <LazyFrame>$group_by_dynamic()

Examples

library("polars")

dates <- c(
  "2020-01-01 13:45:48",
  "2020-01-01 16:42:13",
  "2020-01-01 16:45:09",
  "2020-01-02 18:12:48",
  "2020-01-03 19:45:32",
  "2020-01-08 23:16:43"
)

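# Parse the strings into a Datetime column; `dates` is already sorted ascending
df <- pl$LazyFrame(dt = dates, a = c(3, 7, 5, 9, 2, 1))$with_columns(
  pl$col("dt")$str$strptime(pl$Datetime())
)

# For each row, aggregate `a` over the two days up to and including its `dt`
df$rolling(index_column = "dt", period = "2d")$agg(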
  sum_a = pl$col("a")$sum(),
  min_a = pl$col("a")$min(),
  max_a = pl$col("a")$max()
)$collect()
#> shape: (6, 4)
#> ┌─────────────────────┬───────┬───────┬───────┐
#> │ dt                  ┆ sum_a ┆ min_a ┆ max_a │
#> │ ---                 ┆ ---   ┆ ---   ┆ ---   │
#> │ datetime[μs]        ┆ f64   ┆ f64   ┆ f64   │
#> ╞═════════════════════╪═══════╪═══════╪═══════╡
#> │ 2020-01-01 13:45:48 ┆ 3.0   ┆ 3.0   ┆ 3.0   │
#> │ 2020-01-01 16:42:13 ┆ 10.0  ┆ 3.0   ┆ 7.0   │
#> │ 2020-01-01 16:45:09 ┆ 15.0  ┆ 3.0   ┆ 7.0   │
#> │ 2020-01-02 18:12:48 ┆ 24.0  ┆ 3.0   ┆ 9.0   │
#> │ 2020-01-03 19:45:32 ┆ 11.0  ┆ 2.0   ┆ 9.0   │
#> │ 2020-01-08 23:16:43 ┆ 1.0   ┆ 1.0   ┆ 1.0   │
#> └─────────────────────┴───────┴───────┴───────┘