Extract all capture groups for the given regex pattern

Description

Extract all capture groups for the given regex pattern

Usage

<Expr>$str$extract_groups(pattern)

Arguments

pattern A character string holding a valid regular expression pattern that contains at least one capture group and is compatible with the Rust regex crate.

Details

All group names are strings. If your pattern contains unnamed groups, their numerical position is converted to a string. See examples.
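The naming rule above suggests that a pattern mixing named and unnamed groups should yield a mix of names and positions as field names. The following is a minimal sketch under that assumption; the data frame and pattern are made up for illustration, and the expected field names (`"1"` for the unnamed first group, `"ref"` for the named second group) are inferred from the rule rather than stated on this page.

```r
library("polars")

# Hypothetical single-row input, for illustration only
df <- pl$DataFrame(
  url = "http://vote.com/ballon_dor?candidate=messi&ref=python"
)

# First group is unnamed, second group is named "ref"
pattern <- r"(candidate=(\w+)&ref=(?<ref>\w+))"

out <- df$with_columns(
  captures = pl$col("url")$str$extract_groups(pattern)
)$unnest("captures")

# Inspect the resulting column names
names(out)
```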

Value

A polars expression that evaluates to a struct column with one string field per capture group.

Examples

library("polars")

df <- pl$DataFrame(
  url = c(
    "http://vote.com/ballon_dor?candidate=messi&ref=python",
    "http://vote.com/ballon_dor?candidate=weghorst&ref=polars",
    "http://vote.com/ballon_dor?error=404&ref=rust"
  )
)

pattern <- r"(candidate=(?<candidate>\w+)&ref=(?<ref>\w+))"

df$with_columns(
  captures = pl$col("url")$str$extract_groups(pattern)
)$unnest("captures")
#> shape: (3, 3)
#> ┌─────────────────────────────────┬───────────┬────────┐
#> │ url                             ┆ candidate ┆ ref    │
#> │ ---                             ┆ ---       ┆ ---    │
#> │ str                             ┆ str       ┆ str    │
#> ╞═════════════════════════════════╪═══════════╪════════╡
#> │ http://vote.com/ballon_dor?can… ┆ messi     ┆ python │
#> │ http://vote.com/ballon_dor?can… ┆ weghorst  ┆ polars │
#> │ http://vote.com/ballon_dor?err… ┆ null      ┆ null   │
#> └─────────────────────────────────┴───────────┴────────┘
# If the groups are unnamed, their numerical position (as a string) is used:

pattern <- r"(candidate=(\w+)&ref=(\w+))"

df$with_columns(
  captures = pl$col("url")$str$extract_groups(pattern)
)$unnest("captures")
#> shape: (3, 3)
#> ┌─────────────────────────────────┬──────────┬────────┐
#> │ url                             ┆ 1        ┆ 2      │
#> │ ---                             ┆ ---      ┆ ---    │
#> │ str                             ┆ str      ┆ str    │
#> ╞═════════════════════════════════╪══════════╪════════╡
#> │ http://vote.com/ballon_dor?can… ┆ messi    ┆ python │
#> │ http://vote.com/ballon_dor?can… ┆ weghorst ┆ polars │
#> │ http://vote.com/ballon_dor?err… ┆ null     ┆ null   │
#> └─────────────────────────────────┴──────────┴────────┘
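When only one capture group is needed, unnesting the whole struct may be unnecessary. The sketch below assumes that `$struct$field()` can be chained directly onto the result of `$str$extract_groups()` to pull out a single named field; the shortened data frame is for illustration only.

```r
library("polars")

df <- pl$DataFrame(
  url = c(
    "http://vote.com/ballon_dor?candidate=messi&ref=python",
    "http://vote.com/ballon_dor?error=404&ref=rust"
  )
)

pattern <- r"(candidate=(?<candidate>\w+)&ref=(?<ref>\w+))"

# extract_groups() returns a struct; $struct$field() selects one field by name,
# so only the "candidate" capture is added as a new column
out <- df$with_columns(
  candidate = pl$col("url")$str$extract_groups(pattern)$struct$field("candidate")
)

out
```

Rows where the pattern does not match are expected to hold a null in the new column, as in the unnested examples above.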