Use the Aho-Corasick algorithm to extract matches

Description

Extract every match of the given patterns from each string, using the Aho-Corasick algorithm to search for all patterns at once.

Usage

<Expr>$str$extract_many(
  patterns,
  ...,
  ascii_case_insensitive = FALSE,
  overlapping = FALSE
)

Arguments

patterns String patterns to search. This can be an Expr or something coercible to an Expr. Strings are parsed as column names.
... These dots are for future extensions and must be empty.
ascii_case_insensitive Enable ASCII-aware case-insensitive matching. When this option is enabled, matching ignores case for ASCII letters (a-z and A-Z) only.
overlapping Whether matches are allowed to overlap. When FALSE (the default), only non-overlapping matches are returned.

Value

A polars expression

Examples

library("polars")

df <- pl$DataFrame(values = "discontent")
patterns <- pl$lit(c("winter", "disco", "onte", "discontent"))

df$with_columns(
  matches = pl$col("values")$str$extract_many(patterns),
  matches_overlap = pl$col("values")$str$extract_many(patterns, overlapping = TRUE)
)
#> shape: (1, 3)
#> ┌────────────┬───────────┬─────────────────────────────────┐
#> │ values     ┆ matches   ┆ matches_overlap                 │
#> │ ---        ┆ ---       ┆ ---                             │
#> │ str        ┆ list[str] ┆ list[str]                       │
#> ╞════════════╪═══════════╪═════════════════════════════════╡
#> │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
#> └────────────┴───────────┴─────────────────────────────────┘

# Per-row patterns from a list column: the string "patterns" is parsed as a column name
df <- pl$DataFrame(
  values = c("discontent", "rhapsody"),
  patterns = list(c("winter", "disco", "onte", "discontent"), c("rhap", "ody", "coalesce"))
)

df$select(pl$col("values")$str$extract_many("patterns"))
#> shape: (2, 1)
#> ┌─────────────────┐
#> │ values          │
#> │ ---             │
#> │ list[str]       │
#> ╞═════════════════╡
#> │ ["disco"]       │
#> │ ["rhap", "ody"] │
#> └─────────────────┘
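
The examples above do not exercise ascii_case_insensitive. The following is a minimal sketch of how that argument might be used, based only on the signature shown under Usage. The data, the pattern values, and the names df_case and patterns_case are made up for illustration, and no output is shown because it has not been verified here.

```r
library("polars")

# Hypothetical data: a mixed-case value searched with patterns in a different case
df_case <- pl$DataFrame(values = "Discontent")
patterns_case <- pl$lit(c("disco", "TENT"))

df_case$with_columns(
  # Default behaviour: matching is case-sensitive
  matches = pl$col("values")$str$extract_many(patterns_case),
  # ASCII letters (a-z, A-Z) are compared without regard to case
  matches_ci = pl$col("values")$str$extract_many(
    patterns_case,
    ascii_case_insensitive = TRUE
  )
)
```

Comparing the two resulting columns side by side should show which matches depend on case. Non-ASCII letters are not case-folded by this option, as noted under Arguments.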