Skip to content

Join LazyFrames

Description

This function can do both mutating joins (adding columns based on matching observations, for example with how = “left”) and filtering joins (keeping observations based on matching observations, for example with how = “inner”).

Usage

<LazyFrame>$join(
  other,
  on = NULL,
  how = c("inner", "full", "left", "right", "semi", "anti", "cross"),
  ...,
  left_on = NULL,
  right_on = NULL,
  suffix = "_right",
  validate = c("m:m", "1:m", "m:1", "1:1"),
  join_nulls = FALSE,
  maintain_order = c("none", "left", "right", "left_right", "right_left"),
  allow_parallel = TRUE,
  force_parallel = FALSE,
  coalesce = NULL
)

Arguments

other LazyFrame to join with.
on Either a vector of column names or a list of expressions and/or strings. Use left_on and right_on if the column names to match on are different between the two LazyFrames.
how One of the following methods:
  • "inner": returns rows that have matching values in both tables
  • "left": returns all rows from the left table, and the matched rows from the right table
  • "right": returns all rows from the right table, and the matched rows from the left table
  • "full": returns all rows when there is a match in either left or right table
  • "cross": returns the Cartesian product of rows from both tables
  • "semi": returns rows from the left table that have a match in the right table.
  • "anti": returns rows from the left table that have no match in the right table.
These dots are for future extensions and must be empty.
left_on, right_on Same as on but only for the left or the right DataFrame. They must have the same length.
suffix Suffix to add to duplicated column names.
validate Checks if join is of specified type:
  • “m:m” (default): many-to-many, doesn’t perform any checks;
  • “1:1”: one-to-one, check if join keys are unique in both left and right datasets;
  • “1:m”: one-to-many, check if join keys are unique in left dataset
  • “m:1”: many-to-one, check if join keys are unique in right dataset
Note that this is currently not supported by the streaming engine.
join_nulls Join on null values. By default null values will never produce matches.
maintain_order Which frame row order to preserve, if any. Do not rely on any observed ordering without explicitly setting this parameter, as your code may break in a future release. Not specifying any ordering can improve performance. Supported for inner, left, right and full joins.
  • “none”: No specific ordering is desired. The ordering might differ across Polars versions or even between different runs.
  • “left”: Preserves the order of the left frame.
  • “right”: Preserves the order of the right frame.
  • “left_right”: First preserves the order of the left frame, then the right.
  • “right_left”: First preserves the order of the right frame, then the left.
allow_parallel Allow the physical plan to optionally evaluate the computation of both LazyFrames up to the join in parallel.
force_parallel Force the physical plan to evaluate the computation of both LazyFrames up to the join in parallel.
coalesce Coalescing behavior (merging of join columns).
  • NULL: join specific.
  • TRUE: Always coalesce join columns.
  • FALSE: Never coalesce join columns. Note that joining on any other expressions than col will turn off coalescing.

Value

A polars LazyFrame

Examples

library("polars")

lf <- pl$LazyFrame(
  foo = 1:3,
  bar = c(6, 7, 8),
  ham = c("a", "b", "c")
)
other_lf <- pl$LazyFrame(
  apple = c("x", "y", "z"),
  ham = c("a", "b", "d")
)
lf$join(other_lf, on = "ham")$collect()
#> shape: (2, 4)
#> ┌─────┬─────┬─────┬───────┐
#> │ foo ┆ bar ┆ ham ┆ apple │
#> │ --- ┆ --- ┆ --- ┆ ---   │
#> │ i32 ┆ f64 ┆ str ┆ str   │
#> ╞═════╪═════╪═════╪═══════╡
#> │ 1   ┆ 6.0 ┆ a   ┆ x     │
#> │ 2   ┆ 7.0 ┆ b   ┆ y     │
#> └─────┴─────┴─────┴───────┘
lf$join(other_lf, on = "ham", how = "full")$collect()
#> shape: (4, 5)
#> ┌──────┬──────┬──────┬───────┬───────────┐
#> │ foo  ┆ bar  ┆ ham  ┆ apple ┆ ham_right │
#> │ ---  ┆ ---  ┆ ---  ┆ ---   ┆ ---       │
#> │ i32  ┆ f64  ┆ str  ┆ str   ┆ str       │
#> ╞══════╪══════╪══════╪═══════╪═══════════╡
#> │ 1    ┆ 6.0  ┆ a    ┆ x     ┆ a         │
#> │ 2    ┆ 7.0  ┆ b    ┆ y     ┆ b         │
#> │ null ┆ null ┆ null ┆ z     ┆ d         │
#> │ 3    ┆ 8.0  ┆ c    ┆ null  ┆ null      │
#> └──────┴──────┴──────┴───────┴───────────┘
lf$join(other_lf, on = "ham", how = "left", coalesce = TRUE)$collect()
#> shape: (3, 4)
#> ┌─────┬─────┬─────┬───────┐
#> │ foo ┆ bar ┆ ham ┆ apple │
#> │ --- ┆ --- ┆ --- ┆ ---   │
#> │ i32 ┆ f64 ┆ str ┆ str   │
#> ╞═════╪═════╪═════╪═══════╡
#> │ 1   ┆ 6.0 ┆ a   ┆ x     │
#> │ 2   ┆ 7.0 ┆ b   ┆ y     │
#> │ 3   ┆ 8.0 ┆ c   ┆ null  │
#> └─────┴─────┴─────┴───────┘
lf$join(other_lf, on = "ham", how = "semi")$collect()
#> shape: (2, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ i32 ┆ f64 ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 1   ┆ 6.0 ┆ a   │
#> │ 2   ┆ 7.0 ┆ b   │
#> └─────┴─────┴─────┘
lf$join(other_lf, on = "ham", how = "anti")$collect()
#> shape: (1, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ i32 ┆ f64 ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 3   ┆ 8.0 ┆ c   │
#> └─────┴─────┴─────┘