Drop all rows that contain NaN values

Description

The original order of the remaining rows is preserved.

Usage

<LazyFrame>$drop_nans(...)

Arguments

\<dynamic-dots\> Column name(s) for which NaN values are considered. If empty (default), use all columns (note that only floating-point columns can contain NaNs).

Value

A polars LazyFrame

Examples

library("polars")

lf <- pl$LazyFrame(
  foo = c(1, NaN, 2.5),
  bar = c(NaN, 110, 25.5),
  ham = c("a", "b", NA)
)

# The default behavior of this method is to drop rows where any single value
# of the row is NaN. Null values (such as the NA in "ham") are not affected.
lf$drop_nans()$collect()
#> shape: (1, 3)
#> ┌─────┬──────┬──────┐
#> │ foo ┆ bar  ┆ ham  │
#> │ --- ┆ ---  ┆ ---  │
#> │ f64 ┆ f64  ┆ str  │
#> ╞═════╪══════╪══════╡
#> │ 2.5 ┆ 25.5 ┆ null │
#> └─────┴──────┴──────┘
# This behavior can be constrained to consider only a subset of columns, as
# defined by name or with a selector. For example, dropping rows if there is
# a NaN in the "bar" column:
lf$drop_nans("bar")$collect()
#> shape: (2, 3)
#> ┌─────┬───────┬──────┐
#> │ foo ┆ bar   ┆ ham  │
#> │ --- ┆ ---   ┆ ---  │
#> │ f64 ┆ f64   ┆ str  │
#> ╞═════╪═══════╪══════╡
#> │ NaN ┆ 110.0 ┆ b    │
#> │ 2.5 ┆ 25.5  ┆ null │
#> └─────┴───────┴──────┘
# Dropping a row only if *all* values are NaN requires a different
# formulation:
lf <- pl$LazyFrame(
  a = c(NaN, NaN, NaN, NaN),
  b = c(10.0, 2.5, NaN, 5.25),
  c = c(65.75, NaN, NaN, 10.5)
)
lf$filter(!pl$all_horizontal(pl$all()$is_nan()))$collect()
#> shape: (3, 3)
#> ┌─────┬──────┬───────┐
#> │ a   ┆ b    ┆ c     │
#> │ --- ┆ ---  ┆ ---   │
#> │ f64 ┆ f64  ┆ f64   │
#> ╞═════╪══════╪═══════╡
#> │ NaN ┆ 10.0 ┆ 65.75 │
#> │ NaN ┆ 2.5  ┆ NaN   │
#> │ NaN ┆ 5.25 ┆ 10.5  │
#> └─────┴──────┴───────┘
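
Because NaN and null are distinct in polars, it may help to see `$drop_nans()` next to `$drop_nulls()`. The following is a minimal sketch, not part of the original examples: the frame `lf_mixed` and its columns are hypothetical, and it assumes that R's `NA` is converted to a polars null while `NaN` is kept as a floating-point NaN.

```r
library("polars")

# Hypothetical data: "x" holds one NaN and one null (NA), "y" holds one null.
lf_mixed <- pl$LazyFrame(
  x = c(1, NaN, NA, 4),
  y = c("a", "b", "c", NA)
)

# Drops only the row where "x" is NaN; rows containing nulls are kept.
nans_dropped <- lf_mixed$drop_nans()$collect()

# Drops rows with a null in any column; the NaN row is kept.
nulls_dropped <- lf_mixed$drop_nulls()$collect()

# Chaining both keeps only rows free of NaN and null values.
both_dropped <- lf_mixed$drop_nans()$drop_nulls()$collect()
```

Under these assumptions, only the first row (`x = 1`, `y = "a"`) should remain in `both_dropped`.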