Drop all rows that contain NaN values
Description
The original order of the remaining rows is preserved.
Usage
<DataFrame>$drop_nans(...)
Arguments
...
<dynamic-dots> Column name(s) for which NaN values are considered. If
empty (default), use all columns (note that only floating-point columns
can contain NaNs).
Value
A polars DataFrame
Examples
library("polars")
df <- pl$DataFrame(
  foo = c(1, NaN, 2.5),
  bar = c(NaN, 110, 25.5),
  ham = c("a", "b", NA)
)
# The default behavior of this method is to drop rows where any single value
# of the row is NaN. Null values (such as the NA in "ham") are not affected.
df$drop_nans()
#> shape: (1, 3)
#> ┌─────┬──────┬──────┐
#> │ foo ┆ bar  ┆ ham  │
#> │ --- ┆ ---  ┆ ---  │
#> │ f64 ┆ f64  ┆ str  │
#> ╞═════╪══════╪══════╡
#> │ 2.5 ┆ 25.5 ┆ null │
#> └─────┴──────┴──────┘
# This behavior can be constrained to consider only a subset of columns, as
# defined by name or with a selector. For example, dropping rows if there is
# a NaN in the "bar" column:
df$drop_nans("bar")
#> shape: (2, 3)
#> ┌─────┬───────┬──────┐
#> │ foo ┆ bar   ┆ ham  │
#> │ --- ┆ ---   ┆ ---  │
#> │ f64 ┆ f64   ┆ str  │
#> ╞═════╪═══════╪══════╡
#> │ NaN ┆ 110.0 ┆ b    │
#> │ 2.5 ┆ 25.5  ┆ null │
#> └─────┴───────┴──────┘
# Dropping a row only if *all* values are NaN requires a different
# formulation:
df <- pl$DataFrame(
  a = c(NaN, NaN, NaN, NaN),
  b = c(10.0, 2.5, NaN, 5.25),
  c = c(65.75, NaN, NaN, 10.5)
)
df$filter(!pl$all_horizontal(pl$all()$is_nan()))
#> shape: (3, 3)
#> ┌─────┬──────┬───────┐
#> │ a   ┆ b    ┆ c     │
#> │ --- ┆ ---  ┆ ---   │
#> │ f64 ┆ f64  ┆ f64   │
#> ╞═════╪══════╪═══════╡
#> │ NaN ┆ 10.0 ┆ 65.75 │
#> │ NaN ┆ 2.5  ┆ NaN   │
#> │ NaN ┆ 5.25 ┆ 10.5  │
#> └─────┴──────┴───────┘
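
Because NaN and null are distinct in polars, it can help to see the two side by side. The following is a minimal sketch rather than part of the documented examples. It assumes that filtering on `is_not_nan()` keeps the same rows as `drop_nans("bar")`, and it uses `drop_nulls()` only for contrast. The names `by_filter` and `no_nulls` are illustrative.

```r
library("polars")

df <- pl$DataFrame(
  foo = c(1, NaN, 2.5),
  bar = c(NaN, 110, 25.5),
  ham = c("a", "b", NA)
)

# Keep only rows whose "bar" value is not NaN. This is assumed to select the
# same rows as df$drop_nans("bar").
by_filter <- df$filter(pl$col("bar")$is_not_nan())

# Nulls are handled separately: drop_nulls() removes the row with the missing
# "ham" value and leaves the rows that contain NaN in place.
no_nulls <- df$drop_nulls()
```

The filter form is more verbose, but it composes with other predicates when NaN handling is only one of several row conditions.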