
Convert a String column into a Date/Datetime/Time column.

Description

Similar to the strptime() function.

Usage

<Expr>$str$strptime(
  dtype,
  format = NULL,
  ...,
  strict = TRUE,
  exact = TRUE,
  cache = TRUE,
  ambiguous = c("raise", "earliest", "latest", "null")
)

Arguments

dtype The data type to convert into. Can be either pl$Date, pl$Datetime, or pl$Time.
format Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: “%Y-%m-%d %H:%M:%S”. If NULL (default), the format is inferred from the data. Note that the time zone name specifier %Z is not supported; time zone names are ignored. Numeric time zone offsets such as %z or %:z are supported.
... These dots are for future extensions and must be empty.
strict If TRUE (default), raise an error if any string cannot be parsed. If FALSE, strings that cannot be parsed become polars nulls.
exact If TRUE (default), require an exact format match. If FALSE, allow the format to match anywhere in the target string. Conversion to the Time type is always exact. Note that exact = FALSE introduces a performance penalty; cleaning your data beforehand will almost certainly be faster.
cache If TRUE (default), use a cache of unique, converted dates to apply the datetime conversion.
ambiguous Determine how to deal with ambiguous datetimes. Character vector or expression containing one of the following:
  • “raise” (default): Throw an error
  • “earliest”: Use the earliest datetime
  • “latest”: Use the latest datetime
  • “null”: Return a null value
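
As a rough illustration of the ambiguous and exact arguments, the sketch below first parses a wall-clock time that occurs twice when daylight saving time ends, then parses dates embedded in noisy strings. The time zone “Europe/Brussels”, the sample strings, the column aliases, and the helper variables dtype and fmt are assumptions chosen for illustration. It also assumes that passing a time zone as the second argument of pl$Datetime() localizes naive strings to that zone.

```r
library("polars")

# 02:30 on 2020-10-25 happens twice in Europe/Brussels, because clocks are
# set back from 03:00 to 02:00 that night. The naive string is ambiguous.
df <- pl$DataFrame(x = c("2020-10-25 02:30:00"))

# Assumed target dtype: microsecond precision in the Europe/Brussels zone.
dtype <- pl$Datetime("us", "Europe/Brussels")
fmt <- "%Y-%m-%d %H:%M:%S"

# Resolve the ambiguity explicitly; the default "raise" is expected to error.
out <- df$select(
  pl$col("x")$str$strptime(dtype, fmt, ambiguous = "earliest")$alias("earliest"),
  pl$col("x")$str$strptime(dtype, fmt, ambiguous = "latest")$alias("latest")
)
out

# exact = FALSE: let the format match anywhere inside noisy strings.
noisy <- pl$DataFrame(x = c("2022-01-16", "foo2022-01-18", "b2022-01-19ar"))
dates <- noisy$select(
  pl$col("x")$str$strptime(pl$Date, "%Y-%m-%d", exact = FALSE)
)
dates
```

The two columns of out should differ by one hour, since “earliest” picks the first occurrence of the repeated hour and “latest” the second. For the noisy strings, stripping the surrounding text beforehand and keeping exact = TRUE is likely the faster option.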

Details

When parsing to a Datetime, the column precision is inferred from the format string, if given; for example, “%F %T%.3f” yields pl$Datetime("ms"). If no fractional-second component is found, the default is “us” (microseconds).
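
A minimal sketch of parsing fractional seconds follows. The sample strings are assumptions made for illustration. The sketch requests millisecond precision explicitly with pl$Datetime("ms") rather than relying on inference, because the resulting unit may depend on how dtype is specified.

```r
library("polars")

# Strings with three fractional-second digits (hypothetical sample data).
df <- pl$DataFrame(x = c("2020-01-01 01:00:00.123", "2020-01-01 02:00:00.456"))

# "%F %T%.3f" is "%Y-%m-%d %H:%M:%S" followed by a dot and three digits.
# The time unit is requested explicitly here instead of being inferred.
parsed <- df$select(
  pl$col("x")$str$strptime(pl$Datetime("ms"), "%F %T%.3f")
)
parsed
```

Printing parsed shows the column dtype in the header row, which is a quick way to check which precision was actually applied.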

Value

A polars expression

See Also

  • <Expr>$str$to_date()
  • <Expr>$str$to_datetime()
  • <Expr>$str$to_time()

Examples

library("polars")

# Dealing with a consistent format
df <- pl$DataFrame(x = c("2020-01-01 01:00Z", "2020-01-01 02:00Z"))

df$select(pl$col("x")$str$strptime(pl$Datetime(), "%Y-%m-%d %H:%M%#z"))
#> shape: (2, 1)
#> ┌─────────────────────────┐
#> │ x                       │
#> │ ---                     │
#> │ datetime[μs, UTC]       │
#> ╞═════════════════════════╡
#> │ 2020-01-01 01:00:00 UTC │
#> │ 2020-01-01 02:00:00 UTC │
#> └─────────────────────────┘
# Auto infer format
df$select(pl$col("x")$str$strptime(pl$Datetime()))
#> shape: (2, 1)
#> ┌─────────────────────────┐
#> │ x                       │
#> │ ---                     │
#> │ datetime[μs, UTC]       │
#> ╞═════════════════════════╡
#> │ 2020-01-01 01:00:00 UTC │
#> │ 2020-01-01 02:00:00 UTC │
#> └─────────────────────────┘
# A datetime with a UTC offset is converted to UTC
df <- pl$DataFrame(x = c("2020-01-01T01:00:00+09:00"))
df$select(pl$col("x")$str$strptime(pl$Datetime()))
#> shape: (1, 1)
#> ┌─────────────────────────┐
#> │ x                       │
#> │ ---                     │
#> │ datetime[μs, UTC]       │
#> ╞═════════════════════════╡
#> │ 2019-12-31 16:00:00 UTC │
#> └─────────────────────────┘
# Dealing with different formats.
df <- pl$DataFrame(
  date = c(
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001"
  )
)

df$select(
  pl$coalesce(
    pl$col("date")$str$strptime(pl$Date, "%F", strict = FALSE),
    pl$col("date")$str$strptime(pl$Date, "%F %T", strict = FALSE),
    pl$col("date")$str$strptime(pl$Date, "%D", strict = FALSE),
    pl$col("date")$str$strptime(pl$Date, "%c", strict = FALSE)
  )
)
#> shape: (4, 1)
#> ┌────────────┐
#> │ date       │
#> │ ---        │
#> │ date       │
#> ╞════════════╡
#> │ 2021-04-22 │
#> │ 2022-01-04 │
#> │ 2022-01-31 │
#> │ 2001-07-08 │
#> └────────────┘
# With strict = FALSE, strings that cannot be parsed become null
df <- pl$DataFrame(
  x = c(
    "2023-01-01 11:22:33 -0100",
    "2023-01-01 11:22:33 +0300",
    "invalid time"
  )
)

df$select(pl$col("x")$str$strptime(
  pl$Datetime("ns"),
  format = "%Y-%m-%d %H:%M:%S %z",
  strict = FALSE
))
#> shape: (3, 1)
#> ┌─────────────────────────┐
#> │ x                       │
#> │ ---                     │
#> │ datetime[ns, UTC]       │
#> ╞═════════════════════════╡
#> │ 2023-01-01 12:22:33 UTC │
#> │ 2023-01-01 08:22:33 UTC │
#> │ null                    │
#> └─────────────────────────┘