Convert a String column into a Date/Datetime/Time column.
Description
Similar to the strptime()
function.
Usage
<Expr>$str$strptime(
dtype,
format = NULL,
...,
strict = TRUE,
exact = TRUE,
cache = TRUE,
ambiguous = c("raise", "earliest", "latest", "null")
)
Arguments
dtype
|
The data type to convert into. Can be either pl$Date ,
pl$Datetime , or pl$Time .
|
format
|
Format to use for conversion. Refer to
the
chrono crate documentation for the full specification. Example:
“%Y-%m-%d %H:%M:%S” . If NULL (default), the
format is inferred from the data. Notice that time zone
%Z is not supported and will just
ignore timezones. Numeric time zones like
%z or
%:z are supported.
|
…
|
These dots are for future extensions and must be empty. |
strict
|
If TRUE (default), raise an error if a single string cannot
be parsed. If FALSE , produce a polars null .
|
exact
|
If TRUE (default), require an exact format match. If
FALSE , allow the format to match anywhere in the target
string. Conversion to the Time type is always exact. Note that using
exact = FALSE introduces a performance penalty - cleaning
your data beforehand will almost certainly be more performant.
|
cache
|
Use a cache of unique, converted dates to apply the datetime conversion. |
ambiguous
|
Determine how to deal with ambiguous datetimes. Character vector or
expression containing the followings:
|
Details
When parsing a Datetime the column precision will be inferred from the
format string, if given, e.g.: “%F %T%.3f”
=>
pl$Datetime("ms")
. If no fractional second component is
found then the default is “us”
(microsecond).
Value
A polars expression
See Also
-
\
$str$to_date() -
\
$str$to_datetime() -
\
$str$to_time()
Examples
library("polars")
# Dealing with a consistent format
df <- pl$DataFrame(x = c("2020-01-01 01:00Z", "2020-01-01 02:00Z"))
df$select(pl$col("x")$str$strptime(pl$Datetime(), "%Y-%m-%d %H:%M%#z"))
#> shape: (2, 1)
#> ┌─────────────────────────┐
#> │ x │
#> │ --- │
#> │ datetime[μs, UTC] │
#> ╞═════════════════════════╡
#> │ 2020-01-01 01:00:00 UTC │
#> │ 2020-01-01 02:00:00 UTC │
#> └─────────────────────────┘
#> shape: (2, 1)
#> ┌─────────────────────────┐
#> │ x │
#> │ --- │
#> │ datetime[μs, UTC] │
#> ╞═════════════════════════╡
#> │ 2020-01-01 01:00:00 UTC │
#> │ 2020-01-01 02:00:00 UTC │
#> └─────────────────────────┘
# Datetime with timezone is interpreted as UTC timezone
df <- pl$DataFrame(x = c("2020-01-01T01:00:00+09:00"))
df$select(pl$col("x")$str$strptime(pl$Datetime()))
#> shape: (1, 1)
#> ┌─────────────────────────┐
#> │ x │
#> │ --- │
#> │ datetime[μs, UTC] │
#> ╞═════════════════════════╡
#> │ 2019-12-31 16:00:00 UTC │
#> └─────────────────────────┘
# Dealing with different formats.
df <- pl$DataFrame(
date = c(
"2021-04-22",
"2022-01-04 00:00:00",
"01/31/22",
"Sun Jul 8 00:34:60 2001"
)
)
df$select(
pl$coalesce(
pl$col("date")$str$strptime(pl$Date, "%F", strict = FALSE),
pl$col("date")$str$strptime(pl$Date, "%F %T", strict = FALSE),
pl$col("date")$str$strptime(pl$Date, "%D", strict = FALSE),
pl$col("date")$str$strptime(pl$Date, "%c", strict = FALSE)
)
)
#> shape: (4, 1)
#> ┌────────────┐
#> │ date │
#> │ --- │
#> │ date │
#> ╞════════════╡
#> │ 2021-04-22 │
#> │ 2022-01-04 │
#> │ 2022-01-31 │
#> │ 2001-07-08 │
#> └────────────┘
# Ignore invalid time
df <- pl$DataFrame(
x = c(
"2023-01-01 11:22:33 -0100",
"2023-01-01 11:22:33 +0300",
"invalid time"
)
)
df$select(pl$col("x")$str$strptime(
pl$Datetime("ns"),
format = "%Y-%m-%d %H:%M:%S %z",
strict = FALSE
))
#> shape: (3, 1)
#> ┌─────────────────────────┐
#> │ x │
#> │ --- │
#> │ datetime[ns, UTC] │
#> ╞═════════════════════════╡
#> │ 2023-01-01 12:22:33 UTC │
#> │ 2023-01-01 08:22:33 UTC │
#> │ null │
#> └─────────────────────────┘