Read into a DataFrame from an Arrow IPC (Feather v2) file
Description
Read into a DataFrame from an Arrow IPC (Feather v2) file.
Usage
pl$read_ipc(
  source,
  ...,
  n_rows = NULL,
  cache = TRUE,
  rechunk = FALSE,
  row_index_name = NULL,
  row_index_offset = 0L,
  storage_options = NULL,
  retries = 2,
  file_cache_ttl = NULL,
  hive_partitioning = NULL,
  hive_schema = NULL,
  try_parse_hive_dates = TRUE,
  include_file_paths = NULL
)
Arguments
source
Path(s) to a file or directory. To authenticate when scanning cloud
locations, see the storage_options parameter.
...
These dots are for future extensions and must be empty.
n_rows
Stop reading from the IPC file after reading n_rows rows.
cache
Cache the result after reading.
rechunk
When reading multiple files via a glob pattern, rechunk the final DataFrame into contiguous memory chunks.
row_index_name
If not NULL, insert a row index column with the given name into the
DataFrame.
row_index_offset
Offset at which to start the row index column (only used if row_index_name is set).
storage_options
Named vector containing options that indicate how to connect to a cloud
provider. The cloud providers currently supported are AWS, GCP, and
Azure. If storage_options is not provided, Polars will try to
infer the information from environment variables.
retries
Number of retries if accessing a cloud instance fails.
file_cache_ttl
Amount of time, in seconds, to keep downloaded cloud files since their
last access time. If not given, the POLARS_FILE_CACHE_TTL environment
variable is used (which defaults to 1 hour).
hive_partitioning
Infer statistics and schema from Hive partitioned sources and use them
to prune reads. If NULL (default), it is automatically
enabled when a single directory is passed, and otherwise disabled.
hive_schema
A list containing the column names and data types of the columns by
which the data is partitioned, e.g. list(a = pl$String, b = pl$Float32).
If NULL (default), the schema of the Hive partitions is inferred.
try_parse_hive_dates
Whether to try parsing Hive partition values as date / datetime types.
include_file_paths
Character value giving the name of a column that will contain the path of the source file(s).
Value
A polars DataFrame
Examples
library("polars")
temp_dir <- tempfile()
# Write a hive-style partitioned arrow file dataset
arrow::write_dataset(
  mtcars,
  temp_dir,
  partitioning = c("cyl", "gear"),
  format = "arrow",
  hive_style = TRUE
)
list.files(temp_dir, recursive = TRUE)
#> [1] "cyl=4/gear=3/part-0.arrow" "cyl=4/gear=4/part-0.arrow"
#> [3] "cyl=4/gear=5/part-0.arrow" "cyl=6/gear=3/part-0.arrow"
#> [5] "cyl=6/gear=4/part-0.arrow" "cyl=6/gear=5/part-0.arrow"
#> [7] "cyl=8/gear=3/part-0.arrow" "cyl=8/gear=5/part-0.arrow"
# If the path is a folder, Polars automatically tries to detect partitions
# and includes them in the output
pl$read_ipc(temp_dir)
#> shape: (32, 11)
#> ┌──────┬───────┬───────┬──────┬───┬─────┬──────┬─────┬──────┐
#> │ mpg  ┆ disp  ┆ hp    ┆ drat ┆ … ┆ am  ┆ carb ┆ cyl ┆ gear │
#> │ ---  ┆ ---   ┆ ---   ┆ ---  ┆   ┆ --- ┆ ---  ┆ --- ┆ ---  │
#> │ f64  ┆ f64   ┆ f64   ┆ f64  ┆   ┆ f64 ┆ f64  ┆ i64 ┆ i64  │
#> ╞══════╪═══════╪═══════╪══════╪═══╪═════╪══════╪═════╪══════╡
#> │ 21.5 ┆ 120.1 ┆ 97.0  ┆ 3.7  ┆ … ┆ 0.0 ┆ 1.0  ┆ 4   ┆ 3    │
#> │ 22.8 ┆ 108.0 ┆ 93.0  ┆ 3.85 ┆ … ┆ 1.0 ┆ 1.0  ┆ 4   ┆ 4    │
#> │ 24.4 ┆ 146.7 ┆ 62.0  ┆ 3.69 ┆ … ┆ 0.0 ┆ 2.0  ┆ 4   ┆ 4    │
#> │ 22.8 ┆ 140.8 ┆ 95.0  ┆ 3.92 ┆ … ┆ 0.0 ┆ 2.0  ┆ 4   ┆ 4    │
#> │ 32.4 ┆ 78.7  ┆ 66.0  ┆ 4.08 ┆ … ┆ 1.0 ┆ 1.0  ┆ 4   ┆ 4    │
#> │ …    ┆ …     ┆ …     ┆ …    ┆ … ┆ …   ┆ …    ┆ …   ┆ …    │
#> │ 15.2 ┆ 304.0 ┆ 150.0 ┆ 3.15 ┆ … ┆ 0.0 ┆ 2.0  ┆ 8   ┆ 3    │
#> │ 13.3 ┆ 350.0 ┆ 245.0 ┆ 3.73 ┆ … ┆ 0.0 ┆ 4.0  ┆ 8   ┆ 3    │
#> │ 19.2 ┆ 400.0 ┆ 175.0 ┆ 3.08 ┆ … ┆ 0.0 ┆ 2.0  ┆ 8   ┆ 3    │
#> │ 15.8 ┆ 351.0 ┆ 264.0 ┆ 4.22 ┆ … ┆ 1.0 ┆ 4.0  ┆ 8   ┆ 5    │
#> │ 15.0 ┆ 301.0 ┆ 335.0 ┆ 3.54 ┆ … ┆ 1.0 ┆ 8.0  ┆ 8   ┆ 5    │
#> └──────┴───────┴───────┴──────┴───┴─────┴──────┴─────┴──────┘
# We can also impose a schema on the partition columns
pl$read_ipc(temp_dir, hive_schema = list(cyl = pl$String, gear = pl$Int32))
#> shape: (32, 11)
#> ┌──────┬───────┬───────┬──────┬───┬─────┬──────┬─────┬──────┐
#> │ mpg  ┆ disp  ┆ hp    ┆ drat ┆ … ┆ am  ┆ carb ┆ cyl ┆ gear │
#> │ ---  ┆ ---   ┆ ---   ┆ ---  ┆   ┆ --- ┆ ---  ┆ --- ┆ ---  │
#> │ f64  ┆ f64   ┆ f64   ┆ f64  ┆   ┆ f64 ┆ f64  ┆ str ┆ i32  │
#> ╞══════╪═══════╪═══════╪══════╪═══╪═════╪══════╪═════╪══════╡
#> │ 21.5 ┆ 120.1 ┆ 97.0  ┆ 3.7  ┆ … ┆ 0.0 ┆ 1.0  ┆ 4   ┆ 3    │
#> │ 22.8 ┆ 108.0 ┆ 93.0  ┆ 3.85 ┆ … ┆ 1.0 ┆ 1.0  ┆ 4   ┆ 4    │
#> │ 24.4 ┆ 146.7 ┆ 62.0  ┆ 3.69 ┆ … ┆ 0.0 ┆ 2.0  ┆ 4   ┆ 4    │
#> │ 22.8 ┆ 140.8 ┆ 95.0  ┆ 3.92 ┆ … ┆ 0.0 ┆ 2.0  ┆ 4   ┆ 4    │
#> │ 32.4 ┆ 78.7  ┆ 66.0  ┆ 4.08 ┆ … ┆ 1.0 ┆ 1.0  ┆ 4   ┆ 4    │
#> │ …    ┆ …     ┆ …     ┆ …    ┆ … ┆ …   ┆ …    ┆ …   ┆ …    │
#> │ 15.2 ┆ 304.0 ┆ 150.0 ┆ 3.15 ┆ … ┆ 0.0 ┆ 2.0  ┆ 8   ┆ 3    │
#> │ 13.3 ┆ 350.0 ┆ 245.0 ┆ 3.73 ┆ … ┆ 0.0 ┆ 4.0  ┆ 8   ┆ 3    │
#> │ 19.2 ┆ 400.0 ┆ 175.0 ┆ 3.08 ┆ … ┆ 0.0 ┆ 2.0  ┆ 8   ┆ 3    │
#> │ 15.8 ┆ 351.0 ┆ 264.0 ┆ 4.22 ┆ … ┆ 1.0 ┆ 4.0  ┆ 8   ┆ 5    │
#> │ 15.0 ┆ 301.0 ┆ 335.0 ┆ 3.54 ┆ … ┆ 1.0 ┆ 8.0  ┆ 8   ┆ 5    │
#> └──────┴───────┴───────┴──────┴───┴─────┴──────┴─────┴──────┘
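The row-limiting and bookkeeping arguments described under Arguments can be combined in a single call. The following is a minimal sketch, not taken from the package documentation: it rebuilds the same hive-partitioned dataset so that it runs on its own, and the column names "idx" and "path" are arbitrary names chosen for illustration. It assumes the arrow package is installed.

```r
library("polars")

# Write a small hive-partitioned Arrow dataset (same layout as in the
# examples above) so that this snippet is self-contained
temp_dir <- tempfile()
arrow::write_dataset(
  mtcars,
  temp_dir,
  partitioning = c("cyl", "gear"),
  format = "arrow",
  hive_style = TRUE
)

# Read at most 5 rows, add a row index column that starts at 1, and
# record the source file of each row.
# "idx" and "path" are illustrative column names, not defaults.
df <- pl$read_ipc(
  temp_dir,
  n_rows = 5L,
  row_index_name = "idx",
  row_index_offset = 1L,
  include_file_paths = "path"
)
df
```

Limiting n_rows is a cheap way to inspect the schema of a large dataset, and the file path column can help trace a row back to the partition it was read from.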