Selecting and Filtering
This page is based on the chapter “Indexing (Or Lack Thereof)” of the Modern Polars book.
Preparing Data
Since the PRQL and SQL results shown on this page are after being
converted to R DataFrame via knitr, they have been converted from DuckDB
types to R types. So NULL
in DuckDB is shown as NA
.
Download
Download the data to be analysis (zipped CSV file) and write the data to a Parquet file.
This document uses R here, but we can do it in another language, or, manually download and unzip and create the Parquet file (with DuckDB CLI).
- R
- Shell
Code
# Create "data" directory, download the zip file into the directory, and create a Parquet file.
data_dir <- "data"
dest <- file.path(data_dir, "flights.csv.zip")
csv_name <- "On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2022_1.csv"
csv_path <- file.path(data_dir, csv_name)
parquet_path <- file.path(data_dir, "flights.parquet")
if (!fs::file_exists(parquet_path)) {
if (!fs::file_exists(dest)) {
fs::dir_create(data_dir)
curl::curl_download(
"https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2022_1.zip",
dest,
handle = curl::new_handle(ssl_verifypeer = FALSE)
)
}
unzip(dest, csv_name, exdir = data_dir)
duckdb:::sql(glue::glue("COPY (FROM read_csv_auto('{csv_path}')) TO '{parquet_path}' (FORMAT PARQUET)"))
}
This is a sample command to download the zipped CSV file and covert it to a Parquet file.
On Windows, the unzip
command is not available by default, so use
something like
Expand-Archive -Path data/flights.csv.zip -DestinationPath data
in
PowerShell insead.
mkdir data
curl -sL https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2022_1.zip -o data/flights.csv.zip
unzip -d data data/flights.csv.zip
duckdb -c "COPY (FROM read_csv_auto('data/On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2022_1.csv')) TO 'data/flights.parquet' (FORMAT PARQUET)"
Load the Data
After the Parquet file is ready, load it into DuckDB (in-memory) database table, R DataFrame, and Python polars.LazyFrame.
- DuckDB
- R DataFrame
- Python polars.LazyFrame
CREATE TABLE tab AS SELECT * FROM 'data/flights.parquet'
FROM tab LIMIT 5
Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | Tail_Number | Flight_Number_Reporting_Airline | OriginAirportID | OriginAirportSeqID | OriginCityMarketID | Origin | OriginCityName | OriginState | OriginStateFips | OriginStateName | OriginWac | DestAirportID | DestAirportSeqID | DestCityMarketID | Dest | DestCityName | DestState | DestStateFips | DestStateName | DestWac | CRSDepTime | DepTime | DepDelay | DepDelayMinutes | DepDel15 | DepartureDelayGroups | DepTimeBlk | TaxiOut | WheelsOff | WheelsOn | TaxiIn | CRSArrTime | ArrTime | ArrDelay | ArrDelayMinutes | ArrDel15 | ArrivalDelayGroups | ArrTimeBlk | Cancelled | CancellationCode | Diverted | CRSElapsedTime | ActualElapsedTime | AirTime | Flights | Distance | DistanceGroup | CarrierDelay | WeatherDelay | NASDelay | SecurityDelay | LateAircraftDelay | FirstDepTime | TotalAddGTime | LongestAddGTime | DivAirportLandings | DivReachedDest | DivActualElapsedTime | DivArrDelay | DivDistance | Div1Airport | Div1AirportID | Div1AirportSeqID | Div1WheelsOn | Div1TotalGTime | Div1LongestGTime | Div1WheelsOff | Div1TailNum | Div2Airport | Div2AirportID | Div2AirportSeqID | Div2WheelsOn | Div2TotalGTime | Div2LongestGTime | Div2WheelsOff | Div2TailNum | Div3Airport | Div3AirportID | Div3AirportSeqID | Div3WheelsOn | Div3TotalGTime | Div3LongestGTime | Div3WheelsOff | Div3TailNum | Div4Airport | Div4AirportID | Div4AirportSeqID | Div4WheelsOn | Div4TotalGTime | Div4LongestGTime | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum | column109 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022 | 1 | 1 | 14 | 5 | 2022-01-14 | YX | 20452 | YX | N119HQ | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1221 | -3 | 0 | 0 | -1 | 1200-1259 | 28 | 1249 | 1352 | 4 | 1352 | 1356 | 4 | 4 | 0 | 0 | 1300-1359 | 0 | 0 | 88 | 95 | 63 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 15 | 6 | 2022-01-15 | YX | 20452 | YX | N122HQ | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1214 | -10 | 0 | 0 | -1 | 1200-1259 | 19 | 1233 | 1323 | 5 | 1352 | 1328 | -24 | 0 | 0 | -2 | 1300-1359 | 0 | 0 | 88 | 74 | 50 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 16 | 7 | 2022-01-16 | YX | 20452 | YX | N412YX | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1218 | -6 | 0 | 0 | -1 | 1200-1259 | 16 | 1234 | 1327 | 12 | 1352 | 1339 | -13 | 0 | 0 | -1 | 1300-1359 | 0 | 0 | 88 | 81 | 53 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 17 | 1 | 2022-01-17 | YX | 20452 | YX | N405YX | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1217 | -7 | 0 | 0 | -1 | 1200-1259 | 32 | 1249 | 1345 | 16 | 1352 | 1401 | 9 | 9 | 0 | 0 | 1300-1359 | 0 | 0 | 88 | 104 | 56 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 18 | 2 | 2022-01-18 | YX | 20452 | YX | N420YX | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1218 | -6 | 0 | 0 | -1 | 1200-1259 | 11 | 1229 | 1317 | 6 | 1352 | 1323 | -29 | 0 | 0 | -2 | 1300-1359 | 0 | 0 | 88 | 65 | 48 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
library(dplyr, warn.conflicts = FALSE)
df <- duckdb:::sql("FROM 'data/flights.parquet'")
df |> head(5)
Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | Tail_Number | Flight_Number_Reporting_Airline | OriginAirportID | OriginAirportSeqID | OriginCityMarketID | Origin | OriginCityName | OriginState | OriginStateFips | OriginStateName | OriginWac | DestAirportID | DestAirportSeqID | DestCityMarketID | Dest | DestCityName | DestState | DestStateFips | DestStateName | DestWac | CRSDepTime | DepTime | DepDelay | DepDelayMinutes | DepDel15 | DepartureDelayGroups | DepTimeBlk | TaxiOut | WheelsOff | WheelsOn | TaxiIn | CRSArrTime | ArrTime | ArrDelay | ArrDelayMinutes | ArrDel15 | ArrivalDelayGroups | ArrTimeBlk | Cancelled | CancellationCode | Diverted | CRSElapsedTime | ActualElapsedTime | AirTime | Flights | Distance | DistanceGroup | CarrierDelay | WeatherDelay | NASDelay | SecurityDelay | LateAircraftDelay | FirstDepTime | TotalAddGTime | LongestAddGTime | DivAirportLandings | DivReachedDest | DivActualElapsedTime | DivArrDelay | DivDistance | Div1Airport | Div1AirportID | Div1AirportSeqID | Div1WheelsOn | Div1TotalGTime | Div1LongestGTime | Div1WheelsOff | Div1TailNum | Div2Airport | Div2AirportID | Div2AirportSeqID | Div2WheelsOn | Div2TotalGTime | Div2LongestGTime | Div2WheelsOff | Div2TailNum | Div3Airport | Div3AirportID | Div3AirportSeqID | Div3WheelsOn | Div3TotalGTime | Div3LongestGTime | Div3WheelsOff | Div3TailNum | Div4Airport | Div4AirportID | Div4AirportSeqID | Div4WheelsOn | Div4TotalGTime | Div4LongestGTime | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum | column109 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022 | 1 | 1 | 14 | 5 | 2022-01-14 | YX | 20452 | YX | N119HQ | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1221 | -3 | 0 | 0 | -1 | 1200-1259 | 28 | 1249 | 1352 | 4 | 1352 | 1356 | 4 | 4 | 0 | 0 | 1300-1359 | 0 | 0 | 88 | 95 | 63 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 15 | 6 | 2022-01-15 | YX | 20452 | YX | N122HQ | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1214 | -10 | 0 | 0 | -1 | 1200-1259 | 19 | 1233 | 1323 | 5 | 1352 | 1328 | -24 | 0 | 0 | -2 | 1300-1359 | 0 | 0 | 88 | 74 | 50 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 16 | 7 | 2022-01-16 | YX | 20452 | YX | N412YX | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1218 | -6 | 0 | 0 | -1 | 1200-1259 | 16 | 1234 | 1327 | 12 | 1352 | 1339 | -13 | 0 | 0 | -1 | 1300-1359 | 0 | 0 | 88 | 81 | 53 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 17 | 1 | 2022-01-17 | YX | 20452 | YX | N405YX | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1217 | -7 | 0 | 0 | -1 | 1200-1259 | 32 | 1249 | 1345 | 16 | 1352 | 1401 | 9 | 9 | 0 | 0 | 1300-1359 | 0 | 0 | 88 | 104 | 56 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 18 | 2 | 2022-01-18 | YX | 20452 | YX | N420YX | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1218 | -6 | 0 | 0 | -1 | 1200-1259 | 11 | 1229 | 1317 | 6 | 1352 | 1323 | -29 | 0 | 0 | -2 | 1300-1359 | 0 | 0 | 88 | 65 | 48 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
import polars as pl
lf = pl.scan_parquet("data/flights.parquet")
lf.fetch(5)
Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | Tail_Number | Flight_Number_Reporting_Airline | OriginAirportID | OriginAirportSeqID | OriginCityMarketID | Origin | OriginCityName | OriginState | OriginStateFips | OriginStateName | OriginWac | DestAirportID | DestAirportSeqID | DestCityMarketID | Dest | DestCityName | DestState | DestStateFips | DestStateName | DestWac | CRSDepTime | DepTime | DepDelay | DepDelayMinutes | DepDel15 | DepartureDelayGroups | DepTimeBlk | TaxiOut | … | Div1TotalGTime | Div1LongestGTime | Div1WheelsOff | Div1TailNum | Div2Airport | Div2AirportID | Div2AirportSeqID | Div2WheelsOn | Div2TotalGTime | Div2LongestGTime | Div2WheelsOff | Div2TailNum | Div3Airport | Div3AirportID | Div3AirportSeqID | Div3WheelsOn | Div3TotalGTime | Div3LongestGTime | Div3WheelsOff | Div3TailNum | Div4Airport | Div4AirportID | Div4AirportSeqID | Div4WheelsOn | Div4TotalGTime | Div4LongestGTime | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum | column109 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
i64 | i64 | i64 | i64 | i64 | date | str | i64 | str | str | i64 | i64 | i64 | i64 | str | str | str | str | str | i64 | i64 | i64 | i64 | str | str | str | str | str | i64 | str | str | f64 | f64 | f64 | i64 | str | f64 | … | f64 | f64 | str | str | str | i64 | i64 | str | f64 | f64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str |
2022 | 1 | 1 | 14 | 5 | 2022-01-14 | "YX" | 20452 | "YX" | "N119HQ" | 4879 | 11066 | 1106606 | 31066 | "CMH" | "Columbus, OH" | "OH" | "39" | "Ohio" | 44 | 11278 | 1127805 | 30852 | "DCA" | "Washington, DC" | "VA" | "51" | "Virginia" | 38 | "1224" | "1221" | -3.0 | 0.0 | 0.0 | -1 | "1200-1259" | 28.0 | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
2022 | 1 | 1 | 15 | 6 | 2022-01-15 | "YX" | 20452 | "YX" | "N122HQ" | 4879 | 11066 | 1106606 | 31066 | "CMH" | "Columbus, OH" | "OH" | "39" | "Ohio" | 44 | 11278 | 1127805 | 30852 | "DCA" | "Washington, DC" | "VA" | "51" | "Virginia" | 38 | "1224" | "1214" | -10.0 | 0.0 | 0.0 | -1 | "1200-1259" | 19.0 | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
2022 | 1 | 1 | 16 | 7 | 2022-01-16 | "YX" | 20452 | "YX" | "N412YX" | 4879 | 11066 | 1106606 | 31066 | "CMH" | "Columbus, OH" | "OH" | "39" | "Ohio" | 44 | 11278 | 1127805 | 30852 | "DCA" | "Washington, DC" | "VA" | "51" | "Virginia" | 38 | "1224" | "1218" | -6.0 | 0.0 | 0.0 | -1 | "1200-1259" | 16.0 | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
2022 | 1 | 1 | 17 | 1 | 2022-01-17 | "YX" | 20452 | "YX" | "N405YX" | 4879 | 11066 | 1106606 | 31066 | "CMH" | "Columbus, OH" | "OH" | "39" | "Ohio" | 44 | 11278 | 1127805 | 30852 | "DCA" | "Washington, DC" | "VA" | "51" | "Virginia" | 38 | "1224" | "1217" | -7.0 | 0.0 | 0.0 | -1 | "1200-1259" | 32.0 | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
2022 | 1 | 1 | 18 | 2 | 2022-01-18 | "YX" | 20452 | "YX" | "N420YX" | 4879 | 11066 | 1106606 | 31066 | "CMH" | "Columbus, OH" | "OH" | "39" | "Ohio" | 44 | 11278 | 1127805 | 30852 | "DCA" | "Washington, DC" | "VA" | "51" | "Virginia" | 38 | "1224" | "1218" | -6.0 | 0.0 | 0.0 | -1 | "1200-1259" | 11.0 | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
Read the Data with PRQL
Since PRQL cannot be used to create tables, data must be read from pre-created tables in a DB.
But when PRQL is used on DuckDB, it is possible to treat Parquet and other files as tables, so that PRQL queries can be executed without creating tables.
from `data/flights.parquet`
take 5
Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | Tail_Number | Flight_Number_Reporting_Airline | OriginAirportID | OriginAirportSeqID | OriginCityMarketID | Origin | OriginCityName | OriginState | OriginStateFips | OriginStateName | OriginWac | DestAirportID | DestAirportSeqID | DestCityMarketID | Dest | DestCityName | DestState | DestStateFips | DestStateName | DestWac | CRSDepTime | DepTime | DepDelay | DepDelayMinutes | DepDel15 | DepartureDelayGroups | DepTimeBlk | TaxiOut | WheelsOff | WheelsOn | TaxiIn | CRSArrTime | ArrTime | ArrDelay | ArrDelayMinutes | ArrDel15 | ArrivalDelayGroups | ArrTimeBlk | Cancelled | CancellationCode | Diverted | CRSElapsedTime | ActualElapsedTime | AirTime | Flights | Distance | DistanceGroup | CarrierDelay | WeatherDelay | NASDelay | SecurityDelay | LateAircraftDelay | FirstDepTime | TotalAddGTime | LongestAddGTime | DivAirportLandings | DivReachedDest | DivActualElapsedTime | DivArrDelay | DivDistance | Div1Airport | Div1AirportID | Div1AirportSeqID | Div1WheelsOn | Div1TotalGTime | Div1LongestGTime | Div1WheelsOff | Div1TailNum | Div2Airport | Div2AirportID | Div2AirportSeqID | Div2WheelsOn | Div2TotalGTime | Div2LongestGTime | Div2WheelsOff | Div2TailNum | Div3Airport | Div3AirportID | Div3AirportSeqID | Div3WheelsOn | Div3TotalGTime | Div3LongestGTime | Div3WheelsOff | Div3TailNum | Div4Airport | Div4AirportID | Div4AirportSeqID | Div4WheelsOn | Div4TotalGTime | Div4LongestGTime | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum | column109 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022 | 1 | 1 | 14 | 5 | 2022-01-14 | YX | 20452 | YX | N119HQ | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1221 | -3 | 0 | 0 | -1 | 1200-1259 | 28 | 1249 | 1352 | 4 | 1352 | 1356 | 4 | 4 | 0 | 0 | 1300-1359 | 0 | 0 | 88 | 95 | 63 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 15 | 6 | 2022-01-15 | YX | 20452 | YX | N122HQ | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1214 | -10 | 0 | 0 | -1 | 1200-1259 | 19 | 1233 | 1323 | 5 | 1352 | 1328 | -24 | 0 | 0 | -2 | 1300-1359 | 0 | 0 | 88 | 74 | 50 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 16 | 7 | 2022-01-16 | YX | 20452 | YX | N412YX | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1218 | -6 | 0 | 0 | -1 | 1200-1259 | 16 | 1234 | 1327 | 12 | 1352 | 1339 | -13 | 0 | 0 | -1 | 1300-1359 | 0 | 0 | 88 | 81 | 53 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 17 | 1 | 2022-01-17 | YX | 20452 | YX | N405YX | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1217 | -7 | 0 | 0 | -1 | 1200-1259 | 32 | 1249 | 1345 | 16 | 1352 | 1401 | 9 | 9 | 0 | 0 | 1300-1359 | 0 | 0 | 88 | 104 | 56 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 18 | 2 | 2022-01-18 | YX | 20452 | YX | N420YX | 4879 | 11066 | 1106606 | 31066 | CMH | Columbus, OH | OH | 39 | Ohio | 44 | 11278 | 1127805 | 30852 | DCA | Washington, DC | VA | 51 | Virginia | 38 | 1224 | 1218 | -6 | 0 | 0 | -1 | 1200-1259 | 11 | 1229 | 1317 | 6 | 1352 | 1323 | -29 | 0 | 0 | -2 | 1300-1359 | 0 | 0 | 88 | 65 | 48 | 1 | 323 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Selecting Columns and Slicing Rows
For column selection, the syntax is much the same, but the syntax for row selection using position looks different.
Note that PRQL, SQL, and R are 1-based, while Python is 0-based.
- PRQL DuckDB
- SQL DuckDB
- dplyr R
- Python Polars
from tab
select {Dest, Tail_Number}
take 13..16
Dest | Tail_Number |
---|---|
DCA | N132HQ |
DCA | N109HQ |
DCA | N421YX |
DCA | N137HQ |
SELECT
Dest,
Tail_Number
FROM tab
LIMIT 4 OFFSET 12
Dest | Tail_Number |
---|---|
DCA | N132HQ |
DCA | N109HQ |
DCA | N421YX |
DCA | N137HQ |
DuckDB allows SQL query starts with FROM
, so we can also write the
query as follows:
FROM tab
SELECT
Dest,
Tail_Number
LIMIT 4 OFFSET 12
df |>
select(Dest, Tail_Number) |>
slice(13:16)
Dest | Tail_Number |
---|---|
DCA | N132HQ |
DCA | N109HQ |
DCA | N421YX |
DCA | N137HQ |
lf.select("Dest", "Tail_Number").slice(12, 4).collect()
Dest | Tail_Number |
---|---|
str | str |
"DCA" | "N132HQ" |
"DCA" | "N109HQ" |
"DCA" | "N421YX" |
"DCA" | "N137HQ" |
Filtering Rows
- PRQL DuckDB
- SQL DuckDB
- dplyr R
- Python Polars
from tab
filter (IATA_CODE_Reporting_Airline | in ["AA", "DL"])
take 5
Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | Tail_Number | Flight_Number_Reporting_Airline | OriginAirportID | OriginAirportSeqID | OriginCityMarketID | Origin | OriginCityName | OriginState | OriginStateFips | OriginStateName | OriginWac | DestAirportID | DestAirportSeqID | DestCityMarketID | Dest | DestCityName | DestState | DestStateFips | DestStateName | DestWac | CRSDepTime | DepTime | DepDelay | DepDelayMinutes | DepDel15 | DepartureDelayGroups | DepTimeBlk | TaxiOut | WheelsOff | WheelsOn | TaxiIn | CRSArrTime | ArrTime | ArrDelay | ArrDelayMinutes | ArrDel15 | ArrivalDelayGroups | ArrTimeBlk | Cancelled | CancellationCode | Diverted | CRSElapsedTime | ActualElapsedTime | AirTime | Flights | Distance | DistanceGroup | CarrierDelay | WeatherDelay | NASDelay | SecurityDelay | LateAircraftDelay | FirstDepTime | TotalAddGTime | LongestAddGTime | DivAirportLandings | DivReachedDest | DivActualElapsedTime | DivArrDelay | DivDistance | Div1Airport | Div1AirportID | Div1AirportSeqID | Div1WheelsOn | Div1TotalGTime | Div1LongestGTime | Div1WheelsOff | Div1TailNum | Div2Airport | Div2AirportID | Div2AirportSeqID | Div2WheelsOn | Div2TotalGTime | Div2LongestGTime | Div2WheelsOff | Div2TailNum | Div3Airport | Div3AirportID | Div3AirportSeqID | Div3WheelsOn | Div3TotalGTime | Div3LongestGTime | Div3WheelsOff | Div3TailNum | Div4Airport | Div4AirportID | Div4AirportSeqID | Div4WheelsOn | Div4TotalGTime | Div4LongestGTime | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum | column109 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N315DN | 1581 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 12953 | 1295304 | 31703 | LGA | New York, NY | NY | 36 | New York | 22 | 1126 | NA | NA | NA | NA | 1100-1159 | NA | NA | 1419 | NA | NA | NA | NA | 1400-1459 | 1 | A | 0 | 173 | NA | NA | 1 | 1076 | 5 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | |||||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N545US | 1582 | 10397 | 1039707 | 30397 | ATL | Atlanta, GA | GA | 13 | Georgia | 34 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 1631 | 1627 | -4 | 0 | 0 | -1 | 1600-1659 | 15 | 1642 | 1815 | 5 | 1821 | 1820 | -1 | 0 | 0 | -1 | 1800-1859 | 0 | 0 | 110 | 113 | 93 | 1 | 581 | 3 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N545US | 1582 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 10397 | 1039707 | 30397 | ATL | Atlanta, GA | GA | 13 | Georgia | 34 | 1931 | 1929 | -2 | 0 | 0 | -1 | 1900-1959 | 13 | 1942 | 2105 | 10 | 2127 | 2115 | -12 | 0 | 0 | -1 | 2100-2159 | 0 | 0 | 116 | 106 | 83 | 1 | 581 | 3 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N345NB | 1583 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 14492 | 1449202 | 34492 | RDU | Raleigh/Durham, NC | NC | 37 | North Carolina | 36 | 1024 | 1019 | -5 | 0 | 0 | -1 | 1000-1059 | 17 | 1036 | 1209 | 3 | 1227 | 1212 | -15 | 0 | 0 | -1 | 1200-1259 | 0 | 0 | 123 | 113 | 93 | 1 | 680 | 3 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N978AT | 1584 | 10397 | 1039707 | 30397 | ATL | Atlanta, GA | GA | 13 | Georgia | 34 | 12448 | 1244807 | 32448 | JAN | Jackson/Vicksburg, MS | MS | 28 | Mississippi | 53 | 1117 | 1113 | -4 | 0 | 0 | -1 | 1100-1159 | 14 | 1127 | 1127 | 4 | 1142 | 1131 | -11 | 0 | 0 | -1 | 1100-1159 | 0 | 0 | 85 | 78 | 60 | 1 | 341 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
FROM tab
WHERE IATA_CODE_Reporting_Airline IN ('AA', 'DL')
LIMIT 5
Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | Tail_Number | Flight_Number_Reporting_Airline | OriginAirportID | OriginAirportSeqID | OriginCityMarketID | Origin | OriginCityName | OriginState | OriginStateFips | OriginStateName | OriginWac | DestAirportID | DestAirportSeqID | DestCityMarketID | Dest | DestCityName | DestState | DestStateFips | DestStateName | DestWac | CRSDepTime | DepTime | DepDelay | DepDelayMinutes | DepDel15 | DepartureDelayGroups | DepTimeBlk | TaxiOut | WheelsOff | WheelsOn | TaxiIn | CRSArrTime | ArrTime | ArrDelay | ArrDelayMinutes | ArrDel15 | ArrivalDelayGroups | ArrTimeBlk | Cancelled | CancellationCode | Diverted | CRSElapsedTime | ActualElapsedTime | AirTime | Flights | Distance | DistanceGroup | CarrierDelay | WeatherDelay | NASDelay | SecurityDelay | LateAircraftDelay | FirstDepTime | TotalAddGTime | LongestAddGTime | DivAirportLandings | DivReachedDest | DivActualElapsedTime | DivArrDelay | DivDistance | Div1Airport | Div1AirportID | Div1AirportSeqID | Div1WheelsOn | Div1TotalGTime | Div1LongestGTime | Div1WheelsOff | Div1TailNum | Div2Airport | Div2AirportID | Div2AirportSeqID | Div2WheelsOn | Div2TotalGTime | Div2LongestGTime | Div2WheelsOff | Div2TailNum | Div3Airport | Div3AirportID | Div3AirportSeqID | Div3WheelsOn | Div3TotalGTime | Div3LongestGTime | Div3WheelsOff | Div3TailNum | Div4Airport | Div4AirportID | Div4AirportSeqID | Div4WheelsOn | Div4TotalGTime | Div4LongestGTime | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum | column109 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N315DN | 1581 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 12953 | 1295304 | 31703 | LGA | New York, NY | NY | 36 | New York | 22 | 1126 | NA | NA | NA | NA | 1100-1159 | NA | NA | 1419 | NA | NA | NA | NA | 1400-1459 | 1 | A | 0 | 173 | NA | NA | 1 | 1076 | 5 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | |||||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N545US | 1582 | 10397 | 1039707 | 30397 | ATL | Atlanta, GA | GA | 13 | Georgia | 34 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 1631 | 1627 | -4 | 0 | 0 | -1 | 1600-1659 | 15 | 1642 | 1815 | 5 | 1821 | 1820 | -1 | 0 | 0 | -1 | 1800-1859 | 0 | 0 | 110 | 113 | 93 | 1 | 581 | 3 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N545US | 1582 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 10397 | 1039707 | 30397 | ATL | Atlanta, GA | GA | 13 | Georgia | 34 | 1931 | 1929 | -2 | 0 | 0 | -1 | 1900-1959 | 13 | 1942 | 2105 | 10 | 2127 | 2115 | -12 | 0 | 0 | -1 | 2100-2159 | 0 | 0 | 116 | 106 | 83 | 1 | 581 | 3 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N345NB | 1583 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 14492 | 1449202 | 34492 | RDU | Raleigh/Durham, NC | NC | 37 | North Carolina | 36 | 1024 | 1019 | -5 | 0 | 0 | -1 | 1000-1059 | 17 | 1036 | 1209 | 3 | 1227 | 1212 | -15 | 0 | 0 | -1 | 1200-1259 | 0 | 0 | 123 | 113 | 93 | 1 | 680 | 3 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N978AT | 1584 | 10397 | 1039707 | 30397 | ATL | Atlanta, GA | GA | 13 | Georgia | 34 | 12448 | 1244807 | 32448 | JAN | Jackson/Vicksburg, MS | MS | 28 | Mississippi | 53 | 1117 | 1113 | -4 | 0 | 0 | -1 | 1100-1159 | 14 | 1127 | 1127 | 4 | 1142 | 1131 | -11 | 0 | 0 | -1 | 1100-1159 | 0 | 0 | 85 | 78 | 60 | 1 | 341 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
df |>
filter(IATA_CODE_Reporting_Airline %in% c("AA", "DL")) |>
head(5)
Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | Tail_Number | Flight_Number_Reporting_Airline | OriginAirportID | OriginAirportSeqID | OriginCityMarketID | Origin | OriginCityName | OriginState | OriginStateFips | OriginStateName | OriginWac | DestAirportID | DestAirportSeqID | DestCityMarketID | Dest | DestCityName | DestState | DestStateFips | DestStateName | DestWac | CRSDepTime | DepTime | DepDelay | DepDelayMinutes | DepDel15 | DepartureDelayGroups | DepTimeBlk | TaxiOut | WheelsOff | WheelsOn | TaxiIn | CRSArrTime | ArrTime | ArrDelay | ArrDelayMinutes | ArrDel15 | ArrivalDelayGroups | ArrTimeBlk | Cancelled | CancellationCode | Diverted | CRSElapsedTime | ActualElapsedTime | AirTime | Flights | Distance | DistanceGroup | CarrierDelay | WeatherDelay | NASDelay | SecurityDelay | LateAircraftDelay | FirstDepTime | TotalAddGTime | LongestAddGTime | DivAirportLandings | DivReachedDest | DivActualElapsedTime | DivArrDelay | DivDistance | Div1Airport | Div1AirportID | Div1AirportSeqID | Div1WheelsOn | Div1TotalGTime | Div1LongestGTime | Div1WheelsOff | Div1TailNum | Div2Airport | Div2AirportID | Div2AirportSeqID | Div2WheelsOn | Div2TotalGTime | Div2LongestGTime | Div2WheelsOff | Div2TailNum | Div3Airport | Div3AirportID | Div3AirportSeqID | Div3WheelsOn | Div3TotalGTime | Div3LongestGTime | Div3WheelsOff | Div3TailNum | Div4Airport | Div4AirportID | Div4AirportSeqID | Div4WheelsOn | Div4TotalGTime | Div4LongestGTime | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum | column109 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N315DN | 1581 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 12953 | 1295304 | 31703 | LGA | New York, NY | NY | 36 | New York | 22 | 1126 | NA | NA | NA | NA | 1100-1159 | NA | NA | 1419 | NA | NA | NA | NA | 1400-1459 | 1 | A | 0 | 173 | NA | NA | 1 | 1076 | 5 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | |||||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N545US | 1582 | 10397 | 1039707 | 30397 | ATL | Atlanta, GA | GA | 13 | Georgia | 34 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 1631 | 1627 | -4 | 0 | 0 | -1 | 1600-1659 | 15 | 1642 | 1815 | 5 | 1821 | 1820 | -1 | 0 | 0 | -1 | 1800-1859 | 0 | 0 | 110 | 113 | 93 | 1 | 581 | 3 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N545US | 1582 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 10397 | 1039707 | 30397 | ATL | Atlanta, GA | GA | 13 | Georgia | 34 | 1931 | 1929 | -2 | 0 | 0 | -1 | 1900-1959 | 13 | 1942 | 2105 | 10 | 2127 | 2115 | -12 | 0 | 0 | -1 | 2100-2159 | 0 | 0 | 116 | 106 | 83 | 1 | 581 | 3 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N345NB | 1583 | 11697 | 1169706 | 32467 | FLL | Fort Lauderdale, FL | FL | 12 | Florida | 33 | 14492 | 1449202 | 34492 | RDU | Raleigh/Durham, NC | NC | 37 | North Carolina | 36 | 1024 | 1019 | -5 | 0 | 0 | -1 | 1000-1059 | 17 | 1036 | 1209 | 3 | 1227 | 1212 | -15 | 0 | 0 | -1 | 1200-1259 | 0 | 0 | 123 | 113 | 93 | 1 | 680 | 3 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | DL | 19790 | DL | N978AT | 1584 | 10397 | 1039707 | 30397 | ATL | Atlanta, GA | GA | 13 | Georgia | 34 | 12448 | 1244807 | 32448 | JAN | Jackson/Vicksburg, MS | MS | 28 | Mississippi | 53 | 1117 | 1113 | -4 | 0 | 0 | -1 | 1100-1159 | 14 | 1127 | 1127 | 4 | 1142 | 1131 | -11 | 0 | 0 | -1 | 1100-1159 | 0 | 0 | 85 | 78 | 60 | 1 | 341 | 2 | NA | NA | NA | NA | NA | NA | NA | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
lf.filter(pl.col("IATA_CODE_Reporting_Airline").is_in(["AA", "DL"])).head(5).collect()
Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | Tail_Number | Flight_Number_Reporting_Airline | OriginAirportID | OriginAirportSeqID | OriginCityMarketID | Origin | OriginCityName | OriginState | OriginStateFips | OriginStateName | OriginWac | DestAirportID | DestAirportSeqID | DestCityMarketID | Dest | DestCityName | DestState | DestStateFips | DestStateName | DestWac | CRSDepTime | DepTime | DepDelay | DepDelayMinutes | DepDel15 | DepartureDelayGroups | DepTimeBlk | TaxiOut | … | Div1TotalGTime | Div1LongestGTime | Div1WheelsOff | Div1TailNum | Div2Airport | Div2AirportID | Div2AirportSeqID | Div2WheelsOn | Div2TotalGTime | Div2LongestGTime | Div2WheelsOff | Div2TailNum | Div3Airport | Div3AirportID | Div3AirportSeqID | Div3WheelsOn | Div3TotalGTime | Div3LongestGTime | Div3WheelsOff | Div3TailNum | Div4Airport | Div4AirportID | Div4AirportSeqID | Div4WheelsOn | Div4TotalGTime | Div4LongestGTime | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum | column109 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
i64 | i64 | i64 | i64 | i64 | date | str | i64 | str | str | i64 | i64 | i64 | i64 | str | str | str | str | str | i64 | i64 | i64 | i64 | str | str | str | str | str | i64 | str | str | f64 | f64 | f64 | i64 | str | f64 | … | f64 | f64 | str | str | str | i64 | i64 | str | f64 | f64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str |
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | "DL" | 19790 | "DL" | "N315DN" | 1581 | 11697 | 1169706 | 32467 | "FLL" | "Fort Lauderdale, FL" | "FL" | "12" | "Florida" | 33 | 12953 | 1295304 | 31703 | "LGA" | "New York, NY" | "NY" | "36" | "New York" | 22 | "1126" | "" | null | null | null | null | "1100-1159" | null | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | "DL" | 19790 | "DL" | "N545US" | 1582 | 10397 | 1039707 | 30397 | "ATL" | "Atlanta, GA" | "GA" | "13" | "Georgia" | 34 | 11697 | 1169706 | 32467 | "FLL" | "Fort Lauderdale, FL" | "FL" | "12" | "Florida" | 33 | "1631" | "1627" | -4.0 | 0.0 | 0.0 | -1 | "1600-1659" | 15.0 | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | "DL" | 19790 | "DL" | "N545US" | 1582 | 11697 | 1169706 | 32467 | "FLL" | "Fort Lauderdale, FL" | "FL" | "12" | "Florida" | 33 | 10397 | 1039707 | 30397 | "ATL" | "Atlanta, GA" | "GA" | "13" | "Georgia" | 34 | "1931" | "1929" | -2.0 | 0.0 | 0.0 | -1 | "1900-1959" | 13.0 | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | "DL" | 19790 | "DL" | "N345NB" | 1583 | 11697 | 1169706 | 32467 | "FLL" | "Fort Lauderdale, FL" | "FL" | "12" | "Florida" | 33 | 14492 | 1449202 | 34492 | "RDU" | "Raleigh/Durham, NC" | "NC" | "37" | "North Carolina" | 36 | "1024" | "1019" | -5.0 | 0.0 | 0.0 | -1 | "1000-1059" | 17.0 | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
2022 | 1 | 1 | 6 | 4 | 2022-01-06 | "DL" | 19790 | "DL" | "N978AT" | 1584 | 10397 | 1039707 | 30397 | "ATL" | "Atlanta, GA" | "GA" | "13" | "Georgia" | 34 | 12448 | 1244807 | 32448 | "JAN" | "Jackson/Vicksburg, MS" | "MS" | "28" | "Mississippi" | 53 | "1117" | "1113" | -4.0 | 0.0 | 0.0 | -1 | "1100-1159" | 14.0 | … | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | "" | null | null | "" | null | null | "" | "" | null |
Assigning
- PRQL DuckDB
- SQL DuckDB
- dplyr R
- Python Polars
let f = from [
{a = 1, b = 10},
{a = 2, b = 20},
{a = 3, b = 30},
{a = 4, b = 40},
{a = 5, b = 50},
]
from f
select {
a, # Note: can't use `derive` here https://github.com/PRQL/prql/issues/3130
b = case [
a <= 3 => b // 10,
true => b
]
}
a | b |
---|---|
1 | 1 |
2 | 2 |
3 | 3 |
4 | 40 |
5 | 50 |
WITH table_0 AS (
SELECT
1 AS a,
10 AS b
UNION
ALL
SELECT
2 AS a,
20 AS b
UNION
ALL
SELECT
3 AS a,
30 AS b
UNION
ALL
SELECT
4 AS a,
40 AS b
UNION
ALL
SELECT
5 AS a,
50 AS b
),
f AS (
SELECT
a,
b
FROM
table_0 AS table_1
)
SELECT
a,
CASE
WHEN a <= 3 THEN b // 10
ELSE b
END AS b
FROM
f
a | b |
---|---|
1 | 1 |
2 | 2 |
3 | 3 |
4 | 40 |
5 | 50 |
f <- dplyr::tibble(a = 1:5, b = a * 10)
f |>
mutate(
b = case_when(
a <= 3 ~ b %/% 10,
.default = b
)
)
a | b |
---|---|
1 | 1 |
2 | 2 |
3 | 3 |
4 | 40 |
5 | 50 |
f = pl.DataFrame({"a": list(range(1, 6)), "b": list(range(10, 60, 10))})
f.with_columns(
b=pl.when(pl.col("a") <= 3).then(pl.col("b") // 10).otherwise(pl.col("b"))
)
a | b |
---|---|
i64 | i64 |
1 | 1 |
2 | 2 |
3 | 3 |
4 | 40 |
5 | 50 |