Introduction
This document is an attempt to do some typical table data manipulation using PRQL and DuckDB.
PRQL is a in development modern language for data manipulation, and can be compiled to SQL. DuckDB is a modern in-memory SQL OLAP (very fast) database management system.
Motivation
CLI, JavaScript, Python, R, etc., we can combine PRQL compiler and DuckDB in various places to manipulate data with PRQL.
At the same time, new features are being added to DuckDB’s SQL on a constant basis, and the syntax is becoming more friendly.
So, I thought that by comparing PRQL and the latest DuckDB SQL (and other query libraries), we could clarify the missing features of the PRQL and highlight the latest features of DuckDB.
Content and Credit
The content of each data manipulation is based on the awesome Modern Polars book by Kevin Heavey. And furthermore its origin is Modern Pandas by Tom Augsburger.
Each query is side-by-side with that of the original Python Polars one and the equivalent operations in PRQL, DuckDB SQL, and dplyr (R).
Since this document focuses on the differences between the grammars of the different languages and does not go into detail about the processing details, I suggest that you also check out the original excellent artcles and book.
The versions we are currently using are as follows:
- PRQL: 0.13.0
- DuckDB: v1.1.0
Running the Code on IPython
As mentioned above, PRQL and DuckDB can run on a variety of languages. This section describes, as an example, the use of IPython for easy interactive execution of PRQL, SQL, and Polars.
Setup
This command installs the necessary Python packages.
python -m pip install pyprql polars[pyarrow]
Then, we can start ipython
.
Run load_ext
magic to activate PRQL magic (from
pyprql) and SQL magic (from
jupysql).
%load_ext pyprql.magic
%load_ext sql
Use SQL magic connects to DuckDB in-memory database. Note that PRQL magic shares the connection with SQL magic.
%sql duckdb://
To fix the SQL dialect used inside PRQL magic, we need to set the target
with config magic. By setting "sql.duckdb"
here, we can specify to the
PRQL compiler that it should generate optimized SQL for DuckDB.
%config PrqlMagic.target = "sql.duckdb"
All available targets can be found in the prqlc.get_targets
function.
import prqlc
print(prqlc.get_targets())
['sql.any', 'sql.ansi', 'sql.bigquery', 'sql.clickhouse', 'sql.duckdb', 'sql.generic', 'sql.glaredb', 'sql.mssql', 'sql.mysql', 'sql.postgres', 'sql.sqlite', 'sql.snowflake']
Combine DuckDB and Polars
By setting autopolars
config, the results of PRQL and SQL executions
can be converted to polars.DataFrame.
%config PrqlMagic.autopolars = True
%config SqlMagic.autopolars = True
Also, since DuckDB can execute queries against polars.DataFrame and polars.LazyFrame etc., these can be referenced directly from PRQL or SQL, as shown below.
import polars as pl
lf = pl.LazyFrame({"a": list(range(1, 6))})
With JupySQL 0.10.14 and DuckDB 1.1, the following setting is needed to scan all frames in Python.
%%sql
set python_scan_all_frames=true
- PRQL DuckDB
- SQL DuckDB
- Python Polars
%%prql
from lf
derive b = a * 5
take 3
a | b |
---|---|
i64 | i64 |
1 | 5 |
2 | 10 |
3 | 15 |
%%sql
SELECT
a,
a * 5 AS b
FROM lf
LIMIT 3
a | b |
---|---|
i64 | i64 |
1 | 5 |
2 | 10 |
3 | 15 |
lf.with_columns(b=pl.col("a") * 5).head(3).collect()
a | b |
---|---|
i64 | i64 |
1 | 5 |
2 | 10 |
3 | 15 |
License
This website’s content is licensed under the MIT license.