pg_duckdb is a Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. We recommend using pg_duckdb to build high performance analytics and data-intensive applications.
pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck.
- Broad support for Postgres types and functions within analytical context
- Query external datasets on S3, GCP and Azure (Parquet, Iceberg and Delta Lake)
- Join native Postgres and analytical data
- Direct access to data stored in MotherDuck
- Support installation of DuckDB extensions
Pre-built binaries and additional installation options are coming soon.
To build pg_duckdb, you need:
- Postgres 16 or 17
- Ubuntu 22.04 or MacOS
- Standard set of build tools for building Postgres extensions
- Build tools that are required to build DuckDB
To build and install, run:
make install
Next, load the pg_duckdb extension:
CREATE EXTENSION pg_duckdb;
IMPORTANT: Once loaded you can use DuckDB execution by running SET duckdb.execution TO true
. This is opt-in to avoid breaking existing queries. To avoid doing that for every session, you can configure it for a certain user by doing ALTER USER my_analytics_user SET duckdb.execution TO true
.
-
SELECT
queries executed by the DuckDB engine can directly read Postgres tables.- Able to read data types that exist in both Postgres and DuckDB. The following data types are supported: numeric, character, binary, date/time, boolean, uuid, json, and arrays.
- If DuckDB cannot support the query for any reason, execution falls back to Postgres.
-
Read parquet and CSV files from object storage (AWS S3, Cloudflare R2, or Google GCS).
SELECT n FROM read_parquet('s3://bucket/file.parquet') AS (n int)
SELECT n FROM read_csv('s3://bucket/file.csv') AS (n int)
- You can pass globs and arrays to these functions, just like in DuckDB
-
Enable the DuckDB Iceberg extension using
SELECT duckdb.install_extension('iceberg')
and read Iceberg files withiceberg_scan
. -
Write a query — or an entire table — to parquet in object storage.
COPY (SELECT foo, bar FROM baz) TO 's3://...'
COPY table TO 's3://...'
-
Read and write to Parquet format in a single query
COPY ( SELECT count(*), name FROM read_parquet('s3://bucket/file.parquet') AS (name text) GROUP BY name ORDER BY count DESC ) TO 's3://bucket/results.parquet';
-
Query and
JOIN
data in object storage with Postgres tables, views, and materialized views. -
Create indexes on Postgres tables to accelerate your DuckDB queries
-
Install DuckDB extensions using
SELECT duckdb.install_extension('extension_name');
-
Toggle DuckDB execution on/off with a setting:
SET duckdb.execution = true|false
-
Cache remote object locally for faster execution using
SELECT duckdb.cache('path', 'type');
where- 'path' is HTTPFS/S3/GCS/R2 remote object
- 'type' specify remote object type: 'parquet' or 'csv'
The best way to get started is to connect Postgres to a new or existing object storage bucket (AWS S3, Cloudflare R2, or Google GCS) with pg_duckdb. You can query data in Parquet, CSV, and Iceberg format using read_parquet
, read_csv
, and iceberg_scan
respectively.
-
Add a credential to enable DuckDB's httpfs support.
-- Session Token is Optional INSERT INTO duckdb.secrets (type, id, secret, session_token, region) VALUES ('S3', 'access_key_id', 'secret_access_key', 'session_token', 'us-east-1');
-
Copy data directly to your bucket - no ETL pipeline!
COPY (SELECT user_id, item_id, price, purchased_at FROM purchases) TO 's3://your-bucket/purchases.parquet;
-
Perform analytics on your data.
SELECT SUM(price) AS total, item_id FROM read_parquet('s3://your-bucket/purchases.parquet') AS (price float, item_id int) GROUP BY item_id ORDER BY total DESC LIMIT 100;
Please see the project roadmap for upcoming planned tasks and features.
pg_duckdb integration with MotherDuck will enable dual execution with Differential Storage.
- Zero-copy snapshots and forks
- Time travel
- Data tiering
- Improved concurrency and cacheability
pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck. We look forward to their continued contributions and leadership.
Hydra is a Y Combinator-backed database company, focused on DuckDB-Powered Postgres for app developers.
MotherDuck is the cloud-based data warehouse that extends the power of DuckDB.
We welcome all contributions big and small:
- Vote on or suggest features for our roadmap.
- Open a PR.
- Submit a feature request or bug report.
- Please see the project roadmap for upcoming planned tasks and features.
- GitHub Issues for bugs and missing features
- Discord discussion with the DuckDB community
- See our docs for more info and limitations