pg_duckdb: Official Postgres extension for DuckDB

pg_duckdb is a Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. We recommend using pg_duckdb to build high performance analytics and data-intensive applications.

pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck.

Goals

Broad support for Postgres types and functions within analytical context
Query external datasets on S3, GCP and Azure (Parquet, Iceberg and Delta Lake)
Join native Postgres and analytical data
Direct access to data stored in MotherDuck
Support installation of DuckDB extensions

Installation

Pre-built binaries and additional installation options are coming soon.

To build pg_duckdb, you need:

Postgres 16 or 17
Ubuntu 22.04 or MacOS
Standard set of build tools for building Postgres extensions
Build tools that are required to build DuckDB

To build and install, run:

make install

Next, load the pg_duckdb extension:

CREATE EXTENSION pg_duckdb;

IMPORTANT: Once loaded you can use DuckDB execution by running SET duckdb.execution TO true. This is opt-in to avoid breaking existing queries. To avoid doing that for every session, you can configure it for a certain user by doing ALTER USER my_analytics_user SET duckdb.execution TO true.

Features

SELECT queries executed by the DuckDB engine can directly read Postgres tables.
- Able to read data types that exist in both Postgres and DuckDB. The following data types are supported: numeric, character, binary, date/time, boolean, uuid, json, and arrays.
- If DuckDB cannot support the query for any reason, execution falls back to Postgres.
Read parquet and CSV files from object storage (AWS S3, Cloudflare R2, or Google GCS).
- SELECT n FROM read_parquet('s3://bucket/file.parquet') AS (n int)
- SELECT n FROM read_csv('s3://bucket/file.csv') AS (n int)
- You can pass globs and arrays to these functions, just like in DuckDB
Enable the DuckDB Iceberg extension using SELECT duckdb.install_extension('iceberg') and read Iceberg files with iceberg_scan.
Write a query — or an entire table — to parquet in object storage.
- COPY (SELECT foo, bar FROM baz) TO 's3://...'
- COPY table TO 's3://...'

Read and write to Parquet format in a single query

 COPY (
   SELECT count(*), name
   FROM read_parquet('s3://bucket/file.parquet') AS (name text)
   GROUP BY name
   ORDER BY count DESC
 ) TO 's3://bucket/results.parquet';

Query and JOIN data in object storage with Postgres tables, views, and materialized views.
Create indexes on Postgres tables to accelerate your DuckDB queries
Install DuckDB extensions using SELECT duckdb.install_extension('extension_name');
Toggle DuckDB execution on/off with a setting:
- SET duckdb.execution = true|false
Cache remote object locally for faster execution using SELECT duckdb.cache('path', 'type'); where
- 'path' is HTTPFS/S3/GCS/R2 remote object
- 'type' specify remote object type: 'parquet' or 'csv'

Getting Started

The best way to get started is to connect Postgres to a new or existing object storage bucket (AWS S3, Cloudflare R2, or Google GCS) with pg_duckdb. You can query data in Parquet, CSV, and Iceberg format using read_parquet, read_csv, and iceberg_scan respectively.

Add a credential to enable DuckDB's httpfs support.

-- Session Token is Optional
INSERT INTO duckdb.secrets
(type, id, secret, session_token, region)
VALUES ('S3', 'access_key_id', 'secret_access_key', 'session_token', 'us-east-1');

Copy data directly to your bucket - no ETL pipeline!

COPY (SELECT user_id, item_id, price, purchased_at FROM purchases)
TO 's3://your-bucket/purchases.parquet;

Perform analytics on your data.

SELECT SUM(price) AS total, item_id
FROM read_parquet('s3://your-bucket/purchases.parquet')
  AS (price float, item_id int)
GROUP BY item_id
ORDER BY total DESC
LIMIT 100;

Roadmap

Please see the project roadmap for upcoming planned tasks and features.

Connect with MotherDuck

pg_duckdb integration with MotherDuck will enable dual execution with Differential Storage.

Zero-copy snapshots and forks
Time travel
Data tiering
Improved concurrency and cacheability

Contributing

pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck. We look forward to their continued contributions and leadership.

Hydra is a Y Combinator-backed database company, focused on DuckDB-Powered Postgres for app developers.

Name		Name	Last commit message	Last commit date
Latest commit History 202 Commits
.github/workflows		.github/workflows
include/pgduckdb		include/pgduckdb
sql		sql
src		src
test		test
third_party		third_party
.clang-format		.clang-format
.clang-tidy		.clang-tidy
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
Makefile.global		Makefile.global
NOTICE		NOTICE
README.md		README.md
dev_requirements.txt		dev_requirements.txt
pg_duckdb.control		pg_duckdb.control
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pg_duckdb: Official Postgres extension for DuckDB

Goals

Installation

Features

Getting Started

Roadmap

Connect with MotherDuck

Contributing

Resources

About

Releases

Packages

Languages

License

hydradatabase/pg_duckdb

Folders and files

Latest commit

History

Repository files navigation

pg_duckdb: Official Postgres extension for DuckDB

Goals

Installation

Features

Getting Started

Roadmap

Connect with MotherDuck

Contributing

Resources

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages