duckdb
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
zhjwpku committed May 26, 2024
1 parent 28dea8a commit be8a70d
Showing 4 changed files with 37 additions and 1 deletion.
1 change: 1 addition & 0 deletions src/SUMMARY.md
@@ -23,6 +23,7 @@
- [lakehouse](./databases/olap/lakehouse.md)
- [delta lake](./databases/olap/delta-lake.md)
- [vertica](./databases/olap/vertica.md)
- [duckdb](./databases/olap/duckdb.md)
- [htap](./databases/htap/README.md)
- [greenplum](./databases/htap/greenplum-htap.md)
- [vector db](./databases/vectordb/README.md)
2 changes: 2 additions & 0 deletions src/databases/README.md
@@ -13,6 +13,7 @@
- **[Lakehouse: A New Generation of Open Platforms that Unify DataWarehousing and Advanced Analytics][lakehouse]**
- **[Delta Lake: HighPerformance ACID Table Storage over Cloud Object Stores][deltalake]**
- **[The Vertica Analytic Database: C-Store 7 Years Later][vertica]**
- **[Data Management for Data Science Towards Embedded Analytics][duckdb]**
- **[HTAP](htap/index.html)**
- **[Greenplum: A Hybrid Database for Transactional and Analytical Workloads][greenplum]**
- **[Vector DB](vectordb/index.html)**
@@ -55,3 +56,4 @@
[ivf-hnsw]: vectordb/ivf-hnsw.md
[wisckey]: kv/wisckey.md
[diskann]: vectordb/diskann.md
[duckdb]: olap/duckdb.md
4 changes: 3 additions & 1 deletion src/databases/olap/README.md
@@ -3,7 +3,9 @@
- **[Lakehouse: A New Generation of Open Platforms that Unify DataWarehousing and Advanced Analytics][lakehouse]**
- **[Delta Lake: HighPerformance ACID Table Storage over Cloud Object Stores][deltalake]**
- **[The Vertica Analytic Database: C-Store 7 Years Later][vertica]**
- **[Data Management for Data Science Towards Embedded Analytics][duckdb]**

[lakehouse]: lakehouse.md
[deltalake]: delta-lake.md
[vertica]: vertica.md
[vertica]: vertica.md
[duckdb]: duckdb.md
31 changes: 31 additions & 0 deletions src/databases/olap/duckdb.md
@@ -0,0 +1,31 @@
### [Data Management for Data Science Towards Embedded Analytics](https://duckdb.org/pdf/CIDR2020-raasveldt-muehleisen-duckdb.pdf)

> CIDR 2020
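This paper examines what data science needs from data management and proposes a new class of data management systems: embedded analytical systems.

The rise of data science has made data analysis tasks more complex. Data scientists typically run medium-scale analyses on their personal computers using scripting languages such as Python or R. Traditional RDBMSs do not meet their needs and are a poor fit for local data analysis use cases.

To fill this gap, an embedded analytical system has to address the following requirements: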

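- Combined OLAP & ETL Workloads: support OLAP workloads efficiently, along with bulk appends and updates
- Transfer Efficiency: the database and the application run in the same process and address space, and the system needs to exploit this to share data efficiently
- Resilience: an embedded database needs to detect hardware problems and prevent data corruption
- Cooperation: the system needs to adapt to resource contention and share hardware resources with the host application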
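
The Transfer Efficiency point can be illustrated with a minimal sketch in plain Python. It is not DuckDB's API: `scan_column` and `total` are hypothetical names, and `memoryview` stands in for whatever buffer-sharing mechanism a real embedded system uses. The only idea shown is that code living in the same address space can hand over a column buffer without copying it.

```python
from array import array


def scan_column(values: array) -> memoryview:
    """Hypothetical 'database' side: expose a column buffer without copying it."""
    return memoryview(values)


def total(view: memoryview) -> int:
    """Hypothetical 'application' side: consume the shared buffer directly."""
    return sum(view)


if __name__ == "__main__":
    column = array("q", range(1_000))  # 64-bit signed integers
    view = scan_column(column)

    # Writing through the original array is visible through the view,
    # which shows that both sides share one buffer rather than two copies.
    column[0] = 42
    print(view[0], total(view))
```

A client/server database would have to serialize the same column and send it over a socket, which is the overhead the paper argues an embedded system can avoid.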

**DuckDB**

DuckDB is a new relational database management system designed for embedded analytics.

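- OLAP & ETL: uses a vectorized interpreted execution engine, optimized for OLAP queries
- Resilience: DuckDB computes and stores checksums for blocks in persistent storage and verifies them on read to protect data integrity
- Cooperation: DuckDB lets users manually set hard limits on memory and CPU core usage, with an adaptive resource usage scheme planned
- Transfer Efficiency: DuckDB implements an efficient client API that lets the client application act directly as the root operator of the physical query plan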
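
The checksum scheme in the list above can be sketched as follows. This is an illustration under stated assumptions, not DuckDB's on-disk format: the 4-byte CRC32 header, the little-endian layout, and the names `write_block` and `read_block` are all made up for this example.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Return the stored form of a block: a 4-byte CRC32 followed by the payload."""
    checksum = zlib.crc32(payload)
    return checksum.to_bytes(4, "little") + payload


def read_block(stored: bytes) -> bytes:
    """Recompute the checksum on read and refuse to return corrupted data."""
    expected = int.from_bytes(stored[:4], "little")
    payload = stored[4:]
    if zlib.crc32(payload) != expected:
        raise IOError("block checksum mismatch: possible corruption")
    return payload


if __name__ == "__main__":
    stored = write_block(b"column data" * 10)
    assert read_block(stored) == b"column data" * 10

    # Flip one bit in the payload to simulate silent corruption on disk.
    corrupted = bytearray(stored)
    corrupted[10] ^= 0x01
    try:
        read_block(bytes(corrupted))
    except IOError as exc:
        print("detected:", exc)
```

Verifying on every read costs some CPU time, but it turns silent corruption into an explicit error before bad data reaches a query result.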

### Code

[DuckDB is an in-process SQL OLAP Database Management System](https://github.com/duckdb/duckdb)

### Further readings

- [DuckDB: an Embeddable Analytical Database](https://duckdb.org/pdf/SIGMOD2019-demo-duckdb.pdf)
