From 6bb2aa81465a470779b5ba6f197dc5737299e51e Mon Sep 17 00:00:00 2001 From: Andrew Lamb Date: Mon, 9 Dec 2024 10:04:42 -0500 Subject: [PATCH 1/7] Documentat SQL dialect followed --- docs/source/user-guide/sql/dialect.md | 39 +++++++++++++++++++++++++++ docs/source/user-guide/sql/index.rst | 1 + 2 files changed, 40 insertions(+) create mode 100644 docs/source/user-guide/sql/dialect.md diff --git a/docs/source/user-guide/sql/dialect.md b/docs/source/user-guide/sql/dialect.md new file mode 100644 index 000000000000..18a7577ca6b6 --- /dev/null +++ b/docs/source/user-guide/sql/dialect.md @@ -0,0 +1,39 @@ + + +# SQL Dialect + +By default, DataFusion follows the [PostgreSQL SQL dialect]. +For Array/List functions and semantics, it follows the [DuckDB SQL dialect]. + +[DuckDB SQL dialect]: https://duckdb.org/docs/sql/functions/array +[PostgreSQL SQL dialect]: https://www.postgresql.org/docs/current/sql.html + + +## Rationale + +SQL Engines have a choice to either use an existing SQL dialect or define their +own. Using an existing dialect may not fit perfectly as it is hard to match +semantics exactly (need bug-for-bug compatibility), and is likely not what all +users want. However, it avoids the (very significant) effort of defining +semantics as well as documenting and teaching users about them. + +As DataFusion is highly customizable, systems built on DataFusion can and do +update functions and SQL syntax to model other systems, such as Spark or +MySQL. \ No newline at end of file diff --git a/docs/source/user-guide/sql/index.rst b/docs/source/user-guide/sql/index.rst index 0508fa12f0f3..a52b2e89e76c 100644 --- a/docs/source/user-guide/sql/index.rst +++ b/docs/source/user-guide/sql/index.rst @@ -21,6 +21,7 @@ SQL Reference .. 
toctree:: :maxdepth: 2 + dialect data_types select subqueries From 5af035c79ab90d5573bf77705a500f8dd0bc75d9 Mon Sep 17 00:00:00 2001 From: Andrew Lamb Date: Mon, 9 Dec 2024 10:04:58 -0500 Subject: [PATCH 2/7] prettier --- docs/source/user-guide/sql/dialect.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/source/user-guide/sql/dialect.md b/docs/source/user-guide/sql/dialect.md index 18a7577ca6b6..c91bace73358 100644 --- a/docs/source/user-guide/sql/dialect.md +++ b/docs/source/user-guide/sql/dialect.md @@ -19,12 +19,11 @@ # SQL Dialect -By default, DataFusion follows the [PostgreSQL SQL dialect]. +By default, DataFusion follows the [PostgreSQL SQL dialect]. For Array/List functions and semantics, it follows the [DuckDB SQL dialect]. -[DuckDB SQL dialect]: https://duckdb.org/docs/sql/functions/array -[PostgreSQL SQL dialect]: https://www.postgresql.org/docs/current/sql.html - +[duckdb sql dialect]: https://duckdb.org/docs/sql/functions/array +[postgresql sql dialect]: https://www.postgresql.org/docs/current/sql.html ## Rationale @@ -36,4 +35,4 @@ semantics as well as documenting and teaching users about them. As DataFusion is highly customizable, systems built on DataFusion can and do update functions and SQL syntax to model other systems, such as Spark or -MySQL. \ No newline at end of file +MySQL. From 49edb037129ad8bd3b58a0ca5182491ec709eb31 Mon Sep 17 00:00:00 2001 From: Oleks V Date: Mon, 9 Dec 2024 09:09:02 -0800 Subject: [PATCH 3/7] Apply suggestions from code review --- docs/source/user-guide/sql/dialect.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/user-guide/sql/dialect.md b/docs/source/user-guide/sql/dialect.md index c91bace73358..fd0a859a2cf7 100644 --- a/docs/source/user-guide/sql/dialect.md +++ b/docs/source/user-guide/sql/dialect.md @@ -19,7 +19,7 @@ # SQL Dialect -By default, DataFusion follows the [PostgreSQL SQL dialect]. 
+By default, Apache DataFusion follows the [PostgreSQL SQL dialect]. For Array/List functions and semantics, it follows the [DuckDB SQL dialect]. [duckdb sql dialect]: https://duckdb.org/docs/sql/functions/array @@ -33,6 +33,6 @@ semantics exactly (need bug-for-bug compatibility), and is likely not what all users want. However, it avoids the (very significant) effort of defining semantics as well as documenting and teaching users about them. -As DataFusion is highly customizable, systems built on DataFusion can and do -update functions and SQL syntax to model other systems, such as Spark or +As Apache DataFusion is highly customizable, systems built on DataFusion can and do +update functions and SQL syntax to model other systems, such as Apache Spark or MySQL. From b674a42a25a29935492f3fc58a1fbcd14c954e22 Mon Sep 17 00:00:00 2001 From: Andrew Lamb Date: Tue, 10 Dec 2024 07:15:19 -0500 Subject: [PATCH 4/7] try and clarify the postgres behavior refers to frontend --- docs/source/user-guide/sql/dialect.md | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/docs/source/user-guide/sql/dialect.md b/docs/source/user-guide/sql/dialect.md index fd0a859a2cf7..5a45152f1165 100644 --- a/docs/source/user-guide/sql/dialect.md +++ b/docs/source/user-guide/sql/dialect.md @@ -19,11 +19,28 @@ # SQL Dialect -By default, Apache DataFusion follows the [PostgreSQL SQL dialect]. -For Array/List functions and semantics, it follows the [DuckDB SQL dialect]. +The included SQL supported in Apache DataFusion mostly follows the [PostgreSQL +SQL dialect], including: + +- The sql parser and [SQL planner] +- Type checking, analyzer, and type coercions +- Semantics of functions bundled with DataFusion + +Notable exceptions: + +- Array/List functions and semantics follow the [DuckDB SQL dialect]. +- DataFusion's type system is based on the [Apache Arrow type system], and the mapping to PostgrSQL types is not always 1:1. 
+- DataFusion has its own syntax (dialect) for certain operations (like [`CREATE EXTERNAL TABLE`]) -[duckdb sql dialect]: https://duckdb.org/docs/sql/functions/array [postgresql sql dialect]: https://www.postgresql.org/docs/current/sql.html +[sql planner]: https://docs.rs/datafusion/latest/datafusion/sql/planner/struct.SqlToRel.html +[duckdb sql dialect]: https://duckdb.org/docs/sql/functions/array +[apache arrow type system]: https://arrow.apache.org/docs/format/Columnar.html#data-types +[`create external table`]: ddl.md#create-external-table + +As Apache DataFusion is designed to be fully customizable, systems built on +DataFusion can and do update functions, type rules, and SQL syntax to follow +other systems, such as Apache Spark or MySQL. ## Rationale @@ -33,6 +50,4 @@ semantics exactly (need bug-for-bug compatibility), and is likely not what all users want. However, it avoids the (very significant) effort of defining semantics as well as documenting and teaching users about them. -As Apache DataFusion is highly customizable, systems built on DataFusion can and do -update functions and SQL syntax to model other systems, such as Apache Spark or -MySQL. + From 8fd7f44aad8f43b39101603e0fc23d2db128597d Mon Sep 17 00:00:00 2001 From: Andrew Lamb Date: Tue, 10 Dec 2024 07:19:06 -0500 Subject: [PATCH 5/7] clarify how to change behavior --- docs/source/user-guide/sql/dialect.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/source/user-guide/sql/dialect.md b/docs/source/user-guide/sql/dialect.md index 5a45152f1165..7c7789981162 100644 --- a/docs/source/user-guide/sql/dialect.md +++ b/docs/source/user-guide/sql/dialect.md @@ -32,16 +32,18 @@ Notable exceptions: - DataFusion's type system is based on the [Apache Arrow type system], and the mapping to PostgrSQL types is not always 1:1. 
- DataFusion has its own syntax (dialect) for certain operations (like [`CREATE EXTERNAL TABLE`]) +As Apache DataFusion is designed to be fully customizable, systems built on +DataFusion can and do implement different SQL semantics. Using DataFusion's APs, +you can provide alternate function definitions, type rules, and/or SQL syntax +that matches other systems such as Apache Spark or MySQL or your own custom +semantics. + [postgresql sql dialect]: https://www.postgresql.org/docs/current/sql.html [sql planner]: https://docs.rs/datafusion/latest/datafusion/sql/planner/struct.SqlToRel.html [duckdb sql dialect]: https://duckdb.org/docs/sql/functions/array [apache arrow type system]: https://arrow.apache.org/docs/format/Columnar.html#data-types [`create external table`]: ddl.md#create-external-table -As Apache DataFusion is designed to be fully customizable, systems built on -DataFusion can and do update functions, type rules, and SQL syntax to follow -other systems, such as Apache Spark or MySQL. - ## Rationale SQL Engines have a choice to either use an existing SQL dialect or define their @@ -49,5 +51,3 @@ own. Using an existing dialect may not fit perfectly as it is hard to match semantics exactly (need bug-for-bug compatibility), and is likely not what all users want. However, it avoids the (very significant) effort of defining semantics as well as documenting and teaching users about them. 
- - From e9f5da148841ddbaa34c0f7d653d93f04b6b744e Mon Sep 17 00:00:00 2001 From: Andrew Lamb Date: Wed, 11 Dec 2024 07:39:58 -0500 Subject: [PATCH 6/7] Update docs/source/user-guide/sql/dialect.md Co-authored-by: Piotr Findeisen --- docs/source/user-guide/sql/dialect.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user-guide/sql/dialect.md b/docs/source/user-guide/sql/dialect.md index 7c7789981162..7fae7e1ca0b4 100644 --- a/docs/source/user-guide/sql/dialect.md +++ b/docs/source/user-guide/sql/dialect.md @@ -29,7 +29,7 @@ SQL dialect], including: Notable exceptions: - Array/List functions and semantics follow the [DuckDB SQL dialect]. -- DataFusion's type system is based on the [Apache Arrow type system], and the mapping to PostgrSQL types is not always 1:1. +- DataFusion's type system is based on the [Apache Arrow type system], and the mapping to PostgreSQL types is not always 1:1. - DataFusion has its own syntax (dialect) for certain operations (like [`CREATE EXTERNAL TABLE`]) As Apache DataFusion is designed to be fully customizable, systems built on From c6a83f3899ece6484b7ff26ee5b2d6d53295f578 Mon Sep 17 00:00:00 2001 From: Andrew Lamb Date: Wed, 11 Dec 2024 07:40:35 -0500 Subject: [PATCH 7/7] Apply suggestions from code review Co-authored-by: Piotr Findeisen --- docs/source/user-guide/sql/dialect.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/user-guide/sql/dialect.md b/docs/source/user-guide/sql/dialect.md index 7fae7e1ca0b4..b2c397bc344a 100644 --- a/docs/source/user-guide/sql/dialect.md +++ b/docs/source/user-guide/sql/dialect.md @@ -22,7 +22,7 @@ The included SQL supported in Apache DataFusion mostly follows the [PostgreSQL SQL dialect], including: -- The sql parser and [SQL planner] +- The SQL parser and [SQL planner] - Type checking, analyzer, and type coercions - Semantics of functions bundled with DataFusion @@ -33,7 +33,7 @@ Notable exceptions: - DataFusion has its own syntax 
(dialect) for certain operations (like [`CREATE EXTERNAL TABLE`]) As Apache DataFusion is designed to be fully customizable, systems built on -DataFusion can and do implement different SQL semantics. Using DataFusion's APs, +DataFusion can and do implement different SQL semantics. Using DataFusion's APIs, you can provide alternate function definitions, type rules, and/or SQL syntax that matches other systems such as Apache Spark or MySQL or your own custom semantics.
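
Review note: the behavior this page documents can be spot-checked from `datafusion-cli` with a few statements. This is a minimal sketch, not part of the patch. The table name `example` and its file path are hypothetical, and the last statement assumes the `datafusion.sql_parser.dialect` configuration option available in recent DataFusion releases.

```sql
-- Types are Arrow types rather than PostgreSQL types:
-- arrow_typeof reports the Arrow data type of an expression.
SELECT arrow_typeof(1);

-- Array/List functions follow DuckDB-style naming and semantics.
SELECT array_append(make_array(1, 2), 3);

-- DataFusion-specific DDL with no PostgreSQL equivalent.
-- 'example' and the path are placeholders for illustration only.
CREATE EXTERNAL TABLE example
STORED AS PARQUET
LOCATION '/path/to/example.parquet';

-- Assumed config option: switch the dialect the SQL parser accepts.
SET datafusion.sql_parser.dialect = 'PostgreSQL';
```

The parser setting affects only which syntax is accepted. Changing function definitions or type rules, as the final paragraph describes, is done through DataFusion's Rust APIs rather than through SQL.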