update arrow flight (#1697)
## Versions 

- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0

## Languages

- [x] Chinese
- [x] English

## Docs Checklist

- [ ] Checked by AI
- [ ] Test Cases Built
wangtianyi2004 authored Jan 3, 2025
1 parent c85765f commit f37963f
Showing 7 changed files with 30 additions and 63 deletions.
15 changes: 5 additions & 10 deletions docs/db-connect/arrow-flight-sql-connect.md
@@ -24,21 +24,16 @@ specific language governing permissions and limitations
 under the License.
 -->

-:::tip
-- since 2.1
-:::
+Since Doris 2.1, a high-speed data link based on the Arrow Flight SQL protocol has been implemented, allowing SQL queries to rapidly retrieve large volumes of data from Doris in multiple languages. Arrow Flight SQL also provides a universal JDBC driver, supporting seamless interaction with databases that also follow the Arrow Flight SQL protocol. In some scenarios, performance can improve by up to a hundred times compared to data transfer solutions using MySQL Client or JDBC/ODBC drivers.

-Doris implements high-speed data links based on the Arrow Flight SQL protocol, and supports multiple languages to use SQL to read large batches of data from Doris at high speed.
+## Implementation Principle

-## Usage
+In Doris, query results are organized in columnar format as Blocks. In versions prior to 2.1, data could be transferred to the target client via MySQL Client or JDBC/ODBC drivers, but this required deserializing row-based Bytes into columnar format. By building a high-speed data transfer link based on Arrow Flight SQL, if the target client also supports Arrow columnar format, the entire transfer process avoids serialization and deserialization operations, completely eliminating the time and performance overhead associated with them.

-To load large batches of data from Doris to other components, such as Python/Java/Spark/Flink, you can use ADBC/JDBC based on Arrow Flight SQL to replace the past JDBC/Pymysql/Pandas to obtain higher reading performance, which is often encountered in scenarios such as data science and data lake analysis.
+![Arrow_Flight_SQL](/images/db-connect/arrow-flight-sql/Arrow_Flight_SQL.png)

-Apache Arrow Flight SQL is a protocol developed by the Apache Arrow community to interact with database systems. It is used for ADBC clients to interact with databases that implement the Arrow Flight SQL protocol using the Arrow data format. It has the speed advantage of Arrow Flight and the ease of use of JDBC/ODBC.
+To install Apache Arrow, you can find detailed installation instructions in the official documentation [Apache Arrow](https://arrow.apache.org/install/). For more information on how Doris implements the Arrow Flight protocol, you can refer to [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).

-The motivation, design and implementation, performance test results, and more concepts about Arrow Flight and ADBC for Doris to support Arrow Flight SQL can be found at: [GitHub Issue](https://github.com/apache/doris/issues/25514). This document mainly introduces the use of Doris Arrow Flight SQL and some common problems.
-
-Install Apache Arrow You can find detailed installation tutorials in the official documentation ([Apache Arrow](https://arrow.apache.org/install/))

 ## Python Usage

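@@ -24,22 +24,16 @@ specific language governing permissions and limitations
 under the License.
 -->

-:::tip
-- since 2.1
-:::
+Since Doris 2.1, a high-speed data link has been implemented based on the Arrow Flight SQL protocol, allowing multiple languages to use SQL to read large batches of data from Doris at high speed. Arrow Flight SQL also provides a universal JDBC driver that supports seamless interaction with other databases that follow the Arrow Flight SQL protocol. In some scenarios, performance improves a hundredfold compared with data transfer via MySQL Client or JDBC/ODBC drivers.

-Doris implements a high-speed data link based on the Arrow Flight SQL protocol, allowing multiple languages to use SQL to read large batches of data from Doris at high speed.
+## Implementation Principle

-## Usage
+In Doris, query results are organized as Blocks in columnar format. In versions before 2.1, results could be transferred to the target client through MySQL Client or JDBC/ODBC drivers, which required the row-format Bytes to be deserialized back into columnar format. With a high-speed data transfer link built on Arrow Flight SQL, if the target client also supports the Arrow columnar format, the whole transfer avoids serialization and deserialization entirely, eliminating the time and performance overhead they incur.

-To load large batches of data from Doris into other components such as Python/Java/Spark/Flink, you can use ADBC/JDBC based on Arrow Flight SQL in place of the earlier JDBC/PyMySQL/Pandas to get higher read performance, a need that often arises in data science and data lake analysis.
+![Arrow_Flight_SQL](/images/db-connect/arrow-flight-sql/Arrow_Flight_SQL.png)

-Apache Arrow Flight SQL is a protocol developed by the Apache Arrow community for interacting with database systems. ADBC clients use it to interact, in the Arrow data format, with databases that implement the Arrow Flight SQL protocol. It combines the speed of Arrow Flight with the ease of use of JDBC/ODBC.
+To install Apache Arrow, see the official documentation [Apache Arrow](https://arrow.apache.org/install/) for a detailed installation tutorial. For more on how Doris implements the Arrow Flight protocol, see [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).

-The motivation, design and implementation, and performance test results of Doris's Arrow Flight SQL support, along with more concepts about Arrow Flight and ADBC, can be found in the [GitHub Issue](https://github.com/apache/doris/issues/25514). This document mainly covers how to use Doris Arrow Flight SQL and some common problems.
-
-To install Apache Arrow, see the official documentation (
-[Apache Arrow](https://arrow.apache.org/install/)) for a detailed installation tutorial.

 ## Python Usage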

15 changes: 5 additions & 10 deletions versioned_docs/version-2.1/db-connect/arrow-flight-sql-connect.md
@@ -24,21 +24,16 @@ specific language governing permissions and limitations
 under the License.
 -->

-:::tip
-- since 2.1
-:::
+Since Doris 2.1, a high-speed data link based on the Arrow Flight SQL protocol has been implemented, allowing SQL queries to rapidly retrieve large volumes of data from Doris in multiple languages. Arrow Flight SQL also provides a universal JDBC driver, supporting seamless interaction with databases that also follow the Arrow Flight SQL protocol. In some scenarios, performance can improve by up to a hundred times compared to data transfer solutions using MySQL Client or JDBC/ODBC drivers.

-Doris implements high-speed data links based on the Arrow Flight SQL protocol, and supports multiple languages to use SQL to read large batches of data from Doris at high speed.
+## Implementation Principle

-## Usage
+In Doris, query results are organized in columnar format as Blocks. In versions prior to 2.1, data could be transferred to the target client via MySQL Client or JDBC/ODBC drivers, but this required deserializing row-based Bytes into columnar format. By building a high-speed data transfer link based on Arrow Flight SQL, if the target client also supports Arrow columnar format, the entire transfer process avoids serialization and deserialization operations, completely eliminating the time and performance overhead associated with them.

-To load large batches of data from Doris to other components, such as Python/Java/Spark/Flink, you can use ADBC/JDBC based on Arrow Flight SQL to replace the past JDBC/Pymysql/Pandas to obtain higher reading performance, which is often encountered in scenarios such as data science and data lake analysis.
+![Arrow_Flight_SQL](/images/db-connect/arrow-flight-sql/Arrow_Flight_SQL.png)

-Apache Arrow Flight SQL is a protocol developed by the Apache Arrow community to interact with database systems. It is used for ADBC clients to interact with databases that implement the Arrow Flight SQL protocol using the Arrow data format. It has the speed advantage of Arrow Flight and the ease of use of JDBC/ODBC.
+To install Apache Arrow, you can find detailed installation instructions in the official documentation [Apache Arrow](https://arrow.apache.org/install/). For more information on how Doris implements the Arrow Flight protocol, you can refer to [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).

-The motivation, design and implementation, performance test results, and more concepts about Arrow Flight and ADBC for Doris to support Arrow Flight SQL can be found at: [GitHub Issue](https://github.com/apache/doris/issues/25514). This document mainly introduces the use of Doris Arrow Flight SQL and some common problems.
-
-Install Apache Arrow You can find detailed installation tutorials in the official documentation ([Apache Arrow](https://arrow.apache.org/install/))

 ## Python Usage

15 changes: 5 additions & 10 deletions versioned_docs/version-3.0/db-connect/arrow-flight-sql-connect.md
@@ -24,21 +24,16 @@ specific language governing permissions and limitations
 under the License.
 -->

-:::tip
-- since 2.1
-:::
+Since Doris 2.1, a high-speed data link based on the Arrow Flight SQL protocol has been implemented, allowing SQL queries to rapidly retrieve large volumes of data from Doris in multiple languages. Arrow Flight SQL also provides a universal JDBC driver, supporting seamless interaction with databases that also follow the Arrow Flight SQL protocol. In some scenarios, performance can improve by up to a hundred times compared to data transfer solutions using MySQL Client or JDBC/ODBC drivers.

-Doris implements high-speed data links based on the Arrow Flight SQL protocol, and supports multiple languages to use SQL to read large batches of data from Doris at high speed.
+## Implementation Principle

-## Usage
+In Doris, query results are organized in columnar format as Blocks. In versions prior to 2.1, data could be transferred to the target client via MySQL Client or JDBC/ODBC drivers, but this required deserializing row-based Bytes into columnar format. By building a high-speed data transfer link based on Arrow Flight SQL, if the target client also supports Arrow columnar format, the entire transfer process avoids serialization and deserialization operations, completely eliminating the time and performance overhead associated with them.

-To load large batches of data from Doris to other components, such as Python/Java/Spark/Flink, you can use ADBC/JDBC based on Arrow Flight SQL to replace the past JDBC/Pymysql/Pandas to obtain higher reading performance, which is often encountered in scenarios such as data science and data lake analysis.
+![Arrow_Flight_SQL](/images/db-connect/arrow-flight-sql/Arrow_Flight_SQL.png)

-Apache Arrow Flight SQL is a protocol developed by the Apache Arrow community to interact with database systems. It is used for ADBC clients to interact with databases that implement the Arrow Flight SQL protocol using the Arrow data format. It has the speed advantage of Arrow Flight and the ease of use of JDBC/ODBC.
+To install Apache Arrow, you can find detailed installation instructions in the official documentation [Apache Arrow](https://arrow.apache.org/install/). For more information on how Doris implements the Arrow Flight protocol, you can refer to [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).

-The motivation, design and implementation, performance test results, and more concepts about Arrow Flight and ADBC for Doris to support Arrow Flight SQL can be found at: [GitHub Issue](https://github.com/apache/doris/issues/25514). This document mainly introduces the use of Doris Arrow Flight SQL and some common problems.
-
-Install Apache Arrow You can find detailed installation tutorials in the official documentation ([Apache Arrow](https://arrow.apache.org/install/))

 ## Python Usage

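Reviewer note on the new "Implementation Principle" paragraph: the claim is that a row-based payload has to be regrouped into columns on the client, while a columnar payload can be read column by column. The sketch below is a toy illustration of that difference using only the Python standard library. The two-field layout (`id`, `score`) and all function names are hypothetical, and the byte layout is not the actual MySQL, Doris Block, or Arrow wire format.

```python
import struct

# Toy row layout (an assumption for illustration): one little-endian int32 id
# followed by one float64 score, packed without padding (12 bytes per row).
ROW = struct.Struct("<id")


def encode_rows(rows):
    """Row-oriented payload: the values of each row are stored together."""
    return b"".join(ROW.pack(rid, score) for rid, score in rows)


def decode_rows_to_columns(payload):
    """Rebuild columns from a row payload by visiting every row in turn."""
    ids, scores = [], []
    for rid, score in ROW.iter_unpack(payload):
        ids.append(rid)
        scores.append(score)
    return {"id": ids, "score": scores}


def encode_columns(columns):
    """Column-oriented payload: all ids first, then all scores."""
    n = len(columns["id"])
    return struct.pack(f"<{n}i", *columns["id"]) + struct.pack(f"<{n}d", *columns["score"])


def decode_columns(payload, n):
    """Read each column back as one contiguous run, with no per-row regrouping."""
    ids = struct.unpack_from(f"<{n}i", payload, 0)
    scores = struct.unpack_from(f"<{n}d", payload, 4 * n)
    return {"id": list(ids), "score": list(scores)}


rows = [(1, 9.5), (2, 7.25), (3, 8.0)]
columns = {"id": [1, 2, 3], "score": [9.5, 7.25, 8.0]}

via_rows = decode_rows_to_columns(encode_rows(rows))
via_columns = decode_columns(encode_columns(columns), len(rows))
print(via_rows == via_columns == columns)
```

Both paths yield the same columns. The row path touches every row to regroup values, which is the step the paragraph says Arrow Flight SQL avoids when the client also consumes Arrow columnar data.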
