I am using Trino as middleware to pull data out of my backend database, roughly like this: db <-> trino <-> client (trino_cli or Python client).
Reading data through Trino is significantly slower than reading it from the backend database directly.
Trino delivers only about 10-20 MB/s, while my database can serve 100 MB/s per connection.
I don't think Trino should be the bottleneck in the data pipeline, because that slows down everything downstream of it.
Possible solution
Is there a configuration option or a debugging method that would let me find the underlying bottleneck and work out how to fix it?
I used the steps below to confirm that Trino is what limits the data stream.
The Trino memory connector helped with the diagnosis.
-- Step 1: copy data from my database into the Trino memory connector. This exercises the read path from the backend database to the Trino nodes.
CREATE TABLE memory.default.sf100_lineitem AS SELECT * FROM xdb.default.sf100_lineitem LIMIT 10000000;
-- Observed throughput: about 100 MB/s (more precisely, 80-150 MB/s).
-- Step 2: read the data back out of Trino to the client, using the Trino CLI.
SELECT * FROM memory.default.sf100_lineitem;
-- Observed throughput: only about 10 MB/s. My network is faster than 10 Gb/s, so the network is not the limiting factor.
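To repeat the step 2 measurement from the Python client instead of the CLI, something like the sketch below might help. It is only a rough harness, not a definitive benchmark: `measure_fetch_throughput` and `run_against_trino` are hypothetical helper names, the host, port and user are placeholders, and the byte count is an estimate based on the string length of each value. It assumes the `trino` Python package is installed. The Python client's own decoding cost is included in the timing, so a low number here does not by itself prove the server is the bottleneck.

```python
import time


def measure_fetch_throughput(cursor, sql, batch_size=10_000):
    """Run `sql` on a DB-API cursor and return (rows, approx_bytes, seconds).

    approx_bytes is a rough estimate: the summed string length of every value.
    """
    start = time.perf_counter()
    cursor.execute(sql)
    rows = 0
    approx_bytes = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        rows += len(batch)
        approx_bytes += sum(len(str(value)) for row in batch for value in row)
    elapsed = time.perf_counter() - start
    return rows, approx_bytes, elapsed


def run_against_trino(host="localhost", port=8080, user="test"):
    """Measure the step 2 query through the Trino Python client.

    Assumption: host, port and user are placeholders for a real cluster.
    """
    import trino  # imported lazily so the helper above works without it

    conn = trino.dbapi.connect(
        host=host, port=port, user=user, catalog="memory", schema="default"
    )
    rows, approx_bytes, elapsed = measure_fetch_throughput(
        conn.cursor(), "SELECT * FROM memory.default.sf100_lineitem"
    )
    mb_per_s = approx_bytes / 1e6 / elapsed if elapsed > 0 else float("nan")
    print(f"{rows} rows, ~{approx_bytes / 1e6:.1f} MB in {elapsed:.1f} s (~{mb_per_s:.1f} MB/s)")
```

Calling `run_against_trino()` against the cluster and comparing the result with the CLI number would show whether both clients hit the same ceiling. The same `measure_fetch_throughput` helper can be pointed at a cursor for the backend database to get a comparable direct-read figure.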
Additional context
No response
Environment
No response
Would you like to work on fixing this bug?
yes
Affected version
No response