-
Notifications
You must be signed in to change notification settings - Fork 300
-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support parallel scan in mito engine #2806
Comments
I'm going to reopen this issue as file-level parallelism is not enough if the number of parquet files is less than the parallelism. We still need to implement a more fine-grained parallel scan strategy, such as row group level parallel scan. However, parallel scanning row groups might not be able to maintain a sorted order of the SST file. As a result, we can implement this parallel strategy in append only tables that use I did some experiments on this branch. I also found that we must support multiple output partitioning to maximize computations. I also fixed an issue that the parquet reader might have unexpected pruning behavior as it always uses the file's schema to create the physical expression. Another potential optimization is to scan columns in parallel if the number of row groups is not enough like polar-rs. But this is not very necessary. I'll update this issue to track the implementation of parallel scanning row groups. |
What type of enhancement is this?
Performance
What does the enhancement do?
Currently, the Mito engine only supports single-threaded scanning. We can consider using parallel scanning to improve speed when dealing with larger amounts of data.
Implementation challenges
There are several approaches to parallel scanning:
We also need to figure out a way to control the parallelism of the query so spawning a task for each file might not be the best solution (We might need some experiments).
Steps
The text was updated successfully, but these errors were encountered: