[CORE] Gluten should honor the spark configs as much as possible #8043
Comments
@marin-ma which of the following shuffle configurations are used by Gluten? Can you help fill in the missing ones? Feel free to correct anything. "Gluten Config" is the config as renamed by Gluten. Where a config has been renamed, we should either remove the Gluten config or set the Gluten config's default value to the Spark config's value.
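To illustrate the second option (defaulting the Gluten config to the Spark config's value), here is a minimal sketch in Python. The key names `spark.example.buffer.size` and `spark.gluten.example.buffer.size`, the built-in default, and the `resolve` helper are all hypothetical and only show the lookup order. They are not actual Spark or Gluten identifiers.

```python
from typing import Mapping

# Hypothetical keys for illustration only; not real Spark or Gluten config names.
SPARK_KEY = "spark.example.buffer.size"
GLUTEN_KEY = "spark.gluten.example.buffer.size"
BUILTIN_DEFAULT = "1m"


def resolve(conf: Mapping[str, str],
            gluten_key: str = GLUTEN_KEY,
            spark_key: str = SPARK_KEY,
            builtin_default: str = BUILTIN_DEFAULT) -> str:
    """Return the effective value for a renamed config.

    Lookup order: explicit Gluten key, then the Spark key it was
    renamed from, then the built-in default.
    """
    if gluten_key in conf:
        return conf[gluten_key]
    if spark_key in conf:
        return conf[spark_key]
    return builtin_default


if __name__ == "__main__":
    print(resolve({}))                                   # neither key set
    print(resolve({SPARK_KEY: "4m"}))                    # only the Spark key set
    print(resolve({SPARK_KEY: "4m", GLUTEN_KEY: "8m"}))  # both keys set
```

With this order, a user who only tunes the Spark key still gets the expected behavior, and the Gluten key remains available as an explicit override.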
@FelixYBW Thank you for raising this. I also noticed some configurations related to table writes. Please correct me if I am missing something.
The Parquet configuration options in the table options are also respected by Gluten, as follows:
Apart from the configurations above, no other Spark configuration related to table writes is respected by Gluten at the moment.
@FelixYBW Updated the table. PTAL. Thanks!
Thank you! Can you submit a PR to support the configs:
We may also need to document the mapping between Spark and Velox configs.
Looks like a long-tail task again. There are also some gaps in the data loading part: the configs for HDFS, Iceberg, Hudi and DeltaLake, as these are newly implemented in native C++.
Let's start from Spark 3.5's configurations here and go through the categories one by one.
I respect these two configs for spill in #8045. UnsafeSorterSpillReader receives a TempLocalBlockId, so it respects this config.
Spark controls compression here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala#L117 A shuffle block is a TempShuffleBlockId, so it should respect
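For readers following the block-ID discussion, here is a rough Python model of how Spark's `SerializerManager.shouldCompress` picks a compression config from the block ID type. This is a sketch based on my reading of the linked Scala source, not a copy of it. The table and the `should_compress` helper are illustrative, and the mapping and defaults should be checked against the Spark version in use.

```python
from typing import Mapping

# Block ID kind -> (Spark config key, default value), modelled on my reading of
# SerializerManager.shouldCompress. Verify against the Spark version in use.
COMPRESSION_CONF = {
    "ShuffleBlockId": ("spark.shuffle.compress", True),
    "BroadcastBlockId": ("spark.broadcast.compress", True),
    "RDDBlockId": ("spark.rdd.compress", False),
    "TempLocalBlockId": ("spark.shuffle.spill.compress", True),
    "TempShuffleBlockId": ("spark.shuffle.compress", True),
}


def should_compress(block_kind: str, conf: Mapping[str, str]) -> bool:
    """Decide whether a block of the given kind is compressed.

    Unknown block kinds are not compressed. Otherwise the matching
    config is read, falling back to its default when unset.
    """
    entry = COMPRESSION_CONF.get(block_kind)
    if entry is None:
        return False
    key, default = entry
    raw = conf.get(key)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


if __name__ == "__main__":
    conf = {"spark.shuffle.spill.compress": "false"}
    print(should_compress("TempLocalBlockId", conf))
    print(should_compress("TempShuffleBlockId", conf))
```

The point of the sketch is that the config consulted depends on the block ID type passed in, so a native writer that wants to match vanilla Spark needs to key its decision on the same distinction.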
Seems like the naming of
Yes, the name is incorrect.
Let's fix this. Only shuffle's spill uses it.
Description
During the analysis of spill in issue #8025, we noted that some issues are common to Gluten and vanilla Spark, like the spill read/write buffer size. Some configurations are not even documented in Spark, like spark.unsafe.sorter.spill.reader.buffer.size.
Furthermore, I noted that Gluten doesn't have a list of honored Spark configurations. We should create such a list in the documentation. @zhouyuan