[GLUTEN-8025][VL] Respect config kSpillReadBufferSize and add spill compression codec #8045
Conversation
@@ -1585,13 +1590,6 @@ object GlutenConfig {
      .bytesConf(ByteUnit.BYTE)
      .createWithDefaultString("100G")

  val COLUMNAR_VELOX_MAX_SPILL_WRITE_BUFFER_SIZE =
Let's put all configurations in GlutenConfig.scala even if they are not documented in the Gluten doc.
Can you add spillReadBufferSize as well?
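A minimal sketch of what such an entry in GlutenConfig.scala could look like, mirroring the write-buffer entry in the diff above and assuming GlutenConfig's buildConf helper; the key name, doc string, and default are assumptions, not necessarily what was merged:

  // Hypothetical sketch only: a spill read buffer size entry for GlutenConfig.scala.
  val COLUMNAR_VELOX_SPILL_READ_BUFFER_SIZE =
    buildConf("spark.gluten.sql.columnar.backend.velox.spillReadBufferSize")
      .internal()
      .doc("Buffer size used by Velox when reading back spilled data (kSpillReadBufferSize).")
      .bytesConf(ByteUnit.BYTE)
      .createWithDefaultString("1MB")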
@@ -732,7 +734,10 @@ object GlutenConfig {
        COLUMNAR_MEMORY_BACKTRACE_ALLOCATION.defaultValueString),
      (
        GLUTEN_COLUMNAR_TO_ROW_MEM_THRESHOLD.key,
-       GLUTEN_COLUMNAR_TO_ROW_MEM_THRESHOLD.defaultValue.get.toString)
+       GLUTEN_COLUMNAR_TO_ROW_MEM_THRESHOLD.defaultValue.get.toString),
Let's check whether spark.unsafe.sorter.spill.read.ahead.enabled is enabled or not. If not, close the spill read-ahead buffer.
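A minimal sketch of that check, assuming a SparkConf is available at the call site (the surrounding wiring is not from this PR):

import org.apache.spark.SparkConf

// Hedged sketch: Spark's read-ahead flag for spill readers. When it is disabled,
// the Velox spill read-ahead buffer could be closed or kept at its minimum size.
def spillReadAheadEnabled(conf: SparkConf): Boolean =
  conf.getBoolean("spark.unsafe.sorter.spill.read.ahead.enabled", defaultValue = true)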
The FileInputStream implementation decides that it must read some buffer ahead even if spark.unsafe.sorter.spill.read.ahead.enabled is disabled. We could add a new class FileInputReader to read the file only as needed.
What happens if we set the buffer size to 0?
It does not check whether the config value is reasonable.
It may cause an infinite loop, because it reads 0 bytes from the file each time.
https://github.com/facebookincubator/velox/blob/main/velox/common/file/FileInputStream.cpp#L207
Then let's ignore it for now and open an enhancement issue on it.
Yes, added: facebookincubator/velox#11673.
It won't affect us for now, because the Spark config requires the value to be at least the default value:
private[spark] val UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE =
ConfigBuilder("spark.unsafe.sorter.spill.reader.buffer.size")
.internal()
.version("2.1.0")
.bytesConf(ByteUnit.BYTE)
.checkValue(v => 1024 * 1024 <= v && v <= MAX_BUFFER_SIZE_BYTES,
s"The value must be in allowed range [1,048,576, ${MAX_BUFFER_SIZE_BYTES}].")
.createWithDefault(1024 * 1024)
I will add a FileReadStream to Velox to support disabling read-ahead.
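As a side note, a hedged sketch of how the Gluten side could still guard against a zero-sized buffer when forwarding the value to Velox as kSpillReadBufferSize; this helper is illustrative, not this PR's code:

import org.apache.spark.SparkConf

// Hedged sketch: clamp the Spark value so a 0-byte spill read buffer can never
// reach Velox FileInputStream, even if the Spark-side validation were bypassed.
def veloxSpillReadBufferSize(conf: SparkConf): Long = {
  val requested =
    conf.getSizeAsBytes("spark.unsafe.sorter.spill.reader.buffer.size", "1m")
  math.max(requested, 1024L * 1024L)
}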
Some Spark tuning parameters about spill, FYI.
@jinchengchenghh can you rebase? Let's merge it.
Can you help approve and merge this PR? Thanks! @FelixYBW
Respect Spark config UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE as Velox kSpillReadBufferSize.
Add the config spark.gluten.sql.columnar.backend.velox.spillCompressionCodec; if not set, use Spark config spark.io.compression.codec instead.
Respect Spark spark.shuffle.spill.diskWriteBufferSize as Velox kSpillWriteBufferSize.
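For reference, a hedged usage sketch of the settings described above; the codec value shown is an assumption and the accepted codecs depend on the Velox build:

import org.apache.spark.sql.SparkSession

// Hedged example: wiring the spill-related settings this change respects or adds.
val spark = SparkSession.builder()
  .appName("velox-spill-config-example")
  // Reused by Velox as kSpillReadBufferSize (Spark enforces at least 1 MiB).
  .config("spark.unsafe.sorter.spill.reader.buffer.size", "2m")
  // Reused by Velox as kSpillWriteBufferSize.
  .config("spark.shuffle.spill.diskWriteBufferSize", "1m")
  // New Gluten config; falls back to spark.io.compression.codec when unset.
  .config("spark.gluten.sql.columnar.backend.velox.spillCompressionCodec", "lz4")
  .getOrCreate()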