[GLUTEN-8025][VL] Respect config kSpillReadBufferSize and add spill compression codec #8045

jinchengchenghh · 2024-11-26T02:50:13Z

Respect Spark config UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE as Velox kSpillReadBufferSize.
Add the config spark.gluten.sql.columnar.backend.velox.spillCompressionCodec. If it is not set, use Spark config spark.io.compression.codec instead.
Respect Spark spark.shuffle.spill.diskWriteBufferSize as Velox kSpillWriteBufferSize.
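
The codec fallback described above can be sketched as a lookup over a plain key/value map. This is only an illustration, not the code in this PR: the object and method names are hypothetical, and the final fallback to "lz4" is an assumption based on Spark's default for spark.io.compression.codec.

```scala
object SpillCodecSketch {
  // Key names are taken from the PR description.
  val GlutenSpillCodecKey = "spark.gluten.sql.columnar.backend.velox.spillCompressionCodec"
  val SparkIoCodecKey = "spark.io.compression.codec"

  // Assumption: fall back to "lz4", Spark's default for spark.io.compression.codec.
  val DefaultCodec = "lz4"

  /** Resolves the spill codec: Gluten key first, then the Spark key, then the default. */
  def resolveSpillCodec(conf: Map[String, String]): String =
    conf.get(GlutenSpillCodecKey)
      .orElse(conf.get(SparkIoCodecKey))
      .getOrElse(DefaultCodec)
}
```

With both keys set, the Gluten-specific key wins; with only spark.io.compression.codec set, its value is used for spill as well.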

github-actions · 2024-11-26T02:50:32Z

#8025

github-actions · 2024-11-26T02:50:45Z

Run Gluten Clickhouse CI on x86

FelixYBW · 2024-11-26T04:05:21Z

shims/common/src/main/scala/org/apache/gluten/GlutenConfig.scala

@@ -1585,13 +1590,6 @@ object GlutenConfig {
      .bytesConf(ByteUnit.BYTE)
      .createWithDefaultString("100G")

-  val COLUMNAR_VELOX_MAX_SPILL_WRITE_BUFFER_SIZE =


Let's put all configurations in GlutenConfig.scala, even if they are not documented in the Gluten docs.

Can you add spillreadbuffersize as well?

FelixYBW · 2024-11-26T04:10:53Z

shims/common/src/main/scala/org/apache/gluten/GlutenConfig.scala

@@ -732,7 +734,10 @@ object GlutenConfig {
        COLUMNAR_MEMORY_BACKTRACE_ALLOCATION.defaultValueString),
      (
        GLUTEN_COLUMNAR_TO_ROW_MEM_THRESHOLD.key,
-        GLUTEN_COLUMNAR_TO_ROW_MEM_THRESHOLD.defaultValue.get.toString)
+        GLUTEN_COLUMNAR_TO_ROW_MEM_THRESHOLD.defaultValue.get.toString),


Let's check whether spark.unsafe.sorter.spill.read.ahead.enabled is enabled. If not, disable the spill read-ahead buffer.

The FileInputStream implementation always reads some data ahead into a buffer, even if spark.unsafe.sorter.spill.read.ahead.enabled is disabled. We could add a new class, FileInputReader, that reads the file only as needed.

What happens if we set the buffer size to 0?

It does not check whether the config value is reasonable.
A buffer size of 0 may cause an infinite loop, because each read would fetch 0 bytes from the file.
https://github.com/facebookincubator/velox/blob/main/velox/common/file/FileInputStream.cpp#L207
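
To make the concern concrete, here is a simplified model of a fixed-size read loop. It is not the Velox implementation; `readsNeeded` is a hypothetical helper that only shows why a non-positive buffer size has to be rejected before the loop starts.

```scala
object ZeroBufferSketch {
  /**
   * Counts how many fixed-size reads are needed to consume `fileSize` bytes.
   * Returns None for a non-positive buffer size: each read would consume
   * 0 bytes, so an unguarded loop would never terminate.
   */
  def readsNeeded(fileSize: Long, bufferSize: Long): Option[Long] = {
    if (bufferSize <= 0) {
      None
    } else {
      var remaining = fileSize
      var reads = 0L
      while (remaining > 0) {
        // Each read consumes at most one buffer's worth of bytes.
        remaining -= math.min(bufferSize, remaining)
        reads += 1
      }
      Some(reads)
    }
  }
}
```

Without the guard on `bufferSize`, `remaining` would never decrease for a size of 0 and the loop would spin forever on any non-empty file.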

Then let's ignore it for now and open an enhancement issue for it.

Yes, added: facebookincubator/velox#11673.
It won't affect us for now, because the Spark config requires the value to be at least the default value (1 MiB):

private[spark] val UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE =
  ConfigBuilder("spark.unsafe.sorter.spill.reader.buffer.size")
    .internal()
    .version("2.1.0")
    .bytesConf(ByteUnit.BYTE)
    .checkValue(v => 1024 * 1024 <= v && v <= MAX_BUFFER_SIZE_BYTES,
      s"The value must be in allowed range [1,048,576, ${MAX_BUFFER_SIZE_BYTES}].")
    .createWithDefault(1024 * 1024)
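
The same range check can be sketched without Spark's ConfigBuilder, which may help if a similar guard is ever wanted on the Gluten side. The names below are hypothetical, and `MaxBufferSizeBytes` is a stand-in assumption: Spark defines the real MAX_BUFFER_SIZE_BYTES elsewhere.

```scala
object SpillReadBufferCheck {
  // Lower bound from the Spark config above: 1 MiB.
  val MinBufferSizeBytes: Long = 1024L * 1024L
  // Assumption: stand-in upper bound, not Spark's actual MAX_BUFFER_SIZE_BYTES.
  val MaxBufferSizeBytes: Long = Int.MaxValue.toLong

  /** Returns Right(v) when v is inside the allowed range, otherwise Left with a message. */
  def validate(v: Long): Either[String, Long] =
    if (MinBufferSizeBytes <= v && v <= MaxBufferSizeBytes) Right(v)
    else Left(s"The value must be in allowed range [$MinBufferSizeBytes, $MaxBufferSizeBytes].")
}
```

Under this check, a value of 0 is rejected up front, so the zero-byte read case cannot be reached through this config.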

I will add a FileReadStream to Velox to support disabling read-ahead.

FelixYBW · 2024-11-26T04:13:48Z

Some Spark tuning parameters related to spill, FYI:

https://mageswaran1989.medium.com/spark-optimizations-for-advanced-users-spark-cheat-sheet-d74464618c20

FelixYBW · 2024-12-13T01:32:45Z

@jinchengchenghh can you rebase? Let's merge it.

jinchengchenghh · 2024-12-13T08:27:28Z

Can you help approve and merge this PR? Thanks! @FelixYBW

github-actions bot added the CORE (works for Gluten Core) and VELOX labels Nov 26, 2024

FelixYBW mentioned this pull request Nov 26, 2024

[GLUTEN-8025][VL] set default kSpillWriteBufferSize to the same as velox #8026

Closed

FelixYBW requested changes Nov 26, 2024

jinchengchenghh force-pushed the spillread branch from 366c89c to f4397e2 Compare November 27, 2024 05:59

jinchengchenghh force-pushed the spillread branch from f4397e2 to 99e04f5 Compare November 27, 2024 05:59

jinchengchenghh mentioned this pull request Dec 11, 2024

[CORE] Gluten should honor the spark configs as much as possible #8043

Open

jinchengchenghh force-pushed the spillread branch from 313872d to 62490ac Compare December 13, 2024 01:44

respect config kSpillReadBufferSize and add spill compression codec

62490ac

FelixYBW approved these changes Dec 14, 2024

FelixYBW merged commit b1211a8 into apache:main Dec 14, 2024
48 checks passed
