JSON spark reader plan for 24.12 #17138
Labels
cuIO: cuIO issue
improvement: Improvement / enhancement to an existing function
Spark: Functionality that helps Spark RAPIDS
These are the planned optimizations and bug fixes for the Spark JSON reader in the 24.12 release.
Memory optimization PR: JSON tokenizer memory optimizations #16978
Runtime mitigation issue: Multi-stage FST implementation (Elias, Shruti): [FEA] Faster path for calculating total output symbols in FST #17114
Input schema issues/PRs (New Feature):
  read_json should output all-nulls columns for the schema columns that do not exist in the input #17091
  read_json need to follow depth-first-search order as in the input schema #17090
  cudf::read_json #17002
  cudf::io::read_json does not verify output column structures with the input schema #16799
Performance: Preprocessing: nullify empty lines PR: add option to nullify empty lines #17028
Bugfix: last invalid JSON is not reported as an error: [BUG] cudf::read_json incorrectly parses invalid JSON string #16999
Bugfix: disable array of arrays for Spark: disable array of arrays for recovery with null #17030
Performance: mega kernel: [FEA] Implement merged 'mega' kernel to parse leaf-level columns in JSON reader #16965
[FEA] Improve GpuJsonToStructs performance NVIDIA/spark-rapids#11560 (input schema, and post-processing move columns without copying)
from_json_to_structs NVIDIA/spark-rapids-jni#2510
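
To make the input schema items (#17091, #17090) concrete, here is a minimal pure-Python reference model of the behavior they ask for. It is only a sketch, not cudf code. The function names `project` and `read_json_lines`, and the representation of a schema as a nested dict, are hypothetical and chosen for illustration. It assumes JSON Lines input with no blank lines, and it assumes that a missing or non-object value for a struct column becomes a null rather than a struct of nulls, which the issues above do not specify.

```python
import json


def project(value, schema):
    """Project one parsed JSON value onto a schema (hypothetical helper).

    schema is None for a leaf column, or a dict mapping field names to
    sub-schemas for a struct column. Dict order is the schema order.
    """
    if schema is None:
        return value  # leaf: keep the value as parsed (None if missing)
    if not isinstance(value, dict):
        return None  # assumption: missing or non-object struct becomes null
    # Walk the fields in schema order, depth first, ignoring extra input keys.
    return {name: project(value.get(name), sub) for name, sub in schema.items()}


def read_json_lines(text, schema):
    """Return {column name: list of row values}, ordered as in the schema."""
    columns = {name: [] for name in schema}
    for line in text.strip().splitlines():  # assumes no blank lines inside
        row = json.loads(line)
        projected = project(row if isinstance(row, dict) else {}, schema)
        for name in schema:
            columns[name].append(projected[name])
    return columns


schema = {"a": None, "b": {"y": None, "x": None}, "c": None}
text = '{"b": {"x": 1}, "a": 2, "extra": 9}\n{"a": 3}\n'
columns = read_json_lines(text, schema)
```

In this model, column `c` never appears in the input but is still produced as an all-null column, the key `extra` is dropped because it is not in the schema, and both the top-level columns and the children of `b` come out in schema order rather than input order.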