Commit
[SPARK-48898][SQL] Set nullability correctly in the Variant schema
### What changes were proposed in this pull request?

The `variantShreddingSchema` method converts a human-readable schema for Variant to one that's a valid shredding schema. According to the shredding schema in apache/parquet-format#461, each shredded field in an object should be a required group - i.e. a non-nullable struct. This PR fixes the `variantShreddingSchema` to mark that struct as non-nullable.

### Why are the changes needed?

If we use `variantShreddingSchema` to construct a schema for Parquet, the schema would be technically non-conformant with the spec by setting the group as optional. I don't think this should really matter to readers, but it would waste a bit of space in the Parquet file by adding an extra definition level.

### Does this PR introduce _any_ user-facing change?

No, this code is not used yet.

### How was this patch tested?

Added a test to do some minimal validation of the `variantShreddingSchema` function.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49118 from cashmand/SPARK-48898-nullability.

Authored-by: cashmand <david.cashman@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
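
The "extra definition level" mentioned above can be illustrated with a small standalone sketch. This is not Spark's `variantShreddingSchema` and does not use any Spark or Parquet API. The `Node` class, the `shredded_field` and `object_schema` helpers, and the field name `a` are hypothetical, introduced only for illustration. It assumes the usual Parquet rule that, in a schema without repeated fields, a leaf's maximum definition level equals the number of optional nodes on its path.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a Parquet schema node (not a real API)."""
    name: str
    optional: bool
    children: List["Node"] = field(default_factory=list)


def shredded_field(name: str, group_optional: bool) -> Node:
    # A shredded object field: a group holding optional `value` and `typed_value`.
    return Node(name, group_optional,
                [Node("value", True), Node("typed_value", True)])


def object_schema(group_optional: bool) -> Node:
    # An optional `typed_value` group with one shredded field named "a" (made up).
    return Node("typed_value", True, [shredded_field("a", group_optional)])


def max_definition_levels(node: Node, prefix: str = "", level: int = 0) -> Dict[str, int]:
    """Map each leaf path to its max definition level.

    With no repeated fields, this is the count of optional nodes on the path.
    """
    path = f"{prefix}.{node.name}" if prefix else node.name
    level += 1 if node.optional else 0
    if not node.children:
        return {path: level}
    result: Dict[str, int] = {}
    for child in node.children:
        result.update(max_definition_levels(child, path, level))
    return result


# Required group (what the spec asks for) vs. optional group (the old behavior).
conformant = max_definition_levels(object_schema(group_optional=False))
non_conformant = max_definition_levels(object_schema(group_optional=True))

for leaf in conformant:
    print(leaf, conformant[leaf], non_conformant[leaf])
```

In this toy model, marking the per-field group as optional raises the maximum definition level of every leaf beneath it by one, which is the small space overhead the description refers to.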