Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ScalarValue::to_array_of_size for DenseUnion #13797

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

kylebarron
Copy link
Contributor

Which issue does this PR close?

Closes #13762

Rationale for this change

Fix support for dense union.

What changes are included in this PR?

Fix ScalarValue::to_array_of_size for DenseUnion

Are these changes tested?

Yes. The test from #13777 is uncommented.

Are there any user-facing changes?

@kylebarron
Copy link
Contributor Author

I also validated in geoarrow/geoarrow-rs#951 that this is now working for storing geospatial geometries in a dense union!

Mirroring the PostGIS ST_GeomFromText:

SELECT ST_GeomFromText('LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932)');
---- udf::native::io::wkt::test::test stdout ----
+----------------------------------------------------------------------------------------------------+
| st_geomfromtext(Utf8("LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932)")) |
+----------------------------------------------------------------------------------------------------+
| {=[{x: -71.160281, y: 42.258729}, {x: -71.160837, y: 42.259113}, {x: -71.161144, y: 42.25932}]}    |
+----------------------------------------------------------------------------------------------------+

Under the hood this is stored as a rather complex dense union, but allows for any combination of geometry type and dimension with a unique DataType that's streaming-friendly.

kylebarron added a commit to geoarrow/geoarrow-rs that referenced this pull request Dec 16, 2024
Ref apache/datafusion#13797.

This is now working for geometries!

```
SELECT ST_GeomFromText('LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932)');
```

```
---- udf::native::io::wkt::test::test stdout ----
+----------------------------------------------------------------------------------------------------+
| st_geomfromtext(Utf8("LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932)")) |
+----------------------------------------------------------------------------------------------------+
| {=[{x: -71.160281, y: 42.258729}, {x: -71.160837, y: 42.259113}, {x: -71.161144, y: 42.25932}]}    |
+----------------------------------------------------------------------------------------------------+
```

🚀
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @kylebarron -- this looks great to me

new_null_array(dt, size)
match mode {
UnionMode::Sparse => new_null_array(dt, size),
// In a dense union, only the child with values needs to be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked the spec and I agree with this check: https://arrow.apache.org/docs/format/Columnar.html#dense-union

@@ -5642,14 +5650,12 @@ mod tests {
// null array
Arc::new(NullArray::new(3)),
// dense union
/* Dense union fails due to https://github.com/apache/datafusion/issues/13762
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@alamb alamb merged commit 57d1309 into apache:main Dec 17, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Schema error when returning DenseUnion from ScalarUDF
2 participants