Add example of converting RecordBatches to JSON objects #5364
arrow-json/src/writer.rs
//! let a = Int32Array::from(vec![1, 2, 3]);
//! let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)]).unwrap();
//!
//! let json_rows: Vec<Map<String, Value>> = todo!("How do we do this?");
@tustvold can you help / point me at code that does what you are thinking of so I can update the example?
I couldn't immediately see how to apply the suggestion you are making
You can "parse" a serialized JSON string into a `RawValue`; this allows embedding it into existing serde flows without paying additional decoding overhead. There is no way to obtain a `Value` other than to parse the serialized JSON string; this is expected. If this is insufficient for people's use cases, I would suggest they file a ticket with their requirements.
Got it -- I will try to update the example to show reparsing the string into a JSON `Value`, with a note about performance.
Given I am still very confused about how the `RawValue` API fits in here (perhaps because, as you hint, there is no clear use case), I am going to remove the mention of it from the docs to avoid confusion.
I wonder if people potentially were using the `serde_json` values as an intermediate representation to map RecordBatches to their own data structures via serde 🤔
Maybe we can point them to the https://crates.io/crates/serde_arrow crate for that use case 🤔
Say you have a larger JSON document you want to embed the arrow data into: you could parse into `RawValue` in order to embed it. That's the major use case I can think of.
> I wonder if people potentially were using the `serde_json` values as an intermediate representation to map RecordBatches to their own data structures via serde
I guess we shall find out 😅
I made a PR to serde_arrow with an example of how to use that crate to make arrow arrays out of rust structs: chmp/serde_arrow#131
So now I feel quite good about directing people there ❤️
//!
//! ```
//! # use std::sync::Arc;
//! # use arrow_array::{Int32Array, RecordBatch};
I am not sure how much value this example has, to be honest, other than to demonstrate feature parity with previous releases
I similarly am not immensely convinced of its utility
FWIW I did put this example at the end of the docs, so hopefully it is minimally confusing
Which issue does this PR close?
Related to #5318
Rationale for this change
@tustvold deprecated `record_batches_to_json_rows` in #5318, but there are at least two of us (https://github.com/apache/arrow-rs/pull/5318/files#r1460432887) who are not quite sure how to use the existing or suggested APIs to achieve the same results. While converting from arrow --> JSON objects may not be ideal for certain use cases, I think it is a common request, so we shouldn't cause trouble for users who were relying on it.
Thus I think adding an example to show that #5318 doesn't regress functionality is warranted
What changes are included in this PR?
Add a doc example about how to convert `RecordBatch`es to `serde_json` objects
Are there any user-facing changes?
Better examples