Fix array of objects serialization #727

cchandurkar · 2023-10-17T18:40:05Z

Currently, arrays of structs/objects are written as arrays of stringified JSON objects.

Here's a quick comparison:
Expected value in ARRAY column:

[
  null,
  {
    "xCategory": null,
    "xEntityId": null,
    "xEntityName": null,
    "xId": "89asda9s0a"
  }
]

Actual value:

[
  "null",
  "{\"xCategory\":null,\"xEntityId\":null,\"xId\":\"89asda9s0a\",\"xEntityName\":null}"
]

I think this happens because those objects/structs are serialized twice. A simple fix is to serialize the entire ArrayNode as is instead of building a List<String> from it. That way the value (a stringified array) is valid JSON that net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel.insertRow() accepts.
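To make the double serialization concrete, here is a minimal sketch in plain Java. It does not use the connector's code or Jackson: the class name, the method names, and the small hand-rolled toJson writer are hypothetical stand-ins chosen for illustration, and Java Lists and Maps stand in for ArrayNode and its elements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a tiny JSON writer stands in for a real JSON library.
final class ArraySerializationSketch {

  // Serializes null, String, Number, Boolean, Map and List values to JSON text.
  // Only quotes and backslashes are escaped, which is enough for this demo.
  static String toJson(Object value) {
    if (value == null) return "null";
    if (value instanceof String) {
      String s = (String) value;
      return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
    if (value instanceof Number || value instanceof Boolean) return value.toString();
    if (value instanceof Map) {
      List<String> fields = new ArrayList<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        fields.add(toJson(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
      }
      return "{" + String.join(",", fields) + "}";
    }
    if (value instanceof List) {
      List<String> items = new ArrayList<>();
      for (Object item : (List<?>) value) items.add(toJson(item));
      return "[" + String.join(",", items) + "]";
    }
    throw new IllegalArgumentException("unsupported type: " + value.getClass());
  }

  // Behaviour described as the bug: each element is turned into a String first,
  // so serializing the resulting list serializes every element a second time.
  static String serializePerElement(List<?> array) {
    List<String> stringified = new ArrayList<>();
    for (Object item : array) stringified.add(toJson(item));
    return toJson(stringified);
  }

  // Proposed behaviour: serialize the whole array exactly once.
  static String serializeWhole(List<?> array) {
    return toJson(array);
  }

  public static void main(String[] args) {
    Map<String, Object> struct = new LinkedHashMap<>();
    struct.put("xCategory", null);
    struct.put("xId", "89asda9s0a");
    List<Object> array = new ArrayList<>();
    array.add(null);
    array.add(struct);
    System.out.println(serializePerElement(array));
    System.out.println(serializeWhole(array));
  }
}
```

Under these assumptions, the first line printed is ["null","{\"xCategory\":null,\"xId\":\"89asda9s0a\"}"] and the second is [null,{"xCategory":null,"xId":"89asda9s0a"}], which mirrors the "Actual" and "Expected" shapes above.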

Issue: #724

sfc-gh-tzhang · 2023-10-18T20:46:26Z

src/main/java/com/snowflake/kafka/connector/records/RecordService.java

@@ -287,14 +287,7 @@ private Map<String, Object> getMapFromJsonNodeForStreamingIngest(JsonNode node)
      String columnName = columnNames.next();
      JsonNode columnNode = node.get(columnName);
      Object columnValue;
-      if (columnNode.isArray()) {


This is an interesting observation. I assume this was added for a reason, so we will evaluate whether the change breaks anything. Thanks!

cchandurkar · 2023-10-18

Looks like it was introduced in the PR that enabled schematization and hasn't been updated since.

sfc-gh-tzhang · 2023-10-20

OK, I tried some quick experiments and I think the change you suggested makes sense: it lets the array be ingested with its original data types (for example, [1,2] is ingested as [1,2] rather than ["1","2"]). But for JSON, I believe you would still need parse_json in order to use it, since it will be ingested as a string?

cchandurkar · 2023-10-21

If you serialize the entire array node, its JSON elements won't be stringified unless they were already stringified in the original data. They will appear as shown under "Expected value" in the description, and you won't need parse_json() because the array elements are variants.

…r Schematization (#730) Before this change, every element in the array was added as a STRING. This change preserves the original data type from the source: for example, when the input is [1, 2], the ingested value is now [1, 2] instead of ["1", "2"]. Forked from #727, with additional tests.

sfc-gh-xhuang · 2023-11-03T23:35:08Z

Closing as #730 has been merged

Fix array of objects serialization

d607e48

cchandurkar requested review from sfc-gh-japatel, sfc-gh-tzhang, sfc-gh-tjones, sfc-gh-rcheng and a team as code owners October 17, 2023 18:40

sfc-gh-tzhang reviewed Oct 18, 2023

sfc-gh-tzhang self-assigned this Oct 18, 2023

sfc-gh-tzhang mentioned this pull request Oct 20, 2023

NO-SNOW: Preserve the old data type that goes into an ARRAY column for Schematization #730

Merged

sfc-gh-xhuang closed this Nov 3, 2023
