
Fix tombstone ingestion with schematization #700

Merged: 22 commits merged into master from rcheng-tbschema on Sep 20, 2023

Conversation

@sfc-gh-rcheng (Collaborator) commented on Sep 11, 2023:

  • Enables schematization tombstone ingestion
  • Adds comment about a bug in ignoring snowpipe tombstone records

Test changes

  • Adds ITs for the snowpipe, streaming, and schematization ingestion methods
  • Alters the existing JSON e2e tests to ingest tombstone records by default (the governing config keys are sketched after this list)
  • Adds new e2e tests that ignore tombstone records
  • Special-cases Kafka version 2.5.1 due to a bug with tombstone records; this will need a note in the documentation
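For reference, the connector's tombstone handling is governed by its null-value behavior setting together with the value converter. A minimal sketch of the relevant config keys, following the config snippet shown later in this review (the behavior.on.null.values key mirrors the SnowflakeSinkConnectorConfig.BehaviorOnNullValues enum referenced in the diffs below; treat the exact key name as an assumption):

"behavior.on.null.values": "DEFAULT",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",

With DEFAULT, tombstones are ingested as rows whose value columns are NULL; IGNORE drops them, which is what the new ignore-tombstone e2e tests exercise.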

@@ -152,7 +152,7 @@ static Map<String, String> getColumnTypes(SinkRecord record, List<String> column
 private static Map<String, String> getSchemaMapFromRecord(SinkRecord record) {
   Map<String, String> schemaMap = new HashMap<>();
   Schema schema = record.valueSchema();
-  if (schema != null) {
+  if (schema != null && schema.fields() != null) {
sfc-gh-rcheng (Author) commented on this diff:

Added an additional null check to ensure we don't NPE in the next for loop.
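For context, a minimal sketch of the guarded method (the loop body is an assumption based on the surrounding code; inferSnowflakeType is a hypothetical stand-in for the connector's real type-mapping helper):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.sink.SinkRecord;

private static Map<String, String> getSchemaMapFromRecord(SinkRecord record) {
  Map<String, String> schemaMap = new HashMap<>();
  Schema schema = record.valueSchema();
  // Both checks are needed: a tombstone can carry a value schema whose
  // fields() is null, which would NPE in the for loop below.
  if (schema != null && schema.fields() != null) {
    for (Field field : schema.fields()) {
      // inferSnowflakeType is hypothetical; it maps a Connect field schema
      // to a Snowflake column type.
      schemaMap.put(field.name(), inferSnowflakeType(field.schema()));
    }
  }
  return schemaMap;
}
```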


+  // return empty if tombstone record
+  if (node.size() == 0
+      && this.behaviorOnNullValues == SnowflakeSinkConnectorConfig.BehaviorOnNullValues.DEFAULT) {
sfc-gh-rcheng (Author) commented on this diff:

This is the crux of the schematization tombstone change.
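For context, a minimal sketch stitching this hunk together with the getMapFromJsonNodeForStreamingIngest hunk shown later in this review (everything outside the diffed lines is paraphrased, not verbatim source):

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import java.util.HashMap;
import java.util.Map;

private Map<String, Object> getMapFromJsonNodeForStreamingIngest(JsonNode node)
    throws JsonProcessingException {
  final Map<String, Object> streamingIngestRow = new HashMap<>();

  // return empty if tombstone record: under BehaviorOnNullValues.DEFAULT,
  // returning an empty map lets streaming ingest write the row with only
  // RECORD_METADATA populated, instead of running schema evolution on
  // zero fields.
  if (node.size() == 0
      && this.behaviorOnNullValues == SnowflakeSinkConnectorConfig.BehaviorOnNullValues.DEFAULT) {
    return streamingIngestRow;
  }

  // ... field-by-field conversion into streamingIngestRow (unchanged) ...
  return streamingIngestRow;
}
```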

"snowflake.database.name":"SNOWFLAKE_DATABASE",
"snowflake.schema.name":"SNOWFLAKE_SCHEMA",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
sfc-gh-rcheng (Author) commented on this diff:

Using the community JsonConverter instead of the Snowflake custom converter due to this bug.
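To illustrate why the community converter works here: a Kafka tombstone has a null payload, and JsonConverter maps that to a null Connect value, which the sink's null-value handling can then recognize. A minimal sketch (topic name is illustrative):

```java
import java.util.Map;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.json.JsonConverter;

public class TombstoneConverterSketch {
  public static void main(String[] args) {
    JsonConverter converter = new JsonConverter();
    // Configure as a value converter with plain JSON (no embedded schemas).
    converter.configure(Map.of("schemas.enable", "false"), false);

    // A tombstone's payload is null; JsonConverter yields a null Connect value.
    SchemaAndValue out = converter.toConnectData("test_topic", null);
    System.out.println(out.value()); // prints: null
  }
}
```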

@sfc-gh-rcheng changed the title from "Rcheng tbschema" to "Fix tombstone ingestion with schematization" on Sep 14, 2023
@sfc-gh-rcheng marked this pull request as ready for review on September 14, 2023 at 22:22
@@ -31,10 +31,19 @@ def send(self):
 print("Sending in Partition:" + str(p))
 key = []
 value = []
-for e in range(self.recordNum):
+for e in range(self.recordNum - 1):
Reviewer (Collaborator) commented on this diff:

Why did we change this?

sfc-gh-rcheng (Author):

We now send the tombstone record as the last record. The schema evolution tests rely on exactly recordNum rows being sent (otherwise they throw NonRetryableError). Additionally, a majority of the tests use row count as a buffer threshold, so sending the tombstone record as an additional row would increase test runtime.

Reviewer (Collaborator):

I see, thanks. Why do we have to make changes in the normal snowpipe streaming tests? We could make this change only in tombstone-specific tests, right?

sfc-gh-rcheng (Author):

Yes, I could add a tombstone-specific test, but I didn't want to extend the e2e runtime. Plus, tombstone ingestion is enabled by default, so I think it makes sense to cover it in the normal e2e tests.

Happy to copy this into a new test if you prefer.

Reviewer (Collaborator):

Oh I see, thanks for the explanation; I missed that it is enabled by default. Please add a comment wherever possible. Thanks.

sfc-gh-rcheng (Author):

Good point, it is confusing; I've added comments to the tests.

@sfc-gh-japatel (Collaborator) commented:

Can you add a description of what bug is fixed?

@sfc-gh-rcheng (Author) commented:

> Can you add a description of what bug is fixed?

Updated the description. The bug being fixed is that tombstone records were not ingested when schematization is enabled.

@sfc-gh-japatel (Collaborator) left a review comment:

LGTM!
Left a suggestion about adding a comment on the tombstone behavior, since it won't be obvious from reading the test code why we expect one less record. Thanks for adding all those tests; good stuff.
Let's also get an approval from @sfc-gh-tzhang related to schematization.

@sfc-gh-tzhang (Contributor) left a review comment:

LGTM, thanks!

@@ -275,6 +275,13 @@ public Map<String, Object> getProcessedRecordForStreamingIngest(SinkRecord recor
 private Map<String, Object> getMapFromJsonNodeForStreamingIngest(JsonNode node)
     throws JsonProcessingException {
   final Map<String, Object> streamingIngestRow = new HashMap<>();
+
+  // return empty if tombstone record
sfc-gh-tzhang (Contributor) commented on this diff:

You've probably tested this, but will all columns be NULL for that row?

sfc-gh-rcheng (Author):

Yup, tombstone records only have a RECORD_METADATA column; the rest are all NULL. The key is in the RECORD_METADATA.

[screenshot: tombstone row with populated RECORD_METADATA and NULL value columns]
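To make the expectation concrete: after JSON conversion a tombstone value arrives as an empty object node, which is exactly the condition the new check keys off. A minimal sketch (assuming the empty-ObjectNode representation described above):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class TombstoneNodeSketch {
  public static void main(String[] args) {
    // A tombstone's value deserializes to an empty JSON object node, so
    // node.size() == 0 triggers the early return and the ingested row
    // carries only RECORD_METADATA (which holds the Kafka key).
    ObjectNode node = new ObjectMapper().createObjectNode();
    System.out.println(node.size() == 0); // prints: true
  }
}
```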

@sfc-gh-tzhang (Contributor) commented:

> Adds comment about a bug in ignoring snowpipe tombstone records

I think we want to fix this as well.

@sfc-gh-rcheng (Author) commented:

> > Adds comment about a bug in ignoring snowpipe tombstone records
>
> I think we want to fix this as well

I'll do this in a follow-up PR; I want to minimize changes to existing tombstone behavior here.

@sfc-gh-rcheng merged commit f0e2db6 into master on Sep 20, 2023
30 checks passed
@sfc-gh-rcheng deleted the rcheng-tbschema branch on September 20, 2023 at 22:08
khsoneji pushed a commit to confluentinc/snowflake-kafka-connector that referenced this pull request on Oct 12, 2023
EduardHantig pushed a commit to streamkap-com/snowflake-kafka-connector that referenced this pull request Feb 1, 2024