SNOW-1161484: Use insertRows instead of insertRow for schematization #796

Closed · wants to merge 7 commits

Conversation

@sfc-gh-tzhang (Contributor) commented Mar 1, 2024

Overview

SNOW-1161484

  • Create the channel with SKIP_BATCH instead of CONTINUE, to avoid the case where KC crashes right after the good rows are added to the table and the bad rows never reach the DLQ
  • Update the schematization code to use insertRows instead of insertRow (see the sketch after this list)
    - Better performance
    - If KC is configured with a longer buffer flush time, everything is ingested as one batch, so the SDK's 1-second flush and 32 MB channel size limits won't apply
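
A minimal sketch of the resulting flow, based on the snowflake-ingest-java streaming API; the wrapper method, channel and table names, and offset-token handling are illustrative, not the connector's actual code:

import java.util.List;
import java.util.Map;
import net.snowflake.ingest.streaming.InsertValidationResponse;
import net.snowflake.ingest.streaming.OpenChannelRequest;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;

// Illustrative sketch: open a channel with SKIP_BATCH and ingest the whole
// buffer in a single insertRows call instead of one insertRow per record.
static InsertValidationResponse ingestBatch(
    SnowflakeStreamingIngestClient client, List<Map<String, Object>> rows, String offsetToken) {
  OpenChannelRequest request =
      OpenChannelRequest.builder("MY_CHANNEL") // illustrative channel name
          .setDBName("MY_DB")
          .setSchemaName("MY_SCHEMA")
          .setTableName("MY_TABLE")
          // SKIP_BATCH: if any row fails validation, the whole batch is
          // skipped, so no good rows are committed before the bad rows
          // have been routed to the DLQ.
          .setOnErrorOption(OpenChannelRequest.OnErrorOption.SKIP_BATCH)
          .build();
  SnowflakeStreamingIngestChannel channel = client.openChannel(request);
  InsertValidationResponse response = channel.insertRows(rows, offsetToken);
  if (response.hasErrors()) {
    // Rebuild the batch without the failed rows, report those rows to the
    // DLQ, and retry the remaining rows (see rebuildBufferWithoutErrorRows below).
  }
  return response;
}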

Pre-review checklist

  • This change should be part of a Behavior Change Release. See go/behavior-change.
  • This change has passed Merge gate tests
  • Snowpipe Changes
  • Snowpipe Streaming Changes
  • This change is TEST-ONLY
  • This change is README/Javadocs only
  • This change is protected by a config parameter <PARAMETER_NAME>, e.g. snowflake.ingestion.method.
    • Yes - Added end-to-end and unit tests.
    • No - Suggest why it is not param protected
  • Is this change protected by parameter <PARAMETER_NAME> on the server side?
    • The parameter/feature is not yet active in production (partial rollout or PrPr, see Changes for Unreleased Features and Fixes).
    • If there is an issue, it can be safely mitigated by turning the parameter off. This is also verified by a test (See go/ppp).

@sfc-gh-tzhang sfc-gh-tzhang marked this pull request as ready for review March 1, 2024 07:47
@@ -1059,7 +1059,7 @@ private SnowflakeStreamingIngestChannel openChannelForTable() {
         .setDBName(this.sfConnectorConfig.get(Utils.SF_DATABASE))
         .setSchemaName(this.sfConnectorConfig.get(Utils.SF_SCHEMA))
         .setTableName(this.tableName)
-        .setOnErrorOption(OpenChannelRequest.OnErrorOption.CONTINUE)
+        .setOnErrorOption(OpenChannelRequest.OnErrorOption.SKIP_BATCH)
Collaborator

This will be a BCR?

Contributor Author

I don't think so; there is no behavior difference as far as the customer is concerned.

@sfc-gh-japatel (Collaborator) commented Mar 7, 2024

Could you consider using the checklist in the PR? Let me know if you want to change something in the template.

Also, is SKIP_BATCH needed for schematization, or is it needed in general? If it's needed in general, I would like it to be separated from this PR.

@sfc-gh-xhuang (Collaborator)

I would merge this into the next release if it is ready, but it's up to you.

Comment on lines +588 to +602
  private StreamingBuffer rebuildBufferWithoutErrorRows(
      StreamingBuffer streamingBufferToInsert,
      List<InsertValidationResponse.InsertError> insertErrors) {
    StreamingBuffer buffer = new StreamingBuffer();
    // Single cursor over insertErrors, which arrive ordered by row index.
    int errorIdx = 0;
    for (long rowIdx = 0; rowIdx < streamingBufferToInsert.getNumOfRecords(); rowIdx++) {
      if (errorIdx < insertErrors.size() && rowIdx == insertErrors.get(errorIdx).getRowIndex()) {
        // Skip the row that failed validation; it is reported to the DLQ elsewhere.
        errorIdx++;
      } else {
        buffer.insert(streamingBufferToInsert.getSinkRecord(rowIdx));
      }
    }
    return buffer;
  }
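
A unit test for this method could look like the sketch below; it assumes test-level access to StreamingBuffer and rebuildBufferWithoutErrorRows, and the SinkRecord contents are irrelevant to the skip logic:

import java.util.Arrays;
import java.util.List;
import net.snowflake.ingest.streaming.InsertValidationResponse;
import org.apache.kafka.connect.sink.SinkRecord;
import org.junit.Assert;
import org.junit.Test;

@Test
public void rebuildBufferDropsOnlyErrorRows() {
  StreamingBuffer original = new StreamingBuffer();
  for (int i = 0; i < 5; i++) {
    // Minimal records; schemas and keys don't matter for the skip logic.
    original.insert(new SinkRecord("topic", 0, null, null, null, "value" + i, i));
  }
  // Rows 1 and 3 failed validation; errors are ordered by row index.
  List<InsertValidationResponse.InsertError> errors =
      Arrays.asList(
          new InsertValidationResponse.InsertError(null, 1),
          new InsertValidationResponse.InsertError(null, 3));

  StreamingBuffer rebuilt = rebuildBufferWithoutErrorRows(original, errors);

  // 5 original rows minus 2 error rows.
  Assert.assertEquals(3, rebuilt.getNumOfRecords());
}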

Collaborator

Will this result in the validation error you added recently?

@@ -576,6 +584,22 @@ InsertRowsResponse insertBufferedRecords(StreamingBuffer streamingBufferToInsert
    return response;
  }

  /** Builds a new buffer containing only the good rows from the original buffer */
  private StreamingBuffer rebuildBufferWithoutErrorRows(
Collaborator

Is there test coverage for this new method?

Assert.assertTrue(topicPartitionChannel.isPartitionBufferEmpty());
Assert.assertEquals(0, kafkaRecordErrorReporter.getReportedRecords().size());

// Do it again without any schematization error, and we should have row in DLQ
@sfc-gh-japatel (Collaborator) commented Mar 8, 2024

I am confused by the comment; do you mean we should not have a row in the DLQ if there is no schematization error?

Or do you mean to verify the DLQ results with something along these lines: Assert.assertEquals(>1, kafkaRecordErrorReporter.getReportedRecords().size());?
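
As written, the >1 form is pseudocode; a compilable JUnit version of that check would be something like the line below (asserting at least one reported record):

Assert.assertTrue(kafkaRecordErrorReporter.getReportedRecords().size() >= 1);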

@sfc-gh-japatel (Collaborator) left a comment

I am worried about this change not having enough tests; do you feel the existing tests are enough?

Also, I highly encourage filling out the PR checklist, at least for important changes. Thanks!

@sfc-gh-japatel (Collaborator)

> Could you consider using the checklist in the PR? Let me know if you want to change something in the template.
>
> Also, is SKIP_BATCH needed for schematization, or is it needed in general? If it's needed in general, I would like it to be separated from this PR.

Talked offline; we don't need to split the PRs. SKIP_BATCH is needed and there is no behavior change.

@sfc-gh-japatel (Collaborator) left a comment

Left a couple of comments; I don't have any more concerns and will approve in the next iteration! Thank you, this will lower schematization latency 🥇

@sfc-gh-xhuang (Collaborator)

@sfc-gh-rcheng @sfc-gh-tzhang this should be merged for 2.2.2 release

@sfc-gh-rcheng (Collaborator) left a comment

I'm also a little concerned about this big a behavior change going in without additional tests, but it looks like @sfc-gh-japatel discussed this with Toby offline.

@sfc-gh-xhuang (Collaborator) commented May 10, 2024

With the latest ingest-sdk changes merged, we should be able to update the ingest-sdk version and merge this PR?
#843

@sfc-gh-xhuang (Collaborator)

@sfc-gh-tzhang the ingest-sdk update to 2.1.1 has been merged. Can this be merged now too?

Comment on lines -682 to -690
        if (extraColNames == null && nonNullableColumns == null) {
          InsertValidationResponse.InsertError newInsertError =
              new InsertValidationResponse.InsertError(
                  insertError.getRowContent(), originalSinkRecordIdx);
          newInsertError.setException(insertError.getException());
          newInsertError.setExtraColNames(insertError.getExtraColNames());
          newInsertError.setMissingNotNullColNames(insertError.getMissingNotNullColNames());
          // Simply added to the final response if it's not schema related errors
          finalResponse.addError(insertError);
Contributor

@sfc-gh-tzhang why did you remove this part? IMO we still need it to properly add the rows to the rebuilt buffer and also send only the non-schema errors to the DLQ.
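
For context, a hedged reconstruction of the intent behind the removed branch; the surrounding loop, response, and finalResponse are assumed from the original file and are not shown in this diff:

for (InsertValidationResponse.InsertError insertError : response.getInsertErrors()) {
  List<String> extraColNames = insertError.getExtraColNames();
  List<String> nonNullableColumns = insertError.getMissingNotNullColNames();
  if (extraColNames == null && nonNullableColumns == null) {
    // Not schema-related: add it to the final response so the row is routed
    // to the DLQ instead of triggering schema evolution.
    finalResponse.addError(insertError);
  } else {
    // Schema-related: evolve the table schema and retry the row.
  }
}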

@sfc-gh-wtrefon (Contributor)

Closed in favor of #866 due to merge conflicts
