Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polling efficiency #378

Merged

Conversation

Claudenw
Copy link
Contributor

@Claudenw Claudenw commented Dec 20, 2024

Fixes for KCON-26 - Backoff when no data available.
Fixes for KCON-28 - Improve poll method

Creates an AbstractSourceTask in commons to handle response to poll and backoff calculations as well as start, stop. Implementations need to implement an Iterator that poll will call to retrieve data.

Private classes Timer and Backoff are created in AbstractSourceTask and may be moved out at a later date if needed elsewhere.

Changes made to configurations to support configuration extraction in AbstractSourceTask.

Modifications to S3SourceTask to operate under AbstractSourceTask.

Additional tests added

@Claudenw Claudenw force-pushed the polling_efficiency branch 2 times, most recently from bb477b3 to b6f2d08 Compare December 27, 2024 15:07
@Claudenw Claudenw self-assigned this Dec 27, 2024
@@ -0,0 +1,439 @@
/*
* Copyright 2024 Aiven Oy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably 2025 now

@Claudenw Claudenw force-pushed the polling_efficiency branch from df11418 to af45037 Compare January 2, 2025 21:40
@Claudenw Claudenw force-pushed the polling_efficiency branch from af45037 to be775b0 Compare January 6, 2025 11:46
@Claudenw Claudenw requested a review from a team as a code owner January 7, 2025 13:44
throw exception;
}
} else {
// TODO validate that the iterator does not lose an S3Object. Add test to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3ObjectIterator is gone now, although we do have a test for rehydration now.

Copy link
Contributor

@aindriu-aiven aindriu-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor things, The rest looks good to me.

Copy link
Contributor

@aindriu-aiven aindriu-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thank you

Copy link
Contributor

@jjaakola-aiven jjaakola-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, on the abstract source task. Key question is about the defined poll API and returning null and periodically returning control to the framework.

}

@Override
public final List<SourceRecord> poll() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarification: https://kafka.apache.org/37/javadoc/org/apache/kafka/connect/source/SourceTask.html#poll--

This function does comply with the defined API and returns control from time to time to the caller. The question and difference to the defined API is that this does not return null, but returns empty list when there is no data available.

logger.error("Error during poll(): {}", e.getMessage(), e);
if (config.getErrorsTolerance() == ErrorsTolerance.NONE) {
logger.error("Stopping Task");
return null; // NOPMD must return null in this case.
Copy link
Contributor

@muralibasani muralibasani Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this
"Poll this source task for new records. If no data is currently available, this method should block but return control to the caller regularly (by returning null) in order for the task to transition to the PAUSED state if requested to do so.
The task will be stopped on a separate thread, and when that happens this method is expected to unblock, quickly finish up any remaining processing, and return.
Returns:
a list of source records"

returning null will not stop the task. It returns control and says no recs are available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change to rethrow the error.

Copy link
Contributor

@muralibasani muralibasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor thing clarify on poll / returning null.

Claudenw and others added 12 commits January 9, 2025 08:13
…tractSourceTask.java

Co-authored-by: Jarkko Jaakola <91882676+jjaakola-aiven@users.noreply.github.com>
…tractSourceTask.java

Co-authored-by: Jarkko Jaakola <91882676+jjaakola-aiven@users.noreply.github.com>
…tractSourceTask.java

Co-authored-by: Jarkko Jaakola <91882676+jjaakola-aiven@users.noreply.github.com>
…urce/S3SourceTask.java

Co-authored-by: Jarkko Jaakola <91882676+jjaakola-aiven@users.noreply.github.com>
…tractSourceTask.java

Co-authored-by: Jarkko Jaakola <91882676+jjaakola-aiven@users.noreply.github.com>
…urce/S3SourceTask.java

Co-authored-by: Jarkko Jaakola <91882676+jjaakola-aiven@users.noreply.github.com>
…urce/utils/SourceRecordIterator.java

Co-authored-by: Murali Basani <murali.basani@gmail.com>
Copy link
Contributor

@jjaakola-aiven jjaakola-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns addressed.

} catch (RuntimeException e) { // NOPMD must catch runtime here.
logger.error("Error during poll(): {}", e.getMessage(), e);
if (config.getErrorsTolerance() == ErrorsTolerance.NONE) {
logger.error("Stopping Task");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Claudenw Is this addressed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it now re-throws the exception.

Copy link
Contributor

@muralibasani muralibasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Claudenw for the changes

@aindriu-aiven aindriu-aiven merged commit c4604f4 into Aiven-Open:s3-source-release Jan 9, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants