source-twilio: improve MessageMedia incremental sync speed #2182

Merged
Alex-Bair merged 1 commit into main from bair/source-twilio-message-media-fixes
Dec 5, 2024

@Alex-Bair Alex-Bair commented Dec 5, 2024

Description:

Previously, MessageMedia checked every message between the config's start date and the present for new media, then filtered out any media created before the last seen cursor value. As a result, incremental syncs took an extremely long time with no apparent progress: the stream could search through several years of messages when it usually only needs to search the past few minutes.

This change makes the MessageMedia stream only check messages created since the most recent cursor value, falling back to the config's start date if no cursor value is present. This significantly speeds up the connector during incremental syncs.
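
A minimal sketch of the fallback described above, for illustration only. `message_search_start` is a hypothetical helper, and the ISO 8601 string format for the cursor and start date is an assumption; the connector's actual state handling may differ.

```python
from datetime import datetime, timezone
from typing import Optional


def message_search_start(cursor: Optional[str], config_start_date: str) -> datetime:
    """Return the earliest message creation time that still needs checking.

    Hypothetical helper: prefers the most recent cursor value and falls
    back to the config's start date when no cursor has been saved yet.
    """
    raw = cursor if cursor else config_start_date
    # Assumed format: ISO 8601 with a trailing "Z" for UTC.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


# First sync (no cursor yet): search from the config's start date.
backfill_start = message_search_start(None, "2022-01-01T00:00:00Z")

# Later syncs: search only from the last seen cursor value.
incremental_start = message_search_start("2024-12-05T15:00:00Z", "2022-01-01T00:00:00Z")
```

With a saved cursor, the parent Messages search covers only the time since the last sync rather than everything since the start date.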

This change also increases the date window size used when fetching a message's media from 1 year to 100 years. This reduces the number of API requests needed when backfilling media records over a year old: instead of requesting a single year of media at a time, the connector essentially requests all of a message's media in one request. For example, instead of making two requests spanning Nov 2023 to Nov 2024 and Nov 2024 to Dec 2024, a single request is made for Nov 2023 to Dec 2024.
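
To illustrate why the larger window saves requests, here is a rough sketch of a sliding date window. `date_windows` is a hypothetical function and the day-based window sizes are assumptions; the connector's real slicing logic is not shown in this PR description.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def date_windows(
    start: datetime, end: datetime, window: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (lower, upper) windows covering start..end.

    Hypothetical sketch: each yielded window stands for one API request.
    """
    lower = start
    while lower < end:
        upper = min(lower + window, end)
        yield lower, upper
        lower = upper


message_created = datetime(2023, 11, 1)
now = datetime(2024, 12, 5)

# Assumed sizes: roughly 1 year and roughly 100 years, expressed in days.
one_year = list(date_windows(message_created, now, timedelta(days=365)))
hundred_years = list(date_windows(message_created, now, timedelta(days=36500)))

print(len(one_year), len(hundred_years))
```

A message older than one window length needs more than one request with the small window, while the large window covers any realistic message age in a single request.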

It would make more sense not to use a sliding date window strategy for fetching a single message's media, but rewriting the MessageMedia stream in a backwards-compatible way is a large effort I'd like to avoid, especially when small, targeted changes address the current issue.

Notes for reviewers:

Tested on a local stack. Confirmed that for the MessageMedia stream:

  • Backfills use the config's start date when searching the parent stream Messages.
  • Restarts after a partial or complete backfill use the last seen cursor value when searching the parent stream Messages.
  • Media from messages over a year old is retrieved in a single request.

@Alex-Bair Alex-Bair added the change:unplanned label (Unplanned change, useful for things like doc updates) Dec 5, 2024
@Alex-Bair Alex-Bair changed the title from "source-twilio: improve MessageMedia`incremental sync speed" to "source-twilio: improve MessageMedia incremental sync speed" Dec 5, 2024
@Alex-Bair Alex-Bair marked this pull request as ready for review December 5, 2024 15:11
@Alex-Bair Alex-Bair requested a review from a team December 5, 2024 15:11
@jgraettinger jgraettinger left a comment

LGTM!

@Alex-Bair Alex-Bair merged commit ca22f97 into main Dec 5, 2024
71 of 79 checks passed
@Alex-Bair Alex-Bair deleted the bair/source-twilio-message-media-fixes branch December 5, 2024 15:46