Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Source-stripe-native #1546

Merged
merged 13 commits into from
Jun 4, 2024
Merged

Added Source-stripe-native #1546

merged 13 commits into from
Jun 4, 2024

Conversation

Luishfs
Copy link
Collaborator

@Luishfs Luishfs commented May 2, 2024

Description:

Added Source-stripe-native.

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Stop Date

This connector implements a feature called Stop Date. When backfilling, the connector will constantly check wether the given document is older than the Stop Date ( or the cutoff, by default ). If so, the connector will instantly stop backfilling that specific stream.

Issuing endpoints

This connector implements 3 streams that uses "Issuing" Endpoints: Authorizations, CardHolders, Transactions
Issuing Endpoint requires permission to be accessed, so we validate the permission before creating Issuing objects inside all_resources


This change is Reviewable

@Luishfs Luishfs self-assigned this May 2, 2024
Luishfs added 6 commits May 6, 2024 18:07
Some streams models are not validating since the stripe docs was not up-to-date, will be fixing those in next commits
Full tests still havents started, and this version is not stable
…cords and created new resources to better handle edge cases
Modified stop_date behaviour to only yield equal stop date values

Modified stream names to new pattern

Setting the PR ready to review
@Luishfs Luishfs marked this pull request as ready for review May 16, 2024 18:02
@Luishfs Luishfs changed the title [WIP] Added Source-stripe-native Added Source-stripe-native May 16, 2024
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some higher-level comments per @jonwihl's request. Many of these comments are general and are applicable to the code overall, as I have not commented on every single place where they apply. But overall this seems fairly reasonable, from what I can tell after a relatively cursory review.

# Checking if user has "Issuing" permissions
try:
#Using Authorizations stream for testing
url = f"https://api.stripe.com/v1/{Authorizations.SEARCH_NAME}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we do this, as opposed to requiring that the user has the required permissions? As far as I know we don't use this kind of adaptive discovery mechanism elsewhere, so I think we shouldn't here either unless there is some reason that I'm not aware of.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe this comment is more related to my bad writing than a concept itself.

"issuing" is a product inside Stripe, that is available on only a few countries and select accounts. As other services, you cannot allow/enable/disable usage throught a key like hubspot. Stripe is the one that allows your account to access such endpoints.
Which means that this checks if this feature is enabled by Stripe, and not if the user gave us permission to access it.
The text was changed here 0046721

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, to test this, i created a US dev account, and was able to successfully create a sandbox enviroment.

"Removing 'Issuing' streams")

return all_streams

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird amount of extra whitespace here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 0046721

class EndpointConfig(BaseModel):
credentials: AccessToken = Field(
title="API Key" )
stop_date: AwareDatetime = Field(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's still correct to call this a start date. Stop date would imply that we'd get everything earlier than this date but nothing afterwards . The connector works backward chronologically through the API responses, but the end result is that we will get records on or after this date, which makes it a start date.

Copy link
Collaborator Author

@Luishfs Luishfs May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea to call it a stop_date and not start_date is to reference the real-time incremental sync method along with the backfilling.
Yes, we get records on and after the date, but not like other singer connectors. The run will start with incremental getting the newest data and backfilling getting the rest.
I know that conceptually that is basically the same thing and shouln't really matter, but it was a way i found to show that this connector behaves differently than a imported one, and that the user will get real-time data & data untill its stop date, and not the connector starting at the given date, untill it reaches "real-time" like others.
Else, it may seem that our native versions are simply other variations of the same connector

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also realize that users are accustomed to the start_date term, hence why i reference stop_date as a "similar to start_date" in its description

data: List[Item]

class BackfillResult(BaseModel, Generic[Item], extra="allow"):
# Set extra as allow since Refunds has one aditional field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Refunds the only one with an additional field? It would be better to accurately model the responses if possible, rather than allowing any extra fields through all the time, which could make things more difficult to debug.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this specific case, what happens is that BackfillResult is a generic name for the defualt Stripe API response.
All data endpoints have the exact same response format shown. but Refunds has one aditional field called count if i remember correctly.

I figured some ways to fix this: Add the count field with a default 0 value to the BackfillResult model, create a new object, with a new fetch_backfill specific for Refunds or allow the model to be permissive with extra fields.

I went with allowing the model to be permissive since this extra response data is actually not used, and if stripe decides to add a new random field that specific backfilling would simply break. Else, since this data is not yielded, i believe the resulting model should not affect performance or schema generation that much.
But, if you still believe it is worth to change that, would adding the field with a default value be the best alternative?


url = f"{API}/events"
parameters = {"type": cls.TYPES, "limit": 100}
recent = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all we are trying to do is keep track of the largest (most recent) log cursor, we can just do that and not keep them all in an array that has to be sorted, like this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reference, just modified that in 0046721

if _s_to_dt(results.created) > log_cursor:
recent.append(_s_to_dt(results.created))
doc = _cls.model_validate(results.data.object)
doc.meta_ = _cls.Meta(op="u")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need to set this (anymore); the default is now "u":

meta_: Meta = Field(
default=Meta(op="u"), alias="_meta", description="Document metadata"
)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 0046721


_cls: Any = cls # Silence mypy false-positive

while stop:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty confusing that while a variable called stop is true, the loop continues, and only when stop is set to false does the loop stop.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes a lot of sense, changed this variable name to iterating in 0046721

)

for doc in result.data:
log.debug(f"{_s_to_dt(doc.created)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not seem like a useful log

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, forgot to remove that one. Removed in 0046721


child_url = f"{API}/{search_name}/{id}/{cls_child.SEARCH_NAME}"
parameters = cls_child.PARAMETERS
if "subscription" in parameters.keys():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body of this function looks pretty ugly. Lots of magic strings and branching. I don't understand the Stripe data model well enough with child streams etc. or whatever is going on here to offer any suggestion other than that. I'd suggest trying to clean this up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to handle some special cases and dont make the models be that much different from one another. The subscription one was actually removed now, and the check was changed to compare only the stream name.
Mod was made here 0046721

await http.request(log, url, method="GET", params=parameters)
)
for results in result.data:
if _s_to_dt(results["created"]) > log_cursor:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the modeling is not right here. ListResult has data: List[Item] but Item = TypeVar("Item"). The modeling should updated so we don't have to access the "created" property of an untyped variable like this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here 0046721

@Luishfs
Copy link
Collaborator Author

Luishfs commented May 30, 2024

@williamhbaker I believe to have answered everything as of now.
As always, thanks a lot for your time!

Copy link
Contributor

@jonwihl jonwihl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jonwihl jonwihl merged commit 498d528 into main Jun 4, 2024
43 of 46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants