-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Source-stripe-native #1546
Conversation
Some streams models are not validating since the stripe docs was not up-to-date, will be fixing those in next commits Full tests still havents started, and this version is not stable
SetupAttempts streams to child streams
…cords and created new resources to better handle edge cases
Modified stop_date behaviour to only yield equal stop date values Modified stream names to new pattern Setting the PR ready to review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some higher-level comments per @jonwihl's request. Many of these comments are general and are applicable to the code overall, as I have not commented on every single place where they apply. But overall this seems fairly reasonable, from what I can tell after a relatively cursory review.
# Checking if user has "Issuing" permissions | ||
try: | ||
#Using Authorizations stream for testing | ||
url = f"https://api.stripe.com/v1/{Authorizations.SEARCH_NAME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we do this, as opposed to requiring that the user has the required permissions? As far as I know we don't use this kind of adaptive discovery mechanism elsewhere, so I think we shouldn't here either unless there is some reason that I'm not aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe this comment is more related to my bad writing than a concept itself.
"issuing" is a product inside Stripe, that is available on only a few countries and select accounts. As other services, you cannot allow/enable/disable usage throught a key like hubspot. Stripe is the one that allows your account to access such endpoints.
Which means that this checks if this feature is enabled by Stripe, and not if the user gave us permission to access it.
The text was changed here 0046721
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference, to test this, i created a US dev account, and was able to successfully create a sandbox enviroment.
"Removing 'Issuing' streams") | ||
|
||
return all_streams | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird amount of extra whitespace here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 0046721
class EndpointConfig(BaseModel): | ||
credentials: AccessToken = Field( | ||
title="API Key" ) | ||
stop_date: AwareDatetime = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still correct to call this a start date. Stop date would imply that we'd get everything earlier than this date but nothing afterwards . The connector works backward chronologically through the API responses, but the end result is that we will get records on or after this date, which makes it a start date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea to call it a stop_date
and not start_date
is to reference the real-time incremental sync method along with the backfilling.
Yes, we get records on and after the date, but not like other singer connectors. The run will start with incremental getting the newest data and backfilling getting the rest.
I know that conceptually that is basically the same thing and shouln't really matter, but it was a way i found to show that this connector behaves differently than a imported one, and that the user will get real-time data & data untill its stop date, and not the connector starting at the given date, untill it reaches "real-time" like others.
Else, it may seem that our native versions are simply other variations of the same connector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also realize that users are accustomed to the start_date
term, hence why i reference stop_date
as a "similar to start_date" in its description
data: List[Item] | ||
|
||
class BackfillResult(BaseModel, Generic[Item], extra="allow"): | ||
# Set extra as allow since Refunds has one aditional field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Refunds the only one with an additional field? It would be better to accurately model the responses if possible, rather than allowing any extra fields through all the time, which could make things more difficult to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this specific case, what happens is that BackfillResult
is a generic name for the defualt Stripe
API response.
All data endpoints have the exact same response format shown. but Refunds
has one aditional field called count
if i remember correctly.
I figured some ways to fix this: Add the count
field with a default 0 value to the BackfillResult
model, create a new object, with a new fetch_backfill specific for Refunds or allow the model to be permissive with extra fields.
I went with allowing the model to be permissive since this extra response data is actually not used, and if stripe decides to add a new random field that specific backfilling would simply break. Else, since this data is not yielded, i believe the resulting model should not affect performance or schema generation that much.
But, if you still believe it is worth to change that, would adding the field with a default value be the best alternative?
|
||
url = f"{API}/events" | ||
parameters = {"type": cls.TYPES, "limit": 100} | ||
recent = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all we are trying to do is keep track of the largest (most recent) log cursor, we can just do that and not keep them all in an array that has to be sorted, like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reference, just modified that in 0046721
if _s_to_dt(results.created) > log_cursor: | ||
recent.append(_s_to_dt(results.created)) | ||
doc = _cls.model_validate(results.data.object) | ||
doc.meta_ = _cls.Meta(op="u") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need to set this (anymore); the default is now "u"
:
connectors/estuary-cdk/estuary_cdk/capture/common.py
Lines 60 to 62 in ea8b996
meta_: Meta = Field( | |
default=Meta(op="u"), alias="_meta", description="Document metadata" | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 0046721
|
||
_cls: Any = cls # Silence mypy false-positive | ||
|
||
while stop: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty confusing that while a variable called stop
is true
, the loop continues, and only when stop
is set to false
does the loop stop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes a lot of sense, changed this variable name to iterating
in 0046721
) | ||
|
||
for doc in result.data: | ||
log.debug(f"{_s_to_dt(doc.created)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not seem like a useful log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, forgot to remove that one. Removed in 0046721
|
||
child_url = f"{API}/{search_name}/{id}/{cls_child.SEARCH_NAME}" | ||
parameters = cls_child.PARAMETERS | ||
if "subscription" in parameters.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The body of this function looks pretty ugly. Lots of magic strings and branching. I don't understand the Stripe data model well enough with child streams etc. or whatever is going on here to offer any suggestion other than that. I'd suggest trying to clean this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to handle some special cases and dont make the models be that much different from one another. The subscription
one was actually removed now, and the check was changed to compare only the stream name.
Mod was made here 0046721
await http.request(log, url, method="GET", params=parameters) | ||
) | ||
for results in result.data: | ||
if _s_to_dt(results["created"]) > log_cursor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the modeling is not right here. ListResult
has data: List[Item]
but Item = TypeVar("Item")
. The modeling should updated so we don't have to access the "created" property of an untyped variable like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated here 0046721
@williamhbaker I believe to have answered everything as of now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description:
Added Source-stripe-native.
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
Stop Date
This connector implements a feature called Stop Date. When backfilling, the connector will constantly check wether the given document is older than the Stop Date ( or the cutoff, by default ). If so, the connector will instantly stop backfilling that specific stream.
Issuing endpoints
This connector implements 3 streams that uses "Issuing" Endpoints: Authorizations, CardHolders, Transactions
Issuing Endpoint requires permission to be accessed, so we validate the permission before creating Issuing objects inside
all_resources
This change is