Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: merge payload and data fields of Request #542

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

vdusek
Copy link
Collaborator

@vdusek vdusek commented Sep 24, 2024

Description

  • We had data and payload fields on the Request model.
    • payload was not being provided to the HTTP clients, only the data field.
  • In this PR, I'm merging them together, keeping only the data field (use the same naming as HTTPX & CurlImpersonate).
  • In this PR, I'm merging them together, keeping only the payload field (use the same naming as in JS Crawlee).
  • I also defined type aliases for HTTP data and HTTP query parameters and used them across the project.
  • Some struggle with Pydantic & serialization, but should be OK now...

Issues

Testing

  • The current set of unit tests should cover the changes.

Checklist

  • CI passed

@vdusek vdusek added bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team. adhoc Ad-hoc unplanned task added during the sprint. labels Sep 24, 2024
@vdusek vdusek added this to the 99th sprint - Tooling team milestone Sep 24, 2024
@vdusek vdusek self-assigned this Sep 24, 2024
@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Sep 24, 2024
@B4nan
Copy link
Member

B4nan commented Sep 24, 2024

JS version have only payload, we should aim for compact with that in the first place.

@B4nan
Copy link
Member

B4nan commented Sep 24, 2024

Also, since this is breaking, maybe we should wait, and merge it once we have more breaking changes than a simple one like this (or some feature worth the minor bump).

@honzajavorek
Copy link

I learned about data here, but I don't see it in changed files: https://crawlee.dev/python/docs/examples/fill-and-submit-web-form#preparing-a-post-request

@honzajavorek
Copy link

It also says "Alternatively, you can send form data as URL parameters using the query_params argument." btw, not sure if that is something related or not.

@vdusek
Copy link
Collaborator Author

vdusek commented Sep 24, 2024

Also, since this is breaking, maybe we should wait, and merge it once we have more breaking changes than a simple one like this (or some feature worth the minor bump).

We will have the first version of fingerprints for Playwright soon 🤞.

I learned about data here, but I don't see it in changed files: https://crawlee.dev/python/docs/examples/fill-and-submit-web-form#preparing-a-post-request

Thanks. Of course, I forgot to update the docs.

Copy link
Collaborator

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just a couple of questions...

@@ -26,6 +26,10 @@

HttpMethod: TypeAlias = Literal['GET', 'HEAD', 'POST', 'PUT', 'DELETE', 'CONNECT', 'OPTIONS', 'TRACE', 'PATCH']

HttpQueryParams: TypeAlias = dict[str, Any]

HttpPayload: TypeAlias = dict[str, Any]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dict is kinda surprising here - I'd expect bytes or something. How did this happen? 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downstream libraries (HTTPX & CurlImpersonate) do the conversions. And both of them accept RequestData = Mapping[str, Any] type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I did some digging and this is what I found. payload is part of the Request schema on the API, and the API client requires it to be a string. It'll get serialized and passed to the API endpoint on platform. Of course, it also needs to be compatible with whatever the HTTP client accepts - that's where the actual processing gets done.

As a side note, I'm not sure what data was supposed to be, but it seems it's not needed...

src/crawlee/_request.py Outdated Show resolved Hide resolved
src/crawlee/_types.py Outdated Show resolved Hide resolved
@vdusek vdusek force-pushed the merge-payload-and-data branch 3 times, most recently from 649eaa5 to 4c3d891 Compare September 24, 2024 14:09
@vdusek vdusek mentioned this pull request Sep 25, 2024
1 task
@vdusek
Copy link
Collaborator Author

vdusek commented Sep 25, 2024

Also, since this is breaking, maybe we should wait, and merge it once we have more breaking changes than a simple one like this (or some feature worth the minor bump).

I've separated the Curl impersonation fix and other minor updates into a separate PR (#543), allowing us to release a patch version for 0.3.

The payload & data merge and HttpHeaders thing will be resolved later...

vdusek added a commit that referenced this pull request Sep 25, 2024
### Description

- Version 1.7.2 of `curl-cffi` introduces breaking changes.
- This update includes adopting the new version, adding type aliases for
`Request` fields, and incorporating other minor changes from PR #542.

### Issues

- N/A (ad-hoc fix)

### Testing

- The current set of unit tests should cover the changes.

### Checklist

- [x] CI passed
@vdusek vdusek marked this pull request as draft September 25, 2024 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adhoc Ad-hoc unplanned task added during the sprint. bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants