feat: use packaging to parse requirements #735

Draft. Wants to merge 4 commits into base: main

mkniewallner
Collaborator

PR Checklist

  • A description of the changes is added to the description of this PR.
  • If there is a related issue, make sure it is linked to this PR.
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added or modified a feature, documentation in docs is updated

Description of changes

This is something that has been on my mind for quite some time now.

We currently rely on several regexes to parse dependencies in requirements files. Although this lets us parse formats that pip handles, many of those formats are not covered by PEP 508, which requires both remote and local dependencies to follow the <package> @ <path> format. Even the pip documentation suggests using the PEP 508 format.

Relying on regexes makes the parsing best-effort, and it can also create false positives. For instance, for what looks like a git URL, we try to guess the package name from the git project name in the URL. That guess can depend on the git server used or, worse, the git project name can differ from the real Python package name.

This PR suggests using packaging, maintained by PyPA, to parse dependencies where we expect the PEP 508 format to be used (requirements files, PEP 621 metadata). This would remove support for URLs that do not follow the PEP 508 format, so this is a breaking change we would have to mention in the changelog if we actually want to go this way.
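
To illustrate what this means in practice, here is a minimal sketch of parsing with packaging. The helper name `parse_requirement_name` is hypothetical and is not the code in this PR; it only shows that PEP 508 strings (including the `<package> @ <url>` form) parse cleanly, while bare URLs are rejected.

```python
from __future__ import annotations

from packaging.requirements import InvalidRequirement, Requirement


def parse_requirement_name(line: str) -> str | None:
    """Return the package name of a PEP 508 requirement, or None if it does not parse.

    Hypothetical helper for illustration only, not deptry's implementation.
    """
    try:
        return Requirement(line).name
    except InvalidRequirement:
        return None


# PEP 508 forms: the name is explicit, so nothing has to be guessed.
print(parse_requirement_name("requests>=2.0"))
print(parse_requirement_name("foo-bar @ git+https://github.com/baz/foo-bar.git@asd"))

# A bare URL is not a valid PEP 508 requirement, so it is rejected.
print(parse_requirement_name("git+https://github.com/baz/foo-bar.git@asd#egg=foo-bar"))
```

With this approach, the bare-URL line would have to be rewritten in the `foo-bar @ git+https://...` form to keep being detected.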

@mkniewallner mkniewallner added the breaking Change that introduces a breaking change label Jun 16, 2024
@mkniewallner mkniewallner force-pushed the feat/parse-requirements-packaging branch from bad3487 to 1cc9ba7 Compare June 16, 2024 15:37

codecov bot commented Jun 16, 2024

Codecov Report

Attention: Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.1%. Comparing base (0f0a1c6) to head (9f0bb47).

Files | Patch % | Lines
python/deptry/dependency_getter/pep_621.py | 87.5% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main    #735     +/-   ##
=======================================
+ Coverage   92.8%   93.1%   +0.3%     
=======================================
  Files         35      35             
  Lines        920     888     -32     
  Branches     165     154     -11     
=======================================
- Hits         854     827     -27     
+ Misses        52      49      -3     
+ Partials      14      12      -2     


@mkniewallner mkniewallner force-pushed the feat/parse-requirements-packaging branch from 1cc9ba7 to f9ddda5 Compare June 16, 2024 15:46
@mkniewallner mkniewallner changed the title feat: use packacing to parse requirements feat: use packaging to parse requirements Jun 16, 2024
@mkniewallner mkniewallner marked this pull request as ready for review June 16, 2024 16:07
@fpgmaas
Owner

fpgmaas commented Jun 17, 2024

I do like the idea of using packaging to extract the dependencies instead of using our own regexes; I think that is an improvement. As I understand it, the only breaking change that we are aware of is for parsing requirements in requirements.txt in one of the following forms, right?

https://github.com/urllib3/urllib3/archive/refs/tags/1.26.8.zip
git+https://github.com/baz/foo-bar.git@asd#egg=foo-bar

In the link you shared to the pip documentation, they suggest using the PEP 508 format for installing from a package index. But they also show that they support other formats for packages that do not come from a package index. So I do think it would be good to keep supporting the formats in requirements.txt that we currently support, to reduce the risk of a breaking change.

Can we maybe do both for requirements.txt files? First try to extract the dependency with packaging, and if that fails, use a regex to extract the URL? Or maybe we can use something different completely? e.g. https://pypi.org/project/requirements-parser/
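
A rough sketch of the first option, assuming a simplified fallback: try packaging first, and only if that fails, look for an `#egg=` fragment in the URL. The `EGG_FRAGMENT` pattern and the `extract_name` helper are hypothetical stand-ins, not the regexes deptry currently uses.

```python
from __future__ import annotations

import re

from packaging.requirements import InvalidRequirement, Requirement

# Hypothetical, simplified fallback pattern: picks up "#egg=<name>" or "&egg=<name>".
EGG_FRAGMENT = re.compile(r"[#&]egg=([A-Za-z0-9][A-Za-z0-9._-]*)")


def extract_name(line: str) -> str | None:
    """Try PEP 508 parsing first, then fall back to an egg fragment in a URL."""
    line = line.strip()
    try:
        return Requirement(line).name
    except InvalidRequirement:
        pass  # Not PEP 508: fall through to the best-effort regex.

    match = EGG_FRAGMENT.search(line)
    return match.group(1) if match else None


print(extract_name("click>=8"))
print(extract_name("git+https://github.com/baz/foo-bar.git@asd#egg=foo-bar"))
print(extract_name("https://github.com/urllib3/urllib3/archive/refs/tags/1.26.8.zip"))
```

Note that in this sketch the archive URL without an `#egg=` fragment still yields no name; covering it would need a further URL-based guess.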

@mkniewallner
Collaborator Author

Can we maybe do both for requirements.txt files? First try to extract the dependency with packaging, and if that fails, use a regex to extract the URL? Or maybe we can use something different completely? e.g. https://pypi.org/project/requirements-parser/

Between the two options, I'd personally prefer the first one, as packaging will be used to parse dependencies not only in requirements.txt files but also in other formats that support PEP 508 (for instance, [project.dependencies] in pyproject.toml).

I still think, though, that trying to guess the package name from an arbitrary URL over which we have no real control feels quite hacky, even if most of the time this should give the user the expected result.

I'll put the PR back in draft for now until I find the time to get back to this.
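
As a small illustration of why the guess is fragile, here is a hypothetical heuristic (not deptry's actual regex) that takes the last path segment of a git URL as the package name. For python-dateutil, whose repository is dateutil/dateutil on GitHub, the guess does not match the name published on PyPI.

```python
from urllib.parse import urlparse


def guess_name_from_git_url(url: str) -> str:
    """Guess a package name from the last path segment of a git URL.

    Hypothetical heuristic for illustration only.
    """
    path = urlparse(url).path  # e.g. "/baz/foo-bar.git@asd"
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    last_segment = last_segment.split("@", 1)[0]  # drop a trailing "@<ref>"
    return last_segment.removesuffix(".git")


# Works when the repository name happens to equal the package name.
print(guess_name_from_git_url("git+https://github.com/baz/foo-bar.git@asd"))

# Guesses "dateutil", while the package is published as "python-dateutil".
print(guess_name_from_git_url("git+https://github.com/dateutil/dateutil.git"))
```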

@mkniewallner mkniewallner marked this pull request as draft July 16, 2024 22:10
Labels
breaking Change that introduces a breaking change
2 participants