Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tidy First] Use isort to order/format/de-dupe python imports #10085

Merged
merged 4 commits into from
May 3, 2024

Conversation

QMalcolm
Copy link
Contributor

@QMalcolm QMalcolm commented May 3, 2024

resolves N/A

Problem

Our imports haven't been ordered ever, probably because doing so manually can be annoying (especially when things are already a mess). However having them out of order actually slows us (at least me) down because when we want to import something its harder to visually figure out if the necessary module is already being imported from or not.

This disorganization of imports caused instances where we were importing from the same module multiple times in the same file by accident. I wrote a short script to figure out how many distinct duplicate module import paths we had (this is when an from x.y.z import occurs more than once in a single file). The script found 60 instances across 50 files 🙃 One example of this for a visual representation is that from dbt.artifacts.resources import appeared twice in unparsed.py (Exhibit A, Exhibit B). In another file there was a module path that was imported from 3 separate times...

Solution

We've been using isort in other repositories such as metricflow and dbt-semantic-intefaces for this exact problem. So I've gone and added isort here as well, and it'll be invoked in pre-commit hooks.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

The tool `isort` reorders imports to be in alphabetical order. I've
added it because our imports in most files are in random order. The lack
of order meant that:
- sometimes the same module would be imported from twice
- figuring out if a a module was already being imported from took
  longer

In the next commit I'll actually run isort to order everything. The best
part is that when developing, we don't have to put them in correct order.
Though you can if you want. However, `isort` will take care of re-ordering
things at commit time :)
Specifically we set two config values: `extend_skip_glob` and `known_first_party`.
The `extend_skip_glob` extends the default skipped paths. The defaults can be seen
here https://pycqa.github.io/isort/docs/configuration/options.html#skip. We are skipping
third party stubs because these are more so provided (I believe). We are skipping
`.github` and `scripts` as they feel out of scope and things we can be less strict with.

The `known_first_party` setting makes it so that these imports get grouped separately from
all other imports, which is useful visually of "this comes from us" vs "this comes from
someone/somewhere else".
I was seeing some odd behavior where running pre-commit, adding the modified
files, and then running pre-commit again would result making more modifications
to some of the same files. This felt odd. You shouldn't have to run pre-commit
more multiple times for it to eventually come to a final "solution". I believe
the problem was because we are using the tool `black` to format things, but weren't
registering the black profile with `isort` this lead to some conflicting formatting
rules, and the two tools had to negotiate a few times before being both satisfied.
Registering the profile `black` with `isort` resolved this problem.
This was done by running `pre-commit run --all`. I ran it separately from
the commit process itself because I wanted to run it against all files
instead of only changed files.

Of note, this not only reordered and formatted our imports. But we also
had 60 distinct duplicate module import paths across 50 files, which this
took care of. When I say "distinct duplicate module import paths" I mean
when `from x.y.z import` was imported more than once in a single file.
@QMalcolm QMalcolm added Skip Changelog Skips GHA to check for changelog file tidy_first "Tidy First" incremental cleanup changes labels May 3, 2024
@QMalcolm QMalcolm requested a review from a team as a code owner May 3, 2024 00:07
@cla-bot cla-bot bot added the cla:yes label May 3, 2024
@QMalcolm QMalcolm added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label May 3, 2024
@QMalcolm
Copy link
Contributor Author

QMalcolm commented May 3, 2024

I've added the artifact-minor-upgrade label because the files in artifacts have been changed, but it's just the imports.

Copy link

codecov bot commented May 3, 2024

Codecov Report

Attention: Patch coverage is 98.71645% with 11 lines in your changes are missing coverage. Please review.

Project coverage is 88.20%. Comparing base (dc744f6) to head (98c0b81).

Files Patch % Lines
core/dbt/contracts/sql.py 0.00% 5 Missing ⚠️
core/dbt/task/sql.py 0.00% 3 Missing ⚠️
core/dbt/cli/context.py 0.00% 1 Missing ⚠️
core/dbt/clients/yaml_helper.py 83.33% 1 Missing ⚠️
core/dbt/task/docs/serve.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10085      +/-   ##
==========================================
- Coverage   88.20%   88.20%   -0.01%     
==========================================
  Files         181      181              
  Lines       22722    22748      +26     
==========================================
+ Hits        20042    20064      +22     
- Misses       2680     2684       +4     
Flag Coverage Δ
integration 85.58% <98.59%> (-0.01%) ⬇️
unit 62.63% <93.23%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@QMalcolm
Copy link
Contributor Author

QMalcolm commented May 3, 2024

I think it's debatable if a changie doc is needed for this. We've added a dependency, but it's only a dev dependency. We've made changes in the source code that we distribute, but it's just the imports that are touched. Seems almost less than an Under the Hood change.

Copy link
Member

@emmyoop emmyoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will pre-commit acknowledge these settings? I thought they would need to be in pre-commit-config.yaml as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I originally thought as well. I put the settings in the pre-commit-config.yaml and they were ignored. The isort configuration doc page lists many ways to set configs (notably not pre-commit). I did some testing with the .isort.cfg and it immediately worked respecting the options.

@QMalcolm QMalcolm merged commit 29b8359 into main May 3, 2024
81 of 85 checks passed
@QMalcolm QMalcolm deleted the qmalcolm--tidy-first-order-imports-with-isort branch May 3, 2024 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking cla:yes Skip Changelog Skips GHA to check for changelog file tidy_first "Tidy First" incremental cleanup changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants