Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python: Use a context manager for opening files (SIM115) to solve some ResourceWarnings #4224

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

echoix
Copy link
Member

@echoix echoix commented Aug 25, 2024

This PR starts fixing some ResourceWarnings. I'm starting small, I am juggling with 4 sets of changes that are tested individually and one on top of the others, plus on top of my coverage branch. There are at least 368 places where a context manager should be used to open and close files. It shouldn't be done in one PR.

This first set is ready to be merged. It already solves some resource warnings appearing in macOS Pytest jobs that I was seeing.

Later on, I will merge some changes, but will explicitly make sure that changes in an input/output pair are in separate PRs. For example, if stds_export is changed, stds_import will be in another one. For changes in gunittest, it will be on their own. That way, it will help isolate the possibility of changing the code and the code treating the expected value, and will help make sure the existing code works well with only this part of the changes.

@github-actions github-actions bot added temporal Related to temporal data processing Python Related code is in Python libraries module labels Aug 25, 2024
in_file.close()
if out_file is not sys.stdout:
out_file.close()
in_file.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of hard to review this due to the changed indentation, but shouldn't this be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. At first I wasn't as sure, but in the 3 later PRs (not submitted here yet), I've been doing it, after having tried it more.

Unfortunately, there will be a lots of places in future PRs where the indentation will change. Using side by side comparaison instead of inline would be easier to follow. I am trying to avoid just indenting everything. There are simple cases where a Path(a_file_var).read_text() can be used instead, and I try to use it first.
(See example in https://docs.astral.sh/ruff/rules/read-whole-file/ and https://docs.astral.sh/ruff/rules/write-whole-file/).

Some places I try to just change the order if there are no dependencies so the with is on its own without having to change the whole indent. It's difficult when it's used way below afterwards (like stds_export in a future PR, the diff is a little harder to follow).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about refactoring some of the functionality out to functions first so that the investment into the review is for the refactoring rather than just with statement? (Probably as a PR separate from this one.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm for applying any "read/write whole file" changes first before this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm for applying any "read/write whole file" changes first before this PR.

The ones that could be detected automatically were done in #4047 in mid July. The rule is active since so no more new introductions can happen.

It's only after some manual refactorings that I saw new occurrences appear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about refactoring some of the functionality out to functions first so that the investment into the review is for the refactoring rather than just with statement? (Probably as a PR separate from this one.)

You're aware that there are 368+ places where there's missing a context manager for opened files? It's something a bit big... When the goal is to reduce some new Pylint errors, and help introducing pytest on windows....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make a plan how to handle all those? It will be a lot to review even if the changes would be trivial.

If the edits are automated and 1:1, then we can say we will review only a sample. What I as suggesting before is that we could treat that based on categories: dev/null group, stdout group, pathlib one-liners, huge changes which may as well be refactoring, ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not automated, yet. The patterns are easy to apply, so in theory there is nothing that would prevent ruff to create a fix in the future, buts it's just not done yet.

It ends up being a little keyboard dance, so it's not that long to do.

temporal/t.rast.what/t.rast.what.py Outdated Show resolved Hide resolved
temporal/t.rast.what/t.rast.what.py Outdated Show resolved Hide resolved
python/grass/script/db.py Show resolved Hide resolved
in_file.close()
if out_file is not sys.stdout:
out_file.close()
in_file.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about refactoring some of the functionality out to functions first so that the investment into the review is for the refactoring rather than just with statement? (Probably as a PR separate from this one.)

in_file.close()
if out_file is not sys.stdout:
out_file.close()
in_file.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm for applying any "read/write whole file" changes first before this PR.

echoix and others added 2 commits August 26, 2024 17:35
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Comment on lines -446 to -447
if out_file is not sys.stdout:
out_file.close()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to have a test that triggers this? When reading carefully my code coverage equivalent branch, the t.rast.what file was only imported. I think it's just a collection problem, as there is a test using stdout and doesn't specify layout, so row will be used, entering in this function.
I want to know what happens when having a sys.stdout used as with a context manager, does it close it when exiting the context? It didn't break any tests though.
To be sure of the syntax, in the with of out_file above, I used way more parentheses, and let black simplify them back. I also checked with small examples in the same function above that pyright detected usages that allow or not with a context manager, and sys.stdout can be used with a context manager. It works just like text files. Can it really be closed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three tests in test_what.py which test with output="-", so that's likely testing this stdout case.

I didn't check, but I would guess that you can close stdout, but you don't want to because any code later can be "surprised" by that.

Copy link
Member

@wenzeslaus wenzeslaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the PR looks good and can be merged, but the t.rast.what code is a lot of review for the change it brings and there is a problem with closing sys.stdout, so I suggest removing that file from this PR.

matrix = []
for line in lines:
matrix.append(line.split(separator))
with open(output, "w") if output != "-" else sys.stdout as out_file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python code:

import sys
with sys.stdout as f:
    f.write("Hello\n")
sys.stdout.write("World\n")

produces:

Hello
Traceback (most recent call last):
  File "/tmp/test.py", line 4, in <module>
    sys.stdout.write("World\n")
ValueError: I/O operation on closed file.

And I think that will be unexpected for anyone who will touch this code later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not fixed yet, I only solved the conflicts

@echoix
Copy link
Member Author

echoix commented Sep 18, 2024

I'll take a new look at my next available coding time.

Since that time, I've found other ways to not have to indent everything, by moving the static contents up and keeping the part that has to be indented small. Otherwise, it might be easier to look at the diff side by side rather than inline, and VS Code with whitespace highlighting is really good at seing that there are literally no change apart 4 spaces + black.

if file:
fd = open(file)

line = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what the line = True is here for. Is the value ever used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably so that it is initialized inside the whole loop? But it shouldn't work as expected, as it is being set by readlines()

@echoix
Copy link
Member Author

echoix commented Oct 17, 2024

This PR is not ready yet, there's still the stdout handling to go through

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libraries module Python Related code is in Python temporal Related to temporal data processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants