Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve unreachable code analysis #302

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

kreathon
Copy link
Contributor

Motivation

def f():
    try:
        return
    except:
        raise Exception()
    print("Unreachable")

At the moment, the print statement is not identified as unreachable.

Implementation

I added a data structure (no_fall_through_nodes) that stores ast nodes that do not allow a fall through. During the ast traversal continue, break, raise and return are added into the this data structure. For every control flow statement / object (for, try, while, with, but also module and functiondef), we check if any of the statement in the body is in no_fall_through_nodes. If that is the case, we report the next statement (if there is any) and add the current node also into no_fall_through_nodes. To make this work generic_visit (means visiting children) is now executed before visiting the current node.

The algorithm was added into the existing Vulture object with the generic_visit (which handles recursion). I think a cleaner implementation could implement a separate node visitor (that handles its own recursion), but I was not sure if the project is open for that.

The old error reporting message is reused (e.g. unreachable code after 'try'). I am not sure if this is the best message or if all unreachable messages should be simplified to something like unreachable code.

Limitations

  • does not support match statements
  • does not support with statements (context manager can suppress exceptions)

Related Issue

This PR is addressing the issue #270.

Checklist:

  • I have updated the documentation in the README.md file or my changes don't require an update.
  • I have added an entry in CHANGELOG.md.
  • I have added or adapted tests to cover my changes.
  • I have run tox -e fix-style to format my code and checked the result with tox -e style.

@codecov-commenter
Copy link

codecov-commenter commented Feb 17, 2023

Codecov Report

Merging #302 (a59eac4) into main (fc23f33) will increase coverage by 0.06%.
Report is 13 commits behind head on main.
The diff coverage is 100.00%.

❗ Current head a59eac4 differs from pull request most recent head 8d9fcf4. Consider uploading reports for the commit 8d9fcf4 to get more accurate results

@@            Coverage Diff             @@
##             main     #302      +/-   ##
==========================================
+ Coverage   98.96%   99.03%   +0.06%     
==========================================
  Files          21       21              
  Lines         679      727      +48     
==========================================
+ Hits          672      720      +48     
  Misses          7        7              
Files Changed Coverage Δ
vulture/core.py 98.75% <100.00%> (+0.21%) ⬆️
vulture/whitelists/ast_whitelist.py 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Owner

@jendrikseipp jendrikseipp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this PR! I like the algorithm and that it nicely generalizes the earlier solution. However, I'd like to propose a different code organization, so that the core module doesn't grow too large:

  • Add a reachability.py module with a Reachability class.
  • This class stores the no-fall-through-nodes, allows to reset them, has a visit(node) method that checks the node's type and depending on the type marks nodes as no-fall-through or checks whether their body is fully executable.
  • The Vulture class has a Reachability member and calls its visit(node) function, and retrieves the unreachable code from Reachability before switching to the next module.

This goes in the direction of adding a second NodeVisitor class, but avoids traversing the AST twice. Or do you think a second NodeVisitor class (with all of these visitor_* functions) would be a better alternative?

vulture/core.py Outdated
@@ -225,6 +227,10 @@ def scan(self, code, filename=""):
self.noqa_lines = noqa.parse_noqa(self.code)
self.filename = filename

# We can reset the fall_through_nodes for every module to reduce
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# We can reset the fall_through_nodes for every module to reduce
# Reset the no-fall-through-nodes for every module to reduce memory usage.

vulture/core.py Show resolved Hide resolved
vulture/core.py Outdated

def _can_fall_through_statements_analysis(self, statements):
"""Report unreachable statements.
Returns True if we cannot fall though the list of statements
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Returns True if we cannot fall though the list of statements
Return True if we can execute the full list of statements.

@kreathon
Copy link
Contributor Author

kreathon commented Sep 3, 2023

This goes in the direction of adding a second NodeVisitor class, but avoids traversing the AST twice. Or do you think a second NodeVisitor class (with all of these visitor_* functions) would be a better alternative?

I do not see a clear advantage of one of the solutions and it probably does not matter too much right now (if the reachability is in a second module it should be really easy to switch between them anyway).

I will update the code according to your proposed code organization.

@kreathon
Copy link
Contributor Author

kreathon commented Sep 3, 2023

I am not sure if passing _define as report into Reachaiblity is the cleanest solution, but I did not want to create a new datastructure that passes the information back to the Vulture object. What do you think?

@kreathon
Copy link
Contributor Author

kreathon commented Sep 5, 2023

I had a look at it again, and I think it is cleaner to also move the _handle_conditional_node into the reachablitly.py (such that the module handles the entire self.unreachable_code LoggingList data structure).

I will propose the change as soon as I find time for it.

@kreathon
Copy link
Contributor Author

kreathon commented Sep 9, 2023

I updated

  • check_unreachable for checking multiple unreachable code segments (alternative would be to add another method for that or assert that there is only a single unreachable code segment separately)
  • move some tests from test_conditions.py into test_reachability.py
  • merge handle_condition_node() into Reachability to get more powerful analysis results. There are still cases that are not handled (we need to distinguish between raise/return and break/continue for that).

For example.

while True:
    raise Exception()
print(b)

I could handle this in another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants