Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve consistency of JoinedStr inference #2622

Conversation

ericvergnaud
Copy link
Contributor

@ericvergnaud ericvergnaud commented Oct 22, 2024

This PR:

  • improves consistency of JoinedStr inference by always returning either Uninferable or a fully inferred Const
    optionally, using a flag on JoinedStr, allows returning a partially inferred string such as "a/{MISSING_VALUE}/b" when inferring f"a/{missing}/b" and missing cannot be inferred.

Type of Changes

Type
🐛 Bug fix

Description

See above

Closes #2621

ericvergnaud and others added 5 commits October 22, 2024 18:43
Signed-off-by: Eric Vergnaud <eric.vergnaud@databricks.com>
Signed-off-by: Eric Vergnaud <eric.vergnaud@databricks.com>
…om:ericvergnaud/astroid into improve-consistency-of-joinedstr-inference
Copy link

codecov bot commented Oct 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.21%. Comparing base (e380fd1) to head (4e9680c).
Report is 8 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2622      +/-   ##
==========================================
- Coverage   93.24%   93.21%   -0.04%     
==========================================
  Files          93       93              
  Lines       11060    11076      +16     
==========================================
+ Hits        10313    10324      +11     
- Misses        747      752       +5     
Flag Coverage Δ
linux 93.10% <100.00%> (-0.04%) ⬇️
pypy 93.21% <100.00%> (-0.04%) ⬇️
windows 93.21% <100.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
astroid/nodes/node_classes.py 94.93% <100.00%> (-0.21%) ⬇️

@Pierre-Sassoulas Pierre-Sassoulas added this to the 4.0.0 milestone Oct 22, 2024
@Pierre-Sassoulas Pierre-Sassoulas added the Needs decision 🔒 Needs a decision before implemention or rejection label Oct 22, 2024
Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I want the flag to be a thing on the class. astroid is not really configurable and I think that is actually beneficial in terms of maintainability.

I don't really understand your use case: can't we just return Uninferable when we can't infer the full string? That seems consistent with other nodes and is also the suggestion I already made when inference of JoinedStr was first added.

@ericvergnaud
Copy link
Contributor Author

I don't really understand your use case: can't we just return Uninferable when we can't infer the full string? That seems consistent with other nodes and is also the suggestion I already made when inference of JoinedStr was first added.

Returning Uninferable is not helpful in some usage scenarios.
Consider the following example:

def method_with_arg(arg):
    spark.table(f"/mnt/root/dir/{arg}/logs")

We want to tell our users that reading spark tables from mounts is no longer supported.
With JoinedStr.inferred() returning Uninferable, we're unable to detect that.
The feature proposed in this PR makes it possible, optionally, to return "/mnt/root/dir/{MISSING_VALUE}/logs", which is what we need to detect the "/mnt/" prefix, and also provide better feedback to users.
Happy to place the flag elsewhere.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/mnt/{MISSING_VALUE}/... should be/mnt/{Uninferable}/... imo. I'm not sure about this, there might be unexpected problems that I don't see at the moment. But it's probably better to infer a fake string when we know for sure it's going to be a string than to just say everything is uninferable, so +0.1 from me.

@jacobtylerwalls
Copy link
Member

The feature proposed in this PR makes it possible, optionally, to return "/mnt/root/dir/{MISSING_VALUE}/logs", which is what we need to detect the "/mnt/" prefix, and also provide better feedback to users.

I guess l'm wondering why you would detect that by calling inferred(), wouldn't you just directly inspect the AST?

>>> from astroid import extract_node
>>> n = extract_node("""
... def method_with_arg(arg):
...     spark.table(f"/mnt/root/dir/{arg}/logs")
... """)
>>> from astroid import nodes
>>> js = list(n.nodes_of_class(nodes.JoinedStr))[0]
>>> js.values[0].value
'/mnt/root/dir/'

I'll comment further on the issue.

@jacobtylerwalls
Copy link
Member

I guess l'm wondering why you would detect that by calling inferred(), wouldn't you just directly inspect the AST?

Oh, I see, it's so that both cases (mnt injected by the formatted value) can be handled by one call. I guess I'm not sure astroid needs to support that.

@DanielNoord
Copy link
Collaborator

Personally I feel like this use case is best served with a dedicated method, or an implementation in the pylint plugin itself. There just seem to be too many edge cases where the behaviour of the flag would be hard to define.

I think .infer() should just have one behaviour which should be both easily explainable and matching with what we do for other node types. Any specific JoinedStr behaviour should probably not be part of a generic interface.

@ericvergnaud
Copy link
Contributor Author

ericvergnaud commented Oct 23, 2024

The feature proposed in this PR makes it possible, optionally, to return "/mnt/root/dir/{MISSING_VALUE}/logs", which is what we need to detect the "/mnt/" prefix, and also provide better feedback to users.

I guess l'm wondering why you would detect that by calling inferred(), wouldn't you just directly inspect the AST?

>>> from astroid import extract_node
>>> n = extract_node("""
... def method_with_arg(arg):
...     spark.table(f"/mnt/root/dir/{arg}/logs")
... """)
>>> from astroid import nodes
>>> js = list(n.nodes_of_class(nodes.JoinedStr))[0]
>>> js.values[0].value
'/mnt/root/dir/'

I'll comment further on the issue.

Your proposal is not workable with the following code:

def method_with_arg(arg):
    path = f"/mnt/root/dir/{arg}/logs"
    spark.table(path)

not speaking of even more complex scenarios:

def method_with_arg(arg):
    paths = [f"/mnt/root/dir/{arg}/logs", "/some/other/path"]
    tables = [spark.table(path) for path in paths]

That's what makes node.inferred() so valuable!

@ericvergnaud
Copy link
Contributor Author

Personally I feel like this use case is best served with a dedicated method, or an implementation in the pylint plugin itself. There just seem to be too many edge cases where the behaviour of the flag would be hard to define.

I think .infer() should just have one behaviour which should be both easily explainable and matching with what we do for other node types. Any specific JoinedStr behaviour should probably not be part of a generic interface.

We're using astroid directly so a pylint plugin is not an option.
I agree with the above philosophy in theory, but I disagree in practice because it stops me from getting the outcome I need from astroid. We're using it for inference (we could do parsing and transforms using ast)
I guess it's legitimate for users to want to customise behaviour a bit, depending on their requirements.
A better design would be to provide a replaceable NodeFactory, such that users can provide their own implementation of JoinedStr. Happy to provide such PR if it's welcome.

@ericvergnaud
Copy link
Contributor Author

ericvergnaud commented Oct 23, 2024

I don't think I want the flag to be a thing on the class. astroid is not really configurable and I think that is actually beneficial in terms of maintainability.

I don't really understand your use case: can't we just return Uninferable when we can't infer the full string? That seems consistent with other nodes and is also the suggestion I already made when inference of JoinedStr was first added.

TBH this PR is not proposing to change the current behaviour (see FAIL_ON_UNINFERABLE = True), rather it makes it possible to customise it if required.

@ericvergnaud
Copy link
Contributor Author

@jacobtylerwalls @DanielNoord @Pierre-Sassoulas can you opine on the factory proposal ?

@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented Oct 24, 2024

I think .infer() should just have one behaviour which should be both easily explainable and matching with what we do for other node types. Any specific JoinedStr behaviour should probably not be part of a generic interface.

Agree. But if we know it's a string we can return a string even if it's not 'perfect'.

Still lightly in favor of inferring string the best we can even if we have to include 'Uninferrable' in the middle
(/mnt/{Uninferable}/...). But it should be safeguarded, i.e. :

  • f"/mnt{uninferrable}/..." => we infer a str : /mnt/{Uninferable}/...
  • "/mnt" + uninferrable + "/..." => there's no cast to string for uninferrable so we might have a crash and we need to still detect it

@jacobtylerwalls
Copy link
Member

I guess I'm not certain we even know it's a string. Don't these uninferable cases include failures to format?

@DanielNoord
Copy link
Collaborator

Exactly, or they can include memory overflow issues, or calls to functions that crash or exit the interpreter.

I don't really see why we would special case this infer. If we infer an import we also know that it will be a module but we don't insert dummy module nodes if we can't infer the import.

@Pierre-Sassoulas
Copy link
Member

Let's remove the flag and return fully inferred const or uninferable then.

@Pierre-Sassoulas Pierre-Sassoulas added Enhancement ✨ Improvement to a component and removed Needs decision 🔒 Needs a decision before implemention or rejection labels Oct 24, 2024
@ericvergnaud
Copy link
Contributor Author

That doesn't solve my problem.
Any comments on the node factory proposal ?

@DanielNoord
Copy link
Collaborator

I think without seeing the code to do this we can't really comment on it.
Bigger refactors of astroid have sadly failed before so I'm wary of rewriting how we construct nodes.

Can't you solve your use cases with a function that takes these nodes? Potentially by making that a method in astroid.
Although you're not writing a pylint plug-in your use case does seem to imply you know what kind of node you're dealing with before calling infer.

@ericvergnaud
Copy link
Contributor Author

I think without seeing the code to do this we can't really comment on it. Bigger refactors of astroid have sadly failed before so I'm wary of rewriting how we construct nodes.

Can't you solve your use cases with a function that takes these nodes? Potentially by making that a method in astroid. Although you're not writing a pylint plug-in your use case does seem to imply you know what kind of node you're dealing with before calling infer.

This was just an example. We have dozens of use cases where a partial value is preferable to Uninferable, and the node type could be anything (Name, Attribute).

Re node factory, I'm not looking for pre-approval of a PR, rather I'm not interested in investing time providing one if philosophical concerns make it a no-go. I guess the above means 'possibly ok' ?

@DanielNoord
Copy link
Collaborator

Could you provide more? I'm still leaning towards infer() should be correct or Uninferable but I'm definitely open to suggestions. I just find it hard to wrap my head around how we could introduce this behaviour change without breaking expectations everywhere.

I'm wondering if try_to_infer() would be a nice extension of the Protocol.

"Possibly ok" is correct in the sense that we never reject proposals by default. I must warn though that maintainers time on astroid is very limited so grand refactors of the codebase might be hard to pull off.

@ericvergnaud
Copy link
Contributor Author

A node factory would transfer the responsibility of partial inference behavior to astroid users (via a dedicated class deriving from JoinedStr) rather than have it with the astroid team. From there the builtin JoinedStr.infer() would indeed be either correct or Uninferable, but astroid users would be able to override the behavior at no cost for astroid and astroid team.
I suspect the refactoring would be small in essence (impacting mostly the parser) but may span minor changes across many files.

@DanielNoord
Copy link
Collaborator

I'm still not convinced the effort is worth the result, especially considering the alternatives seems much nicer to me:

  1. Make this part of the package you are working on to get full control
  2. Add a 'partial infer' method that has the behaviour you're looking for

@ericvergnaud
Copy link
Contributor Author

I'm still not convinced the effort is worth the result, especially considering the alternatives seems much nicer to me:

  1. Make this part of the package you are working on to get full control
  2. Add a 'partial infer' method that has the behavior you're looking for

Not sure what you mean by 1. ? Can you explain ? The last thing we want to do is fork astroid...

Re partial_infer, such a method would have to be available on all node types since the JoinedStr may not be immediately reachable, see for example:

def print_stuff(a):
    b = f"Hello {a}!"
    print(b)

In the above, b is a Name, not a JoinedStr, so we'd need a partial_infer method on Name, that would call the partial_infer method of its value... Not sure it's nicer...
A node factory has the immense benefit of allowing all sorts of overrides, not just this one...

@ericvergnaud
Copy link
Contributor Author

Looks like this PR is not going o make it, despit having addressed the comments, and narrowed the scope to bug fixing ?

@Pierre-Sassoulas
Copy link
Member

Please wait a little before falling into despair, we're volunteers with limited time, no fast reviews do not mean it's going to be rejected only that we did not find the time to review yet ;)

("s1 = f'{5}' #@", "5"),
(
"s1 = f'{5}' #@",
"5",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"5",
"5"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing comma here is fine with me.

Copy link
Member

@jacobtylerwalls jacobtylerwalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being game to rescope this and fixing the underlying inconsistency while we discussed the other new feature proposal. I'll comment on the factory proposal in a minute.

We could consider backporting this as a bug fix, but perhaps not, just to keep things more stable (every time inference changes, users could get new false positives).

@jacobtylerwalls
Copy link
Member

Re the factory proposal, as far as I understand, we have the ability to do that already:

>>> import astroid
>>> def my_transform(node):
...     def _infer_with_values(self, context=None, **kwargs):
...         1/0  # replace with your override
...     node._infer_with_values = _infer_with_values
...     return node
...     
>>> astroid.MANAGER.register_transform(astroid.nodes.JoinedStr, my_transform)
>>> js = astroid.extract_node("f'{missing}'")
>>> js.inferred()
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    js.inferred()
    ~~~~~~~~~~~^^
  File "/Users/jwalls/astroid/astroid/nodes/node_ng.py", line 585, in inferred
    return list(self.infer())
  File "/Users/jwalls/astroid/astroid/nodes/node_ng.py", line 167, in infer
    for i, result in enumerate(self._infer(context=context, **kwargs)):
                     ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/astroid/astroid/nodes/node_classes.py", line 4785, in _infer
    yield from self._infer_with_values(context)
               ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "<python-input-1>", line 3, in _infer_with_values
    1/0
    ~^~
ZeroDivisionError: division by zero

@jacobtylerwalls jacobtylerwalls added Bug 🪳 and removed Enhancement ✨ Improvement to a component labels Nov 17, 2024
@DanielNoord
Copy link
Collaborator

Agree with Jacob here, thanks for taking in the review and changing the PR. I don't have time for a review: @jacobtylerwalls would you be okay with merging this?

@ericvergnaud We would first need to rebase to prevent the conflicts.

@ericvergnaud
Copy link
Contributor Author

Agree with Jacob here, thanks for taking in the review and changing the PR. I don't have time for a review: @jacobtylerwalls would you be okay with merging this?

@ericvergnaud We would first need to rebase to prevent the conflicts.

The PR says there are no conflicts, so not sure why rebasing would be needed ?

@jacobtylerwalls jacobtylerwalls merged commit 805a3f6 into pylint-dev:main Nov 18, 2024
21 checks passed
@Pierre-Sassoulas
Copy link
Member

Thank you for you work and persistence on this @ericvergnaud !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inference of JoinedStr is inconsistent
4 participants