JavaSrc2Cpg: Infer type by Namespace and arg/parameter size #4434

khemrajrathore · 2024-04-08T16:22:57Z

This PR has the following changes

Currently in TypeInferencePass, a call is linked to a method if
- the namespace matches, and
- the number of arguments in the call matches the number of parameters in the method, and
- the types of the arguments in the call match the types of the parameters in the method.
If no method can be linked under these rules, we now try to link the call if
- the namespace matches, and
- the number of arguments in the call matches the number of parameters in the method.
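
As a rough illustration of the two rule sets above, here is a minimal sketch. `Method`, `Call` and `link` are hypothetical, simplified stand-ins and not joern's real node types or the pass's actual code; matching on the method name is also an assumption. For brevity it takes the first match and leaves out the uniqueness check discussed in the review.

```scala
// Hypothetical, simplified stand-ins for CPG nodes; not joern's real types.
final case class Method(namespace: String, name: String, paramTypes: List[String])
final case class Call(namespace: String, name: String, argTypes: List[String])

// Namespace, name and arity must always match; argument types are compared
// only when ignoreArgTypes is false.
def isMatchingMethod(method: Method, call: Call, ignoreArgTypes: Boolean): Boolean = {
  val sameTarget = method.namespace == call.namespace && method.name == call.name
  val sameArity  = method.paramTypes.length == call.argTypes.length
  val sameTypes  = ignoreArgTypes || method.paramTypes == call.argTypes
  sameTarget && sameArity && sameTypes
}

// Try the strict rule first; fall back to the relaxed rule only on a miss.
def link(call: Call, candidates: List[Method]): Option[Method] =
  candidates
    .find(isMatchingMethod(_, call, ignoreArgTypes = false))
    .orElse(candidates.find(isMatchingMethod(_, call, ignoreArgTypes = true)))
```

With this shape, a call whose argument types are unresolved can still be linked by the second rule, while a call with fully matching types never reaches it.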

DavidBakerEffendi

This type inference is meant to be more sound than type recovery, so an additional constraint I'd prefer you add is that, if we're ignoring argument types, no other method has the same number of args; otherwise we may be calling either method.

DavidBakerEffendi · 2024-04-09T08:23:28Z

...cli/frontends/javasrc2cpg/src/main/scala/io/joern/javasrc2cpg/passes/TypeInferencePass.scala

+      candidateMethodsIter.find(isMatchingMethod(_, call, callNameParts, ignoreArgTypes = ignoreArgTypes)).flatMap {
+        method =>
+          val otherMatchingMethod =
+            candidateMethodsIter.find(isMatchingMethod(_, call, callNameParts, ignoreArgTypes = ignoreArgTypes))


An iterator can only be used once, and I see it used twice. Rather get rid of candidateMethodsIter and just call candidateMethods.start or candidateMethods.iterator when you need it.

The reason for calling it twice is:

- The first iterator call gives us the first match, and the iterator stops there.
- We use the same iterator to continue the search and see whether there is another matching method.

This is an optimization over restarting the iteration from the beginning.

No, iterators cannot be used more than once. I've verified this in a shell for you e.g.

scala> val x = Iterator(1, 2, 3)
val x: Iterator[Int] = <iterator>

scala> x.toList
val res2: List[Int] = List(1, 2, 3)

scala> x.toList
val res3: List[Int] = List()

This is intentional (maybe a comment is required in code to explain this) and is an optimisation to avoid having to traverse the entire iterator twice.

The first call to candidateMethodsIter.find consumes iterator elements until the first match, but stops at this point. The second find call then continues the search until a second match is found.

To illustrate with an example considering only the first find call:

scala> val x = Iterator(1, 2, 3, 2, 4)
val x: Iterator[Int] = <iterator>

scala> x.find(_ == 2)
val res3: Option[Int] = Some(2)

scala> x.toList
val res4: List[Int] = List(3, 2, 4)

And for both calls:

scala> val x = Iterator(1, 2, 3, 2, 4)
val x: Iterator[Int] = <iterator>

scala> x.find(_ == 2)
val res5: Option[Int] = Some(2)

scala> x.find(_ == 2)
val res6: Option[Int] = Some(2)

scala> x.toList
val res7: List[Int] = List(4)

Agreed, find won't consume the iterator fully. When the predicate holds, find returns the matched item, which is then passed to the flatMap for further processing.

Ah now I understand, TDIL! Thanks
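
The single-pass check discussed in this thread could be sketched roughly as follows. `findUnique` is a hypothetical helper name used only for illustration and is not part of the PR. It leans on `find` leaving the iterator positioned just after the match, as the REPL sessions above show; the standard library documentation generally advises against reusing an iterator after calling methods other than `hasNext` and `next`, so this is best treated as an implementation detail.

```scala
// Hypothetical helper: returns the single element satisfying `p`, or None
// when there are zero or several matches. The iterator is traversed at most once.
def findUnique[A](it: Iterator[A])(p: A => Boolean): Option[A] =
  it.find(p).flatMap { first =>
    // `it` now sits just past `first`, so this second find only scans the rest.
    if (it.find(p).isDefined) None else Some(first)
  }
```

For example, `findUnique(Iterator(1, 2, 3))(_ == 2)` yields the match, while an input containing two matching elements yields `None`.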

khemrajrathore · 2024-04-09T08:53:05Z

This type inference is meant to be more sound than type recovery, so an additional constraint I'd prefer you add is that, if we're ignoring argument types, no other method has the same number of args; otherwise we may be calling either method.

Yes, this is already handled.

The pass will only proceed and infer if we find only a single method matching the criteria

johannescoetzee · 2024-04-09T09:52:58Z

The changes in this PR seem to achieve the goal they intend to, but this was actually the way it was done in the TypeInferencePass in the past and we added the type check because of issues with inherited, non-overridden methods not being considered. This contributed to the general problem of unresolved type issues being hidden.

johannescoetzee · 2024-04-09T10:03:38Z

@fabsx00 and @DavidBakerEffendi I think it would be good to have another discussion about type inference in javasrc2cpg. We've run into a couple of situations where legitimate bugs were hidden by various type inference mechanisms (which would've been discovered easily if the signatures contained <unresolved...

Ideally, the CPG created by javasrc2cpg would only contain type information we know from the JavaParserSymbolSolver, along with type info from imports, with everything else happening in optional passes after AST creation (which this PR would fit).

This is a discussion to have separate from this PR though.

DavidBakerEffendi · 2024-04-09T11:28:42Z

@khemrajrathore, based on @johannescoetzee's message, I think it would be faster to override this on Privado's side if this is an urgent feature.

DavidBakerEffendi · 2024-04-09T12:57:26Z

@johannescoetzee Another option we discussed was adding a flag that enables this behaviour and is off by default. Would this be a good compromise?

johannescoetzee · 2024-04-09T14:34:56Z

@DavidBakerEffendi @khemrajrathore An off-by-default flag would work, although in my opinion this would be a temporary measure until we reach a decision on how to handle inference in general. As such, I suggest adding it as a hidden flag with a description saying it is temporary, to avoid issues if we want to remove it later.
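
One way the off-by-default gating might look, purely as a sketch: `InferenceConfig`, `enableArgCountFallback` and `resolve` are hypothetical names chosen for illustration, not javasrc2cpg's actual configuration or API, and the command-line wiring (including hiding the flag) is left out.

```scala
// Hypothetical config; the field name is illustrative, not joern's real option.
final case class InferenceConfig(enableArgCountFallback: Boolean = false)

// Both lookups are passed by name, so the relaxed lookup only runs when the
// flag is enabled and the strict lookup found nothing.
def resolve[A](config: InferenceConfig)(strict: => Option[A], relaxed: => Option[A]): Option[A] =
  if (config.enableArgCountFallback) strict.orElse(relaxed) else strict
```

Keeping the default at `false` means existing behaviour is unchanged unless a caller opts in.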

khemrajrathore added 2 commits April 8, 2024 21:44

add - support to infer type if parameter / argument size matches

fb70dda

minor refactor

8486748

fabsx00 requested a review from johannescoetzee April 8, 2024 17:28

DavidBakerEffendi reviewed Apr 9, 2024
