Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Expand UnscopedFind offenses in "graphql" namespaces #30

Merged
merged 5 commits into from
Dec 18, 2023

Conversation

willlockwood
Copy link
Contributor

Summary

Currently, the UnscopedFind cop currently identifies violations of trust-root-chaining by looking for different types of custom finds that contain "user input"—specifically, that a non-trust-root-chained find contains a direct or indirect reference to params. For ruby applications using GraphQL, this cop will almost never find violations, since GraphQL implementations will idiomatically have no reference to params.

Here, I'm attempting to bridge the UnscopedFind gap for GraphQL implementations by also considering relevant find nodes in violation with the cop when inside of a namespace matching "graphql". Since popular packages like graphql-ruby promote the usage of GraphQL namespaces, I think this is the highest-fidelity fix I can imagine to expand the relevance of this cop outside of REST, without fully removing the params-based logic for the current cop.

/domain @smudge @effron
/domain @lindan-betterment
/platform @Betterment/ruby-platform-reviewers

node
.ancestors
.select { |n| n.class_type? || n.module_type? }
.any? { |n| n.identifier.to_s.downcase.include?("graphql") }
Copy link
Contributor

@dkubb dkubb Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Just curious if we should match the string if it contains the "graphql" string anywhere even if it's a subset of some larger string?

What if instead we used:

n.identifier.to_s.match?(/\bGraphQL\b/)

If we want to match GraphQL and Graphql or similar, then we could add the /i regexp modifier to make it case insensitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm interesting, so I had intentionally written it to be at least flexible enough to match GraphQL and Graphql, thinking that I could easily imagine someone making the choice to downcase the QL, and thinking that that decision didn't feel super relevant to me, as to whether this cop should still apply.

Down to hear any other opinions on the question! Either way, I think the regex is a nicer touch (particularly with the word boundary) so I can switch to the case insensitive version and then see what others think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should note that I think the odds of another string containing the subset graphql, but it being something else, is pretty low. It's more as a matter of practice that I try to match strings in a way that would be correct in more instances.

I have seen production bugs caused by using String#include? where a stricter match wouldn't have.

Copy link
Contributor

@dkubb dkubb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one question about the use of String#include? and asking if we should be more precise in what we are matching.

platform tafn

@6f6d6172
Copy link
Contributor

6f6d6172 commented Dec 8, 2023

/task https://app.asana.com/0/1200240516476402/1206132040061266/f

will review this coming week!

node
.ancestors
.select { |n| n.class_type? || n.module_type? }
.any? { |n| n.identifier.to_s.match(/\bGraphQL\b/i) }
.any? { |n| n.identifier.to_s.match(GRAPHQL_PATTERN) }
Copy link
Contributor

@dkubb dkubb Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I would suggest using String#match? because it satisfies the requirements, but also has a simpler return value; and is a "simpler primitive".

In code I tend to try to follow the principle of least power, which means that if I have 2 choices for a language construct, and both satisfy the requirements, I will choose the approach that has "fewer degrees of freedom" -- that is, it either has fewer or more constrained inputs or a more constrained return type.

In this case we are moving from a method with the MatchData return value to a Boolean. Sure, the evaluation of MatchData works due to how ruby evaluates non-nil objects in boolean context, but the Boolean return value of #match? has even fewer ways it can be used (or misused!). At some point in the future another developer is going to read or change this code, and our intentions are more clear with a Boolean return value.

6f6d6172
6f6d6172 previously approved these changes Dec 13, 2023
Copy link
Contributor

@6f6d6172 6f6d6172 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

domain lgtm -- very straight forward change. thanks for putting this together!

6f6d6172
6f6d6172 previously approved these changes Dec 13, 2023
Copy link
Contributor

@6f6d6172 6f6d6172 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

domain lgtm

Comment on lines 58 to 65
def graphql_namespace?(node)
return true if processed_source.path&.match?(GRAPHQL_PATTERN)

node
.ancestors
.select { |n| n.class_type? || n.module_type? }
.any? { |n| n.identifier.to_s.match?(GRAPHQL_PATTERN) }
end
Copy link
Contributor

@dkubb dkubb Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: This seems to be mixing concerns a bit, since it matches a file path against a pattern and then proceeds to match the stringified representation of the AST against the regexp. It works, but I would suggest splitting it out so it's more obvious.

The identifier.to_s serialized to a longer string and I thought it might be better to match precisely on the name exactly.

What I would propose is to use RuboCop's built-in matching syntax to match the nodes rather than doing all of it by hand. The existing code uses some the matching syntax already so it's not much of a stretch, eg:

# Add under the two `def_node_matcher` calls, returns only nodes matching the regexp:

def_node_search :find_namespace_nodes, <<~PATTERN, name: GRAPHQL_PATTERN
  (const _ %name)
PATTERN


# At the end of on_send:

add_offense(node) if find_param_arg(arg_nodes) || graphql_file? || graphql_namespace?(node)

# ...

def graphql_file?
  processed_source.path&.match?(GRAPHQL_PATTERN)
end

def graphql_namespace?(node)
  node
    .each_ancestor(:class, :module)
    .any? { |ancestor| find_namespace_nodes(ancestor).any? }
end

EDIT: I looked at the RuboCop pattern matching language and sadly there is no expression to say "match ancestors" to eliminate that extra #each_ancestors call. I would've loved it if I could have #graphql_namespace? just be something simple like: find_namespace_nodes(node).any? .. unfortunately no luck. This is the simplest I could make it, but if I'm missing something lmk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! This works well, and it's nice to use the intended tools for this. I also tried for a while to see if even the syntax you have here could be more concise using a smarter matcher instead of a full search, but ended up coming up with nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also FYI, I changed your method name to find_graphql_namespace_nodes since I felt like there was context missing from it)

Copy link
Contributor

@dkubb dkubb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the improved name for the node search method.

platform lgtm

Copy link
Member

@smudge smudge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

domain LGTM!

@smudge smudge merged commit 0040628 into Betterment:main Dec 18, 2023
4 checks passed
smudge pushed a commit that referenced this pull request Dec 18, 2023
This PR updates the betterlint version to `1.7.0` to include #29 and #30
in a new gem release.
@willlockwood willlockwood deleted the lockwood/graphql-unscoped-find branch December 18, 2023 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants