Lint Architecture to support semantic model and other queries #2603

xunilrj · 2022-05-20T18:06:14Z

xunilrj
May 20, 2022

This will be implemented by #2488

Today we have a nice way to write linters:

tools/crates/rome_analyze/src/analyzers/no_delete.rs

Lines 14 to 32 in d78e802

pub(crate) enum NoDelete {}

impl Rule for NoDelete {
    const NAME: &'static str = "noDelete";
    const CATEGORY: RuleCategory = RuleCategory::Lint;

    type Query = JsUnaryExpression;
    type State = MemberExpression;

    fn run(node: &Self::Query) -> Option<Self::State> {
        let op = node.operator().ok()?;
        if op != JsUnaryOperator::Delete {
            return None;
        }

        let argument = node.argument().ok()?;
        MemberExpression::try_from(argument).ok()
    }

The problem is that when Rule::run is called and the rule needs information from outside the CST, there is currently no way to get it: the rule only has access to the typed node, in this case JsUnaryExpression.

To solve this problem I think we have four options:

1 - Introduce a new parameter that will be a façade to multiple services;
2 - "Inject" services as parameters to the lint methods;
3 - Allow the lint object to capture services it needs when it is instantiated;
4 - Specific Rule traits with chosen available services;

Two other possible discussion points are:
A - Query-based architecture;
B - async/await.

My suggestion is: 1AB. 😄

xunilrj · 2022-05-20T18:06:33Z

xunilrj
May 20, 2022
Author

1 - New parameter

This is the easiest approach. We can have something like:

pub(crate) enum UnusedVariable {}
impl Rule for UnusedVariable {
    const NAME: &'static str = "unusedVariable";
    const CATEGORY: RuleCategory = RuleCategory::Lint;

    type Query = JsVariableDeclaration;
    type State = JsVariableDeclaration;

    fn run(ctx: crate::registry::RuleContext<Self::Query>) -> Option<Self::State> {
        let map = ctx.queries().reference_count_map();
        let map = map.value();
        let range = ctx.node().syntax().text_range();
        match map.get(&range) {
            Some(0) => Some(ctx.node().clone()),
            _ => None,
        }
    }

    ...
}

Inside this option we have some flavours:
1 - We send a façade through which the lint can access services, queries and the node itself: ctx.semantic() returns the semantic model façade, ctx.queries() returns the query façade and ctx.node() returns the requested node.

2 - ctx can behave like a service locator. You ask for ctx.service::<SemanticModel>() or ctx.service::<ReferenceCountMap>(), and so on.

I actually prefer the first one.
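To make the two flavours concrete, here is a minimal sketch that uses only the standard library. Everything in it is an assumption for illustration: ServiceBag, its insert and service methods, and the placeholder SemanticModel and ReferenceCountMap types are hypothetical, not the real rome_analyze API. The façade (flavour 1) is built on top of the service locator (flavour 2), which shows that the two are not mutually exclusive.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Hypothetical placeholder services; only the names come from the discussion.
#[derive(Debug, Default, PartialEq)]
pub struct SemanticModel;

#[derive(Debug, Default, PartialEq)]
pub struct ReferenceCountMap {
    pub counts: HashMap<String, usize>,
}

/// Flavour 2: a type-indexed bag of services (service locator).
#[derive(Default)]
pub struct ServiceBag {
    services: HashMap<TypeId, Box<dyn Any>>,
}

impl ServiceBag {
    /// Registers a service, replacing any previous one of the same type.
    pub fn insert<T: Any>(&mut self, service: T) {
        self.services.insert(TypeId::of::<T>(), Box::new(service));
    }

    /// Returns `None` when no service of type `T` was registered.
    pub fn service<T: Any>(&self) -> Option<&T> {
        self.services
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

/// Flavour 1: a façade with named accessors, here backed by the same bag.
pub struct RuleContext<'a, N> {
    node: &'a N,
    bag: &'a ServiceBag,
}

impl<'a, N> RuleContext<'a, N> {
    pub fn new(node: &'a N, bag: &'a ServiceBag) -> Self {
        Self { node, bag }
    }

    /// The node the rule asked for.
    pub fn node(&self) -> &N {
        self.node
    }

    pub fn semantic(&self) -> Option<&SemanticModel> {
        self.bag.service::<SemanticModel>()
    }

    pub fn reference_count_map(&self) -> Option<&ReferenceCountMap> {
        self.bag.service::<ReferenceCountMap>()
    }
}
```

With the façade, a rule author discovers what is available through autocompletion on ctx; with the locator, the set of services can grow without touching RuleContext, at the cost of a lookup that can fail at run time.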

3 replies

ematipico May 24, 2022

I don't mind this solution. It is simple enough, and I suppose it doesn't involve too much work behind the scenes?

leops May 30, 2022

This is also the option I find preferable: it integrates nicely with the existing infrastructure and would allow us to continue extending the API available to individual rules with minimal disruption.
One small change I would propose is to write the argument as ctx: RuleContext<Self>, with RuleContext having a new where T: Rule trait bound and the query() method returning T::Query. Besides making the signature of run slightly shorter to write, it lets the context access all items declared in the Rule trait, including future associated types we could want to add.

xunilrj May 31, 2022
Author

I think that makes sense. My proposal at #2603 (comment) incorporates this idea.

xunilrj · 2022-05-20T18:06:43Z

xunilrj
May 20, 2022
Author

2 - Injection

It is possible to implement dependency injection for each lint method. We could do:

pub(crate) enum UnusedVariable {}
impl Rule for UnusedVariable {
    const NAME: &'static str = "unusedVariable";
    const CATEGORY: RuleCategory = RuleCategory::Lint;

    type Query = JsVariableDeclaration;
    type State = JsVariableDeclaration;
    type Services = (SemanticModel, ReferenceCountMap);

    fn run(node: &Self::Query, (_semantic_model, map): Self::Services) -> Option<Self::State> {
        match map.count_of(node.text_range()) {
            Some(0) => Some(node.clone()),
            _ => None,
        }
    }

    ...
}
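One way the analyzer could build Self::Services is a small extraction trait implemented for each service and for tuples of services. The sketch below is only an illustration under stated assumptions: FromRegistry, Registry and run_rule are hypothetical names, the Rule trait is reduced to what the example needs, and a u32 id stands in for a syntax node.

```rust
#[derive(Clone, Default, Debug, PartialEq)]
pub struct SemanticModel;

/// Hypothetical stand-in: pairs of (declaration id, number of references).
#[derive(Clone, Default, Debug, PartialEq)]
pub struct ReferenceCountMap(pub Vec<(u32, usize)>);

/// Hypothetical registry holding every service the analyzer can hand out.
#[derive(Default)]
pub struct Registry {
    pub semantic_model: SemanticModel,
    pub reference_counts: ReferenceCountMap,
}

/// Anything that can be extracted from the registry and injected into `run`.
pub trait FromRegistry: Sized {
    fn from_registry(registry: &Registry) -> Self;
}

impl FromRegistry for SemanticModel {
    fn from_registry(registry: &Registry) -> Self {
        registry.semantic_model.clone()
    }
}

impl FromRegistry for ReferenceCountMap {
    fn from_registry(registry: &Registry) -> Self {
        registry.reference_counts.clone()
    }
}

// A pair of injectable services is itself injectable.
impl<A: FromRegistry, B: FromRegistry> FromRegistry for (A, B) {
    fn from_registry(registry: &Registry) -> Self {
        (A::from_registry(registry), B::from_registry(registry))
    }
}

/// Reduced version of the `Rule` trait, for this sketch only.
pub trait Rule {
    type Query;
    type State;
    type Services: FromRegistry;

    fn run(node: &Self::Query, services: Self::Services) -> Option<Self::State>;
}

/// The driver resolves `R::Services` before calling the rule.
pub fn run_rule<R: Rule>(node: &R::Query, registry: &Registry) -> Option<R::State> {
    let services = <R::Services as FromRegistry>::from_registry(registry);
    R::run(node, services)
}

pub enum UnusedVariable {}

impl Rule for UnusedVariable {
    type Query = u32; // stand-in for a declaration node, identified by an id
    type State = u32;
    type Services = (SemanticModel, ReferenceCountMap);

    fn run(node: &u32, (_model, map): Self::Services) -> Option<u32> {
        let count = map.0.iter().find(|(id, _)| id == node).map(|(_, n)| *n);
        match count {
            Some(0) => Some(*node),
            _ => None,
        }
    }
}
```

Because the services are named in an associated type, the driver can tell before running a rule exactly which services that rule will need.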

1 reply

ematipico May 24, 2022

This option is exactly what we used to have in Rome classic, I am already familiar with the design and third party contributors too. I don't mind this solution but I don't have strong opinions, I can adapt to new designs.

xunilrj · 2022-05-20T18:06:55Z

xunilrj
May 20, 2022
Author

3 - Instance Services

Today, the lint registry does not store the rule instance. But it could, and each rule could store the services it needs.

pub(crate) struct UnusedVariable {
    map: ReferenceCountMap,
}

impl Rule for UnusedVariable {
    const NAME: &'static str = "unusedVariable";
    const CATEGORY: RuleCategory = RuleCategory::Lint;

    type Query = JsVariableDeclaration;
    type State = JsVariableDeclaration;

    fn run(&self, node: &Self::Query) -> Option<Self::State> {
        match self.map.count_of(node.text_range()) {
            Some(0) => Some(node.clone()),
            _ => None,
        }
    }

    ...
}

This would demand a specific new method like

impl UnusedVariable  {
    pub fn new(services: ServiceLocator) -> Self {
        Self {
            map: services.get(),
        }
    }
}

It seems possible to #[derive(Inject)] and automatically generate this method. There are also a lot of DI crates in the wild.

1 reply

ematipico May 26, 2022

Not sure if we want to rely on another crate for doing dependency injection, unless we want to use DI in a broader part of our tools.

xunilrj · 2022-05-20T18:07:09Z

xunilrj
May 20, 2022
Author

4 - Specific Rule Traits

Today the Rule trait is limited to no services. But we could have different rules like:

tools/crates/rome_analyze/src/registry.rs

Lines 85 to 99 in d78e802

pub(crate) trait Rule {
    /// The name of this rule, displayed in the diagnostics it emits
    const NAME: &'static str;
    /// The category this rule belong to, this is used for broadly filtering
    /// rules when running the analyzer
    const CATEGORY: RuleCategory;

    /// The type of AstNode this rule is interested in
    type Query: AstNode;
    /// A generic type that will be kept in memory between a call to `run` and
    /// subsequent executions of `diagnostic` or `action`, allows the rule to
    /// hold some temporary state between the moment a signal is raised and
    /// when a diagnostic or action needs to be built
    type State;

pub(crate) trait SemanticRule {
    const NAME: &'static str;
    const CATEGORY: RuleCategory;

    type Query: AstNode;
    type State;

    fn run(ctx: RuleContext<Self::Query>, model: SemanticModel) -> Option<Self::State>;

    ...
}

Here we have actually two choices:
1 - Be extremely strict on the second parameter, like I was;
2 - Or be looser and pass a more generic façade allowing access to multiple other services.

This option is interesting because it gives us a hardcoded categorization of the rules. For example, we know that rules that implement Rule don't depend on anything and are probably simpler. We can start running them as soon as we finish parsing.

Rules that implement SemanticRule already signalled that they want a SemanticModel. So we can generate the semantic model in advance, whilst the other rules are running, and call them when everything is ready (this does not apply to case number two above).

In this case, we could have

Rule (run after parsing);
ScopeRule (run after scope resolution of the file is done);
SemanticRule (run after the full semantic model of the file is done);
WorkspaceRule (run after the full semantic model of the workspace is done).
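The categorization above amounts to tagging every rule with the phase it has to wait for and batching rules by phase. A minimal sketch, assuming a hypothetical Phase enum and RegisteredRule record (neither exists in the codebase, and noShadow is a made-up rule name):

```rust
use std::collections::BTreeMap;

/// Hypothetical phases, declared in the order in which they complete.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Phase {
    Parsed,        // Rule
    ScopeResolved, // ScopeRule
    SemanticModel, // SemanticRule
    Workspace,     // WorkspaceRule
}

/// Hypothetical registry entry: which trait a rule implements decides its phase.
pub struct RegisteredRule {
    pub name: &'static str,
    pub phase: Phase,
}

/// Groups rule names by phase; iterating the map yields the phases in
/// declaration order, so the cheapest rules come first.
pub fn schedule(rules: &[RegisteredRule]) -> BTreeMap<Phase, Vec<&'static str>> {
    let mut batches = BTreeMap::new();
    for rule in rules {
        batches
            .entry(rule.phase)
            .or_insert_with(Vec::new)
            .push(rule.name);
    }
    batches
}
```

The first batch can start as soon as parsing finishes, while the data needed by later batches is still being computed.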

1 reply

ematipico May 24, 2022

I like this idea! The only obstacle, and it is a big one, is that the developer must know in advance which type of rule they are creating. And if you think about third-party contributors, this will create more friction.

If we decide to go this route, we need to do a good job of documenting each trait and explaining when to use each of them.

xunilrj · 2022-05-20T18:07:24Z

xunilrj
May 20, 2022
Author

A - Query-based architecture

At the beginning of the project, we discussed following a query-based architecture. We looked at https://github.com/salsa-rs/salsa and https://github.com/Adapton/adapton.rust.

We can create another specific discussion about salsa because it is not a simple library.

Nonetheless, a query-based architecture can be very beneficial, and this would be a natural place to start adopting it. In my first example, I was actually imagining a query-based architecture.

That is why I did:

fn run(ctx: crate::registry::RuleContext<Self::Query>) -> Option<Self::State> {
    let map = ctx.queries().reference_count_map();
    let map = map.value();
    let range = ctx.node().syntax().text_range();
    match map.get(&range) {
        Some(0) => Some(ctx.node().clone()),
        _ => None,
    }
}

ctx.queries().reference_count_map() behind the scenes would be something like

fn reference_count_map(&mut self) -> HashMap<TextRange, usize> {
    let events = self.queries().scope_events();
    let mut map: HashMap<TextRange, usize> = HashMap::new();
    for e in events.value().iter() {
        // TODO
    }
    map
}

fn scope_events(&mut self) -> Vec<ScopeResolutionEvent> {
    let result = self.queries().parse();
    let events: Vec<ScopeResolutionEvent> = todo!();
    events
}

I actually have a POC of this architecture working, if it makes sense to discuss this any further. The general idea is that the code above would somehow automagically be cached, and its dependencies discovered, so that we invalidate things correctly. An example of the dependency graph that my POC is generating:

In this particular case, updating a file invalidates its "parse", which propagates and invalidates everything. We can be smarter than that, but that is fine for now.
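The coarse invalidation described here (a file update invalidates "parse" and everything derived from it) can be sketched with hand-written caches. This is not the POC and not salsa: QueryDb and its methods are hypothetical, "parsing" is just splitting on whitespace, the "reference counts" are plain token counts, and the dependency between the two queries is hard-coded rather than discovered.

```rust
use std::collections::HashMap;

/// Toy query database: `parse` and `reference_counts` are derived from
/// `source` and cached until the source changes.
#[derive(Default)]
pub struct QueryDb {
    source: String,
    parse_cache: Option<Vec<String>>,
    counts_cache: Option<HashMap<String, usize>>,
    /// Number of times `parse` really ran, so caching is observable.
    pub parse_runs: usize,
}

impl QueryDb {
    /// Updating the file invalidates `parse` and everything derived from it.
    pub fn set_source(&mut self, text: &str) {
        self.source = text.to_string();
        self.parse_cache = None;
        self.counts_cache = None;
    }

    /// Stand-in for the real parser: splits the source on whitespace.
    pub fn parse(&mut self) -> Vec<String> {
        if let Some(tokens) = &self.parse_cache {
            return tokens.clone();
        }
        self.parse_runs += 1;
        let tokens: Vec<String> = self
            .source
            .split_whitespace()
            .map(str::to_string)
            .collect();
        self.parse_cache = Some(tokens.clone());
        tokens
    }

    /// Derived query: depends on `parse`, recomputed only after invalidation.
    pub fn reference_counts(&mut self) -> HashMap<String, usize> {
        if let Some(counts) = &self.counts_cache {
            return counts.clone();
        }
        let mut counts = HashMap::new();
        for token in self.parse() {
            *counts.entry(token).or_insert(0) += 1;
        }
        self.counts_cache = Some(counts.clone());
        counts
    }
}
```

Calling reference_counts() twice in a row parses only once; calling set_source in between forces both queries to run again. Being "smarter than that" would mean invalidating a derived query only when the inputs it actually read have changed.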

5 replies

ematipico May 24, 2022

How would this work with multiple rules that interrogate the same portion of code/scope? As far as I understand, a query-based architecture is beneficial in the sense that we can cache queries along the way and reuse things that have already been requested. However, we also need to think about when we invalidate the cache.

Given the following example:

function f() {
	let a = { c: "something" };
	let b;
	delete a.c;
}

Now, let's suppose that in this snippet we apply two rules: noUnusedVariables and noDelete (let's skip the rule details at the moment). The first one will be applied to line let b and the second one will be applied to line delete a.c.

Now, let's suppose also that both rules have safe fixes (in a real-world example they don't, but for the sake of the example let's assume they have). The first rule will execute and apply the fix. While doing so it will remove let b; from the source code.

Now we run the second rule and we apply the safe fix. This will transform delete a.c to a.c = undefined.

Given this, my question is the following: when executing the second rule and we query the current scope, how many scoped variables do we have?
When we run the first rule, we have two scoped variables: a and b. The first rule removes let b; from the source code (the CST), which means that now we should have only one scoped variable, a. When executing the second rule, is the mapped scope invalidated and updated (meaning it has only a as a scoped variable)?

leops May 24, 2022

With the current architecture of the analyzer, code actions are emitted as "signals" and it's up to whatever is consuming those (the CLI or Language Server) to apply them. In order to apply two different fixes the analyzer needs to be run two times: the first run will emit a code action, applying this code action creates a new version of the tree, and this new tree needs to be run through the analyzer again to pull the second action and apply it. This means that by the time the second rule is run, it does so on a version of the syntax tree (and its derived semantic tree) that doesn't have the b symbol defined.
If we add caching and incremental analysis to speed things up, from the point of view of the rules it should always behave as if they were accessing a consistent view of the "latest" revision of the syntax and semantic trees.
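The run, apply, re-run cycle can be sketched as a loop in the consumer. This is a toy model under loud assumptions: source lines stand in for the syntax tree, Action, analyze and fix_all are hypothetical names, and the two "rules" are string matches that mimic the example from the question above.

```rust
/// A code action: replace line `line` with `replacement` (`None` deletes it).
#[derive(Debug, Clone, PartialEq)]
pub struct Action {
    pub rule: &'static str,
    pub line: usize,
    pub replacement: Option<String>,
}

/// Toy analyzer: stands in for running every rule over one tree revision.
pub fn analyze(lines: &[String]) -> Vec<Action> {
    let mut signals = Vec::new();
    for (index, line) in lines.iter().enumerate() {
        let code = line.trim();
        if code == "let b;" {
            // Pretend `b` is known to be unused.
            signals.push(Action {
                rule: "noUnusedVariables",
                line: index,
                replacement: None,
            });
        } else if let Some(target) = code
            .strip_prefix("delete ")
            .and_then(|rest| rest.strip_suffix(';'))
        {
            signals.push(Action {
                rule: "noDelete",
                line: index,
                replacement: Some(format!("{target} = undefined;")),
            });
        }
    }
    signals
}

/// The consumer applies one action, then re-runs the analyzer on the new
/// revision, until no signal is left. Returns the fixed lines and the rules
/// whose actions were applied, in order.
pub fn fix_all(mut lines: Vec<String>) -> (Vec<String>, Vec<&'static str>) {
    let mut applied = Vec::new();
    loop {
        let action = match analyze(&lines).into_iter().next() {
            Some(action) => action,
            None => break,
        };
        match action.replacement {
            Some(text) => lines[action.line] = text,
            None => {
                lines.remove(action.line);
            }
        }
        applied.push(action.rule);
    }
    (lines, applied)
}
```

Each pass only ever sees one revision of the source, so the second rule never observes the removed let b; line.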

MichaReiser May 25, 2022

What are your thoughts on how to support different languages, where many queries may only make sense for a specific language?

Would it be possible for consumers to extend the available queries, so that rome only needs to expose the base queries and other crates can build more fine-grained queries on top of them?

xunilrj May 26, 2022
Author

Given this, my question is the following: when executing the second rule and we query the current scope, how many scoped variables do we have?

point of view of the rules it should always behave as if they were accessing a consistent view of the "latest" revision of the syntax and semantic trees.

As @leops said.
Just one. Just a.

So if for some reason the second rule is affected by the first fix, its diagnostic would vanish. Take something like "tooManyLocalVariables": after removing all unused variables, this rule may pass. It will never know about the removed variables, because it queries for how many local variables a block has.

xunilrj May 26, 2022
Author

how to support different languages where many queries may only make sense for a specific language.
other crates can build more fine-grain queries on top of it.

If we decide on a unique façade, it can vary by language as SyntaxNode does.
Something like

pub trait LanguageRuleContext {
    type Queries;
}

pub struct RuleContext<Language: LanguageRuleContext> {
    ...
}

impl<Language: LanguageRuleContext> RuleContext<Language> {
    pub fn queries(&self) -> Language::Queries {
        todo!()
    }
}

pub struct JsRuleContext {
    
}

impl LanguageRuleContext for JsRuleContext {
    type Queries = ();
}

fn main() {
    let ctx: RuleContext<JsRuleContext> = RuleContext {
        ...
    };
}

That would enable the JsRuleContext to exist inside any rome_js_* crate.

We may need to break the analyzer crate in two: one at the bottom, which the language crates depend on, and another on top, which depends on each language crate.

xunilrj · 2022-05-20T18:07:43Z

xunilrj
May 20, 2022
Author

B - Async/Await

Although there is no IO when linting (or very little), lint rules could benefit from being async. Specifically, if we choose a query-based architecture.

The very first lint rule can ask for the semantic model of all files and that would jam the pipeline because today we don't parallelize rules.

One easy way to avoid this problem is to make the rules async.

 async fn run(ctx: crate::registry::RuleContext<Self::Query>) -> Option<Self::State> {
    let map = ctx.queries().reference_count_map().await;
    let map = map.value();
    let range = ctx.node().syntax().text_range();
    match map.get(&range) {
        Some(0) => Some(ctx.node().clone()),
        _ => None,
    }
}

The code above is exactly the same as my first example, but with async/await. The good thing here is that this rule, which demands the scope resolution and the reference count map, would wait until everything is ready without blocking a thread.

And the code that runs the rules can now very easily just do:

for rule in rules {
    tokio::spawn(...);
}

And the async runtime will handle the scheduling of everything. So if other rules also query the same reference count map, they will all wait together.
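The "they will all wait together" behaviour can be illustrated without an async runtime. The sketch below swaps in plain OS threads and std::sync::OnceLock instead of tokio tasks, so a waiting rule here does block its thread, which is exactly the cost an async runtime would avoid. Queries, its computations counter and run_rules are hypothetical names, and the map contents are made up.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Shared, lazily computed query results.
#[derive(Default)]
pub struct Queries {
    reference_counts: OnceLock<HashMap<&'static str, usize>>,
    /// How many times the map was really computed.
    pub computations: AtomicUsize,
}

impl Queries {
    /// The first caller computes the map; concurrent callers wait for that
    /// same computation instead of starting their own.
    pub fn reference_count_map(&self) -> &HashMap<&'static str, usize> {
        self.reference_counts.get_or_init(|| {
            self.computations.fetch_add(1, Ordering::SeqCst);
            // Made-up result: `a` is referenced once, `b` never.
            HashMap::from([("a", 1), ("b", 0)])
        })
    }
}

/// Runs `rule_count` copies of an "unused variable" rule, one per thread.
/// Each returns the sorted names of variables with zero references.
pub fn run_rules(queries: &Queries, rule_count: usize) -> Vec<Vec<&'static str>> {
    thread::scope(|scope| {
        let handles: Vec<_> = (0..rule_count)
            .map(|_| {
                scope.spawn(|| {
                    let mut unused: Vec<&'static str> = queries
                        .reference_count_map()
                        .iter()
                        .filter(|(_, count)| **count == 0)
                        .map(|(name, _)| *name)
                        .collect();
                    unused.sort();
                    unused
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("rule thread panicked"))
            .collect()
    })
}
```

However many rules ask for the map, it is computed once. In an async version, the waiting rules would be suspended tasks rather than parked threads.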

There are complications, of course. How do we cancel all this? Probably everything will need to return a Result<>.

2 replies

ematipico May 24, 2022

Do we need async/await to parallelize the execution of rules?

The rules that don't require the semantic model can be executed in parallel by using rayon with an iterator (just an idea). When all the previous rules have finished, the rules that require the semantic model can be executed in sequence using a queue system. Each rule needs the previous one because their safe fix might change the semantic model.

How will tokio help? I am trying to understand 😅

xunilrj May 26, 2022
Author

The rules that don't require semantic model can be executed in parallel by using rayon

Yes, but the problem is not so much how to parallelize, which is the easy part, but how not to block when running queries.

As an easy example, suppose we run a small pool with 3 threads. Suppose we are very unlucky, and the first 3 rules running in parallel run the same query and block whilst the query is running. Now the whole analyzer is blocked, because all threads in the pool are blocked.

An async runtime would solve this, because it would block tasks and never threads. In our unlucky scenario, the first three rules are blocked waiting for the query, but all threads in the pool are free and are already starting the next batch of rules.

leops · 2022-05-23T07:45:14Z

leops
May 23, 2022

How I originally envisioned this to work when designing the Rule trait in the analyzer is that while the Query associated type is currently bound to AstNode, it would eventually use a different Queryable trait instead.
This trait would be sealed (can only be implemented within rome_analyze) and its "signature" would be conceptually undefined (it may define any number of methods and associated constants if that's needed to speed up the querying process, and external crates shouldn't depend on those). With the current version of the analyzer this trait would only need to be implemented for a newtype struct Syntax<N: AstNode>(N) to let existing rules query for a certain AST node type.
In order to integrate with the semantic model a second struct Semantic<N: SemanticNode>(N) would be defined to let rules query for certain "semantic nodes": typed declarations, references and scopes.
I think it could also be useful to implement this trait on tuples to let rules have queries like (Syntax<JsFunctionDeclaration>, Semantic<Declaration>) to search for semantic declaration nodes that are also function declaration syntax nodes (it wouldn't be mandatory to use this feature to access the syntax tree though, I imagine that all semantic nodes would have a syntax() method to access the underlying syntax node)
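A minimal sketch of that idea, using the usual sealed-trait pattern (a public supertrait inside a private module). The traits here are reduced stand-ins: the KIND constants and the describe method are hypothetical and exist only so the example has something observable, and the real Queryable would expose whatever the query engine needs.

```rust
mod sealed {
    /// Lives in a private module, so other crates cannot name it and
    /// therefore cannot implement `Queryable`.
    pub trait Sealed {}
}

// Reduced stand-ins for the real traits; `KIND` is a hypothetical constant.
pub trait AstNode {
    const KIND: &'static str;
}

pub trait SemanticNode {
    const KIND: &'static str;
}

/// Query for a syntax node of type `N`.
pub struct Syntax<N: AstNode>(pub N);

/// Query for a semantic node of type `N`.
pub struct Semantic<N: SemanticNode>(pub N);

pub trait Queryable: sealed::Sealed {
    /// Hypothetical method: describes what the query matches.
    fn describe() -> String;
}

impl<N: AstNode> sealed::Sealed for Syntax<N> {}
impl<N: AstNode> Queryable for Syntax<N> {
    fn describe() -> String {
        format!("syntax:{}", N::KIND)
    }
}

impl<N: SemanticNode> sealed::Sealed for Semantic<N> {}
impl<N: SemanticNode> Queryable for Semantic<N> {
    fn describe() -> String {
        format!("semantic:{}", N::KIND)
    }
}

// A pair of queries matches nodes that satisfy both.
impl<A: Queryable, B: Queryable> sealed::Sealed for (A, B) {}
impl<A: Queryable, B: Queryable> Queryable for (A, B) {
    fn describe() -> String {
        format!("{} & {}", A::describe(), B::describe())
    }
}

pub struct JsFunctionDeclaration;
impl AstNode for JsFunctionDeclaration {
    const KIND: &'static str = "JsFunctionDeclaration";
}

pub struct Declaration;
impl SemanticNode for Declaration {
    const KIND: &'static str = "Declaration";
}
```

A rule would then declare something like type Query = (Syntax<JsFunctionDeclaration>, Semantic<Declaration>), and the analyzer stays free to change the contents of Queryable without breaking external crates.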

As for how the actual semantic model would be built and used, I think we could have a hybrid model where parts of it are computed eagerly (enumerating all scopes, declarations, and references for instance, as that information would act as an entry point for subsequent analysis, including "syntax" rules checking for duplicate declarations and undefined references that we probably want to run unconditionally on most files anyway) and parts of it are computed lazily (listing all references to a certain declaration, or all declarations within a given scope for instance). This then flows back into the above proposal for query-based architecture and async / await.

0 replies

ematipico · 2022-05-24T09:02:29Z

ematipico
May 24, 2022

I see that you use the word "services" many times. Could you please expand on what these services are? What should we expect from them, and what can we request from them?

1 reply

xunilrj May 26, 2022
Author

I think I am being vague on purpose because I don't know what we want to offer.

We know that we want to offer some lazily evaluated queries. So these queries will be in these "services".

But other examples that come to mind are access to a cache, maybe allowing file access, maybe we want to allow "fixes" to call API to open issues in other systems etc...

Honestly, it is just a façade to a bag of services we want to offer to rules.

xunilrj · 2022-05-31T08:25:46Z

xunilrj
May 31, 2022
Author

The RuleContext proposal seems to be the preferred one. Merging the need for multiple languages and the idea of the context having access to the Rule type, we would have something like the code below.

Is this solution fine with everyone? @rome/staff
I can create the PR as soon as everyone gives an ok.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=2feac6e2a74eb9070944797e45d1cb42

use std::marker::*;
use std::default::Default;

fn main() {
    // This is the context, that each lint rule will have access...
    let ctx: RuleContext<SomeJsRule> = RuleContext::default();
    
    // We can get the requested node like this
    let _q = ctx.query();
    
    // We can access any service like this
    let _svcs = ctx.get::<SomeGenericService>();
    
    // We can offer a better api tailored for each language like this
    let _tree = ctx.parsed_tree();
    let _model = ctx.semantic_model();
    
    // A CSS rule for example would have css specific methods like this
    let ctx: RuleContext<SomeCssRule> = RuleContext::default();
    ctx.all_that_change_background();
}

// Defines the languages we support
pub struct Js;
pub struct Css;

// Define the AstNode
pub trait AstNode {
    type Language;
}

// Define a fake node for Js
#[derive(Default)]
pub struct JsSomeNode;
impl AstNode for JsSomeNode {
    type Language = Js;
}

// Define a fake node for Css
#[derive(Default)]
pub struct CssSomeNode;
impl AstNode for CssSomeNode {
    type Language = Css;
}

// Define the Rule Trait
pub trait Rule {
    type Query: AstNode;
}

// Define a fake Js Rule
#[derive(Default)]
struct SomeJsRule;
impl Rule for SomeJsRule {
    type Query = JsSomeNode;
}

// Define a fake Css Rule
#[derive(Default)]
struct SomeCssRule;
impl Rule for SomeCssRule {
    type Query = CssSomeNode;
}

// Define a fake generic service that will be available for all rules
#[derive(Default)]
pub struct SomeGenericService;

// The RuleContext implementation

#[derive(Default)]
pub struct RuleContext<TRule>
    where TRule: Rule
{
    phantom: PhantomData<TRule>
}

impl<TRule> RuleContext<TRule> 
    where TRule: Rule
{
    // This returns the node requested in the Rule
    pub fn query(&self) -> TRule::Query
        where <TRule as Rule>::Query: Default
    {
        Default::default()
    }

    // Generic method that retrieves a service by its type, like
    // a service locator.
    pub fn get<T>(&self) -> Option<T>
        where T: Default
    {
        Some(Default::default())
    }
}

// Methods/Services specific for Javascript
trait JsRuleContext {
    fn parsed_tree(&self);
    fn semantic_model(&self);
}

impl<TRule>  JsRuleContext for RuleContext<TRule> 
    where 
        TRule: Rule,
        <TRule as Rule>::Query: AstNode<Language = Js>
{
    fn parsed_tree(&self) {}
    fn semantic_model(&self) {}
}

// Methods/Services specific for CSS
trait CssRuleContext {
    fn all_that_change_background(&self);
}

impl<TRule> CssRuleContext for RuleContext<TRule> 
    where 
        TRule: Rule,
        <TRule as Rule>::Query: AstNode<Language = Css>
{
    fn all_that_change_background(&self) {}
}

0 replies
