-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use salsa accumulators for diagnostics #14760
base: main
Are you sure you want to change the base?
Conversation
7db3778
to
1f6247c
Compare
CodSpeed Performance ReportMerging #14760 will degrade performances by 10.43%Comparing Summary
Benchmarks breakdown
|
|
1d26d75
to
040f4d9
Compare
5cbbfc2
to
e0f43ee
Compare
e0f43ee
to
35f3815
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of report_
everywhere makes me feel like I'm reading the pyright codebase haha. But in most cases it seems like it probably is the best verb to use. Still, there's a couple of places where I think we could use some better names:
pub struct CompileDiagnostic(std::sync::Arc<dyn Diagnostic>); | ||
|
||
impl CompileDiagnostic { | ||
pub fn report<T>(db: &dyn Db, diagnostic: T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is report
the best verb here? It feels to me like we're adding a diagnostic to be reported later, rather than reporting it to the user immediately. Maybe this could be called CompileDiagnostic::add()
?
format_args!("Name `{id}` used when not defined"), | ||
); | ||
/// Reports a diagnostic for the given node and file if diagnostic reporting is enabled for this file. | ||
pub(crate) fn report_type_diagnostic( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly here, add_type_diagnostic
would be a more natural name for me
Hmm, I'm looking into suppressions right now, specifically how we'd recognize unused suppression comments. Another alternative to emitting the diagnostic is to emit a |
diagnostic | ||
})); | ||
let mut diagnostics = check_types::accumulated::<CompileDiagnostic>(db, test_file.file); | ||
// Filter out diagnostics that are not related to the current file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only tangentially related: do we have a feeling for how many unrelated-file diagnostics we generate typically? If this would be a large fraction of all diagnostics, we should probably detect that earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be any number of diagnostics and salsa doesn't provide a way to detect this earlier.
Salsa accumulators work by traversing the entire dependency tree of a query. In our case, this is check_file
. check_file
can branch out into arbitrary files when resolving imports (and checking those files in turn). The only way for us to truncate this earlier is by not using accumulators because salsa doesn't know about the file boundaries.
I don't think this is very terrible because the CLI uses check_workspace
to get all diagnostics. The LSP uses check_file
today but I don't think it should. The wasm playground uses check_file
... this is probably fine
use crate::Db; | ||
|
||
type AnnotationParseResult = Result<Parsed<ModExpression>, TypeCheckDiagnostics>; | ||
type AnnotationParseResult = Result<Parsed<ModExpression>, ()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Something like
type AnnotationParseResult = Result<Parsed<ModExpression>, ()>; | |
struct StringAnnotationParseError; | |
type AnnotationParseResult = Result<Parsed<ModExpression>, StringAnnotationParseError>; |
would maybe make match
es at the call sites of this function a bit easier to understand
diagnostics.sort_unstable_by(|a, b| { | ||
let a_file = a.file(); | ||
let b_file = b.file(); | ||
|
||
a_file.cmp(&b_file).then_with(|| { | ||
a_file | ||
.path(db) | ||
.as_str() | ||
.cmp(b_file.path(db).as_str()) | ||
.then_with(|| { | ||
a.range() | ||
.unwrap_or_default() | ||
.start() | ||
.cmp(&b.range().unwrap_or_default().start()) | ||
}) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions:
- Do we really want an unstable sort here? Two different diagnostics could compare equal according to the ordering defined here if they refer to the same range, or even just the same range start.
- Why do we order by file and then by the path of the file? Can multiple files refer to the same path?
- If performance is not a major concern here, we could potentially do something like
diagnostics.sort_unstable_by_key(|diagnostic| { let file = diagnostic.file(); let path = file.path(db).as_str(); let range_start = diagnostic.range().unwrap_or_default().start(); (file, path, range_start) });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we order by file and then by the path of the file? Can multiple files refer to the same path?
Oh, that's just wrong. The idea was to short-cut if the files are identical.
Do we really want an unstable sort here? Two different diagnostics could compare equal according to the ordering defined here if they refer to the same range, or even just the same range start.
I don't think it's important for us to preserve the original order if there are multiple diagnostics at the same line, for as long as the order is deterministic (Running Red Knot multiple times should give you the same result). We should probably include the rule code as well as disambiguator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for as long as the order is deterministic
Exactly, that was my concern.
We should probably include the rule code as well as disambiguator
👍
Summary
This is a reimplementation of #14116 now that Salsa salsa accumulators are cheaper (close to "free" for queries without diagnostics).
The main benefit of salsa accumulators is that they're an easy way to emit a diagnostic from anywhere inside a salsa query, and salsa takes care to deduplicate diagnostics if the same query is called multiple times.
Performance
This still significantly regresses the incremental performance. I created another salsa PR to reduce the regression to 9%. I don't think we can do much better except reducing the number of inputs by using coarse-grained salsa dependencies because:
Considering this, I'm still leaning towards migrating to salsa accumulators because of what they unlock: We can now emit diagnostics from any part of the code. This includes the module resolver (e.g. emit a warning if we find an invalid
pth
file?)Other changes
This PR upgrades Salsa to a version with "cheap" accumulators. The new salsa version now requires that
Db
structs implementClone
for its parallel db support (which doesn't seem to work yet).Test Plan
cargo test