Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix verification of tracked struct from high-durability query #550

Closed
wants to merge 2 commits into from

Conversation

carljm
Copy link
Contributor

@carljm carljm commented Aug 2, 2024

This adds a test for, and proposes one possible fix for, a bug we ran into with durabilities where we were hitting the "access struct from previous revision" assert.

The test models a situation where we have two File inputs (0, 1), where File(0) has LOW durability and File(1) has HIGH durability. We can query an index for each file, and a definitions from that index (just a sub-part of the index), and we can infer each file. The index and definitions queries depend only on the File they operate on, but the infer query has some other dependencies: infer(0) depends on infer(1), and infer(1) also depends directly on File(0). The dependency graph is shown below: light color represents low durability, black represents high durability, dark gray are structs and struct fields; I didn't get around to adding their durability to the graph yet.

(The graph is automatically generated using some hacky printlns I added to Salsa that create a mermaid.live graph; if I end up debugging another issue where it's useful, I'll probably clean this code up and try to make it generally reusable.)

Screenshot 2024-08-01 at 10 19 02 PM

The panic occurs because definitions(1) is high durability, and depends on index(1) which is also high durability. index(1) creates the tracked struct Definition(1), and infer(1) (which is low durability) depends on Definition.file(1).

After a change to file 0 (low durability), we only shallowly verify definitions(1) -- it doesn't need deep-verify because it is high durability. So that means we never verify index(1) at all (deeply or shallowly), which means we never mark Definition(1) validated. So when we deep-verify infer(1), we try to access its dependency Definition.file(1), and hit the panic because we are accessing a tracked struct that has never been re-validated or re-recreated in R2.

The proposed fix is that in maybe_changed_after for a tracked struct field, rather than checking that the struct has been created/validated at the current revision, check that it has been created/validated at least as recently as its durability last changed, and if that is older than the current revision, validate it. The logic here is that this tracked struct can't have changed unless some input at or higher than its durability has changed.

The other possible fix I considered (but haven't tried implementing yet) is that when we short-circuit a shallow verify due to durability, we need to not only mark our own outputs validated, but recursively find all our inputs that also verify due to durability, and validate all of their outputs as well. That fix seems more complex and expensive than what I have in this PR.

Copy link

netlify bot commented Aug 2, 2024

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit 5c541dc
🔍 Latest deploy log https://app.netlify.com/sites/salsa-rs/deploys/66bcc55a4b54c600086ce361

Copy link

codspeed-hq bot commented Aug 2, 2024

CodSpeed Performance Report

Merging #550 will not alter performance

Comparing carljm:struct-panic-main (5c541dc) with master (08820ea)

Summary

✅ 8 untouched benchmarks

@carljm carljm force-pushed the struct-panic-main branch from 924b53d to b30e9a7 Compare August 2, 2024 06:32
@carljm carljm marked this pull request as ready for review August 2, 2024 06:47
Copy link
Contributor

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I don't think I understand salsa well enough at this point to say with high confidence that this is correct. We may have to wait for @nikomatsakis's opinion (but can merge the fix in our fork to unblock)

src/tracked_struct/struct_map.rs Outdated Show resolved Hide resolved
src/tracked_struct/struct_map.rs Outdated Show resolved Hide resolved
src/tracked_struct/struct_map.rs Outdated Show resolved Hide resolved
@@ -201,6 +209,50 @@ where
unsafe { C::struct_from_raw(data.as_raw()) }
}

fn get_and_validate_last_changed<'db>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find comment describing how it is different from get_from_map useful because the difference is very subtle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comment.

One of my questions here for @nikomatsakis is whether instead of adding a new get_and_validate_last_changed, maybe the implementation of get_from_map should just always do this. It's possible that other code paths that just call .get are still susceptible to this bug?

But it's also a little weird if a method called get can mutate the stored data.

src/tracked_struct/struct_map.rs Outdated Show resolved Hide resolved
let id = input.unwrap();
let data = self.struct_map.get(current_revision, id);
let data = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand salsa enough to make a real assessment but changing the last changed in maybe_changed_after seems slightly off to me.

I also wonder if we'll run into problems with it if the setup is slightly different because, as far as I understand, maybe_changed_after is only called from deep verify memo. That means, it would get by passed if all queries only perform a shallow compare. I'm not sure if that matters for tracked fields or if some other mechanism would take over.

But maybe it is allright because it is kind of similar to what FunctionIngredient::maybe_changed_after does.

Copy link
Contributor Author

@carljm carljm Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if we'll run into problems with it if the setup is slightly different because, as far as I understand, maybe_changed_after is only called from deep verify memo. That means, it would get by passed if all queries only perform a shallow compare.

Yes, this relates to my question above about whether it's possible for other code paths besides maybe_changed_after which call get to run into this bug.

But I think it is safe as-is, because any query of low enough durability to possibly re-run in the current revision will also go through deep verify, which will check maybe_changed_after on all of its inputs. So a struct that never gets maybe_changed_after called on it in a given revision can't be accessed in that revision. I think?

But maybe it is allright because it is kind of similar to what FunctionIngredient::maybe_changed_after does.

Yes, it seems like it is already part of the contract of maybe_changed_after (and verification in general) that it can validate things to the current revision.

@nikomatsakis
Copy link
Member

Will attempt to read this tomorrow-ish.

@carljm carljm force-pushed the struct-panic-main branch from b2ec4b9 to 1ac16e2 Compare August 5, 2024 16:39
@MichaReiser
Copy link
Contributor

I don't have a good repro yet but I just ran into the following error.

2024-08-06 13:52:36.553 [info]    0: red_knot_server::server::Server::run::{{closure}}
             at /home/micha/astral/ruff/crates/red_knot_server/src/server.rs:140:29
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/alloc/src/boxed.rs:2077:9
   2: std::panicking::rust_panic_with_hook
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:799:13
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:656:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/sys_common/backtrace.rs:171:18
   5: rust_begin_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:652:5
   6: core::panicking::panic_fmt
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:72:14
   7: salsa::tracked_struct::struct_map::StructMap<C>::get_from_map
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/tracked_struct/struct_map.rs:200:9
   8: salsa::tracked_struct::struct_map::StructMap<C>::get
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/tracked_struct/struct_map.rs:177:9
   9: salsa::tracked_struct::IngredientImpl<C>::lookup_struct
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/tracked_struct.rs:382:9
  10: red_knot_python_semantic::semantic_index::definition::_::<impl salsa::id::LookupId for red_knot_python_semantic::semantic_index::definition::Definition>::lookup_id
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/components/salsa-macro-rules/src/setup_tracked_struct.rs:144
2024-08-06 13:52:36.553 [info] :21
  11: <red_knot_python_semantic::types::infer::infer_definition_types::Configuration_ as salsa::function::Configuration>::id_to_input
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/components/salsa-macro-rules/src/setup_tracked_fn.rs:196:29
  12: salsa::function::execute::<impl salsa::function::IngredientImpl<C>>::execute::{{closure}}
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/execute.rs:46:58
  13: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panic/unwind_safe.rs:272:9
  14: std::panicking::try::do_call
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:559:40
  15: __rust_try
  16: std::panicking::try
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:523:19
  17: std::panic::catch_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panic.rs:149:14
  18: salsa::cycle::Cycle::catch
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/cycle.rs:42:15
  19: salsa::function::execute::<impl salsa::function::IngredientImpl<C>>::execute
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/execute.rs:46:27
  20: salsa::function::fetch::<impl salsa::function::IngredientImpl<C>>::fetch_cold
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/fetch.rs:104:14
  21: salsa::function::fetch::<impl salsa::function::IngredientImpl<C>>::compute_value::{{closure}}
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/fetch.rs:46:29
  22: core::option::Option<T>::or_else

Which may indicate that we need to validate all tracked structs?

@carljm
Copy link
Contributor Author

carljm commented Aug 6, 2024

Yeah, that traceback does suggest that there's still a path to accessing an un-validated struct :/

When you say you don't have a good repro, do you mean that you can't reproduce it consistently at all? That's going to make things difficult...

@carljm carljm force-pushed the struct-panic-main branch from 1ac16e2 to 986fddf Compare August 6, 2024 16:30
@MichaReiser
Copy link
Contributor

@carljm I haven't spent time on finding a consistent repro. I ran into it when I switched between different files in the LSP

Copy link
Member

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I get the overall bug. Let me think on this.

@nikomatsakis
Copy link
Member

nikomatsakis commented Aug 8, 2024 via email

@carljm
Copy link
Contributor Author

carljm commented Aug 8, 2024

Always updating on the get path makes sense to me and should simplify this PR. I'll update to that approach.

@carljm carljm force-pushed the struct-panic-main branch from 986fddf to 8d553da Compare August 13, 2024 00:30
@carljm
Copy link
Contributor Author

carljm commented Aug 13, 2024

Updated to always validate on get. Some things I'm not sure about here:

  1. This means there can be cases (as in the added test) where the same struct is validated on get, and then later re-validated via mark_validated_output in the same revision. This means validate can no longer assert that the struct isn't already validated. I'm not sure what the comment "Never update a struct twice in the same revision" was attempting to prevent, but it doesn't seem like setting data.created_at = current_revision when they are already equal is going to cause a problem?

  2. I wondered whether validating on get might mean we no longer need mark_validated_output on tracked structs at all, but making it a no-op caused another test (deletion-cascade) to fail with the "access obsolete struct" panic.

  3. This change means both get and update now need to take &Zalsa rather than just a current_revision, so that in get_from_map we can call zalsa.last_changed_revision with the durability, which we only have access to after we get the data out of the map.

@carljm
Copy link
Contributor Author

carljm commented Aug 13, 2024

Test failure looks like a compiler improvement in nightly catching an unmatchable pattern, not related to this PR.

@davidbarsky
Copy link
Contributor

Test failure looks like a compiler improvement in nightly catching an unmatchable pattern, not related to this PR.

I've ignored it; it should be fixed soonish: rust-lang/rust#129031. If you rebase, CI will pass.

@carljm carljm force-pushed the struct-panic-main branch from 8d553da to 5c541dc Compare August 14, 2024 14:55
@nikomatsakis
Copy link
Member

I THINK this is fixed by #564

@MichaReiser
Copy link
Contributor

I can confirm this is fixed (https://github.com/astral-sh/ruff/pull/13016/files).

Did we add a test for it?

@nikomatsakis
Copy link
Member

@MichaReiser I don't think I added a specific test, I'm going to close this PR, but a PR with a test would be great

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants