Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: tracker uses unfinalised sc stream #4146

Merged
merged 14 commits into from
Nov 7, 2023

Conversation

msgmaxim
Copy link
Contributor

Pull Request

Closes: PRO-923

Checklist

Please conduct a thorough self-review before opening the PR.

  • I am confident that the code works.
  • I have updated documentation where appropriate.

Summary

As @AlastairHolmes suggested, I changed state chain client to return both finalised and unfinalised streams. Now StateChainStreamApi has a bool parameter to specify if the blocks should be finalized and it can be used to restrict certain api to only allow finalized streams. IngressAddresses and EgressItems are examples where both finalized and unfinalised streams are supported, and the tracker now uses unfinalized streams for them.

Note that I had to change blanked impl of StateChainStreamApi and implement StateChainStreamApi directly on InnerCachedStream, because otherwise FinalizedCachedStream would also get impl for StateChainStreamApi (on top of StateChainStreamApi), which seemed less then ideal (though it would still work).

@linear
Copy link

linear bot commented Oct 20, 2023

PRO-923 Remove Block Delay in Ingress-Egress Tracker

One of the design choices for PRO-868 was to re-use as much as possible of the existing engine witnesser, and remove safety margin so that blocks are processed as fast as possible (which is the whole point of the tracker). I recently noticed that even without safety margin, engine's witnesser delays blocks by 20 seconds or so on average (specifically, it waits for chain tracking for that block +1 to have made it into SC's finalised state). This seems to be done to ensure we have the full list of deposit addresses before we process them.

For tracking, however, this seems way too conservative. Most of the time, by the time users send funds to a deposit address, it will have already been recorded on SC (in an unfinalised block, at least). I imagine one potential solution is to somehow specialise tracker's witnesser obtain addresses from unfinalised SC storage and wait for unfinalised chain tracking for blocks. (Reorgs should be somewhat rare with Aura, right?)

kylezs

@codecov
Copy link

codecov bot commented Oct 20, 2023

Codecov Report

Merging #4146 (19f3350) into main (8270431) will decrease coverage by 0%.
The diff coverage is 32%.

@@          Coverage Diff           @@
##            main   #4146    +/-   ##
======================================
- Coverage     72%     72%    -0%     
======================================
  Files        381     383     +2     
  Lines      62151   62349   +198     
  Branches   62151   62349   +198     
======================================
+ Hits       44520   44635   +115     
- Misses     15308   15377    +69     
- Partials    2323    2337    +14     
Files Coverage Δ
engine/src/state_chain_observer/test_helpers.rs 100% <100%> (ø)
utilities/src/with_std.rs 75% <ø> (+3%) ⬆️
api/lib/src/lib.rs 32% <0%> (ø)
api/lib/src/queries.rs 20% <0%> (ø)
engine/src/witness/eth/key_manager.rs 0% <0%> (ø)
...gine/src/state_chain_observer/sc_observer/tests.rs 88% <75%> (-<1%) ⬇️
...nked_chain_source/chunked_by_vault/egress_items.rs 0% <0%> (ø)
...ne/src/state_chain_observer/client/base_rpc_api.rs 1% <0%> (-<1%) ⬇️
...rc/state_chain_observer/client/finalized_stream.rs 59% <59%> (ø)
...chain_source/chunked_by_vault/deposit_addresses.rs 0% <0%> (ø)
... and 1 more

... and 3 files with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@kylezs
Copy link
Contributor

kylezs commented Oct 20, 2023

Just realised this delay issue due to the SC finalised stream is present in the pre-witnessing too. Would be good to apply this change there as well (can be in a separate PR).

(PRO-929)

@msgmaxim msgmaxim force-pushed the feat/tracker-unfinalised-sc-stream branch from aec65ce to abe0832 Compare October 27, 2023 04:23
Comment on lines +96 to +98
StateChainStream: StateChainStreamApi<FINALIZED>,
StateChainClient: StorageApi + Send + Sync + 'static,
const FINALIZED: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
StateChainStream: StateChainStreamApi<FINALIZED>,
StateChainClient: StorageApi + Send + Sync + 'static,
const FINALIZED: bool,
StateChainStream: StateChainStreamApi,
StateChainClient: StorageApi + Send + Sync + 'static,

You don't need to mention FINALIZED.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with the other cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? I'm getting "missing generics for trait StateChainStreamApi" error as expected.

@AlastairHolmes
Copy link
Contributor

I'm not going to be able to get to this for a few days, due to higher priority work (LP api). There is an important issue with the unfinalized stream basically that we shouldn't try to fill in gaps in the block stream. Also I'd like the initialization code to be cleaned up before it is merged as I feel it is adding significant tech debt to code that I put a great deal of effort into keeping clean, that most likely I'll have to clean up. Which is why I still wish to review this before it is merged.

@msgmaxim
Copy link
Contributor Author

msgmaxim commented Nov 1, 2023

we shouldn't try to fill in gaps in the block stream

I was thinking about that too, and yesterday I was hoping to ask for your opinion on this actually. I know that we aren't supposed to have gaps in the unfinalised stream, but I'm also not sure that it hurts to keep that code in case we do miss a block for some reason. Either way, it is an easy change to make. I don't understand why you had to "work on a fix" for this in the background without any communication (and without any apparent progress on it either).

I put a great deal of effort into keeping clean

FWIW I don't think the existing code particularly "clean" and there are some things that I would change, but didn't because I want to focus on the feature and to avoid arguing about style again.

that most likely I'll have to clean up

Not necessarily. I can clean it up too, if necessary. We have to agree of what "clean" means though. It is not just up to you. I already have some other code on top of this, factoring some code out and adding tests (which we need, and I wish I had while making changes in this PR). It just doesn't make sense dragging this PR out because of things that are likely going to change soon anyway...

I'm not going to be able to get to this for a few days, due to higher priority work (LP api)

Sounds like we should let you focus on that higher priority work. Now that you have expressed your main concerns, and provided that they are all addressed, wouldn't it be enough to get, say, @kylezs approval to get this merged? Your coding styles aren't too different I imagine, and I'm happy to make whatever changes he suggests to make it more "clean".

api/lib/src/lib.rs Outdated Show resolved Hide resolved
api/lib/src/queries.rs Outdated Show resolved Hide resolved
engine/src/state_chain_observer/client/mod.rs Outdated Show resolved Hide resolved
engine/src/state_chain_observer/client/mod.rs Show resolved Hide resolved
async move {
let base_rpc_client = &base_rpc_client;
let intervening_headers: Vec<_> =
futures::stream::iter(prev_header.number + 1..next_header.number)
Copy link
Contributor

@kylezs kylezs Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asserts below here can panic in the unfinalised case. Because it's possible that the chain changes in between these requests, there's no guarantee they remain the same between queries.

The original code is tailored around the accuracy case, of using the finalised stream, and ensuring that we don't have gaps, and that regardless of substrate weirdness we end up with a consistent, monotonic stream. So, naturally we end up having to put if finalized in a bunch of places in the code - which, yes, would also fix the asserts here too..

Instead, we should probably take the approach that gives us more certainty in the case we need it (the finalised case), and more flexibility in the unfinalised case - by separating these concerns entirely, into separate functions, rather than weaving in the two concerns (of accuracy and speed) with one another. Think this just makes it easier to reason about, can just think about one thing at a time.

There's also the question of whether catching up actually makes sense in the unfinalised case. Because the consumers of this stream access storage (and not events), then the fact that we skip some blocks doesn't really matter, and it doesn't make sense to include the complexity required to gain accuracy we need in the finalised case, in this case where we only want speed.

The only problematic case may be something like the underlying unfinalised stream skipping 24 hours+ of blocks, since this would go beyond the swapping channel expiry, which I think we can accept as quite unlikely - and not that bad if it did happen anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asserts below here can panic in the unfinalised case. Because it's possible that the chain changes in between these requests, there's no guarantee they remain the same between queries.

Which assert in particular? These two shouldn't panic because we request block_header by hash (not by number), and we should then get a block where hash() equals the requested hash, even with unfinalised blocks. Also, the height of the block doesn't change even if its position gets taken over by a different block (otherwise its hash would have changed as well).

assert_eq!(
	block_header.hash(),
	block_hash,
	"{SUBSTRATE_BEHAVIOUR}"
);
assert_eq!(
	block_header.number, block_number,
	"{SUBSTRATE_BEHAVIOUR}",
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, we should probably take the approach that gives us more certainty in the case we need it (the finalised case), and more flexibility in the unfinalised case - by separating these concerns entirely, into separate functions,

I don't mind this. I actually already extracted inject_intervening_blocks on a separate branch (so that we can test it), which should also make it easier to do what you are suggesting here.

Copy link
Contributor Author

@msgmaxim msgmaxim Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also the question of whether catching up actually makes sense in the unfinalised case.

All we need for the tracker is a stream of signals whenever there is a new block. I assumed we wanted to make use of unfinalised subscription elsewhere, where we might also be interested in events (e.g. P2P registrations). I'm happy to remove catching up from the unfinalised stream if you think that's the right thing to do. Perhaps we should actually write down for the requirements for this PR/stream are?

Copy link
Contributor

@kylezs kylezs Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two shouldn't panic because we request block_header by hash (not by number), and we should then get a block where hash() equals the requested hash

True. In this case it can just error on the request stage, since we:

  1. Query by number, get the hash of this block
  2. Query by hash - hash doesn't exist so we error.
  3. This error then propogates, ultimately resulting in a shut down of the stream at the bottom of create_block_subscription
				scope.spawn_with_handle({
					let base_rpc_client = base_rpc_client.clone();
					async move {
						loop {
							let block_header =
								block_header_stream.next().await.unwrap()?;
							let block_hash = block_header.hash();
							if let Some((required_version, _)) = required_version_and_wait {
								let current_release_version = base_rpc_client.storage_value::<pallet_cf_environment::CurrentReleaseVersion<state_chain_runtime::Runtime>>(block_hash).await?;
								if !required_version.is_compatible_with(current_release_version) {
									break Err(anyhow!("This version '{}' is no longer compatible with the release version '{}' at block: {}", required_version, current_release_version, block_hash))
								}
							}

							if !block_sender.send((block_hash, block_header)).await {
								break Ok(())
							}
							if latest_block_hash_sender.send(block_hash).is_err() {
								break Ok(())
							}
						}
					}
				}),

Though, yeah, we can just avoid a lot of this complexity and just have two functions, with different responsibilities. One to be effectively left as is (maybe some changes for some pieces of code that will be factored out), and then one separate one for the unfinalised.

P2P registrations

Perhaps we should also use storage for this too. We started moving away from the events in the witnessing refactor, which allows the components to be more isolated and self-contained, rather than depending on the SCO - which also helps us in the "uninline SCO" goal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should also use storage for this too.

I don't mind that, and actually I have been thinking about that too. It will be a bit of work to compute the diff, but shouldn't be too bad and it would be easily testable.

@msgmaxim msgmaxim force-pushed the feat/tracker-unfinalised-sc-stream branch from c4c8294 to 117c370 Compare November 6, 2023 04:59
@msgmaxim msgmaxim mentioned this pull request Nov 6, 2023
2 tasks
Copy link
Contributor

@kylezs kylezs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌 yeah think this way is the way to go, just a few minor / testing comments

@@ -743,3 +781,145 @@ pub mod mocks {
}
}
}

async fn inject_intervening_headers<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a comment here to state this this should only be used with a finalised stream and also that it can skip blocks etc. As there are a few gotchas here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to only be used with finalized streams. What do you mean by "it can skip blocks" though?

move |hash| Ok(chain.headers.get(&hash).expect("unknown hash").clone())
});

let sparse_stream = tokio_stream::iter([0, 1, 3, 6].map(|num| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add tests for some base cases like:

  • No headers
  • Headers already in order without gaps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. For "no headers" I changed the code to return error rather than panicking (for ease of testing). Arguably it should return an empty stream too, but doesn't seem like an important change to make, so I will leave it as is.

@msgmaxim msgmaxim enabled auto-merge (squash) November 7, 2023 06:05
@msgmaxim msgmaxim merged commit d7659ac into main Nov 7, 2023
41 checks passed
@msgmaxim msgmaxim deleted the feat/tracker-unfinalised-sc-stream branch November 7, 2023 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants