Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: deposit channel expiry #3998

Merged
merged 19 commits into from
Sep 27, 2023
Merged

fix: deposit channel expiry #3998

merged 19 commits into from
Sep 27, 2023

Conversation

kylezs
Copy link
Contributor

@kylezs kylezs commented Sep 13, 2023

Pull Request

Closes: PRO-171

SDK companion: chainflip-io/chainflip-sdk-monorepo#227 (required so that bouncer can pass)

Checklist

Please conduct a thorough self-review before opening the PR.

  • I am confident that the code works.
  • I have updated documentation where appropriate.

Summary

  • Removes ChannelAction storage in ingress-egress pallet. This was just used to ensure once the address is expired, that we can't deposit again, until it's explicitly reused. Instead we can just use the DepositChannelPool directly which already has this information.
  • Removes expiry from the lp and swapping pallets. Instead, we add expiry to ingress-egress, such that it can be based on the external chain block number, rather than the SC block number.
  • CFE filters by expires_at to ensure we don't witness beyond the expires_at
  • We recycle (different from expiry) at DepositChannelLifetime blocks more than the expiry block i.e. 2 * DepositChannelLifetime from channel creation. This ensures that any delay in witnesses getting into the SC after the CFE has submitted them is accounted for. Note that sending a tx to Ethereum at block 20 when the expires_at is block 19 will not go through. The CFEs will not witness it, even if the address has not yet been recycled.
  • Removes lp_deposit_expiry test, since it's not invalid. I created PRO-863 to ensure we unit test what it was effectively trying to test.

Next steps:
@dandanlen is working on some refactoring as part of the deposit tracking which will untangle some of the lifecycles from the storage items and transitions. This will make it easier to optimise this further. Potentially by having a separate storage item for the active addresses (any address the CFE might need to witness - it might not if the block number it is at is not within the range still), and one for the inactive addresses - these would be the addresses that have been recycled, ready for re-use, but the CFE definitely does not need witness. This saves on CFE compute and SC compute since we don't need to iterate through so many addresses on each on_idle call if we know we don't need to (in the case of inactive addresses.

The product (@fernandezdavid ) should use expires_at + 1/2 * DepositChannelLifetime this ensures that users are depositing long before the hard cut off. Maybe we could make it something like 3/4, but perhaps we want to start safer before expanding, will also be a per-chain configuration.

Base automatically changed from refactor/ingress-egress-for-ingress-expiry to main September 14, 2023 07:34
@linear
Copy link

linear bot commented Sep 14, 2023

PRO-171 Ingress witnessing cooldown period

Problem

In the ingress addresses adapter, there is a race condition between the witnessing of blocks and the expiry on the SC of ingress addresses, as the engine uses the SC state to determine what addresses to witness at a given block. But when the SC expires an address, all state for that address is removed.

Therefore it is possible that different engines, are looking at different sets of addresses when witnessing ingresses for a particular block, This means the hashes of the witness calls may not match, leading to a stuck witness (One that doesnÄt reach 2/3).

Solution

We define a per-chain number of blocks that a particular egress channel will last for, N

So, when an ingress channel is opened, we store the opened_at in the DepositChannelDetails. Then in N blocks the CFEs will stop witnessing that channel.

Therefore, the DepositChannelDetails storage should only be deleted once the chain tracking for that chain has reached the opened_at + N block. However, chain tracking is ahead of the witnessing, so we'd have to witness at opened_at + N + safety margin + latency buffer. Then there's another issue, which is, if the blocks fill up and the tx can't get in, we can expire something prematurely, even though the CFEs submitted their extrinsics on time.

@kylezs kylezs changed the title WIP: fix: remove expired deposit channel details from storage WIP: fix: deposit channel expiry Sep 14, 2023
@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch 2 times, most recently from 1b5204e to 85abcc0 Compare September 20, 2023 14:15
@kylezs kylezs changed the base branch from main to refactor/ingress-expiry-prep September 20, 2023 14:15
@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch from 85abcc0 to 9e5b4d2 Compare September 20, 2023 14:19
@codecov
Copy link

codecov bot commented Sep 20, 2023

Codecov Report

Merging #3998 (b510bef) into main (3c2afa8) will decrease coverage by 0%.
The diff coverage is 68%.

@@          Coverage Diff           @@
##            main   #3998    +/-   ##
======================================
- Coverage     72%     72%    -0%     
======================================
  Files        370     373     +3     
  Lines      59306   59189   -117     
  Branches   59306   59189   -117     
======================================
- Hits       42681   42507   -174     
- Misses     14508   14577    +69     
+ Partials    2117    2105    -12     
Files Coverage Δ
engine/src/witness/btc.rs 62% <100%> (+<1%) ⬆️
...ate-chain/cf-integration-tests/src/mock_runtime.rs 100% <100%> (ø)
state-chain/node/src/chain_spec/partnernet.rs 0% <ø> (ø)
state-chain/node/src/chain_spec/perseverance.rs 0% <ø> (ø)
state-chain/node/src/chain_spec/sisyphos.rs 0% <ø> (ø)
state-chain/node/src/chain_spec/testnet.rs 0% <ø> (ø)
state-chain/pallets/cf-broadcast/src/lib.rs 82% <ø> (ø)
...hain/pallets/cf-ingress-egress/src/benchmarking.rs 100% <100%> (ø)
state-chain/pallets/cf-ingress-egress/src/mock.rs 98% <100%> (ø)
state-chain/pallets/cf-lp/src/benchmarking.rs 100% <ø> (ø)
... and 19 more

... and 6 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch from 9e5b4d2 to fccfe67 Compare September 21, 2023 06:42
@@ -381,6 +396,45 @@ pub mod pallet {

#[pallet::hooks]
impl<T: Config<I>, I: 'static> Hooks<BlockNumberFor<T>> for Pallet<T, I> {
/// Recycle addresses if we can
fn on_idle(_n: BlockNumberFor<T>, remaining_weight: Weight) -> Weight {
let current_target_chain_height = T::ChainTracking::get_block_height();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These calculations are quite rough, didn't want to spend too much time on it because with the changes that @dandanlen is bringing in with the deposit tracking which will refactor a lot of the lifecycle stuff, it should be easier to restructure the code further to allow for a more efficient recycling of the channels.

@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch from 2c23295 to 88e10f6 Compare September 21, 2023 09:12
@kylezs kylezs marked this pull request as ready for review September 21, 2023 09:27
@kylezs kylezs changed the title WIP: fix: deposit channel expiry fix: deposit channel expiry Sep 21, 2023
@dandanlen
Copy link
Collaborator

The product (@fernandezdavid ) should use expires_at + 1/2 * DepositChannelLifetime this ensures that users are depositing long before the hard cut off. Maybe we could make it something like 3/4, but perhaps we want to start safer before expanding, will also be a per-chain configuration.

About this - we could/should provide an rpc to handle this.

@kylezs
Copy link
Contributor Author

kylezs commented Sep 21, 2023

About this - we could/should provide an rpc to handle this.

I assume you mean an RPC to allow querying of the number of blocks to wait before soft expiry for each chain?

@@ -24,7 +24,6 @@ use tracing::log;
#[derive(Serialize, Deserialize, Clone)]
pub struct BrokerSwapDepositAddress {
pub address: String,
pub expiry_block: BlockNumber,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change to the api. SDK will probably need to be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, linked the companion PR in the description

addresses.iter().filter(|details| details.opened_at <= index).cloned().collect()
addresses
.iter()
.filter(|details| details.opened_at <= index && index <= details.expires_at)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer if we use an exclusive upper bound here.

This is more consistent with ranges in general (opened_at..expires_at).

Also it seems odd that we can submit something at the block in which it expires.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inclusive end is consistent with how we witness everything else, the external chain block number for the vault rotation, is witnessed by the previous authority set, i.e. to the previous set, it's inclusive of the range they witness, which is why I made it inclusive

state-chain/node/src/chain_spec.rs Outdated Show resolved Hide resolved
// this too early. Even accounting for the safety margin, there is potential for
// latency, full SC blocks which might block ingress witnesses getting in etc. By
// having this buffer we protect against this probabilistically.
if details.expires_at + channel_lifetime <= current_target_chain_height {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can save a read by using expires_at - opened_at to infer the lifetime of the channel. 🤓


let mut number_recycled = 0;
for (_, details) in
DepositChannelLookup::<T, I>::iter().take(number_to_recycle as usize)
Copy link
Collaborator

@dandanlen dandanlen Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this works as intended. It's possible we never recycle unless the remaining weight is enough to process all of the channels.

The order of iteration is based on the hashes of the keys, so we will always read them in the same order, and we might never reach the to-be-expired channel.

I think a better option is probably to store a vec of (block_number, address) and then iterate these in order, stopping when we reach the current_target_chain_hegiht or exceed the available weight budget.

Writing to this is cheap: We can use append. Reading is more expensive but at least it's a single read operation. The decoding might be expensive but at least it will only contain the unexpired channels.

Btw. we don't remove the value from the DepositChannelLookup, it this intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah didn't consider the sort order, was intending to split them into active and expired sets, but that requires some of what you're refactoring I believe. For now I've done your suggestion so it's at least correct now.

Btw. we don't remove the value from the DepositChannelLookup, it this intended?

It was, but because there's a large enough gap between expiry and recycle block then the fetch will be able to occur in there, so I changed it to take(). I think eventually we will be able to make this more correct after your refactor.

state-chain/pallets/cf-ingress-egress/src/lib.rs Outdated Show resolved Hide resolved
IngressEgress::expire_channel(*address);
}
let recycle_block = BlockHeightProvider::<MockEthereum>::get_block_height() +
DepositChannelLifetime::<Test, _>::get() * 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit magical again... maybe we can implement a method DepositChannelDetails::expiry_block(&self) to factor this out?

Base automatically changed from refactor/ingress-expiry-prep to main September 22, 2023 14:32
@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch from 44ba4a0 to cdc5e0d Compare September 22, 2023 14:51
@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch from f812571 to 26b60b0 Compare September 25, 2023 11:53
@kylezs kylezs added the blocked label Sep 25, 2023
@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch from 3c1b102 to 1ed2d1d Compare September 25, 2023 13:09
@kylezs kylezs removed the blocked label Sep 26, 2023
@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch from 1ed2d1d to f2b2998 Compare September 26, 2023 14:06
@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch from f2b2998 to 6312068 Compare September 26, 2023 14:08
@kylezs kylezs force-pushed the fix/remove-deposit-channel-details branch from 0dee99c to 29f2fb6 Compare September 27, 2023 09:56
@kylezs kylezs enabled auto-merge (squash) September 27, 2023 11:19
@kylezs kylezs merged commit 25d2e4d into main Sep 27, 2023
44 checks passed
@kylezs kylezs deleted the fix/remove-deposit-channel-details branch September 27, 2023 11:39
dandanlen added a commit that referenced this pull request Sep 28, 2023
* fix: SC recycles based on external chain block

* chore: use genesis

* feat: CFE filers based on expires_at

* chore: add weight calculations for on_idle

* chore: clean up some comments

* chore: add genesis config

* chore: use constants for expiry blocks

* chore: use log_or_panic

* fix: use correct expiry

* test: more recycle tests

* chore: clippy

* chore: move constants into chainspec

* fix: use mutate, rename

* chore: ingress-egress pallet migrations

* chore: use TargetChainBlockNumber

* chore: remove the expiry storage from the swapping and lp pallets

expiry is now done in the ingress-egress pallet

* chore: update SDK version

* chore: remove now-invalid bouncer test

* chore: do the migrations

---------

Co-authored-by: Daniel <daniel@chainflip.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants