
CCIP-2971 Optimize token/gas prices database interactions #14074

Merged
merged 4 commits into from
Aug 22, 2024

Conversation

@mateusz-sekara (Collaborator) commented Aug 8, 2024

Reference https://smartcontract-it.atlassian.net/browse/CCIP-2971

Problem

The leader lane implementation inserts every price of every token/gas into the observed_token_prices and observed_gas_prices tables. It also evicts stale elements by deleting them from the database, so all background workers (one per lane) keep inserting token prices and deleting stale ones. The leader lane worker is then responsible for querying that table to pick only the relevant data (the latest update per token).

This makes the entire process very db-heavy and inefficient, because every round we keep inserting new prices and evicting stale ones, which leads to:

  • a high dead-tuple ratio - it impacts storage by increasing IOPS and hurts overall Postgres performance by keeping autovacuum busy
  • a high number of tuples fetched from that table - we scan all items in the index to pick the latest entry per token_addr (it's even higher than for evm.logs!)

Solution

  1. Get rid of the deletion logic (simplifying the code) and use upsert instead of insert. This keeps the table small - as many rows as there are tokens per chain. Because the table doesn't grow over time, the entire deletion logic becomes unnecessary, which simplifies the codebase and reduces the node's database connection usage. A token price update only updates an existing row instead of creating a new one with every update. A huge additional benefit is the lightweight lookup (the leader lane searching token prices), which stays extremely cheap because the table is small.

  2. Don't insert when it's not necessary. Multiple jobs might track the same tokens; before fetching a token price we can check which tokens didn't get an update recently (what "recently" means here is to be defined) and update only the ones that require it. That way we don't flood the table with upsert requests.

Tests

Go bench results

The benchmark keeps inserting random prices into the table. Please see Benchmark_UpsertsTheSameTokenPrices

// Before changes (always insert new rows). It's underestimated because it doesn't include the deletion overhead
Benchmark_UpsertsTheSameTokenPrices
Benchmark_UpsertsTheSameTokenPrices-12    	    2418	    483486 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    2662	    487156 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    2427	    514165 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    2481	    526274 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    2529	    507806 ns/op

// Always upsert
Benchmark_UpsertsTheSameTokenPrices
Benchmark_UpsertsTheSameTokenPrices-12    	    1047	   5467925 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    1204	   6483758 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	     897	   4814489 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    1300	   7206590 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    1112	   6422288 ns/op

// Upsert only relevant data (more read queries, fewer writes)
Benchmark_UpsertsTheSameTokenPrices
Benchmark_UpsertsTheSameTokenPrices-12    	    5346	    246378 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    4807	    245983 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    4275	    239922 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    4386	    248482 ns/op
Benchmark_UpsertsTheSameTokenPrices-12    	    4514	    254113 ns/op
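For context, a benchmark of this kind might be shaped roughly as below. This is a stand-in sketch, not the PR's benchmark: the real one writes to Postgres, while this uses an in-memory map purely to show the property the upsert approach relies on (the table never grows beyond one row per token).

```go
package main

import (
	"fmt"
	"testing"
)

// priceStore is an in-memory stand-in for the observed_token_prices table.
type priceStore map[string]int64 // token address -> price

// upsert overwrites the existing entry, so repeated updates for the same
// token never grow the store - the behavior ON CONFLICT DO UPDATE gives us.
func (s priceStore) upsert(token string, price int64) { s[token] = price }

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		store := priceStore{}
		for i := 0; i < b.N; i++ {
			store.upsert("0xTokenA", int64(i)) // same token every round
		}
	})
	fmt.Println("ns/op:", res.NsPerOp())
}
```

The real benchmark's ns/op numbers above are dominated by Postgres round-trips, which this sketch deliberately omits.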

CCIP baseline test

  • Tuples fetched/read from the index dropped from 1k to 400
  • Dead tuples dropped from 4k to 300
  • Similar Postgres CPU utilization
  • Higher index usage, from 3 RPS to 18 RPS, caused by:
    • using the index when inserting, to resolve conflicting entries
    • fetching before upserting, to reduce unnecessary updates
  • Although we scan the index more often, we still fetch less data than the previous implementation did

Before

[Screenshot: metrics, 2024-08-14 at 14:16]

After
[Screenshot: metrics, 2024-08-14 at 18:41]

@mateusz-sekara mateusz-sekara force-pushed the optimize-token-prices branch from 7481fab to 3b503f5 Compare August 8, 2024 12:16
@mateusz-sekara mateusz-sekara changed the title CCIP-2971 Optimize token/gas prices database interations CCIP-2971 Optimize token/gas prices database interactions Aug 8, 2024

CREATE TABLE ccip.observed_token_prices
(
chain_selector NUMERIC(20, 0) NOT NULL,
@mateusz-sekara (Collaborator, author) commented Aug 8, 2024

jobId is not relevant with the upsert approach, so I removed it entirely

@mateusz-sekara mateusz-sekara marked this pull request as ready for review August 13, 2024 09:01
@mateusz-sekara mateusz-sekara requested a review from a team as a code owner August 13, 2024 09:01
@matYang (Contributor) commented Aug 20, 2024

> @matYang should we backport from ccip repo the distinction between token and gas price updates here?
>
> 	gasPriceUpdateInterval = 1 * time.Minute
> 	tokenPriceUpdateInterval = 10 * time.Minute

Yeah, that would be ideal. Alternatively we can ignore the PriceServices changes here and only apply them inside the ccip repo after merging chainlink into ccip. By the way, there is another PR in ccip that touches PriceServices: https://github.com/smartcontractkit/ccip/pull/1275/files. From a merge perspective the latter should be easier. Do we wish to keep the ccip code in the chainlink repo somewhat up to date?

@mateusz-sekara (Collaborator, author) commented Aug 20, 2024

> @matYang should we backport from ccip repo the distinction between token and gas price updates here?
>
> 	gasPriceUpdateInterval = 1 * time.Minute
> 	tokenPriceUpdateInterval = 10 * time.Minute
>
> yeh would be ideal, alternatively we can ignore the PriceServices changes here and only apply it inside ccip repo after merging chainlink to ccip, btw there is another PR in ccip that touches PriceServices: https://github.com/smartcontractkit/ccip/pull/1275/files, from a merge perspective, the latter should be easier, do we wish to keep ccip code in chainlink repo somewhat up to date?

Ouch, that's the problem here. We need a single source of truth and, unfortunately, PRs touching db migrations have to be done in the chainlink repo first. Not sure what the best approach is here, @asoliman92 any suggestions?

We can either cherry-pick the ccip changes into chainlink and resolve conflicts in this PR, or merge that PR and resolve conflicts on the ccip level

@asoliman92 (Contributor) commented

> Ouch, that's the problem here. We need a single source of truth and unfortunately, PRs touching db migrations have to be done in chainlink repo first. Not sure what's the best approach here, @asoliman92 any suggestions?
>
> We can either cherry-pick ccip changes to chainlink and resolve conflicts in this PR or merge that PR and resolve conflicts on ccip level

I've done the cherry-pick approach while doing the initial porting from CCIP to CL. It worked well at the time and I think it should be okay with your PR.

I'm actually more interested in merging CL into CCIP and fixing the conflicts, as we'll need to do this one way or another once V1.5 is ready. But that might be outside the scope of your PR.

@mateusz-sekara mateusz-sekara force-pushed the optimize-token-prices branch 3 times, most recently from d20c2d7 to 7dd61a9 Compare August 20, 2024 13:14

_, err := o.ds.ExecContext(ctx, stmt, destChainSelector, expireSec)
return err
tokensToIgnore := mapset.NewThreadUnsafeSet[string](dbTokensToIgnore...)
@asoliman92 (Contributor) commented Aug 20, 2024

Is the ThreadUnsafeSet intentional? I think it won't cause issues as it's locally created, but seeing Unsafe raises an alert, so just making extra sure we're not missing something :D

A contributor commented:

+1, is there a reason to use NewThreadUnsafeSet vs alternatives?

@mateusz-sekara (Collaborator, author) commented:

Yeah, mapset's default implementation is thread-safe, which adds locking. That's not needed here, because we use the set only within this function. Locking comes with overhead, so I decided to use the implementation that doesn't take any locks under the hood. However, if it raises questions I can either use the locking implementation or just use a regular Go map

@mateusz-sekara (Collaborator, author) commented:

// NewThreadUnsafeSet creates and returns a new set with the given elements.
// Operations on the resulting set are not thread-safe.
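The "regular Go map" alternative mentioned above is easy to picture. A minimal unlocked set sketch, under the same assumption that the set never escapes the function that created it (the type and helper names here are illustrative, not from the PR):

```go
package main

import "fmt"

// stringSet is a lock-free set backed by a plain map. Safe only when used
// from a single goroutine, which matches the local-only usage in this PR.
type stringSet map[string]struct{}

func newStringSet(items ...string) stringSet {
	s := make(stringSet, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

func (s stringSet) contains(v string) bool { _, ok := s[v]; return ok }

func main() {
	ignore := newStringSet("0xTokenA", "0xTokenB")
	fmt.Println(ignore.contains("0xTokenA")) // prints true
	fmt.Println(ignore.contains("0xTokenC")) // prints false
}
```

Like mapset's NewThreadUnsafeSet, this carries no mutex, so membership checks are just map lookups.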

Co-authored-by: Abdelrahman Soliman (Boda) <2677789+asoliman92@users.noreply.github.com>
@cl-sonarqube-production commented

Quality Gate failed

Failed conditions
2 New Blocker Issues (required ≤ 0)

See analysis details on SonarQube


@mateusz-sekara mateusz-sekara added this pull request to the merge queue Aug 22, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 22, 2024
@lukaszcl lukaszcl added this pull request to the merge queue Aug 22, 2024
Merged via the queue into develop with commit a865709 Aug 22, 2024
151 of 153 checks passed
@lukaszcl lukaszcl deleted the optimize-token-prices branch August 22, 2024 09:26
5 participants