Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safekeeper: add basic WAL ingestion benchmarks #9531

Open
wants to merge 3 commits into
base: erik/wal-generator-generic
Choose a base branch
from

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Oct 26, 2024

Problem

We don't have any benchmarks for Safekeeper WAL ingestion.

Resolves #9339.
Blocked by #9614.

Summary of changes

Add some basic benchmarks for WAL ingestion, specifically for SafeKeeper::process_msg() (single append) and WalAcceptor (pipelined batch ingestion).

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist
    In

@erikgrinaker erikgrinaker requested a review from a team as a code owner October 26, 2024 12:15
Copy link

github-actions bot commented Oct 26, 2024

5328 tests run: 5100 passed, 6 failed, 222 skipped (full report)


Failures on Postgres 17

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_pg_regress[release-pg17-None] or test_pg_regress[release-pg17-None] or test_pg_regress[debug-pg17-None] or test_pg_regress[release-pg17-4] or test_pg_regress[release-pg17-4] or test_pg_regress[debug-pg17-4]"

Test coverage report is not available

The comment gets automatically updated with the latest test results
f4b8795 at 2024-11-03T17:25:29.676Z :recycle:

Base automatically changed from erik/wal-generator to main October 30, 2024 11:46
@arssher arssher requested a review from a team as a code owner October 30, 2024 11:46
@arssher arssher requested a review from knizhnik October 30, 2024 11:46
@erikgrinaker erikgrinaker removed request for a team and knizhnik October 30, 2024 11:57
Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, maybe Arseny could take a look as well for some more SK context?

/// each individual message to amortize costs (e.g. fsync), which is more realistic. Records are
/// XlLogicalMessage with a tiny payload.
///
/// TODO: add benchmarks with larger data volume, and measure throughput.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I expected this benchmark to measure tput. Does it not make sense to do that now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a separate wal_acceptor_throughput benchmark, with the intent that wal_acceptor measures the cost of ingesting trivial messages (i.e. per-message processing latency), while wal_acceptor_throughput measures bulk ingestion (typically IO-bound).

Smaller messages really kill throughput. Lower than 1 KB per message wasn't viable since the benchmark would take 30 minutes to ingest 1 GB. Haven't looked into why, might be some low-hanging fruit here.

wal_acceptor_throughput/fsync=false/size=1024
                        time:   [14.427 s 14.487 s 14.550 s]
                        thrpt:  [70.378 MiB/s 70.683 MiB/s 70.979 MiB/s]
wal_acceptor_throughput/fsync=false/size=4096
                        time:   [4.8211 s 4.8395 s 4.8602 s]
                        thrpt:  [210.69 MiB/s 211.59 MiB/s 212.40 MiB/s]
wal_acceptor_throughput/fsync=false/size=131072
                        time:   [1.9312 s 1.9410 s 1.9518 s]
                        thrpt:  [524.66 MiB/s 527.55 MiB/s 530.25 MiB/s]
wal_acceptor_throughput/fsync=false/size=1048576
                        time:   [1.9040 s 1.9095 s 1.9155 s]
                        thrpt:  [534.60 MiB/s 536.27 MiB/s 537.81 MiB/s]

safekeeper/benches/receive_wal.rs Outdated Show resolved Hide resolved
safekeeper/Cargo.toml Outdated Show resolved Hide resolved
@erikgrinaker erikgrinaker changed the base branch from main to erik/wal-generator-generic November 3, 2024 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants