
fix(Full-Partition-Scan): Increase scan interval #5033

Merged
1 commit merged into scylladb:master on Jul 20, 2022

Conversation

@yarongilor (Contributor) commented Jul 19, 2022:

In order to decrease the connection-request load that the scan thread puts on the cluster, the intervals of the four large-partition longevities are increased to 5 minutes.
Fixes: https://github.com/scylladb/scylla-cluster-tests/issues/4960

Notes:

  1. I'm not sure whether this should be backported or not (not a must).
  2. The interval unit is kept in seconds rather than changed to minutes, to preserve the ability to test this lower granularity in the future if needed (a sketch of the unit choice follows below).
    That is because this is an important feature for an important customer (Discord), and scylla-bench doesn't have a stable version supporting it.

https://trello.com/c/U3bhzNUL
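For illustration, a minimal, hypothetical sketch of the kind of scan loop whose interval this PR raises, showing the interval kept in seconds while being set to 5 minutes. The constant name and the loop below are assumptions for the example, not the actual SCT thread implementation; the real change edits the large-partition longevity test configurations.

```python
import time

# Hypothetical sketch only; the real PR edits the longevity test
# configurations rather than a constant like this one.
SCAN_INTERVAL_SECONDS = 5 * 60  # 5 minutes, kept in seconds so a lower
                                # granularity can still be tested later

def run_scan_loop(run_scan, stop_event):
    """Run one full-partition scan per interval until asked to stop."""
    while not stop_event.is_set():
        run_scan()                         # issue the scan queries
        time.sleep(SCAN_INTERVAL_SECONDS)  # throttle connection requests
```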

PR pre-checks (self review)

  • I followed KISS principle and best practices
  • I didn't leave commented-out/debugging code
  • I added the relevant backport labels
  • New configuration options are added and documented (in sdcm/sct_config.py)
  • I have added tests to cover my changes (Infrastructure only - under unit-test/ folder)
  • All new and existing unit tests passed (CI)
  • I have updated the Readme/doc folder accordingly (if needed)

Commit message:

	In order to decrease the connection-request load on the cluster from the scan thread, the intervals of the four large-partition longevities are increased to 5 minutes.
	Fixes: scylladb#4960
@roydahan (Contributor) left a comment:

LGTM.
No need to backport for now.

@fgelcer (Contributor) left a comment:

In issue #4960 it was asked not only to increase the timeout, but also to align the unit used (from seconds to minutes) with the one used by the run_fullscan threads.
Please change the unit to minutes.

@roydahan (Contributor) commented:

I disagree with that.
@yarongilor wrote correctly that he left the unit as seconds in case we want to change it back in the future.

@fgelcer (Contributor) commented Jul 19, 2022:

The issue this PR claims to fix has a very clear request: we should align the meaning of the interval, and also not run the scan at such small intervals.

So if we really want to keep it in seconds, to retain the ability to return to this smaller granularity, we should change the other one to use seconds instead of minutes.
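For illustration only, a sketch of the unit alignment being debated, using one hypothetical helper; it does not reflect the actual options in sdcm/sct_config.py.

```python
# Hypothetical illustration: keep one canonical unit (seconds) internally and
# convert any minute-based interval, so both scan threads share the same unit.
def to_seconds(value: int, unit: str = "s") -> int:
    """Normalize an interval given in seconds ('s') or minutes ('m')."""
    return value * 60 if unit == "m" else value

# 5 minutes and 300 seconds describe the same interval once normalized.
assert to_seconds(5, "m") == to_seconds(300, "s") == 300
```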

@roydahan (Contributor) commented:

Not needed.

@roydahan added the backport/none (Backport is not required) label on Jul 20, 2022.
@roydahan merged commit 415907a into scylladb:master on Jul 20, 2022.
@fgelcer (Contributor) commented Jul 26, 2022:

@roydahan, IIUC we are seeing this issue on the 2022.1 Azure longevity: the instances there are roughly half of what we usually use for the parallel longevity in AWS, so I believe the cluster is heavily overloaded, and these full scans are certainly not helping.

Or maybe we should disable the full scans for Azure until we get a better quota to run the test with the resources we need?
@soyacz, WDYT?

@soyacz (Contributor) commented Jul 27, 2022:

I wouldn't disable it; we don't hit many issues related to high load on the 2022.1 branch (performance differs on master: higher ops/s and load, while on 2022.1 both ops/s and load are lower).
