
full_scan intervals are in minutes while run_full_partition_scan intervals are in seconds #4960

Closed
fruch opened this issue Jun 30, 2022 · 8 comments
Labels
Bug Something isn't working right


@fruch
Contributor

fruch commented Jun 30, 2022

run_fullscan: '{"ks_cf": "scylla_bench.test", "interval": 10}' # 'ks.cf|random, interval(min)'

run_full_partition_scan: '{"ks_cf": "scylla_bench.test", "interval": 2, "pk_name":"pk", "rows_count": 5000, "validate_data": "true", "include_data_column": "true"}' # 'ks.cf, interval(sec), partition-key name, number-of-rows-per-partition, validate reversed query output, include data-column or only validate pk + ck'
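To make the mismatch concrete, here is a small Python sketch that parses the two option strings above and converts both intervals to seconds. It assumes the units stated in the inline comments (minutes for `run_fullscan`, seconds for `run_full_partition_scan`). The `UNIT_SECONDS` table and the `interval_seconds` helper are hypothetical and are not part of SCT.

```python
import json

# Units as documented in the inline comments of the two options
# (assumption: the comments reflect how the scan threads actually behave).
UNIT_SECONDS = {
    "run_fullscan": 60,            # 'interval(min)'
    "run_full_partition_scan": 1,  # 'interval(sec)'
}


def interval_seconds(option_name: str, raw_value: str) -> int:
    """Hypothetical helper: return the option's interval converted to seconds."""
    params = json.loads(raw_value)
    return int(params["interval"]) * UNIT_SECONDS[option_name]


fullscan = '{"ks_cf": "scylla_bench.test", "interval": 10}'
partition_scan = (
    '{"ks_cf": "scylla_bench.test", "interval": 2, "pk_name":"pk", '
    '"rows_count": 5000, "validate_data": "true", "include_data_column": "true"}'
)

print(interval_seconds("run_fullscan", fullscan))                   # 600
print(interval_seconds("run_full_partition_scan", partition_scan))  # 2
```

The two values look similar in the YAML (10 vs. 2), but once units are applied they differ by a factor of 300.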

Some cases use 2-second intervals, which seems very problematic, since each round reconnects ~3 times.

We should align the meaning (units) of interval, and also avoid running the scan at such short intervals.
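One possible way to do both, sketched in Python under stated assumptions: express every scan interval in a single unit (seconds here) and reject values below a floor. The `MIN_SCAN_INTERVAL_SEC` value of 60 is only an illustrative choice (it matches the 60 seconds suggested in this thread), and `validate_scan_interval` is a hypothetical helper, not an existing SCT function.

```python
# Hypothetical floor for any scan interval, in seconds; 60 is illustrative only.
MIN_SCAN_INTERVAL_SEC = 60


def validate_scan_interval(interval_sec: int) -> int:
    """Return the interval unchanged if it is at or above the floor, else raise."""
    if interval_sec < MIN_SCAN_INTERVAL_SEC:
        raise ValueError(
            f"scan interval {interval_sec}s is below the minimum of "
            f"{MIN_SCAN_INTERVAL_SEC}s"
        )
    return interval_sec


print(validate_scan_interval(600))  # a 10-minute full scan passes
try:
    validate_scan_interval(2)  # a 2-second partition scan is rejected
except ValueError as err:
    print(err)
```

Validating at config-load time would surface a too-aggressive interval before the longevity test starts, rather than after the cluster is already under load.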

@fruch fruch added the Bug Something isn't working right label Jun 30, 2022
@fgelcer
Contributor

fgelcer commented Jun 30, 2022

@yarongilor, please fix it ASAP.

@fruch fruch changed the title rull_scan interval are in minutes while run_full_partition_scan intervals are in seconds full_scan interval are in minutes while run_full_partition_scan intervals are in seconds Jul 3, 2022
@yarongilor
Contributor

@fruch, @fgelcer, why is it problematic? Why is it 'ASAP'? It was requested by @roydahan in the original PR.
@roydahan, can it be changed to a higher value, like 60 seconds?

@fgelcer
Contributor

fgelcer commented Jul 3, 2022


We may be overloading some tests with these high-frequency scans. And as @fruch says, there are a few reconnects in each round, which can impact the whole test/network.

A secondary (less important) change is to align the units used.

@yarongilor
Contributor

I don't think it ever caused a network issue or anything like that. The impact is quite negligible, since it uses only a single connection.
The interval format is different on purpose.
These are the only tests that cover reversed queries, and we don't currently have any other alternatives for testing them.
IMPORTANT NOTE: it would make more sense to reduce this frequency once a new stable scylla-bench version exists, since it should already include reversed-query support.
Until then, I think we can close this issue, unless we see some unwanted impact on tests.

@fruch
Contributor Author

fruch commented Jul 3, 2022


Who exactly is working on reversed-query support for scylla-bench?

And how is scylla-bench stability related to SCT choking Scylla with all those rapid full scans?

Also, it's not a single connection: it's 1 connection x (number of shards) each time a session is opened. And the issue isn't overloading the network but overloading the Scylla cluster, which in turn can cause scylla-bench queries to time out, as well as the full scans themselves (in some cases).
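A back-of-the-envelope Python sketch of the load described above: if each scan round opens about 3 sessions (the ~3 reconnects per round mentioned in the issue) and each session opens one connection per shard, the number of connections per hour scales inversely with the interval. The 8-shard figure is an assumed example, not a measurement from these tests, and `connections_per_hour` is a hypothetical helper.

```python
def connections_per_hour(interval_sec: int, sessions_per_round: int = 3, shards: int = 8) -> int:
    """Connections opened per hour, assuming one connection per shard per session.

    Ignores the time each scan round itself takes, so this is an upper bound.
    """
    rounds_per_hour = 3600 // interval_sec
    return rounds_per_hour * sessions_per_round * shards


# 2 s is the original setting; 2 min and 5 min are the values from the follow-up commits.
for interval in (2, 120, 300):
    print(f"{interval:>3}s interval -> {connections_per_hour(interval)} connections/hour")
```

Under these assumptions, moving from a 2-second to a 5-minute interval cuts the connection churn by a factor of 150.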

fruch added a commit to fruch/scylla-cluster-tests that referenced this issue Jul 3, 2022
…terval

Make `run_full_partition_scan.interval` 2 minutes instead of 2 seconds,
to avoid overloading Scylla during this use case.

Ref: scylladb#4960
@roydahan
Contributor

@yarongilor I think in the beginning you set it so low by mistake, and then we saw it could handle this, so we kept it.
I think it's OK now to reduce it so that it runs every few minutes.

Regarding the scylla-bench support, I don't remember what happened with this.
I recall that we got support for it, but maybe there was a bug in scylla-bench that caused it to crash, and we had to revert to the previous version?
Please revive this thread/issue.

@yarongilor
Contributor


@roydahan, correct, this is what happened.
So there is the following open s-b issue: scylladb/scylla-bench#90
And there are 2 tasks that are 'done':
https://trello.com/c/A93qT8XQ
https://trello.com/c/eaotAoI2

yarongilor added a commit to yarongilor/scylla-cluster-tests that referenced this issue Jul 19, 2022
	In order to decrease the connection-request load that the scan thread puts on the cluster,
	the intervals of 4 large-partition longevity tests are increased to 5 minutes.
	Fixes: scylladb#4960
@roydahan
Contributor

Did you see Dmitry's comment from February?
He said that s-b exited because it reached the error limit, not because of a coredump.

4 participants