scylla-bench fails to reconnect after altering table #114
Comments
Maybe that isn't enough for this test case?
We face an issue with disconnections from the cluster after simple table modifications in scylla-bench (counters multidc test): scylladb/scylla-bench#114. It was proven to behave correctly when s-b is at version 0.1.3. This commit pins s-b for the counters multidc test.
I'm not sure; the disconnections sometimes persisted for 2 minutes. We would need to test it.
I tried with timeout settings like this:
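The exact settings tried were not preserved in this thread. For reference, this is a minimal sketch of how client-side timeouts and reconnection are typically configured through gocql's ClusterConfig; the hosts, keyspace, and all duration values below are illustrative assumptions, not the values from the original run:

```go
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	// Illustrative values only - not the settings used in the original run.
	cluster := gocql.NewCluster("10.0.0.1", "10.0.0.2")
	cluster.Keyspace = "scylla_bench"
	cluster.Timeout = 5 * time.Second           // per-query timeout
	cluster.ConnectTimeout = 5 * time.Second    // dial timeout for new connections
	cluster.ReconnectInterval = 1 * time.Second // how often downed hosts are retried
	cluster.ReconnectionPolicy = &gocql.ConstantReconnectionPolicy{
		MaxRetries: 30,
		Interval:   2 * time.Second,
	}

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("failed to create session: %v", err)
	}
	defer session.Close()
}
```

Whether values like these would ride out a disconnection that persists for around 2 minutes depends mainly on how many attempts the reconnection policy allows and how far apart they are spaced.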
While running a large-partitions test I encountered a similar problem. Not sure if it's tied to this, but it's a possibility. After the pre-write workload, when starting one of the stress workloads, we got:
Running the same job with a pinned version of scylla-bench (0.1.14) did not reproduce this issue. Similarly, a run without Raft did not fail at this point, so there might be some flakiness involved here.
Installation details
Kernel Version: 5.15.0-1026-aws
Cluster size: 4 nodes (i3en.3xlarge)
Scylla Nodes used in this run:
OS / Image:
Test:
Issue description
Logs:
@avelanarius, we suspect there is a regression, or at least a behavior change, in how s-b works for us with a later (latest?) gocql driver.
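One way to test that suspicion is to pin the driver to a known-good revision in scylla-bench's go.mod and bisect from there. The sketch below assumes the common pattern of a replace directive pointing at the scylladb gocql fork; the version tags are placeholders, not the revisions actually used in these runs:

```
// go.mod (sketch): version tags are placeholders, not the exact revisions from these runs.
module github.com/scylladb/scylla-bench

go 1.19

require github.com/gocql/gocql v1.2.0

// Pin the driver to a known-good tag of the scylladb gocql fork to bisect
// a suspected regression introduced by a newer driver revision.
replace github.com/gocql/gocql => github.com/scylladb/gocql v1.7.3
```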
The case was pinned to an older s-b because of scylladb/scylla-bench#114, but since then we implemented s-b retries that should fix most of the timeouts observed in the 2023.1 run. Ref: scylladb/scylla-bench#114
The case was pinned to an older s-b because of scylladb/scylla-bench#114, but since then we implemented s-b retries that should fix most of the timeouts observed in the 2023.1 run. Ref: scylladb/scylla-bench#114 (cherry picked from commit b1b3fe0)
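The retries mentioned in these commits live inside scylla-bench itself. As a rough illustration of the same idea at the driver level, gocql's built-in retry policies can be attached to the cluster config; the policy type and numbers below are illustrative assumptions, not necessarily what scylla-bench ships:

```go
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("10.0.0.1")
	cluster.Keyspace = "scylla_bench"
	// Retry failed queries with exponential backoff instead of failing the
	// workload on the first timeout. The policy and counts are illustrative.
	cluster.RetryPolicy = &gocql.ExponentialBackoffRetryPolicy{
		NumRetries: 5,
		Min:        100 * time.Millisecond,
		Max:        5 * time.Second,
	}

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("failed to create session: %v", err)
	}
	defer session.Close()
}
```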
scylla-bench failed with:
Packages
Scylla version:
Kernel Version:
Issue description
Describe your issue in detail and steps it took to produce it.
Impact
Describe the impact this issue causes to the user.
How frequently does it reproduce?
Describe the frequency with how this issue can be reproduced.
Installation details
Cluster size: 5 nodes (n2-highmem-16)
Scylla Nodes used in this run:
OS / Image:
Test:
Logs and commands
Logs:
I'm trying to understand whether it's a scylla-bench issue; it looks like a gocql issue to me.
It's probably because Scylla is slowing down; the internal queries might not have adequate timeouts set up. So, as always, it's a combination of a Scylla issue, how strict we want to be with timeouts, and how configurable those internal queries are.
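On the configurability point: with gocql, an individual query can be given its own deadline via a context, independent of the session-wide timeout. A minimal sketch, where the host, keyspace, and the 10-second value are assumptions for illustration:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("10.0.0.1")
	cluster.Keyspace = "scylla_bench"
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("failed to create session: %v", err)
	}
	defer session.Close()

	// Bound this single query with its own deadline, independent of the
	// session-wide cluster.Timeout. The 10s value is an illustrative choice.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	var count int
	if err := session.Query(`SELECT COUNT(*) FROM test_counters`).
		WithContext(ctx).Scan(&count); err != nil {
		log.Printf("query failed (possibly timed out): %v", err)
	}
}
```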
Installation details
Kernel Version: 5.15.0-1026-aws
Scylla version (or git commit hash):
5.2.0~dev-20221209.6075e01312a5
with build-id 0e5d044b8f9e5bdf7f53cc3c1e959fab95bf027c
Cluster size: 9 nodes (i3.2xlarge)
Scylla Nodes used in this run:
OS / Image:
ami-0b85d6f35bddaff65 ami-0a1ff01b931943772 ami-08e5c2ae0089cade3
(aws: eu-west-1)
Test:
longevity-counters-6h-multidc-test
Test id:
7785df01-a1fe-483a-beb7-2f63b9044b87
Test name:
scylla-master/raft/longevity-counters-6h-multidc-test
Test config file(s):
Issue description
The counters test in the multidc scenario is failing persistently after altering a table.
E.g. after running
ALTER TABLE scylla_bench.test_counters WITH bloom_filter_fp_chance = 0.45374057709882093
or ALTER TABLE scylla_bench.test_counters WITH read_repair_chance = 0.9;
or even ALTER TABLE scylla_bench.test_counters WITH comment = 'IHQS6RAYS5VQ6CQZYBYEX1GP';
After such changes, scylla-bench fails the test with an error:
Later it looks like the connection recovers, so the connection issues are not permanent, but it is enough to raise a critical failure that ends the test.
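A minimal way to reproduce this pattern outside the SCT harness would be a loop of reads with a schema change issued mid-run. The sketch below assumes the scylla_bench.test_counters table created by scylla-bench and its pk/ck key columns; the host and timings are illustrative:

```go
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	// Host, keyspace, and timings are illustrative; point this at a test cluster.
	cluster := gocql.NewCluster("10.0.0.1")
	cluster.Keyspace = "scylla_bench"
	cluster.Timeout = 5 * time.Second
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("failed to create session: %v", err)
	}
	defer session.Close()

	// Issue a harmless schema change in the background, mimicking the nemesis.
	go func() {
		time.Sleep(10 * time.Second)
		if err := session.Query(
			`ALTER TABLE scylla_bench.test_counters WITH comment = 'repro'`).Exec(); err != nil {
			log.Printf("ALTER TABLE failed: %v", err)
		}
	}()

	// Keep reading; in the failing runs the read errors persisted for a while
	// after the schema change before the connections recovered.
	for i := 0; i < 600; i++ {
		var pk, ck int64
		iter := session.Query(`SELECT pk, ck FROM test_counters LIMIT 1`).Iter()
		for iter.Scan(&pk, &ck) {
		}
		if err := iter.Close(); err != nil {
			log.Printf("read error: %v", err)
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```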
$ hydra investigate show-monitor 7785df01-a1fe-483a-beb7-2f63b9044b87
$ hydra investigate show-logs 7785df01-a1fe-483a-beb7-2f63b9044b87
Logs:
| 20221209_161654 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_161654/grafana-screenshot-longevity-counters-6h-multidc-test-scylla-per-server-metrics-nemesis-20221209_161803-longevity-counters-multidc-master-monitor-node-7785df01-1.png |
| 20221209_161654 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_161654/grafana-screenshot-overview-20221209_161654-longevity-counters-multidc-master-monitor-node-7785df01-1.png |
| 20221209_162553 | db-cluster | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_162553/db-cluster-7785df01.tar.gz |
| 20221209_162553 | loader-set | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_162553/loader-set-7785df01.tar.gz |
| 20221209_162553 | monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_162553/monitor-set-7785df01.tar.gz |
| 20221209_162553 | sct | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_162553/sct-runner-7785df01.tar.gz |
Jenkins job URL