Loader node with scylla-bench v0.1.8 got a core dump #90

Open
yarongilor opened this issue Feb 8, 2022 · 3 comments

yarongilor commented Feb 8, 2022

Installation details
Kernel version: 5.11.0-1022-aws
Scylla version (or git commit hash): 4.6.rc5-0.20220203.5694ec189 with build-id f5d85bf5abe6d2f9fd3487e2469ce1c34304cc14
Cluster size: 4 nodes (i3en.3xlarge)
Scylla running with shards number (live nodes):
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-1 (16.170.220.3 | 10.0.3.180): 12 shards
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-2 (13.48.106.98 | 10.0.1.75): 12 shards
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-4 (13.51.193.35 | 10.0.3.6): 12 shards
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-5 (16.171.64.136 | 10.0.0.210): 12 shards
Scylla running with shards number (terminated nodes):
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-3 (16.170.157.129 | 10.0.3.67): 12 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-099a011bd5f16a168 (aws: eu-north-1)

Test: longevity-large-partition-4days-test
Test name: longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity
Test config file(s):

  • longevity-large-partition-4days.yaml

Issue description

====================================

Two loader nodes running scylla-bench v0.1.8 got 3 core dumps:

2022-02-04 19:12:38.443: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=808c9044-c851-4fe5-884a-c7217aa8d4c7 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-2 [16.170.143.136 | 10.0.3.155] (seed: False)
2022-02-04 19:30:02.330: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=39c8e89c-222a-4e89-bfef-a0ad19fe9903 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
2022-02-04 20:50:58.048: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=c3e3ce43-fa0c-4381-aa90-2bfd04b8eb7c node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
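The "two loader nodes, 3 core dumps" count can be checked against the event lines with a small script. This is only a sketch: the regular expression and the helper name `count_core_dumps` are assumptions made for illustration and are not part of SCT.

```python
import re
from collections import Counter

# The three CoreDumpEvent lines quoted above, copied verbatim.
EVENT_LOG = """\
2022-02-04 19:12:38.443: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=808c9044-c851-4fe5-884a-c7217aa8d4c7 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-2 [16.170.143.136 | 10.0.3.155] (seed: False)
2022-02-04 19:30:02.330: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=39c8e89c-222a-4e89-bfef-a0ad19fe9903 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
2022-02-04 20:50:58.048: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=c3e3ce43-fa0c-4381-aa90-2bfd04b8eb7c node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
"""

# Assumed pattern: capture the node name that follows "node=Node ".
CORE_DUMP_RE = re.compile(r"\(CoreDumpEvent Severity\.\w+\).*?node=Node (\S+) \[")


def count_core_dumps(log_text: str) -> Counter:
    """Return a Counter mapping node name to its number of CoreDumpEvent lines."""
    return Counter(m.group(1) for m in CORE_DUMP_RE.finditer(log_text))


counts = count_core_dumps(EVENT_LOG)
for node, n in counts.items():
    print(f"{node}: {n}")
```
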

It looks like SCT encountered a problem uploading the coredumps to S3:

< t:2022-02-04 19:30:02,331 f:file_logger.py  l:89   c:sdcm.sct_events.file_logger p:INFO  > 2022-02-04 19:30:02.330: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=39c8e89c-222a-4e89-bfef-a0ad19fe9903 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
< t:2022-02-04 19:30:33,724 f:coredump.py     l:389  c:sdcm.cluster_aws     p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: Failed to convert date 'Timestamp: Fri 2022-02-04 19:13:57 UTC (16min ago)' (Fri 2022-02-04 19:13:57 UTC), due to error: time data 'Fri 2022-02-04 19:13:57 UTC' does not match format '%a %Y-%m-%d %H:%M:%S %z'
< t:2022-02-04 19:30:33,725 f:coredump.py     l:220  c:sdcm.cluster_aws     p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: CoreDump[859] has inaccessible corefile, can't upload it
< t:2022-02-04 20:50:58,050 f:file_logger.py  l:89   c:sdcm.sct_events.file_logger p:INFO  > 2022-02-04 20:50:58.048: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=c3e3ce43-fa0c-4381-aa90-2bfd04b8eb7c node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
< t:2022-02-04 20:51:58,811 f:coredump.py     l:389  c:sdcm.cluster_aws     p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: Failed to convert date 'Timestamp: Fri 2022-02-04 20:35:03 UTC (16min ago)' (Fri 2022-02-04 20:35:03 UTC), due to error: time data 'Fri 2022-02-04 20:35:03 UTC' does not match format '%a %Y-%m-%d %H:%M:%S %z'
< t:2022-02-04 20:51:58,811 f:coredump.py     l:220  c:sdcm.cluster_aws     p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: CoreDump[6632] has inaccessible corefile, can't upload it

====================================

Restore Monitor Stack command: $ hydra investigate show-monitor e2adc2e9-28de-4aab-8dd3-5420deabc259
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs e2adc2e9-28de-4aab-8dd3-5420deabc259

Test id: e2adc2e9-28de-4aab-8dd3-5420deabc259

Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-longevity-large-partition-4days-test-scylla-per-server-metrics-nemesis-20220204_211840-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-overview-20220204_211619-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png
critical - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/critical-e2adc2e9.log.tar.gz
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/db-cluster-e2adc2e9.tar.gz
debug - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/debug-e2adc2e9.log.tar.gz
email_data - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/email_data-e2adc2e9.json.tar.gz
error - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/error-e2adc2e9.log.tar.gz
event - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/events-e2adc2e9.log.tar.gz
left_processes - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/left_processes-e2adc2e9.log.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/loader-set-e2adc2e9.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/monitor-set-e2adc2e9.tar.gz
normal - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/normal-e2adc2e9.log.tar.gz
output - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/output-e2adc2e9.log.tar.gz
event - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/raw_events-e2adc2e9.log.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/sct-e2adc2e9.log.tar.gz
summary - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/summary-e2adc2e9.log.tar.gz
warning - https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/warning-e2adc2e9.log.tar.gz

Jenkins job URL

@yarongilor changed the title from "Loader node with s-b v0.1.8 got a core dump" to "Loader node with scylla-bench v0.1.8 got a core dump" on Feb 8, 2022
@aleksbykov

The issue reproduced again with this job:
Installation details
Kernel version: 5.11.0-1028-aws
Scylla version (or git commit hash): 5.1.dev-0.20220209.5099b1e27 with build-id b0986550af32b8da96b50442a53423047ed91696
Cluster size: 4 nodes (i3en.2xlarge)
Scylla running with shards number (live nodes):
longevity-twcs-48h-master-db-node-1a3d093d-1 (34.248.68.217 | 10.0.2.75): 8 shards
longevity-twcs-48h-master-db-node-1a3d093d-4 (34.247.155.5 | 10.0.2.44): 8 shards
longevity-twcs-48h-master-db-node-1a3d093d-5 (3.250.79.246 | 10.0.3.237): 8 shards
longevity-twcs-48h-master-db-node-1a3d093d-6 (34.244.37.56 | 10.0.0.57): 8 shards
Scylla running with shards number (terminated nodes):
longevity-twcs-48h-master-db-node-1a3d093d-2 (54.229.206.88 | 10.0.3.42): 8 shards
longevity-twcs-48h-master-db-node-1a3d093d-3 (34.253.105.243 | 10.0.2.98): 8 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0f894d9df1a4e76fc (aws: eu-west-1)

Test: longevity-twcs-48h-test
Test name: longevity_twcs_test.TWCSLongevityTest.test_custom_time
Test config file(s):

  • longevity-twcs-48h.yaml

Issue description

====================================

2022-02-11 10:30:08.885: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=c9442f32-bde5-489c-991b-0a0de71e893d node=Node longevity-twcs-48h-master-loader-node-1a3d093d-3 [54.170.161.43 | 10.0.2.226] (seed: False)

2022-02-11 10:42:56.479: (ScyllaBenchEvent Severity.CRITICAL) period_type=end event_id=b1610c44-c0de-44cf-ae95-7915a0ffbf67 duration=6h10m36s: node=Node longevity-twcs-48h-master-loader-node-1a3d093d-1 [54.217.57.220 | 10.0.3.227] (seed: False)
stress_cmd=scylla-bench -workload=timeseries -mode=write -replication-factor=3 -partition-count=400 -clustering-row-count=10000000 -clustering-row-size=200 -concurrency=100 -rows-per-request=100 -start-timestamp=SET_WRITE_TIMESTAMP -connection-count 100 -max-rate 50000 --timeout 120s -duration=2880m -error-at-row-limit 1000
errors:

Stress command completed with bad status -1: 2022/02/11 04:37:47 EOF
2022/02/11 04:37:47 EOF
2022/02/11 04:37:47 EOF
2022/02/11 04:37:47 EOF
2022

====================================

Restore Monitor Stack command: $ hydra investigate show-monitor 1a3d093d-697d-40cc-8909-8949d48797b8
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 1a3d093d-697d-40cc-8909-8949d48797b8

Test id: 1a3d093d-697d-40cc-8909-8949d48797b8

Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/1a3d093d-697d-40cc-8909-8949d48797b8/20220211_105743/grafana-screenshot-longevity-twcs-48h-test-scylla-per-server-metrics-nemesis-20220211_105743-longevity-twcs-48h-master-monitor-node-1a3d093d-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/1a3d093d-697d-40cc-8909-8949d48797b8/20220211_110702/db-cluster-1a3d093d.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/1a3d093d-697d-40cc-8909-8949d48797b8/20220211_110702/loader-set-1a3d093d.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/1a3d093d-697d-40cc-8909-8949d48797b8/20220211_110702/monitor-set-1a3d093d.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/1a3d093d-697d-40cc-8909-8949d48797b8/20220211_110702/sct-runner-1a3d093d.tar.gz

Jenkins job URL

@KnifeyMoloko

Another reproduction:
Installation details
Kernel version: 5.11.0-1028-aws
Scylla version (or git commit hash): 5.1.dev-0.20220217.69fcc053b with build-id b8415b1ebbffff2b4183734680f4afab3bfed86d
Cluster size: 4 nodes (i3en.2xlarge)
Scylla running with shards number (live nodes):
longevity-twcs-48h-master-db-node-d3f0b0ff-1 (3.250.216.253 | 10.0.3.124): 8 shards
longevity-twcs-48h-master-db-node-d3f0b0ff-2 (34.242.112.161 | 10.0.2.63): 8 shards
longevity-twcs-48h-master-db-node-d3f0b0ff-4 (52.18.71.120 | 10.0.0.189): 8 shards
longevity-twcs-48h-master-db-node-d3f0b0ff-5 (54.194.249.252 | 10.0.0.178): 8 shards
Scylla running with shards number (terminated nodes):
longevity-twcs-48h-master-db-node-d3f0b0ff-3 (54.154.245.131 | 10.0.2.210): 8 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-041d8500e7cf30167 (aws: eu-west-1)

Test: longevity-twcs-48h-test
Test name: longevity_twcs_test.TWCSLongevityTest.test_custom_time
Test config file(s):

  • longevity-twcs-48h.yaml

Issue description

====================================

2022-02-18 16:16:29.292: (ScyllaBenchEvent Severity.CRITICAL) period_type=end event_id=506e1e7b-88aa-4cab-a6a4-45168f5bb513 duration=11h31m49s: node=Node longevity-twcs-48h-master-loader-node-d3f0b0ff-2 [18.202.227.20 | 10.0.1.29] (seed: False)
stress_cmd=scylla-bench -workload=timeseries -mode=read -partition-count=20000 -concurrency=100 -replication-factor=3 -clustering-row-count=10000000 -clustering-row-size=200  -rows-per-request=100 -start-timestamp=GET_WRITE_TIMESTAMP -write-rate 125 -distribution hnormal --connection-count 100 -duration=2880m -error-at-row-limit 1000
errors:
Stress command completed with bad status 1: 2022/02/18 04:44:50 EOF
2022/02/18 04:44:50 EOF
2022/02/18 04:44:50 EOF
2022/02/18 04:44:50 EOF
2022

====================================

Restore Monitor Stack command: $ hydra investigate show-monitor d3f0b0ff-3989-41f4-9eaf-a0f510ea5895
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs d3f0b0ff-3989-41f4-9eaf-a0f510ea5895

Test id: d3f0b0ff-3989-41f4-9eaf-a0f510ea5895

Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/d3f0b0ff-3989-41f4-9eaf-a0f510ea5895/20220218_162212/grafana-screenshot-longevity-twcs-48h-test-scylla-per-server-metrics-nemesis-20220218_162212-longevity-twcs-48h-master-monitor-node-d3f0b0ff-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/d3f0b0ff-3989-41f4-9eaf-a0f510ea5895/20220218_163047/db-cluster-d3f0b0ff.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/d3f0b0ff-3989-41f4-9eaf-a0f510ea5895/20220218_163047/loader-set-d3f0b0ff.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/d3f0b0ff-3989-41f4-9eaf-a0f510ea5895/20220218_163047/monitor-set-d3f0b0ff.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/d3f0b0ff-3989-41f4-9eaf-a0f510ea5895/20220218_163047/sct-runner-d3f0b0ff.tar.gz

Jenkins job URL

@dkropachev
Contributor

The reported issue is wrong: Scylla dropped a core, and s-b quit because it reached its error limit:

< t:2022-02-04 21:00:23,310 f:base.py         l:146  c:RemoteCmdRunner      p:ERROR > Error executing command: "/$HOME/go/bin/scylla-bench -workload=uniform -mode=read -replication-factor=3 -partition-count=60 -clustering-row-count=10000000 -clustering-row-size=2048 -rows-per-request=2000 -timeout=180s -concurrency=700 -max-rate=64000  -duration=5760m -connection-count 500 -error-at-row-limit 1000 -nodes 10.0.3.180"; Exit status: 1
< t:2022-02-04 21:00:23,321 f:base.py         l:148  c:RemoteCmdRunner      p:DEBUG > STDOUT: :          9m0.092137471s
< t:2022-02-04 21:00:23,321 f:base.py         l:148  c:RemoteCmdRunner      p:DEBUG >   95th:            9m0.092137471s
< t:2022-02-04 21:00:23,321 f:base.py         l:148  c:RemoteCmdRunner      p:DEBUG >   90th:            9m0.092137471s
< t:2022-02-04 21:00:23,321 f:base.py         l:148  c:RemoteCmdRunner      p:DEBUG >   median:  9m0.092137471s
< t:2022-02-04 21:00:23,321 f:base.py         l:148  c:RemoteCmdRunner      p:DEBUG >   mean:            8m51.248701013s
< t:2022-02-04 21:00:23,321 f:base.py         l:148  c:RemoteCmdRunner      p:DEBUG >
< t:2022-02-04 21:00:23,321 f:base.py         l:148  c:RemoteCmdRunner      p:DEBUG > Following critical errors where caught during the run:
< t:2022-02-04 21:00:23,321 f:base.py         l:148  c:RemoteCmdRunner      p:DEBUG >     Error limit (maxErrorsAtRow) of 1000 errors is reached

The reported core is not related to s-b.
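
When triaging similar runs, the error-limit exit can be told apart from a crash by searching the s-b output for the message shown in the log above. A rough sketch follows. The regular expression and the helper name `error_limit_reached` are assumptions for illustration and are not part of scylla-bench or SCT.

```python
import re
from typing import Optional

# Final line of the scylla-bench output quoted above (SCT log prefix removed).
SB_OUTPUT = "    Error limit (maxErrorsAtRow) of 1000 errors is reached"

# Assumed pattern, built from the message text in the log excerpt.
ERROR_LIMIT_RE = re.compile(
    r"Error limit \(maxErrorsAtRow\) of (\d+) errors is reached"
)


def error_limit_reached(output: str) -> Optional[int]:
    """Return the configured error limit if the run stopped on it, else None."""
    match = ERROR_LIMIT_RE.search(output)
    return int(match.group(1)) if match else None


print(error_limit_reached(SB_OUTPUT))
```

A return value of `None` means the message was not found, so the exit needs another explanation.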
