double cluster with schema changes test is too heavy #10177

soyacz · 2025-02-24T07:40:21Z

Packages

Scylla version: 2025.1.0~rc2-20250216.6ee17795783f with build-id 8fc682bcfdf0a8cd9bc106a5ecaa68dce1c63ef6

Kernel Version: 6.8.0-1021-aws

Issue description

This issue is a regression.
It is unknown if this issue is a regression.

This test looks too heavy for the DB cluster, especially on disk. I tried it with i4i.4xlarge and cluster usage was still very high (CPU usage was very similar to i3en.xlarge, but disk throughput was very high). Reposting my analysis:

The user profile sets the clustering key count to cluster: uniform(1..100). As a result, each insert operation writes about 50 rows on average; see the c-s result:

Results:
Op rate                   :      542 op/s  [insert: 542 op/s]
Partition rate            :      542 pk/s  [insert: 542 pk/s]
Row rate                  :   27,349 row/s [insert: 27,349 row/s]
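As a quick sanity check of the "about 50 rows per operation" claim, the mean of uniform(1..100) can be multiplied by the reported op rate and compared with the reported row rate. This is only a sketch of the arithmetic; the variable names are illustrative and the numbers are copied from the c-s results above.

```python
# Expected rows per operation when the clustering count is uniform(1..100).
low, high = 1, 100
mean_rows_per_op = (low + high) / 2           # mean of a discrete uniform range

op_rate = 542                                 # op/s, from the c-s results
expected_row_rate = op_rate * mean_rows_per_op

observed_row_rate = 27_349                    # row/s, from the c-s results
relative_gap = abs(expected_row_rate - observed_row_rate) / observed_row_rate

print(f"mean rows per op:  {mean_rows_per_op}")
print(f"expected row rate: {expected_row_rate:,.0f} row/s")
print(f"observed row rate: {observed_row_rate:,} row/s")
print(f"relative gap:      {relative_gap:.2%}")
```

The expected and observed row rates agree to well within 1%, which is consistent with each operation writing roughly 50 rows.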

Because the rows are big (5 columns, 1024 bytes each) and the table has 2 MVs and 2 indexes, this test is very hard on the disk. When the c-s error happened, there was around 950 MB/s of disk reads and 850 MB/s of disk writes, against io_properties limits of 3 GB/s for reads and 2 GB/s for writes (which I think compete with each other, along with IOPS). The loaders were still very quiet (~15% CPU load).

In my opinion the disk is too heavily loaded; we should reduce the column sizes or the row count per pk.
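To get a rough feel for how much each proposed reduction would help, the raw client payload can be estimated from the profile numbers. This is a back-of-the-envelope sketch, not a measurement: it counts only the 5 x 1024-byte column payload, ignores replication, MV and index updates, commitlog and compaction (so real disk traffic is far higher), and assumes the op rate stays at 542 op/s. The function name and the alternative values (512-byte columns, uniform(1..50)) are hypothetical examples, not values proposed in this issue.

```python
def payload_mb_per_s(op_rate, max_rows, columns, column_bytes):
    """Raw client payload in MB/s for a profile with cluster: uniform(1..max_rows)."""
    mean_rows = (1 + max_rows) / 2            # average rows written per operation
    return op_rate * mean_rows * columns * column_bytes / 1e6

OP_RATE = 542  # op/s from the c-s results; assumed unchanged in every scenario

# Current profile: uniform(1..100), 5 columns of 1024 bytes each.
baseline = payload_mb_per_s(OP_RATE, max_rows=100, columns=5, column_bytes=1024)
# Hypothetical option 1: halve the column size.
smaller_columns = payload_mb_per_s(OP_RATE, max_rows=100, columns=5, column_bytes=512)
# Hypothetical option 2: halve the upper bound of rows per pk.
fewer_rows = payload_mb_per_s(OP_RATE, max_rows=50, columns=5, column_bytes=1024)

for name, value in [
    ("baseline", baseline),
    ("512-byte columns (hypothetical)", smaller_columns),
    ("uniform(1..50) (hypothetical)", fewer_rows),
]:
    print(f"{name:<34} {value:7.1f} MB/s")
```

Under these assumptions both options roughly halve the raw payload, so either knob should scale the write load approximately linearly before amplification is taken into account.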

Impact

The test never passes, which makes it hard to track regressions with it (it has never passed in its history, due to overload or ENOSPC).

How frequently does it reproduce?

100%

Installation details

Cluster size: 3 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

raft-double-cluster-size-with-schem-db-node-558d26a2-5 (34.247.189.90 | 10.4.3.175) (shards: 14)
raft-double-cluster-size-with-schem-db-node-558d26a2-4 (3.251.77.18 | 10.4.2.136) (shards: 14)
raft-double-cluster-size-with-schem-db-node-558d26a2-3 (3.250.12.201 | 10.4.0.80) (shards: 14)
raft-double-cluster-size-with-schem-db-node-558d26a2-2 (34.248.47.253 | 10.4.1.137) (shards: 14)
raft-double-cluster-size-with-schem-db-node-558d26a2-1 (3.250.129.165 | 10.4.0.74) (shards: 14)

OS / Image: ami-00dd94413096f919a (aws: undefined_region)

Test: longevity-double-cluster-with-schema-changes-vnodes-test
Test id: 558d26a2-a8eb-4110-8dce-c7d95875dd8c
Test name: scylla-2025.1/vnodes/tier2/longevity-double-cluster-with-schema-changes-vnodes-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

longevity-change-cluster-size-by-2-times.yaml

Logs and commands

Restore Monitor Stack command: $ hydra investigate show-monitor 558d26a2-a8eb-4110-8dce-c7d95875dd8c
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 558d26a2-a8eb-4110-8dce-c7d95875dd8c

Logs:

db-cluster-558d26a2.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/558d26a2-a8eb-4110-8dce-c7d95875dd8c/20250220_171425/db-cluster-558d26a2.tar.zst
sct-runner-events-558d26a2.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/558d26a2-a8eb-4110-8dce-c7d95875dd8c/20250220_171425/sct-runner-events-558d26a2.tar.zst
sct-558d26a2.log.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/558d26a2-a8eb-4110-8dce-c7d95875dd8c/20250220_171425/sct-558d26a2.log.tar.zst
loader-set-558d26a2.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/558d26a2-a8eb-4110-8dce-c7d95875dd8c/20250220_171425/loader-set-558d26a2.tar.zst
monitor-set-558d26a2.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/558d26a2-a8eb-4110-8dce-c7d95875dd8c/20250220_171425/monitor-set-558d26a2.tar.zst
builder-558d26a2.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/558d26a2-a8eb-4110-8dce-c7d95875dd8c/upload_20250220_171558/builder-558d26a2.log.tar.gz

Jenkins job URL
Argus


github-actions bot assigned soyacz Feb 24, 2025

fruch assigned aleksbykov and temichus and unassigned soyacz Feb 24, 2025

temichus assigned timtimb0t Feb 24, 2025
