Fix dual-channel replication test under valgrind (#904)
Test `dual-channel-replication primary gets cob overrun during replica
rdb load` fails during the Valgrind run. This is because the load
handlers disconnect before the test completes, so the primary's COB
stays too small to be overrun. Increasing the handlers' timeout should
resolve this issue.

Failure:
https://github.com/valkey-io/valkey/actions/runs/10361286333/job/28681321393

Server logs reveal that the load handler clients had disconnected
before the test started.

Also, the two previous tests took about 20 seconds, which is the
handlers' timeout.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
naglera and madolson committed Sep 3, 2024
1 parent b6a1104 commit 7fe17df
Showing 1 changed file with 6 additions and 6 deletions.
tests/integration/dual-channel-replication.tcl (12 changes: 6 additions, 6 deletions)
@@ -697,9 +697,9 @@ start_server {tags {"dual-channel-replication external:skip"}} {
set replica_log [srv 0 stdout]
set replica_pid [srv 0 pid]

-set load_handle0 [start_write_load $primary_host $primary_port 20]
-set load_handle1 [start_write_load $primary_host $primary_port 20]
-set load_handle2 [start_write_load $primary_host $primary_port 20]
+set load_handle0 [start_write_load $primary_host $primary_port 60]
+set load_handle1 [start_write_load $primary_host $primary_port 60]
+set load_handle2 [start_write_load $primary_host $primary_port 60]

$replica config set dual-channel-replication-enabled yes
$replica config set loglevel debug
@@ -709,22 +709,22 @@ start_server {tags {"dual-channel-replication external:skip"}} {
# Pause primary main process after fork
$primary debug pause-after-fork 1
$replica replicaof $primary_host $primary_port
-wait_for_log_messages 0 {"*Done loading RDB*"} 0 2000 1
+wait_for_log_messages 0 {"*Done loading RDB*"} 0 1000 10

# At this point rdb is loaded but psync hasn't been established yet.
# Pause the replica so the primary main process will wake up while the
# replica is unresponsive. We expect the main process to fill the COB and disconnect the replica.
pause_process $replica_pid
wait_and_resume_process -1
$primary debug pause-after-fork 0
-wait_for_log_messages -1 {"*Client * closed * for overcoming of output buffer limits.*"} $loglines 2000 1
+wait_for_log_messages -1 {"*Client * closed * for overcoming of output buffer limits.*"} $loglines 1000 10
wait_for_condition 50 100 {
[string match {*replicas_waiting_psync:0*} [$primary info replication]]
} else {
fail "Primary did not free repl buf block after sync failure"
}
resume_process $replica_pid
-set res [wait_for_log_messages -1 {"*Unable to partial resync with replica * for lack of backlog*"} $loglines 20000 1]
+set res [wait_for_log_messages -1 {"*Unable to partial resync with replica * for lack of backlog*"} $loglines 2000 10]
set loglines [lindex $res 1]
}
$replica replicaof no one
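The diff also reshapes the polling arguments of `wait_for_log_messages`. Assuming its last two arguments are the maximum number of retries and the per-retry delay in milliseconds (the delay being passed to Tcl's `after`), the worst-case wait, ignoring the time spent re-reading the log on each attempt, is roughly their product:

```latex
\begin{aligned}
t_{\max} &\approx \text{maxtries} \times \text{delay} \\
\text{Done loading RDB:} &\quad 2000 \times 1\,\text{ms} = 2\,\text{s} \;\to\; 1000 \times 10\,\text{ms} = 10\,\text{s} \\
\text{COB overrun:} &\quad 2000 \times 1\,\text{ms} = 2\,\text{s} \;\to\; 1000 \times 10\,\text{ms} = 10\,\text{s} \\
\text{lack of backlog:} &\quad 20000 \times 1\,\text{ms} = 20\,\text{s} \;\to\; 2000 \times 10\,\text{ms} = 20\,\text{s}
\end{aligned}
```

Under this reading, the first two waits get five times more headroom, while the third keeps the same 20 s budget but polls the log a tenth as often.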
