Incorrect replica size after resource-definition auto-place #389

Open
0sorkon opened this issue Feb 15, 2024 · 11 comments

@0sorkon

0sorkon commented Feb 15, 2024

Hello,
This is probably expected behavior, but I would like to understand whether it is normal and how it can be avoided. The situation is as follows: a two-node LINSTOR storage cluster, version 1.25.1, with a storage pool on top of lvm-thin.

  1. Spawn the resource:
    linstor rg spawn MyRG testres1 10G
  2. Create a filesystem on it and mount it.
  3. Fill the resource with data:
    dd if=/dev/urandom of=/mnt/test/dd.file bs=1M count=5000 status=progress

So far so good:

| Node  | Resource | StoragePool | VolNr | MinorNr | DeviceName    | Allocated | InUse  | State    |
| Node1 | testres1 | DataPool    | 0     | 1027    | /dev/drbd1027 | 5.13 GiB  | InUse  | UpToDate |
| Node2 | testres1 | DataPool    | 0     | 1027    | /dev/drbd1027 | 5.13 GiB  | Unused | UpToDate |

Delete the resource from one of the nodes:
linstor r delete Node2 testres1
Run auto-place and wait for the sync to finish:
linstor rd ap testres1
...
linstor r lv -r testres1

| Node  | Resource | StoragePool | VolNr | MinorNr | DeviceName    | Allocated | InUse  | State    |
| Node1 | testres1 | DataPool    | 0     | 1027    | /dev/drbd1027 | 5.13 GiB  | InUse  | UpToDate |
| Node2 | testres1 | DataPool    | 0     | 1027    | /dev/drbd1027 | 10.00 GiB | Unused | UpToDate |

As we can see, the sizes of the two replicas do not match: on the second node the allocated size equals the full volume size. This can be fixed by running fstrim, but that is not always possible, especially in a virtualization environment where the guest filesystem does not support the discard option or does not issue it for other reasons. So the question is: what can I do in such a situation, and how can I avoid it?
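
For reference, the fstrim workaround mentioned above would look like this (assuming the /mnt/test mount point from the steps above):

    # ask the mounted filesystem to discard unused blocks so the thin LV can deallocate them
    fstrim -v /mnt/test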

@ghernadi
Contributor

What DRBD versions are in use here?

Also, could you please attach the output of dmesg | grep testres1 from both nodes? I'd like to know whether DRBD ran a full sync or not (but the actual DRBD version would still be interesting).

@0sorkon
Author

0sorkon commented Feb 15, 2024

DRBD version:

root@cloud:~# cat /proc/drbd
version: 9.2.6 (api:2/proto:86-122)

dmesg on Node1:

[Thu Feb 15 09:36:43 2024] drbd testres1: Starting worker thread (from drbdsetup [2328215])
[Thu Feb 15 09:36:43 2024] drbd testres1 Node2: Starting sender thread (from drbdsetup [2328220])
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: meta-data IO uses: blk-bio
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: disk( Diskless -> Attaching ) [attach]
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: Maximum number of peer devices = 7
[Thu Feb 15 09:36:43 2024] drbd testres1: Method to ensure write ordering: flush
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: drbd_bm_resize called with capacity == 20975152
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: resync bitmap: bits=2621894 words=286776 pages=561
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: size = 10 GB (10487576 KB)
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: recounting of set bits took additional 4ms
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: disk( Attaching -> UpToDate ) [attach]
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: attached to current UUID: BD93AC372CBCB7DA
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: Setting exposed data uuid: BD93AC372CBCB7DA
[Thu Feb 15 09:36:43 2024] drbd testres1/0 drbd1027: size = 10 GB (10487576 KB)
[Thu Feb 15 09:36:43 2024] drbd testres1 Node2: conn( StandAlone -> Unconnected ) [connect]
[Thu Feb 15 09:36:43 2024] drbd testres1 Node2: Starting receiver thread (from drbd_w_testres1 [2328216])
[Thu Feb 15 09:36:43 2024] drbd testres1 Node2: conn( Unconnected -> Connecting ) [connecting]
[Thu Feb 15 09:36:43 2024] drbd testres1 Node2: Handshake to peer 1 successful: Agreed network protocol version 122
[Thu Feb 15 09:36:43 2024] drbd testres1 Node2: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
[Thu Feb 15 09:36:43 2024] drbd testres1 Node2: Peer authenticated using 20 bytes HMAC
[Thu Feb 15 09:36:43 2024] drbd testres1: Preparing cluster-wide state change 224631394 (0->1 499/146)
[Thu Feb 15 09:36:44 2024] drbd testres1/0 drbd1027 Node2: drbd_sync_handshake:
[Thu Feb 15 09:36:44 2024] drbd testres1/0 drbd1027 Node2: self BD93AC372CBCB7DA:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[Thu Feb 15 09:36:44 2024] drbd testres1/0 drbd1027 Node2: peer BD93AC372CBCB7DA:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[Thu Feb 15 09:36:44 2024] drbd testres1/0 drbd1027 Node2: uuid_compare()=no-sync by rule=both-off
[Thu Feb 15 09:36:44 2024] drbd testres1: State change 224631394: primary_nodes=0, weak_nodes=0
[Thu Feb 15 09:36:44 2024] drbd testres1: Committing cluster-wide state change 224631394 (36ms)
[Thu Feb 15 09:36:44 2024] drbd testres1 Node2: conn( Connecting -> Connected ) peer( Unknown -> Secondary ) [connected]
[Thu Feb 15 09:36:44 2024] drbd testres1/0 drbd1027 Node2: pdsk( DUnknown -> UpToDate ) repl( Off -> Established ) [connected]
[Thu Feb 15 09:36:44 2024] drbd testres1/0 drbd1027 Node2: cleared bm UUID and bitmap BD93AC372CBCB7DA:0000000000000000:0000000000000000:0000000000000000
[Thu Feb 15 09:36:44 2024] drbd testres1 Node2: Preparing remote state change 3483582802
[Thu Feb 15 09:36:44 2024] drbd testres1 Node2: Committing remote state change 3483582802 (primary_nodes=2)
[Thu Feb 15 09:36:44 2024] drbd testres1 Node2: peer( Secondary -> Primary ) [remote]
[Thu Feb 15 09:36:44 2024] drbd testres1 Node2: Preparing remote state change 2987984735
[Thu Feb 15 09:36:44 2024] drbd testres1 Node2: Committing remote state change 2987984735 (primary_nodes=0)
[Thu Feb 15 09:36:44 2024] drbd testres1 Node2: peer( Primary -> Secondary ) [remote]
[Thu Feb 15 09:39:12 2024] drbd testres1: Preparing cluster-wide state change 1619258910 (0->-1 3/1)
[Thu Feb 15 09:39:12 2024] drbd testres1: State change 1619258910: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
[Thu Feb 15 09:39:12 2024] drbd testres1: Committing cluster-wide state change 1619258910 (0ms)
[Thu Feb 15 09:39:12 2024] drbd testres1: role( Secondary -> Primary ) [mkfs.ext4:2329613 auto-promote]
[Thu Feb 15 09:39:12 2024] drbd testres1/0 drbd1027: new current UUID: C127EDBE078D8DD7 weak: FFFFFFFFFFFFFFFC
[Thu Feb 15 09:39:12 2024] drbd testres1: Preparing cluster-wide state change 1011909654 (0->-1 3/2)
[Thu Feb 15 09:39:12 2024] drbd testres1: State change 1011909654: primary_nodes=0, weak_nodes=0
[Thu Feb 15 09:39:12 2024] drbd testres1: Committing cluster-wide state change 1011909654 (0ms)
[Thu Feb 15 09:39:12 2024] drbd testres1: role( Primary -> Secondary ) [mkfs.ext4:2329613 auto-demote]
[Thu Feb 15 09:39:40 2024] drbd testres1: Preparing cluster-wide state change 571488554 (0->-1 3/1)
[Thu Feb 15 09:39:40 2024] drbd testres1: State change 571488554: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
[Thu Feb 15 09:39:40 2024] drbd testres1: Committing cluster-wide state change 571488554 (0ms)
[Thu Feb 15 09:39:40 2024] drbd testres1: role( Secondary -> Primary ) [mount:2329931 auto-promote]
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: Preparing remote state change 1269461190
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: Committing remote state change 1269461190 (primary_nodes=1)
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: conn( Connected -> TearDown ) peer( Secondary -> Unknown ) [remote]
[Thu Feb 15 09:45:34 2024] drbd testres1/0 drbd1027 Node2: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: Terminating sender thread
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: Starting sender thread (from drbd_r_testres1 [2328236])
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: conn( TearDown -> Disconnecting ) [del-peer]
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: meta connection shut down by peer.
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: Connection closed
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: helper command: /sbin/drbdadm disconnected
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: helper command: /sbin/drbdadm disconnected exit code 0
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: conn( Disconnecting -> StandAlone ) [disconnected]
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: Terminating receiver thread
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: Terminating sender thread
[Thu Feb 15 09:45:34 2024] drbd testres1/0 drbd1027: bitmap WRITE of 560 pages took 8 ms
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: Starting sender thread (from drbdsetup [2336933])
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: conn( StandAlone -> Unconnected ) [connect]
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: Starting receiver thread (from drbd_w_testres1 [2328216])
[Thu Feb 15 09:45:34 2024] drbd testres1 Node2: conn( Unconnected -> Connecting ) [connecting]
[Thu Feb 15 09:45:35 2024] drbd testres1 Node2: conn( Connecting -> Disconnecting ) [del-peer]
[Thu Feb 15 09:45:35 2024] drbd testres1 Node2: Terminating sender thread
[Thu Feb 15 09:45:35 2024] drbd testres1 Node2: Starting sender thread (from drbd_r_testres1 [2336937])
[Thu Feb 15 09:45:35 2024] drbd testres1 Node2: Connection closed
[Thu Feb 15 09:45:35 2024] drbd testres1 Node2: helper command: /sbin/drbdadm disconnected
[Thu Feb 15 09:45:35 2024] drbd testres1 Node2: helper command: /sbin/drbdadm disconnected exit code 0
[Thu Feb 15 09:45:35 2024] drbd testres1 Node2: conn( Disconnecting -> StandAlone ) [disconnected]
[Thu Feb 15 09:45:35 2024] drbd testres1 Node2: Terminating receiver thread
[Thu Feb 15 09:45:35 2024] drbd testres1 Node2: Terminating sender thread
[Thu Feb 15 09:46:50 2024] drbd testres1 Node2: Starting sender thread (from drbdsetup [2340295])
[Thu Feb 15 09:46:50 2024] drbd testres1 Node2: conn( StandAlone -> Unconnected ) [connect]
[Thu Feb 15 09:46:50 2024] drbd testres1 Node2: Starting receiver thread (from drbd_w_testres1 [2328216])
[Thu Feb 15 09:46:50 2024] drbd testres1 Node2: conn( Unconnected -> Connecting ) [connecting]
[Thu Feb 15 09:46:50 2024] drbd testres1 Node2: Handshake to peer 1 successful: Agreed network protocol version 122
[Thu Feb 15 09:46:50 2024] drbd testres1 Node2: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
[Thu Feb 15 09:46:50 2024] drbd testres1 Node2: Peer authenticated using 20 bytes HMAC
[Thu Feb 15 09:46:50 2024] drbd testres1: Preparing cluster-wide state change 2809173707 (0->1 499/145)
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: drbd_sync_handshake:
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: self C127EDBE078D8DD6:BD93AC372CBCB7DA:0000000000000000:0000000000000000 bits:2621440 flags:20
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: peer BD93AC372CBCB7DA:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: uuid_compare()=source-use-bitmap by rule=bitmap-self
[Thu Feb 15 09:46:50 2024] drbd testres1: State change 2809173707: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
[Thu Feb 15 09:46:50 2024] drbd testres1: Committing cluster-wide state change 2809173707 (36ms)
[Thu Feb 15 09:46:50 2024] drbd testres1 Node2: conn( Connecting -> Connected ) peer( Unknown -> Secondary ) [connected]
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: pdsk( DUnknown -> Consistent ) repl( Off -> WFBitMapS ) [connected]
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: pdsk( Consistent -> Outdated ) [peer-state]
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: helper command: /sbin/drbdadm before-resync-source
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: helper command: /sbin/drbdadm before-resync-source exit code 0
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource ) [receive-bitmap]
[Thu Feb 15 09:46:50 2024] drbd testres1/0 drbd1027 Node2: Began resync as SyncSource (will sync 10485760 KB [2621440 bits set]).
[Thu Feb 15 09:48:38 2024] drbd testres1/0 drbd1027 Node2: updated UUIDs C127EDBE078D8DD6:0000000000000000:0000000000000000:0000000000000000
[Thu Feb 15 09:48:38 2024] drbd testres1/0 drbd1027 Node2: Resync done (total 107 sec; paused 0 sec; 97996 K/sec)
[Thu Feb 15 09:48:38 2024] drbd testres1/0 drbd1027 Node2: pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established ) [resync-finished]
[Thu Feb 15 10:08:05 2024] drbd testres1: Preparing cluster-wide state change 411664249 (0->-1 3/2)
[Thu Feb 15 10:08:05 2024] drbd testres1: State change 411664249: primary_nodes=0, weak_nodes=0
[Thu Feb 15 10:08:05 2024] drbd testres1: Committing cluster-wide state change 411664249 (0ms)
[Thu Feb 15 10:08:05 2024] drbd testres1: role( Primary -> Secondary ) [umount:2353949 auto-demote]
[Thu Feb 15 10:08:38 2024] drbd testres1 Node2: Preparing remote state change 2367328672
[Thu Feb 15 10:08:38 2024] drbd testres1 Node2: Committing remote state change 2367328672 (primary_nodes=2)
[Thu Feb 15 10:08:38 2024] drbd testres1 Node2: peer( Secondary -> Primary ) [remote]

dmesg on Node2:

[Thu Feb 15 09:29:35 2024] drbd testres1: Starting worker thread (from drbdsetup [3596474])
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: Starting sender thread (from drbdsetup [3596479])
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: meta-data IO uses: blk-bio
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: disk( Diskless -> Attaching ) [attach]
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: Maximum number of peer devices = 7
[Thu Feb 15 09:29:35 2024] drbd testres1: Method to ensure write ordering: flush
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: drbd_bm_resize called with capacity == 20975152
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: resync bitmap: bits=2621894 words=286776 pages=561
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: size = 10 GB (10487576 KB)
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: recounting of set bits took additional 4ms
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: disk( Attaching -> UpToDate ) [attach]
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: attached to current UUID: BD93AC372CBCB7DA
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: Setting exposed data uuid: BD93AC372CBCB7DA
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027: size = 10 GB (10487576 KB)
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: conn( StandAlone -> Unconnected ) [connect]
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: Starting receiver thread (from drbd_w_testres1 [3596475])
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: conn( Unconnected -> Connecting ) [connecting]
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: Handshake to peer 0 successful: Agreed network protocol version 122
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: Peer authenticated using 20 bytes HMAC
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: Preparing remote state change 224631394
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027 Node1: drbd_sync_handshake:
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027 Node1: self BD93AC372CBCB7DA:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027 Node1: peer BD93AC372CBCB7DA:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027 Node1: uuid_compare()=no-sync by rule=both-off
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: Committing remote state change 224631394 (primary_nodes=0)
[Thu Feb 15 09:29:35 2024] drbd testres1 Node1: conn( Connecting -> Connected ) peer( Unknown -> Secondary ) [remote]
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027 Node1: pdsk( DUnknown -> UpToDate ) repl( Off -> Established ) [remote]
[Thu Feb 15 09:29:35 2024] drbd testres1/0 drbd1027 Node1: cleared bm UUID and bitmap BD93AC372CBCB7DA:0000000000000000:0000000000000000:0000000000000000
[Thu Feb 15 09:29:36 2024] drbd testres1: Preparing cluster-wide state change 3483582802 (1->-1 3/1)
[Thu Feb 15 09:29:36 2024] drbd testres1: State change 3483582802: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[Thu Feb 15 09:29:36 2024] drbd testres1: Committing cluster-wide state change 3483582802 (0ms)
[Thu Feb 15 09:29:36 2024] drbd testres1: role( Secondary -> Primary ) [primary]
[Thu Feb 15 09:29:36 2024] drbd testres1: Preparing cluster-wide state change 2987984735 (1->-1 3/2)
[Thu Feb 15 09:29:36 2024] drbd testres1: State change 2987984735: primary_nodes=0, weak_nodes=0
[Thu Feb 15 09:29:36 2024] drbd testres1: Committing cluster-wide state change 2987984735 (0ms)
[Thu Feb 15 09:29:36 2024] drbd testres1: role( Primary -> Secondary ) [secondary]
[Thu Feb 15 09:32:04 2024] drbd testres1 Node1: Preparing remote state change 1619258910
[Thu Feb 15 09:32:04 2024] drbd testres1 Node1: Committing remote state change 1619258910 (primary_nodes=1)
[Thu Feb 15 09:32:04 2024] drbd testres1 Node1: peer( Secondary -> Primary ) [remote]
[Thu Feb 15 09:32:04 2024] drbd testres1/0 drbd1027 Node1: received new current UUID: C127EDBE078D8DD7 weak_nodes=FFFFFFFFFFFFFFFC
[Thu Feb 15 09:32:04 2024] drbd testres1 Node1: Preparing remote state change 1011909654
[Thu Feb 15 09:32:04 2024] drbd testres1 Node1: Committing remote state change 1011909654 (primary_nodes=0)
[Thu Feb 15 09:32:04 2024] drbd testres1 Node1: peer( Primary -> Secondary ) [remote]
[Thu Feb 15 09:32:32 2024] drbd testres1 Node1: Preparing remote state change 571488554
[Thu Feb 15 09:32:32 2024] drbd testres1 Node1: Committing remote state change 571488554 (primary_nodes=1)
[Thu Feb 15 09:32:32 2024] drbd testres1 Node1: peer( Secondary -> Primary ) [remote]
[Thu Feb 15 09:38:26 2024] drbd testres1: Preparing cluster-wide state change 1269461190 (1->0 496/16)
[Thu Feb 15 09:38:26 2024] drbd testres1: State change 1269461190: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFE
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: Cluster is now split
[Thu Feb 15 09:38:26 2024] drbd testres1: Committing cluster-wide state change 1269461190 (0ms)
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: conn( Connected -> Disconnecting ) peer( Primary -> Unknown ) [down]
[Thu Feb 15 09:38:26 2024] drbd testres1/0 drbd1027: disk( UpToDate -> Outdated ) [down]
[Thu Feb 15 09:38:26 2024] drbd testres1/0 drbd1027 Node1: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [down]
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: Terminating sender thread
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: Starting sender thread (from drbd_r_testres1 [3596497])
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: meta connection shut down by peer.
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: meta connection shut down by peer.
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: Connection closed
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: helper command: /sbin/drbdadm disconnected
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: helper command: /sbin/drbdadm disconnected exit code 0
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: conn( Disconnecting -> StandAlone ) [disconnected]
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: Terminating receiver thread
[Thu Feb 15 09:38:26 2024] drbd testres1 Node1: Terminating sender thread
[Thu Feb 15 09:38:26 2024] drbd testres1/0 drbd1027: disk( Outdated -> Detaching ) [down]
[Thu Feb 15 09:38:26 2024] drbd testres1/0 drbd1027: disk( Detaching -> Diskless ) [go-diskless]
[Thu Feb 15 09:38:26 2024] drbd testres1/0 drbd1027: drbd_bm_resize called with capacity == 0
[Thu Feb 15 09:38:26 2024] drbd testres1: Terminating worker thread
[Thu Feb 15 09:39:35 2024] drbd testres1: Starting worker thread (from drbdsetup [3608712])
[Thu Feb 15 09:39:35 2024] drbd testres1 Node1: Starting sender thread (from drbdsetup [3608717])
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: meta-data IO uses: blk-bio
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: disk( Diskless -> Attaching ) [attach]
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: Maximum number of peer devices = 7
[Thu Feb 15 09:39:35 2024] drbd testres1: Method to ensure write ordering: flush
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: drbd_bm_resize called with capacity == 20975152
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: resync bitmap: bits=2621894 words=286776 pages=561
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: size = 10 GB (10487576 KB)
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: recounting of set bits took additional 0ms
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: disk( Attaching -> UpToDate ) [attach]
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: attached to current UUID: BD93AC372CBCB7DA
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: Setting exposed data uuid: BD93AC372CBCB7DA
[Thu Feb 15 09:39:35 2024] drbd testres1/0 drbd1027: size = 10 GB (10487576 KB)
[Thu Feb 15 09:39:35 2024] drbd testres1 Node1: conn( StandAlone -> Unconnected ) [connect]
[Thu Feb 15 09:39:35 2024] drbd testres1 Node1: Starting receiver thread (from drbd_w_testres1 [3608713])
[Thu Feb 15 09:39:35 2024] drbd testres1 Node1: conn( Unconnected -> Connecting ) [connecting]
[Thu Feb 15 09:39:42 2024] drbd testres1 Node1: Handshake to peer 0 successful: Agreed network protocol version 122
[Thu Feb 15 09:39:42 2024] drbd testres1 Node1: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
[Thu Feb 15 09:39:42 2024] drbd testres1 Node1: Peer authenticated using 20 bytes HMAC
[Thu Feb 15 09:39:42 2024] drbd testres1 Node1: Preparing remote state change 2809173707
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: drbd_sync_handshake:
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: self BD93AC372CBCB7DA:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: peer C127EDBE078D8DD6:BD93AC372CBCB7DA:0000000000000000:0000000000000000 bits:2621440 flags:1020
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: uuid_compare()=target-use-bitmap by rule=bitmap-peer
[Thu Feb 15 09:39:42 2024] drbd testres1 Node1: Committing remote state change 2809173707 (primary_nodes=1)
[Thu Feb 15 09:39:42 2024] drbd testres1 Node1: conn( Connecting -> Connected ) peer( Unknown -> Primary ) [remote]
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027: disk( UpToDate -> Outdated ) [remote]
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT ) [remote]
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027: Setting exposed data uuid: C127EDBE078D8DD6
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: helper command: /sbin/drbdadm before-resync-target
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: helper command: /sbin/drbdadm before-resync-target exit code 0
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027: disk( Outdated -> Inconsistent ) [receive-bitmap]
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: repl( WFBitMapT -> SyncTarget ) [receive-bitmap]
[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: Began resync as SyncTarget (will sync 10485760 KB [2621440 bits set]).
[Thu Feb 15 09:41:30 2024] drbd testres1/0 drbd1027 Node1: Resync done (total 107 sec; paused 0 sec; 97996 K/sec)
[Thu Feb 15 09:41:30 2024] drbd testres1/0 drbd1027 Node1: updated UUIDs C127EDBE078D8DD6:0000000000000000:0000000000000000:0000000000000000
[Thu Feb 15 09:41:30 2024] drbd testres1/0 drbd1027: disk( Inconsistent -> UpToDate ) [resync-finished]
[Thu Feb 15 09:41:30 2024] drbd testres1/0 drbd1027 Node1: repl( SyncTarget -> Established ) [resync-finished]
[Thu Feb 15 09:41:30 2024] drbd testres1/0 drbd1027 Node1: helper command: /sbin/drbdadm after-resync-target
[Thu Feb 15 09:41:30 2024] drbd testres1/0 drbd1027 Node1: helper command: /sbin/drbdadm after-resync-target exit code 0
[Thu Feb 15 10:00:56 2024] drbd testres1 Node1: Preparing remote state change 411664249
[Thu Feb 15 10:00:56 2024] drbd testres1 Node1: Committing remote state change 411664249 (primary_nodes=0)
[Thu Feb 15 10:00:56 2024] drbd testres1 Node1: peer( Primary -> Secondary ) [remote]
[Thu Feb 15 10:01:30 2024] drbd testres1: Preparing cluster-wide state change 2367328672 (1->-1 3/1)
[Thu Feb 15 10:01:30 2024] drbd testres1: State change 2367328672: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[Thu Feb 15 10:01:30 2024] drbd testres1: Committing cluster-wide state change 2367328672 (0ms)
[Thu Feb 15 10:01:30 2024] drbd testres1: role( Secondary -> Primary ) [mount:3632705 auto-promote]

@ghernadi
Contributor

I see. Let me explain a bit about how DRBD works:

DRBD has two "initial sync" modes: a "full sync", which simply sends all the data to the new peer, and a "partial sync". The latter is what you expect - only the changes are sent to the new peer.

However, the point you are missing here is what counts as a change.

  2. Create a filesystem on it and mount it.

If you create the FS with default settings, mkfs will discard the whole device. For DRBD, that already counts as a change. Even though the FS may trim afterwards, freeing the space on the thinly backed disk again, DRBD has already marked the entire disk as changed.
This can be seen in this line:

[Thu Feb 15 09:39:42 2024] drbd testres1/0 drbd1027 Node1: peer C127EDBE078D8DD6:BD93AC372CBCB7DA:0000000000000000:0000000000000000 bits:2621440 flags:1020

Those 2621440 bits (each bit covering 4 KiB) match the 10G capacity. So DRBD saw every single block change since its creation.
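
The arithmetic matches the "will sync 10485760 KB" lines in the dmesg output above:

    2621440 bits × 4 KiB/bit = 10485760 KiB = 10 GiB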

That is a rough summary of why your partial-sync behaves like a full-sync.

You can avoid this issue by creating the FS without the initial discard, using mkfs.xfs -K or mkfs.ext4 -E nodiscard.
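
For example (a sketch; substitute the DRBD device from your setup):

    # ext4: skip the initial discard so DRBD does not mark every block as changed
    mkfs.ext4 -E nodiscard /dev/drbd1027
    # XFS equivalent
    mkfs.xfs -K /dev/drbd1027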

Another thing you can check is whether the property DrbdOptions/Disk/rs-discard-granularity is properly set on the volume-definition level (volume-definition list-properties). If not, please check whether DrbdOptions/auto-rs-discard-granularity is set to False - that could be on any level (VD, RD, VG, RG, C). If it is set to False on any level, removing it should give you the DrbdOptions/Disk/rs-discard-granularity property back.
This property instructs DRBD to convert zero-block syncs into discard operations, so that no new space is allocated on the target.
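
A sketch of how to check both (resource and volume number from this thread; the exact subcommand abbreviations may vary with the client version):

    # property on the volume definition (volume 0 of testres1)
    linstor volume-definition list-properties testres1 0 | grep -i discard
    # look for an overriding auto-rs-discard-granularity, e.g. on the controller level
    linstor controller list-properties | grep -i discard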

@0sorkon
Author

0sorkon commented Feb 15, 2024

Thank you for the explanation.
The mkfs command was just an example; it does not affect the result, because in practice a cloned VM image is stored on the DRBD device and no mkfs is involved in deploying it.
I also checked: the auto-rs-discard-granularity option does not appear anywhere, and DrbdOptions/Disk/rs-discard-granularity is set to 1048576 (apparently the default value) on all resources, including the test resource.
Do I understand correctly that, with this value, a 1048576-byte block that is all zeros is skipped during synchronization? If so, what should the value be in my situation?

@0sorkon
Author

0sorkon commented Feb 15, 2024

Also, on the volume-definition I see the option DrbdOptions/Disk/discard-zeroes-if-aligned set to yes. Shouldn't that make a difference?

@ghernadi
Contributor

Do I understand correctly that, with this value, a 1048576-byte block that is all zeros is skipped during synchronization? If so, what should the value be in my situation?

Yes, but only in combination with this property:

┊ DrbdOptions/Disk/discard-zeroes-if-aligned ┊ yes              ┊

(I just wanted to ask you to confirm that, and you already did :) )

Can you also verify whether those values exist in the corresponding .res file on all diskful nodes?
If so, please also run drbdadm -vvv adjust <resource_name> and see whether it tries (and fails?) to set some rs-discard-granularity property.

@0sorkon
Author

0sorkon commented Feb 15, 2024

Yes, these values are present in the .res file:

root@cloud:~# grep discard /var/lib/linstor.d/testres1.res
discard-zeroes-if-aligned yes;
rs-discard-granularity 1048576;
discard-zeroes-if-aligned yes;
rs-discard-granularity 1048576;

but the command drbdadm -vvv adjust testres1 does nothing: no output or log entries on either node, and the allocated size is still the same.

@ghernadi
Contributor

The drbdadm adjust was not supposed to change the allocated size, but to check whether the sync is configured properly. No output means that everything in DRBD is configured as stated in the config file, which is usually good - but then I am not entirely sure why the sync still synced all blocks instead of discarding roughly half of the disk.

Did you run the drbdadm -vvv adjust testres1 command on both nodes?

@0sorkon
Author

0sorkon commented Feb 16, 2024

Yes, I executed the command on both nodes.
Can you reproduce this problem? Maybe if it can be reproduced in an environment different from mine, we can learn more about why it is happening.

@ghernadi
Contributor

That was my thought as well, so I tested this with the same LINSTOR and DRBD versions as yours, but could not reproduce the issue, regardless of whether I used -E nodiscard when creating the filesystem.

Can you compare your setup with my test? Or feel free to do it the other way around and show me the full reproducer on your side, including the following at the end:

  • drbdsetup show testres1 (from both nodes)
  • lvs (from both nodes)
  • lsblk -D (from both nodes)

(Feel free to filter the output of the commands to the relevant parts)
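
lsblk -D is interesting here because its DISC-GRAN/DISC-MAX columns show whether discards are supported at each layer of the stack (a sketch; device name taken from this thread):

    # a zero DISC-GRAN/DISC-MAX on a layer means discards are not passed through there
    lsblk -D /dev/drbd1027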

My test:
linstor c sp DrbdOptions/auto-add-quorum-tiebreaker False
linstor n c bravo
linstor n c charlie

# tested on both nodes with no specific "--chunk-size" and also with different values, always with the same result
ssh root@bravo lvcreate --size 5G -T scratch/thin --chunk-size 2M
ssh root@charlie lvcreate --size 5G -T scratch/thin --chunk-size 2M

linstor n l
linstor sp c lvmthin bravo lvmthinpool scratch/thin
linstor sp c lvmthin charlie lvmthinpool scratch/thin
linstor rd c rsc
linstor vd c rsc 2G
linstor r c bravo rsc -s lvmthinpool
linstor r c charlie rsc -s lvmthinpool

# again, tested with and without "-E nodiscard", in both cases with the same result
ssh root@bravo mkfs.ext4 /dev/drbd1000
ssh root@bravo mkdir -p /mnt/ext4
ssh root@bravo mount /dev/drbd1000 /mnt/ext4
ssh root@bravo dd if=/dev/urandom bs=1M count=1024 of=/mnt/ext4/test oflag=direct
linstor --no-utf8 --no-color v l -a
+----------------------------------------------------------------------------------------------------+
| Node    | Resource | StoragePool | VolNr | MinorNr | DeviceName    | Allocated | InUse  |    State |
|====================================================================================================|
| bravo   | rsc      | lvmthinpool |     0 |    1000 | /dev/drbd1000 |  1.11 GiB | InUse  | UpToDate |
| charlie | rsc      | lvmthinpool |     0 |    1000 | /dev/drbd1000 |  1.11 GiB | Unused | UpToDate |
+----------------------------------------------------------------------------------------------------+

linstor r d charlie rsc
linstor rd ap rsc
linstor --no-utf8 --no-color v l -a
+--------------------------------------------------------------------------------------------------------+
| Node    | Resource | StoragePool | VolNr | MinorNr | DeviceName    | Allocated | InUse  |        State |
|========================================================================================================|
| bravo   | rsc      | lvmthinpool |     0 |    1000 | /dev/drbd1000 |  1.11 GiB | InUse  |     UpToDate |
| charlie | rsc      | lvmthinpool |     0 |    1000 | /dev/drbd1000 |  3.90 MiB | Unused | Inconsistent |
+--------------------------------------------------------------------------------------------------------+

ssh root@bravo drbdadm wait-sync rsc
linstor --no-utf8 --no-color v l -a
+----------------------------------------------------------------------------------------------------+
| Node    | Resource | StoragePool | VolNr | MinorNr | DeviceName    | Allocated | InUse  |    State |
|====================================================================================================|
| bravo   | rsc      | lvmthinpool |     0 |    1000 | /dev/drbd1000 |  1.11 GiB | InUse  | UpToDate |
| charlie | rsc      | lvmthinpool |     0 |    1000 | /dev/drbd1000 |  1.02 GiB | Unused | UpToDate |
+----------------------------------------------------------------------------------------------------+

sleep 3.0s
linstor --no-utf8 --no-color v l -a
+----------------------------------------------------------------------------------------------------+
| Node    | Resource | StoragePool | VolNr | MinorNr | DeviceName    | Allocated | InUse  |    State |
|====================================================================================================|
| bravo   | rsc      | lvmthinpool |     0 |    1000 | /dev/drbd1000 |  1.11 GiB | InUse  | UpToDate |
| charlie | rsc      | lvmthinpool |     0 |    1000 | /dev/drbd1000 |  1.02 GiB | Unused | UpToDate |
+----------------------------------------------------------------------------------------------------+

@0sorkon
Author

0sorkon commented Feb 20, 2024

This is strange, but after doing a fresh deployment in a test environment, I was unable to reproduce the problem, even though all deployments were done in exactly the same manner.
