Latest:
------
For even more detail, use "git log" or visit
https://github.com/LINBIT/drbd-9.0/commits/drbd-9.0
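Each release header in this file has the form
"version (api:NAME/proto:MIN-MAX/transport:N)". A minimal Python sketch of
parsing such a line, for example in tooling that scans this ChangeLog; the
names parse_release_header and ReleaseHeader are illustrative assumptions and
not part of DRBD:

```python
import re
from typing import NamedTuple, Optional


class ReleaseHeader(NamedTuple):
    """Fields of one release header line (names chosen for illustration)."""
    version: str
    api: str
    proto_min: int
    proto_max: int
    transport: int


# Matches e.g. "9.0.27-1 (api:genl2/proto:86-118/transport:14)"
_HEADER_RE = re.compile(
    r"^(?P<version>\d+\.\d+\.\d+-\d+)\s+"
    r"\(api:(?P<api>[^/]+)/proto:(?P<pmin>\d+)-(?P<pmax>\d+)"
    r"/transport:(?P<transport>\d+)\)$"
)


def parse_release_header(line: str) -> Optional[ReleaseHeader]:
    """Return the parsed header, or None if the line is not a release header."""
    m = _HEADER_RE.match(line.strip())
    if m is None:
        return None
    return ReleaseHeader(
        version=m["version"],
        api=m["api"],
        proto_min=int(m["pmin"]),
        proto_max=int(m["pmax"]),
        transport=int(m["transport"]),
    )
```

Lines that are not release headers (bullet entries, underlines) yield None, so
the function can be applied to every line of the file.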
9.0.27-1 (api:genl2/proto:86-118/transport:14)
--------
* Fix regression: allow live migration between two diskful peers again
9.0.26-1 (api:genl2/proto:86-118/transport:14)
--------
* fix a source of possible data corruption; related to a resync and
a primary node that is connected to the sync-source node only
* fix for writes not getting mirrored over a connection while the primary
transitions through the WFBitMapS state
* complete size 0 reads immediately; some workloads (KVM and
iscsi targets) in combination with a ZFS zvol as the backend can lead to
a kernel OOPS in ZFS; this is a workaround in DRBD for that
* fix a crash if during resync a discard operation fails on the
resync-target node
* fix a case of a disk unexpectedly becoming Outdated by moving the
exchange of the initial packets into the body of the two-phase-commit
that happens at a connect
* fix for sporadic "Clearing bitmap UUID for node" log entries;
a potential source of problems later on leading to false split-brain
or unrelated data messages.
* retry connect properly in case of bitmap-uuid changes during the handshake
* completed missing logic of the new two-phase-commit based connect process;
avoid connecting partitions with a primary in each; ensure consistent
decisions if the connect attempt will be retried
* fix an unexpected occurrence of NetworkFailure state in a tight
drbdsetup disconnect; drbdsetup connect sequence
* fix online verify to return to Established from VerifyS if the VerifyT node
was temporarily Inconsistent during the run
* fix a corner case where a node ends up Outdated after the crash and rejoin
of a primary node
* pause a resync if the sync-source node becomes inconsistent; an example
is a cascading resync where the upstream resync aborts and leaves the
sync-source node for the downstream resync with an inconsistent disk;
note, the node at the end of the chain could still have an outdated disk
(better than inconsistent)
* reduce lock contention on the secondary for many resources; can improve
performance significantly
* fix online verify to not clamp disk states to UpToDate
* fix promoting resync-target nodes; the problem was that it could modify
the bitmap of an ongoing resync; which leads to alarming log messages
* allow force primary on a sync-target node by breaking the resync
* fix adding of new volumes to resources with a primary node
* reliably detect split brain situation on both nodes
* improve error reporting for failures during attach
* implement 'blockdev --setro' in DRBD
* following upstream changes to DRBD up to Linux 5.10 and ensure
compatibility with Linux 5.8, 5.9, and 5.10
9.0.25-1 (api:genl2/proto:86-117/transport:14)
--------
* fix a missed resync after regaining quorum if a node gets disconnected
while (or shortly before) losing quorum
* fix a race condition between receiving UUIDs and finishing a resync
that can lead to a false-positive split-brain detection later on
* fix access after free of peer_req objects, that only happened when
a resync target node is paused sync source at the same time
* fix abortion of local state changes in case they can not proceed due
to loss of connection
* fix corner cases with reconciliation resync and parallel promote
* fix a race that can lead to a stuck resync when a resync goes
into pause (e.g. because the resync source becomes a resync target
from another node)
* fix an issue establishing a connection when the multipath feature is
used to connect to a stacked resource without a dedicated service IP
* fix a peer-disk state to another resync-target sometimes staying Outdated
after two resyncs from the same sync source node finish
* fix transition to paused-resync-state when another connection continues
its resync operation
* fix an (unlikely) deadlock while establishing a connection
* deactivate the kref_debug code, it has performance implications
* Introduce the "disconnected" handler; it receives the last connection
state in the environment variable DRBD_CSTATE
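The "disconnected" handler entry above says the handler receives the last
connection state in DRBD_CSTATE. A minimal sketch of such a handler script in
Python; only the variable name DRBD_CSTATE comes from this ChangeLog, while the
function name, the message text, and the fallback value "unknown" are
assumptions made for illustration:

```python
#!/usr/bin/env python3
import os
import sys
from typing import Mapping


def describe_disconnect(env: Mapping[str, str]) -> str:
    """Build a log message from the handler environment.

    DRBD_CSTATE is the variable named in the ChangeLog entry; falling back
    to "unknown" when it is missing is an assumption of this sketch.
    """
    cstate = env.get("DRBD_CSTATE", "unknown")
    return f"DRBD connection lost; last connection state: {cstate}"


if __name__ == "__main__":
    # Write to stderr so the message ends up wherever the caller logs it.
    print(describe_disconnect(os.environ), file=sys.stderr)
```

Keeping the logic in a function that takes the environment as a mapping makes
the script testable without a running DRBD.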
9.0.24-1 (api:genl2/proto:86-117/transport:14)
--------
* fix deadlock when connecting drbd-9 to drbd-8.4 and the drbd-9
side becomes sync-source
* fix an issue with 3 (or more) node configurations; with a diskless node
and two storage nodes; if one of the storage nodes was hard rebooted
and came back and the diskless got primary and did not issue write
requests and the returning storage node established a connection with
the surviving storage node first, DRBD failed to upgrade the disk
state to UpToDate after the resync
* detect split-brain situations also when both nodes are primary;
this is how it was in drbd-8.4; up to now drbd-9 did not realize
the split-brain since it complains about the not allowed dual
primary first; for this change a new protocol version was necessary
* verified it compiles with Linux 5.7
9.0.23-1 (api:genl2/proto:86-116/transport:14)
--------
* fix a deadlock (regression introduced in 9.0.22) that can happen when
new current UUID is generated while a connection gets established
* Do not create a new data generation if the node has
'allow-remote-read = no' set, is primary, and the local disk fails
(because it has no access to good data anymore)
* fix a deadlock (regression introduced in 9.0.22) that can be
triggered if a minor is added into a resource with an established
connection
* generate new UUID immediately if a primary loses a disk due to an IO
error
* fix read requests on diskless nodes that hit a read error on a
diskful node; the retry on another diskful node works, but a bug
could lead to a log-storm on the diskless node
* fix removal of diskless nodes from clusters with quorum enabled
(initiated from the diskless itself)
* fix wrongly declined state changes if connections are established
concurrently
* fix continuation of initial resync; before that the initial resync
always started from the beginning if it was interrupted
* use rwsem _non_owner() operations to avoid false positives of
lock-dep when running on a debug kernel
* fix a sometimes missed resync if only a diskless node was primary
since the day0 UUID
* fix a corner case where a SyncSource node does not recognise
that a SyncTarget node declared the resync as finished
* update compat up to Linux 5.6
9.0.22-1 (api:genl2/proto:86-116/transport:14)
--------
* introduce locking to avoid connection retries when UUIDs or
relevant flags change during the exchange of this data
* improve serialization of events after losing a primary
* fix a constraint in sanitize state that could cause a promote to be
declined by some other node
* fix a case of a false positive detection of a split brain condition
* allow a resync target to switch to the resync source with less
bits out of sync
* fix bitmap UUID after resync to use current UUID from self rather
than sync source
* fix pushing bitmap UUID into history when changed
* fix regression introduced with 9.0.20, that can cause a missed
resync after a reconciliation resync
* fix regression introduced with 9.0.20, that can cause a missed
resync after a promote on a diskless node
* fix UUID handling in case a node promotes during (a short)
reconciliation resync
* fix removing of a diskless node when quorum is enabled
9.0.21-1 (api:genl2/proto:86-116/transport:14)
--------
* fix compat for write same on Linux 4.9; this affected Debian users
* fix kernel compat for Linux 4.8 and 4.9; this mainly affected Debian
users; The symptoms were slow resync and resync getting stuck always at
the same point
* enable resync of lost and re-created backing devices (think lost node) when
the backing device was thinly provisioned and its current uuid is pre-set
to a 'day0 UUID' (by LINSTOR); that works by copying an unused bitmap slot
which tracks all changes since day 0
* fix attach when bitmap is on PMEM; before it was set to
'all blocks out-of-sync' upon attach
* avoid doing reconciliation resync multiple times by updating the
resync target's dagtag after it completed successfully
* return disk-state from Outdated to UpToDate when it loses connection
while in WFBitMapT and we have a stable and UpToDate peer
* new option --force-resync flag can be passed to new-current-uuid, that
can be used to trigger initial resync without touching the role
9.0.20-1 (api:genl2/proto:86-115/transport:14)
--------
* fix a case of false split brain detection if a diskless node promotes
multiple times, by aligning the rules for generating a new current-UUID
on a diskless node with those on a node with disk
* check if we still have quorum by exchanging a drbd-ping with peers
before creating new current UUID after losing one peer
* fix after weak handling to not interfere with reconciliation resyncs
* retry connect when one of the relevant flags changes during UUID exchange
* fix reconciliation resync if one of the secondaries got a current-UUID update
* fix resync to make progress after it was paused by another resync operation
* fix false split-brains when a resync source changes current-UUID during
resync operation
* fix restore of D_OUTDATED when the resource first only attached and
then the peer objects are created (in contrast to the usual, new-peer,
attach, connect)
* abort creating new current-UUID when writing to meta-data fails in
the moment where the new-current-UUID should be written
* removed DRBD marking itself as read-only when secondary; this flag
was exposed using the BLKROGET ioctl; that should be left to user-land
use; recent KVM checks that flag, and does not dare auto-promote when
set by DRBD
* fix a small memory-leak when creating peer devices
* fix a possible information leak of kernel memory that pads net-link packets
* completing implications of "allow-remote-read=no"; i.e. when not to
create a new-current-UUID as read-write access to the data set was lost;
also fail both reads and writes if reads are no longer possible
* new option value "rr-conflict=retry-connect"; that helps in scenarios with
quorum where stopping a service takes longer than a temporary network
outage and DRBD's reconnect
* code cleanups, introduced enums for remaining magic numbers
* new kernel-backward-compatibility framework based on spatch/coccinelle,
replacing an unmaintainable moloch of C preprocessor hell; Merged the
complete kernel-compat submodule
* ships with pre-computed compat-patches for main distros' kernels; in case
another kernel is found it tries to use local spatch, if that is not
installed the build process tries to use a LINBIT hosted web service
to create the compat patch ("spatch-as-a-service").
* compat with up to Linux-5.3-rc2
9.0.19-1 (api:genl2/proto:86-115/transport:14)
--------
* check on CAP_SYS_ADMIN instead of CAP_NET_ADMIN for certain operations
* fix detection of unstable resync
* fix possible stuck resync when resync started from another secondary
and later continued from a primary
* fix NULL dereference with disk-timeout enabled; was introduced in 9.0.9
* retry connect when own current UUID changes during UUID exchange
* fix quorum tie-breaker diskless logic for settings other than "majority"
* disable quorum tie-breaker for 0 voters
* fix dax_direct_access() error return check
* fix resync stuck at near completion; bug was introduced with 9.0.17
* unblock IO when on-quorum-lost policy is changed (suspend -> io-error)
* introduce allow-remote-read configuration option; set it to "no" for
DR links you only want to write, but never read
* only complain about UUID changes during initial handshake
9.0.18-1 (api:genl2/proto:86-115/transport:14)
--------
* Fix an IO deadlock under memory pressure
* Fix disconnect timing in case the network connection suddenly
drops all packets
* Fix some misbehavior that surfaced with Ahead/Behind
* Fix potential spinlock deadlock in IRQ
* Minor fixes: forget-peer, _rcu iterators
* Quickly stop resync during AHEAD/BEHIND by introducing new
packet for that purpose.
* The quorum feature can now use the connectivity to
the majority of Diskless nodes as tiebreaker
* Access meta-data using DAX if it is on persistent memory
(NVDIMM or PMEM); For write intense workloads this is a x2 to x4 speedup!
9.0.17-1 (api:genl2/proto:86-114/transport:14)
--------
* Fix UUID handling: a diskless primary that has no peer with
usable data may not touch the current UUID
* Fix resync-after dependencies; cross-resource dependencies
and missing resources
* Fix resync when the sync source suddenly connects to a more recent
data set via another connection and becomes sync target on that
other connection; pause first resync; fix wrong display of negative
resync progress percentage in this case
* Fix volume numbers between 32767 and 65534
* Fix the data integrity implementation; it was broken since drbd-9.0
and reported only false positives
* Fix for a corner-case when a promote action happens concurrently with
a reconciliation resync
* Improve resync code to be able to fully utilize fast storage
backend devices and fast networks with resync traffic; as a side
effect the settling time of the resync controller got shorter in
most cases
* Show in the user-visible message who the opener is if demote/down
fails due to someone holding a drbd device open
* docker file for a "load drbd module container" and allow to disable
user-mode-helpers, which is necessary for this container
* compat for v5.0 kernel
9.0.16-1 (api:genl2/proto:86-114/transport:14)
--------
* Fix regression (introduced with 9.0.15) in handling request timeouts;
all pending requests always considered as overdue when the timer function
was executed; this led to false positives in detecting timeouts
* Fix a possible distributed loop when establishing a connection
* Fix a corner case where a resync "overtakes" another one
* Fix clearing of the PRIMARY_LOST_QUORUM flag
* Check peers (to ensure quorum is not lost) before generating new current
UUID after losing a node
* In case the locally configured address of a connection is not
available keep on retrying until it comes back
9.0.15-1 (api:genl2/proto:86-114/transport:14)
--------
* fix tracking of changes (on a secondary) against the lost disk of a
primary and also fix re-attaching in case the disk is replaced (has
new meta-data)
* fix live migrate of VMs on DRBD when migrated to/from diskless
nodes; before that fix a race condition can lead to one of the nodes
seeing the other one as consistent only
* fix an IO deadlock in DRBD when the activity log on a secondary runs full;
In the real world, this was very seldom triggered but can be easily
reproduced with a workload that touches one block every 4M and writes
them all in a burst
* fix hanging demote after IO error followed by attaching the disk again
and the corresponding resync
* fix DRBD dropping connection after an IO error on the secondary node
* new module parameter to disable support for older protocol versions,
and in case you configured peers that are not expected to connect, it
might have positive effects because then this node does not need to
assume that such a peer is ancient
* improve details when online changing devices from diskless to with disk and
vice versa. (Including peers freeing bitmap slots)
* remove no longer relevant compat tests
* expose openers via debugfs; that helps to answer the question why does
DRBD not demote to secondary, why does it tell me "Device is held
open by someone"
* optimize IO submit code path; this can improve IOPs up to 30% on a system
with fast backend storage; lowers CPU load caused by DRBD on every workload
* compat for v4.18 kernel
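The activity-log deadlock entry in this release describes its reproducer as a
workload that touches one block every 4M and writes them all in a burst. A
minimal Python sketch that only computes the write offsets of such a pattern;
reading "4M" as 4 MiB and using a 4 KiB block size are assumptions of this
sketch, and the function name burst_offsets is hypothetical:

```python
from typing import List

STRIDE = 4 * 1024 * 1024  # "every 4M", read here as 4 MiB (assumption)
BLOCK = 4096              # size of each touched block (assumption)


def burst_offsets(device_size: int, stride: int = STRIDE) -> List[int]:
    """Return byte offsets of one block per stride that fit on the device."""
    if device_size < BLOCK:
        return []
    # The last offset must still leave room for a full block.
    return list(range(0, device_size - BLOCK + 1, stride))
```

A test harness would then submit a write of BLOCK bytes at each offset without
waiting in between; that part is left out here because it depends on the test
setup.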
9.0.14-1 (api:genl2/proto:86-113/transport:14)
--------
* fix regression in 9.0.13: call after-split-brain-recovery handlers;
before this fix, no auto-recovery strategies (not even the default:
disconnect) would be applied, nodes would stay connected and all nodes
would try to become the source of the resync.
* fix spurious temporary promotion failure: if after Primary loss
failover happened too quickly, transparently retry internally.
* fixup recently introduced P_ZEROES to actually work as intended
* fix online-verify to account for skipped blocks; otherwise, it won't
notice that it has finished, apparently being stuck near "100% done"
* expose more resync and online-verify statistics and details
* improve accounting of "in-flight" data and resync requests
* allow taking down an already useless minor device during "down",
even if it is (temporarily) opened by for example udev scanning
* fix for a node staying "only" Consistent and not returning to UpToDate
in certain scenarios when fencing is enabled
* fix data generation UUID propagate during resync
* compat for upstream kernels up to v4.17
9.0.13-1 (api:genl2/proto:86-113/transport:14)
--------
* abort a resync if a resync source becomes weakly connected and the
sync target is a neighbor of the primary; the lack of doing so was
a possible source of data corruption
* fix UUID handling with multiple diskless nodes; If the primary role
is moved between them, and no write happens before the storage
nodes are disconnected; before this fix the storage nodes would outdate
themselves upon reconnect
* When a data-set gets into contact (attach or connect) with an all
diskless cluster with a primary and the exposed UUID does not match
the arriving data-set, make sure to either set it to "Consistent"
or to reject the attach
* correctly handle when a node that was marked as intentional diskless
should get a disk; allocate bitmap slots when the --bitmap=no flag
gets removed; reject peers to attach if they are marked with --bitmap=no
* fix outdating of weakly connected nodes; It was broken when an already
primary node joins the cluster at the other end
* made returning from Ahead to SyncSource more reliable; the old code
may have missed the event if the write to the local backend was still
pending when the barrier-ack comes in
* fix a hard to trigger deadlock in the receiver; it triggered sometimes
on the Secondary if a resync was going on and writes on the primary
happen to the same area while the connection is interrupted; it caused
the device to be stuck in "NetworkFailure" state
* fix online resize in the presence of two or more diskless nodes
* fix online add of volumes to diskless nodes when it already has
established connections
* Set the SO_KEEPALIVE socket option on data sockets. Can be important
if long lived DRBD connections go through a firewall with connection
tracking
* automatically solve a specific split brain when quorum is enabled
and a node does no IO between losing connections to other nodes
* Compat: Drop support for kernels older than 2.6.32 and distros older than
RHEL6; Added support for kernels up to v4.15.x
* new wire packet P_ZEROES a cousin of P_DISCARD, following the kernel
as it introduced separated BIO ops for writing zeros and discarding
* compat workaround for two RHEL 7.5 idiosyncrasies regarding refcount_t
and struct nla_policy
9.0.12-1 (api:genl2/proto:86-112/transport:14)
--------
* Fix a race condition in the device_open code path that can cause an
internal counter to go negative; It only triggered on Ubuntu/Debian
systems since the udev there uses FMODE_NDELAY when opening the
devices; The effect was that such a device fails attempts to remove it
9.0.11-1 (api:genl2/proto:86-112/transport:14)
--------
* Fix bug in compat code: Without this fix large bios are not split.
The user visible behavior is then a hanging DRBD.
9.0.10-1 (api:genl2/proto:86-112/transport:14)
--------
* Fix resync of two secondary nodes in the presence of a 3rd node that is
primary (maybe with disk or diskless); Fixed the race condition that
caused the resync to sometimes not terminate
* Improve connection behavior with auto-promote enabled, one node Primary,
and udev present. The problem was that if udev opens the device on the
Secondary side "in the right moment" the connection attempt was
aborted. Fixed that by waiting until udev closes the device
again. Improves connect speed!
* Fix in memory alignment of DRBD's struct bio. It was offset by one due to
buggy compat code. Only affects architectures that choke on unaligned
word accesses, i.e. Power and ARM, not x86_64.
* Improve the quorum implementation, so that it works nicely for the
purpose of replacing fencing with quorum in a Pacemaker setup. Quorum
lost affects the completion status of writes in flight; Quorum state is
visible to user-space; a new meta-data flag quorum-lost to handle a
corner case
* Ensure compatibility with upstream Linux kernel 4.14
9.0.9-1 (api:genl2/proto:86-112/transport:14)
--------
* fix occasionally forgotten resyncs in installations where
diskless primaries are present. The bug triggers when a storage
node is re-integrated, and it happens to connect to the diskless
primary first; This bug is severe, since it might cause inconsistent
data read back on the diskless primary!
* fix an issue that causes unexpected split-brain situations upon
connect. This issue triggers only when one of the nodes has a
node_id bigger than 3
* in a cluster with a diskless primary, when a server goes away,
and is not outdated, outdate it upon reconnect. This gets done
when its current UUID does not match the diskless primary's
exposed data UUID; with this bug present it can lead to
inconsistent data presented on the diskless primary node to
readers
* fix update of exposed data UUID on diskless primaries. It could
lead to false reject of further diskful secondaries that
want to join
* fix a possible OOPS in a debug message regarding bitmap
locking
* fix discard bigger than 1MiB; The bug causes disconnect with
bigger discard requests
* fix left over bits in bitmap on SyncSource after resync; the
issue was triggered by write requests that come in while the
resync starts
* fix peers becoming unexpectedly displayed as D_OUTDATED at the
end of a resync; While the disk state on the node stays D_UP_TO_DATE
* fix a race between auto promote and auto demote of multiple volumes
in a single resource; The symptom was that a process opening
the /dev/drbdX for read-write gets an -EROFS errno
* Speed up down of many resources by using call_rcu() instead
of synchronize_rcu()
* Make it compatible with the soon to be released 4.13 kernel
9.0.8-1 (api:genl2/proto:86-112/transport:14)
--------
* fix a race condition between adding connections and receiving data
blocks; when hit, it caused an OOPS
* fix an OOPS on a diskful node when a request from a diskless
node causes an error from the back-end device. (This is a regression that
was introduced with the 9.0.7 release)
* fix a distributed deadlock when doing a discard/write-same burst
* fix an issue with diskless nodes adopting wrong current UUIDs
* fix wrongly rejected two-phase-state transactions; this issue was a
cause for drbdmanage init failing for 3 nodes
* fix attach of first disk after resource had no disk at all
* fix to not clear bitmap content and UUIDs in case we ignored a
split-brain situation
* fix a possible OOPS, caused when the sender thread tries to send
something after the sockets were free()'ed
* fix ignoring split-brain situations to the rare cases it was
intended to in the first place
* fix an issue with current-uuid not getting written for 5 seconds. When a
resource was down'd/up'd or the node rebooted, you got split brain
afterwards
* fix a race that caused DRBD to go StandAlone instead of Connecting
after losing a connection
* Serialize reconciliation resync after the nodes found it if they
are UpToDate
* fix initial resync, triggered by "--force primary"; this is a fix
for a regression introduced in 9.0.7
* new configuration option that allows to report back IO errors instead
of freezing IO in case a partition loses quorum
* Speed-up AL-updates with bio flags REQ_META and REQ_PRIO
* merged changes from 8.4.10 and with that compatibility with Linux-4.12
9.0.7-1 (api:genl2/proto:86-112/transport:14)
--------
* various fixes to the 2-phase-commit online resize
* fix fencing and disk state transition from Consistent; Now it is
as complete as it was under drbd-8.4; Necessary for crm-fence-peer
* fix moving previous current-uuid into bitmap slot for peer with
inconsistent disk state. Old code produced false "split-brain detected"
* fix calculation of authoritative nodes. Old code could lead to
false "split-brain detected"
* udev workaround: Do not fail down if there is a read-only opener, instead
wait up to a second if the read-only-opener goes away.
* a primary notes if a "far away" node outdates itself, and records the
fact in its meta-data.
* report back to userspace if a node (or peer) is diskless by intention
* restore "Outdated" peer disk state from the meta-data upon attach
* Quorum to avoid data divergence, an alternative to fencing when the
resource has 3 or more replicas
* improve error reporting in case a promote/down/disconnect/detach
operation fails. E.g. tells which peer declined a state transition
* compiles with Linux 4.10
9.0.6-1 (api:genl2/proto:86-112/transport:14)
--------
* fixed error handling in del_path
* merge fixes from drbd-8.4; a potential NULL deref with kernel_sendmsg()
* fix multiple issues with concurrent two-phase-commits
* improve connect time; with older releases it usually took between 20 and
40 seconds to establish a connection; now it usually takes less than
2 seconds
* allow multiple updates per AL-transaction on a secondary; this can
(depending on the IO pattern) dramatically increase the write
throughput and decrease IO amplification
* reorganize data structures of receiver for efficiency
* upon an open for read-only access, wait up to the auto-promote-timeout
for accessible data
9.0.5-1 (api:genl2/proto:86-112/transport:14)
--------
* fix a bug that causes data inconsistency between mirrors. The bug triggered
if you write on a diskless primary while a resync is going on between two
nodes with disk.
It might happen that the diskless primary chooses the not-updated mirror for
subsequent reads of those blocks, then this bug manifests as data corruption.
* no longer allocate bitmap-slots on diskful nodes for nodes that are
configured to be diskless; Saves bitmap-slots, thus allows you to use
smaller max-peers numbers, thus saves memory and CPU resources
* fix bugs in the try_become_up_to_date() logic; they caused hanging
machines or hangs during state transitions
* fix an IO deadlock that could happen when two secondaries are in a resync
and a write on a third node (primary) is sent to the two resyncing nodes
* fix two phase-commits when the nodes form a circular structure
* fix support for WRITE_SAME. Was broken since merged from 8.4
* balance read requests from diskless primary nodes among multiple peers
(if available). In the past it read always from one node (that happened
to be first in the internal data structures)
* make sure it compiles with Linux 4.8
9.0.4-1 (api:genl2/proto:86-112/transport:14)
--------
* fix enforcement of single-primary constraint; a regression of 9.0.3
* fix resetting of ko_count
* fix the sometimes seen two-phase-commit ABORT endless loop
* fix a possible sleep in atomic with no longer taking locks in _destroy
functions
* fix two-phase-commits when the nodes build a loop; correctly handle
them where the two "flows" merge, both for PREPARE and COMMIT
* fix the resync after online grow
* fix a refcount leak in case a user-process terminates while doing
a .dumpit operation (drbdsetup state; drbdsetup show; drbdsetup events2...)
The damage of the bug was a resource that never goes away
* fix a multi-node issue in comparing the GIs
* fix data-structure handling for RCU consumers (adding connections)
* reimplement online resize as two-phase-commit operation; the old online resize
code could get into packet-loop storms, had many problems; works when all nodes
support protocol 112
* enable kref-debug code by default
* enable parallel_ops for gennetlink operations; that is a workaround for
a "sleep while atomic" on kernels 4.3 and upwards; it is a performance
improvement for all; especially users of Open vSwitch will rejoice
* avoid "initial packet S crossed"
* a number of fixes to speed up establishing of connections
* remove some home made mutex deadlocks
(auto promote - down; del_resource - new_path)
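The online-resize entry in this release notes that the new implementation
works when all nodes support protocol 112, and each release header lists a
supported protocol range such as proto:86-112. A minimal Python sketch of
checking such a requirement across nodes; it assumes that peers settle on the
highest protocol version supported by all of them, which is an assumption made
for illustration and not taken from the DRBD source. The function names are
hypothetical:

```python
from typing import Iterable, Optional, Tuple

ProtoRange = Tuple[int, int]  # (min, max), e.g. (86, 112) for "proto:86-112"


def common_protocol(ranges: Iterable[ProtoRange]) -> Optional[int]:
    """Highest protocol version inside every range, or None if there is none."""
    ranges = list(ranges)
    if not ranges:
        return None
    lowest_allowed = max(r[0] for r in ranges)
    highest_allowed = min(r[1] for r in ranges)
    if lowest_allowed > highest_allowed:
        return None  # the ranges do not overlap
    return highest_allowed


def supports_feature(ranges: Iterable[ProtoRange], required: int) -> bool:
    """True if all nodes can agree on at least the required protocol version."""
    agreed = common_protocol(ranges)
    return agreed is not None and agreed >= required
```

For example, a 9.0.4 node (86-112) paired with a 9.0.3 node (86-111) yields a
common version of 111, so a feature requiring protocol 112 would not be
available in that cluster under the stated assumption.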
9.0.3-1 (api:genl2/proto:86-111/transport:14)
--------
* fix a deadlock in try_become_up_to_date()
* fix an unintended overlay of an internal peer device flag and a device flag
The bug manifested itself that sometimes DRBD forgot to create a new
current UUID
* Create a new current UUID when the peer's disk breaks, do not wait
for a following write
* In case of an auto-promote event with no up-to-date data present
also wait for a connection with an up-to-date data peer for the time
of the auto-promote-timeout
* mark permanently diskless nodes in the meta-data of all nodes
* rework online resizing
* fix a race condition that triggered a BUG() because it called
add_timer() twice for the same timer object
* fix multiple causes of outdated disks not becoming up-to-date
after no longer being weakly connected
* removed a use of an uninitialized value that might lead to an
unexpected outdate of the local disk upon disconnect
* found and fixed why sometimes a P_TWOPC_ABORT packet did not reach
all nodes of a cluster; that was the reason for hanging two-phase
commits
* Allow multiple diskless primaries if they are connected to a common
secondary that has a disk
* empty flush requests no longer trigger a bogus "IO ERROR" log entry
* restored the new-current-uuid --clear-bitmap functionality to skip
the initial resync
9.0.2-1 (api:genl2/proto:86-111/transport:14)
--------
* Fix list walks on the transfer log; they led to crashes in _tl_restart()
and (sometimes) other places.
* Fix a corner case that led to nodes showing the disk states of
other nodes as Outdated after resync
* Fix waiting for barrier acks; the bug was that demoting to secondary
after AHEAD/BEHIND might block forever
* Fix WRITE_SAME support
* Fix a possible OOPS in got_peer_ack()
* Fix resync controller; it was always on c-min-rate
* Fix a possible deadlock on disconnect
* Use modern kernels' explicit plugging to reduce the number of IOPs on
the secondary node
* Introduce a per resource max-io-depth limit; applies to each volume
in the resource
* Promptly send peer_acks if we can put the last reference on an
activity log entry. Helps to get the AL usage on secondaries
down.
9.0.1-1 (api:genl2/proto:86-111/transport:14)
--------
* Fix ahead/behind mode: spinlock deadlock when entering and resync afterwards
* Fix crash in online verify
* Fix crash when receiving pages on a diskless node
* Fix issues with resize when more than two nodes are present
* Many minor fixes
* Work on the multi-path support; the TCP transport now tries all paths while
connecting, afterwards it uses one path
* The RDMA transport supports multiple paths and can use them in parallel
* New configuration option to make resync friendly to thinly
provisioned storage: rs-discard-granularity
* Make sure it compiles with Linux 4.4
* Merge changes from the 8.4.7 release
* Fix a performance regression of 8.4.0; it got worse with bigger devices;
really significant above 1TByte
9.0.0-3 (api:genl2/proto:86-110/transport:10)
--------
* Fixes for the RDMA transport
* Fixes for 8.4 compatibility
* Merge changes from the 8.4.6 release
9.0.0 (api:genl2/proto:86-110)
--------
* split drbdsetup connect into new-peer, add-path, connect;
split disconnect into del-path, disconnect and del-peer
* create a new configuration object 'peer-device'; move config
params of resync to the peer-device
* a transport abstraction layer that will allow for alternative
transports besides TCP. Planned are RDMA/InfiniBand and SCTP
* moved the user space code to a dedicated repository (with
that the driver no longer uses autotools)
* drbdsetup events2 supersedes events; drbdsetup status supersedes
cat /proc/drbd; both commands are also available with drbd 8.4.6
and recent drbd-utils
* Fixed the wait-[connect|sync] drbdadm commands. Now they actually
work on all three object types (resources, connections, volumes)
* New command called "forget-peer". It is used to free a peer-device
slot. Online (via drbdsetup) or offline (via drbdmeta)
* Introduced two-phase commits that span all (directly and
indirectly) connected nodes. This new type of transaction
is used when a node becomes primary, or when a new connection
is established.
* Reworked resync decisions for multiple connections, and weakly
connected nodes.
* Support for multiple connections in a single resource; this feature
obsoletes device stacking; new configuration keywords: connection,
host, node-id, connection-mesh, hosts.
* Automatic promote; if a process opens a drbd device for r/w, drbd
tries to promote that resource to primary; if the last process
closes the device, drbd demotes that resource to secondary; the open
fails if promotion fails. This feature obsoletes become-primary-on
* Activity log striping; Besides the striping this allows AL
sizes of up to 65536.
* Non blocking queuing of AL-updates; This change significantly
improves the number of IOPs in case the workload does not fit into
the configured AL size.
* Resync extents are now 128 MiByte instead of 16 MiByte
* Use git submodules to share code between drbd core, user-space
tools and drbd transports
* a dedicated disk state for Detaching (previously Failed was
used for a detaching disk)