Merge branch 'release/2.4' into mjmac/DAOS-13236-13471

Required-githooks: true
mjmac committed Jul 17, 2023
2 parents 3fbd6d7 + 3cad544 commit 71a5bd8
Showing 145 changed files with 3,240 additions and 2,137 deletions.
16 changes: 14 additions & 2 deletions debian/changelog
@@ -1,8 +1,20 @@
-daos (2.3.108-2) unstable; urgency=medium
+daos (2.3.108-4) unstable; urgency=medium
   [Michael MacDonald]
   * Add golang-go as a tests dependency for dfuse/daos_build.py
 
- -- Michael MacDonald <mjmac.macdonald@intel.com>  Thu, 29 Jun 2023 10:10:00 -0400
+ -- Michael MacDonald <mjmac.macdonald@intel.com>  Mon, 17 Jul 2023 10:10:00 -0400
 
+daos (2.3.108-3) unstable; urgency=medium
+  [ Wang Shilong ]
+  * Remove lmdb-devel for MD on SSD
+
+ -- Wang Shilong <shilong.wang@intel.com>  Thu, 13 Jul 2023 22:44:00 +0800
+
+daos (2.3.108-2) unstable; urgency=medium
+  [ Li Wei ]
+  * Update raft to 0.10.1-1408.g9524cdb
+
+ -- Li Wei <wei.g.li@intel.com>  Wed, 28 Jun 2023 10:38:00 +0900
+
 daos (2.3.108-1) unstable; urgency=medium
   [ Jeff Olivier ]
3 changes: 1 addition & 2 deletions debian/control
@@ -29,10 +29,9 @@ Build-Depends: debhelper (>= 10),
 libboost-dev,
 libspdk-dev (>= 22.01.2),
 libipmctl-dev,
-libraft-dev (= 0.9.1-1401.gc18bcb8),
+libraft-dev (= 0.10.1-1408.g9524cdb),
 python3-tabulate,
 liblz4-dev,
-liblmdb-dev,
 libcapstone-dev
 Standards-Version: 4.1.2
 Homepage: https://docs.daos.io/
2 changes: 2 additions & 0 deletions docs/admin/env_variables.md
@@ -35,6 +35,8 @@ Environment variables in this section only apply to the server side.
 |----------------------|-----------|
 |RDB\_ELECTION\_TIMEOUT|Raft election timeout used by RDBs in milliseconds. INTEGER. Default to 7000 ms.|
 |RDB\_REQUEST\_TIMEOUT |Raft request timeout used by RDBs in milliseconds. INTEGER. Default to 3000 ms.|
+|RDB_LEASE_MAINTENANCE_GRACE|Raft grace period of leadership lease maintenance used by RDBs in milliseconds. INTEGER. Default to 7000 ms. If a Raft leader is unable to maintain leadership leases from a majority for more than RDB_ELECTION_TIMEOUT + RDB_LEASE_MAINTENANCE_GRACE, it steps down voluntarily.|
+|RDB_USE_LEASES|Whether RDBs shall use Raft leadership leases, instead of RPCs, to verify leadership. BOOL. Default to true. Rafts track leadership leases regardless; this environment variable essentially controls whether RDBs use Raft leadership leases to improve RDB TX performance.|
 |RDB\_COMPACT\_THRESHOLD|Raft log compaction threshold in applied entries. INTEGER. Default to 256 entries.|
 |RDB\_AE\_MAX\_ENTRIES |Maximum number of entries in a Raft AppendEntries request. INTEGER. Default to 32.|
 |RDB\_AE\_MAX\_SIZE |Maximum total size in bytes of all entries in a Raft AppendEntries request. INTEGER. Default to 1 MB.|
4 changes: 2 additions & 2 deletions docs/admin/hardware.md
@@ -35,9 +35,9 @@ validated on a regular basis.
 An RDMA-capable fabric is preferred for best performance.
 The DAOS data plane relies on [OFI libfabric](https://ofiwg.github.io/libfabric/)
 and supports OFI providers for Ethernet/tcp and InfiniBand/verbs.
-Starting with a Technology Preview in DAOS 2.2, [UCX](https://www.openucx.org/)
+[UCX](https://www.openucx.org/)
 is also supported as an alternative network stack for DAOS.
-Refer to [UCX Fabric Support (DAOS 2.2 Technology Preview)](./ucx.md)
+Refer to [UCX Fabric Support](./ucx.md)
 for details on setting up DAOS with UCX support.
 
 DAOS supports multiple network interfaces on the servers
18 changes: 18 additions & 0 deletions docs/admin/md-on-ssd.md
@@ -0,0 +1,18 @@
+# Metadata on SSD Phase1 (Technology Preview)
+
+DAOS Version 2.4 includes a Technology Preview of the
+_Metadata-on-SSD (Phase1)_
+code path to support DAOS servers without Intel Optane Persistent Memory.
+
+Please refer to the DAOS Wiki articles on the
+[Metadata-on-SSD Design](https://daosio.atlassian.net/wiki/spaces/DC/pages/11196923911/Metadata+on+SSDs)
+and the
+[WAL Detailed Design](https://daosio.atlassian.net/wiki/spaces/DC/pages/11215339529/WAL+Detailed+Design)
+for more information.
+
+A presentation on this new code path,
+including initial performance comparisons of DAOS Servers with and without PMem,
+can be found in the presentation
+[DAOS Beyond PMem](https://www.ixpug.org/images/docs/ISC23/DAOS_mhennecke.pptx)
+from the
+[ISC 2023 IXPUG Workshop](https://www.ixpug.org/events/isc23-ixpug-workshop).
14 changes: 7 additions & 7 deletions docs/admin/pool_operations.md
@@ -486,7 +486,7 @@ To create a pool with a custom ACL:
 $ dmg pool create --size <size> --acl-file <path> <pool_label>
 ```
 
-The ACL file format is detailed in [here](https://docs.daos.io/v2.4/overview/security/#acl-file).
+The ACL file format is detailed [here](https://docs.daos.io/v2.4/overview/security/#acl-file).
 
 ### Displaying ACL
 
@@ -629,7 +629,7 @@ operation is ongoing. Drain additionally enables non-replicated data to be
 rebuilt onto another target whereas in a conventional failure scenario non-replicated
 data would not be integrated into a rebuild and would be lost.
 Drain operation is not allowed if there are other ongoing rebuild operations, otherwise
-it will return -DER_BUSY.
+it will return -DER\_BUSY.
 
 To drain a target from a pool:
 
@@ -650,7 +650,7 @@ original state.
 The operator can either reintegrate specific targets for an engine rank by
 supplying a target idx list, or reintegrate an entire engine rank by omitting the list.
 Reintegrate operation is not allowed if there are other ongoing rebuild operations,
-otherwise it will return -DER_BUSY.
+otherwise it will return -DER\_BUSY.
 
 ```
 $ dmg pool reintegrate $DAOS_POOL --rank=${rank} --target-idx=${idx1},${idx2},${idx3}
@@ -702,7 +702,7 @@ pool.
 This will automatically trigger a server rebalance operation where objects
 within the extended pool will be rebalanced across the new storage.
 Extend operation is not allowed if there are other ongoing rebuild operations,
-otherwise it will return -DER_BUSY.
+otherwise it will return -DER\_BUSY.
 
 ```
 $ dmg pool extend $DAOS_POOL --ranks=${rank1},${rank2}...
@@ -717,14 +717,14 @@ small extensions.
 
 ### Resize
 
-Support for quiescent pool resize (changing capacity used on each storage node
-without adding new ones) is currently not supported and is under consideration.
+Support for quiescent pool resize (changing capacity used on each storage engine
+without adding new engines) is currently not supported and is under consideration.
 
 ## Pool Catastrophic Recovery
 
 A DAOS pool is instantiated on each target by a set of pmemobj files
 managed by PMDK and SPDK blobs on SSDs. Tools to verify and repair this
-persistent data is scheduled for DAOS v2.4 and will be documented here
+persistent data are scheduled for DAOS Version 2.6 and will be documented here
 once available.
 
 Meanwhile, PMDK provides a recovery tool (i.e., pmempool check) to verify
9 changes: 9 additions & 0 deletions docs/admin/troubleshooting.md
@@ -463,6 +463,15 @@ fabric_iface_port: 31316
 # engine 1
 fabric_iface_port: 31416
 ```
+### daos_agent cache of engine URIs is stale
+
+The `daos_agent` cache may become invalid if `daos_engine` processes restart with different
+configurations or IP addresses, or if the DAOS system is reformatted.
+If this happens, the `daos` tool (as well as other I/O or `libdaos` operations) may return
+`-DER_BAD_TARGET` (-1035) errors.
+
+To resolve the issue, a privileged user may send a `SIGUSR2` signal to the `daos_agent` process to
+force an immediate cache refresh.
 
 ## Diagnostic and Recovery Tools
 
17 changes: 4 additions & 13 deletions docs/admin/ucx.md
@@ -1,19 +1,10 @@
-# UCX Fabric Support (DAOS 2.2 Technology Preview)
+# UCX Fabric Support
 
-DAOS 2.2 includes a technology preview of
+DAOS 2.4 includes
 [UCX](https://www.openucx.org/) support for clusters using InfiniBand,
 as an alternative to the default
 [libfabric](https://ofiwg.github.io/libfabric/) network stack.
 
-!!! note UCX support has been enabled for the DAOS builds on
-EL8 and Leap15 only. It is not supported on CentOS7.
-
-The goal of this technology preview is to allow early
-evaluation and testing. DAOS over UCX has not been fully
-validated yet, and it is not recommended to use it in a
-production environment with DAOS 2.2.
-It is a roadmap item to fully support UCX in DAOS 2.4.
-
 !!! note The network provider is an immutable property of a DAOS system.
 Changing the network provider to UCX requires that the DAOS storage
 is reformatted.
@@ -77,7 +68,8 @@ the following steps are needed:
 zypper install mercury-ucx
 ```
 
-* To **update** from DAOS 2.0 (with libfabric) to DAOS 2.2 with
+* To **update** from an earlier DAOS version (with libfabric)
+to DAOS 2.4 with
 UCX, the recommended path is to first perform a standard DAOS
 RPM update (which will update the default `mercury` package).
 After the update, the `mercury` RPM package can be replaced by
@@ -89,4 +81,3 @@ configuration file (`/etc/daos/daos_server.yml`).
 A sample YML file is available on
 [github](https://github.com/daos-stack/daos/blob/release/2.4/utils/config/examples/daos_server_ucx.yml).
 The recommended setting for UCX is `provider: ucx+dc_x`.
-
1 change: 1 addition & 0 deletions docs/overview/terminology.md
@@ -43,6 +43,7 @@
 |[SPDK](https://spdk.io/)|Storage Performance Development Kit|
 |SSD|Solid State Drive|
 |[SWIM](https://doi.org/10.1109/DSN.2002.1028914)|Scalable Weakly-consistent Infection-style process group Membership Protocol|
+|[UCF](https://ucfconsortium.org/)|Unified Communication Framework (UCF Consortium)|
 |[UCP](https://www.openucx.org/)|Unified Communication Protocols (high-level API of UCX)|
 |[UCS](https://www.openucx.org/)|Unified Communication Transports (low-level API of UCX)|
 |[UCT](https://www.openucx.org/)|Unified Communication Services (common utilities of UCX)|