diff --git a/docs/QSG/build_from_scratch.md b/docs/QSG/build_from_scratch.md
index a3bd9c69add..b6a00f9b961 100644
--- a/docs/QSG/build_from_scratch.md
+++ b/docs/QSG/build_from_scratch.md
@@ -81,10 +81,10 @@ the user outside of a virtual environment, in which case `~/.local/bin` will nee
 PATH.
 
 ```bash
- $ python3 -m venv venv
- $ source venv/bin/activate
- $ python3 -m pip --no-cache-dir install --upgrade pip
- $ python3 -m pip install -r requirements-build.txt
+$ python3 -m venv venv
+$ source venv/bin/activate
+$ python3 -m pip --no-cache-dir install --upgrade pip
+$ python3 -m pip install -r requirements-build.txt
 ```
 
 ## Build DAOS
diff --git a/docs/admin/predeployment_check.md b/docs/admin/predeployment_check.md
index 8b5a391d8a9..0968bde8a0e 100644
--- a/docs/admin/predeployment_check.md
+++ b/docs/admin/predeployment_check.md
@@ -87,9 +87,9 @@ The DAOS Agent (running on the client nodes) is responsible for resolving a user
 UID/GID to user/group names, which are then added to a signed credential and sent to the DAOS
 storage nodes.
 
-## HPC Fabric setup
+## Network Setup
 
-DAOS depends on the HPC fabric software stack and drivers. Depending on the type of HPC fabric
+DAOS depends on the network fabric software stack and drivers. Depending on the type of fabric
 that is used, a supported version of the fabric stack needs to be installed.
 
 Note that for InfiniBand fabrics, DAOS is only supported with the MLNX\_OFED stack that is
@@ -162,9 +162,33 @@ Some distributions install a firewall as part of the base OS installation. DAOS
 for its management service. If this port is blocked by firewall rules, neither `dmg` nor the
 `daos_agent` on a remote node will be able to contact the DAOS server(s).
 
-Either configure the firewall to allow traffic for this port, or disable the firewall
+If telemetry is enabled in the server configuration file, the telemetry port (9191 by default)
+must also be accessible on the DAOS server nodes.
+
+Depending on the provider used, each engine might also listen on a range of ports. This is
+the case for the tcp provider. This range will start at the fabric_iface_port specified in the
+server YAML file and use two ports for management, one port per target and helper xstream. For instance,
+with fabric_iface_port set to 20000, 16 targets and 4 helper streams, the engine will listen on ports
+in the range from 20000 to 20021 for a total of 22 ports.
+
+Moreover, there are cases where an engine might have to initiate a connection to a running application.
+In this case, inbound connections from the storage nodes to the compute nodes must be allowed.
+The default port range used by applications is 20100-21100 with the tcp provider. This can be modified
+by setting the FI_TCP_PORT_LOW_RANGE and FI_TCP_PORT_HIGH_RANGE environment variables before running
+the application.
+
+Either configure the firewall to allow traffic for these ports, or disable the firewall
 (for example, by running `systemctl stop firewalld; systemctl disable firewalld`).
 
+The table below summarizes all ports that should be opened on the firewall:
+
+| Node Type | Component     | Process     | Settings                                              | Default     |
+| --------- | --------------|-------------|-------------------------------------------------------|-------------|
+| Server    | Control plane | daos_server | port:                                                 | 10001       |
+| Server    | Telemetry     | daos_server | telemetry_port:                                       | 9191        |
+| Server    | Data plane    | daos_engine | fabric_iface_port: + 2 + targets: + nr_xs_helpers:    | 20000-20019 |
+| Client    | libdaos       | application | FI_TCP_PORT_LOW_RANGE/FI_TCP_PORT_HIGH_RANGE env vars | 20100-21100 |
+
 ## Install from Source
 
 When DAOS is installed from source (and not from pre-built packages), extra manual
diff --git a/src/cart/crt_init.c b/src/cart/crt_init.c
index d341a2a6dac..9bf03e91749 100644
--- a/src/cart/crt_init.c
+++ b/src/cart/crt_init.c
@@ -523,6 +523,23 @@ prov_settings_apply(bool primary, crt_provider_t prov, crt_init_options_t *opt)
 	if (prov != CRT_PROV_OFI_CXI && prov != CRT_PROV_OFI_TCP)
 		d_setenv("NA_OFI_UNEXPECTED_TAG_MSG", "1", 0);
 
+	/**
+	 * Force specific port range for application when using tcp provider to know what
+	 * ports to open when firewall is used.
+	 */
+	if (!crt_is_service() && (prov == CRT_PROV_OFI_TCP || prov == CRT_PROV_OFI_TCP_RXM)) {
+		uint32_t port_low_range  = UINT32_MAX;
+		uint32_t port_high_range = UINT32_MAX;
+
+		crt_env_get(FI_TCP_PORT_LOW_RANGE, &port_low_range);
+		crt_env_get(FI_TCP_PORT_HIGH_RANGE, &port_high_range);
+
+		if (port_low_range == UINT32_MAX && port_high_range == UINT32_MAX) {
+			d_setenv("FI_TCP_PORT_LOW_RANGE", "20100", 0);
+			d_setenv("FI_TCP_PORT_HIGH_RANGE", "21100", 0);
+		}
+	}
+
 	g_prov_settings_applied[prov] = true;
 }
 
diff --git a/src/cart/crt_internal_types.h b/src/cart/crt_internal_types.h
index 857c1a4522d..db896ef0638 100644
--- a/src/cart/crt_internal_types.h
+++ b/src/cart/crt_internal_types.h
@@ -220,6 +220,8 @@ struct crt_event_cb_priv {
 	ENV(SWIM_PING_TIMEOUT) \
 	ENV(SWIM_PROTOCOL_PERIOD_LEN) \
 	ENV(SWIM_SUSPECT_TIMEOUT) \
+	ENV(FI_TCP_PORT_LOW_RANGE) \
+	ENV(FI_TCP_PORT_HIGH_RANGE) \
 	ENV_STR(UCX_IB_FORK_INIT)
 
 	/* uint env */
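
Reviewer note: the engine port arithmetic described in the `predeployment_check.md` hunk (two management ports, plus one port per target and per helper xstream, starting at `fabric_iface_port`) could be sketched as below. This is only an illustration of the documented formula. The names `struct port_range`, `engine_port_count` and `engine_port_range` are hypothetical and are not part of DAOS or CaRT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration only; not a DAOS definition. */
struct port_range {
	uint32_t low;  /* first port the engine listens on */
	uint32_t high; /* last port the engine listens on (inclusive) */
};

/* Number of ports per engine, following the documented formula:
 * 2 management ports + 1 per target + 1 per helper xstream. */
static uint32_t
engine_port_count(uint32_t targets, uint32_t nr_xs_helpers)
{
	return 2 + targets + nr_xs_helpers;
}

/* Inclusive port range for one engine, starting at fabric_iface_port. */
static struct port_range
engine_port_range(uint32_t fabric_iface_port, uint32_t targets, uint32_t nr_xs_helpers)
{
	uint32_t          count = engine_port_count(targets, nr_xs_helpers);
	struct port_range r;

	/* The whole range must fit in the valid TCP port space. */
	assert(fabric_iface_port + count - 1 <= 65535);

	r.low  = fabric_iface_port;
	r.high = fabric_iface_port + count - 1;
	return r;
}
```

With the values used in the documentation text (`fabric_iface_port` 20000, 16 targets, 4 helper xstreams), this gives 22 ports spanning 20000 to 20021.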