DAOS-16636 cart: force port range for tcp provider #15209

Draft: wants to merge 4 commits into master
8 changes: 4 additions & 4 deletions docs/QSG/build_from_scratch.md
@@ -81,10 +81,10 @@ the user outside of a virtual environment, in which case `~/.local/bin` will nee
PATH.

```bash
$ python3 -m venv venv
$ source venv/bin/activate
$ python3 -m pip --no-cache-dir install --upgrade pip
$ python3 -m pip install -r requirements-build.txt
```

## Build DAOS
30 changes: 27 additions & 3 deletions docs/admin/predeployment_check.md
@@ -87,9 +87,9 @@ The DAOS Agent (running on the client nodes) is responsible for resolving a user
UID/GID to user/group names, which are then added to a signed credential and sent to
the DAOS storage nodes.

-## HPC Fabric setup
## Network Setup

-DAOS depends on the HPC fabric software stack and drivers. Depending on the type of HPC fabric
DAOS depends on the network fabric software stack and drivers. Depending on the type of fabric
that is used, a supported version of the fabric stack needs to be installed.

Note that for InfiniBand fabrics, DAOS is only supported with the MLNX\_OFED stack that is
@@ -162,9 +162,33 @@ Some distributions install a firewall as part of the base OS installation. DAOS
for its management service. If this port is blocked by firewall rules, neither `dmg` nor the
`daos_agent` on a remote node will be able to contact the DAOS server(s).

-Either configure the firewall to allow traffic for this port, or disable the firewall
If telemetry is enabled in the server configuration file, the telemetry port (9191 by default)
must also be accessible on the DAOS server nodes.

Depending on the provider in use, each engine might also listen on a range of ports. This is
the case for the tcp provider. The range starts at the `fabric_iface_port` specified in the
server YAML file and uses 2 ports for management plus 1 port per target and 1 port per helper
xstream. For instance, with `fabric_iface_port` set to 20000, 16 targets and 4 helper xstreams,
the engine listens on ports 20000 to 20021, for a total of 22 ports.

Moreover, in some cases an engine might have to initiate a connection to a running application.
Inbound connections from the storage nodes to the compute nodes must therefore be allowed.
With the tcp provider, applications use ports 20100-21100 by default (the default is only
applied when neither variable below is set). This range can be changed by setting the
`FI_TCP_PORT_LOW_RANGE` and `FI_TCP_PORT_HIGH_RANGE` environment variables before running
the application.

Either configure the firewall to allow traffic for these ports, or disable the firewall
(for example, by running `systemctl stop firewalld; systemctl disable firewalld`).

The table below summarizes all ports that should be opened on the firewall:

| Node Type | Component     | Process     | Settings                                                          | Default                             |
| --------- | ------------- | ----------- | ----------------------------------------------------------------- | ----------------------------------- |
| Server    | Control plane | daos_server | `port:`                                                           | 10001                               |
| Server    | Telemetry     | daos_server | `telemetry_port:`                                                 | 9191                                |
| Server    | Data plane    | daos_engine | 2 + `targets:` + `nr_xs_helpers:` ports starting at `fabric_iface_port:` | 20000-20021 (16 targets, 4 helpers) |
| Client    | libdaos       | application | `FI_TCP_PORT_LOW_RANGE`/`FI_TCP_PORT_HIGH_RANGE` env vars         | 20100-21100                         |

## Install from Source

When DAOS is installed from source (and not from pre-built packages), extra manual
17 changes: 17 additions & 0 deletions src/cart/crt_init.c
@@ -523,6 +523,23 @@ prov_settings_apply(bool primary, crt_provider_t prov, crt_init_options_t *opt)
	if (prov != CRT_PROV_OFI_CXI && prov != CRT_PROV_OFI_TCP)
		d_setenv("NA_OFI_UNEXPECTED_TAG_MSG", "1", 0);

	/**
	 * Force a specific port range for applications using the tcp provider, so that
	 * it is known which ports to open when a firewall is used.
	 */
	if (!crt_is_service() && (prov == CRT_PROV_OFI_TCP || prov == CRT_PROV_OFI_TCP_RXM)) {
		uint32_t port_low_range = UINT32_MAX;
		uint32_t port_high_range = UINT32_MAX;

		crt_env_get(FI_TCP_PORT_LOW_RANGE, &port_low_range);
		crt_env_get(FI_TCP_PORT_HIGH_RANGE, &port_high_range);

		if (port_low_range == UINT32_MAX && port_high_range == UINT32_MAX) {
			d_setenv("FI_TCP_PORT_LOW_RANGE", "20100", 0);
			d_setenv("FI_TCP_PORT_HIGH_RANGE", "21100", 0);
Comment on lines +534 to +539
Contributor

Why do you need to `crt_env_get()` high and low beforehand if you are calling `d_setenv()` with overwrite=0 anyway?

Contributor

He is only setting them if they aren't set. Note the initial setting to UINT32_MAX on 531/532. This is just setting them to default values.

Contributor

One issue I think we might have here is with MPI jobs that also use libfabric: this may be too late to have an effect on OFI.

Contributor

		d_setenv("FI_TCP_PORT_LOW_RANGE", "20100", 0);

This accomplishes the same thing though: it sets the variable only if it's not already set.
@johannlombardi, is the idea to set them only if both are unset, rather than if only one is?

		}
	}

	g_prov_settings_applied[prov] = true;
}

2 changes: 2 additions & 0 deletions src/cart/crt_internal_types.h
@@ -220,6 +220,8 @@ struct crt_event_cb_priv {
ENV(SWIM_PING_TIMEOUT) \
ENV(SWIM_PROTOCOL_PERIOD_LEN) \
ENV(SWIM_SUSPECT_TIMEOUT) \
ENV(FI_TCP_PORT_LOW_RANGE) \
ENV(FI_TCP_PORT_HIGH_RANGE) \
Contributor

I don't understand why putting these in a different order would help (I saw your last patch).

ENV_STR(UCX_IB_FORK_INIT)

/* uint env */