How to get from dCache 8.2 to dCache 9.2
- dCache now supports Java 17 as its platform
- Improved concurrent file-creation rate within a single directory
- Performance improvements for concurrent directory creation and removal
Consequences:
- When upgrading from 8.2 to 9.2, you need to upgrade the entire instance
- If you run srm-manager, you need to clean /var/lib/dcache/credentials as well as the srmrequestcredentials table and all entries in the *requests and *filerequests tables of the srm database. To do so, run the following on the SRM database:
truncate srmrequestcredentials;
truncate srmuser cascade;
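For the on-disk credential store, the following is typically all that is needed (a minimal sketch, assuming the default directory named above and that srm-manager has been stopped first):
rm -f /var/lib/dcache/credentials/*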
- DCAP and NFS doors will fail the request if the file's storage unit is not configured in PoolManager
- linklocal and localhost interfaces are not published by doors and pools
- DCAP movers always start in passive mode
- removed experimental message encoding format
- removed default HSM operation timeout
- Starting with version 9.1, the nlink count for directories reflects only the number of subdirectories. Existing nlink counts can therefore be out of sync, and no automatic re-synchronization is performed.
- dropped gplazma support for XACML
- pool binds TCP port for http and xroot movers on startup
- The cleaner cell name no longer exists; the service now consists of two cells: cleaner-disk and cleaner-hsm
- The storage layer has been redesigned to conserve space and to improve efficiency and throughput. Moving to the new database schema may take some time; the duration can generally be estimated from the number of entries in the request_target table, at roughly 1 hour for every 10 million entries. If it is necessary to keep all such entries, we recommend doing the upgrade offline, using the dcache database update command-line tool. If there is no need to keep completed requests in the database, we advise truncating, or at least deleting, the completed requests before the upgrade.
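A minimal sketch of the offline route (assuming the standard dcache command-line tool is run on the host carrying the bulk service):
dcache stop
dcache database update
dcache start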
- The way that the containers are managed has also been significantly modified. The visible changes have to do with properties. First, the max-permits properties on the individual activities are no longer used. Throttling of activity operations is achieved using the bulk.limits.dir-list-semaphore and bulk.limits.in-flight-semaphore values, along with rate limiters on the endpoints (bulk.limits.pin-manager-rate-per-second, bulk.limits.pnfs-manager-rate-per-second, bulk.limits.qos-engine-rate-per-second). The thread pools have also changed somewhat; bulk.limits.delay-clear-threads, bulk.limits.dir-list-threads and bulk.limits.activity-callback-threads are no longer used. It is anticipated that adjustments to these defaults will not be necessary under normal loads.
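As an illustration only (the domain and service names are hypothetical and the values are placeholders, not recommendations), the new throttling knobs would be set in the layout like this:
[bulkDomain/bulk]
bulk.limits.dir-list-semaphore=20
bulk.limits.in-flight-semaphore=5000
bulk.limits.pnfs-manager-rate-per-second=1000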
- Periodic archiving of requests has been added; this is configurable via properties and admin commands. The properties
bulk.limits.archiver-window
bulk.limits.archiver-window.unit
bulk.limits.archiver-period
bulk.limits.archiver-period.unit
determine, respectively, how far in the past a terminated request's arrival time must lie before it is cleared when the archiver runs, and the frequency with which the archiver runs. To reset these, use
\s bulk archiver reset
The archive table maintains an abbreviated summary of the requests, and it can be purged via admin commands as well:
\s request archived ls
\s request archived clear
Whenever you run
\s request clear
a summary entry is also written to the archive table before deletion from the main tables. Depending on the volume of activity, it may be wise to tighten the period and window for archiving (the defaults are very conservative). As stated in the release notes, the clearOnSuccess and clearOnFailure options for individual requests are not available for STAGE requests (the TAPE API), so the archiver is the only way to clean these up automatically.
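For example, to archive terminated requests more aggressively than the defaults, one might set something like the following (the values are placeholders, and the .unit properties are assumed to take the usual time-unit names):
bulk.limits.archiver-period=6
bulk.limits.archiver-period.unit=HOURS
bulk.limits.archiver-window=7
bulk.limits.archiver-window.unit=DAYS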
- Activity providers can now capture the environment so that defaults can be customized (this presently pertains only to PIN and STAGE lifetime attributes). See:
See:
bulk.plugin!pin.default-lifetime
bulk.plugin!pin.default-lifetime.unit
bulk.plugin!stage.default-lifetime
bulk.plugin!stage.default-lifetime.unit
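A sketch of overriding these defaults (the values and units shown are placeholders, not recommendations):
bulk.plugin!pin.default-lifetime=7
bulk.plugin!pin.default-lifetime.unit=DAYS
bulk.plugin!stage.default-lifetime=2
bulk.plugin!stage.default-lifetime.unit=DAYS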
- Support has been added for the QoS update request to handle policy arguments. Note that targetQos is still valid. Please see the new cookbook section on QoS policies for details.
\s bulk arguments UPDATE_QOS
NAME | DEFAULT | VALUE SPEC | DESCRIPTION
qosPolicy | null | string, max 64 chars | the name of the qos policy to apply to the file
qosState | 0 | integer | the index into the desired policy's list of states
targetQos | null | disk|tape|disk+tape | the desired qos transition ('disk' is limited to files with volatile/unknown qos status)
- Bulk has been made replicable (HA). Please read the requirements in the cookbook section on High Availability. The following properties control the timeout for a response from the leader:
bulk.service.ha-leader.timeout
bulk.service.ha-leader.timeout.unit
- The delay clear option has been eliminated from bulk requests.
- The prestore option has also been eliminated, as it is no longer necessary: all initial targets are now stored synchronously in a batch before the submission request returns.
- NOTE: Running more concurrent containers in Bulk means their files will be sliced. If it is important to keep most files in a request together for the purposes of staging optimization, then the maximum concurrency in Bulk should be turned down. It could even be set to 1 (essentially one request at a time) if necessary.
The cleaner service, originally a single cell, now consists of two parts: one cell for disk cleaning (cleaner-disk) and one for hsm cleaning (cleaner-hsm). They can be deployed as desired, be assigned different resources, and each can run in HA mode.
This will hopefully address performance issues and help admins configure and understand cleaner behaviour.
Be aware that the property prefixes have changed from cleaner.<something> to cleaner-disk.<something> and cleaner-hsm.<something>, while some admin commands have lost the "hsm" string from their name.
Note: Not all previously existing parameters were used to control both the disk and hsm parts of the old, combined cleaner cell, so please check which parameters are carried over to cleaner-hsm and cleaner-disk, respectively.
Example setup:
[dCacheDomain]
[dCacheDomain/cleaner-disk]
cleaner-disk.cell.name=cleaner-disk1
[dCacheDomain/cleaner-disk]
cleaner-disk.cell.name=cleaner-disk2
[dCacheDomain/cleaner-hsm]
Also, an admin command was added to the cleaner-hsm cell that allows forgetting a tape-resident pnfsid, i.e., removing any corresponding delete-target entries from the trash table in the cleaner's database.
Starting with dCache version 9.1, chimera allows controlling the parent directory attribute update policy with the configuration property chimera.attr-consistency, which takes the following values:
policy | behaviour |
---|---|
strong | creating a filesystem object immediately updates the parent directory's mtime, ctime, nlink and generation attributes |
weak | creating a filesystem object eventually updates (after 30 seconds) the parent directory's mtime, ctime, nlink and generation attributes. Multiple concurrent modifications to a directory are aggregated into a single attribute update. |
soft | same as weak; however, reading directory attributes takes pending attribute updates into account. |
Read-write exported NFS doors SHOULD run with strong consistency or soft consistency to maintain POSIX compliance. Read-only NFS doors might run with weak consistency if non-up-to-date directory attributes can be tolerated, for example, when accessing existing data, or soft consistency, if up-to-date information is desired, typically when seeking newly arrived files through other doors.
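For example, a read-write NFS door could be given soft consistency in the layout (the domain and service names here are illustrative):
[nfsDomain/nfs]
chimera.attr-consistency=soft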
- Support for .well-known/security.txt was added to both the frontend and WebDAV ports. The dcache properties dcache.wellknown!wlcg-tape-rest-api.path and dcache.wellknown!security-txt.uri apply globally; frontend.wellknown!wlcg-tape-rest-api.path has been deprecated.
- Support for relative paths and symlink prefix resolution has been added for the bulk and namespace resources.
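A sketch of pointing clients at a hosted security.txt (the URI is a placeholder):
dcache.wellknown!security-txt.uri=https://www.example.org/.well-known/security.txt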
- Authz checks have been removed in Quota GET methods.
- Use of RolePrincipal (see under gPlazma) replaces reliance on LoginAttributes and the old admin role (special gid) as defined by the roles plugin.
- Support for QoS Rule Engine policies. A new qos-policy resource allows one to add, remove, list and retrieve policy definitions. See the Swagger pages for details.
- An -optional query parameter was added to the namespace resource to retrieve extra information about a file; the new QoS Policy file attributes (QOS_POLICY and QOS_STATE) are included with this option.
- Support for pool migration has been introduced (migrations resource). See Swagger pages for details.
With the addition of the RolePrincipal, the roles plugin may be considered deprecated. In 9.2, neither Frontend nor dCacheView makes use of it any more. Instead, one must grant privileges by using the multimap plugin.
An example:
dn:"/DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory/OU=People/CN=Al Rossi/CN=UID:arossi" username:arossi uid:8773 gid:1530,true roles:admin
Currently there are three roles defined: admin, qos-user and qos-group. The second allows the user to transition files owned by the user's uid; the third allows the user to transition files whose group is the user's primary gid. The two qos roles can be combined. Admin grants full admin privileges.
To remove unused directory tags, chimera keeps a reference count (nlink) for tags. This approach creates a 'hot' record that serializes all updates to a given top-level tag. Starting with 9.2, dCache no longer relies on the ref count and instead uses a conditional DELETE, which should improve the concurrent directory creation/deletion rate.
Prior to version 9.2, to support RHEL6-based clients, the NFSv4.1 door published only the nfs4_1_files layout when no export options were specified. Now the door publishes all available layout types. If RHEL6 clients are still in use for whatever reason, the old behaviour can be enforced with the lt=nfsv4_1_files export option.
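A sketch of such an export entry, assuming the usual dCache exports-file syntax (the path and client pattern are placeholders):
/data *.example.org(rw,lt=nfsv4_1_files)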
Typically, when dCache interacts with an HSM, there is a timeout on how long such requests may stay in the HSM queue. Although these timeouts are HSM-specific, dCache used to ship its own default values, which were usually incorrect, so admins typically ended up setting them explicitly. Starting with version 9.0, the default timeout has been removed; this means there is no timeout for HSM operations unless one is explicitly set by the admins.
NOTE: this change is unlikely to break existing setups, as previous timeout values are already stored in the pool setup file.
In addition, a new command, st|rh|rm unset timeout, has been added to drop defined timeouts.
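For instance, from the admin shell one could drop any previously defined timeouts on a pool as follows (the pool name is a placeholder):
\s pool_A st unset timeout
\s pool_A rh unset timeout
\s pool_A rm unset timeout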
- QoS was made to support migration using a new pool mode, "DRAINING"; please consult the book chapter for further details.
- The DB namespace endpoint can now be configured to be separate from the main Chimera database (a capability originally introduced for Resilience). In this way the scanner, whose namespace queries are read-only, can be pointed at a database replica. This remains possible for QoS, even though the QoSEngine is now also responsible for updating Chimera with file policy state: these writes are all done via messaging (PnfsHandler) rather than by direct DB connection, so they go to the master Chimera instance.
- Requests for QoS transitions are now authorized on the basis of role (see under gPlazma).
- The first version of the QoS Rule Engine has been added. With this, one can define a QoS policy to apply to files either through a directory tag or via a requested transition; the engine tracks the necessary changes in state over time. A new database table has been added to the qos database. Remember that if you are deploying QoS for the first time, you need to create the database:
createdb -U <user> qos
Please consult the cookbook chapter on QoS policies for further details.
- In conformity with the new rule engine changes, the way the scanner runs scans has been modified.
The scan period refers to the default amount of time between sweeps that check for timeouts.
The scan windows refer to the amount of time between scheduled periodic system diagnostic scans.
QOS NEARLINE refers to files whose QoS policy is defined and whose RP is NEARLINE CUSTODIAL; ONLINE refers to scans of all files with persistent copies, whether they are REPLICA or CUSTODIAL.
ONLINE scanning is done by a direct query to the namespace, and is batched into requests determined by the batch size. Unlike with resilience, this kind of scan will only touch each inode entry once (whereas pool scans may overlap when multiple replicas are involved).
On the other hand, a general pool scan will only look at files on pools that are currently IDLE and UP, so those that are excluded or (temporarily) unattached will be skipped. This avoids generating a lot of alarms concerning files without disk copies that should exist.
The direct ONLINE scan is enabled by default. To use the pool scan instead, disable "online" either via the property or the admin reset command. Be aware, however, that unlike resilience, all pools will be scanned, not just those in the resilient/primary groups; thus the online window should be set to accommodate the amount of time it will take to cycle through the entire set of pools this way. Needless to say, doing a direct ONLINE scan probably will take less time than a general pool scan.
The batch size for a direct ONLINE scan is lowered to serve as an implicit backgrounding or de-prioritization (since the scan is done in batches, this allows for preemption by QOS scans if they are running concurrently).
The relevant properties:
qos.limits.scanner.scan-period
qos.limits.scanner.scan-period.unit
qos.limits.scanner.qos-nearline-window
qos.limits.scanner.qos-nearline-window.unit
qos.limits.scanner.enable.online-scan
qos.limits.scanner.online-window=2
qos.limits.scanner.online-window.unit
qos.limits.scanner.qos-nearline-batch-size
qos.limits.scanner.online-batch-size
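For instance, to fall back to pool-based scanning, the direct namespace query can be switched off (property name taken from the list above):
qos.limits.scanner.enable.online-scan=false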
More details are available in the Book. Scans can also be triggered manually:
\h sys scan
NAME
  sys scan -- initiate an ad hoc background scan.
SYNOPSIS
  sys scan [-online] [-qos]
DESCRIPTION
  If a scan of the requested type is already running, it will not be automatically canceled.
OPTIONS
  -online
    Scan online files (both REPLICA and CUSTODIAL). Depending on whether online is enabled (true by default), it will either scan the namespace entries or will trigger a scan of all IDLE ENABLED pools. Setting this scan to be run periodically should take into account the size of the namespace or the number of pools, and the proportion of ONLINE files they contain.
  -qos
    Scan NEARLINE files for which a QoS policy has been defined.
Note that the singleton QoS service (where all four components are plugged into each other directly) is no longer available; the four services can, however, still be run together or in separate domains, as with any dCache cell.
Resilience is still available in 9.2, but should be considered as superseded by the QoS services. We encourage you to switch to the latter as soon as is feasible. Remember not to run Resilience and QoS simultaneously.
- Proxying through the xroot door is now available. See the following properties:
#
# Proxy all transfers through the door.
#
xrootd.net.proxy-transfers=false
# What IP address to use for connections from the xroot door to pools.
#
# When the data transfer is proxied through the door, this property is
# used as a hint for the pool to select an interface to listen to for
# the internal connection created from the door to the pool.
#
xrootd.net.internal=
#
# Port range for proxied connections.
#
xrootd.net.proxy.port.min=${dcache.net.wan.port.min}
xrootd.net.proxy.port.max=${dcache.net.wan.port.max}
#
# How long to wait for a response from the pool.
#
xrootd.net.proxy.response-timeout-in-secs=30
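To actually turn proxying on for a given door, it suffices to override the first property in that door's layout section (the domain and service names here are illustrative):
[doorsDomain/xrootd]
xrootd.net.proxy-transfers=true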
- Relative paths are supported in the xroot URL. Resolution of paths will be done on the basis of the user root as defined in the gPlazma configuration files. We encourage you not to use xrootd.root if possible.
- Resolution of symlinks in path prefixes and paths is supported.
- The efficiency of the stat list (ls -l) has been greatly improved.