User Stories
This page captures some of the use cases we have encountered. Each use case may or may not include specific details about the incident. If you need help or would like to suggest a use case, please create an issue.
This is one of the most common use cases.
Many LC users have had to migrate from one file system to another.
While `dsync` is the recommended tool, `dcp` can also be used.
With either tool, the use of `dcmp` is recommended to ensure the data has been successfully copied.
`dsync` can be very effective for data migration and there are a few options that users should make note of:
- The `--batch-files N` option can be used to copy files in batches of N during the migration. Each completed batch acts as a “checkpoint” and allows `dsync` to make effective progress over multiple job allocations.
- The `--contents` option can be used to perform a byte-by-byte comparison between files in the source and destination. If this option is not used, the comparison is based on file size and last modified time.
- The `--delete` option can be used to delete files in the destination that do not appear in the source location.
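The options above can be combined as in the following sketch. The paths, the batch size, and the `srun` node and rank counts are illustrative assumptions, not recommendations from this page; the script only prints the commands unless it is invoked with `RUN=` set to an empty value.

```shell
# Hypothetical paths; substitute your own source and destination.
SRC=/path/to/source
DEST=/path/to/destination
RUN=${RUN-echo}   # dry run by default: prints each command; invoke with RUN= to execute

# Batched sync: files are copied in groups of 100000 (an arbitrary example
# size), so a job that runs out of time can be resubmitted to make progress.
sync_batched() {
    $RUN srun -N 4 -n 64 dsync --batch-files 100000 "$SRC" "$DEST"
}

# Stricter pass: compare file contents and remove destination-only files.
# --delete is destructive, so review the dry-run output before executing.
sync_strict() {
    $RUN srun -N 4 -n 64 dsync --contents --delete "$SRC" "$DEST"
}

sync_batched
sync_strict
```

Keeping `--delete` out of the routine batched pass is a deliberate choice in this sketch, since files that exist only at the destination are removed when it is used.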
Before starting the copy, users should run `dwalk` on the source location to get an estimate of how many items and how much data must be migrated.
To avoid having to do another walk when using the next tool, cache the file hierarchy using the `--output FILE` flag.
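A minimal sketch of the walk step follows. The source path, the cache file name, and the `srun` launcher with its rank count are assumptions for illustration; the command is only printed unless the script is invoked with `RUN=` set to an empty value.

```shell
# Hypothetical paths; substitute your own.
SRC=/path/to/source
CACHE=/path/to/source.mfu   # hypothetical name for the cached file list
RUN=${RUN-echo}             # dry run by default; invoke with RUN= to execute

# Walk the source once and save the file hierarchy for the copy step.
walk_source() {
    $RUN srun -N 1 -n 16 dwalk --output "$CACHE" "$SRC"
}

walk_source
```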
Use `dcp -p` to copy the files from the source location to the destination. As with all the tools, be sure to use multiple nodes with many MPI ranks per node; scaling up to 4 nodes is typically beneficial.
`dcp` flags:
- The `-p`, or `--preserve`, option will ensure that the files at the destination have the same permissions, group, timestamps, and extended attributes.
- Use the `--input FILE` option to read the cached file hierarchy data written by the earlier `dwalk` run.
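The copy step might look like the sketch below. The paths and the cache file name are hypothetical, `srun` with 4 nodes and 64 ranks is an assumed launcher configuration, and the sketch assumes that `dcp` still takes the source and destination paths alongside `--input`. The command is only printed unless the script is invoked with `RUN=` set to an empty value.

```shell
# Hypothetical paths; substitute your own.
SRC=/path/to/source
DEST=/path/to/destination
CACHE=/path/to/source.mfu   # hypothetical file list saved by an earlier dwalk --output
RUN=${RUN-echo}             # dry run by default; invoke with RUN= to execute

# Copy with attributes preserved, reading the file list from the cache
# instead of walking the source a second time.
copy_from_cache() {
    $RUN srun -N 4 -n 64 dcp --preserve --input "$CACHE" "$SRC" "$DEST"
}

copy_from_cache
```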
Ensure all data was successfully copied with `dcmp`.
Use the `--lite` option for a lightweight comparison based on file size and last modified time.
Without this option, `dcmp` does a byte-by-byte comparison between each file in the source and destination.
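The two comparison modes can be sketched as follows. The paths and the `srun` node and rank counts are assumptions; the commands are only printed unless the script is invoked with `RUN=` set to an empty value.

```shell
# Hypothetical paths; substitute your own.
SRC=/path/to/source
DEST=/path/to/destination
RUN=${RUN-echo}   # dry run by default; invoke with RUN= to execute

# Quick check: compares file size and last modified time only.
verify_quick() {
    $RUN srun -N 4 -n 64 dcmp --lite "$SRC" "$DEST"
}

# Full check: compares the contents of every file, so it reads all data
# on both sides and takes correspondingly longer.
verify_full() {
    $RUN srun -N 4 -n 64 dcmp "$SRC" "$DEST"
}

verify_quick
verify_full
```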
See the usage of `dsh` as described in hpc/mpifileutils#249.
See the `dwalk --sort` option.
The great sierra migration by @gonsie was an example of this use case. Data migration was done by a system administrator (acting as individual users). The general process was:
- Use `dwalk` to determine the amount of data to move.
- Use `dcp` to perform the copy, with resources allocated depending on the `dwalk` results.
- Use `dcmp --lite` to confirm success and find any new data at the source.
- Use `dsync` for any users with differences.
- The `dsync` tool deleted new data in the destination directory. This data was lost. As of v0.9, `dsync` no longer defaults to deleting data at the destination.
LC parallel file systems operate with a quota policy: a user may only use a limited amount of space within the file system.
To help users manage their data, a special directory `0_LC_AutoDelete` is placed in each user’s directory.
Data moved to this location is automatically cleaned up via a script running `drm`.