
Lightweight implementation of rsync specifically designed to regularly copy Bruker NMR datasets from instrument computers to a server.


greenwoodad/nmrsync


nmrsync

nmrsync is a bash script for periodic synchronization of Bruker NMR data to a data server using rsync. It takes as its argument an input file that provides paths, SSH aliases, rsync options, and other settings. The script searches for files that have been modified recently (less than x days/hours/minutes ago, as specified in the input file), looking no deeper than the contents of the data set folders in SourceDataPath/(user)/nmr/(data set). For example, if a new experiment folder such as 1/ or 2/ appears in the data set folder, the data set is flagged for syncing, but if something deeper changes (e.g. a proc file in a pdata folder), it is not flagged.
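As a rough illustration of this recency search, the sketch below turns Age and Timescale into a find time test and lists the data sets that contain a recently modified entry at the experiment level. The helpers age_predicate and recent_datasets are hypothetical, not code taken from nmrsync, and they assume the default SourceDataPath/(user)/nmr/(data set)/(expt #) layout. The -mmin test is supported by GNU and BSD find but is not defined by POSIX.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the actual nmrsync implementation.
# Assumes the layout SourceDataPath/<user>/nmr/<data set>/<expno>.

# Translate Age + Timescale ('day', 'hr', 'min') into a find time test.
age_predicate() {
  local age=$1 timescale=$2
  case $timescale in
    day) echo "-mtime -$age" ;;
    hr)  echo "-mmin -$((age * 60))" ;;
    min) echo "-mmin -$age" ;;
    *)   echo "unknown Timescale: $timescale" >&2; return 1 ;;
  esac
}

# Print each data set folder that has a recently modified entry at the
# <expno> level (depth 4 below src). Deeper changes, e.g. in pdata/, are ignored.
recent_datasets() {
  local src=$1 age=$2 timescale=$3 pred
  pred=$(age_predicate "$age" "$timescale") || return 1
  # $pred is left unquoted on purpose so it splits into two find arguments.
  # shellcheck disable=SC2086
  find "$src" -mindepth 4 -maxdepth 4 -path "$src/*/nmr/*/*" $pred \
    | sed 's|/[^/]*$||' | sort -u   # strip the <expno> component
}
```

For example, `recent_datasets /opt/nmrdata 3 day` (a hypothetical path) would print one data set folder per line, ready to be handed to rsync.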

Before syncing, the script can search for folders whose names are identical except for letter case (distinct on Linux but indistinguishable on Windows and Mac), as well as for folders whose names end in a period (permissible on Linux and Mac but not on Windows). The names of these data sets are then recorded in SkipFileOld, and they are not synced. Instead, an email is sent to the NMR manager, who can rename the folders manually once the data has finished acquiring. The renaming can also be done automatically using nmrfolderfix (https://github.com/greenwoodad/nmrfolderfix/).
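The two name checks can be sketched as follows. problem_names is a hypothetical helper, not part of nmrsync: it prints the immediate subfolders of a directory that a given SkipFlag value ('period', 'dup', 'both', or 'none') would cause to be skipped. Case-insensitive duplicates are found by lower-casing every name and reporting all originals whose lower-case form occurs more than once.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not the actual nmrsync implementation.
problem_names() {
  local dir=$1 skip_flag=${2:-both} names
  # Immediate subfolders of $dir (e.g. the data sets under <user>/nmr)
  names=$(find "$dir" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; | sort)
  if [[ $skip_flag == period || $skip_flag == both ]]; then
    # Names ending in '.', which Windows cannot represent
    grep '\.$' <<<"$names" || true
  fi
  if [[ $skip_flag == dup || $skip_flag == both ]]; then
    # Names whose lower-case form occurs more than once (clash on Windows/Mac)
    awk '{ low[NR] = tolower($0); orig[NR] = $0; count[low[NR]]++ }
         END { for (i = 1; i <= NR; i++) if (count[low[i]] > 1) print orig[i] }' <<<"$names"
  fi
}
```

For instance, a folder containing Test/, test/, sample./ and ok/ would report Test, test and sample. with 'both', and nothing with 'none'.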

I personally run this as a cron job every five minutes, and once a week with a second input file to ensure that data is still eventually transferred after network or power outages. The script can also be run occasionally (usually with the --full flag) to back up the pp/wave/par folders from the Topspin directories if desired.

Prerequisites

This script requires a Linux operating system with rsync installed. It has been tested on CentOS 6.8, CentOS 7.5, and Ubuntu 20 on the local side and on CentOS 7.5, CentOS 5.1, and RHEL 7.3 on the remote side. I've only tested it with Bruker NMR data, but future releases may be able to handle file structures generated by other instruments.

The email feature requires that the command sendmail is working on the machine running the script.
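A minimal sketch of how a notification like the one described above might be piped to sendmail. The function name notify_manager and the message wording are hypothetical; SendMailPath is one of the variables set at the top of nmrsync, but the default value used here is an assumption. `sendmail -t` takes the recipients from the To: header of the message.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not the actual nmrsync implementation.
# Assumed default location; adjust to wherever sendmail lives on your system.
SendMailPath=${SendMailPath:-/usr/sbin/sendmail}

# Usage: notify_manager <manager email> <instrument> <data set>...
notify_manager() {
  local manager_email=$1 instrument=$2
  shift 2
  {
    printf 'To: %s\n' "$manager_email"
    printf 'Subject: nmrsync: data sets skipped on %s\n' "$instrument"
    printf '\n'   # blank line separates headers from the body
    printf 'The following data sets were not synced; please rename them:\n'
    printf '  %s\n' "$@"
  } | "$SendMailPath" -t
}
```

To check that sendmail works at all, something like `printf 'Subject: test\n\nhello\n' | sendmail you@example.edu` should deliver a short test message.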

Installing

git clone https://github.com/greenwoodad/nmrsync

or

git clone https://(your github username)@github.com/greenwoodad/nmrsync.git

followed by:

chmod +x ./nmrsync/nmrsync

Getting Started

Setting up password-less ssh logins to instrument machines

Because this script is intended to be run as a cron job, the local machine must be authorized to access the remote machine(s) through password-less SSH login using SSH keys. Tutorials on this are widely available online.

Briefly:

  1. On the machine that will run the script and send the emails (logged in as the user who will run it), run the command:
ssh-keygen -t rsa -b 4096

This will generate files ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub

Press Enter at the prompt "Enter passphrase (empty for no passphrase):" to create the key without a passphrase, so that it can be used unattended.

  2. Next, run this command (from the local machine) for each remote workstation:
ssh-copy-id remote_username@remote_ip_address

You will be prompted for the password for this remote workstation.

If ssh-copy-id is not available, you should be able to run this instead:

cat ~/.ssh/id_rsa.pub | ssh remote_username@remote_ip_address "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"

  3. Finally, add SSH aliases to your hosts file. In /etc/hosts, add an entry of the form:
IPAddress DomainName SSHAlias

for each remote workstation.

For example:

198.51.100.50     dmx500.chem.university.edu       DMX500
198.51.100.54     av400.chem.university.edu        AV400
198.51.100.59     neo400.chem.university.edu       NEO400

The SSH aliases here should match the SSHAlias entries in the nmrsync input file.

You should now be able to SSH to the remote workstations without entering a password by typing:

ssh remote_username@SSHAlias

in addition to

ssh remote_username@IPAddress 

and

ssh remote_username@DomainName 

The first time you connect to each machine, however, you will need to answer "yes" to the question "Are you sure you want to continue connecting (yes/no)?". After that, the script can run automatically without manual password entry.
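Once the keys are in place, a quick check like the following can confirm that every alias accepts key-based login before the script is put into cron. check_ssh_aliases is a hypothetical helper, not part of nmrsync. With BatchMode=yes, ssh fails instead of prompting, so this also flags hosts whose host key has not yet been accepted.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not part of nmrsync.
# Usage: check_ssh_aliases <remote user> <alias>...
check_ssh_aliases() {
  local user=$1 host status=0
  shift
  for host in "$@"; do
    # -n: don't read stdin; BatchMode=yes: fail rather than prompt for a
    # password or for confirmation of an unknown host key
    if ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$user@$host" true; then
      echo "OK: $host"
    else
      echo "FAILED: $host"
      status=1
    fi
  done
  return "$status"
}
```

For the hosts file above, this would be run as `check_ssh_aliases remote_username DMX500 AV400 NEO400`; a nonzero exit status means at least one alias still needs attention.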

Configuring the input file

In the input file (nmrsync_input) there are a number of parameters and paths to set:

  • ScriptsPath: Full path on the local machine to the main script and to the input, emailtxt, and log folders. Use full path!

  • ManagerEmail: Email address of the NMR facility manager.

  • Age: How many days/hr/min back to look for recent experiments to sync. Without extended outages, ~3 (days) usually works well.

  • Timescale: Units for Age. Use 'day', 'hr', or 'min'.

  • RsyncOptions: Rsync options. I use '-quvrltD --modify-window=1 --protect-args'

  • SkipFlag: Defines which folders are not synced: 'period' to skip folders ending in a period, 'dup' to skip folders with case-insensitive duplicates, 'both', or 'none'. Default is 'both'. Note that a SkipFlag value given with -s when the script is run overrides the value in the input file.

  • Instrument: Name of instrument. Can be anything (no spaces) but make sure it is unique (not entered twice in the table).

  • /nmr directory?: Set this to 'y' for the default /(user)/nmr/(data set)/(expt #) data organization on the remote computer, or 'n' for data organized as /(user)/(data set)/(expt #).

  • sort data by username: Set this to 'y' if you want to group all of a user's data in a common folder (data stored as username/instrument/data instead of instrument/username/data)

  • SourceDataPath: Full path containing the NMR data, usually on a remote computer. The Topspin/ICON-NMR user folders should be located in this folder. Use full path!

  • DestinationDataPath: Full path on local computer to transfer the data to. Can be a mounted directory. Use full path!

  • SSHAlias: Alias for password-less SSH to this instrument computer. Optional if the source path is on the local computer.

  • RemoteUser: User on the remote computer that you can SSH as. Optional if the source path is on the local computer.
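To show how these parameters fit together, here is a sketch of how one instrument entry could be turned into an rsync call. The helper names are hypothetical and the path logic is an assumption based on the descriptions above, not nmrsync's actual code. The trailing slash on the source makes rsync copy the contents of the data set folder into the destination rather than nesting the folder one level deeper.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the actual nmrsync implementation.
RsyncOptions='-quvrltD --modify-window=1 --protect-args'

# Source folder of one data set; nmr_dir mirrors the '/nmr directory?' column.
dataset_source() {
  local source_path=$1 nmr_dir=$2 user=$3 dataset=$4
  if [[ $nmr_dir == y ]]; then
    printf '%s/%s/nmr/%s/\n' "$source_path" "$user" "$dataset"
  else
    printf '%s/%s/%s/\n' "$source_path" "$user" "$dataset"
  fi
}

# Prefix RemoteUser@SSHAlias: only when the data lives on a remote machine.
rsync_source() {
  local remote_user=$1 ssh_alias=$2 path=$3
  if [[ -n $ssh_alias ]]; then
    printf '%s@%s:%s\n' "$remote_user" "$ssh_alias" "$path"
  else
    printf '%s\n' "$path"
  fi
}

# Usage: sync_dataset <remote user> <ssh alias> <source dir> <destination dir>
sync_dataset() {
  local src
  src=$(rsync_source "$1" "$2" "$3")
  mkdir -p "$4"
  # RsyncOptions is unquoted on purpose so it splits into separate flags.
  # shellcheck disable=SC2086
  rsync $RsyncOptions "$src" "$4"
}
```

For example, `sync_dataset nmrsu AV400 "$(dataset_source /opt/nmrdata y alice set1)" /mnt/server/AV400/alice/set1` would pull one data set over SSH; all names and paths in that call are made up.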

IMPORTANT: When editing this file, entries should be separated by either a tab or multiple spaces.

Instruments in the instrument table can be commented out with a #.

NOTE: Additional modifications can be made to the variables 'SendMailPath', 'ManualFlag', 'ExcludeFlag', 'FullFlag', and 'VerboseFlag' at the top of the script itself. Most of these set the default values for options that can be given when the script is run (see Usage, below).

Usage

nmrsync [OPTIONS]... path/to/nmrsync_input

Options

-h, -?, --help Show help message.

-i, --input Set input file (flag optional).

-s, --skip (default 'both') Set to 'period' to skip folders ending in period, 'dup' to skip case-insensitive duplicates, 'both' to skip both and 'none' to skip none.

-m, --manual Manual mode: enter passwords instead of using SSH keys (not recommended).

-p, --processed Processed data mode: ignore the excludelist.instrument list in the input folder and copy processed data.

-f, --full Full mode: copy over all data instead of just recently added data.

-b, --verbose Verbose mode.

The defaults here can be modified at the top of the script itself.
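For SkipFlag, the order of precedence described under Configuring the input file can be sketched like this. resolve_skip_flag is a hypothetical helper: a value passed with -s wins over the input file, 'both' is used when neither is set, and anything other than the four documented values is rejected.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not the actual nmrsync implementation.
# Usage: resolve_skip_flag <value from -s> <value from input file>
resolve_skip_flag() {
  local cli_value=$1 file_value=$2 value
  # Command line beats input file; fall back to the documented default.
  value=${cli_value:-${file_value:-both}}
  case $value in
    period|dup|both|none) printf '%s\n' "$value" ;;
    *) echo "invalid SkipFlag: $value" >&2; return 1 ;;
  esac
}
```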

To run this as a cron job, make an entry in your crontab like this:

*/5 * * * * /path/to/nmrsync "/path/to/input/nmrsync_input"
40 6 * * 0 /path/to/nmrsync "/path/to/input/nmrsync_input_weekly"

In the preceding example, a second input file is configured to look for data collected over the last week. The "fast" version runs every five minutes (*/5), while the "slow" version runs on Sundays (0) at 6:40 AM (40 6).

Contributing

Pull requests are welcome.

Authors

License

MIT
