Merge pull request #17 from FredHutch/v1.3
Allow for non-AWS usage
vortexing authored Apr 2, 2022
2 parents ff8d6d7 + 32f578a commit 2c0b5cc
Showing 5 changed files with 279 additions and 43 deletions.
76 changes: 46 additions & 30 deletions README.md
@@ -1,6 +1,5 @@
# diy-cromwell-server
A repo containing instructions for running a Cromwell server on `gizmo` at the Fred Hutch. These instructions were created and tested by Amy Paguirigan, so drop her a line if they don't work for you or you need help. Fred Hutch username is `apaguiri`.
Alternatively, join the discussion in The Coop Slack in the [#question-and-answer channel](https://fhbig.slack.com/archives/CD3HGJHJT) using your Fred Hutch, UW, SCHARP or Sagebase email.
A repo containing instructions for running a Cromwell server on `gizmo` at the Fred Hutch. These instructions were created and tested by Amy Paguirigan, so drop her a line if they don't work for you or you need help (Fred Hutch username is `apaguiri`), or tag @vortexing in an Issue filed here in the GitHub repository. If you do, please be sure not to post sensitive information such as passwords in your troubleshooting details, but the more information you can provide about errors, the better. Alternatively, join the discussion in the Fred Hutch Bioinformatics and Computational Research Community Slack in the [#question-and-answer channel](https://fhbig.slack.com/archives/CD3HGJHJT) using your Fred Hutch, UW, SCHARP or Sagebase email.


## Cromwell Resources
@@ -16,13 +15,15 @@ Amy also made a shiny app you can use to monitor your own Cromwell server workfl
## Steps to prepare
If you have questions about these steps, feel free to contact Amy Paguirigan (`apaguiri`) or `scicomp`.

### Rhino Access
### Rhino Access (one time)
Currently, to run your own Cromwell server you'll need to know how to connect to `rhino` at the Fred Hutch. If you have never used the local cluster (`rhino`/`gizmo`), you may need to file a ticket by emailing `scicomp` from your Fred Hutch email and requesting that your account be set up. To do this you'll need to specify which PI you are sponsored by/work for. You also may want to read a bit more about the use of our cluster over at [SciWiki](https://sciwiki.fredhutch.org/), in the Scientific Computing section on Access Methods and Technologies.
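
Once your account is set up, connecting is an ordinary SSH login from the campus network or VPN. A minimal sketch (the exact hostname alias may differ for your setup):

```
# Log in to one of the rhino login nodes with your Fred Hutch credentials
ssh username@rhino.fhcrc.org
```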

### AWS S3 Access
This setup requires that you have a set of AWS credentials for access to an AWS S3 bucket, and specifically to the S3 bucket(s) from which you pull workflow inputs. Refer to [SciWiki](https://sciwiki.fredhutch.org/scicomputing/compute_cloud/#get-aws-credentials) or email `scicomp` to request credentials.
### AWS S3 Access (optional as of version 1.3)
Prior to release 1.3, this setup required that you have a set of AWS credentials for access to an AWS S3 bucket, and specifically to the S3 bucket(s) from which you pull workflow inputs. Refer to [SciWiki](https://sciwiki.fredhutch.org/scicomputing/compute_cloud/#get-aws-credentials) or email `scicomp` to request credentials.

### Database Setup
As of version 1.3, if you have credentials the Cromwell server will be configured to allow input files to be specified directly by their AWS S3 URL. If you do not have AWS credentials, the server will still start up successfully, just without the ability to localize files from S3; any test workflows that use files in S3 will not work for you, but everything else should.
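
The v1.3 startup script decides which configuration to use simply by checking for a credentials file in the default location; a quick way to see what it will find (the file contents shown are placeholders):

```
# The setup script looks for this file to decide whether to enable S3 localization
ls ~/.aws/credentials

# A typical credentials file looks roughly like:
# [default]
# aws_access_key_id = AKIA...
# aws_secret_access_key = ...
```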

### Database Setup (one time)
These instructions let you stand up a Cromwell server for 7 days at a time. If you have workflows that run longer than that, or you want to be able to get metadata for or restart jobs even after the server goes down, you'll want an external database to keep track of your progress even if your server goes down (for whatever reason). It also allows your future workflows to use cached copies of data when the exact task has already been done (and recorded in the database). We have also found that using a MySQL database makes your Cromwell server run faster and better able to handle simultaneous workflows, while making all the metadata available to you during and after the run.

We currently suggest you go to [DB4Sci](https://mydb.fredhutch.org/login) and see the Wiki entry for DB4Sci [here](https://sciwiki.fredhutch.org/scicomputing/store_databases/#db4sci--previously-mydb). There, you will log in using Fred Hutch credentials, choose `Create DB Container`, and choose the MariaDB option. The default database container values are typically fine, EXCEPT you likely need either weekly or no backups (no backups preferred) for this database. Save the `DB/Container Name`, `DB Username` and `DB Password`, as you will need them for the configuration step. Once you click submit, a confirmation screen will appear (hopefully), and you'll need to note which `Port` is specified; this is currently a five-digit number.
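
The portion of the README elided from this diff walks through creating the database itself; roughly, that step looks like the sketch below (the hostname, port, and names are placeholders for the values you saved, and you may need to load a MariaDB module first to get the `mysql` client):

```
# From rhino, connect to your DB4Sci MariaDB container
# (you may need `ml MariaDB` or similar first to get the mysql client)
mysql --host mydb.fredhutch.org --port 32XXX --user your_db_username --password
# At the MariaDB prompt, create the database Cromwell will use, then exit:
#   create database your_cromwell_db;
#   exit
```
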
@@ -45,7 +46,7 @@ MariaDB [(none)]> exit

## Server setup instructions

1. Decide where you want to keep your Cromwell configuration files. This must be a place where `rhino` can access them, such as in your `Home` directory, which is typically the default directory when you connect to the `rhinos`. Create a `cromwell-home` folder (or whatever you want to call it) and follow these git instructions to clone it directly.
1. Decide where you want to keep your Cromwell configuration files. This must be a place where `rhino` can access them, such as in your `Home` directory, which is typically the default directory when you connect to the `rhinos`. We suggest you create a `cromwell-home` folder (or whatever you want to call it) and follow these git instructions to clone it directly.
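
The git instructions themselves sit in a portion of the README not shown in this diff; a minimal sketch of the idea (the folder name is just an example, and whether you clone into the folder itself or alongside it is up to you):

```
# Make a home for your Cromwell configuration files and clone this repo into it
mkdir -p ~/cromwell-home && cd ~/cromwell-home
git clone https://github.com/FredHutch/diy-cromwell-server.git
```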


2. First, set up the per-user customizations you'll want for your server(s) by making user configuration file(s) in your `cromwell-home` or wherever you find convenient. You can manage multiple Cromwell profiles this way simply by maintaining different files of credentials and configuration variables.
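
A minimal sketch of one way to do this, using the `cromUserConfig.txt` template that ships with the repo (the destination path is just an example):

```
# Copy the template into your cromwell-home and fill in your own values
cp diy-cromwell-server/cromUserConfig.txt ~/cromwell-home/cromUserConfig.txt
nano ~/cromwell-home/cromUserConfig.txt   # or your editor of choice
```
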
@@ -80,26 +81,35 @@ chmod +x cromwellv1.2.sh
5. Much like the `grabnode` command you may have used previously, the script will run and, once the resources have been provisioned for the server, print instructions back to the console. You should see something like this:
```
Your configuration details have been found...
Getting an updated copy of Crowmell configs from GitHub...
Cloning into 'tg-cromwell-server'...
remote: Enumerating objects: 196, done.
remote: Counting objects: 100% (196/196), done.
remote: Compressing objects: 100% (179/179), done.
remote: Total 196 (delta 96), reused 109 (delta 13), pack-reused 0
Receiving objects: 100% (196/196), 41.14 KiB | 533.00 KiB/s, done.
Resolving deltas: 100% (96/96), done.
Getting an updated copy of Cromwell configs from GitHub...
Setting up all required directories...
Detecting existence of AWS credentials...
Credentials found, setting appropriate configuration...
Requesting resources from SLURM for your server...
Submitted batch job 36391382
Submitted batch job 50205062
Your Cromwell server is attempting to start up on **node/port gizmok30:2020**. If you encounter errors, you may want to check your server logs at /home/username/cromwell-home/server-logs to see if Cromwell was unable to start up.
Go have fun now.
```
> NOTE: Please write down the node and port it specifies here. This is the only place where you will be able to find the particular node/port for this instance of your Cromwell server, and you'll need it to be able to send jobs to the Cromwell server. If you forget it, `scancel` the Cromwell server job and start a new one.
6. For using the API (via a browser, some other method of submission), you'll want to keep the node and port number it tells you when you start up a fresh Cromwell server. When you go your browser, you can go to (for example) `http://gizmok30:2020` ("2020" or whatever the webservice port it told you) to use the Swagger UI to submit workflows. This node host and port also is what you use to submit and monitor workflows with the Shiny app at [cromwellapp.fredhutch.org](https://cromwellapp.fredhutch.org/) where it says "Current Cromwell host:port", you put `gizmok30:2020`.

6. This node host and port is what you use to submit and monitor workflows with the Shiny app at [cromwellapp.fredhutch.org](https://cromwellapp.fredhutch.org/). After you click the "Connect to Server" button, you'll put `gizmok30:2020` (or whatever your node:port is) where it says "Current Cromwell host:port".
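
If you prefer to skip the Shiny app, the same host:port also serves Cromwell's REST API (and the Swagger UI mentioned above); a sketch of submitting a workflow with `curl`, where the host, port, and file names are examples:

```
# Submit a workflow and its inputs to your Cromwell server's REST API
curl -X POST "http://gizmok30:2020/api/workflows/v1" \
  -F workflowSource=@myWorkflow.wdl \
  -F workflowInputs=@myWorkflow-inputs.json
```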

7. Your server will normally stop after 7 days (the default); if you have jobs still running at that point, you can simply restart your server and it will reconnect to the existing jobs/workflows. However, if you need to take down your server for whatever reason before then, you can go to `rhino` and do:

```
# Here `username` is your Fred Hutch username
squeue -u username
## Or if you want to get fancy:
squeue -o '%.18i %.9P %j %.8T %.10M %.9l %.6C %R' -u username
## You'll see a jobname "cromwellServer". Next to that will be a JOBID. In this example the JOBID of the server is 50062886.
scancel 50062886
```
7. See our [Test Workflow folder](https://github.com/FredHutch/diy-cromwell-server/tree/main/testWorkflows) once your server is up and run through the tests specified in the markdown there.

8. See our [Test Workflow folder](https://github.com/FredHutch/diy-cromwell-server/tree/main/testWorkflows) once your server is up and run through the tests specified in the markdown there.
> NOTE: For those test workflows that use Docker containers, know that the first time you run them, you may notice that jobs aren't being sent very quickly. That is because, for our cluster, we need to convert those Docker containers to something that can be run by Singularity. The first time a Docker container is used it must be converted, but in the future Cromwell will use the cached version of the container and jobs will be submitted more quickly.
## Guidance and Support
@@ -129,11 +139,6 @@ SCRATCHDIR=/fh/scratch/delete90/...
### Suggestion: /fh/fast/pilastname_f/cromwell/workflow-logs
WORKFLOWLOGDIR=~/cromwell-home/workflow-logs
## Where do you want final output files specified by workflows to be copied for your subsequent use?
## Note: this is a default for the server and can be overwritten for a given workflow in workflow-options.
### Suggestion: /fh/fast/pilastname_f/cromwell/outputs
WORKFLOWOUTPUTSDIR=/fh/scratch/delete90/...
## Where do you want to save Cromwell server logs for troubleshooting Cromwell itself?
### Suggestion: /home/username/cromwell-home/server-logs
SERVERLOGDIR=~/cromwell-home/server-logs
@@ -146,21 +151,32 @@ CROMWELLDBNAME=...
CROMWELLDBUSERNAME=...
CROMWELLDBPASSWORD=...
## Number of cores for your Cromwell server itself - usually 4 is sufficient.
###Increase if you want to run many, complex workflows simultaneously or notice your server is slowing down.
NCORES=4
## Length of time you want the server to run for.
### Note: when servers go down, all jobs they'd sent will continue. When you start up a server the next time
### using the same database, the new server will pick up wherever the previous workflows left off. "7-0" is 7 days, zero hours.
SERVERTIME="7-0"
```
Whether these customizations are done user-by-user or lab-by-lab depends on how your group wants to interact with workflows and data. Also, as there are additional features provided in the additional configs we provide, there may be additional customization parameters that you'll need. Check the config directories to see if there are additional copies of those files and associated server shell scripts; if they are absent, you can use the base setup. Contact Amy Paguirigan about these issues for some advice.

Contact Amy Paguirigan about these issues for some advice or file an issue on this repo.

## Task Defaults and Runtime Variables available
For the gizmo backend, the following runtime variables are available and are customized to our configuration. What is specified below is the current default as written; you can edit these in the config file if you'd like, OR you can specify these variables in the `runtime` block of each task to change only the variables you want to change from the default for that particular task.

- `Int cpu = 1`
- `cpu = 1`
- An integer number of cpus you want for the task
- `String walltime = "18:00:00"`
- `walltime = "18:00:00"`
- A string of date/time that specifies how many hours/days you want to request for the task
- `Int memory = 2000`
- `memory = 2000`
- An integer number of MB of memory you want to use for the task
- `String partition = "campus-new"`
- Which partition you want to use, currently the only Bionic option is `campus-new`
- `String modules = ""`
- `partition = "campus-new"`
- Which partition you want to use; the default is `campus-new`, but whatever is in the runtime block of your WDL will override this.
- `modules = ""`
- A space-separated list of the environment modules you'd like to have loaded (in that order) prior to running the task.
- `docker = "ubuntu:latest"`
- A specific Docker container to use for the task. For the custom Hutch configuration, docker containers can be specified and the necessary conversions (to Singularity) will be performed by Cromwell (not the user). Note: when docker is used, soft links cannot be used in our filesystem, so workflows using very large datasets may run slightly slower due to the need for Cromwell to copy files rather than link to them.
- `dockerSL= "ubuntu:latest"`
- This is a custom configuration for the Hutch that allows users to use docker and softlinks only to specific locations in Scratch. It is helpful when working with very large files.
6 changes: 1 addition & 5 deletions cromUserConfig.txt
@@ -8,10 +8,6 @@ SCRATCHDIR=/fh/scratch/delete90/...
### Suggestion: /fh/fast/pilastname_f/cromwell/workflow-logs
WORKFLOWLOGDIR=~/cromwell-home/workflow-logs

## Where do you want final output files specified by workflows to be copied for your subsequent use?
## Note: this is a default for the server and can be overwritten for a given workflow in workflow-options.
### Suggestion: /fh/fast/pilastname_f/cromwell/outputs
WORKFLOWOUTPUTSDIR=/fh/scratch/delete90/...

## Where do you want to save Cromwell server logs for troubleshooting Cromwell itself?
### Suggestion: /home/username/cromwell-home/server-logs
@@ -26,7 +22,7 @@ CROMWELLDBUSERNAME=...
CROMWELLDBPASSWORD=...

## Number of cores for your Cromwell server itself - usually 4 is sufficient.
###Increase if you want to run many, complex workflows simlutaneously or notice your server is slowing down.
###Increase if you want to run many, complex workflows simultaneously or notice your server is slowing down.
NCORES=4
## Length of time you want the server to run for.
### Note: when servers go down, all jobs they'd sent will continue. When you start up a server the next time
18 changes: 15 additions & 3 deletions cromwellv1.2.sh → cromwellv1.3.sh
@@ -4,7 +4,7 @@ if [ ! -f ${1} ]; then
exit
fi
source ${1}
if [[ -z $NCORES || -z $SCRATCHDIR || -z $WORKFLOWLOGDIR || -z $WORKFLOWOUTPUTSDIR || -z $SERVERLOGDIR || -z $CROMWELLDBPORT || -z $CROMWELLDBNAME || -z $CROMWELLDBUSERNAME || -z $CROMWELLDBPASSWORD ]]; then
if [[ -z $NCORES || -z $SCRATCHDIR || -z $WORKFLOWLOGDIR || -z $SERVERLOGDIR || -z $CROMWELLDBPORT || -z $CROMWELLDBNAME || -z $CROMWELLDBUSERNAME || -z $CROMWELLDBPASSWORD ]]; then
echo "One or more of your personal configuration variables is unset, please check your configuration file and try again."
exit 1
fi
@@ -17,19 +17,31 @@ fi
echo "Getting an updated copy of Cromwell configs from GitHub..."
# If the repo already exists, delete it then re-clone a fresh copy
if [ -d "diy-cromwell-server" ]; then rm -Rf diy-cromwell-server; fi
git clone --branch v1.2 https://github.com/FredHutch/diy-cromwell-server.git
git clone --branch v1.3 https://github.com/FredHutch/diy-cromwell-server.git --quiet

echo "Setting up all required directories..."
# If the directory to write server logs to doesn't yet exist, make it.
if [ ! -d $SERVERLOGDIR ]; then
mkdir -p $SERVERLOGDIR
fi

# If the user doesn't have AWS credentials then the AWS-naive Cromwell config file needs to be used.
# Note: this doesn't check whether existing AWS credentials are valid - that occurs when jobs using AWS get created in a workflow.
echo "Detecting existence of AWS credentials..."
if [ -f ~/.aws/credentials ]
then
echo "Credentials found, setting appropriate configuration..."
CONFFILE="./diy-cromwell-server/fh-S3-cromwell.conf"
else
echo "Credentials not found, setting appropriate configuration..."
CONFFILE="./diy-cromwell-server/fh-cromwell.conf"
fi

echo "Requesting resources from SLURM for your server..."
# Submit the server job, and tell it to send netcat info back to port 6000
sbatch --export=MYPORT=6000 --cpus-per-task=$NCORES -N 1 --time=$SERVERTIME --job-name="cromwellServer" \
--output=$SERVERLOGDIR/cromwell_%A.out\
./diy-cromwell-server/cromwellServer.sh ./diy-cromwell-server/fh-S3-cromwell.conf \
./diy-cromwell-server/cromwellServer.sh $CONFFILE \
${1}

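# Listen on port 6000 and print what the server job reports back (the node/port startup message)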
nc -l -p 6000
6 changes: 1 addition & 5 deletions fh-S3-cromwell.conf
@@ -61,11 +61,7 @@ aws {
}
engine {
filesystems {
local {
localization: [
"soft-link", "copy" # See SLURM backend for definitions.
]
}
local {localization: ["soft-link", "copy" ]}
s3 { auth = "default" }
}
}