BioWardrobe NG

BioWardrobe-NG startup routine

  1. Read the biowardrobe object from settings.json to establish a connection to the BioWardrobe DB

      "biowardrobe": {
        "db": {
          "host":     String,
          "user":     String,
          "password": String,
          "database": "ems",
          "port":     Int
        }
      }
  2. Check whether settings.json includes the information required to connect to the Central Post

       "rc_server": String,
       "rc_server_token": String

2.1 YES: Connection to the Central Post can be established

  • Run synchronization with the Central Post

2.2 NO: Connection to the Central Post can't be established

  • Fetch workflows from GitHub
    • Read the GitHub configuration from settings.json

        "git": {
          "path":         String,  # Absolute path to the cloned repository
          "url":          String   # URL to remote
        }
    • Try to clone the repository from git.url

    • If cloning fails, open the local repository at git.path

    • Fetch changes from the remote origin (currently hardcoded)

    • Merge fetched changes into the master branch

    • Get latest commit from the master branch

    • Get file list from workflows directory

    • For each workflow file:

      • Pack workflow and all dependencies into a single file
      • Upsert the document in the CWL collection (update it if a document with the same relative path to the workflow file and the same remote URL it was fetched from already exists in the collection)
      • Read the airflow object from settings.json
        "airflow":{
          "dags_folder": String  # Folder from which Airflow loads DAGs
        }
      • Export the *.cwl file and the corresponding *.py file into airflow.dags_folder
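
The two decision points in the startup routine, choosing where workflows come from and upserting each packed workflow, can be sketched as follows. This is only an illustration, not the project's implementation: it assumes `rc_server` and `rc_server_token` sit at the top level of settings.json, it uses a plain Python list in place of the CWL collection, and the function names and the document fields `path`, `url`, `packed` and `commit` are hypothetical.

```python
from typing import Any, Dict, List


def choose_workflow_source(settings: Dict[str, Any]) -> str:
    """Return "central_post" when both Central Post keys are set, else "github".

    Assumption: both keys live at the top level of settings.json and an
    empty string counts as "not configured".
    """
    if settings.get("rc_server") and settings.get("rc_server_token"):
        return "central_post"
    return "github"


def upsert_workflow(collection: List[Dict[str, Any]], path: str, url: str,
                    packed: str, commit: str) -> str:
    """Update the document matching (path, url), or insert a new one.

    `collection` is a plain list standing in for the CWL collection; the
    field names are hypothetical. Returns "updated" or "inserted".
    """
    for doc in collection:
        # A document is the same workflow only if both the relative path
        # and the remote URL match.
        if doc["path"] == path and doc["url"] == url:
            doc["packed"] = packed
            doc["commit"] = commit
            return "updated"
    collection.append({"path": path, "url": url,
                       "packed": packed, "commit": commit})
    return "inserted"
```

Keying the upsert on both the relative path and the remote URL means the same file name fetched from two different remotes yields two documents, while re-fetching from the same remote overwrites the existing one.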

BioWardrobe-NG invoice generation

  1. Read the billing configuration from settings.json to allow invoice generation. If it is absent, invoice generation is skipped.

       "billing":{
         "organization": "",
         "businessUnit": "",
         "fund": "",
         "department": "",
         "account": ""
       }
  2. Invoices are accessible only to admins. To grant admin permissions to a specific user, use the following command:

       db.users.update({_id: "UNIQUE_USER_ID"}, {$addToSet: {"roles.__global_roles__":"admin"}})

    The result should look similar to this:

       "roles": {
           "__global_roles__": [
               "admin"
           ]
       }
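
The behaviour described above can be mimicked on plain dictionaries to make the two rules explicit: invoices are generated only when a billing object is present, and `$addToSet` adds the `admin` role without ever duplicating it. This is a sketch under assumptions, not the application's code: the helper names are hypothetical, and treating only a missing `billing` key as "absent" is an interpretation of the text.

```python
from typing import Any, Dict


def invoices_enabled(settings: Dict[str, Any]) -> bool:
    """Invoice generation is skipped when the billing object is absent."""
    return settings.get("billing") is not None


def add_global_role(user: Dict[str, Any], role: str) -> Dict[str, Any]:
    """Mimic $addToSet on roles.__global_roles__: append only if missing."""
    roles = user.setdefault("roles", {}).setdefault("__global_roles__", [])
    if role not in roles:
        roles.append(role)
    return user


def is_admin(user: Dict[str, Any]) -> bool:
    """A user may see invoices only with the global "admin" role."""
    return "admin" in user.get("roles", {}).get("__global_roles__", [])
```

Running `add_global_role` twice for the same user leaves a single `"admin"` entry, which matches the result document shown above.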

BioWardrobe-NG Aria2

  1. Read the Aria2 configuration from the download["aria2"] field of settings.json. If it is absent, Aria2 won't be used. Set additional security options if necessary.

       "download": {
         "aria2": {
           "host": "localhost",
           "port": 6800,
           "secure": false,
           "secret": "",
           "path": "/jsonrpc"
         }
       }
  2. Run the Aria2 server as in the example below. Set additional security options if necessary.

       aria2c --enable-rpc --rpc-listen-all=false --auto-file-renaming=false --rpc-listen-port=6800 --console-log-level=debug
  3. To download data from dna.cchmc.org, the file input should look like this:

         "fastq_file": {
             "class": "File",
             "location": "core:///input.fastq.gz",
             "format": "http://edamontology.org/format_1930"
         },

    Make sure the protocol is set to core (in settings.json) so that the remote module processes the download. All URLs required by this module must also be configured properly.

  4. To download data from GEO, the file input should look like this:

         "fastq_file": {
             "class": "File",
             "location": "geo://SRR123456,SRR234567",
             "format": "http://edamontology.org/format_1930"
         },

    Make sure the protocol is set to geo (in settings.json) so that the remote module processes the download. A comma-separated list of SRR accessions is merged into one file (paired-end data is merged properly). For paired-end data, the first occurrence of a geo:// input is assumed to be upstream and the next one downstream. fastq-dump must be installed. To make fastq-dump run faster, enable the cache in the SRA Toolkit configuration file (see the ~/.ncbi directory).
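
To make the configuration and the input formats above concrete, the sketch below derives the Aria2 JSON-RPC endpoint from `download["aria2"]`, builds an `aria2.addUri` request body, and splits a `location` value into its protocol and targets. It is illustrative only: the function names are hypothetical, mapping `secure` to `https` is an assumption, and how a `core://` path is turned into a real download URL is not covered because it depends on module settings not shown here.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def aria2_rpc_url(settings: Dict[str, Any]) -> Optional[str]:
    """Build the JSON-RPC endpoint, or return None if Aria2 is not configured."""
    cfg = settings.get("download", {}).get("aria2")
    if not cfg:
        return None
    # Assumption: "secure": true means the endpoint is served over HTTPS.
    scheme = "https" if cfg.get("secure") else "http"
    return f"{scheme}://{cfg['host']}:{cfg['port']}{cfg['path']}"


def add_uri_request(uri: str, secret: str = "", request_id: str = "1") -> str:
    """Serialize an aria2.addUri JSON-RPC 2.0 request for a single URI."""
    params: List[Any] = [[uri]]
    if secret:
        # With an RPC secret set, aria2 expects "token:<secret>" as the first parameter.
        params.insert(0, f"token:{secret}")
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "method": "aria2.addUri", "params": params})


def parse_location(location: str) -> Tuple[str, List[str]]:
    """Split "geo://SRR1,SRR2" or "core:///file" into (protocol, targets)."""
    protocol, sep, rest = location.partition("://")
    if not sep or not protocol:
        raise ValueError(f"location has no protocol: {location!r}")
    targets = [item for item in rest.split(",") if item]
    return protocol, targets
```

With the example configuration above, `aria2_rpc_url` yields `http://localhost:6800/jsonrpc`, and `parse_location("geo://SRR123456,SRR234567")` yields the protocol `geo` with two SRR accessions to be merged.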