oicr-gsi/cBioPortal_Importer

cBioPortal Importer

The pycbio.py script generates an import folder containing all the data and metadata files required for uploading data to cBioPortal.

The script is available as a module:

module load cbioportal-importer

Loading the module also adds to the environment the accessory tools required for processing and annotating mutations.

Currently, the data accepted for cBioPortal uploads are .maf files from the VEP workflow, .seg files from sequenza, .genes.results files from the rsem workflow and .tab files from the mavis workflow.

The data should be organized in a comma-separated map.csv file with the following information:

patient_id,sample_id,maf_file.maf.gz,seg_file.seg,rsem.genes.results,mavis.tab
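As a rough illustration of the layout above, the following sketch parses a map.csv and checks that every row has the six expected fields. It is not part of the importer. The column labels in `MAP_COLUMNS`, the `read_map` helper, the sample paths, and the assumption that the file has no header row are all made up for this example.

```python
import csv
import io

# Column order follows the README line above; the labels themselves are
# hypothetical names chosen for this sketch.
MAP_COLUMNS = ["patient_id", "sample_id", "maf", "seg", "rsem", "mavis"]


def read_map(handle):
    """Parse map.csv rows into dicts, rejecting rows with the wrong field count.

    Assumes the file has no header row (not confirmed by the README).
    """
    rows = []
    for line_number, fields in enumerate(csv.reader(handle), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(MAP_COLUMNS):
            raise ValueError(
                f"line {line_number}: expected {len(MAP_COLUMNS)} fields, "
                f"got {len(fields)}"
            )
        rows.append(dict(zip(MAP_COLUMNS, (field.strip() for field in fields))))
    return rows


# Illustrative content only; identifiers and paths are invented.
example = io.StringIO(
    "PATIENT_01,SAMPLE_01,/data/s1.maf.gz,/data/s1.seg,"
    "/data/s1.genes.results,/data/s1.tab\n"
)
rows = read_map(example)
print(rows[0]["patient_id"], rows[0]["sample_id"])
```

Running a check like this before calling the importer catches rows with missing or extra commas early. Whether the importer itself tolerates empty fields is not stated here, so the sketch does not test for them.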

Options, including the path to the output directory, the path to the mapping file and filters, can be specified in the config file.

Generate the import folder with the following command:

cbio_importer generate -cf /path/to/config

By default, the only clinical sample information required is the patient and sample identifiers, which are extracted from the mapping file. User-defined clinical fields can be added with an optional tab-delimited clinical file. The first two columns should be labeled Patient and Sample and must contain the same patient and sample identifiers as the map.csv file. Any other column names are valid, but each column must contain a single data type (e.g., boolean, string or number). The column names provided in the clinical information file will be displayed in cBioPortal.

cbio_importer generate -cf /path/to/config -cl /path/to/clinical_information
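The constraints on the clinical file can be checked before running the command. The sketch below is only an illustration under stated assumptions: `check_clinical` and `infer_kind` are hypothetical helpers, the type-detection heuristic is a guess at what "a single data type" means in practice, and the `Age` and `Treated` columns are invented sample data.

```python
import csv
import io


def infer_kind(value):
    """Classify a cell as boolean, number or string (heuristic for this sketch)."""
    if value.strip().lower() in {"true", "false"}:
        return "boolean"
    try:
        float(value)
        return "number"
    except ValueError:
        return "string"


def check_clinical(clinical_handle, map_pairs):
    """Return a list of problems found in a tab-delimited clinical file.

    map_pairs is a set of (patient_id, sample_id) tuples taken from map.csv.
    """
    reader = csv.reader(clinical_handle, delimiter="\t")
    header = next(reader)
    problems = []
    if header[:2] != ["Patient", "Sample"]:
        problems.append("first two columns must be labeled Patient and Sample")
    kinds_by_column = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if tuple(row[:2]) not in map_pairs:
            problems.append("/".join(row[:2]) + " not found in map.csv")
        for name, value in zip(header[2:], row[2:]):
            kinds_by_column.setdefault(name, set()).add(infer_kind(value))
    for name, kinds in kinds_by_column.items():
        if len(kinds) > 1:
            problems.append(f"column {name} mixes types: {sorted(kinds)}")
    return problems


# Invented example: the Age column mixes a number with a string.
clinical = io.StringIO(
    "Patient\tSample\tAge\tTreated\n"
    "P1\tS1\t54\ttrue\n"
    "P2\tS2\tunknown\tfalse\n"
)
problems = check_clinical(clinical, {("P1", "S1"), ("P2", "S2")})
print(problems)
```

An empty list means the file passed these particular checks. It does not guarantee that the importer or cBioPortal will accept the file.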

Data uploaded to cBioPortal replaces any data already on the server. To add new data without replacing the existing data, merge the data from an existing import folder with the new data. The raw data from the existing folder is added to, and processed with, the new data to generate a new import folder. This makes it possible to 1) upload data for which the original files have been deleted and 2) add data incrementally to cBioPortal.

cbio_importer generate -cf /path/to/config --append -mid /path/to/previous/outputdirectory

Note that /path/to/previous/outputdirectory is the outdir in the config file used to generate the previous import folder. It is not the import folder itself.

Example command:

cbio_importer generate -cf /.mounts/labs/gsiprojects/gsi/gsiusers/rjovelin/cbiportal_importer_dev/config_cbioportal_batch2.ini --append -mid /.mounts/labs/gsiprojects/gsi/gsiusers/rjovelin/cbiportal_importer_dev/batch1/out/
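Because the value for -mid is the outdir of the previous run, one way to avoid pointing at the wrong directory is to read it from the previous config. The sketch below assumes the config is an INI-style file with section headers (the example config ends in .ini) and that the key is spelled outdir. The section name `[Options]` and both helper functions are hypothetical. The command is only assembled and printed, not executed.

```python
import configparser
import shlex


def find_outdir(config_text):
    """Return the first 'outdir' value found in any section, or None.

    Assumes an INI-style config with section headers; the actual layout of
    the importer's config file is not documented here.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if "outdir" in parser.defaults():
        return parser.defaults()["outdir"]
    for section in parser.sections():
        if parser.has_option(section, "outdir"):
            return parser.get(section, "outdir")
    return None


def append_command(new_config_path, previous_outdir):
    """Assemble the append command using only the flags listed in this README."""
    return [
        "cbio_importer", "generate",
        "-cf", new_config_path,
        "--append",
        "-mid", previous_outdir,
    ]


# Hypothetical previous config; the section name is an assumption.
previous_config = "[Options]\noutdir = /path/to/previous/outputdirectory\n"
command = append_command("/path/to/config", find_outdir(previous_config))
print(shlex.join(command))
```

Reading the value from the earlier config, rather than typing it by hand, helps keep -mid aligned with the outdir rather than with the import folder itself.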

Parameters

| argument | purpose | required/optional |
| --- | --- | --- |
| -cf | Path to configuration file | required |
| -cl | Path to sample clinical information file | optional |
| --append | Flag to indicate that data will be merged with data from a previous import folder | optional |
| -mid | Path to the output directory (outdir in the config) of a previous import folder for which data will be merged | optional |