immpload converts input data files into files formatted from Immport upload templates.
- Python3 with the pip package installer
-
Install the
immpload
Python package and executable:pip install immpload
The simplest case copies input columns whose name matches the corresponding output Immport template column:
$ immpload subjectAnimals /path/to/input/subjects.txt
which will create the Immport upload file subjectAnimals.txt
in the current directory.
To place the output in a different directory, use the -o
or
--outDir
option:
$ immpload -o /path/to/output subjectAnimals /path/to/input/subjects.xslx
Note that the input can be either a .xslx
Excel spreadsheet
or a tab-delimited text file.
The command:
$ immpload --help
shows all immpload
arguments and options.
It is often useful to specify the conversion mapping in a YAML configuration file. For example, the following configuration:
columns:
Subject ID: ID
Arm Or Cohort ID: Cohort
converts the ID
and Cohort
input values to Subject ID
and
Arm Or Cohort ID
output values, resp. The command is invoked
with the -c
or --config
option, e.g:
$ immpload -o /path/to/output --config /path/to/conf/subjects.yaml \
subjectAnimals /path/to/input/subjects.xslx
The configuration can include value mappings, e.g.:
values:
Species: Mus musculus
sets the output Species
to Mus musculus
for all rows.
The configuration:
columns:
Gender: Sex
values:
Gender:
n/a: Not Specified
transforms the input Sex
value n/a
to the output Gender
value
Not Specified
. Other input values are copied without change.
immpload
can flatten each input row into several output rows based
on matching input column names against a pattern. For example, the
configuration:
columns:
Subject ID: ID
Arm Or Cohort ID: Cohort
Study Day: day
patterns:
Result Value Reported: D(?P<day>\d+)$
converts an input row with columns D1
, D2
and D3
into three
output rows with column Study Day
values 1
, 2
and 3
and Result Value Reported
values given by the D1
, D2
and D3
input values, resp.
Immport upload data can be derived solely from fields embedded in column names. For example, the configuration:
columns:
Analyte Reported: analyte
patterns:
Analyte Reported: (?P<subject>.+)_(?P<day>.+)_(?P<analyte>.+)$
matches the input column names against the given pattern and
writes one output row per matching column with the Analyte Reported
column set to the embedded analyte match value. In this case,
no other input rows are read besides the first header row of column
names. Note that Analyte Reported
is assigned the match value
rather than the matching column value.
immpload
supplies certain required output columns with a reasonable
default, as follows:
-
Animal Subjects (
subjectAnimals.txt
)Age Unit
-Days
Age Event
-Age at infection
-
Experiment Samples (
experimentSamples.*.txt
)Experiment ID
- lower-case, underscoredExperiment Name
Biosample ID
-Expsample ID
, if present, otherwise the lower-case, underscoredBiosample Name
, if present, otherwise derived from theSubject ID
,Treatment ID
andExperiment ID
Expsample ID
-Biosample ID
(defaulted, if necessary)
-
Treatments (
treatments.txt
)Name
- derived from the values and unitsUser Defined ID
- lower-case, underscoredName
Use Treatment?
- default isYes
-
Assessments (
assessments.txt
)Planned Visit ID
-Study ID
followed byd
and theStudy Day
Panel Name Reported
- copied from theAssessment Type
Assessment Panel ID
- derived from thePanel Name Reported
User Defined ID
- derived from theSubject ID
,Planned Visit ID
andComponent Name Reported
The default is set if and only if the mapped column value is missing.
Defaults are disabled with the --no-defaults
option, e.g.:
$ immpload -o /path/to/output --config /path/to/conf/subjects.yaml \
--no-defaults subjectAnimals /path/to/input/subjects.xslx
This is useful when submitting an update to an existing upload.
By default, imppload
checks the output for required fields. If a
required field is missing, then an error message is displayed and
processing is halted.
Validation is disabled with the --no-validate
option, e.g.:
$ immpload -o /path/to/output --config /path/to/conf/subjects.yaml \
--no-validate subjectAnimals /path/to/input/subjects.xslx
As with no-defaults
, no-validate
is useful when submitting an
update to an existing upload.
For advanced usage, the immpload
Python module can be used directly
in a Python script with a callback function, e.g.:
from immpload import munger
def add_results(in_row, in_col_ndx_map, out_col_ndx_map, out_row):
"""
Modifies the output row after the configuration-based conversion.
:param: in_row: the input data row
:param: in_col_ndx_map: the input {column: index} dictionary
:param: out_col_ndx_map: the output {column: index} dictionary
:param: out_row :the output row
:return: a list of rows derived from the given output row
"""
###
### Modify out_row or create new output rows here...
###
# Return an array of rows.
return [out_row]
# Convert the input file.
munger.munge('assessments', /path/to/input.xslx, callback=add_results)
The munger.munge
method signature is as follows:
def munge(template, *in_files, config=None, out_dir=None,
sheet=None, input_filter=None, callback=None, **kwargs):
"""
Builds the Immport upload file for the given input file.
The template is a supported Immport template name, e.g.
`assessments`. The output is the Immport upload file,
e.g. `assessments,txt`, placed in the output directory.
The keyword arguments (_kwargs_) are static output
_column_`=`_value_ definitions that are applied to every
output row. The column name can be underscored, e.g.
`Study_ID`.
Output validation is disabled by default, but recommended
for new uploads. Enable validation by setting the _validate_
flag parameter to `True`.
:param template: the required Immport template name
:param in_files: the input file(s) to munge
:param config: the configuration dictionary or file name
of list of file names
:param out_dir: the target location (default current directory)
:param sheet: for an Excel workbook input file, the sheet to open
:param input_filter: optional input row validator which has
parameter in_row and returns whether the row is valid
:param callback: optional callback with parameters
in_row, in_col_ndx_map, out_col_ndx_map and out_row returning
an array of rows to write to the output file
:param defaults_opt: flag indicating whether to add defaults to the
output (default `True`)
:param validate_opt: flag indicating whether to validate the
output for required fields (default `True`)
:param append_opt: append rather than overwrite an existing output
file (default False)
:param kwargs: the optional static _column_`=`_value_ definitions
:return: the output file name
"""