Qiime2 is one of the most popular software packages for analyzing the output of metabarcoding experiments, and it introduced a unique data format to bioinformatics: the “Qiime2 artifact”.
Qiime2 artifacts are structured compressed archives containing a dataset (e.g., FASTQ reads, representative sequences in FASTA format, a phylogenetic tree in Newick format) and an exhaustive set of metadata (including the command that generated the artifact, information on the execution environment, citations for the software used, and the metadata of all the artifacts used to produce it).
While artifacts can improve the shareability and reproducibility of Qiime2 workflows, they are harder to integrate with general bioinformatics pipelines, and even accessing the metadata in an artifact requires a full Qiime2 installation (not to mention that every release of Qiime2 will produce incompatible artifacts). Qiime Artifact eXtractor (qax) makes it easy to interface with Qiime2 artifacts from the command line, without requiring a full Qiime2 environment.
If you use this tool, please cite:
Telatin A (2021) Qiime Artifact eXtractor (qax): A Fast and Versatile Tool to Interact with Qiime2 Archives. BioTech 10: 5. Available: [doi.org/10.3390/biotech10010005](http://dx.doi.org/10.3390/biotech10010005)
Pre-compiled binaries are the fastest and easiest way to get qax. To get the latest version, use the commands below; otherwise, check the stable releases.
# From Linux
wget "https://github.com/telatin/qax/raw/main/bin/qax"
chmod +x qax
# From macOS
wget -O qax "https://github.com/telatin/qax/raw/main/bin/qax_mac"
chmod +x qax
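The two downloads above differ only in the name of the remote binary, so the choice can be scripted. The following is a minimal sketch, assuming `uname -s` reports `Linux` or `Darwin`; the helper name `qax_url` is hypothetical and not part of qax, while the URLs are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of qax): print the download URL for an OS name.
# The OS name defaults to the output of `uname -s`.
qax_url() {
    os="${1:-$(uname -s)}"
    base="https://github.com/telatin/qax/raw/main/bin"
    case "$os" in
        Linux)  echo "$base/qax" ;;
        Darwin) echo "$base/qax_mac" ;;
        *)      echo "Unsupported OS: $os" >&2; return 1 ;;
    esac
}

# Usage (uncomment to download and make the binary executable):
# wget -O qax "$(qax_url)" && chmod +x qax
```

On any other operating system the helper prints an error and returns a non-zero status, so nothing is downloaded by mistake.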
Alternatively, you can install qax from BioConda, if you have conda installed:
conda install -c conda-forge -c bioconda qax
qax has the following subprograms (general syntax: qax [program] [program-arguments]):
- list (default): list artifact(s) properties
- citations: extract citations in BibTeX format
- extract: extract artifact data files
- provenance: describe artifact provenance, or generate its graph
- view: print the content of an artifact (e.g., dna-sequences.fasta) to the terminal
list is the default module and can be used to list the properties of one or more artifacts.
Some features:
- Supports multiple files at once
- 100X faster than Qiime2
- Can be used to find an artifact given the ID
Example:
qax -b -u input/*.*
┌───────────────────────────┬────────────────┬─────────────────────────┬─────────────────────────────┐
│ ID │ Basename │ Type │ Format │
├───────────────────────────┼────────────────┼─────────────────────────┼─────────────────────────────┤
│ bb1b2e93-...-2afa2110b5fb │ rep-seqs.qza │ FeatureData[Sequence] │ DNASequencesDirectoryFormat │
│ 313a0cf3-...-befad4ebf2f3 │ table.qza │ FeatureTable[Frequency] │ BIOMV210DirFmt │
│ 35c32fe7-...-85ef27545f00 │ taxonomy.qzv │ Visualization │ HTML │
└───────────────────────────┴────────────────┴─────────────────────────┴─────────────────────────────┘
The extract program extracts the content of an artifact. By default, if the artifact contains a single file, that file is extracted to the specified path. If it contains multiple files, a directory containing them is created instead.
Example:
# Extract representative sequences (will be called rep-seqs.fasta)
qax x -o ./ rep-seqs.qza
# Extract a visualization (a folder called "taxonomy" will be created)
qax x -o ./ taxonomy.qzv
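When a whole analysis folder has to be unpacked, the same command can be wrapped in a loop. This is a minimal sketch that uses only the `x` subprogram and the `-o` option shown above; the helper name `extract_all` and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of qax): extract every .qza/.qzv in a directory.
# Usage: extract_all INPUT_DIR OUTPUT_DIR
extract_all() {
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    for artifact in "$in_dir"/*.qza "$in_dir"/*.qzv; do
        # An unmatched glob stays literal in POSIX sh, so skip missing paths
        [ -e "$artifact" ] || continue
        # Report a failed extraction but keep going with the other artifacts
        qax x -o "$out_dir" "$artifact" || echo "Failed: $artifact" >&2
    done
}

# Usage:
# extract_all ./artifacts ./extracted
```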
Each Qiime2 module provides the citations for the software and resources it uses, storing them in BibTeX format inside the artifacts. The citations module extracts all the citations from a list of artifacts and removes the duplicates, making it possible to prepare the bibliography for a complete Qiime2 analysis.
Example:
qax c files/*.qza > bibliography.bib
The provenance program prints the provenance of an artifact, or produces a publication-grade graph of it.
Example:
# To view a summary
qax p taxonomy.qzv
# To save the plot
qax p -o graph.dot taxonomy.qzv
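The `.dot` extension suggests that the saved graph is in Graphviz DOT format. Assuming that is the case and that Graphviz is installed, the file can be rendered to an image with `dot`. The helper name `provenance_svg` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of qax): save the provenance graph and render it.
# Assumes the file written by `qax p -o` is Graphviz DOT and `dot` is installed.
# Usage: provenance_svg ARTIFACT OUTPUT_PREFIX
provenance_svg() {
    artifact="$1"
    prefix="$2"
    # Write the graph description, stopping if qax fails
    qax p -o "$prefix.dot" "$artifact" || return 1
    # Render the graph description to SVG with Graphviz
    dot -Tsvg "$prefix.dot" -o "$prefix.svg"
}

# Usage:
# provenance_svg taxonomy.qzv graph
```

SVG is used here because it scales without loss; other Graphviz output formats can be selected by changing the `-T` option.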
The view program prints the content of an artifact data file to the terminal. If the artifact contains a single file, that file is printed. Otherwise, the user can specify one or more files to print; if none is specified, the list of files is printed instead.
# Example: count the number of representative sequences
qax view rep-seqs.qza | grep -c '>'
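The count above matches any line containing `>`. A slightly stricter variant, sketched here with a hypothetical helper name, anchors the pattern to the start of the line so that only FASTA header lines are counted, and returns success even when the input has no sequences.

```shell
#!/bin/sh
# Hypothetical helper (not part of qax): count FASTA records read from stdin.
# '^>' matches only lines that start with '>', i.e. FASTA headers.
count_fasta_records() {
    # grep -c prints 0 but exits with status 1 when nothing matches;
    # `|| true` keeps the helper usable under `set -e`.
    grep -c '^>' || true
}

# Usage:
# qax view rep-seqs.qza | count_fasta_records
```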
The make program creates a visualization artifact from a folder containing a website (index.html must be present):
qax make -o report.qza /path/to/report_dir/