Software Overview
In this section, we look at how the end user can interact with CATE and the different functions currently available.
CATE now comes equipped with its viral simulation tool, called APOLLO. The documentation on Apollo is provided here.
Each function will be explained further in its separate section in How to use.
Users interact with CATE via the command line. They enter the function they require together with CATE’s proprietary properties file and, where needed, a gene file containing the coordinates of the query regions to be analyzed. Each function's section lists and explains the files and prerequisites that may be needed before execution.
Broken down, CATE has seven steps of execution, from preparation to the final retrieval of results. These steps are as follows:
- Selection of the evolutionary test function to be executed.
- Selection of the calculation mode (Window/Gene mode).
- Preparation of any other external files that are needed, such as a gene file. This depends on the function being executed.
- Preparation of the parameters.json file.
- Execution of CATE via the Command Line Interface (CLI).
  - The user will pass the argument for the function and,
  - The location of the parameters.json file.
- While CATE is running, it prints a series of messages updating the user on its progress in real time.
- Finally, the user is provided with a tab-delimited file of the results.
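The last steps can be scripted. The following is a minimal Python sketch, not part of CATE itself: `build_cate_command` and `read_results` are hypothetical helper names, the order of the two command-line arguments is an assumption, and the results file is assumed to start with a header row. The actual function arguments are listed in each function's section.

```python
import csv


def build_cate_command(executable, function_arg, parameters_path):
    """Assemble the argument list for one CATE run.

    All three values are supplied by the caller. The ordering used here
    (function argument first, then the parameter file) is an assumption.
    """
    return [executable, function_arg, parameters_path]


def read_results(path):
    """Load a tab-delimited results file into a list of dictionaries.

    The first row is assumed to be a header naming the columns.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

The list returned by `build_cate_command` can be handed to Python's `subprocess.run`, and `read_results` can then be pointed at the output file once the run has finished.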
CATE has a total of 17 functions. Eight complementary tools are designed for genomic data processing and augmentation, seven functions are dedicated to six different evolutionary tests, and two are helper functions.
- VCF splitter
  - Splits the parent VCF file into CATE’s indexed format. Requires a population file to separate the VCF by populations.
- FASTA splitter
  - Can be used to extract a single user-specified FASTA sequence from a merged FASTA file or to extract all sequences separately.
- FASTA merger
  - Merges multiple FASTA files in a folder into a single FASTA file.
- Gene extractor
  - Used to extract the FASTA sequences of target regions specified through CATE’s gene file. Requires a reference genome in FASTA format.
- GFF to Gene file converter
  - Used to extract the gene regions from a GFF file. It organizes them into CATE’s gene file format.
- Haplotype extractor
  - Identifies unique haplotypes for different regions and reconstructs the FASTA sequences for each haplotype. Can be used to reconstruct the entire sequence population as well.
- MAP File to Gene File converter
  - Used to extract the SNPs from a MAP file. It organizes them into CATE’s gene file format for EHH SNP/BP mode.
- Print sample parameter file
  - Automatically generates a sample parameter file for the user.
- Tajima’s D statistics test
  - Calculates Tajima's D statistic (1989).
- Fu and Li statistics test
  - Calculates Fu and Li's D, D*, F and F* statistics (1993).
  - The F* statistic's vf* and uf* are calculated based on the corrected equations in Simonsen et al. (1995).
- Fay and Wu statistics test
  - Calculates Fay and Wu's normalized H and E statistics (2006).
- Neutrality tests
  - Calculates the above three neutrality tests (Tajima's, Fu and Li's, and Fay and Wu's) at once.
- McDonald–Kreitman neutrality index
  - Calculates the McDonald–Kreitman Neutrality Index (NI) (1991).
- Fixation index
  - Calculates the Fixation Index (Fst) (1965).
- Extended Haplotype Homozygosity
  - Calculates the Extended Haplotype Homozygosity (EHH) (2002).
- CUDA device list
  - Lists all available CUDA devices. The user can then use the CUDA ID to configure CATE.
- Help menu
  - Prints the general help menu built into CATE.
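To give a sense of what the evolutionary tests listed above compute, here is a small CPU-only Python sketch of Tajima's D using the standard formula from Tajima (1989). It is an illustration only and does not reflect how CATE implements the test on the GPU. The function names `segregating_sites_and_pi` and `tajimas_d` are hypothetical, and the input is assumed to be a list of equal-length haplotype strings without missing data.

```python
from itertools import combinations
from math import sqrt


def segregating_sites_and_pi(haplotypes):
    """Return (S, pi) for a list of equal-length haplotype strings."""
    if len(haplotypes) < 2:
        raise ValueError("at least two haplotypes are required")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    # S: number of alignment columns carrying more than one allele.
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    # pi: mean number of differences over all unordered pairs.
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return s, total / len(pairs)


def tajimas_d(n, s, pi):
    """Tajima's D for n sequences, s segregating sites, mean pairwise diff pi.

    Returns None when D is undefined: no segregating sites, or n < 4,
    where the variance term is zero.
    """
    if n < 4 or s == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

For example, `tajimas_d(10, 16, 3.888889)` evaluates to approximately -1.446. A negative value means the observed mean pairwise difference is lower than expected from the number of segregating sites, which points to an excess of rare variants.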