This is a small utility that extracts records from a FASTA file based on names provided in a list file. It depends only on a compiler that supports C++14 and on CMake version 3.21 or later.
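Before building, it can be worth confirming that the installed CMake meets the 3.21 requirement. The snippet below is a minimal sketch, not part of this project: `version_at_least` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD versions do).

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Relies on `sort -V` (version sort) to order the two strings.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` has the form "cmake version X.Y.Z".
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_at_least "$found" 3.21; then
        echo "cmake $found is recent enough"
    else
        echo "cmake $found is older than 3.21"
    fi
else
    echo "cmake not found on PATH"
fi
```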
Clone the repository
git clone https://github.com/tonymugen/subsetfa
Next, create a build directory
cd subsetfa
mkdir build
Run cmake
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
cmake --install .
Installation may require root privileges.
Optionally, one can also build the unit tests. These require Catch2, although its installation is taken care of by cmake. To build the tests, create a build-Tests directory, say, and run
cd build-Tests
cmake -DCMAKE_BUILD_TYPE=Test -DBUILD_TESTS=ON ..
cmake --build .
To run the tests from the build directory, simply run
./tests
The binary is subsetfa. It requires a multi-sequence FASTA file and a list of FASTA headers for the sequences to be extracted. Headers must match those in the target FASTA file exactly; those that do not match anything are ignored. The name of the output FASTA file can also be provided; if it is not, the default name subset.fasta is used. The order of records in the output is not necessarily the same as in the original file. Running subsetfa without any arguments prints the command line flag syntax information.
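Because list entries that match nothing are ignored, it may help to check a list against the FASTA file before extracting. The following is a rough sketch using only standard grep and sed, not subsetfa itself. The file names and contents are made up for illustration, and it assumes the list holds one header per line without the leading `>` and that an exact match means the whole header line; neither assumption is confirmed here.

```shell
# Hypothetical sample inputs in a scratch directory.
workdir=$(mktemp -d)

cat > "$workdir/sample.fasta" <<'EOF'
>seq1 Homo sapiens
ACGTACGT
>seq2
GGGTTTAA
>seq3
TTTTCCCC
EOF

# Assumed list format: one header per line, no leading '>'.
cat > "$workdir/names.txt" <<'EOF'
seq1
seq3
seq9
EOF

# Collect every header line from the FASTA file, dropping the leading '>'.
grep '^>' "$workdir/sample.fasta" | sed 's/^>//' > "$workdir/headers.txt"

# -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file.
# Prints the list entries that are not identical to any complete header.
unmatched=$(grep -Fxvf "$workdir/headers.txt" "$workdir/names.txt")
echo "$unmatched"

rm -rf "$workdir"
```

Under these assumptions, `seq1` is reported alongside `seq9`, because the header in the file is `seq1 Homo sapiens` rather than `seq1` alone.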