Patrick Durand edited this page Oct 17, 2017 · 3 revisions

The user manual

BLAST Filter Tool (BFT) is a very easy-to-use command line tool. Its general use is as follows:

usage: java -jar blast-filter-tool-X.Y.jar <args>
 -d <arg>   Directory with BLAST result files, legacy XML formated
            [mandatory]. Exclusive with '-i'.
 -e         launch the Filter Editor
 -f <arg>   filter file [mandatory]
 -h         print this message
 -i <arg>   Single BLAST result file, legacy XML formated [mandatory].
            Exclusive with '-d'.
 -n         verbose mode off; default in on
 -o <arg>   result file
 -p <arg>   Pattern to locate BLAST files in directory. Default is '*.*'.
            Use with '-d'.

So, first of all, you need two "things" to use BFT:

  • a filter
  • a BLAST result file (XML format)

What is a filter?

A filter is made of one or more rules, each of them being a constraint applied to the data contained in a BLAST result file.

For instance, a BLAST file provides scores, e-values, hit definitions, alignment lengths, etc. Using BFT, you can create specific constraints on these values to retain only the hits that satisfy them.

What is a BLAST XML result file?

BFT can read legacy NCBI XML files. This is the result file you get when using the following argument of BLAST+:

-outfmt 5 

For those of you that are still using the legacy BLAST (i.e. blastall), use this argument:

-m 7

And for those of you that are using PLAST, use this argument:

-m 4
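
Before handing a file to BFT, it can help to check that it really is in the legacy XML format. The sketch below is a heuristic only: it assumes the file declares a BlastOutput root element within its first lines, as legacy NCBI BLAST XML does. The function name is_legacy_blast_xml is hypothetical and is not part of BFT.

```shell
#!/bin/sh
# Heuristic check (not part of BFT): legacy NCBI BLAST XML opens with a
# <BlastOutput> root element within the first few lines of the file.
is_legacy_blast_xml() {
    head -n 20 "$1" | grep -q '<BlastOutput>'
}

# Usage:
#   is_legacy_blast_xml blastp.xml && echo "looks like legacy BLAST XML"
```

A file that fails this check was most likely produced with another output format and needs to be regenerated with the argument listed above.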

How to create a filter?

Use the filter editor provided with this software. Start it as follows:

  java -jar blast-filter-tool-5.0.0.jar -e

The first time you start the Filter Editor, you'll notice that it provides some sample filters, such as follows:

From the FilterManager, you can adapt them or create new ones to meet your requirements. For instance, here is the FilterEditor editing the filter selected on the previous snapshot:

It is worth noting that all your filters are automatically saved by the software, so that you never lose them (storage path is: <your_home_directory>/.bft_filter).

When done, look at the bottom of the FilterManager frame: you'll see the path to the file containing the filter. For instance, at the bottom of the FilterManager frame shown above, you can see: '-f /Users/pdurand/.bft_filter/filter9.xml'; this is the argument line fragment to use on the command line to apply that filter to your BLAST results. Let's see how to do that for real...

How to filter a result?

In the previous step, you created a filter that is stored in a file. Now, you use that filter file as follows:

 java -jar blast-filter-tool-5.0.0.jar -i <blast_xml_file> -f <filter_file>

 with:
    <blast_xml_file>: path to the BLAST XML result file
    <filter_file>   : path to the filter file
    (it is usually a good idea to use absolute paths).

In this example, BFT stores the result in a new file named after <blast_xml_file> and suffixed with '_filtered'. If you want to create a file with a name of your choice, simply add the following argument to the previous command:

  ...   -o <filtered_result_file>

Notice: when a filter does not retain any hits, no result file is created.
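
Because no result file is created when nothing passes the filter, a script that calls BFT should test for the file before using it. Here is a minimal sketch built on the command line documented above. The function filter_one and the BFT_JAR variable are hypothetical names introduced for illustration, and the sketch cannot tell an empty result apart from a failed run.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented command line (not shipped with BFT).
# BFT_JAR is an assumed variable holding the path to the jar file.
filter_one() {
    in_file="$1"; filter_file="$2"; out_file="$3"
    # Remove any stale result so it cannot be mistaken for a new one.
    rm -f "$out_file"
    java -jar "${BFT_JAR:-blast-filter-tool-5.0.0.jar}" \
        -i "$in_file" -f "$filter_file" -o "$out_file" -n
    if [ -f "$out_file" ]; then
        echo "filtered result written to $out_file"
    else
        echo "no result file for $in_file (no hits retained, or the run failed)"
    fi
}

# Usage:
#   filter_one blastp.xml filter1.xml out.xml
```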

How to filter several results?

Of course, you could wrap the single-file procedure described above in a shell script when you have to process several BLAST results.

However, there is no need for such a script: BFT comes with specific arguments to deal with multiple BLAST results processing, as follows:

java -jar ... -d <blast_directory> -f <filter_file>

with:
   <blast_directory>: path to a directory containing BLAST result files
   <filter_file>    : path to the filter file
   (it is usually a good idea to use absolute paths).

By default, that command will process ALL files contained in the provided directory. If you want to only process particular files that can be identified using a regular expression, use this:

java -jar ... -d <blast_directory> -p "<reg_exp>" -f <filter_file>

with:
   <blast_directory>: path to a directory containing BLAST result files
   <reg_exp>        : regular expression between double quotes
   <filter_file>    : path to the filter file
   (it is usually a good idea to use absolute paths).

Example:

java -jar ... -d my_results -p "blast*.xml" ...

will process only the files matching 'blast*.xml' in directory 'my_results'.
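
The double quotes around the pattern matter: without them, the shell may expand the pattern itself before BFT ever sees it. The following sketch only illustrates that shell behaviour and does not call BFT. The helper count_args and the file names are made up for the demonstration.

```shell
#!/bin/sh
# Shows why the pattern is quoted: an unquoted pattern is expanded by the
# shell into the matching file names before the program receives it.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)
touch "$demo_dir/blast1.xml" "$demo_dir/blast2.xml"
cd "$demo_dir" || exit 1

unquoted=$(count_args -p blast*.xml)    # expands to: -p blast1.xml blast2.xml
quoted=$(count_args -p "blast*.xml")    # the pattern stays a single argument

echo "unquoted: $unquoted arguments, quoted: $quoted arguments"
# prints "unquoted: 3 arguments, quoted: 2 arguments"
```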

All filtered results are saved in a file having a name made of the original BLAST file name suffixed with '_filtered'. When using '-d', you cannot use option '-o'.

How to turn off verbose mode?

By default, the tool tells you what it does. Use this argument to turn off verbose mode:

  ...   -n

Make a test!

So, let's give it a try with the test data.

First, put the 'blast-filter-tool-5.0.0.jar' file into a directory. Then put in the same directory the files 'blastp.xml' and 'filter1.xml' available in the 'test' directory of this project.

Now, start a job:

java -jar blast-filter-tool-5.0.0.jar -i blastp.xml -f filter1.xml -o out.xml

Start filtering:
  read filter: filter1.xml
    filter is: HSP E-Value < 0.001
  read input file: blastp.xml
    content: 1 iteration ; 19 hits ; 20 HSPs
  filtering done
    content: 1 iteration ; 15 hits ; 16 HSPs
    writing file: out.xml

And you'll have a file called 'out.xml': the results of the filtering of 'blastp.xml'.
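
To cross-check the log against the file that was written, you can count the hit and HSP elements in 'out.xml'. This is a sketch under the assumption that the filtered file keeps the legacy layout, with one <Hit> element per hit and one <Hsp> element per HSP. The helper name count_tag is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the occurrences of an opening tag in a file.
# grep -o prints each match on its own line, so wc -l counts matches even
# if several tags share a line; tr strips the padding some wc versions add.
count_tag() {
    grep -o "<$2>" "$1" | wc -l | tr -d ' '
}

# Usage after the run above:
#   count_tag out.xml Hit
#   count_tag out.xml Hsp
```

For the run shown above, the log reports 15 hits and 16 HSPs after filtering, so the two counts should agree with those numbers if the assumption holds.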

Memory (RAM) issues

BLAST XML files can be very large. You may therefore have to ask the Java Virtual Machine to use more memory, using the well-known JVM arguments -Xmx and -Xms.

For instance, the following command starts BFT with an initial heap of 256 MB and allows the process to use up to 2 GB of memory:

java -Xms256m -Xmx2G -jar blast-filter-tool-5.0.0.jar
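
The JVM arguments go before '-jar', and the usual BFT arguments follow the jar name. As a minimal sketch, a small wrapper can keep the two groups apart. The function bft_with_heap and the BFT_JAR variable are hypothetical names, not something BFT provides.

```shell
#!/bin/sh
# Hypothetical wrapper: the first two arguments set the initial and maximum
# heap sizes, everything after them is passed unchanged to BFT.
# BFT_JAR is an assumed variable holding the path to the jar file.
bft_with_heap() {
    min_heap="$1"; max_heap="$2"
    shift 2
    java "-Xms$min_heap" "-Xmx$max_heap" \
        -jar "${BFT_JAR:-blast-filter-tool-5.0.0.jar}" "$@"
}

# Usage:
#   bft_with_heap 256m 2G -i blastp.xml -f filter1.xml -o out.xml
```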