Option to turn OFF the filtering of low complexity regions #5

Open
ulad-litvin opened this issue Nov 17, 2023 · 6 comments

Comments

@ulad-litvin

Good afternoon,

I am wondering whether it would be possible to turn OFF the filtering of low-complexity regions in DIGS, or to make the filtering optional (e.g. in the control file). I am searching for a protein with a low-complexity region, and unfortunately this region cannot be found unless the filtering is switched off. I tested this using DIGS, NCBI tblastn, and Ensembl tblastn. NCBI and Ensembl find the exon corresponding to this region only when I deselect the "Filter low complexity region" option, while DIGS does not find the region at all. I hope this makes sense.

Thanks a lot in advance!

@robjgiff
Member

I think this should be possible. I will need to add an option to the command file to turn off filtering. If I understand correctly, this would mean supplying the following options to BLAST when running the forward BLAST search:

-dust no -soft_masking false

If you have any thoughts on this, please let me know. I will do my best to implement it this week.

@ulad-litvin
Author

I looked through the help pages for the different BLAST programs, and it looks like the -dust filtering option exists only for blastn. The other programs have a -seg filtering option instead. According to the BLAST manual, segmasker is used to mask low-complexity regions of protein sequences, while dustmasker does a similar thing for nucleotide sequences. I don't know for sure, but since the forward BLAST search uses tblastn, I think we need to change only one thing:

-seg no
# -soft_masking is false by default in tblastn

Let me know what you think.

@robjgiff
Member

Thanks, this is helpful. It is also possible to perform DIGS using blastn, so it looks as though I will need to implement this slightly differently depending on which BLAST program is being used.
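
A minimal sketch of that per-program branching, written in illustrative Python and not taken from the DIGS code. The names `MASKING_OFF`, `masking_flags` and `build_command` are hypothetical; the flags themselves are the BLAST+ options discussed above (`-dust` and `-soft_masking` for blastn, `-seg` for tblastn).

```python
# Hypothetical lookup: arguments that switch off low-complexity filtering.
# blastn masks nucleotide queries with DUST; tblastn masks protein queries with SEG.
MASKING_OFF = {
    "blastn": ["-dust", "no", "-soft_masking", "false"],
    "tblastn": ["-seg", "no"],
}


def masking_flags(program, filter_low_complexity):
    """Return the extra command-line arguments for one BLAST program."""
    if filter_low_complexity:
        return []  # leave the program's default filtering untouched
    if program not in MASKING_OFF:
        raise ValueError("unsupported BLAST program: " + program)
    return list(MASKING_OFF[program])


def build_command(program, query, db, filter_low_complexity=True):
    """Assemble an argument list suitable for subprocess.run()."""
    base = [program, "-query", query, "-db", db]
    return base + masking_flags(program, filter_low_complexity)
```

Keeping the flags in one table means a further program (e.g. blastx, if it were needed) would be a one-line addition, assuming its filter option is looked up in that program's own help page first.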

@robjgiff
Member

robjgiff commented Jan 23, 2024

I've updated the DIGS code so that BLAST arguments can now be provided in the .ctl file, in the SCREENSETS block, e.g.

BEGIN SCREENSETS;
        query_na_fasta=/home2/rg128p/DIGS/projects/erv/fasta/herv/HERV_LTR-probes.fna;
        reference_na_fasta=/home2/rg128p/DIGS/projects/erv/fasta/herv/HERV_LTR-references.fna;
        consolidated_reference_na_fasta=/home2/giff01r/DIGS/projects/erv/fasta/herv/HERV_LTR-references.fna;
        output_path=./tmp/;
        blast_bin_path='';
        bitscore_min_blastn=30;
        seq_length_minimum=50;
        defragment_range=50;
        consolidate_range=200;
        num_threads=8;
        dust=no;
        softmasking=false;
ENDBLOCK;
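
For anyone scripting around control files, here is a rough Python sketch of reading a block like this into a dictionary, e.g. to check which masking options are set before a run. The function `parse_screensets` is hypothetical and assumes only the `key=value;` layout shown above; it is not the parser DIGS uses.

```python
def parse_screensets(text):
    """Collect key=value pairs between 'BEGIN SCREENSETS;' and 'ENDBLOCK;'."""
    params = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        marker = line.upper().rstrip(";").strip()
        if marker == "BEGIN SCREENSETS":
            inside = True
            continue
        if marker == "ENDBLOCK":
            inside = False
            continue
        if not inside or "=" not in line:
            continue  # skip blank lines and anything outside the block
        key, _, value = line.partition("=")
        # Drop the trailing semicolon and any surrounding quotes.
        params[key.strip()] = value.strip().rstrip(";").strip().strip("'\"")
    return params
```

All values come back as strings, so `num_threads` would still need converting to an integer by the caller.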

@ulad-litvin
Author

Hi Rob, I've been testing DIGS for the last couple of weeks. I've run it with the same settings as before, but this time turning off the pre-filtering.

BEGIN SCREENSETS;
    query_aa_fasta=/home2/2820395l/documents/digs_shisa5_test_4/digs_shisa5_probes_v1.fa;
    reference_aa_fasta=/home2/2820395l/documents/digs_shisa5_test_4/digs_shisa5_references_v1.fa;
    output_path=/home2/2820395l/documents/digs_shisa5_test_4/digs_shisa5_mammals_test_4_results/;
    bitscore_min_tblastn=41;
    seq_length_minimum=70;
    defragment_range=10;
    num_threads=2;
    seg=no;
    dust=no;
    softmasking=false;
ENDBLOCK;

This time I managed to find the missing Pro-rich protein regions. However, the problem now is that they are labelled incorrectly (assigned_gene_name). I'm wondering whether this has something to do with pre-filtering in the reverse BLAST run. When I turn off the pre-filtering, does that apply only to the forward BLAST run, or to the reverse run as well? For instance, I've noticed that tblastn now runs with the -seg no option (as it should when pre-filtering is off). Thanks a lot!

@robjgiff
Member

OK thanks, this is good to know. I'll take a look and get back to you.
