Problem while running ./PROmiRNA for mouse miRNA TSS prediction: string_base.h:448 Assertion failed #1

Open
YucanChen opened this issue Oct 8, 2020 · 6 comments

Comments

@YucanChen

Hello, I want to use PROmiRNA for mouse miRNA TSS prediction. While running the program, I ran into the problem shown below. I wonder whether it is caused by using the wrong -s file (gene start region GFF), since I have not found a detailed description of this file in the README. I tried "Mus_musculus.GRCm38.101.gff3" and "mus_musculus.GRCm38.Regulatory_Build.regulatory_features.20180516.gff"; both failed.

The error output is listed below:

$./PROmiRNA -g ../external_data/mm10.fa -c ../external_data/Mus_musculus.GRCm38.101.gtf -s ../external_data/Mus_musculus.GRCm38.101.gff -r ../external_data/mm10_repeats.bed -a ../external_data/mmu.gff3 -m ../external_data/mirna.txt -n ../external_data/mirna_context.txt -p ../external_data/TATA_box_jaspar.psem -w ../external_data/mm10.60way.phastCons.wig -i ../external_data/bed_files/ -t 16

Starting miRNA promoter prediction
Number of miRNAs for analysis: 1226
Number of overlaps between tags and miRNAs:
../external_data/bed_files/mm10_fair_new_CAGE_peaks_phase1and2.bed 9312
Unique regions TSS: 9312
/mnt/c/Users/Administrator/Documents/GitHub/PROmiRNA/seqan/include/seqan/sequence/string_base.h:448 Assertion failed : static_cast(pos) < static_cast(length(me)) was: 1450975284 >= 7 (Trying to access an element behind the last one!)

stack trace:
0 [0x7f6a56feefaf] ./PROmiRNA(+0x3bfaf)
1 [0x7f6a5700ce08] ./PROmiRNA(+0x59e08)
2 [0x7f6a5703a5ef] mergeAndFilter(std::vector<DatasetRecord, std::allocator >&, MatrixPair const&, MatrixSingle const&, std::vector<std::map<std::pair<seqan::String<char, seqan::Alloc >, GenomicLocation>, std::pair<unsigned int, bool>, std::less<std::pair<seqan::String<char, seqan::Alloc >, GenomicLocation> >, std::allocator<std::pair<std::pair<seqan::String<char, seqan::Alloc >, GenomicLocation> const, std::pair<unsigned int, bool> > > >, std::allocator<std::map<std::pair<seqan::String<char, seqan::Alloc >, GenomicLocation>, std::pair<unsigned int, bool>, std::less<std::pair<seqan::String<char, seqan::Alloc >, GenomicLocation> >, std::allocator<std::pair<std::pair<seqan::String<char, seqan::Alloc >, GenomicLocation> const, std::pair<unsigned int, bool> > > > > > const&, std::map<std::pair<seqan::String<char, seqan::Alloc >, GenomicLocation>, unsigned int, std::less<std::pair<seqan::String<char, seqan::Alloc >, GenomicLocation> >, std::allocator<std::pair<std::pair<seqan::String<char, seqan::Alloc >, GenomicLocation> const, unsigned int> > > const&, std::map<seqan::String<char, seqan::Alloc >, unsigned int, std::less<seqan::String<char, seqan::Alloc > >, std::allocator<std::pair<seqan::String<char, seqan::Alloc > const, unsigned int> > >&, seqan::String<char, seqan::Alloc > const&, seqan::String<char, seqan::Alloc > const&, std::map<seqan::String<char, seqan::Alloc >, seqan::String<char, seqan::Alloc >, std::less<seqan::String<char, seqan::Alloc > >, std::allocator<std::pair<seqan::String<char, seqan::Alloc > const, seqan::String<char, seqan::Alloc > > > > const&, std::vector<GenomicLocation, std::allocator >&, unsigned int) + 0xd9f
3 [0x7f6a56fe6400] main + 0x1dc0
4 [0x7f6a565b70b3] __libc_start_main + 0xf3
5 [0x7f6a56fe91f6] ./PROmiRNA(+0x361f6)

Aborted (core dumped)

@sarahet
Collaborator

sarahet commented Oct 9, 2020

Hi @YucanChen

I apologize that we did not provide a better explanation; I will add one now. The gene starts file should mark the gene starts or promoters, e.g. the first exon or a pre-defined region around the TSS of the genes/transcripts you would like to consider for the analysis. These regions are excluded so that the promoter of another gene/transcript is not called by accident. If you have a file with this information and still get this error, could you please paste its first lines here? Then I will have a better chance of helping you.

Sara
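
For readers preparing such a file, here is a minimal sketch of how regions around gene TSSs could be derived from a GTF annotation. It is not taken from PROmiRNA: the function name `tss_windows`, the 500 bp flank, and the returned tuple layout are all assumptions for illustration, and the exact columns PROmiRNA expects in the -s file are not stated in this thread.

```python
import re

def tss_windows(gtf_text, flank=500):
    """Return (seqid, start, end, strand, gene_id) windows around gene TSSs.

    Coordinates are 1-based and inclusive, as in GTF/GFF.
    The flank size is a hypothetical default, not a PROmiRNA requirement.
    """
    windows = []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene records define a TSS in this sketch
        seqid, start, end, strand, attrs = (
            fields[0], int(fields[3]), int(fields[4]), fields[6], fields[8]
        )
        # The TSS is the start on the plus strand and the end on the minus strand.
        tss = start if strand == "+" else end
        match = re.search(r'gene_id "([^"]+)"', attrs)
        gene_id = match.group(1) if match else "."
        windows.append((seqid, max(1, tss - flank), tss + flank, strand, gene_id))
    return windows
```

Each tuple can then be written out in whatever tab-separated layout the tool accepts; the windows are strand-aware, so a minus-strand gene gets its region around the annotated end coordinate.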

@YucanChen
Author

Thank you @sarahet. I've obtained the TSS annotation for Mus musculus (GRCm38.p1) from biomaRt; the file is in GFF3 format with a .gff suffix. However, the problem still occurred with the same error output, which looks like an access past the end of an array. Where is the array length of 7 defined? I doubt the problem is caused by improperly formatted files, since most of them were downloaded from the official websites, and the .psem and repeat files came from the link in the paper. I can't figure out which files are problematic or whether something else is causing the issue. Could you help me with that?

@sarahet
Collaborator

sarahet commented Oct 9, 2020

Could you please still paste the top 10 lines of the file you are using here, or send me a direct link to the file so that I can download it? Of course it could be a bug, but I need to see what the input actually looks like, regardless of what the file format should be.

@YucanChen
Author

(1) These files were downloaded directly from the web:
http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.gz
ftp://ftp.ensembl.org/pub/release-101/gtf/mus_musculus/Mus_musculus.GRCm38.101.gtf.gz
http://promirna.molgen.mpg.de/mm10_repeats.bed.gz
ftp://mirbase.org/pub/mirbase/22.1/genomes/mmu.gff3
ftp://mirbase.org/pub/mirbase/22.1/database_files/mirna.txt.gz
ftp://mirbase.org/pub/mirbase/22.1/database_files/mirna_context.txt.gz
(2) This file was converted to a wig file with the UCSC bigWigToWig tool:
http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons.bw
(3) This is the same file as the one you put on GitHub:
TATA_box_jaspar.psem
(4) The TSSs were extracted from BioMart in R and exported to GFF3 format using the rtracklayer package:
TSS.mouse.GRCm38.zip
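
The files listed above come from different providers, which do not always agree on sequence names: the UCSC mm10 FASTA uses chr-prefixed names such as chr1, while Ensembl GTF files use unprefixed names such as 1. Whether this plays any role in the assertion reported in this issue is not established in the thread. The following is only a sketch of how the names could be compared; the helper names `fasta_names`, `annotation_seqids`, and `names_missing_from` are hypothetical.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines (text after '>' up to whitespace)."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }

def annotation_seqids(lines):
    """Collect first-column sequence names from tab-separated GTF/GFF/BED lines."""
    names = set()
    for line in lines:
        stripped = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        names.add(stripped.split("\t")[0])
    return names

def names_missing_from(reference, other):
    """Return the sorted names that appear in `other` but not in `reference`."""
    return sorted(other - reference)
```

In practice the functions would be fed open file handles, for example `fasta_names(open("mm10.fa"))`, and any names reported by `names_missing_from` would point to an annotation file that uses a different naming scheme than the genome.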

@hufanglq

It seems to happen randomly. I had the same error, but I got past this step after trying several times. In the end, I got an error about not having enough CAGE data. I only supplied FANTOM5 CAGE peaks.

@SianGol

SianGol commented Aug 3, 2021

@hufanglq Did you manage to overcome this issue? I have reached the same step and get this error:

Starting miRNA promoter prediction
Number of miRNAs for analysis: 1226
Number of overlaps between tags and miRNAs:
~/PRO_miRNA_input/CAGE/mm10_fair+new_CAGE_peaks_phase1and2.bed 9312
Unique regions TSS: 9312
Unable to convert '.' into unsigned int.
Number of TSS in dataset for EM algorithm: 0
Number of overlaps between tags and background regions:
~/PRO_miRNA_input/CAGE/mm10_fair+new_CAGE_peaks_phase1and2.bed 0
ERROR: No overlap found between CAGE tags and miRNAs/background. Rerun with more/other CAGE data.
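
The message "Unable to convert '.' into unsigned int." suggests that a field expected to hold a number contained the placeholder '.'. The log does not say which input file or column was being parsed, so the sketch below is only a generic way to locate such fields in a tab-separated file. The function name `non_integer_fields` and the choice of column indices are assumptions, not part of PROmiRNA.

```python
def non_integer_fields(lines, columns):
    """Return (line_number, column_index, value) for fields that are not unsigned integers.

    `columns` holds 0-based column indices to check; line numbers are 1-based.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        for col in columns:
            if col >= len(fields):
                continue  # line is too short to have this column
            value = fields[col]
            # An unsigned integer consists only of ASCII digits.
            if not (value.isascii() and value.isdigit()):
                problems.append((number, col, value))
    return problems
```

For a GFF3 file one might check columns 3, 4, and 5 (start, end, score), where the score column is often '.'; for a BED file the candidates would be columns 1, 2, and 4.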
