-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error with dbCreator #13
Comments
ok my vcf was a product from a merging software that merged multiple vcf files of snps and indels...so i decided to use an vcf that was in this format generated by varscan. And now, I had another error: dbfile <- dbCreator(gtfFile=gtffile,vcfFile=vcffile,bedFile=bedfile,annotation_path=annotation,outfile_name=outfile_name,genome=Hsapiens,outdir=outfile_path) |
In your *.bed file, the coordinates of the start point (12697+575) in your junction part 2 exceed the end point (13221). This indicates that your file is not a standard output file generated by Tophat. Each junction from junctions.bed reported by TopHat will consists of two connected BED blocks in the column 11 and 12. In the *.gtf file. this function was designed for the cufflinks output. Only the transcript_id beginning with "Cuff." will be considered as novel sequences for the subsequent processing. |
ok... so what pipeline do you recommend to generate a dabase from 3 patients raw sequencing data? since pga accepts only one vcf, one bed and one gtf? cheers |
and why you don't create an alternative for tophat, since it's being discontinued since 2016? |
This information is missing in the tutorial. I think the tools used in the generation of the GTF and BED files needs to be mentioned briefly for ease of following the workflow. I have struggled and this information solved my problem |
hello I started from the beggining then I have a new vcf, new gtf and new tabFile |
and now I have got this error :
|
any ideas? |
@gwajnberg I also had that problem and I was not able to solve it |
Hello javanOkendo, I found the problem there is a function in the end of the script called Tab2Range, which tries to replace if you have a row into your tab File with a "." instead of + or - ... but instead of using a condition the function just try to create a tmp variable, which if you don't have any "." it will be empty...here is it Tab2Range #replace the strand "." with "+" and "-". part1_sta <- as.numeric(jun[,'V2'])+1-59 junction <- data.frame(chr=jun[, 'V1'], id="NULL", start=jun[, 'V2'], junRange <- GRanges(seqnames=junction$chr, ranges=IRanges(start=junction$part1_end, } |
@gwajnberg did this work with your data? if Yes can you share the workflow which you used so that I can give it a trial on my data? You can send the attachment to oknjav001@myuct.ac.za |
Hello javan, I will send you for sure!! I will just clean the script lol, because i went through line by line of each PGA's functions |
I will appreciate is and I look forward to it. Thanks for the good gesture
…On Thu, Jun 4, 2020 at 5:53 PM gwajnberg ***@***.***> wrote:
Hello javan, I will send you for sure!! I will just clean the script lol,
because i went through line by line of each PGA's functions
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#13 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AGJ34OY3A4OXC54YV7VQFVDRU67N7ANCNFSM4MTZLVYA>
.
|
@gwajnberg did you manage to make the script work? |
Hello ,
I just tried to use with my data PGA package and i had an error in dbCreator:
library(BSgenome.Hsapiens.UCSC.hg19)
dbfile <- dbCreator(gtfFile=gtffile,vcfFile=vcffile,bedFile=bedfile,annotation_path=annotation_path,outfile_name=outfile_name,genome=Hsapiens,outdir=outfile_path)
Error in y[, z] : subscript out of bounds
this is the vcf header:
##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
#CHROM POS ID REF ALT QUAL
chr1 10043516 . A G 0.01
chr1 10044378 . A G 0.01
iarc@iarc-H8QG6:/Processed_data/gabriel/amrita/tfRNA/EVs/derfinder-pipe$ head /New_data/pancreatic/new_data/junctions.bed
chr1 12697 13221 JUNC_0 0 + 12697 13221 255,0,0 2 50,50 0,575
chr1 14829 14970 JUNC_1 1 - 14829 14970 255,0,0 2 50,50 0,192
chr1 14829 15021 JUNC_2 0 - 14829 15021 255,0,0 2 50,50 0,243
chr1 15012 25233 JUNC_3 0 + 15012 25233 255,0,0 2 50,50 0,10272
chr1 15038 15796 JUNC_4 3 - 15038 15796 255,0,0 2 50,50 0,809
chr1 15947 16607 JUNC_5 3 - 15947 16607 255,0,0 2 50,50 0,711
chr1 16765 16854 JUNC_6 0 - 16765 16854 255,0,0 2 50,50 0,140
chr1 16765 16858 JUNC_7 0 - 16765 16858 255,0,0 2 50,50 0,144
chr1 17055 17233 JUNC_8 0 - 17055 17233 255,0,0 2 50,50 0,229
chr1 17055 17606 JUNC_9 0 - 17055 17606 255,0,0 2 50,50 0,602
iarc@iarc-H8QG6:/Processed_data/gabriel/amrita/tfRNA/EVs/derfinder-pipe$ head /New_data/pancreatic/new_data/merged.gtf
chr1 Cufflinks exon 852260 857889 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.6.1"; class_code "u"; tss_id "TSS1";
chr1 Cufflinks exon 857991 858025 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.6.1"; class_code "u"; tss_id "TSS1";
chr1 Cufflinks exon 878110 878438 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1 Cufflinks exon 878633 878757 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "2"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1 Cufflinks exon 879078 879188 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "3"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1 Cufflinks exon 879288 880180 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "4"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1 Cufflinks exon 880422 880526 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "5"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1 Cufflinks exon 880898 881033 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "6"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1 Cufflinks exon 881527 881666 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "7"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1 Cufflinks exon 881782 881925 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "8"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
annotation driectory prepared with PrepareAnnotationRefseq2
The text was updated successfully, but these errors were encountered: