How to set --min_candidate_support should be appropriate? #5

Open
wheatwill opened this issue Jul 20, 2019 · 3 comments

Comments

@wheatwill

Hi, when I run IsoCon, the results vary greatly depending on the value of --min_candidate_support. How should I choose this parameter?

@ksahlin
Owner

ksahlin commented Jul 20, 2019

It depends primarily on the characteristics of your data, but also on your goals. In general, the lower the cutoff, the more sensitive the algorithm will be: it will detect more lowly expressed sequences, but it will also predict more erroneous sequences.

  • What type of data is it? Are the reads PacBio CCS reads? Is the data targeted, with primers designed to capture specific 5' and 3' locations (meaning that the ends will be well defined)?
  • How deep is the sequencing? Specifically: how many reads? How many transcript variants do you expect, as a ballpark number: 10? 100? 1000? The number of reads divided by the expected number of transcripts gives an estimate of how you should set the cutoff.
  • The gene family and species can also be useful to know.
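The ratio mentioned in the second point can be sketched in a few lines of Python. This is only an illustration: the function name and the example numbers are hypothetical, and the thread does not give a rule for turning the ratio into a specific cutoff, so treat the result as a rough upper reference point rather than a recommended setting.

```python
def reads_per_transcript(n_reads: int, expected_transcripts: int) -> float:
    """Average number of reads available per expected transcript variant."""
    if expected_transcripts <= 0:
        raise ValueError("expected_transcripts must be positive")
    return n_reads / expected_transcripts


# Hypothetical read count, for illustration only.
n_reads = 5000
for expected in (10, 100, 1000):
    ratio = reads_per_transcript(n_reads, expected)
    print(f"{expected} expected transcripts: about {ratio:.0f} reads each")
```

Expression is rarely uniform, so lowly expressed variants will usually have far fewer reads than this average suggests.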

@wheatwill
Author

Thank you very much for your quick reply!
Actually, I am running a set of non-targeted Iso-Seq data. The gene family I am interested in is expected to contain 10-20 members (tandem repeat genes), but only 3 of them have been assembled successfully in the reference genome. So I am trying to recover the other transcripts from a full-length transcriptome generated on a PacBio RSII. I used blastn to select 1500 sequences from all the flnc reads. Then I ran the IsoCon pipeline directly: IsoCon pipeline -fl_reads blast.out.flnc.fasta -outfolder test.IsoCon.out --ccs polished.total.flnc.bam --nr_cores 24 --min_candidate_support 10.
--min_candidate_support 10 gives 4 final candidates
--min_candidate_support 5 gives 15 final candidates
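A comparison like the one above can be tabulated with a small Python helper that counts the FASTA records in each run's final_candidates.fa. This is a sketch under assumptions: the per-cutoff folder names are hypothetical, and it assumes final_candidates.fa sits directly inside the folder passed to -outfolder, which should be checked against the actual output.

```python
from pathlib import Path


def count_fasta_records(fasta_path) -> int:
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def candidates_per_run(outfolders: dict) -> dict:
    """Map each cutoff to the record count of that run's final_candidates.fa.

    Runs whose output file is missing are skipped.
    """
    counts = {}
    for cutoff, folder in outfolders.items():
        # Assumed location of the file inside the output folder.
        fasta = Path(folder) / "final_candidates.fa"
        if fasta.is_file():
            counts[cutoff] = count_fasta_records(fasta)
    return counts


# Hypothetical folder names, one per --min_candidate_support value.
runs = {10: "test.IsoCon.out.s10", 5: "test.IsoCon.out.s5"}
for cutoff, n in sorted(candidates_per_run(runs).items()):
    print(f"--min_candidate_support {cutoff}: {n} final candidates")
```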

Should I trim these BLAST-selected flnc reads to the same start and end positions?

@ksahlin
Owner

ksahlin commented Jul 21, 2019

Trimming the reads so that they start and end at the same locations will greatly help IsoCon find the variants, and it lets IsoCon work the way it was designed to. This is very much the preferred option! Let's see if you get the same variability after this.

You can do some post-analysis of IsoCon's results by looking at the read support of each final candidate (this could be done as a sanity check on the results both with and without trimmed ends). The support can be obtained by counting the number of reads that were assigned to each consensus in the cluster_info.tsv file. (Alternatively, the accessions of the candidates in final_candidates.fa contain related information on how many reads support them, but counting rows in the TSV is more exact.)
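Counting rows per consensus could look roughly like the Python sketch below. The column layout is an assumption: it takes cluster_info.tsv to be tab-separated with one row per read and the consensus accession in the second column, so check this against your own file and adjust `consensus_column` if needed.

```python
import csv
import os
from collections import Counter


def read_support(tsv_path, consensus_column: int = 1) -> Counter:
    """Count rows (reads) per consensus in a tab-separated assignment file.

    Assumes one row per read, with the consensus accession in the
    0-based column `consensus_column` (an assumption about the layout).
    """
    support = Counter()
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) > consensus_column:
                support[row[consensus_column]] += 1
    return support


# Path is relative to the IsoCon output folder; adjust as needed.
if os.path.isfile("cluster_info.tsv"):
    for consensus, n_reads in read_support("cluster_info.tsv").most_common():
        print(f"{consensus}\t{n_reads}")
```

Candidates whose support sits just above the chosen cutoff are the ones most likely to appear or disappear when --min_candidate_support changes.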
