How should --min_candidate_support be set appropriately? #5

wheatwill · 2019-07-20T14:17:18Z

Hi, when I run IsoCon, I find that the results vary greatly with different --min_candidate_support settings. How should I choose a value for this parameter?

ksahlin · 2019-07-20T20:50:41Z

It depends primarily on the characteristics of your data, but also on your goals. In general, the lower the cutoff, the more sensitive the algorithm will be (that is, it will detect more lowly expressed sequences but also predict more erroneous sequences).

What type of data is it? Are they PacBio CCS reads? Was the data targeted with primers designed to capture a specific 5' and 3' location (meaning that the ends will be well defined)?
How deep is the sequencing? Specifically: how many reads? How many transcript variants do you expect, as a ballpark number: 10? 100? 1000? The number of reads divided by the expected number of transcripts can give you an estimate of how you should set the cutoff.
The gene family/species can also be useful to know.
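
As a rough illustration of the reads-per-transcript estimate mentioned above, here is a minimal Python sketch. The function name and the read count of 5000 are made up for the example; only the division itself comes from the comment. How far below this average the cutoff should sit is a judgment call that this sketch does not decide.

```python
def mean_reads_per_transcript(n_reads: int, n_expected_transcripts: int) -> float:
    """Average number of reads available per expected transcript variant.

    Hypothetical helper for illustration only; it is not part of IsoCon.
    """
    if n_expected_transcripts <= 0:
        raise ValueError("n_expected_transcripts must be positive")
    return n_reads / n_expected_transcripts


# Hypothetical read count (5000) against the ballpark variant counts above.
for expected in (10, 100, 1000):
    average = mean_reads_per_transcript(5000, expected)
    print(f"{expected} expected variants: {average:.1f} reads each on average")
```

Expression is rarely uniform, so lowly expressed variants will have fewer reads than this average. That suggests treating the number as an orientation point rather than as the cutoff itself.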

wheatwill · 2019-07-21T04:34:44Z

Thank you very much for your quick reply!
Actually, I am running a set of non-targeted Iso-Seq data. The gene family I am interested in is expected to contain 10-20 members (tandem repeat genes), but only 3 of them have been assembled successfully in the reference genome. So I am trying to recover the other transcripts from a full-length transcriptome generated on a PacBio RS II. I used blastn to extract 1500 sequences from all the FLNC reads. Then I ran the IsoCon pipeline directly: IsoCon pipeline -fl_reads blast.out.flnc.fasta -outfolder test.IsoCon.out --ccs polished.total.flnc.bam --nr_cores 24 --min_candidate_support 10.
--min_candidate_support 10 gives 4 final candidates
--min_candidate_support 5 gives 15 final candidates

Should I trim these BLAST-extracted FLNC reads to the same start and end positions?

ksahlin · 2019-07-21T11:18:18Z

Trimming the starts and ends to the same locations will greatly help IsoCon find the variants and work as it was designed to. This is very much the preferred option! Let's see if you get the same variability after this.

You can do some post-analysis of IsoCon's results by looking at the read support of each final candidate (this could be done as a sanity check for results both with and without trimmed ends). The support can be obtained by counting the number of reads that were assigned to each consensus in the cluster_info.tsv file. (Alternatively, the accessions of the candidates in final_candidates.fa contain related information on how many reads support them, but counting rows in the TSV is more exact.)
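
The row counting described above could be sketched in Python as follows. This assumes cluster_info.tsv is tab-separated with one read per row and the assigned consensus in the second column; that layout is an assumption, so check your own file and adjust `consensus_col` if it differs. The function name and the example rows are made up for illustration.

```python
import csv
from collections import Counter
from typing import Iterable


def count_support(lines: Iterable[str], consensus_col: int = 1) -> Counter:
    """Count how many rows (reads) are assigned to each consensus.

    Assumes tab-separated rows with the consensus accession in column
    `consensus_col` (0-based). This layout is an assumption, not a
    documented format.
    """
    counts: Counter = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) > consensus_col:  # skip blank or short rows
            counts[row[consensus_col]] += 1
    return counts


# Made-up rows standing in for the contents of cluster_info.tsv.
example_rows = ["read_1\tcand_A", "read_2\tcand_A", "read_3\tcand_B"]
for consensus, n_reads in count_support(example_rows).most_common():
    print(consensus, n_reads, sep="\t")

# With a real file, pass the open handle instead:
#     with open("cluster_info.tsv", newline="") as handle:
#         counts = count_support(handle)
```

Comparing these counts between the --min_candidate_support 5 and 10 runs shows which candidates sit just below the higher cutoff.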
