
Releases: ksahlin/IsoCon

0.3.2

29 Jul 02:48
  • Several speed improvements. IsoCon 0.3.2 is ~2-10x faster than version 0.3.1 and earlier.
  • Various bugfixes.
  • Added a script to estimate the sequencing depth required to achieve >T% recall in the statistical test.

0.3.1

25 Mar 13:57
  • New pairwise alignment approach: switched from ssw (Smith-Waterman) to parasail, which supports NW (Needleman-Wunsch).
  • IsoCon can now take a fastq file with CCS reads and their quality values as input, instead of the flnc and ccs.bam files.
  • Some minor bugfixes.

Major runtime improvements and code base changes

03 Feb 02:49

Major updates to speed and code readability, minor bugfixes. Previous versions are deprecated.

Fixed

  • Bugfix when --nearest_neighbor_depth is set. The previous version would not explore fewer than the specified number of sequences when assigning reads to candidates (statistical test step).

Added

  • Added parameter --min_test_ratio X (default 5). This parameter omits testing a candidate t against a candidate c if c has X times more support than t. This speeds up the algorithm by omitting tests that will (most likely) be significant anyway, since c is dominant over t.
  • Added CHANGELOG and GPL LICENSE.
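A minimal sketch of the --min_test_ratio check described above; the function name and structure are hypothetical, and only the parameter and its default of 5 come from these notes:

```python
def should_test(support_c, support_t, min_test_ratio=5):
    # Skip the statistical test of candidate t against candidate c when
    # c's read support is at least min_test_ratio times t's: such a test
    # would (most likely) come out significant anyway.
    return support_c < min_test_ratio * support_t

print(should_test(60, 10))  # False: 60 >= 5 * 10, test omitted
print(should_test(40, 10))  # True: 40 < 5 * 10, test performed
```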

Changed

  • IsoCon no longer builds the multi-alignment matrix (MAM) in the statistical testing step. Support, base probabilities, etc. are now obtained by (1) aligning only c to t to get the positions where they differ, and (2) obtaining base qualities over these positions from the (already created) read alignments to c and t (each read is assigned to either c or t). This implementation therefore skips both the realignment of all reads to the reference t and the creation of the MAM. This gives a speed-up of 5-20x for the statistical test (most speedup for longer, noisier sequences).
  • Improved speed (~2-3x) of the multi-alignment function (now used only in the correction step).
  • Significant re-write of code in some regions, such as the statistical test, improving readability and factorization.
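The two-step procedure above can be sketched as follows; the alignment representation (equal-length strings with '-' for gaps) and the function name are assumptions for illustration:

```python
def differing_positions(aln_c, aln_t):
    # Step (1): given a pairwise alignment of candidate c and reference t,
    # return the alignment columns where they disagree. Step (2) would then
    # look up base qualities at only these columns, using the read
    # alignments to c and t that already exist, instead of realigning all
    # reads to t and building a full multi-alignment matrix.
    assert len(aln_c) == len(aln_t)
    return [i for i, (a, b) in enumerate(zip(aln_c, aln_t)) if a != b]

print(differing_positions("ACGT-A", "ACCTGA"))  # [2, 4]
```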

v0.2.5.1

02 Feb 23:47

Stable version before a major re-write of the code resulting in improved readability and a significant speedup in the statistical test. This release fixed several minor bugs present in the IsoCon version (commit 85eb122, tag 0.2.4) that was used to generate the results in the bioRxiv preprint made available 2018-01-10, as well as in the version sent to a journal for review.

Fixed

  • Bugfix in the if-statement described in Supplementary Section A: "Estimating the probability of a sequencing error" (bioRxiv supplement).
  • Bugfix in how tiebreakers are treated, described in Supplementary Section A: "Implementation details" (bioRxiv supplement).

Added

  • Test data and instructions for running IsoCon on testdata.
  • Automatic builds and testing with Travis.
  • Installation through pip now possible.
  • Added parameter --verbose and removed a lot of prints to stdout.
  • Added parameter --min_exon_diff to break an alignment containing this many (or more) consecutive '-' (an indel). Previously this was hardcoded.
  • Made the upper bound T on mapping quality values (described in Supplementary Section A: "Estimating the probability of a sequencing error", bioRxiv supplement) a parameter --max_phred_q_trusted instead of a hardcoded value.
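A sketch of what the --min_exon_diff check does; the function name is hypothetical and the threshold of 20 used below is an arbitrary illustration, not the tool's actual default:

```python
def has_long_indel(aligned_seq, min_exon_diff=20):
    # Break the alignment if it contains a run of at least min_exon_diff
    # consecutive '-' characters, i.e. an indel long enough to look like
    # an exon-level difference rather than sequencing noise.
    run = 0
    for ch in aligned_seq:
        run = run + 1 if ch == '-' else 0
        if run >= min_exon_diff:
            return True
    return False

print(has_long_indel("ACGT" + "-" * 25 + "ACGT"))  # True
print(has_long_indel("AC--GT"))                    # False
```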

Changed

  • Changed parameter --single_core (a flag, false by default, with IsoCon using all available cores) to the more flexible --nr_cores, where the user can specify how many cores to use.
  • Changed terminology. All occurrences of "minimizer" are changed to "nearest_neighbor" (or "neighbor" in parameters) to adopt the new notation of a nearest neighbor graph instead of a minimizer graph.
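A sketch of the changed interface; the default and help text are assumptions, only the flag names come from these notes:

```python
import argparse
import multiprocessing

parser = argparse.ArgumentParser()
# --nr_cores replaces the old --single_core flag: instead of a binary
# all-cores/one-core choice, the user picks any number of cores.
parser.add_argument("--nr_cores", type=int,
                    default=multiprocessing.cpu_count(),
                    help="number of cores to use (default: all available)")

args = parser.parse_args(["--nr_cores", "4"])
print(args.nr_cores)  # 4
```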

Removed

  • Removed option --barcodes as it no longer serves a purpose: if there are barcodes, they have been detected and the reads split into batches in an upstream step.

bioRxiv version

03 Feb 00:02

This version was used to generate the results in the bioRxiv preprint made available 2018-01-10, and is also the version sent to a journal for review.

Various improvements

09 Jul 19:49
  • Fixed places in code that were stochastic and gave inconsistent results between runs:
    1. Fixed stochasticity when correcting sequences in a partition. This was due to the majority base pair being chosen arbitrarily (in python3) when there was a tie. In this version, we do not correct sequences at positions where the majority is ambiguous.
    2. Fixed ambiguity in partitioning: we choose the node with the largest number of reachable nodes. Within this partition, we choose as minimizer the string with the most direct support (number of identical strings + direct neighbors in the graph).
  • Fixed a logical bug in the function creating a multiple alignment matrix from pairwise alignments.
  • No filtering of candidates based on the requirement of being consensus over each base pair is performed (neither in the output of the minimizer step nor in the output of the final candidates step). This heuristic in earlier implementations served only to limit the number of candidates and covered up other flaws/bugs in the old implementation, such as the one in the multiple alignment creation mentioned above.
  • Fixed bug in bipartite partitioning algorithm. Now we use the "bipartite" data structure from networkx which also improves readability of code.
  • In statistical test:
    1. Better multiple-testing correction factor where there are differences in homopolymers.
    2. The statistical test is performed for each edge in the minimizer graph formed by the candidates, as described in the methods section.
    3. Added support for calculating the p-value under a normal approximation, which can be better than the Poisson if the sample size is very large and the probabilities are large enough. However, we only output the log of the difference; the Poisson is still used to derive results.
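A stdlib-only sketch of the two tail probabilities being compared; the actual test statistic and parameters used by IsoCon are not shown here:

```python
import math

def poisson_sf(k, lam):
    # P(X >= k) for X ~ Poisson(lam): one minus the lower-tail sum,
    # with terms computed iteratively to avoid overflow.
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return 1.0 - cdf

def normal_sf(k, lam):
    # Normal approximation to the same tail probability
    # (mean = variance = lam, with continuity correction).
    z = (k - 0.5 - lam) / math.sqrt(lam)
    return 0.5 * math.erfc(z / math.sqrt(2))

# For large counts the two agree closely; per the notes above,
# the Poisson value is still the one used to derive results.
print(poisson_sf(120, 100), normal_sf(120, 100))
```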

Exact minimizer graph, partitioning, reducing stochasticity.

02 May 16:10
  • Exact minimizer graph computation with library edlib.
  • Partitioning the graph by selecting the minimizers m with the highest number of reachable nodes as centers, instead of the heuristic approximation based on the largest number of neighbors used in v0.1.0.
  • Update in error correction:
    • The total number of errors of each type (insertions, deletions and substitutions) is calculated for each partition. In v0.1.0, the weight of each error was simply its count. In this version, the weight is the count at the position divided by the total error count of that type.
    • Instead of computing the edit distance E to the minimizer, we compute it to the consensus of the partition (denote this edit distance M), so the number of errors corrected changes from E/2 to M/2 in this version compared to the old one.
    • Version 0.1.0 finds the weight w (i.e., the count) of the E/2-th lowest-weight position after sorting positions by weight. If P is the set of all positions with weight <= w, then v0.1.0 selects a random subset of E/2 positions from P to correct (E/2 <= |P|). In contrast, v0.2.0 finds the weight w' (normalized count) of the M/2-th lowest-weight position after sorting, and if P' is the set of all positions with weight <= w', v0.2.0 corrects all |P'| positions instead of a random subset.
  • The error model in the statistical test still assumes uniform errors across CCS reads.
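The v0.2.0 selection described above might be sketched as follows; the data structures and function name are hypothetical:

```python
def positions_to_correct(pos_counts, pos_types, type_totals, M):
    # Weight of each candidate position = its count divided by the total
    # error count of its type (insertion, deletion or substitution).
    weights = {p: pos_counts[p] / type_totals[pos_types[p]]
               for p in pos_counts}
    ranked = sorted(weights, key=weights.get)
    k = M // 2
    if k == 0 or not ranked:
        return []
    # w' = weight of the (M/2)-th lowest-weight position ...
    w_prime = weights[ranked[min(k, len(ranked)) - 1]]
    # ... and *all* positions with weight <= w' get corrected,
    # not a random subset as in v0.1.0.
    return [p for p in ranked if weights[p] <= w_prime]

counts = {0: 1, 3: 2, 7: 1, 9: 5}
types = {0: "sub", 3: "ins", 7: "del", 9: "sub"}
totals = {"sub": 6, "ins": 2, "del": 1}
print(positions_to_correct(counts, types, totals, M=4))  # [0, 9]
```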

First stable version v0.1.0

15 Mar 02:59
Pre-release
  • First stable version on both simulated and biological datasets
  • Error model assuming uniform errors across CCS reads
  • Fast approximation of minimizer graph
  • Partitioning the graph by selecting the minimizers m with the highest number of neighbors as centers, and assigning to m each read that can reach m in the minimizer graph.
  • Correcting E/2 errors in each pass, for each read in a partition, where E is the edit distance between the minimizer and the read. Positions to correct: find the weight w (i.e., the count) of the E/2-th lowest-weight position after sorting positions by weight, and let P be all positions with weight <= w. Then select a random subset of E/2 positions from P to correct (E/2 <= |P|). The E/2 positions with the lowest counts in the PFM of the partition consensus get corrected; in summary, if more than E/2 positions have equally low counts, a random subset of E/2 positions is chosen.
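The v0.1.0 position selection might be sketched as follows; names and data structures are hypothetical, and the seeded generator is only for reproducibility of the example:

```python
import random

def v010_positions_to_correct(pos_counts, E, rng=random.Random(0)):
    # Sort positions by count, take the count w of the (E/2)-th lowest,
    # let P be all positions with count <= w, and correct a *random*
    # subset of E/2 of them -- the stochastic step that later versions
    # removed to make runs reproducible.
    k = E // 2
    ranked = sorted(pos_counts, key=pos_counts.get)
    if k == 0 or not ranked:
        return []
    w = pos_counts[ranked[min(k, len(ranked)) - 1]]
    P = [p for p in ranked if pos_counts[p] <= w]
    return rng.sample(P, min(k, len(P)))

# Three positions tie at count 1; with E = 4 we correct a random 2 of them.
picked = v010_positions_to_correct({0: 1, 1: 1, 2: 1, 5: 4}, E=4)
print(sorted(picked))
```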