The following exploration uses the optimized DAG `/fh/fast/matsen_e/mbarker/larch/luka_optimized_10_iterations.pb`. That DAG was produced by optimizing `/fh/fast/matsen_e/wdumm/luka_larch/Lukas_dag.pb`, which was prepared in Python from a gctree parsimony forest on a gcreplay alignment.
@marybarker should confirm, but I suspect the larch command was something like
There's an issue with DAGs produced by Larch where nodes with the same child
clades and compact genome are given different node IDs, and it seems these
node IDs are being used to distinguish nodes (which they should not be).
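To make the expected behavior concrete, here is a minimal sketch in plain Python. It does not use Larch's or historydag's actual types: `Node` and `identity_key` are hypothetical names, and the compact genome is stood in for by a string. It only shows that two nodes with the same compact genome and child clades should compare equal once the node ID is left out of the comparison.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for an hDAG node; not the real Larch/historydag type.
    compact_genome: str                      # placeholder for a compact genome
    child_clades: FrozenSet[FrozenSet[str]]  # one set of leaf names per child clade
    node_id: int


def identity_key(node: Node):
    """Key that ignores node_id, so structurally equal nodes match."""
    return (node.compact_genome, node.child_clades)


clades = frozenset({frozenset({"leaf1"}), frozenset({"leaf2"})})
a = Node("A5G", clades, node_id=17)
b = Node("A5G", clades, node_id=42)

print(a == b)                              # False: default equality includes node_id
print(identity_key(a) == identity_key(b))  # True: same genome and child clades
```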
Here's an exploration that illustrates the problem with the optimized hDAG referenced above:
We'll just trim to MP trees so it's quicker to work with:
This matches the information about the (trimmed) DAG that Larch reported.
Now, to verify my claim that there are duplicate nodes assigned different
node_ids:
So, the duplicate nodes in this case are relatively close to leaves, and there
aren't many of them.
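As a rough illustration of this kind of duplicate check (a sketch on toy data, not the code used for the exploration above), nodes can be grouped by compact genome and child clades, and any group holding more than one node ID reported. The record layout `(node_id, genome, child_clades)` and the helper names are assumptions made for the example.

```python
from collections import defaultdict


def clade_set(*groups):
    """Build a hashable set of child clades from iterables of leaf names."""
    return frozenset(frozenset(g) for g in groups)


def find_duplicates(nodes):
    """Map (genome, child_clades) -> node IDs, keeping only keys shared by 2+ IDs."""
    groups = defaultdict(list)
    for node_id, genome, child_clades in nodes:
        groups[(genome, child_clades)].append(node_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def extra_node_count(duplicates):
    """Number of nodes that would disappear if each group were merged into one."""
    return sum(len(ids) - 1 for ids in duplicates.values())


# Toy records: (node_id, genome, child_clades). Nodes 1 and 2 differ only by ID.
nodes = [
    (0, "ref", clade_set({"l1", "l2"}, {"l3"})),
    (1, "A5G", clade_set({"l1"}, {"l2"})),
    (2, "A5G", clade_set({"l1"}, {"l2"})),
    (3, "C7T", clade_set({"l1"}, {"l2"})),
]

dups = find_duplicates(nodes)
print(list(dups.values()))     # [[1, 2]]
print(extra_node_count(dups))  # 1
```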
Here's what happens when we get rid of node IDs. We can do this here because
there are no ambiguities in leaf sequences, and leaf sequences are unique.
If node IDs were being assigned correctly, we should end up with a new hDAG
with the same number of nodes, edges, and histories, but here we expect to end
up with 8 fewer nodes, since that's the number of duplicates we saw above:
Unique node IDs didn't allow collapsing before (I verified this, but left it
out here for clarity), but in their absence over 200 edges can be eliminated
by collapsing, reducing the number of unique MP trees here to 12k from the
more than 1M reported by Larch.
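The effect of dropping node IDs can be sketched on a toy DAG. This is plain Python under assumed data structures (node records as `(node_id, genome, child_clades)` and edges as `(parent_id, child_id)` pairs), not the historydag or Larch API. Each node is remapped to the first ID seen for its genome and child clades, and edges are rewritten accordingly, so duplicate nodes and the edges that only differed by node ID merge.

```python
def merge_duplicates(nodes, edges):
    """Merge nodes sharing (genome, child_clades); return surviving IDs and edges."""
    canonical = {}  # (genome, child_clades) -> representative node ID
    remap = {}      # original node ID -> representative node ID
    for node_id, genome, child_clades in nodes:
        remap[node_id] = canonical.setdefault((genome, child_clades), node_id)
    merged_nodes = set(remap.values())
    merged_edges = {(remap[parent], remap[child]) for parent, child in edges}
    return merged_nodes, merged_edges


internal = frozenset({frozenset({"x"}), frozenset({"y"})})
nodes = [
    (0, "ref", frozenset({frozenset({"x", "y"})})),
    (1, "A5G", internal),
    (2, "A5G", internal),   # same genome and child clades as node 1
    (3, "x", frozenset()),  # leaf
    (4, "y", frozenset()),  # leaf
]
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 3), (2, 4)]

merged_nodes, merged_edges = merge_duplicates(nodes, edges)
print(len(nodes), "->", len(merged_nodes))  # 5 -> 4
print(len(edges), "->", len(merged_edges))  # 6 -> 3
```

In this toy case the node count drops by exactly the number of duplicates, which is the behavior expected of the real hDAG when node IDs are ignored.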
In summary, I think there are two issues, the first quite a bit more serious
than the second:
First, when optimized trees, or perhaps moves, are being merged into the
hDAG by Larch, either node comparisons (using only LeafSets and compact
genomes) aren't working correctly, so nodes compare unequal when in fact
they are equal, or node IDs are being assigned before merging and are
erroneously being used in the node comparison.
Second, I don't remember if Larch attempts to do some kind of collapsing of
edges without mutations before or during merge, but if so, that doesn't
seem to be working, since collapsing is possible after removing node IDs.
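For reference, collapsing an edge without mutations means contracting an edge whose internal child carries the same sequence as its parent. The sketch below does this on a single toy tree in plain Python; the `children` and `seqs` dictionaries and the function name are assumptions for illustration, and this is not how Larch or historydag implement it.

```python
def collapse_tree(children, seqs, root):
    """Contract every edge whose internal child has the same sequence as its parent."""
    collapsed = {}

    def visit(node):
        kept = []
        pending = list(children.get(node, []))
        while pending:
            child = pending.pop(0)
            is_leaf = not children.get(child)
            if not is_leaf and seqs[child] == seqs[node]:
                # No mutations on this edge: adopt the child's children directly.
                pending.extend(children[child])
            else:
                kept.append(child)
        collapsed[node] = kept
        for child in kept:
            visit(child)

    visit(root)
    return collapsed


# Toy tree: internal node "a" has the same sequence as "root".
children = {"root": ["a", "l3"], "a": ["l1", "l2"]}
seqs = {"root": "AAA", "a": "AAA", "l1": "AAT", "l2": "ATA", "l3": "AAA"}

collapsed = collapse_tree(children, seqs, "root")
print(sorted(collapsed["root"]))  # ['l1', 'l2', 'l3']
print("a" in collapsed)           # False: the zero-mutation edge root -> a is gone
```

Leaves are never contracted here, even when they match their parent, so every leaf stays in the tree.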