Counting the number of trees in which each node (or clade) takes part #30

willdumm · 2022-10-17T23:10:12Z

For #29 we need a way to count the number of trees which each of a DAG's nodes takes part in.

This is implemented in the Python package here:
https://github.com/matsengrp/historydag/blob/88db496bb6420adf85fce78a861e28ab74031694/historydag/dag.py#L1516

This map from nodes to tree counts can be constructed for all nodes in two traversals of the DAG, first postorder and then reverse postorder. (For use in #29 we may need to sample multiple trees at once for use in sample-optimize-merge, since sampling will be fast, but constructing this map will be relatively slow)

First, compute the number of subtrees below each node in the DAG (e.g. with a call to SubtreeWeight<TreeCount>.ComputeWeightBelow(). Let's call $b(n)$ the number of subtrees below any node $n$
We can compute the number of "above subtrees" $a(n)$ for a node $n$ as follows... Here $P(n)$ is the set of parent nodes of $n$. Also, given a node $n$ and one of its parents $p$, then $s(n, p) = \sum\limits_{s\in S(n, p)} b(s)$, where $S(n, p)$ is the set of child nodes of $p$ which descend from the same clade as $n$ (including n):
$$a(n) = \sum_{p\in P(n)} \left [ \frac{a (p ) b(p )}{s(n, p)}\right ]$$
Then for any node, $c(n) = a(n)b(n)$ is the number of trees the node $n$ is part of in the DAG.
Instead of a map from nodes to the number of trees they take part in, we actually need a map from clade unions to the number of trees they take part in. This can be computed by iterating through all nodes and summing values of the map $c$ on all nodes which have the same clade union. By clade union we mean union of child clades, as a subset of leaf labels (rather than set of sets of leaf labels as in LeafSet). This is the map called $N$ in Sampling from the DAG's topological fringe #29

Credit to @williamhowardsnyder for the algorithm and Python implementation

The text was updated successfully, but these errors were encountered:

willdumm mentioned this issue Oct 17, 2022

Sampling from the DAG's topological fringe #29

Open

willdumm added the proposed feature label Oct 17, 2022

ognian- mentioned this issue Dec 3, 2022

Extendable dag #42

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Counting the number of trees in which each node (or clade) takes part #30

Counting the number of trees in which each node (or clade) takes part #30

willdumm commented Oct 17, 2022 •

edited

Loading

Counting the number of trees in which each node (or clade) takes part #30

Counting the number of trees in which each node (or clade) takes part #30

Comments

willdumm commented Oct 17, 2022 • edited Loading

willdumm commented Oct 17, 2022 •

edited

Loading