You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We would like to try sampling trees from the history DAG which are topologically strange, and use these preferentially in the sample-optimize-merge loop. Topological strangeness will be quantified as the sum of Robinson Foulds distance from a tree to any other tree in the DAG. We can compute this using the SubtreeWeight framework as follows:
We need first a map $N$ from any clade in the hDAG (that is, any set of taxa that can be realized as the union of a node's child clades) to the number of trees in the hDAG which contain a node with that clade. More on this in Counting the number of trees in which each node (or clade) takes part #30 .
I show here starting at line 115 that for any tree in the hDAG, the sum of the Robinson Foulds distances from that tree $t$ to every other tree in the DAG (a sum which I'll call $D_t$) is given by $$D_t = K + \sum_{e\in t} (|T| - 2N(c_e))$$
where...
$K$ is an easily computable constant which we won't worry about (it's the sum of number of clades in each tree in the DAG),
the sum is over edges $e$ in the tree $t$,
$c_e$ is the clade of the child node of the edge $e$ in the tree $t$,
and $|T|$ is the number of trees in the hDAG.
Note that the sum, and therefore the value $D_t - K$ which we'll be maximizing in trees, will probably be negative. Therefore, all we need for computing the (K-shifted) sum RF distance on all trees in the DAG via SubtreeWeight is an edge function which returns this value $|T| - 2N(c_e)$ for any edge $e$.
Once we have that implemented via SubtreeWeight, trees with higher shifted-sum-RF distances $D_t - K$ are the ones which are more exotic topologically, in the context of the other trees in the DAG. We now want to weight these higher in the distribution we're sampling trees from. This can be done by using cached weights from SubtreeWeight<ShiftedSumRFDistances>.ComputeWeightBelow(), and sampling edges below each clade in proportion to the relative cached weights of each edge's child node, in the same fashion as has been done for uniform sampling with SubtreeWeight<TreeCount>. The way this works will be related to the solution to Sampling from the DAG with probabilities (and constraints) #28 and how we resolve the discussion on PR Uniform sampling #31
The text was updated successfully, but these errors were encountered:
willdumm
changed the title
Sampling from the DAG's topological 'edge'
Sampling from the DAG's topological fringe
Oct 17, 2022
We would like to try sampling trees from the history DAG which are topologically strange, and use these preferentially in the sample-optimize-merge loop. Topological strangeness will be quantified as the sum of Robinson Foulds distance from a tree to any other tree in the DAG. We can compute this using the
SubtreeWeight
framework as follows:We need first a map$N$ from any clade in the hDAG (that is, any set of taxa that can be realized as the union of a node's child clades) to the number of trees in the hDAG which contain a node with that clade. More on this in Counting the number of trees in which each node (or clade) takes part #30 .
I show here starting at line 115 that for any tree in the hDAG, the sum of the Robinson Foulds distances from that tree$t$ to every other tree in the DAG (a sum which I'll call $D_t$ ) is given by
$$D_t = K + \sum_{e\in t} (|T| - 2N(c_e))$$
where...
Note that the sum, and therefore the value$D_t - K$ which we'll be maximizing in trees, will probably be negative. Therefore, all we need for computing the (K-shifted) sum RF distance on all trees in the DAG via $|T| - 2N(c_e)$ for any edge $e$ .
SubtreeWeight
is an edge function which returns this valueOnce we have that implemented via$D_t - K$ are the ones which are more exotic topologically, in the context of the other trees in the DAG. We now want to weight these higher in the distribution we're sampling trees from. This can be done by using cached weights from
SubtreeWeight
, trees with higher shifted-sum-RF distancesSubtreeWeight<ShiftedSumRFDistances>.ComputeWeightBelow()
, and sampling edges below each clade in proportion to the relative cached weights of each edge's child node, in the same fashion as has been done for uniform sampling withSubtreeWeight<TreeCount>
. The way this works will be related to the solution to Sampling from the DAG with probabilities (and constraints) #28 and how we resolve the discussion on PR Uniform sampling #31The text was updated successfully, but these errors were encountered: