A short observation about per-displayed-tree-clvs #47
Lol. Turns out blob optimization is not the way to go with this new likelihood definition we are using. Instead, each node in the network needs to store 2^(number_of_reticulations_in_subnetwork_rooted_at_the_node) CLV vectors. I am currently working out the details (e.g., subnetwork-specific tree_ids that encode which reticulations were taken, combined via bitshifting), but the potential speedup looks promising :sunglasses:
If we had memory usage issues and needed a tunable in-between solution, we could keep multiple CLV vectors only at the megablob roots, recomputing all the other CLV vectors within a megablob on the fly while iterating over its trees. In that case, it would make sense to use Gray-code iteration order to minimize the number of CLV recomputations within a megablob. But with the low numbers of reticulations we are currently talking about, a still rather low number of taxa, and MSAs that are not insanely large, memory issues are not our priority right now.
Maybe we can even reduce the number of stored CLVs a bit more, by ignoring unreachable/dead areas (that is, areas where some inner node suddenly has no active children, due to how the active reticulations were chosen).
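A minimal sketch of the bookkeeping described above, assuming hypothetical helper names (none of these are actual NetRAX functions): the per-node CLV count of 2^k, a tree_id built as a bitmask of reticulation choices combined via bitshifting, and binary-reflected Gray code for the iteration order.

```cpp
#include <cassert>
#include <cstdint>

// Each node stores 2^k CLV vectors, where k is the number of
// reticulations in the subnetwork rooted at that node.
uint64_t clvs_per_node(unsigned reticulations_in_subnetwork) {
    return uint64_t{1} << reticulations_in_subnetwork;
}

// Subnetwork-specific tree_id: bit i encodes which parent was taken at
// the i-th reticulation below the node. Children's tree_ids are combined
// by shifting, so the bits of disjoint subnetworks never collide.
uint64_t combine_tree_ids(uint64_t left_id, uint64_t right_id,
                          unsigned right_reticulation_count) {
    return (left_id << right_reticulation_count) | right_id;
}

// Binary-reflected Gray code: consecutive indices differ in exactly one
// bit, i.e. consecutive displayed trees differ in a single reticulation
// choice, which minimizes CLV recomputation within a megablob.
uint64_t gray_code(uint64_t i) { return i ^ (i >> 1); }
```

The Gray-code order only matters for the recompute-on-the-fly fallback; with full per-node CLV storage, all 2^k vectors are kept anyway.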
But even if memory usage ends up being an issue in the future, the blobs per se do not make much sense anymore.
RIP blobs and Gray-code stuff... well, they still make sense for the "wrong" likelihood definition. Maybe we can use the wrong one as a heuristic during horizontal search, to speed things up a little...
In order to avoid pointer craziness, I will modify pll_update_partials in my libpll2 fork: instead of giving it a long list of operations, I will always pass just a single pll_operation_t, and I will specify which CLV vectors and scale buffers to use (for parent, left, and right) via the function call, thus entirely avoiding the use of partition->clv and partition->scale_buffer. I will also modify pll_compute_root_loglikelihood accordingly.
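A toy sketch of that single-operation calling convention. The types and the kernel here are placeholders, not the real libpll2 fork's API; the point is only the call shape: the caller hands over the concrete parent/left/right buffers directly, so nothing is resolved through partition->clv or partition->scale_buffer indices.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder partition type; the real pll_partition_t carries models,
// transition matrices, scale buffers, etc.
struct toy_partition { std::size_t sites; };

// Placeholder kernel: real code would apply the per-branch transition
// matrices and handle scalers; here we just combine the two child CLVs
// elementwise to illustrate passing buffers explicitly.
void update_partial_single(const toy_partition& p,
                           double* parent_clv,
                           const double* left_clv,
                           const double* right_clv) {
    for (std::size_t i = 0; i < p.sites; ++i)
        parent_clv[i] = left_clv[i] * right_clv[i];
}
```

With this shape, each network node can own many CLV buffers (one per displayed subtree) and pick the right one per call, which is exactly what the per-displayed-tree storage needs.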
The speedup potential of this is HUGE, as it will give us back true incremental loglikelihood computation in networks!!! :-) (Also, I already have an idea for a very ad-hoc pseudolikelihood function that's quick to compute and easy to integrate -> the one I wanted to try from the beginning... it apparently makes no biological sense, but maybe it serves as a good speedup heuristic for identifying/pre-scoring promising move candidates.)
That all sounds very promising and makes sense.
…On 28.02.21 13:40, Sarah Lutteropp wrote:
--
Alexandros (Alexis) Stamatakis
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
www.exelixis-lab.org
Done! It works nicely and is ready to be integrated into the master branch. 😎
Naively, one would store
number_of_network_nodes * number_of_displayed_trees
CLV vectors. However, there are regions in the network that are exactly the same across multiple displayed trees. Those regions can be identified with the blob optimization trick. One can actually save quite some memory and computation there by cleverly reusing CLVs / sharing them among multiple displayed trees. It will be tricky to make this work together with dynamic changes (network moves) to the network data structure.