Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attractor detection refactoring #116

Merged
merged 18 commits into from
Apr 18, 2024
Merged

Attractor detection refactoring #116

merged 18 commits into from
Apr 18, 2024

Conversation

daemontus
Copy link
Collaborator

@daemontus daemontus commented Apr 11, 2024

This PR proposes a major refactoring/update of the attractor detection mechanism to make it more practical on large networks.

I have slightly updated the vocabulary in the code:

  • "Attractor candidate states" are those that cover every attractor, but can contain duplicates or spurious candidates.
  • "Attractor seed states" should correspond one-to-one with network attractors.
  • "Attractor set" is a complete symbolic set of states that represents one attractor.

Highlights:

  • New mechanism for dynamically optimizing the "retained set" to achieve smaller number of candidate states.
    • This can potentially spend "too much time" on the reatined set optimization, so in some cases it might be beneficial to disable it and hope that the simulation step can deal with all the candidates. But fundamentally, the initial candidate set can be sometimes extremely large and there is no other way to solve this than tuning the retained set, hence this is enabled by default.
  • Major improvements in the simulation reduction of the candidate states.
    • This mostly uses new code that is faster because it uses simpler operations and thus we can simulate much longer sequences of states. For models where symbolic reachability is fast, this could just be skipped, but overall there is very little reason to disable this. If the initial candidate set is reasonable, simulation should not take too long.
  • pint is now an "optional" dependency, because the first two optimizations make it largely redundant.
    • In my testing, the results obtained by pint could eliminate a few extra candidate states, but for the most part, cases where pint was used would have to go to symbolic reachability anyway. Hence I modified the code in such a way that pint can still be used, but is not mandatory and does not have to be installed.
  • mole is fully replaced with symbolic reachability in AEON.
    • Overall, only ~10 models actually get to this step (everything else is solved through the first two optimization steps). Out of those, three then fail with a 2h timeout (i.e. there is no out of memory error).
    • This method can also output the full symbolic attractor set, because that is the ultimate "proof" that the seed is unique in a specific attractor. So I have added the option to compute the symbolic set in full if desired.
  • I added an explicit limit on the number of stable motifs that can be expanded in a single node (100_000). This usually happens if the network has a lot of free inputs and is easier to explain than a generic "out of memory" error that would be produced otherwise. The limit can be changed using a constant if needed.
  • For similar reasons, I added a limit on the number of attractor candidate states that will be considered during attractor detection, because if the set exploded, you typically only got an out of memory error with no explanation.

Performance results for "seed" state detection:

  • We can compute attractors for almost all the networks in the balm-analysis repository, assuming we can build the succession diagram for them.
  • For BBM, the only issue are models 004 and 079. Here, we cannot build the full SD, but we can build some partial SD and find attractors there. This works for 079, but is still a problem for 004, because it has a lot of candidate states. It is possible it could be computed with SCC decomposition with sufficient timeout. (Model 122 is still permanently excluded because it has a gigantic PN encoding).
  • For the random_nk2 and random_nk3 models, we can compute all of them without major issues.
  • For the balm-new-real-models, we can compute everything that doesn't have too many inputs (there, we can't expand the root node so it just crashes). We can fix those inputs to some random values and then we can compute everything AFAIK.
  • For the random models, there is one model where even the new optimization cannot produce a retained set with fewer than 1M candidate states. Other than that, all models reduce to a handful of candidate states (some take a few minutes to get there though). Out of these, three fail on symbolic reachability due to timeout, the rest finishes.

Full attractor results (first is the new version, second is the old version, the random pass on the old version failed before the end due to unrelated reasons, but I think the picture is clear) full-attr.xlsx. There is one notable regression in the BBM dataset (ID195): here, the attractor search is very very simple and it actually isn't worth it to even compute the percolated network for each trap space, that's why it is taking longer now. But the actually hard instances like 002 or 211 do seem to benefit quite a bit, not just the random models.

Other relevant changes:

  • Removed pypint from requirements.txt and marked it as optional in pyproject.toml.
  • Added a new private package _sd_attractors for most of the new (and some of the old) code. Subsequently, the motif_avoidant.py module has been removed and its functionality is now mostly in the _sd_attractors module. Similarly, _sd_algorithms/compute_attractor_seeds.py is now in _sd_attractors.
  • Some of the motif_avoidant.py code also moved to _pint_reachability.py, which is now only responsible for the pint code, but hasn't changed much (I added a limit on the pint goal size because it was failing with more than a handful of candidate states).
  • Added keys percolated_network, percolated_petri_net, and percolated_nfvs to NodeData. These are computed lazily and can be removed either using SuccessionDiagram.reclaim_node_data() or by manually setting them to None. Each of these has a corresponding "compute" function in the succession diagram as well.
  • I manually release these cached values in a few places where it seems like they won't be needed anymore (it is not a problem if the SD is small, but keeping them around in SDs with 10.000+ nodes certainly seems wasteful). For now, I propose to keep this as is, and then create an issue that we need to address this more systematically if this ever becomes a problem (e.g. some sort of size limit after which we start reclaiming these aggressively).
  • Added keys for attractor_candidates, attractor_seeds and attractor_sets to NodeData. Here, if attractor_candidates is unknown and attractor_seeds is known, attractor_seeds can be used instead. Also, computing one can subsequently populate the others if the result is valid (e.g. computing a single candidate state also uses it as a seed state immediately).
  • A new symbolic_utils function which converts a BddValuation to a BooleanSpace. This is used in the new simulation code.
  • tests/motif_avoidant_test is now ignored and should be replaced with better tests for the new attractor detection pipeline.
  • Enabled --diff option in the CI command for black, because it gave me different results on my machine and I needed to compare them. Could be useful in the future too :)

TODOs:

  • Improve test coverage of the new code.
  • Update the benchmark folder since not all scripts there are currently working (or just move everything to balm-analysis).

Copy link

github-actions bot commented Apr 11, 2024

Coverage

Coverage Report
FileStmtsMissCoverMissing
biobalm
   _pint_reachability.py615018%24, 40–54, 69–93, 101–146
   control.py1141488%107, 119, 125, 129, 134, 143–159, 477, 480, 493
   interaction_graph_utils.py38489%11–13, 151–152
   petri_net_translation.py1491193%22–26, 79, 136, 308–309, 333–334, 343, 452
   space_utils.py1321688%26–28, 104–110, 133–139, 414, 462
   succession_diagram.py3816683%6, 118, 207–212, 225, 272–279, 383–390, 407–408, 418, 424, 540, 627–633, 749, 752, 870–888, 920, 930, 933, 970, 973, 980, 1031, 1045, 1125, 1308, 1319, 1327, 1355, 1370, 1382, 1387, 1393
   symbolic_utils.py32584%10, 39–44, 100, 128
   trappist_core.py1832785%14–18, 55, 57, 92, 215, 217, 219, 247–250, 254–256, 276–282, 340, 342, 372, 420, 422
biobalm/_sd_algorithms
   expand_attractor_seeds.py60788%6, 28, 42, 109–114, 119
   expand_bfs.py28196%6
   expand_dfs.py30197%6
   expand_minimal_spaces.py37295%6, 31
   expand_source_SCCs.py122497%14–16, 88, 133
   expand_to_target.py31390%6, 38, 43
biobalm/_sd_attractors
   attractor_candidates.py2638966%13–15, 26–27, 93, 101, 107–108, 130, 152, 187, 193–204, 223, 239–316, 321, 325, 331, 337, 348, 372, 377, 381, 387, 389–427, 500, 571–572, 673
   attractor_symbolic.py1141686%6–7, 75, 88–92, 103, 112, 144, 179, 191–193, 202, 230, 236
TOTAL186031683% 

Tests Skipped Failures Errors Time
357 0 💤 0 ❌ 0 🔥 44.766s ⏱️

@jcrozum
Copy link
Owner

jcrozum commented Apr 11, 2024

@daemontus any thoughts on how to reconcile this with #115? I'm thinking merge this first, then revert #115 to before the package rename and merge that, then make an entirely new pr just for the renaming. Not sure if there's a better way.

@daemontus
Copy link
Collaborator Author

I just tried to rebase it locally from #115 and it seems that we can first do #115 and then I'll just update this PR with the new name. Interestingly, other than the the newly created files/functions, most of the changes merged without conflicts... so it should be an easy fix :)

@daemontus daemontus requested a review from jcrozum April 11, 2024 15:58
@daemontus
Copy link
Collaborator Author

This should now be fully rebased with the latest name change. (bcc: @jcrozum)

Copy link
Owner

@jcrozum jcrozum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor comments. Feel free to address them or ignore them as you see fit. I'd be ok merging as-is.

.github/workflows/test.yml Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
biobalm/_pint_reachability.py Outdated Show resolved Hide resolved
biobalm/_pint_reachability.py Outdated Show resolved Hide resolved
biobalm/_sd_attractors/attractor_candidates.py Outdated Show resolved Hide resolved
biobalm/_sd_attractors/attractor_candidates.py Outdated Show resolved Hide resolved
biobalm/_sd_attractors/attractor_candidates.py Outdated Show resolved Hide resolved
biobalm/succession_diagram.py Outdated Show resolved Hide resolved
@daemontus
Copy link
Collaborator Author

@jcrozum I think I resolved the issue with the "constant configuration values": SuccessionDiagram now has a configuration dictionary that stores all of these and you can pass it as a constructor parameter. As such, if you serialize the diagram, the "configuration" is preserved. Furthermore, these are no longer global module "constants" so we don't have to worry about users needing to access/modify them.

Please have a quick look if that looks reasonable to you, and if yes, feel free to merge the PR.

@jcrozum
Copy link
Owner

jcrozum commented Apr 18, 2024

Looks good, I'll merge it.

@jcrozum jcrozum merged commit 0438418 into main Apr 18, 2024
8 checks passed
@daemontus daemontus deleted the attractor-benchmarks branch September 18, 2024 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants