Attractor detection refactoring #116

daemontus · 2024-04-11T07:37:33Z

This PR proposes a major refactoring/update of the attractor detection mechanism to make it more practical on large networks.

I have slightly updated the vocabulary in the code:

"Attractor candidate states" are those that cover every attractor, but can contain duplicates or spurious candidates.
"Attractor seed states" should correspond one-to-one with network attractors.
"Attractor set" is a complete symbolic set of states that represents one attractor.

Highlights:

New mechanism for dynamically optimizing the "retained set" to achieve smaller number of candidate states.
- This can potentially spend "too much time" on the reatined set optimization, so in some cases it might be beneficial to disable it and hope that the simulation step can deal with all the candidates. But fundamentally, the initial candidate set can be sometimes extremely large and there is no other way to solve this than tuning the retained set, hence this is enabled by default.
Major improvements in the simulation reduction of the candidate states.
- This mostly uses new code that is faster because it uses simpler operations and thus we can simulate much longer sequences of states. For models where symbolic reachability is fast, this could just be skipped, but overall there is very little reason to disable this. If the initial candidate set is reasonable, simulation should not take too long.
pint is now an "optional" dependency, because the first two optimizations make it largely redundant.
- In my testing, the results obtained by pint could eliminate a few extra candidate states, but for the most part, cases where pint was used would have to go to symbolic reachability anyway. Hence I modified the code in such a way that pint can still be used, but is not mandatory and does not have to be installed.
mole is fully replaced with symbolic reachability in AEON.
- Overall, only ~10 models actually get to this step (everything else is solved through the first two optimization steps). Out of those, three then fail with a 2h timeout (i.e. there is no out of memory error).
- This method can also output the full symbolic attractor set, because that is the ultimate "proof" that the seed is unique in a specific attractor. So I have added the option to compute the symbolic set in full if desired.
I added an explicit limit on the number of stable motifs that can be expanded in a single node (100_000). This usually happens if the network has a lot of free inputs and is easier to explain than a generic "out of memory" error that would be produced otherwise. The limit can be changed using a constant if needed.
For similar reasons, I added a limit on the number of attractor candidate states that will be considered during attractor detection, because if the set exploded, you typically only got an out of memory error with no explanation.

Performance results for "seed" state detection:

We can compute attractors for almost all the networks in the balm-analysis repository, assuming we can build the succession diagram for them.
For BBM, the only issue are models 004 and 079. Here, we cannot build the full SD, but we can build some partial SD and find attractors there. This works for 079, but is still a problem for 004, because it has a lot of candidate states. It is possible it could be computed with SCC decomposition with sufficient timeout. (Model 122 is still permanently excluded because it has a gigantic PN encoding).
For the random_nk2 and random_nk3 models, we can compute all of them without major issues.
For the balm-new-real-models, we can compute everything that doesn't have too many inputs (there, we can't expand the root node so it just crashes). We can fix those inputs to some random values and then we can compute everything AFAIK.
For the random models, there is one model where even the new optimization cannot produce a retained set with fewer than 1M candidate states. Other than that, all models reduce to a handful of candidate states (some take a few minutes to get there though). Out of these, three fail on symbolic reachability due to timeout, the rest finishes.

Full attractor results (first is the new version, second is the old version, the random pass on the old version failed before the end due to unrelated reasons, but I think the picture is clear) full-attr.xlsx. There is one notable regression in the BBM dataset (ID195): here, the attractor search is very very simple and it actually isn't worth it to even compute the percolated network for each trap space, that's why it is taking longer now. But the actually hard instances like 002 or 211 do seem to benefit quite a bit, not just the random models.

Other relevant changes:

Removed pypint from requirements.txt and marked it as optional in pyproject.toml.
Added a new private package _sd_attractors for most of the new (and some of the old) code. Subsequently, the motif_avoidant.py module has been removed and its functionality is now mostly in the _sd_attractors module. Similarly, _sd_algorithms/compute_attractor_seeds.py is now in _sd_attractors.
Some of the motif_avoidant.py code also moved to _pint_reachability.py, which is now only responsible for the pint code, but hasn't changed much (I added a limit on the pint goal size because it was failing with more than a handful of candidate states).
Added keys percolated_network, percolated_petri_net, and percolated_nfvs to NodeData. These are computed lazily and can be removed either using SuccessionDiagram.reclaim_node_data() or by manually setting them to None. Each of these has a corresponding "compute" function in the succession diagram as well.
I manually release these cached values in a few places where it seems like they won't be needed anymore (it is not a problem if the SD is small, but keeping them around in SDs with 10.000+ nodes certainly seems wasteful). For now, I propose to keep this as is, and then create an issue that we need to address this more systematically if this ever becomes a problem (e.g. some sort of size limit after which we start reclaiming these aggressively).
Added keys for attractor_candidates, attractor_seeds and attractor_sets to NodeData. Here, if attractor_candidates is unknown and attractor_seeds is known, attractor_seeds can be used instead. Also, computing one can subsequently populate the others if the result is valid (e.g. computing a single candidate state also uses it as a seed state immediately).
A new symbolic_utils function which converts a BddValuation to a BooleanSpace. This is used in the new simulation code.
tests/motif_avoidant_test is now ignored and should be replaced with better tests for the new attractor detection pipeline.
Enabled --diff option in the CI command for black, because it gave me different results on my machine and I needed to compare them. Could be useful in the future too :)

TODOs:

Improve test coverage of the new code.
Update the benchmark folder since not all scripts there are currently working (or just move everything to balm-analysis).

github-actions · 2024-04-11T07:47:48Z

Coverage Report

File	Stmts	Miss	Cover	Missing
biobalm
_pint_reachability.py	61	50	18%	24, 40–54, 69–93, 101–146
control.py	114	14	88%	107, 119, 125, 129, 134, 143–159, 477, 480, 493
interaction_graph_utils.py	38	4	89%	11–13, 151–152
petri_net_translation.py	149	11	93%	22–26, 79, 136, 308–309, 333–334, 343, 452
space_utils.py	132	16	88%	26–28, 104–110, 133–139, 414, 462
succession_diagram.py	381	66	83%	6, 118, 207–212, 225, 272–279, 383–390, 407–408, 418, 424, 540, 627–633, 749, 752, 870–888, 920, 930, 933, 970, 973, 980, 1031, 1045, 1125, 1308, 1319, 1327, 1355, 1370, 1382, 1387, 1393
symbolic_utils.py	32	5	84%	10, 39–44, 100, 128
trappist_core.py	183	27	85%	14–18, 55, 57, 92, 215, 217, 219, 247–250, 254–256, 276–282, 340, 342, 372, 420, 422
biobalm/_sd_algorithms
expand_attractor_seeds.py	60	7	88%	6, 28, 42, 109–114, 119
expand_bfs.py	28	1	96%	6
expand_dfs.py	30	1	97%	6
expand_minimal_spaces.py	37	2	95%	6, 31
expand_source_SCCs.py	122	4	97%	14–16, 88, 133
expand_to_target.py	31	3	90%	6, 38, 43
biobalm/_sd_attractors
attractor_candidates.py	263	89	66%	13–15, 26–27, 93, 101, 107–108, 130, 152, 187, 193–204, 223, 239–316, 321, 325, 331, 337, 348, 372, 377, 381, 387, 389–427, 500, 571–572, 673
attractor_symbolic.py	114	16	86%	6–7, 75, 88–92, 103, 112, 144, 179, 191–193, 202, 230, 236
TOTAL	1860	316	83%

Tests	Skipped	Failures	Errors	Time
357	0 💤	0 ❌	0 🔥	44.766s ⏱️

jcrozum · 2024-04-11T11:19:57Z

@daemontus any thoughts on how to reconcile this with #115? I'm thinking merge this first, then revert #115 to before the package rename and merge that, then make an entirely new pr just for the renaming. Not sure if there's a better way.

daemontus · 2024-04-11T11:45:18Z

I just tried to rebase it locally from #115 and it seems that we can first do #115 and then I'll just update this PR with the new name. Interestingly, other than the the newly created files/functions, most of the changes merged without conflicts... so it should be an easy fix :)

daemontus · 2024-04-11T18:43:03Z

This should now be fully rebased with the latest name change. (bcc: @jcrozum)

jcrozum

Left a few minor comments. Feel free to address them or ignore them as you see fit. I'd be ok merging as-is.

.github/workflows/test.yml

requirements.txt

pyproject.toml

biobalm/_pint_reachability.py

biobalm/_sd_attractors/attractor_candidates.py

biobalm/succession_diagram.py

daemontus · 2024-04-18T15:16:07Z

@jcrozum I think I resolved the issue with the "constant configuration values": SuccessionDiagram now has a configuration dictionary that stores all of these and you can pass it as a constructor parameter. As such, if you serialize the diagram, the "configuration" is preserved. Furthermore, these are no longer global module "constants" so we don't have to worry about users needing to access/modify them.

Please have a quick look if that looks reasonable to you, and if yes, feel free to merge the PR.

jcrozum · 2024-04-18T18:34:39Z

Looks good, I'll merge it.

daemontus requested a review from jcrozum April 11, 2024 15:58

daemontus added 9 commits April 11, 2024 20:10

Update expand benchmarks and output the SD to separate folder.

e77867a

Do not create the output directory automatically.

3960bf6

Fix formatting.

1d5c05e

Refactoring of the algorithm detection methods.

f941482

Formatting.

ec83587

Print changes recommended by black.

5238692

Formatting.

b960619

Release memory sooner to avoid wasting it on temporary values.

bee0c11

Minor bug in seed-guided expansion.

1054b3d

daemontus force-pushed the attractor-benchmarks branch from 78bc7bd to 1054b3d Compare April 11, 2024 18:35

Formatting.

fea3c20

jcrozum reviewed Apr 17, 2024

View reviewed changes

daemontus added 8 commits April 18, 2024 10:39

Do not require mole in CI.

76d9879

Make pint optional in requirements.txt

5da11a7

Update version.

17fe0b8

Typing in pint_reachability.

9b6782d

Nicer printing.

beb614e

Fix documentation headings.

e7bb36c

New configuration approach.

7bc242d

Add config to other constructors.

9f6de06

jcrozum merged commit 0438418 into main Apr 18, 2024
8 checks passed

daemontus deleted the attractor-benchmarks branch September 18, 2024 10:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Attractor detection refactoring #116

Attractor detection refactoring #116

daemontus commented Apr 11, 2024 •

edited

Loading

github-actions bot commented Apr 11, 2024 •

edited

Loading

jcrozum commented Apr 11, 2024

daemontus commented Apr 11, 2024

daemontus commented Apr 11, 2024

jcrozum left a comment

daemontus commented Apr 18, 2024

jcrozum commented Apr 18, 2024

Attractor detection refactoring #116

Attractor detection refactoring #116

Conversation

daemontus commented Apr 11, 2024 • edited Loading

github-actions bot commented Apr 11, 2024 • edited Loading

jcrozum commented Apr 11, 2024

daemontus commented Apr 11, 2024

daemontus commented Apr 11, 2024

jcrozum left a comment

Choose a reason for hiding this comment

daemontus commented Apr 18, 2024

jcrozum commented Apr 18, 2024

daemontus commented Apr 11, 2024 •

edited

Loading

github-actions bot commented Apr 11, 2024 •

edited

Loading