
Parameter-by-parameter continuation using featurizing #69

Closed · wants to merge 6 commits

Conversation


KalelR commented May 30, 2023

So far the only continuation method using featurizing is the global pool version, based on MCBB. But there's no reason why we can't do a parameter-by-parameter continuation like with the recurrences method. This pull request implements this method, based heavily on the code for the recurrences. There are very few changes needed to do this. The idea is that:

  1. You find the attractors and fractions via basins_fractions using AttractorsViaFeaturizing. Then you extract the attractors by re-integrating one initial condition belonging to each cluster you found.
  2. Take ics from the previous attractors and use them as additional seeds for the subsequent parameter value. This is the biggest difference from the recurrences version. For featurizing, these additional ics need to be put together with the normally sampled ics and clustered together. What I did was to add an additional_ics keyword to basins_fractions, which receives these ics, extracts their features, and adds these additional features to the normal features Dataset. This was the least disruptive way I found. Note that, to ensure that the ic from each attractor will be characterized as an attractor, it is repeated min_neighbors times (see the sketch after this list).
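Roughly, one step of the proposed continuation looks like this (the additional_ics keyword is what this PR adds; prev_attractors holds the attractors found at the previous parameter value, and the seed construction is just illustrative):

    # seed one ic from each attractor found at the previous parameter value,
    # repeating it min_neighbors times so that the clustering labels the seeds as a cluster
    seeds = Dataset(reduce(vcat, [fill(A[1], min_neighbors) for A in values(prev_attractors)]))
    # the seeds are featurized and grouped together with the normally sampled ics
    fs, labels = basins_fractions(mapper, ics; additional_ics = seeds)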

What I did was:

  1. Extend the extract_attractors function for when ics is a sampler function. For this, the sampler function is deepcopied; one copy is used to extract features and the other to regenerate the attractors (sketched below).
  2. Modify the seeds_from_attractor function for the featurizing case.
  3. Create FeaturizingFindAndMatch and a new continuation function in the continuation_featurizing_findandmatch.jl file.
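A minimal illustration of the deepcopy idea in point 1 (sampler is any sampler closure, e.g. one returned by statespace_sampler; the loops are schematic):

    sampler_for_features = deepcopy(sampler)    # consumed while featurizing the sampled ics
    sampler_for_attractors = deepcopy(sampler)  # replayed later to re-integrate those ics
    ics = [sampler_for_features() for _ in 1:N]
    ics_replayed = [sampler_for_attractors() for _ in 1:N]
    # ics == ics_replayed as long as both copies are called in the same (sequential) order,
    # which is what allows the attractor behind each found cluster to be regenerated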

Note that the changes are small enough that we can unify both methods in the same function, reducing the duplication. I'm opening the pull request to let you know about this and hear your insights. The code design can still be improved for sure!

I have tested this for Lorenz and it seems to work, but I'd still like to test more before merging.

KalelR requested review from awage and Datseris, May 30, 2023 14:13
@codecov-commenter

Codecov Report

Merging #69 (6519ff0) into main (6a4fd8f) will decrease coverage by 1.95%.
The diff coverage is 31.03%.

@@            Coverage Diff             @@
##             main      #69      +/-   ##
==========================================
- Coverage   68.47%   66.52%   -1.95%     
==========================================
  Files          21       22       +1     
  Lines        1126     1180      +54     
==========================================
+ Hits          771      785      +14     
- Misses        355      395      +40     
Impacted Files                                           Coverage Δ
...tinuation/continuation_featurizing_findandmatch.jl    0.00% <0.00%> (ø)
src/mapping/attractor_mapping_featurizing.jl             67.53% <90.00%> (+5.23%) ⬆️


…gh featurizing; I realized that the ics cannot be re-generated consistently from the sampler if the ics are originally generated in a threaded loop - the uncertain order of the threads makes the access to the sampler unpredictable, and so impossible to re-generate

KalelR commented Jun 1, 2023

Part of the code in this PR added support for extracting attractors in the Featurizing case when ics is a sampler function, not a pre-generated Dataset. The nice part would be saving memory, since we wouldn't need to keep lots of unnecessary initial conditions in memory.

But I just realized that this can't be done when the feature extraction is parallelized: the initial conditions can't be re-generated from the sampler function if they were originally generated in a threaded loop. The uncertain ordering of the threads makes the accesses to the sampler unpredictable, so the same sequence of ics cannot be re-generated.
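A sketch of the issue (sampler and feature_of are illustrative stand-ins, not package API):

    features = Vector{Vector{Float64}}(undef, N)
    Threads.@threads for i in 1:N
        ic = sampler()               # which ic lands at index i depends on thread scheduling
        features[i] = feature_of(ic) # integrate the ic and featurize the trajectory
    end
    # replaying a deepcopy of `sampler` sequentially yields the ics in a fixed order,
    # but that order no longer corresponds to the indices of `features`,
    # so the ic that produced a given cluster's features cannot be recovered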

Since parallelizing is in general much more useful than saving memory, I am deleting these attempts. Therefore, the continuation needs to be done by giving a Dataset with the pre-generated initial conditions, so that the attractors can be regenerated. This is working fine in the tests I'm doing.

Review thread on this excerpt from the new continuation function:

    fs, _ = basins_fractions(mapper, ics;
        show_progress = false, N = samples_per_parameter, additional_ics
    )
    current_attractors = mapper.attractors
Contributor:

We have the extract_attractors backend. The method dispatches on the mapper type.

Contributor Author:

I'll change it!

Review thread on this excerpt of the matching step:

    # If there are any attractors,
    # match with previous attractors before storing anything!
    rmap = match_attractor_ids!(
        current_attractors, prev_attractors; distance, threshold
    )
Contributor:

In this method you compare the distance between the trajectories that span the attractors. Could we instead compute the distance between the clusters of points in feature space? Would that be more reliable than matching trajectories?

Contributor Author:

So matching by comparing distances between the features, instead of the attractors? It's a nice idea. As I understand it, this could be done by the user via the distance function, right? Receive the attractors, extract their features, and compute the distance.

One advantage of matching directly by the features is that one wouldn't need to reconstruct the attractors, and so could still use `ics` as a sampler function. But I think the generality of having the attractors (generated by having `ics` as a dataset) outweighs the memory savings of having `ics` as a function. What do you think?

Member (@Datseris, Jun 1, 2023):

> So matching by comparing distances between the features, instead of attractors? It's a nice idea.

@KalelR that's what I was trying to tell you in person in Portland... That you convert the clusters into StateSpaceSet and then call the match_attractor_ids! function on the clusters themselves. The function doesn't know what an attractor is; it could be an actual attractor or any StateSpaceSet.
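For instance, assuming the features and their cluster labels are at hand (the dictionary construction is illustrative, not existing API):

    # collect the feature vectors of each cluster into a StateSpaceSet
    current_clusters = Dict(
        label => StateSpaceSet(features[labels .== label])
        for label in unique(labels) if label != -1  # -1 labels unclustered features
    )
    # and match them exactly as if they were attractors, with `prev_clusters`
    # being the same construction at the previous parameter value
    rmap = match_attractor_ids!(current_clusters, prev_clusters; distance, threshold)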

Member:

With this matching process, you completely alleviate the need to re-construct the attractors and to store/find an initial condition that goes to each attractor. Yet you still keep the slice-by-slice continuation in featurizing that you want.

Contributor:

You can match a subset of the cluster to avoid a memory bottleneck.

Member:

> point is that, if you reconstruct the attractors, you can still do that (just extract the features again) but also much more (simple example being Euclidean distance).

Yes, I understand this. What I am not sure about is whether there is a benefit in mixing the two approaches in this way. The benefit of the recurrences approach is its rigor: you have a guarantee that you find an attractor, while for featurizing cluster != attractor.

> do you guys think that this is not worth it? That if the attractors have been identified by clustering the features, then they should also be matched by comparing the features?

I am suggesting that; however, I would like first to see results of matching directly in the feature space before converting to attractors. I think this is the best option. But I could be wrong, and what you propose, to go from feature space to attractor space, then match, and then go back again, could be a better approach. It is much more complicated though, both logically and in terms of code. So I would still argue we should first try parameter-by-parameter matching directly in the feature space, simply by considering each cluster / group-of-features as a StateSpaceSet and then literally calling the match_attractor_ids! function.

Member:

p.s.: Sorry for the late reply.

Contributor:

Is there a metric to measure how good the matching between two clusters of features is? Distance correlation would be an example.

Member:

good idea!

Member:

Actually, I'm not sure, because this measure is invariant w.r.t. the absolute position of the clusters, while for us this is crucial. If one cluster is identical to the other but shifted by 10 in each dimension, these should be totally different clusters / attractors.

@Datseris:

bump @KalelR

1 similar comment


KalelR commented Sep 27, 2023

> bump @KalelR

Hey, yes! I'll work on this and the other related projects this weekend/next week. Will update this soon :)


KalelR commented Oct 2, 2023

Hey guys, sorry for taking so long to come back to this. I still have some questions I'd like to hear your point of view on before proceeding. I'm asking these questions because I'm thinking of my use cases, but maybe there are others I haven't considered. To recap, the continuation should:

  1. Find the attractors and fractions via basins_fractions using AttractorsViaFeaturizing.
  2. Match attractors or clouds of features
  3. Repeat 1-2, using ics from previous attractors as additional seeds for the subsequent parameter value.

The matching can be done in 3 ways:

  1. Comparing the clouds of features
  2. Comparing the attractors directly. This requires that an initial condition with a distinct label be integrated and considered as an attractor. This is the default procedure we've always done, and is indeed done by default in basins_fractions when ics is a pre-generated dataset. In this sense, the continuation function already has access to the attractors, no modification needed.
  3. Compare features from the attractors (a specific case of 2). One feature per attractor, not a cloud.

My main question is: what is the point of doing matching 1 (via clouds of features)? The arguments as I see them are:

  1. No need to re-construct the attractors. True, but basins_fractions already does this for pre-generated ics. For sampler() ics this would be an advantage. But I don't see the real gain just from this.
  2. Simplicity. The clustering is done on features, so the matching also should be. This is the strongest point in my view. But why should it then be done for the cloud of features, instead of a single feature (as in matching 3)?
  3. Featurizing does not guarantee that each cloud of features corresponds to an attractor. This is true, but then what is the point of doing continuation if one does not trust the single-parameter algorithm to start with? If we are doing attractor continuation, we should first make sure that Featurizing is finding real attractors anyway. Would there be another use case in which the user does not trust the Featurizing algorithm? Also, in deterministic systems, decent features should converge when on the same attractor. So clouds of features, in the sense of features centered around an average but with some deviation, are undesirable, a result of short integration times or bad features. Why would we choose to use the clouds then, instead of the centroid or some single feature from the cloud?

Further, say we choose to use the cloud of features. As Alex said, using all of them might take too much memory, or be too slow. Then we would need to select a few. How do we do this in an objective way?


Datseris commented Oct 8, 2023

Hey,

sorry for taking long to reply! So, here is how I see things:

> 1. Comparing the clouds of features
> 3. Compare features from the attractors (a specific case of 2). One feature per attractor, not a cloud.

Heh, but here is where I see things differently. This is one way to get "one feature per attractor". But another one is to take the centroid of the features. Which is a natural usage of the set_distance function: it has Centroid out of the box. No need to find the attractor, re-evolve an initial condition on the attractor, and then estimate the "features of the attractor". The centroid should do a good enough job by itself and should be close to the "features of the attractor".
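For example, assuming the two clouds of features are stored as StateSpaceSets:

    d = set_distance(cloud_a, cloud_b, Centroid())  # distance between the clusters' centroids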

That is why, all the way from the start, I was arguing in favor of matching directly on the cloud of points. Maybe I wasn't making myself clear, but thankfully you reached the same conclusion.

So here is how I see it going:

  1. continuation happens slice by slice. Hence, we need a new continuation method. Something like FeaturizeGroupParameterByParameter (perhaps a better name will come up).
  2. The source code of this new continuation is very similar, almost identical, to RecurrencesFindAndMatch. The only difference is that instead of calling the setsofsets_distances function with attractors, we call it with clouds of features. Both attractors and clouds of features are StateSpaceSets.

That's it. You have to write very little source code with this approach.
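A rough sketch of that loop (clouds_of_features is a hypothetical helper that groups the stored features by cluster label; nothing here is final API):

    fractions_curves = []
    prev_clouds = nothing
    for p in prange
        set_parameter!(ds, pidx, p)
        fs, labels = basins_fractions(mapper, ics)
        current_clouds = clouds_of_features(mapper, labels)  # Dict(label => StateSpaceSet of features)
        if !isnothing(prev_clouds)
            # same matching machinery as RecurrencesFindAndMatch, applied to the feature clouds
            rmap = match_attractor_ids!(current_clouds, prev_clouds; distance, threshold)
            fs = Dict(get(rmap, k, k) => v for (k, v) in fs)  # relabel the fractions accordingly
        end
        prev_clouds = current_clouds
        push!(fractions_curves, fs)
    end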

(Hence, I would argue, if you go with this approach, please start a new pull request).


What remains to be discussed is whether we believe there is any gain in the functionality of this PR that "extracts" attractors from the grouping_featurizing method. I am not sure about that. At the moment, what does the function extract_attractors do when an AttractorsViaFeaturizing is given as input?


Datseris commented Oct 8, 2023

> Then we would need to select a few. How do we do this in an objective way?

Using the sample function. We already do this in the clustering method by subsampling features. Here you just need to make sure to sample per cluster. So each cluster provides 10% of its features.
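A sketch of that, assuming the clusters are stored as a Dict of StateSpaceSets (sample is StatsBase's; 10% is just the example ratio):

    using StatsBase: sample

    subsampled = Dict(
        label => cloud[sample(1:length(cloud), max(1, round(Int, 0.1 * length(cloud))); replace = false)]
        for (label, cloud) in clouds
    )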


Datseris commented Dec 4, 2023

bump @KalelR

1 similar comment

@Datseris:

(also probably better to start a clean PR)


Datseris commented Aug 8, 2024

See AttractorsSeedContinueMatch global continuation.

Datseris deleted the continuation_featurizing branch August 8, 2024 10:15