Parameter-by-parameter continuation using featurizing #69
@@ -0,0 +1,106 @@
export FeaturizingFindAndMatch
import ProgressMeter
using Random: MersenneTwister

struct FeaturizingFindAndMatch{A, M, R<:Real, S, E} <: AttractorsBasinsContinuation
    mapper::A
    distance::M
    threshold::R
    seeds_from_attractor::S
    info_extraction::E
end
""" | ||
Very similar to the recurrences version, only difference being that the seeding from attractors is different. | ||
""" | ||
function FeaturizingFindAndMatch( | ||
mapper::AttractorsViaFeaturizing; distance = Centroid(), | ||
threshold = Inf, seeds_from_attractor = _default_seeding_process_featurizing, | ||
info_extraction = identity | ||
) | ||
return FeaturizingFindAndMatch( | ||
mapper, distance, threshold, seeds_from_attractor, info_extraction | ||
) | ||
end | ||
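For orientation, here is a minimal sketch of how such a continuation object might be constructed, assuming the usual Attractors.jl featurizing API; the Hénon system, the featurizer, and the clustering settings below are made up for illustration and are not part of this diff:

```julia
# Illustrative setup only (not from this PR): a Hénon map with a simple featurizer.
using Attractors
using StaticArrays: SVector

henon_rule(x, p, n) = SVector(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
ds = DiscreteDynamicalSystem(henon_rule, zeros(2), [1.4, 0.3])

# Feature vector of a trajectory `A` with time vector `t`.
featurizer(A, t) = [minimum(A[:, 1]), maximum(A[:, 2])]
group_config = GroupViaClustering(; min_neighbors = 10)
mapper = AttractorsViaFeaturizing(ds, featurizer, group_config)

# The continuation struct added by this PR, with its default matching options:
fam = FeaturizingFindAndMatch(mapper; distance = Centroid(), threshold = Inf)
```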

function _default_seeding_process_featurizing(attractor::AbstractStateSpaceSet, number_seeded_ics = 10; rng = MersenneTwister(1))
    # draw random points of the attractor; this might lead to repeated ics, which is intended for the continuation
    return [rand(rng, vec(attractor)) for _ in 1:number_seeded_ics]
end
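For concreteness, a tiny sketch of what this default seeding yields; the three-point "attractor" is made up, and `MersenneTwister` is assumed available from the import at the top of the file:

```julia
# Toy example: seed 3 initial conditions from a small stand-in "attractor".
attractor = Dataset([[0.0, 0.0], [0.1, 0.2], [0.2, 0.4]])
seeds = _default_seeding_process_featurizing(attractor, 3; rng = MersenneTwister(7))
# `seeds` is a Vector of 3 points drawn from `attractor`, possibly with repetition.
```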

"""
Continuation here is very similar to the one done with recurrences. The only difference is
in how the initial conditions from the previous attractors are seeded at the new parameter value.
Here, initial conditions are sampled from the previous attractors and passed to `basins_fractions`,
which extracts features from them and groups them together with the features of the other
initial conditions. This could be generalized so that a single function handles both mappers,
removing the code duplication.
"""
function continuation(
        fam::FeaturizingFindAndMatch,
        prange, pidx, ics;
        samples_per_parameter = 100, show_progress = true,
    )
    progress = ProgressMeter.Progress(length(prange);
        desc = "Continuating basins fractions:", enabled = show_progress
    )

    if ics isa Function
        error("`ics` needs to be a Dataset.")
    end

    (; mapper, distance, threshold) = fam
    reset!(mapper)
    # The first parameter is run in isolation, as it has no prior attractors to seed from.
    set_parameter!(mapper.ds, pidx, prange[1])
    fs, _ = basins_fractions(mapper, ics; show_progress = false, N = samples_per_parameter)
    # At each parameter `p`, a dictionary mapping attractor ID to fraction is created.
    fractions_curves = [fs]
    # Furthermore, some info about the attractors is stored and returned.
    prev_attractors = deepcopy(mapper.attractors)
    get_info = attractors -> Dict(
        k => fam.info_extraction(att) for (k, att) in attractors
    )
    info = get_info(prev_attractors)
    attractors_info = [info]
    ProgressMeter.next!(progress; showvalues = [("previous parameter", prange[1]),])
    # Continue the loop over all remaining parameters
    for p in prange[2:end]
        set_parameter!(mapper.ds, pidx, p)
        reset!(mapper)

        # Collect ics from the previous attractors to pass as additional ics to `basins_fractions` (seeding).
        # To ensure that the clustering identifies them as clusters, we guarantee that there
        # are at least `min_neighbors` entries per previous attractor.
        additional_ics = Dataset(vcat(map(att ->
            fam.seeds_from_attractor(att, fam.mapper.group_config.min_neighbors),
            values(prev_attractors))...)) # dataset with ics seeded from the previous attractors

        # Now perform basin fractions estimation as normal, utilizing found attractors
        fs, _ = basins_fractions(mapper, ics;
            show_progress = false, N = samples_per_parameter, additional_ics
        )
        current_attractors = mapper.attractors
        if !isempty(current_attractors) && !isempty(prev_attractors)
            # If there are any attractors,
            # match with previous attractors before storing anything!
            rmap = match_attractor_ids!(
                current_attractors, prev_attractors; distance, threshold
            )

Review thread on the `match_attractor_ids!` call:

In this method you compare the distance between trajectories that span the attractors. Can we compute the distance between each cluster of points in the feature space? Is it more reliable than matching trajectories?

So matching by comparing distances between the features, instead of between the attractors? It's a nice idea. As I understand, this could be done by the user via the […]. One advantage of matching directly by the features is that one wouldn't need to reconstruct the attractors, and so could still use […].

@KalelR that's what I was trying to tell you in person in Portland: that you convert the clusters into […]. With this matching process, you completely alleviate the need to reconstruct the attractors and to store/find an initial condition that goes to each attractor. Yet you still keep the slice-by-slice continuation in featurizing that you want. You can match a subset of the cluster to avoid a memory bottleneck.

Yes, I understand this. What I am not sure about is whether there is a benefit in mixing the two approaches in this way. The benefit of the recurrence approach is its rigorousness: you have a guarantee that you find an attractor, while for featurizing cluster != attractor.

That is what I am suggesting; however, I would like first to see results of matching directly on the feature space before converting to attractors. I think this is the best option. But I could be wrong, and what you propose (to go from feature space to attractor space, then match, and then go back again) could be a better approach. It is much more complicated though, both logically and in terms of code. So I would still argue we should first try parameter-by-parameter matching directly on the feature space, simply by considering each cluster / group of features as a StateSpaceSet and then literally calling the […]. (p.s.: Sorry for the late reply.)

Is there a metric to measure how good the matching between two clusters of features is? For example, distance correlation.

Good idea!

Actually, I'm not sure, because this measure is invariant w.r.t. the absolute position of the clusters, while for us this is crucial: if one cluster is identical to the other but shifted by 10 in each dimension, these should be totally different clusters / attractors.
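A minimal sketch of the alternative discussed in this thread: treat each group of features as a state space set and reuse the same matching machinery directly on the feature clusters. The `prev_feature_groups`/`curr_feature_groups` dictionaries and their numbers are made up for illustration; only `match_attractor_ids!`, `Centroid`, and the keyword names come from the code in this diff.

```julia
# Hypothetical: feature clusters at the previous and current parameter,
# stored as ID => set of 2D feature vectors (toy numbers).
prev_feature_groups = Dict(
    1 => Dataset([[0.0, 0.0] .+ 0.01rand(2) for _ in 1:50]),
    2 => Dataset([[5.0, 5.0] .+ 0.01rand(2) for _ in 1:50]),
)
curr_feature_groups = Dict(
    1 => Dataset([[5.1, 5.0] .+ 0.01rand(2) for _ in 1:50]),  # should match previous ID 2
    2 => Dataset([[0.1, 0.0] .+ 0.01rand(2) for _ in 1:50]),  # should match previous ID 1
)

# Reuse the matcher that this PR applies to attractors, but on the feature clusters.
# Centroid distance is sensitive to the absolute position of the clusters,
# which is exactly the property the discussion above asks for.
rmap = match_attractor_ids!(curr_feature_groups, prev_feature_groups;
    distance = Centroid(), threshold = Inf
)
# `rmap` maps the old IDs of `curr_feature_groups` to the IDs of the matched previous clusters.
```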
            swap_dict_keys!(fs, rmap)
        end
        # Then do the remaining setup for storing and next step
        push!(fractions_curves, fs)
        push!(attractors_info, get_info(current_attractors))
        overwrite_dict!(prev_attractors, current_attractors)
        ProgressMeter.next!(progress; showvalues = [("previous parameter", p),])
    end
    # Normalize to smaller available integers for user convenience
    rmap = retract_keys_to_consecutive(fractions_curves)
    for (da, df) in zip(attractors_info, fractions_curves)
        swap_dict_keys!(da, rmap)
        swap_dict_keys!(df, rmap)
    end
    return fractions_curves, attractors_info
end
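Putting the pieces together, a sketch of how this `continuation` method would be called; the parameter range, parameter index, and grid of initial conditions are made up and continue the hypothetical constructor sketch further above:

```julia
# Hypothetical call; numbers are illustrative only.
prange = range(1.2, 1.25; length = 11)  # parameter values to sweep over
pidx = 1                                # index of the continued parameter
# `ics` must be a Dataset (a function is rejected above); here a crude random sample:
ics = Dataset([[4rand() - 2, 2rand() - 1] for _ in 1:1000])

fractions_curves, attractors_info = continuation(fam, prange, pidx, ics;
    samples_per_parameter = 100, show_progress = true
)
# fractions_curves[i] maps attractor ID => basin fraction at prange[i];
# attractors_info[i] holds the `info_extraction` output for each attractor.
```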

function reset!(mapper::AttractorsViaFeaturizing)
    empty!(mapper.attractors)
end

Review thread on `reset!`:

We have the `extract_attractors` backend; the method dispatches on the mapper type.

I'll change it!