Parameter-by-parameter continuation using featurizing #69

Closed
wants to merge 6 commits
1 change: 1 addition & 0 deletions src/continuation/basins_fractions_continuation_api.jl
@@ -57,4 +57,5 @@ function continuation end
include("match_attractor_ids.jl")
include("continuation_recurrences.jl")
include("continuation_grouping.jl")
include("continuation_featurizing_findandmatch.jl")
include("aggregate_attractor_fractions.jl")
106 changes: 106 additions & 0 deletions src/continuation/continuation_featurizing_findandmatch.jl
@@ -0,0 +1,106 @@
export FeaturizingFindAndMatch
import ProgressMeter
using Random: MersenneTwister

struct FeaturizingFindAndMatch{A, M, R<:Real, S, E} <: AttractorsBasinsContinuation
mapper::A
distance::M
threshold::R
seeds_from_attractor::S
info_extraction::E
end

"""
Very similar to the recurrences version; the only difference is how initial conditions
are seeded from the attractors.
"""
function FeaturizingFindAndMatch(
mapper::AttractorsViaFeaturizing; distance = Centroid(),
threshold = Inf, seeds_from_attractor = _default_seeding_process_featurizing,
info_extraction = identity
)
return FeaturizingFindAndMatch(
mapper, distance, threshold, seeds_from_attractor, info_extraction
)
end

function _default_seeding_process_featurizing(attractor::AbstractStateSpaceSet, number_seeded_ics=10; rng = MersenneTwister(1))
return [rand(rng, vec(attractor)) for _ in 1:number_seeded_ics] # might lead to repeated ics, which is intended for the continuation
end

"""
Continuation here is very similar to the one done with recurrences. The only difference is
in how the ics from the previous attractors are seeded at the new parameters. Here, we sample ics
from the previous attractors and pass them to `basins_fractions`, which extracts features from them
and groups them together with the other features.
This could be generalized so that one function deals with both mappers, reducing the code duplication.
"""
function continuation(
fam::FeaturizingFindAndMatch,
prange, pidx, ics;
samples_per_parameter = 100, show_progress = true,
)
progress = ProgressMeter.Progress(length(prange);
desc="Continuating basins fractions:", enabled=show_progress
)

if ics isa Function
error("`ics` needs to be a Dataset.")
end

(; mapper, distance, threshold) = fam
reset!(mapper)
# first parameter is run in isolation, as it has no prior to seed from
set_parameter!(mapper.ds, pidx, prange[1])
fs, _ = basins_fractions(mapper, ics; show_progress = false, N = samples_per_parameter)
# At each parameter `p`, a dictionary mapping attractor ID to fraction is created.
fractions_curves = [fs]
# Furthermore some info about the attractors is stored and returned
prev_attractors = deepcopy(mapper.attractors)
get_info = attractors -> Dict(
k => fam.info_extraction(att) for (k, att) in attractors
)
info = get_info(prev_attractors)
attractors_info = [info]
ProgressMeter.next!(progress; showvalues = [("previous parameter", prange[1]),])
# Continue loop over all remaining parameters
for p in prange[2:end]
set_parameter!(mapper.ds, pidx, p)
reset!(mapper)

# Collect ics from the previous attractors to pass as additional ics to `basins_fractions` (seeding).
# To ensure that the clustering identifies them as clusters, we need to guarantee that there
# are at least `min_neighbors` entries.
additional_ics = Dataset(vcat(map(att->
fam.seeds_from_attractor(att, fam.mapper.group_config.min_neighbors),
values(prev_attractors))...)) # dataset with ics seeded from the previous attractors

# Now perform basin fractions estimation as normal, utilizing found attractors
fs, _ = basins_fractions(mapper, ics;
show_progress = false, N = samples_per_parameter, additional_ics
)
current_attractors = mapper.attractors
Contributor:
We have the `extract_attractors` backend. The method dispatches on the mapper type.
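For illustration, a minimal sketch of that suggestion (not part of the diff); `extract_attractors(mapper)` is the call used in the test file below:

```julia
# use the extract_attractors backend instead of reaching into the mapper field;
# the method dispatches on the mapper type
current_attractors = extract_attractors(mapper)
```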

Contributor Author:
I'll change it!

if !isempty(current_attractors) && !isempty(prev_attractors)
# If there are any attractors,
# match with previous attractors before storing anything!
rmap = match_attractor_ids!(
current_attractors, prev_attractors; distance, threshold
Contributor:
In this method you compare the distance between the trajectories that span the attractors. Could we instead compute the distance between the clusters of points in the feature space? Would that be more reliable than matching trajectories?

Contributor Author:
So matching by comparing distances between the features, instead of the attractors? It's a nice idea. As I understand it, this could be done by the user via the distance function, right? Receive the attractor, extract the feature, and compute the distance.

One advantage of matching directly by the features is that one wouldn't need to reconstruct the attractors, and so could still use `ics` as a sampler function. But I think the generality of having the attractors (generated by passing `ics` as a dataset) outweighs the memory savings of having `ics` as a function. What do you think?

Member (@Datseris, Jun 1, 2023):
> So matching by comparing distances between the features, instead of attractors? It's a nice idea.

@KalelR that's what I was trying to tell you in person in Portland... That you convert the clusters into a StateSpaceSet and then call the match_attractor_ids! function on the clusters themselves. The function doesn't know what an attractor is; it could be an actual attractor or any StateSpaceSet.
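A minimal sketch of that idea (hypothetical, not part of this PR; assumes `features` is the vector of feature vectors and `group_labels` the labels returned by `group_features`, with `prev_cluster_sets` the analogous dictionary from the previous parameter):

```julia
# build one StateSpaceSet per group of features; -1 is assumed to be the
# "unclustered" label and is skipped
cluster_sets = Dict(
    k => StateSpaceSet(features[group_labels .== k])
    for k in unique(group_labels) if k != -1
)
# match_attractor_ids! only sees StateSpaceSets, so it works on feature
# clusters exactly as it does on reconstructed attractors
rmap = match_attractor_ids!(cluster_sets, prev_cluster_sets; distance, threshold)
swap_dict_keys!(fs, rmap)
```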

Member:
With this matching process, you completely alleviate the need to re-construct the attractors and to store/find an initial condition to go to each attractor. Yet you still keep the slice-by-slice continuation in featurizing that you want.

Contributor:
You can match a subset of the cluster to avoid a memory bottleneck.
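For instance, a hedged sketch of subsampling before matching (illustrative names; mirrors the seeding helper in the diff and uses the hypothetical `cluster_sets` from the sketch above):

```julia
using Random: MersenneTwister
# keep only n points per cluster (sampled with replacement) before matching
function subsample_cluster(set, n; rng = MersenneTwister(1))
    return StateSpaceSet([rand(rng, vec(set)) for _ in 1:n])
end
small_clusters = Dict(k => subsample_cluster(set, 100) for (k, set) in cluster_sets)
```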

Member:
> point is that, if you reconstruct the attractors, you can still do that (just extract the features again) but also much more (a simple example being Euclidean distance).

> Yes, I understand this. What I am not sure about is whether there is a benefit in mixing the two approaches in this way. The benefit of the recurrence approach is the rigor of having a guarantee that you find an attractor, while for featurizing, cluster != attractor.

> do you guys think that this is not worth it? That if the attractors have been identified by clustering the features, then they should also be matched by comparing the features?

I am suggesting that, however I would first like to see results of matching directly on the feature space before converting to attractors. I think this is the best option. But I could be wrong, and what you propose, to go from feature space to attractor space, then match, and then go back again, could be a better approach. It is much more complicated though, both logically and in terms of code. So I would still argue we should first try parameter-by-parameter matching directly on the feature space, simply by considering each cluster / group-of-features as a StateSpaceSet and then literally calling the match_attractor_ids! function.

Member:
p.s.: Sorry for the late reply.

Contributor:
Is there a metric to measure how good the matching between two clusters of features is? Distance correlation would be one example.

Member:
good idea!

Member:
Actually, I'm not sure, because this measure is invariant w.r.t. the absolute position of the clusters, while for us this is crucial. If one cluster is identical to the other but shifted by 10 in each dimension, these should be totally different clusters / attractors.
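A tiny illustration of the concern (using plain Pearson correlation as a stand-in for a shift-invariant similarity measure):

```julia
using Statistics, LinearAlgebra
cluster = rand(100, 2)        # 100 feature points in 2 dimensions
shifted = cluster .+ 10       # same cluster, shifted by 10 in each dimension
cor(vec(cluster), vec(shifted))                           # ≈ 1.0: the shift is invisible
norm(mean(cluster, dims = 1) .- mean(shifted, dims = 1))  # ≈ 14.1: a centroid distance sees it
```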

)
swap_dict_keys!(fs, rmap)
end
# Then do the remaining setup for storing and next step
push!(fractions_curves, fs)
push!(attractors_info, get_info(current_attractors))
overwrite_dict!(prev_attractors, current_attractors)
ProgressMeter.next!(progress; showvalues = [("previous parameter", p),])
end
# Normalize to smaller available integers for user convenience
rmap = retract_keys_to_consecutive(fractions_curves)
for (da, df) in zip(attractors_info, fractions_curves)
swap_dict_keys!(da, rmap)
swap_dict_keys!(df, rmap)
end
return fractions_curves, attractors_info
end

function reset!(mapper::AttractorsViaFeaturizing)
empty!(mapper.attractors)
end
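For context, a hedged usage sketch of the new continuation API introduced above (hypothetical system, featurizer, and parameter range; mirrors how the recurrences-based continuation is called):

```julia
using Attractors, Statistics

# hypothetical featurizer: mean and std of the trajectory's first variable
featurizer(A, t) = [mean(A[:, 1]), std(A[:, 1])]
group_config = GroupViaClustering(min_neighbors = 10)
mapper = AttractorsViaFeaturizing(ds, featurizer, group_config)  # `ds` is some DynamicalSystem

fam = FeaturizingFindAndMatch(mapper)          # default Centroid distance, threshold = Inf
prange = range(0.6, 1.2; length = 21)          # hypothetical parameter range
pidx = 1                                       # index of the parameter being varied
ics = StateSpaceSet([rand(2) for _ in 1:1000]) # must be a set of ics, not a sampler function

fractions_curves, attractors_info = continuation(
    fam, prange, pidx, ics; samples_per_parameter = 100
)
```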
14 changes: 10 additions & 4 deletions src/mapping/attractor_mapping_featurizing.jl
@@ -102,19 +102,25 @@ ValidICS = Union{AbstractStateSpaceSet, Function}
#####################################################################################
# We only extend the general `basins_fractions`, because the clustering method
# cannot map individual initial conditions to attractors
# additional_ics is for internal use in the findandmatch continuation
function basins_fractions(mapper::AttractorsViaFeaturizing, ics::ValidICS;
show_progress = true, N = 1000
show_progress = true, N = 1000, additional_ics::Union{ValidICS, Nothing} = nothing,
)
features = extract_features(mapper, ics; show_progress, N)

if !isnothing(additional_ics)
additional_features = extract_features(mapper, additional_ics; show_progress, N)
features = vcat(features, additional_features)
end

group_labels = group_features(features, mapper.group_config)
fs = basins_fractions(group_labels) # Vanilla fractions method with Array input

if typeof(ics) <: AbstractStateSpaceSet
# TODO: If we could somehow extract the used initial conditions from `ics`
# in case `ics` was a function, that would be cool...
attractors = extract_attractors(mapper, group_labels, ics)
overwrite_dict!(mapper.attractors, attractors)
return fs, group_labels
else
else # no attractor extraction if `ics` is a sampler function
return fs
end
end
5 changes: 5 additions & 0 deletions test/mapping/attractor_mapping.jl
@@ -43,6 +43,7 @@ function test_basins(ds, u0s, grid, expected_fs_raw, featurizer;
end
# Generic test
fs = basins_fractions(mapper, sampler; show_progress = false)
approx_atts_sampler = extract_attractors(mapper)
for k in keys(fs)
@test 0 ≤ fs[k] ≤ 1
end
@@ -66,6 +67,10 @@
@test abs(fs[k] - expected_fs_raw[k]) ≤ err
end
end

@test length(approx_atts_sampler) == length(approx_atts)
@test all(approx_atts_sampler[i] == approx_atts[i] for i in eachindex(approx_atts_sampler))

# `basins_of_attraction` tests
basins, approx_atts = basins_of_attraction(mapper, reduced_grid; show_progress=false)
@test length(size(basins)) == length(grid)