Botorch with cardinality constraint via sampling #301

Draft · wants to merge 57 commits into base: main
Conversation

@Waschenbacher (Collaborator) commented Jul 3, 2024

(edited by @AdrianSosic)

This PR adds support for cardinality constraints to BotorchRecommender. The core idea is to tackle the problem in an exhaustive-search-like manner, i.e. by

  • enumerating the possible combinations of in-/active parameters dictated by the cardinality constraints
  • optimizing the corresponding restricted subspaces, where the cardinality constraint can then be simply removed since the in-/active sets are fixed within these subspaces
  • aggregating the optimization results of the individual subspaces into a single recommendation batch.
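
The three steps above can be sketched roughly as follows (a minimal illustration, not the actual BayBE API; `optimize_subspace` stands in for the acquisition optimization of one restricted subspace, and `min_cardinality` is ignored for brevity):

```python
from itertools import combinations


def recommend_with_cardinality(parameters, max_cardinality, optimize_subspace):
    """Return the best batch over all subspaces satisfying the constraint."""
    n = len(parameters)
    best_batch, best_acqf = None, float("-inf")
    # Any inactive set of size >= n - max_cardinality keeps the number of
    # active parameters at or below max_cardinality.
    for n_inactive in range(n - max_cardinality, n + 1):
        for inactive in combinations(parameters, n_inactive):
            # Within the restricted subspace, the inactive parameters are
            # fixed, so the cardinality constraint can simply be dropped.
            batch, acqf = optimize_subspace(set(inactive))
            if acqf > best_acqf:
                best_batch, best_acqf = batch, acqf
    return best_batch, best_acqf
```

The aggregation here is the simplest possible one: keep the batch from the subspace with the highest acquisition value.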

The PR implements two mechanisms for determining the configuration of inactive parameters:

  • When the combinatorial list of possible inactive parameter configurations is not too large, we iterate over the full list;
  • otherwise, a fixed number of inactive parameter configurations is randomly selected.
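
A minimal sketch of this selection logic (function and parameter names are assumptions; in the PR, the cap on the number of configurations is controlled by the recommender's `max_n_subspaces` attribute):

```python
import random
from itertools import combinations
from math import comb


def inactive_parameter_sets(parameters, n_inactive, max_n_subspaces):
    """Yield inactive-parameter configurations of the given size."""
    if comb(len(parameters), n_inactive) <= max_n_subspaces:
        # Small enough: enumerate the full combinatorial list.
        yield from combinations(parameters, n_inactive)
    else:
        # Too large: draw a fixed number of distinct random configurations.
        seen = set()
        while len(seen) < max_n_subspaces:
            config = tuple(sorted(random.sample(parameters, n_inactive)))
            if config not in seen:
                seen.add(config)
                yield config
```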

The current aggregation step simply optimizes all subspaces independently of each other and then returns the batch from the subspace achieving the highest acquisition value. This has the side effect that the set of inactive parameters is the same across the entire recommendation batch. While this can be a desirable property in many use cases, potentially higher acquisition values can be obtained by varying the in-/activity sets across the batch. A simple way to achieve this (though out of scope for this PR) is to generalize the sequential greedy principle to multiple subspaces.

Out of scope

  • Fulfilling cardinality constraints by passing them in suitable form as nonlinear constraints to the optimizer
  • Sequential greedy optimization to achieve varying in-/activity sets (see explanation above)

@Waschenbacher Waschenbacher added the new feature New functionality label Jul 3, 2024
@Waschenbacher Waschenbacher self-assigned this Jul 3, 2024
@Waschenbacher Waschenbacher marked this pull request as draft July 4, 2024 07:38
@Waschenbacher Waschenbacher force-pushed the feature/cardinality_constraint_to_botorch_via_sampling branch from 0be3285 to 1f7783d on July 4, 2024 09:21
@Waschenbacher Waschenbacher marked this pull request as ready for review July 4, 2024 09:35
CHANGELOG.md
Collaborator

@AdrianSosic left a comment

Hi @Waschenbacher, thanks for the work. This is not yet a review but only a first batch of very high-level comments that I would like you to address before we can go into the actual review process. The reason being that the main functionality brought by this PR is currently rather convoluted and hard to parse, so I'd prefer to work with a more readable version, tbh.

Review comments on: baybe/constraints/continuous.py, baybe/recommenders/pure/bayesian/botorch.py, baybe/searchspace/continuous.py
@Waschenbacher Waschenbacher force-pushed the feature/cardinality_constraint_to_botorch_via_sampling branch from 1f7783d to 9d28b49 on July 12, 2024 07:38
@AdrianSosic AdrianSosic force-pushed the feature/cardinality_constraint_to_botorch_via_sampling branch from f68ce4f to fc52875 on August 15, 2024 12:52
Collaborator

@AdrianSosic left a comment

Hi @Waschenbacher, I've finally managed to spend some time on this important PR, thanks again for your preparation work. I've already refactored some parts that were clear to me. However, I'm not yet certain about the design in the remaining places. I have the feeling that it can potentially be simplified a lot, depending on whether it is possible to reuse the idea of reduced subspaces. I've marked the corresponding places with comments. I think we need to discuss this part first before we can continue.

Review comments on: baybe/searchspace/continuous.py, baybe/recommenders/pure/bayesian/botorch.py
@AdrianSosic AdrianSosic force-pushed the feature/cardinality_constraint_to_botorch_via_sampling branch from 8a7ef15 to 046a8e2 on October 29, 2024 17:17
Collaborator

@AdrianSosic left a comment

Hi @AVHopp @Scienfitz. I spent some time and refactored @Waschenbacher's proposal quite a bit, and I think the logic shipped by this PR is now very clear. From my end, everything seems in good shape and we are ready for a fresh round of review; I feel we are close to getting it merged. There are a few open things to be discussed; I've marked the corresponding parts and also updated the PR description. Let me know what you think.

@@ -150,5 +150,38 @@ def summary(self) -> dict:
return param_dict


@define(frozen=True, slots=False)
class _FixedNumericalContinuousParameter(ContinuousParameter):
Collaborator

@AVHopp @Scienfitz: The class proved to be useful for this PR, even though we could have implemented the mechanism differently: instead of fixing parameter values, we could have created reduced spaces from which the parameters are dropped. However, since botorch's optimizers accept a list of fixed values, this was the simpler implementation.

I've kept it private for now because I first want to see if/how things might need to be adjusted. For example, just yesterday, I had a conversation with @julianStreibel about another feature that would be very useful in the future, namely conditioning recommendations onto certain subspaces without having to redefine the searchspace object. I guess we'll use it for that as well, but until then I'll keep it hidden.

Any comments or thoughts? If not, just give a thumbs up and I will resolve.

@@ -87,3 +89,51 @@ def get_parameters_from_dataframe(
def sort_parameters(parameters: Collection[Parameter]) -> tuple[Parameter, ...]:
"""Sort parameters alphabetically by their names."""
return tuple(sorted(parameters, key=lambda p: p.name))


def activate_parameter(
Collaborator

@Scienfitz @AVHopp: This is one of the important open points to discuss. As you probably know by now, the mechanism of this PR is to divide parameters into active and inactive sets. However, this opens a general question, not only related to this PR:

  • What do we even mean with an "active continuous parameter", i.e. what is the value range where such a parameter is considered "active"?

The current logic is to assume some small threshold around 0 to discriminate active/inactive ranges. And this function then takes care of "activating" a parameter according to this definition.

Problem to be discussed:
For parameters with bounds falling into the threshold range, everything is simple. But how can we guarantee that a parameter with bounds (-a, b) with a, b >> threshold is active? We cannot simply "cut out" the threshold interval, since we would then end up with a non-continuous range (see comment in docstring). This causes downstream problems in the sense that we cannot strictly guarantee that the "min_cardinality" part of a cardinality constraint will be fulfilled, since any optimizer might decide that – even if we assign the parameter to the "active set" – its value should be driven to zero. And then the cardinality will drop by one. In theory, one could solve this by branching into the two separate subspaces of the non-continuous range of the parameter, but that would end in yet another exponential blow-up.

TL;DR: I think min-cardinality constraints are really hard to strictly fulfill. On the other hand, I cannot think of any use case where people actually need a min-constraint. So I see two options:

  • (not preferred) dropping the option of min values in cardinality constraints.
  • (my suggestion) keeping everything as is and being aware (plus potentially mention it in the docstring) that the min-part of the constraint can be overridden by the optimizer. The min part still reduces the overall search burden in that it rules out certain subspaces to start with. But for the ones left, the min-part could be violated.

Input is appreciated!
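
To make the bound-adjustment problem concrete, here is a minimal sketch (the function name and signature are illustrative, not BayBE's actual `activate_parameter`; it assumes the non-zero bound lies beyond the threshold):

```python
def activate_bounds(lower, upper, threshold):
    """Return bounds excluding (-threshold, threshold), or None if impossible."""
    if lower >= 0:
        # e.g. [0, a]: shift the lower bound past the threshold.
        return (max(lower, threshold), upper)
    if upper <= 0:
        # e.g. [-a, 0]: shift the upper bound past the threshold.
        return (lower, min(upper, -threshold))
    # lower < 0 < upper: excluding the near-zero interval would create a
    # non-continuous range, so strict activation cannot be guaranteed here.
    return None
```

The `None` branch is exactly the problematic (-a, b) case discussed above, where the optimizer may still drive the value to zero.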

Collaborator

Quick follow-up: whatever we decide here, the corresponding logic of the random recommender, which does not yet use any concept of activated parameters, should also be updated. It's not as important for the random recommender as it is for optimization-based ones, but still ...

Collaborator

First question: Wasn't this sort of a problem one of the main arguments against using Narrow Gaussian Processes? By my understanding, the problem that we are getting here is now basically identical, could you please give a quick heads-up why Narrow Gaussians are not an option/alternative? Don't need an extensive discussion, but rather a "Hey Alex, we already talked about that, let me quickly refresh your memory".

Collaborator

Now to your actual points.

  • The first problem I see: allowing min_cardinality and just mentioning in the docstring that it might be overwritten by the optimizer seems VERY dangerous to me. As we know by now, people tend not to read documentation at all. So what I see is people thinking "Oh, great, I can use a min_cardinality", stopping to read and think at exactly that point, assuming that it works as intended, and then being angry when it does not.
  • This is a potentially crucial silent error that could hit people at very inconvenient times: imagine that you really require this lower cardinality, and that it works for the first X runs of your optimization, just to fail in run X+1, potentially invalidating your whole setup.
  • As you say yourself: "I cannot think of any use case where people actually need a min-constraint". We said that things like this should be a strong indicator not to implement stuff.

So I'd propose to drop the support for this.

Collaborator

To the first question: let's tackle this topic in a dedicated baybathon.
Second: today in the dev meeting.

Collaborator Author

As agreed in the baybathon, there are two options for supporting the cardinality constraint (especially the minimum cardinality) in the Botorch recommender:

  • Option I: this PR with a hard threshold. If possible, the bounds of an active parameter get updated by excluding the inactive range (-threshold, threshold). But when this is impossible, e.g. for a parameter with bounds (-a, b) with a, b >> threshold (see discussion above), its value may be driven to zero by the optimizer and the min_cardinality might get violated. A warning MinimumCardinalityViolatedWarning is raised if the minimum cardinality is violated.
  • Option II: a future PR based on narrow Gaussian kernel = soft-constraint.

@@ -76,6 +85,13 @@ class BotorchRecommender(BayesianRecommender):
optimization. **Does not affect purely discrete optimization**.
"""

max_n_subspaces: int = field(default=10, validator=[instance_of(int), ge(1)])
Collaborator

@AVHopp @Scienfitz: Do you think this is an appropriate attribute name?

Collaborator

I think it should be linked to cardinality somehow: for people not interested in cardinality constraints, it only becomes clear after reading the docstring that this is not relevant for them. That would however make the name quite long :/ So although I am not perfectly happy with the name, we could keep it, as I do not see a better alternative.

Collaborator

I get your point but I see this emerging as part of a bigger problem, actually, not limited to this particular attribute. Namely that all attributes of the class are linked to particular settings only:

  • sequential_continuous is relevant for continuous spaces only
  • hybrid_samples is relevant for hybrid spaces only
  • sampling_percentage is relevant for hybrid spaces and specific sampling strategies only

In that sense, max_n_subspaces is even a little less problematic since it is, strictly speaking, not only relevant for cardinality constraints but could also be used for hybrid spaces. And now I also see another clash, since its functionality basically overlaps with sampling_percentage 🤔

So, two TODOs as I see it:

  • Cleverly resolve the duplication somehow
  • Potentially, in a next PR: find a better structure. We could consider, for example, to capture all the settings in a separate object like a TypedDict or so, that then uses its own validation path

@Scienfitz: Thoughts from your end?

Collaborator

To avoid the duplicates, we could rename sampling_percentage to hybrid_sampling_percentage or something similar, that might already be sufficient. Agree that it would be beneficial to somehow "merge" these two attributes soon.

),
]
return to_string(self.__class__.__name__, *fields)
def _optimize_continuous_subspaces(
Collaborator

@AVHopp @Scienfitz: Just wanted to raise awareness. I've refactored @Waschenbacher's original version quite significantly, but the nice outcome is that we now have this useful and general helper that can be used to optimize over arbitrary non-connected subspaces. Thumbs up once you've seen it and I'll resolve.

Collaborator

Can we potentially leverage this for Discrete Searchspaces or some sort of hybrid recommendation as well?

Collaborator

Yes, the use is by no means limited to cardinality constraints. It's basically the baybe equivalent to botorch's optimize_acqf_mixed, if you will.

@@ -301,6 +334,45 @@ def _drop_parameters(self, parameter_names: Collection[str]) -> SubspaceContinuo
],
)

def _enforce_cardinality_constraints_via_assignment(
Collaborator

I'm not yet perfectly happy with the name here. Any better suggestions? One alternative I could imagine would be _enforce_cardinality_constraints_via_bounds. But input is welcome

Collaborator

Since there is only one way of enforcing cardinality constraints, why not simply enforce_cardinality_constraints? If people are interested in the details of the how, they can read the docstring.

Collaborator

can do, but would keep it private 👍🏼. @Scienfitz: thumbs up is sufficient for approval

Collaborator

@AVHopp left a comment

First high-level review regarding points that we should discuss. Not a full review yet as I think that the code might change depending on what is decided regarding the min_cardinality.

baybe/parameters/numerical.py

subspace_continuous: SubspaceContinuous,
batch_size: int,
) -> tuple[Tensor, Tensor]:
"""Recommend from a continuous search space with cardinality constraints.
Collaborator

I like it, only some information about how max_n_subspaces comes into play here is still missing for me :)


@Scienfitz Scienfitz marked this pull request as draft November 26, 2024 12:48
@Waschenbacher (Collaborator Author)

@AdrianSosic This is the updated PR according to our discussion in the baybathon. The main changes are below:

  • A threshold near_zero_threshold is added to each NumericalContinuousParameter. This implementation deviates from the original plan in two respects:

    • Threshold per constraint or subspace vs. threshold per parameter: I decided on a per-parameter threshold because it is more flexible and more logical.
    • Threshold ratio (the original suggestion) vs. absolute threshold: I decided on an absolute threshold, since the logic gets complicated with a ratio and several questions arise when inferring the absolute threshold from it:
      • Is the threshold ratio based on the complete range or just one side?
      • If the bounds are not symmetric, e.g. a parameter has bounds (a, b) with a ≠ -b, how do we infer the absolute threshold values? Do we want a single absolute threshold or two different numbers?
      • How do we deal with infinite bounds?
      • If needed, the user can compute the desired absolute threshold from any threshold ratio and assign it.
  • Check whether any minimum cardinality constraint is violated in botorch's recommendation and raise MinimumCardinalityViolatedWarning if so.

  • Added a test checking that MinimumCardinalityViolatedWarning is raised when any minimum cardinality constraint is violated.

  • Updated the tests related to cardinality constraints: all maximum cardinality constraints must be fulfilled, and either all minimum cardinality constraints are fulfilled or a warning is raised.

  • Created a botorch PR (Add InfeasibilityError exception pytorch/botorch#2652) for the dangerous ValueError due to infeasibility and added a TODO.
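
The minimum-cardinality check could look roughly like this (the warning class name is from the PR; the helper, its signature, and the data layout are illustrative assumptions):

```python
import warnings


class MinimumCardinalityViolatedWarning(UserWarning):
    """A minimum cardinality constraint was violated by the optimizer."""


def check_min_cardinality(recommendation, thresholds, min_cardinality):
    """Warn if fewer than `min_cardinality` parameters are active in any row."""
    for row in recommendation:
        # A parameter counts as active if its value lies outside the
        # near-zero threshold of that parameter.
        n_active = sum(
            abs(value) > thresholds[name] for name, value in row.items()
        )
        if n_active < min_cardinality:
            warnings.warn(
                f"Only {n_active} of the required {min_cardinality} "
                "parameters are active.",
                MinimumCardinalityViolatedWarning,
            )
```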

Review comments on: baybe/constraints/continuous.py, baybe/parameters/numerical.py
@@ -87,3 +89,51 @@ def get_parameters_from_dataframe(
def sort_parameters(parameters: Collection[Parameter]) -> tuple[Parameter, ...]:
"""Sort parameters alphabetically by their names."""
return tuple(sorted(parameters, key=lambda p: p.name))


def activate_parameter(
Collaborator

Shouldn't this be removed, given the meeting outcome that these cases of non-guaranteed min_cardinality are acceptable with a warning?

Collaborator Author

I believe we still need activate_parameter.

  • When the original bounds of an active parameter are [0, a] or [-a, 0] with a > threshold, we still need this method to shift the bounds so that the parameter is guaranteed to be active. Especially in this case, the acquisition function tends to have its optimum on the boundary, which is 0, so activate_parameter helps at least here.
  • When the original bounds of an active parameter are [a, b] with a < 0 < b and the range is larger than the threshold, we cannot really activate the parameter. Only in this case may the optimizer drive the parameter to 0. If that happens, the minimum cardinality may not hold, which is why we check the minimum cardinality and raise a warning if it is violated.

Collaborator

second that

@@ -41,6 +49,9 @@ def validate_constraints( # noqa: DOC101, DOC103
param_names_discrete = [p.name for p in parameters if p.is_discrete]
param_names_continuous = [p.name for p in parameters if p.is_continuous]
param_names_non_numerical = [p.name for p in parameters if not p.is_numerical]
params_continuous: list[NumericalContinuousParameter] = [
p for p in parameters if isinstance(p, NumericalContinuousParameter)