New search space def #122

Merged: 9 commits, Mar 27, 2024

@perib (Collaborator) commented Mar 27, 2024

What does this PR do?

The primary feature of this PR is a new API for defining search spaces. The new approach is more modular and flexible, allowing for more customizable search spaces. Hyperparameter search spaces and pipeline-structure search spaces are now defined in separate classes.

Changes:

  1. New search space API
    a. Node and Pipeline search spaces
    b. Hyperparameters are now defined with ConfigSpace
  2. Edited tutorials to reflect new API
  3. Modified parameters of the TPOTEstimator, TPOTClassifier, and TPOTRegressor
  4. Genetic Feature Selection Node
  5. Refactored the FSS (FeatureSetSelector) into its own Node class using the new API
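
To illustrate the separation between node-level and pipeline-level search spaces described above, here is a minimal sketch using only the standard library. The class names `NodeSpace` and `SequentialSpace` and the `sample` method are hypothetical stand-ins, not TPOT2's actual classes, and plain dictionaries of ranges stand in for ConfigSpace objects. It only shows the idea: each node owns its hyperparameter space, while a separate pipeline space decides how nodes are arranged.

```python
import random
from dataclasses import dataclass


@dataclass
class NodeSpace:
    """Hypothetical node-level space: one method plus its hyperparameter ranges."""
    method: str
    # name -> (low, high) tuple for floats, or a list of categorical choices
    hyperparameters: dict

    def sample(self, rng: random.Random) -> dict:
        params = {}
        for name, domain in self.hyperparameters.items():
            if isinstance(domain, tuple):
                low, high = domain
                params[name] = rng.uniform(low, high)
            else:
                params[name] = rng.choice(domain)
        return {"method": self.method, "params": params}


@dataclass
class SequentialSpace:
    """Hypothetical pipeline-level space: a fixed order of steps."""
    steps: list

    def sample(self, rng: random.Random) -> list:
        # Structure lives here; hyperparameters stay inside each node.
        return [step.sample(rng) for step in self.steps]


scaler = NodeSpace("StandardScaler", {"with_mean": [True, False]})
classifier = NodeSpace(
    "LogisticRegression", {"C": (0.01, 10.0), "penalty": ["l1", "l2"]}
)
pipeline_space = SequentialSpace([scaler, classifier])

rng = random.Random(0)
pipeline = pipeline_space.sample(rng)
for step in pipeline:
    print(step["method"], step["params"])
```

Because the structure and the hyperparameters are sampled by different objects, swapping the classifier's ranges or reordering the steps does not require touching the other class.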

Other minor changes:

  1. All 'rng_' parameters renamed to 'rng'
  2. Removed subset_col from GraphPipeline. This was an unfinished experimental feature, and removing it simplifies the code. The idea can also be implemented more easily with the new API if we want to return to it.
  3. Removed the experimental Optuna feature from TPOTEstimator. Previously, TPOT could run Optuna optimization on the Pareto-front pipelines, but that relied on the old search space functions. To restore it, we would need to write a function that converts individuals/pipelines into a space Optuna can search.

There are still some items that need to be updated.

  1. I have not tested the new search spaces with the SteadyState estimator yet.
  2. Currently, the new search spaces may return a Pipeline, BaseEstimator, or GraphPipeline. Some TPOTEstimator parameters assumed a GraphPipeline and have been disabled for now, including memory and cross_val_predict_cv. One option is to export all pipelines to GraphPipeline. Another is to make these parameters of the search space rather than of TPOTEstimator.
  3. The unique_id() functions need to be edited to account for nested search spaces.
  4. Some search space definitions currently lack conditionals. Unfortunately, ConfigSpace also doesn't support None as a parameter value, which needs a workaround in the EstimatorNode.
  5. merge_duplicate_nodes needs to be edited to account for nested search spaces.
  6. Because hyperparameter search has been completely rewritten, the old gradual hyperparameter search is incompatible. Thankfully, it is much easier to reimplement now: it only requires creating a new version of EstimatorNode.
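
One possible shape for the None workaround mentioned in item 4, sketched with the standard library only: encode None as a sentinel string when declaring the choices, then decode it just before the parameters reach the estimator. The names `NONE_SENTINEL`, `encode_choices`, and `decode_params` are hypothetical, and this is an assumption about how such a workaround could look, not the approach EstimatorNode takes.

```python
# Hypothetical placeholder; any string that is never a real value would work.
NONE_SENTINEL = "__none__"


def encode_choices(choices):
    """Replace None with the sentinel so every choice is a plain string or number."""
    return [NONE_SENTINEL if c is None else c for c in choices]


def decode_params(params):
    """Map the sentinel back to None before handing parameters to an estimator."""
    return {
        name: (None if isinstance(value, str) and value == NONE_SENTINEL else value)
        for name, value in params.items()
    }


# Example: a depth-style parameter where None means "no limit".
choices = encode_choices([None, 3, 5])
sampled = {"max_depth": choices[0], "criterion": "gini"}
decoded = decode_params(sampled)
print(decoded)
```

Keeping the encoding and decoding in two small functions means the sentinel never leaks past the search space boundary.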

Questions:

  • Do the docs need to be updated?
    1. The documentation still needs to be updated; I will work on that next.
  • Does this PR add new (Python) dependencies?
    1. TPOT2 now uses the ConfigSpace package, version 0.7.1. Additionally, the Python version must be <3.12 for compatibility with this package.
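
The Python version constraint above could be checked at startup with a guard like the following. This is a minimal sketch; the helper name `python_supported` is hypothetical and not part of TPOT2.

```python
import sys


def python_supported(version_info=sys.version_info):
    """Return True when the interpreter is older than 3.12, per the constraint above."""
    return tuple(version_info[:2]) < (3, 12)


if not python_supported():
    print("Warning: Python 3.12 or newer detected; ConfigSpace 0.7.1 may not be compatible.")
```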

@perib merged commit d5a27cc into EpistasisLab:search_space_api Mar 27, 2024
1 check failed