We are happy to accept contributions of methods, as well as updates to the benchmarking framework. Below we specify minimal requirements for contributing a method to this benchmark.
- In general you should submit pull requests to the dev branch.
- Make the PR detailed and reference specific issues if the PR is meant to address any.
- Please be kind and please be patient. We will be, too.
To contribute a symbolic regression method for benchmarking, fork the repo, make the changes listed below, and submit a pull request to the dev branch.
Once your method passes the basic tests and we've reviewed it, congrats!
We will plan to benchmark your method on hundreds of regression problems.
- An open-source method with a scikit-learn compatible API
- If your method uses a random seed, it should have a random_state attribute that can be set.
- If your method is installable via pip or conda, add it to the environment file. Otherwise, provide a bash install script in experiment/methods/src/ named your-method_install.sh that installs your method. See ellyn_install.sh as an example. Our GitHub Actions workflow will automatically recognize it.
- A minimal script in experiment/methods/ that defines these items:
  - est: a sklearn-compatible Regressor object
  - hyper_params: a dictionary or list of dictionaries specifying the hyperparameter search space
  - model(est): a function that returns a sympy-compatible string specifying the final model.
  - (optional): a dictionary named eval_kwargs that can specify method-specific arguments to evaluate_model().

  See experiment/methods/AFPRegressor.py and/or other methods in that folder for examples.
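As a rough illustration of the items a method script defines, here is a minimal sketch. Everything specific in it is an assumption: the file name, the ToyRegressor class, the grid-style lists in hyper_params, and the x_0 variable name are hypothetical. The toy estimator only mimics the scikit-learn estimator interface (fit, predict, get_params, set_params) using the standard library; a real submission would normally build on scikit-learn's base classes, and the AFPRegressor.py script remains the authoritative example.

```python
# Hypothetical file: experiment/methods/ToyRegressor.py
import random


class ToyRegressor:
    """Toy model y = a*x_0 + b, fitted by random search (illustration only)."""

    def __init__(self, n_iter=100, random_state=None):
        self.n_iter = n_iter
        self.random_state = random_state  # settable seed, as required above

    def get_params(self, deep=True):
        return {"n_iter": self.n_iter, "random_state": self.random_state}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Seeding from random_state makes repeated fits reproducible.
        rng = random.Random(self.random_state)
        best = None
        for _ in range(self.n_iter):
            a, b = rng.uniform(-5, 5), rng.uniform(-5, 5)
            err = sum((a * row[0] + b - t) ** 2 for row, t in zip(X, y))
            if best is None or err < best[0]:
                best = (err, a, b)
        _, self.a_, self.b_ = best
        return self

    def predict(self, X):
        return [self.a_ * row[0] + self.b_ for row in X]


# est: the regressor object to be benchmarked.
est = ToyRegressor()

# hyper_params: search space; lists of candidate values are an assumed format.
hyper_params = [
    {"n_iter": [100, 1000]},
]


def model(est):
    """Return the fitted model as a string that sympy can parse."""
    # "x_0" is a placeholder; use the variable names from the training data.
    return f"{est.a_}*x_0 + {est.b_}"


# Optional: method-specific arguments for evaluate_model() (left empty here).
eval_kwargs = {}
```

The model() function is called on a fitted estimator, so it can read fitted attributes such as a_ and b_ directly.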
To check for exact solutions to problems with known, ground-truth models, each SR method returns a model string that can be manipulated in sympy. Ensure the returned model meets these requirements:
- The variable names appearing in the model are identical to those in the training data.
- The operators/functions in the model are available in sympy's function set. If they are not, they need to be defined in the model's script and referenced appropriately in the model() function, so that sympy can find them.
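One way to catch mismatches before submitting is a quick pre-check of the model string. The sketch below is not part of the benchmark: check_model_string and SYMPY_FUNCTIONS are hypothetical names, and the function set is only a small illustrative subset of what sympy provides. It uses Python's ast module to list the variables and functions in a model string, so it assumes the string is also valid Python expression syntax. It does not replace parsing the string with sympy.

```python
import ast

# Illustrative subset of function names that exist in sympy; not exhaustive.
SYMPY_FUNCTIONS = {"sin", "cos", "tan", "exp", "log", "sqrt", "Abs"}


def check_model_string(model_str, feature_names, extra_functions=()):
    """Report variables and functions in model_str that are not recognized.

    feature_names: variable names used in the training data.
    extra_functions: custom operators that the method script defines itself.
    """
    tree = ast.parse(model_str, mode="eval")
    allowed_funcs = SYMPY_FUNCTIONS | set(extra_functions)

    called = set()
    func_name_nodes = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            called.add(node.func.id)
            func_name_nodes.add(id(node.func))

    # Every remaining bare name is treated as a variable.
    variables = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and id(node) not in func_name_nodes
    }

    return {
        "unknown_variables": sorted(variables - set(feature_names)),
        "unknown_functions": sorted(called - allowed_funcs),
    }


report = check_model_string("2.5*sin(x1) + pdiv(x2, x1)", ["x1", "x2"])
print(report)
```

Here pdiv stands in for a custom protected-division operator. It is reported as unknown until it is passed via extra_functions, which mirrors the requirement that such operators be defined in the method's script so that sympy can find them.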
We are still working out how to handle operators uniformly and appropriately, and currently rely on experiment/symbolic_utils.py to post-process models.
However, new additions to the repo should not require post-processing for compatibility.
See issue #58 for more details.