This repository holds the code for our paper "HARRIS: Hybrid Ranking and Regression Forests for Algorithm Selection" by Lukas Fehring, Alexander Tornede, and Jonas Hanselle. It is build on top of https://github.com/alexandertornede/algorithm_survival_analysis. Regarding questions please contact lukas.fehring@stud.uni-hannover.de.
It is well known that different algorithms perform differently well on an instance of an algorithmic problem, motivating algorithm selection (AS): Given an instance of an algorithmic problem, which is the most suitable algorithm to solve it? As such, the AS problem has received considerable attention resulting in various approaches -- many of which either solve a regression or ranking problem under the hood. Although both of these formulations yield very natural ways to tackle AS, they have considerable weaknesses. On the one hand, correctly predicting the performance of an algorithm on an instance is a sufficient, but not a necessary condition to produce a correct ranking over algorithms and in particular ranking the best algorithm first. On the other hand, classical ranking approaches often do not account for concrete performance values available in the training data, but only leverage rankings composed from such data. We propose \name - Hybrid rAnking and RegRessIon foreSts - a new algorithm selector leveraging special forests, combining the strengths of both approaches while alleviating their weaknesses. \name' decisions are based on a forest model, whose trees are created based on splits optimized on a hybrid ranking and regression loss function. As our preliminary experimental study on ASLib shows, \name improves over standard algorithm selection approaches on some scenarios showing that combining ranking and regression in trees is indeed promising for AS.
For the sake of reproducibility, we will detail how to reproduce the results presented in the paper below.
In order to reproduce the results by running our code, we assume that you have a MySQL server with version >=5.7.9 running.
As a next step, you have to create a configuration file entitled experiment_configuration.cfg
in the conf
folder next to allocate.txt
. This configuration file should contain the following information:
[DATABASE]
host = my.sqlserver.com
username = username
password = password
database = databasename
table = tablename
ssl = true
[EXPERIMENTS]
scenarios=ASP-POTASSCO,CSP-Minizinc-Time-2016,MAXSAT12-PMS,MIP-2016,QBF-2011,QBF-2016,SAT12-ALL,SAT12-HAND,CPMP-2015
approaches=HARRIS,isac,per_algorithm_regressor,satzilla-11
amount_of_training_scenario_instances=-1
amount_of_cpus=16
tune_hyperparameters=0
train_status=all
You have to adapt all entries below the [DATABASE]
tag according to your database server setup. The entries have the following meaning:
host
: the address of your database serverusername
: the username the code can use to access the databasepassword
: the password the code can use to access the databasedatabase
: the name of the database where you imported the tablestable
: the name of the table, where results should be stored. This is created automatically by the code if it does not exist yet and should NOT be created manually.ssl
: whether ssl should be used or not
Entries below the [EXPERIMENTS]
define which experiments will be run. The configuration above will produce the main results presented in the paper.
For running the code several dependencies have to be fulfilled. The easiest way of getting there is by using Anaconda. For this purpose, you find an Anaconda environment definition called environment.yml
at the top-level of this project. Assuming that you have Anaconda installed, you can create an according environment with all required packages via
conda env create -f environment.yml
which will create an environment named HARRIS
. After it has been successfully installed, you can use
conda activate HARRIS
to activate the environment and run the code (see step 4).
Your cwd
should be set to ./survival_tests
for execution. Additionally, you need to append the folder ./survival_tests
to your lookup path e.g, with
export PYTHONPATH="$PYTHONPATH:path/to/cwd"
Obviously, the code requires access to the ASLib scenarios in order to run the requested evaluations. It expects the ASLib scenarios (which can be downloaded from Github) to be located in a folder data/aslib_data-master
on the top-level of your IDE project. I.e. your folder structure should look similar to this:
./survival_tests
./survival_tests/approaches
./survival_tests/approaches/survival_forests
./survival_tests/approaches/combined_ranking_and_regression_forest
./survival_tests/approaches/combined_ranking_and_regerssion_trees
./survival_tests/results
./survival_tests/singularity
./survival_tests/tests
./survival_tests/data
./survival_tests/data/aslib_data-master
./survival_tests/conf
At this point you should be good to go and can execute the experiments by running the run.py
on the top-level of the project.
All results will be stored in the table given in the configuration file and has the following columns:
scenario_name
: The name of the scenario.fold
: The train/test-fold associated with the scenario which is considered for this experimentapproach
: The approach which achieved the reported results, whereRun2SurviveExp := Expectation_algorithm_survival_forest
,Run2SurvivaPar10 := PAR10_algorithm_survival_forest
andRun2SurvivePolyLog := SurrogateAutoSurvivalForest
metric
: The metric which was used to generate the result. For thenumber_unsolved_instances
metric, the suffixTrue
indicates that feature costs are accounted for whereas forFalse
this is not the case. All other metrics automatically incorporate feature costs.result
: The output of the corresponding metric.impact_factor
: Null if not Hybrid Forest or Hybrid Tree, else λ from the paperstopping_criteria
: Null if not Hybrid Forest or Hybrid Tree, else stopping criterion of tree(s)ranking_error
: Null iff not Hybrid forest or Hybrid Tree, else ranking error used at training time
All plots/the table found in the paper can be generated using the self-explanatory Jupyter notebook visualization.ipynb
in the top-level results
folder.
- traitlets==5.2.1.post0
prefix: /home/tornede/anaconda3/envs/HARRIS