Example Problems
A number of example problems are currently provided with PonyGE2:
- String-match
- Regression
- Classification
- Pymax
- Program synthesis
- Genetic Improvement of Regex Runtime Performance
- Multiple Objective Optimisation
A brief description of each problem is given below, along with the command-line arguments necessary to run it. The developers of PonyGE2 encourage users to try out the various operators and options available in PonyGE2 on these example problems in order to gain an appreciation of how they work.
String-match
The grammar specifies words as lists of vowels and consonants, along with special characters. The aim is to match a target string.
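For illustration, a grammar of this kind might look like the following fragment. This is a sketch in the BNF style PonyGE2 grammars use, not the verbatim contents of the shipped grammar file:

```
<string>    ::= <letter> | <letter><string>
<letter>    ::= <vowel> | <consonant> | <special>
<vowel>     ::= a | e | i | o | u
<consonant> ::= b | c | d | f | g
<special>   ::= ! | ?
```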
To use it, specify the following command-line argument:
`--parameters string_match.txt`
The default string-match target is `Hello world!`, but this can be changed with the `--target` argument.
Regression
The grammar generates a symbolic function composed of standard mathematical operations and a set of variables. This function is then evaluated using a pre-defined set of inputs, given in the datasets folder. Each problem suite has a unique set of inputs. The aim is to minimise some error between the output of the function and the desired output specified in the datasets. This is the default problem for PonyGE2.
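As a rough sketch of what this evaluation amounts to (the toy dataset and variable names here are illustrative, not PonyGE2 internals):

```python
import numpy as np

# Toy training data standing in for a dataset from the datasets folder.
x = np.array([[0.0, 0.5, 1.0, 1.5]])        # one input variable, four samples
y = 3.0 * x[0] ** 2 + 1.0                   # desired outputs

# A candidate expression of the kind an expression grammar produces.
phenotype = "3.1*x[0]**2 + 0.9"

yhat = eval(phenotype, {"x": x, "np": np})  # evaluate the symbolic function
rmse = np.sqrt(np.mean((y - yhat) ** 2))    # error to be minimised
print(rmse)
```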
To try this problem, specify the following command-line argument:
`--parameters regression.txt`
The default regression problem is Vladislavleva4, but this can be changed with the `--grammar_file`, `--dataset_train` and `--dataset_test` arguments.
When running supervised learning problems like regression and classification, PonyGE2 can attempt to optimise constants. These constants will exist in the individual's phenotype string as `c[0]`, `c[1]`, etc., so the grammar should create such terms. Then, if we specify the argument `--optimize_constants`, or set the parameter `OPTIMIZE_CONSTANTS` to `True` in either a parameters file or the params dictionary, PonyGE2 will perform gradient descent on the values of the constants (given the data) to minimise the error (as measured by the chosen `--error_metric`), using the L-BFGS-B method. By default, optimisation of constants is switched off.
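A minimal sketch of the underlying idea, assuming a phenotype containing `c[0]` and `c[1]` and a mean-squared-error metric, and calling SciPy's L-BFGS-B optimiser directly rather than through PonyGE2:

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(-1, 1, 50)                  # toy training inputs
y = 3.0 * x + 0.5                           # toy training targets

phenotype = "c[0]*x + c[1]"                 # candidate expression with constants

def error(c):
    # Evaluate the phenotype with the current constant vector c.
    yhat = eval(phenotype, {"c": c, "x": x})
    return np.mean((y - yhat) ** 2)         # mean squared error

# Fit the constants to the data with the L-BFGS-B method.
result = minimize(error, x0=np.ones(2), method="L-BFGS-B")
print(result.x)                             # close to [3.0, 0.5]
```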
Classification
Classification can be considered a special case of symbolic regression, but with a different error metric. As with regression, the grammar generates a symbolic function composed of standard mathematical operations and a set of variables. This function is then evaluated using a pre-defined set of inputs, given in the datasets folder. Each problem suite has a unique set of inputs. The aim is to minimise some classification error between the output of the function and the desired output specified in the datasets.
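For instance, one simple way to turn the real-valued output of such a function into class labels is to threshold it and count misclassifications. The sketch below is illustrative and is not necessarily the metric PonyGE2's classification fitness uses:

```python
import numpy as np

def classification_error(yhat, y):
    """Toy metric: threshold the function output at zero and
    return the fraction of misclassified samples."""
    predictions = (yhat > 0).astype(int)
    return np.mean(predictions != y)

# Example: two of the four predictions disagree with the labels.
print(classification_error(np.array([-1.2, 0.3, 2.0, -0.1]),
                           np.array([0, 0, 1, 1])))   # 0.5
```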
To try this problem, specify the following command-line argument:
`--parameters classification.txt`
The default classification problem is Banknote, but this can be changed with the `--grammar_file`, `--dataset_train` and `--dataset_test` arguments.
Pymax
One of the strongest aspects of a grammatical mapping approach such as PonyGE2 is the ability to generate executable computer programs in an arbitrary language [O'Neill & Ryan, 2003]. To demonstrate this in the simplest way possible, we have included an example Python programming problem.
The Pymax problem is a traditional maximisation problem, where the goal is to produce as large a number as possible. However, instead of encoding the grammar in a symbolic manner and evaluating the result, we have encoded the grammar for the Pymax problem as a basic Python programming example. The phenotypes generated by this grammar are executable Python functions, whose outputs represent the fitness value of the individual. Users are encouraged to examine the `pymax.pybnf` grammar and the resultant individual phenotypes to gain an understanding of how grammars can be used to generate such arbitrary programs.
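For illustration, a phenotype of the kind this grammar produces might look like the following (hand-written here, not an actual PonyGE2 output):

```python
# A Python function assembled from grammar productions; its return
# value is used directly as the individual's fitness.
def individual():
    x = 0.5
    x = x + x * x
    x = x + x * x
    return x

fitness = individual()
print(fitness)
```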
To try this problem, specify the following command-line argument:
`--parameters pymax.txt`
Program synthesis
The General Program Synthesis Benchmark Suite is available in PonyGE2. Grammars and datasets have been provided by HeuristicLab.CFGGP. The individuals produce executable Python code.
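As a rough sketch of how such generated code can be exercised (the phenotype and test case below are made up for illustration; PonyGE2's actual evaluation harness differs):

```python
# A generated phenotype: a complete Python function as a string.
phenotype = "def solve(n):\n    return n * 2"

namespace = {}
exec(phenotype, namespace)             # compile and bind the function
assert namespace["solve"](3) == 6      # check one input/output pair
```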
To try this problem, specify the following command-line argument:
`--parameters progsys.txt`
NOTE that multicore evaluation is not currently supported for this type of problem.
Genetic Improvement of Regex Runtime Performance
An example of Genetic Improvement for software engineering is given for program improvement of regular expressions (regexes). The given examples improve existing regexes by seeding them into the population. The fitness function measures both the runtime and the functionality of the regexes.
The fitness function for the regex performance improvement problem has a number of sub-modules and programs which are used for generating test cases and for timing the execution of candidate programs. This is an example of how a fitness function can be extended in PonyGE2 with customised modules and classes to perform complex tasks.
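As a toy illustration of the two ingredients such a fitness function combines, functional correctness on test cases and execution time, with everything below assumed rather than taken from PonyGE2's modules:

```python
import re
import timeit

candidate = re.compile(r"\d+")        # a candidate (possibly improved) regex

# Test cases pairing input strings with the expected first match.
tests = [("abc123", "123"), ("no digits here", None)]

def first_match(s):
    m = candidate.search(s)
    return m.group(0) if m else None

functional = all(first_match(s) == expected for s, expected in tests)
runtime = timeit.timeit(lambda: candidate.search("abc123"), number=10000)
print(functional, runtime)
```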
To try this problem, specify the following command-line argument:
`--parameters regex_improvement.txt`
NOTE that multicore evaluation is not currently supported for this type of problem.
Multiple Objective Optimisation
As detailed in the section on multiple objective optimisation, PonyGE2 comes as standard with an implementation of NSGA-II to optimise multiple fitness functions simultaneously.
Two example problems are provided:
- Regression with bloat control
This example runs the standard Vladislavleva-4 regression example, but with a second fitness function that aims to minimise the number of nodes in the derivation tree. To try this problem, specify the following command-line argument:
`--parameters moo/regression_bloat_control.txt`
NOTE that it is advisable to use subtree mutation in combination with the mutate duplicates option with this example problem.
- ZDT1
An example benchmark multi-objective optimisation problem from [Zitzler, et al.] is provided. To try this problem, specify the following command-line argument:
`--parameters moo/zdt1.txt`
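For reference, the two ZDT1 objectives from [Zitzler, et al.] can be sketched in NumPy as follows (PonyGE2's own implementation may differ in detail):

```python
import numpy as np

def zdt1(x):
    """ZDT1 objectives for a decision vector x with components in [0, 1].
    Both objectives are to be minimised."""
    f1 = x[0]
    g = 1.0 + 9.0 * np.sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2

# On the Pareto front when all of x[1:] are zero (g == 1).
print(zdt1(np.array([0.5, 0.0, 0.0])))
```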
Adding New Problems
It has been made as simple as possible to add new problems to PonyGE2. To add a new problem, you will need:
- a new grammar file specific to your problem,
- a new fitness function (if you don't want to use a previously existing one), and
- if you are doing supervised learning then you may also need to add some new datasets.
A template for new fitness functions is provided in `fitness.base_ff_classes.ff_template`. This template allows new fitness functions to be created quickly, in a compatible and extensible manner.
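For example, a toy fitness function modelled loosely on that template might look like the sketch below; the base class import path and the `evaluate` signature are assumptions here and should be checked against `ff_template` itself:

```python
from fitness.base_ff_classes.base_ff import base_ff

class string_length(base_ff):
    """Toy fitness function: reward longer phenotype strings."""

    maximise = True                   # this fitness is to be maximised

    def __init__(self):
        # Initialise the base fitness function class.
        super().__init__()

    def evaluate(self, ind, **kwargs):
        # ind.phenotype is the string produced by the grammar mapping.
        return len(ind.phenotype)
```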
NOTE that it may be beneficial to create a new parameters file for any new problem.
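Parameters files are plain-text lists of key/value pairs; a hypothetical minimal example is shown below (the keys follow the pattern of the shipped files, but compare against the actual files in the parameters directory for exact names):

```
FITNESS_FUNCTION:       string_length
GRAMMAR_FILE:           my_problem.bnf
POPULATION_SIZE:        500
GENERATIONS:            50
```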