Experiments for "Quality-of-Service Metrics for Intelligent Agents exploiting Symbolic Knowledge Injection via PSyKI" (JAAMAS).
Andrea Agiollo, Andrea Rafanelli, Matteo Magnini, Giovanni Ciatto, Andrea Omicini. "Symbolic knowledge injection meets intelligent agents: QoS metrics and experiments", Autonomous Agents and Multi-Agent Systems 37(2): 27 (2023). https://doi.org/10.1007/s10458-023-09609-6
Bibtex:
@article{DBLP:journals/aamas/AgiolloRMCO23,
  author    = {Andrea Agiollo and
               Andrea Rafanelli and
               Matteo Magnini and
               Giovanni Ciatto and
               Andrea Omicini},
  title     = {Symbolic knowledge injection meets intelligent agents: QoS metrics and experiments},
  journal   = {Auton. Agents Multi Agent Syst.},
  volume    = {37},
  number    = {2},
  pages     = {27},
  year      = {2023},
  url       = {https://doi.org/10.1007/s10458-023-09609-6},
  doi       = {10.1007/s10458-023-09609-6},
  timestamp = {Tue, 12 Sep 2023 07:57:44 +0200},
  biburl    = {https://dblp.org/rec/journals/aamas/AgiolloRMCO23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Execute the command python -m setup load_datasets [-f] [-o]
to download the datasets from the UCI website.
By default, the command stores the original datasets into the datasets folder.
Two options are available:
- -f y binarizes the input features;
- -o y maps the output classes into numeric indices.
The datasets are not tracked by git, so you need to execute this command before doing anything else. To reproduce the experiments in the paper, run the command with both options:
python -m setup load_datasets -f y -o y
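For reference, the two flags correspond to standard preprocessing steps. The snippet below is only an illustrative sketch of what feature binarization and class indexing amount to, written with pandas and scikit-learn; the actual implementation lives in this repository's setup module and may differ in the details.

```python
# Sketch of the preprocessing toggled by -f and -o (illustrative only).
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def binarize_features(df: pd.DataFrame, feature_columns: list) -> pd.DataFrame:
    """One-hot encode the given categorical feature columns (what -f y does)."""
    return pd.get_dummies(df, columns=feature_columns)


def index_classes(df: pd.DataFrame, class_column: str) -> pd.DataFrame:
    """Map the output classes to numeric indices (what -o y does)."""
    df = df.copy()
    df[class_column] = LabelEncoder().fit_transform(df[class_column])
    return df
```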
UPDATE! (23/04/2023)
Recently, the UCI website updated one of the datasets that we use in the experiments.
Therefore, to preserve reproducibility, we have added the preprocessed dataset to the repository.
As a consequence, there is no longer any need to execute the command python -m setup load_datasets.
Wisconsin breast cancer dataset (breast cancer)
It contains clinical data of patients and consists of 9 ordinal categorical features:
- Clump Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape
- Marginal Adhesion
- Single Epithelial Cell Size
- Bare Nuclei
- Bland Chromatin
- Normal Nucleoli
- Mitoses
All features take integer values in the [1, 10] range. The class indicates whether the cancer is benign or malignant.
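For a quick look at the raw data, the following sketch loads the file with pandas. The URL and the column layout are assumptions based on the public UCI archive (the first column is a sample identifier and missing values are marked with "?"); the repository's setup command performs the actual download and preprocessing.

```python
# Illustrative loading of the UCI Wisconsin breast cancer data (assumed URL and layout).
import pandas as pd

URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "breast-cancer-wisconsin/breast-cancer-wisconsin.data")
COLUMNS = ["SampleId", "ClumpThickness", "UniformityCellSize", "UniformityCellShape",
           "MarginalAdhesion", "SingleEpithelialCellSize", "BareNuclei",
           "BlandChromatin", "NormalNucleoli", "Mitoses", "Class"]

df = pd.read_csv(URL, names=COLUMNS, na_values="?")
df = df.drop(columns="SampleId").dropna()  # missing values occur in BareNuclei
```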
Primate splice junction gene sequences dataset (splice junction)
It represents DNA sequences.
Each sequence consists of 60 bases.
Each base can take one of the values a, c, g, t (adenine, cytosine, guanine, thymine).
The class indicates whether a sequence activates a biological process: exon-intron, intron-exon, or none.
The dataset comes with its own knowledge base.
Both the dataset and the knowledge contain special symbols in addition to the 4 bases.
These symbols indicate that, for a particular position in the sequence, more than one of the 4 bases is allowed.
For this reason, the dataset is binarized (one-hot encoding) so that DNA sequences are represented with just the 4 bases.
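As an illustration of this encoding, the sketch below turns each symbol into a 4-element indicator vector over the bases. The ambiguity-symbol mapping shown here (d, n, s, r) follows the dataset documentation but is reported as an assumption; the repository's preprocessing is the authoritative version.

```python
# Sketch of one-hot encoding DNA symbols over the four bases only.
BASES = ["a", "c", "g", "t"]
AMBIGUOUS = {          # assumed mapping of the dataset's extra symbols
    "d": ["a", "g", "t"],
    "n": ["a", "c", "g", "t"],
    "s": ["c", "g"],
    "r": ["a", "g"],
}


def encode_base(symbol: str) -> list:
    """Return a 4-element indicator vector over (a, c, g, t)."""
    allowed = AMBIGUOUS.get(symbol, [symbol])
    return [1 if base in allowed else 0 for base in BASES]


def encode_sequence(sequence: str) -> list:
    """Flatten a 60-base sequence into a 240-element binary vector."""
    return [bit for symbol in sequence.lower() for bit in encode_base(symbol)]
```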
Census income dataset (census income)
It represents general personal data together with the yearly income (below or above 50,000 USD). Features are continuous, categorical (nominal and ordinal), and binary:
- age, continuous (integers)
- workclass, nominal categorical
- fnlwgt (final weight), continuous
- education, nominal categorical
- education-num, ordinal categorical (integers)
- marital-status, nominal categorical
- occupation, nominal categorical
- relationship, nominal categorical
- race, nominal categorical
- sex, binary
- capital-gain, continuous
- capital-loss, continuous
- hours-per-week, continuous
- native-country, nominal categorical
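As with the other datasets, the raw file can be inspected with pandas. The URL and the column order below are assumptions based on the public UCI "adult" archive; the repository's setup command handles the actual download.

```python
# Illustrative loading of the UCI census income (adult) data (assumed URL and layout).
import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week", "native-country",
           "income"]

df = pd.read_csv(URL, names=COLUMNS, skipinitialspace=True, na_values="?")
```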
Knowledge is already provided for the splice junction dataset.
For the census income and breast cancer datasets, instead, it must be generated.
We provide the command python -m setup generate_missing_knowledge for this purpose.
The knowledge is generated from a classification decision tree trained on half of the training set, as sketched below.
Before running this command, make sure the datasets have been downloaded.
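The following sketch shows the tree-based part of this idea with scikit-learn, using synthetic placeholder data instead of the actual preprocessed datasets. Translating the extracted rules into the logic format expected by the injection step is done by the repository command and is not shown here.

```python
# Sketch of the knowledge-generation idea: fit a decision tree on half of the
# training set and read its if-then rules (placeholder data, illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder data standing in for one of the preprocessed datasets.
X, y = make_classification(n_samples=400, n_features=9, random_state=0)

# Train the tree on half of the training set, as described above.
X_half, _, y_half, _ = train_test_split(X, y, train_size=0.5, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_half, y_half)

# Human-readable rules extracted from the fitted tree.
print(export_text(tree))
```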
To obtain the best hyperparameters for the predictors, run the command:
python -m setup grid_search
This command will perform a grid search over the hyperparameters of the predictors (uneducated and educated) for each dataset. The grid search is performed on the number of layers and on the number of neurons per layer.
Note that this command will take a long time to complete.
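To make the search space concrete, the sketch below runs a grid search over the number of layers and the number of neurons per layer, using scikit-learn's MLPClassifier and synthetic data as stand-ins for the repository's predictors and datasets; the actual search is the one implemented by python -m setup grid_search.

```python
# Sketch of a grid search over layers and neurons per layer (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=9, random_state=0)

# Each tuple repeats the chosen neuron count for the chosen number of layers.
grid = {"hidden_layer_sizes": [(n,) * layers for layers in (1, 2, 3) for n in (16, 32, 64)]}

search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0), grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```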
To run the experiments, execute the command:
python -m setup run_experiments
This command will run the experiments for each dataset and for each predictor (uneducated and educated).
The results will be stored in the results folder.
Note that this command will take a long time to complete.
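Once the experiments have finished, the results can be inspected with standard tools. The snippet below is a minimal sketch that assumes the results are stored as CSV files inside the results folder; the exact file names and layout depend on the repository's implementation.

```python
# Minimal sketch for inspecting the produced results (assumed CSV layout).
from pathlib import Path

import pandas as pd

frames = [pd.read_csv(path).assign(source=path.stem)
          for path in Path("results").glob("*.csv")]
if frames:
    print(pd.concat(frames, ignore_index=True).head())
```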