Cross validation of machine-learning models on the Faculty platform.
At present, the package mainly offers a way to cross validate models in parallel by means of Faculty jobs. This functionality is accessed through the class:
faculty_xval.validation.JobsCrossValidator
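To make the idea of validating in parallel concrete, the following is a minimal local sketch that uses only the Python standard library. It does not use faculty-xval or Faculty jobs: a thread pool stands in for the jobs, a majority-class predictor stands in for a real model, and the names k_fold_indices, evaluate_fold and cross_validate_in_parallel are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def k_fold_indices(n_samples, n_splits):
    """Return a list of (train_indices, test_indices) pairs with contiguous test folds."""
    indices = list(range(n_samples))
    folds, start = [], 0
    for i in range(n_splits):
        # Spread any remainder over the first folds.
        size = n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def evaluate_fold(fold, targets):
    """Score a toy majority-class predictor on one train-test split."""
    train, test = fold
    train_targets = [targets[i] for i in train]
    # sorted() keeps the choice deterministic if two classes are tied.
    majority = max(sorted(set(train_targets)), key=train_targets.count)
    return sum(1 for i in test if targets[i] == majority) / len(test)


def cross_validate_in_parallel(targets, n_splits=3):
    """Evaluate every fold concurrently; threads stand in for Faculty jobs here."""
    folds = k_fold_indices(len(targets), n_splits)
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        return list(pool.map(lambda fold: evaluate_fold(fold, targets), folds))
```

Each fold is independent of the others, which is what makes it possible to hand the folds to separate workers.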
Additional information can be found in the example notebooks provided. Please have a look at the section Try out the examples below.
The package supports keras and sklearn models. Whilst one can write custom models that are compatible with faculty-xval, no guarantee is given that the package handles these situations correctly, in particular because of issues concerning the randomisation of weights.
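The concern about weight randomisation can be illustrated with a toy example. The sketch below is an assumption about the general problem, not a description of how faculty-xval handles it: ToyModel and fresh_model_for_fold are hypothetical names, and the "training" is a placeholder. The point is that each fold should start from newly randomised weights rather than inherit weights trained on another fold.

```python
import copy
import random


class ToyModel:
    """Hypothetical stand-in for a model with randomly initialised weights."""

    def __init__(self, n_weights, seed=None):
        self.n_weights = n_weights
        self.rng = random.Random(seed)
        self.reset_weights()

    def reset_weights(self):
        # Draw a new set of weights from the model's own random generator.
        self.weights = [self.rng.uniform(-1.0, 1.0) for _ in range(self.n_weights)]

    def fit(self, n_steps):
        # Placeholder training: move every weight halfway towards 1.0 per step.
        for _ in range(n_steps):
            self.weights = [w + 0.5 * (1.0 - w) for w in self.weights]


def fresh_model_for_fold(template):
    """Copy the template and re-randomise its weights, leaving the template untouched."""
    model = copy.deepcopy(template)
    model.reset_weights()
    return model
```

A custom model that offers no way to reset its weights would carry information from one fold into the next, which would bias the validation scores.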
Two sets of installation instructions are provided below:
- If you would simply like to use faculty-xval, please follow the User installation instructions.
- If you would like to develop faculty-xval further, please follow the Developer installation instructions.
User installation instructions
In your project on the Faculty platform, create an environment named faculty_xval. In the PYTHON section, select Python 3 and pip from the dropdown menus. Then, type faculty-xval in the text box, and click on the ADD button.
The environment installs the package faculty-xval, and should be applied on every server that you create; this includes both interactive servers and job servers, as explained next.
Create a new job definition named cross_validation. In the COMMAND section, paste the following:
faculty_xval_jobs_xval $in_paths
Then, add a PARAMETER with the name in_paths, and ensure that the Make field mandatory box is checked.
Finally, under SERVER SETTINGS, add faculty_xval to the ENVIRONMENTS section.
For cross-validation jobs that are computationally intensive, we recommend using dedicated servers as opposed to running on shared infrastructure. To achieve this, click on Large and GPU servers under SERVER RESOURCES, and select an appropriate server type from the dropdown menu.
Remember to click SAVE when you are finished.
Developer installation instructions
Before beginning the installation process, pick an appropriate username, such as foo. This does not need to match your Faculty platform username. In the following instructions, your selected username is referred to as <USER_NAME>.
Create the folder /project/<USER_NAME>. Then, run the commands:
cd /project/<USER_NAME>
git clone https://github.com/facultyai/faculty-xval.git
Next, create an environment in your project named faculty_xval_<USER_NAME>. In this environment, under SCRIPTS, paste the following code into the BASH section, remembering to change the USER_NAME definition on the second line to your selected <USER_NAME>:
# Remember to change username!
USER_NAME=<USER_NAME>
# Install faculty-xval from local repository.
pip install /project/$USER_NAME/faculty-xval/
# Turn USER_NAME into an environment variable.
echo "export USER_NAME=$USER_NAME" > /etc/faculty_environment.d/app.sh
if [[ -d /etc/service/jupyter ]]; then
    sudo sv restart jupyter
fi
This environment should be applied on every server that you create; this includes both 'normal' interactive servers and job servers, as explained next.
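Once the environment has been applied, the USER_NAME variable exported by the script should be visible to Python processes on the server. As a minimal sketch, assuming the variable is set as described above, the path of the local clone can be rebuilt from it; the helper name local_repository_path is hypothetical and is not part of faculty-xval.

```python
import os
from pathlib import PurePosixPath


def local_repository_path(environ=None):
    """Build the path of the local faculty-xval clone from USER_NAME."""
    environ = os.environ if environ is None else environ
    user_name = environ.get("USER_NAME")
    if not user_name:
        # Most likely the environment has not been applied to this server.
        raise RuntimeError("USER_NAME is not set on this server")
    return PurePosixPath("/project") / user_name / "faculty-xval"
```

Calling the helper with an explicit dictionary, for example local_repository_path({"USER_NAME": "foo"}), makes it possible to check the result without changing the real environment.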
Next, create a new job definition named cross_validation_<USER_NAME>. In the COMMAND section, paste the following:
faculty_xval_jobs_xval $in_paths
Then, add a PARAMETER with the name in_paths, and ensure that the Make field mandatory box is checked.
Finally, under SERVER SETTINGS, add faculty_xval_<USER_NAME> to the ENVIRONMENTS section.
For cross-validation jobs that are computationally intensive, we recommend using dedicated servers as opposed to running in the cluster. To achieve this, click on Large and GPU servers under SERVER RESOURCES, and select an appropriate server type from the dropdown menu.
Remember to click SAVE when you are finished.
Try out the examples
Please clone this repository. Examples of cross validation with faculty-xval for the different types of model are provided in the directories examples/keras and examples/sklearn. Usage instructions are divided into two notebooks:
- jobs_cross_validator_run.ipynb loads the data, instantiates the model, and starts a Faculty job that carries out the cross validation.
- jobs_cross_validator_analyse.ipynb gathers the results from the cross validation, reloads the target data, and calculates the model accuracy over multiple train-test splits.
Note that the example notebooks must be run in the order listed above.
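The accuracy calculation over multiple train-test splits can be sketched as follows. This is an illustration in plain Python and not the code of the analysis notebook: it assumes that the targets and predictions for each split are already available as lists, and the names accuracy and summarise_splits are hypothetical.

```python
from statistics import mean, stdev


def accuracy(targets, predictions):
    """Fraction of predictions that match the targets."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum(t == p for t, p in zip(targets, predictions)) / len(targets)


def summarise_splits(splits):
    """Compute per-split accuracy plus its mean and sample standard deviation.

    splits is a list of (targets, predictions) pairs, one per train-test split.
    """
    scores = [accuracy(targets, predictions) for targets, predictions in splits]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return {"scores": scores, "mean": mean(scores), "std": spread}
```

Reporting the spread alongside the mean shows how much the score depends on the particular split.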