Templated boilerplate to perform Active Learning. Uses a fork of toupee as a submodule with custom modifications for this sandboxed implementation. Fixes compatibility issues with tensorflow >= 2.2.
- Readily extensible for,
- Aggregated training of single models.
- Deep Incremental Boosting (DIB) with customizable addition layers for boosting.
- Templates can be further extended for other ensembling techniques.
- Use the following command to create a local copy,
git clone https://github.com/karthik-d/Boilerplate-for-Active-Learning
- Use the
--recurse-submodules
option to set up the toupee fork as a submodule at clone time. This can, however, also be done at a later stage.
- Quick reference on the basics of conda environments.
- Either, use
dependencies/dep-file-conda.yml
file to create an environment with all dependencies by running,
conda env create -f dep-file-conda.yml
The default environment name is al_boilerplate. Edit the .yml file to change it.
- Or, for *nix systems, use
dependencies/dep-file-conda-nix.txt
file to create an environment by running,
conda create --name <env-name> --file dep-file-conda-nix.txt
- Then, switch to the created/modified environment by running
conda activate <env-name>
- Use
dependencies/dep-file-pip.txt
file to install all dependencies by running,
pip install -r dep-file-pip.txt
- The requirements file for
pip
was generated on a *nix system. Hence, it will likely install without hiccups only on this platform.
- For other platforms, the task may reduce to a manual, dependency-by-dependency install. In this case, refer to the versions listed in the dependency files when installing them.
- Use the standard
pip install <lib-name>
syntax to install the following additional dependencies that are currently not available in any of the anaconda repositories,
- Note: Newer versions of conda may have already completed this for you.
Set up toupee as a submodule
- If you did not use
--recurse-submodules
when cloning this repository, set up the toupee fork now.
- Run the following commands to detect the submodule, and to fetch and set it up, respectively,
git submodule init
git submodule update
- Use toupee's setup to install the dataset.
- For
tests/mnist_test
, get the MNIST dataset into
data/mnist
by running,
python toupee/bin/load_data.py mnist <destination-location>
- This prepares the data in the NPZ format, into the destination directory.
- Please note that,
- The path to the destination should preferably be an absolute path, but it could also be relative to the current working context.
- The
bin/load_data.py
script must be referenced by its correct path, which depends on the current working context.
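To confirm that the data landed in the destination directory, the contents of an NPZ file can be listed without loading anything into memory. The sketch below is only an illustration and is not part of toupee: the helper name `list_npz_arrays` is hypothetical, and which arrays appear depends on what `load_data.py` writes. It relies on the fact that an NPZ file is a zip archive with one `.npy` member per array.

```python
import zipfile


def list_npz_arrays(npz_path):
    """Return the sorted names of the arrays stored in an NPZ archive.

    An NPZ file is a zip archive holding one `.npy` member per array,
    so the names can be read from the archive index alone.
    """
    suffix = ".npy"
    with zipfile.ZipFile(npz_path) as archive:
        return sorted(
            member[: -len(suffix)]
            for member in archive.namelist()
            if member.endswith(suffix)
        )
```

Call it with the path to any `.npz` file in the destination directory. To read the arrays themselves, `numpy.load` on the same path gives access to each array by name.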
- Model test-cases are scripted in
tests/
- Each subdirectory is a test-case.
- Each test-case contains three files: a tensorflow model defined as a yml configuration (with .model extension); a training experiment configuration file (with .yml extension); and a Python script to define evaluation metrics and the testing workflow (with .py extension).
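When adding a new test-case, it can help to check that all three files are in place before running anything. The following is a minimal sketch under that assumption; the helper name `missing_test_case_files` is hypothetical and does not exist in this repository. It only checks file extensions, not file contents.

```python
from pathlib import Path

# Extensions named in the description above: model definition,
# experiment configuration, and evaluation script.
REQUIRED_EXTENSIONS = (".model", ".yml", ".py")


def missing_test_case_files(test_case_dir):
    """Return the required extensions that have no matching file in the directory."""
    present = {p.suffix for p in Path(test_case_dir).iterdir() if p.is_file()}
    return [ext for ext in REQUIRED_EXTENSIONS if ext not in present]
```

For a complete test-case such as `tests/mnist_test`, the function returns an empty list; otherwise it returns the extensions that still need a file.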
- Use
run.py
in the root of the repository to run models; this is a wrapper for toupee's model executors in
toupee/bin
.
- Edit
run.py
to,
- set the parameters for model training; only two parameters are required, and the rest are optional:
root_path
, which is already configured to the repository root, and
params_file
, which must point to the .yml definition file in the test-case directory.
- choose the underlying toupee script to run, by calling the appropriate
run_*
function in
__main__
.
run_base_model()
is a wrapper for toupee's
bin/base_model.py
.
run_ensemble_model()
is a wrapper for toupee's
bin/ensemble.py
.
- To execute the script, after setting the parameters and calling the appropriate driver function, simply use,
python run.py
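As a rough illustration of how the two required parameters relate to each other, the sketch below assembles them as absolute paths. It is not the actual content of `run.py`: the function `build_params` and its arguments are hypothetical, and only the names `root_path` and `params_file` come from the description above.

```python
from pathlib import Path


def build_params(root_path, test_case, experiment_file):
    """Assemble the two required parameters as absolute paths.

    `root_path` and `params_file` are the parameter names described above;
    the function itself and its arguments are illustrative only.
    """
    root = Path(root_path).resolve()
    params_file = root / "tests" / test_case / experiment_file
    if params_file.suffix != ".yml":
        raise ValueError(
            f"expected a .yml experiment file, got {params_file.name}"
        )
    return {"root_path": str(root), "params_file": str(params_file)}
```

For example, `build_params(Path.cwd(), "mnist_test", "<experiment-file>.yml")` returns a dictionary whose `params_file` entry points inside `tests/mnist_test`. How these values are handed to `run_base_model()` or `run_ensemble_model()` is defined in `run.py` itself.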
Should the driver run without any errors, the following outputs can be expected for training and subsequently evaluating the test workflows.
- Use
python3
in place of
python
to execute Python scripts if you experience problems with Python version resolution between 2.x and 3.x.
- Prefer the use of absolute paths over relative ones. In the latter case, adjust the path relative to the current context.
- Preferably, use
ROOT_DIR
defined in
config.py
to reference the root of the codebase and construct the path from thereon.
- If you need to make changes to the toupee fork,
Key changes made in toupee (incomplete list)
- Model definitions changed, in-pipeline, from YAML to JSON to adapt to newer versions of tensorflow and keras. No input/output changes.
- Environment dependencies and version-related issues fixed. Sandboxed and included as dependency files for easy reproduction.
- File paths configured to be referenced from codebase root.
- Wrapper developed for training experiments to integrate a fork of toupee as a submodule of the working codebase.