SRI's implementation of CMA code for DARPA AIE-CriticalMAAS TA3

SRI DARPA AIE - CriticalMAAS TA3

Built with PyTorch Lightning; configured with Hydra.

DEMO

[Demo video]

Background

Key Tools

PyTorch - an open-source deep learning framework primarily developed by Facebook's AI Research lab (FAIR). It provides a flexible, dynamic computational-graph model, making it popular among researchers and developers for building and training deep neural networks.

PyTorch Lightning - a lightweight PyTorch wrapper that simplifies the process of building, training, and deploying complex deep learning models. It provides high-level abstractions that eliminate boilerplate code, letting researchers and practitioners focus on experimenting with and improving their models rather than on low-level implementation details.

Hydra - a framework for elegantly configuring complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
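As a hypothetical illustration of that composition (the group and option names below are invented for this sketch, not necessarily this repo's actual configs), a top-level train.yaml might look like:

```yaml
# hypothetical train.yaml sketch -- group/option names are illustrative
defaults:
  - data: maniac_mini      # would select configs/data/maniac_mini.yaml
  - model: default         # would select configs/model/default.yaml
  - trainer: default
  - logger: wandb

seed: 42
```

Any composed value can then be overridden from the command line, e.g. `python sri_maper/src/train.py trainer.max_epochs=20`.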

Project Structure

The directory structure looks like this:

├── data                   <- Project data
│   └── raster_libraries      <- Folder holding sets of individual rasters per CMA
│       ├── maniac_mini_raster_library  <- Raster library for maniac_mini example
│       └── ...
├── docker                 <- Docker scripts to build images / run containers
│
├── logs                   <- Logs generated by hydra and lightning loggers
├── sri_maper              <- Primary source code folder for MAPER
│   ├── ckpts                 <- Optional folder to hold pretrained models (if not in logs)
│   │
│   ├── configs                 <- Hydra configs
│   │   ├── callbacks               <- Callbacks configs
│   │   ├── data                    <- Data configs
│   │   ├── debug                   <- Debugging configs
│   │   ├── experiment              <- Experiment configs
│   │   ├── extras                  <- Extra utilities configs
│   │   ├── hparams_search          <- Hyperparameter search configs
│   │   ├── hydra                   <- Hydra configs
│   │   ├── logger                  <- Logger configs
│   │   ├── model                   <- Model configs
│   │   ├── paths                   <- Project paths configs
│   │   ├── preprocess              <- Preprocessing configs
│   │   ├── trainer                 <- Trainer configs
│   │   │
│   │   ├── __init__.py        <- python module __init__
│   │   ├── test.yaml          <- Main config for testing
│   │   └── train.yaml         <- Main config for training
│   │
│   ├── notebooks              <- Jupyter notebooks
│   │
│   ├── src                    <- Source code
│   │   ├── data                    <- Data code
│   │   ├── models                  <- Model code
│   │   ├── utils                   <- Utility code
│   │   │
│   │   ├── __init__.py         <- python module __init__
│   │   ├── map.py              <- Run mapping via CLI
│   │   ├── pretrain.py         <- Run pretraining via CLI
│   │   ├── test.py             <- Run testing via CLI
│   │   └── train.py            <- Run training via CLI
│   │
│   ├── __init__.py        <- python module __init__
│
├── .gitignore                <- List of files ignored by git
├── LICENSE.txt               <- License for code repo
├── project_vars.sh           <- Project variables for infrastructure
├── setup.py                  <- File for installing project as a package
└── README.md

Installation

This repo can run locally, in docker locally, or in docker on a Kubernetes cluster. Please follow the corresponding instructions carefully so the installation goes smoothly. Once you are familiar with the structure, you can make changes. NOTE - the repo currently DEPENDS on live CDR and StatMagic instances to receive the inputs necessary to run as a server.

Local install and run

This setup presents the easiest installation but is more brittle than using docker containers. Please make a virtual environment of your choosing, source the environment, clone the repo, and install the code using setup.py. Below are example commands to do so.

# creates and activates virtual environment
conda create -n [VIRTUAL_ENV_NAME] python=3.10
conda activate [VIRTUAL_ENV_NAME]
# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
# sets environment variables
source project_vars.sh
# installs from source code
python3 -m pip install -e .

If installation succeeded without errors, you should be able to run the SRI TA3 server. Skip to SRI TA3 server.

Install with docker container that is run locally

This setup is slightly more involved but, by using docker, is more robust across physical devices. We've written convenience bash scripts to make building and running the docker container much easier. First, clone the repo locally.

# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3

Next, edit the variables in project_vars.sh relevant to your use case. Typically, one needs to edit JOB_TAG, REPO_HOST, DUSER, WANDB_API_KEY, CDR_TOKEN, CDR_HOST, and NGROK_AUTHTOKEN. After editing project_vars.sh, build and run the docker image. Below are example commands to do so using the convenience scripts.

# builds docker image (installing source in image) and pushes to docker repo
bash docker/run_docker_build_push.sh
# runs docker image
bash docker/run_docker_local.sh
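For reference, the project_vars.sh variables mentioned above might be set like this; the variable names come from this repo, but every value below is an invented placeholder (and whether the file uses `export` syntax is an assumption):

```shell
# example project_vars.sh values -- all placeholders, substitute your own
export JOB_TAG="cma-dev"                  # tag identifying your job/images
export REPO_HOST="docker.example.com"     # docker registry host
export DUSER="eXXXXX"                     # registry username
export WANDB_API_KEY="your-wandb-key"     # Weights & Biases API key
export CDR_TOKEN="your-cdr-token"         # token for the CDR instance
export CDR_HOST="https://cdr.example.com" # CDR instance URL
export NGROK_AUTHTOKEN="your-ngrok-token" # ngrok tunnel auth token
```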

Optionally, if you would like to replace the default logs and data folders within this repo (which are empty) with existing ones (e.g. on the datalake) that already contain logs and data, simply mount the corresponding datalake folders over the empty logs and data folders within this repo. Below are example commands to do so.

sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/existing/logs ./logs
sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/existing/data ./data

If installation succeeded without errors, you should be able to run the SRI TA3 server. Skip to SRI TA3 server.

Install with docker container that is run on the SRI International Kubernetes cluster

This setup is slightly more involved but, by using docker and Kubernetes, scales to more compute. First, we'll need to prepare some folders on the datalake to contain your data, code, and logs. Under the criticalmaas-ta3 folder (namespace) within the vt-open datalake, make the following directory structure for your use, named with your employee ID number (i.e. eXXXXX). NOTE - you only need to make the folders whose comment says CREATE; the others should already exist. Be careful not to corrupt the folders of other users or namespaces.

vt-open
├── ... # other folders for other namespaces - avoid
├── criticalmaas-ta3 # top-level of criticalmaas-ta3 namespace
│   └── k8s # contains criticalmaas-ta3 code & logs for ALL users - (k8s READ & WRITE)
│       ├── eXXXXX # folder you should CREATE to contain your code & logs
│       │   ├── code # folder you should CREATE to contain your code
│       │   ├── data # folder you should CREATE to contain your data
│       │   └── logs # folder you should CREATE to contain your logs
│       └── ... # other folders for other users - avoid
└── ... # other folders for other namespaces - avoid
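Assuming the datalake is mounted and your working directory is the k8s folder shown above, the CREATE-marked folders can be made in one step (eXXXXX is a placeholder for your employee ID):

```shell
# run from vt-open/criticalmaas-ta3/k8s on the datalake;
# replace eXXXXX with your employee ID number
mkdir -p eXXXXX/code eXXXXX/data eXXXXX/logs
```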

Next you will need to mount the code folder above locally. By mounting the code folder on the datalake locally, your local edits to source code will be reflected in the datalake, and therefore, on the Kubernetes cluster.

# makes a local code folder
mkdir k8s-code
# mount the datalake folder that hosts the code (Kubernetes will have access)
sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/vt-open/criticalmaas-ta3/k8s/${USER}/code ./k8s-code

Last, we'll install the repo. We've written convenience bash scripts to make building and running the docker container much easier. Start by cloning the repo locally.

# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3

Next, edit the variables in project_vars.sh relevant to your use case. Typically, one needs to edit JOB_TAG, REPO_HOST, DUSER, WANDB_API_KEY, CDR_TOKEN, CDR_HOST, and NGROK_AUTHTOKEN. After editing project_vars.sh, build and run the docker image. Below are example commands to do so using the convenience scripts.

# builds docker image (installing source in image) and pushes to docker repo
bash docker/run_docker_build_push.sh
# runs docker image
bash docker/run_docker_local.sh

If installation succeeded without errors, you should be able to run the SRI TA3 server. Skip to SRI TA3 server.

SRI TA3 server

Assuming the installation above succeeded, you should now be in a bash terminal from which you can run the SRI TA3 server. Run the following to start the server:

python sri_maper/src/server.py

If the server runs successfully, it will register with the configured CDR instance and then wait for mineral assessment job requests to be made via the StatMagic instance. The output should be similar to the following:

Registering with CDR
Starting TA3 server
INFO:     Started server process [9378]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:80 (Press CTRL+C to quit)

You can now start mineral assessments by interacting with the StatMagic GUI at https://statmagic.mtri.org/

Below is a video demonstrating how the SRI TA3 server processes a mineral assessment job initiated from the StatMagic GUI:

[Demo video]
