A project to build a robust method for detecting foreground objects (caribou) in camera trap footage recorded in the Alaskan wilderness. Our main approach uses FgSegNet, a robust background subtraction model with a triple-CNN encoder-decoder architecture.
Ensure your computing system has at least 10 GB of GPU memory (VRAM) so FgSegNet won't crash! You can reduce FgSegNet's batch size if you have limited resources.
- Install the latest version of Anaconda.
  - Anaconda is used to quickly manage the dependencies the project's code requires.
- Install `labelme` on the machine where you will annotate sample images.
  - `labelme` is used to manually generate training data.
- Clone this project onto your machine (using git, or manually as a `.zip`):

  ```
  git clone https://github.com/richardso21/SERP2021-BGSUB.git
  ```
- Use your Anaconda installation to create a new Python environment for the project's code to run in:

  ```
  cd SERP2021-BGSUB
  conda env create --file environment.yml
  conda activate bgsub
  ```
- Replace the existing `pyramids.py` of the scikit-image library in the environment with the patched file in this repository. This neat command will do this for you:

  ```
  cp pyramids.py $(conda info | grep "active env location" | cut -d ' ' -f 9)/lib/python3.6/site-packages/skimage/transform/pyramids.py
  ```

  - Run this command with `sudo` if you have a permission error.
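  If the `conda info` one-liner above does not parse correctly in your shell, you can find the destination directory from inside the activated environment instead:

  ```python
  # Print the directory that holds scikit-image's transform module,
  # i.e. where the patched pyramids.py should be copied to.
  import os
  import skimage.transform

  print(os.path.dirname(skimage.transform.__file__))
  ```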
NOTE: Your raw image data should be in the `.JPG` format for the project code to work properly.
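If some of your raw images are in another format (e.g. `.png`), a small conversion pass like the sketch below, which assumes Pillow is available in the environment, will bring them in line:

```python
# Sketch: convert non-JPG images in a folder to .JPG (assumes Pillow is installed).
from pathlib import Path
from PIL import Image

src_dir = Path("data/annotated")  # adjust to wherever your raw images live

for img_path in src_dir.glob("*"):
    if img_path.suffix.lower() in {".png", ".bmp", ".tif", ".tiff"}:
        Image.open(img_path).convert("RGB").save(img_path.with_suffix(".JPG"), "JPEG", quality=95)
```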
- Use `labelme` to draw polygonal annotations of foreground objects for a pool of foreground-positive image samples (a quick check of the resulting annotation files is sketched below).
  - Your images should be contained in the directory named `data/annotated` at the root of this repository.
  - The recommended number of images to annotate is ≈200, although you can annotate fewer or more according to your data or needs.
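  `labelme` writes one `.json` file per annotated image, holding the polygon points for each labeled shape. A sketch for summarizing them before generating masks, assuming the default labelme JSON layout:

  ```python
  # Summarize labelme annotations in data/annotated (sketch, assuming default labelme JSON output).
  import json
  from pathlib import Path

  for json_path in sorted(Path("data/annotated").glob("*.json")):
      with open(json_path) as f:
          annotation = json.load(f)
      labels = [shape["label"] for shape in annotation.get("shapes", [])]
      print(f"{json_path.name}: {len(labels)} polygon(s), labels = {labels}")
  ```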
- Run `mask.py` (`python mask.py`) in the `data/annotated` directory which contains the images.
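  For intuition, the core of such a polygon-to-mask conversion looks roughly like the sketch below (illustrative only; the repository's `mask.py` is the authoritative implementation):

  ```python
  # Sketch: rasterize labelme polygon annotations into binary masks with Pillow.
  import json
  from pathlib import Path
  from PIL import Image, ImageDraw

  for json_path in sorted(Path("data/annotated").glob("*.json")):
      with open(json_path) as f:
          annotation = json.load(f)
      width, height = annotation["imageWidth"], annotation["imageHeight"]
      mask = Image.new("L", (width, height), 0)  # black background
      draw = ImageDraw.Draw(mask)
      for shape in annotation["shapes"]:
          draw.polygon([tuple(p) for p in shape["points"]], fill=255)  # white foreground
      mask.save(json_path.with_name(json_path.stem + "_mask.png"))
  ```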
- (Recommended) Additionally, run `resize.py` to shrink the raw/label image data if the images are very large, since oversized inputs can also cause FgSegNet to crash when attempting to train.
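  A downscaling pass along these lines keeps memory usage manageable (a generic sketch with Pillow, not the repository's `resize.py`):

  ```python
  # Sketch: downscale oversized images in place (illustrative; resize.py in the repo is authoritative).
  from pathlib import Path
  from PIL import Image

  MAX_SIDE = 1024  # hypothetical target size; pick one your GPU can handle

  for img_path in Path("data/annotated").glob("*.JPG"):
      img = Image.open(img_path)
      if max(img.size) > MAX_SIDE:
          img.thumbnail((MAX_SIDE, MAX_SIDE))  # preserves aspect ratio
          img.save(img_path, "JPEG", quality=95)
  ```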
- Check and tune the hyperparameters (variables) in the `FgSegNet.py` script; an illustrative example follows this list.
  - You can change the `scene` name to differentiate between models for different image sets. (The scene name defaults to "BGSUB".)
  - Change `num_frames` to match the number of annotated image samples.
  - Alter `batch_size` according to your computing resources (a smaller batch size requires fewer resources).
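  The variable names below come from the script; the values are only examples, so adapt them to your dataset and hardware:

  ```python
  # Example hyperparameter settings in FgSegNet.py (illustrative values, not defaults).
  scene = "BGSUB"    # identifier for this image set; the trained model is saved as mdl_<scene>.h5
  num_frames = 200   # number of annotated image samples in data/annotated
  batch_size = 2     # lower this if you run out of GPU memory during training
  ```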
- Run `FgSegNet.py` in the `FgSegNet` directory. It will automatically use your annotated image data to train a new model.
  - FgSegNet will generate a lot of terminal output, mainly for debugging and progress-tracking purposes.
  - If FgSegNet trains successfully, a new `models` directory will contain the model file in the format `mdl_<scene_name>.h5`.
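  To confirm the saved model loads after training, a quick check like this can help (a sketch with `scene` left at its "BGSUB" default; FgSegNet defines custom layers/losses, so you may need to pass `custom_objects` to `load_model`):

  ```python
  # Sketch: verify that the trained FgSegNet model file can be loaded.
  from keras.models import load_model

  model = load_model("models/mdl_BGSUB.h5", compile=False)  # compile=False skips the custom training loss
  model.summary()  # prints the encoder-decoder architecture and expected input shapes
  ```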
- Collect a pool of foreground-negative and foreground-positive images.
  - The recommended amount is > 2000 images for both types combined.
  - The proportion of foreground-negative to foreground-positive images depends on how frequently foreground objects appear in your image dataset.
- Run `FgSegNet_generate.py` in the `FgSegNet` directory to convert these raw images into FgSegNet black/white masks.
  - Modify the `scene` variable to select the correct model for your specific dataset.
  - Outputs of the FgSegNet model will be stored in the `data/classifier/FgSegNet_O` directory.
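  A quick way to eyeball the generated masks is to report the fraction of foreground (white) pixels per mask; this sketch assumes the outputs are ordinary image files:

  ```python
  # Sketch: report the foreground (white) pixel fraction of each generated mask.
  from pathlib import Path
  import numpy as np
  from PIL import Image

  for mask_path in sorted(Path("data/classifier/FgSegNet_O").glob("*")):
      mask = np.array(Image.open(mask_path).convert("L"))
      print(f"{mask_path.name}: {(mask > 127).mean():.1%} foreground")
  ```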
- Run `presence_classifier.py` in the `FgSegNet` directory to train an external convolutional neural network that predicts the probability of foreground being present in a given FgSegNet mask.
  - The trained model and additional data will be stored in `FgSegNet/classifier`.
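  For intuition, a mask-presence classifier of this kind can be as small as the model below; this is a generic sketch of a binary classifier on mask images, not the architecture defined in `presence_classifier.py`:

  ```python
  # Generic sketch of a "foreground present?" classifier on mask images
  # (illustrative only; presence_classifier.py defines the actual model).
  from keras.models import Sequential
  from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

  model = Sequential([
      Conv2D(16, (3, 3), activation="relu", input_shape=(128, 128, 1)),  # 128x128 grayscale masks (assumed size)
      MaxPooling2D((2, 2)),
      Conv2D(32, (3, 3), activation="relu"),
      MaxPooling2D((2, 2)),
      Flatten(),
      Dense(64, activation="relu"),
      Dense(1, activation="sigmoid"),  # probability that foreground is present
  ])
  model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
  ```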
The trained FgSegNet and complementary predictor models can be used to evaluate further image data from the same dataset/scene. For instance, they can be built into a pipeline that aids researchers in the annotation process.
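Schematically, such a pipeline feeds each generated FgSegNet mask to the presence classifier and only surfaces frames that score above a threshold for human review. The sketch below makes several assumptions (the classifier file name inside `FgSegNet/classifier` and its expected input size are hypothetical), so adapt it to the actual outputs of `presence_classifier.py`:

```python
# Sketch: flag masks the presence classifier scores above a threshold so that
# only likely foreground-positive frames are queued for human review.
from pathlib import Path
import numpy as np
from PIL import Image
from keras.models import load_model

classifier = load_model("FgSegNet/classifier/model.h5")  # hypothetical file name; use the file presence_classifier.py actually saves
THRESHOLD = 0.5

for mask_path in sorted(Path("data/classifier/FgSegNet_O").glob("*")):
    mask = Image.open(mask_path).convert("L").resize((128, 128))  # resize to the classifier's (assumed) input size
    batch = np.asarray(mask, dtype="float32")[np.newaxis, ..., np.newaxis] / 255.0
    probability = float(classifier.predict(batch)[0][0])
    if probability >= THRESHOLD:
        print(f"{mask_path.name}: likely foreground (p = {probability:.2f})")
```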
I would like to thank Dr. Michael I Mandel from Brooklyn College CUNY as well as Dr. John Davis from Staten Island Technical High School for assisting, advising, and supervising me throughout this project.