
Convolutional Neural Networks and Microplants

field_classification repository

The code in this repository uses Convolutional Neural Networks (CNNs) in TensorFlow/Keras to classify images of two sets of plant species (e.g. the morphologically similar plant families Lycopodiaceae and Selaginellaceae, or two species of the genus Frullania) based on the available corpus of images. Scripts are available to download and preprocess images. The CNN and classification programs are generic enough to accept images of any species of plants (or other objects!).


Setup

  1. Clone the repository to your local machine.
  2. Confirm you have the necessary Python version and packages installed (see Environment section below).
  3. Prepare two sets of images, each within a directory named for its class. Put these class folders together inside one parent directory that contains no other image files, e.g.:
training_image_folder
└───species_a
└───species_b
  4. If you have TIF images, use the script utilities\image_processing\tif_to_jpg.py to quickly convert them. (The TIF files will be moved to a new subdirectory called tif.) A sketch of the conversion appears after this list.
  5. Prepare a separate group of test images, either manually or with the utility script utilities\image_processing\create_test_group.py (sketched after this list).
    • By default, this script creates a split of 90% for training/validation and 10% for testing. It creates copies of the images in four new directories -- folder1test, folder1train, folder2test, and folder2train.
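
A minimal sketch of the step-4 TIF conversion, assuming Pillow (the folder path below is a placeholder; see utilities\image_processing\tif_to_jpg.py for the real script):

  import pathlib
  from PIL import Image

  folder = pathlib.Path('training_image_folder/species_a')  # placeholder path
  (folder / 'tif').mkdir(exist_ok=True)
  for tif_file in folder.glob('*.tif'):
      # JPEG has no alpha channel, so convert to RGB before saving
      Image.open(tif_file).convert('RGB').save(tif_file.with_suffix('.jpg'), 'JPEG')
      tif_file.rename(folder / 'tif' / tif_file.name)  # move original into tif/

And a rough sketch of the step-5 train/test split (illustrative only; the real logic lives in create_test_group.py, and the source folder names are placeholders):

  import pathlib, random, shutil

  for i, src in enumerate(['species_a', 'species_b'], start=1):  # placeholder folders
      images = sorted(pathlib.Path(src).glob('*.jpg'))
      random.shuffle(images)
      cut = int(0.9 * len(images))  # 90% train/validation, 10% test
      for dest_name, subset in ((f'folder{i}train', images[:cut]),
                                (f'folder{i}test', images[cut:])):
          dest = pathlib.Path(dest_name)
          dest.mkdir(exist_ok=True)
          for img in subset:
              shutil.copy(img, dest / img.name)  # copies; originals untouched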

Environment

This code has been tested with Python 3.9.4 on Windows and Ubuntu, using Anaconda for virtual environments. Please consult requirements.txt or the list below for the necessary Python packages.

Tested Package Versions:

  • tensorflow 2.5.0-rc3 (Release v1.0 and earlier are compatible with TensorFlow 1.15.0)
  • matplotlib~=3.4.2
  • numpy==1.19.5
  • opencv-python~=4.5.2.52
  • pandas==1.2.4
  • scikit-learn==0.24.2
  • pillow~=8.2.0
  • scipy~=1.6.2
  • requests~=2.25.1
  • scikit-image~=0.18.1
  • augmentor~=0.2.8
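
For example, a typical environment setup with Anaconda (the environment name below is arbitrary):

  conda create -n field_classification python=3.9
  conda activate field_classification
  pip install -r requirements.txt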

Workflow

  1. Run train_models_image_classification.py or train_handwriting_model.py, using arguments to specify image sets and hyper-parameters.
  • Arguments: (-h flag for full details)
    • training_set (positional, required) - file path of the directory that contains the training images (e.g. training_image_folder as described in the Setup section.)
    • height (positional, required) - desired image size (if the optional -w argument is not provided, images will be loaded as height x height square)
    • -w - image width for non-square images
    • (-color, -bw) - boolean flag for the number of color channels (RGB or K, i.e. grayscale; default = color)
    • -lr - learning rate value (decimal number, default = 0.001)
    • -f - number of folds for cross-validation (used in the example execution below)
    • -e - number of epochs per fold (integer >= 5, default = 25)
    • -b - batch size for updates (integer >= 2, default=64)
    • -cls - number of classes (integer >= 2, default=2)
  • Weights:
    • Class weights are set in model_training.py, lines 33-43 and line 51.
    • Uncomment the line with the desired weights.
    • To train without weighting, comment out:
      • Lines 33-43 (optional)
      • Line 51 (class_weight=self.class_weight)
    • An illustrative sketch of class weighting follows this step.
  • Output:
    • Directory saved_models is created in current working directory, which will contain one model file after training (CNN_1.model).
    • Directory graphs is created in current working directory, which will contain all generated graphs/plots for each run, plus a CSV summary for each fold.
      • Note: This directory will be empty after CTC model training.
  • Example execution (CNN): python train_models_image_classification.py training_images 128 -color -lr 0.005 -f 10 -e 50 -b 64 -cls 2 > species_a_b_training_output.txt &
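
An illustrative sketch of how class weights typically enter a Keras training call (this is not the repository's model_training.py, and the label counts are hypothetical):

  import numpy as np
  from sklearn.utils.class_weight import compute_class_weight

  labels = np.array([0] * 900 + [1] * 300)  # hypothetical imbalanced training labels
  weights = compute_class_weight(class_weight='balanced',
                                 classes=np.unique(labels), y=labels)
  class_weight = dict(enumerate(weights))   # e.g. {0: 0.67, 1: 2.0}

  # model.fit(train_images, train_labels, epochs=25, batch_size=64,
  #           class_weight=class_weight)    # omit class_weight to disable weighting

Passing class_weight scales each sample's loss by its class weight, which counteracts class imbalance during training.
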
  2. After training finishes, use the model file(s) to classify the test set images. The number of predictions generated = (# of test images) x (# of model files). A sketch of the voting logic follows this list. To run classify_images_by_vote.py:
  • Arguments: (-h flag for full details)
    • images (positional, required) - file path of a directory containing the test image folders
    • models (positional, required) - a single model file, or a folder of models (e.g. saved_models in working directory)
    • height (positional, required) - desired image size (if the optional -w argument is not provided, images will be loaded as height x height square)
    • -w - image width for non-square images
    • (-color, -bw) - boolean flag for number of color channels (RGB or K) (default = color)
  • Output:
    • Directory predictions is created if needed, and the predictions are saved as a CSV file named with a timestamp prefix (yyyy-mm-dd-hh-mm-ss) followed by model_vote_predict.csv.
  • Example execution (CNN only): python classify_images_by_vote.py test_images saved_models 128 -color
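
A rough sketch of the majority-vote idea (assumed behavior, not the actual classify_images_by_vote.py; test_images below is a placeholder batch):

  import glob
  import numpy as np
  from tensorflow import keras

  # Load every saved model from the models folder
  models = [keras.models.load_model(path) for path in glob.glob('saved_models/*.model')]

  # Placeholder batch; in practice, load and scale the real test images
  test_images = np.zeros((4, 128, 128, 3), dtype=np.float32)

  # Each model casts one vote (its predicted class index) per image
  votes = np.stack([np.argmax(m.predict(test_images), axis=1) for m in models])
  # The final prediction per image is the most common vote across models
  majority = np.array([np.bincount(v).argmax() for v in votes.T])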

Repository layout

Folders:
  • / - Contains the main files used for training and testing models.
  • /data_visualization - Contains the files for generating and saving graphs/data visualizations after training. Creates a graphs directory, if it doesn't already exist.
  • /labeled_images - Contains the files for loading image sets.
  • /models - Contains the files used to define neural network layer architectures.
  • /utilities - Contains image preprocessing scripts, a simple program timer, and archived files.
Files:
  • train_models_image_classification.py
    • The main training program for image classification -- see Workflow above.
  • train_handwriting_model.py
    • The main training program for RNN/CTC handwriting digitization -- see Workflow above.
  • classify_images_by_vote.py
    • The main testing program for image classification -- see Workflow above.

Contributors and licensing

This code has been developed by Beth McDonald (emcdona1, Field Museum, former NEIU), Sean Cullen (SeanCullen11, NEIU) and Allison Chen (allisonchen23, UCLA).

This code was developed under the guidance of Dr. Matt von Konrat (Field Museum), Dr. Francisco Iacobelli (fiacobelli, NEIU), Dr. Rachel Trana (rtrana, NEIU), and Dr. Tom Campbell (NEIU).

This project was made possible thanks to the Grainger Bioinformatics Center at the Field Museum.

This project has been created for use in the Field Museum Gantz Family Collections Center, under the direction of Dr. Matt von Konrat, Head of Botanical Collections at the Field.

Please contact Dr. von Konrat for licensing inquiries.