- Author(s): Taissir Boukrouba
- Affiliation: UHBC (Hassiba Benbouali University of Chlef)
- Date: 07/2022
- Project Overview
- Importance
- Document Control
- Installation
- ViT Model Implementation
- Flutter App Implementation
- Project Limitations
This project focuses on the creation and testing of a personalized dataset consisting of 41 Algerian plant species. The primary goal is to apply state-of-the-art Vision Transformers (ViT), inspired by the Transformer architecture from Natural Language Processing (NLP), to achieve accurate multi-class image classification of these plants. Unlike traditional convolutional neural networks (CNNs), which rely on localized feature extraction, Vision Transformers offer a more global perspective, potentially enhancing classification accuracy in complex, multi-class datasets. By leveraging ViTs and their self-attention mechanisms, this project explores the model's ability to handle visual data, particularly in distinguishing between multiple plant species. The trained model was integrated into a Flutter mobile application named "NEBTA", designed in Figma, to provide an easy-to-use interface for plant identification. The app uses the phone's camera, enabling real-time application of the Vision Transformer model to classify plant species directly from live images.
The multi-class classification of Algerian plants using Vision Transformers offers diverse practical applications. In agriculture, it can assist with automated identification of crop species (crop monitoring) and help predict crop yields by analyzing the health and growth stages of different plant species (yield prediction). For environmental management, it supports tracking changes in plant populations, providing insights into ecosystem dynamics (biodiversity conservation), and controlling invasive species. It also plays a role in pharmaceutical research by aiding medicinal plant identification and quality control. Beyond this, the model can be applied in forestry management and smart farming, and accurate plant data supports informed decision-making in land use planning (GIS mapping), conservation zoning, and habitat restoration projects, promoting sustainable practices. Finally, the dataset and classification model serve as valuable resources for academic research in botany, ecology, and related fields.
This project follows a well-organized directory structure to streamline development and ensure efficient resource management. The `/app` directory contains the core Flutter application files, including the source code and necessary configurations for the plant classification app. Documentation is housed in the `/documentation` folder, offering comprehensive explanations of the methodology and the steps involved. The `/design` folder stores Figma design files covering the app's interface, which includes 14 pages and a 3D logo with a green and red color theme. Jupyter notebooks used for tasks such as data exploration, preprocessing, and model training are located in the `/notebooks` folder. Finally, the `requirements.txt` file lists all the Python libraries and dependencies required for running the machine learning components, ensuring seamless project execution and collaboration.
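Putting the pieces described above together, the repository layout looks roughly like this (exact contents may differ):

```text
Muti-Class-Plant-Classification-Using-ViT/
├── app/              # Flutter application source and configuration
├── documentation/    # In-depth methodology and step-by-step docs
├── design/           # Figma design files (14 pages, 3D logo)
├── notebooks/        # Jupyter notebooks: exploration, preprocessing, training
└── requirements.txt  # Python dependencies for the ML components
```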
To save time executing the project, it is highly advisable to have access to capable CPU and GPU resources. Using a computer equipped with high-performance GPUs, or opting for platforms like Colab Pro with premium GPU options such as the NVIDIA A100, L4, or T4, can significantly speed up training and inference. These computing resources are essential for efficiently handling the techniques and heavy computations required by Vision Transformers, ensuring faster and more accurate results.
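Assuming a TensorFlow-based setup (the model is later exported to `.tflite`, which suggests TensorFlow), a quick sanity check that a GPU is actually visible before starting training:

```python
import tensorflow as tf

# List GPUs visible to TensorFlow; an empty list means training will fall back to CPU
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs available: {gpus}")
```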
```bash
# cloning the repository
git clone https://github.com/taissirboukrouba/Muti-Class-Plant-Classification-Using-ViT.git

# changing to the project's directory
cd Muti-Class-Plant-Classification-Using-ViT

# installing the required libraries
pip install -r requirements.txt
```
Important
For more in-depth documentation, please refer to the `/documentation` directory.
The dataset utilized in this approach was gathered through a collaborative effort involving my colleagues and experts from the National Plant Protection Institute, Labiod Medjadja, located in Chlef, Algeria. This collaboration was crucial in ensuring the dataset’s quality and diversity, as it leveraged the expertise of seasoned professionals in the field of plant science. The collection process spanned several months and involved field visits to various locations, including the institute's experimental fields and surrounding farms, to document and capture images of native Algerian plant species.
The resulting dataset comprises 5,146 high-resolution images representing 41 distinct species of Algerian plants. These plants were meticulously identified and categorized by the experts, ensuring accurate labeling and classification for the machine learning model. The following table summarizes the dataset:
| Attribute | Details |
|---|---|
| Size | 604.1 MB |
| Number of classes | 41 different plant classes |
| Total number of instances | 5,146 pictures |
| Average number of images per class | About 125 pictures |
| Dataset source | The National Plant Protection Institute, Labiod Medjadja |
| Examples | Asphodel, Pot Marigold, Baby Sun Rose, Rapeseed |
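Assuming the images are organized one folder per class, a short script can verify these counts (the `dataset/` path is hypothetical):

```python
from pathlib import Path

dataset = Path("dataset/")  # hypothetical class-per-folder layout

# Count the images inside each class folder
counts = {d.name: sum(1 for f in d.iterdir() if f.is_file())
          for d in dataset.iterdir() if d.is_dir()}
print(f"{len(counts)} classes, {sum(counts.values())} images in total")
print(f"average per class: {sum(counts.values()) / len(counts):.1f}")
```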
After collecting the dataset of plant images, we now move to a crucial phase in the data processing pipeline: importing, preprocessing, and preparing the data for classification using Vision Transformers (ViT). This phase is essential for structuring the data and ensuring that the model can effectively learn from it. To achieve accurate plant classification, the pipeline is broken down into multiple steps, each designed to handle a different aspect of data processing and modeling. These steps are as follows (a sketch of the data loading and augmentation steps appears after the list):
- Importing & Batching Data
- Hyperparameter Tuning
- Data Augmentation
- Vision Transformers
- Model Deployment
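As a concrete illustration of the importing, batching, and augmentation steps, here is a minimal sketch assuming a Keras workflow and a class-per-folder dataset layout (the directory name, image size, and batch size are illustrative assumptions, not the project's exact settings):

```python
import tensorflow as tf

IMG_SIZE = 224     # assumed input resolution for the ViT
BATCH_SIZE = 32    # assumed batch size

# Load images from a class-per-folder directory, splitting off 20% for validation
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/",                # hypothetical path to the 41-class image folders
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
)

# Simple augmentation pipeline to compensate for the small dataset
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# Apply augmentation to training batches only, and prefetch for throughput
train_ds = train_ds.map(
    lambda x, y: (augment(x, training=True), y),
    num_parallel_calls=tf.data.AUTOTUNE,
).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(tf.data.AUTOTUNE)
```

Augmentation is applied only to the training split so that validation metrics reflect unmodified images.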
Note
The following image illustrates results from only the first 60 epochs, not the complete 85-epoch run whose final results are reported below.
The model's performance over the course of training reveals a significant improvement in accuracy and a notable reduction in loss. In the first epoch, the model recorded an accuracy of only 5.53% with a loss of 4.5985, indicating that it struggled to correctly classify the images. However, by the second epoch, there was a marked improvement, with accuracy rising to 9.14% and loss decreasing to 3.5938. This upward trend continued throughout the training process.
By the final epoch (epoch 85), the model achieved an accuracy of 91.21% and a loss of 0.2832. This substantial increase in accuracy demonstrates that the model effectively learned to classify the plant species in the dataset, indicating that the implemented techniques, including data augmentation and the Vision Transformer architecture, were successful. The low final loss value further suggests that the model is not only accurate but also well-calibrated and able to generalize to unseen data. Overall, these results highlight the effectiveness of the training strategy and the robustness of the model in handling the complexities of the dataset.
The model's overall performance was good, achieving acceptable accuracy, but some mispredictions occurred where certain plants were confused with similar-looking species. These misclassifications are likely due to visual similarities between plant species that share common features, such as leaf shapes, textures, or color patterns, especially given the small dataset. This suggests that while the model effectively learned to distinguish the majority of classes, subtle differences between some plants proved challenging. Such mispredictions are not uncommon in plant classification tasks, where even human experts may struggle to differentiate between visually similar species. Further fine-tuning of the model, additional training data, or the inclusion of more distinguishing features could help improve accuracy and reduce these errors.
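To make the architecture discussion concrete, below is a minimal, self-contained Keras sketch of a ViT classifier for this 41-class task. The patch size, embedding width, depth, head count, and learning rate are illustrative assumptions, not the project's actual hyperparameters:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 41                         # number of plant species
IMG_SIZE, PATCH_SIZE = 224, 16           # assumed input and patch sizes
NUM_PATCHES = (IMG_SIZE // PATCH_SIZE) ** 2
EMBED_DIM, NUM_HEADS, DEPTH = 128, 4, 6  # assumed hyperparameters


class AddPositionEmbedding(layers.Layer):
    """Adds a learnable positional embedding to each patch token."""

    def __init__(self, num_patches, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.pos_embedding = layers.Embedding(num_patches, embed_dim)
        self.num_patches = num_patches

    def call(self, x):
        positions = tf.range(self.num_patches)
        return x + self.pos_embedding(positions)


def build_vit():
    inputs = layers.Input((IMG_SIZE, IMG_SIZE, 3))
    x = layers.Rescaling(1.0 / 255)(inputs)
    # Patch embedding: a strided convolution cuts the image into 16x16 patches
    x = layers.Conv2D(EMBED_DIM, PATCH_SIZE, strides=PATCH_SIZE)(x)
    x = layers.Reshape((NUM_PATCHES, EMBED_DIM))(x)
    x = AddPositionEmbedding(NUM_PATCHES, EMBED_DIM)(x)
    # Transformer encoder blocks: self-attention then an MLP, both residual
    for _ in range(DEPTH):
        h = layers.LayerNormalization()(x)
        h = layers.MultiHeadAttention(NUM_HEADS, EMBED_DIM // NUM_HEADS)(h, h)
        x = layers.Add()([x, h])
        h = layers.LayerNormalization()(x)
        h = layers.Dense(EMBED_DIM * 2, activation="gelu")(h)
        h = layers.Dense(EMBED_DIM)(h)
        x = layers.Add()([x, h])
    # Pool the patch tokens and classify into the 41 species
    x = layers.LayerNormalization()(x)
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model = build_vit()
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=85)
```

The `sparse_categorical_crossentropy` loss matches the integer labels produced by `image_dataset_from_directory` in the earlier data pipeline sketch.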
For the design of the app, I used Figma and opted for a minimalist, plant-oriented aesthetic to maintain clarity and focus on the app’s functionality. The design consists of over 14 thoughtfully crafted pages, including a custom 3D logo that aligns with the overall green and red theme used throughout the app interface. The design integrates these colors seamlessly into both the logo and the various UI elements for a cohesive look and feel.
I created 5 onboarding pages to introduce new users to the app's features and functionality. The home page features a scrollable list of detectable plants displayed in a card format, where users can tap on a plant to reveal additional information. A search bar is also integrated on the home page, allowing users to input the name of a plant and instantly access relevant details.
In addition, there is a "Tips" page offering guidance on how to use the app, plant care tips, and other useful information for plant enthusiasts. A user profile page is included, featuring a profile picture option, basic settings, and login/logout functionality to enhance personalization. The most crucial part of the app is the plant identification page, which allows users to either take a photo using their camera or upload an image from their gallery for identification. This key feature makes the app highly functional and user-friendly, offering a seamless experience for plant lovers looking to explore and learn about various species.
I was able to integrate several well-established heuristics into the design, including:
- Consistency: The design remained uniform across the entire application, with actions following a clear, logical order, ensuring that users could easily navigate and understand the app's functionality without confusion. This consistency helps create a seamless user experience.
- Recognition rather than recall: By prioritizing images and icons over text-heavy elements, the design allows users to recognize features and actions more easily rather than relying on memory. This approach reduces cognitive load and makes navigation more intuitive.
- Aesthetic and minimalist design: The interfaces were kept simple and uncluttered, presenting only the most essential information. This minimalist approach enables users to focus solely on key elements, enhancing usability and reducing distractions for a cleaner, more focused experience.
As shown in the sequence diagram, the user first selects an image through the app's user interface, either by capturing a new photo or uploading one from their gallery. Once the image is selected, the app loads the pre-trained model, stored in the Flutter-friendly `.tflite` format, directly into the app. This model has been specifically trained to classify images into one of 41 distinct plant classes.
After the model is loaded, the image is processed and classified into one of these plant categories. The classification results are then returned through the app's interface, where they are displayed to the user. The user can view the identified plant species along with additional information, ensuring a smooth and responsive interaction. This entire process, from image selection to displaying the results, is designed to be efficient, ensuring quick identification while maintaining user-friendly navigation.
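In the app this step runs in Dart through a Flutter TFLite plugin; the same load-classify-return flow can be sketched in Python with TensorFlow's `tf.lite.Interpreter` for illustration (the model path, image path, and float preprocessing are assumptions):

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load the converted model and allocate its input/output tensors
interpreter = tf.lite.Interpreter(model_path="model.tflite")  # hypothetical path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess the selected image to the model's expected input shape
height, width = input_details[0]["shape"][1:3]
image = Image.open("plant.jpg").resize((width, height))       # hypothetical image
batch = np.expand_dims(np.array(image, dtype=np.float32), axis=0)

# Run inference and read back the 41-class probability vector
interpreter.set_tensor(input_details[0]["index"], batch)
interpreter.invoke()
probs = interpreter.get_tensor(output_details[0]["index"])[0]
print("Predicted class index:", int(np.argmax(probs)))
```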
This project encounters several limitations that affect its overall performance and effectiveness in classification:
- Small Dataset: The primary limitation is the relatively small size of the dataset, comprising only 5,146 images across 41 distinct classes. This limited number of images can hinder the model's ability to generalize effectively, leading to potential overfitting and reduced accuracy, especially for under-represented classes.
- High Number of Classes: With 41 classes in total, the model must distinguish between many different plant species, some of which share similar characteristics. The abundance of classes relative to the number of available images per class poses a significant challenge, making it difficult for the model to learn unique features for each species.
- Complex Model Architecture: The Vision Transformer (ViT) architecture employed in this project is inherently complex and resource-intensive. While it offers advanced capabilities for image classification, the model's complexity requires extensive computational resources, and the model may not perform optimally with limited training data.
- Similar Features Among Classes: Several plants in the dataset belong to the same family and therefore exhibit very similar physical characteristics. This similarity can lead to confusion during classification, as the model may struggle to differentiate between closely related species with overlapping visual features.
- Computational Power Requirements: The project is computationally heavy, particularly regarding the GPU resources needed for training. The ViT architecture demands significant processing power, making access to high-performance GPUs essential for efficient training and inference. Without adequate computational resources, training can be prolonged and may not yield the best results.
These limitations highlight the challenges faced in this project and underscore the need for larger, more diverse datasets, as well as more robust computational resources to improve model performance.