- About The Project
- Features
- Mobile Application
- Project Structure
- The Dataset
- Model Architecture & Training
- Hyperparameter Tuning: Grid Search
- Best Model Performance
- System Testing & Validation
- Getting Started
- Usage
- License
This project explores a modern approach to classifying maize (corn) leaf diseases by applying a Vision Transformer (ViT) architecture. Shifting from the conventional use of Convolutional Neural Networks (CNNs) for this task, this work investigates the effectiveness of the Transformer-based paradigm in computer vision for agricultural applications.
The methodology is centered around transfer learning, where a google/vit-base-patch16-224 model, pre-trained on the vast ImageNet dataset, is fine-tuned to accurately identify five common diseases and healthy maize leaves.
To demonstrate the practical viability of this approach, this prototype is implemented as a complete framework that includes:
- A robust training and evaluation pipeline.
- Systematic hyperparameter tuning using Grid Search.
- Dual inference endpoints: a user-friendly desktop GUI and a scalable Flask API.
- A comprehensive testing suite to ensure reliability and stability.
Ultimately, this entire backend system is designed to serve as the intelligent core for a mobile application, enabling real-time, on-the-field disease diagnosis.
- State-of-the-Art Architecture: Leverages a Vision Transformer (ViT), providing a modern alternative to traditional CNNs for image classification tasks.
- Transfer Learning: Built upon a pre-trained `google/vit-base-patch16-224` model, significantly reducing training time and improving performance by utilizing features learned from the ImageNet dataset.
- Hyperparameter Optimization: Includes a complete Grid Search pipeline to systematically discover the most effective combination of hyperparameters for the model.
- Advanced Training Techniques:
  - Gradual Unfreezing: Progressively unfreezes model layers during training to fine-tune effectively.
  - Mixup Augmentation: Creates synthetic training examples to enhance model generalization.
  - Class Weighting: Addresses data imbalance by applying custom weights to the loss function, ensuring minority classes are not ignored.
  - Learning Rate Scheduling & Early Stopping: Optimizes the learning process and prevents overfitting.
- Multiple Deployment Options:
  - Desktop GUI: A user-friendly application built with `CustomTkinter` for easy local inference.
  - REST API: A scalable backend server using Flask, ready to serve predictions to any client.
  - Mobile-Ready: The API is specifically designed to be consumed by a companion mobile application for on-the-field use.
- Comprehensive Testing Suite: A robust set of unit, integration, and network tests ensures the reliability and stability of every component, from the model to the API.
- Configuration-Driven: The entire project is managed through a central `config.yaml` file, making it simple to adjust parameters and run new experiments.
To translate this powerful model into a practical, real-world tool, a companion mobile application has been developed. This app allows users to leverage the complex Vision Transformer model directly from their smartphones, providing an accessible solution for on-the-field disease diagnosis.
The mobile application is a client that communicates with the Flask API server (gui/server.py) from this repository. The interaction is designed to be simple and efficient:
- Image Capture: The user captures an image of a maize leaf using the mobile app's camera.
- API Request: The app sends the captured image via an HTTP `POST` request to the `/predict` endpoint of the running backend server.
- Backend Processing: The Flask server receives the image, processes it, and performs inference using the fine-tuned ViT model.
- JSON Response: The server returns the prediction results (the disease class and confidence score) in a structured JSON format.
- Display Results: The mobile app parses the JSON response and displays the diagnosis to the user in a clear, easy-to-understand interface.
Here is a glimpse of the mobile application's user interface.
[Screenshots of the mobile application]
The source code and installation instructions for the mobile application are available in a separate repository.
- Mobile App Repository: App
The repository is organized to promote modularity and maintainability, with a clear separation of concerns.
root/
├── config/
│ └── config.yaml
├── data/
│ ├── test/
│ │ ├── Common_Rust/
│ │ ├── Gray_Leaf_Spot/
│ │ ├── Healthy/
│ │ ├── Northern_Leaf_Blight/
│ │ ├── Phaeosphaeria_Leaf_Spot/
│ │ ├── Southern_Rust/
│ │ └── test.csv
│ ├── train/
│ │ ├── Common_Rust/
│ │ ├── Gray_Leaf_Spot/
│ │ ├── Healthy/
│ │ ├── Northern_Leaf_Blight/
│ │ ├── Phaeosphaeria_Leaf_Spot/
│ │ ├── Southern_Rust/
│ │ └── train.csv
│ ├── validation/
│ │ ├── Common_Rust/
│ │ ├── Gray_Leaf_Spot/
│ │ ├── Healthy/
│ │ ├── Northern_Leaf_Blight/
│ │ ├── Phaeosphaeria_Leaf_Spot/
│ │ ├── Southern_Rust/
│ │ └── validation.csv
│ └── description.txt
├── gui/
│ ├── __init__.py
│ ├── app.py
│ └── server.py
├── models/
│ ├── checkpoints/
│ │ └── best_model.pth
│ └── grid_search/
│ ├── best_model/
│ │ └── best_model_date.pth
│ └── results/date/
│ ├── best_params.json
│ ├── grid_search_results.csv
│ └── ... (metrics files)
├── src/
│ ├── data/
│ │ └── dataset.py
│ ├── eval/
│ │ ├── evaluate_best_model.py
│ │ └── results/
│ │ ├── ... (report & matrix files)
│ ├── grid_search/
│ │ ├── grid_search.py
│ │ └── run_grid_search.py
│ ├── models/
│ │ └── vit_model.py
│ ├── training/
│ │ └── trainer.py
│ └── utils/
│ └── helpers.py
├── test_images/
│ └── ... (sample images)
├── tests_results/
│ ├── network/
│ └── ... (test result artifacts)
├── tests/
│ ├── api/
│ ├── integration/
│ ├── model/
│ └── network/
├── .gitattributes
├── .gitignore
├── LICENSE
├── main.py
├── README.md
└── requirements.txt
This project utilizes the Maize Leaf Disease dataset, a collection of images specifically gathered for agricultural computer vision tasks. The data is pre-organized into three distinct splits—train, validation, and test—to ensure proper model training, tuning, and unbiased evaluation.
The model is trained to classify maize leaves into one of six categories:
- Common Rust
- Gray Leaf Spot
- Healthy
- Northern Leaf Blight
- Phaeosphaeria Leaf Spot
- Southern Rust
Below is a representative sample image from the training set for each class, providing a visual overview of the data.
[Sample images, one per class: Common Rust, Gray Leaf Spot, Healthy, Northern Leaf Blight, Phaeosphaeria Leaf Spot, Southern Rust]
To build a robust model that generalizes well to new, unseen images, a series of data augmentations is applied to the training set on the fly. This process, defined in src/data/dataset.py, produces randomly modified versions of the training images in each epoch, which helps to prevent overfitting. The techniques used include:
- Geometric Transformations:
- Random Horizontal & Vertical Flips
- Random Rotations (up to 30 degrees)
- Random Affine transformations (scaling and translation)
- Random Perspective changes
- Color Space Adjustments:
- Random Color Jitter (adjusting brightness, contrast, saturation, and hue)
This project moves beyond traditional Convolutional Neural Networks (CNNs) and leverages a Vision Transformer (ViT), specifically the google/vit-base-patch16-224 model. The ViT architecture processes images by:
- Patching: Splitting the input image (224x224 pixels) into a sequence of smaller, fixed-size patches (16x16 pixels).
- Embedding: Linearly embedding each patch into a vector and adding positional information.
- Transformer Encoder: Feeding this sequence of vectors into a standard Transformer encoder, the same architecture that powers state-of-the-art models in Natural Language Processing.
This approach allows the model to learn global relationships between different parts of an image, making it highly effective for complex visual recognition tasks.
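The patching step can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper name is hypothetical, and it assumes three-channel RGB input and the single prepended classification token used by the standard ViT design.

```python
def vit_sequence_shape(image_size=224, patch_size=16, channels=3, add_cls_token=True):
    """Return (sequence_length, flattened_patch_dim) for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size      # 224 // 16 = 14
    num_patches = patches_per_side ** 2              # 14 * 14 = 196
    # The standard ViT prepends one learnable [CLS] token to the patch sequence.
    seq_len = num_patches + (1 if add_cls_token else 0)
    # Each patch is flattened before the linear embedding (assumes RGB input).
    patch_dim = patch_size * patch_size * channels   # 16 * 16 * 3 = 768
    return seq_len, patch_dim


seq_len, patch_dim = vit_sequence_shape()
print(seq_len, patch_dim)  # 197 768
```

A 224x224 image therefore becomes a sequence of 196 patch vectors plus one classification token, which is the input the Transformer encoder attends over.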
To achieve high accuracy without requiring an enormous dataset or extensive training time, this project employs transfer learning. The model is initialized with weights pre-trained on the massive ImageNet dataset. These weights contain rich, general-purpose visual features that are then fine-tuned on our specific maize leaf disease dataset.
The training process, orchestrated by the Trainer class in src/training/trainer.py, incorporates several key techniques to ensure a robust and well-generalized final model:
- Gradual Unfreezing: A fine-tuning strategy where initially only the final classification layer is trained. As training progresses, more layers of the ViT backbone are sequentially unfrozen. This allows the model to first adapt its decision-making process and then gradually adjust its deeper feature extraction capabilities to the specific nuances of maize leaf diseases.
- Class Weighting: The dataset exhibits some class imbalance. To counteract this, a weighted Cross-Entropy Loss function is used. Classes with fewer samples (such as `Southern Rust` and `Phaeosphaeria Leaf Spot`) are assigned higher weights, forcing the model to pay more attention to them during training and preventing it from becoming biased towards the majority classes.
- Mixup Augmentation: A data augmentation technique that creates synthetic training examples by linearly interpolating pairs of images and their labels. This helps to regularize the model, improve its generalization, and make it less sensitive to adversarial examples.
- AdamW Optimizer: An improved version of the Adam optimizer that decouples weight decay from the gradient update, often leading to better model performance.
- ReduceLROnPlateau Scheduler & Early Stopping: The learning rate is automatically reduced when the validation loss plateaus, allowing the model to make finer adjustments as it converges. To prevent overfitting and save resources, training is automatically halted if the validation loss fails to improve for a specified number of epochs.
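Two of the techniques above, class weighting and Mixup, reduce to short formulas. The following is a minimal pure-Python sketch, not the code in src/training/trainer.py: the function names, the inverse-frequency weighting formula, the `alpha` value, and the class counts are all assumptions chosen for illustration.

```python
import random


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count): rarer classes weigh more."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {name: total / (num_classes * count) for name, count in class_counts.items()}


def sample_lambda(alpha=0.2, rng=random):
    """Draw the Mixup coefficient from Beta(alpha, alpha); alpha=0.2 is an assumed value."""
    return rng.betavariate(alpha, alpha)


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two flattened images and their one-hot labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# Hypothetical class counts, not the real dataset statistics.
counts = {"Healthy": 400, "Common_Rust": 400, "Southern_Rust": 200}
weights = inverse_frequency_weights(counts)
print(weights["Southern_Rust"] > weights["Healthy"])  # True

# Two toy "images" with two pixels each and one-hot labels over two classes.
mixed_x, mixed_y = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.7)
print(mixed_x, mixed_y)
```

In a PyTorch training loop, weights of this kind would typically be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`, and the interpolation would be applied to whole batches rather than to single pairs.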
To ensure the model performs optimally, a systematic hyperparameter search was conducted using a Grid Search methodology. This process, implemented in src/grid_search/grid_search.py, involves training and evaluating the model across a defined parameter space to identify the most effective configuration.
The search was conducted over the parameter space detailed below. Key hyperparameters such as learning_rate, weight_decay, and batch_size were varied, while others were held constant to ensure a controlled experiment.
| Parameter | Values / Searched Space | Description |
|---|---|---|
| `learning_rate` | `[1e-5, 5e-5]` | Controls the step size during optimization. |
| `weight_decay` | `[0.01, 0.02]` | Regularization technique to prevent overfitting. |
| `batch_size` | `[16, 32]` | Number of samples processed before the model is updated. |
| `num_epochs` | `15` | Total number of passes through the entire training dataset. |
| `scheduler_patience` | `3` | Epochs with no improvement to wait before reducing LR. |
| `scheduler_factor` | `0.1` | Factor by which the learning rate is reduced. |
| `hidden_dropout_prob` | `0.1` | Dropout probability for the fully connected layers. |
| `attention_probs_dropout_prob` | `0.1` | Dropout probability for the attention mechanisms. |
Varying the learning_rate, weight_decay, and batch_size resulted in a total of 8 unique hyperparameter combinations (2 × 2 × 2) being trained and evaluated. The model with the highest validation accuracy was selected as the best performer.
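The 2 × 2 × 2 count follows directly from taking the Cartesian product of the searched values. Here is a minimal sketch of that enumeration; `expand_grid` is a hypothetical helper and is not taken from src/grid_search/grid_search.py.

```python
from itertools import product

# Searched values and fixed values, as listed in the parameter table.
search_space = {
    "learning_rate": [1e-5, 5e-5],
    "weight_decay": [0.01, 0.02],
    "batch_size": [16, 32],
}
fixed_params = {"num_epochs": 15, "scheduler_patience": 3, "scheduler_factor": 0.1}


def expand_grid(space, fixed):
    """Return one config dict per combination of searched values, plus the fixed ones."""
    keys = list(space)
    combos = product(*(space[key] for key in keys))
    return [{**dict(zip(keys, values)), **fixed} for values in combos]


configs = expand_grid(search_space, fixed_params)
print(len(configs))  # 8
print(configs[0])
```

Each resulting dictionary describes one training run, so adding a third value to any searched parameter would raise the run count from 8 to 12.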
The table below provides a comprehensive summary of the results for each run. The model selection was based purely on the validation accuracy (Val Acc).
| LR | WD | BS | Train Acc | Train Loss | Val Acc | Val Loss | Test Acc | Test Loss |
|---|---|---|---|---|---|---|---|---|
| 1e-5 | 0.01 | 16 | 0.9724 | 0.1026 | 0.8314 | 0.7527 | 0.9291 | 0.3539 |
| 1e-5 | 0.01 | 32 | 0.9757 | 0.0921 | 0.8416 | 0.6905 | 0.9321 | 0.3326 |
| 1e-5 | 0.02 | 16 | 0.9778 | 0.0904 | 0.8460 | 0.6991 | 0.9291 | 0.3317 |
| 1e-5 | 0.02 | 32 | 0.9739 | 0.0953 | 0.8592 | 0.6229 | 0.9409 | 0.3217 |
| 5e-5 | 0.01 | 16 | 0.9866 | 0.0572 | 0.8519 | 0.8494 | 0.9365 | 0.3683 |
| 5e-5 | 0.01 | 32 | 0.9836 | 0.0637 | 0.8680 | 0.8207 | 0.9498 | 0.3503 |
| 5e-5 | 0.02 | 16 | 0.9847 | 0.0624 | 0.8446 | 0.7792 | 0.9527 | 0.3405 |
| 5e-5 | 0.02 | 32 | 0.9829 | 0.0651 | 0.8622 | 0.6999 | 0.9527 | 0.3064 |
The search identified the combination of learning_rate=5e-5, weight_decay=0.01, and batch_size=32 as the optimal configuration, achieving the highest validation accuracy of 86.80%. This configuration was selected for the final model.
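The selection rule can be replayed on the numbers from the results table. The snippet below is only an illustration of "pick the run with the highest validation accuracy"; the dictionary layout is an assumption and does not mirror the format of grid_search_results.csv.

```python
# Validation and test accuracy copied from the results table above.
runs = [
    {"lr": 1e-5, "wd": 0.01, "bs": 16, "val_acc": 0.8314, "test_acc": 0.9291},
    {"lr": 1e-5, "wd": 0.01, "bs": 32, "val_acc": 0.8416, "test_acc": 0.9321},
    {"lr": 1e-5, "wd": 0.02, "bs": 16, "val_acc": 0.8460, "test_acc": 0.9291},
    {"lr": 1e-5, "wd": 0.02, "bs": 32, "val_acc": 0.8592, "test_acc": 0.9409},
    {"lr": 5e-5, "wd": 0.01, "bs": 16, "val_acc": 0.8519, "test_acc": 0.9365},
    {"lr": 5e-5, "wd": 0.01, "bs": 32, "val_acc": 0.8680, "test_acc": 0.9498},
    {"lr": 5e-5, "wd": 0.02, "bs": 16, "val_acc": 0.8446, "test_acc": 0.9527},
    {"lr": 5e-5, "wd": 0.02, "bs": 32, "val_acc": 0.8622, "test_acc": 0.9527},
]

# Select on validation accuracy only; the test split plays no part in the choice.
best = max(runs, key=lambda run: run["val_acc"])
print(best["lr"], best["wd"], best["bs"], best["val_acc"])  # 5e-05 0.01 32 0.868
```

Two other runs reach a slightly higher test accuracy (0.9527), but choosing among runs by test accuracy would leak the test set into model selection, which is why the validation split decides.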
Following the Grid Search, the model configuration that yielded the highest validation accuracy was selected as the final, best-performing model. This model, saved in models/grid_search/best_model/, represents the most effective combination of hyperparameters found during the search.
The optimal hyperparameters identified by the Grid Search are as follows:
{
"learning_rate": 5e-05,
"weight_decay": 0.01,
"batch_size": 32,
"num_epochs": 15,
"scheduler_patience": 3,
"scheduler_factor": 0.1,
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1
}

This optimized model was then evaluated on the completely unseen test set to provide an unbiased measure of its real-world performance.
- Overall Test Accuracy: 94.98%
The table below shows the detailed classification report, including precision, recall, and F1-score for each class.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Common Rust | 1.0000 | 0.9916 | 0.9958 | 119 |
| Gray Leaf Spot | 0.9558 | 0.9908 | 0.9730 | 109 |
| Healthy | 1.0000 | 0.9310 | 0.9643 | 116 |
| Northern Leaf Blight | 0.9894 | 0.9490 | 0.9688 | 98 |
| Phaeosphaeria Leaf Spot | 0.8516 | 0.9635 | 0.9041 | 137 |
| Southern Rust | 0.9438 | 0.8571 | 0.8984 | 98 |
| Accuracy | | | 0.9498 | 677 |
| Macro Avg | 0.9568 | 0.9472 | 0.9507 | 677 |
| Weighted Avg | 0.9532 | 0.9498 | 0.9502 | 677 |
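As a sanity check on how the summary rows relate to the per-class rows, the macro F1 and the overall accuracy can be recomputed from the table. This sketch starts from the rounded four-decimal values shown above, so it reproduces the reported figures only up to rounding.

```python
# (precision, recall, support) per class, copied from the classification report.
report = {
    "Common Rust": (1.0000, 0.9916, 119),
    "Gray Leaf Spot": (0.9558, 0.9908, 109),
    "Healthy": (1.0000, 0.9310, 116),
    "Northern Leaf Blight": (0.9894, 0.9490, 98),
    "Phaeosphaeria Leaf Spot": (0.8516, 0.9635, 137),
    "Southern Rust": (0.9438, 0.8571, 98),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Macro average: every class counts equally, regardless of its support.
macro_f1 = sum(f1_score(p, r) for p, r, _ in report.values()) / len(report)

# Support-weighted recall equals overall accuracy in single-label classification.
total_support = sum(s for _, _, s in report.values())
weighted_recall = sum(r * s for _, r, s in report.values()) / total_support

print(total_support, round(macro_f1, 4), round(weighted_recall, 4))  # 677 0.9507 0.9498
```

The lowest per-class F1 scores belong to Phaeosphaeria Leaf Spot and Southern Rust, which is where the remaining errors are concentrated.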
The confusion matrix below provides a visual breakdown of the model's predictions versus the actual labels on the test set. The diagonal elements represent correctly classified samples.
This detailed analysis confirms the model's strong performance and its ability to distinguish between the different maize leaf diseases with high accuracy.
To ensure the reliability, stability, and correctness of the entire solution, this project includes a comprehensive and multi-layered testing suite located in the tests/ directory. The tests are designed to validate every component, from the core model logic to the API's behavior under stress.
The test results, including detailed performance plots and JSON logs, are automatically generated and saved in the tests_results/ directory.
- Unit Tests (`tests/model/`): These tests focus on the smallest components of the machine learning pipeline in isolation, verifying correct model loading, image processing, and inference output formats.
- API Tests (`tests/api/`): These tests validate the Flask API server, ensuring it handles requests correctly, manages errors gracefully, and processes various image formats.
- Integration & Performance Tests (`tests/integration/`): These tests evaluate the system as a whole. They include a full-flow test to identify bottlenecks, a load test to measure scalability in requests per second (RPS), and a stability test to detect performance degradation over time.
- Network Tests (`tests/network/`): These tests simulate adverse network conditions to verify the system's resilience against timeouts and connection losses.
The integration and performance tests generate several key visualizations, providing insights into the system's behavior. Below are some of the most important results from the test runs.
[Plots: Response Time Distribution, Processing Time by Phase, Server Processing Time Distribution, Total Time vs. Image Size]
This rigorous testing approach ensures that the project is not only accurate but also robust and production-ready.
Follow these instructions to set up the project environment and install all necessary dependencies to run the application on your local machine.
Before you begin, ensure you have the following software installed on your system:
- Python: Version 3.8 or higher. You can download it from python.org.
- Git: Required for cloning the repository. You can download it from git-scm.com.
- Virtual Environment Manager (Highly Recommended): Using a tool like `venv` (included with Python) or `conda` is strongly advised to isolate project dependencies.
- Clone the Repository: Open your terminal or command prompt and run the following commands to clone the project and enter its directory:

      git clone https://github.com/AlvaroVasquezAI/Maize_Leaf_Disease_Classification.git
      cd Maize_Leaf_Disease_Classification

- Create and Activate a Virtual Environment: It is best practice to create a virtual environment to avoid conflicts with other Python projects.

      # Create the virtual environment
      python -m venv venv

      # Activate the environment
      # On Windows:
      venv\Scripts\activate
      # On macOS/Linux:
      source venv/bin/activate

  You will know the environment is active when you see `(venv)` at the beginning of your terminal prompt.

- Install Dependencies: The `requirements.txt` file contains all the necessary libraries pinned to specific versions for compatibility. Install them with a single command:

      pip install -r requirements.txt

  This will install PyTorch, Transformers, Flask, and all other required packages. Once this process is complete, the project is ready to be used.
This project offers several ways to interact with the model, from running inference to training and testing. All commands should be executed from the root directory of the project.
For easy, local predictions, you can launch the desktop application. This provides a user-friendly interface to classify your own images.
- Run the application:

      python -m gui.app

- Use the GUI:
  - Click "Select Model" and navigate to `models/grid_search/best_model/` to load the `.pth` file.
  - Once the model is loaded, click "Select Image" to choose a maize leaf image from your computer.
  - The predicted class and confidence score will appear automatically.
To serve the model via a REST API (required for the mobile app or other clients), run the Flask server.
- Start the server:

      python -m gui.server

  The server will start and listen for requests on `http://localhost:5000`.

- Send a prediction request: You can use tools like `curl` or Postman to send a `POST` request with an image file to the `/predict` endpoint.

  Example using `curl`:

      curl -X POST -F "image=@/path/to/your/leaf_image.jpg" http://localhost:5000/predict
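For clients written in Python, the same request can be assembled with the standard library alone. This is a rough sketch rather than an official client: `build_predict_request` is a hypothetical helper, the form field name `image` is taken from the `curl` example, and the shape of the JSON reply is not assumed beyond it being valid JSON.

```python
import mimetypes
import urllib.request
import uuid


def build_predict_request(image_bytes, filename, url="http://localhost:5000/predict"):
    """Build a multipart/form-data POST that carries one file in the 'image' field."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + image_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Placeholder bytes stand in for a real image file read with open(path, "rb").
request = build_predict_request(b"placeholder-image-bytes", "leaf_image.jpg")
print(request.get_method(), request.full_url)

# Sending requires the Flask server to be running:
# import json
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it keeps the example runnable without a live server and makes the multipart body easy to inspect when debugging.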
The main.py script is the entry point for training the model or running the hyperparameter search.
- Run a Standard Training Session: This will train the model using the parameters defined in `config/config.yaml`.

      python main.py --mode train

- Run the Grid Search: This will execute the hyperparameter search defined in `src/grid_search/run_grid_search.py` to find the best model configuration.

      python main.py --mode grid_search

  or

      python -m src.grid_search.run_grid_search
After training, you can generate a detailed performance analysis (classification report and confusion matrix) of the best model using the dedicated evaluation script.
- Evaluate on the Test Set (Default):

      python -m src.eval.evaluate_best_model --split test

- Evaluate on the Validation Set:

      python -m src.eval.evaluate_best_model --split validation

The output artifacts will be saved in the `src/eval/results/` directory.
To verify the integrity and stability of the entire project, run the complete test suite.

    python -m unittest discover tests

This command will automatically discover and run all tests located in the tests/ directory.
This project is distributed under the MIT License. See the LICENSE file for more information.
















