From e75d6efa332e7935c95d0eb9dc9ecbb1527eb4bb Mon Sep 17 00:00:00 2001
From: Jason Andrews
Date: Wed, 18 Dec 2024 04:12:03 +0000
Subject: [PATCH] Review MNIST Learning Path

---
 .../_index.md | 18 +++-
 .../_review.md | 4 +-
 .../app.md | 15 ++--
 .../datasets-and-training.md | 14 ++--
 .../inference.md | 10 +--
 .../intro-android.md | 30 ++++---
 .../intro-opt.md | 59 +++++++++++---
 .../intro.md | 10 ++-
 .../intro2.md | 4 +-
 .../mobile-app.md | 37 ++++++---
 .../model-opt.md | 50 ++++++++----
 .../model.md | 4 +-
 .../optimisation.md | 24 ++++--
 .../prepare-data.md | 31 ++++++--
 .../user-interface.md | 79 ++++++++++++-------
 15 files changed, 268 insertions(+), 121 deletions(-)

diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/_index.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/_index.md
index 6e375dd4c..519c799de 100644
--- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/_index.md
+++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/_index.md
@@ -3,26 +3,27 @@
 title: Create and train a PyTorch model for digit classification

 minutes_to_complete: 160

-who_is_this_for: This is an introductory topic for software developers interested in learning how to use PyTorch to create and train a feedforward neural network for digit classification. Also you will learn how to use the trained model in an android app. Last, you will discover model optimisations.
+who_is_this_for: This is an advanced topic for software developers interested in learning how to use PyTorch to create and train a feedforward neural network for digit classification. You will also learn how to use the trained model in an Android application. Finally, you will apply model optimizations.

 learning_objectives:
     - Prepare a PyTorch development environment.
     - Download and prepare the MNIST dataset.
     - Create a neural network architecture using PyTorch.
     - Train a neural network using PyTorch.
-    - Creating an Android app and loading the pre-trained mdoel.
-    - Preparing an input dataset.
-    - Measuring the inference time.
-    - Optimise a neural network architecture using quantization and fusing.
-    - Use an optimised model in an Android app.
+    - Create an Android app and load the pre-trained model.
+    - Prepare an input dataset.
+    - Measure the inference time.
+    - Optimize a neural network architecture using quantization and fusing.
+    - Use an optimized model in the Android application.
+
 prerequisites:
-    - A computer that can run Python3 and Visual Studio Code. The OS can be Windows, Linux, or macOS.
+    - A computer that can run Python3, Visual Studio Code, and Android Studio. The OS can be Windows, Linux, or macOS.
author_primary: Dawid Borycki ### Tags -skilllevels: Introductory +skilllevels: Advanced subjects: ML armips: - Cortex-A @@ -35,6 +36,7 @@ operatingsystems: tools_software_languages: - Android Studio - Coding + - VS Code shared_path: true shared_between: - servers-and-cloud-computing diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/_review.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/_review.md index fb1980742..8347d010f 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/_review.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/_review.md @@ -34,12 +34,12 @@ review: Which loss function was used to train the PyTorch model on the MNIST dataset? answers: - Mean Squared Error Loss - - CrossEntropyLoss + - Cross Entropy Loss - Hinge Loss - Binary Cross-Entropy Loss correct_answer: 2 explanation: > - The CrossEntropyLoss function was used to train the model because it is suitable for multi-class classification tasks like digit classification. It measures the difference between the predicted probabilities and the true class labels, helping the model learn to make accurate predictions. + Cross Entropy Loss was used to train the model because it is suitable for multi-class classification tasks like digit classification. It measures the difference between the predicted probabilities and the true class labels, helping the model learn to make accurate predictions. # ================================================================================ # FIXED, DO NOT MODIFY diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/app.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/app.md index cc697cfca..d591afe57 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/app.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/app.md @@ -1,20 +1,23 @@ --- # User change -title: "Running an Application" +title: "Run the Application" weight: 10 layout: "learningpathall" --- -You are now ready to run the application. You can use either an emulator or a physical device. In this guide, we will use an emulator. +You are now ready to run the Android application. You can use an emulator or a physical device. + +The screenshots below show an emulator. + +To run the app in Android Studio using an emulator, follow these steps: -To run an app in Android Studio using an emulator, follow these steps: 1. Configure the Emulator: * Go to Tools > Device Manager (or click the Device Manager icon on the toolbar). * Click Create Device to set up a new virtual device (if you haven’t done so already). -* Choose a device model (e.g., Pixel 4) and click Next. -* Select a system image (e.g., Android 11, API level 30) and click Next. +* Choose a device model, such as Pixel 4, and click Next. +* Select a system image, such as Android 11, API level 30, and click Next. * Review the settings and click Finish to create the emulator. 2. Run the App: @@ -29,4 +32,4 @@ Once the application is started, click the Load Image button. It will load a ran ![img](Figures/06.png) -In the next step you will learn how to further optimise the model. +In the next step you will learn how to further optimize the model. 
diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/datasets-and-training.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/datasets-and-training.md index d50b6d3c4..d1e499113 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/datasets-and-training.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/datasets-and-training.md @@ -1,12 +1,14 @@ --- # User change -title: "Datasets and training" +title: "Perform training and save the model" weight: 5 layout: "learningpathall" --- +## Prepare the MNIST data + Start by downloading the MNIST dataset. Proceed as follows: 1. Open the pytorch-digits.ipynb you created earlier. @@ -60,7 +62,7 @@ The certifi Python package provides the Mozilla root certificates, which are ess Make sure to replace `x` with the number of Python version you have installed. -After running the code you will see the output that might look like shown below: +After running the code you see output similar to the screenshot below: ![image](Figures/01.png) @@ -122,18 +124,18 @@ for t in range(epochs): test_loop(test_dataloader, model, loss_fn) ``` -After running this code, you will see the following output that shows the training progress. +After running the code, you see the following output showing the training progress. ![image](Figures/02.png) -Once the training is complete, you will see something like the following: +Once the training is complete, you see output similar to: ```output Epoch 10: Accuracy: 95.4%, Avg loss: 1.507491 ``` -which shows the model achieved around 95% of accuracy. +The output shows the model achieved around 95% accuracy. # Save the model @@ -174,4 +176,4 @@ Setting the model to evaluation mode before tracing is important for several rea 3. Correct Tracing. Tracing captures the operations performed by the model using a given input. If the model is in training mode, the traced graph may include operations related to dropout and batch normalization updates. These operations can affect the correctness and performance of the model during inference. -In the next step, you will use the saved model for inference. +In the next step, you will use the saved model for ML inference. diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/inference.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/inference.md index 4e400056b..9aed5754e 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/inference.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/inference.md @@ -1,6 +1,6 @@ --- # User change -title: "Inference" +title: "Use the model for inference" weight: 6 @@ -26,7 +26,7 @@ Start by installing matplotlib package: pip install matplotlib ``` -Then, in Visual Studio Code create a new file named `pytorch-digits-inference.ipynb` and modify the file to include the code below: +Use Visual Studio Code to create a new file named `pytorch-digits-inference.ipynb` and modify the file to include the code below: ```python import torch @@ -101,9 +101,9 @@ After running the code, you should see results similar to the following figure: ![image](Figures/03.png) -# What you have learned +# What have you learned? -In this exercise, you went through the complete process of training and using a PyTorch model for digit classification on the MNIST dataset. 
Using the training dataset, you optimized the model’s weights and biases over multiple epochs. You employed the CrossEntropyLoss function and the Adam optimizer to minimize prediction errors and improve accuracy. You periodically evaluated the model on the test dataset to monitor its performance, ensuring it was learning effectively without overfitting. +You have completed the process of training and using a PyTorch model for digit classification on the MNIST dataset. Using the training dataset, you optimized the model’s weights and biases over multiple epochs. You employed the CrossEntropyLoss function and the Adam optimizer to minimize prediction errors and improve accuracy. You periodically evaluated the model on the test dataset to monitor its performance, ensuring it was learning effectively without overfitting. After training, you saved the model using TorchScript, which captures both the model’s architecture and its learned parameters. This made the model portable and independent of the original class definition, simplifying deployment. @@ -111,4 +111,4 @@ Next, you performed inference. You loaded the saved model and set it to evaluati This comprehensive process, from model training and saving to inference and visualization, illustrates the end-to-end workflow for building and deploying a machine learning model in PyTorch. It demonstrates how to train a model, save it in a portable format, and then use it to make predictions on new data. -In the next step, you will learn how to use the model in the mobile Android application. \ No newline at end of file +In the next step, you will learn how to use the model in an Android application. \ No newline at end of file diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro-android.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro-android.md index 3f898a2b1..849c4cfc0 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro-android.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro-android.md @@ -1,24 +1,36 @@ --- # User change -title: "Background for running inference on Android" +title: "Understand inference on Android" weight: 7 layout: "learningpathall" --- -Running pre-trained machine learning models on mobile and edge devices has become increasingly common as it enables these devices to gain intelligence and perform complex tasks directly on-device. This capability allows smartphones, IoT devices, and embedded systems to execute advanced functions such as image recognition, natural language processing, and real-time decision-making without relying on cloud-based services. By leveraging on-device inference, applications can offer faster responses, reduced latency, enhanced privacy, and offline functionality, making them more efficient and capable of handling sophisticated tasks in various environments. +Running pre-trained machine learning models on mobile and edge devices has become increasingly common as it enables these devices to gain intelligence and perform complex tasks directly on-device. This capability allows smartphones, IoT devices, and embedded systems to execute advanced functions such as image recognition, natural language processing, and real-time decision-making without relying on cloud-based services. 
-Arm provides a wide range of hardware and software accelerators designed to optimize the performance of machine learning (ML) models on edge devices. These include specialized processors like Arm's Neural Processing Units (NPUs) and Graphics Processing Units (GPUs), as well as software frameworks like the Arm Compute Library and Arm NN, which are tailored to leverage these hardware capabilities. Arm's technology is ubiquitous, powering a vast array of devices from smartphones and tablets to IoT gadgets and embedded systems. With Arm chips being the core of many Android-based smartphones and other devices, running ML models efficiently on this hardware is crucial for enabling advanced applications such as image recognition, voice assistance, and real-time analytics. By utilizing Arm’s accelerators, developers can achieve lower latency, reduced power consumption, and enhanced performance, making on-device AI both practical and powerful for a wide range of applications. +By leveraging on-device inference, applications can offer faster responses, reduced latency, enhanced privacy, and offline functionality, making them more efficient and capable of handling sophisticated tasks in various environments. -Running a machine learning model on Android involves a few key steps. First, you need to train and save the model in a mobile-friendly format, such as TensorFlow Lite, ONNX, or TorchScript, depending on the framework you are using. Next, you add the model file to your Android project’s assets directory. In your app’s code, use the corresponding framework’s Android library, such as TensorFlow Lite or PyTorch Mobile, to load the model. You then prepare the input data, ensuring it is formatted and preprocessed in the same way as during model training. The input data is passed through the model, and the output predictions are retrieved and interpreted accordingly. For improved performance, you can leverage hardware acceleration using Android’s Neural Networks API (NNAPI) or use GPU support if available. This process enables the Android app to make real-time predictions and execute complex machine learning tasks directly on the device. +Arm provides a wide range of hardware and software accelerators designed to optimize the performance of machine learning (ML) models on edge devices. These include specialized processors like Arm's Neural Processing Units (NPUs) and Graphics Processing Units (GPUs), as well as software frameworks like the Arm Compute Library and Arm NN, which are tailored to leverage these hardware capabilities. -In this Learning Path, you will learn how to perform such inference in the Android app using a pre-trained digit classifier, created [here](learning-paths/cross-platform/pytorch-digit-classification-training). +Running a machine learning model on Android involves a few key steps. + +First, you train and save the model in a mobile-friendly format, such as TensorFlow Lite, ONNX, or TorchScript, depending on the framework you are using. + +Next, you add the model file to your Android project’s assets directory. In your app’s code, use the corresponding framework’s Android library, such as TensorFlow Lite or PyTorch Mobile, to load the model. + +You then prepare the input data, ensuring it is formatted and preprocessed in the same way as during model training. The input data is passed through the model, and the output predictions are retrieved and interpreted accordingly. 
For improved performance, you can leverage hardware acceleration using Android’s Neural Networks API (NNAPI) or use GPU support if available. This process enables the Android app to make real-time predictions and execute complex machine learning tasks directly on the device.
+
+In this Learning Path, you will learn how to perform inference in an Android application using the pre-trained digit classifier from the previous sections.

 ## Before you begin

-Before you begin make sure Python3, [Visual Studio Code](https://code.visualstudio.com/download) and [Android Studio](https://developer.android.com/studio/install) are installed on your system.

-## Source code
-The complete source code is available [here](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.git).
+Before you begin, make sure [Android Studio](https://developer.android.com/studio/install) is installed on your system.
+
+## Project source code
+
+The following steps explain how to build an Android application for MNIST inference. The application can be constructed from scratch, but there are two GitHub repositories available if you need to copy any files from them as you learn how to create the Android application.
+
+The complete source code for the [Android application](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.git) is available on GitHub.

-The Python scripts are available [here](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.Python.git)
+The [Python scripts](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.Python.git) used in the previous steps are also available on GitHub.
diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro-opt.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro-opt.md
index 4f5a04025..870aa445d 100644
--- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro-opt.md
+++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro-opt.md
@@ -1,26 +1,65 @@
 ---
 # User change
-title: "Optimising neural network models in PyTorch"
+title: "Optimizing neural network models in PyTorch"

 weight: 11

 layout: "learningpathall"
 ---

-In the realm of machine learning (ML) for edge and mobile inference, optimizing models is crucial to achieving efficient performance while minimizing resource consumption. As mobile and edge devices often have limited computational power, memory, and energy availability, various strategies are employed to ensure that ML models can run effectively in these constrained environments.
+## Optimizing models

-**Quantization** is one of the most widely used techniques, which reduces the precision of the model's weights and activations from floating-point to lower-bit representations, such as int8 or float16. This not only reduces the model size but also accelerates inference speed on hardware that supports lower precision arithmetic.
+Optimizing models is crucial to achieving efficient performance while minimizing resource consumption.

-Another key optimization strategy is **layer fusion**, where multiple operations, such as combining linear layers with their subsequent activation functions (like ReLU), into a single layer. This reduces the number of operations that need to be executed during inference, minimizing latency and improving throughput.
+Because mobile and edge devices can have limited computational power, memory, and energy availability, various strategies are used to ensure that ML models can run effectively in these constrained environments.

-In addition to these techniques, **pruning**, which involves removing less important weights or neurons from the model, can help in creating a leaner model that requires fewer resources without significantly affecting accuracy.
+### Quantization

-Finally, leveraging hardware-specific optimizations, such as **using the Android Neural Networks API (NNAPI)** allows developers to take full advantage of the underlying hardware acceleration available on edge devices. By employing these strategies, developers can significantly enhance the efficiency of ML models for deployment on mobile and edge platforms, ensuring a balance between performance and resource utilization.
+Quantization is one of the most widely used techniques, which reduces the precision of the model's weights and activations from floating-point to lower-bit representations, such as int8 or float16. This not only reduces the model size but also accelerates inference speed on hardware that supports lower precision arithmetic.

-PyTorch offers robust support for various optimization techniques that enhance the performance of machine learning models for edge and mobile inference. One of the key features is its quantization toolkit, which provides a streamlined workflow for applying quantization to models. PyTorch supports both static and dynamic quantization, allowing developers to reduce model size and improve inference speed without sacrificing accuracy. Additionally, PyTorch enables layer fusion through its torch.quantization module, enabling seamless integration of operations like fusing linear layers with their activation functions, thus optimizing execution by minimizing computational overhead. Furthermore, the TorchScript functionality allows for the creation of serializable and optimizable models that can be efficiently deployed on mobile devices. PyTorch’s integration with hardware acceleration libraries, such as NNAPI for Android, enables developers to leverage specific hardware capabilities, ensuring optimal model performance tailored to the device’s architecture. Overall, PyTorch provides a comprehensive ecosystem that empowers developers to implement effective optimizations for mobile and edge deployment, enhancing both speed and efficiency.
+### Layer fusion

-In this Learning Path, we will delve into the techniques of **quantization** and **fusion** using our previously created neural network model for [digit classification](/learning-paths/cross-platform/pytorch-digit-classification-arch-training/). By applying quantization, we will reduce the model’s weight precision, transitioning from floating-point representations to lower-bit formats, which not only minimizes the model size but also enhances inference speed. This process is crucial for optimizing our model for deployment on resource-constrained devices.
+Another key optimization strategy is layer fusion, where multiple operations, such as linear layers and their subsequent activation functions (like ReLU), are combined into a single layer. This reduces the number of operations that need to be executed during inference, minimizing latency and improving throughput.

-Additionally, we will explore layer fusion, which combines multiple operations within the model—such as fusing linear layers with their subsequent activation functions—into a single operation.
This reduction in operational complexity further streamlines the model, leading to improved performance during inference. By implementing these optimizations, we aim to enhance the efficiency of our digit classification model, making it well-suited for deployment in mobile and edge environments. +### Pruning -First, we will modify our previous Python scripts for [both training and inference](/learning-paths/cross-platform/pytorch-digit-classification-arch-training/)to incorporate model optimizations like quantization and fusion. After adjusting the training pipeline to produce an optimized version of the model, we will also update our inference script to handle both the original and optimized models. Once these changes are made, we will modify the [Android app](pytorch-digit-classification-inference-android-app) to load either the original or optimized model based on user input, allowing us to switch between them dynamically. This setup will enable us to directly compare the inference speed of both models on the device, providing valuable insights into the performance benefits of model optimization techniques in real-world scenarios. \ No newline at end of file +In addition to these techniques, pruning, which involves removing less important weights or neurons from the model, can help in creating a leaner model that requires fewer resources without significantly affecting accuracy. + + +### Android NNAPI + +Leveraging hardware-specific optimizations, such as the Android Neural Networks API (NNAPI) allows you to take full advantage of the underlying hardware acceleration available on edge devices. + +### More on optimization + +By employing these strategies, you can significantly enhance the efficiency of ML models for deployment on mobile and edge platforms, ensuring a balance between performance and resource utilization. + +PyTorch offers robust support for various optimization techniques that enhance the performance of machine learning models for edge and mobile inference. + +One of the key PyTorch features is its quantization toolkit, which provides a streamlined workflow for applying quantization to models. PyTorch supports both static and dynamic quantization, allowing developers to reduce model size and improve inference speed without sacrificing accuracy. + +Additionally, PyTorch enables layer fusion through its torch.quantization module, enabling seamless integration of operations like fusing linear layers with their activation functions, thus optimizing execution by minimizing computational overhead. + +Furthermore, the TorchScript functionality allows for the creation of serializable and optimizable models that can be efficiently deployed on mobile devices. + +PyTorch’s integration with hardware acceleration libraries, such as NNAPI for Android, enables developers to leverage specific hardware capabilities, ensuring optimal model performance tailored to the device's architecture. + +Overall, PyTorch provides a comprehensive ecosystem that empowers developers to implement effective optimizations for mobile and edge deployment, enhancing both speed and efficiency. + +### Optimization Next steps + +In the following sections, you will delve into the techniques of quantization and fusion using the previously created neural network model and Android application. + +By applying quantization, you will reduce the model's weight precision, transitioning from floating-point representations to lower-bit formats, which not only minimizes the model size but also enhances inference speed. 
This process is crucial for optimizing our model for deployment on resource-constrained devices. + +Additionally, you will explore layer fusion, which combines multiple operations within the model, such as fusing linear layers with their subsequent activation functions into a single operation. This reduction in operational complexity further streamlines the model, leading to improved performance during inference. + +By implementing these optimizations, you can enhance the efficiency of the digit classification model, making it well-suited for deployment in mobile and edge environments. + +First, you will modify the previous Python scripts for training and inference to incorporate model optimizations like quantization and fusion. + +After adjusting the training pipeline to produce an optimized version of the model, you will update the inference script to handle both the original and optimized models. + +Once these changes are made, you will modify the Android application to load either the original or the optimized model based on user input, allowing you to switch between them dynamically. + +This setup enables you to compare the inference speed of both models on the device, providing valuable insights into the performance benefits of model optimization techniques in real-world scenarios. \ No newline at end of file diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro.md index af7cffde5..4e256d7f9 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro.md @@ -7,6 +7,8 @@ weight: 2 layout: "learningpathall" --- +## Introduction to PyTorch + PyTorch is an open-source deep learning framework that is developed by Meta AI and is now part of the Linux Foundation. PyTorch is designed to provide a flexible and efficient platform for building and training neural networks. It is widely used due to its dynamic computational graph, which allows users to modify the architecture during runtime, making debugging and experimentation easier. @@ -22,7 +24,7 @@ A typical process for creating a feedforward neural network in PyTorch involves To create a model, users subclass the torch.nn.Module class, defining the network architecture in the __init__ method, and implement the forward pass in the forward method. PyTorch’s intuitive API and support for GPU acceleration make it ideal for building efficient feedforward networks, particularly in tasks such as image classification and digit recognition. -In this Learning Path, you will explore how to use PyTorch for creating a model for digit recognition, before then proceeding to train it. +In this Learning Path, you will explore how to use PyTorch to create and train a model for digit recognition. ## Before you begin @@ -40,7 +42,7 @@ Python 3.11.2 If Python3 is not installed, download and install it from [python.org](https://www.python.org/downloads/). -Alternatively, you can also install Python3 using package managers such as Brew or APT. +Alternatively, you can also install Python3 using package managers such as Homebrew or APT. If you are using Windows on Arm you can refer to the [Python install guide](https://learn.arm.com/install-guides/py-woa/). 
@@ -72,9 +74,9 @@ pytorch-env\Scripts\activate source pytorch-env/bin/activate ``` -Once activated, you should see the virtual environment name in your terminal prompt. +Once activated, you see the virtual environment name `(pytorch-env)` before your terminal prompt. -3. Install PyTorch using `pip`: +3. Install PyTorch using Pip: ```console pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro2.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro2.md index ae6126132..35ce79242 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro2.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/intro2.md @@ -1,12 +1,14 @@ --- # User change -title: "PyTorch model training" +title: "About PyTorch model training" weight: 4 layout: "learningpathall" --- +## PyTorch model training + In the previous section, you created a feedforward neural network for digit classification using the MNIST dataset. The network was left untrained and lacks the ability to make accurate predictions. To enable the network to recognize handwritten digits effectively, training is needed. Training in PyTorch involves configuring the network's parameters, such as weights and biases, by exposing the model to labeled data and iteratively adjusting these parameters to minimize prediction errors. This process allows the model to learn the patterns in the data, enabling it to make accurate classifications on new, unseen inputs. diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/mobile-app.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/mobile-app.md index 895cf6e2a..fe897f817 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/mobile-app.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/mobile-app.md @@ -1,15 +1,15 @@ --- # User change -title: "Modify an Android App" +title: "Update the Android application" weight: 14 layout: "learningpathall" --- -You will now use the optimised model in the Android App we developed earlier in this Learning Path. +You can now use the optimized model in the Android application you developed earlier. -Start by modifying the activity_main.xml by adding the CheckBox: +Start by modifying the `activity_main.xml` by adding a `CheckBox` to use the optimized model: ```XML @@ -21,7 +21,9 @@ Start by modifying the activity_main.xml by adding the CheckBox: android:textSize="16sp"/> ``` -Then copy the optimised model to assets folder of the Anroid App project, and replace the MainActivity.kt by the following code: +Copy the optimized model to the `assets` folder of the Android project. + +Replace the `MainActivity.kt` by the following code: ```Kotlin package com.arm.armpytorchmnistinference @@ -212,17 +214,30 @@ class MainActivity : AppCompatActivity() { } ``` -Here is the proofread and expanded version: +The updated version of the Android application includes modifications to the Android Activity to dynamically load the model based on the state of the `CheckBox`. ---- +When the `CheckBox` is selected, the app loads the optimized model, which is quantized and fused for improved performance. + +If the `CheckBox` is not selected, the app loads the original model. + +After the model is loaded, the inference is run. 
To better estimate the execution time, the `runInference()` method executes the inference 100 times in a loop. This provides a more reliable measure of the average inference time by smoothing out any inconsistencies from single executions. + +The results for a run on a physical device are shown below. These results indicate that, on average, the optimized model reduced the inference time to about 65% of the original model's execution time, showing a significant improvement in performance. -In this updated version of the Android app, we made modifications to the Android Activity to dynamically load the model based on the state of the **CheckBox**. When the **CheckBox** is selected, the app loads the **optimized model**, which has been quantized and fused for improved performance. If the **CheckBox** is not selected, the app loads the original unoptimized model. +This optimization showcases the benefits of quantization and layer fusion for mobile inference, and there is further potential for enhancement by enabling hardware acceleration on supported devices. -After the model is loaded, we proceed to run the inference. To better estimate the execution time, we also modified the `runInference` method to execute the inference **100 times** in a loop. This gives us a more reliable measure of the average inference time by smoothing out any inconsistencies from single executions. +This would allow the model to take full advantage of the device's computational capabilities, potentially reducing the inference time even more. -Once you run the app on an actual device, you will see the results shown below. These results indicate that, on average, the optimized model reduced the inference time to about **65%** of the original model's execution time, showing a significant improvement in performance. This optimization showcases the benefits of quantization and layer fusion for mobile inference, and there is further potential for enhancement by enabling **hardware acceleration** on supported devices. This would allow the model to take full advantage of the device's computational capabilities, potentially reducing the inference time even more. ![fig](Figures/07.jpg) + ![fig](Figures/08.jpg) -# Summary -Here, we successfully optimized a neural network model for mobile inference using techniques such as quantization and layer fusion. The model was optimized through quantization and layer fusion, removing unnecessary elements like the Dropout layers during inference. By running multiple iterations of the inference process, we demonstrated that the optimized model significantly reduced the average inference time to around 65% of the original time, especially beneficial for deployment on resource-constrained mobile devices. Additionally, there is potential for further performance improvements by leveraging hardware acceleration. \ No newline at end of file +# What have you learned? + +You have successfully optimized a neural network model for mobile inference using quantization and layer fusion. + +Quantization and layer fusion removed unnecessary elements such as dropout layers during inference. + +By running multiple iterations of the inference process, you learned that the optimized model significantly reduced the average inference time to around 65% of the original time. + +You also learned that there is potential for further performance improvements by leveraging hardware acceleration. 
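+
+You can reproduce the same averaged-timing methodology on a development machine before measuring on-device. The sketch below is illustrative only: it assumes the original TorchScript model and the optimized lite model are saved as `model.pth` and `optimized_model.ptl` in the current directory, and it uses the internal `_load_for_lite_interpreter` helper to load a `.ptl` file on the desktop. Absolute timings on a desktop CPU will differ from the Android results.
+
+```python
+import time
+import torch
+from torch.jit.mobile import _load_for_lite_interpreter  # internal helper, subject to change
+
+# Match the quantization backend used when the model was converted
+torch.backends.quantized.engine = "qnnpack"
+
+original = torch.jit.load("model.pth")  # assumed filename
+original.eval()
+optimized = _load_for_lite_interpreter("optimized_model.ptl")
+
+x = torch.rand(1, 1, 28, 28)  # dummy MNIST-shaped input
+
+def average_ms(run, iterations=100):
+    # Average over many runs, as the app does, to smooth out jitter
+    start = time.perf_counter()
+    for _ in range(iterations):
+        run(x)
+    return (time.perf_counter() - start) / iterations * 1000
+
+with torch.no_grad():
+    print(f"original:  {average_ms(original):.3f} ms")
+    print(f"optimized: {average_ms(optimized.forward):.3f} ms")
+```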
\ No newline at end of file diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/model-opt.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/model-opt.md index 957ed06fc..dc5b8556e 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/model-opt.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/model-opt.md @@ -1,20 +1,23 @@ --- # User change -title: "Create a PyTorch model for MNIST" +title: "Create an optimized PyTorch model for MNIST" weight: 12 layout: "learningpathall" --- -Now, similarly as before, you will create and train a feedforward neural network to classify handwritten digits from the MNIST dataset. To remind this dataset contains 70,000 images, comprising 60,000 training and 10,000 testing images, of handwritten numerals (0-9), each with dimensions of 28x28 pixels. +You can create and train an optimized feedforward neural network to classify handwritten digits from the MNIST dataset. As a reminder, the dataset contains 70,000 images, comprising 60,000 training and 10,000 testing images, of handwritten numerals (0-9), each with dimensions of 28x28 pixels. -You will, however, introduce several changes to enable model quantization and fusing. +This time you will introduce several changes to enable model quantization and fusing. # Model architecture -Start by creating a new notebook, pytorch-digits-model-optimisations.ipynb. Then define the model architecture as follows (you can find the complete source code [here](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.Python)) -```Python +Start by creating a new notebook named `pytorch-digits-model-optimisations.ipynb`. + +Then define the model architecture using the code below. You can also find the source code on [GitHub](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.Python) + +```python import torch from torch import nn from torchsummary import summary @@ -50,11 +53,15 @@ class NeuralNetwork(nn.Module): return x # Outputs raw logits ``` -This code defines a neural network in PyTorch for digit classification, consisting of three linear layers with ReLU activations and optional dropout layers for regularization. The network first flattens the input (a 28x28 image) and passes it through two linear layers, each followed by a ReLU activation and a dropout layer (if enabled). The final layer produces raw logits as the output. Notably, the **Softmax layer has been removed** to enable **quantization and layer fusion** during model optimization, allowing better performance when deploying the model on mobile or edge devices. The output is left as logits, and the softmax function can be applied during post-processing, particularly during inference. +This code defines a neural network in PyTorch for digit classification, consisting of three linear layers with ReLU activations and optional dropout layers for regularization. The network first flattens the input (a 28x28 image) and passes it through two linear layers, each followed by a ReLU activation and a dropout layer (if enabled). The final layer produces raw logits as the output. Notably, the softmax layer has been removed to enable quantization and layer fusion during model optimization, allowing better performance when deploying the model on mobile or edge devices. + +The output is left as logits, and the softmax function can be applied during post-processing, particularly during inference. 
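+
+If you need class probabilities at inference time, apply softmax to the logits as a post-processing step. A minimal sketch, assuming `model` is an instance of the network defined above:
+
+```python
+import torch
+
+logits = model(torch.rand(1, 1, 28, 28))      # raw scores, shape [1, 10]
+probabilities = torch.softmax(logits, dim=1)  # normalize to probabilities
+predicted_digit = probabilities.argmax(dim=1).item()
+```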
-This model includes **dropout layers**, which are used during training to randomly set a portion of the neurons to zero in order to prevent overfitting and improve generalization. The `use_dropout` parameter allows you to **enable or disable dropout**, with the option to bypass dropout by replacing it with an `nn.Identity` layer when set to `False`, which is typically done during inference or quantization for more consistent behavior. +This model includes dropout layers, which are used during training to randomly set a portion of the neurons to zero in order to prevent overfitting and improve generalization. -Then, add the following lines to display the model arhitecture: +The `use_dropout` parameter allows you to enable or disable dropout, with the option to bypass dropout by replacing it with an `nn.Identity` layer when set to `False`, which is typically done during inference or quantization for more consistent behavior. + +Add the following lines to display the model architecture: ```Python model = NeuralNetwork() @@ -62,7 +69,8 @@ model = NeuralNetwork() summary(model, (1, 28, 28)) ``` -After running the code, you will see the following output: +After running the code, you see the following output: + ```output ---------------------------------------------------------------- Layer (type) Output Shape Param # @@ -88,6 +96,7 @@ Estimated Total Size (MB): 0.41 ``` The output shows the structure of the neural network, including the layers, their output shapes, and the number of parameters. + * The network starts with a Flatten layer, which reshapes the input from [1, 28, 28] to [1, 784] without adding any parameters. * This is followed by two Linear (fully connected) layers with ReLU activations and optional Dropout layers in between, contributing to the parameter count. * The first linear layer (from 784 to 96 units) has 75,360 parameters, while the second (from 96 to 256 units) has 24,832 parameters. @@ -95,7 +104,8 @@ The output shows the structure of the neural network, including the layers, thei * The total number of trainable parameters in the model is 102,762, with no non-trainable parameters. # Training the model -Now add the data loading, train, and test loops to actually train the model (this proceeds exactly the same as in the original model): + +Now add the data loading, train, and test loops to actually train the model. This proceeds exactly the same as in the original model: ``` from torchvision import transforms, datasets @@ -165,13 +175,21 @@ for t in range(epochs): test_loop(test_dataloader, model, loss_fn) ``` -In this script, we begin by preparing the MNIST dataset for training and testing our neural network model. Using the torchvision library, we download the MNIST dataset and apply a transformation to convert the images into tensors, making them suitable for input into the model. We create two data loaders: one for the training set and one for the test set, each configured with a batch size of 32. These data loaders allow us to easily feed batches of images into the model during training and testing. +You begin by preparing the MNIST dataset for training and testing our neural network model. + +Using the torchvision library, you download the MNIST dataset and apply a transformation to convert the images into tensors, making them suitable for input into the model. + +Next, create two data loaders: one for the training set and one for the test set, each configured with a batch size of 32. 
These data loaders allow you to easily feed batches of images into the model during training and testing. -Next, we define a training loop, which is the core of the model’s learning process. For each batch of images and labels, the model generates predictions, and we calculate the cross-entropy loss to measure how far off the predictions are from the true labels. The Adam optimizer is then used to perform backpropagation, updating the model’s weights to reduce this error. This process repeats for every batch in the training dataset, gradually improving the model’s accuracy over time. +Next, define a training loop, which is the core of the model’s learning process. For each batch of images and labels, the model generates predictions, and you calculate the cross-entropy loss to measure how far off the predictions are from the true labels. -To ensure our model is learning effectively, we also define a testing loop. Here, the model is evaluated on a separate set of test images that it hasn’t seen during training. We calculate both the average loss and the accuracy of the predictions, giving us a clear sense of how well the model is performing. Importantly, this evaluation is done without updating the model’s weights, as the goal is simply to measure its performance. +The Adam optimizer is used to perform backpropagation, updating the model's weights to reduce this error. The process repeats for every batch in the training dataset, gradually improving model accuracy over time. -Finally, we run the training and testing loops over the course of 10 epochs. With each epoch, the model trains on the full training dataset, and afterward, we test it to monitor its progress. By the end of the process, we expect the model to have learned to classify the MNIST digits with a high degree of accuracy, as reflected in the final test results. +To ensure the model is learning effectively, you also define a testing loop. + +Here, the model is evaluated on a separate set of test images that it hasn't seen during training. You calculate both the average loss and the accuracy of the predictions, giving a clear sense of how well the model is performing. Importantly, this evaluation is done without updating the model's weights, as the goal is simply to measure its performance. + +Finally, run the training and testing loops over the course of 10 epochs. With each epoch, the model trains on the full training dataset, and afterward, you test it to monitor its progress. By the end of the process, the model has learned to classify the MNIST digits with a high degree of accuracy, as reflected in the final test results. This setup efficiently trains and evaluates the model for digit classification, providing feedback after each epoch on accuracy and loss. @@ -211,4 +229,6 @@ Accuracy: 96.5%, Avg loss: 0.137004 The above shows a similar accuracy as the original model. -You now have the trained model with the modified architecture. In the next step you will optimise it for mobile inference. \ No newline at end of file +You now have the trained model with the modified architecture. + +In the next step you will optimize it for mobile inference. 
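+
+The optimization step that follows reloads the trained weights with `load_state_dict()`, so persist them once training completes. A minimal sketch; the filename is an assumption, so keep it consistent with whatever the optimization code loads:
+
+```python
+import torch
+
+# Save only the learned parameters; the architecture is rebuilt at load time
+torch.save(model.state_dict(), "model_weights.pth")
+
+# Later, for example in the optimization step, restore them like this:
+restored = NeuralNetwork(use_dropout=False)
+restored.load_state_dict(torch.load("model_weights.pth"))
+restored.eval()
+```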
\ No newline at end of file
diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/model.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/model.md
index abfc9f117..d89fdb39e 100644
--- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/model.md
+++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/model.md
@@ -7,7 +7,7 @@ weight: 3

 layout: "learningpathall"
 ---

-You can create and train a feedforward neural network to classify handwritten digits from the MNIST dataset. This dataset contains 70,000 images, comprising 60,000 training and 10,000 testing images, of handwritten numerals (0-9), each with dimensions of 28x28 pixels. Some representative MNIST digits with their corresponding labels are shown below.
+You can create and train a feedforward neural network to classify handwritten digits from the MNIST dataset. This dataset contains 70,000 images, comprising 60,000 training images and 10,000 testing images, of handwritten numerals (0-9), each with dimensions of 28x28 pixels. Some representative MNIST digits with their corresponding labels are shown below.

 ![img3](Figures/3.png)

@@ -129,7 +129,7 @@ The output is still a probability distribution over the 10 digit classes (0-9),

 Technically, the code will run without errors as long as you provide it with an input image of the correct dimensions, which is 28x28 pixels. The model can accept input, pass it through the layers, and return a prediction - a vector of 10 probabilities. However, the results are not useful until the model is trained.

-# What you have learned so far
+# What have you learned so far?

 You have successfully defined and initialized a feedforward neural network using PyTorch.

diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/optimisation.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/optimisation.md
index 1f1dc8424..06778bf47 100644
--- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/optimisation.md
+++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/optimisation.md
@@ -1,15 +1,15 @@
 ---
 # User change
-title: "Model optimisation"
+title: "Run optimization"

 weight: 13

 layout: "learningpathall"
 ---

-To optimise the model proceed as follows. In the pytorch-digits-model-optimisations.ipynb add the following lines:
+To optimize the model, add the following lines to the `pytorch-digits-model-optimisations.ipynb` notebook:

-```Python
+```python
 from torch.utils.mobile_optimizer import optimize_for_mobile

 # Instantiate the model without Dropout layers
 model = NeuralNetwork(use_dropout=False)
@@ -46,10 +46,20 @@
 optimized_model = optimize_for_mobile(traced_quantized_model)

 optimized_model._save_for_lite_interpreter("optimized_model.ptl")
 ```

-In this code, the neural network model is being prepared for optimization and quantization to make it more suitable for mobile deployment. First, the model is instantiated without **Dropout layers** by setting `use_dropout=False`, as dropout is typically disabled during inference. The model's trained weights are then loaded using the `load_state_dict()` function, ensuring that it retains the knowledge learned during training. The model is set to evaluation mode with `eval()` to prepare it for inference.
+In this code, the neural network model is being prepared for optimization and quantization to make it more suitable for mobile deployment.
-Next, the **quantization process** is configured. A **quantization configuration** is applied using the `qnnpack` backend, which is designed for efficient quantization on mobile devices. Certain layers of the model, specifically the linear layers and their corresponding activation functions (ReLU), are **fused** using `torch.quantization.fuse_modules()`. This fusion reduces the computational overhead by combining operations, a common optimization technique. +First, the model is instantiated without Dropout layers by setting `use_dropout=False`, as dropout is typically disabled during inference. The model's trained weights are then loaded using the `load_state_dict()` function, ensuring that it retains the knowledge learned during training. The model is set to evaluation mode with `eval()` to prepare it for inference. -After fusing the layers, the model is prepared for **static quantization** with `torch.quantization.prepare()`, which involves calibrating the model on the training data to collect statistics needed for quantization. The **calibration** phase runs the model on some training data without updating the weights. +Next, the quantization process is configured. -Once calibration is complete, the model is **converted to a quantized version** using `torch.quantization.convert()`. The quantized model is then **traced** with `torch.jit.trace()`, which captures the model’s computational graph. Finally, the traced model is **optimized for mobile** using `optimize_for_mobile()`, further refining it for performance on mobile devices. The optimized model is saved in a format suitable for the **PyTorch Lite Interpreter** for efficient deployment on mobile platforms. The result is an optimized and quantized model stored as `"optimized_model.ptl"`, ready for deployment. \ No newline at end of file +A quantization configuration is applied using the `qnnpack` backend, which is designed for efficient quantization on mobile devices. Certain layers of the model, specifically the linear layers and their corresponding activation functions (ReLU), are fused using `torch.quantization.fuse_modules()`. This fusion reduces the computational overhead by combining operations, a common optimization technique. + +After fusing the layers, the model is prepared for static quantization with `torch.quantization.prepare()`, which involves calibrating the model on the training data to collect statistics needed for quantization. The calibration phase runs the model on some training data without updating the weights. + +Once calibration is complete, the model is converted to a quantized version using `torch.quantization.convert()`. The quantized model is then traced with `torch.jit.trace()`, which captures the model’s computational graph. + +Finally, the traced model is optimized for mobile using `optimize_for_mobile()`, further refining it for performance on mobile devices. + +The optimized model is saved in a format suitable for the PyTorch Lite Interpreter for efficient deployment on mobile platforms. + +The result is an optimized and quantized model stored as `"optimized_model.ptl"`, ready for deployment. 
\ No newline at end of file diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/prepare-data.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/prepare-data.md index 8910fc434..c16cb4c20 100644 --- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/prepare-data.md +++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/prepare-data.md @@ -2,18 +2,25 @@ # User change title: "Prepare Test Data" -weight: 8 +weight: 9 layout: "learningpathall" --- -In this section you will add the pre-trained model and prepare the data for the application. +In this section you will add the pre-trained model and copy the bitmap image data to the Android project. ## Model -To add the model, start by creating the assets folder under app/src/main. Then simply copy the pre-trained model you created in this [Learning Path](learning-paths/cross-platform/pytorch-digit-classification-training). The model is also available in [this repository](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.git) -## Data -To prepare the data, you use the following Python script: +To add the model, create a folder named `assets` in the `app/src/main` folder. + +Copy the pre-trained model you created in the previous steps, `model.pth` to the `assets` folder. + +The model is also available in the [GitHub repository](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.git) if you need to copy it. + +## Image data + +The data preparation script is shown below: + ```Python from torchvision import datasets, transforms from PIL import Image @@ -59,8 +66,16 @@ for i, (image, label) in enumerate(test_data): break ``` -The above code snippet processes the MNIST test dataset to generate and save bitmap images for digit classification. It defines constants for the number of unique digits (0-9) and the number of examples to collect per digit. The dataset is loaded using torchvision.datasets with a transformation to convert images to tensors. A directory named mnist_bitmaps is created to store the images. A dictionary tracks the number of collected examples for each digit. The code iterates through the dataset, converting each image tensor back to a PIL image, and saves two examples of each digit in the format digit_index_example_index.png. The loop breaks once the specified number of examples per digit is saved, ensuring that exactly 20 images (2 per digit) are generated and stored in the specified directory. +The above code processes the MNIST test dataset to generate and save bitmap images for digit classification. + +It defines constants for the number of unique digits (0-9) and the number of examples to collect per digit. The dataset is loaded using `torchvision.datasets` with a transformation to convert images to tensors. + +A directory named `mnist_bitmaps` is created to store the images. A dictionary tracks the number of collected examples for each digit. The code iterates through the dataset, converting each image tensor back to a PIL image, and saves two examples of each digit in the format `digit_index_example_index.png`. + +The loop breaks once the specified number of examples per digit is saved, ensuring that exactly 20 images (2 per digit) are generated and stored in the specified directory. 
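+
+You can quickly confirm that the generated files match what the app expects: 20 grayscale PNG files, each 28x28 pixels. A small check, assuming the `mnist_bitmaps` directory created by the script above:
+
+```python
+import os
+from PIL import Image
+
+files = sorted(os.listdir("mnist_bitmaps"))
+print(len(files), "files")  # expected: 20 (2 per digit)
+
+img = Image.open(os.path.join("mnist_bitmaps", files[0]))
+print(img.size, img.mode)   # expected: (28, 28) and mode L (8-bit grayscale)
+```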
+
+For your convenience, the data is included in the [GitHub repository](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.git).
-For your convenience the data is included in [this repository](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.git)
+Copy the `mnist_bitmaps` folder to the `assets` folder.
-Once you have a model and data simply copy them under the assets folder of the Android application
\ No newline at end of file
+Once you have the `model.pth` file and the `mnist_bitmaps` folder in the `assets` folder, continue to the next step to run the Android application.
\ No newline at end of file
diff --git a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/user-interface.md b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/user-interface.md
index 7df96f673..bcf84520b 100644
--- a/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/user-interface.md
+++ b/content/learning-paths/cross-platform/pytorch-digit-classification-arch-training/user-interface.md
@@ -1,29 +1,41 @@
 ---
 # User change
-title: "Android App"
+title: "Create an Android application"
 
-weight: 9
+weight: 8
 
 layout: "learningpathall"
 ---
 
-In this section you will create an Android App to run digit classifier. The application will load a randomly selected image containing a handwritten digit, and its true label. Then you will be able to run an inference on this image to predict the digit.
+In this section you will create an Android application to run digit classification.
+
+The application loads a randomly selected image containing a handwritten digit and its true label.
+
+The application runs an inference on the image and predicts the digit value.
+
+## Create an Android project
+
+Start by creating a project:
 
-Start by creating a project and an user interface:
 1. Open Android Studio and create a new project with an “Empty Views Activity.”
+
 2. Set the project name to **ArmPyTorchMNISTInference**, set the package name to: **com.arm.armpytorchmnistinference**, select **Kotlin** as the language, and set the minimum SDK to **API 27 ("Oreo" Android 8.1)**.
 
-We set the API to Android 8.1 (API level 27) as it introduced NNAPI, providing a standard interface for running computationally intensive machine learning models on Android devices. Devices with ARM-based SoCs and corresponding hardware accelerators can leverage NNAPI to offload ML tasks to specialized hardware, such as NPUs (Neural Processing Units), DSPs (Digital Signal Processors), or GPUs (Graphics Processing Units).
+Set the API to Android 8.1 (API level 27) because this version introduced NNAPI, providing a standard interface for running computationally intensive machine learning models on Android devices.
+
+Devices with hardware accelerators can leverage NNAPI to offload ML tasks to specialized hardware, such as NPUs (Neural Processing Units), DSPs (Digital Signal Processors), or GPUs (Graphics Processing Units).
+
+## User interface design
+
+The user interface design contains the following:
 
-## User interface
-You will design the user interface to contain the following:
-1. A header.
-2. An ImageView and TextView to display the image and its true label.
-3. A button to load the image.
-4. A button to run inference.
-5. Two TextView controls to display the predicted label and inference time.
+- A header.
+- `ImageView` and `TextView` sections to display the image and its true label.
+- A button to load the image.
+- A button to run inference.
+- Two `TextView` controls to display the predicted label and inference time.
 
-To do so, replace the contents of activity_main.xml (located under src/main/res/layout) with the following code:
+Use the Android Studio editor to replace the contents of `activity_main.xml`, located in `src/main/res/layout`, with the following code:
 
 ```XML
@@ -97,12 +109,21 @@ To do so, replace the contents of activity_main.xml (located under src/main/res/
 ```
 
-The provided XML code defines a user interface layout for an Android activity using a vertical LinearLayout. It includes several UI components arranged vertically with padding and centered alignment. At the top, there is a TextView acting as a header, displaying the text “Digit Recognition” in bold and with a large font size. Below the header, an ImageView is used to display an image, with a default source set to sample_image. This is followed by another TextView that shows the true label of the displayed image, initially set to “True Label: N/A”.
+The above XML code defines a user interface layout for an Android activity using a vertical `LinearLayout`. It includes several UI components arranged vertically with padding and centered alignment.
 
-The layout also contains two buttons: one labeled “Load Image” for selecting an input image, and another labeled “Run Inference” to execute the inference process on the selected image. At the bottom, there are two TextView elements to display the predicted label and the inference time, both initially set to “N/A”. The layout uses margins and appropriate sizes for each element to ensure a clean and organized appearance.
+At the top, there is a `TextView` acting as a header, displaying the text `Digit Recognition` in bold and with a large font size.
+
+Below the header, an `ImageView` displays an image, with a default source set to `sample_image`.
+
+This is followed by another `TextView` that shows the true label of the displayed image, initially set to `True Label: N/A`.
+
+The layout also contains two buttons: one labeled `Load Image` for selecting an input image, and another labeled `Run Inference` to execute the inference process on the selected image.
+
+At the bottom, there are two `TextView` elements to display the predicted label and the inference time, both initially set to `N/A`. The layout uses margins and appropriate sizes for each element to ensure a clean and organized appearance.
 
 ## Add PyTorch to the project
-Before going further you will need to add PyTorch do the Android project. To do so, open the build.gradle.kts (Module:app) file and add the following two lines under dependencies:
+
+Add PyTorch to the project by opening the `build.gradle.kts` file and adding the following two lines under dependencies:
 
 ```XML
 implementation("org.pytorch:pytorch_android:1.10.0")
@@ -127,9 +148,13 @@ dependencies {
 ```
 
 ## Logic implementation
-You will now implement the logic for the application. This will include loading the pre-trained model, loading and displaying images, and running the inference.
-Open the MainActivity.kt and modify it as follows:
+
+You will now implement the logic for the application.
+
+This includes loading the pre-trained model, loading and displaying images, and running inference.
+
+Open `MainActivity.kt` and modify it as follows:
+
 ```Kotlin
 package com.arm.armpytorchmnistinference
@@ -288,20 +313,20 @@
 }
 ```
 
-The above Kotlin code defines an Android app activity (MainActivity) that performs inference on the MNIST dataset using a pre-trained PyTorch model. The app allows the user to load a random MNIST image from the assets and run the model to classify the image.
+The above Kotlin code defines an Android app activity called `MainActivity` that performs inference on the MNIST dataset using a pre-trained PyTorch model. The app allows the user to load a random MNIST image from the `assets` folder and run the model to classify the image.
 
-The MainActivity class contains several methods. The first one, onCreate(Bundle?) is called when the activity is first created. It sets up the user interface by inflating the layout defined in activity_main.xml and initializes several UI components, including an ImageView to display the image, TextView controls to show the true label and predicted label, and two buttons (selectImageButton and runInferenceButton) to select an image and run inference, respectively. The method then loads the PyTorch model from the assets folder using the assetFilePath function and sets up click listeners for the buttons. The selectImageButton is configured to select a random image from the mnist_bitmaps folder, while the runInferenceButton runs the inference on the selected image.
+The `MainActivity` class contains several methods. The first one, `onCreate()`, is called when the activity is first created. It sets up the user interface by inflating the layout defined in `activity_main.xml` and initializes several UI components, including an `ImageView` to display the image, `TextView` controls to show the true label and predicted label, and two buttons (`selectImageButton` and `runInferenceButton`) to select an image and run inference. The method then loads the PyTorch model from the `assets` folder using the `assetFilePath()` function and sets up click listeners for the buttons. The `selectImageButton` is configured to select a random image from the `mnist_bitmaps` folder, while the `runInferenceButton` runs the inference on the selected image.
 
-Next, the selectRandomImageFromAssets() method is responsible for selecting a random image from the mnist_bitmaps folder in the assets. It lists all the files in the folder, picks one at random, and loads it as a Bitmap. The method then extracts the true label from the filename (e.g., 07_00.png implies a true label of 7), displays the selected image in the ImageView, and updates the trueLabel TextView with the correct label. If there is an error loading the image or the folder is empty, an appropriate error message is displayed in the trueLabel TextView.
+Next, the `selectRandomImageFromAssets()` method is responsible for selecting a random image from the `mnist_bitmaps` folder in the assets. It lists all the files in the folder, picks one at random, and loads it as a bitmap. The method then extracts the true label from the filename (for example, `07_00.png` implies a true label of 7), displays the selected image in the `ImageView`, and updates the `trueLabel` `TextView` with the correct label. If there is an error loading the image or the folder is empty, an appropriate error message is displayed in the `trueLabel` `TextView`.
 
-Afterward, the createTensorFromBitmap(Bitmap) converts a grayscale bitmap of size 28x28 (an image from the MNIST dataset) into a PyTorch Tensor. First, the method verifies that the bitmap has the correct dimensions. Then, it extracts pixel data from the bitmap, normalizes each pixel value to a float in the range [0, 1], and stores the values in a float array. The method finally constructs and returns a tensor with the shape [1, 1, 28, 28], where 1 is the batch size, 1 is the number of channels (for grayscale), and 28 represents the width and height of the image. This is required to match the input expected by the model.
+Afterward, the `createTensorFromBitmap()` method converts a grayscale bitmap of size 28x28 (an image from the MNIST dataset) into a PyTorch Tensor. First, the method verifies that the bitmap has the correct dimensions. Then, it extracts pixel data from the bitmap, normalizes each pixel value to a float in the range [0, 1], and stores the values in a float array. The method finally constructs and returns a tensor with the shape [1, 1, 28, 28], where 1 is the batch size, 1 is the number of channels (for grayscale), and 28 represents the width and height of the image. This is required to match the input expected by the model.
 
-Subsequently, we have the runInference method. It accepts a Bitmap as input and performs inference using the pre-trained PyTorch model. It first converts the bitmap to a tensor using the createTensorFromBitmap method. Then, it measures the time taken to run the forward pass of the model using the measureTimeMicros method. The output tensor from the model, which contains the scores for each digit class, is processed to determine the predicted label. This predicted label is displayed in the predictedLabel TextView. The method also updates the inferenceTime TextView with the time taken for the inference in microseconds.
+Subsequently, the `runInference()` method accepts a bitmap as input and performs inference using the pre-trained PyTorch model. It first converts the bitmap to a tensor using the `createTensorFromBitmap()` method. Then, it measures the time taken to run the forward pass of the model using the `measureTimeMicros()` method. The output tensor from the model, which contains the scores for each digit class, is processed to determine the predicted label. This predicted label is displayed in the `predictedLabel` `TextView`. The method also updates the `inferenceTime` `TextView` with the time taken for the inference in microseconds.
 
-Also, we have an inline function measureTimeMicros. It is a utility method that measures the execution time of the provided code block in microseconds. It uses the measureNanoTime function to get the execution time in nanoseconds and then converts it to microseconds by dividing the result by 1000. This method is used to measure the time taken for model inference in the runInference method.
+There is also an inline function, `measureTimeMicros()`. It is a utility method that measures the execution time of the provided code block in microseconds. It uses the `measureNanoTime()` function to get the execution time in nanoseconds and then converts it to microseconds by dividing the result by 1000. This method is used to measure the time taken for model inference in the `runInference()` method.
 
-The assetFilePath method is a helper function that copies a file from the assets folder to the app’s internal storage and returns the absolute path of the copied file. This is necessary because PyTorch’s Module.load() method requires a file path, not an InputStream. The function reads the specified asset file, writes its contents to a file in the internal storage, and returns the path to this file. This method is used in onCreate to load the PyTorch model file (model.pth) from the assets.
+The `assetFilePath()` method is a helper function that copies a file from the assets folder to the application's internal storage and returns the absolute path of the copied file. This is necessary because PyTorch’s `Module.load()` method requires a file path, not an `InputStream`. The function reads the specified asset file, writes its contents to a file in the internal storage, and returns the path to this file. This method is used in `onCreate()` to load the PyTorch model file, `model.pth`, from the `assets` folder.
 
-The MainActivity class initializes the UI components, loads a pre-trained PyTorch model, and allows the user to select random MNIST images and run inference on them. Each method is designed to handle a specific aspect of the functionality, such as loading images, converting them to tensors, running inference, and measuring execution time. The code is modular and organized, making it easy to understand and maintain.
+The `MainActivity` class initializes the UI components, loads a pre-trained PyTorch model, and allows the user to select random MNIST images and run inference on them. Each method is designed to handle a specific aspect of the functionality, such as loading images, converting them to tensors, running inference, and measuring execution time. The code is modular and organized, making it easy to understand and maintain.
 
-To be able to successfully run the application we will need to add the model and prepare bitmaps.
\ No newline at end of file
+To run the application successfully, you need to add the model and prepare the bitmaps. Continue to the next section to learn how to prepare the data.
\ No newline at end of file