Merge pull request #1408 from dawidborycki/LP-PyTorch-Digit-Classification-Android

Extending LP on PyTorch to include Android app and model optimisation
jasonrandrews authored Dec 18, 2024
2 parents 9365362 + b9d2095 commit 0668d9f
Showing 14 changed files with 961 additions and 3 deletions.
@@ -1,16 +1,20 @@
---
title: Create and train a PyTorch model for digit classification

minutes_to_complete: 80
minutes_to_complete: 160

who_is_this_for: This is an introductory topic for software developers interested in learning how to use PyTorch to create and train a feedforward neural network for digit classification.
who_is_this_for: This is an introductory topic for software developers interested in learning how to use PyTorch to create and train a feedforward neural network for digit classification. You will also learn how to use the trained model in an Android app. Finally, you will discover how to optimise the model using quantization and fusion.

learning_objectives:
- Prepare a PyTorch development environment.
- Download and prepare the MNIST dataset.
- Create a neural network architecture using PyTorch.
- Train a neural network using PyTorch.

- Create an Android app and load the pre-trained model.
- Prepare an input dataset.
- Measure the inference time.
- Optimise a neural network architecture using quantization and fusion.
- Use an optimised model in an Android app.
prerequisites:
- A computer that can run Python3 and Visual Studio Code. The OS can be Windows, Linux, or macOS.

@@ -0,0 +1,32 @@
---
# User change
title: "Running an Application"

weight: 10

layout: "learningpathall"
---

You are now ready to run the application. You can use either an emulator or a physical device. In this guide, we will use an emulator.

To run an app in Android Studio using an emulator, follow these steps:
1. Configure the Emulator:
* Go to Tools > Device Manager (or click the Device Manager icon on the toolbar).
* Click Create Device to set up a new virtual device (if you haven’t done so already).
* Choose a device model (e.g., Pixel 4) and click Next.
* Select a system image (e.g., Android 11, API level 30) and click Next.
* Review the settings and click Finish to create the emulator.

2. Run the App:
* Make sure the emulator is selected in the device dropdown menu in the toolbar (next to the “Run” button).
* Click the Run button (a green triangle). Android Studio will build the app, install it on the emulator, and launch it.

3. View the App on the Emulator: Once the app is installed, it will automatically open on the emulator screen, allowing you to interact with it as if it were running on a real device.

Once the application is started, click the Load Image button. It will load a randomly selected image. Then, click Run Inference to recognize the digit. The application will display the predicted label and the inference time as shown below:

![img](Figures/05.png)

![img](Figures/06.png)

In the next step, you will learn how to further optimise the model.
@@ -110,3 +110,5 @@ After training, you saved the model using TorchScript, which captures both the m
Next, you performed inference. You loaded the saved model and set it to evaluation mode to ensure that layers like dropout and batch normalization behaved correctly during inference. You randomly selected 16 images from the MNIST test dataset to evaluate the model’s performance on unseen data. For each selected image, you used the model to predict the digit, comparing the predicted labels with the actual ones. You displayed the images alongside their actual and predicted labels in a 4x4 grid, visually assessing the model’s accuracy and performance.

This comprehensive process, from model training and saving to inference and visualization, illustrates the end-to-end workflow for building and deploying a machine learning model in PyTorch. It demonstrates how to train a model, save it in a portable format, and then use it to make predictions on new data.
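
A minimal sketch of this save-load-predict loop is shown below. It assumes the trained network is in a variable named `model`, uses the conventional MNIST normalization constants (which may differ from those used earlier in this Learning Path), and the file name `model.pth` is illustrative:

```python
import torch
import torchvision
from torchvision import transforms

# Save with TorchScript: captures both the architecture and the weights.
# `model` is assumed to be the trained network from the previous steps.
scripted = torch.jit.script(model)
scripted.save("model.pth")  # illustrative file name

# Load the saved model and switch to evaluation mode so that layers
# like dropout and batch normalization behave correctly at inference.
loaded = torch.jit.load("model.pth")
loaded.eval()

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # conventional MNIST statistics
])
test_set = torchvision.datasets.MNIST(root="data", train=False,
                                      download=True, transform=transform)

# Predict on 16 randomly selected test images and compare with the labels.
indices = torch.randperm(len(test_set))[:16]
with torch.no_grad():
    for i in indices:
        image, label = test_set[int(i)]
        prediction = loaded(image.unsqueeze(0)).argmax(dim=1).item()
        print(f"actual: {label}, predicted: {prediction}")
```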

In the next step, you will learn how to use the model in the mobile Android application.
@@ -0,0 +1,24 @@
---
# User change
title: "Background for running inference on Android"

weight: 7

layout: "learningpathall"
---

Running pre-trained machine learning models on mobile and edge devices has become increasingly common as it enables these devices to gain intelligence and perform complex tasks directly on-device. This capability allows smartphones, IoT devices, and embedded systems to execute advanced functions such as image recognition, natural language processing, and real-time decision-making without relying on cloud-based services. By leveraging on-device inference, applications can offer faster responses, reduced latency, enhanced privacy, and offline functionality, making them more efficient and capable of handling sophisticated tasks in various environments.

Arm provides a wide range of hardware and software accelerators designed to optimize the performance of machine learning (ML) models on edge devices. These include specialized processors like Arm's Neural Processing Units (NPUs) and Graphics Processing Units (GPUs), as well as software frameworks like the Arm Compute Library and Arm NN, which are tailored to leverage these hardware capabilities. Arm's technology is ubiquitous, powering a vast array of devices from smartphones and tablets to IoT gadgets and embedded systems. With Arm chips being the core of many Android-based smartphones and other devices, running ML models efficiently on this hardware is crucial for enabling advanced applications such as image recognition, voice assistance, and real-time analytics. By utilizing Arm’s accelerators, developers can achieve lower latency, reduced power consumption, and enhanced performance, making on-device AI both practical and powerful for a wide range of applications.

Running a machine learning model on Android involves a few key steps. First, you need to train and save the model in a mobile-friendly format, such as TensorFlow Lite, ONNX, or TorchScript, depending on the framework you are using. Next, you add the model file to your Android project’s assets directory. In your app’s code, use the corresponding framework’s Android library, such as TensorFlow Lite or PyTorch Mobile, to load the model. You then prepare the input data, ensuring it is formatted and preprocessed in the same way as during model training. The input data is passed through the model, and the output predictions are retrieved and interpreted accordingly. For improved performance, you can leverage hardware acceleration using Android’s Neural Networks API (NNAPI) or use GPU support if available. This process enables the Android app to make real-time predictions and execute complex machine learning tasks directly on the device.
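
As a rough sketch of the export step for PyTorch Mobile (the variable `model` and the file name `model.ptl` are placeholders, and the exact export flow used later in this Learning Path may differ):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# `model` stands for the trained digit classifier from the earlier steps.
model.eval()
example_input = torch.rand(1, 1, 28, 28)  # dummy MNIST-shaped input for tracing
traced = torch.jit.trace(model, example_input)

# Apply mobile-specific graph optimizations and save in a format that the
# PyTorch Mobile runtime can load from the Android app's assets directory.
optimized = optimize_for_mobile(traced)
optimized._save_for_lite_interpreter("model.ptl")  # placeholder file name
```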

In this Learning Path, you will learn how to perform such inference in the Android app using a pre-trained digit classifier, created [here](learning-paths/cross-platform/pytorch-digit-classification-training).

## Before you begin
Before you begin, make sure Python3, [Visual Studio Code](https://code.visualstudio.com/download) and [Android Studio](https://developer.android.com/studio/install) are installed on your system.

## Source code
The complete source code is available [here](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.git).

The Python scripts are available [here](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.Python.git).
@@ -0,0 +1,26 @@
---
# User change
title: "Optimising neural network models in PyTorch"

weight: 11

layout: "learningpathall"
---

In the realm of machine learning (ML) for edge and mobile inference, optimizing models is crucial to achieving efficient performance while minimizing resource consumption. As mobile and edge devices often have limited computational power, memory, and energy availability, various strategies are employed to ensure that ML models can run effectively in these constrained environments.

**Quantization** is one of the most widely used techniques, which reduces the precision of the model's weights and activations from floating-point to lower-bit representations, such as int8 or float16. This not only reduces the model size but also accelerates inference speed on hardware that supports lower precision arithmetic.
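
For instance, post-training dynamic quantization takes only a few lines in PyTorch; this sketch assumes the trained floating-point network is in `model`:

```python
import torch
import torch.nn as nn

# Dynamic quantization: weights of the listed module types are stored as
# int8, and activations are quantized on the fly during inference.
quantized_model = torch.quantization.quantize_dynamic(
    model,               # the trained float model (assumed to exist)
    {nn.Linear},         # module types whose weights will be quantized
    dtype=torch.qint8,   # 8-bit integer weights
)
```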

Another key optimization strategy is **layer fusion**, where multiple operations are merged into a single one, such as combining a linear layer with its subsequent activation function (like ReLU). This reduces the number of operations that need to be executed during inference, minimizing latency and improving throughput.
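
In PyTorch this can be expressed with `fuse_modules`; the module names below (`fc1`, `relu1`, and so on) are placeholders that must match the attribute names in your model class:

```python
import torch

model.eval()  # fusion for inference requires evaluation mode
fused_model = torch.quantization.fuse_modules(
    model,
    [["fc1", "relu1"], ["fc2", "relu2"]],  # placeholder layer names
)
```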

In addition to these techniques, **pruning**, which involves removing less important weights or neurons from the model, can help in creating a leaner model that requires fewer resources without significantly affecting accuracy.
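
A minimal pruning sketch using PyTorch's built-in utilities, assuming the model has a linear layer named `fc1`:

```python
import torch.nn.utils.prune as prune

# Zero out the 30% of weights with the smallest L1 magnitude in one layer.
prune.l1_unstructured(model.fc1, name="weight", amount=0.3)

# Remove the re-parametrization so the pruned weights become permanent.
prune.remove(model.fc1, "weight")
```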

Finally, leveraging hardware-specific optimizations, such as **using the Android Neural Networks API (NNAPI)**, allows developers to take full advantage of the underlying hardware acceleration available on edge devices. By employing these strategies, developers can significantly enhance the efficiency of ML models for deployment on mobile and edge platforms, ensuring a balance between performance and resource utilization.

PyTorch offers robust support for various optimization techniques that enhance the performance of machine learning models for edge and mobile inference. One of the key features is its quantization toolkit, which provides a streamlined workflow for applying quantization to models. PyTorch supports both static and dynamic quantization, allowing developers to reduce model size and improve inference speed without sacrificing accuracy. Additionally, PyTorch enables layer fusion through its torch.quantization module, enabling seamless integration of operations like fusing linear layers with their activation functions, thus optimizing execution by minimizing computational overhead. Furthermore, the TorchScript functionality allows for the creation of serializable and optimizable models that can be efficiently deployed on mobile devices. PyTorch’s integration with hardware acceleration libraries, such as NNAPI for Android, enables developers to leverage specific hardware capabilities, ensuring optimal model performance tailored to the device’s architecture. Overall, PyTorch provides a comprehensive ecosystem that empowers developers to implement effective optimizations for mobile and edge deployment, enhancing both speed and efficiency.
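
To illustrate the static (post-training) variant of this workflow, here is a sketch using the QNNPACK backend, which targets Arm CPUs. It assumes `model` and a `calibration_loader` of representative data exist, and that the model wraps its forward pass with `QuantStub`/`DeQuantStub` so inputs are quantized at the boundary:

```python
import torch

model.eval()
model.qconfig = torch.quantization.get_default_qconfig("qnnpack")  # Arm backend
prepared = torch.quantization.prepare(model)

# Calibration: run representative data through the prepared model so the
# inserted observers can record activation ranges.
with torch.no_grad():
    for images, _ in calibration_loader:
        prepared(images)

# Convert: replace float modules with their quantized counterparts.
quantized = torch.quantization.convert(prepared)
```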

In this Learning Path, we will delve into the techniques of **quantization** and **fusion** using our previously created neural network model for [digit classification](/learning-paths/cross-platform/pytorch-digit-classification-arch-training/). By applying quantization, we will reduce the model's weight precision, transitioning from floating-point representations to lower-bit formats, which not only minimizes the model size but also enhances inference speed. This process is crucial for optimizing our model for deployment on resource-constrained devices.

Additionally, we will explore layer fusion, which combines multiple operations within the model—such as fusing linear layers with their subsequent activation functions—into a single operation. This reduction in operational complexity further streamlines the model, leading to improved performance during inference. By implementing these optimizations, we aim to enhance the efficiency of our digit classification model, making it well-suited for deployment in mobile and edge environments.

First, we will modify our previous Python scripts for [both training and inference](/learning-paths/cross-platform/pytorch-digit-classification-arch-training/) to incorporate model optimizations like quantization and fusion. After adjusting the training pipeline to produce an optimized version of the model, we will also update our inference script to handle both the original and optimized models. Once these changes are made, we will modify the [Android app](pytorch-digit-classification-inference-android-app) to load either the original or optimized model based on user input, allowing us to switch between them dynamically. This setup will enable us to directly compare the inference speed of both models on the device, providing valuable insights into the performance benefits of model optimization techniques in real-world scenarios.
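
On the Python side, a small timing harness like the sketch below is one way to make that comparison; `original_model` and `optimized_model` are placeholders for the models produced by the scripts described above:

```python
import time
import torch

def average_inference_ms(model, sample, runs=100):
    """Average single-image inference time in milliseconds."""
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(runs):
            model(sample)
    return (time.perf_counter() - start) / runs * 1000

sample = torch.rand(1, 1, 28, 28)  # MNIST-shaped dummy input
print(f"original:  {average_inference_ms(original_model, sample):.2f} ms")
print(f"optimized: {average_inference_ms(optimized_model, sample):.2f} ms")
```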