Merge pull request #1408 from dawidborycki/LP-PyTorch-Digit-Classification-Android

Extending LP on PyTorch to include Android app and model optimisation
Showing 14 changed files with 961 additions and 3 deletions.
Binary file added: BIN +737 KB ...-paths/cross-platform/pytorch-digit-classification-arch-training/Figures/05.png
Binary file added: BIN +726 KB ...-paths/cross-platform/pytorch-digit-classification-arch-training/Figures/06.png
Binary file added: BIN +196 KB ...-paths/cross-platform/pytorch-digit-classification-arch-training/Figures/07.jpg
Binary file added: BIN +182 KB ...-paths/cross-platform/pytorch-digit-classification-arch-training/Figures/08.jpg
10 changes: 7 additions & 3 deletions
...rning-paths/cross-platform/pytorch-digit-classification-arch-training/_index.md
32 changes: 32 additions & 0 deletions
...learning-paths/cross-platform/pytorch-digit-classification-arch-training/app.md
@@ -0,0 +1,32 @@
---
# User change
title: "Running an Application"

weight: 10

layout: "learningpathall"
---

You are now ready to run the application. You can use either an emulator or a physical device. In this guide, we will use an emulator.

To run the app in Android Studio using an emulator, follow these steps:
1. Configure the emulator:
* Go to Tools > Device Manager (or click the Device Manager icon on the toolbar).
* Click Create Device to set up a new virtual device (if you haven't done so already).
* Choose a device model (e.g., Pixel 4) and click Next.
* Select a system image (e.g., Android 11, API level 30) and click Next.
* Review the settings and click Finish to create the emulator.

2. Run the app:
* Make sure the emulator is selected in the device dropdown menu in the toolbar (next to the Run button).
* Click the Run button (the green triangle). Android Studio will build the app, install it on the emulator, and launch it.

3. View the app on the emulator: once the app is installed, it will automatically open on the emulator screen, allowing you to interact with it as if it were running on a real device.

Once the application has started, click the Load Image button to load a randomly selected image. Then, click Run Inference to recognize the digit. The application displays the predicted label and the inference time, as shown below:

![img](Figures/05.png)

![img](Figures/06.png)

In the next step you will learn how to further optimise the model.
24 changes: 24 additions & 0 deletions
...aths/cross-platform/pytorch-digit-classification-arch-training/intro-android.md
@@ -0,0 +1,24 @@
---
# User change
title: "Background for running inference on Android"

weight: 7

layout: "learningpathall"
---

Running pre-trained machine learning models on mobile and edge devices has become increasingly common, as it enables these devices to perform complex tasks directly on-device. This capability allows smartphones, IoT devices, and embedded systems to execute advanced functions such as image recognition, natural language processing, and real-time decision-making without relying on cloud-based services. By leveraging on-device inference, applications can offer faster responses, reduced latency, enhanced privacy, and offline functionality.

Arm provides a wide range of hardware and software accelerators designed to optimize the performance of machine learning (ML) models on edge devices. These include specialized processors such as Arm Neural Processing Units (NPUs) and Graphics Processing Units (GPUs), as well as software frameworks such as the Arm Compute Library and Arm NN, which are tailored to leverage these hardware capabilities. Arm's technology is ubiquitous, powering a vast array of devices from smartphones and tablets to IoT gadgets and embedded systems. With Arm chips at the core of many Android-based smartphones, running ML models efficiently on this hardware is crucial for enabling advanced applications such as image recognition, voice assistance, and real-time analytics. By utilizing Arm's accelerators, developers can achieve lower latency, reduced power consumption, and enhanced performance, making on-device AI both practical and powerful for a wide range of applications.

Running a machine learning model on Android involves a few key steps. First, you train and save the model in a mobile-friendly format, such as TensorFlow Lite, ONNX, or TorchScript, depending on the framework you are using. Next, you add the model file to your Android project's assets directory. In your app's code, you use the corresponding framework's Android library, such as TensorFlow Lite or PyTorch Mobile, to load the model. You then prepare the input data, ensuring it is formatted and preprocessed in the same way as during model training. The input data is passed through the model, and the output predictions are retrieved and interpreted accordingly. For improved performance, you can leverage hardware acceleration using Android's Neural Networks API (NNAPI) or GPU support if available. This process enables the Android app to make real-time predictions and execute complex machine learning tasks directly on the device.
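To make the export step concrete, below is a minimal sketch of saving a PyTorch model as a mobile-friendly TorchScript file. The tiny `Net` module, the input shape, and the `model.ptl` file name are illustrative placeholders, not the exact code from this Learning Path:

```python
import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile

# Minimal stand-in for a trained digit classifier (placeholder architecture).
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.layers(x)

model = Net()
model.eval()  # inference mode

# Trace the model with a dummy MNIST-shaped input (batch, channels, height, width).
example = torch.rand(1, 1, 28, 28)
scripted = torch.jit.trace(model, example)

# Apply mobile-specific graph optimizations and save for the PyTorch runtime on Android.
optimized = optimize_for_mobile(scripted)
optimized._save_for_lite_interpreter("model.ptl")  # copy this file into the app's assets/
```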
In this Learning Path, you will learn how to perform such inference in an Android app using the pre-trained digit classifier created [here](/learning-paths/cross-platform/pytorch-digit-classification-training).

## Before you begin
Before you begin, make sure Python 3, [Visual Studio Code](https://code.visualstudio.com/download), and [Android Studio](https://developer.android.com/studio/install) are installed on your system.

## Source code
The complete source code is available [here](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.git).

The Python scripts are available [here](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference.Python.git).
26 changes: 26 additions & 0 deletions
...ng-paths/cross-platform/pytorch-digit-classification-arch-training/intro-opt.md
@@ -0,0 +1,26 @@
---
# User change
title: "Optimising neural network models in PyTorch"

weight: 11

layout: "learningpathall"
---

In machine learning (ML) for edge and mobile inference, optimizing models is crucial to achieving efficient performance while minimizing resource consumption. Because mobile and edge devices often have limited computational power, memory, and energy, various strategies are employed to ensure that ML models can run effectively in these constrained environments.

**Quantization** is one of the most widely used techniques. It reduces the precision of the model's weights and activations from floating point to lower-bit representations, such as int8 or float16. This not only reduces the model size but also accelerates inference on hardware that supports lower-precision arithmetic.
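As an illustration, here is a minimal sketch of post-training dynamic quantization in PyTorch, applied to a placeholder model rather than the exact classifier from this Learning Path:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the digit classifier.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
model.eval()

# Dynamic quantization: weights of Linear layers are stored as int8;
# activations are quantized on the fly during inference.
quantized_model = torch.ao.quantization.quantize_dynamic(
    model,
    {nn.Linear},        # layer types to quantize
    dtype=torch.qint8,  # target weight precision
)

print(quantized_model)  # Linear layers are replaced by DynamicQuantizedLinear
```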
Another key optimization strategy is **layer fusion**, in which multiple operations, such as a linear layer and its subsequent activation function (like ReLU), are combined into a single operation. This reduces the number of operations that need to be executed during inference, minimizing latency and improving throughput.
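A small sketch of this idea using PyTorch's module-fusion utility follows; the two-layer module is hypothetical, and the names passed to `fuse_modules` must match the module's own attribute names:

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear = nn.Linear(28 * 28, 128)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(self.flatten(x)))

model = SmallNet().eval()  # fusion for inference requires eval mode

# Fuse the Linear layer with its ReLU into a single LinearReLU module.
fused = torch.ao.quantization.fuse_modules(model, [["linear", "relu"]])
print(fused)
```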
In addition to these techniques, **pruning**, which removes less important weights or neurons from the model, can help create a leaner model that requires fewer resources without significantly affecting accuracy.
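For example, PyTorch's `torch.nn.utils.prune` module can zero out a fraction of the smallest-magnitude weights in a layer; the layer size and the 30% ratio below are arbitrary illustrations:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(28 * 28, 128)

# Zero out the 30% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Make the pruning permanent by removing the re-parametrization hooks.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Sparsity: {sparsity:.0%}")  # roughly 30%
```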
Finally, leveraging hardware-specific optimizations, such as **the Android Neural Networks API (NNAPI)**, allows developers to take full advantage of the hardware acceleration available on edge devices. By employing these strategies, developers can significantly enhance the efficiency of ML models for deployment on mobile and edge platforms, striking a balance between performance and resource utilization.

PyTorch offers robust support for these optimization techniques. Its quantization toolkit provides a streamlined workflow for both static and dynamic quantization, allowing developers to reduce model size and improve inference speed without sacrificing accuracy. PyTorch also supports layer fusion through its torch.quantization module, so that operations such as a linear layer followed by its activation function can be merged to minimize computational overhead. The TorchScript functionality allows for the creation of serializable, optimizable models that can be deployed efficiently on mobile devices. In addition, PyTorch's integration with hardware acceleration libraries, such as NNAPI on Android, lets developers leverage device-specific capabilities and tailor model performance to the device's architecture. Overall, PyTorch provides a comprehensive ecosystem for optimizing models for mobile and edge deployment, enhancing both speed and efficiency.
In this Learning Path, we will delve into the techniques of **quantization** and **fusion** using our previously created neural network model for [digit classification](/learning-paths/cross-platform/pytorch-digit-classification-arch-training/). By applying quantization, we will reduce the precision of the model's weights, transitioning from floating-point representations to lower-bit formats, which not only minimizes the model size but also improves inference speed. This process is crucial for optimizing the model for deployment on resource-constrained devices.

Additionally, we will explore layer fusion, which combines multiple operations within the model, such as fusing linear layers with their subsequent activation functions, into a single operation. This reduction in operational complexity further streamlines the model, leading to improved performance during inference. By implementing these optimizations, we aim to enhance the efficiency of our digit classification model, making it well suited for deployment in mobile and edge environments.

First, we will modify our previous Python scripts for [both training and inference](/learning-paths/cross-platform/pytorch-digit-classification-arch-training/) to incorporate model optimizations such as quantization and fusion. After adjusting the training pipeline to produce an optimized version of the model, we will also update our inference script to handle both the original and optimized models. Once these changes are made, we will modify the [Android app](pytorch-digit-classification-inference-android-app) to load either the original or the optimized model based on user input, allowing us to switch between them dynamically. This setup will enable us to directly compare the inference speed of both models on the device, providing valuable insight into the performance benefits of model optimization techniques in real-world scenarios.
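As a rough sketch of how such a speed comparison could look on the Python side (the placeholder models, run count, and timing approach are assumptions for illustration, not the Learning Path's exact benchmarking code):

```python
import time
import torch
import torch.nn as nn

def benchmark(model, example, runs=100):
    """Average wall-clock inference time per run, in milliseconds."""
    model.eval()
    with torch.no_grad():
        model(example)  # warm-up run
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
    return (time.perf_counter() - start) / runs * 1000

# Placeholder models; in practice these would be the original and the
# quantized/fused versions of the digit classifier.
original_model = nn.Sequential(
    nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10)
)
optimized_model = torch.ao.quantization.quantize_dynamic(
    original_model, {nn.Linear}, dtype=torch.qint8
)

example = torch.rand(1, 1, 28, 28)
print(f"Original:  {benchmark(original_model, example):.2f} ms")
print(f"Optimized: {benchmark(optimized_model, example):.2f} ms")
```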