Merge pull request #1428 from madeline-underwood/Profile-ML-Ops
Profile the Performance of Machine Learning models on Arm, approved by Andy Pickard
pareenaverma authored Dec 13, 2024
2 parents 7708c98 + aa7fcbe commit 7eeef96
Showing 7 changed files with 187 additions and 103 deletions.
---
draft: true
cascade:
  draft: true
title: Profile the Performance of AI and ML Mobile Applications on Arm

minutes_to_complete: 60

who_is_this_for: This is an introductory topic for software developers who want to learn how to profile the performance of Machine Learning (ML) models running on Arm devices.

learning_objectives:
- Profile ML application performance on Arm devices.
- Describe how profiling can help optimize the performance of Machine Learning applications.

prerequisites:
- An Arm-powered Android smartphone, and a USB cable to connect to it.
- For profiling the ML inference, [ArmNN's ExecuteNetwork](https://github.com/ARM-software/armnn/releases).
- For profiling the application, [Arm Performance Studio's Streamline](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio).
- Android Studio Profiler.


author_primary: Ben Clark

armips:
- Immortalis
tools_software_languages:
- Android Studio
- LiteRT
operatingsystems:
- Android
- Linux

review:
- questions:
question: >
Streamline Profiling lets you profile:
answers:
- Arm CPU activity.
- Arm GPU activity.
- When your Neural Network is running.
- All of the above.
correct_answer: 4
explanation: >
Streamline shows you CPU and GPU activity (and a lot more counters!), and if Custom Activity Maps are used, you can see when your Neural Network and other parts of your application are running.
- questions:
question: >
Does Android Studio have a profiler?
answers:
- "Yes"
- "No"
- "Yes."
- "No."
correct_answer: 1
explanation: >
Yes, Android Studio has a built-in profiler that can be used to monitor the memory usage of your application, amongst other functions.
- questions:
question: >
Is there a way to profile what is happening inside your Neural Network?
answers:
- Yes, Streamline just shows you out of the box.
- No.
- Yes, ArmNN's ExecuteNetwork can do this.
- Yes, Android Studio Profiler can do this.
correct_answer: 3
explanation: >
Standard profilers do not have an easy way to see what is happening inside an ML framework to see a model running inside it. ArmNN's ExecuteNetwork can do this for LiteRT models, and ExecuTorch has tools that can do this for PyTorch models.

layout: learningpathall
---

## Android Memory Profiling
Memory is a common bottleneck in ML, as model parameter counts and datasets keep growing: a model with 100 million 32-bit float parameters needs roughly 400 MB for its weights alone. For profiling an Android app's memory, Android Studio has a built-in profiler. You can use this to monitor the memory usage of your app, and to detect memory leaks.

### Set up the Profiler

* To find the Profiler, open your project in Android Studio, and select the **View** menu.

* Next, click **Tool Windows**, and then **Profiler**. This opens the Profiler window.

* Attach your device in Developer Mode with a USB cable, and then select your app's process. There are a number of different profiling tasks available.

With an Android ML app, you will most likely need to look at memory from both the Java/Kotlin side and the native side:

* The Java/Kotlin side is where the app runs, and might be where buffers are allocated for input and output if, for example, you are using LiteRT (see the sketch after this list).
* The native side is where the ML framework runs.
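
As a rough illustration of where such Java/Kotlin-side buffers come from, here is a minimal Kotlin sketch assuming a hypothetical classifier with a 1 x 224 x 224 x 3 float input and 1,000 output scores, using LiteRT's `org.tensorflow.lite.Interpreter` Java API. The shapes and file handling are illustrative assumptions, not code from this Learning Path:

```kotlin
import org.tensorflow.lite.Interpreter
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical example: the input and output buffers below are allocated on
// the Java/Kotlin side, so they appear in Java/Kotlin allocation tracking;
// the interpreter's own working memory is allocated natively by the framework.
fun classify(modelFile: File): FloatArray {
    val interpreter = Interpreter(modelFile)

    // Direct buffer sized for a 1 x 224 x 224 x 3 float32 input tensor.
    val input = ByteBuffer.allocateDirect(1 * 224 * 224 * 3 * 4)
        .order(ByteOrder.nativeOrder())

    // Hypothetical output: scores for 1,000 classes.
    val output = Array(1) { FloatArray(1000) }

    interpreter.run(input, output)
    interpreter.close()
    return output[0]
}
```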

{{% notice Note %}}
Before you start either task, you must build your app for profiling. The instructions for this, and for general profiling setup, can be found at [Profile your app performance](https://developer.android.com/studio/profile) on the Android Studio website. You need to start the correct profiling version of the app depending on the task.
{{% /notice %}}

There are two separate tasks in the Profiler for looking at Java/Kotlin and native memory consumption:

* **Track Memory Consumption (Java/Kotlin Allocations)**.
* **Track Memory Consumption (Native Allocations)**.

![Android Studio profiling run types alt-text#center](android-profiling-version.png "Figure 3: Profiling Run Versions")

For the Java/Kotlin side, select the debuggable **Profile 'app' with complete data**, which is based on the debug variant. For the native side, select the profileable **Profile 'app' with low overhead**, which is based on the release variant.

### Java/Kotlin

To investigate the Java/Kotlin side, see the notes on [Record Java/Kotlin allocations](https://developer.android.com/studio/profile/record-java-kotlin-allocations).

Select **Profiler: Run 'app' as debuggable**, and then select the **Track Memory Consumption (Java/Kotlin Allocations)** task.

Navigate to the part of the app that you would like to profile, and then you can start profiling.

The bottom of the profiling window should resemble Figure 4.

![Android Studio Start Profile alt-text#center](start-profile-dropdown.png "Figure 4: Start Profile")

Click **Start profiler task**.

When you're ready, select **Stop** to end the profiling session.

Now there will be a timeline graph of memory usage. While Android Studio has a more user-friendly interface for the Java/Kotlin side than the native side, the key to the timeline graph might be missing. This key is shown in Figure 5.

![Android Studio memory key alt-text#center](profiler-jk-allocations-legend.png "Figure 5: Memory key for the Java/Kotlin Memory Timeline")

The default heights of the profiling view, and of the timeline graph within it, are usually too small, so adjust them to get a readable graph.

Now click on different points of the graph to see the memory allocations at each specific time. Using the key on the graph, you can see how much memory is allocated by different categories of consumption, such as Java, Native, Graphics, and Code.

If you look further down, you can see the **Table** of Java/Kotlin allocations for your selected time on the timeline. With ML, many of your allocations are likely to be `byte[]` for byte buffers, or possibly `int[]` for image data. Clicking on a data type opens up the individual allocations, showing their size and when they were allocated. This helps you quickly narrow down what they are used for, and whether they are all needed.
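
If the table shows many short-lived `byte[]` allocations, a common follow-up is to reuse one buffer across inferences rather than allocating per frame. The following is a minimal sketch of that pattern, with hypothetical frame dimensions; it is not code from this Learning Path:

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Reusing one direct buffer across frames means the allocation table shows
// a single long-lived buffer instead of many short-lived byte[] objects.
class ReusableFrameBuffer(widthPx: Int, heightPx: Int, bytesPerPixel: Int) {
    private val buffer: ByteBuffer =
        ByteBuffer.allocateDirect(widthPx * heightPx * bytesPerPixel)
            .order(ByteOrder.nativeOrder())

    // Copy this frame's pixels into the reused buffer and hand it back,
    // ready to pass to the ML framework as an input tensor.
    fun fill(pixels: ByteArray): ByteBuffer {
        buffer.clear()
        buffer.put(pixels)
        buffer.rewind()
        return buffer
    }
}
```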

### Native

For the [native side](https://developer.android.com/studio/profile/record-native-allocations), the process is similar but with different options. Select **Profiler: Run 'app' as profileable**, and then select the **Track Memory Consumption (Native Allocations)** task. Here you have to **Start profiler task from: Process Start**. Select **Stop** once you've captured enough data.

The Native view does not provide the same kind of timeline graph as the Java/Kotlin side, but it does have the **Table** and **Visualization** tabs. The **Table** tab no longer has a list of allocations, but offers options to **Arrange by allocation method** or **Arrange by callstack**. Select **Arrange by callstack** so that you can trace which functions allocate significant memory. You can also see **Remaining Size**, which is arguably more useful.

In the **Visualization** tab, you can see the callstack as a graph, and once again you can look at total **Allocations Size** or **Remaining Size**. If you look at **Remaining Size**, you can see what is still allocated at the end of profiling and, by looking a few steps up the stack, identify which allocations relate to the ML model from the framework functions that appear there. Much of this memory might be allocated by the framework rather than by your code, so you might not have much control over it, but it is still useful to know where the memory is going.

## Other platforms

On other platforms, you will need a different memory profiler. The objective is the same: to investigate memory consumption, identifying whether there are leaks or whether too much memory is being used.

There are often trade-offs between memory and speed, and investigating memory consumption provides data that can help inform assessments of this balance.

