From 393756d3d6325324b5b258fc625cac548426b2bb Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Tue, 3 Dec 2024 17:36:51 +0000 Subject: [PATCH 01/19] Editorial. --- .../smartphones-and-mobile/profiling-ml-on-arm/_index.md | 6 +++--- .../profiling-ml-on-arm/app-profiling-android-studio.md | 4 ++-- .../profiling-ml-on-arm/app-profiling-streamline.md | 9 +++++++-- .../profiling-ml-on-arm/why-profile.md | 8 ++++---- 4 files changed, 16 insertions(+), 11 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md index 1271520b0..b027109c1 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md @@ -1,16 +1,16 @@ --- -title: Profile the performance of ML models on Arm +title: Profile the Performance of ML models on Arm minutes_to_complete: 60 -who_is_this_for: This is an introductory topic for software developers who want to learn how to profile the performance of their ML models running on Arm devices. +who_is_this_for: This is an introductory topic for software developers who want to learn how to profile the performance of ML models running on Arm devices. learning_objectives: - Profile the execution times of ML models on Arm devices. - Profile ML application performance on Arm devices. prerequisites: - - An Arm-powered Android smartphone, and USB cable to connect with it. + - An Arm-powered Android smartphone, and a USB cable to connect to it. author_primary: Ben Clark diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md index 9f8508f3a..5bb20a96c 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md @@ -7,9 +7,9 @@ layout: learningpathall --- ## Android Memory Profiling -Memory is often a problem in ML, with ever bigger models and data. For profiling an Android app's memory, Android Studio has a built-in profiler. This can be used to monitor the memory usage of your app, and to find memory leaks. +Memory is often a problem in ML, with ever-bigger models and data. For profiling an Android app's memory, Android Studio has a built-in profiler. This can be used to monitor the memory usage of your app, and to detect memory leaks. -To find the Profiler, open your project in Android Studio and click on the *View* menu, then *Tool Windows*, and then *Profiler*. This opens the Profiler window. Attach your device in Developer Mode with a USB cable, and then you should be able to select your app's process. Here there are a number of different profiling tasks available. +To find the Profiler, open your project in Android Studio, and select the **View** menu. Next, click *Tool Windows*, and then *Profiler*. This opens the Profiler window. Attach your device in Developer Mode with a USB cable, and then you should be able to select your app's process. Here there are a number of different profiling tasks available. Most likely with an Android ML app you'll need to look at memory both from the Java/Kotlin side and the native side. 
The Java/Kotlin side is where the app runs, and may be where buffers are allocated for input and output if, for example, you're using LiteRT (formerly known as TensorFlow Lite). The native side is where the ML framework will run. Looking at the memory consumption for Java/Kotlin and native is 2 separate tasks in the Profiler: *Track Memory Consumption (Java/Kotlin Allocations)* and *Track Memory Consumption (Native Allocations)*. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index e55e4e172..6712ef7f2 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -7,9 +7,14 @@ layout: learningpathall --- ## Application Profiling -Application profiling can be split into 2 main types - *Instrumentation* and *Sampling*. [Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer), for example, is a sampling profiler, that takes regular samples of various counters and registers in the system to provide a detailed view of the system's performance. Sampling will only provide a statistical view, but it is less intrusive and has less processing overhead than instrumentation. +Application profiling can be split into two main types: -The profiler can look at memory, CPU activity and cycles, cache misses, and many parts of the GPU as well as other performance metrics. It can also provide a timeline view of these counters to show the application's performance over time. This will show bottlenecks, and help you understand where to focus your optimization efforts. +* Instrumentation. +* Sampling. + +[Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer), for example, is a sampling profiler, that takes regular samples of various counters and registers in the system to provide a detailed view of the system's performance. Sampling only provides a statistical view, but it is less intrusive and has less processing overhead than instrumentation. + +The profiler can look at memory, CPU activity and cycles, cache misses, and many parts of the GPU, as well as other performance metrics. It can also provide a timeline-view of these counters to show the application's performance over time. This can reveal bottlenecks, and can help you to understand where to focus your optimization efforts. ![Streamline image alt-text#center](Streamline.png "Figure 1. Streamline timeline view") diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md index 7d688a4ad..9ecdd421b 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md @@ -7,15 +7,15 @@ layout: learningpathall --- ## Performance -Working out what is taking the time and memory in your application is the first step to getting the performance you want. Profiling can help you identify the bottlenecks in your application and understand how to optimize it. +Working out what is consuming time and memory in your application is the first step to achieving the performance you want. 
Profiling can help you identify the bottlenecks in your application, and understand how to optimize it. -With Machine Learning (ML) applications, the inference of the Neural Network (NN) itself is often the heaviest part of the application in terms of computation and memory usage. This is not guaranteed however, so it is important to profile the application as a whole to see if pre- or post-processing or other code is an issue. +With Machine Learning (ML) applications, the inference of the Neural Network (NN) is often the heaviest part of the application in terms of computation and memory usage. This is not guaranteed however, so it is important to profile the application as a whole to see if pre- or post-processing or other code is an issue. -In this Learning Path, you will profile an Android example using TFLite, but most of the steps shown will also work with Linux and cover a wide range of Arm devices. The principles for profiling your application are the same for use with other inference engines and platforms, but the tools are different. +In this Learning Path, you will profile an Android example using TFLite, but most of the steps listed will also work with Linux, and cover a wide range of Arm devices. The principles for profiling your application are the same for use with other inference engines and platforms, but the tools are different. ## Tools -You will need to use different tools to profile the ML inference or the application's performance running on your Arm device. +You need to use different tools to profile the ML inference or the application's performance running on your Arm device. For profiling the ML inference, you will use [ArmNN](https://github.com/ARM-software/armnn/releases)'s ExecuteNetwork. From d9fbc711f21e3e39d3fe734d2b12fe014b0ef11a Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 4 Dec 2024 05:41:31 +0000 Subject: [PATCH 02/19] Editorial. --- .../profiling-ml-on-arm/app-profiling-streamline.md | 4 ++-- .../nn-profiling-executenetwork.md | 12 ++++++------ .../profiling-ml-on-arm/why-profile.md | 12 ++++++------ 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index 6712ef7f2..c369871e8 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -14,9 +14,9 @@ Application profiling can be split into two main types: [Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer), for example, is a sampling profiler, that takes regular samples of various counters and registers in the system to provide a detailed view of the system's performance. Sampling only provides a statistical view, but it is less intrusive and has less processing overhead than instrumentation. -The profiler can look at memory, CPU activity and cycles, cache misses, and many parts of the GPU, as well as other performance metrics. It can also provide a timeline-view of these counters to show the application's performance over time. This can reveal bottlenecks, and can help you to understand where to focus your optimization efforts. 
+The profiler looks at memory, CPU activity and cycles, cache misses, and many parts of the GPU, as well as other performance metrics. It can also provide a timeline-view of these counters to show any changes in the application's performance. This can reveal bottlenecks, and can help you to understand where to focus your optimization efforts. -![Streamline image alt-text#center](Streamline.png "Figure 1. Streamline timeline view") +![Streamline image alt-text#center](Streamline.png "Figure 1. Streamline Timeline View") ## Example Android Application diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md index f4ca26994..29b4e5621 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md @@ -1,5 +1,5 @@ --- -title: ML profiling of a tflite model with ExecuteNetwork +title: ML profiling of a LiteRT model with ExecuteNetwork weight: 6 ### FIXED, DO NOT MODIFY @@ -7,11 +7,11 @@ layout: learningpathall --- ## ArmNN's Network Profiler -One way of running tflite models is with ArmNN. This is available as a delegate to the standard tflite interpreter. But to profile the model, ArmNN comes with a command-line utility called `ExecuteNetwork`. This program just runs the model without the rest of the app. It is able to output layer timings and other useful information to let you know where there might be bottlenecks within your model. +One way of running LiteRT models is with ArmNN. This is available as a delegate to the standard LiteRT interpreter. But to profile the model, ArmNN comes with a command-line utility called `ExecuteNetwork`. This program just runs the model without the rest of the app. It is able to output layer timings and other useful information to let you know where there might be bottlenecks within your model. -If you are using tflite without ArmNN, then the output from `ExecuteNetwork` will be more of an indication than a definitive answer. But it can still be useful to spot any obvious problems. +If you are using LiteRT without ArmNN, then the output from `ExecuteNetwork` will be more of an indication than a definitive answer. But it can still be useful to spot any obvious problems. -To try this out, you can download a tflite model from the [Arm Model Zoo](https://github.com/ARM-software/ML-zoo). In this Learning Path, you will download [mobilenet tflite](https://github.com/ARM-software/ML-zoo/blob/master/models/image_classification/mobilenet_v2_1.0_224/tflite_int8/mobilenet_v2_1.0_224_INT8.tflite). +To try this out, you can download a LiteRT model from the [Arm Model Zoo](https://github.com/ARM-software/ML-zoo). In this Learning Path, you will download [mobilenet tflite](https://github.com/ARM-software/ML-zoo/blob/master/models/image_classification/mobilenet_v2_1.0_224/tflite_int8/mobilenet_v2_1.0_224_INT8.tflite). To get `ExecuteNetwork` you can download it from the [ArmNN GitHub](https://github.com/ARM-software/armnn/releases). Download the version appropriate for the Android phone you wish to test on - the Android version and the architecture of the phone. If you are unsure of the architecture, you can use a lower one, but you may miss out on some optimizations. Inside the `tar.gz` archive that you download, `ExecuteNetwork` is included. 
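As a rough illustration of that step (this sketch is not part of the original Learning Path text: the archive name below is only an example that varies by release, Android API level, and architecture, and `/data/local/tmp` is simply a commonly used writable directory), unpacking the release and copying the pieces onto the phone generally looks like this. The Learning Path's own commands in the following steps are the ones to follow.

```bash
# Unpack the ArmNN release archive downloaded from GitHub
# (the exact file name depends on the release and the architecture you chose)
tar -xzf ArmNN-android-29-arm64-v8a.tar.gz

# Copy ExecuteNetwork, its shared libraries, and the model to a writable directory on the phone
adb push ExecuteNetwork /data/local/tmp/
adb push *.so /data/local/tmp/
adb push mobilenet_v2_1.0_224_INT8.tflite /data/local/tmp/
```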
Note among the other release downloads on the ArmNN Github is the separate file for the `aar` delegate which is the easy way to include the ArmNN delegate into your app. @@ -38,13 +38,13 @@ chmod 777 ExecuteNetwork chmod 777 *.so ``` -Now you can run ExecuteNetwork to profile the model. With the example tflite, you can use the following command: +Now you can run ExecuteNetwork to profile the model. With the example LiteRT, you can use the following command: ```bash LD_LIBRARY_PATH=. ./ExecuteNetwork -m mobilenet_v2_1.0_224_INT8.tflite -c CpuAcc -T delegate --iterations 2 --do-not-print-output --enable-fast-math --fp16-turbo-mode -e --output-network-details > modelout.txt ``` -If you are using your own tflite, replace `mobilenet_v2_1.0_224_INT8.tflite` with the name of your tflite file. +If you are using your own LiteRT, replace `mobilenet_v2_1.0_224_INT8.tflite` with the name of your tflite file. This will run the model twice, outputting the layer timings to `modelout.txt`. The `--iterations 2` flag is the command that means it runs twice: the first run includes a lot of startup costs and one-off optimizations, so the second run is more indicative of the real performance. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md index 9ecdd421b..8f99fe2e7 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md @@ -7,17 +7,17 @@ layout: learningpathall --- ## Performance -Working out what is consuming time and memory in your application is the first step to achieving the performance you want. Profiling can help you identify the bottlenecks in your application, and understand how to optimize it. +Working out what is consuming time and memory in your application is the first step to achieving the performance that you want. Profiling can help you identify the bottlenecks in your application, and give you clues as how to optimize it. -With Machine Learning (ML) applications, the inference of the Neural Network (NN) is often the heaviest part of the application in terms of computation and memory usage. This is not guaranteed however, so it is important to profile the application as a whole to see if pre- or post-processing or other code is an issue. +With Machine Learning (ML) applications, the inference of the Neural Network (NN) is often the heaviest part of the application in terms of computation and memory usage. This is not always the case however, so it is important to profile the application as a whole to detect other possible issues in pre- or post-processing, or the code. -In this Learning Path, you will profile an Android example using TFLite, but most of the steps listed will also work with Linux, and cover a wide range of Arm devices. The principles for profiling your application are the same for use with other inference engines and platforms, but the tools are different. +In this Learning Path, you will profile an Android example using LiteRT, but most of the steps also work with Linux, and a wide range of Arm devices. The principles for profiling your application apply to many other inference engines and platforms - only the tools are different. ## Tools -You need to use different tools to profile the ML inference or the application's performance running on your Arm device. 
+You need two different tools to profile the ML inference or the application's performance running on your Arm device. -For profiling the ML inference, you will use [ArmNN](https://github.com/ARM-software/armnn/releases)'s ExecuteNetwork. +* For profiling the ML inference, you will use [ArmNN](https://github.com/ARM-software/armnn/releases)'s ExecuteNetwork. -For profiling the application as a whole, you will use [Arm Performance Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio)'s Streamline, and the Android Studio Profiler. +* For profiling the application as a whole, you will use [Arm Performance Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio)'s Streamline, and the Android Studio Profiler. From 9fbae553065e49d6069e9ec48c90980830ebd13c Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 4 Dec 2024 06:43:40 +0000 Subject: [PATCH 03/19] Editorial. --- .../profiling-ml-on-arm/_index.md | 6 ++++- .../app-profiling-streamline.md | 25 ++++++++++++++----- .../profiling-ml-on-arm/why-profile.md | 15 ++++++----- 3 files changed, 31 insertions(+), 15 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md index b027109c1..164379de8 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md @@ -11,6 +11,10 @@ learning_objectives: prerequisites: - An Arm-powered Android smartphone, and a USB cable to connect to it. + - For profiling the ML inference, [ArmNN](https://github.com/ARM-software/armnn/releases)'s ExecuteNetwork. + - For profiling the application as a whole, [Arm Performance Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio)'s Streamline. + - Android Studio Profiler. + author_primary: Ben Clark @@ -24,7 +28,7 @@ armips: - Immortalis tools_software_languages: - Android Studio - - tflite + - LiteRT operatingsystems: - Android - Linux diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index c369871e8..e7a6fd87b 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -12,21 +12,34 @@ Application profiling can be split into two main types: * Instrumentation. * Sampling. -[Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer), for example, is a sampling profiler, that takes regular samples of various counters and registers in the system to provide a detailed view of the system's performance. Sampling only provides a statistical view, but it is less intrusive and has less processing overhead than instrumentation. +[Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer), is an example of a sampling profiler, that takes regular samples of various counters and registers in the system to provide a detailed view of the system's performance. -The profiler looks at memory, CPU activity and cycles, cache misses, and many parts of the GPU, as well as other performance metrics. 
It can also provide a timeline-view of these counters to show any changes in the application's performance. This can reveal bottlenecks, and can help you to understand where to focus your optimization efforts. +Sampling only provides a statistical view, but it is less intrusive and has less processing overhead than instrumentation. + +The profiler looks at memory, CPU activity and cycles, cache misses, and many parts of the GPU, as well as other performance metrics. + +It can also provide a timeline-view of these counters to show any changes in the application's performance, which can reveal bottlenecks, and help you identify where to focus your optimization efforts. ![Streamline image alt-text#center](Streamline.png "Figure 1. Streamline Timeline View") ## Example Android Application -In this Learning Path, you will use profile [an example Android application](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference) using Streamline. -Start by cloning the repository containing this example on your machine and open it in a recent Android Studio. It is generally safest to not update the Gradle version when prompted. +In this Learning Path, you will use profile [an example Android application](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference) using Streamline. + +Start by cloning the repository containing this example on your machine, and open it in a recent version of Android Studio. + +{{% notice Note %}} +It is generally safest to not update the Gradle version when prompted. +{{% /notice %}} ## Streamline -You will install Streamline and Performance Studio on your host machine and connect to your target Arm device to capture the data. In this example, the target device is an Arm-powered Android phone. The data is captured over a USB connection, and then analyzed on your host machine. +You will install Streamline and Performance Studio on your host machine and connect to your target Arm device to capture the data. + +In this example, the target device is an Arm-powered Android phone. The data is captured over a USB connection, and then analyzed on your host machine. + +For more details on Streamline usage you can refer to these [tutorials and training videos](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio). -For more details on Streamline usage you can refer to these [tutorials and training videos](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio). While the example you are running is based on Android, you can use [the setup and capture instructions for Linux](https://developer.arm.com/documentation/101816/0903/Getting-started-with-Streamline/Profile-your-Linux-application). +While the example you are running is based on Android, you can use [the Setup and Capture Instructions for Linux](https://developer.arm.com/documentation/101816/0903/Getting-started-with-Streamline/Profile-your-Linux-application). First, follow these [setup instructions](https://developer.arm.com/documentation/102477/0900/Setup-tasks?lang=en), to make sure you have `adb` (Android Debug Bridge) installed. If you have installed [Android Studio](https://developer.android.com/studio), you will have installed adb already. Otherwise, you can get it as part of the Android SDK platform tools [here](https://developer.android.com/studio/releases/platform-tools.html). 
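For reference (an illustrative addition, not part of the original page), these two commands confirm that the tooling and the USB connection are in place before you continue:

```bash
# Confirm adb is installed and on the PATH
adb version

# List connected devices; the phone should appear here once USB debugging is authorized
adb devices
```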
diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md index 8f99fe2e7..e8ef73bf3 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md @@ -7,17 +7,16 @@ layout: learningpathall --- ## Performance -Working out what is consuming time and memory in your application is the first step to achieving the performance that you want. Profiling can help you identify the bottlenecks in your application, and give you clues as how to optimize it. +One first step towards achieving the performance that you want is to identify what is consuming time and memory in your application. Profiling can help you identify the bottlenecks in your application, and provide clues about how to optimize operations. -With Machine Learning (ML) applications, the inference of the Neural Network (NN) is often the heaviest part of the application in terms of computation and memory usage. This is not always the case however, so it is important to profile the application as a whole to detect other possible issues in pre- or post-processing, or the code. +With Machine Learning (ML) applications, whilst the inference of the Neural Network (NN) is often the heaviest part of the application in terms of computation and memory usage, it is not necessarily always the case, so it is important to profile the application as a whole to detect other possible issues, such as with pre- or post-processing, or the code. -In this Learning Path, you will profile an Android example using LiteRT, but most of the steps also work with Linux, and a wide range of Arm devices. The principles for profiling your application apply to many other inference engines and platforms - only the tools are different. +In this Learning Path, you will profile an Android example using LiteRT, but most of the steps also work with Linux, and a wide range of Arm devices. -## Tools +The principles for profiling an application apply to many other inference engines and , only the tools differ. -You need two different tools to profile the ML inference or the application's performance running on your Arm device. +{{% notice Note %}} +LiteRT is the new name for TensorFlow Lite, or TFLite. +{{% /notice %}} -* For profiling the ML inference, you will use [ArmNN](https://github.com/ARM-software/armnn/releases)'s ExecuteNetwork. - -* For profiling the application as a whole, you will use [Arm Performance Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio)'s Streamline, and the Android Studio Profiler. From 4259d4c7e492ab1222b04f00a082617112718652 Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 4 Dec 2024 12:08:16 +0000 Subject: [PATCH 04/19] Editorial. 
--- .../profiling-ml-on-arm/_index.md | 4 ++-- .../app-profiling-streamline.md | 14 +++++++------- .../profiling-ml-on-arm/why-profile.md | 8 ++++---- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md index 164379de8..db4c506e8 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md @@ -1,9 +1,9 @@ --- -title: Profile the Performance of ML models on Arm +title: Profile the Performance of Machine Learning models on Arm minutes_to_complete: 60 -who_is_this_for: This is an introductory topic for software developers who want to learn how to profile the performance of ML models running on Arm devices. +who_is_this_for: This is an introductory topic for software developers who want to learn how to profile the performance of Machine Learning (ML) models running on Arm devices. learning_objectives: - Profile the execution times of ML models on Arm devices. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index e7a6fd87b..995af2508 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -12,9 +12,9 @@ Application profiling can be split into two main types: * Instrumentation. * Sampling. -[Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer), is an example of a sampling profiler, that takes regular samples of various counters and registers in the system to provide a detailed view of the system's performance. +[Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer)is an example of a sampling profiler that takes regular samples of various counters and registers in the system to provide a detailed view of the system's performance. -Sampling only provides a statistical view, but it is less intrusive and has less processing overhead than instrumentation. +Whilst sampling only provides a statistical view, it is less intrusive and has less processing overhead than instrumentation. The profiler looks at memory, CPU activity and cycles, cache misses, and many parts of the GPU, as well as other performance metrics. @@ -37,11 +37,11 @@ You will install Streamline and Performance Studio on your host machine and conn In this example, the target device is an Arm-powered Android phone. The data is captured over a USB connection, and then analyzed on your host machine. -For more details on Streamline usage you can refer to these [tutorials and training videos](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio). +For more details on Streamline usage, you can refer to these [tutorials and training videos](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio). -While the example you are running is based on Android, you can use [the Setup and Capture Instructions for Linux](https://developer.arm.com/documentation/101816/0903/Getting-started-with-Streamline/Profile-your-Linux-application). 
+While the example that you are running is based on Android, you can use [the Setup and Capture Instructions for Linux](https://developer.arm.com/documentation/101816/0903/Getting-started-with-Streamline/Profile-your-Linux-application). -First, follow these [setup instructions](https://developer.arm.com/documentation/102477/0900/Setup-tasks?lang=en), to make sure you have `adb` (Android Debug Bridge) installed. If you have installed [Android Studio](https://developer.android.com/studio), you will have installed adb already. Otherwise, you can get it as part of the Android SDK platform tools [here](https://developer.android.com/studio/releases/platform-tools.html). +Firstly, follow these [setup instructions](https://developer.arm.com/documentation/102477/0900/Setup-tasks?lang=en), to make sure you have `adb` (Android Debug Bridge) installed. If you have installed [Android Studio](https://developer.android.com/studio), you will have installed adb already. Otherwise, you can get it as part of the Android SDK platform tools [here](https://developer.android.com/studio/releases/platform-tools.html). Make sure `adb` is in your path. You can check this by running `adb` in a terminal. If it is not in your path, you can add it by installing the [Android SDK `platform-tools`](https://developer.android.com/tools/releases/platform-tools#downloads) directory to your path. @@ -49,9 +49,9 @@ Next, install [Arm Performance Studio](https://developer.arm.com/Tools%20and%20S Connect your Android phone to your host machine through USB. Ensure that your Android phone is set to [Developer mode](https://developer.android.com/studio/debug/dev-options). -On your phone, go to `Settings > Developer Options` and enable USB Debugging. If your phone asks you to authorize connection to your host machine, confirm this. Test the connection by running `adb devices` in a terminal. You should see your device ID listed. +On your phone, go to **Settings** > **Developer Options** and enable USB Debugging. If your phone asks you to authorize connection to your host machine, confirm this. Test the connection by running `adb devices` in a terminal. You should see your device ID listed. -Next, you need a debuggable build of the application you want to profile. +Next, you need a debuggable-build of the application that you want to profile. - In Android Studio, ensure your *Build Variant* is set to `debug`. You can then build the application and install it on your device. - For a Unity app, select Development Build under File > Build Settings when building your application. - In Unreal Engine, open Project Settings > Project > Packaging > Project, and ensure that the For Distribution checkbox is not set. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md index e8ef73bf3..363d88cd2 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md @@ -7,13 +7,13 @@ layout: learningpathall --- ## Performance -One first step towards achieving the performance that you want is to identify what is consuming time and memory in your application. Profiling can help you identify the bottlenecks in your application, and provide clues about how to optimize operations. +A first step towards achieving the performance that you want in a ML Model is to identify what is consuming time and memory in your application. 
Profiling can help you identify the bottlenecks in your application, and provide clues about how to optimize operations. -With Machine Learning (ML) applications, whilst the inference of the Neural Network (NN) is often the heaviest part of the application in terms of computation and memory usage, it is not necessarily always the case, so it is important to profile the application as a whole to detect other possible issues, such as with pre- or post-processing, or the code. +With Machine Learning (ML) applications, whilst the inference of the Neural Network (NN) is often the heaviest part of the application in terms of computation and memory usage, it is not necessarily always the case. It is therefore important to profile the application as a whole to detect other possible issues that can negatively impact performance, such as issues with pre- or post-processing, or the code itself. -In this Learning Path, you will profile an Android example using LiteRT, but most of the steps also work with Linux, and a wide range of Arm devices. +In this Learning Path, you will profile an Android example using LiteRT. Most of the steps are transferable and also work with Linux, and across a wide range of Arm devices. -The principles for profiling an application apply to many other inference engines and , only the tools differ. +The principles for profiling an application apply to many other inference engines and ?? , only the tools differ. {{% notice Note %}} LiteRT is the new name for TensorFlow Lite, or TFLite. From 6288308710f989ec6132c334c4ed340038f2f34a Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 4 Dec 2024 12:58:47 +0000 Subject: [PATCH 05/19] Editorial review. --- .../profiling-ml-on-arm/_review.md | 20 ++++++++-------- .../app-profiling-streamline.md | 23 +++++++++++-------- .../profiling-ml-on-arm/why-profile.md | 10 ++++---- 3 files changed, 28 insertions(+), 25 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md index 7eae5a8b1..8ce2fe609 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md @@ -4,23 +4,23 @@ review: question: > Streamline Profiling lets you profile: answers: - - Arm CPU activity - - Arm GPU activity - - when your Neural Network is running - - All of the above + - Arm CPU activity. + - Arm GPU activity. + - When your Neural Network is running. + - All of the above. correct_answer: 4 explanation: > - Streamline will show you CPU and GPU activity (and a lot more counters!), and if Custom Activity Maps are used, you can see when your Neural Network and other parts of your application are running. + Streamline shows you CPU and GPU activity (and a lot more counters!), and if Custom Activity Maps are used, you can see when your Neural Network and other parts of your application are running. - questions: question: > Does Android Studio have a profiler? answers: - - "Yes" - - "No" + - "Yes." + - "No." correct_answer: 1 explanation: > - Yes, Android Studio has a built-in profiler that can be used to monitor the memory usage of your app among other things + Yes, Android Studio has a built-in profiler which can be used to monitor the memory usage of your application, among other things. 
- questions: question: > @@ -28,8 +28,8 @@ review: answers: - Yes, Streamline just shows you out of the box - No. - - Yes, ArmNN's ExecuteNetwork can do this - - Yes, Android Studio Profiler can do this + - Yes, ArmNN's ExecuteNetwork can do this. + - Yes, Android Studio Profiler can do this. correct_answer: 3 explanation: > Standard profilers don't have an easy way to see what is happening inside an ML framework to see a model running inside it. ArmNN's ExecuteNetwork can do this for TensorFlow Lite models, and ExecuTorch has tools that can do this for PyTorch models. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index 995af2508..f2ac667cb 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -26,7 +26,8 @@ It can also provide a timeline-view of these counters to show any changes in the In this Learning Path, you will use profile [an example Android application](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference) using Streamline. -Start by cloning the repository containing this example on your machine, and open it in a recent version of Android Studio. +* Start by cloning the repository containing this example on your machine. +* Open it in a recent version of Android Studio. {{% notice Note %}} It is generally safest to not update the Gradle version when prompted. @@ -37,11 +38,11 @@ You will install Streamline and Performance Studio on your host machine and conn In this example, the target device is an Arm-powered Android phone. The data is captured over a USB connection, and then analyzed on your host machine. -For more details on Streamline usage, you can refer to these [tutorials and training videos](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio). +For more information on Streamline usage, see [Tutorials and Training Videos](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio). While the example that you are running is based on Android, you can use [the Setup and Capture Instructions for Linux](https://developer.arm.com/documentation/101816/0903/Getting-started-with-Streamline/Profile-your-Linux-application). -Firstly, follow these [setup instructions](https://developer.arm.com/documentation/102477/0900/Setup-tasks?lang=en), to make sure you have `adb` (Android Debug Bridge) installed. If you have installed [Android Studio](https://developer.android.com/studio), you will have installed adb already. Otherwise, you can get it as part of the Android SDK platform tools [here](https://developer.android.com/studio/releases/platform-tools.html). +Firstly, follow these [Setup Instructions](https://developer.arm.com/documentation/102477/0900/Setup-tasks?lang=en), to make sure you have `adb` (Android Debug Bridge) installed. If you have installed [Android Studio](https://developer.android.com/studio), you will have adb installed already. Otherwise, you can get it as part of the Android SDK platform tools [here](https://developer.android.com/studio/releases/platform-tools.html). Make sure `adb` is in your path. You can check this by running `adb` in a terminal. 
If it is not in your path, you can add it by installing the [Android SDK `platform-tools`](https://developer.android.com/tools/releases/platform-tools#downloads) directory to your path. @@ -49,17 +50,19 @@ Next, install [Arm Performance Studio](https://developer.arm.com/Tools%20and%20S Connect your Android phone to your host machine through USB. Ensure that your Android phone is set to [Developer mode](https://developer.android.com/studio/debug/dev-options). -On your phone, go to **Settings** > **Developer Options** and enable USB Debugging. If your phone asks you to authorize connection to your host machine, confirm this. Test the connection by running `adb devices` in a terminal. You should see your device ID listed. +On your phone, go to **Settings** > **Developer Options** and enable **USB Debugging**. If your phone asks you to authorize connection to your host machine, confirm authorization. Test the connection by running `adb devices` in a terminal. You should see your device ID listed. Next, you need a debuggable-build of the application that you want to profile. -- In Android Studio, ensure your *Build Variant* is set to `debug`. You can then build the application and install it on your device. -- For a Unity app, select Development Build under File > Build Settings when building your application. -- In Unreal Engine, open Project Settings > Project > Packaging > Project, and ensure that the For Distribution checkbox is not set. -- In the general case, you can set `android:debuggable=true` in the application manifest file. +- In Android Studio, ensure your **Build Variant** is set to **debug**. You can then build the application, and install it on your device. +- For a Unity app, select **Development Build** under **File** > **Build Settings** when building your application. +- In Unreal Engine, open **Project Settings** > **Project** > **Packaging** > **Project**, and ensure that the **For Distribution** checkbox is clear. +- Generally, you can set `android:debuggable=true` in the application manifest file. -For the example application that you cloned earlier, the Build Variant is `debug` by default, but you can verify this by going to `Build > Select Build Variant` in Android Studio. Build and install this application on your device. +For the example application that you cloned earlier, the Build Variant is **debug** by default, but you can verify this by going to **Build** > **Select Build Variant** in Android Studio. -You can now run Streamline and [capture a profile](https://developer.arm.com/documentation/102477/0900/Capture-a-profile?lang=en) of your application. But before you do, lets add some useful annotations to your code that can help with more specific performance analysis of your application. +Build and install this application on your device. + +You can now run Streamline and [capture a profile](https://developer.arm.com/documentation/102477/0900/Capture-a-profile?lang=en) of your application. But before you do, you can add some useful annotations to your code that can help with more specific performance analysis of your application. 
## Custom Annotations diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md index 363d88cd2..760098185 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md @@ -1,19 +1,19 @@ --- -title: Why do you need to profile your ML application? +title: Why should you profile your ML application? weight: 2 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Performance -A first step towards achieving the performance that you want in a ML Model is to identify what is consuming time and memory in your application. Profiling can help you identify the bottlenecks in your application, and provide clues about how to optimize operations. +## Optimizing Performance +A first step towards achieving performance requirements in a ML Model is to identify what is consuming the most time and memory in your application. Profiling can help you identify the bottlenecks, and it can offer clues about how to optimize operations. With Machine Learning (ML) applications, whilst the inference of the Neural Network (NN) is often the heaviest part of the application in terms of computation and memory usage, it is not necessarily always the case. It is therefore important to profile the application as a whole to detect other possible issues that can negatively impact performance, such as issues with pre- or post-processing, or the code itself. -In this Learning Path, you will profile an Android example using LiteRT. Most of the steps are transferable and also work with Linux, and across a wide range of Arm devices. +In this Learning Path, you will profile an Android example using LiteRT. Most of the steps are transferable and work with Linux, and across a wide range of Arm devices. -The principles for profiling an application apply to many other inference engines and ?? , only the tools differ. +The principles for profiling an application apply to many other inference engines and platforms, only the tools differ. {{% notice Note %}} LiteRT is the new name for TensorFlow Lite, or TFLite. From d0c9bfb41ab2d5ba937fa80cd217ae8d34016238 Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 4 Dec 2024 15:19:12 +0000 Subject: [PATCH 06/19] Editorial. --- .../smartphones-and-mobile/profiling-ml-on-arm/_review.md | 2 +- .../profiling-ml-on-arm/nn-profiling-general.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md index 8ce2fe609..7bca269a0 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md @@ -26,7 +26,7 @@ review: question: > Is there a way to profile what is happening inside your Neural Network? answers: - - Yes, Streamline just shows you out of the box + - Yes, Streamline just shows you out of the box. - No. - Yes, ArmNN's ExecuteNetwork can do this. - Yes, Android Studio Profiler can do this. 
diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md index 91a35381f..6ece889ab 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md @@ -7,9 +7,9 @@ layout: learningpathall --- ## Profiling your model -App profilers will give you a good overall view of your performance, but often you might want to look inside the model and work out bottlenecks within the network. The network is often the bulk of the time, in which case it will warrant closer analysis. +App profilers provide a good overall view of performance, but you might want to look inside the model and identify bottlenecks within the network. The network is often where the bulk of the bottlenecks lie, so it often warrants closer analysis. -With general profilers this is hard to do, as there needs to be annotations inside the ML framework code to get the information. It is a large task to write the profiling annotations throughout the framework, so it is easier to use tools from a framework or inference engine that already has the required instrumentation. +With general profilers this is hard to do, as there needs to be annotation inside the ML framework code to retrieve the information. It is a large task to write the profiling annotation throughout the framework, so it is easier to use tools from a framework or inference engine that already has the required instrumentation. Depending on your model, your choice of tools will differ. For example, if you are using LiteRT (formerly TensorFlow Lite), Arm provides the ArmNN delegate that you can run with the model running on Linux or Android, CPU or GPU. ArmNN in turn provides a tool called `ExecuteNetwork` that can run the model and give you layer timings among other useful information. From 2ec8d0c1808b54842b7c35da405561f9d5c739f3 Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 4 Dec 2024 17:02:11 +0000 Subject: [PATCH 07/19] Editorial. --- .../profiling-ml-on-arm/nn-profiling-general.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md index 6ece889ab..09fc32e18 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md @@ -6,11 +6,13 @@ weight: 5 layout: learningpathall --- -## Profiling your model -App profilers provide a good overall view of performance, but you might want to look inside the model and identify bottlenecks within the network. The network is often where the bulk of the bottlenecks lie, so it often warrants closer analysis. +## Tools you can use +App profilers provide a good overall view of performance, but you might want to look inside the model and identify bottlenecks within the network. The network is often where the bulk of the bottlenecks lie, so it warrants closer analysis. With general profilers this is hard to do, as there needs to be annotation inside the ML framework code to retrieve the information. 
It is a large task to write the profiling annotation throughout the framework, so it is easier to use tools from a framework or inference engine that already has the required instrumentation. -Depending on your model, your choice of tools will differ. For example, if you are using LiteRT (formerly TensorFlow Lite), Arm provides the ArmNN delegate that you can run with the model running on Linux or Android, CPU or GPU. ArmNN in turn provides a tool called `ExecuteNetwork` that can run the model and give you layer timings among other useful information. +Depending on the model you use, your choice of tools will vary. For example, if you are using LiteRT (formerly TensorFlow Lite), Arm provides the ArmNN delegate that you can run with the model running on Linux or Android, CPU or GPU. -If you are using PyTorch, you will probably use ExecuTorch the ons-device inference runtime for your Android phone. ExecuTorch has a profiler available alongside it. +ArmNN in turn provides a tool called ExecuteNetwork that can run the model and provide layer timings, amongst other useful information. + +If you are using PyTorch, you will probably use ExecuTorch, which is the on-device inference runtime for your Android phone. ExecuTorch has a profiler available alongside it. From 1cef5aa7959c06371cd36787bd00b89acf0394c1 Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 4 Dec 2024 18:25:07 +0000 Subject: [PATCH 08/19] Editorial. --- .../app-profiling-android-studio.md | 23 ++++++++++++++----- .../app-profiling-streamline.md | 10 ++++---- 2 files changed, 22 insertions(+), 11 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md index 5bb20a96c..2c7320ebb 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md @@ -7,17 +7,28 @@ layout: learningpathall --- ## Android Memory Profiling -Memory is often a problem in ML, with ever-bigger models and data. For profiling an Android app's memory, Android Studio has a built-in profiler. This can be used to monitor the memory usage of your app, and to detect memory leaks. +Memory is a common problem in ML, with ever-increasing model parameters and datasets. For profiling an Android app's memory, Android Studio has a built-in profiler. You can use this to monitor the memory usage of your app, and to detect memory leaks. -To find the Profiler, open your project in Android Studio, and select the **View** menu. Next, click *Tool Windows*, and then *Profiler*. This opens the Profiler window. Attach your device in Developer Mode with a USB cable, and then you should be able to select your app's process. Here there are a number of different profiling tasks available. +### Set up the Profiler -Most likely with an Android ML app you'll need to look at memory both from the Java/Kotlin side and the native side. The Java/Kotlin side is where the app runs, and may be where buffers are allocated for input and output if, for example, you're using LiteRT (formerly known as TensorFlow Lite). The native side is where the ML framework will run. 
Looking at the memory consumption for Java/Kotlin and native is 2 separate tasks in the Profiler: *Track Memory Consumption (Java/Kotlin Allocations)* and *Track Memory Consumption (Native Allocations)*. +* To find the Profiler, open your project in Android Studio, and select the **View** menu. -Before you start either task, you have to build your app for profiling. The instructions for this and for general profiling setup can be found [here](https://developer.android.com/studio/profile). You will want to start the correct profiling version of the app depending on the task. +* Next, click **Tool Windows**, and then **Profiler**. This opens the Profiler window. -![Android Studio profiling run types alt-text#center](android-profiling-version.png "Figure 1. Profiling run versions") +* Attach your device in Developer Mode with a USB cable, and then select your app's process. Here there are a number of different profiling tasks available. -For the Java/Kotlin side, you want the **debuggable** "Profile 'app' with complete data", which is based off the debug variant. For the native side, you want the **profileable** "Profile 'app' with low overhead", which is based off the release variant. +Most likely with an Android ML app you will need to look at memory both from the Java/Kotlin side, and the native side. The Java/Kotlin side is where the app runs, and might be where buffers are allocated for input and output if, for example, you are using LiteRT (formerly known as TensorFlow Lite). The native side is where the ML framework runs. + +Looking at the memory consumption for Java/Kotlin and native, there are two separate tasks in the Profiler: + +* **Track Memory Consumption (Java/Kotlin Allocations)**. +* **Track Memory Consumption (Native Allocations)**. + +Before you start either task, you must build your app for profiling. The instructions for this, and for general profiling setup can be found [here](https://developer.android.com/studio/profile). You need to start the correct profiling version of the app depending on the task. + +![Android Studio profiling run types alt-text#center](android-profiling-version.png "Figure 1. Profiling Run Versions") + +For the Java/Kotlin side, select **Profile 'app' with complete data", which is based off the debug variant. For the native side, you want the **profileable** "Profile 'app' with low overhead", which is based off the release variant. ### Java/Kotlin diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index f2ac667cb..d9557401b 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -58,21 +58,21 @@ Next, you need a debuggable-build of the application that you want to profile. - In Unreal Engine, open **Project Settings** > **Project** > **Packaging** > **Project**, and ensure that the **For Distribution** checkbox is clear. - Generally, you can set `android:debuggable=true` in the application manifest file. -For the example application that you cloned earlier, the Build Variant is **debug** by default, but you can verify this by going to **Build** > **Select Build Variant** in Android Studio. 
+For the example application that you cloned earlier, the Build Variant is `debug` by default, but you can verify this by going to **Build** > **Select Build Variant** in Android Studio. Build and install this application on your device. -You can now run Streamline and [capture a profile](https://developer.arm.com/documentation/102477/0900/Capture-a-profile?lang=en) of your application. But before you do, you can add some useful annotations to your code that can help with more specific performance analysis of your application. +You are now able to run Streamline and [capture a profile](https://developer.arm.com/documentation/102477/0900/Capture-a-profile?lang=en) of your application. But before you do, you can add some useful annotations to your code that enables more specific performance analysis of your application. ## Custom Annotations -In Streamline, it is possible to add custom annotations to the timeline view. This can be useful to mark the start and end of specific parts of your application, or to mark when a specific event occurs. This can help you understand the performance of your application in relation to these events. At the bottom of *Figure 1* above there are custom annotations to show when inference, pre-processing, and post-processing are happening. +In Streamline, it is possible to add custom annotations to the timeline view. This can be useful to mark the start and end of specific parts of your application, or to mark when a specific event occurs. This then allows you to view the performance of your application in relation to these events. At the bottom of *Figure 1* there are custom annotations to show when inference, pre-processing, and post-processing are happening. -To add annotations, you will need to add some files into your project from the **gator** daemon that Streamline uses. These files are named `streamline_annotate.c`, `streamline_annotate.h` and `streamline_annotate_logging.h` and made available [here](https://github.com/ARM-software/gator/tree/main/annotate). Using these annotations, you will be able to show log strings, markers, counters and Custom Activity Maps. WIthin your example project, create a `cpp` folder under the `app/src/main` folder, and add these three files there. +To add annotations, you will need to add some files into your project from the **gator** daemon that Streamline uses. These files are named `streamline_annotate.c`, `streamline_annotate.h`, and `streamline_annotate_logging.h` and made available [here](https://github.com/ARM-software/gator/tree/main/annotate). Using these annotations, you can see log strings, markers, counters, and Custom Activity Maps. Within your example project, create a `cpp` folder under the `app/src/main` folder, and add these three files there. These files are written in C, so if your Android Studio project is in Java or Kotlin, you will need to add a C library to your project. This is slightly trickier than just adding a Java or Kotlin file, but it is not difficult. You can find instructions on how to do this [here](https://developer.android.com/studio/projects/add-native-code). -Create a file in the `app/src/main/cpp/` folder under your project and name it `annotate_jni_wrapper.c`. This will be a wrapper around the gator daemon's functions, and will be called from your Kotlin code. Copy the code below into this file. You can also create very similar wrapper functions for other gator daemon functions. +Create a file in the `app/src/main/cpp/` folder under your project, and name it `annotate_jni_wrapper.c`. 
This will be a wrapper around the gator daemon's functions, and will be called from your Kotlin code. Copy the code below into this file. You can also create similar wrapper functions for other gator daemon functions. ```c #include From 1b165ac636c56e7f5498f16eac9fe90d96f74162 Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 4 Dec 2024 19:46:33 +0000 Subject: [PATCH 09/19] Editorial. --- .../app-profiling-streamline.md | 33 +++++++++---------- .../profiling-ml-on-arm/why-profile.md | 4 +-- 2 files changed, 18 insertions(+), 19 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index d9557401b..f2b6db3f6 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -9,32 +9,31 @@ layout: learningpathall ## Application Profiling Application profiling can be split into two main types: -* Instrumentation. * Sampling. +* Instrumentation. [Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer)is an example of a sampling profiler that takes regular samples of various counters and registers in the system to provide a detailed view of the system's performance. Whilst sampling only provides a statistical view, it is less intrusive and has less processing overhead than instrumentation. -The profiler looks at memory, CPU activity and cycles, cache misses, and many parts of the GPU, as well as other performance metrics. +The profiler looks at memory, CPU activity and cycles, cache misses, and many parts of the GPU, amongst other performance metrics. It can also provide a timeline-view of these counters to show any changes in the application's performance, which can reveal bottlenecks, and help you identify where to focus your optimization efforts. ![Streamline image alt-text#center](Streamline.png "Figure 1. Streamline Timeline View") -## Example Android Application +## Get started with an example Android Application -In this Learning Path, you will use profile [an example Android application](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference) using Streamline. +In this Learning Path, you will profile [an example Android application](https://github.com/dawidborycki/Arm.PyTorch.MNIST.Inference) using Streamline. -* Start by cloning the repository containing this example on your machine. -* Open it in a recent version of Android Studio. +Start by cloning the repository containing this example on your machine, then open it in a recent version of Android Studio. {{% notice Note %}} It is generally safest to not update the Gradle version when prompted. {{% /notice %}} ## Streamline -You will install Streamline and Performance Studio on your host machine and connect to your target Arm device to capture the data. +You will now install Streamline and Performance Studio on your host machine and connect to your target Arm device to capture the data. In this example, the target device is an Arm-powered Android phone. The data is captured over a USB connection, and then analyzed on your host machine. 
@@ -66,7 +65,7 @@ You are now able to run Streamline and [capture a profile](https://developer.arm ## Custom Annotations -In Streamline, it is possible to add custom annotations to the timeline view. This can be useful to mark the start and end of specific parts of your application, or to mark when a specific event occurs. This then allows you to view the performance of your application in relation to these events. At the bottom of *Figure 1* there are custom annotations to show when inference, pre-processing, and post-processing are happening. +In Streamline, it is possible to add custom annotations to the timeline view. This can be useful to mark the start and end of specific parts of your application, or to mark when a specific event occurs. This then allows you to view the performance of your application in relation to these events. At the bottom of *Figure 1* there are custom annotations to show when inference, pre-processing, and post-processing occur. To add annotations, you will need to add some files into your project from the **gator** daemon that Streamline uses. These files are named `streamline_annotate.c`, `streamline_annotate.h`, and `streamline_annotate_logging.h` and made available [here](https://github.com/ARM-software/gator/tree/main/annotate). Using these annotations, you can see log strings, markers, counters, and Custom Activity Maps. Within your example project, create a `cpp` folder under the `app/src/main` folder, and add these three files there. @@ -87,7 +86,7 @@ JNIEXPORT jlong JNICALL Java_AnnotateStreamline_GetTime(JNIEnv* env, jobject obj } ``` -Some functions have `unsigned int`, but that needs to be a `jint` in the wrapper, with some casting required in your Kotlin code to enforce type correctness at that end. Some functions have strings as arguments, and you will need to do a small conversion as shown below: +Some functions have `unsigned int`, but this needs to be a `jint` in the wrapper, with some casting required in your Kotlin code to enforce type correctness at that end. Some functions have strings as arguments, and you will need to do a small conversion as shown below: ```c JNIEXPORT void JNICALL Java_AnnotateStreamline_AnnotateMarkerColorStr(JNIEnv* env, jobject obj, jint color, jstring str) { @@ -97,7 +96,7 @@ JNIEXPORT void JNICALL Java_AnnotateStreamline_AnnotateMarkerColorStr(JNIEnv* en } ``` -In Android Studio `cmake` is used to create your C library, so you will need a `CMakelists.txt` file in the same directory as the C files (`app/src/main/cpp/` in the example). Copy the contents shown below into `CMakelists.txt`: +In Android Studio, `cmake` is used to create your C library, so you will need a `CMakelists.txt` file in the same directory as the C files (`app/src/main/cpp/` in the example). Copy the contents shown below into `CMakelists.txt`: ```cmake # Sets the minimum CMake version required for this project. @@ -133,7 +132,7 @@ Now add the code below to the `build.gradle` file of the Module you wish to prof } ``` -This will create a `libStreamlineAnnotationJNI.so` library that you can load in your Kotlin code, and then you can call the functions. Here you will create a singleton `AnnotateStreamline.kt`. Place the file alongside `MainActivity.kt` in `app\src\main\java\com\arm\armpytorchmnistinference` for the example. 
Add the following code to `AnnotateStreamline.kt` to enable Kotlin calls to the gator daemon from the rest of your code: +This creates a `libStreamlineAnnotationJNI.so` library that you can load in your Kotlin code, and then you can call the functions. Here now you can create a singleton `AnnotateStreamline.kt`. Place the file alongside `MainActivity.kt` in `app\src\main\java\com\arm\armpytorchmnistinference` for the example. Add the following code to `AnnotateStreamline.kt` to enable Kotlin calls to the gator daemon from the rest of your code: ```kotlin // Kotlin wrapper class for integration into Android project @@ -191,17 +190,17 @@ The `AnnotateStreamline` class can now be used in your Kotlin code to add annota AnnotateStreamline.annotateMarkerColorStr(AnnotateStreamline.ANNOTATE_BLUE, "Model Load") ``` -In the example app you could add this in the `onCreate()` function of `MainActivity.kt` after the `Module.load()` call to load the `model.pth`. +In the example app, you can add this in the `onCreate()` function of `MainActivity.kt` after the `Module.load()` call to load the `model.pth`. -This 'colored marker with a string' annotation will add the string and time to Streamline's log view, and look like the image shown below in Streamline's timeline (in the example app ArmNN isn't used, so there are no white ArmNN markers): +This 'colored marker with a string' annotation will add the string and time to Streamline's log view, and it appears like the image shown below in Streamline's timeline (in the example app, ArmNN is not used, so there are no white ArmNN markers): ![Streamline image alt-text#center](streamline_marker.png "Figure 2. Streamline timeline markers") ## Custom Activity Maps (CAMs) -In addition to adding strings to the log and colored markers to the timeline, a particularly useful set of annotations is the Custom Activity Maps. These are the named colored bands you can see at the bottom of the Streamline timeline view shown in *Figure 1*. They can be used to show when specific parts of your application are running, such as the pre-processing or inference, and layered for functions within functions etc. +In addition to adding strings to the log and colored markers to the timeline, a particularly useful set of annotations is the Custom Activity Maps (CAMs). These are the named colored bands you can see at the bottom of the Streamline timeline view, as shown in *Figure 1*. They can be used to show when specific parts of your application are running, such as the pre-processing or inference, and layered for functions within functions. -To add these you will need to import the functions that start `gator_cam_` from `streamline_annotate.h` through your wrapper files in the same way as the functions above. Then you can use CAMs, but first you will need to set up the tracks the annotations will appear on and an id system for each annotation. The `baseId` code below is to ensure that if you add annotations in multiple places in your code, the ids are unique. +To add these, you need to import the functions that start `gator_cam_` from `streamline_annotate.h` through your wrapper files in the same way as the functions above. Then you can use CAMs, but first you need to set up the tracks the annotations will appear on, and an ID system for each annotation. The `baseId` code below is used to ensure that if you add annotations in multiple places in your code, the IDs are unique. 
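For orientation, here is a minimal sketch of the kind of Kotlin this ends up looking like once the `gator_cam_` functions are wrapped. It is illustrative only: the `camJobStart`/`camJobStop` method names, the ID values, and the `InferenceTimeline` class are hypothetical stand-ins for whatever you expose from your own JNI wrapper, not code taken from the example project.

```kotlin
// Hypothetical sketch - method names and id values are placeholders for your
// own Kotlin wrappers around the gator_cam_* functions in streamline_annotate.h.
class InferenceTimeline {
    companion object {
        private const val baseId = 1000        // pick a different base per annotation site
        const val VIEW_ID = baseId
        const val TRACK_ML = baseId + 1
        const val JOB_INFERENCE = baseId + 2
    }

    fun timeInference(run: () -> Unit) {
        // Open a job on the custom track so Streamline can draw a colored band...
        AnnotateStreamline.camJobStart(VIEW_ID, TRACK_ML, JOB_INFERENCE, "Inference")
        run()
        // ...and close it again when the work being measured has finished.
        AnnotateStreamline.camJobStop(VIEW_ID, TRACK_ML, JOB_INFERENCE)
    }
}
```

Wrapping the inference, pre-processing, and post-processing calls in this way is what produces the named bands at the bottom of the timeline in *Figure 1*. The example project's own companion-object setup, shown next, follows the same idea.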
 Here is an example setup in a class's companion object:
@@ -265,6 +264,6 @@ In the example app, the CAM annotations are added to the `runInference()` functi
 }
 ```

-The example application is very fast and simple, so the CAMs will not show much information. In a more complex application you could add more CAMs, including child-level ones, to give more detailed annotations to show where time is spent in your application. For this example app with its very fast inference, it's best to change the Streamline timeline view scale to 10µs in order to see the CAM annotations better.
+The example application is fast and simple, and the CAMs do not show much information. In a more complex application, you can add further CAMs, including child-level ones, to give more detailed annotations to show where time is spent in your application. For this example app with its very fast inference, it is best to change the Streamline timeline view scale to 10µs in order to better see the CAM annotations.

-Once you've added in useful CAM annotations, you can build and deploy a debug version of your application. You can run Streamline and see the annotations and CAMs in the timeline view. See the [Streamline documentation](https://developer.arm.com/documentation/101816/latest/) for how to make a capture for profiling. After the capture is made and analyzed, you will be able to see when your application is running the inference, ML pre-processing, ML post-processing, or other parts of your application. From there you can see where the most time is spent, and how hard the CPU or GPU is working during different parts of the application. From this you can then decide if work is needed to improve performance and where that work needs doing.
+Once you have added in useful CAM annotations, you can build and deploy a debug version of your application. You can run Streamline and see the annotations and CAMs in the timeline view. See the [Streamline documentation](https://developer.arm.com/documentation/101816/latest/) for information on how to make a capture for profiling. After the capture is made and analyzed, you will be able to see when your application is running the inference, performing ML pre-processing or ML post-processing, or running other parts of your application. From there you can see where the most time is spent, and how hard the CPU or GPU is working during different parts of the application. From this you can then decide if work is needed to improve performance and where that work needs doing.
diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md
index 760098185..b1d4b7035 100644
--- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md
+++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/why-profile.md
@@ -7,11 +7,11 @@ layout: learningpathall
 ---
 ## Optimizing Performance
-A first step towards achieving performance requirements in a ML Model is to identify what is consuming the most time and memory in your application. Profiling can help you identify the bottlenecks, and it can offer clues about how to optimize operations.
+A first step towards achieving optimal performance in an ML model is to identify what is consuming the most time and memory in your application. Profiling can help you identify the bottlenecks, and it can offer clues about how to optimize operations.
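Before reaching for a full profiler, it can also be worth bracketing the obvious suspects with simple timing to get a first impression of where the time goes. The snippet below is a generic illustration using only the Kotlin standard library; the three stage functions are hypothetical placeholders, not code from the example project.

```kotlin
import kotlin.system.measureTimeMillis

// Crude first pass: time the suspected hot spots directly. This only narrows
// things down - a profiler is still needed to see what happens inside each stage.
val preMs = measureTimeMillis { preprocessInput() }        // hypothetical stage functions
val inferenceMs = measureTimeMillis { runInference() }
val postMs = measureTimeMillis { postprocessOutput() }
println("pre=$preMs ms, inference=$inferenceMs ms, post=$postMs ms")
```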
With Machine Learning (ML) applications, whilst the inference of the Neural Network (NN) is often the heaviest part of the application in terms of computation and memory usage, it is not necessarily always the case. It is therefore important to profile the application as a whole to detect other possible issues that can negatively impact performance, such as issues with pre- or post-processing, or the code itself. -In this Learning Path, you will profile an Android example using LiteRT. Most of the steps are transferable and work with Linux, and across a wide range of Arm devices. +In this Learning Path, you will profile an Android example using LiteRT. Most of the steps are transferable and work with Linux, and you can use them on a wide range of Arm devices. The principles for profiling an application apply to many other inference engines and platforms, only the tools differ. From c5885be61af4a18202b5765ef4ce9841a320b306 Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 4 Dec 2024 19:49:36 +0000 Subject: [PATCH 10/19] Editorial. --- .../profiling-ml-on-arm/nn-profiling-executenetwork.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md index 29b4e5621..f479dbbb4 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md @@ -48,7 +48,7 @@ If you are using your own LiteRT, replace `mobilenet_v2_1.0_224_INT8.tflite` wit This will run the model twice, outputting the layer timings to `modelout.txt`. The `--iterations 2` flag is the command that means it runs twice: the first run includes a lot of startup costs and one-off optimizations, so the second run is more indicative of the real performance. -The other flags to note are the `-e` and `--output-network-details` flags which will output a lot of timeline information about the model, including the layer timings. The `--do-not-print-output` flag will stop the output of the model, which can be very large, and without sensible input it is meaningless. The `--enable-fast-math` and `--fp16-turbo-mode` flags enable some math optimizations. `CpuAcc` is the acclerated CPU backend, it can be replaced with `GpuAcc` for the accelerated GPU backend. +The other flags to note are the `-e` and `--output-network-details` flags which will output a lot of timeline information about the model, including the layer timings. The `--do-not-print-output` flag will stop the output of the model, which can be very large, and without sensible input it is meaningless. The `--enable-fast-math` and `--fp16-turbo-mode` flags enable some math optimizations. `CpuAcc` is the accelerated CPU backend, it can be replaced with `GpuAcc` for the accelerated GPU backend. After running the model, you can pull the output file back to your host machine with the following commands: From 05b94793592817812f4ad4a761cbfe82f903fe3f Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Thu, 5 Dec 2024 10:22:57 +0000 Subject: [PATCH 11/19] Editorial. 
--- .../smartphones-and-mobile/profiling-ml-on-arm/_index.md | 2 +- .../profiling-ml-on-arm/app-profiling-android-studio.md | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md index db4c506e8..f982da10c 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md @@ -12,7 +12,7 @@ learning_objectives: prerequisites: - An Arm-powered Android smartphone, and a USB cable to connect to it. - For profiling the ML inference, [ArmNN](https://github.com/ARM-software/armnn/releases)'s ExecuteNetwork. - - For profiling the application as a whole, [Arm Performance Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio)'s Streamline. + - For profiling the application, [Arm Performance Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio)'s Streamline. - Android Studio Profiler. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md index 2c7320ebb..6a52e3a4f 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md @@ -45,11 +45,11 @@ Looking further down you can see the *Table* of Java/Kotlin allocations for your ### Native -For the [native side](https://developer.android.com/studio/profile/record-native-allocations), the process is similar but with different options. Choose *Profiler: Run 'app' as profileable*, and then select the *Track Memory Consumption (Native Allocations)* task. Here you have to *Start profiler task from: Process Start*. Choose *Stop* once you've captured enough data. +For the [native side](https://developer.android.com/studio/profile/record-native-allocations), the process is similar but with different options. Select **Profiler: Run 'app' as profileable**, and then select the **Track Memory Consumption (Native Allocations)** task. Here you have to **Start profiler task from: Process Start**. Select **Stop** once you've captured enough data. -The Native view doesn't have the same nice timeline graph as the Java/Kotlin side, but it does have the *Table* and *Visualization* tabs. The *Table* tab no longer has a list of allocations, but options to *Arrange by allocation method* or *callstack*. Choose *Arrange by callstack* and then you can trace down which functions were allocating significant memory. Potentially more useful, you can also see Remaining Size. +The Native view does not provide the same kind of timeline graph as the Java/Kotlin side, but it does have the **Table** and **Visualization** tabs. The **Table** tab no longer has a list of allocations, but options to **Arrange by allocation method** or **callstack**. Select **Arrange by callstack** and then you can trace down which functions allocate significant memory resource. Also there is the **Remaining Size** tab, which is arguably more useful. -In the Visualization tab you can see the callstack as a graph, and once again you can look at total Allocations Size or Remaining Size. 
If you look at Remaining Size, you can see what is still allocated at the end of the profiling, and by looking a few steps up the stack, probably see which allocations are related to the ML model, by seeing functions that relate to the framework you are using. A lot of the memory may be allocated by that framework rather than in your code, and you may not have much control over it, but it is useful to know where the memory is going. +In the **Visualization** tab, you can see the callstack as a graph, and once again you can look at total **Allocations Size** or **Remaining Size**. If you look at **Remaining Size**, you can see what remains allocated at the end of the profiling, and by looking a few steps up the stack, probably see which allocations are related to the ML model, by seeing functions that relate to the framework you are using. A lot of the memory may be allocated by that framework rather than in your code, and you may not have much control over it, but it is useful to know where the memory is going. ## Other platforms From 2e406bb1210aa8622dd406e70a512f77d536482e Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Mon, 9 Dec 2024 15:07:54 +0000 Subject: [PATCH 12/19] Editorial review. --- .../profiling-ml-on-arm/_index.md | 4 +-- .../app-profiling-android-studio.md | 25 ++++++++++++------ .../app-profiling-streamline.md | 26 +++++++++---------- 3 files changed, 32 insertions(+), 23 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md index f982da10c..855d0a99e 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md @@ -11,8 +11,8 @@ learning_objectives: prerequisites: - An Arm-powered Android smartphone, and a USB cable to connect to it. - - For profiling the ML inference, [ArmNN](https://github.com/ARM-software/armnn/releases)'s ExecuteNetwork. - - For profiling the application, [Arm Performance Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio)'s Streamline. + - For profiling the ML inference, [ArmNN's ExecuteNetwork](https://github.com/ARM-software/armnn/releases). + - For profiling the application, [Arm Performance Studio's Streamline](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio). - Android Studio Profiler. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md index 6a52e3a4f..806967e4a 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md @@ -24,24 +24,33 @@ Looking at the memory consumption for Java/Kotlin and native, there are two sepa * **Track Memory Consumption (Java/Kotlin Allocations)**. * **Track Memory Consumption (Native Allocations)**. -Before you start either task, you must build your app for profiling. The instructions for this, and for general profiling setup can be found [here](https://developer.android.com/studio/profile). You need to start the correct profiling version of the app depending on the task. +Before you start either task, you must build your app for profiling. 
The instructions for this, and for general profiling setup can be found on the Android Studio website at a page called [Profile your app performance](https://developer.android.com/studio/profile). You need to start the correct profiling version of the app depending on the task. -![Android Studio profiling run types alt-text#center](android-profiling-version.png "Figure 1. Profiling Run Versions") +![Android Studio profiling run types alt-text#center](android-profiling-version.png "Figure 1: Profiling Run Versions") For the Java/Kotlin side, select **Profile 'app' with complete data", which is based off the debug variant. For the native side, you want the **profileable** "Profile 'app' with low overhead", which is based off the release variant. ### Java/Kotlin -If you start looking at the [Java/Kotlin side](https://developer.android.com/studio/profile/record-java-kotlin-allocations), choose *Profiler: Run 'app' as debuggable*, and then select the *Track Memory Consumption (Java/Kotlin Allocations)* task. Navigate to the part of the app you wish to profile and then you can start profiling. At the bottom of the Profiling window it should look like Figure 2 below. Click *Start Profiler Task*. +To investigate the Java/Kotlin side, see the notes on [Record Java/Kotlin allocations](https://developer.android.com/studio/profile/record-java-kotlin-allocations). -![Android Studio Start Profile alt-text#center](start-profile-dropdown.png "Figure 2. Start Profile") +Select **Profiler: Run 'app' as debuggable**, and then select the **Track Memory Consumption (Java/Kotlin Allocations)** task. -When you're ready, *Stop* the profiling again. Now there will be a nice timeline graph of memory usage. While Android Studio has a nicer interface for the Java/Kotlin side than the native side, the key to the timeline graph may be missing. This key is shown below in Figure 3, so you can refer to the colors from this. -![Android Studio memory key alt-text#center](profiler-jk-allocations-legend.png "Figure 3. Memory key for the Java/Kotlin Memory Timeline") +Navigate to the part of the app that you would like to profile, and then you can start profiling. -The default height of the Profiling view, as well as the timeline graph within it is usually too small, so adjust these heights to get a sensible graph. You can click at different points of the graph to see the memory allocations at that time. If you look according to the key you can see how much memory is allocated by Java, Native, Graphics, Code etc. +The bottom of the profiling window should resemble Figure 2. -Looking further down you can see the *Table* of Java/Kotlin allocations for your selected time on the timeline. With ML a lot of your allocations are likely to be byte[] for byte buffers, or possibly int[] for image data, etc. Clicking on the data type will open up the particular allocations, showing their size and when they were allocated. This will help to quickly narrow down their use, and whether they are all needed etc. +![Android Studio Start Profile alt-text#center](start-profile-dropdown.png "Figure 2: Start Profile") + +Click **Start Profiler Task**. + +When you're ready, select *Stop* to stop the profiling again. Now there will be a timeline graph of memory usage. While Android Studio has a more user-friendly interface for the Java/Kotlin side than the native side, the key to the timeline graph might be missing. This key is shown in Figure 3, so you can refer to the colors from this. 
+ +![Android Studio memory key alt-text#center](profiler-jk-allocations-legend.png "Figure 3: Memory key for the Java/Kotlin Memory Timeline") + +Adjust the default height of the profiling view, as well as the timeline graph within it, as they are usually too small. You can click on different points of the graph to see the memory allocations at that specific time. Using the key on the graph, you can see how much memory is allocated by different categories of consumption, such as Java, Native, Graphics, and Code. + +If you looking further down, you can see the **Table** of Java/Kotlin allocations for your selected time on the timeline. With ML, many of your allocations are likely to be byte[] for byte buffers, or possibly int[] for image data, etc. Clicking on the data type will open up the particular allocations, showing their size and when they were allocated. This will help to quickly narrow down their use, and whether they are all needed etc. ### Native diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index f2b6db3f6..6bf19f070 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -39,37 +39,37 @@ In this example, the target device is an Arm-powered Android phone. The data is For more information on Streamline usage, see [Tutorials and Training Videos](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio). -While the example that you are running is based on Android, you can use [the Setup and Capture Instructions for Linux](https://developer.arm.com/documentation/101816/0903/Getting-started-with-Streamline/Profile-your-Linux-application). +While the example that you are running is based on Android, you can use the [Setup and Capture Instructions for Linux](https://developer.arm.com/documentation/101816/0903/Getting-started-with-Streamline/Profile-your-Linux-application). -Firstly, follow these [Setup Instructions](https://developer.arm.com/documentation/102477/0900/Setup-tasks?lang=en), to make sure you have `adb` (Android Debug Bridge) installed. If you have installed [Android Studio](https://developer.android.com/studio), you will have adb installed already. Otherwise, you can get it as part of the Android SDK platform tools [here](https://developer.android.com/studio/releases/platform-tools.html). +Firstly, follow these [Setup Instructions](https://developer.arm.com/documentation/102477/0900/Setup-tasks?lang=en), to make sure you have `adb` (Android Debug Bridge) installed. If you have installed [Android Studio](https://developer.android.com/studio), you will have adb installed already. Otherwise, you can get it as part of the Android SDK platform tools which can be found on the [SDK Platform Tools Release Notes page](https://developer.android.com/studio/releases/platform-tools.html). -Make sure `adb` is in your path. You can check this by running `adb` in a terminal. If it is not in your path, you can add it by installing the [Android SDK `platform-tools`](https://developer.android.com/tools/releases/platform-tools#downloads) directory to your path. +Make sure `adb` is in your path. You can check this by running `adb` in a terminal. 
If it is not in your path, you can add it by installing the SDK platform tools from the [SDK Platform Tools Release Notes Downloads page](https://developer.android.com/tools/releases/platform-tools#downloads). Next, install [Arm Performance Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio#Downloads), which includes Streamline. -Connect your Android phone to your host machine through USB. Ensure that your Android phone is set to [Developer mode](https://developer.android.com/studio/debug/dev-options). +Connect your Android phone to your host machine through USB. Ensure that your Android phone is set to developer mode. For more information on how to do this, see [Configure on-device developer options](https://developer.android.com/studio/debug/dev-options). -On your phone, go to **Settings** > **Developer Options** and enable **USB Debugging**. If your phone asks you to authorize connection to your host machine, confirm authorization. Test the connection by running `adb devices` in a terminal. You should see your device ID listed. +On your phone, navigate to **Settings**, then **Developer Options**. Enable **USB Debugging**. If your phone requests authorization for connection to your host machine, confirm authorization. Test the connection by running `adb devices` in a terminal. You will see your device ID listed. -Next, you need a debuggable-build of the application that you want to profile. +Next, you need a debuggable build of the application that you want to profile. - In Android Studio, ensure your **Build Variant** is set to **debug**. You can then build the application, and install it on your device. -- For a Unity app, select **Development Build** under **File** > **Build Settings** when building your application. -- In Unreal Engine, open **Project Settings** > **Project** > **Packaging** > **Project**, and ensure that the **For Distribution** checkbox is clear. -- Generally, you can set `android:debuggable=true` in the application manifest file. +- For a Unity app, select **Development Build** in the **Build Settings** menu under **File**, when building your application. +- In Unreal Engine, expand the navigtion menu **Project Settings** > **Project** > **Packaging** > **Project**, and ensure that the **For Distribution** checkbox is clear. +- You can set `android:debuggable=true` in the application manifest file. For the example application that you cloned earlier, the Build Variant is `debug` by default, but you can verify this by going to **Build** > **Select Build Variant** in Android Studio. Build and install this application on your device. -You are now able to run Streamline and [capture a profile](https://developer.arm.com/documentation/102477/0900/Capture-a-profile?lang=en) of your application. But before you do, you can add some useful annotations to your code that enables more specific performance analysis of your application. +You are now able to run Streamline and capture a profile of your application by following the instructions [Capture a profile](https://developer.arm.com/documentation/102477/0900/Capture-a-profile?lang=en). But before you do, you can add some useful annotations to your code that enables specific performance analysis of your application. ## Custom Annotations -In Streamline, it is possible to add custom annotations to the timeline view. This can be useful to mark the start and end of specific parts of your application, or to mark when a specific event occurs. 
This then allows you to view the performance of your application in relation to these events. At the bottom of *Figure 1* there are custom annotations to show when inference, pre-processing, and post-processing occur. +In Streamline, it is possible to add custom annotations to the timeline view. This can be useful to mark the start and end of parts of your application, or to mark when a specific event occurs. This then allows you to view the performance of your application in relation to these events. At the bottom of *Figure 1* there are custom annotations to show when inference, pre-processing, and post-processing occur. -To add annotations, you will need to add some files into your project from the **gator** daemon that Streamline uses. These files are named `streamline_annotate.c`, `streamline_annotate.h`, and `streamline_annotate_logging.h` and made available [here](https://github.com/ARM-software/gator/tree/main/annotate). Using these annotations, you can see log strings, markers, counters, and Custom Activity Maps. Within your example project, create a `cpp` folder under the `app/src/main` folder, and add these three files there. +To add annotations, you will need to add some files into your project from the **gator** daemon that Streamline uses. These files are named `streamline_annotate.c`, `streamline_annotate.h`, and `streamline_annotate_logging.h` and made available at [this Github respository](https://github.com/ARM-software/gator/tree/main/annotate). Using these annotations, you can see log strings, markers, counters, and Custom Activity Maps. Within your example project, create a `cpp` folder under the `app/src/main` folder, and add these three files there. -These files are written in C, so if your Android Studio project is in Java or Kotlin, you will need to add a C library to your project. This is slightly trickier than just adding a Java or Kotlin file, but it is not difficult. You can find instructions on how to do this [here](https://developer.android.com/studio/projects/add-native-code). +These files are written in C, so if your Android Studio project is in Java or Kotlin, you will need to add a C library to your project. This is slightly trickier than adding a Java or Kotlin file, but it is not difficult. You can find instructions on how to do this at a page called [Add C and C++ code to your project](https://developer.android.com/studio/projects/add-native-code). Create a file in the `app/src/main/cpp/` folder under your project, and name it `annotate_jni_wrapper.c`. This will be a wrapper around the gator daemon's functions, and will be called from your Kotlin code. Copy the code below into this file. You can also create similar wrapper functions for other gator daemon functions. From b2381b06ec89146586843d3534512751ed0ffa4f Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Tue, 10 Dec 2024 04:22:53 +0000 Subject: [PATCH 13/19] Editorial review. 
--- .../smartphones-and-mobile/profiling-ml-on-arm/_index.md | 4 ---- .../profiling-ml-on-arm/app-profiling-android-studio.md | 2 +- 2 files changed, 1 insertion(+), 5 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md index a4b549139..855d0a99e 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md @@ -1,10 +1,6 @@ --- title: Profile the Performance of Machine Learning models on Arm -draft: true -cascade: - draft: true - minutes_to_complete: 60 who_is_this_for: This is an introductory topic for software developers who want to learn how to profile the performance of Machine Learning (ML) models running on Arm devices. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md index 806967e4a..9b481c674 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md @@ -28,7 +28,7 @@ Before you start either task, you must build your app for profiling. The instruc ![Android Studio profiling run types alt-text#center](android-profiling-version.png "Figure 1: Profiling Run Versions") -For the Java/Kotlin side, select **Profile 'app' with complete data", which is based off the debug variant. For the native side, you want the **profileable** "Profile 'app' with low overhead", which is based off the release variant. +For the Java/Kotlin side, select **Profile 'app' with complete data**, which is based off the debug variant. For the native side, you want the **profileable** "Profile 'app' with low overhead", which is based off the release variant. ### Java/Kotlin From 56388f75155daf475ec6b11edb28ea9bac05451e Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Tue, 10 Dec 2024 04:44:25 +0000 Subject: [PATCH 14/19] Editorial review. --- .../profiling-ml-on-arm/app-profiling-android-studio.md | 4 ++-- .../profiling-ml-on-arm/app-profiling-streamline.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md index 9b481c674..c242e96a7 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md @@ -50,13 +50,13 @@ When you're ready, select *Stop* to stop the profiling again. Now there will be Adjust the default height of the profiling view, as well as the timeline graph within it, as they are usually too small. You can click on different points of the graph to see the memory allocations at that specific time. Using the key on the graph, you can see how much memory is allocated by different categories of consumption, such as Java, Native, Graphics, and Code. -If you looking further down, you can see the **Table** of Java/Kotlin allocations for your selected time on the timeline. 
With ML, many of your allocations are likely to be byte[] for byte buffers, or possibly int[] for image data, etc. Clicking on the data type will open up the particular allocations, showing their size and when they were allocated. This will help to quickly narrow down their use, and whether they are all needed etc. +If you look further down, you can see the **Table** of Java/Kotlin allocations for your selected time on the timeline. With ML, many of your allocations are likely to be scenarios such as byte[] for byte buffers, or possibly int[] for image data. Clicking on the data type opens up the particular allocations, showing their size and when they were allocated. This will help to quickly narrow down their use, and whether they are all needed. ### Native For the [native side](https://developer.android.com/studio/profile/record-native-allocations), the process is similar but with different options. Select **Profiler: Run 'app' as profileable**, and then select the **Track Memory Consumption (Native Allocations)** task. Here you have to **Start profiler task from: Process Start**. Select **Stop** once you've captured enough data. -The Native view does not provide the same kind of timeline graph as the Java/Kotlin side, but it does have the **Table** and **Visualization** tabs. The **Table** tab no longer has a list of allocations, but options to **Arrange by allocation method** or **callstack**. Select **Arrange by callstack** and then you can trace down which functions allocate significant memory resource. Also there is the **Remaining Size** tab, which is arguably more useful. +The Native view does not provide the same kind of timeline graph as the Java/Kotlin side, but it does have the **Table** and **Visualization** tabs. The **Table** tab no longer has a list of allocations, but options to **Arrange by allocation method** or **callstack**. Select **Arrange by callstack** and then you can trace down which functions allocate significant memory resource. There is also the **Remaining Size** tab, which is arguably more useful. In the **Visualization** tab, you can see the callstack as a graph, and once again you can look at total **Allocations Size** or **Remaining Size**. If you look at **Remaining Size**, you can see what remains allocated at the end of the profiling, and by looking a few steps up the stack, probably see which allocations are related to the ML model, by seeing functions that relate to the framework you are using. A lot of the memory may be allocated by that framework rather than in your code, and you may not have much control over it, but it is useful to know where the memory is going. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index 6bf19f070..8d0353d29 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -54,7 +54,7 @@ On your phone, navigate to **Settings**, then **Developer Options**. Enable **US Next, you need a debuggable build of the application that you want to profile. - In Android Studio, ensure your **Build Variant** is set to **debug**. You can then build the application, and install it on your device. - For a Unity app, select **Development Build** in the **Build Settings** menu under **File**, when building your application. 
-- In Unreal Engine, expand the navigtion menu **Project Settings** > **Project** > **Packaging** > **Project**, and ensure that the **For Distribution** checkbox is clear. +- In Unreal Engine, expand the navigation menu **Project Settings** > **Project** > **Packaging** > **Project**, and ensure that the **For Distribution** checkbox is clear. - You can set `android:debuggable=true` in the application manifest file. For the example application that you cloned earlier, the Build Variant is `debug` by default, but you can verify this by going to **Build** > **Select Build Variant** in Android Studio. @@ -67,7 +67,7 @@ You are now able to run Streamline and capture a profile of your application by In Streamline, it is possible to add custom annotations to the timeline view. This can be useful to mark the start and end of parts of your application, or to mark when a specific event occurs. This then allows you to view the performance of your application in relation to these events. At the bottom of *Figure 1* there are custom annotations to show when inference, pre-processing, and post-processing occur. -To add annotations, you will need to add some files into your project from the **gator** daemon that Streamline uses. These files are named `streamline_annotate.c`, `streamline_annotate.h`, and `streamline_annotate_logging.h` and made available at [this Github respository](https://github.com/ARM-software/gator/tree/main/annotate). Using these annotations, you can see log strings, markers, counters, and Custom Activity Maps. Within your example project, create a `cpp` folder under the `app/src/main` folder, and add these three files there. +To add annotations, you will need to add some files into your project from the **gator** daemon that Streamline uses. These files are named `streamline_annotate.c`, `streamline_annotate.h`, and `streamline_annotate_logging.h` and made available at [this GitHub respository](https://github.com/ARM-software/gator/tree/main/annotate). Using these annotations, you can see log strings, markers, counters, and Custom Activity Maps. Within your example project, create a `cpp` folder under the `app/src/main` folder, and add these three files there. These files are written in C, so if your Android Studio project is in Java or Kotlin, you will need to add a C library to your project. This is slightly trickier than adding a Java or Kotlin file, but it is not difficult. You can find instructions on how to do this at a page called [Add C and C++ code to your project](https://developer.android.com/studio/projects/add-native-code). From 95550657a20f767e3c719d7a3b267683569c7789 Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Tue, 10 Dec 2024 05:23:26 +0000 Subject: [PATCH 15/19] Editorial. 
--- .../app-profiling-android-studio.md | 35 +++++++++++++------ .../nn-profiling-general.md | 4 +-- 2 files changed, 26 insertions(+), 13 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md index c242e96a7..4c675b238 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-android-studio.md @@ -15,18 +15,23 @@ Memory is a common problem in ML, with ever-increasing model parameters and data * Next, click **Tool Windows**, and then **Profiler**. This opens the Profiler window. -* Attach your device in Developer Mode with a USB cable, and then select your app's process. Here there are a number of different profiling tasks available. +* Attach your device in Developer Mode with a USB cable, and then select your app's process. There are a number of different profiling tasks available. -Most likely with an Android ML app you will need to look at memory both from the Java/Kotlin side, and the native side. The Java/Kotlin side is where the app runs, and might be where buffers are allocated for input and output if, for example, you are using LiteRT (formerly known as TensorFlow Lite). The native side is where the ML framework runs. +Most likely with an Android ML app you will need to look at memory both from the Java/Kotlin side, and the native side: + +* The Java/Kotlin side is where the app runs, and might be where buffers are allocated for input and output if, for example, you are using LiteRT. +* The native side is where the ML framework runs. + +{{% notice Note %}} +Before you start either task, you must build your app for profiling. The instructions for this, and for general profiling setup can be found at [Profile your app performance](https://developer.android.com/studio/profile) on the Android Studio website. You need to start the correct profiling version of the app depending on the task. +{{% /notice %}} Looking at the memory consumption for Java/Kotlin and native, there are two separate tasks in the Profiler: * **Track Memory Consumption (Java/Kotlin Allocations)**. * **Track Memory Consumption (Native Allocations)**. -Before you start either task, you must build your app for profiling. The instructions for this, and for general profiling setup can be found on the Android Studio website at a page called [Profile your app performance](https://developer.android.com/studio/profile). You need to start the correct profiling version of the app depending on the task. - -![Android Studio profiling run types alt-text#center](android-profiling-version.png "Figure 1: Profiling Run Versions") +![Android Studio profiling run types alt-text#center](android-profiling-version.png "Figure 3: Profiling Run Versions") For the Java/Kotlin side, select **Profile 'app' with complete data**, which is based off the debug variant. For the native side, you want the **profileable** "Profile 'app' with low overhead", which is based off the release variant. @@ -38,17 +43,21 @@ Select **Profiler: Run 'app' as debuggable**, and then select the **Track Memory Navigate to the part of the app that you would like to profile, and then you can start profiling. -The bottom of the profiling window should resemble Figure 2. +The bottom of the profiling window should resemble Figure 4. 
+ +![Android Studio Start Profile alt-text#center](start-profile-dropdown.png "Figure 4: Start Profile") -![Android Studio Start Profile alt-text#center](start-profile-dropdown.png "Figure 2: Start Profile") +Click **Start profiler task**. -Click **Start Profiler Task**. +When you're ready, select *Stop* to stop the profiling again. -When you're ready, select *Stop* to stop the profiling again. Now there will be a timeline graph of memory usage. While Android Studio has a more user-friendly interface for the Java/Kotlin side than the native side, the key to the timeline graph might be missing. This key is shown in Figure 3, so you can refer to the colors from this. +Now there will be a timeline graph of memory usage. While Android Studio has a more user-friendly interface for the Java/Kotlin side than the native side, the key to the timeline graph might be missing. This key is shown in Figure 3. ![Android Studio memory key alt-text#center](profiler-jk-allocations-legend.png "Figure 3: Memory key for the Java/Kotlin Memory Timeline") -Adjust the default height of the profiling view, as well as the timeline graph within it, as they are usually too small. You can click on different points of the graph to see the memory allocations at that specific time. Using the key on the graph, you can see how much memory is allocated by different categories of consumption, such as Java, Native, Graphics, and Code. +If you prefer, you can adjust the default height of the profiling view, as well as the timeline graph within it, as they are usually too small. + +Now click on different points of the graph to see the memory allocations at each specific time. Using the key on the graph, you can see how much memory is allocated by different categories of consumption, such as Java, Native, Graphics, and Code. If you look further down, you can see the **Table** of Java/Kotlin allocations for your selected time on the timeline. With ML, many of your allocations are likely to be scenarios such as byte[] for byte buffers, or possibly int[] for image data. Clicking on the data type opens up the particular allocations, showing their size and when they were allocated. This will help to quickly narrow down their use, and whether they are all needed. @@ -62,4 +71,8 @@ In the **Visualization** tab, you can see the callstack as a graph, and once aga ## Other platforms -On other platforms, you will need a different memory profiler. The objective of working out where the memory is being used is the same, and whether there are issues with leaks or just too much memory being used. There are often trade-offs between memory and speed, and they can be considered more sensibly if the numbers involved are known. +On other platforms, you will need a different memory profiler. The objective is the same; to investigate memory consumption in terms of identifying whether there are issues with leaks or if there is too much memory being used. + +There are often trade-offs between memory and speed, and investigating memory consumption provides data that can help inform assessments of this balance. 
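To make the earlier point about `byte[]` and buffer allocations concrete, the sketch below shows the kind of input buffer an ML app often allocates on the Java/Kotlin side; the tensor shape and sizes are invented for illustration and are not taken from the example project.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Illustrative only: a direct buffer sized for a hypothetical 224x224 RGB
// float32 input tensor. Allocations like this are what typically appear as
// byte[] or direct ByteBuffer entries in the Java/Kotlin allocation table.
val inputBuffer: ByteBuffer = ByteBuffer
    .allocateDirect(224 * 224 * 3 * 4)   // height * width * channels * bytes per float
    .order(ByteOrder.nativeOrder())
```

A handful of large, long-lived buffers like this is usually expected; many small, repeated allocations during inference are the more common sign of a problem.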
+ + diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md index 09fc32e18..afc2ac452 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-general.md @@ -6,10 +6,10 @@ weight: 5 layout: learningpathall --- -## Tools you can use +## Tools that you can use App profilers provide a good overall view of performance, but you might want to look inside the model and identify bottlenecks within the network. The network is often where the bulk of the bottlenecks lie, so it warrants closer analysis. -With general profilers this is hard to do, as there needs to be annotation inside the ML framework code to retrieve the information. It is a large task to write the profiling annotation throughout the framework, so it is easier to use tools from a framework or inference engine that already has the required instrumentation. +With general profilers this is hard to do, as there needs to be annotation inside the ML framework code to retrieve the information. It is a complex task to write the profiling annotation throughout the framework, so it is easier to use tools from a framework or inference engine that already has the required instrumentation. Depending on the model you use, your choice of tools will vary. For example, if you are using LiteRT (formerly TensorFlow Lite), Arm provides the ArmNN delegate that you can run with the model running on Linux or Android, CPU or GPU. From 3d3d067993947db21d448d2835453d2a91be5c1e Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Tue, 10 Dec 2024 10:51:43 +0000 Subject: [PATCH 16/19] Editorial review. --- .../profiling-ml-on-arm/_index.md | 1 + .../app-profiling-streamline.md | 38 +++++++++++------ .../nn-profiling-executenetwork.md | 41 +++++++++++++------ 3 files changed, 54 insertions(+), 26 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md index 855d0a99e..747c6cdc7 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md @@ -8,6 +8,7 @@ who_is_this_for: This is an introductory topic for software developers who want learning_objectives: - Profile the execution times of ML models on Arm devices. - Profile ML application performance on Arm devices. + - Describe how profiling can help optimize the performance of Machine Learning applications. prerequisites: - An Arm-powered Android smartphone, and a USB cable to connect to it. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md index 8d0353d29..e80988f62 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/app-profiling-streamline.md @@ -16,9 +16,9 @@ Application profiling can be split into two main types: Whilst sampling only provides a statistical view, it is less intrusive and has less processing overhead than instrumentation. 
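To make the distinction concrete, instrumentation means compiling measurement code into the application itself. The hypothetical Kotlin helper below is one minimal way to do that; it is not part of this Learning Path's example code, and the `runInference()` call in the comment is only a stand-in for whatever function you want to time.

```kotlin
import kotlin.system.measureNanoTime

// Instrumentation: the timing code is built into the app, so every call
// is measured exactly, at the cost of editing the code and adding a small
// overhead to each call. A sampling profiler such as Streamline needs no
// such changes; it periodically records counters and call stacks while
// the unmodified application runs.
fun <T> timed(label: String, block: () -> T): T {
    var result: T? = null
    val elapsedNs = measureNanoTime { result = block() }
    println("$label took ${elapsedNs / 1_000} us")
    @Suppress("UNCHECKED_CAST")
    return result as T
}

// Illustrative usage:
// val output = timed("inference") { runInference(inputBuffer) }
```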
-The profiler looks at memory, CPU activity and cycles, cache misses, and many parts of the GPU, amongst other performance metrics. +The profiler looks at performance metrics such as memory, CPU activity and cycles, cache misses, and many parts of the GPU. -It can also provide a timeline-view of these counters to show any changes in the application's performance, which can reveal bottlenecks, and help you identify where to focus your optimization efforts. +It can also provide a timeline-view of these counters to show any changes in the application's performance, which can reveal bottlenecks, and help you to identify where to focus your optimization efforts. ![Streamline image alt-text#center](Streamline.png "Figure 1. Streamline Timeline View") @@ -33,13 +33,15 @@ It is generally safest to not update the Gradle version when prompted. {{% /notice %}} ## Streamline -You will now install Streamline and Performance Studio on your host machine and connect to your target Arm device to capture the data. +Now you can install Streamline and Arm Performance Studio on your host machine and connect to your target Arm device to capture the data. In this example, the target device is an Arm-powered Android phone. The data is captured over a USB connection, and then analyzed on your host machine. For more information on Streamline usage, see [Tutorials and Training Videos](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio). -While the example that you are running is based on Android, you can use the [Setup and Capture Instructions for Linux](https://developer.arm.com/documentation/101816/0903/Getting-started-with-Streamline/Profile-your-Linux-application). +While the example that you are running is based on Android, you can also run it on Linux. See [Setup and Capture Instructions for Linux](https://developer.arm.com/documentation/101816/0903/Getting-started-with-Streamline/Profile-your-Linux-application). + +### Installation Firstly, follow these [Setup Instructions](https://developer.arm.com/documentation/102477/0900/Setup-tasks?lang=en), to make sure you have `adb` (Android Debug Bridge) installed. If you have installed [Android Studio](https://developer.android.com/studio), you will have adb installed already. Otherwise, you can get it as part of the Android SDK platform tools which can be found on the [SDK Platform Tools Release Notes page](https://developer.android.com/studio/releases/platform-tools.html). @@ -67,7 +69,7 @@ You are now able to run Streamline and capture a profile of your application by In Streamline, it is possible to add custom annotations to the timeline view. This can be useful to mark the start and end of parts of your application, or to mark when a specific event occurs. This then allows you to view the performance of your application in relation to these events. At the bottom of *Figure 1* there are custom annotations to show when inference, pre-processing, and post-processing occur. -To add annotations, you will need to add some files into your project from the **gator** daemon that Streamline uses. These files are named `streamline_annotate.c`, `streamline_annotate.h`, and `streamline_annotate_logging.h` and made available at [this GitHub respository](https://github.com/ARM-software/gator/tree/main/annotate). Using these annotations, you can see log strings, markers, counters, and Custom Activity Maps. Within your example project, create a `cpp` folder under the `app/src/main` folder, and add these three files there. 
+To add annotations, you will need to add some files into your project from the **gator** daemon that Streamline uses. These files are named `streamline_annotate.c`, `streamline_annotate.h`, and `streamline_annotate_logging.h` and made available at [this GitHub repository](https://github.com/ARM-software/gator/tree/main/annotate). Using these annotations, you can see log strings, markers, counters, and Custom Activity Maps. Within your example project, create a `cpp` folder under the `app/src/main` folder, and add these three files there. These files are written in C, so if your Android Studio project is in Java or Kotlin, you will need to add a C library to your project. This is slightly trickier than adding a Java or Kotlin file, but it is not difficult. You can find instructions on how to do this at a page called [Add C and C++ code to your project](https://developer.android.com/studio/projects/add-native-code). @@ -132,7 +134,13 @@ Now add the code below to the `build.gradle` file of the Module you wish to prof } ``` -This creates a `libStreamlineAnnotationJNI.so` library that you can load in your Kotlin code, and then you can call the functions. Here now you can create a singleton `AnnotateStreamline.kt`. Place the file alongside `MainActivity.kt` in `app\src\main\java\com\arm\armpytorchmnistinference` for the example. Add the following code to `AnnotateStreamline.kt` to enable Kotlin calls to the gator daemon from the rest of your code: +This creates a `libStreamlineAnnotationJNI.so` library that you can load in your Kotlin code, and then you can call the functions. + +In this location you can now create a singleton `AnnotateStreamline.kt`. + +Place the file alongside `MainActivity.kt` in `app\src\main\java\com\arm\armpytorchmnistinference` for the example. + +Add the following code to `AnnotateStreamline.kt` to enable Kotlin calls to the gator daemon from the rest of your code: ```kotlin // Kotlin wrapper class for integration into Android project @@ -184,7 +192,11 @@ class AnnotateStreamline { Fill in all the function calls to match the functions you added into `annotate_jni_wrapper.c`. -The `AnnotateStreamline` class can now be used in your Kotlin code to add annotations to the Streamline timeline view. The first thing is to make sure `AnnotateStreamline.setup()` is called before any other gator functions. For the example project, add it into the `onCreate()` function of `MainActivity.kt`. Then you can add annotations like this: +You can now use the `AnnotateStreamline` class in your Kotlin code to add annotations to the Streamline timeline view. + +Firstly, make sure that `AnnotateStreamline.setup()` is called before any other gator function. + +For the example project, add it into the `onCreate()` function of `MainActivity.kt`. Then you can add annotations like this: ```kotlin AnnotateStreamline.annotateMarkerColorStr(AnnotateStreamline.ANNOTATE_BLUE, "Model Load") @@ -192,15 +204,15 @@ The `AnnotateStreamline` class can now be used in your Kotlin code to add annota In the example app, you can add this in the `onCreate()` function of `MainActivity.kt` after the `Module.load()` call to load the `model.pth`. 
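To show how these pieces might sit together, here is a minimal sketch of the ordering inside `onCreate()`. It is illustrative only: `resolveModelPath()` is a hypothetical helper standing in for however your app makes `model.pth` readable, and the real activity in the example app contains additional setup that is not shown.

```kotlin
import android.os.Bundle
import org.pytorch.Module

// Sketch of the call ordering only - not the example app's actual MainActivity.
class MainActivitySketch {

    private lateinit var module: Module

    fun onCreate(savedInstanceState: Bundle?) {
        // Open the connection to the gator daemon before any other
        // annotation call is made.
        AnnotateStreamline.setup()

        module = Module.load(resolveModelPath("model.pth"))

        // The marker suggested above: it is written to Streamline's log and
        // shows on the timeline at the point the model finished loading.
        AnnotateStreamline.annotateMarkerColorStr(AnnotateStreamline.ANNOTATE_BLUE, "Model Load")
    }

    // Hypothetical helper: copy the asset to a readable location and return its path.
    private fun resolveModelPath(assetName: String): String =
        TODO("App-specific asset handling")
}
```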
-This 'colored marker with a string' annotation will add the string and time to Streamline's log view, and it appears like the image shown below in Streamline's timeline (in the example app, ArmNN is not used, so there are no white ArmNN markers): +This *colored marker with a string* annotation will add the string and time to Streamline's log view, and it appears like the image shown below in Streamline's timeline (in the example app, ArmNN is not used, so there are no white ArmNN markers): ![Streamline image alt-text#center](streamline_marker.png "Figure 2. Streamline timeline markers") ## Custom Activity Maps (CAMs) -In addition to adding strings to the log and colored markers to the timeline, a particularly useful set of annotations is the Custom Activity Maps (CAMs). These are the named colored bands you can see at the bottom of the Streamline timeline view, as shown in *Figure 1*. They can be used to show when specific parts of your application are running, such as the pre-processing or inference, and layered for functions within functions. +In addition to adding strings to the log and colored markers to the timeline, a particularly useful set of annotations is the Custom Activity Maps (CAMs). These are the named colored bands that you can see at the bottom of the Streamline timeline view, as shown in *Figure 1*. They can be used to show when specific parts of your application are running, such as the pre-processing or inference, and layered for functions within functions. -To add these, you need to import the functions that start `gator_cam_` from `streamline_annotate.h` through your wrapper files in the same way as the functions above. Then you can use CAMs, but first you need to set up the tracks the annotations will appear on, and an ID system for each annotation. The `baseId` code below is used to ensure that if you add annotations in multiple places in your code, the IDs are unique. +To add these, in the same way as the functions above, you need to import the functions that are prefixed with `gator_cam_` from `streamline_annotate.h`. You can then use CAMs, but first you need to set up the tracks the annotations will appear on, and an ID system for each annotation. The `baseId` code below is used to ensure that if you add annotations in multiple places in your code, the IDs are unique. Here is an example setup in a class's companion object: @@ -221,7 +233,7 @@ Here is an example setup in a class's companion object: For the example app, add this to the `MainActivity` class. -Then it can be used like this: +Then you can use it in this way: ```kotlin val preprocess = currentId++ @@ -234,7 +246,7 @@ Then it can be used like this: AnnotateStreamline.camJobEnd(camViewId, preprocess, AnnotateStreamline.getTime()) ``` -In the example app, the CAM annotations are added to the `runInference()` function, which should look like this: +In the example app, the CAM annotations are added to the `runInference()` function, that looks like this: ```kotlin private fun runInference(bitmap: Bitmap) { @@ -264,6 +276,6 @@ In the example app, the CAM annotations are added to the `runInference()` functi } ``` -The example application is fast and simple, and the CAMs do not show much information. In a more complex application, you can add further CAMs, including child-level ones, to give more detailed annotations to show where time is spent in your application. 
For this example app with its very fast inference, it is best to change the Streamline timeline view scale to 10µs in order to better see the CAM annotations. +The example application is fast and simple, and the CAMs do not show a lot of information. In a more complex application, you can add further CAMs, including child-level ones, to give more detailed annotations to show where time is spent in your application. For this example app with its very fast inference, it is best to change the Streamline timeline view scale to 10µs in order to better see the CAM annotations. Once you have added in useful CAM annotations, you can build and deploy a debug version of your application. You can run Streamline and see the annotations and CAMs in the timeline view. See the [Streamline documentation](https://developer.arm.com/documentation/101816/latest/) for information on how to make a capture for profiling. After the capture is made and analyzed, you will be able to see when your application is running the inference, performing ML pre-processing or ML post-processing, or other operations from parts of your application. From there you can see where the most time is spent, and how hard the CPU or GPU is working during different parts of the application. From this you can then decide if work is needed to improve performance and where that work needs doing. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md index f479dbbb4..c3c5535ce 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md @@ -7,15 +7,21 @@ layout: learningpathall --- ## ArmNN's Network Profiler -One way of running LiteRT models is with ArmNN. This is available as a delegate to the standard LiteRT interpreter. But to profile the model, ArmNN comes with a command-line utility called `ExecuteNetwork`. This program just runs the model without the rest of the app. It is able to output layer timings and other useful information to let you know where there might be bottlenecks within your model. +One way of running LiteRT models is to use ArmNN, which is open-source network machine learning (ML) software. This is available as a delegate to the standard LiteRT interpreter. But to profile the model, ArmNN comes with a command-line utility called `ExecuteNetwork`. This program runs the model without the rest of the app. It is able to output layer timings and other useful information to report where there might be bottlenecks within your model. -If you are using LiteRT without ArmNN, then the output from `ExecuteNetwork` will be more of an indication than a definitive answer. But it can still be useful to spot any obvious problems. +If you are using LiteRT without ArmNN, then the output from `ExecuteNetwork` is more of an indication than a definitive answer, but it can still be useful in identifying any obvious problems. -To try this out, you can download a LiteRT model from the [Arm Model Zoo](https://github.com/ARM-software/ML-zoo). In this Learning Path, you will download [mobilenet tflite](https://github.com/ARM-software/ML-zoo/blob/master/models/image_classification/mobilenet_v2_1.0_224/tflite_int8/mobilenet_v2_1.0_224_INT8.tflite). 
+### Download a LiteRT Model -To get `ExecuteNetwork` you can download it from the [ArmNN GitHub](https://github.com/ARM-software/armnn/releases). Download the version appropriate for the Android phone you wish to test on - the Android version and the architecture of the phone. If you are unsure of the architecture, you can use a lower one, but you may miss out on some optimizations. Inside the `tar.gz` archive that you download, `ExecuteNetwork` is included. Note among the other release downloads on the ArmNN Github is the separate file for the `aar` delegate which is the easy way to include the ArmNN delegate into your app. +To try this out, you can download a LiteRT model from the [Arm Model Zoo](https://github.com/ARM-software/ML-zoo). Specifically for this Learning Path, you will download [mobilenet tflite](https://github.com/ARM-software/ML-zoo/blob/master/models/image_classification/mobilenet_v2_1.0_224/tflite_int8/mobilenet_v2_1.0_224_INT8.tflite). -To run `ExecuteNetwork` you'll need to use `adb` to push the model and the executable to your phone, and then run it from the adb shell. `adb` is included with Android Studio, but you may need to add it to your path. Android Studio normally installs it to a location like `\\AppData\Local\Android\Sdk\platform-tools`. `adb` can also be downloaded separately from the [Android Developer site](https://developer.android.com/studio/releases/platform-tools). +### Download and setup ExecuteNetwork + +You can download `ExecuteNetwork` from the [ArmNN GitHub](https://github.com/ARM-software/armnn/releases). Download the version appropriate for the Android phone that you are testing on, ensuring that it matches the Android version and architecture of the phone. If you are unsure of the architecture, you can use a lower one, but you might miss out on some optimizations.`ExecuteNetwork` is included inside the `tar.gz` archive that you download. Among the other release downloads on the ArmNN Github is a separate file for the `aar` delegate which you can also easily download. + +To run `ExecuteNetwork,` you need to use `adb` to push the model and the executable to your phone, and then run it from the adb shell. `adb` is included with Android Studio, but you might need to add it to your path. Android Studio normally installs it to a location such as: + + `\\AppData\Local\Android\Sdk\platform-tools`. `adb` can also be downloaded separately from the [Android Developer site](https://developer.android.com/studio/releases/platform-tools). Unzip the `tar.gz` folder you downloaded. From a command prompt, you can then adapt and run the following commands to push the files to your phone. The `/data/local/tmp` folder of your Android device is a place with relaxed permissions that you can use to run this profiling. @@ -38,6 +44,8 @@ chmod 777 ExecuteNetwork chmod 777 *.so ``` +### Run ExecuteNetwork to profile the model + Now you can run ExecuteNetwork to profile the model. With the example LiteRT, you can use the following command: ```bash @@ -46,9 +54,11 @@ LD_LIBRARY_PATH=. ./ExecuteNetwork -m mobilenet_v2_1.0_224_INT8.tflite -c CpuAcc If you are using your own LiteRT, replace `mobilenet_v2_1.0_224_INT8.tflite` with the name of your tflite file. -This will run the model twice, outputting the layer timings to `modelout.txt`. The `--iterations 2` flag is the command that means it runs twice: the first run includes a lot of startup costs and one-off optimizations, so the second run is more indicative of the real performance. 
+This runs the model twice, outputting the layer timings to `modelout.txt`. The `--iterations 2` flag is the command that instructs it to run twice: the first run includes a lot of start-up costs and one-off optimizations, whilst the second run is more indicative of the level of performance. -The other flags to note are the `-e` and `--output-network-details` flags which will output a lot of timeline information about the model, including the layer timings. The `--do-not-print-output` flag will stop the output of the model, which can be very large, and without sensible input it is meaningless. The `--enable-fast-math` and `--fp16-turbo-mode` flags enable some math optimizations. `CpuAcc` is the accelerated CPU backend, it can be replaced with `GpuAcc` for the accelerated GPU backend. +The other flags to note are the `-e` and `--output-network-details` flags which output a lot of timeline information about the model, including the layer timings. The `--do-not-print-output` flag stops the output of the model, which can be very large, and without sensible input it is meaningless. The `--enable-fast-math` and `--fp16-turbo-mode` flags enable some math optimizations. `CpuAcc` is the accelerated CPU backend, and you can replace it with `GpuAcc` for the accelerated GPU backend. + +### Analyze the output After running the model, you can pull the output file back to your host machine with the following commands: @@ -56,11 +66,11 @@ After running the model, you can pull the output file back to your host machine exit adb pull /data/local/tmp/modelout.txt ``` -Once again, this can be done with drag and drop in Android Studio's *Device Explorer > Files*. +Once again, this can be done with drag-and-drop in Android Studio's **Device Explorer > Files**. -Depending on the size of your model, the output will probably be quite large. You can use a text editor to view the file. The output is in JSON format, so you can use a JSON viewer to make it more readable. Usually some scripting can be used to extract the information you need more easily out of the very raw data in the file. +Depending on the size of your model, the output will probably be quite large. You can use a text editor to view the file. The output is in JSON format, so you can use a JSON viewer to make it more readable. Usually you can use some scripting to extract the information you need more easily out of the raw data in the file. -At the top is the summary, with the setup time and inference time of your 2 runs, which will look something like this: +At the top is the summary, with the setup time and inference time of the two runs, which look something like this: ```text Info: ArmNN v33.2.0 @@ -78,8 +88,13 @@ Info: Execution time: 468.42 ms. Info: Inference time: 468.58 ms ``` -After the summary comes the graph of the model, then the layers and their timings from the second run. At the start of the layers there are a few optimizations and their timings recorded before the network itself. You can skip past the graph and the optimization timings to get to the part that needs analyzing. +After the summary, you will see: + +* The graph of the model. +* The layers and their timings from the second run. + +At the start of the layers, there are a few optimizations and their timings recorded before the network itself. You can skip past the graph and the optimization timings to get to the part that you need to analyze. -In the mobilenet example output, the graph is from lines 18 to 1629. 
After this is the optimization timings, which are part of the runtime, but not the network - these go until line 1989. Next there are a few wall clock recordings for the loading of the network, before the first layer "Convolution2dLayer_CreateWorkload_#18" at line 2036. Here is where the layer info that needs analyzing starts. +In the mobilenet example output, the graph is from lines 18 to 1629. After this are the optimization timings, which are part of the runtime, but not the network - these go until line 1989. Next there are a few wall clock recordings for the loading of the network, before the first layer "Convolution2dLayer_CreateWorkload_#18" at line 2036. This is where the layer information that requires analysis starts. -The layers' "Wall clock time" in microseconds shows how long they took to run. These layers and their timings can then be analyzed to see which layers, and which operators, took the most time. +The layers' "Wall clock time" in microseconds shows you how long they took to run. These layers and their timings can then be analyzed to see which layers, and which operators, took the most time. From 0b25239e872b3498519cdaad6225667702399798 Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Tue, 10 Dec 2024 11:07:14 +0000 Subject: [PATCH 17/19] Editorial review. --- .../smartphones-and-mobile/profiling-ml-on-arm/_review.md | 8 ++++---- .../profiling-ml-on-arm/nn-profiling-executenetwork.md | 8 +++++--- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md index 7bca269a0..6dcc1f122 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_review.md @@ -10,7 +10,7 @@ review: - All of the above. correct_answer: 4 explanation: > - Streamline shows you CPU and GPU activity (and a lot more counters!), and if Custom Activity Maps are used, you can see when your Neural Network and other parts of your application are running. + Streamline shows you CPU and GPU activity (and a lot more counters!) and if Custom Activity Maps are used, you can see when your Neural Network and other parts of your application are running. - questions: question: > @@ -20,19 +20,19 @@ review: - "No." correct_answer: 1 explanation: > - Yes, Android Studio has a built-in profiler which can be used to monitor the memory usage of your application, among other things. + Yes, Android Studio has a built-in profiler that can be used to monitor the memory usage of your application, amongst other functions. - questions: question: > Is there a way to profile what is happening inside your Neural Network? answers: - - Yes, Streamline just shows you out of the box. - No. + - Yes, Streamline just shows you out of the box. - Yes, ArmNN's ExecuteNetwork can do this. - Yes, Android Studio Profiler can do this. correct_answer: 3 explanation: > - Standard profilers don't have an easy way to see what is happening inside an ML framework to see a model running inside it. ArmNN's ExecuteNetwork can do this for TensorFlow Lite models, and ExecuTorch has tools that can do this for PyTorch models. + Standard profilers do not have an easy way to see what is happening inside an ML framework to see a model running inside it. 
ArmNN's ExecuteNetwork can do this for LiteRT models, and ExecuTorch has tools that can do this for PyTorch models. diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md index c3c5535ce..d173918c1 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md @@ -33,7 +33,9 @@ adb push libarmnn.so /data/local/tmp/ adb push libarmnn_support_library.so /data/local/tmp/ # more ArmNN .so library files ``` -Push all the `.so` library files that are in the base folder of the `tar.gz` archive you downloaded, alongside `ExecuteNetwork`, and all the `.so` files in the `delegate` sub-folder. If you are using a recent version of Android Studio this copying can be done much more easily with drag and drop in the *Device Explorer > Files*. +Push all the `.so` library files that are in the base folder of the `tar.gz` archive you downloaded, alongside `ExecuteNetwork`, and all the `.so` files in the `delegate` sub-folder. + +If you are using a recent version of Android Studio this copying can be done much more easily with with drag-and-drop in Android Studio in **Device Explorer > Files**. Then you need to set the permissions on the files: @@ -66,7 +68,7 @@ After running the model, you can pull the output file back to your host machine exit adb pull /data/local/tmp/modelout.txt ``` -Once again, this can be done with drag-and-drop in Android Studio's **Device Explorer > Files**. +Once again, you can do this with drag-and-drop in Android Studio in **Device Explorer > Files**. Depending on the size of your model, the output will probably be quite large. You can use a text editor to view the file. The output is in JSON format, so you can use a JSON viewer to make it more readable. Usually you can use some scripting to extract the information you need more easily out of the raw data in the file. @@ -97,4 +99,4 @@ At the start of the layers, there are a few optimizations and their timings reco In the mobilenet example output, the graph is from lines 18 to 1629. After this are the optimization timings, which are part of the runtime, but not the network - these go until line 1989. Next there are a few wall clock recordings for the loading of the network, before the first layer "Convolution2dLayer_CreateWorkload_#18" at line 2036. This is where the layer information that requires analysis starts. -The layers' "Wall clock time" in microseconds shows you how long they took to run. These layers and their timings can then be analyzed to see which layers, and which operators, took the most time. +The layers' wall-clock time in microseconds shows you how much time elapsed. You can then analyze these layers and timings to identify which layers and operators took the most time to run. From 766fad4fc2726c052880f2c81fd6c760f22f6b3f Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Tue, 10 Dec 2024 11:09:15 +0000 Subject: [PATCH 18/19] Editorial review. 
--- .../profiling-ml-on-arm/nn-profiling-executenetwork.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md index d173918c1..d8a9990bb 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/nn-profiling-executenetwork.md @@ -35,7 +35,7 @@ adb push libarmnn_support_library.so /data/local/tmp/ ``` Push all the `.so` library files that are in the base folder of the `tar.gz` archive you downloaded, alongside `ExecuteNetwork`, and all the `.so` files in the `delegate` sub-folder. -If you are using a recent version of Android Studio this copying can be done much more easily with with drag-and-drop in Android Studio in **Device Explorer > Files**. +If you are using a recent version of Android Studio this copying can be done much more easily with drag-and-drop in Android Studio in **Device Explorer > Files**. Then you need to set the permissions on the files: From aa7fcbee601861f9e5b8477850117fd4ad92e2f7 Mon Sep 17 00:00:00 2001 From: Maddy Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Thu, 12 Dec 2024 11:42:59 +0000 Subject: [PATCH 19/19] Update _index.md Amended title after consultation with Gemma and Ben. --- .../smartphones-and-mobile/profiling-ml-on-arm/_index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md index 747c6cdc7..25b48cc84 100644 --- a/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md +++ b/content/learning-paths/smartphones-and-mobile/profiling-ml-on-arm/_index.md @@ -1,5 +1,5 @@ --- -title: Profile the Performance of Machine Learning models on Arm +title: Profile the Performance of AI and ML Mobile Applications on Arm minutes_to_complete: 60