Commit

Merge pull request #1231 from ArmDeveloperEcosystem/main
Production update
jasonrandrews authored Sep 6, 2024
2 parents 5d66583 + fc9febb commit 86b45de
Showing 33 changed files with 1,589 additions and 109 deletions.
15 changes: 14 additions & 1 deletion .wordlist.txt
@@ -3101,4 +3101,17 @@ configurability
darwin
dmg
madvise
osKernelInitialize
osKernelInitialize
Alexandros
CopyWord
Lamprineas
Multiversioning
SkipWord
Tallund
VoD
autocomplete
ifunc
ifuncs
lm
memcpy
multiversioning
30 changes: 28 additions & 2 deletions content/install-guides/streamline-cli.md
@@ -29,6 +29,21 @@ layout: installtoolsall # DO NOT MODIFY. Always true for tool install ar

The Streamline CLI tools are native command-line tools that are designed to run directly on an Arm server running Linux. The tools provide a software profiling methodology that gives you clear and actionable performance data. You can use this data to guide the optimization of the heavily used functions in your software.

## Platform support

Streamline CLI tools are supported with the following host operating systems running on an Arm AArch64 host machine:

* Amazon Linux 2023 or newer
* Debian 10 or newer
* RHEL 8 or newer
* Ubuntu 20.04 or newer

Streamline CLI tools are supported on the following Arm CPUs:

* Arm Neoverse N1
* Arm Neoverse N2
* Arm Neoverse V1

## Before you begin

Use the Arm Sysreport utility to determine whether your system configuration supports hardware-assisted profiling. Follow the instructions in [Get ready for performance analysis with Sysreport][1] to discover how to download and run this utility.
@@ -79,8 +94,7 @@ If you are using the `workflow_topdown_basic option`, ensure that your applicati

## Applying the kernel patch

For best results, we provide a Linux kernel patch that modifies the behavior of Linux perf to improve support for capturing function-attributed top-down metrics on Arm systems. This patch provides two new capabilities:

* It allows a new thread to inherit the perf counter group configuration of its parent.
* It decouples the perf event-based sampling window size from the overall sample rate. This allows strobed mark-space sampling patterns where the tool can capture a small window without using a high sample rate.
@@ -119,6 +133,12 @@ patch -p 1 -i v6.7-combined.patch

Follow these steps to integrate these patches into an RPM-based distribution's kernel:
1. Install the RPM build tools:
```sh
sudo yum install rpm-build rpmdevtools
```
1. Remove any existing `rpmbuild` directory, renaming as appropriate:
```sh
@@ -166,6 +186,12 @@ Follow these steps to integrate these patches into an RPM-based distribution's k
1. Save the changes and exit the editor.
1. Install the build dependencies:
```sh
sudo dnf builddep SPECS/kernel.spec
```
1. Build the kernel and other rpms:
```sh
@@ -0,0 +1,51 @@
---
title: Learn about function multiversioning

minutes_to_complete: 60

who_is_this_for: This is an advanced topic for developers interested in optimizing their C/C++ applications across Arm64 targets.

learning_objectives:
- Use hardware features to tune your applications at function level.
- Create multiple versions of C/C++ functions for the targets that you intend to run applications on.
- Assist the compiler in generating optimal code for the targets, or provide your own optimized versions at source level.
- Automatically select the most appropriate function version at runtime.
- Reuse your optimized application binaries across various targets.

prerequisites:
- Basic knowledge of GNU function attributes.
- Familiarity with indirect functions (ifuncs).
- Basic knowledge of loop vectorization.
- Familiarity with Arm assembly.
- An LLVM 19 compiler with runtime library support, or GCC 14.

author_primary: Alexandros Lamprineas

### Tags
skilllevels: Advanced
subjects: Performance and Architecture
armips:
- Cortex-A
- Neoverse
tools_software_languages:
- C/C++
operatingsystems:
- Linux
- Android
- macOS

### Cross-platform metadata only
shared_path: true
shared_between:
- servers-and-cloud-computing
- smartphones-and-mobile
- laptops-and-desktops
- embedded-systems

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
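
The prerequisites above call for LLVM 19 or GCC 14. A minimal sketch for checking the installed compilers might look like the following. It assumes that `gcc` on the path is really GCC and that `clang` is an upstream LLVM build (vendor builds can use a different version numbering). It does not check for runtime library support, and the helper name `version_at_least` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if major version $1 is a number >= $2.
version_at_least() {
    [ -n "$1" ] && [ "$1" -ge "$2" ] 2>/dev/null
}

if command -v gcc >/dev/null 2>&1; then
    # `gcc -dumpversion` prints either "14" or "14.x.y"; keep the major part.
    gcc_major="$(gcc -dumpversion | cut -d. -f1)"
    if version_at_least "$gcc_major" 14; then
        echo "gcc ${gcc_major}: meets the GCC 14 prerequisite"
    else
        echo "gcc ${gcc_major:-unknown}: does not meet the GCC 14 prerequisite"
    fi
fi

if command -v clang >/dev/null 2>&1; then
    # Extract the major number from the "clang version N.x.y" banner line.
    clang_major="$(clang --version | sed -n 's/.*clang version \([0-9][0-9]*\).*/\1/p' | head -n 1)"
    if version_at_least "$clang_major" 19; then
        echo "clang ${clang_major}: meets the LLVM 19 prerequisite"
    else
        echo "clang ${clang_major:-unknown}: does not meet the LLVM 19 prerequisite"
    fi
fi
```

If neither compiler is installed, the script prints nothing.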
@@ -0,0 +1,18 @@
---
next_step_guidance:

recommended_path: /learning-paths/PLACEHOLDER_CATEGORY/PLACEHOLDER_LEARNING_PATH/

further_reading:
- resource:
title: Arm C Language Extensions
link: https://arm-software.github.io/acle/main/acle.html
type: documentation

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps" # Always the same
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,40 @@
---
review:
- questions:
question: >
What is the main benefit of Function Multiversioning?
answers:
- I can reuse my binaries on different targets without sacrificing runtime performance.
- My application binaries are smaller.
correct_answer: 1
explanation: >
The binaries produced can be reused on different targets, but they might be larger in size.
- questions:
question: >
Can I implement versions of a function in separate translation units?
answers:
- Yes, function versions can be spread across different translation units.
- No, all of the functions must be in the same translation unit.
correct_answer: 1
explanation: >
There is no requirement for function versions to be defined in the same translation unit. However, they must all be declared in the translation unit which contains the definition of the default version.
- questions:
question: >
Under what circumstances will two targets, one with SVE2 and one with SVE, run the same version of a function?
answers:
- The versioned function has versions for SVE2 and default only.
- The versioned function has versions for SVE2, SVE and default only.
- The versioned function has versions for SVE and default only.
correct_answer: 3
explanation: >
Answer 3 is the only one where the most specific version for both targets is the same, namely SVE. In answer 1, one target would pick SVE2 and the other default, and in answer 2, one would pick SVE2 and the other SVE.
# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
title: "Review" # Always the same title
weight: 20 # Set to always be larger than the content in this path
layout: "learningpathall" # All files under learning paths have this same wrapper
---