Updated restrict LP
pareenaverma committed Nov 7, 2023
1 parent 665aef6 commit 82f86f0
Showing 4 changed files with 20 additions and 15 deletions.
@@ -9,7 +9,7 @@ learning_objectives:
- Learn the importance of using the `restrict` keyword in C correctly

prerequisites:
- An Arm based system with Linux OS and recent compiler (Clang or GCC)
- An Arm computer running Linux with a recent version of a compiler (Clang or GCC) installed

author_primary: Konstantinos Margaritis, VectorCamp

@@ -48,8 +48,8 @@ There are 2 points to make here:
1. `scaleVectors()` is the important function here, it scales two vectors by the same scale factor `*C`
2. vector `a` overlaps with vector `b`. (`b = &a[2]`).
this rather simple program produces this output:
```
This simple program produces this output:
```output
a(before): 1 2 3 4
b(before): 3 4 5 6
a(after) : 2 4 12 16
@@ -60,7 +60,7 @@ Notice that after the scaling, the contents of `a` are also affected by the scal

We will include the assembly output of `scaleVectors` as produced by `clang-17 -O3`:

```
```output
scaleVectors: // @scaleVectors
ldr x8, [x2]
ldr x9, [x0]
@@ -108,7 +108,11 @@ Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reaso

## The Solution: restrict

This is what the C99 `restrict` keyword resolves. It instructs the compiler that the passed arguments are not dependent on each other and that access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much more efficient way. Essentially it can group and schedule the loads and stores. **Note**, `restrict` only works in C, not in C++.
This is what the C99 `restrict` keyword resolves. It instructs the compiler that the passed arguments are not dependent on each other and that access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much more efficient way. Essentially it can group and schedule the loads and stores.

{{% notice Note %}}
The `restrict` keyword only works in C, not in C++.
{{% /notice %}}
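
As a minimal sketch of the promise that `restrict` makes, consider the small function below. It is not part of the Learning Path sources: the name `scale_by` and its parameters are hypothetical and chosen only for illustration. Without `restrict`, the compiler has to assume that a store to `dst[i]` might change `*factor`, so it reloads `*factor` on every iteration. With `restrict` on both pointers, the caller promises that they do not alias, so the compiler is free to load `*factor` once and group the remaining loads and stores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not from the Learning Path).
 * Both pointers are restrict-qualified: the caller promises that the memory
 * reached through dst is never accessed through factor, and vice versa.
 * That promise lets the compiler keep *factor in a register for the whole
 * loop instead of reloading it after every store to dst[i]. */
void scale_by(int64_t *restrict dst, const int64_t *restrict factor, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] *= *factor;
    }
}
```

Calling `scale_by(v, &f, 4)` with `f` stored outside of `v` keeps the promise. Passing a pointer into `v` itself as `factor` would break it, and the behavior would then be undefined.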

Let's add `restrict` to `A` in the parameter list:
```C
@@ -191,10 +195,10 @@ It is interesting to see that in such an example adding the `restrict` keyword r
## What about SVE2?
We have shown the obvious benefit of `restrict` in this function, on an armv8-a CPU, but we have new armv9-a CPUs out there with SVE2 as well as Neon/ASIMD.
Could the compiler generate better code in that case using `restrict`? The output without `restrict` is almost the same, but with `restrict` used, this is the result (we used `clang-17 -O3 -march=armv9-a`):
You have now seen the benefit of `restrict` in this function, on an Armv8-A CPU. You can now try it on an Armv9-A CPU which supports SVE2 as well as Neon/ASIMD.
Could the compiler generate better code in that case using `restrict`? The output without `restrict` is almost the same, but with `restrict` used, this is the result (compiled with `clang-17 -O3 -march=armv9-a`):
```
```output
scaleVectors: // @scaleVectors
ldp q1, q2, [x0]
ldp q3, q4, [x1]
@@ -210,4 +214,4 @@ scaleVectors: // @scaleVectors

There are just 10 instructions, 31% of the original code size! The compiler has made great use of the SVE2 features, combining the multiplications and reducing them to 4 and, at the same time, grouping loads and stores down to 2 each. We have optimized our code by more than 3x just by adding a C99 keyword.

We are now going to look at another example.
Next, let's take a look at another example.
@@ -1,18 +1,18 @@
---
title: When can we use restrict
title: When can you use restrict
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

When can we use `restrict` or, put differently, how do we recognize that we need `restrict` in our code?
When can you use `restrict` or, put differently, how do you recognize that you need `restrict` in your code?

`restrict` as a pointer attribute is rather easy to test. As a rule of thumb, if the function includes one or more pointers to memory objects as arguments, we can use `restrict` if we are certain that the memory pointed to by these pointer arguments does not overlap and there is no other way to access them in the body of the function, except by the use of those pointers, i.e., there is no other global pointer or some other indirect way to access these elements.
`restrict` as a pointer attribute is rather easy to test. As a rule of thumb, if the function includes one or more pointers to memory objects as arguments, you can use `restrict` if you are certain that the memory pointed to by these pointer arguments does not overlap and there is no other way to access them in the body of the function, except by the use of those pointers, i.e., there is no other global pointer or some other indirect way to access these elements.
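
To make the rule of thumb concrete, here is a minimal sketch of a case that satisfies it. The function `add_into` and the helper `demo_valid_call` are hypothetical names introduced only for this illustration. The two arrays are separate objects, and nothing else in the function body can reach them, so qualifying both pointers with `restrict` is safe.

```c
#include <stddef.h>

/* Hypothetical example (not from the Learning Path sources).
 * dst and src are promised not to overlap, and the function body
 * touches the two arrays only through these two pointers. */
void add_into(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] += src[i];
    }
}

/* Demonstrates a call that keeps the restrict promise and
 * returns the sum of the updated elements. */
int demo_valid_call(void)
{
    int x[4] = {1, 2, 3, 4};
    int y[4] = {10, 20, 30, 40};

    add_into(x, y, 4);          /* OK: x and y are distinct objects */

    /* add_into(x, x + 1, 3);      NOT OK: the ranges overlap, which
     *                             breaks the promise (undefined behavior) */

    return x[0] + x[1] + x[2] + x[3];
}
```

The commented-out call shows the kind of overlap that rules `restrict` out: elements written through `dst` would also be read through `src`.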

Let's show a counter example:
Let's see a counterexample:

```
```C
int A[10];

int f(int *B, size_t n) {
@@ -30,4 +30,4 @@ int main() {
This example does not benefit from `restrict` in either gcc or clang.
However, there are plenty of cases that are candidates for the `restrict` optimization. It's safe and easy to try but, even if it looks like a good candidate, it is still possible that the compiler will not detect a pattern that is suited for optimization and we might not see any reduction in the code or speed gain. It is up to the compiler; in some cases clang handles this better or differently from gcc, and vice versa, and this will also depend on the version. If you have a particular piece of code that you would like to optimize, before you attempt to refactor it completely, rewrite it in assembly or use any SIMD instructions, it might be worth trying `restrict`. Even saving a couple of instructions in a critical loop function is worth having by just adding one keyword.
However, there are plenty of cases that are candidates for the `restrict` optimization. It's safe and easy to try but, even if it looks like a good candidate, it is still possible that the compiler will not detect a pattern that is suited for optimization and you might not see any reduction in the code or speed gain. It is up to the compiler; in some cases clang handles this better or differently from gcc, and vice versa, and this will also depend on the version. If you have a particular piece of code that you would like to optimize, before you attempt to refactor it completely, rewrite it in assembly or use any SIMD instructions, it might be worth trying `restrict`. Even saving a couple of instructions in a critical loop function is worth having by just adding one keyword.
1 change: 1 addition & 0 deletions contributors.csv
@@ -18,3 +18,4 @@ Frédéric -lefred- Descamps,OCI,,,,lefred.be
Kristof Beyls,Arm,,,,
David Spickett,Arm,,,,
Uma Ramalingam,Arm,uma-ramalingam,,,
Konstantinos Margaritis,VectorCamp,,,,
