Merge pull request #537 from VectorCamp/main
restrict pointer C99 Arm Learning Paths material
Showing 6 changed files with 451 additions and 0 deletions.
37 changes: 37 additions & 0 deletions
37
content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
--- | ||
title: restrict keyword in C99 | ||
|
||
minutes_to_complete: 30 | ||
|
||
who_is_this_for: C developers who are interested in software optimization | ||
|
||
learning_objectives: | ||
- Learn the importance of using the `restrict` keyword in C correctly | ||
|
||
prerequisites: | ||
- An Arm-based system running Linux with a recent compiler (clang or gcc) | ||
|
||
author_primary: Konstantinos Margaritis, VectorCamp | ||
|
||
### Tags | ||
skilllevels: Advanced | ||
subjects: Programming | ||
armips: | ||
- Aarch64 | ||
- Armv8-a | ||
- Armv9-a | ||
tools_software_languages: | ||
- GCC | ||
- Clang | ||
- SVE2 | ||
- Coding | ||
operatingsystems: | ||
- Linux | ||
|
||
|
||
### FIXED, DO NOT MODIFY | ||
# ================================================================================ | ||
weight: 1 # _index.md always has weight of 1 to order correctly | ||
layout: "learningpathall" # All files under learning paths have this same wrapper | ||
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. | ||
--- |
23 changes: 23 additions & 0 deletions
23
content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
--- | ||
next_step_guidance: You should now be able to test the `restrict` keyword on your own or other open-source code and discover potential optimizations! | ||
|
||
recommended_path: /learning-paths/embedded-systems/ | ||
|
||
further_reading: | ||
- resource: | ||
title: Wikipedia restrict entry | ||
link: https://en.wikipedia.org/wiki/Restrict | ||
type: documentation | ||
- resource: | ||
title: Godbolt restrict tests | ||
link: https://godbolt.org/z/PxWxjc1oh | ||
type: website | ||
|
||
|
||
# ================================================================================ | ||
# FIXED, DO NOT MODIFY | ||
# ================================================================================ | ||
weight: 21 # set to always be larger than the content in this path, and one more than 'review' | ||
title: "Next Steps" # Always the same | ||
layout: "learningpathall" # All files under learning paths have this same wrapper | ||
--- |
48 changes: 48 additions & 0 deletions
48
content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
--- | ||
review: | ||
- questions: | ||
question: > | ||
Where is `restrict` placed in the code? | ||
answers: | ||
- In the function declaration | ||
- As an enum value | ||
- Between the pointer symbol (*) and the parameter name | ||
correct_answer: 3 | ||
explanation: > | ||
`restrict` is placed in the arguments list of a function, between the * and the parameter name, like this: | ||
`int func(char *restrict arg)` | ||
- questions: | ||
question: > | ||
What does `restrict` do? | ||
answers: | ||
- It increases the frequency of the CPU cores, making your program run faster | ||
- It issues a command to clear the cache, leaving more room for your program | ||
- It restricts the standard of the C library used to C99 | ||
- It hints to the compiler that the memory pointed to by the parameter cannot be accessed by any means inside the function other than through this pointer | ||
correct_answer: 4 | ||
explanation: > | ||
To schedule the instructions of a function better, the compiler needs to know whether there is any | ||
dependency between the memory accessed through its pointer parameters. If there is no dependency, the compiler can usually group instructions together, | ||
increasing performance and efficiency. | ||
- questions: | ||
question: > | ||
Which language supports `restrict`? | ||
answers: | ||
- Python | ||
- C and C++ | ||
- C only (after C99) | ||
- Rust | ||
correct_answer: 3 | ||
explanation: > | ||
`restrict` is a C-only keyword; it does not exist in C++ (`__restrict__` does, as a compiler extension in GCC and Clang, but it is not exactly the same) | ||
# ================================================================================ | ||
# FIXED, DO NOT MODIFY | ||
# ================================================================================ | ||
title: "Review" # Always the same title | ||
weight: 20 # Set to always be larger than the content in this path | ||
layout: "learningpathall" # All files under learning paths have this same wrapper | ||
--- |
95 changes: 95 additions & 0 deletions
95
...t/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
--- | ||
title: Another example with SVE2 | ||
weight: 3 | ||
|
||
### FIXED, DO NOT MODIFY | ||
layout: learningpathall | ||
--- | ||
|
||
## Example 2: SVE2 unleashed | ||
|
||
Let's try another example, one from [gcc restrict pointer examples](https://www.gnu.org/software/c-intro-and-ref/manual/html_node/restrict-Pointer-Example.html): | ||
|
||
```C | ||
#include <stddef.h> /* for size_t */ | ||
void process_data (const char *in, char *out, size_t size) | ||
{ | ||
for (int i = 0; i < size; i++) | ||
out[i] = in[i] + in[i + 1]; | ||
} | ||
``` | ||
This example is easier to demonstrate with SVE2, and we found that gcc 13 gives a better result than clang. This is the output of `gcc-13 -O3 -march=armv9-a`: | ||
``` | ||
process_data: | ||
cbz x2, .L1 | ||
add x5, x0, 1 | ||
cntb x3 | ||
sub x4, x1, x5 | ||
sub x3, x3, #1 | ||
cmp x4, x3 | ||
bls .L6 | ||
mov w4, w2 | ||
mov x3, 0 | ||
whilelo p0.b, wzr, w2 | ||
.L4: | ||
ld1b z0.b, p0/z, [x0, x3] | ||
ld1b z1.b, p0/z, [x5, x3] | ||
add z0.b, z0.b, z1.b | ||
st1b z0.b, p0, [x1, x3] | ||
incb x3 | ||
whilelo p0.b, w3, w4 | ||
b.any .L4 | ||
.L1: | ||
ret | ||
.L6: | ||
mov x3, 0 | ||
.L3: | ||
ldrb w4, [x5, x3] | ||
ldrb w6, [x0, x3] | ||
add w4, w4, w6 | ||
strb w4, [x1, x3] | ||
add x3, x3, 1 | ||
cmp x2, x3 | ||
bne .L3 | ||
ret | ||
``` | ||
Do not worry about each instruction in the assembly here, but notice that gcc has generated two loops: one that uses the SVE2 `while*` instructions to do the processing (`.L4`) and one scalar loop (`.L3`). The latter is executed if there is any pointer aliasing, that is, if the memory regions accessed through the two pointers overlap. Let's try adding `restrict` to pointer `in`: | ||
```C | ||
#include <stddef.h> /* for size_t */ | ||
void process_data (const char *restrict in, char *out, size_t size) | ||
{ | ||
for (int i = 0; i < size; i++) | ||
out[i] = in[i] + in[i + 1]; | ||
} | ||
``` | ||
|
||
This is now the output from gcc-13: | ||
``` | ||
process_data: | ||
cbz x2, .L1 | ||
add x5, x0, 1 | ||
mov w4, w2 | ||
mov x3, 0 | ||
whilelo p0.b, wzr, w2 | ||
.L3: | ||
ld1b z1.b, p0/z, [x0, x3] | ||
ld1b z0.b, p0/z, [x5, x3] | ||
add z0.b, z0.b, z1.b | ||
st1b z0.b, p0, [x1, x3] | ||
incb x3 | ||
whilelo p0.b, w3, w4 | ||
b.any .L3 | ||
.L1: | ||
ret | ||
``` | ||
|
||
This is a huge improvement! The code shrinks from 27 instructions to 13, less than half the original size. In both cases, the main loop (`.L4` in the former case, `.L3` in the latter) is effectively the same (only the register allocation differs), but the entry and exit code of the function is much simpler. Because `restrict` tells the compiler that the memory pointed to by `in` does not overlap with the memory pointed to by `out`, it was able to eliminate the scalar loop and the code that checked whether to enter it. | ||
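
To see why the compiler needed the scalar loop without `restrict`, here is a minimal sketch of the two strategies in plain C. It is an illustration only: the names `process_scalar`, `process_block`, `needs_scalar_loop` and `overlap_changes_result`, and the block size of 4 bytes, are assumptions made for this example and are not taken from the compiler or the original code. `process_block` models a vector loop by loading a whole block before storing anything, and `needs_scalar_loop` mirrors the `sub`/`cmp`/`bls` sequence at the top of the first listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 4 };  /* hypothetical vector length in bytes, for illustration */

/* One byte at a time, like the scalar loop (.L3 in the first listing). */
void process_scalar(const char *in, char *out, size_t size)
{
    for (size_t i = 0; i < size; i++)
        out[i] = (char)(in[i] + in[i + 1]);
}

/* Loads a whole block before storing anything, modelling the vector loop. */
void process_block(const char *in, char *out, size_t size)
{
    for (size_t i = 0; i < size; i += BLOCK) {
        size_t n = (size - i < BLOCK) ? size - i : BLOCK;
        char a[BLOCK], b[BLOCK];
        memcpy(a, in + i, n);       /* models the ld1b from in     */
        memcpy(b, in + i + 1, n);   /* models the ld1b from in + 1 */
        for (size_t j = 0; j < n; j++)
            out[i + j] = (char)(a[j] + b[j]);
    }
}

/* Models the runtime check: take the scalar path when the unsigned
   distance from in + 1 to out is smaller than one block. */
int needs_scalar_loop(const char *in, const char *out)
{
    uintptr_t distance = (uintptr_t)out - (uintptr_t)(in + 1);
    return distance <= (uintptr_t)(BLOCK - 1);
}

/* Returns 1 when the two strategies disagree on overlapping buffers. */
int overlap_changes_result(void)
{
    /* Without overlap both strategies must agree. */
    char in[5] = {1, 1, 1, 1, 1};
    char a[4], b[4];
    process_scalar(in, a, 4);
    process_block(in, b, 4);
    assert(memcmp(a, b, sizeof a) == 0);

    /* With out == in + 1, each scalar store feeds the next scalar load. */
    char s[6] = {1, 1, 1, 1, 1, 1};
    char v[6] = {1, 1, 1, 1, 1, 1};
    process_scalar(s, s + 1, 4);
    process_block(v, v + 1, 4);
    return memcmp(s, v, sizeof s) != 0;
}
```

Calling `overlap_changes_result()` shows that the two strategies produce different bytes once the buffers overlap, which is why the compiler may only drop the scalar path when you promise, through `restrict`, that overlap cannot happen. The compiler does not verify that promise: passing overlapping buffers to a function with `restrict` parameters that writes through one of them is undefined behavior.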
|
||
But I can almost hear the question: "Why is that important if the main loop is still the same?" | ||
It is a fair question. The answer is this: | ||
|
||
If your function is going to be called once and run over tens of billions of elements, then saving a few instructions before and after the main loop does not really matter. | ||
|
||
But if your function is called on small sizes millions or even *billions* of times, then saving a few instructions per call adds up to *billions* of instructions saved in total, which means less time spent running on the CPU and less energy wasted. |
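
If you want to measure this effect on your own system, the following is a rough sketch of a harness that calls both variants many times on a small buffer. In the listings above, the `restrict` version executes five fewer instructions before entering the loop, so any difference only becomes visible with a very large number of calls. The names `process_data_plain`, `process_data_restrict`, `run_many` and `seconds_for`, as well as the buffer size of 16 bytes, are assumptions chosen for this example. The loop index is a `size_t` here to avoid a signed/unsigned comparison, so the generated code may differ slightly from the listings.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*process_fn)(const char *, char *, size_t);

/* Variant without restrict. */
void process_data_plain(const char *in, char *out, size_t size)
{
    for (size_t i = 0; i < size; i++)
        out[i] = (char)(in[i] + in[i + 1]);
}

/* Variant with restrict on the input pointer. */
void process_data_restrict(const char *restrict in, char *out, size_t size)
{
    for (size_t i = 0; i < size; i++)
        out[i] = (char)(in[i] + in[i + 1]);
}

/* Calls fn `calls` times on a small buffer and returns a checksum,
   so the compiler cannot discard the work entirely. */
unsigned long run_many(process_fn fn, unsigned long calls)
{
    enum { SIZE = 16 };
    char in[SIZE + 1];   /* one extra byte: the loop reads in[i + 1] */
    char out[SIZE];
    unsigned long sum = 0;

    assert(fn != NULL);
    for (int i = 0; i < SIZE + 1; i++)
        in[i] = (char)(i & 7);

    for (unsigned long c = 0; c < calls; c++) {
        in[0] = (char)(c & 7);   /* vary the input slightly per call */
        fn(in, out, SIZE);
        sum += (unsigned char)out[0] + (unsigned char)out[SIZE - 1];
    }
    return sum;
}

/* Returns the CPU time in seconds used by `calls` invocations of fn. */
double seconds_for(process_fn fn, unsigned long calls, unsigned long *checksum)
{
    clock_t start = clock();
    *checksum = run_many(fn, calls);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For example, you could call `seconds_for(process_data_plain, 100000000UL, &sum)` and `seconds_for(process_data_restrict, 100000000UL, &sum)` from `main` and compare the two times. Build with the same flags as above (`-O3 -march=armv9-a`) and treat the numbers with caution: the results depend on the compiler version, on inlining decisions and on the CPU, so it is worth checking the generated assembly as well.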