Merge pull request #537 from VectorCamp/main
restrict pointer C99 Arm Learning Paths material
pareenaverma authored Oct 30, 2023
2 parents b4af6dd + e597026 commit 3a3f686
Showing 6 changed files with 451 additions and 0 deletions.
@@ -0,0 +1,37 @@
---
title: restrict keyword in C99

minutes_to_complete: 30

who_is_this_for: C developers who are interested in software optimization

learning_objectives:
- Learn the importance of using the `restrict` keyword correctly in C

prerequisites:
- An Arm based system with Linux OS and recent compiler (clang or gcc)

author_primary: Konstantinos Margaritis, VectorCamp

### Tags
skilllevels: Advanced
subjects: Programming
armips:
- Aarch64
- Armv8-a
- Armv9-a
tools_software_languages:
- GCC
- Clang
- SVE2
- Coding
operatingsystems:
- Linux


### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
@@ -0,0 +1,23 @@
---
next_step_guidance: You should now be able to test the `restrict` keyword on your own or other open-source code and discover potential optimizations!

recommended_path: /learning-paths/embedded-systems/

further_reading:
- resource:
title: Wikipedia restrict entry
link: https://en.wikipedia.org/wiki/Restrict
type: documentation
- resource:
title: Godbolt restrict tests
link: https://godbolt.org/z/PxWxjc1oh
type: website


# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps" # Always the same
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,48 @@
---
review:
- questions:
question: >
Where is `restrict` placed in the code?
answers:
- In the function declaration
- As an enum value
- Between the pointer symbol (*) and the parameter name
correct_answer: 3
explanation: >
`restrict` is placed in the parameter list of a function, between the `*` and the parameter name, like this:
`int func(char *restrict arg)`
- questions:
question: >
What does `restrict` do?
answers:
- It increases the frequency of the CPU cores, making your program run faster
- It issues a command to clear the cache, leaving more room for your program
- It restricts the standard of the C library used to C99
- It hints to the compiler that the memory pointed to by the parameter cannot be accessed by any means inside the function other than through this pointer
correct_answer: 4
explanation: >
For the compiler to schedule the instructions of a function better, it needs to know whether there is any
dependency between the pointer parameters. If there is none, the compiler can usually group instructions together,
increasing performance and efficiency.
- questions:
question: >
Which language supports `restrict`?
answers:
- Python
- C and C++
- C only (after C99)
- Rust
correct_answer: 3
explanation: >
`restrict` is a C-only keyword; it does not exist in C++ (`__restrict__` is available there as a compiler extension, but it is not exactly the same)
# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
title: "Review" # Always the same title
weight: 20 # Set to always be larger than the content in this path
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,95 @@
---
title: Another example with SVE2
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Example 2: SVE2 unleashed

Let's try another example, one from [gcc restrict pointer examples](https://www.gnu.org/software/c-intro-and-ref/manual/html_node/restrict-Pointer-Example.html):

```C
void process_data (const char *in, char *out, size_t size)
{
  for (int i = 0; i < size; i++)
    out[i] = in[i] + in[i + 1];
}
```
This example is easier to demonstrate with SVE2, and we found that gcc 13 produces better code than clang here. This is the output of `gcc-13 -O3 -march=armv9-a`:
```
process_data:
cbz x2, .L1
add x5, x0, 1
cntb x3
sub x4, x1, x5
sub x3, x3, #1
cmp x4, x3
bls .L6
mov w4, w2
mov x3, 0
whilelo p0.b, wzr, w2
.L4:
ld1b z0.b, p0/z, [x0, x3]
ld1b z1.b, p0/z, [x5, x3]
add z0.b, z0.b, z1.b
st1b z0.b, p0, [x1, x3]
incb x3
whilelo p0.b, w3, w4
b.any .L4
.L1:
ret
.L6:
mov x3, 0
.L3:
ldrb w4, [x5, x3]
ldrb w6, [x0, x3]
add w4, w4, w6
strb w4, [x1, x3]
add x3, x3, 1
cmp x2, x3
bne .L3
ret
```
Do not worry about each instruction in the assembly here, but notice that gcc has emitted two loops: one that uses SVE2 instructions such as `whilelo` to do the processing (`.L4`), and one scalar loop (`.L3`). The latter is executed if there is any pointer aliasing, that is, if the memory ranges the two pointers refer to overlap in a way that would make the vector loop unsafe. Let's try adding `restrict` to pointer `in`:
```C
void process_data (const char *restrict in, char *out, size_t size)
{
  for (int i = 0; i < size; i++)
    out[i] = in[i] + in[i + 1];
}
```

This is now the output from gcc-13:
```
process_data:
cbz x2, .L1
add x5, x0, 1
mov w4, w2
mov x3, 0
whilelo p0.b, wzr, w2
.L3:
ld1b z1.b, p0/z, [x0, x3]
ld1b z0.b, p0/z, [x5, x3]
add z0.b, z0.b, z1.b
st1b z0.b, p0, [x1, x3]
incb x3
whilelo p0.b, w3, w4
b.any .L3
.L1:
ret
```

This is a huge improvement! The code shrinks from 27 instructions to 13, less than half the original size. In both cases, you will note that the main loop (`.L4` in the former case, `.L3` in the latter) is effectively the same, but the entry and exit code of the function is much simpler. Because `restrict` tells the compiler that the memory pointed to by `in` does not overlap with the memory pointed to by `out`, the compiler was able to eliminate the scalar loop and remove the associated code that checked whether it needed to enter it.
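To make the removed code more concrete, here is a rough C sketch of what the first listing does before it picks a loop. It is only an illustration, not what gcc generates: the names `process_data_versioned`, `scalar_loop` and `chunked_loop` are hypothetical, and the fixed `VL` of 16 bytes is an assumption, since real SVE code reads the vector length at run time with `cntb`.

```C
#include <stddef.h>
#include <stdint.h>

/* Assumption: a fixed 16-byte vector. Real SVE code reads the length at run time. */
#define VL 16

/* Scalar fallback, corresponding to the .L3 loop: one element at a time. */
static void scalar_loop(const char *in, char *out, size_t size)
{
    for (size_t i = 0; i < size; i++)
        out[i] = in[i] + in[i + 1];
}

/* Stand-in for the vector loop (.L4): load a whole chunk first, then store it. */
static void chunked_loop(const char *in, char *out, size_t size)
{
    for (size_t i = 0; i < size; i += VL) {
        size_t n = (size - i < VL) ? size - i : VL;
        char tmp[VL];
        for (size_t j = 0; j < n; j++)
            tmp[j] = in[i + j] + in[i + j + 1];
        for (size_t j = 0; j < n; j++)
            out[i + j] = tmp[j];
    }
}

/* Hypothetical C rendering of the run-time check in the first listing. */
void process_data_versioned(const char *in, char *out, size_t size)
{
    /* Unsigned distance from in + 1 to out, as in the sub/cmp/bls sequence. */
    uintptr_t distance = (uintptr_t)out - (uintptr_t)(in + 1);

    if (distance <= VL - 1)
        scalar_loop(in, out, size);   /* out starts inside the next chunk of input */
    else
        chunked_loop(in, out, size);  /* safe to process a whole chunk at once */
}
```

With `restrict` on `in`, the check and `scalar_loop` have no reason to exist, which matches the shorter second listing.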

But I can almost hear the question: "Why is that important if the main loop is still the same?"
It is a fair question, and the answer is this:

If your function is going to be called once and run over tens of billions of elements, then saving a few instructions before and after the main loop does not really matter.

But if your function is called on smaller sizes millions or even *billions* of times, then saving a few instructions per call means saving a few *billions* of instructions in total, which means less time spent on the CPU and less energy wasted.
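If you want to check this on your own system, a minimal timing sketch could look like the following. The names `process_data_plain`, `process_data_restrict`, `time_calls` and `MAX_SIZE` are hypothetical and not part of the original material. The sketch uses the standard `clock()` function, so it measures CPU time only coarsely, and the numbers you get will depend on your compiler, flags and hardware.

```C
#include <stddef.h>
#include <time.h>

#define MAX_SIZE 256  /* assumption: only small buffers are timed */

/* Same body as the article's function, without restrict. */
void process_data_plain(const char *in, char *out, size_t size)
{
    for (int i = 0; i < size; i++)
        out[i] = in[i] + in[i + 1];
}

/* Same body, with restrict on the input pointer. */
void process_data_restrict(const char *restrict in, char *out, size_t size)
{
    for (int i = 0; i < size; i++)
        out[i] = in[i] + in[i + 1];
}

typedef void (*process_fn)(const char *, char *, size_t);

/* Returns the CPU seconds spent on `calls` invocations over `size` elements. */
double time_calls(process_fn fn, size_t calls, size_t size)
{
    static char in[MAX_SIZE + 1];  /* one extra byte: the loop reads in[i + 1] */
    static char out[MAX_SIZE];
    volatile char sink = 0;        /* keeps the result observably used */

    if (size > MAX_SIZE)
        size = MAX_SIZE;
    for (size_t i = 0; i <= size; i++)
        in[i] = (char)(i % 7);

    clock_t start = clock();
    for (size_t c = 0; c < calls; c++) {
        fn(in, out, size);
        sink = out[0];
    }
    clock_t end = clock();
    (void)sink;

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_calls(process_data_plain, 100000000, 16)` and then the same with `process_data_restrict` gives a rough comparison for many small calls. Note that a binary built with `-march=armv9-a` needs SVE2-capable hardware to run.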