diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md index 227259611..e52d14a9d 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md @@ -9,7 +9,7 @@ learning_objectives: - Learn the importance of using the `restrict` keyword in C correctly prerequisites: - - An Arm based system with Linux OS and recent compiler (Clang or GCC) + - An Arm computer running Linux with a recent compiler (Clang or GCC) installed author_primary: Konstantinos Margaritis, VectorCamp diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md index 83072aa82..67791a85c 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md @@ -48,8 +48,8 @@ There are 2 points to make here: 1. `scaleVectors()` is the important function here, it scales two vectors by the same scale factor `*C` 2. vector `a` overlaps with vector `b`. (`b = &a[2]`). -this rather simple program produces this output: -``` +This simple program produces this output: +```output a(before): 1 2 3 4 b(before): 3 4 5 6 a(after) : 2 4 12 16 @@ -60,7 +60,7 @@ Notice that after the scaling, the contents of `a` are also affected by the scal We will include the assembly output of `scaleVectors` as produced by `clang-17 -O3`: -``` +```output scaleVectors: // @scaleVectors ldr x8, [x2] ldr x9, [x0] @@ -108,7 +108,11 @@ Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reaso ## The Solution: restrict -This is what the C99 `restrict` keyword resolves. 
It instructs the compiler that the passed arguments are not dependent on each other and that access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much more efficient way. Essentially it can group and schedule the loads and stores. **Note**, `restrict` only works in C, not in C++. +This is what the C99 `restrict` keyword resolves. It instructs the compiler that the passed arguments are not dependent on each other and that access to the memory of each happens only through the respective pointer. This lets the compiler schedule the instructions much more efficiently. Essentially it can group and schedule the loads and stores. + +{{% notice Note %}} +The `restrict` keyword is part of standard C (C99 and later), not standard C++. +{{% /notice %}} Let's add `restrict` to `A` in the parameter list: ```C @@ -191,10 +195,10 @@ It is interesting to see that in such an example adding the `restrict` keyword r ## What about SVE2? -We have shown the obvious benefit of `restrict` in this function, on an armv8-a CPU, but we have new armv9-a CPUs out there with SVE2 as well as Neon/ASIMD. -Could the compiler generate better code in that case using `restrict`? The output without `restrict` is almost the same, but with `restrict` used, this is the result (we used `clang-17 -O3 -march=armv9-a`): +You have now seen the benefit of `restrict` in this function on an Armv8-A CPU. You can now try it on an Armv9-A CPU, which supports SVE2 as well as Neon/ASIMD. +Could the compiler generate better code in that case using `restrict`? The output without `restrict` is almost the same, but with `restrict` used, this is the result (compiled with `clang-17 -O3 -march=armv9-a`): -``` +```output scaleVectors: // @scaleVectors ldp q1, q2, [x0] ldp q3, q4, [x1] @@ -210,4 +214,4 @@ scaleVectors: // @scaleVectors There are just 10 instructions, 31% of the original code size! 
The compiler has made great use of the SVE2 features, combining the multiplications and reducing them to 4 and, at the same time, grouping loads and stores down to 2 each. We have optimized our code by more than 3x just by adding a C99 keyword. -We are now going to look at another example. +Next, let's take a look at another example. diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md index 8b3caf9e3..4d5165019 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md @@ -1,18 +1,18 @@ --- -title: When can we use restrict +title: When can you use restrict weight: 4 ### FIXED, DO NOT MODIFY layout: learningpathall --- -When can we use `restrict` or, put differently, how do we recognize that we need `restrict` in our code? +When can you use `restrict` or, put differently, how do you recognize that you need `restrict` in your code? -`restrict` as a pointer attribute is rather easy to test. As a rule of thumb, if the function includes one or more pointers to memory objects as arguments, we can use `restrict` if we are certain that the memory pointed to by these pointer arguments does not overlap and there is no other way to access them in the body of the function, except by the use of those pointers, i.e., there is no other global pointer or some other indirect way to access these elements. +`restrict` as a pointer attribute is rather easy to test. 
As a rule of thumb, if the function includes one or more pointers to memory objects as arguments, you can use `restrict` if you are certain that the memory pointed to by these pointer arguments does not overlap and there is no other way to access them in the body of the function, except by the use of those pointers, i.e., there is no other global pointer or some other indirect way to access these elements. -Let's show a counter example: +Let's see a counter example: -``` +```C int A[10]; int f(int *B, size_t n) { @@ -30,4 +30,4 @@ int main() { This example does not not benefit from `restrict` in either gcc and clang. -However, there are plenty of cases that are candidates for the `restrict` optimization. It's safe and easy to try but, even if it looks like a good candidate, it is still possible that the compiler will not detect a pattern that is suited for optimization and we might not see any reduction in the code or speed gain. It is up to the compiler; in some cases clang handles this better or differently from gcc, and vice versa, and this will also depend on the version. If you have a particular piece of code that you would like to optimize, before you attempt to refactor it completely, rewrite it in assembly or use any SIMD instructions, it might be worth trying `restrict`. Even saving a couple of instructions in a critical loop function is worth having by just adding one keyword. +However, there are plenty of cases that are candidates for the `restrict` optimization. It's safe and easy to try but, even if it looks like a good candidate, it is still possible that the compiler will not detect a pattern that is suited for optimization and you might not see any reduction in the code or speed gain. It is up to the compiler; in some cases clang handles this better or differently from gcc, and vice versa, and this will also depend on the version. 
If you have a particular piece of code that you would like to optimize, it might be worth trying `restrict` before you attempt to refactor it completely, rewrite it in assembly, or use SIMD instructions. Even saving a couple of instructions in a critical loop is worthwhile when all it takes is adding one keyword. diff --git a/contributors.csv b/contributors.csv index 9043db3fa..5ff202e36 100644 --- a/contributors.csv +++ b/contributors.csv @@ -18,3 +18,4 @@ Frédéric -lefred- Descamps,OCI,,,,lefred.be Kristof Beyls,Arm,,,, David Spickett,Arm,,,, Uma Ramalingam,Arm,uma-ramalingam,,, +Konstantinos Margaritis,VectorCamp,,,,
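
Reviewer note on `when-to-use-restrict.md`: the rule of thumb in that page (the pointed-to memory must not overlap, and nothing else may reach it inside the function) could be illustrated with a small sketch like the one below. It is not taken from the learning path. The names `ranges_overlap`, `scale_vectors`, and `scale_vectors_checked` are hypothetical, and the `int64_t` element type is an assumption based on the 64-bit loads in the quoted assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the byte ranges starting at p
 * (np elements) and q (nq elements) share at least one byte.
 * Addresses are compared as integers, so p and q may point into
 * unrelated objects. */
int ranges_overlap(const int64_t *p, size_t np,
                   const int64_t *q, size_t nq)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    uintptr_t alen = (uintptr_t)(np * sizeof(int64_t));
    uintptr_t blen = (uintptr_t)(nq * sizeof(int64_t));
    return a < b + blen && b < a + alen;
}

/* Assumed shape of a scaling routine: every pointer is declared
 * restrict, so the caller promises that A, B and C never alias. */
void scale_vectors(int64_t *restrict A, int64_t *restrict B,
                   const int64_t *restrict C, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        A[i] *= *C;
        B[i] *= *C;
    }
}

/* Checked wrapper: returns -1 and leaves the data untouched when the
 * arguments overlap, otherwise scales both vectors and returns 0. */
int scale_vectors_checked(int64_t *A, int64_t *B,
                          const int64_t *C, size_t n)
{
    if (ranges_overlap(A, n, B, n) ||
        ranges_overlap(A, n, C, 1) ||
        ranges_overlap(B, n, C, 1)) {
        return -1; /* the restrict promise would be broken */
    }
    scale_vectors(A, B, C, n);
    return 0;
}
```

With two separate arrays the wrapper scales both and returns 0. With the overlapping call pattern described in `what-is-restrict.md` (`b = &a[2]`) it returns -1 instead of invoking the `restrict`-qualified function. A runtime check like this is mainly useful in debug builds, since the point of `restrict` is to let the compiler assume no aliasing without checking.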