fixed explanations according to comments

ArmDeveloperEcosystem · Oct 25, 2023 · cc6c2b4
1 parent b991ad1
commit cc6c2b4

Showing 2 changed files with 5 additions and 5 deletions.
diff --git a/...t/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md b/...t/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md
@@ -55,7 +55,7 @@ process_data:
         ret
 ```
 
-Do not worry about each instruction in the assembly here, but notice that gcc correctly uses the SVE2 `while*` instructions to do the loops, resulting in far smaller code than with Neon. But in order to illustrate our point, let's try adding `restrict` to pointer `in`:
+Do not worry about each instruction in the assembly here, but notice that gcc has generated two loops: one that uses the SVE2 `while*` instructions to do the processing (`.L4`) and one scalar loop (`.L3`). The latter is executed if there is any pointer aliasing, that is, if the memory regions the pointers refer to overlap. Let's try adding `restrict` to pointer `in`:
 
 ```C
 void process_data (const char *restrict in, char *out, size_t size)
@@ -85,7 +85,7 @@ process_data:
         ret
 ```
 
-This is a huge improvement! Code size reduction is down from 30 lines to 14, less than half the original size, and faster too. In both cases, you will note that the main loop `.L3` is exactly the same, but the entry and exit code of the function are very much simplified, because the compiler was able to distinguish that the memory pointed by `in` does not overlap with memory pointed by `out`, it was able to simplify the conditions for entering and exiting the main loop.
+This is a huge improvement! Code size is down from 30 lines to 14, less than half the original size. In both cases, you will note that the main loop (`.L4` in the former case, `.L3` in the latter) is exactly the same, but the entry and exit code of the function is much simpler: because the compiler knows that the memory pointed to by `in` does not overlap with the memory pointed to by `out`, it was able to eliminate the scalar loop.
 
 But I can almost hear the question: "Why is that important if the main loop is still the same?"
 And it is the right question. The answer is this: 

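As a side note on the `restrict-example-sve2.md` hunks above: the two-loop structure described there (a vectorized loop plus a scalar fallback guarded by an aliasing check) can be sketched by hand in plain C. This is only an illustration of the idea, not the code gcc actually emits. The function names and the `transform` operation are hypothetical, since the body of `process_data` is not shown in this diff.

```C
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-byte operation; the real body of process_data
   is not shown in this diff. */
static char transform(char c) { return (char)(c + 1); }

/* Returns nonzero when [in, in+size) and [out, out+size) share a byte. */
static int ranges_overlap(const char *in, const char *out, size_t size)
{
    uintptr_t a = (uintptr_t)in, b = (uintptr_t)out;
    return a < b + size && b < a + size;
}

/* Fast path: restrict promises no overlap, so the compiler is free
   to vectorize this loop without any extra checks. */
static void process_fast(const char *restrict in, char *restrict out,
                         size_t size)
{
    for (size_t i = 0; i < size; i++)
        out[i] = transform(in[i]);
}

/* Fallback: plain pointers, so element-by-element ordering is preserved
   and the result is correct even when the buffers overlap. */
static void process_scalar(const char *in, char *out, size_t size)
{
    for (size_t i = 0; i < size; i++)
        out[i] = transform(in[i]);
}

/* Without restrict on the public interface, the overlap check has to
   happen at run time; this is the extra entry code the text refers to. */
void process_data_versioned(const char *in, char *out, size_t size)
{
    if (ranges_overlap(in, out, size))
        process_scalar(in, out, size);
    else
        process_fast(in, out, size);
}
```

Declaring the pointers `restrict` in the original function moves this decision from run time to the caller's contract, which is why the compiler can drop both the check and the fallback loop.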
diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md
@@ -97,7 +97,7 @@ scaleVectors:                           // @scaleVectors
         ret
 ```
 
-This doesn't look optimal. `scaleVectors` seems to be doing each load, multiplication, store in sequence, surely it can be further optimized? This is because the memory pointers are overlapping, let's try different assignments of `a` and `b` in `main()` to make them explicitly independent, perhaps the compiler can detect that and better schedule the instructions.
+This doesn't look optimal. `scaleVectors` seems to be doing each load, multiplication, and store in sequence; surely it can be further optimized? This is because the memory pointers overlap. Let's try different assignments of `a` and `b` in `main()` to make them explicitly independent; perhaps the compiler can detect that and generate faster instructions to do the same thing.
 
 ```
     int64_t a[] = { 1, 2, 3, 4 };
@@ -120,7 +120,7 @@ void scaleVectors(int64_t *restrict A, int64_t *B, int64_t *C) {
 }
 ```
 
-This is the assembly output with `clang-17` (gcc has a similar output):
+This is the assembly output with `clang-17 -O3` (gcc has a similar output):
 
 ```assembly
 scaleVectors:                           // @scaleVectors
@@ -161,7 +161,7 @@ void scaleVectors(int64_t *restrict A, int64_t *restrict B, int64_t *C) {
 }
 ```
 
-And the assembly output with `clang-17`:
+And the assembly output with `clang-17 -O3`:
 
 ```
 scaleVectors:                           // @scaleVectors