From fb58f9dbc898c059ade9cb1baced82d67a03c9ed Mon Sep 17 00:00:00 2001 From: Konstantinos Margaritis Date: Thu, 19 Oct 2023 18:27:46 +0300 Subject: [PATCH 01/35] WIP: restrict keyword C99 Arm Learning Paths material --- .../restrict-keyword-c99/_index.md | 38 ++++ .../restrict-keyword-c99/_next-steps.md | 23 ++ .../restrict-keyword-c99/_review.md | 48 ++++ .../restrict-example-sve2.md | 95 ++++++++ .../restrict-keyword-c99/what-is-restrict.md | 213 ++++++++++++++++++ .../when-to-use-restrict.md | 15 ++ 6 files changed, 432 insertions(+) create mode 100644 content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_index.md create mode 100644 content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_next-steps.md create mode 100644 content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_review.md create mode 100644 content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/restrict-example-sve2.md create mode 100644 content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/what-is-restrict.md create mode 100644 content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/when-to-use-restrict.md diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_index.md b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_index.md new file mode 100644 index 000000000..17f9bc591 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_index.md @@ -0,0 +1,38 @@ +--- +title: restrict keyword in C99 + +minutes_to_complete: 20 + +who_is_this_for: C developers who are interested in software optimization. 
+
+learning_objectives:
+    - Learn the importance of using 'restrict' keyword in C correctly
+
+prerequisites:
+    - An Arm-based system running Linux, with a recent compiler (Clang or GCC)
+
+author_primary: Konstantinos Margaritis, VectorCamp
+
+### Tags
+skilllevels: Advanced
+subjects: Programming
+armips:
+    - Aarch64
+    - Armv8-a
+    - Armv9-a
+tools_software_languages:
+    - Linux
+    - GCC
+    - Clang
+    - SVE2
+    - Coding
+operatingsystems:
+    - Linux
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1                       # _index.md always has weight of 1 to order correctly
+layout: "learningpathall"       # All files under learning paths have this same wrapper
+learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_next-steps.md
new file mode 100644
index 000000000..ffaa68ad4
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_next-steps.md
@@ -0,0 +1,23 @@
+---
+next_step_guidance: PLACEHOLDER TEXT 1
+
+recommended_path: /learning-paths/PLACEHOLDER_CATEGORY/PLACEHOLDER_LEARNING_PATH/
+
+further_reading:
+    - resource:
+        title: Wikipedia restrict entry
+        link: https://en.wikipedia.org/wiki/Restrict
+        type: documentation
+    - resource:
+        title: Godbolt restrict tests
+        link: https://godbolt.org/z/PxWxjc1oh
+        type: website
+
+
+# ================================================================================
+# FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 21                  # set to always be larger than the content in this path, and one more than 'review'
+title: "Next Steps"         # Always the same
+layout: "learningpathall"   # All files under learning paths have this same wrapper
+--- diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_review.md b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_review.md new file mode 100644 index 000000000..db48157a9 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_review.md @@ -0,0 +1,48 @@ +--- +review: + - questions: + question: > + Where is `restrict` placed in the code? + answers: + - In the function declaration + - As an enum value + - Between the pointer symbol (*) and the argument name + correct_answer: 3 + explanation: > + `restrict` is placed in the arguments list of a function, between the * and the variable name, like this: + `int func(char *restrict arg)` + - questions: + question: > + What does `restrict` do? + answers: + - It increases the performance of the CPU cores, making your program run faster + - It issues a command to clear the cache, leaving more room for your program + - It restricts the standard of the C library used to C99 + - It hints the compiler that the memory pointed to by the variable cannot be accessed through any other means apart from this variable, inside the particular function + correct_answer: 4 + explanation: > + In order for the compiler to better schedule the instructions of a function, it needs to know if there is any + dependency between the argument variables. If there is none, usually the compiler can group together instructions + increasing performance and efficiency. + + - questions: + question: > + Which language supports `restrict` + answers: + - Python + - C and C++ + - C only (after C99) + - Rust + correct_answer: 3 + explanation: > + `restrict` is a C-only keyword, it does nothing on C++. 
+ + + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +title: "Review" # Always the same title +weight: 20 # Set to always be larger than the content in this path +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/restrict-example-sve2.md b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/restrict-example-sve2.md new file mode 100644 index 000000000..334f5aadd --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/restrict-example-sve2.md @@ -0,0 +1,95 @@ +--- +title: Another example with SVE2 +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Example 2: SVE2 unleashed + +Let's try another example, one from [gcc restrict pointer examples](https://www.gnu.org/software/c-intro-and-ref/manual/html_node/restrict-Pointer-Example.html): + +```C +void process_data (const char *in, char *out, size_t size) +{ + for (int i = 0; i < size; i++) + out[i] = in[i] + in[i + 1]; +} +``` + +This example will be easier to demonstrate with SVE2, and we found gcc 13 to have a better result than clang, this is the output of `gcc-13 -O3 -march=armv9-a`: + +``` +process_data: + cbz x2, .L1 + add x5, x0, 1 + cntb x3 + sub x4, x1, x5 + sub x3, x3, #1 + cmp x4, x3 + bls .L6 + mov w4, w2 + mov x3, 0 + whilelo p0.b, wzr, w2 +.L4: + ld1b z0.b, p0/z, [x0, x3] + ld1b z1.b, p0/z, [x5, x3] + add z0.b, z0.b, z1.b + st1b z0.b, p0, [x1, x3] + incb x3 + whilelo p0.b, w3, w4 + b.any .L4 +.L1: + ret +.L6: + mov x3, 0 +.L3: + ldrb w4, [x5, x3] + ldrb w6, [x0, x3] + add w4, w4, w6 + strb w4, [x1, x3] + add x3, x3, 1 + cmp x2, x3 + bne .L3 + ret +``` + +We will not go into explaining the assembly, but we will note that gcc correctly uses the SVE2 `while*` 
instructions to do the loops, resulting in far smaller code than with Neon. But in order to illustrate our point, let's try adding `restrict` to pointer `in`:
+
+```C
+void process_data (const char *restrict in, char *out, size_t size)
+{
+    for (int i = 0; i < size; i++)
+        out[i] = in[i] + in[i + 1];
+}
+```
+
+This is now the output from gcc-13:
+```
+process_data:
+    cbz x2, .L1
+    add x5, x0, 1
+    mov w4, w2
+    mov x3, 0
+    whilelo p0.b, wzr, w2
+.L3:
+    ld1b z1.b, p0/z, [x0, x3]
+    ld1b z0.b, p0/z, [x5, x3]
+    add z0.b, z0.b, z1.b
+    st1b z0.b, p0, [x1, x3]
+    incb x3
+    whilelo p0.b, w3, w4
+    b.any .L3
+.L1:
+    ret
+```
+
+This is a huge improvement! Code size reduction is down from 30 lines to 14, less than half the original size, and faster too. In both cases, you will note that the main loop `.L3` is exactly the same, but the entry and exit code of the function are very much simplified, because the compiler was able to distinguish that the memory pointed by `in` does not overlap with memory pointed by `out`, it was able to simplify the conditions for entering and exiting the main loop.
+
+But I can almost hear the question: "Why is that important if the main loop is still the same?"
+And it is a fair question. The answer is this:
+
+If your function is going to be called once and run over tens of billions of elements, then saving a few instructions before and after the main loop does not really matter.
+
+But if your function is called on smaller sizes millions or even *billions* of times, then saving a few instructions in this function means we are saving a few *billions* of instructions in total, which means less time spent running on the CPU and less energy wasted.
diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/what-is-restrict.md b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/what-is-restrict.md
new file mode 100644
index 000000000..62dc2d760
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/what-is-restrict.md
@@ -0,0 +1,213 @@
+---
+title: What problem does restrict solve?
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## The problem: Overlapping memory regions as pointer arguments
+
+Before we go into detail of the `restrict` keyword, let's first demonstrate the problem.
+
+Let's consider this C code, which is a variation of the one in [wikipedia](https://en.wikipedia.org/wiki/Restrict):
+```C
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+void scaleVectors(int64_t *A, int64_t *B, int64_t *C) {
+    for (int i = 0; i < 4; i++) {
+        A[i] *= *C;
+        B[i] *= *C;
+    }
+}
+
+void printVector(char *t, int64_t *A) {
+    printf("%s: ", t);
+    for (int i = 0; i < 4; i++) {
+        printf("%ld ", A[i]);
+    }
+    printf("\n");
+}
+
+int main() {
+    int64_t a[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
+    int64_t *b = &a[2];
+    int64_t c = 2;
+
+    printVector("a(before)", a);
+    printVector("b(before)", b);
+    scaleVectors(a, b, &c);
+    printVector("a(after) ", a);
+    printVector("b(after) ", b);
+}
+```
+
+So, there are 2 points to make here:
+1. `scaleVectors()` is the important function here: it scales two vectors by the same scale factor `*C`
+2. vector `a` overlaps with vector `b` (`b = &a[2]`).
+
+This rather simple program produces this output:
+```
+a(before): 1 2 3 4
+b(before): 3 4 5 6
+a(after) : 2 4 12 16
+b(after) : 12 16 10 12
+```
+
+Notice that after the scaling, the contents of `a` are also affected by the scaling of `b`, as their elements overlap in memory.
+ +We will include the assembly output of `scaleVectors` as produced by `clang-17 -O3`: + +``` +scaleVectors: // @scaleVectors + ldr x8, [x2] + ldr x9, [x0] + mul x8, x9, x8 + str x8, [x0] + ldr x8, [x2] + ldr x9, [x1] + mul x8, x9, x8 + str x8, [x1] + ldr x8, [x2] + ldr x9, [x0, #8] + mul x8, x9, x8 + str x8, [x0, #8] + ldr x8, [x2] + ldr x9, [x1, #8] + mul x8, x9, x8 + str x8, [x1, #8] + ldr x8, [x2] + ldr x9, [x0, #16] + mul x8, x9, x8 + str x8, [x0, #16] + ldr x8, [x2] + ldr x9, [x1, #16] + mul x8, x9, x8 + str x8, [x1, #16] + ldr x8, [x2] + ldr x9, [x0, #24] + mul x8, x9, x8 + str x8, [x0, #24] + ldr x8, [x2] + ldr x9, [x1, #24] + mul x8, x9, x8 + str x8, [x1, #24] + ret +``` + +This doesn't look optimal. `scaleVectors` seems to be doing each load,multiplication,store in sequence, surely it can be further optimized? This is because the memory pointers are overlapping, let's try different assignments of `a` and `b` in `main()` to make them explicitly independent, perhaps the compiler can detect that and better schedule the instructions. + +``` + int64_t a[] = { 1, 2, 3, 4 }; + int64_t b[] = { 5, 6, 7, 8 }; +``` + +Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reason for this is that the compiler has no hint of the dependency between the two pointers used in the function so it has no choice than to assume that it has to process one element at a time. The function has no way of knowing with what arguments it is to be called. We see 8 instances of `mul`, which is correct but the number of loads and stores in between indicates that the CPU spends its time waiting for data to arrive from/to the cache. We need a way to be able to hint the compiler that it can assume the buffers passed are independent. + +## The Solution: restrict + +This is what the C99 `restrict` keyword has come to solve. 
It instructs the compiler that the passed arguments are in no way dependant on each other and access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much better way. In essence it can group and schedule the loads and stores. `restrict` only works in C, not in C++. + +Let's add `restrict` to `A` in the parameter list: +```C +void scaleVectors(int64_t *restrict A, int64_t *B, int64_t *C) { + for (int i = 0; i < 4; i++) { + A[i] *= *C; + B[i] *= *C; + } +} +``` + +This is the assembly output with `clang-17` (gcc has a similar output): + +```assembly +scaleVectors: // @scaleVectors + ldp x9, x10, [x1] + ldr x8, [x2] + ldp x11, x12, [x1, #16] + mul x9, x9, x8 + ldp x13, x14, [x0] + str x9, [x1] + ldr x9, [x2] + mul x8, x13, x8 + mul x10, x10, x9 + mul x9, x14, x9 + str x10, [x1, #8] + ldr x10, [x2] + stp x8, x9, [x0] + mul x11, x11, x10 + str x11, [x1, #16] + ldp x15, x11, [x0, #16] + ldr x13, [x2] + mul x10, x15, x10 + mul x11, x11, x13 + mul x12, x12, x13 + stp x10, x11, [x0, #16] + str x12, [x1, #24] + ret +``` + +We see an obvious reduction in the number of instructions, from 32 instructions down to 22! That's 68% of the original count, which is impressive on its own. One can easily see that the loads are grouped, as well as the multiplications. Of course, still 8 multiplications, that cannot change, but far fewer loads and stores as the compiler found the opportunity to use `LDP`/`STP` which load/store in pairs for the pointer `A`. 
Let's try adding `restrict` to `B` as well:
+```C
+void scaleVectors(int64_t *restrict A, int64_t *restrict B, int64_t *C) {
+    for (int i = 0; i < 4; i++) {
+        A[i] *= *C;
+        B[i] *= *C;
+    }
+}
+```
+
+And the assembly output with `clang-17`:
+
+```
+scaleVectors:                           // @scaleVectors
+    ldp x9, x10, [x0]
+    ldr x8, [x2]
+    ldp x11, x12, [x0, #16]
+    ldp x13, x14, [x1]
+    mul x9, x9, x8
+    ldp x15, x16, [x1, #16]
+    mul x10, x10, x8
+    mul x11, x11, x8
+    mul x12, x12, x8
+    mul x13, x13, x8
+    stp x9, x10, [x0]
+    mul x9, x14, x8
+    mul x10, x15, x8
+    mul x8, x16, x8
+    stp x11, x12, [x0, #16]
+    stp x13, x9, [x1]
+    stp x10, x8, [x1, #16]
+    ret
+```
+
+Another reduction in the number of instructions, down to 17, for a total reduction to 53% of the original count. This time, only 5 loads and 4 stores. And as before, almost all of the loads/stores are paired (because the `LDP`/`STP` instructions are used).
+
+It is interesting to see that in such an example, adding just the `restrict` keyword reduced our code size to almost half. This will have an obvious impact on performance and efficiency.
+
+## What about SVE2?
+
+We have shown the obvious benefit of `restrict` in this function, on an armv8-a CPU, but we have new armv9-a CPUs out there with SVE2 as well as Neon/ASIMD.
+Could the compiler generate better code in that case using `restrict`? To save time, the output without `restrict` is almost the same; however, with `restrict` used, this is the result (we used `clang-17 -O3 -march=armv9-a`):
+
+```
+scaleVectors:                           // @scaleVectors
+    ldp q1, q2, [x0]
+    ldp q3, q4, [x1]
+    ld1r { v0.2d }, [x2]
+    mul z1.d, z1.d, z0.d
+    mul z2.d, z2.d, z0.d
+    stp q1, q2, [x0]
+    mul z1.d, z3.d, z0.d
+    mul z0.d, z4.d, z0.d
+    stp q1, q0, [x1]
+    ret
+```
+
+This is just 10 instructions, only 31% of the original code size! The compiler made great use of SVE2 features, combining the multiplications and reducing them to 4, at the same time grouping loads and stores down to 2 each.
We have optimized our code more than 3x by only adding a C99 keyword! + +We are going to look at another example next. \ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/when-to-use-restrict.md b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/when-to-use-restrict.md new file mode 100644 index 000000000..7ed04cb93 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/when-to-use-restrict.md @@ -0,0 +1,15 @@ +--- +title: When can we use restrict +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## So, when can we use restrict? + +This is all very good, but when can we use it? Or put differently, how to recognize we need `restrict` in our code? + +`restrict` as a pointer attribute is rather easy to test. As a rule of thumb, if our function includes one or more pointers to memory objects as arguments, we can use `restrict` if we are certain that the memory pointed by those pointer arguments does not overlap and there is no other way to access it in the body of the function, except by the use of those pointers -eg. there is no other global pointer, or some other indirect way to access these elements. + +If this applies, then it's safe to try `restrict`. Unfortunately, even if the above holds, it is still possible that the compiler will not detect a pattern that is liable for optimization and we might not see any reduction in the code or any speed up. It is up to the compiler, some cases clang handles better or differently than gcc, and vice versa, and that even depends on the version. If you have a particular piece of code that falls in the above criteria that you would care to optimize, before you attempt to refactor it completely, or rewrite it in asm or SIMD, it might be worth a shot to try `restrict`. Even saving a couple of instructions in a critical loop function is worth having to add just one keyword! 
\ No newline at end of file From b991ad1b90d73062c1b35fce1f0a6db5b3c1934a Mon Sep 17 00:00:00 2001 From: Konstantinos Margaritis Date: Fri, 20 Oct 2023 11:18:40 +0300 Subject: [PATCH 02/35] Changed category, fixed minor issues --- .../restrict-keyword-c99/_index.md | 5 ++- .../restrict-keyword-c99/_next-steps.md | 4 +-- .../restrict-keyword-c99/_review.md | 12 +++---- .../restrict-example-sve2.md | 2 +- .../restrict-keyword-c99/what-is-restrict.md | 8 ++--- .../when-to-use-restrict.md | 35 +++++++++++++++++++ .../when-to-use-restrict.md | 15 -------- 7 files changed, 50 insertions(+), 31 deletions(-) rename content/learning-paths/{servers-and-cloud-computing => embedded-systems}/restrict-keyword-c99/_index.md (95%) rename content/learning-paths/{servers-and-cloud-computing => embedded-systems}/restrict-keyword-c99/_next-steps.md (78%) rename content/learning-paths/{servers-and-cloud-computing => embedded-systems}/restrict-keyword-c99/_review.md (70%) rename content/learning-paths/{servers-and-cloud-computing => embedded-systems}/restrict-keyword-c99/restrict-example-sve2.md (91%) rename content/learning-paths/{servers-and-cloud-computing => embedded-systems}/restrict-keyword-c99/what-is-restrict.md (90%) create mode 100644 content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md delete mode 100644 content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/when-to-use-restrict.md diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_index.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md similarity index 95% rename from content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_index.md rename to content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md index 17f9bc591..db51af3da 100644 --- a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_index.md +++ 
b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md
@@ -1,9 +1,9 @@
 ---
 title: restrict keyword in C99
 
-minutes_to_complete: 20
+minutes_to_complete: 30
 
-who_is_this_for: C developers who are interested in software optimization.
+who_is_this_for: C developers who are interested in software optimization
 
 learning_objectives:
     - Learn the importance of using 'restrict' keyword in C correctly
@@ -21,7 +21,6 @@ armips:
     - Armv8-a
     - Armv9-a
 tools_software_languages:
-    - Linux
     - GCC
     - Clang
     - SVE2
diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_next-steps.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md
similarity index 78%
rename from content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_next-steps.md
rename to content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md
index ffaa68ad4..ba23e557c 100644
--- a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_next-steps.md
+++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md
@@ -1,7 +1,7 @@
 ---
-next_step_guidance: PLACEHOLDER TEXT 1
+next_step_guidance: You should now be able to test the `restrict` keyword on your own code, or on other open-source code, and discover potential optimizations!
-recommended_path: /learning-paths/PLACEHOLDER_CATEGORY/PLACEHOLDER_LEARNING_PATH/ +recommended_path: /learning-paths/embedded-systems/ further_reading: - resource: diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_review.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md similarity index 70% rename from content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_review.md rename to content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md index db48157a9..d9f0a8080 100644 --- a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/_review.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md @@ -6,23 +6,23 @@ review: answers: - In the function declaration - As an enum value - - Between the pointer symbol (*) and the argument name + - Between the pointer symbol (*) and the parameter name correct_answer: 3 explanation: > - `restrict` is placed in the arguments list of a function, between the * and the variable name, like this: + `restrict` is placed in the arguments list of a function, between the * and the parameter name, like this: `int func(char *restrict arg)` - questions: question: > What does `restrict` do? 
         answers:
-            - It increases the performance of the CPU cores, making your program run faster
+            - It increases the frequency of the CPU cores, making your program run faster
             - It issues a command to clear the cache, leaving more room for your program
             - It restricts the standard of the C library used to C99
-            - It hints the compiler that the memory pointed to by the variable cannot be accessed through any other means apart from this variable, inside the particular function
+            - It hints to the compiler that the memory pointed to by the parameter cannot be accessed through any other means inside the particular function, except by using this pointer
         correct_answer: 4
         explanation: >
             In order for the compiler to better schedule the instructions of a function, it needs to know if there is any
-            dependency between the argument variables. If there is none, usually the compiler can group together instructions
+            dependency between the parameter variables. If there is no dependency, usually the compiler can group together instructions
             increasing performance and efficiency.
 
     - questions:
         question: >
             Which language supports `restrict`
         answers:
             - Python
             - C and C++
             - C only (after C99)
             - Rust
         correct_answer: 3
         explanation: >
+            `restrict` is a C-only keyword; it does not exist in C++ (`__restrict__` does, but it is not exactly the same)
diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/restrict-example-sve2.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md
similarity index 91%
rename from content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/restrict-example-sve2.md
rename to content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md
index 334f5aadd..78111fdf8 100644
--- a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/restrict-example-sve2.md
+++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md
@@ -55,7 +55,7 @@ process_data:
   ret
 ```
 
-We will not go into explaining the assembly, but we will note that gcc correctly uses the SVE2 `while*` instructions to do the loops, resulting in far smaller code than with Neon. But in order to illustrate our point, let's try adding `restrict` to pointer `in`:
+Do not worry about each instruction in the assembly here, but notice that gcc correctly uses the SVE2 `while*` instructions to do the loops, resulting in far smaller code than with Neon.
But in order to illustrate our point, let's try adding `restrict` to pointer `in`:
 
 ```C
 void process_data (const char *restrict in, char *out, size_t size)
diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/what-is-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md
similarity index 90%
rename from content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/what-is-restrict.md
rename to content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md
index 62dc2d760..9cc9e9732 100644
--- a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/what-is-restrict.md
+++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md
@@ -10,7 +10,7 @@ layout: learningpathall
 
 Before we go into detail of the `restrict` keyword, let's first demonstrate the problem.
 
-Let's consider this C code, which is a variation of the one in [wikipedia](https://en.wikipedia.org/wiki/Restrict):
+Let's consider this C code:
 ```C
 #include <stdio.h>
 #include <stdint.h>
 #include <stdlib.h>
@@ -97,7 +97,7 @@ scaleVectors:                           // @scaleVectors
   ret
 ```
 
-This doesn't look optimal. `scaleVectors` seems to be doing each load,multiplication,store in sequence, surely it can be further optimized? This is because the memory pointers are overlapping, let's try different assignments of `a` and `b` in `main()` to make them explicitly independent, perhaps the compiler can detect that and better schedule the instructions.
+This doesn't look optimal. `scaleVectors` seems to be doing each load, multiplication, store in sequence, surely it can be further optimized? This is because the memory pointers are overlapping; let's try different assignments of `a` and `b` in `main()` to make them explicitly independent, so that perhaps the compiler can detect that and better schedule the instructions.
 ```
    int64_t a[] = { 1, 2, 3, 4 };
    int64_t b[] = { 5, 6, 7, 8 };
 ```
 
-Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reason for this is that the compiler has no hint of the dependency between the two pointers used in the function so it has no choice than to assume that it has to process one element at a time. The function has no way of knowing with what arguments it is to be called. We see 8 instances of `mul`, which is correct but the number of loads and stores in between indicates that the CPU spends its time waiting for data to arrive from/to the cache. We need a way to be able to hint the compiler that it can assume the buffers passed are independent.
+Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reason for this is that the compiler has no hint of the dependency between the two pointers used in the function, so it has no choice but to assume that it has to process one element at a time. The function has no way of knowing with what arguments it is to be called. We see 8 instances of `mul`, which is correct, but the number of loads and stores in between indicates that the CPU spends its time waiting for data to arrive from/to the cache. We need a way to be able to hint to the compiler that it can assume the buffers passed are independent.
 
 ## The Solution: restrict
 
-This is what the C99 `restrict` keyword has come to solve. It instructs the compiler that the passed arguments are in no way dependant on each other and access to the memory of each happens only through the respective pointer.
This way the compiler can schedule the instructions in a much better way. In essence it can group and schedule the loads and stores. `restrict` only works in C, not in C++.
+This is what the C99 `restrict` keyword has come to solve. It instructs the compiler that the passed arguments are in no way dependent on each other and that access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much better way. In essence, it can group and schedule the loads and stores. As a note, `restrict` only works in C, not in C++.
 
 Let's add `restrict` to `A` in the parameter list:
 ```C
diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md
new file mode 100644
index 000000000..65f617558
--- /dev/null
+++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md
@@ -0,0 +1,39 @@
+---
+title: When can we use restrict
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## So, when can we use restrict?
+
+This is all very good, but when can we use it? Or put differently, how do we recognize that we need `restrict` in our code?
+
+`restrict` as a pointer attribute is rather easy to test. As a rule of thumb, if our function includes one or more pointers to memory objects as arguments, we can use `restrict` if we are certain that the memory pointed to by those pointer arguments does not overlap and there is no other way to access it in the body of the function, except by the use of those pointers, e.g. there is no other global pointer, or some other indirect way to access these elements.
+
+Let's show a counter-example:
+
+```
+#include <stdio.h>
+#include <stddef.h>
+
+int A[10];
+
+int f(int *B, size_t n) {
+    int sum = 0;
+    for (int i = 0; i < n; i++) {
+        sum += A[i] * B[i]; // B is used in conjunction with A
+    }
+    return sum;
+}
+
+int main() {
+    int s = f(A, 10); // A is passed to f, so f will be calculating sum of A[i] * A[i] elements
+    printf("sum = %d", s);
+}
+```
+
+This example does not benefit from `restrict` at all, with either gcc or clang.
+
+However, there are plenty of cases that are candidates for the `restrict` optimization. And it's safe and easy to try.
Nevertheless, even if it looks like a good candidate, it is still possible that the compiler will not detect a pattern that is suited for optimization and we might not see any reduction in the code or speed gain. It is up to the compiler: some cases clang handles better or differently than gcc, and vice versa, and that can even depend on the compiler version. If you have a particular piece of code that meets the above criteria and that you would care to optimize, before you attempt to refactor it completely, or rewrite it in assembly or use any SIMD instructions, it might be worth a shot to try `restrict`. Even saving a couple of instructions in a critical loop function is worth having to add just one keyword!
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/when-to-use-restrict.md b/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/when-to-use-restrict.md
deleted file mode 100644
index 7ed04cb93..000000000
--- a/content/learning-paths/servers-and-cloud-computing/restrict-keyword-c99/when-to-use-restrict.md
+++ /dev/null
@@ -1,15 +0,0 @@
----
-title: When can we use restrict
-weight: 4
-
-### FIXED, DO NOT MODIFY
-layout: learningpathall
----
-
-## So, when can we use restrict?
-
-This is all very good, but when can we use it? Or put differently, how to recognize we need `restrict` in our code?
-
-`restrict` as a pointer attribute is rather easy to test. As a rule of thumb, if our function includes one or more pointers to memory objects as arguments, we can use `restrict` if we are certain that the memory pointed by those pointer arguments does not overlap and there is no other way to access it in the body of the function, except by the use of those pointers -eg. there is no other global pointer, or some other indirect way to access these elements.
-
-If this applies, then it's safe to try `restrict`.
Unfortunately, even if the above holds, it is still possible that the compiler will not detect a pattern that is liable for optimization and we might not see any reduction in the code or any speed up. It is up to the compiler, some cases clang handles better or differently than gcc, and vice versa, and that even depends on the version. If you have a particular piece of code that falls in the above criteria that you would care to optimize, before you attempt to refactor it completely, or rewrite it in asm or SIMD, it might be worth a shot to try `restrict`. Even saving a couple of instructions in a critical loop function is worth having to add just one keyword! \ No newline at end of file From cc6c2b4c4bb3cf482526f33c9c788e8ca883d600 Mon Sep 17 00:00:00 2001 From: Konstantinos Margaritis Date: Wed, 25 Oct 2023 13:42:55 +0300 Subject: [PATCH 03/35] fixed explanations according to comments --- .../restrict-keyword-c99/restrict-example-sve2.md | 4 ++-- .../restrict-keyword-c99/what-is-restrict.md | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md index 78111fdf8..101bbb33c 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md @@ -55,7 +55,7 @@ process_data: ret ``` -Do not worry about each instruction in the assembly here, but notice that gcc correctly uses the SVE2 `while*` instructions to do the loops, resulting in far smaller code than with Neon. But in order to illustrate our point, let's try adding `restrict` to pointer `in`: +Do not worry about each instruction in the assembly here, but notice that gcc has added 2 loops, one that uses the SVE2 `while*` instructions to the processing (.L4) and one scalar loop (.L3). 
The latter is executed in case theis any pointer aliasing -if there is any overlap between the memory pointers basically. Let's try adding `restrict` to pointer `in`: ```C void process_data (const char *restrict in, char *out, size_t size) @@ -85,7 +85,7 @@ process_data: ret ``` -This is a huge improvement! Code size reduction is down from 30 lines to 14, less than half the original size, and faster too. In both cases, you will note that the main loop `.L3` is exactly the same, but the entry and exit code of the function are very much simplified, because the compiler was able to distinguish that the memory pointed by `in` does not overlap with memory pointed by `out`, it was able to simplify the conditions for entering and exiting the main loop. +This is a huge improvement! Code size reduction is down from 30 lines to 14, less than half the original size. In both cases, you will note that the main loop (`.L4` in the former case, `.L3` in the latter) is exactly the same, but the entry and exit code of the function are very much simplified, because the compiler was able to distinguish that the memory pointed by `in` does not overlap with memory pointed by `out`, it was able to simplify the code by eliminating the scalar loop. But I can almost hear the question: "Why is that important if the main loop is still the same?" And it is a right question. The answer is this: diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md index 9cc9e9732..e4d08eeb5 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md @@ -97,7 +97,7 @@ scaleVectors: // @scaleVectors ret ``` -This doesn't look optimal. `scaleVectors` seems to be doing each load, multiplication, store in sequence, surely it can be further optimized? 
This is because the memory pointers are overlapping, let's try different assignments of `a` and `b` in `main()` to make them explicitly independent, perhaps the compiler can detect that and better schedule the instructions. +This doesn't look optimal. `scaleVectors` seems to be doing each load, multiplication, store in sequence, surely it can be further optimized? This is because the memory pointers are overlapping, let's try different assignments of `a` and `b` in `main()` to make them explicitly independent, perhaps the compiler can detect that and generate faster instructions to do the same thing. ``` int64_t a[] = { 1, 2, 3, 4 }; @@ -120,7 +120,7 @@ void scaleVectors(int64_t *restrict A, int64_t *B, int64_t *C) { } ``` -This is the assembly output with `clang-17` (gcc has a similar output): +This is the assembly output with `clang-17 -O3` (gcc has a similar output): ```assembly scaleVectors: // @scaleVectors @@ -161,7 +161,7 @@ void scaleVectors(int64_t *restrict A, int64_t *restrict B, int64_t *C) { } ``` -And the assembly output with `clang-17`: +And the assembly output with `clang-17 -O3`: ``` scaleVectors: // @scaleVectors From e59702674db1a18307dde44278e2fffe851d20f5 Mon Sep 17 00:00:00 2001 From: Konstantinos Margaritis Date: Fri, 27 Oct 2023 11:07:26 +0300 Subject: [PATCH 04/35] Added explanation --- .../restrict-keyword-c99/restrict-example-sve2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md index 101bbb33c..947f0e4a5 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md @@ -85,7 +85,7 @@ process_data: ret ``` -This is a huge improvement! Code size reduction is down from 30 lines to 14, less than half the original size. 
In both cases, you will note that the main loop (`.L4` in the former case, `.L3` in the latter) is exactly the same, but the entry and exit code of the function are very much simplified, because the compiler was able to distinguish that the memory pointed by `in` does not overlap with memory pointed by `out`, it was able to simplify the code by eliminating the scalar loop. +This is a huge improvement! Code size reduction is down from 30 lines to 14, less than half the original size. In both cases, you will note that the main loop (`.L4` in the former case, `.L3` in the latter) is exactly the same, but the entry and exit code of the function are very much simplified. The compiler was able to distinguish that the memory pointed by `in` does not overlap with memory pointed by `out`, it was able to simplify the code by eliminating the scalar loop and remove the associated code that checked if it needed to enter it. But I can almost hear the question: "Why is that important if the main loop is still the same?" And it is a right question. 
The answer is this: From 2b9d3b106c2a61fc5271727829bc1f5a26f270d9 Mon Sep 17 00:00:00 2001 From: GitHub Actions Stats Bot <> Date: Mon, 30 Oct 2023 01:43:00 +0000 Subject: [PATCH 05/35] automatic update of stats files --- data/stats_weekly_data.yml | 41 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/data/stats_weekly_data.yml b/data/stats_weekly_data.yml index 386a30a05..f2a6b4609 100644 --- a/data/stats_weekly_data.yml +++ b/data/stats_weekly_data.yml @@ -851,3 +851,44 @@ avg_close_time_hrs: 0 num_issues: 7 percent_closed_vs_total: 0.0 +- a_date: '2023-10-30' + content: + cross-platform: 7 + embedded-systems: 14 + install-guides: 73 + laptops-and-desktops: 9 + microcontrollers: 22 + servers-and-cloud-computing: 47 + smartphones-and-mobile: 7 + total: 179 + contributions: + external: 3 + internal: 174 + github_engagement: + num_forks: 30 + num_prs: 6 + individual_authors: + brenda-strech: 1 + christopher-seidl: 4 + daniel-gubay: 1 + dawid-borycki: 1 + elham-harirpoush: 2 + florent-lebeau: 5 + "fr\xE9d\xE9ric--lefred--descamps": 2 + gabriel-peterson: 3 + jason-andrews: 77 + julie-gaskin: 1 + julio-suarez: 5 + kasper-mecklenburg: 1 + kristof-beyls: 1 + liliya-wu: 1 + mathias-brossard: 1 + michael-hall: 3 + pareena-verma: 29 + pranay-bakre: 1 + ronan-synnott: 39 + uma-ramalingam: 1 + issues: + avg_close_time_hrs: 0 + num_issues: 7 + percent_closed_vs_total: 0.0 From 2f93584284940929ee666deffe87e800771a4311 Mon Sep 17 00:00:00 2001 From: David Spickett Date: Mon, 30 Oct 2023 13:30:40 +0000 Subject: [PATCH 06/35] Link directly to GitHub's Pull Request documentation This is likely (and was for me) the first result of searching Google for the same thing. Removes a small amount of friction if we just link to it. 
--- .../cross-platform/_example-learning-path/contribute.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/learning-paths/cross-platform/_example-learning-path/contribute.md b/content/learning-paths/cross-platform/_example-learning-path/contribute.md index 17e4370c9..403894902 100644 --- a/content/learning-paths/cross-platform/_example-learning-path/contribute.md +++ b/content/learning-paths/cross-platform/_example-learning-path/contribute.md @@ -55,7 +55,8 @@ After you have reviewed the new material using `hugo server` and there are no is You can now submit a GitHub pull request. {{% notice Note%}} -If you are new to GitHub, find a tutorial about how to create a pull request from a GitHub fork. +If you are new to GitHub, please go to [GitHub's documentation](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) +to learn how to create a pull request from a GitHub fork. {{% /notice %}} Optionally, if you would like to add your new Learning Path content to the automated testing framework, follow the guidelines in the [Appendix: How to test your code](/learning-paths/cross-platform/_example-learning-path/appendix-3-test). 
From 21ba7713e237fe4b2df998e0299540ecde352a36 Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Tue, 31 Oct 2023 09:07:45 +0000 Subject: [PATCH 07/35] Update _index.md editorial amends & some formatting changes --- .../embedded-systems/restrict-keyword-c99/_index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md index db51af3da..411003e9f 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md @@ -1,15 +1,15 @@ --- -title: restrict keyword in C99 +title: Understand the `restrict` keyword in C99 minutes_to_complete: 30 who_is_this_for: C developers who are interested in software optimization learning_objectives: - - Learn the importance of using 'restrict' keyword in C correctly + - Learn the importance of using the `restrict` keyword in C correctly prerequisites: - - An Arm based system with Linux OS and recent compiler (clang or gcc) + - An Arm based system with Linux OS and recent compiler (Clang or GCC) author_primary: Konstantinos Margaritis, VectorCamp From abbb94060bb9c852f740b733210cae1f995a6f21 Mon Sep 17 00:00:00 2001 From: David Spickett Date: Tue, 31 Oct 2023 10:52:55 +0000 Subject: [PATCH 08/35] Ignore unformatted text blocks that are not scripts or files While testing a learning path I had one page that used the triple backtick blocks just to show some plain text diagrams, but nothing else. The test runner thought these were tests, but then didn't set any commands for them, as there's nothing it can do with them.
``` Traceback (most recent call last): File "./tools/maintenance.py", line 171, in main() File "./tools/maintenance.py", line 148, in main check_lp(args.instructions, args.link, args.debug) File "./tools/maintenance.py", line 60, in check_lp res.append(check.check(lp_path + "/" + i[0], start=launch, stop=terminate)) File "/home/davspi01/work/open_source/arm-learning-paths/tools/check.py", line 196, in check for j in range(0, t["ncmd"]): KeyError: 'ncmd' ``` What we should do is not treat these blocks as tests at all, so that "ntests" is 0 by the time we get to check.py. check.py is already handling "ntests" being not present or set to 0 so it all works out. If you had a mix of triple backtick blocks, we'd skip the non-executable ones and the executable ones will still become tests and commands. --- tools/parse.py | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/parse.py b/tools/parse.py index 1950e5301..9ae2a7f26 100644 --- a/tools/parse.py +++ b/tools/parse.py @@ -212,11 +212,15 @@ def save(article, cmd, learningpath=False, img=None): # for other types, we're assuming source code # check if a file name is specified else: - content[i_idx] = {"type": l[0]} - # check file name + # check for a file name if "file_name" in l[0]: + content[i_idx] = {"type": l[0]} fn = l[0].split("file_name=\"")[1].split("\"")[0] content[i_idx].update({"file_name": fn }) + else: + # No file name means it is some unformatted text, rather than + # some executable script or code. Skip it. 
+ continue for j_idx,j in enumerate(l[1:]): content[i_idx].update({j_idx: j}) From ffd0f0f72c86358f9d454892d2db2f0effb048f5 Mon Sep 17 00:00:00 2001 From: David Spickett Date: Tue, 31 Oct 2023 14:26:57 +0000 Subject: [PATCH 09/35] Fix UnboundLocalError for username when testing multiple learning path pages Prior to this change, when I tested a learning path that had >1 page using `ubuntu:latest` I got: ``` Traceback (most recent call last): File "./tools/maintenance.py", line 171, in main() File "./tools/maintenance.py", line 148, in main check_lp(args.instructions, args.link, args.debug) File "./tools/maintenance.py", line 60, in check_lp res.append(check.check(lp_path + "/" + i[0], start=launch, stop=terminate)) File "/home/davspi01/work/open_source/arm-learning-paths/tools/check.py", line 215, in check cmd = ["docker cp {} test_{}:/home/{}/".format(fn, k, username)] UnboundLocalError: local variable 'username' referenced before assignment ``` This happened because username is only set when `start` is true and this only happens for the first page. ``` [INFO] Checking how-to-1.md > /home/davspi01/work/open_source/arm-learning-paths/tools/check.py(83)check() -> with open(json_file) as jf: (Pdb) print(start, stop) True False (Pdb) c [INFO] Checking how-to-3.md > /home/davspi01/work/open_source/arm-learning-paths/tools/check.py(83)check() -> with open(json_file) as jf: (Pdb) print(start, stop) False False (Pdb) c [INFO] Checking how-to-2.md > /home/davspi01/work/open_source/arm-learning-paths/tools/check.py(83)check() -> with open(json_file) as jf: (Pdb) print(start, stop) False True (Pdb) c ``` Where it was, `username` didn't really need to be a variable because it was hardcoded into the docker commands for each branch of the if. So I've moved the username lookup later where we check the image we're going to run on (which happens regardless of whether we start a new image). 
--- tools/check.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/check.py b/tools/check.py index b36849c3b..0468d4a21 100644 --- a/tools/check.py +++ b/tools/check.py @@ -92,8 +92,8 @@ def check(json_file, start, stop): subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT) # Create user and configure - username="user" if "arm-tools" in img: + # These images already have a user account set up. username="ubuntu" cmd = ["docker exec test_{} apt update".format(i)] logging.debug(cmd) subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT) @@ -157,6 +157,7 @@ def check(json_file, start, stop): # Run bash commands for i in range(0, data["ntests"]): + username = "ubuntu" if "arm-tools" in data["image"][0] else "user" t = data["{}".format(i)] # Check if file name is specified From 08e1deece3b1dfd9134c289230b01dc77270e911 Mon Sep 17 00:00:00 2001 From: David Spickett Date: Tue, 31 Oct 2023 14:45:49 +0000 Subject: [PATCH 10/35] Remove unused var, move get closer to use. --- tools/check.py | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tools/check.py b/tools/check.py index 0468d4a21..1b38bba67 100644 --- a/tools/check.py +++ b/tools/check.py @@ -93,8 +93,7 @@ def check(json_file, start, stop): # Create user and configure if "arm-tools" in img: - # These images already have a user account set up. - username="ubuntu" + # These images already have a 'ubuntu' user account set up.
cmd = ["docker exec test_{} apt update".format(i)] logging.debug(cmd) subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT) @@ -157,7 +156,6 @@ def check(json_file, start, stop): # Run bash commands for i in range(0, data["ntests"]): - username = "ubuntu" if "arm-tools" in data["image"][0] else "user" t = data["{}".format(i)] # Check if file name is specified @@ -211,6 +209,7 @@ def check(json_file, start, stop): else: inst = range(0, len(data["image"])) + username = "ubuntu" if "arm-tools" in data["image"][0] else "user" for k in inst: # Copy over the file with commands cmd = ["docker cp {} test_{}:/home/{}/".format(fn, k, username)] From 2f1194598ef39510b12ad869ed40ea183c3b7208 Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Wed, 1 Nov 2023 09:43:14 +0000 Subject: [PATCH 11/35] Update _review.md Re-ordered questions so that they make more logical sense. In next-steps.MD, there is reference to what the learner should now know. This needs to be added here, to the review section. In the template file that was initially filled out, this should be there. So the review section starts with 'you now know how ...', i.e., referring back to the initial LOs and is then followed by questions to test the learning. --- .../restrict-keyword-c99/_review.md | 31 +++++++++---------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md index d9f0a8080..9284cf8ae 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md @@ -1,5 +1,18 @@ --- review: +- questions: + question: > + What does `restrict` do? 
+ answers: + - It increases the frequency of the CPU cores, making your program run faster + - It issues a command to clear the cache, leaving more space for your program + - It restricts the standard of the C library used to C99 + - It hints to the compiler that the memory pointed to by the parameter cannot be accessed by any other means inside a particular function except using this pointer + correct_answer: 4 + explanation: > + In order for the compiler to better schedule the instructions of a function, it needs to know if there are any + dependencies between the parameter variables. If there is no dependency, usually the compiler can group together instructions + increasing performance and efficiency. - questions: question: > Where is `restrict` placed in the code? @@ -10,21 +23,7 @@ review: correct_answer: 3 explanation: > `restrict` is placed in the arguments list of a function, between the * and the parameter name, like this: - `int func(char *restrict arg)` - - questions: - question: > - What does `restrict` do? - answers: - - It increases the frequency of the CPU cores, making your program run faster - - It issues a command to clear the cache, leaving more room for your program - - It restricts the standard of the C library used to C99 - - It hints to the compiler that the memory pointed to by the parameter, cannot be accessed through any other means inside the particular function except, using this pointer - correct_answer: 4 - explanation: > - In order for the compiler to better schedule the instructions of a function, it needs to know if there is any - dependency between the parameter variables. If there is no dependency, usually the compiler can group together instructions - increasing performance and efficiency. 
- + `int func(char *restrict arg)` - questions: question: > Which language supports `restrict` @@ -35,7 +34,7 @@ review: - Rust correct_answer: 3 explanation: > - `restrict` is a C-only keyword, it does not exist on C++ (`__restrict__` does, but it is not exactly the same) + `restrict` is a C-only keyword, it does not exist on C++ (`__restrict__` does, but it does not have the same function) From dd425e8d867f94b1de049d729f01bf55329a30ba Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Wed, 1 Nov 2023 09:47:49 +0000 Subject: [PATCH 12/35] Update _next-steps.md next_step_guidance: You should now be able to test the `restrict` keyword on your own or other open-source code and discover potential optimizations! - this sentence should be added to review.md as per comments already made in that commit. Next step guidance should be if there are other learning paths or documentation in, say, developer.arm.com that's useful to add to the newly acquired language. recommended_path: /learning-paths/embedded-systems/ - can we be more specific here. Is there a particular learning path that's helpful? Please can this be amended. further_reading: - resource: title: Wikipedia restrict entry link: https://en.wikipedia.org/wiki/Restrict type: documentation We do not recommend adding links to Wikipedia. It's too general and doesn't add to the learning path. - resource: title: Godbolt restrict tests - is this more useful as a next_step_guidance? What do you want the learner to be able to do with these tests? 
--- .../embedded-systems/restrict-keyword-c99/_next-steps.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md index ba23e557c..770b195a4 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md @@ -1,13 +1,11 @@ --- -next_step_guidance: You should now be able to test the `restrict` keyword on your own or other open-source code and discover potential optimizations! +next_step_guidance: recommended_path: /learning-paths/embedded-systems/ further_reading: - resource: - title: Wikipedia restrict entry - link: https://en.wikipedia.org/wiki/Restrict - type: documentation + - resource: title: Godbolt restrict tests link: https://godbolt.org/z/PxWxjc1oh From e124c04a5bf8e663fb3bb81388c2ad7424fe4e53 Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Wed, 1 Nov 2023 10:16:51 +0000 Subject: [PATCH 13/35] Update what-is-restrict.md grammatical amends --- .../restrict-keyword-c99/what-is-restrict.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md index e4d08eeb5..aadb81b9a 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md @@ -6,9 +6,9 @@ weight: 2 layout: learningpathall --- -## The problem: Overlapping memory regions as pointer arguments +## The problem: overlapping memory regions as pointer arguments -Before we go into detail of the `restrict` keyword, let's first demonstrate the problem. 
+Before we go into the detail of the `restrict` keyword, let's first demonstrate the problem. Let's consider this C code: ```C @@ -44,7 +44,7 @@ int main() { } ``` -So, there are 2 points to make here: +There are 2 points to make here: 1. `scaleVectors()` is the important function here, it scales two vectors by the same scalefactor `*C` 2. vector `a` overlaps with vector `b`. (`b = &a[2]`). @@ -56,7 +56,7 @@ a(after) : 2 4 12 16 b(after) : 12 16 10 12 ``` -Notice that after the scaling the contents of `a` are also affected by the scaling of `b` as their elements overlap in memory. +Notice that after the scaling, the contents of `a` are also affected by the scaling of `b` as their elements overlap in memory. We will include the assembly output of `scaleVectors` as produced by `clang-17 -O3`: @@ -97,18 +97,18 @@ scaleVectors: // @scaleVectors ret ``` -This doesn't look optimal. `scaleVectors` seems to be doing each load, multiplication, store in sequence, surely it can be further optimized? This is because the memory pointers are overlapping, let's try different assignments of `a` and `b` in `main()` to make them explicitly independent, perhaps the compiler can detect that and generate faster instructions to do the same thing. +This doesn't look optimal. `scaleVectors` seems to be doing each load, multiplication, and store in sequence. Surely it can be better optimized? Because the memory pointers are overlapping, let's try different assignments of `a` and `b` in `main()` to make them explicitly independent. Perhaps the compiler will detect that and generate faster instructions to do the same thing. ``` int64_t a[] = { 1, 2, 3, 4 }; int64_t b[] = { 5, 6, 7, 8 }; ``` -Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reason for this is that the compiler has no hint of the dependency between the two pointers used in the function so it has no choice than to assume that it has to process one element at a time. 
The function has no way of knowing with what arguments it is to be called. We see 8 instances of `mul`, which is correct but the number of loads and stores inbetween indicates that the CPU spends its time waiting for data to arrive from/to the cache. We need a way to be able to hint the compiler that it can assume the buffers passed are independent. +Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reason for this is that the compiler has no hint about the dependency between the two pointers used in the function so it has no choice but to assume that it has to process one element at a time. The function has no way of knowing what arguments it will be called with. We see 8 instances of `mul`, which is correct but the number of loads and stores in between indicates that the CPU spends its time waiting for data to arrive from/to the cache. We need a way to be able to tell the compiler that it can assume the buffers passed are independent. ## The Solution: restrict -This is what the C99 `restrict` keyword has come to solve. It instructs the compiler that the passed arguments are in no way dependant on each other and access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much better way. In essence it can group and schedule the loads and stores. As a note, `restrict` only works in C, not in C++. +This is what the C99 `restrict` keyword resolves. It instructs the compiler that the passed arguments are not dependent on each other and that access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much more efficient way. Essentially it can group and schedule the loads and stores. **Note**, `restrict` only works in C, not in C++.
Let's add `restrict` to `A` in the parameter list: ```C @@ -149,7 +149,7 @@ scaleVectors: // @scaleVectors ret ``` -We see an obvious reduction in the number of instructions, from 32 instructions down to 22! That's 68% of the original count, which is impressive on its own. One can easily see that the loads are grouped, as well as the multiplications. Of course, still 8 multiplications, that cannot change, but far fewer loads and stores as the compiler found the opportunity to use `LDP`/`STP` which load/store in pairs for the pointer `A`. +We see an obvious reduction in the number of instructions, from 32 instructions down to 22! That's 68% of the original count, which is impressive. One can easily see that the loads are grouped, as well as the multiplications. Of course, there are still 8 multiplications as that cannot change, but there are far fewer loads and stores as the compiler found the opportunity to use `LDP`/`STP` which load/store in pairs for the pointer `A`. Let's try adding `restrict` to `B` as well: ```C @@ -185,14 +185,14 @@ scaleVectors: // @scaleVectors ret ``` -Another reduction in the number of instructions, down to 17, for a total reduction to 53% the original count. This time, only 5 loads and 4 stores. And as before, all the loads/stores are paired (because the `LDP`/`STP` instructions are used). +There is another reduction in the number of instructions, this time down to 17 from the original 32. There are only 5 loads and 4 stores and, as before, all the loads/stores are paired (because the `LDP`/`STP` instructions are used). -It is interesting to see that in such an example, adding just the `restrict` keyword reduced our code size to almost half. This will have an obvious impact in performance and efficiency. +It is interesting to see that in such an example adding the `restrict` keyword reduced our code size to almost half. This will have an obvious impact in both performance and efficiency. ## What about SVE2? 
We have shown the obvious benefit of `restrict` in this function, on an armv8-a CPU, but we have new armv9-a CPUs out there with SVE2 as well as Neon/ASIMD. -Could the compiler generate better code in that case using `restrict`? To save time, the output without `restrict` is almost the same, however with `restrict` used, this is the result (we used `clang-17 -O3 -march=armv9-a`): +Could the compiler generate better code in that case using `restrict`? The output without `restrict` is almost the same, but with `restrict` used, this is the result (we used `clang-17 -O3 -march=armv9-a`): ``` scaleVectors: // @scaleVectors @@ -208,6 +208,6 @@ scaleVectors: // @scaleVectors ret ``` -This is just 10 instructions, only 31% of the original code size! The compiler made a great use of SVE2 features, combining the multiplications and reducing them to 4, at the same time grouping loads and stores down to 2 each. We have optimized our code more than 3x by only adding a C99 keyword! +There are just 10 instructions, 31% of the original code size! The compiler has made great use of the SVE2 features, combining the multiplications and reducing them to 4 and, at the same time, grouping loads and stores down to 2 each. We have optimized our code by more than 3x just by adding a C99 keyword. -We are going to look at another example next. \ No newline at end of file +We are now going to look at another example. 
From 1c4385a5460a95114c8f864eae15c3b75dcfdbc2 Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Wed, 1 Nov 2023 10:27:11 +0000 Subject: [PATCH 14/35] Update restrict-example-sve2.md grammatical amends --- .../restrict-keyword-c99/restrict-example-sve2.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md index 947f0e4a5..a92be36ad 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/restrict-example-sve2.md @@ -18,7 +18,7 @@ void process_data (const char *in, char *out, size_t size) } ``` -This example will be easier to demonstrate with SVE2, and we found gcc 13 to have a better result than clang, this is the output of `gcc-13 -O3 -march=armv9-a`: +This example will be easier to demonstrate with SVE2. We found gcc 13 to have a better result than clang; this is the output of `gcc-13 -O3 -march=armv9-a`: ``` process_data: @@ -55,7 +55,7 @@ process_data: ret ``` -Do not worry about each instruction in the assembly here, but notice that gcc has added 2 loops, one that uses the SVE2 `while*` instructions to the processing (.L4) and one scalar loop (.L3). The latter is executed in case theis any pointer aliasing -if there is any overlap between the memory pointers basically. Let's try adding `restrict` to pointer `in`: +Do not worry about each instruction in the assembly here, but notice that gcc has added 2 loops, one that uses the SVE2 `while*` instructions to the processing (.L4) and one scalar loop (.L3). The latter is executed in case there is any pointer aliasing (basically, if there is any overlap between the memory pointers). 
Let's try adding `restrict` to pointer `in`:
 
 ```C
 void process_data (const char *restrict in, char *out, size_t size)
@@ -85,11 +85,10 @@ process_data:
 ret
 ```
 
-This is a huge improvement! Code size reduction is down from 30 lines to 14, less than half the original size. In both cases, you will note that the main loop (`.L4` in the former case, `.L3` in the latter) is exactly the same, but the entry and exit code of the function are very much simplified. The compiler was able to distinguish that the memory pointed by `in` does not overlap with memory pointed by `out`, it was able to simplify the code by eliminating the scalar loop and remove the associated code that checked if it needed to enter it.
+This is a huge improvement! The code size is down from 30 lines to 14, less than half of the original size. In both cases, note that the main loop (`.L4` in the former case, `.L3` in the latter) is exactly the same, but the entry and exit code of the function is very much simplified. The compiler was able to determine that the memory pointed to by `in` does not overlap with the memory pointed to by `out`, so it was able to simplify the code by eliminating the scalar loop and removing the associated code that checked whether it needed to enter it.
 
-But I can almost hear the question: "Why is that important if the main loop is still the same?"
-And it is a right question. The answer is this:
+Why is this important if the main loop is still the same?
 
 If your function is going to be called once and run over tens of billions of elements, then saving a few instructions before and after the main loop does not really matter.
 
-But if your function is called on smaller sizes millions or even *billions* of times, then saving a few instructions in this function means we are saving a few *billions* of instructions total, which means less time to spend running on the CPU and less energy wasted.
+But, if your function is going to be called on smaller sizes, millions or even *billions* of times, then saving a few instructions in this function means we are saving a few *billions* of instructions in total, which means less time spent running on the CPU and less energy wasted.

From 5f7b94a129c1eebbe8d45d42b0cc7274138c185d Mon Sep 17 00:00:00 2001
From: Liz Warman <81630105+lizwar@users.noreply.github.com>
Date: Wed, 1 Nov 2023 10:46:48 +0000
Subject: [PATCH 15/35] Update when-to-use-restrict.md

Grammatical amends. Can we add some explanation as to why the example doesn't
benefit from restrict. I think that a sentence or 2 would help to make the
example more useful.
---
 .../restrict-keyword-c99/when-to-use-restrict.md | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md
index 65f617558..8b3caf9e3 100644
--- a/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md
+++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md
@@ -6,13 +6,11 @@ weight: 4
 layout: learningpathall
 ---
 
-## So, when can we use restrict?
+When can we use `restrict` or, put differently, how do we recognize that we need `restrict` in our code?
 
-This is all very good, but when can we use it? Or put differently, how to recognize we need `restrict` in our code?
+`restrict` as a pointer attribute is rather easy to test. As a rule of thumb, if the function includes one or more pointers to memory objects as arguments, we can use `restrict` if we are certain that the memory pointed to by these pointer arguments does not overlap and there is no other way to access them in the body of the function, except by the use of those pointers, i.e., there is no other global pointer or some other indirect way to access these elements. 
-`restrict` as a pointer attribute is rather easy to test. As a rule of thumb, if our function includes one or more pointers to memory objects as arguments, we can use `restrict` if we are certain that the memory pointed by those pointer arguments does not overlap and there is no other way to access it in the body of the function, except by the use of those pointers -eg. there is no other global pointer, or some other indirect way to access these elements.
-
-Let's show a coutner-example:
+Let's show a counter-example:
 
 ```
 int A[10];
@@ -30,6 +28,6 @@ int main() {
 }
 ```
 
-This example does not not benefit from `restrict` at all in both gcc and clang.
+This example does not benefit from `restrict` in either gcc or clang.
 
-However, there are plenty of cases that are candidates for the `restrict` optimization. And it's safe and easy to try. Nevertheless, even if it looks like a good candidate, it is still possible that the compiler will not detect a pattern that is suited for optimization and we might not see any reduction in the code or speed gain. It is up to the compiler, some cases clang handles better or differently than gcc, and vice versa, and that even depends on the version. If you have a particular piece of code that falls in the above criteria that you would care to optimize, before you attempt to refactor it completely, or rewrite it in assembly or use any SIMD instructions, it might be worth a shot to try `restrict`. Even saving a couple of instructions in a critical loop function is worth having to add just one keyword!
\ No newline at end of file
+However, there are plenty of cases that are candidates for the `restrict` optimization. It's safe and easy to try but, even if it looks like a good candidate, it is still possible that the compiler will not detect a pattern that is suited for optimization and we might not see any reduction in the code or speed gain. 
It is up to the compiler; in some cases clang handles this better or differently from gcc, and vice versa, and this will also depend on the version. If you have a particular piece of code that you would like to optimize, before you attempt to refactor it completely, rewrite it in assembly or use any SIMD instructions, it might be worth trying `restrict`. Even saving a couple of instructions in a critical loop function is worth the addition of a single keyword.

From 891b8bd76872df8f1e5406d0f4b332d2ca0e7006 Mon Sep 17 00:00:00 2001
From: pareenaverma
Date: Wed, 1 Nov 2023 11:45:10 -0400
Subject: [PATCH 16/35] Update _review.md

---
 .../embedded-systems/restrict-keyword-c99/_review.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md
index 9284cf8ae..cf97d161c 100644
--- a/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md
+++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_review.md
@@ -13,7 +13,7 @@ review:
            In order for the compiler to better schedule the instructions of a function, it needs to know if there are any dependencies between the parameter variables. If there is no dependency, usually the compiler can group together instructions increasing performance and efficiency.
 
-  - questions:
+    - questions:
         question: >
            Where is `restrict` placed in the code?
answers: @@ -24,7 +24,7 @@ review: explanation: > `restrict` is placed in the arguments list of a function, between the * and the parameter name, like this: `int func(char *restrict arg)` - - questions: +- questions: question: > Which language supports `restrict` answers: From 5db2e0bb9853757fda557f7e782d213b706acf28 Mon Sep 17 00:00:00 2001 From: pareenaverma Date: Wed, 1 Nov 2023 12:11:12 -0400 Subject: [PATCH 17/35] Update _next-steps.md --- .../embedded-systems/restrict-keyword-c99/_next-steps.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md index 770b195a4..16c3acfb4 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_next-steps.md @@ -1,13 +1,16 @@ --- -next_step_guidance: +next_step_guidance: You should now be able to test the `restrict` keyword in your own code. Why not explore these other embedded software learning paths. recommended_path: /learning-paths/embedded-systems/ further_reading: - resource: + title: How to use the restrict qualifier in C + link: https://www.oracle.com/solaris/technologies/solaris10-cc-restrict.html + type: blog - resource: - title: Godbolt restrict tests + title: Explore the usage of restrict with Godbolt link: https://godbolt.org/z/PxWxjc1oh type: website From 8460536b806144c12b0adcb3f83543b8634035b1 Mon Sep 17 00:00:00 2001 From: David Spickett Date: Thu, 2 Nov 2023 10:14:48 +0000 Subject: [PATCH 18/35] Revert "Ignore unformatted text blocks that are not scripts or files" This reverts commit abbb94060bb9c852f740b733210cae1f995a6f21. Due to causing existing learning paths to fail tests. I've opened issue #562 to discuss the larger issue here, which I did not understand when I did this change. 
---
 tools/parse.py | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tools/parse.py b/tools/parse.py
index 9ae2a7f26..1950e5301 100644
--- a/tools/parse.py
+++ b/tools/parse.py
@@ -212,15 +212,11 @@ def save(article, cmd, learningpath=False, img=None):
         # for other types, we're assuming source code
         # check if a file name is specified
         else:
-            # check for a file name
+            content[i_idx] = {"type": l[0]}
+            # check file name
             if "file_name" in l[0]:
-                content[i_idx] = {"type": l[0]}
                 fn = l[0].split("file_name=\"")[1].split("\"")[0]
                 content[i_idx].update({"file_name": fn })
-            else:
-                # No file name means it is some unformatted text, rather than
-                # some executable script or code. Skip it.
-                continue
 
             for j_idx,j in enumerate(l[1:]):
                 content[i_idx].update({j_idx: j})

From 3839fc9b5a3881a27e5065ac3092b78043cd1622 Mon Sep 17 00:00:00 2001
From: David Spickett
Date: Thu, 26 Oct 2023 11:13:12 +0100
Subject: [PATCH 19/35] Dynamic Memory Allocator Learning Path

This learning path is for folks who know some C programming and have
maybe used malloc and free before. It gives them a tour of a simple
dynamic memory allocator and discusses aspects of how it works and how
it could be improved.

The overall goal is to demystify the subject so there is a lower barrier
to entry if folks want to get onto more advanced memory allocation topics.
For example, in future I would like to build on this by adding Arm Memory
Tagging to the allocator in a separate learning path.

The concepts are universal but the build instructions are for a Linux
system, so I've put it in the laptops-and-desktops category. 
--- .../1_dynamic_memory_allocation.md | 187 +++++++++ .../2_designing_a_dynamic_memory_allocator.md | 169 +++++++++ ...implementing_a_dynamic_memory_allocator.md | 357 ++++++++++++++++++ .../4_conclusions_further_work.md | 181 +++++++++ .../dynamic-memory-allocator/_index.md | 31 ++ .../dynamic-memory-allocator/_next-steps.md | 22 ++ .../dynamic-memory-allocator/_review.md | 78 ++++ 7 files changed, 1025 insertions(+) create mode 100644 content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/1_dynamic_memory_allocation.md create mode 100644 content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md create mode 100644 content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md create mode 100644 content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/4_conclusions_further_work.md create mode 100644 content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_index.md create mode 100644 content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_next-steps.md create mode 100644 content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_review.md diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/1_dynamic_memory_allocation.md b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/1_dynamic_memory_allocation.md new file mode 100644 index 000000000..1816354e9 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/1_dynamic_memory_allocation.md @@ -0,0 +1,187 @@ +--- +title: Dynamic Memory Allocation +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Dynamic vs. Static Allocation + +In this learning path you will learn how to implement dynamic memory allocation. +If you have used C's "heap" (`malloc`, `free`, etc.) before, that is one example +of dynamic memory allocation. 
+
+It allows programs to allocate memory while they are running, without knowing
+at build time how much memory they will need. This is in contrast to static
+memory allocation, where the amount is known at build time.
+
+```C
+#include <stdlib.h>
+
+void fn() {
+  // Static allocation
+  int a = 0;
+  // Dynamic allocation
+  int *b = malloc(sizeof(int));
+}
+```
+
+The example above shows the difference. The size and location of `a` are known
+when the program is built. The size of `b` is also known, but its location is not.
+
+It may never even be allocated, as this pseudocode example shows:
+
+```C
+int main(...) {
+  if (/*user has passed some argument*/) {
+    int *b = malloc(sizeof(int));
+  }
+}
+```
+
+If the user passes no arguments to the program, there's no need to allocate space
+for `b`. If they do, `malloc` will find space for it.
+
+## malloc
+
+The C standard library provides a special function
+[`malloc`](https://en.cppreference.com/w/c/memory/malloc). `m` for "memory",
+`alloc` for "allocate". This can be used to ask for a suitably sized memory
+location while the program is running.
+
+```C
+void *malloc(size_t size);
+```
+
+The C library will then look for a chunk of memory of at least `size`
+bytes within a larger region of memory that it has reserved. For instance, on Ubuntu
+Linux, this will be done by GLIBC.
+
+The example at the top of the page is, of course, trivial. As it is, we could just
+statically allocate both integers like this:
+```C
+void fn() {
+  int a = 0, b = 0;
+}
+```
+
+That's ok if this data is never returned from this function; in other
+words, if the lifetime of the data is equal to that of the function.
+
+A more complicated example will show you when that is not the case, and the value
+lives longer than the function that created it.
+
+```C
+#include <stdlib.h>
+
+typedef struct Entry {
+  int data;
+  // NULL if end of list, next entry otherwise.
+  struct Entry* next;
+} Entry;
+
+void add_entry(Entry *entry, int data) {
+  // New entry, which becomes the end of the list.
+  Entry *new_entry = malloc(sizeof(Entry));
+  new_entry->data = data;
+  new_entry->next = NULL;
+
+  // Previous tail now points to the newly allocated entry.
+  entry->next = new_entry;
+}
+```
+
+What you see above is a struct `Entry` that defines a singly-linked-list entry.
+"Singly" meaning that you can go forward via `next`, but you cannot go backwards
+in the list. There is some data `data`, and each entry points to the next entry,
+`next`, assuming there is one (it will be `NULL` for the end of the list).
+
+`add_entry` makes a new entry and adds it to the end of the list.
+
+Think about how you would use these functions. You could start with some known
+size of list, like a global variable for the head (first entry)
+of your list.
+
+```C
+Entry head = {.data = 123, .next=NULL};
+```
+
+Now you want to add another `Entry` to this list at runtime. So you do not know
+ahead of time what it will contain, or indeed whether you will add it at all. Where
+would you put that entry?
+
+* If it is another global variable, you would have to declare many empty `Entry`s
+  and hope you never need more than that amount.
+
+{{% notice Other Allocation Techniques%}}
+Although in this specific case global variables aren't a good solution, there are
+cases where large sets of pre-allocated objects can be beneficial. For example,
+it provides a known upper bound of memory usage and makes the timing of each
+allocation predictable.
+
+However, we will not be covering these techniques in this learning path. It will
+be useful to think about them after you have completed this learning path.
+{{% /notice %}}
+
+* If it is in a function's stack frame, that stack frame will be reclaimed and
+  modified by future functions, corrupting the new `Entry`.
+
+
Which is why the `add_entry` +shown above calls `malloc`. The resulting pointer points to somewhere not in +the program's global data section or in any function's stack space, but in the +heap memory. Where it can live until we `free` it. + +## free + +You cannot ask malloc for memory forever. Eventually that space behind the scenes +will run out. So you should give up your dynamic memory once it is not needed, +using [`free`](https://en.cppreference.com/w/c/memory/free). + +```C +void free(void *ptr); +``` + +You call `free` with a pointer previously given to you by `malloc`, and this tells +the heap that we no longer need this memory. + +{{% notice Undefined Behaviour%}} +You may wonder what happens if you don't pass the exact pointer to `free`, as +`malloc` returned to you. The result varies as this is "undefined behaviour". +Which essentially means a large variety of unexpected things can happen. + +In practice, many allocators will tolerate this difference or reject it outright +if it's not possible to do something sensbile with the pointer. + +Remember that just because one allocator handles this a certain way, does not +mean all will. Indeed, that same allocator may handle it differently for +different allocations within the same program. +{{% /notice %}} + +So, you can use `free` to remove an item from your linked list. + +```C +void remove_entry(Entry* previous, Entry* entry) { + // NULL checks skipped for brevity. + previous->next = entry->next; + free(entry); +} +``` + +`remove_entry` makes the previous entry point to the entry after the one we want +to remove, so that the list skips over it. With `entry` now isolated we call +`free` to give up the memory it occupies. 
```text
+----- List ------ | - Heap --
+[A] -> [B] -> [C] | [A][B][C]
+                  |
+[A]    [B]    [C] | [A][B][C]
+ |-------------^  |
+                  |
+[A]---------->[C] | [A]   [C]
+```
+
+That covers the high level how and why of using `malloc` and `free`; next you'll
+see a possible implementation of a dynamic memory allocator.
\ No newline at end of file
diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md
new file mode 100644
index 000000000..4b7b226d3
--- /dev/null
+++ b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md
@@ -0,0 +1,169 @@
+---
+title: Designing a Dynamic Memory Allocator
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## High Level Design
+
+To begin with, decide which functions your memory allocator will provide. We
+have described `malloc` and `free`; there are more provided by the
+[C library](https://en.cppreference.com/w/c/memory).
+
+This learning path assumes you just need `malloc` and `free`. Start with those and write
+out their behaviours, as the programmer using your allocator will see them.
+
+There will be a function, `malloc`. It will:
+* Take a size in bytes as a parameter.
+* Try to allocate some memory.
+* Return a pointer to that memory, or a NULL pointer otherwise.
+
+There will be a function `free`. It will:
+* Take a pointer to some previously allocated memory as a parameter.
+* Mark that memory as available for future allocations.
+
+From this you can see that you will need:
+* Some large chunk of memory, the "backing storage".
+* A way to mark parts of that memory as allocated, or available for allocation.
+
+## Backing Storage
+
+The memory can come from many sources. It can even change size throughout the
+program's execution if you wish. For your allocator you'll keep it as simple
+as possible.
+
+A single, statically allocated global array of bytes will be your backing
+storage. So you can do dynamic allocation of parts of a statically allocated
+piece of memory.
+
+```C
+#define STORAGE_SIZE 4096
+static char storage[STORAGE_SIZE];
+```
+
+## Record Keeping
+
+This backing memory needs to be annotated somehow to record what has been
+allocated so far. There are many, many ways to do this, with the biggest choice
+being whether to store these records in the heap itself, or outside of it.
+
+We will not go into those tradeoffs here, and instead you will put the records
+in the heap, as this is relatively simple to do.
+
+What should be in your records? Think about what question the software will ask:
+can you give me a pointer to an area of free memory of at least this size?
+
+For this you will need to know:
+* Which ranges of the backing storage have been allocated or not.
+* How large each of these ranges is. This includes free areas.
+
+A "range" is a pointer to a location, a size in bytes and a boolean to say
+whether the range is free or allocated. So a range from 0x123 of 345 bytes,
+that has been allocated, would be:
+
+```text
+start: 0x123 size: 345 allocated: true
+```
+
+For the initial state of a heap of size `N`, you will have one range of
+unallocated memory.
+
+```text
+Pointer: 0x0 Size: N Allocated: False
+```
+
+When an allocation is made you will split this free range into 2 ranges: the
+first part is the new allocation, the second the remaining free space. If 4 bytes
+were to be allocated:
+
+```text
+Pointer: 0x0 Size: 4 Allocated: True
+Pointer: 0x4 Size: N-4 Allocated: False
+```
+
+The next time you need to allocate, you will walk these ranges until you find
+one with enough free space, and repeat the splitting process.
+
+The walk works like this. Starting from the first range, add the size of that
+range to the address of that range. This new address is the start of the next
+range. 
Repeat until the resulting address is beyond the end of the heap.
+
+```text
+range = 0x0;
+
+Pointer: 0x0 Size: 4 Allocated: False
+
+range = 0x0 + 4 = 0x4;
+
+Pointer: 0x4 Size: N-4 Allocated: False
+
+range = 0x4 + (N-4) = 1 beyond the end of the heap, so the walk is finished.
+```
+
+`free` uses the pointer given to it to find the range it needs to deallocate.
+Let's say the 4 byte allocation was freed:
+
+```text
+Pointer: 0x0 Size: 4 Allocated: False
+Pointer: 0x4 Size: N-4 Allocated: False
+```
+
+Since `free` gets a pointer directly to the allocation you know exactly which
+range to modify. The only change made is to the boolean which marks it as
+allocated or not. The location and size of the range stay the same.
+
+{{% notice Merging Free Ranges%}}
+The allocator presented here will not merge free ranges like the 2 above. This
+is a deliberate limitation and addressing this is discussed later.
+{{% /notice %}}
+
+## Record Storage
+
+You'll keep these records in the heap, which means using some of the allocated
+space for them on top of the allocation itself.
+
+The simplest way to do this is to prepend each allocation with the range
+information. This way you can skip from the start of one range to another with
+ease.
+
+```text
+0x00: [ptr, size, allocated] <-- The range information
+0x08: <...>                  <-- The pointer malloc returns
+0x10: [ptr, size, allocated] <-- Information about the second range
+<...and so on until the end of the heap...>
+```
+
+Pointers returned by `malloc` are offset to just beyond the range information.
+When `free` receives a pointer, it can get to the range information by
+subtracting the size of that information from the pointer. Using the example
+above:
+
+```text
+free(my_ptr);
+
+0x00: [ptr, size, allocated] <-- my_ptr - sizeof(range information)
+0x08: <...>                  <-- my_ptr
+```
+
+{{% notice Data Alignment%}}
+When an allocator needs to produce addresses with a specific alignment, the
+calculations above must be adjusted. 
The allocator presented here does not +concern itself with alignment, which is why it can do a simple subtraction. +{{% /notice %}} + +## Running Out Of Space + +The final thing an allocator must do is realise it has run out of space. This is +simply achieved by knowing the bounds of the backing storage. + +```C +#define STORAGE_SIZE 4096 +static char storage[STORAGE_SIZE]; +// If our search reaches this point, there is no free space to allocate. +static const char *storage_end = storage + STORAGE_SIZE; +``` + +If you are walking the heap and the start of the next range would be greater +than or equal to `storage_end`, you have run out of memory to allocate. \ No newline at end of file diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md new file mode 100644 index 000000000..3169ecdce --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md @@ -0,0 +1,357 @@ +--- +title: Implementing a Dynamic Memory Allocator +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Project Structure + +The file layout will be as follows: +* `CMakeLists.txt` - To tell `cmake` how to configure the project. +* `heap.c` - The dynamic memory allocator implementation. +* `heap.h` - Function declarations including your new `simple_malloc` and + `simple_free` functions. +* `main.c` - A program that makes use of `simple_malloc` and `simple_free`. + +Building it will produce a single binary, `demo`, that you will run to see the +results. 
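Before reading the full sources, the range walk and bounds check described in the design section can be sketched in isolation (this fragment is illustrative only and is not one of the project's files; `count_ranges` is a hypothetical helper):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Illustrative sketch only: walk the ranges by adding each range's size to
// its address, stopping once we reach the end of the backing storage.
#define STORAGE_SIZE 4096
static char storage[STORAGE_SIZE];
static const char *storage_end = storage + STORAGE_SIZE;

// Count how many ranges the heap currently contains. Assumes each range
// begins with its size in bytes, stored as a uint64_t, as in the design
// described above.
size_t count_ranges(void) {
  size_t count = 0;
  for (const char *p = storage; p < storage_end; count++) {
    uint64_t size;
    memcpy(&size, p, sizeof(size)); // read the size at the start of the range
    if (size == 0)
      break; // uninitialised heap; stop rather than loop forever
    p += size;
  }
  return count;
}
```

If the next range would start at or beyond `storage_end`, the walk stops; in the real allocator that is the "out of memory" case.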
+
+## Sources
+
+### CMakeLists.txt
+
+``` {file_name="CMakeLists.txt"}
+cmake_minimum_required(VERSION 3.15)
+
+project(MemoryAllocatorDemo C)
+
+add_executable(demo main.c heap.c)
+```
+
+### heap.h
+
+```C {file_name="heap.h"}
+#include <stddef.h>
+
+// Call once at the start of main() to initialise the empty heap.
+// This is the equivalent of what your C library is doing before main() for the
+// system heap.
+void simple_heap_init();
+
+void *simple_malloc(size_t size);
+
+void simple_free(void *ptr);
+```
+
+### heap.c
+
+Please refer to the comments in the source code here for detailed explanations
+of each function. We will cover the key elements here up front.
+
+First is `storage`, the backing storage, which is a global char array.
+This is where the ranges, represented by `Header`, are stored.
+
+Each `Header` is written to the start of the allocated range. This means that
+`malloc` returns a pointer that points just beyond this location. `free`, on the
+other hand, subtracts the size of `Header` from the pointer parameter to find the
+range information.
+
+When the heap is initialised with `simple_heap_init`, a single range is set up
+that covers the whole heap and marks it as unallocated.
+
+To find a free range, `find_free_space` walks the heap using these `Header`
+values until it finds a large enough free range, or gets beyond the end of the
+heap.
+
+For the first allocation the job is straightforward. There's one range and it's
+all free. Split that into 2 ranges, using the first for the allocation.
+
+On subsequent allocations there will be more header values to read, but the
+logic is the same.
+
+{{% notice Addresses in Logging%}}
+The logging enabled by `log_events` may not have deterministic output
+on systems where features like Address Space Layout Randomisation (ASLR) are
+enabled. Generally, the output addresses may change from run to run. Focus on the
+relative values of pointers in relation to where the heap starts and ends.
+{{% /notice %}}
+
+```C {file_name="heap.c"}
+#include <assert.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+
+// Enable logging of heap events and current memory ranges.
+static bool log_events = true;
+
+// printf but can be globally disabled by setting log_events to false.
+static void log_event(const char *fmt, ...) {
+  if (log_events) {
+    va_list args;
+    va_start(args, fmt);
+    vprintf(fmt, args);
+    va_end(args);
+  }
+}
+
+// We will allocate memory from this statically allocated array. If you are on
+// Linux this could be made dynamic by getting it from mmap.
+#define STORAGE_SIZE 4096
+static char storage[STORAGE_SIZE];
+// If our search reaches this point, there is no free space to allocate.
+static const char *storage_end = storage + STORAGE_SIZE;
+
+// The heap is divided into ranges, initially only 1 that covers the whole heap
+// and is marked free. A header is the number of bytes in the range, and a
+// single bit to say whether it is free or allocated.
+typedef struct {
+  uint64_t size : 63;
+  bool allocated : 1;
+} Header;
+// This header is placed at the start of each range, so we will return pointers
+// that point to the first byte after it.
+_Static_assert(sizeof(Header) == sizeof(uint64_t));
+
+void log_header(Header header) {
+  log_event("0x%016lx (%s, size = %u bytes)\n", header,
+            header.allocated ? "allocated" : "free", header.size);
+}
+
+static Header read_header(const char *ptr) { return *(Header *)ptr; }
+
+static void write_header(char *ptr, Header header) {
+  *(Header *)ptr = header;
+  log_event("[%p] Set header to ", ptr);
+  log_header(header);
+}
+
+// Log a table showing the ranges currently marked in the heap.
+static void log_ranges() { + Header header = {.size = 0, .allocated = false}; + for (const char *header_ptr = storage; header_ptr < storage_end; + header_ptr += header.size) { + header = read_header(header_ptr); + log_event(" [%p -> %p) : ", header_ptr, header_ptr + header.size); + log_header(header); + } +} + +void simple_heap_init() { + log_event("Simple heap init:\n"); + log_event("Storage [%p -> %p) (%d bytes)\n", storage, storage_end, + STORAGE_SIZE); + + // On startup, all the heap is one free range. + Header hdr = {.size = STORAGE_SIZE, .allocated = false}; + write_header(storage, hdr); + log_ranges(); +} + +// Search for a free range that has at least `bytes` of space +// (callers should include the header size). +static char *find_free_space(size_t bytes) { + Header header = {.size = 0, .allocated = false}; + for (char *header_ptr = storage; header_ptr < storage_end; + header_ptr += header.size) { + header = read_header(header_ptr); + assert(header.size != 0 && "Header should always have non-zero size."); + if (!header.allocated && (header.size >= bytes)) + return header_ptr; + } + + return NULL; +} + +// Take an existing free range and split it such that there is a `bytes` sized +// range at the start and a new, smaller, free range after that. +static void split_range(char *range, uint64_t size) { + Header original_header = read_header(range); + assert(!original_header.allocated && + "Shouldn't be splitting an allocated range."); + + // Mark what we need as allocated. + Header new_header = {.size = size, .allocated = true}; + write_header(range, new_header); + + // The following space is free and needs a new header to say so. + uint64_t remaining = original_header.size - size; + if (remaining) { + Header free_header = {.size = remaining, .allocated = false}; + write_header(range + size, free_header); + } +} + +// Attempt to allocate `size` bytes of memory. Returns NULL for 0 sized +// allocations or when we have run out of heap memory. 
The size passed here does
+// not include header size, which is an internal detail. So the returned pointer
+// will be sizeof(Header) further forward than the start of the range used.
+void *simple_malloc(size_t size) {
+  if (!size)
+    return NULL;
+
+  log_event("\nTrying to allocate %ld bytes\n", size);
+
+  // Extra space to include the header.
+  uint64_t required_size = size + sizeof(Header);
+  char *allocated = find_free_space(required_size);
+
+  if (!allocated) {
+    log_event("Heap exhausted.\n");
+    return NULL;
+  }
+
+  // Split the found range into this new allocation and a new free range after
+  // it.
+  split_range(allocated, required_size);
+
+  // Return a pointer to after the header.
+  allocated += sizeof(Header);
+
+  log_event("[%p] Memory was allocated, size %ld bytes\n", allocated, size);
+  log_ranges();
+  return allocated;
+}
+
+// Free the allocation pointed to by ptr. This simply sets the range to free,
+// it does not change its size or any of its contents.
+void simple_free(void *ptr) {
+  if (!ptr)
+    return;
+
+  assert(((char*)ptr > storage) && ((char*)ptr < storage_end) &&
+         "Trying to free pointer that is not within the heap.");
+
+  log_event("\n[%p] Freeing allocation\n", ptr);
+
+  // This will point to after the header of the range it's in, so we must walk
+  // back a bit.
+  char *header_ptr = (char *)ptr - sizeof(Header);
+
+  Header header = read_header(header_ptr);
+  assert(header.size != 0 && "Can't free an allocation of zero size.");
+
+  // Mark this range as free, leave the size unchanged.
+
+  header.allocated = false;
+  write_header(header_ptr, header);
+
+  log_event("[%p] Memory was freed\n", ptr);
+  log_ranges();
+}
+```
+
+### main.c
+
+```C { file_name="main.c"}
+#include "heap.h"
+
+int main() {
+  simple_heap_init();
+
+  char *ptr = simple_malloc(100);
+  char *ptr2 = simple_malloc(240);
+  char *ptr3 = simple_malloc(256);
+  char *ptr4 = simple_malloc(333);
+  simple_free(ptr2);
+  simple_free(ptr3);
+  char *ptr5 = simple_malloc(300);
+
+  return 0;
+}
+```
+
+The code here does allocation and deallocation of memory. This tests the heap
+code but also highlights an interesting problem that you'll see more about later.
+
+## Building
+
+First, install the dependencies:
+
+```bash
+sudo apt install -y cmake ninja-build
+```
+
+Then configure the project using CMake. We recommend a Debug build for the extra
+safety the asserts bring:
+
+```bash
+cmake . -DCMAKE_BUILD_TYPE=Debug -G Ninja
+```
+
+Then build with `ninja`:
+
+```bash
+ninja
+```
+
+This should result in a `demo` executable in the same folder. Run it to see
+the allocator in action:
+
+```bash
+./demo
+```
+
+## Output
+
+The output addresses will vary depending on where backing memory gets allocated
+by your system, but this is the general form you should expect:
+
+```text
+Simple heap init:
+Storage [0x559871a24040 -> 0x559871a25040) (4096 bytes)
+[0x559871a24040] Set header to 0x0000000000001000 (free, size = 4096 bytes)
+  [0x559871a24040 -> 0x559871a25040) : 0x0000000000001000 (free, size = 4096 bytes)
+```
+
+The addresses on the left usually refer to an action. In this case, we've set
+a `Header` value at `0x559871a24040`.
+
+The list in the last lines is the set of ranges you would see if you walked the
+heap: exactly what the allocator sees. The use of `[` followed by `)`
+means that the start address is included in the range, but the end address is
+not. This is the initial heap state where everything is free.
+
+Next, there is a call to `simple_malloc(100)`, which produces:
+
+```text
+Trying to allocate 100 bytes
+[0x55e68c41f040] Set header to 0x800000000000006c (allocated, size = 108 bytes)
+[0x55e68c41f0ac] Set header to 0x0000000000000f94 (free, size = 3988 bytes)
+[0x55e68c41f048] Memory was allocated, size 100 bytes
+  [0x55e68c41f040 -> 0x55e68c41f0ac) : 0x800000000000006c (allocated, size = 108 bytes)
+  [0x55e68c41f0ac -> 0x55e68c420040) : 0x0000000000000f94 (free, size = 3988 bytes)
+```
+
+You see that a request was made for 100 bytes and the allocator decided to split
+the single free range into two, updating the header information of both new
+ranges.
+
+Note that although it says `[0x55e68c41f048] Memory was allocated`, you do not
+see a range starting from this address. This is because this address is the one
+returned to the user. Subtract the size of `Header` from this address and you get
+the start of the range, which is `0x55e68c41f040`, as shown in the first range in
+the list.
+
+You'll also notice that the allocated range is 8 bytes bigger than what the user
+asked for. This is because it includes the `Header` at the start.
+
+If you skip ahead to after the `free` calls have been made, you will see:
+
+```text
+[0x55e68c41f1ac] Freeing allocation
+[0x55e68c41f1a4] Set header to 0x0000000000000108 (free, size = 264 bytes)
+[0x55e68c41f1ac] Memory was freed
+  [0x55e68c41f040 -> 0x55e68c41f0ac) : 0x800000000000006c (allocated, size = 108 bytes)
+  [0x55e68c41f0ac -> 0x55e68c41f1a4) : 0x00000000000000f8 (free, size = 248 bytes)
+  [0x55e68c41f1a4 -> 0x55e68c41f2ac) : 0x0000000000000108 (free, size = 264 bytes)
+  [0x55e68c41f2ac -> 0x55e68c41f401) : 0x8000000000000155 (allocated, size = 341 bytes)
+  [0x55e68c41f401 -> 0x55e68c420040) : 0x0000000000000c3f (free, size = 3135 bytes)
+```
+
+This shows that the second and third allocations were freed, and there is
+still a large range of free memory at the end.
+
+Try to understand what the final allocation result is.
Is the choice of location
+expected or would you expect it to fit elsewhere in the heap?
\ No newline at end of file
diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/4_conclusions_further_work.md b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/4_conclusions_further_work.md
new file mode 100644
index 000000000..f0d69fe3d
--- /dev/null
+++ b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/4_conclusions_further_work.md
@@ -0,0 +1,181 @@
+---
+title: Conclusions
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Conclusions
+
+You've now had a glimpse into the world of dynamic memory allocation, and
+probably have more questions than answers. You may have noticed some oversights
+in the implementation presented, and you're almost certainly right; we'll get to
+those shortly.
+
+Overall, your takeaway from this material is that "dynamic" memory allocation
+can mean many things. Sometimes it is all dynamic, sometimes it is a dynamic
+face with a static allocation behind it. This will change depending on the
+performance and complexity needs of the application.
+
+Fundamentally, it provides a way to get memory that you did not know you would
+need when the program was written. You knew you would need some non-zero amount,
+and dynamic allocation lets you ask for it while the program is running.
+
+The implementation shown here is a "classic" heap, and a very simple one at that
+(not quite minimal, look up "bump allocator" for that).
+
+Memory allocation is a whole field of study, and you can use this implementation
+as a base for further research if you wish.
+
+## Further Work
+
+### Merging Free Ranges
+
+Look again at the last logging example on the previous page.
+
+```text
+[0x55e68c41f1ac] Memory was freed
+  [0x55e68c41f040 -> 0x55e68c41f0ac) : 0x800000000000006c (allocated, size = 108 bytes)
+  [0x55e68c41f0ac -> 0x55e68c41f1a4) : 0x00000000000000f8 (free, size = 248 bytes)
+  [0x55e68c41f1a4 -> 0x55e68c41f2ac) : 0x0000000000000108 (free, size = 264 bytes)
+  [0x55e68c41f2ac -> 0x55e68c41f401) : 0x8000000000000155 (allocated, size = 341 bytes)
+  [0x55e68c41f401 -> 0x55e68c420040) : 0x0000000000000c3f (free, size = 3135 bytes)
+```
+
+What's wrong with these ranges? Nothing, until you try to allocate something
+larger than 256 bytes, which needs a range of more than 264 bytes once its
+`Header` is included. The two adjacent free ranges starting at `0x55e68c41f0ac`
+span 512 bytes in total, which would be plenty, but because we treat them as
+separate ranges, such an allocation fits in neither of them.
+
+```text
+  [0x55e68c41f0ac -> 0x55e68c41f1a4) : 0x00000000000000f8 (free, size = 248 bytes)
+  [0x55e68c41f1a4 -> 0x55e68c41f2ac) : 0x0000000000000108 (free, size = 264 bytes)
+```
+
+To solve this, you would need some kind of cleanup step after a free, where
+free ranges next to each other are merged into one.
+
+Then again, this does add some overhead. Perhaps it shouldn't be called on every
+free. Think about the tradeoff there (and don't be afraid to change the data
+structures you've used; they are not perfect either).
+
+### Memory Safety (Or Lack Of)
+
+A big problem with memory in general is code accessing or changing memory that
+it should not. The allocator presented here is certainly vulnerable to all the
+classic memory exploits, which you can try out yourself.
+
+Replace the allocations in `main.c` with these to see what happens.
+
+Use after free:
+```C
+  int *ptr = simple_malloc(sizeof(int));
+  *ptr = 123;
+  simple_free(ptr);
+  int *ptr2 = simple_malloc(sizeof(int));
+  *ptr = 345;
+```
+
+There's a good chance `ptr2` will point to the same place as `ptr`, meaning that
+someone could use `ptr` to modify the data now at `ptr2`. This can be even worse
+if the type of that data has changed in the meantime.
+
+Double free:
+```C
+  int *ptr = simple_malloc(sizeof(int));
+  simple_free(ptr);
+  int *ptr2 = simple_malloc(sizeof(int));
+  simple_free(ptr);
+  int *ptr3 = simple_malloc(sizeof(int));
+  // Ends up changing *ptr2 as well.
+  *ptr3 = 123;
+```
+
+Here you see the allocation `ptr` is freed once, then `ptr2` is allocated, likely
+at the same place as `ptr`. When `ptr` is freed again, this would free the `ptr2`
+allocation as well.
+
+This means that instead of being its own allocation, `ptr3` also ends up pointing
+to the same location as `ptr2`, so modifying one modifies the other.
+
+Another possibility is that memory that was previously freed is used as part of
+a larger allocation. So the original range header is now in the middle of the
+new allocation.
+
+When free is called for the second time, the allocator may blindly write to where
+it would have stored the metadata for the original allocation. In doing so, it will
+corrupt the original allocation.
+
+Buffer overflow:
+```C
+  char *ptr = simple_malloc(4);
+  char *ptr2 = simple_malloc(4);
+  ptr[4] = 1;
+```
+
+`ptr` is a 4 byte array, and `ptr2` is another 4 byte array immediately after the
+first one. Writing to `ptr[4]` overflows the array, because the maximum valid
+index is 3.
+
+This would corrupt the header attached to the `ptr2` allocation. In the case
+of your allocator, it would likely change the size of the allocation to just 1
+byte.
+
+That's a selection of the many, many possible attacks on the heap.
+
+You could consider how they might be mitigated, or even try applying some of
+them to the heap you have just written.
+
+### Special Case Allocators
+
+Imagine you are writing a video game with a fixed memory budget and need
+predictable performance. Do you think a heap that has to walk a variable number
+of ranges would be able to achieve that?
+
+If you think it wouldn't, you could look into
+[Region-Based Memory Management](https://en.wikipedia.org/wiki/Region-based_memory_management).
+
+(Whether it would or not depends entirely on your application's requirements.)
+
+This takes advantage of scenarios where you know the upper limit of objects you
+will need, along with their types and sizes.
+
+For the video game, maybe you are making a menu that will have at most 256
+entries. Why not statically allocate an array of 256 menu item objects at
+startup? Then simply construct a new item in place in the array as you need them.
+
+It is more overhead if the menu is always small, but it's very predictable.
+Maximum memory use is known and there is no variable time taken to walk the heap.
+
+You could also mix this approach into a traditional heap, using areas of memory
+only for certain types or sizes of data. For example, could it reduce the metadata
+overhead for small allocations (e.g. a 4 byte allocation that may require > 4 bytes of
+metadata)?
+
+### LD_PRELOAD
+
+If your allocator grows to support all the C standard library functions, you
+can try using it instead of the one your system C library provides.
+
+On Linux, this is done using the environment variable `LD_PRELOAD`.
+
+```
+LD_PRELOAD= 
+```
+
+Any shared object in `LD_PRELOAD` gets to provide the symbols a program needs
+before what it would usually load. So in this case you will provide `malloc`
+and the other memory management functions.
+
+You will have to rebuild the code as a shared object, and remove the `simple_`
+prefix from the functions to do this.
+
+Note that if you only implement a subset of the memory management functions,
+the program being run will get the rest from the system C library. This will
+almost certainly lead to a crash when it tries to, for example, `realloc` a
+pointer that your heap produced, but instead asks the system heap to do it.
+
+Finally, you will likely need a lot more storage for the heap. Either increase
+the size of the static allocation, or consider using `mmap` to ask the kernel
+for memory, as C libraries tend to do instead.
\ No newline at end of file diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_index.md b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_index.md new file mode 100644 index 000000000..e2fa6a7c0 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_index.md @@ -0,0 +1,31 @@ +--- +armips: null +author_primary: David Spickett +layout: learningpathall +learning_objectives: +- Explain how dynamic memory allocation and the C heap works. +- Write a simple dynamic memory allocator. +- Explain some of the flaws and risks of heap allocation in general, and the specific + implementation you have written. +learning_path_main_page: 'yes' +minutes_to_complete: 120 +operatingsystems: +- Linux +prerequisites: +- Familiarity with C programming, with a good understanding of pointers. +skilllevels: Introductory +subjects: Memory Allocation +test_images: +- ubuntu:latest +test_link: null +test_maintenance: true +test_status: +- passed +title: Writing a Dynamic Memory Allocator +tools_software_languages: +- C Programming +weight: 1 +who_is_this_for: Those learning about dynamic memory allocation for the first time, + who may have used C's malloc and free before. Also suitable for those looking for + a simple template from which to explore more advanced topics. 
+---
diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_next-steps.md b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_next-steps.md
new file mode 100644
index 000000000..9a6645c9c
--- /dev/null
+++ b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_next-steps.md
@@ -0,0 +1,22 @@
+---
+next_step_guidance: 
+
+recommended_path: /learning-paths/PLACEHOLDER_CATEGORY/PLACEHOLDER_LEARNING_PATH/
+
+further_reading:
+    - resource:
+        title: C Dynamic Memory Management Functions
+        link: https://en.cppreference.com/w/c/memory
+        type: documentation
+    - resource:
+        title: LLSoftSecBook chapter on Memory Vulnerabilities
+        link: https://llsoftsec.github.io/llsoftsecbook/#memory-vulnerability-based-attacks
+        type: website
+
+# ================================================================================
+#           FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 21                  # set to always be larger than the content in this path, and one more than 'review'
+title: "Next Steps"         # Always the same
+layout: "learningpathall"   # All files under learning paths have this same wrapper
+---
diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_review.md b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_review.md
new file mode 100644
index 000000000..aa63ecc1a
--- /dev/null
+++ b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_review.md
@@ -0,0 +1,78 @@
+---
+review:
+    - questions:
+        question: >
+            What is one difference between static and dynamic memory allocation?
+        answers:
+            - Dynamic allocation cannot be done on embedded systems, but static
+              allocation can.
+            - Dynamic allocation takes place while the program is running, rather
+              than when it is built.
+            - Static allocation can allocate larger amounts of memory than dynamic
+              allocation.
+        correct_answer: 2
+        explanation: >
+            Both types of allocation can run on any sort of system, though the
+            complexity of the dynamic allocator may change.
+
+            Dynamic allocation is done using runtime calls, so the program can
+            react to what's needed at the time. Static allocation is decided
+            ahead of time instead.
+
+            Both types of allocation have the same memory constraints as the
+            system itself. So in theory at least, they could have access to the
+            same amount of memory.
+
+    - questions:
+        question: >
+            Do C's memory management functions like malloc and free validate the
+            addresses passed to them?
+        answers:
+            - Never
+            - Always
+            - The implementation may choose to validate them, but does not have to.
+        correct_answer: 3
+        explanation: >
+            An allocator may choose to be strict about the parameters it accepts,
+            but the C specification at least does not require it to be. Generally
+            this strictness can be controlled with debugging or hardening options.
+
+            When writing your own allocators, you get to decide how to handle
+            invalid data.
+
+    - questions:
+        question: >
+            If the allocator presented here was used mainly for very small
+            allocations (less than 8 bytes), what concern would you have?
+        answers:
+            - That memory was being wasted because the details of each allocation
+              (which take up 8 bytes) are larger than the usable space in each
+              allocation.
+            - That the heap walk would take an unacceptably long time due to
+              the large number of ranges.
+            - That the time taken to walk the heap would increase as time went
+              on, until it was eventually unacceptable.
+            - All of the above.
+        correct_answer: 4
+        explanation: >
+            Everything mentioned is a concern here, and is why some allocators
+            prefer to use "pools" or "buckets" for very small allocations.
+
+            The allocator can make assumptions about these special areas that
+            reduce the time taken to find a free range, and the overhead of
+            recording the information about the ranges.
+
+            In our case, the performance of the heap would be ok to begin with.
+            As the program continues, more and more small ranges pile up,
+            leading to poorer performance later.
+
+            This is sometimes not a problem, but for real-time applications like
+            video games, unpredictable heap performance is a problem.
+
+# ================================================================================
+#           FIXED, DO NOT MODIFY
+# ================================================================================
+title: "Review"                 # Always the same title
+weight: 20                      # Set to always be larger than the content in this path
+layout: "learningpathall"       # All files under learning paths have this same wrapper
+---
From 40e45a414f7963ced189a51b8888a56395b16c03 Mon Sep 17 00:00:00 2001
From: Jason Andrews
Date: Thu, 2 Nov 2023 16:05:35 -0500
Subject: [PATCH 20/35] tested new Learning Path on dynamic memory allocation

---
 .../1_dynamic_memory_allocation.md            | 80 ++++++++++---------
 .../2_designing_a_dynamic_memory_allocator.md | 69 ++++++++--------
 ...implementing_a_dynamic_memory_allocator.md | 59 ++++++++------
 .../4_conclusions_further_work.md             | 73 +++++++++--------
 .../dynamic-memory-allocator/_index.md        | 39 +++++++++
 .../dynamic-memory-allocator/_next-steps.md   |  2 +-
 .../dynamic-memory-allocator/_review.md       |  4 +-
 .../dynamic-memory-allocator/_index.md        | 31 -------
 contributors.csv                              |  1 +
 9 files changed, 194 insertions(+), 164 deletions(-)
 rename content/learning-paths/{laptops-and-desktops => cross-platform}/dynamic-memory-allocator/1_dynamic_memory_allocation.md (61%)
 rename content/learning-paths/{laptops-and-desktops => cross-platform}/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md (67%)
 rename content/learning-paths/{laptops-and-desktops => cross-platform}/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md (87%)
 rename content/learning-paths/{laptops-and-desktops => cross-platform}/dynamic-memory-allocator/4_conclusions_further_work.md (72%)
create mode 100644 content/learning-paths/cross-platform/dynamic-memory-allocator/_index.md rename content/learning-paths/{laptops-and-desktops => cross-platform}/dynamic-memory-allocator/_next-steps.md (88%) rename content/learning-paths/{laptops-and-desktops => cross-platform}/dynamic-memory-allocator/_review.md (95%) delete mode 100644 content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_index.md diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/1_dynamic_memory_allocation.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/1_dynamic_memory_allocation.md similarity index 61% rename from content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/1_dynamic_memory_allocation.md rename to content/learning-paths/cross-platform/dynamic-memory-allocator/1_dynamic_memory_allocation.md index 1816354e9..a410bc996 100644 --- a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/1_dynamic_memory_allocation.md +++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/1_dynamic_memory_allocation.md @@ -1,20 +1,22 @@ --- -title: Dynamic Memory Allocation +title: Dynamic memory allocation weight: 2 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Dynamic vs. Static Allocation +## Dynamic vs. static memory allocation -In this learning path you will learn how to implement dynamic memory allocation. -If you have used C's "heap" (`malloc`, `free`, etc.) before, that is one example +In this Learning Path you will learn how to implement dynamic memory allocation. +If you have used the C programming language "heap" (`malloc`, `free`, etc.) before, that is one example of dynamic memory allocation. -It allows programs to allocate memory while they are running without knowing -at build time what amount of memory they will need. In constrast to static -memory allocation where the amount is known at build time. 
+Dynamic memory allocation allows programs to allocate memory while they are running without knowing +at build time how much memory they will need. In contrast, static +memory allocation is used when the amount of memory is known at build time. + +The code sample below shows both dynamic and static memory allocation: ```C #include @@ -27,10 +29,10 @@ void fn() { } ``` -The example above shows the difference. The size and location of `a` is known +In the example above, the size and location of `a` is known when the program is built. The size of `b` is also known, but its location is not. -It may even never be allocated, as this pseudocode example shows: +Sometimes, memory may never be allocated, as in the pseudocode example below: ```C int main(...) { @@ -40,37 +42,37 @@ int main(...) { } ``` -If the user passes no arguments to the program, there's no need to allocate space -for `b`. If they do, `malloc` will find space for it. +The arguments passed to the program determine if memory is allocated or not. -## malloc +## The C library malloc function The C standard library provides a special function [`malloc`](https://en.cppreference.com/w/c/memory/malloc). `m` for "memory", -`alloc` for "allocate". This can be used to ask for a suitably sized memory -location while the program is running. +`alloc` for "allocate". This is used to ask for a suitably sized memory +location while a program is running. ```C void *malloc(size_t size); ``` -The C library will then look for a chunk of memory with size of at least `size` +The C library looks for a chunk of memory with size of at least `size` bytes in a large chunk of memory that it has reserved. For instance on Ubuntu -Linux, this will be done by GLIBC. +Linux, this is done by GLIBC. The example at the top of the page is trivial of course. As it is we could just statically allocate both integers like this: + ```C void fn() { int a, b = 0; } ``` -That's ok if this data is never be returned from this function. 
Or in other
-words, if the lifetime of this data is equal to that of the function.
+Variables `a` and `b` work fine if they are not needed outside of the function. Or in other
+words, if the lifetime of the data is equal to that of the function.
 
-A more complicated example will show you when that is not the case, and the value
-lives longer than the function that created it.
+A more complex example shows when this is not the case, and the values
+live longer than the creating function.
 
 ```C
 #include <stdlib.h>
@@ -93,7 +95,7 @@ void add_entry(Entry *entry, int data) {
 ```
 
 What you see above is a struct `Entry` that defines a singly-linked-list entry.
-Singly meaining that you can go forward via `next`, but you cannot go backwards
+Singly meaning that you can go forward via `next`, but you cannot go backwards
 in the list. There is some data `data`, and each entry points to the next entry,
 `next`, assuming there is one (it will be `NULL` for the end of the list).
 
@@ -111,8 +113,8 @@ Now you want to add another `Entry` to this list at runtime. So you do not know
 ahead of time what it will contain, or if we indeed will add it or not. Where
 would you put that entry?
 
-* If it is another global variable, we would have to declare many empty `Entry`s
-  and hope we never needed more than that amount.
+* If it is another global variable, we would have to declare many empty `Entry`
+values and hope we never need more than that amount.
 
 {{% notice Other Allocation Techniques%}}
 Although in this specific case global variables aren't a good solution, there are
@@ -120,7 +122,7 @@ cases where large sets of pre-allocated objects can be beneficial. For example,
 it provides a known upper bound of memory usage and makes the timing of each
 allocation predictable.
 
-However, we will not be covering these techniques in this learning path. It will
+However, these techniques are not covered in this Learning Path. It will
 however be useful to think about them after you have completed this learning
 path. 
{{% /notice %}}
@@ -128,38 +130,38 @@ path.
 
 * If it is in a function's stack frame, that stack frame will be reclaimed and
 modified by future functions, corrupting the new `Entry`.
 
-So you can see, we must use dynamic memory allocation. Which is why the `add_entry`
+So you can see, dynamic memory allocation is required, which is why the `add_entry`
 shown above calls `malloc`. The resulting pointer points to somewhere not in
 the program's global data section or in any function's stack space, but in the
-heap memory. Where it can live until we `free` it.
+heap memory. It will stay in the heap until a call to `free` is made.
 
-## free
+## The C library free function
 
-You cannot ask malloc for memory forever. Eventually that space behind the scenes
-will run out. So you should give up your dynamic memory once it is not needed,
+You cannot ask malloc for memory forever. Eventually the space behind the scenes
+will run out. You should give up your dynamic memory once it is not needed,
 using [`free`](https://en.cppreference.com/w/c/memory/free).
 
 ```C
 void free(void *ptr);
 ```
 
-You call `free` with a pointer previously given to you by `malloc`, and this tells
-the heap that we no longer need this memory.
+You call `free` with a pointer previously returned by `malloc`, and this tells
+the heap that the memory is no longer needed.
 
-{{% notice Undefined Behaviour%}}
-You may wonder what happens if you don't pass the exact pointer to `free`, as
-`malloc` returned to you. The result varies as this is "undefined behaviour".
+{{% notice Undefined Behavior%}}
+You may wonder what happens if you don't pass the exact same pointer to `free` as
+`malloc` returned. The result varies as this is "undefined behavior".
 Which essentially means a large variety of unexpected things can happen.
 In practice, many allocators will tolerate this difference or reject it outright
-if it's not possible to do something sensbile with the pointer. 
+if it's not possible to do something sensible with the pointer. -Remember that just because one allocator handles this a certain way, does not -mean all will. Indeed, that same allocator may handle it differently for +Remember, just because one allocator handles this a certain way, does not +mean all allocators will be the same. Indeed, that same allocator may handle it differently for different allocations within the same program. {{% /notice %}} -So, you can use `free` to remove an item from your linked list. +You can use `free` to remove an item from your linked list. ```C void remove_entry(Entry* previous, Entry* entry) { @@ -183,5 +185,5 @@ to remove, so that the list skips over it. With `entry` now isolated we call [A]---------->[C] | [A] [C] ``` -That covers the high level how and why of using `malloc` and `free`, next you'll +That covers the high level how and why of using `malloc` and `free`, next you will see a possible implementation of a dynamic memory allocator. \ No newline at end of file diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md similarity index 67% rename from content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md rename to content/learning-paths/cross-platform/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md index 4b7b226d3..495351eb8 100644 --- a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md +++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md @@ -1,41 +1,41 @@ --- -title: Designing a Dynamic Memory Allocator +title: Design a dynamic memory allocator weight: 3 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## High Level Design +## High level design To begin 
with, decide which functions your memory allocator will provide. We have described `malloc` and `free`, there are more provided by the [C library](https://en.cppreference.com/w/c/memory). -This will assume you just need `malloc` and `free`. Start with those and write -out their behaviours, as the programmer using your allocator will see. +This will assume you just need `malloc` and `free`. The new implementations will +be called `simple_malloc` and `simple_free`. Start with just two functions and write +out their behaviors. -There will be a function, `malloc`. It will: -* Take a size in bytes as a parameter. -* Try to allocate some memory. -* Return a pointer to that memory, NULL pointer otherwise. +The first function is `simple_malloc` and it will: +* Take a size in bytes as a parameter +* Try to allocate the requested memory +* Return a pointer to that memory or return a NULL pointer if the memory cannot be allocated -There will be a function `free`. It will: -* Take a pointer to some previously allocated memory as a parameter. -* Mark that memory as avaiable for future allocations. +The second function is `simple_free` and it will: +* Take a pointer to some previously allocated memory as a parameter +* Mark that memory as available for future allocations From this you can see that you will need: * Some large chunk of memory, the "backing storage". -* A way to mark parts of that memory as allocated, or available for allocation. +* A way to mark parts of that memory as allocated, or available for allocation -## Backing Storage +## Backing storage The memory can come from many sources. It can even change size throughout the -program's execution if you wish. For your allocator you'll keep it as simple -as possible. +program's execution if you wish. For your allocator you can keep it simple. A single, statically allocated global array of bytes will be your backing -storage. So you can do dynamic allocation of parts of a statically allocated +storage. 
You can do dynamic allocation of parts of a statically allocated
piece of memory.
 
 ```C
@@ -43,21 +43,20 @@ piece of memory.
 static char storage[STORAGE_SIZE];
 ```
 
-## Record Keeping
+## Record keeping
 
 This backing memory needs to be annotated somehow to record what has been
-allocated so far. There are many, many ways to do this. With the biggest choice
-here being whether to store these records in the heap itself, our outside of it.
+allocated so far. There are many ways to do this. The biggest choice
+is whether to store these records in the heap itself or outside of it.
 
-We will not go into those tradeoffs here, and instead you will put the records
-in the heap, as this is relatively simple to do.
+The easiest way is to put the records in the heap.
 
-What should be in your records? Think about what question the software will ask
-us. Can you give me a pointer to an area of free memory of at least this size?
+What should be in the records? Think about the question the caller is asking.
+Can you give me a pointer to an area of memory of at least this size?
 
 For this you will need to know:
-* Which ranges of the backing storage have been allocated or not.
-* How large each of ranges sections is. This includes free areas.
+* The ranges of the backing storage that have already been allocated
+* The size of each section, both free and allocated
 
 Where a "range" a pointer to a location, a size in bytes and a boolean to say
 whether the range is free or allocated. So a range from 0x123 of 345 bytes,
@@ -67,7 +66,7 @@ that has been allocated would be:
 start: 0x123 size: 345 allocated: true
 ```
 
-For the intial state of a heap of size `N`, you will have one range of
+For the initial state of a heap of size `N`, you will have one range of
 unallocated memory.
 
 ```text
@@ -102,7 +101,7 @@ Pointer: 0x4 Size: N-4 Allocated: False
 range = 0x4 + (N-4) = 1 beyond the end of the heap, so the walk is finished. 
``` -`free` uses the pointer given to it to find the range it needs to deallocate. +`simple_free` uses the pointer given to it to find the range it needs to deallocate. Let's say the 4 byte allocation was freed: ```text @@ -110,18 +109,18 @@ Pointer: 0x0 Size: 4 Allocated: False Pointer: 0x4 Size: N-4 Allocated: False ``` -Since `free` gets a pointer directly to the allocation you know exactly which +Since `simple_free` gets a pointer directly to the allocation you know exactly which range to modify. The only change made is to the boolean which marks it as allocated or not. The location and size of the range stay the same. {{% notice Merging Free Ranges%}} -The allocator presented here will not merge free ranges like the 2 above. This +The allocator presented here does not merge free ranges like the 2 above. This is a deliberate limitation and addressing this is discussed later. {{% /notice %}} -## Record Storage +## Record storage -You'll keep these records in heap which means using some of the allocated space +You will keep these records in the heap which means using some of the allocated space for them on top of the allocation itself. The simplest way to do this is to prepend each allocation with the range @@ -135,13 +134,13 @@ ease. <...and so on until the end of the heap...> ``` -Pointers returned by `malloc` are offset to just beyond the range information. -When `free` receives a pointer, it can get to the range information by +Pointers returned by `simple_malloc` are offset to just beyond the range information. +When `simple_free` receives a pointer, it can get to the range information by subtracting the size of that information from the pointer. Using the example above: ```text -free(my_ptr); +simple_free(my_ptr); 0x00: [ptr, size, allocated] <-- my_ptr - sizeof(range information) 0x08: <...> <-- my_ptr @@ -153,7 +152,7 @@ calculations above must be adjusted. 
The allocator presented here does not concern itself with alignment, which is
why it can do a simple subtraction.
{{% /notice %}}

-## Running Out Of Space
+## Running out of space

The final thing an allocator must do is realise it has run out of space. This
is simply achieved by knowing the bounds of the backing storage.
diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md
similarity index 87%
rename from content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md
rename to content/learning-paths/cross-platform/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md
index 3169ecdce..b486463eb 100644
--- a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md
+++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/3_implementing_a_dynamic_memory_allocator.md
@@ -1,26 +1,34 @@
---
-title: Implementing a Dynamic Memory Allocator
+title: Implement a dynamic memory allocator
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

-## Project Structure
+The source code of the `simple_malloc` and `simple_free` memory allocation functions is below.
+Everything required to build and run example allocations is also provided.

-The file layout will be as follows:
-* `CMakeLists.txt` - To tell `cmake` how to configure the project.
+You will need a Linux machine to try the code and see how the allocation works.
+
+## Project structure
+
+The files used are:
+* `CMakeLists.txt` - Tells `cmake` how to configure and build the project.
* `heap.c` - The dynamic memory allocator implementation.
* `heap.h` - Function declarations including your new `simple_malloc` and
`simple_free` functions.
-* `main.c` - A program that makes use of `simple_malloc` and `simple_free`.
+* `main.c` - A test program that makes use of `simple_malloc` and `simple_free`.
+
+Building it will produce a single binary, `demo`, that you can run and see the results.
+
+## Source code

-Building it will produce a single binary, `demo`, that you will run to see the
-results.
+The files are listed below.

-## Sources
+Use a text editor to copy and paste the contents of each file on a Linux machine.

-### CMakeLists.txt
+Contents of `CMakeLists.txt`:

``` {file_name="CMakeLists.txt"}
cmake_minimum_required(VERSION 3.15)
@@ -30,7 +38,7 @@ project(MemoryAllocatorDemo C)
add_executable(demo main.c heap.c)
```

-#### heap.h
+Contents of `heap.h`:

```C {file_name="heap.h"}
#include
@@ -45,20 +53,20 @@ void *simple_malloc(size_t size);
void simple_free(void *ptr);
```

-## heap.c
+## Information about heap.c

Please refer to the comments in the source code here for detailed explanations
-of each function. We will cover the key elements here up front.
+of each function. You can identify a few key elements before studying the code.

First is `storage`, this is the backing storage which is a global char array.
This is where the ranges, represented by `Header`, are stored.

Each `Header` is written to the start of the allocated range. This means that
-`malloc` returns a pointer that points just beyond this location. `free` on the
+`simple_malloc` returns a pointer that points just beyond this location. `simple_free` on the
other hand, deducts the size of `Header` from the pointer parameter to find the
range information.

-When the heap is initialised with `simple_heap_init`, a single range is setup
+When the heap is initialized with `simple_heap_init`, a single range is set up
that covers the whole heap and marks it as unallocated.

To find a free range, `find_free_space` walks the heap using these `Header`
@@ -78,6 +86,8 @@ enabled.
Generally run to run, the output addresses may change.
Focus on the relative values of pointers in relation to where the heap starts and ends. {{% /notice %}} +Contents of `heap.c`: + ```C {file_name="heap.c"} #include #include @@ -243,7 +253,7 @@ void simple_free(void *ptr) { } ``` -### main.c +Contents of `main.c`: ```C { file_name="main.c"} #include "heap.h" @@ -263,38 +273,41 @@ int main() { } ``` -The code here does allocation and deallocation of memory. This tests the heap +The main code does allocation and deallocation of memory. This tests the heap code but also highlights an interesting problem that you'll see more about later. -## Building +## Build the source code -First install dependencies. +Install the required tools using the command: ```bash sudo apt install -y cmake ninja-build ``` -Then configure using CMake. We recomend a Debug build for the extra safety the +Next, configure using CMake. You can use a Debug build for the extra safety the asserts bring. ```bash cmake . -DCMAKE_BUILD_TYPE=Debug -G Ninja ``` -Then build with `ninja` +## Build and run a test + +Build the executable with `ninja`: ```bash ninja ``` -This should result in a `demo` executable in the same folder. Run this to see -the allocator in action. +You now have a `demo` executable in the same folder. 
+ +Run `demo` to see the allocator in action: ```bash ./demo ``` -## Output +## Review the program output The output addresses will vary depending on where backing memory gets allocated by your system but this is the general form you should expect: diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/4_conclusions_further_work.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/4_conclusions_further_work.md similarity index 72% rename from content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/4_conclusions_further_work.md rename to content/learning-paths/cross-platform/dynamic-memory-allocator/4_conclusions_further_work.md index f0d69fe3d..4d3545f99 100644 --- a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/4_conclusions_further_work.md +++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/4_conclusions_further_work.md @@ -1,5 +1,5 @@ --- -title: Conclusions +title: Memory allocation summary weight: 5 ### FIXED, DO NOT MODIFY @@ -8,29 +8,29 @@ layout: learningpathall ## Conclusions -You've now had a glimpse into the world of dynamic memory allocation, and +You have a glimpse into the world of dynamic memory allocation, and probably have more questions than answers. You may have noticed some oversights -in the implementation presented, and you're almost certainly right, we'll get to -those shortly. +in the implementation presented, and you are almost certainly right. Overall your take away from this material is that "dynamic" memory allocation can mean many things. Sometimes it is all dynamic, sometimes it is a dynamic face with a static allocation behind it. This will change depending on the performance and complexity needs of the application. -Fundementally it provides a way to get memory you did not know whether you would -need when the program was written. 
You knew you would need some non-zero amount -and dynamic allocation lets you ask for it while the program is running. +Fundamentally, dynamic memory allocation provides a way to get memory you +did not know whether you would +need when the program was written. You likely know you need some amount +of memory and dynamic allocation lets you ask for it while the program is running. The implementation shown here is a "classic" heap, and a very simple one at that (not quite minimal, look up "bump allocator" for that). -Memory allocation is a whole field of study, and you can use this implementation -as a base for further research if you wish. +Memory allocation is an entire field of study, and you can use this implementation +as a basis for further research. -## Further Work +## Further work -### Merging Free Ranges +### Merging free ranges Look again at the last logging example on the previous page. @@ -44,31 +44,31 @@ Look again at the last logging example on the previous page. ``` What's wrong with these ranges? Nothing, until you allocate something >= 249 -bytes. We should be able to put that at address `0x55e68c41f0ac`, but because -we treat the 2 free ranges as separate, we can't put it there, or in the second -free range. +bytes. The allocator should be able to put that at address `0x55e68c41f0ac`, but because +the 2 free ranges are separated, the requested memory doesn't fit. ```text [0x55e68c41f0ac -> 0x55e68c41f1a4) : 0x00000000000000f8 (free, size = 248 bytes) [0x55e68c41f1a4 -> 0x55e68c41f2ac) : 0x0000000000000108 (free, size = 264 bytes) ``` -To solve this, you would need some kind of cleanup step after a free. Where -free ranges next to each other are merged into one. +To solve this, you would need a cleanup step after a free. Where +free ranges next to each other are merged into one free range. Then again, this does add some overhead. Perhaps it shouldn't be called on every -free. 
Think about the tradeoff there (and don't be afraid to change the data +free. Think about the tradeoff (and don't be afraid to change the data structures you've used, they are not perfect either). -### Memory Safety (Or Lack Of) +### Memory safety A big problem with memory in general is code accessing or changing memory that -it should not. The allocator presented here is certainly vunerable to all the +it should not. The allocator presented here is certainly vulnerable to all the classic memory exploits, which you can try out yourself. Replace the allocations in `main.c` with these to see what happens. -Use after free: +Here is a use after free: + ```C int *ptr = simple_malloc(sizeof(int)); *ptr = 123; @@ -81,7 +81,8 @@ There's a good chance `ptr2` will point to the same place as `ptr`. Meaning that someone could use `ptr` to modify the data now at `ptr2`. This can be even worse if the type of that data has changed in the meantime. -Double free: +Here is a double free: + ```C int *ptr = simple_malloc(sizeof(int)); simple_free(ptr); @@ -97,7 +98,7 @@ at the same place as `ptr`. When `ptr` is freed again, this would free the `ptr2 allocation as well. Meaning that instead of being its own allocation, `ptr3` also ends up pointing -to the same location as `ptr2`. So modiying one modifies the other. +to the same location as `ptr2`. So modifying one modifies the other. Another possibility is that memory that was previously freed is used as part of a larger allocation. So the original range header is now in the middle of the @@ -107,7 +108,8 @@ When free is called for the second time, the allocator may blindly write to wher it would have stored the metadata for the original allocation. In doing so, it will corrupt the original allocation. -Buffer overflow: +Here is buffer overflow: + ```C char *ptr = simple_malloc(4); char *ptr2 = simple_malloc(4); @@ -118,7 +120,7 @@ Buffer overflow: first one. Writing to `ptr[4]` overflows the array, because the maximum index is only 3. 
-This would corrupt the header attached to the `ptr2` allocation. In the case +This will corrupt the header attached to the `ptr2` allocation. In the case of your allocator, it would likely change the size of the allocation to just 1 byte. @@ -127,39 +129,41 @@ That's a selection of the many, many, possible attacks on the heap. You could consider how they might be mitigated, or even try applying some of them to the heap you have just written. -### Special Case Allocators +### Special case allocators Imagine you are writing a video game with a fixed memory budget and need predictable performance. Do you think a heap that has to walk a variable number of ranges would be able to achieve that? -If you think it wouldn't, you could look into +If you think it wouldn't, you can look into [Region-Based Memory Management](https://en.wikipedia.org/wiki/Region-based_memory_management). -(whether it would or not depends enitrely on your application's requirements) +(whether it would or not depends entirely on your application's requirements) -This takes advantages of scenarios where you know the upper limit of objects you +This takes advantage of scenarios where you know the upper limit of objects you will need, along with their types and sizes. -For the video game, maybe you are making a menu that will have at most 256 +For a video game, maybe you are making a menu that will have at most 256 entries. Why not statically allocate an array of 256 menu item objects on start up? Then simply construct a new item in place in the array as you need them. It is more overhead if the menu is always small, but it's very predictable. Maximum memory use is known and there is no variable time taken to walk the heap. -You could also mix this approach into a traditional heap, using areas of memory +You can also mix this approach into a traditional heap, using areas of memory only for certain types or sizes of data. For example, could it reduce the metadata overhead for small allocations (e.g. 
a 4 byte allocation that may require > 4 bytes of metadata)? -### LD_PRELOAD +### The LD_PRELOAD environment variable If your allocator grows to support all the C standard library functions, you can try using it instead of the one your system C library provides. On Linux this is done using the environment variable `LD_PRELOAD`. +Set the environment variable to point to your allocator instead of the one provided by Linux: + ``` LD_PRELOAD= ``` @@ -173,9 +177,12 @@ prefix from the functions to do this. Note that if you only implement a subset of the memory management functions, the program being run will get the rest from the system C library. This will -almost certianly lead to a crash when it tries to, for example, `realloc` a +almost certainly lead to a crash when it tries to, for example, `realloc` a pointer that your heap produced, but instead asks the system heap to do it. Finally, you will likely need a lot more storage for the heap. Either increase the size of the static allocation, or consider using `mmap` to ask the kernel -for memory, as C libraries tend to do instead. \ No newline at end of file +for memory, as C libraries tend to do instead. + +There are many things to learn about dynamic memory allocation, but it helps +to have a good understanding of the basics. 
\ No newline at end of file
diff --git a/content/learning-paths/cross-platform/dynamic-memory-allocator/_index.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/_index.md
new file mode 100644
index 000000000..0faf2fcd4
--- /dev/null
+++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/_index.md
@@ -0,0 +1,38 @@
+---
+author_primary: David Spickett
+layout: learningpathall
+learning_objectives:
+- Explain how dynamic memory allocation and the C heap work
+- Write a simple dynamic memory allocator
+- Explain some of the flaws and risks of heap allocation in general, and the specific
+  implementation you have studied
+learning_path_main_page: 'yes'
+minutes_to_complete: 120
+operatingsystems:
+- Linux
+prerequisites:
+- Familiarity with C programming, with a good understanding of pointers.
+- A Linux machine to run the example code.
+skilllevels: Introductory
+subjects: Performance and Architecture
+armips:
+  - Cortex-A
+  - Neoverse
+test_images:
+- ubuntu:latest
+test_link: null
+test_maintenance: true
+test_status:
+- passed
+title: Write a Dynamic Memory Allocator
+tools_software_languages:
+- C
+- Coding
+weight: 1
+who_is_this_for: This is an introductory topic for software developers learning about dynamic memory allocation for the first time,
+  who may have used malloc and free in C programming. It also provides a starting point to explore advanced memory allocation topics.
+shared_path: true +shared_between: + - laptops-and-desktops + - embedded-systems +--- diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_next-steps.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/_next-steps.md similarity index 88% rename from content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_next-steps.md rename to content/learning-paths/cross-platform/dynamic-memory-allocator/_next-steps.md index 9a6645c9c..a42a93bb7 100644 --- a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_next-steps.md +++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/_next-steps.md @@ -1,7 +1,7 @@ --- next_step_guidance: -recommended_path: /learning-paths/PLACEHOLDER_CATEGORY/PLACEHOLDER_LEARNING_PATH/ +recommended_path: /learning-paths/servers-and-cloud-computing/exploiting-stack-buffer-overflow-aarch64/ further_reading: - resource: diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_review.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/_review.md similarity index 95% rename from content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_review.md rename to content/learning-paths/cross-platform/dynamic-memory-allocator/_review.md index aa63ecc1a..2f21591b2 100644 --- a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_review.md +++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/_review.md @@ -16,7 +16,7 @@ review: complexity of the dynamic allocator may change. Dynamic allocation is done using runtime calls, so the program can - react to what's needed at the time. Static alloation is decided ahead + react to what's needed at the time. Static allocation is decided ahead of time instead. 
Both types of allocation have the same memory constraints as the @@ -62,7 +62,7 @@ review: reduce the time taken to find a free range, and the overhead of recording the information about the ranges. - In our case, the performance of the heap would be ok to begin with. + In this case, the performance of the heap would be ok to begin with. As the program continues, more and more small ranges pile up. Leading to poorer performance later. diff --git a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_index.md b/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_index.md deleted file mode 100644 index e2fa6a7c0..000000000 --- a/content/learning-paths/laptops-and-desktops/dynamic-memory-allocator/_index.md +++ /dev/null @@ -1,31 +0,0 @@ ---- -armips: null -author_primary: David Spickett -layout: learningpathall -learning_objectives: -- Explain how dynamic memory allocation and the C heap works. -- Write a simple dynamic memory allocator. -- Explain some of the flaws and risks of heap allocation in general, and the specific - implementation you have written. -learning_path_main_page: 'yes' -minutes_to_complete: 120 -operatingsystems: -- Linux -prerequisites: -- Familiarity with C programming, with a good understanding of pointers. -skilllevels: Introductory -subjects: Memory Allocation -test_images: -- ubuntu:latest -test_link: null -test_maintenance: true -test_status: -- passed -title: Writing a Dynamic Memory Allocator -tools_software_languages: -- C Programming -weight: 1 -who_is_this_for: Those learning about dynamic memory allocation for the first time, - who may have used C's malloc and free before. Also suitable for those looking for - a simple template from which to explore more advanced topics. 
---- diff --git a/contributors.csv b/contributors.csv index 0683434a1..9043db3fa 100644 --- a/contributors.csv +++ b/contributors.csv @@ -16,4 +16,5 @@ Pranay Bakre,Arm,,,, Elham Harirpoush,Arm,,,, Frédéric -lefred- Descamps,OCI,,,,lefred.be Kristof Beyls,Arm,,,, +David Spickett,Arm,,,, Uma Ramalingam,Arm,uma-ramalingam,,, From 47d6a91d1d1e813c06c00042979331b335626cb3 Mon Sep 17 00:00:00 2001 From: GitHub Actions Stats Bot <> Date: Mon, 6 Nov 2023 01:46:00 +0000 Subject: [PATCH 21/35] automatic update of stats files --- data/stats_current_test_info.yml | 12 ++++++--- data/stats_weekly_data.yml | 43 ++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+), 4 deletions(-) diff --git a/data/stats_current_test_info.yml b/data/stats_current_test_info.yml index eca9363ea..4ab5ef899 100644 --- a/data/stats_current_test_info.yml +++ b/data/stats_current_test_info.yml @@ -1,7 +1,7 @@ summary: - content_total: 179 - content_with_all_tests_passing: 32 - content_with_tests_enabled: 32 + content_total: 181 + content_with_all_tests_passing: 33 + content_with_tests_enabled: 33 sw_categories: cross-platform: intrinsics: @@ -77,7 +77,11 @@ sw_categories: readable_title: Terraform tests_and_status: - ubuntu:latest: passed - laptops-and-desktops: {} + laptops-and-desktops: + dynamic-memory-allocator: + readable_title: Writing a Dynamic Memory Allocator + tests_and_status: + - ubuntu:latest: passed microcontrollers: tfm: readable_title: Get started with Trusted Firmware-M diff --git a/data/stats_weekly_data.yml b/data/stats_weekly_data.yml index f2a6b4609..c2e23f7d8 100644 --- a/data/stats_weekly_data.yml +++ b/data/stats_weekly_data.yml @@ -892,3 +892,46 @@ avg_close_time_hrs: 0 num_issues: 7 percent_closed_vs_total: 0.0 +- a_date: '2023-11-06' + content: + cross-platform: 7 + embedded-systems: 15 + install-guides: 73 + laptops-and-desktops: 10 + microcontrollers: 22 + servers-and-cloud-computing: 47 + smartphones-and-mobile: 7 + total: 181 + contributions: + external: 3 + 
internal: 174 + github_engagement: + num_forks: 30 + num_prs: 8 + individual_authors: + brenda-strech: 1 + christopher-seidl: 4 + daniel-gubay: 1 + david-spickett: 1 + dawid-borycki: 1 + elham-harirpoush: 2 + florent-lebeau: 5 + "fr\xE9d\xE9ric--lefred--descamps": 2 + gabriel-peterson: 3 + jason-andrews: 77 + julie-gaskin: 1 + julio-suarez: 5 + kasper-mecklenburg: 1 + konstantinos-margaritis,-vectorcamp: 1 + kristof-beyls: 1 + liliya-wu: 1 + mathias-brossard: 1 + michael-hall: 3 + pareena-verma: 29 + pranay-bakre: 1 + ronan-synnott: 39 + uma-ramalingam: 1 + issues: + avg_close_time_hrs: 0 + num_issues: 10 + percent_closed_vs_total: 0.0 From 13b5f1fe59d12c952b4ce9f3c82aa44ceab00073 Mon Sep 17 00:00:00 2001 From: Pranay Bakre Date: Sun, 5 Nov 2023 23:45:09 -0800 Subject: [PATCH 22/35] Multi-architecture application deployment on EKS. --- .../eks-multi-arch/_index.md | 39 ++ .../eks-multi-arch/_next-steps.md | 24 ++ .../eks-multi-arch/_review.md | 30 ++ .../eks-multi-arch/go-multi-arch-eks.md | 398 ++++++++++++++++++ 4 files changed, 491 insertions(+) create mode 100644 content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md create mode 100644 content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_next-steps.md create mode 100644 content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_review.md create mode 100644 content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md new file mode 100644 index 000000000..4a8d674e9 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md @@ -0,0 +1,39 @@ +--- +title: Learn how to build and deploy a multi-architecture application on Amazon EKS + +minutes_to_complete: 60 + +who_is_this_for: This is an advanced topic for software developers who are looking to 
understand how to build and deploy a multi-architecture application with x86/amd64 and arm64 based container images on Amazon EKS.
+
+learning_objectives:
+  - Build x86/amd64 and arm64 container images with docker buildx and docker manifest
+  - Understand the nuances of building a multi-architecture container image
+  - Learn how to add taints and tolerations to Amazon EKS clusters to schedule application pods on architecture-specific nodes
+  - Deploy a multi-arch container application across multiple architectures in a single Amazon EKS cluster
+
+prerequisites:
+  - An [AWS account](https://aws.amazon.com/). Create an account if needed.
+  - A computer with [Amazon eksctl CLI](/install-guides/eksctl) and [kubectl](/install-guides/kubectl/) installed.
+  - [Docker](/install-guides/docker) installed on your local computer.
+
+author_primary: Pranay Bakre
+
+### Tags
+skilllevels: Advanced
+subjects: Containers and Virtualization
+armips:
+  - Neoverse
+
+tools_software_languages:
+  - Kubernetes
+  - AWS
+operatingsystems:
+  - Linux
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1 # _index.md always has weight of 1 to order correctly
+layout: "learningpathall" # All files under learning paths have this same wrapper
+learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_next-steps.md
new file mode 100644
index 000000000..9b4dd5b8d
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_next-steps.md
@@ -0,0 +1,24 @@
+---
+next_step_guidance: We recommend you continue learning about deploying multi-architecture applications.
+
+recommended_path: "/learning-paths/servers-and-cloud-computing/migration"
+
+further_reading:
+    - resource:
+        title: EKS documentation
+        link: https://aws.amazon.com/eks/
+        type: documentation
+    - resource:
+        title: Amazon Elastic Container Registry
+        link: https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html?pg=ln&sec=hs
+        type: documentation
+
+
+
+# ================================================================================
+# FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 21 # set to always be larger than the content in this path, and one more than 'review'
+title: "Next Steps" # Always the same
+layout: "learningpathall" # All files under learning paths have this same wrapper
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_review.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_review.md
new file mode 100644
index 000000000..c63ded344
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_review.md
@@ -0,0 +1,30 @@
+---
+review:
+    - questions:
+        question: >
+            Taints and tolerations ensure that pods are scheduled on the correct nodes.
+        answers:
+            - "True"
+            - "False"
+        correct_answer: 1
+        explanation: >
+            Taints and tolerations work together to make sure that application pods are not scheduled on nodes with the wrong architecture.
+
+    - questions:
+        question: >
+            You can't create an Amazon EKS cluster with both x86/amd64 and arm64 nodes.
+        answers:
+            - "True"
+            - "False"
+        correct_answer: 2
+        explanation: >
+            Amazon EKS supports hybrid clusters with both x86/amd64 and arm64 nodes.
+ + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +title: "Review" # Always the same title +weight: 20 # Set to always be larger than the content in this path +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md new file mode 100644 index 000000000..608e7820d --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md @@ -0,0 +1,398 @@ +--- +title: Build and deploy a multi-arch application on Amazon EKS +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Multi-architecture Amazon EKS cluster with x86 and Arm-based (Graviton) nodes + +A multi-architecture Kubernetes cluster runs workloads on multiple hardware architectures, typically arm64 and amd64. To learn more about multi-architecture Kubernetes you can create a hybrid cluster in Amazon EKS and gain some practical experience with arm64 and amd64 nodes. This will also help you understand multi-architecture container images. + +## Before you begin + +You will need an [AWS account](https://aws.amazon.com/). Create an account if needed. + +Three tools are required on your local machine. Follow the links to install the required tools. + +* [Kubectl](/install-guides/kubectl/) +* [Amazon eksctl CLI](/install-guides/eksctl) +* [Docker](/install-guides/docker) + +## Create a Multi-architecture Amazon EKS Cluster + +Use eksctl to create a multi-architecture Amazon EKS cluster. 
Create a file named cluster.yaml with the following contents:
+
+```yaml
+apiVersion: eksctl.io/v1alpha5
+kind: ClusterConfig
+
+metadata:
+  name: multi-arch-cluster
+  region: us-east-1
+
+nodeGroups:
+  - name: x86-node-group
+    instanceType: m5.large
+    desiredCapacity: 2
+    volumeSize: 80
+  - name: arm64-node-group
+    instanceType: m6g.large
+    desiredCapacity: 2
+    volumeSize: 80
+```
+
+Run the eksctl command to create the EKS cluster:
+
+```console
+eksctl create cluster -f cluster.yaml
+```
+This command will create a cluster that has 2 x86/amd64 nodes and 2 arm64 nodes. When the cluster is ready, use the following command to check the nodes:
+
+```console
+kubectl get nodes
+```
+You should see output like the one below:
+
+```output
+NAME                                          STATUS   ROLES    AGE     VERSION
+ip-172-31-10-206.eu-west-1.compute.internal   Ready    <none>   9m56s   v1.28.1-eks-43840fb
+ip-172-31-16-133.eu-west-1.compute.internal   Ready    <none>   9m59s   v1.28.1-eks-43840fb
+ip-172-31-19-140.eu-west-1.compute.internal   Ready    <none>   8m32s   v1.28.1-eks-43840fb
+ip-172-31-40-45.eu-west-1.compute.internal    Ready    <none>   8m32s   v1.28.1-eks-43840fb
+```
+To check the architecture of the nodes, execute the following command:
+
+```console
+kubectl get node -o jsonpath='{.items[*].status.nodeInfo.architecture}'
+```
+The output should show two architectures for the four nodes:
+
+```output
+arm64 amd64 amd64 arm64
+```
+
+## Multi-architecture containers
+
+Multi-architecture container images are the easiest way to deploy applications, and hide the underlying hardware architecture. Building multi-architecture images is slightly more complex compared to building single-architecture images.
+Docker provides two ways to create multi-architecture images:
+* docker buildx - builds both architectures at the same time
+* docker manifest - builds each architecture separately and joins them together into a multi-architecture image
+
+Below is a simple Go application you can use to learn about multi-architecture Kubernetes clusters.
Create a file named hello.go with the contents below:
+
+```console
+package main
+
+import (
+ "fmt"
+ "log"
+ "net/http"
+ "os"
+ "runtime"
+)
+
+func handler(w http.ResponseWriter, r *http.Request) {
+ fmt.Fprintf(w, "Hello from image NODE:%s, POD:%s, CPU PLATFORM:%s/%s",
+ os.Getenv("NODE_NAME"), os.Getenv("POD_NAME"), runtime.GOOS, runtime.GOARCH)
+}
+
+func main() {
+ http.HandleFunc("/", handler)
+ log.Fatal(http.ListenAndServe(":8080", nil))
+}
+```
+Create another file named go.mod with the following content
+
+```console
+module example.com/arm
+go 1.21
+```
+
+Create a Dockerfile with the following
+
+```console
+ARG T
+
+#
+# Build: 1st stage
+#
+FROM golang:1.21-alpine as builder
+ARG TARCH
+WORKDIR /app
+COPY go.mod .
+COPY hello.go .
+RUN GOARCH=${TARCH} go build -o /hello && \
+ apk add --update --no-cache file && \
+ file /hello
+
+#
+# Release: 2nd stage
+#
+FROM ${T}alpine
+WORKDIR /
+COPY --from=builder /hello /hello
+RUN apk add --update --no-cache file
+CMD [ "/hello" ]
+```
+
+## Build multi-architecture docker images with docker buildx
+
+With these files you can build your docker image. Login to Amazon ECR and create the following repository - multi-arch-app
+
+Run the following command to build and push the docker image to the repository
+
+```console
+docker buildx create --name multiarch --use --bootstrap
+docker buildx build -t /multi-arch:latest --platform linux/amd64,linux/arm64 --push .
+```
+You should now see the docker image in your repository.
+
+## Build multi-architecture docker images with Docker manifest
+
+You can also use docker manifest to create a multi-architecture image from two single-architecture images. This is an alternative way to to build the multi-architecture image.
+Create another repo in Amazon ECR with the name - multi-arch-demo. Use the following command to build an amd64 image
+
+```console
+docker build -t /multi-arch-demo:amd64 --build-arg TARCH=amd64 --build-arg T=amd64/ .
+docker push /multi-arch-demo:amd64
+```
+
+Build an arm64 image by executing the following commands on an arm64 machine
+```console
+docker build -t /multi-arch-demo:arm64 --build-arg TARCH=arm64 --build-arg T=arm64v8/ .
+docker push /multi-arch-demo:arm64
+```
+
+After building individual containers for each architecture, merge them into a single image by running the commands below on either architecture:
+
+```console
+docker manifest create /multi-arch-demo:latest \
+--amend /multi-arch-demo:arm64 \
+--amend /multi-arch-demo:amd64
+docker manifest push --purge /multi-arch-demo:latest
+```
+
+You should see three images in the ECR repository - one for each architecture (amd64 and arm64) and a combined multi-architecture image.
+
+## Deploy Kubernetes service in EKS cluster
+
+We'll create a service to deploy the application. Create a file with the following contents
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: hello-service
+  labels:
+    app: hello
+    tier: web
+spec:
+  type: LoadBalancer
+  ports:
+    - port: 80
+      targetPort: 8080
+  selector:
+    app: hello
+    tier: web
+```
+
+Deploy the service by running the following command
+
+```console
+kubectl apply -f hello-service.yaml
+```
+
+## Deploy amd64 application
+
+Create a text file named amd64-deployment.yaml with the contents below. The amd64 image will only run on amd64 nodes. The nodeSelector is used to make sure the continer is only scheduled on amd64 nodes.
+ +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: amd-deployment + labels: + app: hello +spec: + replicas: 1 + selector: + matchLabels: + app: hello + tier: web + template: + metadata: + labels: + app: hello + tier: web + spec: + containers: + - name: hello + image: /multi-arch-demo:amd64 + imagePullPolicy: Always + ports: + - containerPort: 8080 + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + resources: + requests: + cpu: 300m + nodeSelector: + kubernetes.io/arch: amd64 + +``` + +Use the following command to deploy the application. + +```console +kubectl apply -f amd64-deployment.yaml +``` +The output should show a single pod running. + +Get the external IP assigned to the service we deployed earlier, by executing the following command. + +```console +kubectl get svc +``` +Use the external-ip from the command output and execute the following command. This IP belongs to the Load Balancer provisioned in your cluster + +```console +curl -w '\n' http:// +``` +You should see an output similar to below, along with the architecture. + +```output +Hello from image NODE:ip-192-168-32-244.ec2.internal, POD:amd-deployment-7d4d44889d-vzhpd, CPU PLATFORM:linux/amd64 +``` + +## Deploy arm64 application + +Create a text file named arm64-deployment.yaml with the contents below. 
Notice that the value of nodeSelector is now arm64 + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: arm-deployment + labels: + app: hello +spec: + replicas: 1 + selector: + matchLabels: + app: hello + tier: web + template: + metadata: + labels: + app: hello + tier: web + spec: + containers: + - name: hello + image: /multi-arch-demo:arm64 + imagePullPolicy: Always + ports: + - containerPort: 8080 + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + resources: + requests: + cpu: 300m + nodeSelector: + kubernetes.io/arch: arm64 +``` + +Deploy the arm64 application by using the command below + +```console +kubectl apply -f arm64-deployment.yaml +``` + +Execute the following command to check the running pods + +```console +kubectl get pods +``` +You should see two pods running in the cluster. One for amd64 and another one for arm64. + +Execute the curl command a few times to see output from both the pods. You'll see responses from arm64 and amd64 pods. + +```console +curl -w '\n' http:// +``` + +## Deploy multi-architecture application in EKS cluster + +Now, we'll deploy the multi-architecture version of the application in our EKS cluster. Create a text file named multi-arch-deployment.yaml with the contents below. The image is the multi-architecture image created with docker buildx and 6 replicas are specified. 
+ +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: multi-arch-deployment + labels: + app: hello +spec: + replicas: 6 + selector: + matchLabels: + app: hello + tier: web + template: + metadata: + labels: + app: hello + tier: web + spec: + containers: + - name: hello + image: /multi-arch:latest + imagePullPolicy: Always + ports: + - containerPort: 8080 + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + resources: + requests: + cpu: 300m +``` +Deploy the multi-architecture application by using the command below + +```console +kubectl apply -f multi-arch-deployment.yaml +``` +Execute the following command to check the running pods +```console +kubectl get pods +``` +Now the output should show all the pods from three deployments. To test the application, run the following command to check messages from all three versions of the application + +```console +for i in $(seq 1 10); do curl -w '\n' http://; done +``` +The output will show a variety of arm64 and amd64 messages. + +You have now deployed an x86/amd64, arm64 and multi-architecture version of the same application in a single Amazon EKS cluster. Leverage these techniques to incrementally migrate your existing x86/amd64 based applications to arm64 in AWS. 
From 5a5cb117b4f6d1ddaf36a3fdec5a1b6eea57158c Mon Sep 17 00:00:00 2001 From: pareenaverma Date: Mon, 6 Nov 2023 08:57:01 -0500 Subject: [PATCH 23/35] Update go-multi-arch-eks.md --- .../eks-multi-arch/go-multi-arch-eks.md | 65 ++++++++++--------- 1 file changed, 35 insertions(+), 30 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md index 608e7820d..519c723f6 100644 --- a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md @@ -22,7 +22,7 @@ Three tools are required on your local machine. Follow the links to install the ## Create a Multi-architecture Amazon EKS Cluster -Use eksctl to create a multi-architecture Amazon EKS cluster. Create the following file with your choice of text editor +Use `eksctl` to create a multi-architecture Amazon EKS cluster. Create a file named `cluster.yaml` with the contents below using a file editor of your choice. ```yaml apiVersion: eksctl.io/v1alpha5 @@ -43,17 +43,17 @@ nodeGroups: volumeSize: 80 ``` -Run the eksctl command to create the EKS cluster +Run the `eksctl` command to create the EKS cluster: ```console eksctl create cluster -f cluster.yaml ``` -This command will create a cluster that has 2 x86/amd64 nodes and 2 arm64 nodes. When the cluster is ready, use the following command to check the nodes +This command will create a cluster that has 2 x86/amd64 nodes and 2 arm64 nodes. 
When the cluster is ready, use the following command to check the nodes: ```console kubectl get nodes ``` -You should see an output like below +The output should look similar to: ```output NAME STATUS ROLES AGE VERSION @@ -62,12 +62,12 @@ ip-172-31-16-133.eu-west-1.compute.internal Ready 9m59s v1.28.1- ip-172-31-19-140.eu-west-1.compute.internal Ready 8m32s v1.28.1-eks-43840fb ip-172-31-40-45.eu-west-1.compute.internal Ready 8m32s v1.28.1-eks-43840fb ``` -To check the architecture of the nodes, execute the following command +To check the architecture of the nodes, execute the following command: ```console kubectl get node -o jsonpath='{.items[*].status.nodeInfo.architecture}' ``` -The output should show two architectures for four nodes +The output should show two architectures for four nodes: ```output arm64 amd64 amd64 arm64 @@ -75,12 +75,12 @@ arm64 amd64 amd64 arm64 ## Multi-architecture containers -Multi-architecture container images are the easiest way to deploy applications, and hide the underlying hardware architecture. Building multi-architecture images is slightly more complex compared to building single-architecture images. +Multi-architecture container images are the easiest way to deploy applications and hide the underlying hardware architecture. Building multi-architecture images is slightly more complex compared to building single-architecture images. Docker provides two ways to create multi-architecture images: - * docker buildx - builds both architectures at the same time - * docker manifest - builds each architecture separately and joins them together into a multi-architecture image + * docker buildx - builds both architectures at the same time. + * docker manifest - builds each architecture separately and joins them together into a multi-architecture image. -Below is a simple Go application you can use to learn about multi-architecture Kubernetes clusters. 
Create a file named hello.go with the contents below:
+Shown below is a simple `Go` application you can use to learn about multi-architecture Kubernetes clusters. Create a file named `hello.go` with the contents below:
 
 ```console
 package main
@@ -103,14 +103,14 @@ func main() {
  log.Fatal(http.ListenAndServe(":8080", nil))
 }
 ```
-Create another file named go.mod with the following content
+Create another file named `go.mod` with the following content:
 
 ```console
 module example.com/arm
 go 1.21
 ```
 
-Create a Dockerfile with the following
+Create a Dockerfile with the following content:
 
 ```console
 ARG T
@@ -139,31 +139,35 @@ CMD [ "/hello" ]
 
 ## Build multi-architecture docker images with docker buildx
 
-With these files you can build your docker image. Login to Amazon ECR and create the following repository - multi-arch-app
+With these files you can build your docker image. Login to Amazon ECR and create a repository named `multi-arch-app`.
 
-Run the following command to build and push the docker image to the repository
+Run the following command to build and push the docker image to the repository:
 
 ```console
 docker buildx create --name multiarch --use --bootstrap
 docker buildx build -t /multi-arch:latest --platform linux/amd64,linux/arm64 --push .
 ```
+Replace `` in the command above to the location of your repository.
+
 You should now see the docker image in your repository.
 
 ## Build multi-architecture docker images with Docker manifest
 
-You can also use docker manifest to create a multi-architecture image from two single-architecture images. This is an alternative way to to build the multi-architecture image.
+You can also use docker manifest to create a multi-architecture image from two single-architecture images. This is an alternative way to to build the multi-architecture image.
-Create another repo in Amazon ECR with the name - multi-arch-demo. Use the following command to build an amd64 image
+Create another repository in Amazon ECR with the name `multi-arch-demo`. Use the following command to build an amd64 image:
 
 ```console
 docker build -t /multi-arch-demo:amd64 --build-arg TARCH=amd64 --build-arg T=amd64/ .
 docker push /multi-arch-demo:amd64
 ```
+Replace `` in the command above to the location of your repository.
 
-Build an arm64 image by executing the following commands on an arm64 machine
+Build an arm64 image by executing the following commands on an arm64 machine:
 ```console
 docker build -t /multi-arch-demo:arm64 --build-arg TARCH=arm64 --build-arg T=arm64v8/ .
 docker push /multi-arch-demo:arm64
 ```
+Again, replace `` in the commands above to the location of your repository.
 
 After building individual containers for each architecture, merge them into a single image by running the commands below on either architecture:
 
@@ -178,7 +182,7 @@ You should see three images in the ECR repository - one for each architecture (a
 
 ## Deploy Kubernetes service in EKS cluster
 
-We'll create a service to deploy the application. Create a file with the following contents
+You can now create a service to deploy the application. Create a file named `hello-service.yaml` with the following contents:
 
 ```yaml
 apiVersion: v1
@@ -198,7 +202,7 @@ spec:
     tier: web
 ```
 
-Deploy the service by running the following command
+Deploy the service. Run the following command:
 
 ```console
 kubectl apply -f hello-service.yaml
@@ -206,7 +210,7 @@ kubectl apply -f hello-service.yaml
 
 ## Deploy amd64 application
 
-Create a text file named amd64-deployment.yaml with the contents below. The amd64 image will only run on amd64 nodes. The nodeSelector is used to make sure the continer is only scheduled on amd64 nodes.
+Create a text file named `amd64-deployment.yaml` with the contents below. The amd64 image will only run on amd64 nodes. The nodeSelector is used to make sure the container is only scheduled on amd64 nodes.
 
 ```yaml
 apiVersion: apps/v1
@@ -250,24 +254,24 @@ spec:
 
 ```
 
-Use the following command to deploy the application.
+Use the following command to deploy the application:
 
 ```console
 kubectl apply -f amd64-deployment.yaml
 ```
 The output should show a single pod running.
 
-Get the external IP assigned to the service we deployed earlier, by executing the following command. +Get the external IP assigned to the service you deployed earlier, by executing the following command: ```console kubectl get svc ``` -Use the external-ip from the command output and execute the following command. This IP belongs to the Load Balancer provisioned in your cluster +Use the `external-ip` from the command output and execute the following command. This IP belongs to the Load Balancer provisioned in your cluster. ```console curl -w '\n' http:// ``` -You should see an output similar to below, along with the architecture. +You should see output similar to what is shown below: ```output Hello from image NODE:ip-192-168-32-244.ec2.internal, POD:amd-deployment-7d4d44889d-vzhpd, CPU PLATFORM:linux/amd64 @@ -275,7 +279,7 @@ Hello from image NODE:ip-192-168-32-244.ec2.internal, POD:amd-deployment-7d4d448 ## Deploy arm64 application -Create a text file named arm64-deployment.yaml with the contents below. Notice that the value of nodeSelector is now arm64 +Create a text file named `arm64-deployment.yaml` with the contents below. Notice that the value of `nodeSelector` is now arm64. ```yaml apiVersion: apps/v1 @@ -318,7 +322,7 @@ spec: kubernetes.io/arch: arm64 ``` -Deploy the arm64 application by using the command below +Deploy the arm64 application by using the command below: ```console kubectl apply -f arm64-deployment.yaml @@ -331,7 +335,7 @@ kubectl get pods ``` You should see two pods running in the cluster. One for amd64 and another one for arm64. -Execute the curl command a few times to see output from both the pods. You'll see responses from arm64 and amd64 pods. +Execute the curl command a few times to see output from both the pods. You should see responses from both the arm64 and amd64 pods. 
```console curl -w '\n' http:// @@ -339,7 +343,7 @@ curl -w '\n' http:// ## Deploy multi-architecture application in EKS cluster -Now, we'll deploy the multi-architecture version of the application in our EKS cluster. Create a text file named multi-arch-deployment.yaml with the contents below. The image is the multi-architecture image created with docker buildx and 6 replicas are specified. +You can now deploy the multi-architecture version of the application in our EKS cluster. Create a text file named `multi-arch-deployment.yaml` with the contents below. The image is the multi-architecture image created with docker buildx and 6 replicas are specified. ```yaml apiVersion: apps/v1 @@ -379,16 +383,17 @@ spec: requests: cpu: 300m ``` -Deploy the multi-architecture application by using the command below +Deploy the multi-architecture application by using the command below: ```console kubectl apply -f multi-arch-deployment.yaml ``` -Execute the following command to check the running pods +Execute the following command to check the running pods: + ```console kubectl get pods ``` -Now the output should show all the pods from three deployments. To test the application, run the following command to check messages from all three versions of the application +The output should show all the pods from three deployments. 
To test the application, run the following command to check messages from all three versions of the application: ```console for i in $(seq 1 10); do curl -w '\n' http://; done From 30cab8310ac38805ad4f0edfd1e8acfe63ac961f Mon Sep 17 00:00:00 2001 From: pareenaverma Date: Mon, 6 Nov 2023 08:58:19 -0500 Subject: [PATCH 24/35] Update _index.md --- .../servers-and-cloud-computing/eks-multi-arch/_index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md index 4a8d674e9..1a15fc51c 100644 --- a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md @@ -3,7 +3,7 @@ title: Learn how to build and deploy a multi-architecture application on Amazon minutes_to_complete: 60 -who_is_this_for: This is an advanced topic for software developers who are looking to understand how to build and deploy a multi-architecture application with x86/amd64 and arm64 based container images on Amazon EKS +who_is_this_for: This is an advanced topic for software developers who are looking to understand how to build and deploy a multi-architecture application with x86/amd64 and arm64 based container images on Amazon EKS learning_objectives: - Build x86/amd64 and arm64 container images with docker buildx and docker manifest @@ -12,7 +12,7 @@ learning_objectives: - Deploy a multi-arch container application across multiple architectures in a single Amazon EKS cluster prerequisites: - - A [AWS account](https://aws.amazon.com/). Create an account if needed. + - An [AWS account](https://aws.amazon.com/). Create an account if needed. - A computer with [Amazon eksctl CLI](/install-guides/eksctl) and [kubectl](/install-guides/kubectl/)installed. 
- Docker installed on local computer [Docker](/install-guides/docker) From 78bed1ba3e4a9316430b5e1a6a50ad736b904572 Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Mon, 6 Nov 2023 16:32:40 +0000 Subject: [PATCH 25/35] Update _index.md minor grammatical amends. Links currently don't work but seeking clarification from the author. --- .../servers-and-cloud-computing/eks-multi-arch/_index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md index 1a15fc51c..5213feecd 100644 --- a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md @@ -3,7 +3,7 @@ title: Learn how to build and deploy a multi-architecture application on Amazon minutes_to_complete: 60 -who_is_this_for: This is an advanced topic for software developers who are looking to understand how to build and deploy a multi-architecture application with x86/amd64 and arm64 based container images on Amazon EKS +who_is_this_for: This is an advanced topic for software developers who want to understand how to build and deploy a multi-architecture application with x86/amd64 and arm64-based container images on Amazon EKS learning_objectives: - Build x86/amd64 and arm64 container images with docker buildx and docker manifest From efbf40c46bcfe56ae4bf7b7efb0953cff1c4ad71 Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Mon, 6 Nov 2023 16:42:26 +0000 Subject: [PATCH 26/35] Update _review.md minor editorial amends --- .../servers-and-cloud-computing/eks-multi-arch/_review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_review.md 
b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_review.md index c63ded344..d2a1b4575 100644 --- a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_review.md +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_review.md @@ -8,7 +8,7 @@ review: - "False" correct_answer: 1 explanation: > - Taints and tolerations work together to make sure that application pods are not scheduled on wrong architecture nodes. + Taints and tolerations work together to ensure that application pods are not scheduled on the wrong architecture nodes. - questions: question: > From 16a3bee29e092f168aad44c1b946b516779dc8be Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Mon, 6 Nov 2023 16:58:07 +0000 Subject: [PATCH 27/35] Update go-multi-arch-eks.md minor editorial amends --- .../eks-multi-arch/go-multi-arch-eks.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md index 519c723f6..03f44f74d 100644 --- a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md @@ -8,7 +8,7 @@ layout: learningpathall ## Multi-architecture Amazon EKS cluster with x86 and Arm-based (Graviton) nodes -A multi-architecture Kubernetes cluster runs workloads on multiple hardware architectures, typically arm64 and amd64. To learn more about multi-architecture Kubernetes you can create a hybrid cluster in Amazon EKS and gain some practical experience with arm64 and amd64 nodes. This will also help you understand multi-architecture container images. +A multi-architecture Kubernetes cluster runs workloads on multiple hardware architectures, typically arm64 and amd64. 
To learn more about multi-architecture Kubernetes, you can create a hybrid cluster in Amazon EKS and gain some practical experience with arm64 and amd64 nodes. This will also help you understand multi-architecture container images. ## Before you begin @@ -20,7 +20,7 @@ Three tools are required on your local machine. Follow the links to install the * [Amazon eksctl CLI](/install-guides/eksctl) * [Docker](/install-guides/docker) -## Create a Multi-architecture Amazon EKS Cluster +## Create a multi-architecture Amazon EKS Cluster Use `eksctl` to create a multi-architecture Amazon EKS cluster. Create a file named `cluster.yaml` with the contents below using a file editor of your choice. @@ -153,7 +153,7 @@ You should now see the docker image in your repository. ## Build multi-architecture docker images with Docker manifest -You can also use docker manifest to create a multi-architecture image from two single-architecture images. This is an alternative way to to build the multi-architecture image. +You can also use docker manifest to create a multi-architecture image from two single-architecture images. This is an alternative way to build the multi-architecture image. Create another repository in Amazon ECR with the name `multi-arch-demo`. Use the following command to build an amd64 image: ```console @@ -202,7 +202,7 @@ spec: tier: web ``` -Deploy the service. Run the following command: +Deploy the service and run the following command: ```console kubectl apply -f hello-service.yaml @@ -271,7 +271,7 @@ Use the `external-ip` from the command output and execute the following command. 
```console curl -w '\n' http:// ``` -You should see output similar to what is shown below: +You should now see an output similar to what's shown below: ```output Hello from image NODE:ip-192-168-32-244.ec2.internal, POD:amd-deployment-7d4d44889d-vzhpd, CPU PLATFORM:linux/amd64 @@ -333,9 +333,9 @@ Execute the following command to check the running pods ```console kubectl get pods ``` -You should see two pods running in the cluster. One for amd64 and another one for arm64. +You should see two pods running in the cluster, one for amd64 and another one for arm64. -Execute the curl command a few times to see output from both the pods. You should see responses from both the arm64 and amd64 pods. +Execute the curl command a few times to see output from both the pods; you should see responses from both the arm64 and amd64 pods. ```console curl -w '\n' http:// From d2a772d88e410b79183bb9b8434c46f90d4837b9 Mon Sep 17 00:00:00 2001 From: Jason Andrews Date: Mon, 6 Nov 2023 14:32:03 -0600 Subject: [PATCH 28/35] spelling updates --- .wordlist.txt | 23 ++++++++++++++++++- .../2_designing_a_dynamic_memory_allocator.md | 2 +- 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/.wordlist.txt b/.wordlist.txt index d11fe18d8..e0aa9c790 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -1627,4 +1627,25 @@ Dawid rg pds WebApp -AllowAnyCustom \ No newline at end of file +AllowAnyCustom +Konstantinos +Margaritis +VectorCamp +ASIMD +enum +solaris +Spickett +Allocator +allocator +pseudocode +struct +allocators +tradeoff +tradeoffs +unallocated +deallocate +prepend +initialise +ptr +LLSoftSecBook +nodeSelector \ No newline at end of file diff --git a/content/learning-paths/cross-platform/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md index 495351eb8..c4f3ce20f 100644 --- 
a/content/learning-paths/cross-platform/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md +++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/2_designing_a_dynamic_memory_allocator.md @@ -154,7 +154,7 @@ concern itself with alignment, which is why it can do a simple subtraction. ## Running out of space -The final thing an allocator must do is realise it has run out of space. This is +The final thing an allocator must do is realize it has run out of space. This is simply achieved by knowing the bounds of the backing storage. ```C From 8f4f18fc1b3c759e72a300c57837f64edb76e565 Mon Sep 17 00:00:00 2001 From: Jason Andrews Date: Mon, 6 Nov 2023 14:36:20 -0600 Subject: [PATCH 29/35] spelling updates --- .../cross-platform/dynamic-memory-allocator/_next-steps.md | 2 +- .../restrict-keyword-c99/what-is-restrict.md | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/content/learning-paths/cross-platform/dynamic-memory-allocator/_next-steps.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/_next-steps.md index a42a93bb7..dd9051e5d 100644 --- a/content/learning-paths/cross-platform/dynamic-memory-allocator/_next-steps.md +++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/_next-steps.md @@ -9,7 +9,7 @@ further_reading: link: https://en.cppreference.com/w/c/memory type: documentation - resource: - title: LLSoftSecBook chapter on Memory Vunerabilities + title: LLSoftSecBook chapter on Memory Vulnerabilities link: https://llsoftsec.github.io/llsoftsecbook/#memory-vulnerability-based-attacks type: website diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md index aadb81b9a..83072aa82 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md @@ -45,7 
+45,7 @@ int main() { ``` There are 2 points to make here: -1. `scaleVectors()` is the important function here, it scales two vectors by the same scalefactor `*C` +1. `scaleVectors()` is the important function here, it scales two vectors by the same scale factor `*C` 2. vector `a` overlaps with vector `b`. (`b = &a[2]`). this rather simple program produces this output: @@ -104,11 +104,11 @@ This doesn't look optimal. `scaleVectors` seems to be doing each load, multiplic int64_t b[] = { 5, 6, 7, 8 }; ``` -Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reason for this is that the compiler has no hint about the dependency between the two pointers used in the function so it has no choice but to assume that it has to process one element at a time. The function has no way of knowing what arguments need to be called. We see 8 instances of `mul`, which is correct but the number of loads and stores inbetween indicates that the CPU spends its time waiting for data to arrive from/to the cache. We need a way to be able to tell the compiler that it can assume the buffers passed are independent. +Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reason for this is that the compiler has no hint about the dependency between the two pointers used in the function so it has no choice but to assume that it has to process one element at a time. The function has no way of knowing what arguments need to be called. We see 8 instances of `mul`, which is correct but the number of loads and stores in between indicates that the CPU spends its time waiting for data to arrive from/to the cache. We need a way to be able to tell the compiler that it can assume the buffers passed are independent. ## The Solution: restrict -This is what the C99 `restrict` keyword resolves. It instructs the compiler that the passed arguments are not dependant on each other and that access to the memory of each happens only through the respective pointer. 
This way the compiler can schedule the instructions in a much more efficient way. Essentially it can group and schedule the loads and stores. **Note**, `restrict` only works in C, not in C++. +This is what the C99 `restrict` keyword resolves. It instructs the compiler that the passed arguments are not dependent on each other and that access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much more efficient way. Essentially it can group and schedule the loads and stores. **Note**, `restrict` only works in C, not in C++. Let's add `restrict` to `A` in the parameter list: ```C From b830974efeb5550f9df5eec1f3d46c0645fce2a2 Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Tue, 7 Nov 2023 08:36:14 +0000 Subject: [PATCH 30/35] Update _next-steps.md minor editorial amends --- .../servers-and-cloud-computing/eks-multi-arch/_next-steps.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_next-steps.md index 9b4dd5b8d..a0c235d83 100644 --- a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_next-steps.md +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_next-steps.md @@ -1,5 +1,5 @@ --- -next_step_guidance: We recommend you to continue learning about deploying multi-architecture applications. +next_step_guidance: We recommend you continue learning about deploying multi-architecture applications. 
recommended_path: "/learning-paths/servers-and-cloud-computing/migration" From 39250c5c6eb372105d12897ba84dcc81b42ace19 Mon Sep 17 00:00:00 2001 From: Liz Warman <81630105+lizwar@users.noreply.github.com> Date: Tue, 7 Nov 2023 08:50:49 +0000 Subject: [PATCH 31/35] Update go-multi-arch-eks.md Minor editorial amends --- .../eks-multi-arch/go-multi-arch-eks.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md index 03f44f74d..6d411b080 100644 --- a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/go-multi-arch-eks.md @@ -139,7 +139,7 @@ CMD [ "/hello" ] ## Build multi-architecture docker images with docker buildx -With these files you can build your docker image. Login to Amazon ECR and create a repository named `multi-arch-app`. +With these files you can build your docker image. Log in to Amazon ECR and create a repository named `multi-arch-app`. Run the following command to build and push the docker image to the repository: @@ -151,9 +151,9 @@ Replace `` in the command above to the location of your r You should now see the docker image in your repository. -## Build multi-architecture docker images with Docker manifest +## Build multi-architecture docker images with docker manifest -You can also use docker manifest to create a multi-architecture image from two single-architecture images. This is an alternative way to build the multi-architecture image. +You can also use docker manifest to create a multi-architecture image from two single-architecture images. Create another repository in Amazon ECR with the name `multi-arch-demo`. 
Use the following command to build an amd64 image:

```console
@@ -266,7 +266,7 @@ Get the external IP assigned to the service you deployed earlier, by executing t
```console
kubectl get svc
```
-Use the `external-ip` from the command output and execute the following command. This IP belongs to the Load Balancer provisioned in your cluster.
+Use the `external-ip` from the command output and execute the following command (this IP belongs to the Load Balancer provisioned in your cluster):

```console
curl -w '\n' http://
@@ -279,7 +279,7 @@ Hello from image NODE:ip-192-168-32-244.ec2.internal, POD:amd-deployment-7d4d448

## Deploy arm64 application

-Create a text file named `arm64-deployment.yaml` with the contents below. Notice that the value of `nodeSelector` is now arm64.
+Create a text file named `arm64-deployment.yaml` with the contents below. Note that the value of `nodeSelector` is now arm64.

```yaml
apiVersion: apps/v1
@@ -328,12 +328,12 @@ Deploy the arm64 application by using the command below:
kubectl apply -f arm64-deployment.yaml
```

-Execute the following command to check the running pods
+Execute the following command to check the running pods:

```console
kubectl get pods
```

-You should see two pods running in the cluster, one for amd64 and another one for arm64.
+You should now see two pods running in the cluster, one for amd64 and another one for arm64.

Execute the curl command a few times to see output from both the pods; you should see responses from both the arm64 and amd64 pods.

@@ -343,7 +343,7 @@ curl -w '\n' http://

## Deploy multi-architecture application in EKS cluster

-You can now deploy the multi-architecture version of the application in our EKS cluster. Create a text file named `multi-arch-deployment.yaml` with the contents below. The image is the multi-architecture image created with docker buildx and 6 replicas are specified.
+You can now deploy the multi-architecture version of the application in the EKS cluster. 
Create a text file named `multi-arch-deployment.yaml` with the contents below. The image is the multi-architecture image created with docker buildx and 6 replicas are specified. ```yaml apiVersion: apps/v1 From 5d905322f756cbee6159c14b232c73885e27c6a4 Mon Sep 17 00:00:00 2001 From: pareenaverma Date: Tue, 7 Nov 2023 09:06:10 -0500 Subject: [PATCH 32/35] Update _index.md --- .../servers-and-cloud-computing/eks-multi-arch/_index.md | 1 - 1 file changed, 1 deletion(-) diff --git a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md index 5213feecd..3a5b32a29 100644 --- a/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/eks-multi-arch/_index.md @@ -8,7 +8,6 @@ who_is_this_for: This is an advanced topic for software developers who want to u learning_objectives: - Build x86/amd64 and arm64 container images with docker buildx and docker manifest - Understand the nuances of building a multi-architecture container image - - Learn how to add taints and tolerations to Amazon EKS clusters to schedule application pods on architecture specific nodes - Deploy a multi-arch container application across multiple architectures in a single Amazon EKS cluster prerequisites: From 8cebbfd8871dd4c1bc07c5e3204cfc0a6ca94e55 Mon Sep 17 00:00:00 2001 From: pareenaverma Date: Tue, 7 Nov 2023 09:14:54 -0500 Subject: [PATCH 33/35] Update _index.md to draft while it's being reviewed. 
--- .../cross-platform/dynamic-memory-allocator/_index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/content/learning-paths/cross-platform/dynamic-memory-allocator/_index.md b/content/learning-paths/cross-platform/dynamic-memory-allocator/_index.md index 0faf2fcd4..aaba30b48 100644 --- a/content/learning-paths/cross-platform/dynamic-memory-allocator/_index.md +++ b/content/learning-paths/cross-platform/dynamic-memory-allocator/_index.md @@ -1,6 +1,7 @@ --- armips: null author_primary: David Spickett +draft: true layout: learningpathall learning_objectives: - Explain how dynamic memory allocation and the C heap works From 665aef6acf95486b30e856a48b53e3ea7cdf7862 Mon Sep 17 00:00:00 2001 From: pareenaverma Date: Tue, 7 Nov 2023 09:31:17 -0500 Subject: [PATCH 34/35] Update _index.md --- .../embedded-systems/restrict-keyword-c99/_index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md index 411003e9f..227259611 100644 --- a/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md +++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md @@ -3,7 +3,7 @@ title: Understand the `restrict` keyword in C99 minutes_to_complete: 30 -who_is_this_for: C developers who are interested in software optimization +who_is_this_for: This is an introductory topic for C developers who are interested in software optimization learning_objectives: - Learn the importance of using the `restrict` keyword in C correctly From 82f86f066c6ad48a1826da95da6ceac0b0b166e8 Mon Sep 17 00:00:00 2001 From: pareenaverma Date: Tue, 7 Nov 2023 14:56:34 +0000 Subject: [PATCH 35/35] Updated restrict LP --- .../restrict-keyword-c99/_index.md | 2 +- .../restrict-keyword-c99/what-is-restrict.md | 20 +++++++++++-------- .../when-to-use-restrict.md | 12 +++++------ contributors.csv | 1 + 4 files changed, 20 
insertions(+), 15 deletions(-)

diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md
index 227259611..e52d14a9d 100644
--- a/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md
+++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/_index.md
@@ -9,7 +9,7 @@ learning_objectives:
   - Learn the importance of using the `restrict` keyword in C correctly
 
 prerequisites:
-  - An Arm based system with Linux OS and recent compiler (Clang or GCC)
+  - An Arm computer running Linux OS and a recent version of a compiler (Clang or GCC) installed
 
 author_primary: Konstantinos Margaritis, VectorCamp
 
diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md
index 83072aa82..67791a85c 100644
--- a/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md
+++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/what-is-restrict.md
@@ -48,8 +48,8 @@ There are 2 points to make here:
 1. `scaleVectors()` is the important function here: it scales two vectors by the same scale factor `*C`
 2. vector `a` overlaps with vector `b`. (`b = &a[2]`).
 
-this rather simple program produces this output:
-```
+This simple program produces this output:
+```output
a(before): 1 2 3 4
b(before): 3 4 5 6
a(after) : 2 4 12 16
@@ -60,7 +60,7 @@ Notice that after the scaling, the contents of `a` are also affected by the scal
 
 We will include the assembly output of `scaleVectors` as produced by `clang-17 -O3`:
 
-```
+```output
scaleVectors: // @scaleVectors
	ldr x8, [x2]
	ldr x9, [x0]
@@ -108,7 +108,11 @@ Unsurprisingly, the disassembled output of `scaleVectors` is the same. The reaso
 
 ## The Solution: restrict
 
-This is what the C99 `restrict` keyword resolves. 
It instructs the compiler that the passed arguments are not dependent on each other and that access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much more efficient way. Essentially it can group and schedule the loads and stores. **Note**, `restrict` only works in C, not in C++. +This is what the C99 `restrict` keyword resolves. It instructs the compiler that the passed arguments are not dependent on each other and that access to the memory of each happens only through the respective pointer. This way the compiler can schedule the instructions in a much more efficient way. Essentially it can group and schedule the loads and stores. + +{{% notice Note %}} +The `restrict` keyword only works in C, not in C++. +{{% /notice %}} Let's add `restrict` to `A` in the parameter list: ```C @@ -191,10 +195,10 @@ It is interesting to see that in such an example adding the `restrict` keyword r ## What about SVE2? -We have shown the obvious benefit of `restrict` in this function, on an armv8-a CPU, but we have new armv9-a CPUs out there with SVE2 as well as Neon/ASIMD. -Could the compiler generate better code in that case using `restrict`? The output without `restrict` is almost the same, but with `restrict` used, this is the result (we used `clang-17 -O3 -march=armv9-a`): +You have now seen the benefit of `restrict` in this function, on an Armv8-A CPU. You can now try it on an Armv9-A CPU which supports SVE2 as well as Neon/ASIMD. +Could the compiler generate better code in that case using `restrict`? The output without `restrict` is almost the same, but with `restrict` used, this is the result (Compiler flags used: `clang-17 -O3 -march=armv9-a`): -``` +```output scaleVectors: // @scaleVectors ldp q1, q2, [x0] ldp q3, q4, [x1] @@ -210,4 +214,4 @@ scaleVectors: // @scaleVectors There are just 10 instructions, 31% of the original code size! 
The compiler has made great use of the SVE2 features, combining the multiplications and reducing them to 4 and, at the same time, grouping loads and stores down to 2 each. We have optimized our code by more than 3x just by adding a C99 keyword.
 
-We are now going to look at another example.
+Next, let's take a look at another example.
diff --git a/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md b/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md
index 8b3caf9e3..4d5165019 100644
--- a/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md
+++ b/content/learning-paths/embedded-systems/restrict-keyword-c99/when-to-use-restrict.md
@@ -1,18 +1,18 @@
---
-title: When can we use restrict
+title: When can you use restrict
weight: 4
 
### FIXED, DO NOT MODIFY
layout: learningpathall
---
 
-When can we use `restrict` or, put differently, how do we recognize that we need `restrict` in our code?
+When can you use `restrict` or, put differently, how do you recognize that you need `restrict` in your code?
 
-`restrict` as a pointer attribute is rather easy to test. 
As a rule of thumb, if the function includes one or more pointers to memory objects as arguments, you can use `restrict` if you are certain that the memory pointed to by these pointer arguments does not overlap and there is no other way to access them in the body of the function, except by the use of those pointers, i.e., there is no other global pointer or some other indirect way to access these elements.
 
-Let's show a counter example:
 
-```
+Let's see a counterexample:
 
+```C
 int A[10];
 
 int f(int *B, size_t n) {
@@ -30,4 +30,4 @@ int main() {
 
This example does not benefit from `restrict` in either gcc or clang.
 
-However, there are plenty of cases that are candidates for the `restrict` optimization. It's safe and easy to try but, even if it looks like a good candidate, it is still possible that the compiler will not detect a pattern that is suited for optimization and we might not see any reduction in the code or speed gain. It is up to the compiler; in some cases clang handles this better or differently from gcc, and vice versa, and this will also depend on the version. If you have a particular piece of code that you would like to optimize, before you attempt to refactor it completely, rewrite it in assembly or use any SIMD instructions, it might be worth trying `restrict`. Even saving a couple of instructions in a critical loop function is worth having by just adding one keyword.
+However, there are plenty of cases that are candidates for the `restrict` optimization. It's safe and easy to try but, even if it looks like a good candidate, it is still possible that the compiler will not detect a pattern that is suited for optimization and you might not see any reduction in the code or speed gain. It is up to the compiler; in some cases clang handles this better or differently from gcc, and vice versa, and this will also depend on the version. 
If you have a particular piece of code that you would like to optimize, before you attempt to refactor it completely, rewrite it in assembly, or use any SIMD instructions, it might be worth trying `restrict`. Even saving a couple of instructions in a critical loop function is worth having, given that it costs just one added keyword.

diff --git a/contributors.csv b/contributors.csv
index 9043db3fa..5ff202e36 100644
--- a/contributors.csv
+++ b/contributors.csv
@@ -18,3 +18,4 @@ Frédéric -lefred- Descamps,OCI,,,,lefred.be
 Kristof Beyls,Arm,,,,
 David Spickett,Arm,,,,
 Uma Ramalingam,Arm,uma-ramalingam,,,
+Konstantinos Margaritis,VectorCamp,,,,