Merge pull request #579 from lizwar/main
editorial review complete - dynamic memory allocation
pareenaverma authored Nov 9, 2023
2 parents 5545e45 + 49f864f commit 1fe1927
Showing 6 changed files with 83 additions and 99 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,7 @@ layout: learningpathall
## Dynamic vs. static memory allocation

In this Learning Path you will learn how to implement dynamic memory allocation.
If you have used the C programming language "heap" (`malloc`, `free`, etc.) before, that is one example
of dynamic memory allocation.
One example of dynamic memory allocation is the C programming language "heap" (`malloc`, `free`, etc.), which you may have used before.

Dynamic memory allocation allows programs to allocate memory while they are running without knowing
at build time how much memory they will need. In contrast, static
Expand Down Expand Up @@ -44,22 +43,21 @@ int main(...) {
The arguments passed to the program determine if memory is allocated or not.
## The C library malloc function
## The C library `malloc` function
The C standard library provides a special function
[`malloc`](https://en.cppreference.com/w/c/memory/malloc). `m` for "memory",
`alloc` for "allocate". This is used to ask for a suitably sized memory
[`malloc`](https://en.cppreference.com/w/c/memory/malloc) (`m` for "memory",
`alloc` for "allocate"). This is used to ask for a suitably sized memory
location while a program is running.
```C
void *malloc(size_t size);
```

The C library looks for a chunk of memory with size of at least `size`
bytes in a large chunk of memory that it has reserved. For instance on Ubuntu
Linux, this is done by GLIBC.
The C library looks for a chunk of memory with a size of at least X bytes within the memory that it has reserved, where X is the value of the `size`
parameter passed to `malloc`. For instance, on Ubuntu Linux, this is done by GLIBC.
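
As a minimal sketch of how a caller might use `malloc`, the hypothetical helper below (`make_filled` is a name made up for this example, not part of the Learning Path) asks for an array whose length is only known at runtime. It checks the result because `malloc` returns `NULL` when it cannot satisfy a request.

```C
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: returns a heap-allocated array of `count` ints,
// each set to `value`, or NULL if nothing could be allocated.
// The caller owns the result and must pass it to free() when done.
int *make_filled(size_t count, int value) {
  // Reject empty requests and sizes that would overflow the multiplication.
  if (count == 0 || count > SIZE_MAX / sizeof(int)) {
    return NULL;
  }
  // The number of bytes is only known while the program is running.
  int *items = malloc(count * sizeof *items);
  if (items == NULL) {
    return NULL;  // The allocator could not find enough memory.
  }
  for (size_t i = 0; i < count; i++) {
    items[i] = value;
  }
  return items;
}
```

A caller would typically write `int *items = make_filled(n, 0);`, test the result against `NULL`, and later hand the same pointer to `free`.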

The example at the top of the page is trivial of course. As it is we could just
The example at the top of the page is trivial, of course. As it is we could just
statically allocate both integers like this:

```C
Expand All @@ -68,11 +66,11 @@ void fn() {
}
```

Variables `a` and `b` work fine if they are not needed outside of the function. Or in other
Variables `a` and `b` work fine if they are not needed outside of the function. In other
words, if the lifetime of the data is equal to that of the function.

A more complex example shows when this is not the case, and the values
live longer than the creating function.
A more complex example (shown below) demonstrates when this is not the case, and the values
live longer than the creating function:

```C
#include <stdlib.h>
Expand All @@ -96,49 +94,45 @@ void add_entry(Entry *entry, int data) {
What you see above is a struct `Entry` that defines a singly-linked-list entry.
Singly meaning that you can go forward via `next`, but you cannot go backwards
in the list. There is some data `data`, and each entry points to the next entry,
in the list. There is some `data` and each entry points to the next entry,
`next`, assuming there is one (it will be `NULL` for the end of the list).
`add_entry` makes a new entry and adds it to the end of the list.
Think about how you would use these functions. You could start with some known
size of list, like a global variable for the head (first entry)
sizes of lists, like a global variable for the head (first entry)
of our list.
```C
Entry head = {.data = 123, .next=NULL};
```

Now you want to add another `Entry` to this list at runtime. So you do not know
ahead of time what it will contain, or if we indeed will add it or not. Where
Now you want to add another `Entry` to this list at runtime. You do not know
ahead of time what it will contain or whether you will add it at all. Where
would you put that entry?

* If it is another global variable, we would have to declare many empty `Entry`
values and hope
values and hope we never need more than that amount.

{{% notice Other Allocation Techniques%}}
Although in this specific case global variables aren't a good solution, there are
cases where large sets of pre-allocated objects can be beneficial. For example,
it provides a known upper bound of memory usage and makes the timing of each
allocation predictable.

However, these techniques are not covered in this Learning Path. It will
however be useful to think about them after you have completed this learning
allocation predictable. These techniques are not covered in this Learning Path. It will, however, be useful to think about them after you have completed this learning
path.
{{% /notice %}}

* If it is in a function's stack frame, that stack frame will be reclaimed and
modified by future functions, corrupting the new `Entry`.

So you can see, dynamic memory allocation is required. Which is why the `add_entry`
So you can see, dynamic memory allocation is required, which is why the `add_entry`
shown above calls `malloc`. The resulting pointer points to somewhere not in
the program's global data section or in any function's stack space, but in the
heap memory. It will stay in the heap until a call to `free` is made.

## The C library free function

You cannot ask malloc for memory forever. Eventually the space behind the scenes
will run out. You should give up your dynamic memory once it is not needed,
You cannot ask `malloc` for memory forever. Eventually the space will run out. You should give up your dynamic memory once it is no longer needed,
using [`free`](https://en.cppreference.com/w/c/memory/free).

```C
Expand All @@ -150,18 +144,17 @@ the heap that the memory is no longer needed.
{{% notice Undefined Behavior%}}
You may wonder what happens if you don't pass the exact same pointer to `free` as
`malloc` returned. The result varies as this is "undefined behavior".
Which essentially means a large variety of unexpected things can happen.
`malloc` returned. The result varies as this is "undefined behavior", which essentially means a large variety of unexpected things can happen.
In practice, many allocators will tolerate this difference or reject it outright
if it's not possible to do something sensible with the pointer.
Remember, just because one allocator handles this a certain way, does not
mean all allocators will be the same. Indeed, that same allocator may handle it differently for
Remember, just because one allocator handles this a certain way, it does not
mean all allocators will be the same. Indeed, the same allocator may handle it differently for
different allocations within the same program.
{{% /notice %}}
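
One way to stay clear of this situation is to leave the pointer returned by `malloc` untouched and walk the memory with a second pointer. The sketch below illustrates the idea; `sum_and_release` and `ITEM_COUNT` are hypothetical names used only for this example.

```C
#include <stdlib.h>

#define ITEM_COUNT 4

// Hypothetical example: fill a heap buffer with 1, 2, 3, 4, sum it, then
// release it. Returns the sum, or -1 if the allocation failed.
int sum_and_release(void) {
  int *base = malloc(ITEM_COUNT * sizeof *base);  // free() must get `base`.
  if (base == NULL) {
    return -1;
  }
  for (int i = 0; i < ITEM_COUNT; i++) {
    base[i] = i + 1;
  }
  int total = 0;
  // Iterate with a separate cursor so that `base` never changes.
  for (const int *cursor = base; cursor != base + ITEM_COUNT; cursor++) {
    total += *cursor;
  }
  free(base);  // The same pointer value that malloc returned.
  // free(base + 1) would be undefined behavior: malloc never returned it.
  return total;
}
```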
You can use `free` to remove an item from your linked list.
You can use `free` to remove an item from your linked list:
```C
void remove_entry(Entry* previous, Entry* entry) {
Expand All @@ -173,7 +166,7 @@ void remove_entry(Entry* previous, Entry* entry) {

`remove_entry` makes the previous entry point to the entry after the one we want
to remove, so that the list skips over it. With `entry` now isolated we call
`free` to give up the memory it occupies.
`free` to give up the memory it occupies:

```text
----- List ------ | - Heap --
Expand All @@ -185,5 +178,5 @@ to remove, so that the list skips over it. With `entry` now isolated we call
[A]---------->[C] | [A] [C]
```

That covers the high level how and why of using `malloc` and `free`, next you will
see a possible implementation of a dynamic memory allocator.
We've now covered the high level how and why for using `malloc` and `free`. Next you will
see a possible implementation of a dynamic memory allocator.
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,12 @@ layout: learningpathall

## High level design

To begin with, decide which functions your memory allocator will provide. We
have described `malloc` and `free`, there are more provided by the
To begin, decide which functions your memory allocator will provide. We
have already described `malloc` and `free` but there are more provided by the
[C library](https://en.cppreference.com/w/c/memory).

This will assume you just need `malloc` and `free`. The new implementations will
be called `simple_malloc` and `simple_free`. Start with just two functions and write
This demo assumes you just need `malloc` and `free`. The new implementations will
be called `simple_malloc` and `simple_free`. Start with just these two functions and write
out their behaviors.

The first function is `simple_malloc` and it will:
Expand All @@ -26,17 +26,17 @@ The second function is `simple_free` and it will:
* Mark that memory as available for future allocations

From this you can see that you will need:
* Some large chunk of memory, the "backing storage".
* A large chunk of memory, the "backing storage".
* A way to mark parts of that memory as allocated, or available for allocation

## Backing storage

The memory can come from many sources. It can even change size throughout the
program's execution if you wish. For your allocator you can keep it simple.
program's execution but for your allocator you can keep it simple.

A single, statically allocated global array of bytes will be your backing
storage. You can do dynamic allocation of parts of a statically allocated
piece of memory.
storage. You can carry out dynamic allocation of parts of a statically allocated
piece of memory:

```C
#define STORAGE_SIZE 4096
Expand All @@ -46,35 +46,33 @@ static char storage[STORAGE_SIZE];
## Record keeping
This backing memory needs to be annotated somehow to record what has been
allocated so far. There are many ways to do this. Te biggest choice
allocated so far. There are many ways to do this; the biggest choice
is whether to store these records in the heap itself or outside of it.
The easiest way is to put the records in the heap.
What should be in the records? Think about the question the caller is asking.
Can you give me a pointer to an area of memory of at least this size?
What should be in the records? Think about the question the caller is asking: can you give me a pointer to an area of memory of at least this size?
For this you will need to know:
* The ranges of the backing storage that have already been allocated
* The size of each section, both free and allocated
Where a "range" a pointer to a location, a size in bytes and a boolean to say
whether the range is free or allocated. So a range from 0x123 of 345 bytes,
A "range" is made up of 3 things: a pointer to a location, a size in bytes and a boolean to say whether the range is free or allocated. So a range from 0x123 of 345 bytes,
that has been allocated would be:
```text
start: 0x123 size: 345 allocated: true
```
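
The three pieces of information above could be held in a small C structure. This is only an illustrative sketch: the `Range` type, its field names and `range_end` are assumptions for this example, not the `Header` type used by the allocator later in this Learning Path.

```C
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Illustrative record describing one range of the backing storage.
typedef struct {
  uintptr_t start;  // Address where the range begins.
  size_t size;      // Length of the range in bytes.
  bool allocated;   // true if in use, false if free.
} Range;

// Address one past the last byte of the range, which is where the next
// range would begin.
static uintptr_t range_end(Range range) {
  return range.start + range.size;
}

// The example from the text: 345 allocated bytes starting at 0x123.
static const Range example_range = {
    .start = 0x123, .size = 345, .allocated = true};
```

Adding `size` to `start` gives the address at which the following range would begin.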

For the initial state of a heap of size `N`, you will have one range of
unallocated memory.
unallocated memory:

```text
Pointer: 0x0 Size: N Allocated: False
```

When an allocation is made you will split this free range into 2 ranges. The
first part the new allocation, the second the remaining free space. If 4 bytes
When an allocation is made you will split this free range into 2 ranges: the
first part the new allocation, the second the remaining free space. If, for example, 4 bytes
were to be allocated:

```text
Expand All @@ -87,7 +85,7 @@ one with enough free space, and repeat the splitting process.

The walk works like this. Starting from the first range, add the size of that
range to the address of that range. This new address is the start of the next
range. Repeat until the resulting address is beyond the end of the heap.
range. Repeat until the resulting address is beyond the end of the heap:

```text
range = 0x0;
Expand All @@ -101,7 +99,7 @@ Pointer: 0x4 Size: N-4 Allocated: False
range = 0x4 + (N-4) = 1 beyond the end of the heap, so the walk is finished.
```

`simple_free` uses the pointer given to it to find the range it needs to deallocate.
`simple_free` uses the pointer given to it to find the range it needs to de-allocate.
Let's say the 4 byte allocation was freed:

```text
Expand All @@ -115,7 +113,7 @@ allocated or not. The location and size of the range stay the same.

{{% notice Merging Free Ranges%}}
The allocator presented here does not merge free ranges like the 2 above. This
is a deliberate limitation and addressing this is discussed later.
is a deliberate limitation which will be discussed later.
{{% /notice %}}
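
To make the record keeping concrete, here is a minimal sketch of the split, walk and free steps, assuming each record is stored at the start of its range. The names `RangeHeader`, `demo_init`, `demo_split_first`, `demo_release` and `demo_count_ranges` are hypothetical and are not the functions of the real implementation. Offsets from the start of the storage stand in for pointers, sizes include the record itself, and `memcpy` is used to read and write records so that the sketch does not depend on alignment.

```C
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STORAGE_SIZE 4096

// Hypothetical record placed at the start of every range.
typedef struct {
  size_t size;     // Size of the whole range in bytes, record included.
  bool allocated;  // true if the range is in use.
} RangeHeader;

static char storage[STORAGE_SIZE];

static RangeHeader read_header(size_t offset) {
  RangeHeader header;
  memcpy(&header, storage + offset, sizeof header);
  return header;
}

static void write_header(size_t offset, RangeHeader header) {
  memcpy(storage + offset, &header, sizeof header);
}

// Initial state: one free range covering the whole storage.
static void demo_init(void) {
  write_header(0, (RangeHeader){.size = STORAGE_SIZE, .allocated = false});
}

// Split the range at offset 0 so that its first part holds `bytes` of user
// data. Returns false if that range is in use or too small for both parts.
static bool demo_split_first(size_t bytes) {
  RangeHeader first = read_header(0);
  if (first.allocated || bytes > first.size ||
      first.size - bytes < 2 * sizeof(RangeHeader)) {
    return false;
  }
  size_t needed = sizeof(RangeHeader) + bytes;
  size_t remaining = first.size - needed;
  write_header(0, (RangeHeader){.size = needed, .allocated = true});
  write_header(needed, (RangeHeader){.size = remaining, .allocated = false});
  return true;
}

// Mark the range starting at `offset` as free. Its start and size stay the
// same, and it is not merged with neighboring free ranges.
static void demo_release(size_t offset) {
  RangeHeader header = read_header(offset);
  header.allocated = false;
  write_header(offset, header);
}

// Walk the heap: start + size gives the start of the next range. Stop once
// there is no room left for another record.
static size_t demo_count_ranges(void) {
  size_t count = 0;
  size_t offset = 0;
  while (offset <= STORAGE_SIZE - sizeof(RangeHeader)) {
    RangeHeader header = read_header(offset);
    if (header.size == 0) {
      break;  // A zero size would loop forever; treat it as corruption.
    }
    count++;
    offset += header.size;
  }
  return count;
}
```

Calling `demo_init()` and then `demo_split_first(4)` leaves two ranges. After `demo_release(0)` there are still two ranges, both free, which is the unmerged state described in the notice above.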

## Record storage
Expand All @@ -125,7 +123,7 @@ for them on top of the allocation itself.

The simplest way to do this is to prepend each allocation with the range
information. This way you can skip from the start of one range to another with
ease.
ease:

```text
0x00: [ptr, size, allocated] <-- The range information
Expand Down Expand Up @@ -155,7 +153,7 @@ concern itself with alignment, which is why it can do a simple subtraction.
## Running out of space

The final thing an allocator must do is realize it has run out of space. This is
simply achieved by knowing the bounds of the backing storage.
simply achieved by knowing the bounds of the backing storage:

```C
#define STORAGE_SIZE 4096
Expand All @@ -165,4 +163,4 @@ static const char *storage_end = storage + STORAGE_SIZE;
```

If you are walking the heap and the start of the next range would be greater
than or equal to `storage_end`, you have run out of memory to allocate.
than or equal to `storage_end`, you have run out of memory to allocate.
Original file line number Diff line number Diff line change
Expand Up @@ -14,13 +14,13 @@ You will need a Linux machine to try the code and see how the allocation works.
## Project structure

The files used are:
* `CMakeLists.txt` - Tells `cmake` how to configure and build the project.
* `heap.c` - The dynamic memory allocator implementation.
* `CMakeLists.txt` - Tells `cmake` how to configure and build the project
* `heap.c` - The dynamic memory allocator implementation
* `heap.h` - Function declarations including your new `simple_malloc` and
`simple_free` functions.
* `main.c` - A test program that makes use of `simple_malloc` and `simple_free`.
`simple_free` functions
* `main.c` - A test program that makes use of `simple_malloc` and `simple_free`

Building it will produce a single binary, `demo`, that you can run and see the results.
Building it will produce a single binary `demo` that you can run to see the results.

## Source code

Expand Down Expand Up @@ -62,8 +62,8 @@ First is `storage`, this is the backing storage which is a global char array.
This is where the ranges, represented by `Header`, are stored.
Each `Header` is written to the start of the allocated range. This means that
`simple_malloc` returns a pointer that points just beyond this location. `simle_free` on the
other hand, deducts the size of `Header` from the pointer parameter to find the
`simple_malloc` returns a pointer that points just beyond this location. `simple_free`, on the
other hand, deducts the size of the `Header` from the pointer parameter to find the
range information.
When the heap is initialized with `simple_heap_init`, a single range is set up
Expand All @@ -73,7 +73,7 @@ To find a free range, `find_free_space` walks the heap using these `Header`
values until it finds a large enough free range, or gets beyond the end of the
heap.
For the first allocation the job is straightforward. There's one range and it's
For the first allocation the job is straightforward; there's one range and it's
all free. Split that into 2 ranges, using the first for the allocation.
On subsequent allocations there will be more header values to read, but the
Expand Down Expand Up @@ -273,7 +273,7 @@ int main() {
}
```

The main code does allocation and deallocation of memory. This tests the heap
The main code does allocation and de-allocation of memory. This tests the heap
code but also highlights an interesting problem that you'll see more about later.

## Build the source code
Expand Down Expand Up @@ -309,7 +309,7 @@ Run `demo` to see the allocator in action:

## Review the program output

The output addresses will vary depending on where backing memory gets allocated
The output addresses will vary depending on where the backing memory gets allocated
by your system but this is the general form you should expect:

```text
Expand All @@ -322,8 +322,8 @@ Storage [0x559871a24040 -> 0x559871a25040) (4096 bytes)
The addresses on the left usually refer to an action. In this case we've set
a `Header` value at `0x559871a24040`.

The list in the last lines is the set of ranges you would see if you walked the
heap. Exactly what the allocator is seeing. The use of `[` followed by `)`
The list in the last few lines is the set of ranges you would see if you walked the
heap, which is exactly what the allocator is seeing. The use of `[` followed by `)`
means that the start address is included in the range, but the end address is
not. This is the initial heap state where everything is free.

Expand All @@ -338,7 +338,7 @@ Trying to allocate 100 bytes
[0x55e68c41f0ac -> 0x55e68c420040) : 0x0000000000000f94 (free, size = 3988 bytes)
```

You see that a request was made for 100 bytes and the allocator decided to split
You can see that a request was made for 100 bytes and the allocator decided to split
the single range into 2. It updated the header information of both new ranges.

Note that although it says `[0x559871a24048] Memory was allocated`, you do not
Expand All @@ -347,10 +347,10 @@ returned to the user. Take the size of `Header` from this address and you get th
start of the range which is `0x559871a24040` as shown in the first range in the
list.

You'll also notice that the allocated range is 8 bytes bigger than what the user
You'll also notice that the allocated range is 8 bytes bigger than the user
asked for. This is because it includes the `Header` at the start of it.
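
The numbers in the sample output for the 100 byte request can be checked with a little arithmetic. The sketch below assumes an 8 byte header, as the text states; `range_size_for` and `next_range_start` are hypothetical helpers written for this check, not functions from `heap.c`.

```C
#include <stddef.h>
#include <stdint.h>

#define STORAGE_SIZE 4096
#define HEADER_SIZE 8  // Assumed size of `Header`, taken from the text.

// Total bytes a range occupies for a request of `requested` bytes.
static size_t range_size_for(size_t requested) {
  return HEADER_SIZE + requested;
}

// Start address of the range that follows one beginning at `start`.
static uint64_t next_range_start(uint64_t start, size_t range_size) {
  return start + range_size;
}
```

For a 100 byte request, `range_size_for(100)` gives a 108 byte range. Taking the storage start as `0x55e68c41f040` (the end address `0x55e68c420040` minus 4096), the next range begins at `0x55e68c41f0ac`, leaving 4096 - 108 = 3988 bytes (`0xf94`) free, which matches the free range in that output.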

If you skip ahead to after the `free` calls have been made you will see:
If you skip ahead to after the `free` calls have been made, you will see:

```text
[0x55e68c41f1ac] Freeing allocation
Expand All @@ -367,4 +367,4 @@ Which shows you that the second and third allocations were freed, and there is
still a large range of free memory on the end.

Try to understand what the final allocation result is. Is the choice of location
expected or would you expect it to fit elsewhere in the heap?
expected or would you have expected it to fit elsewhere in the heap?