Merge pull request #559 from DavidSpickett/dynamic-memory-allocator
Dynamic Memory Allocator Learning Path
jasonrandrews authored Nov 2, 2023
2 parents 41afe01 + 3839fc9 commit a686566
Showing 7 changed files with 1,025 additions and 0 deletions.
@@ -0,0 +1,187 @@
---
title: Dynamic Memory Allocation
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Dynamic vs. Static Allocation

In this learning path you will learn how to implement dynamic memory allocation.
If you have used C's "heap" (`malloc`, `free`, etc.) before, that is one example
of dynamic memory allocation.

It allows programs to allocate memory while they are running, without knowing
at build time how much memory they will need. This is in contrast to static
memory allocation, where the amount is known at build time.

```C
#include <stdlib.h>

void fn() {
  // Static allocation
  int a = 0;
  // Dynamic allocation
  int *b = malloc(sizeof(int));
}
```

The example above shows the difference. The size and location of `a` are known
when the program is built. The size of `b` is also known, but its location is not.

It may never even be allocated, as this example shows:

```C
#include <stdlib.h>

int main(int argc, char *argv[]) {
  // Only allocate if the user has passed at least one argument.
  if (argc > 1) {
    int *b = malloc(sizeof(int));
    free(b);
  }
  return 0;
}
```

If the user passes no arguments to the program, there's no need to allocate space
for `b`. If they do, `malloc` will find space for it.

## malloc

The C standard library provides the function
[`malloc`](https://en.cppreference.com/w/c/memory/malloc). The name combines
`m` for "memory" and `alloc` for "allocate". You can use it to ask for a suitably
sized memory location while the program is running.

```C
void *malloc(size_t size);
```

The C library then looks for a free chunk of at least `size` bytes within a
larger region of memory that it has reserved. On Ubuntu Linux, for instance,
this is done by glibc.
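
As a minimal sketch of asking for memory whose size is only known at runtime,
the hypothetical helper `make_squares` below (not part of the C library) allocates
an array of `count` integers. It checks the result because `malloc` returns `NULL`
when it cannot find enough space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical helper: returns a heap array holding 0*0, 1*1, 2*2 and so on
// for `count` values, or NULL if the allocation fails.
// The caller owns the result and must eventually free() it.
int *make_squares(size_t count) {
  // malloc(0) is implementation-defined, so require a real size.
  assert(count > 0);

  // `count` is only known at runtime, so this cannot be a static allocation.
  int *squares = malloc(count * sizeof(int));
  if (squares == NULL) {
    return NULL;
  }

  for (size_t i = 0; i < count; ++i) {
    squares[i] = (int)(i * i);
  }
  return squares;
}
```

A caller would check the returned pointer for `NULL` before using it, and later
pass that same pointer to `free`.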

The example at the top of the page is trivial, of course. As it stands, you could
just statically allocate both integers like this:

```C
void fn() {
  int a = 0, b = 0;
}
```

That's ok if this data is never returned from this function. In other words, it
is ok if the lifetime of this data is the same as that of the function call.

A more complicated example will show you when that is not the case, and the value
lives longer than the function that created it.

```C
#include <stdlib.h>

typedef struct Entry {
  int data;
  // NULL if end of list, next entry otherwise.
  struct Entry *next;
} Entry;

// `entry` must be the current end of the list.
void add_entry(Entry *entry, int data) {
  // New entry, which becomes the end of the list.
  Entry *new_entry = malloc(sizeof(Entry));
  if (new_entry == NULL) {
    // Out of memory, leave the list unchanged.
    return;
  }
  new_entry->data = data;
  new_entry->next = NULL;

  // Previous tail now points to the newly allocated entry.
  entry->next = new_entry;
}
```

What you see above is a struct `Entry` that defines a singly-linked-list entry.
Singly meaning that you can go forward via `next`, but you cannot go backwards
in the list. Each entry holds some data, `data`, and points to the next entry,
`next`, assuming there is one (it will be `NULL` for the end of the list).

`add_entry` makes a new entry and adds it to the end of the list.

Think about how you would use this function. You could start with a list of
known size, such as a global variable for the head (first entry) of the list.

```C
Entry head = {.data = 123, .next = NULL};
```

Now you want to add another `Entry` to this list at runtime. You do not know
ahead of time what it will contain, or whether you will add it at all. Where
would you put that entry?

* If it is another global variable, you would have to declare many empty `Entry`s
and hope you never need more than that amount.

{{% notice Other Allocation Techniques%}}
Although in this specific case global variables aren't a good solution, there are
cases where large sets of pre-allocated objects can be beneficial. For example,
it provides a known upper bound of memory usage and makes the timing of each
allocation predictable.

However, this learning path does not cover these techniques. It will still be
useful to think about them after you have completed it.
{{% /notice %}}

* If it is in a function's stack frame, that stack frame will be reclaimed when
the function returns and reused by later function calls, corrupting the new `Entry`.

So you can see that you must use dynamic memory allocation, which is why the
`add_entry` shown above calls `malloc`. The resulting pointer points to somewhere
not in the program's global data section or in any function's stack space, but in
heap memory, where it can live until you `free` it.
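
To make this concrete, here is one possible sketch of growing the list from the
global head. `append_entry` and `grow_list` are hypothetical names, and
`append_entry` is a variant of `add_entry` that returns the new tail (or `NULL` on
failure). The `Entry` type and `head` are repeated so the block stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Entry {
  int data;
  // NULL if end of list, next entry otherwise.
  struct Entry *next;
} Entry;

// Statically allocated first entry, as in the text above.
static Entry head = {.data = 123, .next = NULL};

// Hypothetical variant of add_entry. Returns the new tail, or NULL if
// malloc could not find space (the list is then left unchanged).
static Entry *append_entry(Entry *tail, int data) {
  assert(tail != NULL && tail->next == NULL);
  Entry *new_entry = malloc(sizeof(Entry));
  if (new_entry == NULL) {
    return NULL;
  }
  new_entry->data = data;
  new_entry->next = NULL;
  tail->next = new_entry;
  return new_entry;
}

// Appends up to `count` values to the list and returns how many were added.
size_t grow_list(const int *values, size_t count) {
  // Find the current end of the list.
  Entry *tail = &head;
  while (tail->next != NULL) {
    tail = tail->next;
  }

  size_t added = 0;
  for (; added < count; ++added) {
    Entry *new_tail = append_entry(tail, values[added]);
    if (new_tail == NULL) {
      break;  // Out of memory, stop early.
    }
    tail = new_tail;
  }
  return added;
}
```

Every entry added by `grow_list` is still reachable from `head` after `grow_list`
returns, which a stack variable could not offer, and the number of entries did not
have to be fixed at build time.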

## free

You cannot keep asking `malloc` for memory forever. Eventually the space behind
the scenes will run out. So you should give up your dynamic memory once it is no
longer needed, using [`free`](https://en.cppreference.com/w/c/memory/free).

```C
void free(void *ptr);
```

You call `free` with a pointer previously given to you by `malloc`, and this tells
the heap that you no longer need this memory.

{{% notice Undefined Behaviour%}}
You may wonder what happens if the pointer you pass to `free` is not exactly the
one that `malloc` returned to you. The result varies, as this is "undefined
behaviour", which essentially means a large variety of unexpected things can happen.

In practice, many allocators will tolerate this difference or reject it outright
if it's not possible to do something sensible with the pointer.

Remember that just because one allocator handles this a certain way, does not
mean all will. Indeed, that same allocator may handle it differently for
different allocations within the same program.
{{% /notice %}}

So, you can use `free` to remove an item from your linked list.
```C
void remove_entry(Entry *previous, Entry *entry) {
  // NULL checks skipped for brevity.
  previous->next = entry->next;
  free(entry);
}
```

`remove_entry` makes the previous entry point to the entry after the one we want
to remove, so that the list skips over it. With `entry` now isolated we call
`free` to give up the memory it occupies.

```text
----- List ------ | - Heap --
[A] -> [B] -> [C] | [A][B][C]
                  |
[A]    [B]    [C] | [A][B][C]
 |-------------^  |
                  |
[A]---------->[C] | [A]   [C]
```
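
The same idea extends to releasing a whole list. This sketch assumes the head is
statically allocated, as in the earlier example, so only the entries after it are
freed. `free_entries_after` is a hypothetical helper. Note that `next` is read
before the call to `free`, because an entry's memory must not be used once it has
been freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Entry {
  int data;
  // NULL if end of list, next entry otherwise.
  struct Entry *next;
} Entry;

// Frees every entry after `head` and returns how many were freed.
// `head` itself is not freed, on the assumption that it was not
// allocated with malloc.
size_t free_entries_after(Entry *head) {
  assert(head != NULL);

  size_t freed = 0;
  Entry *entry = head->next;
  while (entry != NULL) {
    // Save the link first, the entry is off limits after free().
    Entry *next = entry->next;
    free(entry);
    entry = next;
    ++freed;
  }

  // The head is now the only entry, so it is also the end of the list.
  head->next = NULL;
  return freed;
}
```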

That covers the high level how and why of using `malloc` and `free`. Next you'll
see a possible implementation of a dynamic memory allocator.
@@ -0,0 +1,169 @@
---
title: Designing a Dynamic Memory Allocator
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## High Level Design

To begin with, decide which functions your memory allocator will provide. So far
you have seen `malloc` and `free`, but the
[C library](https://en.cppreference.com/w/c/memory) provides more.

This learning path assumes you just need `malloc` and `free`. Start with those and
write out their behaviours, as the programmer using your allocator will see them.

There will be a function, `malloc`. It will:
* Take a size in bytes as a parameter.
* Try to allocate some memory.
* Return a pointer to that memory on success, or a NULL pointer otherwise.

There will be a function `free`. It will:
* Take a pointer to some previously allocated memory as a parameter.
* Mark that memory as available for future allocations.

From this you can see that you will need:
* Some large chunk of memory, the "backing storage".
* A way to mark parts of that memory as allocated, or available for allocation.

## Backing Storage

The memory can come from many sources. It can even change size throughout the
program's execution if you wish. For your allocator you'll keep it as simple
as possible.

A single, statically allocated global array of bytes will be your backing
storage. So you can do dynamic allocation of parts of a statically allocated
piece of memory.

```C
#define STORAGE_SIZE 4096
static char storage[STORAGE_SIZE];
```

## Record Keeping

This backing memory needs to be annotated somehow to record what has been
allocated so far. There are many, many ways to do this, with the biggest choice
being whether to store these records in the heap itself, or outside of it.

This learning path does not go into those tradeoffs. Instead you will put the
records in the heap, as this is relatively simple to do.

What should be in your records? Think about what question the software will ask
your allocator: can you give me a pointer to an area of free memory of at least
this size?

For this you will need to know:
* Which ranges of the backing storage have been allocated or not.
* How large each of these ranges is. This includes free areas.

Where a "range" is a pointer to a location, a size in bytes and a boolean to say
whether the range is free or allocated. So a range from 0x123 of 345 bytes,
that has been allocated would be:

```text
start: 0x123 size: 345 allocated: true
```
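
One way to picture such a record in C is the sketch below. The `Range` type and
`range_can_hold` are hypothetical names used only for illustration, and a real
allocator may lay out its records differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical record describing one range of the backing storage.
typedef struct {
  char *start;     // Where the range begins.
  size_t size;     // How many bytes it covers.
  bool allocated;  // true if in use, false if free.
} Range;

// Answers the allocator's core question for a single range:
// is it free, and is it at least `size` bytes?
bool range_can_hold(const Range *range, size_t size) {
  assert(range != NULL);
  return !range->allocated && range->size >= size;
}
```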

For the initial state of a heap of size `N`, you will have one range of
unallocated memory.

```text
Pointer: 0x0 Size: N Allocated: False
```

When an allocation is made you will split this free range into 2 ranges. The
first part is the new allocation, the second is the remaining free space. If
4 bytes were to be allocated:

```text
Pointer: 0x0 Size: 4 Allocated: True
Pointer: 0x4 Size: N-4 Allocated: False
```
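
The split can be sketched as follows, using a hypothetical `Range` record (a
start pointer, a size and an allocated flag) and a hypothetical `split_range`
function. For simplicity this sketch ignores the space that the records
themselves take up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical record describing one range of the backing storage.
typedef struct {
  char *start;
  size_t size;
  bool allocated;
} Range;

// Turns the first `size` bytes of the free range `*range` into an allocated
// range and describes what is left over in `*rest`.
// Returns false, changing nothing, if `*range` is already allocated or too small.
// If the fit is exact, `*rest` ends up with a size of 0.
bool split_range(Range *range, size_t size, Range *rest) {
  assert(range != NULL && rest != NULL);
  if (range->allocated || size == 0 || size > range->size) {
    return false;
  }

  // The second part is the remaining free space.
  rest->start = range->start + size;
  rest->size = range->size - size;
  rest->allocated = false;

  // The first part is the new allocation.
  range->size = size;
  range->allocated = true;
  return true;
}
```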

The next time you need to allocate, you will walk these ranges until you find
one with enough free space, and repeat the splitting process.

The walk works like this. Starting from the first range, add the size of that
range to the address of that range. This new address is the start of the next
range. Repeat until the resulting address is beyond the end of the heap.

```text
range = 0x0;
Pointer: 0x0 Size: 4 Allocated: True
range = 0x0 + 4 = 0x4;
Pointer: 0x4 Size: N-4 Allocated: False
range = 0x4 + (N-4) = N, which is 1 beyond the end of the heap, so the walk is finished.
```
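
A sketch of the walk in C is shown here. It assumes each range begins with a
small record stored in the heap itself (the approach described under Record
Storage), and that the recorded size covers the whole range including that
record. `RangeHeader` and `count_free_bytes` are hypothetical names. The record
in this sketch does not store a pointer, since the address of the record already
says where the range starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical in-heap record. `size` covers the whole range in bytes,
// including this header, so adding it to a range's address gives the
// address of the next range.
typedef struct {
  size_t size;
  bool allocated;
} RangeHeader;

// Walks every range in `heap` and returns the total size of the free ones
// (headers included).
size_t count_free_bytes(const char *heap, size_t heap_size) {
  const char *heap_end = heap + heap_size;
  size_t free_bytes = 0;

  const char *range = heap;
  while (range < heap_end) {
    // memcpy avoids assuming the header is suitably aligned.
    RangeHeader header;
    memcpy(&header, range, sizeof(header));

    // A size smaller than the header means the heap is corrupt, and a
    // size of zero would make this loop spin forever.
    assert(header.size >= sizeof(RangeHeader));

    if (!header.allocated) {
      free_bytes += header.size;
    }
    // Step to the start of the next range.
    range += header.size;
  }
  return free_bytes;
}
```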

`free` uses the pointer given to it to find the range it needs to deallocate.
Let's say the 4 byte allocation was freed:

```text
Pointer: 0x0 Size: 4 Allocated: False
Pointer: 0x4 Size: N-4 Allocated: False
```

Since `free` gets a pointer directly to the allocation you know exactly which
range to modify. The only change made is to the boolean which marks it as
allocated or not. The location and size of the range stay the same.

{{% notice Merging Free Ranges%}}
The allocator presented here will not merge free ranges like the 2 above. This
is a deliberate limitation and addressing this is discussed later.
{{% /notice %}}

## Record Storage

You'll keep these records in the heap, which means each allocation uses some
space for its record on top of the space for the allocation itself.

The simplest way to do this is to prepend each allocation with the range
information. This way you can skip from the start of one range to another with
ease.

```text
0x00: [ptr, size, allocated] <-- The range information
0x08: <...>                  <-- The pointer malloc returns
0x10: [ptr, size, allocated] <-- Information about the second range
<...and so on until the end of the heap...>
```

Pointers returned by `malloc` are offset to just beyond the range information.
When `free` receives a pointer, it can get to the range information by
subtracting the size of that information from the pointer. Using the example
above:

```text
free(my_ptr);
0x00: [ptr, size, allocated] <-- my_ptr - sizeof(range information)
0x08: <...>                  <-- my_ptr
```
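
In C, the offset and the subtraction could look like this sketch.
`RangeHeader`, `payload_from_range` and `mark_free` are hypothetical names, and
the header here may not be 8 bytes as in the diagram, since its size depends on
the platform. `memcpy` is used to read and write the header so the code does not
rely on the header being aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical range information stored at the start of every range.
typedef struct {
  size_t size;
  bool allocated;
} RangeHeader;

// The malloc side: the pointer handed to the caller is just beyond
// the range information.
char *payload_from_range(char *range) {
  return range + sizeof(RangeHeader);
}

// The free side: step back over the range information and clear the flag.
// The location and size of the range are left unchanged.
void mark_free(void *ptr) {
  if (ptr == NULL) {
    return;  // Like the standard free, do nothing for NULL.
  }

  char *range = (char *)ptr - sizeof(RangeHeader);
  RangeHeader header;
  memcpy(&header, range, sizeof(header));

  // Freeing a range twice is a bug in the caller.
  assert(header.allocated);

  header.allocated = false;
  memcpy(range, &header, sizeof(header));
}
```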

{{% notice Data Alignment%}}
When an allocator needs to produce addresses with a specific alignment, the
calculations above must be adjusted. The allocator presented here does not
concern itself with alignment, which is why it can do a simple subtraction.
{{% /notice %}}
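
For reference, a common way to round an address up to an alignment is sketched
below. It is not used by the allocator in this learning path, and `align_up` is a
hypothetical helper that assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

// Rounds `address` up to the next multiple of `alignment`.
// `alignment` must be a power of two (1, 2, 4, 8, ...).
uintptr_t align_up(uintptr_t address, uintptr_t alignment) {
  // A power of two has exactly one bit set.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (address + alignment - 1) & ~(alignment - 1);
}
```

An allocator that aligns its returned pointers this way may leave padding between
the range information and the pointer, so `free` could no longer rely on
subtracting a fixed amount unless the range information is placed directly before
the aligned address.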

## Running Out Of Space

The final thing an allocator must do is realise it has run out of space. This is
simply achieved by knowing the bounds of the backing storage.

```C
#define STORAGE_SIZE 4096
static char storage[STORAGE_SIZE];
// If our search reaches this point, there is no free space to allocate.
static const char *storage_end = storage + STORAGE_SIZE;
```

If you are walking the heap and the start of the next range would be greater
than or equal to `storage_end`, you have run out of memory to allocate.
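
Putting the pieces together, a first-fit `malloc` for this design might look like
the sketch below. It is one possible implementation under stated assumptions, not
a definitive one: `RangeHeader`, `write_header`, `heap_init` and `simple_malloc`
are hypothetical names, the recorded size includes the header, and `heap_init`
must be called once before the first allocation. Returning `NULL` when the walk
reaches `storage_end` is how the caller learns that the heap is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STORAGE_SIZE 4096
static char storage[STORAGE_SIZE];
// If our search reaches this point, there is no free space to allocate.
static const char *storage_end = storage + STORAGE_SIZE;

// Hypothetical range information. `size` is the whole range, header included.
typedef struct {
  size_t size;
  bool allocated;
} RangeHeader;

// memcpy avoids assuming the header is suitably aligned.
static void write_header(char *range, size_t size, bool allocated) {
  RangeHeader header = {.size = size, .allocated = allocated};
  memcpy(range, &header, sizeof(header));
}

// Initial state: one free range covering the whole heap.
void heap_init(void) {
  write_header(storage, STORAGE_SIZE, false);
}

void *simple_malloc(size_t size) {
  // Can never fit. Checking first also avoids overflow in the addition below.
  if (size > STORAGE_SIZE - sizeof(RangeHeader)) {
    return NULL;
  }
  size_t needed = size + sizeof(RangeHeader);

  char *range = storage;
  while (range < storage_end) {
    RangeHeader header;
    memcpy(&header, range, sizeof(header));
    assert(header.size >= sizeof(RangeHeader));  // Otherwise the heap is corrupt.

    if (!header.allocated && header.size >= needed) {
      size_t leftover = header.size - needed;
      if (leftover > sizeof(RangeHeader)) {
        // Split: the new allocation first, then the remaining free space.
        write_header(range + needed, leftover, false);
        write_header(range, needed, true);
      } else {
        // Too little left over to be useful, hand out the whole range.
        write_header(range, header.size, true);
      }
      return range + sizeof(RangeHeader);
    }
    range += header.size;
  }

  // Reached storage_end without finding a large enough free range.
  return NULL;
}
```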