Skip to content

Immix Garbage Collection

Ding YuChen edited this page May 2, 2020 · 7 revisions

Immix Garbage Collector is a type of mark-region garbage collection algorithm, that splits the heap space into blocks and lines, which make the finding of holes, and the freeing of unused objects faster than traditional mark-sweep or mark-copy algorithms

Node Layout

General Node Layout

const TAG_SLOT = 0;
const SIZE_SLOT = 1;
const FIRST_CHILD_SLOT = 2;
const LAST_CHILD_SLOT = 3;
const MARK_SLOT = 4;

Nodes in the Immix Garbage Collector differ from regular SVML nodes by including a MARK slot for all objects that live in the heap. This slot is required in order for us to resolve circular or multiple references when traversing live nodes.

Block and Line Layouts

// block nodes layout
//
// 0: tag  = BLOCK_TAG
// 1: size = total size of block
// 2: offset of first child from the tag: 6
// 3: offset of last child from the tag: (NUM_OF_LINES_PER_BLOCK - 1) * LINE_BK_SIZE (3)
// 4: mark (free: 0, occupied: 1)
// 5: block state (occupied: 0, recyclable: 1, free: 2)
// 6: line 0 start address
// 7: line 0 mark bit (marked: 1, unmarked: 0)
// 8: line 0 end of occupied address (for block occupancy profiling purpose)
// 9: line 1 address
// ...
// NUM_OF_LINES_PER_BLOCK * LINE_BK_SIZE + BLOCK_BK_SIZE: start of line 0 usable memory

Blocks contain bookkeeping slots for lines, in order to reserve a contiguous memory space in the body of the block to allow for allocation of objects that are more than the size of a line.

Line address refers to start of the line metadata with reference to the heap.

Design Considerations

  1. Line contains start and end addresses in order to provide more information and allow for flexibility of extension in the future, for example block occupancy profiling.
  2. We keep the start address even though it can be calculated from a given line address in order to reduce the use of registers to calculate this address, and also to provide convenience to the programmer.

Algorithm

The Immix Garbage Collector is essentially a Mark-Region Garbage collector with 2 levels of granularity as well as a compaction mechanism. There are also several other bells and whistles like overflow allocation and opportunistic defragmentation.

Garbage Collection

The garbage collection mechanism goes something like this:

function garbage_collect() {
    map(roots(heap), mark);
    map(blocks(heap), free);
    unmark_all_nodes(heap);
}

function mark(node) {
    node.mark_slot = MARKED;
    for (child in node.children) {
        if (is_forwarding_address(child)) {
            child = heap[forwarding_address];
        }
        if (child.mark_slot === UNMARKED) {
            const b = get_block(child);
            if (need_to_defrag) {
                const forwarding_address = copy(child);
                child = heap[forwarding_address];
            }
            get_block(child).mark_slot = MARKED;
            get_line(child).mark_slot = MARKED;
            mark(child);
        }
    }
}

function free(block) {
    if (block.mark_slot === MARKED) {
        for (line in block.lines) {
            if (line.mark_slot === MARKED) {
                free_line(line);
            } 
        }
    } else {
        free_block(block);
    }
}

Marking

All live objects that are reachable will be traversed via dfs and marked. If the defragmentation is active, the object will be copied to a free block and the garbage collector will leave behind a forwarding address. If this forwarding address is encountered again during the marking phase, the parent node of the object will have it's reference updated accordingly. At the end of the marking phase, we can be sure that there are no more loose pointers and the candidate block can then be freed and turned into the new headroom block.

Freeing

Freeing a node consists of freeing the lines that it resides in. This is done by setting the LIMIT_ADDRESS of the line to its START_ADDRESS. No values are actually overwritten in favor of speed, although this can be done during development. Freeing a block simply consists of changing the block state to FREE.

Debugging

Debugging is a tricky thing to do considering the possible propagation of unwanted behavior before the program fails and the absence of a stack trace. In order to improve the programmer productivity, this writer has decided to include helpful and general assertions that can be used throughout the program in order to ensure correctness and faster debugging.

Assertions

Assertions in Source makes use of the built-in error function. Assertion functions also take in a list of string that acts as a rudimentary stack trace that allows the user to identify where a violation of the assertion takes place. Upon encountering an assertion, the SVM will print out current register values as well as a visualization of the heap.

Note that the assertions listed here are only a part of *-with-mark-region-gc.js programs. (at the point of writing, May 2020)

Naming Convention

Assertions begin with assert_*. These functions do not return anything but check the validity of some given input. Reference functions begin with ref_*. These functions have return values depending on the specification, and are meant to be correct functions not written in machine code, so that virtual machine macros can be checked against this output.

Assertion Implementations and Documentation

assert_true

Description

Assert input boolean to be true

Parameters
  • bool boolean to check
  • msg string to display if boolean is not true
  • stack list of strings that act as stack trace

assert_false

Description

Assert input boolean to be false

Parameters
  • bool boolean to check
  • msg string to display if boolean is true
  • stack list of strings that act as stack trace

assert_same

Description

Assert 2 input values to be equal (===)

Parameters
  • v1 first value to check
  • v2 second value to check
  • stack list of strings that act as stack trace

assert_same_as_ref

Description

Assert a given input to be equal to the output of a reference-implementation function. Useful to check correctness of one's implementation, or discover bugs in the reference implementation.

Parameters
  • v1 value to check
  • ref_wrapper function such that when applied to a stack, returns the output of the reference implementation. ie, const ref_wrapper = stack => ref_mark(pair("more info on stack_trace", stack));
  • stack list of strings that act as stack trace

ref_mark

Description

Reference implementation for the mark algorithm, returns a list of all live nodes

Parameters
  • stack list of strings that act as stack trace
Return

List of marked node addresses

assert_valid_address

Description

Makes sure that input value is an address that is not undefined and not used for Block or Line metadata.

Parameters
  • address address number
  • stack list of strings that act as stack trace

assert_valid_node

Description

Makes sure that input value is a node that is not undefined and its children are all addressable and defined.

Parameters
  • node_address address of node
  • stack list of strings that act as stack trace

assert_correct_env

Description

Makes sure that input value is an environment node and all its parent are also environment nodes.

Parameters
  • env_address address of environment node
  • stack list of strings that act as stack trace

assert_os_overflow

Description

Ensures operand stack (OS) has a LAST_CHILD value that is within the bounds of the size of the OS node.

Parameters
  • OS address of the OS
  • stack list of strings that act as stack trace

ref_get_block

Description

Reference implementation for GET_BLOCK. Returns the address of the block that a given address is in.

Parameters
  • address input address
  • stack list of strings that act as stack trace
Return

Address of block

ref_get_line

Description

Reference implementation for GET_LINE. Returns the address of the line that a given address is in.

Parameters
  • address input address
  • stack list of strings that act as stack trace
Return

Address of line

copy_rts

Description

Helper function that copies current RTS values into an array

Parameters
  • top number that is used as the top of RTS
Return

Array of copied RTS values.

assert_rts

Description

That an input array is the same as current RTS, also considering the current TOP_RTS value.

Parameters
  • copy array of values, usually a copied RTS before executing some function.
  • stack list of strings that act as stack trace

assert_lines_marked

Description

Ensures that all lines a given node is residing in, are marked.

Parameters
  • node_address address of a given node
  • stack list of strings that act as stack trace

assert_marked

Description

Asserts that a node is marked

Parameters
  • mark_offset number that determines the mark slot of the node. This can be MARK_SLOT or LINE_MARK_SLOT. Other values may work but results are not guaranteed to be meaningful.
  • address address of a given node
  • stack list of strings that act as stack trace

assert_node_marked

Description

Asserts that a node and its corresponding lines and block are marked

Parameters
  • node_address address of a given node
  • stack list of strings that act as stack trace

assert_unfree

Description

Ensure that a given node and its corresponding lines and block are not freed.

Parameters
  • address address of a given node
  • stack list of strings that act as stack trace