MIR 001: SOA Object System

Millfork Improvement Roadmap 001:

SOA Object System

Classes, pools and handles

Every class contains fields. For now, the limitation is to byte-sized and word-sized fields.

class Point {
    word x
    byte y
}

(Syntax and naming conventions may be subject to change. Unless specified otherwise, all declarations in code examples are assumed to be at the global scope, and all statements that cannot happen at the global scope are assumed to be located within a body of some unspecified function.)

Objects are referred to by 8-bit handles. A handle of value 0 is considered a null handle and doesn't point to any object:

Point p      // declaring a variable holding a handle to a point
p = null     // p is physically now 0
p = Point(0) // the same
p = 0        // won't compile, a byte is not a point

All handles are implicitly convertible to a type handle, but not the other way around. null is of its own, inexpressible type that is implicitly convertible to all handle types.

handle h
h = p         // ok
h = null      // ok
p = h         // won't compile
p = Point(h)  // explicit casting makes it compile

If a variable holds the null handle, then casting it will yield a valid null handle (since it's zeroes all around).

Actual objects live in pools. All pools are fixed-size. All pools join together into one common pool (called a universe), of maximum size 255. A single pool can be considered a slice of the universe. The order of pools in a universe is undefined. A universe is implemented as a collection of arrays, called field arrays, each for every field of the object. Word-sized fields are split between high and low bytes. The layout those arrays in memory is undefined. Examples below use some concrete pool layout for illustration purposes.

pool Point triangle [3] // handles 1-3 refer to here
pool Point quad [4]     // handles 4-7 refer to here
// effectively, it's the same as:
// const byte triangle$offset = 1
// const byte quad$offset = 4
// const byte triangle.size = 3
// const byte quad.size = 4
// array Point$x$hi [7]
// array Point$x$lo [7]
// array Point$y [7]

Indexing into those arrays will use the handle minus 1, so using a null handle might read from before the array, giving undefined results.

You can refer to pool elements by the indexing operator:

p = triangle[0] // equivalent to p = Point(1)

You can create a one-object pool that will give you simpler syntax later

object Point tmp_point
p = tmp_point
// instead of:
// pool Point tmp_point [1]
// p = tmp_point[0]

You can access the members of the object via the handle using the -> operator (binding very tightly and to the left)

p->x + p->y
// compiles to:
// (Point$x$hi[byte(p)-1]:Point$x$lo[byte(p)-1]) + Point$y[byte(p)-1]

Class hierarchies

Classes can inherit from each other. This is an is-a relationship. A class can inherit from multiple other classes. A class inherits all the fields of its parents. If a class inherits a field via multiple parents, it's inherited only once. A class can be abstract. A non-abstract class is called a concrete class.

abstract class Object { }
abstract class HasPosition(Object) {
    Point position
}
abstract class HasColour {
    byte colour
}
class Background (HasColour) { }
class Sprite (HasColour, HasPosition) { }
abstract class Recolourable (HasColour) { }
class RecolourableSprite (Sprite, Recolourable) { }
// RecolourableSprite has only one field called `colour`

A graph where all the vertices are classes and all the edges are parent-child relationships is called the class graph. The class graph can be split into one or more connected components. Each of those components is called a class hierarchy. Each hierarchy has to have exactly one base class (one class with no superclasses), called a root class. In the example above, that class is Object. The name of the hierarchy is the same as the name of the root class, although in case of automatically generated hierarchy-wide functions, a name of any class in the hierarchy will work (so Object.delete, Background.delete and Sprite.delete are all aliases for the same function)

All fields in the hierarchy have to have unique names. If you want to use a field with the same name and the same type in two different unrelated classes, consider creating an abstract class with just that field and inheriting from it. If you want to use a field with the same name but a different type, it's not allowed at all.

A class is called a leaf class if it's not abstract and it has no descendants. In the example above there are such classes: RecolourableSprite and Background

Before further processing, the hierarchy is trimmed:

all unused leaf classes are removed, all runtime type checks for those classes are stubbed to always fail
all abstract classes with exactly one child are merged with it, all runtime type checks for those classes are changed to check for the child

A hierarchy is called a singular hierarchy if it contains only one class. The class in a singular hierarchy is never abstract, even if declared otherwise. The Point hierarchy described at the beginning is a singular hierarchy.

Every hierarchy has its own universe. Handles to different universes can have the same numerical value, but they will point to different objects.

Object o
Sprite s
Point p
o == p // won't compile, different hierarchies
byte(o) == byte(p) // will compile, but it's quite meaningless
o == s // will compile
o = s  // will compile, Sprite is a subclass of Object
s = o  // won't compile, Object is not a subclass of Sprite
s = Sprite(o) // will compile, but may cause problems later if o doesn't point to a sprite

If a pool contains objects of a non-leaf class, then it needs to account for fields of all possible objects:

pool Object objects [5]
// creates arrays called Object$position and Object$colour
pool Background backgrounds [5]
// extends the array Object$colour by 5 elements, but not the Object$position array

The compiler tries to order the pools in the universe so that the field arrays are as short as possible. Note that in the example above, the compiler may decide that backgrounds contains objects with handles 1–5 and objects with 6–10, meaning that x->colour will compile to Object$colour[byte(x)-6].

(An optimal pool ordering algorithm is yet to be determined.)

Type identifiers

Each object may have a type identifier. A type identifier is an 8-bit number that identifies the actual runtime class of the object. The value 0 is reserved for uninitialized objects and means that they can be recycled. Abstract classes don't have type identifiers.

From the implementation perspective, it is implemented as a field called typeid in all objects of the hierarchy. However, if the program doesn't need type identifiers, then the hierarchy will not have the typeid field.

(The type of the typeid field is going to be either byte, or a hierarchy-specific opaque 8-bit type)

There are two types of type identifiers: compact identifiers and fast identifiers.

Compact identifiers are required if:

the program is using dynamic dispatch for this hierarchy
a class in this hierarchy has its type identifier overridden by the programmer:
class Thing @1 { }

Fast identifiers are required if:

the program is using runtime type checks for this hierarchy

Any identifiers are required if:

the program is using the typeid field explicitly for this hierarchy
the program is using object allocation mechanism for this hierarchy(described later)

Note that a non-singular hierarchy doesn't necessarily need identifiers.

If the hierarchy needs any kind of identifiers, but doesn't need particularly either compact or fast identifiers, then compact identifiers are chosen. If the hierarchy needs both compact and fast identifiers, then the typeid field contains compact identifiers and a translation table is created that translates them to fast identifiers during type checks.

A hierarchy that uses identifiers can contain maximum of 255 concrete classes. Fast identifiers are more limiting, so they may fail to compile at even lower class counts. This is due to the fact that compact identifiers are consecutive (unless overridden) byte values, starting from 1, while fast identifiers are bitmasks (see: Near Optimal Hierarchical Encoding of Types by A. Krall, J. Vitek & R.N. Horspool, 1997)

A runtime typecheck if x points to an object of class T (abstract or concrete) in hierarchy H may be implemented as follows:

if typeid is a fast identifier:
H$typeid[byte(x)-1] & T$fasttypeid == T$fasttypeid
if typeid is a compact identifier:
H$typeid$xlat[H$typeid[byte(x)-1]] & T$fasttypeid == T$fasttypeid

The syntax for such check is yet to be determined. For now, let's assume the syntax is <type>.is(<handle>), where <handle> is a handle within the <type>'s hierarchy.

A runtime typecheck for whether a handle is null or points to an object of a given class is also available. If possible, it should be achieved using the same code with an extra check. An example on how it might be implemented on 6502 for a class T in hierarchy H:

T.isNullOr: // asm clear_zero T.isNullOr(byte a)
    TAY
    BEQ __exit
T.is:       // asm clear_zero T.is(byte y)
    LDA H$typeid, Y
    // if the typeid is compact instead of fast:
    // TAY
    // LDA H$typeid$xlat, Y
    AND #T$fasttypeid
    EOR #T$fasttypeid
__exit:
    RTS

Manual object allocation

To mark all objects in a pool as uninitialized, you can call a function called <poolname>.clear:

pool Object objects [7]
objects.clear()
// this calls Object$clear(objects$offset, objects.size)

To mark all objects in a hierarchy as uninitalized, you can call a function called <hierarchyname>.clear:

Object.clear()
// this calls Object$clear(1, Object$count)

To allocate a new object in a pool, call <poolname>.new(<typename>). The given type must be a subtype of the pool element type. One uninitialized object in the pool will be given the correct type identifier and its handle will be returned. If the pool is full, then null is returned.

objects.new(Sprite)
// this calls Sprite(Object$new(objects$offset, objects.size, Sprite.typeid))

To delete a new object in a pool, call <hierarchyname>.delete(<handle>). The handle cannot be null.

Sprite s
s = objects.new(Sprite)
if s != null {
    // do things with s
    Object.delete(s)
}

To check if a handle refers to a valid object (as opposed to an uninitialised one), check the value of the typeid field.

TODO: decide whether ...$clear and ...$new should take start & size or start & end

Virtual method dispatch

Classes can have methods:

abstract class Recolourable(HasColour) {
    void change_colour(self, byte new_colour) {
        self->colour = new_colour
    }
}

This defines a function void Recolourable.change_colour(Recolourable Recolourable.change_colour$self, byte Recolourable.change_colour$p1) which can be called statically and directly and will by treated as a normal function. Unlike a normal function however, methods have a different calling convention:

all parameters except for the self parameter are guaranteed to be passed via global variables
self may be passed in a different way than a parameter to a normal function even if there are no other parameters to the method

The self parameter is obligatory.

Within an abstract class, a method may be declared abstract:

void method(self) abstract

A subclass is allowed to override a method. Within a hierarchy, all methods with a given name have to have the same return type and use the same parameter types (except for self, which is always of type of the class the method was declared in).

If you call the method via h->change_colour(c) where h is of static type H, then the compiler will rewrite it to an appropriate call:

if H contains a non-abstract definition of change_colour and none of its descendants does, the call will be rewritten as H.change_colour(h, c)
if neither the static type of h nor its ancestors contains a definition (abstract or not) of change_colour, a compile-time error occurs ("Method change_colour not defined in H" or similar)
if neither H doesn't contain a non-abstract definition of change_colour nor the narrowest common subclass of ancestors of H which contain a non-abstract definition of change_colour do, then a compile-time error occurs ("Ambiguous method change_colour" or similar). This may happen even without calling the method, at the time of class hierarchy analysis.
otherwise, a call to HH.change_colour$dynamic(h, c) is made, where HH is the hierarchy of H

The ...$dynamic function does a return dispatch on the compact type identifier of the object pointed to by the handle. It has an identical calling convention as methods and it can therefore ignore the parameters, including the handle itself, and just pass them unchanged to the implementations.

The actual dynamic choice of the implementation of a called method is similar to the static choice, just precomputed into a jumptable.

Object copying

To check, if a pool can contain a copy of given object, call <poolname>.can_contain(<handle>). The handle has to be in the same hierarchy.

objects.can_contain(h)
// calls Sprite.isOrNull(h)

Calling <poolname>.can_contain(<typename>) also works and yields a compile-time constant.

Copy is implemented as automatically generated method:

h->copy_into(objects[0])
// alternative potential syntax:
objects[0] := h

This method should use a different calling convention than other methods to ensure that it is reentrant. For example on 6502, it can use the X and Y registers for each handle.

If the source handle points to an uninitialized object, then unlike all the other methods, copy_into doesn't cause undefined behaviour, but simply does nothing. In contrast, copying involving a null handle causes undefined behaviour.

If not compiling for speed, the compiler may generate only one copy method implementation per field combination used by pools. If the pool of the source handle can be determined statically, then such method is used:

h := objects[2]
// compiles to
H$copy0(objects$offset + 2, h)

If not, then another method is called that checks statically for the field combination of a given pool and uses a jumptable to pick the implementation.

If compiling for speed, the compiler may generate normal virtual methods for every class with a new field, just with a different calling convention.

Garbage collection

If the program is using garbage collection for a given hierarchy, then all the objects in that hierarchy will be given another field $marked_for_gc (for space saving reasons, this field may be a bitmask if compiling without -Of) and a method void $mark_for_gc(self), with custom implementation for every class that introduces a new field with a handle to an object in the same hierarchy. But if no class does that, then $mark_for_gc is not generated.

The garbage collector is a simple mark-and-sweep.

Let's assume your hierarchy is called H. First, you need to unmark all roots:

H.unmark_all()

Then you need to mark the roots, i.e. the objects that should not be collected:

H.mark(x)

This sets the $marked_for_gc for the given object and if it wasn't marked before, calls x->$mark_for_gc() on it. The $mark_for_gc method calles H.mark on all non-null fields of x that refer to objects within this hierarchy. If the handle points to an uninitialized object, then unlike all the other methods, $mark_for_gc doesn't cause undefined behaviour, but simply does nothing.

You can mark all roots within a pool:

pool H important_things [10]
important_things.mark_all()

After you mark all your roots, you can just call H.sweep() to do the actual collection. All unmarked objects have their typeid changed to 0 and become therefore uninitialized. The marked status of objects after this process is undefined. Calling H.sweep without unmarking and marking roots again leads to undefined behaviour.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly