
SafePtr

Adam ? edited this page May 25, 2021 · 1 revision

This page covers garbage collection (GC) in Il2Cpp, as well as SafePtr and its usage in beatsaber-hook.

Garbage Collection (GC) in Il2Cpp

Il2Cpp is typically built with the Boehm garbage collector. You can read more about it here:

This garbage collector is well known and works well. There are a few caveats to be aware of, however.

Specifically, because of how it operates, it scans for references before cleaning up allocated instances. It checks for references by treating each sizeof(void*)-sized region in other GC-allocated objects as a potential pointer; if that value points to a location within the GC heap, the object at that location is not destroyed.

If that was difficult to understand, consider the following examples:

  • We create a simple object within GC that has one pointer-typed field.
  • We create another simple object within GC that has only int fields (specifically, no pointer fields).
  • We set the first object's pointer field to the address of the second.
  • We perform a GC pass. Under normal operation, this would run on another thread, which is what Il2Cpp does.

In this scenario, it is important to note that the first object is also GC-able. Because nothing references the first object, it will be deleted, and so will the second, since nothing will point to it any longer.

If we instead make the first object a pinned object (sometimes also called a specific object), it will still live in the GC heap, but it will be uncollectable.

In such a scenario, the first object will not be destroyed (since it is a pinned/specific object) and because a reference exists to the second object within the mapped GC memory, the second object will not be collected either.

Now, if we change the first object's pointer field to no longer point to the second object, a further GC pass will collect the second object in this simple example, since nothing references it anymore.
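The scenarios above can be replayed in a small, self-contained C++ model. This is a minimal sketch under simplifying assumptions, not Boehm or Il2Cpp: integer ids stand in for addresses (0 plays the role of nullptr), pinned objects are the only roots, and `ToyHeap`, `alloc`, `set_field`, `collect`, and `run_scenarios` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <vector>

// Toy model only (not Boehm): integer ids stand in for addresses, 0 plays nullptr.
struct Obj {
    bool pinned;              // pinned/specific: never collected implicitly, acts as a root
    std::vector<int> fields;  // pointer-sized fields; an int-only object has none
};

class ToyHeap {
public:
    int alloc(bool pinned, std::size_t pointer_fields) {
        int id = next_id_++;
        objects_.emplace(id, Obj{pinned, std::vector<int>(pointer_fields, 0)});
        return id;
    }
    void set_field(int obj, std::size_t index, int target) {
        objects_.at(obj).fields.at(index) = target;
    }
    bool alive(int id) const { return objects_.count(id) != 0; }

    // One pass: mark everything reachable from pinned objects by following fields
    // whose value points into the heap, then free every object left unmarked.
    void collect() {
        std::set<int> marked;
        std::vector<int> work;
        for (const auto& entry : objects_)
            if (entry.second.pinned) work.push_back(entry.first);
        while (!work.empty()) {
            int id = work.back();
            work.pop_back();
            if (!marked.insert(id).second) continue;  // already visited
            for (int target : objects_.at(id).fields)
                if (alive(target)) work.push_back(target);
        }
        for (auto it = objects_.begin(); it != objects_.end();)
            it = marked.count(it->first) ? std::next(it) : objects_.erase(it);
    }

private:
    std::map<int, Obj> objects_;
    int next_id_ = 1;
};

// The scenarios from the text, replayed against the toy heap.
void run_scenarios() {
    // Both objects collectable: nothing refers to the first, so both are freed.
    ToyHeap heap;
    int first = heap.alloc(false, 1);
    int second = heap.alloc(false, 0);
    heap.set_field(first, 0, second);
    heap.collect();
    assert(!heap.alive(first) && !heap.alive(second));

    // First object pinned: it survives, and its field keeps the second alive.
    ToyHeap heap2;
    int pinned = heap2.alloc(true, 1);
    int target = heap2.alloc(false, 0);
    heap2.set_field(pinned, 0, target);
    heap2.collect();
    assert(heap2.alive(pinned) && heap2.alive(target));

    // Pinned object's field cleared: the next pass collects the second object.
    heap2.set_field(pinned, 0, 0);
    heap2.collect();
    assert(heap2.alive(pinned) && !heap2.alive(target));
}
```

The model marks outward from pinned objects and frees everything it did not reach, which is why the pinned first object keeps the second alive only for as long as its field points there.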

However, if we change the first object's pointer field to no longer point to the second object and also change the second object to hold a pointer to itself, then the second object will not be collected. Note that this introduces a workaround for garbage collection: if an allocated instance holds a pointer to itself, that pointer counts as a reference when GC scans for references. It also provides an alternative to pinned/specific object allocation, albeit at the cost of some memory and some confusion. In addition, should the pointer to self ever be changed, it will no longer count as a reference, and the instance could be garbage collected.

I imagine a fair bit of this is hard to discern from my wording alone, so here are some pictures that should help clarify some of these edge cases a bit better.

Something that must be stated, although it may seem obvious, is that reference checking only looks inside objects that exist within the GC heap. This means that if you allocate an object some other way (say, with operator new) and store in it a pointer to an instance in the GC heap, GC does not know about your allocation and will not see the reference. This can lead to the pointed-to instance being deallocated if no references to it exist within the GC heap.

This is especially important because it means that holding pointers to GC-able objects is fundamentally unsafe unless the locations where those pointers are stored are made known to GC, so that it can scan them for references.

Consider the following pseudo-C code:

```c
static void* ptr; // A pointer to an instance that is within GC

void on_hook(void* instance) { // hypothetical hook body
    ptr = instance; // Assign ptr at some point to an existing and known-to-be-correct value.
}

void later(void) { // hypothetical later use
    // Attempting to use ptr here could be undefined behavior, since the instance
    // being pointed to may no longer exist.
}
```

Unfortunately, this pattern is rather common, and it is particularly useful when modding Il2Cpp games: holding a reference (obtained from a hook, for example) allows us to call methods on the instance or read its fields at a later time. However, without a guarantee that the pointed-to instance still exists, this can cause significant undefined behavior. For example, you might get a C# exception thrown when calling one of the instance's methods through runtime_invoke, or undefined behavior from reading a field and using it in later statements on the assumption that it is valid.

So there is a clear motive for keeping these instances around. What can we do to keep the instances we hold pointers to from being garbage collected?

Well, as highlighted earlier, we have a couple of options. If we control the field layout of the instance in question, we can add a pointer field to the instance and have it point to itself. This will forbid implicit garbage collection passes from collecting and thus deallocating the instance, but this is largely impractical for several reasons:

  • As modders, we very rarely want to hook every place where an instance of a type may be created just to enlarge the allocation and force a pointer field to point to itself. Doing so also adds considerable complexity, not to mention memory and performance overhead.
  • Even if we DO control the field layout of the type (for whatever reason), every instance of it pays for the additional GC-allocated field, even though only the particular instance we wish to hold needs it.

So we usually don't want to add pointers to the type in question (doing so may prove far more challenging than managing GC in another way). We still have other options. For example, we know that any instance we allocate within the GC heap will be scanned for references. We can take advantage of this by storing the pointer to the instance we wish to keep inside such an allocation.

There are, once again, a few options available here.

If we are allocating custom C# types anyway (à la custom-types or something similar), we can add a field to the custom type that points to the instance we wish to hold. Since these C# types are allocated through GC proper, they live on the GC heap and their fields are scanned.

This approach has several benefits:

  • We hold the instance for as long as our custom type's instance exists. Once the custom type's lifetime ends, the reference to the instance is dropped, which ensures we don't hold the instance for longer than we need to.
  • The overhead (of both performance and memory) is extremely small here. If we are already allocating a custom type to use for other reasons, this is hardly an issue.
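The lifetime behavior described in the first benefit can be illustrated with a toy mark-and-sweep model. This is a simplified sketch, not the real collector or the custom-types library: `ToyHeap`, `run_custom_type_demo`, and the pinned `game_root` object (standing in for whatever keeps our custom type reachable) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <vector>

// Toy model only (not Boehm): integer ids stand in for addresses, 0 plays nullptr.
struct Obj {
    bool pinned;              // pinned objects act as roots
    std::vector<int> fields;  // pointer-sized fields
};

class ToyHeap {
public:
    int alloc(bool pinned, std::size_t pointer_fields) {
        int id = next_id_++;
        objects_.emplace(id, Obj{pinned, std::vector<int>(pointer_fields, 0)});
        return id;
    }
    void set_field(int obj, std::size_t index, int target) {
        objects_.at(obj).fields.at(index) = target;
    }
    bool alive(int id) const { return objects_.count(id) != 0; }

    // Mark from pinned objects through heap-pointing fields, then sweep the rest.
    void collect() {
        std::set<int> marked;
        std::vector<int> work;
        for (const auto& entry : objects_)
            if (entry.second.pinned) work.push_back(entry.first);
        while (!work.empty()) {
            int id = work.back();
            work.pop_back();
            if (!marked.insert(id).second) continue;
            for (int target : objects_.at(id).fields)
                if (alive(target)) work.push_back(target);
        }
        for (auto it = objects_.begin(); it != objects_.end();)
            it = marked.count(it->first) ? std::next(it) : objects_.erase(it);
    }

private:
    std::map<int, Obj> objects_;
    int next_id_ = 1;
};

void run_custom_type_demo() {
    ToyHeap heap;
    int game_root = heap.alloc(true, 1);  // whatever keeps our custom type reachable
    int custom = heap.alloc(false, 1);    // our custom C# type instance, one pointer field
    int held = heap.alloc(false, 0);      // the instance we wish to hold
    heap.set_field(game_root, 0, custom);
    heap.set_field(custom, 0, held);

    heap.collect();
    assert(heap.alive(custom) && heap.alive(held));  // held lives as long as custom does

    heap.set_field(game_root, 0, 0);  // the custom type's lifetime ends
    heap.collect();
    assert(!heap.alive(custom) && !heap.alive(held));  // the reference is dropped with it
}
```

No extra allocation is needed to hold the instance: the field rides along inside an object we were allocating anyway.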

However, if we are not already using a custom type for this task, this is a lot of unnecessary work.

Enter SafePtr<T>. This is a thin proxy type that operates similarly to a pointer type, yet under the hood it ensures the address pointed to always has a reference within the GC-heap. It does this by creating a pinned/specific GC instance with only one field, which is a pointer to the held instance. Because pinned instances do not get cleaned up by GC implicitly, this is ideal for ensuring a reference exists to the instance you wish to use.
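The idea behind this can be sketched with the same kind of simplified mark-and-sweep model rather than the real collector. `ToyHeap`, `PinnedRef`, `free_obj`, and `run_pinned_holder_demo` are hypothetical names; `free_obj` only loosely mirrors an explicit GC_Free, and a plain `int` variable stands in for a pointer stored outside the GC heap (as in the `operator new` case discussed earlier).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <vector>

// Toy model only (not Boehm): integer ids stand in for addresses, 0 plays nullptr.
class ToyHeap {
public:
    int alloc(bool pinned, std::size_t pointer_fields) {
        int id = next_id_++;
        objects_.emplace(id, Obj{pinned, std::vector<int>(pointer_fields, 0)});
        return id;
    }
    void set_field(int obj, std::size_t index, int target) {
        objects_.at(obj).fields.at(index) = target;
    }
    bool alive(int id) const { return objects_.count(id) != 0; }
    void free_obj(int id) { objects_.erase(id); }  // explicit free, pinned or not

    // Mark from pinned objects through heap-pointing fields, then sweep the rest.
    void collect() {
        std::set<int> marked;
        std::vector<int> work;
        for (const auto& entry : objects_)
            if (entry.second.pinned) work.push_back(entry.first);
        while (!work.empty()) {
            int id = work.back();
            work.pop_back();
            if (!marked.insert(id).second) continue;
            for (int target : objects_.at(id).fields)
                if (alive(target)) work.push_back(target);
        }
        for (auto it = objects_.begin(); it != objects_.end();)
            it = marked.count(it->first) ? std::next(it) : objects_.erase(it);
    }

private:
    struct Obj { bool pinned; std::vector<int> fields; };
    std::map<int, Obj> objects_;
    int next_id_ = 1;
};

// Hypothetical one-field pinned holder, in the spirit of what SafePtr allocates.
class PinnedRef {
public:
    PinnedRef(ToyHeap& heap, int target) : heap_(heap), holder_(heap.alloc(true, 1)) {
        heap_.set_field(holder_, 0, target);
    }
    ~PinnedRef() { heap_.free_obj(holder_); }  // explicitly free the holder
    PinnedRef(const PinnedRef&) = delete;
    PinnedRef& operator=(const PinnedRef&) = delete;

private:
    ToyHeap& heap_;
    int holder_;
};

void run_pinned_holder_demo() {
    ToyHeap heap;

    // A pointer kept outside the heap (here a plain local) is never scanned.
    int raw_ref = heap.alloc(false, 0);
    heap.collect();
    assert(!heap.alive(raw_ref));  // collected even though raw_ref still names it

    // The same kind of instance, now referenced from a pinned one-field holder.
    int instance = heap.alloc(false, 0);
    {
        PinnedRef ref(heap, instance);
        heap.collect();
        assert(heap.alive(instance));  // the holder's field keeps it alive
    }
    heap.collect();  // the holder was freed when ref went out of scope
    assert(!heap.alive(instance));
}
```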

SafePtr also keeps an internal reference count that it shares with all other SafePtr instances pointing to the same address. This allows copy semantics without leaving the reference dangling in either copy, and without creating or deleting the underlying pinned/specific instance more often than necessary.

When a SafePtr is destroyed, it decrements the shared reference count for its held address and, if the count has reached 0, explicitly frees the pinned/specific object it owns within the GC heap. As such, it is recommended to use SafePtr to ensure the lifetime of a pointer, and to pass a single SafePtr around via references or moves rather than copies, to avoid unnecessary reference-count updates.
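A minimal sketch of the shared-count idea follows. It is NOT beatsaber-hook's SafePtr implementation: `SafePtrSketch`, `Holder`, `fake_alloc_pinned`, `fake_gc_free`, and the global counters are hypothetical, and plain `new`/`delete` stand in for the pinned GC allocation and GC_Free so the allocation traffic can be counted.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the pinned GC allocation and GC_Free used by SafePtr:
// plain new/delete plus counters, so the allocation traffic can be observed.
struct Holder { void* target; };  // the single field the pinned instance carries
static int g_allocs = 0;
static int g_frees = 0;
static Holder* fake_alloc_pinned(void* target) { ++g_allocs; return new Holder{target}; }
static void fake_gc_free(Holder* h) { ++g_frees; delete h; }

// Sketch of the shared-count idea, NOT beatsaber-hook's SafePtr implementation.
template <typename T>
class SafePtrSketch {
public:
    SafePtrSketch() = default;                    // holds nothing, allocates nothing
    explicit SafePtrSketch(T* p) { acquire(p); }  // even nullptr gets a holder
    SafePtrSketch(const SafePtrSketch& other) {
        if (other.engaged_) acquire(other.ptr_);  // copy: shared count + 1
    }
    SafePtrSketch(SafePtrSketch&& other) noexcept
        : ptr_(other.ptr_), engaged_(other.engaged_) {
        other.engaged_ = false;  // move: reference changes hands, count unchanged
    }
    SafePtrSketch& operator=(SafePtrSketch other) noexcept {  // copy-and-swap
        std::swap(ptr_, other.ptr_);
        std::swap(engaged_, other.engaged_);
        return *this;  // the previous value is released when `other` is destroyed
    }
    ~SafePtrSketch() { release(); }

    T* get() const { return ptr_; }
    bool has_value() const { return engaged_; }
    static int count_for(T* p) {
        auto it = table().find(p);
        return it == table().end() ? 0 : it->second.count;
    }

private:
    struct Entry { Holder* holder; int count; };
    static std::unordered_map<void*, Entry>& table() {
        static std::unordered_map<void*, Entry> t;  // shared by every sketch of this T
        return t;
    }
    void acquire(T* p) {
        Entry& e = table()[p];  // value-initialized to {nullptr, 0} on first use
        if (e.count++ == 0) e.holder = fake_alloc_pinned(p);  // first reference only
        ptr_ = p;
        engaged_ = true;
    }
    void release() {
        if (!engaged_) return;
        engaged_ = false;
        auto it = table().find(ptr_);
        if (--it->second.count == 0) {  // last reference: free the pinned holder
            fake_gc_free(it->second.holder);
            table().erase(it);
        }
    }

    T* ptr_ = nullptr;
    bool engaged_ = false;
};

void run_refcount_demo() {
    int value = 42;  // stands in for a GC-heap instance
    int allocs_before = g_allocs;
    int frees_before = g_frees;
    {
        SafePtrSketch<int> a(&value);         // first reference: one holder allocated
        SafePtrSketch<int> b = a;             // copy shares that holder
        SafePtrSketch<int> c = std::move(b);  // move hands b's reference to c
        assert(g_allocs == allocs_before + 1);
        assert(SafePtrSketch<int>::count_for(&value) == 2);
        assert(!b.has_value() && c.get() == &value);
    }
    assert(g_frees == frees_before + 1);
    assert(SafePtrSketch<int>::count_for(&value) == 0);
}
```

Copies bump the shared count without allocating, moves transfer the reference without touching the count, and only the last destruction frees the holder. The sketch also separates a default-constructed value (no holder at all) from one constructed with `nullptr` (which does allocate a holder).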

Note that SafePtr is default-constructible. However, a default-constructed SafePtr holds no instance whatsoever, which is different from holding a nullptr: holding a nullptr allocates a pinned/specific instance on the GC heap to hold the nullptr.

The rationale is that SafePtr's emplace and assignment operations require il2cpp_init to have already been called, and valid GC_Free and GC_alloc_specific functions to have been found (see il2cpp-functions.hpp/il2cpp-functions.cpp). Default construction requires neither, so a default-constructed SafePtr can be used as a static field without issue.
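Why default construction is safe for statics can be sketched as follows. The function pointers `g_alloc_fn`/`g_free_fn`, `resolve_gc_functions`, and `LazyRef` are hypothetical stand-ins (backed by malloc/free) for the GC functions that beatsaber-hook resolves after il2cpp_init; the point is only that a `constexpr` default constructor calls nothing, so a static instance can exist before those functions are available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical runtime-resolved function pointers, standing in for the GC functions
// looked up after il2cpp_init. They stay null until resolve_gc_functions() runs.
static void* (*g_alloc_fn)(std::size_t) = nullptr;
static void (*g_free_fn)(void*) = nullptr;

static void* demo_alloc(std::size_t n) { return std::malloc(n); }
static void demo_free(void* p) { std::free(p); }
void resolve_gc_functions() {
    g_alloc_fn = demo_alloc;
    g_free_fn = demo_free;
}

struct Holder { void* target; };

// The constexpr default constructor calls nothing, so a static instance is safe to
// create before resolution; emplace() is the first point that needs the allocator.
class LazyRef {
public:
    constexpr LazyRef() = default;
    LazyRef(const LazyRef&) = delete;
    LazyRef& operator=(const LazyRef&) = delete;
    ~LazyRef() { reset(); }

    void emplace(void* p) {
        assert(g_alloc_fn != nullptr && "GC functions must be resolved first");
        reset();
        holder_ = new (g_alloc_fn(sizeof(Holder))) Holder{p};  // even nullptr gets a holder
    }
    void reset() {
        if (holder_ != nullptr) {
            g_free_fn(holder_);
            holder_ = nullptr;
        }
    }
    bool has_value() const { return holder_ != nullptr; }

private:
    Holder* holder_ = nullptr;
};

static LazyRef g_ref;  // safe: constructing it calls no GC function

void run_lazy_demo() {
    assert(!g_ref.has_value());  // usable before anything has been resolved
    resolve_gc_functions();      // in a mod, this would happen after il2cpp_init
    g_ref.emplace(nullptr);      // holding nullptr still allocates a holder
    assert(g_ref.has_value());
    g_ref.reset();
    assert(!g_ref.has_value());
}
```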

SafePtr is not without its downsides, however. Because each unique address held allocates a new GC-heap instance, there is some performance cost, as well as a small memory cost. In general, SafePtr should be used as a sort of "last resort": when you aren't already creating custom types and you only want to ensure the lifetime of a few instances, perhaps with a different lifetime for each.

To summarize, SafePtr is a specific beast. It certainly has its uses, as it allows the pseudo-C from the beginning of this page to be written as follows:

```cpp
SafePtr<Il2CppObject> ptr; // A safe pointer to an Il2CppObject that will not be implicitly collected.

void on_hook(Il2CppObject* x) { // hypothetical hook body
    ptr = x; // Assigning an existing instance is as easy as ptr = x; or ptr.emplace(x);
}

void later() { // hypothetical later use
    // Further uses of ptr succeed, as the held instance is no longer implicitly garbage collected.
}
```

which is certainly very powerful. It also lets you control the "extraneous" lifetimes of held instances, since there will be a reference to the instance until all SafePtr instances pointing to that instance are destroyed.

The "extraneous" lifetime IS important to note, however. Because SafePtr simply adds a reference to the instance, it does NOT stop explicit GC_Free calls on the instance from destroying it, nor does it ensure the instance will be garbage collected immediately after the SafePtr holding it is destroyed. The game may also hold other references to the instance within the GC heap, in which case a SafePtr may not be necessary and will only slow down your code.

Also note that three SafePtrs holding three different instances (for example) cost more memory and performance than a single custom type that simply has all three instances as fields. This is because each unique SafePtr performs its own allocation and deallocation, whereas the custom type performs only one of each.

So, are SafePtrs useful? Yes.

Are they the answer to all things GC-related? Absolutely not.

Hopefully this has helped you understand more about SafePtr and GC in Il2Cpp in general.

Feel free to reach out to me on Discord at Sc2ad#8836 if you have questions or feedback.
