
IL2CPP Basics

Sam Byass edited this page Oct 27, 2021

What is IL2CPP?

If you're here, you probably know most of this already. But there are some bits you may not know, so it may still be worth a read.

IL2CPP takes compiled .NET ("managed") assemblies (DLLs) and converts the code they contain into C++ source code, which a native C++ compiler then builds into the game's binary.

Most methods are simply converted to C++ and written to the source file, but generic methods are split into variants[1] depending on:

  • Which generic arguments are ever actually used (for example, if List<int> is never used in the game, it won't be generated).
  • The size of the generic argument (e.g. bool has size 1, int has size 4, and long or any reference type, which is stored as a pointer on a 64-bit build, has size 8).
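To make the rule above concrete, here is a toy Python model of it, not IL2CPP's actual code: it takes the generic arguments a hypothetical game uses with one generic method and groups them into variants by argument size. The size table `ARG_SIZES` and the function `group_variants` are illustrative assumptions, and reference types are counted as 8 bytes on the assumption of a 64-bit build.

```python
from collections import defaultdict

# Hypothetical size table (bytes) based on the examples above. Reference
# types such as object and string are pointers, assumed 8 bytes (64-bit).
ARG_SIZES = {"bool": 1, "int": 4, "float": 4, "long": 8, "object": 8, "string": 8}


def group_variants(used_instantiations):
    """Group the generic arguments actually used with one generic method
    into variants keyed by argument size (toy model, not IL2CPP itself).

    Arguments that are never used never appear, so no variant is
    generated for them."""
    variants = defaultdict(list)
    for arg in used_instantiations:
        variants[ARG_SIZES[arg]].append(arg)
    return dict(variants)


if __name__ == "__main__":
    # A hypothetical game uses List<int>, List<bool>, List<long> and
    # List<string>, but never List<float>.
    print(group_variants(["int", "bool", "long", "string"]))
    # -> {4: ['int'], 1: ['bool'], 8: ['long', 'string']}
```

Note that `float` never shows up in the result: an instantiation that is never used simply gets no variant, which is what the first bullet describes.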

In addition, certain small methods may be inlined (their bodies copied into every method that calls them), and others may be stripped (removed from the codebase) if they're not used. In regular .NET, this kind of work usually happens at runtime during JIT (Just-In-Time) compilation, but because IL2CPP is AOT (Ahead-Of-Time), it happens at build time instead.
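Stripping can be pictured as a reachability problem. The sketch below is a simplified model under our own assumptions, not Unity's actual stripping step (which applies many more rules): starting from a set of entry points, it follows calls through a hypothetical call graph and keeps only the methods it reaches. `strip_unreachable` and all method names are made up for illustration.

```python
from collections import deque


def strip_unreachable(call_graph, roots):
    """Return the set of methods that survive stripping in this toy model:
    everything reachable from `roots` by following calls.

    call_graph maps a method name to the names of the methods it calls."""
    kept = set()
    queue = deque(roots)
    while queue:
        method = queue.popleft()
        if method in kept:
            continue  # already visited
        kept.add(method)
        queue.extend(call_graph.get(method, ()))
    return kept


# Hypothetical call graph: nothing ever calls Helper.Unused.
graph = {
    "Game.Start": ["Player.Spawn"],
    "Player.Spawn": ["Helper.Clamp"],
    "Helper.Unused": ["Helper.Clamp"],
}

if __name__ == "__main__":
    print(sorted(strip_unreachable(graph, ["Game.Start"])))
    # -> ['Game.Start', 'Helper.Clamp', 'Player.Spawn']
```

In this model `Helper.Unused` disappears even though it calls a method that is kept, which is exactly the situation where a method you expect to find at runtime simply isn't there.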

This is why, when using a tool like Il2CppAssemblyUnhollower at runtime, some generic variants or methods may simply not exist, and trying to call them will cause a crash or an exception.

But the part we're interested in here is that IL2CPP itself is deterministic: the same input code results in the same generated C++ code every time, regardless of what else is going on in the game. How the C++ compiler chooses to compile that code can vary, but overall the output from IL2CPP is predictable to some degree, which is one of the two reasons that any of this is possible. The other reason is:

The Metadata

.NET assemblies have metadata associated with them, which records which Types, Methods, Fields, Properties, Events, etc. are present in the assembly. Reflection in .NET uses this metadata to provide information at runtime, for example on which fields exist on a type. This is very useful for serialization libraries (like TOML or JSON serializers), which rely on it to create an object from its serialized form, and vice versa.

Of course, that means that for these serialization libraries to work under IL2CPP (and a LOT of developers like to use them!), IL2CPP must also store this information somewhere. That is where the global-metadata.dat file comes in. Unlike .NET metadata, it's separate from the binary, but it contains almost all of the same information.
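As a minimal sketch of checking whether a file is IL2CPP metadata at all, the function below reads the header of global-metadata.dat. It assumes the file begins with the little-endian sanity value `0xFAB11BAF` followed by a signed 32-bit metadata version number; that is our understanding of the format, not something stated on this page, and `read_metadata_header` is a hypothetical helper name.

```python
import struct

# Assumed sanity value at the very start of global-metadata.dat.
METADATA_MAGIC = 0xFAB11BAF


def read_metadata_header(data: bytes) -> int:
    """Check the assumed magic number and return the metadata version.

    Assumes a little-endian uint32 magic followed by an int32 version."""
    if len(data) < 8:
        raise ValueError("file too short to be global-metadata.dat")
    magic, version = struct.unpack_from("<Ii", data, 0)
    if magic != METADATA_MAGIC:
        raise ValueError(f"bad magic 0x{magic:08X}; not IL2CPP metadata as expected")
    return version
```

Typical use would be to read the whole file in binary mode and pass the bytes in; the returned version then tells you which struct layouts to expect in the rest of the file.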

The CodeRegistration and MetadataRegistration

Most of the really interesting data is stored in the metadata file, but there is some more in the binary itself. Information on generic methods, method pointers, Runtime Generic Context objects (RGCTXs), debugger information, etc. is stored in the Il2CppCodeRegistration struct.

And information on generic classes, types (not to be confused with type definitions), generic method specifications, and metadata usages (more on those later), among other data, is stored in the Il2CppMetadataRegistration struct.
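The Il2CppMetadataRegistration struct is essentially a list of count/pointer pairs. The sketch below reads one from a 64-bit, little-endian binary, assuming the field order used around metadata version 24. That order is an assumption: the layout changes between Unity versions, so check it against the version you are targeting. `FIELDS` and `parse_metadata_registration` are illustrative names, and the pointers it returns are virtual addresses that still need mapping to file offsets.

```python
import struct

# Assumed field order of Il2CppMetadataRegistration (roughly metadata
# version 24); other Unity versions add, remove or reorder fields.
FIELDS = [
    "genericClassesCount", "genericClasses",
    "genericInstsCount", "genericInsts",
    "genericMethodTableCount", "genericMethodTable",
    "typesCount", "types",
    "methodSpecsCount", "methodSpecs",
    "fieldOffsetsCount", "fieldOffsets",
    "typeDefinitionsSizesCount", "typeDefinitionsSizes",
    "metadataUsagesCount", "metadataUsages",
]


def parse_metadata_registration(data: bytes, offset: int) -> dict:
    """Read the struct at file `offset` of a 64-bit little-endian binary.

    Each 32-bit count is followed by an 8-byte-aligned pointer, so every
    field is assumed to occupy one 8-byte slot."""
    values = struct.unpack_from(f"<{len(FIELDS)}Q", data, offset)
    result = {}
    for name, value in zip(FIELDS, values):
        # Counts are 32-bit: ignore whatever sits in the padding bytes.
        result[name] = value & 0xFFFFFFFF if name.endswith("Count") else value
    return result
```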

The tricky part is finding this data. It isn't exported by the binary, so there's no convenient symbol pointing at it, and the metadata file doesn't reference it either. How exactly we find it varies by binary format and Unity version, but some examples include looking for the string "mscorlib" (which has to be one of the assemblies in the application), looking for certain known numerical values derived from data in the metadata file, and looking for the initializer function (g_CodegenRegistration) among an ELF binary's configured startup routines.
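One way to picture the "known numerical values" approach is the heuristic below. It is our own simplified sketch, not Cpp2IL's exact search. It assumes a version-24-style Il2CppMetadataRegistration layout (count/pointer pairs in 8-byte slots with zeroed padding) and assumes that both `fieldOffsetsCount` and `typeDefinitionsSizesCount` equal the number of type definitions in global-metadata.dat. Every 8-byte-aligned position where both values line up becomes a candidate start offset; `find_metadata_registration` is a hypothetical name.

```python
import struct

SLOT = 8  # pointer size on a 64-bit build
# Assumed slot indices within Il2CppMetadataRegistration (version-24-style).
FIELD_OFFSETS_COUNT_SLOT = 10
TYPE_DEF_SIZES_COUNT_SLOT = 12


def find_metadata_registration(data: bytes, type_definition_count: int) -> list:
    """Return candidate file offsets of Il2CppMetadataRegistration.

    Assumed heuristic: fieldOffsetsCount and typeDefinitionsSizesCount
    both equal the metadata's type definition count, two slots apart."""
    candidates = []
    gap = (TYPE_DEF_SIZES_COUNT_SLOT - FIELD_OFFSETS_COUNT_SLOT) * SLOT
    for pos in range(gap, len(data) - SLOT + 1, SLOT):
        (value,) = struct.unpack_from("<Q", data, pos)
        if value != type_definition_count:
            continue
        (earlier,) = struct.unpack_from("<Q", data, pos - gap)
        start = pos - TYPE_DEF_SIZES_COUNT_SLOT * SLOT
        if earlier == type_definition_count and start >= 0:
            candidates.append(start)
    return candidates
```

A real search would then validate each candidate, for example by checking that the pointers at that offset land inside the binary's data.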

Metadata Usages

Metadata usages are structs with no exact counterpart in .NET. Each one is one of 6 kinds: two kinds of type pointer, string literals, method definitions, generic method information, and field definitions. They're used everywhere you'd expect one of those kinds of object to be used in .NET code. For example, if you pass a format string to string.Format, that string is a metadata usage. If you create a new object, the type of the object being created is a metadata usage. If you invoke a generic method, exactly which method you want to invoke is passed in as a metadata usage. You get the idea.
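Internally, a metadata usage is referred to by a compact 32-bit encoded value. The sketch below decodes one, assuming (from our reading of libil2cpp, not from this page) that the top 3 bits hold the usage kind and the remaining bits hold an index into the matching table; some newer metadata versions appear to reserve bit 0 as a flag, which the `low_bit_flag` parameter models. The enum mirrors libil2cpp's `Il2CppMetadataUsage` as we understand it, and `decode_usage` is a hypothetical helper.

```python
from enum import IntEnum


class MetadataUsageKind(IntEnum):
    # Assumed to mirror libil2cpp's Il2CppMetadataUsage enum.
    INVALID = 0
    TYPE_INFO = 1       # type pointer (class info)
    IL2CPP_TYPE = 2     # type pointer (Il2CppType)
    METHOD_DEF = 3      # method definition
    FIELD_INFO = 4      # field definition
    STRING_LITERAL = 5  # string literal
    METHOD_REF = 6      # generic method information


def decode_usage(encoded: int, low_bit_flag: bool = False):
    """Split an encoded usage into (kind, index).

    Assumes the kind lives in the top 3 bits of a 32-bit value. Without
    low_bit_flag the low 29 bits are the index; with it, bit 0 is treated
    as a flag and the index sits in bits 1-28. Raises ValueError for an
    unknown kind (7)."""
    kind = MetadataUsageKind((encoded & 0xE0000000) >> 29)
    if low_bit_flag:
        index = (encoded & 0x1FFFFFFE) >> 1
    else:
        index = encoded & 0x1FFFFFFF
    return kind, index
```

Under this model, the string passed to string.Format in the example above would decode to `STRING_LITERAL` plus an index into the metadata's string literal table.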

Cpp2IL represents these using the Mono.Cecil types TypeReference, FieldDefinition, MethodDefinition, the type Il2CppGlobalGenericMethodRef from LibCpp2IL, and the standard string type.

Conclusion

The IL2CPP space is vast, and this has been just a small sample of the complexities it contains. But hopefully this gives you a grounding in the basics to build on as you move on to more complex topics.

[1]: Note that this changed, or rather can be changed, in Unity 2021.2, so that all generic methods have one "canonical" form which doesn't depend on the size of the generic argument.