Data Types
A data type describes how a certain object is interpreted and what kind of value it expresses. There are two groups of data types: primitive types (such as integers, floats, booleans, and pointers) and complex types. Primitive types behave the same way across all common platforms and come with predefined operations. Complex data types such as strings, playable characters, NPCs, or artificial neurons consist of member variables that can be of primitive or complex types, as well as operations and functions specific to their type. Complex types are mostly structs or classes and don't have a predefined size.
Integers (also known as whole numbers) are variables that represent a number without a fraction. These numbers can be either signed or unsigned. An unsigned integer with a bit size of $z$ can hold any value from 0 to $2^{z} - 1$. A signed integer of the same size represents values within the range of $-2^{z-1}$ to $2^{z-1} - 1$. Since integers come in different lengths, the table below contrasts different types of integers by their lengths, ranges, and type names.
Common Names | Bits | Bytes | Range (scientific; dec; hex) | Usage Examples |
---|---|---|---|---|
byte, char, int8, i8 | 8 | 1 | $-2^{7}$ to $2^{7}-1$; -128 to 127; 0x00 - 0xFF | lap counter, health points, item count, ammo, etc. |
unsigned char, uint8, u8 | 8 | 1 | 0 to $2^{8}-1$; 0 to 255; 0x00 - 0xFF | lap counter, health points, item count, ammo, etc. |
wchar (wide char), short, halfword, int16, i16 | 16 | 2 | $-2^{15}$ to $2^{15}-1$; -32,768 to 32,767; 0x0000 - 0xFFFF | item count, ammo, money, etc. |
char (Java, C#), unsigned short, halfword, uint16, u16 | 16 | 2 | 0 to $2^{16}-1$; 0 to 65,535; 0x0000 - 0xFFFF | item count, ammo, money, rotation, etc. |
int, long, word, int32, i32 | 32 | 4 | $-2^{31}$ to $2^{31}-1$; -2,147,483,648 to 2,147,483,647; 0x00000000 - 0xFFFFFFFF | item count, time, etc. |
unsigned int, unsigned long, word, uint32, u32 | 32 | 4 | 0 to $2^{32}-1$; 0 to 4,294,967,295; 0x00000000 - 0xFFFFFFFF | item count, time, etc. |
long long, doubleword, int64, i64 | 64 | 8 | $-2^{63}$ to $2^{63}-1$; -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807; 0x0000000000000000 - 0xFFFFFFFFFFFFFFFF | item count, etc. |
unsigned long long, doubleword, uint64, u64 | 64 | 8 | 0 to $2^{64}-1$; 0 to 18,446,744,073,709,551,615; 0x0000000000000000 - 0xFFFFFFFFFFFFFFFF | item count, etc. |
When writing a program, the programmer chooses the integer type based on the range of values the variable is expected to hold.
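The range formulas can be checked with a few lines of Python. This is only a minimal sketch; the helper name `int_range` is made up for illustration. It also shows that the same raw byte yields a different value depending on whether it is read as signed or unsigned.

```python
import struct

def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return (minimum, maximum) of an integer type with the given bit size."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

# One raw byte, interpreted as two different 8-bit types.
raw = bytes([0xFF])
as_unsigned = struct.unpack("B", raw)[0]  # uint8
as_signed = struct.unpack("b", raw)[0]    # int8

print(int_range(8, signed=True))    # (-128, 127)
print(int_range(16, signed=False))  # (0, 65535)
print(as_unsigned, as_signed)       # 255 -1
```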
Floating point types are used to represent numbers that have a fraction (also known as real numbers). These types are used for operations that require precision and can represent a wide range of values, including very small and very large numbers that integers cannot represent. There are two common floating point types: single precision (float) and double precision (double). A floating point value consists of a sign bit, bits to express the exponent, and bits for the mantissa to represent the fraction. The single precision type is 32 bits wide (1 sign bit, 8 exponent bits, 23 mantissa bits). Its double precision counterpart is 64 bits wide (1 sign bit, 11 exponent bits, 52 mantissa bits). The table below lists the size and value range of both types.
Common Names | Bits | Bytes | Range | Usage Examples |
---|---|---|---|---|
float, single, float32, f32 | 32 | 4 | $1.175 \times 10^{-38}$ to $3.4 \times 10^{38}$ | energy bars, scale, position, rotation, time, velocity, gravity, HDR color values, pitch (SFX, BGM), aspect ratio, audio volume, etc. |
double, float64, f64 | 64 | 8 | $2.225 \times 10^{-308}$ to $1.798 \times 10^{308}$ | scale, position, rotation, time, item count (Idle/Clicker games), etc. |
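To make the sign/exponent/mantissa split more tangible, here is a small sketch that takes a single precision value apart. It assumes the usual IEEE 754 layout (1 sign bit, 8 exponent bits, 23 mantissa bits); the function name `float32_parts` is made up for this example.

```python
import struct

def float32_parts(value: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, mantissa) bit fields of a value stored as float32."""
    # Pack as a 32-bit float, then re-read the same 4 bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF      # 23 bits
    return sign, exponent, mantissa

print(float32_parts(1.0))   # (0, 127, 0)
print(float32_parts(-2.0))  # (1, 128, 0)
print(float32_parts(0.75))  # (0, 126, 4194304)
```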
Viewing raw data within a memory viewer or from a memory dump makes recognizing data types difficult. However, floating point values always follow a specific pattern, making them fairly easy to spot. If a sequence of 32 bits (4 bytes) falls within the range of 0x3A800000 to 0x46000000 or 0xBA800000 to 0xC6000000, and starts at an address that is a multiple of 4, it is most likely a 32-bit floating point value. For 64-bit floating point values, the ranges are 0x3F60000000000000 to 0x40C0000000000000 and 0xBF60000000000000 to 0xC0C0000000000000.
If the binary data originates from a little-endian system, remember that the byte order is reversed: 1.0 (0x3F800000) is then stored as the byte sequence 00 00 80 3F.
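The heuristic for 32-bit values can be sketched as a small scanner. This is only an illustration: `looks_like_float32`, `scan`, the sample dump, and its base address are made up, and a real memory viewer would offer more refined filters.

```python
import struct

def looks_like_float32(word: int, address: int) -> bool:
    """Heuristic from the text: 4-byte aligned and inside the typical bit-pattern ranges."""
    if address % 4 != 0:
        return False
    return 0x3A800000 <= word <= 0x46000000 or 0xBA800000 <= word <= 0xC6000000

def scan(dump: bytes, base_address: int, little_endian: bool = False):
    """Yield (address, value) for every 4-byte word that looks like a float32."""
    fmt = "<I" if little_endian else ">I"
    for offset in range(0, len(dump) - 3, 4):
        (word,) = struct.unpack_from(fmt, dump, offset)
        if looks_like_float32(word, base_address + offset):
            # Reinterpret the same 32 bits as a floating point value.
            (value,) = struct.unpack(">f", struct.pack(">I", word))
            yield base_address + offset, value

# Made-up big-endian sample: an integer, 1.0, -10.0, and filler bytes.
dump = bytes.fromhex("00000005" "3F800000" "C1200000" "FFFFFFFF")
print([(hex(a), v) for a, v in scan(dump, 0x80000000)])
# [('0x80000004', 1.0), ('0x80000008', -10.0)]
```

For data from a little-endian system, pass `little_endian=True` so the bytes are swapped before the comparison.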
Here's a list of some floating point values in their hex representation that may occur:
Value | Single Precision | Double Precision |
---|---|---|
-Inf | 0xFF800000 | 0xFFF0000000000000 |
-16.0 | 0xC1800000 | 0xC030000000000000 |
-8.0 | 0xC1000000 | 0xC020000000000000 |
-5.0 | 0xC0A00000 | 0xC014000000000000 |
-4.0 | 0xC0800000 | 0xC010000000000000 |
-3.0 | 0xC0400000 | 0xC008000000000000 |
-2.0 | 0xC0000000 | 0xC000000000000000 |
-1.5 | 0xBFC00000 | 0xBFF8000000000000 |
-1.0 | 0xBF800000 | 0xBFF0000000000000 |
-0.75 | 0xBF400000 | 0xBFE8000000000000 |
-0.5 | 0xBF000000 | 0xBFE0000000000000 |
-0.25 | 0xBE800000 | 0xBFD0000000000000 |
-0.2 | 0xBE4CCCCD | 0xBFC999999999999A |
-0.125 | 0xBE000000 | 0xBFC0000000000000 |
-0.1 | 0xBDCCCCCD | 0xBFB999999999999A |
-0.01 | 0xBC23D70A | 0xBF847AE147AE147B |
0.001 | 0x3A83126F | 0x3F50624DD2F1A9FC |
0.01 | 0x3C23D70A | 0x3F847AE147AE147B |
0.1 | 0x3DCCCCCD | 0x3FB999999999999A |
0.125 | 0x3E000000 | 0x3FC0000000000000 |
0.2 | 0x3E4CCCCD | 0x3FC999999999999A |
0.25 | 0x3E800000 | 0x3FD0000000000000 |
0.33333334 | 0x3EAAAAAB | 0x3FD5555555555555 |
0.5 | 0x3F000000 | 0x3FE0000000000000 |
0.75 | 0x3F400000 | 0x3FE8000000000000 |
1.0 | 0x3F800000 | 0x3FF0000000000000 |
1.5 | 0x3FC00000 | 0x3FF8000000000000 |
2.0 | 0x40000000 | 0x4000000000000000 |
3.0 | 0x40400000 | 0x4008000000000000 |
4.0 | 0x40800000 | 0x4010000000000000 |
5.0 | 0x40A00000 | 0x4014000000000000 |
8.0 | 0x41000000 | 0x4020000000000000 |
10.0 | 0x41200000 | 0x4024000000000000 |
12.0 | 0x41400000 | 0x4028000000000000 |
16.0 | 0x41800000 | 0x4030000000000000 |
20.0 | 0x41A00000 | 0x4034000000000000 |
32.0 | 0x42000000 | 0x4040000000000000 |
64.0 | 0x42800000 | 0x4050000000000000 |
100.0 | 0x42C80000 | 0x4059000000000000 |
128.0 | 0x43000000 | 0x4060000000000000 |
200.0 | 0x43480000 | 0x4069000000000000 |
256.0 | 0x43800000 | 0x4070000000000000 |
512.0 | 0x44000000 | 0x4080000000000000 |
1000.0 | 0x447A0000 | 0x408F400000000000 |
1024.0 | 0x44800000 | 0x4090000000000000 |
2048.0 | 0x45000000 | 0x40A0000000000000 |
4096.0 | 0x45800000 | 0x40B0000000000000 |
8192.0 | 0x46000000 | 0x40C0000000000000 |
Inf | 0x7F800000 | 0x7FF0000000000000 |
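Any bit pattern whose magnitude lies beyond that of Inf or -Inf (in single precision: 0x7F800001 to 0x7FFFFFFF and 0xFF800001 to 0xFFFFFFFF) is NaN (Not a Number).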
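Entries like the ones in the table can be reproduced with Python's `struct` module. The sketch below is only an illustration with made-up helper names. It also checks for NaN the way the bit layout defines it: all exponent bits set and a non-zero mantissa.

```python
import struct

def float32_to_hex(value: float) -> str:
    """Hex representation of a value stored as single precision float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return f"0x{bits:08X}"

def float64_to_hex(value: float) -> str:
    """Hex representation of a value stored as double precision float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return f"0x{bits:016X}"

def is_nan_pattern32(bits: int) -> bool:
    """True if the 32-bit pattern has all exponent bits set and a non-zero mantissa."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0

print(float32_to_hex(1.0))           # 0x3F800000
print(float64_to_hex(-0.1))          # 0xBFB999999999999A
print(is_nan_pattern32(0x7F800000))  # False (this is Inf)
print(is_nan_pattern32(0x7FC00000))  # True
```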
A 32-bit floating point value can represent much larger numbers than a 32-bit integer, but this comes at the cost of precision. This loss is most likely irrelevant for small values. However, adding 1.0 to a very large floating point number may not change it at all. Values with an infinite fraction will be rounded at some point. For example, 1/3 is not stored as 0.333... (an infinitely repeating fraction) but as 0.33333334.
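Both effects can be observed with a short experiment. Python's own `float` is double precision, so this sketch rounds values to single precision by packing and unpacking them with `struct`; the helper name `to_float32` is made up.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

big = to_float32(16777216.0)           # 2**24
print(to_float32(big + 1.0) == big)    # True: the added 1.0 is lost
print(f"{to_float32(1 / 3):.8f}")      # 0.33333334
```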
There are more floating point types besides those with single and double precision. In graphics processing and neural networks, 8-bit (mini-float) and 16-bit (half precision) floating point types are commonly used. Software for which precision is crucial, such as banking software, may rely on 80-bit (extended precision) or even 128-bit (quadruple precision) floating point types.
A Boolean value is a value that can only take one of two states: false or true. Booleans are used to check whether certain conditions or requirements are fulfilled in order to control the program flow accordingly. Even though only one bit of a Boolean value is used, it occupies an entire byte (8 bits) in memory because a byte is the smallest addressable unit. In a memory viewer or memory dump file, Boolean values typically appear as bytes of either 0x00 or 0x01. However, not every 0x00 or 0x01 is a Boolean, since these values are common for integers as well!
Pointers are special integers whose value is a memory address. These addresses can refer to values, memory regions, or functions. The length of a pointer is determined by the system architecture: on a 64-bit system, a pointer is 64 bits long. Pointers come in handy if the location of a value in memory changes, because a pointer at a constant position can always lead to a value whose own location changes.
In some cases, pointers even point to other pointers. In game hacking it is easy to tell whether a found value can only be reached reliably through a pointer. Simply check if the value is still what you expect after loading another scene or even restarting the game. If the value is still as expected, no pointer needs to be found. If the value is something totally unexpected and changing it does not affect anything, a pointer scan will be required.
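Following a pointer, or a chain of pointers, can be sketched on a simulated memory region. Everything here is hypothetical: the base address, the offsets, and the helper names are invented for illustration, and 32-bit big-endian pointers are assumed.

```python
import struct

BASE = 0x80000000  # hypothetical start address of the simulated memory region

def read_u32(memory, address: int) -> int:
    """Read a big-endian 32-bit value at an absolute address."""
    (value,) = struct.unpack_from(">I", memory, address - BASE)
    return value

def follow_pointer_path(memory, pointer_address: int, offsets: list[int]) -> int:
    """Dereference once per offset: address = value_at(address) + offset."""
    address = pointer_address
    for offset in offsets:
        address = read_u32(memory, address) + offset
    return address

memory = bytearray(0x40)
struct.pack_into(">I", memory, 0x00, 0x80000020)  # static pointer stored at 0x80000000
struct.pack_into(">I", memory, 0x24, 99)          # target value stored at 0x80000024

target = follow_pointer_path(memory, 0x80000000, [0x4])
print(hex(target), read_u32(memory, target))  # 0x80000024 99
```

If the object moves, only the pointer at the constant address changes; following it again with the same offsets still leads to the value.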
An array expresses a sequence of values of the same type that in most cases correlate to each other. Arrays can be used as lists (sorted or unsorted), vectors, color values, and even tables or matrices in the case of multidimensional arrays (an array of arrays). The table below gives a few examples:
Element Type | Item Count | Item Width in Bytes | Usage |
---|---|---|---|
int8, uint8 | any | 1 | Look-up-tables, strings, list of IDs, inventory lists, RGB, RGBA |
int16, uint16 | any | 2 | Look-up-tables, strings, list of IDs, inventory lists |
int32, uint32 | any | 4 | Look-up-tables, inventory lists |
float | 3 | 4 | position, scale, rotation, RGBF |
float | any | 4 | trigonometric value lists |
float | 4 | 4 | RGBAF |
double | 3 | 8 | position, scale, rotation |
bool | any | 1 | DIP Switches |
Floating point arrays are mostly used to describe geometry data as shown in the table above. Other use cases are lists of trigonometric values that express a basic sine curve, for instance. These are used to calculate animations and such. Color values that require smooth transitions are also expressed as floats.
Dual in-line package (DIP) switches (sometimes referred to as a mouse piano) are a list of Boolean values. These are typically 32-bit values where each bit represents a certain state being set or not. Games use these to set certain states or flags of playable characters, NPCs, debug features, unlockable features, and many more.
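Working with such a 32-bit flag value comes down to a few bitwise operations. The flag names and bit positions below are purely hypothetical and do not belong to any particular game.

```python
# Hypothetical flags, invented for illustration only.
FLAG_INVINCIBLE = 1 << 0
FLAG_AIRBORNE = 1 << 1
FLAG_BOOSTING = 1 << 5

def is_set(state: int, flag: int) -> bool:
    """Check whether the given bit is set."""
    return state & flag != 0

def set_flag(state: int, flag: int) -> int:
    """Return the state with the bit switched on (kept within 32 bits)."""
    return (state | flag) & 0xFFFFFFFF

def clear_flag(state: int, flag: int) -> int:
    """Return the state with the bit switched off (kept within 32 bits)."""
    return state & ~flag & 0xFFFFFFFF

state = 0x00000002                    # only the "airborne" bit is set
state = set_flag(state, FLAG_BOOSTING)
print(hex(state))                     # 0x22
print(is_set(state, FLAG_AIRBORNE))   # True
print(hex(clear_flag(state, FLAG_AIRBORNE)))  # 0x20
```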
Strings are sequences of bytes mostly used to represent text. To represent texts of different character sets, several types of character encodings have been developed. The most basic character encoding is known as ASCII. It covers all letters of the English alphabet, non-printable control characters, digits, and symbols, which make 128 characters in total. The most significant bit of each byte is unused, so other encodings can use it to extend ASCII while keeping it as a subset. One of these extended character sets is Latin-1, which also supports umlauts and more.
To support a wider range of characters, multi-byte character encodings have been created. Shift-JIS, for instance, uses single bytes for ASCII characters and 2 bytes for Kanji and full-width Kana. UTF-16 encodes every character, including ASCII, with at least 2 bytes. A single 2-byte unit covers 65,536 code points, and characters beyond that are encoded as a pair of 2-byte units (4 bytes). Today UTF-8 is mostly used since it uses a variable character length and keeps the amount of data small. ASCII characters still occupy 1 byte, but other characters use more: Kanji typically take 3 bytes and emoji 4 bytes or more. The family emoji of 2 women, a girl and a boy with medium skin tone (👩🏽👩🏽👧🏽👦🏽) is a sequence of several code points and takes up roughly 40 bytes!
In game hacking many character encodings may be relevant if you want to alter text or set breakpoints on it for debugging. Some games, like those of the Pokémon series, even have their own character encodings that don't follow any encoding standard.
Below is a list of some character encodings:
Encoding | Character Length in Bytes | Code Points |
---|---|---|
ASCII | 1 | 128 |
UTF-8 | 1 to 4 | 1,112,064 |
UTF-16 | 2 or 4 | 1,112,064 |
UTF-32 | 4 | 1,112,064 |
ISO-8859-1 to ISO-8859-16 | 1 | 256 |
Shift-JIS | 1 or 2 | 5,801 to 6,879 depending on the code page |
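The byte lengths of individual characters can be compared with Python's built-in codecs. This is a small sketch; the helper `encoded_length` and the sample characters were picked for illustration.

```python
def encoded_length(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))

# Byte-order specific codec names are used so that no byte order mark is added.
for char in ("A", "ä", "漢", "😀"):
    lengths = [encoded_length(char, enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")]
    print(char, lengths)
# A [1, 2, 4]
# ä [2, 2, 4]
# 漢 [3, 2, 4]
# 😀 [4, 4, 4]

print(encoded_length("A", "shift_jis"), encoded_length("漢", "shift_jis"))  # 1 2
```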
Sometimes it's very useful to know where to find a player's or character's data, since it contains a lot of interesting values to mess with. Sadly, there is no single method to find it. You can usually find the location of a character's energy, HP, or velocity value and work out from there where all the related information begins. Here's an example of what such data can look like in a memory viewer, using F-Zero GX and MungPlex.
Address | Type | Size in bytes | Description |
---|---|---|---|
0x80C321A0 | int32 | 4 | Vehicle state DIP switch |
0x80C321A4 | int32 | 4 | Vehicle ID |
0x80C321DC | string | 64 | Vehicle Name |
0x80C3221C | float32 | 4 | Vehicle position X |
0x80C32220 | float32 | 4 | Vehicle position Y |
0x80C32224 | float32 | 4 | Vehicle position Z |
0x80C32228 | float32 | 4 | Vehicle position X previous frame |
0x80C3222C | float32 | 4 | Vehicle position Y previous frame |
0x80C32230 | float32 | 4 | Vehicle position Z previous frame |
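Once such a layout is known, the fields can be read from a dump programmatically. The sketch below uses the addresses and types from the table. It assumes that the dump starts at 0x80C321A0, that the data is big-endian (as on the GameCube), and that the name is a null-terminated ASCII string. The field names and the demo data are made up.

```python
import struct

BASE = 0x80C321A0  # assumed address of the first byte of the dump

# Field name -> (address from the table, struct format; ">" means big-endian)
FIELDS = {
    "state_flags": (0x80C321A0, ">i"),
    "vehicle_id": (0x80C321A4, ">i"),
    "name": (0x80C321DC, ">64s"),
    "position": (0x80C3221C, ">3f"),
    "previous_position": (0x80C32228, ">3f"),
}

def parse_vehicle(dump) -> dict:
    """Read all known fields of one vehicle from the dump."""
    vehicle = {}
    for field, (address, fmt) in FIELDS.items():
        values = struct.unpack_from(fmt, dump, address - BASE)
        vehicle[field] = values[0] if len(values) == 1 else values
    # Cut the 64-byte name buffer at the first null byte (assumed terminator).
    raw_name = vehicle["name"].split(b"\x00", 1)[0]
    vehicle["name"] = raw_name.decode("ascii", errors="replace")
    return vehicle

# Synthetic demo data, not taken from the game.
dump = bytearray(0x94)
struct.pack_into(">i", dump, 0x04, 7)
struct.pack_into(">64s", dump, 0x3C, b"Example Machine")
struct.pack_into(">3f", dump, 0x7C, 1.0, 2.0, 3.0)
print(parse_vehicle(dump))
```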
Colors come in many different formats. They can be used to color polygons directly, to color the pixels of textures, or as an effect parameter to alter the appearance of an object. A color value consists of 3 or 4 channels: one each for red, green, and blue, plus an additional alpha channel for colors that feature transparency. These channel values are either integers or floats. Some size-optimized color types pack 3 or even 4 channels into 2 bytes. Each channel ranges from 0 to its maximum value. The higher a channel's value, the stronger its contribution, which is how different colors are mixed from the 3 base color channels. The alpha channel controls transparency: conventionally, 0 is fully see-through and the maximum value is fully opaque. Floating point-based channels usually don't exceed their maximum value of 1.0. If they do, the effect can be unexpected, such as glitching colors or glowing.
Color Type | Underlying Type | Has Alpha | Color Channel Range | Usage |
---|---|---|---|---|
RGB | uint8 | No | 0 - 255 | Texture pixels, polygons, effects |
RGBA | uint8 | Yes | 0 - 255 | Texture pixels, polygons, effects |
RGBF | float32 | No | 0.0 - 1.0 | HDR, effects, smooth color transitions |
RGBAF | float32 | Yes | 0.0 - 1.0 | HDR, effects, smooth color transitions |
RGB565 | bit fields within 2 bytes | No | 0 - 31 (R), 0 - 63 (G), 0 - 31 (B) | Texture pixels in some Nintendo games |
RGB5A3 | bit fields within 2 bytes | Optional | Without alpha: 0 - 31, with alpha: 0 - 7 (A), 0 - 15 (all others) | Texture pixels in some Nintendo games |
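The packed 16-bit formats can be taken apart with shifts and masks. This is a sketch with made-up function names. It assumes the bit orders commonly documented for these formats: red in the highest bits for RGB565, and for RGB5A3 a set top bit for the opaque 5-bit variant and a cleared top bit for the variant with 3-bit alpha.

```python
from typing import Optional, Tuple

def decode_rgb565(pixel: int) -> Tuple[int, int, int]:
    """Split a 16-bit RGB565 pixel into (R, G, B) with 5, 6 and 5 bits."""
    r = (pixel >> 11) & 0x1F
    g = (pixel >> 5) & 0x3F
    b = pixel & 0x1F
    return r, g, b

def decode_rgb5a3(pixel: int) -> Tuple[int, int, int, Optional[int]]:
    """Split a 16-bit RGB5A3 pixel into (R, G, B, A); A is None if there is no alpha."""
    if pixel & 0x8000:
        # Top bit set: opaque pixel with 5 bits per color channel.
        return (pixel >> 10) & 0x1F, (pixel >> 5) & 0x1F, pixel & 0x1F, None
    # Top bit clear: 3 bits of alpha followed by 4 bits per color channel.
    return (pixel >> 8) & 0xF, (pixel >> 4) & 0xF, pixel & 0xF, (pixel >> 12) & 0x7

print(decode_rgb565(0xF800))  # (31, 0, 0)
print(decode_rgb5a3(0xFFFF))  # (31, 31, 31, None)
print(decode_rgb5a3(0x7FFF))  # (15, 15, 15, 7)
```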
Complex data types are often used multiple times. In F-Zero GX an array of the above vehicle class is used with up to 30 vehicles. Scene data of levels also use huge arrays of actors appearing all around a level.