Data Types

LawnMeower edited this page Jul 23, 2024 · 9 revisions

A data type describes how a piece of data is interpreted and what kind of value it expresses. There are two groups of data types: primitive types (integers, floats, Booleans, pointers) and complex types. Primitive types behave the same across all common platforms and have predefined operations. Complex data types such as strings, playable characters, NPCs, artificial neurons, etc. consist of member variables that can be of primitive or complex types, as well as operations and functions specific to their type. Complex types are mostly structs or classes and don't have a predefined size.

Primitive Types

Integers

Integers (also known as whole numbers) are variables that represent a number without a fraction. These numbers can be either signed or unsigned. An unsigned integer with a width of z bits can hold any value from 0 to 2^z - 1. A signed integer of the same width represents values within the range of -(2^(z-1)) to 2^(z-1) - 1. Since integers come in different lengths, the table below contrasts different types of integers by their lengths, ranges, and type names.

| Common Names | Bits | Bytes | Range (scientific) | Range (dec) | Range (hex) | Usage Examples |
|---|---|---|---|---|---|---|
| byte, char, int8, i8 | 8 | 1 | -(2^7) to 2^7 - 1 | −128 to 127 | 0x00 - 0xFF | lap counter, health points, item count, ammo, etc. |
| unsigned char, uint8, u8 | 8 | 1 | 0 to 2^8 - 1 | 0 to 255 | 0x00 - 0xFF | lap counter, health points, item count, ammo, etc. |
| wchar (wide char), short, halfword, int16, i16 | 16 | 2 | -(2^15) to 2^15 - 1 | −32,768 to 32,767 | 0x0000 - 0xFFFF | item count, ammo, money, etc. |
| char (Java, C#), unsigned short, halfword, uint16, u16 | 16 | 2 | 0 to 2^16 - 1 | 0 to 65,535 | 0x0000 - 0xFFFF | item count, ammo, money, rotation, etc. |
| int, long, word, int32, i32 | 32 | 4 | -(2^31) to 2^31 - 1 | −2,147,483,648 to 2,147,483,647 | 0x00000000 - 0xFFFFFFFF | item count, time, etc. |
| unsigned int, unsigned long, word, uint32, u32 | 32 | 4 | 0 to 2^32 - 1 | 0 to 4,294,967,295 | 0x00000000 - 0xFFFFFFFF | item count, time, etc. |
| long long, doubleword, int64, i64 | 64 | 8 | -(2^63) to 2^63 - 1 | −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 0x0000000000000000 - 0xFFFFFFFFFFFFFFFF | item count, etc. |
| unsigned long long, doubleword, uint64, u64 | 64 | 8 | 0 to 2^64 - 1 | 0 to 18,446,744,073,709,551,615 | 0x0000000000000000 - 0xFFFFFFFFFFFFFFFF | item count, etc. |

The hex column lists the raw bit patterns, which are the same for the signed and unsigned type of a given width.

The integer type is chosen during programming based on the value range a variable is expected to hold.
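The range formulas can be checked with a few lines of Python. This is only a minimal sketch; the function name `int_range` is made up for this example. It also shows that the same raw byte yields a different value depending on whether it is read as signed or unsigned.

```python
import struct

def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return (min, max) for an integer of the given bit width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

for bits in (8, 16, 32, 64):
    print(bits, int_range(bits, True), int_range(bits, False))

# The same byte interpreted as a signed and as an unsigned 8-bit integer.
raw = bytes([0xFF])
as_signed = struct.unpack("b", raw)[0]    # -1
as_unsigned = struct.unpack("B", raw)[0]  # 255
print(as_signed, as_unsigned)
```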

Floating Point

Floating point types are used to represent numbers that have a fraction (an approximation of real numbers). They can represent a very wide range of values, including very small numbers and numbers far larger than an integer of the same width can hold. There are two common floating point types: single precision (float) and double precision (double). A floating point value consists of a sign bit, bits to express the exponent, and bits for the mantissa to represent the fraction. The single precision floating point type has a width of 32 bits (1 sign bit, 8 exponent bits, 23 mantissa bits). Its double precision counterpart consists of 64 bits (1 sign bit, 11 exponent bits, 52 mantissa bits). The table below lists the width and value range of each type.

| Common Names | Bits | Bytes | Range | Usage Examples |
|---|---|---|---|---|
| float, single, float32, f32 | 32 | 4 | 1.175 * 10^-38 to 3.4 * 10^38 | energy bars, scale, position, rotation, time, velocity, gravity, HDR color values, pitch (SFX, BGM), aspect ratio, audio volume, etc. |
| double, float64, f64 | 64 | 8 | 2.225 * 10^-308 to 1.798 * 10^308 | scale, position, rotation, time, item count (Idle/Clicker games), etc. |
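To see the sign, exponent, and mantissa of a single precision value, its 32 bits can be split with shifts and masks. The sketch below assumes the standard IEEE 754 layout of 1 sign bit, 8 exponent bits, and 23 mantissa bits; the helper name `float32_parts` is made up for this example.

```python
import struct

def float32_parts(value: float) -> tuple[int, int, int]:
    """Split a value into (sign, biased exponent, mantissa) of its float32 form."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, mantissa

print(float32_parts(1.0))    # (0, 127, 0)
print(float32_parts(-0.75))  # (1, 126, 4194304)
```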

Identifying Floating Point Values

Viewing raw data within a memory viewer or from a memory dump makes recognizing data types difficult. However, floating point values of the magnitudes games typically use follow a recognizable pattern, making them fairly easy to spot. If a sequence of 32 bits (4 bytes) falls within the range of 0x3A800000 to 0x46000000 or 0xBA800000 to 0xC6000000 (magnitudes from 2^-10, roughly 0.001, up to 8192.0), and starts at an address that is a multiple of 4, it is most likely a 32-bit floating point value. For 64-bit floating point values, the corresponding ranges are 0x3F50000000000000 to 0x40C0000000000000 and 0xBF50000000000000 to 0xC0C0000000000000.

If the binary data originates from a little-endian platform, remember that the bytes of each value appear in reverse order. For example, 1.0 (0x3F800000) shows up as the byte sequence 00 00 80 3F.
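The 32-bit rule of thumb can be turned into a small check. This is only a heuristic sketch using the single precision ranges quoted above; the function name and the `little_endian` parameter are made up for this example, and a match never proves that the bytes really are a float.

```python
# Ranges quoted above for plausible 32-bit floats (positive and negative).
POSITIVE = (0x3A800000, 0x46000000)
NEGATIVE = (0xBA800000, 0xC6000000)

def looks_like_float32(raw: bytes, address: int, little_endian: bool = False) -> bool:
    """Heuristic: True if 4 aligned bytes fall into the typical float ranges."""
    if len(raw) != 4 or address % 4 != 0:
        return False
    word = int.from_bytes(raw, "little" if little_endian else "big")
    return POSITIVE[0] <= word <= POSITIVE[1] or NEGATIVE[0] <= word <= NEGATIVE[1]

print(looks_like_float32(bytes.fromhex("3F800000"), 0x80C3221C))          # True
print(looks_like_float32(bytes.fromhex("0000803F"), 0x1000, True))        # True
print(looks_like_float32(bytes.fromhex("00000001"), 0x80C321A4))          # False
```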

Here's a list of some floating point values in their hex representation that may occur:

| Value | Single Precision | Double Precision |
|---|---|---|
| -Inf | 0xFF800000 | 0xFFF0000000000000 |
| -16.0 | 0xC1800000 | 0xC030000000000000 |
| -8.0 | 0xC1000000 | 0xC020000000000000 |
| -5.0 | 0xC0A00000 | 0xC014000000000000 |
| -4.0 | 0xC0800000 | 0xC010000000000000 |
| -3.0 | 0xC0400000 | 0xC008000000000000 |
| -2.0 | 0xC0000000 | 0xC000000000000000 |
| -1.5 | 0xBFC00000 | 0xBFF8000000000000 |
| -1.0 | 0xBF800000 | 0xBFF0000000000000 |
| -0.75 | 0xBF400000 | 0xBFE8000000000000 |
| -0.5 | 0xBF000000 | 0xBFE0000000000000 |
| -0.25 | 0xBE800000 | 0xBFD0000000000000 |
| -0.2 | 0xBE4CCCCD | 0xBFC999999999999A |
| -0.125 | 0xBE000000 | 0xBFC0000000000000 |
| -0.1 | 0xBDCCCCCD | 0xBFB999999999999A |
| -0.01 | 0xBC23D70A | 0xBF847AE147AE147B |
| 0.001 | 0x3A83126F | 0x3F50624DD2F1A9FC |
| 0.01 | 0x3C23D70A | 0x3F847AE147AE147B |
| 0.1 | 0x3DCCCCCD | 0x3FB999999999999A |
| 0.125 | 0x3E000000 | 0x3FC0000000000000 |
| 0.2 | 0x3E4CCCCD | 0x3FC999999999999A |
| 0.25 | 0x3E800000 | 0x3FD0000000000000 |
| 0.33333334 | 0x3EAAAAAB | 0x3FD5555555555555 |
| 0.5 | 0x3F000000 | 0x3FE0000000000000 |
| 0.75 | 0x3F400000 | 0x3FE8000000000000 |
| 1.0 | 0x3F800000 | 0x3FF0000000000000 |
| 1.5 | 0x3FC00000 | 0x3FF8000000000000 |
| 2.0 | 0x40000000 | 0x4000000000000000 |
| 3.0 | 0x40400000 | 0x4008000000000000 |
| 4.0 | 0x40800000 | 0x4010000000000000 |
| 5.0 | 0x40A00000 | 0x4014000000000000 |
| 8.0 | 0x41000000 | 0x4020000000000000 |
| 10.0 | 0x41200000 | 0x4024000000000000 |
| 12.0 | 0x41400000 | 0x4028000000000000 |
| 16.0 | 0x41800000 | 0x4030000000000000 |
| 20.0 | 0x41A00000 | 0x4034000000000000 |
| 32.0 | 0x42000000 | 0x4040000000000000 |
| 64.0 | 0x42800000 | 0x4050000000000000 |
| 100.0 | 0x42C80000 | 0x4059000000000000 |
| 128.0 | 0x43000000 | 0x4060000000000000 |
| 200.0 | 0x43480000 | 0x4069000000000000 |
| 256.0 | 0x43800000 | 0x4070000000000000 |
| 512.0 | 0x44000000 | 0x4080000000000000 |
| 1000.0 | 0x447A0000 | 0x408F400000000000 |
| 1024.0 | 0x44800000 | 0x4090000000000000 |
| 2048.0 | 0x45000000 | 0x40A0000000000000 |
| 4096.0 | 0x45800000 | 0x40B0000000000000 |
| 8192.0 | 0x46000000 | 0x40C0000000000000 |
| Inf | 0x7F800000 | 0x7FF0000000000000 |

Any bit pattern above Inf or -Inf (0x7F800001 to 0x7FFFFFFF and 0xFF800001 to 0xFFFFFFFF in single precision) is NaN (Not a Number).
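Entries like the ones in the table can be reproduced with Python's `struct` module, which is handy when you need the hex pattern of a value that is not listed. This is a minimal sketch; the helper names are made up for this example.

```python
import math
import struct

def float_hex(value: float) -> tuple[str, str]:
    """Return the big-endian hex patterns for single and double precision."""
    single = struct.pack(">f", value).hex().upper()
    double = struct.pack(">d", value).hex().upper()
    return "0x" + single, "0x" + double

def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as a single precision float."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

print(float_hex(1.0))    # ('0x3F800000', '0x3FF0000000000000')
print(float_hex(-0.1))   # ('0xBDCCCCCD', '0xBFB999999999999A')
print(math.isnan(bits_to_float32(0x7F800001)))  # True: just above Inf
```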

Loss of Precision

A 32-bit floating point value can represent far larger numbers than a 32-bit integer, but only 24 of its bits carry significant digits, so the larger range comes at the cost of precision. This loss is most likely irrelevant for small values. However, adding 1.0 to a very large floating point number (16,777,216.0 or more in single precision) may not change it at all. Values with an infinite fraction are rounded as well. For example, 1/3 is not stored as 0.̅3 (an infinitely repeating fraction) but as roughly 0.33333334.
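Both effects can be reproduced in Python. Python's own floats are double precision, so this sketch rounds values to single precision by packing and unpacking them; `to_float32` is a helper made up for this example.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest float32."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

big = to_float32(16_777_216.0)            # 2^24
print(to_float32(big + 1.0) == big)       # True: the added 1.0 is lost
print(struct.pack(">f", 1 / 3).hex())     # 3eaaaaab, the nearest float32 to 1/3
```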

More Floating Point Types

There are more floating point types besides those with single and double precision. In graphics processing and neural networks, 8-bit (mini-float) and 16-bit (half precision) floating point types are commonly used. Software where precision is crucial, such as banking software, often relies on 80-bit (extended precision) or even 128-bit (quadruple precision) floating point types.

Boolean

A Boolean value is a value that only knows two different states: false and true. Booleans are used to check if certain conditions or requirements are fulfilled and to control the program flow accordingly. Even though a Boolean only needs a single bit, it usually occupies an entire byte (8 bits) in memory because a byte is the smallest addressable unit. In a memory viewer or memory dump file, Boolean values therefore typically appear as bytes of either 0x00 or 0x01. However, not every 0x00 or 0x01 byte is a Boolean, since these values are common for integers as well!

Pointers

Pointers are special integers whose value is a memory address. These addresses can refer to values, memory regions, or functions. The length of a pointer is determined by the system architecture: on a 64-bit system, a pointer is 64 bits long. Pointers come in handy if the location of a value in memory changes, because a pointer stored at a constant position can always refer to the value's current location.

In some cases pointers even point to other pointers. In game hacking it is easy to tell whether a found value has to be reached through a pointer. Simply check if the address still holds the value you expect after loading another scene or even restarting the game. If it does, no pointer needs to be found. If the address now holds something totally unexpected and changing it does not affect anything, a pointer scan will be required.
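Following a pointer, or a chain of pointers, to the current location of a value can be sketched on a simulated block of memory. Everything below is made up for illustration: the 64-byte buffer, the addresses, the offsets, and the assumption of big-endian 32-bit pointers. A real tool would read from the game's memory instead.

```python
import struct

def read_u32(memory: bytes, address: int) -> int:
    """Read a big-endian 32-bit unsigned integer from the simulated memory."""
    return struct.unpack_from(">I", memory, address)[0]

def follow_pointer_path(memory: bytes, base: int, offsets: list[int]) -> int:
    """Dereference base, then apply each offset (non-empty list);
    every offset except the last one is followed by another dereference."""
    address = read_u32(memory, base)
    for offset in offsets[:-1]:
        address = read_u32(memory, address + offset)
    return address + offsets[-1]

# A static pointer at 0x00 refers to an object at 0x20,
# whose health value sits 8 bytes into the object.
memory = bytearray(64)
struct.pack_into(">I", memory, 0x00, 0x20)   # static pointer
struct.pack_into(">I", memory, 0x28, 100)    # health value

health_address = follow_pointer_path(memory, 0x00, [0x08])
print(hex(health_address), read_u32(memory, health_address))  # 0x28 100
```

If the object moves, only the pointer at 0x00 changes, and the same path still leads to the health value.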

Primitive Arrays

An array is a sequence of values of the same type that in most cases relate to each other. Arrays can be used as lists (sorted or unsorted), vectors, color values, and even tables or matrices in the case of multidimensional arrays (arrays of arrays). The table below gives a few examples:

| Element Type | Item Count | Item Width in Bytes | Usage |
|---|---|---|---|
| int8, uint8 | any | 1 | Look-up tables, strings, lists of IDs, inventory lists, RGB, RGBA |
| int16, uint16 | any | 2 | Look-up tables, strings, lists of IDs, inventory lists |
| int32, uint32 | any | 4 | Look-up tables, inventory lists |
| float | 3 | 4 | position, scale, rotation, RGBF |
| float | any | 4 | trigonometric value lists |
| float | 4 | 4 | RGBAF |
| double | 3 | 8 | position, scale, rotation |
| bool | any | 1 | DIP Switches |

Floating Point Arrays

Floating point arrays are mostly used to describe geometry data as shown in the table above. Other use cases are lists of trigonometric values, for instance a precomputed sine curve, which are used to calculate animations and the like. Color values that require smooth transitions are also expressed as floats.
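As a small illustration, the sketch below builds both kinds of float arrays: a position vector of 3 consecutive single precision values and a precomputed sine table. The big-endian byte order, the example coordinates, and the table size of 16 steps are arbitrary choices for this example.

```python
import math
import struct

# Three consecutive big-endian float32 values form a position vector (12 bytes).
raw_position = struct.pack(">3f", 1.0, 2.5, -4.0)
x, y, z = struct.unpack(">3f", raw_position)
print(raw_position.hex(), (x, y, z))

# A precomputed sine table covering one full turn in 16 steps.
STEPS = 16
sine_table = [math.sin(2 * math.pi * i / STEPS) for i in range(STEPS)]
print([round(value, 3) for value in sine_table[:5]])
```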

DIP Switches

Dual in-line package (DIP) switches (sometimes referred to as a mouse piano) are lists of Boolean values. They are typically stored as 32-bit values where each bit indicates whether a certain state is set or not. Games use these to set certain states or flags of playable characters, NPCs, debug features, unlockable features, and many more.
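Testing, setting, and clearing single bits of such a value takes only a few bitwise operations. The flag names and bit positions below are hypothetical; the real meaning of each bit differs from game to game.

```python
# Hypothetical flag layout: one bit per state inside a 32-bit value.
FLAG_INVINCIBLE = 1 << 0
FLAG_BOOSTING = 1 << 1
FLAG_AIRBORNE = 1 << 5

def is_set(state: int, flag: int) -> bool:
    return (state & flag) != 0

def set_flag(state: int, flag: int) -> int:
    return (state | flag) & 0xFFFFFFFF

def clear_flag(state: int, flag: int) -> int:
    return state & ~flag & 0xFFFFFFFF

state = 0x00000022                        # bits 1 and 5 are set
state = set_flag(state, FLAG_INVINCIBLE)  # 0x23
state = clear_flag(state, FLAG_AIRBORNE)  # 0x03
print(hex(state), is_set(state, FLAG_BOOSTING))  # 0x3 True
```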

Complex Types

Strings

Strings are sequences of bytes mostly used to represent text. In order to represent texts of different character sets, several types of character encodings have been developed. The most basic character encoding is known as ASCII. It covers all letters of the English alphabet, non-printable control characters, digits, and symbols, which makes 128 characters in total. The most significant bit is unused, so other encodings can use it to add characters while keeping ASCII as a subset. One of these extended character sets is known as Latin1, which also supports umlauts and more.

To support a wider range of characters, multi-byte character encodings have been created. Shift-JIS, for instance, uses single bytes for ASCII characters and 2 bytes for Kanji and Kana. UTF-16 encodes every character, including ASCII, with at least 2 bytes: the 65,536 code points of the Basic Multilingual Plane take 2 bytes, all others take 4 bytes. Today UTF-8 is mostly used since its variable character length keeps the amount of data small. ASCII characters still occupy 1 byte, but other characters like Kanji or emoji use 2 to 4 bytes each. Emoji sequences can be much longer: the family emoji of 2 women, a girl and a boy with medium skin tone (👩🏽‍👩🏽‍👧🏽‍👦🏽) is 41 bytes long!

In game hacking many character encodings may be relevant if you want to alter text or set breakpoints on it for debugging. Some games, like those of the Pokémon series, even have their own character encodings that don't follow any standard.

Below is a list of some character encodings:

| Encoding | Character Length in Bytes | Codepoints |
|---|---|---|
| ASCII | 1 | 128 |
| UTF-8 | 1 to 4 | 1,112,064 |
| UTF-16 | 2 or 4 | 1,112,064 |
| UTF-32 | 4 | 1,112,064 |
| ISO-8859-1 to ISO-8859-16 | 1 | 256 |
| Shift-JIS | 1 or 2 | 5,801 to 6,879 depending on the code page |
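How many bytes a character occupies in each encoding can be checked with Python's built-in codecs. The sample characters below are arbitrary picks for illustration, written as escape sequences so the script does not depend on the editor's own encoding.

```python
samples = {
    "A": ["ascii", "latin-1", "shift_jis", "utf-8", "utf-16-be", "utf-32-be"],
    "\u00e4": ["latin-1", "utf-8", "utf-16-be"],       # a with umlaut
    "\u3042": ["shift_jis", "utf-8", "utf-16-be"],     # Hiragana letter A
    "\U0001F600": ["utf-8", "utf-16-be", "utf-32-be"],  # grinning face emoji
}

def encoded_length(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))

for text, encodings in samples.items():
    for encoding in encodings:
        print(ascii(text), encoding, encoded_length(text, encoding))
```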

(Playable) Characters

It is often very useful to know where a player's or character's data is located, since it contains a lot of interesting values to mess with. Sadly, there is no single method to find it. You can usually find the location of a character's energy, HP, velocity, or a similar value first and work out from there where all the related information begins. Here's an example of what such data can look like, shown through a memory viewer using F-Zero GX and MungPlex.

| Address | Type | Size in bytes | Description |
|---|---|---|---|
| 0x80C321A0 | int32 | 4 | Vehicle state DIP switch |
| 0x80C321A4 | int32 | 4 | Vehicle ID |
| 0x80C321DC | string | 64 | Vehicle Name |
| 0x80C3221C | float32 | 4 | Vehicle position X |
| 0x80C32220 | float32 | 4 | Vehicle position Y |
| 0x80C32224 | float32 | 4 | Vehicle position Z |
| 0x80C32228 | float32 | 4 | Vehicle position X previous frame |
| 0x80C3222C | float32 | 4 | Vehicle position Y previous frame |
| 0x80C32230 | float32 | 4 | Vehicle position Z previous frame |
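Once the layout is known, the fields can be read by their offset from the first address. The sketch below derives the offsets from the addresses in the table and parses a synthetic block of bytes instead of real game memory. The field names, the ASCII name encoding, and the sample values are assumptions; the big-endian byte order matches the GameCube.

```python
import struct

BASE = 0x80C321A0  # address of the first field in the table above

# Field name -> (offset from BASE, struct format), derived from the listed addresses.
FIELDS = {
    "state_flags": (0x00, ">i"),
    "vehicle_id": (0x04, ">i"),
    "name": (0x3C, "64s"),
    "position": (0x7C, ">3f"),
    "previous_position": (0x88, ">3f"),
}

def parse_vehicle(block: bytes) -> dict:
    """Read all known fields from a block that starts at BASE."""
    vehicle = {}
    for field, (offset, fmt) in FIELDS.items():
        values = struct.unpack_from(fmt, block, offset)
        vehicle[field] = values[0] if len(values) == 1 else values
    # Cut the name at the first null byte (assumes a null-padded ASCII string).
    vehicle["name"] = vehicle["name"].split(b"\x00", 1)[0].decode("ascii")
    return vehicle

# Synthetic stand-in for a memory dump of one vehicle.
block = bytearray(0x94)
struct.pack_into(">i", block, 0x00, 1)
struct.pack_into(">i", block, 0x04, 7)
struct.pack_into("64s", block, 0x3C, b"Example Vehicle")
struct.pack_into(">3f", block, 0x7C, 1.0, 2.0, 3.0)
struct.pack_into(">3f", block, 0x88, 0.5, 2.0, 3.0)

print(parse_vehicle(block))
```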

Colors

Colors come in many different formats. They can be used to color polygons directly, as pixels of textures, or as an effect parameter to alter the appearance of an object. A color consists of 3 or 4 channel values: one each for red, green, and blue, plus an additional alpha channel for colors that feature transparency. These channel values are either integers or floats. Some size-optimized color types pack 3 or even 4 channels into 2 bytes. Each channel ranges from 0 to its maximum value. The higher a channel's value, the stronger its contribution, which is how different colors are mixed from the 3 base color channels. The alpha channel controls transparency: by the usual convention, 0 is fully see-through and the maximum value is fully opaque. Floating point-based channels usually don't exceed their maximum value of 1.0. If they do, the effect can be unexpected, like glitching colors or glowing.

| Color Type | Underlying Type | Has Alpha | Color Channel Range | Usage |
|---|---|---|---|---|
| RGB | uint8 | No | 0 - 255 | Texture pixels, polygons, effects |
| RGBA | uint8 | Yes | 0 - 255 | Texture pixels, polygons, effects |
| RGBF | float32 | No | 0.0 - 1.0 | HDR, effects, smooth color transitions |
| RGBAF | float32 | Yes | 0.0 - 1.0 | HDR, effects, smooth color transitions |
| RGB565 | bit fields within 2 bytes | No | 0 - 31 (R), 0 - 63 (G), 0 - 31 (B) | Texture pixels in some Nintendo games |
| RGB5A3 | bit fields within 2 bytes | Optional | Without alpha: 0 - 31, with alpha: 0 - 7 (A), 0 - 15 (all others) | Texture pixels in some Nintendo games |
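The two packed 16-bit formats can be unpacked with shifts and masks. The sketch below assumes the commonly documented bit layouts: red in the top bits for RGB565, and for RGB5A3 a top bit that selects between 5 bits per color without alpha and 3 alpha bits plus 4 bits per color. The helper names are made up, and each channel is expanded to 0 - 255 for easier comparison.

```python
def scale(value: int, bits: int) -> int:
    """Expand an n-bit channel value to the 0 - 255 range."""
    return value * 255 // ((1 << bits) - 1)

def decode_rgb565(pixel: int) -> tuple[int, int, int]:
    r = (pixel >> 11) & 0x1F   # 5 bits
    g = (pixel >> 5) & 0x3F    # 6 bits
    b = pixel & 0x1F           # 5 bits
    return scale(r, 5), scale(g, 6), scale(b, 5)

def decode_rgb5a3(pixel: int) -> tuple[int, int, int, int]:
    if pixel & 0x8000:
        # Top bit set: no alpha stored, 5 bits per color channel.
        r, g, b = (pixel >> 10) & 0x1F, (pixel >> 5) & 0x1F, pixel & 0x1F
        return scale(r, 5), scale(g, 5), scale(b, 5), 255
    # Top bit clear: 3 alpha bits followed by 4 bits per color channel.
    a = (pixel >> 12) & 0x7
    r, g, b = (pixel >> 8) & 0xF, (pixel >> 4) & 0xF, pixel & 0xF
    return scale(r, 4), scale(g, 4), scale(b, 4), scale(a, 3)

print(decode_rgb565(0xF800))   # (255, 0, 0)
print(decode_rgb5a3(0xFC00))   # (255, 0, 0, 255)
print(decode_rgb5a3(0x0F00))   # (255, 0, 0, 0)
```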

Arrays of Complex Data Types

Complex data types are often used multiple times. In F-Zero GX, an array of the vehicle class shown above holds up to 30 vehicles. Scene data of levels also uses huge arrays of actors that appear all around a level.
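In such an array, the address of any element follows from the address of the first element and the size of one element. In the sketch below the element size is a hypothetical placeholder, not the real size of a vehicle object, and the base address assumes that the vehicle from the table above is the first one in the array.

```python
ARRAY_BASE = 0x80C321A0   # assumed to be the first vehicle in the array
ELEMENT_SIZE = 0x620      # hypothetical size of one vehicle object in bytes
ELEMENT_COUNT = 30        # up to 30 vehicles

def element_address(index: int) -> int:
    """Address of the vehicle at the given index."""
    if not 0 <= index < ELEMENT_COUNT:
        raise IndexError("vehicle index out of range")
    return ARRAY_BASE + index * ELEMENT_SIZE

for index in range(3):
    print(index, hex(element_address(index)))
```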