If this project helped you reduce developement time or you just want to help me continue making useful tools
This kit was created so programmers could encode a SEQUENCE of numbers (n-tuple) into 1 by using the Szudzik Pairing function (a space saving version of the Cantor Pairing Function [explained below in "Cantor VS Szudik"]) with effective use of C# integral types.
Note that if you want to encode a SET of numbers you can still use this Kit but you would need to do some preliminary steps. Either you
- Encode the Set as a Sequence
- by sorting your numbers [O(nlogn)]
- Encode every possible Sequence that can be created from this Set
- by creating every possible sequence [O(2^n)]
Additionally, it expands the capability of both the Szudzik and Cantor Pairing function so you can
- combine up to 8 numbers in a sequence into 1 (even though these 2 originally only worked to encode 2 numbers)
- using repetition [explained below]
- Encode Natural numbers, Integers, and Rational numbers into 1 (even though these 2 originally only worked to encode natural numbers)
- using bijections between sets [explained below]
Namespace:
- pairingKit (at the beginning of your script type 'using pairingKit')
Public Classes
- tupleBase (holds the base code used by _2tuple)
- _2tuple [for 2 numbers of types (uint or int) or (ushort or short) or (byte or sbyte)]
- _3tuple [for 3 numbers of types (ushort or short) or (byte or sbyte)]
- _4tuple [for 4 numbers of types (ushort or short) or (byte or sbyte)]
- _5tuple [for 5 numbers of types (byte or sbyte)]
- _6tuple [for 6 numbers of types (byte or sbyte)]
- _7tuple [for 7 numbers of types (byte or sbyte)]
- _8tuple [for 8 numbers of types (byte or sbyte)]
Every _nTuple Public Class has many different versions of 2 functions
- combine(a,b,...)
- combines a sequence of numbers indicated by the class into 1 number
- NOTE: the different versions are so that you can encode numbers of different Integral types with each other (within the limits explained below)
- reverse(z)
- returns the sequence of numbers indicated by the class from 1 number
- NOTE: the sequence returned will be an array of the same unsigned type
- in some cases you will have to do some unsigned to signed conversion OR some type casting to get back your original numbers (using tupleBase)
Additionally...
- ALL (signed or unsigned) nTuples will map to ONE unsigned Integral
- ALL unsigned Integrals will map to a unsigned nTuple of the largest size possible
- so if your original mapping used either (1) signed numbers (2) different types, you will have to use casting to get back your exact numbers
The Original Cantor and Szudzik Functions only map 2 Natural Numbers to another Natural Number. (Natural Numbers in this case numbers 0,1,2,3,...) [[N]] (original)
We can also Map 2 Integers to another Integer by simply converting the Integer into a Natural number given its type. So In other words if we want to convert an sbyte (with range -128 to 127) into a byte (with range 0 to 255) we simply add 128 to our sbyte and we are guaranteed to get something within byte range. (integers are numbers …,-2,-1,0,1,2,...) [[I]] (automatic)
Rational Numbers are numbers that can be formed by dividing 2 Integers. Instead of mapping 2 Rational Numbers we could simply map 4 Integers into 1. [[R]] (using _4tuple)
Cantor VS Szudik
The Cantor Pairing Function maps 2 Natural Numbers to a Natural Number but it wasn't specifically created to work with computers. Consequently,
mapping 2 [8 bit] numbers requires 1 [32 bit] number
even though all possible combinations of 8 bit numbers sum to 65536 combinations
which is exactly the amount of different numbers a [16 bit] number can represent.
The Szudik Pairing Function is a modification of the Cantor Pairing function where
2 [8 bit] numbers can map to 1 [16 bit] number
2 [16 bit] numbers can map to 1 [32 bit] number
2 [32 bit] numbers can map to 1 [64 bit] number
and so on if a larger integral types existed (but I decided to not use BigInteger)
It was created to use space effectively.
Additionally, performance testing revealed that on Average Cantor Pairing and Szudzik Pairing takes the same amount of Ticks (using the C# "Stopwatch" Class from "System.Diagnostics")
On Average
- mapping 2 numbers to 1 takes 3 ticks
- mapping 1 number to 2 takes 30 ticks
For that reason even though there is an implementation of Cantor Pairing in the tupleBase class we use Szudzik Pairing for all the main _nTuple Public Classes.
- http://szudzik.com/ElegantPairing.pdf
- https://stackoverflow.com/questions/919612/mapping-two-integers-to-one-in-a-unique-and-deterministic-way
This is useful in many scenarios but for the sake of simplicity I will explain how It is useful in a specific scenario.
Scenario:
- We have a space of N^K.
- Note: a 3D space of 100 by 100 by 100… would be a space of 100^3 where N=100, and K=3
- Every Location is a Sequence of Integers and has a set of data assigned to it.
- We want to access this information given the location.
There are multiple solutions
- Use a Multi-Dimensional Array
- Use a nested dictionaries
- Use pairing functions multiple times
- feel free to suggest another...
As long as (N is small && K is small) Multi-dimensional Arrays work great because they are easy to use, and have constant access time. However when (N is large || K is large) the amount of memory required to store this Multi-dimensional Array becomes ridiculous.
If your N and K are both something ridiculous AND everything in those N^K locations is different… the there is not much you can do… you need a ridiculous amount of memory regardless.
However, if any of those mappings are the same then why would you have a structure with duplicate data or empty data slots? That is where you would use (2) or (3).
Scenario Extended:
- some quantity of your data is repeated
Similar to Arrays, Nested Dictionaries have constant access time and are also easy to use. But they are great because even though your N and K are something ridiculous you don't need to have N^K different memory locations. You only need to have 1 copy of every piece of data that is different which is of course is a tremendous space saver. (assuming you only had ([N^K] / 2) different types of data… you only store that much data and not N^K chunks of data… where half of it is either empty or a duplicate)
However, note that I am giving you the K dimensional location that should map to a piece of data for that location but it is not a requirement to save that K dimensional location.
Scenario Extended Again:
- don't store data you don't require
Once again, similar to previous solutions, we have ease of use and constant access time but now we can store "less data" by combining the K dimensional location into 1 number when we are given a new K dimensional location and then simply get what the 1 numbers maps to by using the pairingKit.
More Details Below in "Nested Dictionaries VS Pairing Kit"
Nested Dictionaries are a better option If
- your sequence is longer than 8
- or if even 1 number in the sequence is larger than the kit allows for that size of sequence
- EX 1: you need a sequence of 8 uints or ints
- EX 2: you need a sequence of 2 ulongs
Pairing Kit is a better option otherwise because
- nested Dictionaries can be a bigger hassle to use
- nested Dictionaries would take up more space than just mapping all values into 1 and storing 1 Dictionary with that mapping (based on my limited knowledge)
- Note that I believe this is the case for C# but it might not be for another language
Note that If I am encoding 2 [8 bit] numbers I require a [16 bit] number to store all possible combinations. This is because each 8 bit number can store 256 numbers (0 -> 255). And every possible combination of two [8 bit] numbers is 256*256 = 65,536 combinations which is exactly how many numbers a [16 bit] number can store (0 -> 65,535). So as mentioned previously we can see that...
- to encode 2, [8 bit] numbers we need a [16 bit] number
- to encode 2, [16 bit] numbers we need a [32 bit] number
- to encode 2, [32 bit] numbers we need a [64 bit] number
- and so on...
I avoided the use of BigInteger anywhere because:
- It would have made this already large simple kit even larger
- Technically (from my limited understanding) BigInteger can store arbitrarily large numbers without losing precision so the users of this kit might use it and not realize that in their particular scenario using nested dictionaries would work better
Since I am not using BigInteger, the larger the sequence, the smaller each number in the sequence must be. This is because the largest number I am allowing all the 2->8 numbers to encode into is a ulong which can store a max of 2^64 possibilities.
I illustrate this visually below...
ulong 64 bits
uint | uint 64 bits = 32 bit + 32 bits
ushort | ushort | ushort | ushort 64 bits = 16 bits + 16 bits + 16 bits + 16 bits
byte | byte | byte | byte | byte | byte | byte | byte 64 bits = 8 bits * 8
Since our largest number after the encoding must fit into 64 bits.
- To combine 2 numbers each number
- must be of type: (uint or int) or (ushort or short) or (byte or sbyte)
- not of type: (ulong or long)
- To combine 3 numbers each number
- must be of type: (ushort or short) or (byte or sbyte)
- not of type: (uint or int) or (ulong or long)
- To combine 4 numbers each number
- must be of type: (ushort or short) or (byte or sbyte)
- not of type: (uint or int) or (ulong or long)
- To combine 5 numbers each number
- must be of type: (byte or sbyte)
- not of type: (uint or int) or (ushort or short) or(ulong or long)
- To combine 6 numbers each number
- must be of type: (byte or sbyte)
- not of type: (uint or int) or (ushort or short) or(ulong or long)
- To combine 7 numbers each number
- must be of type: (byte or sbyte)
- not of type: (uint or int) or (ushort or short) or(ulong or long)
- To combine 8 numbers each number
- must be of type: (byte or sbyte)
- not of type: (uint or int) or (ushort or short) or(ulong or long)
Note that these functions provide what is needed for most scenarios but you can make your own custom functions to handle all kinds of weird scenarios.
For example _4tuple(uint a, byte b, byte c, byte d). For this to work you have to keep in mind how the mappings work. First you can map 'c' and 'd' into a ushort, then you can map 'b' and 'cd' into a uint, then you map 'a' 'bcd' into a ulong. You cannot however map 'a' with 'b' first because 'ab' will be a ulong and a ulong and another other type paired together would require BigInteger.
Additionally, you will have to make a reverse method that unwraps the numbers in the reverse way in which you combined them.
- all automatic casting
- between integrals of different size
- between signed and unsigned integrals
- 2 number pairings using sbyte and byte types
Given that everything is based on 2 pairing, unless I made a typo (which is unlikely since I checked the functions thoroughly) everything should work.
In Either case, encoding random numbers and decoding them and checking if the result is the same as the original given you specific types and such is enough to confirm that it all works.