The library develops an identity schema suitable for rough, lexicographical objects ordering in distributed systems.
The library provide interface to generate k-ordered unique identifiers in lock-free and decentralized manner for Erlang application.
An array
A[1..n]
is said to be k-ordered ifA[i − k] ≤ A[i] ≤ A[i + k]
for alli
such thatk < i ≤ n−k
.
The library aims following objectives:
-
use 64-bit for storage friendly identity schema. It allows to reduce indexes footprints and optimize lookup latency.
-
generated values are suitable for partial event ordering in distributed environment and helps on detection of causality violation.
-
values are roughly sortable by time of allocation.
-
the concept is similar to Twitter's snowflake identity.
The library has developed a dual schema: 64-bit unique identity to be used within Erlang virtual machine and 96-bit unique identity for distributed environments. Client applications benefits from short 64-bit identity using it locally and dynamically switches schema while communicating with remote peers.
The k-ordered value consists of time-stamp with millisecond resolution (50-bit) that is used to roughly sort events. The time-stamp ensures distinct sorting within virtual machine where system clock is controlled by operation system. A locally monotonic padding (14-bits) prevents the collisions (Note: 14-bits allows to have about 16K allocations per millisecond).
The time-stamp based sorting fails on distributed systems unless it implements precises time synchronization protocol. Therefore, it was decided to uses location aspect (node fingerprint) as component of k-ordered value. This component allows to keep ordering consistent even if clocks on other node is skewed. The developed schema allows to control the quality of the ordering (precision) depending on the length of neighborhood interval. The neighborhood interval is a time span where the location has higher priority then time. The usage of this time interval relaxes a clock synchronization requirements while preserving an ordering.
The library represents k-ordered values as tuples but implements binary and url friendly encoding similar to base64. The serialization protocol has been changed after library version 1.2.0
. New serialization improves allocation performance of k-order values. It uses Erlang OTP/18 feature erlang:unique_integer(...)
to generate locally monotonic value. The library allocates about 13M k-ordered values per second on reference hardware.
20 bit 20 bit 10 bit 14 bit
|--------------|--------------|----------|-----------------|
A B C Seq
A, B, C - time stamp in micro seconds (casted to 50 bits)
Seq - sequence value generated by erlang:unique_integer([monotonic])
-type l() :: {uid, t(), seq()}.
-
50-bit time-stamp with millisecond resolution. The library uses Erlang native representation of time-stamp:
A
mega seconds 10⁶,B
seconds,C
micro seconds 10⁻⁶. -
Seq
is 14-bit monotonic sequence identifier obtained fromerlang:unique_integer(...)
.
20 bit 20 - I bit 32 bit I bit 10 bit 14 bit
|--------------|--------------|--------------|----------|----------|-----------------|
A B0 Node B1 C Seq
A, B, C - time stamp in microseconds (casted to 50 bits)
Seq - sequence value generated by erlang:unique_integer([monotonic])
Node - node fingerprint erlang:phash(erlang:node(), 1 bsl 32).
I - time neighborhood interval
-type g() :: {uid, id(), t(), seq()}.
-
50-bit time-stamp with millisecond resolution. The library uses Erlang native representation of time-stamp:
A
mega seconds 10⁶,B
seconds,C
micro seconds 10⁻⁶. -
Seq
is 14-bit monotonic sequence identifier obtained fromerlang:unique_integer(...)
. -
Node
32-bit (optionally 64-bit) represent unique node identifier allocated randomly to each node (derived as hash from long node name). Node id has higher sorting priority then seconds fraction of time stamp. This allows to build reliable sorting without time synchronization. -
I
is length of neighborhood interval that is configurable by application according to its needs. Its maximum value 10 bits that roughly corresponds to 15 minutes (default one 8 bits - 5 minutes).
The latest version of the library is available at its master
branch. All development, including new features and bug fixes, take place on the master
branch using forking and pull requests as described in contribution guidelines.
The stable library release is available via hex packages, add the library as dependency to rebar.config
{deps, [{uid}]}.
Libraries implements a simple api.
%%
%% generate local identity
%% {uid,{1539,625275,432128},1}
A = uid:l().
%%
%% convert local identity to global one
%% {uid,'uid@127.0.0.1',{1539,625275,432128},1}
B = uid:g(A).
%%
%% encode value to binary
%% <<0,96,57,138,3,235,39,50,123,105,128,1>>
uid:encode(B).
%%
%% encode value to base64
%% <<".5.tXVEf8n8vPN.0">>
uid:encode64(B).
The library is Apache 2.0 licensed and accepts contributions via GitHub pull requests:
- Fork it
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Added some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new Pull Request
The build process requires Erlang/OTP version 19.0 or later and essential build tools.
Build and run service in your development console. The following command boots Erlang virtual machine and opens Erlang shell.
git clone https://github.com/fogfish/uid
cd uid
make
make run
The commit message helps us to write a good release note, speed-up review process. The message should address two question what changed and why. The project follows the template defined by chapter Contributing to a Project of Git book.
Short (50 chars or less) summary of changes
More detailed explanatory text, if necessary. Wrap it to about 72 characters or so. In some contexts, the first line is treated as the subject of an email and the rest of the text as the body. The blank line separating the summary from the body is critical (unless you omit the body entirely); tools like rebase can get confused if you run the two together.
Further paragraphs come after blank lines.
Bullet points are okay, too
Typically a hyphen or asterisk is used for the bullet, preceded by a single space, with blank lines in between, but conventions vary here
If you experience any issues with the library, please let us know via GitHub issues. We appreciate detailed and accurate reports that help us to identity and replicate the issue.
-
Specify the configuration of your environment. Include which operating system you use and the versions of runtime environments.
-
Attach logs, screenshots and exceptions, in possible.
-
Reveal the steps you took to reproduce the problem, include code snippet or links to your project.