Tuesday, January 10, 2006

Time based UUID Generation Algorithm

We had a requirement recently that we should map files to UUIDs. This gives us the flexibility to refer to a file without using a name thereby enabling us to rename it. So we dug a lil bit on UUIDs. java.util.UUID is the UUID implementation in java and this is the RFC its linked to.

So basically a UUID (java.util.UUID) represents a 128-bit value. These bits are split as


32 bits time_low
16 bits time_mid
16 bits time_hi_and_version
16 bits clock sequence
48 bits node

Timestamp is a 60 bit value of the UTC as a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.

Clock Sequence is used to help avoid duplicates that could arise when the clock is set backwards in time or if the node ID changes.The clock sequence MUST be originally (i.e., once in the lifetime of a system) initialized to a random number to minimize the correlation across systems. If the previous value of the clock sequence is known, it can just be incremented; otherwise it should be set to a random or high-quality pseudo-random value.

Node For UUID version 1, the node field consists of an IEEE 802 MAC address, usually the host address.For UUID version 3 or 5, the node field is a 48-bit value constructed from a name. For UUID version 4, the node field is a randomly or pseudo-randomly generated 48-bit value.

There are four different basic types of UUIDs: time-based, DCE security, name-based, and randomly generated UUIDs. These types have a version value of 1, 2, 3 and 4, respectively. Lets look at the time based UUID generation algo.

Time based UUID creation Algorithm

These are the steps as present in the RFC. All italics are my comments..

1) Obtain a system-wide global lock - How ? Simple use a java.nio.channels.FileLock ! This is what the java.util.logging framework uses to ensure log entries are not overwritten when used from multiple JVMs.
2) From a system-wide shared stable store (e.g., a file), read the UUID generator state: the values of the timestamp, clock sequence, and node ID used to generate the last UUID.
3) Get the current time as a 60-bit count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.
4) Get the current node ID.
5) If the state was unavailable (e.g., non-existent or corrupted), or the saved node ID is different than the current node ID, generate a random clock sequence value - If someone deletes the state store file, then since we start off with a random number we can still get a unique UUID.
6) If the state was available, but the saved timestamp is later than the current timestamp, increment the clock sequence value. - Ingenious !! This means if u revert back ur clock the UUID will still remain unique.
7) Save the state (current timestamp, clock sequence, and node ID) back to the stable store.
8) Release the global lock.
9) Format a UUID from the current timestamp, clock sequence, and node ID values.

The algorithm looks very simple and elegant, and though there are other ways of getting UUIDs this is the one that is easy to understand.

Subscribe to comments for this post

1 comment:

Anonymous said...

I have implemented this in pure Java at
head-tech.com/uuid

 
Clicky Web Analytics