A UUID (Universally Unique IDentifier, sometimes also called a GUID) is a simple and safe way to uniquely identify any kind of information: UUIDs can be used to identify transactions, filesystem nodes, database rows or any other kind of object that needs an id.
KEY FACTORS
- UUIDs are unique even without a central authority or any coordinating software that issues them
- UUIDs are always 128 bits long
- The risk of collisions is negligible
- The generation process is very fast and safe
- There are five versions of UUID, each one suitable for a specific task
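To put a number on "negligible", here is a rough sketch using the birthday approximation. It assumes fully random UUIDs with 122 random bits (the case of version 4, described later); the function name collision_probability is only illustrative.

```python
import math


def collision_probability(n, random_bits=122):
    """Approximate chance that at least two of n random ids collide.

    Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits)).
    The default of 122 random bits is an assumption matching UUID v4.
    """
    exponent = -(n * n) / (2.0 * 2.0 ** random_bits)
    # expm1 keeps precision when the result is extremely close to zero
    return -math.expm1(exponent)


# One billion random UUIDs: the probability is on the order of 1e-19
print(collision_probability(10**9))
```

The approximation only holds for a good random source; a weak generator makes collisions far more likely than this estimate suggests.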
FORMAT
For ease of use UUIDs are usually presented as a fixed length sequence of 32 hex digits, divided into five groups (8-4-4-4-12) separated by hyphens:
970ea114-fa06-472f-acca-df384a39bf7e
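Going deeper, RFC 4122 names the parts as follows:
- time_low: the first group (8 hex digits)
- time_mid: the second group (4 hex digits)
- time_hi_and_version: the third group (4 hex digits)
- clock_seq_hi_and_reserved and clock_seq_low: the fourth group (2 hex digits each)
- node: the fifth group (12 hex digits)
An interesting fact is that the version bits are at a fixed position, so you can tell the UUID version by looking at the first character of the third group (in the above example it’s “4”).
The two most significant bits of the clock_seq_hi_and_reserved section are also reserved: they hold the variant and are always binary 10 for RFC 4122 UUIDs, so the first character of the fourth group is always 8, 9, a or b (in the above example it’s “a”).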
An exception is the Nil UUID, which is made up entirely of zeros:
00000000-0000-0000-0000-000000000000
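The layout described above can be inspected with Python's standard uuid module. This is only a small sketch: the helper name describe is made up for illustration, and the field names are the ones Python uses, which differ slightly from the RFC's.

```python
import uuid


def describe(text):
    """Return the version and the two variant bits of a textual UUID."""
    u = uuid.UUID(text)
    (time_low, time_mid, time_hi_version,
     clock_seq_hi_variant, clock_seq_low, node) = u.fields
    return {
        # top 4 bits of the third group hold the version
        "version_bits": time_hi_version >> 12,
        # top 2 bits of the fourth group hold the variant
        "variant_bits": clock_seq_hi_variant >> 6,
        # Python reports None when the variant is not RFC 4122
        "version": u.version,
    }


print(describe("970ea114-fa06-472f-acca-df384a39bf7e"))
print(describe("00000000-0000-0000-0000-000000000000"))
```

For the Nil UUID both bit fields are zero, so Python does not report a version for it at all.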
UUID VERSIONS
RFC 4122 defines five versions of UUIDs, each one suitable for a different use but, truthfully, you will end up using versions 1, 5 and maybe 4.
VERSION 1: TIMESTAMP and MAC ADDRESS
This version is based on a timestamp and the MAC address of the network card. By combining these two parameters we obtain UUIDs that are:
- unique over time: the same computer cannot generate the same UUID twice, neither within the same timestamp nor in the future
- unique over space: UUIDs generated by different computers at the same time will always be different (thanks to their MAC address)
Here’s their composition:
- timestamp is a 60 bit number obtained from the current UTC time, counted in 100 nanosecond intervals since 15 October 1582
- clock sequence is a 14 bit unsigned integer used to avoid duplicates when the clock is set backwards or the node identifier changes
- node is a 48 bit value, normally the MAC address of the primary network card. If a network card is not available, a random value can be used instead.
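As a rough illustration of this composition, the three parts can be read back from a version 1 UUID with Python's standard uuid module. The helper names v1_timestamp and v1_node are hypothetical, and the sketch assumes the timestamp counts 100 nanosecond intervals from 15 October 1582 as RFC 4122 specifies.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by v1 timestamps
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def v1_timestamp(u):
    """Convert the 60 bit timestamp of a v1 UUID to a UTC datetime."""
    # u.time is in 100 ns units; datetime only resolves microseconds
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


def v1_node(u):
    """Format the 48 bit node value like a MAC address."""
    return ":".join(f"{(u.node >> shift) & 0xff:02x}"
                    for shift in range(40, -8, -8))


u = uuid.uuid1()
print(u, v1_timestamp(u), u.clock_seq, v1_node(u))
```

Being able to recover the node this easily is exactly why version 1 can leak which machine created an id.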
V1 should be the one that fits most use cases: it’s quick, safe and unrepeatable.
Bad news: it is possible to track down the computer that created a UUID by its MAC address. If this is a potential security issue for your project, you should take a look at version 4.
VERSION 4: RANDOM MADNESS
The easiest one: every bit of the UUID is randomly generated, with the exception of the version and variant bits, which are constant. That leaves 122 random bits.
All this randomness comes at a cost: generating a large amount of ids can be slower than with other versions, depending on the random source in use, so think carefully before using v4 on a large scale.
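A minimal sketch of how such an id can be assembled by hand in Python, assuming os.urandom as the random source. The function name random_uuid is illustrative; in practice the standard uuid.uuid4() does the same job.

```python
import os
import uuid


def random_uuid():
    """Build a version 4 UUID from 16 random bytes."""
    raw = bytearray(os.urandom(16))
    # byte 6: the high nibble is the version, forced to 4
    raw[6] = (raw[6] & 0x0F) | 0x40
    # byte 8: the two highest bits are the variant, forced to binary 10
    raw[8] = (raw[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(raw))


print(random_uuid())
print(uuid.uuid4())  # the standard library equivalent
```

Only six bits are overwritten, which is where the figure of 122 random bits comes from.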
VERSION 5: NAME and NAMESPACE
This version involves two inputs called name and namespace:
- name is any sequence of bytes, such as a string
- namespace must be a valid UUID
UUID v5 is obtained from the SHA-1 hash of the namespace concatenated with the name, truncated to 128 bits and with the version and variant bits set. The result is a UUID that is deterministic:
- the same name and namespace will always produce the same UUID
- the same name on different namespaces will produce different UUIDs
- different names on the same namespace will produce different UUIDs
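A possible use scenario for this kind of UUID is deriving a stable id from data you already have, such as a URL or a DNS name, or checking whether a string has changed. It is not a good fit for storing passwords: SHA-1 is a fast hash, and a v5 UUID keeps only part of it.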
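The derivation of a version 5 UUID can be sketched by hand with hashlib and compared against the standard uuid.uuid5(). The function name name_based_uuid is made up for this example, and it assumes the name is a string encoded as UTF-8.

```python
import hashlib
import uuid


def name_based_uuid(namespace, name):
    """Derive a version 5 UUID from a namespace UUID and a name."""
    # hash the 16 namespace bytes followed by the name bytes
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # SHA-1 yields 20 bytes; only the first 16 are kept
    raw = bytearray(digest[:16])
    raw[6] = (raw[6] & 0x0F) | 0x50  # version 5
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant bits 10
    return uuid.UUID(bytes=bytes(raw))


mine = name_based_uuid(uuid.NAMESPACE_DNS, "example.com")
print(mine == uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))
```

uuid.NAMESPACE_DNS is one of the predefined namespaces shipped with Python; any valid UUID works as a namespace.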
OTHER VERSIONS
What about versions 2 and 3?
- version 2, also called “DCE security”, is a variant of version 1 that introduces a “local domain” value and a shorter timestamp (only 28 bits). This version is barely documented and most libraries do not implement it at all, so there is no reason to use it.
- version 3 works just like version 5 but uses MD5 instead of SHA-1. For this reason it is considered less safe and its usage is discouraged.