An exhaustive introduction to UUIDs

A UUID (Universally Unique IDentifier, sometimes also called a GUID) is a simple and safe way to uniquely identify any kind of information: transactions, nodes of a filesystem, rows in a database, or any other kind of object that needs an id.

KEY FACTORS

  • UUIDs are unique without relying on a central service or any other kind of authority to generate them
  • UUIDs are always 128 bits long
  • The risk of collisions is negligible
  • The generation process is very fast and safe
  • There are five versions of UUID, each one suitable for a specific task
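To put a number on “negligible”, here is a rough sketch using the standard birthday-bound approximation. It assumes fully random UUIDs with 122 random bits (the version 4 case described further down); the function name `collision_probability` is only illustrative.

```python
import math

# Assumption: a version 4 UUID carries 122 random bits
# (128 bits minus 4 version bits and 2 variant bits).
RANDOM_BITS = 122


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance that at least two of n random ids collide.

    Birthday-bound approximation: p ~ 1 - exp(-n * (n - 1) / (2 * 2**bits)).
    """
    exponent = -n * (n - 1) / (2 * 2**bits)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -math.expm1(exponent)


print(collision_probability(10**9))          # one billion ids
print(collision_probability(2_715 * 10**15)) # about 2.7e18 ids
```

Under these assumptions, a billion random UUIDs collide with a probability far below one in a billion billion, and it takes roughly 2.7 × 10^18 ids to reach a 50% chance.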

    FORMAT

    For ease of use, UUIDs are always presented as a fixed-length sequence of 32 hexadecimal digits, divided into five groups (8-4-4-4-12) separated by hyphens:

    970ea114-fa06-472f-acca-df384a39bf7e

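    Going deeper, RFC 4122 names the parts as follows:

    • time_low: the first group (8 hex digits)
    • time_mid: the second group (4 hex digits)
    • time_hi_and_version: the third group (4 hex digits)
    • clock_seq_hi_and_reserved and clock_seq_low: the fourth group (2 + 2 hex digits)
    • node: the fifth group (12 hex digits)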

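    An interesting fact is that the version bits are at a fixed position, so you can tell the UUID version by looking at the first character of the third group (in the above example it’s “4”).

    The two most significant bits of the clock_seq_hi_and_reserved field (the start of the fourth group) are also reserved: they hold the variant and are always binary 10 in RFC 4122 UUIDs, so that group starts with 8, 9, a or b (“a” in the example above).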
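    As a small illustration, the fixed positions can be inspected with Python's standard `uuid` module. This is only a sketch: it parses the example UUID above and reads the version and the reserved (variant) bits through the attributes the module exposes.

```python
import uuid

u = uuid.UUID("970ea114-fa06-472f-acca-df384a39bf7e")

groups = str(u).split("-")
print([len(g) for g in groups])      # lengths of the five groups
print(u.int.bit_length() <= 128)     # the whole value fits in 128 bits

# The version is the first hex digit of the third group.
print(groups[2][0], u.version)

# The reserved bits are the two most significant bits of the first byte
# of the fourth group (exposed as clock_seq_hi_variant).
print(bin(u.clock_seq_hi_variant >> 6))
print(u.variant == uuid.RFC_4122)
```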

    A special case is the Nil UUID, which is made up entirely of zeros:

    00000000-0000-0000-0000-000000000000
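    For reference, a minimal sketch of building and recognising the Nil UUID in Python. Since the version and variant bits are zero too, the standard `uuid` module reports no version for it.

```python
import uuid

# Build the Nil UUID from the integer 0 (all 128 bits cleared).
nil = uuid.UUID(int=0)

print(nil)
print(nil.int == 0)   # a simple way to test for the Nil UUID
print(nil.version)    # no RFC 4122 version applies
```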

    UUID VERSIONS

    RFC 4122 defines five versions of UUIDs, each one suitable for a different use, but in practice you will end up using versions 1, 5 and maybe 4.

    VERSION 1: TIMESTAMP and MAC ADDRESS

    This version is based on a timestamp and the MAC address of the network card. By combining these two parameters we obtain UUIDs that are:

    • unique over time: the same computer cannot generate the same UUID twice, neither within the same timestamp tick nor in the future
    • unique over space: UUIDs generated by different computers at the same time will always be different (thanks to their MAC addresses)

    Here’s their composition:

    • timestamp is a 60-bit number obtained from the current UTC time, counted in 100-nanosecond intervals since 15 October 1582
    • clock sequence is a 14-bit unsigned integer used to disambiguate when many UUIDs are made before the timestamp changes
    • MAC address is the 48-bit identifier of the primary network card. If no network card is available, a random value can be used in its place.
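    The three parts can be read back from a freshly generated UUID. The sketch below uses Python's standard `uuid` module, whose `time` attribute is documented in 100-nanosecond steps counted from 15 October 1582; the names `GREGORIAN_EPOCH` and `created` are just illustrative.

```python
import uuid
from datetime import datetime, timedelta, timezone

# The v1 timestamp counts 100-nanosecond intervals since the start of the
# Gregorian calendar, 15 October 1582 (UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

u = uuid.uuid1()

# Convert the 60-bit timestamp to a datetime (10 ticks of 100 ns = 1 microsecond).
created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

print(u)
print("timestamp (60 bits):", u.time, "->", created.isoformat())
print("clock sequence (14 bits):", u.clock_seq)
print("node (48 bits):", format(u.node, "012x"))
```

    Depending on the platform, the node printed here may be a real MAC address or a random stand-in chosen by the library.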

    V1 fits most use cases: it’s quick, safe and unrepeatable.

    Bad news: it is possible to track down the computer that created a UUID by its MAC address. If this is a potential security issue for your project, take a look at version 4.

    VERSION 4: RANDOM MADNESS

    The easiest one: every bit of the UUID is randomly generated, with the exception of the version and variant bits, which are constant. That leaves 122 random bits.

    All this randomness comes at a cost: every id needs fresh data from a random number generator, so generating a large amount of ids can be slower compared to other versions. Think carefully before using v4 on a large scale.
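    A quick way to try version 4, and to check the speed concern on your own machine, is sketched below with Python's standard `uuid` and `timeit` modules. The sample sizes are arbitrary and the timings will vary with the platform and its random source, so treat the numbers as indicative only.

```python
import timeit
import uuid

# Generate a batch of random UUIDs; duplicates would collapse in the set.
ids = {uuid.uuid4() for _ in range(1000)}
print(len(ids))

# Only the version and variant bits are fixed.
sample = next(iter(ids))
print(sample, sample.version, sample.variant == uuid.RFC_4122)

# Rough timing comparison between v1 and v4 (seconds for 10,000 calls each).
t1 = timeit.timeit(uuid.uuid1, number=10_000)
t4 = timeit.timeit(uuid.uuid4, number=10_000)
print(f"uuid1: {t1:.4f}s  uuid4: {t4:.4f}s")
```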

    VERSION 5: NAME and NAMESPACE

    This version involves two inputs called name and namespace:

    • name is any sequence of bytes, such as a string
    • namespace must be a valid UUID

    UUID v5 is obtained from the SHA-1 hash of the namespace concatenated with the name; the hash is truncated to 128 bits and the version and variant bits are then set. The result is a UUID that is deterministic:

    • the same name and namespace will always produce the same UUID
    • the same name on different namespaces will produce different UUIDs
    • different names on the same namespace will produce different UUIDs
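    The three properties above can be checked with Python's standard `uuid` module. The sketch also rebuilds a v5 UUID by hand to show the hashing step; `uuid5_by_hand` is a hypothetical helper written for this example, and the names and namespaces are arbitrary.

```python
import hashlib
import uuid

namespace = uuid.NAMESPACE_DNS  # one of the predefined namespace UUIDs

a = uuid.uuid5(namespace, "example.com")
b = uuid.uuid5(namespace, "example.com")
c = uuid.uuid5(uuid.NAMESPACE_URL, "example.com")
d = uuid.uuid5(namespace, "example.org")

print(a == b)  # same name, same namespace
print(a != c)  # same name, different namespace
print(a != d)  # different name, same namespace


def uuid5_by_hand(ns: uuid.UUID, name: str) -> uuid.UUID:
    """Hash namespace bytes + name with SHA-1 and keep the first 16 bytes."""
    digest = hashlib.sha1(ns.bytes + name.encode("utf-8")).digest()
    # version=5 makes the constructor overwrite the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)


print(uuid5_by_hand(namespace, "example.com") == a)
```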

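    A possible use scenario for this kind of UUID is deriving a stable identifier from an existing name (a URL, a file path, an e-mail address), so that the same input always maps to the same id. It is not a good fit for storing passwords or for integrity checks: the hash is unsalted, fast to compute and truncated.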

    OTHER VERSIONS

    What about versions 2 and 3?

    • version 2, also called “DCE security”, is a variant of version 1 that introduces a “local domain” value and a shorter timestamp (only 28 bits).
      This version is barely documented and most libraries do not implement it at all, so there is no reason to use it.
    • version 3 works just like version 5 but uses MD5 instead of SHA-1. For this reason it is considered less safe and its usage is discouraged.
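    For comparison, a short sketch with Python's standard `uuid` module (which offers `uuid3` and `uuid5` but no version 2 generator) shows that the same name and namespace give unrelated results under the two hash functions. The name used is arbitrary.

```python
import uuid

name = "example.com"

v3 = uuid.uuid3(uuid.NAMESPACE_DNS, name)  # MD5-based
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, name)  # SHA-1-based

print(v3, v3.version)
print(v5, v5.version)
print(v3 != v5)
```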

    References:
    Wikipedia, RFC 4122