# NewId

NewId generates sequential unique identifiers that are 128 bits (16 bytes) long and fit neatly into a Guid. It was inspired by Snowflake and flake.

# The Problem

Many applications use unique identifiers to identify data. Common approaches applications use to generate unique identifiers in a relational database delegate identifier generation to the database, using an identity column or another similar auto-incrementing value.

While this approach can be adequate for a small application, it quickly becomes a bottleneck at scale. And it's a common problem, as evidenced by this post from Twitter Engineering in 2010.

> We needed something that could generate tens of thousands of ids per second in a highly available manner. This naturally led us to choose an uncoordinated approach.

A key use case, specifically related to MassTransit, is applications that use messages to communicate between services, which is common in a service-based architecture. In these applications, sequential identifiers generated by NewId serve a dual purpose. First and foremost, each one is a sequential unique identifier. Second, it is also a timestamp, as every NewId includes a UTC timestamp.

# Why does order matter now?

For a .NET developer, it is easy to reach for Guid.NewGuid() and run with it. And while that works, the identifiers created are not sequential. They're completely randomized. And when it comes to data, being able to sort it matters. Using a uniqueidentifier column as the clustered primary key in SQL Server was frowned upon for years because inserting random values causes massive index fragmentation. This led developers to use an int (or bigint, once they realized that four billion isn't a lot) primary key and create a separate unique index on the uniqueidentifier column (to use the AK, one might say, it wasn't a good day).

# The Solution

NewId was created to solve this problem. It generates sequential 128-bit identifiers that are collation compatible with SQL Server when used as a clustered primary key. NewId combines the host MAC address, an optional offset (in case multiple processes are on the same host), a timestamp, and an incrementing sequence number, so the generated identifiers are unique across a network of systems and can be safely inserted into a database without conflicts.

NewId is largely inspired by the Erlang library flake, which generates 128-bit, k-ordered ids (read: time-ordered lexically) using the machine's MAC address, a timestamp, and a per-thread sequence number. These identifiers are sequential and do not collide in a cluster of nodes running applications that use them as UUIDs.
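To make the idea concrete, here is a minimal sketch of a flake-style identifier built from a timestamp, a worker id, and a sequence number. The byte layout and the names `ComposeId`, `ReadTimestamp`, and `Hex` are assumptions made for illustration; this is not NewId's actual implementation or byte order.

```csharp
using System;

// Hypothetical layout for illustration only (not NewId's actual byte order):
//   bytes 0-7   UTC timestamp in ticks, big-endian
//   bytes 8-13  worker id, for example a MAC address
//   bytes 14-15 sequence number, big-endian
byte[] ComposeId(long utcTicks, byte[] workerId, ushort sequence)
{
    var id = new byte[16];
    for (int i = 0; i < 8; i++)
        id[i] = (byte)(utcTicks >> (56 - 8 * i));
    Array.Copy(workerId, 0, id, 8, 6);
    id[14] = (byte)(sequence >> 8);
    id[15] = (byte)sequence;
    return id;
}

// Recover the embedded timestamp from the first eight bytes.
DateTime ReadTimestamp(byte[] id)
{
    long ticks = 0;
    for (int i = 0; i < 8; i++)
        ticks = (ticks << 8) | id[i];
    return new DateTime(ticks, DateTimeKind.Utc);
}

// Uppercase hex with dashes, so ordinal string order matches byte order.
string Hex(byte[] id) => BitConverter.ToString(id);

var worker = new byte[] { 0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E };
long now = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc).Ticks;

byte[] first = ComposeId(now, worker, 0);      // same tick, sequence 0
byte[] second = ComposeId(now, worker, 1);     // same tick, sequence 1
byte[] third = ComposeId(now + 1, worker, 0);  // next tick, sequence restarts

Console.WriteLine(string.CompareOrdinal(Hex(first), Hex(second)) < 0);  // True
Console.WriteLine(string.CompareOrdinal(Hex(second), Hex(third)) < 0);  // True
Console.WriteLine(ReadTimestamp(third).Ticks == now + 1);               // True
```

Because the timestamp occupies the most significant bytes in this sketch, identifiers sort by creation time first and by sequence number within the same tick, while the worker id keeps hosts from colliding with each other.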

# Using NewId

NewIds can be generated using one of two methods. The first returns a NewId, whereas the second returns a Guid.

NewId newId = NewId.Next();

Guid guid = NewId.NextGuid();

NewId implements many of the same methods and constructors as Guid, and can be converted to and from a Guid.

// Formats as a string such as 11790000-CF25-B808-2365-08D36732603A
string identifier = NewId.Next().ToString("D").ToUpperInvariant();

// Convert from a string
NewId fromString = new NewId("11790000-cf25-b808-dc58-08d367322210");

// Convert from a byte array
var bytes = new byte[] { 16, 23, 54, 74, 21, 14, 75, 32, 44, 41, 31, 10, 11, 12, 86, 42 };
NewId fromBytes = new NewId(bytes);

# Configuration

Some features of NewId can be configured.

# Process Id

When multiple processes on the same host generate identifiers, it may be necessary to include the processId when generating identifiers. To enable the use of the processId, call the method below on startup.

NewId.SetProcessIdProvider(new CurrentProcessIdProvider());

This will replace two of the six network address bytes with the current processId.
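The effect can be pictured with a small sketch. The helper `ApplyProcessId` is hypothetical, and the choice of which two bytes get replaced (the last two here) and how the process id is truncated are assumptions, not a description of NewId's internals.

```csharp
using System;

// Hypothetical helper: overwrite the last two of six network address bytes
// with the low 16 bits of a process id. Which two bytes NewId actually
// replaces is an assumption in this sketch.
byte[] ApplyProcessId(byte[] networkAddress, int processId)
{
    if (networkAddress.Length != 6)
        throw new ArgumentException("Expected a 6-byte network address.", nameof(networkAddress));

    var workerId = (byte[])networkAddress.Clone();  // leave the input untouched
    workerId[4] = (byte)(processId >> 8);
    workerId[5] = (byte)processId;
    return workerId;
}

var mac = new byte[] { 0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E };

// Two processes on the same host now get different worker ids.
byte[] processA = ApplyProcessId(mac, 4242);
byte[] processB = ApplyProcessId(mac, 4243);

Console.WriteLine(BitConverter.ToString(processA));  // 00-1A-2B-3C-10-92
Console.WriteLine(BitConverter.ToString(processB));  // 00-1A-2B-3C-10-93
```

The trade-off in a scheme like this is that only four bytes of the network address remain, so uniqueness across hosts relies on those four bytes differing.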

WARNING

There are situations where using a predictable, sequential identifier is discouraged, namely cases where unpredictability is a desired feature. These include:

  • Generating passwords
  • Creating security tokens
  • Anything where someone should not be able to guess an identifier

NewId-generated identifiers may expose the MAC address of the machine that generated the identifier, along with the time the identifier was generated. While this isn't typically an issue in the modern world of networked computers with soft MAC addresses, some security-sensitive applications may need to be aware of the algorithm and any ramifications.

Oh, and don't do modulo 2 arithmetic on NewId-generated Guids with an expectation of random distribution.
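A rough sketch of why: in a structured identifier, some bytes are fixed per host, so bucketing on them is anything but random. The layout below is hypothetical (the last two bytes stand in for host-derived data) and `StructuredId` is an illustrative helper, not part of NewId.

```csharp
using System;

// Hypothetical structured id: the first four bytes hold a counter and the
// last two bytes are host-derived constants. Not NewId's actual layout.
byte[] StructuredId(int counter)
{
    var id = new byte[16];
    BitConverter.GetBytes(counter).CopyTo(id, 0);
    id[14] = 0x4D;
    id[15] = 0x5E;
    return id;
}

var structuredBuckets = new int[2];
var randomBuckets = new int[2];

for (int i = 0; i < 1000; i++)
{
    // Bucket by the parity of the last byte.
    structuredBuckets[StructuredId(i)[15] % 2]++;
    randomBuckets[Guid.NewGuid().ToByteArray()[15] % 2]++;
}

// Every structured id lands in bucket 0 because 0x5E is even.
Console.WriteLine($"structured: {structuredBuckets[0]} / {structuredBuckets[1]}");
// Random Guids split roughly evenly; exact counts vary per run.
Console.WriteLine($"random:     {randomBuckets[0]} / {randomBuckets[1]}");
```

If identifiers need to be spread across partitions, hashing the whole value first is a safer starting point than taking a remainder of raw bytes.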