Ensure Uniqueness of Nullable Columns

Introduction

A couple of weeks ago I encountered a tweet in which one of the questions expected from a candidate in a job interview was the reason to write this post, How do you ensure that duplicate keys are prevented on a column that allows NULL values?

That reminds me of the time that I was focused on databases and I had to handle such a problem for one of our databases, here we are going to talk about two approaches to address this issue in SQL Server.

Problem Space

Consider you have a column, such as Employee with a column, SSN that can be null at some point and will be filled when any SSN number issued for that employee. The script for such a table looks like the next:

CREATE  TABLE   dbo.Employee(
      Id          BIGINT          NOT NULL    IDENTITY
          CONSTRAINT PK_Employee_Id Primary KEY
  ,   FirstName   nVarChar(50)    NOT NULL
  ,   LastName    nVarChar(100)   NOT NULL
  ,   SSN         Char(10)            NULL
          CONSTRAINT UQ_Employee_SSN UNIQUE
)
ON [PRIMARY]
GO

In the preceding code snippet if I add a unique constraint on the SSN column, when I want to insert more than one employee with NULL values for the SSN column we will face an error indicating a UNIQUE KEY violation. Don’t believe me, try it for yourself:

INSERT INTO dbo.Employee( FirstName , LastName , SSN )
    VALUES( 'Jane' , 'Doe' , NULL )

INSERT INTO dbo.Employee( FirstName , LastName , SSN )
    VALUES( 'John' , 'Doe' , NULL )
Msg 2627, Level 14, State 1, Line 1
Violation of UNIQUE KEY constraint 'UQ_Employee_SSN'. Cannot insert duplicate key in object 'dbo.Employee'. The duplicate key value is (<NULL>).

Now that we saw the issue in action, let's touch on the solution. There are some approaches to tackle this issue, of which I explain two of them.

Solution Space

Computed Columns

Computed columns are a special kind of columns in SQL Server that gives the ability to store values based on a formula, in this approach we could define a computed column which takes the unique column, in this example SSN, and another combination of unique columns, I’ll take the primary key. So let's alter the previously defined table; first, we remove the Unique constraint on the SSN column, and then define the computed column and then adding another unique constraint on that:


ALTER TABLE dbo.Employee
    DROP    Constraint UQ_Employee_SSN
GO


ALTER TABLE dbo.Employee
  ADD SSN_CMPTD AS
    CASE
      WHEN SSN IS NOT NULL
        THEN SSN
      ELSE
        CAST( Id AS Char(10) )
    END
GO

ALTER TABLE dbo.Employee
  ADD CONSTRAINT  UQ_Employee_SSN_CMPTD UNIQUE(SSN_CMPTD)
GO

Now try the above-mentioned INSERT commands, this time both will pass, and the next two inserts will also pass as they don’t violate any constraint.

Now if you try to insert a real duplicate key, you will face a unique constraint violation error, but this time because of a real duplicate key, check the value at the end of the error and the column name which the violation occurred on.

So far so good, right? Let’s try the other approach, I preferred when designing such scenarios.

Filtered Indexes

Filtered Index is a feature described on the SQL Server documentation site as an optimized nonclustered index especially suited to cover queries that select from a well-defined subset of data. Okay, let’s remove the computed column and its unique constraint defined earlier. (Alternatively, you could re-create the table without the unique constraint on the SSN column).

ALTER TABLE dbo.Employee
    DROP CONSTRAINT UQ_Employee_SSN_CMPTD
GO

ALTER TABLE dbo.Employee
    DROP   COLUMN SSN_CMPTD
GO


CREATE UNIQUE NONCLUSTERED INDEX Idx_Employee_SSN
    ON  dbo.Employee( SSN )
WHERE   SSN IS NOT NULL
GO

Awesome! You could try the previous insert commands to check against this new unique index.

Conclusion

The latter approach have some advantages over the former, of which the most important one is that we could use this unique index as a Covering Index for queries on the Employee table to prevent a Key Lookup, that could be a boost in performance of queries relating to that table. Moreover, the former approach occupies more space since it has to store values of the computed column somewhere on the disk; another obstacle for the former approach would be the complexity of matching (another) column(s) to satisfy the uniqueness of the computed columns in the absence of an SSN.

I hope you found this post fruitful, have a good evening and enjoy coding!

Buy Me a Coffee at ko-fi.com

Explore Shadow Properties in EF Core

We will take a look at another popular feature, Shadow Properties. When designing applications we tend to keep our code clean and simple, however there are times that you need to add properties other than what is required in your main business use cases, CreatedOn and LastUpdatedOn are such well-known properties.

Custom Configuration Providers in ASP.NET Core

A colleague of mine had a requirement in which he was interested to put the configuration settings of an ASP.NET Core application inside a SQL Server table and read them at startup of the application with a mechanism similar to what we already have for JSON files; also he wanted to have the ability that these variable being overridden or override other settings if applicable. In this article, I am going to describe how that could be achievable and how to implement such feature.

Ditch Connection Strings, Embrace Secure Azure Access

As a developer if you are using Azure for cloud development, one of the first things to do is to authenticate your machine against Azure environment so that Azure knows commands issued against it are from a trusted machine! This is important to also eliminate the need of using connection strings as far as possible! Let's see how easy it is and how it could be done!

An error has occurred. This application may no longer respond until reloaded. Reload x