Saturday, June 1, 2019

NOT NULL Persisted Computed Columns (And What They Allow)

Ok, so I have been, and will mostly stay, in a bit of blog hibernation. I am super busy trying to finish up a rather large number of tasks (a few chapters in a book, 2 more SQL Saturdays, and as much theme park time as I can get in before the end of June, when I am having my knee replaced…) But as I was working today I discovered something that I wanted to throw together a blog on because I doubt most of my readers will know about this feature, and I wanted to try it out to the fullest. I had not seen this before in the course of writing database design/implementation books, coding many databases, etc for 20 years (ok, more than 20 years, since my first book on SQL Server Database Design was 19 years ago!)

This started out to be a quick, simple article, but it grew long to show all the interesting if possibly silly techniques that you can do with NOT NULL persisted computed columns.

I have for a long time known that persisted columns can be used in UNIQUE constraints and indexes. They never needed to be declare as persisted, because the engine would persist them for you when the index was added. I regularly use PERSISTED computed columns because they are important for performance, especially when used in reporting. What I had no idea of was that they could be used with other types of constraints. I am not at all suggesting at this point that you would want to employ every one of the techniques presented, or even that they are all good ideas (in fact, one of these ideas I will present is kind of horrifying from a proper design standpoint.) The way I see it, things you can do with code are like tools you find in that big toolbox you inherited. You play around with them, get a feel for how they work, but mostly have no clue what they are good for until you run into that need.

It is just good to know what “can” be done in code, and this is the point of my post today. Here are some tools you probably didn’t realize you had in your toolbox, and one day you may find a good reason to use them. (And ideally comment on my post with those ideas for the rest of us.)

Note that the examples in this article (other than simple computed columns) only work with persisted computed columns, if you use a non-persisted type, you will receive the following error:

Msg 8183, Level 16, State 1, Line 7
Only UNIQUE or PRIMARY KEY constraints can be created on computed columns, 
while CHECK, FOREIGN KEY, and NOT NULL constraints require that computed 
columns be persisted.

In the following block of code, I will create a table to use for the first set of examples.

USE tempdb;
GO 
CREATE TABLE MainTable
(
        MainTableId int NOT NULL,
        Value  nvarchar(10) NULL
);

Now say we want to make a computed column on the Value column, that upper cases the value. We might add:

ALTER TABLE dbo.MainTable 
   ADD ValueUpper AS (UPPER(Value)) PERSISTED;

If you query the catalog, you can see that the column allows NULL values, which seems obvious since the source column allows NULL values.

SELECT COLUMNS.COLUMN_NAME, COLUMNS.IS_NULLABLE
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  COLUMNS.TABLE_SCHEMA = 'dbo'
  AND  COLUMNS.TABLE_NAME = 'MainTable'

This returns:

COLUMN_NAME        IS_NULLABLE
------------------ -----------
MainTableId        NO
Value              YES
ValueUpper         YES

The source column allows NULL values, so the computed does too. Let’s add another ValueUpperFormatted, that is definitely NOT NULL because we will base it on a COALESCEd value.

ALTER TABLE dbo.MainTable 
   ADD ValueUpperFormatted AS (UPPER(COALESCE(Value,''))) PERSISTED;

Even though this clearly could not allow NULL values, query the metadata, and you will see it still allows NULL values.

COLUMN_NAME          IS_NULLABLE
-------------------- -----------
MainTableId          NO
Value                YES
ValueUpper           YES
ValueUpperFormatted  YES

We can fix this by declaring that the column is NOT NULL in the creation of the computed column (this was the piece of information I did not realize before today, and I learned it because I noticed it in a script that RedGate’s SQL Compare created.):

ALTER TABLE dbo.MainTable DROP COLUMN ValueUpperFormatted;
ALTER TABLE dbo.MainTable 
    ADD ValueUpperFormatted 
         AS (UPPER(COALESCE(Value,''))) PERSISTED NOT NULL;

Check the metadata again, and you will see that ValueUpperFormatted now will not allow NULL values. 

INSERT INTO dbo.MainTable(MainTableId, Value)
VALUES(1, NULL);

SELECT *
FROM dbo.MainTable;

Which returns:

MainTableId Value      ValueUpper ValueUpperFormatted
----------- ---------- ---------- -----------------
1           NULL       NULL

Let’s clear out the data in the table:

DELETE FROM dbo.MainTable;

And now, what if we define another computed column that clearly could be have a NULL result, but define it as NOT NULL;

ALTER TABLE dbo.MainTable ADD ValueUpperNotNull AS (UPPER(Value)) PERSISTED NOT NULL;

This works, and the metadata indeed says the column does not allow NULL values, so if you try to add data to the table that has a NULL value:

INSERT INTO dbo.MainTable(MainTableId, Value)
VALUES(1, NULL);

You get what would be a bit of a head-scratcher (I am personally glad this is not how I learned of this feature, or it would have driven me a little nuts):

Msg 515, Level 16, State 2, Line 87
Cannot insert the value NULL into column 'ValueUpperNotNull',
table 'tempdb.dbo.MainTable'; column does not allow nulls. 
INSERT fails.

On the other hand, making a computed column not allow NULL values is a fantastic tool for the database designer to know that the expression they have crafted does not match that they expected it to never output a NULL value (and in my opinion, it is better to catch these things BEFORE 2.5 million rows have been inserted that don’t meet the requirements you have set up.)

Next, let’s add a check constraint our computed column. For this example, we are just going to make sure that the value in the table is a palindrome (because this is something that every data architect has come across at least one in their life, right?). So Value = REVERSE(Value);

ALTER TABLE dbo.MainTable
  ADD CONSTRAINT CHK_MainTable_ValueIsAPalindrome 
        CHECK (MainTable.ValueUpper = REVERSE(Value) 
               AND LEN(Value) > 1);

We don’t need to check for NULL values, if the Boolean expression is TRUE or NULL, the operation will succeed.

INSERT INTO dbo.MainTable(MainTableId, Value)
VALUES(1,'abcd');

This throws the following error:

Msg 547, Level 16, State 0, Line 110
The INSERT statement conflicted with the CHECK constraint
"CHK_MainTable_ValueIsAPalindrome". The conflict occurred in 
database "tempdb", table "dbo.MainTable".

But the following code:

INSERT INTO dbo.MainTable(MainTableId, Value)
VALUES(1,'dcbabcd');

Succeeds. Using the UPPER value, lets this work with case sensitive data as well, so that might be useful. In reality, if you put anything semi-complex in the computed column, based on multiple other columns, it might be good to make sure that the output is in some allowed range. For example, say you had a complex CASE expression, you could have NULL as the default, and make the column fail on a NULL value, but that could be more confusing than using a CHECK constraint with at least a descriptive name to point out the invalid value.

—————-

Speaking of confusing, for fun, there are two other types of things we can do with a persisted computed column that are pretty interesting, if perhaps at least somewhat impractical.This is using one as a PRIMARY KEY or a FOREIGN KEY.

In the following table, we will let the user enter an integer value for the base value of the key, then use a persisted computed column to format the value and use this as the PRIMARY KEY value. The datatype of a computed column can be interesting based on whatever the source data is, so we need to make the size exactly defined for when we associate the foreign key, hence the cast to nchar(6)):

CREATE TABLE dbo.Type1 --generic, domain style data
(
        Type1Base int NOT NULL,
        Type1Id AS CAST((N'1' +       
                      RIGHT(CONCAT('00000',Type1Base),5)) 
                      AS nchar(6)) PERSISTED NOT NULL 
                        CONSTRAINT PKType1 PRIMARY KEY
);
--Then insert a couple of rows
INSERT INTO dbo.Type1(Type1Base) VALUES (1),(2)
GO
-- and retrieve them:
SELECT *
FROM   dbo.Type1;

You can see that the data is entered and is formatted:

Type1Base   Type1Id
----------- -------
1           100001
2           100002

Finally, let’s do a really sort of scary example of a way to implement a FOREIGN KEY constraint on a column that contains values that could be for multiple tables. WARNING: This example feels completely useless as a “real world” technique to me. Using the toolbox analogy, this is akin to that proprietary wrench you got in a box of cereal when you were 10 and left it in the toolbox. Probably useless, but a tool, nonetheless.

We will create another domain table for another type:

CREATE TABLE dbo.Type2 --generic, domain style data
(
        Type2Base int NOT NULL,
        Type2Id AS CAST((N'2' +       
                      RIGHT(CONCAT('00000',Type2Base),5)) 
                      AS nchar(6)) PERSISTED NOT NULL 
                        CONSTRAINT PKType2 PRIMARY KEY
);
--Then insert a couple of rows
INSERT INTO dbo.Type2(Type2Base) VALUES (1),(2)

So now, someone has mixed Type1 and Type2 data in a column, and you cannot change that about your data, but you want to validate it (and doing this in a constraint is usually better, plus, in the final output you will have the “proper” versions of the columns in your table as well.

CREATE TABLE dbo.TypedData
(
        TypedDataId int 
            CONSTRAINT PKTypeData PRIMARY KEY,
        TypeId nchar(6) NOT NULL
);
INSERT INTO dbo.TypedData(TypedDataId, TypeId)
VALUES(1,N'100001'),(2, N'100002'),
      (3, N'200001'),(4, N'200002')
GO
SELECT *
FROM  dbo.TypedData

This returns the following:

TypedDateId TypeId
----------- ------
1           000001
2           000002
3           100001
4           100002

Since we know that the algorithm to tell the source values apart is based on a prefix (it might be another column, or maybe a more clever parsing system than 1 for Type1, also). We can add:

ALTER TABLE dbo.TypedData
        ADD Type1Id AS 
        CASE WHEN TypeId LIKE '1%' THEN TypedData.TypeId END   
                                                   PERSISTED;
ALTER TABLE dbo.TypedData
        ADD Type2Id AS 
        CASE WHEN TypeId LIKE '2%' THEN TypedData.TypeId END  
                                                   PERSISTED;

Again, note: this is not a good design. Seriously. If you are making another human being do this because of how you implemented a databases, it is not a good thing.  Next, add FOREIGN KEY constraints to the two columns:

ALTER TABLE dbo.TypedData
        ADD CONSTRAINT TypeData$References$Type1 
        FOREIGN KEY (Type1Id) REFERENCES dbo.Type1(Type1Id);
ALTER TABLE dbo.TypedData
        ADD CONSTRAINT TypeData$References$Type2 
        FOREIGN KEY (Type2Id) REFERENCES dbo.Type2(Type2Id);

Now, through a bit of a roundabout manner, we are able to validate that the data in the TypeId column is either that of a row in Type1 or Type2 (additional work would be required to make sure that the value in TypeId follows the expected formats, for example, adding a check constraint that makes sure either Type1Id or Type2Id are not both NULL.

ALTER TABLE dbo.TypedData
        ADD CONSTRAINT CHKTypedData_ValidateTypeId 
           CHECK (Type1Id IS NOT NULL OR Type2Id IS NOT NULL);

Now you can test it out by trying to put in invalid data, such as a value prefixed with a 3.

INSERT INTO dbo.TypedData(TypedDataId, TypeId)
VALUES(5,N'300000')

This gives you an error in the check constraint:

Msg 547, Level 16, State 0, Line 198
The INSERT statement conflicted with the CHECK constraint
"CHKTypedData_ValidateTypeId". The conflict occurred in 
database "tempdb", table "dbo.TypedData".

If the data is formatted correctly with a prefix of 1:

INSERT INTO dbo.TypedData(TypedDataId, TypeId)
VALUES(5,N'100006');

This gives you an error with the foreign key constraint. Prefix the invalid value with a 2, and you will see the error with the other FOREIGN KEY constraint. Put in a proper value:

INSERT INTO dbo.TypedData(TypedDataId, TypeId)
VALUES(5,N'200001');

Success. (Well, it works, you still are stuck with some really weird code to deal with!)

 

The post NOT NULL Persisted Computed Columns (And What They Allow) appeared first on Simple Talk.



from Simple Talk http://bit.ly/2W81BEU
via

No comments:

Post a Comment