Wednesday, May 13, 2020

Database Kill and Fill

When testing database code in development, you usually have a lot of tests that aim to prove that the various processes, such as ETL, reporting or the processing of an order, still work as they always did. If it is a changed process, you need to ensure that, from a known dataset, the process yields the result that the business expects. These tests happen as part of both development and deployment.

These datasets are generally created for each release and maintained to keep in step with the current development, being saved on the local filesystem as native BCP files. Sometimes, all or part of the live data is used, pseudonymised or masked where necessary. It is also handy to generate data, especially when you need large datasets to check that the database can scale appropriately.

Copying a database using a BCP dataset from a DOS script or from PowerShell is fairly quick and trivial, but what if the data is already in another copy of the database on the same instance? Imagine you have a database build that is a copy of an existing database on the same instance and you want to run a ‘destructive’ test on the data, and do it over and over again.

Deleting all the data in a database should, you’d have thought, be trivial. Truncation doesn’t work, because TRUNCATE TABLE refuses to touch a table that is referenced by a foreign key, even if the constraint is disabled. DELETE is safer, but you will then need to reset any identity columns. Deleting a large table in one chunk isn’t scalable, so you have to do it in smaller chunks. I like to disable triggers while I do this as well. The only disadvantage comes if you have triggers that are supposed to stay disabled, because the simple code just re-enables them all.

Killing data

Basically, if you are prepared to use the undocumented sp_MSforeachtable procedure, then this code is all reasonably simple. It will delete all the data from Adventureworks in about half a minute. It does Pubs in a second! Don’t use it without due care: this code is dangerous in untutored hands. You can try it out within a transaction and roll it back while you are testing.

---firstly delete all existing data
DISABLE TRIGGER ALL ON DATABASE;
--now disable all constraints
EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
--delete the data from each table in turn, doing tables with more than 10,000 rows in chunks
EXEC sp_MSforeachtable '
Print ''Deleting all the data from ?''
SET QUOTED_IDENTIFIER ON;
DECLARE @SoFarDeleted INT=1;
WHILE (@SoFarDeleted  > 0 and @@Error=0)
  BEGIN
   -- Delete a chunk of rows at a time
     DELETE TOP (10000) ?
   SET @SoFarDeleted  = @@ROWCOUNT;
END ';
--switch on all constraints
EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';
-- Reseed identity columns if the table has an identity column
EXEC sp_MSforeachtable 'IF ObjectPropertyEx(Object_Id(''?''),''TableHasIdentity'')=1 DBCC CHECKIDENT (''?'', RESEED, 0)';
-- and enable all triggers
ENABLE TRIGGER ALL ON DATABASE;
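To ‘try it out within a transaction’, as suggested above, wrap the whole thing like this. This is just a sketch: the simple DELETE stands in for the chunked version, and the rollback undoes the deletions and the disabled constraints alike.

```sql
-- A sketch of testing the kill safely: everything is rolled back at the end.
BEGIN TRANSACTION;
  EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
  EXEC sp_MSforeachtable 'DELETE FROM ?'; -- stands in for the chunked delete
  -- check the effect while still inside the transaction
  EXEC sp_MSforeachtable 'SELECT ''?'' AS TheTable, Count(*) AS TheRows FROM ?';
ROLLBACK TRANSACTION; -- undoes the deletions and the disabled constraints
```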


Filling data

Now to get that data back in. You have a copy of the database on the same server, so you use that. All you need to do, surely, is to run …

INSERT INTO tableA SELECT * FROM otherDatabase.tableA;

… for all the tables in the database? No. For a start, you cannot insert into a timestamp column, and you need to SET IDENTITY_INSERT TableA ON in order to insert explicit values into TableA, which has an identity column. Of course, if you try this with a table that hasn’t an identity column, you get an error. Inserting XML into an XML column results in an error unless you do an explicit conversion (because of the possibility of a different XML schema). You can’t insert into a computed column either, so those have to be ignored as well. I have no idea whether it is possible or desirable to insert into a hidden column, so I disallowed that too. Change it to taste.
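For a single table with an identity column, the dance looks like this (a sketch; the table and column names are hypothetical, and the explicit column list is required while IDENTITY_INSERT is ON):

```sql
SET IDENTITY_INSERT dbo.TableA ON;  -- only valid if TableA has an identity column
INSERT INTO dbo.TableA (TableA_id, TheName) -- explicit column list is required
  SELECT TableA_id, TheName FROM otherDatabase.dbo.TableA;
SET IDENTITY_INSERT dbo.TableA OFF;
```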

Out of the box, this code doesn’t work on older versions of SQL Server, because I use the lovely String_Agg() function (introduced in SQL Server 2017) to generate lists of columns. You can change this to the classic XML trick to get the same result if you are stuck on an old version. Anything older than SQL Server 2016 also lacks the is_hidden attribute in the sys.columns system view.
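The ‘classic XML trick’ referred to above replaces String_Agg() with FOR XML PATH and Stuff(). Here is a sketch that produces the same comma-separated column list (the table name is hypothetical):

```sql
-- A sketch of the FOR XML PATH equivalent of String_Agg() for older versions.
DECLARE @TheInsertColumns NVARCHAR(MAX);
SELECT @TheInsertColumns =
  Stuff( -- strip the leading ', '
    (SELECT ', ' + QuoteName(c.name)
       FROM sys.columns c
       WHERE c.object_id = Object_Id('dbo.TableA') -- hypothetical table
         AND c.is_computed = 0
       ORDER BY c.column_id
       FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),
    1, 2, '');
```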

Here, I have it set, for the sake of demonstration, to delete all the data from a database called AdventureworksCopy, and fill it again from Adventureworks2016. You’ll want to change that!

USE AdventureworksCopy;
GO
DECLARE @source sysname = 'Adventureworks2016';-- the source database

---firstly delete all existing data
DISABLE TRIGGER ALL ON DATABASE;
--now disable all constraints
EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
--delete the data from each table in turn, doing tables with more than 10,000 rows in chunks
EXEC sp_MSforeachtable '
Print ''Deleting all the data from ?''
SET QUOTED_IDENTIFIER ON;
DECLARE @SoFarDeleted INT=1;
WHILE (@SoFarDeleted  > 0 and @@Error=0)
  BEGIN
   -- Delete a chunk of rows at a time
     DELETE TOP (10000) ?
   SET @SoFarDeleted  = @@ROWCOUNT;
END ';
--switch on all constraints
EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';
-- Reseed identity columns if the table has an identity column
EXEC sp_MSforeachtable 'IF ObjectPropertyEx(Object_Id(''?''),''TableHasIdentity'')=1 DBCC CHECKIDENT (''?'', RESEED, 0)';
-- and enable all triggers
ENABLE TRIGGER ALL ON DATABASE;
--create a table variable giving all the table names so we can do each one
DECLARE @tables TABLE
  (
  TableName_id INT IDENTITY,
  TheObject_ID INT NOT NULL,
  TheName sysname NOT NULL,
  TheSchema sysname NOT NULL
  );
--fill the table with the names of the tables we need to fill
INSERT INTO @tables (TheObject_ID, TheName, TheSchema)
  SELECT object_id, name, Object_Schema_Name(object_id) FROM sys.tables;
--and to prevent a generated script getting too untidy ---
DECLARE @CRLF CHAR(2) = '
';
--      We disable all constraints now   
DECLARE @Script NVARCHAR(MAX) =
  'EXEC sp_msforeachtable "ALTER TABLE ? NOCHECK CONSTRAINT ALL"' + @CRLF;
DECLARE @ii INT, @iiMax INT; ---our iteration counter
DECLARE @TheTableName sysname, @TheSelectColumns NVARCHAR(MAX),
  @TheInsertColumns NVARCHAR(MAX), @TheTable_id INT, @HasIdentityCol INT;
--set our iteration counters
SELECT @ii = Min(TableName_id), @iiMax = Max(TableName_id) FROM @tables;
--we need to have a table with all the source tables in it just in case we
--have an extra table in the destination with no equivalent in the source
DECLARE @sourceTables TABLE (TableName sysname);
DECLARE @Expression NVARCHAR(MAX) = --the expression we execute
  N'USE ' + @source
  + ' SELECT QuoteName(Object_Schema_Name(t.object_id)) + ''.'' + QuoteName(t.name)  FROM sys.tables t';
--and get the list of the tables in the other database
INSERT INTO @sourceTables EXECUTE (@Expression);
--now we can get cracking and generate a script that fills each table in turn
WHILE @ii <= @iiMax
  BEGIN
    SELECT @TheTable_id = TheObject_ID,
      @TheTableName = QuoteName(TheSchema) + '.' + QuoteName(TheName)
      FROM @tables
      WHERE TableName_id = @ii; --get the table name
    SELECT @ii = @ii + 1; --don't fill it if it isn't in the source
    IF @TheTableName NOT IN (SELECT TableName FROM @sourceTables) CONTINUE;
    SELECT @TheInsertColumns = --get the list of columns in the insert expression
      String_Agg(QuoteName(columns.name), ', ') WITHIN GROUP(
      ORDER BY column_id ASC),
      @TheSelectColumns =--get the list of columns in the select expression
        String_Agg( CASE WHEN types.name = 'xml'  --convert XML types 
                                        THEN 'Convert(XML, ' + QuoteName(columns.name) + ')' 
                                        ELSE QuoteName(columns.name) END,
                    ', '
                  ) WITHIN GROUP(ORDER BY column_id ASC)
      FROM sys.columns
        INNER JOIN sys.types
          ON types.user_type_id = COLUMNS.user_type_id
      WHERE object_id = @TheTable_id
        AND is_hidden = 0 --don't use hidden columns
        AND is_computed = 0--don't use computed columns either
        AND types.name NOT LIKE 'timestamp';--don't use timestamp columns
    SELECT @HasIdentityCol = Convert(INT, ObjectPropertyEx(@TheTable_id, 'TableHasIdentity'));
    SELECT @Script =
      @Script
      + CASE WHEN @HasIdentityCol > 0 THEN /*IF there is an identity column we 
          need to set identity insert on for the table */
               'SET IDENTITY_INSERT ' + @TheTableName + ' ON  ' ELSE '' END
      + @CRLF + N'Print ''Adding data to table ' + @TheTableName + N' '''
      + @CRLF + N'INSERT INTO ' + @TheTableName + N'(' + @TheInsertColumns
      + N') ' + @CRLF + N'SELECT ' + @TheSelectColumns + N' FROM ' + @source
      + N'.' + @TheTableName + @CRLF
      + CASE WHEN @HasIdentityCol > 0 THEN /*IF there is an identity column we 
          switched identity insert on so we need to switch it off */
               'SET IDENTITY_INSERT ' + @TheTableName + ' OFF' + @CRLF ELSE '' END
      + @CRLF;
  END;
SELECT @Script =
  @Script
  + '
  EXEC sp_msforeachtable "ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL"';

EXECUTE (@Script);
GO

Conclusions

Am I recommending Kill and Fill? It depends on what sort of testing you are doing. A BCP process is always nice, quick and clean, though it still needs the ‘kill’ part of the process, and it is best done from a DOS shell script or a PowerShell script. This Kill and Fill process is easy to set up, though I don’t think it is any quicker than a bulk process. I just thought I should make it available in case anyone else needed a different way of filling a development database with data.

The post Database Kill and Fill appeared first on Simple Talk.


