Saturday, December 17, 2011

How To Use Linked Server

The Linked Servers option allows you to connect to another instance of SQL Server running on a different machine, perhaps in a different city or country. This is useful when you need to perform distributed queries (queries against a remote database). Setting up a linked server is straightforward in SSMS; all you need are the details of the remote server and the database you want to query.
  1. Navigate to Server Objects > Linked Servers
  2. Right click on Linked Servers and select New Linked Server...
  3. Complete the details for the linked server. In this example, Data source refers to the name of the SQL Server machine ("Barts_database_server") and Catalog refers to the name of the database ("Barts_database"). You can also configure options in the other two tabs depending on your requirements.


Distributed Queries

Once you have configured your linked server, you will be able to run queries against it. When you run a query against a linked server, it is referred to as a distributed query.
When you execute a distributed query against a linked server, you must include a fully qualified, four-part table name for each data source to query. This four-part name should be in the form linked_server_name.catalog.schema.object_name.
Here's an example:
Barts_database_server.Barts_database.Person.Enemy
This example is based on the linked server example above. It assumes that the remote database has a schema called "Person" and a table called "Enemy".
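The linked server and the distributed query above can also be expressed in T-SQL. This is a minimal sketch using the names from the example; security settings and provider options are environment-specific and not shown:

```sql
-- Register the remote server as a linked server
-- (names follow the example above; login mapping not shown).
EXEC sp_addlinkedserver
    @server     = N'Barts_database_server',
    @srvproduct = N'SQL Server';

-- A distributed query using the four-part name:
-- linked_server_name.catalog.schema.object_name
SELECT *
FROM Barts_database_server.Barts_database.Person.Enemy;
```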

Difference Between SQL Server 2005 and SQL Server 2008

The following are some of the features that SQL Server 2008 offers over SQL Server 2005:
  1. SQL Server 2008 can encrypt the entire database using Transparent Data Encryption (TDE), whereas SQL Server 2005 offers only cell-level encryption.
  2. SQL Server 2008 provides backup encryption, executed at backup time to prevent tampering.
  3. External key management: keys can be stored separately from the data.
  4. With SQL Server Audit, SQL Server 2008 introduces an important new feature that provides a true auditing solution for enterprise customers. While SQL Trace can be used to satisfy many auditing needs, SQL Server Audit offers a number of attractive advantages that may help DBAs more easily achieve goals such as meeting regulatory compliance requirements. These include centralized storage of audit logs, integration with System Center, and noticeably better performance.
  5. Data compression: fact table size reduction and improved performance.
  6. A collection of performance monitoring tools. With the Performance Data Collector in SQL Server 2008, you can now store performance data from a number of SQL Servers in one central location. This data is collected by a collection set on each server and stored in a shareable management data warehouse (MDW). Reports can be generated from this data using the built-in reports or by building your own with Reporting Services.
  7. Installation improvements: disk images and service pack uninstall options.
  8. Dynamic development: new ADO and Visual Studio options, as well as .NET 3.5.
  9. Entity Data Services: a Line of Business (LOB) framework and Entity Query Language (eSQL).
  10. LINQ: a development query language for accessing multiple types of data, such as SQL and XML.
  11. Data synchronization: development of frequently disconnected applications.
  12. Large UDTs (user-defined types): previously, UDTs were restricted to a maximum size of 8 kilobytes; in SQL Server 2008 this restriction has been removed for UDTs that have a format of UserDefined.
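Point 1 (TDE) can be sketched in T-SQL. This is only an outline; the master key password, certificate name, and database name below are placeholders:

```sql
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'StrongPassword!1';
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';

USE Barts_database;   -- placeholder database name
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_128
    ENCRYPTION BY SERVER CERTIFICATE TDECert;

ALTER DATABASE Barts_database SET ENCRYPTION ON;
```

Back up the certificate immediately; without it, an encrypted database or its backups cannot be restored on another server.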

Tuesday, December 13, 2011

Recovery Models In SQL Server

The first thing to understand before backing up a SQL Server database is the recovery model.
The recovery model determines what data is kept in the transaction log file and for how long.
There are three types of recovery models:
1) Full
2) Simple
3) Bulk-Logged
Every database has exactly one recovery model, but each database can use a different one: you can set one database to the Simple recovery model and another to Full, and so on.
The choice depends on the type of database and its backup needs. The only exception is the TempDB database, which must use the "Simple" recovery model.
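As a quick sketch (the database name is a placeholder), you can check and change a database's recovery model in T-SQL:

```sql
-- See the current recovery model of every database
SELECT name, recovery_model_desc
FROM sys.databases;

-- Switch a database to another recovery model
ALTER DATABASE Barts_database SET RECOVERY SIMPLE;  -- or FULL, BULK_LOGGED
```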

Friday, November 25, 2011

SQL SERVER Interview Questions

  1. What are the different types of joins and what does each do?
  2. What are the four main query statements?
  3. What is a sub-query? When would you use one?
  4. What is a NOLOCK?
  5. What are three SQL keywords used to change or set someone’s permissions?
  6. What is the difference between HAVING clause and the WHERE clause?
  7. What is referential integrity? What are the advantages of it?
  8. What is database normalization?
  9. Which command using Query Analyzer will give you the version of SQL server and operating system?
  10. Using query analyzer, name 3 ways you can get an accurate count of the number of records in a table?
  11. What is the purpose of using COLLATE in a query?
  12. What is a trigger?
  13. What is one of the first things you would do to increase performance of a query? For example, a boss tells you that “a query that ran yesterday took 30 seconds, but today it takes 6 minutes”
  14. What is an execution plan? When would you use it? How would you view the execution plan?
  15. What is the STUFF function and how does it differ from the REPLACE function?
  16. What does it mean to have quoted_identifier on? What are the implications of having it off?
  17. What are the different types of replication? How are they used?
  18. What is the difference between a local and a global variable?
  19. What is the difference between a Local temporary table and a Global temporary table? How is each one used?
  20. What are cursors? Name four types of cursors and when each one would be applied?
  21. What is the purpose of UPDATE STATISTICS?
  22. How do you use DBCC statements to monitor various aspects of a SQL server installation?
  23. How do you load large data to the SQL server database?
  24. How do you check the performance of a query and how do you optimize it?
  25. How are SQL Server 2000 and XML linked? Can XML be used to access data?
  26. What is SQL server agent?
  27. What is referential integrity and how is it achieved?
  28. What is indexing?
  29. What is normalization and what are the different forms of normalizations?
  30. Difference between server.transfer and server.execute method?
  31. What is de-normalization and when do you do it?
  32. What is better - 2nd Normal form or 3rd normal form? Why?
  33. Can we rewrite subqueries into simple select statements or with joins? Example?
  34. What is a function? Give some example?
  35. What is a stored procedure?
  36. Difference between Function and Procedure-in general?
  37. Difference between Function and Stored Procedure?
  38. Can a stored procedure call another stored procedure. If yes what level and can it be controlled?
  39. Can a stored procedure call itself (recursion)? If yes, to what level, and can it be controlled?
  40. How do you find the number of rows in a table?
  41. Difference between Cluster and Non-cluster index?
  42. What is a table called if it has neither a clustered nor a non-clustered index?
  43. Explain DBMS, RDBMS?
  44. Explain basic SQL queries with SELECT from where Order By, Group By-Having?
  45. Explain the basic concepts of SQL server architecture?
  46. Explain a couple of features of SQL Server (scalability, availability, integration with the internet, etc.)?
  48. Explain fundamentals of Data ware housing & OLAP?
  49. Explain the new features of SQL server 2000?
  50. How do we upgrade from SQL Server 6.5 to 7.0 and 7.0 to 2000?
  51. What is data integrity? Explain constraints?
  52. Explain some DBCC commands?
  53. Explain sp_configure commands, set commands?
  54. Explain what are db_options used for?
  55. What is the basic functions for master, msdb, tempdb databases?
  56. What is a job?
  57. What are tasks?
  58. What are primary keys and foreign keys?
  59. How would you update the rows which are divisible by 10, given a set of numbers in a column?
  60. If a stored procedure is taking a table data type, how it looks?
  61. How are m-m (many-to-many) relationships implemented?
  62. How do you know which index a table is using?
  63. How would you test a stored procedure that takes two parameters, first name and last name, and returns the full name?
  64. How do you find the error? How can you know the number of rows affected by the last SQL statement?
  65. How can you get @@error and @@rowcount at the same time?
  66. What are sub-queries? Give example? In which case sub-queries are not feasible?
  67. What are the type of joins? When do we use Outer and Self joins?
  68. Which virtual table does a trigger use?
  69. How do you measure the performance of a stored procedure?
  70. Questions regarding RAISERROR?
  71. Questions on identity?
  72. If there is a failure during the update of certain rows, what will be the state?

Wednesday, November 16, 2011

SSRS INTERVIEW QUESTIONS part-2


What is the report rendering?
Report rendering is the process of delivering a report from the server to the application in a chosen output format. Reports can be rendered in different formats, such as
Excel, PDF, CSV, XML, TIFF, HTML Web Archive, and DOC.
What is the report server?
The report server is the server to which we deploy reports; in other words, it is a holding place for reports. Applications access the report server to view the reports.
All reports reside on the report server, and all other activities pertaining to SSRS are done there. It acts as the workstation for the reporting tool.
Can we use the page total in report body?
The built-in fields [&PageNumber] and [&PageTotal] cannot be used in the report body, as they are applicable only in the report header or report footer.
These built-in fields can be added to the footer or header for better presentation of the report.

What are the different types of report?
Using BIDS, reports can be created in two ways, i.e. there are two ways to retrieve the data from a SQL Server data source:
  1. TSQL Reports: TSQL reports are made up of a plain SQL query. The data source in this case is the SQL Server database engine.
  2. MDX Reports: MDX reports are created against cubes. Here the data source is Analysis Services cubes.
What is report subscription? 
A report subscription schedules a report for a particular time and delivers it to particular users, for example by email. A subscription is a standing request to deliver a report at a specific time or in response to an event, and then to have that report presented in a way that you define. Subscriptions provide an alternative to running a report on demand. On-demand reporting requires that you actively select the report each time you want to view it. In contrast, subscriptions can be used to schedule and then automate the delivery of a report.
What is the RDL file?
      A report definition contains data retrieval and layout information for a report. Report Definition Language (RDL) is an XML representation of this report definition.
What are the Matrix reports and what are their uses?
Matrix reports are reports that generate data dynamically, i.e. the table structure is not static and can change at run time.

Can you use a stored procedure to provide data to an SSRS report?
Yes, you can use a stored procedure to provide data to an SSRS report by configuring the dataset to use a stored procedure command type. However, your stored procedure should return only a single result set. If it returns multiple result sets, only the first one is used for the report dataset.
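A minimal sketch of such a procedure (the table, columns, and parameter here are hypothetical); note that it returns exactly one result set, as SSRS expects:

```sql
CREATE PROCEDURE dbo.rpt_SalesByRegion
    @Region NVARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;   -- suppress extra "rows affected" messages
    SELECT OrderID, OrderDate, Amount
    FROM dbo.Sales
    WHERE Region = @Region;
END;
```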

What is the main purpose of a report parameter?
The main purpose of a report parameter is to add interactivity to your reports, letting users change the report's behavior based on the options they select. Report parameters also allow you to show different dimensions in a single report depending on the selected options.

What is the main purpose of a query parameter?
The main purpose of a query parameter is to filter data in the data source.
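For example, a dataset query can embed a query parameter directly in its WHERE clause so the filtering happens at the data source rather than in the report (the table, columns, and @Year parameter are hypothetical):

```sql
SELECT OrderID, OrderDate, Amount
FROM dbo.Sales
WHERE YEAR(OrderDate) = @Year;
```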

Saturday, November 5, 2011

DataFlow Task Facts

Basics of Data Flow Task:

* Streaming
* Unlike the control flow, multiple components can process data at the same time
* Smallest unit of the data flow is a component
* Data flows move data, but are also tasks in the control flow; as such, their success or failure affects how your control flow operates
* Data is moved and manipulated through transformations
* Data is passed between each component in the data flow
* Data flow is made up of source(s), transformations, and destinations.


Some Facts about Data Flow:

* The data flow task in SSIS (SQL Server Integration Services) sends data in series of buffers.
* This is bounded by DefaultBufferMaxRows and DefaultBufferMaxSize, two Data Flow properties.
* They have default values of 10,000 and 10,485,760 (10 MB), respectively. That means, one buffer will contain either 10,000 rows or 10 MB of data, whichever is less.
* You can adjust these two properties based on your scenario. Setting them to a higher value can boost performance, but only as long as all buffers fit in memory.
* The Data Flow task encapsulates the data flow engine that moves data between sources and destinations, and lets the user transform, clean, and modify data as it is moved.
* Addition of a Data Flow task to a package control flow makes it possible for the package to extract, transform, and load data.
* A data flow consists of at least one data flow component, but it is typically a set of connected data flow components: sources that extract data; transformations that modify, route, or summarize data; and destinations that load data. Components are connected in the data flow by paths.
* Each path specifies the two components that are the start and the end of the path. For more information, see Data Flow Elements.
* At run time, the Data Flow task builds an execution plan from the data flow, and the data flow engine executes the plan.
* We can create a Data Flow task that has no data flow, but the task executes only if it includes at least one data flow.

Wednesday, November 2, 2011

SSIS Performance Tuning Tip 2

Set the IsSorted property to true:

  •         If you know that the source data is already sorted, set IsSorted = true on the source adapter's output.
  •         Open the Advanced Editor by right-clicking on the source adapter.
  •         Go to Input and Output Properties; there you will find the IsSorted option. Set it to true.
  •         Setting this value does not perform a sort operation; it only indicates that the data is already sorted.
  •         Following this tip can save time in real scenarios; if the database is very large, it helps a lot.
  • You can see SSIS performance tuning tip 1.
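Remember that IsSorted only asserts the order; the source query itself must deliver sorted rows. A sketch with hypothetical names, where the ORDER BY column matches the column marked SortKeyPosition = 1 in the advanced editor:

```sql
SELECT CustomerID, OrderDate, Amount
FROM dbo.Orders
ORDER BY CustomerID;   -- must match the SortKeyPosition settings
```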

Monday, October 24, 2011

Display Page Numbers In SSRS Reports

Display the page numbers in reports of SSRS:

  • Put a text box at the bottom of the report.
  • Drag the page number built-in field from the Report Data pane into it.
  • After that, drag and drop the total pages built-in field from the same pane.
  • We can also write an expression for this:

="Page " & Globals!PageNumber & " of " & Globals!TotalPages

SQL SERVER real time question

Can we call a trigger from within a stored procedure?

Performance Tuning Tip 1

Here I will post one SSIS performance tip per day.


TODAY'S TIP:
Avoid SELECT *:
                             The Data Flow Task (DFT) of SSIS uses a buffer-oriented architecture for data transfer and transformations. When data travels from the source to the destination, it first comes into a buffer; the required transformations are done in the buffer itself, and the data is then written to the destination.
                             The size of the buffer depends on several factors; one of them is the estimated row size, which is determined by summing the maximum sizes of all the columns in the row. So more columns in the row means fewer rows in a buffer, and with greater buffer requirements the result is performance degradation. Hence it is recommended to select only those columns which are required at the destination.
                            Even if you need all the columns from the source, you should name the columns explicitly in the SELECT statement; otherwise the source takes another round trip to gather metadata about the columns when you use SELECT *.

TIP: Try to fit as many rows into the buffer as possible; this reduces the number of buffers passing through the pipeline engine and improves performance.
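As a sketch with a hypothetical table, compare the two source queries:

```sql
-- Avoid: every column inflates the estimated row size,
-- so fewer rows fit in each buffer.
SELECT * FROM dbo.SalesOrderDetail;

-- Prefer: name only the columns the destination needs.
SELECT SalesOrderID, OrderQty, UnitPrice
FROM dbo.SalesOrderDetail;
```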


Go to SSIS performance tuning tip 2.

Thursday, October 20, 2011

Command Line Tools In SSIS

What are the command-line tools to execute SQL Server Integration Services packages?

Post your answers as a comment.

Real time scenarios of SSIS

The scenarios below are common in SSIS.


Scenario 1: Loop over a list of files and load each one

Tasks Required: Foreach Loop, Data Flow Task

Solution:
Configure the Foreach Loop to loop over any particular directory of files. The loop should be configured to output to a given variable. Map that variable to a connection manager by using expressions.

Scenario 2:
Conditionally executing tasks

Solution:
Double-click the precedence constraint and set the Evaluation property to Expression and Constraint.
Type the condition that you want to evaluate in the Expression box.
Scenario 3:
Pass in variables when scheduling or running a package

Solution:
Use the /SET command in the DTExec command line or change the Property tab in the Package Execution Utility to have the property path like:
\Package.Variables[User::VariableName].Properties[Value]
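For example, a DTExec call might look like this (the package path, variable name, and value are placeholders):

```
dtexec /F "C:\Packages\MyPackage.dtsx" /SET \Package.Variables[User::VariableName].Properties[Value];"MyValue"
```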
Scenario 4:
Move and rename the file at the same time

Tasks Required:
File System Task

Solution:
Set the File System task to rename the file and point to the directory you'd like to move the file to. This enables you to rename and move the file in the same step.

Scenario: Loop over an array of data in a table and perform a set of tasks for each row
Tasks Required: Execute SQL Task, Foreach Loop
Solution:
Use an Execute SQL Task to load the array and send the data into an object variable. Loop over the variable in a Foreach Loop by using an ADO Enumerator.
Scenario 5:
Perform an incremental load of data

Tasks Required: 2 Execute SQL Tasks, Data Flow Task

Solution:
Have the 1st Execute SQL Task retrieve a date from a control table of when the target table was last loaded and place that into a variable.
In the Data Flow Task, create a date range on your query using the variable. Then, update the control table using a 2nd Execute SQL Task to specify when the table was
last updated.
Scenario 6:
Perform a conditional update & insert
Components Required: Data Flow Task, Conditional Split, Lookup Transform or Merge Join, OLE DB Command Transform

Solution:
Use the Lookup Transform or Merge Join to determine whether the row exists in the destination, and ignore a failed match. If the row yields a blank
on the key, then you know the row should be inserted into the target (via the Conditional Split). Otherwise, the row is a duplicate or an update. Determine whether the row
is an update by comparing the source value to the target value in the Conditional Split. The update can be done by an OLE DB Command Transform or by loading the data
into a staging table.

SQL SERVER Interview Questions

1. What is DTS in SQL Server?
Data Transformation Services (DTS) is used to transfer data from one source to a required destination. For example, if you have some data in SQL Server and need to transfer it to an Excel destination, this is easily done with the dialogue-based DTS tool, a specialized tool used for such migration work. More customization can be achieved using SSIS.

2. What is the significance of a NULL value and why should we avoid permitting NULL values?
NULL means no entry has been made; it implies that the value is either unknown or undefined. We should avoid permitting NULL values because a column with NULL values cannot have a PRIMARY KEY constraint, and certain calculations can be inaccurate when NULL columns are involved.

3. Difference between primary key and unique key?
Both constraints share a common property called uniqueness: the data in the column must be unique. The basic differences are:
· A primary key won't allow NULL values, whereas a unique key will accept a NULL value, but only one.
· Creating a primary key automatically creates a clustered index on the table; this characteristic is not associated with a unique key.
· Only one primary key can be created for a table, while any number of unique keys can be created.
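These differences can be sketched with a hypothetical table:

```sql
CREATE TABLE dbo.Employee
(
    EmployeeID INT NOT NULL PRIMARY KEY,  -- no NULLs; clustered index by default
    NationalID CHAR(9) NULL UNIQUE        -- allows exactly one NULL
);

INSERT INTO dbo.Employee VALUES (1, NULL);  -- succeeds
INSERT INTO dbo.Employee VALUES (2, NULL);  -- fails: duplicate NULL in the unique column
```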

4. What is a stored procedure?
Stored procedures are precompiled T-SQL statements combined to perform a single task or several tasks. A stored procedure is basically like a macro: when you invoke it, you actually run a set of statements. Because the statements are precompiled, execution of a stored procedure is comparatively fast compared to running the equivalent ad-hoc T-SQL.

5. What is the difference between UNION ALL and UNION?
The main difference is that UNION ALL is much faster than UNION. The reason is that UNION ALL does not look for duplicate rows, while UNION removes duplicate rows, whether or not any exist.
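A sketch with hypothetical tables:

```sql
-- UNION removes duplicates (extra distinct/sort work)
SELECT City FROM dbo.Customers
UNION
SELECT City FROM dbo.Suppliers;      -- each city appears once

-- UNION ALL keeps duplicates and is therefore faster
SELECT City FROM dbo.Customers
UNION ALL
SELECT City FROM dbo.Suppliers;
```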

6. Examples of stored procedures?
There are three kinds of stored procedures:
1. System stored procedures – start with sp_.
2. User-defined stored procedures – SPs created by the user.
3. Extended stored procedures – SPs used to invoke a process in external systems.
Examples of system stored procedures:
sp_helpdb – a database and its properties.
sp_who2 – gives details about the current users connected to your system.
sp_renamedb – enables you to rename your database.

7. What is a trigger?
Triggers are precompiled statements, similar to stored procedures, that are invoked automatically for a particular operation. Triggers are basically used to implement business rules.
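A minimal AFTER UPDATE trigger sketch (the audited table and the audit table are hypothetical):

```sql
CREATE TRIGGER trg_Employee_Audit
ON dbo.Employee
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- "inserted" is the virtual table holding the new row values
    INSERT INTO dbo.EmployeeAudit (EmployeeID, ChangedAt)
    SELECT EmployeeID, GETDATE()
    FROM inserted;
END;
```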

8. What is a view?
If we have several tables in a database and we want to view only specific columns from specific tables, we can go for views. Views also serve security needs, at times allowing specific users to see only specific columns based on the permissions we configure on the view. Views also reduce the effort required to write queries that access specific columns every time.

9. What is an index?
An index on a database controls the way the data is sorted so that queries can be processed faster; data retrieval is much quicker when an index exists.

10. What are the types of indexes available in SQL Server?
There are basically two types of indexes used with SQL Server.
Clustered index:
1. It formats the entire table, in turn physically sorting the table.
2. Only one clustered index can be created per table.
3. The data itself is located at the leaf level.
4. By default, a primary key creates a clustered index on the table.
Non-clustered index:
1. It doesn't touch the structure of the table.
2. It forms an index structure with references to the exact data.
3. A reference to the data is located at the leaf level.
4. Up to 249 non-clustered indexes can be created per table.
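As a sketch (the table and index names are hypothetical):

```sql
-- Only one clustered index: it physically sorts the table itself
CREATE CLUSTERED INDEX IX_Employee_ID
    ON dbo.Employee (EmployeeID);

-- Non-clustered indexes store references to the data; many are allowed
CREATE NONCLUSTERED INDEX IX_Employee_LastName
    ON dbo.Employee (LastName);
```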

11. Extent vs page?
Pages are the low-level unit for storing the actual data in SQL Server. The data itself is stored in the .mdf, .ldf, and .ndf files; pages are the logical units within them. The size of a page is 8 KB.
Eight consecutive pages form an extent: 8 * 8 KB = 64 KB.
Thus I/O operations happen at the page level. Each page holds template information at its start (the page header):
1. the page number,
2. the page type,
3. the amount of free space on the page,
4. the allocation unit ID of the object that owns the page.
Extents are classified into two types:
1. Uniform extents
2. Mixed extents
Uniform extents: occupied by a single object; that single object holds all 8 pages.
Mixed extents: multiple objects use the same extent. SQL Server allows a maximum of eight objects to share an extent.
A property of SQL Server: initially, when an object is created, SQL Server allocates it to a mixed extent; once its size reaches 8 pages or more, a new uniform extent is provided for that particular object.
This is where the fragmentation and reindexing concepts come in.

Some information about SQL Server:
Maximum number of tables: no restriction (you can create tables until your hard disk is full)
Maximum bytes per row in a table: 8,060 bytes
Maximum tables in a SELECT statement: 256
Maximum references per table: 253
Maximum rows in a table: unlimited (depends on storage)

Maximum columns per base table: 1,024
Maximum columns per INSERT statement: 1,024
Maximum columns per SELECT statement: 1,024 * 4 = 4,096
Bytes per GROUP BY or ORDER BY: 8,000

Indexes:
Maximum clustered indexes per table: 1
Maximum non-clustered indexes per table: 249
Maximum columns in an index (composite index): 16

Keys:
Maximum columns in a primary key: 16
Maximum columns in a foreign key: 16

Objects:
Nested stored procedure levels: 32
Nested trigger levels: 32
Parameters per stored procedure: 2,100
Parameters per user-defined function: 2,100
Identifier length: 128 characters

Database-level details:
Number of instances per server: 50
Files per database: 32,767
Filegroups per database: 32,767
Databases per instance of SQL Server: 32,767

12. MSDE vs SQL Server:
MSDE is the downgrade version of SQL Server: the same code as the initial release of SQL Server, named MSDE (Microsoft Desktop Engine).

13. Extent vs page:
Pages are the smallest unit for storing data, 8 KB in size. In turn, 8 pages form an extent.

14. DELETE vs TRUNCATE:
DELETE removes rows one by one from the table. The syntax is:
delete from tablename
We can add WHERE conditions to delete specific data.
Each delete is logged in the log file.
It is a DML statement.

TRUNCATE deallocates the pages that reference the data instead of deleting rows one by one.
It is a DDL statement. The syntax is:
truncate table tablename
TRUNCATE cannot be used on a table that is referenced by a foreign key constraint (because it would not perform any cascading delete on the table).
It is minimally logged rather than logging each row.
It removes the data in one stretch without any row-by-row manipulation.
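A sketch with a hypothetical staging table:

```sql
-- DELETE: row by row, fully logged, WHERE clause allowed
DELETE FROM dbo.StagingOrders
WHERE OrderDate < '2011-01-01';

-- TRUNCATE: deallocates pages, minimally logged, no WHERE clause,
-- and fails if the table is referenced by a foreign key
TRUNCATE TABLE dbo.StagingOrders;
```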

15. What do you mean by the ACID properties?
Atomicity: each transaction is all-or-nothing; its operations are treated as a single atomic unit.
Consistency: the data in the database should remain consistent. If the same data is maintained in two different places, those copies must match; if they differ, confusion results.
Isolation: transactions should be isolated and should not affect other transactions. If transactions operate on the same data, appropriate locks should be taken before the transactions start.
Durability: once committed, the data should remain stable.


Wednesday, October 19, 2011

SSRS INTERVIEW QUESTIONS

First, I am providing basic SSRS interview questions.

1) What are the different kinds of SSRS Reports?
  • Reports can be categorized into operational and analytical reports. The distinction is based on the source of data and level of analysis facilitated by any particular report. Operational reports are based on OLTP sources and are static reports and Analytical reports are based on OLAP sources and generally facilitate drill-down and drill-through for analysis. Technically, SSRS reports can be categorized into parameterized, linked, snapshot, cached, etc...
2) What are parameterized reports? What are cascading parameters in SSRS reports?
  • Reports that accept parameters from users to fetch and report data conditionally, are known as parameterized reports. When you have multiple parameters in a report and values of different parameters are dependent and populated dynamically based on the value of parent parameters, it's known as a cascading parameter.
3) How would you go about developing a SSRS report?
  • General development methodology for a SSRS report is to start by creating a data source. Based on the data source create one or multiple datasets as needed for parameters and the body of the report. Add required controls from the toolbox which would act as a container for the fields in the dataset. Format the controls added to the report body.  Verify and validate the report and finally deploy the report.
4) What is a dataset and what are the different types of datasets?
  • A dataset is similar to a query definition, which is executed when the report is executed. Datasets are of two types: Shared and Embedded. An embedded dataset is private to the report in which it exists and shared datasets can be shared across reports.
5) Would you store your query in a SSRS report or a Database server? State the reason why.
  • Storing SQL queries directly in text format in the dataset, should be avoided. Ideally it should be stored in a stored procedure in the database server. The benefit is that the SQL would be in a compiled format in a SP and brings all the benefits of using an SP compared to using an ad-hoc query from the report.
6) What is Tablix?
  • A Tablix can be seen as a control with combined capabilities of a table and a matrix, which facilitates asymmetric and flexible, row and column level, static and dynamic groupings.
7) How would you access SSRS reports deployed on report server?
  • Reports Manager is the most straight-forward way to access SSRS reports deployed on report server. Apart from executing the reports, it is the main administration console for SSRS server to manage reports.
8) Have you used the Report Viewer control / web part? What are the limitations?
  • The Report Viewer control / web part is the basic control for integrating SSRS reports with external applications. Using these interfaces, applications can link to and display SSRS reports within the application. The parameters toolbar in these controls has a limited programmable interface for decorating it and blending it with the theme of the application.
9) Which is the latest version of SSRS and what are the new enhancements?
  • SSRS 2008 R2 is the latest RTM version of SSRS. Rich data visualizations, better programming functions for lookup and aggregation, improved performance and better control over exported data are some of the major enhancements.
10) What is Report Builder?
  • Report Builder is an ad-hoc report authoring tool primarily targeted to be used by business analysts to facilitate self-service report authoring. Report Builder 3.0 is the latest version available as of date.
11) How would you deploy SSRS Reports using out-of-box functionality and how can you automate SSRS report deployment?
  • Business Intelligence Development Studio is generally used to deploy SSRS reports. There is no out-of-box support in SSRS to automate reports deployment, but free third-party products like RSScripter can be used for this.
12) What is drill-down and drill-through in SSRS?
  • Drill-down is a mechanism of decomposing summarized information to a detailed level. Drill-through is a mechanism of decomposing the problem by drilling into the information, generally using more than one report.




How to add NULL defense for dates in SSIS packages

In order to create a test bed for our NULL Defense series, create a new database (name it something like TestDB) which we will use for simulating problems and developing solutions.

Create a new table called NULLDefense and populate it with data exactly as shown in the picture below. To keep the focus on the problem, we keep the structure simple, but inclined towards the problem for which we will create our NULL defense.
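If the picture with the sample data is unavailable, here is an assumed minimal structure consistent with the walkthrough: one row with a real date, one with NULL, and one with an empty string (the column names are hypothetical):

```sql
CREATE TABLE dbo.NULLDefense
(
    ID       INT IDENTITY(1,1),
    SomeDate VARCHAR(10) NULL   -- dates arriving as text, as from an OLTP feed
);

INSERT INTO dbo.NULLDefense (SomeDate) VALUES ('2011-10-19');
INSERT INTO dbo.NULLDefense (SomeDate) VALUES (NULL);
INSERT INTO dbo.NULLDefense (SomeDate) VALUES ('');
```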



Now follow the steps below to create the SSIS package which will simulate the problem in question.


  • Create a new SSIS Project
  • Add a new package to it and name it "NULLDefense".
  • Add a new Data Flow Task to the package.
  • Add an OLE DB Source and name it "NULLDefense - TestDB".
  • Edit the OLE DB Source and connect it to the table we just created.
  • Add a Conditional Split transformation and name it "Split valid and invalid rec". The expected use of this transformation is to split records which contain dates and records which do not contain dates. Edit this transformation as shown in the picture below and also make sure you change the Default output name to "Valid Records" as shown below.


Since this is only for demonstration, we will not spend time inserting these records into real destinations; we will just use two Multicast transformations as destinations so we can analyze the number of records going to each output.
  • Add a Multicast transform and name this "Invalid Records".  This should be tied to the "Dates Unavailable" output from the Conditional Split.
  • Add another Multicast transform and name this "Valid Records". This should be tied to the default "Valid Records" output from the Conditional Split.
After you have completed these steps, your package should look something like the below picture.



Now if you look again at the data we inserted into our table, there are two records that do not have dates, so we should expect two records on the invalid output side and one record on the valid side. The reason we inserted one record with NULL and one with '' is that this is a scenario you will find in real-life projects. Applications insert data into OLTP systems, and application developers often use '' as a placeholder for "no date" instead of using the NULL constant from their data access libraries (such as DBNull.Value in .NET), which would send a true NULL value to the database instead of a blank value. From a user or business point of view, any date that is not available is NULL, even though blank and NULL carry different meanings and behavior from a technical standpoint.

Execute the package, however, and you will find two records on the valid side and only one on the invalid side. You would get the same result even if you used a condition based on the LEN function to measure the length of the value in the DOB field.
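The surprising split can be simulated outside SSIS. The sketch below mimics the Conditional Split's ISNULL(DOB) test in Python over the same three assumed sample rows (names and dates are illustrative, and the '' row is stored as the 1900-01-01 default to match what SQL Server does on insert):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE NULLDefense (ID INTEGER, Name TEXT, DOB TEXT)")
con.executemany(
    "INSERT INTO NULLDefense VALUES (?, ?, ?)",
    [(1, "Homer", "1956-05-12 00:00:00.000"),
     (2, "Marge", None),
     (3, "Bart",  "1900-01-01 00:00:00.000")],  # was '', coerced on insert
)

def conditional_split(rows):
    """Mimic 'Split valid and invalid rec': the 'Dates Unavailable'
    output catches rows where DOB is NULL; everything else goes to
    the default 'Valid Records' output."""
    invalid = [r for r in rows if r[2] is None]        # ISNULL(DOB)
    valid   = [r for r in rows if r[2] is not None]
    return valid, invalid

all_rows = con.execute("SELECT * FROM NULLDefense").fetchall()
valid, invalid = conditional_split(all_rows)
print(len(valid), len(invalid))  # -> 2 1: the blank-turned-default row slips through
```

Only Marge's true NULL is caught; Bart's blank, silently stored as 1900-01-01, sails through the NULL check on the valid side.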



Go back to SSMS and view the data in the NULLDefense table we just created and you will see the reason: whenever a blank string is inserted into a datetime field, SQL Server implicitly converts it to the default value "1900-01-01 00:00:00.000", as can be seen in the below picture.



We now move towards the solution for this.

I would not consider any option that changes existing database settings, as we may not have the liberty of altering anything for the sake of our ETL solution; many other applications might depend on the current configuration of the database.

The next alternative is to add a dual check, ensuring that the value is neither NULL nor the default value, everywhere the field is used. But this kind of solution is not practical in real-life scenarios, as numerous functions and conditions might operate on this field, and every one of them would need the dual check or a replacement function to work around the issue.
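To see why the dual check is tedious, here is a minimal sketch of what it looks like; the helper name and constant are hypothetical, and this predicate would have to be duplicated in every expression, condition and function that touches DOB:

```python
DEFAULT_DATE = "1900-01-01 00:00:00.000"

def dob_missing(dob):
    """Dual check: the value counts as missing if it is NULL *or*
    carries SQL Server's silent default for a blank datetime."""
    return dob is None or dob == DEFAULT_DATE

print(dob_missing(None),
      dob_missing(DEFAULT_DATE),
      dob_missing("1956-05-12 00:00:00.000"))  # -> True True False
```

One helper is harmless; repeating its logic inside dozens of SSIS expressions and T-SQL predicates is where the maintenance cost lies.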

Finally, the option that I recommend is to bring uniformity to how NULL is perceived by the package, by feeding it only data that is either NULL or a genuine value. This can be done through a very simple step at the database level itself, without changing any existing database settings, values or logic.

For our test package, just create a new "view" as shown in the picture below where we replace the default value with NULL.



Now change the source in our package to connect to this view and execute the package. The results should now be as expected, as can be seen in the below figure.



Datetime is one special data type that can trick you with NULLs, and we have just developed a defense against missing date values.

Next Steps
  • Try to size the efforts that you would need to patch this NULL or blank issue of dates if you would be fixing this in your existing SSIS packages.
  • Implement this solution and compare the benefit and issues of implementing it at the database level instead of at the package level.