Monday, October 29, 2018

What Is SQLPSX?

I was and still am a big fan of SQLPSX (SQL PowerShell Extensions). It is a complete PowerShell library for SQL Server written by Chad Miller and Mike Shepard, and I was one of the developers on some of the modules. It’s composed of 13 modules with 163 advanced functions, two cmdlets, and seven scripts for working with ADO.NET, SMO (SQL Server Management Objects), SQL Server Agent, RMO (Replication Management Objects), SSIS (SQL Server Integration Services), SQL script files, PBM (Policy Based Management), Oracle, and MySQL. It also lets you use the PowerShell ISE as a SQL and Oracle query tool. Optional backend databases and SQL Server Reporting Services 2008 reports are provided with the SQL Server and PBM modules. It works with any version of SQL Server, 2005 or later.

The idea behind SQLPSX is “Give a man a fish, and he eats for a day. Teach a man how to fish, and he eats for a lifetime.” There are plenty of great libraries that, to stay with the metaphor, hand you the fish. If you want the fish baked instead of fried, you are in trouble, because the function only returns it fried, or it takes a long list of parameters that offer only a misleading sense of flexibility. SQLPSX has the flexibility to do whatever you want because you choose the output. SQLPSX doesn’t try to be an end-to-end solution for common problems. It is a toolbox you can use to assemble your own solutions.

Installing SQLPSX

Mike Shepard has made SQLPSX available on GitHub, and you can just download it and add it to your profile. I like to put it in $PsHome (the all users, all hosts location): c:\windows\system32\windowspowershell\v1.0\Modules\

  1. Unzip the file downloaded from GitHub into a temp directory.
  2. Open PowerShell and navigate to the Modules folder of the unzipped folder.
    cd C:\temp\SQLPSX-master\SQLPSX-master\Modules
    
  3. Run this command to unblock the files.
    get-childitem -Recurse | unblock-file
    
  4. Copy the contents of the Modules folder from temp into the Modules folder in the $PSHome path, for example:
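    # A sketch, assuming the zip was extracted to C:\temp as in step 2
    Copy-Item -Path 'C:\temp\SQLPSX-master\SQLPSX-master\Modules\*' -Destination "$PSHome\Modules" -Recurse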

NOTE: Always be sure to fully review any PowerShell modules or scripts before executing them.

For a good understanding of the profiles you can have in PowerShell, review this article: Understanding the Six PowerShell Profiles.

Playing with SQLPSX

To use the module, you must import it:

Import-Module sqlpsxserver

Then you will have access to all the features of this powerful library:

get-command -Module sqlpsxserver

You can see from the list that there is a wealth of commands for managing SQL Server. To get the information about a SQL Server instance run:

Get-SqlServer -sqlserver <server\instance>
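
Get-SqlServer returns an SMO Server object, so anything the object exposes is available to you. A quick sketch of exploring it (replace the instance name with your own):

$server = Get-SqlServer -sqlserver '<server\instance>'   # replace with your instance name
$server | Get-Member -MemberType Property                # list the available properties
$server.Information | Format-List *                      # edition, version, collation, and so on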

Gathering Information from SQL Server

Several commands allow you to gather information from the instance:

Get the server processes:
    get-sqlprocess <server\instance>

Check the version:
    Get-SqlVersion -sqlserver <server\instance>

Return a list of the databases:
    Get-SqlServer -sqlserver <server\instance> | Select -ExpandProperty Databases

Return the system databases:
    Get-SysDatabases -sqlserver <server\instance>

Return all the information about the Master database:
    Get-SqlServer -sqlserver <server\instance> |
    Select -ExpandProperty Databases |
    Where-Object {$_.name -eq 'Master'} |
    Select *

Return a list of the tables in Master:
    Get-SqlServer -sqlserver <server\instance> |
    Select -ExpandProperty Databases |
    Where-Object {$_.name -eq 'Master'} |
    Select -ExpandProperty tables

Check for tables:
    get-SqlpsxDatabase -sqlserver <server\instance> -dbname <database> | Get-SqlTable

Check for procedures:
    get-SqlpsxDatabase -sqlserver <server\instance> -dbname <database> | Get-SqlStoredProcedure

Check for views:
    get-SqlpsxDatabase -sqlserver <server\instance> -dbname <database> | Get-SqlView

Script out all the tables in the database:
    Get-SQLPSXDatabase <server\instance> <database> | Get-SqlTable | Get-SqlScripter

Check the permissions:
    Get-SqlServerPermission -sqlserver <server\instance>

List the logins:
    Get-SqlLogin -sqlserver <server\instance>

Check for invalid logins (from restore):
    Get-InvalidLogins -sqlserver <server\instance>

Check the error log:
    Get-SQLErrorLog -sqlserver <server\instance> -lognumber 1

Return the server roles:
    Get-SqlServerRole <server\instance>

Check whether xp_cmdshell is enabled:
    $server = Get-SqlServer -sqlserver <server\instance>
    $server.Configuration.XPCmdShellEnabled.ConfigValue
    $server.Configuration.XPCmdShellEnabled.RunValue

Check the default backup compression setting:
    $server = Get-SqlServer -sqlserver <server\instance>
    $server.Configuration.DefaultBackupCompression.ConfigValue
    $server.Configuration.DefaultBackupCompression.RunValue

Check the priority boost setting:
    $server = Get-SqlServer -sqlserver <server\instance>
    $server.Configuration.PriorityBoost.ConfigValue
    $server.Configuration.PriorityBoost.RunValue

Check the max server memory value:
    $server = Get-SqlServer -sqlserver <server\instance>
    $server.Configuration.MaxServerMemory.ConfigValue
    $server.Configuration.MaxServerMemory.RunValue

Check the max degree of parallelism setting:
    $server = Get-SqlServer -sqlserver <server\instance>
    $server.Configuration.MaxDegreeOfParallelism.ConfigValue
    $server.Configuration.MaxDegreeOfParallelism.RunValue

Get a list of failed SQL Agent jobs:
    get-agentjob <server\instance> |
    Where-Object {$_.lastrunoutcome -eq 'Failed'}

You can control the output of each command by using the format-table or format-list cmdlets, for instance:

get-agentjob <server\instance> | Where-Object {$_.lastrunoutcome -eq 'Failed'} | Format-List
get-agentjob <server\instance> | Where-Object {$_.lastrunoutcome -eq 'Failed'} | Select Parent, Name, LastRunDate | Format-Table

SQL Maintenance with SQLPSX

DBAs should be spending time on more important work, not manually repeating the same tasks day after day. Automating much of this work using the SQLPSX library is possible. Here is a list of useful commands to make changes or perform maintenance:

Add a new login:
    Add-SqlLogin -sqlserver <server\instance> -name <loginname> -password <password> -logintype SqlLogin -DefaultDatabase <database>

Add a user mapped to an existing login:
    Add-SqlUser -sqlserver <server\instance> -dbname <database> -name <username> -login <loginname>

Remove a user:
    Remove-SqlUser -sqlserver <server\instance> -dbname <database> -name <username>

Remove a login:
    Remove-SqlLogin -sqlserver <server\instance> -name <loginname>

Check for index fragmentation greater than 50% in one table:
    get-SqlpsxDatabase -sqlserver <server\instance> -dbname <database> |
    Get-SqlTable -name <table> |
    Get-SQLIndex |
    Get-SQLIndexFragmentation |
    Where-Object {$_.AverageFragmentation -ge 50} |
    Select-Object index_Name,
                  Pages,
                  Rows,
                  IndexType,
                  AverageFragmentation

If fragmentation is bad, invoke a reorganize (defrag) or a rebuild:
    get-SqlpsxDatabase -sqlserver <server\instance> -dbname <database> |
    Get-SqlTable -name <table> |
    Get-SQLIndex |
    Invoke-SqlIndexDefrag

    get-SqlpsxDatabase -sqlserver <server\instance> -dbname <database> |
    Get-SqlTable -name <table> |
    Get-SQLIndex |
    Invoke-SqlIndexRebuild

Get the statistics:
    Get-SQLPSXDatabase <server\instance> <database> |
    Get-SqlTable -name <table> |
    Get-SqlStatistic

Update the statistics:
    Get-SQLPSXDatabase <server\instance> <database> |
    Get-SqlTable -name <table> |
    Get-SqlStatistic |
    Update-SqlStatistic

Add a new database:
    Add-SqlDatabase -sqlserver <server\instance> -dbname <database>

Remove a database:
    Remove-SqlDatabase -sqlserver <server\instance> -dbname <database>

Perform a database check:
    get-SqlpsxDatabase -sqlserver <server\instance> -dbname <database> | Invoke-SqlDatabaseCheck
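
Because every command writes objects to the pipeline, you can compose your own routines from these building blocks. For example, a minimal sketch that runs a consistency check against every user database on an instance (assuming the sqlpsxserver module is loaded and the instance name is replaced with your own):

Import-Module sqlpsxserver
Get-SqlPsxDatabase -sqlserver '<server\instance>' |
    Where-Object { $_.IsSystemObject -eq $false } |   # skip master, model, msdb, tempdb
    Invoke-SqlDatabaseCheck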

Send Results in an Email

You may have a process that returns the last backup date of the databases and writes the output with Write-Host, but what I have is a script that returns the last backup date of my databases and sends it to me by email.

First export the list of backups to a CSV file and then email as an attachment.

get-SqlpsxDatabase -sqlserver DESKTOP-SQEVVO1\SQL2017  |  
Select name, lastbackupdate |
export-csv c:\temp\lastbackup.csv -NoTypeInformation -noclobber
$SmtpServer = 'smtp.live.com'
$SmtpUser = 'youremail@outlook.com'
$smtpPassword = 'youroutlookpasword'
$MailtTo = 'youremail@outlook.com'
$MailFrom = 'youremail@outlook.com'
$MailSubject = "Last Backup" 
$Credentials = New-Object System.Management.Automation.PSCredential -ArgumentList $SmtpUser, $($smtpPassword | ConvertTo-SecureString -AsPlainText -Force) 
Send-MailMessage -To "$MailtTo" -from "$MailFrom" -Subject $MailSubject -SmtpServer $SmtpServer -UseSsl -Credential $Credentials -Attachments  c:\temp\lastbackup.csv
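
Storing the SMTP password in plain text inside a script is risky. One common alternative (a sketch; the file path is just an example) is to capture the credential once and save it encrypted for the account that runs the script:

# Run once, interactively, under the account that will run the script
Get-Credential | Export-Clixml -Path 'c:\temp\smtpcred.xml'

# In the script, load it back; only the same user on the same machine can decrypt it
$Credentials = Import-Clixml -Path 'c:\temp\smtpcred.xml'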

Automating the Daily Checklist with SQLPSX

As you can see from the examples shown here, you can gather a large amount of information from SQL Server and perform many maintenance tasks using this library. One of the daily tasks of the DBA in charge is the daily checks. Every morning, by reviewing these, you have a way to guarantee that your environment is online and available. In the days of Kubernetes, Docker, CDI CDR, and large estates, DBAs should not do this kind of job manually. Well, in this article I will show how to automate some of those daily checklists.

Setting Up

Create a folder called c:\checklist and, inside it, a text file called InstanceNames.txt that lists all the SQL Server instances to be checked. Create a second text file called hostnames.txt with the names of the servers (hostnames, not instance names) that need host-level checks such as disk space. The folder will also hold all the CSV files that the checks create. A sketch of the setup follows.
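
# A sketch, assuming you maintain the two lists by hand afterwards
New-Item -Path 'C:\checklist' -ItemType Directory -Force | Out-Null
Set-Content -Path 'C:\checklist\InstanceNames.txt' -Value 'DESKTOP-SQEVVO1\SQL2017'   # one instance per line
Set-Content -Path 'C:\checklist\hostnames.txt' -Value 'DESKTOP-SQEVVO1'               # one host name per line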

Failed Jobs

Create a file called failedjobs.ps1. With SQLPSX, collecting this information becomes very simple: just check the property lastrunoutcome:

Import-Module agent
Import-Module sqlpsxserver
$Csvs = @()   # collects the paths of the CSV files created below
Get-Content C:\Checklist\InstanceNames.txt |
ForEach-Object {
    $servername = $_
    $nameserver = $_ -replace ('\\','_')
    #get the agentjobs
    Remove-Item "c:\checklist\$($nameserver)_failedjobs.csv" -ErrorAction SilentlyContinue
    get-agentjob $servername |
    Where-Object {$_.lastrunoutcome -eq 'Failed'} |
    foreach-object { 
       [pscustomobject][ordered]@{'Server Name' = $servername
                                  'Job Name' = $_.name } |
        Export-Csv "c:\checklist\$($nameserver)_failedjobs.csv" -NoTypeInformation -NoClobber -append
        $Csvs +="c:\checklist\$($nameserver)_failedjobs.csv"
    } 
    
}

Disk Space

This step uses code written by Aaron Nelson, which you can find here. Add the code to a file called diskspace.ps1.

Get-Content C:\Checklist\hostnames.txt |
ForEach-Object {
    $servername = $_
    Remove-Item "c:\checklist\$($servername)_diskspace.csv" -ErrorAction SilentlyContinue
    $params = @{Computername = $servername
                query = "select SystemName, Name, DriveType, FileSystem, FreeSpace, Capacity, Label from Win32_Volume where DriveType = 2 or DriveType = 3" }
    
    Get-WmiObject @params |
    select SystemName ,
           Name ,
           @{Label="SizeIn(GB)";Expression={"{0:n2}" -f($_.Capacity/1GB)}} ,
           @{Label="FreeIn(GB)";Expression={"{0:n2}" -f($_.freespace/1GB)}} ,
           @{Label="PercentFree";Expression={"{0:n2}" -f(($_.freespace / $_.Capacity) * 100)}} ,
           Label |
    Export-Csv "c:\checklist\$($servername)_diskspace.csv" -NoTypeInformation -noclobber -append
}

Databases Without a Backup in the Last 16 Hours

This script will record any database that hasn’t been backed up for 16 hours. Add the code to dbnobackup.ps1.

import-module sqlpsxserver
Get-Content C:\Checklist\InstanceNames.txt |
ForEach-Object {
    $servername = $_
    $nameserver = $_ -replace ('\\','_')
    Remove-Item "c:\checklist\$($nameserver)_dbnobackup.csv" -ErrorAction SilentlyContinue
    Get-SqlPSXDatabase -sqlserver $servername |
    Where-object { $_.status -eq 'Normal' -and $_.lastbackupdate -le (get-date).addhours(-16) } | 
    ForEach-Object {
        [pscustomobject]@{
            'Database Name' = $_.name
            'Last Backup Date ' = $_.LastBackupDate
            'Last Differential Backup Date' = $_.LastDifferentialBackupDate
            'Last Log Backup Date' = $_.LastLogBackupDate
        } |
        Export-Csv "c:\checklist\$($nameserver)_dbnobackup.csv" -NoTypeInformation -NoClobber -append 
    } 
}

Index Fragmentation

This script reports an action of Rebuild for any index with fragmentation of 30 percent or more, and Reorg for fragmentation between 10 and 30 percent. No action is taken; the script only writes the information to the CSV file. The name of the file is indexfrag.ps1.

#Checking  index fragmentation 
Get-Content C:\Checklist\instancenames.txt |
ForEach-Object {
    $ServerName= $_ 
    get-SqlpsxDatabase -sqlserver $_ -PipelineVariable DatabaseName |  
    Get-SqlTable   |
    ForEach-Object {
        $tablename = $_
        Get-SQLIndex $_  | 
        Get-SQLIndexFragmentation |
        ForEach-Object {
            $nametosave = $servername.Replace("\","_")
            if ($_.AverageFragmentation -ge 30 ) { 
                    [pscustomobject][ordered ]@{ "Server Name"=$servername
                                                    "Database Name" = $databasename
                                                    "Table Name"=$tablename
                                                    "index_Name" = $_.index_name
                                                    "Pages" = $_.pages          
                                                    "AverageFragmentation"  = $_.AverageFragmentation
                                                    "Action" = "Rebuild"} |
                    Export-Csv  "C:\checklist\$($nametosave)_fragmentation.csv" -NoTypeInformation  -noclobber -append
            } elseif ($_.AverageFragmentation -le 30 -and $_.AverageFragmentation -ge 10 ) {
                    [pscustomobject][ordered ]@{ "Server Name"=$servername
                                                "Database Name" = $databasename
                                            "Table Name"=$tablename
                                            "index_Name" = $_.index_name
                                            "Pages" = $_.pages           
                                            "AverageFragmentation"  = $_.AverageFragmentation
                                                "Action" = "Reorg"} |
                    Export-Csv "C:\checklist\$($nametosave)_fragmentation.csv" -NoTypeInformation  -noclobber -append
            } 
        }
    }
}

Data and Log Files

This script returns information about the space used and free in the data and log files. The name of the file is DataLogFiles.ps1.

import-module sqlpsxserver
Get-Content C:\Checklist\instancenames.txt |
ForEach-Object {
    $servername = $_
    $nameserver = $_ -replace ('\\','_')
    
    Remove-Item "c:\checklist\$($nameserver)_LogFiles.csv" -ErrorAction SilentlyContinue
    Remove-Item "c:\checklist\$($nameserver)_DAtaFiles.csv" -ErrorAction SilentlyContinue
     Get-SqlPSXDatabase -sqlserver $servername |
     ForEach-Object {
          $_.FileGroups.Files |
            Select-Object Name,
                          FileName,
                          GrowthType,
                          @{ N = 'Growth MB'; E = { [math]::Round(($_.Growth * 1024)/1MB, 2) } },
                          @{ N = 'File Size MB'; E = { [math]::Round(($_.Size * 1024)/1MB, 2) } },
                          @{ N = 'Database Used Space MB'; E = { [math]::Round(($_.UsedSpace * 1024)/1MB, 2) } },
                          @{ N = 'Max Size MB'; E = { [math]::Round(($_.MaxSize * 1024)/1MB, 2) } },
                          @{ N = 'Free Space on Disk GB'; E = { [math]::Round(($_.VolumeFreeSpace * 1024)/1GB, 2) } } |
            Export-Csv "c:\checklist\$($nameserver)_DataFiles.csv" -NoTypeInformation -noclobber -append
            $_.LogFiles |
            Select-Object Name,
                          FileName,
                          GrowthType,
                          @{ N = 'Growth MB'; E = { [math]::Round(($_.Growth * 1024)/1MB, 2) } },
                          @{ N = 'Max Size MB'; E = { [math]::Round(($_.MaxSize * 1024)/1MB, 2) } },
                          @{ N = 'Size MB'; E = { [math]::Round(($_.Size * 1024)/1MB, 2) } },
                          @{ N = 'Used Space MB'; E = { [math]::Round(($_.UsedSpace * 1024)/1MB, 2) } },
                          @{ N = 'Free Space on Disk GB'; E = { [math]::Round(($_.VolumeFreeSpace * 1024)/1GB, 2) } }   |
            Export-Csv "c:\checklist\$($nameserver)_LogFiles.csv" -NoTypeInformation -noclobber -append
        }
}

New Logins

Check for new logins created in the last 24 hours. The name of the file is Newlogins.ps1.

import-module sqlpsxserver
get-content C:\Checklist\InstanceNames.txt |
ForEach-Object {
    $servername = $_
    $nameserver = $_ -replace ('\\','_')
 
    Remove-Item "c:\checklist\$($nameserver)_newlogins.csv" -ErrorAction SilentlyContinue
    Get-SqlLogin -sqlserver $servername | 
    Where-Object {$_.creationdate -ge (get-date).adddays(-1)}|
    export-csv "c:\checklist\$($nameserver)_newlogins.csv" -NoTypeInformation -NoClobber -append
}

After the checks

The last step will look for the CSV files and send them by email. I am using Outlook’s SMTP server here; change the settings and credentials to match your own. This file will be called sendemail.ps1.

$csvtosend = @()
$csvtosend = (dir "c:\checklist\*.csv").fullname
$SmtpServer = 'smtp.live.com'
$SmtpUser = 'youremail@outlook.com'
$smtpPassword = 'youroutlookpasword'
$MailtTo = 'youremail@outlook.com'
$MailFrom = 'youremail@outlook.com'
$MailSubject = "Daily Check" 
$Credentials = New-Object System.Management.Automation.PSCredential -ArgumentList $SmtpUser, $($smtpPassword | ConvertTo-SecureString -AsPlainText -Force) 
Send-MailMessage -To "$MailtTo" -from "$MailFrom" -Subject $MailSubject -SmtpServer $SmtpServer -UseSsl -Credential $Credentials -Attachments  $csvtosend

Creating the jobs

Now it's time to create the job and add a step for each file. Create a job called Checklist. Use the name of the file for each step. The step type will be Operating system (CMDEXEC). The command for each step will be:

PowerShell.exe "c:\checklist\<nameofthecheck>"

There should be one step for each check, and the final step will run sendemail.ps1, as listed below.
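
For example, the step commands would be:

PowerShell.exe "c:\checklist\failedjobs.ps1"
PowerShell.exe "c:\checklist\diskspace.ps1"
PowerShell.exe "c:\checklist\dbnobackup.ps1"
PowerShell.exe "c:\checklist\indexfrag.ps1"
PowerShell.exe "c:\checklist\DataLogFiles.ps1"
PowerShell.exe "c:\checklist\Newlogins.ps1"
PowerShell.exe "c:\checklist\sendemail.ps1"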

Be sure the account running the SQL Agent job has rights to the paths and, more importantly, Windows authentication access to the SQL Server instances. Otherwise, you may want to use a proxy account to run the job. If you have tons of SQL Server instances, a better approach is to store the results in tables and then create a report instead of emailing files.

Conclusion

As you can see, there are plenty of commands that apply at the server, database, table, and index levels. Just use get-command -Module sqlpsxserver to check the available cmdlets of SQLPSX. As I said, the beauty of SQLPSX is the flexibility of the output. Of course, you need a little bit more knowledge of PowerShell, but now that you have the tools, you can build whatever you need!


Thursday, October 25, 2018

How to Program with MongoDB Using the .NET Driver

Many people with a background in relational databases are confused by the terms NoSQL and document database. What kind of documents are in the database, and how do you get data without a query language, without SQL? In my opinion, the term NoSQL does not mean that there is no schema, but rather that the schema is not strictly enforced. And, of course, there is a query language as well.

During the past few years, JSON has become extremely popular. Getting data from various forms (i.e., web and Windows forms) became extremely easy using JSON. Furthermore, saving the data just as users entered it means that the application dictates the shape of the data. The shape of the data is determined by the application itself, unlike a relational database, in which the data are independent of the application. In NoSQL databases, the data are saved in JSON documents. The table record in the relational world is equivalent to the JSON document in the NoSQL world.

I think that it is easier to learn something new by comparing it with something that you already know. This article will try to be a gentle introduction to the NoSQL world and will explain how to transform part of a well-known database, AdventureWorks2016, to NoSQL, as shown in the image below.

For this article, I had to choose between MongoDB and Cosmos DB. Both databases provide challenges and opportunities, but, in my opinion, MongoDB has an advantage. It provides more free options than Cosmos DB. Furthermore, MongoDB is a multiplatform database with an on-premise edition that is much richer than the Cosmos DB Emulator.

Let’s get started. The first step is to set up the environment. This means installing the MongoDB database, installing the NET driver, and importing some data.

Setting up the environment

At the time of this writing, October 2018, the current version of MongoDB is 4.0.2, and the current version of the .NET driver is 2.7. To install MongoDB, start with a Google search for ‘Mongo DB download’ or click on the link. From the list of available platforms, choose Windows x64. Then fill in the form with your first and last name, e-mail address, and so on. After that, save the MSI installation file locally.

The installation is straightforward. MongoDB will be installed as a Windows service by default. You have the option of choosing the service name, the service user, and startup folders for data and log files, as shown in the image below.

If you choose the Custom installation option, you may specify an installation directory. The MongoDB installation will put the executables in C:\Program Files\MongoDB\Server\4.0\bin folder by default. Be sure to add this path to your PATH environment variable to be able to run the MongoDB shell from any folder in the command prompt.

Feel free to explore the executables installed in the folder. The most important are mongod (the Windows service itself), mongo (the shell), and mongoimport (a utility that helps import data into the database). You should also install the MongoDB GUI explorer called Compass from the link.

The next step in setting the environment is to install the NET driver. To do that, start Visual Studio. I’m using the VS 2017 Community Edition in this article.

Create a new console project using C#, and name it as you wish. The purpose of creating a new project is to show you how to reference the MongoDB .NET driver. From the project context menu, choose Manage NuGet Packages. After NuGet Package Manager appears, enter MongoDB driver in the Browse tab as shown in the image below.

Choose and install MongoDB.Driver, and the three main components for programming MongoDB by using .NET are installed. All three are published together, as shown by version number, in the image below.

MongoDB.Bson handles the various types and file formats as well as the mapping between CLR types and BSON. BSON is MongoDB’s binary representation of JSON. I will write much more about BSON later.

MongoDB.Driver.Core handles connection pooling, communication between clients and database and so on. Usually, there is no need to work with this library directly.

MongoDB.Driver is where the main API is located.

Once you download the three main libraries, you can start any new project and copy references from this first project. At this point, you can save and close this first project.

In order to follow the article, please, download the article solution from the GitHub. Open the solution by starting another instance of Visual Studio. In the solution notice the folder Data as shown in the image below.

The folder contains three JSON files that must be imported by using a MongoDB command line utility called mongoimport. Determine where on your local disk those three files are located. Start the command prompt from that location. In the command prompt window, execute the following three commands.

mongoimport --db simpleTalk --collection adventureWorks2016 --file sales.json --jsonArray
 
mongoimport --db simpleTalk --collection spetialOffer --file SpetailOffer.json --jsonArray
 
mongoimport --db simpleTalk --collection products --file Product.json --jsonArray

By executing these commands, you will create a MongoDB database named simpleTalk (camel case naming) and three collections named adventureWorks2016, spetialOffer, and products, as shown in the image below from Compass. When starting Compass, it tries to establish a connection on localhost port 27017. Just press Connect and, on the left side, select the database simpleTalk.

The collection in NoSQL is similar to the table in the SQL world and provides a namespace for the document. To prove that the import was successful, start the mongo shell from the command prompt. In the command prompt, type mongo to enter into the shell.

Once you enter the shell, type use simpleTalk to switch in the context of the database. And then type show collections and you will be able to see all three imported collections.

To conclude this first section, notice the Authentication folder in the article’s solution. Inside the folder, there is a JavaScript file named AddUser.js. If you are not in the mongo shell, start it again. In the context of the simpleTalk database, execute the code of AddUser.js, as shown in the snippet below, to create a user.

db.createUser(
{
   user: "usrSimpleTalk",
   pwd: "pwdSimpleTalk",
   roles: [{ role: "readWrite", db: "simpleTalk" }]
}
)

All examples in the article will execute in the context of this newly created user usrSimpleTalk. The user has been granted read and write permissions to the database.

Now it’s time to talk about how to connect to the MongoDB database, how CLR types are mapped to BSon types, and the root objects in the MongoDB .NET driver.

Connecting to MongoDB

The purpose of this section is to provide information on how to connect to the MongoDB database and to examine the most important objects of MongoDB API for .NET. Those objects are the MongoDB client, the Mongo database, and the Mongo collection. Open the article solution and make sure that the startup object is Auth as shown in the image below.

The class named Auth demonstrates how to connect to the MongoDB database. MongoClient is the root object that provides that connectivity.

There are several methods to connect to the database. The first way is to pass the database name, the username, and the password, as shown in the snippet below.

var credential =MongoCredential.CreateCredential(databaseName: Dbase, username: UserName, password: Password);
 
var settings = new MongoClientSettings
{
    Credential = credential
};
 
var mongoClient = new MongoClient(settings);

As you may have noticed by browsing the code, there is no need for opening, closing and disposing of connections. The client object is cheap, and the connection is automatically disposed of, unlike ADO.NET in which the connection is a very expensive resource. This approach is shown in the solution. In the solution, there is one configuration object located in the Configuration folder and named SampleConfig.

The MongoClient has many overloads. The most common way to instantiate the client is to pass a connection string, as shown in the snippet below.

//Using connectionString
var connectionString = $"mongodb://{UserName}:{Password}@localhost:27017/{Dbase}";
 
mongoClient = new MongoClient(connectionString);

By default, MongoDB API uses port 27017 to communicate with the database, but there are many more options. That includes connecting to multiple servers, the replica sets and so on.

At this point, I have to make a digression. One common thing that I would like to know is the number of currently open connections. By using the mongo shell, it is possible to see that number, but without further information.

If you execute the command db.serverStatus().connections in the mongo shell, you will see the response in the form of a document, as shown in the image below.

The only way to get more information is to use netstat -b in a cmd window with elevated permissions, getting the result displayed in the image below.

Let’s get back to the main topic. Down in the object hierarchy is the database object. To get a reference to the database, usually, you execute the following snippet.

var db = mongoClient.GetDatabase($"{Dbase}");

The database is accessed in the context of the client. If you explore the exposed method in the client, you will notice that there are no options for creating a database. That is because the database is created automatically when it is needed and if it doesn’t exist.

In the context of the database, you can get a reference to the collection by executing snippet like shown below.

var collection = db.GetCollection<SalesHeader>($"{Collection}");

or like this one

var collection = db.GetCollection<BsonDocument>($"{Collection}");

In the context of the database, you can create, drop, or rename a collection. There is a method to create a collection, but there is no need to use it because the collection will be created automatically if it does not exist. The collection and the database objects are both thread safe and can be instantiated once and used globally.

As you noticed in the snippets above, the only difference in getting a reference to a collection object is by passing a type. This is the type needed in .NET to work with the collection. One is a BsonDocument that represents a dynamic schema in which you can use any document shape, and one is a so-called strongly typed schema named SalesHeader. As you will discover in the article, SalesHeader is the class that mimics the table in the AdventureWorks2016 database named Sales.SalesHeader. The option that is strongly typed is the generally preferred way when working with MongoDB in .NET.

The document object is found lower in the object hierarchy. The most general way to represent the document is to use the BsonDocument class. The BsonDocument is basically a dictionary of keys (strings) and BsonValues. BsonValues will be examined in the next section. To conclude this section, press F5 to start the article’s solution. The result will display on the console screen. It represents the number of documents in the adventureWorks2016 collection, and it is determined by calling the collection’s CountDocumentsAsync method.
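
For reference, a minimal sketch of such a count, assuming the db variable from the earlier snippet:

var collection = db.GetCollection<SalesHeader>("adventureWorks2016");
// an empty filter counts every document in the collection
var count = await collection.CountDocumentsAsync(Builders<SalesHeader>.Filter.Empty);
Console.WriteLine($"Documents in adventureWorks2016: {count}");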

Mapping CLR types to BSON types

Change the startup object in order to follow this section. This time set Attribute Decoration, located in the Serialization folder, as the startup object. In this section, I will refer to a couple of simple classes whose definitions can be found in the Pocos folder. POCO is an acronym for ‘plain old CLR object’ and represents a class that is unencumbered by inheritance.

The first example uses two objects. The first one is of type TestSerializerWithOutAttributes, the second one of type TestSerializer. Both classes define the same properties of type bool, string, int, double and datetime. The classes’ definitions can be found in TestSerializer.cs located in the Pocos folder. The only difference between these two classes is that the second class has been decorated with attributes. In the example, I instantiate two objects and perform the basic serialization by using ToJson, as shown in the image below.

By pressing F5, the result will be displayed as shown in the image below.

As you will notice, there are a couple of differences. One of them is that the decimal type, by default, is serialized as a string. To change that, the decimal property must be decorated with the attribute [BsonRepresentation(BsonType.Decimal128)], as is done in the definition of class TestSerializer.

Then, if you noticed, the property OnlyDate is set by using the following snippet

OnlyDate = new DateTime (DateTime.Now.Year, DateTime.Now.Month, DateTime.Now.Day),

Therefore, there is no time part. I am located in the UTC+1 time zone, and because it’s currently daylight savings time, the default serializer reduces the date value by two hours. To avoid such behavior, I decorated the property with the attribute [BsonDateTimeOptions(DateOnly =true)].

In the example, I am using some other attributes, which I’ll describe. If you would like to change the element name or specify a different order, try decorating the property with BsonElement as shown in the snippet below.

[BsonElement("description", Order= 1)]
public string Comment { get; set; }

As you’ll see later, every BsonDocument that is part of a collection should have an element named _id. This is a kind of primary key in the NoSQL-Bson world. Also, by default, the collection is indexed by using this field. You are free to specify your own primary key by decorating a property or field in the class definition with the attribute BsonId, as shown in the snippet below.

[BsonId]
public int SalesOrderId { get; set; }

Finally, in order to specify that only a significant number of digits will be used when working with the double type, the following attribute is used:

// Double decorated with AllowTruncation 
[BsonRepresentation(BsonType.Double, AllowTruncation = true), BsonElement("myduble")]
public double MongoValueTypeDouble { get; set; }

You probably noticed that I use the term the default serializer, although in the code there is no call to any kind of serializer. This is because of the beautiful .ToJson extension. As you can see by using the Visual Studio peek definition or by pressing ALT + F12, ToJson is an extension of the object type defined in the MongoDB.Bson namespace. So, it can safely be used on any type.

There is one more thing about ToJson I have to mention here. As you probably noticed, the extension optionally receives a parameter of type JsonWriterSettings.

If you examine this class by using the Visual Studio peek definition, you will notice that the class has properties defined as shown in the image below.

Pay special attention to the property OutputMode, shown surrounded by red in the image above. This is an enumeration with two possible values. The default one is JsonOutputMode.Shell, and the second one is JsonOutputMode.Strict. So, what is the difference? According to the MongoDB documentation:

  • Strict mode. Strict mode representations of BSON types conform to the JSON RFC. Any JSON parser can parse these strict mode representations as key/value pairs; however, only the MongoDB internal JSON parser recognizes the type of information conveyed by the format.
  • mongo Shell mode. The MongoDB internal JSON parser and the mongo shell can parse this mode.

My experience is that the mongoimport utility does not understand shell mode, so I had to change the default serialization behavior to strict mode in order to generate a JSON file that could be accepted by the mongoimport utility. You can find more about the differences between strict and shell mode here.
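
Switching the output mode is just a matter of passing the settings to ToJson. A minimal sketch, assuming the TestSerializerWithOutAttributes class has a parameterless constructor:

using MongoDB.Bson;
using MongoDB.Bson.IO;

var poco = new TestSerializerWithOutAttributes();
// Strict mode output can be consumed by any JSON parser, including mongoimport
var strictJson = poco.ToJson(new JsonWriterSettings { OutputMode = JsonOutputMode.Strict });
Console.WriteLine(strictJson);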

Besides decorating classes with attributes, there is an option to use the so-called ClassMap. For example, if you want to keep serialization details out of your domain classes and do not want to play with attributes, you can use the ClassMap approach instead. This is not the only scenario in which you might use ClassMap. You can combine attribute decoration with ClassMap as well.

To practice working with ClassMap, switch the startup object in the article’s solution to ClassMap. In this example, I’m using the same type as before, an object of type TestSerializerWithOutAttributes, to produce the same output as in the previous example.

The class should be registered only once, as shown in the snippet below.

if (!BsonClassMap.IsClassMapRegistered(typeof(TestSerializerWithOutAttributes)))
{
    BsonClassMap.RegisterClassMap<TestSerializerWithOutAttributes>(cm =>
    {
        // mapping code goes here (the full example appears below)
    });
}
An exception will be thrown if you try to register the same class more than once. Internally, BsonClassMap holds information about registered types in a dictionary, as shown in the snippet below.

private static readonly Dictionary<Type, BsonClassMap> __classMaps = new Dictionary<Type, BsonClassMap>();

So, registering a class twice means adding a key to the dictionary that already exists, which, in turn, throws an exception. Usually, you call RegisterClassMap from some code path that is known to execute only once (the Main method, the Application_Start event handler, etc.).

The most common approach when working with ClassMap is to use the so-called AutoMap and, after that, perform some additional mapping, as shown in the snippet below.

if (!BsonClassMap.IsClassMapRegistered(typeof(TestSerializerWithOutAttributes)))
{
       BsonClassMap.RegisterClassMap<TestSerializerWithOutAttributes>(cm =>
       {
           //only read and write properties are mapped
           cm.AutoMap();
            //BsonId attribute
           cm.MapIdProperty(c => c.SalesOrderId);
            //Setting ElementName and Order
           cm.GetMemberMap(c => c.Comment).SetElementName("description").SetOrder(1);
       });
}

In the article’s solution, there is an example showing how to use AutoMap, named ClassMapAutoMap, located in the Serialization folder. However, I will not use this option in the article. For example, to specify that the decimal type should be rendered (NOTE: I use the terms rendered and serialized interchangeably) as a decimal rather than a string, the following code is used. Notice that there is a predefined serializer for the decimal type.

// Setting the default decimal serializer
cm.MapProperty(c => c.Salary).SetSerializer(new DecimalSerializer(BsonType.Decimal128));

In order to specify the element name and change the order in which the element is rendered, the following snippet is used:

//Setting ElementName and Order
cm.MapProperty(c => c.Comment).SetElementName("description").SetOrder(1);

To serialize the DateTime type with only the date part, or to use local time, the following snippet is used. Notice that there is a predefined serializer for the DateTime type.

cm.MapProperty(c => c.OnlyDate).SetSerializer(new DateTimeSerializer(dateOnly: true));
cm.MapProperty(c => c.LocalTime).SetSerializer(new DateTimeSerializer(DateTimeKind.Local));

Similarly, to specify that SalesOrderId should be treated as a BsonId, the following snippet is used:

// BsonId attribute
cm.MapIdProperty(c => c.SalesOrderId);

Finally, to allow truncation of insignificant digits when working with the double type, the following code snippet is used:

// we have to set an object of type RepresentationConverter and pass allowTruncation 'true'
var rp = new RepresentationConverter(allowOverflow: false, allowTruncation: true);
cm.MapProperty(c => c.MongoValueTypeDouble).SetElementName("mydouble")
           .SetSerializer(new DoubleSerializer(BsonType.Double, converter: rp));

Also, by pressing F5, you should get the same result as in the previous example.

That is not the end of the possibilities. There is an option to use the so-called Convention Pack. In short, to be able to follow the section from this point, change the startup object to be TestConventionPack.

When working with ClassMap and ‘automapping,’ many decisions must be made: which property should be the BsonId, how should the decimal type be serialized, what will be the element name, and so on.

Answers to these questions are represented by a set of conventions. For each convention, there is a default convention that is the most likely one you will be using, but you can override individual conventions and/or write your own convention.

If you want to use your own conventions that differ from the defaults, simply create an instance of ConventionPack, add in the conventions you want to use, and then register that pack (in other words, tell the default serializer when your special conventions should be used).

For example, to instantiate an object of the type of ConventionPack, you might use the following snippet:

var conventions = new ConventionPack
{
    new CamelCaseElementNameConvention(),
    new DecimalRepresentationConvention(BsonType.String),
    new DateOnlyRepresentation(BsonType.DateTime),
    new LocalDateRepresentation(BsonType.DateTime)
};

The first convention, CamelCaseElementNameConvention, will tell the default serializer to put all element names in camel case. This is a predefined convention. All other conventions are defined in the example and represent custom conventions. DecimalRepresentationConvention will tell the default serializer to serialize all decimal properties as a decimal rather than a string, which is the default option. Similar to this are DateOnlyRepresentation and LocalDateRepresentation. When working with ClassMap, you should connect your class with the convention pack. This is usually accomplished as shown in the following code snippet:

ConventionRegistry.Register("Test", conventions,
type => type.FullName != null && type.FullName.Contains("TestSerializerWithOutAttributes"));
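
The custom conventions above are defined in the article’s solution; for illustration, here is a minimal sketch of what a convention such as DecimalRepresentationConvention might look like (an assumption built on the driver’s convention API, not the article’s exact code):

using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Conventions;
using MongoDB.Bson.Serialization.Serializers;

public class DecimalRepresentationConvention : ConventionBase, IMemberMapConvention
{
    private readonly BsonType _representation;

    public DecimalRepresentationConvention(BsonType representation)
    {
        _representation = representation;
    }

    // Apply is called once for every mapped member; only decimal members are touched
    public void Apply(BsonMemberMap memberMap)
    {
        if (memberMap.MemberType == typeof(decimal))
        {
            memberMap.SetSerializer(new DecimalSerializer(_representation));
        }
    }
}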

In the Serialization folder, there is a class named TestTypes as well. Please change the startup object to TestTypes. This example shows how complex types are transformed into JSON (BSON). This includes .NET native types like generic collections, as well as classes that inherit from other classes. When a class inherits from another class, a special field _t that represents the type is rendered, as shown in the image below, surrounded with red.

As a takeaway from this section: it’s possible to decorate the class with attributes, work with ClassMap, and, finally, work with ConventionPack.

The API is very easy to use and very hard to misuse when working with serialization. Now it is time to talk about collections. How are they designed? The next section is about schema design.

Schema design

One of the documents in the adventureWorks2016 collection looks as shown in the image below.

As you’ll notice, a detail array that represents details of an order is embedded in the document. In the SQL world, details of an order are put in a separate table known as Sales.SalesDetail. You could do the same thing in MongoDB, e.g., put details in a separate collection, but as you may recall from the introduction, in the NoSQL world, the shape of the data is determined by the application itself. There’s a good chance that when you are working with the sales data, you probably need sales details. The decision about what to put in the document is pretty much determined by how the data is used by the application. The data that is used together as sales documents is a good candidate to be pre-joined or embedded.

One of the limitations of this approach is the size of the document. It should be a maximum of 16 MB.

Another approach is to split data between multiple collections, which is also used in the article solution. For example, details about products and special offers are separated into other collections. One of the limitations of this approach is that there are no constraints in MongoDB, so there are no foreign key constraints either. The database does not guarantee consistency of the data. It is up to you as a programmer to take care that your data has no orphans.

Data from multiple collections could be joined by applying the lookup operator, as I’ll show in the section that talks about aggregations. But, a collection is a separate file on disk, so seeking on multiple collections means seeking from multiple files, and that is, as you are probably guessing, slow. Generally speaking, embedded data is the preferable approach.

The underlying CLR class to work with the adventureWorks2016 collection is SalesHeader. Its definition is located in the Pocos solution folder.

In the class definition, the detail is represented as shown in the following snippet.

public List<SalesDetail> Details { get; set; }

It is a generic List of SalesDetail. The SalesDetail class mimics the Sales.SalesDetail table. The CLR class to work with the product collection is Product and, to work with the spetialOffer collection, a class SpetialOffer is designed. The source code that shows you how these collections are generated is located in the Loaders folder.
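
For orientation, a trimmed-down sketch of what these classes might look like; the real classes in the Pocos folder contain more properties, and the property names here are assumptions based on the AdventureWorks columns used later in the article:

using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class SalesHeader
{
    [BsonId]
    public int SalesOrderId { get; set; }
    public int CustomerId { get; set; }
    public int TerritoryId { get; set; }
    public int? SalesPersonId { get; set; }
    public DateTime DueDate { get; set; }

    [BsonRepresentation(BsonType.Decimal128)]
    public decimal TotalDue { get; set; }

    // Order details are embedded rather than stored in a separate collection
    public List<SalesDetail> Details { get; set; }
}

public class SalesDetail
{
    public int ProductId { get; set; }
    public int SpecialOfferId { get; set; }
    public short OrderQty { get; set; }

    [BsonRepresentation(BsonType.Decimal128)]
    public decimal LineTotal { get; set; }
}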

The image below displays the content of the spetialOffer collection.

The spetialOffer collection will be used in the next section, which talks about C(reate), R(ead), U(pdate) & D(elete) operations.

CRUD Operations

To follow this section, change the startup object of the article’s solution to CrudDemo. This example demonstrates how to add, modify and, finally, delete a couple of documents in the spetialOffer collection.

As you may recall from the previous section, the spetialOffer collection has IDs from 1 to 16. If you try to add an ID that already exists, a run-time exception will occur. For example, you can define an object of type SpetialOffer, as shown in the following snippet:

var  so = new SpetialOffer
{
    SpecialOfferId = 1,
    Description = "Test inserting one",
    Type = "New Product",
    Category = "Reseller"
};
await collection.InsertOneAsync(so);
//A write operation resulted in an error.E11000 duplicate key error collection: simpleTalk.spetialOffer index: _id_ dup key: { : 1 }

Running this will return an exception with the message shown in the snippet. Finally, try inserting a document that has an ID of 20.

In order to get the document, the Find extension of IMongoCollection is used as shown in the following snippet.

var result = await collection.Find(x => x.SpecialOfferId == IdToAdd).ToListAsync();

To insert many documents into a collection, you have to pass an enumerable collection of documents to the InsertMany method. InsertMany is an extension of IMongoCollection. To insert documents with IDs 30 and 31 into the collection, you could execute the code snippet shown below.

var so = new[]
         {
         new SpetialOffer
             {
             SpecialOfferId = IdToAddMany,
             Description = "Test inserting many 1",
             Type = "New Product",
             Category = "Reseller"
            },
           new SpetialOffer
           {
             SpecialOfferId = IdToAddMany2,
             Description = "Test inserting many 2",
             Type = "New Product",
             Category = "Reseller"
           }
        };
var imo = new InsertManyOptions
        {
          IsOrdered = false
        };
await collSpetialOffer.InsertManyAsync(so,imo);

One interesting thing to notice is the second parameter of InsertManyAsync. It is an object of type InsertManyOptions. Particularly, its property IsOrdered is interesting. When set to false, the insertion process will continue on error.

There are two kinds of updates. There is a replace extension, which replaces the entire document, and there is an update extension, which updates just a particular field or fields in the document. When replacing a document, first you have to specify a filter function to find the document(s) to be replaced. It is not possible to change the ID during replacement. If you try to do that, an exception will be thrown. If you specify a condition that does not match any documents, the default behavior is to do nothing.

Usually, if you want to replace one document, a snippet like following is used:

// It is not possible to change ID !!!
// if we specify the condition that has no matching document default behavior is to do nothing!
var so = new SpetialOffer
{
   SpecialOfferId = IdToAdd,
   Description = "NEW DESCRIPTION",
   Type = "NEW TYPE",
   Category = "NEW Reseller"
};
await collection.ReplaceOneAsync(x => x.SpecialOfferId == IdToAdd, so);

ReplaceOneAsync is an extension of IMongoCollection and represents a higher-order function that takes another function, a lambda expression, as a parameter. One of the parameters that is not provided in the above snippet is UpdateOptions. In my opinion, a better name would be ReplaceOptions because it is used in the context of replacing. In that class, the property IsUpsert is particularly interesting. When it is set to true, an insert is made if the filter condition does not match any document.

When updating the document, usually you will execute a snippet like that shown below

private const int IdToAdd = 20;
-----
var ud = Builders<SpetialOffer>.Update.Set("Description", "Descripton from update");
await collection.UpdateOneAsync(x => x.SpecialOfferId == IdToAdd, ud);

In the above snippet, I’m using an example with Builders. Builders are classes that make it easier to use the CRUD API. Builders help define an update condition. This time, only part of a document has been changed. Another similar CRUD extension of IMongoCollection is the Delete extension. For example, the following snippet will delete the document with ID 20.

var result = await collSpetialOffer.DeleteOneAsync(x => x.SpecialOfferId == IdToAdd);
Console.WriteLine($"Is deleted completed : {result.IsAcknowledged}");

There are more extensions, like FindOneAndUpdate, FindOneAndReplace, FindOneAndDelete, and so on; for example:
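
A minimal sketch of FindOneAndUpdateAsync, which applies an update and returns the affected document in a single round trip (the ID and description values are just examples):

var filter = Builders<SpetialOffer>.Filter.Eq(x => x.SpecialOfferId, 20);
var update = Builders<SpetialOffer>.Update.Set(x => x.Description, "Updated by FindOneAndUpdate");
var options = new FindOneAndUpdateOptions<SpetialOffer>
{
    ReturnDocument = ReturnDocument.After   // return the document as it looks after the update
};
var updated = await collection.FindOneAndUpdateAsync(filter, update, options);
Console.WriteLine(updated?.Description);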

Historically, there were no transactions in MongoDB, but there are so-called atomic operations: any write operation on a single document is guaranteed to be atomic, that is, not breakable. Starting with MongoDB 4.0, multi-document transactions are available, but they are limited to replica sets. (NOTE: A replica set in MongoDB is a group of mongod processes that maintain the same data set.)

In the article’s solution, there is an example that uses a transaction, named InsertOneWithTransaction. It is commented out but, in short, in order to use a transaction, a session object should be created in the context of the MongoDB client. Then the session object starts the transaction, and the session object is passed to the CRUD methods (extensions), as shown in the snippet below.

// returns an IClientSessionHandle object
var session = SampleConfig.Client.StartSession();
// start the transaction
session.StartTransaction(new TransactionOptions(readConcern: ReadConcern.Snapshot,
                                                writeConcern: WriteConcern.WMajority));
try
{
    // Note that we have to pass the session object.
    // Inserting the same document twice here fails with a duplicate key error,
    // which sends execution to the catch block.
    await collection.InsertOneAsync(session, so);
    await collection.InsertOneAsync(session, so);
    // the transaction is committed
    session.CommitTransaction();
}
catch (Exception ex)
{
    // the transaction is aborted
    session.AbortTransaction();
}

All examples in this section use the async stack and the TPL (task parallel library). There is a sync stack as well, but the async stack should be considered the modern way of programming and was introduced with driver version 2.X.

In this section, I just briefly mentioned how to read and filter data. The next section will talk more about how to filter data.

Filtering

To follow this section, set FindWithCursor as the startup object for the article solution. In this example, I query documents where

  • TerritoryId equals 1,
  • SalesPersonId equals 283,
  • TotalDue is greater than 90000,
  • limiting the number of documents returned to 5, and
  • sorting the result ascending by DueDate.

The task is accomplished by utilizing Find, an extension of IMongoCollection. Find is defined as shown in the image below.

It returns an IFindFluent, which is a context type for method chaining when searching documents. Other methods, like Sort, Limit, and Project, also return IFindFluent, which is how the chaining continues.

Find takes two parameters, a lambda expression and an object of type FindOptions. Among other properties, FindOptions defines the batch size. By using Find, you can get the result in chunks. If you limit the number of documents to 5, a total of 3 batches is returned to the client. The complete code is shown in the image below.

Getting the next batch is accomplished by invoking cursor.Result.MoveNextAsync. Inside that batch, you can iterate through the documents by processing cursor.Result.Current. The benefit of that approach is that if you get a large number of documents as a result, you can process them in batches, which will use less memory.
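
Since the full code lives in the FindWithCursor example, here is only a condensed sketch of what the query described above might look like (the property names, and a batch size of 2 that matches the three batches mentioned above, are assumptions):

var findOptions = new FindOptions { BatchSize = 2 };

var query = collection.Find(x => x.TerritoryId == 1
                              && x.SalesPersonId == 283
                              && x.TotalDue > 90000, findOptions)
                      .SortBy(x => x.DueDate)
                      .Limit(5);

using (var cursor = await query.ToCursorAsync())
{
    while (await cursor.MoveNextAsync())        // fetches the next batch
    {
        foreach (var order in cursor.Current)   // documents in the current batch
        {
            Console.WriteLine($"{order.SalesOrderId} {order.TotalDue}");
        }
    }
}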

There is an option to get all the results at once by invoking ToListAsync(). In that case, all returned documents live in memory. There is also an option to process the cursor using ForEachAsync, as shown in the image below; a short sketch follows.
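
A minimal ForEachAsync sketch (same assumed property names as above):

await collection.Find(x => x.TotalDue > 90000)
                .SortBy(x => x.DueDate)
                .Limit(5)
                .ForEachAsync(order => Console.WriteLine($"{order.SalesOrderId} {order.TotalDue}"));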

Invoking cursor.ToString() will return a MongoDB shell query. I did not find a proper way to get the query plan in the code; it was possible before the 2.7 release of the .NET driver. In the article’s solution, there is an example of how to get the query plan from the code. The example is named Explain and is located in the Filtering folder.

To get the execution plan, you could save the query text and execute in the context of the Mongo shell. I saved the query in a file QueryUsingCursor.js.

So, if you append explain() before find, you will be able to see the execution plan.

Explain receives a parameter that describes what the type of output should be. The parameter specifies the verbosity mode for the explain output. The mode affects the behavior of explain() and determines the amount of information to return. The possible modes are queryPlanner, executionStats, and allPlansExecution.

The example uses executionStats. The plan is shown in the image below.

As you can see, to return five documents, you have to process all the documents in the collection. That is where the index comes into play. To create an index, you should execute a command shown in the image below, in the context of the simpleTalk database.

The rule for index creation is to put the fields that participate in filtering first and then the fields that are included in sorting.

An index could be created in the foreground which is the default. What does it mean? MongoDB documentation states: ‘By default, creating an index on a populated collection blocks all other operations on a database. When building an index on a populated collection, the database that holds the collection is unavailable for reading or write operations until the index build completes. Any operation that requires a read or writes lock on all databases will wait for the foreground index build to complete’. This does not sound good.

For potentially long-running index building operations on standalone deployments, the background option should be used. In that case, the MongoDB database remains available during the index building operation.

To create an index in the background, the following snippet should be used. There is no need to create the index again. This index is small, and its creation takes a few seconds.
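
For reference, a minimal sketch of creating the same kind of background index from the .NET driver; the key fields are assumptions based on the query described earlier, and the shell’s createIndex command with { background: true } achieves the same thing:

var keys = Builders<SalesHeader>.IndexKeys
    .Ascending(x => x.TerritoryId)
    .Ascending(x => x.SalesPersonId)
    .Ascending(x => x.TotalDue)
    .Ascending(x => x.DueDate);

// Background = true keeps the database available while the index is built
await collection.Indexes.CreateOneAsync(keys, new CreateIndexOptions { Background = true });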

In MongoDB, there is no need to rebuild indexes. The database itself takes care of everything. Great!

Let’s get back to the main topic about the query plan. After the index is created, examine the execution plan again.

As you can see, highlighted with yellow, the execution plan looks much better now. Only five documents are examined.

Another example located in the Filter folder named FilterHeader uses the MongoDB aggregation framework to query the document, which is the next section. To explore this example, change the startup object to be FilterHeader, and you will receive the output as shown in the image below.

Aggregation

Aggregation operations process some input data (in the case of MongoDB, documents) and return computed results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation:

  • the aggregation pipeline,
  • map-reduce, and
  • single purpose aggregation methods.

I will write mostly about the first by introducing a common problem in the SQL world that is called TOP n per group. The single purpose aggregation methods are briefly mentioned in the first example that connects to MongoDB when Count was introduced. Map-reduce was the only way to aggregate in prior versions of MongoDB.

In MongoDB, there is a difference between querying embedded documents, such as the sales details, and querying the main document. In the first case, the unwind operator must be introduced. So, I decided to include the same problem twice.

The first example finds the top N (one) customer per territory that has the greatest Total Due and then limits the result to those territories and customers where the sum of Total Due is greater than the Limit (a defined number). The result should be sorted on Sum of Total Due descending order. The second example does something similar with special offer and products.

There are a couple of ways to accomplish this task in SQL. In the article’s solution, I include two T-SQL scripts, located in TSQL folder. One for querying the header table named QueryingHeader.sql, and one for querying the detail table named QueryingDetail.sql.

I provided three possible ways to solve the problem in T-SQL, by using T-SQL window functions and the APPLY operator.

The results of querying the header and the detail tables are displayed in the image below. In the first case, the limit is 950,000, and in the second case, the limit is 200,000.

To see how the problem is solved in MongoDB, switch the startup object of the article’s solution to AggregationSales.

The class demonstrates three ways to accomplish the same task:

  • Using IAggregateFluent
  • Using LINQ
  • Parsing BsonDocument (MongoDB’s shell-like syntax)

The first way is shown in the snippet below.

var query = collection.Aggregate(aggregationOptions)
    .Group(x => new { x.TerritoryId, x.CustomerId },
           g => new { TerritoryIdCustomerId = g.Key, TotalDue = g.Sum(x => x.TotalDue) })
    .SortBy(x => x.TotalDue)
    .Group(x => x.TerritoryIdCustomerId.TerritoryId, g => new
    {
        TerritoryId = g.Key,
        MaxCustomer = g.Last().TerritoryIdCustomerId.CustomerId,
        MaxTotal = g.Last().TotalDue
    })
    .Match(x => x.MaxTotal > Limit)
    .Project(x => new
    {
        x.TerritoryId,
        MaxCust = new { Id = x.MaxCustomer, Total = x.MaxTotal },
    })
    .SortByDescending(x => x.MaxCust.Total);

var queryToExplain = query.ToString();

var result = await query.ToListAsync();

First, you have to notice that everything in the snippet above is strongly typed!

Then, you should notice that Aggregate is an extension of IMongoCollection. Aggregate returns an IAggregateFluent<TDocument>. All other operators, like SortBy, Group, Match, Project, and so on, do the same thing: they are extensions of IAggregateFluent and return IAggregateFluent<TDocument>. This is how method chaining is accomplished, which is one of the defining characteristics of a fluent API. So, again, I will repeat: the API is easy to use and difficult to misuse, and every method name is self-documenting.

Take a look at IAggregateFluent in the image below. Surrounded with red is a read-only property named Stages.

It represents every operation performed on the IMongoCollection. If you set a breakpoint on the line

var queryToExplain = query.ToString();

and examine the query variable, you will notice that there are six stages defined, as shown in the image below.

Every stage is of type IPipelineStageDefinition and exposes its input and output types as well as the shell operator name, as shown in the image below. The .NET driver maps perfectly to the MongoDB shell!

To capture the whole query, execute query.ToString(). The query is saved in the article’s solution in the file Query.js, located in the Aggregation folder.

The second way to accomplish the same task is to use LINQ-style syntax. To do that, the MongoDB.Driver.Linq namespace must be included. This time the query looks like the one shown in the snippet below.

var pipeline2 = collection.AsQueryable()
    .GroupBy(z => new { z.TerritoryId, z.CustomerId })
    .Select(g => new { TerritoryIdPlusCustomerId = g.Key, TotalDue = g.Sum(x => x.TotalDue) })
    .OrderBy(o => o.TotalDue)
    .GroupBy(o => o.TerritoryIdPlusCustomerId.TerritoryId)
    .OrderBy(g => g.Key)
    .Select(g => new
    {
        TerritoryId = g.Key,
        MaxCustomer = new { Name = g.Last().TerritoryIdPlusCustomerId.CustomerId, g.Last().TotalDue }
    })
    .Where(x => x.MaxCustomer.TotalDue > Limit)
    .OrderByDescending(x => x.MaxCustomer.TotalDue);

As you can see, this time the AsQueryable extension is used. It is an extension of IMongoCollection and returns an instance of IMongoQueryable. Every method also returns IMongoQueryable, and this is how method chaining is accomplished, this time with LINQ-style syntax.

Method names are similar to those of IAggregateFluent. Instead of Group, there is GroupBy; instead of Sort, there is OrderBy; instead of Project, there is Select; instead of Match, there is Where, and so on.

You can also use LINQ query (SQL-like) syntax to accomplish the same task, as shown in the snippet below.

var pipeline = from o in
                   from z in collection.AsQueryable()
                   group z by new { z.TerritoryId, z.CustomerId } into g
                   select new { TerritoryIdPlusCustomerId = g.Key, TotalDue = g.Sum(x => x.TotalDue) }
               orderby o.TotalDue
               group o by o.TerritoryIdPlusCustomerId.TerritoryId into g
               orderby g.Key
               select new
               {
                   TerritoryId = g.Key,
                   MaxCustomer = new { Name = g.Last().TerritoryIdPlusCustomerId.CustomerId, g.Last().TotalDue }
               };

// filter by limit and apply descending order
pipeline = pipeline.Where(x => x.MaxCustomer.TotalDue > Limit)
                   .OrderByDescending(x => x.MaxCustomer.TotalDue);

There is also a way to pass BsonDocument stages to the pipeline. In the solution, the method is named UsingMongoShellLikeSyntax.
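To give a rough idea of what that looks like, a pipeline built from BsonDocument stages could be assembled as in the sketch below. This is illustrative only: the stages are simplified, the element names assume the default mapping of the SalesHeader class, and it is not the exact pipeline used in UsingMongoShellLikeSyntax.

// A simplified, illustrative pipeline: group by TerritoryId and CustomerId,
// sum TotalDue, and keep only groups above a limit.
var stages = new BsonDocument[]
{
    new BsonDocument("$group", new BsonDocument
    {
        { "_id", new BsonDocument { { "TerritoryId", "$TerritoryId" },
                                    { "CustomerId", "$CustomerId" } } },
        { "TotalDue", new BsonDocument("$sum", "$TotalDue") }
    }),
    new BsonDocument("$match", new BsonDocument("TotalDue",
        new BsonDocument("$gt", 950000)))
};

// A BsonDocument[] converts implicitly to a PipelineDefinition.
var docs = await collection.Aggregate<BsonDocument>(stages).ToListAsync();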

The result from all three methods is displayed on the console output and shown in the image below.

 

At this point, I have to make two digressions. First, I briefly describe the lookup operator and then the unwind operator.

As you may recall from the schema design discussion, the preferred way to work with data is embedded documents, but there are situations when you have to join data from a different collection. In such situations, the lookup operator will help you.

Switch the solution’s startup object to SampleLookup, located in the Aggregation folder. This example creates two collections, one that contains names and the other that contains the meanings of the names. Of course, the only reason I am doing it like this is to show how the lookup operator works.

If you use Visual Studio’s Peek Definition, it will show what to expect from Lookup, as shown in the image below, surrounded with red.

The first Lookup is an extension of IMongoCollection. It executes in the context of the collection and requires the foreign collection as the first parameter, then the local field on which the relation is established, then the foreign field that completes the relation, and finally the result type. Unfortunately, the result type cannot be an anonymous type (or at least I did not discover how to make it one). Because more than one element may be returned from the foreign collection, the operator always expects an array to be returned.

In this case, it looks like the code shown in the snippet below.

var result = await colPerson.Aggregate()
    .Lookup<Person, NameMeaning, LookedUpPerson>(
        collNameMeaning,      // foreign collection
        x => x.FirstName,     // local field
        y => y.Name,          // foreign field
        x => x.Meanings)      // field on the result type that receives the array
    .ToListAsync();

And the result is displayed on the console window as shown in the image below.
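Because the result type cannot be anonymous, concrete classes have to be defined for this example. The following is only a sketch of how they might be shaped; the actual definitions live in the article’s solution and may differ.

// Hypothetical sketches of the types used by the Lookup example above.
public class Person
{
    public ObjectId Id { get; set; }
    public string FirstName { get; set; }
}

public class NameMeaning
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
    public string Meaning { get; set; }
}

public class LookedUpPerson
{
    public ObjectId Id { get; set; }
    public string FirstName { get; set; }
    public IEnumerable<NameMeaning> Meanings { get; set; }  // Lookup always fills an array
}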

The MongoDB documentation states the following about unwind: ‘Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.’

That means if you want to aggregate on embedded documents, you have to promote them to the top level. Here’s an example: if you would like to aggregate the sales details, an array of documents embedded in the main document, you have to use unwind. An example of the basic usage of the unwind operator can be found in the SampleUnwind solution file.

If you take just one document in the adventureWorks2016 collection by applying Limit(1) and then apply Unwind on the Details field, the output will produce as many rows as there are embedded documents in the Details field. The source code to accomplish such a task is displayed in the snippet below.

var collection = SampleConfig.Collection;
var result = await collection.Aggregate()
    .Limit(1)
    .Unwind<SalesHeader, SalesDetailHelper>(x => x.Details)
    .Project(x => new { x.Details.SalesOrderDetailId })
    .ToListAsync();

foreach (var example in result)
{
    Console.WriteLine($"{example.SalesOrderDetailId.ToString().PadRight(10)}");
}

And finally, make AggregationUnwind the startup object of the article’s solution. There are five tests in this class: two of them use LINQ, two of them use IAggregateFluent, and one parses BsonDocument, as shown in the image below. The result is always the same and equals what is returned by executing the T-SQL script in SSMS.

I found using LINQ-style coding extremely easy. Unwind in LINQ-style syntax is simply a SelectMany, as shown in the image below, highlighted with yellow.
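As a rough sketch of the idea (not the exact code from AggregationUnwind; the detail field names SpecialOfferId, ProductId, and LineTotal are assumptions used only for illustration):

// SelectMany flattens the embedded Details array (translated to $unwind),
// after which the detail documents can be grouped like top-level documents.
var details = collection.AsQueryable()
    .SelectMany(x => x.Details)
    .GroupBy(d => new { d.SpecialOfferId, d.ProductId })
    .Select(g => new { g.Key, Total = g.Sum(d => d.LineTotal) })
    .OrderByDescending(x => x.Total)
    .ToList();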

Using IAggregateFluent is a little more difficult, especially when working with Unwind. Unwind has three different overloads: one is obsolete, one uses BsonDocument as the output, which breaks strongly typed code, and the last one is a little confusing.

I expected that the result of the Unwind operation could be an anonymous type. Unfortunately, it must be a concrete class that you define, as shown surrounded with red in the image below.

And SalesDetailHelper is defined as shown in the image below.

Pretty simple, but still a little annoying; it would be nice if you could use an anonymous type.
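For reference, the image essentially shows a class shaped like the sales header, except that Details holds a single element instead of an array. A rough, hypothetical sketch follows; only Details and SalesOrderDetailId come from the code above, and the other members are assumptions.

// After $unwind, Details is a single detail document, not an array.
public class SalesDetailHelper
{
    public ObjectId Id { get; set; }
    public SalesDetail Details { get; set; }
}

public class SalesDetail
{
    public int SalesOrderDetailId { get; set; }   // used in the projection above
}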

The aggregation framework has limitations. Each stage has a limit of 100 MB of RAM. If a stage exceeds the limit, MongoDB produces an error.

The funny thing about this limitation is that I was unable to find out how much memory is used per stage. Thanks to a moderator in the MongoDB forums, I finally learned that there is currently no way to get that information; it will be available in the next release. As a workaround, you can set the optional allowDiskUse parameter to true to enable writing data to temporary files, as in the following example:

db.adventureWorks2016.aggregate( [
      { $project : .... },
      { $sort : ... }
   ],
   { allowDiskUse: true }
)
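In the .NET driver, the same option can be passed through AggregateOptions, as in the minimal sketch below. This is presumably how the aggregationOptions variable used earlier in the article was built, although that is an assumption.

// Allow the server to spill stage data to temporary files instead of failing
// when a stage exceeds the 100 MB memory limit.
var aggregateOptions = new AggregateOptions { AllowDiskUse = true };

var grouped = await collection.Aggregate(aggregateOptions)
    .Group(x => x.TerritoryId,
           g => new { TerritoryId = g.Key, Total = g.Sum(x => x.TotalDue) })
    .ToListAsync();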

Another limitation concerns the ability to use indexes. I tried to avoid a collection scan when using the group operator, but unfortunately, the group operator cannot take advantage of indexes; it always performs a scan. Furthermore, if any stage in the aggregation pipeline cannot benefit from an index, none of the stages that follow will use indexes either.

Besides that, everything seems to be excellent.

Summary

There is no doubt that MongoDB is a great product. Although I still miss some features that exist in the SQL world, my general impression is excellent. I can say the same about the .NET driver: it is easy to use, every method (extension) emphasizes the intent of the code and lets you write less of it, and everything is where you expect it to be.

Although the article moves at the speed of light, I hope that it, together with the article’s solution, is enough to encourage readers to start exploring MongoDB and the .NET driver.

The post How to Program with MongoDB Using the .NET Driver appeared first on Simple Talk.




SOX and Database Administration – Part 3

The series so far:

  1. Introduction to HIPAA and SOX — Part 1
  2. HIPAA and Database Administration — Part 2
  3. SOX and Database Administration — Part 3

The U.S. Congress passed the Sarbanes-Oxley Act of 2002 (SOX) in response to the number of financial scandals surrounding major corporations such as Enron and WorldCom. By regulating financial reporting and other practices, the SOX legislation significantly expanded the role of the Securities and Exchange Commission (SEC) in its ability to oversee U.S. public companies.

The new law sought to improve the accuracy and reliability of corporate financial disclosures and to force companies to be more transparent. To this end, the law also established the Public Company Accounting Oversight Board (PCAOB) to oversee corporate auditing practices and to establish the rules, standards, and quality control mechanisms that govern financial reporting.

The SOX regulations are arranged into 11 titles that are broken down into individual sections. Each section defines a different aspect of a corporation’s responsibilities for achieving financial transparency and avoiding fraudulent practices. Corporate officers are legally bound to carry out the law’s requirements and ensure the accuracy and completeness of all financial disclosures.

Although officers carry the brunt of the responsibility, complying with the SOX regulations is a companywide effort, with most departments involved in the process. For many organizations, however, much of their financial data resides in databases, putting DBAs at the front lines for protecting data, ensuring its integrity, and supporting the internal controls required for SOX compliance.

Complying with SOX Regulations

The SOX regulations leave it up to the corporation to figure out the best methods to use to comply with the law. DBAs working for a public company should familiarize themselves with the regulations, particularly those sections most relevant to compliance. Although much of the focus around SOX compliance centers on sections 302 and 404, several other sections are also important to gaining insight into the regulations, including sections 401, 408, and 409.

Section 302

Section 302 describes the corporation’s responsibilities when filing quarterly and annual SEC financial reports. The section states that signing officers must certify that the reports do not contain any “untrue statement of a material fact or omit to state a material fact” that would result in a misleading report. In addition, all material included in the report must fairly present the “financial condition and results of operations” for the applicable period of the report.

The main thrust of this part of Section 302 is that all financial SEC reports must be complete, accurate, and in no way misrepresent the organization’s financial condition. To ensure this compliance, Section 302 also outlines the corporation’s responsibilities for establishing and maintaining internal controls for financial reporting as well as for evaluating and reporting on the effectiveness of those controls.

In addition, signing officers must disclose to outside auditors and the internal auditing committee “all significant deficiencies in the design or operation of internal controls,” as well as any fraud that “involves management or other employees who have a significant role in the issuer’s internal controls.” Plus, the corporation must report any significant changes or other factors that could impact the controls.

The requirements for complete and accurate information, along with having internal controls in place, have serious implications for DBAs who manage database environments that contain financial information. The data’s integrity, security, and availability must be ensured at all times and the necessary mechanisms put into place to support an internal control structure.

Section 401

Section 401 identifies additional disclosures that corporations must include in their quarterly and annual SEC financial reports. For example, each report must “reflect all material correcting adjustments that have been identified by a registered public accounting firm.” In addition, the section states that each report must include the following:

…all material off-balance sheet transactions, arrangements, obligations (including contingent obligations), and other relationships of the issuer with unconsolidated entities or other persons, that may have a material current or future effect on financial condition, changes in financial condition, results of operations, liquidity, capital expenditures, capital resources, or significant components of revenues or expenses.

In other words, corporate officers must fully disclose all corrections identified by outside auditors as well as relevant relationships that might impact the organization’s financial picture.

Section 401 also states that all financial information included in the SEC reports or in “any public disclosure or press or other release” shall not contain untrue statements or omit the facts necessary to understanding the corporation’s financial condition. Not only must corporations provide an accurate accounting to the SEC, but also to the public at large, again putting the pressure on DBAs to ensure the accuracy, security and availability of financial data.

Section 404

Section 404 is specific to managing the assessment of internal controls. According to this section, each financial SEC report must include an internal controls report that states the management’s responsibility for “establishing and maintaining an adequate internal control structure and procedures for financial reporting.” In addition, the internal controls report must include an assessment of the effectiveness of the internal control structure and procedures.

Section 404 also states that registered public accounting firms that prepare or issue internal control reports for the corporation must “attest to, and report on, the assessment made by the management of the issuer.”

As noted earlier, Section 302 outlines the requirements for an internal control structure. Not surprisingly, the process of establishing such a structure will likely involve the database team either directly or indirectly, at least as it pertains to the database environment. Because of Section 404, the team will just as likely have to support an auditor’s ability to assess the effectiveness of that system. For example, if an employee with direct database access has left the company, an auditor should be able to see when and if the user’s account has been deleted from the database system. The good news here is that an effective internal control structure will likely have the mechanisms in place to support this type of verification.

Section 408

On the surface, Section 408 is more about SEC responsibilities than those of the corporation. The section states that the SEC must review the financial disclosures submitted by a corporation on a “regular and systematic basis for the protection of investors,” no less frequently than once every three years. However, except for the three-year limit, the frequency with which financial records are reviewed is left to the discretion of the SEC.

That said, Section 408 does provide several guidelines for when a review might be warranted, such as a corporation’s stock prices being more volatile than those of other corporations. Of these guidelines, the final one is perhaps the most significant, stating that the SEC should consider a review for “any other factors that the [SEC] may consider relevant.” In other words, the SEC has free rein in deciding when to review an organization’s financial records, as long as it’s done at least once every three years.

The implication here is that a corporation could be faced with an SEC audit at any time, in which case the DBA must ensure that the systems are in place to handle an audit whenever it might occur. SEC auditors require vast amounts of information, not only financial data, but also details about the internal control structure and the mechanisms in place to assess that structure. If the database team and the rest of the corporation are not prepared for an audit, they could find themselves working 20-hour days trying to pull everything together and still come up short.

Section 409

Section 409 is specific to real-time disclosures and is short and to the point:

Each issuer reporting under section 13(a) or 15(d) shall disclose to the public on a rapid and current basis such additional information concerning material changes in the financial condition or operations of the issuer, in plain English, which may include trend and qualitative information and graphic presentations, as the Commission determines, by rule, is necessary or useful for the protection of investors and in the public interest.

The bottom line here is that the corporation must keep investors and the public informed about significant changes in the corporation’s operations or financial status “on a rapid and current basis.” For the DBA, this means ensuring the continual integrity and availability of the data so the corporation’s financial department can stay abreast of any significant changes and report on them immediately.

SOX Compliance and the DBA

The SOX law contains many more sections than those summarized above, and the sections I’ve summarized contain more information than what I’ve covered here. Before taking any steps to address SOX compliance, DBAs should understand exactly what the law expects and their role in the corporation’s larger compliance effort. The summaries I’ve provided here are meant only as a starting point for understanding how the regulations might impact database administration and what it will take to comply with them.

Not surprisingly, the exact steps that database teams need to take to achieve compliance will depend on the size of their organizations, the amount of financial data stored in their databases, how their teams and infrastructures are organized, and other important factors. Despite these variables, they should keep in mind the following six guidelines when planning how they’ll approach SOX compliance and the steps they’ll take to get there.

Defining Policies, Standards, and Procedures

Applies to sections 302, 404, 408 and 409 in the SOX regulations.

The database team should start by defining specifications that describe how the database environment will be protected and how the internal control structure will be implemented and assessed. The specifications should cover all aspects of database security and accountability. For example, they should include a security model that defines who can access what financial data, how user accounts are added and removed, what groups and roles will be set up, how users will be authenticated, and any other details that govern database security. The specifications should also address such issues as database coding practices, monitoring strategies, audit-response planning, and risk assessments.

Ensuring Data Integrity

Applies to sections 302, 401, and 409 in the SOX regulations.

A corporation must ensure that all financial data is accurate and complete. If that data is stored in databases, the database team must guarantee the data’s integrity, starting with normalized database designs that eliminate duplicate data and minimize the risks from data modification errors. In addition, the team should implement primary keys, foreign keys, unique indexes, defaults, triggers and other types of constraints as necessary to ensure integrity. The team might also need to work with application developers to verify that code reviews address query-related issues such as implicit data conversions or the use of zeroes or nulls for default values.

Ensuring Data Availability

Applies to sections 302, 401, 408, and 409 in the SOX regulations.

DBAs must ensure that financial data is available whenever it is required. The financial department should be able to access and update data as needed and permitted. Stakeholders should be able to generate reports to provide accurate financial insights. Full and accurate information should be available to internal, external, and SEC auditors as required. In addition, files and databases that support monitoring, reporting, or internal control components should also be readily available if they’re needed. The database team must implement disaster recovery and high availability strategies that protect against hardware failures, data corruption, natural disasters, cyberattacks, or any incidents that can lead to data being lost or unavailable in the short- or long-term.

Securing the Environment

Applies to sections 302 and 401 in the SOX regulations.

Database teams must ensure that financial data cannot be wrongfully deleted or modified, whether maliciously or inadvertently. To this end, they must take whatever steps necessary to protect the components that make up the database environment, working with other IT administrators as necessary to provide complete protection. Security considerations must include the entire physical infrastructure, including the database servers, storage media, network components, and facilities themselves, as well as database management systems, operating systems, and other supporting software. Database teams should also consider such factors as minimizing attack surfaces, applying security patches, or deleting built-in service accounts in the database system.

Controlling Data Access

Applies to sections 302 and 401 in the SOX regulations.

Part of securing the environment is to control access to the data itself to ensure that it cannot be wrongfully modified or deleted. Integral to this process is the effective management of the accounts and passwords that users and applications require to access the data. Data access should be based on the principles of least privilege so that individuals cannot access any more information than is necessary for them to do their jobs. The accounts should also be managed based on separation of duties to reduce the potential for fraud or error. In addition, only authorized personnel should be able to carry out such tasks as updating database schemas or configuring server settings.

Auditing and Monitoring Systems

Applies to sections 302, 401, 404, 408, and 409 in the SOX regulations.

Monitoring and auditing the database systems is essential to addressing all five of the SOX sections described above. A comprehensive auditing strategy tracks user activity, data and schema modifications, security changes, and other events, helping to reveal both real and potential security threats. Detailed auditing is also integral to meeting the requirements for internal controls and for assessing those controls and determining their effectiveness. Although this level of auditing can impact performance and resource requirements, it must be utilized to its fullest to provide the necessary controls. Fortunately, auditing solutions are available in most major database management systems and include the ability to set up alerts and generate comprehensive reports.

Achieving SOX Compliance

To comply with SOX regulations, DBAs must ensure the integrity, availability, and security of the data and its environment. They must also have in place an effective monitoring strategy to guarantee ongoing protection and meet the requirements for internal controls. The SOX law doesn’t specify how to go about implementing all this, only that it needs to be done. Fortunately for many database teams, much of what the law requires is consistent with the security and management best practices they already have in place.

Even so, complying with the SOX law can be a complex process, and the database team should work closely with other teams to ensure that all regulations are met and nothing slips through the cracks, only to be discovered during an SEC review. Those involved in planning a compliance strategy need to fully grasp how the regulations work and the implications of being out of compliance. The better they understand the law and the more carefully they plan and prepare, the more effectively they can meet the SOX requirements and minimize the risks from an SEC audit.

The post SOX and Database Administration – Part 3 appeared first on Simple Talk.


