Friday, October 30, 2020

10 DevOps strategies for working with legacy databases

Database teams often maintain large and complex legacy databases that they must keep operational to support existing applications. The teams might have maintained the databases for years, or they might have inherited them through acquisitions or other circumstances. Whatever the reasons, they likely manage and update the databases in much the same way they always have, performing their tasks mostly segregated from other efforts. But this approach makes it difficult for organizations to fully embrace modern application methodologies such as DevOps, which rely on more agile and aggressive application delivery processes.

The traditional methods for maintaining legacy databases can slow application delivery, impact productivity, and increase overall costs. Some organizations are starting to incorporate databases into their DevOps processes, but it’s not always an easy goal to achieve. Application and database development have historically been much different, and synchronizing changes can be a significant challenge. Another serious consideration is data persistence, which plays a major role in database updates, but is of little importance to the applications themselves.

Despite these issues, many organizations have come to recognize the importance of incorporating databases into the DevOps model. By applying the same principles to databases as those used for delivering applications, they can better control the entire application development and deployment process, including the databases that go with them. For organizations ready to make the transition, this article offers 10 strategies for incorporating legacy databases into their DevOps pipelines.

1. Choose your database strategy wisely.

Before embarking on your database DevOps journey, give careful consideration to how you should proceed. Start by identifying your business objectives and long-term strategies. In some cases, it might not be worth investing the time and resources necessary to apply DevOps to a legacy database. For example, the database might support applications that will soon be phased out, in which case, you might need to focus on how to migrate the data, rather than continuing to support the legacy database.

For those databases that you’ll continue to support, keep the overall scope in mind. It’s better to take one small step at a time than try to bring every database into the DevOps fold all at once. You should be planning for the long-term and not trying to do everything overnight. You might start with a small database to see how the process goes and then move on from there, or you might deploy a new database so your database team can become familiar with DevOps principles before tackling the legacy databases.

2. Create a DevOps culture that includes the database team.

Discussions around DevOps inevitably point to the importance of creating a culture of collaboration and transparency, one that fosters open communications for all individuals involved in application delivery. DevOps team members should be encouraged to work together and share in the responsibility for application delivery, adopting a mindset in which everyone has a stake in the outcome.

Not that long ago, DevOps teams rarely included database team members, leaving them out of the discussion altogether. For the most part, databases were seen as completely separate entities from the applications they served. However, the only way to successfully incorporate legacy databases into the DevOps process is to ensure that the database team is as much a part of the DevOps effort as the development and operations teams. The better the communications between the database team and everyone else, the smoother the transition to database DevOps.

3. Get the right tools in place.

DevOps teams need the right tools to support the continuous integration/continuous delivery (CI/CD) pipeline common to DevOps deployments. For example, a team might use Chef for configuration management, Jenkins for build automation, Kubernetes for deploying containers, or NUnit for unit testing. The exact tools depend on an organization’s specific requirements and the type of applications they’re deploying. DevOps teams should select their tools carefully, taking into account the growing need to support database deployments.

Initially, DevOps solutions tended to leave databases out of the equation, but that’s been steadily changing. Many DevOps tools now accommodate databases, and many database tools now accommodate DevOps. For example, Redgate Deploy can help incorporate a database into the CI/CD pipeline and can integrate with any CI server that supports PowerShell. Redgate Deploy can also integrate with common CI and release tools, including Jenkins, GitHub, TeamCity, and Azure DevOps.

4. Prepare the database team for DevOps.

DevOps methodologies require special skills to ensure that applications are properly built, tested, and deployed to their various environments. If developers also implement infrastructure as code (IaC), DevOps teams require skills in this area as well. In some cases, an organization might need to bring on additional personnel or outside consultants. For many DevOps teams, however, all they need is enough training and education to get them up and running. This is just as true for the database team as anyone else.

For example, individuals on the database team might focus on specific areas, such as Oracle database development or SQL Server administration. To prepare them for a transition to database DevOps, they should be trained in DevOps methodologies, particularly as they apply to database deployments. In this way, they’ll be much better prepared for the transition and able to understand why specific steps need to be taken. They’ll also be able to more effectively communicate with other DevOps team members about CI/CD operations.

5. Document your databases and data.

When preparing your databases for the transition to DevOps, you should ensure that they’re fully documented so that everyone on the DevOps team can quickly understand how a database’s objects are used and how they impact the application as a whole. Anyone looking at the documentation should be able to understand the types of data being stored, how the data is related, and any special considerations that might apply to the data.

One approach to documentation is to use the features built into the database platform. For example, SQL Server supports extended properties, which can be leveraged to document individual objects. In this way, the data definitions are stored alongside the schema definitions, providing immediate access to the information when needed. Better still, the extended properties also get checked into source control when the schema files are checked in (assuming that source control is being used, as it certainly should be).
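
To make that concrete, here is a minimal, hedged sketch of documenting a column with an extended property; the Sales.Orders table and OrderStatus column are hypothetical examples, not objects from a real database:

-- Attach a description to a hypothetical Sales.Orders.OrderStatus column
EXEC sys.sp_addextendedproperty
     @name = N'MS_Description',
     @value = N'Order lifecycle state: 1 = Open, 2 = Shipped, 3 = Cancelled',
     @level0type = N'SCHEMA', @level0name = N'Sales',
     @level1type = N'TABLE',  @level1name = N'Orders',
     @level2type = N'COLUMN', @level2name = N'OrderStatus';
GO

-- List the descriptions that have been added to the table's columns
SELECT objname, value
FROM sys.fn_listextendedproperty(N'MS_Description',
     N'SCHEMA', N'Sales', N'TABLE', N'Orders', N'COLUMN', DEFAULT);
GO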

6. Get databases into version control.

Many database teams are already putting their database script files into source control for maintaining file versions, often along with lookup or seed data. The databases might not participate in DevOps processes, but the source files are still protected. If a database team is not using source control, they’ll need to start. Version control is an essential part of DevOps methodologies. It offers one source of truth for all code files, resulting in fewer errors or code conflicts. It also provides a record of all changes, and it makes it possible to roll back changes to a previous version.

Database teams should store all their database code in source control in a product like SQL Source Control, without exception. This includes the scripts used to build the database as well as change scripts for modifying the schema. It also includes scripts used for creating stored procedures and user-defined functions, as well as any data modification scripts. In addition, it’s a good idea to store static data in source control, such as lookup data, as well as configuration files when applicable. Source control check-ins should also incorporate code reviews to reduce the possibility of errors.
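
For static lookup data, one common pattern is an idempotent script that can be checked in and safely re-run on every deployment. A hedged sketch, assuming a hypothetical dbo.OrderStatus lookup table:

-- Idempotent seed script for a hypothetical dbo.OrderStatus lookup table
MERGE dbo.OrderStatus AS target
USING (VALUES
        (1, N'Open'),
        (2, N'Shipped'),
        (3, N'Cancelled')) AS source (StatusID, StatusName)
   ON target.StatusID = source.StatusID
WHEN MATCHED AND target.StatusName <> source.StatusName
     THEN UPDATE SET StatusName = source.StatusName
WHEN NOT MATCHED BY TARGET
     THEN INSERT (StatusID, StatusName) VALUES (source.StatusID, source.StatusName)
WHEN NOT MATCHED BY SOURCE
     THEN DELETE;
GO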

7. Prepare for data migration.

When you update an application, you don’t have to worry about persisting data from one version to the next. It’s not so easy with databases. You can’t simply update or replace schema without considering the impact on data. Even a minor update can result in lost or truncated data. With any database update, you must take into account how your changes will impact the existing data, what steps you must take to preserve that data, and how to apply any new or modified data to the updated database.

When preparing to incorporate your databases into DevOps, you need to have in place a system for ensuring that any new database versions get the data they need. If you’re doing an in-place schema update, you might also need scripts for modifying and preserving data. If you’re re-creating the database from scratch, you need the scripts necessary to populate the tables. In either case, those scripts should be checked into source control along with the schema changes, so they’re included in the CI/CD build process.
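
As a simple, hedged illustration of a data-preserving change script (the dbo.Customer table, FullName and LastName columns are hypothetical, and FullName is assumed to be NOT NULL):

-- Hypothetical migration: add a LastName column and preserve existing data
ALTER TABLE dbo.Customer ADD LastName nvarchar(100) NULL;
GO

-- Backfill the new column from the existing FullName data
UPDATE dbo.Customer
SET    LastName = CASE WHEN CHARINDEX(N' ', FullName) > 0
                       THEN LTRIM(SUBSTRING(FullName, CHARINDEX(N' ', FullName) + 1, 100))
                       ELSE FullName
                  END
WHERE  LastName IS NULL;
GO

-- Tighten the constraint only after every row has been backfilled
ALTER TABLE dbo.Customer ALTER COLUMN LastName nvarchar(100) NOT NULL;
GO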

8. Shift left in your thinking.

DevOps methodologies include the concept of shifting left, which refers to the idea of performing certain tasks earlier in the application lifecycle. For example, instead of waiting to build the application and test its code until late in the development cycle, building and testing become an ongoing process that takes place as soon as updated script files are checked into source control. Also important to this process is that code check-ins occur frequently and in smaller chunks. The shift-left approach makes it easier to address issues sooner in the development process when they’re far more manageable.

The shift-left strategy should also be employed when incorporating databases into the DevOps pipeline, with developers checking in their script files on a frequent and ongoing basis. In addition, they should create database-specific tests, such as ones that verify object structure or query results. That way, when a developer checks in code changes, the CI server builds the database and runs the tests so that each change checked into source control is immediately verified. This doesn’t preclude other types of testing later in the cycle, but it helps catch certain issues earlier in the process when they’re much easier to resolve.
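
A database-specific test doesn’t have to involve a heavy framework to add value; dedicated unit-testing frameworks such as tSQLt go further, but even a plain assertion script run by the CI server can catch problems early. A hedged sketch, assuming a hypothetical dbo.vw_ActiveCustomers view with an IsActive flag:

-- Minimal smoke test: fail the CI build if the view is missing
IF OBJECT_ID(N'dbo.vw_ActiveCustomers', N'V') IS NULL
    THROW 50001, 'dbo.vw_ActiveCustomers was not created by the build.', 1;

-- Minimal result test: the view should never return inactive customers
IF EXISTS (SELECT 1 FROM dbo.vw_ActiveCustomers WHERE IsActive = 0)
    THROW 50002, 'dbo.vw_ActiveCustomers returned inactive customers.', 1;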

9. Automate database operations.

Automation is a key component of a successful DevOps operation. This applies to the core application as well as the database. Automation helps avoid repetitive tasks, reduces the potential for errors, and ensures consistent results with each deployment. To this end, DevOps teams must ensure that the build, test, and deployment operations incorporate the database script files along with the application files. This might mean acquiring new CI/CD tools that can better accommodate databases or reconfiguring existing ones to handle the additional requirements.

The key, of course, is to ensure that the database developers are checking their files into source control and following a consistent, agreed-upon process for managing script files. For example, they should decide whether to take a state-based approach or migration-based approach to deploying database updates. Although it’s possible to employ both strategies, choosing one over the other makes it easier to collaborate on a single code base while avoiding the complexities of balancing both at the same time.

10. Perform ongoing monitoring.

Like automation, continuous monitoring is an essential component of a successful DevOps operation. This includes monitoring the systems themselves, as well as providing continual feedback at every stage of the DevOps process. Comprehensive monitoring ensures that everything is functioning properly, while helping to identify issues early in the development cycle. Not only does this make for better applications, but it also helps to improve the CI/CD process itself.

Most DBAs are already familiar with the importance of monitoring their production database systems for performance, security, and compliance issues. In all likelihood, they already have the tools in place for identifying issues and determining which ones need immediate attention. The continuous feedback loop represents a different type of monitoring. It provides database developers with visibility across the entire pipeline and alerts them to any problems with their script files during the integration, testing, and deployment phases. For this reason, you must ensure that your pipeline’s feedback loop takes into account the database script files at every phase of the operation.

Looking to the future

All databases have a limited lifespan, and it’s just a matter of time before your legacy databases must be overhauled or replaced with another system. For example, you might find that a document database such as MongoDB will better serve an application than a relational database. Many organizations are also looking for ways to better accommodate the persistent data requirements that go with containers, microservices, and cloud-based applications.

No matter what direction your organization is heading, chances are it’s moving toward a goal of faster release cycles and greater agility. Database teams are under increasing pressure to adopt this strategy and take a more flexible approach to database maintenance. Instead of thinking of databases as unmovable monolithic structures, they must learn to treat them as manageable code that can be tested, automated, and deployed just like application code. DevOps can make this transition a lot easier, while also preparing database teams for what might be coming in the future.

 

The post 10 DevOps strategies for working with legacy databases appeared first on Simple Talk.




Tuesday, October 27, 2020

Ignore the technology for a moment, let’s talk about the problem

Twelve and a half years ago I wrote the biggest hit of my writing career, Ten Common Database Design Mistakes. It currently shows over 12 million hits, despite having a misspelling in the first paragraph. After all this time, you would think that things would have changed and we would have found new problems to be concerned with. You, my reader, would be wrong. In fact, all 10 concepts would still be my top 10, though I might let #8 (Not using stored procedures to access data) slide a bit lower, or at least change it to “Not using relational database technology to its fullest.”

In the years since I wrote that article, I have come to believe that the first item is the problem that continues to plague us the most here in the 2020s: “Poor design/planning.”

I have spent a great deal of my professional career designing and coding software, and much of my writing and speaking about software creation centers on one base concept: while most of us spend a lot of our time trying to come up with new coding techniques, those techniques largely don’t matter that much. Coding techniques are the least of your problems as a programmer; understanding the problem to be solved is.

It reminds me of a programming challenge we had back when I was in college: sort algorithms. Quick sort, bubble sort, etc. Learning these algorithms was certainly very useful for teaching a newbie programmer like myself the basics of programming (I still remember how to do a bubble sort), but in 20+ years I have yet to need to program a sort in a piece of customer code. What don’t I remember from my college years? Much about how to make sure I was sorting data the way the customer wanted, or whether the customer even cared about sorting the data at all. How many times have you thought “That process works really fast, if only it did what I wanted it to?”

The Job of the Dog Food Chef

A term that is often bandied around in programming circles is “eating our own dog food”, and it is a wonderful concept. If you are in charge of mass-producing dog food for lots of animals, how do you know it is good? As the chef, you should taste it (assuming it is healthy to do so) and feed it to the dogs you know and love. If you are willing to feed it to yourself and your dogs, then you have confidence in it. This makes great sense in a company that creates software to sell. For example, Microsoft creates the Windows Operating System, SQL Server, and applications for many operating systems. So as a company, they use that software for critical operations before foisting it on others. And, tying this back to the primary point of this blog, they make sure it solves the problems they have as a company and could be morphed to meet other companies’ needs.

Unfortunately, one of the interesting aspects of being a computer programmer in a non-software company is that I spend a lot of time using software other people have created, but little to no time using software I actually wrote or even contributed to. In fact, it is very rare that I use anything I have designed or built professionally, because it runs parts of our company’s financial systems, reporting systems, and some parts of our websites.

As a data architect, I do my best to understand what the customer wants, but most of the time it is filtered through layers of communication and time management that get in the way. First, most customers don’t really understand their own problems, especially the minutiae of what they want. Second, the people capturing requirements rarely know enough about the customer to know if the customers do understand their problem. I know my architect job is primarily to interpret requirements given and create (or get created) software to meet the requirements. Even this process is fraught with issues for many architects because either they misinterpret the requirements, or in some cases, just take shortcuts on the way to a solution.

If I were a chef of any kind preparing food for humans, I would only rarely meet the people eating the food I have created. I do know these folks will complain about the quality at times, because I complain about the quality of software I use often. Why didn’t the programmer think of this? Who did it this way? What the heck? All of this leads back to the fact that many programmers don’t actually have a chance to use the software they have created for real. Even if it was 100% quality tested and did exactly what it was created to do, the true users may find that it tastes funny.

The Micro-example that Inspired this Blog

About 3 years ago, I started a Twitter account for posting Disney Parks pictures. To manage my pictures, I basically did what any other person would do: I created a set of folders for pictures to tweet and pictures that I had tweeted. This worked wonderfully for a few weeks, but it quickly became tedious trying to remember what I had recently posted.

So, I broke down and built a database. My first pass at the database was simplistic. Because I don’t like writing UI code (and haven’t in 10 years, at least), I built my database to work through Management Studio as my UI. This meant I kept it simple, just solving the one problem on my mind that I thought I could solve easily, and I didn’t really take the time to do a design or consider what all my software could eventually do for me.

As I started to use the software to manage my process, I realized something. What I had built was ok, but I had really done a mediocre job. I had tested my code, but some stuff I had built just didn’t work as the months passed. I had built randomization functions that weren’t very random. I didn’t build in picture management. I had used natural keys in places (for easy editing), but this meant that making changes to the keys was a lot more work than it would have been had I used surrogate key values.
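
For illustration only (this is a hedged, hypothetical sketch rather than the actual model from the project), the difference looks roughly like this: with a surrogate key, a rename touches one row because referencing tables store the unchanging integer key instead of the name:

-- Natural key: every referencing row stores the name, so renaming means cascading updates
CREATE TABLE dbo.Location_NaturalKey
(
    LocationName nvarchar(100) NOT NULL CONSTRAINT PKLocation_NaturalKey PRIMARY KEY
);
GO

-- Surrogate key: references use LocationId, and the name can change in one place
CREATE TABLE dbo.Location_SurrogateKey
(
    LocationId   int IDENTITY(1,1) NOT NULL CONSTRAINT PKLocation_SurrogateKey PRIMARY KEY,
    LocationName nvarchar(100) NOT NULL CONSTRAINT AKLocation_SurrogateKey UNIQUE
);
GO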

In reality, I had created some blech dog food and this is what I had to eat. What happened next was exactly what I have seen other users go through. I made things work… sort of. I didn’t always use the software, I repurposed columns, and I was only getting half of the value I wanted out of the software. What I wanted was a tool that would take the work out of picking things to tweet about, leaving me to just take pictures, classify pictures, and write 240 characters a day about them. Instead, managing the software was taking more time than the creative process, which was certainly not my goal.

Pretty much exactly what I had seen out of software users for all these years. Managing the software that manages corporate data needs to be almost imperceptible to the customer. A natural part of the process. Going back to the dog food example, in a perfect recipe, the nutritional value of food should just happen. I eat a delicious meal, and I get all of the vitamins and minerals my body needs.

I Then Did What Most Programmers Can’t Do (but really ought to)

I went back to the drawing board and captured my actual requirements. I did a data model, created the software I wanted, threw out history, and started over. My model isn’t large, only about 16 tables (it and the code are still on GitHub, in a not extremely well documented fashion), so it isn’t an extremely complex build. But I made sure I did (or could) meet all of the requirements I have in mind for the future.

It also helped that the entire team consisted of one person, who took the time to think of themselves as multiple people. In my first build, I catered to the user’s fear of having to edit normalized tables using surrogate key values for references. I haven’t posted this blog yet, but my next blog will discuss how I got around this limitation (and it doesn’t actually use stored procedures)!

When gathering my requirements, I put myself into the eyes of the hungry dog, who just wanted to eat a bowl of something delicious that made their fur shiny; not the person who would be building the tools to cook the food daily. I knew what I wanted to be able to do and put that down.

While most programmers can’t just dump the software someone wrote (particularly bad when it was your own code!) and start over, they can spend a little time when building software putting themselves into the mind of the user thinking “If I had to use this process to do my job, is there a way I can make it better for them?” Not always does this mean a polished user interface. While I would love a polished interface for my picture selecting process, had I built a UI on the first version, it still would not have solved my problem. Right now, the process just produces a set of SQL statements that I have to edit the content of and then execute.

I am happy with that because those statements stop me from making all sorts of dumb errors; leaving me the freedom to be creative in the process. That was the problem I needed to solve and the technology just makes it happen.

Summary

In the end, the important goal for any software project is solving the problem that someone needs a solution to. No matter how well written, high performing, and beautiful looking your software, it’s all meaningless if your solution doesn’t do what your customer needs.

 

The post Ignore the technology for a moment, let’s talk about the problem appeared first on Simple Talk.




Why database folks should care about User Research

I attended a training session at Redgate this week by Chris Spalton. Chris’ session topic was “An Introduction into Planning User Research.” Chris pointed out that there’s much more to User Research, but that planning the research is particularly important as it is the foundation for everything you do later.

If you’re a database administrator or developer, you may not immediately feel that User Research is related to your job — that’s someone else’s job, right? 

While that used to be true, the world of tech is changing fast. There are reasons you should reconsider.

Understanding your customer is key

One of the most important aspects of Agile development methods, DevOps, and Digital Transformations is a focus on your customer. 

IT folks traditionally have thought of their colleagues as their internal “customers” — DBAs and others used to act more as a sort of internal service provider in a cost center. DevOps and Digital Transformations notably shift this and bring IT into a position to be creative and partner equally with others in the organization to generate value for external customers.

It’s easy to be a bit cynical about this and think, “That’s not the way it is in my job.” If you feel that way, I challenge you to take the initiative to change your own perspective! There is no need for database specialists to wait around for someone to give them permission to start thinking about their organization’s customers and to start being a creative collaborator with others.

Give yourself permission, and very likely it will have a positive impact on your career.

The User Research planning approach is suitable for all sorts of customer interactions

I found that a lot of what I learned about planning User Research is also useful to think about when framing all sorts of conversations:

  • Think carefully about how you define the problem space, as this impacts everything afterward
  • Remember that you are not your customer — and neither are your colleagues. In other words, try to clear your natural bias that you already understand your customer. You don’t!
  • Scope and prioritize your goals for learning carefully, but bring extra questions in case there is time to gain more insights
  • It is useful to gain insights from not only current customers — ex-customers, prospects, and others working in the space may hold valuable insights
  • You won’t cover everything about your research in one session. Plan your research over phases, from initial broad questions, progressing through interaction, showing proposals, and finally testing
  • Plan, prepare for, and structure your calls carefully in a way that respects your interviewee’s time, and allows them freedom to answer honestly and openly

After attending this session, I believe that learning more about User Research will also help me build my skills for other interactions and projects, too.

Participating in User Research connects you to your colleagues

As a DBA or database developer, you may not conduct your own user research. Quite possibly, you already have colleagues who conduct User Research. This may go by different names in different organizations, and the research may be conducted by people in software development, marketing, sales, or customer success teams. 

Keep an eye out to identify where user research is happening in your organization and ask your manager and teammates if they know how and when this works. Reach out to the teams who are doing this research and ask if you are able to join in sometimes, even as an observer, so that you can learn to better understand  your customers. Taking this initiative can open many doors for you and help you bring your career into a new phase.

The post Why database folks should care about User Research appeared first on Simple Talk.




Monday, October 26, 2020

SQL Server Backup Types

The most important skill for a DBA is being able to recover a database in the event of a database corruption issue. Corrupted databases can happen for lots of reasons, like hardware failures, power failures, incorrect shutdown of a server, failed upgrades, SQL injection, etc. The most common corruption problem that requires a DBA to recover a database is caused by human error, such as forgetting to put a WHERE clause in a DELETE or UPDATE statement or accidentally deleting an object or the entire database.

In order to recover a corrupted database, a DBA will need to restore a database to a point in time before the corruption occurred. That recovery process is done using one or more database backups. There are a number of different types of database backups a DBA can create. In this article, I will cover the different backup types available in SQL Server.

Full Backup

The full backup, as it sounds, is a complete backup of a database. The full backup contains all the data in a database and can be used to do a complete restore of the database to the point in time that the full backup completed, less the uncommitted transactions in flight at that time. To remove the uncommitted transactions, the full backup also contains some transaction log information. The transaction log information in the backup is used, during the restore process, to roll back and remove any uncommitted transactions. By rolling back uncommitted transactions, the database is left in a consistent state once the restore process completes.

When a full backup is created, it is written to an operating system file known as a media set. Discussion of media sets is outside the scope of this article. To create a full backup of a database, a DBA can use SQL Server Management Studio (SSMS) or can write a TSQL script. SSMS is typically used to take ad hoc backups of databases, whereas TSQL scripts are more often used to automate the backup process. Using TSQL scripts provides a repeatable backup process, that doesn’t require any other action besides submitting the script. The code in Listing 1 shows how to back up the sample AdventureWorks2019 database, using TSQL code.

Listing 1: TSQL Script to back up the AdventureWorks2019 database

BACKUP DATABASE [AdventureWorks2019] 
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_full.bak' 
WITH FORMAT, NAME = N'AdventureWorks2019-Full Database Backup'   
GO

The code in Listing 1 will create a .bak backup file in the backup folder for the default instance, on the C drive. The WITH FORMAT option tells SQL Server to create a new media header and backup set in this file. If the backup file already exists prior to this command being executed, it will be overwritten.

To back up a database using SSMS, you need to use the SSMS GUI. It takes several clicks and some typing to create a backup. Here are the steps to create a backup using the SSMS GUI:

Step 1: Expand the “Databases” item

Step 2: Right click on the “AdventureWorks2019” database

Step 3: Hover over the “Tasks” item from the dropdown menu

Step 4: Click on “Back Up…” from the context menu

Step 5: Make sure the “Backup Type” on the General page is set to “Full”

Step 6: Make sure the “Backup up to” is set to “Disk”

Step 7: Click on the “Add” button in the “Destination” section

Step 8: Browse or type in a location for the backup file

Step 9: Click on the OK buttons a couple of times to create the backup

The time it takes to perform a full backup of a database is directly proportional to how much data is stored in the database. The more data stored in a database, the more time it takes to back up the database. Therefore, if the database is quite large, a DBA might want to consider taking other database backup types in addition to a full backup, like differential, as part of their backup and recovery plan.

Differential Backups

A differential backup copies only the data that has changed since the last full backup, also known as the delta changes. The amount of time it takes to perform a database backup is directly proportional to the amount of data the backup is required to write to the backup file. On a large database, it might take hours to back it up using a full backup, whereas the time to take a differential backup is directly proportional to the amount of data that has changed since the last full backup. If your backup window is short and the amount of data changed in a database since the last full backup is small, then taking a differential backup will shorten the runtime and still provide a recovery point for your database.

In order to take a differential backup, you must first take a full backup. The full backup is the base for any follow-on differential backup. Each time a differential backup is taken, the delta changes (the data changed since the last full backup) are written to the backup file. The amount of data in, and the time it takes to perform, a differential backup are directly proportional to the number of updates that have occurred since the last full backup. Therefore, over time, the more a database is updated, the bigger each differential backup gets and the longer it takes to run.

Because differential backups grow in size over time, it is recommended that a new base full backup be taken periodically. Taking a new full backup resets the delta changes back to zero and thus helps keep subsequent differential backups from becoming really large. A good schedule might be to take a full backup once a week, say on a Sunday, and take differential backups once a day the rest of the week.

One thing to keep in mind when using differential backups is that the restore process is more complicated. In order to restore to a differential backup recovery point, you first have to restore the base full backup before you restore the differential backup. This means you need to run two different restore processes: one for the full backup and one for the differential backup.
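
As a hedged illustration of that two-step restore, reusing the backup files created in Listings 1 and 2 (add WITH REPLACE if you are overwriting an existing database):

-- Restore the base full backup, leaving the database ready for more restores
RESTORE DATABASE [AdventureWorks2019]
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_full.bak'
WITH NORECOVERY;
GO

-- Apply the most recent differential backup and bring the database online
RESTORE DATABASE [AdventureWorks2019]
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_diff.bak'
WITH RECOVERY;
GO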

A differential backup can be taken using a TSQL script or using SSMS. To take a differential backup of the AdventureWorks2019 database, you can run the TSQL code in Listing 2.

Listing 2: TSQL Script to take a differential backup

BACKUP DATABASE [AdventureWorks2019] 
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_diff.bak' 
WITH FORMAT, NAME = N'AdventureWorks2019-Diff Database Backup', Differential  
GO

If you compare the script in Listing 2 with Listing 1, you can see that I added the word Differential to the WITH option. This is all it takes to make a differential backup of a database. To use SSMS, you follow the same steps as for a full backup shown in the Full Backup section but change the Backup Type to Differential in Step 5.

Tail Log Backup

The tail-log backup is a special type of transaction log backup. As the name implies, it backs up the tail of the log, meaning those log records that have not yet been backed up as part of the normal transaction log backup process. In reality, it is just another transaction log backup, but it is the last one taken, to ensure that all transaction log records for a database have been backed up so there is no data loss when restoring or moving a database to another server using a restore operation. A tail-log backup can be taken even when a database is offline or has a missing or corrupted data file. This type of backup is only supported when a database is in the Full or Bulk-Logged recovery model. To take a tail-log backup, you can use the same script and steps as described in the Transaction Log Backup section.
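
For illustration, a tail-log backup of a damaged database typically adds a couple of options to a normal log backup; a hedged sketch:

-- Capture the tail of the log from a damaged database without truncating it,
-- and leave the database in the RESTORING state, ready for recovery
BACKUP LOG [AdventureWorks2019]
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_tail.trn'
WITH NO_TRUNCATE, NORECOVERY;
GO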

Transaction Log Backup

If you want to do a restore to a point-in-time other than when full or differential backups are taken, then you will want to perform transaction log backups periodically, in addition to full and/or differential backups. A transaction log backup copies the transaction log records off to a backup file. Transaction log backups are only supported if a database is using the Full or Bulk-Logged recovery model.

Running transaction log backups keeps the transaction log file from growing out of control when your database is using the Full or Bulk-Logged recovery model. When a transaction log backup is taken, the backup process marks the inactive portions of the log file, known as virtual log files (VLFs), as reusable once that information has been copied to the backup file. Truncating these inactive VLFs frees up space within the log file for reuse. Therefore, if you want to keep the transaction log cleaned up and keep it from growing, you should take transaction log backups periodically.

There is no hard and fast rule to how often you should run a transaction log backup. The frequency depends on how many update transactions are being written to the transaction log, and how much data loss a database can tolerate. Therefore, it is recommended that you take transaction log backups frequently to minimize data loss and to keep the transaction log cleaned up, so it doesn’t run out of space or need to be expanded.

The only requirement for taking a transaction log backup, besides the database being in the Full or Bulk-Logged recovery model, is that a full backup has been taken first. After a full backup has been taken, you can run as many transaction log backups as you want and as often as you like. Typically, full backups are taken daily or weekly, depending on the size of the backup window and the time it takes to perform a full backup. Transaction log backups are then taken throughout the day/week at a frequency short enough to keep the transaction log cleaned up and meet the data loss requirements of the database. Keep in mind that if you have a lot of transaction log backups between each full backup, then a lot of transaction log backups may be needed to recover to a failure point. If you find you have an unmanageable number of transaction log backups between each full backup, then you might want to consider taking differential backups periodically between each full backup. By doing this, you can keep the number of transaction log backups needed for recovery down to a reasonable number.

To take a transaction log backup, you can run a script similar to the one shown in Listing 3. The script in Listing 3 takes a transaction log backup of the AdventureWorks2019 database.

Listing 3: Performing a transaction log backup with a script

BACKUP  LOG [AdventureWorks2019] 
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_tran.trn' 
WITH FORMAT, NAME = N'AdventureWorks2019-tran Database Backup' 
GO

If you want to use SSMS to take a transaction log backup, you can follow the steps in the Full Backup section, but change the Backup Type to Transaction Log in Step 5.
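
Transaction log backups are what make point-in-time recovery possible. Below is a hedged sketch of restoring to just before a mistake, reusing the backup files from Listings 1 and 3; the STOPAT time is a hypothetical example:

-- Restore the most recent full backup without recovering the database
RESTORE DATABASE [AdventureWorks2019]
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_full.bak'
WITH NORECOVERY;
GO

-- Roll the log forward, but stop just before the mistake happened
RESTORE LOG [AdventureWorks2019]
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_tran.trn'
WITH STOPAT = N'2020-10-26 14:59:00', RECOVERY;
GO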

Copy-Only Backups

Copy-Only backups are special backups used to take out-of-band backups for special situations. A Copy-Only backup does not affect the existing sequence of normal routine backups, meaning it will not be considered part of the normal set of backups used to recover a database. This is why Copy-Only backups are ideal for creating backup copies that can be used for purposes other than recovering the database, like copying a production database to a development environment.

Copy-Only backups can be taken regardless of the recovery model of the database. Only full and transaction log backups can be created using the Copy-Only option; differential Copy-Only backups are not supported. If you are using Availability Groups, Copy-Only backups can be taken on secondary replicas. To take a Copy-Only backup using TSQL, all that is needed is to add the Copy_Only option to the WITH parameter, as shown in Listing 4.

Listing 4: Copy-Only Backup

BACKUP DATABASE [AdventureWorks2019] 
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_full_copyonly.bak' 
WITH FORMAT, NAME = N'AdventureWorks2019-Full Copy-Only Database Backup', Copy_Only;
GO

To use SSMS to create a Copy-Only backup, follow these steps (note that Step 5.1 has been added to select the Copy-Only option):

Step 1: Expand the “Databases” item

Step 2: Right-click on the “AdventureWorks2019” database

Step 3: Hover over the “Tasks” item from the dropdown menu

Step 4: Click on “Back Up…” from the context menu

Step 5: Make sure the “Backup Type” on the General page is set to “Full”

Step 5.1: Check the “Copy-only backup” checkbox

Step 6: Make sure the “Backup up to” is set to “Disk”

Step 7: Click on the “Add” button in the Destination

Step 8: Browse or type in a location and type in the file name for the backup file

Step 9: Click on the OK buttons a couple of times to create the backup

Additional Backups Types

Full, differential, and transaction log backups are the normal backups used in most database corruption situations, with Copy-Only backups supporting out-of-band copies for special processing and tail-log backups capturing the last few transactions in the log that have not yet been backed up. The additional backup types listed below are included for completeness. They are used for databases with special requirements, such as performance or size, that make the backup types listed above impractical.

File and File Group Backups

By default, the data in a database is contained in a single Operating System (OS) file. But there are times when a database may be contained in multiple OS files. These files are then grouped together into a single filegroup, or multiple filegroups, to make up the complete database. When a database is divided into multiple files and/or filegroups, you can back it up using a piecemeal approach, with file and/or filegroup backups, instead of backing up the entire database. All recovery models support file/filegroup backups. By performing file or filegroup backups, a DBA can schedule a database backup in pieces, which helps manage and schedule backups of large databases. By using a file and/or filegroup backup, a DBA can also speed up recovery from a database corruption problem by restoring only the files in the database that are corrupted. This flexibility does come at a cost: the backup and recovery process is more complicated and requires more backup files to create and manage.
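
As a hedged sketch, a filegroup backup simply names the filegroup to copy; the SalesData filegroup here is a hypothetical example:

-- Back up just one filegroup of the database
BACKUP DATABASE [AdventureWorks2019]
FILEGROUP = N'SalesData'
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_SalesData.bak'
WITH FORMAT, NAME = N'AdventureWorks2019-SalesData Filegroup Backup';
GO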

Partial Backups

A partial backup, as it sounds, is a backup that only backs up part of the database. Partial backups are used to back up the data in the primary filegroup and all of the read-write filegroups. Optionally it can also be used to back up the read-only filegroups. All recovery models support partial backups. Partial backups are a great way to reduce the run time and the size of the database backup when large amounts of the data in the read-only filegroups doesn’t change.
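
A partial backup is requested with the READ_WRITE_FILEGROUPS option; a minimal sketch:

-- Back up the primary filegroup and all read-write filegroups,
-- skipping any read-only filegroups
BACKUP DATABASE [AdventureWorks2019] READ_WRITE_FILEGROUPS
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorks2019_partial.bak'
WITH FORMAT, NAME = N'AdventureWorks2019-Partial Database Backup';
GO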

Summary

Being able to back up and recover a database when a corruption event occurs should be the number one priority for a DBA. A DBA has many different options for backing up a database, from a single full backup to a set of piecemeal backups using differential, transaction log, file/filegroup, or partial backups. When minimal data loss is important, transaction log backups need to be used in conjunction with full and differential backups. A DBA needs to understand all the different backup/recovery options available, as well as the data loss requirements for a database, when developing a backup strategy. Knowing how much data can be lost allows a DBA to develop a backup/recovery strategy that meets the performance and data loss requirements for their databases.

The post SQL Server Backup Types appeared first on Simple Talk.




Reordering Deployments in Database DevOps

We recently received an interesting question in our Redgate forums from Peter Daniels about altering the order of deployments in database DevOps. The question includes the following scenario:

  1. Developer A makes changes – feature 1.  This set of changes makes its way into the dev integration environment.
  2. Developer B makes changes – feature 2.  This set of changes makes its way into the dev integration environment, too – after feature 1.
  3. We decide that we want feature 2 to go to prod, but NOT feature 1.

For the sake of simplicity, Peter mentions that for this example, “we know that feature 2 is schema-isolated and does NOT depend on feature 1.”

Peter is primarily asking about Git branching workflows in his question. In my opinion, branching workflows are only one of several strategies that may help regarding this situation, so I’m going to zoom out a little bit and consider branching in the context of other practices that can help.

Essentially, I believe some alternative strategies flow more easily long term and support a simpler branching strategy than you might arrive at by thinking about this problem only through the lens of branching workflows.

A quick overview

In this post I’ll cover four big picture strategies that may help, then dig into some technical details for managing this in SQL Change Automation (the tool our question is about). All strategies listed may be used individually or combined with one another.

Strategy 1: The Expand/Contract model (“Parallel Change”)

Strategy 2: Flexible database provisioning / integration and QA databases on demand

Strategy 3: A Hybrid state-and-migrations solution to versioning database changes

Strategy 4: Managing environment-specific branches

A Tool-Specific Example – Redgate’s SQL Change Automation

Conclusions

Strategy 1: The Expand/Contract model (“Parallel Change”)

One useful strategy in authoring database changes is to use a design pattern that focuses on always writing backwards compatible changes. This model is known by a few names:

  • Expand / Contract – the approach was popularized under this name in Evolutionary Database Design by Pramod Sadalage and Martin Fowler
  • Parallel Change – Danilo Sato and others have written about the model under this name
  • The “N-1 model”

With this design pattern, database deployments include small changes which are designed to not impact the applications using the database. This is the “expand” phase.

When features enabled by the change are ready to go live, a feature flag / application toggle, or application deployment is done to make new functionality visible to users of the database. This is the “cutover” phase. If there’s a need to undo the change, then this is done by disabling the feature flag or reversing the application deployment — but leaving the data structures in place. 

Finally, after enough time has elapsed to be confident that the new functionality is working well, older structures that are no longer needed may be cleaned up. This is the “contract” phase.
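
As a hedged T-SQL sketch of the three phases (the dbo.Customer table and its columns are hypothetical), an expand-phase change adds new structure alongside the old so that existing application code keeps working until the cutover and contract steps:

-- Expand: add the new column without touching the existing one
ALTER TABLE dbo.Customer ADD PreferredName nvarchar(100) NULL;
GO

-- Expand: backfill the new column so both old and new code see consistent data
UPDATE dbo.Customer SET PreferredName = FirstName WHERE PreferredName IS NULL;
GO

-- Cutover happens in the application (feature flag), not in the database.

-- Contract: only after the new code has proven itself is the old column dropped
-- ALTER TABLE dbo.Customer DROP COLUMN FirstName;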

This pattern tends to work best for teams who are able to deploy frequent sets of small changes to their databases using Agile methodologies.

The Expand/Contract model has multiple benefits, as it:

  1. Simplifies the problem of re-ordering planned deployments. In the scenario above, if the database changes for feature 1 are backwards compatible changes which will not be noticeable to customers, we may decide to go ahead and let the changes for feature 1 flow through the pipelines and be deployed. This is because the changes are safe to deploy, even if we don’t plan to light up that functionality for some time.
  2. Simplifies the problem of rollbacks. See this related post for more information on that topic.
  3. Makes database deployments ‘boring’: An important goal of a DevOps process is to have utterly boring, unremarkable deployments. You want to save the excitement for when you “release” functionality to your users, and have a way to control that “release” so that it’s simple to undo the release when something goes wrong. The Expand/Contract model helps you separate database deployments (the Expand and Contract phases) from “releases” (the transition of a feature from off to on). 

The more skilled your team becomes at using the Expand/Contract pattern, the less often you are likely to need to “pull” a deployment from the planned queue.

Expand/Contract (Parallel Change)

Pros:

  • Does not require purchase of a specific technology
  • Works for teams of all sizes and any database platform
  • Combines easily with other strategies listed in this article

Cons:

  • Requires building skills to deploy small “slices” of changes
  • Works best when you deploy frequently, which not all teams can do

Strategy 2: Flexible database provisioning / integration and QA databases on demand

Another strategy that helps with this problem is the ability to quickly and easily re-provision the environments involved in your database DevOps pipelines.

If we have the ability to quickly create the databases for our integration environment, this gives us the ability to handle the scenario above with the following workflow:

  1. Remove the code for feature 1 from the branch associated with the integration environment — let’s say this branch is named ‘develop’ — but capture the code in a feature branch if we may use it in the future. This might be done by reverting the Pull Request that was used to merge the changes into ‘develop’ earlier (if that was our workflow to get changes into this branch). Alternatively, we could create a new feature branch including the code from feature 1 if one does not exist, then adjust the code in ‘develop’ so it does not include that code.
  2. Deploy a new implementation of the integration database environment reflecting the current (or a very recent) state of production
  3. Deploy ‘develop’ to the reset integration database environment

Having the flexibility to recreate environments quickly has a lot of perks: if we can easily “reset” an environment to a production-like state then this gives us a lot of confidence in exactly how our modified code branch is going to deploy. If we make any mistakes in “removing” code from the branch, then we are very likely to be able to find that in the process of doing this fresh deployment to the environment. 

In order to implement this strategy, we need some tooling to help us implement the database provisioning in a way that isn’t slow or a lot of work. This can be done with data virtualization technology like Redgate’s SQL Clone, with writable SAN snapshots from storage vendors (many of which have APIs that support automation), or with related technology from other storage or software providers.

Like the Expand/Contract model, flexible database provisioning combines well with many tools and processes. Embracing either (or both) of these strategies can help you implement a simpler branching model with fewer long-lived branches to manage and simpler deployment pipelines.

Flexible database provisioning / databases on demand

Pros:

  • Works for teams of all sizes
  • Provides significantly increased flexibility for pipelines and automation
  • Combines easily with other strategies listed here

Cons:

  • Requires technology and processes to implement (not free)

Strategy 3: A Hybrid state-and-migrations solution to versioning database changes

Sometimes customers can’t meet their needs with the strategies above for a variety of reasons. Perhaps they can’t work in an agile fashion due to political or implementation constraints and need to periodically deploy larger groups of changes in a waterfall pattern. Or perhaps an organizational separation between development and operations is preventing them from being able to embrace on-demand database environments.

Some of these customers find success with using an approach to database versioning where they maintain their source code in a “hybrid” fashion, using two folders in a single source control repository. Redgate offers a solution which provides this approach (video, article), but others might be able to create a similar approach on their own by combining other tooling.

In this hybrid approach:

  • One folder tracks the “state” of current work of the integration environment.
  • Another folder tracks the code which will be deployed. Changes are imported into this second folder as “migrations” when the deployment order is decided. 

This model may suit teams for a few reasons. There may be a large number of team members who prefer a low-code approach where they primarily work with GUIs and use lightweight tools to capture database state on a regular basis, and only worry about what the deployment code will look like when they are getting ready to release. Perhaps the deployment code is primarily ordered and crafted by a smaller group of developers who specialize in implementation details for that database platform. Or there may not be much visibility at all regarding the order of deployments for many team members during a large part of the development process.  

Hybrid State-and-Migrations

Pros:

  • Enables database code to be stored in version control, built, and tested, while selectively identifying code that is ready to deploy
  • Allows deployment code (migrations) to be customized for high-availability requirements and to seamlessly integrate data modification changes
  • Supports a simpler branching workflow than a non-hybrid model

Cons:

  • Requires storing both “state” and “migrations” (deployment code) in the version control repository, which adds some complexity
  • Tooling must keep both “state” and “migrations” in sync, or, if it allows drift, make the drift visible to the team
  • Reordering changes after migrations are generated adds notable complexity: you want the path to deployment after migration generation to be as fast as possible (this means that the ability to have database environments provisioned on demand is still desirable)

Strategy 4: Managing environment-specific branches

Another strategy is to maintain environment specific branches. An example of this strategy is described in “ENV Branching with Git.”

In this branching model, feature branches are used in combination with environment specific branches. An example of how code might flow through this model is:

  • Developers Lexi and Freyja commit database changes to a branch FeatureWidget, perhaps working first in individual sandbox environments where they have their own copy of the database. When working in this feature branch, each of them uses temporary branches off of FeatureWidget to draft work, and regularly merge their changes back into FeatureWidget to collaborate. 
  • When ready, FeatureWidget is merged to the Integration environment branch with a Pull Request. The Pull Request process includes an automated build and deployment of the database for initial validation, then when approved automation deploys the changes from FeatureWidget to the Integration environment database.
  • Important note: the FeatureWidget branch is not deleted at this time, as it will continue to be used
  • When this feature is ready to proceed to the QA environment, the FeatureWidget branch is merged to that environment via a Pull Request
  • If changes need to be made to this feature, changes are made to the FeatureWidget branch (perhaps by using temporary branches and Pull Requests) and Pull Requests are used to re-deploy this code to all branches and related database environments where it has been deployed. (Note: this may require you to make a plan to remove a deployment/reset a lower database environment, depending on your tooling choices. More on this in the example below.)

Although this model may sound simple at first, complexities creep in when you begin to look closely at the challenge of maintaining database state. While this model works well for some teams, it is not a “simple trick” to solving this problem by itself. Both the Expand/Contract model and the ability to provision databases on demand help keep this model simple.

Environment Specific Branches

Pros:

  • Enables database code to be stored in version control, built, and tested, while selectively identifying code that is ready to deploy
  • Does not require release branches to be used

Cons:

  • If changes are reordered after they reach “upper” environments late in the pipeline, resetting environments is still needed if you want your deployments to simulate what the later production deployment will be like (this means that the ability to have database environments provisioned on-demand is still desirable)
  • Requires maintaining branches for each environment
  • Scoping is important: splitting apart or reordering the changes inside an individual feature branch adds complexity. (Dividing changes into small slices in a way similar to the Expand/Contract model mitigates this risk.)
  • If you decide to cancel deployment of a feature branch altogether, you need to remove the code from “lower” environment branches as well as have a method of removing the code from the database or resetting the environment and redeploying to it.

A Tool-Specific Example – Redgate’s SQL Change Automation

While it’s really valuable to consider the strategies listed above, how changes flow through your pipelines will also vary based on the tools you’re using to version control your database code.

Our forum question is about Redgate’s SQL Change Automation, so I’ll dig into that here. If you are using a different tool, it’s quite likely that some of these considerations will still apply (particularly #3 and #4), so it may still be worth your while to think through these in the context of your tooling.

1. Group your versioned migration scripts in folders per deployment (or at a more granular level)

When setting up your SQL Change Automation project, one key thing to consider is whether or not you’d like to use folders to separate your migration scripts. Folders are extremely helpful in separating migration scripts into individual groups, which can make your life much easier when planning and managing deployments.

If you need to re-order a deployment and you have captured all the scripts for that deployment in a folder, it is much more intuitive to manage order changes by removing or renaming folders.

Note that a folder can hold either other folders or scripts, but not both. This means you can use subfolders, which is very convenient; you simply cannot combine scripts and folders at the same folder level.
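For illustration only, a migration folder layout might look something like this (the folder and script names here are invented, not anything SQL Change Automation requires):

  Migrations
    001-Sprint-42-Widget-Feature
      001-Create-Widget-Table.sql
      002-Add-Widget-Indexes.sql
    002-Sprint-43-Reporting-Changes
      001-Alter-Report-Table.sql

With a layout like this, everything planned for a single deployment lives in one folder, so reordering or pulling a deployment becomes a matter of renaming or removing that folder.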

2. Do you prefer to disable “programmable objects” so that all changes for a deployment are in a single folder?

SQL Change Automation’s programmable object feature treats stored procedures, views, user defined functions (UDFs), and DDL triggers in a special way: it automatically manages these objects as repeatable migration scripts. This is possible because the entire code for these objects is always included in a command to either create or alter them — unlike, for example, an ALTER TABLE statement which only contains details about the specific parts of the object being altered.

This can be advantageous for teams who frequently make changes to these types of objects. The benefits of enabling programmable objects are that:

  • Programmable Objects are stored using a consistent filename based on the object name, making them easy to identify and simplifying the process of handling conflicts when merging branches 
  • Programmable Objects simplify change ordering within a deployment: when changed, programmable objects are automatically executed at the end of the deployment, which easily ensures that dependencies in scripts changing table schema are executed prior to programmable object changes

However, the consistent naming and location of programmable objects means that they are not stored in the same folders as versioned migrations.

This adds some complexity to the process of identifying which changes are grouped together for a planned deployment. If you would like this to be a simple process and to use only versioned migrations in folders for all database changes, you may do this by disabling the programmable object feature.
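To make that concrete, a programmable object script is, roughly speaking, the complete, re-runnable definition of the object. Here is a minimal sketch (the procedure and table names below are invented for illustration; the exact file layout SQL Change Automation generates may differ):

-- A repeatable script: the full definition is always present,
-- so it can safely be re-run whenever the object changes.
CREATE OR ALTER PROCEDURE dbo.GetWidgetById
    @WidgetId int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT WidgetId, WidgetName, CreatedDate
    FROM dbo.Widget
    WHERE WidgetId = @WidgetId;
END;

Contrast this with an ALTER TABLE script, which describes only a delta and therefore has to run exactly once, in order.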

3. Make a plan on how to remove a deployment from an environment

There will be times when you decide to “back out” changes from a branch and the related environment. It’s worth thinking through your strategy on how to accomplish this.

If you have adopted the strategy of provisioning database environments on demand, then your plan will likely be to reset the environment, either by reverting it to a prior state (if you are using a snapshot or clone technology with this feature) or by deploying a fresh, production-like copy or clone of the database environment. Once you have done this, you may change your version control branch to remove code as needed and re-deploy to the database.

If you are not able to quickly reset or re-provision the database environment, you may leverage comparison tools to remove some code changes. To do this, while still maintaining the ability to simulate a production-like deployment:

  • Use SQL Compare to reset the database: you may compare the database to a branch in version control which contains the code representing the state in production, or against another database which has the current state of production. This can be done via the GUI or the command line. Note that SQL Compare is now available as a Docker container for convenient automation. 
  • Make changes to your branch in version control to remove the code you do not wish to deploy using SQL Change Automation
  • Delete rows from the dbo.__MigrationLog table related to the code which you have removed from the environment
  • Redeploy the modified branch

Please note that I don’t recommend modifying the dbo.__MigrationLog table in production. This table contains the history of your deployments, and you want to protect this in your production environments!
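In a disposable lower environment, that clean-up might look roughly like the sketch below. Only the dbo.__MigrationLog table name comes from the workflow above; its columns vary between SQL Change Automation versions, so inspect the table first and treat the DELETE filter as a placeholder:

-- Review what SQL Change Automation has recorded for this database.
SELECT *
FROM dbo.__MigrationLog;

-- After removing the scripts from your branch, delete only the history rows
-- that correspond to those scripts, so the next deployment starts from the
-- right point. The predicate below is a placeholder; filter on whichever
-- column identifies the removed scripts in your version of the table.
-- DELETE FROM dbo.__MigrationLog
-- WHERE <rows for the scripts you removed>;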

If you require a simpler approach: you may simply modify the code in your version control branch and run SQL Compare between that branch of the SQL Change Automation project and the target database. However, consider that this simpler approach does not fully test the way your code will later be deployed to higher environments. Perhaps that risk is acceptable if the code will still be deployed to several environments later in the pipeline.

4. Make a plan on how to selectively promote a change to the next environment

Many teams have a need to quickly move certain changes — notably hotfixes — through a development pipeline. 

As in previous examples, if you have the ability to quickly provision a production-like database environment for testing, this helps your ability to work with hotfixes. You may create a feature branch for the hotfix which is based upon the point in version control which represents the current stage of production, then deploy fresh production-like environments and test the change.

The feature branch is typically then moved throughout the deployment process using Pull Requests, whether you are using a branching model like Microsoft’s Release Flow, GitFlow, or the Environment Branching model discussed above. 

If you are not able to quickly reset or re-provision the database environment, for speed purposes you may be forced to test the hotfix in environments which contain a significant amount of code that has not yet been released to production. This is an area where you need to weigh the pros and cons for your organization of speed vs quality of testing. The number of environments which you maintain in your pipeline will factor into this: many teams maintain at least one “staging” database environment in a state as close as possible to production, and work hard to ensure that they are able to mimic production deployments to this environment for hotfixes for safety purposes.

Conclusions

When planning or improving your development process for database changes, it’s helpful to step back and look at the larger picture and think of these questions:

  • Can your team move toward adapting the Expand/Contract model and embrace small “slices” of changes for deployment?
  • Does your team have the ability to begin provisioning databases on demand so that you may easily create and reset environments for initial testing, integration, and further testing?
  • Is a hybrid approach to capturing database code in version control combining state and migration approaches suitable for your team?
  • Does an environment based branching strategy suit the way your team works?

After you decide on these strategies, those choices will help inform how you proceed in implementing projects with your toolkit of choice. 

Thanks for the great question, Peter!

The post Reordering Deployments in Database DevOps appeared first on Simple Talk.



from Simple Talk https://ift.tt/3jv0f2S
via

Thursday, October 22, 2020

What you’ll learn at my 2020 PASS Summit presentation

T-SQL is my favorite aspect of working with SQL Server. The release of SQL Server 2005 coincided with the beginning of my involvement with the community as a presenter and author. For a while, I would put together a “What’s new with T-SQL” session for each release. Eventually, the windowing functions were such a big part of my material that I began focusing on them.

In SQL Server 2017 and 2019, I’m less excited about the new things one can do in T-SQL than I am about the optimizer enhancements that improve performance without requiring code changes. This group of features is called Intelligent Query Processing (IQP). With these enhancements, Microsoft has addressed some of the typical antipatterns that cause queries to run slowly.

Back in my consulting days, I would often see these “red flags” and teach the customer why these patterns were usually bad while coming up with alternative solutions.

One example involved scalar UDFs (user-defined functions) that were nested several layers deep in a query. There was no advantage to using the UDFs except making the query look nicer. It was also a dev shop with no database expert on staff, and they were approaching SQL Server the way they wrote C#, using functions to hide complexity. By replacing the UDFs with inline code in the query and adding a couple of appropriate indexes, the query went from 60 minutes down to a few seconds. Until SQL Server 2019, that was always the best approach.

As part of the SQL Server 2019 set of IQP features, Microsoft introduced Scalar UDF Inlining. In certain situations, the optimizer can treat the UDF as if the code inside it is just part of the query. Performance can drastically improve. This works for UDFs containing formulas, IF blocks, and queries. It doesn’t work for UDFs with WHILE loops, table variables, and time-dependent functions like GETDATE, for example. Actually, the list of exceptions is quite long, so be sure to take a look at the documentation.
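To make the idea concrete, here is a small sketch (the table and function below are invented for illustration, not taken from the session):

-- A hypothetical scalar UDF wrapping a simple calculation.
CREATE OR ALTER FUNCTION dbo.ufn_TotalPrice (@Quantity int, @UnitPrice money)
RETURNS money
AS
BEGIN
    RETURN @Quantity * @UnitPrice;
END;
GO

-- Before SQL Server 2019, this query executed the UDF once per row.
-- With Scalar UDF Inlining, the optimizer folds the expression into the plan.
SELECT OrderID,
       dbo.ufn_TotalPrice(Quantity, UnitPrice) AS TotalPrice
FROM dbo.OrderLines;

-- sys.sql_modules exposes whether a UDF qualifies for inlining.
SELECT OBJECT_NAME(object_id) AS udf_name, is_inlineable
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.ufn_TotalPrice');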

During my session, I’ll demonstrate how query performance can improve just by upgrading. I’ll talk about what to watch out for and how to turn off the features at the database or query level when they don’t make sense.
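For reference, those switches look like this (a minimal sketch using the same hypothetical objects as above; Scalar UDF Inlining is shown, and the other IQP features have similar database-scoped configurations and query hints):

-- Turn Scalar UDF Inlining off for the whole database.
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;

-- Or opt a single query out with a hint.
SELECT OrderID,
       dbo.ufn_TotalPrice(Quantity, UnitPrice) AS TotalPrice
FROM dbo.OrderLines
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));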

There are seven enhancements that I’ll cover during my talk at the 2020 Virtual PASS Summit. I hope you will join me!

 

The post What you’ll learn at my 2020 PASS Summit presentation appeared first on Simple Talk.



from Simple Talk https://ift.tt/3dRIYjk
via

Wednesday, October 21, 2020

Continuous deployment at scale

As organizations grow larger and larger, so does the codebase and its complexity. With growing complexity, it becomes increasingly difficult to deliver code to customers with high confidence and velocity. This problem becomes especially hard in large organizations where the number of commits per day can be in the thousands or more. Here, we discuss the basic principle behind continuous deployment, the problems with continuous deployment at large scale, and define a generalized methodology to think about how to release code at scale.

What are Continuous Integration and Continuous Deployment?

First, it is important to understand the concepts of Continuous Integration and Continuous Deployment. Continuous Integration is simply the practice of ensuring that pull requests from multiple developers working on the same codebase are consistently and safely merged into a single branch to avoid conflicts. Having Continuous Integration (CI) ensures that teams within an organization are building, testing, and merging code with consistency and quality.

As soon as code is merged, we enter the phase of Continuous Deployment or Delivery (CD). While CI revolves around merging code consistently and rapidly, deployment is focused on releasing code with consistency. In theory, adopting Continuous Deployment means that every code change is deployed as soon as it is merged and goes instantly into the hands of the user. Most development teams have a plethora of tools from which to choose: Jenkins, AWS, CircleCI, Atlassian Bamboo, Travis CI, and so on. Regardless of the tool a company chooses, it is important to understand the different architectures, benefits, and downsides of adopting and building a Continuous Delivery pipeline at scale.

Problems with Continuous Deployment at large scale

While the ideology of Continuous Deployment makes perfect sense on paper, it becomes quite complex and hard to manage as organizations scale. For a team or company shipping a few changes per day, Continuous Deployment in practice is quite beneficial. However, if an organization is pushing upwards of thousands of commits per day, Continuous Deployment can be quite expensive, and close attention needs to be paid to how it is architected. Two major problems that arise at large scale are:

Cost 

Building, packaging, and deploying code requires computational resources. If a Continuous Deployment pipeline deploys every commit to production continually, it will burn a lot of resources doing so. In these cases, organizations must balance the need to push code to production against the cost it incurs. In a small company where the number of commits is relatively small, cost is not a primary concern. However, as soon as the team is writing and shipping code at scale, with hundreds of commits per hour, the cost of building and deploying code can easily outweigh the benefits of Continuous Delivery. There are only so many meaningful customer-facing commits in a given time period.

Rollbacks 

Regardless of how much testing and how many pre-merge and deploy safeguards you create, there will always be bugs in production. When continually deploying code to production, it can become extremely difficult to find them. Tracking down a faulty deploy becomes excruciatingly hard when a bad commit is deployed to production and multiple commits are deployed after it. In these scenarios, the only option is to revert the culprit commit and do a hotfix or ad hoc deploy. When deploying around 50 commits an hour, if a bug is found in the code a week after deployment, the master branch is already ahead by several thousand commits (at 50 commits an hour, that is over 8,000 in a week), and finding the culprit can be quite time consuming and costly.

Solving Continuous Delivery at scale

Despite the drawbacks described above, we can design and implement a Continuous Delivery platform that rapidly releases code at scale and increases the quality of the user experience. There are multiple ways to navigate the problem; the following sections describe some approaches to consider.

Frequency of meaningful commits

In order to decrease the massive cost of continually building and deploying code, one strategy is to deploy only at a certain cadence and batch commits at certain intervals. These intervals can differ between microservices. The goal is to have an optimization function that makes the cost of deploying acceptable. One way to do it is to find the number of meaningful commits in a service over time: commits that are bug fixes, feature deploys, configuration changes, or anything the organization deems meaningful. Small improvements, minor refactors, and other non-significant changes can easily be dismissed. To quantify this, simply calculate the frequency of meaningful commits:

f = M / T

M is the number of occurrences of meaningful commits

T is a unit of time

f is the frequency of meaningful commits

If a service or repository only has meaningful commits once a week, it is worth evaluating whether we really need to deploy its code every couple of hours. For example, 4 meaningful commits over a 28-day period gives f = 4 / 28, or roughly 0.14 meaningful commits per day; deploying that service every two hours would mean the vast majority of deployments ship nothing a user would notice.

Reduce the external exposure of bugs in the system

A naive approach to implementing Continuous Deployment would be to take every commit and release it to the users. This can work if the user base is small enough or if the number of commits is not large. However, if the organization’s size is large or if there are millions of users, the continuous delivery architecture needs to take that into account and develop a system where the team reduces the potential exposure of bugs to a small surface area. Using a percentage rollout along with emergency triggers to block deploys mid-process can be quite useful. In addition, developing multiple safeguards and checkpoints along the phased rollout is essential. At each checkpoint, and throughout the pipeline, there should be emergency triggers set up that can pause the rollout of a bad commit to production.

The diagram shows a phased rollout designed to avoid external exposure to bugs. Additionally, not every commit goes to production; commits are batched together to maximize the number of meaningful commits per deployment. The build is validated at every step, even during the deployment stage. First, once a preflight or employee build is deployed, smoke or webdriver tests are run to verify that the application behaves as expected and that the new build introduces no major regressions. Second, after deploying to 1% of traffic, load tests and key-metric evaluations are run to verify that the machines serving that 1% of traffic pass on metrics such as success rate, latency, and garbage collection before the build is rolled out to the rest of the fleet.

Conclusion

Continuous Delivery can unlock a lot of potential value for any organization and its customers. It enables reaching customers faster and reduces the human intervention required to deploy systems. It does come with its own complexity, especially in large and growing organizations. However, if it is architected with these concerns in mind, it can be extremely beneficial.

 

The post Continuous deployment at scale appeared first on Simple Talk.



from Simple Talk https://ift.tt/37ktDGG
via

Tuesday, October 20, 2020

Is communication always important?

Conventional wisdom says that communication is critical for success in just about any aspect of life, be that personal relationships or within teams or corporations. Of course, miscommunication can cause all sorts of problems.

Communication has changed drastically over the past few centuries due to technology. We’ve gone from speaking mostly in person to being able to speak to anyone, anywhere in the world, with our devices. Our smartphones have become such a part of us that many people today would rather send a text or use social media than have an actual conversation.

The pandemic has caused quite a bit of physical separation. With many schools and universities using online learning and many employees working from home, in-person communication is difficult, if not impossible, in many situations. In some cases, however, less talking makes people more productive. Unless home responsibilities interfere, working remotely can often allow improved productivity since there are fewer interruptions and less noise than in the typical “sea of cubicles” found in many work environments. We are in the midst of a big experiment right now, and, it turns out, remote working is not for everyone, but it does have many benefits for some.

Jeff Bezos, founder and CEO of Amazon, once said “communication is terrible.” He was actually talking about the amount of communication required to coordinate activities within a large team. Not only does coordination take time, there is often a lot of non-productive communication when groups get large. He came up with the “two pizza rule”: a team is too big if it can’t be fed by two pizzas. Keeping teams small reminds me of The Mythical Man-Month: adding more developers when a project is late will cause even more delays due to the added coordination.

Even in my own job, using the right tools and defining a good process has decreased the amount of required communication. When I first began this job three years ago, there were many back-and-forth emails and Slack discussions between me and an analyst just to get articles published and the Simple Talk newsletter built. Now an email or Slack message about the newsletter is the rare exception. By improving the process, the communication needed to coordinate is no longer necessary.

Along the same lines, one of the tenets of DevOps is breaking down technical silos to improve communication between development and operational teams. It makes sense that one group should know what the other is up to so that deployments to production are successful. Everyone should be accountable for the success and the goal is to eliminate the old “throw the code over the wall” mentality. Automation can decrease the required amount of communication within a team and improve success.

Is communication always important? Yes, it certainly is, but it can also get in the way of being efficient and productive.

 

Commentary Competition

Enjoyed the topic? Have a relevant anecdote? Disagree with the author? Leave your two cents on this post in the comments below, and our favourite response will win a $50 Amazon gift card. The competition closes two weeks from the date of publication, and the winner will be announced in the next Simple Talk newsletter.

The post Is communication always important? appeared first on Simple Talk.



from Simple Talk https://ift.tt/34fhJMf
via

Connecting to Log Analytics using Azure Data Studio and KQL

Azure Data Studio is a great tool and supports much more than just SQL. Recently, Azure Data Studio added support for the Kusto Query Language, or KQL.

Kusto is a query language used for massive amounts of streamed data, perfect for logs. That’s why Kusto was initially created for Azure Monitor and Azure Log Analytics.

However, Azure Data Studio only supports Kusto connections to Azure Data Explorer, not to Log Analytics. The huge focus Microsoft gives to Azure Data Explorer still puzzles me: Log Analytics is a ready-to-use monitoring solution for cloud and on-premises environments.

Azure Data Explorer, on the other hand, is an MPP (massively parallel processing) database solution for storing massive data streams, such as logs. It allows anyone to build their own monitoring solution on top of it, but it is not a ready-to-use solution like Log Analytics.

Considering these differences, the use cases for Log Analytics seem related to migrations to the cloud and all the monitoring work that follows. Log Analytics can be a fundamental tool in the process of cloud adoption because it gives numbers to the process. These numbers allow IT administrators to do in the cloud what only a few really manage in an organized way: keep baselines and compare results after every environment change.

On the other hand, software houses creating custom solutions, or in some situations solution providers managing a large number of clients, may find Azure Data Explorer a better fit. It puzzles me that one receives so much more attention than the other. What do you think? Let’s talk about it in the comments.

Now it’s time to get straight to the point. Of course, you have already noticed how disappointed I was to discover that Azure Data Studio couldn’t connect to Log Analytics. However, with some help from the Microsoft community on Twitter, I was able to find a workaround: we can use KqlMagic, a Python package, to connect to Log Analytics.

Yes, Azure Data Studio is so flexible it supports Python and we can install Python packages.

The Challenges and What Doesn’t Work

If you are thirsty for the solution, you can skip this part and go straight to the solution further down in the post. However, I like to explain how I reached the solution and what didn’t work. At a bare minimum, you will find some great links and ideas.

What may appear easy at first is not easy at all. Let’s face the challenges:

  • The article explaining how to do it only connects to a demo Log Analytics workspace kept by Microsoft.
  • The GitHub repo has a “Get Started with KqlMagic and Log Analytics” guide, which redirects you to an online executable notebook with some explanations about the connection string, but it also uses the demo workspace.
  • If you ask KqlMagic itself for help, the information you get is the same as in the notebook, and it doesn’t help much when trying to connect to your own workspace.

The Existing Documentation

This is the documentation we have:

%kql loganalytics://code;workspace='<workspace-id>';alias='<workspace-friendly-name>'
%kql loganalytics://tenant='<tenant-id>';clientid='<aad-appid>';clientsecret='<aad-appkey>';workspace='<workspace-id>';alias='<workspace-friendly-name>'
%kql loganalytics://username='<username>';password='<password>';workspace='<workspace-id>';alias='<workspace-friendly-name>'
%kql loganalytics://anonymous;workspace='<workspace-id>';alias='<workspace-friendly-name>'

(1) authentication with appkey works only for the demo.
(2) username/password works only on corporate network.
(3) alias is optional.
(4) if credentials are missing, and a previous connection was established the credentials will be inherited.
(5) if secret (password / clientsecret) is missing, user will be prompted to provide it.
(6) if tenant is missing, and a previous connection was established the tenant will be inherited.
(7) a not quoted value, is a python expression, that is evaluated and its result is used as the value. This is how you can parametrize the connection string
(8) anonymous authentication, is NO authentication, for the case that your cluster is local.

 

The numbered items help to exclude the options that don’t work:

  • The connection string used in the demo only works for the demo.
  • We can’t use anonymous authentication, for sure.
  • I tried to leave some parameters out, but it didn’t work for me; KqlMagic always complains about the missing parameters.

I can assure you: the solution is in the documentation above, but it is still difficult to find.

More Documentation

Another, closer look at the KqlMagic Git repo turns up this documentation about authentication methods:

  • AAD Username/password – Provide your AAD username and password.

  • AAD application – Provide your AAD tenant ID, AAD app ID and app secret.

  • code – Provide only your AAD username, and authenticate yourself using a code, generated by ADAL.

  • certificate – Provide your AAD tenant ID, AAD app ID, certificate and certificate-thumbprint (supported only with Azure Data Explorer)

  • appid/appkey – Provide you application insight appid, and appkey (supported only with Application Insights)

 

We have a match

Finally, it’s just a matter of connecting the dots: among the sample connection strings, the one that makes the most sense mentions an app ID supplied as the clientid key value, even though the notes advise that appkeys work only for the demo. On the other hand, the list of authentication methods above includes an AAD application. We have a match!

I don’t remember exactly how I found this great link about the Log Analytics API, but it includes information about how to register an application with Azure Active Directory. The final result is not exactly what we need, because the documentation’s focus is direct access to the API. However, we can still follow it to register the application in Azure Active Directory.

The Solution

After many attempts, I finally figured out how to fit the information I have in the connection string. You need to follow the steps below:

  • Create an App Registration on Azure Active Directory.

(Screenshot: App Registration)

  • Create a client secret.
    • The client secret value goes in the connection string, not the client secret’s name. The clientid is not the name of the client secret either.

(Screenshot: App Secret)

  • Set the API permissions to Log Analytics read access as an application permission (not a delegated permission).

(Screenshot: API Permissions)

  • On the Manifest of the AAD application, copy the appId. The appId is the value that fills the clientid property in the connection string. This was one of the most challenging discoveries.

(Screenshot: App Id)

  • On the Log Analytics workspace, copy the Id of the workspace. The workspace property in the connection string is the Id, not the name. This was another tough discovery.

(Screenshot: Workspace Id)

  • On the Log Analytics workspace, go to Access Control (IAM) => Add => Add Role Assignment.
    • In the next window, give the app registration permission to read the logs. You will find the app registration listed as if it were a user.

(Screenshot: Log Analytics Permissions)

  • On Azure Active Directory, copy the tenantId.

(Screenshot: Tenant Id)

The connection string you need to use:

%kql loganalytics://tenant='<<your tenant id>>';clientid='<<yourappid>>';clientsecret='<<your client secret>>';workspace='<<Your log analytics workspace id>>';alias='<<workspace name>>'
 
Voila! You are connected to Log Analytics and you can use KQL to retrieve data and build graphics.
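From there, running a query is just another %kql line. For example (a quick sketch that assumes your workspace collects the standard Heartbeat table; substitute any table you actually have):

%kql Heartbeat | summarize heartbeats = count() by Computer | top 10 by heartbeats | render barchart

KqlMagic returns the results as a table in the notebook and, with render, draws the chart for you.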
 

Further Studies

  • Metrics Advisor in Cognitive Services has Azure Data Explorer as its only Kusto source. It’s unable to connect to Log Analytics. However, Log Analytics Export is also a feature in preview, so we can export from Log Analytics to Metrics Advisor. There is a lot to discover here.
  • Azure Data Explorer has a proxy which enables Data Explorer to connect to Log Analytics. Log Analytics becomes an additional node in the cluster. I think this solution may be useful in two situations:
    • You are already using Azure Data Explorer for something (whatever) and you would like to keep everything in a single place.
    • You are a solution provider managing hundreds of different Log Analytics services for different clients. The best architecture would be to create a single place where you can query everything. This second scenario seems very interesting; if I’m not mistaken, we would need to use Azure Lighthouse as well.
  • Metrics Advisor can connect to Azure Data Explorer. So, if we create an Azure Data Explorer cluster and configure the proxy, we can use Metrics Advisor to analyze Log Analytics data through the proxy. This seems very interesting.
  • By configuring the proxy, we can also use the regular Azure Data Studio Kusto extension for Data Explorer to query Log Analytics data.


The post Connecting to Log Analytics using Azure Data Studio and KQL appeared first on Simple Talk.



from Simple Talk https://ift.tt/2HlbOfs
via