Saturday, November 26, 2022

Backing Up MySQL Part 5: Big Data

According to research by Statista in 2022, the amount of data being created, captured, and worked with globally is expected to reach more than 180 zettabytes. Part of that is because more and more data is being introduced to the online world – but another part has its roots in software development. There are approximately 24.3 million software developers in the world (source: InApps), and with the number increasing every year, there's bound to be a lot of work with data.

The one thing that has always helped software developers work with data is databases – databases are not all made equal, though. Some are said to be a fit for working with bigger data sets (MongoDB comes to mind), some are a fit for time-series data (think TimescaleDB), and some are a good fit for other use cases.

However, MySQL is MySQL – the RDBMS has been here for ages, and undeniably, it's here to stay. As with every database management system, MySQL has its upsides and downsides – one of the most notable downsides of MySQL is that many people think it's not a fit for working with big data.

However, that's not exactly true – MySQL powers some of the biggest websites in the world, from Facebook (which stores almost all of the necessary data for the social network in MySQL, making it one of its main database management systems of choice) to some of the biggest and fastest data breach search engines that let people secure themselves from identity theft.

MySQL is a powerful but complex beast, and in this blog, we will tell you how to back up big data sets within that beast.

The Infrastructure of MySQL

Before working with bigger data sets inside MySQL, you should familiarize yourself with its infrastructure. The most frequently used storage engine specific to MySQL is InnoDB and its counterpart XtraDB developed by Percona. Both InnoDB and XtraDB have a couple of properties unique to themselves:

  1. The functionality of InnoDB is based on the InnoDB buffer pool – the buffer pool is an area used to cache the data and indexes when they are accessed. The bigger it is, the faster certain types of queries that are using the buffer pool or its data (SELECT, LOAD DATA INFILE) will complete.
  2. InnoDB supports the ACID model guaranteeing data atomicity, consistency, transaction isolation, and durability. Note that even though InnoDB is known to offer support for ACID properties by default, it can also be made to exchange ACID for speed (and vice versa.)
  3. InnoDB has a couple of companion files – while InnoDB data is stored in the ibdata1 file, InnoDB also has a couple of log files – ib_logfile0 and ib_logfile1 – that store the redo logs. Redo logs store the information needed to re-apply changes made by transactions, while undo logs – which contain the information needed to undo changes made by a specific transaction – live in the system tablespace (ibdata1) or in dedicated undo tablespaces. Both log files can be found either in the /var/lib/mysql/mysql*.*.**/data directory if Linux is in use, or in the /bin/mysql/mysql*.*.**/data directory if you find yourself using Windows (*.*.** represents the MySQL version).
  4. All of the settings relevant to InnoDB can be configured via the settings present in my.cnf – a quick way to inspect the current values is shown right after this list.
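
For instance, before planning a backup it can be useful to check how the buffer pool and the log files are currently configured. A minimal sketch, assuming a reasonably recent MySQL or Percona Server (exact variable names can differ between versions):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';   -- size of the buffer pool in bytes
SHOW VARIABLES LIKE 'innodb_log_file%';          -- size and number of the redo log files
SHOW VARIABLES LIKE 'innodb_data_file_path';     -- definition of the system tablespace (ibdata1)

Any of these values can be changed by editing my.cnf and restarting the server.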

Now, as far as backups are concerned, we don't need to know much about the internals of InnoDB (or any other MySQL storage engine for that matter) because we can simply take them using mysqldump, mydumper, or through phpMyAdmin by heading over to the Export tab.

All of those tools will provide us with logical backups that can be used to restore data whenever we please. However, for those of us who work with bigger data sets, none of those tools is a great option – as the INSERT statements contained in the backups derived from those tools usually come with a lot of overhead, developers frequently find themselves needing to dip their toes in other waters instead.

Backing Up Big Data

Logical Backups

In order to back up data sets that are bigger in size, developers need to keep a couple of things in mind:

  1. First off, if a logical backup is being taken, we should avoid creating a backup filled with INSERT INTO statements – this article should provide you with a little more information about what these are, but the main reason such statements are to be avoided is that they come with a lot of overhead that MySQL has to consider. In order to execute a single INSERT statement, MySQL has to check for permissions, open the necessary tables, initialize the query, acquire all locks that are needed for the query to execute, run the query, finish the work (let the query run until it's finished), close the tables, and perform additional clean-up operations. Of course, these steps might be relatively quick to complete in the grand scheme of things, but we need to consider that we would be working with hundreds of millions of rows at a time.
  2. Working with big data sets requires a dedicated server or, at the very least, a VPS. This one is probably already taken care of (you've done your investigation into the hosting provider before choosing one, right?), but nonetheless, choosing the right server for our big data project is always a necessity.
  3. How many databases are we backing up? How many tables are inside of those databases? Do all of those need to be backed up? You would be surprised how many times we can limit the amount of data that is being backed up by simply thinking ahead.
  4. Did we work with bigger data sets before? Our experience in the past teaches us more lessons than we could imagine – however, if we didn’t have much experience beforehand, we can also just ask around – if our colleagues are well-versed in the field, the answers to most pressing questions shouldn’t be too far away.

Once the basics are covered and we’ve decided what we’re backing up, it’s time to move on. The best way to take a logical backup of a bigger data set is to use the LOAD DATA INFILE query – since the query comes with many of its own bells and whistles, it’s significantly faster than ordinary INSERT statements:

  • MySQL needs little overhead to parse the query.
  • Users can easily skip lines or columns.
  • Users can easily specify the columns to load the data into at the end of the query.

It's important to note that MySQL offers two ways to use LOAD DATA to load data into our infrastructure – LOAD DATA LOCAL is another query that users can use. The local option only works if we didn't define the local-infile option to be 0, and it can be enabled by defining the local-infile option within my.cnf underneath the [mysql] and [mysqld] headings. The LOAD DATA LOCAL query is widely considered to be less secure because the file is read on the client side and sent over to the server, meaning a malicious or compromised server could request files from the client – the local option:

  • Invokes certain security functions of MySQL that must be followed.
  • Changes the behavior of MySQL when searching for the file to import.
  • In some cases (when dealing with errors, etc.), it has the same effect as IGNORE does.

The most important security considerations are explained in the MySQL documentation.

The main reason LOAD DATA INFILE works so well with bigger data sets, though, is that the files it loads only contain the data that is being backed up, separated by a specific character (“:”, “|”, “,”, Tab, etc.) – MySQL can then load the data straight into a specific column without performing any additional operations and knows where each set of data “ends” as well. A basic query looks like so:

LOAD DATA INFILE 'file.ext' INTO TABLE demo_table [options];

where:

  • file.ext is a file bearing any extension other than SQL (.txt, .csv, etc.);
  • demo_table defines the table that the data will be loaded into.
  • After the file and the table are defined, we can choose to define certain options:
    • FIELDS TERMINATED BY will make MySQL treat a certain character as the separator between columns (it can be useful when our file has a lot of records whose fields are separated by a character; LINES TERMINATED BY does the same thing for rows):
      LOAD DATA INFILE 'file.ext' INTO TABLE demo_table FIELDS TERMINATED BY '|';
    • CHARACTER SET will make MySQL use a specific character set when importing the data. This feature can be very important when loading in data that comes from a different geographical place than the existing data (e.g. if our data is from England, and we’re importing data from Korea, China, Russia, etc.) – it will help us avoid errors when importing such data sets – learn more about charsets here.
      LOAD DATA INFILE 'file.ext' INTO TABLE demo_table CHARACTER SET latin1;
    • PARTITION will make MySQL able to insert data into a specific partition – that's very useful when working with bigger partitioned data sets – the partitions can be specified like so (p1, p2, and p3 define the partitions):
      LOAD DATA INFILE 'file.ext' INTO TABLE demo_table PARTITION (p1, p2, p3);
    • IGNORE ... LINES|ROWS can be used to skip a certain number of lines or rows at the start of the file – this feature can be very useful when importing big data sets and we only want to import rows starting from, say, row #25527 (we can either ignore lines or rows):
      LOAD DATA INFILE 'file.ext' INTO TABLE demo_table IGNORE 25526 LINES;

All of the options can be found in the MySQL documentation.

As far as backups of big data are concerned, the most popular options that are used are as follows:

  1. FIELDS|LINES TERMINATED BY
  2. PARTITION
  3. CHARACTER SET
  4. IGNORE LINES|ROWS

Each of those options has its own upsides (see above), and when combined with an exceptionally large data set (think 100M rows and above), they make the LOAD DATA INFILE operation even more powerful than before.
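
To illustrate, here's a hedged sketch of several of these options working together – the table, its partitions, the file path, and the delimiter are all hypothetical and would need to match the way the file was actually exported:

LOAD DATA INFILE '/var/lib/mysql-files/big_data.csv'
    INTO TABLE demo_table
    PARTITION (p1, p2)           -- only load into these partitions
    CHARACTER SET utf8mb4        -- interpret the file using this character set
    FIELDS TERMINATED BY '|'     -- columns in the file are separated by pipes
    IGNORE 1 LINES;              -- skip a header line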

To take a backup that can be restored by using the LOAD DATA INFILE statement, on the other hand, we'd need to use the SELECT [columns] INTO OUTFILE statement:

SELECT * FROM demo_table INTO OUTFILE 'backup.txt' [options];

The options remain the same. That’s it!
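
As a hedged end-to-end sketch (demo_table, the delimiter, and the file path are hypothetical, and the secure_file_priv setting controls where MySQL is allowed to write files), the backup and its restore could look like this:

-- Back up: write the table out as a delimited text file
SELECT * FROM demo_table
    INTO OUTFILE '/var/lib/mysql-files/demo_table.txt'
    FIELDS TERMINATED BY '|';

-- Restore: load the same file back in using the same delimiter
LOAD DATA INFILE '/var/lib/mysql-files/demo_table.txt'
    INTO TABLE demo_table
    FIELDS TERMINATED BY '|';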

Another consideration that is worth mentioning is the IGNORE keyword: the keyword will make MySQL ignore the errors derived from the query and carry on – think entries that don't have enough columns, rows that are wrongfully terminated (terminated with the wrong character), etc. To make use of the keyword in LOAD DATA INFILE operations, specify it like so:

LOAD DATA INFILE 'path/file.txt' IGNORE INTO TABLE demo_table [options];

To make use of the keyword in SELECT ... INTO OUTFILE operations, make use of the keyword like so:

SELECT * INTO OUTFILE 'demo.txt' IGNORE FROM demo_table [options];

Contrary to popular belief, the IGNORE keyword has more use cases than we could think of – can you imagine how much time is saved by ignoring thousands or even millions of errors?

Physical Backups

As far as physical backups are concerned, backing up big data sets is not that different from backing up a smaller amount of rows – here's what we have to consider:

  1. We can either take a hot or cold backup (a hot backup is taken when a database is still running, while a cold backup is taken while the database is down.)
  2. We need to consider whether we will use tools or take the backup manually (the tools that come to mind in this space include MySQL Enterprise Backup and others.)

If you decide to take a hot backup, most of the time you will be advised to use mysqlbackup (MySQL Enterprise Backup) as it provides good results with minimal downtime.

If we decide on a cold backup, the following steps should help:

  1. Shut down MySQL.
  2. Copy all of the files relevant to your storage engine of choice – if you’re using InnoDB, take a backup of ibdata1, ib_logfile0, and ib_logfile1 and store them somewhere safe (If you’re using MyISAM, simply take a backup of .MYD and .MYI files in your MySQL data directory and stop here.)
  3. If you're using InnoDB, take a copy of all of the table files in the data directory of MySQL (the data directory usually lives under /var/lib/mysql/ if you find yourself using Linux – see the query after this list to locate it on your system).
  4. Finally, take a copy of my.cnf – on Linux the file is commonly located at /etc/my.cnf or /etc/mysql/my.cnf – if you find yourself using Windows, the file is called my.ini (keep in mind that taking a copy of the file can be useful to review old settings as well).
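
If you're not sure where the data directory or the InnoDB files actually live on your installation, MySQL can tell you – a small sketch (the output will differ from server to server):

SHOW VARIABLES LIKE 'datadir';                -- location of the MySQL data directory
SHOW VARIABLES LIKE 'innodb_data_file_path';  -- name and size of the system tablespace (ibdata1)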

For those who are interested, feel free to explore the settings within the file, then come back to this blog.

Anyhow, here’s why performing all of these things is a necessity:

  1. Shutting down MySQL will ensure that nothing interferes with our backup operation.
  2. The backup of ibdata1 will take a backup of all of the data existing in InnoDB, while backing up the ib_logfile0 and ib_logfile1 will back up the redo logs.
  3. Taking a copy of the .frm and .ibd files is necessary because as of MySQL 5.7.8, only table metadata is stored in ibdata1 to prevent congestion – the actual table data is stored in the .frm and .ibd files (an .ibd file contains the data of the table, while an .frm file contains table metadata.)
  4. Taking a copy of my.cnf (or my.ini if we find ourselves using Windows) is an absolute necessity as well because the file contains all of the settings relevant to all storage engines in MySQL. That means that even if we are using an obsolete storage engine like MyISAM and want to continue running the engine on the server that we’re loading the backup onto for some reason (MyISAM provides faster COUNT(*) operations because it stores the number of rows in the table metadata – InnoDB does not), we can.
  5. Finally, make sure to store everything in a safe and accessible place – doing so allows us to recover data whenever we need to.

To restore a hot backup taken by using MySQL Enterprise Backup, once again refer to our blog, and to restore a cold physical backup taken by following the steps above, follow the steps below (the steps are specific to InnoDB):

  1. While MySQL is still down, copy over your my.cnf file to your server and adjust the settings within the file to your liking.
  2. Place the per-database folders that were copied from the data directory during the backup back into the data directory.
  3. Place the ibdata1 file and the log files back into the data directory of MySQL (see above for an example.)
  4. Start MySQL.

To recover your MyISAM data, simply copy over the data folder to your server. No further action is necessary because MyISAM doesn't store additional data outside of its table files the way InnoDB does.

To take and restore physical backups of InnoDB, we can also use PITR (Point-In-Time Recovery) – some say that the PITR method is significantly easier, but that depends on the person. Anyway, the MySQL manual should walk you through all of the necessary steps.

Summary

Backing up big data sets in MySQL is not the easiest thing to do – however, with knowledge centered around how the main storage engine (InnoDB) works internally and what queries we can run to make our backup process faster, we can overcome every obstacle in our way.

Of course, as with everything, there are certain nuances, so make sure to read all about them in the MySQL documentation chapters surrounding the functions you need to employ – come back to this blog for more information about database functionality and backups, and until next time!


Friday, November 25, 2022

Express.js or Next.js for your Backend Project

Backend projects execute code on a centralized server and then communicate the results to a client. These kinds of projects are executed efficiently with Express.js, and the number of web developers who use this framework is increasing daily. Next.js, on the other hand, makes backend projects more interesting because you can use it alongside an existing backend as well as for frontend development.

Both Next.js and Express.js are good tools for development and production projects. 

Development and production are terms that every software developer should know. Development refers to the stage when you are still building and experimenting with your application on your machine. It is not yet open for end users to do anything beyond quality assurance testing. Production represents the stage when your application is available for the public – end users – to use. You must have fixed all glitches at this stage (or at least all that you can find!), and the application is ready for users to do the work the application is built for and expect that work to be persisted.

It is essential to state here that Express.js and Next.js are both powerful tools for creating an enjoyable user experience. However, both tools have peculiarities to understand, so in this article I will help you decide whether Express.js or Next.js is the best JavaScript framework for your backend project.

This article will compare Express.js and Next.js and show which may be preferable for you when embarking on a backend project.

What is Express.js?

Express.js (also referred to as Express) is a Node.js web application framework. It is an excellent backend web tool and was founded by TJ Holowaychuk, who is also the sole developer of Apex software. Express.js is layered on Node.js, which aids in managing servers. Node.js is a reliable JavaScript runtime that is efficient for creating server-side applications. Express.js became popular when it was used by top brands like Fox Sports, PayPal, Une and IBM. For more companies that use Express.js, check this article on the Express.js website.

Web developers can make the most of the features of Express.js because it stands out among other web application frameworks, one reason being that developing with it is faster. In addition to this, it offers a less stressful system of debugging. If you are a developer, I am sure you understand how stressful finding and removing bugs from software can be. This is especially true with backend software as errors do not always show up directly to the end user.

What is Express.js used for?

In the JavaScript/Node.js ecosystem, Express.js is used to create applications, API endpoints, routing systems, and frameworks.

A selection of the types of applications you can build with Express.js on the backend includes:

  • Single Page Applications
  • Real Time Collaboration Software
  • Streaming Software
  • Financial Technology (Fintech) Applications

In the following sections I will discuss how Express.js can be used to implement these types of solutions.

Single-Page Applications

Single-Page Applications (SPAs) are a modern approach to application development in which the entire application is routed into a single index page. Express.js is an excellent framework for creating an API that connects these SPA applications to server resources and consistently allows them to serve data. Examples of single page applications include Gmail, Google Maps, Airbnb, Netflix, Pinterest, PayPal, and many others. Companies are using SPAs to create a fluid, scalable experience that works similar to an application in a browser window.

Real-Time Collaboration Software

Collaboration tools make it easier for businesses to work and collaborate daily. With Express.js, you can easily develop collaborative and real-time networking applications.

For example, this framework is employed to build real-time applications like chat and dashboard applications. Express.js makes it virtually effortless to integrate WebSocket into the framework.

Express.js handles the routing and middleware portions of the process, allowing developers to focus on the critical business logic of these real-time features when developing live collaborative tools, rather than the real-time integrations.

Streaming Software

Streaming software is ubiquitous these days. Most people use a service like Netflix, Disney+, etc. Streaming software is a complex real-time application, typically using many layers of data streams. To build such an app, you'll need a solid framework that can efficiently handle asynchronous data streams. Express.js is an excellent framework for developing and deploying enterprise-grade, scalable streaming applications.

Financial Technology (Fintech) Applications

Fintech refers to computer programs and other forms of technology that support or enable banking and financial services. Building a fintech application to support customers anywhere from any device is a continuing industry trend, and Express.js is one of the best frameworks for developing highly scalable fintech applications.

If your plan is to develop and deploy a fintech application with a high user and transaction volume, it may interest you to know that you’ll be joining companies like Paypal and Capital One in using Express.js.

What is Next.js?

Next.js is a React-based open-source web development framework. It was brought about by Vercel, which was formerly known as Zeit. It allows us to add functions – like server-side rendering and static site generation – to React-based apps. In addition to this, it extends React to provide additional features that are extremely useful in production, such as caching, which improves response times and decreases the number of requests to external services.

Next.js is used in web development to create fast web apps. You can create a React app that uses server-side rendering to prepare content on the server in advance, so end users receive an HTML page they can interact with right away and a user-friendly, interactive website or app.

What is Next.js used for?

Next.js has solved quite a lot of frontend problems over the years, which is why big and popular companies rely on it. It's mostly known for helping developers create performant web applications and speedy static websites. Below are some of the things the framework is used for, amongst others.

  • Server-side rendering
  • Static Websites
  • Routers
  • Web Software
  • Code Splitting

In the following sections I will discuss how Next.js can help with these sorts of utilizations.

Server-side rendering

By default, Next.js performs server-side rendering. This optimizes your app for search engines. You can also use any middleware like Express.js or Hapi.js, as well as any database like MongoDB or MySQL.

Static websites

By employing a static generation tool such as Next.js, you can serve a full page of content on the first load. Next.js’s static site generator provides good speed without sacrificing the quality.

Routers

A router is a mechanism that lets you determine what occurs when a user goes to a certain webpage. For more details, see this page on the Next.js website. This is yet another fantastic feature of Next.js.

When you make use of the create-react-app command, you must usually install react-router and configure it, but Next.js includes its own router that requires no additional configuration. Simply place your page in the pages folder, and Next.js will handle all routing configuration.

Web software

Because Next.js possesses features such as TypeScript support, which helps avoid and catch errors in the development phase, it is suitable for building large-scale and complex software. Next.js includes a robust debugging feature, which allows developers to debug their code with the help of friendly error messages.

Statically generated pages can still be authenticated – Next.js automatically determines that a page is static if no blocking data requirements exist, which allows it to support different authentication patterns. Next.js can be used to create a client-friendly, interactive user interface.

Code splitting

It is true that code splitting, also referred to as lazy loading, enhances the user experience of your application. In some applications, it’s possible that a page may take some time to load. If the loading time is long, the visitor may dismiss your app because they are not sure whether the application has failed or not.

To avoid this, a trick is used to show the visitor that the page is loading, such as displaying a spinner. Doing this allows you to deal with and control slow loading by only loading part of the JavaScript in your page quickly, and loading the rest while the user waits, confident that the site has not frozen up.

Next.js includes methods for splitting code that make dynamic loading of pages easy.

Edges and Drawbacks

In this section, I will provide you with a set of positives and negatives for both Express.js and Next.js. This will help you to decide which of the libraries in this article is best in certain situations.

Edges of Express.js 

This section will cover a few aspects of Express.js that make this JavaScript tool great for back-end development.

  • It supports JavaScript, making it simple to learn. In addition to that, Express.js is also widely available and has a big community. 
  • It offers a straightforward route for client requests.
  • Express.js includes middleware that handles decision-making. You can define error handling in the middleware as well.
  • Express.js provides developers with convenience, flexibility, and efficiency.
  • Asynchronous APIs are used.
  • Express.js is compatible with any database. Databases like MongoDB, Redis, and MySQL are straightforward to connect to.
  • It is simple to integrate with various template engines such as Jade, Vash, and EJS.
  • It’s simple to serve static files and app resources.

Drawbacks of Express.js 

Express.js brings a lot of benefits to web development, but there are also drawbacks that come with working with this tool. In this section I will list some of them.

  • The middleware system provided by Express.js encounters a number of client request issues.
  • There could be issues with asynchronous callbacks. For more information on such callbacks, visit this page.
  • Developers who are familiar with other software languages may find Node.js’ event-driven nature difficult to grasp.
  • Organization of code in Express.js projects can be complex.
  • Express.js can have security vulnerabilities that need frequent patching. For more details, visit this webpage.
  • Its error messages are frequently ineffective, making it hard to debug at times.

Edges of Next.js 

Comparing Next.js with its counterparts, you'll come across plenty of benefits, and this section will highlight some of them.

  • The delivery of the complete webpage is ensured by static site generation.
  • When developing Next.js projects, it allows for simple deployment. All you do is link up your account to GitHub and transfer your Next.js repositories. 
  • It allows for simple page navigation. The Next.js router enables client-side route transitions between pages.
  • API Routes allow you to handle multiple HTTP methods for the same address in the same file.
  • Next.js includes TypeScript support out of the box, making builds easier.
  • The number of developers is increasing as the popularity of Next.js grows. As a result, it may be easier to find a company or freelancer to make any necessary changes.
  • Security of data is reasonably easy to ensure using Next.js.
  • It makes creating an outstanding user experience (UX) easy.

Drawbacks of Next.js 

Some features of Next.js can come across as a disadvantage depending on your particular situation, timeline, or project, and this section will shed more light on those cons.

  • Flexibility comes at a cost. Next.js does not include many pre-built front-end pages, so you must create the entire front-end layer from scratch.
  • Development and management are worth considering. Say you want to use Next.js to build an online store but don't have an in-house development team – you may need to hire someone to handle the development and management subsequently.
  • Because there is no built-in state manager, you will need to use Redux, MobX, or something similar.
  • In comparison to Gatsby.js, there are fewer plug-ins available.
  • Because Next.js supports static website development, the amount of time it takes to build software packages with many pages could be quite lengthy.

Key elements Comparison

The following comparison lists key elements of JavaScript libraries and shows how Express.js and Next.js fit each of them.

  • Data fetching
    • Express.js: Doesn't let you deliver a response from the server side to an EJS template on the client side.
    • Next.js: Data fetching enables you to deliver your content in a variety of ways, based on the use case of your application.
  • Server use
    • Express.js: Can be used both on the client-side and the server-side.
    • Next.js: Used for building server-side rendering applications.
  • Plug-in and ecosystem
    • Express.js: The Express Gateway plugin includes entities and supports event subscription.
    • Next.js: Makes use of existing powerful tools such as Webpack, Babel, and Uglify, and presents them to the end user in a stunningly simple interface.
  • Static generation
    • Express.js: Deferred static generation – developers can choose to postpone the creation of specific pages until the first time a user requests them.
    • Next.js: Incremental static generation – after you've built your site, you can use Next.js to create or update static pages.
  • Scalability
    • Express.js: When developing a large-scale web application, it handles user API calls efficiently and requires little to no extra configuration.
    • Next.js: Makes it simple to scale multiple pages because it allows you to choose whether to render each page on the client, the server, or both.
  • Speed
    • Express.js: A Node.js backend framework with minimalist and fast tools and functions for developing scalable backend applications.
    • Next.js: Because of static destinations and server-side rendering, it is extremely fast.
  • SEO
    • Express.js: Beneficial for SEO as it offers search engine crawlers a fully rendered page, making their work easier.
    • Next.js: A fundamental tool for achieving impressive SEO performance.

Express.js vs. Next.js: Is either truly better?

There are numerous JavaScript packages available for use in your project but determining which one is best for you can be difficult. Many developers use Express.js because it is simple to learn and rapidly growing. Next.js is also popular because of its ease of use and performance advantages.

Next.js is smart enough to only load the JavaScript and CSS required for each page. This results in much quicker page loading times because the user's browser isn't required to download JavaScript and CSS that aren't needed for the page the user is viewing.

Express.js enables the routing of web applications utilizing multiple URLs and HTTP methods. It enables an Express.js developer to handle a large number of requests and quickly generate responses for specific URLs. This enhances the user experience while also allowing the apps to scale.

These two are great tools to use for the project but the question of which is better lies in what exactly the nature of your project is. One is a specialized backend tool while the other is a full stack tool which can be used for both frontend and backend.


Tuesday, November 22, 2022

Backing up MySQL Part 4: mydumper

There are many tools we can use to back up our MySQL databases. Some are well-known and used by the best technology companies out there (mysqldump comes to mind), and some are a little less famous, but still have their place in the MySQL world. Enter mydumper – the tool is built by the engineering team over at Percona and it's supposedly created to address performance issues caused by mysqldump.

Prerequisites and Installation

mydumper is a tool that lets people take logical backups of their MySQL databases. The tool is built according to the best MySQL practices, it's being actively maintained, and works on all flavors of MySQL, including Percona Server and MariaDB. Note that the tool is not built into MySQL – to use it, one needs to install the development versions of MySQL or any of its flavors, install a couple of development tools, and install the development versions of GLib, ZLib, PCRE, and ZSTD – once all of these libraries are installed, we should follow the instructions for installation that are available here.

All of these libraries accomplish specific goals – GLib is a bundle of libraries, ZLib is used to compress data, PCRE stands for "Perl-Compatible Regular Expressions", and ZSTD – otherwise known as ZStandard – is a compression algorithm developed by Facebook.

The installation of mydumper will provide us with:

  • The ability to export and import data using multiple threads in parallel, sort of like mysqlpump does.
  • The ability to use regex expressions when exporting data.
  • Easily manageable output – all tables will be backed up into separate files exclusive to them, data and metadata will be separated, etc.

Using mydumper

Once mydumper is installed, we can start playing with the tool. Even though there are multiple advanced features provided by the tool, the syntax isn't rocket science and looks like the following:

mydumper [options]

That's it, really. Well, users will still need to specify the host (--host), user (--user), and password (--password) to work with, but other than that, only the options remain. The output that is received will, of course, vary and depend directly on the options that are specified, so let's look into them as well:

  • Users can specify the --database option to provide mydumper with the database to work with (dump data from.) Alternatively, the -B option can also be specified.
  • The --tables option, or its shortened version -T, will let users specify a comma-delimited list of tables to take a copy of.
  • The databases and tables that are being dumped need to have an output directory that can be specified by appending --outputdir or -o (the directory must be writable.)
  • Users can also split INSERT statements into smaller sizes by using --statement-size (or -s.) Note that the size of INSERT statements is to be specified in bytes (default size – 1,000,000 bytes.)
  • mydumper allows its users to specify which storage engines to ignore (i.e. mydumper will not back up tables built on a specific storage engine) by using --ignore-engines or -i.
  • The --regex or -r option will let users specify databases and tables using regex in the format of "database.table." Something like this should do:
    mydumper --regex "(^demo_db\.demo_database$)"
  • To specify the number of threads mydumper is able to use (the default number of allocated threads is 4), consider the --threads or -t option.
  • mydumper is also able to automatically send long-running queries to the timeout realm or kill long-running queries: that can be done by either specifying the --long-query-guard or --kill-long-queries parameters (or their shorter counterparts, -l and -k respectively.) A combined example follows this list.
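
Putting several of these options together, a hedged example invocation could look like the following – the host, credentials, database name, and output directory are placeholders, and option spellings may vary slightly between mydumper versions:

mydumper --host=127.0.0.1 --user=backup_user --password=... \
    --database=demo_db --outputdir=/backups/demo_db \
    --threads=8 --statement-size=5000000 --long-query-guard=120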

The options specified above should tell you a little about just how powerful mydumper can be when used properly – however, do note that other options can be used as well. mydumper also comes with a brother called myloader, which is essentially a restore tool that loads data generated by mydumper from a directory specified by the user. To use myloader, invoke it like so:

myloader --directory=[directory] [--overwrite-tables] --user=your_user

The --directory option takes the directory inside of which the backup created by mydumper is being stored, and the overwrite-tables option can also be invoked to overwrite all existing tables.

Make sure to read up on the docs surrounding the two tools here and follow Percona’s blog to learn more about the tools they develop – explore the Databases part of SimpleTalk to learn more about the functionality of all kinds of databases (our blog is not limited to MySQL!) and how they impact the behavior of applications, and we’ll see you in the next one.


Converting Data Across Time Zones: An In-Depth Primer

If only the entire world used UTC, wouldn’t life be so much easier? We can dream, can’t we? While some software applications can live in an ecosystem where all dates and times can be stored in a single time zone with no conversions needed, many cannot. As applications grow and interact more with international customers, this need becomes omnipresent.

Converting a current time from one time zone to another is relatively easy. Regardless of whether daylight savings is involved or not, one simply needs to retrieve the current time in both time zones, find the difference, and apply that difference as needed to date/time calculations. Historical data is trickier, though, as times from the past may cross different daylight savings boundaries.

This article dives into all the math required to convert historical times between time zones. While seemingly academic in nature, this information can be used when building applications that interact between time zones and need to apply detailed rules to those applications and their users. These calculations will be demonstrated in T-SQL and a function built that can help in handling the math for you.

Note that there are tools and functions that may be available to you that can convert time zone data automatically. For example, AT TIME ZONE in SQL Server 2016+ can do this. Not all data platforms have this available, though. Moreover, there is a huge benefit to understanding the nitty-gritty of how time zones are defined and how they work, even when tools are available to do the work for us.

The following is an example of how AT TIME ZONE can be used to quickly convert a point in time from on time zone to another:

SELECT
        CAST('3/1/2022 12:00:00' AS DATETIME2(0)) 
           AT TIME ZONE 'Eastern Standard Time' AS source_time,
        CAST('3/1/2022 12:00:00' AS DATETIME2(0)) 
           AT TIME ZONE 'Eastern Standard Time' 
             AT TIME ZONE 'Central European Standard Time' AS target_time,
        CAST('3/1/2022 12:00:00' AS DATETIME2(0)) 
           AT TIME ZONE 'Eastern Standard Time'
               AT TIME ZONE 'UTC' AS utc_time;

This T-SQL returns a time in Eastern Standard Time (3/1/2022 @ noon). The next time is the same one, but converted to Central European Time. The final time in the list is the same time converted to UTC. The inputs must be DATETIME2, or this syntax will fail. The results are as follows:

The results show the source time appended with the correct offset from UTC (-5 hours), the same time adjusted to Central European Time (+1 from UTC) and lastly the same time converted to UTC (with the expected offset of zero).

If AT TIME ZONE is available to you, consider using it as it is a relatively simple syntax that can help manage DATETIMEOFFSET dates/times efficiently, without the need for extensive date math.

AT TIME ZONE does have some limitations, however. For example, a date from the 1800’s would still have modern daylight savings rules applied, even though daylight savings did not exist back then.
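
For example, the following is a hedged illustration of that limitation – per the behavior described above, it will still return a daylight savings offset for a summer date in 1875, even though daylight savings did not exist then (the exact offset depends on the time zone data available to your server):

SELECT CAST('1875-07-01 12:00:00' AS DATETIME2(0))
           AT TIME ZONE 'Eastern Standard Time' AS nineteenth_century_time;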

A Note on Changing Rules

As noted in the 1800s example, daylight savings rules change. For example, in the United States, the period for daylight savings was lengthened by 5 weeks in 2007. Other countries have made similar changes over the years, with some abolishing the semi-annual time changes altogether. For the purpose of this article, and to avoid it getting far too complex, current rules will be honored in the examples, but the value of the T-SQL code that I will demonstrate is that it allows you to implement scenarios where the standard methods will not work.

If you have a need to accurately convert across boundaries when daylight savings rules (or any time zone rules for that matter) have changed, then add those breakpoints into your code as needed. These events are rare enough that they may not impact your code, and they are well-documented enough to be able to adjust for them effectively.

The simplest way to do this would be a versioned table that maintains time zone attributes and has a valid from/valid through date that indicates when each set of rules is applicable to a given time zone. Then any process that needs this data can consult the appropriate rules based on the date, filtering using the valid from/valid to dates.
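
A minimal sketch of such a versioned table – the table and column names are hypothetical, and they mirror the time zone components described later in this article:

CREATE TABLE dbo.TimeZoneRule
(
    time_zone_name VARCHAR(128) NOT NULL,  -- e.g. 'Eastern Standard Time'
    bias SMALLINT NOT NULL,                -- minutes added to local time to reach UTC
    standard_month TINYINT NOT NULL,
    standard_day TINYINT NOT NULL,
    standard_day_of_week TINYINT NOT NULL,
    standard_hour TINYINT NOT NULL,
    daylight_month TINYINT NOT NULL,
    daylight_day TINYINT NOT NULL,
    daylight_day_of_week TINYINT NOT NULL,
    daylight_hour TINYINT NOT NULL,
    daylight_bias SMALLINT NOT NULL,
    valid_from DATE NOT NULL,              -- first date these rules apply
    valid_through DATE NOT NULL,           -- last date these rules apply
    CONSTRAINT PK_TimeZoneRule
        PRIMARY KEY CLUSTERED (time_zone_name, valid_from)
);

A process converting a historical time would then filter this table on the time zone name and on valid_from/valid_through to pick up the rules in force at that point in time.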

Ideal Solutions to Time Zone Challenges

Understanding how time zones and daylight savings operate is invaluable to solving complex problems that arise when working with localized data. To some extent, avoiding these challenges altogether is next-to-impossible, but there are a wide variety of best practices that can minimize their impact and reduce the effort needed to resolve them.

Set Server Time to UTC

Having multiple moving targets can make working with data even more challenging. Data ultimately will reference two sets of time zones: The application data time zone and the server time zone. Having application and database servers set to UTC means that they have zero bias and never shift due to daylight savings. This greatly simplifies both code and database maintenance.

The most common problem that occurs when a database server is set to use a local time zone is when daylight savings shifts twice a year. What happens to SQL Server Agent Jobs when you spring forward and lose an hour from 02:00 to 03:00? The answer is that THEY NEVER RUN! For unsuspecting developers and administrators, this can be a nightmare. Imagine backups being skipped for a day. Or maybe a daily analytics job never crunches its data for an important dashboard. When daylight savings reverts to standard time, jobs scheduled during that hour OCCUR TWICE (!) For some tasks, this may be acceptable, but a reporting task that creates data every hour may end up with duplicates. Similarly, any jobs that create files, backups, or tabular data will get an extra round of that data when this happens.

Any application that relies on SQL Server Agent or scheduled tasks that are scheduled using local server time will incur unexpected behavior when daylight savings shifts. The effort to mitigate this risk can be surprisingly complex, to the point where some administrators simply avoid scheduling any tasks during the daylight savings changeover time period (specified by standard_hour and daylight_hour). Similarly, a job that should run at midnight UTC will need to be adjusted alongside daylight savings changes if the server it resides on uses a non-UTC time zone. Otherwise, a job that runs at 8pm EST will become a job that runs at 01:00:00 UTC instead of 00:00:00 UTC when Eastern falls back to standard time.

If application and database servers can be set to UTC, then these problems disappear for good.

Store Date/Time data in UTC or DATETIMEOFFSET

When storing dates and times, the default often chosen when an application is new is the default local time zone. Over the years, this has resulted in many applications that use Eastern US or Pacific US time as the default format for storing this data. This seems convenient until an application grows to encompass many locales and inevitably becomes international, spanning dozens of countries and time zones. Now, conversions are needed to ensure that users see the correct time for where they live and internal processes can convert times correctly to enable meaningful data analytics. Similarly, a software development team may be scattered around many different time zones, adding more complexity into how dates/times are worked with.

Storing all dates and times in UTC ensures that no complex conversions are needed. This is simple and straightforward and if the database server is also set to UTC, then there is no discrepancy between SYSDATETIME() and SYSUTCDATETIME(). For some data points, such as log data, analytics, row versioning, or other internal processes, storing them in UTC is ideal for consistency and ease of use.

If data is directly correlated to a user or location that is time-sensitive, then storing that data using a DATETIMEOFFSET data type is ideal. DATETIMEOFFSET is a data type that stores a time along with its UTC offset. This provides the best of both worlds: the ability to quickly retrieve data points normalized to UTC (or some other time zone) and the ability to know what a local time is. Sometimes there is a need to understand what happens at certain times of day using local times, rather than UTC times. For example, analyzing how order processing occurs at a busy time of day is a common problem. Busy in London will occur at a different time of day than in Melbourne.
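
As a small hedged illustration (the variable name and sample value are hypothetical), DATETIMEOFFSET keeps the local time and its UTC offset together, so normalizing to UTC or another zone stays simple:

DECLARE @order_time DATETIMEOFFSET(3) = '2022-03-01 12:00:00 -05:00';

SELECT
    @order_time AS local_time_with_offset,
    SWITCHOFFSET(@order_time, '+00:00') AS normalized_to_utc,
    CAST(@order_time AT TIME ZONE 'UTC' AS DATETIME2(3)) AS utc_datetime2;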

Do Not Make Assumptions

There are dozens of time zones defined around the world and they do not all operate the same way. Do not assume that the dates/times in your application will always behave like a specific time zone. Not only do different time zones behave differently from each other, but rules as to how time zones work can change over time.

Time zones should be seen as a property of the user experience. Time zones in and of themselves are artificial constructs that help us navigate the day with more ease so that 12:00 is considered mid-day and falls during a lighter part of the day, regardless of where you live. Computers do not care about time zones. Nor does data. Flying to Tokyo from New York does not cause you to travel forwards in time by crossing lots of time zones.

Therefore, treat time zones as if they are a user preference. If a user lives in Madrid, then they would experience the Central European time zone (UTC+2), but data stored about them may not need to know this. It could be UTC or DATETIMEOFFSET, if needed. Similarly, storing data in multiple time zones in a table is liable to result in confusion. Ideally, a column that represents a time should not require being stored repeatedly in many time zones. Those conversions should ideally happen in the presentation layer of an application and not in the database.

The Components of a Time Zone

Each time zone has built-in components that describe how it works. To understand how to convert between time zones, it is necessary to understand the components that comprise a time zone. These attributes can be roughly broken into two categories:

  1. How does this time zone compare to UTC?
  2. If this time zone uses daylight savings, how and when are adjustments made?

The following is a list of all components needed to understand the definition of a time zone:

Bias

This is the default difference between a time zone and UTC, typically given in minutes. It can be positive or negative. The number indicates the number of minutes that needs to be added to a local time to convert it into UTC during standard (non-daylight savings) times.

For example, a bias of zero indicates UTC/GMT. Eastern Standard time has a bias of 300, which indicates that 300 minutes need to be added to an Eastern Standard time (5 hours) to convert it to UTC during non-daylight savings times of the year. If daylight savings is in effect, then only 240 minutes would need to be added to the local time, though additional parameters related to daylight savings should be used to arrive at this number, since not all time zones behave the same way.
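
A minimal sketch of this bias arithmetic in T-SQL – the sample timestamp is assumed to fall outside of daylight savings, so the standard bias of 300 applies:

DECLARE @eastern_local_time DATETIME2(0) = '2022-01-15 09:00:00';
DECLARE @bias SMALLINT = 300;  -- Eastern Standard Time

-- Adding the bias converts local time to UTC: 09:00 EST becomes 14:00 UTC
SELECT DATEADD(MINUTE, @bias, @eastern_local_time) AS utc_time;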

Start of Standard Time

For time zones with daylight savings, these numbers determine when during the year time is shifted from daylight savings back to standard time (aka: Fall Back). Along with each description is the domain of values as they will be used in the code I will present.

  • Standard Month: The month of the year that time is shifted to standard time from daylight savings time. This is an integer from 1 (January) to 12 (December). Zero is typically used to denote time zones without daylight savings.
  • Standard Day: The day of the month after which daylight savings can occur. The day that daylight savings reverts is the first instance of the Standard Day of Week after the Standard Day. More on this below. This is an integer from 1-31. Zero is typically used to denote time zones without daylight savings.
  • Standard Day of Week: This is the day of the week that time shifts from daylight savings to standard time. This is an integer from 0 (Sunday) to 6 (Saturday) and has no meaning if this time zone does not have daylight savings.
  • Standard Hour: This is the hour of the day that time is shifted from daylight savings to standard time. This is an integer from 0 to 23 and has no meaning if this time zone does not have daylight savings.
  • Standard Bias: If there is any additional adjustment (in minutes) required to account for standard time, then it is provided here. Typically, this is zero and is not used in time zone calculations. For the purposes of this article, standard bias will not be used. Consider this a placeholder for hypothetical modifications introduced by future laws/changes, but as of the writing of this article is not used anywhere on Earth.

To determine when to shift from daylight savings to standard time, start with the standard month and day. If that day happens to be equal to the standard day of week, then you are done, otherwise iterate forward through the days of the month until the standard day of week is reached. For example, if the standard month is 11, the standard day is 1, and the standard day of week is 0 (Sunday), then that indicates that a time zone shifts from daylight savings to standard on the first Sunday of November, starting on November 1st. If November 1st is not a Sunday, then the shift will occur on the next Sunday of the month after November 1st. If the standard hour is 2, then the time will shift from daylight savings to standard time at 02:00 on that date.
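
This iteration can be expressed compactly in T-SQL – a hedged sketch that mirrors the logic used by the function later in this article (it assumes the session's DATEFIRST setting is the US English default of 7, so that DATEPART(WEEKDAY, ...) - 1 maps Sunday to 0):

DECLARE @year INT = 2022;
-- Standard month = 11, standard day = 1, standard day of week = 0 (Sunday)
DECLARE @standard_start DATE = DATEFROMPARTS(@year, 11, 1);

WHILE DATEPART(WEEKDAY, @standard_start) - 1 <> 0
        SET @standard_start = DATEADD(DAY, 1, @standard_start);

SELECT @standard_start AS standard_time_start_date;  -- 2022-11-06 for 2022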

Start of Daylight Saving Time

On the flip side, the following values provide the information needed to determine when and how to shift from standard time to daylight savings time. Time zones that do not use daylight savings are not impacted by these numbers.

  • Daylight Month: The month of the year that time is shifted to daylight savings time from standard time. This is an integer from 1 (January) to 12 (December). Zero is typically used to denote time zones without daylight savings.
  • Daylight Day: The day of the month after which time can shift from standard to daylight savings. The day that daylight savings occurs is the first instance of the Daylight Day of Week after the Daylight Day. This is an integer from 1-31. Zero is typically used to denote time zones without daylight savings.
  • Daylight Day of Week: This is the day of the week that time shifts from standard to daylight savings time. This is an integer from 0 (Sunday) to 6 (Saturday) and has no meaning if this time zone does not have daylight savings.
  • Daylight Hour: This is the hour of the day that time is shifted from standard to daylight savings time. This is an integer from 0 to 23 and has no meaning if this time zone does not have daylight savings.
  • Daylight Bias: This is the adjustment, in minutes, that occurs when time shifts from standard to daylight savings. This number can be positive or negative, but is usually negative and represents the minutes that are added to the local time zone bias to reach the daylight savings time. A time zone with a bias of 300 and a daylight bias of -60 indicates that when standard time shifts to daylight savings time, the local bias shifts to 240. In other words, this time zone “springs forward” an hour, as is often said colloquially.

The process to determine when to shift from standard to daylight savings time is quite similar to the process for shifting to standard time outlined earlier in this article. Start with the daylight month and day. This determines the earliest that daylight savings can occur. From here, the first instance of the daylight day of week (including that first date) is when daylight savings occurs. The daylight hour is the time when the daylight savings shift occurs. In most time zones, this is the same hour that is used for standard time.

Note that all time zone attributes are enumerated in Microsoft’s documentation and can be used when developing applications. More information can be found here: https://ift.tt/eFoIWl2

Many databases/lists of time zone attributes are available freely for download and use. The following is a small database containing a single table with the commonly used time zone metadata available for use. Feel free to download and share freely: https://www.dropbox.com/s/ifqflzqdhghifix/TimeZoneData.bak

There is a huge benefit to storing time zone information in a readily accessible table so that applications can use it whenever needed and have a centralized source of truth.

Example of Daylight Savings

Consider a time zone with the following parameters (this happens to be Pacific Standard Time):

  • Bias: 480 minutes (UTC-8)
  • Standard Month: 11 (November)
  • Standard Day: 1
  • Standard Day of Week: 0 (Sunday)
  • Standard Hour: 2
  • Daylight Month: 3 (March)
  • Daylight Day: 2
  • Daylight Day of Week: 0 (Sunday)
  • Daylight Hour: 2
  • Daylight Bias: -60 minutes

This time zone is UTC-8 hours (480 minutes). Daylight savings occurs on the first Sunday in March on or after March 2nd, at 02:00. At that time, 60 minutes are subtracted from this time zone’s bias, resulting in a new bias of 420 minutes (UTC-7 hours).

Similarly, this time zone reverts from daylight savings to standard time on the first Sunday in November on or after November 1st at 02:00. At that time, 60 minutes are added to this time zone’s bias to return it to its standard bias of 480 minutes (UTC-8 hours)

The Time Zone Challenge

Without daylight savings time, converting between time zones would simply be a matter of adjusting the bias between two time zones. For example, adjusting from a time zone with a bias of 300 to a time zone with a bias of 180 would only require that 120 minutes be subtracted from its bias. Practically speaking, reducing a time zone’s bias equates to adding minutes to a time whereas increasing the bias subtracts minutes from the time.

Daylight savings adds a wrinkle into this equation. The time on December 1st and the time on July 1st within a single time zone may each be subject to a different bias due to daylight savings. As a result, converting between time zones requires knowing whether each time zone honors daylight savings and if either (or both) do, then what are the details of when those time shifts occur.

For example, in the Central European time zone (UTC+1/bias of -60 minutes), daylight savings adds an hour on the first Sunday on or after March 5th. Therefore, 09:00 on March 1st is UTC+1, but 09:00 on April 1st is UTC+2. While those times are sampled in the same physical location, they essentially belong to different time zones. If data from this time zone is stored in UTC and converted back to the local time zone at a later time using the standard bias of -60, the result would be 09:00 on March 1st and 10:00 on April 1st.

Storing times as DATETIMEOFFSET resolves this issue in its entirety and is the recommended solution. Most date/time values in most databases are not stored with the UTC offset inline, though, and are subject to a data sampling error across daylight savings boundaries. DATETIMEOFFSET allows both the time and time zone to be defined all at once. This leaves no ambiguity with regards to precisely what time it was and its relationship to UTC and other time zones.

Therefore, the challenge that we are set to tackle here is to be able to use the attributes of time zones to retroactively convert one time into another and account for daylight savings in the process. Using the information discussed thus far, this process will be encapsulated into a scalar function that can handle the work for us.

Using T-SQL to Automate Time Zone Calculations

A better solution to the time zone problem is a function that can be passed the parameters for each time zone and can then crunch the resulting time using what we have discussed thus far in this article. The details for each time zone can be stored in a table to make for more reusable code and reduce the need to hard-code time zone attributes whenever this is needed. Generally speaking, a table that contains a wide variety of attributes and dimensions for all time zones can be useful for many other purposes as well. The value in codifying time zone details and then referencing only standard time zone names or abbreviations is that it greatly reduces the chances for typos or mistakes. This provides similar functionality to a calendar table in that regard.

If one of the time zones is UTC, then zeroes can be passed in for all of that time zone’s parameters, which can simplify this function if one side of the equation happens to always be UTC. Similarly, if the daylight bias is always -60, which is likely, then it can be omitted as well. The function below has a variety of defaults to help simplify calls to it when various attributes are irrelevant, not needed, or not specified.

CREATE FUNCTION dbo.fnConvertBetweenTimeZones
(       -- The date/time that will be converted to the target time zone.
        @source_datetime DATETIME2(3),
        -- Source time zone attributes. Defaults describe UTC with no daylight savings.
        @source_bias AS SMALLINT = 0,
        @source_standard_month AS TINYINT = 0,
        @source_standard_day AS TINYINT = 0,
        @source_standard_day_of_week AS TINYINT = 0,
        @source_standard_hour AS TINYINT = 0,
        @source_daylight_month AS TINYINT = 0,
        @source_daylight_day AS TINYINT = 0,
        @source_daylight_day_of_week AS TINYINT = 0,
        @source_daylight_hour AS TINYINT = 0,
        @source_daylight_bias AS SMALLINT = -60, -- Defaults to -60 (1 hour) if used
        -- Target time zone attributes. @target_bias must always be supplied;
        -- the remaining defaults describe UTC with no daylight savings.
        @target_bias AS SMALLINT,
        @target_standard_month AS TINYINT = 0,
        @target_standard_day AS TINYINT = 0,
        @target_standard_day_of_week AS TINYINT = 0,
        @target_standard_hour AS TINYINT = 0,
        @target_daylight_month AS TINYINT = 0,
        @target_daylight_day AS TINYINT = 0,
        @target_daylight_day_of_week AS TINYINT = 0,
        @target_daylight_hour AS TINYINT = 0,
        @target_daylight_bias AS SMALLINT = -60) -- Defaults to -60 (1 hour) if used
RETURNS DATETIME2(3)
AS
BEGIN
        DECLARE @target_datetime DATETIME2(3);
        DECLARE @difference_in_minutes SMALLINT;
        -- Calculate the base difference between time zones.
        SELECT @difference_in_minutes = @source_bias - @target_bias;
        -- Determine if the source time was affected by daylight savings.
        IF @source_daylight_month <> 0
        BEGIN
                DECLARE @source_daylight_start DATETIME2(3);
                DECLARE @source_daylight_end DATETIME2(3);
                -- Calculate when daylight savings starts.
                SELECT @source_daylight_start =
                        DATEFROMPARTS(DATEPART(YEAR, @source_datetime),
                                @source_daylight_month, @source_daylight_day);
                WHILE DATEPART(DW, @source_daylight_start) - 1 <> @source_daylight_day_of_week
                BEGIN
                        SELECT @source_daylight_start = DATEADD(DAY, 1, @source_daylight_start);
                END
                -- Calculate when daylight savings ends.
                SELECT @source_daylight_end =
                        DATEFROMPARTS(DATEPART(YEAR, @source_datetime),
                                @source_standard_month, @source_standard_day);
                WHILE DATEPART(DW, @source_daylight_end) - 1 <> @source_standard_day_of_week
                BEGIN
                        SELECT @source_daylight_end = DATEADD(DAY, 1, @source_daylight_end);
                END
                -- Add the daylight savings modifier, if needed.
                IF @source_daylight_start < @source_daylight_end -- Typically Northern hemisphere
                BEGIN
                        IF @source_datetime >=
                                DATETIMEFROMPARTS(DATEPART(YEAR, @source_daylight_start),
                                        @source_daylight_month,
                                        DATEPART(DAY, @source_daylight_start),
                                        @source_daylight_hour, 0, 0, 0)
                        AND @source_datetime <
                                DATETIMEFROMPARTS(DATEPART(YEAR, @source_daylight_end),
                                        @source_standard_month,
                                        DATEPART(DAY, @source_daylight_end),
                                        @source_standard_hour, 0, 0, 0)
                        BEGIN
                                SELECT @difference_in_minutes =
                                        @difference_in_minutes + @source_daylight_bias;
                        END
                END
                IF @source_daylight_start > @source_daylight_end -- Typically Southern hemisphere
                BEGIN
                        IF @source_datetime >=
                                DATETIMEFROMPARTS(DATEPART(YEAR, @source_daylight_start),
                                        @source_daylight_month,
                                        DATEPART(DAY, @source_daylight_start),
                                        @source_daylight_hour, 0, 0, 0)
                        OR @source_datetime <
                                DATETIMEFROMPARTS(DATEPART(YEAR, @source_daylight_end),
                                        @source_standard_month,
                                        DATEPART(DAY, @source_daylight_end),
                                        @source_standard_hour, 0, 0, 0)
                        BEGIN
                                SELECT @difference_in_minutes =
                                        @difference_in_minutes + @source_daylight_bias;
                        END
                END
        END
        -- Determine if the target time was affected by daylight savings.
        IF @target_daylight_month <> 0
        BEGIN
                DECLARE @target_daylight_start DATETIME2(3);
                DECLARE @target_daylight_end DATETIME2(3);
                -- Calculate when daylight savings starts.
                SELECT @target_daylight_start =
                        DATEFROMPARTS(DATEPART(YEAR, @source_datetime),
                                @target_daylight_month, @target_daylight_day);
                WHILE DATEPART(DW, @target_daylight_start) - 1 <> @target_daylight_day_of_week
                BEGIN
                        SELECT @target_daylight_start = DATEADD(DAY, 1, @target_daylight_start);
                END
                -- Calculate when daylight savings ends.
                SELECT @target_daylight_end =
                        DATEFROMPARTS(DATEPART(YEAR, @source_datetime),
                                @target_standard_month, @target_standard_day);
                WHILE DATEPART(DW, @target_daylight_end) - 1 <> @target_standard_day_of_week
                BEGIN
                        SELECT @target_daylight_end = DATEADD(DAY, 1, @target_daylight_end);
                END
                -- Add the daylight savings modifier, if needed.
                IF @target_daylight_start < @target_daylight_end -- Typically Northern hemisphere
                BEGIN
                        IF @source_datetime >=
                                DATETIMEFROMPARTS(DATEPART(YEAR, @target_daylight_start),
                                        @target_daylight_month,
                                        DATEPART(DAY, @target_daylight_start),
                                        @target_daylight_hour, 0, 0, 0)
                        AND @source_datetime <
                                DATETIMEFROMPARTS(DATEPART(YEAR, @target_daylight_end),
                                        @target_standard_month,
                                        DATEPART(DAY, @target_daylight_end),
                                        @target_standard_hour, 0, 0, 0)
                        BEGIN
                                SELECT @difference_in_minutes =
                                        @difference_in_minutes - @target_daylight_bias;
                        END
                END
                IF @target_daylight_start > @target_daylight_end -- Typically Southern hemisphere
                BEGIN
                        IF @source_datetime >=
                                DATETIMEFROMPARTS(DATEPART(YEAR, @target_daylight_start),
                                        @target_daylight_month,
                                        DATEPART(DAY, @target_daylight_start),
                                        @target_daylight_hour, 0, 0, 0)
                        OR @source_datetime <
                                DATETIMEFROMPARTS(DATEPART(YEAR, @target_daylight_end),
                                        @target_standard_month,
                                        DATEPART(DAY, @target_daylight_end),
                                        @target_standard_hour, 0, 0, 0)
                        BEGIN
                                SELECT @difference_in_minutes =
                                        @difference_in_minutes - @target_daylight_bias;
                        END
                END
        END
        -- Apply the calculated difference to the source datetime and return the result.
        SELECT @target_datetime =
                DATEADD(MINUTE, @difference_in_minutes, @source_datetime);
        RETURN @target_datetime;
END;

This function performs the following steps in order to convert @source_datetime to the target time zone:

  1. Determine the difference between time zones due to the difference in bias.
  2. If the source time zone observes daylight savings, then:
    • Determine when daylight savings begins for the source time zone.
    • Determine when daylight savings ends for the source time zone.
    • For time zones where the daylight savings month is before the standard month, apply the daylight bias, if needed. This is typically for time zones in the Northern hemisphere.
    • For time zones where the daylight savings month is after the standard month, apply the daylight bias, if needed. This is typically for time zones in the Southern hemisphere.
  3. If the target time zone observes daylight savings, then:
    • Determine when daylight savings begins for the target time zone.
    • Determine when daylight savings ends for the target time zone.
    • For time zones where the daylight savings month is before the standard month, apply the daylight bias, if needed. This is typically for time zones in the Northern hemisphere.
    • For time zones where the daylight savings month is after the standard month, apply the daylight bias, if needed. This is typically for time zones in the Southern hemisphere.
  4. Apply the calculated difference in minutes to the source datetime and return the result.

Consider the following example:

SELECT dbo.fnConvertBetweenTimeZones('3/1/2022 12:00:00', 
             300, 11, 1, 0, 2, 3, 2, 0, 0, -60, -60, 10, 
             5, 0, 3, 3, 5, 0, 0, -60) AS target_time;

This code is converting noon on March 1st, 2022 from Eastern US time to Central European time. Both time zones observe daylight savings, but neither is observing it at this time. EST is UTC-5 and CET is UTC+1. The result of this function call is 18:00 on March 1st, 2022.

The results show that six hours were added to the source datetime. Since neither time zone was observing daylight savings, the time shift required only adding up the difference in biases and applying it to the date/time. In this case, 300 minutes from the source plus 60 additional minutes from the target, or 360 minutes (6 hours).

To test a scenario where one time zone is in daylight savings while the other is not, the following example can be used:

SELECT dbo.fnConvertBetweenTimeZones('6/1/2022 12:00:00', 
             300, 11, 1, 0, 2, 3, 2, 0, 0, -60, 0, 0, 
             0, 0, 0, 0, 0, 0, 0, 0) AS target_time;

This converts from Eastern time (while daylight savings is in effect) to UTC, with the result being 16:00 on June 1st, 2022.

The bias for Eastern Time is 300 minutes, but it is offset by -60 due to daylight savings, resulting in an effective bias of 240 minutes. This equates to UTC-4, so converting to UTC requires adding 4 hours to the source datetime, resulting in a conversion from 12:00 to 16:00 on the same date.
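Because the target side of this call is UTC, the same conversion can also be written using the DEFAULT keyword for every parameter that has a default in the function definition above (only @target_bias, which has no default, still needs an explicit 0). A sketch of the equivalent call:

SELECT dbo.fnConvertBetweenTimeZones('6/1/2022 12:00:00',
             300, 11, 1, 0, 2, 3, 2, 0, 0, DEFAULT,
             0, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
             DEFAULT, DEFAULT, DEFAULT, DEFAULT) AS target_time;

The result is the same 16:00 on June 1st, 2022.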

We can also apply the dummy test case of converting a time from a time zone to itself:

SELECT dbo.fnConvertBetweenTimeZones('7/1/2022 12:00:00', 
             480, 11, 1, 0, 2, 3, 2, 0, 0, -60, 480, 11, 
              1, 0, 2, 3, 2, 0, 0, -60) AS target_time;

Converting 7/1/2022 12:00:00 from Pacific time to Pacific time results in the same value: 12:00 on July 1st, 2022.

No surprise here, but this does provide a good QA test case to ensure that things are working the way they are expected to.

A common application of a function like this would be to apply it to a set of data from a table, rather than a hard-coded scalar value. The following query against WideWorldImporters converts the LastEditedWhen DATETIME2 column from Central time to UTC:

SELECT
        Orders.OrderID,
        Orders.LastEditedWhen,
        dbo.fnConvertBetweenTimeZones(Orders.LastEditedWhen, 
                    360, 11, 1, 0, 2, 3, 2, 0, 0, -60, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0) AS edited_time_from_CST_to_UTC
FROM WideWorldImporters.Sales.Orders
WHERE Orders.LastEditedWhen BETWEEN '3/2/2013' AND '4/4/2013'
ORDER BY Orders.LastEditedWhen, Orders.OrderID;

Note that this sample database may be downloaded for free from Microsoft. This example intentionally intersects the time when Central time shifts from standard to daylight savings, and the results illustrate that shift.

Note that the offset from Central US time to UTC changes from 6 hours to 5 hours after daylight savings goes into effect. The additional daylight bias of -60 minutes is added to the overall bias for Central time (360 minutes/6 hours) resulting in a new bias of 300 minutes (5 hours).

If time zone data is stored in a permanent reference table, then that table can be joined into the query to allow for a more dynamic approach based on the source/target time zones, removing most of the hard-coded literals and replacing them with column references from the time zone table, as sketched below.
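As a rough illustration of that idea (the table, column names, and abbreviations below are hypothetical, and the attribute values simply mirror the examples above), a minimal reference table and join might look like this:

CREATE TABLE dbo.TimeZoneDetail
(       time_zone_abbreviation VARCHAR(10) NOT NULL PRIMARY KEY,
        bias SMALLINT NOT NULL,
        standard_month TINYINT NOT NULL,
        standard_day TINYINT NOT NULL,
        standard_day_of_week TINYINT NOT NULL,
        standard_hour TINYINT NOT NULL,
        daylight_month TINYINT NOT NULL,
        daylight_day TINYINT NOT NULL,
        daylight_day_of_week TINYINT NOT NULL,
        daylight_hour TINYINT NOT NULL,
        daylight_bias SMALLINT NOT NULL);

-- Central US time and UTC, using the same attribute values as the examples above.
INSERT INTO dbo.TimeZoneDetail VALUES
        ('CST', 360, 11, 1, 0, 2, 3, 2, 0, 0, -60),
        ('UTC', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);

-- The reference table replaces the hard-coded literals in the query above.
SELECT
        Orders.OrderID,
        Orders.LastEditedWhen,
        dbo.fnConvertBetweenTimeZones(Orders.LastEditedWhen,
                src.bias, src.standard_month, src.standard_day,
                src.standard_day_of_week, src.standard_hour,
                src.daylight_month, src.daylight_day,
                src.daylight_day_of_week, src.daylight_hour, src.daylight_bias,
                tgt.bias, tgt.standard_month, tgt.standard_day,
                tgt.standard_day_of_week, tgt.standard_hour,
                tgt.daylight_month, tgt.daylight_day,
                tgt.daylight_day_of_week, tgt.daylight_hour, tgt.daylight_bias)
                AS edited_time_from_CST_to_UTC
FROM WideWorldImporters.Sales.Orders
CROSS JOIN dbo.TimeZoneDetail AS src
CROSS JOIN dbo.TimeZoneDetail AS tgt
WHERE src.time_zone_abbreviation = 'CST'
  AND tgt.time_zone_abbreviation = 'UTC'
  AND Orders.LastEditedWhen BETWEEN '3/2/2013' AND '4/4/2013'
ORDER BY Orders.LastEditedWhen, Orders.OrderID;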

What Else Can This be Used for?

An understanding of how times and time zones work can be greatly helpful when working with code that does not already manage dates and times perfectly. Legacy code, T-SQL from older SQL Server versions, or code that needs to work across different data platforms may not handle them so gracefully.

The metadata that describes time zones can be used on its own as a reference tool to understand when and how daylight savings changes occur. It can also be used to gauge the time difference between two locations. The need to do this effectively arises often, even in systems where time zones are handled relatively well.

While the idea of regularly changing clocks for daylight savings has been losing appeal over the years, it is likely to remain in effect in many locales for years to come. If a time zone or country does remove it, knowing how to adjust for future dates allows applications to prepare for that change even before software updates are issued throughout the many tools and services we rely on to build, maintain, and host them.

Conclusion

Understanding how time zones and daylight savings work will make working with localized data much easier. If an application can be architected to minimize the need to convert between time zones, then doing so will avoid a wide variety of challenges and problems in the future. If an application cannot avoid those conversions due to legacy data structures or code, then formalize how they are performed and use the time zone attributes described here to build the most reliable processes possible for managing multiple time zones.

The code in this article may not be needed in every scenario, but it provides insight into the complexities of time zone conversions, especially with historical data. Use it as a way to encapsulate time zone metadata into a single source of truth and to build code that works with time zone data in as reusable a way as possible. This will improve code quality, reduce errors, and make an application more maintainable in the long run!

 

The post Converting Data Across Time Zones: An In-Depth Primer appeared first on Simple Talk.




Monday, November 21, 2022

SQL Server 2022: Azure AD Authentication

SQL Server 2022 is finally GA, and one of the features I was most anticipating is finally available: Azure AD authentication. Azure AD users can access SQL Server directly, without a second user account.

SQL Server on premises requires Azure ARC to be integrated with Azure. Azure VMs, on the other hand, don’t allow the use of Azure ARC, and Microsoft waited until the last moment to enable the same feature through the SQL Server IaaS Agent Extension.

Azure creates the SQL Server IaaS Agent Extension automatically for us when we create a SQL Server virtual machine. It appears as an additional Virtual Machine object. Using this object, we can control many details of the SQL Server configuration inside an Azure virtual machine, and one of these details is the authentication method.

 

The first requirement for Azure AD authentication is to set an identity for the virtual machine. On the Virtual Machine, we access the Identity tab and turn on the System Assigned Managed Identity. Azure will create an identity for the VM with the same name as the VM.

 

Define the Identity Permissions

The VM identity needs permission to access Azure Active Directory for authentication. We can grant this permission by assigning the Directory Readers role to the VM identity. Follow these steps to set the permissions:

1) Access Azure Active Directory

2) Click the Roles and administrators tab

3) In the search text box, type “Directory” to locate the Directory Readers role

 

4) Click the Directory Readers role

5) Click the Add assignments button

6) Locate the VM identity and click the add button

 

 

Set the Azure Authentication in SQL Server 2022

After these steps, we can enable the Azure AD Authentication on the SQL Server IaaS agent following these steps:

7) Access the Security Configuration tab

 

8) Click the Enable option

9) On the Managed identity type drop down, select the identity to be used.

 

10) Click the Apply button

Finally, we need to give Azure AD users permission to access SQL Server. We can use SSMS to run the statements that grant those permissions, and they are simple:

CREATE LOGIN [dennes@dennesbufaloinfocom.onmicrosoft.com] FROM EXTERNAL PROVIDER;
ALTER SERVER ROLE sysadmin ADD MEMBER
        [dennes@dennesbufaloinfocom.onmicrosoft.com];

 

Of course, you will adapt the roles of your users according to your needs – a sketch of a more restrictive grant follows below. On Azure SQL Database, we can only create users from Azure AD if we are connected with an Azure AD account; SQL Server 2022 on an Azure VM doesn’t have this requirement.
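For example, instead of granting sysadmin, an Azure AD login could be mapped to a database user with a narrower role. This is only a sketch – the login name and database name here are hypothetical:

-- Create a login for the Azure AD account (hypothetical name).
CREATE LOGIN [jane.doe@contoso.onmicrosoft.com] FROM EXTERNAL PROVIDER;
-- Map the login to a user in a specific database and grant read-only access.
USE SalesDB;
CREATE USER [jane.doe@contoso.onmicrosoft.com]
        FOR LOGIN [jane.doe@contoso.onmicrosoft.com];
ALTER ROLE db_datareader ADD MEMBER [jane.doe@contoso.onmicrosoft.com];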

Conclusion

SQL Server 2022 is the most cloud-connected SQL Server version. Most of the connected features depend on Azure ARC, but Microsoft left enabling them through the SQL Server IaaS agent until the last minute. One sign that this was a last-minute feature is how difficult it is to find documentation about this configuration.

 

The post SQL Server 2022: Azure AD Authentication appeared first on Simple Talk.




Saturday, November 19, 2022

Backing up MySQL Part 3: mysqlpump

In the MySQL world, there are a couple of ways to take backups of your data. Backups in MySQL fall into two distinct categories: logical or physical. MySQL also comes with a lot of tools helping us achieve our backup objectives: we walked you through one of the most popular offerings in this space – Percona XtraBackup – in the second part of this series, and the first part of this series covered one of the main logical MySQL backup tools called mysqldump. As far as backups are concerned, though, we are far from done – for example, mysqldump has a sibling called mysqlpump. Since the two CLI-based backup tools are frequently confused with each other, the aim of this blog is to clear the confusion up a little.

What is mysqlpump?

mysqlpump is a backup utility that is used via the command-line interface. The tool is very similar to mysqldump in that it provides us with the ability to take logical backups, but it is also different at the same time – the goal of mysqlpump is to be an extensible, parallelism-supporting replacement for mysqldump. In their blog from 2015, the MySQL team said that one of the primary aims of introducing mysqlpump was not to be forced to implement the legacy functionality provided by mysqldump.

mysqlpump was introduced to the world of MySQL in 2015 – the tool was added to MySQL 5.7.8.

Using mysqlpump

The most basic use case of mysqlpump looks something like this – the user account below is just an example, and the tool can be used in the same fashion both on Linux and on Windows:
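mysqlpump -u root -p > backup.sql

This connects as the given user, prompts for a password, and redirects the logical dump of all user databases into backup.sql.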

mysqlpump is designed in such a way that it will only dump tables that were created by users – in other words, it will refrain from backing up any of MySQL‘s internal tables.

The one feature that is exclusive to mysqlpump, though, is the ability to issue backups using multiple threads at once – to figure out how many threads your system has, follow these steps:

  1. Linux users – issue a command like lscpu or lscpu | egrep 'Thread|CPU\(s\)' and look at the output (you should see something similar to “Thread(s) per core.”)
  2. Windows users – open up Task Manager and look at the number next to “Logical processors” – it represents the number of threads in the system:

Once you know the number of threads in your system, decide how many of them you wish to (and can) allocate to mysqlpump – this doesn’t need to be an exact science, and mysqlpump is usually quick to complete anyway.

Once you know the number of threads in your infrastructure, configure the number of threads that mysqlpump can use. Threads are configured using the --default-parallelism and --parallel-schemas options. Also, consider providing the username and password that you’re using to access MySQL through my.cnf for secure access: passwords provided through the CLI can be observed by accessing the history of issued commands. If you don’t want to do that, feel free to provide a username and a password on the command line, but be aware that mysqlpump will issue warnings if you do so:

mysqlpump --default-parallelism=[threads] > data.sql
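If you’d rather keep credentials out of the command line as suggested above, one option is to place them in an option file that MySQL client programs read – a minimal sketch, with a hypothetical user:

[client]
user=backup_user
password=YourPasswordHere

With such an entry in place, mysqlpump can be run without the -u and -p options.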

Since the “specialty” of mysqlpump is to back up data using multiple threads, we can also define the number of threads that we want the tool to use to dump a specific database – this functionality of mysqlpump can be exceptionally useful if we’re working with bigger sets of data, but need a logical backup to be taken.

mysqlpump – the Details

  • To specify more threads to dump a large database and fewer threads for smaller databases, look at the option named --parallel-schemas. If you define your parameters like so, 6 threads will be used for the larger databases and 2 for the smaller one:
    mysqlpump --parallel-schemas=6:large_db1,large_db2 --parallel-schemas=2:small_db [--default-parallelism=x] > backup.sql
    Note the --default-parallelism option: a default number of threads for dumping any other (undefined) databases can also be specified.
  • To back up only specific databases, define your command like so (here db_1 and db_2 represent two separate databases):
    mysqlpump --databases db_1 db_2 > databases.sql
    You can also accomplish the same goal like so:
    mysqlpump --include-databases=db_1,db_2 --result-file=dump.sql
  • mysqlpump can also perform an “empty” dump (meaning that it can only back up the schema of the database, but not the data contained within):
    mysqlpump --include-databases=data_1,db2,db3 --skip-dump-rows --result-file=mydata.sql
  • mysqlpump also gives us the ability to use wildcards inside of any parameter. That means that if we have a lot of databases, backing up a part of them (only those starting with the letter x, for example) is a breeze:
    mysqlpump --include-databases=x% --result-file=s_databases.sql
  • We can also exclude databases from being backed up like so (the following command would exclude all databases starting with test_ and back up all the rest):
    mysqlpump --exclude-databases=test_% --result-file=data.sql
  • We can also work with entire patterns (the following command would exclude all tables matching the pattern __demo, where each underscore matches a single character):
    mysqlpump --exclude-tables=__demo --result-file=backup.sql
  • And last but not least, mysqlpump is also able to work with events and routines – all the same, just specify events or routines instead of databases and tables:
    mysqlpump --[include|exclude]-[events|routines]=title1,title2 --result-file=verycoolbackup.sql

mysqlpump can do a number of other things as well – all of the information on mysqlpump, as with everything MySQL-related, can be found in the documentation.

The Downsides of mysqlpump

However, as powerful as mysqlpump is, it’s not without its weaknesses. Research by Giuseppe Maxia from September 2015 suggests that mysqlpump is faster than mysqldump, but only slightly – the blogger provides an example where mysqldump takes 3 minutes and 33 seconds to execute, whereas mysqlpump takes 2 minutes and 55 seconds. The difference is there, but we presume the blogger expected it to be much bigger than it was.

The tests were run against approximately 20 million rows inside the tables of the database – a figure big enough for both tools to handle, but not large enough to warrant SELECT * INTO OUTFILE (a statement used to back up bigger sets of data.) This suggests that mysqlpump is a good choice when we need specific options not available in mysqldump, but it doesn’t provide much of an upside otherwise.

Summary

mysqlpump entered the MySQL scene in 2015 with the release of MySQL 5.7.8. At first it was thought that it could be a reliable replacement for mysqldump while also offering a couple of “exotic” options not available in its counterpart, but as time went by, mysqlpump proved that it still needs more refinement before it can fully take over that role.

Both mysqldump and mysqlpump are part of MySQL’s ecosystem to this day, yet people steer towards mysqldump more than they do towards mysqlpump – while mysqlpump has its use cases, its speed advantage is generally not considered significant enough to outweigh its counterpart.

We hope that you’ve enjoyed reading this article. Stay around the Redgate blog to learn more about everything database-related, and until next time!

The post Backing up MySQL Part 3: mysqlpump appeared first on Simple Talk.


