Saturday, October 29, 2022

The PASS Sessions I Am Most Excited For

The PASS Data Community Summit is coming up in less than three weeks, so it is time to start finalizing your schedule. My schedule this year is fixed, as I will be working behind the scenes helping to manage one of the live online tracks. But that doesn't stop me from noting the sessions I will make sure to watch after the show is over if I don't see them live.

Note: The sessions mentioned here reflect the interests of Louis Davidson (me!) and say nothing about the sessions I don't mention. While I now edit the Simple-Talk website, I will continue to be a data architect/programmer at heart, so when I am choosing for myself, my preferred sessions are going to be heavily T-SQL programming centric.

While I am Producing

Just looking at the list of sessions I am pretty sure I will be involved with, there are some sessions that I want to see either way. For example:

Note: you will need to register/login to follow these links. If you haven’t registered, time is running out! Go to PASS Data Community Summit and register.

In-Memory OLTP Design Principles by Torsten Strauss. I have been fascinated by the memory-optimized objects in SQL Server for many years, though I haven't used the feature even in a demo manner for a while. I have built quite a bit of sample code with it, but I always love to hear more.

SQL Server Table Partitioning – DOs and DON'Ts by Margarita Naumova. Partitioning is not something I have ever done in a production setting, and it is really one of the features in SQL Server that I have never even written much demo code for. As such, it is a topic I have always wanted to know more about for the day when I need to implement it (or tech edit a blog about it!).

SSIS Custom Pipeline Component: A Step-by-Step Guide by Arne Bartels. I have spent a lot of time over the past 15 years building SSIS packages, and I will probably continue to build them to move data around in my personal projects as well.

While I may not professionally build SSIS packages now (and maybe not ever), sometimes it is fun to go to a session and learn all the things you could have done better.


Does this mean I won't enjoy An Introduction to S3 Data Lake for SQL Server 2022 by Chris Adkin, Getting started with Power BI Deployment Pipelines by Akshata Revankar, Better Data Governance with Purview by Kelly Broekstra, or any of the other sessions I will be in? If you think that, you may have skipped the introduction to this blog. Clearly the answer is "no". In fact, since I will just be where I am told, I have only barely looked at the schedule since it was posted. The sessions may have had a time change since I last looked.

The three highlighted sessions are ones I would have in my list to stream after the conference anyhow!

Saving for Later

I have scanned the schedule a few times and have quite a few more sessions that would be can't-miss, front-row affairs if I were in Seattle. Of course, the reality is that even if I were there in person, I would still probably miss a few. Sometimes it is because two sessions overlap each other (the committee tries not to let that happen for sessions that suit a certain persona, like database programmer, but there are only so many slots to hold sessions). The far more common reason I tend to miss sessions is that I know at least 100-300 people who will be at the Summit… that's a lot of catching up to do (especially since I haven't been to a conference in almost three years)!

There are plenty of sessions I will do my best to watch after the week is over. It is always a bit difficult to find the time to go back and watch every session you want to, but there are a few sessions that I expect really need to be attended.

First and foremost, any session by Itzik Ben-Gan is going on my list, and I love the topic of the Beware Nondeterministic T-SQL Code session too. Honestly, I might know everything he is going to say in this presentation, but Itzik digs deep enough that there is usually an Aha Moment or twenty.

Really though, even in sessions where you think you are an expert, there is almost always the possibility of learning something new. Just last night, I reviewed a potential writer's article on a subject I know quite well, but I still learned a few things! On the other hand, in Itzik's case, it is possible that I might end up having no idea what he is talking about until I watch it a few times, so watching the recording won't be all bad.

How to Maintain the Same Level of utilities in Cloud Deployments by Denny Cherry

As we all move to the cloud, even those of us who just write about SQL Server for the most part, this question is big in our minds. “How do I use the skills I have in the new world?” Plus, Denny is an entertaining speaker no matter what!

A Query Tuner’s Practical Guide to Statistics by Andy Yun.

Statistics are kind of a mystery to most people, even to me at times. For the most part, we ignore them and let the engine do its thing. But sometimes… well, a query is slow. Looking at the query plan, you see guesses about how many rows need to be processed, and the actual number of rows processed is orders of magnitude greater. The common blame? Statistics.

Take knowledge that is useful and add to that the fact that Andy is a great speaker, and this one seems like a lock.

And so on

I could go on. I looked at the schedule quite a few times as part of the committee to choose sessions (I helped validate the sessions and schedules before starting here at Redgate) and there are so many great sessions no one could see them all, or even sit down and pick just a few to highlight. I didn’t even mention Paul Randal’s Performance Mythbusters session, Glenn Berry’s Index Tuning 101, Intelligent Data through Data Intelligence by Chris Unwin, SQL Titbits for the Inexperienced by Erland Sommarskog…any of the keynotes, precons, or so many others.

All this, and these are just a handful (maybe two handfuls) of sessions that fit what I like best. There are 300-plus sessions to choose from, covering the variety of topics the data platform provides.

See you soon!

 


Introducing the MySQL UPDATE statement

Entry in Robert Sheldon's series on Learning MySQL. The series so far:

  1. Getting started with MySQL
  2. Working with MySQL tables
  3. Working with MySQL views
  4. Working with MySQL stored procedures
  5. Working with MySQL stored functions
  6. Introducing the MySQL SELECT statement
  7. Introducing the MySQL INSERT statement

The UPDATE statement enables you to modify values in your database tables, including temporary tables. With a single statement, you can update one or more rows, one or more columns, or any combination of the two. You can even update multiple tables. As you work through this article, you’ll find that the UPDATE statement is intuitive and straightforward to use, once you understand the basics of how it works.

Preparing your MySQL environment

For the examples in this article, I used the same database (travel) and tables (manufacturers and airplanes) that I used for the last few articles in this series.

Note: The examples assume that you worked through the previous article, in which case, the travel database should be set up and ready to go. If you did not, you can still follow along with this article, just know that your query results will be slightly different from the ones I show here.

To set up the travel database—if you haven’t already done so—download the MySQL_06_setup.sql file and run the SQL script against your MySQL instance. The script creates the database and tables and inserts sample data. Alternatively, you can create the database and then run the following script to create the manufacturers and airplanes tables:

CREATE TABLE manufacturers (
  manufacturer_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  manufacturer VARCHAR(50) NOT NULL,
  create_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  last_update TIMESTAMP NOT NULL 
    DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (manufacturer_id) ) 
ENGINE=InnoDB AUTO_INCREMENT=1001;

CREATE TABLE airplanes (
  plane_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  plane VARCHAR(50) NOT NULL,
  manufacturer_id INT UNSIGNED NOT NULL,
  engine_type VARCHAR(50) NOT NULL,
  engine_count TINYINT NOT NULL,
  max_weight MEDIUMINT UNSIGNED NOT NULL,
  wingspan DECIMAL(5,2) NOT NULL,
  plane_length DECIMAL(5,2) NOT NULL,
  parking_area INT GENERATED ALWAYS 
             AS ((wingspan * plane_length)) STORED,
  icao_code CHAR(4) NOT NULL,
  create_date TIMESTAMP NOT NULL 
             DEFAULT CURRENT_TIMESTAMP,
  last_update TIMESTAMP NOT NULL 
    DEFAULT CURRENT_TIMESTAMP 
           ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (plane_id),
  CONSTRAINT fk_manufacturer_id 
    FOREIGN KEY (manufacturer_id) 
       REFERENCES manufacturers (manufacturer_id) ) 
ENGINE=InnoDB AUTO_INCREMENT=101;

Next, you need to add data to the manufacturers and airplanes tables to support the examples in this article. Start with the manufacturers table by running the following INSERT statement, which adds Beagle Aircraft Limited as a manufacturer:

INSERT INTO manufacturers (manufacturer)
VALUES ('Beagle Aircraft Limited');

SELECT manufacturer_id
FROM   manufacturers
WHERE  manufacturer = 'Beagle Aircraft Limited';

After the data is inserted, the SELECT statement returns the newly added value from the manufacturer_id column. If you’ve been following along exactly with the last couple of articles, the value would be 1008. If it is not, make a note of the manufacturer_id value you get here because you’ll need it for the rest of the article, starting with the following INSERT statement:

INSERT INTO airplanes 
  (plane, manufacturer_id, engine_type, engine_count, 
    wingspan, plane_length, max_weight, icao_code)
VALUES 
  ('A.61 Terrier',1008,'piston',1,36,23.25,2400,'AUS6'),
  ('B.121 Pup',1008,'piston',1,31,23.17,1600,'PUP'),
  ('B.206',1008,'piston',2,55,33.67,7500,'BASS'),
  ('D.4-108',1008,'piston',1,36,23.33,1900,'D4'),
  ('D.5-108 Husky',1008,'piston',1,36,23.17,2400,'D5');

If necessary, replace 1008 with your manufacturer_id value and then run this statement. After you’ve done that, you should be set up to follow along with the examples in this article. Be aware, however, that a number of the following examples reference the manufacturer_id column, so if the value is not 1008 on your system, be sure to use the correct one.

The UPDATE statement syntax

The UPDATE statement in MySQL supports five clauses, two of which are required and three that are optional, as indicated in the following syntax:

UPDATE [IGNORE] table_name
SET column = value [, column = value]...
[WHERE where_condition]
[ORDER BY order_list]
[LIMIT row_count]

The syntax does not include all the elements available to an UPDATE statement, but it does include most of them. These are the ones you'll be using the majority of the time, not only when learning about the statement, but also after you've mastered it. For the complete syntax, check the MySQL documentation on the UPDATE statement.

Here’s a breakdown of the five clauses:

  • The UPDATE clause, one of the statement’s two mandatory clauses, specifies the table that is the target of the update. You can specify multiple tables in this clause, separating them with commas, but my focus in this article is on single-table updates. I’m saving the subject of multi-table updates for when I cover more advanced topics.
  • The SET clause, the other mandatory clause, specifies which columns to update. You can include one or more column assignments. For each assignment, specify the column name, an equal sign, and the new value. If you include multiple assignments, separate them with commas.
  • The WHERE clause determines which rows to update, based on one or more conditions. The clause works much like the WHERE clause in a SELECT statement. Although the WHERE clause is optional, you should be very careful running an UPDATE statement that does not include one. Without a WHERE clause, the statement will update every row in the table, unless the LIMIT clause is included.
  • The ORDER BY clause specifies the order in which rows should be updated. This can be useful in situations that might otherwise result in an error, as you'll see later in the article. The ORDER BY clause is similar to the one you saw in the SELECT statement. The clause cannot be used for multi-table updates.
  • The LIMIT clause limits the number of rows that will be updated. If you include a WHERE clause, the count applies to the rows returned by that clause. This means that the statement will stop based on the number of rows that satisfy the WHERE conditions, whether or not those rows are actually updated. As with the ORDER BY clause, the LIMIT clause cannot be used for multi-table updates.

With these five clauses, you can build a wide range of UPDATE statements. Most of the time, you’ll be using the UPDATE, SET and WHERE clauses, although the ORDER BY clause and LIMIT clause can also come in handy at times.

Once you see the statement in action, you should have no problem understanding how all the clauses work and using them to update data. In fact, it’s almost too easy to update data, and if you’re not careful, you could make a significant mess of things. Data modifications can be difficult to undo, so you need to proceed cautiously, especially when you’re first learning how to use the UPDATE statement. Certainly, don’t practice in a production environment. When you do update the production environment, be sure to do it within a transaction, a topic I plan to cover later in the series.
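Anticipating that topic, here is a minimal sketch of wrapping an UPDATE in a transaction so the change can be inspected before it becomes permanent (the plane_id value is illustrative; substitute one that exists in your airplanes table):

START TRANSACTION;

UPDATE airplanes
SET engine_count = 2
WHERE plane_id = 101;

-- Inspect the change before deciding its fate
SELECT plane, engine_count
FROM airplanes
WHERE plane_id = 101;

ROLLBACK;  -- undo the change; use COMMIT instead to keep it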

Performing a basic update in MySQL

Now that you have a basic overview of the UPDATE statement syntax, it’s time to see the statement in action so you can get a feel for how it works. As I already mentioned, the UPDATE and SET clauses are the only required clauses, so let’s start with them.

Suppose you want to round all the values in the wingspan column in the airplanes table to whole numbers. To achieve this, you create the following UPDATE statement:

UPDATE airplanes
SET wingspan = ROUND(wingspan);

The UPDATE clause identifies airplanes as the target table, and the SET clause specifies that the values in the wingspan column should be rounded, which is achieved by using the built-in ROUND function.

That’s all it takes to update the wingspan data. However, there’s a good chance that when you try to run this statement, you’ll receive the following error:

Error Code: 1175. You are using safe update mode and you tried to update a table without a WHERE that uses a KEY column. To disable safe mode, toggle the option in Preferences -> SQL Editor and reconnect.

By default, you cannot perform an update without a WHERE clause that specifies a key column. This helps to ensure that you don’t make sweeping changes that you’ll later regret. You can disable safe mode (as the error message suggests), either permanently or temporarily. I recommend that you do it on a temporary basis to avoid any unwanted changes elsewhere.

To temporarily disable safe mode, use a SET statement to change the SQL_SAFE_UPDATES system variable to 0 prior to running your UPDATE statement and then set the variable to 1 after you run the statement, as shown in the following example:

SET SQL_SAFE_UPDATES = 0;

UPDATE airplanes
SET wingspan = ROUND(wingspan);

SET SQL_SAFE_UPDATES = 1;

The SET statements merely toggle the variable off and then on during the current session. Be aware, however, that the SET statement supports the GLOBAL option, which sets a variable at the global scope. As a general practice, do not use this option when disabling safe updates. It is much less risky to make these sorts of changes at the session level to prevent any unplanned mishaps. Use GLOBAL only if it’s essential in your circumstances.

For more details on SET, global variables, and settings, see Using System Variables in the MySQL documentation.

Even at the session level, the SET statement makes it possible to execute your UPDATE statement without generating an error. You can confirm the changes by running a simple SELECT statement after executing the script above:

SELECT * FROM airplanes;

Figure 1 shows part of the results returned by the SELECT statement. Notice that the wingspan values are now all whole numbers.

Figure 1. Rounding the values in the wingspan column

In some cases, you might want to update multiple columns at the same time. For this, you need to add the additional column assignments, separating them with commas. For example, the following UPDATE statement modifies both the wingspan and plane_length columns:

SET SQL_SAFE_UPDATES = 0;

UPDATE airplanes
SET wingspan = ROUND(wingspan), plane_length = ROUND(plane_length);

SET SQL_SAFE_UPDATES = 1;

Both column assignments work the same way. You’re simply rounding the column values to whole numbers. If you were to query the airplanes table after running the UPDATE statement, your results would look similar to those shown in Figure 2.

Figure 2. Rounding the values in the wingspan and plane_length columns

By using the UPDATE and SET clauses, you can quickly update all of a column's values. Just be careful when taking this approach. It's all too easy to mess things up in a big way.

Adding a WHERE clause to your UPDATE statement

Most of your UPDATE statements will likely include a WHERE clause to help you better target the data that you want to modify. The WHERE clause specifies one or more conditions that narrow down the rows to be updated. For example, the following statement includes a WHERE clause that limits the updates to rows with a manufacturer_id value of 1008:

UPDATE airplanes
SET engine_type = 'piston (adg-i)' 
WHERE manufacturer_id = 1008;

SELECT *
FROM   airplanes
WHERE manufacturer_id = 1008;

The SET clause in this statement sets the engine_type value to piston (adg-i) for the targeted rows. The results from executing the statements should look similar to Figure 3.

Figure 3. Limiting your update to specific rows

Note: In case you’re wondering, the adg-i value is a reference to Airplane Design Group (ADG) classifications, a system used to categorize aircraft by dividing them into six groups based on their wingspans and tail heights. The lowercase i indicates that the planes in this example are in Group I. (I realize that you’d probably want to add a column for the ADG groups. The approach I took here was meant only to demonstrate these concepts.)

That said, it turns out that the B.206 airplane should actually be in Group II, which means you need to update that record without updating the others. Fortunately, you can define multiple conditions in your WHERE clause to help narrow down the rows. In the following example, the WHERE clause includes two conditions, one based on the manufacturer_id column and the other on the plane column:

UPDATE airplanes
SET engine_type = 'piston (adg-ii)'
WHERE manufacturer_id = 1008 AND plane = 'B.206';

As in the previous example, the WHERE clause limits the updates to rows with a manufacturer_id value of 1008. However, the clause also specifies that the plane value must equal B.206. The two conditions are linked together by the AND logical operator, which means that both conditions must evaluate to true for the row to be updated.

After you run the UPDATE statement, you can retrieve the same rows as before. Your results should look similar to those shown in Figure 4. Notice that the B.206 aircraft is now shown as a Group II plane.

Figure 4. Limiting your update to one specific row

You can make your WHERE clause as detailed as necessary to ensure that you’re updating the target rows and no other rows. The key is to use your logical operators correctly to ensure that your conditional logic is accurate.
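For example, here is a sketch of a more selective filter that combines AND with a parenthesized OR (the plane names come from the sample data; the new engine_type value is illustrative):

UPDATE airplanes
SET engine_type = 'piston (adg-i)'
WHERE manufacturer_id = 1008
  AND (plane = 'B.121 Pup' OR plane = 'D.4-108');

Without the parentheses, AND would bind more tightly than OR, and the statement could update rows you never intended to touch.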

Working with column values

In the first example in this article, you saw how to use the ROUND system function to round values in the airplanes table. When you update a column in this way, MySQL uses the column’s current value to create a new value. The ability to use the current value makes it possible to build on that value in ways that go beyond simply applying a function. For example, the following UPDATE statement adds 3 to the wingspan value and 5 to the plane_length value:

UPDATE airplanes 
SET wingspan = wingspan + 3, plane_length = plane_length + 5 
WHERE plane_id = 344;

In this case, I used the plane_id value 344 in the WHERE clause, which I had to look up in the table. However, you might want to use a different method for finding this value, such as retrieving it through a subquery, a topic I plan to cover later in this series.
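As a preview, here is a hedged sketch of that subquery approach, looking up the manufacturer by name instead of hard-coding an id. Note that MySQL does not allow an UPDATE statement's subquery to select from the very table being updated, so the subquery here targets the manufacturers table:

UPDATE airplanes
SET wingspan = wingspan + 3, plane_length = plane_length + 5
WHERE plane = 'B.206'
  AND manufacturer_id = (SELECT manufacturer_id
                         FROM manufacturers
                         WHERE manufacturer = 'Beagle Aircraft Limited');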

After you run the UPDATE statement against plane_id 344, you can query the airplanes table to verify the results, which should look similar to those shown in Figure 5.

Figure 5. Increasing values in the wingspan and plane_length columns

If you compare Figure 5 to Figure 4, you'll see that the row with a plane_id value of 344 has been updated. You might have also noticed that MySQL automatically updated the parking_area column, which is a generated column that multiplies the wingspan and plane_length values.

If you try to update a column with the same value it already has, MySQL is smart enough to realize the values are the same and does not change the original value. This approach could potentially reduce unnecessary overhead and minimize the impact on concurrent operations that might be trying to retrieve or modify that value at the same time. MySQL is also smart enough to recognize when you try to insert an unacceptable value into a column. For instance, the following UPDATE statement attempts to change the engine_type column to NULL:

UPDATE airplanes SET engine_type = NULL WHERE plane_id = 344;

Because the column is defined as NOT NULL, the UPDATE statement will fail and generate the following error:

Error Code: 1048. Column 'engine_type' cannot be null

You’ll also receive an error if you attempt to update a column to a value with an incorrect data type. For example, the following UPDATE statement attempts to update the max_weight column to the string value unknown:

UPDATE airplanes
SET max_weight = 'unknown'
WHERE plane_id = 344;

Not surprisingly, this statement will also fail because the max_weight column is defined with the MEDIUMINT data type. Rather than update the value, MySQL returns the following error:

Error Code: 1366. Incorrect integer value: 'unknown' for column 'max_weight' at row 1

As with inserting data, updating data requires that you’re familiar with the target columns whose values you’re trying to modify. It’s not enough just to know the data type. You must also understand how the data type is defined. For example, if you try to update the icao_code column with the string abcdef, you’ll generate an error because the column is defined as CHAR(4).
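For instance, a sketch of such a failing statement (with MySQL's default strict SQL mode this produces a data-too-long error; the exact error code can vary with the server's SQL mode):

UPDATE airplanes
SET icao_code = 'abcdef'  -- six characters into a CHAR(4) column
WHERE plane_id = 344;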

Updating foreign key columns in MySQL tables

There might be times when you want to update a value in a foreign key column. This can be tricky, however, because MySQL performs foreign key checks. For example, suppose you want to modify the manufacturer_id column in the airplanes table:

UPDATE airplanes 
SET manufacturer_id = 2001
WHERE manufacturer_id = 1008;

Not surprisingly, MySQL will balk when you try to run this statement and will instead return the following error (unless you've included the IGNORE keyword):

Error Code: 1452. Cannot add or update a child row: a foreign key constraint fails (`travel`.`airplanes`, CONSTRAINT `fk_manufacturer_id` FOREIGN KEY (`manufacturer_id`) REFERENCES `manufacturers` (`manufacturer_id`))

You cannot update a foreign key to a value that does not exist in the referenced column. You must first make the necessary changes to the parent table. However, this too can be tricky. For example, you might try to modify the manufacturer_id value in the manufacturers table:

UPDATE manufacturers 
SET manufacturer_id = 2001
WHERE manufacturer_id = 1008;

Unfortunately, this too will cause MySQL to generate an error because you cannot update a value that’s being referenced by a foreign key:

Error Code: 1451. Cannot delete or update a parent row: a foreign key constraint fails (`travel`.`airplanes`, CONSTRAINT `fk_manufacturer_id` FOREIGN KEY (`manufacturer_id`) REFERENCES `manufacturers` (`manufacturer_id`))

You can get around these issues by temporarily disabling foreign key checks within your session (or by setting the CASCADE option on the foreign key, something I’ll be discussing in a later article). To achieve this, set the foreign_key_checks system variable to 0 before running the UPDATE statements, and then set it back to 1 after running the statements:

SET foreign_key_checks = 0;

UPDATE manufacturers 
SET manufacturer_id = 2001
WHERE manufacturer_id = 1008;

UPDATE airplanes 
SET manufacturer_id = 2001
WHERE manufacturer_id = 1008;

SET foreign_key_checks = 1;

In this way, you can update the manufacturer_id values in both tables without generating any foreign key errors. As a reminder, avoid using the GLOBAL option in your SET statement. If you turn off foreign key checks at a global level, you’re putting the integrity of your data at risk.

After you run these statements, you can query the manufacturers table to verify your changes:

SELECT * FROM manufacturers WHERE manufacturer_id = 2001;

Figure 6 shows the data returned by this statement. As you can see, the table was updated with no problem, in part because you specified a new primary key value that did not already exist.

Figure 6. Updating the manufacturers table

You can also query the airplanes table to verify that the rows have been properly updated:

SELECT * FROM airplanes WHERE manufacturer_id = 2001;

Figure 7 shows the results returned by the query. As expected, the manufacturer_id values have been updated in all the target rows.

Figure 7. Updating the manufacturer_id column in the airplanes table

Chances are, you won't have to update foreign key columns very often, but it's good to understand what it takes to make it happen. Just know that there are other issues to be aware of, such as not trying to insert duplicate primary keys.

Updating primary key columns in MySQL tables

As with foreign keys, there might be times when you need to update the values in a primary key column. If you update a single value (as you saw above), it’s usually no problem as long as the new value conforms to the column’s requirements. However, things get trickier if updating multiple values at one time. For example, the following UPDATE statement attempts to add 1 to all plane_id values in rows that have a manufacturer_id value of 2001:

UPDATE airplanes 
SET plane_id = plane_id + 1
WHERE manufacturer_id = 2001;

The statement has a good chance of failing because of the order that MySQL updates each row of data (although you can never be certain about the exact order that the database engine will choose when updating data). This is because MySQL is trying to update the original value to a value that already exists and is itself waiting to be updated. For instance, if MySQL tries to update the first row from 342 to 343 before the second row has been changed, the statement will fail and MySQL will return the following error:

Error Code: 1062. Duplicate entry '343' for key 'airplanes.PRIMARY'

You might be tempted to include the IGNORE keyword to try to get around this issue:

UPDATE IGNORE airplanes 
SET plane_id = plane_id + 1
WHERE manufacturer_id = 2001;

The IGNORE keyword instructs MySQL to return a warning rather than an error and to continue with the statement’s execution instead of stopping. In this case, you’ll likely receive four warnings, along with a message indicating only one row was successfully updated:

1 row(s) affected, 4 warning(s):

1062 Duplicate entry '343' for key 'airplanes.PRIMARY'

1062 Duplicate entry '344' for key 'airplanes.PRIMARY'

1062 Duplicate entry '345' for key 'airplanes.PRIMARY'

1062 Duplicate entry '346' for key 'airplanes.PRIMARY'

Rows matched: 5 Changed: 1 Warnings: 4

If you query the airplanes table, you can see that only the last row has been updated, as shown in Figure 8. This is because the last row was the only one that did not try to update the primary key value to an existing value.

Figure 8. Using the IGNORE option when updating the plane_id column

A better solution is to include an ORDER BY clause that sorts the rows by the plane_id values, in descending order:

UPDATE airplanes 
SET plane_id = plane_id + 1
WHERE manufacturer_id = 2001
ORDER BY plane_id DESC;

When you include the ORDER BY clause in this way, MySQL applies the updates starting with the last row, making it possible to increment the values by 1 without generating any errors or warnings.

Figure 9 shows what the data now looks like after running the UPDATE statement.

Figure 9. Adding the ORDER BY clause to your UPDATE statement

You probably won’t need to use the ORDER BY clause very often, but when you do, it will prove very useful.

Another clause that’s similar in this respect is the LIMIT clause, which limits the number of rows that are updated. For instance, the following update statement limits the number of rows to 3:

UPDATE airplanes 
SET plane_id = plane_id + 1
WHERE manufacturer_id = 2001
ORDER BY plane_id DESC
LIMIT 3;

Because the UPDATE statement still includes the ORDER BY clause, the three rows that are updated start at the bottom and go up. Figure 10 shows the results of querying the table after the update.

Figure 10. Adding the LIMIT clause to your UPDATE statement

I suspect you’re not going to include the LIMIT clause in your UPDATE statements very often (if at all), but situations might arise in which you find it useful. For example, you might want to test an UPDATE statement that would normally modify a large number of rows. If you include the LIMIT clause while testing the statement, you’ll reduce the amount of time and processing it takes to verify that the statement is working properly.

Working with the MySQL UPDATE statement

The UPDATE statement is one of the most common statements used when working with MySQL data. In most cases, you’ll be including the UPDATE, SET and WHERE clauses. At times, you might forego the WHERE clause—at your own peril—and at other times, you might incorporate the ORDER BY clause or LIMIT clause (or both). However, the bulk of your updates will likely rely on the three primary clauses.

Regardless of which clauses you use, you should understand how they all work to ensure that you’re modifying your data as effectively as possible, while ensuring the accuracy of those updates. Fortunately, the UPDATE statement is fairly easy to understand and use, so you should have no problem getting started.

 


Wednesday, October 26, 2022

Oracle optimizer Or Expansion Transformations

Jonathan Lewis' continuing series on the Oracle optimizer and how it transforms queries into execution plans:

  1. Transformations by the Oracle Optimizer
  2. The effects of NULL with NOT IN on Oracle transformations
  3. Oracle subquery caching and subquery pushing
  4. Oracle optimizer removing or coalescing subqueries
  5. Oracle optimizer Or Expansion Transformations

There are two versions of this transformation, the “legacy” Or-Expansion – formerly known as “Concatenation” – and the new (12.2) implementation that allows Oracle greater scope and flexibility in its use. The basic mechanism is essentially the same in both versions though the shape of the execution plan does change. The key feature is that the optimizer can take a single query block and transform it into a UNION ALL of 2 or more query blocks which can then be optimized and run separately.

Here’s some code to create a table that I will use to demonstrate the principle:

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid 
                                -- Wordpress format issue
)
select
        cast(rownum as number(8,0))                     id,
        cast(mod(rownum,1949) as number(8,0))           n1949,
        cast(mod(rownum,1951) as number(8,0))           n1951,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator
order by
        dbms_random.value
;
create index t1_i1949 on t1(n1949);
create index t1_i1951 on t1(n1951);

The table t1 holds 10,000 rows. In case you’re wondering, 1949 and 1951 are prime numbers and since they are both slightly less than 2,000 most values in the two columns will appear 5 times each, but a few will appear six times. It’s worth doing a couple of quick queries to get a better intuitive feel for the data:

select  v1, n1949, n1951 from t1 where n1949 = 3;

select  v1, n1949, n1951 from t1 where n1951 = 3;

select  v1, n1949, n1951 from t1 where n1949 = 3 or n1951 = 3;

And here are the three sets of results from my test (in the absence of an “order by” clause your results may appear in a different order):

V1              N1949      N1951
---------- ---------- ----------
0000007799          3       1946
0000009748          3       1944
0000001952          3          1
0000005850          3       1948
0000000003          3          3
0000003901          3       1950

6 rows selected.

V1              N1949      N1951
---------- ---------- ----------
0000009758         13          3
0000005856          9          3
0000001954          5          3
0000007807         11          3
0000003905          7          3
0000000003          3          3

6 rows selected.

V1              N1949      N1951
---------- ---------- ----------
0000009758         13          3
0000005856          9          3
0000007799          3       1946
0000001954          5          3
0000007807         11          3
0000003905          7          3
0000009748          3       1944
0000001952          3          1
0000005850          3       1948
0000000003          3          3
0000003901          3       1950

11 rows selected.

You’ll notice, of course, that the two simple queries returned 6 rows, but the disjunct (“OR”) of the two separate predicates returned only 11 rows. A little visual inspection shows that the row where v1 = '0000000003' appears in both of the first two result sets when it’s only going to appear once in the final query.

This example is so simple that the final plan doesn't show Or-Expansion; the optimizer has used a completely different option to access the data, a method that it thinks is cheaper (partly because of a limitation – not a bug – in the optimizer's model). Here's the plan (pulled from memory after checking the SQL_ID of the query):

select * from table(dbms_xplan.display_cursor('7ykzsaf3r0umb',null));

This returns:

The optimizer has decided to use btree/bitmap conversion. The plan does an index range scan of the two indexes, one for each predicate, then converts the two lists of rowids into two bit-strings, does a Bitmap-OR of the two strings then converts the resulting string back into a list of rowids. It’s an unfortunate feature of execution plans that bitmap strategies don’t include estimates of how many bits are “set” (i.e. 1) and how many rowids will be produced before showing the final estimate of rows returned after the table has been accessed by rowid and filtered.

Side note: Since the optimizer has no idea how the pattern of bits in a bitmap is related to the scattering of data (and the clustering_factor of a bitmap index is simply a count of the number of bitmap chunks in that index) the optimizer simply makes a guess about the data scatter. Roughly speaking it assumes that 80% of the rows identified through a bitmap predicate will be very well clustered, and that the remaining 20% will be widely scattered. Inevitably the resulting cost estimate of using the bitmap approach will be far too high in some cases and far too low in others.

Or-Expansion plans

Since the example allows the optimizer to use btree/bitmap conversion, the query will need a hint to disable that choice. Unfortunately, there’s no explicit hint to switch the mechanism off (you can force it to appear with the /*+ index_combine(alias) */ hint but there’s no “no_index_combine()” hint), so it’s necessary to fall back on the opt_param() hint to modify one of the optimizer parameters for the duration of the query.

Since I’m running 19.11.0.0 I’m going to run two more versions of the sample query – the first simply disables the btree/bitmap feature, the second also takes the optimizer back to the 12.1.0.2 optimizer feature level:

select  /*+ opt_param('_b_tree_bitmap_plans','false') */  
        v1, n1949, n1951 
from    t1 
where   n1949 = 3 or n1951 = 3;

select  /*+ 
                opt_param('_b_tree_bitmap_plans','false') 
                optimizer_features_enable('12.1.0.2') 
        */ 
        v1, n1949, n1951 
from    t1 
where   n1949 = 3 or n1951 = 3;

Here are the resulting plans – first the baseline 19.11.0.0 plan showing the newer Or-Expansion:

And here’s the 12.1.0.2 plan showing “Legacy” Or-Expansion (i.e. Concatenation):

 

The plans are clearly similar – both show two index range scans and the access predicate associated with each range scan uses the obvious index.

Both plans report an unusual filter predicate of the form lnnvl(column = constant) – and if you check the plan body and Predicate Information you’ll notice that this lnnvl() predicate is used during access to the second child of the Concatenation/Union All, but the “predicate within a predicate” echoes the predicate used during access to the first child of the plan. This is Oracle making sure that rows that were reported in the first part of the Concatenation/Union All do not get repeated by the second part.

The lnnvl() function takes a predicate as its single parameter, returning FALSE if the input predicate evaluates to TRUE, and TRUE if the predicate evaluates to FALSE or NULL. Effectively it is the “is not true()” function, engineered to work around some of the difficulties caused by SQL’s “three-value logic”.
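A minimal sketch of the behaviour, run against dual (the WHERE clause is where lnnvl() has traditionally been allowed):

select 'row returned' from dual where lnnvl(1 = 2);     -- inner predicate FALSE, lnnvl() TRUE: one row
select 'row returned' from dual where lnnvl(null = 2);  -- inner predicate NULL, lnnvl() TRUE: one row
select 'row returned' from dual where lnnvl(1 = 1);     -- inner predicate TRUE, lnnvl() FALSE: no rows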

There are several cosmetic differences between the plans – the view VW_ORE_xxxxxxxx makes it very clear that the plan includes Or-Expansion, and there’s an explicit UNION ALL operation that shows very clearly how Oracle is operating. One curious difference is the reversal of the order in which the original predicates become the separate branches of the transformed query – there doesn’t appear to be any clever arithmetic involved, it just seems to be a switch from bottom-up to top-down.

As a side note – this does mean that when you upgrade from a "Concatenation version" of Oracle to Or-Expansion, a query that you've written to "select first N rows" without including an "order by" clause may now return a different set of rows.

In effect, Or-Expansion has taken the original text and transformed it into a select from a UNION ALL view (text extracted from the CBO trace file and edited for readability – “test_user” was the name of the schema running the demo):

SELECT 
        VW_ORE_BA8ECEFB.ITEM_1 V1,
        VW_ORE_BA8ECEFB.ITEM_2 N1949,
        VW_ORE_BA8ECEFB.ITEM_3 N1951 
FROM    (
           (
           SELECT T1.V1 ITEM_1, T1.N1949 ITEM_2, T1.N1951 ITEM_3
           FROM TEST_USER.T1 T1 
           WHERE T1.N1949=3
           )
           UNION ALL
           (
           SELECT T1.V1 ITEM_1, T1.N1949 ITEM_2, T1.N1951 ITEM_3 
           FROM TEST_USER.T1 T1 
           WHERE T1.N1951=3 AND LNNVL(T1.N1949=3)
                )
        )       VW_ORE_BA8ECEFB
;

In more complex queries, of course, once this transformation has been done the optimizer may find ways that other parts of the query can be transformed with the different query blocks embedded in this UNION ALL view. Again, this may cause some surprises when an upgrade takes you from Concatenation to Or-Expansion: plans may change because there are more cases where the optimizer can apply the basic expansion and then carry on doing further transformations to individual branches of the UNION ALL.

Or-Expansion threats

A critical difference between Concatenation and Or-Expansion is that the OR’ed access predicates for the driving table must all be indexed before Concatenation can be used. The same restriction is not required for Or-Expansion and this means the optimizer will spend more time considering more execution plans while optimizing the query. Consider a query joining table tA and table tB with the following where clause:

SELECT  {list of columns}
FROM
        tA, tB
WHERE
        (tA.indexed_column = 'Y' or tA.unindexed_column = 'N')
AND     tB.join_column = tA.join_column
AND     tB.another_indexed_column = 'X'
;

This could be expanded to:

WHERE
         tA.indexed_column = 'Y'
AND      tB.join_column = tA.join_column
AND      tB.another_indexed_column = 'X'
...
UNION ALL
...
WHERE
         tA.unindexed_column = 'N'
AND      tB.join_column = tA.join_column
AND      tB.another_indexed_column = 'X'
AND      lnnvl(tA.indexed_column = 'Y')

This expansion would not be legal for Concatenation because (as indicated by the choice of column names) there is no indexed access path for the predicate tA.unindexed_column = ‘N’, but Or-Expansion would allow the transformation. At first sight that might seem like a silly choice – but now that the optimizer has two separate query blocks to examine, one of the options open to it is to produce a plan where the first query block uses an index into tA followed by a nested loop into tB while the second query block uses an index into tB followed by a nested loop into tA.

Generally we would expect the improved performance of the final plan to more than offset the extra time the optimizer spends investigating the extra execution plans but there are patterns where the optimizer tries too hard and things can go badly wrong; here’s an example from a few years ago that modelled a production performance problem:

drop table t1 purge;

CREATE TABLE t1
AS
WITH GENERATOR AS (
        SELECT  --+ materialize
                rownum id 
        FROM   dual 
        CONNECT BY 
                level <= 1e4
)
SELECT
        rownum          id1,
        rownum          id2,
        rownum          id,
        lpad(rownum,10) v1,
        rpad('x',100)   padding
FROM
        generator       v1,
        generator       v2
WHERE
        rownum <= 1e5   -- > comment to avoid 
                        -- wordpress format issue
;

CREATE INDEX t1_i1 ON t1(id1, id2);

SELECT * FROM t1 WHERE
   ( id1 =  1001 AND id2 in (1,2))
OR ( id1 =  1002 AND id2 in (2,3))
...
OR ( id1 =  1249 AND id2 in (249,250))
OR ( id1 =  1250 AND id2 in (250,251))
;

I’ve removed 246 lines from the query but you can probably see the pattern from the remaining 4 predicates. I didn’t write this query by hand, of course, I wrote a query to generate it. If you want to see the entire script (with some variations) you can find it at https://jonathanlewis.wordpress.com/2013/05/13/parse-time/

Running version 19.11 on my sandbox this query took 21.2 seconds to parse (optimize) out of a total run time of 22.1 seconds. The session also demanded 453 MB of PGA (Program Global Area) memory while optimizing the query. The execution plan was a full tablescan – but the optimizer spent most of its time costing Or-Expansion before discarding that option and picking a simple tablescan path.

After a little trial and error I reduced the number of predicates to 206 at which point the plan switched from a full tablescan to using Or-Expansion (the cost was 618 for Or-Expansion and 619 for a hinted full tablescan, at 207 predicates the costs were 221 and 220 respectively). The counter-intuitive side effect of this change was to increase the optimization time to 27 seconds and the memory demand to 575MB. The reason for this was that having discovered that the initial OR-Expansion produced a lower cost than the tablescan the optimizer then evaluated a number of further transformations of the resulting UNION ALL view to see if it could produce an even lower cost.

Apart from the increase in the workload associated with the Or-Expansion another threat comes from an error in the estimate of CPU usage at actual run-time. Here are a few lines from the body of the execution plan with 206 predicates, and a few lines from the Predicate Information for just the last operation in the plan:

The plan includes 206 separate query blocks that do a basic index range scan with table access (I've shown just two of them), and as you work down the list the filter() predicate for each index range scan gets one more set of lnnvl() predicates than the previous one. In fact, by the time the optimizer got to operation 128 – which is from the 63rd query block in the Union All – the number of bytes allowed in v$sql_plan for reporting the filter predicate was too small to hold the entire predicate and, as you can see at the end of the sample above, the filter predicate has stopped at references to the values 61 and 62, nowhere near the values 206 and 207 which would have appeared in the last line of the original SQL statement. There's a lot of lnnvl() checking that has to go on as the number of branches in the expansion increases – and that's going to be burning up CPU at run-time.

To block the transformation and cut down the optimization time, all that was needed was the addition of the hint /*+ no_or_expand(@sel$1) */ to the query (sel$1 because I hadn’t used the qb_name() hint to give the query a programmer-friendly name) and this reduced the parse time from 27 seconds to 0.4 seconds and the memory demand to 7 MB.

Side note: It's worth pointing out that when you run a query, the memory used for optimization is in your PGA, and the memory you're using will reduce the "aggregate PGA auto target", which affects the limit for every other user. On the other hand, if you call "explain plan" to generate an execution plan without executing the query, the memory used comes from the SGA (i.e. the shared pool), which means that when the optimizer goes a little mad with Or-Expansion your session may flush lots of useful information from the library cache, with far greater impact on the entire system. This is just one more reason for not depending on "explain plan" to give you execution plans.

If you're responsible for the performance of a system, there are some clues about the overheads of optimization in extreme cases. One point to check is the active session history (ASH – v$active_session_history or dba_hist_active_sess_history), where you could aggregate SQL executions to check for statements that record more than one sample in a hard parse, e.g.:

break on sql_id skip 1
select 
        sql_id, sql_exec_id, in_hard_parse, count(*) 
from 
        v$active_session_history 
group by 
        sql_id, sql_exec_id, in_hard_parse
having 
        count(*) > 1
order by 
        1,2,3

This returns:

SQL_ID        SQL_EXEC_ID I   COUNT(*)
------------- ----------- - ----------
1v1mm1224up26             Y         18
2kt6gcu4ns3kq             N          2
                          Y         64
7zc0gjc4uamz6             Y         21
d47kdkn618bu4    16777216 Y          2
g1rvj7fumzxda    16777216 N          4

This isn't a perfect search, unfortunately – if a statement is parsing, then it's not yet executing, so it doesn't have an execution id (sql_exec_id). So when you see that sql_id "2kt6gcu4ns3kq" has recorded 64 samples "in hard parse", that might mean there have been lots of short hard parses (though that's a little suspicious anyway) or a few long hard parses. You'd have to drill into sample times to get a better idea of what the larger numbers are telling you (64 consecutive sample times would be a fairly strong indication the statement was a threat).
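A sketch of that drill-down for a single statement might look like this (substitute the sql_id you're investigating):

select  sample_time, sql_exec_id, in_hard_parse
from    v$active_session_history
where   sql_id = '2kt6gcu4ns3kq'
order by
        sample_time;

A long unbroken run of consecutive sample times with in_hard_parse = 'Y' points at one expensive optimization rather than many short ones.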

Another thing you can check retrospectively is the alert log, as this may tell you about the memory aspect of extreme optimization problems. Here’s a short extract from my alert log following one of my nasty parse calls:

This isn’t saying anything about the hundreds of megabytes of PGA memory I’ve been using, it’s saying that someone pushed a “large object” into the SGA – but there’s a chance that an object that large is interesting and you’ll want to find out what it was, and the last few lines in the extract show that this one was an SQL statement that looks familiar. (The 51200K mentioned as the “notification threshold” is set by the hidden parameter _kgl_large_heap_warning_threshold.)

Summary

The Concatenation operation from earlier versions of Oracle has been superseded by the Or-Expansion operation, which is more flexible and can allow further transformations to be applied after the initial transformation in more ways than were available following the Concatenation operator. Or-Expansion is easily visible in execution plans, reporting a VIEW operation with a name like VW_ORE_xxxxxxxx followed by a UNION ALL operation.

On the plus side, some queries will be able to execute much more efficiently because the optimizer can find better plans for the different branches of the expansion; on the minus side, the time spent in optimization can increase significantly, and there will be cases where the cost of optimization far outweighs the benefit of the faster execution. Although the CPU and elapsed time are the obvious costs of the optimization stage of Or-Expansion, it is important to keep an eye on the demands for PGA that can also appear, so queries which have a large number of "OR'ed" predicates should be viewed with caution – and this includes queries with long "IN" lists, though the simplest IN-lists are likely to be transformed into "INLIST ITERATOR" operations rather than UNION ALL operations.

If you find cases where the optimizer persistently tries to use Or-Expansion when it’s a bad idea you can use the hint /*+ no_or_expand(@qbname) */ to block the expansion. The hint /*+ or_expand() */ is also available to force Or-Expansion (but it’s not a hint to use casually). If you aren’t allowed to block Or-Expansion through a hint or SQL patch then you could consider disabling the feature entirely by setting the hidden parameter _no_or_expansion to true at the system or session level.
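For reference, a sketch of blocking the feature at the session level (hidden parameters require the double quotes, and changing them is normally something to clear with Oracle Support first):

alter session set "_no_or_expansion" = true;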


Tuesday, October 25, 2022

Aha Moments at PASS Data Community Summit

Anyone who has attended Summit will recount tales of its wonderful ‘Aha!‘ moments. Perhaps it happens after a presentation, as you’re chatting in the lobby, when in a flash of inspiration an idea of how to tackle a problem suddenly reveals itself to you!

It may feel like a sudden flash, but more likely it’s the culmination of many small pockets of discovery; a day of learning in a pre-con, advice received in the Microsoft ‘tech zone’ of the Exhibition Hall, a casual chat at a Summit lunch, an interesting idea in a technical session. And now, as you stand in the lobby talking animatedly with a group of database people facing similar challenges, each of these disparate strands comes together and the solution is clear.

People tend to get misty-eyed when speaking of Summit, of the sense of community, the professional connections made, and long friendships forged. I’ve experienced this myself. However, if its value to you lies in building technical knowledge, and a strong professional network, the value to your employer depends on those ‘Aha!’ moments. These are the ones that will help you resolve the tough technical or architectural problem with which your team or organization is currently grappling.

The ‘hive mind’ can conjure some excellent ideas. We experience it occasionally on the best forum threads and discussions, but it really gets to work at a world event like PASS Data Community Summit. Here, you can network with DBAs and developers from across the globe, tackling similar problems in very different settings. You’ll hear new perspectives and fresh approaches to common problems. Often it will reveal the path you need to take.

If you've a good 'Aha!' moment from Summit, please share it! You can register for this year's event here, and hopefully it will provide a lot more of them.


Friday, October 21, 2022

The Basics of Updating Data in a SQL Server Table

Entry in Greg Larsen's series on Learning T-SQL. The series so far:

  1. The basic T-SQL Select Statement
  2. Ordering a result set in SQL Server
  3. The ins and outs of joining tables in SQL Server
  4. Summarizing data using GROUP BY and HAVING
  5. The Basics of Inserting Data into a SQL Server Table
  6. The Basics of Updating Data in a SQL Server Table

Once data is inserted into a table, data typically needs to be maintained as time goes on. To make changes to an existing row or a number of rows, in a table, the UPDATE statement is used. This article shows how to use the UPDATE statement to modify data within a SQL Server table.

Syntax of the UPDATE Statement

There are many different options supported by the UPDATE statement. This article will only be showing how to use the basic and most common options of the UPDATE statement. The syntax for that basic UPDATE statement that this article will explore can be found in Figure 1.

UPDATE <object>
    SET <column_name> = {expression}[ ,…n]
    [ <OUTPUT Clause> ]
    [ FROM <table_source> [ ,…n] ]
    [ WHERE  <search_condition> ]

Figure 1: Basic syntax for the UPDATE statement

Where:

  • <object> – is a table or view name that is being updated.
  • <column_name> – is a column that will be updated.
  • {expression} – is a literal string, number, or value that will be used to update the associated <column_name>. This value can be any scalar expression, such as supplied from a variable or even another column in the same or different table.
  • <table_source> – is a table, view or derived table that will be updated.
  • <search_condition> – is a constraint that identifies which rows are the target of the UPDATE statement.

To understand the complete syntax of the UPDATE statement refer to the Microsoft Documentation.

Sample Data

To demonstrate how to update the values in a table, sample data will be needed. The code in Listing 1 will create and populate two different sample tables that will be used in this article.

The code creates two tables: dbo.Room and dbo.PriceChanges. These tables will be used in the examples found in this article. If you want to follow along and run the examples contained in this article, use the code in Listing 1 to create these sample tables in the tempdb database on your instance of SQL Server.

USE tempdb;
GO
CREATE TABLE dbo.Room (
   RoomNum INT,
   Beds INT,
   Sleeps INT,
   StandardRate MONEY,
   PriceChangeDateTime DATETIME2(0),
   RoomDesc VARCHAR(max));
GO
CREATE TABLE dbo.PriceChanges (
   RoomNum VARCHAR(100),
   NewStandardRate MONEY)
GO
-- INSERT Tables with Data
INSERT INTO dbo.Room 
VALUES (1,1,2,125.99,SYSDATETIME(),
        'This room has 1 king size bed, a TV with blue-ray DVD player, and private bathroom with tub'),  
       (2,2,4,175.99, SYSDATETIME(),
        'This room has two double beds, a TV with blue-ray DVD Player, and a private bathroom, with only a shower'),
       (3,2,4,250.99, SYSDATETIME(),
        'This room is a suite.  It has a private bedroom with king size bed, and private bathroom.  In the living space there is a sofa that pulls out to sleep two.  There is also a fireplace and a couple of easy chairs.  There are two TVs with blue-ray players. Has a kitchen with dishes.');
INSERT INTO dbo.PriceChanges    
VALUES (1,150.99),
       (2,198.99),
       (3,299.99);
GO

Listing 1: Sample Tables

Updating a Single Column Value on a Single Row

There are times when data in a single row of a table needs to be changed. It might need to be changed because it was entered incorrectly, or as time goes on the data needs to change to reflect the current situation. When a single column value, in a single row, needs to be changed, the UPDATE statement can be used.

In Listing 2 is an example that will update the value of the StandardRate column to 179.99 for the row where the RoomNum column is equal to 2.

USE tempdb;
GO

UPDATE dbo.Room
   SET StandardRate = 179.99
WHERE RoomNum = 2;
GO

Listing 2: Updating a single column value on a single row

In Listing 2, the UPDATE statement updated the stored value of the StandardRate column in the Room table. The SET clause was used to provide the new rate value of 179.99 for the StandardRate column. To identify that only the row with a RoomNum of 2 was to be updated, the WHERE clause was used.

Care needs to be taken when issuing an UPDATE statement to ensure you don’t update more rows than intended. For instance, if the WHERE clause was not included in the statement, all the rows in the Room table would have had their StandardRate value set to 179.99. More on this later in the “Concerns with using the UPDATE statement” section below.

Updating Multiple Column Values on a Single Row

The prior example issued an UPDATE statement to modify a single column’s value on a single row. The UPDATE statement also allows multiple column values on a single row to be updated at the same time. When multiple columns need to be updated, the additional columns and values can be added to the SET clause, as was done in Listing 3.

USE tempdb;
GO
UPDATE dbo.Room
   SET StandardRate = 299.99,
       PriceChangeDateTime = SYSDATETIME()
WHERE RoomNum = 3;
GO

Listing 3: Updating two column values on a single row

This example updates the StandardRate and PriceChangeDateTime columns for only the row where RoomNum is equal to 3. To provide different values for the two columns, two column name/value pairs were provided on the SET clause, separated by a comma. If more than two columns need to have their values updated, additional name/value pairs can be added, each separated by a comma.
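As noted in the syntax section, the value assigned in a SET clause can be any scalar expression, not just a literal. The following is a small sketch (not one of this article’s numbered listings) that raises a room’s rate by ten percent and supplies the timestamp from a variable:

USE tempdb;
GO
DECLARE @ChangeTime DATETIME2(0) = SYSDATETIME();

UPDATE dbo.Room
   SET StandardRate = StandardRate * 1.10, -- expression using the column's current value
       PriceChangeDateTime = @ChangeTime   -- value supplied from a variable
WHERE RoomNum = 1;
GO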

Updating Multiple Rows

Each example so far has only updated a single row. There are times when an application might need to make the same column value change to more than a single row. For instance, suppose the sample table needed to have the PriceChangeDateTime column value changed to the current date and time on the two rows that were updated in the last two examples. This could be accomplished by running the code in Listing 4.

USE tempdb;
GO
UPDATE dbo.Room
   SET PriceChangeDateTime = SYSDATETIME()
WHERE RoomNum in (2, 3);
GO

Listing 4: Updating multiple rows with a single UPDATE statement

When the code in Listing 4 is executed, the current date and time value will be set for room numbers 2 and 3. If the WHERE clause was excluded, as in Listing 5 below, the PriceChangeDateTime values for all rows in the Room table would have been updated.

USE tempdb;
GO
UPDATE dbo.Room
   SET PriceChangeDateTime = SYSDATETIME();
GO

Listing 5: Updating all rows in Room table

Note: Make sure to review the concerns section below to better understand the pitfalls that can occur when using an UPDATE statement that excludes the WHERE clause.

Update a table with values from another table

In all the examples so far, the SET clause has used a literal value to provide the new value for the columns in the Room table. Another way to provide the values for an UPDATE statement is from another table. When the sample data was generated, a table named dbo.PriceChanges was created. This table contains a price change for each of the different rooms. To use the PriceChanges table to update rows in the Room table, run the code in Listing 6.

USE tempdb;
GO
UPDATE dbo.Room
   SET StandardRate = PriceChanges.NewStandardRate,
       PriceChangeDateTime = SYSDATETIME()
FROM Room JOIN PriceChanges
ON Room.RoomNum = PriceChanges.RoomNum;
GO

Listing 6: Using a table to provide values for UPDATE statement

The FROM clause joins the Room table and the PriceChanges table based on the RoomNum column in both tables. For every row that matches the join criteria, SQL Server will take the NewStandardRate column value from the matching PriceChanges row and update the StandardRate column in the Room table. Additionally, the code in Listing 6 also updated the PriceChangeDateTime column with the current date and time, using the SYSDATETIME() function.

In Listing 6 every row in the Room table got a new rate. That is because there was a matching row in the PriceChanges table for every RoomNum. If only a specific RoomNum was required to get a rate update, then a WHERE constraint could be added to the code, as shown in Listing 7.

USE tempdb;
GO
UPDATE dbo.Room
   SET StandardRate = PriceChanges.NewStandardRate,
       PriceChangeDateTime = SYSDATETIME()
FROM Room JOIN PriceChanges
ON Room.RoomNum = PriceChanges.RoomNum
WHERE Room.RoomNum = 2;
GO

Listing 7: Update only RoomNum 2

In Listing 7 the WHERE constraint identified that only RoomNum 2 should be updated.
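The syntax in Figure 1 also included an OUTPUT clause, which none of the listings use. It can return the before and after values of the changed columns, which is handy for verifying what an update actually did. A minimal sketch (not one of the article’s numbered listings):

USE tempdb;
GO
UPDATE dbo.Room
   SET StandardRate = PriceChanges.NewStandardRate
OUTPUT deleted.RoomNum,
       deleted.StandardRate AS OldRate,  -- value before the update
       inserted.StandardRate AS NewRate  -- value after the update
FROM Room JOIN PriceChanges
ON Room.RoomNum = PriceChanges.RoomNum
WHERE Room.RoomNum = 2;
GO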

Partial Updates of Large Data Type Columns

Certain data types can hold a large amount of data, and sometimes it is better not to retrieve and update the entire value. In SQL Server 2005, Microsoft introduced the WRITE clause to perform partial or full updates of large data type columns (varchar(max), nvarchar(max), and varbinary(max); each of which can hold up to 2GB of data in a single value). Here is the syntax for the WRITE clause:

.WRITE(expression, @Offset, @Length)

The “expression” is the value that will be written to the large data type column, the @Offset identifies the zero-based starting position where the partial update will begin, and the @Length identifies the length of the section of the existing value that will be replaced.

The code in Listing 8 uses the WRITE clause to change the word “to” in the RoomDesc column to the value “which will” on only the row where the RoomNum is equal to 3.

USE tempdb;
GO
SELECT RoomDesc
FROM   dbo.Room
WHERE  RoomNum = 3;

UPDATE dbo.Room
   SET RoomDesc.WRITE --charindex finds first instance of a value
                  ('which will',charindex('to',RoomDesc)-1,2)
WHERE RoomNum = 3;

SELECT RoomDesc
FROM   dbo.Room
WHERE  RoomNum = 3;
GO

Listing 8: Using the WRITE clause to perform a partial update

Compare the output and you will see the difference (this is just a portion of the RoomDesc column value before and after the update):

RoomDesc
----------------------------------------------
s out to sleep two. There is also a fireplace

RoomDesc
----------------------------------------------
s out which will sleep two. There is also a f

The WRITE clause cannot be used to update a large data type column to NULL value. This can be demonstrated by running the code in Listing 9.

UPDATE dbo.Room
   SET RoomDesc.WRITE (NULL,0, LEN(RoomDesc))
WHERE RoomNum = 3;
GO
SELECT * FROM dbo.Room WHERE RoomNum = 3;

Listing 9: Trying to set RoomDesc to null

When the code in Listing 9 is executed, it runs without error. But reviewing the output of the SELECT statement, which can be found in Report 1, shows that the RoomDesc column was set to an empty string, not to a NULL value.


Report 1: Output of SELECT statement in Listing 9

If a large data type column needs its value set to NULL, then a standard SET clause can be used, as in Listing 10.

UPDATE Room
   SET RoomDesc = NULL
WHERE RoomNum = 3;
GO
SELECT * FROM dbo.Room WHERE RoomNum = 3;

Listing 10: Setting a large data type column to null.

The output shown in Report 2 was created when the code in Listing 10 was executed.

Report 2: Output from Listing 10.

In Report 2 the column value for the RoomDesc column is now set to NULL.

Using the TOP Clause

The TOP clause can be used to limit the number of rows being updated. This can be demonstrated by using the code in Listing 11.

USE tempdb;
GO
UPDATE TOP (2) dbo.Room
   SET PriceChangeDateTime = SYSDATETIME();
SELECT * FROM Room;
GO

Listing 11: Using the TOP clause to limit the number of rows being updated

The Room table only has 3 rows of data. When the code in Listing 11 is run, only the first two rows will be updated. This can be seen by reviewing the PriceChangeDateTime column values on the three different rows in the output in Report 3.

Report 3: Output when Listing 11 was executed

In Report 3 only the first two rows have updated PriceChangeDateTime column values. The “TOP (2)” clause in Listing 11 kept the third row from getting a new datetime stamp value. Note that TOP in an UPDATE statement does not support ORDER BY, so which rows are treated as the “first two” is not guaranteed; SQL Server simply stops after updating two rows.
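If the rows must be chosen in a specific order, a common workaround (a sketch that goes beyond Listing 11, since UPDATE TOP does not accept ORDER BY) is to place the TOP and ORDER BY in a common table expression and update through it:

USE tempdb;
GO
WITH RoomsToChange AS
(
    SELECT TOP (2) PriceChangeDateTime
    FROM dbo.Room
    ORDER BY RoomNum -- defines which two rows count as "first"
)
UPDATE RoomsToChange
   SET PriceChangeDateTime = SYSDATETIME();
GO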

Concerns with using the UPDATE statement

Here is a list of a few concerns that might arise when issuing an UPDATE command:

  • Forgetting to add a WHERE constraint to an UPDATE statement when only a subset of rows needs to be updated. Leaving off the WHERE clause causes all rows in the target table to be updated. (An all too common database programmer error is to think an entire statement is highlighted in the query editor, but miss the WHERE clause.)
  • Having an incorrect WHERE clause. When a WHERE clause is not correct, either no rows will be updated, or the wrong rows will be updated.

There are a couple of different techniques that can be used to avoid these two issues.

The first one is to take a database/transaction log backup prior to running an untested UPDATE statement. By having a backup, you have a recovery point should an UPDATE statement modify rows that should not have been updated. Even though this method technically works, it might not be the most elegant solution to resolving a poorly coded UPDATE statement. It also is not ideal in a very active table where other writes are happening concurrently.

Another method is to first write a SELECT statement using the newly coded WHERE clause. By using a SELECT statement you can verify the correct rows are returned. If the wrong rows are displayed, then no problem. Just modify the SELECT statement until the correct WHERE clause is coded, and the correct set of rows is displayed. Once the correct WHERE clause has been generated, change the SELECT statement to an UPDATE statement.
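As a minimal sketch of this technique against the sample table, verify the row set first, then reuse the same WHERE clause in the UPDATE:

USE tempdb;
GO
-- Step 1: confirm the WHERE clause returns only the intended rows
SELECT RoomNum, StandardRate
FROM dbo.Room
WHERE RoomNum IN (2, 3);

-- Step 2: once the row set looks right, convert the SELECT to an UPDATE
UPDATE dbo.Room
   SET StandardRate = StandardRate + 5.00
WHERE RoomNum IN (2, 3);
GO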

The last method (and one of the best when writing scripts to modify a production system) is to wrap the UPDATE statement inside a transaction. First issue a BEGIN TRANSACTION, then make your changes. By doing this, the UPDATE statement can be rolled back if it updated the rows or values incorrectly. Just be aware that uncommitted changes can block other users, depending on how your server and databases are configured.
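Here is a sketch of that pattern against the sample table. The @@ROWCOUNT check is one possible guard (my addition, not from the original text) for deciding whether to commit:

USE tempdb;
GO
BEGIN TRANSACTION;

UPDATE dbo.Room
   SET StandardRate = 189.99
WHERE RoomNum = 2;

-- Exactly one row should change; roll back if anything else happened
IF @@ROWCOUNT = 1
    COMMIT TRANSACTION;
ELSE
    ROLLBACK TRANSACTION;
GO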

Updating data using a view

Updating data through a view can be very useful. Using a view, you are able to enforce criteria on the user by embedding the criteria in the view. Using the WITH CHECK OPTION, you can ensure that users can only modify (or insert) data that they can see based on any filtering of rows in the view. You can read more about view objects in the Microsoft documentation.
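As a quick sketch of WITH CHECK OPTION (the view name here is made up for illustration), this view exposes only rooms that sleep at least four people, and the option stops an update from moving a row out of that set:

CREATE VIEW dbo.v_LargeRoom
AS
SELECT RoomNum, Beds, Sleeps, StandardRate
FROM   dbo.Room
WHERE  Sleeps >= 4
WITH CHECK OPTION;
GO
-- Fails: setting Sleeps to 2 would make the row invisible to the view
UPDATE dbo.v_LargeRoom
   SET Sleeps = 2
WHERE RoomNum = 2;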

The important thing to understand for this article is that you can only modify one base table’s data at a time through a view. For example, in Listing 12 I will create a view that references dbo.Room and dbo.PriceChanges.

CREATE VIEW dbo.v_room
AS
SELECT Room.RoomNum,
       Room.Beds,
       Room.Sleeps,
       Room.StandardRate,
       Room.PriceChangeDateTime,
       Room.RoomDesc,
       PriceChanges.NewStandardRate
FROM   dbo.Room
                JOIN dbo.PriceChanges
                        ON Room.RoomNum = PriceChanges.RoomNum;

Listing 12: View that references multiple tables

One of the nice things about view objects is that they hide the implementation details from the user. However, if a user writes a query like the one shown in Listing 13, they will get a confusing error message:

UPDATE dbo.v_room
   SET Beds = 100,
       NewStandardRate = 1000
WHERE  RoomNum = 1;

Listing 13: Update that references multiple tables in the same query

The error output is:

View or function 'dbo.v_room' is not updatable because the modification affects multiple base tables.

To use the view in this particular update scenario, you need to split the update into two statements, one per base table, as shown in Listing 14:

UPDATE dbo.v_room
  SET Beds = 100
WHERE  RoomNum = 1;

UPDATE dbo.v_room
   SET NewStandardRate = 1000
WHERE  RoomNum = 1;

Listing 14: Updating one base table at a time through the view

This works fine.

Note: While it is beyond the scope of this article, you can make any view editable using an INSTEAD OF trigger object. For more details, go to the documentation for CREATE TRIGGER.

Changing Data with the Basic UPDATE statement

The primary method to maintain existing data in a table as it changes over time is to use the UPDATE statement. An UPDATE statement can update a single column on a single row, or multiple columns on one or more rows. The values used to update a column value can be provided as a scalar expression or can come from column values in another table.

Care needs to be taken when writing UPDATE statements. A badly formed UPDATE statement might update all rows in a table, or an incorrect set of rows. Therefore, make sure all UPDATE statements are fully tested prior to running them against your production database. Understanding the basic UPDATE statement is critical to writing accurate UPDATE statements that maintain database records as their column values change over time.

The post The Basics of Updating Data in a SQL Server Table appeared first on Simple Talk.




Saturday, October 15, 2022

Going to the PASS Data Community Summit this year?

We are just one month away from the PASS Data Community Summit. One month. If you haven’t yet decided to go… let me try to sway you at least one more time to be there in mid-November.

If you work with any of the products that comprise the Microsoft Data Platform, you have hopefully heard of PASS, the PASS Summit, or the current PASS Data Community Summit. If you haven’t, follow this link and check it out. Suffice it to say that it has been one of the biggest conferences for the Microsoft Data Platform over the past 20 years, and I have gone to almost every one of them.

While the main focus of the summit is learning about SQL Server, Power BI, and the array of data platform tools, over time it has taken on a much more personal feel. It has grown to be almost like a family homecoming event where you see family you know as well as family you’ve never met before.

The past two years (for reasons you already know too well) have been virtual only. In the years before, it was on site, mostly in Seattle, WA, though it has visited other cities as well. This year is a first… the conference is going to be hybrid, allowing you to attend in Seattle, Washington, US, or anywhere on the planet with Internet connectivity.

In Person

Not much needs to be said about going to the in-person event. The conference will be three main days of breakout sessions, and two days of full-day sessions. The entire time you are there will be spent learning, eating, socializing, and maybe doing a little sightseeing around Seattle. Pike Place Market is just a few blocks away, as are some amazing shops and museums. (You can sleep on the trip home.)

With a month to go, if you’ve been sitting on the fence about whether to travel to Seattle, it is time to choose a side of that fence. I can offer you one more incentive to fall towards attending: $200 off the price. Just use the code SIMPLETALKVIP when registering, and before you know it, you will be amongst thousands of friends, or soon-to-be friends, and family.

Virtual

I obviously cannot, with a clear conscience, say that attending virtually will be as good as being there in person. It is not. Even the experience of walking down Pike Street on a chilly fall morning in Seattle is something that you cannot replicate virtually.


What I will say is that it is infinitely better than missing out completely, and attending virtually does have certain benefits over being there in person. So, if flying, travel costs, family needs, visas, mobility issues, etc. are a barrier to making it to the Washington State Convention Center, attending virtually gets you just as much technical education (and still as much socializing as you want) without even leaving the city you live in.

This year will be my third PASS Summit in a row missed in person, this time due to health issues (recovering from a broken foot), and it is immensely comforting that I will at least get to learn new topics and interact with people, even if it isn’t face to face. I will be working as a session producer during all the breakout sessions, but will spend as much other time as I can hanging out in the virtual hangout areas. I will help host a few meetups, one for Simple Talk (for readers and creators alike) and another for the Friends of Redgate.

Beyond the obvious travel benefit (or rather, the lack of travel), there is one major benefit to attending virtually: seeing and hearing. After a day of sessions (and a previous night of socializing), the screen starts looking like this.

[Photo: a group of people sitting in a room, looking at a distant screen]

Sometimes because I am tired, sometimes because the room is large and some of the print is small, and sometimes because I accidentally brought the wrong glasses. Sitting here at my desk attending the session, it is clear as day even without my glasses. If I want a little of the feel of attending in person, I can cast sessions to my 65-inch TV and invite a friend or family member to occasionally climb past me during the best part of the session and take pictures of the screen with the flash on! (Luckily, I can go back and watch the part I missed later with the recordings!)

Just like I had a code for the in-person event, I have, arguably, a better code for online. Use the code LDONLINE210 for 50% off. (That’s right, not $50, 50%. A great deal!)

So go get registered

If you are able, click here on the registration link and register, attend, and watch as many of the presentations as you can during and after the summit. It will be well worth it!

See you online!

The post Going to the PASS Data Community Summit this year? appeared first on Simple Talk.




Friday, October 14, 2022

Backing up MySQL Part 1: mysqldump

mysqldump is one of the most popular database backup tools in the MySQL world.

The tool is prevalent partly because it is basic yet quite powerful – mysqldump is command line-based, simple, and very straightforward to use. As far as MySQL and its flavors (MariaDB and Percona Server) are concerned, this command line-based tool is one of the top choices for junior and senior database engineers across the world alike.

What is mysqldump and How Does It Work?

On a high level, mysqldump is a CLI-based database backup utility that is widely used to take logical backups. A logical backup creates a “logical copy” of the database by creating SQL statements that can be used to recreate the database structure, indexes, partitions, and data inside of tables within the database.
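To make “logical copy” concrete, the body of a dump file is just replayable SQL, roughly along these lines (an illustrative sketch, not output from a real server):

-- Statements typically found inside a mysqldump output file (illustrative)
DROP TABLE IF EXISTS `rooms`;
CREATE TABLE `rooms` (
  `room_num` int NOT NULL,
  `standard_rate` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`room_num`)
);
INSERT INTO `rooms` VALUES (1,125.99),(2,175.99);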

The utility is invoked from the operating system’s command line (not from within the mysql client), and in most use cases the usage of the tool looks something like this (all as one command on one line, separated here for formatting purposes):

mysqldump -u[username] -p[password] 
    [one_or_more_of_the_available_options] 
    [database_name] [table_name] > output.sql

In this code block:

  • username – is the user that has the access needed to take the backup.
  • password – is the password of that user.
  • one_or_more_of_the_available_options – lets us define the options needed for a specific dump of the data (we will get into that a little later.)
  • database_name – is the database from which the data will be dumped.
  • table_name – optionally restricts the dump to a single table inside that database.

The mysqldump command is also usually finished off by specifying a file to write the output to – the right redirection operator (“>”) followed by a file name tells the shell where to write the output of the command (the file can have any legal filename.) Without it, mysqldump will print all results to the screen. You can also use the left redirection operator (“<”) to import a data dump into MySQL with the mysql client, like so:

mysql -uroot -p db_name < backup.sql

Bear in mind that, as with most command line tools, it is not advisable to run mysql or mysqldump by supplying a password in clear text. There are many reasons for this, but the main one is that it might be possible for another person to log in and see the last issued commands, which would include the provided username and password. It is better to provide the username and password in the my.cnf file and avoid providing the password on the command line, by doing something like this (also, choose a stronger password than the one provided here):

[mysqldump]
user=secret
password=useastrongpasswordhere

Then, our command would look the same, just without the -p[password] option, because the password is already provided in my.cnf. Provide the username in the same fashion and you can get rid of the -u[username] option as well.

The output of mysqldump is usually a file with a .sql extension. If an output file is specified, the shell creates that file; if not, the dump is simply written to the screen (standard output). The following command would back up all tables inside of a demo_database into a file called demodb_backup.sql:

mysqldump -uroot -p demo_database > demodb_backup.sql

Options Provided by mysqldump

Now that we have covered the basics of how mysqldump works, let’s look at its options. Some of the commonly used ones are listed below (all options should be provided in the [one_or_more_of_the_available_options] position indicated in the previous section, and options can be combined with one another):

  • --all-databases – takes a backup of all databases in a database instance.
  • --add-drop-[table|database] – adds a DROP [TABLE|DATABASE] statement before creating tables or databases. Useful if we’re migrating updated data back to a server where the old pieces of the data reside.
  • --fields-terminated-by – sets the character placed between column values (data will look like “column1|column2” if the “|” terminator is used.) It applies to --tab dumps and is frequently used together with LOAD DATA INFILE and SELECT … INTO OUTFILE statements.
  • --force – the database dump continues even if errors are encountered. Useful in a demo environment. Think of this setting as a sibling of the --ignore option used when loading a dump back into the database – it essentially ignores all of the non-critical errors encountered by MySQL (such as duplicate value errors), letting the backup process continue until it’s finished no matter what.
  • --no-data – tells mysqldump to skip the data inside tables (i.e. it will dump only the database and table structure, so dumps will be considerably faster if we have a lot of data.)
  • --where – only dumps data matching a specified WHERE clause.
  • --lock-tables – locks all of the tables before taking a copy of them so that no operations can interfere with the backup – data will not be inserted, updated, deleted, or otherwise modified during the dump. The locks last until the backup operation is finalized.

The list above should give you an idea of how powerful mysqldump can be if used properly. Keep in mind that it is not a complete list of the available options (you can find all of them in the MySQL documentation), but it should be a decent starting point for understanding what mysqldump can do.
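For example, here is a hedged one-liner (the database, table, column, and file names are all made up for illustration) that uses --where to dump only matching rows from a single table:

mysqldump -uroot -p --where="standard_rate > 200" demo_database rooms > expensive_rooms.sql

Loading the resulting file back in replays only the rows that matched the condition.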

mysqldump Use Cases

Beyond simple backups, mysqldump can also assist with “corner-case” scenarios. For example, the tool can produce tab-delimited output, which can be exceptionally useful if we want to work with data in “big data” text viewers such as EmEditor, or in Microsoft Excel: we simply specify the --tab option and mysqldump will do all of the work for us. Should we want to test whether database object definitions load into a database properly, all we have to do is dump our data using a combination of four options provided one after another (a combined command appears after the list below):

  • The --all-databases option would dump all databases.
  • The --no-data option would refrain from copying data.
  • The --routines option would dump procedures and functions inside of our database.
  • The --events option would dump the existing scheduled events.

When mysqldump is used in such a fashion, a dump (backup) file will be created, but it won’t have any data. Loading the file back into a database server, though, will allow us to test for any weird, odd-looking warnings, or to spot other noteworthy errors.
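Put together, a minimal sketch of that schema-only dump (the output file name is my own choice) might look like this:

mysqldump -uroot -p --all-databases --no-data --routines --events > schema_only_backup.sql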

mysqldump can also be used together with other command-line programs such as mysql and mysqlimport, or in a manual fashion using LOAD DATA INFILE: all we have to do is have the file produced by mysqldump at hand, then issue a statement like so (the following statement would import a file named demo.txt into a database named database_name):

mysqlimport --fields-terminated-by=: --lines-terminated-by=\n 
        database_name demo.txt

We can also use LOAD DATA INFILE in the same fashion: if our data is delimited by, say, the “,” sign, we could use a query like this to load the data in demo.txt into a table called demo_table:

LOAD DATA INFILE 'demo.txt' INTO TABLE demo_table FIELDS  
        TERMINATED BY ',';

Combining Options with mysqldump

Now let’s rewind a little. Throughout this article, we’ve kept mentioning that other options can be used together with mysqldump to do complex things with the tool. This is where the depths of mysqldump become a little complex, but bear with us and we will explain everything you need to know.

First off, bear in mind that all components of MySQL (including, but not limited to, mysqldump) usually have a couple of additional options that can be specified, and those options vary according to the component we work with. Combining options is simple:

  1. Choose the component you need to work with (in our case, we’re looking at mysqldump.)
  2. Visit the documentation and familiarize yourself with the available options regarding that component (the complete list of options can be found in the documentation of the component you’ve chosen – the options for mysqldump can be found here.)
  3. Evaluate the issue that you’re having. How does the option you’re looking at help solve it? Will the results be satisfactory?
  4. Choose one or multiple options from those available.

Once you have chosen an option, remember that the output of any issued command is directly dependent on the options provided. In other words, any option that is specified modifies the functionality of the command you’ve chosen. All options are specified by writing their name as part of a command after “--”, like so:

mysqldump --no-data

The above statement would make mysqldump run with the option named “no-data” (as explained above, it wouldn’t dump any data; it would only dump the structure of the database.) Combine it with --add-drop-table to make MySQL ensure a table of the same name doesn’t exist before creating each table. Some people may also want to specify an absolute path for the output file in order to ensure the file is stored in a directory of their choice (doing so may be a necessity if the --secure-file-priv option is enabled: that option limits the ability of users to export or import data to or from the database – in that case, data can only be exported into the directory specified in the variable – learn more here.)

Coming back to mysqldump, though, one of the most frequent usage scenarios consists of dumping all databases and including a DROP TABLE statement before each table is created – doing so will ensure that even if the backup is loaded into an instance that already has tables with the exact same names, they will be dropped and created anew:

mysqldump -uroot --all-databases --add-drop-table > backup.sql

As we have covered, mysqldump will dump the data inside of MySQL, but the specifics of what it does are directly dependent on the options you specify when executing it. All options have a specific use, and once we choose which options to combine the command with, we need to keep in mind that they affect the output we receive.

The complete list of options exclusive to mysqldump can be found in the MySQL documentation. When you’re done dumping, test the backup as well: load it into your MySQL database instance through phpMyAdmin, or simply specify the name of the file through the CLI like so (note that the database must be created before importing the data into it):

mysql demo < backup.sql

The above command will import the backup into your database called “demo.” Do note this command assumes that the mysql client can read your username and password from my.cnf – put them under a [client] group (rather than the [mysqldump] group shown earlier) so that both mysql and mysqldump pick them up. If they are not specified there, you will still have to provide a username and a password as in the earlier examples.
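A minimal sketch of such a my.cnf fragment (the same placeholder credentials as earlier):

[client]
user=secret
password=useastrongpasswordhere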

Then, inspect your database – does the data in it represent the data you expect to be in the instance?

With that being said, we will now leave the gates to the depths of mysqldump open for you to explore. The advice contained in this blog post should be a decent starting point for those who are interested in how this CLI-based database dumping tool works and why it works the way it does, and the MySQL documentation will act as a guide for those interested in the tool on a deeper level. Nothing beats your own experience, though – do experiment with mysqldump on local servers, learn from the mistakes other developers have made in the past, and until next time!


The post Backing up MySQL Part 1: mysqldump appeared first on Simple Talk.


