The VALUES
clause is probably one of the most misused features in SQL. If you look at SQL forums online, you’ll see people use it as the second clause in an insertion statement, but they only use it to construct a single row at a time, thus:
BEGIN INSERT INTO Zodiac (astro_sign, astro_start_date, astro_end_date) VALUES ('Aries', '2022-03-21', '2022-04-19'); INSERT INTO Zodiac (astro_sign, astro_start_date, astro_end_date) VALUES ('Taurus', '2022-04-20', '2022-05-20'); … INSERT INTO Zodiac (astro_sign, astro_start_date, astro_end_date) VALUES ('Pisces', '2023-02-19', '2023-03-20'); END;
Each insertion statement ends with a semi-colon, so they will be executed separately and in the order presented. An optimizer doesn’t dare combine them because there might be a forward reference to previous insertions.
I think people write this kind of code because that this is how you would read punch cards. Each card goes into a card reader, gets buffered, and written in the order presented to the magnetic tape or disk file. Welcome to 1960! Stop mimicking old programming languages like FORTRAN or BASIC that had WRITE statements and put one record at a time into a file. Start thinking of working with entire sets.
Using VALUES to insert rows
The VALUES
clause is more appropriately called a table constructor. Each row constructor within the table is a comma-separated list enclosed in parentheses. Officially, there is an optional keyword ROW
. They can be placed at the start of each list. Nobody does this, and it is a bit redundant, but was required in MySQL.
One of the worst ways of constructing a table is to use the CREATE
or DECLARE
construct to build a temporary table, load it with insertion statements, and finally insert the table into the desired destination. This leads to multiple statements with no way to really optimize the insertion and shows that you are really not thinking in sets yet.
The entire zodiac can be inserted with a single statement like this:
INSERT INTO Zodiac (astro_sign, astro_start_date, astro_end_date) VALUES ( ('Aries', '2022-03-21', '2022-04-19'), ('Taurus', '2022-04-20', '2022-05-20'), ('Gemini', '2022-05-21', '2022-06-21'), ('Cancer', 2022-06- 22', '2022-07-22'), ('Leo', '2022-07-23', '2022-08-22'), ('Virgo', '2022-08-23', '2022-09-22'), ('Libra', '2022-09-23', '2022-10-23'), ('Scorpius', '2022-10-24', '2022-11-21'), ('Sagittarius', '2022-11-22', '2022-12-21'), ('Capricorn', '2022-12-22', '2023-01-19'), ('Aquarius', '2023-01-20', '2023-02-18'), ('Pisces', '2023-02-19', '2023-03-20') );
Given a whole set of rows, the optimizer can deal with a single atomic statement. Not only does it save execution time as compared to the row-at-a-time model of insertion, but it presents the optimizer with an opportunity to improve things. The insertion statement can rearrange the list of new rows and pick an optimal ordering. It also means that if one of my rows had an error in it, I wouldn’t have to back out all of the other rows. If I wanted a proper ACID transaction model, I would’ve had to back out each individual insert up until I came to the insertion that gave me the error.
Here’s the basic syntax. Please note that besides including an expression of the proper data type, you can use the keywords DEFAULT
or NULL
in a row constructor. Obviously, those values must make sense in relation to the declaration of the table into which you are inserting.
VALUES (<row value expression list>) [ , ...n ] <row value expression list> ::= {<row value expression> } [ , ...n ] <row value expression> ::= { DEFAULT | NULL | <expression> }
Please remember that an expression is not always a simple constant. In fact, it’s very handy to use the CAST (<exp> AS <data type>)
function as a way to assure that a column in the constructed virtual table has a known data type:
VALUES (CAST(‘foobar’, AS NVARCHAR(10), CAST(42 AS INTEGER), CAST(3.14159 AS REAL))
The AS
keyword can also be used, to give the constructed tables each a name. Here is skeleton:
SELECT X.a, X.b, X.c, Y.a, Y.b, Y.c FROM (VALUES (1,2,3) , (4,5,6) ) AS X(a, b, c) , (VALUES (1,2,3), (4,5,6) ) AS Y(a,b,c) WHERE X.a =Y.a AND X.b = Y.b AND X.c <= 0.0;
MERGE with VALUES Clause
The MERGE
statement was added to Standard SQL several years ago. It was based on a proposal by ANSI representatives from Oracle and IBM, but forms of it had already existed in other products, though under a different name. The most common one was UPSERT
from Postgres. Let’s jump right into it.
One table expression is the target, the table you are trying to modify. The other table expression is the source, the table that provides the modifications. Presumably, you want the target to persist, but you don’t need the source to persist after the updates and insertions are done.
The MERGE
clause defines the target; the USING
clause defines the source. And the ON
clause matches the two tables. The WHEN [NOT] MATCHED ...THEN
clauses determine the action to be taken
MERGE INTO Sales.Sales_Reasons AS Target USING (VALUES ('Recommendation', 'Other'), ('Review', 'Marketing'), ('Internet', 'Promotion')) AS Source (new_sales_name, new_reason_type) ON Target.sales_name = Source.new_sales_name WHEN MATCHED THEN UPDATE SET reason_type = Source.new_reason_type WHEN NOT MATCHED THEN INSERT (sales_name, reason_type) VALUES (new_sales_name, new_reason_type);
It’s easier to think of the MERGE
statement as a program, written as a single statement Instead of having IF–THEN–ELSE
logic or CASE
expressions in multiple statements.
Obviously, updating makes sense only when there is a match, and inserting makes sense only when there is not a match. The standards allow for either of these clauses to include an optional … AND <search condition>
, so you can add quite a bit of logic to this one statement. Technically, the WHEN clauses list can finish with the ELSE
IGNORE
; it acts as a placeholder just as the ELSE
clause did in the CASE
expression. Microsoft has more extensions to the syntax, and there have been some performance issues. If you are using it in SQL Server, I strongly suggest checking both the syntax and current performance in whichever version of SQL Server you’re running.
Conclusion
The VALUES
clause is ANSI standard and implemented by many relational database vendors. The VALUES
clause can save typing, and it’s also the rare case when easier can mean better performance because the rows will be treated as a set.
The post The VALUES clause or building tables out of nothing appeared first on Simple Talk.
from Simple Talk https://ift.tt/kec5hyv
via
No comments:
Post a Comment