So far in the series I’ve shown how to create databases, tables, constraints, indexes and schemas. Now, it’s time to put some of that information to work and begin the process of manipulating data within the database. After all, a database is only useful if there’s information stored within. PostgreSQL makes use of standard SQL for operations like INSERT, UPDATE and DELETE. However, as with so much of what I’ve learned in PostgreSQL, there are quite a few interesting wrinkles that look different to my “SQL Server” eyeballs.
In the sample database I’ve created as a part of this ongoing series, I created a couple of schemas and organized my tables within them. If you wish to execute the code or look at the data structures, the code is in my ScaryDBA/LearningPostgreSQL repository here. The objects and database you will need can be created/reset using the CreateDatabase.sql script, then adding sample data using the SampleData.sql script. The rest of the code from this article is in the 09_DataManipulation folder.
INSERT
We may as well start by adding data to a table. The core behavior of INSERT is very much as you would expect coming from a SQL Server background:
INSERT INTO radio.radios (radio_name, manufacturer_id, connectortype_id, digitalmode_id) VALUES ('GD-88', 7, 3, 2);
The basic behavior is pretty straightforward. You start the statement with INSERT INTO (note that INTO is not optional in PostgreSQL). Then you tell it which table you’re addressing, including the schema: radio.radios. Yeah, like SQL Server, you can leave the schema off and the PostgreSQL engine will figure things out for you. Don’t do that. It’s a good practice, from the get-go, to define your tables including their schema. You avoid issues down the line by developing that habit early.
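To make the schema advice a little more concrete, here is a minimal sketch of how an unqualified table name gets resolved. It assumes the sample database from this series is loaded. The search_path value set here is only an illustration and is not something the sample scripts configure.

```sql
-- Show the schemas PostgreSQL searches for unqualified names
SHOW search_path;

BEGIN;
-- Assumption for illustration: put the radio schema first, for this transaction only
SET LOCAL search_path TO radio, public;

-- Unqualified: only works because radio is on the search_path
SELECT count(*) FROM radios;

-- Schema-qualified: works regardless of the search_path setting
SELECT count(*) FROM radio.radios;
ROLLBACK;
```

Because SET LOCAL only lasts until the end of the transaction, the session goes back to its previous search_path after the ROLLBACK. The qualified form is the one that keeps working no matter how the session is configured.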
Then you list the columns. As in SQL Server (and most RDBMS code), you can skip columns that have a default (or allow NULL values). In this case, there is a radio_id column that’s populated from a sequence, so I don’t have to supply it in the column list. Then, you define the VALUES as shown. The VALUES clause is where you specify the values that will be set for each column when the row is inserted.
If you wanted to add multiple rows at once, it’s very similar syntax to what you’d see in SQL Server.
INSERT INTO radio.radios
    (radio_name, manufacturer_id, connectortype_id, digitalmode_id)
VALUES
    ('FT300DR', 1, 4, 1),
    --added a second row
    ('IC-V86', 2, 2, NULL);
I simply added a comma-delimited set of parentheses to the VALUES clause. You can also see how I dealt with a NULL value by simply using the defined keyword.
As with SQL Server, you can also do an INSERT statement with the source of rows based on a SELECT:
INSERT INTO radio.radiobands (radio_id, band_id) SELECT 9, band_id FROM radio.radiobands WHERE radio_id = 2;
In this case, the radio I added earlier has the same two bands as another radio, so I can use the values from that known radio to add to the radio.radiobands table.
Finally, you can use the default value that is defined for a column (or for a column that accepts NULL), by using the keyword DEFAULT:
INSERT INTO radio.connectortypes (connectortype_id, connectortype_name) VALUES (DEFAULT, 'F-Type Male');
In this case, since connectortype_id is managed by the identity property, we automatically get a value, so using DEFAULT lets us still list the column, but we don’t have to supply a value (and in this case, we couldn’t anyway).
If there are default values for all the columns, you could write the INSERT like this:
INSERT INTO radio.connectortypes DEFAULT VALUES;
Although, on this database, that will cause an error since there is no default value defined for the connectortype_name column.
Overall, this behavior is exactly what I would have expected. In fact, it’s largely the same as what I’m used to in SQL Server. There are some differences though.
One small difference is that the INTO keyword has to be a part of the syntax, whereas you can cheat and leave it off in SQL Server. Personally, I tend to use it because that’s how I learned to use T-SQL and I think the INTO adds clarity. PostgreSQL enforces its use because that is part of the ANSI standard, which PostgreSQL follows more closely than most other database systems.
Another useful feature that is not a part of SQL Server is the ability to use OVERRIDING to add your own data rather than let the system generate it for you.
For example, if I wanted to specify an identity value for a column, rather than let the identity mechanism generate it for me, I could do this:
INSERT INTO radio.bands (band_id, band_name, frequency_start_khz, frequency_end_khz, country_id) OVERRIDING SYSTEM VALUE VALUES (10, '6 Meters', 50000, 54000, 1);
The OVERRIDING SYSTEM VALUE clause lets me add my own value to the band_id column, bypassing the identity property for that column. That can’t be done within the INSERT statement in T-SQL, which instead requires changing a setting (SET IDENTITY_INSERT) prior to running the INSERT. This is clearly a lot easier.
Finally, there’s one really neat trick that you can’t do in SQL Server, ON CONFLICT:
INSERT INTO radio.antenna (antenna_name, manufacturer_id, connectortype_id) VALUES ('rubber duck', 2, 2) ON CONFLICT (antenna_name, manufacturer_id) DO UPDATE SET connectortype_id = excluded.connectortype_id;
Basically, this code will add the defined antenna, or, if a row with the same antenna_name and manufacturer_id already exists and the insert would violate the unique index on those two columns, it will update that existing row instead. In short, it’s a way to do a MERGE statement without doing a MERGE statement. I’m not sure if, like MERGE in SQL Server, there are performance implications. You can also use the DO NOTHING clause to skip the conflicting row without raising an error.
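As a rough sketch of the DO NOTHING variant, the statement below reuses the same antenna values as the example above. It assumes the unique index on antenna_name and manufacturer_id described in the text, and it is wrapped in a transaction that rolls back so the sample data is left alone.

```sql
BEGIN;

-- If ('rubber duck', 2) already exists, the row is silently skipped
-- instead of raising a unique violation error
INSERT INTO radio.antenna (antenna_name, manufacturer_id, connectortype_id)
VALUES ('rubber duck', 2, 2)
ON CONFLICT (antenna_name, manufacturer_id) DO NOTHING;

ROLLBACK;
```

When the row is skipped, the command completes successfully and simply reports that zero rows were inserted. With DO NOTHING, the conflict target in parentheses is optional. Leaving it off means a conflict on any unique constraint or index is ignored.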
This is most of the syntax for the INSERT statement, but not everything. If you want to see more, there are a few additional features described in the PostgreSQL INSERT statement documentation.
UPDATE
When it’s time to change data in the database, you’re going to use the UPDATE statement, as you’d expect:
UPDATE radio.bands SET band_name = '70 CM' WHERE band_id = 2;
The statement is UPDATE. You then supply the table, and yes, just as you can only add data to a single table at a time, you can only modify data in a single table at a time. The SET clause then lets you pick and choose which columns you’re going to modify. Finally, the WHERE clause is used to filter the data to only modify the row or rows you’re interested in.
Leaving off the WHERE clause will modify every row in the table.
You can also use the FROM clause to modify data in one table based on values from another, something like this:
UPDATE radio.bands SET band_name = r.radio_name FROM radio.radios AS r WHERE band_id = r.radio_id;
One additional note on UPDATE. Since it’s possible to use table inheritance within PostgreSQL, an additional clause can be added to UPDATE statements to ensure that only the table specified has data modified within it. (A discussion about table inheritance is beyond this article, but the PostgreSQL documentation has a straightforward explanation here):
UPDATE ONLY radio.bands SET band_name = r.radio_name FROM radio.radios AS r WHERE band_id = r.radio_id;
Except for the addition of the ONLY clause, this UPDATE is the same as the one above. However, now, if other tables inherited from radio.bands, this statement ensures that only the specified table is affected and not its child tables.
DELETE
With the DELETE command, we finally have a bit more deviation from the standards, although the standards are there as well. A simple statement like this will remove all data in a table:
DELETE FROM radio.bands;
Just as with SQL Server, you can also use the TRUNCATE command to remove all data from a table, and it is faster. TRUNCATE does have limitations: it requires its own permission, and it will not work on a table that is referenced by FOREIGN KEY constraints from other tables unless those tables are truncated in the same command. In fact, if you attempt to execute the DELETE statement above against the sample database, you will get an error because it violates a foreign key constraint.
TRUNCATE does have benefits for large deletions, such as removing all row versions and reclaiming the space immediately without the VACUUM process needing to execute. (For more details on the VACUUM process, check out Henrietta Dombrovskaya’s post here.)
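Here is a minimal sketch of what TRUNCATE looks like against the sample database, wrapped in a transaction that is rolled back so nothing is actually lost. It assumes, as described above, that other tables reference radio.bands through foreign keys. The CASCADE option is not something the article's scripts use and is shown only for illustration.

```sql
BEGIN;

-- A plain TRUNCATE TABLE radio.bands; would fail here, because other
-- tables have foreign keys that reference radio.bands

-- CASCADE also empties every table that references radio.bands,
-- so use it with great care
TRUNCATE TABLE radio.bands CASCADE;

-- Inside this transaction the table is now empty
SELECT count(*) FROM radio.bands;

-- TRUNCATE is transactional in PostgreSQL, so this restores all the rows
ROLLBACK;
```

The ROLLBACK at the end is what makes this safe to experiment with. If you commit instead, the data in radio.bands and in every referencing table is gone.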
If you want to get specific, you take advantage of the WHERE clause:
DELETE FROM radio.antenna WHERE antenna_id = 42;
That’s about what I’d expect. One point: just as you had to keep the INTO keyword for an INSERT, you must use the FROM keyword in DELETE. And if you want to reference another table, you don’t simply start writing JOIN clauses as I would in T-SQL. Instead, you must add the USING clause:
DELETE FROM radio.antennabands AS ab USING radio.bands AS b WHERE ab.band_id = b.band_id AND b.band_name = '6 Meters';
So here I used an alias on both the antennabands table and the bands table. Then I used the WHERE clause to define the join criteria between the tables, and the filtering criteria for the band_name.
In addition to these behaviors, you also have the ONLY clause to deal with inheritance.
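As a small sketch of ONLY in a DELETE, the statement below removes a single row from radio.bands and leaves any inheriting tables untouched. The band_id value of 10 is the one inserted with OVERRIDING SYSTEM VALUE earlier. It assumes no rows in other tables still reference that band, otherwise the foreign key constraints will block the delete.

```sql
BEGIN;

-- ONLY restricts the delete to radio.bands itself, skipping any child tables
DELETE FROM ONLY radio.bands
WHERE band_id = 10;

ROLLBACK;
```

The sample database doesn't use inheritance, so here ONLY behaves the same as a plain DELETE. It only matters once child tables exist.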
The RETURNING Clause
One piece of behavior that’s common across all the standard data manipulation commands is the RETURNING clause. Basically, this returns the data that was created or modified as a result set. So, in the example of an INSERT, you wouldn’t see anything other than what you supplied, except where there are defaults such as a sequence number on an identity column. Then, you can get the value, or values for multi-row inserts, that were generated.
For an UPDATE statement, the RETURNING clause will return the new values for the row, which is especially useful for seeing the results when you’ve done calculations on a column or columns.
When you run a DELETE statement, the RETURNING clause will show you the values for the row or rows that were removed from the table.
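To make those three cases concrete, here is a minimal sketch of RETURNING on each statement. It uses the tables and columns from the earlier examples. The specific values (the 'ID-52A' radio name and the ID numbers) are assumptions chosen for illustration, and everything runs inside a transaction that is rolled back.

```sql
BEGIN;

-- INSERT: get back the generated radio_id along with the supplied name
INSERT INTO radio.radios
    (radio_name, manufacturer_id, connectortype_id, digitalmode_id)
VALUES
    ('ID-52A', 2, 2, 1)
RETURNING radio_id, radio_name;

-- UPDATE: see the new value that resulted from the expression
UPDATE radio.bands
SET band_name = upper(band_name)
WHERE band_id = 2
RETURNING band_id, band_name;

-- DELETE: see every column of the rows that were removed
DELETE FROM radio.antenna
WHERE antenna_id = 42
RETURNING *;

ROLLBACK;
```

If no row matches the WHERE clause, the UPDATE and DELETE simply return an empty result set rather than an error.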
The RETURNING clause can be put to really interesting types of use, especially since you can use common table expressions (CTEs) with all these data manipulation queries. That makes it possible to do something like this:
WITH addant AS ( /* create a new antenna row */ INSERT INTO radio.antenna (antenna_name, manufacturer_id, connectortype_id) VALUES('Rubber duck', 1, 2) RETURNING antenna_id) /* take the new antenna_id and create a new antennabands row */ INSERT INTO radio.antennabands (antenna_id, band_id) SELECT aa.antenna_id, 1 FROM addant AS aa;
The WITH clause defines a rowset, addant, which is just the RETURNING value of the antenna_id generated from the INSERT statement that defines the CTE. I can then use that value to add another row to another table, all as part of a single statement. This opens up a lot of possibilities for stacking statements that can run as one single statement.
MERGE
As in SQL Server, the MERGE command gives you the ability to combine INSERT, UPDATE and DELETE operations in various combinations into a single statement. When using MERGE, you are going to be evaluating conditions to determine behaviors. This means you’ll always have a target table where the actions are going to occur. In addition, you need to have an evaluation data source. This can be a table, or a set of tables defined in a sub-select, whatever you need to evaluate the necessary actions.
From there, you can define WHEN MATCHED for conditions that require a matched value, or WHEN NOT MATCHED for those other conditions. The order you define them in is the order in which they’ll be evaluated.
Worth noting, the MERGE command was introduced in PostgreSQL 15.
Here’s an example. I’ll build out some data in a temporary table, and then use the logic to add the data, update it, or delete it, based on information being passed:
CREATE TEMPORARY TABLE radioupdates (radio_name varchar(100), manufacturer_id int, connectortype_id int, delete_flag int); INSERT INTO radioupdates (radio_name, manufacturer_id, connectortype_id, delete_flag) VALUES ('UV5R', 2, 3, 0), ('UV5R', 3, 3, 0), ('UV5R', 1, 3, 1);
Then I will use the following MERGE statement to merge the values in the temp table into the radios table:
MERGE INTO radio.radios AS r USING radioupdates AS ru ON ru.radio_name = r.radio_name AND ru.manufacturer_id = r.manufacturer_id WHEN NOT MATCHED THEN INSERT VALUES(DEFAULT, ru.radio_name, ru.manufacturer_id, NULL, ru.connectortype_id, NULL) WHEN MATCHED AND ru.delete_flag = 0 THEN UPDATE SET connectortype_id = ru.connectortype_id WHEN MATCHED AND ru.delete_flag = 1 THEN DELETE;
The trick is just to get the logic right. In my MERGE statement, the logic is: if there are no matches based on the ON criteria, WHEN NOT MATCHED, I’ll INSERT the row. Then, if it matches, but it’s not flagged to be a DELETE, it does an UPDATE operation on the values from the source. Otherwise, if it matches and it’s flagged for deleting, it gets deleted.
You can also specify DO NOTHING for the outcome of evaluations. The ONLY keyword can be used to deal with inheritance as mentioned in the UPDATE section. You can even use OVERRIDING in the INSERT clause. In short, most of the behaviors we’ve gone over through this article are available in MERGE.
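As a sketch of DO NOTHING inside a MERGE, the variation below deletes flagged rows, leaves every other matched row alone, and inserts anything new. It assumes the radioupdates temporary table from the example above still exists in the session. It also assumes that the columns left out of the INSERT column list accept NULL or have defaults.

```sql
BEGIN;

MERGE INTO radio.radios AS r
USING radioupdates AS ru
    ON ru.radio_name = r.radio_name
   AND ru.manufacturer_id = r.manufacturer_id
-- Flagged rows are removed from the target
WHEN MATCHED AND ru.delete_flag = 1 THEN
    DELETE
-- Any other matched row is deliberately left untouched
WHEN MATCHED THEN
    DO NOTHING
-- Rows not found in the target are added, naming the columns explicitly
WHEN NOT MATCHED THEN
    INSERT (radio_name, manufacturer_id, connectortype_id)
    VALUES (ru.radio_name, ru.manufacturer_id, ru.connectortype_id);

ROLLBACK;
```

Since the WHEN clauses are evaluated in order, the more specific delete_flag condition has to come before the catch-all WHEN MATCHED, otherwise it would never be reached.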
Just remember, the old approach was to just use the ON CONFLICT clause to achieve an UPSERT (UPDATE or INSERT) command. That method is not as powerful as MERGE when you need the extra complexity, for example, removing data from the target.
For complete details about the MERGE statement, here is the link to the PostgreSQL documentation.
Conclusion
Manipulating data within PostgreSQL is one of the easiest things I’ve learned so far. In broad strokes, it’s mostly the same as with SQL Server. While a few details differ in some areas, this is mostly the kind of behavior I expected. I really like how you can use common table expressions with these commands. I also like how RETURNING works to allow you to do some really customized behaviors. Overall, there’s quite a bit of useful functionality within PostgreSQL when it comes to directly manipulating data.
The post Manipulating Data In PostgreSQL: Learning PostgreSQL with Grant appeared first on Simple Talk.
from Simple Talk https://ift.tt/P0R3xry