Monday, June 12, 2023

Manipulating Data In PostgreSQL: Learning PostgreSQL with Grant

So far in the series I’ve shown how to create databases, tables, constraints, indexes and schema. Now, it’s time to put some of that information to work and begin the process of manipulating data within the database. After all, a database is only useful if there’s information stored within. PostgreSQL makes use of standard SQL for operations like INSERT, UPDATE and DELETE. However, as with so much of what I’ve learned in PostgreSQL, there are quite a few interesting wrinkles that are different to my “SQL Server” eyeballs.

In the sample database I’ve created as a part of this ongoing series, I created a couple of schemas and organized my tables within them. If you wish to execute the code or look at the data structures, the code is in my ScaryDBA/LearningPostgreSQL repository here. The objects and database you will need can be created/reset using the CreateDatabase.sql script, then adding sample data using the SampleData.sql script. The rest of the code from this article is in the  09_DataManipulation folder.

INSERT

We may as well start by adding data to a table. The core behavior of INSERT is very much as you would expect coming from a SQL Server background:

INSERT
        INTO
        radio.radios
(radio_name,
        manufacturer_id,
        connectortype_id,
        digitalmode_id)
VALUES
('GD-88',
7,
3,
2);

The basic behavior is pretty straightforward. You define the statement: INSERT INTO (Note, INTO is not optional in PostgreSQL.). Then you tell it which table you’re addressing, including the schema: radio.radios. Yeah, like SQL Server you can leave the schema off and the PostgreSQL engine will figure things out for you. Don’t do that. It’s a good practice, from the get-go, to define your tables including their schema. You avoid issues down the line by developing that habit early.

Then you list the columns. Like how SQL Server code (and most RDBMS code) works, you can skip columns that have a default (or allow NULL values). In this case, there is a radio_id column that’s a sequence, so I don’t have to supply it in the column list. Then, you define the VALUES as shown. The VALUES clause is where you specify the values that will be set for the column when the row is inserted.

If you wanted to add multiple rows at once, it’s very similar syntax to what you’d see in SQL Server.

INSERT
INTO
radio.radios
(radio_name,
manufacturer_id,
connectortype_id,
digitalmode_id)
VALUES
('FT300DR',
1,
4,
1),

--added a second row
('IC-V86',
2,
2,
NULL);

I simply added a comma delimited set of parentheses to the VALUES. You can also see how I dealt with a NULL value by simply using the defined key word.

As with SQL Server, you can also do an INSERT statement with the source of rows based on a SELECT:

INSERT
INTO
radio.radiobands 
(radio_id,
band_id)
SELECT
9,
band_id
FROM
radio.radiobands
WHERE
radio_id = 2;

In this case, the radio I added earlier has the same two bands as another radio, so I can use the values from that known radio to add to the radio.radiobands table.

Finally, you can use the default value that is defined for a column (or for a column that accepts NULL), by using the key word DEFAULT:

INSERT
INTO
radio.connectortypes (connectorytype_id,
connectortype_name)
VALUES
(DEFAULT,
'F-Type Male');

In this case, since connectortype_id is managed by the identity property, we automatically get a value, so using DEFAULT let’s us still list the column, but we don’t have to supply a value (and in this case, we couldn’t anyway).

If there are default values for all the columns, you could write the INSERT like this:

INSERT
INTO radio.connectorytypes
DEFAULT VALUES;

Although, on this database, that will cause an error since there is no default value defined for the connectorytype_name.

Overall, this behavior is exactly what I would have expected. In fact, it’s largely the same as what I’m used to in SQL Server. There are some differences though.

One small difference is that the INTO keyword, has to be a part of the syntax. Whereas, you can cheat and leave that off in SQL Server. Personally, I tend to use it because that’s how I learned to use T-SQL and I think the INTO adds clarity. PostgreSQL enforces its use because that is part of the ANSI standard, which PostgreSQL follows more closely than most other database systems.

Another useful feature that is not a part of SQL Server is the ability to use OVERRIDING to add your own data rather than let the system generate it for you.

For example, if wanted to specify an identity value for a column, rather than let the identity mechanism generate it for me, I could do this:

INSERT INTO
radio.bands
(band_id,
band_name,
frequency_start_khz,
frequency_end_khz,
country_id)

OVERRIDING SYSTEM VALUE

VALUES (10,
'6 Meters',
50000,
54000,
1);

The OVERRIDING SYSTEM VALUE clause lets me add my own value to the band_id column, bypassing the identity property for that column. That can’t be done within the INSERT statement in T-SQL but instead requires changing a setting prior to running the INSERT. This is clearly a lot easier.

Finally, there’s one really neat trick that you can’t do in SQL Server, ON CONFLICT:

INSERT
INTO
radio.antenna 
(antenna_name,
manufacturer_id,
connectortype_id)
VALUES ('rubber duck',
2,
2)

ON
CONFLICT (antenna_name,
manufacturer_id)
DO
UPDATE
SET
connectortype_id = excluded.connectortype_id;

Basically, this code will add the defined antenna, or it will update an existing antenna, but only for the row where we violated the unique index that exists on the table on the two columns, antenna_name and manufacturer_id. In short, it’s a way to do a MERGE statement without doing a MERGE statement. I’m not sure if, like MERGE in SQL Server, there are performance implications. You can also use the clause DO NOTHING to prevent a response to the conflict.

This is most of the syntax for the INSERT statement, but not everything. If you want to see more, you can see there are a few additional features here in the PostgreSQL INSERT statement documentation.

UPDATE

When it’s time to change data in the database, you’re going to use the UPDATE statement, as you’d expect:

UPDATE
radio.bands
SET
band_name = '70 CM'
WHERE
band_id = 2;

The statement is UPDATE. You then supply the table, and yes, just as you can only add data to a single table at a time, you can only modify data to a single table at a time. The SET command then lets you pick and choose which columns you’re going to modify. Finally, the WHERE clause is used to filter the data to only modify the row or rows, you’re interested in.

Leaving off the WHERE clause will modify all data in the table.

You can also use the FROM clause to modify data between tables something like this:

UPDATE
radio.bands
SET
band_name = r.radio_name
FROM
radio.radios AS r
WHERE
band_id = r.radio_id;

One additional note on UPDATE. Since it’s possible to use table inheritance within PostgreSQL, an additional clause can be added to UPDATE statements to ensure that only the table specified has data modified within it. (A discussion about table inheritance is beyond this article, but the PostgreSQL documentation has a straightforward explanation here):

UPDATE ONLY
radio.bands
SET
band_name = r.radio_name
FROM
radio.radios AS r
WHERE
band_id = r.radio_id;

Except for the addition of the ONLY clause, this UPDATE is the same as the one above. However, now, if radio.bands was inherited from another table, this statement ensures that only the specific table specified is affected.

DELETE

With the DELETE command, we finally have a bit more deviation from the standards, although, the standards are there as well. A simple statement like this will remove all data in a table:

DELETE
FROM
radio.bands;

Just as with SQL Server, you can also use the TRUNCATE command to remove data from a table and it is faster. TRUNCATE does have limitations (permission and will not work with FOREIGN KEY constraints), for example. In fact, if you attempt to execute this DELETE statement you will get an error because it causes a foreign key constraint error.

It does have benefits for larger updates such as removing all row versions without the VACUUM process needing to execute (For more details on the VACCUM process, check out Henrietta Dombrovskaya’s post here).

If you want to get specific, you take advantage of the WHERE clause:

DELETE
FROM
radio.antenna
WHERE
antenna_id = 42;

That’s about what I’d expect. One point, just as you had to keep the INTO clause for an INSERT, you must use the FROM keyword in DELETE. And if you want to reference another table, you don’t simply start writing JOIN statements as I would in T-SQL. Instead, you must insert the USING clause:

DELETE
FROM
radio.antennabands AS ab
USING radio.bands AS b
WHERE
ab.band_id = b.band_id
AND b.band_name = '6 Meters';

So here I used an alias on both the antennabands table and the bands table. Then I used the WHERE clause to define the join criteria between the tables, and the filtering criteria for the band_name.

In addition to these behaviors, you also have the ONLY clause to deal with inheritance.

The RETURNING Clause

One piece of behavior that’s common across all the standard data manipulation commands is the RETURNING clause. Basically, this returns as a result set, the data that was created or modified. So, in the example of an INSERT, you wouldn’t see anything other than what you supplied, except where there are defaults such as a sequence number on an identity column. Then, you can get the value, or values for multi-row inserts, that were generated.

For an UPDATE statement, the RETURNING clause will return the new values for the row, especially useful if you’ve done calculations on a column or columns to see the values that resulted.

When you run a DELETE statement, the RETURNING clause will show you the values for the row or rows that were removed from the table.

The RETURNING clause can be put to really interesting types of use, especially since you can use common table expressions (CTEs) with all these data manipulation queries. That makes it possible to do something like this:

WITH addant AS
(
/* create a new antenna row */
INSERT
INTO
radio.antenna (antenna_name,
manufacturer_id,
connectortype_id)
VALUES('Rubber duck',
1,
2)
RETURNING antenna_id)
/*
take the new antenna_id and create
a new antennabands row 
*/
INSERT
INTO
radio.antennabands 
(antenna_id,
band_id)
SELECT
aa.antenna_id,
1
FROM
addant AS aa;

The WITH clause defines a rowset, addant, which is just the RETURNING value of the antenna_id generated from the INSERT statement that defines the CTE. I can then use that value to add another row to another table, all as part of a single statement. This opens up a lot of possibilities of stacking statement that can run as one single statement.

MERGE

As in SQL Server, the MERGE command gives you the ability to combine INSERT, UPDATE and DELETE operations in various combinations into a single statement. When using MERGE, you are going to be evaluating conditions to determine behaviors. This means you’ll always have a target table where the actions are going to occur. In addition, you need to have an evaluation data source. This can be a table, or a set of tables defined in a sub-select, whatever you need to evaluate the necessary actions.

From there, you can define WHEN MATCHED for conditions that require a matched value, or WHEN NOT MATCHED for those other conditions. The order you define them in, is the order in which they’ll be evaluated.

Worth noting, the MERGE command is new to PostgreSQL 15.

Here’s an example. I’ll build out some data in a temporary table, and then use the logic to add the data, update it, or delete it, based on information being passed:

CREATE TEMPORARY TABLE radioupdates (radio_name varchar(100),
manufacturer_id int,
connectortype_id int,
delete_flag int);
INSERT
INTO
radioupdates (radio_name,
manufacturer_id,
connectortype_id,
delete_flag)
VALUES
('UV5R',
2,
3,
0),
('UV5R',
3,
3,
0),
('UV5R',
1,
3,
1);

Then I will use the following MERGE statement to merge the values in the temp table into the radios table

MERGE
INTO
radio.radios AS r
USING radioupdates AS ru
ON
ru.radio_name = r.radio_name
AND ru.manufacturer_id = r.manufacturer_id
WHEN NOT MATCHED THEN
INSERT
VALUES(DEFAULT,
ru.radio_name,
ru.manufacturer_id,
NULL,
ru.connectortype_id,
NULL)
WHEN MATCHED
AND ru.delete_flag = 0 THEN
UPDATE
SET
connectortype_id = ru.connectortype_id
WHEN MATCHED
AND ru.delete_flag = 1 THEN
DELETE;

The trick is just to get the logic right. In my MERGE statement, the logic is: if there are no matches based on the ON criteria, WHEN NOT MATCHED, I’ll INSERT the row. Then, if it matches, but it’s not flagged to be a DELETE, it does an UPDATE operation on the values from the source. Otherwise, if it matches and it’s flagged for deleting, it gets deleted.

You can also specify DO NOTHING for the outcome of evaluations. The ONLY key word can be used to deal with inheritance as mentioned in the UPDATE section. You can even use OVERRIDING in the INSERT clause. In short, most of the behaviors we’ve gone over through this article are available in MERGE.

Just remember, the old approach was to just use the ON CONFLICT clause to achieve an UPSERT (UPDATE or INSERT) command. That method was not as powerful as MERGE when you need the extra complexity, for example, removing data from the target.

For complete details about the MERGE statement, here is the link to the PostgreSQL documentation.

Conclusion

Manipulating data within PostgreSQL is one of the easiest things I’ve learned so far. Mostly, broad strokes, it’s the same as with SQL Server. While there are a few details different in some areas, mostly, this is the kind of behavior I expected. I really like how you can use common table expressions with these commands. I also like how RETURNING works to allow you to do some really customized behaviors. Overall, there’s quite a bit of useful functionality within PostgreSQL when it comes to directly manipulating data.

 

The post Manipulating Data In PostgreSQL: Learning PostgreSQL with Grant appeared first on Simple Talk.



from Simple Talk https://ift.tt/P0R3xry
via

No comments:

Post a Comment