Writing queries to retrieve the data from a database is probably the single most common task when it comes to working with data. Working with data in PostgreSQL is no exception. Further, PostgreSQL has an incredibly rich, wide, and varied set of mechanisms for retrieving data. From standard SELECT… FROM… WHERE
to windowing functions and recursive queries, PostgreSQL has it all. I honestly can’t do it justice in a single article. Further, since so much of this functionality is effectively identical to where I’m more comfortable, SQL Server, I’m not turning this into a PostgreSQL 101 on the SELECT
statement.
Instead, for this series, I’m just going to assume I may have more than one article on querying PostgreSQL. For this entry in the series, I’m going to focus on the core behaviors of SELECT
, FROM
and WHERE
with an emphasis on what’s different from SQL Server. This won’t be a fundamental how-to on querying PostgreSQL, but instead an exploration of the gotchas you’re likely to experience coming in with existing knowledge of how you think these things should work. And hoo boy, there’s some fun stuff in there. Let’s get stuck in.
In the sample database I’ve created as a part of this ongoing series, I created a couple of schemas and organized my tables within them. If you wish to execute the code or look at the data structures, the code is in my ScaryDBA/LearningPostgreSQL
repository here. The objects and database you will need can be created/reset using the CreateDatabase.sql
script, then adding sample data using the SampleData.sql
script. After executing that script, execute the Sample The rest of the code in this article is in the 10_Select folder.
FROM
I actually love how the PostgreSQL document defines what you’re doing in the FROM
clause:
Trivial table expressions simply refer to a table on disk, a so-called base table, but more complex expressions can be used to modify or combine base tables in various ways.
While I wouldn’t myself define it this way, I find it to be interestingly more accurate than simply saying “table.” However, since you can make a query on a table expression, it is more than simply saying “tables.” Not all the definitions are useful though. Take this further explanation of what defines a table expression:
The result of the FROM list is an intermediate virtual table that can then be subject to transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table expression.
It sounds like the FROM
clause is a temporary table or something. Yet, I know (reading ahead in the docs) that execution plans in PostgreSQL are similar to SQL Server and this description leads us down a path: virtual table, that’s not accurate, depending on the whole query, structures involved and statistics. While it can help to visualize this way, it certainly is not implemented this way.
However, the rest is what I expect. List table names, and/or, define a table through a sub-SELECT
. Aliasing, everything, pretty much the way it works in T-SQL because that’s the way it works in the ANSI SQL Standard, to which PostgreSQL complies very closely.
JOIN
So much of the FROM
clause in PGSQL is the same as T-SQL. The first big difference is in the use of JOIN
operations. The standard join operators are the same: INNER
, LEFT
/RIGHT
OUTER
, CROSS
, FULL OUTER
. These all perform the same logical functions. The fun stuff is in the internal syntax. Such as the USING
clause:
SELECT r.radio_name, m.manufacturer_name FROM radio.radios r JOIN radio.manufacturers m USING (manufacturer_id);
The first time I did this, and it worked, it honestly felt like black magic. Of course, for true black magic, we’d have to use the NATURAL
clause:
SELECT r.radio_name, m.manufacturer_name FROM radio.radios r NATURAL JOIN radio.manufacturers m;
Basically, PostgreSQL figures out, based on naming and data type, which are the common columns between two tables, so you don’t have to define the JOIN
criteria at all. Also note that I left off the syntactic sugar of INNER
. PostgreSQL figured out what I wanted, just like SQL Server does. I like this standard which also allows me to do this:
SELECT a.antenna_name, ab.band_id FROM radio.antenna a NATURAL LEFT JOIN radio.antennabands ab;
Also worth noting, the AS
word is optional. Personally, I prefer it, but I’m currently using DBeaver which supplies an alias (similar to SQL Prompt) but leaves off the AS
key word.
Natural joins are a nice feature but beware that if your naming standard allows you to have the same name in multiple tables (for example a Name
or Description
column), you may not get the results you expect. Also, worth noting is if the tables do not contain a column with the same name, you will get the same results as if you use a CROSS JOIN
. Hence this is something I would rarely use in production code but will be very nice doing ad hoc querying.
LATERAL
In addition to subqueries, there are also table valued functions. There are some differences in how they work, but the devil there is in the details, not the larger behaviors. When using subqueries and functions, you can get the T-SQL equivalent of a CROSS APPLY
by using the PostgreSQL version, LATERAL
.
SELECT b.band_name, rl.radio_name FROM radio.bands b, LATERAL ( SELECT * FROM radio.radios r JOIN radio.radiobands rb ON r.radio_id = rb.radio_id WHERE b.frequency_start_khz < 146) rl;
Just like CROSS APPLY
in T-SQL, you basically get a functional loop. By that I mean that for each row in the radio.bands
table, you’ll get an execution of the LATERAL
query. Any columns referenced from the outer tables (in this case radio.bands
), will be available as parameters to the query.
The example above could be rewritten just using a traditional join to get the same results. However, when it comes time to start to use functions with parameters to return data, LATERAL
becomes very useful. You can also use LEFT JOIN LATERAL
to get the equivalent of a LEFT JOIN
in the execution.
WHERE
Apart from unique PostgreSQL functions within the WHERE
clause, it’s basically the same as T-SQL. No real weird surprises in functionality. For example, you can use the WHERE
clause to define join criteria instead of using ON
. However, just like in T-SQL, if you do this with an OUTER JOIN
, you’re effectively turning it into an INNER JOIN
(yes, I know that’s not the complete story, but it is, basically, what happens).
Mind you, the sheer number of value expressions as well as functions and operators is daunting. I’m not going to attempt to explain even a partial listing of what’s possible. Suffice to say, you can do a lot of things in the WHERE
clause in PostgreSQL that you simply can’t in SQL Server. For example, in T-SQL, you have the LIKE
operator to search a string for similar values. In PostgreSQL you also get SIMILAR TO
<regular expression> and POSIX
<regular expression>.
SELECT
I chose the order, FROM
, WHERE
, SELECT
, because in PostgreSQL, as in SQL Server, this is the actual order of operations. In a simple query, you simply define where the data is originating: FROM
. Then you apply filters to that data: WHERE
. Finally, you decide what is being returned, SELECT
.
By the nature of how a query works, all the examples I’ve listed so far show the basics of the SELECT
operation. It’s very much the same as in T-SQL. I can specify columns from the defined tables and use the alias of those tables to make the code clearer (and shorter). In fact, most behaviors I’m used to, I can see: DISTINCT
, WITH
, UNION
, ORDER BY
and more.
However, there are some really interesting behaviors. For example, if you leave off a column alias like this:
SELECT rb.radio_id + rb.band_id FROM radio.radiobands rb;
You get a name automatically, “?column?
“:
There are also some really interesting functions and clauses that change the way queries behave. For example, LIMIT
, works very similarly to TOP
:
SELECT r.radio_name FROM radio.radios r LIMIT 3;
Since there is no TOP
operator, this is what you would use. However, there’s a wrinkle. You can actually add OFFSET
to this and then it will return 3 rows, starting at the row you specify:
SELECT r.radio_name FROM radio.radios r LIMIT 3 OFFSET 2;
Just like TOP
, if you don’t specify the order, you can get inconsistent results.
Another one is FETCH
. It works a lot like LIMIT (
SQL Server has OFFSET (part of the ORDER BY
clause and FETCH clauses, but the syntax is quite different):
SELECT r.radio_name FROM radio.radios r ORDER BY r.radio_name DESC FETCH NEXT 3 ROWS ONLY;
I can use the OFFSET
in the same way. However, I get some additional behavior too. Instead of ONLY
, I can tell it WITH TIES
. In that case, if more than one value meets the top 3 criteria, we’ll get any ties, meaning, possibly, more than 3 rows, depending on how many rows match.
There are several additional differences and some other unique behaviors, but that’s most of the big stuff. I could spend a lot of time talking about the differences in window functions and locking, but there’s enough there to make independent articles just on those topics, so I’ll skip them here.
Conclusion
Mostly, when querying PostgreSQL, I just start typing and most of the time it works. There really are more similarities than differences. However, the differences are worth noting, especially when it comes to something like TOP
vs. LIMIT/FETCH
. Overall, these are small things, but they will act as a gotcha when you’re writing your code.
The post Querying PostgreSQL: Learning PostgreSQL with Grant appeared first on Simple Talk.
from Simple Talk https://ift.tt/T56x9W7
via
No comments:
Post a Comment