Thursday, July 15, 2021

Typical uses of Oracle sequences

The series so far:

  1. Oracle sequences: The basics
  2. Oracle sequences: RAC
  3. Typical uses of Oracle sequences

So far, in this mini-series on sequences, I have described the basic mechanics of sequences in single-instance and multi-instance (RAC) systems. I’ve explained the significance of setting the cache size in both cases and the order option for RAC. I’ve demonstrated how using sequences as synthetic/surrogate keys can lead to contention on the “right-hand”/“high-value” block of supporting indexes unless you take defensive actions. I’ve also shown how one of the typical strategies has been embedded in recent versions of Oracle in the scale and extend options. In this article, I’ll cover the most typical uses of Oracle sequences and highlight a couple of details of their behaviour that can cause some confusion.

Basic options

There are essentially only two things you can do with a sequence – you can ask for the “next value” by referencing {sequence}.nextval or ask for the “current value” by referencing {sequence}.currval. By making a statement that simple, I’ve already introduced a trap. The nextval for a sequence is a system-level request, the currval for a sequence is a session-level request. While nextval is a request for Oracle to generate the next available value according to the sequence definition, currval is a request to repeat the most recent value that the current session got on its most recent call to nextval. So, for example, this is what happens if you start a new SQL*Plus session and immediately call for {sequence}.currval:

SQL> select s1.currval from dual;
select s1.currval from dual
       *
ERROR at line 1:
ORA-08002: sequence S1.CURRVAL is not yet defined in this session

The session hasn’t yet called (and saved locally) s1.nextval, which is why it says that s1.currval is not yet defined. (Personally, I think the error message could have been more explicit – e.g. commenting that there has been no previous call to s1.nextval in the session.) Reconnect, do a couple of calls for s1.nextval, then a couple for s1.currval, and this is what happens:

SQL> select s1.nextval from dual;
   NEXTVAL
----------
         3
1 row selected.
SQL> select s1.nextval from dual;
   NEXTVAL
----------
         4
1 row selected.
SQL> select s1.currval from dual;
   CURRVAL
----------
         4
1 row selected.
SQL> select s1.currval from dual;
   CURRVAL
----------
         4
1 row selected.

I’ve used a select from dual to get a single sequence value, but you can do multi-row selects to get a stream of sequence values, and you can also use simple assignments to get a value, e.g.:

SQL> declare
  2          n1 number(6,0);
  3  begin
  4          n1 := s1.nextval;
  5          dbms_output.put_line(n1);
  6  end;
  7  /
5
PL/SQL procedure successfully completed.

If you enable tracing on this anonymous pl/sql block, though, you will find that behind the scenes Oracle will have executed a simple select s1.nextval from dual to acquire the value.

The basic rule of sequences is that you write code to generate a “rowsource”, and the sequence generator is called once per row; this produces results that aren’t always quite what you anticipate. Consider the position where I start a new session and continue using sequence s1 just after the point where I ended in the previous example, and execute the following three statements:

SQL> select s1.currval, s1.nextval from dual;
SQL> select s1.nextval, s1.currval from dual;
SQL> select s1.nextval, s1.nextval, s1.nextval from dual;

Looking at the first statement, which (apparently) calls currval before it calls nextval, you might expect it to fail with the ORA-08002 error, but it doesn’t.

Looking at the second statement, you might expect the call to currval to report the previous value acquired by this session while nextval reports the next value from the global cache, but it doesn’t.

Looking at the last statement, you might expect to see three consecutive sequence values, but you won’t.

The results of the three statements are as follows:

SQL> select s1.currval, s1.nextval from dual;
   CURRVAL    NEXTVAL
---------- ----------
         6          6
SQL> select s1.nextval, s1.currval from dual;
   NEXTVAL    CURRVAL
---------- ----------
         7          7
SQL> select s1.nextval, s1.nextval, s1.nextval from dual;
   NEXTVAL    NEXTVAL    NEXTVAL
---------- ---------- ----------
         8          8          8

There is one call to the sequence generator for each row, which is why 8 appears three times. If you include both currval and nextval in the select list, then nextval is called, and the return value is used for both the nextval and the currval.

Some of the questions people ask about the “unexpected” behaviour of sequences can be answered very easily once the connection between “rows selected” and “calls to nextval” is clear. However, this doesn’t mean that “if you can write a select clause you can select a sequence”; there are cases where your code will run into the exception: ORA-02287: sequence number not allowed here, for example:

SQL> create or replace view v1 
  2  as
  3  select object_name, owner, s1.nextval
  4  from t1
  5  /
select object_name, owner, s1.nextval
                              *
ERROR at line 3:
ORA-02287: sequence number not allowed here

There are also cases where adding a sequence to the select list has a much greater effect than just being “another column.” Distributed queries, for example, may change their execution plan quite dramatically because the addition of a sequence has changed a query from “remote only” to genuinely distributed.

Sequences and DML

The most common use of a sequence is probably as the source of a surrogate (or synthetic) key, so you may see code like the following in an ETL or other large batch process:

insert into production_table 
select  s1.nextval, …
from    staging_table
where      ...

An important point to be aware of when you do this type of thing is that sequences and parallelism don’t co-operate very well. It would be very easy to decide that you needed to increase the speed at which you were generating data, that simply using parallel execution would help, and to write something like:

insert  /*+ parallel(6) */ into production_table 
select  s1.nextval, …
from    staging_table
where      ...

A common mistake that people make at this point is to forget that parallel DML is not enabled by default, so if you want the insert to operate with DOP (degree of parallelism) 6, then you also need to execute

alter session enable parallel dml;

(and don’t forget to disable it afterwards).

However, having flagged up this reminder on parallel DML, I’m now going to point out that if your code uses sequences, you might be in exactly the position where you want the select to run parallel, but the insert to run serially. Remember that sequence.nextval has to update a specific row in the seq$ table from time to time, and (more importantly) every call to nextval has to access the dictionary cache entry for that particular sequence. That means competition for dictionary cache mutexes (or latches for older versions of Oracle). You may find that the nominal benefit of parallel execution disappears in hugely increased CPU consumption due to contention.
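As a sketch of the “parallel select, serial insert” idea (using the production_table/staging_table names from the skeleton above, with an assumed column list), you can leave parallel DML disabled and hint only the query side. Whether this actually reduces the dictionary cache contention depends on where the optimizer places the sequence evaluation in the plan, so check the execution plan before relying on it:

```sql
-- Parallel DML deliberately left disabled: the insert runs serially,
-- but the scan/join of staging_table can still run parallel.
alter session disable parallel dml;

insert into production_table
select  /*+ parallel(t1 6) */
        s1.nextval, t1.col1, t1.col2      -- col1/col2 are placeholder columns
from    staging_table t1
where   t1.col1 is not null
;
```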

End-user DML

In high-precision code (typically end-user facing) you may do something which requires re-use of a primary key as a foreign key to another table: orders and order_lines, for example, and there have been a few variations in method over the years. For example:

select s1.nextval into :local_variable from dual;
insert into orders (id, …) values(:local_variable, …);
insert into order_lines(id_ord, line, …) values(:local_variable, …)

Of course, this type of approach might have been necessary when it wasn’t possible to use a sequence nextval as the default for a column. But now we can, so we could have something like:

create sequence s1 start with 1e6 cache 1e3;
create table orders (
        id number(8,0) default s1.nextval primary key, 
        date_made date not null
);
create table order_lines(
        id_ord     number(8,0) not null references orders, 
        line#      number(4,0) not null, 
        id_product number(8,0) not null
);
insert into orders     (date_made) values (sysdate);
insert into order_lines(id_ord, line#, id_product) 
                                   values(s1.currval, 1, 999);

This does require the programmer to know that the orders table has its primary key generated by a call to a sequence called s1. Be aware that Oracle’s syntax allows you to call sequence.currval only after you’ve called sequence.nextval (even if, as here, that call happens implicitly through the column default). A slightly more generic, or programmer-friendly, variant might return the generated id to the client code:

variable m_id number
insert into orders (date_made) values (sysdate) returning id into :m_id;
insert into order_lines(id_ord, line#, id_product) values(:m_id, 1, 999);

If you start using the 12c “identity” type, you probably have to adopt this approach to retrieve the value you’ve inserted. You’re not supposed to know that, under the covers, Oracle is simply using a sequence to supply a value, and you certainly should not assume you can work out the name of that sequence and call for its currval.

create table orders (
        id number(8,0) 
                generated always as identity (start with 1e6 cache 1e3)
                primary key,
        date_made date not null
);
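With the identity version of orders, the returning clause is the supported way to capture the generated key – you shouldn’t depend on being able to name the system-generated sequence behind the column. A sketch, re-using the order_lines table and literal values from the earlier example:

```sql
variable m_id number

-- The identity column supplies the id; returning hands it back to the client.
insert into orders (date_made) values (sysdate)
returning id into :m_id;

insert into order_lines(id_ord, line#, id_product)
values (:m_id, 1, 999);
```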

The Merge Command

An increasingly popular strategy for updating one table from another is to use the merge command, which has the following skeleton:

merge into target 
using source
on target.cols = source.cols
when matched then 
        update set {target_columns} = {source columns} where …
        delete where …
when not matched then
        insert ({target columns}) values({source columns}) where …

A couple of points to watch out for in general with the merge command:

  • Performance: rather than using table names for the source and target data sets, see if you can use in-line views with the minimum number of columns selected from the source and target. There is an oddity with the merge command that (unlike other SQL statements) it doesn’t automatically project only the columns that will be used – this means that it may carry far more intermediate data than it needs to. (See, for example, this post.) 
  • Determinism: you should view the merge command in much the same way as you would updateable join views. The ON clause has a uniqueness requirement – there should be at most one row in the source table for any row in the target table, though there can be many rows in the target table for each row in the source table. If you don’t meet this requirement, updateable join views will raise a parse-time error of ORA-01779 relating to a “non key preserved table”. With the equivalent data, the merge command may work, or it may raise the run-time error “ORA-30926: unable to get a stable set of rows in the source tables”. The effect is not deterministic. (Side note: in Oracle 21c, the parse-time error has been removed, and updateable join views also follow the run-time mechanism.)
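The determinism point can be demonstrated with a minimal (hypothetical) pair of tables – the names tgt and src are mine, not from the article’s test data. Two source rows carry the same join key, so the merge cannot find a stable set of rows for the single matching target row:

```sql
create table tgt (id number, val varchar2(10));
create table src (id number, val varchar2(10));

insert into tgt values (1, 'old');
insert into src values (1, 'first');
insert into src values (1, 'second');   -- duplicate join key in the source
commit;

-- This should fail at run time with ORA-30926, because the single
-- target row would have to be updated twice.
merge into tgt
using src
on    (tgt.id = src.id)
when matched then
        update set tgt.val = src.val
;
```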

The key reason for picking the merge command is that it’s a nice example of how unpredictable the results of using a sequence might be – and it has also appeared a couple of times quite recently on various Oracle forums for exactly this reason.

Here’s a little script to create a small demonstration data set. I’m going to have a new data table which is going to be merged with an old data table. The old data will consist of just 50 rows with odd-numbered IDs between 1 and 99; the new data will consist of 100 rows with IDs (both odd and even) from 1 to 100. I’ll be updating based on matching IDs and inserting where there is no match.

create sequence s1 minvalue 1001 start with 1001 cache 1000;
create table old_data
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
        2 * rownum - 1                  id,
        'OLD'                           status,
        cast(0 as number(4,0))          seq_value,
        lpad('x',100,'x')               padding
from
        generator       v1
where
        rownum <= 50
;
alter table old_data add constraint od_pk primary key(id);
create table new_data
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
        rownum                          id,
        'NEW'                           status,
        cast(0 as number(4,0))          seq_value,
        lpad('x',100,'x')               padding
from
        generator       v1
where
        rownum <= 100
;
alter table new_data add constraint nd_pk primary key(id);

I’ll run three separate tests: in one, I’ll use only the when matched update clause, in the second, I’ll use only the when not matched insert clause, and finally, I’ll run a test with both clauses. To keep things simple, I won’t use the where or delete clauses. Here’s the statement with both clauses:

merge into
        old_data od
using   new_data nd
on      (nd.id = od.id)
when matched then
        update set seq_value = s1.nextval, status ='UPD'
when not matched then
        insert (id, status, seq_value, padding)
        values (nd.id, 'NEW',  s1.nextval, nd.padding)
;

In this case, I have 50 rows in the old data that will match and 50 that will not match. I expect to update 50 rows with s1.nextval and insert 50 rows with s1.nextval as the inserted seq_value column. Since the sequence s1 starts with 1,001, I expect to get to 1,100 by the end of this merge, and that’s exactly what happens.

The question is: can I say anything about the order in which the sequence values have been used? Will I have 1001 – 1050 in the ‘UPD’ rows and 1051 – 1100 in the ‘NEW’ rows, or will the value alternate between the two? A simple query to select id, status, seq_value from old_data order by id supplies the (initial) answer:

[Image: the results of old_data]

The pattern continues through the entire ordered data set – and if you check the execution plan for the query, you may be able to infer why that appears to be true in this case:

[Image: the execution plan]

The underlying mechanism of the merge command is to join the old and new data sets. If there’s a when not matched clause, it must be an outer join. In this case, Oracle has decided to do a “right outer” hash join, which means it has used the (smaller) old_data data set as the build table and a table scan of the new_data table to probe the build table. Since every row from new_data survives the probe, the sequence numbers are applied in order to the rows in the new_data table (which, thanks to my original CTAS) gives the impression that the sequence and the id values are in sync.

But what would happen if Oracle decided to do the outer join the other way round? I checked the Outline Information for this plan and changed a swap_join_inputs() hint to a no_swap_join_inputs() hint to make this happen, changing the last three lines of the plan to:

[Image: the last three lines of the plan]

And this is how the updated data looked:

[Image: the updated data]

There is still a pattern. The odd IDs, when ordered, produce the sequence values 1001 to 1050 in order; again the apparent lock-step is a side effect of the original CTAS. However, the newly inserted even IDs show a random ordering of the seq_value column because of how the outer hash join works when the build table is generated from the preserved table.

The key feature here is that there’s a degree of unpredictability of how your newly arriving data may end up with sequence numbers that don’t seem to match the order of arrival very well. In a relational database, this shouldn’t really come as a surprise, but if it seems to have behaved “nicely” for a long time, it can be a big shock when things suddenly change simply because of a barely noticeable change in the execution plan.

There’s more, though, because sequence numbers get lost. Again, the many variations of the merge command (with where and delete clauses, in particular) can show several different ways in which values disappear. The simplest option is simply to create a merge command without a when matched clause.
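The insert-only variant is just the earlier merge statement with the when matched branch removed – the same outer join takes place, but the update stream is no longer there to receive its share of the sequence values:

```sql
merge into
        old_data od
using   new_data nd
on      (nd.id = od.id)
when not matched then
        insert (id, status, seq_value, padding)
        values (nd.id, 'NEW', s1.nextval, nd.padding)
;
```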

Running the test using only the when not matched clause, the ordered output from the old_data table looks like this:

[Image: the ordered results of old_data]

The pattern continues through the entire data set. Only the even sequence values have been used; the odd sequence values are “lost”. Going back to my earlier comment – “you write code to generate a ‘rowsource’, and the sequence generator is called once per row” – it is just a little more subtle than that. In this case, the critical rowsource is the output from the hash join, and the sequence nextval is generated as each row is passed up from the hash join. The merge command then splits the rowsource into two parts, the update stream and the insert stream, and (in this example) discards the update stream, losing half the sequence numbers.

Remote queries

Another older example of the surprises you can get if you treat sequences as if they were “just another column” is the trap of moving data from one database to another. For example, you might have some code that does a fairly simple:

insert into local table
select  {list of columns}
from    {list of remote tables}
where   {join and filter conditions}
;

After a little tweaking, you find that you have a good execution plan, and the query runs efficiently as a “fully remote” query. Your next step is to add a (local) sequence number to the select list, but the performance takes a catastrophic nose-dive because the query is now a distributed query, and the optimizer has to optimize it locally and fails to come up with a plan that includes a remote join. An example I created for a recent note on my blog showed a plan that started efficiently like this (notice the zero selected in the second line of the statement):

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select  t1.id, t2.id, 0, t1.n1, t2.n2, t1.v1, t2.v2
from    t1@orclpdb@loopback t1,
        t2@orclpdb@loopback t2
where   t2.id = t1.id


but changed to the following when the zero was replaced by a call to a local sequence:

insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select  t1.id, t2.id, s1.nextval, t1.n1, t2.n2, t1.v1, t2.v2
from    t1@orclpdb@loopback t1,
        t2@orclpdb@loopback t2
where   t2.id = t1.id


There are workarounds to this problem, but if you don’t look carefully at the things that change when you introduce a sequence, you may find that a test case on a small development system simply isn’t acceptable for a full-scale production system.
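One possible workaround (a sketch only – the staging table gtt_t3 and its column list are my assumption, not from the original example) is to split the statement in two: let the join run as a fully remote query into a local staging table first, then apply the local sequence in a second, purely local insert:

```sql
-- Step 1: fully remote join, no sequence involved, so the optimizer
-- can still push the whole query to the remote site.
insert into gtt_t3 (id1, id2, n1, n2, v1, v2)
select  t1.id, t2.id, t1.n1, t2.n2, t1.v1, t2.v2
from    t1@orclpdb@loopback t1,
        t2@orclpdb@loopback t2
where   t2.id = t1.id
;

-- Step 2: purely local statement, where the sequence is harmless.
insert into t3 (id1, id2, n0, n1, n2, v1, v2)
select  id1, id2, s1.nextval, n1, n2, v1, v2
from    gtt_t3
;
```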

Typical uses of Oracle sequences

I’ve covered a few of the commoner examples of using sequences and highlighted four general points:

First, all you can ever do is, essentially, select the nextval and/or the currval of a sequence. If a statement includes a row source that uses a sequence, then nextval will be actioned before currval, and nextval will only be actioned once per row in that row source.

  • There are row sources that do not permit the use of sequences.
  • For complex execution plans, a change in the plan may give the appearance of randomizing the order of sequence values in the final output, and some plans may even give the impression that sequence values have “gone missing”.
  • Finally, the presence of a sequence in a query may dictate the overall strategy of the execution plan, with the particular example of a “fully remote” query changing to a “distributed” query.

A sequence is not just another column or function – it carries some special baggage.

In the final article in this mini-series, I’ll finish with a few more comments on the latest features of sequences from 12c onwards.


The post Typical uses of Oracle sequences appeared first on Simple Talk.


