Many database applications make use of “meaningless ids” or “synthetic keys” rather than using columns of user data to construct the primary key for their tables. For a very long time, the Oracle RDBMS has supplied the “sequence” mechanism to generate such values efficiently and with minimum contention between transactions. Despite a history of several decades, sequences are often misunderstood and misused – and there are a few threats involving sequences that Oracle Corp. is still working to minimize.
In this short series of articles, I aim to eliminate the most typical misunderstandings, warn you of the threats and provide ideas for workarounds. I’ll start by discussing the pre-12c use of sequences in single-instance Oracle, then I’ll move on to the critical changes that are relevant to multi-instance RAC, and end with some comments on the features that appeared more recently in 12.1, 18c and 19c to deal particularly with “Application Continuity”, RAC and “Sharding”.
Defining a sequence
Until 12c, the full syntax of the create sequence command is as follows:
CREATE SEQUENCE [schema_name.]{sequence_name}
        START WITH n
        INCREMENT BY n
        [MINVALUE n | NOMINVALUE]
        [MAXVALUE n | NOMAXVALUE]
        [CACHE n    | NOCACHE]
        [ORDER      | NOORDER]
        [CYCLE      | NOCYCLE]
The shortest statement you could use to create a sequence, however, would be simply:
CREATE SEQUENCE sequence_name;
This command would be equivalent to:
CREATE SEQUENCE sequence_name
        START WITH 1
        INCREMENT BY 1
        MINVALUE 1
        NOMAXVALUE      -- which is effectively 1e28 - 1
        CACHE 20        -- which is usually a bad idea, as we shall see later
        NOORDER
        NOCYCLE
Oracle invented sequences to make it possible to supply on demand a (virtually) limitless sequence of numbers that were guaranteed to be unique. The easiest way to do this is simply to keep a globally visible counter and increment it every time a session says: “give me a number” (or, more accurately, “tell me the value of sequence_name.nextval”).
But Oracle has to keep track of the last number supplied to avoid the risk of supplying the same number twice – that’s easy enough for a database system: just keep the most recent number in a table. However, that makes things very slow if a table has to be updated each time someone needs the next number, and you’d also want the update to run as an autonomous transaction so that nobody has to wait for another session to commit its current transaction before the next number can be generated. Therefore, Oracle uses recursive, autonomous transactions to handle the table updates, and it avoids updating the table for every single call to nextval by counting in memory and remembering when it will next need to update the table. Unfortunately, Oracle refers to this “in-memory” mechanism as the cache – which is very misleading.
A common strategy for “home-made” sequences is to have rows in a table with columns (sequence name, most recently used sequence number), then write a function that selects the relevant row from the table for update, increments the number, updates the table, and supplies the number to the end-user. Behind the scenes, this is exactly what the Oracle software does, though being an internalized mechanism, it can “cheat” in ways that the home-grown code can’t.
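To make that concrete, here’s a minimal sketch of the home-grown approach – the table and function names are mine, and I’ve left out the error handling and initial row-seeding you’d want in production:

create table my_sequences (
        sequence_name   varchar2(30) primary key,
        last_number     number not null
);

create or replace function my_nextval(p_name in varchar2)
return number
as
        pragma autonomous_transaction;          -- commit independently of the caller's transaction
        v_next  number;
begin
        select  last_number + 1
        into    v_next
        from    my_sequences
        where   sequence_name = p_name
        for update;                             -- serialise concurrent callers on this row

        update  my_sequences
        set     last_number = v_next
        where   sequence_name = p_name;

        commit;                                 -- release the row lock as soon as possible
        return  v_next;
end;
/

Every call has to lock, update, and commit a row in my_sequences, which is exactly the serialisation cost the built-in mechanism is designed to avoid.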
You can see the database image of a sequence by querying the view user_sequences. Assuming I’ve just created a sequence called s1, here’s a suitable query, followed by the result it gives me immediately after creating the sequence and then the result I get after one call to fetch s1.nextval:
select
        min_value, max_value, increment_by, cache_size,
        last_number, cycle_flag, order_flag
from
        user_sequences
where
        sequence_name = 'S1'
/
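The exact layout will depend on your SQL*Plus settings, but with the default sequence options in place the two sets of results will look something like the following – the only value that changes is last_number, which starts at 1 and jumps to 21 after the first call:

 MIN_VALUE  MAX_VALUE                    INCREMENT_BY CACHE_SIZE LAST_NUMBER CYCLE_FLAG ORDER_FLAG
---------- ---------------------------- ------------ ---------- ----------- ---------- ----------
         1 9999999999999999999999999999            1         20           1 N          N

         1 9999999999999999999999999999            1         20          21 N          N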
If you have DBA privileges, you could go one step further by finding the object_id of the sequence and then querying the dictionary table (seq$) that the view is based on. Again, I’ve run the query twice, once just before and once just after the first call to s1.nextval:
select
        minvalue, maxvalue, increment$, cache,
        highwater, cycle#, order$
from
        seq$
where
        obj# = 124874
;
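Again, with the defaults in place you should see minvalue = 1, maxvalue = 1e28 - 1, increment$ = 1, cache = 20, and cycle# and order$ both zero, with the highwater column showing 1 before the first call to s1.nextval and 21 immediately after it – matching the last_number column of user_sequences, which is just a view layered over this table.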
There’s one final option for checking the state of a sequence – the in-memory information that is visible to privileged accounts through the dynamic performance view v$_sequences (the unexpected underscore isn’t a typo). In this case, I’ve queried the view just once, after the first call to s1.nextval (the sequence won’t even be in the view until the first call to nextval), and I’ve included a column that isn’t available in the dictionary table:
select
        min_value, max_value, increment_by, cache_size,
        highwater, cycle_flag, order_flag, nextvalue
from
        v$_sequences
where
        object# = 124874
/
As you can see, the dynamic performance view has a column nextvalue that holds the value which will be supplied the next time a user calls for s1.nextval.
All three views hold a “highwater” value (though it’s called “last_number” in the view user_sequences). When a call to s1.nextval returns the value currently shown as the highwater (at that moment highwater and nextvalue are the same), the session making the call will execute and commit a recursive transaction that adds cache_size * increment_by to the current highwater and updates the seq$ table.
The “cache” isn’t a cache in the traditional sense, it’s just a single pair of numbers (no matter how large you set the sequence cache size): the current value and the value at which you bump the highwater. There is no resource penalty for setting a large cache for a sequence, and the most important thing to remember about sequences is that, in most cases, you should probably be using a large cache size.
Performance issues (Single Instance)
As I said at the start of this article, the commonest use of sequences is to generate synthetic keys. If you’ve used a sequence in this way for a table that’s subject to a high volume of activity – and particularly if it’s subject to highly concurrent activity – you want the sequence numbers to be supplied as quickly as possible. You don’t want to have to pause on every 20th call for a data dictionary table to be updated, and you don’t want to wait because 50 other sessions are all colliding on the same sequence maintenance.
Cache size
It’s quite easy to recognize this issue in single-instance Oracle as you’ll see sessions in v$lock waiting on the “SQ” enqueue (lock) in mode 4, and the reported wait event will be “enq: SQ - contention”, with p1 set to the object_id of the sequence they’re waiting for. The solution to this problem is simply to increase the cache size, and it’s not uncommon on very busy systems to see sequences with caches in the thousands:
alter sequence s1 cache 10000;
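If you want to confirm the diagnosis before making any changes, a query along the following lines will do the job – just a sketch, using the fact that v$session reports each session’s current wait event and its parameters:

select
        sid, event, p1 object_id
from
        v$session
where
        event = 'enq: SQ - contention'
;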
Given that it’s often important for performance and concurrency to increase the cache size of a sequence from the default 20, you should appreciate that it’s a bad idea to create a single sequence to supply values for every single column that needs unique meaningless values. Play safe – give each synthetic key its own sequence. (Some other RDBMS do this automatically through an “identity” data type, but that feature has been introduced only in recent versions of Oracle.)
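From 12c onwards you can get this effect declaratively with an identity column: Oracle creates and manages a dedicated sequence behind the scenes, and the identity clause accepts the usual sequence options, including the cache size. A sketch, using a hypothetical table:

create table orders (
        id              number generated always as identity (cache 10000),
        order_date      date,
        customer_id     number
);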
This default performance problem is sometimes made worse by a demand from the end-users that “there should be no missing numbers”, which leads to DBAs setting the sequence to nocache. In other words, every call to nextval results in the highwater being incremented and the seq$ table being updated. This can be a massive source of contention and could easily end up as the key bottleneck in the system – and, to make things worse, it doesn’t stop numbers from going missing (e.g. a session might have to rollback, or might crash, after its call to nextval, and the value would then be lost).
Oracle sequences are not intended to provide an unbroken sequence of values; they are only intended to supply a unique set of numbers. If you need some form of sequential audit trail (and you want it to be scalable) then you will have to do some fairly careful coding to create your own – it’s not a trivial exercise.
Index Contention
There is another common performance issue that is a byproduct of using sequences to generate unique ids: if uniqueness is important, you ought to declare a constraint on the relevant column, and a unique (or primary key) constraint will automatically result in Oracle creating a corresponding index. The index will become a point of contention because all the sessions that are trying to insert a new row will have generated sequence numbers which are (a) very close to each other and (b) the highest values generated so far; this means you will have multiple sessions trying to insert new values into the same (so-called “right-hand”) leaf block of the index.
In these circumstances, you’re likely to see a lot of waits for “buffer busy wait” on the index; you may also see waits for “enq: TX - index contention”, and even “enq: TX - allocate ITL entry” as sessions queue up waiting for an index leaf block split to complete. Unfortunately, there’s no good solution to this problem, though there are damage limitation mechanisms you could implement, and one of the enhancements Oracle 18c supplies is a declarative version of one of the commoner mechanisms.
Once upon a time, a common suggestion to avoid the index hot-spot was to recreate the index as a “reverse-key” index but, for continuously growing tables, that’s a bad idea as it typically replaces a few seconds of “buffer busy wait” with minutes (or more) of extra “db file sequential read” waits. (For an explanation of this comment, see: https://jonathanlewis.wordpress.com/2015/06/17/reverse-key-2/ )
If you’re licensed for the partitioning option, a better way to remove the hot spot is to recreate the index as a globally hash partitioned index – with the usual proviso that the number of partitions should be a power of 2 to ensure equal sized partitions. There are some optimizer-related drawbacks to partitioning an index in this way, but since we’re talking about the use of sequences for synthetic (meaningless) keys, the only queries you expect to see targeting the index are likely to be of the form unique_key = {constant}, and the optimizer is unlikely to be misled by a query with a predicate of that type.
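As a sketch of the idea – assuming a table t1 with a sequence-populated column id – the index might look like this:

create unique index t1_pk on t1(id)
global partition by hash (id)
partitions 16
;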
The partitioning option is expensive, of course, and isn’t available on Standard Edition, so the manually coded fallback is to spread the hot spot by manipulating the sequence number after you’ve fetched it. First, you pick a number so large that you think the sequence will never need to reach it (say 1e10). Then you add a session-dependent value to the sequence number so that sequence values which are close to each other end up widely separated in the index when they are inserted by different sessions. For example, you might add (1 + mod(sid, 16)) * 1e10 to the sequence value; this will give you most of the performance benefit of hash partitioning your index into 16 partitions – splitting your one hot-spot into 16 “slightly-warm” spots.
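In code the strategy might look something like the following – the table t1 and its columns are hypothetical, and I’ve used sys_context('userenv','sid') to pick up the session id:

insert into t1 (id, small_vc)
values (
        (1 + mod(to_number(sys_context('userenv','sid')), 16)) * 1e10 + s1.nextval,
        rpad('x',10)
);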
Inevitably there’s a price to pay for this scaling mechanism. In an ideal world, an index on data generated by a sequence would use close to 100% of the available space in each index leaf block because Oracle would be doing what it calls “leaf node 90-10 splits” each time the right-hand (high value) leaf block split. If you spread the inserts across N different insertion points in the index (using 1 + mod(sid, N) as above), then all but the highest insertion point would be doing “50-50” leaf node splits.
Although Oracle has a statistic in the instance and session activity statistics (v$sysstat / v$sesstat) for “leaf node 90-10 splits”, a better name would be “leaf node 100-0 splits”. The “split” simply adds a leaf block at the right-hand (high-value) end of the index and carries on into it; it doesn’t copy any data from the preceding leaf block. By comparison, the “normal” mid-index leaf node split links a new leaf block into the right place then moves roughly half the data from the existing leaf block into the new leaf block.
The side effect of manually “pseudo-partitioning” an index in this way is that it might operate at roughly twice its previous size. (And that’s not allowing for the fact that you now have an index where the smallest entry is a few bytes longer than the largest entry used to be.) In real life, there are a couple of odd concurrency effects that can make any index with hot spots waste a significant amount of space, so the change may not be quite as dramatic as the theory predicts.
Miscellaneous
I’ll finish this article with a brief round-up of the remaining, less commonly used features of the pre-12c sequence.
Start with: you can specify any legal integer value as the first value to use, positive or negative; the default is 1.
Increment by: although I’ve been discussing sequences that increase by 1, you can set the increment to any legal integer value (positive or negative). So if you wanted a sequence of the positive even numbers you could specify “start with 2 increment by 2”. The default increment is 1.
Minvalue: if your increment is negative, you can set a value below which Oracle raises an error (ORA-08004: sequence goes below MINVALUE).
Maxvalue: the mirror to minvalue – with a positive increment in place you can set a value above which Oracle will raise the error (ORA-08004: sequence goes above MAXVALUE).
Cycle: I don’t think I’ve ever seen this used in the wild, but you can set up a sequence to repeat if it has a maxvalue with a positive increment or a minvalue with a negative increment. If the call to nextval would push the sequence past its maxvalue (or minvalue) then it wraps back to its minvalue (for an ascending sequence) or its maxvalue (for a descending one) and continues from there – see the example just after this list.
Order: This is an option that applies very specifically to how RAC handles sequences, and I’ll leave that to the second article in the series.
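Here’s the cycling example promised above – a sequence that hands out the even numbers from 2 to 100 and then wraps back to 2. (Note that a cycling sequence must have a cache smaller than one complete cycle, and an ascending cycling sequence must declare a maxvalue.)

create sequence even_cycle
        start with   2
        increment by 2
        minvalue     2
        maxvalue     100
        cache        10
        cycle
;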
If you start playing around with sequences, you’ll discover that there are a number of ways of combining the options to produce fairly self-explanatory Oracle errors. For example:
SQL> create sequence s4 start with 10 increment by 15 nomaxvalue cycle;
create sequence s4 start with 10 increment by 15 nomaxvalue cycle
*
ERROR at line 1:
ORA-04015: ascending sequences that CYCLE must specify MAXVALUE
There are so many possibilities that there’s no point in trying to list all of them. If you want to read the Oracle error messages file ($ORACLE_HOME/rdbms/mesg/oraus.msg) then the error numbers relating to creating sequences go from ORA-04001 to ORA-04016.
Summary
In this first article, I’ve described the basic and most common use of sequences and highlighted the need to choose a sensible cache size to avoid concurrency problems with the sequence itself. In particular, I’ve emphasized the threat of defining a sequence as nocache while pointing out that, contrary to popular assumption, this option doesn’t stop you “losing” sequence numbers.
I’ve pointed out the concurrency issues that arise when you create an index to enforce the uniqueness of sequence values, and discussed strategies for minimizing the threat, at the same time pointing out that whatever you do there are side effects. In passing, it’s worth noting that time-based indexes will exhibit similar behaviour to sequence-based indexes.
Finally, I’ve given you a brief run-down on the less frequently used options for defining a sequence, leaving any comments about the order option to the next article, which will start talking about RAC and the extra care needed when handling sequences in a multi-instance system.