<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" version="2.0"><channel><atom:id>tag:blogger.com,1999:blog-15440695</atom:id><lastBuildDate>Thu, 02 Feb 2012 01:08:53 +0000</lastBuildDate><title>THE Q U A D R O BLOG</title><description /><link>http://afatkulin.blogspot.com/</link><managingEditor>noreply@blogger.com (Alex Fatkulin)</managingEditor><generator>Blogger</generator><openSearch:totalResults>135</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/TheQUADROBlog" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="thequadroblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-4803016963357368436</guid><pubDate>Thu, 02 Feb 2012 01:08:00 +0000</pubDate><atom:updated>2012-02-01T20:08:53.436-05:00</atom:updated><title>Latch: row cache objects, session_cached_cursors and a database link</title><description>What could possibly be common about the above things?&lt;br /&gt;
&lt;br /&gt;
Sometime ago I was involved in tuning a large scale production system which experienced quite a lot of latch contention. In fact the contention was bad enough as to render the entire system unusable during peak hours.&lt;br /&gt;
&lt;br /&gt;
A typical AWR report during times with moderate workload looked like this:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;Event                      Waits           Time(s) Avg wait (ms)   % DB time  Wait Class
DB CPU                     21,805          36.95  
latch: row cache objects   14,524,462      11,552  1               19.57      Concurrency
db file sequential read    2,697,778       8,988   3               15.23      User I/O
gc current block 3-way     4,202,356       3,599   1               6.10       Cluster
gc current block 2-way     3,670,293       2,330   1               3.95       Cluster&lt;/pre&gt;&lt;br /&gt;
(in reality it was a 7-node RAC cluster but it is not really relevant to our subject).&lt;br /&gt;
&lt;br /&gt;
It is not hard to spot &lt;i&gt;latch: row cache objects&lt;/i&gt; on a second place consuming almost 20% of DB time. This event was rapidly escalating whether load increased quickly bringing the entire cluster into unusable state as far as any front-end application were concerned. The total number of gets during an hour (the interval with which this particular AWR report had been made) totaled 1,281,167,103 which averages a whopping 355,879 gets per second!&lt;br /&gt;
&lt;br /&gt;
The first step was to take a look at the Dictionary Cache statistics and see if there were any obvious deviations:&lt;br /&gt;
&lt;pre&gt;Cache                   Get Requests    Pct Miss
dc_awr_control          72              9.72
dc_database_links       124,452         0.05
dc_files                46,900          17.14
dc_global_oids          555,599         0.07
dc_histogram_data       8,962,576       1.01
dc_histogram_defs       412,885,094     0.02
dc_object_grants        52,475          1.13
dc_objects              2,860,222       0.64
dc_profiles             61,189          0.01
dc_rollback_segments    392,885         0.00
dc_segments             657,653         7.72
dc_sequences            1,324           67.67
dc_table_scns           3,206           3.31
dc_tablespaces          1,253,322       0.06
dc_users                5,475,824       0.02
global database name    65,092          0.01
kqlsubheap_object       10,910          0.46
outstanding_alerts      721             95.98&lt;/pre&gt;&lt;br /&gt;
&lt;i&gt;dc_histogram_defs&lt;/i&gt; really stands out but so far provides no direct clues. Though the entire dictionary cache seems to be quite busy. Let's take a look at instance activity statistics:&lt;br /&gt;
&lt;pre&gt;&amp;nbsp;                 Per Second   Per Transaction  Per Exec        Per Call
DB Time(s):       16.4         0.5              0.00            0.00
DB CPU(s):        6.1          0.2              0.00            0.00
Redo size:        87,543.5     2,398.6
Logical reads:    336,555.7    9,221.2
Block changes:    449.3        12.3
Physical reads:   1,018.3      27.9
Physical writes:  36.8         1.0
User calls:       41,250.4     1,130.2
Parses:           6,264.4      171.6
Hard parses:      27.1         0.7
W/A MB processed: 33,769,202.4 925,233.0
Logons:           18.3         0.5
Executes:         14,230.5     389.9
Rollbacks:        3.4          0.1
Transactions:     36.5&lt;/pre&gt;&lt;br /&gt;
A couple of things to notice:&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;The number of executions is not that high (at least for an IBM box with 56 CPU cores, that is).&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Execute to Parse&lt;/i&gt; ratio is not particularly good nor bad (6,264.4 parses per second compared to 14,230.5 executions).&lt;/li&gt;
&lt;li&gt;None of the above seems to be enough to justify 355,879 &lt;i&gt;latch: row cache objects&lt;/i&gt; gets per second, even when you consider 27.1 hard parses/sec.&lt;/li&gt;
&lt;/ol&gt;&lt;br /&gt;
There had to be more going on. Looking at the &lt;i&gt;SQL ordered by Parse Calls&lt;/i&gt; section of the report revealed that all the top parsing queries were coming from a database link. Which prompted me to ask additional questions...&lt;br /&gt;
&lt;br /&gt;
It turned out that the system had a somewhat weird architecture. A 7-node RAC cluster served as a back end to a numerous front-end Oracle databases which were essentially acting as PL/SQL-engines, accessing all the data through the database link while dong all the processing. The architectural feasibility of such solution is debatable (to say the least) but let's not steer away from the main topic of our discussion.&lt;br /&gt;
&lt;br /&gt;
I geared to find out whether there was some special case which could cause &lt;i&gt;latch: row cache objects&lt;/i&gt; to shoot through the roof when we deal with a lot of queries executed through a database link. After a number of experiments I eventually found a cause which explains the title of this blog post.&lt;br /&gt;
&lt;br /&gt;
Take a look at the following example:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&gt; declare
  2   l_gets number;
  3  begin
  4   select gets into l_gets from v$latch@dblink where name='row cache objects';
  5  
  6   for i in 1 .. 1000
  7   loop
  8    for cur in (select null from dual@dblink) loop null; end loop;
  9   end loop;
 10  
 11   select sum(gets)-l_gets into l_gets from v$latch@dblink where name='row cache objects';
 12   dbms_output.put_line(to_char(l_gets));
 13  end;
 14  /
 
79
 
PL/SQL procedure successfully completed&lt;/pre&gt;&lt;br /&gt;
Accessing a remote table 1000 times through a database link only caused 79 &lt;i&gt;row cache objects&lt;/i&gt; gets on the &lt;i&gt;destination&lt;/i&gt; database. However, exactly the same example can produce quite a different result:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&gt; alter session set session_cached_cursors=0;
 
Session altered
 
SQL&gt; 
SQL&gt; declare
  2   l_gets number;
  3  begin
  4   select gets into l_gets from v$latch@dblink where name='row cache objects';
  5  
  6   for i in 1 .. 1000
  7   loop
  8    for cur in (select null from dual@dblink) loop null; end loop;
  9   end loop;
 10  
 11   select sum(gets)-l_gets into l_gets from v$latch@dblink where name='row cache objects';
 12   dbms_output.put_line(to_char(l_gets));
 13  end;
 14  /
 
3082
 
PL/SQL procedure successfully completed&lt;/pre&gt;&lt;br /&gt;
Now we jumped from 79 gets to 3082 which is a 39x increase! What's really astonishing is the fact that it's a cursor cache miss on the &lt;i&gt;source&lt;/i&gt; which wreaks havoc on the &lt;i&gt;destination&lt;/i&gt;. Evidently, the solution to the problem had nothing to do with the database where the problem was observed. The "PL/SQL-databases" had an insufficient setting for &lt;i&gt;session_cached_cursors&lt;/i&gt; and once that was corrected the issue disappeared.&lt;br /&gt;
&lt;br /&gt;
It is somewhat ironic that that problem would have never appears had the data and PL/SQL be allowed to be kept in the same database (where they belong).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-4803016963357368436?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2012/02/latch-row-cache-objects.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-52312454049975182</guid><pubDate>Wed, 06 Apr 2011 00:23:00 +0000</pubDate><atom:updated>2011-04-05T20:23:23.372-04:00</atom:updated><title>Oracle GoldenGate and undocumented OCI redo log APIs</title><description>&lt;a href="http://afatkulin.blogspot.com/2011/03/oracle-goldengate-asm-and-dblogreader.html"&gt;Last time&lt;/a&gt; I did some quick write up on the new APIs appeared in Oracle GoldenGate 11G which can be used against Oracle 10.2.0.5 and 11.2.0.2 to improve how GG handles redo on ASM.&lt;br /&gt;
&lt;br /&gt;
Some of the items still required follow up, namely:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;What are the names of these new API functions?&lt;/li&gt;
&lt;li&gt;What kind of security privileges do you need to call these?&lt;/li&gt;&lt;/ul&gt;I was especially interested in the last item since the ability to read Oracle's redo log gives you  extremely powerful access to database information.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;New OCI APIs&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Discovering these is pretty straightforward, all we need to do is a stack dump on a running GG Extract process:&lt;br /&gt;
&lt;pre&gt;[oracle@quadro.com ~]$ pstack 4390
...
#10 0x00002acc23443ef0 in knxOGGRedoLogRead () from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
&lt;b&gt;#11 0x00002acc232a9b3b in OCIPOGGRedoLogRead () from /u01/app/oracle/ggs_11g/libclntsh.so.11.1&lt;/b&gt;
#12 0x00000000008fa92c in ASMReader2::readFile(unsigned int, unsigned int, unsigned int*, char*, char*) ()
...&lt;/pre&gt;Once we know the name of the read function (&lt;i&gt;OCIPOGGRedoLogRead&lt;/i&gt;) we can find the rest of them:&lt;br /&gt;
&lt;pre&gt;[oracle@quadro.com ~]$ nm /u01/app/oracle/ggs_11g/libclntsh.so.11.1 | grep OCIPOGG
00000000010e3954 T OCIPOGGRedoLogClose
00000000010e3c90 T OCIPOGGRedoLogOpen
00000000010e3a2a T OCIPOGGRedoLogRead&lt;/pre&gt;As you can see, these functions are indeed exposed directly through &lt;i&gt;libclntsh&lt;/i&gt; library. We can even see the Extract process "in action" calling these functions with the help of Linux's &lt;i&gt;gdb&lt;/i&gt;:&lt;br /&gt;
&lt;pre&gt;[oracle@quadro.com ~]$ ps -aef | grep extract
oracle    4390  4386  0 19:36 ?        00:00:00 /u01/app/oracle/ggs_11g/extract PARAMFILE /u01/app/oracle/ggs_11g/dirprm/11g_ext.prm REPORTFILE /u01/app/oracle/ggs_11g/dirrpt/11G_EXT.rpt PROCESSID 11G_EXT USESUBDIRS
oracle    4985  4114  0 19:51 pts/2    00:00:00 grep extract
[oracle@quadro.com ~]$ gdb /u01/app/oracle/ggs_11g/extract -p 4390
...
(gdb) break OCIPOGGRedoLogClose
Breakpoint 1 at 0x2acc232a9958
(gdb) break OCIPOGGRedoLogOpen
Breakpoint 2 at 0x2acc232a9c94
(gdb) break OCIPOGGRedoLogRead
Breakpoint 3 at 0x2acc232a9a2e
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00002acc232a9958 &lt;OCIPOGGRedoLogClose+4&gt;
2       breakpoint     keep y   0x00002acc232a9c94 &lt;OCIPOGGRedoLogOpen+4&gt;
3       breakpoint     keep y   0x00002acc232a9a2e &lt;OCIPOGGRedoLogRead+4&gt;&lt;/pre&gt;After we've sat up our breakpoints we can let the Extract run and hit one of these:&lt;br /&gt;
&lt;pre&gt;(gdb) continue
Continuing.

Breakpoint 3, 0x00002acc232a9a2e in OCIPOGGRedoLogRead ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1

(gdb) bt
#0  0x00002acc232a9a2e in OCIPOGGRedoLogRead ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
#1  0x00000000008fa92c in ASMReader2::readFile (this=0xa408100,
    offset=&lt;value optimized out&gt;, bytes_to_read=1024000,
    bytes_read=0x7fffb8c73ad4, buffer=0xabc0000 "\001\"", errtext=0xc457c0 "")
    at /home/ecloud/workspace/Build_FBO_OpenSys_r11.1.1.0.0_078_[34086]/perforce/src/app/er/redo/oracle/asm.c:798
...&lt;/pre&gt;On top of the stack we can see GoldenGate making a call to &lt;i&gt;OCIPOGGRedoLogRead&lt;/i&gt;. We can also get a sense of parameters by looking at the calling function (&lt;i&gt;ASMReader2::readFile&lt;/i&gt;). The function specifies how many bytes it would like to read (&lt;i&gt;bytes_to_read=1024000&lt;/i&gt;), a pointer to a variable which will hold the actual amount read (&lt;i&gt;bytes_read=0x7fffb8c73ad4&lt;/i&gt;), a pointer to a buffer to store the returned redo data (&lt;i&gt;buffer=0xabc0000&lt;/i&gt;) and the error text, if any (&lt;i&gt;errtext=0xc457c0&lt;/i&gt;).&lt;br /&gt;
&lt;br /&gt;
Now it is time to see the other two functions usage. What I did is switched the redo logs in my database, disabled breakpoint number 3 (the one which points to &lt;i&gt;OCIPOGGRedoLogRead&lt;/i&gt;) and let the Extract process continue:&lt;br /&gt;
&lt;pre&gt;(gdb) disable 3

(gdb) continue
Continuing.

Breakpoint 2, 0x00002acc232a9c94 in OCIPOGGRedoLogOpen ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
(gdb) bt
#0  0x00002acc232a9c94 in OCIPOGGRedoLogOpen ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
#1  0x00000000008fa859 in ASMReader2::openFile (this=0xa408100,
    filename=0xa408350 "+DATA/ora11gr2/onlinelog/group_3.258.743164441",
    lblksize=&lt;value optimized out&gt;, blksize=0x0, tot_blks=0x0,
    filetype=&lt;value optimized out&gt;, errtext=0xc457c0 "")
    at /home/ecloud/workspace/Build_FBO_OpenSys_r11.1.1.0.0_078_[34086]/perforce/src/app/er/redo/oracle/asm.c:762
...&lt;/pre&gt;Interestingly enough we never made a call to &lt;i&gt;OCIPOGGRedoLogClose&lt;/i&gt;. Maybe I should open an Oracle SR and complain :)&lt;br /&gt;
&lt;br /&gt;
As before, we can get a glimpse of parameters which include redo log file name as well as bunch of output parameters specifying the block size and so on (which looks somewhat similar to what you get from &lt;i&gt;dbms_diskgroup.open&lt;/i&gt; call).&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Security&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Looking at how all this stuff is being exposed through OCI, I became even more eager to find out what kind of privileges do you require on the database side in order to be able to use these. The first step was to try and run the Extract process with stripped out privileges which thankfully resulted in the following error:&lt;br /&gt;
&lt;pre&gt;2011-04-05 20:10:36  ERROR   OGG-00446  Opening ASM file +DATA/ora11gr2/onlinelog/group_3.258.743164441 in DBLOGREADER mode: (1031) ORA-01031: insufficient privilegesNot able to establish initial position for sequence 398, rba 1040.&lt;/pre&gt;At least there is something but what exactly is it? GoldenGate's documentation (should I add "as usual?") doesn't mention anything in that regards so I had to figure that one out on my own.&lt;br /&gt;
&lt;br /&gt;
After some back and forth I was able to eventually discover the privilege which makes it all possible -- it's &lt;i&gt;select any transaction&lt;/i&gt;. Alas it is not documented to play this role but now you know anyway.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-52312454049975182?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2011/04/oracle-goldengate-and-undocumented-oci.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-4573771736624458913</guid><pubDate>Sat, 05 Mar 2011 18:21:00 +0000</pubDate><atom:updated>2011-03-05T13:21:22.898-05:00</atom:updated><title>Oracle GoldenGate, ASM and DBLOGREADER option</title><description>Some time ago I did a &lt;a href="http://www.pythian.com/news/7459/oracle-goldengate-extract-internals-part-ii/"&gt;write up&lt;/a&gt; detailing how Oracle GoldenGate interacts with Oracle ASM. The technology stack where the Extract process establishes a connection with the ASM instance and then reads file contents using dbms_diskgroup package is quite cumbersome to say the least. When compared to how GG reads redo from a filesystem (detailed &lt;a href="http://www.pythian.com/news/7225/oracle-goldengate-extract-internals-part-i/"&gt;here&lt;/a&gt;), the way GG has to deal with ASM is quite inefficient to say the least. In fact, as part of my original article, the recommendation was to use a bequeath connection when possible to cut on the amount of overhead. Nowadays the use of bequeath connection became an official recommendation from Oracle outlined in Oracle GG 11G &lt;a href="http://download.oracle.com/docs/cd/E18101_01/doc.1111/e18165.pdf"&gt;Release Notes&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;DBLOGREADER&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
I was quite excited to see that Oracle made some progress in it's newer release of GoldenGate (11.1). Here is what the documentation says about the new option:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;A new DBLOGREADER option was added to TRANLOGOPTIONS to enable Extract to use a &lt;br /&gt;
newer API for capturing from an Oracle ASM instance. This feature is available as of Oracle &lt;br /&gt;
10.2.0.5.&lt;/blockquote&gt;&lt;br /&gt;
This was indeed promising, however, what exactly has changed remained to be discovered.&lt;br /&gt;
&lt;br /&gt;
The first change you're going to notice when using DBLOGREADER is that the Extract process establishes a connection with the RDBMS instance (not an ASM instance). If you trace the relevant server side process then you'll observer the following event being emitted each time it reads from the redo:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;WAIT #0: nam='log file sequential read' ela= 3 log#=0 block#=102653 blocks=2000 obj#=-1 tim=1299345636565911&lt;/pre&gt;&lt;br /&gt;
One of the advantages of the new option is the ability to use a much larger buffer (the old mechanism was limited to something around 28K which is abysmal for any system with even moderate redo generation rate). As we can see from the above, we had &lt;i&gt;2000x512=1000K&lt;/i&gt; worth of data read in one shot so at least that part works as advertised. But how is it done?&lt;br /&gt;
&lt;br /&gt;
A stack dump reveals the following:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;[oracle@ora11gr2 ggs_11g]$ pstack 4740
#1  0x00000000090bae87 in sskgpwwait ()
#2  0x00000000090b9f9a in skgpwwait ()
#3  0x0000000008d29f93 in ksliwat ()
#4  0x0000000008d29541 in kslwaitctx ()
#5  0x0000000008d269ab in kslwait ()
&lt;b&gt;#6  0x00000000071f8566 in knloggRedoRead ()
#7  0x00000000071f9e05 in knloggmain ()&lt;/b&gt;
#8  0x0000000008eae67a in opiodr ()
#9  0x00000000090467f9 in ttcpip ()
#10 0x0000000001722f86 in opitsk ()
#11 0x0000000001727c26 in opiino ()
#12 0x0000000008eae67a in opiodr ()
#13 0x000000000171eedc in opidrv ()
#14 0x0000000001d95f5f in sou2o ()
#15 0x0000000000a07935 in opimai_real ()
#16 0x0000000001d9b408 in ssthrdmain ()
#17 0x0000000000a078a1 in main ()&lt;/pre&gt;&lt;br /&gt;
Look at the functions in line #6 and #7 (&lt;i&gt;knloggmain()&lt;/i&gt; and &lt;i&gt;knloggRedoRead()&lt;/i&gt;). These functions follow right after OPI (Oracle Program Interface) layer which makes this code path look rather efficient. No need to deal with compilation (KK) and execution (KX) layers. Just a direct and straightforward call to the functions we need. Indeed, when I did some quick and dirty tests (using 11.2.0.2), the new code path was able to archive about 90% performance compared to just reading your redo from ext3 filesystem.&lt;br /&gt;
&lt;br /&gt;
The only question I have about all this is a security paradigm behind &lt;i&gt;knloggmain()&lt;/i&gt; calls. This function appears to be directly exposed through the OPI layer and, while this helps archive greater efficiency, how does it control who can who can not calls it? You may not necessarily want somebody connect to your database and start reading your redo stream as you may get all kinds of sensitive information right there. I haven't spent any time figuring that one out yet though.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-4573771736624458913?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2011/03/oracle-goldengate-asm-and-dblogreader.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-6695699248811420656</guid><pubDate>Sun, 19 Dec 2010 19:54:00 +0000</pubDate><atom:updated>2010-12-19T14:54:58.918-05:00</atom:updated><title>Oracle GoldenGate Trail File Size</title><description>When it comes to estimating how big your trail files will be, Oracle &lt;a href="http://download.oracle.com/docs/cd/E18101_01/doc.1111/e17799.pdf"&gt;documentation&lt;/a&gt; suggests to use the following formula:&lt;br /&gt;
&lt;pre&gt;[log volume in one hour] x [number of hours downtime] x .4 = trail disk space&lt;/pre&gt;It is also described as being a conservative estimate so you are likely to archive a better mileage. What I found is that there are some corner cases which can produce some "anomaly" results.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Test Case&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
I'll start with a specially crafted schema and table names as well as data:&lt;br /&gt;
&lt;pre&gt;create user uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu identified by "u";

User created.

SQL&gt; grant resource to uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu;

Grant succeeded.

SQL&gt; create table uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt
  2  (
  3     n number primary key
  4  );

Table created.

SQL&gt; alter table uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt
  2 add supplemental log data (primary key) columns;

Table altered.&lt;/pre&gt;I've just created a new trail file which is currently 974 bytes in size:&lt;br /&gt;
&lt;pre&gt;[oracle@gg1 dirdat]$ ls -l aa000000
-rw-rw-rw- 1 oracle oinstall 974 Dec 19 11:50 aa000000&lt;/pre&gt;What I'm going to do is insert some data into a table and then measure how much redo as well as trail data were generated:&lt;br /&gt;
&lt;pre&gt;SQL&gt; set autot traceonly stat
SQL&gt; insert into uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt
      select power(10,10)+(level-1)*power(10,10)
        from dual
        connect by level &lt;= 10000;  2    3    4

10000 rows created.


Statistics
----------------------------------------------------------
        0  recursive calls
      329  db block gets
       31  consistent gets
        0  physical reads
     364612  redo size
      821  bytes sent via SQL*Net to client
      917  bytes received via SQL*Net from client
        3  SQL*Net roundtrips to/from client
        2  sorts (memory)
        0  sorts (disk)
      10000  rows processed

SQL&gt; commit;

Commit complete.&lt;/pre&gt;Let's take a look at the trail file size:&lt;br /&gt;
&lt;pre&gt;[oracle@gg1 dirdat]$ ls -l aa000000
-rw-rw-rw- 1 oracle oinstall 1619890 Dec 19 13:40 aa000000&lt;/pre&gt;That is roughly 4.4 times bigger than our redo size and 11 times bigger than Oracle's "conservative" estimate. Your storage provisioning might be in for a surprise.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Trail File&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
To understand the key factors which resulted in our trail file to be of such outrageous size let's use &lt;i&gt;logdump&lt;/i&gt; and take a look inside:&lt;br /&gt;
&lt;pre&gt;Logdump 11 &gt;open ./dirdat/aa000000
Current LogTrail is /u01/app/oracle/ggs/dirdat/aa000000
Logdump 12 &gt;ghdr on
Logdump 13 &gt;next 2

2010/12/19 13:39:16.632.818 FileHeader           Len   966 RBA 0
Name: *FileHeader*
 3000 01b6 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1...
 0002 3200 0004 2000 0000 3300 0008 02f1 bdfb 2d3e | ..2... ...3.......-&gt;
 74f2 3400 0028 0026 7572 693a 6767 313a 7175 6164 | t.4..(.&amp;uri:gg1:quad
 726f 3a63 6f6d 3a3a 7530 313a 6170 703a 6f72 6163 | ro:com::u01:app:orac
 6c65 3a67 6773 3600 0025 0023 2f75 3031 2f61 7070 | le:ggs6..%.#/u01/app
 2f6f 7261 636c 652f 6767 732f 6469 7264 6174 2f61 | /oracle/ggs/dirdat/a
 6130 3030 3030 3037 0000 0101 3800 0004 0000 0000 | a0000007....8.......

___________________________________________________________________
Hdr-Ind    :     E  (x45)     Partition  :     .  (x04)
UndoFlag   :     .  (x00)     BeforeAfter:     A  (x41)
RecLength  :    19  (x0013)   IO Time    : 2010/12/19 13:40:32.000.000
IOType     :     5  (x05)     OrigNode   :   255  (xff)
TransInd   :     .  (x00)     FormatType :     R  (x52)
SyskeyLen  :     0  (x00)     Incomplete :     .  (x00)
AuditRBA   :         45       AuditPos   : 37458516
Continued  :     N  (x00)     RecCount   :     1  (x01)

2010/12/19 13:40:32.000.000 Insert               Len    19 RBA 974
Name: UUUUUUUUUUUUUUUUUUUUUUUUUUUUUU.TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
After  Image:                                             Partition 4   G  b
 0000 000f 0000 000b 3130 3030 3030 3030 3030 30   | ........10000000000&lt;/pre&gt;The first record is a standard trail file header and is of little interest to us. We're going to take look at the second record. First of all, as you can see, we've got a fully qualified table name stored in there. We can see the inserted value as well. Now if all that stuff got stored in plain text that could explain it... let's check it out:&lt;br /&gt;
&lt;pre&gt;[oracle@gg1 dirdat]$ grep --binary-files=text -o -i \
&gt; uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt \
&gt; aa000000 \
&gt; | wc -l
10000&lt;/pre&gt;Indeed, we've got a fully qualified table name appearing 10 thousand times on our trail file! The longer your schema and/or your table names are the bigger your trail file will be. The main reason Oracle Streams are using object identifiers (as well as Oracle's redo) is to avoid getting into exactly this kind of trouble. This makes GoldenGate configuration easier as it doesn't have to store the mapping information on the target database but you also have to pay the price with every row modification which gets captured.&lt;br /&gt;
&lt;br /&gt;
The story doesn't end there, however. Let's take a look at the inserted data. How do you think that got stored?&lt;br /&gt;
&lt;pre&gt;[oracle@gg1 dirdat]$ grep --binary-files=text -o -i \
&gt; 0000000000 \
&gt; aa000000 \
&gt; | wc -l
10000&lt;/pre&gt;That's right - all numbers got stored in plain text. The fundamental problem here is that trail file does not store data types. Everything just gets converted to strings. This helps dealing with heterogeneous data sources but also makes the storage for certain data types to be very inefficient. Again, you have to pay the price somewhere. If you &lt;i&gt;gzip&lt;/i&gt; the above trail file you'll get a whopping 25x compression ratio.&lt;br /&gt;
&lt;br /&gt;
You may ask yourself how column names are being stored? The answer is there are no column names in the trail file, just column positions. That's why GoldenGate requires column order to match between source and target databases, otherwise you'll get your data mapped to a wrong column. Add a lack of data types and you might be in for a very interesting results. If column order doesn't match then you'll have to use &lt;i&gt;defgen&lt;/i&gt; to get your data across properly.&lt;br /&gt;
&lt;br /&gt;
In a nutshell, watch out for tables/schemas with long names and lots of numeric data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-6695699248811420656?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2010/12/oracle-goldengate-trail-file-size.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-7788693593463771762</guid><pubDate>Tue, 20 Jul 2010 02:32:00 +0000</pubDate><atom:updated>2010-07-19T22:32:09.466-04:00</atom:updated><title>ASM Mirroring and Disk Partnership</title><description>Let's say you have a RAID10 disk array&amp;nbsp;with&amp;nbsp;50 disks. I'm offering you to play the following game. You'll spin a wheel of fortune&amp;nbsp;divided&amp;nbsp;into 50 equal slots with numbers ranging from 1 to 50. After you spun the wheel for the first time you'll write the resulted number down. After that I'll offer you to spin the second wheel of fortune equally&amp;nbsp;divided&amp;nbsp;into 49 slots where the number&amp;nbsp;you've just&amp;nbsp;pulled is missing (so you can't pull the same number twice).&lt;br /&gt;
&lt;br /&gt;
After that you'll end up with two (mutually exclusive) random numbers in a range from 1 to 50. What I'm going to ask you do next is pull the disks with corresponding numbers out of your RAID10 array. What are the odds of your entire array going down?&lt;br /&gt;
&lt;br /&gt;
In a classical RAID10 setup where every drive is being mirrored by a single other drive you can calculate the probability in a&amp;nbsp;straightforward&amp;nbsp;fashion. After you've pulled the first drive out (doesn't matter which one) there is only one specific disk out of 49 which you have to pull in order for the entire array to go bust. So your odds of taking the entire array down will be 1/49 or roughly 2%.&lt;br /&gt;
&lt;br /&gt;
Now let's say that instead of a regular RAID10 array you've got a normal redundancy ASM disk group with two failure groups (25 disks each). What are the odds now?&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;ASM Mirroring&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Before we're going to answer the above question we need to realize that ASM does not mirror disks the same way traditional RAID10 does. In fact it doesn't mirror disks at all. It mirrors extents instead. For all you know is that the extents from the disk you've just pulled out won't be mirrored in the same failure group. So that leaves us with 24 disks as safe. But what about the other 25 disks from the other failure group? How much of these disks are unsafe and will result in your normal redundancy disk group going south?&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Disk Partnership&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;&lt;br /&gt;
&lt;/b&gt;&lt;br /&gt;
When mirroring extents ASM uses a concept called Disk Partnership. Every disk in a normal redundancy disk group has one or more partners which are used to mirror primary extents from that disk. This also means that the loss of any of the partner disks is fatal to the disk group as you'll nuke both the primary extent and it's mirror copy (keep in mind that we're talking about pulling both disks out before the disk group would be able to rebalance). At least now we know what we need next in order to solve the puzzle. We need to find out how many partners each of the disks in our array have.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Disk Partners&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;&lt;br /&gt;
&lt;/b&gt;&lt;br /&gt;
Let's say that the first disk we've pulled out was disk number 0. The following query can be used to find all partners for the disk number 0 in a first disk group:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; select p.number_kfdpartner, d.FAILGROUP
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;from x$kfdpartner p, v$asm_disk d
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;where p.disk=0
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.grp=1
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.grp=group_number
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.number_kfdpartner=d.disk_number; &amp;nbsp;2 &amp;nbsp; &amp;nbsp;3 &amp;nbsp; &amp;nbsp;4 &amp;nbsp; &amp;nbsp;5 &amp;nbsp; &amp;nbsp;6

NUMBER_KFDPARTNER FAILGROUP
----------------- ------------------------------
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 25 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 26 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 27 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 29 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 38 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 46 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 48 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 49 FG2

8 rows selected.&lt;/pre&gt;Pulling any of the above disks out at the same time with the disk number 0 will be fatal for our normal redundancy disk group. In other words, once we pull the first disk out, there are other 8 disks out of 49 which are unsafe. That will bring our odds up (or down, depending in which outcome you're interested :) to&amp;nbsp;8/49 or&amp;nbsp;a little bit more than 16%.&lt;br /&gt;
&lt;br /&gt;
You can confirm that every disk has exactly 8 partners by running the following query:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; select min(cnt), max(cnt) from (
select number_kfdpartner disk_number, count(*) cnt
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;from x$kfdpartner
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;where grp=1
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;group by number_kfdpartner); &amp;nbsp;2 &amp;nbsp; &amp;nbsp;3 &amp;nbsp; &amp;nbsp;4 &amp;nbsp; &amp;nbsp;5

&amp;nbsp;&amp;nbsp;MIN(CNT) &amp;nbsp; MAX(CNT)
---------- ----------
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 8 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;8&lt;/pre&gt;&lt;br /&gt;
&lt;b&gt;Partner Disk Count&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Is there any way to control the number of partner disks which ASM uses for extents mirroring? Turns out that there is. &lt;i&gt;Note that it's a completely non supported operation so you shouldn't be playing with it.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
The parameter which controls the &lt;i&gt;maximum&lt;/i&gt; number of partner disks is called&amp;nbsp;&lt;i&gt;_asm_partner_target_disk_part&lt;/i&gt;. In 11GR2 that parameter has a default value of 8. I didn't have a chance to check it in a previous versions but supposedly it's default value there is 10(&lt;a href="#ref1"&gt;1&lt;/a&gt;). So at least we know that Oracle itself sometimes changes it between different releases.&lt;br /&gt;
&lt;br /&gt;
All you need to do after changing this parameter is to rebalance the disk group:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; alter system set "_asm_partner_target_disk_part"=2;

System altered.

SQL&amp;gt; alter diskgroup data rebalance;

Diskgroup altered.

&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;SQL&amp;gt; select p.number_kfdpartner, d.FAILGROUP&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;from x$kfdpartner p, v$asm_disk d&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;where p.disk=0&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.grp=1&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.grp=group_number&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.number_kfdpartner=d.disk_number; &amp;nbsp;2 &amp;nbsp; &amp;nbsp;3 &amp;nbsp; &amp;nbsp;4 &amp;nbsp; &amp;nbsp;5 &amp;nbsp; &amp;nbsp;6&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;NUMBER_KFDPARTNER FAILGROUP&lt;/div&gt;&lt;div&gt;----------------- ------------------------------&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 25 FG2&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 38 FG2&lt;/div&gt;&lt;/div&gt;&lt;/pre&gt;So here it goes!&lt;br /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;&lt;b&gt;References&lt;/b&gt;&lt;/div&gt;&lt;a name="ref1" href="http://www.amazon.com/Oracle-Automatic-Storage-Management-Under/dp/0071496076"&gt;Oracle Automatic Storage Management: Under-the-Hood &amp;amp; Practical Deployment Guide&lt;/a&gt;&lt;br /&gt;
&lt;a href="https://twiki.cern.ch/twiki/bin/view/PDBService/ASM_Internals"&gt;ASM Metadata and Internals&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;All test were performed with Oracle Grid Infrastructure 11.2.0.1&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-7788693593463771762?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2010/07/asm-mirroring-and-disk-partnership.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>5</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-3610321853051438063</guid><pubDate>Wed, 09 Jun 2010 02:00:00 +0000</pubDate><atom:updated>2010-06-09T22:11:43.598-04:00</atom:updated><title>11GR2 Result Cache Scalability</title><description>&lt;style type="text/css"&gt;.nobrtable br { display: none }&lt;/style&gt;&lt;span style="font-weight:bold;"&gt;Result Cache in 11GR1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Two years ago I wrote a series of posts where I explained some of the dynamics around Result Cache latch. To recap, the result cache memory in 11GR1 is backed up by a single RC latch. That in itself wouldn't be so much of an issue (at least relatively to what we've got in reality) had the latch allowed for shared mode gets in case all you have to do is read from the result cache memory.&lt;br /&gt;&lt;br /&gt;Alas, the latch turned out to be without shared mode gets. It is going almost without saying that, as concurrency levels increased, that single latch was behaving more and more like to a hand brake (&lt;a href="http://www.pythian.com/news/683/oracle-11g-result-cache-tested-on-eight-way-itanium"&gt;link&lt;/a&gt; to a test I've done back then on a 8-way Itanium 2).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Back to the future&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When 11GR2 has been released I knew that at some point in time I'll need to go back and revisit this subject. What I did is a couple of quick and dirty runs which came back confirming the same single latch and no shared mode gets so it didn't look like something has really changed. At this point I've decided to revisit it a bit later. This a "bit later" happened just recently.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;How bad can it get?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What I wanted to do is get an UltraSPARC T2 and face it against Core i7 980X on a different concurrency levels in order to see how bad it can get. T2 will require quite a lot of parallelism in order to keep up even with a single i7 core. But since all we've got is a single RC latch, I've expected T2 to choke on it quite fast as not only there will be a lot of processes competing for the same latch, the slow single-threaded performance will cause the latch to be held for a much longer periods of time. Performance degradation will be dare.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Result Cache in 11GR2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I used the same test described &lt;a href="http://www.pythian.com/news/598/oracle-11g-query-result-cache-rc-latches"&gt;here&lt;/a&gt; as it is targeted at exploiting RC latch weakness and gives me the ability to compare with the old results. I've used 250K lookup iterations. The performance was measured as a total number of lookups performed per second and RC latch statistics were captured for analysis.&lt;br /&gt;&lt;br /&gt;Since 980X has 6 cores and 12 threads, the tests were done with 1 to 12 processes running at the same time which also gave an opportunity to see how well HT will scale. Note that I plan to do some further testing on T2 with up to 64 threads but for now I've tested up to 12 threads only as I couldn't get a test window big enough.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;UltraSPARC T2 Results&lt;/span&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;th&gt;# of processes&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Buffer Cache&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;% linear&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Result Cache&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;% linear&lt;/th&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4426&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4555&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;8930&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.88&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;9124&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.15&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;13465&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;101.41&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;13731&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.48&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;17886&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;101.03&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;18179&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;99.77&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;22290&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.72&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;22715&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;99.74&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;6&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;26615&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.22&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;27012&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;98.84&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;7&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;30659&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;98.96&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;30804&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;96.61&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;34347&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;97&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;34910&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.8&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;38389&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;96.37&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;39029&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.2&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;10&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;42772&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;96.64&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;43126&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;94.68&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;46840&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;96.21&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;46936&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;93.68&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;12&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;50667&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;50590&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;92.55&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;When I saw these numbers for the first time I was quite surprised just how good these results are! UlstraSPARC T2 end up being far from choking and, as a matter of fact, the only position where Result Cache had to give up is the last one. If you reference the &lt;a href="http://www.pythian.com/news/683/oracle-11g-result-cache-tested-on-eight-way-itanium"&gt;results&lt;/a&gt; I've obtained on 8-way Itanium 2 in 11GR1 you'll see that Result Cache gave up much earlier and scaled a lot worse.&lt;br /&gt;&lt;br /&gt;This certainly looks promising so let's take a look at the RC latch statistic:&lt;br /&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;th&gt;# of processes&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Gets&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Misses&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Sleeps&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Wait Time&lt;/th&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;500001&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1000002&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;40253&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1500003&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;50404&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2000004&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;165116&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;464&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2500005&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;211559&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;182&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;6&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3000006&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;437898&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;6877&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;7&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3500007&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;805752&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;52&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;16556&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4000008&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1214762&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;20&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2980&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4500009&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1775372&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;188&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3140&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;10&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5000010&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2244964&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;491&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;29568&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5500011&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2552323&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;664&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;28011&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;12&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;6000012&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3019903&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1226&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;60005&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;There is one astonishing fact about the above number. Let's get some efficiency metrics in place for comparison between these numbers and the ones I've got in 11GR1. I'll use a data point with eight parallel processes as it's the highest reference point I can get across both data sets.&lt;br /&gt;&lt;br /&gt;First of all, the number of gets per execution remained the same and equals two gets per exec. If we were going to calculate &lt;span style="font-style:italic;"&gt;% miss per get&lt;/span&gt; we'll get 28.62% in 11GR1 and 50.33% in 11GR2. In other words, roughly every second get request has resulted in a miss in 11GR2 and every third in 11GR1. It may appear as if this got worse but it's really a consequence from something else.&lt;br /&gt;&lt;br /&gt;If we calculate &lt;span style="font-style:italic;"&gt;% sleep per miss&lt;/span&gt; we'll get 31.36% in 11GR1 but only 0.04% in 11GR2! In other words, the amount of times a process had to go to sleep has drastically decreased. In almost all of the cases the process was able to acquire a latch during a spin without going into a sleep. This also explains why &lt;span style="font-style:italic;"&gt;% miss per get&lt;/span&gt; in 11GR2 went up and shows that a lowering in efficiency for a single metric does not necessarily indicates a problem, it might happen because some other correlated metric has in fact improved.&lt;br /&gt;&lt;br /&gt;There is certainly a sign of a great improvement but what is it? Most likely the improvement is related to the optimization of how long the latch is required to be held. The time required to hold the latch became so small that, in most of the cases, the process is able to acquire it during spinning before being required to go to sleep (i.e. less than &lt;span style="font-style:italic;"&gt;_spin_count&lt;/span&gt; iterations).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Core i7 980X Results&lt;/span&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;th&gt;# of processes&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Buffer Cache&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;% linear&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Result Cache&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;% linear&lt;/th&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;40064&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;43554&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;78989&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;98.58&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;84602&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;97.12&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;121753&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;101.3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;127768&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;97.79&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;159490&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;99.52&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;166667&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.67&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;194704&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;97.2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;204583&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;93.94&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;6&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;229709&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.56&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;240770&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;92.13&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;7&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;231788&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;82.65&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;244755&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;80.28&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;233918&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;72.98&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;246305&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;70.69&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;250836&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;69.57&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;260718&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;66.51&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;10&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;267094&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;66.67&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;275330&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;63.22&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;280326&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;63.61&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;290084&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;60.55&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;12&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;290416&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;60.41&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;293830&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;56.22&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Here Result Cache won across all the positions. We need about 10 processes running on UltraSPARC T2 in order to beat a single process running on i7 980X. Performance gains declined rapidly once we got over six concurrent processes but still we were able to realize some additional performance with 12 threads being about 22% faster than 6 threads.&lt;br /&gt;&lt;br /&gt;Latch statistics:&lt;br /&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;th&gt;# of processes&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Gets&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Misses&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Sleeps&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Wait Time&lt;/th&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;500001&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1000002&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;40456&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1500003&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;117893&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;71&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2000004&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;209399&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2500005&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;381160&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;6&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3000006&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;517745&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;179&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;7&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3500007&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;913125&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;20&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;555&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4000008&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1355226&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;26&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;11914&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4500009&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1834112&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;13&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1017&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;10&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5000010&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2602801&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;42&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1607&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5500011&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3196415&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;145&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3451&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;12&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;6000012&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3730467&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;184&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;123954&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Essentially we're looking at the same phenomena with the amount of sleeps being significantly lower compared to what we observed in 11GR1. With six concurrent processes &lt;span style="font-style:italic;"&gt;% miss per get&lt;/span&gt; is 17.26% and &lt;span style="font-style:italic;"&gt;% sleep per miss&lt;/span&gt; is 0.002%! This allowed Result Cache to stay ahead with up to (and including) 12 concurrent processes running.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;UltraSPARC T2 vs i7 980X&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We'll wrap up with a nice graph showing result cache performance on both UltraSPARC T2 and Core i7 980X:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://lh5.ggpht.com/_D2NCa9-CEuQ/TBBFkX67qUI/AAAAAAAAAoc/4INLSO4dFkI/t2_vs_980x.png"&gt;&lt;/img&gt;&lt;br /&gt;&lt;br /&gt;i7 980X starts almost where 12 UltraSPARC T2 processes ends. Would T2 be able to narrow the gap with more parallel threads? I'll certainly find out.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There is an enormous improvement when it comes to Result Cache scalability in 11GR2. Still it's slower than if we had shared mode gets (or multiple child latches or, even better, both) but it gets very, very close.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-3610321853051438063?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2010/06/11gr2-result-cache-scalability.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://lh5.ggpht.com/_D2NCa9-CEuQ/TBBFkX67qUI/AAAAAAAAAoc/4INLSO4dFkI/s72-c/t2_vs_980x.png" height="72" width="72" /><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-16644232208784654</guid><pubDate>Mon, 07 Jun 2010 21:47:00 +0000</pubDate><atom:updated>2010-06-07T21:21:27.224-04:00</atom:updated><title>Row cache objects latch contention</title><description>A data loading process was running on UltraSPARC T2 CPU. To take advantage of the platform architecture (or, I'd rather say, to avoid it's limitations) the loading process has been design to run with a massive amount of parallel query slaves in order to extract the maximum output from CMT architecture.&lt;br /&gt;&lt;br /&gt;Every time this data loading process executed, it experienced strange slowdowns on seemingly random points in time. Performance drops were quite substantial, which prompted to do an additional investigation. Upon a closer examination of ASH data, it turned out that all slowdowns were due to &lt;span style="font-style:italic;"&gt;latch: row cache objects&lt;/span&gt; contention.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Row cache objects&lt;/span&gt; latch protects the dictionary cache. The first thing was to figure out whether most of the contention was contributed by a particular &lt;span style="font-style:italic;"&gt;row cache objects&lt;/span&gt; child latch:&lt;br /&gt;&lt;pre&gt;SQL&gt; select latch#, child#, sleeps&lt;br /&gt;  2   from v$latch_children&lt;br /&gt;  3   where name='row cache objects'&lt;br /&gt;  4    and sleeps &gt; 0&lt;br /&gt;  5   order by sleeps desc;&lt;br /&gt; &lt;br /&gt;    LATCH#     CHILD#     SLEEPS&lt;br /&gt;---------- ---------- ----------&lt;br /&gt;       270          1   24241645&lt;br /&gt;       270          5        523&lt;br /&gt;       270          4         52&lt;/pre&gt;The first child certainly doesn't look good when we take the amount of sleeps experienced by it, compared to all the other child latches. Once we have the troublesome child latch identified, we can move on and see which type of dictionary cache it protects:&lt;br /&gt;&lt;pre&gt;SQL&gt; select distinct s.kqrstcln latch#,r.cache#,r.parameter name,r.type,r.subordinate#&lt;br /&gt;from v$rowcache r,x$kqrst s&lt;br /&gt;where r.cache#=s.kqrstcid&lt;br /&gt;order by 1,4,5;  2    3    4&lt;br /&gt;&lt;br /&gt; LATCH# CACHE# NAME                              TYPE        SUBORDINATE#&lt;br /&gt;------- ------ --------------------------------- ----------- ------------&lt;br /&gt;      1      3 dc_rollback_segments              PARENT&lt;br /&gt;      2      1 dc_free_extents                   PARENT&lt;br /&gt;      3      4 dc_used_extents                   PARENT&lt;br /&gt;      4      2 dc_segments                       PARENT&lt;br /&gt;      5      0 dc_tablespaces                    PARENT&lt;br /&gt;      6      5 dc_tablespace_quotas              PARENT&lt;br /&gt;      7      6 dc_files                          PARENT&lt;br /&gt;      8     10 dc_users                          PARENT&lt;br /&gt;      8      7 dc_users                          SUBORDINATE            0&lt;br /&gt;      8      7 dc_users                          SUBORDINATE            1&lt;br /&gt;      8      7 dc_users                          SUBORDINATE            2&lt;br /&gt;      9      8 dc_objects                        PARENT&lt;br /&gt;      9      8 dc_object_grants                  SUBORDINATE            0&lt;br /&gt;     10     17 dc_global_oids                    PARENT&lt;br /&gt;     11     12 dc_constraints                    PARENT&lt;br /&gt;     12     13 dc_sequences                      PARENT&lt;br /&gt;     13     16 dc_histogram_defs                 PARENT&lt;br /&gt;     13     16 dc_histogram_data                 SUBORDINATE            0&lt;br /&gt;     13     16 dc_histogram_data                 SUBORDINATE            1&lt;br /&gt;     14     32 kqlsubheap_object                 PARENT&lt;br /&gt;     15     19 dc_table_scns                     PARENT&lt;br /&gt;     15     19 dc_partition_scns                 SUBORDINATE            0&lt;br /&gt;     16     18 dc_outlines                       PARENT&lt;br /&gt;     17     14 dc_profiles                       PARENT&lt;br /&gt;     18     47 realm cache                       PARENT&lt;br /&gt;     18     47 realm auth                        SUBORDINATE            0&lt;br /&gt;     19     48 Command rule cache                PARENT&lt;br /&gt;     20     49 Realm Object cache                PARENT&lt;br /&gt;     20     49 Realm Subordinate Cache           SUBORDINATE            0&lt;br /&gt;     21     46 Rule Set Cache                    PARENT&lt;br /&gt;     22     34 extensible security user and rol  PARENT&lt;br /&gt;     23     35 extensible security principal pa  PARENT&lt;br /&gt;     24     37 extensible security UID to princ  PARENT&lt;br /&gt;     25     36 extensible security principal na  PARENT&lt;br /&gt;     26     33 extensible security principal ne  PARENT&lt;br /&gt;     27     38 XS security class privilege       PARENT&lt;br /&gt;     28     39 extensible security midtier cach  PARENT&lt;br /&gt;     29     44 event map                         PARENT&lt;br /&gt;     30     45 format                            PARENT&lt;br /&gt;     31     43 audit collector                   PARENT&lt;br /&gt;     32     15 global database name              PARENT&lt;br /&gt;     33     20 rule_info                         PARENT&lt;br /&gt;     34     21 rule_or_piece                     PARENT&lt;br /&gt;     34     21 rule_fast_operators               SUBORDINATE            0&lt;br /&gt;     35     23 dc_qmc_ldap_cache_entries         PARENT&lt;br /&gt;     36     52 qmc_app_cache_entries             PARENT&lt;br /&gt;     37     53 qmc_app_cache_entries             PARENT&lt;br /&gt;     38     27 qmtmrcin_cache_entries            PARENT&lt;br /&gt;     39     28 qmtmrctn_cache_entries            PARENT&lt;br /&gt;     40     29 qmtmrcip_cache_entries            PARENT&lt;br /&gt;     41     30 qmtmrctp_cache_entries            PARENT&lt;br /&gt;     42     31 qmtmrciq_cache_entries            PARENT&lt;br /&gt;     43     26 qmtmrctq_cache_entries            PARENT&lt;br /&gt;     44      9 qmrc_cache_entries                PARENT&lt;br /&gt;     45     50 qmemod_cache_entries              PARENT&lt;br /&gt;     46     24 outstanding_alerts                PARENT&lt;br /&gt;     47     22 dc_awr_control                    PARENT&lt;br /&gt;     48     25 SMO rowcache                      PARENT&lt;br /&gt;     49     40 sch_lj_objs                       PARENT&lt;br /&gt;     50     41 sch_lj_oids                       PARENT&lt;br /&gt;&lt;br /&gt;60 rows selected.&lt;/pre&gt;The first child protects &lt;span style="font-style:italic;"&gt;dc_rollback_segments&lt;/span&gt;. We can confirm it by referencing data in &lt;span style="font-style:italic;"&gt;v$rowcache&lt;/span&gt;:&lt;br /&gt;&lt;pre&gt;SQL&gt; select parameter, gets&lt;br /&gt;  2   from v$rowcache&lt;br /&gt;  3   order by gets desc;&lt;br /&gt; &lt;br /&gt;PARAMETER                              GETS&lt;br /&gt;-------------------------------- ----------&lt;br /&gt;dc_rollback_segments              310995555&lt;br /&gt;dc_tablespaces                     76251831&lt;br /&gt;dc_segments                         3912096&lt;br /&gt;dc_users                            2307601&lt;br /&gt;dc_objects                          1460725&lt;br /&gt;dc_users                             608659&lt;br /&gt;dc_histogram_defs                    250666&lt;br /&gt;global database name                  67475&lt;br /&gt;dc_histogram_data                     43098&lt;br /&gt;dc_histogram_data                     14364&lt;br /&gt;dc_global_oids                        14320&lt;br /&gt;outstanding_alerts                     2956&lt;br /&gt;dc_profiles                            2555&lt;br /&gt;dc_awr_control                         1925&lt;br /&gt;dc_object_grants                        745&lt;br /&gt;dc_files                                532&lt;br /&gt;dc_constraints                          201&lt;br /&gt;sch_lj_oids                             158&lt;br /&gt;dc_sequences                            156&lt;br /&gt;dc_table_scns                            20&lt;br /&gt;sch_lj_objs                              18&lt;br /&gt;dc_qmc_ldap_cache_entries                 0&lt;br /&gt;qmc_app_cache_entries                     0&lt;br /&gt;qmc_app_cache_entries                     0&lt;br /&gt;qmtmrcin_cache_entries                    0&lt;br /&gt;qmtmrctn_cache_entries                    0&lt;br /&gt;qmtmrcip_cache_entries                    0&lt;br /&gt;qmtmrctp_cache_entries                    0&lt;br /&gt;qmtmrciq_cache_entries                    0&lt;br /&gt;qmtmrctq_cache_entries                    0&lt;br /&gt;qmrc_cache_entries                        0&lt;br /&gt;qmemod_cache_entries                      0&lt;br /&gt;SMO rowcache                              0&lt;br /&gt;dc_users                                  0&lt;br /&gt;dc_partition_scns                         0&lt;br /&gt;dc_users                                  0&lt;br /&gt;realm auth                                0&lt;br /&gt;Realm Subordinate Cache                   0&lt;br /&gt;rule_or_piece                             0&lt;br /&gt;rule_info                                 0&lt;br /&gt;audit collector                           0&lt;br /&gt;format                                    0&lt;br /&gt;event map                                 0&lt;br /&gt;extensible security midtier cach          0&lt;br /&gt;XS security class privilege               0&lt;br /&gt;extensible security principal ne          0&lt;br /&gt;extensible security principal na          0&lt;br /&gt;extensible security UID to princ          0&lt;br /&gt;extensible security principal pa          0&lt;br /&gt;extensible security user and rol          0&lt;br /&gt;Rule Set Cache                            0&lt;br /&gt;Realm Object cache                        0&lt;br /&gt;Command rule cache                        0&lt;br /&gt;realm cache                               0&lt;br /&gt;dc_outlines                               0&lt;br /&gt;kqlsubheap_object                         0&lt;br /&gt;dc_tablespace_quotas                      0&lt;br /&gt;dc_used_extents                           0&lt;br /&gt;rule_fast_operators                       0&lt;br /&gt;dc_free_extents                           0&lt;br /&gt; &lt;br /&gt;60 rows selected&lt;/pre&gt;The next step is to see whether latch miss source can give us some more hints regarding the issue:&lt;br /&gt;&lt;pre&gt;SQL&gt; select "WHERE", sleep_count, location&lt;br /&gt;  2   from v$latch_misses&lt;br /&gt;  3   where parent_name='row cache objects'&lt;br /&gt;  4    and sleep_count &gt; 0;&lt;br /&gt; &lt;br /&gt;WHERE               SLEEP_COUNT LOCATION&lt;br /&gt;------------------- ----------- ------------------------------&lt;br /&gt;kqrpre: find obj       20612167 kqrpre: find obj&lt;br /&gt;kqrpup                        7 kqrpup&lt;br /&gt;kqrcmt: while loop            1 kqrcmt: while loop&lt;br /&gt;kqrcmt: clear flag            1 kqrcmt: clear flag&lt;br /&gt;kqreqd                  1026837 kqreqd&lt;br /&gt;kqreqd: reget           2602576 kqreqd: reget&lt;br /&gt; &lt;br /&gt;6 rows selected&lt;/pre&gt;Now if you take &lt;span style="font-style:italic;"&gt;kqrpre: find obj&lt;/span&gt; and plug it into a search on My Oracle Support you'll quickly yield &lt;span style="font-style:italic;"&gt;Bug 5749075  High Requests on dc_rollback_segments&lt;/span&gt;. Among other things, this note points out at the unusually high number of undo segments being created due to cleanup not able to work properly...&lt;br /&gt;&lt;pre&gt;SQL&gt; select count(*) from dba_rollback_segs;&lt;br /&gt; &lt;br /&gt;  COUNT(*)&lt;br /&gt;----------&lt;br /&gt;     14838&lt;/pre&gt;...and this seems to be the case. The only difference is that the issue has been observed on 11GR2 and the bug has been filled against the older versions. Though it was still worth checking in case we were seeing a regression. Indeed, after getting rid of that many undo segments by simply recreating the undo tablespace, the issue, thought not completely vanished, manifested itself a lot less making it's impact relatively insignificant to the process throughput.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-16644232208784654?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2010/06/row-cache-objects-latch-contention.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-6202432242512442950</guid><pubDate>Wed, 12 May 2010 23:36:00 +0000</pubDate><atom:updated>2010-05-12T21:42:57.629-04:00</atom:updated><title>The case of a slow lookup</title><description>Take a look at the runtime difference for the same anonymous PL/SQL block between 11GR1 and 11GR2:&lt;br /&gt;&lt;pre&gt;Connected to:&lt;br /&gt;Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production&lt;br /&gt;With the Partitioning, OLAP, Data Mining and Real Application Testing options&lt;br /&gt;&lt;br /&gt;SQL&gt; set timing on&lt;br /&gt;SQL&gt; begin&lt;br /&gt;        for i in 1 .. 1000000&lt;br /&gt;        loop&lt;br /&gt;                for cur in (select n from z_t where n=mod(i,10000)+1)&lt;br /&gt;                loop&lt;br /&gt;                        null;&lt;br /&gt;                end loop;&lt;br /&gt;        end loop;&lt;br /&gt;end;  2    3    4    5    6    7    8    9&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Elapsed: 00:00:31.03&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Connected to:&lt;br /&gt;Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production&lt;br /&gt;With the Partitioning, OLAP, Data Mining and Real Application Testing options&lt;br /&gt;&lt;br /&gt;SQL&gt; set timing on&lt;br /&gt;SQL&gt; begin&lt;br /&gt;        for i in 1 .. 1000000&lt;br /&gt;        loop&lt;br /&gt;                for cur in (select n from z_t where n=mod(i,10000)+1)&lt;br /&gt;                loop&lt;br /&gt;                        null;&lt;br /&gt;                end loop;&lt;br /&gt;        end loop;&lt;br /&gt;end;  2    3    4    5    6    7    8    9&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Elapsed: 00:00:46.23&lt;/span&gt;&lt;/pre&gt;11GR2 end up being about 50% slower.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;A little background&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I've been running some tests and the numbers I was getting back just weren't making any sense. I've been using 11GR2 but all I was getting were marginally better results compared to the tests I've done a couple of years ago using 11GR1 on a much less powerful hardware. Something definitely didn't look right so I geared up to find out what it is. The above test case is something I arrived at, which is simple but demonstrates the issue. This test case was run using the same hardware so the only difference was release of Oracle database being used.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;tkprof results&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To see whether there are any low hanging fruits I've traced the same loop with 250K iterations (to cut a bit on a trace file size):&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;11GR1&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;SQL ID: 85f6s9ndbynfc&lt;br /&gt;Plan Hash: 3532178995&lt;br /&gt;SELECT N&lt;br /&gt;FROM&lt;br /&gt; Z_T WHERE N=MOD(:B1 ,10000)+1&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;call     count       cpu    elapsed       disk      query    current        rows&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;Parse        1      0.00       0.00          0          0          0           0&lt;br /&gt;Execute 249995      2.51       2.24          0          0          0           0&lt;br /&gt;Fetch   249995      1.54       1.68          0     499990          0      249995&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;total   499991      4.06       3.92          0     499990          0      249995&lt;br /&gt;&lt;br /&gt;Misses in library cache during parse: 0&lt;br /&gt;Optimizer mode: ALL_ROWS&lt;br /&gt;Parsing user id: 54     (recursive depth: 1)&lt;br /&gt;&lt;br /&gt;Rows     Row Source Operation&lt;br /&gt;-------  ---------------------------------------------------&lt;br /&gt;      1  INDEX UNIQUE SCAN PK_Z_T (cr=2 pr=0 pw=0 time=0 us cost=1 size=13 card=1)(object id 61973)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;11GR2&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;SQL ID: 85f6s9ndbynfc&lt;br /&gt;Plan Hash: 3532178995&lt;br /&gt;SELECT N&lt;br /&gt;FROM&lt;br /&gt; Z_T WHERE N=MOD(:B1 ,10000)+1&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;call     count       cpu    elapsed       disk      query    current        rows&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;Parse        1      0.00       0.00          0          0          0           0&lt;br /&gt;Execute 249999      4.07       4.06          0          0          0           0&lt;br /&gt;Fetch   249999      2.62       2.67          0     499998          0      249999&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;total   499999      6.69       6.74          0     499998          0      249999&lt;br /&gt;&lt;br /&gt;Misses in library cache during parse: 0&lt;br /&gt;Optimizer mode: ALL_ROWS&lt;br /&gt;Parsing user id: 57     (recursive depth: 1)&lt;br /&gt;&lt;br /&gt;Rows     Row Source Operation&lt;br /&gt;-------  ---------------------------------------------------&lt;br /&gt;      1  INDEX UNIQUE SCAN PK_Z_T (cr=2 pr=0 pw=0 time=0 us cost=1 size=13 card=1)(object id 69265)&lt;/pre&gt;&lt;br /&gt;These statistics looks almost identical but with one big difference. 11GR2 is about 70% slower.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;What's going on?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Certainly you wouldn't expect the newest release of Oracle database to be so much slower compared to the previous version. As a first step I've traced both loops which resulted in discovering some &lt;a href="http://afatkulin.blogspot.com/2010/05/timing-improvements-in-oracle-11gr2.html"&gt;timing improvements&lt;/a&gt; which I've blogged about just recently. But the additional overhead introduced by &lt;span style="font-style:italic;"&gt;times&lt;/span&gt; syscalls was not even remotely enough to justify the difference. As a next step I've compared session statistics between 11GR1 and 11GR2 which lead to &lt;span style="font-style:italic;"&gt;session cached cursors&lt;/span&gt; &lt;a href="http://www.pythian.com/news/11145/stats-overflow/"&gt;overflow issue&lt;/a&gt; discovery. Apart from that the only difference was significantly greater amount of CPU time consumed by 11GR2 (as evident by the trace file as well).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Now what?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Certainly some more diagnostic were required. What I did is a couple of stack dumps while the loop was running to see whether there will be something unusual... and there were something indeed:&lt;br /&gt;&lt;pre&gt;[oracle@ora11gr2 trace]$ pstack 1052&lt;br /&gt;#0  0x0000003b83c99247 in times () from /lib64/libc.so.6&lt;br /&gt;#1  0x00000000085556c7 in sltrgatime64 ()&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;#2  0x0000000005521a44 in auddft ()&lt;br /&gt;#3  0x0000000008438ee3 in audStatement ()&lt;/span&gt;&lt;br /&gt;#4  0x00000000083fc299 in opiexe ()&lt;br /&gt;#5  0x0000000008405f57 in opipls ()&lt;br /&gt;#6  0x00000000083f78b9 in opiodr ()&lt;br /&gt;#7  0x00000000084892af in __PGOSF141_rpidrus ()&lt;br /&gt;#8  0x00000000085ee820 in skgmstack ()&lt;br /&gt;#9  0x000000000848a759 in rpiswu2 ()&lt;br /&gt;#10 0x0000000008489e9d in rpidrv ()&lt;br /&gt;#11 0x0000000008476438 in psddr0 ()&lt;br /&gt;#12 0x0000000008476048 in psdnal ()&lt;br /&gt;#13 0x000000000288b3aa in pevm_BFTCHC ()&lt;br /&gt;#14 0x000000000287ab84 in pfrinstr_BFTCHC ()&lt;br /&gt;#15 0x0000000008645105 in pfrrun_no_tool ()&lt;br /&gt;#16 0x00000000086439ea in pfrrun ()&lt;br /&gt;#17 0x000000000864a7d6 in plsql_run ()&lt;br /&gt;#18 0x0000000002870d07 in peicnt ()&lt;br /&gt;#19 0x0000000002867649 in kkxexe ()&lt;br /&gt;#20 0x00000000083fe337 in opiexe ()&lt;br /&gt;#21 0x00000000019da690 in kpoal8 ()&lt;br /&gt;#22 0x00000000083f78b9 in opiodr ()&lt;br /&gt;#23 0x0000000008583187 in ttcpip ()&lt;br /&gt;#24 0x0000000001702661 in opitsk ()&lt;br /&gt;#25 0x0000000001707282 in opiino ()&lt;br /&gt;#26 0x00000000083f78b9 in opiodr ()&lt;br /&gt;#27 0x00000000016fe708 in opidrv ()&lt;br /&gt;#28 0x0000000001b7183f in sou2o ()&lt;br /&gt;#29 0x00000000009d3f05 in opimai_real ()&lt;br /&gt;#30 0x0000000001b76ace in ssthrdmain ()&lt;br /&gt;#31 0x00000000009d3e71 in main ()&lt;/pre&gt;Look at the lines #2 and #3. Are these some auditing functions getting in the way? I verified &lt;span style="font-style:italic;"&gt;audit_trail&lt;/span&gt; parameter which turned out to be sat to &lt;span style="font-style:italic;"&gt;DB&lt;/span&gt;:&lt;br /&gt;&lt;pre&gt;SQL&gt; show parameter audit_trail&lt;br /&gt;&lt;br /&gt;NAME                                 TYPE        VALUE&lt;br /&gt;------------------------------------ ----------- ------------------------------&lt;br /&gt;audit_trail                          string      DB&lt;/pre&gt;&lt;br /&gt;Though I didn't have any auditing enabled for any of the statements let alone the table I was selecting from. But after I've confirmed that &lt;span style="font-style:italic;"&gt;audit_trail&lt;/span&gt; was set to &lt;span style="font-style:italic;"&gt;NONE&lt;/span&gt; on 11GR1 database, disabling audit completely looked like something worth trying.&lt;br /&gt;&lt;br /&gt;After I've repeated the test with &lt;span style="font-style:italic;"&gt;audit_trail&lt;/span&gt; sat to &lt;span style="font-style:italic;"&gt;NONE&lt;/span&gt;, I end up getting virtually identical results compared to 11GR1:&lt;br /&gt;&lt;pre&gt;Connected to:&lt;br /&gt;Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production&lt;br /&gt;With the Partitioning, OLAP, Data Mining and Real Application Testing options&lt;br /&gt;&lt;br /&gt;SQL&gt; set timing on&lt;br /&gt;SQL&gt; begin&lt;br /&gt;        for i in 1 .. 1000000&lt;br /&gt;        loop&lt;br /&gt;                for cur in (select n from z_t where n=mod(i,10000)+1)&lt;br /&gt;                loop&lt;br /&gt;                        null;&lt;br /&gt;                end loop;&lt;br /&gt;        end loop;&lt;br /&gt;end;  2    3    4    5    6    7    8    9&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:32.42&lt;/pre&gt;...and two audit functions has disappeared from the stack dump as well.&lt;br /&gt;&lt;br /&gt;It looks like the impact of having auditing enabled end up being high enough to draw some attention in this particular case even though it wasn't sat to capture anything about our test. On a side note this can be considered an invitation to instrument some auditing functionality as well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-6202432242512442950?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2010/05/case-of-slow-lookup.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-26927365407664417</guid><pubDate>Tue, 11 May 2010 23:35:00 +0000</pubDate><atom:updated>2010-05-12T18:46:03.959-04:00</atom:updated><title>Timing improvements in Oracle 11GR2 trace</title><description>I've been doing some research recently (I might blog about it a bit later if something worthwhile comes out of it) which resulted in a byproduct I found interesting to blog about.&lt;br /&gt;&lt;br /&gt;In a nutshell, in one of the steps I've run the following PL/SQL block to trace the results of a very tight loop:&lt;br /&gt;&lt;pre&gt;begin&lt;br /&gt; dbms_monitor.session_trace_enable(waits =&gt; false, binds =&gt; false);&lt;br /&gt; for i in 1 .. 100000&lt;br /&gt; loop&lt;br /&gt;  for cur in (select null from dual)&lt;br /&gt;  loop&lt;br /&gt;   null;&lt;br /&gt;  end loop;&lt;br /&gt; end loop;&lt;br /&gt; dbms_monitor.session_trace_disable;&lt;br /&gt;end;&lt;/pre&gt;&lt;br /&gt;While comparing raw trace files between 11.1.0.7 and 11.2.0.1 I've noticed something interesting. Here is an excerpt from 11.1.0.7 raw trace:&lt;br /&gt;&lt;pre&gt;PARSING IN CURSOR #2 len=21 dep=1 uid=54 oct=3 lid=54 tim=1273621434711518&lt;br /&gt; hv=2339989301 ad='86044a88' sqlid='bg2ng0u5rkttp'&lt;br /&gt;SELECT NULL FROM DUAL&lt;br /&gt;END OF STMT&lt;br /&gt;EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=1273621434711518&lt;br /&gt;FETCH #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=1,plh=1388734953,tim=1273621434711518&lt;br /&gt;CLOSE #2:c=0,e=0,dep=1,type=3,tim=1273621434711518&lt;/pre&gt;Notice how &lt;span style="font-style:italic;"&gt;tim=&lt;/span&gt; values are the same across parse, exec, fetch and close; and &lt;span style="font-style:italic;"&gt;e=&lt;/span&gt; values are zero for fetch, close and execute.&lt;br /&gt;&lt;br /&gt;Now take a look at a similar excerpt from 11GR2:&lt;br /&gt;&lt;pre&gt;PARSING IN CURSOR #3 len=21 dep=1 uid=57 oct=3 lid=57 tim=1273621440328909&lt;br /&gt; hv=2339989301 ad='844d5a20' sqlid='bg2ng0u5rkttp'&lt;br /&gt;SELECT NULL FROM DUAL&lt;br /&gt;END OF STMT&lt;br /&gt;EXEC #3:c=0,e=9,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=1273621440328908&lt;br /&gt;FETCH #3:c=0,e=6,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=1,plh=1388734953,tim=1273621440328947&lt;br /&gt;CLOSE #3:c=0,e=1,dep=1,type=3,tim=1273621440328968&lt;/pre&gt;Notice how &lt;span style="font-style:italic;"&gt;tim=&lt;/span&gt; value changes from line to line and &lt;span style="font-style:italic;"&gt;e=&lt;/span&gt; has a corresponding value associated with it.&lt;br /&gt;&lt;br /&gt;Before we move on you may ask how 11.1.0.7 calculates the time if all the values appear to be zero? The answer is that every so often you'll see a line like that:&lt;br /&gt;&lt;pre&gt;FEXEC #5:c=1000,e=20960,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=1273621434873334&lt;/pre&gt;Sure we didn't spent that much time fetching a single row. This value comes from the fact that we've just dumped all the time associated with multiple executions in this single line.&lt;br /&gt;&lt;br /&gt;Now for 11GR2 to calculate the time with that additional precision will require a heck more &lt;span style="font-style:italic;"&gt;times&lt;/span&gt; syscalls. To confirm this I've ran &lt;span style="font-style:italic;"&gt;strace&lt;/span&gt; for processes in both 11GR1 and 11GR2 and here is what I came up with:&lt;br /&gt;&lt;br /&gt;11GR1&lt;pre&gt;[oracle@ora11gr1 tmp]$ grep -c times 4279.out&lt;br /&gt;116&lt;/pre&gt;11GR2&lt;pre&gt;[oracle@ora11gr2 tmp]$ grep -c times 5318.out&lt;br /&gt;200149&lt;/pre&gt;Indeed, 11GR2 end up calling &lt;span style="font-style:italic;"&gt;times&lt;/span&gt; twice per iteration while 11GR1 was doing one call per about thousand iterations. In total 11GR2 end up doing roughly 200K more syscalls. Next I became curious about how much overhead all these additional &lt;span style="font-style:italic;"&gt;syscalls&lt;/span&gt; introduce and in order to find out I've run a small program:&lt;br /&gt;&lt;pre&gt;[oracle@ora11gr2 tmp]$ cat do_times.c&lt;br /&gt;#include &lt;sys/times.h&gt;&lt;br /&gt;&lt;br /&gt;int main()&lt;br /&gt;{&lt;br /&gt;        int i;&lt;br /&gt;        struct tms t;&lt;br /&gt;        clock_t c;&lt;br /&gt;        for (i=1;i&lt;=200000;i++)&lt;br /&gt;        {&lt;br /&gt;                c=times(&amp;t);&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        return 0;&lt;br /&gt;}&lt;br /&gt;[oracle@ora11gr2 tmp]$ gcc do_times.c&lt;br /&gt;[oracle@ora11gr2 tmp]$ time ./a.out&lt;br /&gt;&lt;br /&gt;real    0m0.096s&lt;br /&gt;user    0m0.037s&lt;br /&gt;sys     0m0.059s&lt;/pre&gt;We're talking about 0.01 second for 200K &lt;span style="font-style:italic;"&gt;times&lt;/span&gt; syscalls, in other words - not much.&lt;br /&gt;&lt;br /&gt;It's definitely nice to see all these small improvements being done.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-26927365407664417?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2010/05/timing-improvements-in-oracle-11gr2.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-2584198534657294</guid><pubDate>Thu, 06 Aug 2009 23:20:00 +0000</pubDate><atom:updated>2009-08-06T21:47:41.308-04:00</atom:updated><title>How to install Oracle Grid Control Agents on a Windows failover cluster with no downtime</title><description>Metalink Note:464191.1 describes steps required to configure Oracle Grid Control agent in Windows failover cluster environment. Unfortunately, as part of the configuration, the cluster disk containing virtual agent's state information have to be moved to the node where the agent is being deployed.&lt;br /&gt;&lt;br /&gt;The agent state directory has to "follow" virtual agent when a failover occurs, hence the requirement for it to be on a cluster disk resource. And since the cluster disk resource is visible on the active node only, you can not deploy virtual agent on any of the passive nodes without moving the group containing disk with state information first.&lt;br /&gt;&lt;br /&gt;The above is not a big deal when you're doing install on a brand new or development system but what if you have to deal with a production cluster where any potential downtime that might be associated with moving the group across the nodes would better be avoided?&lt;br /&gt;&lt;br /&gt;Let's say you have an Oracle FailSafe configuration and you intend to use Oracle Grid Control to monitor your Oracle database. In this case your virtual agent will be a part of the same cluster group where your Oracle database is. Failing over your database across all the nodes for the sake of deploying a virtual agent may not necessarily be what you want to do.&lt;br /&gt;&lt;br /&gt;Of course, the easy workaround is to add another disk (LUN) to a cluster, use it to deploy the agents and, once the deployment has been done, add it to the same group where your database is. But what if you do not have any spare disks and have to share the same cluster disk with your Oracle database?&lt;br /&gt;&lt;br /&gt;I gave this problem a bit of research and, as it turned out, there is a really simple workaround which may come in handy in case you'll be faced with the same problem.&lt;br /&gt;&lt;br /&gt;I'll use the following configuration as an example:&lt;br /&gt;&lt;pre&gt;ORA01A -- first (active) node.&lt;br /&gt;ORA01B -- second (passive) node.&lt;br /&gt;ORA01V -- Oracle Database VIP.&lt;br /&gt;c:\oracle\product\10.2.0\agent10g -- Oracle Grid Control agent home.&lt;/pre&gt;&lt;br /&gt;Let's say that each system has a local drive C: and the deployment will be done on a cluster drive D:.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Deploy virtual agent on the active node&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This is where you follow exactly what Metalink note says you to do:&lt;br /&gt;&lt;pre&gt;C:\&gt;emctl deploy agent -n OracleAgentORA01V d:\agent10g ORA01V:1830 ORA01A:1830&lt;br /&gt;Creating shared install...&lt;br /&gt;Source location: C:\oracle\product\10.2.0\agent10g&lt;br /&gt;Destination (shared install) : d:\agent10g&lt;br /&gt;DeployMode : agent&lt;br /&gt;&lt;br /&gt;Creating directories...&lt;br /&gt;Creating targets.xml...&lt;br /&gt;Creating emctl control program...&lt;br /&gt;Creating emtgtctl control program...&lt;br /&gt;Setting log and trace files locations for Agent ...&lt;br /&gt;Secure agent found. New agent should be configured for secure mode&lt;br /&gt;&lt;br /&gt;Source Agent operating in secure mode.&lt;br /&gt;Run "d:\agent10g/bin/emctl secure agent" to secure agent&lt;br /&gt;Service "OracleAgentORA01V" create SUCCESS&lt;/pre&gt;The above will create a virtual agent service named OracleAgentORA01V which will be "bound" to ORA01V virtual IP and use d:\agent10g as a location for virtual agent's state files. Note that I'm using port 1830 since port 3872 is used by a "real" agent. You can specify &lt;i&gt;AgentListenOnAllNICs=FALSE&lt;/i&gt; in your &lt;i&gt;emd.properties&lt;/i&gt; file (for all agents in the cluster) if you want virtual and real agents share the same port as this will stop agents from trying to listen on all network adapters on the node.&lt;br /&gt;&lt;br /&gt;Secure the agent in case your OMS is running in the secure mode:&lt;br /&gt;&lt;pre&gt;C:\&gt;d:\agent10g/bin/emctl secure agent&lt;br /&gt;Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.&lt;br /&gt;Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.&lt;br /&gt;Agent is already stopped...   Done.&lt;br /&gt;Securing agent...   Started.&lt;br /&gt;Enter Agent Registration Password :&lt;br /&gt;Securing agent...   Successful.&lt;/pre&gt;&lt;b&gt;Deploy virtual agent on the passive node&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The same deployment command won't work on a passive node simply because drive D: is not there. As a workaround which will allow us to create a virtual agent service on the passive node we will use a local drive C: for initial deployment:&lt;br /&gt;&lt;pre&gt;C:\&gt;emctl deploy agent -n OracleAgentORA01V c:\agent10g ORA01V:1830 ORA01B:1830&lt;br /&gt;Creating shared install...&lt;br /&gt;Source location: C:\oracle\product\10.2.0\agent10g&lt;br /&gt;Destination (shared install) : c:\agent10g&lt;br /&gt;DeployMode : agent&lt;br /&gt;&lt;br /&gt;Creating directories...&lt;br /&gt;Creating targets.xml...&lt;br /&gt;Creating emctl control program...&lt;br /&gt;Creating emtgtctl control program...&lt;br /&gt;Setting log and trace files locations for Agent ...&lt;br /&gt;Secure agent found. New agent should be configured for secure mode&lt;br /&gt;&lt;br /&gt;Source Agent operating in secure mode.&lt;br /&gt;Run "c:\agent10g/bin/emctl secure agent" to secure agent&lt;br /&gt;Service "OracleAgentORA01V" create SUCCESS&lt;/pre&gt;However, this is not what we want as all virtual agents should be sharing the same cluster drive D: instead. To fix the location of the agent state directory, launch &lt;i&gt;regedit.exe&lt;/i&gt; and navigate to &lt;pre&gt;HKLM\SOFTWARE\ORACLE\SYSMAN\OracleAgentORA01V&lt;/pre&gt; registry key. Under that key you'll find &lt;i&gt;EMSTATE&lt;/i&gt; field with &lt;i&gt;c:\agent10g&lt;/i&gt; as its value. Modify this value to be &lt;i&gt;d:\agent10g&lt;/i&gt; instead. You can remove original folder as well.&lt;br /&gt;&lt;br /&gt;Done! I found that this virtual agent will be fully operational once the group failovers to the passive node (don't forget to create a cluster resource for a virtual agent) and will be using shared state directory.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-2584198534657294?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/08/how-to-install-oracle-grid-control.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-3803363569469556426</guid><pubDate>Sun, 08 Feb 2009 04:40:00 +0000</pubDate><atom:updated>2009-02-08T00:59:14.335-05:00</atom:updated><title>Consistent gets from cache (fastpath) 2</title><description>Not so long time ago I wrote an article about interesting optimization in 11G which appears as &lt;a href="http://afatkulin.blogspot.com/2009/01/consistent-gets-from-cache-fastpath.html"&gt;consistent gets from cache (fastpath)&lt;/a&gt;. One thing I've pointed there is that this optimization can operate only if we're accessing the same block over and over again. This may bring us to some interesting observations how certain type of queries are behaving in 11G.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;HASH GROUP BY vs SORT GROUP BY&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Consider the following example:&lt;br /&gt;&lt;pre&gt;SQL&gt; create table dept&lt;br /&gt;  2  (&lt;br /&gt;  3   dept_id  number primary key,&lt;br /&gt;  4   dept_name varchar2(100)&lt;br /&gt;  5  ) organization index;&lt;br /&gt; &lt;br /&gt;Table created&lt;br /&gt; &lt;br /&gt;SQL&gt; insert /*+ append */ into dept&lt;br /&gt;  2   select level, dbms_random.string('x', 100)&lt;br /&gt;  3    from dual&lt;br /&gt;  4    connect by level &lt;= 10000;&lt;br /&gt; &lt;br /&gt;10000 rows inserted&lt;br /&gt; &lt;br /&gt;SQL&gt; create table emp&lt;br /&gt;  2  (&lt;br /&gt;  3   emp_id  number primary key,&lt;br /&gt;  4   dept_id  references dept (dept_id),&lt;br /&gt;  5   emp_name varchar2(100)&lt;br /&gt;  6  );&lt;br /&gt; &lt;br /&gt;Table created&lt;br /&gt; &lt;br /&gt;SQL&gt; insert /*+ append */ into emp&lt;br /&gt;  2   select level, trunc(dbms_random.value(1, 10000)), dbms_random.string('x', 100)&lt;br /&gt;  3    from dual&lt;br /&gt;  4    connect by level &lt;= 100000;&lt;br /&gt; &lt;br /&gt;100000 rows inserted&lt;br /&gt; &lt;br /&gt;SQL&gt; commit;&lt;br /&gt; &lt;br /&gt;Commit complete&lt;br /&gt;&lt;br /&gt;SQL&gt; exec dbms_stats.gather_table_stats(user, 'dept');&lt;br /&gt; &lt;br /&gt;PL/SQL procedure successfully completed&lt;br /&gt; &lt;br /&gt;SQL&gt; exec dbms_stats.gather_table_stats(user, 'emp');&lt;br /&gt; &lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;Let's say we want to output department names along with how many employees are there:&lt;br /&gt;&lt;pre&gt;SQL&gt; select /*+ gather_plan_statistics */ count(*)&lt;br /&gt;  2  from (&lt;br /&gt;  3   select /*+ no_merge */ d.dept_name, count(*) cnt&lt;br /&gt;  4    from emp e, dept d&lt;br /&gt;  5    where e.dept_id=d.dept_id&lt;br /&gt;  6    group by d.dept_name&lt;br /&gt;  7  );&lt;br /&gt;&lt;br /&gt;  COUNT(*)&lt;br /&gt;----------&lt;br /&gt;      9999&lt;br /&gt;&lt;br /&gt;SQL&gt; select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));&lt;br /&gt;&lt;br /&gt;PLAN_TABLE_OUTPUT&lt;br /&gt;----------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;SQL_ID  djysxbcmwwxj3, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select /*+ gather_plan_statistics */ count(*) from (  select /*+&lt;br /&gt;no_merge */ d.dept_name, count(*) cnt   from emp e, dept d   where&lt;br /&gt;e.dept_id=d.dept_id   group by d.dept_name )&lt;br /&gt;&lt;br /&gt;Plan hash value: 1432452646&lt;br /&gt;&lt;br /&gt;----------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation               | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |&lt;br /&gt;----------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;PLAN_TABLE_OUTPUT&lt;br /&gt;----------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT        |                   |      1 |        |      1 |00:00:00.08 |   11636 |       |       |          |&lt;br /&gt;|   1 |  SORT AGGREGATE         |                   |      1 |      1 |      1 |00:00:00.08 |   11636 |       |       |          |&lt;br /&gt;|   2 |   VIEW                  |                   |      1 |   9999 |   9999 |00:00:00.06 |   11636 |       |       |          |&lt;br /&gt;|   3 |    HASH GROUP BY        |                   |      1 |   9999 |   9999 |00:00:00.06 |   11636 |  2058K|   999K| 2497K (0)|&lt;br /&gt;|   4 |     NESTED LOOPS        |                   |      1 |   9999 |   9999 |00:00:00.02 |   11636 |       |       |          |&lt;br /&gt;|   5 |      VIEW               | VW_GBC_10         |      1 |   9999 |   9999 |00:00:00.02 |    1635 |       |       |          |&lt;br /&gt;|   6 |       HASH GROUP BY     |                   |      1 |   9999 |   9999 |00:00:00.02 |    1635 |  1207K|  1207K| 2496K (0)|&lt;br /&gt;|   7 |        TABLE ACCESS FULL| EMP               |      1 |    100K|    100K|00:00:00.01 |    1635 |       |       |          |&lt;br /&gt;|*  8 |      INDEX UNIQUE SCAN  | SYS_IOT_TOP_15648 |   9999 |      1 |   9999 |00:00:00.01 |   10001 |       |       |          |&lt;br /&gt;----------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;PLAN_TABLE_OUTPUT&lt;br /&gt;----------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   8 - access("ITEM_1"="D"."DEPT_ID")&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;27 rows selected.&lt;/pre&gt;Note that 11G did "Group by Placement" automatically (Jonathan Lewis wrote an &lt;a href="http://jonathanlewis.wordpress.com/2008/12/21/group-by/"&gt;article&lt;/a&gt; about it) and our query performed 11636 consistent gets.&lt;br /&gt;&lt;br /&gt;Now, take a look at what happens if we rewrite the query to use a &lt;span style="font-style:italic;"&gt;sort group by&lt;/span&gt; instead:&lt;br /&gt;&lt;pre&gt;SQL&gt; select /*+ gather_plan_statistics */ count(*)&lt;br /&gt;  2  from (&lt;br /&gt;  3   with e as&lt;br /&gt;  4   (&lt;br /&gt;  5    select dept_id, count(*) cnt&lt;br /&gt;  6     from emp e&lt;br /&gt;  7     group by dept_id&lt;br /&gt;  8     order by dept_id&lt;br /&gt;  9   )&lt;br /&gt; 10   select /*+ no_merge */ d.dept_name, e.cnt&lt;br /&gt; 11    from e, dept d&lt;br /&gt; 12    where e.dept_id=d.dept_id&lt;br /&gt; 13  );&lt;br /&gt;&lt;br /&gt;  COUNT(*)&lt;br /&gt;----------&lt;br /&gt;      9999&lt;br /&gt;&lt;br /&gt;SQL&gt; select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));&lt;br /&gt;&lt;br /&gt;PLAN_TABLE_OUTPUT&lt;br /&gt;---------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;SQL_ID  2utq5vammnwa4, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select /*+ gather_plan_statistics */ count(*) from (  with e as  (&lt;br /&gt;select dept_id, count(*) cnt    from emp e    group by dept_id    order&lt;br /&gt;by dept_id  )  select /*+ no_merge */ d.dept_name, e.cnt   from e, dept&lt;br /&gt;d   where e.dept_id=d.dept_id )&lt;br /&gt;&lt;br /&gt;Plan hash value: 2732217545&lt;br /&gt;&lt;br /&gt;---------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation              | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |&lt;br /&gt;&lt;br /&gt;PLAN_TABLE_OUTPUT&lt;br /&gt;---------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;---------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT       |                   |      1 |        |      1 |00:00:00.10 |    2624 |       |       |          |&lt;br /&gt;|   1 |  SORT AGGREGATE        |                   |      1 |      1 |      1 |00:00:00.10 |    2624 |       |       |          |&lt;br /&gt;|   2 |   VIEW                 |                   |      1 |   9999 |   9999 |00:00:00.08 |    2624 |       |       |          |&lt;br /&gt;|   3 |    NESTED LOOPS        |                   |      1 |   9999 |   9999 |00:00:00.08 |    2624 |       |       |          |&lt;br /&gt;|   4 |     VIEW               |                   |      1 |   9999 |   9999 |00:00:00.08 |    1635 |       |       |          |&lt;br /&gt;|   5 |      SORT GROUP BY     |                   |      1 |   9999 |   9999 |00:00:00.08 |    1635 |   549K|   549K|  487K (0)|&lt;br /&gt;|   6 |       TABLE ACCESS FULL| EMP               |      1 |    100K|    100K|00:00:00.01 |    1635 |       |       |          |&lt;br /&gt;|*  7 |     INDEX UNIQUE SCAN  | SYS_IOT_TOP_15648 |   9999 |      1 |   9999 |00:00:00.01 |     989 |       |       |          |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;PLAN_TABLE_OUTPUT&lt;br /&gt;---------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   7 - access("E"."DEPT_ID"="D"."DEPT_ID")&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;27 rows selected.&lt;/pre&gt;What I did there is performed the same transformation myself but replaced &lt;span style="font-style:italic;"&gt;hash group by&lt;/span&gt; with a &lt;span style="font-style:italic;"&gt;sort group by&lt;/span&gt;. Note more than 4 times decrease in the amount of consistent gets. By looking at the plan statistics, you can tell why. Although both group by's performed the same amount of consistent gets (1635), it is a nested loops join with &lt;span style="font-style:italic;"&gt;dep&lt;/span&gt; which produced all the difference -- 10001 for &lt;span style="font-style:italic;"&gt;hash group by&lt;/span&gt; versus only 989 for a &lt;span style="font-style:italic;"&gt;sort group by&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Unless you've heard about &lt;span style="font-style:italic;"&gt;consistent gets from cache (fastpath)&lt;/span&gt; optimization, the above results may produce quite a bit of surprise for you. Note that due to a sorting, which has to be performed by our second query, the first query still performs better. However, in the environments which are wreaking havoc on CBC latches, the additional savings on number of consistent gets may have a potential to alleviate additional sorting expenses. In other words, the lesser are expenses for outer resultset sorting, the more appealing this could be.&lt;br /&gt;&lt;br /&gt;Of course, before you even consider this as an optimization opportunity, keep in mind that you're relying on a specific feature which may narrow you down to a specific dot releases (or even patches) as it may change (or even completely disappear) in the next versions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-3803363569469556426?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/02/consistent-gets-from-cache-fastpath-2.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-8597647665912053627</guid><pubDate>Sun, 01 Feb 2009 20:18:00 +0000</pubDate><atom:updated>2009-02-02T04:50:07.454-05:00</atom:updated><title>AE enqueue</title><description>Just a quick follow up from my &lt;a href="http://afatkulin.blogspot.com/2009/01/edition-based-redifinition-speculation.html"&gt;previous&lt;/a&gt; post.&lt;br /&gt;&lt;br /&gt;The relation of AE enqueue to editions was mentioned a couple of times around the internet already. Any user session connected to a database holds AE enqueue in a shared mode...&lt;br /&gt;&lt;pre&gt;SQL&gt; select type, id1, lmode, sys_context('userenv', 'current_edition_id') edition_id&lt;br /&gt;  2   from v$lock&lt;br /&gt;  3   where type='AE'&lt;br /&gt;  4    and sid=sys_context('userenv', 'sid');&lt;br /&gt; &lt;br /&gt;TYPE        ID1      LMODE EDITION_ID&lt;br /&gt;---- ---------- ---------- --------------------------------------------------------------------------------&lt;br /&gt;AE          100          4 100&lt;/pre&gt;...and the first argument seems to be session's &lt;span style="font-style:italic;"&gt;current_edition_id&lt;/span&gt;. I guess the lock mode will require an upgrade to exclusive mode during edition alterations.&lt;br /&gt;&lt;br /&gt;Here is another interesting thing -- it looks like installing 11.1.0.7 patchset increments the &lt;span style="font-style:italic;"&gt;current_edition_id&lt;/span&gt;. The value in 11.1.0.6 seems to be 99, but 11.1.0.7 changes it to 100 (you can observe it in &lt;span style="font-style:italic;"&gt;sys.editon$&lt;/span&gt; table as well). Does that mean that Oracle has any plans in doing patchset installation through edition-based redefinition (install the patchset online, short downtime is required only during switch to an upgraded edition) or is it simply a way to represent version change?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Updated the same day:&lt;/span&gt; it looks like in case your database was &lt;span style="font-style:italic;"&gt;upgraded&lt;/span&gt; from a previous release, &lt;span style="font-style:italic;"&gt;edition_id&lt;/span&gt; for &lt;span style="font-style:italic;"&gt;ORA$BASE&lt;/span&gt; will be some other number as it represents &lt;span style="font-style:italic;"&gt;ORA$BASE's object_id&lt;/span&gt;. This also means that my initial assumption about patchset installation changing &lt;span style="font-style:italic;"&gt;edition_id&lt;/span&gt; is not correct as it is just whatever &lt;span style="font-style:italic;"&gt;object_id&lt;/span&gt; is being available at the time. Before &lt;span style="font-style:italic;"&gt;ORA$BASE&lt;/span&gt; edition is being created, 11.1.0.7 creates one more object (compared to 11.1.0.6), index &lt;span style="font-style:italic;"&gt;I_SYN2&lt;/span&gt;, which explains advance in edition_id.&lt;br /&gt;&lt;br /&gt;However, the idea seems to be interesting anyway...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-8597647665912053627?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/02/ae-enqueue.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-1926758158673415212</guid><pubDate>Sat, 31 Jan 2009 22:44:00 +0000</pubDate><atom:updated>2009-02-01T03:02:54.962-05:00</atom:updated><title>Edition-based redifinition (a speculation)</title><description>As you can see from &lt;a href="http://download.oracle.com/docs/cd/B28359_01/readmes.111/b28280/toc.htm#BABGIGDC"&gt;this link&lt;/a&gt;, there is a feature called &lt;span style="font-style:italic;"&gt;Edition-based redefinition&lt;/span&gt; mentioned in the 11GR1 documentation. Unfortunately, all it says is that this feature is unavailable as of now so there is really not much you can tell about it.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;A speculation&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'm usually not a great fun of doing any sort of a guesswork, however, in this case it might be an interesting exercise to observe some bits and pieces presented in the current release of 11GR1 in order to see whether it can give us some clues as to what expect from this new feature as well as what the potential underpinning might look like. As well as prepare yourself for some implications...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;SYS.OBJ$&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Perhaps this will be the first thing what you'll notice as soon as 11G's data dictionary is concerned. For example, this is how &lt;span style="font-style:italic;"&gt;dba_synonyms&lt;/span&gt; view is defined in 10GR2:&lt;br /&gt;&lt;pre&gt;create or replace view dba_synonyms&lt;br /&gt;(owner, synonym_name, table_owner, table_name, db_link)&lt;br /&gt;as&lt;br /&gt;select u.name, o.name, s.owner, s.name, s.node&lt;br /&gt;from sys.user$ u, sys.syn$ s, sys.obj$ o&lt;br /&gt;where o.obj# = s.obj#&lt;br /&gt;  and o.type# = 5&lt;br /&gt;  and o.owner# = u.user#;&lt;/pre&gt;Now, take a look at the same view's definition in 11GR1:&lt;br /&gt;&lt;pre&gt;create or replace view dba_synonyms&lt;br /&gt;(owner, synonym_name, table_owner, table_name, db_link)&lt;br /&gt;as&lt;br /&gt;select u.name, o.name, s.owner, s.name, s.node&lt;br /&gt;from sys.user$ u, sys.syn$ s, &lt;font color="blue"&gt;sys."_CURRENT_EDITION_OBJ"&lt;/font&gt; o&lt;br /&gt;where o.obj# = s.obj#&lt;br /&gt;  and o.type# = 5&lt;br /&gt;  and o.owner# = u.user#;&lt;/pre&gt;For you see, the reference to &lt;span style="font-style:italic;"&gt;SYS.OBJ$&lt;/span&gt; was replaced with &lt;span style="font-style:italic;"&gt;SYS."_CURRENT_EDITION_OBJ"&lt;/span&gt;. This replacement occurs all around the place in the data dictionary which alone makes it interesting enough to attract a bit of attention.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;SYS._CURRENT_EDITION_OBJ&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What is &lt;span style="font-style:italic;"&gt;_CURRENT_EDITION_OBJ&lt;/span&gt;? It's a view:&lt;br /&gt;&lt;pre&gt;SQL&gt; select object_type&lt;br /&gt;  2   from dba_objects&lt;br /&gt;  3   where object_name='_CURRENT_EDITION_OBJ';&lt;br /&gt; &lt;br /&gt;OBJECT_TYPE&lt;br /&gt;-------------------&lt;br /&gt;VIEW&lt;/pre&gt;Let's take a look at this view's definition (I've omitted fields list for the sake of clarity):&lt;br /&gt;&lt;pre&gt;select ...&lt;br /&gt;from obj$ o, user$ u&lt;br /&gt;where o.owner# = u.user#&lt;br /&gt;  and (   /* non-versionable object */&lt;br /&gt;          (   o.type# not in (4,5,7,8,9,10,11,12,13,14,22,87,88)&lt;br /&gt;           or bitand(u.spare1, 16) = 0)&lt;br /&gt;          /* versionable object visible in current edition */&lt;br /&gt;       or (    o.type# in (4,5,7,8,9,10,11,12,13,14,22,87)&lt;br /&gt;           and (   (u.type# &lt;&gt; 2 and&lt;br /&gt;                    sys_context('userenv', 'current_edition_name') = 'ORA$BASE')&lt;br /&gt;                or (u.type# = 2 and&lt;br /&gt;                    u.spare2 = sys_context('userenv', 'current_edition_id'))&lt;br /&gt;                or exists (select 1 from obj$ o2, user$ u2&lt;br /&gt;                           where o2.type# = 88&lt;br /&gt;                             and o2.dataobj# = o.obj#&lt;br /&gt;                             and o2.owner# = u2.user#&lt;br /&gt;                             and u2.type#  = 2&lt;br /&gt;                             and u2.spare2 =&lt;br /&gt;                                  sys_context('userenv', 'current_edition_id'))&lt;br /&gt;               )&lt;br /&gt;          )&lt;br /&gt;      );&lt;/pre&gt;First of all, it mentions about (A) versionable objects and (B) current edition which is referenced through session's context &lt;span style="font-style:italic;"&gt;sys_context('userenv', 'current_edition_id')&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Versionable objects&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Only the objects of the following types seems to be versionable:&lt;br /&gt;&lt;ul&gt;&lt;li/&gt;VIEW&lt;br /&gt;&lt;li/&gt;SYNONYM&lt;br /&gt;&lt;li/&gt;PROCEDURE&lt;br /&gt;&lt;li/&gt;FUNCTION&lt;br /&gt;&lt;li/&gt;PACKAGE&lt;br /&gt;&lt;li/&gt;PACKAGE BODY&lt;br /&gt;&lt;li/&gt;TRIGGER&lt;br /&gt;&lt;li/&gt;TYPE&lt;br /&gt;&lt;li/&gt;TYPE BODY&lt;br /&gt;&lt;li/&gt;LIBRARY&lt;br /&gt;&lt;li/&gt;ASSEMBLY&lt;/ul&gt;Object &lt;span style="font-style:italic;"&gt;type# = 88&lt;/span&gt; seems to be something special as I couldn't find what it is from &lt;span style="font-style:italic;"&gt;sql.bsq&lt;/span&gt; (nor d&lt;span style="font-style:italic;"&gt;ba_objects&lt;/span&gt;) and it is being referenced using somewhat special way: if &lt;span style="font-style:italic;"&gt;obj#&lt;/span&gt; matches &lt;span style="font-style:italic;"&gt;dataobj#&lt;/span&gt; of some entry with &lt;span style="font-style:italic;"&gt;type# = 88&lt;/span&gt; and this entry is part if the current edition, the object is considered to be a part of the current edition as well. Note that the join itself permits this special object to exist in some other schema, though I don't know whether this has any practical meaning or not.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;New userenv attributes&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are two (at least) new userenv context attributes:&lt;br /&gt;&lt;pre&gt;SQL&gt; select sys_context('userenv', 'current_edition_name')&lt;br /&gt;  2   current_edition_name from dual;&lt;br /&gt; &lt;br /&gt;CURRENT_EDITION_NAME&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;ORA$BASE&lt;br /&gt;SQL&gt; select sys_context('userenv', 'current_edition_id')&lt;br /&gt;  2   current_edition_id from dual;&lt;br /&gt; &lt;br /&gt;CURRENT_EDITION_ID&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;100&lt;/pre&gt;Changing these will change data returned by &lt;span style="font-style:italic;"&gt;_CURRENT_EDITION_OBJ&lt;/span&gt; view as well and, given that that view is referenced instead of &lt;span style="font-style:italic;"&gt;obj$&lt;/span&gt;, it looks like these contexts will be the primary mechanism responsible for referencing one version (edition) of the object or the other.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;New user type&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Again, by looking at the view definition, we may spot that versionable objects are being concerned as far as &lt;span style="font-style:italic;"&gt;user$.type# = 2&lt;/span&gt;. To remind you, 0 is a role and 1 is a regular user, so there will be something new, perhaps this new feature could be enabled on a per-user level and/or versioned objects will be owned by some other "ghost" schema (my comment about object with type 88 seems to be allowing this, at least technically).&lt;br /&gt;&lt;br /&gt;Some interesting clues might be get from &lt;span style="font-style:italic;"&gt;_CURRENT_EDITION_OBJ&lt;/span&gt;'s field definition itself:&lt;br /&gt;&lt;pre&gt;when (u.type# = 2) then&lt;br /&gt; (select eo.name from obj$ eo where eo.obj# = u.spare2)&lt;br /&gt;else&lt;br /&gt; 'ORA$BASE'&lt;br /&gt;end&lt;/pre&gt;The objects which are owned by this new user type will have &lt;span style="font-style:italic;"&gt;editition_id&lt;/span&gt; as their &lt;span style="font-style:italic;"&gt;object_id&lt;/span&gt; and will be named after edition name. That also means that &lt;span style="font-style:italic;"&gt;object_id is no longer unique&lt;/span&gt;:&lt;pre&gt;create unique index i_obj1 on obj$(obj#, owner#, type#) (11GR1)&lt;br /&gt;create unique index i_obj1 on obj$(obj#) (10GR2)&lt;br /&gt;(from sql.bsq)&lt;/pre&gt;I wander whether this have some chances to break legacy code which references &lt;span style="font-style:italic;"&gt;sys.obj$&lt;/span&gt; by &lt;span style="font-style:italic;"&gt;obj#&lt;/span&gt; and expect one row to be returned at most...&lt;br /&gt;&lt;br /&gt;Does all that, again, points out that object versions will be kept under this special user and will be linked back using &lt;span style="font-style:italic;"&gt;data_object_id&lt;/span&gt;? Is this also a reason why objects which do have "proper" &lt;span style="font-style:italic;"&gt;data_object_id&lt;/span&gt; are not versionable?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;A speculation (again)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Pleases note that everything said in this post is based on some (random) observations of preliminary data available in the current release (11.1.0.7). There are chances that some (if not all) of this data may became altered or even completely invalid when this feature will be officially announced.&lt;br /&gt;&lt;br /&gt;However, one thing seems to be clear -- in order to support this new feature, some serious alterations of the data dictionary are taking place and you should keep yourself alerted...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-1926758158673415212?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/edition-based-redifinition-speculation.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>5</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-1705425970408491422</guid><pubDate>Sat, 31 Jan 2009 21:59:00 +0000</pubDate><atom:updated>2009-01-31T17:38:55.494-05:00</atom:updated><title>Create database or who wants some DMT?</title><description>&lt;span style="font-weight:bold;"&gt;Simpler than ever&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Starting from Oracle 10G, creating the database can be as simple as this:&lt;br /&gt;&lt;pre&gt;SQL&gt; create database;&lt;br /&gt;&lt;br /&gt;Database created.&lt;/pre&gt;It's not a surprise that this feature was somewhat advertised here and there. However, what was missing in these advertisements is this:&lt;br /&gt;&lt;pre&gt;SQL&gt; select name, decode(bitmapped, 0, 'DMT', 'LMT')&lt;br /&gt;      from ts$&lt;br /&gt;      order by name;  2    3&lt;br /&gt;&lt;br /&gt;NAME                           DEC&lt;br /&gt;------------------------------ ---&lt;br /&gt;SYSAUX                         LMT&lt;br /&gt;&lt;span style="color:red;"&gt;SYSTEM                         DMT&lt;/span&gt;&lt;br /&gt;SYS_UNDOTS                     LMT&lt;/pre&gt;For you see, &lt;span style="font-style:italic;"&gt;create database&lt;/span&gt; will make your &lt;span style="font-style:italic;"&gt;SYSTEM&lt;/span&gt; tablespace to be &lt;span style="font-style:italic;"&gt;dictionary managed&lt;/span&gt; by default. I don't really know if there are any reasons for this and since everything else will default to LMT, this should not be a big deal for most of you anyway. Just don't forget that &lt;span style="font-style:italic;"&gt;SYSTEM&lt;/span&gt; hosts objects like &lt;span style="font-style:italic;"&gt;AUD$&lt;/span&gt; (audit log), &lt;span style="font-style:italic;"&gt;FGA_LOG$&lt;/span&gt; (fine-grained audit log) or &lt;span style="font-style:italic;"&gt;NCOMP_DLL$&lt;/span&gt; (natively compiled objects) which could grow to a fairly large number of extents.&lt;br /&gt;&lt;br /&gt;I was a bit surprised watching this relic appear even when you do this in 11.1.0.7...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-1705425970408491422?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/create-database-or-who-wants-some-dmt.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>4</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-8564059587898722413</guid><pubDate>Sat, 31 Jan 2009 03:27:00 +0000</pubDate><atom:updated>2009-01-31T01:36:02.065-05:00</atom:updated><title>Moving a datafile</title><description>Sometimes you need to move a datafile into a different mount point or ASM diskgroup. This could make you wandering what technique you can use in order to minimize downtime. I'll show you one of my favorite methods which works well under certain circumstances.&lt;br /&gt;&lt;br /&gt;Let's say you want to move the following datafile...&lt;br /&gt;&lt;pre style="font-size:11px;"&gt;SQL&gt; select file_id, file_name&lt;br /&gt;  2     from dba_data_files&lt;br /&gt;  3     where tablespace_name='USERS';&lt;br /&gt;&lt;br /&gt;FILE_ID FILE_NAME&lt;br /&gt;------- --------------------------------------------------------&lt;br /&gt;      4 /u01/oradata/ORA11GR1/datafile/o1_mf_users_4q759m64_.dbf&lt;/pre&gt;...into mount point /u02.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Backup as copy&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first thing we need to do is backup this datafile as copy using RMAN:&lt;br /&gt;&lt;pre style="font-size:11px;"&gt;[oracle@ora11gr1a ~]$ rman&lt;br /&gt;&lt;br /&gt;Recovery Manager: Release 11.1.0.7.0 - Production on Thu Jan 29 17:29:59 2009&lt;br /&gt;&lt;br /&gt;Copyright (c) 1982, 2007, Oracle.  All rights reserved.&lt;br /&gt;&lt;br /&gt;RMAN&gt; connect target;&lt;br /&gt;&lt;br /&gt;connected to target database: ORA11GR1 (DBID=3707369966)&lt;br /&gt;&lt;br /&gt;RMAN&gt; backup as copy datafile 4&lt;br /&gt;2&gt;      format '/u02/oradata/ORA11GR1/datafile/users01.dbf';&lt;br /&gt;&lt;br /&gt;Starting backup at 29-JAN-09&lt;br /&gt;using channel ORA_DISK_1&lt;br /&gt;channel ORA_DISK_1: starting datafile copy&lt;br /&gt;input datafile file number=00004 name=/u01/oradata/ORA11GR1/datafile/o1_mf_users&lt;br /&gt;_4q759m64_.dbf&lt;br /&gt;output file name=/u02/oradata/ORA11GR1/datafile/users01.dbf tag=TAG20090129T1825&lt;br /&gt;32 RECID=14 STAMP=677442334&lt;br /&gt;channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03&lt;br /&gt;Finished backup at 29-JAN-09&lt;/pre&gt;&lt;span style="font-weight:bold;"&gt;Rollforward image copy&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Since switching to datafile copy will require datafile recover, it might be a good idea to rollforward this image copy first, in order to bring it up to date...&lt;br /&gt;&lt;pre style="font-size:11px;"&gt;RMAN&gt; list copy of datafile 4;&lt;br /&gt;&lt;br /&gt;List of Datafile Copies&lt;br /&gt;=======================&lt;br /&gt;&lt;br /&gt;Key     File S Completion Time Ckp SCN    Ckp Time&lt;br /&gt;------- ---- - --------------- ---------- ---------------&lt;br /&gt;14      4    A 29-JAN-09       2033489    29-JAN-09&lt;br /&gt;        Name: /u02/oradata/ORA11GR1/datafile/users01.dbf&lt;br /&gt;        Tag: TAG20090129T182532&lt;br /&gt;&lt;br /&gt;RMAN&gt; backup incremental from scn 2033489 datafile 4 format '/u02/oradata/ORA11G&lt;br /&gt;R1/datafile/%U';&lt;br /&gt;&lt;br /&gt;Starting backup at 29-JAN-09&lt;br /&gt;&lt;br /&gt;using channel ORA_DISK_1&lt;br /&gt;backup will be obsolete on date 05-FEB-09&lt;br /&gt;archived logs will not be kept or backed up&lt;br /&gt;channel ORA_DISK_1: starting full datafile backup set&lt;br /&gt;channel ORA_DISK_1: specifying datafile(s) in backup set&lt;br /&gt;input datafile file number=00004 name=/u01/oradata/ORA11GR1/datafile/o1_mf_users&lt;br /&gt;_4q759m64_.dbf&lt;br /&gt;channel ORA_DISK_1: starting piece 1 at 29-JAN-09&lt;br /&gt;channel ORA_DISK_1: finished piece 1 at 29-JAN-09&lt;br /&gt;piece handle=/u02/oradata/ORA11GR1/datafile/1nk61t1d_1_1 tag=TAG20090129T183004&lt;br /&gt;comment=NONE&lt;br /&gt;channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01&lt;br /&gt;&lt;br /&gt;using channel ORA_DISK_1&lt;br /&gt;backup will be obsolete on date 05-FEB-09&lt;br /&gt;archived logs will not be kept or backed up&lt;br /&gt;channel ORA_DISK_1: starting full datafile backup set&lt;br /&gt;channel ORA_DISK_1: specifying datafile(s) in backup set&lt;br /&gt;including current control file in backup set&lt;br /&gt;channel ORA_DISK_1: starting piece 1 at 29-JAN-09&lt;br /&gt;channel ORA_DISK_1: finished piece 1 at 29-JAN-09&lt;br /&gt;piece handle=/u02/oradata/ORA11GR1/datafile/1ok61t1e_1_1 tag=TAG20090129T183004&lt;br /&gt;comment=NONE&lt;br /&gt;channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01&lt;br /&gt;Finished backup at 29-JAN-09&lt;br /&gt;&lt;br /&gt;RMAN&gt; recover copy of datafile 4;&lt;br /&gt;&lt;br /&gt;Starting recover at 29-JAN-09&lt;br /&gt;using channel ORA_DISK_1&lt;br /&gt;channel ORA_DISK_1: starting incremental datafile backup set restore&lt;br /&gt;channel ORA_DISK_1: specifying datafile copies to recover&lt;br /&gt;recovering datafile copy file number=00004 name=/u02/oradata/ORA11GR1/datafile/u&lt;br /&gt;sers01.dbf&lt;br /&gt;channel ORA_DISK_1: reading from backup piece /u02/oradata/ORA11GR1/datafile/1nk&lt;br /&gt;61t1d_1_1&lt;br /&gt;channel ORA_DISK_1: piece handle=/u02/oradata/ORA11GR1/datafile/1nk61t1d_1_1 tag&lt;br /&gt;=TAG20090129T183004&lt;br /&gt;channel ORA_DISK_1: restored backup piece 1&lt;br /&gt;channel ORA_DISK_1: restore complete, elapsed time: 00:00:01&lt;br /&gt;Finished recover at 29-JAN-09&lt;br /&gt;&lt;br /&gt;RMAN&gt; list copy of datafile 4;&lt;br /&gt;&lt;br /&gt;List of Datafile Copies&lt;br /&gt;=======================&lt;br /&gt;&lt;br /&gt;Key     File S Completion Time Ckp SCN    Ckp Time&lt;br /&gt;------- ---- - --------------- ---------- ---------------&lt;br /&gt;15      4    A 29-JAN-09       2033633    29-JAN-09&lt;br /&gt;        Name: /u02/oradata/ORA11GR1/datafile/users01.dbf&lt;br /&gt;        Tag: TAG20090129T182532&lt;br /&gt;&lt;/pre&gt;Note that image copy's checkpoint SCN has moved forward. Keep in mind that this step generally makes sense only if you have block change tracking enabled and/or there is a huge amount of archivelogs to apply, as it will be a trade-off between creating and applying the incremental backup compared to directly applying all necessarily archivelogs. This step can be done starting from Oracle 10G.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Switch datafile&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;All we have to do now is execute small RMAN block...&lt;br /&gt;&lt;pre style="font-size:11px;"&gt;RMAN&gt; run&lt;br /&gt;2&gt; {&lt;br /&gt;3&gt;      sql 'alter database datafile 4 offline';&lt;br /&gt;4&gt;      switch datafile 4 to datafilecopy '/u02/oradata/ORA11GR1/datafile/users0&lt;br /&gt;1.dbf';&lt;br /&gt;5&gt;      recover datafile 4;&lt;br /&gt;6&gt;      sql 'alter database datafile 4 online';&lt;br /&gt;7&gt; }&lt;br /&gt;&lt;br /&gt;sql statement: alter database datafile 4 offline&lt;br /&gt;&lt;br /&gt;datafile 4 switched to datafile copy&lt;br /&gt;input datafile copy RECID=15 STAMP=677442659 file name=/u02/oradata/ORA11GR1/dat&lt;br /&gt;afile/users01.dbf&lt;br /&gt;&lt;br /&gt;Starting recover at 29-JAN-09&lt;br /&gt;using channel ORA_DISK_1&lt;br /&gt;&lt;br /&gt;starting media recovery&lt;br /&gt;media recovery complete, elapsed time: 00:00:00&lt;br /&gt;&lt;br /&gt;Finished recover at 29-JAN-09&lt;br /&gt;&lt;br /&gt;sql statement: alter database datafile 4 online&lt;/pre&gt; This is where you'll have some downtime. The amount of downtime depends on how long it will take to recover the datafile which will generally be a function of how many archivelogs needs to be applied which, in turn, can be reduced by using incremental backup. The point is that this step can be really fast.&lt;br /&gt;&lt;br /&gt;Don't forget to watch against nologging operations!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-8564059587898722413?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/moving-datafile.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-6018804904447102634</guid><pubDate>Sun, 25 Jan 2009 13:27:00 +0000</pubDate><atom:updated>2009-01-25T10:07:08.011-05:00</atom:updated><title>Update and rownum oddity</title><description>Take a look at the following table:&lt;br /&gt;&lt;pre&gt;SQL&gt; create table codes&lt;br /&gt;  2  (&lt;br /&gt;  3     code varchar2(10),&lt;br /&gt;  4     used number,&lt;br /&gt;  5     constraint pk_codes primary key (used, code)&lt;br /&gt;  6  ) organization index;&lt;br /&gt;&lt;br /&gt;Table created.&lt;br /&gt;&lt;br /&gt;SSQL&gt; insert into codes&lt;br /&gt;  2     select  dbms_random.string('x', 10),&lt;br /&gt;  3             case when level &lt;= 5000 then 1 else 0 end&lt;br /&gt;  4             from dual&lt;br /&gt;  5             connect by level &lt;= 100000;&lt;br /&gt;&lt;br /&gt;100000 rows created.&lt;br /&gt;&lt;br /&gt;SQL&gt; commit;&lt;br /&gt;&lt;br /&gt;Commit complete.&lt;br /&gt;&lt;br /&gt;SQL&gt; exec dbms_stats.gather_table_stats(user, 'codes');&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;/pre&gt;This table contains a set of codes with &lt;span style="font-style:italic;"&gt;used&lt;/span&gt; column representing whether the code was already used (1) or not (0). We need to return one (random) unused code from the above table and mark this code as used. This is very easy to archive using the following update statement:&lt;br /&gt;&lt;pre&gt;SQL&gt; variable code varchar2(10);&lt;br /&gt;SQL&gt; set autot traceonly&lt;br /&gt;SQL&gt; update codes set used=1&lt;br /&gt;  2     where used=0 and rownum=1&lt;br /&gt;  3     returning code into :code;&lt;br /&gt;&lt;br /&gt;1 row updated.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Execution Plan&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;Plan hash value: 1169687698&lt;br /&gt;&lt;br /&gt;-----------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;-----------------------------------------------------------------------------------&lt;br /&gt;|   0 | UPDATE STATEMENT       |          |     1 |    13 |   117   (1)| 00:00:02 |&lt;br /&gt;|   1 |  UPDATE                | CODES    |       |       |            |          |&lt;br /&gt;|*  2 |   COUNT STOPKEY        |          |       |       |            |          |&lt;br /&gt;|*  3 |    INDEX FAST FULL SCAN| PK_CODES | 50000 |   634K|   117   (1)| 00:00:02 |&lt;br /&gt;-----------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM=1)&lt;br /&gt;   3 - filter("USED"=0)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Statistics&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;          1  recursive calls&lt;br /&gt;          5  db block gets&lt;br /&gt;         12  consistent gets&lt;br /&gt;          0  physical reads&lt;br /&gt;        124  redo size&lt;br /&gt;        913  bytes sent via SQL*Net to client&lt;br /&gt;        874  bytes received via SQL*Net from client&lt;br /&gt;          3  SQL*Net roundtrips to/from client&lt;br /&gt;          1  sorts (memory)&lt;br /&gt;          0  sorts (disk)&lt;/pre&gt;However, the plan is not exactly what I would expect (I'm running this on 11.1.0.7)... Why do &lt;span style="font-style:italic;"&gt;IFFS&lt;/span&gt; when we can do &lt;span style="font-style:italic;"&gt;IRS&lt;/span&gt; to get only one row? This is exactly what regular select does, after all:&lt;br /&gt;&lt;pre&gt;SQL&gt; select *&lt;br /&gt;  2     from codes&lt;br /&gt;  3     where used=0 and rownum=1;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Execution Plan&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;Plan hash value: 802332609&lt;br /&gt;&lt;br /&gt;------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT  |          |     1 |    13 |     2   (0)| 00:00:01 |&lt;br /&gt;|*  1 |  COUNT STOPKEY    |          |       |       |            |          |&lt;br /&gt;|*  2 |   INDEX RANGE SCAN| PK_CODES |     1 |    13 |     2   (0)| 00:00:01 |&lt;br /&gt;------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   1 - filter(ROWNUM=1)&lt;br /&gt;   2 - access("USED"=0)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Statistics&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;          1  recursive calls&lt;br /&gt;          0  db block gets&lt;br /&gt;          2  consistent gets&lt;br /&gt;          0  physical reads&lt;br /&gt;          0  redo size&lt;br /&gt;        590  bytes sent via SQL*Net to client&lt;br /&gt;        520  bytes received via SQL*Net from client&lt;br /&gt;          2  SQL*Net roundtrips to/from client&lt;br /&gt;          0  sorts (memory)&lt;br /&gt;          0  sorts (disk)&lt;br /&gt;          1  rows processed&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;10053 trace&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's take a look at 10053 trace output for both &lt;span style="font-style:italic;"&gt;select&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;update&lt;/span&gt; statements. I'm picking up relevant sections.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;UPDATE:&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;***************************************&lt;br /&gt;BASE STATISTICAL INFORMATION&lt;br /&gt;***********************&lt;br /&gt;Table Stats::&lt;br /&gt;  Table: CODES  Alias: CODES&lt;br /&gt;    #Rows: 100000  #Blks:  423  AvgRowLen:  13.00&lt;br /&gt;Index Stats::&lt;br /&gt;  Index: PK_CODES  Col#: 2 1&lt;br /&gt;    LVLS: 1  #LB: 423  #DK: 100000  LB/K: 1.00  DB/K: 1.00  CLUF: 0.00&lt;br /&gt;Access path analysis for CODES&lt;br /&gt;***************************************&lt;br /&gt;SINGLE TABLE ACCESS PATH &lt;br /&gt;  Single Table Cardinality Estimation for CODES[CODES] &lt;br /&gt;  Table: CODES  Alias: CODES&lt;br /&gt;    &lt;font color="red"&gt;Card: Original: 100000.000000  Rounded: 50000  Computed: 50000.00  Non Adjusted: 50000.00&lt;/font&gt;&lt;br /&gt;  Access Path: index (index (FFS))&lt;br /&gt;    Index: PK_CODES&lt;br /&gt;    resc_io: 116.00  resc_cpu: 20012369&lt;br /&gt;    ix_sel: 0.000000  ix_sel_with_filters: 1.000000 &lt;br /&gt;  Access Path: index (FFS)&lt;br /&gt;    Cost:  116.81  Resp: 116.81  Degree: 1&lt;br /&gt;      Cost_io: 116.00  Cost_cpu: 20012369&lt;br /&gt;      Resp_io: 116.00  Resp_cpu: 20012369&lt;br /&gt;OPTIMIZER PERCENT INDEX CACHING = 0&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;  Access Path: index (IndexOnly)&lt;br /&gt;    Index: PK_CODES&lt;br /&gt;    resc_io: 213.00  resc_cpu: 11516867&lt;br /&gt;    ix_sel: 0.500000  ix_sel_with_filters: 0.500000 &lt;br /&gt;    Cost: 213.47  Resp: 213.47  Degree: 1&lt;br /&gt;  Best:: AccessPath: IndexFFS&lt;br /&gt;  Index: PK_CODES&lt;br /&gt;         Cost: 116.81  Degree: 1  Resp: 116.81  Card: 50000.00  Bytes: 0&lt;br /&gt;&lt;br /&gt;***************************************&lt;/pre&gt;&lt;span style="font-style:italic;"&gt;Nothing else&lt;/span&gt; is being tried and this is what optimizer selects as the best execution plan. Note how cardinalities are being reported (highlighted in &lt;font color="red"&gt;red&lt;/font&gt;).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;SELECT:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In addition to the above, has one more section:&lt;br /&gt;&lt;pre&gt;***************************************&lt;br /&gt;SINGLE TABLE ACCESS PATH &lt;font color="blue"&gt;(First K Rows)&lt;/font&gt;&lt;br /&gt;  Single Table Cardinality Estimation for CODES[CODES] &lt;br /&gt;  Table: CODES  Alias: CODES&lt;br /&gt;    &lt;font color="green"&gt;Card: Original: 2.000000  Rounded: 1  Computed: 1.00  Non Adjusted: 1.00&lt;/font&gt;&lt;br /&gt;  Access Path: index (index (FFS))&lt;br /&gt;    Index: PK_CODES&lt;br /&gt;    resc_io: 2.00  resc_cpu: 7461&lt;br /&gt;    ix_sel: 0.000000  ix_sel_with_filters: 1.000000 &lt;br /&gt;  Access Path: index (FFS)&lt;br /&gt;    Cost:  2.00  Resp: 2.00  Degree: 1&lt;br /&gt;      Cost_io: 2.00  Cost_cpu: 7461&lt;br /&gt;      Resp_io: 2.00  Resp_cpu: 7461&lt;br /&gt;OPTIMIZER PERCENT INDEX CACHING = 0&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;  Access Path: index (IndexOnly)&lt;br /&gt;    Index: PK_CODES&lt;br /&gt;    resc_io: 2.00  resc_cpu: 14443&lt;br /&gt;    ix_sel: 0.500000  ix_sel_with_filters: 0.500000 &lt;br /&gt;    Cost: 2.00  Resp: 2.00  Degree: 1&lt;br /&gt;  Best:: AccessPath: IndexRange&lt;br /&gt;  Index: PK_CODES&lt;br /&gt;         Cost: 2.00  Degree: 1  Resp: 2.00  Card: 1.00  Bytes: 13&lt;br /&gt;&lt;br /&gt;First K Rows: unchanged join prefix len = 1&lt;br /&gt;Join order[1]:  CODES[CODES]#0&lt;br /&gt;***********************&lt;/pre&gt;Note how cardinalities has changed (highlighted in &lt;font color="green"&gt;green&lt;/font&gt;) this time. What happened is &lt;span style="font-style:italic;"&gt;rownum = 1&lt;/span&gt; predicate resulted in &lt;span style="font-style:italic;"&gt;fist_rows(1)&lt;/span&gt; mode (highlighted in &lt;font color="blue"&gt;blue&lt;/font&gt;), affecting how cardinalities were calculated.&lt;br /&gt;&lt;br /&gt;We know that &lt;span style="font-style:italic;"&gt;first_rows(n)&lt;/span&gt; hint is being ignored in &lt;span style="font-style:italic;"&gt;update&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;delete&lt;/span&gt; statements, thus our update statement always goes in &lt;span style="font-style:italic;"&gt;all_rows&lt;/span&gt; mode.&lt;br /&gt;&lt;br /&gt;You can confirm that &lt;span style="font-style:italic;"&gt;select&lt;/span&gt; behaves exactly the same way when in &lt;span style="font-style:italic;"&gt;all_rows&lt;/span&gt; mode:&lt;br /&gt;&lt;pre&gt;SQL&gt; select /*+ all_rows */ *&lt;br /&gt;  2     from codes&lt;br /&gt;  3     where used=0 and rownum=1;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Execution Plan&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;Plan hash value: 2682988822&lt;br /&gt;&lt;br /&gt;----------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;----------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT      |          |     1 |    13 |   117   (1)| 00:00:02 |&lt;br /&gt;|*  1 |  COUNT STOPKEY        |          |       |       |            |          |&lt;br /&gt;|*  2 |   INDEX FAST FULL SCAN| PK_CODES | 50000 |   634K|   117   (1)| 00:00:02 |&lt;br /&gt;----------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   1 - filter(ROWNUM=1)&lt;br /&gt;   2 - filter("USED"=0)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Statistics&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;          0  recursive calls&lt;br /&gt;          0  db block gets&lt;br /&gt;         12  consistent gets&lt;br /&gt;          0  physical reads&lt;br /&gt;          0  redo size&lt;br /&gt;        590  bytes sent via SQL*Net to client&lt;br /&gt;        520  bytes received via SQL*Net from client&lt;br /&gt;          2  SQL*Net roundtrips to/from client&lt;br /&gt;          0  sorts (memory)&lt;br /&gt;          0  sorts (disk)&lt;br /&gt;          1  rows processed&lt;/pre&gt;Well, I guess here goes my next wish for &lt;span style="font-style:italic;"&gt;CBO&lt;/span&gt; improvement regarding how &lt;span style="font-style:italic;"&gt;update&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;delete&lt;/span&gt; statements are handled with predicates involving &lt;span style="font-style:italic;"&gt;rownum&lt;/span&gt;...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-6018804904447102634?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/update-and-rownum-oddity.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>9</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-5563627845038955843</guid><pubDate>Fri, 23 Jan 2009 21:29:00 +0000</pubDate><atom:updated>2009-01-24T04:05:28.708-05:00</atom:updated><title>11G Managed Recovery Process</title><description>&lt;span style="font-style:italic;"&gt;MRP&lt;/span&gt; process is commonly &lt;a href="http://www.google.ca/search?hl=en&amp;amp;q=oracle+mrp+process&amp;amp;btnG=Google+Search&amp;amp;meta="&gt;referenced&lt;/a&gt; throughout the web as the process which performs redo apply to your managed standby database. MRP may can in a team with PQ slave (or pr&lt;span style="font-style:italic;"&gt;nn&lt;/span&gt; in 11G) processes in case you start managed recovery in parallel.&lt;br /&gt;&lt;br /&gt;Unfortunately, the term &lt;span style="font-style:italic;"&gt;performs redo apply&lt;/span&gt; seems to be causing some confusion along the way as well. I found it very common that people believes that it's MRP process which performs both reads from the redo streams as well as writes changes into datafiles. For example, sometimes they try to battle slow log apply by increasing managed recovery parallelism without realizing that there is in fact a bit more to the puzzle.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Some details&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'm going to use my &lt;span style="font-weight:bold;"&gt;11G DataGuard&lt;/span&gt; setup to demonstrate a couple of key points. My setup is operating using real time apply (no parallel), which makes the entire example a bit simpler to demonstrate.&lt;br /&gt;&lt;br /&gt;Let's update a row on the source DB:&lt;br /&gt;&lt;pre&gt;SQL&gt; update t set n=n;&lt;br /&gt;&lt;br /&gt;1 row updated.&lt;br /&gt;&lt;br /&gt;SQL&gt;  commit;&lt;br /&gt;&lt;br /&gt;Commit complete.&lt;/pre&gt;Now, take a look at MRP strace output which was produced as a result of the above change:&lt;br /&gt;&lt;pre&gt;[oracle@ora11gr1b fd]$ ps -fp 6364&lt;br /&gt;UID        PID  PPID  C STIME TTY          TIME CMD&lt;br /&gt;oracle    6364     1  0 19:59 ?        00:00:00 ora_mrp0_ora11gr1&lt;br /&gt;[oracle@ora11gr1b fd]$ strace -e pread,pwrite -p 6364&lt;br /&gt;Process 6364 attached - interrupt to quit&lt;br /&gt;pread(32,..., 512, 45568) = 512&lt;br /&gt;pread(32,..., 1024, 46080) = 1024&lt;br /&gt;pread(30,..., 8192, 2228224) = 8192&lt;br /&gt;pread(30,..., 8192, 259858432) = 8192&lt;br /&gt;pread(31,..., 8192, 9461760) = 8192&lt;/pre&gt;I've set &lt;span style="font-style:italic;"&gt;filesystemio_options=none&lt;/span&gt; so we can observe &lt;span style="font-style:italic;"&gt;pread/pwrite syscalls&lt;/span&gt; which are easier to follow compared to asynch &lt;span style="font-style:italic;"&gt;io_submit/io_getvents system calls&lt;/span&gt; (and we don't care about O_DIRECT flag either).&lt;br /&gt;&lt;br /&gt;Let's check what are these file descriptors:&lt;br /&gt;&lt;pre&gt;[oracle@ora11gr1b fd]$ cd /proc/6364/fd&lt;br /&gt;[oracle@ora11gr1b fd]$ file 30&lt;br /&gt;30: symbolic link to `/u01/oradata/ORA11GR1B/datafile/o1_mf_undotbs1_0fk5fp2c_.dbf'&lt;br /&gt;[oracle@ora11gr1b fd]$ file 31&lt;br /&gt;31: symbolic link to `/u01/oradata/ORA11GR1B/datafile/o1_mf_users_0ik5fp4u_.dbf'&lt;br /&gt;[oracle@ora11gr1b fd]$ file 32&lt;br /&gt;32: symbolic link to `/u01/oradata/ORA11GR1B/onlinelog/o1_mf_9_4qn2rkhk_.log'&lt;/pre&gt;In other words, the process read from standby logfile, undo and users (this is where our table is) tablespaces. However, as you might notice, all these calls were reads, we didn't write anything.&lt;br /&gt;&lt;br /&gt;From time to time MRP gets a bit more interesting, for example during logfile switches:&lt;br /&gt;&lt;pre&gt;...&lt;br /&gt;pwrite(27,..., 16384, 16384) = 16384&lt;br /&gt;pread(27,..., 16384, 16384) = 16384&lt;br /&gt;pread(28,..., 8192, 8192) = 8192&lt;br /&gt;pread(29,..., 8192, 8192) = 8192&lt;br /&gt;pread(30,..., 8192, 8192) = 8192&lt;br /&gt;pread(31,..., 8192, 8192) = 8192&lt;br /&gt;pread(27,..., 16384, 393216) = 16384&lt;br /&gt;pwrite(28,..., 8192, 8192) = 8192&lt;br /&gt;pwrite(29,..., 8192, 8192) = 8192&lt;br /&gt;pwrite(30,..., 8192, 8192) = 8192&lt;br /&gt;pwrite(31,..., 8192, 8192) = 8192&lt;br /&gt;...&lt;/pre&gt;Here we actually wrote (27 is a controlfile, 28 and 29 are system and sysaux tablespaces respectively) something. However, from the offset (fourth parameter) you can realize that we are writing to the second block in these datafiles. There is no (can't be) any user data there.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Who is writing the data then?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first thing you might want to check is, of course, the database writer process:&lt;br /&gt;&lt;pre&gt;[oracle@ora11gr1b ~]$ ps -fp 6303&lt;br /&gt;UID        PID  PPID  C STIME TTY          TIME CMD&lt;br /&gt;oracle    6303     1  0 19:55 ?        00:00:00 ora_dbw0_ora11gr1&lt;br /&gt;[oracle@ora11gr1b ~]$ strace -e pread,pwrite -p 6303&lt;br /&gt;Process 6303 attached - interrupt to quit&lt;br /&gt;pwrite(23,..., 8192, 2097152) = 8192&lt;br /&gt;pwrite(23,..., 8192, 35987456) = 8192&lt;br /&gt;pwrite(24,..., 8192, 9461760) = 8192&lt;/pre&gt;This is the output produced by standby's dbwr right after we updated our table on the source. We wrote two undo blocks (23) and one block in users tablespace (24). By looking at the offset for file descriptor 24 we can confirm that we wrote the table itself:&lt;br /&gt;&lt;pre&gt;SQL&gt; select segment_name&lt;br /&gt; 2   from dba_extents&lt;br /&gt; 3   where tablespace_name='USERS'&lt;br /&gt; 4    and 9461760/8192 between block_id and block_id + blocks-1;&lt;br /&gt;&lt;br /&gt;SEGMENT_NAME&lt;br /&gt;--------------------------&lt;br /&gt;T&lt;/pre&gt;From the above you can confirm that it is DBWR process which wrote the changes for us and it plays crucial role during your standby database operations.&lt;br /&gt;&lt;br /&gt;MRP's workload consists mostly from &lt;span style="font-style:italic;"&gt;reading&lt;/span&gt; the redo stream, datafiles, controlfiles and occasional writes into the controlfile and datafiles header.&lt;br /&gt;&lt;br /&gt;If your standby is suffering from the redo apply performance, you may want to pay attention to both MRP &lt;span style="font-style:italic;"&gt;and&lt;/span&gt; DBWR processes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-5563627845038955843?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/11g-managed-recovery-process.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-4966029699927956151</guid><pubDate>Tue, 20 Jan 2009 11:10:00 +0000</pubDate><atom:updated>2009-01-20T21:52:16.274-05:00</atom:updated><title>Latch waits in 11G: queuing</title><description>Following up on my previous &lt;a href="http://afatkulin.blogspot.com/2009/01/longhold-latch-waits-on-linux.html"&gt;post&lt;/a&gt; about 11G latch waits using semaphore post-wait with a latch holder...&lt;br /&gt;&lt;br /&gt;What I was curious to confirm is whether latch holder posts only the first waiter in the list or all of them?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;The results&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here are the results I've got using three sessions:&lt;br /&gt;&lt;pre&gt;&lt;ol&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;Session 1:&lt;/span&gt; gets RC latch&lt;br /&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;Session 2:&lt;/span&gt; misses the latch, begins to sleep&lt;br/&gt;&lt;span style="font-style:italic;"&gt;semop(753664, 0x7fbfffa348, 1)&lt;/span&gt;&lt;br /&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;Session 3:&lt;/span&gt; misses the latch, begins to sleep&lt;br/&gt;&lt;span style="font-style:italic;"&gt;semop(753664, 0x7fbfff5f58, 1)&lt;/span&gt;&lt;br /&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;Session 1:&lt;/span&gt; completes, posts session 2&lt;br/&gt;&lt;span style="font-style:italic;"&gt;semctl(753664, 28, SETVAL, 0x1)&lt;/span&gt;&lt;br /&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;Session 2:&lt;/span&gt; completes, posts session 3&lt;br/&gt;&lt;span style="font-style:italic;"&gt;semctl(753664, 31, SETVAL, 0x1)&lt;/span&gt;&lt;br /&gt;&lt;/ol&gt;&lt;/pre&gt;Note that though we used the same &lt;span style="font-style:italic;"&gt;semaphore set&lt;/span&gt; (753664), sessions 1 and 2 posted different &lt;span style="font-style:italic;"&gt;semaphores&lt;/span&gt; within that set (28 and 31 respectively) and waiters were wakened up by their respective holders according to the order they begun to wait.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;sleeps + spin_gets = misses ?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;That's an ideal case, of course, as just awakened process might at least have to compete for the same latch with a previous holder if that holder decided to get the same latch again. If awakened process doesn't manage to get the latch during the spin, it'll have to sleep again (thus breaking the above formula). However, the &lt;span style="font-style:italic;"&gt;proximity&lt;/span&gt; to the above equation might be useful as one of the indicators whether latch is a &lt;span style="font-style:italic;"&gt;potential&lt;/span&gt; subject to the "new" algorithm or not (this new wait model should tend to produce less sleeps, hence spins, per miss, otherwise there were not much points about it).&lt;br /&gt;&lt;pre&gt;SQL&gt; select misses, sleeps, spin_gets, sleeps+spin_gets&lt;br /&gt;  2   from v$latch&lt;br /&gt;  3   where name='Result Cache: Latch';&lt;br /&gt; &lt;br /&gt;    MISSES     SLEEPS  SPIN_GETS SLEEPS+SPIN_GETS&lt;br /&gt;---------- ---------- ---------- ----------------&lt;br /&gt;      8208        275       7946             8221&lt;/pre&gt;The numbers are pretty close indeed. All this made me a bit more curios and I've decided to take a bit closer look at what happens in the strace then waiter has to sleep more than once in a row.&lt;br /&gt;&lt;br /&gt;I've ran two parallel sessions executing a loop which selects from result cache...&lt;br /&gt;&lt;pre&gt;begin&lt;br /&gt; for i in 1 .. 1000000&lt;br /&gt; loop&lt;br /&gt;  for cur in (select /*+ result_cache */ * from t where n=1)&lt;br /&gt;  loop&lt;br /&gt;   null;&lt;br /&gt;  end loop;&lt;br /&gt; end loop;&lt;br /&gt;end;&lt;/pre&gt; ...made sure the above formula became distorted further and took a look at strace c&lt;span style="font-style:italic;"&gt;&lt;/span&gt;utput for one of the sessions which led me to an interesting discovery:&lt;br /&gt;&lt;pre&gt;[oracle@ora11gr1a ~]$ strace -e semop,semctl,semtimedop -p 7591&lt;br /&gt;Process 7591 attached - interrupt to quit&lt;br /&gt;...&lt;br /&gt;semctl(1015808, 30, SETVAL, 0x1)        = 0&lt;br /&gt;semctl(1015808, 30, SETVAL, 0x1)        = 0&lt;br /&gt;semtimedop(1015808, 0x7fbfff42b8, 1, {0, 10000000}) = 0&lt;br /&gt;semtimedop(1015808, 0x7fbfff42b8, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable)&lt;br /&gt;semop(1015808, 0x7fbfff5f58, 1)         = 0&lt;br /&gt;...&lt;/pre&gt;What we see there is traditional time-based sleep using &lt;span style="font-style:italic;"&gt;semtimedop syscall&lt;/span&gt;. Does that means that Oracle still can choose between algorithms dynamically? The answer, according to extended SQL trace, seems to be &lt;span style="font-style:italic;"&gt;no&lt;/span&gt;, as I wasn't able to match these &lt;span style="font-style:italic;"&gt;semtimedop syscalls&lt;/span&gt; with latch waits in SQL trace (so these &lt;span style="font-style:italic;"&gt;syscalls&lt;/span&gt; were used for something else).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;P.S. It looks like 11G has invalidated all that pseudo code you may find around about latch acquisition algorithm Oracle uses.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-4966029699927956151?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/latch-waits-in-11g-queuing.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-2669024593286241615</guid><pubDate>Mon, 19 Jan 2009 23:05:00 +0000</pubDate><atom:updated>2009-01-19T20:53:49.259-05:00</atom:updated><title>"LONGHOLD" latch waits on Linux</title><description>Take a look at the figure from my previous blog post:&lt;br /&gt;&lt;pre&gt;call     count       cpu    elapsed       disk      query    current        rows&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;Parse        1      0.00       0.00          0          0          0           0&lt;br /&gt;Execute      1      0.00       0.00          0          0          0           0&lt;br /&gt;Fetch     5001      0.53      53.71          0          0          0      500000&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;total     5003      0.53      53.71          0          0          0      500000&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;Event waited on                             Times   Max. Wait  Total Waited&lt;br /&gt;  ----------------------------------------   Waited  ----------  ------------&lt;br /&gt;  latch free                                      1       53.20         53.20&lt;br /&gt;&lt;/pre&gt;What's unusual about it is a single latch wait for 53 seconds without consuming any CPU time. This is something different from a usual&lt;br /&gt;&lt;pre&gt;get -&gt; miss -&gt; spin -&gt; sleep&lt;/pre&gt;latch acquisition algorithm we generally used to.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Sequence of events&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To remind you, the sequence of events goes like this:&lt;ol&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;First session:&lt;/span&gt; do &lt;span style="font-style:italic;"&gt;select /*+ result_cache */ * from t&lt;/span&gt; which places a single big result into Result Cache memory.&lt;br /&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;First session:&lt;/span&gt; do &lt;span style="font-style:italic;"&gt;select count(*) from v$result_cache_memory&lt;/span&gt; which grabs Result Cache latch for a long amount of time.&lt;br /&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;Second session:&lt;/span&gt; do &lt;span style="font-style:italic;"&gt;select /*+ result_cache */ * from t&lt;/span&gt; which forces us to wait until Result Cache latch will be freed up by our first session.&lt;/ol&gt;&lt;span style="font-weight:bold;"&gt;Latch statistics&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's take a look at v$latch right after we execute the above three steps:&lt;br /&gt;&lt;pre&gt;SQL&gt; select gets, misses, sleeps&lt;br /&gt;  2   from v$latch&lt;br /&gt;  3   where name='Result Cache: Latch';&lt;br /&gt; &lt;br /&gt;      GETS     MISSES     SLEEPS&lt;br /&gt;---------- ---------- ----------&lt;br /&gt;     53494          1          1&lt;/pre&gt;As we can see, the session indeed went into a sleep after a single miss. The first thing which may come into your mind is that I'm running this on a single CPU box where we don't spin if we miss a latch (due to &lt;span style="font-style:italic;"&gt;_spin_count=1&lt;/span&gt; being default on a single CPU boxes). This is not the case:&lt;br /&gt;&lt;pre&gt;SQL&gt; show parameter cpu_count&lt;br /&gt;&lt;br /&gt;NAME                                 TYPE        VALUE&lt;br /&gt;------------------------------------ ----------- ------------------------------&lt;br /&gt;cpu_count                            integer     2&lt;/pre&gt;And my DB's &lt;span style="font-style:italic;"&gt;_spin_count&lt;/span&gt; has a default value of 2000 as well.&lt;br /&gt;&lt;br /&gt;Let's look a bit further:&lt;pre&gt;SQL&gt; select location, sleep_count, wtr_slp_count, longhold_count&lt;br /&gt;  2   from v$latch_misses&lt;br /&gt;  3   where parent_name='Result Cache: Latch'&lt;br /&gt;  4    and sleep_count+wtr_slp_count+longhold_count &gt; 0;&lt;br /&gt; &lt;br /&gt;LOCATION                       SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT&lt;br /&gt;------------------------------ ----------- ------------- --------------&lt;br /&gt;Result Cache: Serialization12            0             1              0&lt;br /&gt;Result Cache: Serialization16            1             0              1&lt;/pre&gt;Note &lt;span style="font-style:italic;"&gt;LONGHOLD_COUN&lt;/span&gt;T column equals 1 for &lt;span style="font-style:italic;"&gt;Result Cache: Serialization16&lt;/span&gt;. That means that we've holded a latch for the entire duration of someone else's sleep which, again, conforms to our observation. But what is the underlying mechanism for this?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Under the hood&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here is an excerpt from our second session's strace output:&lt;br /&gt;&lt;pre&gt;...&lt;br /&gt;getrusage(RUSAGE_SELF, {ru_utime={0, 728889}, ru_stime={1, 317799}, ...}) = 0&lt;br /&gt;getrusage(RUSAGE_SELF, {ru_utime={0, 728889}, ru_stime={1, 317799}, ...}) = 0&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;semop(98304, 0x7fbfff5f58, 1)           = 0 -- enters the sleep here&lt;/span&gt;&lt;br /&gt;times({tms_utime=72, tms_stime=131, tms_cutime=0, tms_cstime=0}) = 429496789&lt;br /&gt;getrusage(RUSAGE_SELF, {ru_utime={0, 729889}, ru_stime={1, 317799}, ...}) = 0&lt;br /&gt;...&lt;/pre&gt;And the first session:&lt;br /&gt;&lt;pre&gt;...&lt;br /&gt;mmap(0x2a972e7000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 9, 0) = 0x2a972e7000&lt;br /&gt;mmap(0x2a972f7000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 9, 0) = 0x2a972f7000&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;semctl(98304, 31, SETVAL, 0x7f00000001) = 0 -- issues upon completion&lt;/span&gt;&lt;br /&gt;getrusage(RUSAGE_SELF, {ru_utime={206, 80670}, ru_stime={2, 686591}, ...}) = 0&lt;br /&gt;getrusage(RUSAGE_SELF, {ru_utime={206, 81670}, ru_stime={2, 686591}, ...}) = 0&lt;br /&gt;...&lt;/pre&gt;In other words:&lt;br /&gt;&lt;ol&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;Second session:&lt;/span&gt; upon discovering that the latch is busy, it begins to sleep by issuing &lt;br/&gt;&lt;span style="font-style:italic;"&gt;semop(98304, 0x7fbfff5f58, 1)&lt;/span&gt; command.&lt;br /&gt;&lt;li/&gt;&lt;span style="font-weight:bold;"&gt;First session:&lt;/span&gt; upon completion issues &lt;br/&gt;&lt;span style="font-style:italic;"&gt;semctl(98304, 31, SETVAL, 0x7f00000001)&lt;/span&gt; which wakes our second session up.&lt;/ol&gt;The first argument is a &lt;span style="font-style:italic;"&gt;semid&lt;/span&gt; which means that both sessions are operating on the same semaphore. Both semaphore &lt;span style="font-style:italic;"&gt;syscalls&lt;/span&gt; are not presented if we execute each of the above sessions separately.&lt;br /&gt;&lt;br /&gt;Though we do sleep under normal latch acquisition algorithm as well, the sleep is based on &lt;span style="font-style:italic;"&gt;time&lt;/span&gt;, not the latch holder posting us.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Which algorithm?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The interesting question to answer is how the waiter determines which algorithm to use. It could be (A) a special "longhold" mode get for a latch or (B) the waiter might check for how long the latch was already held. Thus far I tend to think that it is more close to (A) (a special get mode or a separate flag somewhere) because if we execute above two sessions concurrently we always get an enormously unusual number of sleeps even if we place only one single-row result into Result Cache memory (thus making sure there are no significant delays selecting from &lt;span style="font-style:italic;"&gt;v$result_cache_memory&lt;/span&gt;).&lt;br /&gt;&lt;br /&gt;Maybe some of you could share other ideas as well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-2669024593286241615?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/longhold-latch-waits-on-linux.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-132606393814285152</guid><pubDate>Sun, 18 Jan 2009 11:14:00 +0000</pubDate><atom:updated>2009-01-18T21:57:57.206-05:00</atom:updated><title>v$result_cache_memory or what is the newest method to hang stuff up?</title><description>Sometimes I get quite amused by the "side effects" you can discover while playing with features.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Prerequisites&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;First, we need to place something relatively big into result cache memory. The easiest way to archive this is start like this:&lt;br /&gt;&lt;pre&gt;SQL&gt; alter system set result_cache_max_size=64m;&lt;br /&gt;&lt;br /&gt;System altered.&lt;br /&gt;&lt;br /&gt;SQL&gt; alter system set result_cache_max_result=100;&lt;br /&gt;&lt;br /&gt;System altered.&lt;/pre&gt;The first parameter controls how much memory out of your shared pool Oracle can dedicate for result cache. The second parameter tells how much memory, in percent, can be dedicated for a single resultset. Default value is 5 and I've set it to 100 so we don't have to dedicate a lot of memory to demonstrate my point.&lt;br /&gt;&lt;br /&gt;All we have to do now is populate the result cache memory:&lt;br /&gt;&lt;pre&gt;SQL&gt; create table t as&lt;br /&gt;  2   select level n, rpad('*', 100, '*') v&lt;br /&gt;  3    from dual&lt;br /&gt;  4    connect by level &lt;= 500000;&lt;br /&gt; &lt;br /&gt;Table created&lt;br /&gt; &lt;br /&gt;SQL&gt; begin&lt;br /&gt;  2   for cur in (select /*+ result_cache*/ * from t)&lt;br /&gt;  3   loop&lt;br /&gt;  4    null;&lt;br /&gt;  5   end loop;&lt;br /&gt;  6  end;&lt;br /&gt;  7  /&lt;br /&gt; &lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;&lt;span style="font-weight:bold;"&gt;What's next?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's check how much result cache memory is allocated:&lt;br /&gt;&lt;pre&gt;SQL&gt; set timing on&lt;br /&gt;SQL&gt; select count(*) from v$result_cache_memory;&lt;br /&gt; &lt;br /&gt;  COUNT(*)&lt;br /&gt;----------&lt;br /&gt;     65536&lt;br /&gt; &lt;br /&gt;Executed in 55.984 seconds&lt;/pre&gt;That's almost a minute to get the results, yes.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;So what?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are (nasty) consequences.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;It's time to buy Core i7&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's start by taking a look at the trace file of our session:&lt;br /&gt;&lt;pre&gt;SQL ID: 0zudkuq64dkwx&lt;br /&gt;Plan Hash: 600544352&lt;br /&gt;select count(*)&lt;br /&gt;from&lt;br /&gt; v$result_cache_memory&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;call     count       cpu    elapsed       disk      query    current        rows&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;Parse        1      0.00       0.00          0          0          0           0&lt;br /&gt;Execute      1      0.00       0.00          0          0          0           0&lt;br /&gt;Fetch        1     54.65      56.96          0          0          0           1&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;total        3     54.65      56.96          0          0          0           1&lt;br /&gt;&lt;br /&gt;Misses in library cache during parse: 1&lt;br /&gt;Optimizer mode: ALL_ROWS&lt;br /&gt;Parsing user id: 38&lt;br /&gt;&lt;br /&gt;Rows     Row Source Operation&lt;br /&gt;-------  ---------------------------------------------------&lt;br /&gt;      1  SORT AGGREGATE (cr=0 pr=0 pw=0 time=0 us)&lt;br /&gt;  65536   FIXED TABLE FULL X$QESRCMEM (cr=0 pr=0 pw=0 time=0 us cost=0 size=13 card=1)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Elapsed times include waiting on following events:&lt;br /&gt;  Event waited on                             Times   Max. Wait  Total Waited&lt;br /&gt;  ----------------------------------------   Waited  ----------  ------------&lt;br /&gt;  SQL*Net message to client                       2        0.00          0.00&lt;br /&gt;  SQL*Net message from client                     2        0.00          0.01&lt;/pre&gt;First of all, all the time was spent actively consuming CPU. Selecting from X$QESRCMEM became a pretty expensive thing (in fact, at first I confused such a huge delay with a session being simply hang while spinning on the CPU).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Can't use result cache&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Any session which attempts to use result cache will be forced to wait until our first session completes:&lt;br /&gt;&lt;pre&gt;SQL ID: bzf4nvdav1m2j&lt;br /&gt;Plan Hash: 1601196873&lt;br /&gt;SELECT /*+ result_cache*/ *&lt;br /&gt;FROM&lt;br /&gt; T&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;call     count       cpu    elapsed       disk      query    current        rows&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;Parse        1      0.00       0.00          0          0          0           0&lt;br /&gt;Execute      1      0.00       0.00          0          0          0           0&lt;br /&gt;Fetch     5001      0.53      53.71          0          0          0      500000&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;total     5003      0.53      53.71          0          0          0      500000&lt;br /&gt;&lt;br /&gt;Misses in library cache during parse: 0&lt;br /&gt;Optimizer mode: ALL_ROWS&lt;br /&gt;Parsing user id: 38     (recursive depth: 1)&lt;br /&gt;&lt;br /&gt;Rows     Row Source Operation&lt;br /&gt;-------  ---------------------------------------------------&lt;br /&gt; 500000  RESULT CACHE  49yjv3h37cjhz39gkd90p4413p (cr=0 pr=0 pw=0 time=0 us)&lt;br /&gt;      0   TABLE ACCESS FULL T (cr=0 pr=0 pw=0 time=0 us cost=2127 size=30436770 card=468258)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Elapsed times include waiting on following events:&lt;br /&gt;  Event waited on                             Times   Max. Wait  Total Waited&lt;br /&gt;  ----------------------------------------   Waited  ----------  ------------&lt;br /&gt;  latch free                                      1       53.20         53.20&lt;br /&gt;&lt;/pre&gt;And the latch name is...&lt;br /&gt;&lt;pre&gt;WAIT #2: nam='latch free' ela= 53205337 address=1610838648 number=377 tries=0 obj#=-1 tim=1232316789354155&lt;br /&gt;&lt;br /&gt;SQL&gt; select name from v$latch where latch#=377;&lt;br /&gt; &lt;br /&gt;NAME&lt;br /&gt;----------------------------------------------------------------&lt;br /&gt;Result Cache: Latch&lt;/pre&gt;Surprised? I'm not. Don't expect me to start complaining about the absence of shared mode gets again.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;The Real Deal or result_cache_mode = force&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Things gets really nasty if you set &lt;span style="font-style:italic;"&gt;result_cache_mode = force&lt;/span&gt; which has a potential outcome of rendering your entire system unusable for some extended period of time. Because even the simplest queries like &lt;span style="font-style:italic;"&gt;select * from dual&lt;/span&gt; will hang, escalating situation to a point where your normal users will be unable to even connect to a database, let alone executing the queries. For example, trying to connect with SQL*Plus results in the following:&lt;br /&gt;&lt;pre&gt;SQL ID: d6vwqbw6r2ffk&lt;br /&gt;Plan Hash: 1388734953&lt;br /&gt;SELECT USER&lt;br /&gt;FROM&lt;br /&gt; DUAL&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;call     count       cpu    elapsed       disk      query    current        rows&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;Parse        1      0.01       0.01          0          0          0           0&lt;br /&gt;Execute      1      0.00       0.00          0          0          0           0&lt;br /&gt;Fetch        2      0.00      43.40          0          0          0           1&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;total        4      0.01      43.41          0          0          0           1&lt;br /&gt;&lt;br /&gt;Misses in library cache during parse: 1&lt;br /&gt;Optimizer mode: ALL_ROWS&lt;br /&gt;Parsing user id: 38&lt;br /&gt;&lt;br /&gt;Rows     Row Source Operation&lt;br /&gt;-------  ---------------------------------------------------&lt;br /&gt;      1  RESULT CACHE  4c48ztv5ztq25dkd78360h2uxb (cr=0 pr=0 pw=0 time=0 us)&lt;br /&gt;      1   FAST DUAL  (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Elapsed times include waiting on following events:&lt;br /&gt;  Event waited on                             Times   Max. Wait  Total Waited&lt;br /&gt;  ----------------------------------------   Waited  ----------  ------------&lt;br /&gt;  SQL*Net message to client                       2        0.00          0.00&lt;br /&gt;  latch free                                      1       43.39         43.39&lt;br /&gt;  SQL*Net message from client                     2        0.00          0.00&lt;br /&gt;&lt;/pre&gt;I don't have to mention the latch name again.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Real World?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What are the chances of meeting the above situation on some real system? First of all, having a system with more than a 100GB SGA is not uncommon these day. The thing gets worse the more memory is being occupied by a &lt;span style="font-style:italic;"&gt;single&lt;/span&gt; result. Having a lot of small results seems to be not a problem. If we take 5% as a default threshold to be eligible to be placed into result cache memory, facing the above situation will require 1280M for result cache memory. That looks like quite a big number, however, on the scale of hundred gigs SGA it's no longer seems to be improbable. Aggregating a multi gigabytes of data into 64MB result doesn't sound insane either (at least not to me, you may have a different opinion here).&lt;br /&gt;&lt;br /&gt;Plus it takes someone to query v$result_cache_memory. So just don't do it. There are other views (like v$result_cache_objects) which you can use to at least get a part of that info, alas at lower granularity.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Environment&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The above tests were verified on Oracle 11GR1 running on Windows XP x64 (native) and Linux x86-64 under VMware (with XP x64 acting as a host OS).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-132606393814285152?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/vresultcachememory-or-what-is-newest.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-6434456866649130453</guid><pubDate>Fri, 16 Jan 2009 23:20:00 +0000</pubDate><atom:updated>2009-01-16T19:11:52.369-05:00</atom:updated><title>insert /*+ append */ into ... values (...)</title><description>Alex Egorov has left a very fascinating comment to my post about &lt;a href="http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html"&gt;11G adaptive direct path reads&lt;/a&gt; which I thought definitely deserves mentioning.&lt;br /&gt;&lt;br /&gt;What we are talking about here is a serious behavior change how statements like...&lt;br /&gt;&lt;pre&gt;insert /*+ append */ into t values (1)&lt;/pre&gt;...are handled in 11G.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Saving yourself from yourself... not anymore&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In previous Oracle versions, the &lt;span style="font-style:italic;"&gt;append&lt;/span&gt; hint in the cases like above was simply ignored for some good reasons. The hint is being obeyed in 11G.&lt;br /&gt;&lt;br /&gt;To better elaborate what that means, let's take the following example:&lt;br /&gt;&lt;pre&gt;begin&lt;br /&gt; for i in 1 .. 1000&lt;br /&gt; loop&lt;br /&gt;  insert /*+ append */ into t values (i);&lt;br /&gt;  commit;&lt;br /&gt; end loop;&lt;br /&gt;end;&lt;/pre&gt;...and compare how behavior will change in 11G.&lt;br /&gt;&lt;br /&gt;First of all, I've seen legacy code like the above in many places where developers were putting append hint everywhere they could because they heard it is like &lt;span style="font-style:italic;"&gt;fast=true&lt;/span&gt; thing.&lt;br /&gt;&lt;br /&gt;After you upgrade, you might expect the following:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li/&gt;If you don't do commit after each row -- the code will break itself with &lt;span style="font-style:italic;"&gt;ORA-12838: cannot read/modify an object after modifying it in parallel&lt;/span&gt;.&lt;br /&gt;&lt;li/&gt;If you do commit (as in my example) -- you have all chances to grow the table beyond the skies as each direct path insert writes beyond the HWM (in 11G, the above code will grow the table by 1000 blocks each time it is executed).&lt;br /&gt;&lt;li/&gt;Serious locking issues as each direct path insert has to obtain an exclusive lock on the table first.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;The above might produce quite an unpleasant surprises for some of you...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-6434456866649130453?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/insert-append-into-values.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>10</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-6654449729965596193</guid><pubDate>Thu, 15 Jan 2009 21:27:00 +0000</pubDate><atom:updated>2009-01-27T11:43:17.288-05:00</atom:updated><title>Hot backup mode and a little known fact</title><description>Most people are aware of the implication hot backup puts on your redo stream. They'll tell you that on first modification Oracle will dump the entire block image into a redo stream and subsequent writes of the block will go as usual.&lt;br /&gt;&lt;br /&gt;This is only half of the story.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;The example&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's start by creating a simple table I'll use as an example:&lt;br /&gt;&lt;pre&gt;SQL&gt; create table t pctfree 99 pctused 1 as&lt;br /&gt;  2     select level n&lt;br /&gt;  3             from dual&lt;br /&gt;  4             connect by level &lt;= 100;&lt;br /&gt;&lt;br /&gt;Table created.&lt;/pre&gt;Now, let's measure the amount of redo generated if I'm going to update all rows in the table:&lt;br /&gt;&lt;pre&gt;SQL&gt; set autot traceonly stat&lt;br /&gt;SQL&gt; update t set n=n;&lt;br /&gt;&lt;br /&gt;100 rows updated.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Statistics&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;         30  recursive calls&lt;br /&gt;         20  db block gets&lt;br /&gt;         44  consistent gets&lt;br /&gt;         18  physical reads&lt;br /&gt;      12152  redo size&lt;br /&gt;        826  bytes sent via SQL*Net to client&lt;br /&gt;        769  bytes received via SQL*Net from client&lt;br /&gt;          3  SQL*Net roundtrips to/from client&lt;br /&gt;          2  sorts (memory)&lt;br /&gt;          0  sorts (disk)&lt;br /&gt;        100  rows processed&lt;/pre&gt;That's almost 12K of redo. Let's see what happens if I put tablespace into a hot backup mode:&lt;br /&gt;&lt;pre&gt;SQL&gt; alter tablespace users begin backup;&lt;br /&gt;&lt;br /&gt;Tablespace altered.&lt;br /&gt;&lt;br /&gt;SQL&gt; update t set n=n;&lt;br /&gt;&lt;br /&gt;100 rows updated.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Statistics&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;          0  recursive calls&lt;br /&gt;         20  db block gets&lt;br /&gt;         20  consistent gets&lt;br /&gt;          0  physical reads&lt;br /&gt;     152708  redo size&lt;br /&gt;        830  bytes sent via SQL*Net to client&lt;br /&gt;        769  bytes received via SQL*Net from client&lt;br /&gt;          3  SQL*Net roundtrips to/from client&lt;br /&gt;          1  sorts (memory)&lt;br /&gt;          0  sorts (disk)&lt;br /&gt;        100  rows processed&lt;/pre&gt;The amount of redo grew up to almost 150K. Oracle has to dump the entire block image into a redo stream due to fractured blocks problem. If we do the same update second time:&lt;br /&gt;&lt;pre&gt;SQL&gt; update t set n=n;&lt;br /&gt;&lt;br /&gt;100 rows updated.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Statistics&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;          0  recursive calls&lt;br /&gt;         17  db block gets&lt;br /&gt;         20  consistent gets&lt;br /&gt;          0  physical reads&lt;br /&gt;      11980  redo size&lt;br /&gt;        830  bytes sent via SQL*Net to client&lt;br /&gt;        769  bytes received via SQL*Net from client&lt;br /&gt;          3  SQL*Net roundtrips to/from client&lt;br /&gt;          1  sorts (memory)&lt;br /&gt;          0  sorts (disk)&lt;br /&gt;        100  rows processed&lt;/pre&gt;It goes as usual. In fact, and this is kind of amazing, this is what most of the sources available on the internet will tell you about the behavior: the full image of the block gets written only on first modification in order to prevent fractured blocks problem and subsequent modifications will go as usuals.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Only half of the story&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let me continue with my example:&lt;br /&gt;&lt;pre&gt;SQL&gt; alter system flush buffer_cache;&lt;br /&gt;&lt;br /&gt;System altered.&lt;br /&gt;&lt;br /&gt;SQL&gt; update t set n=n;&lt;br /&gt;&lt;br /&gt;100 rows updated.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Statistics&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;          0  recursive calls&lt;br /&gt;         20  db block gets&lt;br /&gt;         20  consistent gets&lt;br /&gt;         20  physical reads&lt;br /&gt;     151916  redo size&lt;br /&gt;        831  bytes sent via SQL*Net to client&lt;br /&gt;        769  bytes received via SQL*Net from client&lt;br /&gt;          3  SQL*Net roundtrips to/from client&lt;br /&gt;          1  sorts (memory)&lt;br /&gt;          0  sorts (disk)&lt;br /&gt;        100  rows processed&lt;/pre&gt;What you see here is the entire image of the blocks written into a redo stream again.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;The true story&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Corrected 27-Jan-2009&lt;/span&gt;: initially I thought that it is a write to disk what matters, however, thanks to Jonathan Lewis for correcting me -- it's a read from disk to buffer. Please see his first comment.&lt;br /&gt;&lt;br /&gt;Don't confuse correlation with causation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-6654449729965596193?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/hot-backup-mode-and-little-known-fact.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-3390151551894897393</guid><pubDate>Thu, 15 Jan 2009 02:44:00 +0000</pubDate><atom:updated>2009-01-16T21:31:20.341-05:00</atom:updated><title>Consistent gets from cache (fastpath)</title><description>There is an intresting optimization introduced in 11G regarding the way blocks are accessed in the buffer cache.&lt;br /&gt;&lt;br /&gt;Let's start with a simple example to demonstrate my point.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;10GR2 example&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;SQL&gt; create table t (&lt;br /&gt; 2   n number,&lt;br /&gt; 3   v varchar2(100),&lt;br /&gt; 4   constraint pk_n primary key (n)&lt;br /&gt; 5  );&lt;br /&gt;&lt;br /&gt;Table created&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; insert into t&lt;br /&gt; 2   select level, rpad('*', 100, '*')&lt;br /&gt; 3    from dual&lt;br /&gt; 4    connect by level &lt;= 1000;&lt;br /&gt;&lt;br /&gt;1000 rows inserted&lt;br /&gt;&lt;br /&gt;SQL&gt; commit;&lt;br /&gt;&lt;br /&gt;Commit complete&lt;/pre&gt;I'm going to create the procedure to return number of consistent gets in my session:&lt;br /&gt;&lt;pre&gt;SQL&gt; create or replace procedure get_cg(&lt;br /&gt; 2   p_cg out number&lt;br /&gt; 3  ) is&lt;br /&gt; 4  begin&lt;br /&gt; 5   select ms.value end into p_cg&lt;br /&gt; 6    from v$mystat ms, v$statname sn&lt;br /&gt; 7    where ms.STATISTIC#=sn.STATISTIC#&lt;br /&gt; 8     and sn.NAME='consistent gets';&lt;br /&gt; 9  end get_cg;&lt;br /&gt;10  /&lt;br /&gt;&lt;br /&gt;Procedure created&lt;/pre&gt;Let's execute a simple select and measure the number of consistent gets:&lt;br /&gt;&lt;pre&gt;SQL&gt; declare&lt;br /&gt; 2   l_cg_b  number;&lt;br /&gt; 3   l_cg_a  number;&lt;br /&gt; 4  begin&lt;br /&gt; 5   get_cg(l_cg_b);&lt;br /&gt; 6   for cur in (select n from (select mod(level, 1000)+1 l from dual connect by&lt;br /&gt; level &lt;= 100000) l, t where t.n=l.l)&lt;br /&gt; 7   loop&lt;br /&gt; 8    null;   &lt;br /&gt; 9   end loop;&lt;br /&gt; 10   get_cg(l_cg_a);&lt;br /&gt; 11   dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));&lt;br /&gt; 12  end;&lt;br /&gt; 13  /&lt;br /&gt;&lt;br /&gt;consistent gets: 100012&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;This is somewhat expected number, we did an index lookup (a unique scan) for all 100K rows returned from our connect by query.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;11GR1 example&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's do the same example in 11G. I've slightly modified the procedure to return two statistics -- &lt;span style="font-style:italic;"&gt;consistent gets&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;consistent gets from cache (fastpath)&lt;/span&gt;:&lt;br /&gt;&lt;pre&gt;SQL&gt; create or replace procedure get_cg(&lt;br /&gt; 2   p_cg out number,&lt;br /&gt; 3   p_cgfp out number&lt;br /&gt; 4  ) is&lt;br /&gt; 5  begin&lt;br /&gt; 6   select max(case sn.NAME when 'consistent gets' then ms.value end),&lt;br /&gt; 7     max(case sn.NAME when 'consistent gets from cache (fastpath)'&lt;br /&gt; then ms.value end)&lt;br /&gt; 8     into p_cg, p_cgfp&lt;br /&gt; 9    from v$mystat ms, v$statname sn&lt;br /&gt;10    where ms.STATISTIC#=sn.STATISTIC#&lt;br /&gt;11     and sn.NAME in ('consistent gets', 'consistent gets from &lt;br /&gt;cache (fastpath)');&lt;br /&gt;12  end get_cg;&lt;br /&gt;13  /&lt;br /&gt;&lt;br /&gt;Procedure created&lt;br /&gt;&lt;br /&gt;SQL&gt; declare&lt;br /&gt; 2   l_cg_b  number;&lt;br /&gt; 3   l_cgfp_b number;&lt;br /&gt; 4   l_cg_a  number;&lt;br /&gt; 5   l_cgfp_a number;&lt;br /&gt; 6  begin&lt;br /&gt; 7   get_cg(l_cg_b, l_cgfp_b);&lt;br /&gt; 8   for cur in (select n from (select mod(level, 1000)+1 l from dual connect by &lt;br /&gt;level &lt;= 100000) l, t where t.n=l.l)&lt;br /&gt; 9   loop&lt;br /&gt; 10    null;&lt;br /&gt; 11   end loop;&lt;br /&gt; 12   get_cg(l_cg_a, l_cgfp_a);&lt;br /&gt; 13   dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));&lt;br /&gt; 14   dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));&lt;br /&gt; 15  end;&lt;br /&gt; 16 &lt;br /&gt;/&lt;br /&gt;consistent gets: 2602&lt;br /&gt;consistent gets (fastpath): 1400&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;That's quite a difference! 2602 consistent gets for 11GR1 versus ~100K for 10GR2. Note a new &lt;span style="font-style:italic;"&gt;consistent gets (fastpath)&lt;/span&gt; statistic appearing in this test case.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;A slightly different example&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To make you a bit more curious, here is a slightly different example executed on 11GR1:&lt;br /&gt;&lt;pre&gt;SQL&gt; declare&lt;br /&gt; 2   l_cg_b  number;&lt;br /&gt; 3   l_cgfp_b number;&lt;br /&gt; 4   l_cg_a  number;&lt;br /&gt; 5   l_cgfp_a number;&lt;br /&gt; 6  begin&lt;br /&gt; 7   get_cg(l_cg_b, l_cgfp_b);&lt;br /&gt; 8   for cur in (select n from (select trunc(dbms_random.value(1, 1000)) l from &lt;br /&gt;dual connect by level &lt;= 100000) l, t where t.n=l.l)&lt;br /&gt; 9   loop&lt;br /&gt; 10    null;&lt;br /&gt; 11   end loop;&lt;br /&gt; 12   get_cg(l_cg_a, l_cgfp_a);&lt;br /&gt; 13   dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));&lt;br /&gt; 14   dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));&lt;br /&gt; 15  end;&lt;br /&gt; 16  /&lt;br /&gt;consistent gets: 101998&lt;br /&gt;consistent gets (fastpath): 2934&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;As soon as we've started reading index blocks in a random fashion, the effect, well, disappeared. Thus far optimization seems to be kicking in only if we repeaditly accessing the same block over and over again.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Another slightly different example&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;10GR2:&lt;br /&gt;&lt;pre&gt;SQL&gt; declare&lt;br /&gt; 2   l_cg_b  number;&lt;br /&gt; 3   l_cg_a  number;&lt;br /&gt; 4  begin&lt;br /&gt; 5   get_cg(l_cg_b);&lt;br /&gt; 6   for cur in (select v from (select mod(level, 1000)+1 l from dual connect by &lt;br /&gt;level &lt;= 100000) l, t where t.n=l.l)&lt;br /&gt; 7   loop&lt;br /&gt; 8    null;&lt;br /&gt; 9   end loop;&lt;br /&gt; 10   get_cg(l_cg_a);&lt;br /&gt; 11   dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));&lt;br /&gt; 12  end;&lt;br /&gt; 13  /&lt;br /&gt;consistent gets: 201001&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;What we are doing here is reading the data from a table itself (not only from index). Again, that's expected number. We had to do an &lt;span style="font-style:italic;"&gt;index unique scan&lt;/span&gt; followed by &lt;span style="font-style:italic;"&gt;table access by rowid&lt;/span&gt;, resulting in two LIOs per returned row. Now, take a look what happens in 11GR1:&lt;br /&gt;&lt;pre&gt;SQL&gt; declare&lt;br /&gt; 2   l_cg_b  number;&lt;br /&gt; 3   l_cgfp_b number;&lt;br /&gt; 4   l_cg_a  number;&lt;br /&gt; 5   l_cgfp_a number;&lt;br /&gt; 6  begin&lt;br /&gt; 7   get_cg(l_cg_b, l_cgfp_b);&lt;br /&gt; 8   for cur in (select v from (select mod(level, 1000)+1 l from dual connect by &lt;br /&gt;level &lt;= 100000) l, t where t.n=l.l)&lt;br /&gt; 9   loop&lt;br /&gt; 10    null;&lt;br /&gt; 11   end loop;&lt;br /&gt; 12   get_cg(l_cg_a, l_cgfp_a);&lt;br /&gt; 13   dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));&lt;br /&gt; 14   dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-&lt;br /&gt;l_cgfp_b));&lt;br /&gt; 15  end;&lt;br /&gt; 16  /&lt;br /&gt;consistent gets: 102602&lt;br /&gt;consistent gets (fastpath): 1400&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;In other words, index blocks were "neutralized", but table blocks were not. Another point is that optimization still works for index blocks even though we accessing table blocks in between.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Works only for index blocks?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;That's really simple to verify using the following example:&lt;br /&gt;&lt;pre&gt;SQL&gt; declare&lt;br /&gt; 2   l_cg_b  number;&lt;br /&gt; 3   l_cgfp_b number;&lt;br /&gt; 4   l_cg_a  number;&lt;br /&gt; 5   l_cgfp_a number;&lt;br /&gt; 6  begin&lt;br /&gt; 7   get_cg(l_cg_b, l_cgfp_b);&lt;br /&gt; 8   for cur in (select v from (select case mod(level, 2) when 0 then &lt;br /&gt;'AAADq1AAEAAAAJHAAA' else 'AAADq1AAEAAAAJHAAB' end l from dual connect by level &lt;= &lt;br /&gt;100000) l, t where t.rowid=l.l)&lt;br /&gt; 9   loop&lt;br /&gt; 10    null;&lt;br /&gt; 11   end loop;&lt;br /&gt; 12   get_cg(l_cg_a, l_cgfp_a);&lt;br /&gt; 13   dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));&lt;br /&gt; 14   dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-&lt;br /&gt;l_cgfp_b));&lt;br /&gt; 15  end;&lt;br /&gt; 16  /&lt;br /&gt;consistent gets: 1000&lt;br /&gt;consistent gets (fastpath): 1000&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;The same example in 10GR2, of course, produces ~100K LIOs, so the optimization seems to be kicking in for table blocks as well. Further, the behavior in the previous example can not be explained by table blocks being randomly distributed since the clustering factor is perfect:&lt;br /&gt;&lt;pre&gt;SQL&gt; select sum(bn_diff)&lt;br /&gt; 2   from (&lt;br /&gt; 3    select case when bn!=lag(bn, 1, bn) over (order by n)then 1 else 0 &lt;br /&gt;end bn_diff&lt;br /&gt; 4     from (select n, dbms_rowid.rowid_block_number(rowid) bn from &lt;br /&gt;t)&lt;br /&gt; 5   );&lt;br /&gt;&lt;br /&gt;SUM(BN_DIFF)&lt;br /&gt;------------&lt;br /&gt;         15&lt;br /&gt;&lt;br /&gt;SQL&gt; select count (distinct dbms_rowid.rowid_block_number(rowid)) from t;&lt;br /&gt;&lt;br /&gt;COUNT(DISTINCTDBMS_ROWID.ROWID&lt;br /&gt;------------------------------&lt;br /&gt;                           16&lt;/pre&gt;Perhaps, there is some overhead associated with &lt;span style="font-style:italic;"&gt;fastpath&lt;/span&gt; and the logic assumes a bad clustering factor by default (which works for majority of cases) thus doesn't tries to do a fastpath for table blocks in our previous example.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Moving further&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Take a look at this:&lt;br /&gt;&lt;pre&gt;SQL&gt; declare&lt;br /&gt; 2   l_cg_b  number;&lt;br /&gt; 3   l_cgfp_b number;&lt;br /&gt; 4   l_cg_a  number;&lt;br /&gt; 5   l_cgfp_a number;&lt;br /&gt; 6  begin&lt;br /&gt; 7   get_cg(l_cg_b, l_cgfp_b);&lt;br /&gt; 8   for cur in (select t1.n n1, t2.n n2 from (select mod(level, 1000)+1 l from &lt;br /&gt;dual connect by level &lt;= 100000) l, t t1, t t2 where t1.n=l.l and t2.n=l.l)&lt;br /&gt; 9   loop&lt;br /&gt; 10    null;&lt;br /&gt; 11   end loop;&lt;br /&gt; 12   get_cg(l_cg_a, l_cgfp_a);&lt;br /&gt; 13   dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));&lt;br /&gt; 14   dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-&lt;br /&gt;l_cgfp_b));&lt;br /&gt; 15  end;&lt;br /&gt; 16  /&lt;br /&gt;consistent gets: 5204&lt;br /&gt;consistent gets (fastpath): 2800&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;I've simply added the same table into a join twice. Though we were accessing exactly the same blocks from the same table, the optimization didn't span both tables.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Doesn't work across fetches&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;SQL&gt; declare&lt;br /&gt; 2   l_cg_b  number;&lt;br /&gt; 3   l_cgfp_b number;&lt;br /&gt; 4   l_cg_a  number;&lt;br /&gt; 5   l_cgfp_a number;&lt;br /&gt; 6  begin&lt;br /&gt; 7   get_cg(l_cg_b, l_cgfp_b);&lt;br /&gt; 8   for cur in (select n from (select 1 l from dual connect by level &lt;= 100000) &lt;br /&gt;l, t where t.n=l.l)&lt;br /&gt; 9   loop&lt;br /&gt; 10    null;&lt;br /&gt; 11   end loop;&lt;br /&gt; 12   get_cg(l_cg_a, l_cgfp_a);&lt;br /&gt; 13   dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));&lt;br /&gt; 14   dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-&lt;br /&gt;l_cgfp_b));&lt;br /&gt; 15  end;&lt;br /&gt; 16  /&lt;br /&gt;consistent gets: 1008&lt;br /&gt;consistent gets (fastpath): 1001&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;While we were accessing the same row (hence, the block), we got 1K LIOs. It is explained by 11G doing implicit array fetches by 100 rows at a time and also points out at the fact that &lt;span style="font-style:italic;"&gt;consistent gets (fastpath)&lt;/span&gt; can not operate across fetches (though the same rule applies for "regular" consistent gets as well).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;In a nutshell&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Overall, this looks like a nice feature. The downside, of course, is less predictability. 11G seems to have introduces these little tricks and features all over the place, which can really help you under certain circumstances but can also make your head spin while trying to figure out for "magical" performance differences...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Added 16-Jan-2009: &lt;/span&gt;Take a look at &lt;a href="http://jonathanlewis.wordpress.com/2009/01/16/concurrency/"&gt;Concurrency&lt;/a&gt; post by Jonathan Lewis for some more details.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-3390151551894897393?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/consistent-gets-from-cache-fastpath.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-8129309559253737069</guid><pubDate>Mon, 12 Jan 2009 06:21:00 +0000</pubDate><atom:updated>2009-01-12T02:12:27.103-05:00</atom:updated><title>Airlines...</title><description>Reliability considered to be a table stake in the airlines industry. Something you have to have if you want to play the planes game at all.&lt;br /&gt;&lt;br /&gt;I had a flight to Toronto a couple of days ago which I can describe as very fascinating and enlightening.&lt;br /&gt;&lt;br /&gt;My flight was scheduled at 8AM and I arrived to Ottawa International airport around 7:10AM. After checking in and passing required security checks I was waiting to board my plane. Almost immediately after I begun to wait, airport intercom said:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style:italic;"&gt;- If there is anyone willing to board an earlier flight to Toronto, you can do so.&lt;/span&gt;&lt;/blockquote&gt;Well, fine. It is better to be in Pearson earlier than waiting in Ottawa. So I boarded a flight which was supposed to take off 25 minutes earlier.&lt;br /&gt;&lt;br /&gt;As soon as everyone has boarded, they closed the doors and started to move the plane. When, suddenly, &lt;span style="font-style:italic;"&gt;boom!&lt;/span&gt; I'm observing &lt;span style="font-style:italic;"&gt;a lot&lt;/span&gt; of smoke coming from a back of the plane. The plane stops very hard, almost instantly. I can smell partially burned jet fuel. A minute later a fire extinguisher machine comes and stops a few meters from us doing nothing. I'm watching people becoming more and more curious (is the plane on fire? is it time to jump off the plane to avoid being cooked?). The pilot says:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style:italic;"&gt;- It seems that we've got some electronics snag here, let me try to reboot the on-board computer to see if that fixes the problem.&lt;/span&gt;&lt;/blockquote&gt;Now, that's cool. Ummm... Windows embedded or what? A couple of minutes later pilot comes again:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style:italic;"&gt;- It looks like our problem still persist, so I'm going to power off the entire plane and then put the power back on, let's hope it will fix our problem, &lt;span style="font-weight:bold;"&gt;because usually it does&lt;/span&gt;.&lt;/span&gt;&lt;/blockquote&gt;If there is a moment when you start thinking whether you should be sitting on the plane at all, it seems to be it. Some people has already stood up and clearly wandering how they can get off the plane. This is really a time when stewardess can help everyone:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style:italic;"&gt;- We were trying to figure out if there is a possibility to board the other plane but, unfortunately, all morning flights are extremely busy.&lt;/span&gt;&lt;/blockquote&gt;What a perfect argument to keep the plane's doors closed. The plane backs to life as pilots recycled the power...&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style:italic;"&gt;- It looks like our problem was solved, so we are going to proceed for takeoff...&lt;/span&gt;&lt;/blockquote&gt;Why do I have that strange feeling that it would better be not solved so we can board off? Anyway, they are starting to move us again... When, suddenly, &lt;span style="font-style:italic;"&gt;boom!&lt;/span&gt; I'm observing a lot of smoke coming from a back of the plane. Here we go again, a fire extinguisher machine comes. That's enough, who can show me the exit? The pilot:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style:italic;"&gt;- It appears that our problem reoccurred again, the brakes applied without anything telling them to apply.&lt;/span&gt;&lt;/blockquote&gt;Let me think what would happen had the brakes applied on a runway itself... not a rocket since.&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style:italic;"&gt;- We are going to call the maintenance to see if they can fix the problem, it should take around 25-30 minutes...&lt;/span&gt;&lt;/blockquote&gt;Right, while they are calling the maintenance, I'm watching my flight taking off... Some guy comes in and disappears under the plane.&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style:italic;"&gt;- It looks like maintenance fixed the problem and after they'll be done with a paperwork, we can proceed for takeoff...&lt;/span&gt;&lt;/blockquote&gt;Read &lt;a href="http://www.amazon.com/Worst-First-Continentals-Remarkable-Comeback/dp/0471356522"&gt;the book&lt;/a&gt;, as Gordon says, if you don't do scheduled maintenance then the plane will schedule one for you. Trying to save costs? I have no doubts you can make service to be so cheap that no one wants to use it. Well, at least they gonna miss one of their KPIs, arrival on time, so there is a hope someone is going to do something about it...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-8129309559253737069?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/airlines.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-3151972004631731731</guid><pubDate>Sun, 11 Jan 2009 04:45:00 +0000</pubDate><atom:updated>2009-01-11T01:03:33.331-05:00</atom:updated><title>Using DBMS_RULE_ADM to evaluate dynamic predicates</title><description>The best comments are the ones which inspire you to try and learn something new. I received one of these to my recent blog post about &lt;a href="http://afatkulin.blogspot.com/2009/01/scaling-dynamic-sql.html"&gt;Scaling dynamic SQL&lt;/a&gt;. Stas, one of my good friends, asked whether I have tried &lt;span style="font-style:italic;"&gt;DBMS_RULE_ADM&lt;/span&gt; package for this task. Well, now I did. Stas, you should be careful what you ask next time, you might get it :)&lt;br /&gt;&lt;br /&gt;The first thing we need to do is create an evaluation context:&lt;br /&gt;&lt;pre&gt;SQL&gt; declare&lt;br /&gt;  2   l_vtl sys.re$variable_type_list;&lt;br /&gt;  3  begin&lt;br /&gt;  4   l_vtl:=sys.re$variable_type_list(&lt;br /&gt;  5    sys.re$variable_type('age', 'number', null, null),&lt;br /&gt;  6    sys.re$variable_type('balance', 'number', null, null),&lt;br /&gt;  7    sys.re$variable_type('birthdate', 'date', null, null),&lt;br /&gt;  8    sys.re$variable_type('today', 'date', null, null)&lt;br /&gt;  9   );&lt;br /&gt; 10  &lt;br /&gt; 11   dbms_rule_adm.create_evaluation_context(&lt;br /&gt; 12    evaluation_context_name =&gt; 'adverts_ctx',&lt;br /&gt; 13    variable_types =&gt; l_vtl&lt;br /&gt; 14   );&lt;br /&gt; 15  end;&lt;br /&gt; 16  /&lt;br /&gt; &lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;We can create the rules itself now:&lt;br /&gt;&lt;pre&gt;SQL&gt; declare&lt;br /&gt;  2   l_rules dbms_sql.Varchar2_Table;&lt;br /&gt;  3  begin&lt;br /&gt;  4   dbms_rule_adm.create_rule_set(&lt;br /&gt;  5    rule_set_name =&gt; 'adverts'&lt;br /&gt;  6   );&lt;br /&gt;  7  &lt;br /&gt;  8   l_rules(1):=':age between 16 and 18';&lt;br /&gt;  9   l_rules(2):=':birthdate = :today or :balance &gt; 1000';&lt;br /&gt; 10   l_rules(3):=':balance between 100 and 200 and :age &gt; 18';&lt;br /&gt; 11  &lt;br /&gt; 12   for i in 1 .. 3&lt;br /&gt; 13   loop&lt;br /&gt; 14    dbms_rule_adm.create_rule(&lt;br /&gt; 15     rule_name =&gt; 'rule_'||to_char(i),&lt;br /&gt; 16     condition =&gt; l_rules(i),&lt;br /&gt; 17     evaluation_context =&gt; 'adverts_ctx'&lt;br /&gt; 18    );&lt;br /&gt; 19  &lt;br /&gt; 20    dbms_rule_adm.add_rule(&lt;br /&gt; 21     rule_name =&gt; 'rule_'||to_char(i),&lt;br /&gt; 22     rule_set_name =&gt; 'adverts'&lt;br /&gt; 23    );&lt;br /&gt; 24   end loop;&lt;br /&gt; 25  end;&lt;br /&gt; 26  /&lt;br /&gt; &lt;br /&gt;PL/SQL procedure successfully completed&lt;/pre&gt;Since there will be no SQL executed, I've decided to simply measure wall clock time, given that we can obtain latch wait information from the extended SQL trace:&lt;br /&gt;&lt;pre&gt;SQL&gt; create table results&lt;br /&gt;  2  (&lt;br /&gt;  3   sid number,&lt;br /&gt;  4   cs number&lt;br /&gt;  5  );&lt;br /&gt; &lt;br /&gt;Table created&lt;/pre&gt;Here is the test procedure itself:&lt;br /&gt;&lt;pre&gt;SQL&gt; create or replace procedure test_dra(&lt;br /&gt;  2   p_i in number&lt;br /&gt;  3  ) is&lt;br /&gt;  4   l_age  sys.re$variable_value;&lt;br /&gt;  5   l_balance sys.re$variable_value;&lt;br /&gt;  6   l_birthdate sys.re$variable_value;&lt;br /&gt;  7   l_today  sys.re$variable_value;&lt;br /&gt;  8   l_vvl  sys.re$variable_value_list;&lt;br /&gt;  9   l_true  sys.re$rule_hit_list;&lt;br /&gt; 10   l_maybe  sys.re$rule_hit_list;&lt;br /&gt; 11   l_time  number;&lt;br /&gt; 12  begin&lt;br /&gt; 13   l_age:=sys.re$variable_value('age', anydata.convertnumber(16));&lt;br /&gt; 14   l_balance:=sys.re$variable_value('balance', anydata.convertnumber(1500));&lt;br /&gt; 15   l_birthdate:=sys.re$variable_value('birthdate', anydata.convertdate(to_date('20090101', 'yyyymmdd')));&lt;br /&gt; 16   l_today:=sys.re$variable_value('today', anydata.convertdate(trunc(sysdate)));&lt;br /&gt; 17  &lt;br /&gt; 18   l_vvl:=sys.re$variable_value_list(l_age, l_balance, l_birthdate, l_today);&lt;br /&gt; 19  &lt;br /&gt; 20   l_true:=sys.re$rule_hit_list();&lt;br /&gt; 21   l_maybe:=sys.re$rule_hit_list();&lt;br /&gt; 22  &lt;br /&gt; 23   l_time:=dbms_utility.get_time;&lt;br /&gt; 24   dbms_monitor.session_trace_enable(waits =&gt; true, binds =&gt; false);&lt;br /&gt; 25  &lt;br /&gt; 26   for i in 1 .. p_i&lt;br /&gt; 27   loop&lt;br /&gt; 28    dbms_rule.evaluate(&lt;br /&gt; 29     rule_set_name =&gt; 'adverts',&lt;br /&gt; 30     evaluation_context =&gt; 'adverts_ctx',&lt;br /&gt; 31     variable_values =&gt; l_vvl,&lt;br /&gt; 32     stop_on_first_hit =&gt; false,&lt;br /&gt; 33     true_rules =&gt; l_true,&lt;br /&gt; 34     maybe_rules =&gt; l_maybe&lt;br /&gt; 35    );&lt;br /&gt; 36   end loop;&lt;br /&gt; 37  &lt;br /&gt; 38   dbms_monitor.session_trace_disable;&lt;br /&gt; 39   insert into results values (sys_context('userenv', 'sid'), dbms_utility.get_time-l_time);&lt;br /&gt; 40  end test_dra;&lt;br /&gt; 41  /&lt;br /&gt; &lt;br /&gt;Procedure created&lt;/pre&gt;As before, I've used four parallel jobs with 100000 iterations each. Here are the results I've got:&lt;br /&gt;&lt;pre&gt;SQL&gt; select * from results;&lt;br /&gt; &lt;br /&gt;       SID         CS&lt;br /&gt;---------- ----------&lt;br /&gt;       134       3472&lt;br /&gt;       131       3412&lt;br /&gt;       133       3205&lt;br /&gt;       132       3358 &lt;/pre&gt;And tkprof of one of the jobs:&lt;br /&gt;&lt;pre&gt;OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS&lt;br /&gt;&lt;br /&gt;call     count       cpu    elapsed       disk      query    current        rows&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;Parse        0      0.00       0.00          0          0          0           0&lt;br /&gt;Execute      0      0.00       0.00          0          0          0           0&lt;br /&gt;Fetch        0      0.00       0.00          0          0          0           0&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;total        0      0.00       0.00          0          0          0           0&lt;br /&gt;&lt;br /&gt;Misses in library cache during parse: 0&lt;br /&gt;&lt;br /&gt;Elapsed times include waiting on following events:&lt;br /&gt;  Event waited on                             Times   Max. Wait  Total Waited&lt;br /&gt;  ----------------------------------------   Waited  ----------  ------------&lt;br /&gt;  library cache: mutex X                        769        0.00          0.04&lt;br /&gt;  latch free                                     39        0.00          0.00&lt;/pre&gt;Which is, well, about 6.5 times slower given wall clock time compared to the cached results using package from my previous blog post:&lt;br /&gt;&lt;pre&gt;OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS&lt;br /&gt;&lt;br /&gt;call     count       cpu    elapsed       disk      query    current        rows&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;Parse        3      0.00       0.00          0          0          0           0&lt;br /&gt;Execute 300000      4.85       4.97          0          0          0      300000&lt;br /&gt;Fetch        0      0.00       0.00          0          0          0           0&lt;br /&gt;------- ------  -------- ---------- ---------- ---------- ----------  ----------&lt;br /&gt;total   300003      4.85       4.97          0          0          0      300000&lt;br /&gt;&lt;br /&gt;Misses in library cache during parse: 0&lt;br /&gt;&lt;br /&gt;Elapsed times include waiting on following events:&lt;br /&gt;  Event waited on                             Times   Max. Wait  Total Waited&lt;br /&gt;  ----------------------------------------   Waited  ----------  ------------&lt;br /&gt;  library cache: mutex X                        209        0.00          0.00&lt;br /&gt;  cursor: pin S wait on X                        98        0.01          1.04&lt;br /&gt;  cursor: pin S                                 360        0.00          0.00&lt;/pre&gt;The interesting thing to note here is that &lt;span style="font-style:italic;"&gt;DBMS_RULE_ADM&lt;/span&gt; seems to have a better &lt;span style="font-style:italic;"&gt;scalability&lt;/span&gt;, however, I can't really say at what point it could be justified (if at all) given a pretty hefty wall clock time difference.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15440695-3151972004631731731?l=afatkulin.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://afatkulin.blogspot.com/2009/01/using-dbmsruleadm-to-evaluate-dynamic.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item></channel></rss>

