<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" version="2.0"><channel><atom:id>tag:blogger.com,1999:blog-15440695</atom:id><lastBuildDate>Fri, 17 May 2013 07:43:25 +0000</lastBuildDate><title>Alex Fatkulin's Blog</title><description /><link>http://afatkulin.blogspot.com/</link><managingEditor>noreply@blogger.com (Alex Fatkulin)</managingEditor><generator>Blogger</generator><openSearch:totalResults>153</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/TheQUADROBlog" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="thequadroblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-1241550679239294579</guid><pubDate>Thu, 14 Mar 2013 02:42:00 +0000</pubDate><atom:updated>2013-03-13T22:42:16.308-04:00</atom:updated><title>Parallel unfriendly</title><description>Take a look at the following Parallel section of a SQL Monitor report:
&lt;br/&gt;&lt;br/&gt;
&lt;img src="https://lh3.googleusercontent.com/-YK3sjWa9980/UUEjrrP_LKI/AAAAAAAAA0w/UJDSw08wAKg/s1152/oracle_median.png"/&gt;
Any query which produces such a report won't care about how much parallel you're running because virtually all the time is spent by the query coordinator (which is a serial process) being busy.
&lt;br/&gt;&lt;br/&gt;
In this case the query in question is quite simple:
&lt;pre&gt;select /*+ parallel(t,8) */ median(basket_amount) from whs.fact_sale t&lt;/pre&gt;
The reason it behaves the way it does has everything to do with how Oracle executes it:
&lt;pre&gt;Execution Plan
----------------------------------------------------------
Plan hash value: 712547042

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |     1 |     4 |   110K  (3)| 00:00:03 |       |       |        |      |            |
|   1 |  SORT GROUP BY                |            |     1 |     4 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR              |            |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10000   |   724M|  2763M|   110K  (3)| 00:00:03 |       |       |  Q1,00 | P-&gt;S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR         |            |   724M|  2763M|   110K  (3)| 00:00:03 |     1 |1048575|  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS STORAGE FULL| FACT_SALE  |   724M|  2763M|   110K  (3)| 00:00:03 |     1 |1048575|  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------&lt;/pre&gt;
Each parallel query slave will gets it's own chunk of the table to read from and then simply send data back to the coordinator. The coordinator will then have to deal with all this data by sorting more than 700M rows which, of course, won't be particularly fast. In this sense &lt;i&gt;median&lt;/i&gt; poses an interesting problem since Oracle can't calculate (or, rather, &lt;i&gt;discover&lt;/i&gt;) it without having access to the entire data set and query coordinator is the only process which can do it.
&lt;br/&gt;&lt;br/&gt;
So what do you do when you get impacted by a particular choice of algorithm implemented by Oracle? One way to deal with it is to see whether you can trade one set of problem for another, in case the alternative can be executed in a better way. In this particular case the fact table contains the sale transactions for a particular store chain. While there are many different ways to spent money, the number of distinct spending amounts should be relatively low compared to the number of rows we have in the table and in such a case we can calculate the median in a different way.
&lt;br/&gt;&lt;br/&gt;
What we can do instead is count how many occurrences of each spending we have and, when sorted by the spending amount, that will give us a compressed form of the raw data which still retains all the information required to find a &lt;i&gt;median&lt;/i&gt;. Let's say you have a table with the following data:
&lt;pre&gt;SQL&gt; select n from z_t;
 
         N
----------
         1
         1
         2
         3
         3
         5
         7
 
7 rows selected&lt;/pre&gt;The first step is to find how many occurrences of each value do we have:
&lt;pre&gt;SQL&gt; select n, count(*) cnt
  2   from z_t
  3   group by n;
 
         N        CNT
---------- ----------
         1          2
         2          1
         5          1
         3          2
         7          1&lt;/pre&gt;If the number of distinct values is relatively low, the group by will be able to collapse the result set enough as to make subsequent work to be not very significant as well as do it in a very parallel friendly way. The next step is to calculate the cardinality of the data set, at which places we have each distinct value as well as how many values are there:
&lt;pre&gt;SQL&gt; select n, lag(running_sum, 1, 0) over (order by n) prev_running_sum, running_sum, total_row_count
  2  from (
  3  select n,
  4   sum(cnt) over (order by n) running_sum,
  5   sum(cnt) over () total_row_count
  6  from (
  7  select n, count(*) cnt
  8   from z_t
  9   group by n
 10  ));
 
         N PREV_RUNNING_SUM RUNNING_SUM TOTAL_ROW_COUNT
---------- ---------------- ----------- ---------------
         1                0           2               7
         2                2           3               7
         3                3           5               7
         5                5           6               7
         7                6           7               7&lt;/pre&gt;
So what the above tells us is we have two 1s, followed by a single 2, followed by two 3s and so on. Because we have seven elements in our data set, we know that the median will be the item number four which we can now easily discover:
&lt;pre&gt;SQL&gt; select avg(n) from (
  2  select n, lag(value_begin, 1, 0) over (order by n) prev_value_begin, value_begin, total_row_count
  3  from (
  4  select n,
  5   sum(cnt) over (order by n) value_begin,
  6   sum(cnt) over () total_row_count
  7  from (
  8  select n, count(*) cnt
  9   from z_t
 10   group by n
 11  ))) where total_row_count/2 between prev_value_begin and value_begin;
 
    AVG(N)
----------
         3&lt;/pre&gt;The &lt;i&gt;avg&lt;/i&gt; is there for a case where we have an even number of elements since the median in this case equals to a mean value of two values in the middle.
&lt;br/&gt;&lt;br/&gt;
Our new real query will look like this:
&lt;pre&gt;select avg(n) from (
select n, lag(value_begin, 1, 0) over (order by n) prev_value_begin, value_begin, total_row_count
from (
select n,
 sum(cnt) over (order by n) value_begin,
 sum(cnt) over () total_row_count
from (
select /*+ parallel(t,8) */ basket_amount n, count(*) cnt
 from whs.fact_sale t
 group by basket_amount
))) where total_row_count/2 between prev_value_begin and value_begin;&lt;/pre&gt;
So what does a group by and a set of analytic functions is able to bring to a table? Let's take a look:
&lt;br/&gt;&lt;br/&gt;
&lt;img src="https://lh6.googleusercontent.com/-Y_W6akIR_sk/UUE2KSeX_VI/AAAAAAAAA1A/30KCUmwJaZE/s1152/analytical_median.png"/&gt;
The total execution time has dropped from almost 26 minutes down to 28 seconds. Moreover, the workload is now much more skewed towards parallel query slaves, which is exactly what we want to see. Of course, the trick only works if the group by is able to collapse the data sufficiently enough.</description><link>http://afatkulin.blogspot.com/2013/03/parallel-unfriendly.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lh3.googleusercontent.com/-YK3sjWa9980/UUEjrrP_LKI/AAAAAAAAA0w/UJDSw08wAKg/s72-c/oracle_median.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-5116087954986656564</guid><pubDate>Tue, 26 Feb 2013 17:38:00 +0000</pubDate><atom:updated>2013-02-26T12:53:02.572-05:00</atom:updated><title>In-memory PQ and physical reads</title><description>In my &lt;a href="http://afatkulin.blogspot.ca/2013/02/does-in-memory-pq-work-with.html"&gt;previous post&lt;/a&gt; I've demonstrated how in-memory PQ can access the table directly from the buffer cache even when you're using manual DOP.
&lt;br/&gt;&lt;br/&gt;
One interesting question, however, is what happens when PQ slave needs to read some blocks from disk given that object has been qualified for in-memory (cached) access? Would the slave do it using direct or buffered I/O?
&lt;br/&gt;&lt;br/&gt;
The answer becomes somewhat clear once you realize that in-memory PQ is enabled by simply utilizing buffered reads. Since direct path reads can not take advantage of any blocks stored in the buffer cache (local or remote), trying to do direct path reads will defeat the whole point of in-memory PQ.
&lt;br/&gt;&lt;br/&gt;
To demonstrate the point here is the excerpt from a &lt;i&gt;tkprof&lt;/i&gt; output for one of the parallel query slaves:
&lt;pre&gt;Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  SORT AGGREGATE (cr=0 pr=0 pw=0 time=0 us)
         0          0          0   PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
         0          0          0    PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
         1          1          1     SORT AGGREGATE (cr=62838 pr=62500 pw=0 time=7492751 us)
     57696      62500      72120      PX BLOCK ITERATOR (cr=62838 pr=62500 pw=0 time=24528381 us cost=18846 size=0 card=500000)
     57696      62500      72120       TABLE ACCESS FULL Z_TEST (cr=62838 pr=62500 pw=0 time=23944184 us cost=18846 size=0 card=500000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Execution Msg                         128        0.19          1.06
  db file scattered read                       4546        0.16         43.99
  latch free                                      1        0.00          0.00
  latch: cache buffers lru chain                 31        0.01          0.03
  latch: cache buffers chains                     3        0.00          0.00
  latch: object queue header operation            4        0.00          0.00&lt;/pre&gt;
Note that the slave waited on &lt;i&gt;db file scattered read&lt;/i&gt; event which is nothing else but buffered multiblock reads. If you were to run the same test on the Exadata platform you would see &lt;i&gt;cell multiblock physical read&lt;/i&gt; event instead, given that in-memory PQ did get utilized. There are a couple of consequences for this:
&lt;ul&gt;
&lt;li&gt;There is no need to do object level checkpoint. This makes in-memory PQ somewhat more friendly to be running in OLTP environment since it doesn't need to flush any dirty buffers to disk.&lt;/li&gt;
&lt;li&gt;If you running on the Exadata platform, none of the offloading will happen, even if you have to read the significant portion of the table from disk.&lt;/li&gt;
&lt;/ul&gt;&lt;br/&gt;
On another note it looks like the things came full circle. Serial sessions are now able to utilize direct path reads while parallel query slaves are able to do buffered I/O.
&lt;br/&gt;&lt;br/&gt;
&lt;i&gt;All tests were done on 11.2.0.3 (both Exadata and non-Exadata).&lt;/i&gt;</description><link>http://afatkulin.blogspot.com/2013/02/in-memory-pq-and-physical-reads.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-5443019742848231140</guid><pubDate>Mon, 18 Feb 2013 20:18:00 +0000</pubDate><atom:updated>2013-02-18T15:20:16.954-05:00</atom:updated><title>Does in-memory PQ work with PARALLEL_DEGREE_POLICY=MANUAL?</title><description>In-memory parallel execution seems to be gaining popularity especially among people running x2-8 and x3-8 Exadata systems or any other system that have large amounts of memory capable of caching lots of data.
&lt;br/&gt;&lt;br/&gt;
Oracle documentation &lt;a href="http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel002.htm#BEICFFGB"&gt;suggests&lt;/a&gt; that in order to utilize in-memory PQ, &lt;i&gt;parallel_degree_policy&lt;/i&gt; needs to be set to &lt;i&gt;auto&lt;/i&gt;.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;_parallel_cluster_cache_policy&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
One of the parameters influenced by &lt;i&gt;parallel_degree_policy&lt;/i&gt; is &lt;i&gt;_parallel_cluster_cache_policy&lt;/i&gt;. When using Auto DOP &lt;i&gt;_parallel_cluster_cache_policy&lt;/i&gt; will be set to &lt;i&gt;cached&lt;/i&gt;. The question then becomes what happens if we set &lt;i&gt;_parallel_cluster_cache_policy=cached&lt;/i&gt; while still keeping Manual DOP? Will the system use in-memory PQ?
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Test table&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
Below is a test table setup:
&lt;pre&gt;SQL&gt; create table z_test tablespace data as
        select level n, rpad('*', 4000, '*') v
                from dual
                connect by level &lt;= 500000;

Table created.

SQL&gt; alter table z_test add constraint pk_z_test primary key (n);

Table altered.

SQL&gt; select bytes/power(1024,2) mb
        from user_segments
        where segment_name='Z_TEST';

        MB
----------
      3968

SQL&gt; exec dbms_stats.gather_table_stats(user, 'z_test');

PL/SQL procedure successfully completed.&lt;/pre&gt;
The instance is running with 12G buffer cache so it'll have no problems fully caching the above table. All tests were done on my in-house test lab with Oracle 11.2.0.3.3 running inside a Linux VM.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Classic PQ #1&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
Without setting any additional parameters PQ behave the way it always did -- by utilizing direct path reads directly to the process' memory:
&lt;pre&gt;SQL&gt; set timing on
SQL&gt; set autot trace exp stat
SQL&gt; select /*+ parallel(8) full(z_test) */ count(*) from z_test;

Elapsed: 00:00:02.86

Execution Plan
----------------------------------------------------------
Plan hash value: 2128527892

--------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 | 18846   (1)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |            |          |  Q1,00 | P-&gt;S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |   500K| 18846   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| Z_TEST   |   500K| 18846   (1)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 8 because of hint


Statistics
----------------------------------------------------------
         25  recursive calls
          0  db block gets
     500525  consistent gets
     500000  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed&lt;/pre&gt;
It took us about 2.86 seconds to full scan the table which equals 1387MB/s throughput (my test lab storage setup is described &lt;a href="http://afatkulin.blogspot.ca/2012/10/zfs-home-storage-network-at-10gbe.html"&gt;here&lt;/a&gt;). The above clearly shows that we had to do physical reads in order to access the entire table.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Caching the table&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
Of course, before testing the in-memory PQ, we need to make sure that our entire table sits in the buffer cache. The easiest way to do it is perform an FTS on the table using an index:
&lt;pre&gt;SQL&gt; select /*+ index(z_test,pk_z_test) */ v from z_test;

500000 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 579016438

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   500K|  1907M|   501K  (1)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID| Z_TEST    |   500K|  1907M|   501K  (1)| 00:00:03 |
|   2 |   INDEX FULL SCAN           | PK_Z_TEST |   500K|       |  1052   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     534311  consistent gets
     501105  physical reads
          0  redo size
 2021185355  bytes sent via SQL*Net to client
     367187  bytes received via SQL*Net from client
      33335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     500000  rows processed
&lt;/pre&gt;
Let's check to make sure all table blocks are in the buffer cache:
&lt;pre&gt;SQL&gt; set autot off
SQL&gt; select count(*)
        from v$bh
        where objd= (select data_object_id from user_objects where object_name='Z_TEST')
                and status='xcur';

  COUNT(*)
----------
    500001
&lt;/pre&gt;
Now we're good to go!
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Classic PQ #2&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
Even with the table entirely cached we still get it using physical reads when utilizing classic PQ -- as it should be:
&lt;pre&gt;SQL&gt; set autot trace exp stat
SQL&gt; select /*+ parallel(8) full(z_test) */ count(*) from z_test;

Elapsed: 00:00:02.83

Execution Plan
----------------------------------------------------------
Plan hash value: 2128527892

--------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 | 18846   (1)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |            |          |  Q1,00 | P-&gt;S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |   500K| 18846   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| Z_TEST   |   500K| 18846   (1)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 8 because of hint


Statistics
----------------------------------------------------------
         24  recursive calls
          0  db block gets
     500525  consistent gets
     500000  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
&lt;/pre&gt;
&lt;b&gt;In-memory PQ&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
Let's flip the parameter responsible for in-memory PQ (while still keeping &lt;i&gt;parallel_degree_policy=manual&lt;/i&gt;) and see what happens:
&lt;pre&gt;SQL&gt; alter session set "_parallel_cluster_cache_policy"=cached;

Session altered.

Elapsed: 00:00:00.01
SQL&gt; select /*+ parallel(8) full(z_test) */ count(*) from z_test;

Elapsed: 00:00:00.36

Execution Plan
----------------------------------------------------------
Plan hash value: 2128527892

--------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 | 18846   (1)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |            |          |  Q1,00 | P-&gt;S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |   500K| 18846   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| Z_TEST   |   500K| 18846   (1)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 8 because of hint


Statistics
----------------------------------------------------------
         24  recursive calls
          0  db block gets
     502709  consistent gets
          0  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed&lt;/pre&gt;
For you see -- the entire table got read from the buffer cache this time and &lt;i&gt;much faster&lt;/i&gt;! The fact that we did zero physical IOs shows in-memory PQ kicking in.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt; Conclusion &lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
It is possible to use in-memory PQ with Manual DOP by setting &lt;i&gt;_parallel_cluster_cache_policy=cached&lt;/i&gt;. Of course, always consult with Oracle support before flipping any of the underscore parameters.</description><link>http://afatkulin.blogspot.com/2013/02/does-in-memory-pq-work-with.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-1619637360283814406</guid><pubDate>Mon, 28 Jan 2013 15:37:00 +0000</pubDate><atom:updated>2013-01-28T10:37:33.034-05:00</atom:updated><title>GoldenGate and transient PK updates</title><description>The problem of transient PK updates is well known and comes from the fact that pretty much every replication solution on the market applies changes using row-by-row approach.
&lt;br/&gt;&lt;br/&gt;
To quickly recap the problem, if you have a table like:

&lt;pre&gt;SQL&gt; create table tpk (n number primary key);
 
Table created
 
SQL&gt; insert into tpk values (1);
 
1 row inserted
 
SQL&gt; insert into tpk values (2);
 
1 row inserted
 
SQL&gt; commit;
 
Commit complete&lt;/pre&gt;
Then executing the following statement...
&lt;pre&gt;update tpk set n=n+1&lt;/pre&gt;
...will result in a transient PK problem since the replication solution will have to decompose it into the following two statements:
&lt;pre&gt;update tpk set n=2 where n=1;
update tpk set n=3 where n=2;&lt;/pre&gt;
There are two immediate (and major) problems with the above statements. The first problem is that we can't execute the first statement without violating the primary key constraint. Another problem is, even if we somehow could execute the first statement, the second statement will result in updating both rows since they now have the same value!
&lt;br/&gt;&lt;br/&gt;
Oracle Streams historically &lt;a href="http://wedostreams.blogspot.ca/2009/04/oracle-streams-can-do-things-mortals.html"&gt;dealt with that problem&lt;/a&gt; using internal mechanism which you &lt;a href="http://www.pythian.com/blog/oracle-delete-and-re-insert-row-in-the-same-statement"&gt;could leverage&lt;/a&gt; by executing a specially constructed LCR. Other (third-party) replication solution were pretty much out of luck and had to rely on elaborate tricks in order to work around the problem.
&lt;br/&gt;&lt;br/&gt;This is all about to change.
&lt;br/&gt;&lt;br/&gt;
The necessity to better integrate GoldenGate and bring it feature set up has driven quite a bit of exciting innovation. One of these innovations solves the transient PK problem not only for GoldenGate but for everybody else.
&lt;br/&gt;&lt;br/&gt;&lt;b&gt;dbms_xstream_gg package&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
The above package has been available at least since 11.2.0.2 and has two procedures which are directly relevant to the problem described above. I'm talking about &lt;i&gt;enable_tdup_workspace&lt;/i&gt; and &lt;i&gt;disable_tdup_workspace&lt;/i&gt;. Here is a quick demonstration of how they work:
&lt;pre&gt;SQL&gt; --this will result in PK violation
SQL&gt; update tpk set n=2 where n=1;
 
update tpk set n=2 where n=1
 
ORA-00001: unique constraint (ROOT.SYS_C005031) violated
 
SQL&gt; exec dbms_xstream_gg.enable_tdup_workspace;
 
PL/SQL procedure successfully completed
 
SQL&gt; --this is now works!
SQL&gt; update tpk set n=2 where n=1;
 
1 row updated
 
SQL&gt; update tpk set n=3 where n=2;
 
1 row updated
 
SQL&gt; exec dbms_xstream_gg.disable_tdup_workspace;
 
PL/SQL procedure successfully completed
 
SQL&gt; commit;
 
Commit complete&lt;/pre&gt;
As you can see, the procedure allows us to avoid classical transient PK problem! Indeed, that's what GoldenGate uses internally to avoid getting in troubles as well. The implementation seems to be leveraging the same &lt;i&gt;delete+insert&lt;/i&gt; trick Oracle Streams did:
&lt;pre&gt;SQL&gt; exec dbms_xstream_gg.enable_tdup_workspace;
 
PL/SQL procedure successfully completed
 
SQL&gt; select n, rowid from tpk;
 
         N ROWID
---------- ------------------
         1 AAAO25AAFAAClWFAAA
         2 AAAO25AAFAAClWFAAB
 
SQL&gt; update tpk set n=2 where n=1;
 
1 row updated
 
SQL&gt; select n, rowid from tpk;
 
         N ROWID
---------- ------------------
         2 AAAO25AAFAAClWFAAB
 
SQL&gt; update tpk set n=3 where n=2;
 
1 row updated
 
SQL&gt; select n, rowid from tpk;
 
         N ROWID
---------- ------------------
         2 AAAO25AAFAAClWFAAA
         3 AAAO25AAFAAClWFAAB&lt;/pre&gt;
Note how the row mysteriously disappears after the first update and then suddenly comes back after the second one?
&lt;br/&gt;&lt;br/&gt;
I think anybody who were into any sort of replication and its problems will find this to be one of the most significant new features made available. The only caveat is that the above package is not documented so anyone thinking about leveraging it needs to carefully think about the way it behaves.</description><link>http://afatkulin.blogspot.com/2013/01/goldengate-and-transient-pk-updates.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-7324299741975837670</guid><pubDate>Thu, 20 Dec 2012 21:36:00 +0000</pubDate><atom:updated>2012-12-20T16:36:24.163-05:00</atom:updated><title>Oracle GoldenGate Sequence Replication</title><description>When using Oracle GoldenGate sequence replication there is a number of issues you need to be aware of especially if you replicate quite a lot of busy sequences.
&lt;br/&gt;&lt;br/&gt;
The first issue is that GoldenGate sequence replication does not use bind variables. Let's execute the following statements on the source system:
&lt;pre&gt;SQL&gt; create sequence rep1.s1 nocache;
 
Sequence created
 
SQL&gt; select rep1.s1.nextval from dual;
 
   NEXTVAL
----------
         1
 
SQL&gt; select rep1.s1.nextval from dual;
 
   NEXTVAL
----------
         2&lt;/pre&gt;
GoldenGate uses PL/SQL procedure called &lt;i&gt;replicateSequence&lt;/i&gt; each time it needs to sync sequence values. The following calls will be made on the destination system as a result of the above statements:

&lt;pre&gt;BEGIN ggext .replicateSequence (TO_NUMBER(2), TO_NUMBER(20), TO_NUMBER(1), 'REP1', TO_NUMBER(0), 'S1', UPPER('ggrep'), TO_NUMBER(1), TO_NUMBER (0), ''); END;
BEGIN ggext .replicateSequence (TO_NUMBER(3), TO_NUMBER(20), TO_NUMBER(1), 'REP1', TO_NUMBER(0), 'S1', UPPER('ggrep'), TO_NUMBER(1), TO_NUMBER (0), ''); END;&lt;/pre&gt;
The first parameter is a target sequence value (seq$.highwater) and it's the one which is causing most of the issues, especially if the sequence has been declared with relatively low cache value (or &lt;i&gt;nocache&lt;/i&gt; at all, as in my example). Every time a new sequence last value gets written into the source system data dictionary we get a hard parse on the destination!
&lt;br/&gt;&lt;br/&gt;
When using higher cache values the problem of hard parses gets somewhat mitigated but there is another issue. When replicating such a sequence GoldenGate follows these steps:
&lt;br/&gt;
&lt;ol&gt;
&lt;li&gt;Sets sequence to &lt;i&gt;nocache&lt;/i&gt;&lt;/li&gt;
&lt;li&gt;Executes &lt;i&gt;sequence.nextval&lt;/i&gt; until it reaches the target value&lt;/li&gt;
&lt;li&gt;Restores altered sequence properties&lt;/li&gt;
&lt;/ol&gt;
So if you have, say, a sequence with cache 1000 then each time a new value gets written into seq$.highwater on the source, GoldenGate is going to execute sequence.nextval one thousand times on the destination! As you can imagine this aren't performing particularly fast as getting every next value will result in Oracle updating actual row in the data dictionary. All of the above means that replicating sequences can sometimes put quite a bit of strain on the system. A much faster (and simpler) approach would be to use &lt;i&gt;step&lt;/i&gt; to reach target sequence value instead of fetching it on-by-one in &lt;i&gt;nocache&lt;/i&gt; mode. Last but not least, you can end up with &lt;i&gt;nocache&lt;/i&gt; sequence if procedure crashes in the middle.</description><link>http://afatkulin.blogspot.com/2012/12/oracle-goldengate-sequence-replication.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-7444543305537503032</guid><pubDate>Sat, 20 Oct 2012 23:04:00 +0000</pubDate><atom:updated>2012-10-21T00:08:30.720-04:00</atom:updated><title>ZFS Home Storage Network at 10GbE</title><description>About a year ago I've decided to put all my data on a home built ZFS storage server. The growing amount of devices around my household prompted for an easier and much faster way to share the data. Since then the box was happily serving both CIFS as well as iSCSI over 1GbE network without any issues.
&lt;br/&gt;&lt;br/&gt;
I was keen on upgrading to 10GbE for quite some time as both my server as well as clients could easily saturate 1GbE link when ZFS had all the required data in ARC. 32GB RAM in my storage server usually left me with the ARC of about 20GB which in most cases happened to be enough to cache the entire workset I was working with. Bottom line is the box rarely had to touch the disk and even if it did there was 120GB L2ARC SSD to even the bumps capable of maxing out 1GbE link as well.
&lt;br/&gt;&lt;br/&gt;
It so happened that I managed to get my hands on a pair of 10GBASE-T Emulex OCe11102-NT NICs which I bought at a significant discount. With 10GBASE-T switches still costing upwards of multiple thousand dollars (even when used) I decided to just get a pair of CAT6 crossover cables running from the ZFS storage box to my workstation and do some tests to see what this configuration could be capable of.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Storage Server&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;My storage server is running Solaris 11 and the storage pool is built using 4x3TB Hitachi 5400RPM drives in RAID10 (ZFS mirror). The box has 32GB RAM and 120GB Vertex 3 MAX IOPS SSD for L2ARC. As mentioned above, the cache subsystem is enough to keep the box from hitting the disks most of the time. All that is driven by Intel Core i7-3770 CPU (Ivy Bridge).
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;iSCSI network&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
I've decided to dedicate 10GbE adapters to the iSCSI network I have between the storage box and my workstation. First of all, this is where I need all the speed I can get. Secondly, I can utlilize both ports with iSCSI MPIO thus archiving 20Gb available bandwidth. This is probably a total overkill but since my cards are dual ported I may as well use both ports as all I need is an extra cat6 cable. The network utilizes 9K jumbo frames. ZFS volume is using 4K block size to match NTFS file system cluster size built on top of the iSCSI volume. COMSTAR is used as an iSCSI target server with Microsoft iSCSI Initiator used for the client.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Test Results - IOPS&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
I'll start with IOPS results with 100% random read access over 20GB of data using Iometer at different block sizes and worker counts. Each worker was set to do 16 outstanding I/Os.
&lt;br/&gt;&lt;br/&gt;
&lt;img alt="IOPS" src="https://lh3.googleusercontent.com/-k6Flci3SFqQ/UIMoMMpDr9I/AAAAAAAAAxI/Oq7Xx7Vm5rA/s800/10gbe_iops.png"/&gt;
&lt;br/&gt;With 4K blocks the system is able to archive quite impressive 226K IOPS! The storage sever CPU is simply running flat out at this point so I'm confident there is more available from the network. At 16K blocks the system is pushing over 1.5GB/s of random IO which is equally impressive and clearly goes beyond what a single 10GbE link is capable of so the second link is certainly being put to a good use.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Test Results - Bandwidth&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
For bandwidth test I've just set Iometer to do 1MB sequential reads with 16 outstanding IO/s per worker.
&lt;br/&gt;&lt;br/&gt;
&lt;img alt="Throughput" src="https://lh4.googleusercontent.com/-4j6s45ViZxM/UIMq6q9yioI/AAAAAAAAAxc/nSOjClxMuQk/s800/10gbe_thr.png"/&gt;
&lt;br/&gt;&lt;br/&gt;Even with a single worker the system can push 2085MB/s across the wire which is getting quite close to the maximum practical speed you can get out of 2x10GbE NICs so I'm quite happy with this result!
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Conclusion&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
I'll be doing more testing in the upcoming days but so far it appears that the upgrade was totally worth it. Having a home system capable of pushing 226K IOPs and 2GB/s bandwidth is an impressive show of how far you can push the system consisting mostly of consumer grade components. Keep in mind that the only way I could get the above numbers is by making sure all the data is available in ZFS ARC which was the initial goal of my setup.</description><link>http://afatkulin.blogspot.com/2012/10/zfs-home-storage-network-at-10gbe.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lh3.googleusercontent.com/-k6Flci3SFqQ/UIMoMMpDr9I/AAAAAAAAAxI/Oq7Xx7Vm5rA/s72-c/10gbe_iops.png" height="72" width="72" /><thr:total>7</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-8882880100924630166</guid><pubDate>Thu, 26 Jul 2012 20:52:00 +0000</pubDate><atom:updated>2012-07-26T16:52:58.922-04:00</atom:updated><title>Exporting DBFS via NFS</title><description>Anybody who was thinking about exporting DBFS via NFS have probably stumbled upon the fact the &lt;a href="http://www.oracle.com/technetwork/database/features/secure-files/dbfs-sf-oow2010-359266.pdf"&gt;Oracle says&lt;/a&gt; it can not be done:
&lt;blockquote&gt;DBFS does not support exporting NFS or SAMBA exports&lt;/blockquote&gt;
&lt;b&gt;What's wrong with DBFS?&lt;/b&gt;
&lt;br/&gt;
&lt;br/&gt;
There is nothing wrong with DBFS itself. The problem originated form the fact that &lt;a href="http://fuse.sourceforge.net/"&gt;FUSE&lt;/a&gt; did not have proper interfaces implemented to support exporting by the kernel. Newer versions of the Linux kernel fully support exporting. I know that OEL 6.x works for sure as I did the DBFS exports myself through both NFS as well as Samba. The common minimum kernel version circulating across the internet seems to be 2.6.27 but I haven't had a chance to check whether it's true.
&lt;br/&gt;
&lt;br/&gt;
&lt;b&gt;Older Kernels&lt;/b&gt;
&lt;br/&gt;
&lt;br/&gt;
Fact of the matter is -- it was always possible to export &lt;i&gt;FUSE&lt;/i&gt; via &lt;i&gt;NFS&lt;/i&gt;. You just had to use user mode NFS server, like &lt;a href="http://unfs3.sourceforge.net"&gt;UNFS3&lt;/a&gt;. I did that too and know that it works regardless of the kernel version you're running. Unfortunately projects like &lt;i&gt;UNFS3&lt;/i&gt; are lagging in development as well as feature-wise. But they do work if you have to have the NFS exports working and stuck with the older Linux kernels.</description><link>http://afatkulin.blogspot.com/2012/07/exporting-dbfs-via-nfs.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-3527852335978798524</guid><pubDate>Thu, 26 Jul 2012 14:44:00 +0000</pubDate><atom:updated>2012-07-26T10:44:57.374-04:00</atom:updated><title>Enkitec Extreme Exadata Expo</title><description>I will be hanging around &lt;a href="http://extremeexadata.com/"&gt;E4&lt;/a&gt;, it's going to be a really cool and geeky event. See you all there!
&lt;br/&gt;&lt;br/&gt;
&lt;a href="http://extremeexadata.com/"&gt;&lt;img height="70%" src="http://extremeexadata.com/img/frontend/banner.jpg" width="70%" /&gt;&lt;/a&gt;</description><link>http://afatkulin.blogspot.com/2012/07/enkitec-extreme-exadata-expo.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-4992346031879331707</guid><pubDate>Tue, 24 Jul 2012 13:56:00 +0000</pubDate><atom:updated>2012-07-24T09:56:59.784-04:00</atom:updated><title>Oracle GoldenGate Integrated Capture</title><description>Oracle GoldenGate 11.2 release notes contain an interesting new feature:

&lt;blockquote&gt;Extract can now be used in integrated capture mode with an Oracle database. Extract integrates with an Oracle database log mining server to receive change data from that server in the form of logical change records (LCR).&lt;/blockquote&gt;

All of that just rings too many bells so I've decided to find out what exactly have happened. This feature requires database patches to be installed (described in &lt;i&gt;Note:1411356.1&lt;/i&gt;).
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Stack dumps&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
Stack dump reveals a lot of interesting information already (I've left only relevant pieces in place):

&lt;pre&gt;...
#10 0x00002b08f2ba21b7 in knxoutReceiveLCR () from /u01/app/oracle/ggs/libclntsh.so.11.1
#11 0x00002b08f2ae1048 in OCIXStreamOutLCRReceive () from /u01/app/oracle/ggs/libclntsh.so.11.1
#12 0x0000000000721a96 in IXAsyncReader::ProcessBatchNonCallbackArray() ()
#13 0x0000000000722dbc in IXAsyncReader::ReaderThread(void*) ()
#14 0x0000003ce8a0673d in start_thread () from /lib64/libpthread.so.0
...&lt;/pre&gt;

The first point of entry into &lt;i&gt;libclntsh.so.11.1&lt;/i&gt; library is nothing else but &lt;i&gt;&lt;a href="http://docs.oracle.com/cd/E11882_01/server.112/e16545/xstrm_oci.htm#CIHJABFG"&gt;OCIXStreamOutLCRReceive&lt;/a&gt;&lt;/i&gt;. This is a function to receive LCR (Logical Change Record) from an outbound &lt;i&gt;XStream&lt;/i&gt; server. Let's confirm that we have an outbound server in our database:
&lt;pre&gt;SQL&gt; select server_name, capture_name from dba_xstream_outbound;
 
SERVER_NAME                    CAPTURE_NAME
------------------------------ ------------------------------
OGG$TEST_EXT                   OGG$CAP_TEST_EXT&lt;/pre&gt;

An Oracle Streams Capture process itself:

&lt;pre&gt;SQL&gt; select capture_name, rule_set_name, purpose
  2   from dba_capture;
 
CAPTURE_NAME                   RULE_SET_NAME                  PURPOSE
------------------------------ ------------------------------ -------------------
OGG$CAP_TEST_EXT               OGG$TEST_EXT_CAPTURE_I         GoldenGate Capture&lt;/pre&gt;

And we can see all the familiar Oracle Streams rules:

&lt;pre&gt;SQL&gt; select rsr.rule_name, r.rule_condition
        from dba_rule_set_rules rsr, dba_rules r
        where rsr.rule_set_owner = r.rule_owner
                and rsr.rule_name = r.rule_name
                and rsr.rule_set_name='OGG$TEST_EXT_CAPTURE_I';  2    3    4    5

RULE_NAME       RULE_CONDITION
--------------- ----------------------------------------------------------------------
TEST21          (((:ddl.get_source_database_name() = 'TEST' )) and (:ddl.get_source_da
                tabase_name() != '$'))

GGS_TRACE20     (((:dml.get_object_owner() = 'GGEXT' and :dml.get_object_name() = 'GGS
                _TRACE')) and :dml.get_source_database_name() = 'TEST' )

TEST19          ((:dml.get_object_owner() = 'TEST') and :dml.get_source_database_name(
                ) = 'TEST' )
&lt;/pre&gt;
These rules got generated based on the parameter file I have for my Extract.
&lt;br/&gt;&lt;br/&gt;&lt;b&gt;XStream&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;
Oracle GoldenGate Extract process became nothing else but an XStream client which receives LCRs from Streams Capture process and converts these to GoldenGate trail files. Anybody who was worried about Oracle Streams future because GoldenGate was supposed to be the new strategic direction (and so on and so forth) can stop worrying now. &lt;i&gt;Oracle GoldenGate is Oracle Streams&lt;/i&gt;. The only part left are trail files and I wouldn't be surprised if that will eventually go away as well.
&lt;br/&gt;&lt;br/&gt;Oracle Streams have won the tech battle.</description><link>http://afatkulin.blogspot.com/2012/07/oracle-goldengate-integrated-capture.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-5494434701862746855</guid><pubDate>Wed, 18 Jul 2012 17:32:00 +0000</pubDate><atom:updated>2012-07-18T13:32:22.627-04:00</atom:updated><title>V$CELL_CONFIG</title><description>If you ever wondered how you can convert XML information in &lt;i&gt;v$cell_config&lt;/i&gt; on Exadata database servers into relational rows and columns so you can get a nice view of the cell configuration without going into the cell itself then here is a nice piece of SQL which does the job. I'm using &lt;i&gt;celldisks&lt;/i&gt; as an example and selecting only a number of column so the output fits nicely on the screen:

&lt;pre&gt;SQL&gt; select cellname,
  2    name,
  3    deviceName,
  4    diskType,
  5    round(freeSpace/power(1024,3), 2) freeSpace,
  6    round(disk_size/power(1024,3), 2) disk_size
  7   from (
  8    select cellname, XMLTYPE.createXML(confval) confval
  9     from v$cell_config
 10     where conftype='CELLDISKS'
 11      and cellname='192.168.12.3'
 12    ) v,
 13    xmltable('/cli-output/celldisk' passing v.confval
 14     columns
 15      name varchar(15) path 'name',
 16      creationtime varchar(25) path 'creationTime',
 17      deviceName varchar(9) path 'deviceName',
 18      devicePartition varchar2(10) path 'devicePartition',
 19      diskType varchar2(9) path 'diskType',
 20      errorCount number path 'errorCount',
 21      freeSpace number path 'freeSpace',
 22      id varchar2(50) path 'id',
 23      interleaving varchar(10) path 'interleaving',
 24      lun varchar2(5) path 'lun',
 25      raidLevel number path 'raidLevel',
 26      disk_size number path 'size',
 27      status varchar2(10) path 'status'
 28   );
 
CELLNAME        NAME            DEVICENAME DISKTYPE   FREESPACE  DISK_SIZE
--------------- --------------- ---------- --------- ---------- ----------
192.168.12.3    CD_00_enkcel01  /dev/sda   HardDisk           0    1832.59
192.168.12.3    CD_01_enkcel01  /dev/sdb   HardDisk           0    1832.59
192.168.12.3    CD_02_enkcel01  /dev/sdc   HardDisk           0     1861.7
192.168.12.3    CD_04_enkcel01  /dev/sdl   HardDisk           0     1861.7
192.168.12.3    CD_05_enkcel01  /dev/sde   HardDisk           0     1861.7
192.168.12.3    CD_06_enkcel01  /dev/sdf   HardDisk           0     1861.7
192.168.12.3    CD_07_enkcel01  /dev/sdg   HardDisk           0     1861.7
192.168.12.3    CD_08_enkcel01  /dev/sdh   HardDisk           0     1861.7
192.168.12.3    CD_09_enkcel01  /dev/sdi   HardDisk           0     1861.7
192.168.12.3    CD_10_enkcel01  /dev/sdj   HardDisk           0     1861.7
192.168.12.3    CD_11_enkcel01  /dev/sdk   HardDisk           0     1861.7
192.168.12.3    FD_00_enkcel01  /dev/sds   FlashDisk          0      22.88
192.168.12.3    FD_01_enkcel01  /dev/sdr   FlashDisk          0      22.88
192.168.12.3    FD_02_enkcel01  /dev/sdt   FlashDisk          0      22.88
192.168.12.3    FD_03_enkcel01  /dev/sdu   FlashDisk          0      22.88
192.168.12.3    FD_04_enkcel01  /dev/sdaa  FlashDisk          0      22.88
192.168.12.3    FD_05_enkcel01  /dev/sdz   FlashDisk          0      22.88
192.168.12.3    FD_06_enkcel01  /dev/sdab  FlashDisk          0      22.88
192.168.12.3    FD_07_enkcel01  /dev/sdac  FlashDisk          0      22.88
192.168.12.3    FD_08_enkcel01  /dev/sdn   FlashDisk          0      22.88
192.168.12.3    FD_09_enkcel01  /dev/sdo   FlashDisk          0      22.88
192.168.12.3    FD_10_enkcel01  /dev/sdp   FlashDisk          0      22.88
192.168.12.3    FD_11_enkcel01  /dev/sdq   FlashDisk          0      22.88
192.168.12.3    FD_12_enkcel01  /dev/sdv   FlashDisk          0      22.88
192.168.12.3    FD_13_enkcel01  /dev/sdw   FlashDisk          0      22.88
192.168.12.3    FD_14_enkcel01  /dev/sdx   FlashDisk          0      22.88
192.168.12.3    FD_15_enkcel01  /dev/sdy   FlashDisk          0      22.88
 
27 rows selected&lt;/pre&gt;

Of course, the same approach can be applied to get other information out (like &lt;i&gt;Grid Disks&lt;/i&gt;, etc.).</description><link>http://afatkulin.blogspot.com/2012/07/vcellconfig.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-2130089705613092753</guid><pubDate>Tue, 17 Jul 2012 15:37:00 +0000</pubDate><atom:updated>2013-01-10T13:16:18.680-05:00</atom:updated><title>Displaying ASM Partner Disks</title><description>Here is a quick SQL which I sometimes use to show people disks and their respective partners (grouped by a failure group) inside an ASM disk group every time I need to explain both of these concepts.
&lt;br/&gt;&lt;br/&gt;
An example output from a quarter rack Exadata with a normal redundancy disk group:

&lt;pre&gt;SQL&gt; column p format a80
SQL&gt; variable group_number number
SQL&gt; exec :group_number := 1;

PL/SQL procedure successfully completed.

SQL&gt; select d||' =&gt; '||listagg(p, ',') within group (order by p) p
from (
select ad1.failgroup||'('||to_char(ad1.disk_number, 'fm000')||')' d,
 ad2.failgroup||'('||listagg(to_char(p.number_kfdpartner, 'fm000'), ',') within group (order by ad1.disk_number)||')' p
 from v$asm_disk ad1, x$kfdpartner p, v$asm_disk ad2
 where ad1.disk_number = p.disk
  and p.number_kfdpartner=ad2.disk_number
  and ad1.group_number = p.grp
  and ad2.group_number = p.grp
  and p.grp = :group_number
 group by ad1.failgroup, ad1.disk_number, ad2.failgroup
) group by d
order by d;

P
--------------------------------------------------------------------------------
EXAPCEL01(12) =&gt; EXAPCEL02(26,30,31,35),EXAPCEL03(06,08,09,11)
EXAPCEL01(13) =&gt; EXAPCEL02(30,31,34,35),EXAPCEL03(03,07,09,11)
EXAPCEL01(14) =&gt; EXAPCEL02(24,28,29,34),EXAPCEL03(05,09,10,11)
EXAPCEL01(15) =&gt; EXAPCEL02(28,29,33,34),EXAPCEL03(04,05,08,11)
EXAPCEL01(16) =&gt; EXAPCEL02(26,27,33,35),EXAPCEL03(02,06,07,09)
EXAPCEL01(17) =&gt; EXAPCEL02(25,32,33,35),EXAPCEL03(00,01,03,07)
EXAPCEL01(18) =&gt; EXAPCEL02(24,27,32,34),EXAPCEL03(00,03,08,10)
EXAPCEL01(19) =&gt; EXAPCEL02(25,26,32,33),EXAPCEL03(01,02,07,10)
EXAPCEL01(20) =&gt; EXAPCEL02(28,30,31,32),EXAPCEL03(04,05,06,10)
EXAPCEL01(21) =&gt; EXAPCEL02(24,27,29,31),EXAPCEL03(00,01,04,08)
EXAPCEL01(22) =&gt; EXAPCEL02(25,26,27,30),EXAPCEL03(02,03,04,06)
EXAPCEL01(23) =&gt; EXAPCEL02(24,25,28,29),EXAPCEL03(00,01,02,05)
EXAPCEL02(24) =&gt; EXAPCEL01(14,18,21,23),EXAPCEL03(05,07,09,11)
EXAPCEL02(25) =&gt; EXAPCEL01(17,19,22,23),EXAPCEL03(02,06,09,11)
EXAPCEL02(26) =&gt; EXAPCEL01(12,16,19,22),EXAPCEL03(04,08,10,11)
EXAPCEL02(27) =&gt; EXAPCEL01(16,18,21,22),EXAPCEL03(01,04,08,11)
EXAPCEL02(28) =&gt; EXAPCEL01(14,15,20,23),EXAPCEL03(00,06,08,09)
EXAPCEL02(29) =&gt; EXAPCEL01(14,15,21,23),EXAPCEL03(01,02,03,05)
EXAPCEL02(30) =&gt; EXAPCEL01(12,13,20,22),EXAPCEL03(04,05,08,10)
EXAPCEL02(31) =&gt; EXAPCEL01(12,13,20,21),EXAPCEL03(00,01,07,10)
EXAPCEL02(32) =&gt; EXAPCEL01(17,18,19,20),EXAPCEL03(00,06,07,10)
EXAPCEL02(33) =&gt; EXAPCEL01(15,16,17,19),EXAPCEL03(02,03,05,09)
EXAPCEL02(34) =&gt; EXAPCEL01(13,14,15,18),EXAPCEL03(02,03,04,07)
EXAPCEL02(35) =&gt; EXAPCEL01(12,13,16,17),EXAPCEL03(00,01,03,06)
EXAPCEL03(00) =&gt; EXAPCEL01(17,18,21,23),EXAPCEL02(28,31,32,35)
EXAPCEL03(01) =&gt; EXAPCEL01(17,19,21,23),EXAPCEL02(27,29,31,35)
EXAPCEL03(02) =&gt; EXAPCEL01(16,19,22,23),EXAPCEL02(25,29,33,34)
EXAPCEL03(03) =&gt; EXAPCEL01(13,17,18,22),EXAPCEL02(29,33,34,35)
EXAPCEL03(04) =&gt; EXAPCEL01(15,20,21,22),EXAPCEL02(26,27,30,34)
EXAPCEL03(05) =&gt; EXAPCEL01(14,15,20,23),EXAPCEL02(24,29,30,33)
EXAPCEL03(06) =&gt; EXAPCEL01(12,16,20,22),EXAPCEL02(25,28,32,35)
EXAPCEL03(07) =&gt; EXAPCEL01(13,16,17,19),EXAPCEL02(24,31,32,34)
EXAPCEL03(08) =&gt; EXAPCEL01(12,15,18,21),EXAPCEL02(26,27,28,30)
EXAPCEL03(09) =&gt; EXAPCEL01(12,13,14,16),EXAPCEL02(24,25,28,33)
EXAPCEL03(10) =&gt; EXAPCEL01(14,18,19,20),EXAPCEL02(26,30,31,32)
EXAPCEL03(11) =&gt; EXAPCEL01(12,13,14,15),EXAPCEL02(24,25,26,27)
&lt;/pre&gt;</description><link>http://afatkulin.blogspot.com/2012/07/displaying-asm-partner-disks.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-2952617336925788147</guid><pubDate>Fri, 06 Jul 2012 21:10:00 +0000</pubDate><atom:updated>2012-07-06T17:41:51.688-04:00</atom:updated><title>ASM normal redundancy with high protection template</title><description>One of the ways you can control how many mirror copies ASM keeps on disk is via diskgroup templates.
&lt;br/&gt;&lt;br/&gt;
This presents an interesting question -- can you turn your normal redundancy disk group into a high redundancy by using a diskgroup template with &lt;i&gt;high&lt;/i&gt; protection attribute thus telling ASM to use triple mirroring for related files?
&lt;br/&gt;&lt;br/&gt;
I'll start by creating a diskgroup:
&lt;pre&gt;SQL&gt; create diskgroup data normal redundancy
  2     disk '/dev/sdb1', '/dev/sdc1', '/dev/sdd1';

Diskgroup created.&lt;/pre&gt;
I'm not explicitly specifying any failgroups, each disk will end up in it's own FG and triple-mirroring will spread extents across all three disks. Let's add a high protection template:
&lt;pre&gt;SQL&gt; alter diskgroup data add template mirror_high attributes (high coarse);

Diskgroup altered.&lt;/pre&gt;
Now let's create two tablespaces, one using the default mirroring template (two-way) and one using &lt;i&gt;mirror_high&lt;/i&gt; (three-way) template:
&lt;pre&gt;SQL&gt; create tablespace mirror_normal datafile '+data' size 256m;

Tablespace created.

SQL&gt; create tablespace mirror_high datafile '+data(mirror_high)' size 256m;

Tablespace created.&lt;/pre&gt;
We can confirm how many mirror extents our datafiles have:
&lt;pre&gt;SQL&gt; select     a.name,
        case LXN_KFFXP
                when 0 then 'primary'
                when 1 then 'secondary'
                when 2 then 'third'
        end, count(*)
        from X$KFFXP, v$asm_alias a
        where a.file_number=NUMBER_KFFXP
        group by a.name, LXN_KFFXP
        order by 1, 2;  2    3    4    5    6    7    8    9   10

NAME                           CASELXN_K   COUNT(*)
------------------------------ --------- ----------
MIRROR_HIGH.257.787940101      primary          258
MIRROR_HIGH.257.787940101      secondary        258
MIRROR_HIGH.257.787940101      third            258
MIRROR_NORMAL.256.787940093    primary          258
MIRROR_NORMAL.256.787940093    secondary        258
MIRROR_NORMAL.256.787940093    third              1

6 rows selected.&lt;/pre&gt;
Indeed, all extents got triple mirrored for &lt;i&gt;MIRROR_HIGH&lt;/i&gt; tablespace and only double mirrored for &lt;i&gt;MIRROR_NORMAL&lt;/i&gt; (with the exception of one extent which probably holds some triple-mirrored metadata).
&lt;br/&gt;&lt;br/&gt;
Now, because all data essentially got triple mirrored for our &lt;i&gt;MIRROR_HIGH&lt;/i&gt; tablespace it ought to survive two disk failure, right? Let's give it a try:
&lt;pre&gt;SQL&gt; alter diskgroup data dismount;

Diskgroup altered.&lt;/pre&gt;
... then I'm going to change the permissions and make two disks inaccessible to ASM:
&lt;pre&gt;[oracle@ora11gr2 ~]$ chmod 000 /dev/sdc1
[oracle@ora11gr2 ~]$ chmod 000 /dev/sdd1&lt;/pre&gt;
Let's see if I could mount it back:
&lt;pre&gt;SQL&gt; alter diskgroup data mount force;
alter diskgroup data mount force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"&lt;/pre&gt;
...looks like the answer is no. It still sees that the remaining disk is missing required partners and does not allow the disk group to be mounted. How many extent copies the datafile had becomes a moot point.
&lt;br/&gt;&lt;br/&gt;
The conclusion is that triple-mirroring inside a normal redundancy disk group can help you in certain cases (like physical corruption of both primary and secondary extents so ASM can use a third one to read the data) but it's not a substitute for a high redundancy disk group.
&lt;br/&gt;&lt;br/&gt;
&lt;i&gt;&lt;b&gt;Update a little bit later&lt;/b&gt;: if anyone is thinking about doing it the other way around then remember that mirroring attribute works only for &lt;b&gt;normal&lt;/b&gt; redundancy disk groups.&lt;/i&gt;</description><link>http://afatkulin.blogspot.com/2012/07/asm-normal-redundancy-with-high.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-1133105370952144869</guid><pubDate>Thu, 05 Jul 2012 15:59:00 +0000</pubDate><atom:updated>2012-07-05T12:43:35.787-04:00</atom:updated><title>Serial direct path reads in 11GR2 and Exadata environments</title><description>Serial direct path reads were first introduced in Oracle 11G which were noticed by many customers (in both positive and negative ways) who upgraded from the earlier releases. Back then I did a quick &lt;a href="http://afatkulin.blogspot.ca/2009/01/11g-adaptive-direct-path-reads-what-is.html"&gt;write up&lt;/a&gt; on the subject to try and tackle some of the key variables which affect the behavior.
&lt;br/&gt;&lt;br/&gt;
To recap, the following were observed for 11G:
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Serial direct path reads start at &lt;i&gt;_small_table_threshold*5&lt;/i&gt;.
&lt;/li&gt;
&lt;li&gt;Serial direct path reads stop when 50% of the table blocks are cached.
&lt;/li&gt;
&lt;li&gt;Serial direct path reads stop when 25% of the table blocks are dirty.
&lt;/li&gt;
&lt;/ul&gt;
Since then, many people noticed that 11GR2 seems to be more aggressive on the thresholds so I decided to re-run the tests and see what changed. I'm also going to run the same tests on the Exadata to see whether it changes anything compared to a normal 11GR2 database.
&lt;br/&gt;&lt;br/&gt;
To refresh the memory, here is how the methodology works (all code is available in the &lt;a href="http://afatkulin.blogspot.ca/2009/01/11g-adaptive-direct-path-reads-what-is.html"&gt;11G&lt;/a&gt; post).
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Start threshold&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
The function grows a segment in a loop, performing a full table scan and using &lt;i&gt;physical reads direct&lt;/i&gt; statistic as an indicator. Every iteration buffer cache is flushed to make sure cached blocks do not affect the numbers.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Stop cached threshold&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
A loop is used to cache more and more blocks from a table followed by a full table scan. Again, &lt;i&gt;physical reads direct&lt;/i&gt; statistic is used to see when serial direct path reads no longer happen. Buffer cache is flushed at the beginning of the test.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Stop dirty threshold&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
Similar to the above but blocks are updated instead of just being cached.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Results&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
&lt;table border="1"&gt;

&lt;tr&gt;&lt;th&gt;&amp;nbsp;&lt;/th&gt;&lt;th&gt;Non-Exadata&lt;/th&gt;&lt;th&gt;Exadata&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Start threshold&lt;/th&gt;&lt;td&gt;_small_table_threshold&lt;/td&gt;&lt;td&gt;_small_table_threshold&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Stop cached&lt;/th&gt;&lt;td&gt;50%&lt;/td&gt;&lt;td&gt;50%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Stop dirty&lt;/th&gt;&lt;td&gt;25%&lt;/td&gt;&lt;td&gt;25%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;br/&gt;
&lt;b&gt;Summary&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;11GR2 threshold is indeed more aggressive and starts at &lt;i&gt;_small_table_threshold&lt;/i&gt; (i.e. five times lower compared to 11G). There seems to be no change to the rest of the numbers. Behavior appears to be the same on both Exadata and non-Exadata systems.
&lt;br/&gt;&lt;br/&gt;Bottom line is 11GR2 starts doing serial direct path reads earlier and stops doing these at the same time as 11G did.
&lt;br/&gt;&lt;br/&gt;&lt;i&gt;All testing were done on 11.2.0.3&lt;/i&gt;</description><link>http://afatkulin.blogspot.com/2012/07/serial-direct-path-reads-in-11gr2-and.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-1041533750376446774</guid><pubDate>Wed, 20 Jun 2012 01:15:00 +0000</pubDate><atom:updated>2012-06-19T21:15:49.707-04:00</atom:updated><title>Online Redefinition and Reference Partitioning</title><description>Reference partitioning combined with &lt;i&gt;dbms_redefinition&lt;/i&gt; can result in a perpetually disabled foreign key constraint (I'm using 11.2.0.3 as an example):

&lt;pre&gt;SQL&gt; create table p (
  2   n number primary key
  3  ) partition by list (n)
  4  (
  5   partition p1 values (1)
  6  );
 
Table created
 
SQL&gt; create table r1 (n number not null);
 
Table created
 
SQL&gt; create table r2 (
  2   n number not null,
  3   constraint fk_r$n foreign key (n) referencing p (n)
  4  ) partition by reference (fk_r$n);
 
Table created
 
SQL&gt; exec dbms_redefinition.start_redef_table(user, 'R1', 'R2', options_flag =&gt; dbms_redefinition.cons_use_rowid);
 
PL/SQL procedure successfully completed
 
SQL&gt; exec dbms_redefinition.finish_redef_table(user, 'R1', 'R2');
 
PL/SQL procedure successfully completed
 
SQL&gt; select status, validated from user_constraints where constraint_name='FK_R$N';
 
STATUS   VALIDATED
-------- -------------
DISABLED NOT VALIDATED&lt;/pre&gt;

The annoying part comes when you realize that you can't do anything about it (at least not by normal means):

&lt;pre&gt;SQL&gt; alter table r1 enable constraint FK_R$N;
 
alter table r1 enable constraint FK_R$N
 
ORA-14650: operation not supported for reference-partitioned tables
 
SQL&gt; alter table r1 enable novalidate constraint FK_R$N;
 
alter table r1 enable novalidate constraint FK_R$N
 
ORA-14650: operation not supported for reference-partitioned tables&lt;/pre&gt;

Surprisingly enough the constraint actually works as far as reference partitioning is concerned. However, it results in an "unclean" status which can't be fixed! Of course the view just shows us what we have in &lt;i&gt;cdef$&lt;/i&gt;:
&lt;pre&gt;SQL&gt; select con#, enabled, defer
  2   from sys.cdef$
  3   where con# =
  4   (select con#
  5    from sys.con$
  6    where owner#=(select user# from sys.user$ where name=user)
  7     and name= 'FK_R$N'
  8   );
 
      CON#    ENABLED      DEFER
---------- ---------- ----------
     10997                   512&lt;/pre&gt;

&lt;i&gt;ENABLED&lt;/i&gt; set to &lt;i&gt;NULL&lt;/i&gt; will decode to &lt;i&gt;DISABLED&lt;/i&gt; while the following results in &lt;i&gt;NOT VALIDATED&lt;/i&gt;:

&lt;pre&gt;SQL&gt; select decode(bitand(512, 4), 4, 'VALIDATED', 'NOT VALIDATED') VALIDATED from dual;
 
VALIDATED
-------------
NOT VALIDATED&lt;/pre&gt;

So what we really need is &lt;i&gt;ENABLED&lt;/i&gt; set to 1 and &lt;i&gt;DEFER&lt;/i&gt; set to 516. Of course, playing with the data dictionary in such a way is a big no-no on production system without blessing of Oracle support, but since I'm on my sandbox database I'll give it a shot:

&lt;pre&gt;SQL&gt; update cdef$ set enabled=1, defer=516 where con#=10997;

1 row updated.

SQL&gt; commit;

Commit complete.

SQL&gt; startup force
ORACLE instance started.

Total System Global Area  400846848 bytes
Fixed Size                  2228784 bytes
Variable Size             146804176 bytes
Database Buffers          243269632 bytes
Redo Buffers                8544256 bytes
Database mounted.
Database opened.

SQL&gt; select status, validated from user_constraints where constraint_name='FK_R$N';
 
STATUS   VALIDATED
-------- -------------
ENABLED  VALIDATED&lt;/pre&gt;

Looks like it worked :) Note that there is practically no way for a reference partitioning constraint to be either disabled or not validated -- the rows have to go somewhere, after all, so we don't introduce any logical inconsistencies by setting &lt;i&gt;cdef$&lt;/i&gt; values to what they really should be.

&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Footnote:&lt;/b&gt;&lt;br/&gt;
The bug number for this one is &lt;b&gt;13572659&lt;/b&gt;.</description><link>http://afatkulin.blogspot.com/2012/06/online-redefinition-and-reference.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-5934938964816607889</guid><pubDate>Wed, 02 May 2012 14:09:00 +0000</pubDate><atom:updated>2012-05-02T10:09:38.598-04:00</atom:updated><title>Deferred Segment Creation Quirks</title><description>If you have a habit, as I do, to quickly check &lt;i&gt;dba_segments&lt;/i&gt; to make sure there is nothing important in the tablespace you are about to drop then deferred segment creation can present some surprises.

Consider the following:

&lt;pre&gt;SQL&gt; create tablespace test;
 
Tablespace created
 
SQL&gt; create table t (n number) tablespace test;
 
Table created
 
SQL&gt; select segment_created from user_tables where table_name='T';
 
SEGMENT_CREATED
---------------
NO

SQL&gt; select * from dba_segments where tablespace_name='TEST';

no rows selected&lt;/pre&gt;

I can now go ahead and drop the tablespace. Since there isn't really any segment Oracle allows me to do that without issuing any warnings:

&lt;pre&gt;SQL&gt; drop tablespace test;
 
Tablespace dropped&lt;/pre&gt;

The weird part comes when you want to do something with that table:

&lt;pre&gt;SQL&gt; insert into t values (1);
 
insert into t values (1)
 
ORA-00959: tablespace 'TEST' does not exist
 
SQL&gt; alter table t move tablespace users;
 
alter table t move tablespace users
 
ORA-00959: tablespace 'TEST' does not exist
 
SQL&gt; drop table t;
 
drop table t
 
ORA-00959: tablespace 'TEST' does not exist&lt;/pre&gt;

The most annoying part is that you can't move or even drop the table and I see no real reason for such behavior. After all, if Oracle allows you to drop the tablespace pretending that nothing is there, it should allow me to drop/change storage parameters for a non existing segment as well. Otherwise it looks like the "front end" got updated to support deferred segment creation while the "back end" still operates the legacy way. Thankfully there is an easy workaround -- just create a shell tablespace with required name and then move the table:

&lt;pre&gt;SQL&gt; create tablespace test;
 
Tablespace created
 
SQL&gt; alter table t move tablespace users;
 
Table altered&lt;/pre&gt;</description><link>http://afatkulin.blogspot.com/2012/05/deferred-segment-creation-quirks.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-672639677691357941</guid><pubDate>Wed, 02 May 2012 02:07:00 +0000</pubDate><atom:updated>2012-05-01T22:07:13.503-04:00</atom:updated><title>Result Cache Latch in 11GR2: Shared Mode Gets</title><description>Almost two years ago I wrote a post about &lt;a href="http://afatkulin.blogspot.ca/2010/06/11gr2-result-cache-scalability.html"&gt;11GR2 Result Cache Scalability&lt;/a&gt;. In the post I noted that there has been a significant improvement compared to 11GR1. What left me a little bit puzzled is where exactly the performance advantage came from. After all, it didn't look like the latch was changed to allow for shared mode gets as I still observed some sleeps during my tests. Back then I was quite happy with the results and made a note to revisit the subject in order to figure out exactly what happened which then got on my back burner and eventually slipped away.
&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Result Cache: RC Latch&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
We can begin by testing whether shared mode gets are allowed or not using an &lt;i&gt;oradebug&lt;/i&gt; call:
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; select addr from v$latch where name='Result Cache: RC Latch';

ADDR
----------------
0000000060040B70

SQL&amp;gt; oradebug setmypid
Statement processed.
SQL&amp;gt; oradebug call kslgetsl_w 0x0000000060040B70 1 1 1 8
Function returned 1&lt;/pre&gt;
The forth argument (8) tells the function to acquire the latch in the shared mode and it does work! Now let's see what happens when another session wants to put something into the result cache:

&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; create table t as select 1 n from dual;

Table created.

SQL&amp;gt; select /*+ result_cache */ * from t;
--session waits&lt;/pre&gt;
As expected, the session waits. We do need to get a latch in exclusive mode in order to change the result cache memory contents so we can put something new in there. We can confirm that the session indeed waits on the Result Cache latch (which can be seen by the latch address):

&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; select event, to_char(p1, 'xxxxxxxxxx') latch_addr, seconds_in_wait, state
  2    from v$session_wait
  3    where sid=63;
 
EVENT           LATCH_ADDR  SECONDS_IN_WAIT STATE
--------------- ----------- --------------- -------------------
latch free         60040b70              49 WAITING&lt;/pre&gt;
Let's free the latch in our first session:
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; oradebug call kslfre 0x0000000060040B70
Function returned 0&lt;/pre&gt;
Immediately our second session proceeds. Now it's time to do the most interesting part -- would the session block again if it only needs to &lt;i&gt;read&lt;/i&gt; from the result cache?
&lt;br /&gt;
&lt;br /&gt;
Session 1:
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; oradebug call kslgetsl_w 0x0000000060040B70 1 1 1 8
Function returned 1&lt;/pre&gt;
Session 2:
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; select /*+ result_cache */ * from t;

         N
----------
         1&lt;/pre&gt;
For you see -- the latch was indeed changed to allow for shared mode gets! We can further verify the behavior by acquiring the Result Cache latch in exclusive mode in our first session:
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; oradebug call kslfre 0x0000000060040B70
Function returned 0
SQL&amp;gt; oradebug call kslgetsl_w 0x0000000060040B70 1 1 1 16
Function returned 1&lt;/pre&gt;
The value of 16 in the forth argument tells the function to acquire the latch in exclusive mode. Let's test the second session again:
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; select /*+ result_cache */ * from t;
--sessin waits&lt;/pre&gt;
Well, you can't get a shared mode if someone else has exclusive. Again, freeing the latch in our first session allows second session to proceeds.&lt;br /&gt;
&lt;br /&gt;
So it is confirmed -- in 11GR2 the Result Cache latch can be acquired in a shared mode!</description><link>http://afatkulin.blogspot.com/2012/05/result-cache-latch-in-11gr2-shared-mode.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-8387692101803349912</guid><pubDate>Mon, 30 Apr 2012 21:38:00 +0000</pubDate><atom:updated>2012-04-30T18:49:31.047-04:00</atom:updated><title>ORA-09925: Unable to create audit trail file</title><description>Sometimes the issues you hit make no sense at all.

Consider the following when things behaving normally:

&lt;pre&gt;[oracle@ora11gr2 ~]$ export ORACLE_SID=test
[oracle@ora11gr2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Apr 30 17:09:31 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL&gt; exit
Disconnected&lt;/pre&gt;

I have no instance named &lt;i&gt;test&lt;/i&gt; running on the box and get the expected behavior. Now let's continue:

&lt;pre&gt;[oracle@ora11gr2 ~]$ touch /tmp/test
[oracle@ora11gr2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Apr 30 17:10:54 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

ERROR:
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 9925
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 9925&lt;/pre&gt;

The only thing I changed is created an empty file &lt;i&gt;/tmp/test&lt;/i&gt; and suddenly I'm getting some &lt;i&gt;"ORA-09925: Unable to create audit trail file"&lt;/i&gt; errors? What's going on?

If we &lt;i&gt;strace&lt;/i&gt; SQL*Plus we would see the following:

&lt;pre&gt;3798  stat("/u01/app/oracle/admin/test/adump", 0x7fff1a8b5bd0) = -1 ENOENT (No such file or directory)
3798  access("/tmp/test", F_OK)         = 0
3798  open("/u01/app/oracle/product/11.2.0/dbhome_2/rdbms/audit//tmp/test_ora_3798_1.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 ENOENT (No such file or directory)&lt;/pre&gt;

The first command checks for audit file directory in a default location. However, the second pair of commands does something strange. It tests whether file &lt;i&gt;/tmp/test&lt;/i&gt; exists and if it does it tries to open an audit trace file under a really weird path. But where does this path comes from?

If I look at my environment variables:

&lt;pre&gt;[oracle@ora11gr2 ~]$ env | grep /tmp
ORACLE_PATH=/tmp&lt;/pre&gt;

Apparently I have some strange &lt;i&gt;ORACLE_PATH&lt;/i&gt; variable set. What if I unset it?

&lt;pre&gt;[oracle@ora11gr2 ~]$ unset ORACLE_PATH
[oracle@ora11gr2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Apr 30 17:22:27 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL&gt; exit
Disconnected&lt;/pre&gt;

Strangely enough everything works normally again! So what I discovered is the following:

&lt;ol&gt;
&lt;li&gt;When you run &lt;i&gt;"sqlplus / as sysdba"&lt;/i&gt; SQL*Plus will check whether you have ORACLE_PATH environment variable set.&lt;/li&gt;
&lt;li&gt;If you do, it looks up a file named $ORACLE_PATH/$ORACLE_SID.&lt;/li&gt;
&lt;li&gt;If such a file exists you will get the above behavior.&lt;/li&gt;
&lt;/ol&gt;

I found this to be a very odd behavior if not outright confusing especially if you happened to have the above conditions met (which, in fact, did happen to me on a real system). A quick search on MOS revealed the following:

&lt;pre&gt;&lt;b&gt;Bug 13584524: STRANGE AUDIT DESTINATION IF ORACLE_PATH IS SET: ORA-09925&lt;/b&gt;&lt;/pre&gt;

For now the bug appears to be affecting only 11.2.0.3 regardless of the platform.</description><link>http://afatkulin.blogspot.com/2012/04/ora-09925-unable-to-create-audit-trail.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-7272373980035825037</guid><pubDate>Sun, 05 Feb 2012 02:03:00 +0000</pubDate><atom:updated>2012-02-07T10:02:56.720-05:00</atom:updated><title>session_cached_cursors again</title><description>Not so long time ago I had to troubleshoot code performance regression issue between 9iR2 and 11gR2.&lt;br /&gt;
&lt;br /&gt;
The issue had been related to a business-critical stored procedure that, for some reason, was performing almost 50% slower when run in 11gR2 compared to 9iR2. When I took a look at the procedure I've discovered that there were, strictly speaking, nothing that could regress. At least not in the commons sense when we're talking about upgrades. All the procedure was doing was a look up using a statement containing &lt;i&gt;UNION ALL&lt;/i&gt;.&lt;br /&gt;
&lt;br /&gt;
Below is the artificially created test case which demonstrates the issue.&lt;br /&gt;
&lt;br /&gt;
9iR2 results:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;select *
from
 dual where :"SYS_B_0"=:"SYS_B_1" union all select * from dual where
  :"SYS_B_2"=:"SYS_B_3"


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    99999      3.88       4.41          0          0          0           0
Execute  99999      1.23       1.31          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   199998      5.11       5.73          0          0          0           0&lt;/pre&gt;&lt;br /&gt;
11gR2 results:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;select *
from
 dual where :"SYS_B_0"=:"SYS_B_1" union all select * from dual where
  :"SYS_B_2"=:"SYS_B_3"


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    99999      6.14       7.21          0          0          0           0
Execute  99999      1.04       1.38          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   199998      7.19       8.59          0          0          0           0&lt;/pre&gt;&lt;br /&gt;
As you can see from the above figures, 11gR2 spent significantly more time parsing. Given that both sessions had 100% session cursor cache hit (and yes, the procedure didn't make use of bind variables either) and were otherwise run on pretty much identical environments (apart from the Oracle version), such a difference in parse time is certainly unexpected.&lt;br /&gt;
&lt;br /&gt;
It was another time to gear up for a reproducible test case. Perhaps I'm being lucky but it boiled down to a &lt;i&gt;session_cache_cursors&lt;/i&gt; setting again. To better demonstrate the point I'm going to run the following anonymous PL/SQL block in my 11gR2 database:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; begin
  2   execute immediate 'alter session set cursor_sharing=force';
  3   for i in 1 .. 10
  4   loop
  5    execute immediate 'alter session set session_cached_cursors='||to_char(i*100);
  6    for j in 1 .. 10000
  7    loop
  8     execute immediate
  9      'select /* test_'||to_char(i*100)||' */ * from dual where '||to_char(j)||'='||to_char(j)||
 10      ' union all select * from dual where '||to_char(j)||'='||to_char(j);
 11    end loop;
 12   end loop;
 13  end;
 14  /
 
PL/SQL procedure successfully completed&lt;/pre&gt;&lt;br /&gt;
What it does is sets &lt;i&gt;cursor_sharing=force&lt;/i&gt; and then executes the same statement with different settings for &lt;i&gt;session_cached_cursors&lt;/i&gt; -- from 100 to 1000 (with 100 increment). What do you think the results would be?&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; select to_number(substr(sql_text, instr(sql_text, 'test')+5, 4)) cursor_cache, executions, cpu_time
  2   from v$sql
  3   where sql_text like '%test%'
  4    and executions=10000
  5   order by 1;
 
CURSOR_CACHE EXECUTIONS   CPU_TIME
------------ ---------- ----------
         100      10000     571916
         200      10000     517928
         300      10000     559908
         400      10000     572909
         500      10000     561912
         600      10000     616915
         700      10000     652899
         800      10000     670886
         900      10000     671892
        1000      10000     702893
 
10 rows selected&lt;/pre&gt;Surprisingly enough, rising &lt;i&gt;session_cached_cursors&lt;/i&gt; from 100 to 1000 increases the CPU time required to execute the statements from 571916 to 702893! You can also spot how the CPU time increases together with the &lt;i&gt;session_cached_cursors&lt;/i&gt; setting.&lt;br /&gt;
&lt;br /&gt;
What happened is 9iR2 was running with &lt;i&gt;session_cached_cursors=1000&lt;/i&gt; and, apparently, it never had any negative effect... until the setting was transferred to 11gR2. It's somewhat hard to speculate what exactly is happening but based on my limited test cases it appears that you need (a) a union all in the statement, (b) run with &lt;i&gt;cursor_sharing=force&lt;/i&gt; and (c) have literals in your query. Under these circumstances there appears to be that strange performance degradation observed in 11gR2. But I need to do more investigation on that one. For now all I can say is that, after setting &lt;i&gt;session_cached_cursors=100&lt;/i&gt;, 11gR2 exhibited the same results as 9iR2.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;i&gt;***** UPDATE 07-Feb-2012 *****&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
It appears that only 11.2.0.1 and 11.2.0.2 are affected by this issue (see comments for more details).&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;i&gt;***** UPDATE 07-Feb-2012 #2 *****&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
I've found the bug# for this one: "Bug 12345980 high parse time with cursor_sharing=force when session_cached_cursors set".</description><link>http://afatkulin.blogspot.com/2012/02/sessioncachedcursors-again.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>10</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-4803016963357368436</guid><pubDate>Thu, 02 Feb 2012 01:08:00 +0000</pubDate><atom:updated>2012-02-01T20:08:53.436-05:00</atom:updated><title>Latch: row cache objects, session_cached_cursors and a database link</title><description>What could possibly be common about the above things?&lt;br /&gt;
&lt;br /&gt;
Sometime ago I was involved in tuning a large scale production system which experienced quite a lot of latch contention. In fact the contention was bad enough as to render the entire system unusable during peak hours.&lt;br /&gt;
&lt;br /&gt;
A typical AWR report during times with moderate workload looked like this:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;Event                      Waits           Time(s) Avg wait (ms)   % DB time  Wait Class
DB CPU                     21,805          36.95  
latch: row cache objects   14,524,462      11,552  1               19.57      Concurrency
db file sequential read    2,697,778       8,988   3               15.23      User I/O
gc current block 3-way     4,202,356       3,599   1               6.10       Cluster
gc current block 2-way     3,670,293       2,330   1               3.95       Cluster&lt;/pre&gt;&lt;br /&gt;
(in reality it was a 7-node RAC cluster but it is not really relevant to our subject).&lt;br /&gt;
&lt;br /&gt;
It is not hard to spot &lt;i&gt;latch: row cache objects&lt;/i&gt; on a second place consuming almost 20% of DB time. This event was rapidly escalating whether load increased quickly bringing the entire cluster into unusable state as far as any front-end application were concerned. The total number of gets during an hour (the interval with which this particular AWR report had been made) totaled 1,281,167,103 which averages a whopping 355,879 gets per second!&lt;br /&gt;
&lt;br /&gt;
The first step was to take a look at the Dictionary Cache statistics and see if there were any obvious deviations:&lt;br /&gt;
&lt;pre&gt;Cache                   Get Requests    Pct Miss
dc_awr_control          72              9.72
dc_database_links       124,452         0.05
dc_files                46,900          17.14
dc_global_oids          555,599         0.07
dc_histogram_data       8,962,576       1.01
dc_histogram_defs       412,885,094     0.02
dc_object_grants        52,475          1.13
dc_objects              2,860,222       0.64
dc_profiles             61,189          0.01
dc_rollback_segments    392,885         0.00
dc_segments             657,653         7.72
dc_sequences            1,324           67.67
dc_table_scns           3,206           3.31
dc_tablespaces          1,253,322       0.06
dc_users                5,475,824       0.02
global database name    65,092          0.01
kqlsubheap_object       10,910          0.46
outstanding_alerts      721             95.98&lt;/pre&gt;&lt;br /&gt;
&lt;i&gt;dc_histogram_defs&lt;/i&gt; really stands out but so far provides no direct clues. Though the entire dictionary cache seems to be quite busy. Let's take a look at instance activity statistics:&lt;br /&gt;
&lt;pre&gt;&amp;nbsp;                 Per Second   Per Transaction  Per Exec        Per Call
DB Time(s):       16.4         0.5              0.00            0.00
DB CPU(s):        6.1          0.2              0.00            0.00
Redo size:        87,543.5     2,398.6
Logical reads:    336,555.7    9,221.2
Block changes:    449.3        12.3
Physical reads:   1,018.3      27.9
Physical writes:  36.8         1.0
User calls:       41,250.4     1,130.2
Parses:           6,264.4      171.6
Hard parses:      27.1         0.7
W/A MB processed: 33,769,202.4 925,233.0
Logons:           18.3         0.5
Executes:         14,230.5     389.9
Rollbacks:        3.4          0.1
Transactions:     36.5&lt;/pre&gt;&lt;br /&gt;
A couple of things to notice:&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;The number of executions is not that high (at least for an IBM box with 56 CPU cores, that is).&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Execute to Parse&lt;/i&gt; ratio is not particularly good nor bad (6,264.4 parses per second compared to 14,230.5 executions).&lt;/li&gt;
&lt;li&gt;None of the above seems to be enough to justify 355,879 &lt;i&gt;latch: row cache objects&lt;/i&gt; gets per second, even when you consider 27.1 hard parses/sec.&lt;/li&gt;
&lt;/ol&gt;&lt;br /&gt;
There had to be more going on. Looking at the &lt;i&gt;SQL ordered by Parse Calls&lt;/i&gt; section of the report revealed that all the top parsing queries were coming from a database link. Which prompted me to ask additional questions...&lt;br /&gt;
&lt;br /&gt;
It turned out that the system had a somewhat weird architecture. A 7-node RAC cluster served as a back end to a numerous front-end Oracle databases which were essentially acting as PL/SQL-engines, accessing all the data through the database link while dong all the processing. The architectural feasibility of such solution is debatable (to say the least) but let's not steer away from the main topic of our discussion.&lt;br /&gt;
&lt;br /&gt;
I geared to find out whether there was some special case which could cause &lt;i&gt;latch: row cache objects&lt;/i&gt; to shoot through the roof when we deal with a lot of queries executed through a database link. After a number of experiments I eventually found a cause which explains the title of this blog post.&lt;br /&gt;
&lt;br /&gt;
Take a look at the following example:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&gt; declare
  2   l_gets number;
  3  begin
  4   select gets into l_gets from v$latch@dblink where name='row cache objects';
  5  
  6   for i in 1 .. 1000
  7   loop
  8    for cur in (select null from dual@dblink) loop null; end loop;
  9   end loop;
 10  
 11   select sum(gets)-l_gets into l_gets from v$latch@dblink where name='row cache objects';
 12   dbms_output.put_line(to_char(l_gets));
 13  end;
 14  /
 
79
 
PL/SQL procedure successfully completed&lt;/pre&gt;&lt;br /&gt;
Accessing a remote table 1000 times through a database link only caused 79 &lt;i&gt;row cache objects&lt;/i&gt; gets on the &lt;i&gt;destination&lt;/i&gt; database. However, exactly the same example can produce quite a different result:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&gt; alter session set session_cached_cursors=0;
 
Session altered
 
SQL&gt; 
SQL&gt; declare
  2   l_gets number;
  3  begin
  4   select gets into l_gets from v$latch@dblink where name='row cache objects';
  5  
  6   for i in 1 .. 1000
  7   loop
  8    for cur in (select null from dual@dblink) loop null; end loop;
  9   end loop;
 10  
 11   select sum(gets)-l_gets into l_gets from v$latch@dblink where name='row cache objects';
 12   dbms_output.put_line(to_char(l_gets));
 13  end;
 14  /
 
3082
 
PL/SQL procedure successfully completed&lt;/pre&gt;&lt;br /&gt;
Now we jumped from 79 gets to 3082 which is a 39x increase! What's really astonishing is the fact that it's a cursor cache miss on the &lt;i&gt;source&lt;/i&gt; which wreaks havoc on the &lt;i&gt;destination&lt;/i&gt;. Evidently, the solution to the problem had nothing to do with the database where the problem was observed. The "PL/SQL-databases" had an insufficient setting for &lt;i&gt;session_cached_cursors&lt;/i&gt; and once that was corrected the issue disappeared.&lt;br /&gt;
&lt;br /&gt;
It is somewhat ironic that that problem would have never appears had the data and PL/SQL be allowed to be kept in the same database (where they belong).</description><link>http://afatkulin.blogspot.com/2012/02/latch-row-cache-objects.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>6</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-52312454049975182</guid><pubDate>Wed, 06 Apr 2011 00:23:00 +0000</pubDate><atom:updated>2011-04-05T20:23:23.372-04:00</atom:updated><title>Oracle GoldenGate and undocumented OCI redo log APIs</title><description>&lt;a href="http://afatkulin.blogspot.com/2011/03/oracle-goldengate-asm-and-dblogreader.html"&gt;Last time&lt;/a&gt; I did some quick write up on the new APIs appeared in Oracle GoldenGate 11G which can be used against Oracle 10.2.0.5 and 11.2.0.2 to improve how GG handles redo on ASM.&lt;br /&gt;
&lt;br /&gt;
Some of the items still required follow up, namely:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;What are the names of these new API functions?&lt;/li&gt;
&lt;li&gt;What kind of security privileges do you need to call these?&lt;/li&gt;&lt;/ul&gt;I was especially interested in the last item since the ability to read Oracle's redo log gives you  extremely powerful access to database information.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;New OCI APIs&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Discovering these is pretty straightforward, all we need to do is a stack dump on a running GG Extract process:&lt;br /&gt;
&lt;pre&gt;[oracle@quadro.com ~]$ pstack 4390
...
#10 0x00002acc23443ef0 in knxOGGRedoLogRead () from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
&lt;b&gt;#11 0x00002acc232a9b3b in OCIPOGGRedoLogRead () from /u01/app/oracle/ggs_11g/libclntsh.so.11.1&lt;/b&gt;
#12 0x00000000008fa92c in ASMReader2::readFile(unsigned int, unsigned int, unsigned int*, char*, char*) ()
...&lt;/pre&gt;Once we know the name of the read function (&lt;i&gt;OCIPOGGRedoLogRead&lt;/i&gt;) we can find the rest of them:&lt;br /&gt;
&lt;pre&gt;[oracle@quadro.com ~]$ nm /u01/app/oracle/ggs_11g/libclntsh.so.11.1 | grep OCIPOGG
00000000010e3954 T OCIPOGGRedoLogClose
00000000010e3c90 T OCIPOGGRedoLogOpen
00000000010e3a2a T OCIPOGGRedoLogRead&lt;/pre&gt;As you can see, these functions are indeed exposed directly through &lt;i&gt;libclntsh&lt;/i&gt; library. We can even see the Extract process "in action" calling these functions with the help of Linux's &lt;i&gt;gdb&lt;/i&gt;:&lt;br /&gt;
&lt;pre&gt;[oracle@quadro.com ~]$ ps -aef | grep extract
oracle    4390  4386  0 19:36 ?        00:00:00 /u01/app/oracle/ggs_11g/extract PARAMFILE /u01/app/oracle/ggs_11g/dirprm/11g_ext.prm REPORTFILE /u01/app/oracle/ggs_11g/dirrpt/11G_EXT.rpt PROCESSID 11G_EXT USESUBDIRS
oracle    4985  4114  0 19:51 pts/2    00:00:00 grep extract
[oracle@quadro.com ~]$ gdb /u01/app/oracle/ggs_11g/extract -p 4390
...
(gdb) break OCIPOGGRedoLogClose
Breakpoint 1 at 0x2acc232a9958
(gdb) break OCIPOGGRedoLogOpen
Breakpoint 2 at 0x2acc232a9c94
(gdb) break OCIPOGGRedoLogRead
Breakpoint 3 at 0x2acc232a9a2e
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00002acc232a9958 &lt;OCIPOGGRedoLogClose+4&gt;
2       breakpoint     keep y   0x00002acc232a9c94 &lt;OCIPOGGRedoLogOpen+4&gt;
3       breakpoint     keep y   0x00002acc232a9a2e &lt;OCIPOGGRedoLogRead+4&gt;&lt;/pre&gt;After we've sat up our breakpoints we can let the Extract run and hit one of these:&lt;br /&gt;
&lt;pre&gt;(gdb) continue
Continuing.

Breakpoint 3, 0x00002acc232a9a2e in OCIPOGGRedoLogRead ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1

(gdb) bt
#0  0x00002acc232a9a2e in OCIPOGGRedoLogRead ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
#1  0x00000000008fa92c in ASMReader2::readFile (this=0xa408100,
    offset=&lt;value optimized out&gt;, bytes_to_read=1024000,
    bytes_read=0x7fffb8c73ad4, buffer=0xabc0000 "\001\"", errtext=0xc457c0 "")
    at /home/ecloud/workspace/Build_FBO_OpenSys_r11.1.1.0.0_078_[34086]/perforce/src/app/er/redo/oracle/asm.c:798
...&lt;/pre&gt;On top of the stack we can see GoldenGate making a call to &lt;i&gt;OCIPOGGRedoLogRead&lt;/i&gt;. We can also get a sense of parameters by looking at the calling function (&lt;i&gt;ASMReader2::readFile&lt;/i&gt;). The function specifies how many bytes it would like to read (&lt;i&gt;bytes_to_read=1024000&lt;/i&gt;), a pointer to a variable which will hold the actual amount read (&lt;i&gt;bytes_read=0x7fffb8c73ad4&lt;/i&gt;), a pointer to a buffer to store the returned redo data (&lt;i&gt;buffer=0xabc0000&lt;/i&gt;) and the error text, if any (&lt;i&gt;errtext=0xc457c0&lt;/i&gt;).&lt;br /&gt;
&lt;br /&gt;
Now it is time to see the other two functions usage. What I did is switched the redo logs in my database, disabled breakpoint number 3 (the one which points to &lt;i&gt;OCIPOGGRedoLogRead&lt;/i&gt;) and let the Extract process continue:&lt;br /&gt;
&lt;pre&gt;(gdb) disable 3

(gdb) continue
Continuing.

Breakpoint 2, 0x00002acc232a9c94 in OCIPOGGRedoLogOpen ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
(gdb) bt
#0  0x00002acc232a9c94 in OCIPOGGRedoLogOpen ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
#1  0x00000000008fa859 in ASMReader2::openFile (this=0xa408100,
    filename=0xa408350 "+DATA/ora11gr2/onlinelog/group_3.258.743164441",
    lblksize=&lt;value optimized out&gt;, blksize=0x0, tot_blks=0x0,
    filetype=&lt;value optimized out&gt;, errtext=0xc457c0 "")
    at /home/ecloud/workspace/Build_FBO_OpenSys_r11.1.1.0.0_078_[34086]/perforce/src/app/er/redo/oracle/asm.c:762
...&lt;/pre&gt;Interestingly enough we never made a call to &lt;i&gt;OCIPOGGRedoLogClose&lt;/i&gt;. Maybe I should open an Oracle SR and complain :)&lt;br /&gt;
&lt;br /&gt;
As before, we can get a glimpse of parameters which include redo log file name as well as bunch of output parameters specifying the block size and so on (which looks somewhat similar to what you get from &lt;i&gt;dbms_diskgroup.open&lt;/i&gt; call).&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Security&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Looking at how all this stuff is being exposed through OCI, I became even more eager to find out what kind of privileges do you require on the database side in order to be able to use these. The first step was to try and run the Extract process with stripped out privileges which thankfully resulted in the following error:&lt;br /&gt;
&lt;pre&gt;2011-04-05 20:10:36  ERROR   OGG-00446  Opening ASM file +DATA/ora11gr2/onlinelog/group_3.258.743164441 in DBLOGREADER mode: (1031) ORA-01031: insufficient privilegesNot able to establish initial position for sequence 398, rba 1040.&lt;/pre&gt;At least there is something but what exactly is it? GoldenGate's documentation (should I add "as usual?") doesn't mention anything in that regards so I had to figure that one out on my own.&lt;br /&gt;
&lt;br /&gt;
After some back and forth I was able to eventually discover the privilege which makes it all possible -- it's &lt;i&gt;select any transaction&lt;/i&gt;. Alas it is not documented to play this role but now you know anyway.</description><link>http://afatkulin.blogspot.com/2011/04/oracle-goldengate-and-undocumented-oci.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-4573771736624458913</guid><pubDate>Sat, 05 Mar 2011 18:21:00 +0000</pubDate><atom:updated>2011-03-05T13:21:22.898-05:00</atom:updated><title>Oracle GoldenGate, ASM and DBLOGREADER option</title><description>Some time ago I did a &lt;a href="http://www.pythian.com/news/7459/oracle-goldengate-extract-internals-part-ii/"&gt;write up&lt;/a&gt; detailing how Oracle GoldenGate interacts with Oracle ASM. The technology stack where the Extract process establishes a connection with the ASM instance and then reads file contents using dbms_diskgroup package is quite cumbersome to say the least. When compared to how GG reads redo from a filesystem (detailed &lt;a href="http://www.pythian.com/news/7225/oracle-goldengate-extract-internals-part-i/"&gt;here&lt;/a&gt;), the way GG has to deal with ASM is quite inefficient to say the least. In fact, as part of my original article, the recommendation was to use a bequeath connection when possible to cut on the amount of overhead. Nowadays the use of bequeath connection became an official recommendation from Oracle outlined in Oracle GG 11G &lt;a href="http://download.oracle.com/docs/cd/E18101_01/doc.1111/e18165.pdf"&gt;Release Notes&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;DBLOGREADER&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
I was quite excited to see that Oracle made some progress in it's newer release of GoldenGate (11.1). Here is what the documentation says about the new option:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;A new DBLOGREADER option was added to TRANLOGOPTIONS to enable Extract to use a &lt;br /&gt;
newer API for capturing from an Oracle ASM instance. This feature is available as of Oracle &lt;br /&gt;
10.2.0.5.&lt;/blockquote&gt;&lt;br /&gt;
This was indeed promising, however, what exactly has changed remained to be discovered.&lt;br /&gt;
&lt;br /&gt;
The first change you're going to notice when using DBLOGREADER is that the Extract process establishes a connection with the RDBMS instance (not an ASM instance). If you trace the relevant server side process then you'll observer the following event being emitted each time it reads from the redo:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;WAIT #0: nam='log file sequential read' ela= 3 log#=0 block#=102653 blocks=2000 obj#=-1 tim=1299345636565911&lt;/pre&gt;&lt;br /&gt;
One of the advantages of the new option is the ability to use a much larger buffer (the old mechanism was limited to something around 28K which is abysmal for any system with even moderate redo generation rate). As we can see from the above, we had &lt;i&gt;2000x512=1000K&lt;/i&gt; worth of data read in one shot so at least that part works as advertised. But how is it done?&lt;br /&gt;
&lt;br /&gt;
A stack dump reveals the following:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;[oracle@ora11gr2 ggs_11g]$ pstack 4740
#1  0x00000000090bae87 in sskgpwwait ()
#2  0x00000000090b9f9a in skgpwwait ()
#3  0x0000000008d29f93 in ksliwat ()
#4  0x0000000008d29541 in kslwaitctx ()
#5  0x0000000008d269ab in kslwait ()
&lt;b&gt;#6  0x00000000071f8566 in knloggRedoRead ()
#7  0x00000000071f9e05 in knloggmain ()&lt;/b&gt;
#8  0x0000000008eae67a in opiodr ()
#9  0x00000000090467f9 in ttcpip ()
#10 0x0000000001722f86 in opitsk ()
#11 0x0000000001727c26 in opiino ()
#12 0x0000000008eae67a in opiodr ()
#13 0x000000000171eedc in opidrv ()
#14 0x0000000001d95f5f in sou2o ()
#15 0x0000000000a07935 in opimai_real ()
#16 0x0000000001d9b408 in ssthrdmain ()
#17 0x0000000000a078a1 in main ()&lt;/pre&gt;&lt;br /&gt;
Look at the functions in line #6 and #7 (&lt;i&gt;knloggmain()&lt;/i&gt; and &lt;i&gt;knloggRedoRead()&lt;/i&gt;). These functions follow right after OPI (Oracle Program Interface) layer which makes this code path look rather efficient. No need to deal with compilation (KK) and execution (KX) layers. Just a direct and straightforward call to the functions we need. Indeed, when I did some quick and dirty tests (using 11.2.0.2), the new code path was able to archive about 90% performance compared to just reading your redo from ext3 filesystem.&lt;br /&gt;
&lt;br /&gt;
The only question I have about all this is a security paradigm behind &lt;i&gt;knloggmain()&lt;/i&gt; calls. This function appears to be directly exposed through the OPI layer and, while this helps archive greater efficiency, how does it control who can who can not calls it? You may not necessarily want somebody connect to your database and start reading your redo stream as you may get all kinds of sensitive information right there. I haven't spent any time figuring that one out yet though.</description><link>http://afatkulin.blogspot.com/2011/03/oracle-goldengate-asm-and-dblogreader.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-6695699248811420656</guid><pubDate>Sun, 19 Dec 2010 19:54:00 +0000</pubDate><atom:updated>2010-12-19T14:54:58.918-05:00</atom:updated><title>Oracle GoldenGate Trail File Size</title><description>When it comes to estimating how big your trail files will be, Oracle &lt;a href="http://download.oracle.com/docs/cd/E18101_01/doc.1111/e17799.pdf"&gt;documentation&lt;/a&gt; suggests to use the following formula:&lt;br /&gt;
&lt;pre&gt;[log volume in one hour] x [number of hours downtime] x .4 = trail disk space&lt;/pre&gt;It is also described as being a conservative estimate so you are likely to archive a better mileage. What I found is that there are some corner cases which can produce some "anomaly" results.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Test Case&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
I'll start with a specially crafted schema and table names as well as data:&lt;br /&gt;
&lt;pre&gt;create user uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu identified by "u";

User created.

SQL&gt; grant resource to uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu;

Grant succeeded.

SQL&gt; create table uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt
  2  (
  3     n number primary key
  4  );

Table created.

SQL&gt; alter table uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt
  2 add supplemental log data (primary key) columns;

Table altered.&lt;/pre&gt;I've just created a new trail file which is currently 974 bytes in size:&lt;br /&gt;
&lt;pre&gt;[oracle@gg1 dirdat]$ ls -l aa000000
-rw-rw-rw- 1 oracle oinstall 974 Dec 19 11:50 aa000000&lt;/pre&gt;What I'm going to do is insert some data into a table and then measure how much redo as well as trail data were generated:&lt;br /&gt;
&lt;pre&gt;SQL&gt; set autot traceonly stat
SQL&gt; insert into uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt
      select power(10,10)+(level-1)*power(10,10)
        from dual
        connect by level &lt;= 10000;  2    3    4

10000 rows created.


Statistics
----------------------------------------------------------
        0  recursive calls
      329  db block gets
       31  consistent gets
        0  physical reads
     364612  redo size
      821  bytes sent via SQL*Net to client
      917  bytes received via SQL*Net from client
        3  SQL*Net roundtrips to/from client
        2  sorts (memory)
        0  sorts (disk)
      10000  rows processed

SQL&gt; commit;

Commit complete.&lt;/pre&gt;Let's take a look at the trail file size:&lt;br /&gt;
&lt;pre&gt;[oracle@gg1 dirdat]$ ls -l aa000000
-rw-rw-rw- 1 oracle oinstall 1619890 Dec 19 13:40 aa000000&lt;/pre&gt;That is roughly 4.4 times bigger than our redo size and 11 times bigger than Oracle's "conservative" estimate. Your storage provisioning might be in for a surprise.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Trail File&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
To understand the key factors which resulted in our trail file to be of such outrageous size let's use &lt;i&gt;logdump&lt;/i&gt; and take a look inside:&lt;br /&gt;
&lt;pre&gt;Logdump 11 &gt;open ./dirdat/aa000000
Current LogTrail is /u01/app/oracle/ggs/dirdat/aa000000
Logdump 12 &gt;ghdr on
Logdump 13 &gt;next 2

2010/12/19 13:39:16.632.818 FileHeader           Len   966 RBA 0
Name: *FileHeader*
 3000 01b6 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1...
 0002 3200 0004 2000 0000 3300 0008 02f1 bdfb 2d3e | ..2... ...3.......-&gt;
 74f2 3400 0028 0026 7572 693a 6767 313a 7175 6164 | t.4..(.&amp;uri:gg1:quad
 726f 3a63 6f6d 3a3a 7530 313a 6170 703a 6f72 6163 | ro:com::u01:app:orac
 6c65 3a67 6773 3600 0025 0023 2f75 3031 2f61 7070 | le:ggs6..%.#/u01/app
 2f6f 7261 636c 652f 6767 732f 6469 7264 6174 2f61 | /oracle/ggs/dirdat/a
 6130 3030 3030 3037 0000 0101 3800 0004 0000 0000 | a0000007....8.......

___________________________________________________________________
Hdr-Ind    :     E  (x45)     Partition  :     .  (x04)
UndoFlag   :     .  (x00)     BeforeAfter:     A  (x41)
RecLength  :    19  (x0013)   IO Time    : 2010/12/19 13:40:32.000.000
IOType     :     5  (x05)     OrigNode   :   255  (xff)
TransInd   :     .  (x00)     FormatType :     R  (x52)
SyskeyLen  :     0  (x00)     Incomplete :     .  (x00)
AuditRBA   :         45       AuditPos   : 37458516
Continued  :     N  (x00)     RecCount   :     1  (x01)

2010/12/19 13:40:32.000.000 Insert               Len    19 RBA 974
Name: UUUUUUUUUUUUUUUUUUUUUUUUUUUUUU.TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
After  Image:                                             Partition 4   G  b
 0000 000f 0000 000b 3130 3030 3030 3030 3030 30   | ........10000000000&lt;/pre&gt;The first record is a standard trail file header and is of little interest to us. We're going to take look at the second record. First of all, as you can see, we've got a fully qualified table name stored in there. We can see the inserted value as well. Now if all that stuff got stored in plain text that could explain it... let's check it out:&lt;br /&gt;
&lt;pre&gt;[oracle@gg1 dirdat]$ grep --binary-files=text -o -i \
&gt; uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt \
&gt; aa000000 \
&gt; | wc -l
10000&lt;/pre&gt;Indeed, we've got a fully qualified table name appearing 10 thousand times on our trail file! The longer your schema and/or your table names are the bigger your trail file will be. The main reason Oracle Streams are using object identifiers (as well as Oracle's redo) is to avoid getting into exactly this kind of trouble. This makes GoldenGate configuration easier as it doesn't have to store the mapping information on the target database but you also have to pay the price with every row modification which gets captured.&lt;br /&gt;
&lt;br /&gt;
The story doesn't end there, however. Let's take a look at the inserted data. How do you think that got stored?&lt;br /&gt;
&lt;pre&gt;[oracle@gg1 dirdat]$ grep --binary-files=text -o -i \
&gt; 0000000000 \
&gt; aa000000 \
&gt; | wc -l
10000&lt;/pre&gt;That's right - all numbers got stored in plain text. The fundamental problem here is that trail file does not store data types. Everything just gets converted to strings. This helps dealing with heterogeneous data sources but also makes the storage for certain data types to be very inefficient. Again, you have to pay the price somewhere. If you &lt;i&gt;gzip&lt;/i&gt; the above trail file you'll get a whopping 25x compression ratio.&lt;br /&gt;
&lt;br /&gt;
You may ask yourself how column names are being stored? The answer is there are no column names in the trail file, just column positions. That's why GoldenGate requires column order to match between source and target databases, otherwise you'll get your data mapped to a wrong column. Add a lack of data types and you might be in for a very interesting results. If column order doesn't match then you'll have to use &lt;i&gt;defgen&lt;/i&gt; to get your data across properly.&lt;br /&gt;
&lt;br /&gt;
In a nutshell, watch out for tables/schemas with long names and lots of numeric data.</description><link>http://afatkulin.blogspot.com/2010/12/oracle-goldengate-trail-file-size.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>5</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-7788693593463771762</guid><pubDate>Tue, 20 Jul 2010 02:32:00 +0000</pubDate><atom:updated>2010-07-19T22:32:09.466-04:00</atom:updated><title>ASM Mirroring and Disk Partnership</title><description>Let's say you have a RAID10 disk array&amp;nbsp;with&amp;nbsp;50 disks. I'm offering you to play the following game. You'll spin a wheel of fortune&amp;nbsp;divided&amp;nbsp;into 50 equal slots with numbers ranging from 1 to 50. After you spun the wheel for the first time you'll write the resulted number down. After that I'll offer you to spin the second wheel of fortune equally&amp;nbsp;divided&amp;nbsp;into 49 slots where the number&amp;nbsp;you've just&amp;nbsp;pulled is missing (so you can't pull the same number twice).&lt;br /&gt;
&lt;br /&gt;
After that you'll end up with two (mutually exclusive) random numbers in a range from 1 to 50. What I'm going to ask you do next is pull the disks with corresponding numbers out of your RAID10 array. What are the odds of your entire array going down?&lt;br /&gt;
&lt;br /&gt;
In a classical RAID10 setup where every drive is being mirrored by a single other drive you can calculate the probability in a&amp;nbsp;straightforward&amp;nbsp;fashion. After you've pulled the first drive out (doesn't matter which one) there is only one specific disk out of 49 which you have to pull in order for the entire array to go bust. So your odds of taking the entire array down will be 1/49 or roughly 2%.&lt;br /&gt;
&lt;br /&gt;
Now let's say that instead of a regular RAID10 array you've got a normal redundancy ASM disk group with two failure groups (25 disks each). What are the odds now?&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;ASM Mirroring&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Before we're going to answer the above question we need to realize that ASM does not mirror disks the same way traditional RAID10 does. In fact it doesn't mirror disks at all. It mirrors extents instead. For all you know is that the extents from the disk you've just pulled out won't be mirrored in the same failure group. So that leaves us with 24 disks as safe. But what about the other 25 disks from the other failure group? How much of these disks are unsafe and will result in your normal redundancy disk group going south?&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Disk Partnership&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;&lt;br /&gt;
&lt;/b&gt;&lt;br /&gt;
When mirroring extents ASM uses a concept called Disk Partnership. Every disk in a normal redundancy disk group has one or more partners which are used to mirror primary extents from that disk. This also means that the loss of any of the partner disks is fatal to the disk group as you'll nuke both the primary extent and it's mirror copy (keep in mind that we're talking about pulling both disks out before the disk group would be able to rebalance). At least now we know what we need next in order to solve the puzzle. We need to find out how many partners each of the disks in our array have.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Disk Partners&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;&lt;br /&gt;
&lt;/b&gt;&lt;br /&gt;
Let's say that the first disk we've pulled out was disk number 0. The following query can be used to find all partners for the disk number 0 in a first disk group:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; select p.number_kfdpartner, d.FAILGROUP
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;from x$kfdpartner p, v$asm_disk d
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;where p.disk=0
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.grp=1
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.grp=group_number
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.number_kfdpartner=d.disk_number; &amp;nbsp;2 &amp;nbsp; &amp;nbsp;3 &amp;nbsp; &amp;nbsp;4 &amp;nbsp; &amp;nbsp;5 &amp;nbsp; &amp;nbsp;6

NUMBER_KFDPARTNER FAILGROUP
----------------- ------------------------------
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 25 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 26 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 27 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 29 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 38 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 46 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 48 FG2
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 49 FG2

8 rows selected.&lt;/pre&gt;Pulling any of the above disks out at the same time with the disk number 0 will be fatal for our normal redundancy disk group. In other words, once we pull the first disk out, there are other 8 disks out of 49 which are unsafe. That will bring our odds up (or down, depending in which outcome you're interested :) to&amp;nbsp;8/49 or&amp;nbsp;a little bit more than 16%.&lt;br /&gt;
&lt;br /&gt;
You can confirm that every disk has exactly 8 partners by running the following query:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; select min(cnt), max(cnt) from (
select number_kfdpartner disk_number, count(*) cnt
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;from x$kfdpartner
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;where grp=1
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;group by number_kfdpartner); &amp;nbsp;2 &amp;nbsp; &amp;nbsp;3 &amp;nbsp; &amp;nbsp;4 &amp;nbsp; &amp;nbsp;5

&amp;nbsp;&amp;nbsp;MIN(CNT) &amp;nbsp; MAX(CNT)
---------- ----------
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 8 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;8&lt;/pre&gt;&lt;br /&gt;
&lt;b&gt;Partner Disk Count&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Is there any way to control the number of partner disks which ASM uses for extents mirroring? Turns out that there is. &lt;i&gt;Note that it's a completely non supported operation so you shouldn't be playing with it.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
The parameter which controls the &lt;i&gt;maximum&lt;/i&gt; number of partner disks is called&amp;nbsp;&lt;i&gt;_asm_partner_target_disk_part&lt;/i&gt;. In 11GR2 that parameter has a default value of 8. I didn't have a chance to check it in a previous versions but supposedly it's default value there is 10(&lt;a href="#ref1"&gt;1&lt;/a&gt;). So at least we know that Oracle itself sometimes changes it between different releases.&lt;br /&gt;
&lt;br /&gt;
All you need to do after changing this parameter is to rebalance the disk group:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;SQL&amp;gt; alter system set "_asm_partner_target_disk_part"=2;

System altered.

SQL&amp;gt; alter diskgroup data rebalance;

Diskgroup altered.

&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;SQL&amp;gt; select p.number_kfdpartner, d.FAILGROUP&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;from x$kfdpartner p, v$asm_disk d&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;where p.disk=0&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.grp=1&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.grp=group_number&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;and p.number_kfdpartner=d.disk_number; &amp;nbsp;2 &amp;nbsp; &amp;nbsp;3 &amp;nbsp; &amp;nbsp;4 &amp;nbsp; &amp;nbsp;5 &amp;nbsp; &amp;nbsp;6&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;NUMBER_KFDPARTNER FAILGROUP&lt;/div&gt;&lt;div&gt;----------------- ------------------------------&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 25 FG2&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 38 FG2&lt;/div&gt;&lt;/div&gt;&lt;/pre&gt;So here it goes!&lt;br /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;&lt;b&gt;References&lt;/b&gt;&lt;/div&gt;&lt;a name="ref1" href="http://www.amazon.com/Oracle-Automatic-Storage-Management-Under/dp/0071496076"&gt;Oracle Automatic Storage Management: Under-the-Hood &amp;amp; Practical Deployment Guide&lt;/a&gt;&lt;br /&gt;
&lt;a href="https://twiki.cern.ch/twiki/bin/view/PDBService/ASM_Internals"&gt;ASM Metadata and Internals&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;All test were performed with Oracle Grid Infrastructure 11.2.0.1&lt;/i&gt;</description><link>http://afatkulin.blogspot.com/2010/07/asm-mirroring-and-disk-partnership.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>5</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-3610321853051438063</guid><pubDate>Wed, 09 Jun 2010 02:00:00 +0000</pubDate><atom:updated>2010-06-09T22:11:43.598-04:00</atom:updated><title>11GR2 Result Cache Scalability</title><description>&lt;style type="text/css"&gt;.nobrtable br { display: none }&lt;/style&gt;&lt;span style="font-weight:bold;"&gt;Result Cache in 11GR1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Two years ago I wrote a series of posts where I explained some of the dynamics around Result Cache latch. To recap, the result cache memory in 11GR1 is backed up by a single RC latch. That in itself wouldn't be so much of an issue (at least relatively to what we've got in reality) had the latch allowed for shared mode gets in case all you have to do is read from the result cache memory.&lt;br /&gt;&lt;br /&gt;Alas, the latch turned out to be without shared mode gets. It is going almost without saying that, as concurrency levels increased, that single latch was behaving more and more like to a hand brake (&lt;a href="http://www.pythian.com/news/683/oracle-11g-result-cache-tested-on-eight-way-itanium"&gt;link&lt;/a&gt; to a test I've done back then on a 8-way Itanium 2).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Back to the future&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When 11GR2 has been released I knew that at some point in time I'll need to go back and revisit this subject. What I did is a couple of quick and dirty runs which came back confirming the same single latch and no shared mode gets so it didn't look like something has really changed. At this point I've decided to revisit it a bit later. This a "bit later" happened just recently.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;How bad can it get?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What I wanted to do is get an UltraSPARC T2 and face it against Core i7 980X on a different concurrency levels in order to see how bad it can get. T2 will require quite a lot of parallelism in order to keep up even with a single i7 core. But since all we've got is a single RC latch, I've expected T2 to choke on it quite fast as not only there will be a lot of processes competing for the same latch, the slow single-threaded performance will cause the latch to be held for a much longer periods of time. Performance degradation will be dare.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Result Cache in 11GR2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I used the same test described &lt;a href="http://www.pythian.com/news/598/oracle-11g-query-result-cache-rc-latches"&gt;here&lt;/a&gt; as it is targeted at exploiting RC latch weakness and gives me the ability to compare with the old results. I've used 250K lookup iterations. The performance was measured as a total number of lookups performed per second and RC latch statistics were captured for analysis.&lt;br /&gt;&lt;br /&gt;Since 980X has 6 cores and 12 threads, the tests were done with 1 to 12 processes running at the same time which also gave an opportunity to see how well HT will scale. Note that I plan to do some further testing on T2 with up to 64 threads but for now I've tested up to 12 threads only as I couldn't get a test window big enough.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;UltraSPARC T2 Results&lt;/span&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;th&gt;# of processes&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Buffer Cache&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;% linear&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Result Cache&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;% linear&lt;/th&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4426&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4555&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;8930&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.88&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;9124&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.15&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;13465&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;101.41&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;13731&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.48&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;17886&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;101.03&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;18179&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;99.77&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;22290&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.72&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;22715&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;99.74&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;6&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;26615&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100.22&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;27012&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;98.84&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;7&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;30659&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;98.96&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;30804&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;96.61&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;34347&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;97&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;34910&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.8&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;38389&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;96.37&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;39029&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.2&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;10&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;42772&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;96.64&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;43126&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;94.68&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;46840&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;96.21&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;46936&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;93.68&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;12&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;50667&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;50590&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;92.55&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;When I saw these numbers for the first time I was quite surprised just how good these results are! UlstraSPARC T2 end up being far from choking and, as a matter of fact, the only position where Result Cache had to give up is the last one. If you reference the &lt;a href="http://www.pythian.com/news/683/oracle-11g-result-cache-tested-on-eight-way-itanium"&gt;results&lt;/a&gt; I've obtained on 8-way Itanium 2 in 11GR1 you'll see that Result Cache gave up much earlier and scaled a lot worse.&lt;br /&gt;&lt;br /&gt;This certainly looks promising so let's take a look at the RC latch statistic:&lt;br /&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;th&gt;# of processes&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Gets&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Misses&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Sleeps&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Wait Time&lt;/th&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;500001&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1000002&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;40253&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1500003&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;50404&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2000004&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;165116&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;464&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2500005&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;211559&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;182&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;6&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3000006&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;437898&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;6877&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;7&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3500007&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;805752&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;52&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;16556&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4000008&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1214762&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;20&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2980&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4500009&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1775372&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;188&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3140&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;10&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5000010&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2244964&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;491&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;29568&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5500011&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2552323&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;664&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;28011&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;12&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;6000012&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3019903&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1226&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;60005&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;There is one astonishing fact about the above number. Let's get some efficiency metrics in place for comparison between these numbers and the ones I've got in 11GR1. I'll use a data point with eight parallel processes as it's the highest reference point I can get across both data sets.&lt;br /&gt;&lt;br /&gt;First of all, the number of gets per execution remained the same and equals two gets per exec. If we were going to calculate &lt;span style="font-style:italic;"&gt;% miss per get&lt;/span&gt; we'll get 28.62% in 11GR1 and 50.33% in 11GR2. In other words, roughly every second get request has resulted in a miss in 11GR2 and every third in 11GR1. It may appear as if this got worse but it's really a consequence from something else.&lt;br /&gt;&lt;br /&gt;If we calculate &lt;span style="font-style:italic;"&gt;% sleep per miss&lt;/span&gt; we'll get 31.36% in 11GR1 but only 0.04% in 11GR2! In other words, the amount of times a process had to go to sleep has drastically decreased. In almost all of the cases the process was able to acquire a latch during a spin without going into a sleep. This also explains why &lt;span style="font-style:italic;"&gt;% miss per get&lt;/span&gt; in 11GR2 went up and shows that a lowering in efficiency for a single metric does not necessarily indicates a problem, it might happen because some other correlated metric has in fact improved.&lt;br /&gt;&lt;br /&gt;There is certainly a sign of a great improvement but what is it? Most likely the improvement is related to the optimization of how long the latch is required to be held. The time required to hold the latch became so small that, in most of the cases, the process is able to acquire it during spinning before being required to go to sleep (i.e. less than &lt;span style="font-style:italic;"&gt;_spin_count&lt;/span&gt; iterations).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Core i7 980X Results&lt;/span&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;th&gt;# of processes&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Buffer Cache&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;% linear&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Result Cache&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;% linear&lt;/th&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;40064&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;43554&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;100&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;78989&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;98.58&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;84602&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;97.12&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;121753&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;101.3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;127768&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;97.79&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;159490&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;99.52&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;166667&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.67&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;194704&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;97.2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;204583&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;93.94&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;6&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;229709&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;95.56&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;240770&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;92.13&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;7&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;231788&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;82.65&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;244755&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;80.28&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;233918&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;72.98&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;246305&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;70.69&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;250836&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;69.57&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;260718&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;66.51&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;10&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;267094&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;66.67&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;275330&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;63.22&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;280326&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;63.61&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;290084&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;60.55&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;12&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;290416&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;60.41&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;293830&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;56.22&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Here Result Cache won across all the positions. We need about 10 processes running on UltraSPARC T2 in order to beat a single process running on i7 980X. Performance gains declined rapidly once we got over six concurrent processes but still we were able to realize some additional performance with 12 threads being about 22% faster than 6 threads.&lt;br /&gt;&lt;br /&gt;Latch statistics:&lt;br /&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;th&gt;# of processes&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Gets&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Misses&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Sleeps&lt;/th&gt;&lt;br /&gt;  &lt;th&gt;Wait Time&lt;/th&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;1&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;500001&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;2&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1000002&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;40456&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;3&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1500003&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;117893&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;71&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;4&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2000004&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;209399&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;5&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2500005&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;381160&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;0&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;6&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3000006&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;517745&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;179&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;7&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3500007&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;913125&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;20&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;555&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;8&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4000008&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1355226&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;26&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;11914&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;9&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;4500009&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1834112&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;13&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1017&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;10&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5000010&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;2602801&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;42&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;1607&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;11&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;5500011&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3196415&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;145&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3451&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt; &lt;tr&gt;&lt;br /&gt;  &lt;td&gt;12&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;6000012&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;3730467&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;184&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;123954&lt;/td&gt;&lt;br /&gt; &lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Essentially we're looking at the same phenomena with the amount of sleeps being significantly lower compared to what we observed in 11GR1. With six concurrent processes &lt;span style="font-style:italic;"&gt;% miss per get&lt;/span&gt; is 17.26% and &lt;span style="font-style:italic;"&gt;% sleep per miss&lt;/span&gt; is 0.002%! This allowed Result Cache to stay ahead with up to (and including) 12 concurrent processes running.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;UltraSPARC T2 vs i7 980X&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We'll wrap up with a nice graph showing result cache performance on both UltraSPARC T2 and Core i7 980X:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://lh5.ggpht.com/_D2NCa9-CEuQ/TBBFkX67qUI/AAAAAAAAAoc/4INLSO4dFkI/t2_vs_980x.png"&gt;&lt;/img&gt;&lt;br /&gt;&lt;br /&gt;i7 980X starts almost where 12 UltraSPARC T2 processes ends. Would T2 be able to narrow the gap with more parallel threads? I'll certainly find out.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There is an enormous improvement when it comes to Result Cache scalability in 11GR2. Still it's slower than if we had shared mode gets (or multiple child latches or, even better, both) but it gets very, very close.</description><link>http://afatkulin.blogspot.com/2010/06/11gr2-result-cache-scalability.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://lh5.ggpht.com/_D2NCa9-CEuQ/TBBFkX67qUI/AAAAAAAAAoc/4INLSO4dFkI/s72-c/t2_vs_980x.png" height="72" width="72" /><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-15440695.post-16644232208784654</guid><pubDate>Mon, 07 Jun 2010 21:47:00 +0000</pubDate><atom:updated>2010-06-07T21:21:27.224-04:00</atom:updated><title>Row cache objects latch contention</title><description>A data loading process was running on UltraSPARC T2 CPU. To take advantage of the platform architecture (or, I'd rather say, to avoid it's limitations) the loading process has been design to run with a massive amount of parallel query slaves in order to extract the maximum output from CMT architecture.&lt;br /&gt;&lt;br /&gt;Every time this data loading process executed, it experienced strange slowdowns on seemingly random points in time. Performance drops were quite substantial, which prompted to do an additional investigation. Upon a closer examination of ASH data, it turned out that all slowdowns were due to &lt;span style="font-style:italic;"&gt;latch: row cache objects&lt;/span&gt; contention.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Row cache objects&lt;/span&gt; latch protects the dictionary cache. The first thing was to figure out whether most of the contention was contributed by a particular &lt;span style="font-style:italic;"&gt;row cache objects&lt;/span&gt; child latch:&lt;br /&gt;&lt;pre&gt;SQL&gt; select latch#, child#, sleeps&lt;br /&gt;  2   from v$latch_children&lt;br /&gt;  3   where name='row cache objects'&lt;br /&gt;  4    and sleeps &gt; 0&lt;br /&gt;  5   order by sleeps desc;&lt;br /&gt; &lt;br /&gt;    LATCH#     CHILD#     SLEEPS&lt;br /&gt;---------- ---------- ----------&lt;br /&gt;       270          1   24241645&lt;br /&gt;       270          5        523&lt;br /&gt;       270          4         52&lt;/pre&gt;The first child certainly doesn't look good when we take the amount of sleeps experienced by it, compared to all the other child latches. Once we have the troublesome child latch identified, we can move on and see which type of dictionary cache it protects:&lt;br /&gt;&lt;pre&gt;SQL&gt; select distinct s.kqrstcln latch#,r.cache#,r.parameter name,r.type,r.subordinate#&lt;br /&gt;from v$rowcache r,x$kqrst s&lt;br /&gt;where r.cache#=s.kqrstcid&lt;br /&gt;order by 1,4,5;  2    3    4&lt;br /&gt;&lt;br /&gt; LATCH# CACHE# NAME                              TYPE        SUBORDINATE#&lt;br /&gt;------- ------ --------------------------------- ----------- ------------&lt;br /&gt;      1      3 dc_rollback_segments              PARENT&lt;br /&gt;      2      1 dc_free_extents                   PARENT&lt;br /&gt;      3      4 dc_used_extents                   PARENT&lt;br /&gt;      4      2 dc_segments                       PARENT&lt;br /&gt;      5      0 dc_tablespaces                    PARENT&lt;br /&gt;      6      5 dc_tablespace_quotas              PARENT&lt;br /&gt;      7      6 dc_files                          PARENT&lt;br /&gt;      8     10 dc_users                          PARENT&lt;br /&gt;      8      7 dc_users                          SUBORDINATE            0&lt;br /&gt;      8      7 dc_users                          SUBORDINATE            1&lt;br /&gt;      8      7 dc_users                          SUBORDINATE            2&lt;br /&gt;      9      8 dc_objects                        PARENT&lt;br /&gt;      9      8 dc_object_grants                  SUBORDINATE            0&lt;br /&gt;     10     17 dc_global_oids                    PARENT&lt;br /&gt;     11     12 dc_constraints                    PARENT&lt;br /&gt;     12     13 dc_sequences                      PARENT&lt;br /&gt;     13     16 dc_histogram_defs                 PARENT&lt;br /&gt;     13     16 dc_histogram_data                 SUBORDINATE            0&lt;br /&gt;     13     16 dc_histogram_data                 SUBORDINATE            1&lt;br /&gt;     14     32 kqlsubheap_object                 PARENT&lt;br /&gt;     15     19 dc_table_scns                     PARENT&lt;br /&gt;     15     19 dc_partition_scns                 SUBORDINATE            0&lt;br /&gt;     16     18 dc_outlines                       PARENT&lt;br /&gt;     17     14 dc_profiles                       PARENT&lt;br /&gt;     18     47 realm cache                       PARENT&lt;br /&gt;     18     47 realm auth                        SUBORDINATE            0&lt;br /&gt;     19     48 Command rule cache                PARENT&lt;br /&gt;     20     49 Realm Object cache                PARENT&lt;br /&gt;     20     49 Realm Subordinate Cache           SUBORDINATE            0&lt;br /&gt;     21     46 Rule Set Cache                    PARENT&lt;br /&gt;     22     34 extensible security user and rol  PARENT&lt;br /&gt;     23     35 extensible security principal pa  PARENT&lt;br /&gt;     24     37 extensible security UID to princ  PARENT&lt;br /&gt;     25     36 extensible security principal na  PARENT&lt;br /&gt;     26     33 extensible security principal ne  PARENT&lt;br /&gt;     27     38 XS security class privilege       PARENT&lt;br /&gt;     28     39 extensible security midtier cach  PARENT&lt;br /&gt;     29     44 event map                         PARENT&lt;br /&gt;     30     45 format                            PARENT&lt;br /&gt;     31     43 audit collector                   PARENT&lt;br /&gt;     32     15 global database name              PARENT&lt;br /&gt;     33     20 rule_info                         PARENT&lt;br /&gt;     34     21 rule_or_piece                     PARENT&lt;br /&gt;     34     21 rule_fast_operators               SUBORDINATE            0&lt;br /&gt;     35     23 dc_qmc_ldap_cache_entries         PARENT&lt;br /&gt;     36     52 qmc_app_cache_entries             PARENT&lt;br /&gt;     37     53 qmc_app_cache_entries             PARENT&lt;br /&gt;     38     27 qmtmrcin_cache_entries            PARENT&lt;br /&gt;     39     28 qmtmrctn_cache_entries            PARENT&lt;br /&gt;     40     29 qmtmrcip_cache_entries            PARENT&lt;br /&gt;     41     30 qmtmrctp_cache_entries            PARENT&lt;br /&gt;     42     31 qmtmrciq_cache_entries            PARENT&lt;br /&gt;     43     26 qmtmrctq_cache_entries            PARENT&lt;br /&gt;     44      9 qmrc_cache_entries                PARENT&lt;br /&gt;     45     50 qmemod_cache_entries              PARENT&lt;br /&gt;     46     24 outstanding_alerts                PARENT&lt;br /&gt;     47     22 dc_awr_control                    PARENT&lt;br /&gt;     48     25 SMO rowcache                      PARENT&lt;br /&gt;     49     40 sch_lj_objs                       PARENT&lt;br /&gt;     50     41 sch_lj_oids                       PARENT&lt;br /&gt;&lt;br /&gt;60 rows selected.&lt;/pre&gt;The first child protects &lt;span style="font-style:italic;"&gt;dc_rollback_segments&lt;/span&gt;. We can confirm it by referencing data in &lt;span style="font-style:italic;"&gt;v$rowcache&lt;/span&gt;:&lt;br /&gt;&lt;pre&gt;SQL&gt; select parameter, gets&lt;br /&gt;  2   from v$rowcache&lt;br /&gt;  3   order by gets desc;&lt;br /&gt; &lt;br /&gt;PARAMETER                              GETS&lt;br /&gt;-------------------------------- ----------&lt;br /&gt;dc_rollback_segments              310995555&lt;br /&gt;dc_tablespaces                     76251831&lt;br /&gt;dc_segments                         3912096&lt;br /&gt;dc_users                            2307601&lt;br /&gt;dc_objects                          1460725&lt;br /&gt;dc_users                             608659&lt;br /&gt;dc_histogram_defs                    250666&lt;br /&gt;global database name                  67475&lt;br /&gt;dc_histogram_data                     43098&lt;br /&gt;dc_histogram_data                     14364&lt;br /&gt;dc_global_oids                        14320&lt;br /&gt;outstanding_alerts                     2956&lt;br /&gt;dc_profiles                            2555&lt;br /&gt;dc_awr_control                         1925&lt;br /&gt;dc_object_grants                        745&lt;br /&gt;dc_files                                532&lt;br /&gt;dc_constraints                          201&lt;br /&gt;sch_lj_oids                             158&lt;br /&gt;dc_sequences                            156&lt;br /&gt;dc_table_scns                            20&lt;br /&gt;sch_lj_objs                              18&lt;br /&gt;dc_qmc_ldap_cache_entries                 0&lt;br /&gt;qmc_app_cache_entries                     0&lt;br /&gt;qmc_app_cache_entries                     0&lt;br /&gt;qmtmrcin_cache_entries                    0&lt;br /&gt;qmtmrctn_cache_entries                    0&lt;br /&gt;qmtmrcip_cache_entries                    0&lt;br /&gt;qmtmrctp_cache_entries                    0&lt;br /&gt;qmtmrciq_cache_entries                    0&lt;br /&gt;qmtmrctq_cache_entries                    0&lt;br /&gt;qmrc_cache_entries                        0&lt;br /&gt;qmemod_cache_entries                      0&lt;br /&gt;SMO rowcache                              0&lt;br /&gt;dc_users                                  0&lt;br /&gt;dc_partition_scns                         0&lt;br /&gt;dc_users                                  0&lt;br /&gt;realm auth                                0&lt;br /&gt;Realm Subordinate Cache                   0&lt;br /&gt;rule_or_piece                             0&lt;br /&gt;rule_info                                 0&lt;br /&gt;audit collector                           0&lt;br /&gt;format                                    0&lt;br /&gt;event map                                 0&lt;br /&gt;extensible security midtier cach          0&lt;br /&gt;XS security class privilege               0&lt;br /&gt;extensible security principal ne          0&lt;br /&gt;extensible security principal na          0&lt;br /&gt;extensible security UID to princ          0&lt;br /&gt;extensible security principal pa          0&lt;br /&gt;extensible security user and rol          0&lt;br /&gt;Rule Set Cache                            0&lt;br /&gt;Realm Object cache                        0&lt;br /&gt;Command rule cache                        0&lt;br /&gt;realm cache                               0&lt;br /&gt;dc_outlines                               0&lt;br /&gt;kqlsubheap_object                         0&lt;br /&gt;dc_tablespace_quotas                      0&lt;br /&gt;dc_used_extents                           0&lt;br /&gt;rule_fast_operators                       0&lt;br /&gt;dc_free_extents                           0&lt;br /&gt; &lt;br /&gt;60 rows selected&lt;/pre&gt;The next step is to see whether latch miss source can give us some more hints regarding the issue:&lt;br /&gt;&lt;pre&gt;SQL&gt; select "WHERE", sleep_count, location&lt;br /&gt;  2   from v$latch_misses&lt;br /&gt;  3   where parent_name='row cache objects'&lt;br /&gt;  4    and sleep_count &gt; 0;&lt;br /&gt; &lt;br /&gt;WHERE               SLEEP_COUNT LOCATION&lt;br /&gt;------------------- ----------- ------------------------------&lt;br /&gt;kqrpre: find obj       20612167 kqrpre: find obj&lt;br /&gt;kqrpup                        7 kqrpup&lt;br /&gt;kqrcmt: while loop            1 kqrcmt: while loop&lt;br /&gt;kqrcmt: clear flag            1 kqrcmt: clear flag&lt;br /&gt;kqreqd                  1026837 kqreqd&lt;br /&gt;kqreqd: reget           2602576 kqreqd: reget&lt;br /&gt; &lt;br /&gt;6 rows selected&lt;/pre&gt;Now if you take &lt;span style="font-style:italic;"&gt;kqrpre: find obj&lt;/span&gt; and plug it into a search on My Oracle Support you'll quickly yield &lt;span style="font-style:italic;"&gt;Bug 5749075  High Requests on dc_rollback_segments&lt;/span&gt;. Among other things, this note points out at the unusually high number of undo segments being created due to cleanup not able to work properly...&lt;br /&gt;&lt;pre&gt;SQL&gt; select count(*) from dba_rollback_segs;&lt;br /&gt; &lt;br /&gt;  COUNT(*)&lt;br /&gt;----------&lt;br /&gt;     14838&lt;/pre&gt;...and this seems to be the case. The only difference is that the issue has been observed on 11GR2 and the bug has been filled against the older versions. Though it was still worth checking in case we were seeing a regression. Indeed, after getting rid of that many undo segments by simply recreating the undo tablespace, the issue, thought not completely vanished, manifested itself a lot less making it's impact relatively insignificant to the process throughput.</description><link>http://afatkulin.blogspot.com/2010/06/row-cache-objects-latch-contention.html</link><author>noreply@blogger.com (Alex Fatkulin)</author><thr:total>5</thr:total></item></channel></rss>
