Martins Blog

Trying to explain complex things in simple terms

GTT and Smart Scan – the importance of the append hint

Posted by Martin Bach on December 5, 2016

While writing the previous two posts about GTTs and Smart Scan, I stumbled across an interesting observation. When I started putting my thoughts to (virtual) paper, I thought it was going to be a short post. Well, it wasn’t to be, again. Anyway, you might find this interesting.

Initial Situation

If you read the previous posts, this code example I used to populate the GTT might look familiar:

insert /*+ append */ into gtt select * from t4 where rownum < 400000;
 
commit;

In situations like this, where I’m moving data, I have developed a habit of using the append hint. I guess I’ve been working on HCC tables a bit too much and therefore didn’t even make a conscious decision to use that hint. It was the right thing to do, as you will see next.

The full test harness is shown here again for reference:

SQL> !cat test.sql
set lines 120 tab off trimspool on verify off timing on

-- this statement is using the /*+ append */ hint
insert /*+ append using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

SQL>

When executing the script and looking at the output I didn’t see anything out of the ordinary. When checking for the “depth” of the Smart Scan, I regularly consult the cell blocks processed by % layer statistics to see if I lose blocks between code layers. It wouldn’t appear so:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell IO uncompressed bytes                                             546,136,064
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
...
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               317
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    330,153,984
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1

66,667 blocks in each code layer. This looks ok to me: the total number of physical reads recorded for this statement is 66,669. I certainly don’t want to endorse tuning by ratio, but in this case it sort-of makes sense: the offload efficiency is pretty high in my case:

SQL> @fsx.sql
Enter value for sql_text:
Enter value for sql_id: a4jrkbnsfgk1j

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
a4jrkbnsfgk1j      0 2363333961      1        .01      0 Yes         100.00 select /* using_append */ count(*), id from gtt where id in (        7
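
fsx.sql is one of the well-known Exadata helper scripts (from the Expert Oracle Exadata script collection, if memory serves). In case you don’t have it at hand, I believe the IO_SAVED_% figure can be approximated from two v$sql columns along these lines – this is my own sketch, not the actual script:

select sql_id,
       io_cell_offload_eligible_bytes,
       io_interconnect_bytes,
       round(100 * (io_cell_offload_eligible_bytes - io_interconnect_bytes)
             / nullif(io_cell_offload_eligible_bytes, 0), 2) as io_saved_pct
  from v$sql
 where sql_id = 'a4jrkbnsfgk1j';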

That should suffice for a recap of the previous posts.

No append hint – different outcome

Now imagine my surprise when I didn’t execute a script but rather typed the insert command in an interactive session. In the next example, created to reproduce the observation, I “forgot” to specify the append hint:

SQL> !cat test2.sql
-- not using the append hint
insert /* not_using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* not_using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

The difference between test.sql and test2.sql looks harmless enough – just a hint, or lack thereof. However, the result is interesting. Most executions of the above script resulted in Oracle reporting the following statistics:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    calls to get snapshot scn: kcmgss                                              462
STAT    calls to kcmgcs                                                                 28
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks processed by cache layer                                        67,026
STAT    cell blocks processed by data layer                                              1
STAT    cell blocks processed by txn layer                                               1
STAT    cell commit cache queries                                                   67,025
STAT    cell flash cache read hits                                                     633
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    545,000,688
STAT    cell physical IO interconnect bytes returned by smart scan             544,984,304
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           66,503
STAT    cleanouts only - consistent read gets                                       66,503
STAT    commit txn count during cleanout                                            66,503
...
STAT    consistent gets                                                            133,250
STAT    consistent gets direct                                                      66,504
STAT    consistent gets examination                                                 66,506
STAT    consistent gets examination (fastpath)                                      66,506
...
STAT    immediate (CR) block cleanout applications                                  66,503
...
STAT    physical read IO requests                                                    1,044
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               633
STAT    physical read total IO requests                                              1,044
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       748
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504
...

Huh? I was clearly surprised to see so many blocks not making it to the higher layers (txn and data) after having been read by the cache layer. Further executions of the select statement within the same session do not necessarily improve this significantly, but I noticed savings by storage index coming into play.

Speaking more scientifically, the listing immediately above shows that out of 67,026 blocks that were read by the cache layer only 1 (!) was processed by txn and data layers. As a direct result Smart Scan does not offer any real benefit: cell physical IO interconnect bytes returned by smart scan is actually ≥ cell physical IO bytes eligible for predicate offload.
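
If you don’t want to run the full mystats report, the two relevant counters can be eyeballed straight from v$mystat – just bear in mind that the values are cumulative for the session, so you need to compare them before and after the query. A quick sketch:

select sn.name, ms.value
  from v$mystat ms, v$statname sn
 where sn.statistic# = ms.statistic#
   and sn.name in ('cell physical IO bytes eligible for predicate offload',
                   'cell physical IO interconnect bytes returned by smart scan');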

Why? What?

There were a few red herrings in the statistic counters I saw:

  • Cleanout statistics have been reported by mystats
    • commit txn count during cleanout
    • cleanouts only – consistent read gets
    • immediate (CR) block cleanout applications
    • etc.
  • There are consistent gets examination, which might be peeks at undo information
  • Lots of queries to the commit cache (cell commit cache queries) – but for most of my tests these didn’t produce any hits. Only 1 out of maybe 20 tests recorded some cell transactions found in commit cache, and even then their number was not significant. I will provide an example towards the end of this post.

Further investigation

So this leads me to suspect that the blocks created with the append hint are “clean”, i.e. suitable for a consistent read on the cells without any further need for processing. It would appear so, judging by the block dumps below (make sure to dump the block from the tempfile, not the datafile!).
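
As an aside, in case you want to reproduce the dumps: since GTT blocks live in the temporary tablespace the dump has to come from the tempfile, and from memory the command looks roughly like this (file and block numbers are the ones from my trace and obviously need adapting):

alter system dump tempfile 1 block min 2305025 block max 2305025;

The dump ends up in the session’s trace file. Here is the dump of a block populated with the append hint: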

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aaeb9f seq: 0x01 flg: 0x0c tail: 0xeb9f0601
frmt: 0x02 chkval: 0xd11e type: 0x06=trans data
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x632c00  csc: 0x00.64aaeb9e  itc: 3  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.019.0012beca  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f6250501074
===============
tsiz: 0x1f88
hsiz: 0x1e
pbl: 0x7f6250501074
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x746
avsp=0x728
tosp=0x728
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0x1b7d
0x14:pri[1]     offs=0x1772
0x16:pri[2]     offs=0x1367
0x18:pri[3]     offs=0xf5c
0x1a:pri[4]     offs=0xb51
0x1c:pri[5]     offs=0x746
block_row_dump:
tab 0, row 0, @0x1b7d
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
col  0: [ 2]  c1 50
col  1: [999]
 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...

This is in contrast to the blocks I dumped that have been created without the append hint. To make sure I’m comparing apples with apples these blocks have been created in a different session after having disconnected from the previous one:

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aab6bf seq: 0x00 flg: 0x0c tail: 0xb6bf0600
frmt: 0x02 chkval: 0x478e type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x629300  csc: 0x00.64aab6bf  itc: 2  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.01b.0012be5d  0x00c04427.d97f.27  ----    6  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f53b75e125c
===============
tsiz: 0x1fa0
hsiz: 0x1e
pbl: 0x7f53b75e125c
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x752
avsp=0x734
tosp=0x734
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0xf6c
0x14:pri[1]     offs=0x1379
0x16:pri[2]     offs=0x1786
0x18:pri[3]     offs=0x1b93
0x1a:pri[4]     offs=0x752
0x1c:pri[5]     offs=0xb5f
block_row_dump:
tab 0, row 0, @0xf6c
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
col  0: [ 4]  c3 0f 05 50
col  1: [999]
 31 34 30 34 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...

This supports the theory that the block I created without the append hint contains information that prevents it from being passed from the cache layer to the txn layer: its ITL shows 6 rows locked (Lck 6) with a pointer into undo, and each row carries a lock byte (lb: 0x1), whereas the append-created block has a clean ITL and lb: 0x0. I didn’t see active txn count during cleanout though, but commit txn count during cleanout and immediate (CR) block cleanout applications instead. So cleanout is definitely needed, and performed.

I stated in the last bullet point above that the commit cache queries were not producing cache hits most of the time. I did manage to get one execution of the select statement against the blocks created without the append hint where at least some commit cache queries produced hits:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks helped by commit cache                                           6,981
STAT    cell blocks processed by cache layer                                        66,969
STAT    cell blocks processed by data layer                                          6,982
STAT    cell blocks processed by txn layer                                           6,982
STAT    cell commit cache queries                                                   66,968
STAT    cell flash cache read hits                                                     576
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    487,801,976
STAT    cell physical IO interconnect bytes returned by smart scan             487,785,592
STAT    cell scans                                                                       1
STAT    cell transactions found in commit cache                                      6,981
STAT    cleanout - number of ktugct calls                                           59,522
STAT    cleanouts only - consistent read gets                                       59,522
STAT    commit txn count during cleanout                                            59,522
...
STAT    physical read IO requests                                                      987
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               576
STAT    physical read total IO requests                                                987
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       723
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504

I couldn’t reproduce this at will though, and it wasn’t for lack of trying.

Summary

I am not entirely sure what causes the observed behaviour, especially with regard to the commit cache and/or the minscn optimisation, but it clearly seems better to populate a GTT with the append hint if you want to make best use of Smart Scans against such segments. A very quick and dirty test against a heap table suggests that this is true for that segment type as well.
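
For reference, the quick and dirty heap table check mentioned above looked roughly like this – the table name is made up for this sketch, and the statistics comparison works exactly as in test.sql and test2.sql:

create table heapt2 as select * from t4 where 1 = 0;

-- populate once without the hint, and once with /*+ append */ into a fresh table
insert /* not_using_append */ into heapt2 select * from t4 where rownum < 400000;

commit;

-- then run the test query and compare the "cell blocks processed by ... layer" counters
select count(*), id from heapt2 where id in (&min_id, &max_id) group by id;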

Posted in 12c Release 1, Exadata | Leave a Comment »

Smart Scanning GTTs – what about Flash Cache?

Posted by Martin Bach on November 29, 2016

Continuing the example of the previous blog post (is it possible to Smart Scan GTTs?), I am investigating if Smart Scans against GTTs can benefit from Flash Cache, or “optimised reads”. The easiest way to see this in your session is to calculate the change in session counters based on v$mystat or v$sesstat. Other tools might also provide information about the amount of data read from Flash Cache, but there are quite often extra licenses required …

Calculating the delta in session counters can be done in many ways, for example using Tanel’s session snapper. For clearly defined test cases where I can control beginning and end of the execution I prefer to use another great script for the purpose. My thanks go to Adrian Billington for providing the Oracle community with the most useful “mystats” package which I’m using here. You can – and should – get it from oracle-developer.net.

Let’s check whether I can get flash cache read hits on GTT scans, which would help the performance of repeated queries against the segment. And since GTTs reside on TEMP, I had a few doubts whether flash cache read hits were possible at all.

What better than to test?

Testing helps remove ambiguity (and documenting the test result helps me remember things!), so here’s the code used:

set lines 120 tab off trimspool on verify off timing on

-- it will be crucially important to use the /*+ append */ hint 
-- as you will see in the next post about GTTs on Exadata
insert /*+ append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- hoping to trigger flash cache population here
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;

-- and measuring effect, if any

@mystats start s=s

set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off

@mystats stop t=1

Have a look at the code in mystats.sql to see what the options mean. Essentially I’m asking it to record session statistics only (start s=s) and after the execution list only those counters that have changed (t=1 where t is for threshold). I hacked the script a little to order by statistic name rather than the default.

The observation

It appears I can get Flash Cache hits. The relevant statistic name is “cell flash cache read hits”, and the fact that physical IO was optimised can also be seen in “physical read requests optimized” and “physical read total bytes optimized” amongst others. Here is the relevant output from mystats:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
STAT    cell flash cache read hits                                                     315
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
STAT    cell scans                                                                       1
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               315
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    329,252,864
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...

I usually check the cell blocks processed by % layer statistics first to get an idea about the “depth” of the Smart Scan – losing blocks in the transaction and data layers wouldn’t be great. There’s been 1 Smart Scan against my segment (the GTT isn’t partitioned) recorded in “cell scans”, so that covers that aspect as well.

Out of the 523 “physical read IO requests”, 315 were optimised, so that’s a pretty good ratio of approximately 60% of the I/O requests coming from Flash Cache. The other statistics represent the same ratio in bytes rather than I/O requests.
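
If you prefer to check this per cursor rather than per session, v$sql records optimised read requests as well. A little sketch – substitute the sql_id of your own statement:

select sql_id,
       physical_read_requests,
       optimized_phy_read_requests,
       round(100 * optimized_phy_read_requests / nullif(physical_read_requests, 0), 1) as optimized_pct
  from v$sql
 where sql_id = '&sql_id';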

Posted in 12c Release 1, Exadata | Leave a Comment »

Little things worth knowing: Can I Smart Scan a Global Temporary Table?

Posted by Martin Bach on November 25, 2016

A question that has come up while preparing to deliver another Exadata Optimisation Class is whether you can offload queries against Global Temporary Tables (GTT) or not. There are subtle differences between heap tables and GTTs such as how statistics are handled and where they are stored that merit a little investigation.

I have used the lab environment on the X3-2 (Exadata 12.1.2.3.0, 12.1.0.2.160419 RDBMS) to test if I can Smart Scan GTTs. There are some enhancements in 12c regarding GTTs that are mentioned on the Optimiser blog here:

https://blogs.oracle.com/optimizer/entry/gtts_and_upgrading_to_oracle

The Setup

I have used a brand new, 12c non-CDB for these tests. The SGA_TARGET is deliberately set to something really small so as to provoke Smart Scans without having to resort to setting _serial_direct_read to “always” in my session.

First of all, I needed a GTT for this purpose. I have a few scripts that create large tables for testing, and T4 will serve as my basis. Here is my setup:

SQL> create global temporary table gtt on commit preserve rows 
  2  as select * from t4 where 1 = 0;

Table created.

SQL> alter session enable parallel dml;

Session altered.

SQL> insert /*+ parallel(4) */ into gtt
  2  select /*+ parallel(4) */ * from t4
  3    where rownum < 400000;

399999 rows created.

SQL> commit

Commit complete.

This might be the time to point out that the GTT has two types of statistics in 12c – shared and session. Again, please refer to the Optimiser blog post for more information about private session statistics. I first looked at dba_tables for num_rows etc. but found the column empty. Using a query from Tim Hall’s “Session-Private Statistics for Global Temporary Tables in Oracle Database 12c Release 1” article on oracle-base.com I did find some statistics in the end:

SQL> select table_name, object_type, num_rows, scope
  2  from user_tab_statistics where table_name = 'GTT';

TABLE_NAME                     OBJECT_TYPE    NUM_ROWS SCOPE
------------------------------ ------------ ---------- -------
GTT                            TABLE                   SHARED
GTT                            TABLE            399999 SESSION

Gathering session-level statistics is the default behaviour, controlled by a global DBMS_STATS preference:

SQL> SELECT DBMS_STATS.get_prefs('GLOBAL_TEMP_TABLE_STATS') FROM dual;

DBMS_STATS.GET_PREFS('GLOBAL_TEMP_TABLE_STATS')
--------------------------------------------------
SESSION
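
Should you prefer shared statistics – visible to all sessions – the preference can be changed globally or for an individual table. A minimal sketch, not something I needed for this test:

exec dbms_stats.set_global_prefs('GLOBAL_TEMP_TABLE_STATS', 'SHARED')

-- or limited to this particular GTT
exec dbms_stats.set_table_prefs(user, 'GTT', 'GLOBAL_TEMP_TABLE_STATS', 'SHARED')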

The Testcase

With the table set up it is now time to test the Smart Scan. In addition to a SQL trace I request an NSMTIO trace to understand Oracle’s direct path read decision better.

SQL> alter session set events 'sql_trace level 8';

Session altered.

SQL> alter session set events 'trace[NSMTIO] disk=highest';

Session altered.

SQL> select count(*), id from gtt where id in (9244048,2529293) group by id;

  COUNT(*)         ID
---------- ----------
         1    2529293
         1    9244048

SQL> exit

Looking at the trace file I can see the following output for this particular statement:

=====================
PARSING IN CURSOR #140137295360240 len=70 dep=0 uid=78 oct=3 lid=78 tim=1228663420381 hv=2599214644 ad='1398f2ab0' sqlid='9p960vqdftrjn'
select count(*), id from gtt where id in (9244048,2529293) group by id
END OF STMT
PARSE #140137295360240:c=2000,e=1954,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2363333961,tim=1228663420380
EXEC #140137295360240:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2363333961,tim=1228663420505
WAIT #140137295360240: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=163899 tim=1228663420582
NSMTIO: kcbism: islarge 1 next 0 nblks 67197 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67197 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67197 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 3, objd: 6468608, objn: 163899
ckpt: 1, nblks: 67197, ntcache: 3, ntdist:3
Direct Path for pdb 0 tsn 3  objd 6468608 objn 163899
Direct Path 1 ckpt 1, nblks 67197 ntcache 3 ntdist 3
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 101
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 6468608, Object_Size = 67197 blocks
SqlId = 9p960vqdftrjn, plan_hash_value = 2363333961, Partition# = 0
WAIT #140137295360240: nam='reliable message' ela= 1053 channel context=5884629416 channel handle=5884574224 broadcast message=5885746648 obj#=163899 tim=1228663421984
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 122 name|mode=1380909062 2=65585 0=1 obj#=163899 tim=1228663422173
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 96 name|mode=1380909057 2=65585 0=2 obj#=163899 tim=1228663422323
WAIT #140137295360240: nam='cell single block physical read' ela= 308 cellhash#=2133459483 diskhash#=2964949887 bytes=8192 obj#=163899 tim=1228663423031
WAIT #140137295360240: nam='cell smart table scan' ela= 153 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663423651
WAIT #140137295360240: nam='cell smart table scan' ela= 152 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663424114
WAIT #140137295360240: nam='cell smart table scan' ela= 145 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663424571
WAIT #140137295360240: nam='gc cr grant 2-way' ela= 135 p1=201 p2=2273408 p3=7 obj#=163899 tim=1228663425857
WAIT #140137295360240: nam='cell single block physical read' ela= 257 cellhash#=379339958 diskhash#=2689692736 bytes=8192 obj#=163899 tim=1228663426158
WAIT #140137295360240: nam='cell single block physical read' ela= 224 cellhash#=2133459483 diskhash#=3132070477 bytes=8192 obj#=163899 tim=1228663426466
WAIT #140137295360240: nam='cell smart table scan' ela= 251 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663426863
WAIT #140137295360240: nam='cell smart table scan' ela= 268 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427213
WAIT #140137295360240: nam='cell smart table scan' ela= 225 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663427514
WAIT #140137295360240: nam='cell smart table scan' ela= 85 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663427720
WAIT #140137295360240: nam='cell smart table scan' ela= 13 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427755
...

There is a simple conclusion: yes, there is a Smart Scan. I can’t see a massive difference between the NSMTIO trace for a GTT and the one for a heap table. Since direct path read decisions are statistics driven (check the parameter _direct_read_decision_statistics_driven) and there weren’t any shared statistics on the GTT, I assume the single block reads are related to reading object statistics. OBJ# 163899 is the object_id of the GTT.
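
In case you want to check the value of that underscore parameter yourself, the usual X$ query works – you need to be connected as SYSDBA, and the standard disclaimer about hidden parameters applies:

select p.ksppinm name, v.ksppstvl value
  from x$ksppi p, x$ksppcv v
 where p.indx = v.indx
   and p.ksppinm = '_direct_read_decision_statistics_driven';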

What is interesting is the use of “enq: RO – fast object reuse” prior to the start of the Smart Scan. This surprised me a little – I wasn’t expecting it. Here’s the same trace combination for a heap table for comparison:

=====================
PARSING IN CURSOR #140120190960712 len=91 dep=0 uid=78 oct=3 lid=78 tim=1226766583470 hv=2817388421 ad='13483d3f0' sqlid='g8948rkmyvvw5'
select /* heaptest001 */ count(*), id from heapt1 where id in (6460386, 6460390) group by id
END OF STMT
PARSE #140120190960712:c=1000,e=1925,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3353166567,tim=1226766583470
EXEC #140120190960712:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3353166567,tim=1226766583599
WAIT #140120190960712: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=163896 tim=1226766583653
NSMTIO: kcbism: islarge 1 next 0 nblks 67024 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67024 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67024 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 9, objd: 163896, objn: 163896
ckpt: 1, nblks: 67024, ntcache: 482, ntdist:482
Direct Path for pdb 0 tsn 9  objd 163896 objn 163896
Direct Path 1 ckpt 1, nblks 67024 ntcache 482 ntdist 482
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 109
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 163896, Object_Size = 67024 blocks
SqlId = g8948rkmyvvw5, plan_hash_value = 3353166567, Partition# = 0
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 351 name|mode=1263468550 2=65585 0=2 obj#=163896 tim=1226766584359
WAIT #140120190960712: nam='reliable message' ela= 1372 channel context=5884629416 channel handle=5252488952 broadcast message=5885771824 obj#=163896 tim=1226766585825
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 88 name|mode=1263468550 2=65585 0=1 obj#=163896 tim=1226766585972
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 90 name|mode=1263468545 2=65585 0=2 obj#=163896 tim=1226766586140
WAIT #140120190960712: nam='cell smart table scan' ela= 201 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766587377
WAIT #140120190960712: nam='cell smart table scan' ela= 137 cellhash#=379339958 p2=0 p3=0 obj#=163896 tim=1226766589330
WAIT #140120190960712: nam='cell smart table scan' ela= 127 cellhash#=2133459483 p2=0 p3=0 obj#=163896 tim=1226766593585
WAIT #140120190960712: nam='cell smart table scan' ela= 215 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766602986

This is the more familiar picture: reliable message, enq: KO – fast object checkpoint followed by cell smart table scan.

The Verdict

Quite simple this time: yes, you can use Smart Scans on GTTs. Additionally there are a few caveats around the use of GTTs when you are migrating your database to 12.1, but you can read about them in the Optimiser blog post linked above.

Posted in 12c Release 1, Exadata | 1 Comment »

Building an RPM for the Oracle database on Oracle Linux 7

Posted by Martin Bach on October 28, 2016

Thinking about automation a lot, especially in the context of cloud computing, I have decided to create a small series of related posts that hopefully help someone deploying environments in a fully automated way. As my colleague @fritshoogland has said many times: the only way to deploy database servers (or any other server for that matter) in a consistent way is to do it via software. No matter how hard you try to set up 10 identical systems manually, there will be some, maybe ever so subtle, differences between them. And with the cloud you probably have 42 systems or more to deal with! In this context, my first post could be a building block: the provisioning of the Oracle RDBMS in the form of an RPM.

The idea

In a nutshell, I would like to

  • Create a “golden image” of my database installation
  • Make this available for deployment via NFS
  • Create an RPM that allows me to deploy the “golden image” on the server I want to install and call clone.pl to complete the process
  • Execute the root scripts (orainstRoot.sh, root.sh)

This is actually quite easy to do! The oracle-validated RPM, and nowadays the oracle-rdbms-server-12cR1-preinstall.x86_64 package, simplify the task greatly as they can pull in dependent packages, set sensible default parameters and generally perform a lot of the tedious setup work for you. All I have to do in my RPM is to enforce that the system runs Oracle Linux 7 and to pull the preinstall RPM in if it is not yet installed. Since there is no 12c RDBMS server for i686 any more, this entire post is implicitly about Linux x86-64.

The Implementation

I am using a development system (linuxdev in this example) to build the RPM. It’s an Oracle Linux 7.2 system with all upgrades up to 24/10/2016. The first thing to do is to install the RPM dev tools and their dependencies. As always, you use yum to do so.

[root@linuxdev ~]# yum info rpmdevtools
Loaded plugins: ulninfo
Available Packages
Name        : rpmdevtools
Arch        : noarch
Version     : 8.3
Release     : 5.0.1.el7
Size        : 96 k
Repo        : ol7_latest/x86_64
Summary     : RPM Development Tools
URL         : https://fedorahosted.org/rpmdevtools/
Licence     : GPLv2+ and GPLv2
Description : This package contains scripts and (X)Emacs support files to aid in
            : development of RPM packages.
            : rpmdev-setuptree    Create RPM build tree within user's home directory
            : rpmdev-diff         Diff contents of two archives
            : rpmdev-newspec      Creates new .spec from template
            : rpmdev-rmdevelrpms  Find (and optionally remove) "development" RPMs
            : rpmdev-checksig     Check package signatures using alternate RPM keyring
            : rpminfo             Print information about executables and libraries
            : rpmdev-md5/sha*     Display checksums of all files in an archive file
            : rpmdev-vercmp       RPM version comparison checker
            : spectool            Expand and download sources and patches in specfiles
            : rpmdev-wipetree     Erase all files within dirs created by rpmdev-setuptree
            : rpmdev-extract      Extract various archives, "tar xvf" style
            : rpmdev-bumpspec     Bump revision in specfile
            : ...and many more.

[root@linuxdev ~]# yum install rpmdevtools

...

The development tools contain a lot of useful scripts, one of which I consider essential: rpmdev-setuptree. It sets up the relevant directory structure. Granted, the rpmdev tools are geared towards compiling software from source, optionally applying patches, and then creating SRPMs and RPMs. RPM wasn’t designed for the purpose of packaging an Oracle database, but it can still be made to work. You can work around the size restriction by using the %post hook in the SPEC file, as you will see later in this post. But I’m getting ahead of myself.

First I’m creating the RPM tree structure:

[oracle@linuxdev ~]$ rpmdev-setuptree
[oracle@linuxdev ~]$ ls -l rpmbuild/
total 0
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 BUILD
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 RPMS
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 SOURCES
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 SPECS
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 SRPMS
[oracle@linuxdev ~]$

Using another tool named rpmdev-newspec I create a new, empty SPEC file:

[oracle@linuxdev SPECS]$ rpmdev-newspec oracle_12102_jul16_psu_ojvm.spec
oracle_12102_jul16_psu_ojvm.spec created; type minimal, rpm version >= 4.11.
[oracle@linuxdev SPECS]$ ls -lrt
total 4
-rw-r--r-- 1 oracle oinstall 338 Oct 24 10:53 oracle_12102_jul16_psu_ojvm.spec
[oracle@linuxdev SPECS]$

Completing the SPEC file is easy if you know how to :) To save you the pain I put a stub together that is far from perfect, but it should give you an idea. This is just a demo; in particular the code lacks all the (error, space, etc.) checking that really should be in place.

[oracle@linuxdev SPECS]$ cat oracle_12102_jul16_psu_ojvm.spec
Name:           oracle_12102_jul16_psu_ojvm
Version:        0
Release:        1%{?dist}
Summary:        a spec file for Oracle 12.1.0.2 with patch 23615289 (combo for 23054246 and 23177536)
Group:          Application/Databases
License:        Commercial
URL:            https://martincarstenbach.wordpress.com/
Requires:       oraclelinux-release >= 7:7.0, oracle-rdbms-server-12cR1-preinstall

%description
Oracle 12.1.0.2 EE patched with the July 2016 PSU and OJVM patch
Requires Oracle Linux 7, depends on the preinstall RPM to set up users, groups and defaults

ONLY A PROTOTYPE, USED FOR DEMONSTRATION PURPOSES ONLY

%files
%defattr(-,root,root,-)

%post
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1

if [ -d "$ORACLE_HOME" ]; then
        /usr/bin/echo "ORACLE_HOME directory $ORACLE_HOME exists-aborting";
        exit 127
fi

/usr/bin/echo "cloning the oracle home now to $ORACLE_HOME"
/usr/bin/mkdir -p $ORACLE_HOME

/usr/bin/umount /mnt
/usr/bin/mount -t nfs linuxdev:/u01/nfs_export /mnt

/usr/bin/tar --gzip -xf /mnt/oracle/12.1.0.2/oracle_12102_jul16_psu_ojvm.tar.gz -C $ORACLE_HOME

/usr/bin/chown -R oracle:oinstall /u01

/usr/bin/su - oracle -c "cd $ORACLE_HOME/clone/bin && ./clone.pl ORACLE_HOME=$ORACLE_HOME ORACLE_BASE=/u01/app/oracle -defaultHomeName"

/u01/app/oraInventory/orainstRoot.sh
$ORACLE_HOME/root.sh

umount /mnt

%changelog
* Mon Oct 24 2016 Martin Bach 
- initial version

Again, at the risk of repeating myself, this is just a stub and needs to be extended to include error handling and other things that we consider good coding practice. You should also check whether there is sufficient space in the $ORACLE_HOME directory etc… you get the idea. Also, if you create another SPEC file for a second 12c installation you currently can’t install it into a new Oracle home, as the path is hard-coded to dbhome_1.

The script – which is the part in the %post section – is so simple it should be self-explanatory. It will be passed to /bin/sh and executed as part of the installation process.

With the SPEC file in place you can try and build the RPM. Funnily enough, the rpmbuild tool is used for that:

[oracle@linuxdev SPECS]$ rpmbuild -ba oracle_12102_jul16_psu_ojvm.spec
Processing files: oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64
Provides: oracle_12102_jul16_psu_ojvm = 0-1.el7 oracle_12102_jul16_psu_ojvm(x86-64) = 0-1.el7
Requires(interp): /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1
Requires(post): /bin/sh
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/oracle/rpmbuild/BUILDROOT/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64
Wrote: /home/oracle/rpmbuild/SRPMS/oracle_12102_jul16_psu_ojvm-0-1.el7.src.rpm
Wrote: /home/oracle/rpmbuild/RPMS/x86_64/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.4xSozZ
+ umask 022
+ cd /home/oracle/rpmbuild/BUILD
+ /usr/bin/rm -rf /home/oracle/rpmbuild/BUILDROOT/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64
+ exit 0

This command has created a Source RPM, and the RPM file to be used, as you can see here:

[oracle@linuxdev SPECS]$ ls -l ../RPMS/x86_64/
total 4
-rw-r--r-- 1 oracle oinstall 2832 Oct 26 16:24 oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
[oracle@linuxdev SPECS]$ rpm -qpil  --requires ../RPMS/x86_64/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
Name        : oracle_12102_jul16_psu_ojvm
Version     : 0
Release     : 1.el7
Architecture: x86_64
Install Date: (not installed)
Group       : Application/Databases
Size        : 0
License     : Commercial
Signature   : (none)
Source RPM  : oracle_12102_jul16_psu_ojvm-0-1.el7.src.rpm
Build Date  : Wed 26 Oct 2016 16:24:02 BST
Build Host  : linuxdev.example.com
Relocations : (not relocatable)
URL         : https://martincarstenbach.wordpress.com/
Summary     : a spec file for Oracle 12.1.0.2 with patch 23615289 (combo for 23054246 and 23177536)
Description :
Oracle 12.1.0.2 EE patched with the July 2016 PSU and OJVM patch
Requires Oracle Linux 7, depends on the preinstall RPM to set up users, groups and defaults

ONLY A PROTOTYPE, USED FOR DEMONSTRATION PURPOSES ONLY
oraclelinux-release >= 7:7.0
oracle-rdbms-server-12cR1-preinstall
/bin/sh
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadIsXz) <= 5.2-1
(contains no files)

You require a “gold image” binary copy for cloning as well. I am assuming it is in place and NFS-accessible in the location specified by the %post scriptlet.

You can refer to Frits Hoogland’s post for more detail about creating the gold image: https://fritshoogland.wordpress.com/2010/07/03/cloning-your-oracle-database-software-installation/. There are probably lots of other posts out there describing the same procedure.

Installation

Here is an example install session, executed as root on a new box I created: server9.

[root@server9 ~]# rpm -ihv /tmp/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm

Preparing...                          ################################# [100%]
Updating / installing...
   1:oracle_12102_jul16_psu_ojvm-0-1.e################################# [100%]
cloning the oracle home now to /u01/app/oracle/product/12.1.0.2/dbhome_1
./runInstaller -clone -waitForCompletion  "ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1" 
"ORACLE_BASE=/u01/app/oracle" -defaultHomeName  -silent -paramFile 
/u01/app/oracle/product/12.1.0.2/dbhome_1/clone/clone_oraparam.ini
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 500 MB.   Actual 9204 MB    Passed
Checking swap space: must be greater than 500 MB.   Actual 767 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2016-10-26_05-03-05PM. 
Please wait ...You can find the log of this install session at:
 /u01/app/oraInventory/logs/cloneActions2016-10-26_05-03-05PM.log
..................................................   5% Done.
..................................................   10% Done.
..................................................   15% Done.
..................................................   20% Done.
..................................................   25% Done.
..................................................   30% Done.
..................................................   35% Done.
..................................................   40% Done.
..................................................   45% Done.
..................................................   50% Done.
..................................................   55% Done.
..................................................   60% Done.
..................................................   65% Done.
..................................................   70% Done.
..................................................   75% Done.
..................................................   80% Done.
..................................................   85% Done.
..........
Copy files in progress.

Copy files successful.

Link binaries in progress.

Link binaries successful.

Setup files in progress.

Setup files successful.

Setup Inventory in progress.

Setup Inventory successful.

Finish Setup successful.
The cloning of OraHome1 was successful.
Please check '/u01/app/oraInventory/logs/cloneActions2016-10-26_05-03-05PM.log' for more details.

Setup Oracle Base in progress.

Setup Oracle Base successful.
..................................................   95% Done.

As a root user, execute the following script(s):
        1. /u01/app/oraInventory/orainstRoot.sh
        2. /u01/app/oracle/product/12.1.0.2/dbhome_1/root.sh



..................................................   100% Done.
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
Check /u01/app/oracle/product/12.1.0.2/dbhome_1/install/root_server9_2016-10-26_17-04-05.log 
for the output of root script

The “post” script that is executed spawns a shell (/bin/sh) and executes a script stored in /var/tmp. In my case these are the contents – pretty much exactly the %post section of the SPEC file:

[root@server9 ~]# cat /var/tmp/rpm-tmp.tZ4pYf
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1

if [ -d "$ORACLE_HOME" ]; then
        /usr/bin/echo "ORACLE_HOME directory $ORACLE_HOME exists-aborting";
        exit 127
fi

/usr/bin/echo "cloning the oracle home now to $ORACLE_HOME"
/usr/bin/mkdir -p $ORACLE_HOME

/usr/bin/umount /mnt
/usr/bin/mount -t nfs linuxdev:/u01/nfs_export /mnt

/usr/bin/tar --gzip -xf /mnt/oracle/12.1.0.2/oracle_12102_jul16_psu_ojvm.tar.gz -C $ORACLE_HOME

/usr/bin/chown -R oracle:oinstall /u01

/usr/bin/su - oracle -c "cd $ORACLE_HOME/clone/bin && ./clone.pl ORACLE_HOME=$ORACLE_HOME ORACLE_BASE=/u01/app/oracle -defaultHomeName"

/u01/app/oraInventory/orainstRoot.sh
$ORACLE_HOME/root.sh

umount /mnt
[root@server9 ~]#

And that’s it! One of the slightly more complex things was to ensure that this can be installed on Oracle Linux 7 only. One way to do this is to add the oraclelinux-release package as a dependency. I had to fiddle around with it a bit as it uses the epoch field (which seems rare), and not just the version. After I worked it out I managed to enforce the release. On my Oracle Linux 6 system the RPM could not be installed:

[root@linux6 ~]# rpm -ihv /tmp/oracle_12102_jul16_psu_ojvm-1-0.el7.x86_64.rpm
error: Failed dependencies:
	oraclelinux-release >= 7:7.0 is needed by oracle_12102_jul16_psu_ojvm-1-0.el7.x86_64

If you are now thinking of using the epoch:version to check for the oraclelinux-release on Oracle Linux 6, don’t. This is the current oraclelinux-release RPM taken from the “latest” channel in public-yum:

[oracle@linux6 ~]$ rpm -qpi --provides ./oraclelinux-release-6Server-8.0.3.x86_64.rpm
Name        : oraclelinux-release          Relocations: (not relocatable)
Version     : 6Server                           Vendor: Oracle America
Release     : 8.0.3                         Build Date: Wed May 11 11:51:57 2016
Install Date: (not installed)               Build Host: x86-ol6-builder-05.us.oracle.com
Group       : System Environment/Base       Source RPM: oraclelinux-release-6Server-8.0.3.src.rpm
Size        : 49976                            License: GPL
Signature   : RSA/8, Wed May 11 11:44:59 2016, Key ID 72f97b74ec551f03
Summary     : Oracle Linux 6 release file
Description :
System release and information files
config(oraclelinux-release) = 6:6Server-8.0.3
oraclelinux-release = 6:6Server-8.0.3
oraclelinux-release(x86-64) = 6:6Server-8.0.3

So if you wanted to allow a 12c installation on Oracle Linux 6 instead, you would probably replace “Requires: oraclelinux-release >= 7:7.0” with “Requires: oraclelinux-release >= 6:6Server-2.0” (Oracle Linux 6 update 2 or newer, according to the Linux installation guide).

Posted in 12c Release 1, Linux | 5 Comments »

Little things worth knowing: when a transient ASM disk failure cannot be fixed in time

Posted by Martin Bach on October 7, 2016

In the previous blog post I used libvirt and KVM in my lab environment to simulate a transient disk failure and to show how to recover from it. This post takes this example a step further: I am simulating another disk failure, but this time I won’t pretend I can fix the issue and put the disk back. In other words, I simulate the effect of the disk_repair_time hitting zero.

Most of what I am covering here is an extension of the previous post. I’ll mention the main details here for your benefit, but would like to invite you to refer back to the previous post for more detail.

The idea is to show you the output in the ASM alert.log and the effect of the lost disk in the V$ views.

As with the previous post, the code examples in this one are for demonstration purposes only!

The setup in a nutshell

I am using Oracle Linux 7.2 with UEK4 as the host system; KVM and libvirt are responsible for the guests (virtual machines). The guest VM used for this example is named asmtest, and uses Oracle Linux 7.2 as well. There are 9 ASM disks – 8 for +DATA featuring normal redundancy in 2 failure groups. I added +RECO for the sake of completeness with external redundancy. This post is about +DATA. To keep it simple I used an Oracle Restart configuration patched to the July 2016 PSU.
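
For reference, the +DATA layout translates into a create statement along these lines – a sketch from memory rather than the exact command I used, with the udev-provided device names as examples:

create diskgroup data normal redundancy
  failgroup batch1 disk '/dev/asm-disk01', '/dev/asm-disk02', '/dev/asm-disk03', '/dev/asm-disk04'
  failgroup batch2 disk '/dev/asm-disk05', '/dev/asm-disk06', '/dev/asm-disk07', '/dev/asm-disk08'
  attribute 'compatible.asm' = '12.1', 'compatible.rdbms' = '12.1';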

Removing the disk

As with the previous example I am using libvirt to remove a “LUN” temporarily from the guest. And sure enough, the VM picks this up. This is the relevant output obtained via journalctl -f

Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Add. Sense: Logical unit not supported

In the previous post I keep referring to, I copied the part of the ASM instance’s alert.log that showed how the disk repair timer was ticking down. This time I am simulating the case where – for whatever reason – the transient failure could not be fixed. In that case, this is what you would see in the alert.log:

2016-09-29 15:38:21.752000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1
2016-09-29 15:41:25.379000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (416) secs on ASM inst 1
2016-09-29 15:44:29.012000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (232) secs on ASM inst 1
2016-09-29 15:47:32.643000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (48) secs on ASM inst 1
2016-09-29 15:50:36.259000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (0) secs on ASM inst 1
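
While the timer is counting down you can also follow it from SQL: the REPAIR_TIMER column in V$ASM_DISK shows the remaining seconds for the affected disk. Something along these lines does the trick:

select name, mount_status, mode_status, state, failgroup, repair_timer
  from v$asm_disk
 where group_number = 1
 order by disk_number;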

The last line of the countdown above is definitely NOT what you want to see. If at all possible, you should fix the problem to prevent DATA_0001 (in this example) from being dropped. If you see the message about the disk being dropped, you are facing the inevitable rebalance operation. Here is the continued output from the ASM alert.log:

WARNING: PST-initiated drop of 1 disk(s) in group 1(.4232658126))
SQL> alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 40 for pid 23, osid 3216
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 41 for pid 16, osid 2701
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
SUCCESS: alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
SUCCESS: PST-initiated drop disk in group 1(4232658126))
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xfc493cce (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=20, OS id=3655 
NOTE: assigning ARB0 to group 1/0xfc493cce (DATA) with 1 parallel I/O
2016-09-29 15:50:40.257000 +01:00
NOTE: restored redundancy of control and online logs in DATA
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: restored redundancy of control and online logs in DATA
2016-09-29 15:50:51.655000 +01:00
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 42 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:54.647000 +01:00
GMON updating for reconfiguration, group 1 at 43 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:57.571000 +01:00
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 44 for pid 16, osid 2701
GMON querying group 1 at 45 for pid 16, osid 2701
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
2016-09-29 15:51:03.589000 +01:00
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0xfc493cce (DATA)

The last line indicates that the rebalance operation is complete. My disk group was almost empty, so the rebalance operation took almost no time. But indeed, when you check V$ASM_DISK, the disk is gone:

SQL> r
  1  select name,path,disk_number,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME       PATH            DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP  REPAIR_TIMER
---------- --------------- ----------- ------- ------------ ------- -------- ---------- ------------
DATA_0000  /dev/asm-disk02           0 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0002  /dev/asm-disk03           2 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0003  /dev/asm-disk06           3 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0004  /dev/asm-disk07           4 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0005  /dev/asm-disk08           5 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0006  /dev/asm-disk04           6 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0007  /dev/asm-disk05           7 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0

7 rows selected.

SQL> select count(*), failgroup from v$asm_disk where name like 'DATA%' group by failgroup;

  COUNT(*) FAILGROUP
---------- ----------
         3 BATCH1
         4 BATCH2

My system is fully operational, and the rebalance did not run into any space problems. Running out of space is the last thing you want during a rebalance; I did some research on this subject earlier and documented it in a blog post.
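
If you want to check upfront whether a disk group has enough headroom to absorb the loss of a disk and the subsequent rebalance, the usable space columns in V$ASM_DISKGROUP are a good starting point; a minimal sketch:

SQL> select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb
  2  from v$asm_diskgroup
  3  where name = 'DATA';

A negative USABLE_FILE_MB is a strong hint that the disk group could not tolerate a disk failure without running out of space.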

In a future blog post I am going to investigate what happens when a partner disk of DATA_0001 fails while the repair timer is ticking down.

Posted in 12c Release 1, Automatic Storage Management, KVM, Linux | Leave a Comment »

Little things worth knowing: transient ASM disk failures in 12c

Posted by Martin Bach on September 30, 2016

For quite some time ASM has treated us nicely when it comes to transient disk failures. Since 11.1 (if memory serves me right), a transient failure won't cause an ASM disk to be dropped immediately. This is good, because it can potentially save a lot of time! When a disk is dropped from an ASM disk group, a rebalance operation is unavoidable. And there is a good chance of another one following it, because ultimately you want to add the disk back to the disk group. To be fair, this only applies to normal and high redundancy disk groups.

The feature I just described very briefly is referred to as ASM Fast Mirror Resync, and is documented in the ASM Administrator's Guide in the section Administering Oracle ASM Disk Groups.

By the way, there is another new feature you might want to look at, called FAILGROUP_REPAIR_TIME, but I’ll keep that for another post.
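
Both timers are controlled via disk group attributes, so you can inspect and, if required, change them along these lines (a sketch; the 12h value is just an example):

SQL> select name, value from v$asm_attribute
  2  where group_number = 1
  3  and name in ('disk_repair_time', 'failgroup_repair_time');

SQL> alter diskgroup data set attribute 'disk_repair_time' = '12h';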

In this post I’d like to demonstrate how easy it is to simulate a transient failure of an ASM disk using libvirt and KVM on a host running Oracle Linux 7.2. I also wanted to have the output from the O/S and ASM alert.log as personal reference.

As usual, this procedure is strictly for YOUR HOME LAB USE ONLY and for educational purposes.

The setup

When creating the VM – named “asmtest” – I created a boot volume and a 50 GB volume for the Oracle binaries. To keep it manageable I went for an Oracle 12.1 Restart configuration, patched to the July PSU. There are 9 ASM disks, presented as SCSI devices (not virtio); I'll explain why in a bit. The guest O/S is Oracle Linux 7.2, as is my lab server's. All virtual disks on the VM are partitioned, although strictly speaking that is not necessary.

Using the lsscsi tool, I created udev rules following the excellent description I found on oracle-base.com.

[oracle@asmtest ~]$ cat /etc/udev/rules.d/61-asm.rules 

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-0", SYMLINK+="asm-disk01", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-1", SYMLINK+="asm-disk02", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-2", SYMLINK+="asm-disk03", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-3", SYMLINK+="asm-disk04", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-0", SYMLINK+="asm-disk05", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-1", SYMLINK+="asm-disk06", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-2", SYMLINK+="asm-disk07", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-3", SYMLINK+="asm-disk08", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi3-0-0-0", SYMLINK+="asm-disk09", OWNER="oracle", GROUP="asmdba", MODE="0660"

This is slightly more complicated than my usual setup, but it was needed for device name persistence. Unfortunately libvirt does not report the same block device names as my guest VM: whilst the host “sees” sd[h-v], my guest reported these devices as sd[a-i]. Persistent device names to the rescue! The only issue I had initially was that disks presented to the VM via virtio don't report the SCSI IDs I need for my udev rules; in other words, lsscsi returned no data. Presenting the block devices via the SCSI interface did the trick.
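
The RESULT values used in the rules come straight from scsi_id; you can query them per block device like this (a sketch, using the first disk as an example), which on this VM should return the ID used for asm-disk01:

[oracle@asmtest ~]$ /usr/lib/udev/scsi_id -g -u -d /dev/sda
0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-0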

The hypothesis

This snippet from the documentation (referenced above) is exactly what I wanted to test:

After you repair the disk, run the SQL statement ALTER DISKGROUP ONLINE DISK. This statement brings a repaired disk group back online to enable writes so that no new writes are missed. This statement also starts a procedure to copy all of the extents that are marked as stale on their redundant copies.

Here is the current configuration I am using for my DATA disk group, as seen by ASM:

SQL> select name,state,type,offline_disks from v$asm_diskgroup
  2  where name = 'DATA';

NAME                           STATE       TYPE   OFFLINE_DISKS
------------------------------ ----------- ------ -------------
DATA                           MOUNTED     NORMAL             0



SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number;

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP        REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- ---------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0

You can see 2 failure groups, batch1 and batch2, and all disks are available/online. The operating system mapping works perfectly thanks to my udev rules:

[oracle@asmtest ~]$ ls -l /dev/asm-disk0*
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk01 -> sda1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk02 -> sdd1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk03 -> sdc1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk04 -> sdb1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk05 -> sde1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk06 -> sdh1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk07 -> sdg1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk08 -> sdf1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk09 -> sdi1

Let’s detach a disk and cause a transient failure

Libvirt is really cool: it allows me to modify my VM configuration at runtime. One example is removing disks, using the detach-disk command:

[root@host ~]# virsh help detach-disk
  NAME
    detach-disk - detach disk device

  SYNOPSIS
    detach-disk <domain> <target> [--persistent] [--config] [--live] [--current]

  DESCRIPTION
    Detach disk device.

  OPTIONS
    [--domain] <string>  domain name, id or uuid
    [--target] <string>  target of disk device
    --persistent     make live change persistent
    --config         affect next boot
    --live           affect running domain
    --current        affect current domain

But which disk to remove? The disks attached to a VM can be listed using the domblklist command:

[root@host ~]# virsh domblklist asmtest
Target     Source
------------------------------------------------
vda        /path/to/slow/disk/asmtest
vdb        /path/to/slow/disk/asmtest-1
sdh        /path/to/fast/disk/asmtest-asm01.img
sdi        /path/to/fast/disk/asmtest-asm02.img
sdj        /path/to/fast/disk/asmtest-asm03.img
sdk        /path/to/fast/disk/asmtest-asm04.img
sdo        /path/to/fast/disk/asmtest-asm05.img
sdp        /path/to/fast/disk/asmtest-asm06.img
sdq        /path/to/fast/disk/asmtest-asm07.img
sdr        /path/to/fast/disk/asmtest-asm08.img
sdv        /path/to/fast/disk/asmtest-asm09.img

I mentioned previously that the information provided by the host does not necessarily match that of the guest. Since I have persistent device names in my VM though, there shouldn't be an issue. Let's take a disk away, but only temporarily! My O/S and Oracle binaries reside on disks presented to the guest as virtio devices, so I know they are named vd[a-z] and are better left alone. That leaves sdh as the first disk to remove.

[root@host ~]# virsh detach-disk asmtest sdh --live
Disk detached successfully

And the disk is gone.

And what is happening?

The disk removal has immediate effect on the ASM instance. Switching over to my VM I can see the following information recorded in the ASM alert.log:

2016-09-29 13:45:13.936000 +01:00
WARNING: Write Failed. group:1 disk:1 AU:1 offset:1044480 size:4096
path:/dev/asm-disk01
 incarnation:0xe969ccd4 asynchronous result:'I/O error'
 subsys:System krq:0x7fda4f0b8210 bufp:0x7fda4f20b000 osderr1:0x69b5 osderr2:0x0
 IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: Hbeat write to PST disk 1.3916025044 in group 1 failed. [2]
NOTE: process _b000_+asm (4346) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 13 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: checking PST for grp 1 done.
NOTE: sending set offline flag message (844741558) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x7f is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 15 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: cache closing disk 1 of grp 1: DATA_0001
NOTE: PST update grp = 1 completed successfully
2016-09-29 13:46:53.300000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1

I also had a journalctl -f running in a different session. Unsurprisingly the O/S also noticed the disk disappearing:

Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Add. Sense: Logical unit not supported

A transient failure

In yet another session I can see that the disk is about to go: the repair timer has started ticking down and is decremented as time passes. This did not happen straight away; there was a little delay of maybe a couple of minutes after the disk was removed.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   600
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   416
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

Some other day I would like to run an additional test to see what happens when the repair timer actually reaches 0, but for now let's add the disk back in. On the host, I use libvirt again for this:

[root@host ~]# virsh attach-disk asmtest /path/to/fast/disk/asmtest-asm01.img sdh --live
Disk attached successfully

As soon as the prompt returns, I can see the guest O/S picking the disk up.

Sep 29 13:54:09 asmtest kernel: scsi 2:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    1.5. PQ: 0 ANSI: 5
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Mode Sense: 63 00 00 08
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 29 13:54:09 asmtest kernel:  sda: sda1
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Attached SCSI disk

The documentation states that I should now be able to online the disk again, which I tried. The result is shown below:

SQL> alter diskgroup data online all;

Diskgroup altered.

I am showing you the complete ASM alert.log output here; feel free to skip to the bottom:

2016-09-29 13:54:44.774000 +01:00
SQL> alter diskgroup data online all
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: initiating resync of disk group 1 disks
DATA_0001 (1)
NOTE: process _user3750_+asm (3750) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: sending set offline flag message (4172832432) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x1 is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 17 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 18 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance membership refresh for group=1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x11, op = assign
GMON updating disk modes for group 1 at 19 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance disk validation for group=1
NOTE: disk validation pending for 1 disk in group 1/0xceb93c35 (DATA)
NOTE: Found /dev/asm-disk01 for disk DATA_0001
NOTE: completed disk validation for 1/0xceb93c35 (DATA)
NOTE: discarding redo for group 1 disk 1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x19, op = assign
GMON updating disk modes for group 1 at 20 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 21 for pid 16, osid 2535
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/asm-disk01
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x5d, op = assign
GMON updating disk modes for group 1 at 22 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7d, op = assign
GMON updating disk modes for group 1 at 23 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
SUCCESS: alter diskgroup data online all
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xceb93c35 (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=23, OS id=4986
NOTE: assigning ARB0 to group 1/0xceb93c35 (DATA) with 1 parallel I/O
2016-09-29 13:54:48.265000 +01:00
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7f, op = assign
GMON updating disk modes for group 1 at 24 for pid 23, osid 4986
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: reset timers for disk: 1
NOTE: completed online of disk group 1 disks
DATA_0001 (1)
2016-09-29 13:54:49.580000 +01:00
NOTE: stopping process ARB0
NOTE: requesting all-instance membership refresh for group=1
SUCCESS: rebalance completed for group 1/0xceb93c35 (DATA)
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 25 for pid 16, osid 2535
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
2016-09-29 13:54:51.286000 +01:00
NOTE: Attempting voting file refresh on diskgroup DATA

This looks promising, but what about V$ASM_DISK?

SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number
  3  /

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> 
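
If you want to convince yourself that no long-running rebalance is in flight while the disk is brought back online, a quick look at V$ASM_OPERATION helps; a minimal sketch:

SQL> select group_number, operation, state, power, sofar, est_work, est_minutes
  2  from v$asm_operation;

Once the resync has finished, this should simply return no rows.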

Hurray, I have my disk back! And at no time did I have to sit through a full-blown rebalance operation. Thanks ASM!

Posted in 12c Release 1, KVM, Linux | 1 Comment »

Comparing workloads across databases

Posted by Martin Bach on September 26, 2016

Does this question sound familiar? “We are running the exact same workload in UAT as we do in production, but it behaves completely different. Can you help?”

If it does, then you might find the following post useful, provided you are appropriately licensed to use this feature. I have been working with AWR data for a long time, and one of the best ways to do so, according to David Kurtz and others, is to work with the AWR data directly. I have seen a number of presentations on how to consolidate AWR data from multiple sources in a single repository, and this is just my write-up of that procedure for my own purposes. I don't even want to pretend it's elegant, but it works for me; it might work for you too, but I can't guarantee that :)

Setup

To demonstrate the concept I once again use Dominic Giles's most excellent Swingbench. I am simulating two environments, “development” and “production”. Development is hosted on a virtual machine named server3, with 2 virtual CPUs and 8 GB DRAM. The production environment is my lab server, with 2 sockets/12 cores/24 threads and 64 GB DRAM. To demonstrate a case of “same workload, vastly different performance” I created the Swingbench order entry schema (SOE) on development without any indexes. All indexes are present in production.

[oracle@server3 ~]$ ./oewizard -cl -create -cs //server3/NCDB -dba system -dbap … -noindexes -nopart -p soe -scale 1 -ts soe_tbs -u soe -v

If you create the schema on the command line like I just did, make sure to grant execute on DBMS_LOCK to SOE as sys once the procedure is complete.
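
For reference, that grant is a one-liner when connected as SYS (a minimal sketch):

SQL> grant execute on dbms_lock to soe;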

The command to create the Swingbench schema in “production” included all the indexes; for this reason alone there should be a lot more throughput in “production” compared to “development”.

The workload was generated using charbench, and I specified that AWR snapshots should be created at the beginning and end of the test. Each test had a run time of 10 minutes. To make sure I was using the same setup I used the same swingconfig.xml (with the exception of the connection string, of course).

[oracle@lab bin]$ ./charbench -c swingconfig_server3.xml -cs //lab/NCDB
Author  :        Dominic Giles
Version :        2.5.0.971

Results will be written to results.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

16:15:27        5       3838    72                                             
...
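
If you prefer not to let charbench take the AWR snapshots for you, you can create them manually around the run; a minimal sketch (this requires a Diagnostics Pack licence):

SQL> -- immediately before starting charbench
SQL> exec dbms_workload_repository.create_snapshot

SQL> -- and again right after the run has finished
SQL> exec dbms_workload_repository.create_snapshot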

Working with AWR data

Working with raw AWR data and moving it between systems is documented in the Database Performance Tuning Guide: chapter 6 covers transporting AWR data and chapter 8 covers comparing periods. On a high level, the procedure in this blog involves only a few steps:

  1. Extract AWR data from each database, production and development
  2. Transfer the resulting AWR data to the AWR repository database. In my case this is a third database that will contain the AWR information about itself, and the AWR extracts to be loaded
  3. Import the data

AWR data export

Here is an example of the procedure that I completed on production to export the AWR data for my swingbench run. CAREFUL: the dump can be very large! Ensure you have the necessary space in your destination directory.

[oracle@lab ~]$ ncdb
The Oracle base remains unchanged with value /u01/app/oracle
environment switched to NCDB
[oracle@lab ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Fri Sep 23 15:48:42 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> @?/rdbms/admin/awrextr
~~~~~~~~~~~~~
AWR EXTRACT
~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  This script will extract the AWR data for a range of snapshots  ~
~  into a dump file.  The script will prompt users for the         ~
~  following information:                                          ~
~     (1) database id                                              ~
~     (2) snapshot range to extract                                ~
~     (3) name of directory object                                 ~
~     (4) name of dump file                                        ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Databases in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   DB Id     DB Name      Host
------------ ------------ ------------
* 3338715777 NCDB         lab

The default database id is the local one: '3338715777'.  To use this
database id, press <return> to continue, otherwise enter an alternative.

Enter value for dbid: 

Using 3338715777 for Database ID


Specify the number of days of snapshots to choose from
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed.  Pressing <return> without
specifying a number lists all completed snapshots.


Enter value for num_days: 1

Listing the last day's Completed Snapshots

DB Name        Snap Id    Snap Started
------------ --------- ------------------
NCDB                64 23 Sep 2016 10:21
                    65 23 Sep 2016 10:21
                    66 23 Sep 2016 10:24
                    67 23 Sep 2016 10:24
                    68 23 Sep 2016 10:34
                    69 23 Sep 2016 10:34
                    70 23 Sep 2016 15:51
                    71 23 Sep 2016 16:08
                    72 23 Sep 2016 16:18


Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 71
Begin Snapshot Id specified: 71

Enter value for end_snap: 72
End   Snapshot Id specified: 72


Specify the Directory Name
~~~~~~~~~~~~~~~~~~~~~~~~~~

Directory Name                 Directory Path
------------------------------ -------------------------------------------------
DATA_FILE_DIR                  /u01/app/oracle/product/12.1.0.2/dbhome_1/demo/sc
                               hema/sales_history/

DATA_PUMP_DIR                  /u01/app/oracle/admin/NCDB/dpdump/
LOG_FILE_DIR                   /u01/app/oracle/product/12.1.0.2/dbhome_1/demo/sc
                               hema/log/

MEDIA_DIR                      /u01/app/oracle/product/12.1.0.2/dbhome_1/demo/sc
                               hema/product_media/

OPATCH_INST_DIR                /u01/app/oracle/product/12.1.0.2/dbhome_1/OPatch

Directory Name                 Directory Path
------------------------------ -------------------------------------------------
OPATCH_LOG_DIR                 /u01/app/oracle/product/12.1.0.2/dbhome_1/QOpatch
OPATCH_SCRIPT_DIR              /u01/app/oracle/product/12.1.0.2/dbhome_1/QOpatch
ORACLE_BASE                    /
ORACLE_HOME                    /
ORACLE_OCM_CONFIG_DIR          /u01/app/oracle/product/12.1.0.2/dbhome_1/ccr/hos
                               ts/lab/state

ORACLE_OCM_CONFIG_DIR2         /u01/app/oracle/product/12.1.0.2/dbhome_1/ccr/sta
                               te

SS_OE_XMLDIR                   /u01/app/oracle/product/12.1.0.2/dbhome_1/demo/sc

Directory Name                 Directory Path
------------------------------ -------------------------------------------------
                               hema/order_entry/

SUBDIR                         /u01/app/oracle/product/12.1.0.2/dbhome_1/demo/sc
                               hema/order_entry//2002/Sep

XMLDIR                         /u01/app/oracle/product/12.1.0.2/dbhome_1/rdbms/x
                               ml

XSDDIR                         /u01/app/oracle/product/12.1.0.2/dbhome_1/rdbms/x
                               ml/schema


Choose a Directory Name from the above list (case-sensitive).

Enter value for directory_name: DATA_PUMP_DIR

Using the dump directory: DATA_PUMP_DIR

Specify the Name of the Extract Dump File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The prefix for the default dump file name is awrdat_71_72.
To use this name, press <return> to continue, otherwise enter
an alternative.

Enter value for file_name: prod_awrdat_71_72

Using the dump file prefix: prod_awrdat_71_72
|
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|  The AWR extract dump file will be located
|  in the following directory/file:
|   /u01/app/oracle/admin/NCDB/dpdump/
|   prod_awrdat_71_72.dmp
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|  *** AWR Extract Started ...
|
|  This operation will take a few moments. The
|  progress of the AWR extract operation can be
|  monitored in the following directory/file:
|   /u01/app/oracle/admin/NCDB/dpdump/
|   prod_awrdat_71_72.log
|

End of AWR Extract
SQL> 

As a result of that operation you’ll get 2 files: a log file and a dump file. As you can imagine there is some data pump activity involved. Transfer the dump file to your AWR repository database. In a fit of curiosity I named that database AWRDWH.

Repeat this task with the other database (in my case NCDB on server3), selecting the relevant AWR snapshots, and transfer the dump file to the AWRDWH database's DATA_PUMP_DIR.
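
Assuming the development extract ended up in that database's DATA_PUMP_DIR on server3, a simple scp is all it takes (a sketch; paths and host name reflect my lab, adjust as needed):

[oracle@server3 ~]$ scp /u01/app/oracle/admin/NCDB/dpdump/dev_awrdat_6_7.dmp \
> oracle@lab:/u01/app/oracle/admin/AWRDWH/dpdump/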

Loading the AWR extracts

Once the data is available for loading, switch environments to the AWR repository database and load it. In this example I'm loading the extract from server3.


[oracle@lab ~]$ awrdwh
The Oracle base remains unchanged with value /u01/app/oracle
environment switched to AWRDWH
[oracle@lab ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Fri Sep 23 15:54:18 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options


SQL> @?/rdbms/admin/awrload
~~~~~~~~~~
AWR LOAD
~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  This script will load the AWR data from a dump file. The   ~
~  script will prompt users for the following information:    ~
~     (1) name of directory object                            ~
~     (2) name of dump file                                   ~
~     (3) staging schema name to load AWR data into           ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Specify the Directory Name
~~~~~~~~~~~~~~~~~~~~~~~~~~

Directory Name                 Directory Path
------------------------------ -------------------------------------------------
DATA_PUMP_DIR                  /u01/app/oracle/admin/AWRDWH/dpdump/
OPATCH_INST_DIR                /u01/app/oracle/product/12.1.0.2/dbhome_1/OPatch
OPATCH_LOG_DIR                 /u01/app/oracle/product/12.1.0.2/dbhome_1/QOpatch
OPATCH_SCRIPT_DIR              /u01/app/oracle/product/12.1.0.2/dbhome_1/QOpatch
ORACLE_BASE                    /
ORACLE_HOME                    /
ORACLE_OCM_CONFIG_DIR          /u01/app/oracle/product/12.1.0.2/dbhome_1/ccr/hos
                               ts/lab/state

ORACLE_OCM_CONFIG_DIR2         /u01/app/oracle/product/12.1.0.2/dbhome_1/ccr/sta
                               te

XMLDIR                         /u01/app/oracle/product/12.1.0.2/dbhome_1/rdbms/x
                               ml

XSDDIR                         /u01/app/oracle/product/12.1.0.2/dbhome_1/rdbms/x
                               ml/schema


Choose a Directory Name from the list above (case-sensitive).

Enter value for directory_name: DATA_PUMP_DIR

Using the dump directory: DATA_PUMP_DIR

Specify the Name of the Dump File to Load
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please specify the prefix of the dump file (.dmp) to load:

Enter value for file_name: dev_awrdat_6_7

Loading from the file name: dev_awrdat_6_7.dmp

Staging Schema to Load AWR Snapshot Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The next step is to create the staging schema
where the AWR snapshot data will be loaded.
After loading the data into the staging schema,
the data will be transferred into the AWR tables
in the SYS schema.


The default staging schema name is AWR_STAGE.
To use this name, press <return> to continue, otherwise enter
an alternative.

Enter value for schema_name: 

Using the staging schema name: AWR_STAGE

Choose the Default tablespace for the AWR_STAGE user
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Choose the AWR_STAGE users's default tablespace.  This is the
tablespace in which the AWR data will be staged.

TABLESPACE_NAME                CONTENTS  DEFAULT TABLESPACE
------------------------------ --------- ------------------
SYSAUX                         PERMANENT *
USERS                          PERMANENT

Pressing <return> will result in the recommended default
tablespace (identified by *) being used.

Enter value for default_tablespace: USERS

Using tablespace USERS as the default tablespace for the AWR_STAGE


Choose the Temporary tablespace for the AWR_STAGE user
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Choose the AWR_STAGE user's temporary tablespace.

TABLESPACE_NAME                CONTENTS  DEFAULT TEMP TABLESPACE
------------------------------ --------- -----------------------
TEMP                           TEMPORARY *

Pressing <return> will result in the database's default temporary
tablespace (identified by *) being used.

Enter value for temporary_tablespace: TEMP

Using tablespace TEMP as the temporary tablespace for AWR_STAGE




... Creating AWR_STAGE user

|
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|  Loading the AWR data from the following
|  directory/file:
|   /u01/app/oracle/admin/AWRDWH/dpdump/
|   dev_awrdat_6_7.dmp
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|  *** AWR Load Started ...
|
|  This operation will take a few moments. The
|  progress of the AWR load operation can be
|  monitored in the following directory/file:
|   /u01/app/oracle/admin/AWRDWH/dpdump/
|   dev_awrdat_6_7.log
|
... Dropping AWR_STAGE user

End of AWR Load


Repeat the process with the production extract.

Compare

Now that you have both extracts, you can actually compare them!


SQL> @?/rdbms/admin/awrddrpi

Specify the Report Type
~~~~~~~~~~~~~~~~~~~~~~~
Would you like an HTML report, or a plain text report?
Enter 'html' for an HTML report, or 'text' for plain text
Defaults to 'html'
Enter value for report_type: text

Type Specified:  text


Instances in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   DB Id     Inst Num DB Name      Instance     Host
------------ -------- ------------ ------------ ------------
* 539355091         1 AWRDWH       AWRDWH       lab
  3338715777        1 NCDB         NCDB         lab
  3612449208        1 SLOB         SLOB         lab
  3363997070        1 NCDB         NCDB         server3.exam
                                                ple.com

Database Id and Instance Number for the First Pair of Snapshots
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for dbid: 3363997070
Using 3363997070 for Database Id for the first pair of snapshots
Enter value for inst_num: 1
Using 1 for Instance Number for the first pair of snapshots


Specify the number of days of snapshots to choose from
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed.  Pressing <return> without
specifying a number lists all completed snapshots.


Enter value for num_days: 1

Listing the last day's Completed Snapshots

                                                        Snap
Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
NCDB         NCDB                 6 23 Sep 2016 13:17      1
                                  7 23 Sep 2016 13:27      1



Specify the First Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 6
First Begin Snapshot Id specified: 6

Enter value for end_snap: 7
First End   Snapshot Id specified: 7




Instances in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   DB Id     Inst Num DB Name      Instance     Host
------------ -------- ------------ ------------ ------------
* 539355091         1 AWRDWH       AWRDWH       lab
  3338715777        1 NCDB         NCDB         lab
  3612449208        1 SLOB         SLOB         lab
  3363997070        1 NCDB         NCDB         server3.exam
                                                ple.com




Database Id and Instance Number for the Second Pair of Snapshots
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Enter value for dbid2: 3338715777
Using 3338715777 for Database Id for the second pair of snapshots
Enter value for inst_num2: 1
Using 1 for Instance Number for the second pair of snapshots


Specify the number of days of snapshots to choose from
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed.  Pressing <return> without
specifying a number lists all completed snapshots.


Enter value for num_days2: 1

Listing the last day's Completed Snapshots

                                                        Snap
Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
NCDB         NCDB                64 23 Sep 2016 10:21      1
                                 65 23 Sep 2016 10:21      1
                                 66 23 Sep 2016 10:24      1
                                 67 23 Sep 2016 10:24      1
                                 68 23 Sep 2016 10:34      1
                                 69 23 Sep 2016 10:34      1

                                 71 23 Sep 2016 16:08      1
                                 72 23 Sep 2016 16:18      1



Specify the Second Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap2: 71
Second Begin Snapshot Id specified: 71

Enter value for end_snap2: 72
Second End   Snapshot Id specified: 72



Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is awrdiff_1_6_1_71.txt  To use this name,
press <return> to continue, otherwise enter an alternative.

Enter value for report_name: 

Using the report name awrdiff_1_6_1_71.txt
...

This will produce the report in text format, which is not as useful as the HTML counterpart but makes it easier to copy and paste here. Let's have a look at the result:

WORKLOAD REPOSITORY COMPARE PERIOD REPORT

Snapshot Set    DB Id    Instance     Inst Num Release     Cluster Host          Std Block Size
------------ ----------- ------------ -------- ----------- ------- ------------ ---------------
First (1st)   3363997070 NCDB                1 12.1.0.2.0  NO      server3.exam     8192
Second (2nd)  3338715777 NCDB                1 12.1.0.2.0  NO      lab              8192

Snapshot Set  Begin Snap Id Begin Snap Time            End Snap Id End Snap Time                  Avg Active Users           Elapsed Time (min)            DB time (min)
------------ -------------- ------------------------- ------------ ------------------------- -------------------------- -------------------------- --------------------------
1st                       6 23-Sep-16 13:17:32 (Fri)            7 23-Sep-16 13:27:35 (Fri)                       4.7                      10.1                      47.7
2nd                      71 23-Sep-16 16:08:31 (Fri)           72 23-Sep-16 16:18:32 (Fri)                       0.6                      10.0                       5.7
                                                                                      ~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                      %Diff:                   -88.0%                      -0.4%                     -88.1%

Host Configuration Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                     1st                  2nd                 Diff     %Diff
----------------------------------- -------------------- -------------------- -------------------- ---------
Number of CPUs:                                        2                   24                   22   1,100.0
Number of CPU Cores:                                   2                   12                   10     500.0
Number of CPU Sockets:                                 2                    2                    0       0.0
Physical Memory:                                 7725.4M             64135.1M              -56410M     730.2
Load at Start Snapshot:                             2.04                  .95                -1.09     -53.4
Load at End Snapshot:                               3.12                 1.15                -1.97     -63.1
%User Time:                                        95.41                 1.51               -93.89     -98.4
%System Time:                                       3.53                  2.6                 -.93     -26.3
%Idle Time:                                          .94                95.87                94.93  10,098.9
%IO Wait Time:                                       .13                  .17                  .04      30.8
Cache Sizes
~~~~~~~~~~~
                          1st (M)    2nd (M)   Diff (M)    %Diff
---------------------- ---------- ---------- ---------- --------
Memory Target
.....SGA Target             768.0    8,192.0    7,424.0    966.6
..........Buffer Cache      516.0    5,440.0    4,924.0    954.3
..........Shared Pool       200.0    1,152.0      952.0    476.0
..........Large Pool          8.0      640.0      632.0  7,900.0
..........Java Pool           4.0      384.0      380.0  9,500.0
..........Streams Pool                 128.0      128.0    100.0
.....PGA Target             256.0    8,192.0    7,936.0  3,100.0
Log Buffer                    4.9       55.6       50.7  1,024.3
In-Memory Area                             0          0      0.0

Workload Comparison
~~~~~~~~~~~~~~~~~~~                      1st Per Sec          2nd Per Sec      %Diff              1st Per Txn          2nd Per Txn      %Diff
                                     ---------------      ---------------     ------          ---------------      ---------------     ------
                       DB time:                  4.7                  0.6      -88.0                      1.6                  0.0      -99.4
                      CPU time:                  1.9                  0.3      -86.6                      0.6                  0.0     -100.0
           Background CPU time:                  0.0                  0.1      600.0                      0.0                  0.0        0.0
             Redo size (bytes):             16,253.0            201,173.5    1,137.8                  5,461.1              3,354.0      -38.6
         Logical read (blocks):            266,911.0             30,851.5      -88.4                 89,684.0                514.4      -99.4
                 Block changes:                 76.9              1,343.3    1,646.4                     25.8                 22.4      -13.3
        Physical read (blocks):             37,003.5                600.2      -98.4                 12,433.4                 10.0      -99.9
       Physical write (blocks):                  7.5                 90.8    1,113.4                      2.5                  1.5      -39.8
              Read IO requests:                803.9                 65.5      -91.9                    270.1                  1.1      -99.6
             Write IO requests:                  5.6                 42.2      659.0                      1.9                  0.7      -62.6
                  Read IO (MB):                289.1                  4.7      -98.4                     97.1                  0.1      -99.9
                 Write IO (MB):                  0.1                  0.7    1,083.3                      0.0                  0.0      -50.0
                  IM scan rows:                  0.0                  0.0        0.0                      0.0              3,324.7      100.0
       Session Logical Read IM:
                    User calls:                 10.0                192.3    1,815.4                      3.4                  3.2       -4.7
                  Parses (SQL):                 14.7                 71.6      388.3                      4.9                  1.2      -75.9
             Hard parses (SQL):                  2.2                  0.4      -80.6                      0.7                  0.0      -98.6
            SQL Work Area (MB):                  1.3                 20.9    1,558.7                      0.4                  0.3    1,558.7
                        Logons:                  0.1                  0.1        0.0                      0.0                  0.0     -100.0
                Executes (SQL):                 67.6                658.0      873.5                     22.7                 11.0      -51.7
                  Transactions:                  3.0                 60.0    1,912.8

                                               First               Second       Diff
                                     ---------------      ---------------     ------
     % Blocks changed per Read:                  0.0                  4.4        4.3
              Recursive Call %:                 94.5                 79.5      -15.0
    Rollback per transaction %:                  0.0                  0.0        0.0
                 Rows per Sort:                 18.1                 12.8       -5.4
    Avg DB time per Call (sec):                  0.5                  0.0       -0.5

One of the nice things about the difference report is that it shows you the difference in environments. It is immediately visible that the hosts are quite different and, as a result, so is the throughput. This is not much of a revelation because I knew from the beginning that my results were going to be different, but if you are seeing the systems for the first time this is useful information.

Using the sections further down in the report you can compare execution statistics for the individual SQL IDs (the SQL IDs are identical because the workloads are identical), and other pieces of information that give you a better understanding of what is going on.

Posted in Oracle | 1 Comment »

Little things worth knowing: using Remote Diagnostic Agent more effectively with profiles

Posted by Martin Bach on August 26, 2016

RDA, or the Remote Diagnostic Agent, has been around for a little while. Over time, and with the growing number of Oracle's acquisitions, it has become, shall we say, a little more difficult to handle. It appears to me as if every acquired product has its diagnostics handled by RDA, making it hard to focus on something specific, like for example the database.

I won’t go into great detail about the Remote Diagnostic Agent in this post; please make sure you read and understand the documentation on MOS and the README* files that come with it before using it.

I based this blog post on Oracle Linux 7.2 and RDA 8.12.16.6.14 in case you find this via a search engine.

Basic Usage

In the first step you need to download RDA from My Oracle Support; I used DOC ID 314422.1 to get it for Linux x86-64. Installation was simple: I just unzipped the zipfile, as always, into the RDBMS owner's home, which has plenty of free space for data collection on my lab systems. You might actually already have a copy of RDA somewhere, since it ships bundled with a lot of Oracle products. I am focusing on the Oracle database and infrastructure and, as far as I know, have to download RDA in this case.
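
For reference, this is roughly all there is to the installation (a sketch; the zip file name is an example and will differ depending on the version you download, and the verification step is taken from the README):

[oracle@server1 ~]$ unzip -q rda_8.12.16.6.14_linux_x86_64.zip -d $HOME
[oracle@server1 ~]$ cd $HOME/rda
[oracle@server1 rda]$ ./rda.sh -cv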

If you follow the documentation, you verify the installation and begin with the setup. And here is where I started to struggle with the tool's concept; let me explain why:

[oracle@server1 rda]$ ./rda.sh -S
------------------------------------------------------------------------------
RDA.BEGIN: Initializes the Data Collection
------------------------------------------------------------------------------
Enter the Oracle home to be used for data analysis
> /u01/app/oracle/product/12.1.0.2/dbhome_1

------------------------------------------------------------------------------
RDA.CONFIG: Collects Key Configuration Information
------------------------------------------------------------------------------
------------------------------------------------------------------------------
SAMPLE.SAMPLE: Controls Sampling
------------------------------------------------------------------------------
------------------------------------------------------------------------------
RDA.CCR: Collects OCM Diagnostic Information
------------------------------------------------------------------------------
Do you want to diagnose OCM installations (Y/N)?
Press Return to accept the default (N)
> n

------------------------------------------------------------------------------
RDA.CUST: Collects Customer-Specific Information
------------------------------------------------------------------------------
------------------------------------------------------------------------------
OS.OS: Collects the Operating System Information
------------------------------------------------------------------------------
------------------------------------------------------------------------------
OS.PROF: Collects the User Profile
------------------------------------------------------------------------------
------------------------------------------------------------------------------
OS.NET: Collects Network Information
------------------------------------------------------------------------------
Do you want RDA to perform the network ping tests (Y/N)?
Press Return to accept the default (N)
> 
...

And this takes you on a long journey where you are asked whether to collect diagnostic information about pretty much every Oracle product, or at least so it seems. I was actually only interested in my local Oracle database installation, but I was nevertheless prompted about Oracle VM, for example:


...

------------------------------------------------------------------------------
OS.OVMS: Collects Oracle VM Server Information
------------------------------------------------------------------------------
Should RDA analyze Oracle VM Server (Y/N)?
Press Return to accept the default (N)
> n

------------------------------------------------------------------------------
OS.OVMM: Collects Oracle VM Manager Information
------------------------------------------------------------------------------
Do you want RDA to analyze Oracle VM Manager (Y/N)?
Press Return to accept the default (N)
> n

...

A useful question for some, but not for me at this point. And so on; the list, it seems, gets longer with every release. And I am not willing to answer 42,000 questions every time I deploy RDA! That left me with 2 options:

  • Patiently go through the list and dutifully answer every question. If you inadvertently hit the return key one time too often (which happens quite easily), you can correct the problem later by editing the configuration file.
  • Consider using profiles

Option 1 might be viable because it's a one-off process, but I personally don't find it very practical for various reasons. And yes, it seems possible to let RDA “guess” your environment, but that didn't work as I expected.

Profiles

Profiles on the other hand are easy to use! You can view them online on MOS 391983.1 or alternatively as part of RDA’s built-in man-page by executing ./rda.sh -L and searching for “Available profiles:”. If you haven’t set the PAGER variable you might end up with more, but I think less is more :)

[oracle@server1 rda]$ PAGER=less ./rda.sh -L

...

Available profiles:
  AS10g                            Oracle Application Server 10g problems
  AS10g_Discoverer                 Discoverer 10g problems

...

  DB10g                            Oracle Database 10g problems
  DB11g                            Oracle Database 11g problems
  DB12c                            Oracle Database 12c problems
  DB8i                             Oracle Database 8i problems
  DB9i                             Oracle Database 9i problems
  DB_Assessment                    Oracle Database assessment collections
  DB_BackupRecovery                Oracle Database backup and recovery

...

These profiles are a logical grouping of the various tests provided by the RDA framework. From a remote support point of view they are fantastic and leave little room for user error: support always gets the information it needs, not a report that only contains half of the information required for troubleshooting. If you would like to see more detail you can combine the -M (for the manual) and -p (for profile) options, as in this example:

[oracle@server1 rda]$ ./rda.sh -M -p DB11g
NAME
    Profile DB11g - Oracle Database 11g problems

MODULES
    The DB11g profile uses the following modules:
      DB:DCdb    Controls Oracle RDBMS Data Collection
      DB:DCdba   Collects Oracle RDBMS Information
      DB:DCdbm   Collects Oracle RDBMS Memory Information
      DB:DCdnfs  Collects Direct NFS Information
      DB:DClog   Collects Oracle Database Trace and Log Files
      DB:DCsp    Collects SQL*Plus/iSQL*Plus Information
      EM:DCagt   Collects Enterprise Manager Agent Information
      EM:DCdbc   Collects Database Control Information
      EM:DCgrid  Controls Grid Control Data Collection
      OS:DCinst  Collects the Oracle Installation Information
      OS:DCnet   Collects Network Information
      OS:DConet  Collects Oracle Net Information
      OS:DCos    Collects the Operating System Information
      OS:DCperf  Collects Performance Information
      OS:DCprof  Collects the User Profile

COPYRIGHT NOTICE
    Copyright (c) 2002, 2016, Oracle and/or its affiliates. All rights
    reserved.

TRADEMARK NOTICE
    Oracle and Java are registered trademarks of Oracle and/or its affiliates.
    Other names may be trademarks of their respective owners.

[oracle@server1 rda]$ 

By the way, rda.sh -M displays the entire man page.

Coming back to the profiles: I found the Assessments to be quite useful and a good starting point if you would like to get an overview:

[oracle@server1 rda]$ ./rda.sh -L | grep Assessment
  ADBA                             Collects ACS Oracle Database Assessment
  Apps_DB_Assessment               Oracle Applications Database assessment
  Asm_Assessment                   Oracle ASM assessment collections
  DB_Assessment                    Oracle Database assessment collections
  Exadata_Assessment               Oracle Exadata assessment collections
  Maa_Assessment                   Maximum Availability Architecture
  Maa_Exa_Assessment               Maximum Availability Architecture with
  Rac_Assessment                   Real Application Cluster assessment

Specifying a profile does not entirely relieve you from having to enter information in an interactive session, but it reduces the time needed to complete the initial configuration.
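
For example, you can point the interactive setup at a profile from the start, which should limit the questions to the modules that profile actually uses. This is only a sketch based on the -S (setup) and -p (profile) options; double-check the exact syntax for your release with ./rda.sh -M:

[oracle@server1 rda]$ ./rda.sh -S -p DB11g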

Next time you need to run RDA, have a look at the available profiles; maybe there’s one that serves your needs!

Posted in Linux, Oracle | Tagged: | 1 Comment »

OSWatcher integration in Trace File Analyzer (TFA)

Posted by Martin Bach on August 2, 2016

Some time ago I wrote a post about using OSWatcher for system analysis. Neil Chandler (@ChandlerDBA) rightfully pointed out that although OSWatcher was cool, TFA was the way to go. TFA can include OSWatcher, but more importantly it adds a lot of value over and above what OSWatcher does.

I guess it depends on what you want to do. I still think that OSWatcher is a good starting point and enough for most problems on single-instance systems. When it comes to clustered environments, TFA looks a lot more appealing though.

In this article I am taking a closer look at using TFA, which ships as part of the Oracle 11.2.0.4 and 12.1.0.2 Grid Infrastructure installations. TFA is automatically updated as part of the quarterly patches, which is nice because the default/base release does not seem to work properly. Thankfully TFA can also be patched outside the regular patch cycle.

What is TFA?

TFA is a tool which, amongst other things, helps you gather information about incidents across your cluster. If you have ever worked on Exadata half racks or other clusters with more than 4 nodes, you will quickly come to appreciate having a single tool for this task. The TFA output is suitable for attaching to a Service Request, which should, at least in theory, help speed up problem resolution.
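
Gathering that information for a Service Request is essentially a one-liner. This is only a sketch; the available switches differ between TFA versions, so check the output of tfactl diagcollect -h first:

[oracle@rac12sbnode1 ~]$ tfactl diagcollect -since 1h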

It is also an excellent parsing tool with very useful reporting capabilities thanks to its “analyze” command.

As an added benefit you get a lot of tools previously known as the “RAC and DB Support Tools Bundle”. This includes OSWatcher, the reason for this post.

Plus you don’t have to worry about starting OSWatcher at boot time: TFA is started via a systemd unit file in Oracle Linux 7, and I found it started as a service in Oracle Linux 6. On OL7.x you can check its status using the standard systemd commands, as shown here:

[oracle@rac12sbnode1 ~]$ systemctl status oracle-tfa
oracle-tfa.service - Oracle Trace File Analyzer
   Loaded: loaded (/etc/systemd/system/oracle-tfa.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-08-02 09:46:24 BST; 3h 14min ago
 Main PID: 27799 (init.tfa)
   CGroup: /system.slice/oracle-tfa.service
           ├─14670 /bin/sleep 30
           ├─27799 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
           └─27890 /u01/app/12.1.0.2/grid/jdk/jre/bin/java -Xms128m -Xmx512m \
                     oracle.rat.tfa.TFAMain /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home
[oracle@rac12sbnode1 ~]$ 
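
Should you ever need to stop or start TFA manually, the standard systemd commands apply to the oracle-tfa unit shown above (run as root):

[root@rac12sbnode1 ~]# systemctl stop oracle-tfa
[root@rac12sbnode1 ~]# systemctl start oracle-tfa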

Updating TFA

It is quite likely that your version of TFA is older than the one available from MOS 1513912.2, which appears to be its main landing page. I applied the proactive bundle patch for July 2016 to my 2-node RAC cluster and found the TFA version to be 12.1.2.7.0. At the time of writing, Oracle had released TFA 12.1.2.8.0.

The update is quite simple but needs to be performed as root. To be sure I’m not doing something I shouldn’t be doing, I checked the current version first:

[oracle@rac12sbnode1 ~]$ tfactl print status

.----------------------------------------------------------------------------------------------------.
| Host         | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+--------------+---------------+-------+------+------------+----------------------+------------------+
| rac12sbnode1 | RUNNING       | 23081 | 5000 | 12.1.2.7.0 | 12127020160304140533 | COMPLETE         |
| rac12sbnode2 | RUNNING       |  6296 | 5000 | 12.1.2.7.0 | 12127020160304140533 | COMPLETE         |
'--------------+---------------+-------+------+------------+----------------------+------------------'

In the next step, after switching to the root account, I staged the TFA software and executed the installer. This will automatically distribute the new version across all nodes in the cluster.

[root@rac12sbnode1 patches]# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[root@rac12sbnode1 patches]# unzip -q TFALite_v12.1.2.8.0.zip 
[root@rac12sbnode1 patches]# ./installTFALite 
TFA Installation Log will be written to File : /tmp/tfa_install_25296_2016_08_02-09_40_12.log

Starting TFA installation

TFA HOME : /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home
TFA Build Version: 121280 Build Date: 201606232222
Installed Build Version: 121270 Build Date: 201603041405

TFA is already installed. Patching /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home...
TFA patching typical install from zipfile is written to /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfapatch.log

TFA will be Patched on: 
rac12sbnode1
rac12sbnode2

Do you want to continue with patching TFA? [Y|N] [Y]: y

Checking for ssh equivalency in rac12sbnode2
Node rac12sbnode2 is not configured for ssh user equivalency

SSH is not configured on these nodes : 
rac12sbnode2

Do you want to configure SSH on these nodes ? [Y|N] [Y]: y

Configuring SSH on rac12sbnode2...

Generating keys on rac12sbnode1...

Copying keys to rac12sbnode2...

/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@rac12sbnode2's password: 

Creating ZIP: /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/internal/tfapatch.zip

Using SSH to patch TFA to remote nodes :

Applying Patch on rac12sbnode2:

TFA_HOME: /u01/app/12.1.0.2/grid/tfa/rac12sbnode2/tfa_home
Stopping TFA Support Tools...
Shutting down TFA
Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
. . . . . 
. . . 
Successfully shutdown TFA..
Copying files from rac12sbnode1 to rac12sbnode2...

Current version of Berkeley DB in  is 5.0.84, so no upgrade required
Running commands to fix init.tfa and tfactl in rac12sbnode2...
Updating init.tfa in rac12sbnode2...
Starting TFA in rac12sbnode2...
Starting TFA..
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Waiting up to 100 seconds for TFA to be started..
. . . . . 
Successfully started TFA Process..
. . . . . 
TFA Started and listening for commands

Enabling Access for Non-root Users on rac12sbnode2...


Applying Patch on rac12sbnode1:

Stopping TFA Support Tools...

Shutting down TFA for Patching...

Shutting down TFA
Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
. . . . . 
. . . 
Successfully shutdown TFA..

Current version of Berkeley DB is 5.0.84, so no upgrade required

Copying TFA Certificates...

Running commands to fix init.tfa and tfactl in localhost

Starting TFA in rac12sbnode1...

Starting TFA..
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Waiting up to 100 seconds for TFA to be started..
. . . . . 
Successfully started TFA Process..
. . . . . 
TFA Started and listening for commands

Enabling Access for Non-root Users on rac12sbnode1...

root@rac12sbnode2's password: 
Removed SSH configuration on rac12sbnode2...

.--------------------------------------------------------------------.
| Host         | TFA Version | TFA Build ID         | Upgrade Status |
+--------------+-------------+----------------------+----------------+
| rac12sbnode1 |  12.1.2.8.0 | 12128020160623222219 | UPGRADED       |
| rac12sbnode2 |  12.1.2.8.0 | 12128020160623222219 | UPGRADED       |
'--------------+-------------+----------------------+----------------'

[root@rac12sbnode1 patches]# 

This has upgraded TFA in one easy step.

Support Tools Bundle Missing with Stock TFA

If you read MOS 1513912.2 carefully, you undoubtedly spotted that beginning with TFA 12.1.2.3.0 the RAC and DB Support Tools Bundle is included with TFA, alongside some other very useful utilities. But you only get them after deploying TFA from MOS. Here is the list as shown post-patch:

[oracle@rac12sbnode1 ~]$ tfactl toolstatus
.-------------------------------------------.
|           External Support Tools          |
+--------------+--------------+-------------+
| Host         | Tool         | Status      |
+--------------+--------------+-------------+
| rac12sbnode1 | alertsummary | DEPLOYED    |
| rac12sbnode1 | exachk       | DEPLOYED    |
| rac12sbnode1 | ls           | DEPLOYED    |
| rac12sbnode1 | pstack       | DEPLOYED    |
| rac12sbnode1 | orachk       | DEPLOYED    |
| rac12sbnode1 | sqlt         | DEPLOYED    |
| rac12sbnode1 | grep         | DEPLOYED    |
| rac12sbnode1 | summary      | DEPLOYED    |
| rac12sbnode1 | prw          | NOT RUNNING |
| rac12sbnode1 | vi           | DEPLOYED    |
| rac12sbnode1 | tail         | DEPLOYED    |
| rac12sbnode1 | param        | DEPLOYED    |
| rac12sbnode1 | dbglevel     | DEPLOYED    |
| rac12sbnode1 | darda        | DEPLOYED    |
| rac12sbnode1 | history      | DEPLOYED    |
| rac12sbnode1 | oratop       | DEPLOYED    |
| rac12sbnode1 | oswbb        | RUNNING     |
| rac12sbnode1 | dbperf       | RUNNING     |
| rac12sbnode1 | changes      | DEPLOYED    |
| rac12sbnode1 | events       | DEPLOYED    |
| rac12sbnode1 | ps           | DEPLOYED    |
| rac12sbnode1 | srdc         | DEPLOYED    |
'--------------+--------------+-------------'

The stock version, although it gets patched with the proactive bundle patch, does not include them. I ran this command before applying the TFA patch, but after having applied the proactive bundle patch to my cluster:

[oracle@rac12sbnode1 ~]$ tfactl toolstatus
.------------------------.
| External Support Tools |
+-------+-------+--------+
| Host  | Tool  | Status |
+-------+-------+--------+
'-------+-------+--------'

This is actually a feature, not a bug, as documented in MOS 2054786.1. The note states quite clearly that the RAC and DB Support Tools bundle is only installed if you deploy the MOS version. I just did that; I am good.

TFA Tools

I really love the idea of having these tools available. The TFA user guide, also available from MOS 1513912.2 (tab “Users Guide”), explains from page 39 onwards how to use them.

For example:

[oracle@rac12sbnode1 ~]$ tfactl oratop -h
Usage : /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oratop
         -database <dbname> <Oratop Options> <logon>

Options: 
-database <dbname> Database name to run oratop
<logon> : defalut will be / as sysdba. Specify a different user using
          {username[/password][@connect_identifier] | / }
          [AS {SYSDBA|SYSOPER}]
          connect_identifier: host[:port]/[service_name]
<Oratop Options>:
-k : FILE#:BLOCK#, section 4 lt is (EVENT/LATCH)
-m : MODULE/ACTION, section 4 (default is USERNAME/PROGRAM)
-s : SQL mode, section 4 (default is process mode)
-c : database service mode (default is connect string)
-f : detailed format, 132 columns (default: standard, 80 columns)
-b : batch mode (default is text-based user interface)
-n : maximum number of iterations (requires number)
-i : interval delay, requires value in seconds (default: 5s)

e.g:
   /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oratop -database testdb1
   /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oratop -database testdb1 -bn1


This makes for interesting output; the following example is from a system running the Swingbench Order Entry benchmark on an overloaded set of VMs:


Oracle 12c - Primary NCDB   11:19:31 up:  14h,   2 ins,    5 sn,   1 us, 2.7G mt,    4% fra,   0 er,                      93.3% db
ID %CPU LOAD %DCU   AAS  ASC  ASI  ASW  ASP  AST  UST MBPS IOPS IORL LOGR PHYR PHYW  %FR   PGA TEMP UTPS UCPS SSRT DCTR DWTR  %DBT
 1   50    2   16   2.2    0    1    2    0    3    3  222  108   1m  25k  23k   25    6  373M    0   12   57  12m   14   85  59.2
 2   31    2    9   1.5    1    2    0    0    2    2  105  104 876u  19k  18k   14    5  385M    0    7   42  10m   13   86  40.8

EVENT (C)                                                         TOTAL WAITS   TIME(s)  AVG_MS  PCT                    WAIT_CLASS
DB CPU                                                                             1636           39                              
log file parallel write                                                 49704       849    17.2   20                    System I/O
control file sequential read                                           344732       687     2.0   16                    System I/O
log file switch (checkpoint incomplete)                                   319       612  1739.7   15                 Configuration
log file sync                                                            6328       404    63.2   10                        Commit

ID   SID     SPID USERNAME  PROGRAM    SRV  SERVICE  PGA  SQLID/BLOCKER OPN  E/T  STA  STE  WAIT_CLASS  EVENT/*LATCH           W/T
 2   249     9949 B/G       LG01       DED  SYS$BAC 1.4M                     14h  ACT  WAI  System I/O  log file parallel wri  49m
 1    11    25634 B/G       DBW0       DED  SYS$BAC  12M                     14h  ACT  WAI  System I/O  db file parallel writ  42m
 2   248     9945 B/G       LG00       DED  SYS$BAC 1.4M                     14h  ACT  WAI  System I/O  log file parallel wri  31m
 1   247    25636 B/G       LGWR       DED  SYS$BAC 1.7M                     14h  ACT  WAI  System I/O  *test excl. non-paren  24m
 1    22     6342 SOE       JDBC Thin  DED  NCDB    3.2M                       0  ACT  WAI  Commit      log file sync          12m
 2   247     9941 B/G       LGWR       DED  SYS$BAC 1.6M                     14h  ACT  WAI  Other       LGWR any worker group  12m
 1   265     6344 SOE       JDBC Thin  DED  NCDB    5.3M  7ws837zynp1zv SEL    0  ACT  I/O  User I/O    direct path read        5m
 1   257     6340 SOE       JDBC Thin  DED  NCDB    5.5M  8zz6y2yzdqjp0 SEL    0  ACT  CPU  User I/O    cpu runqueue            4m
 2    46    11208 SOE       JDBC Thin  DED  NCDB    6.4M  7ws837zynp1zv SEL    0  ACT  I/O  User I/O    direct path read        3m
 2   274    11206 SOE       JDBC Thin  DED  NCDB    5.4M  7ws837zynp1zv SEL    0  ACT  I/O  User I/O    direct path read        2m

Nice little overview :) But I’m digressing…

OSWatcher

There are two ways for TFA to access OSWatcher information:

  1. Using the analyze command to provide a summary view
  2. Invoking OSWatcher directly

The first option provides a nice overview. I’ve been running Swingbench on the system with far too many users, which you can see here:

[oracle@rac12sbnode1 ~]$ tfactl analyze -comp osw -since 1h
INFO: analyzing host: rac12sbnode1

                     Report title: OSW top logs
                Report date range: last ~1 hour(s)
       Report (default) time zone: GMT - Greenwich Mean Time
              Analysis started at: 02-Aug-2016 03:27:14 PM BST
            Elapsed analysis time: 0 second(s).
               Configuration file: /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/ext/tnt/conf/tnt.prop
              Configuration group: osw
                        Parameter: 
              Total osw rec count:            174, from 02-Aug-2016 02:00:16 PM BST to 02-Aug-2016 03:26:54 PM BST
OSW recs matching last ~1 hour(s):            120, from 02-Aug-2016 02:27:19 PM BST to 02-Aug-2016 03:26:54 PM BST
                        statistic: t     first   highest   (time)   lowest   (time)  average  non zero  3rd last  2nd last      last  trend
            top.loadavg.last01min:        3.91      5.48 @02:32PM     2.08 @02:46PM     3.42       103      3.21      2.91      2.64   -32%
            top.loadavg.last05min:        3.10      3.91 @02:32PM     3.05 @02:53PM     3.42       103      3.54      3.43      3.32     7%
            top.loadavg.last15min:        3.13      3.59 @03:22PM     3.13 @02:27PM     3.37       103      3.53      3.49      3.45    10%
                top.tasks.running:           5         9 @02:34PM        1 @02:27PM        3       119         2         1         2   -60%
               top.tasks.sleeping:         316       325 @02:40PM      315 @02:49PM      320       119       320       321       320     1%

INFO: analyzing host: rac12sbnode2

                     Report title: OSW top logs
                Report date range: last ~1 hour(s)
       Report (default) time zone: GMT - Greenwich Mean Time
              Analysis started at: 02-Aug-2016 03:27:15 PM BST
            Elapsed analysis time: 0 second(s).
               Configuration file: /u01/app/12.1.0.2/grid/tfa/rac12sbnode2/tfa_home/ext/tnt/conf/tnt.prop
              Configuration group: osw
                        Parameter: 
              Total osw rec count:            174, from 02-Aug-2016 02:00:14 PM BST to 02-Aug-2016 03:26:52 PM BST
OSW recs matching last ~1 hour(s):            120, from 02-Aug-2016 02:27:16 PM BST to 02-Aug-2016 03:26:52 PM BST
                        statistic: t     first   highest   (time)   lowest   (time)  average  non zero  3rd last  2nd last      last  trend
            top.loadavg.last01min:        2.75      5.73 @02:40PM     2.23 @02:52PM     3.44       111      3.91      4.31      3.88    41%
            top.loadavg.last05min:        2.78      4.16 @02:41PM     2.78 @02:27PM     3.40       111      3.52      3.65      3.67    32%
            top.loadavg.last15min:        2.93      3.60 @03:13PM     2.93 @02:27PM     3.32       111      3.49      3.53      3.55    21%
                top.tasks.running:           2         8 @03:10PM        1 @03:00PM        3       120         2         2         1   -50%

[oracle@rac12sbnode1 ~]$

As you can imagine, the system is somewhat overloaded. The minimum interval to report on seems to be 1 hour:

[oracle@rac12sbnode1 ~]$ tfactl analyze -comp osw -since 5m

ERROR: Invalid value for -since. Supported values are n<h|d>

The analyze command can do a lot more; make sure to have a look at the documentation for the details.
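
One example worth mentioning is the -search option, which greps the trace and alert files across all nodes for a pattern. This is only a sketch; consult tfactl analyze -h for the full syntax on your version:

[oracle@rac12sbnode1 ~]$ tfactl analyze -search "ORA-00600" -since 1d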

But you can run OSWatcher directly as well:

[oracle@rac12sbnode1 ~]$ tfactl oswbb -h

Usage : /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oswbb \
          [<OSWatcher Analyzer Options> | -since n[mhd] ]

Options: 

-since n[mhd] Run OSWatcher analyzer for last n [m]inutes or [h]ours or [d]ays.

<OSWatcher Analyzer Options>: -P <name> -L <name> -6 -7 -8 -B <time> -E <time> -A 
     -P <profile name>  User specified name of the html profile generated
                        by oswbba. This overrides the oswbba automatic naming
                        convention for html profiles. All profiles
                        whether user specified named or auto generated
                        named will be located in the /profile directory.

     -A <analysis name> Same as option A from the menu. Will generate
                        an analysis report in the /analysis directory or
                        user can also specify the name of the analysis file
                        by specifying full qualified path name of file.
                        The "A" option can not be used together with the
                        "S" option.
     -S <>              Will generate an analysis of a subset of the data
                        in the archive directory. This option must be used
                        together with the -b and -e options below. See the
                        section "Specifying the begin/end time of the analysis"
                        above. The "S" option can not be used together with
                        the "A" option.

     -START <filename>  Used with the analysis option to specify the first
                        file located in the oswvmstat directory to analyze.

     -STOP <filename>   Used with the analysis option to specify the last
                        file located in the oswvmstat directory to analyze.

     -b <begin time>    Used with the -S option to specify the begin time
                        of the analysis period. Example format:
                        -b Jan 09 13:00:00 2013

     -e <end time>      Used with the -S option to specify the end time
                        of the analysis period. Example format:
                        -e Jan 09 13:15:00 2013

     -L <location name> User specified location of an existing directory
                        to place any gif files generated
                        by oswbba. This overrides the oswbba automatic
                        convention for placing all gif files in the
                        /gif directory. This directory must pre-exist!
     -6                 Same as option 6 from the menu. Will generate
                        all cpu gif files.


     -7                 Same as option 7 from the menu. Will generate
                        all memory gif files.

     -8                 Same as option 8 from the menu. Will generate
                        all disk gif files.



     -NO_IOSTAT         Ignores files in the oswiostat directory from
                        analysis

     -NO_TOP            Ignores files in the oswtop directory from
                        analysis

     -NO_NETSTAT        Ignores files in the oswnetstat directory from
                        analysis

     -NO_PS             Ignores files in the oswps directory from
                        analysis

     -MEM_ALL           Analyzes virtual and resident memory allocations
                        for all processes. This is very resource intensive.

     -NO_Linux          Ignores files in the oswmeminfo directory from
                        analysis

e.g:
   /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oswbb
   /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oswbb -since 2h

[oracle@rac12sbnode1 ~]$

These options look quite similar to the ones I showed in my previous post about OSWatcher, so I won’t go into detail. Here is an example; note how I can specify the last 10 minutes:

[oracle@rac12sbnode1 ~]$ tfactl oswbb -since 10m

Validating times in the archive...

Warning. The end date you entered is not contained in the archive directory
The end date you entered is:     Tue Aug 02 15:39:06 BST 2016
The last date in the archive is: Tue Aug 02 15:38:55 BST 2016
Defaulting to using the last date in the archive

Scanning file headers for version and platform info...


Parsing file rac12sbnode1_iostat_16.08.02.1500.dat ...


Parsing file rac12sbnode1_vmstat_16.08.02.1500.dat ...




Parsing file rac12sbnode1_top_16.08.02.1500.dat ...


Parsing file rac12sbnode1_ps_16.08.02.1500.dat ...

...

After the analysis has completed, the report is opened in a pager and shown.

This report is best viewed in a fixed font editor like textpad...

OSWatcher Analyzer

Input Archive:       /u01/app/oracle/tfa/repository/suptools/rac12sbnode1/oswbb/oracle/archive
Archive Source Dest: /u01/app/oracle/tfa/repository/suptools/rac12sbnode1/oswbb/oracle/archive
Archive Start Time:  Aug 2 15:28:54 2016
Archive Stop Time:   Aug 2 15:38:55 2016
Hostname:            RAC12SBNODE1
OS Version:          Linux
Snapshot Freq:       30
CPU COUNT:           2

############################################################################
# Contents Of This Report:
#
# Section 1: System Status
# Section 2: System Slowdowns 
#   Section 2.1: System Slowdown RCA Process Level Ordered By Impact
# Section 3: System General Findings
# Section 4: CPU Detailed Findings
#   Section 4.1: CPU Run Queue:
#   Section 4.2: CPU Utilization: Percent Busy
#   Section 4.3: CPU Utilization: Percent Sys
# Section 5: Memory Detailed Findings
#   Section 5.1: Memory: Process Swap Queue 
#   Section 5.2: Memory: Scan Rate 
#   Section 5.3  Memory: Page In: 
#   Section 5.4  Memory: Page Tables (Linux only): 
#   Section 5.5: Top 5 Memory Consuming Processes Beginning
#   Section 5.6: Top 5 Memory Consuming Processes Ending
# Section 6: Disk Detailed Findings
#   Section 6.1: Disk Percent Utilization Findings
#   Section 6.2: Disk Service Times Findings
#   Section 6.3: Disk Wait Queue Times Findings
#   Section 6.4: Disk Throughput Findings
#   Section 6.5: Disk Reads Per Second
#   Section 6.6: Disk Writes Per Second

...

Summary

TFA really is a very useful tool, and this is not only due to the integration of OSWatcher. A lot of useful information that is beyond the scope of this article is available, and the search function is invaluable when trying to hunt down problems in your cluster. Maybe I’ll dedicate another post to that at some later time …

Posted in Linux | Tagged: , | 4 Comments »

Preventing a SPOF with CMAN 12c

Posted by Martin Bach on July 25, 2016

I have already written about the use of Connection Manager 11.2 to govern access to a database. While researching this piece, I updated the original post so that it’s relevant for 12c as well.

Although the idea of using Connection Manager is tempting, the obvious first question is about high availability of the process: you don’t want it to become a Single Point of Failure (SPOF). After all, if the gatekeeper to your database environment fails, you are effectively closed for business. One option would be to add a second CMAN process.

Lab Setup

I have created a few VMs to simulate the scenario. I installed CMAN 12c in two VMs running Oracle Linux 7. The client homes have been patched up to the April PSU because that’s considered good practice (although I couldn’t find a reference to CMAN in the OPatch logs). The hosts are named oracledev and server4.

I have a 4 node RAC 12c database named NCDB to which the CMAN rules provide access. The SCAN is named rac12pri-scan.

And finally I have an Oracle 12c installation on server3 which will act as the client.

CMAN configuration

The CMAN configuration is quite straightforward. Following my earlier post’s instructions I created the following $ORACLE_HOME/network/admin/cman.ora on oracledev:

[oracle@oracledev ~]$ cat $ORACLE_HOME/network/admin/cman.ora
cman1 =
  (configuration=
    (address=(protocol=tcp)(host=oracledev)(port=1521))
    (rule_list=
      (rule=(src=oracledev)(dst=127.0.0.1)(srv=cmon)(act=accept))
      (rule=(src=*)(dst=rac12pri-scan)(srv=NCDB)(act=accept))
    )
  )

Please note that the configuration file has been kept to a bare minimum to keep the discussion simple. You should review the Net Services Reference for a list of available parameters and how to secure the system.
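
To give you an idea, a parameter_list section can be added to the same entry to tune and monitor the CMAN process. The following is only a sketch using a few documented parameters (log_level, connection_statistics, max_gateway_processes); it is not the configuration I used in this test:

cman1 =
  (configuration=
    (address=(protocol=tcp)(host=oracledev)(port=1521))
    (parameter_list=
      (log_level=ADMIN)
      (connection_statistics=yes)
      (max_gateway_processes=8)
    )
    (rule_list=
      (rule=(src=oracledev)(dst=127.0.0.1)(srv=cmon)(act=accept))
      (rule=(src=*)(dst=rac12pri-scan)(srv=NCDB)(act=accept))
    )
  )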

The configuration on host server4 is identical except that the CMAN host has changed. Using that information I can start the CMAN process:

[oracle@server4 ~]$ cmctl startup -c cman1

CMCTL for Linux: Version 12.1.0.2.0 - Production on 06-JUL-2016 13:38:32

Copyright (c) 1996, 2014, Oracle.  All rights reserved.

Current instance cman1 is not yet started
Connecting to (address=(protocol=tcp)(host=server4)(port=1521))
Starting Oracle Connection Manager instance cman1. Please wait...
TNS-04077: WARNING: No password set for the Oracle Connection Manager instance.
CMAN for Linux: Version 12.1.0.2.0 - Production
Status of the Instance
----------------------
Instance name             cman1
Version                   CMAN for Linux: Version 12.1.0.2.0 - Production
Start date                06-JUL-2016 13:38:32
Uptime                    0 days 0 hr. 0 min. 9 sec
Num of gateways started   2
Average Load level        0
Log Level                 SUPPORT
Trace Level               OFF
Instance Config file      /u01/app/oracle/product/12.1.0.2/client_1/network/admin/cman.ora
Instance Log directory    /u01/app/oracle/diag/netcman/server4/cman1/alert
Instance Trace directory  /u01/app/oracle/diag/netcman/server4/cman1/trace
The command completed successfully.
[oracle@server4 ~]$ 

I can start the second one the same way.

Client configuration

So this is all well and good, but how do I configure the client to connect to the database? I would like two things to happen:

  • Connect-time load balancing. In other words, don’t hammer just one of the two CMAN processes with incoming connection requests while the second one is idling around
  • Ensure that the alternative CMAN is tried in case one of the two systems dies

Things are a bit more complicated because I have to use SOURCE_ROUTE to tell the client that a Connection Manager process is involved. With a single CMAN process, for instance, I would use the following entry in tnsnames.ora:

NCDB_1CMAN =
 (description=
   (address_list=
     (source_route=yes)
     (address=(protocol=tcp)(host=server4)(port=1521))
     (address=(protocol=tcp)(host=rac12pri-scan)(port=1521))
   )
   (connect_data=
    (service_name=NCDB)
   )
 )

Relying on the Connection Manager process on “server4”, I can connect through to the NCDB database on my RAC system.

[oracle@server3 tns]$ sqlplus martin@NCDB_1CMAN

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 6 13:44:46 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Last Successful login time: Wed Jul 06 2016 13:44:34 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> select count(*) from v$active_instances;

  COUNT(*)
----------
         4

But what happens if server4 is not available? Well, you guessed it:

[oracle@server3 tns]$ sqlplus martin@NCDB_1CMAN

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 6 13:52:51 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
ERROR:
ORA-12541: TNS:no listener

Enter user-name: 

This is the single point of failure I would like to prevent.

Using Two CMAN processes in the tnsnames.ora file

After a bit of digging around on My Oracle Support I gave up and used my search engine skills instead, which proved more effective. The Oracle Database Net Services Reference has the correct example in chapter 6; this is what it looks like after I applied it to my environment:

NCDB_2CMANS =
  (description =
    (load_balance = on)
    (failover = on)
    (address_list =                   # first CMAN on server4
      (source_route = yes)
      (address = (protocol = tcp)(host = server4)(port = 1521))
      (address=(protocol=tcp)(host=rac12pri-scan)(port=1521))
    )
    (address_list =                   # second CMAN on oracledev
      (source_route = yes)
      (address = (protocol = tcp)(host = oracledev)(port = 1521))
      (address=(protocol=tcp)(host=rac12pri-scan)(port=1521))
    )
    (connect_data =
      (service_name = NCDB)
      (server = dedicated)
    )
  )

You read this as follows:

  • The service name entry starts with the DESCRIPTION tag.
  • On the top level LOAD_BALANCE and FAILOVER are enabled. This allows client-side load balancing plus failover to the next CMAN entry.
  • Each CMAN host has its own ADDRESS_LIST. Essentially it’s the NCDB_1CMAN entry with the SOURCE_ROUTE, CMAN host and SCAN.
  • There is nothing different in the CONNECT_DATA section compared with the previous, single-CMAN entry.
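
As a quick sanity check, tnsping should resolve the alias and get an answer from one of the CMAN listeners. Bear in mind that tnsping only proves a listener responds; it does not test the complete source-routed path through to the database:

[oracle@server3 tns]$ tnsping NCDB_2CMANS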

The Result

As a result, I can still connect to the database. There is no CMAN process running on server4, but there is one on oracledev:

[oracle@server4 ~]$ cmctl show status -c cman1

CMCTL for Linux: Version 12.1.0.2.0 - Production on 06-JUL-2016 14:17:16

Copyright (c) 1996, 2014, Oracle.  All rights reserved.

Current instance cman1 is not yet started
Connecting to (address=(protocol=tcp)(host=server4)(port=1521))
TNS-04011: Oracle Connection Manager instance not yet started.
[oracle@server4 ~]$ 

[oracle@oracledev ~]$ cmctl show status -c cman1

CMCTL for Linux: Version 12.1.0.2.0 - Production on 06-JUL-2016 14:17:45

Copyright (c) 1996, 2014, Oracle.  All rights reserved.

TNS-04077: WARNING: No password set for the Oracle Connection Manager instance.
Current instance cman1 is already started
Connecting to (address=(protocol=tcp)(host=oracledev)(port=1521))
Status of the Instance
----------------------
Instance name             cman1
Version                   CMAN for Linux: Version 12.1.0.2.0 - Production
Start date                06-JUL-2016 13:18:19
Uptime                    0 days 0 hr. 59 min. 26 sec
Num of gateways started   2
Average Load level        0
Log Level                 SUPPORT
Trace Level               OFF
Instance Config file      /u01/app/oracle/product/12.1.0.2/client_1/network/admin/cman.ora
Instance Log directory    /u01/app/oracle/diag/netcman/oracledev/cman1/alert
Instance Trace directory  /u01/app/oracle/diag/netcman/oracledev/cman1/trace
The command completed successfully.
[oracle@oracledev ~]$ 

A connection can be established:

[oracle@server3 tns]$ sqlplus martin@NCDB_2CMANS

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 6 14:18:13 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Last Successful login time: Wed Jul 06 2016 13:44:49 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> select count(*) from v$active_instances;

  COUNT(*)
----------
         4

SQL>

Using an approach like this allows me to connect to the database even if one of the two CMAN systems is down. But what about load balancing? This appears to work as well: tailing the cman1.log file on both hosts, I can see that the 10 sessions I requested in a bash for-loop (a sketch of which follows the log excerpts) have been spread evenly across both CMAN hosts:

-- server4

06-JUL-2016 13:21:49 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19796)) * establish * NCDB * 0
06-JUL-2016 13:21:49 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19798)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19800)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19801)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19802)) * establish * NCDB * 0

-- oracledev

06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22637)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22639)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22643)) * establish * NCDB * 0
06-JUL-2016 13:21:51 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22644)) * establish * NCDB * 0
06-JUL-2016 13:21:51 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22645)) * establish * NCDB * 0

So both boxes ticked.
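
For reference, the loop itself was nothing fancy; something along these lines reproduces the test. This is only a sketch: the MARTIN_PWD environment variable holding the password and the query are made up for illustration, and any statement that opens a fresh connection will do:

for i in $(seq 1 10); do
    # every iteration opens a new connection, so client-side
    # load balancing gets a chance to pick a different CMAN host
    echo "select host_name from v\$instance;" |
        sqlplus -s martin/"${MARTIN_PWD}"@NCDB_2CMANS
done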

Posted in 12c Release 1, Linux | Tagged: , | Leave a Comment »