Martins Blog

Trying to explain complex things in simple terms

Archive for the ‘12c Release 1’ Category

Creating a RAC 12.1 Data Guard Physical Standby environment (2)

Posted by Martin Bach on January 13, 2017

In the first part of this mini-series you saw me define the environment and create the primary database. With that out of the way it’s time to think about the standby. Before the standby can be created, a few preparations are necessary on both the primary and the standby cluster.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Preparing the Creation of the Standby Database

It’s easier to follow the discussion if you keep in mind that my primary database is NCDBA and the standby is NCDBB.

Step 1: Listener Configuration

The first step to be completed is the network setup. I am planning on using active database duplication, which requires the node listeners on the standby cluster to “know” about the auxiliary instance. Thankfully since RAC 12.1 we don’t need to register the *_DGMGRL service (used for switchover operations by the Broker) anymore, making life a bit easier. The static registration of databases may not be necessary if you are using a different RMAN duplicate approach.

By default Oracle does not statically register databases, so it’s up to you to do so. On rac12sec1 I added a SID_LIST_LISTENER section (note: only the node listener!) to /u01/app/12.1.0.2/grid/network/admin/listener.ora. Remember that the node listeners reside in the GRID_HOME.

...
SID_LIST_LISTENER =
  (SID_LIST =
...
    (SID_DESC =
      (GLOBAL_DBNAME = NCDBB)
      (ORACLE_HOME = /u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME = NCDBB1)
    )
...
  )
...

Likewise, I added a static registration to rac12sec2, identical to the previous example except for the SID_NAME, which is NCDBB2 of course. Afterwards I reloaded the listeners on both nodes, as sketched below.
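
For completeness, this is roughly what the corresponding entry on rac12sec2 looks like, together with the reload command. This is a sketch based on the first node’s configuration; the grid user in the prompt is an assumption, use whichever account owns your Grid Infrastructure:

...
    (SID_DESC =
      (GLOBAL_DBNAME = NCDBB)
      (ORACLE_HOME = /u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME = NCDBB2)
    )
...

[grid@rac12sec2 ~]$ lsnrctl reload listener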

Step 2: Service Name Configuration

The next file to change is the RDBMS home’s tnsnames.ora. To make sure I cover all eventualities I created the following tnsnames.ora on all cluster nodes, primary and standby RAC:

[oracle@rac12pri1 ~]$ cat $ORACLE_HOME/network/admin/tnsnames.ora 
# tnsnames.ora Network Configuration File: /u01/app/oracle/product/12.1.0.2/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

NCDBA =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = NCDBA)
    )
  )

NCDBB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sec-scan)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NCDBB)
    )
  )

# only used for RMAN duplicate ... from active database
NCDBA_DUP =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NCDBA)
    )
  )

NCDBB_DUP =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sec1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sec2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NCDBB)
    )
  )

The first two entries are self-explanatory. The second pair requires a little more explanation. Since I have statically registered my standby database with the node listeners only (i.e. not with the SCAN listeners – I never tried that and don’t like the thought of it), the duplicate would fail trying to restart the auxiliary instance. I haven’t captured the output of the failed RMAN duplicate, but it’s possible to show you the effect of using the “wrong” tnsname with a simple test (the database I want to connect to is of course down):

[oracle@rac12pri1 ~]$ sqlplus sys@NCDBB as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 14:36:49 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor


Enter user-name: 
[oracle@rac12pri1 ~]$ sqlplus sys@NCDBB_DUP as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 14:36:59 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Connected to an idle instance.

SQL> 

So it seems to be better to use the net service name where I can be sure the database can be controlled from a remote session :)

NB: this test requires a password file for the standby database, which doesn’t exist yet and will be created in part 3.

Step 3: Modify the oratab file

You should also change the oratab file and add NCDBB on both standby hosts, as shown here for rac12sec2.

[oracle@rac12sec2 ~]$ grep NCDBB /etc/oratab
NCDBB:/u01/app/oracle/product/12.1.0.2/dbhome_1:N

Step 4: Prepare the Primary Database: Archivelog mode, Force Logging, and maybe even Flashback Database

There are a number of tasks to be completed on the primary database before I can start duplicating it for a standby environment. I need to make sure it’s in Archivelog mode, and that force logging is enabled. If I can afford the overhead, I enable Flashback Database as well to help reinstate a failed primary should the need arise. Of course, you don’t just enable Flashback Database: it can have a negative impact on performance, and the Flashback logs use space. Your FRA space usage calculation can be derailed if you are not careful. Thorough testing is needed to assess whether it is ok to operate with Flashback Database enabled.
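
Should your testing show that you can afford it, enabling Flashback Database is quick. A minimal sketch, assuming the FRA-related parameters are already set and sized appropriately:

SQL> alter database flashback on;

Database altered.

SQL> select estimated_flashback_size/1024/1024 as estimated_mb
  2    from v$flashback_database_log;

The estimate reported by v$flashback_database_log is a useful input for the FRA space calculation mentioned above.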

The first task in my scenario is to enable Archivelog mode and force logging.

[oracle@rac12pri1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 09:52:11 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> archive log list
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     15
Current log sequence           16
SQL> exit

...

[oracle@rac12pri1 ~]$ srvctl stop database -d NCDBA
[oracle@rac12pri1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 10:00:34 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size                  2929552 bytes
Variable Size             838863984 bytes
Database Buffers         2365587456 bytes
Redo Buffers               13844480 bytes
Database mounted.
SQL> alter database archivelog;

Database altered.

SQL> alter database force logging;

Database altered.

SQL>  alter database open;

Database altered.

SQL> exit

It is easy to check if that was a success:

[oracle@rac12pri1 ~]$ srvctl start instance -db NCDBA -node rac12pri2

[oracle@rac12pri1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 15:04:28 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     15
Next log sequence to archive   16
Current log sequence           16

Step 5: Add Standby Redo Logs

In order to use Real Time Apply you need Standby Redo Logs (SRLs). If you create them now (e.g. before the RMAN duplicate) you save time twice. First of all you don’t need to create SRLs post switchover, when NCDBA operates as a standby database. Secondly, if there are SRLs in place on the primary, they are automatically created on the standby database as part of the RMAN duplicate.

You need n+1 SRLs per thread. You can work the number out by looking at v$log. My tiny database has 2 redo log groups per thread (with 2 members each) – ok for the lab, almost certainly not ok for anything serious. It helps me keep the storage footprint in the lab low though.

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME     CON_ID
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ----------
         1          1         15   52428800        512          2 YES INACTIVE               1621242 14-DEC-16      1621618 14-DEC-16          0
         2          1         16   52428800        512          2 NO  CURRENT                1621618 14-DEC-16   2.8147E+14                    0
         3          2          1   52428800        512          2 YES INACTIVE               1620677 14-DEC-16      1628674 14-DEC-16          0
         4          2          2   52428800        512          2 NO  CURRENT                1629454 14-DEC-16   2.8147E+14 14-DEC-16          0

SQL> 
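
If you prefer to let SQL do the n+1 arithmetic, something along these lines works as well:

SQL> select thread#, count(*) + 1 as srls_needed
  2    from v$log
  3   group by thread#;

   THREAD# SRLS_NEEDED
---------- -----------
         1           3
         2           3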

With that information I can create 3 SRLs per thread. Did I mention that I love Oracle Managed Files?

SQL> alter database add standby logfile thread 1 size 52428800;

Database altered.

SQL> /

Database altered.

SQL> /

Database altered.

SQL> alter database add standby logfile thread 2 size 52428800;

Database altered.

SQL> /

Database altered.

SQL> /

Database altered.

SQL> select group#,thread#,sequence#,bytes,status from v$standby_log;

    GROUP#    THREAD#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ---------- ----------
         5          1          0   52428800 UNASSIGNED
         6          1          0   52428800 UNASSIGNED
         7          1          0   52428800 UNASSIGNED
         8          2          0   52428800 UNASSIGNED
         9          2          0   52428800 UNASSIGNED
        10          2          0   52428800 UNASSIGNED

6 rows selected.

The primary database should be in relatively good shape now – Flashback Database is off to simulate the case where my fictional load is considered to impact performance too much.

SQL> select name,db_unique_name,database_role,log_mode,force_logging,flashback_on from v$database;

NAME      DB_UNIQUE_NAME  DATABASE_ROLE    LOG_MODE     FORCE_LOGGING  FLASHBACK_ON
--------- --------------- ---------------- ------------ -------------- ------------------
NCDBA     NCDBA           PRIMARY          ARCHIVELOG   YES            NO

SQL>

Summary

This concludes part 2. In this blog post I shared my notes about configuring the network for RMAN duplicate … from active database. I also configured the primary database in preparation for the initial duplicate command.

Note that I’m not setting any initialisation parameters related to Data Guard: over time I have become a great fan of the Data Guard Broker and plan on using that instead of a manual configuration.
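
To give you an idea of where this is heading, the Broker setup in a later part boils down to something like the following sketch. These are not the final commands, and the configuration name is made up:

DGMGRL> create configuration ncdb as primary database is 'NCDBA' connect identifier is NCDBA;
DGMGRL> add database 'NCDBB' as connect identifier is NCDBB maintained as physical;
DGMGRL> enable configuration;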

Posted in 12c Release 1, Data Guard, Linux | 4 Comments »

Creating a RAC 12.1 Data Guard Physical Standby environment (1)

Posted by Martin Bach on December 14, 2016

I have just realised that the number of posts about RAC 12c Release 1 on this blog is rather too small. And since I’m a great fan of RAC this has to change :) In this mini-series I am going to share my notes about creating a Data Guard setup on my 2 node 12.1.0.2.161018 RAC primary + identical 2 node RAC standby system in the lab.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

The lab Environment

My environment consists of the following entities:

  • Primary Site
    • Oracle Linux 7.2 – UEK3 x86_64
    • Two RAC nodes: rac12pri1 and rac12pri2
    • Grid Infrastructure 12.1.0.2.161018 in /u01/app/12.1.0.2/grid
    • Single RDBMS home 12.1.0.2.161018 in /u01/app/oracle/product/12.1.0.2/dbhome_1
    • Non-container database NCDBA is administrator managed and running on both nodes
  • Standby Site
    • Oracle Linux 7.2 – UEK3 x86_64
    • Two RAC nodes: rac12sec1 and rac12sec2
    • Grid Infrastructure 12.1.0.2.161018 in /u01/app/12.1.0.2/grid
    • Single RDBMS home 12.1.0.2.161018 in /u01/app/oracle/product/12.1.0.2/dbhome_1
    • Standby database NCDBB is administrator managed and running on both nodes

I won’t focus on the creation of the RAC systems; I may have covered some of it in earlier blog posts and of course in the RAC Book.

I have deliberately kept it simple. Although most systems in real life use a dedicated (set of) NICs for Data Guard traffic, I decided against it – I didn’t want attention being drawn away from the Big Picture. Similarly I am not touching on the option of creating a second SCAN, which Oracle allows from 12.1 onwards. If you are interested in these topics kindly refer to my other blog posts.

Creation of the Primary Database

After both RAC systems are set up it’s time to start with the creation of the primary database. This is easy:

dbca -silent -createDatabase -templateName RACDB.dbc \
-gdbName NCDBA -sysPassword ... -systemPassword ... -storageType ASM \
-diskGroupName DATA -recoveryGroupName RECO -sampleSchema true \
-totalMemory 4096 -dbsnmpPassword ... -nodeinfo rac12pri1,rac12pri2

The template referenced in “-templateName” is my own – I always create templates to be license compliant. I covered how to create your custom database template on this blog as well.
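
In case you haven’t created such a template yet, dbca can generate one from an existing database. A sketch along these lines should work – the option names are from memory, so check dbca -help on your system:

[oracle@rac12pri1 ~]$ dbca -silent -createTemplateFromDB \
-sourceDB NCDBA -templateName RACDB.dbc \
-sysDBAUserName sys -sysDBAPassword ...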

I won’t go into detail here about the naming of my databases in a Data Guard configuration. What I learned the hard way was not to use a DB_UNIQUE_NAME that reflects the role. Imagine everyone’s surprise when they connect to a database named STDBY operating in the primary role after a switchover… For lack of better ideas I went ahead and enumerated the databases: my primary database is NCDBA and the standby is NCDBB.

After the database is created, it is started automatically by DBCA.

[oracle@rac12pri1 ~]$ srvctl status database -db NCDBA
Instance NCDBA1 is running on node rac12pri1
Instance NCDBA2 is running on node rac12pri2
[oracle@rac12pri1 ~]$

However, the newly created database isn’t patched (this is a known issue documented on Mike Dietrich’s blog for example).

SQL> select name from v$database;

NAME
---------
NCDBA

SQL> select count(*) from dba_registry_sqlpatch;

  COUNT(*)
----------
         0

No way around it – time to call datapatch:

SQL> alter system set cluster_database=false scope=spfile sid='*';

System altered.

SQL> exit

...

[oracle@rac12pri1 ~]$ srvctl stop database -db NCDBA
[oracle@rac12pri1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 13:39:04 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size                  2929552 bytes
Variable Size             771755120 bytes
Database Buffers         2432696320 bytes
Redo Buffers               13844480 bytes
Database mounted.
Database opened.
SQL>  exit

...

[oracle@rac12pri1 ~]$ cd $ORACLE_HOME/OPatch/
[oracle@rac12pri1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Wed Dec 14 13:08:51 2016
Copyright (c) 2016, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_16313_2016_12_14_13_08_51/sqlpatch_invocation.log

Connecting to database...OK
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 24315824 (Database PSU 12.1.0.2.161018, Oracle JavaVM Component (OCT2016)):
  Installed in the binary registry only
Bundle series DBBP:
  ID 161018 in the binary registry and not installed in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  The following patches will be applied:
    24315824 (Database PSU 12.1.0.2.161018, Oracle JavaVM Component (OCT2016))
    24340679 (DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679))

Installing patches...
Patch installation complete.  Total patches installed: 2

Validating logfiles...
Patch 24315824 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24315824/20676808/24315824_apply_NCDBA_2016Dec14_13_09_26.log (no errors)
Patch 24340679 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24340679/20713212/24340679_apply_NCDBA_2016Dec14_13_09_30.log (no errors)
SQL Patching tool complete on Wed Dec 14 13:15:08 2016
[oracle@rac12pri1 OPatch]$
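
One step not shown in my notes: since datapatch required cluster_database to be set to false, the change has to be reverted before handing the database back to Clusterware, along these lines:

SQL> alter system set cluster_database=true scope=spfile sid='*';

System altered.

SQL> shutdown immediate
...

[oracle@rac12pri1 ~]$ srvctl start database -db NCDBA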

This concludes part 1 – the database is now set up and running on the primary cluster. In the next part I’m going to describe how to prepare the primary and standby cluster for the Data Guard setup.

Posted in 12c Release 1, Data Guard, Linux, RAC | 1 Comment »

GTT and Smart Scan – the importance of the append hint

Posted by Martin Bach on December 5, 2016

While writing the previous two posts about GTTs and Smart Scan, I stumbled across an interesting observation. When I started putting my thoughts to (virtual) paper, I thought it was going to be a short post. Well, it wasn’t to be, again. Anyway, you might find this interesting.

Initial Situation

If you read the previous posts this code example I used to populate the GTT might look familiar:

insert /*+ append */ into gtt select * from t4 where rownum < 400000;
 
commit;

In situations like this where I’m moving data I developed a habit of using the append hint. I guess I’ve been working on HCC tables a bit too much and therefore didn’t even make a conscious decision to use that hint. It was the right thing to do, as you will see next.

The full test harness is shown here again for reference:

SQL> !cat test.sql
set lines 120 tab off trimspool on verify off timing on

-- this statement is using the /*+ append */ hint
insert /*+ append using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

SQL>

When executing the script and looking at the output I didn’t see anything out of the ordinary. When checking for the “depth” of the Smart Scan, I regularly consult the cell blocks processed by % layer statistics to see if I lose blocks between code layers. It wouldn’t appear so:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell IO uncompressed bytes                                             546,136,064
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
...
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               317
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    330,153,984
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1

66,667 blocks in each code layer. This looks ok to me: the number of all physical reads recorded for this statement is 66,669. I certainly don’t want to endorse tuning by ratio, but in this case it sort-of makes sense: the offload efficiency is pretty high in my case:

SQL> @fsx.sql
Enter value for sql_text:
Enter value for sql_id: a4jrkbnsfgk1j

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
a4jrkbnsfgk1j      0 2363333961      1        .01      0 Yes         100.00 select /* using_append */ count(*), id from gtt where id in (        7

That should suffice for a recap of the previous posts.

No append hint-different outcome

Now imagine my surprise when I didn’t execute a script but rather typed the insert command in an interactive session. In the next example, created to reproduce the observation, I “forgot” to specify the append hint:

SQL> !cat test2.sql
-- not using the append hint
insert /* not_using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* not_using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

The difference between test.sql and test2.sql looks harmless enough – just a hint, or lack thereof. However, the result is interesting. Most executions of the above script resulted in Oracle reporting the following statistics:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    calls to get snapshot scn: kcmgss                                              462
STAT    calls to kcmgcs                                                                 28
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks processed by cache layer                                        67,026
STAT    cell blocks processed by data layer                                              1
STAT    cell blocks processed by txn layer                                               1
STAT    cell commit cache queries                                                   67,025
STAT    cell flash cache read hits                                                     633
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    545,000,688
STAT    cell physical IO interconnect bytes returned by smart scan             544,984,304
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           66,503
STAT    cleanouts only - consistent read gets                                       66,503
STAT    commit txn count during cleanout                                            66,503
...
STAT    consistent gets                                                            133,250
STAT    consistent gets direct                                                      66,504
STAT    consistent gets examination                                                 66,506
STAT    consistent gets examination (fastpath)                                      66,506
...
STAT    immediate (CR) block cleanout applications                                  66,503
...
STAT    physical read IO requests                                                    1,044
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               633
STAT    physical read total IO requests                                              1,044
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       748
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504
...

Huh? I was clearly surprised to see so many blocks not making it to the higher layers (txn and data) after having been read by the cache layer. Further executions of the select statement within the same session did not necessarily improve this significantly, although I noticed savings by storage indexes coming into play.

Speaking more scientifically, the listing immediately above shows that out of 67,026 blocks that were read by the cache layer only 1 (!) was processed by txn and data layers. As a direct result Smart Scan does not offer any real benefit: cell physical IO interconnect bytes returned by smart scan is actually ≥ cell physical IO bytes eligible for predicate offload.

Why? What?

There were a few red herrings in the statistic counters I saw:

  • Cleanout statistics have been reported by mystats
    • commit txn count during cleanout
    • cleanouts only – consistent read gets
    • immediate (CR) block cleanout applications
    • etc.
  • There are consistent gets examination statistics that might be peeks at undo information
  • Lots of queries to the commit cache (cell commit cache queries) – but for most of my tests I didn’t see hits. Only 1 out of maybe 20 tests produced some cell transactions found in commit cache, and even then their number was not significant. I will provide an example towards the end of this post.

Further investigation

So this leads me to suspect that the blocks that were created with the append hint are “clean”, i.e. suitable for a consistent read on the cells without further need for processing. It would appear so, judging by this block dump (dump the block from the tempfile, not the datafile!):

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aaeb9f seq: 0x01 flg: 0x0c tail: 0xeb9f0601
frmt: 0x02 chkval: 0xd11e type: 0x06=trans data
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x632c00  csc: 0x00.64aaeb9e  itc: 3  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.019.0012beca  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f6250501074
===============
tsiz: 0x1f88
hsiz: 0x1e
pbl: 0x7f6250501074
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x746
avsp=0x728
tosp=0x728
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0x1b7d
0x14:pri[1]     offs=0x1772
0x16:pri[2]     offs=0x1367
0x18:pri[3]     offs=0xf5c
0x1a:pri[4]     offs=0xb51
0x1c:pri[5]     offs=0x746
block_row_dump:
tab 0, row 0, @0x1b7d
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
col  0: [ 2]  c1 50
col  1: [999]
 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...

This is in contrast to the blocks I dumped that have been created without the append hint. To make sure I’m comparing apples with apples these blocks have been created in a different session after having disconnected from the previous one:

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aab6bf seq: 0x00 flg: 0x0c tail: 0xb6bf0600
frmt: 0x02 chkval: 0x478e type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x629300  csc: 0x00.64aab6bf  itc: 2  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.01b.0012be5d  0x00c04427.d97f.27  ----    6  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f53b75e125c
===============
tsiz: 0x1fa0
hsiz: 0x1e
pbl: 0x7f53b75e125c
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x752
avsp=0x734
tosp=0x734
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0xf6c
0x14:pri[1]     offs=0x1379
0x16:pri[2]     offs=0x1786
0x18:pri[3]     offs=0x1b93
0x1a:pri[4]     offs=0x752
0x1c:pri[5]     offs=0xb5f
block_row_dump:
tab 0, row 0, @0xf6c
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
col  0: [ 4]  c3 0f 05 50
col  1: [999]
 31 34 30 34 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...

This supports the theory that the blocks I created without the append hint feature some information that prevents them from being passed from the cache to the txn layer. I didn’t see active txn count during cleanout though, but commit txn count during cleanout and immediate (CR) block cleanout applications instead. So cleanout is definitely needed and performed.
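
For reference, since the GTT resides in TEMP the dumps above have to be taken from the tempfile, not a datafile. I believe the tempfile variant accepts the same block min/max clause as its datafile counterpart; the block numbers obviously depend on where your segment lives:

SQL> alter system dump tempfile 1 block min 2305025 block max 2305025;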

I stated in the last bullet point that commit cache queries were not producing cache hits most of the time. I managed to get an execution of the select statement against the blocks created without the append hint where at least some commit cache queries produced hits:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks helped by commit cache                                           6,981
STAT    cell blocks processed by cache layer                                        66,969
STAT    cell blocks processed by data layer                                          6,982
STAT    cell blocks processed by txn layer                                           6,982
STAT    cell commit cache queries                                                   66,968
STAT    cell flash cache read hits                                                     576
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    487,801,976
STAT    cell physical IO interconnect bytes returned by smart scan             487,785,592
STAT    cell scans                                                                       1
STAT    cell transactions found in commit cache                                      6,981
STAT    cleanout - number of ktugct calls                                           59,522
STAT    cleanouts only - consistent read gets                                       59,522
STAT    commit txn count during cleanout                                            59,522
...
STAT    physical read IO requests                                                      987
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               576
STAT    physical read total IO requests                                                987
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       723
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504

I couldn’t reproduce this at will though, and it wasn’t for lack of trying.

Summary

I am not entirely sure what causes the observed behaviour, especially with regards to the commit cache and/or minscn optimisation, but it evidently seems better to populate a GTT with the append hint if you want to make best use of Smart Scans against such segments. A very quick and dirty test against a heap table suggests that this is true for that segment type as well.

Posted in 12c Release 1, Exadata | Leave a Comment »

Smart Scanning GTTs – what about Flash Cache?

Posted by Martin Bach on November 29, 2016

Continuing the example of the previous blog post (is it possible to Smart Scan GTTs?), I am investigating if Smart Scans against GTTs can benefit from Flash Cache, or “optimised reads”. The easiest way to see this in your session is to calculate the change in session counters based on v$mystat or v$sesstat. Other tools might also provide information about the amount of data read from Flash Cache, but there are quite often extra licenses required …

Calculating the delta in session counters can be done in many ways, for example using Tanel’s session snapper. For clearly defined test cases where I can control beginning and end of the execution I prefer to use another great script for the purpose. My thanks go to Adrian Billington for providing the Oracle community with the most useful “mystats” package which I’m using here. You can – and should – get it from oracle-developer.net.
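
If you want to do it by hand, the manual equivalent is to sample the counter of interest before and after the test query and subtract the two values, for example:

SQL> select n.name, s.value
  2    from v$mystat s, v$statname n
  3   where n.statistic# = s.statistic#
  4     and n.name = 'cell flash cache read hits';

mystats does exactly that for every session statistic, which is far more convenient.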

Let’s check if I can get flash cache read hits on GTT scans – these would help the performance of repeated queries against the segment. And since GTTs reside in TEMP, I had a few doubts whether flash cache read hits were possible at all.

What better than to test?

Testing helps remove ambiguity (and documenting the test result helps me remember things!), so here’s the code used:

set lines 120 tab off trimspool on verify off timing on

-- it will be crucially important to use the /*+ append */ hint 
-- as you will see in the next post about GTTs on Exadata
insert /*+ append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- hoping to trigger flash cache population here
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;

-- and measuring effect, if any

@mystats start s=s

set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off

@mystats stop t=1

Have a look at the code in mystats.sql to see what the options mean. Essentially I’m asking it to record session statistics only (start s=s) and after the execution list only those counters that have changed (t=1 where t is for threshold). I hacked the script a little to order by statistic name rather than the default.

The observation

It appears I can get Flash Cache hits. The relevant statistic name is “cell flash cache read hits”, and the fact that physical IO was optimised can also be seen in “physical read requests optimized” and “physical read total bytes optimized” amongst others. Here is the relevant output from mystats:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
STAT    cell flash cache read hits                                                     315
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
STAT    cell scans                                                                       1
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               315
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    329,252,864
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...

I usually check the cell blocks processed by % layer statistics first to get an idea about the “depth” of the Smart Scan – losing blocks in the transaction and data layers wouldn’t be great. There’s been 1 Smart Scan against my segment (the GTT isn’t partitioned) recorded in “cell scans”, so that covers that aspect as well.

Out of the 523 “physical read IO requests”, 315 were optimised – a pretty good ratio with approximately 60% of the IO requests served from Flash Cache. The other statistics represent the same ratio in bytes rather than I/O requests.

Posted in 12c Release 1, Exadata | Leave a Comment »

Little things worth knowing: Can I Smart Scan a Global Temporary Table?

Posted by Martin Bach on November 25, 2016

A question that has come up while preparing to deliver another Exadata Optimisation Class is whether you can offload queries against Global Temporary Tables (GTT) or not. There are subtle differences between heap tables and GTTs such as how statistics are handled and where they are stored that merit a little investigation.

I have used the lab environment on the X3-2 (Exadata 12.1.2.3.0, 12.1.0.2.160419 RDBMS) to test if I can Smart Scan GTTs. There are some enhancements in 12c regarding GTTs that are mentioned on the Optimiser blog here:

https://blogs.oracle.com/optimizer/entry/gtts_and_upgrading_to_oracle

The Setup

I have used a brand new, 12c non-CDB for these tests. The SGA_TARGET is deliberately set to something really small so as to provoke Smart Scans without having to resort to setting _serial_direct_read to “always” in my session.

First of all, I needed a GTT for this purpose. I have a few scripts that create large tables for testing, and T4 will serve as my basis. Here is my setup:

SQL> create global temporary table gtt on commit preserve rows 
  2  as select * from t4 where 1 = 0;

Table created.

SQL> alter session enable parallel dml;

Session altered.

SQL> insert /*+ parallel(4) */ into gtt
  2  select /*+ parallel(4) */ * from t4
  3    where rownum < 400000;

399999 rows created.

SQL> commit

Commit complete.

This might be the time to point out that the GTT has two types of statistics in 12c – shared and session. Again, please refer to the Optimiser blog post for more information about private session statistics. I first looked at dba_tables for num_rows etc but found the column empty. Using a query I found on oracle-base.com in Tim’s “Session-Private Statistics for Global Temporary Tables in Oracle Database 12c Release 1” article I detected some statistics in the end:

SQL> select table_name, object_type, num_rows, scope
  2  from user_tab_statistics where table_name = 'GTT';

TABLE_NAME                     OBJECT_TYPE    NUM_ROWS SCOPE
------------------------------ ------------ ---------- -------
GTT                            TABLE                   SHARED
GTT                            TABLE            399999 SESSION

A global optimiser preference causes session-level statistics to be gathered by default:

SQL> SELECT DBMS_STATS.get_prefs('GLOBAL_TEMP_TABLE_STATS') FROM dual;

DBMS_STATS.GET_PREFS('GLOBAL_TEMP_TABLE_STATS')
--------------------------------------------------
SESSION
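
If you prefer shared statistics, for example to mimic pre-12c behaviour, the preference can be changed globally (or per table with set_table_prefs):

SQL> exec DBMS_STATS.SET_GLOBAL_PREFS('GLOBAL_TEMP_TABLE_STATS', 'SHARED')

PL/SQL procedure successfully completed.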

The Testcase

With the table set up it is now time to test the Smart Scan. In addition to a SQL trace I request an NSMTIO trace to understand Oracle’s direct path read decision better.

SQL> alter session set events 'sql_trace level 8';

Session altered.

SQL> alter session set events 'trace[NSMTIO] disk=highest';

Session altered.

SQL> select count(*), id from gtt where id in (9244048,2529293) group by id;

  COUNT(*)         ID
---------- ----------
         1    2529293
         1    9244048

SQL> exit

Looking at the trace file I can see the following output for this particular statement:

=====================
PARSING IN CURSOR #140137295360240 len=70 dep=0 uid=78 oct=3 lid=78 tim=1228663420381 hv=2599214644 ad='1398f2ab0' sqlid='9p960vqdftrjn'
select count(*), id from gtt where id in (9244048,2529293) group by id
END OF STMT
PARSE #140137295360240:c=2000,e=1954,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2363333961,tim=1228663420380
EXEC #140137295360240:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2363333961,tim=1228663420505
WAIT #140137295360240: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=163899 tim=1228663420582
NSMTIO: kcbism: islarge 1 next 0 nblks 67197 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67197 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67197 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 3, objd: 6468608, objn: 163899
ckpt: 1, nblks: 67197, ntcache: 3, ntdist:3
Direct Path for pdb 0 tsn 3  objd 6468608 objn 163899
Direct Path 1 ckpt 1, nblks 67197 ntcache 3 ntdist 3
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 101
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 6468608, Object_Size = 67197 blocks
SqlId = 9p960vqdftrjn, plan_hash_value = 2363333961, Partition# = 0
WAIT #140137295360240: nam='reliable message' ela= 1053 channel context=5884629416 channel handle=5884574224 broadcast message=5885746648 obj#=163899 tim=1228663421984
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 122 name|mode=1380909062 2=65585 0=1 obj#=163899 tim=1228663422173
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 96 name|mode=1380909057 2=65585 0=2 obj#=163899 tim=1228663422323
WAIT #140137295360240: nam='cell single block physical read' ela= 308 cellhash#=2133459483 diskhash#=2964949887 bytes=8192 obj#=163899 tim=1228663423031
WAIT #140137295360240: nam='cell smart table scan' ela= 153 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663423651
WAIT #140137295360240: nam='cell smart table scan' ela= 152 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663424114
WAIT #140137295360240: nam='cell smart table scan' ela= 145 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663424571
WAIT #140137295360240: nam='gc cr grant 2-way' ela= 135 p1=201 p2=2273408 p3=7 obj#=163899 tim=1228663425857
WAIT #140137295360240: nam='cell single block physical read' ela= 257 cellhash#=379339958 diskhash#=2689692736 bytes=8192 obj#=163899 tim=1228663426158
WAIT #140137295360240: nam='cell single block physical read' ela= 224 cellhash#=2133459483 diskhash#=3132070477 bytes=8192 obj#=163899 tim=1228663426466
WAIT #140137295360240: nam='cell smart table scan' ela= 251 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663426863
WAIT #140137295360240: nam='cell smart table scan' ela= 268 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427213
WAIT #140137295360240: nam='cell smart table scan' ela= 225 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663427514
WAIT #140137295360240: nam='cell smart table scan' ela= 85 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663427720
WAIT #140137295360240: nam='cell smart table scan' ela= 13 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427755
...

There is a simple conclusion: yes, there is a Smart Scan. I can’t see a massive difference between the NSMTIO traces for a GTT and a heap table. Since direct path read decisions are statistics driven (check the parameter _direct_read_decision_statistics_driven) and there weren’t shared statistics on the GTT, I assume the single block reads are related to reading object statistics. OBJ# 163899 is the object_id of the GTT.
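
In case you want to check that parameter on your own system: hidden parameters are not exposed in v$parameter, but the usual query against the x$ tables (connected as SYS) does the trick:

SQL> select i.ksppinm name, v.ksppstvl value
  2    from x$ksppi i, x$ksppcv v
  3   where i.indx = v.indx
  4     and i.ksppinm = '_direct_read_decision_statistics_driven';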

What is interesting is the use of “enq: RO – fast object reuse” prior to the start of the Smart Scan – I wasn’t expecting that at all. Here’s the same trace combination for a heap table for comparison:

=====================
PARSING IN CURSOR #140120190960712 len=91 dep=0 uid=78 oct=3 lid=78 tim=1226766583470 hv=2817388421 ad='13483d3f0' sqlid='g8948rkmyvvw5'
select /* heaptest001 */ count(*), id from heapt1 where id in (6460386, 6460390) group by id
END OF STMT
PARSE #140120190960712:c=1000,e=1925,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3353166567,tim=1226766583470
EXEC #140120190960712:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3353166567,tim=1226766583599
WAIT #140120190960712: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=163896 tim=1226766583653
NSMTIO: kcbism: islarge 1 next 0 nblks 67024 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67024 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67024 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 9, objd: 163896, objn: 163896
ckpt: 1, nblks: 67024, ntcache: 482, ntdist:482
Direct Path for pdb 0 tsn 9  objd 163896 objn 163896
Direct Path 1 ckpt 1, nblks 67024 ntcache 482 ntdist 482
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 109
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 163896, Object_Size = 67024 blocks
SqlId = g8948rkmyvvw5, plan_hash_value = 3353166567, Partition# = 0
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 351 name|mode=1263468550 2=65585 0=2 obj#=163896 tim=1226766584359
WAIT #140120190960712: nam='reliable message' ela= 1372 channel context=5884629416 channel handle=5252488952 broadcast message=5885771824 obj#=163896 tim=1226766585825
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 88 name|mode=1263468550 2=65585 0=1 obj#=163896 tim=1226766585972
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 90 name|mode=1263468545 2=65585 0=2 obj#=163896 tim=1226766586140
WAIT #140120190960712: nam='cell smart table scan' ela= 201 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766587377
WAIT #140120190960712: nam='cell smart table scan' ela= 137 cellhash#=379339958 p2=0 p3=0 obj#=163896 tim=1226766589330
WAIT #140120190960712: nam='cell smart table scan' ela= 127 cellhash#=2133459483 p2=0 p3=0 obj#=163896 tim=1226766593585
WAIT #140120190960712: nam='cell smart table scan' ela= 215 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766602986

This is the more familiar picture: reliable message, enq: KO – fast object checkpoint followed by cell smart table scan.

The Verdict

Quite simple this time: yes, you can use Smart Scans on GTTs. Additionally there are a few caveats around the use of GTTs when you are migrating your database to 12.1, but you can read about them in the Optimiser blog post linked above.

Posted in 12c Release 1, Exadata | 1 Comment »

Building an RPM for the Oracle database on Oracle Linux 7

Posted by Martin Bach on October 28, 2016

Thinking about automation a lot, especially in the context of cloud computing, I have decided to create a small series of related posts that hopefully help someone deploying environments in a fully automated way. As my colleague @fritshoogland has said many times: the only way to deploy database servers (or any other server for that matter) in a consistent way is to do it via software. No matter how hard you try to set up 10 identical systems manually, there will be some, maybe ever so subtle, differences between them. And with the cloud you probably have 42 systems or more to deal with! In this context, my first post could be a building block: the provisioning of the Oracle RDBMS in the form of an RPM.

The idea

In a nutshell, I would like to

  • Create a “golden image” of my database installation
  • Make this available for deployment via NFS
  • Create an RPM that allows me to deploy the “golden image” on the server I want to install and call clone.pl to complete the process
  • Execute the root scripts (orainstRoot.sh, root.sh)

This is actually quite easy to do! The oracle-validated RPM, and nowadays the oracle-rdbms-server-12cR1-preinstall.x86_64 package, simplify the task greatly as they can pull in dependent packages, set sensible default parameters and generally perform a lot of the tedious setup work for you. All I have to do in my RPM is to enforce that the system runs Oracle Linux 7 and pull the preinstall RPM in if it is not yet installed. Since there is no 12c RDBMS server for i686 anymore, this entire post is implicitly about Linux x86-64.

The Implementation

I am using a development system (linuxdev in this example) to build the RPM. It’s an Oracle Linux 7.2 system with all upgrades up to 24/10/2016. The first thing to do is to install the RPM dev tools and their dependencies. As always, you use yum to do so.

[root@linuxdev ~]# yum info rpmdevtools
Loaded plugins: ulninfo
Available Packages
Name        : rpmdevtools
Arch        : noarch
Version     : 8.3
Release     : 5.0.1.el7
Size        : 96 k
Repo        : ol7_latest/x86_64
Summary     : RPM Development Tools
URL         : https://fedorahosted.org/rpmdevtools/
Licence     : GPLv2+ and GPLv2
Description : This package contains scripts and (X)Emacs support files to aid in
            : development of RPM packages.
            : rpmdev-setuptree    Create RPM build tree within user's home directory
            : rpmdev-diff         Diff contents of two archives
            : rpmdev-newspec      Creates new .spec from template
            : rpmdev-rmdevelrpms  Find (and optionally remove) "development" RPMs
            : rpmdev-checksig     Check package signatures using alternate RPM keyring
            : rpminfo             Print information about executables and libraries
            : rpmdev-md5/sha*     Display checksums of all files in an archive file
            : rpmdev-vercmp       RPM version comparison checker
            : spectool            Expand and download sources and patches in specfiles
            : rpmdev-wipetree     Erase all files within dirs created by rpmdev-setuptree
            : rpmdev-extract      Extract various archives, "tar xvf" style
            : rpmdev-bumpspec     Bump revision in specfile
            : ...and many more.

[root@linuxdev ~]# yum install rpmdevtools

...

The development tools contain a lot of useful scripts; the one I consider essential is rpmdev-setuptree, which sets up the relevant directory structure. Granted, the rpmdev tools are geared towards compiling software from source, optionally applying patches and then creating SRPMs and RPMs. RPM wasn’t designed for the purpose of packaging an Oracle database, but it can still be made to work. You can work around the size restriction using the %post hook in the SPEC file, as you will see later in this post. But I’m getting ahead of myself.

First I’m creating the RPM tree structure:

[oracle@linuxdev ~]$ rpmdev-setuptree
[oracle@linuxdev ~]$ ls -l rpmbuild/
total 0
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 BUILD
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 RPMS
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 SOURCES
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 SPECS
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 SRPMS
[oracle@linuxdev ~]$

Using another tool named rpmdev-newspec I create a new, empty SPEC file:

[oracle@linuxdev SPECS]$ rpmdev-newspec oracle_12102_jul16_psu_ojvm.spec
oracle_12102_jul16_psu_ojvm.spec created; type minimal, rpm version >= 4.11.
[oracle@linuxdev SPECS]$ ls -lrt
total 4
-rw-r--r-- 1 oracle oinstall 338 Oct 24 10:53 oracle_12102_jul16_psu_ojvm.spec
[oracle@linuxdev SPECS]$

Completing the SPEC file is easy if you know how :) To save you the pain I put a stub together that is far from perfect but should give you an idea. This is just a demo; the code notably lacks all (error, space, etc.) checking, which really should be in there.

[oracle@linuxdev SPECS]$ cat oracle_12102_jul16_psu_ojvm.spec
Name:           oracle_12102_jul16_psu_ojvm
Version:        0
Release:        1%{?dist}
Summary:        a spec file for Oracle 12.1.0.2 with patch 23615289 (combo for 23054246 and 23177536)
Group:          Application/Databases
License:        Commercial
URL:            https://martincarstenbach.wordpress.com/
Requires:       oraclelinux-release >= 7:7.0, oracle-rdbms-server-12cR1-preinstall

%description
Oracle 12.1.0.2 EE patched with the July 2016 PSU and OJVM patch
Requires Oracle Linux 7, depends on the preinstall RPM to set up users, groups and defaults

ONLY A PROTOTYPE, USED FOR DEMONSTRATION PURPOSES ONLY

%files
%defattr(-,root,root,-)

%post
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1

if [ -d "$ORACLE_HOME" ]; then
        /usr/bin/echo "ORACLE_HOME directory $ORACLE_HOME exists-aborting";
        exit 127
fi

/usr/bin/echo "cloning the oracle home now to $ORACLE_HOME"
/usr/bin/mkdir -p $ORACLE_HOME

/usr/bin/umount /mnt
/usr/bin/mount -t nfs linuxdev:/u01/nfs_export /mnt

/usr/bin/tar --gzip -xf /mnt/oracle/12.1.0.2/oracle_12102_jul16_psu_ojvm.tar.gz -C $ORACLE_HOME

/usr/bin/chown -R oracle:oinstall /u01

/usr/bin/su - oracle -c "cd $ORACLE_HOME/clone/bin && ./clone.pl ORACLE_HOME=$ORACLE_HOME ORACLE_BASE=/u01/app/oracle -defaultHomeName"

/u01/app/oraInventory/orainstRoot.sh
$ORACLE_HOME/root.sh

umount /mnt

%changelog
* Mon Oct 24 2016 Martin Bach 
- initial version

Again, at the risk of repeating myself, this is just a stub and needs to be extended to include error handling and other things that we consider good coding practice. You should also check whether there is sufficient free space for $ORACLE_HOME, etc… you get the idea. Also, if you create another SPEC file for a second 12c installation you currently can’t install it into a new Oracle home, as the path is hard coded to dbhome_1.
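
As a minimal example of the kind of safeguard that belongs in %post, a free-space check could look like the sketch below. The 10 GB threshold is an arbitrary value I picked for illustration, not a recommendation; size it to your gold image.

# minimal sketch of a free-space check, to run right after the
# "mkdir -p $ORACLE_HOME" call in %post; 10 GB is an example value only
REQUIRED_KB=$(( 10 * 1024 * 1024 ))
AVAILABLE_KB=$(/usr/bin/df -Pk "$ORACLE_HOME" | /usr/bin/awk 'NR==2 {print $4}')
if [ "$AVAILABLE_KB" -lt "$REQUIRED_KB" ]; then
        /usr/bin/echo "not enough free space for $ORACLE_HOME - aborting"
        exit 1
fi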

The script – which is the part in the %post section – is so simple it should be self-explanatory. It will be passed to /bin/sh and executed as part of the installation process.

With the SPEC file in place you can try to build the RPM. Funnily enough, the rpmbuild tool is used for that:

[oracle@linuxdev SPECS]$ rpmbuild -ba oracle_12102_jul16_psu_ojvm.spec
Processing files: oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64
Provides: oracle_12102_jul16_psu_ojvm = 0-1.el7 oracle_12102_jul16_psu_ojvm(x86-64) = 0-1.el7
Requires(interp): /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1
Requires(post): /bin/sh
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/oracle/rpmbuild/BUILDROOT/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64
Wrote: /home/oracle/rpmbuild/SRPMS/oracle_12102_jul16_psu_ojvm-0-1.el7.src.rpm
Wrote: /home/oracle/rpmbuild/RPMS/x86_64/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.4xSozZ
+ umask 022
+ cd /home/oracle/rpmbuild/BUILD
+ /usr/bin/rm -rf /home/oracle/rpmbuild/BUILDROOT/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64
+ exit 0

This command has created a Source RPM, and the RPM file to be used, as you can see here:

[oracle@linuxdev SPECS]$ ls -l ../RPMS/x86_64/
total 4
-rw-r--r-- 1 oracle oinstall 2832 Oct 26 16:24 oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
[oracle@linuxdev SPECS]$ rpm -qpil  --requires ../RPMS/x86_64/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
Name        : oracle_12102_jul16_psu_ojvm
Version     : 0
Release     : 1.el7
Architecture: x86_64
Install Date: (not installed)
Group       : Application/Databases
Size        : 0
License     : Commercial
Signature   : (none)
Source RPM  : oracle_12102_jul16_psu_ojvm-0-1.el7.src.rpm
Build Date  : Wed 26 Oct 2016 16:24:02 BST
Build Host  : linuxdev.example.com
Relocations : (not relocatable)
URL         : https://martincarstenbach.wordpress.com/
Summary     : a spec file for Oracle 12.1.0.2 with patch 23615289 (combo for 23054246 and 23177536)
Description :
Oracle 12.1.0.2 EE patched with the July 2016 PSU and OJVM patch
Requires Oracle Linux 7, depends on the preinstall RPM to set up users, groups and defaults

ONLY A PROTOTYPE, USED FOR DEMONSTRATION PURPOSES ONLY
oraclelinux-release >= 7:7.0
oracle-rdbms-server-12cR1-preinstall
/bin/sh
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadIsXz) <= 5.2-1
(contains no files)
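
You can also verify that the %post scriptlet made it into the package: rpm prints embedded scriptlets when asked. The output below is abbreviated; it should match the %post section shown earlier.

[oracle@linuxdev SPECS]$ rpm -qp --scripts ../RPMS/x86_64/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
postinstall scriptlet (using /bin/sh):
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
...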

You require a “gold image” binary copy for cloning as well. I am assuming it is in place and NFS-accessible in the location specified by the %post scriptlet.

You can refer to Frits Hoogland’s post for more detail about creating the gold image: https://fritshoogland.wordpress.com/2010/07/03/cloning-your-oracle-database-software-installation/. There are probably lots of other posts out there describing the same procedure.

Installation

Here is an example install session, executed as root on a new box I created: server9.

[root@server9 ~]# rpm -ihv /tmp/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm

Preparing...                          ################################# [100%]
Updating / installing...
   1:oracle_12102_jul16_psu_ojvm-0-1.e################################# [100%]
cloning the oracle home now to /u01/app/oracle/product/12.1.0.2/dbhome_1
./runInstaller -clone -waitForCompletion  "ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1" 
"ORACLE_BASE=/u01/app/oracle" -defaultHomeName  -silent -paramFile 
/u01/app/oracle/product/12.1.0.2/dbhome_1/clone/clone_oraparam.ini
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 500 MB.   Actual 9204 MB    Passed
Checking swap space: must be greater than 500 MB.   Actual 767 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2016-10-26_05-03-05PM. 
Please wait ...You can find the log of this install session at:
 /u01/app/oraInventory/logs/cloneActions2016-10-26_05-03-05PM.log
..................................................   5% Done.
..................................................   10% Done.
..................................................   15% Done.
..................................................   20% Done.
..................................................   25% Done.
..................................................   30% Done.
..................................................   35% Done.
..................................................   40% Done.
..................................................   45% Done.
..................................................   50% Done.
..................................................   55% Done.
..................................................   60% Done.
..................................................   65% Done.
..................................................   70% Done.
..................................................   75% Done.
..................................................   80% Done.
..................................................   85% Done.
..........
Copy files in progress.

Copy files successful.

Link binaries in progress.

Link binaries successful.

Setup files in progress.

Setup files successful.

Setup Inventory in progress.

Setup Inventory successful.

Finish Setup successful.
The cloning of OraHome1 was successful.
Please check '/u01/app/oraInventory/logs/cloneActions2016-10-26_05-03-05PM.log' for more details.

Setup Oracle Base in progress.

Setup Oracle Base successful.
..................................................   95% Done.

As a root user, execute the following script(s):
        1. /u01/app/oraInventory/orainstRoot.sh
        2. /u01/app/oracle/product/12.1.0.2/dbhome_1/root.sh



..................................................   100% Done.
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
Check /u01/app/oracle/product/12.1.0.2/dbhome_1/install/root_server9_2016-10-26_17-04-05.log 
for the output of root script

When the package is installed, RPM spawns a shell (/bin/sh) for the “post” script and executes it from a file stored in /var/tmp. In my case these are the contents, pretty much exactly the %post section of the SPEC file:

[root@server9 ~]# cat /var/tmp/rpm-tmp.tZ4pYf
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1

if [ -d "$ORACLE_HOME" ]; then
        /usr/bin/echo "ORACLE_HOME directory $ORACLE_HOME exists-aborting";
        exit 127
fi

/usr/bin/echo "cloning the oracle home now to $ORACLE_HOME"
/usr/bin/mkdir -p $ORACLE_HOME

/usr/bin/umount /mnt
/usr/bin/mount -t nfs linuxdev:/u01/nfs_export /mnt

/usr/bin/tar --gzip -xf /mnt/oracle/12.1.0.2/oracle_12102_jul16_psu_ojvm.tar.gz -C $ORACLE_HOME

/usr/bin/chown -R oracle:oinstall /u01

/usr/bin/su - oracle -c "cd $ORACLE_HOME/clone/bin && ./clone.pl ORACLE_HOME=$ORACLE_HOME ORACLE_BASE=/u01/app/oracle -defaultHomeName"

/u01/app/oraInventory/orainstRoot.sh
$ORACLE_HOME/root.sh

umount /mnt
[root@server9 ~]#

And that’s it! One of the slightly more complex tasks was to ensure that the package can only be installed on Oracle Linux 7. One way to do this is to add the oraclelinux-release package as a dependency. I had to fiddle around with it a bit as it uses the epoch field (which seems rare) and not just the version. After I worked it out I managed to enforce the release. On my Oracle Linux 6 system the RPM could not be installed:

[root@linux6 ~]# rpm -ihv /tmp/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
error: Failed dependencies:
	oraclelinux-release >= 7:7.0 is needed by oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64

If you are now thinking of using the epoch:version to check for the oraclelinux-release on Oracle Linux 6, don’t. This is the current oraclelinux-release RPM taken from the “latest” channel in public-yum:

[oracle@linux6 ~]$ rpm -qpi --provides ./oraclelinux-release-6Server-8.0.3.x86_64.rpm
Name        : oraclelinux-release          Relocations: (not relocatable)
Version     : 6Server                           Vendor: Oracle America
Release     : 8.0.3                         Build Date: Wed May 11 11:51:57 2016
Install Date: (not installed)               Build Host: x86-ol6-builder-05.us.oracle.com
Group       : System Environment/Base       Source RPM: oraclelinux-release-6Server-8.0.3.src.rpm
Size        : 49976                            License: GPL
Signature   : RSA/8, Wed May 11 11:44:59 2016, Key ID 72f97b74ec551f03
Summary     : Oracle Linux 6 release file
Description :
System release and information files
config(oraclelinux-release) = 6:6Server-8.0.3
oraclelinux-release = 6:6Server-8.0.3
oraclelinux-release(x86-64) = 6:6Server-8.0.3
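
A handy way to inspect epoch, version, and release in one go is rpm’s query format. Applied to the package file just shown, it returns the full epoch-qualified version string:

[oracle@linux6 ~]$ rpm -qp --qf '%{epoch}:%{version}-%{release}\n' ./oraclelinux-release-6Server-8.0.3.x86_64.rpm
6:6Server-8.0.3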

So if you wanted to allow the package to install on Oracle Linux 6 as well, you would probably replace “Requires: oraclelinux-release >= 7:7.0” with “Requires: oraclelinux-release >= 6:6Server-2.0” for 12c (Oracle Linux 6 update 2 or newer, according to the Linux installation guide).

Posted in 12c Release 1, Linux | 5 Comments »

Little things worth knowing: when a transient ASM disk failure cannot be fixed in time

Posted by Martin Bach on October 7, 2016

In the previous blog post I used libvirt and KVM in my lab environment to simulate a transient disk failure and how to recover from it. This post takes the example a step further: I am simulating another disk failure, but this time I won’t pretend I can fix the issue and put the disk back. In other words, I simulate the effect of the disk_repair_time hitting zero.

Most of what I am covering here is an extension of the previous post. I’ll mention the main details here for your benefit, but would like to invite you to refer to the previous post for more background.

The idea is to show you the output in the ASM alert.log as well as the effect of the lost disk on the V$ views.

As with the previous post, the code examples in this one are for demonstration purposes only!

The setup in a nutshell

I am using Oracle Linux 7.2 with UEK4 as the host system; KVM and libvirt are responsible for the guests (virtual machines). The guest VM used for this example is named asmtest, and uses Oracle Linux 7.2 as well. There are 9 ASM disks – 8 for +DATA featuring normal redundancy in 2 failure groups. I added +RECO for the sake of completeness with external redundancy. This post is about +DATA. To keep it simple I used an Oracle Restart configuration patched to the July 2016 PSU.

Removing the disk

As with the previous example I am using libvirt to remove a “LUN” temporarily from the guest. And sure enough, the VM picks this up. This is the relevant output obtained via journalctl -f:

Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Add. Sense: Logical unit not supported
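
For reference, this is the call I used on the host; it is the same detach-disk command as in the previous post, where the device backing the guest’s sda is known to the host as sdh:

[root@host ~]# virsh detach-disk asmtest sdh --live
Disk detached successfully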

In the previous post, the one I keep referring to, I copied the part of the ASM instance’s alert.log that showed how the disk repair timer was ticking down. This time I am simulating the case where – for whatever reason – the transient failure could not be fixed. In that case, this is what you would see in the alert.log:

2016-09-29 15:38:21.752000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1
2016-09-29 15:41:25.379000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (416) secs on ASM inst 1
2016-09-29 15:44:29.012000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (232) secs on ASM inst 1
2016-09-29 15:47:32.643000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (48) secs on ASM inst 1
2016-09-29 15:50:36.259000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (0) secs on ASM inst 1

The last line in the above output is definitely NOT what you want to see. If at all possible, you should fix the problem causing DATA_0001 (in this example) to be dropped. If you do see the message about the disk being dropped, you are facing the inevitable rebalance operation. Here is the continued output from the ASM alert.log:

WARNING: PST-initiated drop of 1 disk(s) in group 1(.4232658126))
SQL> alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 40 for pid 23, osid 3216
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 41 for pid 16, osid 2701
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
SUCCESS: alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
SUCCESS: PST-initiated drop disk in group 1(4232658126))
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xfc493cce (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=20, OS id=3655 
NOTE: assigning ARB0 to group 1/0xfc493cce (DATA) with 1 parallel I/O
2016-09-29 15:50:40.257000 +01:00
NOTE: restored redundancy of control and online logs in DATA
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: restored redundancy of control and online logs in DATA
2016-09-29 15:50:51.655000 +01:00
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 42 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:54.647000 +01:00
GMON updating for reconfiguration, group 1 at 43 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:57.571000 +01:00
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 44 for pid 16, osid 2701
GMON querying group 1 at 45 for pid 16, osid 2701
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
2016-09-29 15:51:03.589000 +01:00
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0xfc493cce (DATA)

The last line indicates that the rebalance operation is complete. My disk group was almost empty, the rebalance operation took almost no time. But indeed, when you check V$ASM_DISK, the disk is gone:

SQL> r
  1  select name,path,disk_number,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME       PATH            DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP  REPAIR_TIMER
---------- --------------- ----------- ------- ------------ ------- -------- ---------- ------------
DATA_0000  /dev/asm-disk02           0 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0002  /dev/asm-disk03           2 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0003  /dev/asm-disk06           3 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0004  /dev/asm-disk07           4 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0005  /dev/asm-disk08           5 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0006  /dev/asm-disk04           6 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0007  /dev/asm-disk05           7 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0

7 rows selected.

SQL> select count(*), failgroup from v$asm_disk where name like 'DATA%' group by failgroup;

  COUNT(*) FAILGROUP
---------- ----------
         3 BATCH1
         4 BATCH2

My system is fully operational, and the rebalance did not run into any space problems. Space problems are the last thing you want when rebalancing. I did some research on this subject earlier and documented it in a blog post.
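
If you want to check your headroom before (or after) ASM drops a disk, V$ASM_DISKGROUP is the place to look. USABLE_FILE_MB is particularly interesting for normal and high redundancy disk groups, since it accounts for the mirroring overhead and the space needed to restore redundancy after a disk failure:

SQL> select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb
  2  from v$asm_diskgroup where name = 'DATA';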

In a future blog post I am going to investigate what happens when a partner disk of DATA_0001 fails while the repair timer is ticking down.

Posted in 12c Release 1, Automatic Storage Management, KVM, Linux | Leave a Comment »

Little things worth knowing: transient ASM disk failures in 12c

Posted by Martin Bach on September 30, 2016

For quite some time we have been treated nicely by ASM when it comes to transient disk failures. Since 11.1 (if memory serves me right), transient failures won’t cause an ASM disk to be dropped immediately. This is good, because it can potentially save a lot of time! When a disk is dropped from an ASM disk group, a rebalance operation is unavoidable. And there is more than a chance of another one following it, because ultimately you want to add the disk back to the disk group. Well, to be fair, this is only applicable to normal or high redundancy disk groups.

The feature I just described very briefly is referred to as ASM Fast Mirror Resync, and documented in the ASM Administrator’s Guide in section Administering Oracle ASM Disk Groups.
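
The grace period ASM grants you to fix the problem is governed by the disk group attribute disk_repair_time, which defaults to 3.6 hours. In my lab I lowered it to 10 minutes, which is why you will see a 600 second countdown later in this post. The statement to change it looks along these lines:

SQL> alter diskgroup data set attribute 'disk_repair_time' = '10m';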

By the way, there is another new feature you might want to look at, called FAILGROUP_REPAIR_TIME, but I’ll keep that for another post.

In this post I’d like to demonstrate how easy it is to simulate a transient failure of an ASM disk using libvirt and KVM on a host running Oracle Linux 7.2. I also wanted to have the output from the O/S and ASM alert.log as personal reference.

As usual, this procedure is strictly for YOUR HOME LAB USE ONLY and for educational purposes.

The setup

When creating the VM – named “asmtest” – I created a boot volume, and a 50 GB volume for the Oracle binaries. To keep it manageable I went for an Oracle 12.1 Restart configuration, patched to the July PSU. There are 9 ASM disks, presented as SCSI devices (not virtio); I’ll explain why in a bit. The guest O/S is Oracle Linux 7.2, as is my lab server’s. All virtual disks on the VM are partitioned, although that is not necessary (strictly speaking).

Using the lsscsi tool, I created udev rules following the excellent description I found on oracle-base.com.

[oracle@asmtest ~]$ cat /etc/udev/rules.d/61-asm.rules 

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-0", SYMLINK+="asm-disk01", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-1", SYMLINK+="asm-disk02", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-2", SYMLINK+="asm-disk03", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-3", SYMLINK+="asm-disk04", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-0", SYMLINK+="asm-disk05", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-1", SYMLINK+="asm-disk06", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-2", SYMLINK+="asm-disk07", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-3", SYMLINK+="asm-disk08", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi3-0-0-0", SYMLINK+="asm-disk09", OWNER="oracle", GROUP="asmdba", MODE="0660"

This is slightly more complicated than my usual setup, but it was needed for device name persistence. Unfortunately libvirt does not report the same block device names as my guest VM: whilst the host “sees” sd[h-v], my guest reported these devices as sd[a-i]. Persistent device names to the rescue! The only issue I had initially was that disks presented to the VM via virtio don’t report the SCSI IDs I need for my udev rules; in other words, lsscsi returned no data. Presenting the block devices via the SCSI interface did the trick.
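
In case you want to reproduce this: the values in the RESULT clause of the udev rules are simply what scsi_id reports for a given device. Queried for the first disk on my VM, the output looks like this:

[root@asmtest ~]# /usr/lib/udev/scsi_id -g -u -d /dev/sda
0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-0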

The hypothesis

This snippet from the documentation (referenced above) is exactly what I wanted to test:

After you repair the disk, run the SQL statement ALTER DISKGROUP ONLINE DISK. This statement brings a repaired disk group back online to enable writes so that no new writes are missed. This statement also starts a procedure to copy all of the extents that are marked as stale on their redundant copies.
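
Applied to my disk group this translates into one of the following two statements; the second form simply onlines every offline disk in the disk group, and it is the one I am going to use further down:

SQL> alter diskgroup data online disk DATA_0001;
SQL> alter diskgroup data online all;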

Here is the current configuration I am using for my DATA disk group, as seen by ASM:

SQL> select name,state,type,offline_disks from v$asm_diskgroup
  2  where name = 'DATA';

NAME                           STATE       TYPE   OFFLINE_DISKS
------------------------------ ----------- ------ -------------
DATA                           MOUNTED     NORMAL             0



SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number;

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP        REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- ---------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0

You can see 2 failure groups, batch1 and batch2, and all disks are available/online. The operating system mapping works perfectly thanks to my udev rules:

[oracle@asmtest ~]$ ls -l /dev/asm-disk0*
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk01 -> sda1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk02 -> sdd1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk03 -> sdc1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk04 -> sdb1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk05 -> sde1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk06 -> sdh1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk07 -> sdg1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk08 -> sdf1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk09 -> sdi1

Let’s detach a disk and cause a transient failure

Libvirt is really cool: it allows me to modify my VM configuration at runtime. One example is removing disks, using the detach-disk command:

[root@host ~]# virsh help detach-disk
  NAME
    detach-disk - detach disk device

  SYNOPSIS
    detach-disk <domain> <target> [--persistent] [--config] [--live] [--current]

  DESCRIPTION
    Detach disk device.

  OPTIONS
    [--domain] <string>  domain name, id or uuid
    [--target] <string>  target of disk device
    --persistent     make live change persistent
    --config         affect next boot
    --live           affect running domain
    --current        affect current domain

But which disk to remove? The disks attached to a VM can be listed using the domblklist command:

[root@host ~]# virsh domblklist asmtest
Target     Source
------------------------------------------------
vda        /path/to/slow/disk/asmtest
vdb        /path/to/slow/disk/asmtest-1
sdh        /path/to/fast/disk/asmtest-asm01.img
sdi        /path/to/fast/disk/asmtest-asm02.img
sdj        /path/to/fast/disk/asmtest-asm03.img
sdk        /path/to/fast/disk/asmtest-asm04.img
sdo        /path/to/fast/disk/asmtest-asm05.img
sdp        /path/to/fast/disk/asmtest-asm06.img
sdq        /path/to/fast/disk/asmtest-asm07.img
sdr        /path/to/fast/disk/asmtest-asm08.img
sdv        /path/to/fast/disk/asmtest-asm09.img

I mentioned previously that the information provided by the host does not necessarily match that of the guest. Since I have persistent device names in my VM, though, there shouldn’t be an issue. Let’s take a disk away, but only temporarily! Since my O/S and Oracle binaries reside on disks presented to the guest as virtio devices, I know they are named vd[a-z] and better left alone. That leaves sdh as the first disk to remove.

[root@host ~]# virsh detach-disk asmtest sdh --live
Disk detached successfully

And the disk is gone.

And what is happening?

The disk removal has an immediate effect on the ASM instance. Switching over to my VM I can see the following information recorded in the ASM alert.log:

2016-09-29 13:45:13.936000 +01:00
WARNING: Write Failed. group:1 disk:1 AU:1 offset:1044480 size:4096
path:/dev/asm-disk01
 incarnation:0xe969ccd4 asynchronous result:'I/O error'
 subsys:System krq:0x7fda4f0b8210 bufp:0x7fda4f20b000 osderr1:0x69b5 osderr2:0x0
 IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: Hbeat write to PST disk 1.3916025044 in group 1 failed. [2]
NOTE: process _b000_+asm (4346) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 13 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: checking PST for grp 1 done.
NOTE: sending set offline flag message (844741558) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x7f is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 15 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: cache closing disk 1 of grp 1: DATA_0001
NOTE: PST update grp = 1 completed successfully
2016-09-29 13:46:53.300000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1

I also had a journalctl -f running in a different session. Unsurprisingly the O/S also noticed the disk disappearing:

Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Add. Sense: Logical unit not supported

A transient failure

In yet another session I can see that the disk is about to be dropped: the repair timer has started ticking and is decremented as time passes. This did not happen straight away; there was a little delay of maybe a couple of minutes after the disk was removed.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   600
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   416
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

Another day I would like to run an additional test to see what happens when the repair_timer is actually down to 0, but for now let’s add the disk back in. On the host, I use libvirt again for this:

[root@host ~]# virsh attach-disk asmtest /path/to/fast/disk/asmtest-asm01.img sdh --live
Disk attached successfully

As soon as the prompt returns, I can see the guest O/S picking the disk up.

Sep 29 13:54:09 asmtest kernel: scsi 2:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    1.5. PQ: 0 ANSI: 5
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Mode Sense: 63 00 00 08
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 29 13:54:09 asmtest kernel:  sda: sda1
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Attached SCSI disk

The documentation states that I should now be able to online the disk again, which I tried. The result is shown below:

SQL> alter diskgroup data online all;

Diskgroup altered.

I am showing you the complete ASM alert.log output here; feel free to skip to the bottom:

2016-09-29 13:54:44.774000 +01:00
SQL> alter diskgroup data online all
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: initiating resync of disk group 1 disks
DATA_0001 (1)
NOTE: process _user3750_+asm (3750) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: sending set offline flag message (4172832432) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x1 is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 17 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 18 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance membership refresh for group=1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x11, op = assign
GMON updating disk modes for group 1 at 19 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance disk validation for group=1
NOTE: disk validation pending for 1 disk in group 1/0xceb93c35 (DATA)
NOTE: Found /dev/asm-disk01 for disk DATA_0001
NOTE: completed disk validation for 1/0xceb93c35 (DATA)
NOTE: discarding redo for group 1 disk 1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x19, op = assign
GMON updating disk modes for group 1 at 20 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 21 for pid 16, osid 2535
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/asm-disk01
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x5d, op = assign
GMON updating disk modes for group 1 at 22 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7d, op = assign
GMON updating disk modes for group 1 at 23 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
SUCCESS: alter diskgroup data online all
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xceb93c35 (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=23, OS id=4986
NOTE: assigning ARB0 to group 1/0xceb93c35 (DATA) with 1 parallel I/O
2016-09-29 13:54:48.265000 +01:00
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7f, op = assign
GMON updating disk modes for group 1 at 24 for pid 23, osid 4986
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: reset timers for disk: 1
NOTE: completed online of disk group 1 disks
DATA_0001 (1)
2016-09-29 13:54:49.580000 +01:00
NOTE: stopping process ARB0
NOTE: requesting all-instance membership refresh for group=1
SUCCESS: rebalance completed for group 1/0xceb93c35 (DATA)
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 25 for pid 16, osid 2535
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
2016-09-29 13:54:51.286000 +01:00
NOTE: Attempting voting file refresh on diskgroup DATA

This looks promising, but what about V$ASM_DISK?

SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number
  3  /

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> 

Hurray, I have my disk back! And since only the stale extents had to be resynchronised, the operation completed almost instantly, without the data movement of a full rebalance. Thanks ASM!

Posted in 12c Release 1, KVM, Linux | 1 Comment »

Preventing a SPOF with CMAN 12c

Posted by Martin Bach on July 25, 2016

I have already written about the use of Connection Manager 11.2 to govern access to a database. While researching this piece I have updated the original post so that it’s relevant for 12c as well.

Although the idea of using Connection Manager is tempting, the obvious first question is about high availability of the process: you don’t want it to become a Single Point of Failure (SPOF). After all, if the gatekeeper to your database environment fails, you are effectively closed for business. One option is to add a second CMAN process the clients can fail over to.

Lab Setup

I have created a few VMs to simulate the scenario. I installed CMAN 12c in two VMs running Oracle Linux 7. The client homes have been patched up to the April PSU because that’s considered good practice (although I couldn’t find a reference to CMAN in the OPatch logs). The hosts are named oracledev and server4.

I have a 4 node RAC 12c database named NCDB to which the CMAN rules provide access. The SCAN is named rac12pri-scan.

And finally I have an Oracle 12c installation on server3 which will act as the client.

CMAN configuration

The CMAN configuration is quite straightforward. Following my earlier post’s instructions I created the following $ORACLE_HOME/network/admin/cman.ora on oracledev:

[oracle@oracledev ~]$ cat $ORACLE_HOME/network/admin/cman.ora
cman1 =
  (configuration=
    (address=(protocol=tcp)(host=oracledev)(port=1521))
    (rule_list=
      (rule=(src=oracledev)(dst=127.0.0.1)(srv=cmon)(act=accept))
      (rule=(src=*)(dst=rac12pri-scan)(srv=NCDB)(act=accept))
    )
  )

Please note that the configuration file has been kept to its bare minimum to keep the discussion simple. You should review the Net Services Reference for a list of available parameters and for how to secure the system.
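
To illustrate the rule syntax a little further: once a rule_list is present, connections that do not match any rule are refused, but you can also reject traffic explicitly. A (purely hypothetical) rule blocking a single host could look like this:

      (rule=(src=badhost.example.com)(dst=rac12pri-scan)(srv=NCDB)(act=reject))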

The configuration on host server4 is identical except that the CMAN host has changed. Using that information I can start the CMAN process:

[oracle@server4 ~]$ cmctl startup -c cman1

CMCTL for Linux: Version 12.1.0.2.0 - Production on 06-JUL-2016 13:38:32

Copyright (c) 1996, 2014, Oracle.  All rights reserved.

Current instance cman1 is not yet started
Connecting to (address=(protocol=tcp)(host=server4)(port=1521))
Starting Oracle Connection Manager instance cman1. Please wait...
TNS-04077: WARNING: No password set for the Oracle Connection Manager instance.
CMAN for Linux: Version 12.1.0.2.0 - Production
Status of the Instance
----------------------
Instance name             cman1
Version                   CMAN for Linux: Version 12.1.0.2.0 - Production
Start date                06-JUL-2016 13:38:32
Uptime                    0 days 0 hr. 0 min. 9 sec
Num of gateways started   2
Average Load level        0
Log Level                 SUPPORT
Trace Level               OFF
Instance Config file      /u01/app/oracle/product/12.1.0.2/client_1/network/admin/cman.ora
Instance Log directory    /u01/app/oracle/diag/netcman/server4/cman1/alert
Instance Trace directory  /u01/app/oracle/diag/netcman/server4/cman1/trace
The command completed successfully.
[oracle@server4 ~]$ 

I can start the second one the same way.

Client configuration

So this is all well and good, but how can I configure the client to connect to the database? I would like two things to happen:

  • Connect time load balancing. In other words, don’t hammer just one of the two CMAN processes with incoming connection requests while the second one is idling around
  • Ensure that the alternative CMAN is tried in case one of the two systems dies

Things seemed a bit more complicated because I have to use SOURCE_ROUTE to indicate to the client that a connection manager process is involved. For instance, if I have a single CMAN process, I would use the following entry in tnsnames.ora:

NCDB_1CMAN =
 (description=
   (address_list=
     (source_route=yes)
     (address=(protocol=tcp)(host=server4)(port=1521))
     (address=(protocol=tcp)(host=rac12pri-scan)(port=1521))
   )
   (connect_data=
    (service_name=NCDB)
   )
 )

Relying on connection manager on “server4” I connect through to the NCDB database on my RAC system.

[oracle@server3 tns]$ sqlplus martin@NCDB_1CMAN

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 6 13:44:46 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Last Successful login time: Wed Jul 06 2016 13:44:34 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> select count(*) from v$active_instances;

  COUNT(*)
----------
         4

But what happens if server4 is not available? Well you guessed right:

[oracle@server3 tns]$ sqlplus martin@NCDB_1CMAN

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 6 13:52:51 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
ERROR:
ORA-12541: TNS:no listener

Enter user-name: 

This is the single point of failure I would like to prevent.

Using Two CMAN processes in the tnsnames.ora file

After a bit of digging around on My Oracle Support I gave up and used my search engine skills instead, and that proved more effective. The Oracle Database Net Services Reference has the correct example in chapter 6; this is what it looks like after I applied it to my environment:

NCDB_2CMANS =
  (description =
    (load_balance = on)
    (failover = on)
    (address_list =                   # first CMAN on server4
      (source_route = yes)
      (address = (protocol = tcp)(host = server4)(port = 1521))
      (address=(protocol=tcp)(host=rac12pri-scan)(port=1521))
    )
    (address_list =                   # second CMAN on oracledev
      (source_route = yes)
      (address = (protocol = tcp)(host = oracledev)(port = 1521))
      (address=(protocol=tcp)(host=rac12pri-scan)(port=1521))
    )
    (connect_data =
      (service_name = NCDB)
      (server = dedicated)
    )
  )

You read this as follows:

  • The service name entry starts with the DESCRIPTION tag.
  • On the top level LOAD_BALANCE and FAILOVER are enabled. This allows client side load balancing plus failover to the next CMAN entry
  • Each CMAN host has its own ADDRESS_LIST. Essentially it’s the NCDB_1CMAN entry with the SOURCE_ROUTE, CMAN host and SCAN
  • There is nothing different in the CONNECT_DATA section compared with the previous, single CMAN-entry

The Result

As a result, I can connect to the database now – I haven’t got a CMAN process on server4 but there is one on oracledev:

[oracle@server4 ~]$ cmctl show status -c cman1

CMCTL for Linux: Version 12.1.0.2.0 - Production on 06-JUL-2016 14:17:16

Copyright (c) 1996, 2014, Oracle.  All rights reserved.

Current instance cman1 is not yet started
Connecting to (address=(protocol=tcp)(host=server4)(port=1521))
TNS-04011: Oracle Connection Manager instance not yet started.
[oracle@server4 ~]$ 

[oracle@oracledev ~]$ cmctl show status -c cman1

CMCTL for Linux: Version 12.1.0.2.0 - Production on 06-JUL-2016 14:17:45

Copyright (c) 1996, 2014, Oracle.  All rights reserved.

TNS-04077: WARNING: No password set for the Oracle Connection Manager instance.
Current instance cman1 is already started
Connecting to (address=(protocol=tcp)(host=oracledev)(port=1521))
Status of the Instance
----------------------
Instance name             cman1
Version                   CMAN for Linux: Version 12.1.0.2.0 - Production
Start date                06-JUL-2016 13:18:19
Uptime                    0 days 0 hr. 59 min. 26 sec
Num of gateways started   2
Average Load level        0
Log Level                 SUPPORT
Trace Level               OFF
Instance Config file      /u01/app/oracle/product/12.1.0.2/client_1/network/admin/cman.ora
Instance Log directory    /u01/app/oracle/diag/netcman/oracledev/cman1/alert
Instance Trace directory  /u01/app/oracle/diag/netcman/oracledev/cman1/trace
The command completed successfully.
[oracle@oracledev ~]$ 

A connection can be established:

[oracle@server3 tns]$ sqlplus martin@NCDB_2CMANS

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 6 14:18:13 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Last Successful login time: Wed Jul 06 2016 13:44:49 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> select count(*) from v$active_instances;

  COUNT(*)
----------
         4

SQL>

Using an approach like this allows me to connect to the database even if one of the two CMAN systems is down. But what about load balancing? This appears to work as well: tailing the cman1.log file on each host I can see that the 10 sessions I requested in a bash for-loop (see the sketch after the log output) have been spread evenly across both CMAN hosts:

-- server4

06-JUL-2016 13:21:49 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19796)) * establish * NCDB * 0
06-JUL-2016 13:21:49 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19798)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19800)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19801)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19802)) * establish * NCDB * 0

-- oracledev

06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22637)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22639)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22643)) * establish * NCDB * 0
06-JUL-2016 13:21:51 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22644)) * establish * NCDB * 0
06-JUL-2016 13:21:51 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22645)) * establish * NCDB * 0
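
For reference, the for-loop was along these lines; a sketch only, with ${MARTIN_PWD} standing in for the real password:

[oracle@server3 tns]$ for i in {1..10}; do
>   echo "exit" | sqlplus -s martin/${MARTIN_PWD}@NCDB_2CMANS > /dev/null
> done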

So both boxes ticked.

Posted in 12c Release 1, Linux | Leave a Comment »

Tales from the field: potential reasons for PDB plug-in violations part 2

Posted by Martin Bach on July 14, 2016

In the first part of this little series I explained how a missing (default) tablespace in a PDB can have an impact when opening it. In this instalment I’ll add another common cause for PDBs not opening in read-write mode: different patch levels between the PDB to be plugged in and the CDB.

Scenario

Assume for a moment that you upgraded Oracle binaries from 12.1.0.2.5 to 12.1.0.2.160419. In other words, you applied the most current PSU + OJVM patch at the time of writing to your database. Once complete, you would see output similar to this for a single instance Oracle 12c installation without Oracle Restart:

[oracle@server2 ~]$ opatch lspatches
22674709;Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
22291127;Database Patch Set Update : 12.1.0.2.160419 (22291127)

OPatch succeeded.

I am using database CDB1 in this blog post to indicate the patched CDB:

SYS@CDB$ROOT-CDB1> select patch_id,action,status,description
  2  from dba_registry_sqlpatch;

  PATCH_ID ACTION          STATUS
---------- --------------- ---------------
DESCRIPTION
--------------------------------------------------------------------------------
  21555660 APPLY           SUCCESS
Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)

  21359755 APPLY           SUCCESS
Database Patch Set Update : 12.1.0.2.5 (21359755)

  21555660 ROLLBACK        SUCCESS
Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)

  22674709 APPLY           SUCCESS
Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)

  22291127 APPLY           SUCCESS
Database Patch Set Update : 12.1.0.2.160419 (22291127)

NB: I double-checked twice and the above output seems correct: there is no rollback of the 12.1.0.2.5 DB PSU part prior to the installation of the new one.

Let’s also assume that you would like to plug a PDB into the newly patched CDB. The PDB to be unplugged from the other CDB (named CDB2) has a lower patch level:

SYS@CDB$ROOT-CDB2> select patch_id,action,status,description
  2  from dba_registry_sqlpatch;

  PATCH_ID ACTION          STATUS
---------- --------------- ---------------
DESCRIPTION
--------------------------------------------------------------------------------
  21555660 APPLY           SUCCESS
Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)

  21359755 APPLY           SUCCESS
Database Patch Set Update : 12.1.0.2.5 (21359755)

Plugging in

The steps needed to plug a PDB into its new home have been discussed so many times that I have decided against showing them all again; please refer to the official documentation set (the Database Administrator’s Guide) for the details. The PDB I would like to plug in to CDB1 is named PLUGINTEST.
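
For reference, unplugging PLUGINTEST from its source CDB essentially boils down to two statements; the XML file name matches the one used in the CREATE PLUGGABLE DATABASE command below:

SYS@CDB$ROOT-CDB2> alter pluggable database plugintest close immediate;
SYS@CDB$ROOT-CDB2> alter pluggable database plugintest unplug into '/home/oracle/plugintest.xml';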

SYS@CDB$ROOT-CDB1> create pluggable database plugintest 
  2  using '/home/oracle/plugintest.xml' nocopy;

Pluggable database created.

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest open;

Warning: PDB altered with errors.

With the information I provided previously you can probably guess what’s wrong, but if you are a production DBA who has been tasked to “plug in a PDB” you might not have the background knowledge about patch levels of all the databases in the estate. How can you find out what went wrong? First stop is the alert.log:

alter pluggable database plugintest open
Thu Jul 07 11:02:30 2016
Pluggable database PLUGINTEST dictionary check beginning
Pluggable Database PLUGINTEST Dictionary check complete
Database Characterset for PLUGINTEST is WE8MSWIN1252
***************************************************************
WARNING: Pluggable Database PLUGINTEST with pdb id - 5 is
         altered with errors or warnings. Please look into
         PDB_PLUG_IN_VIOLATIONS view for more details.
***************************************************************
Opening pdb PLUGINTEST (5) with no Resource Manager plan active
Pluggable database PLUGINTEST opened read write
Completed: alter pluggable database plugintest open

This sounds a bit tragic, but not all is lost. The primary issue is that the PDB is opened in restricted mode, which is not terribly useful for ordinary users:

SYS@CDB$ROOT-CDB1> select con_id, name, open_mode, restricted
  2* from v$pdbs where name = 'PLUGINTEST'

    CON_ID NAME                           OPEN_MODE  RES
---------- ------------------------------ ---------- ---
         5 PLUGINTEST                     READ WRITE YES

SYS@CDB$ROOT-CDB1> conn user1/user1@localhost/plugintest
ERROR:
ORA-01035: ORACLE only available to users with RESTRICTED SESSION privilege


Warning: You are no longer connected to ORACLE.

The detail is – again – in PDB_PLUG_IN_VIOLATIONS:

SYS@CDB$ROOT-CDB1> select cause, type, message
  2  from PDB_PLUG_IN_VIOLATIONS
  3  where name = 'PLUGINTEST';

CAUSE                                                            TYPE
---------------------------------------------------------------- ---------
MESSAGE
--------------------------------------------------------------------------------
SQL Patch                                                        ERROR
PSU bundle patch 160419 (Database Patch Set Update : 12.1.0.2.160419 (22291127))
: Installed in the CDB but not in the PDB.

SQL Patch                                                        ERROR
PSU bundle patch 5 (Database Patch Set Update : 12.1.0.2.5 (21359755)): Installe
d in the PDB but not in the CDB.

SQL Patch                                                        ERROR
SQL patch ID/UID 21555660/19361790 (Database PSU 12.1.0.2.5, Oracle JavaVM Compo
nent (Oct2015)): Installed in the PDB but not in the CDB.

SQL Patch                                                        ERROR
SQL patch ID/UID 22674709/20057886 (Database PSU 12.1.0.2.160419, Oracle JavaVM
Component (Apr2016)): Installed in the CDB but not in the PDB.
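
The wrapped output above is just my SQL*Plus defaults at work; a few COLUMN settings (a generic SQL*Plus technique, nothing specific to this view) make it far easier on the eye:

SYS@CDB$ROOT-CDB1> col cause format a15
SYS@CDB$ROOT-CDB1> col type format a10
SYS@CDB$ROOT-CDB1> col message format a60 word_wrapped
SYS@CDB$ROOT-CDB1> select cause, type, message from pdb_plug_in_violations where name = 'PLUGINTEST';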

Either way it should become clear that different patch levels are preventing the PDB from opening unrestricted. If the installed options differed between the two systems, a similar error would be raised. But how does Oracle know about all that? It is encoded in the XML file you create when unplugging the PDB. In my case, these are the options recorded:

    <options>
      <option>APS=12.1.0.2.0</option>
      <option>CATALOG=12.1.0.2.0</option>
      <option>CATJAVA=12.1.0.2.0</option>
      <option>CATPROC=12.1.0.2.0</option>
      <option>CONTEXT=12.1.0.2.0</option>
      <option>DV=12.1.0.2.0</option>
      <option>JAVAVM=12.1.0.2.0</option>
      <option>OLS=12.1.0.2.0</option>
      <option>ORDIM=12.1.0.2.0</option>
      <option>OWM=12.1.0.2.0</option>
      <option>SDO=12.1.0.2.0</option>
      <option>XDB=12.1.0.2.0</option>
      <option>XML=12.1.0.2.0</option>
      <option>XOQ=12.1.0.2.0</option>
    </options>

The SQL patches applied to the PDB are recorded in the XML file as well:

    <sqlpatches>
      <sqlpatch>PSU bundle patch 5 (Database Patch Set Update : 12.1.0.2.5 (21359755)): 
                 APPLY SUCCESS</sqlpatch>
      <sqlpatch>SQL patch ID/UID 21555660/19361790 (Database PSU 12.1.0.2.5, Oracle JavaVM 
                 Component (Oct2015)): APPLY SUCCESS</sqlpatch>
    </sqlpatches>

Corrective Action

The suggested action is conveniently presented as well.

SYS@CDB$ROOT-CDB1> select cause, type, action from pdb_plug_in_violations where name = 'PLUGINTEST';

CAUSE      TYPE      ACTION
---------- --------- ------------------------------------------------------------
SQL Patch  ERROR     Call datapatch to install in the PDB or the CDB
SQL Patch  ERROR     Call datapatch to install in the PDB or the CDB
SQL Patch  ERROR     Call datapatch to install in the PDB or the CDB
SQL Patch  ERROR     Call datapatch to install in the PDB or the CDB

But you don’t blindly follow what you read, do you ;) So I headed over to the official documentation set and My Oracle Support. I found MOS note 1585822.1 most useful: it describes a situation (a PDB was closed while datapatch ran) similar to the one I was seeing. So I decided to try running datapatch again.

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest close immediate;

Pluggable database altered.

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest open upgrade;

Pluggable database altered.

SYS@CDB$ROOT-CDB1>  exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

[oracle@server2 ~]$ cd $ORACLE_HOME/OPatch
[oracle@server2 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Thu Jul  7 11:20:00 2016
Copyright (c) 2015, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_15280_2016_07_07_11_20_00/sqlpatch_invocation.log

Connecting to database...OK
Note:  Datapatch will only apply or rollback SQL fixes for PDBs
       that are in an open state, no patches will be applied to closed PDBs.
       Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
       (Doc ID 1585822.1)
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 21555660 (Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)):
  Installed in PLUGINTEST only
Patch 22674709 (Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)):
  Installed in binary and CDB$ROOT PDB$SEED SWINGBENCH1
Bundle series PSU:
  ID 160419 in the binary registry and ID 160419 in PDB CDB$ROOT, ID 160419 in PDB PDB$SEED, 
  ID 160419 in PDB SWINGBENCH1, ID 5 in PDB PLUGINTEST

Adding patches to installation queue and performing prereq checks...
Installation queue:
  For the following PDBs: CDB$ROOT PDB$SEED SWINGBENCH1
    Nothing to roll back
    Nothing to apply
  For the following PDBs: PLUGINTEST
    The following patches will be rolled back:
      21555660 (Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015))
    The following patches will be applied:
      22674709 (Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016))
      22291127 (Database Patch Set Update : 12.1.0.2.160419 (22291127))

Installing patches...
Patch installation complete.  Total patches installed: 3

Validating logfiles...
Patch 21555660 rollback (pdb PLUGINTEST): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/21555660/19361790/21555660_rollback_CDB1_PLUGINTEST_2016Jul07_11_20_35.log (no errors)
Patch 22674709 apply (pdb PLUGINTEST): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/22674709/20057886/22674709_apply_CDB1_PLUGINTEST_2016Jul07_11_21_03.log (no errors)
Patch 22291127 apply (pdb PLUGINTEST): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/22291127/19694308/22291127_apply_CDB1_PLUGINTEST_2016Jul07_11_21_04.log (no errors)
SQL Patching tool complete on Thu Jul  7 11:21:14 2016
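
Before reopening the PDB you may want to double-check the registry from within it; something along these lines should now list the rollback of the old PSU and the two new apply operations:

alter session set container = PLUGINTEST;

select patch_id, action, status, description
from dba_registry_sqlpatch
order by action_time;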

This fixed my problem:

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest close immediate;

Pluggable database altered.

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest open;

Pluggable database altered.

SYS@CDB$ROOT-CDB1> select con_id, name, open_mode, restricted
  2 from v$pdbs where name = 'PLUGINTEST';

    CON_ID NAME                           OPEN_MODE  RES
---------- ------------------------------ ---------- ---
         5 PLUGINTEST                     READ WRITE NO

Interestingly, the entries in PDB_PLUG_IN_VIOLATIONS do not disappear, but there is a timestamp in the view that should help you find out whether a message is current or not.
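
To tell current messages from stale ones, you can include that timestamp (plus the STATUS column, which should change to RESOLVED once the underlying issue is fixed) in the query:

SYS@CDB$ROOT-CDB1> select time, cause, type, status
  2  from pdb_plug_in_violations
  3  where name = 'PLUGINTEST'
  4  order by time;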

Posted in 12c Release 1, Linux