Martins Blog

Trying to explain complex things in simple terms

Archive for the ‘Oracle’ Category

Oracle (Database and Middleware) related posts

New Events for Data Guard and Synchronous Redo Transport in 12c (1)

Posted by Martin Bach on February 24, 2017

I may have said it before but I consider presenting and teaching a great way to expand one’s knowledge: first of all it requires me to really understand a subject. Secondly, when presenting, you get lots of interesting questions that can turn into blog posts like this one.

Lately I have been asked about the impact of synchronous log shipping to a physical standby database. I was sure there was an effect to be observed, most likely depending on the network latency between the systems, but I didn’t have any evidence I could pull out of the hat to back up my thoughts. So what better way to find out than to try it! I also read that some of the events have changed in 12c, and wanted to make them visible. My environment is based on the 2 node RAC primary/2 node RAC standby configuration I wrote about in my previous posts.

Since their initial setup I upgraded the cluster to 12.1.0.2.170117 for Clusterware and RDBMS.

I am planning to share my findings in two posts: the first one (the one you are reading) presents the baseline, the second one shows how performance can be affected by a poorly performing network. I guess the second one will be a bit longer …

Background

Data Guard allows the administrator to define the databases in a Data Guard configuration to be in one of three possible modes. The first one is Maximum Performance, the second Maximum Availability and finally Maximum Protection. All three modes are covered in the official documentation set and MOS. Without saying it very scientifically:

  • Maximum Performance uses asynchronous log shipping and is the default. The performance of the standby database (or rather lack thereof) should not impact the performance of the primary. At a cost: it is not guaranteed that redo is actually written to the standby when the commit command returns.
  • Maximum Protection allows you to enforce a Zero Data Loss (ZDL) strategy but also at a cost: The primary will shut down if it can’t transmit redo information to the standby. That is how you prevent data loss!
  • Maximum Availability is the middle ground and I’ll focus on that protection mode here.

Make sure to read the documentation and understand the implications of each protection mode when assessing the most suitable mode for your environment!
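
To put the Maximum Availability bullet point into slightly more concrete terms, and assuming for a moment a configuration managed without the broker: raising the protection mode boils down to switching redo transport to SYNC and then changing the mode. This is only a sketch; the destination number, service name and attributes below are assumptions based on my lab and need adjusting for your environment.

-- sketch only: ship redo to the standby synchronously (adjust to your setup)
alter system set log_archive_dest_2='service=NCDBB SYNC AFFIRM valid_for=(online_logfiles,primary_role) db_unique_name=NCDBB' scope=both sid='*';

-- then raise the protection mode
alter database set standby database to maximize availability;

With the broker (which is what I use in this configuration) you would instead set the LogXptMode property to SYNC for both databases and then change the protection mode of the configuration to MaxAvailability.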

Background: New Events

Before going into details I should probably point out that there are some new Data Guard events. Hurray! More events and more meaningful names, great. Just one caveat: have a look at the output from my 12c database:

select name, wait_class from v$event_name
where name in (
'ARCH Remote Write','ASYNC Remote Write','Redo Transport Attach',
'Redo Transport Close','Redo Transport Detach','Redo Transport Open',
'Redo Transport Ping','Redo Transport Slave Shutdown','Redo Transport Slave Startup',
'Redo Writer Remote Sync Complete', 'Redo Writer Remote Sync Notify',
'Remote SYNC Ping','SYNC Remote Write');

NAME                                               WAIT_CLASS
-------------------------------------------------- ----------------------------------------
ASYNC Remote Write                                 Other
SYNC Remote Write                                  Other
ARCH Remote Write                                  Other
Redo Transport Attach                              Other
Redo Transport Detach                              Other
Redo Transport Open                                Other
Redo Transport Close                               Other
Redo Transport Ping                                Other
Remote SYNC Ping                                   Other
Redo Transport Slave Startup                       Other
Redo Transport Slave Shutdown                      Other
Redo Writer Remote Sync Notify                     Other
Redo Writer Remote Sync Complete                   Other

13 rows selected.

Compare this to the pre-12c events (taken from an 11.2.0.3 database since most of them are gone in 12c):

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                                               WAIT_CLASS
-------------------------------------------------- --------------------
ARCH wait on ATTACH                                Network
LNS wait on ATTACH                                 Network
LNS wait on SENDREQ                                Network
LNS wait on DETACH                                 Network
ARCH wait on SENDREQ                               Network
ARCH wait on DETACH                                Network
LNS wait on LGWR                                   Network
LGWR wait on LNS                                   Network
LGWR-LNS wait on channel                           Other

9 rows selected.

For the sake of completeness here are those events that remained in 12c:

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                           WAIT_CLASS
------------------------------ ------------------------------
LNS wait on LGWR               Network
LGWR wait on LNS               Network
LGWR-LNS wait on channel       Other

Bugger. The new events are all in the “Other” wait class. My initial attempts at filtering information from v$event_histogram based on events with the “Network” class had little success. I guess friends of the OEM performance pages might also find that interesting. In case you wonder where I pulled these events from: I used the white paper named “Best Practices for Synchronous Redo Transport” as a reference.
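
The workaround is simple enough: filter on the event names themselves rather than on the wait class. A query along these lines (a sketch for the synchronous shipping event, run on the primary) returns the latency histogram I was after:

select event, wait_time_milli, wait_count
from v$event_histogram
where event = 'SYNC Remote Write'
order by wait_time_milli;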

Testing: The Environment

As you read in the introduction, this environment is based on a two node RAC primary -> two node RAC standby configuration:

DGMGRL> show configuration verbose

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS
DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBA1
    NCDBA2

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 177.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBA' LogXptMode
  LogXptMode = 'async'
DGMGRL> show database 'NCDBB' LogXptMode
  LogXptMode = 'async'

The configuration is currently set to “Maximum Performance”: the databases ship redo asynchronously.

All systems are patched to the January 2017 Proactive Bundle Patch:

[oracle@rac12pri1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
24917972;Database PSU 12.1.0.2.170117, Oracle JavaVM Component (JAN2017)
24732082;Database Patch Set Update : 12.1.0.2.170117 (24732082)
24828633;OCW Patch Set Update : 12.1.0.2.170117 (24828633)

OPatch succeeded.
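
Remember that opatch lspatches only reports what is installed in the Oracle home; to confirm that datapatch has applied the corresponding SQL changes to the database as well, a quick query against dba_registry_sqlpatch does the job (a sketch, columns trimmed for readability):

select patch_id, action, status, description
from dba_registry_sqlpatch
order by action_time;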

Testing part I: The Baseline

I am using Swingbench’s Order Entry benchmark to generate a little bit of load. For the avoidance of doubt: I am not interested in a comparison between asynchronous and synchronous log shipping. I would like to see the effect caused by the difference in latency in the form of hugely increased waits. In my first test I am running with the defaults of my VM (the NIC uses virtio drivers), no change to SDU or send/receive buffers.

$ date
Fri Feb 17 13:58:58 UTC 2017
$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxperf.xml -rt 00:10
Author  :        Dominic Giles
Version :        2.5.0.971

Results will be written to results_20_maxperf.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

1:59:55 PM      20      3852    255

On the apply side this translates to the following figures (the average apply rate is misleading and should be ignored: it captures all apply activity since MRP0 was started; a more useful way of looking at the current apply rate follows after the broker output).

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 713.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS
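
As mentioned above, the broker’s Average Apply Rate covers the entire life of the recovery process. If you are after the rate for the current recovery session, v$recovery_progress on the standby is more useful. A minimal sketch, to be run on the apply instance while managed recovery is active:

select item, units, sofar
from v$recovery_progress
where item in ('Active Apply Rate', 'Average Apply Rate', 'Apply Time per Log');

The “Active Apply Rate” reflects recent apply activity and is a better match for the load currently being generated.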

There is no transport lag, and no apply lag either. When checking for waits on background processes (using Tanel Poder’s ashtop – which requires you to have the Diagnostic Pack license as it accesses v$active_session_history) I noticed the following:

SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 09:02','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 09:07','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                                          EVENT                                    WAIT_CLASS      SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
       53      .2   29% | oracle@rac12pri2 (LGWR)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:42 2017-02-17 09:06:53                 1
       23      .1   13% | oracle@rac12pri2 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:02:03 2017-02-17 09:03:26                 1
       19      .1   10% | oracle@rac12pri2 (DBW0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:39                 1
       12      .0    7% | oracle@rac12pri2 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:02:03 2017-02-17 09:03:27                 1
        9      .0    5% | oracle@rac12pri1 (LMS1)                                                                                   ON CPU  2017-02-17 09:04:37 2017-02-17 09:06:58                 1
        8      .0    4% | oracle@rac12pri2 (TT00)                                                                                   ON CPU  2017-02-17 09:02:15 2017-02-17 09:06:56                 1
        6      .0    3% | oracle@rac12pri1 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:06:44                 1
        6      .0    3% | oracle@rac12pri2 (LGWR)                                                                                   ON CPU  2017-02-17 09:04:05 2017-02-17 09:06:41                 1
        5      .0    3% | oracle@rac12pri2 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:03:27                 1
        5      .0    3% | oracle@rac12pri2 (LMS0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:14                 1
        5      .0    3% | oracle@rac12pri2 (LMS1)                                                                                   ON CPU  2017-02-17 09:02:24 2017-02-17 09:06:28                 1
        4      .0    2% | oracle@rac12pri1 (LMS0)                                                                                   ON CPU  2017-02-17 09:04:21 2017-02-17 09:06:05                 1
        3      .0    2% | oracle@rac12pri1 (LGWR)                          LGWR all worker groups                   Other           WAITING 2017-02-17 09:03:08 2017-02-17 09:06:13                 1
        2      .0    1% | oracle@rac12pri1 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:03:42                 1
        2      .0    1% | oracle@rac12pri2 (CKPT)                                                                                   ON CPU  2017-02-17 09:05:10 2017-02-17 09:05:13                 1
        2      .0    1% | oracle@rac12pri2 (TT00)                          log file sequential read                 System I/O      WAITING 2017-02-17 09:03:17 2017-02-17 09:03:46                 1
        1      .0    1% | oracle@rac12pri1 (CJQ0)                                                                                   ON CPU  2017-02-17 09:04:44 2017-02-17 09:04:44                 1
        1      .0    1% | oracle@rac12pri1 (DBW0)                                                                                   ON CPU  2017-02-17 09:03:05 2017-02-17 09:03:05                 1
        1      .0    1% | oracle@rac12pri1 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:02:02                 1
        1      .0    1% | oracle@rac12pri1 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:05:47 2017-02-17 09:05:47                 1

The time difference between charbench and the database can be explained by time zones: my load generator is set to use UTC whereas the database is on EST; the result is a five hour time difference. This is a minor oversight on my part and has nothing to do with the location of the machines. In fact, they are all virtualised.

Summary Part I

In this part I tried to lay the foundation for the next one, where I’m using my lab environment to simulate an artificially high network latency. I have to admit again that my setup is somewhat over-simplified: using Data Guard with asynchronous log shipping and Maximum Performance mode is not something I’d like to support for anything more important than my toy databases in the lab.

References

http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf

Posted in 12c Release 1, KVM, Linux, Performance | Leave a Comment »

Creating a RAC 12.1 Data Guard Physical Standby environment (4)

Posted by Martin Bach on February 15, 2017

In the previous three parts of this series a lot of preparation work, needed for the configuration of Data Guard, was performed. In this part of the mini-series they all come to fruition. Using the Data Guard broker a switchover operation will be performed. A couple of new features in 12c make this easier. According to the “Changes in This Release for Oracle Data Guard Concepts and Administration” chapter of the 12.1 Data Guard Concepts and Administration guide:

When [you, ed.] perform a switchover from an Oracle RAC primary database to a physical standby database, it is no longer necessary to shut down all but one primary database instance.

I have always wanted to test that in a quiet moment…

I have previously blogged about another useful change that should make my life easier: the static registration of the *_DGMGRL services in the listener.ora file is no longer needed. Have a look at my Data Guard Broker Setup Changes post for more details and reference to the documentation.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Now let’s get to it.

Step 1: Check the status of the configuration

In the first step I always check the configuration and make sure I can switch over. Data Guard 12c has a nifty automatic check that helps, but I always have a list of tasks I perform prior to a switchover (not shown in this blog post).

The following commands are somewhat sensitive to availability of the network – you should protect your sessions against any type of network failure! I am using screen(1) for that purpose; there are other tools out there doing similar things. Network glitches are too common to ignore, and I have come to appreciate the ability to resume work without too many problems after having seen the dreaded “broken pipe” message in my terminal window…

[oracle@rac12sec1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdbb
Password:
Connected as SYSDBA.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 55 seconds ago)

DGMGRL> validate database 'NCDBB'
...

The command to check for switchover readiness is new to 12c as well and is called “validate database”. I don’t have screen output from the situation at this point: just take my word that I was ready :) Don’t switch over if you have any concerns or doubts the operation might not succeed! “Validate database” does not relieve you from your duties to check for switchover readiness – follow your procedures.
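
For what it’s worth, one of the classic manual checks I keep running in addition to “validate database” is a quick look at the switchover status (a trivial sketch; on the primary you would hope to see TO STANDBY or SESSIONS ACTIVE):

select database_role, switchover_status from v$database;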

Step 2: Switch Over

Finally, the big moment has come! It takes just one line to perform the switchover:

DGMGRL> switchover to 'NCDBB'
Performing switchover NOW, please wait...
New primary database "NCDBB" is opening...
Oracle Clusterware is restarting database "NCDBA" ...
Switchover succeeded, new primary is "NCDBB"
DGMGRL> 

DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 9.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    NCDBA1
    NCDBA2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB';

Database - NCDBB

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBB1
    NCDBB2

Database Status:
SUCCESS

DGMGRL> 

Well that was easy! Did you notice Data Guard Broker telling us that ‘Oracle Clusterware is restarting database “NCDBA” …’ ? I like it.

If you get stuck at this point something has gone wrong with the database registration in the OCR. You shouldn’t run into problems though, because you tested every aspect of the RAC system before handing the system over to its intended users, didn’t you?

Validating the new standby database shows no issues. I haven’t noticed it before but “validate database” allows you to get more verbose output:

DGMGRL> validate database verbose 'NCDBA';

  Database Role:     Physical standby database
  Primary Database:  NCDBB

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Capacity Information:
    Database  Instances        Threads
    NCDBB     2                2
    NCDBA     2                2

  Temporary Tablespace File Information:
    NCDBB TEMP Files:  1
    NCDBA TEMP Files:  1

  Flashback Database Status:
    NCDBB:  On
    NCDBA:  Off

  Data file Online Move in Progress:
    NCDBB:  No
    NCDBA:  No

  Standby Apply-Related Information:
    Apply State:      Running
    Apply Lag:        0 seconds (computed 1 second ago)
    Apply Delay:      0 minutes

  Transport-Related Information:
    Transport On:      Yes
    Gap Status:        No Gap
    Transport Lag:     0 seconds (computed 1 second ago)
    Transport Status:  Success

  Log Files Cleared:
    NCDBB Standby Redo Log Files:  Cleared
    NCDBA Online Redo Log Files:   Cleared
    NCDBA Standby Redo Log Files:  Available

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBB)                 (NCDBA)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBA)                 (NCDBB)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Current Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBB)                   (NCDBA)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Future Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBA)                   (NCDBB)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Apply-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    DelayMins                       0                        0
    ApplyParallel                   AUTO                     AUTO

  Transport-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    LogXptMode                      ASYNC                    ASYNC
    RedoRoutes                      <empty>                  <empty>
    Dependency                      <empty>                  <empty>
    DelayMins                       0                        0
    Binding                         optional                 optional
    MaxFailure                      0                        0
    MaxConnections                  1                        1
    ReopenSecs                      300                      300
    NetTimeout                      30                       30
    RedoCompression                 DISABLE                  DISABLE
    LogShipping                     ON                       ON

  Automatic Diagnostic Repository Errors:
    Error                       NCDBB    NCDBA
    No logging operation        NO       NO
    Control file corruptions    NO       NO
    SRL Group Unavailable       NO       NO
    System data file missing    NO       NO
    System data file corrupted  NO       NO
    System data file offline    NO       NO
    User data file missing      NO       NO
    User data file corrupted    NO       NO
    User data file offline      NO       NO
    Block Corruptions found     NO       NO

DGMGRL> 

Isn’t that cool? That’s more information at my fingertips than I can shake a stick at! It’s also a lot more than I could think of (e.g. online datafile move!).

Interestingly the Broker reports that I have “Sufficient SRLs”. I have seen it complain about the number of Standby Redo Logs in the past and blogged about it in “Interesting observation about standby redo logs in Data Guard”.
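
If you want to verify the SRL situation yourself rather than rely on the broker output, comparing the number of online redo log groups with the number of standby redo log groups per thread tells the same story. A quick sketch, to be run on each database of the configuration:

select thread#, count(*) as online_log_groups from v$log group by thread# order by thread#;

select thread#, count(*) as standby_log_groups from v$standby_log group by thread# order by thread#;

With n online redo log groups per thread you want at least n+1 SRL groups per thread – exactly what the broker calls “Sufficient SRLs”.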

Summary

After 4 (!) posts about the matter I have finally been able to perform a switchover operation. Role reversals are a much neglected operation a DBA should be comfortable with. In a crisis situation everyone needs to be clear about what needs to be done to restore service to the users. The database is usually the easier part … Success of Data Guard switchover operations also depends on the quality of change management: it is easy to “forget” applying configuration changes on the DR site.

In today’s busy times only few of us are lucky enough to intimately know each and every database we look after. What’s more common (sadly!) is that a DBA looks after 42 or more databases. This really only works without too many issues if procedures and standards are rock solid, and enforced.

Posted in 12c Release 1, Data Guard, KVM, Linux, Uncategorized | Leave a Comment »

Creating a RAC 12.1 Data Guard Physical Standby environment (3)

Posted by Martin Bach on January 23, 2017

In the previous two parts of this series you read about my lab environment and the preparations on the network side as well as the database. In this part I’ll cover the database duplication. Again, this won’t be a short post …

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Step 1: Create an initialisation file

The first step in this part is the preparation of an initialisation file. I am taking NCDBA as the sample and transferring it over to rac12sec1:

SQL> create pfile='/tmp/initNCDBA1.ora' from spfile;

File created.

[oracle@rac12pri1 ~]$ scp /tmp/initNCDBA1.ora rac12sec1:/tmp
oracle@rac12sec1's password: 
initNCDBA1.ora                                            100% 1515     1.5KB/s   00:00    
[oracle@rac12pri1 ~]$ 

The file is a bit verbose and needs some editing. I appreciate that RMAN replaces the file with a copy anyway, but I like to keep it tidy, at least at first … The end result is shown here. As always, make sure that your version of the pfile matches your site’s requirements and standards. The important part is to set the directory paths correctly and to ensure that db_name and db_unique_name are set correctly.

*.audit_file_dest='/u01/app/oracle/admin/NCDBB/adump'
*.audit_trail='db'
*.cluster_database=true
*.compatible='12.1.0.2.0'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_name='NCDBA'
*.db_unique_name='NCDBB'
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=4785m
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=NCDBBXDB)'
NCDBB1.instance_number=1
NCDBB2.instance_number=2
*.open_cursors=300
*.pga_aggregate_target=1024m
*.processes=300
*.remote_login_passwordfile='exclusive'
*.sga_target=3072m
NCDBB2.thread=2
NCDBB1.thread=1
NCDBB1.undo_tablespace='UNDOTBS1'
NCDBB2.undo_tablespace='UNDOTBS2'

The file needs to be transferred to $ORACLE_HOME/dbs as initNCDBB1.ora.

Step 2: Create a temporary password file

You undoubtedly spotted that remote_login_passwordfile is set, which is a pre-requisite for connecting to the database as SYS via the network. A password file is required for active database duplication; without it the duplicate wouldn’t work. Since Oracle takes care of the password file during the RMAN duplicate it appears to be sufficient to simply create a password file in $ORACLE_HOME/dbs on rac12sec1:

[oracle@rac12sec1 dbs]$ orapwd file=orapwNCDBB1

Enter password for SYS:
[oracle@rac12sec1 dbs]$

During my testing this was enough to proceed to the next step.

Step 3: Duplicate the database “for standby”

The system should now be ready for the standby database’s creation. It’s been a bit of a task to get there, but from now on things will be easier … Using the *_DUP entries in the system’s tnsnames.ora file I connect to the target and auxiliary database, followed by the command to create the standby. In my case the database is no more than tiny and the RMAN defaults are enough. Larger databases would probably need different settings and commands.


[oracle@rac12sec1 dbs]$ rman target sys@ncdba_dup auxiliary sys@ncdbb_dup

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Dec 14 10:25:47 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

target database Password: 
connected to target database: NCDBA (DBID=479005270)
auxiliary database Password: 
connected to auxiliary database: NCDBA (DBID=479005270, not open)

RMAN> startup clone nomount

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area    3221225472 bytes

Fixed Size                     2929552 bytes
Variable Size                838863984 bytes
Database Buffers            2365587456 bytes
Redo Buffers                  13844480 bytes

RMAN> duplicate target database for standby from active database;

Starting Duplicate Db at 14-DEC-16
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=240 instance=NCDBB1 device type=DISK

contents of Memory Script:
{
   backup as copy reuse
   targetfile  '+DATA/NCDBA/PASSWORD/pwdncdba.256.930561891' auxiliary format
 '/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwNCDBB1'   ;
}
executing Memory Script

Starting backup at 14-DEC-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=276 instance=NCDBA1 device type=DISK
Finished backup at 14-DEC-16

contents of Memory Script:
{
   sql clone "create spfile from memory";
   shutdown clone immediate;
   startup clone nomount;
   restore clone from service  'ncdba_dup' standby controlfile;
}  
executing Memory Script

sql statement: create spfile from memory

Oracle instance shut down

sql statement: alter system archive log current

contents of Memory Script:
{  
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=7 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/system.259.930565213
datafile 2 switched to datafile copy
input datafile copy RECID=8 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/undotbs2.260.930565221
datafile 3 switched to datafile copy
input datafile copy RECID=9 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/sysaux.261.930565223
datafile 4 switched to datafile copy
input datafile copy RECID=10 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/undotbs1.262.930565231
datafile 5 switched to datafile copy
input datafile copy RECID=11 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/example.263.930565233
datafile 6 switched to datafile copy
input datafile copy RECID=12 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/users.264.930565241
Finished Duplicate Db at 14-DEC-16

RMAN>

Perfect – the duplicate finished ok and all 6 datafiles have been transferred to my standby cluster. There are a few things worth noting here:

  • The password file of the primary (residing in ASM) has been copied to ‘/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwNCDBB1’ (not in ASM)
  • The standby database’s spfile has been created from memory – a lot of unnecessary and some obsolete parameters are now in the standby database’s spfile (a query to review them follows below)
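
To see what “create spfile from memory” left behind, listing the explicitly set parameters of the freshly duplicated standby is a good start. A sketch (underscore parameters should normally only be set on the advice of Oracle Support, so review them before removing anything):

select sid, name, value
from v$spparameter
where isspecified = 'TRUE'
order by name, sid;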

Step 4: Finishing Touches

There are a few things still outstanding before declaring the environment ready. First of all you need to register the standby database in the Cluster Registry (OCR). Use srvctl for this task. One way to do this is shown here:

[oracle@rac12sec1 dbs]$ srvctl add database -db NCDBB -oraclehome $ORACLE_HOME -dbtype RAC \
> -role PHYSICAL_STANDBY -startoption MOUNT -dbname NCDBA -diskgroup "DATA,RECO"

[oracle@rac12sec1 dbs]$ srvctl add instance -db NCDBB -instance NCDBB1 -node rac12sec1
[oracle@rac12sec1 dbs]$ srvctl add instance -db NCDBB -instance NCDBB2 -node rac12sec2

When registering your standby database make sure you are license compliant!

Next I move the password file to ASM. The file currently resides on rac12sec1 in the $ORACLE_HOME/dbs directory. You can use the orapwd command to move it. If you registered the database in the OCR as I did in the previous step you will get an updated configuration as a result – the password file configuration is changed automatically.

[oracle@rac12sec1 dbs]$ srvctl config database -db NCDBB | grep -i password
Password file: +DATA/NCDBB/orapwncdbb

Since I like error messages when the database starts even less than unnecessary parameters in my spfile, I tend to remove these from the standby database. And besides, in its current form the spfile wouldn’t allow me to mount both RAC instances anyway: for some odd reason the mapping between SIDs and instances is lost during the RMAN duplicate.

So I start off by creating a pfile from the spfile and remove all unneeded underscore parameters as well as the deprecated ones. In the resulting file I ensure that all the necessary RAC parameters are present. The minimalistic end result looks like this for my database:

*.audit_file_dest='/u01/app/oracle/admin/NCDBB/adump'
*.audit_trail='DB'
*.cluster_database=TRUE
*.compatible='12.1.0.2.0'
*.control_files='+DATA/NCDBB/CONTROLFILE/current.258.930565201',
   '+RECO/NCDBB/CONTROLFILE/current.258.930565201'#Restore Controlfile
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_name='NCDBA'
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=4785M
*.db_unique_name='NCDBB'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=NCDBBXDB)'
NCDBB1.instance_number=1
NCDBB2.instance_number=2
*.open_cursors=300
*.pga_aggregate_target=1G
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=3G
NCDBB2.thread=2
NCDBB1.thread=1
NCDBB1.undo_tablespace='UNDOTBS1'
NCDBB2.undo_tablespace='UNDOTBS2'

Next I can move the spfile to ASM and update the OCR profile. Be sure to check all $ORACLE_HOME/dbs directories on the cluster nodes afterwards to prevent local pfiles/spfiles taking precedence over the one referred to in the OCR.
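
The SQL part of the move is a one-liner; updating the OCR profile is then a matter of srvctl modify database with the -spfile option. A sketch – the pfile name is just an assumption for whatever you called your cleaned-up parameter file:

-- create the spfile in ASM from the cleaned-up pfile (file names match my lab)
create spfile='+DATA/NCDBB/spfileNCDBB' from pfile='/tmp/initNCDBB.ora';

After these steps the OCR profile reflects the new location of the spfile: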

[oracle@rac12sec1 dbs]$ srvctl config database -d NCDBB | grep spfile
Spfile: +DATA/NCDBB/spfileNCDBB

Everything is awesome!

It’s been a bit of a journey, but it was worth it. Before continuing with the remaining configuration steps, here is a little bit of detail about the system as it is:

[oracle@rac12sec1 dbs]$ srvctl status database -db NCDBB -verbose
Instance NCDBB1 is running on node rac12sec1. Instance status: Mounted (Closed).
Instance NCDBB2 is running on node rac12sec2. Instance status: Mounted (Closed).
[oracle@rac12sec1 dbs]$ 

SQL> select name,name,db_unique_name,database_role,log_mode,force_logging,flashback_on from v$database;

NAME      NAME      DB_UNIQUE_NAME   DATABASE_ROLE    LOG_MODE     FORCE_LOGGING   FLASHBACK_ON
--------- --------- ---------------- ---------------- ------------ --------------- ------------------
NCDBA     NCDBA     NCDBB            PHYSICAL STANDBY ARCHIVELOG   YES             NO

SQL> select * from v$active_instances;

INST_NUMBER INST_NAME                                                        CON_ID
----------- ------------------------------------------------------------ ----------
          1 rac12sec1:NCDBB1                                                      0
          2 rac12sec2:NCDBB2                                                      0

In my initial drafts I intended to continue this article with the Data Guard configuration, but having looked at the WordPress-formatted part 3 up to here I decided to split it into 2 pieces. Next up is the Data Guard Broker configuration, followed by more posts about switching over and application design considerations. Stay tuned!

Posted in 12c Release 1, KVM, Linux | Leave a Comment »

Creating a RAC 12.1 Data Guard Physical Standby environment (2)

Posted by Martin Bach on January 13, 2017

In the first part of this mini-series you saw me define the environment as well as creating a primary database. With that out of the way it’s time to think about the standby. Before the standby can be created, a few preparations are necessary both on the primary as well as the standby cluster.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Preparing the Creation of the Standby Database

It’s easier to follow the discussion if you keep in mind that my primary database is NCDBA and the standby is NCDBB.

Step 1: Listener Configuration

The first step to be completed is the network setup. I am planning on using an active database duplication which requires my node listeners on the cluster to “know” about my database instances. Thankfully since RAC 12.1 we don’t need to register the *_DGMGRL service (used for switchover operations by the Broker) anymore, making life a bit easier. The static registration of databases may not be necessary if you are using a different RMAN duplicate approach.

By default Oracle does not statically register databases, so it’s up to you to do so. On rac12sec1 I added a SID_LIST_LISTENER section (note: only the node listener!) to /u01/app/12.1.0.2/grid/network/admin/listener.ora. Remember that the node listeners reside in the GRID_HOME.

...
SID_LIST_LISTENER =
  (SID_LIST =
...
    (SID_DESC =
      (GLOBAL_DBNAME = NCDBB)
      (ORACLE_HOME = /u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME = NCDBB1)
    )
...
  )
...

Likewise, I added a static registration to rac12sec2 – identical to the previous example except for the SID_NAME, which is NCDBB2 of course. Afterwards I reloaded the listeners on both nodes.

Step 2: Service Name Configuration

The next file to change is the RDBMS home’s tnsnames.ora. To make sure I cover all eventualities I created the following tnsnames.ora on all cluster nodes, primary and standby RAC:

[oracle@rac12pri1 ~]$ cat $ORACLE_HOME/network/admin/tnsnames.ora 
# tnsnames.ora Network Configuration File: /u01/app/oracle/product/12.1.0.2/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

NCDBA =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = NCDBA)
    )
  )

NCDBB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sec-scan)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NCDBB)
    )
  )

# only used for RMAN duplicate ... from active database
NCDBA_DUP =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NCDBA)
    )
  )

NCDBB_DUP =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sec1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sec2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NCDBB)
    )
  )

The first two entries are self-explanatory. The second pair requires a little more explanation. Since I have statically registered my standby database with the node listeners only (e.g. not with the SCAN listeners; I never tried that and don’t like the thought of it) the duplicate would fail trying to restart the auxiliary instance. I haven’t captured the output of the failed RMAN duplicate, but it’s possible to show you the effect of using the “wrong” tnsname with a simple test (the database I want to connect to is of course down):

[oracle@rac12pri1 ~]$ sqlplus sys@NCDBB as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 14:36:49 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor


Enter user-name: 
[oracle@rac12pri1 ~]$ sqlplus sys@NCDBB_DUP as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 14:36:59 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Connected to an idle instance.

SQL> 

So it seems to be better to use the net service name where I can be sure the database can be controlled from a remote session :)

NB: this test required a password file for the standby database which doesn’t exist yet and will be created in part 3.

Step 3: Modify the oratab file

You should also change the oratab file and add NCDBB on both standby hosts, as shown here for rac12sec2.

[oracle@rac12sec2 ~]$ grep NCDBB /etc/oratab
NCDBB:/u01/app/oracle/product/12.1.0.2/dbhome_1:N

Step 4: Prepare the Primary Database: Archivelog mode, Force Logging, and maybe even Flashback Database

There are a number of tasks to be completed on the primary database before I can start duplicating it for a standby environment. I need to make sure it’s in Archivelog mode and that force logging is enabled. If I can afford the overhead, I enable Flashback Database as well to help reinstate a failed primary should that need arise. Of course, you don’t just enable Flashback Database: it can have a negative impact on performance, and the flashback logs use space – your FRA space usage calculation can be derailed if you are not careful. Thorough testing is needed to assess whether it is ok to operate with Flashback Database enabled.
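
Just as a pointer, and assuming you have done the testing mentioned above and decided to use it: enabling Flashback Database and keeping an eye on the flashback log space could look roughly like this sketch (in recent releases the database can stay open for this):

-- enable Flashback Database and check flashback log space usage
alter database flashback on;

select retention_target, flashback_size, estimated_flashback_size
from v$flashback_database_log;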

The first task in my scenario is to enable Archivelog mode and to enforce logging.

[oracle@rac12pri1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 09:52:11 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> archive log list
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     15
Current log sequence           16
SQL> exit

...

[oracle@rac12pri1 ~]$ srvctl stop database -d NCDBA
[oracle@rac12pri1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 10:00:34 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size                  2929552 bytes
Variable Size             838863984 bytes
Database Buffers         2365587456 bytes
Redo Buffers               13844480 bytes
Database mounted.
SQL> alter database archivelog;

Database altered.

SQL> alter database force logging;

Database altered.

SQL>  alter database open;

Database altered.

SQL> exit

It is easy to check if that was a success:

[oracle@rac12pri1 ~]$ srvctl start instance -db NCDBA -node rac12pri2

[oracle@rac12pri1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 15:04:28 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     15
Next log sequence to archive   16
Current log sequence           16

Step 5: Add Standby Redo Logs

In order to use Real Time Apply you need Standby Redo Logs. If you create them now (i.e. before the RMAN duplicate) you are saving time: first of all, you don’t need to create Standby Redo Logs (SRLs) post switchover when NCDBA operates as a standby database. Secondly, if there are SRLs in place, they are automatically created on the standby database as part of the RMAN duplicate.

You need n+1 SRLs per thread, and you can work the number out by looking at v$log. My tiny database has 2 threads with 2 online redo log groups each – ok for the lab, almost certainly not ok for anything serious. It helps me keep the storage footprint in the lab low though.

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME     CON_ID
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ----------
         1          1         15   52428800        512          2 YES INACTIVE               1621242 14-DEC-16      1621618 14-DEC-16          0
         2          1         16   52428800        512          2 NO  CURRENT                1621618 14-DEC-16   2.8147E+14                    0
         3          2          1   52428800        512          2 YES INACTIVE               1620677 14-DEC-16      1628674 14-DEC-16          0
         4          2          2   52428800        512          2 NO  CURRENT                1629454 14-DEC-16   2.8147E+14 14-DEC-16          0

SQL> 

With that information I can create 3 SRLs per thread. Did I mention that I love Oracle Managed Files?

SQL> alter database add standby logfile thread 1 size 52428800;

Database altered.

SQL> /

Database altered.

SQL> /

Database altered.

SQL> alter database add standby logfile thread 2 size 52428800;

Database altered.

SQL> /

Database altered.

SQL> /

Database altered.

SQL> select group#,thread#,sequence#,bytes,status from v$standby_log;

    GROUP#    THREAD#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ---------- ----------
         5          1          0   52428800 UNASSIGNED
         6          1          0   52428800 UNASSIGNED
         7          1          0   52428800 UNASSIGNED
         8          2          0   52428800 UNASSIGNED
         9          2          0   52428800 UNASSIGNED
        10          2          0   52428800 UNASSIGNED

6 rows selected.

The primary database should be in relatively good shape now – Flashback Database is off to simulate the case where my fictional load is considered to impact performance too much.

SQL> select name,db_unique_name,database_role,log_mode,force_logging,flashback_on from v$database;

NAME      DB_UNIQUE_NAME  DATABASE_ROLE    LOG_MODE     FORCE_LOGGING  FLASHBACK_ON
--------- --------------- ---------------- ------------ -------------- ------------------
NCDBA     NCDBA           PRIMARY          ARCHIVELOG   YES            NO

SQL>

Summary

This concludes part 2. In this blog post I shared my notes about configuring the network for an RMAN duplicate … from active database. I also configured the primary database in preparation for the initial duplicate command.

Note that I’m not setting any initialisation parameters related to Data Guard: over time I have become a great fan of the Data Guard Broker and plan on using that instead of a manual configuration.

Posted in 12c Release 1, Data Guard, Linux | 4 Comments »

Creating a RAC 12.1 Data Guard Physical Standby environment (1)

Posted by Martin Bach on December 14, 2016

I have just realised that the number of posts about RAC 12c Release 1 on this blog is rather too small. And since I’m a great fan of RAC this has to change :) In this mini-series I am going to share my notes about creating a Data Guard setup on my 2 node 12.1.0.2.161018 RAC primary + identical 2 node RAC standby system in the lab.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

The lab Environment

My environment consists of the following entities:

  • Primary Site
    • Oracle Linux 7.2 – UEK3 x86_64
    • Two RAC nodes: rac12pri1 and rac12pri2
    • Grid Infrastructure 12.1.0.2.161018 in /u01/app/12.1.0.2/grid
    • Single RDBMS home 12.1.0.2.161018 in /u01/app/oracle/product/12.1.0.2/dbhome_1
    • Non-container database NCDBA is administrator managed and running on both nodes
  • Standby Site
    • Oracle Linux 7.2 – UEK3 x86_64
    • Two RAC nodes: rac12sec1 and rac12sec2
    • Grid Infrastructure 12.1.0.2.161018 in /u01/app/12.1.0.2/grid
    • Single RDBMS home 12.1.0.2.161018 in /u01/app/oracle/product/12.1.0.2/dbhome_1
    • Standby database NCDBB is administrator managed and running on both nodes

I won’t focus on the creation of the RAC systems; I may have covered some of it in earlier blog posts and of course in the RAC Book.

I have deliberately kept it simple. Although most systems in real life use a dedicated (set of) NICs for Data Guard traffic, I decided against it: I didn’t want attention being drawn away from the Big Picture. Similarly I am not touching on the option to create a second SCAN that Oracle allows us to create from 12.1 onwards. If you are interested in these topics kindly refer to my other blog posts.

Creation of the Primary Database

After both RAC systems are set up it’s time to start with the creation of the primary database. This is easy:

dbca -silent -createDatabase -templateName RACDB.dbc \
-gdbName NCDBA -sysPassword ... -systemPassword ... -storageType ASM \
-diskGroupName DATA -recoveryGroupName RECO -sampleSchema true \
-totalMemory 4096 -dbsnmpPassword ... -nodeinfo rac12pri1,rac12pri2

The template referenced in “-templateName” is my own – I always create templates to be license compliant. I covered how to create your custom database template on this blog as well.

I won’t go into detail here about the naming of my databases in a Data Guard configuration. What I learned the hard way was not to use a DB_UNIQUE_NAME that reflects the role. Imagine everyone’s surprise when they connect to a database named STDBY operating in the primary role after a switchover… For lack of better ideas I went ahead and enumerated the databases: my primary database is NCDBA and the standby is NCDBB.

After the database is created, it is started automatically by DBCA.

[oracle@rac12pri1 ~]$ srvctl status database -db NCDBA
Instance NCDBA1 is running on node rac12pri1
Instance NCDBA2 is running on node rac12pri2
[oracle@rac12pri1 ~]$

However, the newly created database isn’t patched (this is a known issue documented on Mike Dietrich’s blog for example).

SQL> select name from v$database;

NAME
---------
NCDBA

SQL> select count(*) from dba_registry_sqlpatch;

  COUNT(*)
----------
         0

No way around it – time to call datapatch:

SQL> alter system set cluster_database=false scope=spfile sid='*';

System altered.

SQL> exit

...

[oracle@rac12pri1 ~]$ srvctl stop database -db NCDBA
[oracle@rac12pri1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 13:39:04 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size                  2929552 bytes
Variable Size             771755120 bytes
Database Buffers         2432696320 bytes
Redo Buffers               13844480 bytes
Database mounted.
Database opened.
SQL>  exit

...

[oracle@rac12pri1 ~]$ cd $ORACLE_HOME/OPatch/
[oracle@rac12pri1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Wed Dec 14 13:08:51 2016
Copyright (c) 2016, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_16313_2016_12_14_13_08_51/sqlpatch_invocation.log

Connecting to database...OK
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 24315824 (Database PSU 12.1.0.2.161018, Oracle JavaVM Component (OCT2016)):
  Installed in the binary registry only
Bundle series DBBP:
  ID 161018 in the binary registry and not installed in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  The following patches will be applied:
    24315824 (Database PSU 12.1.0.2.161018, Oracle JavaVM Component (OCT2016))
    24340679 (DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679))

Installing patches...
Patch installation complete.  Total patches installed: 2

Validating logfiles...
Patch 24315824 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24315824/20676808/24315824_apply_NCDBA_2016Dec14_13_09_26.log (no errors)
Patch 24340679 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24340679/20713212/24340679_apply_NCDBA_2016Dec14_13_09_30.log (no errors)
SQL Patching tool complete on Wed Dec 14 13:15:08 2016
[oracle@rac12pri1 OPatch]$

This concludes part 1 – the database is now set up and running on the primary cluster. In the next part I’m going to describe how to prepare the primary and standby cluster for the Data Guard setup.

Posted in 12c Release 1, Data Guard, Linux, RAC | 1 Comment »

GTT and Smart Scan – the importance of the append hint

Posted by Martin Bach on December 5, 2016

While writing the previous two posts about GTTs and Smart Scan, I stumbled across an interesting observation. When I started putting my thoughts to (virtual) paper, I thought it was going to be a short post. Well, it wasn’t to be, again. Anyway, you might find this interesting.

Initial Situation

If you read the previous posts this code example I used to populate the GTT might look familiar:

insert /*+ append */ into gtt select * from t4 where rownum < 400000;
 
commit;

In situations like this where I’m moving data I developed a habit of using the append hint. I guess I’ve been working on HCC tables a bit too much and therefore didn’t even make a conscious decision to use that hint. It was the right thing to do, as you will see next.

The full test harness is shown here again for reference:

SQL> !cat test.sql
set lines 120 tab off trimspool on verify off timing on

-- this statement is using the /*+ append */ hint
insert /*+ append using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

SQL>

When executing the script and looking at the output I didn’t see anything out of the ordinary. When checking for the “depth” of the Smart Scan, I regularly consult the “cell blocks processed by % layer” statistics to see if I lose blocks between code layers. It wouldn’t appear so:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell IO uncompressed bytes                                             546,136,064
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
...
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               317
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    330,153,984
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1

66,667 blocks in each code layer. This looks ok to me: the total number of physical reads recorded for this statement is 66,669. I certainly don’t want to endorse tuning by ratio, but in this case it sort-of makes sense: the offload efficiency is pretty high in my case:

SQL> @fsx.sql
Enter value for sql_text:
Enter value for sql_id: a4jrkbnsfgk1j

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
a4jrkbnsfgk1j      0 2363333961      1        .01      0 Yes         100.00 select /* using_append */ count(*), id from gtt where id in (        7
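
If fsx.sql is not to hand, a similar offload-efficiency figure can be approximated directly from V$SQL using its offload-related columns. A minimal sketch, reusing the SQL_ID from the output above:

select sql_id,
       io_cell_offload_eligible_bytes,
       io_interconnect_bytes,
       round(100 * (1 - io_interconnect_bytes /
                        nullif(io_cell_offload_eligible_bytes, 0)), 2) io_saved_pct
  from v$sql
 where sql_id = 'a4jrkbnsfgk1j';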

That should suffice for a recap of the previous posts.

No append hint-different outcome

Now imagine my surprise when I didn’t execute a script but rather typed the insert command in an interactive session. In the next example, created to reproduce the observation, I “forgot” to specify the append hint:

SQL> !cat test2.sql
-- not using the append hint
insert /* not_using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* not_using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

The difference between test.sql and test2.sql looks harmless enough – just a hint, or lack thereof. However, the result is interesting. Most executions of the above script resulted in Oracle reporting the following statistics:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    calls to get snapshot scn: kcmgss                                              462
STAT    calls to kcmgcs                                                                 28
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks processed by cache layer                                        67,026
STAT    cell blocks processed by data layer                                              1
STAT    cell blocks processed by txn layer                                               1
STAT    cell commit cache queries                                                   67,025
STAT    cell flash cache read hits                                                     633
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    545,000,688
STAT    cell physical IO interconnect bytes returned by smart scan             544,984,304
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           66,503
STAT    cleanouts only - consistent read gets                                       66,503
STAT    commit txn count during cleanout                                            66,503
...
STAT    consistent gets                                                            133,250
STAT    consistent gets direct                                                      66,504
STAT    consistent gets examination                                                 66,506
STAT    consistent gets examination (fastpath)                                      66,506
...
STAT    immediate (CR) block cleanout applications                                  66,503
...
STAT    physical read IO requests                                                    1,044
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               633
STAT    physical read total IO requests                                              1,044
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       748
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504
...

Huh? I was clearly surprised to see so many rows not making it to the higher layers (txn and data) after having been read by the cache layer. Further executions of the select statement within the same session do not necessarily improve this significantly, but I noticed storage index savings coming into play.

Speaking more scientifically, the listing immediately above shows that out of 67,026 blocks that were read by the cache layer only 1 (!) was processed by txn and data layers. As a direct result Smart Scan does not offer any real benefit: cell physical IO interconnect bytes returned by smart scan is actually ≥ cell physical IO bytes eligible for predicate offload.

Why? What?

There were a few red herrings in the statistic counters I saw:

  • Cleanout statistics have been reported by mystats
    • commit txn count during cleanout
    • cleanouts only – consistent read gets
    • immediate (CR) block cleanout applications
    • etc.
  • There are consistent gets examination that might be peeks at undo information
  • Lots of queries to the commit cache (cell commit cache queries) – but for most of my tests these queries didn’t return hits. Only 1 out of maybe 20 tests produced some cell transactions found in commit cache, but even then their number was not significant. I will provide an example towards the end of this post.

Further investigation

So this leads me to suspect that the blocks that were created with the append hint are “clean”, i.e. suitable for a consistent read on the cells without further need for processing. It would appear so, judging by this block dump (dump the block from the tempfile, not the datafile!):

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aaeb9f seq: 0x01 flg: 0x0c tail: 0xeb9f0601
frmt: 0x02 chkval: 0xd11e type: 0x06=trans data
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x632c00  csc: 0x00.64aaeb9e  itc: 3  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.019.0012beca  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f6250501074
===============
tsiz: 0x1f88
hsiz: 0x1e
pbl: 0x7f6250501074
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x746
avsp=0x728
tosp=0x728
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0x1b7d
0x14:pri[1]     offs=0x1772
0x16:pri[2]     offs=0x1367
0x18:pri[3]     offs=0xf5c
0x1a:pri[4]     offs=0xb51
0x1c:pri[5]     offs=0x746
block_row_dump:
tab 0, row 0, @0x1b7d
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
col  0: [ 2]  c1 50
col  1: [999]
 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...

This is in contrast to the blocks I dumped that have been created without the append hint. To make sure I’m comparing apples with apples these blocks have been created in a different session after having disconnected from the previous one:

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aab6bf seq: 0x00 flg: 0x0c tail: 0xb6bf0600
frmt: 0x02 chkval: 0x478e type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x629300  csc: 0x00.64aab6bf  itc: 2  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.01b.0012be5d  0x00c04427.d97f.27  ----    6  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f53b75e125c
===============
tsiz: 0x1fa0
hsiz: 0x1e
pbl: 0x7f53b75e125c
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x752
avsp=0x734
tosp=0x734
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0xf6c
0x14:pri[1]     offs=0x1379
0x16:pri[2]     offs=0x1786
0x18:pri[3]     offs=0x1b93
0x1a:pri[4]     offs=0x752
0x1c:pri[5]     offs=0xb5f
block_row_dump:
tab 0, row 0, @0xf6c
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
col  0: [ 4]  c3 0f 05 50
col  1: [999]
 31 34 30 34 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...

This supports the theory that the block I created without the append hint features some information that prevents it from being passed from cache to txn layer. I didn’t see active txn count during cleanout though, but commit txn count during cleanout and immediate (CR) block cleanout applications instead. So cleanout is definitely needed and performed.
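
For reference, the dumps shown above can be produced with the (undocumented) block dump command. This is a sketch only – the tempfile and block numbers are the ones from my dump headers, and since the command is undocumented you should verify the exact syntax on your own system first:

-- GTT blocks live in the temporary tablespace, hence TEMPFILE rather than DATAFILE
alter system dump tempfile 1 block 2305025;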

I stated in the third bullet point that the commit cache queries were not producing cache hits most of the time. I managed to get an execution of the select statement against the blocks created without the append hint, where at least some commit cache queries produced hits:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks helped by commit cache                                           6,981
STAT    cell blocks processed by cache layer                                        66,969
STAT    cell blocks processed by data layer                                          6,982
STAT    cell blocks processed by txn layer                                           6,982
STAT    cell commit cache queries                                                   66,968
STAT    cell flash cache read hits                                                     576
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    487,801,976
STAT    cell physical IO interconnect bytes returned by smart scan             487,785,592
STAT    cell scans                                                                       1
STAT    cell transactions found in commit cache                                      6,981
STAT    cleanout - number of ktugct calls                                           59,522
STAT    cleanouts only - consistent read gets                                       59,522
STAT    commit txn count during cleanout                                            59,522
...
STAT    physical read IO requests                                                      987
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               576
STAT    physical read total IO requests                                                987
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       723
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504

I couldn’t reproduce this at will though, and it wasn’t for lack of trying.

Summary

I am not entirely sure what causes the observed behaviour, especially with regard to the commit cache and/or minscn optimisation, but it evidently seems to be better to populate a GTT with the append hint if you want to make best use of Smart Scans against such segments. A very quick and dirty test against a heap table suggests that this is true for that segment type as well.

Posted in 12c Release 1, Exadata | Leave a Comment »

Smart Scanning GTTs – what about Flash Cache?

Posted by Martin Bach on November 29, 2016

Continuing the example of the previous blog post (is it possible to Smart Scan GTTs?), I am investigating whether Smart Scans against GTTs can benefit from Flash Cache, or “optimised reads”. The easiest way to see this in your session is to calculate the change in session counters based on v$mystat or v$sesstat. Other tools might also provide information about the amount of data read from Flash Cache, but quite often extra licenses are required …

Calculating the delta in session counters can be done in many ways, for example using Tanel’s session snapper. For clearly defined test cases where I can control beginning and end of the execution I prefer to use another great script for the purpose. My thanks go to Adrian Billington for providing the Oracle community with the most useful “mystats” package which I’m using here. You can – and should – get it from oracle-developer.net.
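
For completeness, the manual equivalent of what mystats automates is a before/after snapshot of V$MYSTAT joined to V$STATNAME: run a probe like the following before and after the statement of interest and calculate the difference. A minimal sketch:

select sn.name, ms.value
  from v$mystat ms
  join v$statname sn on (sn.statistic# = ms.statistic#)
 where sn.name in ('cell flash cache read hits',
                   'physical read requests optimized');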

Let’s check whether I can get flash cache read hits on GTT scans – which would help the performance of repeated queries against the segment. And since GTTs reside on TEMP, I had a few doubts as to whether flash cache read hits were possible.

What better than to test?

Testing helps remove ambiguity (and documenting the test result helps me remember things!), so here’s the code used:

set lines 120 tab off trimspool on verify off timing on

-- it will be crucially important to use the /*+ append */ hint 
-- as you will see in the next post about GTTs on Exadata
insert /*+ append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- hoping to trigger flash cache population here
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;

-- and measuring effect, if any

@mystats start s=s

set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off

@mystats stop t=1

Have a look at the code in mystats.sql to see what the options mean. Essentially I’m asking it to record session statistics only (start s=s) and after the execution list only those counters that have changed (t=1 where t is for threshold). I hacked the script a little to order by statistic name rather than the default.

The observation

It appears I can get Flash Cache hits. The relevant statistic name is “cell flash cache read hits”, and the fact that physical IO was optimised can also be seen in “physical read requests optimized” and “physical read total bytes optimized” amongst others. Here is the relevant output from mystats:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
STAT    cell flash cache read hits                                                     315
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
STAT    cell scans                                                                       1
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               315
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    329,252,864
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...

I usually check the cell blocks processed by % layer statistics first to get an idea about the “depth” of the Smart Scan: losing blocks in the transaction and data layers wouldn’t be great. There’s been 1 Smart Scan against my segment (the GTT isn’t partitioned) recorded in “cell scans”, so that covers that aspect as well.

Out of the 523 “physical read IO requests”, 315 were optimised, so that’s a pretty good ratio of approximately 60% of the IO requests coming from Flash Cache. The other statistics represent the same ratio in bytes rather than I/O requests.

Posted in 12c Release 1, Exadata | Leave a Comment »

Little things worth knowing: Can I Smart Scan a Global Temporary Table?

Posted by Martin Bach on November 25, 2016

A question that has come up while preparing to deliver another Exadata Optimisation Class is whether you can offload queries against Global Temporary Tables (GTT) or not. There are subtle differences between heap tables and GTTs such as how statistics are handled and where they are stored that merit a little investigation.

I have used the lab environment on the X3-2 (Exadata 12.1.2.3.0, 12.1.0.2.160419 RDBMS) to test if I can Smart Scan GTTs. There are some enhancements in 12c regarding GTTs that are mentioned on the Optimiser blog here:

https://blogs.oracle.com/optimizer/entry/gtts_and_upgrading_to_oracle

The Setup

I have used a brand new, 12c non-CDB for these tests. The SGA_TARGET is deliberately set to something really small so as to provoke Smart Scans without having to resort to setting _serial_direct_read to “always” in my session.

First of all, I needed a GTT for this purpose. I have a few scripts that create large tables for testing, and T4 will serve as my basis. Here is my setup:

SQL> create global temporary table gtt on commit preserve rows 
  2  as select * from t4 where 1 = 0;

Table created.

SQL> alter session enable parallel dml;

Session altered.

SQL> insert /*+ parallel(4) */ into gtt
  2  select /*+ parallel(4) */ * from t4
  3    where rownum < 400000;

399999 rows created.

SQL> commit

Commit complete.

This might be the time to point out that the GTT has two types of statistics in 12c – shared and session. Again, please refer to the Optimiser blog post for more information about private session statistics. I first looked at dba_tables for num_rows etc. but found the column empty. Using a query I found on oracle-base.com in Tim’s “Session-Private Statistics for Global Temporary Tables in Oracle Database 12c Release 1” article I detected some statistics in the end:

SQL> select table_name, object_type, num_rows, scope
  2  from user_tab_statistics where table_name = 'GTT';

TABLE_NAME                     OBJECT_TYPE    NUM_ROWS SCOPE
------------------------------ ------------ ---------- -------
GTT                            TABLE                   SHARED
GTT                            TABLE            399999 SESSION

Gathering session-level statistics is the default behaviour, controlled by a global optimiser preference:

SQL> SELECT DBMS_STATS.get_prefs('GLOBAL_TEMP_TABLE_STATS') FROM dual;

DBMS_STATS.GET_PREFS('GLOBAL_TEMP_TABLE_STATS')
--------------------------------------------------
SESSION
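
Should shared statistics be preferred, the default can be changed via DBMS_STATS. A hedged sketch – I did not use this for the tests in this post:

-- switch the default for GTT statistics back to shared ...
exec dbms_stats.set_global_prefs('GLOBAL_TEMP_TABLE_STATS', 'SHARED')

-- ... or set the preference for the GTT only
exec dbms_stats.set_table_prefs(user, 'GTT', 'GLOBAL_TEMP_TABLE_STATS', 'SHARED')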

The Testcase

With the table set up it is now time to test the Smart Scan. In addition to a SQL trace I request a NSMTIO trace to understand Oracle’s direct path read decision better.

SQL> alter session set events 'sql_trace level 8';

Session altered.

SQL> alter session set events 'trace[NSMTIO] disk=highest';

Session altered.

SQL> select count(*), id from gtt where id in (9244048,2529293) group by id;

  COUNT(*)         ID
---------- ----------
         1    2529293
         1    9244048

SQL> exit

Looking at the trace file I can see the following output for this particular statement:

=====================
PARSING IN CURSOR #140137295360240 len=70 dep=0 uid=78 oct=3 lid=78 tim=1228663420381 hv=2599214644 ad='1398f2ab0' sqlid='9p960vqdftrjn'
select count(*), id from gtt where id in (9244048,2529293) group by id
END OF STMT
PARSE #140137295360240:c=2000,e=1954,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2363333961,tim=1228663420380
EXEC #140137295360240:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2363333961,tim=1228663420505
WAIT #140137295360240: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=163899 tim=1228663420582
NSMTIO: kcbism: islarge 1 next 0 nblks 67197 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67197 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67197 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 3, objd: 6468608, objn: 163899
ckpt: 1, nblks: 67197, ntcache: 3, ntdist:3
Direct Path for pdb 0 tsn 3  objd 6468608 objn 163899
Direct Path 1 ckpt 1, nblks 67197 ntcache 3 ntdist 3
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 101
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 6468608, Object_Size = 67197 blocks
SqlId = 9p960vqdftrjn, plan_hash_value = 2363333961, Partition# = 0
WAIT #140137295360240: nam='reliable message' ela= 1053 channel context=5884629416 channel handle=5884574224 broadcast message=5885746648 obj#=163899 tim=1228663421984
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 122 name|mode=1380909062 2=65585 0=1 obj#=163899 tim=1228663422173
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 96 name|mode=1380909057 2=65585 0=2 obj#=163899 tim=1228663422323
WAIT #140137295360240: nam='cell single block physical read' ela= 308 cellhash#=2133459483 diskhash#=2964949887 bytes=8192 obj#=163899 tim=1228663423031
WAIT #140137295360240: nam='cell smart table scan' ela= 153 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663423651
WAIT #140137295360240: nam='cell smart table scan' ela= 152 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663424114
WAIT #140137295360240: nam='cell smart table scan' ela= 145 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663424571
WAIT #140137295360240: nam='gc cr grant 2-way' ela= 135 p1=201 p2=2273408 p3=7 obj#=163899 tim=1228663425857
WAIT #140137295360240: nam='cell single block physical read' ela= 257 cellhash#=379339958 diskhash#=2689692736 bytes=8192 obj#=163899 tim=1228663426158
WAIT #140137295360240: nam='cell single block physical read' ela= 224 cellhash#=2133459483 diskhash#=3132070477 bytes=8192 obj#=163899 tim=1228663426466
WAIT #140137295360240: nam='cell smart table scan' ela= 251 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663426863
WAIT #140137295360240: nam='cell smart table scan' ela= 268 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427213
WAIT #140137295360240: nam='cell smart table scan' ela= 225 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663427514
WAIT #140137295360240: nam='cell smart table scan' ela= 85 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663427720
WAIT #140137295360240: nam='cell smart table scan' ela= 13 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427755
...

There is a simple conclusion: yes, there is a Smart Scan. I can’t see a massive difference between the NSMTIO trace for a GTT or a heap table. Since direct path read decisions are statistics driven (check the parameter _direct_read_decision_statistics_driven) and there weren’t any shared statistics on the GTT, I assume the single block reads are related to reading object statistics. OBJ# 163899 is the object_id of the GTT.

What is interesting is the use of “enq: RO – fast object reuse” prior to the start of the Smart Scan. This surprised me a little and I wasn’t expecting it. Here’s the same trace combination for a heap table for comparison:

=====================
PARSING IN CURSOR #140120190960712 len=91 dep=0 uid=78 oct=3 lid=78 tim=1226766583470 hv=2817388421 ad='13483d3f0' sqlid='g8948rkmyvvw5'
select /* heaptest001 */ count(*), id from heapt1 where id in (6460386, 6460390) group by id
END OF STMT
PARSE #140120190960712:c=1000,e=1925,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3353166567,tim=1226766583470
EXEC #140120190960712:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3353166567,tim=1226766583599
WAIT #140120190960712: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=163896 tim=1226766583653
NSMTIO: kcbism: islarge 1 next 0 nblks 67024 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67024 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67024 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 9, objd: 163896, objn: 163896
ckpt: 1, nblks: 67024, ntcache: 482, ntdist:482
Direct Path for pdb 0 tsn 9  objd 163896 objn 163896
Direct Path 1 ckpt 1, nblks 67024 ntcache 482 ntdist 482
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 109
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 163896, Object_Size = 67024 blocks
SqlId = g8948rkmyvvw5, plan_hash_value = 3353166567, Partition# = 0
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 351 name|mode=1263468550 2=65585 0=2 obj#=163896 tim=1226766584359
WAIT #140120190960712: nam='reliable message' ela= 1372 channel context=5884629416 channel handle=5252488952 broadcast message=5885771824 obj#=163896 tim=1226766585825
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 88 name|mode=1263468550 2=65585 0=1 obj#=163896 tim=1226766585972
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 90 name|mode=1263468545 2=65585 0=2 obj#=163896 tim=1226766586140
WAIT #140120190960712: nam='cell smart table scan' ela= 201 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766587377
WAIT #140120190960712: nam='cell smart table scan' ela= 137 cellhash#=379339958 p2=0 p3=0 obj#=163896 tim=1226766589330
WAIT #140120190960712: nam='cell smart table scan' ela= 127 cellhash#=2133459483 p2=0 p3=0 obj#=163896 tim=1226766593585
WAIT #140120190960712: nam='cell smart table scan' ela= 215 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766602986

This is the more familiar picture: reliable message, enq: KO – fast object checkpoint followed by cell smart table scan.

The Verdict

Quite simple this time: yes, you can use Smart Scans on GTTs. Additionally there are a few caveats around the use of GTTs when you are migrating your database to 12.1, but you can read about them in the Optimiser blog post linked above.

Posted in 12c Release 1, Exadata | 1 Comment »

Building an RPM for the Oracle database on Oracle Linux 7

Posted by Martin Bach on October 28, 2016

Thinking about automation a lot, especially in the context of cloud computing, I have decided to create a small series of related posts that hopefully help someone deploying environments in a fully automated way. As my colleague @fritshoogland has said many times: the only way to deploy database servers (or any other server for that matter) in a consistent way is to do it via software. No matter how hard you try to set up 10 identical systems manually, there will be some, maybe ever so subtle, differences between them. And with the cloud you probably have 42 systems or more to deal with! In this context, my first post could be a building block: the provisioning of the Oracle RDBMS in the form of an RPM.

The idea

In a nutshell, I would like to

  • Create a “golden image” of my database installation
  • Make this available for deployment via NFS
  • Create an RPM that allows me to deploy the “golden image” on the server I want to install and call clone.pl to complete the process
  • Execute the root scripts (orainstRoot.sh, root.sh)

This is actually quite easy to do! The oracle-validated RPM, and nowadays the oracle-rdbms-server-<release>-preinstall.x86_64 packages, simplify the task greatly as they can pull in dependent packages, set sensible default parameters and generally perform a lot of the tedious setup work for you. All I have to do in my RPM is to enforce that the system runs Oracle Linux 7 and pull the preinstall RPM in if it is not yet installed. Since there is no 12c RDBMS server for i686 any more, this entire post is implicitly about Linux x86-64.

The Implementation

I am using a development system (linuxdev in this example) to build the RPM. It’s an Oracle Linux 7.2 system with all upgrades up to 24/10/2016. The first thing to do is to install the RPM dev tools and their dependencies. As always, you use yum to do so.

[root@linuxdev ~]# yum info rpmdevtools
Loaded plugins: ulninfo
Available Packages
Name        : rpmdevtools
Arch        : noarch
Version     : 8.3
Release     : 5.0.1.el7
Size        : 96 k
Repo        : ol7_latest/x86_64
Summary     : RPM Development Tools
URL         : https://fedorahosted.org/rpmdevtools/
Licence     : GPLv2+ and GPLv2
Description : This package contains scripts and (X)Emacs support files to aid in
            : development of RPM packages.
            : rpmdev-setuptree    Create RPM build tree within user's home directory
            : rpmdev-diff         Diff contents of two archives
            : rpmdev-newspec      Creates new .spec from template
            : rpmdev-rmdevelrpms  Find (and optionally remove) "development" RPMs
            : rpmdev-checksig     Check package signatures using alternate RPM keyring
            : rpminfo             Print information about executables and libraries
            : rpmdev-md5/sha*     Display checksums of all files in an archive file
            : rpmdev-vercmp       RPM version comparison checker
            : spectool            Expand and download sources and patches in specfiles
            : rpmdev-wipetree     Erase all files within dirs created by rpmdev-setuptree
            : rpmdev-extract      Extract various archives, "tar xvf" style
            : rpmdev-bumpspec     Bump revision in specfile
            : ...and many more.

[root@linuxdev ~]# yum install rpmdevtools

...

The development tools contain a lot of useful scripts; one I consider essential is rpmdev-setuptree, which sets up the relevant directory structure. Granted, the rpmdev tools are geared towards compiling software from source, optionally applying patches and then creating SRPMs and RPMs. RPM wasn’t designed for the purpose of packaging an Oracle database, but it can still be made to work. You can work around the size restriction using the %post hook in the SPEC file, as you will see later in this post. But I’m getting ahead of myself.

First I’m creating the RPM tree structure:

[oracle@linuxdev ~]$ rpmdev-setuptree
[oracle@linuxdev ~]$ ls -l rpmbuild/
total 0
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 BUILD
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 RPMS
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 SOURCES
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 SPECS
drwxr-xr-x 2 oracle oinstall 6 Oct 24 10:50 SRPMS
[oracle@linuxdev ~]$

Using another tool named rpmdev-newspec I create a new, empty SPEC file:

[oracle@linuxdev SPECS]$ rpmdev-newspec oracle_12102_jul16_psu_ojvm.spec
oracle_12102_jul16_psu_ojvm.spec created; type minimal, rpm version >= 4.11.
[oracle@linuxdev SPECS]$ ls -lrt
total 4
-rw-r--r-- 1 oracle oinstall 338 Oct 24 10:53 oracle_12102_jul16_psu_ojvm.spec
[oracle@linuxdev SPECS]$

Completing the SPEC file is easy if you know how to :) To save you the pain I put a stub together that is far from perfect, but should give you an idea. This is just a demo; in particular, the code lacks all (error, space, etc.) checking that really should be in there.

[oracle@linuxdev SPECS]$ cat oracle_12102_jul16_psu_ojvm.spec
Name:           oracle_12102_jul16_psu_ojvm
Version:        0
Release:        1%{?dist}
Summary:        a spec file for Oracle 12.1.0.2 with patch 23615289 (combo for 23054246 and 23177536)
Group:          Application/Databases
License:        Commercial
URL:            https://martincarstenbach.wordpress.com/
Requires:       oraclelinux-release >= 7:7.0, oracle-rdbms-server-12cR1-preinstall

%description
Oracle 12.1.0.2 EE patched with the July 2016 PSU and OJVM patch
Requires Oracle Linux 7, depends on the preinstall RPM to set up users, groups and defaults

ONLY A PROTOTYPE, USED FOR DEMONSTRATION PURPOSES ONLY

%files
%defattr(-,root,root,-)

%post
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1

if [ -d "$ORACLE_HOME" ]; then
        /usr/bin/echo "ORACLE_HOME directory $ORACLE_HOME exists-aborting";
        exit 127
fi

/usr/bin/echo "cloning the oracle home now to $ORACLE_HOME"
/usr/bin/mkdir -p $ORACLE_HOME

/usr/bin/umount /mnt
/usr/bin/mount -t nfs linuxdev:/u01/nfs_export /mnt

/usr/bin/tar --gzip -xf /mnt/oracle/12.1.0.2/oracle_12102_jul16_psu_ojvm.tar.gz -C $ORACLE_HOME

/usr/bin/chown -R oracle:oinstall /u01

/usr/bin/su - oracle -c "cd $ORACLE_HOME/clone/bin && ./clone.pl ORACLE_HOME=$ORACLE_HOME ORACLE_BASE=/u01/app/oracle -defaultHomeName"

/u01/app/oraInventory/orainstRoot.sh
$ORACLE_HOME/root.sh

umount /mnt

%changelog
* Mon Oct 24 2016 Martin Bach 
- initial version

Again, at the risk of repeating myself, this is just a stub and needs to be extended to include error handling and other things that we consider good coding practice. You should also check whether there is sufficient space in the $ORACLE_HOME directory, and so on – you get the idea. Also, if you create another SPEC file for a 12c installation you currently can’t install a new Oracle Home, as the path is hard coded to dbhome_1.

The script – which is the part in the %post section – is so simple it should be self-explanatory. It will be passed to /bin/sh and executed as part of the installation process.

With the SPEC file in place you can try and build the RPM. Funny enough, the rpmbuild tool is used for that:

[oracle@linuxdev SPECS]$ rpmbuild -ba oracle_12102_jul16_psu_ojvm.spec
Processing files: oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64
Provides: oracle_12102_jul16_psu_ojvm = 0-1.el7 oracle_12102_jul16_psu_ojvm(x86-64) = 0-1.el7
Requires(interp): /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1
Requires(post): /bin/sh
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/oracle/rpmbuild/BUILDROOT/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64
Wrote: /home/oracle/rpmbuild/SRPMS/oracle_12102_jul16_psu_ojvm-0-1.el7.src.rpm
Wrote: /home/oracle/rpmbuild/RPMS/x86_64/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.4xSozZ
+ umask 022
+ cd /home/oracle/rpmbuild/BUILD
+ /usr/bin/rm -rf /home/oracle/rpmbuild/BUILDROOT/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64
+ exit 0

This command has created a Source RPM, and the RPM file to be used, as you can see here:

[oracle@linuxdev SPECS]$ ls -l ../RPMS/x86_64/
total 4
-rw-r--r-- 1 oracle oinstall 2832 Oct 26 16:24 oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
[oracle@linuxdev SPECS]$ rpm -qpil  --requires ../RPMS/x86_64/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm
Name        : oracle_12102_jul16_psu_ojvm
Version     : 0
Release     : 1.el7
Architecture: x86_64
Install Date: (not installed)
Group       : Application/Databases
Size        : 0
License     : Commercial
Signature   : (none)
Source RPM  : oracle_12102_jul16_psu_ojvm-0-1.el7.src.rpm
Build Date  : Wed 26 Oct 2016 16:24:02 BST
Build Host  : linuxdev.example.com
Relocations : (not relocatable)
URL         : https://martincarstenbach.wordpress.com/
Summary     : a spec file for Oracle 12.1.0.2 with patch 23615289 (combo for 23054246 and 23177536)
Description :
Oracle 12.1.0.2 EE patched with the July 2016 PSU and OJVM patch
Requires Oracle Linux 7, depends on the preinstall RPM to set up users, groups and defaults

ONLY A PROTOTYPE, USED FOR DEMONSTRATION PURPOSES ONLY
oraclelinux-release >= 7:7.0
oracle-rdbms-server-12cR1-preinstall
/bin/sh
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadIsXz) <= 5.2-1
(contains no files)

You require a “gold image” binary copy for cloning as well. I am assuming it is in place and NFS-accessible in the location specified by the %post scriptlet.

You can refer to Frits Hoogland’s post for more detail about creating the gold image: https://fritshoogland.wordpress.com/2010/07/03/cloning-your-oracle-database-software-installation/. There are probably lots of other posts out there describing the same procedure.

Installation

Here is an example install session, executed as root on a new box I created: server9.

[root@server9 ~]# rpm -ihv /tmp/oracle_12102_jul16_psu_ojvm-0-1.el7.x86_64.rpm

Preparing...                          ################################# [100%]
Updating / installing...
   1:oracle_12102_jul16_psu_ojvm-0-1.e################################# [100%]
cloning the oracle home now to /u01/app/oracle/product/12.1.0.2/dbhome_1
./runInstaller -clone -waitForCompletion  "ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1" 
"ORACLE_BASE=/u01/app/oracle" -defaultHomeName  -silent -paramFile 
/u01/app/oracle/product/12.1.0.2/dbhome_1/clone/clone_oraparam.ini
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 500 MB.   Actual 9204 MB    Passed
Checking swap space: must be greater than 500 MB.   Actual 767 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2016-10-26_05-03-05PM. 
Please wait ...You can find the log of this install session at:
 /u01/app/oraInventory/logs/cloneActions2016-10-26_05-03-05PM.log
..................................................   5% Done.
..................................................   10% Done.
..................................................   15% Done.
..................................................   20% Done.
..................................................   25% Done.
..................................................   30% Done.
..................................................   35% Done.
..................................................   40% Done.
..................................................   45% Done.
..................................................   50% Done.
..................................................   55% Done.
..................................................   60% Done.
..................................................   65% Done.
..................................................   70% Done.
..................................................   75% Done.
..................................................   80% Done.
..................................................   85% Done.
..........
Copy files in progress.

Copy files successful.

Link binaries in progress.

Link binaries successful.

Setup files in progress.

Setup files successful.

Setup Inventory in progress.

Setup Inventory successful.

Finish Setup successful.
The cloning of OraHome1 was successful.
Please check '/u01/app/oraInventory/logs/cloneActions2016-10-26_05-03-05PM.log' for more details.

Setup Oracle Base in progress.

Setup Oracle Base successful.
..................................................   95% Done.

As a root user, execute the following script(s):
        1. /u01/app/oraInventory/orainstRoot.sh
        2. /u01/app/oracle/product/12.1.0.2/dbhome_1/root.sh



..................................................   100% Done.
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
Check /u01/app/oracle/product/12.1.0.2/dbhome_1/install/root_server9_2016-10-26_17-04-05.log 
for the output of root script

The “post” script that is executed spawns a shell (/bin/sh) and executes a script stored in /var/tmp. In my case, these are the contents – pretty much exactly the %post section of the SPEC file:

[root@server9 ~]# cat /var/tmp/rpm-tmp.tZ4pYf
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1

if [ -d "$ORACLE_HOME" ]; then
        /usr/bin/echo "ORACLE_HOME directory $ORACLE_HOME exists-aborting";
        exit 127
fi

/usr/bin/echo "cloning the oracle home now to $ORACLE_HOME"
/usr/bin/mkdir -p $ORACLE_HOME

/usr/bin/umount /mnt
/usr/bin/mount -t nfs linuxdev:/u01/nfs_export /mnt

/usr/bin/tar --gzip -xf /mnt/oracle/12.1.0.2/oracle_12102_jul16_psu_ojvm.tar.gz -C $ORACLE_HOME

/usr/bin/chown -R oracle:oinstall /u01

/usr/bin/su - oracle -c "cd $ORACLE_HOME/clone/bin && ./clone.pl ORACLE_HOME=$ORACLE_HOME ORACLE_BASE=/u01/app/oracle -defaultHomeName"

/u01/app/oraInventory/orainstRoot.sh
$ORACLE_HOME/root.sh

umount /mnt
[root@server9 ~]#

And that’s it! One of the slightly more complex things was to ensure that this can be installed on Oracle Linux 7 only. One way to do this is to add the oraclelinux-release package as a dependency. I had to fiddle around with it a bit as it uses the epoch field (which seems rare), and not just the version. After I worked it out I managed to enforce the release. On my Oracle Linux 6 system the RPM could not be installed:

[root@linux6 ~]# rpm -ihv /tmp/oracle_12102_jul16_psu_ojvm-1-0.el7.x86_64.rpm
error: Failed dependencies:
	oraclelinux-release >= 7:7.0 is needed by oracle_12102_jul16_psu_ojvm-1-0.el7.x86_64

If you are now thinking of using the epoch:version to check for the oraclelinux-release on Oracle Linux 6, don’t. This is the current oraclelinux-release RPM taken from the “latest” channel in public-yum:

[oracle@linux6 ~]$ rpm -qpi --provides ./oraclelinux-release-6Server-8.0.3.x86_64.rpm
Name        : oraclelinux-release          Relocations: (not relocatable)
Version     : 6Server                           Vendor: Oracle America
Release     : 8.0.3                         Build Date: Wed May 11 11:51:57 2016
Install Date: (not installed)               Build Host: x86-ol6-builder-05.us.oracle.com
Group       : System Environment/Base       Source RPM: oraclelinux-release-6Server-8.0.3.src.rpm
Size        : 49976                            License: GPL
Signature   : RSA/8, Wed May 11 11:44:59 2016, Key ID 72f97b74ec551f03
Summary     : Oracle Linux 6 release file
Description :
System release and information files
config(oraclelinux-release) = 6:6Server-8.0.3
oraclelinux-release = 6:6Server-8.0.3
oraclelinux-release(x86-64) = 6:6Server-8.0.3

So for 12c on Oracle Linux 6 you probably need to replace “Requires: oraclelinux-release >= 7:7.0” with “Requires: oraclelinux-release >= 6:6Server-2.0” (Oracle Linux 6 update 2 or newer, according to the Linux installation guide).

Posted in 12c Release 1, Linux | Tagged: , | 5 Comments »

Little things worth knowing: when a transient ASM disk failure cannot be fixed in time

Posted by Martin Bach on October 7, 2016

In the previous blog post I used libvirt and KVM in my lab environment to simulate a transient disk failure and show how to recover from it. This post takes the example a step further: I am simulating another disk failure, but this time I won’t pretend I can fix the issue and put the disk back. In other words, I simulate the effect of the disk_repair_time hitting zero.

Most of what I am covering here is an extension of the previous post. I’ll mention the main details here for your benefit, but would like to invite you to refer to the previous post for more detail.

The idea is to show you the output of the ASM alert.log and the effect of the lost disk in the V$ views.

As with the previous post, the code examples in this one are for demonstration purposes only!

The setup in a nutshell

I am using Oracle Linux 7.2 with UEK4 as the host system; KVM and libvirt are responsible for the guests (virtual machines). The guest VM used for this example is named asmtest, and uses Oracle Linux 7.2 as well. There are 9 ASM disks – 8 for +DATA featuring normal redundancy in 2 failure groups. I added +RECO, with external redundancy, for the sake of completeness. This post is about +DATA. To keep it simple I used an Oracle Restart configuration patched to the July 2016 PSU.
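
The countdown you will see in the alert.log below is governed by the disk_repair_time disk group attribute. A minimal sketch for checking and, if required, changing it – the 10 minute value matches the 600 second timer in my lab and is only an example:

select g.name diskgroup, a.name attribute, a.value
  from v$asm_diskgroup g
  join v$asm_attribute a on (a.group_number = g.group_number)
 where g.name = 'DATA'
   and a.name = 'disk_repair_time';

alter diskgroup DATA set attribute 'disk_repair_time' = '10m';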

Removing the disk

As with the previous example I am using libvirt to remove a “LUN” temporarily from the guest. And sure enough, the VM picks this up. This is the relevant output obtained via journalctl -f

Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Add. Sense: Logical unit not supported

In the previous post I keep referring to, I copied the part of the ASM instance’s alert.log that showed how the disk repair timer was ticking down. This time I am simulating the case where – for whatever reason – the transient failure could not be fixed. In that case, this is what you would see in the alert.log:

2016-09-29 15:38:21.752000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1
2016-09-29 15:41:25.379000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (416) secs on ASM inst 1
2016-09-29 15:44:29.012000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (232) secs on ASM inst 1
2016-09-29 15:47:32.643000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (48) secs on ASM inst 1
2016-09-29 15:50:36.259000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (0) secs on ASM inst 1

The last line in the above output is definitely NOT what you want to see. If at all possible, you should fix the problem that causes DATA_0001 (in this example) to be dropped. If you see the message about the disk being dropped you are facing the inevitable rebalance operation. Here is the continued output from the ASM alert.log:

WARNING: PST-initiated drop of 1 disk(s) in group 1(.4232658126))
SQL> alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 40 for pid 23, osid 3216
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 41 for pid 16, osid 2701
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
SUCCESS: alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
SUCCESS: PST-initiated drop disk in group 1(4232658126))
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xfc493cce (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=20, OS id=3655 
NOTE: assigning ARB0 to group 1/0xfc493cce (DATA) with 1 parallel I/O
2016-09-29 15:50:40.257000 +01:00
NOTE: restored redundancy of control and online logs in DATA
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: restored redundancy of control and online logs in DATA
2016-09-29 15:50:51.655000 +01:00
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 42 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:54.647000 +01:00
GMON updating for reconfiguration, group 1 at 43 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:57.571000 +01:00
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 44 for pid 16, osid 2701
GMON querying group 1 at 45 for pid 16, osid 2701
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
2016-09-29 15:51:03.589000 +01:00
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0xfc493cce (DATA)

The last line indicates that the rebalance operation is complete. My disk group was almost empty, so the rebalance operation took almost no time. But indeed, when you check V$ASM_DISK, the disk is gone:

SQL> r
  1  select name,path,disk_number,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME       PATH            DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP  REPAIR_TIMER
---------- --------------- ----------- ------- ------------ ------- -------- ---------- ------------
DATA_0000  /dev/asm-disk02           0 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0002  /dev/asm-disk03           2 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0003  /dev/asm-disk06           3 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0004  /dev/asm-disk07           4 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0005  /dev/asm-disk08           5 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0006  /dev/asm-disk04           6 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0007  /dev/asm-disk05           7 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0

7 rows selected.

SQL> select count(*), failgroup from v$asm_disk where name like 'DATA%' group by failgroup;

  COUNT(*) FAILGROUP
---------- ----------
         3 BATCH1
         4 BATCH2

My system is fully operational, and the rebalance did not run into any space problems. Space problems are the last thing you want to have when rebalancing. I did some research about this subject earlier and documented it in a blog post.
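
While the rebalance was running, its progress could have been followed from another session via V$ASM_OPERATION. A minimal sketch:

-- one row per active operation; no rows means the rebalance has completed
select group_number, operation, state, power, sofar, est_work, est_minutes
  from v$asm_operation;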

In a future blog post I am going to investigate what happens when a partner disk of DATA_0001 fails while the repair timer is ticking down.

Posted in 12c Release 1, Automatic Storage Management, KVM, Linux | Leave a Comment »