Martins Blog

Trying to explain complex things in simple terms

Archive for the ‘KVM’ Category

New Events for Data Guard and Synchronous Redo Transport in 12c (1)

Posted by Martin Bach on February 24, 2017

I may have said it before but I consider presenting and teaching a great way to expand one’s knowledge: first of all it requires me to really understand a subject. Secondly, when presenting, you get lots of interesting questions that can turn into blog posts like this one.

Lately I have been asked about the impact of synchronous log shipping to a physical standby database. I was sure there was an effect to be observed, depending most likely on the network latency between systems but I didn’t have any evidence I could pull out of the hat to back up my thoughts. So what better than trying! I also read that some of the events have changed in 12c, and wanted to make them visible. My environment is based on the 2 node RAC primary/2 node RAC standby configuration I wrote about in my previous posts.

Since their initial setup I upgraded the cluster to 12.1.0.2.170117 for Clusterware and RDBMS.

I am planning to share my findings in two posts: the first one (the one you are reading) will present the baseline, the second one to follow shows how performance can be impacted by poor network performance. I guess the second one will be a bit longer …

Background

Data Guard allows the administrator to define the databases in a Data Guard configuration to be in one of three possible modes. The first one is Maximum Performance, the second Maximum Availability and finally Maximum Protection. All three modes are covered in the official documentation set and MOS. Without saying it very scientifically:

  • Maximum Performance uses asynchronous log shipping and is the default. The performance of the standby database (or rather lack thereof) should not impact the performance of the primary. At a cost: it is not guaranteed that redo is actually written to the standby when the commit command returns.
  • Maximum Protection allows you to enforce a Zero Data Loss (ZDL) strategy but also at a cost: The primary will shut down if it can’t transmit redo information to the standby. That is how you prevent data loss!
  • Maximum Availability is the middle ground and I’ll focus on that protection mode here.

Make sure to read the documentation and understand the implications of each protection mode when assessing the most suitable mode for your environment!

Background: New Events

Before going into details I should probably point out that there are some new Data Guard events. Hurray! More events and more meaningful names, great. Just one caveat-have a look at the output from my 12c database:

select name, wait_class from v$event_name
where name in (
'ARCH Remote Write','ASYNC Remote Write','Redo Transport Attach',
'Redo Transport Close','Redo Transport Detach','Redo Transport Open',
'Redo Transport Ping','Redo Transport Slave Shutdown','Redo Transport Slave Startup',
'Redo Writer Remote Sync Complete', 'Redo Writer Remote Sync Notify',
'Remote SYNC Ping','SYNC Remote Write');

NAME                                               WAIT_CLASS
-------------------------------------------------- ----------------------------------------
ASYNC Remote Write                                 Other
SYNC Remote Write                                  Other
ARCH Remote Write                                  Other
Redo Transport Attach                              Other
Redo Transport Detach                              Other
Redo Transport Open                                Other
Redo Transport Close                               Other
Redo Transport Ping                                Other
Remote SYNC Ping                                   Other
Redo Transport Slave Startup                       Other
Redo Transport Slave Shutdown                      Other
Redo Writer Remote Sync Notify                     Other
Redo Writer Remote Sync Complete                   Other

13 rows selected.

Compare this to the pre 12c events (taken from an 11.2.0.3 database since they are mostly gone in 12c)

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                                               WAIT_CLASS
-------------------------------------------------- --------------------
ARCH wait on ATTACH                                Network
LNS wait on ATTACH                                 Network
LNS wait on SENDREQ                                Network
LNS wait on DETACH                                 Network
ARCH wait on SENDREQ                               Network
ARCH wait on DETACH                                Network
LNS wait on LGWR                                   Network
LGWR wait on LNS                                   Network
LGWR-LNS wait on channel                           Other

9 rows selected.

For the sake of completeness here are those events that remained in 12c:

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                           WAIT_CLASS
------------------------------ ------------------------------
LNS wait on LGWR               Network
LGWR wait on LNS               Network
LGWR-LNS wait on channel       Other

Bugger. The new events are all in the “Other” wait class. My initial attempts at filtering information from v$event_histogram based on events with the “Network” class had little success. I guess friends of the OEM performance pages might also find that interesting. If you wonder from where I pulled these events – I used a white paper named “Best Practices for Synchronous Redo Transport” as reference.

Testing: The Environment

As you read in the introduction, this environment is based on a two node RAC primary -> two node RAC standby configuration:

DGMGRL> show configuration verbose

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS
DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBA1
    NCDBA2

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 177.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBA' LogXptMode
  LogXptMode = 'async'
DGMGRL> show database 'NCDBB' LogXptMode
  LogXptMode = 'async'

The configuration is currently set to “Maximum Performance”, databases ship redo asynchronously.

All systems are patched to the January 2017 Proactive Bundle Patch:

[oracle@rac12pri1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
24917972;Database PSU 12.1.0.2.170117, Oracle JavaVM Component (JAN2017)
24732082;Database Patch Set Update : 12.1.0.2.170117 (24732082)
24828633;OCW Patch Set Update : 12.1.0.2.170117 (24828633)

OPatch succeeded.

Testing part I: The Baseline

I am using Swingbench’s Order Entry benchmark to generate a little bit of load. For the avoidance of doubt: I am not interested in a comparison between asynchronous and synchronous log shipping. I would like to see the effect caused by the difference in latency in the form of hugely increased waits. In my first test I am running with the defaults of my VM (the NIC uses virtio drivers), no change to SDU or send/receive buffers.

$ date
Fri Feb 17 13:58:58 UTC 2017
$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxperf.xml -rt 00:10
Author  :        Dominic Giles
Version :        2.5.0.971

Results will be written to results_20_maxperf.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

1:59:55 PM      20      3852    255

On the apply side this translates to the following figures (the average apply rate is misleading and should be ignored; it captures all apply since MRP0 was started)

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 713.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

There is no transport lag, and no apply lag either. When checking for waits on background processes (using Tanel Poder’s ashtop – which requires you to have the Diagnostic Pack license as it access v$active_session_history) I noticed the following:

SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 09:02','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 09:07','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                                          EVENT                                    WAIT_CLASS      SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
       53      .2   29% | oracle@rac12pri2 (LGWR)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:42 2017-02-17 09:06:53                 1
       23      .1   13% | oracle@rac12pri2 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:02:03 2017-02-17 09:03:26                 1
       19      .1   10% | oracle@rac12pri2 (DBW0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:39                 1
       12      .0    7% | oracle@rac12pri2 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:02:03 2017-02-17 09:03:27                 1
        9      .0    5% | oracle@rac12pri1 (LMS1)                                                                                   ON CPU  2017-02-17 09:04:37 2017-02-17 09:06:58                 1
        8      .0    4% | oracle@rac12pri2 (TT00)                                                                                   ON CPU  2017-02-17 09:02:15 2017-02-17 09:06:56                 1
        6      .0    3% | oracle@rac12pri1 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:06:44                 1
        6      .0    3% | oracle@rac12pri2 (LGWR)                                                                                   ON CPU  2017-02-17 09:04:05 2017-02-17 09:06:41                 1
        5      .0    3% | oracle@rac12pri2 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:03:27                 1
        5      .0    3% | oracle@rac12pri2 (LMS0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:14                 1
        5      .0    3% | oracle@rac12pri2 (LMS1)                                                                                   ON CPU  2017-02-17 09:02:24 2017-02-17 09:06:28                 1
        4      .0    2% | oracle@rac12pri1 (LMS0)                                                                                   ON CPU  2017-02-17 09:04:21 2017-02-17 09:06:05                 1
        3      .0    2% | oracle@rac12pri1 (LGWR)                          LGWR all worker groups                   Other           WAITING 2017-02-17 09:03:08 2017-02-17 09:06:13                 1
        2      .0    1% | oracle@rac12pri1 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:03:42                 1
        2      .0    1% | oracle@rac12pri2 (CKPT)                                                                                   ON CPU  2017-02-17 09:05:10 2017-02-17 09:05:13                 1
        2      .0    1% | oracle@rac12pri2 (TT00)                          log file sequential read                 System I/O      WAITING 2017-02-17 09:03:17 2017-02-17 09:03:46                 1
        1      .0    1% | oracle@rac12pri1 (CJQ0)                                                                                   ON CPU  2017-02-17 09:04:44 2017-02-17 09:04:44                 1
        1      .0    1% | oracle@rac12pri1 (DBW0)                                                                                   ON CPU  2017-02-17 09:03:05 2017-02-17 09:03:05                 1
        1      .0    1% | oracle@rac12pri1 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:02:02                 1
        1      .0    1% | oracle@rac12pri1 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:05:47 2017-02-17 09:05:47                 1

The time difference between charbench and the database can be explained by time zones: my load generator is set to use UTC whereas the database is on EST; the result is a 6 hour time difference. This is a minor oversight on my part and has nothing to do with the location of the machines. In fact, they are all virtualised.

Summary Part I

In this part I tried to lay the foundation for the next one where I’m using my lab environment to simulate an artificially high network latency. I have to admit again that my setup is somewhat over-simplified-using Data Guard with asynchronous log shipping and Maximum Performance mode is not something I’d like to support for anything more important than my toy databases in the lab.

References

http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf

Posted in 12c Release 1, KVM, Linux, Performance | Leave a Comment »

Creating a RAC 12.1 Data Guard Physical Standby environment (4)

Posted by Martin Bach on February 15, 2017

In the previous three parts of this series a lot of preparation work, needed for the configuration of Data Guard, was performed. In this part of the mini-series they all come to fruition. Using the Data Guard broker a switchover operation will be performed. A couple of new features in 12c make this easier. According to the “Changes in This Release for Oracle Data Guard Concepts and Administration” chapter of the 12.1 Data Guard Concepts and Administration guide:

When [you, ed.] perform a switchover from an Oracle RAC primary database to a physical standby database, it is no longer necessary to shut down all but one primary database instance.

I have always wanted to test that in a quiet moment…

I have previously blogged about another useful change that should make my life easier: the static registration of the *_DGMGRL services in the listener.ora file is no longer needed. Have a look at my Data Guard Broker Setup Changes post for more details and reference to the documentation.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Now let’s get to it.

Step 1: Check the status of the configuration

In the first step I always check the configuration and make sure I can switch over. Data Guard 12c has a nifty automatic check that helps, but I always have a list of tasks I perform prior to a switchover (not shown in this blog post).

The following commands are somewhat sensitive to availability of the network – you should protect your sessions against any type of network failure! I am using screen (1) for that purpose, there are other tools out there doing similar things. Network glitches are too common to ignore, and I have come to appreciate the ability to resume work without too many problems after having seen the dreaded “broken pipe” message in my terminal window…

[oracle@rac12sec1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdbb
Password:
Connected as SYSDBA.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 55 seconds ago)

DGMGRL> validate database 'NCDBB'
...

The command to check for switchover readiness is new to 12c as well and called “validate database”. I don’t have screen output from the situation at this point-just take my word that I was ready :) Don’t switch over if you have any concerns or doubts the operation might not succeed! “Validate database” does not relieve you from your duties to check for switchover readiness – follow your procedures.

Step 2: Switch Over

Finally, the big moment has come! It takes just one line to perform the switchover:

DGMGRL> switchover to 'NCDBB'
Performing switchover NOW, please wait...
New primary database "NCDBB" is opening...
Oracle Clusterware is restarting database "NCDBA" ...
Switchover succeeded, new primary is "NCDBB"
DGMGRL> 

DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 9.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    NCDBA1
    NCDBA2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB';

Database - NCDBB

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBB1
    NCDBB2

Database Status:
SUCCESS

DGMGRL> 

Well that was easy! Did you notice Data Guard Broker telling us that ‘Oracle Clusterware is restarting database “NCDBA” …’ ? I like it.

If you get stuck at this point something has gone wrong with the database registration in the OCR. You shouldn’t run into problems though, because you tested every aspect of the RAC system before handing the system over to its intended users, didn’t you?

Validating the new standby database shows no issues. I haven’t noticed it before but “validate database” allows you to get more verbose output:

DGMGRL> validate database verbose 'NCDBA';

  Database Role:     Physical standby database
  Primary Database:  NCDBB

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Capacity Information:
    Database  Instances        Threads
    NCDBB     2                2
    NCDBA     2                2

  Temporary Tablespace File Information:
    NCDBB TEMP Files:  1
    NCDBA TEMP Files:  1

  Flashback Database Status:
    NCDBB:  On
    NCDBA:  Off

  Data file Online Move in Progress:
    NCDBB:  No
    NCDBA:  No

  Standby Apply-Related Information:
    Apply State:      Running
    Apply Lag:        0 seconds (computed 1 second ago)
    Apply Delay:      0 minutes

  Transport-Related Information:
    Transport On:      Yes
    Gap Status:        No Gap
    Transport Lag:     0 seconds (computed 1 second ago)
    Transport Status:  Success

  Log Files Cleared:
    NCDBB Standby Redo Log Files:  Cleared
    NCDBA Online Redo Log Files:   Cleared
    NCDBA Standby Redo Log Files:  Available

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBB)                 (NCDBA)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBA)                 (NCDBB)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Current Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBB)                   (NCDBA)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Future Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBA)                   (NCDBB)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Apply-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    DelayMins                       0                        0
    ApplyParallel                   AUTO                     AUTO

  Transport-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    LogXptMode                      ASYNC                    ASYNC
    RedoRoutes                      <empty>                  <empty>
    Dependency                      <empty>                  <empty>
    DelayMins                       0                        0
    Binding                         optional                 optional
    MaxFailure                      0                        0
    MaxConnections                  1                        1
    ReopenSecs                      300                      300
    NetTimeout                      30                       30
    RedoCompression                 DISABLE                  DISABLE
    LogShipping                     ON                       ON

  Automatic Diagnostic Repository Errors:
    Error                       NCDBB    NCDBA
    No logging operation        NO       NO
    Control file corruptions    NO       NO
    SRL Group Unavailable       NO       NO
    System data file missing    NO       NO
    System data file corrupted  NO       NO
    System data file offline    NO       NO
    User data file missing      NO       NO
    User data file corrupted    NO       NO
    User data file offline      NO       NO
    Block Corruptions found     NO       NO

DGMGRL> 

Isn’t that cool? That’s more information at my fingertips than I can shake a stick at! It’s also a lot more than I could think of (eg online datafile move!).

Interestingly the Broker reports that I have “Sufficient SRLs”. I have seen it complain about the number of Standby Redo Logs in the past and blogged about this Interesting observation about standby redo logs in Data Guard

Summary

After 4 (!) posts about the matter I have finally been able to perform a switchover operation. Role reversals are a much neglected operation a DBA should be comfortable with. In a crisis situation everyone needs to be clear about what needs to be done to restore service to the users. The database is usually the easier part … Success of Data Guard switchover operations also depends on the quality of change management: it is easy to “forget” applying configuration changes on the DR site.

In today’s busy times only few of us are lucky enough to intimately know each and every database we look after. What’s more common (sadly!) is that a DBA looks after 42 or more databases. This really only works without too many issues if procedures and standards are rock solid, and enforced.

Posted in 12c Release 1, Data Guard, KVM, Linux, Uncategorized | Leave a Comment »

Creating a RAC 12.1 Data Guard Physical Standby environment (3)

Posted by Martin Bach on January 23, 2017

In the previous two parts of this series you read about my lab environment and the preparations on the network side as well as the database. In this part I’ll cover the database duplication. Again, this won’t be a short post …

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Step 1: Create an initialisation file

The next step is the preparation of an initialisation file. I am taking NCDBA as the sample and transfer it over to rac12sec1:

SQL> create pfile='/tmp/initNCDBA1.ora' from spfile;

File created.

[oracle@rac12pri1 ~]$ scp /tmp/initNCDBA1.ora rac12sec1:/tmp
oracle@rac12sec1's password: 
initNCDBA1.ora                                            100% 1515     1.5KB/s   00:00    
[oracle@rac12pri1 ~]$ 

The file is a bit verbose, and needs some editing. I appreciate that RMAN replaces the file anyway with a copy but I like to keep it tidy, at least at first … The end result is shown here-as always, you make sure that your version of the pfile matches your site’s requirements and standards. The important part is where you set the directory paths correctly, and you also ensure that db_name and db_unique_name are set correctly.

*.audit_file_dest='/u01/app/oracle/admin/NCDBB/adump'
*.audit_trail='db'
*.cluster_database=true
*.compatible='12.1.0.2.0'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_name='NCDBA'
*.db_unique_name='NCDBB'
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=4785m
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=NCDBBXDB)'
NCDBB1.instance_number=1
NCDBB2.instance_number=2
*.open_cursors=300
*.pga_aggregate_target=1024m
*.processes=300
*.remote_login_passwordfile='exclusive'
*.sga_target=3072m
NCDBB2.thread=2
NCDBB1.thread=1
NCDBB1.undo_tablespace='UNDOTBS1'
NCDBB2.undo_tablespace='UNDOTBS2'

The file needs to be transferred to $ORACLE_HOME/dbs as initNCDBB1.ora.

Step 2: Create a temporary password file

You undoubtedly spotted that remote_login_passwordfile is set, which is a pre-requisite for connecting to the database as SYS via the network. The password file is required for an active database duplication, and without a password file this wouldn’t work. Since Oracle takes care of the password file during the RMAN duplicate it appears to be sufficient to simply create a password file in $ORACLE_HOME/dbs on rac12sec1:

[oracle@rac12sec1 dbs]$ orapwd file=orapwNCDBB1

Enter password for SYS:
[oracle@rac12sec1 dbs]$

During my testing this was enough to proceed to the next step.

Step 3: Duplicate the database “for standby”

The system should now be ready for the standby database’s creation. It’s been a bit of a task to get there, but from now on things will be easier … Using the *_DUP entries in the system’s tnsnames.ora file I connect to the target and auxiliary database, followed by the command to create the standby. In my case the database is more than tiny and the RMAN defaults are enough. Larger databases would probably need different settings and commands.


[oracle@rac12sec1 dbs]$ rman target sys@ncdba_dup auxiliary sys@ncdbb_dup

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Dec 14 10:25:47 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

target database Password: 
connected to target database: NCDBA (DBID=479005270)
auxiliary database Password: 
connected to auxiliary database: NCDBA (DBID=479005270, not open)

RMAN> startup clone nomount

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area    3221225472 bytes

Fixed Size                     2929552 bytes
Variable Size                838863984 bytes
Database Buffers            2365587456 bytes
Redo Buffers                  13844480 bytes

RMAN> duplicate target database for standby from active database;

Starting Duplicate Db at 14-DEC-16
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=240 instance=NCDBB1 device type=DISK

contents of Memory Script:
{
   backup as copy reuse
   targetfile  '+DATA/NCDBA/PASSWORD/pwdncdba.256.930561891' auxiliary format
 '/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwNCDBB1'   ;
}
executing Memory Script

Starting backup at 14-DEC-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=276 instance=NCDBA1 device type=DISK
Finished backup at 14-DEC-16

contents of Memory Script:
{
   sql clone "create spfile from memory";
   shutdown clone immediate;
   startup clone nomount;
   restore clone from service  'ncdba_dup' standby controlfile;
}  
executing Memory Script

sql statement: create spfile from memory

Oracle instance shut down

sql statement: alter system archive log current

contents of Memory Script:
{  
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=7 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/system.259.930565213
datafile 2 switched to datafile copy
input datafile copy RECID=8 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/undotbs2.260.930565221
datafile 3 switched to datafile copy
input datafile copy RECID=9 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/sysaux.261.930565223
datafile 4 switched to datafile copy
input datafile copy RECID=10 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/undotbs1.262.930565231
datafile 5 switched to datafile copy
input datafile copy RECID=11 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/example.263.930565233
datafile 6 switched to datafile copy
input datafile copy RECID=12 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/users.264.930565241
Finished Duplicate Db at 14-DEC-16

RMAN>

Perfect – the duplicate finished ok and all 6 datafiles have been transferred to my standby cluster. There are a few things worth noting here:

  • The password file of the primary (residing in ASM) has been copied to ‘/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwNCDBB1’ (not in ASM)
  • The standby database’s spfile has been created from memory – a lot of unneccesary and some obsolete parameters are now in the standby database’s spfile

Step 4: Finishing Touches

There are a few things still outstanding before declaring the environment ready. First of all you need to register the standby database in the Cluster Registry (OCR). Use srvctl for this task. One way to do this is shown here:

[oracle@rac12sec1 dbs]$ srvctl add database -db NCDBB -oraclehome $ORACLE_HOME -dbtype RAC \
> -role PHYSICAL_STANDBY -startoption MOUNT -dbname NCDBA -diskgroup "DATA,RECO"

[oracle@rac12sec1 dbs]$ srvctl add instance -db NCDBB -instance NCDBB1 -node rac12sec1
[oracle@rac12sec1 dbs]$ srvctl add instance -db NCDBB -instance NCDBB2 -node rac12sec2

When registering your standby database make sure you are license compliant!

Next I move the password file to ASM. The file currently resides on rac12sec1 in the $ORACLE_HOME/dbs directory. You can use the orapwd command to move it. If you registered the database in the OCR as I did in the previous step you will get an updated configuration as a result – the password file configuration is changed automatically.

[oracle@rac12sec1 dbs]$ srvctl config database -db NCDBB | grep -i password
Password file: +DATA/NCDBB/orapwncdbb

Since I like error messages when the database starts even less than unnecessary parameters in my spfile I tend to remove these from the standby database. And besides – in its current form the spfile wouldn’t allow me to mount both RAC instances anyway. For some odd reason the mapping between SID and instances is lost during the RMAN duplicate.

So I start off by creating a pfile from the spfile, and remove all unneeded underscore parameters and deprecated ones. In the resulting step I ensure that all the necessary RAC parameters are in. The minimalistic end result looks like this for my database:

*.audit_file_dest='/u01/app/oracle/admin/NCDBB/adump'
*.audit_trail='DB'
*.cluster_database=TRUE
*.compatible='12.1.0.2.0'
*.control_files='+DATA/NCDBB/CONTROLFILE/current.258.930565201',
   '+RECO/NCDBB/CONTROLFILE/current.258.930565201'#Restore Controlfile
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_name='NCDBA'
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=4785M
*.db_unique_name='NCDBB'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=NCDBBXDB)'
NCDBB1.instance_number=1
NCDBB2.instance_number=2
*.open_cursors=300
*.pga_aggregate_target=1G
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=3G
NCDBB2.thread=2
NCDBB1.thread=1
NCDBB1.undo_tablespace='UNDOTBS1'
NCDBB2.undo_tablespace='UNDOTBS2'

Next I can move the spfile to ASM and update the OCR profile. After these steps, the OCR profile reflects the new location of the spfile. Be sure to check all $ORACLE_HOME/dbs directories on the cluster nodes to prevent local pfiles/spfiles taking precedence over the one referred to in the OCR.

[oracle@rac12sec1 dbs]$ srvctl config database -d NCDBB | grep spfile
Spfile: +DATA/NCDBB/spfileNCDBB

Everything is awesome!

It’s been a bit of a journey, but it was worth it. Before continuing with the remaining configuration steps, here is a little bit of detail about the system as it is:

[oracle@rac12sec1 dbs]$ srvctl status database -db NCDBB -verbose
Instance NCDBB1 is running on node rac12sec1. Instance status: Mounted (Closed).
Instance NCDBB2 is running on node rac12sec2. Instance status: Mounted (Closed).
[oracle@rac12sec1 dbs]$ 

SQL> select name,name,db_unique_name,database_role,log_mode,force_logging,flashback_on from v$database;

NAME      NAME      DB_UNIQUE_NAME   DATABASE_ROLE    LOG_MODE     FORCE_LOGGING   FLASHBACK_ON
--------- --------- ---------------- ---------------- ------------ --------------- ------------------
NCDBA     NCDBA     NCDBB            PHYSICAL STANDBY ARCHIVELOG   YES             NO

SQL> select * from v$active_instances;

INST_NUMBER INST_NAME                                                        CON_ID
----------- ------------------------------------------------------------ ----------
          1 rac12sec1:NCDBB1                                                      0
          2 rac12sec2:NCDBB2                                                      0

In my initial drafts I intended to continue this article with the Data Guard configuration, but having looked at the WordPress-formatted part 3 up to here I decided to split it into 2 pieces. Next up is the Data Guard Broker configuration, followed by more posts about switching over and application design considerations. Stay tuned!

Posted in 12c Release 1, KVM, Linux | Leave a Comment »

Little things worth knowing: when a transient ASM disk failure cannot be fixed in time

Posted by Martin Bach on October 7, 2016

In the previous blog post I used libvirt and KVM in my lab environment to simulate a transient disk failure and how to recover from it. This post takes this example a step further: I am simulating another disk failure, but this time won’t pretend I can fix the issue and put it back. In other words, I simulate the effect of the disk_repair_time hitting zero.

Most of what I am covering here is an extension of the previous post, I’ll mention the main detail here for your benefit, but would like to invite you to revert to the previous post for more detail.

The idea is to show you the output of the ASM alert.log and result of the lost disk in the V$-views.

As with the previous post, the code examples in this one are for demonstration purposes only!

The setup in a nutshell

I am using Oracle Linux 7.2 with UEK4 as the host system; KVM and libvirt are responsible for the guests (virtual machines). The guest VM used for this example is named asmtest, and uses Oracle Linux 7.2 as well. There are 9 ASM disks – 8 for +DATA featuring normal redundancy in 2 failure groups. I added +RECO for the sake of completeness with external redundancy. This post is about +DATA. To keep it simple I used an Oracle Restart configuration patched to the July 2016 PSU.

Removing the disk

As with the previous example I am using libvirt to remove a “LUN” temporarily from the guest. And sure enough, the VM picks this up. This is the relevant output obtained via journalctl -f

Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Add. Sense: Logical unit not supported

In the previous post I keep referring to I copied the part of the ASM instance’s alert.log that showed how the disk repair timer was ticking down. This time I am simulating the case where – for whatever reason – the transient failure could not be fixed. In that case, this is what you would see in the alert.log:

2016-09-29 15:38:21.752000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1
2016-09-29 15:41:25.379000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (416) secs on ASM inst 1
2016-09-29 15:44:29.012000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (232) secs on ASM inst 1
2016-09-29 15:47:32.643000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (48) secs on ASM inst 1
2016-09-29 15:50:36.259000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (0) secs on ASM inst 1

The last line in the above output is definitely NOT what you want to see. If at all possible, you should fix the problem causing DATA_0001 in this example from being dropped. If you see the message about the disk being dropped you are facing the inevitable rebalance operation. Here is the continued output from the ASM alert.log:

WARNING: PST-initiated drop of 1 disk(s) in group 1(.4232658126))
SQL> alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 40 for pid 23, osid 3216
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 41 for pid 16, osid 2701
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
SUCCESS: alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
SUCCESS: PST-initiated drop disk in group 1(4232658126))
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xfc493cce (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=20, OS id=3655 
NOTE: assigning ARB0 to group 1/0xfc493cce (DATA) with 1 parallel I/O
2016-09-29 15:50:40.257000 +01:00
NOTE: restored redundancy of control and online logs in DATA
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: restored redundancy of control and online logs in DATA
2016-09-29 15:50:51.655000 +01:00
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 42 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:54.647000 +01:00
GMON updating for reconfiguration, group 1 at 43 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:57.571000 +01:00
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 44 for pid 16, osid 2701
GMON querying group 1 at 45 for pid 16, osid 2701
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
2016-09-29 15:51:03.589000 +01:00
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0xfc493cce (DATA)

The last line indicates that the rebalance operation is complete. My disk group was almost empty, the rebalance operation took almost no time. But indeed, when you check V$ASM_DISK, the disk is gone:

SQL> r
  1  select name,path,disk_number,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME       PATH            DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP  REPAIR_TIMER
---------- --------------- ----------- ------- ------------ ------- -------- ---------- ------------
DATA_0000  /dev/asm-disk02           0 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0002  /dev/asm-disk03           2 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0003  /dev/asm-disk06           3 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0004  /dev/asm-disk07           4 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0005  /dev/asm-disk08           5 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0006  /dev/asm-disk04           6 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0007  /dev/asm-disk05           7 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0

7 rows selected.

SQL> select count(*), failgroup from v$asm_disk where name like 'DATA%' group by failgroup;

  COUNT(*) FAILGROUP
---------- ----------
         3 BATCH1
         4 BATCH2

My system is fully operational, and the rebalance did not run into any space problems. Space problems are the last thing you want to have when rebalancing. I did some research about this subject earlier and documented it in a blog post.

In a future blog post I am going to investigate what happens when a partner disk of DATA_0001 fails while the repair timer is ticking down.

Posted in 12c Release 1, Automatic Storage Management, KVM, Linux | Leave a Comment »

Little things worth knowing: transient ASM disk failures in 12c

Posted by Martin Bach on September 30, 2016

For quite some time we have been treated nicely by ASM when it comes to transient disk failures. Since 11.1 (if memory serves me right), transient failures won’t cause an ASM disk to be dropped immediately. This is good, because it can potentially save a lot of time! When a disk is dropped from an ASM disk, a rebalance operation is unavoidable. And there is more than a chance of another one following it, because ultimately you want to add the disk back to the disk group. Well, to be fair, this is only applicable for normal or high redundancy disk groups.

The feature I just described very briefly is referred to as ASM Fast Mirror Resync, and documented in the ASM Administrator’s Guide in section Administering Oracle ASM Disk Groups.

By the way, there is another new feature you might want to look at, called FAILGROUP_REPAIR_TIME, but I’ll keep that for another post.

In this post I’d like to demonstrate how easy it is to simulate a transient failure of an ASM disk using libvirt and KVM on a host running Oracle Linux 7.2. I also wanted to have the output from the O/S and ASM alert.log as personal reference.

As usual, this procedure is strictly for YOUR HOME LAB USE ONLY and for educational purposes.

The setup

When creating the VM – named “asmtest” – I created a boot volume, and a 50 GB volume for the Oracle binaries. To keep it manageable I went for an Oracle 12.1 Restart configuration, patched to the July PSU. There are 9 ASM disks, presented as SCSI devices (not virtio), I’ll explain why in a bit. The guest O/S is Oracle Linux 7.2, as is my lab server. All virtual disks on the VM are partitioned although that is not necessary (strictly speaking).

Using the lsscsi tool (part of the sg3_utils), I created udev rules following the excellent description I found on oracle-base.com.

[oracle@asmtest ~]$ cat /etc/udev/rules.d/61-asm.rules 

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-0", SYMLINK+="asm-disk01", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-1", SYMLINK+="asm-disk02", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-2", SYMLINK+="asm-disk03", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-3", SYMLINK+="asm-disk04", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-0", SYMLINK+="asm-disk05", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-1", SYMLINK+="asm-disk06", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-2", SYMLINK+="asm-disk07", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-3", SYMLINK+="asm-disk08", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi3-0-0-0", SYMLINK+="asm-disk09", OWNER="oracle", GROUP="asmdba", MODE="0660"

This is slightly more complicated than my usual setup, but was needed for device name persistence. Unfortunately libvirt does not report the same block device names as my guest VM. Whilst the host “sees” sd[h-v] my guest reported these devices as sd[a-i]. Persistent device names to the rescue! The only issue I had initially was that disks presented to the VM via virtio don’t report any SCSI IDs I need for my udev rules, or in other words, lsscsi returned no data. Presenting the block devices via the SCSI interface did the trick.

The hypothesis

This snippet from the documentation (referenced above) is exactly what I wanted to test:

After you repair the disk, run the SQL statement ALTER DISKGROUP ONLINE DISK. This statement brings a repaired disk group back online to enable writes so that no new writes are missed. This statement also starts a procedure to copy of all of the extents that are marked as stale on their redundant copies.

Here is the current configuration I am using for my DATA disk group, as seen by ASM:

SQL> select name,state,type,offline_disks from v$asm_diskgroup
  2  where name = 'DATA';

NAME                           STATE       TYPE   OFFLINE_DISKS
------------------------------ ----------- ------ -------------
DATA                           MOUNTED     NORMAL             0



SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number;

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP        REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- ---------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0

You can see 2 failure groups, batch1 and batch2, and all disks are available/online. The operating system mapping works perfectly thanks to my udev rules:

[oracle@asmtest ~]$ ls -l /dev/asm-disk0*
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk01 -> sda1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk02 -> sdd1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk03 -> sdc1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk04 -> sdb1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk05 -> sde1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk06 -> sdh1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk07 -> sdg1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk08 -> sdf1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk09 -> sdi1

Let’s detach a disk and cause a transient failure

Libvirt is really cool, it allows me to modify my VM configuration at runtime. One example is removing of disks, using the detach-disk command:

[root@host ~]# virsh help detach-disk
  NAME
    detach-disk - detach disk device

  SYNOPSIS
    detach-disk <domain> <target> [--persistent] [--config] [--live] [--current]

  DESCRIPTION
    Detach disk device.

  OPTIONS
    [--domain] <string>  domain name, id or uuid
    [--target] <string>  target of disk device
    --persistent     make live change persistent
    --config         affect next boot
    --live           affect running domain
    --current        affect current domain

But which disk to remove? The disks attached to a VM can be listed using the domblklist command:

[root@host ~]# virsh domblklist asmtest
Target     Source
------------------------------------------------
vda        /path/to/slow/disk/asmtest
vdb        /path/to/slow/disk/asmtest-1
sdh        /path/to/fast/disk/asmtest-asm01.img
sdi        /path/to/fast/disk/asmtest-asm02.img
sdj        /path/to/fast/disk/asmtest-asm03.img
sdk        /path/to/fast/disk/asmtest-asm04.img
sdo        /path/to/fast/disk/asmtest-asm05.img
sdp        /path/to/fast/disk/asmtest-asm06.img
sdq        /path/to/fast/disk/asmtest-asm07.img
sdr        /path/to/fast/disk/asmtest-asm08.img
sdv        /path/to/fast/disk/asmtest-asm09.img

I mentioned previously that the information provided by the host does not necessarily match that of the guest. Since I have persistent device names though in my VM there shouldn’t be an issue. Let’s take a disk away, but only temporarily! Since my O/S and Oracle binaries reside on disks presented to the guest as virtio devices, I know they are named vd[a-z] and better left alone. That leaves sdh as the first disk to remove.

[root@host ~]# virsh detach-disk asmtest sdh --live
Disk detached successfully

And the disk is gone.

And what is happening?

The disk removal has immediate effect on the ASM instance. Switching over to my VM I can see the following information recorded in the ASM alert.log:

2016-09-29 13:45:13.936000 +01:00
WARNING: Write Failed. group:1 disk:1 AU:1 offset:1044480 size:4096
path:/dev/asm-disk01
 incarnation:0xe969ccd4 asynchronous result:'I/O error'
 subsys:System krq:0x7fda4f0b8210 bufp:0x7fda4f20b000 osderr1:0x69b5 osderr2:0x0
 IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: Hbeat write to PST disk 1.3916025044 in group 1 failed. [2]
NOTE: process _b000_+asm (4346) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 13 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: checking PST for grp 1 done.
NOTE: sending set offline flag message (844741558) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x7f is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 15 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: cache closing disk 1 of grp 1: DATA_0001
NOTE: PST update grp = 1 completed successfully
2016-09-29 13:46:53.300000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1

I also had a journalctl -f running in a different session. Unsurprisingly the O/S also noticed the disk disappearing:

Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Add. Sense: Logical unit not supported

A transient failure

In yet another session I can see that the disk is about to go, the repair timer has started ticking down, and is decremented as time passes. This did not happen straight away, there was a little delay of maybe a couple of minutes after the disk was removed.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   600
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   416
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

Another day I would like to run an additional test to see what happens when the repair_timer is actually down to 0, but for now let’s add the disk back in. On the host, I use libvirt again for this:

[root@host ~]# virsh attach-disk asmtest /path/to/fast/disk/asmtest-asm01.img sdh --live
Disk attached successfully

As soon as the prompt returns, I can see the guest O/S picking the disk up.

Sep 29 13:54:09 asmtest kernel: scsi 2:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    1.5. PQ: 0 ANSI: 5
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Mode Sense: 63 00 00 08
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 29 13:54:09 asmtest kernel:  sda: sda1
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Attached SCSI disk

The documentation states that I should now be able to online the disk again, which I tried. The result is shown below:

SQL> alter diskgroup data online all;

Diskgroup altered.

I am showing you the complete ASM alert.log output here, feel free to skip to the bottom:

2016-09-29 13:54:44.774000 +01:00
SQL> alter diskgroup data online all
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: initiating resync of disk group 1 disks
DATA_0001 (1)
NOTE: process _user3750_+asm (3750) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: sending set offline flag message (4172832432) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x1 is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 17 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 18 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance membership refresh for group=1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x11, op = assign
GMON updating disk modes for group 1 at 19 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance disk validation for group=1
NOTE: disk validation pending for 1 disk in group 1/0xceb93c35 (DATA)
NOTE: Found /dev/asm-disk01 for disk DATA_0001
NOTE: completed disk validation for 1/0xceb93c35 (DATA)
NOTE: discarding redo for group 1 disk 1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x19, op = assign
GMON updating disk modes for group 1 at 20 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 21 for pid 16, osid 2535
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/asm-disk01
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x5d, op = assign
GMON updating disk modes for group 1 at 22 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7d, op = assign
GMON updating disk modes for group 1 at 23 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
SUCCESS: alter diskgroup data online all
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xceb93c35 (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=23, OS id=4986
NOTE: assigning ARB0 to group 1/0xceb93c35 (DATA) with 1 parallel I/O
2016-09-29 13:54:48.265000 +01:00
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7f, op = assign
GMON updating disk modes for group 1 at 24 for pid 23, osid 4986
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: reset timers for disk: 1
NOTE: completed online of disk group 1 disks
DATA_0001 (1)
2016-09-29 13:54:49.580000 +01:00
NOTE: stopping process ARB0
NOTE: requesting all-instance membership refresh for group=1
SUCCESS: rebalance completed for group 1/0xceb93c35 (DATA)
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 25 for pid 16, osid 2535
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
2016-09-29 13:54:51.286000 +01:00
NOTE: Attempting voting file refresh on diskgroup DATA

This looks promising, but what about V$ASM_DISK?

SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number
  3  /

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> 

Hurray, I have my disk back! And at no time did I see a rebalance operation. Thanks ASM!

Posted in 12c Release 1, KVM, Linux | 1 Comment »

Installing Oracle 12.1.0.2 RAC on Oracle Linux 7-part 2

Posted by Martin Bach on January 12, 2015

In the first part of the article series you could read how a kickstart file made the installation of Oracle Linux 7 a lot more bearable. In this part of the series it’s all about configuring the operating system. The installation of Grid Infrastructure and the Oracle database is for another set of posts.

There are quite some differences between Oracle Linux 6 and 7

To me the transition from Oracle Linux 6 to 7 feels like the step from Solaris 9 to 10 at the time. Personally I think that a lot has changed. Although, it’s fair to say that it has been quite some time it has been announced that the network stack commands we know and love are deprecated and might go… Even with Oracle Linux 6 there was a threat that network manager would now be the only tool to modify your network settings (which thankfully was not the case). A lot of efforts of the Linux community have now come to fruition, and it’s time to adjust to the future. Even when it’s painful (and it is, at least a bit).

Configuring the network

The warning has been out there quite a while but now it seems to be true-no more system-config-network-tui to configure the network! No more ifconfig! Oh dear-quite a bit of learning to be done. Luckily someone else has done all the legwork and documented the changes. A good example is this one:

https://dougvitale.wordpress.com/2011/12/21/deprecated-linux-networking-commands-and-their-replacements/

So first of all-don’t fear: although all network interfaces are configured using network manager now, you can still use a command line tool: nmtui. After trying it out I have to say I’m not really convinced about its usability. What appears better is the use of the nmcli, network manager command line tool. It’s use is quite confusing, and it appears to me as if the whole network manager toolset was developed for laptop users, not servers. But I digress. I have a few interfaces in my RAC VM, the first was configured during the installation, eth[1-3] aren’t configured yet.

[root@localhost ~]# nmcli connection show
NAME         UUID                                  TYPE            DEVICE
System eth0  77e3f8a9-76d0-4051-a8f2-cbbe39dab089  802-3-ethernet  eth0
[root@localhost ~]# nmcli device status
DEVICE  TYPE      STATE         CONNECTION
eth0    ethernet  connected     System eth0
eth1    ethernet  disconnected  --
eth2    ethernet  disconnected  --
eth3    ethernet  disconnected  --
lo      loopback  unmanaged     --
[root@localhost ~]#

At this point I have used eth0 as the management network (similar to the way Exadata does) and will use the other networks for the database. eth1 will act as the public network, eth2 and eth3 will be private.

Although the network interfaces can be named differently for device name persistence I stick with the old naming for now. I don’t want to run into trouble with the installer just yet. On physical hardware you are very likely to see very different network interface names, the kernel uses a naming scheme identifying where the cards are (on the main board, or in extension cards for example). I’ll write another post about that soon.

Using dnsmasq (on the host) I configure my hosts for these addresses:

[root@ol62 ~]# grep rac12pri /etc/hosts
192.168.100.107	rac12pri1.example.com		rac12pri1
192.168.100.108	rac12pri1-vip.example.com	rac12pri1-vip
192.168.100.109	rac12pri2.example.com		rac12pri2
192.168.100.110	rac12pri2-vip.example.com	rac12pri2-vip
192.168.100.111	rac12pri-scan.example.com	rac12pri-scan
192.168.100.112	rac12pri-scan.example.com	rac12pri-scan
192.168.100.113	rac12pri-scan.example.com	rac12pri-scan

Configuring the interface is actually not too hard once you got the hang of it. It took me a little while to get it though… It almost appears as if something that was simple and easy to use was made difficult to use.

[root@localhost ~]# nmcli con add con-name eth1 ifname eth1 type ethernet ip4 192.168.100.107/24 gw4 192.168.100.1
[root@localhost ~]# nmcli con add con-name eth2 ifname eth2 type ethernet ip4 192.168.101.107/24
[root@localhost ~]# nmcli con add con-name eth3 ifname eth3 type ethernet ip4 192.168.102.107/24 

[root@localhost ~]# nmcli con show
NAME         UUID                                  TYPE            DEVICE
eth2         ccc7f592-b563-4b9d-a36b-2b45809e4643  802-3-ethernet  eth2
eth1         ae897dee-42ff-4ccd-843b-7c97ba0d5315  802-3-ethernet  eth1
System eth0  77e3f8a9-76d0-4051-a8f2-cbbe39dab089  802-3-ethernet  eth0
eth3         b6074c9a-dcc4-4487-9a8a-052e4c60bbca  802-3-ethernet  eth3

I can now verify the IP addresses using the “ip” tool (ifconfig was not installed, I haven’t yet checked if there was a compatibility package though)

[root@localhost ~]# ip addr show
1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:6e:6f:67 brd ff:ff:ff:ff:ff:ff
    inet 192.168.150.111/24 brd 192.168.150.255 scope global eth0
    inet6 fe80::5054:ff:fe6e:6f67/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:96:ad:88 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.107/24 brd 192.168.100.255 scope global eth1
    inet6 fe80::5054:ff:fe96:ad88/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:c1:cc:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.101.107/24 brd 192.168.101.255 scope global eth2
    inet6 fe80::5054:ff:fec1:cc8e/64 scope link
       valid_lft forever preferred_lft forever
5: eth3: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:7e:59:45 brd ff:ff:ff:ff:ff:ff
    inet 192.168.102.107/24 brd 192.168.102.255 scope global eth3
    inet6 fe80::5054:ff:fe7e:5945/64 scope link
       valid_lft forever preferred_lft forever

Now what’s left is setting the hostname-which is a simple call to hostnamectl –static set-hostname rac12pri1. nmcli gives you an interface to changing the hostname as well. I repeated the steps for node 2, they are identical except for the network IP addresses of course.

So that concludes the network setup.

Managing linux daemons

If you are curious about setting services at runlevel, then there’ll be another surprise:

[root@rac12pri2 ~]# chkconfig --list

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

      If you want to list systemd services use 'systemctl list-unit-files'.
      To see services enabled on particular target use
      'systemctl list-dependencies [target]'.

iprdump        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
iprinit        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
iprupdate      	0:off	1:off	2:on	3:on	4:on	5:on	6:off
netconsole     	0:off	1:off	2:off	3:off	4:off	5:off	6:off
network        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
pmcd           	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmie           	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmlogger       	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmmgr          	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmproxy        	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmwebd         	0:off	1:off	2:off	3:off	4:off	5:off	6:off
rhnsd          	0:off	1:off	2:on	3:on	4:on	5:on	6:off
[root@rac12pri2 ~]#

If you just got familiar with upstart then there are some bad news: upstart is now replaced with systemd… This might be the right time to read up on that if you aren’t familiar with it yet:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/chap-Managing_Services_with_systemd.html

Things are a little different with that, so here is an example how to enable and start the NTP service. It has to be installed first if that hasn’t been the case. You also should add the -x flag in /etc/sysconfig/ntpd. First I would like to see if the service is available. You use systemctl for this-so instead of a chkconfig ntpd –list you call systemctl as shown:

[root@rac12pri ~]# systemctl list-units --type service --all | grep ntpd
ntpd.service                                                                              loaded inactive dead    Network Time Service
ntpdate.service                                                                           loaded inactive dead    Set time via NTP

I have to get used to the new syntax: previously you used “service <whatever> status” and then, if you needed, typed backspace a few times and changed status to start. The new syntax is closer to human language but less practical: systemctl status <service>. Changing status to start requires more typing.

The check proved that the service exists (i.e. the NTP package is installed), but it is not started. We can change this:

[root@rac12pri ~]# systemctl enable ntpd.service
[root@rac12pri ~]# systemctl start ntpd.service
[root@rac12pri ~]# systemctl status ntpd.service
ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled)
Active: active (running) since Tue 2014-12-16 15:38:47 GMT; 1s ago
Process: 5179 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 5180 (ntpd)
CGroup: /system.slice/ntpd.service
└─5180 /usr/sbin/ntpd -u ntp:ntp -g -x

Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 8 eth1 fe80::5054:ff:fe96:ad88 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 9 eth2 fe80::5054:ff:fec1:cc8e UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 10 eth3 fe80::5054:ff:fe7e:5945 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 11 eth0 fe80::5054:ff:fe6e:6f67 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listening on routing socket on fd #28 for interface updates
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c016 06 restart
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c012 02 freq_set ntpd 0.000 PPM
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c011 01 freq_not_set
Dec 16 15:38:47 rac12pri systemd[1]: Started Network Time Service.
Dec 16 15:38:48 rac12pri ntpd[5180]: 0.0.0.0 c614 04 freq_mode
[root@rac12pri ~]#

The call to “systemctl enable” replaces an invocation of chkconfig to automatically start ntpd as a service (chkconfig ntpd on). Starting the service does not produce any output, hence the need to check the status.

There is a slight caveat with the use of NTP: it is not the default time keeping service. Another tool, named chronyd is used instead.

This causes a problem after the next reboot: chronyd will be started, NTPd won’t be. The Red Hat documentation therefore has a section on how to switch:

[root@rac12pri ~]# systemctl stop chronyd
[root@rac12pri ~]# systemctl disable chronyd
[root@rac12pri ~]# systemctl status chronyd

Storage

Shared storage is provided by KVM. I am using my SSDs in the lab from where I create a few “LUNs”. These must explicitly be made “shareable” to be accessible by more than one guest. Since 12.1.0.2.0 Oracle installs a database for the cluster health monitor by default. Currently I use the following setup for my lab 12.1.0.2 clusters:

  1. +CHM (external redundancy) – 1x 15GB
  2. +OCR (normal redundancy) – 3x 2 GB
  3. +DATA (external redundancy) – 1 x 15GB
  4. +RECO (external redundancy) – 1 x 10 GB

If you use the guided installation of Grid Infrastructure the installer will prompt you for a single disk group only. This means that the CHM database as well as the OCR and voting files be installed in that disk group. I prefer to separate them though, which is why I create a second disk group OCR after the installation has completed and move the voting files and OCR out of +CHM.

DATA and RECO are standard Exadata disk groups and I like to keep things consistent for myself.

I use fdisk to partition the future ASM disks with 1 partition spanning the whole LUN.

Other tasks

A lot of the other pre-installation tasks can actually be performed during the kickstart installation. I still like to use SELinux in permissive mode even though-according to Requirements for Installing Oracle Database 12.1 on RHEL6 or OL6 64-bit (x86-64) (Doc ID 1529864.1)-selinux can be in “enforcing”. The directive in the kickstart file is

selinux –permissive

You shouldn’t have to install additional packages-all packages to be installed should go into the %packages section of the file. Simply copy the package names from the official documentation and paste below the last package in the section. There is one exception to the rule: cvuqdisk must be installed from the Oracle installation media.

Settings for /etc/sysctl.conf and /etc/security/limits.conf can also be made in the kickstart file as shown in the first part of this series.

Storage to be made available to RAC must have permissions set. Since there isn’t an ASMLib in Oracle Linux 7 to my knowledge UDEV will have to be used, and my udev configuration file, too, is in the first part.

To make sure my user and group IDs for the oracle and grid account are the same I create the accounts in the kickstart file as well. Passwords are deliberately not set-they may evolve and I can’t possibly remember them all :)

User equivalence can be set up using a technique I have already described in an earlier blog post. Although the user equivalence setup can be deferred to when you install Grid Infrastructure I still perform it before to allow me to run the cluster verification tool with the -fixup option.

Posted in 12c Release 1, KVM, Linux, RAC | 3 Comments »

Upgrading clustered Grid Infrastructure to 12.1.0.2 from 12.1.0.1.3

Posted by Martin Bach on July 29, 2014

Oracle 12.1.0.2 is out, after lots of announcements the product has finally been released. I had just extended my 12.1.0.1.3 cluster to 3 nodes and was about to apply the July PSU when I saw the news. So why not try and upgrade to the brand new thing?

What struck me at first was the list of new features … Oracle’s patching strategy has really changed over time. I remember the days when Oracle didn’t usually add additional features into point releases. Have a look at the new 12.1.0.2 features and that would possibly qualify to be 12c Release 2…

In summary the upgrade process is actually remarkably simple, and hasn’t changed much since earlier versions of the software. Here are the steps in chronological order.

./runInstaller

I don’t know how often I have type ./ruinInstaller instead of runInstaller, but here you go. This is the first wizard screen after splash screen has disappeared.

GI 12.1.0.2-001

Naturally I went for the upgrade of my cluster. Before launching the installer though I made sure that everything was in working order by means of cluvfy. On to the next screen:

GI 002

always install English only. Troubleshooting Oracle in a different language (especially if I don’t speak or understand) is really hard so I avoid it in the first place.

Over to the screen that follows and oops-my SYSDG disk group (containing OCR and voting files) is too small. Bugger. In the end I added 3 new 10GB LUNs and dropped the old ones. But it took me a couple of hours to do so. Worse: it wasn’t even needed, but proved to be a good learning exercise. The requirement to have that much free space is most likely caused by the management repository and related infrastructure.

GI 003 error

Back to this screen everything is in best order, the print screen has been taken just prior to the change to the next. Note the button to skip the updates on unreachable nodes. Not sure if I wanted to do that though.

GI 003

I haven’t got OEM agents on the servers (yet) so I’m skipping the registration for now. You can always do that later.

GI 004

This screen is familiar; I am keeping my choices from the initial installation. Grid Infrastructure is owned by Oracle despite the ASMDBA and ASMADMIN groups by the way.

GI 005

On the screen below you define where on the file system you want to install Grid Infrastructure. Remember that for clustered deployments the ORACLE_HOME cannot be in the path of the ORACLE_BASE. For this to work you have to jump to the command line and create the directory on all servers and grant ownership to the GI owner account (oracle in this case, could be grid as well).

GI 006

Since I like to be in control I don’t allow Oracle to run the root scripts. I didn’t in 12.1.0.1 either:

GI 007

In that screen you notice the familiar checking of requirements.

GI 008

In my case there were only a few new ones shown here. This is a lab server so I don’t plan on using swap, but the kernel parameter “panic_on_oops” is new. I also didn’t set the reverse path filtering which I corrected before continuing. Interestingly the installer points out that there is a change in the asm_diskstring with its implications.

One thing I haven’t recorded here (because I am using Oracle Linux 6.5 with UEK3) is the requirement for using a 2.6.39 kernel – that sounds like UEK2 to me.

Update: my system is Oracle Linux 6.5, not Red Hat. See Sigrid’s comments below: for Red Hat Linux there doesn’t seem to be a similar requirement to use UEK 2, which matches the documentation (Installation guide for Grid Infrastructure/Linux).

Another interesting case was that the kernel_core pattern wasn’t equal on all nodes. Turned out that 2 nodes had the package abrt installed, and the other two didn’t. Once the packages were installed on all nodes, the warning went away.

GI 009

Unfortunately I didn’t take a print screen of the summary in case you wonder where that is. I went straight into the installation phase:

GI 010

At the end of which you are prompted to run the upgrade scripts. Remember to run them in screen and pay attention to the order you run them in.

GI 011

The output from the last node is shown here:

[root@rac12node3 ~]# /u01/app/12.1.0.2/grid/rootupgrade.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/12.1.0.2/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/12.1.0.2/grid/crs/install/crsconfig_params
2014/07/26 16:15:51 CLSRSC-4015: Performing install or upgrade action for Oracle
Trace File Analyzer (TFA) Collector.

2014/07/26 16:19:58 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2014/07/26 16:20:02 CLSRSC-464: Starting retrieval of the cluster configuration data

2014/07/26 16:20:51 CLSRSC-465: Retrieval of the cluster configuration data has successfully completed.

2014/07/26 16:20:51 CLSRSC-363: User ignored prerequisites during installation

ASM configuration upgraded in local node successfully.

2014/07/26 16:21:16 CLSRSC-466: Starting shutdown of the current Oracle Grid Infrastructure stack

2014/07/26 16:22:51 CLSRSC-467: Shutdown of the current Oracle Grid Infrastructure stack has successfully
completed.

OLR initialization - successful
2014/07/26 16:26:53 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.conf'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2014/07/26 16:34:34 CLSRSC-343: Successfully started Oracle Clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully taken the backup of node specific configuration in OCR.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2014/07/26 16:35:55 CLSRSC-478: Setting Oracle Clusterware active version on the last node to be upgraded

2014/07/26 16:35:55 CLSRSC-482: Running command: '/u01/app/12.1.0.2/grid/bin/crsctl set crs activeversion'

Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
Started to upgrade the CSS.
The CSS was successfully upgraded.
Started to upgrade Oracle ASM.
Started to upgrade the CRS.
The CRS was successfully upgraded.
Successfully upgraded the Oracle Clusterware.
Oracle Clusterware operating version was successfully set to 12.1.0.2.0
2014/07/26 16:38:51 CLSRSC-479: Successfully set Oracle Clusterware active version

2014/07/26 16:39:13 CLSRSC-476: Finishing upgrade of resource types

2014/07/26 16:39:26 CLSRSC-482: Running command: 'upgrade model  -s 12.1.0.1.0 -d 12.1.0.2.0 -p last'

2014/07/26 16:39:26 CLSRSC-477: Successfully completed upgrade of resource types

2014/07/26 16:40:17 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

Did you notice that TFA has been added? Trace File Analyzer is another of these cool things to play with, it was available with 11.2.0.4 and as an add-on to 12.1.0.1.

Result!

Back to OUI to complete the upgrade. After which cluvfy performs a final check and I’m done. Prove it worked:

[oracle@rac12node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Thu Jul 26 17:13:02 2014

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE	12.1.0.2.0	Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

SQL>

In another post I’ll detail the upgrade for my databases. I am particularly interested about the unplug/plug way of migrating…

Posted in 12c Release 1, KVM, Linux, RAC | Tagged: , | 6 Comments »

RAC 12c enhancements: adding an additional SCAN-part 1

Posted by Martin Bach on April 22, 2014

Based on customer request Oracle has added the functionality to add a second SCAN, completely independent of the SCAN defined/created during the cluster creation. Why would you want to use this feature? A few reasons that spring to mind are:

  • Consolidation: customers insist on using a different network
  • Separate network for Data Guard traffic

To demonstrate the concept I am going to show you in this blog post how I

  1. Add a new network resource
  2. Create new VIPs
  3. Add a new SCAN
  4. Add a new SCAN listener

It actually sounds more complex than it is, but I have a feeling I need to split this article in multiple parts as it’s far too long.

The lab setup

When you install RAC 11.2 and 12.1 you are prompted to specify a Single Client Access Name, or SCAN. This SCAN is usually defined in the corporate DNS server and resolves to 3 IP addresses. This allows for an easy way to implement client-side load balancing. The SCAN is explained in more detail in Pro Oracle Database 11g RAC on Linux for 11.2 and on OTN for 11.2 and 12.1. To spice the whole configuration up a little bit I decided to use RAC One Node on the clusters I am using for this demonstration.

I created 2 12.1.0.1.2 clusters for this Data Guard test. Hosts ron12cprinode1 and ron12cprinode2 form the primary cluster, ron12csbynode1 and ron12csbynode2 will form the standby cluster. The RAC One Node database is named RON:

[oracle@ron12cprinode1 ~]$ srvctl config database -db ron
Database unique name: ron
Database name: ron
Oracle home: /u01/app/oracle/product/12.1.0.1/dbhome_1
Oracle user: oracle
Spfile: +DATA/ron/spfilepri.ora
Password file: +DATA/ron/orapwpri
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ron
Database instances:
Disk Groups: RECO
Mount point paths:
Services: ron12c
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: pri
Candidate servers: ron12cprinode1,ron12cprinode2
Database is administrator managed
[oracle@ron12cprinode1 ~]$

To make things even more interesting I defined my ORACLE_SID prefix on the primary to be “pri” and “sby” on the standby.

[oracle@ron12cprinode1 ~]$ ps -ef | grep smon
oracle    2553     1  0 Feb06 ?        00:00:09 asm_smon_+ASM1
oracle   15660 15578  0 05:05 pts/3    00:00:00 grep smon
oracle   28241     1  0 Feb07 ?        00:00:18 ora_smon_pri_1
[oracle@ron12cprinode1 ~]$

A quick check with gpnptool reveals the network usage before the addition of the second SCAN:

<gpnp:Network-Profile>
 <gpnp:HostNetwork id="gen" HostName="*">
  <gpnp:Network id="net1" IP="192.168.100.0" Adapter="eth0" Use="public"/>
  <gpnp:Network id="net2" IP="192.168.101.0" Adapter="eth1" Use="cluster_interconnect"/>
 </gpnp:HostNetwork>
</gpnp:Network-Profile>

There is the default network, (“netnum 1”) that is created on the network defined as “public” during the installation. I have another spare network port (eth2) reserved for the new network and Data Guard traffic. Currently network 1 is the only one available.

[root@ron12cprinode1 ~]# srvctl config network
Network 1 exists
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6:

As you can see RAC 12c now supports IPv6. I have another network available that I want to make available for Data Guard traffic. For this purpose I added all nodes into DNS. I am a bit old-fashioned when it comes to DNS, I am still using bind most of the time. Here is an excerpt of my reverse name resolution file:

; hosts - primary cluster
50	PTR	ron12cprinode1.dg.example.com.
51	PTR	ron12cprinode1-vip.example.com.
52	PTR	ron12cprinode2.dg.example.com.
53	PTR	ron12cprinode2-vip.dg.example.com.
; Data Guard SCAN - primary cluster
54	PTR	ron12cpri-scan.dg.example.com.
55	PTR	ron12cpri-scan.dg.example.com.
56	PTR	ron12cpri-scan.dg.example.com.

; hosts - standby cluster
57	PTR	ron12csbynode1.dg.example.com.
58	PTR	ron12csbynode1-vip.dg.example.com.
59	PTR	ron12csbynode2.dg.example.com.
60	PTR	ron12csbynode2-vip.dg.example.com.
; Data Guard SCAN - standby cluster
61	PTR	ron12csby-scan.dg.example.com.
62	PTR	ron12csby-scan.dg.example.com.
63	PTR	ron12csby-scan.dg.example.com.

The domain is *.dg.example.com, the primary database client traffic will be routed through *.example.com.

Adding the new network

The first step to be performed is to make Clusterware aware of the second network. I am doing this on both sides of the cluster. Notice that the primary nodes are called *pri* whereas the standby cluster is called *sby*

[root@ron12cprinode1 ~]# srvctl add network -netnum 2 -subnet 192.168.102.0/255.255.255.0/eth2 -nettype static -verbose
Successfully added Network.

[root@ron12csbynode1 ~]# srvctl add network -netnum 2 -subnet 192.168.102.0/255.255.255.0/eth2 -nettype static -verbose
Successfully added Network.

So this worked, now I have 2 networks:

[root@ron12cprinode1 ~]# srvctl config network
Network 1 exists
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6:
Network 2 exists
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2, static
Subnet IPv6:
[root@ron12cprinode1 ~]#

In the next step I have to add VIPs for the new nodes on the *.dg.example.com subnet. The VIPs must be added on all cluster nodes, 4 in my case.

[oracle@ron12cprinode2 ~]$ srvctl add vip -h

Adds a VIP to the Oracle Clusterware.

Usage: srvctl add vip -node <node_name> -netnum <network_number> -address {<name>|<ip>}/<netmask>[/if1[|if2...]] [-skip] [-verbose]
    -node     <node_name>          Node name
    -address                       <vip_name|ip>/<netmask>[/if1[|if2...]] VIP address specification for node applications
    -netnum   <net_num>            Network number (default number is 1)
    -skip                          Skip reachability check of VIP address
    -verbose                       Verbose output
    -help                          Print usage
[oracle@ron12cprinode2 ~]$

So I did this on each node in my cluster

[root@ron12cprinode1 ~]# srvctl add vip -node ron12cprinode1 -netnum 2 -address 192.168.102.51/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.
[root@ron12cprinode2 ~]# srvctl add vip -node ron12cprinode2 -netnum 2 -address 192.168.102.53/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.

[root@ron12csbynode1 ~]# srvctl add vip -node ron12csbynode1 -netnum 2 -address 192.168.102.58/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.
[root@ron12csbynode2 ~]# srvctl add vip -node ron12csbynode2 -netnum 2 -address 192.168.102.60/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.

And I need to start the VIPs. They have some funny names as you can see in crsctl status resource (the names can’t be defined, see output of srvctl add scan -h above)

[root@ron12cprinode1 ~]# srvctl status vip -vip ron12cprinode1_2
VIP 192.168.102.51 is enabled
VIP 192.168.102.51 is not running
[root@ron12cprinode1 ~]# srvctl start vip -vip ron12cprinode1_2
[root@ron12cprinode1 ~]# srvctl start vip -vip ron12cprinode2_2
[root@ron12cprinode1 ~]# srvctl status vip -vip ron12cprinode1_2
VIP 192.168.102.51 is enabled
VIP 192.168.102.51 is running on node: ron12cprinode1
[root@ron12cprinode1 ~]#

Add the second SCAN

At this time you can add the second SCAN. The command syntax is shown here:

[oracle@ron12cprinode1 ~]$ srvctl add scan -h

Adds a SCAN VIP to the Oracle Clusterware.

Usage: srvctl add scan -scanname <scan_name> [-netnum <network_number>]
    -scanname <scan_name>          Domain name qualified SCAN name
    -netnum  <net_num>             Network number (default number is 1)
    -subnet                        <subnet>/<netmask>[/if1[|if2...]] NET address specification for network
    -help                          Print usage

Implemented on my first cluster node the command is easier to comprehend.

[root@ron12cprinode1 ~]# srvctl add scan -scanname ron12cpri-dgscan.dg.example.com -netnum 2

[root@ron12cprinode1 ~]# srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node ron12cprinode2
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node ron12cprinode1
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node ron12cprinode1

You need to create the SCAN on both clusters. On my primary cluster the SCAN has been created with the following configuration:

[root@ron12cprinode1 ~]# srvctl config scan -netnum 2
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 0 IPv4 VIP: 192.168.102.54
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 1 IPv4 VIP: 192.168.102.55
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 2 IPv4 VIP: 192.168.102.56
[root@ron12cprinode1 ~]#

You can see the new VIPs in the output of ifconfig, just as you would with the primary SCAN:

eth2      Link encap:Ethernet  HWaddr 52:54:00:FE:E2:D5
          inet addr:192.168.102.50  Bcast:192.168.102.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fefe:e2d5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:523 errors:0 dropped:0 overruns:0 frame:0
          TX packets:339 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:118147 (115.3 KiB)  TX bytes:72869 (71.1 KiB)

eth2:1    Link encap:Ethernet  HWaddr 52:54:00:FE:E2:D5
          inet addr:192.168.102.55  Bcast:192.168.102.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

So there is nothing too surprising in the output, it’s exactly the same as before with the public SCAN created during the installation.

End of part 1

This already seems like a lot of text to me so I think it’s time to pause here. The next parts will demonstrate the addition of the SCAN listeners, the new node listeners on the *.dg.example.com network and finally the duplication of the primary RAC One Node database for use as a standby database.

Posted in 12c Release 1, KVM | Tagged: , | 4 Comments »

Interesting GNS anomaly in 12.1.0.1.2 RAC

Posted by Martin Bach on March 13, 2014

I was encountering an interesting anomaly with my 12.1.0.1.2 RAC cluster based on Grid Naming System. I have written about the setup here.

No need to get back to the post-instead let me give you some background if you are not familiar with the Grid Naming System in RAC. Most RAC installations use a Domain Name Server (DNS) to resolve names to IP addresses. In the case of GNS the name resolution is delegated from the main DNS server to a so-called subdomain. In my case my DNS server (bind version 9) is configured on aux.example.com (192.168.100.2), and it delegates name resolution for *.gns.example.com (my cluster) to the Grid Naming System. The GNS address in use is 192.168.100.37 and must be defined in the corporate DNS. The required IP addresses have to be supplied by a DHCP server in the same range as the public network. My public network is on 192.168.100/24, the private network is on 192.168.101/24 and 192.168.102/24. Which component resolves what? Here is a summary:

  • The GNS VIP is resolved by DNS. It’s not in the subdomain according to the documentation. I am using gns-vip.example.com (192.168.100.37) for it.
  • The node public IP is resolved via GNS. Oracle uses the $(hostname) to assign the name. The IP address is fixed
  • The node VIPs are resolved via GNS, the assigned name is $(hostname)-vip and the addresses are assigned by DHCP
  • The private interconnect interface is resolved by GNS, usually as $(hostname)-priv. Addresses are either static or assigned by DHCP
  • The SCAN VIPs are resolved via GNS and assigned by DHCP. And this is exactly where my problem was.

The Problem

All over sudden I had connection problems with my tomcat server connecting to the database. The address I used in tnsnames ora was rac12scan.gns.example.com and hadn’t changed at all. It simply didn’t point to a valid IP half of the time. When I pinged the SCAN I was surprised to see that it listed adresses that weren’t in the output of “srvctl config scan”.

And sure enough, a nslookup against the SCAN resulted in the output of 7 (!) addresses. There was obviously a discrepancy between what Oracle thought the SCAN was (3 IP addresses) and what GNS reported. I tried various cures, including restarting the DNS and DHCP servers (even moving the lease file on the DHCP server to a temporary location), and variations of restarting GNS and the whole cluster (crsctl stop clsuter -all; crsctl start cluster -all). Nothing helped.

Troubleshooting

As you’d expect the name resolution only works with GNS started. If you try to “nslookup rac12scan.gns.example.com” while GNS is down, you get the NXDOMAIN result code which does not really help. The output of “srvctl status gns -list and -detail” is useful in troubleshooting the problem. I also ran “cluvfy comp gns” but that came back clean. Nevertheless, the system instisted that the SCAN was made up of 7 IP addresses, and only 4 of them had SCAN VIPs. No surprise I can’t connect half the time. Now the question is of course: why does the GNS daemon grab so many IP addresses? So I looked at the gnsd.log in $GRID_HOME/log/$(hostname -s)/gns to see what happens. Here is an excerpt from the log file:

2014-03-11 16:37:30.305: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01305:) #0 of 3: OCR error 4 (PROC-4: The cluster registry key to be operated on does not exist.) is not expected 32 - no retry will be done.
2014-03-11 16:37:30.305: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01307:) Pass #0: open of "SYSTEM.GNS.SUBDOMAIN.gns|dexample|dcom.rac12gns-scan1-vip.SRV" failed with error 4 Last: PROC-4: The cluster registry key to be operated on does not exist. (4) - unable to retry.
2014-03-11 16:37:30.306: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01305:) #0 of 3: OCR error 4 (PROC-4: The cluster registry key to be operated on does not exist.) is not expected 32 - no retry will be done.
2014-03-11 16:37:30.306: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01307:) Pass #0: open of "SYSTEM.GNS.SUBDOMAIN.gns|dexample|dcom.rac12gns-scan1-vip.TXT" failed with error 4 Last: PROC-4: The cluster registry key to be operated on does not exist. (4) - unable to retry.
2014-03-11 16:37:30.306: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12node1-vip" Type: A 192.168.100.44 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.308: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12node2-vip" Type: A 192.168.100.39 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.308: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.34 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.36 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.35 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.40 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.42 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.41 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.31 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan2-vip" Type: A 192.168.100.41 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan2-vip" Type: A 192.168.100.34 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan3-vip" Type: A 192.168.100.40 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan3-vip" Type: A 192.168.100.38 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan1-vip" Type: A 192.168.100.31 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [   CLSNS][1009862208]Resolve::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.310: [   CLSNS][1009862208]Resolve::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.311: [   CLSNS][1009862208]Resolver #0::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.311: [   CLSNS][1009862208]Resolver #0::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.311: [   CLSNS][1009862208]Resolver #1::clsns_SetTraceLevel:trace level set to 1.

You can see the address assignment here, and this corresponds to the DHCP leases I saw on the DHCP server host’s /var/log/messages file. Notice that rac12scan grabs 7 IP addresses. This is surprising, the documentation normally states just 3 IP addresses for use with the SCAN. As you can further see the SCAN VIPs resolve only to a subset of these. 192.168.100.36 for example does not have a corresponding SCAN-VIP like some others too. Also, rac12gns-scan2-vip and rac12gns-scan3-vip appear twice. So if rac12scan now resolves to 192.168.100.36 you will get a TNS Error in the form “Destination host does not exist”. Annoying! But why does it do so? The mesages before (PROC-4) gave me a clue.

The information about GNS VIPs seems to be stored in the OCR. An ocrdump confirmed the findings. The question remains: why do Oracle use more than 3 IP addresses for the SCAN? I can only speculate that the DHCP addresses obtained are added to the OCR, and the existing ones don’t seem to be cleared out or updated correctly.

In an effort to solve the problem I dropped the SCAN listeners and the SCAN VIPS and recreated them. But this seemed to confuse the cluster even more, and the name resolution didn’t work. I tried both the subdomain delegation (nslookup rac12scan.gns.example.com) and querying GNS directly (nslookup rac12scan.gns.example.com – 192.168.100.37), both did not resolve anything. A restart of Clusterware didn’t change the situation so I decided to reboot the cluster nodes hoping to bring everything back to normal (yes I should know better, this isn’t Windows). After some waiting until the servers come back up there was no change. Now the system was so confused that it didn’t come up with a SCAN at all. Trying to restore the OCR failed, and I couldn’t even run a rootcrs.pl -deconfig -force in preparation for a reinstallation of Clusterware. So I trashed the system. I don’t believe there is a time for GNS in my lab after this experience.

Thanks to Marcin Przepiorowski (@pioro) who helped during the investigation.

References:

Grid Infrastructure Installation Guide 12c:

  • 1.1.3 Oracle Grid Infrastructure Network Checklist
  • 5.5.1 About Oracle Grid Infrastructure Name Resolution Options
  • 5.11 Grid Naming Service Standard Cluster Configuration Example

Posted in 12c Release 1, KVM, Linux | 1 Comment »

Applying GI PSU 12.1.0.1.2 in the lab

Posted by Martin Bach on January 15, 2014

In my previous posts about the first RAC Grid Infrastructure Patchset I document a few issues I encountered that were worth noting. But where things work as advertised I am more than happy to document it too. In a way, the January 2014 GI PSU works as you’d hope it would (at least in my lab for my 2 node cluster). Well-almost: if you have a non 12.1 database in your environment you might encounter this.

UPDATE: You might want review some additional information with regards to datapatch.

Admittedly it’s taken from an Oracle Restart (i.e. non cluster) environment but I can’t see this not happening in RAC:

[root@server1 stage]# opatchauto apply /u01/stage/12.1.0.1.2/17735306 -ocmrf /u01/stage/ocm.rsp
OPatch Automation Tool
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

OPatchauto version : 12.1.0.1.2
OUI version        : 12.1.0.1.0
Running from       : /u01/app/grid/product/12.1.0.1/grid

opatchauto log file: /u01/app/grid/product/12.1.0.1/grid/cfgtoollogs/opatchauto/17735306/ -
opatch_gi_2014-02-17_20-04-54_deploy.log

Parameter Validation: Successful

System Configuration Collection failed: oracle.osysmodel.driver.crs.productdriver.ProductDriverException:
PRCD-1027 : Failed to retrieve database ora11
PRCD-1229 : An attempt to access configuration of database ora11 was rejected because its
 version 11.2.0.3.0 differs from the program version 12.1.0.1.0. Instead run the program
from /u01/app/oracle/product/11.2.0.3.

opatchauto failed with error code 2.
[root@server1 stage]#

The environment

I have a 2 node RAC cluster that hasn’t previously been patched. It uses GNS for name resolution and is probably one of the more complex setups. So I was keen to test the new GI Patch Set Update. Note that a PSU will roll back previous patches, it’s not a delta as you might know from RPM.

The cluster installation using GNS was slightly more challenging than the usual RAC system so I decided not to try separation of duties. In other words, the GRID and RDBMS owner are identical: oracle.

Staging

As always you have to update OPatch before you can apply the latest patch. I did this in all 4 homes: 2 RDBMS and 2 Grid homes. Download and stage the patch to a location you like by unzipping it.

Next you need the ocm response file. I simply copied it from the other cluster.

Patch application

I started the installation of the patch on node 1 of my cluster in a screen session as always. Node 1 was responsible for the GNS resource at the time. Following the instructions and some previous experience I decided to let opatchauto do all of the work. And this time Oracle even included the “apply” keyword in the docs! And it didn’t bail out immediately with obscure error messages either…

[root@rac12node1 temp]# opatchauto apply /u01/temp/17735306 -ocmrf /u01/temp/ocm.rsp
OPatch Automation Tool
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

OPatchauto version : 12.1.0.1.2
OUI version        : 12.1.0.1.0
Running from       : /u01/app/12.1.0.1/grid

opatchauto log file: /u01/app/12.1.0.1/grid/cfgtoollogs/opatchauto/17735306/opatch_gi_2014-01-15_08-54-25_deploy.log

Parameter Validation: Successful

Grid Infrastructure home:
/u01/app/12.1.0.1/grid
RAC home(s):
/u01/app/oracle/product/12.1.0.1/dbhome_1

Configuration Validation: Successful

Patch Location: /u01/temp/17735306
Grid Infrastructure Patch(es): 17077442 17303297 17552800
RAC Patch(es): 17077442 17552800

Patch Validation: Successful

Stopping RAC (/u01/app/oracle/product/12.1.0.1/dbhome_1) ... Successful
Following database(s) were stopped and will be restarted later during the session: rac12c

Applying patch(es) to "/u01/app/oracle/product/12.1.0.1/dbhome_1" ...
Patch "/u01/temp/17735306/17077442" successfully applied to "/u01/app/oracle/product/12.1.0.1/dbhome_1".
Patch "/u01/temp/17735306/17552800" successfully applied to "/u01/app/oracle/product/12.1.0.1/dbhome_1".

Stopping CRS ... Successful

Applying patch(es) to "/u01/app/12.1.0.1/grid" ...
Patch "/u01/temp/17735306/17077442" successfully applied to "/u01/app/12.1.0.1/grid".
Patch "/u01/temp/17735306/17303297" successfully applied to "/u01/app/12.1.0.1/grid".
Patch "/u01/temp/17735306/17552800" successfully applied to "/u01/app/12.1.0.1/grid".

Starting CRS ... Successful

Starting RAC (/u01/app/oracle/product/12.1.0.1/dbhome_1) ... Successful

SQL changes, if any, are applied successfully on the following database(s): RAC12C

Apply Summary:
Following patch(es) are successfully installed:
GI Home: /u01/app/12.1.0.1/grid: 17077442, 17303297, 17552800
RAC Home: /u01/app/oracle/product/12.1.0.1/dbhome_1: 17077442, 17552800

opatchauto succeeded.

Wow, I could hardly believe my eyes. I “tail”d the logfile in a different session to see what it was doing but the output seems less verbose these days than initial when “opatch auto” came out. See further down in the article about useful log locations.

A few remarks. It took quite a while for the patch to advance past “Starting CRS…” I remember staring at the screen anxiously in previous patch exercises and witnessed the CRS start sequence timing out. In this case there is a valid reason for the slowness, and it’s documented in $GRID_HOME/cfgtoollogs/crsconfig/crspatch*.log: it validates ACFS and applies patches to the -MGMTDB database. In fact you should open a third session to tail the crspatch log file to see what is happening as it provides a wealth of information about stopping and starting Clusterware resources.

With GI PSU 12.1.0.1.1 I ran into problems with CRS which got confused about who is mastering the OCR. With 12.1.0.1.2 this didn’t happen. CRS started successfully on the last node, and even started the database instance.

Log locations

Information about what’s going on can be found in many locations…

    • $GRID_HOME/cfgtoollogs/opatchauto/17735306/opatch_gi_timestamp_deploy.log for the opatchauto process
    • $GRID_HOME/cfgtoollogs/crsconfig/crspatch_hostname_timestamp.log records stop, unlocking, start and locking of Clusterware

More log information can be found in:

  • $GRID_HOME/cfgtoollogs/opatch contains logs for the opatch commands issued against the GRID home
  • $ORACLE_HOME/cfgtoollogs/opatch/opatchtimestamp.log – records output from the individual opatch commands issued against the RDBMS home.

Note these logs are not written to in that sequence. Start with the opatchauto logs, then refer to the more specific log locations for individual troubleshooting. The complete log file names are referenced at the end of the opatchauto log.

SQLPatch

The post installation steps require you to load SQL scripts into the database-it seems to have done that, as shown in the opatchauto log file:

2014-01-15_09-32-13 :
Ignoring the dbconsole command.
2014-01-15_09-32-17 :
Executing command:
/bin/bash -c 'ORACLE_HOME=/u01/app/oracle/product/12.1.0.1/dbhome_1 \
ORACLE_SID=$(/u01/app/oracle/product/12.1.0.1/dbhome_1/bin/srvctl status instance -d RAC12C -n rac12node1 | cut -d " " -f 2) \
/u01/app/oracle/product/12.1.0.1/dbhome_1/OPatch/datapatch'
2014-01-15_09-33-45 :
Successfully executed the above command.

SQL changes, if any, are applied successfully on the following database(s): RAC12C

Except that I could not find a log file in $ORACLE_HOME/sqlpatch/17552800 directory in the RDBMS home on the first node. The -MGMTDB has been patched, the log is in the $GRID_HOME. The registry$history view didn’t show any sign of a patch either. Only after applying the PSU on the last node did Oracle patch the database.

Update

It appears that PSU 1 and 2 have issues when running datapatch. I found this in the output of my patch log:

Executing command:
/bin/bash -c 'ORACLE_HOME=/u01/app/oracle/product/12.1.0.1/dbhome_2 ORACLE_SID=$(/u01/app/oracle/product/12.1.0.1/dbhome_2/bin/srvctl sta
tus instance -d RCDB1 -n rac12node3 | cut -d " " -f 2) /u01/app/oracle/product/12.1.0.1/dbhome_2/OPatch/datapatch'
2014-03-21_11-25-36 :

COMMAND EXECUTION FAILURE :
SQL Patching tool version 12.1.0.1.0 on Fri Mar 21 11:23:26 2014
Copyright (c) 2013, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...done
For the following PDBs: CDB$ROOT
  Nothing to roll back
  The following patches will be applied: 17552800
For the following PDBs: PDB$SEED
  Nothing to roll back
  The following patches will be applied: 17552800
For the following PDBs: ACDEMOPDB
  Nothing to roll back
  The following patches will be applied: 17552800
For the following PDBs: DUBLIN
  Nothing to roll back
  The following patches will be applied: 17552800
Adding patches to installation queue...done
Installing patches...done
Validating logfiles...

ERROR:
mkdir: cannot create directory `/u01/app/grid/cfgtoollogs': Permission denied
mkdir: cannot create directory `/u01/app/grid/cfgtoollogs': Permission denied
mkdir: cannot create directory `/u01/app/grid/cfgtoollogs': Permission denied
mkdir: cannot create directory `/u01/app/grid/cfgtoollogs': Permission denied
DBD::Oracle::st execute failed: ORA-03113: end-of-file on communication channel
Process ID: 20551
Session ID: 21 Serial number: 9 (DBD ERROR: OCIStmtExecute) [for Statement "ALTER PLUGGABLE DATABASE pdb$seed
                 CLOSE IMMEDIATE INSTANCES=ALL"] at /u01/app/oracle/product/12.1.0.1/dbhome_2/sqlpatch/sqlpatch.pm line 448, <LOGFILE> line 6129.

The corresponding bug number is 17421502, and a patch exists. I downloaded the patch and applied it on all of my nodes (rolling!). After everything has come up, I re-ran datapatch and this time it seemed to have worked without an “end-of-file on communication channel”. See MOS Datapatch may fail while patching a RAC+PDB environment (Doc ID 1632818.1) for more information.


Note that you have to set your ORACLE_SID to the database to be patched for the above command to work.

Summary

Applying the January PSU to my cluster worked flawlessly, definitely a huge improvement over the last PSU which was a bit of a disaster.

Posted in 12c Release 1, KVM, Linux | Tagged: | Leave a Comment »