Martins Blog

Trying to explain complex things in simple terms

OSWatcher integration in Trace File Analyzer (TFA)

Posted by Martin Bach on August 2, 2016

Some time ago I wrote a post about using OSWatcher for system analysis. Neil Chandler (@ChandlerDBA) rightfully pointed out that although OSWatcher was cool, TFA was the way to go. TFA can include OSWatcher, but more importantly it adds a lot of value over and above what OSWatcher does.

I guess it depends on what you want to do. I still think that OSWatcher is a good starting point and enough for most problems on single instance systems. When it comes to clustered environments, though, TFA looks a lot more appealing.

In this article I am taking a closer look at using TFA, which ships with Oracle 11.2.0.4 and 12.1.0.2. TFA is automatically updated as part of the quarterly patches, which is nice because the base release does not seem to work properly. Thankfully TFA can also be patched outside the regular patch cycle.

What is TFA?

TFA is a tool which – amongst other things – helps you gather information about incidents across your cluster. If you have ever worked on Exadata half-racks or other clusters with more than 4 nodes you will quickly start to appreciate being able to use a single tool for this task. The TFA output is suitable for attaching to a Service Request, which should, at least in theory, help speed up problem resolution.

It is also an excellent parsing tool, with strong reporting capabilities thanks to its “analyze” command.

As an added benefit you get a lot of tools that were previously known as the “RAC and DB Support Tools Bundle”. This bundle includes OSWatcher, which is the reason for this post.

Plus you don’t have to worry about starting OSWatcher at boot time: TFA is started via a systemd unit file in Oracle Linux 7, and I found it started as a service in Oracle Linux 6. On OL7.x you can check its status using the standard systemd commands, as shown here:

[oracle@rac12sbnode1 ~]$ systemctl status oracle-tfa
oracle-tfa.service - Oracle Trace File Analyzer
   Loaded: loaded (/etc/systemd/system/oracle-tfa.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-08-02 09:46:24 BST; 3h 14min ago
 Main PID: 27799 (init.tfa)
   CGroup: /system.slice/oracle-tfa.service
           ├─14670 /bin/sleep 30
           ├─27799 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
           └─27890 /u01/app/12.1.0.2/grid/jdk/jre/bin/java -Xms128m -Xmx512m \
                     oracle.rat.tfa.TFAMain /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home
[oracle@rac12sbnode1 ~]$ 

Updating TFA

It is quite likely that your version of TFA is older than the one available from MOS ID 1513912.2, which appears to be its main landing page. I applied the proactive bundle patch for July 2016 to my 2 node RAC cluster and found the TFA version to be 12.1.2.7.0. At the time of writing Oracle has released TFA 12.1.2.8.0.

The update is quite simple, but needs to be performed as root. To be sure I’m not doing something I shouldn’t be doing I checked the current version:

[oracle@rac12sbnode1 ~]$ tfactl print status

.----------------------------------------------------------------------------------------------------.
| Host         | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+--------------+---------------+-------+------+------------+----------------------+------------------+
| rac12sbnode1 | RUNNING       | 23081 | 5000 | 12.1.2.7.0 | 12127020160304140533 | COMPLETE         |
| rac12sbnode2 | RUNNING       |  6296 | 5000 | 12.1.2.7.0 | 12127020160304140533 | COMPLETE         |
'--------------+---------------+-------+------+------------+----------------------+------------------'

In the next step, after switching to the root account, I staged the TFA software and executed the installer. This will automatically distribute the new version across all nodes in the cluster.

[root@rac12sbnode1 patches]# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[root@rac12sbnode1 patches]# unzip -q TFALite_v12.1.2.8.0.zip 
[root@rac12sbnode1 patches]# ./installTFALite 
TFA Installation Log will be written to File : /tmp/tfa_install_25296_2016_08_02-09_40_12.log

Starting TFA installation

TFA HOME : /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home
TFA Build Version: 121280 Build Date: 201606232222
Installed Build Version: 121270 Build Date: 201603041405

TFA is already installed. Patching /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home...
TFA patching typical install from zipfile is written to /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfapatch.log

TFA will be Patched on: 
rac12sbnode1
rac12sbnode2

Do you want to continue with patching TFA? [Y|N] [Y]: y

Checking for ssh equivalency in rac12sbnode2
Node rac12sbnode2 is not configured for ssh user equivalency

SSH is not configured on these nodes : 
rac12sbnode2

Do you want to configure SSH on these nodes ? [Y|N] [Y]: y

Configuring SSH on rac12sbnode2...

Generating keys on rac12sbnode1...

Copying keys to rac12sbnode2...

/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@rac12sbnode2's password: 

Creating ZIP: /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/internal/tfapatch.zip

Using SSH to patch TFA to remote nodes :

Applying Patch on rac12sbnode2:

TFA_HOME: /u01/app/12.1.0.2/grid/tfa/rac12sbnode2/tfa_home
Stopping TFA Support Tools...
Shutting down TFA
Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
. . . . . 
. . . 
Successfully shutdown TFA..
Copying files from rac12sbnode1 to rac12sbnode2...

Current version of Berkeley DB in  is 5.0.84, so no upgrade required
Running commands to fix init.tfa and tfactl in rac12sbnode2...
Updating init.tfa in rac12sbnode2...
Starting TFA in rac12sbnode2...
Starting TFA..
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Waiting up to 100 seconds for TFA to be started..
. . . . . 
Successfully started TFA Process..
. . . . . 
TFA Started and listening for commands

Enabling Access for Non-root Users on rac12sbnode2...


Applying Patch on rac12sbnode1:

Stopping TFA Support Tools...

Shutting down TFA for Patching...

Shutting down TFA
Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
. . . . . 
. . . 
Successfully shutdown TFA..

Current version of Berkeley DB is 5.0.84, so no upgrade required

Copying TFA Certificates...

Running commands to fix init.tfa and tfactl in localhost

Starting TFA in rac12sbnode1...

Starting TFA..
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Waiting up to 100 seconds for TFA to be started..
. . . . . 
Successfully started TFA Process..
. . . . . 
TFA Started and listening for commands

Enabling Access for Non-root Users on rac12sbnode1...

root@rac12sbnode2's password: 
Removed SSH configuration on rac12sbnode2...

.--------------------------------------------------------------------.
| Host         | TFA Version | TFA Build ID         | Upgrade Status |
+--------------+-------------+----------------------+----------------+
| rac12sbnode1 |  12.1.2.8.0 | 12128020160623222219 | UPGRADED       |
| rac12sbnode2 |  12.1.2.8.0 | 12128020160623222219 | UPGRADED       |
'--------------+-------------+----------------------+----------------'

[root@rac12sbnode1 patches]# 

This has upgraded TFA in one easy step.

Support Tools Bundle Missing with stock-TFA

If you read MOS 1513912.2 carefully, you undoubtedly spotted that beginning with TFA 12.1.2.3.0 the RAC and DB Support Tools Bundle is included with TFA, alongside some other very useful utilities. But you only get them after deploying TFA from MOS. Here is the list as shown post-patch:

[oracle@rac12sbnode1 ~]$ tfactl toolstatus
.-------------------------------------------.
|           External Support Tools          |
+--------------+--------------+-------------+
| Host         | Tool         | Status      |
+--------------+--------------+-------------+
| rac12sbnode1 | alertsummary | DEPLOYED    |
| rac12sbnode1 | exachk       | DEPLOYED    |
| rac12sbnode1 | ls           | DEPLOYED    |
| rac12sbnode1 | pstack       | DEPLOYED    |
| rac12sbnode1 | orachk       | DEPLOYED    |
| rac12sbnode1 | sqlt         | DEPLOYED    |
| rac12sbnode1 | grep         | DEPLOYED    |
| rac12sbnode1 | summary      | DEPLOYED    |
| rac12sbnode1 | prw          | NOT RUNNING |
| rac12sbnode1 | vi           | DEPLOYED    |
| rac12sbnode1 | tail         | DEPLOYED    |
| rac12sbnode1 | param        | DEPLOYED    |
| rac12sbnode1 | dbglevel     | DEPLOYED    |
| rac12sbnode1 | darda        | DEPLOYED    |
| rac12sbnode1 | history      | DEPLOYED    |
| rac12sbnode1 | oratop       | DEPLOYED    |
| rac12sbnode1 | oswbb        | RUNNING     |
| rac12sbnode1 | dbperf       | RUNNING     |
| rac12sbnode1 | changes      | DEPLOYED    |
| rac12sbnode1 | events       | DEPLOYED    |
| rac12sbnode1 | ps           | DEPLOYED    |
| rac12sbnode1 | srdc         | DEPLOYED    |
'--------------+--------------+-------------'

The stock version, although it gets patched with the proactive bundle patch, does not include them. I ran this command before applying the TFA patch, but after having applied the proactive bundle patch to my cluster:

[oracle@rac12sbnode1 ~]$ tfactl toolstatus
.------------------------.
| External Support Tools |
+-------+-------+--------+
| Host  | Tool  | Status |
+-------+-------+--------+
'-------+-------+--------'

This is actually a feature, not a bug, as documented in MOS 2054786.1. The note states quite clearly that the RAC and DB Support Tools bundle is only installed if you deploy the MOS version. I just did that; I am good.

TFA Tools

I really love the idea of having these tools available. The TFA user guide, also available from MOS 1513912.2 (tab “Users Guide”), explains from page 39 onwards how to use them.

For example:

[oracle@rac12sbnode1 ~]$ tfactl oratop -h
Usage : /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oratop
         -database <dbname> <Oratop Options> <logon>

Options: 
-database <dbname> Database name to run oratop
<logon> : defalut will be / as sysdba. Specify a different user using
          {username[/password][@connect_identifier] | / }
          [AS {SYSDBA|SYSOPER}]
          connect_identifier: host[:port]/[service_name]
<Oratop Options>:
-k : FILE#:BLOCK#, section 4 lt is (EVENT/LATCH)
-m : MODULE/ACTION, section 4 (default is USERNAME/PROGRAM)
-s : SQL mode, section 4 (default is process mode)
-c : database service mode (default is connect string)
-f : detailed format, 132 columns (default: standard, 80 columns)
-b : batch mode (default is text-based user interface)
-n : maximum number of iterations (requires number)
-i : interval delay, requires value in seconds (default: 5s)

e.g:
   /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oratop -database testdb1
   /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oratop -database testdb1 -bn1


This makes for interesting output; the following example is from a system running the Swingbench Order Entry benchmark on an overloaded set of VMs:


Oracle 12c - Primary NCDB   11:19:31 up:  14h,   2 ins,    5 sn,   1 us, 2.7G mt,    4% fra,   0 er,                      93.3% db
ID %CPU LOAD %DCU   AAS  ASC  ASI  ASW  ASP  AST  UST MBPS IOPS IORL LOGR PHYR PHYW  %FR   PGA TEMP UTPS UCPS SSRT DCTR DWTR  %DBT
 1   50    2   16   2.2    0    1    2    0    3    3  222  108   1m  25k  23k   25    6  373M    0   12   57  12m   14   85  59.2
 2   31    2    9   1.5    1    2    0    0    2    2  105  104 876u  19k  18k   14    5  385M    0    7   42  10m   13   86  40.8

EVENT (C)                                                         TOTAL WAITS   TIME(s)  AVG_MS  PCT                    WAIT_CLASS
DB CPU                                                                             1636           39                              
log file parallel write                                                 49704       849    17.2   20                    System I/O
control file sequential read                                           344732       687     2.0   16                    System I/O
log file switch (checkpoint incomplete)                                   319       612  1739.7   15                 Configuration
log file sync                                                            6328       404    63.2   10                        Commit

ID   SID     SPID USERNAME  PROGRAM    SRV  SERVICE  PGA  SQLID/BLOCKER OPN  E/T  STA  STE  WAIT_CLASS  EVENT/*LATCH           W/T
 2   249     9949 B/G       LG01       DED  SYS$BAC 1.4M                     14h  ACT  WAI  System I/O  log file parallel wri  49m
 1    11    25634 B/G       DBW0       DED  SYS$BAC  12M                     14h  ACT  WAI  System I/O  db file parallel writ  42m
 2   248     9945 B/G       LG00       DED  SYS$BAC 1.4M                     14h  ACT  WAI  System I/O  log file parallel wri  31m
 1   247    25636 B/G       LGWR       DED  SYS$BAC 1.7M                     14h  ACT  WAI  System I/O  *test excl. non-paren  24m
 1    22     6342 SOE       JDBC Thin  DED  NCDB    3.2M                       0  ACT  WAI  Commit      log file sync          12m
 2   247     9941 B/G       LGWR       DED  SYS$BAC 1.6M                     14h  ACT  WAI  Other       LGWR any worker group  12m
 1   265     6344 SOE       JDBC Thin  DED  NCDB    5.3M  7ws837zynp1zv SEL    0  ACT  I/O  User I/O    direct path read        5m
 1   257     6340 SOE       JDBC Thin  DED  NCDB    5.5M  8zz6y2yzdqjp0 SEL    0  ACT  CPU  User I/O    cpu runqueue            4m
 2    46    11208 SOE       JDBC Thin  DED  NCDB    6.4M  7ws837zynp1zv SEL    0  ACT  I/O  User I/O    direct path read        3m
 2   274    11206 SOE       JDBC Thin  DED  NCDB    5.4M  7ws837zynp1zv SEL    0  ACT  I/O  User I/O    direct path read        2m

Nice little overview :) But I’m digressing…

OSWatcher

There are two ways for TFA to access OSWatcher information:

  1. Using the analyze command to provide a summary view
  2. Invoking OSWatcher directly

The first option provides a nice overview. I’ve been running swingbench on the system with far too many users, which you can see here:

[oracle@rac12sbnode1 ~]$ tfactl analyze -comp osw -since 1h
INFO: analyzing host: rac12sbnode1

                     Report title: OSW top logs
                Report date range: last ~1 hour(s)
       Report (default) time zone: GMT - Greenwich Mean Time
              Analysis started at: 02-Aug-2016 03:27:14 PM BST
            Elapsed analysis time: 0 second(s).
               Configuration file: /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/ext/tnt/conf/tnt.prop
              Configuration group: osw
                        Parameter: 
              Total osw rec count:            174, from 02-Aug-2016 02:00:16 PM BST to 02-Aug-2016 03:26:54 PM BST
OSW recs matching last ~1 hour(s):            120, from 02-Aug-2016 02:27:19 PM BST to 02-Aug-2016 03:26:54 PM BST
                        statistic: t     first   highest   (time)   lowest   (time)  average  non zero  3rd last  2nd last      last  trend
            top.loadavg.last01min:        3.91      5.48 @02:32PM     2.08 @02:46PM     3.42       103      3.21      2.91      2.64   -32%
            top.loadavg.last05min:        3.10      3.91 @02:32PM     3.05 @02:53PM     3.42       103      3.54      3.43      3.32     7%
            top.loadavg.last15min:        3.13      3.59 @03:22PM     3.13 @02:27PM     3.37       103      3.53      3.49      3.45    10%
                top.tasks.running:           5         9 @02:34PM        1 @02:27PM        3       119         2         1         2   -60%
               top.tasks.sleeping:         316       325 @02:40PM      315 @02:49PM      320       119       320       321       320     1%

INFO: analyzing host: rac12sbnode2

                     Report title: OSW top logs
                Report date range: last ~1 hour(s)
       Report (default) time zone: GMT - Greenwich Mean Time
              Analysis started at: 02-Aug-2016 03:27:15 PM BST
            Elapsed analysis time: 0 second(s).
               Configuration file: /u01/app/12.1.0.2/grid/tfa/rac12sbnode2/tfa_home/ext/tnt/conf/tnt.prop
              Configuration group: osw
                        Parameter: 
              Total osw rec count:            174, from 02-Aug-2016 02:00:14 PM BST to 02-Aug-2016 03:26:52 PM BST
OSW recs matching last ~1 hour(s):            120, from 02-Aug-2016 02:27:16 PM BST to 02-Aug-2016 03:26:52 PM BST
                        statistic: t     first   highest   (time)   lowest   (time)  average  non zero  3rd last  2nd last      last  trend
            top.loadavg.last01min:        2.75      5.73 @02:40PM     2.23 @02:52PM     3.44       111      3.91      4.31      3.88    41%
            top.loadavg.last05min:        2.78      4.16 @02:41PM     2.78 @02:27PM     3.40       111      3.52      3.65      3.67    32%
            top.loadavg.last15min:        2.93      3.60 @03:13PM     2.93 @02:27PM     3.32       111      3.49      3.53      3.55    21%
                top.tasks.running:           2         8 @03:10PM        1 @03:00PM        3       120         2         2         1   -50%

[oracle@rac12sbnode1 ~]$

As you can imagine the system is somewhat overloaded. The minimum interval to report on seems to be 1 hour:

[oracle@rac12sbnode1 ~]$ tfactl analyze -comp osw -since 5m

ERROR: Invalid value for -since. Supported values are n<h|d>

The analyze command can do a lot more; have a look at the documentation for the full range of options.
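
One capability worth singling out is the pattern search across all nodes’ collected logs. This is a hedged sketch: the -search and -since flags are the ones I found in the TFA user guide, so double-check them with tfactl analyze -h on your version. The guard only exists so the snippet is safe to paste on a host without TFA.

```shell
# Hedged sketch: search everything TFA collected across the cluster for
# ORA- errors raised during the last day. Flags assumed from the TFA
# user guide; verify them against your release.
if command -v tfactl >/dev/null 2>&1; then
  tfactl analyze -search "ORA-" -since 1d
else
  # no TFA on this host; nothing to do
  echo "tfactl not on PATH"
fi
```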

But you can run OSWatcher directly as well:

[oracle@rac12sbnode1 ~]$ tfactl oswbb -h

Usage : /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oswbb \
          [<OSWatcher Analyzer Options> | -since n[mhd] ]

Options: 

-since n[mhd] Run OSWatcher analyzer for last n [m]inutes or [h]ours or [d]ays.

<OSWatcher Analyzer Options>: -P <name> -L <name> -6 -7 -8 -B <time> -E <time> -A 
     -P <profile name>  User specified name of the html profile generated
                        by oswbba. This overrides the oswbba automatic naming
                        convention for html profiles. All profiles
                        whether user specified named or auto generated
                        named will be located in the /profile directory.

     -A <analysis name> Same as option A from the menu. Will generate
                        an analysis report in the /analysis directory or
                        user can also specify the name of the analysis file
                        by specifying full qualified path name of file.
                        The "A" option can not be used together with the
                        "S" option.
     -S <>              Will generate an analysis of a subset of the data
                        in the archive directory. This option must be used
                        together with the -b and -e options below. See the
                        section "Specifying the begin/end time of the analysis"
                        above. The "S" option can not be used together with
                        the "A" option.

     -START <filename>  Used with the analysis option to specify the first
                        file located in the oswvmstat directory to analyze.

     -STOP <filename>   Used with the analysis option to specify the last
                        file located in the oswvmstat directory to analyze.

     -b <begin time>    Used with the -S option to specify the begin time
                        of the analysis period. Example format:
                        -b Jan 09 13:00:00 2013

     -e <end time>      Used with the -S option to specify the end time
                        of the analysis period. Example format:
                        -e Jan 09 13:15:00 2013

     -L <location name> User specified location of an existing directory
                        to place any gif files generated
                        by oswbba. This overrides the oswbba automatic
                        convention for placing all gif files in the
                        /gif directory. This directory must pre-exist!
     -6                 Same as option 6 from the menu. Will generate
                        all cpu gif files.


     -7                 Same as option 7 from the menu. Will generate
                        all memory gif files.

     -8                 Same as option 8 from the menu. Will generate
                        all disk gif files.



     -NO_IOSTAT         Ignores files in the oswiostat directory from
                        analysis

     -NO_TOP            Ignores files in the oswtop directory from
                        analysis

     -NO_NETSTAT        Ignores files in the oswnetstat directory from
                        analysis

     -NO_PS             Ignores files in the oswps directory from
                        analysis

     -MEM_ALL           Analyzes virtual and resident memory allocations
                        for all processes. This is very resource intensive.

     -NO_Linux          Ignores files in the oswmeminfo directory from
                        analysis

e.g:
   /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oswbb
   /u01/app/12.1.0.2/grid/tfa/rac12sbnode1/tfa_home/bin/tfactl.pl oswbb -since 2h

[oracle@rac12sbnode1 ~]$

These options look quite similar to the ones I have shown you in my previous post about OSWatcher, so I won’t go into detail. Here is an example; note how I can specify the last 10 minutes:

[oracle@rac12sbnode1 ~]$ tfactl oswbb -since 10m

Validating times in the archive...

Warning. The end date you entered is not contained in the archive directory
The end date you entered is:     Tue Aug 02 15:39:06 BST 2016
The last date in the archive is: Tue Aug 02 15:38:55 BST 2016
Defaulting to using the last date in the archive

Scanning file headers for version and platform info...


Parsing file rac12sbnode1_iostat_16.08.02.1500.dat ...


Parsing file rac12sbnode1_vmstat_16.08.02.1500.dat ...




Parsing file rac12sbnode1_top_16.08.02.1500.dat ...


Parsing file rac12sbnode1_ps_16.08.02.1500.dat ...

...

After the analysis has completed, the report is opened in a pager and shown.

This report is best viewed in a fixed font editor like textpad...

OSWatcher Analyzer

Input Archive:       /u01/app/oracle/tfa/repository/suptools/rac12sbnode1/oswbb/oracle/archive
Archive Source Dest: /u01/app/oracle/tfa/repository/suptools/rac12sbnode1/oswbb/oracle/archive
Archive Start Time:  Aug 2 15:28:54 2016
Archive Stop Time:   Aug 2 15:38:55 2016
Hostname:            RAC12SBNODE1
OS Version:          Linux
Snapshot Freq:       30
CPU COUNT:           2

############################################################################
# Contents Of This Report:
#
# Section 1: System Status
# Section 2: System Slowdowns 
#   Section 2.1: System Slowdown RCA Process Level Ordered By Impact
# Section 3: System General Findings
# Section 4: CPU Detailed Findings
#   Section 4.1: CPU Run Queue:
#   Section 4.2: CPU Utilization: Percent Busy
#   Section 4.3: CPU Utilization: Percent Sys
# Section 5: Memory Detailed Findings
#   Section 5.1: Memory: Process Swap Queue 
#   Section 5.2: Memory: Scan Rate 
#   Section 5.3  Memory: Page In: 
#   Section 5.4  Memory: Page Tables (Linux only): 
#   Section 5.5: Top 5 Memory Consuming Processes Beginning
#   Section 5.6: Top 5 Memory Consuming Processes Ending
# Section 6: Disk Detailed Findings
#   Section 6.1: Disk Percent Utilization Findings
#   Section 6.2: Disk Service Times Findings
#   Section 6.3: Disk Wait Queue Times Findings
#   Section 6.4: Disk Throughput Findings
#   Section 6.5: Disk Reads Per Second
#   Section 6.6: Disk Writes Per Second

...

Summary

TFA really is a very useful tool, and this is not only due to the integration of OSWatcher. A lot of useful information that is beyond the scope of this article is available, and the search function is quite invaluable when trying to hunt down problems in your cluster. Maybe I’ll dedicate another post to that at some later time …
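
The incident-gathering workflow mentioned at the beginning boils down to a single command. Another hedged sketch: the diagcollect flags below are what the documentation describes, so verify them with tfactl diagcollect -h on your release; the guard keeps the snippet harmless on a host without TFA.

```shell
# Hedged sketch: collect diagnostics from all nodes for the last hour,
# producing an archive suitable for attaching to a Service Request.
# Flags assumed from the TFA documentation; verify on your version.
if command -v tfactl >/dev/null 2>&1; then
  tfactl diagcollect -all -since 1h
else
  # no TFA on this host; nothing to do
  echo "tfactl not on PATH"
fi
```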


Preventing a SPOF with CMAN 12c

Posted by Martin Bach on July 25, 2016

I have already written about the use of Connection Manager 11.2 to govern access to a database. While researching this piece I have updated the original post so that it’s relevant for 12c as well.

Although the idea of using Connection Manager is tempting, the obvious first question concerns high availability of the process: you don’t want it to become a Single Point of Failure (SPOF). After all, if the gatekeeper to your database environment fails, you are effectively closed for business. One option is to add a second CMAN process.

Lab Setup

I have created a few VMs to simulate the scenario. I installed CMAN 12c in two VMs running Oracle Linux 7. The client homes have been patched up to the April PSU because that’s considered good practice (although I couldn’t find a reference to CMAN in the OPatch logs). The hosts are named oracledev and server4.

I have a 4 node RAC 12c database named NCDB to which the CMAN rules provide access. The SCAN is named rac12pri-scan.

And finally I have an Oracle 12c installation on server3 which will act as the client.

CMAN configuration

The CMAN configuration is quite straightforward. Following my earlier post’s instructions I created the following $ORACLE_HOME/network/admin/cman.ora on oracledev:

[oracle@oracledev ~]$ cat $ORACLE_HOME/network/admin/cman.ora
cman1 =
  (configuration=
    (address=(protocol=tcp)(host=oracledev)(port=1521))
    (rule_list=
      (rule=(src=oracledev)(dst=127.0.0.1)(srv=cmon)(act=accept))
      (rule=(src=*)(dst=rac12pri-scan)(srv=NCDB)(act=accept))
    )
  )

Please note that the configuration file has been kept to a bare minimum to keep the discussion simple. You should review the Net Services reference guide for the list of available parameters and for how to secure the system.
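
For illustration only: the rule syntax supports more than accept. According to the Net Services reference, act can also be reject or drop, and src/dst accept wildcards. A hypothetical variant that keeps a made-up subnet out could look like this:

```
cman1 =
  (configuration=
    (address=(protocol=tcp)(host=oracledev)(port=1521))
    (rule_list=
      (rule=(src=oracledev)(dst=127.0.0.1)(srv=cmon)(act=accept))
      # hypothetical: turn away one subnet before the catch-all accept
      (rule=(src=192.168.100.*)(dst=rac12pri-scan)(srv=NCDB)(act=reject))
      (rule=(src=*)(dst=rac12pri-scan)(srv=NCDB)(act=accept))
    )
  )
```

I placed the more specific reject ahead of the catch-all accept; check the Net Services reference for the exact rule matching semantics before relying on ordering.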

The configuration on host server4 is identical except that the CMAN host has changed. Using that information I can start the CMAN process:

[oracle@server4 ~]$ cmctl startup -c cman1

CMCTL for Linux: Version 12.1.0.2.0 - Production on 06-JUL-2016 13:38:32

Copyright (c) 1996, 2014, Oracle.  All rights reserved.

Current instance cman1 is not yet started
Connecting to (address=(protocol=tcp)(host=server4)(port=1521))
Starting Oracle Connection Manager instance cman1. Please wait...
TNS-04077: WARNING: No password set for the Oracle Connection Manager instance.
CMAN for Linux: Version 12.1.0.2.0 - Production
Status of the Instance
----------------------
Instance name             cman1
Version                   CMAN for Linux: Version 12.1.0.2.0 - Production
Start date                06-JUL-2016 13:38:32
Uptime                    0 days 0 hr. 0 min. 9 sec
Num of gateways started   2
Average Load level        0
Log Level                 SUPPORT
Trace Level               OFF
Instance Config file      /u01/app/oracle/product/12.1.0.2/client_1/network/admin/cman.ora
Instance Log directory    /u01/app/oracle/diag/netcman/server4/cman1/alert
Instance Trace directory  /u01/app/oracle/diag/netcman/server4/cman1/trace
The command completed successfully.
[oracle@server4 ~]$ 

I can start the second one the same way.

Client configuration

This is all well and good, but how can I configure the client to connect to the database? I would like two things to happen:

  • Connect time load balancing. In other words, don’t hammer just one of the two CMAN processes with incoming connection requests while the second one is idling around
  • Ensure that the alternative CMAN is tried in case one of the two systems dies

Things are a bit more complicated because I have to use SOURCE_ROUTE to indicate to the client that a Connection Manager process is involved. For instance, with a single CMAN process I would use the following entry in tnsnames.ora:

NCDB_1CMAN =
 (description=
   (address_list=
     (source_route=yes)
     (address=(protocol=tcp)(host=server4)(port=1521))
     (address=(protocol=tcp)(host=rac12pri-scan)(port=1521))
   )
   (connect_data=
    (service_name=NCDB)
   )
 )

Relying on connection manager on “server4” I connect through to the NCDB database on my RAC system.

[oracle@server3 tns]$ sqlplus martin@NCDB_1CMAN

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 6 13:44:46 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Last Successful login time: Wed Jul 06 2016 13:44:34 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> select count(*) from v$active_instances;

  COUNT(*)
----------
         4

But what happens if server4 is not available? Well, you guessed right:

[oracle@server3 tns]$ sqlplus martin@NCDB_1CMAN

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 6 13:52:51 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
ERROR:
ORA-12541: TNS:no listener

Enter user-name: 

This is the single point of failure I would like to prevent.

Using Two CMAN processes in the tnsnames.ora file

After a bit of digging around on My Oracle Support I gave up and used my search engine skills, and that proved more effective. The Oracle Database Net Services References has the correct example in chapter 6, which looks as follows after I applied it to my environment:

NCDB_2CMANS =
  (description =
    (load_balance = on)
    (failover = on)
    (address_list =                   # first CMAN on server4
      (source_route = yes)
      (address = (protocol = tcp)(host = server4)(port = 1521))
      (address=(protocol=tcp)(host=rac12pri-scan)(port=1521))
    )
    (address_list =                   # second CMAN on oracledev
      (source_route = yes)
      (address = (protocol = tcp)(host = oracledev)(port = 1521))
      (address=(protocol=tcp)(host=rac12pri-scan)(port=1521))
    )
    (connect_data =
      (service_name = NCDB)
      (server = dedicated)
    )
  )

You read this as follows:

  • The service name entry starts with the DESCRIPTION tag.
  • On the top level LOAD_BALANCE and FAILOVER are enabled. This allows client side load balancing plus failover to the next CMAN entry
  • Each CMAN host has its own ADDRESS_LIST. Essentially it’s the NCDB_1CMAN entry with the SOURCE_ROUTE, CMAN host and SCAN
  • There is nothing different in the CONNECT_DATA section compared with the previous, single CMAN-entry

The Result

As a result, I can connect to the database now – I haven’t got a CMAN process on server4 but there is one on oracledev:

[oracle@server4 ~]$ cmctl show status -c cman1

CMCTL for Linux: Version 12.1.0.2.0 - Production on 06-JUL-2016 14:17:16

Copyright (c) 1996, 2014, Oracle.  All rights reserved.

Current instance cman1 is not yet started
Connecting to (address=(protocol=tcp)(host=server4)(port=1521))
TNS-04011: Oracle Connection Manager instance not yet started.
[oracle@server4 ~]$ 

[oracle@oracledev ~]$ cmctl show status -c cman1

CMCTL for Linux: Version 12.1.0.2.0 - Production on 06-JUL-2016 14:17:45

Copyright (c) 1996, 2014, Oracle.  All rights reserved.

TNS-04077: WARNING: No password set for the Oracle Connection Manager instance.
Current instance cman1 is already started
Connecting to (address=(protocol=tcp)(host=oracledev)(port=1521))
Status of the Instance
----------------------
Instance name             cman1
Version                   CMAN for Linux: Version 12.1.0.2.0 - Production
Start date                06-JUL-2016 13:18:19
Uptime                    0 days 0 hr. 59 min. 26 sec
Num of gateways started   2
Average Load level        0
Log Level                 SUPPORT
Trace Level               OFF
Instance Config file      /u01/app/oracle/product/12.1.0.2/client_1/network/admin/cman.ora
Instance Log directory    /u01/app/oracle/diag/netcman/oracledev/cman1/alert
Instance Trace directory  /u01/app/oracle/diag/netcman/oracledev/cman1/trace
The command completed successfully.
[oracle@oracledev ~]$ 

A connection can be established:

[oracle@server3 tns]$ sqlplus martin@NCDB_2CMANS

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 6 14:18:13 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Last Successful login time: Wed Jul 06 2016 13:44:49 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> select count(*) from v$active_instances;

  COUNT(*)
----------
         4

SQL>

Using an approach like this allows me to connect to the database even if one of the two CMAN systems is down. But what about load balancing? This appears to work as well: tailing the cman1.log file on each host I can see that the 10 sessions I requested in a bash for-loop have been evenly spread across both CMAN hosts:

-- server4

06-JUL-2016 13:21:49 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19796)) * establish * NCDB * 0
06-JUL-2016 13:21:49 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19798)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19800)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19801)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=19802)) * establish * NCDB * 0

-- oracledev

06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22637)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22639)) * establish * NCDB * 0
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22643)) * establish * NCDB * 0
06-JUL-2016 13:21:51 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22644)) * establish * NCDB * 0
06-JUL-2016 13:21:51 * (connect_data=(service_name=NCDB)(server=dedicated)(CID=(PROGRAM=sqlplus)
  (HOST=server3.example.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.53)(PORT=22645)) * establish * NCDB * 0
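The even split can be double-checked by counting the establish events per host. Below is a hypothetical sketch using stand-in log copies under /tmp; the real cman1.log files live under the “Instance Log directory” reported by cmctl.

```shell
# Stand-in copies of each instance's cman1.log; the /tmp paths are
# assumptions for illustration only.
cat > /tmp/cman1_server4.log <<'EOF'
06-JUL-2016 13:21:49 * (connect_data=(service_name=NCDB)) * establish * NCDB * 0
06-JUL-2016 13:21:49 * (connect_data=(service_name=NCDB)) * establish * NCDB * 0
EOF
cat > /tmp/cman1_oracledev.log <<'EOF'
06-JUL-2016 13:21:50 * (connect_data=(service_name=NCDB)) * establish * NCDB * 0
EOF

# Tally establish events per CMAN host.
for h in server4 oracledev; do
  printf '%s: %s connections\n' "$h" "$(grep -c ' establish ' "/tmp/cman1_${h}.log")"
done
```

With the real logs, an even spread shows up as roughly equal counts on both hosts.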

So both boxes ticked.

Posted in 12c Release 1, Linux | Leave a Comment »

Tales from the field: potential reasons for PDB plug-in violations part 2

Posted by Martin Bach on July 14, 2016

In the first part of this little series I explained how a missing (default) tablespace in a PDB can have an impact when opening it. In this instalment I’ll add another common cause for PDBs not opening in read-write mode: different patch levels between the PDB to be plugged in and the CDB.

Scenario

Assume for a moment that you upgraded Oracle binaries from 12.1.0.2.5 to 12.1.0.2.160419. In other words, you applied the most current PSU + OJVM patch at the time of writing to your database. Once complete, you would see output similar to this for a single instance Oracle 12c installation without Oracle Restart:

[oracle@server2 ~]$ opatch lspatches
22674709;Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
22291127;Database Patch Set Update : 12.1.0.2.160419 (22291127)

OPatch succeeded.

I am using database CDB1 in this blog post to indicate the patched CDB:

SYS@CDB$ROOT-CDB1> select patch_id,action,status,description
  2  from dba_registry_sqlpatch;

  PATCH_ID ACTION          STATUS
---------- --------------- ---------------
DESCRIPTION
--------------------------------------------------------------------------------
  21555660 APPLY           SUCCESS
Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)

  21359755 APPLY           SUCCESS
Database Patch Set Update : 12.1.0.2.5 (21359755)

  21555660 ROLLBACK        SUCCESS
Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)

  22674709 APPLY           SUCCESS
Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)

  22291127 APPLY           SUCCESS
Database Patch Set Update : 12.1.0.2.160419 (22291127)

NB: I double-checked, and the above output is correct: there is no rollback of the 12.1.0.2.5 DB PSU prior to the installation of the new one.

Let’s also assume that you would like to plug a PDB into the newly patched CDB. The PDB to be unplugged from the other CDB (named CDB2) has a lower patch level:

SYS@CDB$ROOT-CDB2> select patch_id,action,status,description
  2  from dba_registry_sqlpatch;

  PATCH_ID ACTION          STATUS
---------- --------------- ---------------
DESCRIPTION
--------------------------------------------------------------------------------
  21555660 APPLY           SUCCESS
Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)

  21359755 APPLY           SUCCESS
Database Patch Set Update : 12.1.0.2.5 (21359755)
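Before plugging in, the mismatch can be spotted by comparing the sqlpatch listings of both CDBs. A hedged sketch – the flat files are assumptions standing in for spooled output of the queries above:

```shell
# Stand-in patch lists (in reality: spool dba_registry_sqlpatch from each CDB).
cat > /tmp/cdb1_patches.txt <<'EOF'
22291127 Database Patch Set Update : 12.1.0.2.160419
22674709 Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
EOF
cat > /tmp/cdb2_patches.txt <<'EOF'
21359755 Database Patch Set Update : 12.1.0.2.5
21555660 Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)
EOF

# Any difference means datapatch work after the plug-in.
if diff -q /tmp/cdb1_patches.txt /tmp/cdb2_patches.txt >/dev/null; then
  echo "patch levels match"
else
  echo "patch levels differ: expect plug-in violations"
fi
```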

Plugging in

The steps needed to plug the PDB into its new home have been discussed so many times that I have decided against showing them again. Please refer to the official documentation set (Database Administrator’s Guide) for all the detail. The PDB I would like to plug in to CDB1 is named PLUGINTEST.

SYS@CDB$ROOT-CDB1> create pluggable database plugintest 
  2  using '/home/oracle/plugintest.xml' nocopy;

Pluggable database created.

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest open;

Warning: PDB altered with errors.

With the information I provided previously you can probably guess what’s wrong, but if you are a production DBA who has been tasked to “plug in a PDB” you might not have the background knowledge about patch levels of all the databases in the estate. How can you find out what went wrong? First stop is the alert.log:

alter pluggable database plugintest open
Thu Jul 07 11:02:30 2016
Pluggable database PLUGINTEST dictionary check beginning
Pluggable Database PLUGINTEST Dictionary check complete
Database Characterset for PLUGINTEST is WE8MSWIN1252
***************************************************************
WARNING: Pluggable Database PLUGINTEST with pdb id - 5 is
         altered with errors or warnings. Please look into
         PDB_PLUG_IN_VIOLATIONS view for more details.
***************************************************************
Opening pdb PLUGINTEST (5) with no Resource Manager plan active
Pluggable database PLUGINTEST opened read write
Completed: alter pluggable database plugintest open

This sounds a bit tragic, but not all is lost. The primary issue is that the PDB is opened in restricted mode, which is not terribly useful for ordinary users:

SYS@CDB$ROOT-CDB1> select con_id, name, open_mode, restricted
  2* from v$pdbs where name = 'PLUGINTEST'

    CON_ID NAME                           OPEN_MODE  RES
---------- ------------------------------ ---------- ---
         5 PLUGINTEST                     READ WRITE YES

SYS@CDB$ROOT-CDB1> conn user1/user1@localhost/plugintest
ERROR:
ORA-01035: ORACLE only available to users with RESTRICTED SESSION privilege


Warning: You are no longer connected to ORACLE.

The detail is – again – in PDB_PLUG_IN_VIOLATIONS:

SYS@CDB$ROOT-CDB1> select cause, type, message
  2  from PDB_PLUG_IN_VIOLATIONS
  3  where name = 'PLUGINTEST';

CAUSE                                                            TYPE
---------------------------------------------------------------- ---------
MESSAGE
--------------------------------------------------------------------------------
SQL Patch                                                        ERROR
PSU bundle patch 160419 (Database Patch Set Update : 12.1.0.2.160419 (22291127))
: Installed in the CDB but not in the PDB.

SQL Patch                                                        ERROR
PSU bundle patch 5 (Database Patch Set Update : 12.1.0.2.5 (21359755)): Installe
d in the PDB but not in the CDB.

SQL Patch                                                        ERROR
SQL patch ID/UID 21555660/19361790 (Database PSU 12.1.0.2.5, Oracle JavaVM Compo
nent (Oct2015)): Installed in the PDB but not in the CDB.

SQL Patch                                                        ERROR
SQL patch ID/UID 22674709/20057886 (Database PSU 12.1.0.2.160419, Oracle JavaVM
Component (Apr2016)): Installed in the CDB but not in the PDB.

Although the formatting is a bit of a pain to the eye, it should become clear that different patch levels are preventing the PDB from opening read-write. If the installed options differed between the systems, a similar error would be thrown. But how does Oracle know about all that? It’s encoded in the XML file that you create when you unplug the PDB. In my case, here are the options used:

    <options>
      <option>APS=12.1.0.2.0</option>
      <option>CATALOG=12.1.0.2.0</option>
      <option>CATJAVA=12.1.0.2.0</option>
      <option>CATPROC=12.1.0.2.0</option>
      <option>CONTEXT=12.1.0.2.0</option>
      <option>DV=12.1.0.2.0</option>
      <option>JAVAVM=12.1.0.2.0</option>
      <option>OLS=12.1.0.2.0</option>
      <option>ORDIM=12.1.0.2.0</option>
      <option>OWM=12.1.0.2.0</option>
      <option>SDO=12.1.0.2.0</option>
      <option>XDB=12.1.0.2.0</option>
      <option>XML=12.1.0.2.0</option>
      <option>XOQ=12.1.0.2.0</option>
    </options>

Likewise, patches are recorded in the XML file, too:

    <sqlpatches>
      <sqlpatch>PSU bundle patch 5 (Database Patch Set Update : 12.1.0.2.5 (21359755)): 
                 APPLY SUCCESS</sqlpatch>
      <sqlpatch>SQL patch ID/UID 21555660/19361790 (Database PSU 12.1.0.2.5, Oracle JavaVM 
                 Component (Oct2015)): APPLY SUCCESS</sqlpatch>
    </sqlpatches>
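To see what a plug-in check will work with, the recorded patches can be pulled straight out of the unplug XML. A hypothetical sketch using a stand-in file (the real one in this example is /home/oracle/plugintest.xml):

```shell
# Stand-in for the unplug XML; the real entries may be line-wrapped
# differently, so treat this as an illustration only.
cat > /tmp/plugintest_sample.xml <<'EOF'
<sqlpatches>
  <sqlpatch>PSU bundle patch 5 (Database Patch Set Update : 12.1.0.2.5 (21359755)): APPLY SUCCESS</sqlpatch>
  <sqlpatch>SQL patch ID/UID 21555660/19361790 (Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)): APPLY SUCCESS</sqlpatch>
</sqlpatches>
EOF

# Print the text between the <sqlpatch> tags, one patch per line.
sed -n 's/.*<sqlpatch>\(.*\)<\/sqlpatch>.*/\1/p' /tmp/plugintest_sample.xml
```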

Corrective Action

The suggested action is conveniently presented as well.

SYS@CDB$ROOT-CDB1> select cause, type, action from pdb_plug_in_violations where name = 'PLUGINTEST';

CAUSE      TYPE      ACTION
---------- --------- ------------------------------------------------------------
SQL Patch  ERROR     Call datapatch to install in the PDB or the CDB
SQL Patch  ERROR     Call datapatch to install in the PDB or the CDB
SQL Patch  ERROR     Call datapatch to install in the PDB or the CDB
SQL Patch  ERROR     Call datapatch to install in the PDB or the CDB

But you don’t blindly follow what you read, do you? ;) So I headed over to the official documentation set and My Oracle Support. I found MOS note 1585822.1 most useful; it explains a situation similar to the one I am seeing (a PDB was closed while datapatch ran). So I decided to try running datapatch again.

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest close immediate;

Pluggable database altered.

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest open upgrade;

Pluggable database altered.

SYS@CDB$ROOT-CDB1>  exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

[oracle@server2 ~]$ cd $ORACLE_HOME/OPatch
[oracle@server2 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Thu Jul  7 11:20:00 2016
Copyright (c) 2015, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_15280_2016_07_07_11_20_00/sqlpatch_invocation.log

Connecting to database...OK
Note:  Datapatch will only apply or rollback SQL fixes for PDBs
       that are in an open state, no patches will be applied to closed PDBs.
       Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
       (Doc ID 1585822.1)
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 21555660 (Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)):
  Installed in PLUGINTEST only
Patch 22674709 (Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)):
  Installed in binary and CDB$ROOT PDB$SEED SWINGBENCH1
Bundle series PSU:
  ID 160419 in the binary registry and ID 160419 in PDB CDB$ROOT, ID 160419 in PDB PDB$SEED, 
  ID 160419 in PDB SWINGBENCH1, ID 5 in PDB PLUGINTEST

Adding patches to installation queue and performing prereq checks...
Installation queue:
  For the following PDBs: CDB$ROOT PDB$SEED SWINGBENCH1
    Nothing to roll back
    Nothing to apply
  For the following PDBs: PLUGINTEST
    The following patches will be rolled back:
      21555660 (Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015))
    The following patches will be applied:
      22674709 (Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016))
      22291127 (Database Patch Set Update : 12.1.0.2.160419 (22291127))

Installing patches...
Patch installation complete.  Total patches installed: 3

Validating logfiles...
Patch 21555660 rollback (pdb PLUGINTEST): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/21555660/19361790/21555660_rollback_CDB1_PLUGINTEST_2016Jul07_11_20_35.log (no errors)
Patch 22674709 apply (pdb PLUGINTEST): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/22674709/20057886/22674709_apply_CDB1_PLUGINTEST_2016Jul07_11_21_03.log (no errors)
Patch 22291127 apply (pdb PLUGINTEST): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/22291127/19694308/22291127_apply_CDB1_PLUGINTEST_2016Jul07_11_21_04.log (no errors)
SQL Patching tool complete on Thu Jul  7 11:21:14 2016

This fixed my problem:

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest close immediate;

Pluggable database altered.

SYS@CDB$ROOT-CDB1> alter pluggable database plugintest open;

Pluggable database altered.

SYS@CDB$ROOT-CDB1> select con_id, name, open_mode, restricted
  2 from v$pdbs where name = 'PLUGINTEST';

    CON_ID NAME                           OPEN_MODE  RES
---------- ------------------------------ ---------- ---
         5 PLUGINTEST                     READ WRITE NO

Interestingly the entries in PDB_PLUG_IN_VIOLATIONS do not disappear, but there is a timestamp in the view that should help you find out whether a message is current or not.

Posted in 12c Release 1, Linux | Leave a Comment »

Tales from the field: potential reasons for PDB plug-in violations part 1

Posted by Martin Bach on June 29, 2016

Container Databases have been an area that I have researched intensively over the past years. With this post (and hopefully some others that follow) I would like to demonstrate some of the new situations the DBA might be confronted with. Please don’t use this post to give the new 12c architecture a hard time: standardised deployments (which I love) help you a lot. Not only do your DBA scripts work reliably everywhere, but the error condition I am showing in this post should be a lot less likely.

At the end of the post I’ll show an alternative approach using a standardised way of creating PDBs.

Environment

Setting the scene, my environment is as follows:

  • Oracle Linux 7.2 with UEK4 (4.1.12-37.5.1.el7uek.x86_64 GNU/Linux)
  • Single Instance databases CDB1 and CDB3
  • Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
  • Database Patch Set Update : 12.1.0.2.160419 (22291127)
  • Oracle Managed Files (OMF) but no use of ASM

These aren’t in VMs for a change.

Scenario

The following steps led me to the discovery of the plug-in violation. In CDB1 I created a common user; let’s call it c##martin for the sake of argument.

SQL> select sys_context('userenv','con_name') as con_name, name, cdb
  2  from v$database
  3  /

CON_NAME                       NAME      CDB
------------------------------ --------- ---
CDB$ROOT                       CDB1      YES

SQL> create user c##martin identified by xxxxxxxxxx 
  2  default tablespace tools  temporary tablespace temp
  3  quota unlimited on tools account unlock
  4  container = ALL;

User created.

So far nothing too exciting. Grant privileges as needed.

The Problem

Some time later I tried to create a PDB from the seed, and it failed:

SQL> create pluggable database pdb10 
  2  admin user pdbadmin identified by xxxxxxxxxx
  3  /

Pluggable database created.

SQL> alter pluggable database pdb10 open;

Warning: PDB altered with errors.

SQL> 

Altered with errors? The command I just typed is probably the simplest and most basic way to create a PDB – what could possibly go wrong with it? First stop is the alert.log, and this is what has been recorded:

2016-06-29 09:48:38.261000 +01:00
create pluggable database pdb10
admin user pdbadmin identified by *
 APEX_040200.WWV_FLOW_ADVISOR_CHECKS (CHECK_STATEMENT) - CLOB populated
2016-06-29 09:48:44.549000 +01:00
****************************************************************
Pluggable Database PDB10 with pdb id - 3 is created as UNUSABLE.
If any errors are encountered before the pdb is marked as NEW,
then the pdb must be dropped
****************************************************************
Database Characterset for PDB10 is WE8MSWIN1252
Deleting old file#5 from file$
Deleting old file#7 from file$
Adding new file#19 to file$(old file#5)
Adding new file#20 to file$(old file#7)
Successfully created internal service pdb10 at open
2016-06-29 09:48:45.629000 +01:00
ALTER SYSTEM: Flushing buffer cache inst=0 container=3 local
****************************************************************
Post plug operations are now complete.
Pluggable database PDB10 with pdb id - 3 is now marked as NEW.
****************************************************************
Completed: create pluggable database pdb10
admin user pdbadmin identified by *
2016-06-29 09:53:48.816000 +01:00
alter pluggable database pdb10 open
Pluggable database PDB10 dictionary check beginning
Pluggable Database PDB10 Dictionary check complete
Database Characterset for PDB10 is WE8MSWIN1252
2016-06-29 09:53:51.317000 +01:00
Opening pdb PDB10 (3) with no Resource Manager plan active
Pluggable database PDB10 opened read write
Completed: alter pluggable database pdb10 open

OK, so there’s nothing in there. But wait: am I not trying to create a new PDB and plug it into the CDB? And aren’t plug-in problems recorded in a view? So let’s give that a try:

SQL> select cause, message from pdb_plug_in_violations where name = 'PDB10';

CAUSE
----------------------------------------------------------------
MESSAGE
--------------------------------------------------------------------------------
Sync Failure
Sync PDB failed with ORA-959 during 'create user c##martin identified by *defaul
t tablespace tools  temporary tablespace temp
quota unlimited on tools account unlock
container = ALL'

Got you! A sync failure: the common user I created earlier (container = ALL) can’t be created in the PDB because of a missing tablespace. Is the PDB in an unrecoverable state? No, but it is opened in restricted mode. Connections to it won’t be possible:

SQL> select name,open_mode,restricted from v$pdbs 
  2  where name = 'PDB10';

NAME                           OPEN_MODE  RES
------------------------------ ---------- ---
PDB10                          READ WRITE YES

SQL> conn pdbadmin/xxxxxxxxxx@localhost/PDB10
ERROR:
ORA-01035: ORACLE only available to users with RESTRICTED SESSION privilege


Warning: You are no longer connected to ORACLE.

NB: the same can happen when you create a common user in the CDB while a PDB that doesn’t meet the prerequisites for the user to be created is closed. For example, if all PDBs are open and you try to create c##martin while one of them lacks the tools tablespace, the create user command fails. It will not fail if the PDB without the tools tablespace is closed at the time the user is created; instead, opening that PDB will throw the same error as shown above.

The Fix

Dead easy: a common user cannot be created due to a missing tablespace. So let’s create it and see if we can open the PDB:

SQL> conn / as sysdba
Connected.
SQL> alter session set container = PDB10;

Session altered.

SQL> create tablespace tools datafile size 10m;

Tablespace created.

SQL> conn / as sysdba
Connected.
SQL> alter pluggable database pdb10 close immediate;

Pluggable database altered.

SQL> alter pluggable database pdb10 open;

Pluggable database altered.

SQL> select name,open_mode,restricted from v$pdbs 
  2   where name = 'PDB10';

NAME                           OPEN_MODE  RES
------------------------------ ---------- ---
PDB10                          READ WRITE NO

SQL> 

Fixed! But I would still argue that a different approach – a “gold image PDB” – might be better suited for production work than a clone from the seed.

A potentially better Approach

I personally think that creating a PDB from the seed is not something that should be done in production, but that is just a personal opinion…

The problem you just read about could be avoided with a Golden Image CDB/PDB. I wrote about a way to create such a golden database image in my last post. Following that procedure I created CDB3 using dbca, feeding it my template. It contains the goldenImage PDB, which has been created as follows:

SQL> select sys_context('userenv','con_name') as con_name, name, cdb
  2  from v$database;

CON_NAME                       NAME      CDB
------------------------------ --------- ---
CDB$ROOT                       CDB3      YES

SQL> create pluggable database goldenImage 
  2  admin user pdbadmin identified by xxxxxxxx
  3  default tablespace users datafile size 10m
  4  storage (maxsize 20G);

Pluggable database created.

SQL> alter pluggable database goldenImage open;

Pluggable database altered.

SQL> alter session set container = goldenImage;

Session altered.

SQL> create tablespace tools datafile size 10m;

Tablespace created.

CDB3 also features a user C##MARTIN which has been created exactly as in CDB1:

SQL> select u.username, u.account_status, nvl(p.name, 'CDB$ROOT') con_name
  2  from cdb_users u, v$pdbs p
  3  where u.username = 'C##MARTIN'
  4 and u.con_id = p.con_id(+);

USERNAME             ACCOUNT_STATUS       CON_NAME
-------------------- -------------------- ------------------------------
C##MARTIN            OPEN                 CDB$ROOT
C##MARTIN            OPEN                 GOLDENIMAGE

With everything in place I can go about cloning goldenImage and avoid the problem I wrote about in the first part of this post altogether.

SQL> create pluggable database pdb10 from goldenImage;

Pluggable database created.

SQL> alter pluggable database pdb10 open;

Pluggable database altered.

SQL> select cause,message from pdb_plug_in_violations where name = 'PDB10';

no rows selected

SQL> conn c##martin/xxxxxxxxxx@localhost/pdb10
Connected.
SQL> show con_name

CON_NAME
------------------------------
PDB10

Standards win again.

Posted in 12c Release 1, Linux | Leave a Comment »

Automatic Deployment of CDBs with non-default database options in 12.1.0.2

Posted by Martin Bach on June 20, 2016

I have written about the importance of automation in previous posts, and this one follows the same line. This time I wanted to test and document my experience with “golden database templates”. Why? Because most DBAs will appreciate that there are more interesting things to do than run “create database” scripts all day, so why not automate the process? For quite some time DBCA, the Database Configuration Assistant, has offered the option to create templates. One approach you could use would be to

  • Create a “golden master” database
  • Use DBCA to create a template based on this
  • And enjoy the fact that every database is exactly the same

All of this is done in the hope of making life a lot easier for administrators.

Motivation

In many environments a database is not an island, it’s rather part of a wider ecosystem where monitoring solutions, auditing, and what-have-you need to integrate with each database from the moment it is deployed. A template – at least to my mind – is better suited to this task than executing a number of scripts post database creation. SQL scripts can always run into errors, and their execution might require review of the output by a human (which is what I’d like to avoid). Oh, and it seems to take forever, even on reasonably modern hardware, to create a database from scripts.

Furthermore, it is possible to create a CDB with fewer options in 12.1.0.2, as explained by Mike Dietrich and on MOS. I wanted to create a reference for myself on how to deploy a CDB exactly the same way, each time, without all the options in it. Please have a look at these references for more background information.

All of them are very good references, and they contain the necessary “ifs” and “buts”. Please make sure you understand them all, especially those in MOS DOC ID 1616554.1, section 1. If the referenced documents resonate with you, and the pros/cons are understood, read on :)

Further to the introductory question on why to do this, there is more to mention: I don’t want any extra components in my database, for two main reasons:

  1. A component that is not installed can’t be used as an attack vector
  2. Components that aren’t installed in the RDBMS can’t be used accidentally

And there are many more … For the purpose of this post I am going to create a database with all the extra components I ever need: none.

In case you found this via an Internet search engine, here are the facts about my environment.

[oracle@server1 OPatch]$ ./opatch lspatches
22674709;Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
22291127;Database Patch Set Update : 12.1.0.2.160419 (22291127)

OPatch succeeded.
[oracle@server1 OPatch]$ cat /etc/oracle-release
Oracle Linux Server release 6.8
[oracle@server1 OPatch]$ uname -r -o
4.1.12-37.4.1.el6uek.x86_64 GNU/Linux

This is neither a RAC nor an Oracle Restart database, just a single RDBMS home with the databases on a traditional file system. It is a VM running on VirtualBox 5.0.20 in my lab.

Create the Golden Image database

Following MOS note 2001512.1 and understanding the comments in each section, I consciously made the decision to remove everything that’s not needed for my purposes (these aren’t necessarily yours!). I decided to create the scripts for a CDB named CDBTEMPL. After exiting DBCA I can find these scripts in /u01/app/oracle/admin/CDBTEMPL/scripts/

[oracle@server1 scripts]$ ls -l
total 72
-rwxr-xr-x. 1 oracle oinstall  834 Jun 15 08:31 CDBTEMPL.sh
-rwxr-xr-x. 1 oracle oinstall 1260 Jun 15 08:31 CDBTEMPL.sql
-rw-r-----. 1 oracle oinstall  550 Jun 15 08:31 CreateClustDBViews.sql
-rw-r-----. 1 oracle oinstall 1552 Jun 15 08:31 CreateDB.sql
-rw-r-----. 1 oracle oinstall 1668 Jun 15 08:31 CreateDBCatalog.sql
-rw-r-----. 1 oracle oinstall  397 Jun 15 08:31 CreateDBFiles.sql
-rw-r-----. 1 oracle oinstall 1171 Jun 15 08:31 JServer.sql
-rw-r-----. 1 oracle oinstall  386 Jun 15 08:31 apex.sql
-rw-r-----. 1 oracle oinstall  998 Jun 15 08:31 context.sql
-rw-r-----. 1 oracle oinstall  353 Jun 15 08:31 cwmlite.sql
-rw-r-----. 1 oracle oinstall  365 Jun 15 08:31 datavault.sql
-rw-r-----. 1 oracle oinstall 2011 Jun 15 08:31 init.ora
-rw-r-----. 1 oracle oinstall  342 Jun 15 08:31 interMedia.sql
-rw-r-----. 1 oracle oinstall  344 Jun 15 08:31 labelSecurity.sql
-rw-r-----. 1 oracle oinstall 1155 Jun 15 08:31 lockAccount.sql
-rw-r-----. 1 oracle oinstall  360 Jun 15 08:31 ordinst.sql
-rw-r-----. 1 oracle oinstall  857 Jun 15 08:31 postDBCreation.sql
-rw-r-----. 1 oracle oinstall  335 Jun 15 08:31 spatial.sql

I need to update the $ORACLE_SID.sql file, which originally looks like this:

set verify off
ACCEPT sysPassword CHAR PROMPT 'Enter new password for SYS: ' HIDE
ACCEPT systemPassword CHAR PROMPT 'Enter new password for SYSTEM: ' HIDE
ACCEPT dvOwnerPassword CHAR PROMPT 'Enter  user password: ' HIDE
ACCEPT dvAccountManagerPassword CHAR PROMPT 'Enter  user password: ' HIDE
host /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/orapwd file=/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwCDBTEMPL force=y format=12
@/u01/app/oracle/admin/CDBTEMPL/scripts/CreateDB.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/CreateDBFiles.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/CreateDBCatalog.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/JServer.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/context.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/ordinst.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/interMedia.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/cwmlite.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/spatial.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/labelSecurity.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/apex.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/datavault.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/CreateClustDBViews.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/lockAccount.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/postDBCreation.sql

Using the very useful dependency matrix in MOS note 2001512.1 I can safely remove all components I don’t need for my particular case (again yours might be different). The resulting file looks like this for me:

set verify off
ACCEPT sysPassword CHAR PROMPT 'Enter new password for SYS: ' HIDE
ACCEPT systemPassword CHAR PROMPT 'Enter new password for SYSTEM: ' HIDE
ACCEPT dvOwnerPassword CHAR PROMPT 'Enter  user password: ' HIDE
ACCEPT dvAccountManagerPassword CHAR PROMPT 'Enter  user password: ' HIDE
host /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/orapwd file=/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwCDBTEMPL force=y format=12
@/u01/app/oracle/admin/CDBTEMPL/scripts/CreateDB.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/CreateDBFiles.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/CreateDBCatalog.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/CreateClustDBViews.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/lockAccount.sql
@/u01/app/oracle/admin/CDBTEMPL/scripts/postDBCreation.sql

Another VERY important aspect is the use of Oracle’s perl. You need to make sure that your PATH uses Oracle’s perl, and not the one installed with the host. In other words, “which perl” typed at the command prompt should return /u01/app/oracle/product/12.1.0.2/dbhome_1/perl/bin/perl, as shown in this example:

[oracle@server1 ~]$ export PATH=$ORACLE_HOME/perl/bin:$PATH
[oracle@server1 ~]$ which perl
/u01/app/oracle/product/12.1.0.2/dbhome_1/perl/bin/perl

If you are on a recent chip and want to try these steps in VirtualBox like me, you might run into issues with perl core-dumping on you. Have a look at http://dbaontap.com/2016/01/13/vbox5/ before you start to see if you are susceptible to the same problem.
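The PATH check can also be scripted as a small guard; a minimal sketch, assuming ORACLE_HOME is set in the environment:

```shell
# Sanity check before running the creation scripts: is the perl found first
# on the PATH really the one bundled with the Oracle home?
export PATH="$ORACLE_HOME/perl/bin:$PATH"
if [ "$(command -v perl)" = "$ORACLE_HOME/perl/bin/perl" ]; then
  echo "using Oracle's perl: $(command -v perl)"
else
  echo "WARNING: wrong perl on PATH: $(command -v perl)" >&2
fi
```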

When you are ready, start the database creation in a screen session.

[oracle@server1 dbs]$ cd /u01/app/oracle/admin/CDBTEMPL/scripts/
[oracle@server1 scripts]$ export PATH=$ORACLE_HOME/perl/bin:$PATH
[oracle@server1 scripts]$ which perl
/u01/app/oracle/product/12.1.0.2/dbhome_1/perl/bin/perl
[oracle@server1 scripts]$ bash CDBTEMPL.sh

Don’t be startled by having to enter a password for USER twice: this is a slight irregularity in the script. Looking at the source, it appears to be prompting you for Data Vault and Oracle Label Security information:

ACCEPT sysPassword CHAR PROMPT 'Enter new password for SYS: ' HIDE
ACCEPT systemPassword CHAR PROMPT 'Enter new password for SYSTEM: ' HIDE
ACCEPT dvOwnerPassword CHAR PROMPT 'Enter  user password: ' HIDE
ACCEPT dvAccountManagerPassword CHAR PROMPT 'Enter  user password: ' HIDE

The last two don’t quite make sense, and going by the extra space character a username should have been substituted in …

Once the script has been started, go and get some coffee. It will take a while to finish. After it has completed, it’s time to see what we have. The alias “sq” is just a shortcut for “sqlplus / as sysdba”.
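The “sq” alias is not something Oracle provides; if you want the same shortcut, a single line in the oracle user’s ~/.bashrc is enough:

```shell
# Shorthand used throughout this post: connect as SYSDBA via OS authentication
alias sq='sqlplus / as sysdba'
```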

[oracle@server1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jun 15 10:20:10 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO

SQL> select comp_name, version, status from dba_server_registry;

COMP_NAME                                VERSION                        STATUS
---------------------------------------- ------------------------------ -----------
Oracle Workspace Manager                 12.1.0.2.0                     VALID
Oracle XML Database                      12.1.0.2.0                     VALID
Oracle Database Catalog Views            12.1.0.2.0                     VALID
Oracle Database Packages and Types       12.1.0.2.0                     VALID
Oracle Real Application Clusters         12.1.0.2.0                     OPTION OFF

SQL> select cdb, name from v$database;

CDB NAME
--- ---------
YES CDBTEMPL

SQL> select * From v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM     CON_ID
------------------------------ ----- ----- ----- ----- ----- ----- ----------
SYS                            TRUE  TRUE  FALSE FALSE FALSE FALSE          0
SYSDG                          FALSE FALSE FALSE FALSE TRUE  FALSE          1
SYSBACKUP                      FALSE FALSE FALSE TRUE  FALSE FALSE          1
SYSKM                          FALSE FALSE FALSE FALSE FALSE TRUE           1

So this looks like it worked in my favour: the database is created, and only the bare minimum of options is present. Actually it seems you could also remove Workspace Manager and the RAC views, but I am happy to keep those. Refer to MOS for more details about these two.

The database is in fact a CDB, and its name is CDBTEMPL. All the users I might want are in the password file as well.

Customising the Golden Image

First I want a new PDB to use in lieu of PDB$SEED as the basis for new PDBs. Let’s create it:

SQL> !mkdir /u01/oradata/CDBTEMPL/MASTER

SQL> create pluggable database MASTER admin user PDBMASTER identified by PDBMASTERPWD
  2  default tablespace users
  3  datafile '/u01/oradata/CDBTEMPL/MASTER/users01.dbf'
  4  size 250m autoextend on next 250m maxsize 1G
  5  file_name_convert = ('/u01/oradata/CDBTEMPL/pdbseed','/u01/oradata/CDBTEMPL/MASTER')
  6  storage (maxsize 20G);

Pluggable database created.

While I’m still connected to the CDB$ROOT I create a few common user accounts that I need:

SQL> create user c##backupadmin identified by backupadminpwd;

User created.

SQL> grant sysbackup to c##backupadmin;

Grant succeeded.

SQL> create user c##monitor identified by monitorpassword;

User created.

-- grants as needed for the monitoring account

Switching to the MASTER PDB in the next step, I create components that I might need, for example STATSPACK or SQLT, or anything else your standards require you to put in there. For example:

SQL> alter session set container = MASTER;
Session altered.
SQL> show con_name

CON_NAME
------------------------------
MASTER

SQL> create tablespace statpack_tbs datafile
  2  '/u01/oradata/CDBTEMPL/MASTER/statpack_tbs01.dbf'
  3  size 250m autoextend on next 250m maxsize 5g;

Tablespace created.

SQL> start ?/rdbms/admin/spcreate

Choose the Default tablespace for the PERFSTAT user
---------------------------------------------------
Below is the list of online tablespaces in this database which can
store user data.  Specifying the SYSTEM tablespace for the user's
default tablespace will result in the installation FAILING, as
using SYSTEM for performance data is not supported.

Choose the PERFSTAT users's default tablespace.  This is the tablespace
in which the STATSPACK tables and indexes will be created.

TABLESPACE_NAME                CONTENTS  STATSPACK DEFAULT TABLESPACE
------------------------------ --------- ----------------------------
STATPACK_TBS                   PERMANENT
SYSAUX                         PERMANENT *
USERS                          PERMANENT

Pressing <return> will result in STATSPACK's recommended default
tablespace (identified by *) being used.

Enter value for default_tablespace: STATPACK_TBS

Using tablespace STATPACK_TBS as PERFSTAT default tablespace.

Choose the Temporary tablespace for the PERFSTAT user
-----------------------------------------------------
Below is the list of online tablespaces in this database which can
store temporary data (e.g. for sort workareas).  Specifying the SYSTEM
tablespace for the user's temporary tablespace will result in the
installation FAILING, as using SYSTEM for workareas is not supported.

Choose the PERFSTAT user's Temporary tablespace.

TABLESPACE_NAME                CONTENTS  DB DEFAULT TEMP TABLESPACE
------------------------------ --------- --------------------------
TEMP                           TEMPORARY *

Pressing <return> will result in the database's default Temporary
tablespace (identified by *) being used.

Enter value for temporary_tablespace: TEMP

Using tablespace TEMP as PERFSTAT temporary tablespace.

... Creating PERFSTAT user

Create the Template

With the Golden Image complete, it is time to create a template from it. You might think that all you need to do is execute dbca and create a clone template – but wait! I did this too and ran into an issue. Fast forward a bit (don’t worry, I’ll detail the steps later): when I created a database off the template, dbca errored out with this message:

[oracle@server1 ~]$ dbca -silent -createDatabase -templateName CDB_MASTER_20160615.dbc \
> -gdbName CDB1 -sysPassword sysPassword -systemPassword systemPassword \
> -datafileDestination /u01/oradata -recoveryAreaDestination /u01/fra -totalMemory 4096
Copying database files
1% complete
3% complete
37% complete
Creating and starting Oracle instance
40% complete
45% complete
50% complete
55% complete
DBCA Operation failed.
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/CDB1/CDB1.log" for further details.

[oracle@server1 ~]$ cat /u01/app/oracle/cfgtoollogs/dbca/CDB1/CDB1.log

Unique database identifier check passed.

/u01/ has enough space. Required space is 8660 MB , available space is 14332 MB.
File Validations Successful.
Copying database files
DBCA_PROGRESS : 1%
DBCA_PROGRESS : 3%
DBCA_PROGRESS : 37%
Creating and starting Oracle instance
DBCA_PROGRESS : 40%
DBCA_PROGRESS : 45%
DBCA_PROGRESS : 50%
DBCA_PROGRESS : 55%
Error while executing "/u01/app/oracle/product/12.1.0.2/dbhome_1/rdbms/admin/dbmssml.sql".
Refer to "/u01/app/oracle/cfgtoollogs/dbca/CDB1/dbmssml0.log" for more details.
Error in Process: /u01/app/oracle/product/12.1.0.2/dbhome_1/perl/bin/perl
DBCA_PROGRESS : DBCA Operation failed.

Hmmm, not too good. Funnily enough there was no content in the log file mentioned in the error message, but there was a similarly named file nearby:

[oracle@server1 ~]$ ls -l /u01/app/oracle/cfgtoollogs/dbca/CDB1/dbmssml*
-rw-r-----. 1 oracle oinstall   0 Jun 15 11:51 /u01/app/oracle/cfgtoollogs/dbca/CDB1/dbmssml0.log
-rw-r-----. 1 oracle oinstall 521 Jun 15 11:51 /u01/app/oracle/cfgtoollogs/dbca/CDB1/dbmssml_catcon_11711.lst
[oracle@server1 ~]$

Looking at dbmssml_catcon_11711.lst I saw this message:

[oracle@server1 ~]$ cat /u01/app/oracle/cfgtoollogs/dbca/CDB1/dbmssml_catcon_11711.lst
catcon: See /u01/app/oracle/cfgtoollogs/dbca/CDB1/dbmssml*.log files for output generated by scripts
catcon: See /u01/app/oracle/cfgtoollogs/dbca/CDB1/dbmssml_*.lst files for spool files, if any
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

        catconInit: start logging catcon output at 2016-06-15 11:51:11

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

validate_con_names: MASTER is not open
catconInit: Unexpected error returned by validate_con_names

So it seems I have to make sure that the clone database opens the MASTER PDB as part of the process. By default, a PDB is in mount mode when the database instance (re)starts, but I can force it to open. That’s not hard: since 12.1.0.2 I can save a PDB’s state like this:

SQL> alter pluggable database master open;

Pluggable database altered.

SQL> alter pluggable database master save state;

Pluggable database altered.

SQL> SELECT con_name, instance_name, state FROM dba_pdb_saved_states;

CON_NAME                       INSTANCE_NAME        STATE
------------------------------ -------------------- --------------
MASTER                         CDBTEMPL             OPEN

That’s it: a quick test reveals that the MASTER PDB opens when I bounce the instance:

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  2932632 bytes
Variable Size             377487464 bytes
Database Buffers          687865856 bytes
Redo Buffers                5455872 bytes
Database mounted.
Database opened.
SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 MASTER                         READ WRITE NO
SQL>

Now I can create the clone!

[oracle@server1 ~]$ dbca -createCloneTemplate -help
Create a clone template from an existing database by specifying the following parameters:
        -createCloneTemplate
                -sourceSID    <Database unique name for RAC Database and SID for Single
                               Instance Database>
                -templateName      <new template name>
                [-sysDBAUserName     <user name  with SYSDBA privileges>
                 -sysDBAPassword     <password for sysDBAUserName user name>]
                [-maxBackupSetSizeInMB     <Maximum backup set size (MB)>]
                [-rmanParallelism  <Parallelism value>]
                [-datafileJarLocation       <directory to place the datafiles in a compressed
                                             format>]

[oracle@server1 ~]$ dbca -silent -createCloneTemplate -sourceSID CDBTEMPL -templateName \
> CDB_MASTER_20160615 -sysDBAPassword sysDBAPassword \
> -rmanParallelism 2 -datafileJarLocation /u01/templates

Gathering information from the source database
4% complete
8% complete
13% complete
17% complete
22% complete
Backup datafiles
28% complete
88% complete
Creating template file
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/silent.log_2016-06-15_11-31-28-AM"
for further details.

[oracle@server1 ~]$ cat /u01/app/oracle/cfgtoollogs/dbca/silent.log_2016-06-15_11-31-28-AM
Gathering information from the source database
DBCA_PROGRESS : 4%
DBCA_PROGRESS : 8%
DBCA_PROGRESS : 13%
DBCA_PROGRESS : 17%
DBCA_PROGRESS : 22%
Backup datafiles
DBCA_PROGRESS : 28%
DBCA_PROGRESS : 88%
Creating template file
DBCA_PROGRESS : 100%
The generation of the clone database template "CDB_MASTER_20160615" is successful.

That was a success!

Create a CDB from Template

This step is almost exactly the same as for the non-custom template. All you need to do is to invoke dbca in silent mode and point it to the new template. Here is an example:

[oracle@server1 ~]$ dbca -silent -createDatabase -templateName CDB_MASTER_20160615.dbc \
> -gdbName CDB1 -sysPassword sysPassword -systemPassword systemPassword \
> -datafileDestination /u01/oradata -recoveryAreaDestination /u01/fra -totalMemory 4096
Copying database files
1% complete
3% complete
37% complete
Creating and starting Oracle instance
40% complete
45% complete
50% complete
55% complete
56% complete
60% complete
62% complete
Completing Database Creation
66% complete
70% complete
73% complete
85% complete
96% complete
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/CDB1/CDB10.log" for further details.

[oracle@server1 ~]$ cat /u01/app/oracle/cfgtoollogs/dbca/CDB1/CDB10.log

Unique database identifier check passed.

/u01/ has enough space. Required space is 10270 MB , available space is 13312 MB.
File Validations Successful.
Copying database files
DBCA_PROGRESS : 1%
DBCA_PROGRESS : 3%
DBCA_PROGRESS : 37%
Creating and starting Oracle instance
DBCA_PROGRESS : 40%
DBCA_PROGRESS : 45%
DBCA_PROGRESS : 50%
DBCA_PROGRESS : 55%
DBCA_PROGRESS : 56%
DBCA_PROGRESS : 60%
DBCA_PROGRESS : 62%
Completing Database Creation
DBCA_PROGRESS : 66%
DBCA_PROGRESS : 70%
DBCA_PROGRESS : 73%
DBCA_PROGRESS : 85%
DBCA_PROGRESS : 96%
DBCA_PROGRESS : 100%
Database creation complete. For details check the logfiles at:
 /u01/app/oracle/cfgtoollogs/dbca/CDB1.
Database Information:
Global Database Name:CDB1
System Identifier(SID):CDB1

Unlike my first attempt, this worked. But there are a couple of post-creation steps to be taken now. First, I would like to use MASTER as the golden master copy for each PDB, which requires the PDB to be opened read-only. Before that, I would like to check that everything worked:

[oracle@server1 ~]$ echo $ORACLE_SID $ORACLE_HOME
CDB1 /u01/app/oracle/product/12.1.0.2/dbhome_1
[oracle@server1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jun 15 13:07:51 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 MASTER                         READ WRITE NO
SQL> alter session set container = MASTER;

Session altered.

SQL> select username, account_status
  2    from dba_users
  3   where oracle_maintained <> 'Y';

USERNAME                       ACCOUNT_STATUS
------------------------------ --------------------------------
SQLTXPLAIN                     OPEN
SQLTXADMIN                     EXPIRED & LOCKED
PERFSTAT                       OPEN
C##BACKUPADMIN                 EXPIRED & LOCKED
C##MONITOR                     EXPIRED & LOCKED
PDBMASTER                      OPEN

6 rows selected.

SQL> select tablespace_name from dba_tablespaces;

TABLESPACE_NAME
------------------------------
SYSTEM
SYSAUX
TEMP
USERS
STATPACK_TBS
SQLT_TBS

This seems to have worked. Notice how the common users are expired and locked. Now to enable cloning, I simply repeat the save state command but this time I want the PDB to be open read-only.

SQL> alter pluggable database master close immediate;

Pluggable database altered.

SQL> alter pluggable database master open read only;

Pluggable database altered.

SQL> alter pluggable database master save state;

Pluggable database altered.

-- create a new PDB

SQL> create pluggable database userpdb from master...;

Pluggable database created.

SQL> alter pluggable database userpdb open;

Pluggable database altered.

SQL> conn perfstat/perfstatPWD@localhost/userpdb
Connected.
SQL> show con_name

CON_NAME
------------------------------
USERPDB
SQL>

That seems to have worked as well. Of course there are more variations on the theme, and you certainly need to perform intensive testing before you can roll out a strategy like this, but I think you get the idea. The process cannot stop here, of course: there are many more important tasks (backup, monitoring, auditing, logging, …) to be configured, but those are left out for now.

Posted in 12c Release 1

Using OSWatcher for system diagnostics

Posted by Martin Bach on June 14, 2016

OSWatcher is a superb tool that gathers information about your system in the background and stores it in an (optionally compressed) archive directory. As an Oracle DBA I like the analogy with Statspack: you make the tool available on the host in a location with – very important – enough available disk space, and then start it. Most users add it to the startup mechanism their O/S uses – SysV init, upstart, or systemd, for example, on Linux – to allow it to start in the background. OSWatcher will then gather a lot of the interesting O/S related statistics that you so desperately need in an “after the fact” situation. There are plenty of situations where you might want that information.

Example use cases

Say for example the system experienced an outage – a RAC node rebooted. The big question is “why?”. There might be information in the usual Grid Infrastructure logs, but it might be inconclusive, and a look at the O/S can be necessary. The granularity sar offers is often not enough: you are bound to lose detail in the 10 minute average. What if you had 30 second granularity to see what happened more or less right before the crash?

Or maybe you want to see the system’s activity during testing for a new software release? And then compare to yesterday’s test cycle? The use cases are endless, and I’m sure you can think of one, too.

ExaWatcher as the role model

In Expert Oracle Exadata (both editions) the authors covered ExaWatcher quite extensively. ExaWatcher is (not really surprisingly) OSWatcher for Exadata, or at least sort of. And it’s there by default on cells and compute nodes. I have always taken the stance that if something is implemented in Exadata then someone must have spent a few brain cells working out why it’s a good idea to deploy that tool to every Exadata in the world. That’s a big responsibility to have and you’d hope that there was some serious thinking behind decisions to deploy such a tool.

OSWatcher deployment

I deploy OSWatcher by default on every VM I create in my lab. It’s part of the automated build process I use. That ensures that I’ll have a log of recent O/S information whenever I need it. Again the analogy to Statspack: just like OSWatcher, Statspack isn’t installed by default and has to be created manually. In so many cases, however, it was deployed only after the problem occurred, and it could be that last night’s problems don’t re-appear in the following execution cycle. It might be better to have OSWatcher deployed and recording information proactively.

Deploying OSWatcher is dead simple-get it from MOS and copy the tarball to the host to monitor. In this post I am using Oracle Linux 7.1, at first without the compatibility tools for the network stack. I customarily use the minimal installation that doesn’t come with net-tools. What are the net-tools again? They are the tools we have come to love over the years:

[oracle@rac12node2 ~]$ yum info net-tools
Available Packages
Name        : net-tools
Arch        : x86_64
Version     : 2.0
Release     : 0.17.20131004git.el7
Size        : 303 k
Repo        : local
Summary     : Basic networking tools
URL         : http://sourceforge.net/projects/net-tools/
Licence     : GPLv2+
Description : The net-tools package contains basic networking tools,
            : including ifconfig, netstat, route, and others.
            : Most of them are obsolete. For replacement check iproute package.

But they are obsolete. You could argue that it might be just the time to get used to iproute as it’s the future, but for OSWatcher on OL 7 net-tools are needed, and I installed them. If you don’t, OSWatcher won’t abort, but it can’t collect the information netstat and ifconfig provide.
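If you are unsure whether your minimal installation has them, a quick check like this (a generic sketch, nothing OSWatcher-specific) tells you before you start the collector:

```shell
# Report whether the legacy net-tools binaries OSWatcher calls are available
for tool in netstat ifconfig; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: missing - consider installing the net-tools package"
    fi
done
```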

In case you are using RAC you need to make OSWatcher aware of the private interconnect-this is detailed in the OSWatcher user’s guide.

Starting OSWatcher

Once the tarball is unzipped in a location of your choice you need to start OSWatcher. The script startOSWbb.sh is responsible for starting the tool, and it can take a number of arguments that are explained in the shell script itself and in the user guide available from MOS. The first parameter sets the snapshot interval in seconds. The second indicates the archive duration in hours (i.e. how long the information should be retained), and the optional third names the compression tool to use for compressing the raw data. Parameters 2 and 3 have direct implications for space usage. The optional fourth parameter sets a custom location for the archive directory that will contain the collected data.

I want to have 30 second snapshot intervals and retain data for 2 days, or 48 hours. Be aware that using a tool such as compress or gzip as the third argument will save space by compressing older files, but when you are trying to use the analyser (more on that later) you’ll have to decompress the files first. Clever use of the find(1) command can help you uncompress only the raw data files you need.
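For example, to decompress only the iostat files from the last 24 hours before running the analyser, something along these lines works (the archive path is a placeholder – adjust it to your installation):

```shell
# Selectively decompress recent iostat raw files; older data stays compressed
ARCHIVE=${ARCHIVE:-/some/mount/point/oswbb/archive}
if [ -d "$ARCHIVE/oswiostat" ]; then
    find "$ARCHIVE/oswiostat" -name '*.dat.gz' -mtime -1 -exec gunzip {} +
else
    echo "archive directory $ARCHIVE/oswiostat not found" >&2
fi
```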

WARNING Depending on the system you monitor and the amount of activity you might end up generating quite a lot of data. And by a lot I mean it! The location you choose for storing the archive directory must be independent of anything important. In other words, should the location fill up despite all the efforts you put in to prevent that from happening, it must not have an impact on availability. It is imperative to keep a keen eye on space usage. You certainly don’t want to fill up important mount points with performance information.
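A small guard run from cron can help keep that keen eye on space usage for you; the path and the limit below are made up for illustration:

```shell
# Warn when the OSWatcher archive directory exceeds a size budget (in MB)
ARCHIVE=${ARCHIVE:-/some/mount/point/oswbb/archive}
LIMIT_MB=2048
if [ -d "$ARCHIVE" ]; then
    used=$(du -sm "$ARCHIVE" | awk '{print $1}')
    if [ "$used" -gt "$LIMIT_MB" ]; then
        echo "WARNING: OSWatcher archive uses ${used} MB (limit ${LIMIT_MB} MB)"
    fi
else
    echo "archive directory $ARCHIVE not found" >&2
fi
```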


Starting OSWatcher from the command line

With that said it’s time to start the tool:

[oracle@rac12node1 oswbb]$ ./startOSWbb.sh 30 48 gzip
[oracle@rac12node1 oswbb]$ Info...Zip option IS specified.
Info...OSW will use gzip to compress files.
Setting the archive log directory to /some/mount/point/oswbb/archive

Testing for discovery of OS Utilities...
VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
IFCONFIG found on your system.
NETSTAT found on your system.
TOP found on your system.
Warning... /proc/slabinfo not found on your system.

Testing for discovery of OS CPU COUNT
oswbb is looking for the CPU COUNT on your system
CPU COUNT will be used by oswbba to automatically look for cpu problems

CPU COUNT found on your system.
CPU COUNT = 2

Discovery completed.

Starting OSWatcher v7.3.3  on Wed Dec 16 10:02:49 GMT 2015
With SnapshotInterval = 30
With ArchiveInterval = 48

OSWatcher - Written by Carl Davis, Center of Expertise,
Oracle Corporation
For questions on install/usage please go to MOS (Note:301137.1)

...

Data is stored in directory: /some/mount/point/oswbb/archive

Starting Data Collection...

oswbb heartbeat:Wed Dec 16 10:02:54 GMT 2015

Launching OSWatcher from the command line is probably the exception; most users will start OSW during the boot process as part of the multi-user runlevel.
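On systemd-based distributions such as Oracle Linux 7 that could be done with a unit file. The sketch below is my own example, not an Oracle-supplied unit, and the installation directory and start parameters need adjusting to your setup:

```shell
# Write a minimal (hypothetical) systemd unit for OSWatcher; install it as
# /etc/systemd/system/oswatcher.service, then run "systemctl daemon-reload"
# and "systemctl enable oswatcher".
cat > oswatcher.service <<'EOF'
[Unit]
Description=OSWatcher data collection
After=network.target

[Service]
Type=forking
User=oracle
WorkingDirectory=/opt/oswbb
ExecStart=/opt/oswbb/startOSWbb.sh 30 48 gzip
ExecStop=/opt/oswbb/stopOSWbb.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
```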

Note that if you haven’t installed net-tools on Oracle Linux 7 you will see errors when starting OSWatcher, as it can’t find netstat and ifconfig. If memory serves me right you get net-tools by default with previous releases of Oracle Linux, so this is OL 7 specific.

You will also notice an error that /proc/slabinfo does not exist. It does exist, but the permissions have changed to 0400 and the file is owned by root:root. Not having slab information is not a problem for me; I always struggle to make sense of it anyway.

What it does

The OSWatcher daemon now happily monitors my system and places information into the archive directory:

[oracle@rac12node1 oswbb]$ ls -l archive/
total 40
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswifconfig
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswiostat
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswmeminfo
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswmpstat
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswnetstat
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswprvtnet
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswps
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswslabinfo
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswtop
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswvmstat

There is a directory per tool used – ifconfig, iostat, meminfo, …, vmstat. Inside each directory you find the output. The file format is plain text (all but the current file are compressed if you specified gzip or compress as the third parameter to the start script), and each snapshot is indicated by a line starting with zzz. Here is an example for iostat taken while swingbench was running.

...

zzz ***Wed Dec 16 10:32:30 GMT 2015
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.23    0.00    7.45   76.06    0.00    4.26

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    2.00     0.00     4.00     4.00     0.24  128.00    0.00  128.00 121.00  24.20
vdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdd               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.11   36.67   33.00   44.00  36.67  11.00
vde               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.04   14.67    0.00   44.00  14.67   4.40
vdf               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.10   31.67   25.50   44.00  31.67   9.50
vdg               0.00     0.00   43.00   23.00   400.00    97.00    15.06     3.83   57.05   62.37   47.09  15.15 100.00
vdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.24    0.00    0.00    0.00   0.00  24.20
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdi               0.00     0.00   39.00   23.00   320.00    78.50    12.85     1.11   17.40    2.85   42.09  12.39  76.80

zzz ***Wed Dec 16 10:33:00 GMT 2015
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.58    0.00    7.94   81.48    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00    17.00    0.00    1.00     0.00    72.00   144.00     0.01    1.00    0.00    1.00  11.00   1.10
vdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdc               0.00     0.00   12.00    2.00   192.00    16.00    29.71     0.13    9.57    8.25   17.50   9.57  13.40
vdd               0.00     0.00    2.00    2.00     1.00     0.50     0.75     0.03   14.50    8.50   20.50   7.25   2.90
vde               0.00     0.00    2.00    2.00     1.00     0.50     0.75     0.02   13.25    2.00   24.50   6.00   2.40
vdf               0.00     0.00    2.00    2.00     1.00     0.50     0.75     0.11   34.50   23.00   46.00  27.25  10.90
vdg               0.00     0.00   32.00   31.00   272.00   123.00    12.54     4.96   80.46   90.50   70.10  15.68  98.80
vdh               0.00     0.00    0.00    1.00     0.00     0.00     0.00     0.01   37.00    0.00   37.00   8.00   0.80
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00   19.00     0.00    76.00     8.00     0.03    0.95    0.00    0.95   0.63   1.20
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdi               0.00     0.00   22.00   22.00   184.00    78.50    11.93     1.28   30.45    2.00   58.91  17.23  75.80

zzz ***Wed Dec 16 10:33:30 GMT 2015
...

As you can see there are snapshots every 30 seconds, just as requested. Let’s not focus on the horrendous I/O times-this is a virtualised RAC system on a host that struggles with a CPU bottleneck, and it does not have underlying SSD for the VMs…

You can navigate the archive directory and browse the files you are interested in. They all have timestamps in their names making it easy to identify each file’s contents.
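Because every snapshot begins with a zzz timestamp line, pulling a single interval out of a raw file is a one-liner with awk. Here is a quick sketch, using an inline sample standing in for a real archive file such as archive/oswiostat/rac12node1_iostat_15.12.16.1000.dat:

```shell
# Print only the snapshot matching the given timestamp; the here-document
# replaces a real OSWatcher raw data file for demonstration purposes.
awk -v ts="10:33:00" '/^zzz/ { p = index($0, ts) > 0 } p' <<'EOF'
zzz ***Wed Dec 16 10:32:30 GMT 2015
first snapshot data
zzz ***Wed Dec 16 10:33:00 GMT 2015
second snapshot data
EOF
```

This prints the 10:33:00 zzz header and the lines belonging to that snapshot, and nothing else.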

The OSWatcher Analyser

Looking at text files is one way of digesting information. There is another option, the OSWatcher analyser. Here is an example of its use (it requires the DISPLAY variable to be set):

[oracle@rac12node1 oswbb]$ java -jar oswbba.jar -i archive -b "Dec 16 10:25:00 2015" \
> -e "Dec 16 10:35:00 2015" -P swingbench -s

Validating times in the archive...

Scanning file headers for version and platform info...


Parsing file rac12node1_iostat_15.12.16.1000.dat ...


Parsing file rac12node1_vmstat_15.12.16.1000.dat ...


Parsing file rac12node1_netstat_15.12.16.1000.dat ...


Parsing file rac12node1_top_15.12.16.1000.dat ...


Parsing file rac12node1_ps_15.12.16.1000.dat ...


A new analysis file analysis/rac12node1_1450262374129.txt has been created.
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Run_Queue.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Block_Queue.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_Idle.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_System.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_User.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_Wa.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_Interrupts.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Context_Switches.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Memory_Swap.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Memory_Free.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Memory_Page_In_Rate.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_ST.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_RPS.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_WPS.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_PB.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_PBTP_1.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_PBTP_2.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_PBTP_3.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_TPS.gif
[oracle@rac12node1 oswbb]$ 

In this example I asked the analyser tool to limit the search to specific time ranges and to create a “profile”. Refer to the MOS note about the OSWatcher Analyser for more information about the command line options.

The analyser creates an HTML “profile” which is stored in the profile directory, as you can see in the last part of the output. If you transfer it to a system with a web browser you can enjoy the graphical representation of the raw data. Very neat if you want to check at the O/S level whether anything unusual happened.

Note also that a new analysis file was created. Have a look at it, as it can provide very valuable information. In my example the following content was recorded:

############################################################################
# Contents Of This Report:
#
# Section 1: System Status
# Section 2: System Slowdowns
#   Section 2.1: System Slowdown RCA Process Level Ordered By Impact
# Section 3: System General Findings
# Section 4: CPU Detailed Findings
#   Section 4.1: CPU Run Queue:
#   Section 4.2: CPU Utilization: Percent Busy
#   Section 4.3: CPU Utilization: Percent Sys
# Section 5: Memory Detailed Findings
#   Section 5.1: Memory: Process Swap Queue
#   Section 5.2: Memory: Scan Rate
#   Section 5.3  Memory: Page In:
#   Section 5.4  Memory: Page Tables (Linux only):
#   Section 5.5: Top 5 Memory Consuming Processes Beginning
#   Section 5.6: Top 5 Memory Consuming Processes Ending
# Section 6: Disk Detailed Findings
#   Section 6.1: Disk Percent Utilization Findings
#   Section 6.2: Disk Service Times Findings
#   Section 6.3: Disk Wait Queue Times Findings
#   Section 6.4: Disk Throughput Findings
#   Section 6.5: Disk Reads Per Second
#   Section 6.6: Disk Writes Per Second
#   Section 6.7: Disk Percent CPU waiting on I/O
# Section 7: Network Detailed Findings
#   Section 7.1  Network Data Link Findings
#   Section 7.2: Network IP Findings
#   Section 7.3: Network UDP Findings
#   Section 7.4: Network TCP Findings
# Section 8: Process Detailed Findings
#   Section 8.1: PS Process Summary Ordered By Time
#   Section 8.2: PS for Processes With Status = D or T Ordered By Time
#   Section 8.3: PS for (Processes with CPU > 0) When System Idle CPU < 30% Ordered By Time
#   Section 8.4: Top VSZ Processes Increasing Memory Per Snapshot
#   Section 8.5: Top RSS Processes Increasing Memory Per Snapshot
#
############################################################################
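If you only want one top-level section of the analysis file, for example to paste into a Service Request, a few lines of awk do the trick. This is my own convenience sketch, not part of OSWatcher, and it relies on the “# Section N:” header format shown above:

```shell
# Print one top-level section of an oswbba analysis file.
# Assumes consecutive numeric section headers of the form "# Section N:";
# subsection headers ("#   Section N.M:") are indented and pass through.
osw_section() {
  awk -v sec="$2" '
    $0 ~ "^# Section " sec ":"             { on = 1 }
    on && $0 ~ "^# Section " (sec + 1) ":" { exit }
    on
  ' "$1"
}
```

With that in place, something like `osw_section analysis/rac12node1_1450262374129.txt 4` would print the CPU findings only.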

Summary

OSWatcher is a great tool that can be used on Oracle Linux 7 and other supported platforms. It provides a wealth of information in both textual and graphical format, and records lots of useful detail. It is invaluable whenever you need to look retrospectively at the state of the O/S over a given period.

As always, read the documentation found on MOS and make sure you understand the implications of using the tool. Also make sure you test it thoroughly first. Please ensure that you have sufficient disk space for your archive directory on a mount point that cannot affect the availability of your system.
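To keep an eye on that, a quick check of the filesystem holding the archive directory is easily scripted. ARCHIVE_DIR below is a placeholder you would point at your actual oswbb archive location:

```shell
# Report how full the filesystem holding the OSWatcher archive directory is.
# ARCHIVE_DIR is a placeholder; set it to your actual oswbb archive path.
ARCHIVE_DIR="${ARCHIVE_DIR:-/tmp}"
pct_used=$(df -P "$ARCHIVE_DIR" | awk 'NR == 2 { sub("%", "", $5); print $5 }')
echo "filesystem holding ${ARCHIVE_DIR} is ${pct_used}% used"
```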

Posted in Oracle | Tagged: | 3 Comments »

Little things worth knowing: auto-DOP or hint-which one takes precedence?

Posted by Martin Bach on June 2, 2016

This is just another short post about one of the little things worth knowing. Assume, for example, that you are on 12.1.0.2 and you want to know whether parallel_degree_limit reliably keeps the degree of parallelism at bay, for all sessions and queries across the board. After all, every machine has finite CPU (and other) resources that shouldn’t be exhausted. Setting parallel_degree_limit of course only has an effect if you set parallel_degree_policy to limited, auto or adaptive. The latter is the Big Knob to enable Automatic Degree of Parallelism (DOP), and this post assumes that Auto DOP is enabled for the session. In this configuration, what happens to hinted statements?

Documentation

The documentation (12c Reference Guide, section on parallel_degree_limit) states that you can provide an integer in addition to “CPU” and “IO”:

A numeric value for this parameter specifies the maximum degree of parallelism the optimizer can choose for a SQL statement when automatic degree of parallelism is active. Automatic degree of parallelism is only enabled if PARALLEL_DEGREE_POLICY is set to ADAPTIVE, AUTO, or LIMITED.

Without further context this may sound a bit ambiguous: does enabling Auto DOP and setting parallel_degree_limit to 8 ensure that no query can exceed that DOP? And does the setting actually stop users from going above that value?

Testing

I am using a 12.1.0.2 RAC database for the test. I would like to limit the maximum DOP of a given statement to 8. Here are my settings for my lab system (emphasis on lab):

SQL> show parameter parallel

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
parallel_adaptive_multi_user         boolean     FALSE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_level                integer     100
parallel_degree_limit                string      8
parallel_degree_policy               string      AUTO
parallel_execution_message_size      integer     16384
parallel_force_local                 boolean     TRUE
parallel_instance_group              string
parallel_io_cap_enabled              boolean     FALSE
parallel_max_servers                 integer     128
parallel_min_percent                 integer     0
parallel_min_servers                 integer     16
parallel_min_time_threshold          string      AUTO
parallel_server                      boolean     TRUE
parallel_server_instances            integer     2
parallel_servers_target              integer     128
parallel_threads_per_cpu             integer     2
recovery_parallelism                 integer     0

Now I need a table to query – after looking around in my demo-schema I found T1 which looks like a good candidate.

SQL> select table_name,num_rows,partitioned,compression,degree from tabs where table_name = 'T1';

TABLE_NAME                       NUM_ROWS PAR COMPRESS DEGREE
------------------------------ ---------- --- -------- ----------
T1                               32000000 NO  DISABLED          1

Unlike when setting parallel_degree_policy to limited I don’t need to worry about decorating the table with a default DOP. With parallel_degree_policy set to auto a parallel plan can be chosen even with the DOP on the table set to 1, provided the optimiser reckons that statement execution exceeds parallel_min_time_threshold. In my case I left its value at its default, which is 10 seconds. When querying the table I do not have to specify a hint to enable PX, or define PX on the session level as you can see in this example:

SQL> select count(*) from T1;

  COUNT(*)
----------
  32000000

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
SQL_ID  byb4cbw4vy1cw, child number 1
-------------------------------------
select count(*) from T1

Plan hash value: 2755517917

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |       |   200K(100)|          |        |      |            |
|   1 |  SORT AGGREGATE                |                |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR               |                |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)         | :TQ10000       |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE             |                |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR         |                |    32M|   200K  (1)| 00:00:04 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS STORAGE FULL| T1             |    32M|   200K  (1)| 00:00:04 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - storage(:Z>=:Z AND :Z<=:Z)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 8 because of degree limit

The important bit here is the information in the Note section: Oracle computed the DOP to be 8 because of the degree limit. Exactly what I wanted.

Now what if I use the hint?

No changes to the setup, this is the next statement I typed in:

SQL> select /*+ parallel(32) */ count(*) from T1;


  COUNT(*)
----------
  32000000

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  9d9aha2bdd5zc, child number 0
-------------------------------------
select /*+ parallel(32) */ count(*) from T1

Plan hash value: 2755517917

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |       | 50217 (100)|          |        |      |            |
|   1 |  SORT AGGREGATE                |                |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR               |                |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)         | :TQ10000       |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE             |                |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR         |                |    32M| 50217   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS STORAGE FULL| T1             |    32M| 50217   (1)| 00:00:01 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - storage(:Z>=:Z AND :Z<=:Z)

Note
-----
   - Degree of Parallelism is 32 because of hint

So based on this experiment it appears as if the hint takes precedence over parallel_degree_limit in a session that has Auto DOP enabled. That doesn’t mean the DBA is out of luck: other tools, such as Database Resource Manager (DBRM), can limit the DOP. I have written about using DBRM to limit the DOP for all users of a consumer group some time ago. If you read that post make sure you look at Randolf’s comments.
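As a sketch of the DBRM approach: a plan directive’s parallel_degree_limit_p1 attribute caps the DOP for a consumer group. The plan and group names below are invented for illustration only; consult the Database Resource Manager documentation and test carefully before using anything like this:

```sql
-- Hypothetical sketch: cap the DOP at 8 for one consumer group via DBRM.
-- Plan and group names are made up for this example.
begin
  dbms_resource_manager.create_pending_area;
  dbms_resource_manager.create_plan(
    plan    => 'LIMIT_DOP_PLAN',
    comment => 'cap parallelism for reporting users');
  dbms_resource_manager.create_consumer_group(
    consumer_group => 'REPORTING_GROUP',
    comment        => 'users whose DOP should be capped');
  dbms_resource_manager.create_plan_directive(
    plan                     => 'LIMIT_DOP_PLAN',
    group_or_subplan         => 'REPORTING_GROUP',
    comment                  => 'max DOP 8',
    parallel_degree_limit_p1 => 8);
  dbms_resource_manager.create_plan_directive(
    plan             => 'LIMIT_DOP_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'mandatory catch-all directive');
  dbms_resource_manager.validate_pending_area;
  dbms_resource_manager.submit_pending_area;
end;
/
```

Users still need to be mapped to the consumer group, and the plan activated, before the directive takes effect.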

Posted in 11g Release 1, 12c Release 1, Linux | Tagged: | Leave a Comment »

Experimenting with the ZFSSA’s snapshot capability using the simulator part 2

Posted by Martin Bach on May 31, 2016

In my last post I wrote down some notes about my experience while experimenting with the ZFSSA simulator. A simulator is a great way to get familiar with the user interface and general usability of a product. What I wanted to find out using the ZFSSA simulator was the answer to the question: “what happens to a clone of a database when I roll the master copy forward?”

In the first part of the series I explained how I created a clone of a database, named CLONE1. It is based on a backup of my NCDB database. On top of the backup I have created a snapshot as the basis for my clone. A clone in ZFS(SA) terminology is a writeable snapshot, and CLONE1 uses it. But what would happen to CLONE1 if I modified the source database, NCDB? And can I create a new clone, CLONE2, based on a new backup of the source without modifying the first clone? Let’s try this.

Changing the Source Database

Let’s change the source database a little by creating a new tablespace with a couple of data files. To add a little bit of spice to the scenario I decided to create a new user. Its only schema object will be created on the new tablespace.

[oracle@oraclelinux7 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Thu Mar 3 17:25:57 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

SQL> create tablespace dropme datafile size 5m;

Tablespace created.

SQL> alter tablespace dropme add datafile size 5m;

Tablespace altered.

SQL> grant dba to new_user identified by ...;

Grant succeeded.

SQL> create table new_user.t1 tablespace dropme as 
  2  select * from dba_source where rownum <= 100; 

Table created. 

SQL> 

This should be enough for this quick test.

Backing Up

The next step is to roll my existing image copies forward to reflect the changes. That’s not too hard: essentially you create an incremental backup “for recover of copy” … followed by a recover “copy of database with tag” … When the backup/recover command combination has completed, you also need to back up the archived logs in a second step.

At the risk of repeating myself, please be careful: adding this procedure to an existing backup strategy might have adverse side effects. As always, make sure you understand the implications of this technique and its impact, and test thoroughly!

As with the previous backup I stored the incremental backup “for recover” of my image copies in ‘/zfssa/ncdb_bkp/data/’. That’s a location mounted via NFS from the ZFSSA. This is the same location I previously used for the image copies. There is nothing too exciting to report about the backup.

Just as with the previous post my archived logs went to ‘/zfssa/ncdb_bkp/archive/’ to complete the preparations. Here are the files that were created:

[oracle@oraclelinux7 ~]$ ls -lrt /zfssa/ncdb_bkp/data/
total 3035612
-rw-r-----. 1 oracle asmdba    7192576 Mar  3 17:26 6qqvijpt_1_1
-rw-r-----. 1 oracle asmdba    3702784 Mar  3 17:26 6pqvijps_1_1
-rw-r-----. 1 oracle asmdba    1851392 Mar  3 17:40 6tqvikjv_1_1
-rw-r-----. 1 oracle asmdba     442368 Mar  3 17:40 6sqvikjv_1_1
-rw-r-----. 1 oracle asmdba    5251072 Mar  3 17:40 data_D-NCDB_I-3358649481_TS-DROPME_FNO-7_6uqvikk2
-rw-r-----. 1 oracle asmdba  650125312 Mar  3 17:40 data_D-NCDB_I-3358649481_TS-SYSAUX_FNO-3_6kqvie3k
-rw-r-----. 1 oracle asmdba    5251072 Mar  3 17:40 data_D-NCDB_I-3358649481_TS-DROPME_FNO-2_6rqvijq0
-rw-r-----. 1 oracle asmdba  828383232 Mar  3 17:40 data_D-NCDB_I-3358649481_TS-SYSTEM_FNO-1_6jqvie1q
-rw-r-----. 1 oracle asmdba    5251072 Mar  3 17:40 data_D-NCDB_I-3358649481_TS-USERS_FNO-6_6mqvie57
-rw-r-----. 1 oracle asmdba  293609472 Mar  3 17:40 data_D-NCDB_I-3358649481_TS-UNDOTBS1_FNO-4_6lqvie4d
-rw-r-----. 1 oracle asmdba 1304174592 Mar  3 17:40 data_D-NCDB_I-3358649481_TS-EXAMPLE_FNO-5_6iqvie1p
[oracle@oraclelinux7 ~]$ ls -lrt /zfssa/ncdb_bkp/archive/
total 8204
-r--r-----. 1 oracle asmdba    1024 Mar  3 16:06 1_118_905507850.arc
-r--r-----. 1 oracle asmdba 2869760 Mar  3 16:06 1_117_905507850.arc
-r--r-----. 1 oracle asmdba    1024 Mar  3 17:49 1_120_905507850.arc
-r--r-----. 1 oracle asmdba 5426688 Mar  3 17:49 1_119_905507850.arc
[oracle@oraclelinux7 ~]$ 

The image copies are now current, and you can see the two data files for the new tablespace “DROPME” that didn’t exist before.

Creating the Second clone Database

With the preparations in place it is time to see if I can create a new clone that reflects the new tablespace and user. I would also like to see if the following steps have any implications for my database CLONE1.

On the ZFSSA Simulator

I headed over to the ZFSSA (simulator) and navigated to shares -> projects. After selecting “NCDB_BKP” I chose Snapshots and hit the (+) button to create snap1. You should see snap0 on that view as well if you are following the examples.

The next task is to create a new project. You should already see the projects pane on the left hand side. Click on the (+) sign next to ALL to create a new one. I named the project NCDB_CLONE2 to stay in line with the naming convention I used previously. With the project created, you should set the properties as needed. I moved the mount point to /export/ncdb_clone2/. On a real ZFSSA you’d set others as well, but that is outside the scope of this post. Consult the relevant white papers for more information.

Just as described in the first post, you now need to create clones based on snap1. To do so, switch back to the NCDB_BKP project and select (the list of) shares. You should see alert, archive, data and redo. Create a clone of each by following these steps per share:

  • Hover the mouse over the share name
  • Click on the share’s pencil icon to edit share properties
  • Select “snapshots”
  • Hover the mouse over the snapshot name, snap1
  • In the “Clones” column, click on the [+] sign
  • In the resulting pop-up, make sure to create the clone in NCDB_CLONE2 and give it the same name as the share you are creating the snapshot for

The end result should be 4 shares shown in the new project, all of them based on snapshots of their cousins from NCDB_BKP. That concludes the work on the ZFSSA for now.

NB: you can script this :)
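For example, the snapshot step can be driven from the database server over ssh. Everything below is a hypothetical sketch: the CLI command line is an assumption based on the ZFSSA CLI’s shares/snapshots context, so verify it against your appliance’s documentation before sending anything:

```shell
# Hypothetical sketch: write the ZFSSA CLI command(s) to a file for review.
# The command syntax is an assumption; check the ZFSSA CLI documentation
# for your firmware release before running it.
ZFSSA_HOST="192.168.56.101"   # the appliance used in this post
SNAP_NAME="snap1"

cat > /tmp/zfssa_snap.cli <<EOF
shares select NCDB_BKP snapshots snapshot ${SNAP_NAME}
EOF

# After reviewing the file, send it to the appliance, e.g.:
#   ssh root@${ZFSSA_HOST} < /tmp/zfssa_snap.cli
echo "wrote /tmp/zfssa_snap.cli"
```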

On the database Server

If you haven’t done so yet, create the directories to mount the new shares. Everything I mount from the ZFSSA goes to /zfssa on the database server, and as a result I have my files in /zfssa/ncdb_clone2/{alert,archive,data,redo}. Update /etc/fstab accordingly and mount the shares.
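A short loop can generate candidate fstab entries for the four shares. The NFS mount options below are my illustrative assumption, not a recommendation from this post; verify them against Oracle’s guidance for your platform:

```shell
# Print candidate /etc/fstab lines for the clone shares. The mount options
# are illustrative assumptions; check Oracle's NFS recommendations before use.
ZFSSA="192.168.56.101"
entries=""
for share in alert archive data redo; do
  entries="${entries}${ZFSSA}:/export/ncdb_clone2/${share} /zfssa/ncdb_clone2/${share} nfs rw,bg,hard,rsize=1048576,wsize=1048576,tcp,vers=3,timeo=600 0 0
"
done
printf '%s' "$entries"
```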

The remaining steps are the same as for the creation of CLONE1, and I am not repeating them here. Because it’s very important, here is the warning again: Be careful with the path information in the create controlfile statement and make sure they point to /zfssa/ncdb_clone2/!

Reviewing the Outcome

Once CLONE2 is created, my initial questions can be answered.

  1. Is the new tablespace part of CLONE2?
  2. Can I access the newly created table new_user.t1 in CLONE2?
  3. Did anything change for CLONE1?

Starting with CLONE2, I can see the new tablespace, and the data in new_user.t1 is available as well:

SQL> select name from v$database;

NAME
---------
CLONE2

SQL> select tablespace_name from dba_tablespaces;

TABLESPACE_NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
EXAMPLE
DROPME

7 rows selected.

SQL> select count(*) from new_user.t1;

  COUNT(*)
----------
       100

SQL>

And what about CLONE1?

SQL> select tablespace_name from dba_tablespaces;

TABLESPACE_NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
EXAMPLE

6 rows selected.

SQL> select count(*) from new_user.t1;
select count(*) from new_user.t1
                              *
ERROR at line 1:
ORA-00942: table or view does not exist

Well, that looks OK: CLONE1 sees neither the new tablespace nor the new table, so rolling the master copy forward and creating CLONE2 left it untouched.

Posted in Linux, Oracle | Tagged: | Leave a Comment »

Experimenting with the ZFSSA’s snapshot capability using the simulator

Posted by Martin Bach on May 17, 2016

Recently I have been asked how the Copy-on-Write cloning works on the ZFS Storage Appliance. More specifically, the question was about the “master” copy: did it have to be static or could it be rolled forward? What better way to find out than a test? Unfortunately I don’t have an actual system available to me at home so I had to revert to the simulator, hoping that it represents the real appliance accurately.

Setup

First I downloaded the ZFS Storage Appliance Simulator from the Oracle website and created a nice, new, shiny (albeit virtual) storage system. Furthermore I have an Oracle Linux 7 system with UEK3 that will attach to the ZFSSA using dNFS. The appliance has an IP address of 192.168.56.101 while the Linux system is accessible via 192.168.56.20. This is of course a virtual toy environment; a real-life setup would be quite different, using IPMP and multiple paths, preferably over InfiniBand.

Configuration

Configuring the system is a two step process. The first is to create a storage pool on the ZFSSA that will host database backups, snapshots and clones. The second part is the configuration of the database server to use dNFS. I have written about that in detail in a previous blog post: https://martincarstenbach.wordpress.com/2014/07/09/setting-up-direct-nfs-on-oracle-12c/.

Step 1: Configure the ZFSSA

Again, this is the simulator and I can only wish I had the real thing :) I have created a mirrored pool across all disks (that’s possible at this point because I skipped the pool creation during the initial appliance configuration). Navigating to Configuration -> Storage, I clicked on the + button to create the pool and assigned all disks to it. I used a mirrored configuration, which again is down to my lab setup. Depending on your type of (Exadata) backup you would probably choose something else. There are white papers that explain the best data profile based on the workload.

Next I created a new project, named NCDB_BKP to have a common location to set attributes. I tend to set a different mount point, in this case /export/ncdb_bkp to group all shares about to be created. Set the other attributes (compression, record size, access permissions etc) according to your workload. Following the recommendation in the white paper listed in the reference section I created 4 shares under the NCDB_BKP project:
– data
– redo
– alert
– archive

You probably get where this is heading … With those steps, taken from the white paper listed in the reference section, the setup of the ZFSSA Simulator is done, at least for now. Head over to the database server.

Step 2: configure the database server

On the database server I customarily create a /zfssa/<projectName>/ mount point where I’m intending to mount the project’s shares. In other words I have this:

[oracle@oraclelinux7 ~]$ ls -l /zfssa/ncdb_bkp
total 0
drwxr-xr-x. 2 oracle oinstall 6 Jan 12 16:28 alert
drwxr-xr-x. 2 oracle oinstall 6 Jan 12 16:28 archive
drwxr-xr-x. 2 oracle oinstall 6 Jan 12 16:28 data
drwxr-xr-x. 2 oracle oinstall 6 Jan 12 16:28 redo
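The directory tree above can be created with a short loop. BASE points at a scratch location here so the sketch is safe to run anywhere; on the database server it would be /zfssa:

```shell
# Create the per-share mount point directories for a project.
# BASE is a scratch location for demonstration; use /zfssa on a real server.
BASE="/tmp/zfssa-demo"
for share in alert archive data redo; do
  mkdir -p "${BASE}/ncdb_bkp/${share}"
done
ls "${BASE}/ncdb_bkp"
```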

These will be mounted from the ZFSSA – edit your fstab to mount the shares from the appliance (simulator). When mounted, you’d see something like this:

[root@oraclelinux7 ~]# mount | awk '/^[0-9].*/ {print $1}'
192.168.56.101:/export/ncdb_bkp/alert
192.168.56.101:/export/ncdb_bkp/archive
192.168.56.101:/export/ncdb_bkp/data
192.168.56.101:/export/ncdb_bkp/redo

The next step is to add those to the oranfstab, which I covered in the previous post I referred to. That should be it for now! Time to take a backup of the source database in preparation for the cloning. Be careful: adding image copies to an existing backup strategy might have adverse side effects. Make sure you understand the implications of this technique and its impact and, as always, test thoroughly!
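For reference, a minimal oranfstab entry for this lab’s backup shares might look as follows. The server name is just a label and the options shown are the bare minimum; treat this as a sketch and verify the format against the dNFS documentation for your release:

```
server: zfssa
path: 192.168.56.101
export: /export/ncdb_bkp/data    mount: /zfssa/ncdb_bkp/data
export: /export/ncdb_bkp/redo    mount: /zfssa/ncdb_bkp/redo
export: /export/ncdb_bkp/archive mount: /zfssa/ncdb_bkp/archive
export: /export/ncdb_bkp/alert   mount: /zfssa/ncdb_bkp/alert
```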

Creating a backup

Let’s have a look at the database before taking a backup:

[oracle@oraclelinux7 ~]$ NLS_DATE_FORMAT="dd.mm.yyyy hh24:mi:ss" rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Thu Mar 3 10:12:28 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: NCDB (DBID=3358649481)

RMAN> report schema;

using target database control file instead of recovery catalog
Report of database schema for database with db_unique_name NCDB

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    790      SYSTEM               YES     +DATA/NCDB/DATAFILE/system.279.905507757
3    610      SYSAUX               NO      +DATA/NCDB/DATAFILE/sysaux.273.905507723
4    280      UNDOTBS1             YES     +DATA/NCDB/DATAFILE/undotbs1.259.905507805
5    1243     EXAMPLE              NO      +DATA/NCDB/DATAFILE/example.283.905507865
6    5        USERS                NO      +DATA/NCDB/DATAFILE/users.266.905507803

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    60       TEMP                 32767       +DATA/NCDB/TEMPFILE/temp.282.905507861

RMAN>

Nothing special, just a standard DBCA-created General Purpose database … Time to take the image copy.

RMAN> @/u01/app/oracle/admin/NCDB/scripts/imagecopy.rman

RMAN> run {
2>   allocate channel c1 device type disk format '/zfssa/ncdb_bkp/data/%U';
3>   allocate channel c2 device type disk format '/zfssa/ncdb_bkp/data/%U';
4>   backup incremental level 1 for recover of copy with tag 'zfssa' database ;
5>   recover copy of database with tag 'zfssa';
6> }

allocated channel: c1
channel c1: SID=36 device type=DISK

allocated channel: c2
channel c2: SID=258 device type=DISK

Starting backup at 03.03.2016 14:48:25
no parent backup or copy of datafile 5 found
no parent backup or copy of datafile 1 found
no parent backup or copy of datafile 3 found
no parent backup or copy of datafile 4 found
no parent backup or copy of datafile 6 found
channel c1: starting datafile copy
input datafile file number=00005 name=+DATA/NCDB/DATAFILE/example.283.905507865
channel c2: starting datafile copy
input datafile file number=00001 name=+DATA/NCDB/DATAFILE/system.279.905507757
output file name=/zfssa/ncdb_bkp/data/data_D-NCDB_I-3358649481_TS-SYSTEM_FNO-1_6jqvie1q 
  tag=ZFSSA RECID=36 STAMP=905525356
channel c2: datafile copy complete, elapsed time: 00:00:55
channel c2: starting datafile copy
input datafile file number=00003 name=+DATA/NCDB/DATAFILE/sysaux.273.905507723
output file name=/zfssa/ncdb_bkp/data/data_D-NCDB_I-3358649481_TS-EXAMPLE_FNO-5_6iqvie1p 
  tag=ZFSSA RECID=37 STAMP=905525386
channel c1: datafile copy complete, elapsed time: 00:01:23
channel c1: starting datafile copy
input datafile file number=00004 name=+DATA/NCDB/DATAFILE/undotbs1.259.905507805
output file name=/zfssa/ncdb_bkp/data/data_D-NCDB_I-3358649481_TS-UNDOTBS1_FNO-4_6lqvie4d
  tag=ZFSSA RECID=39 STAMP=905525414
channel c1: datafile copy complete, elapsed time: 00:00:25
channel c1: starting datafile copy
input datafile file number=00006 name=+DATA/NCDB/DATAFILE/users.266.905507803
output file name=/zfssa/ncdb_bkp/data/data_D-NCDB_I-3358649481_TS-SYSAUX_FNO-3_6kqvie3k
  tag=ZFSSA RECID=38 STAMP=905525409
channel c2: datafile copy complete, elapsed time: 00:00:51
output file name=/zfssa/ncdb_bkp/data/data_D-NCDB_I-3358649481_TS-USERS_FNO-6_6mqvie57
  tag=ZFSSA RECID=40 STAMP=905525416
channel c1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 03.03.2016 14:50:18

Starting recover at 03.03.2016 14:50:18
no copy of datafile 1 found to recover
no copy of datafile 3 found to recover
no copy of datafile 4 found to recover
no copy of datafile 5 found to recover
no copy of datafile 6 found to recover
Finished recover at 03.03.2016 14:50:19
released channel: c1
released channel: c2

RMAN> **end-of-file**

The Oracle white paper I just mentioned proposes using the %b flag as part of the RMAN formatSpec in the backup command, which interestingly does not work. The second time I ran the script, it failed like this:

RMAN> @/u01/app/oracle/admin/NCDB/scripts/imagecopy.rman

RMAN> run {
2>   allocate channel c1 device type disk format '/zfssa/ncdb_bkp/data/%b';
3>   allocate channel c2 device type disk format '/zfssa/ncdb_bkp/data/%b';
4>   backup incremental level 1 for recover of copy with tag 'zfssa' database ;
5>   recover copy of database with tag 'zfssa';
6> }

allocated channel: c1
channel c1: SID=36 device type=DISK

allocated channel: c2
channel c2: SID=258 device type=DISK

Starting backup at 03.03.2016 14:42:04
channel c1: starting incremental level 1 datafile backup set
channel c1: specifying datafile(s) in backup set
input datafile file number=00005 name=+DATA/NCDB/DATAFILE/example.283.905507865
input datafile file number=00006 name=+DATA/NCDB/DATAFILE/users.266.905507803
input datafile file number=00004 name=+DATA/NCDB/DATAFILE/undotbs1.259.905507805
channel c1: starting piece 1 at 03.03.2016 14:42:04
RMAN-03009: failure of backup command on c1 channel at 03/03/2016 14:42:04
ORA-19715: invalid format b for generated name
ORA-27302: failure occurred at: slgpn
continuing other job steps, job failed will not be re-run
channel c2: starting incremental level 1 datafile backup set
channel c2: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/NCDB/DATAFILE/system.279.905507757
input datafile file number=00003 name=+DATA/NCDB/DATAFILE/sysaux.273.905507723
channel c2: starting piece 1 at 03.03.2016 14:42:04
released channel: c1
released channel: c2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c2 channel at 03/03/2016 14:42:04
ORA-19715: invalid format b for generated name
ORA-27302: failure occurred at: slgpn

RMAN> **end-of-file**

This makes sense: Oracle tries to create an incremental backup with the same name as the data file copy, which would erase the copy and replace it with the incremental backup. Thankfully RMAN does not allow that to happen. This is why I chose the %U flag in the formatSpec, as it allows the incremental backup to be created successfully in addition to the data file image copies. I am conscious of the fact that the image copies have somewhat ugly names.

After this little digression it’s time to back up the archived logs. When using copies of archived logs you need to make sure that you don’t have overlapping backups. The Oracle white paper has the complete syntax; I’ll spare you the detail as it’s rather boring.

After some time the result is a set of image copies of the database plus the archived logs:

[oracle@oraclelinux7 ~]$ ls -lR /zfssa/ncdb_bkp/
/zfssa/ncdb_bkp/:
total 14
drwxr-xr-x. 2 oracle dba 2 Mar  3 10:32 alert
drwxr-xr-x. 2 oracle dba 4 Mar  3  2016 archive
drwxr-xr-x. 2 oracle dba 7 Mar  3  2016 data
drwxr-xr-x. 2 oracle dba 2 Mar  3 10:32 redo

/zfssa/ncdb_bkp/alert:
total 0

/zfssa/ncdb_bkp/archive:
total 2821
-r--r-----. 1 oracle asmdba 2869760 Mar  3  2016 1_117_905507850.arc
-r--r-----. 1 oracle asmdba    1024 Mar  3  2016 1_118_905507850.arc

/zfssa/ncdb_bkp/data:
total 3001657
-rw-r-----. 1 oracle asmdba 1304174592 Mar  3  2016 data_D-NCDB_I-3358649481_TS-EXAMPLE_FNO-5_6iqvie1p
-rw-r-----. 1 oracle asmdba  639639552 Mar  3  2016 data_D-NCDB_I-3358649481_TS-SYSAUX_FNO-3_6kqvie3k
-rw-r-----. 1 oracle asmdba  828383232 Mar  3  2016 data_D-NCDB_I-3358649481_TS-SYSTEM_FNO-1_6jqvie1q
-rw-r-----. 1 oracle asmdba  293609472 Mar  3  2016 data_D-NCDB_I-3358649481_TS-UNDOTBS1_FNO-4_6lqvie4d
-rw-r-----. 1 oracle asmdba    5251072 Mar  3  2016 data_D-NCDB_I-3358649481_TS-USERS_FNO-6_6mqvie57

/zfssa/ncdb_bkp/redo:
total 0

[oracle@oraclelinux7 ~]$

Create a Clone

Now that the database is backed up in form of an image copy and I have archived redo logs I can create a clone of it. This requires you to jump back to the ZFSSA interface (either CLI or BUI) and create snapshots followed by clones on the directories used.

One way is to log in to the BUI, select the NCDB_BKP project, navigate to “snapshots” and create one by clicking on the + button. I named it snap0. If you plan on doing this more regularly it can also be scripted.

A clone is a writable snapshot and is just as easy to create. Add a new project – for example NCDB_CLONE1 – and set it up as required for your workload. In the next step you need to switch back to the backup project and create a clone for each of the 4 shares. To do so, navigate to the list of shares underneath NCDB_BKP and click on the pencil icon. This takes you to the share settings. Click on snapshots and you should see snap0. Hovering the mouse over the snapshot’s name reveals additional icons on the right, one of which allows you to create a clone (“clone snapshot as a new share”). Hit that plus sign, change the project to your clone (NCDB_CLONE1) and assign a name to the clone. I tend to use the same name as the source. The mount point should automatically be updated to /export/ncdb_clone1/sharename.

Now you need to get back to the database server and add the mount points for the recently created clones: /zfssa/ncdb_clone1/{data,redo,arch,alert}. Edit the fstab and oranfstab files, then mount the new shares.
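A hedged sketch of the corresponding entries, using one share as an example; the hostname zfssa and the IP address are hypothetical, and the NFS mount options follow Oracle’s usual NFSv3 recommendations – adjust everything to your environment:

```
# /etc/fstab (one line per share; "data" shown as an example)
zfssa:/export/ncdb_clone1/data  /zfssa/ncdb_clone1/data  nfs  rw,bg,hard,nointr,rsize=1048576,wsize=1048576,tcp,vers=3,timeo=600,actimeo=0  0 0

# $ORACLE_HOME/dbs/oranfstab (Direct NFS entry for the same share)
server: zfssa
path: 192.168.56.101
export: /export/ncdb_clone1/data  mount: /zfssa/ncdb_clone1/data
```
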

Finishing the clone creation

The following procedure is most likely familiar to DBAs who have created databases as file system copies. The steps go roughly like this:
– register the database in oratab
– create an initialisation file
– create a password file
– back up the source controlfile to trace in order to create a “create controlfile” statement for the clone
– start the clone
– create the controlfile
– recover the clone using the newly created backup controlfile
– open the clone with the resetlogs option
– add temp file(s)
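In SQL*Plus the tail end of that list might look as follows. This is only a sketch: the create controlfile statement is abbreviated, and the file names are the ones from this example. With db_create_file_dest set (as in the parameter file below), the temp file can be created as an OMF file:

```
SQL> startup nomount pfile='initCLONE1.ora'
SQL> create controlfile set database "CLONE1" resetlogs archivelog
  2  ...   -- logfile/datafile clauses pointing at /zfssa/ncdb_clone1/...
SQL> recover database using backup controlfile until cancel;
SQL> alter database open resetlogs;
SQL> alter tablespace temp add tempfile size 2g;
```
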

Based on my source database’s parameter file, I created the following for the clone database which I’ll call CLONE1:

*.audit_file_dest='/u01/app/oracle/admin/CLONE1/adump'
*.audit_trail='db'
*.compatible='12.1.0.2.0'
*.control_files='/zfssa/ncdb_clone1/redo/control1.ctl'
*.db_block_size=8192
*.db_create_file_dest='/zfssa/ncdb_clone1/data'
*.db_domain=''
*.db_name='CLONE1'
*.db_recovery_file_dest='/zfssa/ncdb_clone1/archive'
*.db_recovery_file_dest_size=4560m
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=CLONE1XDB)'
*.nls_language='ENGLISH'
*.nls_territory='UNITED KINGDOM'
*.open_cursors=300
*.pga_aggregate_target=512m
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=1024m
*.undo_tablespace='UNDOTBS1'

Notice how some of the file system related parameters changed to point to the mount points exported via NFS from the ZFSSA. I refrained from moving diagnostic_dest to the ZFSSA simulator: doing so prevented the database from starting (perf told me that the sqlplus session spent all its time trying to use the network).

I’ll spare you the details of the create controlfile command; just make sure you change the paths to point to the data files on the cloned shares (/zfssa/ncdb_clone1/data/*) and NOT to the ones where the master copy resides. After the controlfile is created, recover the database using the backup controlfile, then open it. Voila! You have just created CLONE1. Just add a temp file and you are almost good to go.

In the next part

The post has already become a bit too long, so I’ll stop here and split the subject into this first part and a second one. In the next part you will read about changes to the source database (NCDB) and how I roll the image copies forward.

References

http://www.oracle.com/technetwork/articles/systems-hardware-architecture/cloning-solution-353626.pdf

Posted in Linux, Oracle | Tagged: | 1 Comment »

Trouble with multiple SCAN listeners in 12c Release 1

Posted by Martin Bach on May 14, 2016

UPDATE 28-JUL-2016: the problem – as stated by Anil in the comments section – seems to be fixed. I applied the Proactive Bundle Patch 23273686 and OJVM patch 23177536 to the Grid and RDBMS home. After a successful execution of datapatch the errors in the CRSD agent log I reported in this post went away, and I can also see the settings in the database alert.log.

Scroll to the bottom of the post for more detail.

For those not yet on that patch level, here is the original post.

Prompted by comments made by readers about my posts describing how to add a second SCAN in 12c Release 1 and problems regarding the listener_networks parameter, I thought it was time to revisit the situation. I’m still running Oracle Linux 7.1/UEK3 (although that should not matter) but have upgraded to 12.1.0.2.5, the release that is reportedly showing the odd behaviour. I don’t recall my exact version when I wrote the original posts back in April 2014, but looking at them I think it all worked fine at the time. Here is my current patch level after the upgrade to the troublesome PSU.

 SQL> select patch_id,description,status from DBA_REGISTRY_SQLPATCH;

  PATCH_ID DESCRIPTION                                                                      STATUS
---------- -------------------------------------------------------------------------------- ---------------
  21555660 Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)                       SUCCESS
  21359755 Database Patch Set Update : 12.1.0.2.5 (21359755)                                SUCCESS

This is not the latest patch level! After having seen a presentation by Ludovico Caldara (@ludodba) just this week, I would think that the April 2016 patch – current at the time of writing – is the one you want to be on :) I have an upgrade to the April 2016 PSU planned, but first wanted to test against 12.1.0.2.5 to see if I could reproduce the issue.

Problem Statement

The parameter listener_networks should be populated automatically by CRS when the database comes up, and indeed you can see evidence of that in the respective node’s crsd_oraagent_oracle.trc file. However, with 2 SCANs present, something is not quite right. You can see this in the usual locations:

– SCAN listeners
– listener_networks parameter
– CRSD trace

The most obvious clue is that you cannot connect to the database any more using one of the SCANs. To avoid you having to flick back and forth between my posts, here is the network configuration again:

[oracle@ron12cprinode2 ~]$ srvctl config scan -netnum 1
SCAN name: ron12cpri-scan, Network: 1
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6: 
SCAN 0 IPv4 VIP: 192.168.100.65
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 1 IPv4 VIP: 192.168.100.66
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 2 IPv4 VIP: 192.168.100.64
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 

[oracle@ron12cprinode2 ~]$ srvctl config scan -netnum 2
SCAN name: ron12cpri-dg-scan, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2, static
Subnet IPv6: 
SCAN 0 IPv4 VIP: 192.168.102.66
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 1 IPv4 VIP: 192.168.102.64
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 2 IPv4 VIP: 192.168.102.65
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 

It’s essentially using the 192.168.100/24 network for the “public” traffic and 192.168.102/24 for Data Guard. I still use my RAC One Node database RON, which is currently active on node 2. All of my SCAN listeners should know about its services, RON_SRV and RON_DG_SRV. However, that’s not the case, as a quick check reveals:

[oracle@ron12cprinode1 ~]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 14-MAY-2016 19:56:14

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                14-MAY-2016 19:21:39
Uptime                    0 days 0 hr. 34 min. 35 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.65)(PORT=1521)))
Services Summary...
Service "-MGMTDBXDB" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "ron12cpri" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@ron12cprinode1 ~]$ lsnrctl status DGSCANLSNR_SCAN2_NET2

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 14-MAY-2016 19:56:23

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=DGSCANLSNR_SCAN2_NET2)))
STATUS of the LISTENER
------------------------
Alias                     DGSCANLSNR_SCAN2_NET2
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                14-MAY-2016 19:21:42
Uptime                    0 days 0 hr. 34 min. 41 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode1/dgscanlsnr_scan2_net2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=DGSCANLSNR_SCAN2_NET2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.102.64)(PORT=1521)))
The listener supports no services
The command completed successfully
[oracle@ron12cprinode1 ~]$ 

Before generating the above output I specifically “reset” the listener_networks settings on both instances and ensured that they were created dynamically. After the database restart I couldn’t make out ANY entry for listener_networks:

SQL> select inst_id,name,value from gv$parameter where name = 'listener_networks';

   INST_ID NAME                           VALUE
---------- ------------------------------ ------------------------------
         1 listener_networks

SQL> select inst_id,name,value from gv$spparameter where name = 'listener_networks';

   INST_ID NAME                           VALUE
---------- ------------------------------ ------------------------------
         1 listener_networks

Nothing at all. I can still reach the system using the RON_SRV service that’s known to the first (public) network’s SCAN. The ORA-01017 below is expected – the credentials a/b are deliberately bogus; the point is that the listener knows the service and hands the connection to the database:

[oracle@ron12cprinode1 ~]$ sqlplus a/b@ron12cpri-scan/RON_SRV

SQL*Plus: Release 12.1.0.2.0 Production on Sat May 14 20:18:02 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied


Enter user-name: 
[oracle@ron12cprinode1 ~]$ 

The same test failed for connections against ron12cpri-dg-scan, stating that the listener didn’t know about the service. Checking the CRSD trace (on the node the instance runs on!) I could see the reason:

2016-05-14 19:47:49.637611 : USRTHRD:2023044864: {1:58687:2893} Endp=ron12cpri-dg-scan:1521
2016-05-14 19:47:49.637638 : USRTHRD:2023044864: {1:58687:2893} Final Endp=(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521)), remoteEndp= ron12cpri-scan:1521','ron12cpri-scan:1521','ron12cpri-scan:1521
2016-05-14 19:47:49.637662 : USRTHRD:2023044864: {1:58687:2893} sqlStmt = ALTER SYSTEM SET REMOTE_LISTENER=' ron12cpri-scan:1521','ron12cpri-scan:1521','ron12cpri-scan:1521' SCOPE=MEMORY SID='RON_1' /* db agent *//* {1:58687:2893} */
2016-05-14 19:47:49.645739 : USRTHRD:2023044864: {1:58687:2893} sqlStmt = ALTER SYSTEM SET LISTENER_NETWORKS='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521))' SCOPE=MEMORY SID='RON_1' /* db agent *//* {1:58687:2893} */
2016-05-14 19:47:49.655035 : USRTHRD:2023044864: {1:58687:2893} ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00119: invalid specification for system parameter LISTENER_NETWORKS
ORA-00137: invalid LISTENER_NETWORKS specification with NAME=net2
ORA-00138: all addresses specified for attribute REMOTE_LISTENER are invalid
ORA-00132: syntax error or unresolved network name 'ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521'

2016-05-14 19:47:49.655191 : USRTHRD:2023044864: {1:58687:2893} DbAgent::DedicatedThread::run setRemoteListener Exception OCIException
2016-05-14 19:47:49.655207 : USRTHRD:2023044864: {1:58687:2893} ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00119: invalid specification for system parameter LISTENER_NETWORKS
ORA-00137: invalid LISTENER_NETWORKS specification with NAME=net2
ORA-00138: all addresses specified for attribute REMOTE_LISTENER are invalid
ORA-00132: syntax error or unresolved network name 'ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521'

Looking at this output it appears that setting remote_listener worked, although I thought we’d set the host once, not thrice? This looks fishy. It appears to work though, as confirmed in v$parameter and by the fact that I can connect to the system.
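The tripled host:port string looks like a plain string-concatenation bug: the agent apparently joins the three SCAN VIP endpoint entries without a separator. A purely illustrative shell sketch (endpoint values copied from the trace above; this is my reconstruction, not the agent’s actual code) reproduces both the broken and the expected value:

```shell
# Hypothetical reconstruction of the agent behaviour: three endpoint
# strings, one per SCAN VIP of the second network.
eps=("ron12cpri-dg-scan:1521" "ron12cpri-dg-scan:1521" "ron12cpri-dg-scan:1521")

# Joining with no separator reproduces the unresolvable name from ORA-00132.
bad=$(printf '%s' "${eps[@]}")

# A comma-separated join would at least be syntactically sane.
good=$(IFS=,; printf '%s' "${eps[*]}")

echo "$bad"
echo "$good"
```
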

Interestingly, setting listener_networks fails with ORA-02097: parameter cannot be modified because specified value is invalid. This makes sense: the white space between the repeated host:port entries is missing in the alter system command, and even with the spaces in place the command would still fail. Trying it manually confirms that thought:

SQL> ALTER SYSTEM SET LISTENER_NETWORKS='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521))' SCOPE=MEMORY SID='RON_1';
ALTER SYSTEM SET LISTENER_NETWORKS='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521))' SCOPE=MEMORY SID='RON_1'
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00119: invalid specification for system parameter LISTENER_NETWORKS
ORA-00137: invalid LISTENER_NETWORKS specification with NAME=net2
ORA-00138: all addresses specified for attribute REMOTE_LISTENER are invalid
ORA-00132: syntax error or unresolved network name 'ron12cpri-dg-scan:1521
ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521'

OK – that’s all I need to know. When I change the command to what I thought it should have looked like in the first place (the remote listener specifying only one host:port), it works:

SQL> ALTER SYSTEM SET LISTENER_NETWORKS='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521))' SCOPE=MEMORY SID='RON_1';

System altered.

SQL> alter system register;

System altered.

What does that mean? Back to my SCAN listeners again:

[oracle@ron12cprinode1 ~]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 14-MAY-2016 20:07:19

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------   
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                14-MAY-2016 19:21:39
Uptime                    0 days 0 hr. 45 min. 40 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.65)(PORT=1521)))
Services Summary...
Service "-MGMTDBXDB" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "ron12cpri" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
The command completed successfully

No negative change there, but would the DG_SCAN listener also pick it up?

[oracle@ron12cprinode1 ~]$ lsnrctl status DGSCANLSNR_SCAN2_NET2

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 14-MAY-2016 20:07:21

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=DGSCANLSNR_SCAN2_NET2)))
STATUS of the LISTENER
------------------------   
Alias                     DGSCANLSNR_SCAN2_NET2
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                14-MAY-2016 19:21:42
Uptime                    0 days 0 hr. 45 min. 39 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode1/dgscanlsnr_scan2_net2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=DGSCANLSNR_SCAN2_NET2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.102.64)(PORT=1521)))
Services Summary...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
The command completed successfully

Well it seems it did. Now the question is: can I connect?

[oracle@ron12cprinode1 ~]$ sqlplus system/secretPWD@ron12cpri-scan/RON_SRV

SQL*Plus: Release 12.1.0.2.0 Production on Sat May 14 20:08:45 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-28002: the password will expire within 7 days


Last Successful login time: Sat May 14 2016 19:58:28 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
[oracle@ron12cprinode1 ~]$ sqlplus system/secretPWD@ron12cpri-dg-scan/RON_DG_SRV

SQL*Plus: Release 12.1.0.2.0 Production on Sat May 14 20:08:54 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-28002: the password will expire within 7 days


Last Successful login time: Sat May 14 2016 20:08:45 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 

Summary

Well I can! That should solve the problem for the active instance; however, problems are bound to happen when the instance restarts. Since I don’t really have control over the instance name in RAC One Node (RON_1 can be started on node 1 and node 2), I can’t hard-code the value for listener_networks into the spfile. The end result would be locking myself out, just like CRS did. The same issue likely affects multi-node RAC with policy-managed databases.

I have repeated the test with the latest version of the stack (upgraded in place), and got the same result. Here are the version numbers:

[oracle@ron12cprinode2 ~]$ /u01/app/oracle/product/12.1.0.2/dbhome_1/OPatch/opatch lspatches
22674709;Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
22502555;OCW Patch Set Update : 12.1.0.2.160419 (22502555)
22291127;Database Patch Set Update : 12.1.0.2.160419 (22291127)

OPatch succeeded.

[oracle@ron12cprinode2 ~]$ /u01/app/12.1.0.2/grid/OPatch/opatch lspatches
22674709;Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
22502555;OCW Patch Set Update : 12.1.0.2.160419 (22502555)
22291127;Database Patch Set Update : 12.1.0.2.160419 (22291127)

OPatch succeeded.

SQL> select action_time, patch_id,description,status from DBA_REGISTRY_SQLPATCH order by action_time;

ACTION_TIME                                PATCH_ID DESCRIPTION                                                            STATUS
---------------------------------------- ---------- ---------------------------------------------------------------------- ---------------
29-OCT-15 12.46.55.763581                  21555660 Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)             SUCCESS
29-OCT-15 12.46.55.939750                  21359755 Database Patch Set Update : 12.1.0.2.5 (21359755)                      SUCCESS
14-MAY-16 21.32.15.211167                  21555660 Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)             SUCCESS
14-MAY-16 21.32.15.233105                  22674709 Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)        SUCCESS
14-MAY-16 21.32.15.591460                  22291127 Database Patch Set Update : 12.1.0.2.160419 (22291127)                 SUCCESS

The same values for remote_listener and listener_networks as with 12.1.0.2.5 were observed, and the error about setting listener_networks in the CRSD log was identical to the previous release. I guess that needs to be raised with Oracle …

Apparently this is now fixed with the July Proactive Patch

As stated in the introduction I applied Proactive Bundle Patch 23273686 and OJVM patch 23177536 to the Grid and RDBMS home. After a successful execution of datapatch the errors in the CRSD agent log I reported in this post went away, and I can also see the settings in the database alert.log:

Reconfiguration complete (total time 0.0 secs)
Instance started by oraagent
NOTE: ASMB registering with ASM instance as Flex client 0xffffffffffffffff (reg:3848556776) (new connection)
ORACLE_BASE from environment = /u01/app/oracle
NOTE: ASMB connected to ASM instance +ASM1 osid: 23292 (Flex mode; client id 0x10007)
NOTE: initiating MARK startup
Starting background process MARK
MARK started with pid=34, OS id=23296
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.61)(PORT=1521))'
   SCOPE=MEMORY SID='RON_1';
NOTE: MARK has subscribed
2016-07-28 15:05:58.435000 +01:00
ALTER SYSTEM SET remote_listener=' ron12cpri-scan:1521' SCOPE=MEMORY SID='RON_1';
ALTER SYSTEM SET listener_networks='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)
  (HOST=192.168.102.61)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521))' 
  SCOPE=MEMORY SID='RON_1';
ALTER DATABASE MOUNT /* db agent *//* {1:45086:1009} */
2016-07-28 15:05:59.470000 +01:00

I can now see that the settings are reflected in the database as well:

SQL> select inst_id, value from gv$spparameter where name = 'listener_networks';

   INST_ID VALUE
---------- ------------------------------
         1

SQL> select inst_id, value from gv$parameter where name = 'listener_networks';

   INST_ID
----------
VALUE
--------------------------------------------------------------------------------
         1
(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1
02.61)(PORT=1523))))), (( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521))

There is only 1 instance because this is a RAC One database. There is no value for listener_networks in the spfile, but it is set in gv$parameter, indicating it has been set dynamically.

As a result, even the listeners know about my database:

[oracle@ron12cprinode2 ~]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 28-JUL-2016 15:20:59

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                28-JUL-2016 14:35:04
Uptime                    0 days 0 hr. 45 min. 55 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.65)(PORT=1521)))
Services Summary...
Service "-MGMTDBXDB" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "ron12cpri" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
The command completed successfully

What about the SCAN listener on the second network?

[oracle@ron12cprinode2 ~]$ lsnrctl status DGSCANLSNR_SCAN2_NET2

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 28-JUL-2016 15:21:10

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=DGSCANLSNR_SCAN2_NET2)))
STATUS of the LISTENER
------------------------
Alias                     DGSCANLSNR_SCAN2_NET2
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                28-JUL-2016 14:35:03
Uptime                    0 days 0 hr. 46 min. 7 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode2/dgscanlsnr_scan2_net2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=DGSCANLSNR_SCAN2_NET2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.102.64)(PORT=1521)))
Services Summary...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
The command completed successfully

So finally I can connect to the system referencing both SCANs, without having to do anything on my end (again using the deliberately bogus credentials a/b, so the ORA-01017 is the proof that the listener routed the request to the database):

[oracle@ron12cprinode2 ~]$ sqlplus a/b@ron12cpri-scan/ron_srv

SQL*Plus: Release 12.1.0.2.0 Production on Thu Jul 28 15:23:08 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied


Enter user-name: 
[oracle@ron12cprinode2 ~]$ sqlplus a/b@ron12cpri-dg-scan/ron_srv

SQL*Plus: Release 12.1.0.2.0 Production on Thu Jul 28 15:23:17 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied


Enter user-name: 

This seems to prove that the problem is solved.

Posted in 12c Release 1, Linux, RAC | 2 Comments »
