Category Archives: Linux

Little things worth knowing: OSWatcher Analyser Dashboard

I have written a few articles about Tracefile Analyzer (TFA) in the recent past. As you may recall from these posts, a more comprehensive TFA version than the one provided with the installation media is available from My Oracle Support (MOS for short). As part of this version, you get a lot of useful, additional tools, including OSWatcher. I love OSWatcher, simply because it gives me insights that would be very hard to get with SAR, for example. SAR tends to be the least common denominator on most (proper) operating systems and it’s better than nothing, but please read on and let me explain why I like OSWatcher so much.

Feel free to head back to my earlier posts if you’d like some more details about “stock TFA” and “MOS TFA”. These are terms I coined by the way; you won’t find them in a MOS search.

Say hello to the OSWatcher Analyser dashboard

In this article I’d like to introduce the OSWatcher Analyser dashboard to you. I think it’s probably the best way (certainly a very convenient one) to get an overview of what is going on at the operating system level. And it includes pretty pictures that are so easy to understand! OSWatcher (more precisely, oswbb) should be running on the system as part of “MOS Tracefile Analyzer”. In other words, it has to be installed.

The environment for this post

As always, before beginning the write-up, here are some details about the system I’m using. My virtual RAC system is based on KVM, and I’m running Oracle Linux 7.4 with UEK 4. The database is reasonably current at version 18.3.0.0.180717 (but that shouldn’t matter as this post isn’t about the database!), and I have upgraded TFA to 18.3.3.0.0. This is the latest TFA version at the time of writing.

[oracle@rac18pri1 ~]$ tfactl print version
TFA Version : 18.3.3.0.0

[oracle@rac18pri1 ~]$ tfactl toolstatus

.------------------------------------------------------------------.
|                  TOOLS STATUS - HOST : rac18pri1                 |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       |   12.2.0.1.3 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.1.2 | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   12.2.1.1.0 | DEPLOYED    |
|                      | calog        |   12.2.0.1.0 | DEPLOYED    |
|                      | dbcheck      |   18.3.0.0.0 | DEPLOYED    |
|                      | dbglevel     |   12.2.1.1.0 | DEPLOYED    |
|                      | grep         |   12.2.1.1.0 | DEPLOYED    |
|                      | history      |   12.2.1.1.0 | DEPLOYED    |
|                      | ls           |   12.2.1.1.0 | DEPLOYED    |
|                      | managelogs   |   12.2.1.1.0 | DEPLOYED    |
|                      | menu         |   12.2.1.1.0 | DEPLOYED    |
|                      | param        |   12.2.1.1.0 | DEPLOYED    |
|                      | ps           |   12.2.1.1.0 | DEPLOYED    |
|                      | pstack       |   12.2.1.1.0 | DEPLOYED    |
|                      | summary      |   12.2.1.1.0 | DEPLOYED    |
|                      | tail         |   12.2.1.1.0 | DEPLOYED    |
|                      | triage       |   12.2.1.1.0 | DEPLOYED    |
|                      | vi           |   12.2.1.1.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

[oracle@rac18pri1 ~]$ 

As you can see, oswbb is part of the “Support Tools Bundle” and actively running out of the box.

Apart from OSWatcher.sh it is important for OSWatcherFM.sh to run as well, as it clears the archive directory where OSWatcher stores its results. Of course that doesn’t relieve you from keeping an eye on your filesystem usage numbers :) Both of these are started by default.
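A quick sanity check never hurts. Here is a minimal sketch; the archive path is an assumption based on the repository location on my system (see below), so adjust it as required:

# both collectors should show up in the process list
ps -ef | grep -E "OSWatcher(FM)?\.sh" | grep -v grep

# and the archive should stay within reasonable bounds
du -sh $ORACLE_BASE/tfa/repository/suptools/$(hostname)/oswbb/oracle/oswbb/archive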

Invoking OSWBB

Assuming there is data for analysis (eg OSWatcher has run for a bit), you can invoke oswbb via tfactl. It will go through a somewhat lengthy parse phase if you don’t narrow your search down to a specific time, but eventually presents you with options:

[oracle@rac18pri1 ~]$ tfactl oswbb

Starting OSW Analyzer V8.1.2
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c)  2017 by Oracle Corporation

Parsing Data. Please Wait...

Scanning file headers for version and platform info...

Parsing file rac18pri1_iostat_18.10.10.0600.dat ...
Parsing file rac18pri1_iostat_18.10.10.0700.dat ...
Parsing file rac18pri1_iostat_18.10.10.0800.dat ...

[...]

Parsing file rac18pri1_ps_18.10.12.0400.dat ...
Parsing file rac18pri1_ps_18.10.16.1900.dat ...
Parsing file rac18pri1_ps_18.10.16.2000.dat ...


Parsing Completed.


Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter GC to Generate All CPU Gif Files
Enter GM to Generate All Memory Gif Files
Enter GD to Generate All Disk Gif Files
Enter GN to Generate All Network Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter Z to Zoom Graph Time Scale (Does not change analysis dataset)
Enter B to Returns to Baseline Graph Time Scale (Does not change analysis dataset)
Enter R to Remove Currently Displayed Graphs

Enter X to Export Parsed Data to Flat File
Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)
Enter A to Analyze Data
Enter D to Generate DashBoard

Enter Q to Quit Program

Please Select an option:

As you can see, there are plenty of options available. In interactive mode – which is what you are using right now – it’s possible to display all sorts of charts for CPU, memory, disk, and network.

Another option – and one I like a lot – is to generate all the data in a dashboard.

Creating the dashboard

If you choose option “D” oswbb will go ahead and create a dashboard. You are prompted to enter a name before it goes off and creates it. The location where the result is saved is not immediately obvious … In my case – using RAC – the result is stored in $ORACLE_BASE/tfa/repository/suptools/$(hostname)/oswbb/oracle/oswbb/analysis

I suppose that’s the same location in Oracle Restart and single instance, although I haven’t been able to verify that claim.

Within the “analysis” directory you find all your information combined in a subdirectory. It has the same name you assigned earlier when you requested the dashboard generation. It’s probably easiest to transfer the entire directory to a workstation for analysis.
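A minimal sketch of what that transfer could look like – the repository path and the dashboard name “dashboard1” are taken from my system, yours will differ:

# on the cluster node: bundle up the dashboard directory
cd $ORACLE_BASE/tfa/repository/suptools/$(hostname)/oswbb/oracle/oswbb/analysis
tar czf /tmp/dashboard1.tar.gz dashboard1

# on the workstation: fetch and unpack it
scp oracle@rac18pri1:/tmp/dashboard1.tar.gz .
tar xzf dashboard1.tar.gz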

Within the top-level directory you find two items – analysis.txt as well as a directory containing the HTML representation of the data gathered.

[martin@host dashboard1]$ ls -l
total 14588
-rw-r--r-- 1 martin martin 14931077 Oct 16 22:07 analysis.txt
drwxr-xr-x 7 martin martin     4096 Oct 16 22:07 dashboard
[martin@host dashboard1]$

Let’s start with the latter to get a first impression.

The HTML output

I have included a couple of screenshots that you might find interesting about the system under observation. If this was a “real” system and not an undersized set of VMs I’d probably be quite nervous by just looking at this data … there is no reason for concern though, this is just my lab! And I did my best to hurt the VMs, otherwise there wouldn’t be anything to show here :)

[Figure: dashboard overview]

As you can see, all the main components are colour-coded. A number of important charts are presented as well. CPU looks particularly bad. You can enlarge the images by clicking on them.

The dashboard provides you with detailed information for CPU, memory, I/O and networking. I’d like to show you a few of these now.

CPU

Clicking on the red CPU button, I get more details about CPU utilisation on the system.

[Figure: CPU detail page]

There are actually a few more charts, but they didn’t fit on the screen. I like the fact that I am immediately presented with critical findings at the top of the page, followed by more information in the form of charts. Again, clicking on a chart provides a larger picture.

Heading over to the text file I mentioned earlier (or clicking on the “details” button, not shown in the figure), I get more details:

############################################################################
# Section 4.2: CPU UTILIZATION: PERCENT BUSY
# CPU utilization should not be high over long periods of time. The higher 
# the cpu utilization the longer it will take processes to run.  Below lists 
# the number of times (NUMBER) and percent of the number of times (PERCENT) 
# that cpu percent busy was High (>95%) or Very High (100%). Pay attention 
# to high spanning multiple snaps as this represents the number of times cpu
# percent busy remained high in back to back snapshots
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 5568   100.00
High (>95%)                                 48     0.86
Very High (100%)                            42     0.75
High spanning multiple snaps                24     0.43

CPU UTILIZATION: The following snaps recorded cpu utilization of 100% busy:
SnapTime                      
------------------------------
Wed Oct 10 11:08:56 BST 2018
Wed Oct 10 11:09:29 BST 2018
Wed Oct 10 20:14:54 BST 2018
[...]

Using these data points I can further drill down into the information gathered by OSWatcher in the archive directory to look at specific output.
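For example, to look at the raw vmstat samples around the first 100% busy snap I could do something like this. Treat it as a sketch: the subdirectory layout and file naming are assumptions based on the iostat/ps file names shown earlier.

cd $ORACLE_BASE/tfa/repository/suptools/$(hostname)/oswbb/oracle/oswbb/archive/oswvmstat
# show a handful of samples recorded around 11:08 on 10 October
grep -A 15 "Oct 10 11:08" rac18pri1_vmstat_18.10.10.1100.dat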

Memory

The memory button is also in bright red, so something must be wrong in that area.

[Figure: memory detail page]

The system doesn’t seem to be in the best shape. This is entirely my fault: I have assigned too little memory to my VMs, and here is the proof!

Changing the scope

After the first investigation using all the data, you may want to narrow the scope down a bit. Or you might already know that an issue was reported last night between 2 AM and 4 AM. Narrowing the scope down for the purpose of creating a (more detailed) dashboard can be done using the “S” flag in the initial menu, as shown here:

Please Select an Option:S

Specify Analysis Start Time. Valid entry between Oct 10 07:00:03 2018 and Oct 16 21:31:59 2018
Example Format To Enter Time: Oct 10 07:00:03 2018  :Oct 11 21:00:00 2018

Specify Analysis End Time. Valid entry between Oct 10 07:00:03 2018 and Oct 16 21:31:59 2018
Example Format To Enter Time: Oct 16 21:31:59 2018  :Oct 12 01:00:00 2018

Dates accepted. Verifying valid begin/end data points...

Validating times in the archive...


Recalibrating data...
Scanning file headers for version and platform info...

Parsing file rac18pri1_iostat_18.10.11.2000.dat ...

[...]

Parsing file rac18pri1_ps_18.10.12.0000.dat ...

Enter a unique analysis directory name or enter  to accept default name:

There is another way to narrow the scope down: in interactive mode, menu option “Z” lets you zoom the graph time scale to a date/time range without changing the analysis dataset; after zooming you can create any chart of interest (menu items 1-5). Pressing “B” returns to the baseline graph time scale.

Summary

Did I say I really like OSWatcher? It’s genuinely helpful. If you install it together with TFA you don’t even need to concern yourself with writing a startup script – it’ll just be started. If you have to, you can go back in time (within reason) and investigate O/S performance statistics. Unlike SAR, it gives you 30-second granularity, which is a lot better than SAR’s rather coarse 10-minute interval.

Admittedly, SAR keeps a month’s worth of data, so both tools really complement each other quite nicely.
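For completeness, pulling historic CPU figures out of sar is a one-liner as well (file location and retention depend on your sysstat configuration):

# CPU utilisation recorded on the 10th of the month
sar -u -f /var/log/sa/sa10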

Happy troubleshooting!


RAC One node databases are relocated by opatchauto in 12.2 part 2

In a previous post I shared how I found out that RAC One Node databases are relocated on-line during patching and I promised a few more tests and sharing of implications. If you aren’t familiar with RAC One Node I recommend having a look at the official documentation: The Real Application Clusters Administration and Deployment Guide features it prominently in the introduction. One of the things I like to keep in mind when working with RAC One Node is this, quoting from section 1.3 in said admin and deployment guide:

Using the Single Client Access Name (SCAN) to connect to the database, clients can locate the service independently of the node on which it is running. Relocating an Oracle RAC One Node instance is therefore mostly transparent to the client, depending on the client connection. Oracle recommends to use either Application Continuity and Oracle Fast Application Notification or Transparent Application Failover to minimize the impact of a relocation on the client.

A little later, you can read this important additional piece of information in the “notes” section:

To prepare for all failure possibilities, you must add at least one Dynamic Database Service (Oracle Clusterware-managed database service) to an Oracle RAC One Node database.

If you created your RAC One database with the Database Configuration Assistant (dbca), you are already complying with that rule. In my case, my lab database is named RON with the mandatory service RON_SVC. I opted for an instance prefix of “DCB”.

What I’d like to try out for this blog post is what happens to an active workload on a RAC One database during patching.

I have successfully patched multi-node RAC systems, but that required the application to support this procedure. One of my favourite talks is named “Advanced RAC programming techniques” where I demonstrate the resilience of an application based on RAC to node failures by using Universal Connection Pool (UCP), JDBC and Fast Connection Failover (FCF). UCP is a Fast Application Notification (FAN) aware connection pool allowing my application to react to cluster events such as node up/node down. The idea is to mask instance failure from users.

If all of this sounds super-cryptic, I would like to suggest chapters 5 and 6 of the RAC administration and deployment guide for you. There you can read more about Workload Management with Dynamic Database Services and Application Continuity in RAC.

RAC One Node is different from multi-node RAC as it is only ever active on one node in normal operations. Online relocation, as described in the previous post, is supported by temporarily adding a second (destination) instance to the cluster, and moving all transactions across before terminating them after a time-out (default 30m). Once the time-out is reached (or all sessions made it across to the destination) the source instance is shut down and you are back to 1 active instance.
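For reference, a manual online relocation boils down to a single srvctl command. This is a sketch for my lab database DCB; as you will see later, opatchauto issues the very same command under the covers.

# relocate the RAC One Node database DCB to node rac122sec1,
# waiting up to 30 minutes for existing sessions to drain
srvctl relocate database -d DCB -n rac122sec1 -w 30 -v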

The online relocation does not care too much about whether the application is compatible with the process. If your application is written for RAC, most likely it will migrate quickly from the source to the destination instance. If it isn’t, well, the hard timeout might kick in and evict a few of your users. In a worse case, your users need to re-connect to the database. Even worse still, you might have to restart the middle-tier system(s).

Sadly I haven’t seen too many applications capable of handling RAC events properly. One application that does is Swingbench, so I’ll stick with it. I configured it according to Dominic Giles’s blog post.

This post assumes that you are properly licensed to use all these features.

The environment

My setup hasn’t changed from previous post so I spare you the boredom of repeating it here. Jump over to the other post for details.

Preparations

For this blog post I need to ensure that my RAC One Node database resides on the node I am about to patch. I have again followed the patch readme very carefully, and I have made sure that I have (proven, working, tested) backups of the entire stack and all the rest of it…

My database is indeed actively running on the node I am about to patch:

[oracle@rac122sec2 ~]$ srvctl status database -db DCB -detail -verbose
Instance DCB_1 is running on node rac122sec2 with online services RON_SVC. Instance status: Open.
Instance DCB_1 is connected to ASM instance +ASM2
Online relocation: INACTIVE

Before I can benefit from Application Continuity, I need to make a few changes to my application service, RON_SVC. There are quite a few sources to choose from, I went with the JDBC Developer’s Guide. Here’s the modification:

[oracle@rac122sec2 ~]$ srvctl modify service -db DCB -service RON_SVC -commit_outcome true \
> -failoverretry 30 -failoverdelay 10 -failovertype transaction \
> -replay_init_time 1800 -retention 86400 -notification true
[oracle@rac122sec2 ~]

Following the instructions on Dominic Giles’s blog, I also need to grant SOE the right to execute DBMS_APP_CONT.
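The grant itself is a one-liner; here is a minimal sketch of what it looks like when run as SYSDBA:

sqlplus / as sysdba <<'EOF'
-- allow the Swingbench schema to call the Application Continuity package
grant execute on DBMS_APP_CONT to SOE;
EOF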

Finally, I need to make changes to my Swingbench configuration file. The relevant part is shown here:

<SwingBenchConfiguration xmlns="http://www.dominicgiles.com/swingbench/config">
    <Name>"Order Entry (PLSQL) V2"</Name>
    <Comment>Version 2 of the SOE Benchmark running in the database using PL/SQL</Comment>
    <Connection>
        <UserName>soe</UserName>
        <Password></Password>
        <ConnectString>(DESCRIPTION=(TRANSPORT_CONNECT_TIMEOUT=5)
        (RETRY_COUNT=3)(FAILOVER=ON)
        (ADDRESS=(PROTOCOL=TCP)(HOST=rac122sec-scan)(PORT=1521))
        (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=RON_SVC)))</ConnectString>
        <DriverType>Oracle jdbc Driver</DriverType>
        <ConnectionPooling>
            <PooledInitialLimit>5</PooledInitialLimit>
            <PooledMinLimit>5</PooledMinLimit>
            <PooledMaxLimit>10</PooledMaxLimit>
            <PooledInactivityTimeOut>50</PooledInactivityTimeOut>
            <PooledConnectionWaitTimeout>45</PooledConnectionWaitTimeout>
            <PooledPropertyCheckInterval>10</PooledPropertyCheckInterval>
            <PooledAbandonedConnectionTimeout>120</PooledAbandonedConnectionTimeout>
        </ConnectionPooling>
        <Properties>
            <Property Key="StatementCaching">50</Property>
            <Property Key="FastFailover">true</Property>
            <Property Key="FetchSize">20</Property>
            <Property Key="OnsConfiguration">nodes=rac122sec1:6200,rac122sec2:6200</Property>
            <Property Key="FetchSize">20</Property>
            <Property Key="AppContinuityDriver">true</Property>
        </Properties>
    </Connection>
    <Load>
        <NumberOfUsers>5</NumberOfUsers>
        <MinDelay>0</MinDelay>
        <MaxDelay>0</MaxDelay>
        <InterMinDelay>50</InterMinDelay>
        <InterMaxDelay>500</InterMaxDelay>
        <QueryTimeout>120</QueryTimeout>
        <MaxTransactions>-1</MaxTransactions>
        <RunTime>0:0</RunTime>
        <LogonGroupCount>1</LogonGroupCount>
        <LogonDelay>1000</LogonDelay>
        <LogOutPostTransaction>false</LogOutPostTransaction>
        <WaitTillAllLogon>false</WaitTillAllLogon>
        <StatsCollectionStart>0:0</StatsCollectionStart>
        <StatsCollectionEnd>0:0</StatsCollectionEnd>
        <ConnectionRefresh>0</ConnectionRefresh>
        <TransactionList>
...

The connection string is actually on a single line, I have formatted it for readability in the above example. The main change from the standard configuration file is the use of connection pooling and setting the required properties for Application Continuity.

Let’s patch!

Once all the preparations are completed, it’s time to see how RAC One Node deals with an active workload undergoing an online relocation during patching. First of all I need to start the workload. I’d normally use charbench for this, but this time around I opted for the GUI. It shows performance graphs over a 3 minute rolling window.

A few minutes after starting the benchmark I commenced patching. Soon thereafter, opatchauto told me that it was relocating the database.

OPatchauto session is initiated at Fri Jul 27 14:52:23 2018

...

Relocating RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1

According to the session log, this happened at 14:54. And by the way, always keep the session log, it’s invaluable!

2018-07-27 14:54:07,914 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
Executing command as oracle: 
 /bin/sh -c 'ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/dbhome_1 
 /u01/app/oracle/product/12.2.0.1/dbhome_1/bin/srvctl relocate database -d DCB -n rac122sec1 -w 30 -v'
2018-07-27 14:54:07,914 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  COMMAND Looks like this: 
    /bin/sh -c 'ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/dbhome_1 
    /u01/app/oracle/product/12.2.0.1/dbhome_1/bin/srvctl relocate database -d DCB -n rac122sec1 -w 30 -v'
2018-07-27 14:55:31,736 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  status: 0
2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  Successfully executed the above command.

2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  Output from the command:
2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor -
Configuration updated to two instances
Instance DCB_2 started
Services relocated
Waiting for up to 30 minutes for instance DCB_1 to stop ...
Instance DCB_1 stopped
Configuration updated to one instance

2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  Command executed successfully.

You can see the same information by querying Clusterware, although there aren’t any timestamps associated with it:

[root@rac122sec1 ~]# srvctl status database -db DCB -detail -verbose
Instance DCB_1 is running on node rac122sec2 with online services RON_SVC. Instance status: Open.
Instance DCB_1 is connected to ASM instance +ASM2
Online relocation: ACTIVE
Source instance: DCB_1 on rac122sec2
Destination instance: DCB_2 on rac122sec1
[root@rac122sec1 ~]# 

Although the online relocation timeout was set to 30 minutes, the use of modern coding techniques and connection pooling allowed for a much faster online relocation. As you can see in the log excerpt, the entire relocation was completed by 2018-07-27 14:55:31,737. Clusterware now tells me that my database runs on node 1:

[root@rac122sec1 ~]# date;  srvctl status database -db DCB -detail -verbose
Fri 27 Jul 14:55:38 2018
Instance DCB_2 is running on node rac122sec1 with online services RON_SVC. Instance status: Open.
Instance DCB_2 is connected to ASM instance +ASM1
Online relocation: INACTIVE

While this is wicked, what are the implications for the application? Have a look at this screenshot, taken a minute after the online relocation completed.

As you can see there was nothing unusual recorded (tab events to the left), and I couldn’t see a drop in the number of sessions connected. I noticed a slight blip in performance though but it recovered very soon thereafter.

Summary

During automatic patching of the Oracle stack, opatchauto will perform an online relocation of a RAC One Node database if it is found running on the node currently undergoing patching. If your application is developed with RAC in mind – such as using a FAN-aware connection pool like UCP – and supports either Fast Connection Failover or Application Continuity, there is a good chance that patching the system does not affect your service.

While my test was successful, it is by no means representative of a production workload – my RAC One database has an SGA of 4 GB and there were 5 concurrent users, hardly what I see in the real world. However, that shouldn’t deter you: if you are using RAC One Node, I think it’s definitely worth trying to implement these techniques in your application.

RAC One node databases are relocated by opatchauto in 12.2 part 1

This is an interesting observation I wanted to share. I have a feeling there isn’t too much information out there for RAC One Node (RON) users, and I hope this helps someone thinking about patching their system.

RAC-rolling patching is well documented in patch readme files, blog posts and official white papers. Most RAC DBAs have a solid handle on the procedure. Patching RAC One Node is a different affair.

What happens when patching a RAC One Node system? As the name suggests a RAC One Node database is a cluster database restricted to one active instance in normal operations. It is possible to relocate the database from one node to another online. Oracle does this by temporarily adding a second instance to the cluster database with the intention of letting it take over from the source instance. At the end of the online relocation, the source instance is shut down, and only the destination instance remains up and running.

An online relocation is quite often a manual operation. However, I noticed that such an online relocation can happen during patching with opatchauto as well, at least in 12.2.

This post is intended to show you the process as it is, in the next part I’d like to show some implications of that approach.

The environment

In this example my lab environment consists of a 2 node RAC system currently patched to 12.2.0.1.180417. I wanted to apply the July 2018 RU to the system next to get some experience with the patch.

I have one RDBMS home in addition to the mandatory Grid home, same release level for both, no one-off patches (it’s a lab after all). The virtual machines run Oracle Linux 7.4 with kernel UEK4. To keep things simple there’s a single RAC One database, named RON. I assigned it DCB (“data centre B”) as unique name because I don’t like setting db_unique_name to reflect roles such as “PROD” and “STDBY”. It gets confusing when “STDBY” runs in primary role :)

Here’s the current status of my components:

[oracle@rac122sec2 ~]$ srvctl status database -db DCB
Instance DCB_1 is running on node rac122sec1
Online relocation: INACTIVE

[oracle@rac122sec2 ~]$ srvctl status service -db DCB
Service RON_SVC is running on instance(s) DCB_1
[oracle@rac122sec2 ~]$ 

For the curious, here’s the configuration metadata:

[oracle@rac122sec2 ~]$ srvctl config service -db DCB
Service name: RON_SVC
Server pool: 
Cardinality: 1
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type: 
Failover method: 
TAF failover retries: 
TAF failover delay: 
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: NONE
Edition: 
Pluggable database name: 
Maximum lag time: ANY
SQL Translation Profile: 
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout: 
Stop option: 
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: DCB_1
Available instances: 
CSS critical: no

[oracle@rac122sec2 ~]$ srvctl config database -db DCB
Database unique name: DCB
Database name: RON
Oracle home: /u01/app/oracle/product/12.2.0.1/dbhome_1
Oracle user: oracle
Spfile: +DATA/DCB/spfileRON.ora
Password file: +DATA/DCB/orapwRON
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: DATA,RECO
Mount point paths: 
Services: RON_SVC
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: DCB
Candidate servers: rac122sec1,rac122sec2
OSDBA group: dba
OSOPER group: oper
Database instances: DCB_1
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services: 
Database is administrator managed

The most important takeaway is that my RON instance DCB_1 is running on node rac122sec1.

Now let’s patch

After having followed the instructions in the patch readme closely, and after double/triple/quadruple checking that I have (working, tried and tested!) backups of the entire stack, I am ready to patch. This time around I’m following the instructions for the automatic application of the Grid Infrastructure RU, eg using opatchauto. Here is some relevant output from the patching session:

...
OPatchauto session is initiated at Thu Jul 26 14:12:12 2018

System initialization log file is /u01/app/12.2.0.1/grid/cfgtoollogs/opatchautodb/systemconfig2018-07-26_02-12-14PM.log.

Session log file is /u01/app/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto2018-07-26_02-13-15PM.log
The id for this session is Q4JA

Executing OPatch prereq operations to verify patch applicability on home /u01/app/12.2.0.1/grid

Executing OPatch prereq operations to verify patch applicability on home /u01/app/oracle/product/12.2.0.1/dbhome_1
Patch applicability verified successfully on home /u01/app/oracle/product/12.2.0.1/dbhome_1

Patch applicability verified successfully on home /u01/app/12.2.0.1/grid


Verifying SQL patch applicability on home /u01/app/oracle/product/12.2.0.1/dbhome_1
SQL patch applicability verified successfully on home /u01/app/oracle/product/12.2.0.1/dbhome_1


Preparing to bring down database service on home /u01/app/oracle/product/12.2.0.1/dbhome_1

WARNING: The service RON_SVC configured on dcb will not be switched as it is not configured to run on any other node(s).
No step execution required.........
 

Relocating RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1
/u01/app/oracle/product/12.2.0.1/dbhome_1 is not a RACOne database. No step execution required........
 

Bringing down CRS service on home /u01/app/12.2.0.1/grid
...

Wait a minute, what’s that? Have a look at the line beginning with “Relocating RACOne home before patching…”. Relocating the database wasn’t necessary in this case (remember that the database was active on rac122sec1, the other node), but opatchauto can definitely relocate your RAC One database.

When it does, you will see something like this in the output generated by opatchauto:

...
Preparing to bring down database service on home /u01/app/oracle/product/12.2.0.1/dbhome_1

WARNING: The service RON_SVC configured on dcb will not be switched as it is not configured to run on any other node(s).
Successfully prepared home /u01/app/oracle/product/12.2.0.1/dbhome_1 to bring down database service


Relocating RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1
Relocated RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1
...

The last 2 lines are those of interest. opatchauto detected that a RAC One database was running on the active node, and relocated it. Under the covers it uses a “srvctl relocate database …” command, as shown in the session log file.
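If you want to verify this on your own system, grepping the opatchauto session log for the relocate command is a quick way to do it – a sketch, with the log directory taken from my environment:

grep -i "srvctl relocate database" \
  /u01/app/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto*.log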

Interestingly however, and contrary to what I expected, opatchauto moves the RAC One database back to where it came from as a post-patch step. Towards the end of the patching session I saw this:

...
Starting CRS service on home /u01/app/12.2.0.1/grid
Postpatch operation log file location: /u01/app/oracle/crsdata/rac122sec2/crsconfig/crspatch_rac122sec2_2018-07-26_03-01-06PM.log
CRS service started successfully on home /u01/app/12.2.0.1/grid


Relocating back RACOne to home /u01/app/oracle/product/12.2.0.1/dbhome_1
Relocated back RACOne home successfully to home /u01/app/oracle/product/12.2.0.1/dbhome_1


Preparing home /u01/app/oracle/product/12.2.0.1/dbhome_1 after database service restarted
No step execution required.........
...

The relevant bit is in the middle (“relocating …”). After relocating the database to rac122sec1 opatchauto moved it back to rac122sec2.

Summary

Unlike rolling patching on multi-node RAC – where all instances on the patched RDBMS home are shut down and applications rely on connection pools and Fast Application Notification to maintain service availability – a RAC One Node database might be relocated to a different node in the cluster. There are implications to that process for application developers, some of which I hope to share in the next post.

Little things worth knowing: Creating a RAC One Node database on the command line

This post is going to be super short, and mostly just a note to myself as I constantly forget how to create a RAC One database on the command line. This post is for 12.2.0.1 but should be similar on 12.1 (although I didn’t test!).

Provided you are licensed appropriately, this is probably the most basic way how you create an admin-managed RAC One database on Linux for use in a lab environment:

dbca -silent -createDatabase -gdbName RON -templateName gold_image01.dbc \
 -createAsContainerDatabase false -databaseConfigType RACONENODE \
 -RACOneNodeServiceName RON_SVC -totalMemory 1024 \
 -nodelist rac122node1,rac122node2 -storageType ASM \
 -datafileDestination '+DATA' -recoveryAreaDestination '+RECO' \
 -recoveryAreaSize 10240

This works for me, but most likely not for you :) And it’s certainly not suitable for a production deployment. Make sure to adapt the command as needed; I tend to create gold images for use with dbca, and this is one example.

The command itself should be fairly self-explanatory. If you are unsure about the meaning of the various options, have a look at the output of “dbca -help -createDatabase” and the official documentation/My Oracle Support. I learned the hard way that forgetting the “-nodelist” results in a single instance creation instead of an error message.
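Once dbca completes, a quick sanity check confirms you really ended up with a RAC One Node database rather than a single instance one. A sketch, assuming the database unique name defaulted to RON:

srvctl config database -db RON | grep -E "Type|Online relocation timeout|Candidate servers"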

I didn’t find too many examples on the net, hope someone finds this useful.

OSWatcher, Tracefile Analyzer, and Oracle RAC 12.2

When I started the series about Tracefile Analyzer (TFA) I promised three parts. One for single instance, another one for Oracle Restart and this one is going to be about Real Application Clusters. The previous two parts are published already, this is the final piece.

The environment

I am using a virtualised 2-node cluster named rac122pri with nodes rac122pri{1,2} based on Oracle Linux 7.4. RAC is patched to 12.2.0.1.180116. I installed a Grid Home and an RDBMS home (Enterprise Edition).

Real Application Clusters

Before starting this discussion it’s worth pointing out that TFA integration in RAC 12.2 works really well. TFA is installed as part of the initial setup of the binaries and documented in the Autonomous Health Framework.

As soon as you execute root.sh as part of your RAC 12.2 Grid Infrastructure installation, TFA is set up as the first of 19 installation steps. Unfortunately I don’t have the screen output available, but all the action is logged in rootcrs_$(hostname)_timestamp.log anyway. You can find that logfile in $ORACLE_BASE/crsdata/$(hostname)/crsconfig. If memory serves me right this is a 12.2 change; I think the file was in $GRID_HOME/cfgtoollogs in 12.1, but I’m not sure (and don’t have a system to check against).
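If you need to locate the log on your own system, something along these lines should do (a sketch based on the location just mentioned):

# the most recent Clusterware configuration log appears last
ls -ltr $ORACLE_BASE/crsdata/$(hostname)/crsconfig/rootcrs_*.log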

Here is the relevant output from the log file:

2018-05-11 12:16:02: Command output:
>  CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'. 
>End Command output
2018-05-11 12:16:02: CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2018-05-11 12:16:02: Executing cmd: /u01/app/12.2.0.1/grid/bin/clsecho -p has -f clsrsc -m 4001
2018-05-11 12:16:02: Command output:
>  CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector. 
>End Command output
2018-05-11 12:16:02: CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2018-05-11 12:16:02: Executing cmd: /u01/app/12.2.0.1/grid/crs/install/tfa_setup -silent -crshome /u01/app/12.2.0.1/grid
2018-05-11 12:16:24: Command output:
>  TFA Installation Log will be written to File : /tmp/tfa_install_8879_2018_05_11-12_16_02.log
>  
>  Starting TFA installation
>  
>  Using JAVA_HOME : /u01/app/12.2.0.1/grid/jdk/jre
>  
>  Running Auto Setup for TFA as user root...
>  
>  Installing TFA now...
>  
>  
>  TFA Will be Installed on rac122pri1...
>  
>  TFA will scan the following Directories
>  ++++++++++++++++++++++++++++++++++++++++++++
>  
>  .---------------------------------------------------------.
>  |                        rac122pri1                       |
>  +----------------------------------------------+----------+
>  | Trace Directory                              | Resource |
>  +----------------------------------------------+----------+
>  | /u01/app/12.2.0.1/grid/OPatch/crs/log        | CRS      |

... [a lot more directories]

>  | /u01/app/oracle/crsdata/rac122pri1/trace     | CRS      |
>  '----------------------------------------------+----------'
>  
>  
>  Installing TFA on rac122pri1:
>  HOST: rac122pri1     TFA_HOME: /u01/app/12.2.0.1/grid/tfa/rac122pri1/tfa_home
>  
>  .------------------------------------------------------------------------------.
>  | Host       | Status of TFA | PID  | Port | Version    | Build ID             |
>  +------------+---------------+------+------+------------+----------------------+
>  | rac122pri1 | RUNNING       | 9165 | 5000 | 12.2.1.0.0 | 12210020161122170355 |
>  '------------+---------------+------+------+------------+----------------------'
>  
>  Running Inventory in All Nodes...
>  
>  Enabling Access for Non-root Users on rac122pri1...
>  
>  Adding default users to TFA Access list...
>  
>  Summary of TFA Installation:
>  .----------------------------------------------------------------------.
>  |                              rac122pri1                              |
>  +---------------------+------------------------------------------------+
>  | Parameter           | Value                                          |
>  +---------------------+------------------------------------------------+
>  | Install location    | /u01/app/12.2.0.1/grid/tfa/rac122pri1/tfa_home |
>  | Repository location | /u01/app/oracle/tfa/repository                 |
>  | Repository usage    | 0 MB out of 10240 MB                           |
>  '---------------------+------------------------------------------------'
>  
>  
>  Installing oratop extension..
>  
>  
>  TFA is successfully installed...

...

>End Command output
2018-05-11 12:16:24: Executing cmd: /u01/app/12.2.0.1/grid/bin/clsecho -p has -f clsrsc -m 4002
2018-05-11 12:16:24: Command output:
>  CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector. 
>End Command output

As with the other database deployment types, TFA integrates into systemd on Oracle Linux 7:

[root@rac122pri1 ~]# systemctl status oracle-tfa
● oracle-tfa.service - Oracle Trace File Analyzer
   Loaded: loaded (/etc/systemd/system/oracle-tfa.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-06-20 23:03:11 BST; 15h ago
 Main PID: 3092 (init.tfa)
   CGroup: /system.slice/oracle-tfa.service
           ├─ 3092 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
           ├─ 3698 /u01/app/12.2.0.1/grid/tfa/rac122pri1/tfa_home/jre/bin/java -
  server -Xms32m -Xmx64m -Djava.awt.headless=true -Ddisable.checkForUpdate=true
  oracle.rat.tfa.TFAMain /u01/app/12.2.0.1/grid/tfa/rac122pri1/...
           └─15827 /bin/sleep 30

The service is enabled and starts together with the operating system.
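A quick way to confirm the autostart behaviour is to ask systemd directly; the unit name is taken from the status output above:

# should print "enabled" if TFA starts together with the operating system
systemctl is-enabled oracle-tfa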

RAC-TFA lives in the Grid Home

If you read the previous parts of this series you might have noticed in the output above that, unlike single instance and Oracle Restart environments, TFA resides in the Grid Home as opposed to the ORACLE_BASE. This could have implications if you are using a system implementing separation of duties. For everyone else it’s a question of making sure the environment is set correctly.

[oracle@rac122pri1 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[oracle@rac122pri1 ~]$ which tfactl
/u01/app/12.2.0.1/grid/bin/tfactl

Another difference between single node and RAC installations is the status message. TFA reports its status on all nodes in the cluster, as shown here:

[oracle@rac122pri1 ~]$ tfactl print status

.-------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID  | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+------+------+------------+----------------------+------------------+
| rac122pri1 | RUNNING       | 3698 | 5000 | 18.2.1.0.0 | 18210020180529111033 | COMPLETE         |
| rac122pri2 | RUNNING       | 3697 | 5000 | 18.2.1.0.0 | 18210020180529111033 | COMPLETE         |
'------------+---------------+------+------+------------+----------------------+------------------'
[oracle@rac122pri1 ~]$

This isn’t the stock version by the way; I have recently upgraded my deployment to 18.2.1.0 – more about that in a moment. With a freshly installed RAC 12.2 system, invoking TFA greets you with the familiar “WARNING – your software is older than 180 days. Please consider upgrading TFA to the latest version”.

You may want to consider upgrading the installation

As with the other deployments, the version you get with the base installation is most likely outdated. My base release showed this:

.-------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID  | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+------+------+------------+----------------------+------------------+
| rac122pri1 | RUNNING       | 3698 | 5000 | 12.2.1.0.0 | 12210020161122170355 | COMPLETE         |
| rac122pri2 | RUNNING       | 3697 | 5000 | 12.2.1.0.0 | 12210020161122170355 | COMPLETE         |
'------------+---------------+------+------+------------+----------------------+------------------'

But not only that: if you check for toolstatus you will notice that most of the support tools are missing just like with the other deployments. I love to have them and as such upgraded to the current version at the time of writing (18.2.1.0.0). My Oracle Support DOC ID 1513912.1 is the reference, as always.

After the upgrade completed, I could see the following tools:

[oracle@rac122pri1 ~]$ tfactl toolstatus

.------------------------------------------------------------------.
|                 TOOLS STATUS - HOST : rac122pri1                 |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       |   12.2.0.1.3 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.0.1 | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   12.2.1.1.0 | DEPLOYED    |
|                      | calog        |   12.2.0.1.0 | DEPLOYED    |
|                      | changes      |   12.2.1.1.0 | DEPLOYED    |
|                      | dbglevel     |   12.2.1.1.0 | DEPLOYED    |
|                      | events       |   12.2.1.1.0 | DEPLOYED    |
|                      | grep         |   12.2.1.1.0 | DEPLOYED    |
|                      | history      |   12.2.1.1.0 | DEPLOYED    |
|                      | ls           |   12.2.1.1.0 | DEPLOYED    |
|                      | managelogs   |   12.2.1.1.0 | DEPLOYED    |
|                      | menu         |   12.2.1.1.0 | DEPLOYED    |
|                      | param        |   12.2.1.1.0 | DEPLOYED    |
|                      | ps           |   12.2.1.1.0 | DEPLOYED    |
|                      | pstack       |   12.2.1.1.0 | DEPLOYED    |
|                      | search       |   18.2.0.0.0 | DEPLOYED    |
|                      | summary      |   12.2.1.1.0 | DEPLOYED    |
|                      | tail         |   12.2.1.1.0 | DEPLOYED    |
|                      | triage       |   12.2.1.1.0 | DEPLOYED    |
|                      | vi           |   12.2.1.1.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

Summary

Whilst TFA is a great tool for single instance and Oracle Restart environments, it is indispensable for RAC. The ability to gather a lot of diagnostic data that would otherwise be difficult to get is the prime use case. Tools such as OSWatcher are very useful when it comes to diagnosing system status at the time of an instance eviction, for example.

Hybrid Columnar Compression in 12.2 – nice new feature

Oracle 12.2 introduced an interesting optimisation for Hybrid Columnar Compression (HCC). Until 12.2 you had to use direct path inserts into HCC compressed segments for data to be actually compressed. If you didn’t use a direct path insert you would still succeed in entering data into the segment, however your newly inserted data was not HCC compressed. There is no error message or other warning telling you about that, which can lead to surprises for the unaware.

My friend and colleague Nick has pointed out that the official HCC white paper states – somewhat hidden – that this requirement is no longer as strict in 12.2. I haven’t managed to find the document Nick pointed out, but a quick search using my favourite retrieval engine unearthed the updated version for 18c.

The new feature is easy to miss: you read on page 3 that data must be loaded using data warehouse bulk loading (direct path) techniques. Examples of said technologies follow before the author continues with a discussion about querying HCC data. For me that was the end of the subject… Turns out it wasn’t: the nugget Nick unearthed was on page 4, in the last paragraph before the next section on “HCC Warehouse (Query) Compression”. Quoting literally from said white paper:

Starting with Oracle Database 12.2, HCC automatically compresses new data from SQL INSERT … SELECT statements, without the APPEND hint and array inserts from programmatic interfaces such as PL/SQL and the Oracle Call Interface (OCI).

Aha! Once you know what to look for you find the same information in the 12.2 new features guide, too. Sometimes it’s hard to see the wood for all those trees.

So there is no more need for the append hint for insert-select in 12.2? This would be great news, and should alleviate some issues people have from unknowingly using HCC without direct path inserts. And it’s good for concurrency! Although I do like white papers, there are times when the white paper has to be checked against the lab to see if it is actually true.

I did exactly this for this blog post.

12.2 Test Case

I am using the SOE.ORDERS table for that purpose, as it holds a fair bit of data. To see whether the new algorithm works I decided to create a new empty table ORDERS_HCC with the same table structure as SOE.ORDERS. In the next step I issue an insert-select statement. If the white paper is correct it’ll compress the data using Query High.

SQL (12.2)> show user    
USER is "SOE"
SQL (12.2)> select banner from v$version where rownum < 2;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL (12.2)> create table orders_hcc 
  2  column store compress for query high 
  3  as select * from orders where 1 = 0; 

Table created.

SQL (12.2)> insert into orders_hcc 
  2  select * from orders where rownum <= 1000000;

1000000 rows created.

SQL (12.2)> commit;

Commit complete.

SQL (12.2)>

Note that I specifically omitted the /*+ append */ hint in the insert statement. Also note that the preceding CTAS statement didn’t select any rows from the source. In Oracle releases up to and including 12.1, data in ORDERS_HCC would not be compressed at the end of this little code snippet.

But how can you prove the white paper is right? Thankfully Oracle is well instrumented, and even more thankfully that instrumentation is not limited to performance. The Oracle-provided dbms_compression package features a function called “get_compression_type()”, which allows you to pass it a ROWID and some other information, and it’ll tell you the block’s compression algorithm.

Remember that you can change the compression algorithm for a given segment many times over. A partition in a range partitioned table can start uncompressed while being actively used, and as data gets colder, compression levels can be increased. Technically speaking the “alter table … column store” command on its own does not change the way data is stored in the segment. Data currently stored will remain in whatever state it was before changing the compression attribute. Only newly inserted data will be compressed according to the changed segment attribute.
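To make that point concrete, here is a minimal sketch (using ARCHIVE HIGH purely as an example): the first statement only changes the attribute for future loads, while the commented-out MOVE is what would actually rewrite the existing rows.

sqlplus / as sysdba <<'EOF'
-- change the compression attribute only; rows already in the segment keep their current format
alter table soe.orders_hcc column store compress for archive high;

-- moving the segment is what actually recompresses the existing rows
-- alter table soe.orders_hcc move column store compress for archive high;
EOF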

Back to the example: using DBMS_COMPRESSION I would like to find out if my table data is indeed compressed for Query High after my earlier insert command. Let’s see if it is, using the first 10 rows as a rough indication.

SQL (12.2)> select dbms_compression.get_compression_type(user, 'ORDERS_HCC', rowid) ctype
  2  from ORDERS_HCC where rownum < 11;

     CTYPE
----------
	 4
	 4
	 4
	 4
	 4
	 4
	 4
	 4
	 4
	 4

10 rows selected.

Well it would appear as if these are all compressed for Query High (QH). Looking at the package definition I can see that a compression type of 4 indicates Query High.
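If you want to see the full mapping for yourself, one way (a sketch, assuming you can query DBA_SOURCE) is to pull the COMP_* constants straight out of the package specification:

sqlplus -s / as sysdba <<'EOF'
set linesize 120 pagesize 100
-- list the COMP_* constants declared in the DBMS_COMPRESSION specification
select text
  from dba_source
 where owner = 'SYS'
   and name  = 'DBMS_COMPRESSION'
   and type  = 'PACKAGE'
   and regexp_like(upper(text), 'COMP_.*CONSTANT');
EOF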

So far so good, but I merely checked 10 out of 1 million rows. I’m fairly sure the rest of the table is also HCC compressed for QH, but I want proof. To remove any doubts, I can query the whole table. I’m doing this so you don’t have to. The next query will take forever (eg certainly more than 1 minute) to execute, and it is CPU bound, so please don’t do this at work. If you really feel you have to run a query like this, don’t do it outside the lab. You have been warned :)

SQL (12.2)> with comptypes as (
  2   select rownum rn, dbms_compression.get_compression_type(user, 'ORDERS_HCC', rowid) ctype
  3     from ORDERS_HCC
  4  )
  5      select count(*), ctype from comptypes group by ctype;

  COUNT(*)      CTYPE
---------- ----------
   1000000          4

I always start these types of queries in a screen(1) session to prevent network connection issues from interrupting my long running task. After some time, the query returns with the results as you can see. The entire table is compressed with Query High.

Summary

Array-inserts into HCC segments can compress data in Oracle 12.2 even if you don’t specify the append hint. The behaviour for conventional inserts did not change. I am going to post another piece of research containing some more data later this week or next.

New option for configuring multipathing in Oracle Linux 7

Teaching is and remains the best way of picking up new things :) While updating notes for a class I came across an interesting change in the way device-mapper multipath works in Oracle Linux 7.4.

In the past, everyone including me used scsi_id to get the WWID of a LUN for use with dm-multipath. This is still the way Oracle documents it for Oracle Linux 7.

The utility was specified as an argument to getuid_callout in /etc/multipath.conf, as shown here:

defaults {
[...]
    getuid_callout        "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
[...]
}

This is how it was done more or less since Red Hat 5 days. The location of scsi_id has changed over time, as expertly described on Oracle Base for example.

I don’t exactly know what happened to getuid_callout, but I can’t find it in the DM Multipath documentation for Red Hat Enterprise Linux 7. It certainly is present in its counterpart for Red Hat Enterprise Linux 6. A quick search using my favourite search engine did not reveal any clues as to what happened. The closest I got to an answer was a page in the SLES 11 SP3 documentation, which indicated that you should use a different parameter than getuid_callout.

Now I was curious! If I continued using getuid_callout, would the service stop working? Or just throw a warning?

The experiment

I quickly spun up a VM on KVM running Oracle Linux 7.4, which I updated to the latest and greatest version of each package as of April 3rd 2018. The then-current versions of dm-multipath are:

[root@server6 ~]# rpm -qa | grep -i multipath
device-mapper-multipath-0.4.9-111.el7_4.2.x86_64
device-mapper-multipath-libs-0.4.9-111.el7_4.2.x86_64
[root@server6 ~]# 

I have presented a small “LUN” to the VM, using the virtual SCSI driver. I initially tried with the virtio driver but found it was not exporting necessary properties back to the VM that I need later in this post. It shouldn’t matter though, all of this is a playground and I am most interested in the concepts, not the details.

You will also notice that I am only showing snippets of the multipath.conf file. The reason for this is simple: if you don’t follow your storage vendor’s documentation detailing how to configure dm-multipath for the storage system in use, you might run into issues. Always consult your storage vendor’s official documentation!

Installing dm-multipath

I didn’t have multipathing installed initially and had to install it first. Here is the transcript:

[root@server6 ~]# yum install device-mapper-multipath

[...]

--> Running transaction check
---> Package boost-system.x86_64 0:1.53.0-27.el7 will be installed
---> Package boost-thread.x86_64 0:1.53.0-27.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================
 Package                            Arch         Version                 Repository        Size
================================================================================================
Installing:
 device-mapper-multipath            x86_64       0.4.9-111.el7_4.2       ol7_latest       134 k
Installing for dependencies:
 boost-system                       x86_64       1.53.0-27.el7           ol7_latest        39 k
 boost-thread                       x86_64       1.53.0-27.el7           ol7_latest        57 k
 device-mapper-multipath-libs       x86_64       0.4.9-111.el7_4.2       ol7_latest       251 k
 librados2                          x86_64       1:0.94.5-2.el7          ol7_latest       1.7 M
Updating for dependencies:
 kpartx                             x86_64       0.4.9-111.el7_4.2       ol7_latest        72 k

Transaction Summary
================================================================================================
Install  1 Package  (+4 Dependent packages)
Upgrade             ( 1 Dependent package)

Total download size: 2.2 M

[...]

Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : boost-system-1.53.0-27.el7.x86_64                                            1/7 
  Installing : boost-thread-1.53.0-27.el7.x86_64                                            2/7 
  Installing : 1:librados2-0.94.5-2.el7.x86_64                                              3/7 
  Installing : device-mapper-multipath-libs-0.4.9-111.el7_4.2.x86_64                        4/7 
  Updating   : kpartx-0.4.9-111.el7_4.2.x86_64                                              5/7 
  Installing : device-mapper-multipath-0.4.9-111.el7_4.2.x86_64                             6/7 
  Cleanup    : kpartx-0.4.9-111.el7.x86_64                                                  7/7 
  Verifying  : boost-system-1.53.0-27.el7.x86_64                                            1/7 
  Verifying  : 1:librados2-0.94.5-2.el7.x86_64                                              2/7 
  Verifying  : boost-thread-1.53.0-27.el7.x86_64                                            3/7 
  Verifying  : device-mapper-multipath-0.4.9-111.el7_4.2.x86_64                             4/7 
  Verifying  : kpartx-0.4.9-111.el7_4.2.x86_64                                              5/7 
  Verifying  : device-mapper-multipath-libs-0.4.9-111.el7_4.2.x86_64                        6/7 
  Verifying  : kpartx-0.4.9-111.el7.x86_64                                                  7/7 

Installed:
  device-mapper-multipath.x86_64 0:0.4.9-111.el7_4.2                                            

Dependency Installed:
  boost-system.x86_64 0:1.53.0-27.el7                      boost-thread.x86_64 0:1.53.0-27.el7 
  device-mapper-multipath-libs.x86_64 0:0.4.9-111.el7_4.2  librados2.x86_64 1:0.94.5-2.el7     

Dependency Updated:
  kpartx.x86_64 0:0.4.9-111.el7_4.2                                                             

Complete!
[root@server6 ~]# 

First configuration attempt

Next I quickly created a config file based on what I used in the past, making sure I added getuid_callout to the defaults {} section. And sure enough, when starting the service it complains about illegal syntax:

[root@server6 ~]# systemctl status multipathd
● multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-04-03 14:56:19 BST; 1min 28s ago
  Process: 586 ExecStart=/sbin/multipathd (code=exited, status=0/SUCCESS)
  Process: 568 ExecStartPre=/sbin/multipath -A (code=exited, status=0/SUCCESS)
  Process: 564 ExecStartPre=/sbin/modprobe dm-multipath (code=exited, status=0/SUCCESS)
 Main PID: 596 (multipathd)
   CGroup: /system.slice/multipathd.service
           └─596 /sbin/multipathd

Apr 03 14:56:19 server6 systemd[1]: Starting Device-Mapper Multipath Device Controller...
Apr 03 14:56:19 server6 multipath[568]: Apr 03 14:56:19 | /etc/multipath.conf line 24, invalid keyword: getuid_callout
Apr 03 14:56:19 server6 systemd[1]: Started Device-Mapper Multipath Device Controller.
Apr 03 14:56:19 server6 multipathd[596]: /etc/multipath.conf line 24, invalid keyword: getuid_callout

Note the line reading “/etc/multipath.conf line 24, invalid keyword: getuid_callout”. It didn’t seem to prevent the meta-device from being created, but I didn’t like the error message popping up all the time.

After removing the getuid_callout from my multipath.conf file to let the defaults take over, the error went away after a restart of the service:

Apr 03 15:02:54 server6 systemd[1]: Stopped Device-Mapper Multipath Device Controller.

Apr 03 15:10:43 server6 systemd[1]: Starting Device-Mapper Multipath Device Controller...
Apr 03 15:10:43 server6 systemd[1]: Started Device-Mapper Multipath Device Controller.
Apr 03 15:10:43 server6 multipathd[11593]: mpatha: load table [0 4194304 multipath 0 0 1 1 service-time 0 1 1 8:0 1]
Apr 03 15:10:43 server6 multipathd[11593]: mpatha: event checker started
Apr 03 15:10:43 server6 multipathd[11593]: path checkers start up

So I guess I have to rethink my approach to dm-multipath and device name persistence if I want to avoid this error message.

The New Way of doing things

I have once been told by a good friend: “when unsure, consult the documentation”. On my Linux system, my first stop is the man page. After a quick “man multipath.conf” I learned that getuid_callout is nowhere to be found. The SUSE note mentioned a new attribute, named uid_attribute, which seems to have taken its place:

uid_attribute:
The udev attribute providing a unique path identifier. Default value is ID_SERIAL

I also found this to be valuable about the “multipaths” section:

This section defines the multipath topologies. They are indexed by a World Wide Identifier (wwid), which is taken to be the value of the udev attribute given by the uid_attribute keyword.

Aha! The new model is potentially much more flexible than the old one.
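Translated into configuration, the defaults section from earlier would look something like this with the new keyword – shown for clarity only, since ID_SERIAL is the documented default anyway:

defaults {
[...]
    # replaces getuid_callout; ID_SERIAL is the default, so this line is optional
    uid_attribute "ID_SERIAL"
[...]
}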

Controlling aliases with multipath {} sections

I really like to assign alias names to LUNs mapped to my database systems. This way, I know that a device named “/dev/mapper/oracle_asm_data_001” is most likely used as the first ASM disk in disk group DATA. If I see /dev/mapper/mpatha instead, I know precisely nothing about its purpose.

Giving LUNs a name is not hard: you can do this in the multipaths {} section of /etc/multipath.conf. I really like doing this for the reasons just mentioned. Which options do I have?

The easiest option is quite simply to change getuid_callout to uid_attribute, or to remove it altogether. Thankfully – at least in my case – the output of scsi_id and the ID_SERIAL attribute provided by udev are nearly identical:

[root@server6 ~]# /lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/sda
0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0
[root@server6 ~]# udevadm info /dev/sda | grep ID_SERIAL
E: ID_SERIAL=0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0
E: ID_SERIAL_SHORT=drive-scsi0-0-0-0
[root@server6 ~]# 

Note that I included the --replace-whitespace option in the scsi_id command, otherwise the output wouldn’t be identical. This is a concession to my lab; on real hardware I haven’t seen white space in WWIDs (which doesn’t mean there can’t be any). Removing the getuid_callout from the defaults {} section was actually enough to create a persistent device name.

[...]
multipaths {

  multipath {
    wwid  "0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0"
    alias "oracle_asm_data_001"
  }
}

And yes, I get my alias after restarting the multipathing daemon.

Apr 03 16:40:49 server6 systemd[1]: Starting Device-Mapper Multipath Device Controller...
Apr 03 16:40:49 server6 systemd[1]: Started Device-Mapper Multipath Device Controller.
Apr 03 16:40:49 server6 multipathd[12271]: oracle_asm_data_001: load table [0 4194304 multipath 0 0 1 1 service-time 0 1 1 8:0 1]
Apr 03 16:40:49 server6 multipathd[12271]: oracle_asm_data_001: event checker started
Apr 03 16:40:49 server6 multipathd[12271]: path checkers start up

[root@server6 ~]# multipath -l
oracle_asm_data_001 (0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0) dm-3 QEMU    ,QEMU HARDDISK   
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  `- 0:0:0:0 sda 8:0 active undef unknown
[root@server6 ~]# 

There is of course no second path, and there is a fair bit of information missing that you would otherwise get from an enterprise deployment, but you get the idea.