Category Archives: Oracle

Oracle (Database and Middleware) related posts

Creating a RAC 12.1 Data Guard Physical Standby environment (3b)

Huh, what is this I hear you ask? Part 3b? Oracle 12.1? Well, there’s a bit of a story to this post. Back in December 2016 I started to write a series of blog posts (part 1 | part 2 | part 3 | part 4) about how I created a standby database on RAC 12.1. For some reason I forgot to post this part. Up until now the step where I am creating the broker configuration was missing. Thanks to a friend I discovered my mistake. I decided to post this article to complete the series. There’s a pretty big logical gap that needed filling :)

I also deliberately decided against making changes to my notes, so this is written in the same style as the articles preceding this one. Also bear in mind that this is Oracle 12.1!

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small-scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Here is the original blog post

In a last-minute decision I decided to split part 3 in half: first the database duplication, followed by the Data Guard configuration. It really got a bit too long (even by my standards), and would have put readers off the article. I appreciate your patience though…

What’s the situation?

At the end of part 3 I had a physical standby database ready for use, including its registration in the OCR. It was looking promising, but there was more work to do.

At the moment, the standby database is out of sync with the primary, a fact that needs to be rectified by creating a Data Guard configuration. I prefer to do this on the command line. If you are so inclined you can of course push the mouse around and do it with OEM.

Step 1: Data Guard Broker pre-requisites

Before creating the configuration you should change the location of the broker configuration files to shared storage. This needs to be done on each side – primary and every standby – before you can start the broker.

SQL> alter system set dg_broker_config_file1='+DATA/NCDBA/dr1NCDBA.dat' scope=both sid='*';

System altered.

SQL> alter system set dg_broker_config_file2='+DATA/NCDBA/dr2NCDBA.dat' scope=both sid='*';

System altered.

SQL> show parameter dg_broker

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dg_broker_config_file1               string      +DATA/NCDBA/dr1NCDBA.dat
dg_broker_config_file2               string      +DATA/NCDBA/dr2NCDBA.dat
dg_broker_start                      boolean     FALSE

SQL> alter system set dg_broker_start=true scope=both sid='*';

System altered.

SQL> 

I had to repeat the step on NCDBB, of course with adapted paths and file names.
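
For completeness, the statements on NCDBB would look something like this – a sketch only, as the exact ASM directory and file names on the standby are assumptions following the same naming pattern:

SQL> alter system set dg_broker_config_file1='+DATA/NCDBB/dr1NCDBB.dat' scope=both sid='*';

SQL> alter system set dg_broker_config_file2='+DATA/NCDBB/dr2NCDBB.dat' scope=both sid='*';

SQL> alter system set dg_broker_start=true scope=both sid='*';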

Step 2: Creating a Data Guard Broker configuration

Once that’s done, you create the configuration using dgmgrl, the Data Guard broker command line interface. I named my configuration RACTEST and added both databases. Here is an example of my session:

[oracle@rac12pri1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdba
Password:
Connected as SYSDBA.
DGMGRL> create configuration ractest as primary database is 'NCDBA' connect identifier is 'NCDBA';
Configuration "ractest" created with primary database "NCDBA"
DGMGRL> add database 'NCDBB' as connect identifier is 'NCDBB';
Database "NCDBB" added
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
DISABLED

As a first step I connect as SYSDBA before creating the configuration. You can usually type help followed by the command in dgmgrl to remind yourself of the exact syntax. After the configuration is created you can start adding (standby) databases. In my case there is only one – a physical standby – database, but you can think of environments with more than one standby, although that is less common. The connect identifiers you specify must be valid net service names, resolvable via the default tnsnames.ora (the one in the RDBMS home).
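
For illustration, an entry for NCDBA would look roughly like the following – a sketch only, the SCAN host name is a made-up placeholder, and the NCDBB entry follows the same pattern pointing at the standby cluster's SCAN:

NCDBA =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = NCDBA)
    )
  )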

At this point it is important to set standby_file_management to auto. Otherwise managed recovery can abort, and you will have to do extra, easily avoidable work managing data files. The broker syntax to do so involves changing database properties (type show database verbose 'NCDBA' to show all of them). Many Data Guard broker properties are not easily mapped back to the corresponding initialisation parameter. In the case of standby_file_management it is not hard, and the documentation helps with the rest.

Eventually I enable the configuration and check the status. I have found that it can take a couple of minutes at times to get the SUCCESS message.

DGMGRL> edit database 'NCDBA' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> edit database 'NCDBB' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> enable configuration
Enabled.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 21 seconds ago)
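
If you want to convince yourself that the broker edit really changed the underlying initialisation parameter (the broker propagates the change to the database), a quick check on either side will do:

SQL> show parameter standby_file_management

It should report AUTO on both the primary and the standby.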

There are other things you must consider in accordance with your company’s (data protection/availability/business continuity/…) standards and other documentation – again my example is rather basic, but deliberately so. It’s my lab playground after all.

For the sake of completeness, here is the current configuration of NCDBA and NCDBB:

DGMGRL> show database 'NCDBA'

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBA1
    NCDBA2

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB';

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 223.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> exit

Please note that the broker detected that my databases are cluster databases and lists both instances for each of them. It also shows that NCDBB2 is the apply instance. I was positively surprised by the broker’s ability to detect both instances automatically. I seem to remember cases in earlier releases where said instance detection occasionally required a little nudging to get it right.
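
Should you ever need to give it that nudge yourself, the broker lets you pin the apply instance when turning redo apply on – a sketch using my instance names:

DGMGRL> edit database 'NCDBB' set state='APPLY-ON' with apply instance='NCDBB1';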

This example is deliberately kept as minimalist as possible: it works. Most likely it won’t satisfy your requirements. Review your Data Guard (and any other relevant) standards and change the configuration accordingly.

Summary

… back to present time :) This post completes the series, all my notes are now transcribed to the blog. I hope you find this useful.

back to part 3 | on to part 4


RAC One node databases are relocated by opatchauto in 12.2 part 2

In a previous post I shared how I found out that RAC One Node databases are relocated online during patching, and I promised a few more tests and to share the implications. If you aren’t familiar with RAC One Node I recommend having a look at the official documentation: the Real Application Clusters Administration and Deployment Guide features it prominently in the introduction. One of the things I like to keep in mind when working with RAC One Node is this, quoting from section 1.3 in said admin and deployment guide:

Using the Single Client Access Name (SCAN) to connect to the database, clients can locate the service independently of the node on which it is running. Relocating an Oracle RAC One Node instance is therefore mostly transparent to the client, depending on the client connection. Oracle recommends to use either Application Continuity and Oracle Fast Application Notification or Transparent Application Failover to minimize the impact of a relocation on the client.

A little later, you can read this important additional piece of information in the “notes” section:

To prepare for all failure possibilities, you must add at least one Dynamic Database Service (Oracle Clusterware-managed database service) to an Oracle RAC One Node database.

If you created your RAC One database with the Database Configuration Assistant (dbca), you are already complying with that rule. In my case, my lab database is named RON with the mandatory service RON_SVC. I opted for an instance prefix of “DCB”.

What I’d like to try out for this blog post is what happens to an active workload on a RAC One database during patching.

I have successfully patched multi-node RAC systems, but that required the application to support this procedure. One of my favourite talks is named “Advanced RAC programming techniques” where I demonstrate the resilience of an application based on RAC to node failures by using Universal Connection Pool (UCP), JDBC and Fast Connection Failover (FCF). UCP is a Fast Application Notification (FAN) aware connection pool allowing my application to react to cluster events such as node up/node down. The idea is to mask instance failure from users.

If all of this sounds super-cryptic, I would like to point you to chapters 5 and 6 of the RAC Administration and Deployment Guide. There you can read more about Workload Management with Dynamic Database Services and Application Continuity in RAC.

RAC One Node is different from multi-node RAC as it is only ever active on one node in normal operations. Online relocation, as described in the previous post, works by temporarily adding a second (destination) instance to the cluster database and letting sessions move across; any sessions that have not made it across when the time-out expires (30 minutes by default) are terminated. Once the time-out is reached (or all sessions have moved to the destination) the source instance is shut down and you are back to one active instance.

The online relocation does not care too much about whether the application can cope with the process. If your application is written for RAC, it will most likely migrate quickly from the source to the destination instance. If it isn’t, well, the hard timeout might kick in and evict a few of your users. In a worse case your users need to re-connect to the database; worse still, you might have to restart the middle-tier system(s).

Sadly I haven’t seen too many applications capable of handling RAC events properly. One application that does is Swingbench, so I’ll stick with it. I configured it according to Dominic Giles’s blog post.

This post assumes that you are properly licensed to use all these features.

The environment

My setup hasn’t changed from previous post so I spare you the boredom of repeating it here. Jump over to the other post for details.

Preparations

For this blog post I need to ensure that my RAC One Node database resides on the node I am about to patch. I have again followed the patch readme very carefully, and I have made sure that I have (proven, working, tested) backups of the entire stack and all the rest of it…
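
Before making any changes it is also worth letting opatchauto check the patch against all homes in analyze mode. A sketch – the staging directory is a placeholder for wherever the RU was unzipped, and the command has to run as root:

[root@rac122sec2 ~]# /u01/app/12.2.0.1/grid/OPatch/opatchauto apply /u01/stage/<RU> -analyze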

My database is indeed actively running on the node I am about to patch:

[oracle@rac122sec2 ~]$ srvctl status database -db DCB -detail -verbose
Instance DCB_1 is running on node rac122sec2 with online services RON_SVC. Instance status: Open.
Instance DCB_1 is connected to ASM instance +ASM2
Online relocation: INACTIVE

Before I can benefit from Application Continuity, I need to make a few changes to my application service, RON_SVC. There are quite a few sources to choose from; I went with the JDBC Developer’s Guide. Here’s the modification:

[oracle@rac122sec2 ~]$ srvctl modify service -db DCB -service RON_SVC -commit_outcome true \
> -failoverretry 30 -failoverdelay 10 -failovertype transaction \
> -replay_init_time 1800 -retention 86400 -notification true
[oracle@rac122sec2 ~]$

Following the instructions on Dominic Giles’s blog, I also need to grant SOE the right to execute DBMS_APP_CONT.
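
For reference, the grant itself is a one-liner, executed as a suitably privileged user:

SQL> grant execute on dbms_app_cont to soe;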

Finally, I need to make changes to my Swingbench configuration file. The relevant part is shown here:

<SwingBenchConfiguration xmlns="http://www.dominicgiles.com/swingbench/config">
    <Name>"Order Entry (PLSQL) V2"</Name>
    <Comment>Version 2 of the SOE Benchmark running in the database using PL/SQL</Comment>
    <Connection>
        <UserName>soe</UserName>
        <Password></Password>
        <ConnectString>(DESCRIPTION=(TRANSPORT_CONNECT_TIMEOUT=5)
        (RETRY_COUNT=3)(FAILOVER=ON)
        (ADDRESS=(PROTOCOL=TCP)(HOST=rac122sec-scan)(PORT=1521))
        (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=RON_SVC)))</ConnectString>
        <DriverType>Oracle jdbc Driver</DriverType>
        <ConnectionPooling>
            <PooledInitialLimit>5</PooledInitialLimit>
            <PooledMinLimit>5</PooledMinLimit>
            <PooledMaxLimit>10</PooledMaxLimit>
            <PooledInactivityTimeOut>50</PooledInactivityTimeOut>
            <PooledConnectionWaitTimeout>45</PooledConnectionWaitTimeout>
            <PooledPropertyCheckInterval>10</PooledPropertyCheckInterval>
            <PooledAbandonedConnectionTimeout>120</PooledAbandonedConnectionTimeout>
        </ConnectionPooling>
        <Properties>
            <Property Key="StatementCaching">50</Property>
            <Property Key="FastFailover">true</Property>
            <Property Key="FetchSize">20</Property>
            <Property Key="OnsConfiguration">nodes=rac122sec1:6200,rac122sec2:6200</Property>
            <Property Key="FetchSize">20</Property>
            <Property Key="AppContinuityDriver">true</Property>
        </Properties>
    </Connection>
    <Load>
        <NumberOfUsers>5</NumberOfUsers>
        <MinDelay>0</MinDelay>
        <MaxDelay>0</MaxDelay>
        <InterMinDelay>50</InterMinDelay>
        <InterMaxDelay>500</InterMaxDelay>
        <QueryTimeout>120</QueryTimeout>
        <MaxTransactions>-1</MaxTransactions>
        <RunTime>0:0</RunTime>
        <LogonGroupCount>1</LogonGroupCount>
        <LogonDelay>1000</LogonDelay>
        <LogOutPostTransaction>false</LogOutPostTransaction>
        <WaitTillAllLogon>false</WaitTillAllLogon>
        <StatsCollectionStart>0:0</StatsCollectionStart>
        <StatsCollectionEnd>0:0</StatsCollectionEnd>
        <ConnectionRefresh>0</ConnectionRefresh>
        <TransactionList>
...

The connection string is actually on a single line; I have formatted it for readability in the above example. The main change from the standard configuration file is the use of connection pooling and setting the required properties for Application Continuity.

Let’s patch!

Once all the preparations are completed, it’s time to see how RAC One Node deals with an active workload during patching and the online relocation it triggers. First of all I need to start the workload. I’d normally use charbench for this, but this time around I opted for the GUI, which shows performance graphs over a three-minute rolling window.
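
For reference, the equivalent charbench call would look roughly like this – a sketch, with the configuration file name being whatever you saved the XML shown above as:

./charbench -c ron_soe_config.xml -v users,tpm,tps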

A few minutes after starting the benchmark I commenced patching. Soon thereafter, opatchauto told me that it was relocating the database.

OPatchauto session is initiated at Fri Jul 27 14:52:23 2018

...

Relocating RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1

According to the session log, this happened at 14:54. And by the way, always keep the session log, it’s invaluable!

2018-07-27 14:54:07,914 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
Executing command as oracle: 
 /bin/sh -c 'ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/dbhome_1 
 /u01/app/oracle/product/12.2.0.1/dbhome_1/bin/srvctl relocate database -d DCB -n rac122sec1 -w 30 -v'
2018-07-27 14:54:07,914 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  COMMAND Looks like this: 
    /bin/sh -c 'ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/dbhome_1 
    /u01/app/oracle/product/12.2.0.1/dbhome_1/bin/srvctl relocate database -d DCB -n rac122sec1 -w 30 -v'
2018-07-27 14:55:31,736 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  status: 0
2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  Successfully executed the above command.

2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  Output from the command:
2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor -
Configuration updated to two instances
Instance DCB_2 started
Services relocated
Waiting for up to 30 minutes for instance DCB_1 to stop ...
Instance DCB_1 stopped
Configuration updated to one instance

2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  Command executed successfully.

You can see the same information by querying Clusterware, although there aren’t any timestamps associated with it:

[root@rac122sec1 ~]# srvctl status database -db DCB -detail -verbose
Instance DCB_1 is running on node rac122sec2 with online services RON_SVC. Instance status: Open.
Instance DCB_1 is connected to ASM instance +ASM2
Online relocation: ACTIVE
Source instance: DCB_1 on rac122sec2
Destination instance: DCB_2 on rac122sec1
[root@rac122sec1 ~]# 

Although the online relocation timeout was set to 30 minutes, modern coding techniques and connection pooling allowed for a much faster online relocation. As you can see in the log excerpt, the entire relocation completed at 14:55:31 – roughly a minute and a half after it started. Clusterware now tells me that my database runs on node 1:

[root@rac122sec1 ~]# date;  srvctl status database -db DCB -detail -verbose
Fri 27 Jul 14:55:38 2018
Instance DCB_2 is running on node rac122sec1 with online services RON_SVC. Instance status: Open.
Instance DCB_2 is connected to ASM instance +ASM1
Online relocation: INACTIVE

While this is wicked, what are the implications for the application? Have a look at this screenshot, taken a minute after the online relocation completed.

As you can see there was nothing unusual recorded (Events tab to the left), and I couldn’t see a drop in the number of sessions connected. I noticed a slight blip in performance, but it recovered very soon thereafter.

Summary

During automatic patching of the Oracle stack, opatchauto will perform an online relocation of a RAC One Node database if it is found running on the node currently undergoing patching. If your application is developed with RAC in mind – using a FAN-aware connection pool such as UCP and supporting either Fast Connection Failover or Application Continuity – there is a good chance that patching the system does not affect your service.

While my test was successful, it is by no means representative of a production workload – my RAC One database has an SGA of 4 GB and there were 5 concurrent users, hardly what I see in the real world. However, that shouldn’t deter you: if you are using RAC One Node, I think it’s definitely worth trying to implement these techniques in your application.

RAC One node databases are relocated by opatchauto in 12.2 part 1

This is an interesting observation I wanted to share. There doesn’t seem to be too much information out there for RAC One Node (RON) users, and I hope this helps someone thinking about patching their system.

Rolling patching for multi-node RAC is well documented in patch readme files, blog posts and official white papers, and most RAC DBAs have a solid handle on the procedure. Patching RAC One Node is a different affair.

What happens when patching a RAC One Node system? As the name suggests a RAC One Node database is a cluster database restricted to one active instance in normal operations. It is possible to relocate the database from one node to another online. Oracle does this by temporarily adding a second instance to the cluster database with the intention of letting it take over from the source instance. At the end of the online relocation, the source instance is shut down, and only the destination instance remains up and running.

An online relocation is quite often a manual operation. However, I noticed that such an online relocation can happen during patching with opatchauto as well, at least in 12.2.
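
When done manually, the whole process boils down to a single srvctl command; a sketch with placeholders (-n is the target node, -w the relocation timeout in minutes):

srvctl relocate database -d <db_unique_name> -n <target_node> -w 30 -v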

This post is intended to show you the process as it is; in the next part I’d like to show some implications of that approach.

The environment

In this example my lab environment consists of a 2 node RAC system currently patched to 12.2.0.1.180417. I wanted to apply the July 2018 RU to the system next to get some experience with the patch.

I have one RDBMS home in addition to the mandatory Grid home, same release level for both, no one-off patches (it’s a lab after all). The virtual machines run Oracle Linux 7.4 with kernel UEK4. To keep things simple there’s a single RAC One database, named RON. I assigned it DCB (“data centre B”) as unique name because I don’t like setting db_unique_name to reflect roles such as “PROD” and “STDBY”. It gets confusing when “STDBY” runs in primary role :)

Here’s the current status of my components:

[oracle@rac122sec2 ~]$ srvctl status database -db DCB
Instance DCB_1 is running on node rac122sec1
Online relocation: INACTIVE

[oracle@rac122sec2 ~]$ srvctl status service -db DCB
Service RON_SVC is running on instance(s) DCB_1
[oracle@rac122sec2 ~]$ 

For the curious, here’s the configuration metadata:

[oracle@rac122sec2 ~]$ srvctl config service -db DCB
Service name: RON_SVC
Server pool: 
Cardinality: 1
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type: 
Failover method: 
TAF failover retries: 
TAF failover delay: 
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: NONE
Edition: 
Pluggable database name: 
Maximum lag time: ANY
SQL Translation Profile: 
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout: 
Stop option: 
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: DCB_1
Available instances: 
CSS critical: no

[oracle@rac122sec2 ~]$ srvctl config database -db DCB
Database unique name: DCB
Database name: RON
Oracle home: /u01/app/oracle/product/12.2.0.1/dbhome_1
Oracle user: oracle
Spfile: +DATA/DCB/spfileRON.ora
Password file: +DATA/DCB/orapwRON
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: DATA,RECO
Mount point paths: 
Services: RON_SVC
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: DCB
Candidate servers: rac122sec1,rac122sec2
OSDBA group: dba
OSOPER group: oper
Database instances: DCB_1
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services: 
Database is administrator managed

The most important takeaway is that my RON instance DCB_1 is running on node rac122sec1.

Now let’s patch

After having followed the instructions in the patch readme closely, and after double/triple/quadruple checking that I have (working, tried and tested!) backups of the entire stack, I am ready to patch. This time around I’m following the instructions for the automatic application of the Grid Infrastructure RU, i.e. using opatchauto. Here is some relevant output from the patching session:

...
OPatchauto session is initiated at Thu Jul 26 14:12:12 2018

System initialization log file is /u01/app/12.2.0.1/grid/cfgtoollogs/opatchautodb/systemconfig2018-07-26_02-12-14PM.log.

Session log file is /u01/app/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto2018-07-26_02-13-15PM.log
The id for this session is Q4JA

Executing OPatch prereq operations to verify patch applicability on home /u01/app/12.2.0.1/grid

Executing OPatch prereq operations to verify patch applicability on home /u01/app/oracle/product/12.2.0.1/dbhome_1
Patch applicability verified successfully on home /u01/app/oracle/product/12.2.0.1/dbhome_1

Patch applicability verified successfully on home /u01/app/12.2.0.1/grid


Verifying SQL patch applicability on home /u01/app/oracle/product/12.2.0.1/dbhome_1
SQL patch applicability verified successfully on home /u01/app/oracle/product/12.2.0.1/dbhome_1


Preparing to bring down database service on home /u01/app/oracle/product/12.2.0.1/dbhome_1

WARNING: The service RON_SVC configured on dcb will not be switched as it is not configured to run on any other node(s).
No step execution required.........
 

Relocating RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1
/u01/app/oracle/product/12.2.0.1/dbhome_1 is not a RACOne database. No step execution required........
 

Bringing down CRS service on home /u01/app/12.2.0.1/grid
...

Wait a minute, what’s that? Have a look at the line beginning with “Relocating RACOne home before patching…”. Relocating the database wasn’t necessary in this case (remember that the database was active on rac122sec1, the other node), but opatchauto can definitely relocate your RAC One database.

When it does, you will see something like this in the output generated by opatchauto:

...
Preparing to bring down database service on home /u01/app/oracle/product/12.2.0.1/dbhome_1

WARNING: The service RON_SVC configured on dcb will not be switched as it is not configured to run on any other node(s).
Successfully prepared home /u01/app/oracle/product/12.2.0.1/dbhome_1 to bring down database service


Relocating RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1
Relocated RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1
...

The last 2 lines are those of interest: opatchauto detected that a RAC One database was running on the node being patched, and relocated it. Under the covers it uses a “srvctl relocate database …” command, as shown in the session log file.

Interestingly, however, and contrary to what I expected, opatchauto moves the RAC One database back to where it came from as a post-patch step. Towards the end of the patching session I saw this:

...
Starting CRS service on home /u01/app/12.2.0.1/grid
Postpatch operation log file location: /u01/app/oracle/crsdata/rac122sec2/crsconfig/crspatch_rac122sec2_2018-07-26_03-01-06PM.log
CRS service started successfully on home /u01/app/12.2.0.1/grid


Relocating back RACOne to home /u01/app/oracle/product/12.2.0.1/dbhome_1
Relocated back RACOne home successfully to home /u01/app/oracle/product/12.2.0.1/dbhome_1


Preparing home /u01/app/oracle/product/12.2.0.1/dbhome_1 after database service restarted
No step execution required.........
...

The relevant bit is in the middle (“relocating …”). After relocating the database to rac122sec1 opatchauto moved it back to rac122sec2.

Summary

Unlike rolling patching on multi-node RAC – where the instances running from the RDBMS home being patched are shut down one node at a time and applications rely on connection pools and Fast Application Notification to maintain service availability – a RAC One Node database might be relocated to a different node in the cluster. There are implications to that process for application developers, some of which I hope to share in the next post.

Little things worth knowing: Creating a RAC One Node database on the command line

This post is going to be super short, and mostly just a note to myself as I constantly forget how to create a RAC One Node database on the command line. It was written against 12.2.0.1 but should be similar on 12.1 (although I didn’t test!).

Provided you are licensed appropriately, this is probably the most basic way to create an admin-managed RAC One Node database on Linux for use in a lab environment:

dbca -silent -createDatabase -gdbName RON -templateName gold_image01.dbc \
 -createAsContainerDatabase false -databaseConfigType RACONENODE \
 -RACOneNodeServiceName RON_SVC -totalMemory 1024 \
 -nodelist rac122node1,rac122node2 -storageType ASM \
 -datafileDestination '+DATA' -recoveryAreaDestination '+RECO' \
 -recoveryAreaSize 10240

This works for me, but most likely not for you :) And it’s certainly not suitable for a production deployment. Make sure to adapt the command as needed; I tend to create gold images for use with dbca, and this is one example.

The command itself should be fairly self-explanatory. If you are unsure about the meaning of the various options, have a look at the output of “dbca -help -createDatabase”, the official documentation, and My Oracle Support. I learned the hard way that forgetting the “-nodelist” option results in a single instance database being created instead of an error message.
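
Once dbca completes, a quick way to confirm you really ended up with a RAC One Node database (rather than a plain single instance one) is to ask Clusterware – a sketch based on the names used above:

[oracle@rac122node1 ~]$ srvctl config database -db RON

Look for “Type: RACOneNode” and the online relocation timeout in the output.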

I didn’t find too many examples on the net; I hope someone finds this useful.

OSWatcher, Tracefile Analyzer, and Oracle RAC 12.2

When I started the series about Tracefile Analyzer (TFA) I promised three parts: one for single instance, another for Oracle Restart, and this one about Real Application Clusters. The previous two parts are already published; this is the final piece.

The environment

I am using a virtualised 2-node cluster named rac122pri with nodes rac122pri{1,2}, based on Oracle Linux 7.4. RAC is patched to 12.2.0.1.180116. I installed a Grid Home and an RDBMS home (Enterprise Edition).

Real Application Clusters

Before starting this discussion it’s worth pointing out that TFA integration in RAC 12.2 works really well. TFA is installed as part of the initial setup of the binaries and documented in the Autonomous Health Framework documentation.

As soon as you execute root.sh as part of your RAC 12.2 Grid Infrastructure installation, TFA is set up as the first step out of 19. Unfortunately I don’t have the screen output available, but all the action is logged in rootcrs_$(hostname)_timestamp.log anyway. You can find that logfile in $ORACLE_BASE/crsdata/$(hostname)/crsconfig. If memory serves me right this is a 12.2 change – I think the file was in $GRID_HOME/cfgtoollogs in 12.1, but I’m not sure (and don’t have a system to check against).

Here is the relevant output from the log file:

2018-05-11 12:16:02: Command output:
>  CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'. 
>End Command output
2018-05-11 12:16:02: CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2018-05-11 12:16:02: Executing cmd: /u01/app/12.2.0.1/grid/bin/clsecho -p has -f clsrsc -m 4001
2018-05-11 12:16:02: Command output:
>  CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector. 
>End Command output
2018-05-11 12:16:02: CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2018-05-11 12:16:02: Executing cmd: /u01/app/12.2.0.1/grid/crs/install/tfa_setup -silent -crshome /u01/app/12.2.0.1/grid
2018-05-11 12:16:24: Command output:
>  TFA Installation Log will be written to File : /tmp/tfa_install_8879_2018_05_11-12_16_02.log
>  
>  Starting TFA installation
>  
>  Using JAVA_HOME : /u01/app/12.2.0.1/grid/jdk/jre
>  
>  Running Auto Setup for TFA as user root...
>  
>  Installing TFA now...
>  
>  
>  TFA Will be Installed on rac122pri1...
>  
>  TFA will scan the following Directories
>  ++++++++++++++++++++++++++++++++++++++++++++
>  
>  .---------------------------------------------------------.
>  |                        rac122pri1                       |
>  +----------------------------------------------+----------+
>  | Trace Directory                              | Resource |
>  +----------------------------------------------+----------+
>  | /u01/app/12.2.0.1/grid/OPatch/crs/log        | CRS      |

... [a lot more directories]

>  | /u01/app/oracle/crsdata/rac122pri1/trace     | CRS      |
>  '----------------------------------------------+----------'
>  
>  
>  Installing TFA on rac122pri1:
>  HOST: rac122pri1     TFA_HOME: /u01/app/12.2.0.1/grid/tfa/rac122pri1/tfa_home
>  
>  .------------------------------------------------------------------------------.
>  | Host       | Status of TFA | PID  | Port | Version    | Build ID             |
>  +------------+---------------+------+------+------------+----------------------+
>  | rac122pri1 | RUNNING       | 9165 | 5000 | 12.2.1.0.0 | 12210020161122170355 |
>  '------------+---------------+------+------+------------+----------------------'
>  
>  Running Inventory in All Nodes...
>  
>  Enabling Access for Non-root Users on rac122pri1...
>  
>  Adding default users to TFA Access list...
>  
>  Summary of TFA Installation:
>  .----------------------------------------------------------------------.
>  |                              rac122pri1                              |
>  +---------------------+------------------------------------------------+
>  | Parameter           | Value                                          |
>  +---------------------+------------------------------------------------+
>  | Install location    | /u01/app/12.2.0.1/grid/tfa/rac122pri1/tfa_home |
>  | Repository location | /u01/app/oracle/tfa/repository                 |
>  | Repository usage    | 0 MB out of 10240 MB                           |
>  '---------------------+------------------------------------------------'
>  
>  
>  Installing oratop extension..
>  
>  
>  TFA is successfully installed...

...

>End Command output
2018-05-11 12:16:24: Executing cmd: /u01/app/12.2.0.1/grid/bin/clsecho -p has -f clsrsc -m 4002
2018-05-11 12:16:24: Command output:
>  CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector. 
>End Command output

As with the other database deployment types, TFA integrates into systemd on Oracle Linux 7:

[root@rac122pri1 ~]# systemctl status oracle-tfa
● oracle-tfa.service - Oracle Trace File Analyzer
   Loaded: loaded (/etc/systemd/system/oracle-tfa.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-06-20 23:03:11 BST; 15h ago
 Main PID: 3092 (init.tfa)
   CGroup: /system.slice/oracle-tfa.service
           ├─ 3092 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
           ├─ 3698 /u01/app/12.2.0.1/grid/tfa/rac122pri1/tfa_home/jre/bin/java -
  server -Xms32m -Xmx64m -Djava.awt.headless=true -Ddisable.checkForUpdate=true
  oracle.rat.tfa.TFAMain /u01/app/12.2.0.1/grid/tfa/rac122pri1/...
           └─15827 /bin/sleep 30

The service is enabled and starts together with the operating system.

RAC-TFA lives in the Grid Home

If you read the previous parts of this series you might have noticed in the output above that, unlike in single instance and Oracle Restart environments, TFA resides in the Grid Home rather than in the ORACLE_BASE. This could have implications if your system implements separation of duties. For everyone else it’s a question of making sure the environment is set correctly.

[oracle@rac122pri1 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[oracle@rac122pri1 ~]$ which tfactl
/u01/app/12.2.0.1/grid/bin/tfactl

Another difference between single node and RAC installations is the status message. TFA reports its status on all nodes in the cluster, as shown here:

[oracle@rac122pri1 ~]$ tfactl print status

.-------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID  | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+------+------+------------+----------------------+------------------+
| rac122pri1 | RUNNING       | 3698 | 5000 | 18.2.1.0.0 | 18210020180529111033 | COMPLETE         |
| rac122pri2 | RUNNING       | 3697 | 5000 | 18.2.1.0.0 | 18210020180529111033 | COMPLETE         |
'------------+---------------+------+------+------------+----------------------+------------------'
[oracle@rac122pri1 ~]$

This isn’t the stock version by the way – I have recently upgraded my deployment to 18.2.1.0; more about that later. When I recently installed RAC 12.2 and invoked TFA, I was greeted with the familiar “WARNING – your software is older than 180 days. Please consider upgrading TFA to the latest version”.

You may want to consider upgrading the installation

As with the other deployments, the version you get with the base installation is most likely outdated. My base release showed this:

.-------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID  | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+------+------+------------+----------------------+------------------+
| rac122pri1 | RUNNING       | 3698 | 5000 | 12.2.1.0.0 | 12210020161122170355 | COMPLETE         |
| rac122pri2 | RUNNING       | 3697 | 5000 | 12.2.1.0.0 | 12210020161122170355 | COMPLETE         |
'------------+---------------+------+------+------------+----------------------+------------------'

But not only that: if you check the tool status (tfactl toolstatus) you will notice that most of the support tools are missing, just as with the other deployments. I like to have them available, and as such upgraded to the version current at the time of writing (18.2.1.0.0). My Oracle Support DOC ID 1513912.1 is the reference, as always.
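
For reference, the upgrade itself is little more than unzipping the download from the MOS note and running the installer as root on a cluster node. A sketch – the staging directory and zip file name are placeholders and change with every TFA release:

[root@rac122pri1 ~]# cd /u01/stage
[root@rac122pri1 stage]# unzip -q TFA-LINUX_v18.2.1.zip
[root@rac122pri1 stage]# ./installTFA-LINUX

The installer should detect the existing installation in the Grid Home and upgrade it in place rather than deploying a second copy.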

After the upgrade completed, I could see the following tools:

[oracle@rac122pri1 ~]$ tfactl toolstatus

.------------------------------------------------------------------.
|                 TOOLS STATUS - HOST : rac122pri1                 |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       |   12.2.0.1.3 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.0.1 | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   12.2.1.1.0 | DEPLOYED    |
|                      | calog        |   12.2.0.1.0 | DEPLOYED    |
|                      | changes      |   12.2.1.1.0 | DEPLOYED    |
|                      | dbglevel     |   12.2.1.1.0 | DEPLOYED    |
|                      | events       |   12.2.1.1.0 | DEPLOYED    |
|                      | grep         |   12.2.1.1.0 | DEPLOYED    |
|                      | history      |   12.2.1.1.0 | DEPLOYED    |
|                      | ls           |   12.2.1.1.0 | DEPLOYED    |
|                      | managelogs   |   12.2.1.1.0 | DEPLOYED    |
|                      | menu         |   12.2.1.1.0 | DEPLOYED    |
|                      | param        |   12.2.1.1.0 | DEPLOYED    |
|                      | ps           |   12.2.1.1.0 | DEPLOYED    |
|                      | pstack       |   12.2.1.1.0 | DEPLOYED    |
|                      | search       |   18.2.0.0.0 | DEPLOYED    |
|                      | summary      |   12.2.1.1.0 | DEPLOYED    |
|                      | tail         |   12.2.1.1.0 | DEPLOYED    |
|                      | triage       |   12.2.1.1.0 | DEPLOYED    |
|                      | vi           |   12.2.1.1.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

Summary

Whilst TFA is a great tool for single instance and Oracle Restart environments, it is indispensable for RAC. The ability to gather a lot of diagnostic data that would otherwise be difficult to get is the prime use case. Tools such as OSWatcher are very useful when it comes to diagnosing the state of the system at the time of an instance eviction, for example.

Hybrid Columnar Compression in 12.2 – nice new feature

Oracle 12.2 introduced an interesting optimisation for Hybrid Columnar Compression (HCC). Until 12.2 you had to use direct path inserts into HCC compressed segments for data to actually be compressed. If you didn’t use a direct path insert you would still succeed in loading data into the segment; however, your newly inserted data was not HCC compressed. There is no error message or other warning telling you about that, which can lead to surprises for the unaware.

My friend and colleague Nick has pointed out that the official HCC white paper states – somewhat hidden – that this requirement is no longer as strict in 12.2. I haven’t managed to find the document Nick pointed out, but a quick search using my favourite retrieval engine unearthed the updated version for 18c.

The new feature is easy to miss: you read on page 3 that data must be loaded using data warehouse bulk loading (direct path) techniques. Examples of said techniques follow before the author continues with a discussion about querying HCC data. For me that was the end of the subject… Turns out it wasn’t: the nugget Nick unearthed was on page 4, in the last paragraph before the next section on “HCC Warehouse (Query) Compression”. Quoting literally from said white paper:

Starting with Oracle Database 12.2, HCC automatically compresses new data from SQL INSERT … SELECT statements, without the APPEND hint and array inserts from programmatic interfaces such as PL/SQL and the Oracle Call Interface (OCI).

Aha! Once you know what to look for you find the same information in the 12.2 new features guide, too. Sometimes it’s hard to see the wood for all those trees.

So there is no more need for the append hint for insert-select in 12.2? This would be great news, and should alleviate some of the issues people run into by unknowingly using HCC without direct path inserts. And it’s good for concurrency! Although I do like white papers, there are times when the white paper has to be checked against the lab to see if it is actually true.

I did exactly this for this blog post.

12.2 Test Case

I am using the SOE.ORDERS table for that purpose, as it holds a fair bit of data. To see whether the new algorithm works I decided to create a new empty table ORDERS_HCC with the same table structure as SOE.ORDERS. In the next step I issue an insert-select statement. If the white paper is correct it’ll compress the data using Query High.

SQL (12.2)> show user
USER is "SOE"
SQL (12.2)> select banner from v$version where rownum < 2;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL (12.2)> create table orders_hcc 
  2  column store compress for query high 
  3  as select * from orders where 1 = 0;

Table created.

SQL (12.2)> insert into orders_hcc 
  2  select * from orders where rownum <= 1000000;

1000000 rows created.

SQL (12.2)> commit;

Commit complete.

SQL (12.2)>

Note that I specifically omitted the /*+ append */ hint in the insert statement. Also note that the preceding CTAS statement didn’t select any rows from the source. In Oracle releases up to and including 12.1, data in ORDERS_HCC would not be compressed at the end of this little code snippet.

But how can you prove the white paper is right? Thankfully Oracle is well instrumented, and even more thankfully that instrumentation is not limited to performance. The Oracle-provided dbms_compression package features a function called get_compression_type(), to which you pass a ROWID and some other information, and it tells you the compression type of the block the row lives in.

Remember that you can change the compression algorithm for a given segment many times over. A partition in a range partitioned table can start uncompressed while being actively used, and as data gets colder, compression levels can be increased. Technically speaking the “alter table … column store” command on its own does not change the way data is stored in the segment. Data currently stored will remain in whatever state it was before changing the compression attribute. Only newly inserted data will be compressed according to the changed segment attribute.
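
To make this a little more tangible, here is a sketch using a hypothetical range partitioned table SALES – the first statement only changes metadata, the second one rewrites the partition and therefore compresses the rows already in it:

SQL> alter table sales modify partition sales_q1_2015 column store compress for archive low;

SQL> alter table sales move partition sales_q1_2015 column store compress for archive low;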

Back to the example: using DBMS_COMPRESSION I would like to find out if my table data is indeed compressed for Query High after my earlier insert command. Let’s see if it is, using the first 10 rows as a rough indication.

SQL (12.2)> select dbms_compression.get_compression_type(user, 'ORDERS_HCC', rowid) ctype
  2  from ORDERS_HCC where rownum < 11;

     CTYPE
----------
         4
         4
         4
         4
         4
         4
         4
         4
         4
         4

10 rows selected.

Well it would appear as if these are all compressed for Query High (QH). Looking at the package definition I can see that a compression type of 4 indicates Query High.
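
If you would rather read the constants straight from the package specification than from the documentation, a quick query against DBA_SOURCE does the trick:

SQL> select text from dba_source
  2  where owner = 'SYS' and name = 'DBMS_COMPRESSION'
  3  and type = 'PACKAGE' and text like '%QUERY_HIGH%';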

So far so good, but I merely checked 10 out of 1 million rows. I’m fairly sure the rest of the table is also HCC compressed for QH, but I want proof. To remove any doubts, I can query the whole table. I’m doing this so you don’t have to. The next query will take forever (certainly more than 1 minute) to execute, and it is CPU bound, so please don’t do this at work. If you really feel like running a query like this, don’t do it outside the lab. You have been warned :)

SQL (12.2)> with comptypes as (
  2   select rownum rn, dbms_compression.get_compression_type(user, 'ORDERS_HCC', rowid) ctype
  3     from ORDERS_HCC
  4  )
  5      select count(*), ctype from comptypes group by ctype;

  COUNT(*)      CTYPE
---------- ----------
   1000000          4

I always start these types of queries in a screen(1) session to prevent network connection issues from interrupting my long-running task. After some time, the query returns with the results as you can see. The entire table is compressed with Query High.

Summary

Insert as select statements into HCC segments compress data in Oracle 12.2 even if you don’t specify the append hint, as do array inserts from programmatic interfaces. The behaviour for single-row conventional inserts did not change. I am going to post another piece of research containing some more data later this week or next.

You may still need gcc when patching Oracle Database 12.2

I have previously written about changes in the Oracle 12.2 preinstall RPM and how gcc is no longer part of the dependencies list. As was pointed out to me, gcc shouldn’t be necessary anymore according to the 12.2 Linux Database Installation Guide: check the blue note for a statement indicating that gcc and gcc-c++ aren’t needed for Grid Infrastructure, nor for the RDBMS software.

I have applied patch 27100009 (January 2018 Release Update 12.2.0.1.180116) on my 2 node RAC system in the lab, and found out that this is only partially true :) You may or may not encounter this issue in your environment – see below.

Updating the preinstall RPM

Just to be sure I didn’t miss any changes to the preinstall RPM I pulled the latest one from Oracle and checked its requirements for gcc.

[oracle@rac122pri1 ~]$ rpm -qi oracle-database-server-12cR2-preinstall-1.0-3.el7.x86_64
Name        : oracle-database-server-12cR2-preinstall
Version     : 1.0
Release     : 3.el7
Architecture: x86_64
Install Date: Fri 19 Jan 2018 03:13:18 PM GMT
Group       : Test Environment/Libraries
Size        : 56561
License     : GPLv2
Signature   : RSA/SHA256, Mon 10 Jul 2017 11:27:07 AM BST, Key ID 72f97b74ec551f03
Source RPM  : oracle-database-server-12cR2-preinstall-1.0-3.el7.src.rpm
Build Date  : Mon 10 Jul 2017 11:26:59 AM BST
Build Host  : x86-ol7-builder-02.us.oracle.com
Relocations : (not relocatable)
Vendor      : Oracle
Summary     : Sets the system for Oracle Database single instance and Real Application
              Cluster install for Oracle Linux 7
Description :
The Oracle Preinstallation RPM package installs software packages and sets system parameters 
required for Oracle Database single instance and Oracle Real Application Clusters installations 
for Oracle Linux Release 7
Files affected: /etc/sysctl.conf, /boot/grub/menu.lst OR /boot/grub2/grub.cfg
Files added: /etc/security/limits.d/oracle-database-server-12cR2-preinstall.conf
[oracle@rac122pri1 ~]$ rpm -q --requires oracle-database-server-12cR2-preinstall-1.0-3.el7.x86_64 | grep -i gcc
libgcc
[oracle@rac122pri1 ~]$ 

So gcc is definitely not part of the dependent packages, and my minimum install doesn’t pull gcc (or gcc-c++ for that matter):

[oracle@rac122pri1 ~]$ rpm -qa | grep gcc
libgcc-4.8.5-16.el7_4.1.x86_64
[oracle@rac122pri1 ~]$ 

This is a 2 node cluster, consisting of nodes rac122pri1 and rac122pri2. Both use Oracle Linux 7.4 with UEK4 patched to February 12th 2018. As the names I picked for my cluster nodes suggest, this is a system running RAC 12.2.

IMPORTANT note: just because I hit the issue doesn’t mean there’s an issue for everyone. My lab environment is that – my lab environment. I just wanted to point out a way to investigate the problem and how I resolved it in my lab environment. Your mileage might vary.

Applying the January 2018 RU

I always like to be current when it comes to Oracle, and in that spirit decided to apply the January 2018 RU to my cluster after having made sure that I have a working backup. My rule is that there is no patching without backups, ever. And by backups I mean backups of the entire stack ;)

Since this was a new installation in my lab I thought I’d give opatchauto another chance and run with it. What could possibly go wrong?

Node 1 went without any issues, and opatchauto reported that it had applied the patches it wanted to apply. I don’t have the screen output anymore; however, here’s the summary after the patch completed.

[oracle@rac122pri1 ~]$ opatch lspatches -oh /u01/app/oracle/product/12.2.0.1/dbhome_1
27335416;OCW JAN 2018 RELEASE UPDATE 12.2.0.1.180116 (27335416)
27105253;Database Release Update : 12.2.0.1.180116 (27105253)

OPatch succeeded.

[oracle@rac122pri1 ~]$ opatch lspatches -oh /u01/app/12.2.0.1/grid/
27335416;OCW JAN 2018 RELEASE UPDATE 12.2.0.1.180116 (27335416)
27144050;Tomcat Release Update 12.2.0.1.0(ID:171023.0830) (27144050)
27128906;ACFS Release Update : 12.2.0.1.0 (27128906)
27105253;Database Release Update : 12.2.0.1.180116 (27105253)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)

OPatch succeeded.
[oracle@rac122pri1 ~]$

Did you notice the “Tomcat Release Update 12.2.0.1.0” in the Grid Home? So much to investigate!

Patching node 2

After the patch completed on node 1 I continued with node 2. At first, everything looked quite normal, and opatch did its thing. I have made it a habit to run anything that depends on the network connection staying up inside a screen(1) session. That protects the work I’m doing from intermittent network trouble, and allows me to keep an eye on how things are progressing.
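
In practice that means little more than this before kicking off opatchauto (the session name is arbitrary):

[root@rac122pri2 ~]# screen -S patching

Everything started inside that session survives a dropped SSH connection; you can detach with ctrl-a d and re-attach later using “screen -r patching”.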

After a few minutes of work, opatchauto reported this (formatted for readability):

Start applying binary patch on home /u01/app/12.2.0.1/grid
Failed while applying binary patches on home /u01/app/12.2.0.1/grid

Execution of [OPatchAutoBinaryAction] patch action failed, check log for more details. Failures:
Patch Target : rac122pri2->/u01/app/12.2.0.1/grid Type[crs]
Details: [
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/12.2.0.1/grid, host: rac122pri2.
Command failed:  /u01/app/12.2.0.1/grid/OPatch/opatchauto  apply /u01/stage/27100009 -oh /u01/app/12.2.0.1/grid
    -target_type cluster -binary -invPtrLoc /u01/app/12.2.0.1/grid/oraInst.loc 
    -jre /u01/app/12.2.0.1/grid/OPatch/jre 
    -persistresult /u01/app/12.2.0.1/grid/OPatch/auto/dbsessioninfo/sessionresult_rac122pri2_crs.ser 
    -analyzedresult /u01/app/12.2.0.1/grid/OPatch/auto/dbsessioninfo/sessionresult_analyze_rac122pri2_crs.ser
Command failure output:
==Following patches FAILED in apply:

Patch: /u01/stage/27100009/27335416
Log: /u01/app/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-22_09-37-57AM_1.log
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Re-link fails on target "install_srvm".

Not exactly what you want to see during patching. But there’s no need to panic (just yet ;) The error message points me to a log file. Opening the log file, it quickly became apparent why the issue occurred:

[22-Jan-2018 09:40:30] [INFO]       [OPSR-TIME] Finished applying patch "27335416" to local system
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Loading raw inventory
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] Loaded all components from inventory. Heap memory in use: 314 (MB)
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] Loaded all one offs from inventory. Heap memory in use: 314 (MB)
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Raw inventory loaded successfully
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Loading cooked inventory
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] : Loading cooked one offs. Heap memory used 314 (MB)
[22-Jan-2018 09:40:32] [INFO]       [OPSR-MEMORY] : Loaded cooked oneoffs. Heap memory used : 363 (MB)
[22-Jan-2018 09:40:32] [INFO]       [OPSR-TIME] Cooked inventory loaded successfully
[22-Jan-2018 09:40:32] [INFO]       OUI-67050:Running make for target install_srvm
[22-Jan-2018 09:40:32] [INFO]       Start invoking 'make' at Mon Jan 22 09:40:32 GMT 2018Mon Jan 22 09:40:32 GMT 2018
[22-Jan-2018 09:40:32] [INFO]       Finish invoking 'make' at Mon Jan 22 09:40:32 GMT 2018
[22-Jan-2018 09:40:32] [WARNING]    OUI-67200:Make failed to invoke "/usr/bin/make -f ins_srvm.mk install_srvm 
                                    ORACLE_HOME=/u01/app/12.2.0.1/grid"....'/bin/sh: /usr/bin/gcc: No such file or directory
                                    make: *** [/u01/app/12.2.0.1/grid/rdbms/lib/config.o] Error 127
                                    '
[22-Jan-2018 09:40:32] [INFO]       Stack Description: java.lang.RuntimeException: /bin/sh: /usr/bin/gcc: No such file or directory
                                    make: *** [/u01/app/12.2.0.1/grid/rdbms/lib/config.o] Error 127
[22-Jan-2018 09:40:32] [INFO]       StackTrace: oracle.opatch.MakeAction.apply(MakeAction.java:534)

This is actually quite simple: for some reason opatch wants to use gcc on node 2 – even though it didn’t require it on node 1, and I haven’t understood why. After installing gcc and its dependencies, I resumed opatchauto and finished patching successfully.
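
The fix amounted to something along these lines, run as root on the affected node – a sketch; opatchauto may ask for the session id if more than one patching session exists:

[root@rac122pri2 ~]# yum install -y gcc
[root@rac122pri2 ~]# /u01/app/12.2.0.1/grid/OPatch/opatchauto resume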

Just thought I’d pass this on in case you hit the same problem. Happy patching!