I have come across an odd behaviour trying to patch an Oracle Restart environment to 12.1.0.2 January 2019. Based on a Twitter conversation this isn’t necessarily limited to my patch combination; there might be others as well. I used opatchauto to apply patch 28813884 to both the RDBMS and GRID homes, plus its corresponding OJVM (Java) patch. Before diving into the details, this is the environment I have been working with:
- Oracle Restart 12.1.0.2 with an earlier PSU/OJVM combo applied
- Separation of duties with oracle as the RDBMS owner, and grid owning the GRID infrastructure installation. This is a key point!
- Patches to be applied
- Patch 28813884 (GI PSU)
- Patch 28790654 (corresponding OJVM patch)
- OPatch version as installed in the GRID|RDBMS home
- opatchauto: 13.9.4.2.0
- opatch: 12.2.0.1.17
Following the instructions in the readme file I upgraded OPatch to the required version. The file I downloaded was named p6880880_121010_Linux-x86-64.zip. I double-checked the readme, and to me this is the correct file. After upgrading OPatch in the RDBMS and GRID homes, I started patching.
After this process completed, I wanted to start the database in UPGRADE mode as required by the OJVM patch. This has to be done via SQL*Plus, since srvctl’s “start database” command does not offer an “upgrade” option.
And that’s where it hit me: whenever I tried to open the database in UPGRADE mode, it threw an error:
```
SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 16 12:15:34 2019

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  2932632 bytes
Variable Size             713031784 bytes
Database Buffers          352321536 bytes
Redo Buffers                5455872 bytes
ORA-03113: end-of-file on communication channel
Process ID: 22354
Session ID: 14 Serial number: 41958
```
Looking at the alert log I can see that RBAL terminated the instance:
```
ALTER DATABASE MOUNT
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.113)(PORT=1521))' SCOPE=MEMORY SID='ORCL';
2019-07-16 12:15:43.404000 -04:00
NOTE: ASMB mounting group 1 (DATA)
WARNING: cellinit.ora is missing. RBAL is terminating the instance.
RBAL (ospid: 22332): terminating the instance due to error 27625
System state dump requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/ORCL/trace/ORCL_diag_22306_20190716121543.trc
Dumping diagnostic data in directory=[cdmp_20190716121543], requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
Instance terminated by RBAL, pid = 22332
2019-07-16 12:15:49.164000 -04:00
```
But since this is Oracle Restart, Clusterware will simply restart the database, unless of course you have changed the default behaviour. Funnily enough, the restart initiated by Clusterware works (see further down in the article as to why). While I’d appreciate this in most cases, the automatic restart isn’t appropriate in my situation: when started by Clusterware, the database is not in upgrade mode:
```
SQL> select status from v$instance;

STATUS
------------
OPEN
```
Which is a problem for me. A MOS search for error 27625 didn’t reveal anything too useful, and it took me quite a while to realise the problem had to do with permissions. An Internet search finally gave me the right answer: a fellow blogger pointed it out a little while ago.
So what exactly is the problem? The RDBMS “oracle” binary needs a specific set of permissions and ownership/group membership for Oracle Restart with separation of duties enabled. This is what it looked like before applying the patch:
```
$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:15 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
```
The file permissions are ‘6751’, with oracle owning the file while it belongs to the asmadmin group. Remember, I am using a separate user for Grid Infrastructure with its own specific operating system groups. After running opatch, this changed to:
```
$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle oinstall 324518992 Jul 16 12:29 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
```
The permissions remained the same, but the group changed from asmadmin to oinstall. Not quite what I had in mind, and it reproducibly causes instance crashes. There is a simple solution: make sure the permissions and group are set correctly. The MOS note “Database Creation on 11.2/12.1/12.2 Grid Infrastructure with Role Separation ( ORA-15025, KFSG-00312, ORA-15081 ) (Doc ID 1084186.1)” got me on the right track.
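For the record, the fix boils down to restoring the group and the setuid/setgid mode on the binary. A minimal sketch follows; the function name is mine, the asmadmin group and the 6751 mode follow the role-separated setup shown above, and changing the group requires appropriate privileges (root, or an owner who is a member of the target group).

```shell
#!/usr/bin/env bash
# Minimal sketch: put the ASM group and the 6751 mode back on the
# oracle binary after patching. Helper name is illustrative.
set -euo pipefail

fix_oracle_binary_perms() {
  local binary="$1"     # e.g. "$ORACLE_HOME/bin/oracle"
  local asm_group="$2"  # e.g. asmadmin

  chgrp "$asm_group" "$binary"
  chmod 6751 "$binary"  # -rwsr-s--x: setuid oracle, setgid to the ASM group
  ls -l "$binary"       # verify: should show -rwsr-s--x ... oracle asmadmin
}

# Typical call:
# fix_oracle_binary_perms "$ORACLE_HOME/bin/oracle" asmadmin
```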
```
$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:29 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 16 12:34:45 2019

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  2932632 bytes
Variable Size             713031784 bytes
Database Buffers          352321536 bytes
Redo Buffers                5455872 bytes
Database mounted.
Database opened.
SQL> select status from v$instance;

STATUS
------------
OPEN MIGRATE

SQL>
```
With the database in upgrade mode (I believe the “MIGRATE” status is a leftover from the 9i days), I am reassured that running datapatch (the OJVM part, actually) works as advertised.
By the way, Clusterware corrects the group permissions when you issue a “srvctl start database -db …” command, as documented in “Starting the database using srvctl changes the group setting for oracle binary (Doc ID 1508027.1)”. That didn’t solve my problem, though, as I can’t start the database in upgrade mode using srvctl.
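Since the group changes silently during patching, a quick post-patch check can catch the drift before the first startup attempt. Here is a sketch; the helper name and expected values are mine, and `stat -c` is GNU coreutils syntax.

```shell
#!/usr/bin/env bash
# Post-patch sanity check: compare the mode and group of the oracle
# binary against expected values. Helper name is illustrative.
set -euo pipefail

check_oracle_binary() {
  local binary="$1" want_group="$2" want_mode="$3"
  local mode group
  mode=$(stat -c %a "$binary")   # GNU stat: octal mode incl. setuid/setgid
  group=$(stat -c %G "$binary")
  if [ "$mode" = "$want_mode" ] && [ "$group" = "$want_group" ]; then
    echo "OK: $binary is $mode, group $group"
  else
    echo "DRIFT: $binary is $mode/$group, expected $want_mode/$want_group"
    return 1
  fi
}

# e.g. check_oracle_binary "$ORACLE_HOME/bin/oracle" asmadmin 6751
```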
Hope this helps you one day!