Martins Blog

Trying to explain complex things in simple terms

Oracle Restart

Posted by Martin Bach on October 1, 2009

Oracle Restart is a new feature introduced in Clusterware 11.2. In simple terms, it allows you to register resources such as ASM disk groups and ASM and RDBMS instances in Clusterware, in a way very similar to what we did in RAC. The aim is to make custom startup scripts obsolete by starting all resources and their dependencies on a database server through CRS, or Grid Infrastructure as it’s called now.

A quick note as an introduction: CRS and Clusterware are used interchangeably in this article.

As you may already know, Oracle ASM and Clusterware are now bundled into the grid infrastructure installation zipfile. Even if you intend to run single instance Oracle based on ASM, you need to install grid infrastructure into a separate Oracle home in addition to the RDBMS software. The ASM administration option has been removed from dbca and moved into asmca, which actually looks quite nice and not so dated as dbca did and still does. But hey – I don’t use the assistants if I can avoid it and don’t worry too much about their 1990s look and feel (but OUI now looks a lot better in 11.2!). The installation of CRS, ASM and RDBMS into three different homes used to be best practice in 10.2; the installation of CRS and ASM into the same home is a deviation from the 10g theme.

So for single instance Oracle using ASM we need to install Grid Infrastructure first. The installer wants to put the binaries into /u01/app/oracle/product/11.2/grid with ORACLE_BASE set to /u01/app/oracle.

The installation is quite simple; usually a next-next-next approach leads to the desired result. The execution of root.sh takes a little longer as it creates the ASM instance as well. At the end of the installation, you’ll have Oracle Restart as well as ASM plus your diskgroups registered. Here’s the first difference to what we know from Oracle 10g: there is no need to run “$ORACLE_HOME/bin/localconfig add” as root to enable a trimmed down version of CSS to be started; it’s already done for you. And there is more visibility as well. The RAC admin will know most of these commands already, and generally there seems to be less of a difference between single instance and RAC administration when it comes to the tools employed. The exception of course is that this time we execute the commands on a “single instance” system. For example, after the installation finished, these were the resources registered on my system, called devbox001.

[oracle@devbox001 ~]$ crsctl status resource -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       devbox001
ora.DATA.dg
               ONLINE  ONLINE       devbox001
ora.REDO.dg
               ONLINE  ONLINE       devbox001
ora.FRA.dg
               ONLINE  ONLINE       devbox001
ora.asm
               ONLINE  ONLINE       devbox001            Started
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       devbox001
ora.diskmon
      1        ONLINE  ONLINE       devbox001
ora.dev1.db
      1        ONLINE  ONLINE       devbox001            Open
ora.dev2.db
      1        ONLINE  ONLINE       devbox001            Open
ora.dev3.db
      1        ONLINE  ONLINE       devbox001            Open
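
The tabular output is nice to read but awkward to script against, because each resource name sits on its own line above its state. A minimal sketch of how to flatten it, assuming the layout shown above; the function name crs_states is my own and not part of any Oracle tool:

```shell
# Flatten "crsctl status resource -t" output into "name target state" lines.
# Assumes the 11.2 layout shown above: a resource name starting with "ora."
# followed by one or more indented lines holding TARGET and STATE.
crs_states() {
    awk '
        # A resource name line: remember it and move on.
        /^ora\./ { name = $1; next }
        # An indented detail line belonging to the last resource name.
        name != "" && /^[ \t]/ && NF >= 2 {
            # Cluster resources carry a leading instance number; skip it.
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1
            print name, $i, $(i + 1)
        }
    '
}

# Possible use on a live system (not executed here):
#   crsctl status resource -t | crs_states | awk '$2 != $3'
```

The commented pipeline at the end would list only those resources whose state differs from their target, which is usually what you want to know after a reboot.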

The old crs_stat command is deprecated in 11.2, but still works:

[oracle@rhel5 rdbms]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    rhel5       
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rhel5       
ora.asm        ora.asm.type   ONLINE    ONLINE    rhel5       
ora.cssd       ora.cssd.type  ONLINE    ONLINE    rhel5       
ora.diskmon    ora....on.type ONLINE    ONLINE    rhel5

The first surprise here was the diskgroup resource, “DATA”. This is a useful addition: in previous releases up to 11.1 you had to manually set the ASM initialisation parameter asm_diskgroups to specify all the disk groups ASM should automatically mount. Now in 11.2 this parameter is actually unset:

[oracle@rhel5 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.1.0 Production on Mon Sep 28 19:53:14 2009

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Automatic Storage Management option

SQL> show parameter diskgr

NAME                     TYPE     VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                 string

A quick query against v$asm_diskgroup_stat reveals that the diskgroup DATA is mounted nevertheless.

SQL> select name, state from v$asm_diskgroup_stat;

NAME                   STATE
------------------------------ -----------
DATA                   MOUNTED

From now on you can’t stop ASM without stopping the diskgroup first (well, you could use the force option).

Also note that you don’t set a direct dependency between asm and the database (instance) as in 10g/11.1 anymore. You now reference the diskgroup(s):

[oracle@rhel5 ~]$ srvctl add database -h

Adds a database configuration to be managed by Oracle Restart.

Usage: srvctl add database -d <db_unique_name> -o <oracle_home> [-m <domain_name>] [-p <spfile>]
[-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY}] [-s <start_options>]
[-t <stop_options>] [-n <db_name>] [-y {AUTOMATIC | MANUAL}] [-a "<diskgroup_list>"]
    -d <db_unique_name>      Unique name for the database
    -o <oracle_home>         ORACLE_HOME path
    -m <domain>              Domain for database. Must be set if database has DB_DOMAIN set.
    -p <spfile>              Server parameter file path
    -r <role>                Role of the database (primary, physical_standby, logical_standby,
                             snapshot_standby)
    -s <start_options>       Startup options for the database. Examples of startup options are open,
                             mount, or nomount.
    -t <stop_options>        Stop options for the database. Examples of shutdown options are normal,
                             transactional, immediate, or abort.
    -n <db_name>             Database name (DB_NAME), if different from the unique name given by the
                             -d option
    -y <dbpolicy>            Management policy for the database (AUTOMATIC or MANUAL)
    -a "<diskgroup_list>"    Comma separated list of disk groups
    -h                       Print usage

If you wondered which of these to set – this is an example from dbca:

$> srvctl add database -d orcl -o /u01/app/oracle/product/11.2.0/dbhome_1 -p '+DATA/orcl/spfileorcl.ora' \
    -n orcl -a DATA

Administering

From now on, refrain from using sqlplus to start or stop the instance; use srvctl instead, for example srvctl start database -d dbname and srvctl stop database -d dbname. The -o option allows you to pass options such as abort, immediate, mount or nomount.
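
If you want to wrap these calls in your own scripts, something along these lines might do. It is only a sketch: db_ctl and the DRY_RUN switch are names I made up, and the only srvctl syntax used is the start/stop database form with -d and -o mentioned above.

```shell
# Build and run an srvctl start/stop command for one database.
# With DRY_RUN=1 the command is only printed, so it can be checked first.
db_ctl() {
    dc_action=${1:-}
    dc_db=${2:-}
    dc_opt=${3:-}
    case $dc_action in
        start|stop) ;;
        *) echo "usage: db_ctl start|stop <dbname> [option]" >&2; return 2 ;;
    esac
    if [ -z "$dc_db" ]; then
        echo "usage: db_ctl start|stop <dbname> [option]" >&2
        return 2
    fi
    # Assemble the command in the positional parameters of this function.
    set -- srvctl "$dc_action" database -d "$dc_db"
    if [ -n "$dc_opt" ]; then
        set -- "$@" -o "$dc_opt"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

With DRY_RUN=1 set, db_ctl stop dev1 immediate prints srvctl stop database -d dev1 -o immediate instead of running it.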

The handling of services has also seen improvement. Instead of calling dbms_service or setting service_names (in single instance only!), we can now use srvctl add service -d <dbname> -s <servicename>. Again, RAC and single instance Oracle are a little more unified. Here are the options:

Usage: srvctl add service -d <db_unique_name> -s <service_name> [-l [PRIMARY][,PHYSICAL_STANDBY]
[,LOGICAL_STANDBY][,SNAPSHOT_STANDBY]] [-y {AUTOMATIC | MANUAL}][-q {true|false}] [-j {SHORT|LONG}]
 [-B {NONE|SERVICE_TIME|THROUGHPUT}][-e {NONE|SESSION|SELECT}] [-m {NONE|BASIC}][-z <failover_retries>]
 [-w <failover_delay>]
    -d <db_unique_name>      Unique name for the database
    -s <service>             Service name
    -l <role>                Role of the service (primary, physical_standby, logical_standby,
                             snapshot_standby)
    -y <policy>              Management policy for the service (AUTOMATIC or MANUAL)
    -e <Failover type>       Failover type (NONE, SESSION, or SELECT)
    -m <Failover method>     Failover method (NONE or BASIC)
    -w <integer>             Failover delay
    -z <integer>             Failover retries
    -j <clb_goal>  Connection Load Balancing Goal (SHORT or LONG). Default is LONG.
    -B <Runtime Load Balancing Goal>     Runtime Load Balancing Goal (SERVICE_TIME, THROUGHPUT, or NONE)
    -q <AQ HA notifications> AQ HA notifications (TRUE or FALSE)
    -h                       Print usage

Now how cool is that! We can have services depending on database role! No more messing around with startup triggers to set services based on a database role. No more “Oracle startup or shutdown in progress” when connecting to the wrong host after a switchover/failover operation. Nice.
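
To illustrate the role option, here is a small sketch that checks a role list against the values from the usage text above and prints the matching srvctl add service command. The function name role_service_cmd and the database and service names in the example are made up; the function only prints the command, it does not run it.

```shell
# Print an "srvctl add service" command for a role-dependent service.
# The accepted roles are the ones listed for -l in the usage text above.
role_service_cmd() {
    rs_db=${1:-}
    rs_svc=${2:-}
    rs_roles=${3:-}
    if [ -z "$rs_db" ] || [ -z "$rs_svc" ] || [ -z "$rs_roles" ]; then
        echo "usage: role_service_cmd <db_unique_name> <service> <role[,role...]>" >&2
        return 2
    fi
    # Split the comma separated role list and validate each entry.
    rs_old_ifs=$IFS
    IFS=,
    for rs_role in $rs_roles; do
        case $rs_role in
            PRIMARY|PHYSICAL_STANDBY|LOGICAL_STANDBY|SNAPSHOT_STANDBY) ;;
            *) IFS=$rs_old_ifs; echo "unknown role: $rs_role" >&2; return 1 ;;
        esac
    done
    IFS=$rs_old_ifs
    echo "srvctl add service -d $rs_db -s $rs_svc -l $rs_roles"
}
```

For example, role_service_cmd dev1 reporting PHYSICAL_STANDBY prints a command that would register a service meant to run only while dev1 is a physical standby.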

Some of the options, such as the CLB goals, don’t really make sense in single instance Oracle though…

If you are unsure about the configuration of a component, use srvctl config <component> -a to display information. For a database, that’s srvctl config database -d <dbname> -a. The same works for ASM as well.

Troubleshooting and logs

For the following sections it is useful to remember that Oracle Restart is called OHAS internally – it makes finding logs a lot easier.

ASM logs and traces are stored in the diagnostic_dest directory; there is no difference to an 11.1 ASM instance here. I’d recommend that you get used to using adrci; personally I find it a lot easier than navigating the file system. And finally there is a unified way to access the alert log! If you are working for a managed service provider then this is a godsend!

The logging for CRS is not yet integrated into the new diagnostic framework (why not?), so you’ll have to dig out the important information yourself. Important log file locations are:

$ORACLE_HOME/log/<hostname>/ohasd/ohasd.log

– initialisation of the ohasd process
– reading of resources
– setting ACL per resource
– spawning of the agent processes (oraagent.bin, cssdagent, orarootagent.bin)

$ORACLE_HOME/log/<hostname>/agent/ohasd/oraagent_oracle/oraagent_oracle.log
$ORACLE_HOME/log/<hostname>/agent/ohasd/orarootagent_oracle/orarootagent_oracle.log
$ORACLE_HOME/log/<hostname>/agent/ohasd/oracssdagent_oracle/oracssdagent_oracle.log
$ORACLE_HOME/log/<hostname>/diskmon/client.log
$ORACLE_HOME/log/<hostname>/diskmon/diskmon.log
$ORACLE_HOME/log/<hostname>/cssd/ocssd.log
$ORACLE_HOME/log/<hostname>/cssd/cssdOUT.log
$ORACLE_HOME/log/<hostname>/client/crsctl.log
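
Since these files are scattered over several directories, a small helper that tells you which of them exist on a given box can save some typing. A sketch, assuming the relative paths listed above; ohas_logs is a name I made up, and the first argument is the Grid Infrastructure home, the second the hostname:

```shell
# List the OHAS related log files below <grid_home>/log/<hostname> and
# report for each one whether it exists.
ohas_logs() {
    ol_base="$1/log/$2"
    for ol_rel in \
        ohasd/ohasd.log \
        agent/ohasd/oraagent_oracle/oraagent_oracle.log \
        agent/ohasd/orarootagent_oracle/orarootagent_oracle.log \
        agent/ohasd/oracssdagent_oracle/oracssdagent_oracle.log \
        diskmon/client.log \
        diskmon/diskmon.log \
        cssd/ocssd.log \
        cssd/cssdOUT.log \
        client/crsctl.log
    do
        if [ -f "$ol_base/$ol_rel" ]; then
            echo "found    $ol_base/$ol_rel"
        else
            echo "missing  $ol_base/$ol_rel"
        fi
    done
}

# Possible use (not executed here):
#   ohas_logs "$ORACLE_HOME" "$(hostname -s)"
```

The agent directory names contain the name of the OS user, so they may well differ on your system; adjust the list accordingly.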

The information the agent reads at startup is stored in the OLR, which seems to be a bit like the OCR. You find the location of the OLR in /etc/oracle/olr.loc, but there is still a pointer to the OCR in /etc/oracle/ocr.loc. Usually they are in $CRS_HOME/cdata/hostname/. It seems though that the OCR is not used at all, as it hadn’t been touched since I installed the grid infrastructure home.

It seems that there is one backup of the OLR taken, as shown by ocrconfig -local -showbackup. Omitting the “-local” option will result in an error message. I have no idea if that is intentional or not. Taking manual backups fails as well, but ocrconfig doesn’t even report the actual error:

[oracle@rhel5 rhel5]$ ocrconfig -local -showbackup 

rhel5     2009/09/17 08:29:25     /u01/app/oracle/product/11.2.0/grid/cdata/rhel5/backup_20090917_082925.olr
[oracle@rhel5 rhel5]$ ocrconfig -local -manualbackup
PROTL-23: Message 23 not found;  product=srvm; facility=PROTL

[oracle@rhel5 rhel5]$ oerr protl 23
[oracle@rhel5 rhel5]$
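
If you want to check from a script that the automatic backup is really there, the -showbackup output can be reduced to just the file names. A minimal sketch, assuming the four column layout shown above (host, date, time, path); olr_backup_paths is my own name:

```shell
# Extract the backup file paths from "ocrconfig -local -showbackup" output.
# Assumes each backup line has four fields and the path ends in ".olr".
olr_backup_paths() {
    awk 'NF == 4 && $4 ~ /\.olr$/ { print $4 }'
}

# Possible use on a live system (not executed here):
#   ocrconfig -local -showbackup | olr_backup_paths |
#       while read -r f; do ls -l "$f"; done
```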

Maybe this is like in 10.1 when crsctl wasn’t all that useful and we can expect new functionality to be added in due course.

This is the end of part I, the next part will deal with setting up and using FAN events for automatic client failover.

Update – bug 9084067

Following the oracle-l mailing list I have come across a ulimit problem, described in the following Metalink note:

11gR2 Oracle Restart Does not Use ULIMIT Setting Appropriately [ID 983715.1]

In essence, when the instance is started through srvctl after a node reboot, huge pages aren’t used and the database alert log displays a warning that it is running on a system with a low open file descriptor limit. All the prerequisites for ulimits were implemented for the oracle account, and starting the database through sqlplus as oracle is not a problem. srvctl however hands off control to the root account to start the database, and for this account the limits are different. Check the note for a fix to the problem with ohasd.
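
To see whether you are affected, you can compare the limit of your oracle login shell with the limit the running instance actually got. On Linux the latter is visible in /proc/<pid>/limits. A sketch, assuming the usual format of that file; soft_nofile is my own helper, and the process name ora_pmon_dev1 in the comment is just an example for an instance called dev1:

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# The line looks like: "Max open files   1024   65536   files"
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Possible use on a live system (not executed here):
#   pid=$(pgrep -f ora_pmon_dev1)
#   echo "instance: $(soft_nofile /proc/$pid/limits)  shell: $(ulimit -n)"
```

If the instance shows a lower value than the oracle shell after a reboot, but the same value after a manual start through sqlplus, you are most likely looking at the problem described in the note.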

17 Responses to “Oracle Restart”

  1. Really good article! Thanks!

  2. k1 said

    Thanks for this article, it put me onto the path to resolve an issue I was having. In order to share the love, I thought I’d post my problem and solution here as well.

    “Look ma, no dbora!”

    I’m setting up 11gR2 using ASM and HA services to support an SE1 database. Oracle Restart seems like a great feature, but the online docs are relatively sparse, and I couldn’t find anything describing how to configure the system to replace the time-honored init.d/dbora setup.

    Following a pretty standard install of grid infrastructure and database home, most of the stack would come up correctly: has, dg, asm, and listener all started up happily. My database instance, though registered, would not start automatically. And I couldn’t figure out why, because it was configured with “-y AUTOMATIC”.

    Digging around with srvctl and crsctl (this article helped immensely) showed that my resource configuration had attribute AUTO_START=restore. Yet more google-bashing turned up the fact that these attributes used to be numeric but are now alpha: ‘never’, ‘restore’, and ‘always’. The following command changed things to start up database MYDB automagically:

    crsctl modify resource ora.mydb.db -attr "AUTO_START=always"

    I’m running single-instance, so this may not suit a RAC environment, for instance. However, my linux server now cleanly starts up and shuts down the Oracle stack automatically, in appropriate order.

    • TGASCARD said

      Thanks, it helped me.
      It’s true, Oracle Restart is not well documented.
      Next time I will install Grid Infrastructure for a cluster, but on just one node. I think it’s permitted.

      • Martin said

        It certainly is. But bear in mind that if you do so, and install a database as well your database will use additional code paths it wouldn’t use if it were single instance. You might hit a performance penalty.

      • TGASCARD said

        Martin;

        Why did you write about additional code paths for the database? It’s possible to install Clusterware for just one node with a single instance (not a RAC one).
        I’ll be able to create additional resources, like an application VIP, and be able to load the ACFS drivers. Don’t you agree?
        Thanks.

      • Martin said

        Hi!

        This would be a different setup-and the database would have to go through additional code paths even though it only does single instance work. Or am I missing your point? Oracle Restart is specifically designed for the scenario I outlined in the post, other technology is of course suitable for other problems to be solved.

        Regards,

        Martin

  3. TGASCARD said

    Hi Martin,

    Thanks for the reply. I am trying to find out which is the better solution for a non-RAC database: Oracle Restart, or Oracle Clusterware on one server.

  4. Kumar Madduri said

    Hi Martin,
    Have you done any work related to cloning of an oracle restart clusterware installation.
    The clusterware admin and deploy guide (chapter 4) has information about how to clone clusterware but that seems to be relevant to multiple nodes as when I try to use it, the config.sh asks me for scan name which is not relevant for this oracle restart.

    Thank you for your time
    Kumar

    • Martin said

      Hi Kumar,

      I’m sorry I haven’t worked with cloning of Grid Infrastructure (Oracle Restart/cluster) yet, but it’s on my to-do list. When I get some time I’ll definitely explore this in more detail so stay tuned :)

      Martin

  5. Yasser said

    “This is the end of part I, the next part will deal with setting up and using FAN events for automatic client failover”

    Eagerly waiting for your next part on setting and using FAN events for automatic client failover.

    • Martin Bach said

      Sorry, still did not have the time. I will try to put this on my to do list, but currently I’m absorbed by Exadata.

  6. nafey1 said

    Martin,
    Is Oracle Restart any different from Oracle RAC One Node, or are the two terms used interchangeably for the same thing?

    • Martin Bach said

      Hi,

      it is different. Oracle Restart is installed as a standalone server, you don’t need a private Interconnect, or a SCAN. Oracle Restart is the only way to get ASM in single instance deployments.

      Martin

  7. Sanjay said

    Hi Martin – Did you get resolution for this error –
    PROTL-23: Message 23 not found; product=srvm; facility=PROTL

    I am trying “ocrconfig -local -manualbackup” and getting same error message.

    thanks

    • Martin Bach said

      Hi Sanjay,

      I’d have a look at the client log file in $GRID_HOME/log/hostname/client. The file is usually the last one in that directory. What’s in it?

      Martin

      • Sanjay said

        here is from that log file. I issued the command as root.

        Oracle Database 11g Clusterware Release 11.2.0.2.0 – Production Copyright 1996, 2010 Oracle. All rights reserved.
        2012-10-16 11:33:16.955: [ OCRCONF][2418123568]ocrconfig starts…
        2012-10-16 11:33:16.962: [ OCRCLI][2418123568]proac_backup: Failed. Retval [5]
        2012-10-16 11:33:16.962: [ OCRCONF][2418123568]Failure in performing OCR/OLR backup [5] [PROCL-5: User does not have permission to perform a local registry operation on this key.]
        2012-10-16 11:33:16.962: [ OCRCONF][2418123568]Exiting [status=failed]…

      • Martin Bach said

        -> User does not have permission to perform a local registry operation on this key

        This is odd-I’d open a SR with Oracle in this case, there might be logical corruption in your OLR.

        Martin
