Martins Blog

Trying to explain complex things in simple terms

Connected to an idle instance-RAC 10.2.0.4.1

Posted by Martin Bach on October 16, 2009

I encountered a strange problem today with a 3 node cluster. Let’s start with the facts first:

  • RHEL 5.3 64bit
  • Oracle Clusterware 10.2.0.4 + bundle patch#4
  • Oracle ASM 10.2.0.4.1 (that is PSU 1)
  • ASMLib in use
  • Cluster members: nodea, nodeb, nodec (their real names are known to the author)

I was about to create a clustered ASM instance when it happened. I just completed dbca’s “configure automatic storage management” option which created the listeners (listener_<hostname>) on each host as well as the ASM instance itself. ASM was started on all cluster nodes:

[oracle@nodea bin]$ for i in a b c; do srvctl status asm -n node$i; done
ASM instance +ASM1 is running on node nodea.
ASM instance +ASM2 is running on node nodeb.
ASM instance +ASM3 is running on node nodec.

So far so good, then I decided to query +ASM1 to see if the disks are present (I have just finished a snapclone on the storage array).

[oracle@nodea~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Oct 6 15:22:51 2009

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL>

Pardon? This is the moment where I think it would be great if wordpress could play back my astonishment and surprise about the nonexistent instance. What’s going on there? I could clearly see the instance was up!

[oracle@nodea ~]$ ps -ef | grep smon
oracle   23013 21570  0 15:23 pts/1    00:00:00 grep smon
oracle   28834     1  0 10:34 ?        00:00:00 asm_smon_+ASM1

Whatever I tried, I couldn’t connect to the instance. Surprisingly, starting and stopping through srvctl worked just fine. I then thought that something was wrong with the profile of the ASM instance. But all looked normal, both in crs_stat -p and srvctl config asm -n nodea. I then wanted to connect to ASM via the listener. I quickly added a tnsnames.ora entry:

ASM1 =
 (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = nodea-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (UR = A)
      (SERVICE_NAME = +ASM)
      (INSTANCE_NAME = +ASM1)
   )
 )

Adding UR=A to the connect_data section is a trick bypass a listener which normally blocks incoming connection requests, especially for ASM. Now the listener didn’t want to play ball either.

ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

It got even more mysterious. I then checked if the listener was present as a process:

[oracle@nodeadbs]$ ps -ef | grep tns
oracle   14632 12235  0 Sep14 pts/6    00:00:00 vi tnsnames.ora
oracle   28655     1  0 10:34 ?        00:00:00 /u01/app/oracle/product/10.2.0/asm_1//bin/tnslsnr LISTENER_NODEA -inherit
oracle   28678 18161  0 10:34 pts/5    00:00:00 grep tns

Where does the double-slash come from after the $ORACLE_HOME I wonder? It’s not in the profile:

[oracle@nodea admin]$ /u01/crs/oracle/product/crs/bin/crs_stat -p ora.nodea.LISTENER_NODEA.lsnr
NAME=ora.nodea.LISTENER_NODEA.lsnr
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10.2.0/asm_1/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for listener on node
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=nodea
OPTIONAL_RESOURCES=
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.nodea.vip
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=600
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=

The great thing about Linux (and Unix) is that it gives you insight to an immense amount of troubleshooting information, especially in the /proc file system. /proc/<pid> gives you the environment variables the process was started with in the “environ” file. In my case, the ASM instance was started with PID 4680 (output formatted for readability)

[oracle@nodea ~]$ cat /proc/4680/environ
_USR_ORA_INST_NOT_SHUTDOWN=
_CAA_OPTIONAL_RESOURCES=
ORA_CRS_HOME=/u01/crs/oracle/product/crs/
SELINUX_INIT=YES
CONSOLE=/dev/console
_USR_ORA_LANG=
_CAA_FAILOVER_DELAY=0
TERM=linux
_CAA_UPTIME_THRESHOLD=7d
_CAA_STATE=:OFFLINE,
_USR_ORA_PRECONNECT=none
_USR_ORA_DEBUG=0
_CAA_NAME=ora.nodea.ASM1.asm
_CAA_REASON=user
_USR_ORA_OPEN_MODE=mount
_CAA_FAILURE_INTERVAL=0
_CAA_FAILURE_THRESHOLD=0
_USR_ORA_CONNECT_STR=/ as sysdba
_USR_ORA_FLAGS=LD_LIBRARY_PATH=/u01/app/oracle/product/10.2.0/asm_1//lib:/u01/crs/oracle/product/crs//lib
_USR_ORA_START_TIMEOUT=0
INIT_VERSION=sysvinit-2.86
_USR_ORA_SRV=
_USR_ORA_VIP=
_CAA_REQUIRED_RESOURCES=
_CAA_HOSTING_MEMBERS=nodea
_CAA_START_TIMEOUT=0
_CAA_ACTION_SCRIPT=/u01/app/oracle/product/10.2.0/asm_1/bin/racgwrap
_CAA_STOP_TIMEOUT=0
_CAA_AUTO_START=1
PATH=
_USR_ORA_NETMASK=
RUNLEVEL=3
_CAA_TARGET=:ONLINE,PWD=/u01/app/oracle/product/10.2.0/asm_1/bin
_USR_ORA_IF=
_USR_ORA_ALERT_NAME=
_USR_ORA_DISCONNECT=false
PREVLEVEL=N
_CAA_PLACEMENT=restricted
_USR_ORA_OPI=false
_CAA_DESCRIPTION=CRS application for ASM instance
HOME=/SHLVL=1
_CAA_SCRIPT_TIMEOUT=600
_CAA_CLIENT_LOCALE=
_USR_ORA_CHECK_TIMEOUT=0
LD_ASSUME_KERNEL=
_CAA_CHECK_INTERVAL=600
_CAA_ACTIVE_PLACEMENT=0
_CAA_TYPE=application
_USR_ORA_STOP_TIMEOUT=0
_USR_ORA_STOP_MODE=immediate
_USR_ORA_PFILE=
_CAA_RESTART_ATTEMPTS=5
ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm_1/
ORACLE_CONFIG_HOME=/u01/crs/oracle/product/crs/
ORACLE_SID=+ASM1
ORA_NET2_DESC=8,11
ORACLE_SPAWNED_PROCESS=1
SKGP_HIDDEN_ARGS=<FATAL/S/x0/xB/x0/x8B53435A/4651/4651/xA>0
[oracle@nodea ~]$

Among all these variables (ever wondered which effect environment variables have for debugging?) you can spot ORACLE_HOME, which has a slash appended to it. Actually, there are lots of paths with double-slashes – odd. I always though that wasn’t the case in CRS? So what about the listener’s environment variables?

[oracle@nodea ~]$ cat /proc/27625/environ
...ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm_1/...

I spotted that the path to ORACLE_HOME ended with a trailing slash, so there’s a pattern here.

A quick check on the other instances revealed that their listener/ASM instance don’t have a slash appended to $ORACLE_HOME. Where does this come from? The profile told us the the $ORACLE_HOME/bin/racgwrap script is used to start/stop the resource so I diff’d the first instance’s racgwrap script against the second instance’s and voila – no trailing slash. Actually, it was a trailing slash followed by a whitespace.

The logical steps was then to stop the ASM instance followed by stopping the listener. I then edited racgwrap and removed the trailing slash and white space after ORACLE_HOME. Did it make the difference?

[oracle@nodea dbs]$ srvctl start listener -n nodea
[oracle@nodea dbs]$ ps -ef | grep tns
oracle   14632 12235  0 Sep14 pts/6    00:00:00 vi tnsnames.ora
oracle   28655     1  0 10:34 ?        00:00:00 /u01/app/oracle/product/10.2.0/asm_1/bin/tnslsnr LISTENER_NODEA -inherit
oracle   28678 18161  0 10:34 pts/5    00:00:00 grep tns

Great – no more double slashes in the path. What about ASM:

[oracle@nodea dbs]$ srvctl start asm -n nodea
[oracle@nodea dbs]$ . oraenv
ORACLE_SID = [+ASM1] ?
[oracle@nodea dbs]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Oct 6 10:34:50 2009

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> exit

Phew, so all good now. Why that managed to creep in I don’t know at all, but I was glad I managed to fix it.

5 Responses to “Connected to an idle instance-RAC 10.2.0.4.1”

  1. dbametrix said

    Hi,

    Very interesting. Nice article and discussion.
    Thanks a lot for same.

    Regards,
    Gitesh Trivedi
    http://www.dbametrix.com

  2. Arul said

    Cool.. Really appreciate sharing this on the web. Had a same problem this morning. scratched my head for 30 minutes with no luck and got your blog.

    Thanks once again.

    One stupid question though … How to find the PID which started your ASM / DB instance?

    • Martin said

      Ahhh, let me try to remember… it’s been a while. I think I just got the PID from smon. All background processes are fork()ed from the same binary so should inherit the identical environment.

  3. […] and so did ps), I triple-checked the ORACLE_SID environment but that was not it. Luckily, I found this blog entry that deals with a very similar problem. Long story short: This is what I had put in my […]

  4. Yoav Givon said

    Friday noon .
    Got the exact phenomenon on my 11.2.0.3.0 ASM instance on EL5.5 .
    So mysterious as obviously the ASM is up and running . After struggling it with no success for 3 hours I’ve to quite for a coffee and BINGO , got your Blog after quick Google search . Will get back home to check your solution .
    In any case thanks a million , clearly you are a pro !!
    .
    Yoav

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: