This is a quick post about one of my pet peeves: statically setting environment variables in .bashrc or other shells’ equivalents. I have been bitten by this a number of times. Sometimes it’s my own code, as in this story.
Many installation guides for Oracle version x tell you to set variables in your shell’s login profile so they are there whenever you log in. What’s meant well for convenience can backfire. Sure, it’s nice to have ORACLE_HOME, ORACLE_SID, LD_LIBRARY_PATH, CLASSPATH etc. set automatically without having to find out about them the hard way. However, there are situations where this does more harm than good.
A few years back I migrated Clusterware from 10.2.0.4 to 126.96.36.199. It went pretty well for DEV, INT, UAT, DR. But not for PROD (d’oh!). Why didn’t it? It took a little while to figure out, but the upgrade assumes that variables such as ORA_CRS_HOME are NOT set, and in my case ORA_CRS_HOME was. The full story is told on MOS: NETCA & ASMCA Fail during Upgrade of CRS/ASM to Grid Infrastructure 11gR2 (Doc ID 952925.1). Thankfully it was recoverable.
I have a script on my lab servers which allows me to set the environment for a particular database, including ORA_NLS10, NLS_DATE_FORMAT, and the usual suspects such as ORACLE_HOME, ORACLE_SID etc. The bug I identified this morning was a copy and paste error in the entry for one of the databases. Instead of pointing to the 188.8.131.52 home where I wanted to create another database, the ORA_NLS10 variable pointed to the home of the database I had previously worked on, an 184.108.40.206 home. I didn’t know that, though, when I performed the following steps:
I normally execute dbca in silent mode in a screen session for database creation. In this case it aborted silently, with no error output that I could make out. Checking $ORACLE_BASE/cfgtoollogs/, I noticed a trace file with the following contents:
Could not connect to ASM due to following error: ORA-03113: end-of-file on communication channel
Well, that’s no good. There wasn’t a database alert.log available to check, but there were entries in the ASM alert.log for the core dump:
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x57598BCB] [PC:0xA635B06, lxcsu22m()+22] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_3742.trc (incident=281963):
ORA-07445: exception encountered: core dump [lxcsu22m()+22] [SIGSEGV] [ADDR:0x57598BCB] [PC:0xA635B06] [Address not mapped to object]
Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_281963/+ASM1_ora_3742_i281963.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Checking the trace file it referenced didn’t give me much of a clue, and the call stack in the incident didn’t tell me much either. A quick search on MOS revealed that core dumps with lxcsu22m in the stack hint at invalid NLS settings. I didn’t recall having set NLS parameters in my session when it dawned on me… Using env | grep -i nls revealed that the ORA_NLS10 path was pointing to the 220.127.116.11 home I previously worked on. Unsetting it allowed DBCA to complete without error.
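The manual env | grep check can be turned into something repeatable. What follows is only a minimal bash sketch, not my actual script: the function name checkOracleEnv is made up for illustration, and it assumes that the variables you pass in (ORA_NLS10 by default) are supposed to point somewhere underneath ORACLE_HOME.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn about path variables that point outside ORACLE_HOME.
# Assumption: every variable passed in is expected to live under ORACLE_HOME.
checkOracleEnv() {
    local var value rc=0

    if [ -z "${ORACLE_HOME:-}" ]; then
        echo "ORACLE_HOME is not set" >&2
        return 2
    fi

    # Default to ORA_NLS10 if no variable names are given
    [ "$#" -eq 0 ] && set -- ORA_NLS10

    for var in "$@"; do
        value="${!var:-}"              # indirect expansion: value of the variable named in $var
        [ -z "$value" ] && continue    # unset or empty variables cannot be stale
        case "$value" in
            "$ORACLE_HOME"|"$ORACLE_HOME"/*)
                ;;                     # inside the current home, nothing to report
            *)
                echo "MISMATCH: $var=$value is outside ORACLE_HOME=$ORACLE_HOME" >&2
                rc=1
                ;;
        esac
    done
    return "$rc"
}
```

Calling checkOracleEnv before starting dbca would have returned a non-zero status and printed the offending ORA_NLS10 value, instead of leaving me to dig through ASM trace files.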
I have added a function called cleanEnv() to my environment script which explicitly unsets all the Oracle-related environment variables before new ones are set. It is also called first thing when logging in to avoid this kind of situation. The script also assumes that ORACLE_HOME is correct, checks the other paths against it, and raises an error if there are mismatches.
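For illustration, here is roughly what such a reset could look like in bash. Treat it as a sketch under assumptions rather than the real script: the list of variables is my guess at a reasonable starting set, and setOracleEnv is a hypothetical wrapper. It also goes one step further than checking paths and derives ORA_NLS10 and LD_LIBRARY_PATH from ORACLE_HOME, so there is nothing left to copy and paste wrongly.

```shell
#!/usr/bin/env bash
# Sketch of a cleanEnv-style reset. The variable list is an assumption;
# adjust it to whatever your own environment script sets.
cleanEnv() {
    local var
    for var in ORACLE_HOME ORACLE_SID ORACLE_BASE ORA_NLS10 ORA_CRS_HOME \
               NLS_DATE_FORMAT NLS_LANG LD_LIBRARY_PATH CLASSPATH TNS_ADMIN; do
        unset "$var"
    done
}

# Hypothetical wrapper: always start from a clean slate, then set the
# variables for exactly one home and SID.
setOracleEnv() {
    if [ "$#" -ne 2 ]; then
        echo "usage: setOracleEnv <oracle_home> <oracle_sid>" >&2
        return 1
    fi
    cleanEnv
    export ORACLE_HOME="$1"
    export ORACLE_SID="$2"
    # Derived from ORACLE_HOME so they can never point at a different home
    export ORA_NLS10="$ORACLE_HOME/nls/data"
    export LD_LIBRARY_PATH="$ORACLE_HOME/lib"
}
```

Note that PATH is deliberately left alone here, since unsetting it would break the session. Unsetting LD_LIBRARY_PATH and CLASSPATH wholesale assumes nothing other than Oracle relies on them, which holds on my lab servers but may not hold on yours.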