Tag Archives: ASM

Be aware of these environment variables in .bashrc et al.

This is a quick post about one of my pet peeves: statically setting environment variables in .bashrc or other shells’ equivalents. I have been bitten by this a number of times. Sometimes it’s my own code, as in this story.

Background

Many installation instructions for Oracle version x tell you to add variables to your shell session when you log in. What’s meant as a convenience can backfire. Sure, it’s nice to have ORACLE_HOME, ORACLE_SID, LD_LIBRARY_PATH, CLASSPATH etc. set automatically without having to find out about them the hard way. However, there are situations where this doesn’t help.

A few years back I migrated Clusterware from 10.2.0.4 to 11.2.0.1. It went pretty well for DEV, INT, UAT and DR, but not for PROD (d’oh!). Why not? It took a little while to figure out, but Oracle assumes that variables such as ORA_CRS_HOME are NOT set, whereas in my case they were. The full story is told on MOS: NETCA & ASMCA Fail during Upgrade of CRS/ASM to Grid Infrastructure 11gR2 (Doc ID 952925.1). Thankfully it was recoverable.

My case

I have a script on my lab servers which allows me to set the environment for a particular database, including ORA_NLS10, NLS_DATE_FORMAT, and the usual suspects such as ORACLE_HOME, ORACLE_SID etc. The bug I identified this morning was a copy and paste error for one of the databases: instead of pointing to the 11.2.0.3 home where I wanted to create another database, ORA_NLS10 pointed to the 11.2.0.1 home of the database I had previously worked on. I didn’t know that, though, when I performed the following steps:

DBCA crashes

I normally execute dbca in silent mode in a screen session for database creation. In this case it aborted silently, with no error output that I could make out. Checking $ORACLE_BASE/cfgtoollogs/ I noticed a trace file with the following contents:

Could not connect to ASM due to following error:
 ORA-03113: end-of-file on communication channel

Well, that’s no good. There wasn’t a database alert.log available to check, but there were entries in the ASM alert.log for the core dump:

Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x57598BCB] [PC:0xA635B06, lxcsu22m()+22]
 [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_3742.trc  (incident=281963):
ORA-07445: exception encountered: core dump [lxcsu22m()+22] [SIGSEGV] [ADDR:0x57598BCB]
 [PC:0xA635B06] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_281963/+ASM1_ora_3742_i281963.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

Checking the file it referenced didn’t give me much of a clue, and the call stack in the incident didn’t tell me much either. A quick search on MOS revealed that core dumps with lxcsu22m in the stack hint at invalid NLS settings. I didn’t recall having set any NLS parameters in my session, and then it dawned on me… Running env | grep -i nls revealed that the ORA_NLS10 path was pointing to the 11.2.0.1 home I had previously worked on. Unsetting these variables caused DBCA to complete without error.
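
For reference, the check and the fix boil down to something like this (a minimal sketch; the exact list of variables to unset depends on what your environment script sets):

$ env | grep -i nls        # shows ORA_NLS10 pointing at the old 11.2.0.1 home
$ unset ORA_NLS10          # clear the stale setting for the current session
$ env | grep -i nls        # verify nothing NLS-related is left behind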

Lesson learned

I have added a function to my environment script called cleanEnv() which explicitly unsets all environment variables before setting new ones. It is also called first thing when logging in to avoid this kind of situation. In addition, I assume that ORACLE_HOME is correct, check the other paths against it and raise an error if there are mismatches.
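
A minimal sketch of what that might look like in bash; the list of variables and the check are assumptions and should be adapted to whatever your environment script actually sets:

cleanEnv() {
    # remove everything a previous call may have left behind
    unset ORACLE_SID ORA_NLS10 NLS_DATE_FORMAT TNS_ADMIN
    unset LD_LIBRARY_PATH CLASSPATH ORA_CRS_HOME
}

checkEnv() {
    # ORACLE_HOME is assumed to be correct; everything else must live underneath it
    case "$ORA_NLS10" in
        ''|"$ORACLE_HOME"/*) : ;;
        *) echo "ERROR: ORA_NLS10 does not match ORACLE_HOME" >&2; return 1 ;;
    esac
}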

Build your own stretch cluster part V

This post is about the installation of Grid Infrastructure, and this is where it gets really exciting: the third NFS voting disk is going to be presented, and I am going to show you how simple it is to add it to the disk group chosen for OCR and voting disks.
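
To give you an idea of what is coming, the procedure boils down to something like the sketch below. The mount point, file name, failgroup and disk group names are my own assumptions, and the NFS mount options are only indicative; check Oracle’s white paper on NFS voting files for the exact set recommended for your platform.

# on each cluster node: mount the NFS export from the third site
mount -t nfs -o rw,bg,hard,intr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600 \
  filer03:/mnt/voting /voting_disk

# create a zero-padded file to act as the ASM disk (run once, from one node)
dd if=/dev/zero of=/voting_disk/vote_edc bs=1M count=500

# add it to the OCR/voting disk group as a quorum failure group
sqlplus / as sysasm <<EOF
-- assuming asm_diskstring already covers /voting_disk/*
alter diskgroup OCRVOTE add quorum failgroup nfsquorum disk '/voting_disk/vote_edc';
EOF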

Let’s start with the installation of Grid Infrastructure. This is really simple, and I won’t go into too much detail. Start by downloading the required file from MOS; a simple search for patch 10098816 should bring you to the download page for the 11.2.0.2 patch set for Linux, just make sure you select the 64bit version. The file we need right now is called p10098816_112020_Linux-x86-64_3of7.zip. The file names don’t necessarily relate to their contents; the readme helps you work out which piece of the puzzle provides which functionality.

I alluded to my software distribution method in one of the earlier posts, so here is the detail. My dom0 exports the /m directory to the 192.168.99.0/24 network, the one accessible to all my domUs. This really simplifies software deployments.
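
In case you want to replicate this, it comes down to one line in /etc/exports on the dom0 and a mount on each domU; the dom0 address and export options below are assumptions:

# /etc/exports on the dom0 - share /m read-only with the public network
/m   192.168.99.0/24(ro,sync,no_subtree_check)

# on the dom0, activate the export
exportfs -ra

# on each domU, mount it (assuming 192.168.99.1 is the dom0)
mkdir -p /m
mount -t nfs 192.168.99.1:/m /m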

Continue reading

Build your own 11.2.0.2 stretched RAC

Finally time for a new series! With the arrival of the new 11.2.0.2 patchset I thought it was about time to try and set up a virtual 11.2.0.2 extended distance or stretched RAC. So, it’s virtual, fair enough. It doesn’t allow me to test things like the impact of latency on the inter-SAN communication, but it did allow me to test the general setup. Think of this series as a guide for after all the tedious work has been done and the SANs happily talk to each other. The example requires some understanding of how Xen virtualisation works, and it’s tailored to openSuSE 11.2 as the dom0 or “host”. I have tried OracleVM in the past, but back then a domU (or virtual machine) could not mount an iSCSI target without a kernel panic and reboot. Clearly not what I needed at the time. OpenSuSE has another advantage: it uses a recent kernel, not the three year old 2.6.18 you find in enterprise distributions. Also, Xen is recent (openSuSE 11.3 even features Xen 4.0!) and so is libvirt.

The Setup

The general idea follows the design you find in the field, but with fewer cluster nodes. I am thinking of 2 nodes for the cluster and 2 iSCSI target providers. I wouldn’t use iSCSI in the real world, but my lab isn’t connected to an EVA or similar. A third site will provide quorum via an NFS-provided voting disk.

Site A will consist of filer01 for the storage part and edcnode1 as the RAC node. Site B will consist of filer02 and edcnode2. The iSCSI targets are going to be provided by openFiler domU installations, and the cluster nodes will make use of Oracle Enterprise Linux 5 update 5. To make it more realistic, site C will consist of another openFiler instance, filer03, to provide the NFS export for the 3rd voting disk. Note that openFiler seems to support NFS v3 only at the time of this writing. All systems are 64bit.

The network connectivity will go through 3 virtual switches, all “host only” on my dom0.

  • Public network: 192.168.99.0/24
  • Private network: 192.168.100.0/24
  • Storage network: 192.168.101.0/24

As in the real world, the private and storage networks have to be separated to prevent iSCSI packets from clashing with Cache Fusion traffic. I also increased the MTU for the private and storage networks to 9000 instead of the default 1500. If you’d like to use jumbo frames, check that your switch supports them.
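
Raising the MTU is a one-liner per interface; the interface names below are assumptions, and the ping test is a quick way to verify that jumbo frames really make it end to end:

# raise the MTU for the current session
ip link set dev eth1 mtu 9000     # private interconnect
ip link set dev eth2 mtu 9000     # storage network

# verify: 8972 bytes of payload plus IP/ICMP headers equals 9000
ping -M do -s 8972 192.168.100.2

# make it persistent with MTU='9000' in /etc/sysconfig/network/ifcfg-eth1 (openSuSE)
# or MTU=9000 in /etc/sysconfig/network-scripts/ifcfg-eth1 (OEL 5)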

Grid Infrastructure will use ASM to store OCR and voting disks, and the inter-SAN replication will also be performed by ASM in normal redundancy. I am planning on using preferred mirror read and intelligent data placement to see if they make a difference.
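
Preferred mirror read, for example, is just an instance parameter listing each site’s local failure group; the disk group and failure group names below are made up for illustration:

# run as the grid owner against the ASM instances (requires an spfile)
sqlplus / as sysasm <<EOF
alter system set asm_preferred_read_failure_groups='DATA.SITEA' scope=both sid='+ASM1';
alter system set asm_preferred_read_failure_groups='DATA.SITEB' scope=both sid='+ASM2';
EOF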

Known limitations

This setup has some limitations, such as the following ones:

  • You cannot test inter-site SAN connectivity problems
  • You cannot make use of udev for the ASM devices: a Xen domU doesn’t return anything from /sbin/scsi_id, which makes the mapping to /dev/mapper impossible (maybe someone knows a workaround?)
  • Network interfaces are not bonded; you would certainly use bonded NICs in real life
  • No “real” fibre channel connectivity between the cluster nodes

So much for the introduction; I’ll post the setup step by step. The intended series will consist of these articles:

  1. Introduction to XEN on openSuSE 11.2 and dom0 setup
  2. Introduction to openFiler and its installation as a virtual machine
  3. Setting up the cluster nodes
  4. Installing Grid Infrastructure 11.2.0.2
  5. Adding third voting disk on NFS
  6. Installing RDBMS binaries
  7. Creating a database

That’s it for today. I hope I’ve got you interested in following the series. It’s been real fun doing it; now it’s all about writing it up.

Server Pool experiments in RAC 11.2

I spent Wednesday at the UKOUG RAC & HA SIG and it was one of the best events I have ever attended. Great audience, and great feedback. One question raised during my presentation particularly interested me: server pools. I have now finally had the chance to experiment with this exciting new feature, and my findings are in this blog post. I’ll also see if the automatic assignment of nodes to pools works as advertised, but that’s for another post. This one already turned out to be a monster!
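
To whet your appetite, creating and inspecting a server pool comes down to a few commands; the pool name, limits and importance below are made up for illustration:

# create a server pool with at least one and at most two servers, importance 10
srvctl add srvpool -g testpool -l 1 -u 2 -i 10

# check which servers ended up in which pool
srvctl status srvpool -a
crsctl status serverpool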

Setup

I have installed a 3 node RAC cluster on Oracle Enterprise Linux 5 update 4 32bit to better understand server pools. I have read a lot about the subject, but as always, first-hand experience pays off more than just reading. My environment uses GPnP; essentially it is the environment I described in part 2 of my “build your own RAC 11.2 system” series, extended by another node. Continue reading

Upgrade ASM 10.2 to 11.2 single instance

This post describes how to upgrade an existing single instance ASM installation to the latest and greatest, Oracle 11.2. The most noteworthy change is that ASM is no longer installed from the RDBMS installer but as part of Grid Infrastructure.

Huh, installing CRS for a single instance? At first that sounded like a bit of an overkill, but Oracle left us with no choice. As you will see later, it’s not as bad as it seems; I have to say I rather like the Oracle Restart feature (see my previous blog post).

So, as always, start by downloading the necessary software, linux.x64_11gR2_grid.zip, and unzip it somewhere convenient.

Preparation

I recommend getting some metadata from the ASM instance just in case:


SQL> select name,state from v$asm_diskgroup;

NAME			       STATE
------------------------------ -----------
DATA			       MOUNTED
FRA			       MOUNTED
LOG			       MOUNTED

SQL> select name,path from v$asm_disk;

NAME                           PATH
------------------------------ -------------------
DATA1                          ORCL:DATA1
LOG1                           ORCL:LOG1
FRA1                           ORCL:FRA1

SQL> show parameter asm

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups			     string	 DATA, FRA, LOG
asm_diskstring			     string	 ORCL:*
asm_power_limit 		     integer	 1
SQL>

Continue reading

How to move OCR and voting disks into ASM

Moving OCR and voting disk into ASM is the Oracle Grid Infrastructure, i.e. 11.2, way of storing this essential metadata. During new installations this is quite simple since the work is done by OUI. When upgrading to 11.2 from a previous Oracle release, things look slightly different. Note that storing OCR and voting disk on raw or block devices, as we did up to 11.1, is no longer possible except for migrated systems.
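
In a nutshell, once a suitable ASM disk group exists, the move comes down to a handful of commands run as root from the new Grid Infrastructure home; the disk group name +DATA and the raw device name below are placeholders:

# add the OCR to ASM, then remove the old raw device location
ocrconfig -add +DATA
ocrconfig -delete /dev/raw/raw1

# move the voting disks in one go and verify the result
crsctl replace votedisk +DATA
crsctl query css votedisk
ocrcheck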

This is a transcript of some work I did to prove that it’s possible to move OCR and voting disk into ASM. It’s not been entirely simple though! Here’s my current setup after the migration of 10.2.0.4 -> 11.2.0.1.

Be warned though that this post is very technical and rather long; I decided to include the output of all commands just in case one of my readers comes across a similar problem and wants to find out if it is exactly the same. Continue reading

Upgrade Clusterware 10.2 to 11.2

This post is about upgrading Oracle Clusterware 10.2.0.4 on Oracle Enterprise Linux 5.4 x86-64. The setup is as follows:

  • 3 node RAC system, hostnames: racupgrade1, racupgrade2, racupgrade3
  • Oracle Clusterware 10.2.0.4 with CRS bundle patch 4 (patch 8436582)
  • Oracle ASM 10.2.0.4.1, i.e. PSU 10.2.0.4.1 (patch 8576156)
  • Oracle RDBMS 10.2.0.4.1
  • Additionally I patched OPatch and applied one-off patch 6079224 (“THE INSTANCE CAN NOT START DUE TO IPC SEND TIMEOUT”)

Please note that at the time of this writing CRS bundle patch 4 is superseded by Patch Set Update 2 (PSU 2) for CRS 10.2.0.4, and the latest RDBMS/ASM PSU is 10.2.0.4.3.

I use ASMLib for the ASM disks and initially bound CRS and OCR to raw devices (raw1 and raw2).
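
Before touching anything I like to record where OCR and voting disks currently live; a quick sketch of that inventory (the output will obviously differ on your system):

# as root, from the 10.2 CRS home
ocrcheck                          # OCR location(s) and integrity
crsctl query css votedisk         # list the voting disk(s)
raw -qa                           # confirm the raw device bindings
/etc/init.d/oracleasm listdisks   # ASMLib disks in use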

Continue reading

Let there be multipathing (when there wasn’t before)

One of the projects I was working on recently couldn’t be finished as it should have been due to a lack of switch ports in the SAN. The QA database servers therefore only had single pathing to the storage whereas the production kit (thank god!) had multipathing enabled and configured correctly.

This was far from ideal but couldn’t be helped, and it was also out of my control. You could argue that the project managers should have been aware of this and factored it in, but I digress. On top of that I had a strange situation where the 3 nodes mapped the LUNs differently: what was seen as /dev/sda on node 1 was /dev/sdf on the second node (but /dev/sda again on the third). The presentation of the LUNs to the database servers on the array was identical for each of the 3 nodes. How did I find out? I asked the storage admin to provide the WWIDs as seen on the management console and compared these against /dev/disk/by-id/. This only works with single pathing, as explained further down.
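
The comparison itself is straightforward; something along these lines (RHEL 5 style scsi_id invocation, device names will differ):

# WWIDs as the host sees them - compare against the list from the storage console
ls -l /dev/disk/by-id/ | grep -i scsi

# or query a single device directly
scsi_id -g -u -s /block/sda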

The Linux kernel documentation states that the enumeration of devices isn’t static, and unfortunately there is no direct link between controller, target, disk and slice as in Solaris. These are RHEL 5.3 64bit systems by the way, running Oracle 10.2.0.4.1 RAC.

There are 2 ways to get around this problem:

  • Metalink note 414897.1 describes how to set up udev for this purpose (and it includes 11.2!)
  • Use /dev/disk/by-id and /etc/rc.local (see the sketch after this list)
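
One way the rc.local approach can look; the WWID, names and ownership below are made-up examples:

# /etc/rc.local - give the ASM candidate disks stable names and the right ownership
ln -sf /dev/disk/by-id/scsi-360a98000486e5334524a6c4f63624f59 /dev/asm_data01
chown oracle:dba /dev/disk/by-id/scsi-360a98000486e5334524a6c4f63624f59
chmod 660 /dev/disk/by-id/scsi-360a98000486e5334524a6c4f63624f59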

Continue reading