Martins Blog

Trying to explain complex things in simple terms

Installing RAC 11.2.0.2 on Solaris 10/09 x64

Posted by Martin Bach on November 12, 2010

One of the major adventures this time of the year involves installing RAC 11.2.0.2 on Solaris 10 10/09 x86-64. The system setup included EMC Power Path 5.3 as the multipathing solution to shared storage.

I initially asked for 4 BL685 G6 with 24 cores, but in the end “only” got two, which is still plenty of resources to experiment with. I especially like the output of this command:

$ /usr/sbin/psrinfo | wc -l
 24

Nice! Actually, it’s 4 Opteron processors:

$ /usr/sbin/prtdiag | less
System Configuration: HP ProLiant BL685c G6
 BIOS Configuration: HP A17 12/09/2009
 BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
==== Processor Sockets ====================================
Version                          Location Tag
 -------------------------------- --------------------------
 Opteron                          Proc 1
 Opteron                          Proc 2
 Opteron                          Proc 3
 Opteron                          Proc 4

So much for the equipment. The operating system showed 4 NICs, all called bnxeN where N ran from 0 through 3. The first interface, bnxe0, will be used for the public network. The second NIC is to be ignored, and the final 2, bnxe2 and bnxe3, will be used for the highly available cluster interconnect feature. This way I can avoid SFRAC, which inevitably would have meant a clustered Veritas file system instead of ASM.

One interesting point to note is that Oracle MOS document 1210883.1 specifies that the interfaces for the private interconnect are on the same subnet. So node1 will use 192.168.0.1 for bnxe2 and 192.168.0.2 for bnxe3. Similarly, node2 uses 192.168.0.3 for bnxe2 and 192.168.0.4 for bnxe3. The Oracle example is actually a bit more complicated than it could have been, as they use a /25 subnet mask. But ipcalc confirms that the addresses they use are all well within the subnet:

 Address:   10.1.0.128            00001010.00000001.00000000.1 0000000
 Netmask:   255.255.255.128 = 25  11111111.11111111.11111111.1 0000000
 Wildcard:  0.0.0.127             00000000.00000000.00000000.0 1111111
 =>
 Network:   10.1.0.128/25         00001010.00000001.00000000.1 0000000 (Class A)
 Broadcast: 10.1.0.255            00001010.00000001.00000000.1 1111111
 HostMin:   10.1.0.129            00001010.00000001.00000000.1 0000001
 HostMax:   10.1.0.254            00001010.00000001.00000000.1 1111110
 Hosts/Net: 126                   (Private Internet)

This setup will have some interesting implications which I’ll describe a little later.
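The subnet membership that ipcalc reports above can also be checked with plain shell arithmetic. The following is only a minimal sketch: the function names ip_to_int and in_subnet are made up for this illustration, and it assumes bash or another POSIX shell (the legacy Solaris /bin/sh has no $(( )) arithmetic).

```shell
# Convert a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies in the network $2 with prefix length $3.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    # 4294967295 is 0xFFFFFFFF; shift in zeros for the host bits
    mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Check a few addresses against the 10.1.0.128/25 network from the ipcalc output
for ip in 10.1.0.129 10.1.0.254 10.1.1.1; do
    if in_subnet "$ip" 10.1.0.128 25; then
        echo "$ip: inside"
    else
        echo "$ip: outside"
    fi
done
```

The same function can be pointed at the interconnect addresses once the netmask actually configured on bnxe2 and bnxe3 is known.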

Part of the test was to find out how mature the port to Solaris on Intel was. So I decided to start off by installing Grid Infrastructure on node 1 first, and extend the cluster to node2 using the addNode.sh script in $ORACLE_HOME/oui/bin.

The installation uses 2 different accounts to store the Grid Infrastructure binaries separately from the RDBMS binaries. Operating system accounts are oragrid and oracle.

Oracle: uid=501(oracle) gid=30275(oinstall) groups=309(dba),2046(asmdba),2047(asmadmin)
OraGrid: uid=502(oragrid) gid=30275(oinstall) groups=309(dba),2046(asmdba),2047(asmadmin)

I started off by downloading files 1,2 and 3 of patch 10098816 for my platform. The ratio of downloads of this patch was 243 to 751 between x64 and SPARC. So not a massive uptake of this patchset for Solaris it would seem.

As the oragrid user I set up user equivalence with RSA and DSA ssh keys. A little utility will do this for you now, but I’m old-school, so I created the keys and exchanged them between the hosts myself. Not too bad a task on only 2 nodes.
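For the manual key exchange, the fiddly part is appending each public key to authorized_keys exactly once. Below is a small sketch of a helper for that; add_authorized_key is a hypothetical name, not part of any Oracle or OpenSSH tooling, and the file locations are whatever you pass in.

```shell
# Append a public key to an authorized_keys file unless it is already present.
# $1 = public key file, $2 = authorized_keys file
# Note: on Solaris 10 the -F/-x options need /usr/xpg4/bin/grep.
add_authorized_key() {
    key=$(cat "$1") || return 1
    mkdir -p "$(dirname "$2")"
    touch "$2"
    if ! grep -F -x "$key" "$2" >/dev/null 2>&1; then
        printf '%s\n' "$key" >> "$2"
    fi
    # sshd refuses key files that are writable by group or others
    chmod 600 "$2"
}
```

The keys themselves come from "ssh-keygen -t rsa" and "ssh-keygen -t dsa" run as oragrid on each node; copy the resulting .pub files to the other host and feed them to the helper there.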

The next step was to find out about the shared storage, and I freely admit that took me a little while. I hadn’t used the EMC Power Path multipathing software before and found it difficult to approach, mainly for lack of information about it. Maybe I just didn’t find it, but device-mapper-multipath, for instance, is easier to understand. The fact that this was Solaris Intel made it a little more complex still.

First I needed to know what the device names actually mean. As on Solaris SPARC, /dev/dsk lists the block devices and /dev/rdsk lists the raw devices, so the latter is where I was heading. Next I checked the devices, emcpower0a to emcpower9a, and in the course of the installation I found out how to deal with them. On Solaris Intel you first have to create a partition on the LUN before it can be dealt with in the SPARC way. So for each device you would like to use, run fdisk against the emcpowerxp0 device, i.e.

# fdisk /dev/rdsk/emcpower0p0

If there is no partition, simply say “y” to the question if you want to use all of it for Solaris and exit fdisk. Otherwise, delete the existing partition (AFTER HAVING double/triple CHECKED THAT IT’S REALLY NOT NEEDED!) and create a new one of type “Solaris2”. It didn’t seem necessary to make it active.

Here’s a sample session:

bash-3.00# fdisk /dev/rdsk/emcpower0p0
No fdisk table exists. The default partition for the disk is:
a 100% "SOLARIS System" partition
Type "y" to accept the default partition,  otherwise type "n" to edit the partition table.
Y

Now let’s check the result:

bash-3.00# fdisk /dev/rdsk/emcpower0p0
Total disk size is 1023 cylinders
Cylinder size is 2048 (512 byte) blocks
Cylinders
Partition   Status    Type          Start   End   Length    %
=========   ======    ============  =====   ===   ======   ===
1       Active    Solaris2          1  1022    1022    100 
SELECT ONE OF THE FOLLOWING:
1. Create a partition
2. Specify the active partition
3. Delete a partition
4. Change between Solaris and Solaris2 Partition IDs
5. Exit (update disk configuration and exit)
6. Cancel (exit without updating disk configuration)
Enter Selection: 6

bash-3.00#

This particular device will be used for my OCRVOTE disk group, which is why it’s only 1G. The next step is identical to SPARC: start the format tool, select partition, change the fourth partition to use the whole disk (with an offset of 3 cylinders at the beginning of the slice) and label it. With that done, exit the format application.

This takes me back to the discussion of the emcpower device names. The letters [a-p] refer to the slices of the device, while the p in names such as emcpower0p0 stands for the partition. /dev/emcpowernc is slice 2 of the nth multipathed device, in other words the whole disk. I usually create a slice 4, which translates to emcpowerne.

After having completed the disk initialisation, I had to ensure that the devices I was working on were really shared. Unfortunately the emcpower devices are not consistently named across the cluster. What is emcpower0a on node1 turned out to be emcpower2a on the second node. How to check? The powermt tool to the rescue. Similar to “multipath -ll” on Linux, the powermt command can show the underlying disks which are aggregated under the emcpowern pseudo device. So I wanted to know if my device /dev/rdsk/emcpower0e was shared. What I was really interested in was the native device:

# powermt display dev=emcpower0a | awk \
 > '/c[0-9]t/ {print $3}'
 c1t50000974C00A611Cd6s0
 c2t50000974C00A6118d6s0

Well, does that exist on the other node?

# powermt display dev=all | /usr/sfw/bin/ggrep -B8  c1t50000974C00A611Cd6s0
Pseudo name=emcpower3a
Symmetrix ID=000294900664
Logical device ID=0468
state=alive; policy=SymmOpt; priority=0; queued-IOs=0;
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --  -- Stats ---
###  HW Path               I/O Paths    Interf.   Mode    State   Q-IOs Errors
==============================================================================
3072 pci@39,0/pci1166,142@12/pci103c,1708@0/fp@0,0 c1t50000974C00A611Cd6s0 FA  8eA   active  alive       0      0 

So yes, it was there. Cool! I checked the 2 other OCR/voting disk LUNs and they were shareable as well. The final piece was to change the ownership of the devices to oragrid:asmdba and permissions to 0660.
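Grepping for one native device at a time gets tedious with more LUNs. As a rough sketch, the "powermt display dev=all" output shown above can be boiled down to a list of pseudo name and native device pairs that is easy to compare between nodes. The function name powermt_map is made up here, and the parsing only assumes the output layout seen in this post; on Solaris, nawk may be needed instead of the old /usr/bin/awk.

```shell
# Read "powermt display dev=all" output on stdin and print one line per path:
#   <pseudo device> <native device>
powermt_map() {
    awk '
        # remember the pseudo device the following path lines belong to
        /^Pseudo name=/ { split($0, kv, "="); name = kv[2] }
        # path lines contain a native device such as c1t50000974C00A611Cd6s0
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^c[0-9]+t[0-9A-Fa-f]+d[0-9]+s[0-9]+$/)
                    print name, $i
        }
    '
}
```

Running "powermt display dev=all | powermt_map | sort -k2" on each node gives two lists sorted by native device, so differing pseudo names for the same LUN show up side by side.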

Project settings

Another item to look at is the project settings for the grid owner and oracle. It’s important to set projects correctly, otherwise the installation will fail when ASM is starting. All newly created users inherit the settings from the default project. Unless the sys admins have set the default project’s limits high enough, you will have to change them. To check all the values for this project, use the “prctl -i project default” call.

I usually create a project for the grid owner, oragrid, as well as for the oracle account. My settings are as follows for a maximum SGA size of around 20G:

projadd -c "Oracle Grid Infrastructure" 'user.oracle'
projmod -s -K "process.max-sem-nsems=(privileged,256,deny)" 'user.oracle'
projmod -s -K "project.max-shm-memory=(privileged,20GB,deny)" 'user.oracle'
projmod -s -K "project.max-shm-ids=(privileged,256,deny)" 'user.oracle'

Repeat this for the oragrid user, then log in as oragrid and check that the project is actually assigned:

# id -p oragrid
uid=223(oragrid) gid=30275(oinstall) projid=100(user.oragrid)
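The projmod calls end up as attributes in the last colon-separated field of /etc/project, so the stored values can be double-checked without logging in as each user. A minimal sketch follows; project_attr is a hypothetical helper, and it assumes the usual name:projid:comment:users:groups:attributes layout with attributes separated by semicolons (use nawk on Solaris, as the old awk lacks -v).

```shell
# Print the value of one resource control for one project.
# $1 = project file (normally /etc/project), $2 = project name, $3 = attribute name
project_attr() {
    awk -F: -v proj="$2" -v key="$3" '
        $1 == proj {
            # field 6 holds "name=value;name=value;..."
            n = split($6, attrs, ";")
            for (i = 1; i <= n; i++) {
                eq = index(attrs[i], "=")
                if (eq > 0 && substr(attrs[i], 1, eq - 1) == key)
                    print substr(attrs[i], eq + 1)
            }
        }
    ' "$1"
}
```

For example, "project_attr /etc/project user.oracle project.max-shm-ids" should print the (privileged,256,deny) triple set above; an empty result means the attribute was never stored for that project.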

Installing Grid Infrastructure

Finally ready to start the installer! The Solaris installation isn’t any different from Linux except for the aforementioned fiddling with the raw devices.

The installation went smoothly; I ran orainstRoot.sh and root.sh without any problem. If anything, it was a bit slow, taking 10 minutes to complete root.sh on node1. You can tail the rootcrs_node1.log file in /data/oragrid/product/11.2.0.2/cfgtoollogs/crsconfig to see what’s going on behind the scenes. This is certainly one of the biggest improvements over 10g and 11g Release 1.

Extending the cluster

As mentioned earlier, the MOS document suggests putting all the private NIC IP addresses in the same subnet. That isn’t necessarily to the liking of cluvfy. From each host, communication with the other host’s bnxe3 address fails, as shown in this example. Tests executed from node1:

bash-3.00# ping 192.168.0.1
192.168.0.1 is alive
bash-3.00# ping 192.168.0.2
192.168.0.2 is alive
bash-3.00# ping 192.168.0.3
192.168.0.3 is alive
bash-3.00# ping 192.168.0.4
^C
192.168.0.4 is not replying

Tests executed on node 2

bash-3.00# ping 192.168.0.1
192.168.0.1 is alive
bash-3.00# ping 192.168.0.2
^C
bash-3.00# ping 192.168.0.3
192.168.0.3 is alive
bash-3.00# ping 192.168.0.4
192.168.0.4 is alive
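Instead of interrupting each hanging ping with Ctrl-C, the four addresses can be probed in a loop with a timeout. This is only a sketch: check_ips and solaris_ping are names invented for this example, and the 3-second timeout is an arbitrary choice. It relies on the Solaris form "ping host timeout"; on Linux the probe function would need different options.

```shell
# Probe one address the Solaris way: ping <host> <timeout in seconds>
solaris_ping() {
    ping "$1" 3
}

# Run a probe command against each address and report the result.
# $1 = probe command or function, remaining arguments = addresses
check_ips() {
    probe=$1
    shift
    for ip in "$@"; do
        if $probe "$ip" >/dev/null 2>&1; then
            echo "$ip ok"
        else
            echo "$ip FAILED"
        fi
    done
}
```

Calling "check_ips solaris_ping 192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4" on each node gives the same picture as the manual tests above in one go.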

I decided to ignore this for now, and sure enough, the cluster extension didn’t fail. As I’m not using GNS, the command to add the node was

$ ./addNode.sh -debug -logLevel finest "CLUSTER_NEW_NODES={loninengblc208}" \
 "CLUSTER_NEW_VIRTUAL_HOSTNAMES={loninengblc208-vip}"

This is actually a little more verbose than I needed, but it’s always good to be prepared for an SR with Oracle.

However, the OUI command will perform a pre-requisite check before the actual call to runInstaller, and that repeatedly failed, complaining about connectivity on the bnxe3 network. Checking the contents of the addNode.sh script I found an environment variable “$IGNORE_PREADDNODE_CHECKS” which can be set to “Y” to force the script to ignore the pre-requisite checks. With that set, the addNode operation succeeded.
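Put together, the workaround amounts to exporting the variable before calling the script. The wrapper function add_node below is purely illustrative; only the variable name and the addNode.sh arguments come from the steps described above.

```shell
# Tell addNode.sh to skip its pre-requisite checks
IGNORE_PREADDNODE_CHECKS=Y
export IGNORE_PREADDNODE_CHECKS

# Hypothetical wrapper: $1 = new node name, $2 = its VIP host name
add_node() {
    "$ORACLE_HOME/oui/bin/addNode.sh" \
        "CLUSTER_NEW_NODES={$1}" \
        "CLUSTER_NEW_VIRTUAL_HOSTNAMES={$2}"
}
```

Since this silences every pre-add-node check and not just the failing connectivity test, it seems sensible to run cluvfy by hand first and confirm that the interconnect complaint is the only finding.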

RDBMS installation

This is not really worth reporting, as it’s pretty much the same as on Linux. However, there is a small caveat specific to Solaris x86-64. Many files in the Oracle inventory didn’t have correct permissions. When launching runInstaller to install the binaries, I was bombarded with complaints about file permissions.

For example, oraInstaller.properties has the wrong permissions. Example for Solaris Intel:

# ls -l oraInstaller.properties
 -rw-r--r--   1 oragrid  oinstall     317 Nov  9 15:01 oraInstaller.properties

On Linux:

$ ls -l oraInstaller.properties
 -rw-rw---- 1 oragrid oinstall 345 Oct 21 12:44 oraInstaller.properties

There were a few more; I fixed them using these commands:

$ chmod 770 ContentsXML
$ chmod 660 install.platform
$ chmod 770 oui
$ chmod 660 ContentsXML/*
$ chmod 660 oui/*

Once the permissions were fixed the installation succeeded.
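To spot such files before the installer complains, the inventory can be scanned for entries that lack the group write bit. This is a sketch under one assumption: that group write access is what the installer is after, which is inferred from the Linux listing above rather than from any Oracle documentation. The function name is invented for this example.

```shell
# List files and directories under $1 that are not writable by their group.
# -perm -020 matches entries with the group write bit set; "!" negates it.
list_not_group_writable() {
    find "$1" \( -type f -o -type d \) ! -perm -020 -print
}
```

Run it against the inventory directory as the oragrid user and compare the result with the chmod commands above.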

DBCA

Nothing to report here, it’s the same as for Linux.

8 Responses to “Installing RAC 11.2.0.2 on Solaris 10/09 x64”

  1. Boris said

    Martin,

    That’s a great explanation. Could you please tell me how did you change the ownership and permissions of the raw devices (i.e. what name did you use – logical, physical..)? I can’t figure out how to do it in Solaris environment (seems to be quite simple in Linux)… when I try something like chown grid:asmdba /dev/rdisk/ it doesn’t give me an error but it also doesn’t change the ownership…

    • Martin said

      Hi Boris!

      The devices in /dev/rdsk are symlinks – use the “ls -lL” command to list the permissions on the “real” devices. I did a chown oragrid:dba (ASM owner != RDBMS owner) /dev/rdsk/emcpower* and that worked for me. The tricky bit was partitioning the devices; as I said in the post, this is different from Solaris SPARC.

      Hope this helps!

      • Boris said

        Thank you, Martin.

      • james said

        Hi Martin,

        At my case i can run below command successfully
        chown grid:oinstall (ASM owner != RDBMS owner) /dev/rdsk/emcpower0a
        chmod 660 /dev/rdsk/emcpower0a
        but ls -lrth command show emcpower0a owner is root. And asmca doesnt show any disk.

        regards,
        James

  2. Dean said

    Martin,

    What does your private network look like, when CRS is active on both nodes? According to MOS note 1210883.1, the private network configuration can be viewed with the following commands.

    $GRID_HOME/bin/crsctl stat res -t -init
    $GRID_HOME/bin/oifcfg getif
    $GRID_HOME/bin/oifcfg iflist -p -n
    ifconfig
    select name,ip_address from v$cluster_interconnects;

    I have a two node cluster on HPUX Itanium with three private networks and only one HAIP. According to MOS note 1210883.1, four HAIPs should exist. I have contacted Oracle Support, and they have transferred the request to engineering.

    • Martin said

      Hi Dean,

      my private network looks ok :)

      Here’s the output you are after:

      [root@node1 ~]# crsctl stat res -t -init
      --------------------------------------------------------------------------------
      NAME TARGET STATE SERVER STATE_DETAILS
      --------------------------------------------------------------------------------
      Cluster Resources
      --------------------------------------------------------------------------------
      ora.asm
      1 ONLINE ONLINE node1 Started
      ora.cluster_interconnect.haip
      1 ONLINE ONLINE node1
      ora.crf
      1 ONLINE ONLINE node1
      ora.crsd
      1 ONLINE ONLINE node1
      ora.cssd
      1 ONLINE ONLINE node1
      ora.cssdmonitor
      1 ONLINE ONLINE node1
      ora.ctssd
      1 ONLINE ONLINE node1 OBSERVER
      ora.diskmon
      1 ONLINE ONLINE node1
      ora.drivers.acfs
      1 ONLINE ONLINE node1
      ora.evmd
      1 ONLINE ONLINE node1
      ora.gipcd
      1 ONLINE ONLINE node1
      ora.gpnpd
      1 ONLINE ONLINE node1
      ora.mdnsd
      1 ONLINE ONLINE node1
      [root@node1 ~]#


      [root@node1 ~]# oifcfg getif
      eth2 10.129.52.0 global public
      eth3 192.168.0.0 global cluster_interconnect
      eth5 192.168.0.0 global cluster_interconnect


      SQL> select name,ip_address from v$cluster_interconnects;

      NAME IP_ADDRESS
      --------------- ----------------
      eth3:1 169.254.91.57
      eth5:1 169.254.226.204

      +ASM1:oragrid@node1 /tmp# oifcfg iflist -p -n
      eth2 10.129.52.0 PRIVATE 255.255.255.0
      eth3 192.168.0.0 PRIVATE 255.255.255.0
      eth3 169.254.0.0 UNKNOWN 255.255.128.0
      eth5 192.168.0.0 PRIVATE 255.255.255.0
      eth5 169.254.128.0 UNKNOWN 255.255.128.0

      Hope this helps!

      • Dean said

        Martin,

        Thanks for your time and the quick reply. It appears I have an environmental or platform issue. I will add this information to my service request.

      • Martin said

        No worries. My personal experience with Solaris Intel and Linux has been without problems. Itanium has always been a bit of a niche platform imo.
