It seems I am doing a lot of fixing broken stuff recently. This time I was asked to repair a broken 8-node RAC cluster on OEL 5.5 with Oracle RAC 11.2. The system had been moved into a different, more secure network, and its firewalls prevented all access to the machines except for ILO. Another case of “security through obscurity”. The new network didn’t allow any clients to connect to any of the 8 RAC nodes, which meant that quite expensive kit was sitting idle. The cluster is not in production, it’s still being built to specification, but this accessibility problem has been holding up the project for a little while now. Yesterday was a breakthrough: the netops team found an error in their configuration, and for the first time the hosts could be accessed via ssh. Unfortunately for me, that access is only possible via audited gateways using PowerBroker, to which I don’t have access. An alternative was the ILO interface, which has not yet been hardened to production standards. So after some internal discussion I was given the ILO access credentials. This is good and bad: good, because I finally had a way onto a thoroughly broken system, and bad because there is no copy and paste with a Java-based console. And if that wasn’t bad enough, I had to content myself with 80×24 characters on the console (albeit in very big letters). I pretty much needed all of my 24″ screen to display it. But I digress.
When logging on, I found the following situation:
- Only 1 out of 8 nodes had OHAS/CRSD started. The others were still down: a kernel upgrade had taken place, but the ASMLib kernel module hadn’t been upgraded at the same time. The first node had the correct RPM installed, and ASMLib had done its magic on this node.
- Clusterware’s lower stack was up. However, the ora.net1.network resource and all resources depending on it (listeners, SCAN VIPs, SCAN listeners, etc.) were down. Not a single byte went over the public interconnect. That was strange.
Running /sbin/ifconfig was a dream on this machine: I saw all 3 SCAN IPs on it, and all 8 node virtual IP addresses. On top of that it has 6 NICs for Oracle, bonded into 3 pairs. And this is exactly where the confusion starts. I found the following bonded interfaces defined:
It took a while to figure out why these interfaces were named as they were, but apparently the suffix is a VLAN ID. It also transpired that one of my colleagues had tried to replace the previously used bond0.212 with bond0 as the public interconnect. He was, however, not successful, and the cluster was left in the state described above.
He used the following commands to update the public interface:
$ oifcfg getif
bond1.251  172.xxx.0   global  cluster_interconnect
bond0      10.2xxx8.0  global  public
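The getif listing is four whitespace-separated columns: interface, subnet, scope and role. Here is a minimal Python sketch that groups the entries by role. It assumes exactly that four-column layout. The sample subnets are made up, because the real ones are masked in this post:

```python
from collections import defaultdict


def parse_getif(text):
    """Parse 'oifcfg getif' lines of the form: <interface> <subnet> <scope> <role>."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and anything not in the assumed layout
        entries.append(tuple(fields))
    return entries


def interfaces_by_role(entries):
    """Group (interface, subnet) pairs by role, e.g. 'public' or 'cluster_interconnect'."""
    roles = defaultdict(list)
    for iface, subnet, _scope, role in entries:
        roles[role].append((iface, subnet))
    return dict(roles)


if __name__ == "__main__":
    # Hypothetical sample; the real subnets are masked in this post.
    sample = ("bond1.251  172.16.0.0  global  cluster_interconnect\n"
              "bond0  10.20.8.0  global  public")
    for role, pairs in interfaces_by_role(parse_getif(sample)).items():
        print(role, pairs)
```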
He also changed the VIP configuration, with the result shown here:
$ srvctl config vip -n node11
VIP exists: /node1-vip/10.2xx8.13/10.2xx8.0/255.255.255.0/bond0, hosting node node11
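The VIP configuration string packs five slash-separated fields: VIP name, address, subnet, netmask and interface. The Python sketch below splits the string up. It then uses the standard ipaddress module to check whether the address actually falls inside the configured subnet/netmask. It assumes exactly the output format shown above:

```python
import ipaddress


def parse_vip_config(line):
    """Split 'VIP exists: /name/address/subnet/netmask/interface, hosting node X'."""
    prefix = "VIP exists: "
    if not line.startswith(prefix):
        raise ValueError(f"not a VIP config line: {line!r}")
    spec, _, host = line[len(prefix):].partition(", hosting node ")
    # Drop the leading '/' so the split yields exactly five fields.
    name, address, subnet, netmask, iface = spec.strip("/").split("/")
    return {"name": name, "address": address, "subnet": subnet,
            "netmask": netmask, "interface": iface, "host": host.strip()}


def vip_in_subnet(cfg):
    """True if the VIP address lies inside the configured subnet/netmask."""
    network = ipaddress.ip_network(f"{cfg['subnet']}/{cfg['netmask']}")
    return ipaddress.ip_address(cfg["address"]) in network
```

The netmask is stored with the VIP itself, so it is one more place where a mask can disagree with the interface configuration.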
The VIP however remained unimpressed:
$ srvctl start vip -n node1
PRCR-1079 : Failed to start resource ora.node1.vip
CRS-2674: Start of 'ora.net1.network' on 'node1' failed
CRS-2632: There are no more servers to try to place resource 'ora.node1.vip' on that would satisfy its placement policy
That’s when I was asked to cast a keen eye over the installation.
At first I could find nothing wrong with what had been done so far. Starting my investigation, I suspected something was wrong with the public network, so I decided to shut it down:
# ifdown bond0
I then checked the network configuration in /etc/sysconfig/network-scripts. The settings are shown here:
ifcfg-bond0:
DEVICE=bond0
BONDING_OPTS="use_carrier=0 miimon=0 mode=1 arp_interval=10000 arp_ip_target=10.xxx.4 primary=eth0"
BOOTPROTO=none
ONBOOT=yes
NETWORK=10.2xxx.0
NETMASK=255.255.254.0
IPADDR=10.xxx.2
USERCTL=no

ifcfg-eth0:
DEVICE=eth0
HWADDR=f4:ce:46:87:fa:d0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

ifcfg-eth1:
DEVICE=eth1
HWADDR=f4:ce:46:87:fa:d4
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
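The ifcfg files are plain KEY=value lines, so they are easy to sanity-check with a script. The minimal Python sketch below was not part of the original troubleshooting. It parses such files with shlex, so a quoted BONDING_OPTS value survives intact, and upper-cases the keys as it parses. It then checks three things:
- each slave names the bond as its MASTER;
- each slave has SLAVE set to yes (this strict check is a choice made for the sketch);
- NETWORK agrees with IPADDR/NETMASK.

```python
import ipaddress
import shlex


def parse_ifcfg(text):
    """Parse ifcfg-style KEY=value lines into a dict with upper-cased keys."""
    cfg = {}
    for line in text.splitlines():
        # shlex strips quotes and '#' comments, as a shell would.
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if sep:
                cfg[key.upper()] = value
    return cfg


def check_bond(bond, slaves):
    """Return a list of problems found in a bond config and its slave configs."""
    problems = []
    name = bond.get("DEVICE")
    for slave in slaves:
        dev = slave.get("DEVICE", "?")
        if slave.get("MASTER") != name:
            problems.append(f"{dev}: MASTER is not {name}")
        if slave.get("SLAVE", "").lower() != "yes":
            problems.append(f"{dev}: SLAVE should be yes")
    if all(key in bond for key in ("IPADDR", "NETMASK", "NETWORK")):
        iface = ipaddress.ip_interface(f"{bond['IPADDR']}/{bond['NETMASK']}")
        if str(iface.network.network_address) != bond["NETWORK"]:
            problems.append(f"{name}: NETWORK does not match IPADDR/NETMASK")
    return problems
```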
The MAC addresses in ifcfg-eth* matched the output from the ifconfig command. In the lab I occasionally have the problem that my configuration files don’t match the real MAC addresses, and therefore my NICs don’t come up. But this wasn’t the case here.
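Comparing the HWADDR entries with what the kernel reports can also be scripted. The Python sketch below assumes the live address is read from /sys/class/net/<device>/address. Note that a NIC enslaved to a bond may report the bond's MAC there rather than its own. For slaves, the permanent address listed in /proc/net/bonding/<bond> is therefore the better source.

```python
from pathlib import Path


def read_live_mac(dev, sysfs="/sys/class/net"):
    """Return the MAC address the kernel currently reports for a device."""
    return (Path(sysfs) / dev / "address").read_text().strip().lower()


def mac_mismatches(declared, observed):
    """Compare HWADDR values (device -> MAC) with observed MACs, ignoring case.

    Returns (device, declared_mac, observed_mac_or_None) for each mismatch.
    """
    result = []
    for dev, mac in sorted(declared.items()):
        seen = observed.get(dev)
        if seen is None or seen.lower() != mac.lower():
            result.append((dev, mac, seen))
    return result


if __name__ == "__main__":
    declared = {"eth0": "f4:ce:46:87:fa:d0", "eth1": "f4:ce:46:87:fa:d4"}
    observed = {}
    for dev in declared:
        try:
            observed[dev] = read_live_mac(dev)
        except OSError:
            pass  # device not present on this machine
    print(mac_mismatches(declared, observed))
```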
I then checked whether the bonding kernel module was set up to load correctly. Usually you’d find that in /etc/modprobe.conf, but there was no entry. I added these lines as per the documentation:
alias bond0 bonding
alias bond1 bonding
alias bond1.251 bonding
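Checking modprobe.conf for these alias lines is a simple text scan. The Python sketch below lists which bond devices in a caller-supplied set lack an "alias <device> bonding" line. Anything after a # is treated as a comment and ignored:

```python
def bonding_aliases(modprobe_text):
    """Return the set of interface names aliased to the bonding module."""
    names = set()
    for line in modprobe_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenise
        if len(fields) == 3 and fields[0] == "alias" and fields[2] == "bonding":
            names.add(fields[1])
    return names


def missing_aliases(bond_devices, modprobe_text):
    """List the given bond devices that have no 'alias <dev> bonding' line."""
    have = bonding_aliases(modprobe_text)
    return sorted(dev for dev in bond_devices if dev not in have)
```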
With that done I brought the bond0 interface back up (don’t ever try to bring down the private interconnect: it will cause a node eviction!). Still nothing. The output of crsctl status resource -t still showed “OFFLINE” for resource ora.net1.network. BTW, you cannot manually start a network resource using srvctl (and it’s an ora.* resource, so don’t even think about trying crsctl start resource ora.net1.network :). All you can do with a network resource is query its configuration (srvctl config network -k 1…) and modify it (srvctl modify network -k 1…).
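Since srvctl is the only way to look at, and change, the network resource, it helps to pull the fields out of srvctl config network. The Python sketch below assumes 11.2-style output of the form "Network exists: 1/<subnet>/<netmask>/<interface>, type static". Treat that format as an assumption and adjust the regular expression to what your version prints:

```python
import re

# Assumed output format (hypothetical sample values):
#   Network exists: 1/10.20.2.0/255.255.254.0/bond0, type static
NETWORK_RE = re.compile(
    r"Network exists: (?P<num>\d+)/(?P<subnet>[\d.]+)/(?P<netmask>[\d.]+)"
    r"/(?P<iface>[^,\s]+), type (?P<type>\w+)"
)


def parse_network_config(text):
    """Return one dict per 'Network exists' line found in srvctl output."""
    return [match.groupdict() for match in NETWORK_RE.finditer(text)]
```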
ORAROOTAGENT is responsible for starting the network, and it will try to do so every second or so. That’s CRSD’s ORAROOTAGENT, by the way; its log file is $GRID_HOME/log/`hostname -s`/agent/crsd/orarootagent_root/orarootagent_root.log.
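To watch ORAROOTAGENT's attempts, the log path can be built from the Grid home and the short hostname. The Python sketch below assumes the Grid Infrastructure home is either passed in or available in a GRID_HOME environment variable. That is an assumption, since not every installation sets GRID_HOME. The directory layout is taken from the path above:

```python
import os
import socket
from collections import deque


def orarootagent_log(grid_home=None, host=None):
    """Build the path of CRSD's orarootagent_root log for this node."""
    grid_home = grid_home or os.environ["GRID_HOME"]  # KeyError if unset
    host = host or socket.gethostname().split(".")[0]  # like `hostname -s`
    return os.path.join(grid_home, "log", host, "agent", "crsd",
                        "orarootagent_root", "orarootagent_root.log")


def tail(path, lines=20):
    """Return the last `lines` lines of a text file."""
    with open(path, errors="replace") as fh:
        return list(deque(fh, maxlen=lines))
```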
After the modification to bond0 I could ping the IP address associated with bond0, so at least that was a success. One thing I learned that day is that the bonded interface takes its MAC address from the first NIC enslaved to it, in my case the primary interface eth0, i.e. f4:ce:46:87:fa:d0. In active-backup mode (mode=1) the bonding driver by default also sets the backup slave to that same address, so the bond’s MAC address should not change if eth0 fails. So in summary:
- the network bonding was correctly configured
- I could ping bond0
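The bonding driver also exposes its own view of the bond in /proc/net/bonding/<bond>. That file shows the mode, the currently active slave and each slave's permanent MAC address. Here is a minimal Python sketch that parses it. It assumes the usual "Key: value" layout, with a "Slave Interface:" line starting each slave section:

```python
def parse_proc_bonding(text):
    """Extract mode, active slave and per-slave permanent MACs from /proc/net/bonding/<bond>."""
    info = {"mode": None, "active": None, "slaves": {}}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Currently Active Slave":
            info["active"] = value
        elif key == "Slave Interface":
            current = value
            info["slaves"][current] = None
        elif key == "Permanent HW addr" and current is not None:
            info["slaves"][current] = value.lower()
    return info


if __name__ == "__main__":
    with open("/proc/net/bonding/bond0") as fh:
        print(parse_proc_bonding(fh.read()))
```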
At this point I could see no reason why starting the network failed. Maybe a typo in the configuration? The network configuration can be queried with 2 commands: oifcfg and srvctl config network. So I tried oifcfg first. oifcfg getif returned:
bond0      10.xx.x2.0     "good"
bond0      10.xx.x8.0     "old/bad"
bond1.251  172.xx.xx.160  interconnect
bond1.251  169.254.0.0
Hmmm, where did that second bond0 entry come from? The bond1.251 interface is in use and working; the 172.xxx IP matches the IP address assigned in ifcfg-bond1.251. The second entry for bond1.251 is created by the HAIP resource and has to do with the highly available cluster interconnect, which uses multicasting for communication (to the frustration of many users who upgraded to 11.2.0.2 only to find out that the lower stack doesn’t start on the second and subsequent nodes).
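Comparing the getif output between two nodes is essentially a set difference. The Python sketch below reduces each node's listing to (interface, subnet) pairs and reports what is present on only one side. Any trailing columns, such as the annotations above, are ignored:

```python
def getif_entries(text):
    """Normalise 'oifcfg getif' output into a set of (interface, subnet) pairs."""
    pairs = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def diff_nodes(this_node, other_node):
    """Return (only on this node, only on the other node) as sorted lists."""
    a, b = getif_entries(this_node), getif_entries(other_node)
    return sorted(a - b), sorted(b - a)
```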
To be sure that I was seeing something unusual, I compared the output with another node in the cluster. There I found only 3 entries: bond0, bond1.251 and the HAIP address. I initially tried to remove the bad network with oifcfg delif, but that didn’t work. I then checked the output of srvctl config network to see if it matched what I expected. And here was a surprise: the network resource was configured with the wrong subnet mask. Instead of 255.255.254.0 (note the “254”!) I found 255.255.255.0. That was easy to fix, and while I was back trying to delete the old network using oifcfg I suddenly realised that the cluster had sprung back into life. Small typo, big consequences! Finally all the resources depending on ora.net1.network were started, including SCAN VIPs, SCAN listeners, listeners, VIPs…
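255.255.254.0 is a /23 (512 addresses), while 255.255.255.0 is a /24 (256 addresses). With the wrong mask, the network resource therefore describes only half of the real subnet. Below is a worked Python sketch with hypothetical addresses, using the standard ipaddress module. An interface at 10.20.3.2/23 sits in 10.20.2.0/23, which is not the same network as 10.20.2.0/24. Whether ORAROOTAGENT compares networks in exactly this way is an assumption; the sketch only illustrates the arithmetic.

```python
import ipaddress


def same_network(subnet, netmask, if_addr, if_netmask):
    """True if an interface address/mask lies in exactly the subnet/netmask a resource expects."""
    expected = ipaddress.ip_network(f"{subnet}/{netmask}")
    actual = ipaddress.ip_interface(f"{if_addr}/{if_netmask}").network
    return expected == actual


if __name__ == "__main__":
    # Hypothetical addresses: interface 10.20.3.2 with the correct /23 mask.
    for mask in ("255.255.254.0", "255.255.255.0"):
        prefix = ipaddress.ip_network(f"0.0.0.0/{mask}").prefixlen
        ok = same_network("10.20.2.0", mask, "10.20.3.2", "255.255.254.0")
        print(f"{mask} (/{prefix}): matches interface -> {ok}")
```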