Posted by Martin Bach on October 13, 2011
As part of a server move from one data centre to another I enjoyed working in the depths of Clusterware. It was a rather simple case though: the public IP addresses were the only part of the package to change. One caveat was the recreation of the disk group I use for the OCR and three copies of the voting file. I decided to rely on the backups I took before the server move.
Once the kit had been rewired in the new data centre, it was time to get active. The /etc/multipath.conf file had to be touched to add the new LUNs for my +OCR disk group. I have described the process in a number of articles, for example here:
A few facts before we start:
- Oracle Enterprise Linux 5.5 64bit
- Grid Infrastructure 11.2.0.2.2 (actually it is Oracle Database SAP Bundle Patch 11.2.0.2.2)
I have already described how to restore the OCR and voting files in 11.2.0.1 in “Pro Oracle Database 11g RAC on Linux”, but since then the procedure has changed slightly, so I thought I’d add this here. The emphasis is on “slightly”. Read the rest of this entry »
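For orientation, here is a minimal sketch of what the 11.2-style restore looks like on one node (run as root); the backup path is an assumption for illustration, and the exact sequence including the disk group recreation is in the full post:

    # stop Clusterware everywhere, then bring the stack up on one node only,
    # in exclusive mode and without the CRS daemon (11.2.0.2 syntax)
    crsctl stop crs -f
    crsctl start crs -excl -nocrs

    # restore the OCR from one of the automatic backups (example path)
    ocrconfig -restore /u01/app/11.2.0/grid/cdata/cluster1/backup00.ocr

    # recreate the voting files inside the +OCR disk group
    crsctl replace votedisk +OCR

    # back to normal operation
    crsctl stop crs -f
    crsctl start crs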
Posted in 11g Release 2, Automatic Storage Management, RAC, Xen | 2 Comments »
Posted by Martin Bach on August 8, 2011
I recently had the immense pleasure of visiting Cisco’s labs at Bedfont Lakes for a day of intensive information exchange about their UCS offering. To summarise the day: I was impressed. Even more so by the fact that there is more to come; I’m assuming a few more blog posts about UCS will get published here once I have had some time to benchmark it.
I knew about UCS from a presentation at the UKOUG user group, but it didn’t occur to me at the time how much potential lies behind the technology. This potential is something Cisco sadly fail to make clear on their website, which is otherwise very good once you understand the UCS concept, as it gives you many details about the individual components.
I should stress that I am not paid or otherwise financially motivated to write this article; it’s pure interest in technology that made me write this blog post. A piece of good technology deserves to be mentioned, and that is what I would like to do here.
What is the UCS anyway?
When I mentioned to friends that I was going to see Cisco to have a look at their blade server offering I got strange looks. Indeed, Cisco hadn’t been known as a manufacturer of blades before; it’s only recently (in industry terms) that they entered the market. However, instead of providing YABE (yet another blade enclosure), they engineered it quite nicely.
If you like, the UCS is an appliance-like environment you can use for all sorts of workloads. It can be fitted into a standard 42U rack and currently consists of these components (brackets contain product designations for further reading):
- Two (clustered) Fabric Interconnects (UCS 6120 or 6140 series) with 20 or 40 10G ports, each port configurable either as an uplink into the core network or as a server link down to a UCS chassis. These ports carry both Ethernet and FCoE traffic from the UCS chassis
- Two Fabric Extenders (UCS 2100 series), which go into the blade enclosure and provide connectivity up to the Fabric Interconnects. Each UCS 2104 fabric extender (FEX) provides 40Gb bandwidth to the Interconnect, controlled by QoS policies
- Blade enclosures (UCS 5100 series), which contain 8 half-width or 4 full-width blades
- Different models of half-width and full-width UCS B-series blades providing up to 512 GB of RAM and Intel Xeon 7500-series processors
- 10GE adapters which are Converged Network Adapters (CNAs); in other words, they carry both Fibre Channel over Ethernet and non-storage Ethernet traffic
The Fabric Interconnects can take extension modules with Fibre Channel ports to link to an FC switch; no new technology is introduced, and existing arrays can be used. Also, existing fibre channel solutions can still be used for backups.
Another interesting feature is the management software, called UCS Manager. It is integrated into the Fabric Interconnect, using a few gigabytes of flash storage. Not only is it used to manage a huge number of blades, it can also stage firmware for each component. At a suitable time, the firmware can be upgraded in a rolling fashion, except for the Fabric Interconnects (obviously); however, the Fabric Interconnects can take advantage of their clustering functionality to ensure that complete firmware upgrades can be undertaken without a system-wide outage.
Read the rest of this entry »
Posted in 11g Release 2, Automatic Storage Management, Linux, RAC | Tagged: cisco, ucs | 2 Comments »
Posted by Martin Bach on July 29, 2011
Yesterday I proudly presented a one-hour training class about upgrading to Oracle 11.2 RAC at oracleracsig.org. This was the first time I presented using this facility, and I thought it might be useful for others to learn about the procedure, hopefully encouraging other speakers to follow suit. It’s really straightforward and there is nothing to worry about! Especially if you are already familiar with WebEx, presenting should be a piece of cake. But I’m getting ahead of myself.
So how do you get to present?
Read the rest of this entry »
Posted in Public Appearances, RAC | Tagged: oracleracsig.org, present, webcast, webex | 1 Comment »
Posted by Martin Bach on June 8, 2011
I was very pleasantly surprised that Oracle University are offering another day for my “Grid Infrastructure and Database High Availability Deep Dive” seminar. In addition to the imminent seminars in June (I blogged about them earlier), this one is in London, England. For anyone interested, here is the link:
The date has been set to October 10th, so there is plenty of time still, but nevertheless I hope to see you there!
Posted in 11g Release 2, Automatic Storage Management, Public Appearances, RAC | Leave a Comment »
Posted by Martin Bach on June 2, 2011
With the introduction of Clusterware 11.2 a great number of command line tools have either been deprecated ($ORA_CRS_HOME/bin/crs_* and others) or merged into other tools. This is especially true for crsctl, which is now the tool to access and manipulate low level resources in Clusterware.
This also implies that some of the notes on Metalink no longer apply to Clusterware 11.2, such as the one detailing how to get more detailed information in the logs. Not that the log information wasn’t already rather comprehensive, if you ask me…
And here comes a warning: don’t change the log levels unless you have a valid reason, or unless you are instructed to do so by support. Log levels higher than the defaults tend to generate far too much data, filling up the GRID_HOME and potentially killing the node.
Log File Location
The location for logs in Clusterware hasn’t changed much since the unified log structure was introduced in 10.2 and documented in “CRS and 10g/11.1 Real Application Clusters (Doc ID 259301.1)”. It has been extended though, and quite dramatically so in 11.2, which is documented as well in one of the better notes from support: “11gR2 Clusterware and Grid Home – What You Need to Know (Doc ID 1053147.1)”
The techniques for getting debug and trace information described, for example, in “Diagnosability for CRS / EVM / RACG (Doc ID 357808.1)” don’t really apply any more as the syntax has changed.
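To give an idea of the new syntax, here is a small sketch of the crsctl-based approach in 11.2; the module and resource names are examples only, and the levels should only be raised when support asks for them:

    # list the modules that can be traced for the crs daemon
    crsctl lsmodules crs

    # query the current log level of a module, then raise it
    crsctl get log crs CRSMAIN
    crsctl set log crs "CRSMAIN=3,CRSCOMM=3"

    # resources have their own log level
    crsctl set log res "ora.orcl.db=2"

    # afterwards, set the levels back to the values reported by crsctl get log
    # (level 1 used here as an example of a default)
    crsctl set log crs "CRSMAIN=1,CRSCOMM=1"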
Read the rest of this entry »
Posted in 11g Release 2, Linux, RAC | 3 Comments »
Posted by Martin Bach on May 26, 2011
I researched an interesting new feature available with Oracle 11g Release 2, the so-called RAC FAN API, when writing the workload management chapter for the RAC book. The RAC FAN API is documented in the Oracle® Database JDBC Developer’s Guide, 11g Release 2 (11.2), which is available online, but when it came to the initial documentation following the 11.2.0.1 release on Linux it was pretty useless. The good news is that it has improved!
The RAC FAN Java API
The aim of this API is to allow a Java application to listen to FAN events by creating a subscription to the RAC nodes’ ONS processes. The application then registers a FANListener, based on the subscription, which can pick up instances of the following events:
All of these are in the oracle.simplefan namespace, the javadoc reference for which you can find in the official documentation. Read the rest of this entry »
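Before wiring the connection details into the Java properties, the cluster’s ONS hosts and remote port have to be known; a quick way to look them up on an 11.2 Grid Infrastructure installation (a sketch, not the only way):

    # show the ONS daemon configuration registered with Clusterware
    srvctl config nodeapps -s

    # the same information lives in the ONS configuration file of the Grid home
    cat $GRID_HOME/opmn/conf/ons.config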
Posted in 11g Release 2, Linux, RAC | Tagged: API, fan, oracle.simplefan, RAC | 1 Comment »
Posted by Martin Bach on May 17, 2011
I am playing around with Grid Infrastructure 11.2.0.2 PSU 2 and found an interesting note on My Oracle Support regarding the Patch Set Update. This reminds me that it’s always a good idea to search for a patch number on Metalink before applying a PSU. It also seems to be a good idea to wait a few days before trying a PSU (or maybe a CPU) on your DEV environment for the first time (and don’t even think about applying a PSU in production without thorough testing!)
OK, back to the story: there is a known issue with the patch set which has to do with the change in mutex behaviour the PSU was intended to fix. To quote MOS note “Oracle Database Patch Set Update 11.2.0.2.2 Known Issues (Doc ID 1291879.1)”, Patch 12431716 is a recommended patch for 11.2.0.2.2. In fact, Oracle strongly recommends that you apply the patch to fix Bug 12431716 – Unexpected change in mutex wait behavior in 11.2.0.2.2 PSU (higher CPU possible).
In a nutshell, not applying the patch can cause your system to suffer from excessive CPU usage and more mutex contention than expected. More information can be found in the description of Bug 12431716 Mutex waits may cause higher CPU usage in 11.2.0.2.2 PSU / GI PSU, which is worth reading.
Besides this, the PSU applied without any problems to my four-node cluster; I just wish there was a way to roll out a new version of OPatch to every cluster node’s $GRID_HOME and $ORACLE_HOME with one command. The overall process for the PSU is the same as already described in my previous post about Bundle Patch 3:
- Get the latest version of OPatch
- Deploy OPatch to $GRID_HOME and $ORACLE_HOME (ensure permissions are set correctly for the OPatch in $GRID_HOME!)
- Unzip the PSU (Bug 11724916 – 11.2.0.2.2 Patch Set Update (PSU) (Doc ID 11724916.8)), for example to /tmp/PSU
- Change directory to where you unzipped (/tmp/PSU) and become root
- Ensure that $GRID_HOME/OPatch is part of the path
- Read the readme
- Create an OCM response file and save it to say, /tmp/ocm.rsp
- Start the patch as root: opatch auto and supply the full path to the OCM response file (/tmp/ocm.rsp)
- Apply the aforementioned one-off patch
Then you wait; after a little while spent tailing the logfile in $GRID_HOME/cfgtoollogs/ and having a coffee, the process eventually finishes. Repeat on each node and you’re done. I’m really happy that there are no more of those long readme files with 8 steps to be performed, partially as root, partially as the CRS owner or the RDBMS owner. It reduces the time it takes to apply a PSU significantly.
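To condense the list above into something you can eyeball, this is roughly what the run looks like on one node; the paths match the examples given earlier, the rest is a sketch rather than gospel, so follow the readme for the authoritative version:

    # as the Grid Infrastructure owner: create the OCM response file once
    $GRID_HOME/OPatch/ocm/bin/emocmrsp -no_banner -output /tmp/ocm.rsp

    # as root: make sure the freshly deployed OPatch is found first in the PATH
    export PATH=$GRID_HOME/OPatch:$PATH

    # apply the PSU to the Grid and RDBMS homes on this node
    opatch auto /tmp/PSU -ocmrf /tmp/ocm.rsp

    # keep an eye on progress from a second session (newest file in cfgtoollogs)
    ls -ltr $GRID_HOME/cfgtoollogs/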
Posted in 11g Release 2, Automatic Storage Management, Linux, RAC | 22 Comments »
Posted by Martin Bach on May 13, 2011
Admittedly I haven’t checked for a little while, but an email from my co-author Steve Shaw prompted me to go to the Amazon website and look it up.
And yes, it’s reality! Our book is now finally available as a Kindle version, how great is that?!?
There isn’t really a lot more to say about this subject. I wonder how many techies are interested in the Kindle version after the PDF has been out for quite a while. If you read this and decide to get the Kindle version, could you please let me know how you liked it? Personally I think the book is well suited to the Amazon reader, as it’s mostly text, which suits the device well.
Posted in 11g Release 2, Linux, RAC, RAC Book | 1 Comment »
Posted by Martin Bach on April 6, 2011
Julian Dyke has started an interesting thread on the Oak Table mailing list after the latest UKOUG RAC and HA SIG. Unfortunately I couldn’t attend that event; I wish I had, as I knew it would be great.
Anyway, the question revolved around an ASM disk group created with normal redundancy spanning two storage arrays. In theory this should protect against the failure of an array, although at a high price: all ASM disks exported from one array would form a single failure group. Remember that disks in a failure group all fail if the supporting infrastructure (network, HBA, controller etc.) fails. So what would happen with such a setup if you followed these steps:
- Shut down the array for failure group 2 (failgroup B)
- Stop the database
- Shut down the second array – failure group 1 (failgroup A)
- Do some more maintenance…
- Start up the failgroup B SAN
- Start the database
- Start up the failgroup A SAN
ASM can tolerate the failure of one failgroup (capacity permitting), so the failure of failgroup 2 should not bring the disk group down, which would result in immediate loss of service. But what happens if it comes up after the data in the other failure group has been modified? Will there be data corruption?
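For reference, a disk group along the lines of the one discussed would have been created roughly like this; the disk group name and device paths are made up for the sketch, the point being that every disk exported by one array ends up in the same failure group:

    # run against the local ASM instance (environment pointing at +ASM) as SYSASM
    echo "CREATE DISKGROUP DATA NORMAL REDUNDANCY
      FAILGROUP ARRAY_A DISK '/dev/mapper/arraya_lun1', '/dev/mapper/arraya_lun2'
      FAILGROUP ARRAY_B DISK '/dev/mapper/arrayb_lun1', '/dev/mapper/arrayb_lun2';" | sqlplus -S / as sysasm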
Read the rest of this entry »
Posted in 11g Release 1, 11g Release 2, Automatic Storage Management, Linux, RAC | Tagged: ora-600, [kccpb_sanity_check_2] | 5 Comments »
Posted by Martin Bach on March 22, 2011
This post is inspired by a recent thread on the oracle-l mailing list. In the post “11g RAC orapw file issue- RAC nodes not updated” the fact that the password file is local to each instance was brought up. In fact, all users granted the SYSOPER or SYSDBA privilege are stored in the password file, and changing the password for the SYS user on one instance doesn’t mean the change is reflected on the other RAC instances. Furthermore, your Data Guard configuration will break as well, since the SYS account is used to log in to the standby database. Read the rest of this entry »
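As a practical illustration, this is the kind of manual follow-up a SYS password change needs when the password file is not shared; instance names, node names and paths are made up for the sketch:

    # each instance reads its own copy in $ORACLE_HOME/dbs on its node
    ls -l $ORACLE_HOME/dbs/orapwPROD1

    # after changing the SYS password on node 1, push the file to the other nodes,
    # renaming it for each instance
    scp $ORACLE_HOME/dbs/orapwPROD1 rac2:$ORACLE_HOME/dbs/orapwPROD2
    scp $ORACLE_HOME/dbs/orapwPROD1 rac3:$ORACLE_HOME/dbs/orapwPROD3

    # and do not forget the standby: Data Guard redo transport logs in as SYS
    scp $ORACLE_HOME/dbs/orapwPROD1 stby1:$ORACLE_HOME/dbs/orapwSTBY1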
Posted in 11g Release 2, Linux, RAC | 2 Comments »