I am doing a couple of one day seminars with Oracle University, currently planned for Austria and Switzerland. They go by the title “Grid Intrastructure and Database High Availability Deep Dive”, and can be accessed via these links.
To save you from having to get the abstract, I copied it from the Oracle University website:
Providing a highly available database architecture fit for today’s fast changing requirements can be a complex task. Many technologies are available to provide resilience, each with its own advantages and possible disadvantages. This seminar begins with an overview of available HA technologies (hard and soft partitioning of servers, cold failover clusters, RAC and RAC One Node) and complementary tools and techniques to provide recovery from site failure (Data Guard or storage replication).
In the second part of the seminar, we look at Grid Infrastructure in great detail. Oracle Grid Infrastructure is the latest incarnation of the Clusterware HA framework which successfully powers every single 10g and 11g RAC installation. Despite its widespread implementation, many of its features are still not well understood by its users. We focus on Grid Infrastructure, what it is, what it does and how it can be put to best use, including the creation of an active/passive cold failover cluster for web and database resources. Special focus will be placed on the various storage options (Cluster File System, ASM, etc), the cluster interconnect and other implementation choices and on troubleshooting Grid Infrastructure. In the final part of the seminar, we explore Real Application Clusters and its various uses, from HA to scalability to consolidation. We discuss patching and workload management, coding for RAC and other techniques that will allow users to maximise the full potential of the package.
See you there if you are interested!
This post is inspired by a recent thread on the oracle-l mailing list. In post “11g RAC orapw file issue- RAC nodes not updated” the fact that the password file is local to the instance has been brought up. In fact, all users with the SYSOPER or SYSDBA role granted are stored in the password file, and changing the account for the SYS user on one instance doesn’t mean the password change is reflected on the other RAC instances. Furthermore, your Data Guard configuration will break as well, since the SYS account is used to log in to the standby database. Continue reading
This has been an interesting story today when one of my blades decided to reboot after an EXT3 journal error. The hard facts first:
- Oracle Linux 5.5 with kernel 2.6.18-18.104.22.168.1.el5
- Oracle 22.214.171.124 RAC
- Bonded NICs for private and public networks.
- Private bondn device defined on a VLAN
- BL685-G6 with 128G RAM
First I noticed the node had problems when I tried to get all databases configured on the cluster. I got the dreaded “cannot communicate with the CRSD”
[firstname.lastname@example.org] $ srvctl config database
PRCR-1119 : Failed to look up CRS resources of database type
PRCR-1115 : Failed to find entities of type resource that match filters (TYPE ==ora.database.type) and contain attributes DB_UNIQUE_NAME,ORACLE_HOME,VERSION
Cannot communicate with crsd
Not too great, especially since everything worked when I left yesterday. What could have gone wrong? Continue reading
This is a weird problem I ran in today. As part of an automation project the code deploys RAC One databases across a cluster, depending on the capacity available of the node. These are 128G RAM BL685c G6 currently but will be upgraded to G7 later.
Now, my problem was that after the weekend we couldn’t deploy any more RAC One databases, except for 1 node. DBCA simply created single instance databases instead. Newly created databases were properly registered in the OCR, and their build completed ok, but not as RAC One databases. Take for example this database:
$ srvctl config database -d MYDB
Database unique name: MYDB
Database name: MYDB
Oracle home: /u01/app/oracle/product/126.96.36.199
Oracle user: oracle
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: MYDB
Database instances: MYDB
Disk Groups: DATA
Mount point paths:
Database is administrator managed
Finally I got around to providing a useful example for a cluster callout script. It is actually on the verge of taking too long-remember that scripts in the $GRID_HOME/racg/usrco/ directory should execute quickly. Before deploying this, you should definitely ensure that the script executes quickly enough-the “time” utility can help you with this. Nevertheless this has been necessary to work around a limitation of Grid Control: RAC One Node databases are not supported in GC 11.1 (I complained about that earlier).
To work around the problem I wrote a script which can alleviate one of the arising problems: when using srvctl relocate database, another instance (usually called dbName_2) will be started to allow existing sessions to survive the failover operation if they use TAF or FAN/FCF.
This poses a big problem to Grid Control though-the second instance didn’t exist when you registered the database as a target, hence GC doesn’t know about it. Subsequently you may get paged that the database is down when in reality it is not. Receiving one of the “false positive” alarms is annoying at best at 02:00 AM in the morning. Actually, Grid Control is right in assuming that the database is down: although detected as a cluster database target, it only consists of 1 instance. If that’s down, it has to be assumed that the whole cluster database is down. In a perfect world we wouldn’t have this problem-GC was aware that the RON database moved to another node in the cluster and update its configuration accordingly. This is planned for the next major release sometime later in 2011. Apparently dbconsole has the ability to deal with such a situation. Continue reading
Just a really quick post about my appearance at Miracle Open World 2011 in Denmark next month:
I strongly recommend you to have a look at the agenda and speakers, there are lots of fellow Oak Table members sharing their knowledge-quite an incredible lineup and a great honour for me to be there.
Still debugging the OMS problem (it occasionally hangs and has to be restarted) I wrote a small shell script to help me gather all required logs for Oracle support. These are the logs I need for the SR, Niall Litchfield has written a recent blog post about other useful log locations.
The script is basic, and can possibly be extended. However it saved me a lot of time getting all the required information to one place from where I could take it and attach it to the service request. Before uploading I usually zip all files into dd-mm-yyyy-logs.nnn.zip to avoid clashing with logs already uploaded. I run the script via cron daily at 09:30. Continue reading