Martins Blog

Trying to explain complex things in simple terms

Archive for February, 2011

Quis custodiet ipsos custodies-Nagios monitoring for Grid Control

Posted by Martin Bach on February 28, 2011

I have a strange problem with my Grid Control Management Server in a Solaris 10 zone. When restarted, the OMS will serve requests fine for about 2 to 4 hours and then “hang”. Checking the Admin Server console I can see that there are stuck threads. The same information is also recorded in the logs.

NB: the really confusing part about Grid Control 11.1 is the use of Weblogic-you thought you knew where the Grid Control logs where? Forget about what you knew about 10.2 and enter a different dimension :)

So to be able to react quicker to a hang of the OMS (or EMGC_OMS1 to be more precise) I set up nagios to periodically poll the login page.

I’m using a VM with OEL 5.5 64bit to deploy nagios to, the requirements are very moderate. The install process is well documented in the quickstart guide-I’m using Fedora as a basis. OEL 5.5 doesn’t have nagios 3 RPMs available, so I decided to use the source downloaded from The tarballs you need are nagios-3.2.3.tar.gz and nagios-plugins-1.4.15.tar.gz at the time of this writing. Read the rest of this entry »

Posted in Grid Control, Linux | Tagged: , , , , , | Leave a Comment »

GC 11.1 and Monitoring Templates

Posted by Martin Bach on February 24, 2011

Throughout the last 2 weeks I have been working (or better: tried to work) with Grid Control 11.1 as the central monitoring and deployment solution for my current project.

The plan is to use EMGC 11.1 in conjunction with an 8 node cluster to automatically deploy RAC One Node databases. Please don’t ask about RAC One Node-that wasn’t my decision, and as I understand the previous project members only chose this as a poor compromise to keep the operations team happy(-ish)

Besides the fact that the OMS-which runs in a Solaris Zone repeatedly “hangs” and can’t be contacted by emcli or any browser (Bug 11804553)-RAC One Node is NOT SUPPORTED as a target in Grid Control 11.1. It might be supported in GC 12.1 later in 2011. But I digress

The Requirement

The OPS team maintains their own management servers. To allow us to perform some testing with the automatic database deployment without messing with a life OMS, it has been decided to install OEM GC 11.1 with PSU 2 locally on Solaris with a repository database on Linux. We needed GC11.1 to supoprt our cluster.

After the installation of the OMS I tried to export the required management templates from the life OMS (remember it’s and import them into 11.1 to save myself a lot of work.

Export a management template

The export function seems to have been introduced in and it works great. All you need to do it hop on the OMS, and use “emcli” (Enterprise Manager Command Line Interface) to log on and export the template. A sample session is shown here:

  • emcli login -username=yourUserName -password=yourPassword
  • emcli export_template -name=TemplateName -target_type=TargetType -output_file=/path/to/templateName.xml

If you are unsure about template names and targets, you can connect to the repository as sysman and query mgmt_templates:


And so I happily exported the management templates from the OMS.

The Bad News

Unfortunately, you can’t import non 11.1 templates into an 11.1 OMS. When I tried it I got the following error:

$ emcli import_template -files=”emd.10205.xml”
Monitoring template file emd.10205.xml exported from OMS can not be imported to OMS

Bugger. Sure enough, the XML file has a version tag:

<?xml version = '1.0' encoding = 'UTF-8'?>
<MonitoringTemplate template_name="Agent Template" target_type="oracle_emd" is_public="0" oms_version="" owner="SYSMAN" xmlns="">

The solution is to revert to the bad old times and manually comparing source and destination. A rather laborious and tiresome way of getting information across. Don’t forget to export the completed template from 11.1 to save yourself from going through that again.

Posted in Grid Control | Leave a Comment »

Using wget and proxy to download patches from MOS

Posted by Martin Bach on February 23, 2011

This is a rather quick note, but can be quite useful in certain situations. I currently look after a system which is quite difficult to jump on. That means before I get to do a “sudo su – oracle” I need to get to a jump-off box, ssh to 2 other machines and then log in as myself. It’s secure, but not user friendly. Especially in this case where I needed to run the latest RDA for an an open support request.

So rather than “dragging” the RDA with me on each box I used the new (Flash) interface to get a small shell script which you just need to deploy to your machine and run. It then connects to and does its magic.

The script works mostly fine, but depending on your environment you have to make small changes. My example is for Solaris 10, any Linux should just work out of the box.

Read the rest of this entry »

Posted in 11g Release 2 | Tagged: , , , | 1 Comment »

RAC One Node and Database Protection

Posted by Martin Bach on February 16, 2011

An email from fellow Oak Table Member James Morle about RAC One Node and failover got me thinking about the capabilities of the product.

I have written about RON (Rac One Node) in earlier posts, but haven’t really explored what happens with session failover during a database relocation.


So to clarify what happens in these two scenarios I have developed a simple test. Taking a RON database + a service I modified both to suit my test needs. Connected to the service I performed a database relocation to see what happens. Next I killed the instance (I wasn’t able to reboot the node) t o simulate what happens when the node crashes. Read the rest of this entry »

Posted in 11g Release 2, Linux, RAC, RAC Book | 6 Comments »

Error message of the day: OUI-25023 and the FQDN

Posted by Martin Bach on February 15, 2011

It’s been a long day with many problems around a Grid Control installation, including (but not limited to) corruption of the repository database, bugs in OUI when it comes to deinstalling the Oracle Management Server, lots of files left over by the weblogic “” script and many more. Some of the error messages were quite misleading, and OUI-25023 just was one too many. What happened?

Earlier today I was trying to install the 64bit agent on an 8 node cluster. After an initial headache (see below) it worked ok. However, I couldn’t resist mentioning OUI-25023. Here’s the complete story.

I downloaded the 11.1 agent for linux x86-64 as per the GC 11.1 documentation and deployed it to my fresh-installed management server. The OMS is on Solaris SPARC, and Grid Control doesn’t supply agents for a different platform. However, the security experts have locked the oracle account down on the cluster which ruled out the “agent push” scenario. I then opted for the installation via a response file, as described in the documentation. Read the rest of this entry »

Posted in Grid Control, Linux | 1 Comment »

Running Grid Control Agent commands standalone

Posted by Martin Bach on February 14, 2011

I had an error message today from one of my grid agents which was cut short in the GUI just when it became interesting. So I thought of a way of running the command on the comand line to get the full output.

This has been a little easier than I thought. I based my approach on an earlier blog article on my knowledgebase to get the perl environment variables set. I then needed to figure out where some of the libraries (perl scripts ending in *.pm) the agent script are referring were located.

A simple “locate -i *pm | grep $ORACLE_HOME” did it. This enabled me to write a preliminary script to run an EM agent task, shown below. It expects that you have ran “oraenv” previously to set the environment to the AGENT_HOME. When referring to ORACLE_HOME in the following, the AGENT_HOME is meant. It takes the full parameter to the script to be executed as the parameter and checked for ORACLE_HOME and $1 to exist. Read the rest of this entry »

Posted in Grid Control | Leave a Comment »

11.1 GC agent refuses to start

Posted by Martin Bach on February 14, 2011

This is a follow-up post from my previous tale of how not to move networks for RAC. After having successfully restarted the cluster as described here in a previous post I went on to install a Grid Control 11.1 system. This was to be on Solaris 10 SPARC-why SPARC? Not my platform of choice when it comes to Oracle software, but my customer has a huge SPARC estate and wants to make most of it.

After the OMS has been built (hopefully I’ll find time to document this as it can be quite tricky on SPARC) I wanted to secure the agents on my cluster against it. That worked ok for the first node:

  • emctl clearstate agent
  • emctl secure agent
  • emctl start agent

Five minutes later the agent appeared in my list of agents in the Grid Control console. With this success backing me I went to do the same on the next cluster node.

Here things were different-here’s the sequence of commands I used:

$ emctl stop agent
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved

Read the rest of this entry »

Posted in 11g Release 1, 11g Release 2, Grid Control, Linux | Leave a Comment »

Troubleshooting on an 8 node cluster

Posted by Martin Bach on February 11, 2011

It seems I am doing a lot of fixing broken stuff recently. So this time I have been asked to repair a broken 8 node RAC cluster on OEL 5.5 with Oracle RAC The system has been moved into a different, more secure network, and its firewalls prevented all access to the machines except for ILO. Another way of “security through obscurity”. The new network didn’t allow any clients to connect to any of the 8 node RAC which means that it is actually quite expensive kit to sit idle. The cluster is not in production, it’s still being build to specification but this accessibility problem has been a holdup to the project for a little while now. Yesterday has been a breakthrough-the netops team found an error to their configuration and for the first time the hosts could be accessed via ssh. Unfortunately for me that access is possible via audited gateways using PowerBroker to which I don’t have access. Read the rest of this entry »

Posted in 11g Release 2, Linux, RAC, RAC Book, War Stories | Leave a Comment »

Oracle Support-final update to SR

Posted by Martin Bach on February 10, 2011

Just had a really pleasent exchange with Oracle support. I was after a way to purge the repository database of an OEM 11.1 Grid Control installation without having to blow it all away. Unfortunately, there is no such option. However, what I liked was this final update from the support member:

Generic Note

From sunny Colorado – blue sky and SNOW! – I do wish we could have provided a better option.

But I do want to thank you so much for your kindness and patience. You are the best kind of customer to work with. That means a lot, in these challenging jobs.

Very best,

The whole SR was well and competently managed by Thom, and at no time did he come up with techniques to buy more time by asking for irrelevant log files or similar. I wish more support staff were like him.

Posted in War Stories | Leave a Comment »

Getting up and running with Universal Connection Pool – Take 2

Posted by Martin Bach on February 4, 2011

In yesterday’s post (which is actually didn’t want to post that day) I wrote about the Universal Connection Pool feature. You should be able to get started with the information I gave you, but it didn’t include any hints on how to have a look under the covers of UCP. This can be changed …Oracle includes very fine-grained logging information with UCP, but experiment show that you have to either use log level FINE or FINEST to get to the real information of what’s going on. Read the rest of this entry »

Posted in 11g Release 2, Performance, RAC | Tagged: , | 2 Comments »