Monthly Archives: February 2011

Quis custodiet ipsos custodes? Nagios monitoring for Grid Control

I have a strange problem with my Grid Control 11.1.0.1.2 Management Server in a Solaris 10 zone. When restarted, the OMS will serve requests fine for about 2 to 4 hours and then “hang”. Checking the Admin Server console I can see that there are stuck threads. The same information is also recorded in the logs.

NB: the really confusing part about Grid Control 11.1 is the use of WebLogic: you thought you knew where the Grid Control logs were? Forget what you knew about 10.2 and enter a different dimension :)

So to be able to react more quickly to a hang of the OMS (or EMGC_OMS1, to be more precise) I set up Nagios to periodically poll the login page.
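What that polling can look like is sketched below. This is a minimal sketch only: it assumes a source install under /usr/local/nagios, and the host name, console port (7799) and logon URL are placeholders for a secured 11.1 console, so adjust them to your environment.

# append a command and service definition for the OMS login page
# (remember to reference oms.cfg via a cfg_file= line in nagios.cfg)
cat >> /usr/local/nagios/etc/objects/oms.cfg <<'EOF'
define command {
    command_name  check_oms_login
    command_line  $USER1$/check_http -I $HOSTADDRESS$ -p 7799 --ssl -u /em/console/logon/logon -t 30
}

define service {
    use                  generic-service
    host_name            oms1                 ; needs a matching host definition
    service_description  EMGC_OMS1 login page
    check_command        check_oms_login
    check_interval       5
}
EOF

# sanity-check the configuration before reloading Nagios
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg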

I’m using a VM with OEL 5.5 64bit to deploy Nagios to; the requirements are very moderate. The install process is well documented in the quickstart guide; I’m using the Fedora instructions as a basis. OEL 5.5 doesn’t have Nagios 3 RPMs available, so I decided to use the source downloaded from nagios.org. The tarballs you need are nagios-3.2.3.tar.gz and nagios-plugins-1.4.15.tar.gz at the time of this writing. Continue reading


GC 11.1 and Monitoring Templates

Throughout the last two weeks I have been working (or rather: trying to work) with Grid Control 11.1 as the central monitoring and deployment solution for my current project.

The plan is to use EMGC 11.1 in conjunction with an 8 node cluster to automatically deploy RAC One Node databases. Please don’t ask about RAC One Node; that wasn’t my decision, and as I understand it the previous project members only chose it as a poor compromise to keep the operations team happy(-ish).

Besides the fact that the OMS, which runs in a Solaris zone, repeatedly “hangs” and can’t be contacted by emcli or any browser (Bug 11804553), RAC One Node is NOT SUPPORTED as a target in Grid Control 11.1. It might be supported in GC 12.1 later in 2011. But I digress.

The Requirement

The OPS team maintains their own 10.2.0.5 management servers. To allow us to perform some testing with the automatic database deployment without messing with a live OMS, it has been decided to install OEM GC 11.1 with PSU 2 locally on Solaris, with a repository database on Linux. We needed GC 11.1 to support our 11.2.0.2 cluster.

After the installation of the OMS I tried to export the required management templates from the live OMS (remember, it’s 10.2.0.5) and import them into 11.1 to save myself a lot of work.

Export a management template

The export function seems to have been introduced in 10.2.0.3 and it works great. All you need to do is hop on the OMS and use “emcli” (the Enterprise Manager Command Line Interface) to log on and export the template. A sample session is shown here:

  • emcli login -username=yourUserName -password=yourPassword
  • emcli export_template -name=TemplateName -target_type=TargetType -output_file=/path/to/templateName.xml

If you are unsure about template names and targets, you can connect to the repository as sysman and query mgmt_templates:

SQL> SELECT TEMPLATE_NAME,TARGET_TYPE FROM MGMT_TEMPLATES;

And so I happily exported the management templates from the 10.2.0.5 OMS.
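If there are many templates to move, the query output and the export command can be combined into a small loop. This is a sketch only; the template names and target types below are invented, so take the real values from the MGMT_TEMPLATES query above.

emcli login -username=sysman -password=yourPassword

# export each "name|target_type" pair listed after the loop
while IFS='|' read -r name type; do
    emcli export_template -name="$name" -target_type="$type" \
          -output_file="/tmp/${name// /_}.xml"
done <<'EOF'
Agent Template|oracle_emd
Database Template|oracle_database
EOF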

The Bad News

Unfortunately, you can’t import non-11.1 templates into an 11.1 OMS. When I tried, I got the following error:

$ emcli import_template -files="emd.10205.xml"
Monitoring template file emd.10205.xml exported from 10.2.0.5.0 OMS can not be imported to 11.1.0.1.0 OMS

Bugger. Sure enough, the XML file has a version tag:

<?xml version = '1.0' encoding = 'UTF-8'?>
<MonitoringTemplate template_name="Agent Template" target_type="oracle_emd" is_public="0" oms_version="10.2.0.5.0" owner="SYSMAN" xmlns="http://www.oracle.com/DataCenter/MonitoringTemp">
...
</MonitoringTemplate>

The solution is to revert to the bad old times and manually compare source and destination, a rather laborious and tiresome way of getting information across. Don’t forget to export the completed template from 11.1 to save yourself from going through that again.

Using wget and proxy to download patches from MOS

This is a rather quick note, but it can be quite useful in certain situations. I currently look after a system which is quite difficult to jump on. That means before I get to do a “sudo su - oracle” I need to get to a jump-off box, ssh to 2 other machines and then log in as myself. It’s secure, but not user friendly, especially in this case where I needed to run the latest RDA for an open support request.

So rather than “dragging” the RDA with me on each box I used the new (Flash) interface to get a small shell script which you just need to deploy to your machine and run. It then connects to updates.oracle.com and does its magic.

The script works mostly fine, but depending on your environment you may have to make small changes. My example is for Solaris 10; any Linux should just work out of the box.
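For reference, the manual equivalent of what the generated script does is roughly the following. This is a sketch only: the proxy address, MOS account and patch file are placeholders, the download URL has to come from the MOS download dialogue, and older wget releases use --http-passwd instead of --http-password.

# make wget aware of the proxy
export http_proxy=http://proxy.example.com:8080
export https_proxy=$http_proxy

# download a patch using your My Oracle Support credentials
wget --http-user="you@example.com" --http-password="yourMOSpassword" \
     --no-check-certificate \
     --output-document=p12345678_112020_SOLARIS64.zip \
     "https://updates.oracle.com/<patch download URL from MOS>"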

Continue reading

RAC One Node and Database Protection

An email from fellow Oak Table Member James Morle about RAC One Node and failover got me thinking about the capabilities of the product.

I have written about RON (RAC One Node) in earlier posts, but haven’t really explored what happens with session failover during a database relocation.

Overview

So to clarify what happens in these two scenarios I have devised a simple test. Taking a RON database plus a service, I modified both to suit my test needs. Connected to the service, I performed a database relocation to see what happens. Next I killed the instance (I wasn’t able to reboot the node) to simulate what happens when the node crashes. Continue reading
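For illustration, the two scenarios boil down to commands along these lines; the database, instance and node names are made up, and the relocate syntax shown is the 11.2.0.2 one.

# planned relocation of the RAC One Node database to another server
srvctl relocate database -d RONDB -n node2 -w 30 -v

# simulate a crash by killing PMON of the running instance (here RONDB_1)
kill -9 $(pgrep -f ora_pmon_RONDB_1)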

Error message of the day: OUI-25023 and the FQDN

It’s been a long day with many problems around a Grid Control installation, including (but not limited to) corruption of the repository database, bugs in OUI when it comes to deinstalling the Oracle Management Server, lots of files left over by the WebLogic “uninstall.sh” script, and many more. Some of the error messages were quite misleading, and OUI-25023 was just one too many. What happened?

Earlier today I was trying to install the 64bit 11.1.0.1 agent on an 8 node cluster. After an initial headache (see below) it worked ok. However, I couldn’t resist mentioning OUI-25023. Here’s the complete story.

I downloaded the 11.1 agent for Linux x86-64 as per the GC 11.1 documentation and deployed it to my freshly installed management server. The OMS is on Solaris SPARC, and Grid Control doesn’t supply agents for a different platform. However, the security experts have locked down the oracle account on the cluster, which ruled out the “agent push” scenario. I then opted for the installation via a response file, as described in the documentation. Continue reading
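A silent, response-file driven agent install boils down to something like the following. The staging path and the response file name are assumptions; take the sample response file shipped with the agent software and fill it in as described in the documentation.

# a sketch only: stage directory and response file are placeholders
cd /u01/stage/linux_x64/agent
./runInstaller -silent -waitforcompletion \
    -responseFile /u01/stage/agent_install.rsp \
    -invPtrLoc /etc/oraInst.loc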

Running Grid Control Agent commands standalone

I had an error message today from one of my grid agents which was cut short in the GUI just when it became interesting. So I thought of a way of running the command on the command line to get the full output.

This has been a little easier than I thought. I based my approach on an earlier blog article on my knowledgebase to get the Perl environment variables set. I then needed to figure out where some of the libraries (Perl scripts ending in *.pm) the agent script refers to were located.

A simple “locate -i *pm | grep $ORACLE_HOME” did it. This enabled me to write a preliminary script to run an EM agent task, shown below. It expects that you have run “oraenv” previously to set the environment to the AGENT_HOME; when referring to ORACLE_HOME in the following, the AGENT_HOME is meant. It takes the full path of the script to be executed as its parameter and checks that ORACLE_HOME and $1 exist. Continue reading
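The script itself is behind the link; a minimal sketch of what such a wrapper can look like is shown here. The PERL5LIB directories are assumptions, so verify them against the locate output on your system.

#!/bin/bash
# run an EM agent perl script standalone; assumes oraenv has set ORACLE_HOME
# to the AGENT_HOME and that $1 is the full path to the script to execute
[ -z "$ORACLE_HOME" ] && { echo "ORACLE_HOME not set - run oraenv first" >&2; exit 1; }
[ -z "$1" ] && { echo "usage: $0 /full/path/to/script.pl [args]" >&2; exit 1; }
[ -f "$1" ] || { echo "$1 does not exist" >&2; exit 1; }

# the agent's bundled perl libraries plus the *.pm files found via locate
export PERL5LIB=$ORACLE_HOME/perl/lib:$ORACLE_HOME/perl/lib/site_perl:$ORACLE_HOME/sysman/admin/scripts

exec "$ORACLE_HOME"/perl/bin/perl "$@"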

11.1 GC agent refuses to start

This is a follow-up post to my previous tale of how not to move networks for RAC. After having successfully restarted the cluster as described here in a previous post I went on to install a Grid Control 11.1 system. This was to be on Solaris 10 SPARC. Why SPARC? Not my platform of choice when it comes to Oracle software, but my customer has a huge SPARC estate and wants to make the most of it.

After the OMS had been built (hopefully I’ll find time to document this, as it can be quite tricky on SPARC) I wanted to secure the agents on my cluster against it. That worked OK for the first node:

  • emctl clearstate agent
  • emctl secure agent
  • emctl start agent

Five minutes later the agent appeared in my list of agents in the Grid Control console. Encouraged by this success I went to do the same on the next cluster node.

Here things were different. Here’s the sequence of commands I used:

$ emctl stop agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved
$

Continue reading