Monthly Archives: October 2011

Troubleshooting Oracle agent

As you may have read on this blog, I recently moved from Oracle Enterprise Manager 11.1 Grid Control to full Cloud Control: 12.1 has taken its place in the lab.

I also managed to install agents via self download (my OEM installation is x86 to reduce the footprint) on a two-node cluster: rac11203node1 and rac11203node2. After a catastrophic crash of both nodes followed by a reboot, neither agent wanted to report back to the OMS.

The difference

Oracle 12.1 has a new agent structure: the way the agent base directory was used in previous releases to create the AGENT_HOME has changed. In 11.1 I could specify the agent base as /u01/app/oracle/product, and OUI would deploy everything into a subdirectory it created, called agent11g (or agent10g for 10.2.x).

Now I set the agent base to the same value and installed my agents in parallel, but found that there is no agent12c directory under the base. Instead I found these:

[oracle@rac11203node1 product]$ ls -l
total 48
drwxr-xr-x. 73 oracle oinstall  4096 Oct 27 22:40
-rw-rw-r--.  1 oracle oinstall    91 Sep 23 08:52
drwxr-xr-x.  6 oracle oinstall  4096 Oct 28 14:57 agent_inst
drwxr-xr-x.  3 oracle oinstall  4096 Oct 15 21:35 core
drwx------.  2 oracle oinstall 16384 Oct 14 21:02 lost+found
drwxr-xr-x.  8 oracle oinstall  4096 Oct 15 21:50 plugins
-rwxr-xr-x.  1 oracle oinstall   223 Oct 15 21:25 plugins.txt
-rw-r--r--.  1 oracle oinstall   298 Oct 15 21:42 plugins.txt.status
drwxr-xr-x.  5 oracle oinstall  4096 Oct 15 21:43 sbin

So it’s all a bit different. The core/ directory contains the agent binaries, and the agent_inst directory contains the sysman directory, where all the configuration and state information is stored. In that respect the sysman directory is the same as in 11.1.

Now back to my problem: both agents that previously worked fine were reported as “unavailable”. The agent information is no longer in its accustomed place under Setup, Agents, Management Agents.

For 12.1 you need to navigate to Setup > Agents from the drop-down menu in the upper right corner. This takes you to the agent overview page. OK, so I could see the agents weren’t communicating with the OMS.

On the machine I could see this:

[oracle@rac11203node1 log]$ emctl status agent
Oracle Enterprise Manager 12c Cloud Control
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
Agent Version      :
OMS Version        : (unknown)
Protocol Version   :
Agent Home         : /u01/app/oracle/product/agent_inst
Agent Binaries     : /u01/app/oracle/product/core/
Agent Process ID   : 13270
Parent Process ID  : 13215
Agent URL          : https://rac11203node1.localdomain:3872/emd/main/
Repository URL     : https://oem12oms.localdomain:4901/empbs/upload
Started at         : 2011-10-26 18:30:17
Started by user    : oracle
Last Reload        : (none)
Last successful upload                       : (none)
Last attempted upload                        : (none)
Total Megabytes of XML files uploaded so far : 0
Number of XML files pending upload           : 1,858
Size of XML files pending upload(MB)         : 8.05
Available disk space on upload filesystem    : 49.16%
Collection Status                            : Collections enabled
Last attempted heartbeat to OMS              : 2011-10-27 15:42:47
Last successful heartbeat to OMS             : (none)

Agent is Running and Ready

The settings are correct; I verified that against another agent which uploads fine. I have also secured the agent, and both $AGENT_BASE/agent_inst/sysman/log/secure.log and the emctl secure agent command reported normal, successful operation.
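For reference, the secure-and-check sequence looks roughly like this. This is a sketch only: the agent base path follows my layout shown above, and emctl prompts for the agent registration password.

```shell
# Sketch: re-secure the agent and verify the outcome.
# AGENT_BASE is an assumption matching the layout above; adjust as needed.
AGENT_BASE=/u01/app/oracle/product

# Secure the agent against the OMS (prompts for the registration password)
$AGENT_BASE/agent_inst/bin/emctl secure agent

# Check the secure log for errors...
tail -n 20 $AGENT_BASE/agent_inst/sysman/log/secure.log

# ...and confirm the agent reports an HTTPS URL
$AGENT_BASE/agent_inst/bin/emctl status agent | grep -i "Agent URL"
```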

Still the stubborn thing doesn’t want to talk to the OMS: in the agent overview page both agents are listed as “unavailable”, but not blocked. When I force an upload, I get this:

[oracle@rac11203node1 log]$ emctl upload
Oracle Enterprise Manager 12c Cloud Control
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
EMD upload error:full upload has failed: uploadXMLFiles skipped :: OMS version not checked yet. If this issue persists check trace files for ping to OMS related errors. (OMS_DOWN)

However it’s not down; I can reach it from another agent (which happens to be on the same box as the OMS):

[oracle@oem12oms]$ $ORACLE_HOME/bin/emctl status agent
Oracle Enterprise Manager 12c Cloud Control
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
Agent Version      :
OMS Version        :
Protocol Version   :
Agent Home         : /u01/gc12.1/agent/agent_inst
Agent Binaries     : /u01/gc12.1/agent/core/
Agent Process ID   : 2964
Parent Process ID  : 2910
Agent URL          : https://oem12oms.localdomain:3872/emd/main/
Repository URL     : https://oem12oms.localdomain:4901/empbs/upload
Started at         : 2011-10-15 21:00:37
Started by user    : oracle
Last Reload        : (none)
Last successful upload                       : 2011-10-27 15:46:38
Last attempted upload                        : 2011-10-27 15:46:38
Total Megabytes of XML files uploaded so far : 0
Number of XML files pending upload           : 0
Size of XML files pending upload(MB)         : 0
Available disk space on upload filesystem    : 49.16%
Collection Status                            : Collections enabled
Last attempted heartbeat to OMS              : 2011-10-27 15:48:34
Last successful heartbeat to OMS             : 2011-10-27 15:48:34

Agent is Running and Ready

And no, the firewall is turned off, and I can connect to the upload URL from any machine on the network:

[oracle@rac11203node1 log]$ wget --no-check-certificate https://oem12oms.localdomain:4901/empbs/upload
--2011-10-27 15:55:46-- https://oem12oms.localdomain:4901/empbs/upload
Resolving oem12oms.localdomain...
Connecting to oem12oms.localdomain||:4901... connected.
WARNING: cannot verify oem12oms.localdomain’s certificate, issued by “/O=EnterpriseManager on oem12oms.localdomain/OU=EnterpriseManager on oem12oms.localdomain/L=EnterpriseManager on oem12oms.localdomain/ST=CA/C=US/CN=oem12oms.localdomain”:
Self-signed certificate encountered.
HTTP request sent, awaiting response... 200 OK
Length: 314 [text/html]
Saving to: “upload.1”

100%[======================================>] 314 --.-K/s in 0s

2011-10-27 15:55:46 (5.19 MB/s) - “upload.1” saved [314/314]

The agent complains about this in gcagent.log:

2011-10-27 15:56:08,947 [37:3F09CD9C] WARN - improper ping interval (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,471 [167:E3E93C4C] WARN - improper ping interval (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,472 [167:E3E93C4C] WARN - Ping protocol error [OMS sent an invalid response: “BACKOFF::180000”]

At least someone in Oracle has some humour when it comes to this.

The Solution

Now I dug around a lot more and finally got to the bottom of it. It was actually a twofold problem. The first agent was simply blocked; after finding a way to unblock it, it worked happily.

The second agent was a bit more trouble. I tried to unblock it as well from the agent page in OEM, which failed. As it turned out, the agent was shut down. And it didn’t start either:

[oracle@rac11203node2]$ emctl start agent
Oracle Enterprise Manager 12c Cloud Control
Copyright (c) 1996, 2011 Oracle Corporation.  All rights reserved.
Starting agent ............. failed.
Target Metadata Loader failed at Startup
Consult the log files in: /u01/app/oracle/product/agent_inst/sysman/log

I checked the logs and found this interesting bit of information:

2011-10-24 21:35:21,387 [1:3305B9] INFO - Plugin oracle.sysman.oh is now active
2011-10-24 21:35:21,393 [1:3305B9] INFO - Plugin oracle.sysman.db is now active
2011-10-24 21:35:21,396 [1:3305B9] WARN - Agent failed to Startup for Target Metadata Loader in step 2
oracle.sysman.gcagent.metadata.MetadataLoadingException: The targets.xml file is empty
at oracle.sysman.gcagent.metadata.MetadataManager$Loader.validateMetadataFile(
at oracle.sysman.gcagent.metadata.MetadataManager$RegistryLoader.processMDFile(
at oracle.sysman.gcagent.metadata.MetadataManager$RegistryLoader.readRegistry(
at oracle.sysman.gcagent.metadata.MetadataManager$RegistryLoader.load(
at oracle.sysman.gcagent.metadata.MetadataManager.load(
at oracle.sysman.gcagent.metadata.MetadataManager.runStartupStep(
at oracle.sysman.gcagent.metadata.MetadataManager.tmNotifier(
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.invokeNotifier(
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.invokeInitializationStep(
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.doInitializationStep(
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.notifierDriver(
at oracle.sysman.gcagent.tmmain.TMMain.startup(
at oracle.sysman.gcagent.tmmain.TMMain.agentMain(
at oracle.sysman.gcagent.tmmain.TMMain.main(
2011-10-24 21:35:21,397 [1:3305B9] INFO - Agent exiting with exit code 55
2011-10-24 21:35:21,398 [31:F9C26A76:Shutdown] INFO - *jetty*: Shutdown hook executing
2011-10-24 21:35:21,399 [31:F9C26A76] INFO - *jetty*: Graceful shutdown SslSelectChannelConnector@
2011-10-24 21:35:21,399 [31:F9C26A76] INFO - *jetty*: Graceful shutdown ContextHandler@14d964af@14d964af/emd/lifecycle/main,null

I have yet to find the reason for the empty targets.xml file, but sure enough it existed with 0 bytes length.

Simple enough, I thought: all I need to do is run agentca to repopulate the file. Unfortunately I couldn’t find it:

[oracle@rac11203node2 emd]$ find /u01/app/oracle/product/ -name "agentca*"
[oracle@rac11203node2 emd]$

This was a bit of a letdown. Then I decided to create a new targets.xml file and try a resynchronisation of the agent. The resynchronisation option is well hidden: look for it in the Agent menu on the agent’s home page.

The only element that went into targets.xml was “<targets />”. This was sufficient to start the agent, which is a requirement for the resynchronisation to succeed. I was quite amazed, but the resynchronisation did succeed.
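Recreating the file is essentially a one-liner. A minimal sketch, assuming the agent instance home from my layout above (adjust the path to your installation):

```shell
# Recreate a minimal targets.xml so the agent can start again.
# AGENT_INST is an assumption; point it at your agent instance home.
AGENT_INST=${AGENT_INST:-/u01/app/oracle/product/agent_inst}
mkdir -p "$AGENT_INST/sysman/emd"

# An empty but well-formed target list is all the agent needs to start
printf '<targets />\n' > "$AGENT_INST/sysman/emd/targets.xml"

# With the file in place, start the agent, then resynchronise it
# from the console:
# $AGENT_INST/bin/emctl start agent
```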

This was very encouraging, and both agents are now working properly.

Move the EM12c repository database

I have made a little mistake in creating a RAC database for the OEM 12c repository: I now need a more lightweight solution, especially since I’m going to do some fancy failover testing with this cluster soon! A single-instance database without ASM, that’s what I’ll have!

Now how to move the repository database? I have to admit I haven’t done this before, so the plan I came up with is:

  1. Shut down the OMS
  2. Create a backup of the database
  3. Transfer the backup to the destination host
  4. Restore database
  5. Update OEM configuration
  6. Start OMS
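The steps above might be sketched as follows. Host names, staging paths and the connect string are all assumptions for illustration, and step 4 is a standard RMAN restore which I have omitted:

```shell
# Sketch of the migration plan; all names and paths are placeholders.
# 1. Stop the OMS so nothing writes to the repository
$OMS_HOME/bin/emctl stop oms -all

# 2. Back up the repository database on the current host
rman target / <<'EOF'
backup database format '/u01/stage/emrep_%U' plus archivelog;
EOF

# 3. Transfer the backup pieces to the destination host
scp /u01/stage/emrep_* newhost:/u01/stage/

# 4. Restore and recover the database on the destination
#    (standard RMAN restore, omitted here for brevity)

# 5. Point the OMS at the relocated repository
$OMS_HOME/bin/emctl config oms -store_repos_details \
    -repos_conndesc "//newhost:1521/emrep" -repos_user sysman

# 6. Start the OMS again
$OMS_HOME/bin/emctl start oms
```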

Continue reading

The tale of restoring the OCR and voting files on Linux for RAC

As part of a server move from one data centre to another I enjoyed working in the depths of Clusterware. This was a rather simple case though: the public IP addresses were the only part of the package to change. One caveat was the recreation of the disk group I am using for the OCR and three copies of the voting file. I decided to rely on the backups I took before the server move.

Once the kit had been rewired in the new data centre, it was time to get active. The /etc/multipath.conf file had to be updated to add the new LUNs for my +OCR disk group. I have described the process in a number of articles before.

A few facts before we start:

  • Oracle Enterprise Linux 5.5 64bit
  • device-mapper-multipath-0.4.7
  • Grid Infrastructure (actually the Oracle Database SAP Bundle Patch)
  • ASMLib

I have already described how to restore the OCR and voting files in “Pro Oracle Database RAC 11g on Linux”, but since then the procedure has changed slightly, so I thought I’d add this here. The emphasis is on “slightly”. Continue reading
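In outline, the 11.2 restore sequence is something like this. A sketch only: it assumes the +OCR disk group has been recreated and that a recent automatic OCR backup exists; the backup path and cluster name are placeholders, and everything runs as root.

```shell
# Hedged sketch of an 11.2 OCR/voting file restore (run as root).
crsctl stop crs -f                # stop Clusterware on all nodes
crsctl start crs -excl -nocrs     # start one node in exclusive mode, no CRSD

# Restore the OCR from an automatic backup (path is a placeholder)
ocrconfig -restore $GRID_HOME/cdata/cluster1/backup00.ocr

# The voting files follow the OCR into the disk group
crsctl replace votedisk +OCR

crsctl stop crs -f                # leave exclusive mode...
crsctl start crs                  # ...then start normally on all nodes
```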

The art of getting security right-an observation

A number of high-profile hacks, recent and not so recent, have caught my attention. Well, I thought, not such a big problem: I don’t have a PS3 and hence don’t have an account that could be hacked. I was still intrigued that the hackers managed to get hold of the passwords. I may be wrong here, as I haven’t followed the developments closely enough (I wasn’t affected), but the question I asked myself was: how could they be obtained? Surely Sony must have used some sort of encryption for the passwords. It’s far-fetched that anybody would store passwords in clear text somewhere!

Oh well then, Sony has been targeted a number of times, and time and time again the security was breached. The only consolation is that the intruders made it very public when they were successful; otherwise we’d never have learned about the problems Sony has with security.

Now other sites were hacked as well, and somehow I felt the impact coming closer.


Oh well, the world is a bad place and the bad guys are way ahead of the good ones, I thought, for as long as I’m not affected… That held until today, when the ISP and infrastructure provider I host my lab with sent out an email saying their systems had been compromised, and that every customer should change all the passwords they had used with the administrative, web-based interface as well as the accounts on the servers themselves. I had been very happy with Hetzner, as their EQ8 server offering was a system I used extensively.

What can I say? I’m not impressed. Again, how can passwords be stored in a system in a way that makes it so easy to compromise them? Was that an Excel sheet? Why can’t passwords be sensibly encrypted so they are just garbage to intruders? I think a global standard has to be put in place, similar to the PCI standard, which makes password protection with strong algorithms mandatory. Better still, failing to do so should be fined. In a way that hurts.
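To illustrate the point: strictly speaking what’s wanted is a salted one-way hash rather than encryption, and storing one is a one-liner with modern tools. A sketch using OpenSSL (the salt and password are examples; the -6 option needs a reasonably recent OpenSSL, 1.1.1 or later):

```shell
# Store a salted SHA-512 crypt hash, never the clear-text password.
# Salt and password here are examples only.
HASH=$(openssl passwd -6 -salt 'q1w2e3r4' 'correct horse battery staple')
echo "stored: $HASH"

# To verify a login attempt, hash the supplied password with the stored
# salt and compare; the clear text never needs to be kept anywhere.
ATTEMPT=$(openssl passwd -6 -salt 'q1w2e3r4' 'correct horse battery staple')
[ "$HASH" = "$ATTEMPT" ] && echo "password matches"
```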

For those who are interested, the website has the latest. I was considering moving some of my other domains over to them but may have to rethink that strategy.

Clear text passwords in email

But it’s not only the careless storage of sensitive information on one’s own systems. How many times did you get emails stating “welcome to service xyz, your username is abc and your password is def”? They might as well send your bank details too, including credit card numbers and expiration dates.

There has to be a wakeup call in the industry: it is far too simple to outsmart you! Do use strong encryption to protect customer data and identities. Failing to do so can, and maybe one day will, cripple the online business of many companies, causing so-called financial analysts to spread panic and sell lots of shares, plunging economies into difficult times.

Finally {}

Passwords aren’t a good enough solution for protecting identity and access to one’s accounts. IMO there should be better ways of preventing unauthorised access to your confidential data. What about a fingerprint reader? Or an iris scan? It sounds like James Bond at the moment, but if we are to trust the infrastructure again, we need to think of alternatives to passwords. To be secure, passwords have to be long and clumsy, hard to memorise, so you end up using one for almost everything. Also, rootkits undermine your home PC and can make almost all online banking a very dangerous game. Trojans are able to undermine the security of the iTAN system, yet my bank, HSBC, doesn’t offer one of the only safe options for Internet banking: HBCI. Are we just too naïve? A 16-year-old schoolboy from Germany performed a “safety audit” of many German banks’ online applications and found that most of them were insecure (XSS being the main problem).

Post Scriptum

If anyone knows a reliable, responsible service provider where I can move my domains to, please get in touch!

Installing Oracle Enterprise Manager 12c on OL 5.7

I have been closely involved in the upgrade discussion of my current customer’s Enterprise Manager setup from an engineering point of view. The client uses OEM extensively for monitoring; alerts generated by it are automatically forwarded to an IBM product called Netcool.

Now some of the management servers are still on in certain regions, and for a private cloud project I was involved in, an 11.1 system was needed. The big question was: wait for 12.1 or upgrade to 11.1?

So, to cut a long story short, I was very keen to get onto the OEM 12c beta programme, but unfortunately wasn’t able to make it. Also, I wasn’t at OpenWorld this year, which means I didn’t get to see any of the demos. You can imagine I was quite curious to get my hands on it, and when it was released a few days ago I downloaded it to my lab machine. I created a new domU for the database, plus the latest PSU, and another one for the management server. I assigned 2 CPUs each; the database server got 2G of memory while the OMS received 8G. Don’t take this as a recommendation though, it’s only for lab use! I wouldn’t use less than 24G of memory for a production management server, and it would obviously follow the MAA recommendations and be installed behind an enterprise-grade load balancer etc. Needless to say, I’d use RAC plus Data Guard for the repository database.

Continue reading

Adding another node for RAC on Oracle Linux 6.1 with kernel-UEK

As I hinted at in my last post about installing Oracle on Oracle Linux 6.1 with Kernel UEK, I have planned another article about adding a node to a cluster.

I deliberately started the installation of my RAC system with only one node, to allow my moderately spec’d hardware to cope with a second cluster node. In previous versions of Oracle there was a problem with node additions: the $GRID_HOME/oui/bin/ script did pre-requisite checks that used to fail when you had used ASMLib. Unfortunately, due to my setup I couldn’t test whether that has been solved (I didn’t use ASMLib).


As with many cluster operations on non-Exadata systems, you should use the cluvfy tool to ensure that the system you want to add to the cluster meets the requirements. Here’s an example session of the cluvfy output. Since I am about to add a node, the stage has to be “-pre nodeadd”. rac11203node1 is the active cluster node, and rac11203node2 is the one I want to add. Note that you run the command from any existing node, specifying the nodes to be added with the “-n” parameter. For convenience I have added the “-fixup” option to generate fixup scripts if needed. Also note that this is a lab environment; real production environments would use dm-multipath for storage and a bonded pair of NICs for the public network. You no longer need to bond your private NICs, though: Oracle now does that for you.
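Putting that together, the invocation looks like this (node names from my lab as above):

```shell
# Run from an existing cluster node as the grid software owner.
# rac11203node2 is the node to be added; -fixup generates fixup scripts.
cluvfy stage -pre nodeadd -n rac11203node2 -fixup -verbose
```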

Continue reading