Martins Blog

Trying to explain complex things in simple terms

RAC and HA SIG meeting, Royal Institute of British Architects, September 2011

Posted by Martin Bach on September 22, 2011

I have been looking forward to the RAC & HA SIG for quite some time. Unfortunately I wasn’t able to make the spring meeting, which must have been fantastic. For those who haven’t heard about it, this was the last time the SIG met under its current name, as Dave Burnham, the chair, pointed out in his welcome note.

The RAC & HA SIG is going to merge with the management & infrastructure SIG to form the availability management and infrastructure SIG, potentially reducing the number of meetings to 3 for the combined SIG. This will hopefully increase the number of attendees and broaden the range of topics. I am looking forward to the new format and am hoping for greater appeal.

Partly down to the transport problems that hit London today (the Victoria Line was severely delayed and apparently overground services were impacted as well), the number of attendees was lower than expected.

The following are notes I have taken during the sessions, and as I’m not the best multi-tasking person in the world there may be some grammatical errors and typos in this post for which I apologise in advance.

Support Update - Phil Davies

The first presentation was Phil Davies’s support update, which provided the usual good overview of what is currently relevant in Oracle support. My personal highlight was the fact that you can limit the number of child cursors per statement via an underscore parameter. This worked well for his customer, who had to use CURSOR_SHARING set to FORCE.

Also, there is an interesting problem related to Data Guard, the RFS process and the overwriting of arbitrary files on the standby.

Plugging in the Database Machine - Joel Goodman

Joel delivered a very good presentation about monitoring the Exadata Database Machine. What’s great about Joel is his depth of knowledge and his ability to enrich the presentation with annotations from both the classroom and real life. If you haven’t bookmarked his blog yet, it’s well worth doing so from http://dbatrain.wordpress.com/ .

I personally have seen this presentation internally at a customer site, but still learned new things, especially about the SNMP traps being routed back into the MS process on the cells, which can then be checked via cellcli.

Every so often the current metrics are flushed to disk, and move from metriccurrent to metrichistory. The metric history is kept for 7 days by default, and I think I’ll look at extending my monitoring solution to load it into a statspack-like schema in the database.

Another interesting fact to know is that ADR is also available on the cells, including the adrci command with all its options.

Plug-ins for OEM 11.1 include:

  • Infiniband plug-in
  • Cisco plug-in
  • ILOM (only for database nodes)
  • Exadata plug-in
  • PDU plug-in
  • KVM plug-in

Of course for these to work you have to install the agents on the database servers (only!). Once the agent is deployed, the plug-ins need to be deployed to the Grid Control infrastructure first before they are passed on to the agents.

The Exadata plug-in requires the database server’s agent software owner to use passwordless authentication to the cells’ cellmonitor accounts. Also, the cells must be configured to report SNMP traps to Grid Control. I guess a thorough read of the plug-in installation documentation might be needed.

I personally regarded the other plug-ins as less important and decided not to record them here. I’m sure there is a white paper on Oracle’s website somewhere.

An interesting side note on the KVM, which is missing in the X2-8, is that you could still access the KVM if the internal Cisco switch failed. This is simply because the KVM does NOT go through the Cisco Ethernet switch, but rather connects directly to the corporate network.

High availability for the agent monitoring a target is described in MOS note 1110675.1, a sure candidate for further investigation.

After Joel’s presentation we had a great discussion with Sally and Jason about disk failures in Exadata and the quarantine. In certain situations, if multiple disks fail only high redundancy can prevent complete disaster.

Also, one should really be careful not to have negative numbers in V$ASM_DISKGROUP.USABLE_FILE_MB. If you do, it’s not an immediate problem, but it becomes an imminent danger as soon as a failgroup goes offline: there simply isn’t enough space for an ASM rebalance operation. Summary: to avoid trouble, you should not run your ASM mirrored disk group at full capacity. Oh yes, and you should have at least 3 failgroups in a normal redundancy diskgroup.
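To make the USABLE_FILE_MB warning a bit more concrete, here is a minimal Python sketch of the arithmetic as I understand it from the documentation: usable space is the free space minus the space needed to re-mirror after losing a failgroup, divided by the number of mirror copies. The function names and sample figures are made up for illustration; the real inputs are the FREE_MB and REQUIRED_MIRROR_FREE_MB columns of V$ASM_DISKGROUP.

```python
# Number of copies ASM keeps of each extent, keyed by the redundancy
# type as reported in V$ASM_DISKGROUP.TYPE.
MIRROR_COPIES = {"EXTERN": 1, "NORMAL": 2, "HIGH": 3}


def usable_file_mb(free_mb, required_mirror_free_mb, redundancy):
    """Approximate USABLE_FILE_MB from the other two columns.

    Assumption: usable = (free - required mirror free) / mirror copies.
    """
    copies = MIRROR_COPIES[redundancy]
    return (free_mb - required_mirror_free_mb) / copies


def can_remirror_after_failure(free_mb, required_mirror_free_mb, redundancy):
    """True if there is still room to restore redundancy after a failure."""
    return usable_file_mb(free_mb, required_mirror_free_mb, redundancy) >= 0


if __name__ == "__main__":
    # Hypothetical figures, not taken from a real system.
    print(usable_file_mb(1000, 400, "NORMAL"))
    print(can_remirror_after_failure(300, 400, "NORMAL"))
```

With 300 MB free but 400 MB required for re-mirroring the result is negative: everything keeps working until a failgroup goes offline, and then the rebalance has nowhere to go.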

I suggest you read Joel’s blog entry “mirror mirror on the Exadata” for a more thorough discussion of ASM mirroring in Exadata.

Exadata Storage and Administration - Corrado Mascioli

Corrado is a colleague of mine working in engineering on the same site. He has a lot of experience in patching Exadata and automating the process.

The cells are shipped with the software pre-installed, based on Oracle Linux. The most important accounts available are:

  • root
  • celladmin
  • cellmonitor

These have various degrees of power, listed here in descending order.

Cellcli is the main interface to the storage cell allowing the user to perform administrative tasks.

The main cell processes are:

  • CELLSRV: mainly uses iDB to communicate with the RDBMS nodes and satisfies the I/O requests.
  • Management Server – MS
  • Restart Server – RS

Flash storage is something I blogged about earlier.

Flash disks can be used either as Exadata Smart Flash Cache or Grid Disks, i.e. “ASM disks”. I haven’t created flash grid disks yet but suppose you would want to group the grid disks on a cell group to create failure groups.

David Burnham raised an interesting question about differentiating the flash cache in Exadata from the one available to the mere mortals, available with a patch or 11.2.0.x on Linux and Solaris.

The PCI cards you put into a database server are like another level of buffer cache, whereas the Exadata Smart Flash Cache is a) unique to Exadata and b)

Next Corrado explained the link between the physical disk, LUN, cell disk and grid disks. In particular, the 30G taken away from the first 2 cell disks causes an interesting dilemma when it comes to the allocation of space for the DBFS disk group (formerly SYSTEMDG). For each cell, cell disks 3-12 reserve the last 30G on the innermost tracks of the disk for DBFS_DG.

The DATA diskgroup will by default use the fastest, outermost tracks of the disks, RECO will take the middle of the disk, whereas DBFS_DG uses the innermost tracks, as I just said.

DBFS_DG is mostly used for the database file system but also for the OCR and the voting files. DBFS looks like a normal file system to the end user.
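As a back-of-the-envelope check of how much raw space that reservation adds up to, here is a small Python sketch. It only uses the figures quoted above (12 cell disks per cell, the first 2 giving up 30G to the system area, the other 10 reserving 30G each for DBFS_DG); the function name is mine, and the real slice sizes on a given cell may differ slightly from the nominal 30G.

```python
# Nominal figures from the talk; treat them as assumptions.
CELL_DISKS_PER_CELL = 12
SYSTEM_DISKS_PER_CELL = 2   # first two cell disks hold the system area
RESERVED_GB_PER_DISK = 30   # slice reserved on each remaining cell disk


def dbfs_dg_raw_gb(cells):
    """Raw (unmirrored) space available to DBFS_DG across all cells."""
    disks_per_cell = CELL_DISKS_PER_CELL - SYSTEM_DISKS_PER_CELL
    return cells * disks_per_cell * RESERVED_GB_PER_DISK


if __name__ == "__main__":
    for cells in (3, 7, 14):
        print(cells, "cells:", dbfs_dg_raw_gb(cells), "GB raw")
```

Remember this is raw space: with ASM normal redundancy roughly half of it is usable, less whatever has to stay free for re-mirroring.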

I wonder if you could create a grid disk on a specific number of cell disks? I’d have to check the create griddisk command in cellcli …

All the settings are easily accessible with the cellcli commands list {lun,physicaldisk,celldisk,griddisk}.

The grid disks are visible to ASM using the CELL library (V$ASM_DISK.LIBRARY), and use the path o/<cell IP>/<griddisk name>. By the nature of the technology all 14 x 12 disks of a full rack are visible in V$ASM_DISK.

Each cell is its own failure group, which makes sense given that all disks in a cell share a single point of failure. It is also worth remembering that there is no storage array mirroring, hence we resort to ASM redundancy.
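To illustrate how the grid disk path ties a disk to its cell, and therefore to its failure group, here is a hedged Python sketch. It assumes the path prefix is the letter o (as in the usual Exadata disk string o/*/*), and the IP addresses and grid disk names below are invented for the example. In real life you would query V$ASM_DISK rather than parse strings.

```python
from collections import defaultdict


def parse_griddisk_path(path):
    """Split 'o/<cell IP>/<griddisk name>' into (cell_ip, griddisk_name)."""
    parts = path.split("/", 2)
    if len(parts) != 3 or parts[0] != "o" or not parts[1] or not parts[2]:
        raise ValueError(f"not a cell grid disk path: {path!r}")
    return parts[1], parts[2]


def group_by_cell(paths):
    """Group grid disk names by cell, mirroring one failure group per cell."""
    groups = defaultdict(list)
    for path in paths:
        cell_ip, griddisk = parse_griddisk_path(path)
        groups[cell_ip].append(griddisk)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    sample = [
        "o/192.168.10.3/DATA_CD_00_cell01",
        "o/192.168.10.3/DATA_CD_01_cell01",
        "o/192.168.10.4/DATA_CD_00_cell02",
    ]
    for cell, disks in group_by_cell(sample).items():
        print(cell, disks)
```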

Corrado shared lots of practical advice about creating grid disks and reconfiguring a storage cell using cellcli.

Panel Session - all speakers, and The private cloud - Martin Bach

Well, that’s me in the middle of the action. I hope someone else covers these.

Managing ASM redundancy - Julian Dyke

Julian started the RAC SIG in summer 2004, so it was only fitting that he had the honour of the last slot under the current name.

His opening theme was a comparison of single-threaded CPU performance for different architectures including Intel, AMD, SPARC, and IBM Power. Contact him personally if you are interested in the real results; suffice it to say that the 5600 Xeons are the fastest. I wonder if anyone with a recent Itanium processor is willing to run the benchmark.

An interesting twist concerned a two node cluster and ASM tablespace creation with different allocation units: setting the AU size to 32M reduced the creation time for a 20G tablespace to a few seconds. There seems to be a lot of inter-ASM instance message exchange.
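A rough way to see why the AU size could matter here is to count the allocation units ASM has to hand out for the tablespace. The Python sketch below does only that arithmetic. It assumes a 1M AU as the baseline (the usual ASM default, not something stated in the talk) and ignores mirroring and variable extent sizes, so it says nothing about the actual timings.

```python
def allocation_units(size_gb, au_mb):
    """Number of allocation units needed for size_gb at an AU size of au_mb."""
    size_mb = size_gb * 1024
    # Ceiling division: a partially filled AU still has to be allocated.
    return -(-size_mb // au_mb)


if __name__ == "__main__":
    small = allocation_units(20, 1)    # assumed baseline AU size of 1M
    large = allocation_units(20, 32)   # AU size used in the demo
    print(small, large, small // large)
```

Fewer allocation units means fewer allocation requests to coordinate, which would fit the observation about message exchange between the ASM instances.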

Following this, Julian continued with a discussion of the ASM utility kfed, used to dump disk group metadata, followed by a graphic visualisation of extent allocation and maintenance during ASM rebalance operations.

The nugget for today was to learn why the ACD uses 42 entries and why the maximum ASM_POWER_LIMIT used to be 11. 42 is self-explanatory. The power limit of 11 is a true classic: it’s one faster (louder), in honour of Spinal Tap.

There are a number of new features in 11.2 that Julian mentioned, which I covered in Chapter 8 of “Pro Oracle Database 11g RAC on Linux”. Of particular interest was the location of the 3rd voting file in stretched and non-stretched RAC if you use two different SANs for the first 2 failgroups. Oracle now supports the iSCSI/NFS approach for the third voting file, previously recommended for stretched RAC, in “normal” RAC as well.

One of the features Julian didn’t mention was the location of the snapshot controlfile, which since 11.2.0.2 also has to reside on shared storage. There is a note on MOS for this.

Oh yes, and then there was the demo about ASM normal redundancy which continued the discussion started at the last RAC SIG.

Summary

I enjoyed today a lot, met lots of interesting people and had many technical discussions about all sorts of things. One thing I’m looking forward to is the change of the SIG format, especially hoping for more attendees to make it even more attractive.

Unfortunately, due to the change of date for the Management and Infrastructure SIG, which I will now miss, I won’t be able to see Piet de Visser, whom I haven’t seen since Birmingham last year. Maybe I need to work on the same site as he does for a few weeks to catch up properly.

4 Responses to “RAC and HA SIG meeting, Royal Institute of British Architects, September 2011”

  1. Martin,

    >>> An interesting side note on the KVM which is missing in the x2-8 is the fact that you could still access the KVM if the internal Cisco switch failed. This is simply because the KVM does NOT go through the Cisco Ethernet switch, but rather directly connects to the corporate network.

    This is valid even for V2 and X2-2 models. The KVM is directly connected to the customer network. In fact you can remotely re-image any compute and cell node without stepping into the data center as long as your ILOM is working.

    I had always wanted the ability to completely configure the Exadata remotely, but since the ILOM is not configured with customer network details when the Exadata machine is first delivered to the customer, you can’t do the very first configuration remotely. But once the ILOM and Cisco switches are connected you don’t even need the KVM most of the time. The only time you could need the KVM during patching is when your ILOM itself is not working during its firmware upgrade.

    I also connected the PDU to our network directly in the lab. I am not looking forward to explaining to someone in the datacenter, who does not know about Exadata, how to configure the PDU to connect to our network. It’s not difficult, but it could be challenging.

  2. […] should point out that Martin Bach has blogged this far more coherently than I am capable […]

  3. PdV said

    Thanks for a good write-up Martin. Catch you in Brum !

    And I’ll try to catch a few SIGs next year.

    If you are serious about coming to work on one of our sites, you have to move 9 hrs east of the UK.
    No Exa- equipment yet, you’d be bored soon.

    • Martin Bach said

      I wouldn’t mind shifting time zones for a little change, and Exadata (or RAC) is not a requirement :)

      See you in Brum!
