Monthly Archives: September 2009

Build your own 11.2 RAC system-part I: DNS

As many of you already know, Oracle released 11g Release 2 of the database for Linux x86 and x86-64. That is really cool and this time I don’t want to miss out on researching some new features of the new release. I have spent some time reading up about 11.2 and for what it’s worth I’d consider it more of a step forward compared to 11.1 which IMO is just a glorified 10.2.0.4 with a lot of cost options. But I disgress….

The reason of this post is to allow the reader to set up his own DNS server for building an 11.2 RAC system. As you may know, 11.2 uses DNS for two main purposes:

  1. Grid Plug and Play
  2. Single Client Access Name (SCAN)

Grid Plug and Play is something I’ll look at later so let’s focus on the SCAN addresses. The documentation states that we should at least provide 3 IP addresses for a single SCAN name which will be used in a round robin fashion (reference: Section 2.7.2.2 IP Address Requirements for Manual Configuration in the Grid Infrastructure Installation Guide for Linux).

Huh? Are the DBAs now tasked with DNS administration? Probably not, but it doesn’t hurt understanding the concepts, especially if you are like me and want a RAC cluster in your lab environment.

DNS and Linux

I initially looked at DNS when still at the University which seems like a long time ago nowadays. Back then Linux was the uni’s preferred non-Windows platform so I knew which package to install. The following example uses bind 9.3.4-6P1.el5 which is the unpatched DNS server distributed with RHEL 5 update 2.

A word of caution: this article shouldn’t be used to set up a production DNS server, it’s merely intended to get you a DNS server for a lab environment!

With all that said, let’s proceed to getting our SCAN addresses registered. First of all, use rpm to install the package.

Once that’s installed, we need to configure our DNS server. bind9 comes with a number of sample configuration files which make our life a little easier. Traditionally, bind is configured in 2 places:

  • /etc/named.conf for the zone definition and
  • /var/named for the zone configuration.

Continue reading

RAC SIG 09/2009

The last RAC SIG meeting before the conference, in an Oracle building in Solihull near Birmingham. I dreaded the food (all you ever get are sandwiches, no comparison to the London meeting), but it wasn’t that bad after all.

I was really delighted to see some of my friends after some time, and David had a great agenda for the day.

Phil Davis’s deputy gave a refreshingly open service update, the main highlights were:

  • 11.1.0.7 is the terminal release
  • 11.2 is out (no surprise)
  • 10.2 reaches end of premier support July next year

There was a little discussion about which of the dot releases was buggier – 10.2.0.2 or 10.2.0.3. I’d personally consider 10.2.0.3 as a really terrible release.

Next on the agenda was me, and this time I was much more pleased with how the presentation went. Finished in time, but very few questions. I hope I didn’t annoy the very knowledgable Steve Shaw, author of hammerora who happened to be in the audience by a comment about his tool. I’ll certainly have a closer look at it when there is time. Oh, by the way, I presented about the Oracle Linux Test kit.

Then we had Steve Shaw’s presentation as an answer to a slighly weaker presentation at the last RAC SIG meeting in London. Steve introduced Oracle VM in the context of cheap computing resources for HA (without RAC) based on OEL and OCFS2. So far so good, I have done a similar presentation in the past. It got more interesting when Steve-who works for Intel-gave insights in how their new processors use more than just vt to deliver better performance for (hvm) guests. However cool these features are, I can’t help thinking they were made for hyper-v and vmware rather than xen. Also it seems as if Oracle would stick to xen 3.1 (ancient!) while trying to improve the management interface. I pointed out that there is a lot of room for improvement. Ever tried to mount an iSCSI target in an Oracle VM (2.1.5) domU? You are in for a surprise. Steve presented the usual xen networking diagram which isn’t state of the art anymore. Check out opensuse 11.1/SLES 11 for a more modern approach. Nevertheless this was an interesting presentation especially in regards to cloud computing and virtualisation.

Another virtualisation presentation, this time by Carl Bradshaw who was the only presenter to use a Mac. Cool. The contents wasn’t that thrilling for me mainly because of a lack of depth. Most interesting for me was the part of cpu fragmentation, something paravirtualisation can manage better. I can bump up my pv-domU to 32 cpus-try this on vsphere! Also vmware is penalised badly by Oracle licensing making it more unattractive than necessary. This plus a bit of an overhead won’t prompt me to introduce it soon.

After lunch a stand up panel was set up to discuss anything about virtualisation. The original invitees couldn’t make it unfortunatly but their replacements surely were equally capable.

Last one for my day (had to pull the escape chute as I’m from Brighton, a good 3 hour drive from Birmingham) was Deepak Singh’s introduction to extended distance RAC. Nice introduction to the subject, for those of you who didn’t know you now can do this with SE RAC in 11g.

UNIX SIG 09/2009

I can’t say I like junction 15 a lot … simply too much traffic when moving off the M25.

When arriving at the venue I was still quite early and had a chance to catch up with Phil Davis and Joel Goodman from Oracle.

It seems that the credit crunch affected the numbers quite badly as the turnout was really low. I think I counted roughly 20 delegates, what a shame. Maybe I have scared them off with my presentations?

This time the user group tried something completely new, and I’m not entirely sure if it’s a good thing. The afternoon sessions were recorded and broadcast via go-to meeting. In my opinion that reduces the value of a SIG meeting quite significantly as you no longer meet the presenters in person. I always found this kind of thing useful but took little pleasure in it. The oracleracsig.org site uses a similar scheme, but the sound quality usually leaves something to be desired.

This and some technical problems aside the day started with someone from Oracle giving the audience a demo of the new and flashy user interface for metalink, now rebranded “my oracle support”. It didn’t change the quality of the support though! The highlight was the announcement that Oracle will retire the apex based classical metalink soon which is great since most desktops, especially in high security environments will never use flash. This has become even more apparent as the presenter had to download the latest version of the flash plugin. This is something you could completely forget about in most corporate environments as all hardware is locked down. Well done Oracle! The comment that they are aware of the problem and working on a solution only prompted grins.

Next was my turn, and I seriously have to apologise for the delivery of the iSCSI presentation. When I first checked the time I only had 15 minutes left and haven’t even managed to get to the first demo, the creation of an iSCSI target! Bummer. I then had to step up the already fast pace to get towards the end and rushed the whole lot. I can only recommend that you download the latest version of the presentation to have a look – all the screen prints are on it and when looking at it at a quiet moment should convey the message. Next time I’ll rehearse more and cut the demo out, or try and get a 60 minute slot. Anyway.

Phil then presented the latest support update, few surprises here.

Over to David Kurtz, long time UNIX SIG chair. He presented about graphing AWR data through Excel. Previously I read that Tanel Poder had a similar tool, called PerfSheet.xls available to do similar things. Very useful, especially since you can point it to statspack tables for pre 10g databases. Although I have done business studies at the Uni I never felt compelled to use Excel so I wasn’t that fluent with the pivoting wizard – but definitely useful things!

After lunch, I was on to present about optimising Linux kernel parameters for Oracle. I have spent so much time preparing that presentation that I could impossibly fit it all into 45 minutes. Prior to the presentation I then hid a few more slides (which are still available online!) to make it shorter, the iscsi presentation disaster still vividly in my mind. I got a few nods from Nial Litchfield, which was encouraging. Joel then pointed out that the Oracle validated RPM would do a similar job, which is of course correct. But I wanted to show what’s happening behind the scenes… I don’t think there were too many kernel hackers in the audience, but despite that the presentation can still be used as reference (and I certainly will do so).

Row migration can aggravate contention on cache buffer chains latch – a war story and the second part of David’s presentation. He is seriously smart, and used his Excel graphs to find how row chaining (or migration – there was a lot of discussion :) caused a database system to hit a brick wall after the 80th user connected. Very good troubleshooting!

Mangled or Managed – Does Grid Control deliver? Nial’s presentation was all about Grid Control and his experience with the product. All in all a very good presentation and introduction to Grid Control’s architecture. He focused on the design, the good and bad bits as well as licensing (usually a minefield). Very round, but most of the topics were already known to me from my own experience with it during the OCM preparation.

That was it then – back home via M25 and M23, after a good 1:32 I reached Brighton. I had a bit of a mixed feeling this day, most likely caused by my messy iSCSI presentation and the low number of delegates. Let’s see how RAC SIG develops, I hope to see Piet and Jason there.

Introduce Partitioning to a large table

Recently I have been asked to find a way to partition a very big table on very slow storage. Very large table translates into 600G, and slow storage translates into HP MSA 1000. The time window wasn’t big enough to run this, and the insert activity on the table required it to be done on less busy times.

DBMS_REDEFINITION wasn’t an option then so I decided on a multi-step approach. If I can’t neatly use dbms_redefinition then at least I needed the power of the partitioning commands. Here’s the proof of concept.

WARNING
As alwyas, don’t simply copy and paste – this works for me but doesn’t mean it also works for you. Test, test, test first before doing this in production.

Create a “large” table as source first to test. Actually it’s not large at all but that’s besides the point…

create table unpart
tablespace users
as select * from dba_objects;

Table created.

09:53:05 SQL> select count(1) from unpart;

  COUNT(1)
----------
      9960

Now let’s put some more data into it:

begin
 for i  in 1..5 loop  
  insert into unpart select * from unpart;
 end loop;
end;
/

09:53:26 SQL> /

PL/SQL procedure successfully completed.

09:53:27 SQL> commit;

Commit complete.

09:53:30 SQL> select count(1) from unpart;

  COUNT(1)
----------
    318720

Slightly better. I now need to find the upper bound for the newly to be created partition. This is where you shouldn’t insert data into the table…

SQL> select max(object_id) from unpart;

MAX(OBJECT_ID)
--------------
         10459

Now create the partitioned table to hold the data after the move. The whole source table should fit into the first partition.

create table part
partition by range (object_id)
(
 partition p1 values less than (10459),
 partition p2 values less than (20000),
 partition p3 values less than (maxvalue)
)
as select * from unpart where 1 = 0;

SQL> alter table part enable row movement;

Table altered.

A few selects to prove that the table is empty:

SQL> select count(1) from part partition (p1);

  COUNT(1)
----------
         0

SQL> select count(1) from part partition (p2)
  2  /

  COUNT(1)
----------
         0

SQL> select count(1) from part partition (p3);

  COUNT(1)
----------
         0

Now let’s exchange the table with the initial partition:

SQL> alter table part exchange partition p1 with table unpart;

Table altered.

SQL> select count(1) from part partition (p1)
  2  /

  COUNT(1)
----------
    318688

SQL> select count(1) from part partition (p2);

  COUNT(1)
----------
         0

-- Now we can partition the rest!

09:56:22 SQL> alter table part split partition p1 at (5000) into (partition p_del, partition p1) 
           2  update global indexes parallel 4;

Table altered.

09:56:35 SQL> select count(1) from part partition (p_del);

  COUNT(1)
----------
    157312

09:56:46 SQL> select count(1) from part partition (p1);

  COUNT(1)
----------
    161376

09:56:50 SQL> select 157312+161376 from dual;

157312+161376
-------------
       318688

So we’ve got the initial number of rows, evenly split across partitions. This is great! Unfotunately there is a slight caveat with the split partition command – it’s very heavy on resources (remember the multi step approach?). Each row in the source partition will be copied to the destination partitions (new segments), causing a large number of IOs.

09:57:33 SQL> alter table part split partition p1 at (8500) into (partition p1, partition p1bis)
09:57:37   2   update global indexes parallel 4;

Dropping partitions is easy too:

SQL> alter table part drop partition p_del

Table altered.

ORA-15186 with ASMLib and device mapper on Linux

Clumsy title but a problem that can slow you down. When creating ASM disk groups, you might not succeed, ASM complains with ORA-15186. Facts:

  • you are using ASMLib
  • you are using native Linux multipathing (device-mapper-multipath)
  • And you are using Linux
The first port of call should be the alert log of the ASM instance (who wouldn’t check this?). You can find lots of failed calls to ASMLib.

An example alert.log entry could be as follows:

SQL> ALTER DISKGROUP ALL MOUNT
Thu Aug 27 07:36:48 2009
Loaded ASM Library - Generic Linux, version 2.0.4 (KABI_V2) library for asmlib interface
Thu Aug 27 07:36:48 2009
ORA-15186: ASMLIB error function = [asm_open],  error = [1],  mesg = [Operation not permitted]

This is a tricky one as /etc/init.d/oracleasm listdisks reports all your disks:

[oracle@host bdump]$ /etc/init.d/oracleasm listdisks
DATA1
DATA2
FRA1

The first thing which points at something which isn’t quite right is in v$asm_disk:

SQL>  select inst_id,HEADER_STATUS,STATE,NAME,path,LIBRARY
  2*    from gv$asm_disk;
INST HEADER_STATU PATH            LIBRARY
---- ------------ --------------- ------------------------------------------------------------
   2 FOREIGN      /dev/raw/raw2   System
   2 UNKNOWN      ORCL:FRA1       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
   2 UNKNOWN      ORCL:DATA2      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
   2 UNKNOWN      ORCL:DATA1      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
   1 FOREIGN      /dev/raw/raw2   System
   1 PROVISIONED  ORCL:FRA1       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
   1 MEMBER       ORCL:DATA1      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
   1 MEMBER       ORCL:DATA2      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)

You can see that the HEADER_STATUS on instance 2 of the ASM cluster is “UNKNOWN”, whereas it’s MEMBER on instance 1. Odd, but simple to fix. Check /etc/sysconfig/oracleasm and make sure to have SCANORDER and SCANEXCLUDE set correctly:

# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER="dm-"

# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"

That’s it – now simply restart ASMLib and it should be able to mount the disk group.

Presentation Update

UNIX SIG Slough 09/2009

This is a novelty to me – going to present twice! The first presentation will deal with Linux Kernel settings and how to tune them, the second one deals with OpenFiler 2.3 and iSCSI on Linux.

RAC & HA SIG 09/2009

And if that wasn’t enough I am also going to present at the Midlands RAC SIG. The subject is the Oracle Linux Test Suite.