Martins Blog

Trying to explain complex things in simple terms

Archive for the ‘Linux’ Category

Linux large pages and non-uniform memory distribution

Posted by Martin Bach on May 17, 2013

In my last post about large pages in 11.2.0.3 I promised a little more background information on how large pages and NUMA are related.

Background and some history about processor architecture

For quite some time now, the CPUs you get from both AMD and Intel have been NUMA, or better, cache coherent NUMA CPUs. They all have their own “local” memory directly attached to them; in other words, the memory distribution is not uniform across all CPUs. This isn’t really new: Sequent pioneered this concept on x86 a long time ago, but that was in a different context. You really should read Scaling Oracle 8i by James Morle, which has a lot of excellent content related to NUMA in it, with contributions from Kevin Closson. It doesn’t matter that the title reads “8i”; most of it is as relevant today as it was then.

So what is the big deal about NUMA architecture anyway? To explain NUMA and why it is important to all of us, a little more background information is in order.

Some time ago processor designers and architects of industry standard hardware could no longer ignore the fact that the front side bus (FSB) had become a bottleneck. There were two reasons for this: a) it was too slow and b) too much data had to go over it. As a direct consequence, DRAM is now attached directly to the CPUs. AMD did this first with its Opteron processors in the AMD64 microarchitecture, followed by Intel with the Nehalem microarchitecture. Because data retrieved from DRAM no longer has to travel across a slow bus, latencies could be reduced.

Now imagine that every processor has a number of memory channels to which DDR3 (DDR4 could arrive soon!) SDRAM is attached. In a dual socket system, each socket is responsible for half the memory of the system. To allow the other socket to access the corresponding other half of memory, some kind of interconnect between processors is needed. Intel has opted for the QuickPath Interconnect; AMD (and IBM for p-Series) use HyperTransport. This is (comparatively) simple when you have few sockets: with up to 4, each socket can connect directly to every other without any tricks. For 8 sockets it becomes more difficult. If every socket can communicate directly with its peers, the system is said to be glue-less, which is beneficial. The last production glue-less system Intel released was based on the Westmere architecture. Sandy Bridge (current until approximately Q3/2013) didn’t have an eight-way glue-less variant, and this is exactly why you get Westmere-EX in the X3-8, and not Sandy Bridge as in the X3-2.

Anyway, your system will have local and remote memory. Most of us are not going to notice this at all since there is little point in enabling NUMA on systems with two sockets. Oracle still recommends that you only enable NUMA on 8 way systems, and this is probably the reason the oracle-validated and preinstall RPMs add “numa=off” to the kernel command line in your GRUB boot loader.
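
Whether a running system was actually booted with numa=off can be read from the kernel command line. The following is only a minimal sketch: the function name has_numa_off is made up for illustration, and the only thing it relies on is that Linux exposes the boot parameters in /proc/cmdline.

```shell
# Hypothetical helper: succeeds if "numa=off" is one of the kernel boot parameters.
# Reads /proc/cmdline by default; a command line string can be passed instead,
# which is handy for trying the function out.
has_numa_off() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    for param in $cmdline; do
        # Exact match only, so something like "numa=offline" does not count
        [ "$param" = "numa=off" ] && return 0
    done
    return 1
}
```

Called without arguments, as in "if has_numa_off; then echo NUMA is disabled; fi", it inspects the kernel that is currently running.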

Posted in Linux, VLDB | 2 Comments »

More on use_large_pages in Linux and 11.2.0.3

Posted by Martin Bach on May 13, 2013

Large pages in Linux are a really interesting topic for me as I really like Linux and enjoy trying to understand how it works. Large pages can be very beneficial for systems with large SGAs, and even more so for those with a large SGA and lots of user sessions connected.

I have previously written about the benefits and usage of large pages in Linux here:

So now as you may know there is a change to the init.ora parameter “use_large_pages” in 11.2.0.3. The parameter can take these values:

SQL> select value,isdefault
  2  from V$PARAMETER_VALID_VALUES
  3* where name = 'use_large_pages'

VALUE		     ISDEFAULT
-------------------- --------------------
TRUE		     TRUE
AUTO		     FALSE
ONLY		     FALSE
FALSE		     FALSE

There is a new value named “auto” that didn’t exist prior to 11.2.0.3. The intention is to create large pages at instance startup if possible, even if /etc/sysctl.conf doesn’t have an entry for vm.nr_hugepages at all. The risk, though, as with dynamic creation of large pages by echoing values into /proc/sys/vm/nr_hugepages, is that you get fewer than you expect. Maybe even 0.
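
To see whether the kernel really handed out as many large pages as you asked for, compare the number you need with HugePages_Total in /proc/meminfo. What follows is a rough sketch, not an official sizing method: the function names are hypothetical, the 2048 kB page size is the usual x86-64 value and an assumption here (check Hugepagesize in /proc/meminfo), and the calculation only covers the SGA size you pass in.

```shell
# Hypothetical helper: number of large pages needed to back an SGA of the given
# size in MB. The page size in kB defaults to 2048 (an assumption, see Hugepagesize).
hugepages_needed() {
    sga_mb="$1"
    page_kb="${2:-2048}"
    # Integer division, rounded up to the next whole page
    echo $(( (sga_mb * 1024 + page_kb - 1) / page_kb ))
}

# Hypothetical helper: number of large pages currently in the kernel's pool.
# Reads /proc/meminfo by default; another file in the same format can be passed.
hugepages_total() {
    awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

For a 4096 MB SGA, "hugepages_needed 4096" prints 2048. If hugepages_total reports less than that after the pool was created dynamically, the kernel could not find enough contiguous free memory.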

Posted in Linux | 3 Comments »

Grid Infrastructure And Database High Availability Deep Dive Seminars 2013

Posted by Martin Bach on April 24, 2013

So this is a little bit of a plug for myself and Enkitec but I’m running my Grid Infrastructure And Database High Availability Deep Dive Seminars again for Oracle University. This time these events are online, so no need to come to a classroom at all.

Here is the short description of the course:

Providing a highly available database architecture fit for today’s fast changing requirements can be a complex task. Many technologies are available to provide resilience, each with its own advantages and possible disadvantages. This seminar begins with an overview of available HA technologies (hard and soft partitioning of servers, cold failover clusters, RAC and RAC One Node) and complementary tools and techniques to provide recovery from site failure (Data Guard or storage replication).

In the second part of the seminar, we look at Grid Infrastructure in great detail. Oracle Grid Infrastructure is the latest incarnation of the Clusterware HA framework which successfully powers every single 10g and 11g RAC installation. Despite its widespread implementation, many of its features are still not well understood by its users. We focus on Grid Infrastructure, what it is, what it does and how it can be put to best use, including the creation of an active/passive cold failover cluster for web and database resources.

If you are interested I would like to invite you to head over to the Oracle University website here which has a more extensive synopsis and all the detail you need:

http://education.oracle.com/pls/web_prod-plq-dad/db_pages.getCourseDesc?dc=D81641_2043034

UPDATE: I received several emails and comments that the above link does not work. I couldn’t reproduce this until today. It appears to be an issue with the country selection. If you have USA selected in the top right corner the link won’t work; switching to United Kingdom (my preference) will fetch the course detail. I don’t quite understand why that is the case since the class is virtual and does not depend on a country…

I hope to hear from you during the course!

Posted in 11g Release 2, Linux, Public Appearances | 7 Comments »

Oracle Linux support in ESXi

Posted by Martin Bach on January 3, 2013

For quite some time now I have been using ESXi 5 update 1 for my lab server and I’m very happy with it. In my lab environment I am not too picky about what I run and do not worry about support too much. It’s not production!

One area of concern has been the support for Oracle’s own kernel: UEK or Unbreakable Enterprise Kernel. UEK comes in two editions, one based on 2.6.32, just like Red Hat’s kernel for Red Hat 6. The difference is that you can get UEK/1 (2.6.32.xxx) for Oracle Linux 5.x as well instead of 2.6.18xxx which is otherwise the default.

Oracle’s second iteration of kernel UEK is unsurprisingly named UEK2. It is based on 3.x but keeps the version number 2.6.39.x for compatibility reasons. UEK2 has some really nice features taken from the upstream kernel and it is also supported for the Oracle database.

Until not too long ago UEK was not supported by VMware ESXi, but this has changed without me taking notice at first. Thanks to a tweet by @dba_emc2 (Allan Robertson) I learned more about the change in the support policy. One interesting blog post from VMware is found here:

http://blogs.vmware.com/guestosguide/2012/09/unbreakable-enterprise-kernel-for-oracle-linux.html

This post only mentions UEK, but does not clearly state whether UEK or UEK2 or both will be supported. The VMware Compatibility Guide has more information at http://www.vmware.com/resources/compatibility/search.php

  • In the search, enter “unbreakable” to be directed to the relevant certification information
  • It turns out that it is actually UEK 2 (at the time of writing), which is great news for me!
  • Supported versions of ESXi are 5 U1, 5 U2 and 5.1 at the time of writing
  • Support date is listed as 09/2012
  • There are even specific installation instructions but they don’t go over and above what you would normally do
  • What’s very interesting is that the paravirtualised drivers for SCSI (vSCSI) and VMXNet3 are supported too
  • You can also add virtual CPUs and memory while the VM is up (with the proper VM settings, I think hot-adding these is deactivated by default)

Enjoy! I will try to install Oracle Linux 6.3 – which is the first to my knowledge that boots UEK2 by default – next and install the VMware tools. Let’s see how that goes.

Posted in ESX, Linux | 3 Comments »

Adding disks to VMware Workstation 8 on the fly on RHEL 6

Posted by Martin Bach on December 6, 2012

Although this post is primarily written for users of VMware Workstation 8, it is applicable to any Red Hat 6 clone when adding single-path disks on the fly. Multipathing requires additional setup in dm-multipath or the vendor multipathing software, which I won’t cover here. A quick hint though: you need to set disk.EnableUUID = "TRUE" in your VM’s config file for scsi_id to return a value.

The situation is common: you created a virtual machine and need more storage. Hopefully you created it using LVM, which allows you to add the new disk to an existing volume group and then resize the logical volume that is short on space. But before you can do this you have to add a new LUN to your setup. Here is how you can do this without rebooting the VM.

First I recommend you install lsscsi (for convenience, not really necessary) and the sg3_utils:

[root@server1 ~]# yum install lsscsi.x86_64 sg3_utils.x86_64
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package lsscsi.x86_64 0:0.23-2.el6 will be installed
---> Package sg3_utils.x86_64 0:1.28-4.el6 will be installed
--> Processing Dependency: sg3_utils-libs = 1.28-4.el6 for package: sg3_utils-1.28-4.el6.x86_64
--> Processing Dependency: libsgutils2.so.2()(64bit) for package: sg3_utils-1.28-4.el6.x86_64
--> Running transaction check
---> Package sg3_utils-libs.x86_64 0:1.28-4.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================================================================
 Package                                      Arch                                 Version                                    Repository                           Size
========================================================================================================================================================================
Installing:
 lsscsi                                       x86_64                               0.23-2.el6                                 local                                38 k
 sg3_utils                                    x86_64                               1.28-4.el6                                 local                               470 k
Installing for dependencies:
 sg3_utils-libs                               x86_64                               1.28-4.el6                                 local                                51 k

Transaction Summary
========================================================================================================================================================================
Install       3 Package(s)

Total download size: 559 k
Installed size: 1.4 M
Is this ok [y/N]: y
Downloading Packages:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                    25 MB/s | 559 kB     00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : sg3_utils-libs-1.28-4.el6.x86_64                                                                                                                     1/3
  Installing : sg3_utils-1.28-4.el6.x86_64                                                                                                                          2/3
  Installing : lsscsi-0.23-2.el6.x86_64                                                                                                                             3/3

Installed:
  lsscsi.x86_64 0:0.23-2.el6                                                        sg3_utils.x86_64 0:1.28-4.el6

Dependency Installed:
  sg3_utils-libs.x86_64 0:1.28-4.el6

Complete!

With this done it is easy to check the attached SCSI devices, and on my system I found these:

[root@server1 ~]# lsscsi
[0:0:0:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sda
[0:0:1:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sdb
[2:0:0:0]    cd/dvd  NECVMWar VMware IDE CDR10 1.00  /dev/sr0

This means there are two virtual disks and a virtual CD-ROM. Now it’s time to add the new disk. Do so using the VMware Workstation user interface. Once completed, you need to rescan the SCSI bus:

[root@server1 ~]# rescan-scsi-bus.sh
Host adapter 0 (mptspi) found.
Host adapter 1 (ata_piix) found.
Host adapter 2 (ata_piix) found.
Scanning SCSI subsystem for new devices
Scanning host 0 for  SCSI target IDs  0 1 2 3 4 5 6 7, all LUNs
Scanning for device 0 0 0 0 ...
OLD: Host: scsi0 Channel: 00 Id: 00 Lun: 00
      Vendor: VMware,  Model: VMware Virtual S Rev: 1.0
      Type:   Direct-Access                    ANSI SCSI revision: 02
Scanning for device 0 0 1 0 ...
OLD: Host: scsi0 Channel: 00 Id: 01 Lun: 00
      Vendor: VMware,  Model: VMware Virtual S Rev: 1.0
      Type:   Direct-Access                    ANSI SCSI revision: 02
Scanning for device 0 0 2 0 ...
NEW: Host: scsi0 Channel: 00 Id: 02 Lun: 00
      Vendor: VMware,  Model: VMware Virtual S Rev: 1.0
      Type:   Direct-Access                    ANSI SCSI revision: 02
Scanning for device 0 0 2 0 ...
OLD: Host: scsi0 Channel: 00 Id: 02 Lun: 00
      Vendor: VMware,  Model: VMware Virtual S Rev: 1.0
      Type:   Direct-Access                    ANSI SCSI revision: 02
Scanning host 1 channels  0 for  SCSI target IDs  0 1 2 3 4 5 6 7, all LUNs
Scanning host 2 channels  0 for  SCSI target IDs  0 1 2 3 4 5 6 7, all LUNs
Scanning for device 2 0 0 0 ...
OLD: Host: scsi2 Channel: 00 Id: 00 Lun: 00
      Vendor: NECVMWar Model: VMware IDE CDR10 Rev: 1.00
      Type:   CD-ROM                           ANSI SCSI revision: 05
0 new device(s) found.
0 device(s) removed.
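
Should rescan-scsi-bus.sh not be available, the kernel can be asked for the same rescan through sysfs by writing "- - -" (all channels, all targets, all LUNs) to each host adapter's scan file. A small sketch of that approach follows; the function name and the optional sysfs root argument are my own additions so that it can be tried against a scratch directory first.

```shell
# Ask every SCSI host adapter to rescan all channels, targets and LUNs ("- - -").
# sysfs_root defaults to /sys; the parameter exists only so the function can be
# pointed at a scratch directory. On a real system this needs root.
rescan_scsi_hosts() {
    sysfs_root="${1:-/sys}"
    for scan_file in "$sysfs_root"/class/scsi_host/host*/scan; do
        [ -e "$scan_file" ] || continue   # the glob matched no adapter
        echo "- - -" > "$scan_file"
    done
}
```

Run as root without arguments it touches /sys/class/scsi_host/host0/scan, host1/scan and so on, one per adapter listed by the script above.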

A new disk has been found, as shown by lsscsi:

[root@server1 ~]# lsscsi
[0:0:0:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sda
[0:0:1:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sdb
[0:0:2:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sdc
[2:0:0:0]    cd/dvd  NECVMWar VMware IDE CDR10 1.00  /dev/sr0
[root@server1 ~]#

Very nice. From there on it’s just another disk you can format using parted or fdisk. And you are done.
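
If the VM was set up with LVM as suggested at the beginning, the new disk can then be added to the volume group and the logical volume grown. The sketch below only prints the commands so that you can review them before running anything. The function name is hypothetical, the device, volume group and logical volume names are placeholders, and using the whole disk as a physical volume without a partition table is an assumption on my part.

```shell
# Print (not run) the commands that add a new disk to an existing volume group
# and grow a logical volume, including its file system, into the free space.
# All three arguments are placeholders for your own device, VG and LV names.
print_lvm_extend_plan() {
    disk="$1"
    vg="$2"
    lv="$3"
    echo "pvcreate $disk"
    echo "vgextend $vg $disk"
    echo "lvextend --resizefs -l +100%FREE /dev/$vg/$lv"
}
```

For example, "print_lvm_extend_plan /dev/sdc vg_data lv_data" lists the three steps for the disk that has just appeared as /dev/sdc; vg_data and lv_data are made-up names.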

Posted in Linux | 5 Comments »

Get a feel for enterprise block level replication using drbd

Posted by Martin Bach on July 2, 2012

I didn’t really have a lot of exposure to block-level replication on the storage level before an engagement in the banking industry. I’m an Oracle DBA, and I always thought: why would I want to use anything but Oracle technology for replicating my data from one data centre to another? I need to be in control! I want to see what’s happening. Why would I prefer storage replication over Data Guard?

For a great many sites Data Guard is indeed all you need, especially if you don’t have a storage array with a replication option. But many large enterprises have historically used large storage area networks with many enterprise features, including block level replication from array to array. These products each come under their own name, and all major storage vendors have one. At the risk of generalising too much, all block level replication products allow you to copy data from array A in data centre A to array B in data centre B. The data centres are usually geographically dispersed so as to avoid the impact of catastrophes. The storage replication happens without any DBA intervention or even visibility, harking back to the 90s mantra of “storage administrator does storage, system administrator does the OS and the database administrator works on the database”. I have written about this in the context of Exadata before.

Posted in Linux, Oracle | 4 Comments »

How to set up data guard broker for RAC

Posted by Martin Bach on June 29, 2012

This is pretty much a note to myself on how to set up Data Guard broker for RAC 11.2.0.2+. The tests have been performed on Oracle Linux 5.5 with the Red Hat Kernel. Oracle was 11.2.0.2. Sadly my lab server didn’t support more than 2 RAC nodes, so everything has been done on the same cluster. It shouldn’t make a difference though. (If it does, please let me know.)

WARNING: there are some rather deep changes to the cluster here, be sure to have proper change control around making such amendments as it can cause outages! Nuff said.

Unfortunately I didn’t take notes of the configuration as it was before, so the post is going to be a lot shorter and less dramatic, but it’s useful as a reference (I hope) nevertheless. Now what’s the situation? Imagine you have a two node RAC cluster with separation of duties in place: “grid” owns the GRID_HOME, while “oracle” owns the RDBMS binaries. Imagine further you have two RAC databases, ORCL and STDBY. STDBY has only just been duplicated for standby, so there is nothing in place which links the two together.

Posted in 11g Release 2, Data Guard, Linux | 2 Comments »

How to use vi-style editing in SQL*Plus

Posted by Martin Bach on June 14, 2012

This post is nothing new, and I created it after a little discussion on twitter about how to use readline support in SQL*Plus. The idea is not new, and I have compiled and used rlwrap for quite some time.

At the time, Frits Hoogland asked me why I didn’t use the EPEL package, and I had to admit to myself that I didn’t know the Extra Packages for Enterprise Linux repository at all. There is more about rlwrap and Linux that I didn’t know, but first things first.

Installing rlwrap from EPEL

This is really simple: you can either add the EPEL repository to your /etc/yum.repos.d/ directory or simply download the rlwrap package and install it via RPM. A simple wget on your host does the trick. You can set environment variables when you’d like to use a proxy as shown here:

$ export http_proxy=http://your.proxy.server:proxyPort/
$ export https_proxy=https://your.proxy.server:proxyPort/

Depending on your release of Enterprise Linux, you can find the rlwrap package here:

Then wget should download the file for you; at the time of writing 0.37 was current.
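
Since rlwrap is built on GNU readline, vi-style editing can be switched on through readline's own configuration file. This is a minimal sketch and not necessarily the only way to do it. The function name is made up, and note that the setting applies to every readline-based program run by that user, not just SQL*Plus.

```shell
# Hypothetical helper: switch readline (and therefore rlwrap) to vi-style editing
# by adding one line to the given inputrc file, unless it is already there.
enable_vi_editing() {
    inputrc="${1:-$HOME/.inputrc}"
    grep -qsx 'set editing-mode vi' "$inputrc" || echo 'set editing-mode vi' >> "$inputrc"
}

# Afterwards SQL*Plus can be started through rlwrap, for example:
#   rlwrap sqlplus / as sysdba
```

Calling the function twice leaves a single entry in the file, so it is safe to keep in a setup script.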

Posted in Linux, Oracle | 5 Comments »

Kernel UEK 2 on Oracle Linux 6.2 fixed lab server memory loss

Posted by Martin Bach on June 13, 2012

A few days ago I wrote about my new lab server and the misfortune with kernel UEK (aka 2.6.32 + backports). It simply wouldn’t recognise the memory in the server:

# free -m
             total       used       free     shared    buffers     cached
Mem:          3385        426       2958          0          9        233
-/+ buffers/cache:        184       3200
Swap:          511          0        511

Ouch. Today I gave it another go, especially since my new M4 SSD has arrived. My first idea was to upgrade to UEK2. And indeed, following the instructions on Wim Coekaerts’s blog (see references), it worked:

[root@ol62 ~]# uname -a
Linux ol62.localdomain 2.6.39-100.7.1.el6uek.x86_64 #1 SMP Wed May 16 04:04:37 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@ol62 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         32221        495      31725          0          5         34
-/+ buffers/cache:        456      31764
Swap:          511          0        511

Note the 2.6.39-100.7.1! It is actually based on version 3.x, but to preserve compatibility with the many programs that parse the kernel version number as a 3-tuple, Oracle decided to stick with 2.6.39. But then the big distributions don’t really follow the mainstream kernel numbers anyway.
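
To illustrate the kind of parsing meant here, this is a sketch of a script that splits the kernel release string into three numeric components. It is not taken from any particular product and the function name is made up.

```shell
# Hypothetical example of 3-tuple parsing: split a kernel release string such as
# 2.6.39-100.7.1.el6uek.x86_64 into major, minor and patch level.
# Uses the running kernel (uname -r) when no argument is given.
parse_kernel_release() {
    release="${1:-$(uname -r)}"
    base="${release%%-*}"     # drop everything from the first "-": 2.6.39
    IFS=. read -r major minor patch _rest <<EOF
$base
EOF
    echo "major=$major minor=$minor patch=$patch"
}
```

With the release string shown above, the function prints major=2 minor=6 patch=39. A script that then insists on major being 2 and minor being 6 is the sort of software a jump to 3.x would break.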

Now if anyone could tell me whether UEK2 is out of beta? I know it’s not supported for the database yet, but it’s a cool kernel release and I can finally play around with the “perf” utility Kevin Closson and Frits Hoogland have talked so much about recently.

Posted in 11g Release 2, Linux | 4 Comments »

Performance testing with Virident PCIe SCM

Posted by Martin Bach on May 25, 2012

Thanks to the kind introduction from Kevin Closson I was given the opportunity to benchmark the Virident PCIe flash cards. I have written a little review of the testing conducted, mainly using SLOB. To my great surprise Virident gave me access to a top of the line Westmere-EP system (2s12c24t) with lots of memory.

In summary the testing shows that the “flash revolution” is happening, and that there are lots of vendors out there building solutions for HPC and Oracle database workloads alike. Have a look at the attached PDF for the full story if you are interested. When looking at the numbers please bear in mind it was a two socket system! I’m confident the server could not max out the cards.

Full article:

Virident testing martin bach consulting

Posted in 11g Release 2, Automatic Storage Management, Linux | 1 Comment »

 