Huge Pages and Linux: Real World Example
Posted by Martin Bach on April 19, 2010
A common situation for many DBAs: all of a sudden you are tasked with looking at a database, and you are told you have inherited it. Of course, a lot of problems exist with it, and you are supposed to fix them all. A few weeks ago this happened to me.
The system is a Sun 4660 x86-64 server running Red Hat 5.3, with 64GB of memory and 8 dual-core Opteron 8218 processors. SGA_TARGET was set to 40G and PGA_AGGREGATE_TARGET to 5GB. That sounds like plenty; however, the box was very busy trying to free up memory:
top - 12:16:11 up 23 days, 23:46, 19 users, load average: 28.84, 25.69, 23.19
Tasks: 970 total, 3 running, 967 sleeping, 0 stopped, 0 zombie
Cpu(s): 9.3%us, 13.0%sy, 0.0%ni, 36.0%id, 40.6%wa, 0.1%hi, 1.0%si, 0.0%st
Mem: 66068664k total, 65992772k used, 75892k free, 43168k buffers
Swap: 2096472k total, 2096472k used, 0k free, 40782300k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 1068 root      20  -5     0    0    0 R 76.0  0.0 566:03.07 [kswapd5]
13284 oracle    18   0 40.3g 6.3g 6.2g D 31.5 10.0  28:30.12 oraclePROD (LOCAL=NO)
 1070 root      10  -5     0    0    0 D 21.6  0.0 925:57.03 [kswapd7]
 1065 root      10  -5     0    0    0 D 13.8  0.0 196:27.93 [kswapd2]
 7771 oracle    18   0 40.6g 4.5g 4.1g D 12.1  7.1  26:05.04 oraclePROD (LOCAL=NO)
 8073 oracle    16   0 40.2g 2.8g 2.8g D 12.1  4.4   5:29.79 oraclePROD (LOCAL=NO)
 1066 root      10  -5     0    0    0 S 11.8  0.0  84:57.85 [kswapd3]
 1067 root      10  -5     0    0    0 D 10.5  0.0 165:39.98 [kswapd4]
 1069 root      10  -5     0    0    0 S  7.9  0.0 277:15.84 [kswapd6]
That looked really bad: the load average was far too high, caused by kswapd trying to free memory. It was actually quite difficult for me to connect through ssh. So what’s using all this memory? And why is the swap size only 2 GB? This was the first thing to fix. Adding swap space is still not the solution, but it does keep the box from crashing, which is good.
However, we couldn’t initially account for a lot of the used memory. In theory, the server’s 64GB of memory should be taken up by 40G for the SGA and up to 5G for the PGA. We checked, and the PGA didn’t exceed 3G. That makes for about 44G allocated, yet there was hardly any free space:
             total       used       free     shared    buffers     cached
Mem:         64520      64443         76          0        106      43762
-/+ buffers/cache:      20574      43946
Swap:         2047       2047          0
Looking at /proc/meminfo, whose full output I unfortunately no longer have, I could make out that the page tables used 22G:
cat /proc/meminfo | grep PageTables
PageTables: 23418712 kB
HugePages_Total of course returned 0. The question was: why is the PageTables number so hugely inflated? Remember that the standard page size for Linux x86-64 is 4k, whereas huge pages are 2M in size. With huge pages in use, the data structure the kernel has to maintain for used and free pages is a lot shorter (read: more efficient) and smaller in size. But that doesn’t really explain where the 22G went.
We checked the number of processes attached to the SGA and found around 700 in the nattch column of the ipcs command output, which gave us the solution. For each process that maps the SGA into its own virtual address space, a new set of page table entries is required, so the page table memory requirement is the size of the page tables needed to map the SGA multiplied by the number of attached processes.
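The multiplication described above can be put into numbers with a rough back-of-the-envelope sketch. It assumes 8 bytes per page table entry (the x86-64 entry size), counts only the lowest level of the page tables, and assumes every process touches every page of the SGA, so it gives an upper bound rather than a measurement. The function name and constants are made up for illustration.

```python
# Rough upper bound on page table memory for a shared SGA.
# Assumptions (not from the original system): 8-byte entries,
# only the last page table level is counted, and every process
# has touched every page of the SGA.

PTE_BYTES = 8          # size of one x86-64 page table entry
GIB = 1024 ** 3


def page_table_bytes(mapped_bytes, page_bytes, processes):
    """Bytes of last-level page table entries across all processes."""
    entries = -(-mapped_bytes // page_bytes)   # ceiling division
    return entries * PTE_BYTES * processes


sga = 40 * GIB         # SGA_TARGET from the article
procs = 700            # attached processes reported by ipcs

small = page_table_bytes(sga, 4 * 1024, procs)
huge = page_table_bytes(sga, 2 * 1024 * 1024, procs)

print(f"4k pages: {small / GIB:.1f} GiB")   # 4k pages: 54.7 GiB
print(f"2M pages: {huge / GIB:.3f} GiB")    # 2M pages: 0.107 GiB
```

The observed 22G sits below the 54.7 GiB bound, presumably because most sessions had only touched part of the SGA. With 2M pages the same mapping needs 512 times fewer entries, which is why huge pages take the pressure off.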
We chose a two-pronged approach:
- We implement huge pages to reduce the memory pressure
- We try to find ways to reduce the number of Oracle client processes
The first is of course easier to achieve. For the second I need a lot of persuasive power and management buy-in (read: it’s the long-term solution).
Configuring Huge Pages
So the immediate need was to configure huge pages for the system. Using the calc hugepages shell script from Metalink, we worked out the required number of huge pages. That number went into the vm.nr_hugepages parameter in /etc/sysctl.conf, and we set soft and hard memlock limits in /etc/security/limits.conf.
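To show the kind of arithmetic involved, here is a minimal sketch. It is not the Metalink script, just an illustration under simple assumptions: each shared memory segment is rounded up to whole huge pages, the huge page size is 2048 kB, and the memlock limit (which limits.conf expects in kB) is set to the size of the whole pool. The function names and the single 40G segment are hypothetical.

```python
# Sketch only: estimate vm.nr_hugepages and memlock from segment sizes.
# Assumed inputs: one 40 GiB shared memory segment, 2048 kB huge pages.


def hugepages_needed(shm_segment_bytes, hugepage_kb=2048):
    """Round each shared memory segment up to whole huge pages."""
    page_bytes = hugepage_kb * 1024
    return sum(-(-seg // page_bytes) for seg in shm_segment_bytes if seg > 0)


def memlock_kb(pages, hugepage_kb=2048):
    """memlock limit in kB that covers the whole huge page pool."""
    return pages * hugepage_kb


segments = [40 * 1024 ** 3]          # hypothetical: the SGA as one segment
pages = hugepages_needed(segments)
limit = memlock_kb(pages)

print("# /etc/sysctl.conf")
print(f"vm.nr_hugepages = {pages}")
print("# /etc/security/limits.conf")
print(f"oracle soft memlock {limit}")
print(f"oracle hard memlock {limit}")
```

For a 40G SGA this gives 20480 pages. The value actually configured on this server was slightly higher (the meminfo output below shows 20542), so treat the sketch as a lower bound and leave some headroom.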
After restarting the servers to pick up the huge pages settings, meminfo showed the following:
[oracle@server ~]$ cat /proc/meminfo
MemTotal:     66068668 kB
MemFree:      17265724 kB
Buffers:       1570656 kB
Cached:         766640 kB
SwapCached:          0 kB
Active:        5032060 kB
Inactive:       770412 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     66068668 kB
LowFree:      17265724 kB
SwapTotal:    16776528 kB
SwapFree:     16776528 kB
Dirty:            4164 kB
Writeback:           0 kB
AnonPages:     3541624 kB
Mapped:          76484 kB
Slab:           562968 kB
PageTables:     216372 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  28775852 kB
Committed_AS: 15617048 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     86764 kB
VmallocChunk: 34359650291 kB
HugePages_Total: 20542
HugePages_Free:   1465
HugePages_Rsvd:   1404
Hugepagesize:     2048 kB
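A few lines of Python make the huge page figures above easier to read. This is only a sketch: the parser is a hypothetical helper, and the sample text repeats the relevant values from the output above rather than reading /proc/meminfo directly.

```python
# Sketch: summarise huge page usage from /proc/meminfo-style text.
# The sample values are copied from the output shown in the article.


def parse_meminfo(text):
    """Return {field: integer value}, ignoring the trailing 'kB' unit."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


sample = """PageTables: 216372 kB
HugePages_Total: 20542
HugePages_Free: 1465
HugePages_Rsvd: 1404
Hugepagesize: 2048 kB"""

info = parse_meminfo(sample)
pool_mb = info["HugePages_Total"] * info["Hugepagesize"] // 1024
in_use = info["HugePages_Total"] - info["HugePages_Free"]
uncommitted = info["HugePages_Free"] - info["HugePages_Rsvd"]

print(f"huge page pool:    {pool_mb} MB")                       # 41084 MB
print(f"pages in use:      {in_use}")                           # 19077
print(f"pages uncommitted: {uncommitted}")                      # 61
print(f"page tables:       {info['PageTables'] // 1024} MB")    # 211 MB
```

HugePages_Rsvd counts pages that are promised to a mapping but not yet faulted in, so only Free minus Rsvd is really spare. The page tables now take about 211 MB instead of 22G.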
No swap in use, load average (not shown) down to normal levels and free memory!
[oracle@server ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:         64520      47781      16738          0       1547        747
-/+ buffers/cache:      45487      19033
Swap:        16383          0      16383
I consider this a success :)