Large pages in Linux are a really interesting topic for me, as I really like Linux and enjoy trying to understand how it works. Large pages can be very beneficial for systems with large SGAs, and even more so for those with a large SGA and lots of user sessions connected.
I have previously written about the benefits and usage of large pages in Linux here:
So, as you may know, there is a change to the init.ora parameter “use_large_pages” in 11.2.0.3. The parameter can take these values:
SQL> select value,isdefault
  2  from V$PARAMETER_VALID_VALUES
  3* where name = 'use_large_pages'

VALUE                ISDEFAULT
-------------------- --------------------
TRUE                 TRUE
AUTO                 FALSE
ONLY                 FALSE
FALSE                FALSE
There is a new value named “auto” that didn’t exist prior to 11.2.0.3. The intention is to create large pages at instance startup if possible, even if /etc/sysctl.conf doesn’t have an entry for vm.nr_hugepages at all. The risk, though, as with any dynamic creation of large pages by echoing values into /proc/sys/vm/nr_hugepages, is that you get fewer than you expect. Maybe even 0. Now I’m interested to see if that works.
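To illustrate that risk, here is a minimal sketch (to be run as root, figures chosen to match my system later in this post) that requests large pages at runtime and then checks how many the kernel actually managed to reserve:

# ask the kernel for 1257 large pages at runtime (root required)
sysctl -w vm.nr_hugepages=1257

# check how many were actually reserved - on a busy, fragmented
# system this can be fewer than requested, or even 0
grep -E 'HugePages_(Total|Free)' /proc/meminfo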
So let’s have a look. My system is Oracle Linux 6.4, 64 bit, running virtualised. Before any database was started I checked /proc/meminfo:
[root@ol64 ~]# cat /proc/meminfo
MemTotal:        8192240 kB
MemFree:         5090124 kB
Buffers:           67408 kB
Cached:          2341504 kB
SwapCached:            0 kB
Active:           816116 kB
Inactive:        2055352 kB
Active(anon):     548760 kB
Inactive(anon):   284304 kB
Active(file):     267356 kB
Inactive(file):  1771048 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        524284 kB
SwapFree:         524284 kB
Dirty:                60 kB
Writeback:             0 kB
AnonPages:        462560 kB
Mapped:           334424 kB
Shmem:            370516 kB
Slab:             103692 kB
SReclaimable:      47496 kB
SUnreclaim:        56196 kB
KernelStack:        2016 kB
PageTables:        26008 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4620404 kB
Committed_AS:    3343896 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       26480 kB
VmallocChunk:   34359700348 kB
HardwareCorrupted:     0 kB
AnonHugePages:    247808 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8128 kB
DirectMap2M:     8380416 kB
I am interested in the HugePages entries towards the end of the list. If you have ever looked at /proc/meminfo on the previous release’s kernels (2.6.18.x to be precise) you’ll notice it is quite different now, with a lot more information. Modern kernels really are a great step ahead. Have a look at the Outlook and References sections below; this is a somewhat superficial explanation, but good enough for the purpose of this article. A future post will go into more detail about sysfs, which is slated to replace parts of the /proc file system, and about NUMA considerations.
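Before moving on: the HugePages_Total and Hugepagesize lines can be combined to work out how much memory is currently set aside for large pages. A quick awk one-liner of my own (not Oracle tooling) does the maths:

# memory reserved for large pages in MB = HugePages_Total * Hugepagesize / 1024
awk '/HugePages_Total/ {t=$2} /Hugepagesize/ {s=$2} END {print t*s/1024 " MB reserved for large pages"}' /proc/meminfo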
Back to this article … the database I have running in my VM doesn’t use large pages, as shown in the alert.log:
Starting ORACLE instance (normal)
****************** Large Pages Information *****************

Total Shared Global Region in Large Pages = 0 KB (0%)

Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 16 MB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB

RECOMMENDATION:
  Total Shared Global Region size is 2514 MB. For optimal performance,
  prior to the next instance restart increase the number
  of unused Large Pages by atleast 1257 2048 KB Large Pages (2514 MB)
  system wide to get 100% of the Shared Global Region allocated with
  Large pages
***********************************************************
So let’s change that, but dynamically rather than manually. Again, a better (more predictable!) approach would be to manually add the additional 1257 large pages to /etc/sysctl.conf as recommended and reboot, to ensure they are available when the database starts, and probably to set use_large_pages to “only” to enforce their usage. But enough warnings about why you probably don’t want to use the “auto” feature; I want to see this in real life!
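Just for reference, before we get to the “auto” experiment, the predictable route would look roughly like this (a sketch only; the figure of 1257 pages comes from the alert.log recommendation above):

# reserve 1257 x 2 MB pages permanently (as root); guaranteed after a reboot,
# sysctl -p applies it immediately but may not get all pages on a fragmented system
echo "vm.nr_hugepages = 1257" >> /etc/sysctl.conf
sysctl -p

# enforce large page usage so the instance refuses to start without them
sqlplus / as sysdba <<EOF
alter system set use_large_pages=only scope=spfile;
EOF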
SQL> alter system set use_large_pages=auto;
alter system set use_large_pages=auto
*
ERROR at line 1:
ORA-02095: specified initialization parameter cannot be modified

SQL> a scope=spfile;
  1* alter system set use_large_pages=auto scope=spfile
SQL> /

System altered.
As you can see the parameter is static and requires an instance restart, so that is what I did next. Here is an interesting side effect of setting the parameter to “auto”: it doesn’t have any effect if you haven’t prepared the system for the use of large pages in /etc/security/limits.conf. You might think that the oracle-preinstall RPM takes care of this, but it misses the settings for “memlock”. Here is proof that nothing happened:
[root@ol64 ~]# cat /proc/meminfo
MemTotal:        8192240 kB
MemFree:         5090124 kB
Buffers:           67408 kB
Cached:          2341504 kB
SwapCached:            0 kB
Active:           816116 kB
Inactive:        2055352 kB
Active(anon):     548760 kB
Inactive(anon):   284304 kB
Active(file):     267356 kB
Inactive(file):  1771048 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        524284 kB
SwapFree:         524284 kB
Dirty:                60 kB
Writeback:             0 kB
AnonPages:        462560 kB
Mapped:           334424 kB
Shmem:            370516 kB
Slab:             103692 kB
SReclaimable:      47496 kB
SUnreclaim:        56196 kB
KernelStack:        2016 kB
PageTables:        26008 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4620404 kB
Committed_AS:    3343896 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       26480 kB
VmallocChunk:   34359700348 kB
HardwareCorrupted:     0 kB
AnonHugePages:    247808 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8128 kB
DirectMap2M:     8380416 kB
HugePages_Total is still 0, which didn’t surprise me. To allow oracle to lock memory you need to grant the account that privilege. I had to edit /etc/security/limits.conf and set the memlock parameter to 5 GB, which is more than my 2.5 GB SGA needs, but setting the value a little too high doesn’t hurt either. The value is in KB by the way.
oracle soft memlock 5242880
oracle hard memlock 5242880
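A quick way to verify that the new limit has taken effect is to check ulimit -l for the oracle account; it should report the value from limits.conf (in KB). For example:

# limits are evaluated at login time, so use a fresh session
su - oracle -c 'ulimit -l'
# expected output: 5242880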
After logging out and back in as oracle I tried once more, and hey, success!
Starting ORACLE instance (normal)
DISM started, OS id=11969
****************** Large Pages Information *****************
Parameter use_large_pages = AUTO

Total Shared Global Region in Large Pages = 2514 MB (100%)

Large Pages used by this instance: 1257 (2514 MB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 16 MB)
Large Pages configured system wide = 1257 (2514 MB)
Large Page size = 2048 KB
Time taken to allocate Large Pages = 0.130804 sec
***********************************************************
LICENSE_MAX_SESSION = 0
Also notice the DISM process here, which is responsible for creating the large pages on the fly. This is an interesting “background process”, and Tanel Poder has already mentioned it in one of his presentations:
[root@ol64 ~]# ps -ef | grep 11969
root     11969     1  0 12:50 ?        00:00:00 ora_dism_ora11
root     12026 11911  0 12:50 pts/3    00:00:00 grep 11969
[root@ol64 bin]# ls -l $ORACLE_HOME/bin/oradism
-rwsr-x---. 1 root oinstall 71758 Sep 17  2011 oradism
It is owned by root with the setuid flag set … easy to miss when cloning a home …
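If you want to check a (cloned) home for this, ownership and the setuid bit of oradism are easy to verify; a quick sketch using standard tools:

# oradism needs to be owned by root and have the setuid bit set
ls -l $ORACLE_HOME/bin/oradism

# this reports the binary if the setuid bit is missing (GNU find)
find $ORACLE_HOME/bin -name oradism ! -perm -4000 -printf '%p is missing the setuid bit\n'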
Once the large pages are created, the process disappears when you start the instance a second time, and there is no mention of it in the alert.log pertaining to the startup sequence. But it has done its work.
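A quick way to confirm that the process really is gone after that second startup is a one-liner of my own making (the bracket trick stops grep from matching itself):

# nothing should be returned once the pages exist and the instance was restarted
ps -ef | grep '[o]ra_dism'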
[root@ol64 ~]# grep -i page /proc/meminfo
AnonPages:        346808 kB
PageTables:        12852 kB
AnonHugePages:    208896 kB
HugePages_Total:    1257
HugePages_Free:     1045
HugePages_Rsvd:     1045
HugePages_Surp:        0
Hugepagesize:       2048 kB
Notice that not all pages are actually in use yet; I have only just started the database. Don’t worry though, 100% of the SGA is allocated in large pages as per the alert.log. Over time you will notice more and more pages being in use.
Now you can of course force the database to touch all these pages, but whether that is a good idea is another question. You probably don’t want to do so if you have a large SGA, as the startup time can become very long. For the sake of completeness I added it here to show you the effect in /proc/meminfo. I set pre_page_sga = true and bounced the instance:
[root@ol64 ~]# grep -i page /proc/meminfo
AnonPages:        370632 kB
PageTables:        15272 kB
AnonHugePages:    206848 kB
HugePages_Total:    1257
HugePages_Free:        3
HugePages_Rsvd:        3
HugePages_Surp:        0
Hugepagesize:       2048 kB
[root@ol64 ~]#
Now all pages are allocated straight after instance start. If you want to follow the example, I suggest you use the watch command as shown here:
watch grep -i page /proc/meminfo
Summary and a word of warning
I personally wouldn’t rely on use_large_pages = auto in an environment I care about. It is simply too unpredictable whether you get the large pages requested, and you might fall back to 4k pages. Planning is better than hoping: calculate the number of large pages beforehand, add them to vm.nr_hugepages in /etc/sysctl.conf, and you should be all but guaranteed to have them allocated. Large pages need enough contiguous memory, otherwise the allocation may (partially) fail.
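A back-of-the-envelope sketch for sizing vm.nr_hugepages from the SGA size could look like this (SGA_MB and HPG_KB are my own variable names; the MOS script mentioned in the comments below works from the actual shared memory segments instead):

# SGA_MB is the value you supply; Hugepagesize is read from /proc/meminfo
SGA_MB=2514
HPG_KB=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)   # 2048 on x86-64
echo "vm.nr_hugepages should be at least $(( (SGA_MB * 1024 + HPG_KB - 1) / HPG_KB ))"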
Also, large pages cannot be swapped out under memory pressure. Don’t forget you still need enough space for the PGAs and the operating system! If the system starts swapping although “free” shows a lot of free memory, then most likely you have used up all the 4k pages.
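To see whether the box is actually paging despite “free” showing plenty of memory, watching the swap columns of vmstat is usually enough (a quick check, nothing Oracle specific):

# si/so (swap in/out, in KB/s) should stay at or near 0 on a healthy system
vmstat 5 5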
Outlook
There is even more to be said about large pages on systems with more than 2 sockets, especially when it comes to allocating large pages per NUMA node. I’ll leave that for a future post.
Oh, and yes, the large page information in /proc is only there for legacy reasons; it now all lives in sysfs under /sys/kernel/mm/hugepages/hugepages-2048kB. Intel x86-64 supports three different page sizes: 1 GB, 2 MB and 4 KB.
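The sysfs counterparts are plain files and can simply be read; for example (exact file names may vary slightly between kernel versions):

# pool statistics for the 2 MB page size live in plain files
ls /sys/kernel/mm/hugepages/hugepages-2048kB/

# the pool size itself, equivalent to vm.nr_hugepages
cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages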
References

Comments
Martin, great post, thank you. For the sake of completeness in your blog, MOS note 401749.1 contains a script you could use to calculate vm.nr_hugepages; note 361468.1 provides all the basics on configuration.
One question for you:
How do you determine the performance delta, once you’ve implemented HugePages? I can see all the wonderful reasons why one should move (and, in fact, we moved to HugePages a year ago, which was also a great excuse to get away from AMM). But are there any stats you know of that can tell us how smart or dumb we were in making the change?
Hi Dave,
thanks for the link to the MOS note and your comments.
Large pages don’t make applications go faster per se; they reduce overhead. Let me explain.
The immediate benefit of implementing large pages is that the kernel has far fewer pages to keep track of: mapping the same amount of memory takes far fewer 2M pages than 4k pages. If you divide the total amount of memory by the page size you can see the difference. Internally the kernel needs to keep track of all pages and their state (free, dirty, …).
There are also implications around creating a copy of the page tables when a process issues a fork() call. In simple terms: the bigger your SGA and the higher the number of sessions, the greater the benefit of large pages.
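To put a rough number on it (figures of my own, purely illustrative, for an 8 GB SGA): every process attaching to the SGA needs page table entries for roughly two million 4 KB pages, but only about four thousand 2 MB pages.

# page (table) entries needed to map an 8 GB SGA - illustrative figures only
SGA_BYTES=$((8 * 1024 * 1024 * 1024))
echo "4 KB pages: $(( SGA_BYTES / 4096 )) entries"       # 2097152
echo "2 MB pages: $(( SGA_BYTES / 2097152 )) entries"    # 4096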
Let me know if you’d like to discuss further,
Martin
Thanks Martin… I think it was poor word choice when I said “performance delta”. What I’m really getting at is, how can we measure the effect on the system, with and without HugePages?
I agree, it’s more efficient to use HugePages, overhead is reduced, etc. But is there a measurable impact? (If we were talking about improving a query’s performance, we could say, building an index consumes 1GB but saves 2 minutes every time the query is run. Is there something similar we can apply to a kernel setting like this?)
By the way, if you do write about what happens when fork() is called and hugepages is implemented, I’d love to read it. Thanks.
Well as always it depends – I have done some testing but didn’t have time to write it up yet. You should check this page for a performance comparison:
http://www.csn.ul.ie/~mel/docs/stream-api/
Hope this helps.
Pingback: SGA bigger than the amount of HugePages configured (Linux – 11.2.0.3) | Tanel Poder's blog: IT & Mobile for Geeks and Pros
Hi Martin
Interesting. Did you manage to find the exact description of the ‘AUTO’ parameter value in the Oracle Documentation?
Radu
Say ‘good-bye’ to the AUTO value :-( …
http://flado.blogspot.de/2014/11/ora-27107-farewell-automatic-huge-pages.html
Great post.