Increasing the maximum I/O size in Linux

This post is really a quick note to myself to remind me how to bump up the maximum I/O size on Linux. I have been benchmarking a bit lately, and increasing the maximum size of an I/O request from 512KB to 1024KB looked like something worth doing. Especially since it’s also done in Exadata :)

So why would it matter? It matters most for Oracle DSS systems. Why? Take ORION for example: although it’s using mindless I/O, as explained by @flashdba and @kevinclosson, at least it gives me a quick way to test the size of a typical I/O request. Let me demonstrate:

[oracle@server1 ~]$ $ORACLE_HOME/bin/orion -run oltp -testname foo
...

So ORION is now busying itself producing numbers based on 8k reads. How do I know? Because I’m using collectl (https://martincarstenbach.wordpress.com/2011/08/05/an-introduction-to-collectl/) in another session:

[oracle@server1 ~]$ perl collectl.pl -sD --iosize --dskfilt sdc
waiting for 1 second sample...

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdc          62464      0 7808    8       0      0    0    0       8     1     0      0   63
sdc          63280      0 7910    8       0      0    0    0       8     1     0      0   57
sdc          64048      0 8005    8       0      0    0    0       8     1     0      0   60
^C

Ouch!

Look at the I/O size: 8k, just as you would expect. By the way, ORION is slightly unrealistic in the OLTP case because it uses io_submit and io_getevents instead of pread, but that’s a different story altogether for another blog post.
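If you want to see this for yourself, attaching strace to the running ORION process shows the asynchronous calls. A quick sketch (looking the process up with pgrep is my assumption; use the actual PID on your system):

# attach to a running ORION process and trace only the AIO system calls
strace -f -e trace=io_submit,io_getevents -p $(pgrep -f orion | head -1)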

Now I’d like to run the DSS test. Firing off ORION again, this time with “-run dss”, I observe the following:

$ perl collectl.pl -sD --iosize --dskfilt sdc
waiting for 1 second sample...

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdc         1995776      0 3898  512       0      0    0    0     512     1     0      0   73
sdc         2089984      0 4082  512       0      0    0    0     512     1     0      0   77
sdc         1976320      0 3860  512       0      0    0    0     512     1     0      0   73

Aha, my I/O size is 512k. That’s good, but not good enough. After what seemed like a very long time digging through scheduler code on lxr.linux.no and discussing with friends, I found out that the clue is in sysfs. Under /sys/block/<device>/queue you find lots of parameters determining how the kernel issues I/O against a block device: the I/O scheduler, the queue depth, the sector size (those 4k sectors and Oracle!) and others. Amongst these, the following are of relevance to this post:

– max_sectors_kb
– max_hw_sectors_kb

max_sectors_kb defines the maximum I/O size the kernel will issue, and it logically cannot exceed max_hw_sectors_kb, the maximum the hardware itself supports. I found 1MB to be the maximum for many spinning disks; my SATA SSD supports more.
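For reference, checking and changing the limit looks roughly like this (sdc is the device I benchmarked; substitute your own, and writing the value requires root):

# inspect the hardware limit and the current kernel limit for sdc
cat /sys/block/sdc/queue/max_hw_sectors_kb
cat /sys/block/sdc/queue/max_sectors_kb
# raise the kernel limit to 1MB; the value must not exceed max_hw_sectors_kb
echo 1024 > /sys/block/sdc/queue/max_sectors_kb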

Now when I bump that value up to 1024 (1MB) I get larger I/O sizes with the DSS benchmark:

$ perl collectl.pl -sD --iosize --dskfilt sdc
waiting for 1 second sample...

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdc         1927047      0 1882 1024       0      0    0    0    1024     1     0      0   51
sdc         1906688      0 1862 1024       0      0    0    0    1024     1     0      0   51

Thanks also to Frits Hoogland for helping me understand Linux I/O better!

Responses

  1. Would be useful to explain what you actually did. What were the values for max_sectors_kb & max_hw_sectors_kb, etc. before and after?

    1. Hi Greg,

      thanks for passing by! The change to the IO size was quite simple actually:

      – cd to the /sys/block/sdc/queue directory, where sdc was the block device I used for the benchmark. You see it’s not very scientific since it is just a single-pathed device, but this was the internal SSD in my laptop.
      – cat max_hw_sectors_kb
      – cat max_sectors_kb

      max_hw_sectors_kb, as per the Linux documentation, is the maximum I/O size the device (sdc in this case) can digest. For the SATA 6G SSD in the laptop that was in the range of many MB. Initially max_sectors_kb was set to 512, which appears to be the SCSI driver’s default.

      Now to change the value you either simply echo 1024 > max_sectors_kb or write a persistent udev rule (see the sketch below); max_hw_sectors_kb itself is read-only. This way the kernel can use 1MB I/Os; I should actually have straced ORION to show the arguments to io_submit.
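      A minimal udev rule could look like the following (the file name is my choice, and you will want to adjust the device match to your system; just a sketch):

      # /etc/udev/rules.d/99-max-sectors.rules
      ACTION=="add|change", KERNEL=="sdc", ATTR{queue/max_sectors_kb}="1024"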

  2. Hi,

    This may be a really dumb question, but what actual benefits did you get from ‘upping’ the I/O size?

    /Peter

    1. Peter,

      there are no stupid questions :)

      Well, by submitting larger I/Os you get more throughput: each request moves more data, so fewer requests (and less per-request overhead) are needed for the same bandwidth.
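      A crude way to see the effect for yourself, assuming the disk (here /dev/sdc) can safely be read from and GNU dd is available, is to compare direct reads at both request sizes; just a sketch:

      # 1GB of O_DIRECT reads at a 512KB and at a 1MB request size; compare timings
      # (with max_sectors_kb=512 the kernel splits each 1MB request in two)
      dd if=/dev/sdc of=/dev/null bs=512k count=2048 iflag=direct
      dd if=/dev/sdc of=/dev/null bs=1M count=1024 iflag=direct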

  3. […] guess this is somehow related to the max_sectors_kb and max_hw_sectors_kb SYSFS parameters. It will be the subject of another […]

  4. […] But now one more question: What is the impact of the Linux Maximum IO size? […]

  5. Hello Martin,

    Can this be done on disks which are going to be used for ASM and configured by oracleasm (ASMLib)?

    Thank you,
    Adhika

    1. Hi Adhika,

      it is my understanding that this works with any physical block device in Linux, regardless of what accesses it. The setting is made at the operating system level.
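      Since the value lives in sysfs per block device, you can quickly check it for every SCSI disk at once; a small sketch:

      # show the current limit for every sd* device
      for f in /sys/block/sd*/queue/max_sectors_kb; do echo "$f: $(cat $f)"; done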

      Martin
