Martins Blog

Trying to explain complex things in simple terms

An interesting problem with ext4 on Oracle Linux 5.5

Posted by Martin Bach on November 4, 2011

I have run into an interesting problem with my Oracle Linux 5.5 installation. Naively, I assumed that since ext4 has been around for a long time, it would be stable. For a test I performed for a friend, I created my database files on a file system formatted with ext4 and mounted it the same way I would have mounted an ext3 file system:

$ mount | grep ext4
/dev/mapper/mpath43p1 on /u02/oradata type ext4 (rw)
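
For reference, this is roughly how the file system was created and mounted; a minimal sketch using the device name from the mount output above, with everything else left at plain defaults (the commands are illustrative, not a transcript of my session):

# mkfs.ext4 /dev/mapper/mpath43p1
# mount -t ext4 /dev/mapper/mpath43p1 /u02/oradata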

Now when I tried to create a data file of a certain size within a tablespace, I got block corruption, which I found very interesting. My first thought was: there must be a corruption of the file system. So I shut down all processes accessing /u02/oradata and gave the file system a thorough check.

# umount /u02/oradata
#
# fsck.ext4 -cfv /dev/mapper/mpath43p1
e2fsck 1.41.9 (22-Aug-2009)
Checking for bad blocks (read-only test): done
/dev/mapper/mpath43p1: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/mapper/mpath43p1: ***** FILE SYSTEM WAS MODIFIED *****

42 inodes used (0.00%)
14 non-contiguous files (33.3%)
0 non-contiguous directories (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 18/14
3679655 blocks used (61.82%)
0 bad blocks
5 large files

26 regular files
7 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
33 files
#

Have a look at the command line options and make sure you understand them before letting fsck loose on your file systems! In this case, -c runs a read-only bad block scan, -f forces a check even if the file system is marked clean, and -v makes the output verbose; the bad block scan is also why the tool updated the bad block inode and reported the file system as modified. By the way, if Linux tells you the file system cannot be unmounted, use “fuser -m /u02/oradata” to list all PIDs accessing the mount point.
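
A minimal sketch of that workflow (the PIDs fuser returns will obviously depend on what is running on your system):

# fuser -m /u02/oradata
# umount /u02/oradata

If you are absolutely certain nothing important is left, “fuser -km /u02/oradata” kills the listed processes, but shutting the database down cleanly first is the far better option.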

As you can see from the output, the file system was fine, which struck me as odd. I mounted it again and started my database; surely that had only been a glitch.

After repeating my test, I got the same block corruption. I now had a suspicion that ext4 might be the problem. Instead of creating a tablespace with one 8 GB data file, I used four 2 GB data files, and the problem went away.
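
The workaround looked roughly like this; tablespace and file names are made up for illustration:

SQL> create tablespace small_files datafile
  2  '/u02/oradata/orcl/small01.dbf' size 2G,
  3  '/u02/oradata/orcl/small02.dbf' size 2G,
  4  '/u02/oradata/orcl/small03.dbf' size 2G,
  5  '/u02/oradata/orcl/small04.dbf' size 2G;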

After some experimentation I found out that the magic boundary is somewhere above 3 GB for a single data file. Here is the proof:

SQL> create tablespace WillIBeUnusable datafile '/u02/oradata/orcl/corrupt.dbf' size 3G;

Tablespace created.

SQL> !dbv file=/u02/oradata/orcl/corrupt.dbf

DBVERIFY: Release 11.2.0.2.0 - Production on Thu Nov 3 10:32:02 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oradata/orcl/corrupt.dbf

DBVERIFY - Verification complete

Total Pages Examined         : 393216
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 127
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 393089
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 3371559 (0.3371559)
$

However, a 5 GB data file reports these corrupt blocks:

$ dbv file=/oradata/orcl/corrupt.dbf

DBVERIFY: Release 11.2.0.2.0 - Production on Thu Nov 3 10:33:49 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oradata/orcl/corrupt.dbf

...

Page 655357 is marked corrupt
Corrupt block relative dba: 0x0289fffd (file 10, block 655357)
Completely zero block found during dbv:

Page 655358 is marked corrupt
Corrupt block relative dba: 0x0289fffe (file 10, block 655358)
Completely zero block found during dbv:

Page 655359 is marked corrupt
Corrupt block relative dba: 0x0289ffff (file 10, block 655359)
Completely zero block found during dbv:

Page 655360 is marked corrupt
Corrupt block relative dba: 0x028a0000 (file 10, block 655360)
Completely zero block found during dbv:

DBVERIFY - Verification complete

Total Pages Examined         : 655360
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 127
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 524160
Total Pages Marked Corrupt   : 131073
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 3371948 (0.3371948)
$

That’s not good. A search on supporthtml.oracle.com revealed that ext4 is supported only in 5.6 and later. At that point I stopped bothering with it: even if I worked out where the problem was, I would still be running an unsupported configuration. The main reason I prefer not to be in “unsupported” terrain is that I don’t want to give Oracle Support an easy way to dismiss the service request.

The Alternative: XFS

However, as Greg Rahn, Kevin Closson and others will tell you, you shouldn’t use ext4 for databases anyway: use XFS! The latter has been substantially improved in Oracle/Red Hat Linux 6 and should be your file system of choice.
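
For completeness, a minimal sketch of moving the same mount point to XFS (device name as above; the commands are illustrative, and any mkfs/mount tuning should be tested against your own workload first):

# mkfs.xfs /dev/mapper/mpath43p1
# mount -t xfs /dev/mapper/mpath43p1 /u02/oradata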

Almost forgot the hard facts:

  • Oracle Linux 5.5 64bit
  • Oracle 11.2.0.2 single instance

6 Responses to “An interesting problem with ext4 on Oracle Linux 5.5”

  1. I completely disagree when it comes to XFS and Oracle. Don’t get me wrong, XFS is a great file system and I use it a lot (no fsck, extremely fast deletion of big files, support for really huge file systems, xfsdump/xfsrestore), BUT I would not use it for data files.

    The reason is:

    “Oracle does not run certifications on local filesystems (i.e. except for OCFS2, NFS etc.) except ext2/ext3 as it is the common default filesystem for all Linux distributions. So if a problem happens specific to XFS, the Linux vendor should be engaged.” [Metalink Doc ID 414673.1]

    The same note states there was an error which caused redo log corruption.

    I don’t want to deal with SuSE or Red Hat (or even Oracle) in case there are errors with the underlying file system. It is completely unnecessary. Ext3 supports up to 32 TB volume size (Wikipedia). That should be enough for most use cases; in case it is not, you should use ASM anyway. And there are a lot of installations out there with ext3.
    Formatted the right way (one inode per 1 MB or even 4 MB), file system checks speed up a lot: down from 1 hour per TB to 5 minutes for 16 TB volumes.
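
    As an illustration of that last point: mke2fs’s -i flag sets the bytes-per-inode ratio, so one inode per 4 MB would look roughly like this (the device name is hypothetical):

    # mkfs.ext3 -i 4194304 /dev/sdb1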

  2. Paul McManus said

    Hi

    Has any of the discussion above been changed by the passage of time? I can see a few issues listed against ext4, and I have warnings in /var/log/messages about ext4 alignment and possible performance issues.

    So is ext3 the way to go to avoid issues?

    Red Hat Linux 6.2
    Oracle 11g R2 – 11.2.0.3

    • Martin Bach said

      Hmmm,

      ASM is becoming more and more my personal preference, for RAC and single instance alike: a unified interface to database management (srvctl etc.), and you can raise calls with Oracle if needed. The downside is additional patching, although that has become a non-issue with opatch auto. I’d check the release notes of every Oracle Linux (or any other distribution) release to see whether support for a particular file system is granted, with the exception of btrfs, which is still quite fresh despite everything you might hear otherwise.

      I’d also give XFS a try; it seems a viable alternative (in test, of course) in recent distributions.

      But as with everything, the best technology might not be the best technology for _you_. Always perform (regression) testing to ensure that your environment and the chosen solution are compatible.

      Martin

      • Paul McManus said

        Martin

        Many thanks for the reply and the reminder that there is no black-and-white answer to this type of “what is best?” question.

        PMcM

  3. Efstathios Efstathiou said

    On Linux you will find yourself looking at ASM rather soon for the points mentioned above. In my recent project we had some issues with ext3 using direct I/O on EMC VMAX, where the I/O got chopped into 4 KB requests, causing overhead and slowdowns (check Bart Sjerps’ “Dirty Cache” blog for more information). As a first step we moved the redo logs to raw devices and got rid of the I/O chopping (check using iostat -x, column avgrq-sz); I/O requests were then issued at the maximum kernel size. In our use case an implementation of ASM seemed reasonable.

    But as always, you need to know what you are doing and bear in mind the consequences (there is no holy grail). You may even make things worse: physical reads reported by AWR on a buffered file system may in reality have been 70% buffered and only 30% truly physical, and after the move they are 100% physical. Carefully evaluate all combinations (all on ASM; redo on raw, data with direct I/O; redo on raw, data buffered; all on raw).

    Oracle now offers both raw (ASM) and buffered file systems (ACFS) on (most) supported platforms, so as a DBA it is good news that you have a standard across all platforms. As ACFS sits on top of ASM through ADVM, this gives you a lot of control over the server’s layout without bugging your sysadmin all the time.
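
    To illustrate the avgrq-sz check mentioned above (the interval is arbitrary; avgrq-sz is reported in 512-byte sectors, so 4 KB requests show up as a value of about 8):

    $ iostat -x 5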

  4. […] though-ext4 has been, let’s say, a little flaky. I first found out about that in 2011 (https://martincarstenbach.wordpress.com/2011/11/04/an-interesting-problem-with-ext4-on-oracle-linux-5…) and Oracle now published a MOS note: ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when […]
