Some time ago @martinberx mentioned on Twitter that one of his Linux systems suffered from Clusterware issues for which there wasn’t a readily available explanation. It turned out that the problem he faced was caused by unwanted (from an Oracle perspective at least) automatic cleanup operations in /var/tmp/.oracle. You can read more at the original blog post.
The short version is this: systemd (1), the successor to SysV init and Upstart, tries to be helpful by removing unused files in a number of “temp” directories. However, some of the files it can remove are essential for Clusterware, and without them all sorts of trouble ensue.
A note about this post’s shelf life and versions used
In case you found this post via a search engine, here are the key properties of the system I worked on while compiling this post: an Oracle Linux 7.7 VM and orachk 19.2.0_20190717. Everything in IT has a shelf life, and this post is no different. It is likely to become obsolete with new releases of the operating system and/or orachk.
My Oracle Support (MOS) has been updated, and there are finally hits when searching for “tmpfiles.d” for Oracle/RedHat Linux 7. Exadata users find the issue documented as EX50. More importantly, orachk 19.2.0_20190717 (and potentially earlier releases, I haven’t checked) warns you about this potential stability issue, as you can see in figure 1:
This is great news for database administrators, as this might have gone undetected otherwise. It should be noted though that the suggested solution underneath Action/Repair is incomplete. As the documentation for tmpfiles.d (5) reveals, you cannot simply copy and paste the 3 lines mentioned in that section. Subsequent orachk runs confirm this by still flagging the outcome of the check as critical.
Ensuring the check passes
A little more digging into the issue and the corresponding MOS notes revealed that a different syntax is needed. This has already been covered in a couple of other blog posts; I’m adding the information here to save you time. With the amended configuration added towards the end of /usr/lib/tmpfiles.d/tmp.conf, orachk was happy:
[root@server1]# cat /usr/lib/tmpfiles.d/tmp.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
# See tmpfiles.d(5) for details
[ ... more output ... ]
x /tmp/.oracle*
x /var/tmp/.oracle*
x /usr/tmp/.oracle*
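If you would rather script the change than edit the file by hand, something along the following lines might do. Treat it as a minimal sketch and not as anything Oracle ships: the function name add_oracle_exclusions is made up for this post, and it assumes the three exclusion lines are written exactly as shown above, with a single space after the x. It appends only the lines that are still missing, so running it twice does no harm.

```shell
#!/bin/sh
# Hypothetical helper (not part of orachk or systemd): append the three
# "x" exclusion lines to a tmpfiles.d configuration file if they are missing.
# Usage: add_oracle_exclusions [path-to-conf]
add_oracle_exclusions() {
    conf="${1:-/usr/lib/tmpfiles.d/tmp.conf}"

    # Refuse to create a new file by accident
    if [ ! -f "$conf" ]; then
        echo "no such file: $conf" >&2
        return 1
    fi

    for dir in /tmp /var/tmp /usr/tmp; do
        line="x ${dir}/.oracle*"
        # -F: fixed string, -x: the whole line must match, -q: quiet
        grep -Fxq "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```

As root, calling add_oracle_exclusions without an argument works on /usr/lib/tmpfiles.d/tmp.conf. Pass a copy of the file as the first argument if you want to try it out somewhere harmless first.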
I re-ran the orachk command and thankfully, the test succeeded, as you can see in figure 2:
I have no idea if systemd picks the change in my configuration file up without restarting the timer, so I’m also doing this for good measure:
[root@server1 ~]# systemctl restart systemd-tmpfiles-clean.timer
With these remediation steps in place, you have done everything Oracle documented to be safe from Clusterware issues caused by systemd. Shortly before hitting the publish button, @FritsHoogland let me know that Oracle’s Grid Infrastructure RU 19.4 has a go at fixing this issue. It doesn’t add all the lines needed to satisfy orachk though, so you need to review /usr/lib/tmpfiles.d/tmp.conf after applying the 19.4 Grid Infrastructure RU.
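Reviewing the file after patching can be scripted as well. The sketch below is my own illustration, not the logic orachk uses: check_oracle_exclusions is a hypothetical name, and it only looks for the three lines in the exact form used in this post (a single space after the x). It prints the state of each line and returns a non-zero exit code if at least one is missing.

```shell
#!/bin/sh
# Hypothetical helper (not the orachk check): report which of the three
# exclusion lines are present in a tmpfiles.d configuration file.
# Usage: check_oracle_exclusions [path-to-conf]
check_oracle_exclusions() {
    conf="${1:-/usr/lib/tmpfiles.d/tmp.conf}"
    missing=0

    for dir in /tmp /var/tmp /usr/tmp; do
        line="x ${dir}/.oracle*"
        # -F: fixed string, -x: the whole line must match, -q: quiet
        if grep -Fxq "$line" "$conf" 2>/dev/null; then
            echo "ok:      $line"
        else
            echo "missing: $line"
            missing=1
        fi
    done

    # 0 if all three lines were found, 1 otherwise
    return "$missing"
}
```

Run check_oracle_exclusions after applying an RU. Because the exit code reflects the result, the function also fits into a larger post-patching script.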