One of the main problems I have seen with GNS (Grid Naming Service) installations was that you couldn’t really see whether your DNS and DHCP configuration was correct until it was too late. This has been addressed, but it’s little known. There are a number of checks you can run before starting Oracle Universal Installer, and this post is about them.
What is the Grid Naming Service?
I was drawn to GNS when it was first released with Oracle 11g Release 2. It is aimed at environments where the Oracle DBAs take on (yet another) piece of work, namely DNS administration. By virtue of “subdomain delegation”, the master DNS server responsible for “example.com” hands off requests for a subdomain, say rac.example.com, to an Oracle-managed process. This was quite poorly documented initially, prompting me to figure it out myself in an earlier post: https://martincarstenbach.wordpress.com/2009/10/02/build-your-own-11-2-rac-system-part-ii-dns-dhcp-for-gns/
The problem with GNS in its initial release was that you couldn’t really test whether the DNS setup was sufficient for Oracle Universal Installer to work, and it took me a few attempts to get the installation right (bearing in mind that I might simply not have performed sufficient checking!)
Implementing DNS and DHCP
As a quick reminder, here is the subdomain delegation bit you add to your forward resolution zone file (with bind as the example):
$ORIGIN rac.localdomain.
@ IN NS gns.rac.localdomain.
gns.rac.localdomain. IN A 192.168.99.150
Here, I have a domain called “localdomain” for which 192.168.99.10 is the primary name server. Anything for “*.rac.localdomain” is handed off to gns.rac.localdomain with IP address 192.168.99.150. This IP address will later be defined as the GNS VIP, and it seems to be the only address that must be registered in DNS. This is one of the biggest changes: with “manual” name resolution you’d add the SCAN, VIP, and public names for all cluster nodes to DNS before installation.
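To make the delegation stanza less error-prone to type, here is a minimal Python sketch that generates the three zone-file lines for a given subdomain and GNS VIP. The function name delegation_records and the default host label "gns" are my own choices for illustration, not anything Oracle or bind provides.

```python
import ipaddress


def delegation_records(subdomain, gns_vip, gns_host="gns"):
    """Return bind zone-file lines that delegate `subdomain` to the GNS VIP.

    `gns_host` is a hypothetical label for the GNS name server; the post
    uses "gns", giving gns.rac.localdomain.
    """
    # Raises ValueError early if the VIP is not a well-formed IPv4 address.
    ipaddress.IPv4Address(gns_vip)
    zone = subdomain.rstrip(".") + "."      # zone names are fully qualified
    name_server = f"{gns_host}.{zone}"
    return "\n".join([
        f"$ORIGIN {zone}",
        f"@ IN NS {name_server}",           # delegate the subdomain
        f"{name_server} IN A {gns_vip}",    # glue record for the GNS VIP
    ])


if __name__ == "__main__":
    print(delegation_records("rac.localdomain", "192.168.99.150"))
```

The output is meant to be pasted into the forward zone file of the parent domain; reloading the name server is still a manual step.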
DHCP is very simple to set up, and the examples in the previously mentioned post are perfectly sufficient to get started.
Implementing on the host
The below is a sample “/etc/resolv.conf” I used – ensure that your /etc/nsswitch.conf sets “hosts” to files then DNS, which is the documented way.
options attempts: 2
options timeout: 1
search rac.localdomain localdomain
nameserver 192.168.99.150
nameserver 192.168.99.10
The “attempts” and “timeout” options were new to my builds. The search order must include the GNS domain, and it has to come before the corporate domain it bubbles up to. The corporate domain doesn’t have to be a top-level domain by the way; I just kept it simple.
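The search-order rule is easy to get wrong, so here is a small Python sketch that checks a resolv.conf for it. The function name check_resolv_conf is hypothetical, and the name server check simply mirrors my sample file rather than a documented Oracle requirement.

```python
def check_resolv_conf(text, gns_domain, gns_vip):
    """Return a list of problems found; an empty list means the checks passed."""
    search, nameservers = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":     # skip blanks and comment lines
            continue
        fields = line.split()
        if fields[0] == "search":
            search = fields[1:]             # the last "search" line wins
        elif fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])

    # The parent of rac.localdomain is localdomain.
    parent = gns_domain.partition(".")[2]
    problems = []
    if gns_domain not in search:
        problems.append(f"search list does not include {gns_domain}")
    elif parent in search and search.index(parent) < search.index(gns_domain):
        problems.append(f"{parent} is searched before {gns_domain}")
    # Assumption: mirrors the sample file, which lists the GNS VIP as a name server.
    if gns_vip not in nameservers:
        problems.append(f"GNS VIP {gns_vip} is not listed as a nameserver")
    return problems


if __name__ == "__main__":
    with open("/etc/resolv.conf") as handle:
        for problem in check_resolv_conf(handle.read(), "rac.localdomain", "192.168.99.150"):
            print("WARNING:", problem)
```

The parser deliberately ignores the “options” lines, so it makes no statement about the attempts and timeout values.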
The hostname, as defined in /etc/hosts (and, as I said, NOT in DNS), has to be in the GNS subdomain. I opted for rac11203gnsnode1.rac.localdomain, with IP address 192.168.99.34. Nothing else is defined in /etc/hosts, although you could optionally define the private interconnect addresses in there as well.
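A quick way to confirm that the public hostname really sits in the GNS subdomain is sketched below in Python. Both helper names are hypothetical, and the short alias in the sample hosts text is an assumption about what a typical /etc/hosts line looks like.

```python
def names_for_ip(hosts_text, ip):
    """Return all names that an /etc/hosts style text maps to `ip`."""
    names = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop trailing comments
        if len(fields) >= 2 and fields[0] == ip:
            names.extend(fields[1:])
    return names


def in_subdomain(fqdn, subdomain):
    """True if `fqdn` is a host below `subdomain` (case-insensitive)."""
    fqdn = fqdn.rstrip(".").lower()
    subdomain = subdomain.rstrip(".").lower()
    return fqdn.endswith("." + subdomain)


if __name__ == "__main__":
    # Assumed sample content; only the FQDN and IP come from the post.
    sample = (
        "127.0.0.1 localhost.localdomain localhost\n"
        "192.168.99.34 rac11203gnsnode1.rac.localdomain rac11203gnsnode1\n"
    )
    for name in names_for_ip(sample, "192.168.99.34"):
        print(name, in_subdomain(name, "rac.localdomain"))
```

Only the fully qualified name is expected to pass the check; a short alias has no domain part and is reported as outside the subdomain.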
Testing and validating the setup
The cluvfy utility has been enhanced to test GNS before you go through the installation process, rather than finding out afterwards that it all failed. Before you install, you invoke runcluvfy comp gns -precrsinst as in this example:
[oracle@rac11203gnsnode1 ~]$ runcluvfy comp gns -precrsinst -domain rac.localdomain -vip 192.168.99.150 -verbose -n rac11203gnsnode1
Verifying GNS integrity
Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "rac.localdomain" is a valid domain name
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.99.150" resolves to a valid IP address
Checking the status of GNS VIP...
GNS integrity check passed
Verification of GNS integrity was successful.
The command is fairly self-explanatory: pass the subdomain you want GNS to manage and the GNS VIP, and off it goes. With my settings in the zone file (the reverse file doesn’t have any entries) it completed successfully.
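If you build several clusters, it can help to assemble the pre-installation check programmatically. The Python sketch below only uses the flags shown in the transcript above; the function name, the default executable name, and the assumption that -n takes a comma-separated node list are mine.

```python
import shlex


def precrsinst_command(domain, vip, nodes, runcluvfy="runcluvfy"):
    """Build the argument list for the GNS pre-installation check.

    `runcluvfy` is the name or path of the script as you would type it;
    adjust it to wherever your installation media is staged.
    """
    return [
        runcluvfy, "comp", "gns", "-precrsinst",
        "-domain", domain,
        "-vip", vip,
        "-verbose",
        "-n", ",".join(nodes),   # assumption: comma-separated node list
    ]


if __name__ == "__main__":
    cmd = precrsinst_command("rac.localdomain", "192.168.99.150", ["rac11203gnsnode1"])
    print(shlex.join(cmd))
    # To actually run it on a cluster node:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one string avoids shell quoting issues when the command is handed to subprocess.run.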
The GNS integrity can also be verified post installation as in this example:
[oracle@rac11203gnsnode1 ~]$ cluvfy comp gns -postcrsinst -verbose
Verifying GNS integrity
Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "rac.localdomain" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
Public network subnets "192.168.99.0" match with the GNS VIP "192.168.99.0"
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.99.150" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "rac.localdomain" are reachable
GNS resolved IP addresses are reachable
GNS resolved IP addresses are reachable
GNS resolved IP addresses are reachable
Checking status of GNS resource...
  Node          Running?                  Enabled?
  ------------  ------------------------  ------------------------
  rac11203gnsnode1  yes                       yes
GNS resource configuration check passed
Checking status of GNS VIP resource...
  Node          Running?                  Enabled?
  ------------  ------------------------  ------------------------
  rac11203gnsnode1  yes                       yes
GNS VIP resource configuration check passed.
GNS integrity check passed
Verification of GNS integrity was successful.
[oracle@rac11203gnsnode1 ~]$
The srvctl utility has a few more options as well to query the GNS state. The most comprehensive one is the “-a” flag:
[oracle@rac11203gnsnode1 ~]$ srvctl config gns -a
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: rac.localdomain
GNS version: 184.108.40.206.0
GNS VIP network: ora.net1.network
There are more granular options returning version, status etc. The most useful of these seems to be the “-l” flag, which lists all allocated IPs (note that 192.168.99.34 is NOT supplied by GNS, but defined in the hosts file). Since we only connect via VIP or SCAN this is not a problem.
For my single-node experimental system, these IP addresses were used:
[oracle@rac11203gnsnode1 ~]$ srvctl config gns -l
Name                  Type  Value
gnsclu-scan1-vip      A     192.168.99.36
gnsclu-scan2-vip      A     192.168.99.37
gnsclu-scan3-vip      A     192.168.99.38
rac11203gnsnode1-vip  A     192.168.99.35
scan                  A     192.168.99.36
scan                  A     192.168.99.37
scan                  A     192.168.99.38
[oracle@rac11203gnsnode1 ~]$
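If you want to use this listing in a script, for example to confirm that the SCAN resolves to three addresses, a small parser is enough. The Python sketch below assumes the output arrives as one record per line with three whitespace-separated columns; the function name parse_gns_listing is hypothetical.

```python
from collections import defaultdict


def parse_gns_listing(output):
    """Map each name in `srvctl config gns -l` output to its list of A records."""
    records = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        # Keep only "name A address" rows; the header and shell prompts are skipped.
        if len(fields) == 3 and fields[1] == "A":
            records[fields[0]].append(fields[2])
    return dict(records)


if __name__ == "__main__":
    import subprocess

    # Run on a cluster node as the Grid Infrastructure owner.
    result = subprocess.run(
        ["srvctl", "config", "gns", "-l"],
        capture_output=True, text=True, check=True,
    )
    for name, addresses in parse_gns_listing(result.stdout).items():
        print(name, ", ".join(addresses))
```

Names with several A records, such as the SCAN, end up as one key with a list of addresses in the order GNS reported them.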
I found the following addition to crsctl very useful to quickly set up a DNS test “server”. It is surely not something to be done in production, but interesting nevertheless.
[oracle@rac11203gnsnode1 ~]$ crsctl start testdns -h
Usage:
  crsctl start testdns [-address <IP_address>] [-port <port>][-domain <GNS_domain>] [-once][-v]
    Start a test DNS listener that listens on the given address at the given port and for specified domain
Where
    IP_address   IP address to be used by the listener (defaults to hostname)
    port         The port on which the listener will listen. Default value is 53.
    domain       The domain query for which to listen. By default, all domain queries are processed.
    -once        Flag indicating that DNS listener should exit after one DNS query packet is received
    -v           Verbose output
[oracle@rac11203gnsnode1 ~]$
You might also consider this interesting DNS query tool:
[oracle@rac11203gnsnode1 ~]$ crsctl query dns -servers
CRS-10018: the following configuration was found on the system:
CRS-10019: There are 2 domains in search order. They are:
rac.localdomain
localdomain
CRS-10022: There are 2 name servers. They are:
192.168.99.150
192.168.99.10
CRS-10020: number of retry attempts for name lookup is: 4
CRS-10021: timeout for each name lookup is: 5
[oracle@rac11203gnsnode1 ~]$
And if you don’t like the manual page for nslookup, you could use this little command instead to interrogate your DNS setup:
[oracle@rac11203gnsnode1 ~]$ crsctl query dns -h
Usage:
  crsctl query dns -servers
    Lists the system configured DNS server, search paths, attempt and timeout values
  crsctl query dns -name <name> [-dnsserver <DNS_server_address>] [-port <port>] [-attempts <attempts>] [-timeout <timeout>] [-v]
    Returns a list of addresses returned by DNS lookup of the name with the specified DNS server
Where
    name                Fully qualified domain name to lookup
    DNS_server_address  Address of the DNS server on which name needs to be looked up
    port                Port on which DNS server is listening
    attempts            Number of retry attempts
    timeout             Timeout in seconds
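One use for the -name form is to ask both the GNS VIP and the corporate name server for the same name and compare the answers, which exercises the delegation end to end. The Python sketch below builds those commands from the flags in the usage text above; the function name is mine, and the looked-up name is only an example taken from the GNS listing, assuming it is published under the GNS subdomain.

```python
import shlex


def query_dns_command(name, dnsserver=None, port=None, attempts=None, timeout=None, verbose=False):
    """Build a `crsctl query dns -name` argument list from the documented flags."""
    cmd = ["crsctl", "query", "dns", "-name", name]
    if dnsserver is not None:
        cmd += ["-dnsserver", dnsserver]
    if port is not None:
        cmd += ["-port", str(port)]
    if attempts is not None:
        cmd += ["-attempts", str(attempts)]
    if timeout is not None:
        cmd += ["-timeout", str(timeout)]
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    # Assumed example name; substitute a name your GNS actually serves.
    name = "rac11203gnsnode1-vip.rac.localdomain"
    for server in ("192.168.99.150", "192.168.99.10"):
        print(shlex.join(query_dns_command(name, dnsserver=server, timeout=1)))
```

If the query against the corporate name server fails while the one against the GNS VIP succeeds, the delegation in the parent zone is the first place to look.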
The handling of GNS has improved greatly, and a lot more information is available about what’s happening under the covers. Unknown to many, the main Clusterware tools undergo revisions all the time, so it’s worth running crsctl commands on the various objects available with the “-h” flag to stay up to date.