Testing NIC bonding on RHEL6

I recently upgraded my lab’s reference machine to Oracle Linux 6 and experimented today with its network failover capabilities. I seemed to remember that network bonding on Xen didn’t work, so I was curious to test it on the new release. As always, I am running this on my openSUSE 11.2 lab server, which features these components:

  • xen-3.4.1_19718_04-2.1.x86_64
  • Kernel 2.6.31.12-0.2-xen
  • libvirt-0.7.2-1.1.4.x86_64

Now for the fun part: I cloned my OL6REF domU, and in about 10 minutes had a new system to experiment with. The necessary new NIC was added quickly before registering the domU with XenStore. All you need to do in this case is to add another interface, as in this example (00:16:1e:1b:1d:1f already existed):

...
<interface type='bridge'>
<mac address='00:16:1e:1b:1d:1f'/>
<source bridge='br1'/>
<script path='/etc/xen/scripts/vif-bridge'/>
<target dev='vif179.0'/>
</interface>
<interface type='bridge'>
<mac address='00:16:1e:10:1d:1f'/>
<source bridge='br1'/>
<script path='/etc/xen/scripts/vif-bridge'/>
<target dev='vif179.0'/>
</interface>
...

After registering the domU with a call to “virsh define bondingTest.xml”, the system starts as usual, except that it has a second NIC, which at this stage is unconfigured. Remember that on Oracle Linux 5 and 6 the network configuration lives in /etc/sysconfig/network and /etc/sysconfig/network-scripts/.

The first step is to rename the server: change the HOSTNAME entry in /etc/sysconfig/network to match your new server name. That’s easy :)
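
For completeness, here is a minimal sketch of that rename done from the shell. The helper name set_sysconfig_hostname and the hostname used in the example are placeholders of mine, and the sketch assumes GNU sed (for the -i option), which is what Oracle Linux ships.

```shell
# Hypothetical helper: set the HOSTNAME= entry in a sysconfig-style file.
# $1 = file to edit (e.g. /etc/sysconfig/network), $2 = new hostname
set_sysconfig_hostname() {
    file=$1
    new_name=$2
    if grep -q '^HOSTNAME=' "$file"; then
        # Replace the existing entry in place (GNU sed assumed).
        sed -i "s/^HOSTNAME=.*/HOSTNAME=$new_name/" "$file"
    else
        # No entry yet: append one.
        printf 'HOSTNAME=%s\n' "$new_name" >> "$file"
    fi
}

# Example (run as root on the cloned domU; "bondingtest" is a placeholder):
#   set_sysconfig_hostname /etc/sysconfig/network bondingtest
```

The new name only takes effect after a reboot, or after also setting it for the running system with the hostname command.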

Now to the bonding driver. RHEL6 and OL 6 have deprecated /etc/modprobe.conf in favour of /etc/modprobe.d and its configuration files. It’s still necessary to tell the kernel that it should use the bonding driver for my new device, bond0, so I created a new file, /etc/modprobe.d/bonding.conf, with just one line in it:

alias bond0 bonding

That’s it. Don’t put any further information about module parameters in the file; this is deprecated. The documentation clearly states: “Important: put all bonding module parameters in ifcfg-bondN files”.

Now I had to create the configuration files for eth0, eth1 and bond0. They are created as follows:

File: ifcfg-eth0

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

File: ifcfg-eth1

DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

File: ifcfg-bond0

DEVICE=bond0
IPADDR=192.168.99.126
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS="<bonding parameters separated by spaces>"
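
Since the two slave files differ only in the device name, they could just as well be generated. The following is a sketch only: write_slave_ifcfg is a hypothetical helper of mine, not something shipped with the distribution, and it writes the same six lines shown above for eth0 and eth1.

```shell
# Hypothetical helper: write an ifcfg file for one bonding slave.
# $1 = slave device (e.g. eth0), $2 = bond master (e.g. bond0),
# $3 = target directory (e.g. /etc/sysconfig/network-scripts)
write_slave_ifcfg() {
    dev=$1
    master=$2
    dir=$3
    cat > "$dir/ifcfg-$dev" <<EOF
DEVICE=$dev
BOOTPROTO=none
ONBOOT=yes
MASTER=$master
SLAVE=yes
USERCTL=no
EOF
}

# Example (as root on the domU):
#   for dev in eth0 eth1; do
#       write_slave_ifcfg "$dev" bond0 /etc/sysconfig/network-scripts
#   done
```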

Now for the bonding parameters: there are a few of interest. First, I wanted to set the mode to active-backup (that is, active-passive), which is what Oracle recommends (with the rationale: it is simple). Additionally, you have to set either the arp_interval/arp_ip_target parameters or a value for miimon to allow for speedy link failure detection. My BONDING_OPTS for bond0 is therefore as follows:

BONDING_OPTS="miimon=1000 mode=active-backup"

Have a look at the documentation for more detail about the options.
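
Once the bond is up, the driver reports the settings it actually applied in /proc/net/bonding/bond0, which makes a handy cross-check for BONDING_OPTS. The sketch below pulls single fields out of that file. bond_field is a made-up helper name, and the field labels in the example are the ones I would expect the bonding driver to print, so check them against your own file.

```shell
# Print the value of one "Label: value" line from a bonding status file.
# $1 = path to the status file, $2 = label to look for
bond_field() {
    awk -F': ' -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Example (on the domU, once bond0 is up):
#   bond_field /proc/net/bonding/bond0 "Bonding Mode"
#   bond_field /proc/net/bonding/bond0 "Currently Active Slave"
#   bond_field /proc/net/bonding/bond0 "MII Polling Interval (ms)"
```

Only the first matching line is printed, so for a label like "MII Status", which appears once for the bond and once per slave, this returns the bond-level value.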

The Test

The test is going to be simple: first I’ll bring up the interface bond0 by issuing a “service network restart” command on the Xen console, followed by an “xm network-detach” command. The output of the network restart command is here:

[root@rhel6ref network-scripts]# service network restart
Shutting down loopback interface:  [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface bond0:  [  OK  ]
[root@rhel6ref network-scripts]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          inet addr:192.168.99.126  Bcast:192.168.99.255  Mask:255.255.255.0
          inet6 addr: fe80::216:1eff:fe1b:1d1f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:297 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9002 (8.7 KiB)  TX bytes:1824 (1.7 KiB)

eth0      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:214 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6335 (6.1 KiB)  TX bytes:1272 (1.2 KiB)
          Interrupt:18

eth1      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:83 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2667 (2.6 KiB)  TX bytes:552 (552.0 b)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

The kernel traces these operations in /var/log/messages:

May  1 07:55:49 rhel6ref kernel: bonding: bond0: Setting MII monitoring interval to 1000.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: setting mode to active-backup (1).
May  1 07:55:49 rhel6ref kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
May  1 07:55:49 rhel6ref kernel: bonding: bond0: Adding slave eth0.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: making interface eth0 the new active one.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: first active interface up!
May  1 07:55:49 rhel6ref kernel: bonding: bond0: enslaving eth0 as an active interface with an up link.
May  1 07:55:49 rhel6ref kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
May  1 07:55:49 rhel6ref kernel: bonding: bond0: Adding slave eth1.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: enslaving eth1 as a backup interface with an up link.

This shows eth0 as the active device, with eth1 as the passive one. Note that the MAC addresses of all devices are identical (which is expected behaviour). Now let’s see what happens to the channel failover when I take a NIC offline. First of all I have to check in XenStore which NICs are present:

# xm network-list bondingTest
Idx BE     MAC Addr.     handle state evt-ch tx-/rx-ring-ref BE-path
0   0  00:16:1e:1b:1d:1f    0     4      14    13   /768     /local/domain/0/backend/vif/208/0
1   0  00:16:1e:10:11:1f    1     4      15    1280 /1281    /local/domain/0/backend/vif/208/1

I would like to take the active link away, which is at index 0. Let’s try:

# xm network-detach bondingTest 0

The domU shows the link failover:

May  1 08:00:46 rhel6ref kernel: bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:16:1e:1b:1d:1f - is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
May  1 08:00:46 rhel6ref kernel: bonding: bond0: releasing active interface eth0
May  1 08:00:46 rhel6ref kernel: bonding: bond0: making interface eth1 the new active one.
May  1 08:00:46 rhel6ref kernel: net eth0: xennet_release_rx_bufs: fix me for copying receiver.

Oops, there seems to be a problem with the xennet driver, but never mind. The important information is in the lines above: the active eth0 device has been released, and eth1 jumped in. Next I think I will have to run a workload against the interface to see if that makes a difference.
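
A simple workload for that follow-up could be a ping running from another machine while the NIC is detached. The sketch below only extracts the loss percentage from the summary line that ping prints at the end of a run. ping_loss_percent is a hypothetical helper name, and the address in the example is the bond0 address from the ifconfig output above.

```shell
# Read ping output on stdin and print the packet loss percentage
# taken from the summary line ("..., 10% packet loss, ...").
ping_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example, run from another host while detaching the vif on dom0:
#   ping -c 60 192.168.99.126 | ping_loss_percent
```

With miimon set to 1000 ms I would expect to lose a packet or so around the failover, but that is a guess until the test has actually been run.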

And the reverse …

I couldn’t possibly leave the system in the “broken” state, so I decided to add the NIC back. That’s yet another online operation I can do:

# xm network-attach bondingTest type='bridge' mac='00:16:1e:1b:1d:1f' bridge=br1 script=/etc/xen/scripts/vif-bridge

Voilà, job done. Checking the output of ifconfig, I can see the interface is back:

# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          inet addr:192.168.99.126  Bcast:192.168.99.255  Mask:255.255.255.0
          inet6 addr: fe80::216:1eff:fe1b:1d1f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:39110 errors:0 dropped:0 overruns:0 frame:0
          TX packets:183 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1171005 (1.1 MiB)  TX bytes:32496 (31.7 KiB)

eth0      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:7 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:412 (412.0 b)  TX bytes:0 (0.0 b)
          Interrupt:18

eth1      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:39106 errors:0 dropped:0 overruns:0 frame:0
          TX packets:186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1170749 (1.1 MiB)  TX bytes:33318 (32.5 KiB)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:46 errors:0 dropped:0 overruns:0 frame:0
          TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2484 (2.4 KiB)  TX bytes:2484 (2.4 KiB)


I can also see that the kernel added the new interface back in.

May  2 05:05:31 rhel6ref kernel: bonding: bond0: Adding slave eth0.
May  2 05:05:31 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
May  2 05:05:31 rhel6ref kernel: bonding: bond0: enslaving eth0 as a backup interface with an up link.
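
To confirm that both slaves are back without reading through the logs, the per-slave link state can be listed from /proc/net/bonding/bond0. Again, this is only a sketch: bond_slave_status is a made-up name, and it assumes that each "Slave Interface:" line in that file is followed by that slave's own "MII Status:" line.

```shell
# List every slave of a bond together with its MII link status.
# $1 = path to the bonding status file
bond_slave_status() {
    awk -F': ' '
        # Remember the slave we are currently reading about.
        $1 == "Slave Interface" { slave = $2 }
        # The next MII Status line belongs to that slave; the bond-level
        # MII Status line appears before any slave and is skipped.
        $1 == "MII Status" && slave != "" { print slave, $2; slave = "" }
    ' "$1"
}

# Example (on the domU):
#   bond_slave_status /proc/net/bonding/bond0
```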

All is well that ends well.

Responses

  1. Very helpful and informative post. Thank you!

  2. Hi, I’m also trying network bonding but I have a problem: when I shut down eth0, eth1 becomes active but I can’t ssh to the server. I can only ssh to the server when eth0 is the active slave. Any thoughts?

    1. Robert,

      I’d think your configuration is wrong somewhere. You should have a look at the messages file on the server for failover messages and also check the verbose output of ssh when trying to connect to the server.

      Martin

  3. In RHEL 5 we need to modify the /etc/modprobe.conf file to configure bonding.
    But in RHEL6 there is no such file present.

    So is this file modification not required at all in RHEL6, or is there another way?

    Thanks

    1. Hi Ashish,

      have a look at the blog post where it reads “RHEL6 and OL 6 have deprecated /etc/modprobe.conf in favour of /etc/modprobe.d and its configuration files”. You should find what you need from there onwards.

  4. Sir
    I am getting the following error for bond0
    sometimes when it includes the MAC Id of any of the interfaces
    “Device ins not managed by network manager”
    and when MAC Id is not there it is saying
    “Device is not there”

    1. Hi Ashish,

      can you confirm you are using a RH 6 or Oracle Linux 6 distribution? AFAIK those don’t use network manager. I don’t recall being prompted for Network Manager at any stage during the installation. My Ubuntu server though (and anything else based on it plus SuSE) seem to use NM for NIC management. Please ensure you are using ifconfig for network configuration.

      Hope this helps,

      Martin

  5. How about the warning while bringing up the bond.
    May 1 07:55:49 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.

    Does it mean the bond works at a lower speed ? Why failed to get speed … NIC, bond or driver issue ??

    1. Hi Sunderjeet,

      the warning is caused by my hypervisor: this Oracle Linux installation is not on physical hardware. You can use mii-tool on your setup to verify that the link speed is detected correctly.

      1. I’m seeing a similar warning when bonding is enabled on Broadcom NICs. Though the cards support 1G, the warning says it is switching to 100Mbps. When I run ethtool on the interface, it shows 500Mbps. Not sure if the cause is the NIC or the bonding.

  6. Thanks for the info! However, I would like to use load balancing mode rather than active-backup, and I can’t seem to get this set up properly.
