Put Down the Saw and Get the Glue: Working Around VMware KB1022751

VMware KB article 1022751 lays out the details of an interesting bug in ESXi 4.0 and 4.1 pretty plainly:

When trying to team NICs using EtherChannel, the network connectivity is disrupted on an ESXi host. This issue occurs because NIC teaming properties do not propagate to the Management Network portgroup in ESXi. When you configure the ESXi host for NIC teaming by setting the Load Balancing to Route based on ip hash, this configuration is not propagated to Management Network portgroup.

(Note that load balancing by IP hash is the only supported option for EtherChannel link aggregation.)

Unfortunately, the KB article’s workaround (there is no patch that I’m aware of) requires network connectivity to the host via the vSphere Client. But what do you do if you’ve just sawed off the branch you’re sitting on, network-wise, and can no longer connect with the vSphere Client?

Enable Local Tech Support Mode on the ESXi host, log in as root and run the following command:

vim-cmd hostsvc/net/portgroup_set --nicteaming-policy=loadbalance_ip vSwitch0 "Management Network"

(Replace “vSwitch0” and “Management Network” with the appropriate vSwitch and portgroup as necessary.)
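If you need to do this on more than one host or portgroup, the command above can be wrapped in a small shell function. This is only a sketch: the function name set_ip_hash and the DRY_RUN variable are my own inventions for illustration, not part of any VMware tool, and the only real command it calls is the vim-cmd line shown above. By default it just prints the command it would run, so you can check the vSwitch and portgroup names before committing.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): set a portgroup's teaming
# policy to IP hash using the vim-cmd invocation from this post.
# Usage: set_ip_hash <vSwitch> <portgroup>
# DRY_RUN is an illustrative variable; it defaults to 1 (print only).
set_ip_hash() {
    vswitch="$1"
    portgroup="$2"

    if [ -z "$vswitch" ] || [ -z "$portgroup" ]; then
        echo "usage: set_ip_hash <vSwitch> <portgroup>" >&2
        return 1
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print the command instead of running it.
        printf 'vim-cmd hostsvc/net/portgroup_set --nicteaming-policy=loadbalance_ip %s "%s"\n' \
            "$vswitch" "$portgroup"
    else
        # Run it for real; quoting keeps "Management Network" as one argument.
        vim-cmd hostsvc/net/portgroup_set --nicteaming-policy=loadbalance_ip \
            "$vswitch" "$portgroup"
    fi
}
```

Calling set_ip_hash vSwitch0 "Management Network" prints the command; prefixing the call with DRY_RUN=0 actually runs it. Afterwards, esxcfg-vswitch -l lists each vSwitch with its portgroups and uplinks, which is a quick way to confirm you targeted the right names.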

You may also find that while both NICs are active for the vSwitch, one will be in “standby” for the portgroup, a configuration not supported for IP hash load balancing. It would be reasonable to think that you could fix this with the following command, but it fails with an InvalidArgument fault:

~ # vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-active=vmnic0,vmnic1 vSwitch0 "Management Network"
(vmodl.fault.InvalidArgument) {
   dynamicType = <unset>, 
   faultCause = (vmodl.MethodFault) null, 
   invalidProperty = <unset>, 
   msg = "A specified parameter was not correct. 

VMware appears to have known about this bug for a while now (searching the VMware Communities turns up workarounds dating back to the 3.x days, including some from VMware employees), so resolving it is presumably either extremely difficult or not currently a high priority. However, you will likely be able to reach the ESXi host using the vSphere Client after fixing the portgroup NIC teaming policy, so you can fix the standby NIC issue in the GUI.

If you find yourself attempting to automate an ESXi install with Kickstart and don’t want to make fixing the portgroup through the vSphere Client part of your install process, consider not using EtherChannel at all for the Management Network – just use active and standby NICs, perhaps in a configuration similar to Kendrick Coleman’s ESXi 4.1 Kickstart Install blog post.