Thursday 28 April 2011

vCenter server on Server 2008 R2 - hosts marked as "Disconnected" after 90 seconds

A bit of an oddity...

For convenience, I installed our current vCenter cluster in our dev lab. This meant an isolated network, with a simulated default gateway (for ESXi's heartbeats). Installed everything, up and running, no problem. Slight issue when putting the system into service though: I fired up the ESXi hosts, started the vCenter server, activated Windows, joined it to the domain, rebooted... when the machine came back up, I could connect to vCenter, but the hosts were marked as 'Disconnected'. H'm. A bit odd but OK then... Right-click the host, choose 'Connect'... and all is well. Start digging about getting the thing into service and suddenly the host falls off. I can connect directly to the host and everything seems well there, and a cursory check of the logs shows nothing wrong.
Repeat this process a couple of times until it gets boring. Reconnect the host, and after a short time the host falls off again. A quick check of the VMware KB brings me to this:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003409

So I follow this through and come across a note in this KB article...
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1008030

"Note: The host goes into the Not Responding mode for a default 90 seconds time after adding it to vCenter Server. In case the vCenter Server is multi-homed, verify that the internal IP (that is reachable by the ESX hosts) is set as the management IP."

90 seconds is exactly the amount of time I get before the host falls off again. OK, so it's specifically a comms issue between the host and the vCenter server... a vmkping bears this out - I can ping the host from the vCenter server but not vice versa. OK, that sounds like a firewall issue...

Turn on firewall logging on the vCenter server...

On the vCenter server, open Windows Firewall (which has changed a bit since I was a lad) and modify the logging settings - right-click "Windows Firewall with Advanced Security" (the parent object), choose 'Properties', and on the tab for the profile in use 'customise' the 'Logging' section: increase the size limit to the maximum (32,767 KB) and set 'Log dropped packets' to 'Yes'.

Open cmd and:
cd %systemroot%\system32\LogFiles\Firewall
type pfirewall.log

Note a spot of DROP for TCP 902 - that's the inbound heartbeat from the host, which explains the behaviour... the R2 server is dropping traffic from its hosts. It can establish a session because that's outbound and, well, established traffic. But unsolicited inbound stuff from hosts is broken. But R2 is a supported OS. So what happened?
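Scrolling through the whole log with `type` gets tedious once it has grown, so here is a minimal Python sketch of the same check. It assumes the log is in the usual space-separated format with a `#Fields:` header line naming columns such as `action` and `dst-port`; the function name `find_drops` is my own invention, not anything Windows or VMware ships.

```python
def find_drops(lines, port="902"):
    """Return the DROP entries whose destination port matches `port`.

    `lines` is any iterable of log lines. Column names are read from the
    log's own '#Fields:' header rather than hard-coded positions
    (assumption: the header is present, as it is in a standard pfirewall.log).
    """
    fields = []
    hits = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # Header row: remember the column names for the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            # Skip blank lines, other comment lines, and rows before the header.
            continue
        entry = dict(zip(fields, line.split()))
        if entry.get("action") == "DROP" and entry.get("dst-port") == port:
            hits.append(entry)
    return hits
```

To use it, open pfirewall.log from the folder above, pass the file object to `find_drops`, and print the `src-ip` of each hit; if the ESXi hosts' addresses show up, the firewall is eating their traffic.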

We're joined to an AD domain, so have a look at the inbound firewall rules for the domain profile... well, there's a bunch of stuff specified by the vCenter server install, but it's all in the 'Private' profile... because that's where we were when the install was run. Since I joined to the domain, the firewall profile changed, and the rules the installer set up no longer apply. Pretty easy to fix... select each rule that starts "VMware vCenter..." and make sure it has the 'Domain' profile ticked on the 'Advanced' tab. Immediately the hosts start talking to me again...
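If there are a lot of rules, ticking boxes gets old. A rough Python sketch of scripting the same change follows: it only builds the `netsh advfirewall firewall set rule` command lines as text and does not run anything. The rule name in the usage note is a made-up placeholder (take the real names from the rule list on your own server), and keeping 'private' alongside 'domain' is my assumption, not something the installer requires.

```python
def build_profile_commands(rule_names, profiles=("domain", "private")):
    """Build one netsh command per rule to set the profiles it applies to.

    Nothing is executed here; the caller can review the strings and paste
    them into an elevated cmd prompt.
    """
    profile_arg = ",".join(profiles)
    return [
        f'netsh advfirewall firewall set rule name="{name}" new profile={profile_arg}'
        for name in rule_names
    ]
```

For example, `build_profile_commands(["VMware vCenter Example Rule"])` gives a single line ending in `new profile=domain,private`. Check each line against the rule names shown in the firewall console before running it.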

So really this occurred because of the shortcut I took: installing vCenter on a workgroup member and then joining it to the domain, which switched the firewall to the domain profile...
