During some vCenter HA testing for a customer, I ended up in a situation where none of the three nodes in the vCenter HA cluster (Active, Passive, and Witness) could communicate with each other. This is more than a single failure, and when it happens the vCenter HA cluster is considered non-functional and availability is impacted, because vCenter HA is not designed to handle multiple failures.
More on the different vCenter HA isolation scenarios can be found in the VMware vCenter Server High Availability Performance and Best Practices whitepaper, which is quite good, and I suggest you read it anyway 😉
Recover from isolated vCenter HA nodes
So, when the nodes become isolated, your first objective is to resolve the connectivity problem. If you can restore connectivity, isolated vCenter HA nodes rejoin the cluster automatically and the Active node starts serving client requests again.
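Before going further, it can help to confirm which nodes the Active node can actually reach on the vCenter HA network. The following is only a minimal sketch: the IP addresses are hypothetical placeholders, the `check_node` helper and the `PROBE` variable are names made up for this example, and the `ping` flags assume a Linux-style `ping`.

```shell
#!/bin/sh
# Hypothetical vCenter HA network addresses; replace them with your own.
PASSIVE_IP="${PASSIVE_IP:-192.0.2.11}"
WITNESS_IP="${WITNESS_IP:-192.0.2.12}"

# Command used to probe a node. The ping flags are an assumption
# (Linux iputils: 2 packets, 2 second timeout).
PROBE="${PROBE:-ping -c 2 -W 2}"

# check_node LABEL ADDRESS
# Prints whether ADDRESS answered the probe command.
check_node() {
    if $PROBE "$2" >/dev/null 2>&1; then
        echo "$1 ($2): reachable"
    else
        echo "$1 ($2): NOT reachable"
    fi
}
```

Run from the Active node, `check_node Passive "$PASSIVE_IP"` and `check_node Witness "$WITNESS_IP"` give a quick picture of which links are down. A successful ping only shows basic reachability on that network, not that vCenter HA replication itself is healthy.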
If you cannot resolve the connectivity problem, follow these steps to recover from isolated vCenter HA nodes.
- Power off and delete the Passive node and the Witness node.
- Log in to the Active node by using SSH or via Direct Console.
- Log in as the root user and enable the Bash shell.
- Run the following command to remove the vCenter HA configuration:
# destroy-vcha -f
- Reboot the Active node.
- Wait until the Active node is back online.
The Active node will become a standalone vCenter Server Appliance. You can now start vCenter HA cluster configuration again.
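The commands run on the Active node in the steps above can be sketched as a small script. Treat this as an illustration under assumptions, not a tested procedure: `destroy-vcha -f` comes from the steps above, while `reboot` is my assumption for the reboot step, and the `run` and `recover_active_node` helpers and the `DRY_RUN` variable are names invented for this example. Enabling the Bash shell is left out because it happens in the appliance shell before Bash is available (commonly with `shell.set --enabled true` followed by `shell`, but verify that for your version).

```shell
#!/bin/sh
# Sketch only. Run on the Active node AFTER the Passive and Witness
# nodes have been powered off and deleted.
# DRY_RUN=1 (the default) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# run COMMAND [ARGS...]
# Prints the command in dry-run mode, executes it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_active_node() {
    # Remove the vCenter HA configuration (command from the steps above).
    run destroy-vcha -f
    # Reboot so the node comes back as a standalone appliance.
    # 'reboot' is an assumption; the steps do not name the command.
    run reboot
}
```

Calling `recover_active_node` with the default settings only prints the two commands, which makes it easy to review them first. Setting `DRY_RUN=0` executes them for real, so only do that on the Active node once the other two nodes are gone.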