How to recover from isolated vCenter HA nodes

During some vCenter HA testing for a customer, I ended up in a situation where none of the three nodes in the vCenter HA cluster (Active, Passive, and Witness) could communicate with each other. This goes beyond a single point of failure. vCenter HA is not designed to survive multiple failures, so when this happens the cluster is considered non-functional and availability is impacted.

More on the different isolated vCenter HA scenarios can be found in the VMware vCenter Server High Availability Performance and Best Practices whitepaper, which is quite good and which I suggest you read anyway 😉

Recover from isolated vCenter HA nodes

So, when the nodes become isolated, your first objective is to resolve the connectivity problem. If you can restore connectivity, isolated vCenter HA nodes rejoin the cluster automatically and the Active node starts serving client requests again.

If you cannot resolve the connectivity problem, follow these steps to recover from isolated vCenter HA nodes.

  1. Power off and delete the Passive node and the Witness node.
  2. Log in to the Active node by using SSH or via Direct Console.
  3. Log in as the root user and enable the Bash shell: # shell
  4. Run the following command to remove the vCenter HA configuration: # destroy-vcha -f
  5. Reboot the Active node: # reboot
  6. Wait until the Active node is back online.

The Active node will become a standalone vCenter Server Appliance. You can now start vCenter HA cluster configuration again.
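If you would rather not watch the console while the Active node reboots, the waiting in step 6 can be scripted from your workstation. The snippet below is only a minimal sketch: the function name `wait_for_port`, the default timings, and the choice of TCP port 443 (the HTTPS port used by the vSphere Client) are my assumptions and not part of the official procedure. An open port also only tells you that something is listening again, not that every vCenter service has finished starting.

```python
import socket
import time


def wait_for_port(host, port=443, timeout=900.0, interval=10.0, connect_timeout=5.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True as soon as a connection is accepted, False if the deadline
    is reached first. All names and defaults here are illustrative.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP handshake means the port is accepting connections.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Refused, unreachable, or timed out: the node is not ready yet.
            pass

        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

For example, `wait_for_port("vcsa.example.local")` (a placeholder hostname) returns `True` once port 443 answers, or `False` after 15 minutes with the defaults above. Once it returns `True`, log in to the vSphere Client and check that the services are healthy before you configure vCenter HA again.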

Cheers!

– Marek.Z

4 Comments

  1. I have a similar issue, but our Passive and Witness are showing as (orphaned) and vCenter is showing a lot of errors. After a destroy of vCHA we had issues with getting services to restart. VMware support went back to a previous snapshot that had vCHA turned on. So the Passive and Witness are orphaned and unreachable. We can get to all hosts and do not see either the Passive or the Witness on any of the hosts. Is there a safe process for removing these orphaned VMs to get our vCenter back to a healthy state? VMware says that our only solutions are to redeploy vCenter or power off all our hosts and VMs. We are hoping for a better solution.
