Is it possible to boot an ESX 4 host from a LUN that has been “failed over” to another physical location? Yes, it is possible, but doing so comes with some serious caveats. Consider the following scenario:
- 2 datacenters: site A production (active), site B failover (passive)
- In each site, an identical SAN array; LUN mirroring between the 2 arrays over a dedicated Fibre Channel link
- In each site, 2 identical ESX hosts, both booting from the SAN array
- The vCenter Server is virtualized
After the ESX boot LUNs have been migrated to site B, an ESX host booting from a migrated LUN will probably stop at vsd-mount and fall back into troubleshooting mode. This happens because the ESX host recognizes the boot LUN as a snapshot. The issue can be solved as described in VMware KB article 1012142. So, let’s get started 🙂
- In troubleshooting mode, enable resignaturing on the ESX host by typing: #esxcfg-advcfg -s 1 /LVM/EnableResignature
- Unload the VMFS driver: #vmkload_mod -u vmfs3
- Load the VMFS driver again: #vmkload_mod vmfs3
- Detect and resignature the VMFS volume by typing: #vmkfstools -V
- Now, find the path to the esxconsole.vmdk file by typing: #find /vmfs/volumes/ -name esxconsole.vmdk
- The output should look similar to the following example: /vmfs/volumes/4c7e41bc-acb14e48-eeb9-e61f137cb50f/esxconsole-4c57f62f-72a6-8e68-1e35-e41f1378a8e0/esxconsole.vmdk
- Make a note of this output. You will need it later.
- Reboot the ESX host.
- Wait until the GRUB menu appears.
- Highlight “VMware ESX 4.0” and press the “e” key.
- Select the “kernel /vmlinuz” line, press “e” again and append the following after a space: /boot/cosvmdk=<full path to esxconsole.vmdk noted earlier>. It should look similar to the following example: quiet /boot/cosvmdk=/vmfs/volumes/4c7e41bc-acb14e48-eeb9-e61f137cb50f/esxconsole-4c57f62f-72a6-8e68-1e35-e41f1378a8e0/esxconsole.vmdk
- Press Enter to accept the changes and press “b” to start the boot process. The ESX host should start successfully.
- Next, log in to the service console with the root account and edit the esx.conf file located in the /etc/vmware directory: #vi /etc/vmware/esx.conf
- Press the Insert key, scroll down to the /adv/Misc/CosCorefile entry and change the path so that it points into the esxconsole directory noted earlier. It should look similar to: /adv/Misc/CosCorefile = "/vmfs/volumes/4c7e41bc-acb14e48-eeb9-e61f137cb50f/esxconsole-4c57f62f-72a6-8e68-1e35-e41f1378a8e0/core-dumps/cos-core"
- Scroll down to the /boot/cosvmdk entry and change the path to the esxconsole.vmdk path noted earlier. The entry should look similar to: /boot/cosvmdk = "/vmfs/volumes/4c7e41bc-acb14e48-eeb9-e61f137cb50f/esxconsole-4c57f62f-72a6-8e68-1e35-e41f1378a8e0/esxconsole.vmdk"
- Press ESC key and type: :wq
- Press Enter key to save the changes to the esx.conf file.
- Save the changes made to the boot configuration by typing: #esxcfg-boot -b
- Reboot the ESX host.
- Repeat all of the steps above for every ESX host.
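With several hosts to recover, the GRUB and esx.conf edits above are easy to get wrong by hand. The following is a minimal shell sketch, assuming the service console's /bin/sh and the 'key = "value"' layout of esx.conf shown in the steps; the function names (cos_dir, grub_append, esxconf_coscorefile, esxconf_cosvmdk, update_esxconf) are hypothetical helpers, not VMware tools. It derives the strings from the esxconsole.vmdk path and, optionally, rewrites both esx.conf entries with sed instead of vi.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical, not VMware tools.
# Input: the esxconsole.vmdk path reported by
#   find /vmfs/volumes/ -name esxconsole.vmdk

# Directory holding esxconsole.vmdk (the esxconsole-<uuid> folder).
cos_dir() {
    dirname "$1"
}

# Text to append to the "kernel /vmlinuz" line in the GRUB editor.
grub_append() {
    printf '/boot/cosvmdk=%s\n' "$1"
}

# esx.conf lines for the two entries that still point at the old LUN.
# Assumes the 'key = "value"' layout shown in the manual steps.
esxconf_coscorefile() {
    printf '/adv/Misc/CosCorefile = "%s/core-dumps/cos-core"\n' "$(cos_dir "$1")"
}

esxconf_cosvmdk() {
    printf '/boot/cosvmdk = "%s"\n' "$1"
}

# Rewrite both entries in an esx.conf file and keep a .bak copy of the
# original. Run esxcfg-boot -b afterwards, as in the manual procedure.
update_esxconf() {
    conf=$1
    vmdk=$2
    core_line=$(esxconf_coscorefile "$vmdk")
    vmdk_line=$(esxconf_cosvmdk "$vmdk")
    # "|" is the sed delimiter because the values contain "/".
    sed -e "s|^/adv/Misc/CosCorefile = .*|$core_line|" \
        -e "s|^/boot/cosvmdk = .*|$vmdk_line|" \
        "$conf" > "$conf.new" &&
        cp "$conf" "$conf.bak" &&
        mv "$conf.new" "$conf"
}
```

Usage, assuming find returns exactly one esxconsole.vmdk: set VMDK=$(find /vmfs/volumes/ -name esxconsole.vmdk), print the GRUB text with grub_append "$VMDK", then after booting run update_esxconf /etc/vmware/esx.conf "$VMDK" && esxcfg-boot -b. If find returns more than one path, pick the right one manually; the sketch does not check for that.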
OK, the ESX part is done. The hosts should boot without any problems. Now, let’s get the vCenter Server VM up and running.
- Log in to one of the ESX hosts with the vSphere Client.
- Locate the vCenter Server VM’s files on the datastore (if the datastore appears as a snapshot, simply rename it to its original name).
- Add the vCenter Server VM to the inventory (if the vCenter Server VM has multiple virtual disks located on different datastores, remove and re-add those disks).
- Check that the VM’s network adapter is connected to the correct network.
- Power on the VM.
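To rename a datastore that shows up as a snapshot, you need its original name, which ESX 4.x keeps after a snap-xxxxxxxx- prefix (8 hex digits). On ESX 4.x, esxcfg-volume -l in the service console should list the volumes detected as snapshots. The sketch below assumes that label format; original_label is a hypothetical helper, not a VMware command, and it only computes the name; the rename itself is still done in the vSphere Client.

```shell
#!/bin/sh
# Sketch only: original_label is a hypothetical helper, not a VMware command.
# Assumes a resignatured copy is labelled "snap-<8 hex digits>-<old label>".
# Prints the old label, or the input unchanged if it has no such prefix.
original_label() {
    label=$1
    case $label in
        snap-[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-?*)
            # Strip "snap-", the 8 hex digits and the following "-".
            printf '%s\n' "${label#snap-????????-}" ;;
        *)
            printf '%s\n' "$label" ;;
    esac
}
```

For example, original_label snap-1a2b3c4d-vcenter_ds prints vcenter_ds, while a normal label such as datastore1 is printed unchanged.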
Now that the virtual infrastructure is operational, you can restore the production VMs. I’ve tested this procedure with 2 ESX hosts and 4 VMs. It took me about 3 hours due to reboots and restore operations. Imagine doing this with 8 ESX hosts and 60 VMs… you get the picture, right? 🙂
Buy SRM! Or configure an Active/Active infrastructure.
Can you please update the article for the case where this happens on an ESXi host?
Thanks and great site!
I would love to but unfortunately I don’t have access to this environment…
So cool Marek.Z
Glad this blog post was useful to you.
Followed your guide, but have one problem:
After the operation, the ESX boot LUN name in vCenter has changed. It’s snap-xxxxxxxx- (in Configuration-Storage-Identification).
As far as I can tell, everything else is normal.
Is it really a snapshot? How can I tell?
Or is it just a leftover from the earlier problem?
What should I do?
In the case of a LUN that has been replicated to another SAN, this is normal behavior in vSphere 4.x: the disk, or the controller connecting to the disk on which ESX(i) 4.x was previously installed, has changed, so the disk is presented as a snapshot. You could resignature the disk and rename it, but you would have to repeat the whole procedure described in the blog post.
Hope this helps.