Back to Basics – SRM – Part 7: Migration & Failover

In the previous part we created, configured and tested a recovery plan. In this part we will perform a planned migration and then a disaster recovery from the Protected Site to the Recovery Site. But first, let’s define both scenarios once again.

What is a planned migration?

“Planned migration enables you to migrate the workloads from the protected site to the recovery site with minimal risk of data loss. A planned migration will stop if there is an error in the workflow giving you an opportunity to fix it.”

What is a disaster recovery?

“Disaster recovery event is an unplanned migration where the connection between the sites has been lost. Unplanned failover will not stop if any errors are encountered in the workflow. This provides the quickest recovery time during a disaster event.”
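The practical difference between the two definitions is how the workflow reacts to an error. The sketch below is a toy Python model of that behavior, not SRM code: the `run_plan` function, the `PlanResult` type and the step names are all hypothetical and exist only to illustrate "stop on error" versus "continue on error".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PlanResult:
    completed: List[str] = field(default_factory=list)
    errors: List[Tuple[str, str]] = field(default_factory=list)
    halted: bool = False


def run_plan(steps: List[Tuple[str, Callable[[], None]]],
             stop_on_error: bool) -> PlanResult:
    """Run the steps in order and record what completed and what failed.

    stop_on_error=True  models a planned migration (halt at the first error).
    stop_on_error=False models a disaster recovery (log the error, carry on).
    """
    result = PlanResult()
    for name, action in steps:
        try:
            action()
            result.completed.append(name)
        except Exception as exc:
            result.errors.append((name, str(exc)))
            if stop_on_error:
                result.halted = True
                break
    return result


def ok() -> None:
    """A step that succeeds."""


def unreachable() -> None:
    """A step that fails because the protected site cannot be contacted."""
    raise RuntimeError("protected site unreachable")


# Hypothetical step names, chosen for illustration only.
steps = [
    ("pre-synchronize storage", ok),
    ("shut down VMs at protected site", unreachable),
    ("power on VMs at recovery site", ok),
]

planned = run_plan(steps, stop_on_error=True)
disaster = run_plan(steps, stop_on_error=False)

print(planned.halted, planned.completed)
print(disaster.halted, disaster.completed, disaster.errors)
```

In this model the planned migration halts at the failing step, so you can fix the problem before continuing, while the disaster recovery run records the error and still powers on the virtual machines at the recovery site.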

Planned Migration

  1. On the Protected Site, open the vSphere Web Client and log in with the administrator account.
  2. On the left pane, under Home, go to Site Recovery.
  3. Next, select Recovery Plans to open the configured recovery plans.
  4. Select your recovery plan and click the Run recovery plan button.
  5. In the recovery confirmation window, select the “I understand that this process will permanently alter the virtual machines and infrastructure of both the protected and recovery datacenters” checkbox. Make sure you select Planned Migration as the recovery type. Click Next to continue.
  6. Review the settings and click Finish to start the planned migration process.

Depending on the number of virtual machines configured in the recovery plan, this step can take a while. When it is finished, you should see the virtual machines running on the recovery site.

At this point your application admins should check if all apps are working correctly. When done, it is advisable to reprotect the virtual machines using the same recovery plan so you can return to your primary datacenter later.

  1. Select the recovery plan and click the Reprotect recovery plan button.
  2. Select the “I understand that this operation cannot be undone” checkbox and click Next.
  3. Click Finish to start the reprotection process.

Disaster Recovery

This process is very similar to the one performed with the Planned Migration but in this case the Protected Site is not available. So, on the Protected Site, simulate a disaster by powering off the vCenter Server and the ESXi hosts containing the protected virtual machines.

  1. On the Recovery Site, go to Site Recovery and then to Recovery Plans.
  2. Select your recovery plan and click the red Run recovery plan button.
  3. Notice that the Planned Migration option is grayed out since the connection with the Protected Site has been broken. Select the “I understand that this process will permanently alter the virtual machines and infrastructure of both the protected and recovery datacenters” checkbox and click Next.
  4. Review the settings and click Finish to start the recovery process.

While the recovery plan is running, you can follow its progress on the Monitor tab. You will see a lot of errors, but that is normal in this scenario: there is no connection with the protected site, and Site Recovery Manager tries to recover the most recent available data of the virtual machines. When your virtual machines are up and running, make sure that the application admins check the data consistency.

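Note: one more thing about REPROTECT. As my good friend Mike points out in the comments below, the recovery must complete 100% successfully before the Reprotect operation is even available, so fix any errors from the recovery run first!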
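The Reprotect precondition raised in Mike's comment can be expressed as a simple check. This is only an illustrative Python sketch: the status strings, the step names and both function names are assumptions made up for the example and do not correspond to any SRM API.

```python
from typing import Dict, List


def steps_blocking_reprotect(step_status: Dict[str, str]) -> List[str]:
    """Return the names of recovery steps that did not finish successfully."""
    return [name for name, status in step_status.items() if status != "success"]


def can_reprotect(step_status: Dict[str, str]) -> bool:
    """Reprotect is allowed only after a recovery run in which every step succeeded.

    An empty status map means no recovery has run, so Reprotect is not allowed.
    """
    return bool(step_status) and not steps_blocking_reprotect(step_status)


# Hypothetical result of a disaster recovery run (names are illustrative).
last_run = {
    "power on VMs at recovery site": "success",
    "shut down VMs at protected site": "error",
}

print(can_reprotect(last_run))
print(steps_blocking_reprotect(last_run))
```

The list returned by `steps_blocking_reprotect` is the to-do list: every step in it has to be fixed and the recovery completed before Reprotect becomes an option.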

This concludes the Site Recovery Manager mini-series based on the vSphere Web Client. In my opinion, the installation and configuration are a lot easier and faster in this version. Replication and recovery steps are also much faster than in the previous version.

Cheers!

– Marek.Z

6 Comments

  1. Hey Marek,

    There’s so much information out there on SRM, as you know. But one of the most important things that wasn’t emphasized enough was the set of requirements for a Reprotect operation. Namely, that you must have a 100% successful Recovery *before* the Reprotect operation is even available to run!

    The ability to fail back with SRM, even when used with vSphere Replication, is a big selling point for VMware as well as a requirement, many times, for the customers who purchase it. But most of my training and lab experiences were with perfect failovers and recoveries. Need to failback after running a Recovery Plan a single time? Sure! No problem! Simply hit that big blue Reprotect button and you’re all set! Easy right?

    I’ve found that when running for realz Disaster Recoveries or Planned Migrations, things fail all the time, from simple service restart issues to bigger problems, but each has the same requirement to get it 100% right and fixed before you can run a Reprotect.

    What I’m saying, my dear friend, Marek, is that your blog posts on SRM are some of the best and most read out there. Please reinforce and reiterate the requirement for a 100% successful Recovery Plan run *before* running a Reprotect. That means folks will need to fix any and all errors that occur during a Recovery before they use that all important Reprotect feature!

    As always, all the best!

    Mike

    • Hi Mike,

      Great comment. Indeed, you are absolutely right. The recovery operation needs to complete before you can even reprotect your VMs.

      I will add it in big red letters to the post! 🙂

      Thanks Mike,

      Cheers!

  2. So we performed an actual DR and the machines are all running in the recovered site. We’ve now brought the original protected site back up, and I want to retain the new data that’s been generated at the Recovery Site by the machines that were failed over. But you’re saying I need to run the Recovery again so that it all goes correctly. Won’t that overwrite the data?

      • Hello Marek,

        Can you please elaborate on how we can run Reprotect if the initial DR operation was not fully successful? As Casey said, they’ve performed an actual DR while the original protected site was down and unavailable. That would cause some errors in the recovery plan processing, right? So, how do you resolve those errors once the original site is up, and run Reprotect?

        Thank you,
        Slavko

        • Hi Slavko, it has been a while since I touched SRM. Not sure which version you are using, but back in the older version, after a DR you had to reconfigure SRM manually, run a planned migration and check for new errors. If it completed without errors, you could use the Reprotect option again.

          Hope this helps.
