• This topic has 7 replies, 4 voices, and was last updated August 30, 2019 by Matthew W.

Failover test question

  • Hi, we’re planning to run a failover test (VMware to VMware). Once the DR copy is online, we plan to shut down the source ESX hosts for a maintenance window. I want to confirm that the failover test won’t stop automatically once the source hardware is offline; we want to keep the DR test VM running in the DR site during the maintenance.

    Kenan Dervisevic,

    Good afternoon,

    As long as you follow the proper procedure for shutting down a host for maintenance, which involves moving all VMs off of it first (see the first link below), you can perform the maintenance during a failover test. However, we do not advise this: depending on how long your maintenance lasts, leaving the test running for that long may cause problems (see the second link below).

    How to Place a Host into Maintenance Mode – https://www.zerto.com/myzerto/knowledge-base/how-to-place-a-host-with-an-associated-vra-into-maintenance-mode/

    Results of Leaving a VPG Failover Test for a Long Duration – https://www.zerto.com/myzerto/knowledge-base/result-of-leaving-a-vpg-in-the-failover-test-failover-before-commit-or-move-before-commit-states-for-a-long-duration/

    I hope this information helps!

    Hi Matt,

    So here is our scenario:

    The datacenter that hosts the source VM will be offline for 2 days this weekend (the ESX hosts and storage will be powered off on Friday night); it will be available again Monday morning.

    We need one VM to be available this weekend, but essentially in a read-only state; we don’t need to save changes made to it. My thought was to do a failover test of it to our DR datacenter and have it run there over the weekend. When the primary datacenter is back online, I will discard this test VM and just power the source VM back on.

    Does the article you mentioned still apply in this scenario? Do you have a recommendation on how to accomplish the above?

    Hello, Kenan.

    If no changes are being made to the VM and it is only for the weekend, you may be able to get by with just the failover test. However, it is a risk, and we do not recommend it as standard practice. Another option is to perform a live failover to your DR site with reverse protection enabled, so that the site the VM was originally on becomes its DR site. This would keep the one VM available without that risk, and when your hosts are brought back up after the weekend, you can perform another live failover to return it to its original location.

    Does this make sense?

    In the failover test scenario, where is this scratch disk kept? Is it on the same datastore the VM is being replicated to? Right now that datastore has plenty of free space, so I’m not too concerned if the delta/scratch file grows for 2 days until we discard the test VM.

    Hi Kenan,

    The scratch disk is placed on the same datastore where the recovery (target) VRA put the journal disks.

    The scratch disk counts toward the VPG’s (or the VM’s) journal hard limit.

    If the failover test will run for a long time, please consider increasing the journal hard limit to a size large enough to accommodate the changes written during the test period.
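    As a rough illustration of that sizing question, here is a minimal arithmetic sketch that estimates how much scratch space a failover test might consume over a maintenance window and compares it with the remaining room under a journal hard limit. It assumes, as stated above, that all test VM writes land on the scratch volume and that the scratch volume counts toward the journal hard limit. The write rate, duration, hard limit, and current journal usage are hypothetical placeholders, not Zerto defaults, and the names `TestWindow`, `scratch_growth_gb`, `headroom_gb`, and `hours_until_full` are invented for this example. It does not use any Zerto API.

```python
# Rough sizing sketch for a long-running failover test.
# Assumptions (hypothetical, not Zerto defaults):
#   - every write made by the test VM lands on the scratch volume
#   - the scratch volume counts toward the journal hard limit
#   - current journal usage stays fixed during the window
#   - 1 GB = 1024 MB throughout
from dataclasses import dataclass


@dataclass
class TestWindow:
    avg_write_mb_per_s: float     # hypothetical average write rate of the test VM
    duration_hours: float         # how long the failover test stays running
    journal_hard_limit_gb: float  # hypothetical journal hard limit for the VM
    journal_used_gb: float        # hypothetical journal space already in use


def scratch_growth_gb(w: TestWindow) -> float:
    """Estimated data written to the scratch volume over the whole window."""
    seconds = w.duration_hours * 3600
    return w.avg_write_mb_per_s * seconds / 1024


def headroom_gb(w: TestWindow) -> float:
    """Journal space left for scratch writes before the hard limit is reached."""
    return w.journal_hard_limit_gb - w.journal_used_gb


def hours_until_full(w: TestWindow) -> float:
    """Hours of testing before scratch writes exhaust the remaining headroom."""
    if w.avg_write_mb_per_s <= 0:
        return float("inf")  # no writes, so the scratch volume never fills
    gb_per_hour = w.avg_write_mb_per_s * 3600 / 1024
    return headroom_gb(w) / gb_per_hour


if __name__ == "__main__":
    # Friday night to Monday morning, roughly 60 hours (illustrative values only).
    weekend = TestWindow(avg_write_mb_per_s=0.5, duration_hours=60,
                         journal_hard_limit_gb=150, journal_used_gb=20)
    need = scratch_growth_gb(weekend)
    have = headroom_gb(weekend)
    print(f"scratch growth ~{need:.1f} GB, headroom {have:.1f} GB")
    print(f"hard limit reached after ~{hours_until_full(weekend):.1f} h")
    if need > have:
        print("consider increasing the journal hard limit before the window")
```

    The estimate is only as good as the write rate you plug in; using the VM’s observed average write rate and leaving a safety margin below the hard limit gives a more realistic picture than a single guessed number.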

    However, as Matt explained, a long test period is not recommended, and various factors could put the temporary failed-over test VM at risk.

    If you have any questions, please feel free to let us know.

    One last question: in the failover test scenario, what happens if the test VM hits its journal limit? Will it go down?

    I can increase the amount if necessary to accommodate the maintenance window, will editing the VPG to make this change cause any kind of re-sync/bitmap sync?

    Kenan,

    The following quote is from the KB article Result of Leaving a VPG in the Failover Test State for a Long Duration. It describes the result of letting the scratch disk fill up: corruption and errors.

    “As all IO writes during these states are written to a scratch volume, it is important to limit the duration of these states, as when the volume becomes full, the applications can no longer write IOs, which will lead to corruption and other errors. At this point, the only option is to stop the Failover Test, or Rollback the Live Failover/Move, and start again.”

    As for your second question, increasing the Journal disk size will not lead to a re-sync.

    Result of Leaving a VPG in the Failover Test State for a Long Duration – https://www.zerto.com/myzerto/knowledge-base/result-of-leaving-a-vpg-in-the-failover-test-failover-before-commit-or-move-before-commit-states-for-a-long-duration/

    Thank you.
