We performed a DR test last week. During and after the test we experienced a lot of issues that appear to be related to disk corruption (i.e., corrupted files, volumes, databases, etc.). We used the Move option for testing because I wanted to make sure we didn't lose any data. While moving to our DR site, a few machines went into Windows recovery on startup. I rolled back, tried again, and was able to get them started the second time. We experienced many more issues moving back to our production site, and I wasn't able to easily remediate all of them. In a few instances, we started getting errors about missing or corrupted files on the VMs, services failing to start as a result, etc. I ended up having to recover VMs and files from pre-move storage snapshots and Veeam backups at our production site. Needless to say, this does not instill confidence in Zerto, or in our ability to recover within minutes (as designed, and as promised when we purchased the product) if we ever needed to do a real failover due to the loss of our production site.
Has anyone experienced similar issues? I'm not quite sure where to start troubleshooting and remediating these issues so they don't happen again. We are doing VMware-to-VMware replication, and all of these systems run Windows Server (mostly 2012 R2 and 2019). Any input is welcome.
Can you please share which version of Zerto you are running? This could be linked to a known issue. If not, it's something we can investigate via a case if you still have the logs.
We are running 9.0 Update 1.
That bug affects an older version of Zerto, so it is not the cause here. If you log a ticket, we can give you better answers.