Possible Host Crash in environments running 2.0 Update 3 (Xeme), patches 144, 148, or 154

KB Number:
00000112

Solution:

Purpose


We have identified a statistically rare issue that can affect the stability of hosts with VRAs in environments running Zerto Virtual Replication 2.0 Update 3 (Xeme), patches 144, 148, or 154. A code bug in the Zerto Host Component can cause the host to crash with a purple screen (PSOD). (No other versions of Zerto Virtual Replication are affected.)

 

The host crash is caused by a code bug that is triggered when a large number of small IO writes occur, specifically when processes inside the Virtual Replication Appliance (VRA) are faulty.

Affected Versions


2.0 Update 3 (Xeme), patches 144, 148, and 154.

Complexity


High

 

Solution


Zerto has fixed the issue in a patch, version 2.0 Update 3 (Xeme 159), which was released earlier today (January 31st, 2013) and is available for download from the Zerto Self-Service Portal.
 
There are two ways to avoid this risk:
  1. Download the latest version, 2.0 Update 3 (Xeme 159), and upgrade all hosts to this version. This will permanently prevent this crash.
  2. If an upgrade cannot take place, there is a workaround that will also prevent this crash until the system can be upgraded. The workaround involves modifying a Zerto host component loading script.

To apply the workaround, follow these steps:

  1. Change the DRS automation level to Manual, so that DRS will not automatically vMotion VMs.
  2. Choose the first host with which you wish to work.
  3. vMotion all protected VMs from this host to another host.
    • Note: the procedure has no impact on unprotected VMs on that host.  The purpose of vMotioning the protected VMs is solely to prevent the VPGs from entering Delta Sync.
  4. SSH to the host.
  5. Run the following command:
    • cat /etc/vmware/zloadmod.txt
  6. You will see output similar to the following:
    • sh: /vmfs/volumes/4f9ff102-c34c6c71-942d-842b2b0c241e/zagentid/44454c4c590010318046b7c04f355031/zloadmod.sh
  7. Change to the directory containing the "zloadmod.sh" file in the previous output.
  8. Stop the Zerto component by running the following command:
    • ./zunloadmod.sh
  9. Edit the 'zloadmod.sh' file:
    1. Look for the following line:
      • MODULE_PARAMS="vrauuid=$VRA_VM_UUID"
    2. Change the line to the following:
      • MODULE_PARAMS="vrauuid=$VRA_VM_UUID iocache=48"
    3. Save the changes.
  10. Start the Zerto component by running the following command:
    • ./zloadmod.sh
  11. (Optional) vMotion protected VMs back to the host.
  12. Repeat steps 3 - 11 for each remaining host in the environment.
  13. (Optional) Revert the DRS automation level to its original setting.
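
Steps 5 through 9 above can be sketched as two small shell helpers. This is only an illustration, not a Zerto-supplied tool: the function names `script_dir` and `add_iocache` are hypothetical, and the sketch assumes that the record file has the single-line form shown in step 6, that the MODULE_PARAMS line reads exactly as shown in step 9, and that `awk`, `sed`, `grep`, and `dirname` are available in the host shell. Stopping and starting the component (steps 8 and 10) is still done by hand.

```shell
#!/bin/sh
# Hypothetical helpers for steps 5-9 of the workaround (not part of Zerto).

# Print the directory that holds zloadmod.sh, given the record file
# from step 5 (for example /etc/vmware/zloadmod.txt).
# Assumption: the script path is the last field on the first line.
script_dir() {
    record="$1"
    path=$(awk 'NR==1 {print $NF}' "$record")
    dirname "$path"
}

# Append iocache=48 to the MODULE_PARAMS line of the given zloadmod.sh.
# Assumption: the line reads exactly MODULE_PARAMS="vrauuid=$VRA_VM_UUID".
add_iocache() {
    script="$1"
    # Already patched: leave the file alone.
    if grep -q 'iocache=' "$script"; then
        return 0
    fi
    # Refuse to edit a file that does not contain the expected line.
    if ! grep -q '^MODULE_PARAMS="vrauuid=\$VRA_VM_UUID"$' "$script"; then
        echo "expected MODULE_PARAMS line not found in $script" >&2
        return 1
    fi
    # Keep a backup, then rewrite in place so the file mode is preserved.
    cp "$script" "$script.bak"
    sed 's/^MODULE_PARAMS="vrauuid=\$VRA_VM_UUID"$/MODULE_PARAMS="vrauuid=$VRA_VM_UUID iocache=48"/' \
        "$script.bak" > "$script"
}

# Possible usage on a host, after stopping the component in step 8:
#   cd "$(script_dir /etc/vmware/zloadmod.txt)"
#   add_iocache ./zloadmod.sh
```

The helper keeps a `.bak` copy of the original script and does nothing if an `iocache=` parameter is already present, so running it twice on the same host does not add the parameter twice.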

Although this is a rare condition, if your environment is running 2.0 Update 3 (Xeme), it is highly recommended to either upgrade or use the workaround to avoid host crashes.

 

Note: If you are running a version of Zerto Virtual Replication prior to 2.0 Update 3 (Xeme), this issue does not affect you, but you may still want to consider upgrading to Zerto 2.0 Update 3 (Xeme 159).

