Error: “I/O Error to Journal”
An administrator receives an alert that IOs to a Journal disk for a protected VM are failing.
This alert appears for the following reasons:
The underlying storage is out of space, is nearly out of space, or has an ongoing performance issue. This refers to the physical storage hardware, not the virtual datastore within vCenter or SCVMM.
The recovery VRA responsible for holding this Journal disk is having resource utilization issues.
The recovery host running the recovery VRA responsible for holding this Journal disk is having connectivity issues to the datastore on which this Journal disk resides.
The below alert is seen in the Zerto GUI:
I/O error to journal
This can cause syncs to become stuck. Because checkpoints are not created during a sync, a Failover with a current checkpoint is not possible while the sync is stuck.
To troubleshoot and resolve this issue, kindly follow the steps below:
Check the free space at the storage hardware level (array) and on the datastore on which this Journal disk resides.
If there is ample space available for the datastore, review host/storage logs for any indication of Check Conditions returned by the array or Performance Degradation warnings in the host logs.
Use the following article "How to Connect to a VRA via SSH" to connect to the recovery VRA responsible for holding this Journal disk.
Once inside the VRA, execute the "top" command.
A table with the VRA services should be visible to you now. Look for the service named "VraMain" and observe how much CPU and RAM the VRA is using.
If the "VraMain" service is using more than 90% of the VRA's resources, kindly review the "How to Increase the RAM Allocation for an Existing VRA" article in order to increase the resources of the VRA.
Open vSphere and locate the recovery host running the recovery VRA responsible for holding this Journal disk. Right-click the host, click "Storage", and then click "Rescan Storage".
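For the "top" check in the steps above, the following minimal Python sketch may help when reviewing saved output rather than watching the live table. It is not part of Zerto's tooling. It assumes the output was captured in batch mode (for example with `top -b -n 1`) and uses the default procps column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND). The function names, the sample text, and the choice to apply the 90% threshold to both CPU and memory are illustrative assumptions.

```python
from typing import Optional, Tuple

THRESHOLD_PERCENT = 90.0  # threshold mentioned in the troubleshooting steps


def vramain_usage(top_output: str,
                  process_name: str = "VraMain") -> Optional[Tuple[float, float]]:
    """Return (%CPU, %MEM) for the first matching process line, or None.

    Assumes the default procps `top` layout: %CPU is the 9th column,
    %MEM the 10th, and COMMAND the 12th.
    """
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) >= 12 and fields[11].startswith(process_name):
            return float(fields[8]), float(fields[9])
    return None


def needs_more_resources(usage: Tuple[float, float],
                         threshold: float = THRESHOLD_PERCENT) -> bool:
    """True if either CPU or memory usage is above the threshold."""
    cpu, mem = usage
    return cpu > threshold or mem > threshold


# Hypothetical sample output, for illustration only.
SAMPLE = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  812 root      20   0 2871236   2.6g  10240 S  12.3  93.1 512:04.11 VraMain
    1 root      20   0   10656    780    648 S   0.0   0.0   0:03.21 init
"""

usage = vramain_usage(SAMPLE)
if usage is None:
    print("VraMain not found in the supplied output")
elif needs_more_resources(usage):
    print(f"VraMain is above {THRESHOLD_PERCENT}%: CPU={usage[0]}, MEM={usage[1]}")
else:
    print(f"VraMain is within limits: CPU={usage[0]}, MEM={usage[1]}")
```

If the column layout on the VRA differs from the one assumed here, adjust the field indexes before relying on the result.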
If the above steps did not resolve the issue, kindly open a case with Zerto Support and provide the following logs using the Zerto diagnostics utility. The "How to perform Log Collection with the Zerto Virtual Replication Diagnostics Utility" article can be used as a guide to collect said logs. Below are the logs necessary to perform a further deep dive of the issue:
Case number: <The number of the case you opened for this issue>
Timeframe: <Enough to cover the time that the error first appeared + 1 hour before that>
VPGs: One affected VPG
vCD: If vCD is involved in the replication at the recovery site, then check the box.
Hypervisor: Yes, check the box.
Hosts: Review which VRAs were auto-populated and select the corresponding hosts on this screen.
Note: If any non-Zerto logs (e.g. host, hypervisor, vCD) fail to be collected, kindly collect the necessary logs normally from the endpoint itself and request a link to upload the additional logs.
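As a worked example of the Timeframe field above, the short Python sketch below computes a window that starts one hour before the error first appeared. Ending the window at the moment of log collection is an assumption, as are the function name and the sample timestamps.

```python
from datetime import datetime, timedelta
from typing import Tuple


def collection_window(first_seen: datetime,
                      collected_at: datetime,
                      lead: timedelta = timedelta(hours=1)) -> Tuple[datetime, datetime]:
    """Return (start, end): one hour before the first error up to collection time."""
    if collected_at < first_seen:
        raise ValueError("collection time is earlier than the first error")
    return first_seen - lead, collected_at


# Hypothetical timestamps, for illustration only.
first_seen = datetime(2024, 3, 5, 14, 20)
collected_at = datetime(2024, 3, 5, 16, 0)

start, end = collection_window(first_seen, collected_at)
print(f"Timeframe: {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}")
```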