Problem
JetStream DR at the recovery site may display a warning under the recovery CFO domain indicating “Protected Domain recovery not running.” The “Data (Processed/Known Remaining)” field appears stuck, showing no change for a prolonged period.
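To tell whether the field is actually stuck rather than just slow, you can record its value at intervals and check how long it has gone unchanged. This is a minimal sketch. It assumes you collect the readings yourself, for example by noting the UI value periodically. The function name `progress_is_stuck` and the `window` threshold are hypothetical, because the article does not define how long a “prolonged period” is.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# One reading of "Data (Processed/Known Remaining)": (read time, processed, known remaining).
# How these readings are collected is an assumption; the article only describes the UI field.
Sample = Tuple[datetime, float, float]


def progress_is_stuck(samples: List[Sample], window: timedelta) -> bool:
    """Return True if the latest processed/remaining values have been unchanged for at least `window`.

    `window` is a hypothetical threshold for "a prolonged period".
    """
    if len(samples) < 2:
        return False  # a single reading cannot show a lack of change
    ordered = sorted(samples, key=lambda s: s[0])
    latest_time, latest_processed, latest_remaining = ordered[-1]
    unchanged_since = latest_time
    # Walk backwards until a reading differs from the latest one.
    for read_time, processed, remaining in reversed(ordered[:-1]):
        if (processed, remaining) != (latest_processed, latest_remaining):
            break
        unchanged_since = read_time
    return latest_time - unchanged_since >= window
```

If readings taken every 30 minutes have not changed for 90 minutes, `progress_is_stuck(readings, timedelta(minutes=60))` returns `True`. That is a cue to work through the troubleshooting steps.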
Troubleshooting
To troubleshoot the problem, follow these steps:
- Ensure the DRVA status is “Running” in the Appliances tab of the JetStream DR UI.
- If any connection error is reported, check the connection to the object store from the DRVA.
- Ensure the replication log volume has enough free space for operation.
- Check the RocVA VM to make sure it is up and running with no errors reported.
- Ensure there are no underlying network issues, and review the VM events.
- Check the MSA logs for any connection issues between the RocVA and DRVA.
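You can script the object store connectivity and free-space checks from the list above as a quick preflight. This is a minimal sketch. It assumes the script runs on a machine that can reach the object store endpoint, and that the replication log volume is visible as a mounted filesystem. The function names, the endpoint host, port 443, the mount path, and the 10% free-space threshold are all placeholders, not JetStream DR settings. A successful TCP connection only proves the endpoint is reachable. It does not prove that credentials or bucket access are valid.

```python
import shutil
import socket
from typing import List


def object_store_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False


def free_space_fraction(path: str) -> float:
    """Return the free fraction (0.0 to 1.0) of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def preflight(host: str, port: int, log_volume_path: str, min_free: float = 0.10) -> List[str]:
    """Run both checks and return readable problems (an empty list means both passed)."""
    problems = []
    if not object_store_reachable(host, port):
        problems.append(f"cannot reach object store endpoint {host}:{port}")
    free = free_space_fraction(log_volume_path)
    if free < min_free:  # min_free is a hypothetical threshold
        problems.append(f"replication log volume has only {free:.0%} free")
    return problems
```

For example, `preflight("objectstore.example.com", 443, "/mnt/replication-log")` uses hypothetical values. Any entry in the returned list points to the matching troubleshooting step.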
The following DRVA log excerpt illustrates RocVA reconnect attempts:
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:NETWORKING:accept_connection(190): Accepted socket 35 at 4#10.245.24.81:35364
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:REPLICATION:FioValidateAuth(1125): Authenticated socket 35 ctx 0x7fe1d4000df0 group 3cc|800000001
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:PLATFORM:FioReplicationLogOpenSection(1740): invoking connector->openSection, connector = 0x7fe1ac002dc0
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:PLATFORM:FioReplicationLogOpenSection(1749): connector->openSection 0
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:OC:deployOcReplicationChannel(221): Rejected deployment of OC channel for UUID 0000000400000000f6f32fe098c53b00 due to existence of entry 37 with same UUID.
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:OC:FioOpenOcChannelForRecovery(1396): Failed to deploy channel 0000000400000000f6f32fe098c53b00 for GC database recovery
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:REPLICATION:FioApplianceCmdOpenGCMetadataLog(972): Failed to open OC channel 0000000400000000f6f32fe098c53b00 for recovery
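In a long log, repeated channel rejections are easier to spot if you scan for the messages shown above. This is a minimal sketch that matches the exact message text from this excerpt. The function name `summarize_drva_log` is hypothetical. Reading a high count as evidence of a reconnect loop is an interpretation, not documented JetStream DR behavior.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set

# Message text copied from the DRVA log excerpt above.
REJECTED = re.compile(
    r"Rejected deployment of OC channel for UUID (?P<uuid>[0-9a-f]+) "
    r"due to existence of entry (?P<entry>\d+) with same UUID"
)
FAILED_OPEN = re.compile(r"Failed to open OC channel (?P<uuid>[0-9a-f]+) for recovery")


def summarize_drva_log(lines: Iterable[str]) -> Dict[str, object]:
    """Count OC channel rejections and open failures per channel UUID."""
    rejected: Counter = Counter()
    failed_open: Counter = Counter()
    blocking_entries: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = REJECTED.search(line)
        if match:
            rejected[match.group("uuid")] += 1
            # Remember which existing entry already held the same UUID.
            blocking_entries[match.group("uuid")].add(match.group("entry"))
            continue
        match = FAILED_OPEN.search(line)
        if match:
            failed_open[match.group("uuid")] += 1
    return {
        "rejected": rejected,
        "failed_open": failed_open,
        "blocking_entries": dict(blocking_entries),
    }
```

You can pass any iterable of lines, such as an open file containing an exported DRVA log. A UUID whose rejection count keeps growing over time suggests the RocVA keeps reconnecting without success.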
Example
The following example shows the recovery site MSA stopping the recovery process to address an event, then resuming recovery for the domain. Such an event might be expanding or adding a disk on the corresponding RVM, or creating a new RVM for a newly protected VM. In this excerpt, the event is the cleanup of the RVM for a VM whose protection was stopped.
2023-11-02 08:10:15,509 INFO [Thread-31] (CleanupCFOProtectionStoppedVMsTask:179) |taskId=410053f1-7957-11ee-ae35-005056891e25|4-Hours-Domain02| Cleaning up RVM for protection stopped VM. pvmId=542a0901-7259-11ed-951c-005056907ae3, vm=DCWS1004
2023-11-02 08:10:25,276 DEBUG [Thread-73] (TopologyCollectionTask:4823) Removing objects: VirtualMachine:vm-4450
2023-11-02 08:10:25,278 DEBUG [Thread-73] (TopologyCollectionTask:4940) Removed vm object: VmKey [vmUuid=50099923-51d7-3446-4544-9f6945874b16, vcenterUuid=738adc85-10b9-49cd-a349-6babb0042823]
2023-11-02 08:10:25,697 INFO [Thread-54] (TopologyCollectionTask:1809) Removing VM from cache. vmName=jss-rvm-DCWS1004-1464
2023-11-02 08:10:25,700 DEBUG [Thread-54] (TopologyCollectionTask:1667) Removing the VM from Host->VM cache. VmKey [vmUuid=50099923-51d7-3446-4544-9f6945874b16, vcenterUuid=738adc85-10b9-49cd-a349-6babb0042823]
2023-11-02 08:10:30,361 DEBUG [Thread-31] (CleanupCFOProtectionStoppedVMsTask:231) |taskId=410053f1-7957-11ee-ae35-005056891e25|4-Hours-Domain02| Stopping recovery on ROCVA. rocvaIp=10.245.24.246
2023-11-02 08:10:30,925 INFO [Thread-31] (CleanupCFOProtectionStoppedVMsTask:253) |taskId=410053f1-7957-11ee-ae35-005056891e25|4-Hours-Domain02| Task cleanup for CleanupCFOProtectionStoppedVMsTask
2023-11-02 08:10:34,300 DEBUG [Thread-24] (ResolveRecoveryIssueTask:202) |4c387d62-7957-11ee-ae35-005056891e25|4-Hours-Domain02|initial| Initialize ResolveRecoveryIssueTask
2023-11-02 08:10:34,303 DEBUG [Thread-24] (ResolveRecoveryIssueTask:211) |4c387d62-7957-11ee-ae35-005056891e25|4-Hours-Domain02|initial| ResolveRecoveryIssueTask is ready now.
Upon resuming recovery, the MSA log reports:
2023-11-02 08:14:06,000 INFO [Thread-23] (ResolveRecoveryIssueTask:265) |4c387d62-7957-11ee-ae35-005056891e25|4-Hours-Domain02|checkAndResolveRocvaIssue| Recovery status is already running. No action needed. status=RUNNING
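To confirm that a domain actually resumed after the MSA stopped recovery, you can scan the MSA log for the stop message followed by a later `status=RUNNING` line for the same domain. This is a minimal sketch based only on the two message formats in this excerpt. The function name `recovery_pauses` is hypothetical. Treating `status=RUNNING` as confirmation that recovery resumed is an assumption.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# Header layout of the MSA lines above: timestamp, level, [thread], (source), |task id|domain|...
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+ \[[^\]]+\] \([^)]+\) "
    r"\|[^|]*\|(?P<domain>[^|]+)\|"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

Pause = Tuple[datetime, Optional[datetime]]


def recovery_pauses(lines: Iterable[str]) -> Dict[str, List[Pause]]:
    """Pair each 'Stopping recovery on ROCVA' with the next status=RUNNING line for the same domain."""
    pauses: Dict[str, List[list]] = {}
    for line in lines:
        header = HEADER.match(line)
        if not header:
            continue  # skip lines without a domain field, e.g. TopologyCollectionTask
        domain = header.group("domain")
        ts = datetime.strptime(header.group("ts"), TS_FORMAT)
        if "Stopping recovery on ROCVA" in line:
            pauses.setdefault(domain, []).append([ts, None])
        elif "status=RUNNING" in line:
            open_pauses = pauses.get(domain)
            if open_pauses and open_pauses[-1][1] is None:
                open_pauses[-1][1] = ts  # assumed: RUNNING confirms recovery resumed
    return {d: [(start, end) for start, end in ps] for d, ps in pauses.items()}
```

A pause whose end time is `None` means the log shows recovery being stopped but never confirmed as running again. That domain is a candidate for the steps in the Solution section.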
Solution
- Resolve any underlying network issues, then re-check the status of the domain.
- Perform a hard reboot of the DRVA VM associated with the CFO domain experiencing problems.
- If the problem persists, reboot the RocVA VM associated with the CFO domain experiencing problems.