JetStream Software

Protected Domain Recovery Fails on CFO Recovery Domain

This article applies to all product versions.

Problem

JetStream DR at the recovery site may display a warning under the recovery CFO domain indicating “Protected Domain recovery not running.” The “Data (Processed/Known Remaining)” field appears stuck, showing no change for a prolonged period.

Troubleshooting

To troubleshoot the problem, follow the steps below:

  1. Ensure the DRVA status is “Running” from the JetStream DR UI Appliances tab.
    • If any connection error is reported, check the connection to the object store from the DRVA.
       
  2. Ensure the replication log volume has enough free space for operation.
  3. Check the RocVA VM to make sure it is up and running with no errors reported.
    • Ensure there are no underlying network issues and validate the VM events.
       
  4. Check the MSA logs and note any connection issues between the RocVA and DRVA.
    The following example illustrates RocVA reconnect attempts:

    Sep 24 07:07:21 drva-live DRVA[1054]: JSS:NETWORKING:accept_connection(190): Accepted socket 35 at 4#10.245.24.81:35364
    Sep 24 07:07:21 drva-live DRVA[1054]: JSS:REPLICATION:FioValidateAuth(1125): Authenticated socket 35 ctx 0x7fe1d4000df0 group 3cc|800000001
    Sep 24 07:07:21 drva-live DRVA[1054]: JSS:PLATFORM:FioReplicationLogOpenSection(1740): invoking connector->openSection, connector = 0x7fe1ac002dc0
    Sep 24 07:07:21 drva-live DRVA[1054]: JSS:PLATFORM:FioReplicationLogOpenSection(1749): connector->openSection 0
    Sep 24 07:07:21 drva-live DRVA[1054]: JSS:OC:deployOcReplicationChannel(221): Rejected deployment of OC channel for UUID 0000000400000000f6f32fe098c53b00 due to existence of entry 37 with same UUID.
    Sep 24 07:07:21 drva-live DRVA[1054]: JSS:OC:FioOpenOcChannelForRecovery(1396): Failed to deploy channel 0000000400000000f6f32fe098c53b00 for GC database recovery
    Sep 24 07:07:21 drva-live DRVA[1054]: JSS:REPLICATION:FioApplianceCmdOpenGCMetadataLog(972): Failed to open OC channel 0000000400000000f6f32fe098c53b00 for recovery
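When the log is long, it can help to filter it down to the lines that report a rejection or failure. The following Python sketch does this for lines in the format shown above. The regular expression and the choice of "Rejected" and "Failed" as failure prefixes are assumptions derived only from this sample, not a documented JetStream log specification, and `find_failures` is a hypothetical helper name.

```python
import re

# Assumed line layout, inferred from the sample above:
# "<Mon> <day> <HH:MM:SS> <host> DRVA[<pid>]: JSS:<SUBSYSTEM>:<function>(<line>): <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+DRVA\[\d+\]:\s+"
    r"JSS:(?P<subsystem>\w+):(?P<func>\w+)\((?P<line>\d+)\):\s+(?P<msg>.*)$"
)

# Message prefixes treated as failures (an assumption based on the sample only).
FAILURE_PREFIXES = ("Rejected", "Failed")


def find_failures(lines):
    """Return (timestamp, subsystem, function, message) for failure-looking lines."""
    hits = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match and match.group("msg").startswith(FAILURE_PREFIXES):
            hits.append(
                (match.group("ts"), match.group("subsystem"),
                 match.group("func"), match.group("msg"))
            )
    return hits


# Four lines copied from the example above.
SAMPLE = """\
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:NETWORKING:accept_connection(190): Accepted socket 35 at 4#10.245.24.81:35364
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:OC:deployOcReplicationChannel(221): Rejected deployment of OC channel for UUID 0000000400000000f6f32fe098c53b00 due to existence of entry 37 with same UUID.
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:OC:FioOpenOcChannelForRecovery(1396): Failed to deploy channel 0000000400000000f6f32fe098c53b00 for GC database recovery
Sep 24 07:07:21 drva-live DRVA[1054]: JSS:REPLICATION:FioApplianceCmdOpenGCMetadataLog(972): Failed to open OC channel 0000000400000000f6f32fe098c53b00 for recovery
""".splitlines()

if __name__ == "__main__":
    for ts, subsystem, func, msg in find_failures(SAMPLE):
        print(f"{ts} [{subsystem}] {func}: {msg}")
```

Repeated hits for the same channel UUID over time would suggest the RocVA keeps retrying against a stale entry on the DRVA.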

Example

The following example shows the recovery site MSA stopping the recovery process, handling the event by expanding or adding the disk of the corresponding RVM (or by creating a new RVM for a newly protected VM), and then resuming recovery for the domain.

If a VM is unprotected, the CFO domain may need some time to run cleanup before it resumes the recovery process. For example, you may need to wait 5 minutes, or even 15 to 20 minutes, before the status changes.

2023-11-02 08:10:15,509  INFO [Thread-31] (CleanupCFOProtectionStoppedVMsTask:179) |taskId=410053f1-7957-11ee-ae35-005056891e25|4-Hours-Domain02| Cleaning up RVM for protection stopped VM. pvmId=542a0901-7259-11ed-951c-005056907ae3, vm=DCWS1004
2023-11-02 08:10:25,276 DEBUG [Thread-73] (TopologyCollectionTask:4823) Removing objects: VirtualMachine:vm-4450
2023-11-02 08:10:25,278 DEBUG [Thread-73] (TopologyCollectionTask:4940) Removed vm object: VmKey [vmUuid=50099923-51d7-3446-4544-9f6945874b16, vcenterUuid=738adc85-10b9-49cd-a349-6babb0042823]
2023-11-02 08:10:25,697  INFO [Thread-54] (TopologyCollectionTask:1809) Removing VM from cache. vmName=jss-rvm-DCWS1004-1464
2023-11-02 08:10:25,700 DEBUG [Thread-54] (TopologyCollectionTask:1667) Removing the VM from Host->VM cache. VmKey [vmUuid=50099923-51d7-3446-4544-9f6945874b16, vcenterUuid=738adc85-10b9-49cd-a349-6babb0042823]
2023-11-02 08:10:30,361 DEBUG [Thread-31] (CleanupCFOProtectionStoppedVMsTask:231) |taskId=410053f1-7957-11ee-ae35-005056891e25|4-Hours-Domain02| Stopping recovery on ROCVA. rocvaIp=10.245.24.246
2023-11-02 08:10:30,925  INFO [Thread-31] (CleanupCFOProtectionStoppedVMsTask:253) |taskId=410053f1-7957-11ee-ae35-005056891e25|4-Hours-Domain02| Task cleanup for CleanupCFOProtectionStoppedVMsTask
2023-11-02 08:10:34,300 DEBUG [Thread-24] (ResolveRecoveryIssueTask:202) |4c387d62-7957-11ee-ae35-005056891e25|4-Hours-Domain02|initial| Initialize ResolveRecoveryIssueTask
2023-11-02 08:10:34,303 DEBUG [Thread-24] (ResolveRecoveryIssueTask:211) |4c387d62-7957-11ee-ae35-005056891e25|4-Hours-Domain02|initial| ResolveRecoveryIssueTask is ready now.

Upon resuming recovery, the MSA log reports:

2023-11-02 08:14:06,000 INFO [Thread-23] (ResolveRecoveryIssueTask:265) |4c387d62-7957-11ee-ae35-005056891e25|4-Hours-Domain02|checkAndResolveRocvaIssue| Recovery status is already running. No action needed. status=RUNNING

If a disk of a protected VM is added, expanded, or removed, similar behavior may be observed. No manual intervention is required at the recovery site.
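To judge whether a pause is within the expected window, the timestamps in the two MSA log excerpts above can be compared. The Python sketch below measures the time between the "Stopping recovery on ROCVA" entry and the first later entry that reports `status=RUNNING`. Both marker strings are taken from the excerpts; treating them as reliable start and end markers is an assumption, and `recovery_pause_seconds` is a hypothetical helper name. The result is an upper bound, since the second entry only records when a check found recovery already running.

```python
from datetime import datetime

# Marker strings copied from the MSA log excerpts above (assumed to be stable).
STOP_MARKER = "Stopping recovery on ROCVA"
RUNNING_MARKER = "status=RUNNING"


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of an MSA log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def recovery_pause_seconds(lines):
    """Seconds between the last stop entry and the next RUNNING entry, or None."""
    stopped_at = None
    for line in lines:
        if STOP_MARKER in line:
            stopped_at = parse_timestamp(line)
        elif RUNNING_MARKER in line and stopped_at is not None:
            return (parse_timestamp(line) - stopped_at).total_seconds()
    return None


# Two lines copied from the excerpts above.
SAMPLE = [
    "2023-11-02 08:10:30,361 DEBUG [Thread-31] (CleanupCFOProtectionStoppedVMsTask:231) "
    "|taskId=410053f1-7957-11ee-ae35-005056891e25|4-Hours-Domain02| "
    "Stopping recovery on ROCVA. rocvaIp=10.245.24.246",
    "2023-11-02 08:14:06,000 INFO [Thread-23] (ResolveRecoveryIssueTask:265) "
    "|4c387d62-7957-11ee-ae35-005056891e25|4-Hours-Domain02|checkAndResolveRocvaIssue| "
    "Recovery status is already running. No action needed. status=RUNNING",
]

if __name__ == "__main__":
    print(f"Recovery paused for about {recovery_pause_seconds(SAMPLE):.0f} seconds")
```

For the excerpts above, the gap is a little under four minutes, which is inside the waiting period described earlier.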

Solution

  1. Resolve any underlying network issues, then re-check the status of the domain.
     
  2. Perform a hard reboot of the DRVA VM associated with the CFO domain experiencing problems.
     
  3. If the problem persists, reboot the RocVA VM associated with the CFO domain experiencing problems.

After performing any of these actions, wait a few minutes for the JetStream DR UI to report the status change.
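If the status is checked by a script rather than by watching the UI, the waiting period can be expressed as a simple polling loop. The Python sketch below is generic: `get_status` is a hypothetical callable that you would supply yourself, because this article does not describe a JetStream DR API for reading status. The 20-minute timeout mirrors the upper end of the wait mentioned earlier in this article, and the 60-second interval is an arbitrary choice.

```python
import time


def wait_for_status(get_status, wanted="Running", timeout_s=1200,
                    interval_s=60, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns `wanted` or the timeout expires.

    get_status is a caller-supplied function (hypothetical here); it is not
    part of any JetStream interface described in this article.
    Returns True if the wanted status was seen, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status source standing in for a real check.
    simulated = iter(["Not running", "Not running", "Running"])
    ok = wait_for_status(lambda: next(simulated), interval_s=0)
    print("Status reached" if ok else "Timed out")
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise without real delays.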
