JetStream for GCE Disaster Recovery Orchestration Admin Guide

Handling AROVA Failures

A healthy AROVA is essential for disaster recovery orchestration, and its status is completely independent of the production VMs.

If an AROVA fails for any reason (see below), it should be restarted immediately.

Protection of production VMs is not affected; however, a failed AROVA stops handling configuration updates, which can affect failover if a failure occurs while the AROVA is down.

If an AROVA fails to run in the currently active replica zone (i.e., the compute and/or disk replica is down), a notification is sent out.

The cause of the failure should be resolved, and the AROVA should be redeployed using the replica in the second zone.
The AROVA can be restarted using the recovery command of arova-cli.py. For example:

python3 ./arova-cli.py recovery \
  --src-pri-zone us-east1-b \
  --aro-disk-name jet-aro-data-us-central1-us-east1 \
  --sa [email protected] \
  --project arova-project

Important: The above command is an illustrative example only and should not be used as-is.

The recovery script parameters can be generated using the Recover Helper from the Management Site.

The recovery procedure assumes that all previous AROVA instances for the same region pair are down. If such a VM is still available, specifying the --force command-line option deletes all unexpected AROVA instances before proceeding.
The same option can be used if the replication state of the ACD cannot be determined for some reason.
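Because the recovery command is easy to mistype and --force is destructive, some operators may prefer to assemble it programmatically and review it before running it. The following is a minimal Python sketch, assuming only the flags that appear in this guide (--src-pri-zone, --aro-disk-name, --sa, --project, --force). The function name build_recovery_command is hypothetical, appending --force at the end of the argument list is an assumption, and the service account value is a made-up placeholder. Real parameter values should come from the Recover Helper on the Management Site. Nothing is executed; the command is only printed.

```python
import shlex
from typing import List


def build_recovery_command(
    src_pri_zone: str,
    aro_disk_name: str,
    service_account: str,
    project: str,
    force: bool = False,
) -> List[str]:
    """Assemble (but do not run) the argv for an arova-cli.py recovery call.

    Hypothetical helper: uses only the flags shown in this guide.
    """
    argv = [
        "python3", "./arova-cli.py", "recovery",
        "--src-pri-zone", src_pri_zone,
        "--aro-disk-name", aro_disk_name,
        "--sa", service_account,
        "--project", project,
    ]
    if force:
        # --force deletes unexpected AROVA instances for the same region pair
        # before proceeding; it is also used when the ACD replication state
        # cannot be determined. Placing it last is an assumption.
        argv.append("--force")
    return argv


if __name__ == "__main__":
    cmd = build_recovery_command(
        src_pri_zone="us-east1-b",
        aro_disk_name="jet-aro-data-us-central1-us-east1",
        # Hypothetical placeholder; use the account from the Recover Helper.
        service_account="dr-sa@arova-project.iam.gserviceaccount.com",
        project="arova-project",
        force=True,
    )
    # Print a shell-safe command line for manual review before running it.
    print(shlex.join(cmd))
```

Keeping the arguments as a list (rather than one string) makes it safe to pass to subprocess.run later without shell quoting issues, once the command has been reviewed.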

If the active or passive replica of the ACD is down, AROVA may still be able to run, depending on the scope of the incident.

In such a case, a notification is sent out, but no action is required from the user.

If AROVA loses its disks in both zones (R1Z1 and R1Z2), a notification is sent out.

The cause of the failure should be resolved, and AROVA should be redeployed in the secondary region using the arova-cli.py script described above.
The script requires the user to delete any stale AROVA instances that are still present.

Note: AROVA and protected VMs may run in different regions. AROVA failover does not impact ongoing asynchronous replication.

If AROVA fails due to a software error, a notification is sent out.

The cause of the failure should be resolved, and then the AROVA VM should be restarted.
A support bundle that includes the AROVA process events log and core dump should be collected and forwarded to the support team.

If AROVA is inadvertently deleted (together with the ACD), a notification is sent out.

Once this issue is discovered, AROVA should be redeployed in the secondary region using the arova-cli.py script described above.

In case of a primary region failure, the resulting issues may appear unrelated to ARO.

If deemed appropriate, failover should be performed, with AROVA redeployed in the secondary region using the arova-cli.py script described above.
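The failure scenarios in this section can be collected into a single lookup table, for example inside an operator's own runbook tooling. The following is a hypothetical Python encoding: the Failure and Action names, the target labels, and runbook_for are assumptions made for illustration and are not part of JetStream or arova-cli.py. The sketch does not detect failures; it only maps an already identified scenario to the response described in this guide.

```python
from enum import Enum, auto
from typing import NamedTuple


class Failure(Enum):
    """Hypothetical labels for the AROVA failure scenarios in this guide."""
    ACTIVE_ZONE_DOWN = auto()     # compute and/or disk replica down in the active zone
    ACD_REPLICA_DOWN = auto()     # active or passive ACD replica down, AROVA still runs
    BOTH_ZONES_LOST = auto()      # AROVA disks lost in both R1Z1 and R1Z2
    SOFTWARE_ERROR = auto()       # AROVA software error
    DELETED = auto()              # AROVA inadvertently deleted together with the ACD
    PRIMARY_REGION_DOWN = auto()  # primary region failure


class Action(NamedTuple):
    user_action_required: bool
    target: str  # where AROVA runs afterwards (illustrative labels only)
    steps: str


RUNBOOK = {
    Failure.ACTIVE_ZONE_DOWN: Action(
        True, "second zone",
        "Resolve the cause, then run arova-cli.py recovery using the replica in the second zone."),
    Failure.ACD_REPLICA_DOWN: Action(
        False, "unchanged",
        "A notification is sent; no user action is needed."),
    Failure.BOTH_ZONES_LOST: Action(
        True, "secondary region",
        "Resolve the cause, delete any stale AROVA instances, then run arova-cli.py recovery."),
    Failure.SOFTWARE_ERROR: Action(
        True, "same VM",
        "Resolve the cause, restart the AROVA VM, and forward a support bundle "
        "(process events log and core dump) to the support team."),
    Failure.DELETED: Action(
        True, "secondary region",
        "Redeploy AROVA with arova-cli.py recovery."),
    Failure.PRIMARY_REGION_DOWN: Action(
        True, "secondary region",
        "If failover is deemed appropriate, redeploy AROVA with arova-cli.py recovery."),
}


def runbook_for(failure: Failure) -> Action:
    """Return the documented response for an identified failure scenario."""
    return RUNBOOK[failure]


if __name__ == "__main__":
    # Print the whole table, flagging scenarios that need operator action.
    for failure in Failure:
        action = runbook_for(failure)
        flag = "ACTION" if action.user_action_required else "info  "
        print(f"{flag} {failure.name:<20} -> {action.target}: {action.steps}")
```

Keying the table on an Enum means a missing scenario surfaces as a KeyError rather than a silent default, which is usually preferable for recovery tooling.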

Also see:

View: Security Considerations

View: AROVA Prerequisites

View: AROVA Deployment

View: AROVA Health Monitoring

View: Accessing AROVA

View: AROVA Cleanup