JetStream Software

What Is: Failover?

Failover is the process in which protected VMs and data are retrieved from the storage site and rehydrated at the recovery site. Operation then continues from the recovery site.

In certain types of disaster incidents, the primary site (including both service and compute clusters) may become temporarily or even permanently unusable. In this case, JetStream DR can be used to recover the affected domains to a separate recovery site. 

Failover can be performed “as-needed,” in which case resources at the recovery site will only be created and utilized at the time failover recovery is performed. This is the most economical mode of operation.

Alternatively, failover can be performed “continuously,” where the recovery site is created up front and then kept synchronized with the protected site during its normal course of operation. Continuous Failover (“CFO”) requires simultaneous operation of resources at both the protected site and the recovery site, but it can significantly reduce the amount of time required to recover from a disaster event.

Modes of Operation

Failover can be operated in two different modes:

“Failover” Mode

Failover mode is typically used in response to a disaster event: resources are replicated to a recovery site, and the protected VMs are then recovered to it. The failover steps are configured and performed together.

“Continuous Failover” (CFO) Mode

Continuous Failover mode can be initiated at any time, before any disaster occurs, to replicate resources to a recovery site. It then runs in the background during normal system operation. When a disaster event occurs, continuous failover can be “completed” to immediately transfer ownership of the protected VMs to the recovery site (near-zero RTO).
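The trade-off between the two modes can be illustrated with a rough back-of-the-envelope model. The sketch below is not part of JetStream DR. The function name, the parameters, and the simplifying assumption that rehydration time scales linearly with data size are all hypothetical, and it ignores any data still to be synchronized when continuous failover is completed.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """Illustrative breakdown of recovery time (hypothetical model)."""
    rehydrate_hours: float  # time spent rehydrating data at the recovery site
    cutover_hours: float    # time spent transferring ownership and starting VMs

    @property
    def total_hours(self) -> float:
        return self.rehydrate_hours + self.cutover_hours


def estimate_recovery(mode: str, data_tb: float,
                      rehydrate_tb_per_hour: float,
                      cutover_hours: float) -> RecoveryEstimate:
    """Estimate recovery time for "failover" or "continuous_failover".

    Assumption: in Failover mode all data is rehydrated after the disaster,
    while in Continuous Failover mode it was already rehydrated in the
    background, so only the cutover remains.
    """
    if data_tb < 0 or rehydrate_tb_per_hour <= 0 or cutover_hours < 0:
        raise ValueError("inputs must be non-negative and throughput positive")
    if mode == "failover":
        rehydrate = data_tb / rehydrate_tb_per_hour
    elif mode == "continuous_failover":
        rehydrate = 0.0
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return RecoveryEstimate(rehydrate, cutover_hours)
```

With made-up inputs of 10 TB of protected data, 2 TB per hour of rehydration throughput, and a 15-minute cutover, the model gives 5.25 hours for Failover mode and 0.25 hours for Continuous Failover mode. The numbers are placeholders. The point is only that Continuous Failover moves the rehydration cost ahead of the disaster, at the price of running resources at both sites.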

  • Open the More menu and select one of the failover options (Failover or Continuous Failover).

Refer to the JetStream DR Admin Guide for more details about configuring and starting failover.

Post-Failover Options

Multiple options are available to control the state of the protected site and ownership of VMs and data after failover has been completed. Failover can be completed in one of three ways: regular, planned, or forced.

“Regular” Failover

Follows the set course of steps for normal failover mode.

“Planned” Failover

Shuts down the domain at the protected site (gracefully) before transferring ownership of VMs and data to the recovery site.

  • Data at the protected site will not be deleted.
  • Planned failover is typically used for non-disaster-related events such as seamlessly shifting the location of VMs and their workloads while they continue to operate (i.e., migration).
    • For example, if it is known an event will occur that would produce a large workload or data burst that exceeds the resource capacity of the on-premises site, it could be beneficial to shift the VMs and their workloads to a cloud services provider capable of meeting the demand.
    • Another common case is moving VMs and workloads away from the on-premises site before performing major site maintenance that could potentially be disruptive or risky. After maintenance is complete, the VMs and workloads can be non-disruptively “failed-back” to the updated on-premises site.

During planned failover, any running VMs are gracefully shut down by the MSA. The shutdown process relies on VMware Tools, so it is recommended to install VMware Tools on all VMs that will be protected. If any VMs cannot be shut down automatically by the MSA, they must be powered off manually, and the task should then be re-run.
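The shut-down-then-re-run behavior described above can be sketched as a small simulation. This is an illustrative model only and does not call JetStream DR or VMware APIs. The `VM` class, its fields, and `graceful_shutdown_pass` are hypothetical names, and the assumption that a VM without VMware Tools can never be shut down automatically is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    """Hypothetical stand-in for a protected VM."""
    name: str
    powered_on: bool
    tools_installed: bool  # whether VMware Tools is installed in the guest


def graceful_shutdown_pass(vms: List[VM]) -> List[str]:
    """Simulate one run of the planned-failover shutdown step.

    Running VMs with VMware Tools are shut down (simulated by clearing
    powered_on). Running VMs without it are left untouched and reported,
    because they would have to be powered off manually before a re-run.
    """
    needs_manual_power_off = []
    for vm in vms:
        if not vm.powered_on:
            continue  # already off, nothing to do
        if vm.tools_installed:
            vm.powered_on = False  # stand-in for a guest OS shutdown
        else:
            needs_manual_power_off.append(vm.name)
    return needs_manual_power_off
```

In this model, a first pass over a domain that contains one running VM without VMware Tools returns that VM's name. After it is powered off manually, a second pass returns an empty list, which corresponds to the task succeeding when it is re-run.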

  • If planned failover is used, a sub-option is available to Resume Continuous Rehydration on the Remote Site.
    • Planned failover using the continuous rehydration option is in essence “continuous failback.” Once VMs and data have been failed over to the recovery site, any new data generated there will automatically be “rehydrated” (synchronized in the background) back to the original protected site allowing for “near-zero RTO” failback.
    • Click the Configure Remote Recovery VA link to configure continuous rehydration settings. (It is typically configured to run automatically upon failover.)
  • Certain conditions must be met in order to use the Resume Continuous Rehydration on Remote Site option.

“Forced” Failover

Assumes the primary site is no longer accessible.

  • Ownership of the protected domain is immediately passed to the recovery site.
  • A dialog window will appear asking the user to confirm that complete ownership of the protected domain can be taken over immediately by the recovery site.
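The three completion types described above can be summarized as a simple decision. The sketch below is one reading of those descriptions, not the product's own logic. `Completion` and `choose_completion` are hypothetical names, and the two boolean inputs are assumptions about what an operator would know at the time.

```python
from enum import Enum


class Completion(Enum):
    """The three ways failover can be completed, per this article."""
    REGULAR = "regular"
    PLANNED = "planned"
    FORCED = "forced"


def choose_completion(primary_reachable: bool,
                      graceful_shutdown_wanted: bool) -> Completion:
    """Pick a completion type from two operator-supplied facts.

    Forced failover assumes the primary site is no longer accessible.
    Planned failover gracefully shuts down the domain at the protected
    site first, which is only possible if that site can be reached.
    Otherwise the regular course of failover steps applies.
    """
    if not primary_reachable:
        return Completion.FORCED
    if graceful_shutdown_wanted:
        return Completion.PLANNED
    return Completion.REGULAR
```

For example, a migration ahead of site maintenance maps to `choose_completion(True, True)`, which selects planned failover, while any case where the primary site cannot be reached selects forced failover regardless of the second argument.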

Failover Complete

After failover is complete (in any of the modes described above), the mode changes to “Running in Failover” and the VM status changes to “Recoverable.” All VMs of the protected domain are now running at the recovery site in the state specified by the failover runbook settings.
