The following guidelines describe conditions and methods to trigger various JetStream DR alarms which may be helpful for troubleshooting and testing purposes.
DRVA Restarted
- Reboot the DRVA VM or restart the DRVA service.
- An alarm will be triggered and can be viewed from DRVA VM > Monitor > Events.
DRVA High CPU Usage Duration Exceeded
- In preparation, configure the DRVA with a minimum of 4 CPUs and 8 GB of memory.
- Open the DRVA console and enable SSHD service.
- Connect via SSH to the DRVA using PuTTY.
- Use the top command to monitor system processes.
- Create multiple (duplicate) SSH sessions running the following command in each:
- cat /dev/zero > /dev/null
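As a convenience, the duplicate SSH sessions can be approximated from a single session by running the same command several times in the background. The following is a minimal sketch, not part of JetStream DR: the helper names start_cpu_load and stop_cpu_load and the variable CPU_LOAD_PIDS are hypothetical, and the number of workers needed to cross the alarm threshold is an assumption to adjust for the DRVA's CPU count.

```shell
#!/bin/sh
# Sketch only: helper names and CPU_LOAD_PIDS are hypothetical.
# Each worker runs the same command as above: cat /dev/zero > /dev/null

CPU_LOAD_PIDS=""

# start_cpu_load N: start N background workers and remember their PIDs.
start_cpu_load() {
    workers="$1"
    i=0
    while [ "$i" -lt "$workers" ]; do
        cat /dev/zero > /dev/null &
        CPU_LOAD_PIDS="$CPU_LOAD_PIDS $!"
        i=$((i + 1))
    done
}

# stop_cpu_load: terminate every worker started by start_cpu_load.
stop_cpu_load() {
    for pid in $CPU_LOAD_PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    CPU_LOAD_PIDS=""
}

# Example (assumed values): four workers for ten minutes.
#   start_cpu_load 4; sleep 600; stop_cpu_load
```

Keeping the PIDs in one variable makes it possible to stop the load cleanly once the alarm has been observed, rather than closing each session by hand.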
DRVA High Memory Usage Duration Exceeded
Method 1
The DRVA high memory usage alarm may be triggered by conditions related to DRVA high CPU usage (described above).
- The following command can be used to write and discard 4G of memory.
- dd if=/dev/zero of=/dev/null bs=4G count=1
- Repeating this command can produce a spike in memory usage for testing alarm conditions.
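Repeating the command can be scripted with a simple loop. The sketch below is illustrative only: the function name memory_spike is hypothetical, and the block size and repeat count that actually trigger the alarm are assumptions that depend on the DRVA's configured memory.

```shell
#!/bin/sh
# Sketch only: memory_spike is a hypothetical helper name.
# memory_spike BLOCK_SIZE REPEATS
# Runs the dd command from the step above REPEATS times, using BLOCK_SIZE
# as the dd buffer size (for example 4G), and reports how many runs finished.
memory_spike() {
    block_size="$1"
    repeats="$2"
    i=0
    while [ "$i" -lt "$repeats" ]; do
        # dd allocates a buffer of bs bytes, fills it from /dev/zero,
        # and discards it into /dev/null.
        dd if=/dev/zero of=/dev/null bs="$block_size" count=1 2>/dev/null || return 1
        i=$((i + 1))
    done
    echo "completed $i runs"
}

# Example (assumed values): ten repetitions with a 4G buffer.
#   memory_spike 4G 10
```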
Method 2
An alternate method is to consume the configured memory until the alarm is triggered.
- Create a temporary mount point:
- # sudo mkdir /mnt/tmpfs
- # sudo mount -t tmpfs -o size=8G tmpfs /mnt/tmpfs
- Use the dd command to allocate the required memory.
- The following command writes a 7.5 GB file filled with zero bytes into the tmpfs mount, thus consuming approximately 7.5 GB (7500 MB) of RAM.
- # dd if=/dev/zero of=/mnt/tmpfs/testfile bs=1M count=7500
- To clean up after testing:
- Reboot the DRVA, or
- Unmount the created tmpfs mount:
- # sudo umount /mnt/tmpfs
- # rmdir /mnt/tmpfs
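The fill step in Method 2 can be parameterized so that the amount of memory consumed is easy to adjust between test runs. This is a minimal sketch under stated assumptions: the function names fill_with_zeros and release_test_file are hypothetical, the target directory is assumed to be a tmpfs mount that already exists (such as /mnt/tmpfs from the steps above), and the size needed to trigger the alarm depends on the DRVA's configured memory.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# fill_with_zeros DIR SIZE_MB
# Writes SIZE_MB blocks of 1M zero bytes to DIR/testfile. When DIR is a
# tmpfs mount, the file contents are held in memory.
fill_with_zeros() {
    dir="$1"
    size_mb="$2"
    dd if=/dev/zero of="$dir/testfile" bs=1M count="$size_mb" 2>/dev/null
}

# release_test_file DIR
# Deletes the test file so the memory it occupied can be reclaimed
# without rebooting.
release_test_file() {
    rm -f "$1/testfile"
}

# Example (assumed values, matching the command above):
#   fill_with_zeros /mnt/tmpfs 7500
#   release_test_file /mnt/tmpfs
```

Deleting the file before unmounting releases the memory right away, which can be useful when the test needs to be repeated with a different size.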
DR Store IO Error
- Remove the replication log from a DRVA that contains an actively replicating protected domain.
- The DRVA should be configured to use the replication log that gets removed for the test.
- This action should trigger the alarm.
DRVA Unreachable Duration Exceeded
- Power off the DRVA.
- Or, disconnect the DRVA network.
- This action should trigger the alarm.
DR Store Unavailable
- It is not possible to trigger this error specifically. It is similar in behavior to an IO error.
Bitmap Mode ‘On’ Duration Exceeded
- Create a new protected domain.
- Protect multiple VMs (two or three should be sufficient).
- Wait for the VMs to enter the initial sync phase.
- From the DRVA Edit settings screen, disconnect the replication volume disk.
- This action should trigger the alarm.
Protected Domain Recovery Failure
Method 1
This alarm condition can be triggered while performing planned failover:
- Initiate planned failover from the recovery site.
- While failover is in progress, shut down the primary MSA.
- After a period of time, the alarm should be triggered from the primary site.
Method 2
This alarm condition can be triggered while performing continuous failover:
- Start continuous failover.
- Terminate the task from the task log.
- After a period of time, the alarm should be triggered from the primary site.
Failback Interrupted Due to Issue at Failover Site
- Initiate a failover.
- After the failover successfully completes, open an MSA SSH session on the recovery site (where the domain has failed over).
- Start the failback process and concurrently stop the VME2 service on the recovery site by issuing the command:
- # service vme2 stop
- This action should trigger the alarm.
- After the test, the VME2 service can be restarted by issuing the command:
- # service vme2 start
Protected Domain Test Failover Failed
- Initiate a test failover at the recovery site.
- As test failover is being performed, power off the MSA at the primary site.
- This action should trigger the alarm.
Application Write Backpressure On
- If the incoming VM network speed is high compared to the outgoing replication traffic, this can create backpressure leading to the alarm being triggered.
DR Virtual Appliance Network IP Not Available
- Disconnect the DRVA network.
- After a period of time, the alarm should be triggered.
Test Failover Site Ready
- Conduct a test failover and perform the steps to the point where VMs can be tested at the recovery site.
- An alert message will appear in the UI of the recovery site where VMs can be tested.
Replication Log Reserved Space Running Low
- Deploy a DRVA and add a replication log volume with a minimal configuration.
- Create a protected domain configured with a large total estimated data size to be protected.
- Set the metadata size to be greater than half the capacity of the replication log disk.
- Once the protected domain is created, protect the VM.
- Navigate to the replication log and change the reserved space alarm threshold to 10% (the default value is 5%).
- This action should trigger the alarm.
Protected Domain Recovery Runbook Execution Failed
- Create a protected domain and protect a VM that doesn’t have VMware tools installed.
- If necessary, uninstall VMware tools from the VM.
- Configure a runbook for ReIP of the primary site.
- ReIP allows IP addresses of protected VMs to be changed via runbooks during failover or failback.
- Initiate a failover.
- After a period of time, the alarm should be triggered; it can be viewed from Cluster > Monitor > Events.
VM Protection Cancelled
- This condition could occur in earlier versions of JetStream DR (version 4.1.x and prior) when a protected VM undergoes a snapshot revert.
- Current versions of JetStream DR software have addressed the underlying issue and it is no longer possible to create this condition or trigger this alarm.
DR Store Degraded in Multi-Pathing Mode
- This condition occurs if the replication log uses an iSCSI volume that relies on multi-pathing to storage and one of the paths becomes broken or degraded.
- In such a case, the alarm will be triggered.
- This issue is not applicable in AVS environments, which do not use iSCSI multi-pathing to storage.
Protected Domain Recovery (Failover / Failback / Restore) Completed
- Initiate any failover, failback, or restore operation.
- Upon successful completion of the task, the alarm will be triggered.