JetStream DR

IO Filters: What You Need to Know for Hybrid Cloud Services

Q&A with Serge Shats, Ph.D., CTO and Co-Founder, JetStream Software

Over the past few years, we’ve seen a lot of new features introduced to the VMware platform. Many of these developments were undertaken to make VMware an even better cloud and hybrid cloud platform. One of the less well-known developments may be one of the most important: IO filters. As organizations shift some or most of their infrastructure to cloud-based services, or consume cloud services for data protection and business continuity, IO filters are becoming key to enabling important capabilities, such as migrating VMs without interruption and protecting VMs in an on-premises data center with a cloud service. The VMware vSphere API for IO Filtering (VAIO) represents a significant step in how VMware can be used in cloud and hybrid cloud environments.

We recently chatted with Dr. Serge Shats, CTO and co-founder at JetStream Software, about the company’s role in developing and applying IO filter technology for cross-cloud data management. Serge has led architecture and development at storage and data protection companies including Veritas, Quantum and Virsto. He was CTO of FlashSoft Software, then engineering fellow at SanDisk after the company acquired FlashSoft.

Q: Tell us about your role in developing the API framework for IO filters.

A: Starting in 2014, while our engineering team was still at SanDisk, we began collaborating with VMware as the co-design partner for the IO filters API, so we have a rather extensive understanding of the API. It is a standard VMware API, and solutions that support it are listed in the VMware Compatibility Guide and are certified “VMware Ready.” That certification ensures full support from VMware. JetStream Software’s IO filter-based products are deployed widely, mostly in large data centers running VMware for public and private cloud operations.

Q: What are IO filters?

A: The IO filters API is a feature of vSphere that allows third-party data services to be safely integrated into the data path between the virtual machine and its virtual disk(s), capturing data and events in order to provide some data management service. There are different IO filters for different data management functions, including:

  • Data replication for disaster recovery
  • IO acceleration with host-based non-volatile memory
  • Data encryption
  • Storage IO control
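
To make the idea concrete, here is a minimal conceptual sketch in Python of what an IO filter does: it sits between a VM’s writes and its virtual disk and hands each IO to a data service such as replication. All of the names in the sketch (VirtualDisk, IOFilter, ReplicationFilter, FilteredDisk) are invented for illustration; the real VAIO interface is a native vSphere API, not Python code.

    # Conceptual sketch only -- the real VAIO API is a native vSphere interface.
    # All names here are hypothetical.

    class VirtualDisk:
        """Stands in for a VM's virtual disk (VMDK)."""
        def __init__(self):
            self.blocks = {}

        def write(self, offset, data):
            self.blocks[offset] = data

    class IOFilter:
        """Base class: sees every write before it reaches the disk."""
        def on_write(self, offset, data):
            pass

    class ReplicationFilter(IOFilter):
        """Example data service: mirrors each write to a remote log."""
        def __init__(self, remote_log):
            self.remote_log = remote_log

        def on_write(self, offset, data):
            self.remote_log.append((offset, data))

    class FilteredDisk:
        """The data path: VM -> filters -> virtual disk."""
        def __init__(self, disk, filters):
            self.disk, self.filters = disk, filters

        def write(self, offset, data):
            for f in self.filters:          # each filter observes the IO...
                f.on_write(offset, data)
            self.disk.write(offset, data)   # ...then it lands on the disk

    if __name__ == "__main__":
        remote_log = []
        disk = FilteredDisk(VirtualDisk(), [ReplicationFilter(remote_log)])
        disk.write(0, b"hello")
        print(remote_log)   # [(0, b'hello')] -- the write was captured in-line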

Q: How do IO filters work?

A: An IO filter is a software component that intercepts data and events continuously, with very low latency. But there is more to IO filters than what they do at the VM level. It’s helpful to think about IO filters at the cluster level as well. IO filters are deployed from a standard VIB (vSphere Installation Bundle) and installed by vSphere to every host in a cluster, including new hosts that are added after the initial deployment. Even the process of updating or uninstalling filters is managed across the cluster by vSphere. Once deployed, the filters’ operating parameters are defined through VMware Storage Policy Based Management (SPBM). So it’s fair to say that the API enables a third-party data service to act as though it is “VMware native.”
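
As a rough mental model of this cluster-level behavior, the following hypothetical Python sketch shows a cluster that pushes a filter package to every host, including hosts added later, and a policy object that carries the filter’s operating parameters. The names (Cluster, Host, StoragePolicy, deploy_filter) are invented; in vSphere this is handled by the VIB installation mechanism and SPBM, not by user code.

    # Hypothetical sketch of cluster-wide filter deployment and policy binding.
    # In vSphere this is done by the VIB installer and SPBM, not user code.

    class Host:
        def __init__(self, name):
            self.name = name
            self.installed_filters = set()

    class Cluster:
        def __init__(self, hosts):
            self.hosts = list(hosts)
            self.filter_packages = set()

        def deploy_filter(self, package):
            """Install the filter package (VIB) on every current host."""
            self.filter_packages.add(package)
            for host in self.hosts:
                host.installed_filters.add(package)

        def add_host(self, host):
            """Hosts added later automatically receive existing filters."""
            self.hosts.append(host)
            host.installed_filters.update(self.filter_packages)

    class StoragePolicy:
        """Stands in for an SPBM policy: named parameters for a filter."""
        def __init__(self, name, filter_package, **params):
            self.name, self.filter_package, self.params = name, filter_package, params

    if __name__ == "__main__":
        cluster = Cluster([Host("esx-01"), Host("esx-02")])
        cluster.deploy_filter("replication-filter-vib")
        cluster.add_host(Host("esx-03"))        # the new host gets the filter too
        policy = StoragePolicy("dr-gold", "replication-filter-vib", rpo_seconds=0)
        print([sorted(h.installed_filters) for h in cluster.hosts], policy.params)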

Q: What are the advantages of IO filters?

A: First, and perhaps most obviously, because IO filters run within vSphere, they truly achieve the goal of enabling “software-defined storage.” IO filters are designed to run with any type of datastore, including shared storage, vSAN/HCI or Virtual Volumes. Second, among the various software-oriented approaches to integrating third-party data services with vSphere, IO filters are the most “vSphere native.” IO filters don’t use any agents in the VMs, virtual appliances in the data path, third-party modules in the kernel, or calls to internal APIs. Solutions deployed as IO filters provide an assurance of support, compatibility and stability that other approaches to software-defined storage can’t match. Of course, this becomes doubly important when we’re talking about cloud or hybrid cloud deployments, where abstraction is paramount.

Q: How are IO filters used for virtual machine live migration?

A: The problem with live migration is this: How do you keep applications running, with new data being written continuously, during the hours — or sometimes days — that it takes to move the applications’ data to the destination? There are a number of approaches, as virtual machine migration is not a new problem. But IO filters provide a capability that’s much simpler than anything we’ve seen before.

JetStream Migrate deploys as a replication filter in the source VMware environment. The migrating VMs’ configurations and virtual disks are copied from the on-premises data center to the cloud data center, and while that copy and transfer process is taking place, newly written data from the VMs is captured by the IO filter and also replicated to the destination.
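
A simplified way to picture that flow, with invented names and no claim to reflect JetStream Migrate’s actual implementation, is the Python sketch below: the existing virtual disk contents are copied first, the writes captured by the filter during the copy are then applied at the destination, and the destination converges with the still-running source.

    # Hypothetical sketch of live migration with an IO filter:
    # bulk-copy the disk while a write log captures changes, then drain the log.

    def migrate(source_disk_at_copy_start, write_log, destination):
        """source_disk_at_copy_start: dict offset->data; write_log: list of (offset, data)."""
        # Phase 1: copy the existing virtual disk contents (slow, possibly over
        # hours or days, or shipped on a physical device). The VM keeps running.
        for offset, data in source_disk_at_copy_start.items():
            destination[offset] = data

        # Phase 2: apply the writes the IO filter captured during the copy,
        # in order, so the destination converges with the running source.
        for offset, data in write_log:
            destination[offset] = data
        return destination

    if __name__ == "__main__":
        source = {0: "A", 1: "B", 2: "C"}          # disk contents when the copy started
        captured = [(1, "B2"), (3, "D")]            # writes made while the copy was running
        source.update(dict(captured))               # the source kept changing
        dest = migrate({0: "A", 1: "B", 2: "C"}, captured, {})
        assert dest == source                       # destination matches the live source
        print(dest)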

One of the advantages of this approach is that the copy of the virtual disk can be moved over the network connection, or it can be copied onto a physical device for “offline” transport to the cloud destination. So if you are familiar with the Amazon Snowball, it’s now possible for an organization to use a Snowball-like device to transport data from one VMware environment to another, without having to stop the VMs or their applications at the source.

Q: With respect to disaster recovery (DR), why would someone use IO filters instead of snapshots?

A: One of the key advantages of using IO filters for data replication is that, unlike snapshots, data can be captured for replication without a detrimental impact on application performance. Also, because data is captured in a continuous stream, a solution based on IO filters can provide continuous data protection (CDP) with a recovery point objective (RPO) that’s nearly zero. That means in the case of a disaster, you aren’t losing the hours’ or days’ worth of data written since your last backup.
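
As a back-of-the-envelope illustration of the RPO difference, with figures invented purely for the example: with periodic snapshots, the worst-case data loss is roughly the snapshot interval; with continuous replication, it is roughly the replication lag.

    # Illustrative arithmetic only; the interval and lag values are invented.

    snapshot_interval_hours = 4     # e.g., a snapshot-based backup every 4 hours
    replication_lag_seconds = 5     # e.g., time for a captured write to reach the DR site

    worst_case_rpo_snapshots = snapshot_interval_hours * 3600   # seconds of data at risk
    worst_case_rpo_streaming = replication_lag_seconds

    print(f"Snapshots every {snapshot_interval_hours}h -> worst-case RPO ~ {worst_case_rpo_snapshots} s")
    print(f"Continuous replication -> worst-case RPO ~ {worst_case_rpo_streaming} s")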

Q: How are IO filters used for cloud DR?

A: Here are four scenarios in which IO filters can be used to replicate data for cloud DR:

  1. Business Continuity Cloud Services: With data replication from the on-premises environment to a cloud service provider, the service provider can host a warm failover destination for the VMs running at the on-premises data center.
  2. Data Backup to Cloud Object Store: With the same method of intercepting data on-premises, the data can be continuously replicated to a cloud object store for recovery. Data may be preprocessed for the destination through the specific object store’s APIs. Again, no snapshots are required.
  3. Point-in-Time Recovery for Continuous Data Protection: By replicating data in a continuous stream instead of discrete snapshots, point-in-time navigation is possible, allowing recovery of all data up to immediately prior to a critical event (e.g., malware intrusion); see the sketch after this list.
  4. Cloud Data Protection Services for On-Premises HCI: Rather than requiring a “like-to-like” model for cloud data protection, data replication from within the hypervisor itself can provide DR for Virtual SAN or third-party HCI, even if the cloud destination is running entirely different compute and storage hardware.
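
To make the point-in-time idea in scenario 3 concrete, here is a hypothetical Python sketch of replaying a continuously replicated write journal (for example, log segments kept in a cloud object store) up to a chosen timestamp, stopping just before an unwanted event. The journal format and function name are invented for the sketch.

    # Hypothetical sketch: point-in-time recovery from a continuous write journal.
    # Each journal entry is (timestamp, offset, data); in practice the journal
    # would live as log segments in a cloud object store.

    def recover_to(journal, point_in_time):
        """Rebuild disk state by replaying all writes up to point_in_time."""
        disk = {}
        for ts, offset, data in journal:
            if ts > point_in_time:
                break                    # stop just before the unwanted event
            disk[offset] = data
        return disk

    if __name__ == "__main__":
        journal = [
            (100, 0, "good"),
            (200, 1, "good"),
            (300, 0, "encrypted-by-malware"),   # the event we want to roll back past
        ]
        print(recover_to(journal, point_in_time=250))   # {0: 'good', 1: 'good'}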

Q: With respect to cloud DR, how do IO filters compare to other data capture methods?

A: IO filters enable a continuous capture of data from within vSphere, which is a game-changer for cloud DR. Traditionally, organizations have looked to the cloud for snapshot-based backup, which has its place, but it is limited in terms of realizing true DR as a cloud service.

It’s well understood that snapshots degrade application performance and, by definition, don’t support continuous replication. The tradeoff with snapshots is that the shorter you want your RPO to be, the more frequently you must take snapshots, and the greater the impact on runtime performance. Also, recovering a volume from many small snapshots increases the recovery time objective (RTO). For true DR from a cloud service, continuous data replication from an IO filter is a more efficient approach.
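
A rough way to quantify that tradeoff, again with figures invented for the example: halving the RPO doubles the number of snapshots taken per day, and a restore that must apply a long chain of snapshot deltas takes proportionally longer.

    # Illustrative arithmetic only; all figures are invented for the example.

    hours_per_day = 24
    apply_minutes_per_delta = 2      # assumed time to apply one snapshot delta on restore

    for rpo_hours in (8, 4, 1):
        snapshots_per_day = hours_per_day // rpo_hours
        worst_case_restore_chain = snapshots_per_day          # deltas since the last full copy
        rto_overhead_minutes = worst_case_restore_chain * apply_minutes_per_delta
        print(f"RPO {rpo_hours}h -> {snapshots_per_day} snapshots/day, "
              f"~{rto_overhead_minutes} min of delta-apply time on restore")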

Prior to the availability of IO filters, continuous data capture was possible, for example, by intercepting data in a vSCSI filter. This is how vSphere Replication accesses data as it makes snapshots for data recovery. The key problem with vSCSI is that it’s a private API intended for VMware’s use, and VMware provides no guarantee of support for third-party technologies that use vSCSI intercept.

Another approach to continuous data capture is to install agents inside the VMs to replicate data in a stream. While this method can achieve RPOs of just seconds, it is an agent-based solution, which may raise concerns about security and compatibility.

Lastly, virtual appliances typically run within their own VMs, so they are broadly compatible, and they generally don’t take snapshots, so they can stream data. The problem is that they either sit in the data path themselves, introducing IO latency, or they require a filter or agent to intercept data.

Q: What’s next for IO filters?

A: While the IO filters API is primarily of interest to software developers providing data management services in the VMware ecosystem, interest has been growing recently, driven largely by cloud and hybrid cloud use cases. In the future, it’s not difficult to see IO filters applied beyond performance acceleration, live migration, and data protection to other types of policy-based data management.

Cloud services can move beyond disaster recovery and data protection: on-premises IO filters can enable “X as a service” offerings by applying specific policies to data across an infrastructure that comprises on-premises operations and cloud services.

With an IO filter in each VM on premises, a solution can intercept and process every bit of data moving up and down the storage stack, and it can help the admin set data policies for those VMs, for any type of business requirement, such as cloud cost optimization or compliance. The key is that there is no need for an external data management framework — policy-based data management can be enabled within vSphere itself — across multiple data centers and cloud services.

Serge Shats, Ph.D., is co-founder and CTO of JetStream Software. He has more than 25 years’ experience in system software development, storage virtualization and data protection. Previously co-founder and CTO of FlashSoft, acquired by SanDisk in 2012, Shats has served as a chief architect at Veritas, Virsto and Quantum. He earned his Ph.D. in computer science at the Russian Academy of Sciences in Moscow, Russia.