DR for VMware – SRM on vSphere Replication


Most VMware administrators have heard of Site Recovery Manager (SRM). SRM has been the standard for VMware disaster recovery for some time. It fits into the product line of VMware's parent company, EMC, and has traditionally relied on storage-based replication. That architecture uses the write-journaling technology we discussed in the first article in this series, so Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) can be very aggressive.

The downside of this architecture is that the customer needs similar storage arrays at both the production and disaster recovery sites. If, for example, the customer had a Fibre Channel array at the production site and a lower-grade NFS array from a different vendor at the DR site, SRM was not compatible. Bummer…

VMware, however, released vSphere Replication with the vSphere 5 family, allowing administrators to replicate their virtual machines without a common storage subsystem. This means you could have a traditional Fibre Channel SAN at the production site and NFS or internal storage at the disaster recovery site. The underlying storage type is irrelevant as long as the workload is supported. This is a gift for DR budgets everywhere. Additionally, you can recover to previous points in time using snapshots at the recovery site, much as you would use a traditional snapshot.

In this configuration SRM sits on top of vSphere Replication instead of the replication appliances (RPAs) common in array-to-array architectures. The vSphere Replication appliances are Linux virtual machines deployed in the VMware environment. I will give VMware a lot of credit here: where some competing technologies are cumbersome to install, vSphere Replication takes only a few mouse clicks to install, and your appliances are functional within minutes. Replication can be configured through either the VMware fat client or the web client.

So what's the catch? vSphere Replication falls into the snap-and-replicate category. This means RTOs and RPOs won't be as aggressive as with array-to-array replication or with hypervisor technologies that use write journaling. The lowest RPO currently achievable with SRM over vSphere Replication is 15 minutes. There are rumors that this will come down to 5 minutes in the future, but it's only a rumor at this point. Also, if you are trying to move to the web client, you will be dismayed to learn that SRM can still only be managed through the VMware fat client. I don't know too many administrators who are excited about the web client, but it's relevant to your day-to-day work.

So what about licensing and additional costs? The vSphere Replication / SRM model has both pros and cons.

The virtual appliances are Linux-based – pro

This means no additional Windows licenses are required to operate the environment. Some competing products use Windows-based virtual appliances, and every additional Windows server you stand up has to be patched and managed, which adds to the cost of the solution. SRM can generally be installed on the Windows system that vCenter runs on. If you're using the Linux-based vCenter appliance, SRM isn't compatible. I would expect this to be resolved soon, as VMware is trying to eliminate the need for Windows systems in the environment.

The base vSphere Replication is free – pro

Yes, you heard that correctly: vSphere Replication is free. If you have lower-priority virtual machines, you don't have to buy SRM licenses for them. You can save money by buying SRM licenses (sold in packs of 25) only for your mission-critical VMs.
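To make the "buy only what you need" point concrete, here is a minimal sketch of the pack arithmetic. It assumes only what the article states (SRM licenses come in packs of 25 VMs); the function name is hypothetical and nothing here reflects an actual VMware licensing tool.

```python
import math

# Assumption taken from the article: SRM licenses are sold in packs of 25 VMs.
SRM_PACK_SIZE = 25


def srm_packs_needed(protected_vms: int, pack_size: int = SRM_PACK_SIZE) -> int:
    """Return how many SRM license packs cover `protected_vms` (hypothetical helper).

    Only VMs that need SRM orchestration count; lower-priority VMs can stay
    on plain vSphere Replication, which the article notes is free.
    """
    if protected_vms < 0:
        raise ValueError("protected_vms must be non-negative")
    # Round up: a partially used pack still has to be bought whole.
    return math.ceil(protected_vms / pack_size)


# Protecting 40 mission-critical VMs out of 200 needs 2 packs, versus 8 packs for all 200.
print(srm_packs_needed(40), srm_packs_needed(200))
```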

SRM is the orchestration tool on top of vSphere Replication – neutral

SRM and all of its power can be scoped down to only the systems that need it. I personally like the flexibility and choice; most companies don't need to replicate all of their virtual machines with very tight RTOs and RPOs. If you are trying to replicate your entire VMware environment, you may be better off with a solution licensed per socket, as it may be more cost-effective.
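The per-VM versus per-socket trade-off comes down to a simple comparison. The sketch below assumes 25-VM packs (as stated in the article) and uses hypothetical, made-up prices purely as placeholders; plug in your own quotes, since these numbers are not VMware's or any other vendor's pricing.

```python
import math


def per_vm_cost(protected_vms: int, pack_price: float, pack_size: int = 25) -> float:
    """Cost of per-VM licensing, rounded up to whole packs (pack size assumed 25)."""
    return math.ceil(protected_vms / pack_size) * pack_price


def per_socket_cost(sockets: int, socket_price: float) -> float:
    """Cost of licensing every host CPU socket, regardless of how many VMs are protected."""
    return sockets * socket_price


def cheaper_model(protected_vms: int, sockets: int, pack_price: float,
                  socket_price: float, pack_size: int = 25) -> str:
    """Return which licensing model is cheaper for this environment (ties go to per-VM)."""
    vm = per_vm_cost(protected_vms, pack_price, pack_size)
    sock = per_socket_cost(sockets, socket_price)
    return "per-VM" if vm <= sock else "per-socket"


# Hypothetical prices: 10,000 per 25-VM pack, 2,000 per socket, 16 sockets in the cluster.
print(cheaper_model(40, 16, 10_000, 2_000))   # protecting a few critical VMs
print(cheaper_model(300, 16, 10_000, 2_000))  # protecting the whole environment
```

With these placeholder numbers, protecting a handful of critical VMs favors per-VM packs, while protecting everything tips toward per-socket licensing, which is the pattern the paragraph above describes.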

Snap and replicate technology – con

At the end of the day, snap-and-replicate technologies are limited. Because the recovered virtual machine ends up with snapshots, scalability can be an issue. Let's look at an example.

VMware recommends keeping no more than 21 snapshots when using vSphere Replication; more than this can lead to snapshot consolidation issues. If you wanted a recovery point every hour, you wouldn't be able to recover your virtual machine to a point more than 21 hours back. This is a limitation of any snapshot-based replication technology, not a deficiency within SRM or vSphere Replication.
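The retention arithmetic generalizes easily: the oldest recoverable point is at most the number of retained snapshots times the replication interval. A minimal sketch, assuming the 21-snapshot ceiling cited above; the function names are hypothetical.

```python
from datetime import timedelta

# Ceiling cited in the article as VMware's recommendation for vSphere Replication.
MAX_RECOMMENDED_SNAPSHOTS = 21


def retention_window(interval: timedelta,
                     snapshots: int = MAX_RECOMMENDED_SNAPSHOTS) -> timedelta:
    """Upper bound on how far back you can recover with `snapshots` points spaced `interval` apart."""
    return interval * snapshots


def snapshots_needed(window: timedelta, interval: timedelta) -> int:
    """Points that must be retained to reach back `window` at the given interval (rounded up)."""
    # timedelta // timedelta yields an int; negating twice turns floor division into ceiling division.
    return -(-window // interval)


print(retention_window(timedelta(hours=1)))                     # hourly points
print(snapshots_needed(timedelta(days=2), timedelta(hours=1)))  # two days of hourly points
```

Hourly recovery points give a 21-hour window, matching the example above, while reaching back two days at the same interval would need 48 snapshots, well past the recommended ceiling.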

Scalability – neutral

The upper limit for SRM with vSphere Replication is 500 virtual machines. This will suit most enterprises; however, for very large deployments it may not be enough. SRM with storage array replication, for example, can support up to 1,500 virtual machines. The 500-VM limit is roughly what you would get with any other snap-and-replicate technology. In my personal experience, Veeam starts to have problems beyond 300 virtual machines in a single instance.

Speaking of Veeam, it is the next technology we will discuss. Veeam is a good product that provides not only DR capabilities but also a very mature backup solution. Join us for the next article in the series.



