Stretched Storage Cluster with VMware vMSC (vSphere Metro Storage Cluster): Great solution for Remote/Branch Office Virtualization
A stretched storage cluster is a storage cluster that is distributed across two or more geographical locations. It suits geographically separate datacenters that have high-speed connectivity between them and stringent disaster and downtime avoidance requirements. Stretched storage clusters are an important solution type and architectural concept, especially now that many customers run virtualization in ROBO (Remote Office / Branch Office) environments. VMware vSphere Metro Storage Cluster (vMSC) is a stretched storage cluster built with VMware vSphere.
The diagram below shows the concept of the vMSC architecture.
Source: http://www.vmware.com/files/pdf/techpaper/vSPHR-CS-MTRO-STOR-CLSTR-USLET-102-HI-RES.pdf
When to use stretched or metro clusters:
You often need to move workloads between two or more sites non-disruptively.
If one site goes down (network failure, server failure, storage failure or complete site failure), you need to continue running the workload from the other site.
The two sites are not very far apart. The acceptable distance depends on the interlink technology, the cluster requirements and the storage technology used, but generally it should be less than 100 km.
The two sites are connected through a dedicated high-speed link, not just a regular WAN connection.
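To give a feel for why distance matters, the sketch below estimates the round-trip propagation delay that the fiber alone adds between two sites. It assumes a signal speed of about 200,000 km/s in fiber (roughly two thirds of the speed of light in vacuum) and ignores switching, equipment and storage array delays, so real latency will be higher. The function name and constant are illustrative only.

```python
# Rough propagation-delay estimate for an inter-site fiber link.
# Assumption: signal speed in fiber is about 200,000 km/s.
# Equipment and protocol overhead are NOT included.
FIBER_SPEED_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Return the round-trip propagation delay in milliseconds."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000.0


if __name__ == "__main__":
    for km in (10, 50, 100):
        print(f"{km} km -> {round_trip_ms(km):.2f} ms round trip")
```

Under this assumption, 100 km of fiber adds about 1 ms of round-trip delay to every synchronous write that has to be acknowledged by the remote site, before any equipment latency is counted.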
How a backup-restore type disaster recovery solution is different from a stretched or metro cluster:
A disaster recovery solution works at any distance, without the need for dedicated high-speed, low-latency interlinks between sites.
A disaster recovery solution includes backup and replication at different levels (OS or hypervisor level, application level, block or file level) and at different locations (locally or at another site). The strategy can therefore be set differently for different servers and applications: some applications can be backed up only locally, while high-priority critical applications can be backed up at another site. This lets you tailor your disaster recovery strategy to your budget and requirements. You do not have to set up an extensive inter-site network and storage infrastructure, as you do for a stretched cluster.
A disaster recovery solution does not generally provide the true non-disruptive high availability needed to migrate a workload between sites.
In most cases, when a stretched cluster is deployed, a complementary disaster recovery solution is also used.
Design considerations:
Multiple factors affect the design of a stretched storage cluster. A major consideration is which workloads should run at which location during normal operation. This can be determined by giving each workload affinity to specific servers (and therefore to a site). To design a vSphere vMSC or any other stretched cluster correctly, you have to consider many scenarios: the types of failure (host, network switch, storage array, enclosure, disk, etc.), the affinity of different workloads to different datacenters in regular (no-failure) operation, and hardware sizing that allows workloads to fail over across datacenters for each type of failure.
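The sizing part of this can be sketched as a simple capacity check: given each workload's site affinity and resource demand, can one site alone carry everything if the other site fails completely? The code below is a minimal illustration, not a VMware API. The classes, field names and numbers are all hypothetical, and a real design would also account for partial failures, HA admission control and headroom.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    preferred_site: str  # site the workload has affinity to in normal operation
    cpu_ghz: float
    mem_gb: float


@dataclass
class Site:
    name: str
    cpu_ghz: float  # usable CPU capacity across all hosts at the site
    mem_gb: float   # usable memory across all hosts at the site


def normal_load(workloads, site_name):
    """Return (cpu, mem) demand placed on a site by affinity in normal operation."""
    cpu = sum(w.cpu_ghz for w in workloads if w.preferred_site == site_name)
    mem = sum(w.mem_gb for w in workloads if w.preferred_site == site_name)
    return cpu, mem


def failover_report(sites, workloads):
    """For each site, report whether it could run ALL workloads on its own."""
    total_cpu = sum(w.cpu_ghz for w in workloads)
    total_mem = sum(w.mem_gb for w in workloads)
    return {
        s.name: (s.cpu_ghz >= total_cpu and s.mem_gb >= total_mem)
        for s in sites
    }


if __name__ == "__main__":
    # Made-up example values.
    sites = [Site("site-a", 100.0, 512.0), Site("site-b", 60.0, 512.0)]
    workloads = [
        Workload("db", "site-a", 40.0, 128.0),
        Workload("web", "site-b", 25.0, 64.0),
        Workload("mail", "site-b", 15.0, 96.0),
    ]
    for site in sites:
        print(site.name, "normal load:", normal_load(workloads, site.name))
    print("can survive alone:", failover_report(sites, workloads))
```

In this made-up example the total CPU demand is 80 GHz, so site-b with 60 GHz could not absorb a full failure of site-a, which is exactly the kind of gap the sizing exercise is meant to expose.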
Further reading:
VMware vSphere Metro Storage Cluster Case Study
Stretched Clusters and VMware vCenter Site Recovery Manager













