Microsoft Software Defined Storage AKA Scale-out File Server (SOFS)

Business Challenges:

  • $/IOPS and $/TB
  • Continuous Availability
  • Fault Tolerance
  • Storage Performance
  • Segregation of production, development and disaster recovery storage
  • De-duplication of unstructured data
  • Segregation of data between production site and disaster recovery site
  • Continuous break/fix of Distributed File System (DFS) and file servers
  • Continuously extending storage on the DFS servers
  • Single point of failure
  • File systems are not always available
  • Security of file systems is a constant concern
  • Proprietary, non-scalable storage
  • Management of physical storage
  • Vendor lock-in contract for physical storage
  • Migration path from single vendor to multi vendor storage provider
  • Management overhead of unstructured data
  • Comprehensive management of storage platform


Microsoft Software Defined Storage, AKA Scale-Out File Server, is a feature designed to provide scale-out file shares that are continuously available for file-based server application storage. Scale-out file shares provide the ability to share the same folder from multiple nodes of the same cluster. The following table compares the Microsoft Software Defined Storage offering with third-party offerings:

Storage feature                   | Third-party NAS/SAN           | Microsoft Software-Defined Storage
Fabric                            | Block protocol                | File protocol
Network                           | Low latency network with FC   | Low latency with SMB3 Direct
Management                        | Management of LUNs            | Management of file shares
Data de-duplication               | Data de-duplication           | Data de-duplication
Resiliency                        | RAID resiliency groups        | Flexible resiliency options
Pooling                           | Pooling of disks              | Pooling of disks
Availability                      | High                          | Continuous (via redundancy)
Copy offload, Snapshots           | Copy offload, Snapshots       | SMB copy offload, Snapshots
Tiering                           | Storage tiering               | Performance with tiering
Persistent write-back cache       | Persistent write-back cache   | Persistent write-back cache
Scale up                          | Scale up                      | Automatic scale-out rebalancing
Storage Quality of Service (QoS)  | Storage QoS                   | Storage QoS (Windows Server 2016)
Replication                       | Replication                   | Storage Replica (Windows Server 2016)
Updates                           | Firmware updates              | Rolling cluster upgrades (Windows Server 2016)
—                                 | —                             | Storage Spaces Direct (Windows Server 2016)
—                                 | —                             | Azure-consistent storage (Windows Server 2016)


 Functional use of Microsoft Scale-Out File Servers:

1. Application Workloads

  • Microsoft Hyper-v Cluster
  • Microsoft SQL Server Cluster
  • Microsoft SharePoint
  • Microsoft Exchange Server
  • Microsoft Dynamics
  • Microsoft System Center DPM Storage Target
  • Veeam Backup Repository

2. Disaster Recovery Solution

  • Backup Target
  • Object storage
  • Encrypted storage target
  • Hyper-v Replica
  • System Center DPM

3. Unstructured Data

  • Continuously Available File Shares
  • DFS Namespace folder target server
  • Microsoft Data de-duplication
  • Roaming user Profiles
  • Home Directories
  • Citrix User Profiles
  • Outlook Cached location for Citrix XenApp Session Server

4. Management

  • Single Management Point for all Scale-out File Servers
  • Provide wizard driven tools for storage related tasks
  • Integrated with Microsoft System Center

Business Values:

  • Scalability
  • Load balancing
  • Fault tolerance
  • Ease of installation
  • Ease of management/operations
  • Flexibility
  • Security
  • High performance
  • Compliance & Certification

SOFS Architecture:

Microsoft Scale-out File Server (SOFS) is considered a Software Defined Storage (SDS) solution. Microsoft SOFS is independent of hardware vendor as long as the compute and storage are certified by Microsoft Corporation. The following figures show a Microsoft Hyper-v cluster, SQL cluster and object storage on SOFS.


                 Figure: Microsoft Software Defined Storage (SDS) Architecture


                     Figure: Microsoft Scale-out File Server (SOFS) Architecture


                                      Figure: Microsoft SDS Components


                        Figure: Unified Storage Management (See Reference)

Microsoft Software Defined Storage AKA Scale-out File Server Benefits:


  • Continuously available file stores for Hyper-V and SQL Server
  • Load-balanced IO across all nodes
  • Distributed access across all nodes
  • VSS support
  • Transparent failover and client redirection
  • Continuous availability at a share level versus a server level


Data Deduplication:

  • Identifies duplicate chunks of data and stores only one copy
  • Provides up to 90% reduction in storage required for OS VHD files
  • Reduces CPU and Memory pressure
  • Offers excellent reliability and integrity
  • Outperforms Single Instance Storage (SIS) or NTFS compression.

SMB Multichannel

  • Automatic detection of SMB Multi-Path networks
  • Resilience against path failures
  • Transparent failover with recovery
  • Improved throughput
  • Automatic configuration with little administrative overhead
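SMB Multichannel is enabled by default on Windows Server 2012 and later. A minimal PowerShell sketch to verify it, using the in-box SMB cmdlets:

# Confirm SMB Multichannel is enabled on the file server
Get-SmbServerConfiguration | Select-Object EnableMultiChannel

# List the network interfaces SMB can use, with their RSS/RDMA capability
Get-SmbServerNetworkInterface

# On a client with an open SMB session, show the active multichannel paths
Get-SmbMultichannelConnection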

SMB Direct:

  • The Microsoft implementation of RDMA.
  • The ability to direct data transfers from a storage location to an application.
  • Higher performance and lower latency through CPU offloading
  • High-speed network utilization (including InfiniBand and iWARP)
  • Remote storage at the speed of local storage
  • A transfer rate of approximately 50Gbps on a single NIC port
  • Compatibility with SMB Multichannel for load balancing and failover
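Whether SMB Direct engages depends on RDMA-capable NICs being present and enabled. A quick check from PowerShell:

# List adapters that support RDMA and whether RDMA is enabled on them
Get-NetAdapterRdma

# Show client-side SMB interfaces along with their RDMA capability
Get-SmbClientNetworkInterface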

VHDX Virtual Disk:

  • Online VHDX Resize
  • Storage QoS (Quality of Service)
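As an illustration, both features can be driven from the Hyper-V PowerShell module. A minimal sketch, assuming a VM named TestVM and a hypothetical VHDX path:

# Grow a VHDX online (the disk must be attached to a virtual SCSI controller)
Resize-VHD -Path '\\SOFS\VHDShare\TestVM.vhdx' -SizeBytes 200GB

# Cap the disk at 500 normalized IOPS using Storage QoS (Windows Server 2012 R2 and later)
Set-VMHardDiskDrive -VMName TestVM -ControllerType SCSI -ControllerNumber 0 -ControllerLocation 0 -MaximumIOPS 500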

Live Migration

  • Easy migration of a virtual machine into a cluster while the virtual machine is running
  • Improved virtual machine mobility
  • Flexible placement of virtual machine storage based on demand
  • Migration of virtual machine storage to shared storage without downtime

Storage Protocol:

  • SAN discovery (FCP, SAS, iSCSI, e.g. EMC VNX, EMC VMAX)
  • NAS discovery (self-contained NAS, NAS head, e.g. NetApp ONTAP)
  • File server discovery (Microsoft Scale-Out File Server, Unified Storage)

Unified Management:

  • A new architecture provides ~10x faster disk/partition enumeration operations
  • Remote and cluster-awareness capabilities
  • SM-API exposes new Windows Server 2012 R2 features (Tiering, Write-back cache, and so on)
  • SM-API features added to System Center VMM
  • End-to-end storage high availability space provisioning in minutes in VMM console
  • Broader Windows PowerShell coverage


Resilient File System (ReFS):

  • More resilience to power failures
  • Highest levels of system availability
  • Larger volumes with better durability
  • Scalable to petabyte size volumes
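As an illustration, a new data volume can be formatted with ReFS from PowerShell; the drive letter and label below are examples:

# Format a data volume with ReFS for integrity and large-volume scalability
Format-Volume -DriveLetter F -FileSystem ReFS -NewFileSystemLabel 'Data'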

Storage Replica:

  • Hardware-agnostic storage configuration
  • Provides a DR solution for planned and unplanned outages of mission-critical workloads
  • Uses SMB3 transport with proven reliability, scalability, and performance
  • Stretches failover clusters within metropolitan distances
  • Manages end-to-end storage and clustering for Hyper-V, Storage Replica, Storage Spaces, Scale-Out File Server, SMB3, Deduplication, and ReFS/NTFS using Microsoft software
  • Reduces downtime, and increases the reliability and productivity intrinsic to Windows

Cloud Integration:

  • Cloud-based storage service for online backups
  • Windows PowerShell instrumented
  • Simple, reliable Disaster Recovery solution for applications and data
  • Supports System Center 2012 R2 DPM

Implementing Scale-out File Server

Scale-out File Server Recommended Configuration:

  1. Gather all virtual servers' IOPS requirements*
  2. Gather Applications IOPS requirements
  3. Total IOPS of all applications & Virtual machines must be less than available IOPS of physical storage 
  4. Keep latency below 3 ms at all times for high performance
  5. Gather required capacity + potential growth + best practice
  6. N+1 Compute, Network and Storage Hardware
  7. Use low latency, high throughput networks
  8. Segregate storage network from data network using logical network (VLAN) or fibre channel
  9. Tools to be used

*Not all virtual servers are the same: a DHCP server generates few IOPS, while SQL Server and Exchange can generate thousands of IOPS.

*Do not place SQL Server on the same logical volume (LUN) with Exchange Server or Microsoft Dynamics or Backup Server.

*Isolate high IO workloads to a separate logical volume, or even a separate storage pool if possible.
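To baseline current IOPS and latency before sizing, the in-box performance counters can be sampled from PowerShell. A minimal sketch (the sample interval and count are arbitrary choices):

# Sample total disk transfers/sec (read + write IOPS) every 5 seconds, 12 times
Get-Counter -Counter '\PhysicalDisk(_Total)\Disk Transfers/sec' -SampleInterval 5 -MaxSamples 12

# Average disk latency in seconds per transfer (keep below ~0.003 for high performance)
Get-Counter -Counter '\PhysicalDisk(_Total)\Avg. Disk sec/Transfer'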

Prerequisites for Scale-Out File Server

  1. Install File and Storage Services server role, and the Failover Clustering feature on the cluster nodes
  2. Configure Microsoft Failover Clustering using the article Windows Server 2012: Failover Clustering Deep Dive Part II
  3. Add Cluster Share Volume
  • Log on to the server as a member of the local Administrators group.
  • Open Server Manager> Click Tools, and then click Failover Cluster Manager.
  • Click Storage, right-click the disk that you want to add to the cluster shared volume, and then click Add to Cluster Shared Volumes> Add Storage Presented to this cluster.
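The same step can be scripted with the Failover Clustering PowerShell module. A minimal sketch, assuming a clustered disk named 'Cluster Disk 1':

# List disks currently available to the cluster
Get-ClusterResource | Where-Object { $_.ResourceType -eq 'Physical Disk' }

# Add the disk to Cluster Shared Volumes
Add-ClusterSharedVolume -Name 'Cluster Disk 1'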

Configure Scale-out File Server

  1. Open Failover Cluster Manager> Right-click the name of the cluster, and then click Configure Role.
  2. On the Before You Begin page, click Next.
  3. On the Select Role page, click File Server, and then click Next.
  4. On the File Server Type page, select the Scale-Out File Server for application data option, and then click Next.
  5. On the Client Access Point page, in the Name box, type a NetBIOS name for the Scale-Out File Server, and then click Next.
  6. On the Confirmation page, confirm your settings, and then click Next.
  7. On the Summary page, click Finish.
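Equivalently, the role can be created with one line of PowerShell; the name SOFS matches the NetBIOS name used in this walkthrough:

# Create the Scale-Out File Server role on the current cluster
Add-ClusterScaleOutFileServerRole -Name SOFS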

Create Continuously Available File Share

  1. Open Failover Cluster Manager>Expand the cluster, and then click Roles.
  2. Right-click the file server role, and then click Add File Share.
  3. On the Select the profile for this share page, click SMB Share – Applications, and then click Next.
  4. On the Select the server and path for this share page, click the name of the cluster shared volume, and then click Next.
  5. On the Specify share name page, in the Share name box, type a name for the file share, and then click Next.
  6. On the Configure share settings page, ensure that the Continuously Available check box is selected, and then click Next.
  7. On the Specify permissions to control access page, click Customize permissions, grant the following permissions, and then click Next:
  • To use Scale-Out File Server file share for Hyper-V: All Hyper-V computer accounts, the SYSTEM account, cluster computer account for any Hyper-V clusters, and all Hyper-V administrators must be granted full control on the share and the file system.
  • To use Scale-Out File Server on Microsoft SQL Server: The SQL Server service account must be granted full control on the share and the file system

  8. On the Confirm selections page, click Create. On the View results page, click Close.
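A minimal PowerShell sketch of the same share creation; the share name, path, and accounts below are examples and should be replaced with the accounts described in step 7:

# Create a continuously available share on a cluster shared volume
New-SmbShare -Name VHDShare -Path 'C:\ClusterStorage\Volume1\VHDShare' `
    -FullAccess 'DOMAIN\HyperV-Host1$', 'DOMAIN\HyperV-Admins', 'NT AUTHORITY\SYSTEM' `
    -ContinuouslyAvailable $true

# Mirror the share permissions onto the underlying NTFS folder
Set-SmbPathAcl -ShareName VHDShare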

Use SOFS for Hyper-v Server VHDX Store:

  1. Open Hyper-V Manager. Click Start, and then click Hyper-V Manager.
  2. Open Hyper-v Settings> Virtual Hard Disks> specify the location of the virtual hard disk store as \\SOFS\VHDShare and the location of the virtual machine configuration store as \\SOFS\VHDCShare
  3. Click Ok.
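The same defaults can be set with the Hyper-V PowerShell module:

# Point Hyper-V's default stores at the SOFS shares
Set-VMHost -VirtualHardDiskPath '\\SOFS\VHDShare' -VirtualMachinePath '\\SOFS\VHDCShare'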

Use SOFS in System Center VMM: 

  1. Add Windows File Server in VMM
  2. Assign SOFS Share to Fabric & Hosts

Use SOFS for SQL Database Store:

1. Assign SQL Service Account Full permission to SOFS Share

  • Open Windows Explorer and navigate to the scale-out file share.
  • Right-click the folder, and then click Properties.
  • Click the Sharing tab, click Advanced Sharing, and then click Permissions.
  • Ensure that the SQL Server service account has full-control permissions.
  • Click OK twice.
  • Click the Security tab. Ensure that the SQL Server service account has full-control permissions.

2. In SQL Server 2012, you can choose to store all database files in a scale-out file share during installation.  

3. In step 20 of the SQL Setup Wizard, provide the Scale-Out File Server locations, i.e. \\SOFS\SQLData and \\SOFS\SQLLogs

4. Create a database on the SOFS share from an existing SQL Server using a SQL script:

CREATE DATABASE TestDB
ON ( NAME = N'TestDB', FILENAME = N'\\SOFS\SQLDB\TestDB.mdf' )
LOG ON ( NAME = N'TestDBLog', FILENAME = N'\\SOFS\SQLDBLog\TestDBLogs.ldf' )

Use Backup & Recovery:

System Center Data Protection Manager 2012 R2

Configure and add a dedupe storage target into DPM 2012 R2. DPM 2012 R2 will not back up SOFS itself, but it will back up VHDX files stored on SOFS. Follow the Deduplicate DPM storage and protection for virtual machines with SMB storage guide to back up virtual machines.

Veeam Availability Suite

  1. Log on to the Veeam Availability Console> Click Backup Repository> Right-click New Backup Repository
  2. Select Shared Folder on the Type tab
  3. Add the SMB backup target \\SOFS\Repository
  4. Follow the wizard. Make sure the Veeam service account has full access permission to the \\SOFS\Repository share.
  5. Click Scale-out Repositories> Right-click Add Scale-out backup repository> Type the name
  6. Select the backup repository you created previously> Follow the wizard to complete the task.


Microsoft Storage Architecture

Storage Spaces Physical Disk Validation Script

Validate Hardware

Deploy Clustered Storage Spaces

Storage Spaces Tiering in Windows Server 2012 R2

SMB Transparent Failover

Cluster Shared Volume (CSV) Inside Out

Storage Spaces – Designing for Performance

Related Articles:

Scale-Out File Server Cluster using Azure VMs

Microsoft Multi-Site Failover Cluster for DR & Business Continuity

Microsoft Multi-Site Failover Cluster for DR & Business Continuity

Not every organisation loses millions of dollars per second of downtime, but some do. An organisation that does not may still consider customer service and reputation its number one priority. These types of businesses want their workflow to be seamless and downtime free. This article is for those who consider business continuity money well spent. Here is how it is done:

Multi-Site Failover Cluster

Microsoft Multi-Site Failover Cluster is a group of clustered nodes distributed across multiple sites in a region, or across separate regions, connected by low latency network and storage. As illustrated in the diagram below, Data Center A cluster nodes are connected to a local SAN storage that is replicated to a SAN storage in Data Center B. Replication is handled by identical software defined storage at each site. The software defined storage replicates volumes, or Logical Unit Numbers (LUNs), from the primary site (in this example Data Center A) to the disaster recovery site (Data Center B). The Microsoft failover cluster is configured with pass-through storage, i.e. volumes, and these volumes are replicated to the DR site. In the primary and DR sites, the physical network is built on Cisco Nexus 7000 switches. The data network and virtual machine network are logically segregated in Microsoft System Center VMM and on the physical switch using virtual local area networks (VLANs). A separate Storage Area Network (SAN) is created in each site with low latency storage. Volumes of pass-through storage are replicated to the DR site using identically sized volumes.


                                     Figure: Highly Available Multi-site Cluster


                           Figure: Software Defined Storage in Each Site

 Design Components of Storage:

  • SAN to SAN replication must be configured correctly
  • Initial synchronization must be complete before the Failover Cluster is configured
  • MPIO software must be installed on the cluster nodes (N1, N2…N6)
  • Physical and logical multipathing must be configured
  • If storage is presented directly to virtual machines or cluster nodes, then NPIV must be configured on the fabric zones
  • All storage and fabric firmware must be up to date with the manufacturer's latest software
  • Identical software defined storage must be used on both sites
  • If third-party software is used to replicate storage between sites, then the storage vendor must be consulted before the replication

Further Reading:

Understanding Software Defined Storage (SDS)

How to configure SAN replication between IBM Storwize V3700 systems

Install and Configure IBM V3700, Brocade 300B Fabric and ESXi Host Step by Step

Application Scale-out File Systems

Design Components of Network:

  • Isolate management, virtual and data network using VLAN
  • Use a reliable IPVPN or Fibre optic provider for the replication over the network
  • Eliminate all single point of failure from all network components
  • Consider stretched VLAN for multiple sites 

Further Reading:

Understanding Network Virtualization in SCVMM 2012 R2

Understanding VLAN, Trunk, NIC Teaming, Virtual Switch Configuration in Hyper-v Server 2012 R2

Design failover Cluster Quorum

  • Use a Node & File Share Witness (FSW) quorum for an even number of cluster nodes
  • Host the File Share Witness at a third site
  • Do not host the File Share Witness on a virtual machine in the same site
  • Alternatively, use Dynamic Quorum
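A minimal sketch of the witness configuration in PowerShell, assuming a hypothetical witness share \\ThirdSiteServer\Witness at the third site:

# Configure Node and File Share Majority with the witness at a third site
Set-ClusterQuorum -NodeAndFileShareMajority '\\ThirdSiteServer\Witness'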

Further Reading:

Understanding Dynamic Quorum in a Microsoft Failover Cluster

Design of Compute

  • Use a reputable vendor to supply compute hardware compatible with Microsoft Hyper-v
  • Make sure all the latest firmware updates are applied to the Hyper-v hosts
  • Make sure the manufacturer provides you with the latest HBA software to be installed on the Hyper-v hosts

Further Reading:

Windows Server 2012: Failover Clustering Deep Dive Part II

Implementing a Multi-Site Failover Cluster

Step1: Prepare Network, Storage and Compute

Understanding Network Virtualization in SCVMM 2012 R2

Understanding VLAN, Trunk, NIC Teaming, Virtual Switch Configuration in Hyper-v Server 2012 R2

Install and Configure IBM V3700, Brocade 300B Fabric and ESXi Host Step by Step

Step2: Configure Failover Cluster on Each Site

Windows Server 2012: Failover Clustering Deep Dive Part II

Understanding Dynamic Quorum in a Microsoft Failover Cluster

Multi-Site Clustering & Disaster Recovery

Step3: Replicate Volumes

How to configure SAN replication between IBM Storwize V3700 systems

How to create a Microsoft Multi-Site cluster with IBM Storwize replication

Use Cases:

Use cases can be determined by current workloads, future workloads, and business continuity requirements. Deploy Veeam ONE to measure the current workloads on your infrastructure and to project future workloads plus business continuity. Here is a list of use cases for a multi-site cluster.

  • Scale-Out File Server for application data – To store server application data, such as Hyper-V virtual machine files, on file shares, and obtain a similar level of reliability, availability, manageability, and high performance that you would expect from a storage area network. All file shares are simultaneously online on all nodes. File shares associated with this type of clustered file server are called scale-out file shares. This is sometimes referred to as active-active.

  • File Server for general use – This type of clustered file server, and therefore all the shares associated with the clustered file server, is online on one node at a time. This is sometimes referred to as active-passive or dual-active. File shares associated with this type of clustered file server are called clustered file shares.

  • Business Continuity Plan

  • Disaster Recovery Plan

  • DFS Replication and Namespace for unstructured data, e.g. user profiles, home drives, Citrix profiles

  • Highly Available File Server Replication 

How to configure SAN replication between IBM Storwize V3700 systems

The Metro Mirror and Global Mirror Copy Services features enable you to set up synchronous and asynchronous replication between two volumes on two separate IBM storage systems, so that updates made by an application to a volume on the production-site system are mirrored to the corresponding volume on the DR-site system.

  • The Metro Mirror feature provides synchronous replication. When a host writes to the primary volume, it does not receive confirmation of I/O completion until the write operation has completed for the copy on both the primary volume and the secondary volume. This ensures that the secondary volume is always up to date with the primary volume in the event that a failover operation must be performed. However, the host is limited by the latency and bandwidth of the communication link to the secondary volume.
  • The Global Mirror feature provides asynchronous replication. When a host writes to the primary volume, confirmation of I/O completion is received before the write operation has completed for the copy on the secondary volume. If a failover operation is performed, the application must recover and apply any updates that were not committed to the secondary volume. If I/O operations on the primary volume are paused for a small length of time, the secondary volume can become an exact match of the primary volume.


Prerequisites:

  1. Both systems are connected via dark fibre, L2 MPLS or IPVPN for replication over IP.
  2. Both systems are connected via fibre for replication over FC.
  3. Both systems have up to date and latest firmware.
  4. Easy Tier and SSD installed in both systems.
  5. Remote Copy license activated in both systems.
  6. Volumes are identical in Prod and DR SAN.

Configure Metro Mirror in IBM v3700 Systems

Step1: Activate License

Log on to IBM V3700>Settings>System>Licensing

Activate Remote Copy and Easy Tier License.


Step2: Configure Ethernet Ports & iSCSI in Production SAN and DR SAN

Both systems communicate via the management network, but volumes are replicated via Ethernet if remote copy is configured to use replication over Ethernet. This step is necessary for Metro Mirror over Ethernet. Skip this step if you are using FC.

Log on to the production IBM V3700 system. Settings>Network>Ethernet Ports. Right-click Node1 Port 2> configure Copy Group1 and Copy Group2: assign an IP address, enable iSCSI, and select Copy Group1. Repeat to create Copy Group2.



Repeat the step to configure Copy Groups in the DR SAN.

Note: The TCP/IP addresses assigned in the DR SAN can be from the same subnet as the production SAN, or from a different subnet, as long as both subnets can communicate with each other.

Step3: Create Partnership in Prod & DR SAN
Log on to Production V3700>Copy Services>Partnerships>Create Partnership>Add NetBIOS Name and Management IP of DR SAN


Fully Configured indicates that the partnership is defined on the local and remote systems and has started.


Initial synchronization bandwidth is 2048 MBps, but once I take the DR storage to the DR site I will reduce it to 1024 MBps. Initial synchronization will take place in the prod site. You can set your own bandwidth specification.

Log on to DR V3700>Copy Services>Partnerships>Create Partnership>Add NetBIOS Name and Management IP of Prod SAN


Step4: Create Relationships between Volume

Log on to the Prod SAN>Copy Services>Remote Copy>Add Relationship>Select Metro Mirror




Specify the DR SAN as the Auxiliary system, where the DR volume is located. IBM should have used the words “Master/Slave” or “Prod/DR systems”.


Add identical volume>Click Next.



Step5: Monitor Performance and Copy Services



Consistency group states

Consistent (synchronized) – The primary volumes are accessible for read and write I/O operations. The secondary volumes are accessible for read-only I/O operations.

Inconsistent (copying) – The primary volumes are accessible for read and write I/O operations, but the secondary volumes are not accessible for either operation. This state is entered after the startrcconsistgrp command is issued to a consistency group in the InconsistentStopped state. This state is also entered when the startrcconsistgrp command is issued, with the force option, to a consistency group in the Idling or ConsistentStopped state.

The background copy bandwidth can affect foreground I/O latency in one of three ways:

  • If the background copy bandwidth is set too high for the intersystem link capacity, the following results can occur:
    • The intersystem link is not able to process the background copy I/Os fast enough, and the I/Os can back up (accumulate).
    • For Metro Mirror, there is a delay in the synchronous secondary write operations of foreground I/Os.
    • For Global Mirror, the work is backlogged, which delays the processing of write operations and causes the relationship to stop. For Global Mirror in multiple-cycling mode, a backlog in the intersystem link can congest the local fabric and cause delays to data transfers.
    • The foreground I/O latency increases as detected by applications.
  • If the background copy bandwidth is set too high for the storage at the primary site, background copy read I/Os overload the primary storage and delay foreground I/Os.
  • If the background copy bandwidth is set too high for the storage at the secondary site, background copy write operations at the secondary overload the secondary storage and again delay the synchronous secondary write operations of foreground I/Os.
    • For Global Mirror without cycling mode, the work is backlogged and again the relationship is stopped.



Once volumes are synchronized you are ready to integrate storage with System Center 2012 R2.

Further Readings:

SAN Replication based Enterprise Grade Disaster Recovery with ASR and System Center

What’s New in 2012 R2: Cloud-integrated Disaster Recovery

Understanding IT Disaster Recovery Plan

Install and Configure IBM V3700, Brocade 300B Fabric and ESXi Host Step by Step

Data Deduplication in Windows Storage Server 2012 R2

Deduplication in Windows Server: Data deduplication involves finding and removing duplication within data without compromising its fidelity or integrity. The goal is to store more data in less space by segmenting files into small variable-sized chunks (32–128 KB), identifying duplicate chunks, and maintaining a single copy of each chunk. Redundant copies of the chunk are replaced by a reference to the single copy. The chunks are compressed and then organized into special container files in the System Volume Information folder.
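Before enabling the feature, the Deduplication Evaluation Tool (DDPEval.exe) that ships with Windows Server can estimate the likely savings on an existing volume; a quick sketch, assuming volume E: holds the candidate data:

# Estimate potential deduplication savings on volume E:
DDPEval.exe E:\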

Enhanced Dedupe features in Windows Server 2012 R2

  • Data deduplication for remote storage of Virtual Desktop Infrastructure (VDI) workloads
  • Expand an optimized file on its original path.

When using the Data Deduplication feature for the first time or migrating from a previous version of Windows Server, be sure to consider the following related technologies and issues:

  • BranchCache
  • Failover Clusters
  • DFS Replication
  • FSRM quotas
  • Single Instance Storage or NAS Box

Install and Configure Data Deduplication using GUI

1. Open Server Manager. From the Add Roles and Features Wizard, under Server Roles, select File and Storage Services.

2. Select the File Services check box, and then select the Data Deduplication check box.

3. Click Next until the Install button is active, and then click Install.

4. From the Server Manager dashboard, right-click a data volume and choose Configure Data Deduplication. The Deduplication Settings page appears.

5. In the Data deduplication box, select the workload you want to host on the volume. Select General purpose file server for general data files or Virtual Desktop Infrastructure (VDI) server when configuring storage for running virtual machines.

6. Enter the number of days that should elapse from the date of file creation until files are deduplicated, enter the extensions of any file types that should not be deduplicated, and then click Add to browse to any folders with files that should not be deduplicated.

7. Click Apply to apply these settings and return to the Server Manager dashboard, or click the Set Deduplication Schedule button to continue to set up a schedule for deduplication.

Install and Configure Data Deduplication using Windows PowerShell

Start Windows PowerShell. Right-click the Windows PowerShell icon on the taskbar, and then click Run as Administrator.

# Install the Data Deduplication role service
Import-Module ServerManager
Add-WindowsFeature -Name FS-Data-Deduplication

# Load the deduplication cmdlets
Import-Module Deduplication

# Enable deduplication on volume E: (pick the usage type that matches the workload)
Enable-DedupVolume E: -UsageType HyperV     # VDI/Hyper-V storage
Enable-DedupVolume E: -UsageType Default    # general purpose file server

# Only deduplicate files older than 20 days
Set-DedupVolume E: -MinimumFileAgeDays 20

# Review deduplication settings and savings for all volumes
Get-DedupVolume | fl

# Run an optimization job immediately and wait for completion
Start-DedupJob E: -Type Optimization -Wait


Windows Server 2012 R2 NAS Box with Deduplication Capacity

Introduction to Windows Deduplication

Windows PowerShell Cmdlet for Deduplication

Multi-Site Hyper-v Cluster for High Availability and Disaster Recovery

For most SMB customers, the nodes of the cluster that reside at the primary data center provide access to the clustered service or application, with failover occurring only between clustered nodes. However, for an enterprise customer, failure of a business critical application is not an option. In this case, disaster recovery and high availability are bonded together, so that when both or all nodes at the primary site are lost, the nodes at the secondary site begin providing service automatically, or with minimal intervention.

The maximum availability of any services or application depends on how you design your platform that hosts these services. It is important to follow best practices in Compute, Network and Storage infrastructure to maximize uptime and maintain SLA.

The following diagram shows a multi-site failover cluster that uses four nodes and supports a clustered service or application.




The following rack diagram shows the identical compute, storage and networking infrastructure in both sites.


Physical Infrastructure

  • Primary and secondary sites are connected via 2x10Gbps dark fibre
  • Storage vendor specific replication software, such as EMC RecoverPoint
  • Storage must have redundant storage processors
  • There must be redundant switches for networking and storage
  • Each server must be connected to redundant switches with redundant NICs for each purpose
  • Each Hyper-v server must have a minimum of dual Host Bus Adapter (HBA) ports connected to redundant MDS switches
  • Each network must be connected via dual NICs from server to switches
  • Only iLO/DRAC will have a single connection
  • Each site must have redundant power supplies.

Storage Requirements

Since I am talking about a highly available and redundant systems design, this sort of design must consist of replicated or clustered storage presented to the multi-site Hyper-v cluster nodes. Replication of data between sites is very important in a multi-site cluster, and is accomplished in different ways by different hardware vendors. You will achieve higher performance through hardware or block-level replication rather than software replication. You should contact your storage vendor to come up with a solution that provides replicated or clustered storage.

Examples are:

StarWind Software

Steeleye DataKeeper

EMC RecoverPoint

HP Storage

Network Requirements:

A multi-site cluster running Windows Server 2008 can contain nodes that are in different subnets; however, as a best practice, you should configure the Hyper-v cluster in the same subnet. Your applications and services can reside in separate subnets. To avoid any conflict, you should use a dark fibre connection or an MPLS network between the sites that allows VLANs.

Note that you must configure Hyper-v with static IP addresses. In a multi-site cluster, you might also want to tune the cluster “heartbeat” settings, as sketched below.
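A sketch of heartbeat tuning via the cluster common properties in PowerShell; the values shown are examples, not recommendations:

# Relax heartbeat intervals/thresholds to tolerate inter-site latency
(Get-Cluster).SameSubnetDelay = 1000       # ms between heartbeats within a subnet
(Get-Cluster).SameSubnetThreshold = 10     # missed heartbeats before a node is considered down
(Get-Cluster).CrossSubnetDelay = 2000      # ms between heartbeats across subnets
(Get-Cluster).CrossSubnetThreshold = 20    # missed heartbeats across subnets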

Network Configuration Spread Sheet

Record the NICs and switch port speeds for each network in a configuration spreadsheet covering:

  • Live Migration
  • Storage Migration
  • Virtual Machine
  • iSCSI Network
  • Heartbeat network
  • Storage Replication (separate from Hyper-v, over dark fibre)


Note that iSCSI network is only required if you are using IP Storage instead of Fibre Channel (FC) storage.

Cluster Selection: Node and File Share Majority (For Cluster with Special Configurations)

Quorum Selection: Since you will be configuring a Node and File Share Majority cluster, you will have the option to place the quorum files in a shared folder. Where do you place this shared folder? Since we are talking about a fully redundant and highly available Hyper-v cluster, we have several options for placing the quorum shared folder.

Option1: Secondary Site

Option 2: Third Site

See the quorum articles referenced earlier for more details on quorum.

Storage Configuration:

See the storage articles referenced earlier for clustered storage configuration for Hyper-v.

Hyper-v Cluster Configuration:

See the failover clustering articles referenced earlier for a detailed cluster configuration guide.