Microsoft Multi-Site Failover Cluster for DR & Business Continuity

Not every organisation looses millions of dollar per second but some does. An organisation may not loose millions of dollar per second but consider customer service and reputation are number one priority. These type of business wants their workflow to be seamless and downtime free. This article is for them who consider business continuity equals money well spent. Here is how it is done:

Multi-Site Failover Cluster

Microsoft Multi-Site Failover Cluster is a group of Clustered Nodes distribution through multiple sites in a region or separate region connected with low latency network and storage. As per the diagram illustrated below, Data Center A Cluster Nodes are connected to a local SAN Storage, while replicated to a SAN Storage on the Data Center B. Replication is taken care by a identical software defined storage on each site.  Software defined storage will replicate volumes or Logical Unit Number (LUN) from primary site in this example Data Center A to Disaster Recovery Site B. Microsoft Failover cluster is configured with pass-through storage i.e. volumes and these volumes are replication to DR site. In the Primary and DR sites, physical network is configured using Cisco Nexus 7000. Data network and virtual machine network are logically segregated in Microsoft System Center VMM and physical switch using virtual local area network or VLAN.  A separate Storage Area Network (SAN) is created in each site with low latency storage. Volumes of pass-through storage are replicated to DR site using identical size of volumes.

image

                                     Figure: Highly Available Multi-site Cluster

image

                           Figure: Software Defined Storage in Each Site

 Design Components of Storage:

  • SAN to SAN replication must be configured correctly
  • Initial must be complete before Failover Cluster is configured
  • MPIO software must be installed on the cluster Nodes (N1, N2…N6)
  • Physical and logical multipathing must be configured
  • If Storage is presented directly to virtual machines or cluster nodes then NPIV must configured on the Fabric Zones.
  • All Storage and Fabric Firmware must up to date with manufacturer latest software
  • An identical software defined storage must be used on the both sites 
  • If a third party software is used to replicate storage between sites then storage vendor must be consulted before the replication. 

Further Reading:

Understanding Software Defined Storage (SDS)

How to configure SAN replication between IBM Storwize V3700 systems

Install and Configure IBM V3700, Brocade 300B Fabric and ESXi Host Step by Step

Application Scale-out File Systems

Design Components of Network:

  • Isolate management, virtual and data network using VLAN
  • Use a reliable IPVPN or Fibre optic provider for the replication over the network
  • Eliminate all single point of failure from all network components
  • Consider stretched VLAN for multiple sites 

Further Reading:

Understanding Network Virtualization in SCVMM 2012 R2

Understanding VLAN, Trunk, NIC Teaming, Virtual Switch Configuration in Hyper-v Server 2012 R2

Design failover Cluster Quorum

  • Use Node & File Share Witness (FSW) Quorum for even number of Cluster Nodes
  • Connect File Share Witness on to the third Site
  • Do not host File Share Witness on a virtual machine on same site
  • Alternatively use Dynamic Quorum

Further Reading:

Understanding Dynamic Quorum in a Microsoft Failover Cluster

Design of Compute

  • Use reputed vendor to supply compute hardware compatible with Microsoft Hyper-v
  • Make sure all latest firmware updates are applied to Hyper-v host
  • Make manufacture provide you with latest HBA software to be installed on Hyper-v host

Further Reading:

Windows Server 2012: Failover Clustering Deep Dive Part II

Implementing a Multi-Site Failover Cluster

Step1: Prepare Network, Storage and Compute

Understanding Network Virtualization in SCVMM 2012 R2

Understanding VLAN, Trunk, NIC Teaming, Virtual Switch Configuration in Hyper-v Server 2012 R2

Install and Configure IBM V3700, Brocade 300B Fabric and ESXi Host Step by Step

Step2: Configure Failover Cluster on Each Site

Windows Server 2012: Failover Clustering Deep Dive Part II

Understanding Dynamic Quorum in a Microsoft Failover Cluster

Multi-Site Clustering & Disaster Recovery

Step3: Replicate Volumes

How to configure SAN replication between IBM Storwize V3700 systems

How to create a Microsoft Multi-Site cluster with IBM Storwize replication

Use Cases:

Use case can be determined by current workloads and future workloads plus business continuity. Deploy Veeam One to determine current workloads on your infrastructure and propose a future workload plus business continuity.  Here is a list of use cases of multi-site cluster.

  • Scale-Out File Server for application data-  To store server application data, such as Hyper-V virtual machine files, on file shares, and obtain a similar level of reliability, availability, manageability, and high performance that you would expect from a storage area network. All file shares are simultaneously online on all nodes. File shares associated with this type of clustered file server are called scale-out file shares. This is sometimes referred to as active-active.

  • File Server for general use – This type of clustered file server, and therefore all the shares associated with the clustered file server, is online on one node at a time. This is sometimes referred to as active-passive or dual-active. File shares associated with this type of clustered file server are called clustered file shares.

  • Business Continuity Plan

  • Disaster Recovery Plan

  • DFS Replication Namespace for Unstructured Data i.e. user profile, home drive, Citrix profile

  • Highly Available File Server Replication 

Multi-Site Hyper-v Cluster for High Availability and Disaster Recovery

In most of the SMB customer, the nodes of the cluster that reside at their primary data center provide access to the clustered service or application, with failover occurring only between clustered nodes. However for an enterprise customer, failure of a business critical application is not an option. In this case, disaster recovery and high availability are bonded together so that when both/all nodes at the primary site are lost, the nodes at the secondary site begin providing service automatically, or with minimal intervention.

The maximum availability of any services or application depends on how you design your platform that hosts these services. It is important to follow best practices in Compute, Network and Storage infrastructure to maximize uptime and maintain SLA.

The following diagram shows a multi-site failover cluster that uses four nodes and supports a clustered service or application.

 

image

 

The following rack diagram shows the identical compute, storage and networking infrastructure in both site.

image

Physical Infrastructure

  • Primary and Secondary sites are connected via 2x10Gbps dark fibre
  • Storage vendor specific replication software such as EMC recovery point
  • Storage must have redundant storage processor
  • There must be redundant Switches for networking and storage
  • Each server must be connected to redundant switches with redundant NIC for each purpose
  • Each Hyper-v server must have minimum dual Host Bus Adapter (HBA) port connected to redundant MDS switches
  • Each network must be connected to dual NIC from server to switches
  • Only iLO/DRAC will have a single connection
  • Each site must have redundant power supply.

Storage Requirements

Since I am talking about highly available and redundant systems design, this sort of design must consist of replicated or clustered storage presented to multi-site Hyper-v cluster nodes. Replication of data between sites is very important in a multi-site cluster, and is accomplished in different ways by different hardware vendors. You will achieve high performance through hardware or block level replication instead of software. You should contact your storage vendor to come up with solutions that provide replicated or clustered storage.

Network Requirements:

A multi-site cluster running Windows Server 2008 can contain nodes that are in different subnet however as a best practice, you must configure Hyper-v cluster in same subnet. You applications and services can reside in separate subnets. To avoid any conflict, you should use dark fibre connection or MPLS network between multi-sites that allows VLANs.

Note that you must configure Hyper-v with static IP. In a multi-site cluster, you might want to tune the “heartbeat” settings, see http://go.microsoft.com/fwlink/?LinkId=130588 for details.

Network Configuration Spread Sheet

Network

VLAN ID

NICs and Switch Ports speed

iLO/DRAC

10

1Gbps

MGMT

20

2x1Gbps

Live Migration

30

2x10Gbps

Storage Migration

40

2x10Gbps

Virtual Machine

50,60

4x10Gbps

iSCSI Network

70

4x10Gbps

Heartbeat network

80

2x1Gbps

Storage Replication

(Separate from Hyper-v)

90

Dark Fibre

2x10Gbps

Note that iSCSI network is only required if you are using IP Storage instead of Fibre Channel (FC) storage.

Cluster Selection: Node and File Share Majority (For Cluster with Special Configurations)

Quorum Selection: Since you will be configuring Node and File Share Majority cluster, you will have the option to place quorum files to shared folder. Where do you place this shared folder? Since we are talking about fully redundant and highly available Hyper-v Cluster, we have several options to place quorum shared folder.

Option1: Secondary Site

Option 2: Third Site

Visit http://technet.microsoft.com/en-us/library/cc770620%28WS.10%29.aspx for more details on quorum.

Hyper-v Cluster Configuration:

Visit https://araihan.wordpress.com/2013/06/04/windows-server-2012-failover-clustering-deep-dive/ for detailed cluster configuration guide.

Multi-Site Hyper-v Cluster for High Availability and Disaster Recovery

In most of the SMB customer, the nodes of the cluster that reside at their primary data center provide access to the clustered service or application, with failover occurring only between clustered nodes. However for an enterprise customer, failure of a business critical application is not an option. In this case, disaster recovery and high availability are bonded together so that when both/all nodes at the primary site are lost, the nodes at the secondary site begin providing service automatically, or with minimal intervention.

The maximum availability of any services or application depends on how you design your platform that hosts these services. It is important to follow best practices in Compute, Network and Storage infrastructure to maximize uptime and maintain SLA.

The following diagram shows a multi-site failover cluster that uses four nodes and supports a clustered service or application.

 

image

 

The following rack diagram shows the identical compute, storage and networking infrastructure in both site.

image

Physical Infrastructure

  • Primary and Secondary sites are connected via 2x10Gbps dark fibre
  • Storage vendor specific replication software such as EMC recovery point
  • Storage must have redundant storage processor
  • There must be redundant Switches for networking and storage
  • Each server must be connected to redundant switches with redundant NIC for each purpose
  • Each Hyper-v server must have minimum dual Host Bus Adapter (HBA) port connected to redundant MDS switches
  • Each network must be connected to dual NIC from server to switches
  • Only iLO/DRAC will have a single connection
  • Each site must have redundant power supply.

Storage Requirements

Since I am talking about highly available and redundant systems design, this sort of design must consist of replicated or clustered storage presented to multi-site Hyper-v cluster nodes. Replication of data between sites is very important in a multi-site cluster, and is accomplished in different ways by different hardware vendors. You will achieve high performance through hardware or block level replication instead of software. You should contact your storage vendor to come up with solutions that provide replicated or clustered storage.

Examples are:

StarWind Software

Steeleye DataKeeper

EMC Recovery Point

HP Storage

Network Requirements:

A multi-site cluster running Windows Server 2008 can contain nodes that are in different subnet however as a best practice, you must configure Hyper-v cluster in same subnet. You applications and services can reside in separate subnets. To avoid any conflict, you should use dark fibre connection or MPLS network between multi-sites that allows VLANs.

Note that you must configure Hyper-v with static IP. In a multi-site cluster, you might want to tune the “heartbeat” settings, see http://go.microsoft.com/fwlink/?LinkId=130588 for details.

Network Configuration Spread Sheet

Network

VLAN ID

NICs and Switch Ports speed

iLO/DRAC

10

1Gbps

MGMT

20

2x1Gbps

Live Migration

30

2x10Gbps

Storage Migration

40

2x10Gbps

Virtual Machine

50,60

4x10Gbps

iSCSI Network

70

4x10Gbps

Heartbeat network

80

2x1Gbps

Storage Replication

(Separate from Hyper-v)

90

Dark Fibre

2x10Gbps

Note that iSCSI network is only required if you are using IP Storage instead of Fibre Channel (FC) storage.

Cluster Selection: Node and File Share Majority (For Cluster with Special Configurations)

Quorum Selection: Since you will be configuring Node and File Share Majority cluster, you will have the option to place quorum files to shared folder. Where do you place this shared folder? Since we are talking about fully redundant and highly available Hyper-v Cluster, we have several options to place quorum shared folder.

Option1: Secondary Site

Option 2: Third Site

Visit http://technet.microsoft.com/en-us/library/cc770620%28WS.10%29.aspx for more details on quorum.

Storage Configuration:

Visit http://www.starwindsoftware.com/images/content/technical_papers/StarWind_HA_Hyper-V_6.0.pdf , http://docs.us.sios.com/ and http://us.sios.com/wp-content/uploads/sios-datakeeper-replication-multi-site-clustering-windows-servers-enterprise.pdf for clustered storage configuration for Hyper-v.

Hyper-v Cluster Configuration:

Visit http://microsoftguru.com.au/2013/06/04/windows-server-2012-failover-clustering-deep-dive/ for detailed cluster configuration guide.