Azure Site Recovery orchestrates and manages disaster recovery for Azure VMs in the Azure cloud, and for on-premises VMware VMs, System Center VMM VMs and physical servers. Prerequisites: VMware virtual server, Azure subscription, Azure virtual network, ExpressRoute between the on-premises and Azure networks …
Hyper-V Replica provides IP-based asynchronous replication of virtual machines between two Hyper-V servers. Because the replication is asynchronous, the replica virtual machine will not have the most recent data. However, replica virtual machines provide a cost-effective way of keeping a copy of production virtual machines at a secondary site, where they can be made available in case of a disaster.
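To illustrate why an asynchronous replica lags behind production, here is a minimal Python sketch. It is a simplified model, not how Hyper-V is implemented: it assumes changes are shipped in a batch at fixed intervals and arrive instantly, and the function name and the five-minute (300 second) interval are hypothetical choices for the example.

```python
def unreplicated_writes(write_times, interval_s, failure_time_s):
    """Return writes made on the primary that the replica never received.

    Assumes changes are shipped in one batch at every multiple of
    interval_s and that each batch arrives instantly (a simplification).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Time of the last batch that was sent before the primary failed.
    last_sync = (failure_time_s // interval_s) * interval_s
    # Anything written after that batch exists only on the failed primary.
    return [t for t in write_times if last_sync < t <= failure_time_s]


if __name__ == "__main__":
    # Hypothetical write timestamps in seconds, with a 300 second interval.
    lost = unreplicated_writes([10, 290, 310, 450, 599], 300, 599)
    print("Writes missing on the replica:", lost)
```

In this model, the shorter the interval, the fewer writes are at risk when the primary site fails.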
- Shared or standalone storage to fulfill the capacity requirement of the replicated virtual machine
- Asynchronous replication of Hyper-V virtual machines over an IP-based Ethernet network
- Replica works with standalone servers, failover clusters, or a mixture of both
- Hyper-V hosts can be physically co-located or in geographically diverse locations connected by MPLS or IP VPN
- Hyper-V hosts can be domain joined or standalone
- Provides planned and unplanned failover
- Any Hyper-V virtualized server can be replicated using Hyper-V Replica
- Windows Server 2012 R2 with the Hyper-V role installed
- Windows Server 2012 with the Hyper-V role installed
- A similar virtual network and physical network must be configured at the secondary site for the replica virtual machine to function as the production virtual machine.
Step1: Configure the Firewall on the Primary and Secondary Hyper-V Hosts
1. Right-click the Windows logo on the taskbar > Control Panel > Windows Firewall.
2. Open Windows Firewall with Advanced Security and click Inbound Rules.
3. Right-click Hyper-V Replica HTTP Listener (TCP-In) and click Enable Rule.
4. Right-click Hyper-V Replica HTTPS Listener (TCP-In) and click Enable Rule.
Step2: Pre-stage Replica Broker Computer Object
1. Log on to a domain controller > open Active Directory Users and Computers > create a new computer object, e.g. HVReplica.
2. Right-click the HVReplica computer object > Properties > Security tab > add the NetBIOS names of the Hyper-V cluster nodes > allow Full Control > Apply > OK.
Step3: Configure Replica Broker in the Hyper-V Environment
Configure Hyper-V Replica using the Failover Cluster Wizard
1. Log on to a Hyper-V host and open Failover Cluster Manager.
2. In the left pane, connect to the cluster, and while the cluster name is highlighted, click Configure Role in the Actions pane. The High Availability wizard opens.
3. In the Select Role screen, select Hyper-V Replica Broker.
4. Complete the wizard, providing the NetBIOS name you created in the previous step and the IP address to be used as the connection point to the cluster.
5. Verify that the Hyper-V Replica Broker role comes online successfully. Click Finish.
6. To test Replica broker failover, right-click the role, point to Move, and then click Select Node. Then, select a node, and then click OK.
7. Click Roles in the Navigate category of the Details pane.
8. Right-click the role and choose Replication Settings.
9. In the Details pane, select Enable this cluster as a Replica server.
10. In the Authentication and ports section, select the authentication method: Kerberos over HTTP, certificate-based authentication over HTTPS, or both.
11. To use certificate-based authentication, click Select Certificate and provide the requested certificate information.
12. In the Authorization and storage section, you can either allow replication from any authenticated server and specify a default storage location, or allow replication only from specific servers, each with its own storage location and Trust Group tag.
13. Click OK or Apply when you are finished.
Configure Hyper-V Replica using Hyper-V Manager
To configure a Hyper-V Replica server in a non-clustered environment:
1. In Hyper-V Manager, click Hyper-V Settings in the Actions pane.
2. In the Hyper-V Settings dialog, click Replication Configuration.
3. In the Details pane, select Enable this computer as a Replica server.
4. In the Authentication and ports section, select the authentication method: Kerberos over HTTP, certificate-based authentication over HTTPS, or both.
5. To use certificate-based authentication, click Select Certificate and provide the requested certificate information.
6. In the Authorization and storage section, you can either allow replication from any authenticated server and specify a default storage location, or allow replication only from specific servers, each with its own storage location and Trust Group tag.
7. Click OK or Apply when you are finished.
Step4: Configure Replica Virtual Machine
1. In the Details pane of Hyper-V Manager, select a virtual machine by clicking it.
2. Right-click the selected virtual machine and point to Enable Replication. The Enable Replication wizard opens.
3. On the Specify Replica Server page, in the Replica Server box, enter either the NetBIOS name or the fully qualified international domain name (FQIDN) of the Replica server that you configured in Step3. If the Replica server is part of a failover cluster, enter the name of the Hyper-V Replica Broker that you configured in Step3. Click Next.
4. On the Specify Connection Parameters page, the authentication and port settings you configured for the Replica server in Step3 will be populated automatically, provided that Remote WMI is enabled. If it is not enabled, you will have to provide the values. Click Next.
5. On the Choose Replication VHDs page, clear the checkboxes for any VHDs that you want to exclude from replication, then click Next.
6. On the Configure Recovery History page, select the number and types of recovery points to be created on the Replica server, then click Next.
7. On the Choose Initial Replication page, select the initial replication method and then click Next.
8. On the Completing the Enable Replication Relationship Wizard page, review the information in the Summary and then click Finish.
9. A Replica virtual machine is created on the Replica server. If you elected to send the initial copy over the network, the transmission begins either immediately or at the time you configured.
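Before choosing to send the initial copy over the network, it can help to estimate how long that transfer might take. The following Python sketch shows the arithmetic only; the function name, the disk size, the link speed and the 0.7 efficiency factor are illustrative assumptions, not values reported by Hyper-V.

```python
def initial_replication_hours(vhd_size_gb, link_mbps, efficiency=0.7):
    """Estimate the hours needed to copy the initial replica over the network.

    vhd_size_gb: total size of the VHDs selected for replication (decimal GB).
    link_mbps:   nominal bandwidth between sites in megabits per second.
    efficiency:  assumed usable fraction of the link (hypothetical default).
    """
    if vhd_size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need size >= 0, bandwidth > 0 and 0 < efficiency <= 1")
    size_megabits = vhd_size_gb * 1000 * 8      # GB -> megabits
    usable_mbps = link_mbps * efficiency        # bandwidth actually available
    return size_megabits / usable_mbps / 3600   # seconds -> hours


if __name__ == "__main__":
    hours = initial_replication_hours(vhd_size_gb=500, link_mbps=100)
    print(f"Estimated initial replication time: {hours:.1f} hours")
```

If the estimate is longer than the window you can tolerate, scheduling the transfer for off-peak hours or choosing a different initial replication method on the Choose Initial Replication page may be the better option.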
Step5: Test Replicated Virtual Machine
1. In Hyper-V Manager, right-click the virtual machine you want to test failover for, point to Replication, and then click Test Failover.
2. After you have concluded your testing, discard the test virtual machine by choosing Stop Test Failover under the Replication option.
Step6: Planned Failover
1. Start Hyper-V Manager on the primary server and choose a virtual machine to fail over. Turn off the virtual machine that you want to fail over.
2. Right-click the virtual machine, point to Replication, and then point to Planned Failover.
3. Click Fail Over to actually transfer operations to the virtual machine on the Replica server. Failover will not occur if the prerequisites have not been met.
How to respond to unplanned Failover
1. Open Hyper-V Manager and connect to the Replica server.
2. Right-click the name of the virtual machine you want to use, point to Replication, and then click Failover.
3. In the dialog that opens, choose the recovery snapshot you want the virtual machine to recover to, and then click Failover. The Replication Status will change to Failed over – Waiting completion and the virtual machine will start using the network parameters you previously configured for it.
4. Use the Complete-VMFailover Windows PowerShell cmdlet to complete the failover, for example Complete-VMFailover -VMName followed by the name of the virtual machine.
Starting a reverse replication once disaster is over
1. Open Hyper-V Manager and connect to the Replica server.
2. Right-click the name of the virtual machine you want to reverse replicate, point to Replication, and then click Reverse Replication. The Reverse Replication wizard opens.
3. Complete the Reverse Replication wizard. The requested information is very similar, if not identical, to the information you provided in the Enable Replication wizard.
Disaster recovery (DR) involves a set of policies and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Disaster recovery focuses on the IT or technology systems supporting critical business functions, as opposed to business continuity, which involves keeping all essential aspects of a business functioning despite significant disruptive events. Disaster recovery is therefore a subset of business continuity. Wiki reference
A disaster recovery plan (DRP) is a documented process or set of procedures to recover and protect a business IT infrastructure in the event of a disaster. Such a plan, ordinarily documented in written form, specifies the procedures an organization is to follow in the event of a disaster.
Given organizations’ increasing dependency on information technology to run their operations, a disaster recovery plan, sometimes erroneously called a continuity of operations plan (COOP), is increasingly associated with the recovery of information technology data, assets, and facilities.
Disaster Recovery vs Business Continuity – Are You Mixing Them Up?
Business continuity is different from disaster recovery, but the two are linked. Business continuity is a practice enterprises adopt so that the business keeps operating, rather than failing completely, while it waits for recovery. By adopting business continuity, you can be assured that your business will continue to run in the event of a disaster until all systems are recovered.
Many businesses without a BC plan fail after a disaster. DR will get your hardware, software and apps back up and running, but without a business continuity plan to keep your company going during the recovery process, you might not have a reason to recover those items. BC involves your finances, your personnel, your emergency plans and everything else that is necessary to keep going and keep serving customers.
An example of business continuity is routing all corporate inbound and outbound email through a third-party cloud-based smart host. The smart host stores your email for 15~30 days but delivers inbound and outbound messages to your organization straight away, which means that in the event of a disaster you can still send and receive email from any device that has internet connectivity. Once systems are restored, the cloud-based smart host syncs with the on-premises Exchange Server.
Disaster Recovery Terminology in Alphabetical Order
Alert – Notification that a potential disruption is imminent or has occurred; usually includes a directive to act or standby.
Application Recovery – The component of Disaster Recovery that deals specifically with the restoration of business system software and data after the processing platform has been restored or replaced.
DR Site – A site held in readiness for use during/following an invocation of business or disaster recovery plans to continue urgent and important activities of an organization.
DR Work Area – Recovery environment complete with necessary infrastructure (desk, telephone, workstation, and associated hardware and equipment, communications, etc)
Backlog – The amount of work that accumulates when a system or process is unavailable for a long period of time. This work needs to be processed once the system or process is available and may take a considerable amount of time to process.
A situation whereby a backlog of work requires more time to action than is available through normal working patterns. In extreme circumstances, the backlog may become so marked that the backlog cannot be cleared.
Backup – A process by which data (electronic or paper-based) and programs are copied in some form so as to be available and used if the original data from which it originated is lost, destroyed or corrupted.
Business Continuity – The strategic and tactical capability of the organization to plan for and respond to incidents and business disruptions in order to continue business operations at an acceptable predefined level.
Checklist – Tool to remind and/or validate that tasks have been completed and resources are available, and to report on the status of recovery. A list of items (names, tasks, etc.) to be checked or consulted.
Contingency Plan – An event-specific preparation that is executed to protect an organization from certain and specific identified risks and/or threats.
Continuous Availability – A system or application that supports operations which continue with little to no noticeable impact to the user. For instance, with continuous availability, the user will not have to log in again or resubmit a partial or whole transaction.
Data Backups – The copying of production files to media that can be stored onsite and/or offsite and can be used to restore corrupted or lost data or to recover entire systems and databases in the event of a disaster.
Data Center Recovery – The component of Disaster Recovery which deals with the restoration of data center services and computer processing capabilities at an alternate location and the migration back to the production site.
Data Recovery – The restoration of computer files from backup media to restore programs and production data to the state that existed at the time of the last safe backup.
Database Replication – The partial or full duplication of data from a source database to one or more destination databases.
Disaster – Situation where widespread human, material, economic or environmental losses have occurred which exceeded the ability of the affected organization, community or society to respond and recover using its own resources. Source: ISO 2.1.11
Disaster Recovery – The process, policies and procedures related to preparing for recovery or continuation of technology infrastructure, systems and applications which are vital to an organization after a disaster or outage.
Hot Site – An alternate facility that already has in place the computer, telecommunications, and environmental infrastructure required to recover critical business functions or information systems.
Impact – The effect, acceptable or unacceptable, of an event on an organization. The types of business impact are usually described as financial and non-financial and are further divided into specific types of impact.
Incident – An event which is not part of standard business operations which may impact or interrupt services and, in some cases, may lead to disaster.
Network Outage – An interruption of voice, data, or IP network communications.
Off-Site Storage – Any place physically located a significant distance away from the primary site, where duplicated and vital records (hard copy or electronic) and/or equipment may be stored for use during recovery.
Outage – The interruption of automated processing systems, infrastructure, support services, or essential business operations, which may result in the organization's inability to provide services for some period of time.
Recovery – Implementing the prioritized actions required to return the processes and support functions to operational stability following an interruption or disaster.
Replication – Copying structured or unstructured data, as of a point in time, between sites.
Risk – Potential for exposure to loss which can be determined by using either qualitative or quantitative measures.
Recovery Point Objective – A recovery point objective, or “RPO”, is defined by business continuity planning. It is the maximum tolerable period in which data might be lost from an IT service due to a major incident. The RPO gives systems designers a limit to work to. Wiki Reference
Recovery Time Objective – The recovery time objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity. Wiki Reference
Service Level Agreement (SLA) – A formal agreement between a service provider (whether internal or external) and their client (whether internal or external), which covers the nature, quality, availability, scope and response of the service provider. The SLA should cover day-to-day situations and disaster situations, as the need for the service may vary in a disaster.
System Recovery – The procedures for rebuilding a computer system and network to the condition where it is ready to accept data and applications, and facilitate network communications.
Validation Script – A set of procedures within the Business Continuity Plan to validate the proper function of a system or process before returning it to production operation.
Workaround Procedures – Alternative procedures that may be used by a functional unit(s) to enable it to continue to perform its critical functions during temporary unavailability of specific application systems, electronic or hard copy data, voice or data communication systems, specialized equipment, office facilities, personnel, or external services.
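Some of the terms above, notably Recovery Point Objective, Recovery Time Objective and Backlog, reduce to simple arithmetic. The Python sketch below shows one way to check a design against them. The class, the function names and the sample numbers are hypothetical, and the worst-case data loss is assumed to equal one full replication or backup interval, which is a simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    rpo_minutes: float  # maximum tolerable data-loss window
    rto_minutes: float  # maximum tolerable time to restore the service


def meets_objectives(objectives, copy_interval_minutes, recovery_minutes):
    """Compare a protection design against its RPO and RTO.

    copy_interval_minutes: time between replication cycles or backups,
                           used here as the worst-case data loss.
    recovery_minutes:      estimated time to bring the service back.
    """
    return {
        "rpo_met": copy_interval_minutes <= objectives.rpo_minutes,
        "rto_met": recovery_minutes <= objectives.rto_minutes,
    }


def hours_to_clear_backlog(outage_hours, arrival_per_hour, capacity_per_hour):
    """Return the hours needed to work off the backlog built up in an outage.

    Returns math.inf when capacity does not exceed the arrival rate, which is
    the case where the backlog can never be cleared.
    """
    if min(outage_hours, arrival_per_hour, capacity_per_hour) < 0:
        raise ValueError("inputs must be non-negative")
    backlog = outage_hours * arrival_per_hour      # items queued during outage
    if backlog == 0:
        return 0.0
    spare = capacity_per_hour - arrival_per_hour   # capacity left for backlog
    if spare <= 0:
        return math.inf
    return backlog / spare


if __name__ == "__main__":
    targets = RecoveryObjectives(rpo_minutes=15, rto_minutes=240)
    print(meets_objectives(targets, copy_interval_minutes=5, recovery_minutes=60))
    print(hours_to_clear_backlog(outage_hours=8, arrival_per_hour=100,
                                 capacity_per_hour=120))
```

A nightly backup checked against a 15-minute RPO would fail the first test, which is the kind of gap these definitions are meant to expose during planning.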
Developing a DR Strategy
Regarding disaster recovery strategies, ISO/IEC 27031, the global standard for IT disaster recovery, states, “Strategies should define the approaches to implement the required resilience so that the principles of incident prevention, detection, response, recovery and restoration are put in place.” Strategies define what you plan to do when responding to an incident, while plans describe how you will do it.
- Reduce Overall Risk
- Maintain and Test Your Disaster Recovery Plan
- Alleviate Owner/Investor Concerns
- Restore Day-To-Day Operations
- Comply With Regulations
- Rapid Response
|Priority 2||Medium High||Medium|
|Priority 4||Medium low||Low|
Who and What Are Involved in Disaster Recovery
- Physical facilities
- Data (Structured & Unstructured)
- Third Party Vendor or Suppliers
- IT Governance (Policies & Procedures)
Producing a DR Document
A DR document consists of the following sections:
- Corporate Logo
- Document History
- Corporate Copyright Info
- Table of Contents
- Executive Summary
- Roles and Responsibilities
- Third Party
- Site Diagram
- Incident Response
- Plan Activation
In conclusion, once your DR plans have been completed, they are ready to be implemented. This process will determine whether the business can recover and restore its IT assets as planned. Remember, this is not about the IT department alone; it is about a business that wants to comply and understands the importance of disaster recovery. You will only succeed if your business is willing to participate and to invest CAPEX and OPEX in disaster recovery.