Configuring EMC DD Boost with Veeam Availability Suite

This article walks through the configuration steps required to integrate an EMC Data Domain system with Veeam Availability Suite 9 and outlines the benefits of using EMC DD Boost with a backup application.

Data Domain Boost (DD Boost) software provides advanced integration with backup and enterprise applications for increased performance and ease of use. DD Boost distributes parts of the deduplication process to the backup server or application clients, enabling client-side deduplication for faster, more efficient backup and recovery. All Data Domain systems can be configured as storage destinations for leading backup and archiving applications using NFS, CIFS, Boost, or VTL protocols.

The following applications work with a Data Domain system using the DD Boost interface: EMC Avamar, EMC NetWorker, Oracle RMAN, Quest vRanger, Symantec Veritas NetBackup (NBU), Veeam and Backup Exec. In this example, we will be using Veeam Availability Suite version 9.

Data Domain Systems for Service Provider

Data Domain Secure Multitenancy (SMT) is the simultaneous hosting, by a service provider, of more than one consumer (Tenant) or workload (Applications, Exchange, Standard VMs, Structured Data, Unstructured Data, Citrix VMs).

SMT provides the ability to securely isolate many users and workloads in a shared infrastructure, so that the activities of one Tenant are not apparent or visible to the other Tenants. A Tenant is a consumer (business unit, department, or customer) who maintains a persistent presence in a hosted environment.

Basic Configuration requirements are:

  • Enable SMT in the DD System
  • Role Based Access Control in DD Systems
  • Tenant Self-Service in the DD Systems
  • A Tenant is created on the DD Management Center and/or DD system.
  • A Tenant Unit is created on a DD system for the Tenant.
  • One or more MTrees are created to meet the storage requirements for the Tenant’s various types of backups.
  • The newly created MTrees are added to the Tenant Unit.
  • Backup applications are configured to send each backup to its configured Tenant Unit MTree.
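
The relationship in the list above (a Tenant owns Tenant Units, a Tenant Unit holds MTrees, and each backup type is sent to its own MTree) can be sketched as a small data model. This is only an illustration, not a Data Domain API: the class names, the tenant and unit names, and the backup-type keys are all hypothetical, and the /data/col1/ prefix is assumed as the usual MTree path root.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TenantUnit:
    """A tenant unit on one DD system, holding MTrees keyed by backup type."""
    name: str
    mtrees: Dict[str, str] = field(default_factory=dict)

    def add_mtree(self, backup_type: str, mtree_path: str) -> None:
        # Add a newly created MTree to this tenant unit.
        self.mtrees[backup_type] = mtree_path

    def target_for(self, backup_type: str) -> str:
        # Each backup is sent to the MTree configured for its type.
        if backup_type not in self.mtrees:
            raise KeyError(f"no MTree configured for {backup_type!r} in {self.name}")
        return self.mtrees[backup_type]


@dataclass
class Tenant:
    """A consumer (business unit, department, or customer)."""
    name: str
    units: List[TenantUnit] = field(default_factory=list)


# Hypothetical names, for illustration only.
unit = TenantUnit("tenant-a-unit1")
unit.add_mtree("exchange", "/data/col1/tenant-a-exchange")
unit.add_mtree("vm", "/data/col1/tenant-a-vm")
tenant = Tenant("tenant-a", [unit])
```

Calling unit.target_for("vm") returns the MTree path configured for VM backups, while an unconfigured backup type raises an error instead of silently landing in another MTree.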

Prerequisites:

  1. Backup Server

Physical Server: Fibre Channel or iSCSI

OR

Virtual Server: Fibre Channel with N-Port ID Virtualization (NPIV), pass-through storage, or iSCSI

  2. Backup Software

Backup Application, DD Boost Library, DD Boost-over-FC Transport

  3. Storage Area Network

Fibre Channel or iSCSI

  4. Data Domain System

DD Boost Service

DD Boost-over-FC Server

SCSI Commands over FC

SCSI Processor Devices

  5. Virtual Infrastructure

Hyper-V Server cluster and System Center Virtual Machine Manager

OR

VMware vCenter with vSphere Hosts

Designing DD Boost for resiliency & availability

The Data Domain system advertises itself to the backup server over one or more physically or virtually connected paths. The design of the entire system depends on Data Domain sizing: how you connect the Data Domain system to the backup server(s), how many backup jobs will run, the size of the backups, deduplication, data retention, and the frequency of data restores. A typical backup solution should include the following:

  • Backup server with 2 initiator HBA ports (A and B)
  • Data Domain System has 2 FC target endpoints (C and D)
  • Fibre Channel Fabric zoning is configured such that both initiator HBA ports can access both FC target endpoints
  • Data Domain system is configured with a SCSI target access group containing both FC target endpoints on the Data Domain system
  • Dual fabric for failover and availability
  • Multiple physical and logical Ethernet connections for availability and failover
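
To see why the zoning above matters, the following sketch enumerates the paths between the two initiator ports (A and B) and the two target endpoints (C and D) and checks that at least one path survives the loss of any single port. It is a toy model of the topology in the list, not a zoning tool, and the function names are made up for this example.

```python
from itertools import product

initiators = ["A", "B"]  # HBA ports on the backup server
targets = ["C", "D"]     # FC target endpoints on the Data Domain system

# Zoning from the list above: every initiator can reach every target.
zoned_paths = set(product(initiators, targets))


def surviving_paths(paths, failed_ports):
    """Return the paths that use none of the failed ports."""
    return {(i, t) for (i, t) in paths
            if i not in failed_ports and t not in failed_ports}


def tolerates_single_port_failure(paths, ports):
    """True if at least one path remains after any one port fails."""
    return all(surviving_paths(paths, {p}) for p in ports)


all_ports = initiators + targets
```

With full zoning there are four paths, and any single port failure leaves two of them. A design with only one zoned path, for example A to C, fails the same check.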

Examples of Sizing

This example calculates the number of devices needed to support the maximum number of simultaneous connections to the Data Domain Fibre Channel (DFC) system from all backup servers. The DFC device count (D) is the number of devices to be advertised to the initiator of the backup server(s). Let's say we have one backup server and a single Data Domain system, and the backup server runs 100 backup jobs.

Minimum DFC device count: D = (2 × S) / 128, rounded up, where S is the maximum number of simultaneous connections

J = 1 backup server × 100 backup jobs = 100

C = 1 (single DD system)

S = J × C = 100 × 1 = 100

D = (2 × 100) / 128 = 1.56, rounded up to 2

Therefore, all DFC groups on the Data Domain system must be configured with 2 devices.
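
The same arithmetic can be written as a small helper for other job counts. This is a minimal sketch of the calculation shown above; the function and parameter names are made up for this example, and any upper limit the platform imposes on the device count is not modelled.

```python
def dfc_device_count(backup_servers: int, jobs_per_server: int, c: int = 1) -> int:
    """Return D = (2 x S) / 128 rounded up, with S = J x C as in the example."""
    j = backup_servers * jobs_per_server  # J: total backup jobs
    s = j * c                             # S: maximum simultaneous connections
    # Integer ceiling division avoids floating-point rounding surprises.
    return (2 * s + 127) // 128


devices = dfc_device_count(backup_servers=1, jobs_per_server=100, c=1)
```

For the example in the text (one backup server, 100 jobs, C = 1), the helper returns 2, which matches the manual calculation.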

Step1: Preparing DD System

Step2: Managing system licenses

  1. Select Administration > Licenses, then click Add Licenses.
  2. In the License window, type or paste the license keys. Type each key on its own line, or separate keys with a space or comma (DD System Manager automatically places each key on a new line).
  3. Click Add. The added licenses are displayed in the Added license list.

OR

  1. In System Manager, select Protocols > DD Boost > Settings. If the Status indicates that DD Boost is not licensed, click Add License and enter a valid license in the Add License Key dialog box.

Step3: Setting up CIFS Protocol

  1. In DD System Manager, select Protocols > CIFS.
  2. In the CIFS Status area, click Enable.

Step4: Remove Anonymous Log on

  1. Select Protocols > CIFS > Configuration.
  2. In the Options area, click Configure Options.
  3. To restrict anonymous connections, click the checkbox of the Enable option in the Restrict Anonymous Connections area.
  4. In the Log Level area, click the drop-down list to select the level number 1.
  5. In the Server Signing area, select Enabled to enable server signing.

Step5: Specifying DD Boost user names

The following user will be used to connect to DD Boost from the backup software.

  1. Select Protocols > DD Boost.
  2. Select Add, above the Users with DD Boost Access list.
  3. The Add User dialog appears. To select an existing user, select the user name in the drop-down list. EMC recommends that you select a user name with management role privileges set to none.
  4. To create and select a new user, select Create a new Local User and enter the password twice in the appropriate fields. Click Add.

Step6: Enabling DD Boost

  1. Select Protocols > DD Boost > Settings.
  2. Click Enable in the DD Boost Status area.
  3. Select an existing user name from the menu then complete the wizard.

Step7: Creating a storage unit

  1. Select Protocols > DD Boost > Storage Units.
  2. Click Create. The Create Storage Unit dialog box is displayed.
  3. Enter the storage unit name in the Name box e.g. DailyRepository1
  4. Select an existing username that will have access to this storage unit. EMC recommends that you select a username with management role privileges set to none. The user must be configured in the backup application to connect to the Data Domain system.
  5. To set storage space restrictions to prevent a storage unit from consuming excess space: enter either a soft or hard limit quota setting, or both a hard and soft limit.
  6. Click Create.
  7. Repeat the above steps to create MonthlyRepository1, and for each Data Domain Boost-enabled system.

Step8: Encrypting Communication between Backup Server and Data Domain (Optional)

Generate an advanced certificate from Active Directory Certificate Services and install it on the Data Domain system for DD Boost. You must install the same certificate on the backup servers so that the Data Domain system and its client (the backup server) can communicate with each other over an encrypted connection.

  1. Start DD System Manager on the system to which you want to add a host certificate.
  2. Select Protocols > DD Boost > More Tasks > Manage Certificates….
  3. In the Host Certificate area, click Add.
  4. To add a host certificate enclosed in a .p12 file, select I want to upload the certificate as a .p12 file. Type the password in the Password box.
  5. Click Browse and select the host certificate file to upload to the system.
  6. Click Add.
  7. To add a host certificate enclosed in a .pem file, select I want to upload the public key as .pem file and use a generated private key. Click Browse and select the host certificate file to upload to the system.
  8. Click Add.

DD Boost client access and encryption

  1. Select Protocols > DD Boost > Settings.
  2. In the Allowed Clients section, click Create. The Add Allowed Client dialog appears.
  3. Enter the hostname of the client. This can be a fully-qualified domain name (e.g. Backupserver1.domain.com) or a hostname with a wildcard (e.g. *.domain.com).
  4. Select the Encryption Strength. The options are None (no encryption), Medium (AES128-SHA1), or High (AES256-SHA1).
  5. Select the Authentication Mode. The options are One Way or Two Way.
  6. Click OK.
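
To get a feel for what a wildcard entry such as *.domain.com admits, the sketch below matches hostnames against a list of allowed-client patterns using shell-style wildcards. It illustrates the idea only: how the Data Domain system itself evaluates these entries is not described here, and the function name is made up for this example.

```python
from fnmatch import fnmatchcase
from typing import List


def client_allowed(hostname: str, allowed_patterns: List[str]) -> bool:
    """Return True if the hostname matches any allowed-client pattern.

    Hostnames are compared case-insensitively. Note that '*' also matches
    dots, so '*.domain.com' admits 'a.b.domain.com' in this sketch.
    """
    host = hostname.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in allowed_patterns)


allowed = ["*.domain.com"]
```

Here client_allowed("Backupserver1.domain.com", allowed) is true, while a host in another domain is rejected. An exact fully-qualified name in the list admits only that one host.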

Step9: Configuring DD Boost over Fibre Channel

  1. Select Protocols > DD Boost > Fibre Channel.
  2. Click Enable to enable Fibre Channel transport.
  3. To change the DD Boost Fibre Channel server name from the default (hostname), click Edit, enter a new server name, and click OK.
  4. Select Protocols > DD Boost > Storage Units to create a storage unit (if not already created by the application).
  5. Install the DD Boost API/plug-in (if necessary, based on the application).

Step10: Configuring storage for DD Extended Retention (Optional)

Before you proceed with Extended Retention, you must add the required license on the DD system.

  1. Select Hardware > Storage tab.
  2. In the Overview tab, select Configure Storage. In the Configure Storage tab, select the storage to be added from the Available Storage list.
  3. Select the appropriate Tier Configuration (Active or Retention) from the menu.
  4. Select the checkbox for the Shelf to be added.
  5. Click the Add to Tier button. Click OK to add the storage.

Step11: Configure a Veeam backup repository

  1. To create an EMC Data Domain Boost-enabled backup repository, navigate to the Backup Infrastructure section of the user interface, then select Backup Repositories and right-click to select Add Backup Repository.
  2. Select the repository type, Deduplicating storage appliance. Type the name of the DD system, choose the Fibre Channel or Ethernet option, and add the credentials and the gateway server used to connect to the DD system. To connect the Veeam backup server to the DD system over Fibre Channel, you must add the DD system and the Veeam backup server to the same SAN zone, and you must enable FC on the DD system. To connect the Veeam backup server over Ethernet, the Veeam backup server and the DD system must be in the same VLAN; in a multi-VLAN environment, you must enable unrestricted communication between the VLANs.
  3. On the next screen, select the storage unit of the DD system to be used by the Veeam server as the repository, and leave the concurrent connection setting at its default.
  4. On the next screen, enable vPower NFS and complete the wizard.

Step12: Configure Veeam Backup Job & Backup Copy Job

The critical decision for backup jobs is whether to run active full backups or leverage synthetic full backups. See the Veeam Backup Job Creation Guide and the Veeam Backup Copy Job Creation Guide.

Here is a short business case for each backup type.

Veeam Backup Options:

  1. Active Full: Organizations in the financial or health sectors often prefer to keep a monthly full backup and retain it for a set period, to meet corporate compliance and external auditors' requirements for keeping data off-site.
  2. Synthetic Full: A standard practice for any organization is to keep a synthetic full backup at all times to reduce storage cost and the recovery time objective.


  • For most environments, Veeam recommends synthetic full backups when leveraging EMC Data Domain Boost. This reduces stress on primary storage for the vSphere and Hyper-V VMs, and the Boost-enabled synthesizing is very fast.
  • For a Backup Copy job using GFS retention (Monthly, Weekly, Quarterly and/or Annual restore points), the gateway server must be the one closest to the Data Domain server, since the Backup Copy job frequently involves an offsite transfer. When the Data Domain server is designated in the repository setup, ensure that the gateway server is taken into account if it is being used off site.
  • The backup job timeout value must be higher than 30 minutes so that the job can be retried if it fails for any reason.

DD System Option:

  • A virtual synthetic full backup is the combination of the last full (synthetic or full) backup and all subsequent incremental backups. Virtual synthetics are enabled by default.
  • The synthetic full backups are faster when Data Domain Boost is enabled for a repository
  • DD Boost reduces backup transformation time by up to 80% compared with the total time when DD Boost is not used.
  • Once the first job has placed the bulk of the blocks of the vSphere or Hyper-V VM on the DD Boost storage unit, later jobs only need to transfer metadata and any changed blocks. This can be a significant improvement on the active full backup process when there is a fast source storage resource in place.
  • With DD Boost, multi-link provides failover and resiliency. DD Boost also provides parallel processing of concurrent jobs to the DD Boost storage unit.
  1. To display the DD Boost option settings, select Protocols > DD Boost > Settings >Advanced Options.
  2. To change the settings, select More Tasks > Set Options. Select or deselect any option to be enabled.
  3. Click OK.

Buying a SAN? How to select a SAN for your business?

A storage area network (SAN) is any high-performance network whose primary purpose is to enable storage devices to communicate with computer systems and with each other. With a SAN, the concept of a single host computer that owns data or storage isn’t meaningful. A SAN moves storage resources off the common user network and reorganizes them into an independent, high-performance network. This allows each server to access shared storage as if it were a drive directly attached to the server. When a host wants to access a storage device on the SAN, it sends out a block-based access request for the storage device.

A storage-area network is typically assembled using three principal components: cabling, host bus adapters (HBAs) and switches. Each switch and storage system on the SAN must be interconnected and the physical interconnections must support bandwidth levels that can adequately handle peak data activities.

Good SAN

A good SAN provides the following functionality to the business.

High availability: A single SAN connecting all computers to all storage puts a lot of enterprise information accessibility eggs into one basket. The SAN had better be pretty indestructible or the enterprise could literally be out of business. A good SAN implementation will have built-in protection against just about any kind of failure imaginable. This means that not only must the links and switches composing the SAN infrastructure be able to survive component failures, but the storage devices, their interfaces to the SAN, and the computers themselves must all have built-in strategies for surviving and recovering from failures as well.

Performance:

If a SAN interconnects a lot of computers and a lot of storage, it had better be able to deliver the performance they all need to do their respective jobs simultaneously. A good SAN delivers both high data transfer rates and low I/O request latency. Moreover, the SAN’s performance must be able to grow as the organization’s information storage and processing needs grow. As with other enterprise networks, it just isn’t practical to replace a SAN very often.

On the positive side, a SAN that does scale provides an extra application performance boost by separating high-volume I/O traffic from client/server message traffic, giving each a path that is optimal for its characteristics and eliminating cross talk between them.

The investment required to implement a SAN is high, both in terms of direct capital cost and in terms of the time and energy required to learn the technology and to design, deploy, tune, and manage the SAN. Any well-managed enterprise will do a cost-benefit analysis before deciding to implement storage networking. The results of such an analysis will almost certainly indicate that the biggest payback comes from using a SAN to connect the enterprise’s most important data to the computers that run its most critical applications.

But its most critical data is the data an enterprise can least afford to be without. Together, the natural desire for maximum return on investment and the criticality of operational data lead to Rule 1 of storage networking.

Great SAN

A great SAN provides additional business benefits plus additional features, depending on products and manufacturer. These include the features of storage networking, such as universal connectivity, high availability, high performance, and advanced function, as well as the benefits of storage networking that support larger organizational goals, such as reduced cost and improved quality of service.

  • SAN connectivity enables the grouping of computers into cooperative clusters that can recover quickly from equipment or application failures and allow data processing to continue 24 hours a day, every day of the year.
  • With long-distance storage networking, 24 × 7 access to important data can be extended across metropolitan areas and indeed, with some implementations, around the world. Not only does this help protect access to information against disasters; it can also keep primary data close to where it’s used on a round-the-clock basis.
  • SANs remove high-intensity I/O traffic from the LAN used to service clients. This can sharply reduce the occurrence of unpredictable, long application response times, enabling new applications to be implemented or allowing existing distributed applications to evolve in ways that would not be possible if the LAN were also carrying I/O traffic.
  • A dedicated backup server on a SAN can make more frequent backups possible because it reduces the impact of backup on application servers to almost nothing. More frequent backups means more up-to-date restores that require less time to execute.

Replication and disaster recovery

With so much data stored on a SAN, your client will likely want you to build disaster recovery into the system. SANs can be set up to automatically mirror data to another site, which could be a fail safe SAN a few meters away or a disaster recovery (DR) site hundreds or thousands of miles away.

If your client wants to build mirroring into the storage area network design, one of the first considerations is whether to replicate synchronously or asynchronously. Synchronous mirroring means that as data is written to the primary SAN, each change is sent to the secondary and must be acknowledged before the next write can happen.

The alternative is to asynchronously mirror changes to the secondary site. You can configure this replication to happen as quickly as every second, or every few minutes or hours, Schulz said. This means that your client could permanently lose some data if the primary SAN goes down before it has a chance to copy its data to the secondary, so your client should make calculations based on its recovery point objective (RPO) to determine how often it needs to mirror.
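
The RPO calculation mentioned above can be sketched as follows. This is a simplified model that assumes the worst-case data loss equals the replication interval plus the time one transfer takes. The function names and the sample intervals are illustrative, not taken from any vendor tool.

```python
def worst_case_data_loss_seconds(replication_interval_s: float,
                                 transfer_time_s: float = 0.0) -> float:
    """Worst case: the primary fails just before the next replication completes."""
    return replication_interval_s + transfer_time_s


def meets_rpo(replication_interval_s: float, rpo_s: float,
              transfer_time_s: float = 0.0) -> bool:
    """True if the worst-case loss stays within the recovery point objective."""
    return worst_case_data_loss_seconds(replication_interval_s, transfer_time_s) <= rpo_s


# Example: a 15-minute RPO checked against two candidate schedules.
rpo = 15 * 60
five_minute_schedule_ok = meets_rpo(5 * 60, rpo)
hourly_schedule_ok = meets_rpo(60 * 60, rpo)
```

Under these assumptions, mirroring every five minutes satisfies a 15-minute RPO and hourly mirroring does not.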

Security

With several servers able to share the same physical hardware, it should be no surprise that security plays an important role in a storage area network design. Your client will want to know that servers can only access data if they’re specifically allowed to. If your client is using iSCSI, which runs on a standard Ethernet network, it’s also crucial to make sure outside parties won’t be able to hack into the network and have raw access to the SAN.

Capacity and scalability

A good storage area network design should not only accommodate your client’s current storage needs, but it should also be scalable so that your client can upgrade the SAN as needed throughout the expected lifespan of the system. Because a SAN’s switch connects storage devices on one side and servers on the other, its number of ports can affect both storage capacity and speed, Schulz said. By allowing enough ports to support multiple, simultaneous connections to each server, switches can multiply the bandwidth to servers. On the storage device side, you should make sure you have enough ports for redundant connections to existing storage units, as well as units your client may want to add later.

Uptime and availability

Because several servers will rely on a SAN for all of their data, it's important to make the system very reliable and eliminate any single points of failure. Most SAN hardware vendors offer redundancy within each unit (like dual power supplies, internal controllers and emergency batteries), but you should make sure that redundancy extends all the way to the server. Availability and redundancy can be extended across multiple systems and datacentres, subject to a cost-benefit analysis and specific business requirements. If your business requires a zero-downtime policy, then data should be replicated to a disaster recovery site using a SAN identical to the production SAN, with appropriate software to manage the replicated SANs.

Software and Hardware Capability

Great SAN management software delivers all the capabilities of the SAN hardware to the devices connected to the SAN. It's very reasonable to expect to share a SAN-attached tape drive among several servers because tape drives are expensive and they're only actually in use while backups are occurring. If a tape drive is connected to computers through a SAN, different computers could use it at different times. All the computers get backed up. The tape drive investment is used efficiently, and capital expenditure stays low.

A SAN provides fully redundant, high-performance, and highly available hardware and software that connect application and business data to compute resources. Intelligent storage also provides data movement capabilities between devices.

Best OR Cheap

No vendor has ever developed all the components required to build a complete SAN but most vendors are engaged in partnerships to qualify and offer complete SANs consisting of the partner’s products.

A best-in-class SAN offers a business very different performance and attributes from a cheap one. A cheap SAN can be built on the existing Ethernet network. However, you should ask yourself the following questions to determine what you need: best or cheap?

  1. Is this SAN capable of delivering business benefits?
  2. Is this SAN capable of managing your corporate workloads?
  3. Are you getting the correct I/O for your workloads?
  4. Are you getting the correct performance metrics for your applications, file systems and virtual infrastructure?
  5. Are you getting value for money?
  6. Do you have growth potential?
  7. Would your next data migration and software upgrade be seamless?
  8. Is this SAN a heterogeneous solution for you?

Storage as a Service vs on-premises

There are many vendors who provide storage as a service with attractive pricing models. However, you should consider the following before choosing storage as a service.

  1. Is this vendor a partner of a recognised storage manufacturer?
  2. Does this vendor have a certified and experienced engineering team to look after your data?
  3. Does this vendor provide 24x7x365 support?
  4. Does this vendor provide true storage tiering?
  5. What is the geographic distance between the storage-as-a-service provider's data center and your infrastructure, and how much would WAN connectivity cost you?
  6. What would the storage latency and I/O be?
  7. Are you buying one-off capacity or a long-term corporate storage solution?

If the answers to these questions favour your business, then I would recommend you buy storage as a service; otherwise on-premises is best for you.

NAS OR FC SAN OR iSCSI SAN OR Unified Storage

A NAS device provides file access to clients to which it connects using file access protocols (primarily CIFS and NFS) transported on Ethernet and TCP/IP.

An FC SAN device is a block-access device (i.e. it is a disk or it emulates one or more disks) that connects to its clients using Fibre Channel and a block data access protocol such as SCSI.

iSCSI, which stands for Internet Small Computer System Interface, works on top of the Transmission Control Protocol (TCP) and allows SCSI commands to be sent end-to-end over local-area networks (LANs), wide-area networks (WANs) or the Internet.

You have to know your business before you can answer the question: NAS, FC SAN, iSCSI SAN or unified? If you would like to maximise the benefits from a single investment, you are looking for a unified storage solution such as NetApp or EMC ISILON. If you are looking for enterprise-class, high-performance storage that isolates storage traffic from your Ethernet network, reduces backup time and minimises RPO and RTO, then FC SAN is best for you, for example EMC VNX or NetApp OnCommand Cluster. If your intention is to use existing Ethernet and have shared storage, then you are looking for an iSCSI SAN, for example Nimble Storage or Dell SC series storage. Having said that, you also need to consider your structured corporate data, unstructured corporate data and application performance before making a judgement call.

Decision Making Process

Let’s make a decision matrix as follows. Just fill the blanks and see the outcome.

Workloads | I/O | Capacity Requirement (in TB) | Storage Protocol (FC, iSCSI, NFS, CIFS)
Virtualization | | |
Unstructured Data | | |
Structured Data | | |
Messaging Systems | | |
Application virtualization | | |
Collaboration application | | |
Business Application | | |
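
Once the workload matrix is filled in, totalling I/O and capacity per storage protocol shows where most of the demand sits. The sketch below does this with made-up sample figures. The numbers and the function name are placeholders for illustration, not recommendations.

```python
from collections import defaultdict

# Hypothetical sample rows: (workload, IOPS, capacity in TB, protocol).
rows = [
    ("Virtualization", 20000, 40.0, "FC"),
    ("Unstructured Data", 2000, 100.0, "CIFS"),
    ("Structured Data", 15000, 10.0, "FC"),
    ("Messaging Systems", 5000, 8.0, "iSCSI"),
]


def totals_by_protocol(matrix_rows):
    """Sum IOPS and capacity for each storage protocol in the matrix."""
    totals = defaultdict(lambda: {"iops": 0, "capacity_tb": 0.0})
    for _workload, iops, capacity_tb, protocol in matrix_rows:
        totals[protocol]["iops"] += iops
        totals[protocol]["capacity_tb"] += capacity_tb
    return dict(totals)


summary = totals_by_protocol(rows)
```

With these sample figures, most of the I/O demand falls on FC while most of the capacity falls on CIFS, which is the kind of split that points toward either separate systems or unified storage.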

Functionality Matrix

Option | Rating | Requirement (1=High, 3=Medium, 5=Low)
Redundancy | |
Uptime | |
Capacity | |
Data movement | |
Management | |

Risk Assessment

Risk Type | Rating (Low, Medium, High)
Loss of productivity |
Loss of redundancy |
Reduced Capacity |
Uptime |
Limited upgrade capacity |
Disruptive migration path |

Service Data – SLA

Service Type | SLA Target
Hardware Replacement |
Uptime |
Vendor Support |

Rate storage via Gartner Magic Quadrant. Gartner magic quadrant leaders are (as of October 2015):

  1. EMC
  2. HP
  3. Hitachi
  4. Dell
  5. NetApp
  6. IBM
  7. Nimble Storage

To make your decision easy, select storage that enables you to manage large and rapidly growing data in a cost-effective way. Look for storage that is built for agility and simplicity and that provides both a tiered storage approach for specialized needs and the ability to unify all digital content into a single high-performance, highly scalable shared pool of storage. It should accelerate productivity and reduce capital and operational expenditures, while seamlessly scaling with the growth of mission-critical data.

Data Deduplication in Windows Storage Server 2012 R2

Deduplication in Windows Server: Data deduplication involves finding and removing duplication within data without compromising its fidelity or integrity. The goal is to store more data in less space by segmenting files into small variable-sized chunks (32–128 KB), identifying duplicate chunks, and maintaining a single copy of each chunk. Redundant copies of the chunk are replaced by a reference to the single copy. The chunks are compressed and then organized into special container files in the System Volume Information folder.
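
The chunk-and-reference idea described above can be shown with a toy example. The sketch is deliberately simplified: it uses small fixed-size chunks instead of the variable-sized 32–128 KB chunks Windows Server uses, it skips compression and container files, and the function names are made up for this illustration.

```python
import hashlib


def dedup_store(data: bytes, chunk_size: int = 4):
    """Split data into chunks, keep one copy per unique chunk, and
    return (chunk_store, references) where references rebuild the data."""
    chunk_store = {}  # hash -> single stored copy of the chunk
    references = []   # ordered list of hashes standing in for the data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # store only the first copy
        references.append(digest)
    return chunk_store, references


def rebuild(chunk_store, references) -> bytes:
    """Reassemble the original data from the references."""
    return b"".join(chunk_store[digest] for digest in references)


data = b"ABCDABCDABCDWXYZ"
store, refs = dedup_store(data)
```

The 16-byte input is split into four chunks, but only two unique chunks are stored. The references alone are enough to rebuild the original bytes, so fidelity is preserved.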

Enhanced Dedupe features in Windows Server 2012 R2

  • Data deduplication for remote storage of Virtual Desktop Infrastructure (VDI) workloads
  • Expand an optimized file on its original path.

When using the Data Deduplication feature for the first time or migrating from a previous version of Windows Server, be sure to consider the following related technologies and issues:

  • BranchCache
  • Failover Clusters
  • DFS Replication
  • FSRM quotas
  • Single Instance Storage or NAS Box

Install and Configure Data Deduplication using GUI

1. Open Server Manager. In the Add Roles and Features Wizard, under Server Roles, select File and Storage Services.

2. Select the File Services check box, and then select the Data Deduplication check box.

3. Click Next until the Install button is active, and then click Install.

4. From the Server Manager dashboard, right-click a data volume and choose Configure Data Deduplication. The Deduplication Settings page appears.

5. In the Data deduplication box, select the workload you want to host on the volume. Select General purpose file server for general data files or Virtual Desktop Infrastructure (VDI) server when configuring storage for running virtual machines.

6. Enter the number of days that should elapse from the date of file creation until files are deduplicated, enter the extensions of any file types that should not be deduplicated, and then click Add to browse to any folders with files that should not be deduplicated.

7. Click Apply to apply these settings and return to the Server Manager dashboard, or click the Set Deduplication Schedule button to continue to set up a schedule for deduplication.

Install and Configure Data Deduplication using Windows PowerShell

Start Windows PowerShell. Right-click the Windows PowerShell icon on the taskbar, and then click Run as Administrator.

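# Install the Data Deduplication role service
Import-Module ServerManager
Add-WindowsFeature -Name FS-Data-Deduplication

# Load the deduplication cmdlets
Import-Module Deduplication

# Enable deduplication on volume E: for VDI (Hyper-V) workloads
Enable-DedupVolume E: -UsageType HyperV

# Or enable it for a general purpose file server (use one of the two)
Enable-DedupVolume E: -UsageType Default

# Deduplicate only files older than 20 days
Set-DedupVolume E: -MinimumFileAgeDays 20

# Show the settings of all deduplication-enabled volumes
Get-DedupVolume | fl

# Run an optimization job now and wait for it to finish
Start-DedupJob E: -Type Optimization -Wait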
