I have been deploying storage area networks (SANs) for almost 10 of my 16 years in information technology. I have deployed traditional, software-defined and converged SANs from global vendors such as IBM, EMC, NetApp, HP and Dell. In my previous role I was tasked with deploying Dell Compellent for several clients. I was excited by the opportunity, but I paused after reading the documentation presented to me: I could not correlate the SAN implementation with the outcomes the clients expected. When an overhyped sales pitch is sold to a business on big promises, hidden risks always come with it.
Lesson number one: never trust anyone blindly, even if they have a very decent track record. Resellers are often after a quick sale and a quick exit. Lesson number two: make sure you know whom to trust as your partner in the transition to a new SAN. Decide what to procure based on your business case, ROI, workload analysis, capacity planning and the outcome of your requirements analysis. Consider current technology trends, where you are now, your technology road map and where you want to be in the future. Consider aligning your technology stream with the business you do.
I have written this article to share my own experience and disclose everything I learnt on Dell Compellent deployment projects, so that you can make the call yourself. I will go through each feature of Dell Compellent and what exactly it does once you deploy a Compellent. FYI, I have no beef with Dell. Let’s start now…
Target Market: Small Business
Let’s not go into detail here; that is a topic for another day. Please read Dell’s business proposition: “Ideally suited to smaller deployments across a variety of workloads, the SC Series products are easy to use and value optimized. We will continue to optimize the SC Series for value and server-attach.”
Management: Dell Compellent Storage Center has a GUI that is allegedly designed for ease of use. Wizards cover a few common tasks such as allocation, configuration and administration. The Storage Center monitoring tools provide very little insight into how the storage back end is doing. You have to engage Dell remote support for diagnostics and for monitoring tools with alert and notification services. Storage Center is not as granular as its competitors from NetApp and EMC, and it gives little information on storage performance, bottlenecks and back-end storage issues. Compellent is thin-provisioned by design; there is no option in the management center to create a thick-provisioned volume. The IOPS and latency calculated per volume and those calculated per disk are far apart, and neither reflects the real IOPS. You may see low IOPS on a volume, but click through to the disk-level IOPS and you will see the storage controller struggling to cope. The management center gives no clue as to who is generating that much IO.
Contact technical support and they will say that RAID scrub is killing your storage. Your natural request to tech support is to stop the RAID scrub during business hours. “You cannot do it” is another classic reply from tech support. If you go through the Compellent management center, you will find nothing that can schedule or stop a RAID scrub.
Data Progression: In theory, Data Progression is an automated tiering technology that should optimize the location of data, both on a schedule and on demand as prompted by a storage profile. Compellent’s tiering profiles streamline policy administration by assigning tier attributes based on the profile. In practice, on-demand data progression during business hours will drive a Compellent crazy. If Citrix VDI is your mainstream workload, it is pretty much dead until data progression completes.
The side effect of this technology is that the storage controller struggles to service on-demand data progression and IO requests at the same time, so queue depth builds up and seek times in the back-end storage are longer than normal.
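The link between queue depth and response time can be made concrete with Little's law (outstanding IOs = IOPS × latency). The following is only a sketch in Python: the function name and every figure in it are made up for illustration and are not measurements from a Compellent array.

```python
def average_latency_ms(queue_depth: float, iops: float) -> float:
    """Little's law: outstanding IOs = IOPS * latency, so latency = queue_depth / IOPS."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return queue_depth / iops * 1000.0


# Hypothetical figures: the back end sustains 5,000 IOPS in both cases,
# but the queue grows from 8 to 64 outstanding IOs while tiering runs.
normal = average_latency_ms(queue_depth=8, iops=5000)
during_progression = average_latency_ms(queue_depth=64, iops=5000)
print(f"normal: {normal:.1f} ms, during data progression: {during_progression:.1f} ms")
```

With throughput held constant, an eightfold deeper queue means an eightfold longer average wait per IO, which is what a host sees as a slow volume.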
Storage Profile: In layman’s terms, a storage profile segregates expensive and cheap disks and classifies them as tier 1, tier 2 and tier 3. Profiles determine how the system reads and writes data to disk for each volume (as they are known in Compellent terms) and how data ages over time, a feature called Data Progression.
Storage Profiles are supposed to allow the administrator to manage both writable blocks and replay blocks for a volume. It is essentially tiering of storage in a controlled way. In theory it is supposed to behave like a controlled environment; in reality it adds extra workload to the Dell Compellent controller. Let’s say you have tiered your storage according to your read- and write-intensive IO. What happens when a read- and write-intensive volume gets full? The storage controller automatically triggers an on-demand data progression from the upper tier to the lower tier to store the data. A write-intensive IO load is therefore generated in the lower tier, which is exactly what you wanted to avoid when you profiled or tiered your storage in the first place. Mixing data progression with storage tiering defeats the whole purpose of storage profiling.
Replay: A Replay is essentially a storage snapshot in Dell terms. Dell Compellent Data Instant Replay software creates point-in-time copies called Replays. With Data Instant Replay, Dell Compellent storage can take Replays at any time interval with minimal storage capacity. But here is the catch: you will most likely run Replays at 7pm every day, and every hour for critical volumes, but your backup also triggers at 7pm. The backup generates lots of read IOPS and the Replays generate lots of read and write IOPS at the same time. Hence your backup is going to be dead slow. You will run out of backup window and never finish the backup before 7am. It will be a nightmare to meet your backup and restore SLAs for any file system or application.
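To see how often the two jobs collide, here is a small Python sketch that counts the hourly Replays starting inside a 7pm to 7am backup window. The function name, the arbitrary date and the assumption that the first Replay fires together with the backup are mine, for illustration only.

```python
from datetime import datetime, timedelta


def replays_inside_backup(backup_start, backup_hours, replay_interval_hours):
    """Return the Replay start times that fall inside the backup window."""
    backup_end = backup_start + timedelta(hours=backup_hours)
    hits = []
    t = backup_start  # assumption: the first Replay fires together with the backup
    while t < backup_end:
        hits.append(t)
        t += timedelta(hours=replay_interval_hours)
    return hits


start = datetime(2024, 1, 1, 19, 0)  # 7pm on an arbitrary date
collisions = replays_inside_backup(start, backup_hours=12, replay_interval_hours=1)
print(len(collisions), "Replays start while the backup is running")
```

Every one of those starts adds snapshot IO on top of the backup's read stream, so staggering the daily Replay away from the backup start is the first thing to check.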
RAID Scrub: Data scrubbing is an error-correction technique that uses a background task to periodically inspect storage for errors and then correct detected errors using redundant data in the form of checksums or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, which reduces the risk of uncorrectable errors.
In NetApp you can schedule a RAID scrub to suit your time and needs; in Dell Compellent, however, you cannot schedule a RAID scrub through the GUI or the command line. Dell technical support advised that this is an automated process that takes place every day to correct RAID groups in Dell Compellent. Running an automated RAID scrub has a major side effect: it drives your storage to insane IOPS levels and latency peaks, causing production volumes to suffer and underperform. Virtualization performance degrades so badly that the production environment struggles to serve IO requests. Dell advised that it can do nothing about this, because RAID scrub in the SCOS operating system is an automated process.
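As a rough illustration of the idea, here is a toy scrub over a two-copy mirror in Python. It is a sketch under my own assumptions (in-memory blocks, CRC32 checksums, a hypothetical `scrub` function) and says nothing about how SCOS or any real array implements scrubbing.

```python
import zlib


def scrub(primary, mirror, checksums):
    """Verify every block against its stored CRC32 and repair it from the other copy."""
    repaired = []
    for i, expected in enumerate(checksums):
        p_ok = zlib.crc32(primary[i]) == expected
        m_ok = zlib.crc32(mirror[i]) == expected
        if p_ok and not m_ok:
            mirror[i] = primary[i]      # good primary overwrites bad mirror
            repaired.append(i)
        elif m_ok and not p_ok:
            primary[i] = mirror[i]      # good mirror overwrites bad primary
            repaired.append(i)
        elif not p_ok and not m_ok:
            raise IOError(f"block {i}: uncorrectable, both copies are bad")
    return repaired


blocks = [b"alpha", b"bravo", b"charlie"]
checksums = [zlib.crc32(b) for b in blocks]
primary, mirror = list(blocks), list(blocks)
primary[1] = b"brav0"  # simulate a silent corruption in one copy
print(scrub(primary, mirror, checksums))  # indices of the blocks that were repaired
```

Note that the scrub has to read every block of every copy, which is why an unscheduled scrub competes directly with production IO.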
Multipathing: By implementing an MPIO solution you eliminate any single point of failure in the physical and logical paths among components such as adapters, cables, fabric switches, servers and storage. If one or more of these components fails, causing a path to fail, the multipathing logic uses an alternate path for I/O so that applications can still access their data. Each network interface card (in the iSCSI case) or HBA should be connected through redundant switch infrastructure to provide continued access to storage in the event of a failure in a storage fabric component. This is the fundamental concept of any storage area network, AKA SAN.
New-generation SANs come with integrated multipath I/O (MPIO) support. Both Microsoft and VMware virtualization architectures support iSCSI, Fibre Channel and Serial Attached SCSI (SAS) SAN connectivity by establishing multiple sessions or connections to the storage array. Failover times may vary by storage vendor and can be configured in various ways, but the logic of MPIO remains unchanged.
New MPIO features in Windows Server include a Device Specific Module (DSM) designed to work with storage arrays that support the asymmetric logical unit access (ALUA) controller model (as defined in SPC-3), as well as storage arrays that follow the Active/Active controller model.
The Microsoft DSM provides load-balancing policies that include the following. Which policies are available generally depends on the controller model (ALUA or true Active/Active) of the storage array attached to the Windows-based computer.
- Round-robin with a subset of paths
- Dynamic Least Queue Depth
- Weighted Path
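The three policies listed above can be sketched as simple path-selection functions. This is a toy model in Python, not the Microsoft DSM implementation: the `Path` class, the path names and the convention that a lower weight means a higher priority are assumptions made for this example.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    queue_depth: int = 0   # outstanding IOs on this path
    weight: int = 0        # assumption: lower weight = higher priority
    active: bool = True    # False = standby path, kept out of the active subset


def usable(paths):
    return [p for p in paths if p.active]


def round_robin_subset(paths):
    """Cycle endlessly over the active subset of paths."""
    return itertools.cycle(usable(paths))


def least_queue_depth(paths):
    """Pick the active path with the fewest outstanding IOs."""
    return min(usable(paths), key=lambda p: p.queue_depth)


def weighted_path(paths):
    """Pick the active path with the lowest weight."""
    return min(usable(paths), key=lambda p: p.weight)


paths = [
    Path("hba0-ctrlA", queue_depth=12, weight=1),
    Path("hba1-ctrlA", queue_depth=3, weight=2),
    Path("hba0-ctrlB", queue_depth=0, weight=9, active=False),
]
rr = round_robin_subset(paths)
print([next(rr).name for _ in range(4)])
print(least_queue_depth(paths).name)
print(weighted_path(paths).name)
```

The standby path is never chosen while an active path exists, which mirrors the "subset of paths" idea: the remaining paths are there for failover, not for load sharing.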
VMware-based systems provide Fixed, Most Recently Used (MRU) and Round Robin path policies; Round Robin is the optimal configuration for VMware virtual infrastructure.
To explain ALUA in simple terms: a server can see any LUN as active via both storage processors (also called controllers or NAS heads), but only one of these storage processors “owns” the LUN. Both storage processors can see the logical activity of the storage over the physical connections, either via a SAN switch to the server or via direct SAS cable connections. A Hyper-V or vSphere ESXi server knows which processor owns which LUN and preferably sends traffic directly to the owner. If a controller, processor or NAS head fails, the Hyper-V or vSphere server automatically sends traffic to the surviving processor without any loss of productivity. This is a key feature of EMC, NetApp and HP products.
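The owner-preference logic described above can be sketched in a few lines of Python. The function, the controller names `SP-A` and `SP-B` and the LUN ownership map are hypothetical, and the sketch ignores real ALUA details such as target port groups and path states.

```python
def pick_controller(lun_owner, controllers_up):
    """Prefer the owning controller (optimized path); otherwise use any surviving one."""
    if lun_owner in controllers_up:
        return lun_owner
    if controllers_up:
        return sorted(controllers_up)[0]   # non-optimized path via the partner
    raise RuntimeError("no path to LUN: all controllers are down")


owners = {"lun0": "SP-A", "lun1": "SP-B"}   # hypothetical LUN ownership

# Both controllers healthy: each LUN is reached through its owner.
print({lun: pick_controller(o, {"SP-A", "SP-B"}) for lun, o in owners.items()})

# SP-A has failed: lun0 traffic moves to SP-B without the host losing access.
print({lun: pick_controller(o, {"SP-B"}) for lun, o in owners.items()})
```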
Let’s look at Dell Compellent now. Dell Compellent does not offer true Active/Active controllers for any of its storage. See “Dell Controllers Explained!”, a Dell Verified Answer on the Dell forum:
“In the Compellent Architecture, both controllers are active. Failover is done at either the port or controller level depending on how the system was installed. Volumes are “owned” by a specific controller for the purposes of mapping to servers. Changing the owning controller can be done – but it does take a volume down.”
I can confirm that this is exactly what Dell customer support advised when I called them. Dell Compellent can take 60 to 90 seconds to fail over from one controller to the other, which means the entire virtual environment goes offline for a while before coming back online. To update firmware or replace a controller you have to bring everything down and then bring everything back online, which causes a major outage and productivity loss for the entire organization.
Performance Issue: To identify Dell Compellent bottlenecks for a virtualization platform hosted on Compellent, run Windows perfmon in a virtual machine or on a physical machine to which a Compellent volume is presented via an HBA or iSCSI initiator. Create a data collector set with the counters below and generate a report using the PAL tool. Extract seek time, latency, IOPS and queue depth for the Compellent storage. You will see bottlenecks in every area of storage you can think of. Read further on Windows Performance Monitoring Tools.
\LogicalDisk(*)\Avg. Disk sec/Read
\LogicalDisk(*)\Avg. Disk sec/Write
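If you export the data collector set to CSV, a few lines of Python are enough for a first look at the two latency counters before running a full PAL report. This is a sketch only: the sample data is made up, the `summarize_latency` helper is hypothetical, and the 20 ms threshold is my assumption, not a Dell or Microsoft figure.

```python
import csv
import io
from statistics import mean

# Made-up sample in the CSV layout that perfmon exports; values are in seconds.
SAMPLE = r'''"(PDH-CSV 4.0) (GMT Standard Time)(0)","\\HOST\LogicalDisk(E:)\Avg. Disk sec/Read","\\HOST\LogicalDisk(E:)\Avg. Disk sec/Write"
"01/01/2024 19:00:00.000","0.004","0.006"
"01/01/2024 19:00:15.000","0.035","0.050"
"01/01/2024 19:00:30.000"," ","0.040"
'''


def summarize_latency(csv_text, threshold_ms=20.0):
    """Return {counter: (mean_ms, max_ms, samples_over_threshold)} per counter column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    summary = {}
    for col in range(1, len(header)):          # column 0 is the timestamp
        values_ms = []
        for row in data:
            try:
                values_ms.append(float(row[col]) * 1000.0)
            except (ValueError, IndexError):   # skip blank or missing samples
                continue
        if values_ms:
            over = sum(v > threshold_ms for v in values_ms)
            summary[header[col]] = (mean(values_ms), max(values_ms), over)
    return summary


for counter, (avg, peak, over) in summarize_latency(SAMPLE).items():
    print(f"{counter}: avg {avg:.1f} ms, peak {peak:.1f} ms, {over} samples over threshold")
```

To use it on real data, read the exported file into a string and pass it in place of `SAMPLE`; sustained samples above your own threshold during Replay, data progression or scrub windows are the ones worth correlating.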
Use the following Tools to analyse workloads and storage performance in your storage area network:
Summary: Dell Compellent makes an interesting argument for all-flash performance tiers. Yes, but that argument holds in the sales pitch, not in reality. A price-conscious buyer on a tight budget who just needs any SAN and has a low-IO environment can live with Compellent. For mainstream enterprise storage, Dell Compellent is a bad experience and can bring disaster to a corporate storage area network.