Dell Compellent: A Poor Man’s SAN


I have been deploying storage area networks (SANs) for almost ten of my 16 years in information technology. I have deployed traditional, software-defined and converged SANs from global vendors such as IBM, EMC, NetApp, HP and Dell. In my previous role I was tasked with deploying Dell Compellent for several clients. I was excited about the opportunity, but paused after reading the documentation presented to me: I could not correlate the proposed SAN implementation with the outcome the clients expected. When an overblown sales pitch is sold to a business on big promises, hidden risks always come with it.

Lesson number one: never trust anyone blindly, even if they have a decent track record. Resellers are often after a quick sale and a quick exit. Lesson number two: know who you can trust as your partner in the transition to a new SAN. Decide what to procure based on your business case, ROI, workload analysis, capacity planning and the outcome of requirements analysis. Consider current technology trends, where you are now, your technology road map and where you want to be in future, e.g. AWS or Azure. Capital investment may be a one-off exercise these days, before you pull the plug on the on-premises infrastructure and forklift to Azure or Amazon. Align your technology stream with the business you do.

I have written this article to share my own experience and disclose everything I learnt on Dell Compellent deployment projects so that you can make the call yourself. I will go through each feature of Dell Compellent and explain what it actually does once deployed. FYI, I have no beef with Dell. Let’s start now… “Marketing/sales pitch” vs “practical implication.”

Target Market: Small Business

Let’s not go into detail here; that is a topic for another day. Please read Dell’s business proposition: “Ideally suited to smaller deployments across a variety of workloads, the SC Series products are easy to use and value optimized. We will continue to optimize the SC Series for value and server-attach.”

Management Interface: Dell Compellent Storage Center has a GUI that is allegedly designed for ease of use. Wizards cover a few everyday tasks such as allocation, configuration and administration. The Storage Center monitoring tools provide very little insight into how the storage back end is doing; for diagnostics, and for monitoring with alerts and notifications, you have to engage Dell remote support. Storage Center is not as granular as competing products from NetApp and EMC, and it gives little information on storage performance, bottlenecks and back-end storage issues. Compellent is thin-provisioned by design, and there is no option in the management center to create a thick-provisioned volume. IOPS and latency reported at the volume level differ greatly from the IOPS and latency reported at the disk level. You may see few IOPS on a volume, but at the drive level you will see the storage controller struggling to cope with the load. The management center gives no clue as to what is generating that many IOPS.

Contact technical support and they will say a RAID scrub is killing your storage. Your obvious request to tech support is to stop the RAID scrub during business hours. “You cannot do it” is another classic reply. If you go through the Compellent management center, you will find nothing that can schedule or stop a RAID scrub.

Data Progression: In theory, Data Progression is an automated tiering technology that optimises the location of data, both on a schedule and on demand as prompted by a storage profile. Compellent’s tiering profiles streamline policy administration by assigning tier attributes based on the profile. In practice, on-demand data progression during business hours will drive a Compellent crazy. If Citrix VDI is your mainstream workload, it is pretty much dead until data progression completes.

A side effect of this technology is that the storage controller struggles to service on-demand data progression and IO requests at the same time. The result is deeper queues and longer-than-normal seek times in the back-end storage.

Storage Profile: In layman’s terms, a storage profile segregates expensive and cheap disks into tier 1 (SSD, RAID 10), tier 2 (15K Fibre Channel, RAID 10, RAID 5 or RAID 6) and tier 3 (7.2K SATA, RAID 5 or RAID 6). The storage profile determines how the system reads and writes data to disk for each volume, and how the data ages over time, a feature called Data Progression. For example, random read requests go to tier 1, where you keep hot data, and year-old emails go to tier 3.

Storage Profiles are supposed to let the administrator manage both writable blocks and replay blocks for a volume. It is fundamentally tiering of storage in a controlled way, at least in theory. In reality, it adds extra workload to the Dell Compellent controller. Let’s say you have tiered your storage according to READ-intensive and WRITE-intensive IO. What happens when a READ- and WRITE-intensive volume gets full? The storage controller automatically triggers an on-demand data progression from the upper tier to a lower tier to make room. That generates WRITE-intensive IO in the lower tier, which is exactly what you wanted to avoid when you profiled or tiered your storage in the first place. Mixing data progression with storage tiering defeats the whole purpose of storage profiling.

Compellent Replay: Replay is essentially Dell’s term for a storage snapshot. Dell Compellent Data Instant Replay software creates point-in-time copies called Replays. With Data Instant Replay, a Dell Compellent can take Replays at any time interval with minimal storage capacity. But here is the catch: you will most likely run Replays during the daily backup window. Backup generates a lot of READ IOPS, and Replays generate a lot of READ and WRITE IOPS in that same window. Hence your backup is going to be dead slow. You will overrun the backup window and never finish the backup before business hours. It will be a nightmare to meet data retention SLAs and to restore file systems and sensitive applications.

IOPS & Latency: Input/output operations per second (IOPS) is a unit of measurement for any hard disk or storage area network (SAN). It is a key performance metric of a SAN regardless of manufacturer, and that does not change. If you are to measure a SAN, this is where you begin. Never assume that because you only have a bunch of virtual machines it is okay to buy a SAN without considering IOPS. There is a difference between a virtualised DHCP server and a virtualised SQL server. A DHCP server may generate 20 IOPS, but a SQL server can generate 5000 IOPS depending on what you run on it. Every query sent to a SQL server, or to an application that depends on it, generates both read and write IOPS. For a Citrix VDI and App customer, every Word document you load generates read IOPS, and every click on the save button generates write IOPS. Now multiply that by the number of users and sessions you are running.
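The multiplication described above can be sketched in a few lines of Python. This is a rough sizing aid, not a vendor tool: the DHCP and SQL figures come from the text, while the VDI per-session IOPS, the session count and all read/write ratios are hypothetical placeholders that you would replace with numbers measured in your own environment.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    iops_per_instance: float   # steady-state IOPS one instance generates
    instances: int             # number of servers, users or sessions
    read_fraction: float       # share of the I/O that is reads (0.0 to 1.0)


def total_iops(workloads):
    """Return (read_iops, write_iops, total_iops) summed over all workloads."""
    reads = sum(w.iops_per_instance * w.instances * w.read_fraction
                for w in workloads)
    writes = sum(w.iops_per_instance * w.instances * (1 - w.read_fraction)
                 for w in workloads)
    return reads, writes, reads + writes


# DHCP (20 IOPS) and SQL (5000 IOPS) are the article's figures.
# The VDI line and every read_fraction are hypothetical.
estate = [
    Workload("DHCP server", 20, 1, 0.5),
    Workload("SQL server", 5000, 1, 0.7),
    Workload("VDI session", 15, 200, 0.3),
]

reads, writes, total = total_iops(estate)
print(f"read IOPS:  {reads:.0f}")
print(f"write IOPS: {writes:.0f}")
print(f"total IOPS: {total:.0f}")
```

Keeping reads and writes separate matters because the two are not equally expensive for the array; a write-heavy VDI estate can stress the back end far more than its headline IOPS figure suggests.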

Now think about latency. In plain English, latency is the number of seconds or milliseconds you wait to retrieve information from a hard disk drive. It is measured as the round trip between issuing your request and the disk serving it. Now imagine millions of requests bombarding the storage area network. A SAN must sustain those requests and serve the applications; how well it needs to do so depends on the workload you run on it. For example, file servers, Citrix profiles, Citrix VDI, Exchange Server and SQL Server need a low-latency SAN.
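Latency and IOPS are tied together by Little's law: the average number of I/Os in flight equals the I/O rate multiplied by the time each I/O takes. The sketch below uses that relationship to show why rising latency caps throughput. The queue depth of 32 and the latency values are hypothetical examples, not Compellent limits.

```python
def outstanding_ios(iops, latency_ms):
    """Little's law: average I/Os in flight = I/O rate x time per I/O."""
    return iops * (latency_ms / 1000.0)


def max_iops(queue_depth, latency_ms):
    """Upper bound on IOPS when at most `queue_depth` I/Os can be in flight."""
    return queue_depth / (latency_ms / 1000.0)


# Hypothetical host queue depth of 32 at three different latencies.
for latency_ms in (1, 5, 20):
    ceiling = max_iops(32, latency_ms)
    print(f"{latency_ms:>2} ms latency -> at most {ceiling:,.0f} IOPS")
```

The point of the exercise: when a background task pushes latency from a few milliseconds to tens of milliseconds, the number of I/Os a host can complete per second falls in proportion, whatever the array is rated for.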

In Dell Compellent, you may see, say, 2000 IOPS on a volume, but if you view the disks hosting that same volume you might see 5000 IOPS. You then have to ask where the extra 5000 - 2000 = 3000 IOPS come from. Does Compellent have any tool to interrogate the storage controller and see how the additional workload is generated? No, it doesn’t. Your only bet is Dell support telling you the truth, if you are lucky. The answer is that an automated RAID scrub is generating the extra workload, i.e. 3000 IOPS that could have been used for real workloads.
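One way to sanity-check such a gap is to first subtract the back-end I/O that the host workload itself should produce. RAID adds overhead to writes (commonly quoted penalties are 2 back-end I/Os per write for RAID 10, 4 for RAID 5 and 6 for RAID 6), so some difference between volume and disk IOPS is expected even with no background task running. The sketch below is an estimate under those textbook penalties; the 70% read ratio is a hypothetical assumption, and only the 2000 and 5000 figures come from the text.

```python
# Commonly quoted RAID write penalties: back-end I/Os per front-end write.
RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def expected_backend_iops(frontend_iops, read_fraction, raid_level):
    """Back-end IOPS that the host workload alone should generate."""
    reads = frontend_iops * read_fraction
    writes = frontend_iops - reads
    return reads + writes * RAID_WRITE_PENALTY[raid_level]


def unexplained_iops(frontend_iops, observed_backend_iops,
                     read_fraction, raid_level):
    """Back-end IOPS not accounted for by the workload plus RAID overhead."""
    expected = expected_backend_iops(frontend_iops, read_fraction, raid_level)
    return observed_backend_iops - expected


# 2000 volume IOPS and 5000 disk IOPS are the article's example figures.
print("raw gap:", 5000 - 2000)
print("unexplained at 70% reads on RAID 10:",
      unexplained_iops(2000, 5000, 0.7, "RAID10"))
```

Whatever remains after this subtraction is the load worth raising with support, since it is the part that background tasks such as scrubs, Replays or Data Progression would have to explain.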

To relate this analysis to an all-flash array such as Dell Compellent: the SAN must be able to deliver the major benefits of a storage area network. If the storage cannot provide low latency and high IO throughput for sensitive applications and workloads, then you need to go back to the drawing board or hire a consultant who can analyse your requirements and recommend options that match your needs and budget. For further reading, see the Citrix validated solutions and the storage best practices recommended by VMware and Microsoft. There are many tools on the market for analysing application workloads on virtual or physical infrastructure.

RAID Scrub: Data scrubbing is an error correction technique that uses a background task to inspect storage for errors periodically, and then correct detected errors using redundant data in the form of different checksums or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, leading to reduced risks of uncorrectable errors.
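To make the mechanism concrete, here is a toy scrub over a single-parity stripe. It is an illustration of the general technique only and says nothing about how SCOS implements scrubbing internally: the stripe layout, the use of CRC32 checksums and all function names are assumptions made for the example.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Store the data blocks, their parity block and a CRC32 per data block."""
    return {
        "data": list(data_blocks),
        "parity": xor_blocks(data_blocks),
        "crc": [zlib.crc32(block) for block in data_blocks],
    }


def scrub(stripe):
    """Read and verify every data block; rebuild one bad block from parity.

    Returns the indexes of the blocks that were repaired.
    """
    bad = [i for i, block in enumerate(stripe["data"])
           if zlib.crc32(block) != stripe["crc"][i]]
    if len(bad) > 1:
        raise ValueError("more than one bad block: not correctable "
                         "with single parity")
    for i in bad:
        survivors = [b for j, b in enumerate(stripe["data"]) if j != i]
        stripe["data"][i] = xor_blocks(survivors + [stripe["parity"]])
    return bad


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
stripe["data"][1] = b"BxBB"          # simulate silent corruption
print("repaired blocks:", scrub(stripe))
print("block 1 is now:", stripe["data"][1])
```

Note that even a clean scrub has to read every block in every stripe. That full read pass is why a scrub competes with production I/O, and why being able to schedule it outside business hours matters.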

In NetApp, you can schedule a RAID scrub at a time that suits you; in Dell Compellent, you cannot schedule a RAID scrub through either the GUI or the command line. Dell technical support advised that it is an automated process that takes place every day to correct RAID groups in Dell Compellent. This automatic RAID scrub has a major side effect: it drives your storage to insane IOPS levels, and latency peaks so high that production volumes suffer and underperform. Virtualisation performance degrades so badly that the production environment struggles to serve IO requests. Dell advised that it can do nothing about the RAID scrub because it is an automated process in the SCOS operating system.

Compellent Multipathing: By implementing an MPIO solution you eliminate single points of failure in the physical and logical paths among components such as adapters, cables, fabric switches, servers and storage. If one or more of these elements fails, causing a path to fail, the multipathing logic uses an alternate path for I/O so that applications can still access their data. Each network interface card (in the iSCSI case) or HBA should be connected through redundant switch infrastructure to provide continued access to storage if a storage fabric component fails. This is a fundamental concept of any storage area network (SAN).

New-generation SANs come with integrated multipath I/O (MPIO) support. Both Microsoft and VMware virtualisation architectures support iSCSI, Fibre Channel and Serial Attached SCSI (SAS) SAN connectivity by establishing multiple sessions or connections to the storage array. Failover times vary by storage vendor and can be configured in various ways, but the logic of MPIO remains unchanged.

New MPIO features in Windows Server include a Device Specific Module (DSM) designed to work with storage arrays that support the asymmetric logical unit access (ALUA) controller model (as defined in SPC-3), as well as storage arrays that follow the Active/Active controller model.

The Microsoft DSM provides the following load balancing policies. Microsoft load balance policies are generally dependent on the controller design (ALUA or true Active/Active) of the storage array attached to Windows-based computers.

Failover
Failback
Round-robin
Round-robin with a subset of paths
Dynamic Least Queue Depth
Weighted Path
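Two of the policies listed above, round-robin and least queue depth, are easy to illustrate with a toy model. This is a conceptual sketch of the selection logic only, not the Microsoft DSM implementation; the `Path` class, the path names and the queue-depth numbers are all hypothetical.

```python
class Path:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.queue_depth = 0   # I/Os currently outstanding on this path


class RoundRobin:
    """Hand out healthy paths in turn, skipping any that have failed."""

    def __init__(self, paths):
        self.paths = paths
        self.next_index = 0

    def pick(self):
        for _ in range(len(self.paths)):
            path = self.paths[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.paths)
            if path.healthy:
                return path
        raise RuntimeError("all paths down")


def least_queue_depth(paths):
    """Pick the healthy path with the fewest outstanding I/Os."""
    healthy = [p for p in paths if p.healthy]
    if not healthy:
        raise RuntimeError("all paths down")
    return min(healthy, key=lambda p: p.queue_depth)


paths = [Path("hba0"), Path("hba1")]
rr = RoundRobin(paths)
print([rr.pick().name for _ in range(4)])   # alternates between both paths

paths[0].healthy = False                     # simulate a failed cable or switch
print([rr.pick().name for _ in range(2)])   # I/O continues on the survivor

paths[0].healthy = True
paths[0].queue_depth, paths[1].queue_depth = 8, 2
print(least_queue_depth(paths).name)         # the less busy path wins
```

In both policies a failed path is simply skipped, which is the behaviour the multipathing section describes: applications keep their access to data as long as one path survives.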

VMware-based systems provide Fixed, Most Recently Used (MRU) and Round Robin path selection policies. Round Robin is the optimal configuration for VMware virtual infrastructure.

ALUA in simple terms: a server can see any LUN as active via both storage processors (also called controllers or NAS heads), but only one of those storage processors “owns” the LUN. Both storage processors can see the storage over their physical connections, either via a SAN switch to the server or via direct SAS cables. The Hyper-V or vSphere ESXi server knows which processor owns which LUNs and preferably sends traffic directly to the owner. If a controller, processor or NAS head fails, the Hyper-V or vSphere server automatically sends traffic to the surviving processor without any loss of productivity. This is an essential feature of EMC, NetApp and HP products.
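The owner-preference logic can be sketched as a small decision function. This is a simplified model of what a host multipath driver does under ALUA, using the standard path state names (active/optimized for the owning controller, active/non-optimized for its partner). The controller names and the function itself are hypothetical, and real arrays also move LUN ownership after a failure, which this sketch leaves out.

```python
def choose_controller(lun_owner, controllers_up):
    """Return (controller, path_state) for I/O to a LUN under an ALUA-style model.

    lun_owner: name of the controller that owns the LUN.
    controllers_up: set of controller names currently reachable.
    """
    if lun_owner in controllers_up:
        # Preferred case: talk directly to the owning controller.
        return lun_owner, "active/optimized"
    if controllers_up:
        # Owner is gone: a surviving controller still serves the LUN.
        return sorted(controllers_up)[0], "active/non-optimized"
    raise RuntimeError("no controller reachable")


print(choose_controller("A", {"A", "B"}))   # normal operation
print(choose_controller("A", {"B"}))        # controller A has failed
```

The key property is in the second call: the LUN stays reachable through the partner controller without being taken offline, which is the behaviour the comparison with Compellent turns on.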

Let’s look at Dell Compellent now. Dell Compellent does not offer true Active/Active controllers for any storage. Here is how the controllers are explained in a Dell Verified Answer on the Dell forum:

“In the Compellent Architecture, both controllers are active. Failover is done at either the port or controller level depending on how the system was installed. Volumes are “owned” by a particular controller for mapping to servers. Changing the owning controller can be done – but it does take a volume down.”

I can confirm that this is exactly what Dell customer support advised when I called them. Dell Compellent can take 60 to 90 seconds to fail over from one controller to the other, which means the entire virtual environment goes offline for a while before coming back. To update firmware or replace a controller you have to bring everything down and then bring everything back online, causing a major outage and productivity loss for the entire organisation.

Multipathing options supported by the Host Utilities

Multipath I/O Overview

Multipathing Considerations

Dell Compellent is not an ALUA Storage Array

Performance Issue: To identify Dell Compellent bottlenecks for a virtualisation platform hosted on Compellent, use Windows perfmon in a virtual or physical machine to which a Compellent volume is presented via HBA or iSCSI initiator. Create a data collector set with the counters below and generate a report using the PAL tool. Extract seek time, latency, IOPS and queue depth for the Compellent storage. You will see a bottleneck in every area of storage you can think of. Read further on Windows Performance Monitoring Tools.

\LogicalDisk\Avg. Disk Sec/Read

\LogicalDisk\Avg. Disk Sec/Write

\LogicalDisk\Disk Bytes/Sec

\LogicalDisk\Disk Reads/Sec

\LogicalDisk\Disk Writes/Sec

\LogicalDisk\Split IO/sec

\LogicalDisk\Disk Transfers/sec
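If you save the data collector set as CSV, a short script can average each counter without any extra tooling. The sketch below assumes the usual perfmon CSV layout (a timestamp in the first column, one column per counter, blanks for missed samples); the host name, drive letter and sample values are made up for illustration, and in practice you would read the text from your own log file.

```python
import csv
import io
from statistics import mean


def summarize_perfmon_csv(csv_text):
    """Return {counter_name: mean_value} for every counter column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    summary = {}
    for col in range(1, len(header)):            # column 0 is the timestamp
        values = []
        for row in samples:
            try:
                values.append(float(row[col]))
            except (ValueError, IndexError):     # blank or missing sample
                continue
        if values:
            summary[header[col]] = mean(values)
    return summary


# Hypothetical three-sample export; real logs have many more rows.
SAMPLE = r'''"(PDH-CSV 4.0) (GMT Standard Time)(0)","\\HOST1\LogicalDisk(D:)\Avg. Disk sec/Read","\\HOST1\LogicalDisk(D:)\Disk Transfers/sec"
"01/01/2020 09:00:00.000","0.004","1800"
"01/01/2020 09:00:15.000","0.030","5200"
"01/01/2020 09:00:30.000"," ","4100"
'''

for name, value in summarize_perfmon_csv(SAMPLE).items():
    print(f"{name}: {value:.4f}")
```

The `Avg. Disk sec/Read` and `Avg. Disk sec/Write` counters are reported in seconds, so multiply by 1000 before comparing them with latency targets quoted in milliseconds.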

Use the following Tools to analyse workloads and storage performance in your storage area network: 

Capacity planning & workload analysis tools

Multi-vendor storage performance and capacity monitoring

RVTools 

Windows Perfmon

PAL Analysis Tools

Storage load generator / performance test tool

Dell EqualLogic Storage Management Pack Suite for SCOM

Monitoring EMC storage using SCOM 2012 SP1 with ESI Management Packs

IBM Storage Management Pack for Microsoft System Center Operations Manager

Summary: Dell Compellent makes a compelling argument for all-flash performance tiers. Yes, but that argument lives in the sales pitch, not in reality. A price-conscious poor man who just needs any SAN and has a low-IO environment can have Compellent. For mainstream enterprise storage, Dell Compellent is a bad experience and can bring disaster to a corporate storage area network.

I have no doubt that Compellent was innovative when it introduced all-flash arrays, but Compellent’s best days are gone. Just shop around and you will find better flash arrays nowadays, built on better software, controllers and SSDs. There are flash arrays on the market that run clever code and algorithms to deliver high IO, low latency and performance for sensitive applications.

Related Articles: 

Pro Tips For Storage Performance Testing

Storage Top 10 Best Practices

SQLIO download page

SQLIOSim tool KB article

SQL Server Storage Engine Team Blog

SQL Server I/O Basics

SQL Server I/O Basics – Chapter 2

Predeployment I/O Best Practices

Disk Partition Alignment Best Practices for SQL Server

EMC Symmetrix DMX-4 Enterprise Flash Drives with Microsoft SQL Server Databases

Implementing EMC CLARiiON CX4 with Enterprise Flash Drives for Microsoft SQL Server 2008 Databases

Microsoft. “Lab Validation Report: Microsoft Windows Server 2012 with Hyper-V and Exchange 2013.”

Microsoft TechNet. “Exchange 2013 Virtualization.”

Microsoft TechNet. “Failover Clustering Hardware Requirements and Storage Options.” Aug 2012.

About Raihan Al-Beruni

My name is Raihan Al-Beruni. I work as an Infrastructure Architect in Data Center Technologies in Perth, Western Australia. I have been working on Microsoft technologies for more than 15 years. Besides Microsoft technologies, I also work on Citrix validated solutions and VMware data center virtualization technologies. I have a Master’s degree in E-Commerce. I am certified in Microsoft, VMware, ITIL and EMC. My core focus is on cloud technologies. In my blog I share my knowledge and experience to enrich the information technology community as a whole. I hope my contribution through this blog will help someone who wants more information on data center technologies.

24 Responses to Dell Compellent: A Poor Man’s SAN

  1. steve says:

    Thanks for this. My boss is looking at one of these, but I prefer the look of the EMC Unity 300 all-flash array. Have you any blogs on those EMC Unity flash SANs?

  2. Ugo says:

    We’ve been using a Compellent system for 2 years and here’s our experience: 1- Data Progression and RAID scrubs run in the background at a high priority (all production iops are degraded before executing a RAID scrub or Data Progression operation). We’ve seen a performance hit from these operations. 2- The failover time, for us, is almost 60 seconds. We have never done live firmware updates and failover tests. 3- replays create a lot of read iops. We perform daily snapshots during our backup window and it’s not a transparent process. I don’t have experience with any other competitor because we were on DAS before, but I can say that the Compellent system has been a pain, has performed least to our expectations and the support has been less than expected. They also have a pricing model that separates the licensing from the hardware, so you typically get to pay what you want. I can’t confirm because we’ve just started asking for quotes from other vendors.

  3. Bill says:

    Every storage vendor does things differently, so it is up to customers to research how the product works. There can be many other factors such as not configuring storage correctly, not testing failover before going into production, etc. The customer had not ever tested the failover and also neglected to do any firmware updates for years.

    With regard to data progression/tiering and compression/dedupe, there are cases where Compellent did not perform well. For example, if you have one 24-bay shelf of SSDs, one 24-bay shelf of 10k and one 12-bay shelf of 7.2k, then obviously we have serious issues once it pushes older data down to the lowest tier! SSDs have thousands of IOPs per disk, 10k is perhaps 200 and 7.2k is under 100. So in this scenario, we have latency hits. I have run backup jobs against production volumes while background processes were running and did experience latency issues.

    I can bad mouth about Dell’s products. But even if you are spending double, triple or 10 times the cost for some custom-built SAN and paying for someone else to manage it, I assure you that Dell Compellent SAN will have issues.

    So in conclusion, I will say that Compellent have various weaknesses.

  4. Jason says:

    We selected Compellent solely based on price. Honestly I would rate it only slightly better than a QNAP we used (which was even cheaper). If performance and reliability are factors in your decision (and they should be) I would recommend looking at something like a VNXe.

  5. John says:

    Hey, having 2 years of experience with the Dell SC series, I can only say its performance is very low when it comes to high enterprise demands… I could go on for hours about that statement, from the VVols VM (the metadata VM, which MUST NOT be turned off) to RAID scrubbing, to updating controllers and ending up reimaging both of them due to unknown bugs, up to very bad white space reclaim performance…

    I am doing Unity now and its machine.

  6. Steven says:

    We selected the Compellent because of its comparatively low price but ended up in the gutter, because we could not migrate our production VDI infrastructure to Compellent due to poor performance. We started to receive a lot of complaints from end users that VDI sessions were flaky and applications were slow to load.
    Now we have had to roll the VDI infrastructure back to the old storage and extend its maintenance contract. We are stuck between the Compellent and our old HP storage. It appears that our old HP storage delivers more I/O than the new Compellent. I will never buy this product again.

  7. John says:

    Wait until you get ODX problems if you are using MS platform 🙂

  8. Steven Crockett says:

    We had volume corruption on several occasions. We restored from backup. I wish Dell would listen to customers and improve this product. This is not the experience a customer should get. This is not the way a customer should be treated by customer service. Beware when you buy Compellent.

  9. MARK VASSALLO says:

    We experienced the worst when we bought this product. At first we thought it was a minor teething problem. The more workloads we migrated to Compellent, the greater the latency spikes on the storage. At some point, our VMware iSCSI stopped functioning because there was no response from the storage. We went through so much trouble to buy Compellent, and now we cannot use it because of high latency on the storage; our Exchange DAG especially took the hit from this stupid storage. Why on earth would Dell manufacture such a product!

  10. Compellent User says:

    I agree with your critique of the Compellent, with one exception:

    You actually can thick-provision (pre-allocate) volumes. Formerly this option was exposed through the GUI, but after a SCOS (Storage Center OS) upgrade to 6.7.x, that function seems to only be available through PowerShell. However…while thick provisioning is possible, it conflicts so strongly with the “Compellent” way of life that I would not suggest using it at all.

    I urge you not to use Compellent underneath any OS+filesystem combination that cannot use thin-provisioned volumes (ie., the OS+FS must support SCSI TRIM & UNMAP commands).

    But there is another problem: you can thick provision using the CLI, but you end up corrupting the volume. Volume corruption is a common occurrence in the Dell Compellent series.

    • Compellent User says:

      It appears that the experience shared in this blog is correct and understandable. We have had a similar experience to the one explained here. When I was searching for a solution to the performance issues of Compellent storage, I got really frustrated with the deceptive and misleading answers from Dell support. An inexperienced user like me should be given explanations in plain English, and Dell support should be truthful about this product. I would have guessed that Dell support, unlike the sales guys, would not lie on technical matters but be truthful and factual about the performance issues. I am frustrated that Dell support didn’t tell me the truth about RAID scrub, volume corruption and the high latency of Compellent. Dell support continues to blame the operating system, either Linux or Microsoft. Dell support continues to blame the Citrix application for underperforming, whereas you can see the high latency and IOPS issues on the VMware vSphere performance chart. It is shocking customer support by Dell.

  11. Compellent user says:

    We have a compellent in our company. The best thing about the thing is : “the box it came in…” put it back in and return to sender. Copilot support is money down the drain. Use it as JBOD no more no less. Little experiences:
    – Firmware Upgrade = downtime
    – Cache controllers are slow and come withe the huge amount of memory <500Mb and slowest thing in the box (for we know)
    – Performance = wait for it… sloooowww
    – Technical know-how is probably gone to some other company. it's not there
    – Copilot optimize = Which dementia care program did they hire? All questions asked are forgotten the next time you speak with them…
    – Storage manager needs reinstall around every 6 months
    – After firmware upgrade you will wait and just pull the power plug and cross fingers..
    – HA ( Live Volume) = a stunning 200 Volumes which will get you … nowhere
    – Use the rest interface and you will be amazed how easy you crash the box
    Positive thing I have to say : You never know what will happen next…
    – Hourly spikes in the performance off all your system.. telling you high latency…

    After buying one you will get lots of experience of things you don't want in your new storage… Good luck…

    My advise to Micheal Dell : sell it while you can… otherwise run… For all new potential customers: run fools, run….

  12. XL4 says:

    We just installed SC4020s and stumbled on a barrage of problems. We cannot fully deploy the storage in production. This is not the experience we wanted to go through. Dell “coordination” and design completely fell apart. We will probably go with another vendor (again) next time.

  13. Shawn Harper says:

    I really think that a lot of the information here is presented based on deeper understanding of Compellent. The Dell Storage Manager/Enterprise Manager Data Collector software does not have any automation at all. The issues with RAID scrub and progression sound like the system was not working as expected.

  15. v3rd1ct says:

    I usually just lurk, but after reading so many comments I had to pitch in. We run CML (SC40s and SC9000) and I’ve got nothing good to say about this SAN. If I had to compare this SAN with the rest of the industry, then I can say that this is by far the worst SAN we have ever experienced, including a broken SSH remote session from the Compellent SAN to Co-Pilot for support.

  16. Brendan says:

    We used various storage time to time including IBM, HP and Dell. We had the last opportunity to use on-premises storage infrastructure. storage area network evolves a lot over the years and new storage vendor like Nimble does 100 times better job than the competition storage of same class. If I make an honest comparison between Nimble vs Compellent. I have no doubt in mind that Nimble will perform better with workloads like Citrix, SQL, Oracle. You should care about workloads not storage itself. Focus is workloads not virtualization, fabric or storage. Build an infrastructure based on the demand of workloads PERIOD.

  17. Samuel De Silva says:

    Here is another issue: Dell Compellent with VMware SRM. SRM Reprotect fails. Co-Pilot said to delete the volume… and redo SRM… a very strange suggestion!!

    • John Bader says:

      I have had the same experience as you. We engaged a Dell consultant but the outcome was the same. There are software bugs in Compellent as well.

  18. James Lewis says:

    I have experienced many of the issues presented here. SC upgrades have interrupted my production environment nearly cost my job, and I’ve stopped upgrading my compellent storage. CoPilot support has not been a pleasant experience for me, and I really would have appreciated if co-pilot was based in the US as I have a hard time understanding Indian accents. In future, I will move from an SC8000 to the Nimble/HP/NetApp. I was annoyed how hard it was to manage Compellent with no automation at all. Obviously, your mileage may vary, but I’ve had a very negative experience overall.

  19. Kamal Gupta says:

    Dell Compellent is the worst SAN ever!, SC8000, SC4020 they make great pinatas . They claim the SC4020 is redundant, however when they have to replace the entire Chassis, you must power down both controllers since it is a “all in one” Don’t even start when you have a dual drive failure, you will be stuck monitoring a raid rebuild while Copilot tries to enable hot spares. Their upper management is even worse and that is including you Michael Dell. from the Copilots I worked with that are no longer there, they were great, however I was told that Dell Compellent gets butt hurt when their employees give their 2 week notice, Management gets upset and terminates them on spot because these guys found a better place to work. Sounds just like Dell. Dell – Doesn’t Even Last Long.
