Storage Design in 4 Easy Steps – Part 2
Master Skew and Bandwidth!
Designing for Skew
For a properly designed “hybrid” array (an array that contains both spinning disk and flash) you MUST understand “skew”. In our last article, we defined the working set as the amount of data that’s active at any given time on the storage array.
In part one of Storage Design in 4 Easy Steps we walked through a few sizing steps. For our skew example we’ll use the data below.
The question is, where do we find our skew? Almost all the reports from array vendors will give you some rough idea of your skew. The screenshot below is from what’s commonly referred to as a “MiTrends” report by EMC / Dell; your array vendor almost certainly has something similar.
As you can see there’s quite a bit of information here, including the bandwidth this array is consuming, but we’ll focus on the skew and bandwidth.
From our math we know that we need just over 75,000 IOPS, and 75 percent of that workload is on 26 percent of our LUNs. So what’s 26 percent of the total capacity we require?
371TB * .26 ≈ 96TB of “data skew”
And how many IOPS will be on that 96TB of capacity?
76,055 total IOPS required * .75 ≈ 57,041 IOPS
In a perfect world, the extreme performance tier in your array should be capable of 57,041 IOPS and have 96TB of capacity. If you look at our configuration below, we’ve found a disk size combination that meets both our size and performance requirements:
96TB of capacity and 79,688 IOPS
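The skew math above is easy to script. Here’s a minimal sketch in Python; the figures are the ones from this worked example, and the variable names are my own, not from any vendor tool:

```python
# Skew sizing: what share of capacity holds what share of the I/O?
# Figures below come from the worked example in this article.
total_capacity_tb = 371      # total usable capacity required
total_iops = 76_055          # total IOPS required
skew_capacity_pct = 0.26     # 26% of LUNs carry...
skew_iops_pct = 0.75         # ...75% of the workload

# Capacity and IOPS the extreme-performance (flash) tier should cover
flash_capacity_tb = total_capacity_tb * skew_capacity_pct
flash_iops = total_iops * skew_iops_pct

print(f"Flash tier capacity: {flash_capacity_tb:.0f}TB")  # ~96TB
print(f"Flash tier IOPS:     {flash_iops:.0f}")           # ~57,041
```

Swap in your own array’s totals and skew percentages from your vendor’s report to size your flash tier the same way.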
Could we design something that would “technically” meet the customer’s requirements, but not give them the performance they require? Absolutely we can, and that is why, when you’re comparing options from two vendors, the price may be so different. Look at the next image…
This is what some vendors / partners will try to do. Technically this configuration meets our requirements: it has 96TB of capacity and achieves more than the required IOPS. But look at the size of the SSD capacity… it’s only 5% of the total storage in this configuration. That means the flash tier simply isn’t large enough to hold the amount of data that will be worked on in any given day.
So that important SQL query that runs your business could be left sitting on spinning disk when you really need it on a flash drive, OUCH!
Real World Storage Design Disclaimer!
In the real world, you most likely wouldn’t have the budget for an array with 26% flash; I’m using these inflated numbers as a teaching example. It should illustrate how important it is to look beyond “marketing hype”. Be very careful when anyone comes to you and blindly tosses out performance numbers without understanding your environment and workload. There are other factors from an IOPS perspective that make excellent design even trickier; block size, for example, will also impact your design. The above content should help you call B.S. on a less-than-honest pre-sales engineer.
In my personal experience, about 10% SSD in your array is a good sweet spot for cost and performance. 5% SSD is the absolute minimum that I would ever put in, and that’s ONLY for customers that are HIGHLY cost constrained.
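You can turn these rules of thumb into a quick sanity check. The sketch below compares an array’s flash share against the measured capacity skew and my 10% / 5% guidelines; the function name and the 18.5TB figure (roughly 5% of our 371TB example) are hypothetical illustrations, not part of any vendor tool:

```python
# Rule-of-thumb check: is the flash tier big enough to hold the working set?
# Thresholds reflect the article's guidance: flash should ideally cover the
# capacity skew; ~10% is a sweet spot; below that, expect spillover to disk.
def flash_tier_warning(flash_capacity_tb: float, total_capacity_tb: float,
                       skew_capacity_pct: float) -> str:
    """Compare the flash share of the array to the measured capacity skew."""
    flash_pct = flash_capacity_tb / total_capacity_tb
    if flash_pct >= skew_capacity_pct:
        return "OK: flash tier can hold the active working set"
    if flash_pct >= 0.10:
        return "Caution: flash tier is below the measured skew"
    return "Warning: flash tier likely too small; hot data will spill to disk"

# The 5%-flash configuration from the example above (~18.5TB of 371TB):
print(flash_tier_warning(18.5, 371, 0.26))  # prints the "Warning" line
```

A check like this won’t replace a proper sizing exercise, but it makes the “technically compliant but undersized” configuration jump out immediately.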
Bandwidth calculations are quite a bit more straightforward. Below is a chart I found online. It gives you what I would consider best-case numbers for drive bandwidth; in my personal experience, I would only count on each of those drive types providing 50% of the stated bandwidth.
REMEMBER, any number you see in print will be a vendor’s BEST-CASE scenario; I generally cut it in half!
So how are we going to calculate our bandwidth? Remember, in our example we need 1,500MB/s of bandwidth. It looks like one of those PCIe SSD drives would work, and it most likely would. Sadly, those drives aren’t readily available in storage arrays. You’ll mostly be working with 2.5in SLC or MLC SSDs.
Now that we have a realistic idea of the bandwidth available, we need to apply the RAID penalty.
Most storage arrays (VMAX excluded) use the following RAID types:
RAID 5 (4+1)
RAID 6 (4+2)
If you’re not familiar with RAID, a quick Google search will explain those terms. What this means for us is simple: for our RAID 5 example we should only count on getting 4 drives’ worth of bandwidth. The same is true for our RAID 6 example; we should only plan on getting 4 drives’ worth of bandwidth. This is a VERY oversimplified statement about RAID, but it’s a good rule of thumb to use without going into more detail here.
This is where things get interesting…
It’s very possible that if you have a workload with a very large bandwidth requirement but a very low IOPS requirement, traditional spinning disk could perform better for you. If you don’t need all those IOPS, why would you buy all that flash? This is generally true in big data platforms like Hadoop. Be aware: flash is not always the answer.
In our case, how many drives will we need to achieve 1,500MB/s of bandwidth?
With SSD, each RAID group will give us: 4 x 250MB/s of bandwidth = 1,000MB/s
With NL or SAS, each RAID group will give us: 4 x 100MB/s of bandwidth = 400MB/s
In our case we would need 2 SSD RAID groups, or 4 NL or SAS RAID groups.
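The RAID-group arithmetic above can be sketched in a few lines. The per-drive numbers below are this article’s derated rule-of-thumb figures (50% of best case), not vendor specs, and the function name is my own:

```python
import math

# Bandwidth sizing sketch using the article's rules of thumb:
# RAID 5 (4+1) and RAID 6 (4+2) both count as ~4 data drives of bandwidth.
DATA_DRIVES_PER_GROUP = 4
derated_bw_mb_s = {"SSD": 250, "NL_SAS": 100}  # 50% of best-case vendor numbers

def raid_groups_needed(drive_type: str, target_bw_mb_s: float) -> int:
    """How many RAID groups of this drive type to reach the target bandwidth?"""
    group_bw = DATA_DRIVES_PER_GROUP * derated_bw_mb_s[drive_type]
    return math.ceil(target_bw_mb_s / group_bw)

print(raid_groups_needed("SSD", 1500))     # 2 groups (2 x 1,000MB/s)
print(raid_groups_needed("NL_SAS", 1500))  # 4 groups (4 x 400MB/s)
```

Note that rounding up with `math.ceil` matters here: 1,500MB/s over 400MB/s NL-SAS groups is 3.75, and you can’t buy three-quarters of a RAID group.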
Bandwidth is a much easier issue to handle unless you have some unusual workload. Most of my customers don’t push more than 4-6GB/s of bandwidth on their arrays; the capacity requirements alone mean they will have enough disks to meet their bandwidth needs.
I hope you’ve found these articles helpful. Again, these articles are not a storage technical deep dive, but they should give you a solid foundation for judging whether your vendor or partner is designing a storage platform that meets your needs. They should help you ask the right questions to sniff out a bad storage design.
Have you had positive or negative experiences with your storage vendor or partner? What lessons did you learn that could help others?