10 Oct Spartans and Servers and Storage, Oh My!
In 480 B.C., during the Greco-Persian Wars, the famous Battle of Thermopylae took place between the invading Persian Empire led by King Xerxes and an alliance of Greek city-states led by Leonidas of Sparta. You may remember Leonidas and this battle from the movie “300.” The battle was notable, and the legend has persevered throughout history, because of the unique and strategic use of a physical, geographical bottleneck. An incredibly small number of Greeks (around 7,000) held off 100,000 – 150,000 advancing Persians for an entire week because the narrow coastal pass at Thermopylae forced the Persians to serialize their advance. Because the pass was so narrow, the Persians could not surround the Greeks with their sheer numbers; they could only send a small number of troops into the bottleneck at one time, at which point they had to contend with the Greek phalanx. Although the Persians eventually compromised the Greek defense by outflanking them via a mountain pass, the multi-day battle in which an army outnumbered roughly 20:1 held its ground is nothing short of incredible. What we learn from this is that bottlenecks are very powerful constraints.
Different segments of the data center technology market develop at different rates, and thus IT professionals commonly wind up in situations in which interconnected components have differing levels of maturity. There isn’t usually an interoperability problem here in the sense that something is broken; however, this disparity often manifests itself in the form of a bottleneck. As it was for the Persians, a technical bottleneck in the data center is a big problem.
An Example – Storage Performance
A perfect example from recent data center history is the storage array. In the post-virtualization era, most businesses had realized the immense value that server virtualization could bring to their data center and moved from a fixed number of physical servers to an almost unlimited number of virtual ones. Due to the ease with which a new software-based server could now be deployed, server sprawl ensued and data centers filled up with more production operating system instances than one could have fathomed a few years prior.
Of course, the growing workload footprint required storage resources, in terms of both capacity and performance. And while capacity hasn’t been an issue lately, there was a period of time recently during which storage performance was absolutely an issue. Thanks to virtualization, it became quite easy to cram so many operating system instances and applications into a server cluster that modern storage arrays simply couldn’t keep up. Many virtualization users – especially those with highly diverse workloads with varying I/O characteristics – found it next to impossible to procure storage systems that could keep up with the demand without breaking the bank.
When technologists encounter situations like this, it’s important to understand that, viewed from a distance sufficient to see the “big picture”, the problem is a temporary one. It is only a matter of time before advances in storage array technology (whether from hardware development, software development, or some combination of the two) make it possible to serve up far more I/O than is needed. That’s the situation we find ourselves in today. A mere couple of years down the road, I/O performance is rarely an issue. It’s almost always something else.
During the window in which this problem existed, however, a number of interesting solutions were brought to market. I don’t mean this in a derogatory sense, but for lack of a better term, many of these solutions were “stop-gap” technology. They merely filled the void between the moment of need and whenever the problem would be resolved. In business, timing is everything, and waiting until some arbitrary date in the future when the problem goes away just isn’t an option. The ever-increasing demand on IT for performance, availability, and flexibility doesn’t leave much room for waiting a few years to see how it pans out. There’s an easily made case for buying these stop-gap solutions to fix the problem now.
In the storage performance example, the stop-gap was brilliant software that could leverage local flash devices placed inside servers to act as an intelligent cache which prevented I/O requests from ever being sent to the array. Rather than traversing a storage fabric, they were served locally instead. Suddenly, an underperforming storage array had years of life left in it because the bottleneck had moved. To make this cache work, the cache systems consumed local CPU, RAM, and SSDs in the virtualization host. Now, rather than being bound by the shared storage array, the performance was bound by how much CPU, RAM, and flash could be stuffed in a host. At the time, this resulted in considerably more performance, and for considerably less money compared to buying an upgraded array.
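To make the idea concrete, here is a minimal sketch of a host-side read cache, assuming a simple least-recently-used eviction policy. It is not how any particular caching product worked; the class name `ReadCache`, the `slow_array_read` function, and the block IDs are all hypothetical and exist only for illustration. The point is simply that repeated reads are answered locally and never reach the array.

```python
from collections import OrderedDict


class ReadCache:
    """Tiny LRU read cache standing in for host-side flash (illustrative only)."""

    def __init__(self, backend_read, capacity_blocks):
        self._backend_read = backend_read  # callable: block_id -> bytes (the "array")
        self._capacity = capacity_blocks   # how many blocks fit in the local cache
        self._blocks = OrderedDict()       # block_id -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self._blocks:
            # Served locally: the request never crosses the storage fabric.
            self._blocks.move_to_end(block_id)
            self.hits += 1
            return self._blocks[block_id]

        # Cache miss: go to the shared array, then keep a local copy.
        self.misses += 1
        data = self._backend_read(block_id)
        self._blocks[block_id] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data


array_reads = []  # records every request that actually reached the "array"


def slow_array_read(block_id):
    """Hypothetical stand-in for a read from the shared storage array."""
    array_reads.append(block_id)
    return f"block-{block_id}".encode()


cache = ReadCache(slow_array_read, capacity_blocks=2)
for block in [1, 2, 1, 1, 3, 2]:
    cache.read(block)

print(f"hits={cache.hits} misses={cache.misses} array reads={array_reads}")
```

In this toy run, two of the six reads are served from the local cache. Notice also that the cache's capacity, a host-side resource, now decides how often the array is touched, which is the "moved bottleneck" described above.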
Now that the I/O performance problem has mostly gone away, these solutions are less prevalent, but the discoveries and technologies behind them have simply morphed into other solutions that will live on. For the foreseeable future, I believe this particular stop-gap for storage will still have applicability in cases where an array upgrade isn’t possible or necessary but a bit more performance is needed.
A Current Example
Although the storage performance bottleneck has moved, and with it the disparity between array performance and host demand, there will always be a bottleneck somewhere. That’s the nature of technology. At some point in the system, there is a constraint. Now that it’s no longer host-to-array I/O performance, where does the bottleneck lie?
It’s certainly not the only issue we’re currently working through, but one big problem that exists in the data center today is that operating system development hasn’t kept up with the capabilities of data center hardware, especially with regard to storage. That is to say, applications and operating systems are not always written to take advantage of the computing power we can now provide to them. And until we can circle back and spend time re-developing applications and operating systems to harness data center resources in a fundamentally different way, we have new bottlenecks with which to contend.
To describe this problem, we’ll use the example of an 8-core CPU in a Windows server. Use the diagram below to help you visualize this problem. A modern application like the latest version of Microsoft® SQL Server is written to be multithreaded, meaning that it can process data on multiple CPU cores at once rather than on just one. While this is immensely helpful, a bottleneck in the operating system remains: I/O operations are still handled in a serial fashion by a single CPU core. So even though there’s parallel computing going on, applications that perform I/O operations during that computing (read: pretty much any application) can only access storage as fast as a single CPU core can go.
This is problematic because while the core density on CPUs continues to increase, the clock rate of modern CPUs hasn’t increased recently, and probably won’t any time soon. So, in the future, you’re likely to have more cores to work with, but not faster ones. In the problem shown in the diagram below, the only way things get better is if cores get faster.
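One way to put rough numbers on this is Amdahl's law, which bounds the speedup of a workload when part of it cannot be spread across cores. Applying it here is my own simplification, not something taken from any vendor's material, and the 50% serialized share below is a made-up figure chosen purely for illustration.

```python
def speedup(serial_fraction, cores):
    """Amdahl's law: best-case speedup when `serial_fraction` of the work
    must run on one core and the remainder scales across `cores`."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Hypothetical workload: half the time goes to I/O handled by a single core.
SERIALIZED_IO_SHARE = 0.5

for cores in (8, 16, 32, 64):
    print(f"{cores:>2} cores: {speedup(SERIALIZED_IO_SHARE, cores):.2f}x")
```

Under that assumption, the speedup approaches but never reaches 2x no matter how many cores are added. Extra cores help only the portion of the work that was already parallel.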
Since cores aren’t likely to get faster, the other logical way to solve this problem is to parallelize I/O operations at the operating system level so that I/O can leverage multiple cores rather than just one. And here’s where the “stop-gap” bit comes in:
Eventually, the operating system and applications will (hopefully) all be fixed so that this is handled properly by the operating system itself.
In the meantime, however, there is a boatload of money to be made if I/O can be completed more quickly. This is especially true in the arena of data and analytics (which is why MSSQL is such a great starting point). Therefore, many organizations are in the market for any sort of technology that can speed up these number crunching processes.
I wouldn’t be explaining this giant problem if there weren’t a solution! At Tech Field Day 15, I was introduced to a fantastic workaround to this problem: DataCore’s MaxParallel™ product line. MaxParallel™ for SQL Server is the first in what we can anticipate will be a series of software solutions that work in tandem with the application and operating system to parallelize I/O operations in a way that allows multiple CPU cores to be utilized. Refer to the diagram below to see how, with MaxParallel™ installed, the single CPU core that was previously the bottleneck is now expanded to four CPU cores and all cores on the chip are being utilized to maximize performance.
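As a generic illustration of the difference between serial and parallel I/O handling, and emphatically not a description of how MaxParallel™ is implemented, the sketch below services eight simulated requests first one at a time and then through a pool of four workers. The 50 ms latency, the function names, and the worker count are assumptions of mine. `time.sleep` stands in for waiting on a request, so this models overlapping requests rather than real per-core I/O processing inside an operating system.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_LATENCY_S = 0.05  # hypothetical service time for one request


def handle_request(request_id):
    """Stand-in for servicing a single I/O request."""
    time.sleep(IO_LATENCY_S)
    return request_id


def run_serial(request_ids):
    """Service requests strictly one after another, like a single I/O path."""
    start = time.perf_counter()
    results = [handle_request(r) for r in request_ids]
    return results, time.perf_counter() - start


def run_parallel(request_ids, workers):
    """Service requests through a pool of workers; results keep input order."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(handle_request, request_ids))
    return results, time.perf_counter() - start


requests = list(range(8))
serial_results, serial_time = run_serial(requests)
parallel_results, parallel_time = run_parallel(requests, workers=4)

print(f"serial:   {serial_time:.2f}s")
print(f"parallel: {parallel_time:.2f}s")
```

Both runs return the same results in the same order. Only the elapsed time changes, which mirrors the appeal of a drop-in approach: the application sees identical answers, just sooner.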
The implications of this parallelization are that businesses will be able to process and analyze more data in a shorter time. This will allow them to reach better data-driven decisions more quickly and take effective action sooner. In an age where data is everything, being able to speed up reporting and analysis without adding hardware, re-designing applications, or modifying a single line of code could be the competitive advantage that takes a business to the next level.
Some logical applications of this technology are:
- e-Commerce / Brick-n-mortar Retail / OLTP
- High Frequency Iterative Analytics
- Inventory and Resource Optimization
- Fabrication / Construction Fault Detection
- Fraud Alerts and Prevention
- Threat Assessment
Rooted in Storage Design
DataCore is a company that is known for building storage-centric solutions that approach things a bit differently to achieve an amazing result. MaxParallel™ isn’t the first solution this well-established company has brought to market. DataCore actually holds three of the top ten spots – including the #1 spot – for SPC-1 IOPS benchmark results (as of this writing in October 2017). They also hold both the #1 and #2 spots for the Price-Performance IOPS benchmark ($/IOPS).
It’s probably safe to say that if you’re in the market to make things go faster, DataCore has some interesting ideas that you should at least consider. I would argue, as I think they would, that it’s their success in developing record-smashing storage systems like these that gave them the insight into the problem that MaxParallel™ solves, as well as the understanding that’s needed to develop a strong solution.
DataCore has some nice data sheets and whitepapers for download as well as a 30-day trial of the MaxParallel™ for Microsoft® SQL Server software available on their website at https://www.datacore.com/maxparallel