Predictability: The King of Infrastructure Outcomes

Posted on April 3, 2019

Workload performance predictability is a critical consideration, but does it run contrary to the current trend toward infrastructure simplification?

I’ll start by highlighting the two topics I’m discussing in this article:

Simplicity: As organizations seek to redeploy IT talent to more business-facing needs, they are adopting technologies that promise simplicity, streamlined deployment, and easier administration. This is not because IT staff are incapable; the reality is that deeply technical skill sets can be difficult and expensive to acquire.

Predictable workload performance: Not that long ago, I wrote an article entitled Latency: The King of Storage Performance Metrics. I still maintain the belief that, when performance is viewed as the key contributor to an organization’s bottom line, latency is the most critical metric to watch across storage, compute, and networking. If your company’s singular goal is to maximize resource performance, latency is the figure you need to watch.

But, is latency really the king of metrics when predictable workload performance is desired?

Maximization vs. Optimization

Maximizing resource performance to minimize latency can be an expensive undertaking. There are all kinds of crazy things you can do to maximize just about any metric. Consider this analogy: in a business setting, you can maximize revenue by selling at a loss and undercutting your competition, but the end result will be unpleasant as your money runs out. Revenue was at an all-time high, but you kept losing money.

It’s pretty clear what went wrong.

Rather than maximizing metrics, your goal instead needs to be optimizing them. In the context of IT, maximizing performance can be achieved through some less-than-desirable means that could hurt other parts of the business.

For example, you could have people dedicated to shifting workloads between different systems and clouds all day long so that they are operating at peak efficiency. That would be a good outcome viewed through a performance lens, but the human cost would eventually get expensive and there would be other technical side effects that you may not want to deal with. Technologies such as VMware DRS help to alleviate this, but not everyone has them available for every workload.

Further, if you’re shifting workloads between clouds to maximize performance, you may end up paying huge egress fees to do so. Again, performance is not the sole metric that needs to be considered, and maximizing it above all else comes at a cost.

Why Am I Even Thinking About This?

You may wonder why this topic is on my mind. I attended an Intel product launch event this week, and the company announced a number of products, including new CPUs, new storage devices, and new network adapters. During the presentation describing the new Intel Ethernet 800 Series network adapter, the presenter described Application Device Queues (ADQ). I’m not going to get into the technical details of ADQ here, but I do want to focus on one of the benefits that the company associated with the feature.

“Increases application predictability”

Along with this, the feature promises to reduce latency and increase throughput, but the predictability promise was first in the list. It reinforced to me what’s important, particularly in the context of the other market dynamics that I described above, including a desire for simplicity and streamlined infrastructure.

Predictability Reigns Supreme

It’s clear that predictability is a more desirable outcome than simply driving latency out of the equation. Latency is a contributor to and a function of predictability. Today’s infrastructure architects have to always keep latency in mind as they design infrastructure environments that provide predictable application performance.
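To make the distinction concrete, here is a minimal Python sketch that compares two sets of latency samples. The numbers are invented purely for illustration, and the choice of mean, 99th percentile, and standard deviation as the summary figures is my assumption about a reasonable way to measure "predictability," not something taken from Intel’s material.

```python
import statistics


def summarize(latencies_ms):
    """Return mean, 99th percentile, and standard deviation of latency samples."""
    ordered = sorted(latencies_ms)
    # Nearest-rank p99, computed with integer math: ceil(0.99 * n).
    rank = max(1, -(-99 * len(ordered) // 100))
    return {
        "mean": statistics.mean(ordered),
        "p99": ordered[rank - 1],
        "stdev": statistics.pstdev(ordered),
    }


# Hypothetical samples (milliseconds), made up for this example.
fast_but_spiky = [1.0] * 95 + [50.0] * 5   # usually very fast, occasionally awful
steady = [4.0] * 50 + [5.0] * 50           # slower on average, but consistent

for name, samples in (("fast_but_spiky", fast_but_spiky), ("steady", steady)):
    print(name, summarize(samples))
```

The spiky system wins on average latency, yet its worst requests are far slower than anything the steady system produces. If users notice the slow requests more than the fast ones, the steady system is the better outcome even though its headline latency number is worse.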

Why is predictability important? Businesses need to know that the applications they’re running are running in a way that doesn’t impact the predicted performance of other workloads. Predictability increases client and customer satisfaction since they don’t see slowdowns and the like. A consistent and positive customer experience is one in which the customer’s expectations are met or exceeded.

This is one of the key reasons that, rather than deploying the biggest and baddest systems around, infrastructure architects often adopt a series of smaller systems that are clustered to provide a consistent application experience. Predictability would require that there be consistency in how workloads operate and that there is a level of availability that would ensure that a workload continues to operate at a predictable level even in the event of a failure somewhere in the cluster.

It also means that you’re not going to run all systems at peak performance 24/7. You need some overhead available to allow individual workloads and resources to burst as demand levels increase. This allows you to maintain consistent performance levels. Of course, this is typically the way that we’ve designed our virtual environments and we’ve implemented tools like VMware DRS to help maintain it.
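As a rough illustration of the headroom idea, the following Python sketch checks whether a cluster of identical nodes could lose one node and still stay under a target utilization. The function name, the 80% target, and the capacity figures are all hypothetical; real capacity planning would account for far more than a single number per node.

```python
def survives_node_loss(node_capacity, node_count, total_demand,
                       target_utilization=0.8):
    """Return True if demand fits within the target utilization
    of the remaining nodes after a single node fails."""
    if node_count < 2:
        # A single system has nothing left to fail over to.
        return False
    surviving_capacity = node_capacity * (node_count - 1)
    return total_demand <= surviving_capacity * target_utilization


# Hypothetical numbers in arbitrary capacity units.
print(survives_node_loss(node_capacity=100, node_count=4, total_demand=220))
print(survives_node_loss(node_capacity=100, node_count=4, total_demand=300))
print(survives_node_loss(node_capacity=400, node_count=1, total_demand=220))
```

The third call is the "biggest and baddest" single system: it has the same total capacity as the four-node cluster, but no way to keep performance predictable when something fails.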

Initially, establishing an architecture that provides predictability will take some additional effort, but it’s less effort than it would take to constantly move workloads around to minimize latency. Latency has to be considered, of course, but only inasmuch as it informs application performance guidelines.

Simplicity as a Goal

This might be where you say, “This is as it has always been” in terms of a desire for predictability, but there are other trends to contend with. Namely, the aforementioned trend toward simplified IT infrastructure.

As you look around the market, companies are scooping up things like hyperconverged solutions to ease management. Hyperconvergence isn’t the only example, but it’s one of the most prevalent. At the same time that companies want consistent, predictable application performance, they also want solutions that don’t require the technical depth that used to be prevalent.

We used to have deeply technical architects design, tweak, and constantly tune systems. Now, we need those systems to do some of that on their own. This is why features like Intel’s ADQ are so important. In an era in which simplicity is so cherished, predictable performance can’t be an afterthought. It needs to be baked right into the infrastructure.