Disaggregated Physical Storage Architectures and Hyperconvergence

Posted on September 8, 2016

In the data center, technology advances in distinct steps rather than smoothly along a continuum. One day we don’t have SSDs, and the next, the first SSD is generally available and we do. Market adoption, however, smooths out those distinct steps into what looks more like a gradually sloping line. Some technologies see more rapid adoption than others, and a recent example of this is hyperconvergence.
Hyperconvergence is about economic advantages, operational efficiencies, and performance. And as it stands today, hyperconvergence is one of the very best ways to simplify the data center while delivering fantastic performance and unheard-of flexibility. But data center architectures come and go. Before the push toward hyperconvergence, the most desirable physical architecture was exactly the opposite: shared storage deliberately kept outside the servers.
I recently participated in a Tech Field Day Extra session at VMworld 2016 where Rich Petersen and Serge Shats from SanDisk presented a quick update and then dug into SanDisk’s Flash Virtualization System offering. The impressive technology that will make the Flash Virtualization System viable got me thinking about future architectures – perhaps an augmented version of hyperconvergence, or perhaps something beyond hyperconvergence entirely.

The Problem with Storage Today

As much progress as has been made in the storage space recently, with denser and cheaper non-volatile memory and faster interconnects, we haven’t arrived yet. There’s still more performance and flexibility to be had in the world of enterprise storage.
Part of the challenge is that SSDs have progressed beyond our ability to push them to their limits. Back in early 2014, I remember hearing Andy Warfield of Coho Data talk about how difficult it was at that time to actually push modern flash storage to its limits because it was simply too fast.

Andy Warfield explaining the challenge of tapping into flash’s potential

Between the storage stack in the OS and the command set used, we’ve had to work hard to access the true power of modern flash storage. The SCSI and ATA command sets used by SAS and SATA were designed for spinning disks, not flash. As such, a new protocol was needed that could take advantage of all that flash storage can offer.
It took some time, but the NVMe storage protocol matured to the point that it is readily available for today’s enterprise consumers. Drivers are available in all modern editions of Windows, Linux, and VMware. But there’s one problem, at least for the data center. The initial implementation of NVMe runs over PCI Express, which delivers the promised performance only for devices that are internal to a server.
But what if storage needs to scale beyond what can fit in a single server? Or what if an organization is unwilling or unable to put disks inside of their servers? Enter NVMe over Fabrics.

NVMe Over Fabrics

Just recently (June 2016), the NVMe over Fabrics specification was published, allowing the NVMe command set to be used over proven data center fabrics like Ethernet, Fibre Channel, and InfiniBand. On RDMA-capable fabrics, this is accomplished by carrying NVMe commands over RDMA protocols like iWARP and RoCE – again, proven data center protocols.
NVMe over Fabrics shares the same base architecture as the existing NVMe over PCIe implementation, but makes it simple to add other fabrics as well. Ninety percent of the NVMf code is shared with the original NVMe PCIe code.
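To make the "encapsulation, not translation" idea concrete, here is a minimal Python sketch, assuming a simplified capsule layout. An NVMe submission queue entry (SQE) is a fixed 64-byte structure, and an NVMe over Fabrics command capsule carries that SQE plus optional additional data. The helper names (`build_read_sqe`, `encapsulate`, `decapsulate`) are hypothetical, the data pointer fields are left zeroed, and the SGL descriptors and transport framing that real fabrics require are omitted.

```python
import struct

SQE_SIZE = 64        # every NVMe submission queue entry (SQE) is 64 bytes
OPC_NVM_READ = 0x02  # Read opcode in the NVM command set


def build_read_sqe(cid: int, nsid: int, slba: int, nlb: int) -> bytes:
    """Build a simplified NVMe Read SQE; data pointer fields stay zeroed."""
    sqe = bytearray(SQE_SIZE)
    # CDW0: opcode (byte 0), FUSE/PSDT flags (byte 1, left 0), command ID (bytes 2-3)
    struct.pack_into("<BBH", sqe, 0, OPC_NVM_READ, 0, cid)
    struct.pack_into("<I", sqe, 4, nsid)      # NSID: namespace identifier
    struct.pack_into("<Q", sqe, 40, slba)     # CDW10-11: starting LBA
    struct.pack_into("<H", sqe, 48, nlb - 1)  # CDW12[15:0]: block count, 0-based
    return bytes(sqe)


def encapsulate(sqe: bytes, in_capsule_data: bytes = b"") -> bytes:
    """Hypothetical capsule: the SQE is carried verbatim, then optional data."""
    if len(sqe) != SQE_SIZE:
        raise ValueError("an SQE must be exactly 64 bytes")
    return sqe + in_capsule_data


def decapsulate(capsule: bytes) -> tuple[bytes, bytes]:
    """Split a capsule back into the untouched SQE and any in-capsule data."""
    return capsule[:SQE_SIZE], capsule[SQE_SIZE:]


sqe = build_read_sqe(cid=7, nsid=1, slba=2048, nlb=8)
received_sqe, data = decapsulate(encapsulate(sqe))
assert received_sqe == sqe  # identical bytes on both ends: nothing was translated
print(received_sqe[0], struct.unpack_from("<H", received_sqe, 2)[0])  # 2 7
```

The target side hands the recovered SQE straight to its NVMe queue logic, which is why the command never has to be rewritten into another command set along the way.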

Image source: Joint Flash Memory Summit presentation by EMC, HGST, and Mellanox (http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150811_FA12_Overview.pdf)

“BUT!” you say. “The whole purpose of placing disks inside of servers is to allow the low latencies associated with that physical proximity. If you put these flash devices in a shared storage appliance, aren’t we right back to traditional SAN and NAS?”
That’s an astute observation, and it points to the key difference between external SSDs accessed via NVMe over Fabrics and storage accessed via a traditional remote storage protocol like iSCSI or NFS. Going through one of these legacy protocols can add 100+ microseconds of latency due to protocol translation alone (and that’s before accounting for the network).
NVMe/f is different. NVMe commands and structures are transferred end-to-end, so there is no translation overhead and only a minimal amount of encapsulation overhead. This design has allowed the NVMe/f developers to come within a few microseconds of direct-attached PCIe latency. What’s the impact of that?
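The latency argument is simple addition, so a back-of-the-envelope model helps. In the Python sketch below, the device latency and the NVMe/f overhead are assumed, illustrative values rather than measurements; only the 100 µs translation cost comes from the text, and the network is left out of the legacy path just as it is in the text.

```python
# Illustrative latency budget. DEVICE_US and NVMF_OVERHEAD_US are assumptions
# chosen for the example, not measurements of any specific product.
DEVICE_US = 90.0             # assumed raw flash read latency, direct-attached PCIe
LEGACY_TRANSLATION_US = 100.0  # the "100+ microseconds" of translation from the text
NVMF_OVERHEAD_US = 5.0       # assumed encapsulation + fabric cost ("a few microseconds")


def total_latency_us(translation_us: float = 0.0, transport_us: float = 0.0) -> float:
    """Device latency plus whatever the access path adds on top of it."""
    return DEVICE_US + translation_us + transport_us


local = total_latency_us()
nvmf = total_latency_us(transport_us=NVMF_OVERHEAD_US)
legacy = total_latency_us(translation_us=LEGACY_TRANSLATION_US)  # network excluded

for name, value in [("local PCIe", local), ("NVMe/f", nvmf), ("legacy", legacy)]:
    print(f"{name:>10}: {value:6.1f} us ({value / local:.2f}x local)")
```

With these assumed numbers the NVMe/f path lands about 6% above local access, while the legacy path more than doubles it before any network time is added.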

The Case for External Storage

With NVMe over Fabrics, high-performance, low-latency flash storage can now be aggregated outside of servers yet accessed with latency that rivals having the devices inside the servers. Among other things, this means hyperconverged-like performance and flexibility with the overhead and economics of shared storage. It’s genuinely the best of both worlds.
Performance isn’t the only consideration, however. While it may make sense to have a shared tier of extremely high performance storage that performs as if it were local, there’s still plenty of room in organizations for slower, cheaper storage. With that in mind, many storage experts propose the following architecture for future storage topologies. It leverages both NVMe/f-connected flash devices and more traditional SCSI-based block/file or object storage.
The flash devices at the top of the rack could be used for caching (as will be presented shortly) or as a sort of Tier 0, with an auto-tiering mechanism promoting hot data to the top-of-rack flash and demoting it as it cools.

NVMe/f-connected top-of-rack (TOR) flash and SCSI-based bottom-of-rack storage.
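A top-of-rack Tier 0 needs some policy for deciding what counts as hot. The sketch below assumes the simplest one: count accesses per block over an observation window, keep the N most-accessed blocks on flash, and demote everything else. The `TwoTierPlacement` class and its methods are hypothetical; real auto-tiering engines use richer heuristics and move data in the background.

```python
from collections import Counter


class TwoTierPlacement:
    """Hypothetical auto-tiering sketch: the hottest blocks live on Tier 0."""

    def __init__(self, tier0_capacity: int):
        self.tier0_capacity = tier0_capacity  # number of blocks Tier 0 can hold
        self.heat = Counter()                 # access counts in the current window
        self.tier0 = set()                    # block IDs currently on TOR flash

    def record_access(self, block: int) -> None:
        self.heat[block] += 1

    def rebalance(self) -> tuple[set, set]:
        """Promote the hottest blocks, demote the rest; return (promoted, demoted)."""
        hottest = {b for b, _ in self.heat.most_common(self.tier0_capacity)}
        promoted = hottest - self.tier0
        demoted = self.tier0 - hottest
        self.tier0 = hottest
        self.heat.clear()  # start a fresh observation window
        return promoted, demoted


tiering = TwoTierPlacement(tier0_capacity=2)
for block in [1, 1, 1, 2, 2, 3]:
    tiering.record_access(block)
print(tiering.rebalance())  # ({1, 2}, set())
for block in [3, 3, 3, 1, 1]:
    tiering.record_access(block)
print(tiering.rebalance())  # ({3}, {2})
```

Resetting the counters each window keeps the policy responsive to shifting workloads; a production design would more likely decay counts gradually so that one quiet window doesn't evict a normally hot block.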

SanDisk’s Flash Virtualization System

As a practical example, the Flash Virtualization System that SanDisk presented is a perfect use of this technology. (I’m going to call it FVS for short – I’m not sure if their marketing documents follow that convention…)
Today, FVS uses Fibre Channel and InfiniBand, along with the vSphere APIs for I/O Filtering (VAIO) that SanDisk helped VMware develop. More relevant to this article, it will soon also use NVMe over Fabrics to further increase performance and reduce latency. Serge explained the VAIO framework and its features at the VMworld TFDx last year. VAIO allows third parties (like SanDisk) to install I/O filters on the vSphere host that can filter I/O without those third parties needing to run software in kernel mode.
Because the I/O filter runs in the user world rather than in the kernel, security and kernel stability are not compromised, yet the solution still sits directly in the storage path at the vSphere host level. In the case of FVS, the filter is used for caching.
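The caching role of such a filter can be sketched as a write-through read cache that sees every read and write before the datastore does. This is a hypothetical Python model, not the VAIO API and not a description of FlashSoft’s internals; the `Datastore` and `CachingFilter` classes and the LRU eviction policy are assumptions chosen for illustration.

```python
from collections import OrderedDict


class Datastore:
    """Stand-in for the backing VMFS/NFS datastore (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # lba -> data
        self.reads = 0    # how many reads actually reached the datastore

    def read(self, lba: int) -> bytes:
        self.reads += 1
        return self.blocks.get(lba, bytes(512))  # unwritten blocks read as zeros

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data


class CachingFilter:
    """Hypothetical write-through read cache sitting in the I/O path."""

    def __init__(self, backend: Datastore, capacity: int):
        self.backend = backend
        self.capacity = capacity    # blocks the flash cache can hold
        self.cache = OrderedDict()  # lba -> data, kept in LRU order
        self.hits = 0

    def read(self, lba: int) -> bytes:
        if lba in self.cache:
            self.hits += 1
            self.cache.move_to_end(lba)  # mark as most recently used
            return self.cache[lba]
        data = self.backend.read(lba)    # miss: fetch from the datastore
        self._insert(lba, data)
        return data

    def write(self, lba: int, data: bytes) -> None:
        self.backend.write(lba, data)    # write-through: datastore stays authoritative
        self._insert(lba, data)

    def _insert(self, lba: int, data: bytes) -> None:
        self.cache[lba] = data
        self.cache.move_to_end(lba)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block


store = Datastore()
flt = CachingFilter(store, capacity=2)
flt.write(10, b"a" * 512)
flt.read(10)  # served from the cache
flt.read(11)  # miss, goes to the datastore
print(flt.hits, store.reads)  # 1 1
```

Write-through is the conservative choice for a sketch like this: losing the cache never loses data, because every write has already reached the datastore.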

The Flash Virtualization System uses SanDisk’s FlashSoft 4 software, which works with any standard SSD attached via SATA, SAS, PCIe, or NVMe and supports vSphere 6-compatible datastores, including VMFS and NFS. With this technology (and Storage Policy-Based Management, or SPBM, which it leverages), storage workloads can be accelerated on a per-vDisk basis using top-of-rack flash or a more traditional design.
Because of this architecture, it’s easy to test without any downtime, and because acceleration can be enabled vDisk by vDisk, administrators get a great deal of control when testing and rolling out the product.
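Per-vDisk control is what makes piloting low-risk: acceleration can be switched on for one virtual disk while the rest are left alone. A minimal sketch of that idea, assuming a hypothetical `StoragePolicy` type with a single caching rule (real SPBM policies and the FVS integration are not modeled here), reconciles which vDisks should carry the caching filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical stand-in for a storage policy carrying one caching rule."""
    name: str
    accelerate: bool


def reconcile(assignments: dict[str, StoragePolicy],
              attached: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_attach, to_detach) so only accelerated vDisks carry the filter."""
    wanted = {vdisk for vdisk, policy in assignments.items() if policy.accelerate}
    return wanted - attached, attached - wanted


accelerated = StoragePolicy("flash-accelerated", accelerate=True)
default = StoragePolicy("default", accelerate=False)

# Pilot acceleration on a single vDisk; the other vDisks are left untouched.
assignments = {"sql01.vmdk": accelerated, "web01.vmdk": default, "web02.vmdk": default}
to_attach, to_detach = reconcile(assignments, attached=set())
print(to_attach, to_detach)  # {'sql01.vmdk'} set()
```

Backing out is the same diff in reverse: moving `sql01.vmdk` back to the default policy would return it in `to_detach` without affecting any other disk.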

Does Any of This Spell the End of Hyperconvergence?

No, it doesn’t. Bear in mind that hyperconvergence as a data center architecture paradigm is less about placing storage inside of servers and more about combining services and features into one scalable, easily managed platform.
Whether SSDs are inside or outside of a server from a physical standpoint doesn’t really change where hyperconvergence is headed – the fundamental paradigm that makes it attractive is still completely viable.