10 Oct 10 on Tech Episode 013: J Metz on NVMe and NVMe-oF
Last week at Storage Field Day 11, Intel presented on the challenges that are created by storage media and access methods being too fast. At this point in time, storage media delivers much higher performance than existing software and application stacks are able to fully leverage. In the presentation on SPDK, Jonathan Stern (@JonSternAtIntel) walked through the contributions of the Storage Performance Development Kit to the storage industry as a whole.
Storage Performance Development Kit
Intel’s SPDK is “an open source software project dedicated to providing building blocks for scalable and efficient storage applications with breakthrough performance.” Because SPDK is open source, anyone who follows its licensing terms can leverage this work and integrate it into something bigger.
The foundation of SPDK is the NVMe driver (and by extension, the NVMe-oF target). Jon also talked about how they created the Block Device Abstraction Layer to allow both iSCSI and NVMe-oF protocols to bypass the OS kernel and directly access media via NVMe. The BDAL also allows these protocols to make use of Intel® QuickData Technology which enables data copy by the chipset instead of the CPU. This allows data to move more efficiently through the server and provides fast, scalable, and reliable throughput.
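To make the idea of a block device abstraction layer more concrete, here is a minimal Python sketch. It is not SPDK code and does not use SPDK's API; every name in it (`BlockDevice`, `InMemoryDevice`, `ProtocolFrontEnd`, the 512-byte block size) is a hypothetical stand-in chosen for illustration. It only shows the shape of the design: two protocol front ends share one common block interface, so neither needs to know what media sits underneath.

```python
from abc import ABC, abstractmethod

BLOCK_SIZE = 512  # bytes per logical block (an assumption for this sketch)


class BlockDevice(ABC):
    """Hypothetical common interface that every backing device implements."""

    @abstractmethod
    def read_blocks(self, lba: int, count: int) -> bytes:
        ...

    @abstractmethod
    def write_blocks(self, lba: int, data: bytes) -> None:
        ...


class InMemoryDevice(BlockDevice):
    """Stand-in for real media: a bytearray addressed by logical block."""

    def __init__(self, num_blocks: int) -> None:
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def _check_range(self, start: int, length: int) -> None:
        if start < 0 or start + length > len(self._data):
            raise ValueError("access beyond end of device")

    def read_blocks(self, lba: int, count: int) -> bytes:
        start = lba * BLOCK_SIZE
        length = count * BLOCK_SIZE
        self._check_range(start, length)
        return bytes(self._data[start:start + length])

    def write_blocks(self, lba: int, data: bytes) -> None:
        if len(data) % BLOCK_SIZE != 0:
            raise ValueError("data must be a whole number of blocks")
        start = lba * BLOCK_SIZE
        self._check_range(start, len(data))
        self._data[start:start + len(data)] = data


class ProtocolFrontEnd:
    """Hypothetical protocol handler that only talks to the abstraction."""

    def __init__(self, name: str, device: BlockDevice) -> None:
        self.name = name
        self.device = device

    def handle_write(self, lba: int, data: bytes) -> None:
        self.device.write_blocks(lba, data)

    def handle_read(self, lba: int, count: int) -> bytes:
        return self.device.read_blocks(lba, count)


# Two front ends share the same device through the same interface.
device = InMemoryDevice(num_blocks=8)
iscsi = ProtocolFrontEnd("iSCSI", device)
nvme_of = ProtocolFrontEnd("NVMe-oF", device)

payload = b"\x01" * BLOCK_SIZE
iscsi.handle_write(lba=2, data=payload)
read_back = nvme_of.handle_read(lba=2, count=1)
```

A block written through one front end is readable through the other because both go through the same interface. The kernel bypass and the hardware-assisted data copy described above are outside what a sketch like this can show.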
This Week’s Show
A big part of the incredible storage performance that we enjoy today, and that necessitates projects like Intel’s SPDK, is due to NVMe. Because we’ve lived in a SCSI world for such a long time, it’s hard for many people to grasp exactly what NVMe is and to begin thinking about storage access this way. To help us understand it better, I’m excited to bring to the show J Metz (@drjmetz), who is possibly the best-dressed storage guy you will ever meet. He’s also an academic, researcher, and general force of nature in the office of the CTO at Cisco; his focus is on storage and storage networking. He’s currently serving a term on the board of directors (called Promoters) for NVM Express, Inc., which is the organization driving development and awareness of NVMe. I would be hard pressed to find someone more appropriate to discuss NVMe with than J, so this is a very exciting episode, and I’m kicking myself for creating a show where we only got to spend 10 minutes with J. This could easily have been a 60-minute show that I think most listeners would have appreciated!
J also writes on his blog (www.jmetz.com) and shared two very important posts with me to pass along. If you find that you need to learn more about NVMe after listening to this show, J has curated (and written personally) a substantial amount of information on NVMe.
- If you’re looking to get up to speed and have a strong grasp of NVMe in general, use this “program of study”: https://jmetz.com/2016/08/learning-nvme-a-program-of-study/
- If you’ve got a decent grasp of NVMe but need to delve into a certain area of focus, reference this extensive bibliography: https://jmetz.com/2016/08/a-nvme-bibliography/
|James Green:||Hello, and welcome to another episode of 10 on Tech. I’m James Green with ActualTech Media and I’m your host. Today, I have a special guest. It is none other than J Metz. J is an R&D engineer in the Office of the CTO at Cisco and he works on really interesting stuff, at least to me, especially in the storage arena. Last week, we were out at Intel at Storage Field Day 11 and we did quite a bit of talking about some of the improvements they’re making in their offerings and to the general community as far as the way they’re dealing with fast storage. A big part of fast storage today is NVMe. That’s why J is here. J is actually on the Board of Directors at NVMe … Oh, what do you call it, J?|
|J Metz:||NVM Express group.|
|James Green:||Yep, the NVM Express group. He’s actually helping to shape the way that this develops. Obviously, he’s very involved at Cisco as well, in the way that they’re going to be supporting and leveraging this stuff. J, thanks for being on. I really appreciate it.|
|J Metz:||I’m very excited to be invited.|
|James Green:||If you want to look J up, you can get him on his Twitter account. It’s @drjmetz. He writes about this kind of stuff over on his blog and that’s jmetz.com. You can find him over there as well. First of all, J, I know that not everybody is familiar with NVMe and that’s kind of the foundation of a lot of what we’re going to see changing in the near future. Can you just explain for anybody who doesn’t know, what is NVMe and as kind of a sidecar to that, what problem are we solving by developing NVMe?|
|J Metz:||Let me answer those questions in reverse order, if I can.|
|J Metz:||One of the things that’s been happening within the world of storage is that we’ve been getting faster and faster processors and faster and faster networks. We’ve also been getting faster and faster storage media. Previously, when we had storage media like spinning disks, you’d pick a storage protocol to access them, meaning SCSI. Once you start to get into flash devices, flash media, SSDs, those kinds of things, you start to realize that there’s some inefficiencies in the way that SCSI traditionally has been architected for these kinds of solutions.|
|What the NVM Express Group did was create a protocol for addressing storage inside of a server where the memory space of an SSD could be shared with a CPU. Previously, in a SCSI-based system, you had to go through an adapter and that adapter turns the language of the CPU into the language of the drive or SCSI. You had to have an adapter to do that. With NVMe, you don’t. You basically have a CPU that can talk natively, using PCIe commands to directly address the memory inside of a device. Because they both speak the same language and there’s no translation going on, it’s considerably more efficient and you can do some interesting things with it. Once you uplevel that, you get added efficiencies the more you scale.|
|James Green:||Can you give me some kind of idea … What is the order of magnitude of a difference that we’re talking about here when we go from a SCSI and now using PCIe in communicating directly with the CPU. How big of a difference does it really make?|
|J Metz:||Well, in the first iteration, in terms of performance, there were benchmarks of getting 2-1/2 times the performance for NVMe-based, using the same exact equipment, so the same device, running on a SCSI bus versus an NVMe bus, the same server, the same CPU, you’re talking about a difference between 2 and 2-1/2 times the performance.|
|James Green:||Wow, so very much the non-trivial difference there. Okay, so now the kind of new, exciting thing that’s been worked on more recently is extending NVMe outside the server chassis. This is called NVMe over Fabrics and that’s something that you and I were talking about last week and I have written about recently. Can you tell me more about NVMe over Fabrics and what additional benefit we’re going to get from being able to access the media that way?|
|J Metz:||Sure. Well, it shows that physics doesn’t change all that much. Back 20 years ago, when we had devices inside of a server and you wanted to scale beyond the capabilities of that server, you had to get remote access. That’s where storage networks came into play, like Fibre Channel and eventually, iSCSI and some of the other ones that we all know and love. We’re doing the same thing with NVMe. You need to be able to scale beyond just what a physical server can hold. To do that, you have to do a couple of other things.|
|You have to make sure that the connection has some integrity. You have to be able to recover from errors. You don’t share the memory space like you would inside of a server, so there’s a couple of different minor changes that need to be made. The overall architecture for NVM Express is kept intact but now you’re just going outside of a server. The way that the organization has done this is to make it completely transport-agnostic. From an NVMe perspective, it doesn’t matter what the actual network is. It could be Fibre Channel. It could be InfiniBand. It could be Ethernet and inside of Ethernet, it’s usually RoCE or iWARP.|
|James Green:||It sounds an awful lot like the server and SAN architecture that we’ve been dealing with for a while. Is it the same or is it different?|
|J Metz:||Well, see this is where things start to break down when you start to use the same metaphor for trying to apply to new things. Realistically, what we’re doing is we’re looking at the strength of a network to be able to handle the parts of storage traffic that NVMe’s not very good at. NVMe is typically a shared memory space with PCIe but when you try to extend outside of a server and you don’t want to use PCIe, you have to make it resilient. Right? You have to have high availability. There are transports that have that built into it, but NVMe does not. If you happen to prefer one version of transport versus another, there’s plenty of ways of getting your favorite transport network to do that but you maintain the advantages of NVMe because the actual end devices themselves are keeping that efficiency that we got when it was stored inside the server.|
|James Green:||It sounds like a big part of the vision is to get all the benefits of NVMe without reinventing the wheel where it’s not necessary.|
|James Green:||Great. With regard to those fabrics, this is something I was hoping we would have time to hit on, most things that I read or talk to someone about these days with regard to NVMe over Fabrics are referencing Ethernet, RDMA protocols. It seems like there’s good reasons why that’s a good place to start, but I know that there’s an article on your blog that I read not too long ago where you were talking about why we need to be thinking about Fibre Channel as well and why we shouldn’t leave Fibre Channel out in the cold. Can you tell me just, at a high level, what is inherent in Fibre Channel or what do we know about Fibre Channel that also makes that a viable option? Why should we be looking there?|
|J Metz:||NVMe over Fabrics is completely agnostic. You could do NVMe over Fabrics with any kind of fabric that you want and Fibre Channel and RDMA-based protocols are all just among the list. Fibre Channel has a distinct advantage in storage networks because of the fact that it’s been around for a long time. It’s an extremely well-understood protocol and highly reliable. It also has a very good discovery mechanism. That means that using the Fibre Channel base, you’re actually getting built-in discovery as well as the network itself. Finally, the way that the Fibre Channel is sold, it’s always sold with an end to end qualification, so you’ll never buy a Fibre Channel device without it being a qualified part of a solution. Those are the 3 reasons why I listed the Fibre Channel as something to look at.|
|It’s particularly useful if you already have Fibre Channel, which is really the kind of audience I was aiming for in that environment. RDMA-based protocols are extremely popular and with good reason, because of the fact that Ethernet is an understood technology as well. The only thing that RDMA has to work with or to move forward is the fact that it doesn’t have quite the longevity in the storage networking world. It’s been useful for back end systems but not front end systems, not end to end solutions with hosts. Not to say that’s a problem. It’s just saying that there are questions that people have as to how it should be put in place. Those will be fixed. Those will be figured out but right now, there’s no one winner. It’s all about what people want to use and what they’re comfortable with.|
|James Green:||Sure. If there’s one thing that we know about needs with regard to storage, it’s that it needs to be reliable and one thing we know about Fibre Channel is that, at this point, it’s pretty reliable. I would say it’s a valid argument. I have one final question for you before we wrap up here. We have been hearing about this stuff for a while, but it’s definitely still developing and as I’ve been learning about it, I’ve seen NVMe over Fabrics, which is a long thing to say, abbreviated a number of different ways, probably 4 or 5 different ways, which happens as something like that is getting developed. Being that you’re a part of the body that’s making decisions about this, can you tell me, is there an official way to properly abbreviate NVMe over Fabrics?|
|J Metz:||Yes, it is a common question. The proper way to abbreviate NVMe over Fabrics is NVE … Oh, wait. I can’t even spell it, can I? NVMe-oF.|
|James Green:||Got it, so NVMe like we’re used to seeing -oF, lower case o, capital F.|
|James Green:||Okay. Well, thank you, J. That was super interesting. If listeners want to go and find more about J’s thoughts on the topic, you can go over to his blog, JMetz.com where he writes about this kind of thing or get ahold of him on Twitter. He’s @drjmetz. If you want to learn more about NVMe in general, you can head on over to the NVM Express website at NVMExpress.org and you can learn some more over there. Thanks for your time, J. I really appreciate it.|
|J Metz:||Yeah. Thanks for the invite.|
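For readers who think in code, the transport-agnostic idea J describes in the interview can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the NVMe-oF specification: the `Command`, `Target`, and `LoopbackTransport` classes and the `"read"`/`"write"` opcodes are all hypothetical, and the "transports" here are simple in-process pass-throughs that differ only by name. The point is that the host builds the same command whichever fabric carries it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """Hypothetical storage command; not the real NVMe command format."""
    opcode: str          # "read" or "write" (illustrative only)
    lba: int             # logical block address
    data: bytes = b""    # payload for writes


class Target:
    """Toy storage target that keeps blocks in a dictionary."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}

    def handle(self, cmd: Command) -> bytes:
        if cmd.opcode == "write":
            self.blocks[cmd.lba] = cmd.data
            return b""
        if cmd.opcode == "read":
            return self.blocks.get(cmd.lba, b"")
        raise ValueError(f"unknown opcode: {cmd.opcode}")


class LoopbackTransport:
    """Stand-in for a fabric: carries a command to the target unchanged."""

    def __init__(self, name: str, target: Target) -> None:
        self.name = name
        self.target = target
        self.commands_carried = 0

    def send(self, cmd: Command) -> bytes:
        self.commands_carried += 1
        return self.target.handle(cmd)


def write_then_read(transport: LoopbackTransport) -> bytes:
    """Host-side logic: identical no matter which transport is passed in."""
    transport.send(Command("write", lba=7, data=b"hello"))
    return transport.send(Command("read", lba=7))


fabric_names = ["Fibre Channel", "InfiniBand", "RoCE", "iWARP"]
results = {
    name: write_then_read(LoopbackTransport(name, Target()))
    for name in fabric_names
}
```

In a real deployment the transports differ a great deal in discovery, error recovery, and availability, which is exactly the part of the job J says is left to the fabric rather than to NVMe.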