ATMCOMIO

Leverage AWS For High Performance Apps With WekaIO's Cloud-Based Parallel File System

Amazon’s AWS is a great platform for many types of enterprise applications, but it wasn’t built for high-performance needs, like AI. The challenge is at the file system level and that’s where a new startup, WekaIO, is making a difference.
At AWS re:Invent 2017, I met with the co-founder and CEO of WekaIO, Liran Zvibel, to learn about the company’s scalable cloud-based file system, which was built from the ground up for compute-intensive applications and is now available on AWS Marketplace. The company officially launched in July 2017 and in less than half a year scored major wins, including appearing on stage with Amazon at re:Invent.
WekaIO’s founders have many years of experience and a lot of expertise in this space. They created a shared file system that’s efficient, super-fast, scalable and feature-rich. It offers high throughput and low latency and supports both Ethernet and InfiniBand, which makes it a fit for even the most demanding performance needs. On top of this, it can run on-prem as well as in the cloud. In fact, Zvibel told me it’s the highest-performing file system available on-prem and also the highest-performing cloud-native file system.
WekaIO Configurator
During our conversation, we discussed WekaIO’s capabilities, benefits and applications, touching on what differentiates this cutting-edge file system from the rest.

Bringing High Performance to AWS

When I met with Zvibel, the company had just announced its availability on AWS Marketplace, which coincided with the launch of version 3.1 of WekaIO Matrix.
Matrix 3.1 offers new snapshot and clone functionality that simplifies remote backup to Amazon S3: the entire file system and its data can be backed up to S3 without the need for dedicated backup or DR tools. What’s more, the remote cloud copy can be updated automatically without impacting application performance.
“They don’t cause any performance degradation,” Zvibel told me, “so taking them is instantaneous and you don’t have to care. Whether you took a snapshot or didn’t take a snapshot, it’s the same performance, so why not?”
This functionality is so cool that even AWS had to demo it at a breakout session at re:Invent, showing a WekaIO cluster running in one region, taking a snapshot and pushing it to S3.
Zvibel explained that in addition to the backup and DR benefits, you can also leverage the snapshot to S3 functionality for cloud bursting.
“You take the snapshot, you push it to S3, and you can spin up an I3 cluster,” says Zvibel. “Our performance really linearly scales. Double the instances will get you double the performance, ad infinitum, to many, many thousands of instances.”
And you’re only paying for the hours that you run, which makes Matrix a really cost-effective tool for cloud bursting.
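To make the arithmetic behind that claim concrete, here is a rough sketch of a burst estimate. It assumes the perfectly linear scaling Zvibel describes and simple per-instance-hour pricing; the function name, the parameters and the figures in the usage lines are hypothetical and are not WekaIO or AWS numbers.

```python
def burst_estimate(total_data_gb, per_instance_gb_per_s, hourly_price, instances):
    """Estimate runtime (hours) and cost for a burst job.

    Assumes aggregate throughput scales linearly with the instance count,
    and ignores billing granularity and cluster start-up time.
    """
    aggregate_gb_per_s = per_instance_gb_per_s * instances
    hours = total_data_gb / aggregate_gb_per_s / 3600
    cost = hours * hourly_price * instances
    return hours, cost


# Hypothetical figures, chosen only to illustrate the shape of the trade-off.
small_hours, small_cost = burst_estimate(360_000, 1.0, 2.0, instances=10)
large_hours, large_cost = burst_estimate(360_000, 1.0, 2.0, instances=20)
print(small_hours, small_cost)
print(large_hours, large_cost)
```

Under these assumptions, doubling the cluster halves the wall-clock time while the total bill stays the same, which is why linear scaling and pay-per-hour pricing fit together well for bursting.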
“We are the highest performance low profile system and also by far the highest performance cloud native file system,” Zvibel told me, “but we also enable you to run a hybrid cloud center that is really efficient.”
The new snapshot functionality also incorporates a Pause/Resume feature for cloud bursting and data archiving, which you can’t even get on Amazon, according to Barbara Murphy, VP of Marketing for WekaIO.
“You take a snapshot, push it to S3, collapse the cluster and the next time you want it, you just bring it up,” Zvibel explained.
“You only pay for it when you’re using it,” Murphy adds. “It’s way better performance and way more economical than having compute only running against [Amazon’s] elastic block services.”

Beyond Distributed

What Zvibel was describing sounded like a distributed file system. However, WekaIO’s parallel file system goes beyond a distributed file system. It can aggregate performance over many devices concurrently and in parallel, according to Zvibel, which is how the system is able to offer such low latency and high throughput.
WekaIO GUI
“A parallel file system is where the clients understand the cluster,” Zvibel told me, “so when you’re doing a large read, the read can go and get it from the right places in the cluster.”
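The idea of a client that "understands the cluster" can be illustrated with a toy model. This is not WekaIO's protocol or data layout; it is a minimal sketch that assumes fixed-size stripes placed round-robin across nodes, with in-memory dictionaries standing in for storage servers. All names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_SIZE = 4  # bytes per stripe; kept tiny so the example is easy to follow


def write_striped(data, num_nodes):
    """Place fixed-size stripes round-robin across hypothetical storage nodes."""
    nodes = [dict() for _ in range(num_nodes)]
    for offset in range(0, len(data), STRIPE_SIZE):
        index = offset // STRIPE_SIZE
        nodes[index % num_nodes][index] = data[offset:offset + STRIPE_SIZE]
    return nodes


def read_striped(nodes, length):
    """Client-side read: work out which node holds each stripe, fetch them concurrently."""
    num_stripes = (length + STRIPE_SIZE - 1) // STRIPE_SIZE

    def fetch(index):
        # The client computes the owning node itself instead of asking a central server.
        return nodes[index % len(nodes)][index]

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        stripes = list(pool.map(fetch, range(num_stripes)))  # map preserves order
    return b"".join(stripes)[:length]


payload = b"parallel file systems stripe data"
cluster = write_striped(payload, num_nodes=3)
assert read_striped(cluster, len(payload)) == payload
```

Because the client knows the placement rule, a large read fans out to every node at once, so the read can draw on the combined bandwidth of the cluster rather than that of a single server.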
Zvibel explained that what they’ve implemented is a parallel file system over NVMe-over-fabrics that’s even faster than a local file system over NVMe, and that goes for both I/O and metadata.
“Other file systems can get you high throughput, but they aren’t low latency, so if you have tons of small files, you’re not really getting the high throughput,” says Zvibel.
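A back-of-the-envelope model shows why latency dominates with small files. The sketch below assumes files are read one at a time and that each read costs a fixed latency plus the transfer time; the function name and the figures are hypothetical, not measurements of any product.

```python
def effective_throughput_mb_s(file_size_mb, latency_ms, bandwidth_mb_s):
    """Effective throughput when every file read pays a fixed latency first.

    Simplified serial model: time per file = latency + size / bandwidth.
    """
    seconds_per_file = latency_ms / 1000 + file_size_mb / bandwidth_mb_s
    return file_size_mb / seconds_per_file


# Same nominal bandwidth, different per-operation latency (illustrative values).
for latency_ms in (10.0, 1.0, 0.1):
    small = effective_throughput_mb_s(0.1, latency_ms, bandwidth_mb_s=1000)
    large = effective_throughput_mb_s(1000, latency_ms, bandwidth_mb_s=1000)
    print(f"latency {latency_ms} ms: small files {small:.1f} MB/s, large files {large:.1f} MB/s")
```

In this model, large files come close to the nominal bandwidth regardless of latency, while small files only approach it when the per-operation latency is comparable to or smaller than the transfer time.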

WekaIO Use Cases

WekaIO is ideal for any compute-intensive application or technical compute workload, including AI, data analytics, machine learning, robotics, media production and mixed storage workloads. On-prem, the file system’s most popular use case is machine learning, according to Zvibel. These servers are typically filled with expensive GPUs that are left idling without WekaIO.
“We can get them to read at line rate,” says Zvibel, “even if you have small files, like voice samples or small images. The problem with other file systems is that they can get you high throughput, but they aren’t low latency, so if you have tons of small files, you’re not really getting the high throughput.”
That’s where WekaIO differentiates itself and can even outperform all-flash filers, like Pure Storage FlashBlade, on throughput.
“Not only are we faster for I/O, because we also parallelize metadata operations and we can scale them,” says Zvibel, “we’re also faster for metadata operations.”
Zvibel explains that that’s WekaIO’s fast tier, which can run on-prem and in the cloud and can handle any compute-intensive work, not just machine learning. In fact, he says that they’ve been able to accomplish a very strong benchmark at one of the top semiconductor companies, cutting down their average compilation time to less than half of what they were used to.
“Compilation is as intensive as it gets on small I/Os and metadata,” says Zvibel. “No other file system could do what we did.”
Murphy also shared some of the use cases they’re seeing, including autonomous vehicle applications, analytics systems in the genomic space, machine learning in the digital pathology space and other medical and life sciences applications.

Eliminating The Data Accessibility Bottleneck

Running high-performance applications in the cloud is challenging because it puts tremendous pressure on storage. As Zvibel explained, if you can’t get the storage into the cluster, you’re paying for machines that are sitting idle, starving for data. WekaIO eliminates this problem by offering high-performance file services in the cloud that can keep those applications fed.
“With [WekaIO] in the Amazon cloud, we actually enable something that right now is very, very difficult to do,” says Zvibel.
And it’s all software. WekaIO works with Dell, HP, Supermicro and any OEM vendor, including Open Compute Project hardware.
“You can run the software on any standard x86 infrastructure,” says Murphy. “All we require is one core, 10 gigs of RAM and some amount of SSD and you’re off to the races. It’s that easy.”
To learn more about WekaIO and get started with the fastest file system for high-performance computing, visit weka.io.