The Serverless Supercomputer
Harnessing the power of cloud functions to build a new breed of distributed systems
The Serverless Journey So Far
When AWS Lambda launched on November 13, 2014, it created a new way of writing applications in the cloud. Today, developers around the world in every conceivable type and size of organization enjoy the lower costs and faster time to market of Serverless apps.
The journey from that initial release to today isn’t just about growth in adoption and scale: It’s also the story of an ever-increasing “target space” for Serverless apps. What began as a way to react to objects stored in Amazon S3 has grown to encompass event handling across the portfolios of every major cloud vendor. Over the years new capabilities were added that enabled Serverless architectures to be used for building mobile and web backends, process streaming data, replace cron jobs, and much more. Cloud vendors have found ways to make more and more of their capabilities serverless: not just the obvious ones like object stores and pub/sub services, but also SQL databases, data analytics queries, IoT, and more. This innovation is far from over— recent advances such as Amazon API Gateway’s ability to auto-frame websocket traffic and turn it into Lambda function calls, also known as mullet architectures, enables a whole new class of capabilities that formerly required heavyweight servers and deployment methodologies. These innovations come in many forms: new patterns and frameworks, new services (like AWS’s AppSync and Google’s Cloud Run), and the ever-growing set of features among the existing cloud vendor Serverless offerings.
The simplicity of Serverless coupled with its rapid pace of innovation has helped it become wildly successful — the benefits of not needing to own, scale, manage, or secure infrastructure lets developers focus on their business instead of the care and feeding of servers or containers. These benefits aren’t just technical — companies that adopt Serverless architectures also typically lower their cloud compute bills by up to 90% or more thanks to the fact that Serverless functions achieve 100% utilization, versus typical corporate server utilizations of 10–15%.
But as great as these advances have been, even enabling some companies to go entirely Serverless, they’re not enough. Large scale data processing, parallel algorithms, streaming content, distributed job management and plenty of other tasks are still primarily handled using the same multi-tier server architectures we were deploying in the 1990’s. Is it possible to apply Serverless techniques to big data, ML modeling, generic algorithms, video transcoding, and other compute-intensive domains and gain the same kinds of cost and time-to-market improvements that it offers today for web, mobile, and event-based applications? Can we get the same benefits for High Performance Computing (HPC) applications that Serverless brought to event handling and web and mobile backends?
The Rise of the Serverless Supercomputer
Complexity is never a virtue. The promise at the heart of Serverless approaches is that mapping code to servers (or containers), and keeping those servers running securely at scale, should be the responsibility of cloud vendors. Figuring out where to run code and how to keep the infrastructure healthy represents an unnecessary distraction for every application developer — regardless of what their application might be. This has never been more true than in the world of big data and big compute, where the problems of partitioning, deploying, and managing servers can be an overwhelming challenge to the developers. It’s even more true when you consider that the “developer” of such applications might be a biology researcher, statistician, astronomer, or other professional with a need to crunch numbers but not necessarily the desire or skill to scale and babysit servers.
Meanwhile, Serverless functions now possess massive fleets, with aggregate computing power that’s larger than the entire computing capability of the public cloud just a few years ago. With that scale of processing power, Serverless functions in the cloud now rival purpose-built supercomputers in the total amount of silicon they employ…the question is no longer whether they will scale as a service, but whether we can find ways to harness all that power in actual HPC applications.
There are some tantalizing suggestions that this is possible. The world’s fastest video transcoding algorithm known today, ExCamera, doesn’t run on GPUs or other specialized hardware…it was built using AWS Lambda. Serverless algorithms for highly parallel linear algebra are already within 33% of conventional approaches while exhibiting 240% better compute efficiency. Fannie Mae has ported massive Monte Carlo simulations on loan portfolios from mainframes to AWS Lambda, resulting in faster completion times, lower costs, and more flexible scaling. Other researchers have found ways to take jobs normally run on laptops and desktops and move them into the cloud, shaving hours off distributed build and compute times.
Solving existing problems at this scale is exciting, but it doesn’t stop there. It’s also highly likely that Jevon’s Paradox will apply — for example, if we can find ways to parallelize video transcoding such that editing can happen in real time, more people will edit more videos. This suggests that, far from being the rarefied province of researchers and a few hardcore number crunchers, “Serverless Supercomputing” will actually unlock new applications and enable powerful new features in the apps we use today. Bottom line? The easier and cheaper it becomes to apply massive computing power to find answers quickly to hard problems, the more likely we are to ask such questions.
Serverless Supercomputers Are Still Missing some Parts
The examples above are exciting — they illustrate what’s possible with a Serverless approach and that it has the potential to revolutionize these applications…just as it changed how we write event handlers and API-driven backends. But here’s the bad news: Each of the teams building one of the applications listed above had to manually create a lot of scaffolding in order to get their job done. None of these were simple “out of the box” applications, and creating that underlying framework for distributed algorithms on top of vendor-provided Serverless offerings was nontrivial in every case. The good news? It was also fairly similar — and we can read the list of what was required as a form of product spec for building the Serverless Supercomputing platform of the future.
Before we tackle the specifics, it’s worth asking the broader question: Why would Serverless algorithms and architectures differ from what’s come before? The answer is simple: For most of history we’ve thought about the theory and practice of distributed systems as unlimited time on limited space. The problem statement in pretty much every distributed systems research paper or highly parallel algorithm has been to harness a limited set of machines (servers) that live forever to perform a computation that can’t fit inside just one of them. Serverless turns that fundamental assumption on its head: Its computation model is limited time on unlimited space. That shift causes us to rethink everything about our approach to distributed algorithms…and creates the disruptive potential for entirely new innovations that can bring previously unavailable computing power to the masses!
What’s needed to enable a Serverless Supercomputer? Creating that capability requires a combination of closing gaps to conventional (“serverful”) compute, like networking, as well as crafting new services that address the impedance mismatch between compute (cloud functions like AWS Lambda) and server-based storage technologies like redis. It requires fundamentally new algorithms for sorting and searching that can exploit Serverless’s ability to tightly envelope costs while playing nicely within its temporal limitations. And it requires new frameworks, including the ability to rapidly distribute and choreograph work across tens of thousands of functions running in parallel. Lets take a look at some specific “missing parts” and the roles they would play in a Serverless Supercomputer architecture:
- Distributed networking — All distributed algorithms depend on the ability to connect computing nodes to one another, and in many cases on the ability to dynamically establish and modify the topology of those communication paths. Today’s cloud functions don’t support this capability directly, and in fact most disable inbound TCP/IP. To make Serverless Supercomputing possible, we need to address this gap in the fundamentals.
- Fine-grained, low-latency choreography — While existing managed workflow services do a nice job at modeling business processes, they are generally orders of magnitude too slow (and too costly) to manage the lifecycle of tens of thousands of function lifetimes, let alone to coordinate intra-function invocation transitions. A Serverless Supercomputer will also need custom workflow functionality, such as straggler management, that reflects its unique semantics and challenges.
- High-speed key-value stores — Probably the most obvious missing piece in the managed portfolios of today’s cloud vendors is a high-speed (DRAM-based) key-value store. Cloud object stores such as Amazon S3 offer low-cost bandwidth and high throughput, but at the expense of high latency. NoSQL databases, such as Amazon DynamoDB, offer faster performance but at costs that are too high to effectively serve as a shared memory system for transient state. (This isn’t surprising; DynamoDB is designed to make writes persistent and durable…neither of which are typically necessary for storing the in-flight state of an algorithm that could simply be restarted.) Server-based infrastructure deployments like redis (or their cloud vendor equivalents) suffer from a grave impedance mismatch with Serverless functions, taking us back to the world of individual server management and scaling. Fortunately, recent work in academia shows promise in better matching storage lifetimes to Serverless algorithms and in designing cost effective scalable storage.
- Immutable inputs and outputs associated with function invocations — Systems that have succeeded at utilizing the power of Serverless at massive computational scale have all needed to find a way to create input and output bindings in order to execute large-scale dataflows. Unlike the event-based bindings that exist today in some Serverless architectures, these objects are usually ephemeral outside of the algorithm’s execution and occur at massive scale. Fortunately, immutable object management frameworks have proven tractable to layer on top of existing cloud vendor services; they just need to be harvested out of academic research and made available to developers at large.
- Ability to customize code and data at scale — Serverless functions like AWS Lambda have “control planes” (APIs for creating, updating, and deleting functions) that were designed for human interaction scale. But these control planes are ill-suited for a situation where 10,000 parallel function invocations each need slightly different code to perform a coordinated algorithm and then disappear forever. Similarly, cloud function invocation is designed to transmit arguments (such as event payloads) only once, when the function is invoked, and then return a single result when it completes. Distributed algorithms may require large amounts of data to be delivered throughout a function’s lifetime, with partial results being transmitted before the function exits.
- Execution dataflows — With support for the preceding items, it’s possible to create and execute dataflow graphs that represent complex, multi-node distributed algorithms at massive scale. These dataflows will represent a new layer of abstraction for Serverless computing, because they describe the entire computation, not just how to run or invoke an individual function.
- General purpose and domain-specific algorithms customized to the “size and shape” of Serverless functions — Finally, developers need support for both general purpose (sorting, searching, grouping, etc.) and domain-specific (video transcoding, genetics) libraries that are tuned to the Serverless platform. With these in place, they can focus on the specific needs of their algorithms and avoid reinventing the wheel to get things done.
The Journey Ahead
As the saying goes, the future’s already here…it’s just not evenly distributed yet. Building any new technology stack takes time and work, so expect to see the Serverless Supercomputer emerge incrementally over the next several years. University researchers are already hard at work and companies like ServerlessTech are already tackling the problem of distributed networking for cloud functions.
Fortunately, it’s not an all-or-nothing proposition: Every incremental step will open up new application domains to take advantage of Serverless approaches. Just adding support for networking, for example, makes prewarming capacity, peer-to-peer file sharing, real-time content distribution, and exchanging additional arguments and results with a cloud function possible. Cloud vendors also have critical roles to play that only they can address: improvements to network bandwidth and jitter, additional memory and file system storage, higher (and easier to self manage) limits on concurrency, and a host of feature details from managed NAT customization to more contextual information inside of functions. There’s a lot to do, but it’s never been a better time to be a developer or innovator in this space — the world’s most powerful and easiest to use supercomputer is being designed and built right now!
This article includes material from my ServerlessConf NYC ’19 slides.