Crypto can’t scale because of consensus … yet Amazon DynamoDB does over 45 Million TPS

Crypto can’t scale because of consensus … yet Amazon DynamoDB does over 45 Million TPS

The metrics point to crypto still being a toy until it can achieve real world business scale demonstrated by Amazon DynamoDB

14 transactions per second. No matter how passionate you may be about the aspirations and future of crypto, it’s the metric that points out that when it comes to actual utility, crypto is still mostly a toy.

After all, pretty much any real world problem, including payments, e-commerce, remote telemetry, business process workflows, supply chain and transport logistics, and others require many, many times this bandwidth to handle their current business data needs — let alone future ones.

Unfortunately the crypto world’s current solutions to this problem tend to either blunt the advantages of decentralization (hello, sidechains!) or look like clumsy bolt-ons that don’t close the necessary gaps.

Real World Business Scale

Just how big is this gap, and what would success look like for crypto scalability? We can see an actual example of both real-world transaction scale and what it would take to enable migrating actual business processes to a new database technology by taking a look at Amazon’s 2019 Prime Day stats.

The AWS web site breaks down Amazon retail’s adoption and usage of NoSQL (in the form of DynamoDB) nicely:

Amazon DynamoDB supports multiple high-traffic sites and systems including Alexa, the Amazon.com sites, and all 442 Amazon fulfillment centers. Across the 48 hours of Prime Day, these sources made 7.11 trillion calls to the DynamoDB API, peaking at 45.4 million requests per second.

45 million requests per second. That’s six zeros more than Bitcoin or Eth. Yikes. And this is just one company’s traffic, and only a subset at that. (After all, Amazon is a heavy user of SQL databases as well as DynamoDB), so the actual TPS DynamoDB is doing at peak is even higher than the number above.

Talk about having a gap to goal…and it doesn’t stop there. If you imagine using a blockchain (with or without crypto) for a real-world e-commerce application and expect it support multiple companies in a multi-tenanted fashion, want it to replace legacy database systems, and need a little headroom to grow — a sane target might look like 140 million transactions per second.

That’s seven orders of magnitude from where we are today.

The Myth of Centralization

Why are these results so different? Let’s examine this dichotomy a little closer. First, note that DynamoDB creates a fully ordered ledger, known as a stream, for each table. Each table is totally ordered and immutable; once emitted, it never changes.

DynamoDB is doing its job by using a whole lot of individual servers communicating over a network to form a distributed algorithm that has a consensus algorithm at its heart.

Cross-table updates are given ACID properties through a transactional API. DynamoDB’s servers don’t “just trust” the network (or other parts of itself), either — data in transit and at rest is encrypted with modern cryptographic protocols and other machines (or the services running on them) are required to sign and authenticate themselves when they converse.

Any of this sound familiar?

The classic, albeit defensive, retort to this observation is, “Well, sure, but that’s a centralized database, and decentralized data is so much harder that is just has to be slower.” This defense sounds sort of plausible on the surface, but it doesn’t survive closer inspection.

First, let’s talk about centralization. A database running in single tenant mode with no security or isolation can be very fast indeed — think Redis or a hashtable in RAM, either of which can achieve bandwidth numbers like the DynamoDB rates quoted above. But that’s not even remotely a valid model for how a retail giant like Amazon uses DynamoDB.

Different teams within Amazon (credit card processing, catalog management, search, website, etc.) do not get to read and write each others’ data directly — these teams essentially assume they are mutually untrustworthy as a defensive measure. In other words, they make a similar assumption that a cryptocurrency blockchain node makes about other nodes in its network!

On the other side, DynamoDB supports millions of customer accounts. It has to assume that any one of them can be an evildoer and that it has to protect itself from customers and customers from each other. Amazon retail usage gets exactly the same treatment any other customer would…no more or less privileged than any other DynamoDB user.

Again, this sounds pretty familiar if you’re trying to handle money movement on a blockchain: You can’t trust other clients or other nodes.

These business-level assumptions are too similar to explain a 7 order of magnitude difference in performance. We’ll need to look elsewhere for an explanation.

Is it under the hood?

Now let’s look at the technology…maybe the answer is there. “Consensus” often gets thrown up as the reason blockchain bandwidth is so low. While DynamoDB tables are independent outside of transaction boundaries, it’s pretty clear that there’s a lot of consensus, in the form of totally ordered updates, many of which represent financial transactions of some flavor in those Prime Day stats.

Both blockchains and highly distributed databases like DynamoDB need to worry about fault tolerance and data durability, so they both need a voting mechanism.

Here’s one place where blockchains do have it a little harder: Overcoming Byzantine attacks requires a larger majority (2/3 +1) than simply establishing a quorum (1/2 +1) on a data read or write operation. But the math doesn’t hold up: At best, that accounts for 1/6th of the difference in bandwidth between the two systems, not 7 orders of magnitude.

What about Proof of Work? Ethereum, Bitcoin and other PoW-based blockchains intentionally slow down transactions in order to be Sybil resistant. But if that were the only issue, PoS blockchains would be demonstrating results similar to DynamoDB’s performance…and so far, they’re still not in the ballpark. Chalk PoW-versus-PoS up to a couple orders of magnitude, though — it’s at least germane as a difference.

How about the network? One difference between two nodes that run on the open Internet and a constellation of servers in (e.g.) AWS EC2 is that the latter run on a proprietary network. Intra-region, and especially intra-Availability Zone (“AZ”) traffic can easily be an order of magnitude higher bandwidth and an order of magnitude lower latency than open Internet-routed traffic, even within a city-sized locale.

But given that most production blockchain nodes at companies like Coinbase are running in AWS data centers, this also can’t explain the differences in performance. At best, it’s an indication that routing in blockchains needs more work…and still leaves 3 more orders of magnitude unaccounted for.

What about the application itself? Since the Amazon retail results are for multiple teams using different tables, there’s essentially a bunch of implicit sharding going on at the application level: Two teams with unrelated applications can use two separate tables, and neither DynamoDB nor these two users will need to order their respective data writes. Is this a possible semantic difference?

For a company like Amazon retail, the teams using DynamoDB “know” when to couple their tables (through use of the transaction API) and when to keep them separate. If a cryptocurrency API requires the blockchain to determine on the fly whether (and how) to shard by looking at every single incoming transaction, then there’s obviously more central coordination required. (Oh, the irony.)

But given that we have a published proof point here that a large company obviously will perform application level sharding through its schema design and API usage, it seems clear that this is a spurious difference — at best, it indicates an impoverished API or data model on the part of crypto, not an a priori requirement that a blockchain has to be slow in practice.

In fact, we have an indication that this dichotomy is something crypto clients are happy to code to: smart contracts. They’re both 1) distinguished in the API from “normal” (simple transfer) transactions and 2) tend to denote their participants in some fashion.

It’s easy to see the similarity between smart contract calls in a decentralized blockchain and use of the DynamoDB transaction API between teams in a large “centralized” company like Amazon retail. Let’s assume this accounts for an order of magnitude; 2 more to go.

Managed Services and Cloud Optimization

One significant difference in the coding practices of a service like DynamoDB versus pretty much any cryptocurrency is that the former is highly optimized for running in the cloud.

In fact, you’d be hard pressed to locate a line of code in DynamoDB’s implementation that hasn’t been repeatedly scrutinized to see if there’s a way to wring more performance out of it by thinking hard about how and where it runs. Contrast this to crypto implementations, which practically make it a precept to assume the cloud doesn’t exist.

Instance selection, zonal placement, traffic routing, scaling and workload distribution…most of the practical knowledge, operational hygiene, and design methodology learned and practiced over the last decade goes unused in crypto. It’s not hard to imagine that accounts for the remaining gap.

Getting Schooled on Scalability

Are there design patterns we can glean from a successfully scaled distributed system like DynamoDB as we contemplate next-generation cryptocurrency blockchain architectures?

We can certainly “reverse engineer” some requirements by looking at how a commercially viable solution like Amazon’s Prime Day works today:

  • Application layer (client-provided) sharding is a hard requirement. This might take a more contract-centric form in a blockchain than in a NoSQL database’s API, but it’s still critical to involve the application in deciding which transactions require total ordering versus partial ordering versus no ordering. Partial ordering via client-provided grouping of transactions in particular is virtually certain to be part of any feasible solution.
  • Quorum voting may indeed be a bottleneck on performance, but Byzantine resistance per se is a red herring. Establishing a majority vote on data durability across mutually authenticated storage servers with full encoding on the wire isn’t much different from a Proof-of-Stake supermajority vote in a blockchain. So while it matters to “sweat the details” on getting this inner loop efficient, it can’t be the case that consensus per se fundamentally forces blockchains to be slow.
  • Routing matters. Routing alone won’t speed up a blockchain by 7 orders of magnitude, but smarter routing might shave off a factor of 10.
  • Infrastructure ignorance comes at a cost. Cryptocurrency developers largely ignore the fact that the cloud exists (certainly that managed services, the most modern incarnation of the cloud, exist). This is surprising, given that the vast majority of cryptocurrency nodes run in the cloud anyway, and it almost certainly accounts for at least some of the large differential in performance. In a system like DynamoDB you can count on the fact that every line of code has been optimized to run well in the cloud. Amazon retail is also a large user of serverless approaches in general, including DynamoDB, AWS Lambda, and other modern cloud services that wring performance and cost savings out of every transaction.

We’re not going to solve blockchain scaling in a single article 😀, but there’s a lot we can learn by taking a non-defensive look at the problem and comparing it to the best known distributed algorithms in use by commercial companies today.

Only by being willing to learn and adapt ideas from related areas and applications can blockchains and cryptocurrencies grow into the lofty expectations that have been set for them…and claim a meaningful place in scaling up to handle real-world business transactions.


Crypto can’t scale because of consensus … yet Amazon DynamoDB does over 45 Million TPS was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.