Replicate your database to the client

Every time I read about the CAP theorum, I feel like we're missing something.

We've gone to an awful lot of effort to make our database cluster consistent and reliable. We have proofs around how we expect it to behave, and we test the crap out of any claims our databases make.

But there's another side to this picture which we never talk about - the client.

Why can my client still show out of date information sometimes? Why do I have to manually refresh? Our database secondaries don't have to poll the primary to get updated data - why does my app? Worse, why do some web apps require me to refresh the page?

The metaphor is old and crappy

I remember visiting a bank branch a few times with my mum when I was a child.

When we wanted to interact with our money, we'd first wait in line. When we got to the front of the line, we'd perform an action ("What is my balance?", "Deposit this money", "Withdraw some dollars"). Then we'd go home. If we needed to interact with the bank again later, we'd visit again.

This is not how we want to interact with our data.

The main screen of modern apps is often called the 'dashboard'. Thats a really great metaphor. I want to interact with the world like an airline pilot does - with all the information I need in front of me, as well as all my controls. When I interact with the world, I want to see the numbers change as I do so. Tight feedback loops help us learn and understand the choices we're making.

The modern version of a bank branch is a tool like Mint.com.

If you haven't seen it, watch this video by Brett Victor about working spaces. Its the same idea:

In every case, we want to see a live view of all the data. We don't necessarily know what interactions to make until the data is presented and understood.

Clients are database replicas

But our databases don't work this way. We're still thinking of transactions as serial events. Our users are still a queue of customers in the branch, not airline pilots. Modern databases are amazing software, but we can genuinely support realtime apps a lot better than we do.

I think we should start thinking of every database client as a partial replica. Just like a real database replica, clients need to be told when the data they store gets updated. Its sort of obvious when you think about it.

Why?

  • A user looking at changing data should see it update in realtime just like the replicas do. I never want to see out of date information in a browser tab.
  • You should always know happens if a user performs an action while offline. If your client is offline, its really a kind of network partition. We know how deal with those already.
  • If the client is part of the replica set, fat web apps can do database transactions too.

We're already doing this here and there, but its all adhoc. Because databases don't support these features, we're hacking realtime change feeds into our ORM / MVC systems. I think realtime data feeds should be a first class feature supported by databases. Database authors should be thinking about how their replication strategy will work in the client.

Of course, this will require making a lot more infrastructure - but I honestly think the database (or 'data layer') is the best place for it. Its clearly possible to scale realtime updates - Twitter and Facebook both do it, and they have more data than you and I.

This isn't a hard problem.

The fun part of this problem is that by building replication into our databases, we've done most of the work already. And there's already a handful of hackneyed solutions appearing on top of classical databases to solve this problem:

  • Meteor implements this by piggybacking off mongo's replication oplog. The replication oplog is constantly parsed on your behalf, and any data your client is interested in gets pulled out and sent to the browser.
  • Derby uses my operational transformation code in ShareJS to make a live, realtime database view from the browser. The database API is called livedb.
  • Firebase brings transactions to the client - but I have no idea what their tech stack looks like behind the scenes.

But I'm still waiting for this to become a first class feature in a production database. Where's the secure, realtime client API for postgresql queries? I think this is going to be the next big killer feature for databases over the next few years.