Tuesday 24 June 2008

Database design tips for massively-scalable apps

Here's an interesting post on design practices for building massively scalable apps on database infrastructure such as Google BigTable.

The takeaway: this ain't your grandpappy's old relational database system, so throw out everything he taught you. Denormalize. Prefer big fluffy things to small granular things. Don't bother with DB constraints; enforce the model in the application. Prefer small frequent updates to large page updates.
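To make those tips a little more concrete, here is a minimal sketch in Python. It assumes a plain in-memory dict as a stand-in for a BigTable-style store (one key maps to one big record); it is not BigTable's API, and the names `store`, `save_user` and `add_post` are hypothetical. It shows a denormalized record with posts embedded, constraints checked in application code, and a small incremental update instead of rewriting the whole record.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a BigTable-style store: one key -> one big, fluffy record.
store: Dict[str, Dict[str, Any]] = {}


def save_user(user_id: str, name: str, email: str, posts: List[str]) -> None:
    # Application-enforced constraints: the store offers no NOT NULL or UNIQUE.
    if not name:
        raise ValueError("name is required")
    # Naive uniqueness check by full scan; a real system would keep a
    # separate index record keyed by email instead of scanning everything.
    if any(u["email"] == email for k, u in store.items() if k != user_id):
        raise ValueError("email must be unique")
    # Denormalized: posts live inside the user record, and each post carries
    # a copy of the author's name rather than a foreign key to a users table.
    store[user_id] = {
        "name": name,
        "email": email,
        "posts": [{"title": t, "author_name": name} for t in posts],
    }


def add_post(user_id: str, title: str) -> None:
    # Small, frequent update: append a single post to the existing record
    # instead of rebuilding and rewriting the whole user record.
    user = store[user_id]
    user["posts"].append({"title": title, "author_name": user["name"]})
```

For example, `save_user("u1", "Joey", "joey@example.com", ["Hello"])` followed by `add_post("u1", "Second post")` leaves two embedded posts under `u1`. The price of denormalizing is that the copied `author_name` fields go stale if Joey renames himself, so the application, not the database, has to go back and fix every copy.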

The good news (or bad, depending on how fed up you are with your local DBA): don't bother with all this unless you intend to scale to millions of users.

3 comments:

Regina Obe said...

This is all interesting stuff. I suppose it's there to solve the same issues as all those newfangled column datastore architectures.

The main issue I have with the author is that he seems to be mixing model with implementation. The implementation of that model, I think, is the main problem to solve, not the model itself. Perhaps he meant that, but it certainly wasn't clear.

The relational model doesn't solve all problems and is not suited to all of them, but I think one of the great achievements of the relational revolution was not the relational model itself but the idea of separation of concerns: the best place to decide how to update and retrieve data is the thing that is watching and collecting stats on the traffic going back and forth, not the applications using the data.

This allows hybrid implementations that handle different usage patterns differently. For example, Joey is an avid writer but no one reads his stuff, while Jimmy is an infrequent writer but everyone reads his stuff. It may make sense to treat their data differently, but the applications pushing those changes sit too low down, and are already complicated enough, to deal with these distinctions efficiently.

Dr JTS said...

I'm in sympathy with your views, Regina - the Relational model has served us well for a long time, and you abandon it at your peril.

That said, there's no arguing with success, and there seem to be a lot of high-profile web apps that have adopted this new paradigm. Google and Amazon of course aren't helping by promoting their scalable "database" offerings, which as I understand them absolutely require these kinds of techniques.

No doubt the RDB vendors could explore how to make their offerings run on massively scalable hardware, and it would be interesting to see how this would work. (But think of the per-processor licensing cost...). Or, perhaps this is really a niche market, and they will be content to sell to more conventional customers.

Interesting times for the database industry! The onslaught of cheap hardware really seems to be shaking up long-standing paradigms.

Regina Obe said...

I'm actually fine with abandoning the relational model if there is an impedance mismatch between the ideas I'm trying to express and what it allows me to express. I just don't like mixing model with implementation details, and it seems that's what all these silly Google-like things are forcing you to do.

It is the same reason I don't program in assembly language: too much silly housekeeping, and I'm way too lazy for that kind of work. I would rather spend time thinking about how to state an idea in one sentence than have to state it in two. That's what a model does for me. For implementation, I wear a different hat.

I'm a database programmer - because I'm just really too lazy to do real programming :)