Runaway complexity

Nearly 3 years ago I was sitting on a swingset in the middle of the night feeling sorry for myself. The deadline for a game jam was fast approaching (we had until 5pm) and our game was terrible. After talking about it for an hour or two, Jeremy and I came up with a new idea for a game and got to work.

We paired on the first version and got it working at 1:46 am. I know the time because I checked the first working version into a local git repo. It looked like this:

[Image: initial version]

The red and blue squares are 'tanks', which are trying to pick up the yellow 'flag' in the middle. You take turns driving a blue or a red tank. Each round the game records your tank's movement and plays it back on subsequent rounds.

This prototype took 189 lines of code to make:

sephsmac:tanksalot josephg$ wc -l  

By the end of the game jam we had this:

[Image: tanksalot final version]

The gameplay hasn't changed significantly, but it now has art, music and sound effects, a score and some other stuff. The question is: how big do you think the codebase is now?

Take a few moments to think about this.

The answer is 519 lines. Nearly 3 times bigger!

sephsmac:tanksalot josephg$ wc -l  

Wow! There isn't 3x the gameplay in the new version. What mental model would have predicted that size increase? The same amount of code could have produced nearly 3 gameplay prototypes!
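If you want to run the same experiment on your own project, here's a rough Python sketch. `wc -l` just counts newlines, and the growth factor is a single division. The names `count_lines` and `growth_factor`, and the `.js` suffix in the usage comment, are my placeholders for illustration and not anything from the tanksalot repo. Only the 189 and 519 figures come from the counts above.

```python
from pathlib import Path


def count_lines(root, suffixes):
    """Count newline characters, as `wc -l` does, in matching files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += path.read_bytes().count(b"\n")
    return total


def growth_factor(before, after):
    """How many times larger `after` is than `before`."""
    return after / before


# Line counts quoted above: the 1:46 am prototype and the final jam entry.
print(f"{growth_factor(189, 519):.2f}x")  # 2.75x

# Hypothetical usage on your own checkout (the suffix is an assumption):
# count_lines(".", {".js"})
```

Running something like this at a few points in a project's history is a cheap way to check whether the code is growing faster than the feature list.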

Other projects

How about Google Chrome? Here's the graph:

[Graph: lines of code in Chrome]

Chrome left beta on Windows in December 2008 with about 2.5 million lines of code. Now it has 17 million lines, 2.5 million of which were added in the last 18 months. Somehow, in the last 18 months the Chrome team has added the equivalent complexity of another working copy of Google Chrome, as it was at launch, to Chrome. I'm sure there are some new features since then, but I challenge you to name more than 2. I'd be surprised if either of my parents could name any.
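To put those Chrome numbers side by side, here's a quick back-of-the-envelope calculation. It uses only the three figures quoted above (in millions of lines). The function name `growth_summary` is made up for this sketch.

```python
def growth_summary(start, now, added_recently):
    """Return (overall growth factor, fractional growth over the recent period)."""
    overall = now / start
    before_recent = now - added_recently  # size at the start of the recent period
    recent = added_recently / before_recent
    return overall, recent


# Chrome figures quoted above, in millions of lines.
overall, recent = growth_summary(start=2.5, now=17.0, added_recently=2.5)
print(f"{overall:.1f}x since leaving beta")  # 6.8x since leaving beta
print(f"{recent:.0%} growth in 18 months")   # 17% growth in 18 months
```

So the codebase is nearly 7 times the size it was when it shipped, and the last 18 months alone grew it by about a sixth.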

Almost every project I look at has this sort of runaway complexity problem. Here's SQLite:

[Graph: SQLite lines of code]

Notice that the graph has the same shape, but the absolute number is much lower. Is that because the problem SQLite solves is fundamentally simpler, or because it only has 1-3 developers?

And the Linux kernel:

[Graph: Linux kernel lines of code]

Linux has increased its code size by 50% in the last 5 years. Other than supporting new hardware, I have no idea what any of the new code does.

Why care?

What the hell does all that code do? And why do I have all of it on my computer? Is our job to add code to a product even when it's basically done?

I genuinely believe that code size is the number one smell in programming. All else being equal, your power to change a program is inversely proportional to the size of the program. I moved house recently, and I did all my packing the night before. I managed that by simply having most of my possessions in storage in London. The easiest way to reduce clutter is to throw things out.

Alan Kay talks about the difference between complexity (how hard it is to actually solve your problem) and complications, which are the code you need to solve your problem in your language of choice.

Does Chrome really have 17 million lines of complexity? How much of that is simply complications?

Steve Yegge uses the metaphor of clutter in a wardrobe. How much of your code exists to actually solve your problem, and how much exists to keep your boxes organised?

If you can explain your solution to me on a whiteboard, why do I need a million lines of code to express it to the computer?

A final thought: How many lines do you think it would take to make a usable computer system if you wrote one from scratch? Imagine it needs to solve all the standard workflows you use on a daily basis. How do we make that?