My life is not on rails

It's kind of fascinating, the changes you make at different parts of your life. When I was 14 I felt like I was basically an adult and already knew everything. In many ways I was right - I was mostly fully formed. I could make decisions, and I could learn things and talk to people. I was like a stone block that was adult-sized, with a completely recognisable person blob chiseled in.

So it's interesting to think about what's changed in the last 16 years, and how and why. I think my rate of change is decreasing, but it's like a progressively rendered jpeg image - the remaining details seem small but important. They're the oregano on a good pizza, the salt in well-cooked pasta. They're the little communication skills, and the mastery, and the ability to say I love you without feeling self-conscious. And the ability to say I was wrong and mean it (wow, it took me a long time to learn that one!).

So it's pretty momentous when you find big chunks still to learn. Usually bittersweet, but momentous all the same.

Let me tell you about Irish citizenship. My mum's parents are from Ireland, and as a result I'm eligible for citizenship. The paperwork is surprisingly straightforward - I need to prove my lineage and then wait 9-24 months. Once that's done, I can apply for an Irish passport and live and work anywhere in the EU.

Paperwork is usually my kryptonite - I hate it, and I don't think I've sent mail in at least 3 or 4 years. So I did the adult thing and pretended it was a quest in WoW. I got everything together while I was back in Sydney at christmas - 3 generations of birth certificates and wedding certificates. I put it all in a protective plastic sleeve, which I carefully put in a recycled cardboard document sleeve in my room. And then, after assembling some furniture, I thought the cardboard sleeve was trash and absentmindedly threw it out with the ikea boxes.


So, learning and maturing as a person...

It took me about 2 months to come to terms with the loss and tell my parents. 2 months of not sleeping very well, and waking up in the middle of the night thinking of new places in my room I might not have searched.

It was really awful, and I feel a little stressed just thinking about it.

But I finally asked the right question: Why am I so stressed out by this? Why did this hit me deep in my core? And I realised that it was challenging a fundamental belief I hold about the world - a belief of privilege. I somehow made it to 29 thinking, deep down, that the world always ends up working out. That the thing that should happen does happen. That there's always a reason for things. (I didn't know I had this belief, but I did.) When I have paperwork to do, it should just be a case of doing it and then it gets done. I didn't believe in unexpected side effects, at least not for me, unless there was some deep purpose or reason why it'd be better in the long run.

Even as I realised that I was living in a storybook world, I scrambled to find a story to tell myself that wasn't "just because". Maybe I lost the paperwork because in the long run it's better that I learn this very lesson than have my original birth certificate. Surely, somehow the universe was actually conspiring to make my life better. (Much more likely than random chance!) But the truth is, I didn't lose the paperwork for any good reason; it happened because sometimes bad things happen. The world doesn't run on stories, and if I generally have good luck it's because I get lucky, not because the universe likes me.

More fundamentally, my life is not a train driving along a track. In both good and bad ways, what can happen in my life isn't limited to the directions with track laid out.

I have a friend who died from cancer at the age of 22. It took me 7 more years to realise that that could have actually happened to me instead!

So, learning. I've learned that the world isn't a story in the middle of being told. It's both better and worse than that. I've learned that I need to make time to talk to my parents on the phone - I might not have them forever. I've learned to be a little more scared of injury. It might happen to me, and it might be really inconvenient for my whole life. I've learned that I don't need other people to live lives before I can live them too. The universe is. I can change it however I like, including in ways I make up myself, with or without permission.

So I guess, at the start of the year I learned what privilege is, and I now have both more fear and more hope. Maybe when I move out I'll find a little plastic sleeve, but if I don't it's ok - this is a problem I can fix.

Boilerplate Part 1 - History

So I sort of made a simulator for a new kind of CPU...

A few years ago another UNSW student (David Collien) wanted a way to teach hardware design to students. Breadboards and wires are sort of annoying, but there's lots of other ways you can make a logic gate. And once you have NAND, you can of course make anything.

Here's an early prototype of logic gates using string:

string logic

Before long David started exploring the idea of using air pressure instead. The idea was that air pressure could push a little piece of plastic (a shuttle) which could block off perpendicular air tunnels. You could use pressure to make logic gates, and from there you could build everything. Dougall Johnson made a prototype of the idea using laser cut acrylic:

steam prototype

David Collien then made a simple simulator in Java for it to plot out circuitry. He called it SteamOS. They wanted to make a CPU using it - but I don't think that ever worked out.

This is what the first simulator looked like in late 2010 - this is a diode:


Pressure can come from the left and travel down - it'll push the shuttle out of the way and travel. But if pressure comes from the bottom pipe, it'll get stopped by the shuttle.

This is a NAND gate:


If positive pressure hits both of the shuttles from the left (purple), it'll connect the negative pressure (red) through to the output on the left. If they aren't both pressurized, at least one green will be connected.

I don't know what the blue cells are - I think this simulator had some way to ground out pressure to the environment.

These world files were originally stored as BMPs - which makes a lot of sense, and you edited them using a paint tool.

In early 2012 Jeremy Apthorp and I were hacking and ended up talking about SteamOS. I hadn't seen any of the pictures at this point - but it was a great concept. We decided to whack together a simple simulator in coffeescript. The original simulator was 100 lines. We didn't even have any programs to use as reference - I'd never seen the original, but that didn't really matter. We just wanted a toy.

The thing we ended up building is impossible to build physically - we dumped some realism in favor of having a simple simulator that is easy to reason about. There's no concept of a vacuum in our world - air isn't displaced, it merely has a pressure integer, and walls default to a pressure of 0. But it looks like this:


This is a ceiling function - it simply takes positive or negative pressure as an input, and outputs positive or negative pressure. Because of a quirk of how the simulator works, if a shuttle has twice the surface area, pressure will have twice the effect - hence the larger plate on the left.
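The simplified pressure model is easy to play with in code. Here's a minimal sketch of the idea in javascript - my own toy reduction, not the actual Boilerplate code. Pressure flood-fills from each source cell through connected empty cells, and walls just stay at 0:

```javascript
// toy model: '#' is a wall, '+'/'-' are pressure sources, '.' is empty space.
// Returns a grid of pressure integers; walls and unreachable cells stay 0.
function simulate(grid) {
  const rows = grid.map(r => r.split(''));
  const H = rows.length, W = rows[0].length;
  const pressure = rows.map(r => r.map(() => 0));
  const queue = [];

  // seed the queue with every pressure source
  for (let y = 0; y < H; y++)
    for (let x = 0; x < W; x++)
      if (rows[y][x] === '+' || rows[y][x] === '-') {
        pressure[y][x] = rows[y][x] === '+' ? 1 : -1;
        queue.push([x, y]);
      }

  // flood-fill pressure through connected empty cells
  while (queue.length) {
    const [x, y] = queue.shift();
    for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const nx = x + dx, ny = y + dy;
      if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
      if (rows[ny][nx] === '.' && pressure[ny][nx] === 0) {
        pressure[ny][nx] = pressure[y][x];
        queue.push([nx, ny]);
      }
    }
  }
  return pressure;
}
```

So simulate(['+..#.']) fills the cells up to the wall with positive pressure and leaves everything past it at zero. The real simulator also has to move shuttles in response to pressure differences, which this sketch ignores entirely.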

After spending a few hours making the simulator, Jeremy and I got horribly distracted simply making stuff with it - what does a multiplexer look like? Can you make a shuttle move back and forth along a tunnel? Oscillators are easy, but what's the smallest binary counter you can make? Our implementation is called Boilerplate, and the code is here.


This is 16 bits of addressable memory. The giant shuttles on the left wire a bit of memory into the output (which you can barely see) on the left. The cells themselves are on the right. A pressure value of two is used to write (using the same tunnel you can read from).

Over christmas I started making a CPU:


The main bus is on the top left, and the program counter is on the top. I haven't finished it - so I don't think anybody has built a CPU with steam yet. I'm not sure that I will finish it though - making a Von Neumann machine feels like it's going against the grain of steam somehow. I might start from scratch and build a Forth machine or something like that.

I showed it publicly for the first time in Jan 2014 to a standing ovation at SydJS. I keep getting a fantastic reception when I show people. There's now a haskell implementation of the simulator. I also wired it up to arduinos at nodeconf last weekend - I'm really tempted to attach it to some lego robots. And Jeremy and I started making a little safe cracking game using it, although it's pretty broken at the moment.

I'm going to make a visual walkthrough / tutorial soon - I finished cleaning up the code today, so now it's a lot easier to embed in a page or do whatever you want with (it hasn't been 100 lines for a long time now).

As of today, the boilerplate simulator is now its own npm module. I'm hoping to re-launch it in a week or so. I basically just need to host it somewhere now and I'm done! :D

Email culture

I just got a forwarded email from my mum which used gaudy purple size-18 text to say this, linking to some youtube video:

Human ingenuity knows no limits. Make sure speakers are on. Sit back and take it all in. Enjoy!

I want to ask mum what's up with the awful colourful text - honestly, why does someone start thumbing through text colours looking for something shouty and obnoxious? Why don't they just use plaintext?


But why should anyone use plaintext? Why do I prefer it? I'm sure I could invent some reasons, but it's not really better. In truth, I prefer plaintext because that's just how we talk. It's cultural.

10 years ago, rich text in email was considered rude. You excluded mutt and pine users. So technical mailing lists stayed in plaintext, and all important discussions happened in 80 character terminals. I don't know if anyone I care about still uses text-based email clients. It doesn't really matter at this point. Now unnecessary styling in email feels like a 'foreign tribe' signal. When I see it, I think that this person probably doesn't have experience in mailing lists. And so I intuitively think my mum shouldn't use it because people will think less of her.

And that's totally wacky and wrong. My mum and her friends aren't in our internet tribes. They don't have to be, and even though our people invented email, we don't get to decide how it gets used. The fact that my mum's social circle can invent their own social conventions is a great sign for email as a product. And anyway, it's not like any of them uses mutt.

As always, culture moves on.

Secure Email

Sending email securely is such a mess. Even PGP isn't good enough because it leaks metadata about who I'm contacting, when, and how much I'm saying. I'm really bothered by that and I've been thinking about it a lot lately. I think my ideal setup would be something like this:

Suppose alice@a.example.com wants to email bob@b.example.com. Alice needs bob's PGP key, a.example.com's public key and b.example.com's public key.

Alice PGP encrypts her email to Bob, then encrypts that so it can only be read by b.example.com, then encrypts that so it can only be read by a.example.com.

When she sends her email to her SMTP server at a.example.com, her server can only read & decrypt enough to know that the message came from Alice and is intended for b.example.com. Her server does not know anything else about the email, including its final destination. Her SMTP server forwards the encrypted bundle to b.example.com.

b.example.com decrypts the message with their key, and only knows that the email came from a.example.com and is intended for Bob. The b.example.com server does not know that Alice sent the email.

Finally Bob receives the message and decrypts it using his PGP key. Bob can of course read everything, including who sent the message.
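The layering is easier to see in code. This is a toy sketch of the scheme, with wrap()/unwrap() standing in for real public-key encryption (they just tag and base64 the payload - obviously NOT actual crypto), using the addresses from the example above:

```javascript
// stand-in "encryption": tag the blob with the key name and base64 it.
// In the real scheme each wrap() would be an asymmetric encryption to
// that party's public key.
function wrap(key, payload) {
  return key + '|' + Buffer.from(JSON.stringify(payload)).toString('base64');
}
function unwrap(key, blob) {
  const sep = blob.indexOf('|');
  if (blob.slice(0, sep) !== key) throw new Error('wrong key');
  return JSON.parse(Buffer.from(blob.slice(sep + 1), 'base64').toString());
}

// Alice builds the onion from the inside out:
const forBob = wrap('bob-pgp', { from: 'alice@a.example.com', body: 'hi bob' });
const forB = wrap('b.example.com', { deliverTo: 'bob', payload: forBob });
const forA = wrap('a.example.com', { forwardTo: 'b.example.com', payload: forB });

// a.example.com peels one layer and learns only the next hop...
const atA = unwrap('a.example.com', forA);
// ...b.example.com peels the next and learns only the local recipient...
const atB = unwrap('b.example.com', atA.payload);
// ...and only Bob can read the message itself.
const message = unwrap('bob-pgp', atB.payload);
```

Swap the stand-in for real encryption and you get the property described above: each hop can only read its own layer.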

This system has the big advantage that snooping hardware at either a.example.com or b.example.com alone tells the NSA very little - just that Alice sent someone an email, or that Bob received one.

They would need hardware at both endpoints to discover that Alice and Bob are even messaging each other. Further, if Alice and Bob are feeling particularly paranoid, once this infrastructure was in place it would be easy to TOR-style bounce the message through a few more intermediate mail servers to make snooping almost impossible. Once it was bounced through more locations, even if the NSA snooped on both endpoints, they wouldn't be able to match the messages together - They would just know that Alice sent an email and Bob received one.

It's a shame that email would need to be changed so much to implement this system. But long term, I think it's something we should work towards.

Identity crisis on the web

I went to IndieWebCamp drinks the other night and chatted to those guys about my ideas for KeyKitten.

We ended up chatting about what your main identity should be on the web. The two candidates are your email address (user@domain.com) or a URL, which for us techie people is probably simply a domain. On a website, you can put an hCard which can list all of your secondary identity information anyway, like email address and twitter handle.

The big advantages of email are that

  • Everyone already has one, even my mother
  • Everyone remembers their email addresses already
  • My mum will probably never register her own domain

Email is the primary identity of Persona and Gravatar. If someone logs in to your service using Persona, you get their email address, not their URL.

URLs on the other hand are more powerful. I can put actual content on my website, including whatever contact information I want and an avatar image. We can do that with email addresses too, but it's sort of hacked in. Gravatar (well, libravatar) works by rewriting foo@example.com to example.com/avatar/<hash of foo@example.com>, which is a big, awful, nasty-looking hack. Using hCard, you can just link to your avatar image from your homepage. Of course, if I own josephg.com, it's pretty easy to put an image at josephg.com/avatar/<hash> anyway. It's just kind of haphazard.

The other benefit of URLs that @tantek kept talking about is that people shouldn't be siloed, and an identity like josephg@gmail.com is stuck in gmail's stack. As far as identities go, its not really a first class citizen. Maybe we shouldn't be building new infrastructure with the assumption of siloed anything. Maybe we should make people without a website feel the pinch.

I'm kinda convinced by the silos argument, but I still want to be able to send encrypted email to my mum. Not because we have anything to hide, but more because fuck the surveillance state, that's why. It's also much easier to programmatically find someone's gravatar than to parse an hCard entry.

URLs are a fun idea, but I think email based systems will win in the end amongst regular folk. And that's the hardest question of all: ultimately, who are we making our software for? If we're making it for tinkerers and hackers, a URL will be fine. If we're making it for my mum, an email address is really the only way to go. There's something really appealing about designing and writing software for a smaller internet with just the people who create on it. But it's also insular and snobbish, and many of my friends won't make the effort to join me there.

ShareJS 0.7

This is a repost from my old tumblr blog from March 19

A month ago I got hired by Lever and moved over to San Francisco. We're building an applicant tracking system for hiring. It's a realtime web app built on top of Derby and Racer, web frameworks written by the company's cofounders.

Racer doesn't do proper OT and it doesn't scale. Over the next few months, I'm going to refactor and rewrite big chunks of ShareJS so we can use it underneath Racer to keep data in sync between the browser and our servers. I'm going to refactor ShareJS into a few modules (long overdue), add live queries to ShareJS and make the database layer support scaling.

I want feedback on this before I start. I will break things, but I think it's worth it in the long term.

So, without further ado, here's the master plan:


Standardized OT Library

First, ShareJS's OT types are written to a simple API and don't depend on any external services. I'm going to pull them out into their own project, akin to libOT.

The types here should be super stable and fast, and preferably written in multiple languages.

I considered adding some simple, reusable OT management code in there too, but by the time I pared OT down until I had something reusable, it was just a for loop.

I'm not sure where the text & JSON API wrappers should go. The wrappers are generally useful, but not coded in a particularly reusable way.

Scalable database backend

Next, we need a scalable version of ShareJS's database code. I want to pull out ShareJS's database code and make it support scaling the server across multiple machines.

I also want to add:

  • Collections: Documents will be scoped by collection. I expect collections to map to SQL tables, mongodb collections or couchdb databases. Collections seem to be a standard, useful thing.
  • Live queries: I want to be able to issue a query saying "get me all docs in the profiles collection with age > 50". The result set should update in realtime as documents are added & removed from that set. This should also work with paginated requests. I don't want to invent my own query language - I'll just use whatever native format the database uses. (SQL select statements, couchdb views, mongo find() queries, etc).
  • Snapshot update hooks: For example, I want to be able to issue a query to a full-text search database (like SOLR) and reuse the same live query mechanism. I imagine this working via a post-update hook that the application can use to update SOLR. As a first pass, I'll poll all outstanding queries against the database when documents are updated, but I can optimise for certain common use cases down the track.

I want to get the API here stable first and let the implementation grow in complexity as we need it to be more scalable and reliable. At first, this code will route all messages through a single redis server. Later I want to set it up with a redis slave for automatic failover and make the server shard between multiple DB instances using consistent hashing of document IDs or something.

I'm nervous about how the DB code and the operational transform algorithm will fit together. If the DB backend doesn't understand OT, the API will have to be strongly tied to ShareJS's model code, and harder to reuse. But if I make it understand OT and subsume ShareJS's model code, the DB code becomes much harder to adapt to other databases (you'd need to rewrite all that code!). I really love the state of model.coffee in ShareJS at the moment, though it took me two near-complete rewrites to get to that point.

I would also like to make a trivial in-memory implementation for examples and for testing. Once I have two implementations and a test suite, it should be possible to rewrite this layer on top of Hadoop or AWS or whatever.

ShareJS code

What's left for ShareJS?

ShareJS's primary responsibility is to let you access the OT database in a web browser or nodejs client in a way that's secure & safe.

It will (still) have these components:

  • Auth function for limiting reading & writing. I want to extend this for JSON documents to make it easy to restrict / trim access to certain parts of some documents.
  • Session code to manage client sessions. All the protocol stuff that's in session.coffee. I want to rewrite / refactor this to use NodeJS's new streams.
  • Presence, although this will require some rethinking to work with the new database backend stuff.
  • A simple API that lets you tell the server when it has a new client, and pass messages for it. I'm sick of all the nonsense around socket.io, browserchannel, sockjs, etc so I want to just make it the user's problem. Again, this will use the new streams API. This also makes it really easy for applications to send messages to their server that don't have anything to do with OT.
  • Equivalent connection code on the client, currently in client/connection.coffee.
  • Client-side OT code, currently in client/doc.coffee.
  • Build script to bundle up & minify the client and required OT types for the browser. I want to rewrite this in Make. (Sorry windows developers).
  • Tests. It looks like nodeunit is no longer actively maintained, so it might be time to port the tests to a different framework. (Suggestions? What does everybody use these days?)

ShareJS has slowly become a grab bag of other stuff that I like. I'm not sure whether all this stuff should stay in ShareJS or what.

There is:

  • The examples. These will wire ShareJS up with browserchannel and express. The examples will add a few dependencies that ShareJS won't otherwise have.
  • The different database backends. Unless someone makes an adapter for my new database code, these are all going to break. Sorry.
  • Browser binding for textareas, ace and codemirror
  • All the ongoing etherpad work. I met a bunch of etherpad & etherpad lite developers at an event last week, and they were awesome. Super happy this is happening.


That's the gist of the redesign. Some thoughts:

I hate making ShareJS more complicated, but at the same time I think it's important to make it actually useful. People need to scale their servers and they need to be able to build complex applications on top of all this stuff. I love how ShareJS's entire server is basically encapsulated in one file, and it'll be a pity to lose that.

This change will break existing code. Sorry. The current DB adapters will break, and putting documents in collections will change APIs all the way through ShareJS.

I'm still not entirely sure how this redesign will interact with my C port of ShareJS. Before I realised how integral ShareJS would be to my current work, I was intending to finish working on my C implementation next. For now, I guess that'll take a back seat. (In exchange, I'll be working on this stuff while at work, and not just on weekends.)

This design allows some nice application features. For example, the auth stuff can much more easily enforce schemas for documents. You could enforce that everything in the 'code' collection has type 'text', everything in the 'projects' collection is JSON (with a particular structure) and items in the 'profiles' collection are only editable by the user who owns the profile. You could probably do that before, but it was a bit more subtle.

As I said above, I'm not sure where the line should be drawn between the DB project and the model. If they're two separate projects, they should have a very clear separation of concerns. I'm really trying to build a DB wrapper that provides the API that I want databases to provide directly, in a scalable way. However, that idea is entangled with the OT and presence functionality. What a mess.

I want feedback on all this stuff. I know a lot of people are building cool stuff with ShareJS, or want to. Do these plans make your lives better or worse? Should we keep the current simple incarnation of ShareJS around? If I'm taking the time to rip the guts out of ShareJS, what else would you like to see changed? How do these ideas interact with the etherpad integration work?


Chipmunk in ASM.JS

I'm not sold on emscripten. Its a cool idea, and its impressive that it works at all, but its output seems really stupid. For example, take this C function:

cpFloat cpMomentForCircle(cpFloat m, cpFloat r1, cpFloat r2, cpVect offset) {
    return m*(0.5f*(r1*r1 + r2*r2) + cpvlengthsq(offset));
}

This is the asm.js code that emscripten generates using -O2:

function cpMomentForCircle(m, r1, r2, offset) {
  // Type annotations

  // Variable declarations
  var f=0, g=0, h=0.0;

  // Body
  HEAP32[offset>>2] = HEAP32[g>>2];
  HEAP32[offset+4>>2] = HEAP32[g+4>>2];
  HEAP32[offset+8>>2] = HEAP32[g+8>>2];
  HEAP32[offset+12>>2] = HEAP32[g+12>>2];
  h=((r1*r1 + r2*r2)*0.5+ +cpvlengthsq(offset))*m;
  return +h;
}

How did such a simple function become so complicated? It doesn't need to copy the offset vector onto the stack (in fact, it doesn't need to use the stack at all).
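For comparison, the function really only needs its arguments. In plain javascript, with vectors as {x, y} objects rather than heap slots (a representation emscripten can't use, which is part of the problem), it could just be:

```javascript
// plain-JS sketch of the same function - no heap traffic at all
function cpvlengthsq(v) {
  return v.x * v.x + v.y * v.y;
}

function cpMomentForCircle(m, r1, r2, offset) {
  return m * (0.5 * (r1 * r1 + r2 * r2) + cpvlengthsq(offset));
}
```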

With code like that you can easily see how ChipmunkJS triples in size due to emscripten. You can also see how executables get so big...

I was curious how hand-written asm.js compares, so I ported cpVect to asm.js manually.

The original C:

static inline cpVect cpvclamp(const cpVect v, const cpFloat len)
{
    return (cpvdot(v,v) > len*len) ? cpvmult(cpvnormalize(v), len) : v;
}

The existing ChipmunkJS version:

var vclamp = cp.v.clamp = function(v, len)
{
    return (vdot(v,v) > len*len) ? vmult(vnormalize(v), len) : v;
};

And the hand-written asm.js:

function vclamp(ret, v, len) {
  ret = ret|0;
  v = v|0;
  len = +len;

  if (+vdot(v, v) > len*len) {
    vnormalize(ret, v);
    vmult(ret, ret, len);
  } else {
    cpv(ret, +f64[v>>3], +f64[v+8>>3]);
  }
}

(Most methods don't balloon out in complexity so much)

I don't know how my version compares to LLVM's version in terms of speed. I'd like to think that it's faster - but I have no idea if that's actually true. I've been told that LLVM will almost always generate better assembly than me.

It's really annoying to write this code - especially dealing with the heap and doing type annotations everywhere. If I were going to convert the whole thing, I'd be better off using LLJS's asmjs branch. That said, lljs hasn't been touched in 3 months and doesn't seem to currently work. I think I would still be ahead time-wise after fixing LLJS though. I don't want to debug those bitshift operations.

The big win is in compiled output size. And here's the crazy part: The hand-written asmjs is smaller than ChipmunkJS!

$ uglifyjs -cm <cpVect.js  | wc -c


$ uglifyjs -m <asm.js  | wc -c

(-c doesn't work on asmjs modules - it breaks some of the type hints)

That asmjs module is at a disadvantage too - it includes a bunch of asmjs boilerplate that is only needed once!

The most interesting part is looking at the minified code. It looks totally different, and it's super obvious which is which:

,vperp=cp.v.perp=function(t){return new Vect(-t.y,t.x)},
vpvrperp=cp.v.pvrperp=function(t){return new Vect(t.y,-t
.x)},vproject=cp.v.project=function(t,n){return vmult(n,
(t){return this.mult(vdot(this,t)/vlengthsq(t)),this};va
r vrotate=cp.v.rotate=function(t,n){return new Vect(t.x*
tion(t){return this.x=this.x*t.x-this.y*t.y,this.y=this.

function I(n,r,t){n=n|0;r=r|0;t=t|0;m(n,+y[r>>3]-+y[t>>3
],+y[r+8>>3]-+y[t+8>>3])}function U(n,r){n=n|0;r=r|0;y[n
>>3]=-+y[r>>3];y[n+8>>3]=-+y[r+8>>3]}function z(n,r,t){n
function F(n,r){n=n|0;r=r|0;return+(+y[n>>3]*+y[r>>3]+ +
y[n+8>>3]*+y[r+8>>3])}function _(n,r){n=n|0;r=r|0;return
+(+y[n>>3]*+y[r+8>>3]-+y[n+8>>3]*+y[r>>3])}function b(n,
r){n=n|0;r=r|0;m(n,-+y[r+8>>3],+y[r>>3])}function j(n,r)

I've never seen javascript look so mathsy. GWT, CoffeeScript and the closure compiler all look positively plain compared to that.

The entire cpVect asmjs module is here if you want to see a larger code sample.

In comparison, cpVect.js is here and the original cpVect.h is here.

The only downside is that it's super awkward to call any of these methods from normal javascript. Because asm.js modules can't view or edit normal javascript objects, all the stateful data has to exist inside the module's own memory heap & stack. If I go this route, I'll need to wrap the entire chipmunk API in a plain JS API to actually make it usable by normal javascript programs. It's a serious downer. (Emscripten has exactly the same problem.)

KeyKitten: Gravatar for keys!

One of the reasons crypto isn't used more is usability. To use PGP there's like 6 scary steps you have to go through. First you have to install gpg, then make your keys (which requires typing some scary stuff and choosing your cipher..!?). Then you have to add your key to your keyring (??), and you should upload it to some random websites that nobody has ever heard of. And you'll still feel guilty because you didn't go to a key signing party. Even after you've done all of that, you need to store your private key somewhere safe so you don't lose it. And who do you trust with your private key?

And good gracious, I hope you don't use windows to do all of that!

Let's face it, my mum is never going to make a pgp key today, and my mum is Gmail's target audience, not crypto neckbeards - which makes message encryption impossible.

We have the same problem on the other side of the fence. If I want to send an encrypted message to jim@example.com, how can I get Jim's public key?

Well, in comes keykitten. The point of keykitten.org is gravatar for keys. Hash jim@example.com to c20266793..., then fetch https://keykitten.org/keys/c20266793d32b1b99e42438807fc7038f89bb326/pgp to get his pgp key. Or you can fetch /ssh to get jim's public ssh key.

The other half of the project is a simple web UI to sign in & upload your keys to the site. I want to make it usable by both my mum and security neckbeards. If you don't have a key, we'll generate you some using browser javascript. If you're worried you'll lose your private key, I'll store a copy of it (but only if you want me to). I'll use Persona to sign users in, pin SSL certificates in chrome and firefox (and make the SSL cert widely published).

Neckbeards can go in and upload the pgp key they generated & got signed at key parties. My mum can click the 'figure it out for me' button. And finally, of course, the site should be federated, so if you want jim@example.com's key, you should first check example.com/keys/... before looking on keykitten.org.

There's a few fun things you can do with a system like this. Once github knows my email address, they can just look up my ssh public keys to give me access. If I want to let my friend ssh in to my computer, I can add him from my contact book (I have his email address, after all). My computer will fetch his ssh key via keykitten, make an account and add his key via authorized_keys. And finally, it should be much easier to make things like encrypting browser extensions. All the extension needs to know is the recipient and it can figure out how to encrypt data for them.

So that's the plan. Little, tiny, exciting steps.

Dreams, cleverness and the gallows

Last night I dreamed that I was on death row, slowly taken out toward the gallows. I hadn't really done anything wrong. Suddenly my execution was just sort of about to happen. Everyone thought I must have some clever plan to escape - so most of my dream was taken up scheming to cheat death. If I was sneaky and clever enough, I'm sure I could think of something.

And now I'm awake. At some point I'm going to get old and die. I won't die because I'll deserve it. I'll die because this dreamy life might end before I think of a clever escape from death. We're probably in one of the last generations to ever die - how sad is that!

I should shoulder some of that burden. We all should; it's sensible. I don't want to leave my potential immortality up to the cleverness of strangers and their willingness to share. Seriously - what do we need to do? Because I'm having way too much fun, and it would just be so sad if I die stupidly because we're too busy partying to be clever.

And I know all that. I've been talking about this stuff, and my crazy AI ideas for basically ever. When will I actually write that code? The most recent HPMOR chapters have kicked me with this stuff again. If I catch some terminal illness in a few years, I will look back on my time now with disgust - as wasted years that I could have been saving my life. Or worse, if someone I love dies because right now I'm spending my life making hiring software instead of AI, then what? There's a gun pointed at my head and yours. Every day there's a small chance it will go off. Today, I'm ignoring it and building cool concurrency systems instead. But we should do something about that gun. If we don't, we will all die, 100%.

We're all on the way to the gallows. It's time to come up with something clever.

ChipmunkJS and Emscripten

I've finally gotten around to compiling Chipmunk to JS using Emscripten to see what happens. It works great.

As a baseline, here's the first benchmark running in C:

Time(a) =  1451.45 ms (benchmark - SimpleTerrainCircles_1000)

The same benchmark running with ChipmunkJS in node 0.10.12:

$ node bench.js 
SimpleTerrainCircles 1000
Run 1: 22426
Run 2: 21808

(i.e. 21 seconds, 15x baseline)

Using v8 head:

$ ../v8/out/native/d8 bench.js
SimpleTerrainCircles 1000
Run 1: 11248
Run 2: 12930

(8x baseline)

Emscripten output (-O2 -DNDEBUG) in Chrome Canary:

Time(a) =  3967.12 ms (benchmark - SimpleTerrainCircles_1000)

(2.7x baseline)

In Firefox 22, which has asm.js support (Firefox nightly 25a has about the same performance):

Time(a) =  2044.10 ms (benchmark - SimpleTerrainCircles_1000)

(1.4x - only 40% slower than C!!!)

The V8 team is actively working on making asmjs code run faster in v8. They don't want to have a special 'asm.js mode' like firefox does - instead they're adding optimizations which can kick in for asmjs-style code (source: insiders on the Chrome team). I expect Chrome performance to catch up to firefox in the next six months or so.


  • I didn't make any changes to chipmunk (although I did bump chipmunkjs test runs back up to 1000 to match chipmunk). My test code is here

  • I compiled the benchmark code from C using emscripten. If your game is written in javascript, performance will be worse than this.

  • These numbers are approximate. I didn't run the benchmarks multiple times and I have a million things open on my machine. I doubt they'll be off by more than ~10% though.

  • Downloaded filesize increases by nearly 3x. Chipmunk-js is 170k, or 17k minified & gzipped. With emscripten the output is 300k, minified & gzipped to 49k. This is way bigger.

  • We can expose most of chipmunk directly to javascript. Unfortunately, we can't share vectors between the inside of the emscripten environment and the outside - emscripten (obviously) inlines vectors inside its own heap & stack. In javascript, the best we can do is use objects in the JS heap. Our options are: removing vectors (as much as possible) from the API (cpBodySetPos(v) -> cpBodySetPos(x, y)), writing some javascript wrappers around everything to bridge between a javascript vector type and a C vector type, or putting vectors in the emscripten heap (which would be faster than a JS bridge, but would require that you match cpv() calls with cpvFree() or something). All the options are kind of nasty.

  • Emscripten doesn't use the GC, so you can now leak memory if you don't cpSpaceFree(), etc.

  • As well as running faster, code like this is easier to port. Keeping chipmunkjs updated with the latest version of chipmunk should mostly just require a rebuild.