Random RSS feed Archive
26
Aug

Faster Strings

At some point I want to port ShareJS to C, because tiny, fast projects like redis are sexy.

One of the pain points of ShareJS is javascript’s high level strings. Because they’re immutable, inserting in the middle of a large document requires O(n) time. ShareJS slows to a crawl if you have huge documents.

The simplest solution is to break documents into an array of lines. But that complicates the OT code, and doesn’t fix the asymptotic time anyway. - O(number of characters) becomes O(number of lines + max line length). Its still slow when you have lots of lines or binary data. But most editors seem to do that anyway, because worse is better and all that.

The right thing is to use a tree or skip list of smaller character strings, aka a Rope (its called a rope because its like a complex string. Very funny, yes?)

To make ShareJS faster I implemented one of them for javascript called jumprope. But because of all the gubbins between javascript and the actual character arrays, its still pretty slow. Your strings need to be over 5k characters long before it catches up with simply making new strings from old strings.

Ropes in C

So I thought maybe this is a good time to revisit the whole problem. I want my library to be straight C (I hate C++), so SGI’s rope implementation is out. SGI’s implementation also doesn’t support unicode, and I want to be able to insert & delete at UTF-8 character offsets.

So I polished off my coffeescript jumprope library and ported it to C. Right now the C library is twice as big as it was in coffeescript. (600 LOC vs 270) but it runs about 20x faster. (~100k edits/second in javascript on V8 vs ~2m edits/second in C. Wow.).

The implementations themselves are almost identical. In both cases I’m using skip lists of strings. In C, the best performing node length was 128 characters, vs 512 in javascript.

Benchmarking these libraries is a bitch. Performance is dependant on the document size, which changes with every edit. I need to get some actual edit data from someone creating a document to see how it fares for real.

There’s also another unexpected boon in a library like this. It turns out that skip lists can be used to convert line & character offset pairs from an editor into a single global character offset in O(log n) time. I can also use the same trick to store & use tombstones for sharejs’s TP2 text type.

The benchmark I really want to see is my skip list rope compared to a tree rope implementation. It wouldn’t surprise me if trees get better performance that skip lists - I really have no idea. I tried comparing my implementation to the SGI rope implementation in C++ and I got this:

benchmarking librope
did 500000 iterations in 157.574000 ms: 3.173112 Miter/sec
final string length: 5799

benchmarking c string
did 500000 iterations in 1556.024000 ms: 0.321332 Miter/sec
final string length: 5799

benchmarking sgirope
did 500000 iterations in 8442.429000 ms: 0.059225 Miter/sec
final string length: 5799

(Code)

Thats a slaughter, not a competition.

About me
Blog of Joseph Gentle.

Coding, ShareJS, AI, all the good stuff :)
Follow me
Following
fozmeadows: What Happens Next: A Gallimaufrynornagon: Where You Could Bemissquillslinger: Arreptitious Affair
Search