At some point I want to port ShareJS to C, because tiny, fast projects like redis are sexy.
The simplest solution is to break documents into an array of lines. But that complicates the OT code, and doesn’t fix the asymptotic time anyway. - O(number of characters) becomes O(number of lines + max line length). Its still slow when you have lots of lines or binary data. But most editors seem to do that anyway, because worse is better and all that.
The right thing is to use a tree or skip list of smaller character strings, aka a Rope (its called a rope because its like a complex string. Very funny, yes?)
Ropes in C
So I thought maybe this is a good time to revisit the whole problem. I want my library to be straight C (I hate C++), so SGI’s rope implementation is out. SGI’s implementation also doesn’t support unicode, and I want to be able to insert & delete at UTF-8 character offsets.
Benchmarking these libraries is a bitch. Performance is dependant on the document size, which changes with every edit. I need to get some actual edit data from someone creating a document to see how it fares for real.
There’s also another unexpected boon in a library like this. It turns out that skip lists can be used to convert line & character offset pairs from an editor into a single global character offset in O(log n) time. I can also use the same trick to store & use tombstones for sharejs’s TP2 text type.
The benchmark I really want to see is my skip list rope compared to a tree rope implementation. It wouldn’t surprise me if trees get better performance that skip lists - I really have no idea. I tried comparing my implementation to the SGI rope implementation in C++ and I got this:
benchmarking librope did 500000 iterations in 157.574000 ms: 3.173112 Miter/sec final string length: 5799 benchmarking c string did 500000 iterations in 1556.024000 ms: 0.321332 Miter/sec final string length: 5799 benchmarking sgirope did 500000 iterations in 8442.429000 ms: 0.059225 Miter/sec final string length: 5799
Thats a slaughter, not a competition.