Wednesday, July 9, 2008

Mork and SQLite

As I was doing some performance testing for a bit of demorkification, I noted that what I was demorkifying was one of the few places where mork works well. The API makes it very clearly an EAV model, which is a model that makes mork happy. On a higher level, it would be hard to position it in a way where the queries only affect one row (or column of a row in mork).

Mork is a model likes EAV and hates queries. That means keys need to be meaningful, since that's the only query you can do without looking at everything. Meaningful keys create problems, though.

SQLite is a model that loves queries but hates EAV and other models which don't rely on batching. That means that to get good performance, you should really be using larger batch queries. The biggest implication is that querying suddenly drops down to a lower level of index. If you pick the queries and schemas right, SQL can knock searches down from O(n) to O(lg n), and forgo object creation costs as a benefit.

But back to the patch at hand. Basic testing shows that, optimizing what I have to the point of not changing some core implementation stuff, it regresses badly in relative terms on two infrequent items (creating and removing files, i.e., adding/deleting folders). On actually retrieving the cache elements, it does the same (under <1ms average for both on a release build). Surprisingly, mork hates commits, consistently taking longer without a compression and averaging at 15.36ms for 50 commits, while SQLite takes 2.41ms on average for the same 50 commits. The cost, though, is in get/set of the actual EAV pairs; while all operations are under 35ms for 5000 get/sets for SQLite, mork can do the same for 10ms per operation. But mork's subsequent speed comes at a 100ms startup price (for 50 folders, heavier users will consume even more time).

Mork's fast query comes at the price of a slow startup: it loads everything into memory. SQLite could do the same thing, which would improve the get/set times but weaken startup, as well as memory usage as well. Mork also makes committing—which apparently happens more often than actually setting a property—more expensive, especially if you've neglected to compress recently; SQLite shows no loss if you neglect a compression, and (for something this small and static) no strong benefit to vacuuming. I couldn't find any services to measure memory easily, so I'm relying on assumptions there.


Schrep said...

Have you tried tuning the cache sizes for sqlite? Do the queries run faster the second time?

Anonymous said...

Sorry, but what does the acronym EAV stand for?

Joshua Cranmer said...


(it's listed from the disambiguation page.