Wednesday, July 16, 2008

Profiling made visual

When you've got a performance regression resulting from a major patch, pinpointing where you can save time can be annoying. For me, on Linux, the only decent tool is jprof. And I didn't get far in jprof before tripping over a bug in its code that made reliable testing infeasible (ternary operators are wonderful things). After fixing that, I turned to the output.

The output is basic and, in general, not helpful for deep inspection. Okay, so I know that I'm spinning in this one function specifically. But, on a grander level, which functions am I really spinning hard in?

In one trace, it's obvious that malloc, JS, card creation, and case conversion are being nice and expensive. All four of those are more or less unavoidable. Where else am I wasting time? It's hard to tell, since many of the top functions in both the flat and hierarchical views are swamped by irrelevant subfunctions of these. Enter graphviz.

Graphviz is a wonderful library I discovered about a year ago. It takes a file that looks like the code on the left and makes it into the graph on the right.

digraph G {
  A -> B;
  A -> C;
  B -> D;
  C -> D;
}
[Graphviz output]

The output gets better as you tickle it more and more. But its flexibility is not why I love it. It's that the format is simple enough that one can easily write a quick sed or awk script to generate the graph. In the following three commands (which could just as easily be one command, but I'm not that cruel), I took the ugly jprof output and formatted it into an easy-to-read graph:

jcranmer@quetzalcoatl /src/tree2/mozilla $ cat tmp3.html | sed -e '/index/,/<\/pre>/!d' -e '/<A href="#[0-9]*">/s/^.* \(.*[0-9]\) \(.*\)<\/A>$/c|\2|\1/' -e '/<a name/s#^.* \(.*[0-9]\)</a> <b>\(.*\)</b>$#f|\2|\1#' -e 's/<hr>/e|--/' -e '/|/!d' -e 's/|\(.*\)(\(.*\))|/|\1|/' -e 's/|.*::\(.*\)|/|\1|/' | awk -'F|' 'BEGIN { skip = 0; print "digraph G {" } $1 == "c" { if (skip == 0) { count[$2] = $3; } } $1 == "f" { for (func in count) { print "\"" func "\"->\"" $2 "\" [label=" count[func] "];"; delete count[func] } skip = 1; print "\"" $2 "\" [sum=" $3 "];" } $1 == "e" { skip = 0 } END { print "}" }' > ~/full.dot
jcranmer@quetzalcoatl ~ $ cat full.dot | gvpr 'BEG_G { $O = graph($.name, "D") } E  { if ($.tail.sum > 200 && $.tail.sum < 1000) { copy($O, $); } }' > full2.dot
jcranmer@quetzalcoatl ~ $ dot -Tpng -o full2.png full2.dot

Now that I've most likely burned your eyes out by using a sed, an awk, and a gvpr (something like awk, but for graphviz) script all from the command line, I feel the need to explain what it's doing. The sed script, in order, grabs only the hierarchical portion of the jprof output, changes the lines into simple pipe-delimited fragments that awk can parse more easily, and then scrubs the demangled C++ names down to simple function names (although not perfectly). The awk script then compiles the information into a dot file mapping the call graph and annotating the nodes with probe frequencies. Next, gvpr scrubs out all nodes with more than 1000 probes or fewer than 200. Finally, dot gets a hold of it and makes a nice PNG.
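For the record, the awk stage is doing nothing magic. Here's a rough Python re-expression of the same aggregation, in case the one-liner is unreadable; the function and variable names are my own invention, and the fragment format is the one the sed stage emits (c|caller|count, f|function|sum, e|--):

```python
def fragments_to_dot(lines):
    """Translate pipe-delimited jprof fragments into a dot call graph.

    Fragment kinds (as emitted by the sed stage):
      c|caller|count  -- a caller entry with its probe count
      f|function|sum  -- the function those callers lead into
      e|--            -- end of one hierarchical record
    """
    out = ["digraph G {"]
    callers = {}   # caller name -> probe count, pending the next 'f' line
    skip = False   # after the 'f' line, the 'c' entries are callees: ignore
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if parts[0] == "c" and not skip:
            callers[parts[1]] = parts[2]
        elif parts[0] == "f":
            for func, count in callers.items():
                out.append('"%s"->"%s" [label=%s];' % (func, parts[1], count))
            callers.clear()
            skip = True
            out.append('"%s" [sum=%s];' % (parts[1], parts[2]))
        elif parts[0] == "e":
            skip = False
    out.append("}")
    return "\n".join(out)
```

Feed it the fragment stream on stdin and you get the same dot file the awk script produces: one edge per caller, labeled with the probe count, and one sum-annotated node per profiled function.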

And the PNG is informative. Although the graph is enormous, the information leaps out immediately. Floating high up are five expensive functions, the fifth of which I had never noticed: XPCThrower::ThrowBadResult. Hmm... I quickly threw up a graph of the pre-patch results and confirmed that it wasn't in the top slots there. Doing some basic math, this one function, and the calls resulting from it, produces about 60% of the current regression, assuming that I'm reading the numbers right. Who said throwing exceptions was cheap?

Anyways, my visual approach to profiling isn't complete. The graph is in plain black and white, where I should be using colors and line thickness to represent the expense of operations. I might also play around with tickling the data to highlight the exact functions where regressions occur, something I could easily do with gvpr if I had two dot graphs of the translated output. And my output filtering isn't perfect by any means. But all that comes for free in my envisioned perfect profiling extension. Oh well, at least I have something to point to for neat data.

Thursday, July 10, 2008

2900- UNCO bugs

As of a few minutes ago, Thunderbird's UNCO bug count dipped below 2900: it stands at 2892 right now (likely to change within a few minutes). For those of you not satisfied with only one product's bugs, the sum of all mailnews UNCO bugs (TB bugs + mailnews portions of MAS and Core) is 3236, with Core contributing 227 and MAS the other 117. Keep triaging!

Wednesday, July 9, 2008

Mork and SQLite

As I was doing some performance testing for a bit of demorkification, I noted that what I was demorkifying was one of the few places where mork works well. The API makes it very clearly an EAV model, which is a model that makes mork happy. And, at a higher level, it would be hard to recast the API so that a query would affect more than one row at a time (or more than one column of a row, in mork's terms).

Mork is a model that likes EAV and hates queries. That means keys need to be meaningful, since a key lookup is the only query you can do without looking at everything. Meaningful keys create problems, though.

SQLite is a model that loves queries but hates EAV and other models that don't lend themselves to batching. That means that to get good performance, you should really be using larger, batched queries. The biggest implication is that querying drops down into the index level. If you pick the queries and schemas right, SQL can knock searches down from O(n) to O(lg n), and forgo object-creation costs as a bonus.
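To make that concrete, here's a minimal sketch of an EAV table in SQLite, written in Python against the stdlib sqlite3 module. The table and column names are my own invention, not the patch's actual schema; the point is that the composite primary key doubles as a B-tree index, so a point lookup is the O(lg n) index walk rather than an O(n) scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical EAV schema: each (entity, attribute) pair maps to one value.
# The PRIMARY KEY gives SQLite a B-tree index over exactly the lookup we do.
conn.execute("""CREATE TABLE properties (
    entity    TEXT NOT NULL,
    attribute TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (entity, attribute)
)""")

conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?)",
    [("folder1", "totalMsgs", "42"),
     ("folder1", "unreadMsgs", "7"),
     ("folder2", "totalMsgs", "13")])

# The query planner resolves this through the primary-key index:
row = conn.execute(
    "SELECT value FROM properties WHERE entity = ? AND attribute = ?",
    ("folder1", "unreadMsgs")).fetchone()
print(row[0])  # -> 7
```

The same lookup in a schema with no usable index (say, attributes packed into a blob) would force a full-table scan plus parsing, which is exactly the work mork does by holding everything in memory instead.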

But back to the patch at hand. Basic testing shows that, optimizing what I have short of changing some core implementation stuff, it regresses badly in relative terms on two infrequent operations (creating and removing files, i.e., adding/deleting folders). On actually retrieving the cache elements, it does the same (under 1ms average for both on a release build). Surprisingly, mork hates commits, consistently taking longer without a compression and averaging 15.36ms for 50 commits, while SQLite averages 2.41ms for the same 50 commits. The cost, though, is in getting/setting the actual EAV pairs; while all operations are under 35ms for 5000 get/sets for SQLite, mork can do the same for 10ms per operation. But mork's subsequent speed comes at a 100ms startup price (for 50 folders; heavier users will consume even more time).

Mork's fast queries come at the price of a slow startup: it loads everything into memory. SQLite could do the same thing, which would improve the get/set times but hurt startup, as well as memory usage. Mork also makes committing (which apparently happens more often than actually setting a property) more expensive, especially if you've neglected to compress recently; SQLite shows no loss if you skip a compression, and (for something this small and static) no strong benefit to vacuuming. I couldn't find any services to measure memory easily, so I'm relying on assumptions there.

Thursday, July 3, 2008

Bugs and trends

As anyone who hangs out on IRC, subscribes to Mozilla's newsgroups, or reads key mailco blogs on a regular basis knows by now, Thunderbird has dipped under 3,000 UNCO bugs. The key chart on the matter now shows quite clearly that we have beaten that number for the first time in about a year. But this one metric does not reveal all.

First on my list of things to point out are the trends. The UNCO count peaked at a bit over 3500 earlier this year. Extrapolating is not the easiest thing here, but it seems that the overarching trend before the recent freefall would have pushed the UNCO count as high as 3700. Meanwhile, one might fret at the uptick in NEW bugs. However, the overarching trend may have been changing before the bugdays (very hard to tell); worst case, it's only 200 above where the trend line falls. So 700 bugs have been "confirmed" in some sense at the expense of 200 of them staying valid. Impressive work.

Second is to point out that Thunderbird bugs don't represent the entire picture. Combining the bug counts from the Core mailnews components (Mailnews: * and Networking: {IMAP, POP, News, SMTP}) and the mailnews side of SeaMonkey as well, the aggregate UNCO count comes out to 3268 as of this writing (which is in the middle of a bugday, albeit a rather tame one). All open bugs come out to 10913.

Some components are "healthier" than others. I would consider a component healthy if the number of UNCO bugs is low with respect to NEW bugs, and the NEW bug count is low with respect to all bugs. For core components at least, I would consider both thresholds to be about 10%. The component I follow most closely (Networking: News) satisfies the first metric but not the second (7/143/897). I've not worked out numbers for the Thunderbird side, but 30%/10% thresholds do not seem irrational. Surprisingly enough, Thunderbird does have a NEW count of about 10% of all bugs (2187/20403); the only component not close to this is RSS (106/658).

Time to jump back into bugday and nurse a sick component back to health!