Wednesday, July 16, 2008

Profiling made visual

When a major patch causes a performance regression, pinpointing where you can save time can be annoying. For me, on Linux, the only decent tool is jprof. And I didn't get far with jprof before tripping over a bug in its code that made reliable testing infeasible (ternary operators are wonderful things). After fixing that, I turned to the output.

The output is basic and, in general, not helpful for deep inspection. Okay, so I know that I'm spinning in this specific function. But at a higher level, which functions am I really spinning hard in?

In one trace, it's obvious that malloc, JS, card creation, and case conversion are nice and expensive. All four of those are more or less unavoidable. Where else am I wasting time? It's hard to tell, since many of the top slots in both the flat and hierarchical views are taken up by irrelevant subfunctions of those four. Enter graphviz.

Graphviz is a wonderful library I discovered about a year ago. It takes a file like the one below and turns it into the graph shown after it.

digraph G {
  A -> B;
  A -> C;
  B -> D;
  C -> D;
}
[Figure: Graphviz output of the graph above]
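Because the input is plain text, generating it from a program is trivial. As an illustration, this Python sketch emits the same four-edge digraph shown above. The helper name `to_dot` is hypothetical, and node names are assumed to be plain identifiers that need no quoting.

```python
def to_dot(edges, name="G"):
    """Render a list of (tail, head) pairs as a DOT digraph.

    Assumes every node name is a plain identifier, so no quoting is needed.
    """
    lines = [f"digraph {name} {{"]
    for tail, head in edges:
        lines.append(f"  {tail} -> {head};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The same four edges as the example graph.
    print(to_dot([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))
```

Piping the printed text into `dot -Tpng` would render it as a PNG.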

The output gets better the more you tweak it. But its flexibility is not why I love it. I love it because the format is so simple that a short sed or awk script can generate the graph. In the following three commands (which could just as easily be one command, but I'm not that cruel), I took the ugly jprof output and turned it into an easy-to-read graph:

jcranmer@quetzalcoatl /src/tree2/mozilla $ cat tmp3.html | sed -e '/index/,/<\/pre>/!d' -e '/<A href="#[0-9]*">/s/^.* \(.*[0-9]\) \(.*\)<\/A>$/c|\2|\1/' -e '/<a name/s#^.* \(.*[0-9]\)</a> <b>\(.*\)</b>$#f|\2|\1#' -e 's/<hr>/e|--/' -e '/|/!d' -e 's/|\(.*\)(\(.*\))|/|\1|/' -e 's/|.*::\(.*\)|/|\1|/' | awk -F'|' 'BEGIN { skip = 0; print "digraph G {" } $1 == "c" { if (skip == 0) { count[$2] = $3; } } $1 == "f" { for (caller in count) { print "\"" caller "\"->\"" $2 "\" [label=" count[caller] "];"; delete count[caller] } skip = 1; print "\"" $2 "\" [sum=" $3 "];" } $1 == "e" { skip = 0 } END { print "}" }' > ~/
jcranmer@quetzalcoatl ~ $ cat | gvpr 'BEG_G { $O = graph($.name, "D") } E  { if ($.tail.sum > 200 && $.tail.sum < 1000) { copy($O, $); } }' >
jcranmer@quetzalcoatl ~ $ dot -Tpng -o full2.png
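The awk stage carries most of the logic. Below is a rough Python transliteration of it, offered only as a sketch. It assumes the sed stage has already reduced the jprof page to `c|name|count` records (callers), `f|name|total` records (the function itself), and `e|--` records (end of a block), which is the shape the sed expressions produce; the function name `records_to_dot` is hypothetical. As in the awk script, only the `c` records that come before a function's own `f` record become edges, and later ones are skipped until the next separator.

```python
def records_to_dot(records):
    """Build a DOT call graph from pipe-delimited records.

    c|name|count : a caller of the next "f" function, with its probe count
    f|name|total : the function itself, with its total probe count
    e|--         : end of that function's block (jprof's <hr> separator)
    """
    out = ["digraph G {"]
    callers = {}   # caller name -> probe count, pending until the next "f" record
    skip = False   # True after an "f" record: "c" records are ignored, as in the awk
    for record in records:
        fields = record.rstrip("\n").split("|")
        kind = fields[0]
        if kind == "c" and not skip:
            callers[fields[1]] = fields[2]
        elif kind == "f":
            func, total = fields[1], fields[2]
            # One labeled caller -> callee edge per pending caller, then forget them.
            for caller, count in callers.items():
                out.append(f'"{caller}"->"{func}" [label={count}];')
            callers.clear()
            skip = True
            # Custom "sum" attribute on the node, used later for filtering.
            out.append(f'"{func}" [sum={total}];')
        elif kind == "e":
            skip = False
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = ["c|Foo|12", "c|Bar|30", "f|Baz|250", "c|Qux|5", "e|--"]
    print(records_to_dot(sample))
```

Python dicts keep insertion order, so edges come out in the order the callers were listed; awk's `for (... in ...)` loop makes no such promise.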

Now that I've most likely burned your eyes out by using a sed, an awk, and a gvpr (something like awk, but for graphviz) script all from the command line, I feel the need to explain what it's doing. The sed script, in order, grabs only the hierarchical portion of the jprof output, rewrites the interesting lines as pipe-delimited records that awk can parse easily, and then strips argument lists and class qualifiers from the demangled C++ names to leave simple function names (although not perfectly). The awk script then compiles that information into a dot file mapping the call graph, labeling each caller-to-callee edge with its probe count and annotating each node with its total probe count. Next, gvpr keeps only the edges whose calling function has between 200 and 1000 probes, throwing away everything else. Finally, dot takes the result and makes a nice PNG of it.
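The gvpr step filters edges rather than nodes: for every edge it reads the calling (tail) node's `sum` attribute and copies the edge into the output graph only when that sum is strictly between 200 and 1000. Here is a hedged Python sketch of the same test, working on an in-memory edge list instead of a dot file; the function name `keep_mid_range_edges` and the data layout are illustrative assumptions.

```python
def keep_mid_range_edges(edges, node_sum, low=200, high=1000):
    """Keep an edge only if its tail's probe total is strictly between low and high.

    edges:    list of (tail, head, label) tuples
    node_sum: dict mapping node name -> total probe count (the "sum" attribute)
    Nodes with no recorded total are treated as failing the test.
    """
    kept = []
    for tail, head, label in edges:
        total = node_sum.get(tail)
        if total is not None and low < total < high:
            kept.append((tail, head, label))
    return kept


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sums = {"A": 150, "B": 500, "C": 1200}
    edges = [("A", "B", 3), ("B", "C", 7), ("C", "A", 9)]
    print(keep_mid_range_edges(edges, sums))  # only B's outgoing edge survives
```

In this sketch, as with any edge-based filter, a function outside the range can still appear as the head of a kept edge.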

And the PNG is informative. Although the image is enormous, the important information leaps out immediately. Floating high up are five expensive functions, the fifth of which I had never noticed: XPCThrower::ThrowBadResult. Hmm... I quickly put together a graph of the pre-patch results and confirmed that it wasn't in the top slots there. Doing some basic math, this one function and the calls made from it account for about 60% of the current regression, assuming that I'm reading the numbers right. Who said throwing exceptions was cheap?

Anyway, my visual approach to profiling isn't complete. The graph is in plain black and white, when I should be using color and line thickness to represent how expensive operations are. I might also play around with massaging the data to highlight the exact functions where regressions occur, something I could easily do with gvpr if I had two dot graphs of the translated output. And my output filtering isn't perfect by any means. But all that comes for free in my envisioned perfect profiling extension. Oh well, at least I have something to point to for neat data.
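Color, line thickness, and before/after comparison are described here only as plans. One possible shape for them, sketched in Python rather than gvpr, assumes the per-function probe totals from the pre-patch and post-patch runs have already been extracted into two dictionaries. Each node gets a `penwidth` that grows with its post-patch total and is drawn red when its total went up. `penwidth` and `color` are standard Graphviz attributes; the function name `highlight_nodes`, the linear scaling, and the red/black rule are illustrative choices, not anything jprof or Graphviz prescribes.

```python
def highlight_nodes(before, after, max_width=8.0):
    """Emit DOT node statements comparing two profiles.

    before, after: dicts mapping function name -> total probe count.
    Outline width scales linearly from 1 up to max_width with the 'after' count.
    Functions whose count grew (a name missing from 'before' counts as 0)
    are drawn in red; all others in black.
    """
    peak = max(after.values(), default=0) or 1  # avoid dividing by zero
    lines = []
    for name in sorted(after):
        total = after[name]
        width = 1.0 + (max_width - 1.0) * total / peak
        color = "red" if total > before.get(name, 0) else "black"
        lines.append(f'"{name}" [penwidth={width:.1f}, color={color}];')
    return lines


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    before = {"A": 400, "B": 200}
    after = {"A": 400, "B": 800, "C": 800}
    print("\n".join(highlight_nodes(before, after)))
```

These statements could be appended inside the generated `digraph` before its closing brace.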


Blake said...

Joshua, have you filed a bug on XPCThrower::ThrowBadResult being slow? Throwing exceptions isn't free, but it is the sort of thing that we should be able to do pretty quickly.

Pike said...

I would have loved to see the actual output you looked at.