Tuesday, October 14, 2008

I can has pony?

Not too long ago, I was working on killing RDF in the subscribe dialog. When doing a stress test (a server containing over 180 thousand newsgroups), I noticed that the nsStringStats output at the end had gone up to somewhere in the realm of 180 thousand strings. It couldn't be coincidence, so I ran some leak logs.

Debugging leaks is never fun. Debugging a leak where the object is stored in a global holding pen for almost the entire session is aggravating, as you need to match the references across practically the entire tree. The worst part, however, was that the leaking reference came from JS. After a very long, arduous job of matching up the global references, I narrowed down the suspect to this call (190 characters of whitespace happily elided):

js_Invoke
 XPC_WN_GetterSetter(JSContext*, JSObject*, unsigned int, long*, long*)
  XPCWrappedNative::GetAttribute(XPCCallContext&)
   .L1287
    NS_InvokeByIndex_P
     nsMsgDBFolder::GetServer(nsIMsgIncomingServer**)
      nsCOMPtr::swap(nsIMsgIncomingServer*&)

So I just needed to look for the JS code that calls nsIMsgFolder.server. There's only… a bit more than 100 of those. Furthermore, the leak didn't seem to persist, so I chalked it up to some sort of GC thing and moved on. (It actually was a valid leak, in that the server was left holding a reference to a JS object that referred back to the server... but I fixed that after I realized what was going on.)

In any case, it did leave me with an extreme sense of frustration at debugging the leak. Seeing all those js_Invokes lining the stack was the worst part, as the information is so tantalizingly close, yet just out of reach. So I naturally reiterated an opinion I've had for a while: "Why not just integrate JS stacks in the stack trace?" The problem with that is that most tools have different ways of walking the stack…

But wait! There is one place in Mozilla code where there's a common way to walk the stack: nsStackWalk. Therefore, it might be possible that I could coerce that code a little to replace each js_Invoke with the information for the JS function it's actually calling. And, success! With some caveats:

  • It only works in the one nsStackWalk implementation I have access to (x86 gcc Linux).
  • It relies on being able to tell that I'm calling js_Invoke during the stack trace, i.e., the symbols must be in the same binary.
  • It doesn't do function names, only filenames and line numbers (better than nothing!).
  • It makes xpcom/base depend on js (but not xpconnect!).
  • It relies on js_Invoke's first function parameter being the JSContext *, and relies on the cdecl calling convention (i.e., the first parameter is the one right above the return address on the stack).
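
To make the last caveat concrete, here is a rough, simulated sketch of the idea; it is not the actual patch. It models each x86 cdecl frame as three words (saved frame pointer, return address, first argument) and swaps every js_Invoke entry for a file:line pair read through that first argument. Everything in it is an assumption made for illustration: FakeJSContext, Frame, Symbol, Lookup, WalkStack, DemoTrace, the addresses, and the "subscribe.js" filename are all hypothetical, and the real code would have to ask the JS engine for the script location instead of reading two fields.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for JSContext: just the location we want to print.
struct FakeJSContext {
    std::string filename;
    int line;
};

// One simulated x86 cdecl frame, as seen from its frame pointer:
// saved caller frame pointer, then return address, then first argument.
struct Frame {
    const Frame* callerFp;
    std::uintptr_t returnAddress;
    const void* firstArg;
};

// Hypothetical symbol table entry covering [start, start + size).
struct Symbol {
    std::uintptr_t start;
    std::uintptr_t size;
    std::string name;
};

// Map a program counter to a function name, or "???" if nothing matches.
std::string Lookup(const std::vector<Symbol>& table, std::uintptr_t pc) {
    for (const Symbol& s : table) {
        if (pc >= s.start && pc - s.start < s.size) return s.name;
    }
    return "???";
}

// Walk from the innermost frame outwards. A frame whose pc resolves to
// js_Invoke is reported as file:line taken from its first argument.
std::vector<std::string> WalkStack(const std::vector<Symbol>& table,
                                   std::uintptr_t pc, const Frame* fp) {
    std::vector<std::string> out;
    while (fp) {
        std::string name = Lookup(table, pc);
        if (name == "js_Invoke" && fp->firstArg) {
            const FakeJSContext* cx =
                static_cast<const FakeJSContext*>(fp->firstArg);
            out.push_back(cx->filename + ":" + std::to_string(cx->line));
        } else {
            out.push_back(name);
        }
        pc = fp->returnAddress;  // resume point inside the caller
        fp = fp->callerFp;       // move up to the caller's frame
    }
    return out;
}

// Build a three-frame fake stack: main -> js_Invoke -> GetServer.
std::vector<std::string> DemoTrace() {
    std::vector<Symbol> table = {
        {0x1000, 0x100, "main"},
        {0x2000, 0x100, "js_Invoke"},
        {0x3000, 0x100, "nsMsgDBFolder::GetServer"},
    };
    FakeJSContext cx{"subscribe.js", 42};
    Frame mainFrame{nullptr, 0, nullptr};
    Frame invokeFrame{&mainFrame, 0x1010, &cx};
    Frame getServerFrame{&invokeFrame, 0x2020, nullptr};
    return WalkStack(table, 0x3004, &getServerFrame);
}
```

Calling DemoTrace() yields three entries, innermost first: the GetServer frame, then subscribe.js:42 in place of js_Invoke, then main. The name comparison is also why the symbols caveat matters: if the walker cannot resolve a pc to js_Invoke, it has no way to know that the first argument is a context worth reading.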

It's not perfect, not even review-ready, but it's usable.

If you don't get the title, it's parodying a relatively popular internet meme. Without any pictures, though.
