Quetzalcoatal: December 2011

This feels a bit late to be talking about the 2011 LLVM Developers' Meeting (seeing as how it happened almost exactly a month ago), but since the slides have been put up over the past week and the talks were only put up on YouTube this week, I suppose I can finally back my notes up with links and not talk so abstractly.

Of the talks I went to, I think by far the most interesting one was Doug Gregor's talk on Extending Clang. It covered various extension points in Clang and some of their capabilities with simple examples. It also ended in what might be impolitely titled "Where Extending Clang Sucks" and what was politely titled "Help Wanted." This boils down to "plugins are hard to use" and "one-off rewriters are hard to write". Indeed, I think the refrain of problematic architecture for further tools was repeated in more than a few presentations: Clang has the information you want and need, but squeezing the information out is inordinately difficult.

What was probably the most popular talk was Chandler Carruth's talk on Clang MapReduce -- Automatic C++ Refactoring at Google Scale (his slides appear to not yet be posted). From what I recall, the more interesting parts of the talk are closer to the end. He had a discussion on developing a language for semantic queries to identify code that needs to be replaced (the example query was essentially "find all calls to Foo::get()"). The approaches people have tried previously are regexes (C++ is not syntactically regular; it's recursively enumerable), XPath (unfortunately, ASTs, despite their name, aren't exactly tree structures), and pattern matching ASTs (not always sufficient textual clues). The team's idea was to use a matcher library that treats predicates on "AST" values as the query language. He also spoke a bit about some of the effort they put into using Clang with Chromium.
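To make the "matchers as predicates" idea a bit more concrete, here is a minimal Python sketch of the general technique. It is not Clang's API and not the library shown in the talk: the `Node` class, the node kinds, and the helpers `kind_is`, `refers_to`, `all_of`, and `find_all` are all hypothetical names invented for illustration, and the "AST" is a toy tree built by hand.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    """Toy stand-in for an AST node (hypothetical, not a Clang type)."""
    kind: str                      # e.g. "CallExpr", "StringLiteral"
    name: str = ""                 # qualified callee name, or literal text
    children: List["Node"] = field(default_factory=list)


# A matcher is just a predicate on a node.
Matcher = Callable[[Node], bool]


def kind_is(kind: str) -> Matcher:
    """Match nodes of the given kind."""
    return lambda node: node.kind == kind


def refers_to(qualified_name: str) -> Matcher:
    """Match nodes whose name equals the given qualified name."""
    return lambda node: node.name == qualified_name


def all_of(*matchers: Matcher) -> Matcher:
    """Combine matchers: every one of them must accept the node."""
    return lambda node: all(m(node) for m in matchers)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_all(root: Node, matcher: Matcher) -> List[Node]:
    """Return every node under root that the matcher accepts."""
    return [node for node in walk(root) if matcher(node)]


# Hand-built toy tree: two real calls to Foo::get, one call to Bar::get,
# and a string literal that merely contains the text "Foo::get".
tree = Node("FunctionDecl", "main", [
    Node("CallExpr", "Foo::get"),
    Node("CallExpr", "Bar::get"),
    Node("StringLiteral", "Foo::get"),
    Node("CallExpr", "use", [Node("CallExpr", "Foo::get")]),
])

# The query "find all calls to Foo::get()" as a composed predicate.
calls_to_foo_get = all_of(kind_is("CallExpr"), refers_to("Foo::get"))
matches = find_all(tree, calls_to_foo_get)
print(len(matches))  # 2
```

The string literal is skipped even though its text is "Foo::get", which is the sort of false positive a purely textual search would produce. A real tool would of course need the parser to resolve which declaration each call refers to; this sketch just assumes that information is already stored on the node.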

There is a point brought up in Chandler's talk that I want to reiterate. One of the problems with writing refactoring tools (or static analysis tools in general) is in getting people to trust that they work. For any sufficiently large codebase—millions of lines of code or more—it is fairly certain that the project will run up against the nasty edge cases in parsing tools. If a tool is intended to run mostly automated analysis on such code, it is impossible for anyone to be able to look at the output and ensure complete correctness. However, most sufficiently large codebases come with massive testsuites to help people believe their code is correct; if the same parser is capable of producing code that passes the entire testsuite, then one only needs to trust that the analysis is done correctly, a much smaller task. Without such a capability, no one would be willing to trust the analysis; in other words, any parser which is not capable of then generating code is never going to be trusted as a basis for further work, at least, not if you are looking at million-line projects.

Speaking of Chromium and Clang, there was a talk on this too. As I am sure most of my audience is well aware, Chromium uses Clang for several of its buildbots (and all of its recent Mac development); I wish Mozilla could get Clang to be a tier 2 or tier 1 platform for Mac and Linux. As for why they prefer it, the brief rundown is this: better diagnostics, faster, smaller object sizes, they can write a style checker, and they can build better tools off of it (like AddressSanitizer, which, unsurprisingly, itself had a talk). Again, there were the complaints: building the rewriter proved to be a difficult task (notice a trend here?). Incidentally, I also attended the AddressSanitizer talk, but I'm getting tired of copying down notes of talks, so I'll let those slides and the video speak for themselves.

Finally, I did give a talk on DXR. It seemed to be well-received; I had a few people coming up to me later in the day thanking me for the talk and asking more questions about DXR. Something I did discover when giving it, though, is just how difficult giving a talk really is. I had specific notes on the slides written in my notebook, only to discover that I wasn't able to gracefully retreat to the podium and read the notes long enough to figure out what to say next. If you're wondering what I forgot to mention in the talk, it's mostly a more coherent explanation during the undead demo.

Wednesday, December 14, 2011

2011 LLVM Developers' Meeting