Monday, March 5, 2012

How to use bugpoint

One of the problems with working on compilers is that you tend not to find some bugs until the compiler gets used on real code. Unlike most test suites, real code can involve some very large files, and that is where the bugs tend to show up. As luck would have it, my debugging fun today led to two bugs being found in the same function... which is only a 200-line function that hides infinite loops in macros and is written using continuations to boot. The LLVM IR for this function, after being compiled with -O3, was 4500 lines (one block had 87 predecessors and several ϕ instructions as well). At that size, working out why my code crashes on the function is practically impossible, let alone figuring out which optimizations I need to blame for it.

Fortunately, LLVM has a tool called bugpoint which can reduce this IR to a more manageable size. Doing manual reduction for the first bug, via an iterative process of "this doesn't look necessary; cut it out", took me about an hour to produce a pile of code small enough to actually analyze. Doing it via bugpoint on the second bug took closer to 30 minutes. Unfortunately, the hard part is figuring out how to actually use the tool in the first place: none of the manuals give an example command line invocation, and they start playing games of "look at that documentation over there". So, I am going to remedy this situation by actually giving functional documentation.

In this case, I have an assert in an LLVM pass that is being triggered. This pass isn't being run as part of opt, but rather by its own tool that takes an LLVM IR file as input. So the first step is to get the IR file (clang -cc1 -emit-llvm -O3 is sufficient for my needs). After that, it's time to prepare a shell script that actually compiles the code; you can skip this step if you don't need to pass any extra arguments to the program. For example, my script would look like:

/path/to/tool "$@" -arg1 -arg2 -arg3=bleargh -o /dev/null
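When one file can trip more than one assertion (as happened to me here), a stricter wrapper can keep the reduction focused on a single bug. The following is a minimal sketch, not the script I actually used. It relies on my reading of the bugpoint documentation for -compile-command: a failing exit status means "this input still reproduces the problem", and a zero status means it does not. The function name check_input and the TOOL and PATTERN variables are hypothetical placeholders; substitute your own tool path and a piece of the assertion text you care about.

```shell
# Hypothetical helper: fails (returns 1) only when the tool's stderr
# contains the message we are chasing, and succeeds (returns 0) otherwise.
# TOOL and PATTERN are placeholders, not values from this post.
check_input() {
  # Discard the tool's stdout and capture only stderr, where assertion
  # messages are normally printed. The tool's own exit status is ignored.
  err=$("${TOOL:-/path/to/tool}" "$@" -o /dev/null 2>&1 >/dev/null) || true

  # grep -q succeeds when the expected text is present; invert that so a
  # match is reported as a failure, which is what marks the input as
  # still reproducing the bug.
  if printf '%s\n' "$err" | grep -q -- "${PATTERN:-Assertion}"; then
    return 1
  fi
  return 0
}
```

Saved in a script whose last line is check_input "$@", this takes the place of the one-line wrapper: a crash with a different message then counts as "not interesting" and the reduction should stay on the original bug.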

After that, the next step is to actually invoke bugpoint. Here's the command line that's running as I write this post: bugpoint --compile-custom -compile-command ./ io_lat4.ll. Since my program fails with an assertion, bugpoint figures out that I'm chasing a crash during code generation (it can also track down miscompilations). If all goes well, the first bit of output you get should look like the following:

Error running tool:
  ./ bugpoint-test-program.bc-FB1NoU
Text indicating your program crashed
  *** Debugging code generator crash!

If you've seen this much (you may need to wait for it to crash; it can take a long time if you are doing Debug+Asserts builds), you know that it's trying to find code that makes your tool crash. bugpoint first tries to reduce global initializers and then tries to eliminate as many functions as possible. After that, it tries eliminating basic blocks and then goes to work eliminating instructions. You will see lots of streaming output telling you what it's removing; the documentation says it's helpful to capture this output, but I've found it useless.

When everything is done, you should get several files of the form bugpoint-*.bc; the most useful one is bugpoint-reduced-simplified.bc, which is the most reduced testcase. So do you now have a nice, reduced testcase? Not quite. First, bugpoint gives you just a bitcode file (I'd prefer .ll files, simply because my first instinct is to read them to figure out what's going on). Second, it doesn't attempt to eliminate unused struct entries, nor does it bother to clean up some of its names. Take a look at this struct monstrosity: %struct.su3_matrix. = type { [3 x [3 x %struct.complex.]] }.
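The bitcode-only output is easy enough to work around with llvm-dis, LLVM's disassembler, which turns a .bc file into textual IR. Here is a small sketch that converts every bugpoint output in the current directory. It assumes llvm-dis from the same LLVM build is on your PATH; the function name disassemble_reduced and the LLVM_DIS override variable are my own inventions for illustration.

```shell
# Convert every bugpoint-*.bc in the current directory to textual IR.
# LLVM_DIS is a hypothetical override; it defaults to llvm-dis on PATH.
disassemble_reduced() {
  for bc in bugpoint-*.bc; do
    # Skip the unexpanded pattern when no bitcode files are present.
    [ -e "$bc" ] || continue
    # Write foo.bc out as foo.ll; stop at the first failure.
    "${LLVM_DIS:-llvm-dis}" "$bc" -o "${bc%.bc}.ll" || return 1
  done
}
```

Running disassemble_reduced in the directory where bugpoint finished should leave a bugpoint-reduced-simplified.ll next to the .bc file, ready to open in an editor.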

Anyways, I hope this helps anyone else who has looked at bugpoint before and wondered "How do I actually use this tool to do something useful?".
