Quetzalcoatal: Code coverage to the extreme

If all goes well, sometime tonight I will have completed 362 builds of Thunderbird, one for each day from July 24, 2008 to July 23, 2009 excluding August 3, 2008, December 25, 2008, and July 4, 2009 (more may turn up as I get more data; bonus points if you can figure out the significance of each of those days!). Included for each build is either a build log to tell me why the build failed or a test log telling me what ran. Also included is a copy of the Thunderbird code coverage data.

What, you may ask, do I intend to do with a year's worth of code coverage data? I intend to use this data to help answer some questions I have about our code coverage. Already, I've wondered about a more general overview of code coverage data (see my last post for more details). Now, I want to pose some of the following questions:

Whose code is not covered?
Who is adding code right now without making sure to cover it?
Whose tests are responsible for most improving code coverage?
How is code coverage being impacted over time?

My answers to these questions involves taking a snapshot of the code coverage data over time. That, however, proves to be a little more difficult than you'd imagine. First of all, hg doesn't support, as far as I can tell, an "update to what the repo looked like at this time" (hg up -d goes to the revision that most matches that date, not to a snapshot at that time). So I had to write a few scripts to pull out the revisions to look at. Second, gloda ruined some of this data. Fortunately, that's easy to tell due to the <1KB log files complaining about no client.mk. Then there's the issue of my revision logs containing m-c data, not m-1.9.1, so I have to hack around the client.py for Thunderbird trying to pull a different revision.

Another source of complaints was actually building and running the things. The computers I'm doing this on are all 64-bit Linux. There are a few m-c revisions that cause 64-bit to break, and libthebes and gcov just can't seem to work together on 64-bit Linux. Plus, libpango has some breaking API changes between 2.22 and 2.24. One of the XPCOM tests seems to crash and sit there with a prompt saying "Do you want to debug me?" Finally, the test plugins seem to cause massive test failure due to assertions. Not to mention that these machines don't have lcov on them and I don't have sudo privileges (so I'm not running mozmill tests yet).

In short, it's somewhat surprising to me that this actually works. Just looking at some of the build generation shows some coarse changes: between October 2008 and June 2009, the size of the compressed test log files increase 6-fold, and the compressed lcov output has nearly doubled in the same period. Lcov also reported that the coverage increased from about 20% to around 40% as well.

Sometime later, I'll hope to get mozmill tests working, as well as improving the JS code coverage to actually work for Thunderbird (it doesn't like E4X nor some of the other files for no apparent reason). Since jscoverage works by modifying the JS code, I can run that without really needing the builds (archived nightlies plus tricking the build-system will work). When all that data is collected, or sometime before, I'll make a nice little web-app that shows all of this information so people can gawp at pretty pictures.

If you want to try this on your own, here is the shell script I used to actually collect data:

#!/bin/bash

if [ -z $1 ]; then
    echo "Need a date to build"
    exit 1
fi
DATE=$1

REV=$(grep $DATE comm-revs.log | cut -d' ' -f 3)
MOZREV=$(grep $DATE moz-revs.log | cut -d' ' -f 3)

if [ -z $REV -o -z $MOZREV ]; then
    echo "Illegal date"
    exit 2
fi

echo "Updating to $REV"
hg -R src update -r "$REV"
hg -R src/mozilla update -r "$MOZREV"
pushd src
python client.py --skip-comm --skip-mozilla checkout &> ../config-$REV.log
make -f client.mk configure &> ../config-$REV.log
popd

pushd obj/mozilla
#make -C .. clean &>/dev/null
for f in $(ls config/autoconf.mk nsprpub/config/autoconf.mk js/src/config/autoconf.mk); do
    sed -e 's/-fprofile-arcs -ftest-coverage//' -e 's/-lgcov//' -i $f
done
echo "Building mozilla..."
make -j3 &> ../../build-$REV.log
popd

pushd src
echo "Building comm-central..."
make -f client.mk build &> ../build-$REV.log || exit
popd

LCOV=lcov-1.8/bin/lcov
$LCOV -z -d obj
pushd obj/
echo "Running tests..."
rm -f mozilla/dist/bin/plugins/*
make -k check &>../tests-$REV.log
make -k xpcshell-tests 2>&1 >>../tests-$REV.log
popd
$LCOV -c -d obj -o $REV.info
echo 'Done!'

Don't bother complaining to me if it doesn't work for you. I just did what I needed to do to get it to reliably work. And be prepared to wait for a few hours to collect any non-trivial number of builds. It took me about 12 hours to get 6 months worth of data using 6 different computers; the next 6 months is still going on right now.

Friday, April 2, 2010

Code coverage to the extreme

No comments: