Tuesday, May 7, 2013

Understanding the comm-central build system

Among the build systems peer, I am very much a newcomer. Despite working with Thunderbird for over 5 years, I've only grown to understand the comm-central build system in gory detail in the past year. Most of my work before then was in trying to get random projects working; understanding it more thoroughly is a result of attempting to eliminate the need for comm-central to maintain its own build system. The goal of moving our build system description to a series of moz.build files has made this task much more urgent.

At a high level, the core of the comm-central build system is not a single build system but rather three copies of the same build system. In simple terms, there's a matrix on two accesses: which repository does the configuration of code (whose config.status invokes it), and which repository does the build (whose rules.mk is used). Most code is configured and built by mozilla-central. That comm-central code which is linked into libxul is configured by mozilla-central but built by comm-central. tier_app is configured and built by comm-central. This matrix of possibilities causes interesting bugs—like the bustage caused by the XPIDLSRCS migration, or issues I'm facing working with xpcshell manifests—but it is probably possible to make all code get configured by mozilla-central and eliminate several issues for once and all.

With that in mind, here is a step-by-step look at how the amorphous monster that is the comm-central build system works:

python client.py checkout

And comm-central starts with a step that is unknown in mozilla-central. Back when everyone was in CVS, the process of building started with "check out client.mk from the server, set up your mozconfig, and then run make -f client.mk checkout." The checkout would download exactly the directories needed to build the program you were trying to build. When mozilla-central moved to Mercurial, the only external projects in the tree that Firefox used were NSPR and NSS, both of which were set up to pull from a specific revision. The decision was made to import NSPR and NSS as snapshots on a regular basis, so there was no need for the everyday user to use this feature. Thunderbird, on the other hand, pulled in the LDAP code externally, as well as mozilla-central, while SeaMonkey also pulls in the DOM inspector, Venkman, and Chatzilla as extensions. Importing a snapshot was not a tenable option for mozilla-central, as it updates at an aggressive rate, so the functionality of checkout was ported to comm-central in a replacement python fashion.

./configure [comm-central]

The general purpose of configure is to discover the build system and enable or disable components based on options specified by the user. This results in a long list of variables which is read in by the build system. Recently, I changed the script to eliminate the need to port changes from mozilla-central. Instead, this script reads in a few variables and tweaks them slightly to produce a modified command line to call mozilla-central's configure...

./configure [mozilla-central]

... which does all the hard work. There are hooks in the configure script here to run a few extra commands for comm-central's need (primarily adding a few variables and configuring LDAP). This is done by running a bit of m4 over another file and invoking that as a shell script; the m4 is largely to make it look and feel "like" autoconf. At the end of the line, this dumps out all of the variables to a file called config.status; how these get configured in the script is not interesting.

./config.status [mozilla/comm-central]

But config.status is. At this point, we enter the mozbuild world and things become very insane; failure to appreciate what goes on here is a very good way to cause extended bustage for comm-central. The mozbuild code essentially starts at a directory and exhaustively walks it to build a map of all the code. One of the tasks of comm-central's configure is to alert mozbuild to the fact that some of our files use a different build system. We, however, also carefully hide some of our files from mozbuild, so we run another copy of config.status again to add in some more files (tier_app, in short). This results in our code having two different notions of how our build system is split, and was never designed that way. Originally, mozilla-central had no knowledge of the existence of comm-central, but some changes made in the Firefox 4 timeframe suddenly required Thunderbird and SeaMonkey to link all of the mailnews code into libxul, which forced this contorted process to come about.


Now that all of the Makefiles have bee generated, building can begin. The first directory entered is the top of comm-central, which proceeds to immediately make all of mozilla-central. How mozilla-central builds itself is perhaps an interesting discussion, but not for this article. The important part is that partway through building, mozilla-central will be told to make ../mailnews (or one of the other few directories). Under recursive make, the only way to "tell" which build system is being used is by the directory that the $(DEPTH) variable is pointing to, since $(DEPTH)/config/config.mk and $(DEPTH)/config/rules/mk are the files included to get the necessary rules. Since mozbuild was informed very early on that comm-central is special, the variables it points to in comm-central are different from those in mozilla-central—and thus comm-central's files are all invariably built with the comm-central build system despite being built from mozilla-central.

However, this is not true of all files. Some of the files, like the chat directory are never mentioned to mozilla-central. Thus, after the comm-central top-level build completes building mozilla-central, it proceeds to do a full build under what it thinks is the complete build system. It is here that later hacks to get things like xpcshell tests working correctly are done. Glossed over in this discussion is the fun process of tiers and other dependency voodoo tricks for a recursive make.

The future

With all of the changes going on, this guide is going to become obsolete quickly. I'm experimenting with eliminating one of our three build system clones by making all comm-central code get configured by mozilla-central, so that mozbuild gets a truly global view of what's going on—which would help not break comm-central for things like eliminating the master xpcshell manifest, or compiling IDLs in parallel. The long-term goal, of course, is to eliminate the ersatz comm-central build system altogether, although the final setup of how that build system works out is still not fully clear, as I'm still in the phase of "get it working when I symlink everything everywhere."


sfink said...

As a totally naive observer, it seems to me that we have multiple builds that really want to be a toplevel repo that pulls in one or more other repos. I've heard that b2g is this way, for example. There are also parts of the m-c tree that would prefer to be split out and developed separately.

Is it perhaps time to switch over to that model completely? As in, do things like teaching tbpl to record a collection of rev ids, one from each repo, and have buildbot pull changes from each of those subrepos (some of which may be hg, some git, perhaps some svn/cvs/...).

If we're going to have to move in that direction for other projects, perhaps comm-central could stop being quite so special?

kairo said...

Back when I created the comm-central repository, there were basically two preconditions:
1) We were told that anything we do for SeaMonkey, Thunderbird and Sunbird/Lightning cannot need any changes to mozilla-central, so we needed to treat it as a singleton. I could sneak in a few more uses of the pre-existing MOZILLA_DIR variable that eased sharing files between comm-central and mozilla-central, but that's it.
2) Supposedly, all Mozilla applications were moving towards being based on a single and generic XULRunner, so we'd end up having this separation at some point anyhow. Or so we thought.

I tried to share as much as possible, but unfortunately that wasn't possible for a good part of the build system. In any case, the two things above explain some design choices I/we made back then. Of course, the future would show that #1 wouldn't be as strict in the long run (yay) and #2 just didn't come true - and now the whole moz.build and similar work on the m-c build system is changing the whole landscape.

Thanks for working on making things better in the new world!