Saturday, January 10, 2015

A unified history for comm-central

Several years back, Ehsan and Jeff Muizelaar attempted to build a unified history of mozilla-central across the Mercurial era and the CVS era. Their result is now used in the gecko-dev repository. While being distracted on yet another side project, I thought that I might want to do the same for comm-central. It turns out that building a unified history for comm-central makes mozilla-central look easy: mozilla-central merely had one import from CVS. In contrast, comm-central imported twice from CVS (the calendar code came later), four times from mozilla-central (once with converted history), and imported twice from Instantbird's repository (once with converted history). Three of those conversions also involved moving paths. But I've worked through all of those issues to provide a nice snapshot of the repository [1]. And since I've been frustrated by failing to find good documentation on how this sort of process went for mozilla-central, I'll provide details on the process for comm-central.

The first step and probably the hardest is getting the CVS history in DVCS form (I use hg because I'm more comfortable it, but there's effectively no difference between hg, git, or bzr here). There is a git version of mozilla's CVS tree available, but I've noticed after doing research that its last revision is about a month before the revision I need for Calendar's import. The documentation for how that repo was built is no longer on the web, although we eventually found a copy after I wrote this post on I tried doing another conversion using hg convert to get CVS tags, but that rudely blew up in my face. For now, I've filed a bug on getting an official, branchy-and-tag-filled version of this repository, while using the current lack of history as a base. Calendar people will have to suffer missing a month of history.

CVS is famously hard to convert to more modern repositories, and, as I've done my research, Mozilla's CVS looks like it uses those features which make it difficult. In particular, both the calendar CVS import and the comm-central initial CVS import used a CVS tag HG_COMM_INITIAL_IMPORT. That tagging was done, on only a small portion of the tree, twice, about two months apart. Fortunately, mailnews code was never touched on CVS trunk after the import (there appears to be one commit on calendar after the tagging), so it is probably possible to salvage a repository-wide consistent tag.

The start of my script for conversion looks like this:


set -e


# Bug 445146: m-c/editor/ui -> c-c/editor/ui

# Bug 669040: m-c/db/mork -> c-c/db/mork

# Bug 1027241, bug 611752 m-c/security/manager/ssl/** -> c-c/mailnews/mime/src/*

# Step 0: Grab the mozilla CVS history.
if [ ! -e $HGCVS ]; then
  hg clone git+ $HGCVS

Since I don't want to include the changesets useless to comm-central history, I trimmed the history by using hg convert to eliminate changesets that don't change the necessary files. Most of the files are simple directory-wide changes, but S/MIME only moved a few files over, so it requires a more complex way to grab the file list. In addition, I also replaced the % in the usernames with @ that they are used to appearing in hg. The relevant code is here:

# Step 1: Trim mozilla CVS history to include only the files we are ultimately
# interested in.
cat >$WORKDIR/convert-filemap.txt <<EOF
# Revision e4f4569d451a
include directory/xpcom
include mail
include mailnews
include other-licenses/branding/thunderbird
include suite
# Revision 7c0bfdcda673
include calendar
include other-licenses/branding/sunbird
# Revision ee719a0502491fc663bda942dcfc52c0825938d3
include editor/ui
# Revision 52efa9789800829c6f0ee6a005f83ed45a250396
include db/mork/
include db/mdb/

# Add the S/MIME import files
hg -R $MC log -r "children($MC_SMIME_IMPORT)" \
  --template "{file_dels % 'include {file}\n'}" >>$WORKDIR/convert-filemap.txt

if [ ! -e $WORKDIR/convert-authormap.txt ]; then
hg -R $HGCVS log --template "{email(author)}={sub('%', '@', email(author))}\n" \
  | sort -u > $WORKDIR/convert-authormap.txt

hg convert $HGCVS $OUTPUT --filemap convert-filemap.txt -A convert-authormap.txt

That last command provides us the subset of the CVS history that we need for unified history. Strictly speaking, I should be pulling a specific revision, but I happen to know that there's no need to (we're cloning the only head) in this case. At this point, we now need to pull in the mozilla-central changes before we pull in comm-central. Order is key; hg convert will only apply the graft points when converting the child changeset (which it does but once), and it needs the parents to exist before it can do that. We also need to ensure that the mozilla-central graft point is included before continuing, so we do that, and then pull mozilla-central:

CC_CVS_BASE=$(hg log -R $HGCVS -r 'tip' --template '{node}')
CC_CVS_BASE=$(grep $CC_CVS_BASE $OUTPUT/.hg/shamap | cut -d' ' -f2)
MC_CVS_BASE=$(hg log -R $HGCVS -r 'gitnode(215f52d06f4260fdcca797eebd78266524ea3d2c)' --template '{node}')
MC_CVS_BASE=$(grep $MC_CVS_BASE $OUTPUT/.hg/shamap | cut -d' ' -f2)

# Okay, now we need to build the map of revisions.
cat >$WORKDIR/convert-revmap.txt <<EOF
e4f4569d451a5e0d12a6aa33ebd916f979dd8faa $CC_CVS_BASE # Thunderbird / Suite
7c0bfdcda6731e77303f3c47b01736aaa93d5534 d4b728dc9da418f8d5601ed6735e9a00ac963c4e, $CC_CVS_BASE # Calendar
9b2a99adc05e53cd4010de512f50118594756650 $MC_CVS_BASE # Mozilla graft point
ee719a0502491fc663bda942dcfc52c0825938d3 78b3d6c649f71eff41fe3f486c6cc4f4b899fd35, $MC_EDITOR_IMPORT # Editor
8cdfed92867f885fda98664395236b7829947a1d 4b5da7e5d0680c6617ec743109e6efc88ca413da, e4e612fcae9d0e5181a5543ed17f705a83a3de71 # Chat

# Next, import mozilla-central revisions
  hg convert $MC $OUTPUT -r $rev --splicemap $WORKDIR/convert-revmap.txt \
    --filemap $WORKDIR/convert-filemap.txt

Some notes about all of the revision ids in the script. The splicemap requires the full 40-character SHA ids; anything less and the thing complains. I also need to specify the parents of the revisions that deleted the code for the mozilla-central import, so if you go hunting for those revisions and are surprised that they don't remove the code in question, that's why.

I mentioned complications about the merges earlier. The Mork and S/MIME import codes here moved files, so that what was db/mdb in mozilla-central became db/mork. There's no support for causing the generated splice to record these as a move, so I have to manually construct those renamings:

# We need to execute a few hg move commands due to renamings.
pushd $OUTPUT
hg update -r $(grep $MC_MORK_IMPORT .hg/shamap | cut -d' ' -f2)
(hg -R $MC log -r "children($MC_MORK_IMPORT)" \
  --template "{file_dels % 'hg mv {file} {sub(\"db/mdb\", \"db/mork\", file)}\n'}") | bash
hg commit -m 'Pseudo-changeset to move Mork files' -d '2011-08-06 17:25:21 +0200'
MC_MORK_IMPORT=$(hg log -r tip --template '{node}')

hg update -r $(grep $MC_SMIME_IMPORT .hg/shamap | cut -d' ' -f2)
(hg -R $MC log -r "children($MC_SMIME_IMPORT)" \
  --template "{file_dels % 'hg mv {file} {sub(\"security/manager/ssl\", \"mailnews/mime\", file)}\n'}") | bash
hg commit -m 'Pseudo-changeset to move S/MIME files' -d '2014-06-15 20:51:51 -0700'
MC_SMIME_IMPORT=$(hg log -r tip --template '{node}')

# Echo the new move commands to the changeset conversion map.
cat >>$WORKDIR/convert-revmap.txt <<EOF
52efa9789800829c6f0ee6a005f83ed45a250396 abfd23d7c5042bc87502506c9f34c965fb9a09d1, $MC_MORK_IMPORT # Mork
50f5b5fc3f53c680dba4f237856e530e2097adfd 97253b3cca68f1c287eb5729647ba6f9a5dab08a, $MC_SMIME_IMPORT # S/MIME

Now that we have all of the graft points defined, and all of the external code ready, we can pull comm-central and do the conversion. That's not quite it, though—when we graft the S/MIME history to the original mozilla-central history, we have a small segment of abandoned converted history. A call to hg strip removes that.

# Now, import comm-central revisions that we need
hg convert $CC $OUTPUT --splicemap $WORKDIR/convert-revmap.txt
hg strip 2f69e0a3a05a

[1] I left out one of the graft points because I just didn't want to deal with it. I'll leave it as an exercise to the reader to figure out which one it was. Hint: it's the only one I didn't know about before I searched for the archive points [2].
[2] Since I wasn't sure I knew all of the graft points, I decided to try to comb through all of the changesets to figure out who imported code. It turns out that hg log -r 'adds("**")' narrows it down nicely (1667 changesets to look at instead of 17547), and using the {file_adds} template helps winnow it down more easily.

No comments: