Monday, March 31, 2008

Pluggability

Pluggability is an area where there is often still much to explore. A fair amount of the mailnews work involves this area. The address book rewrite (bug 413260) is designed primarily to refactor the interfaces to make adding the SQLite backend (bug 382876) easier: more code assumes mork than is healthy. My current WIPs have limited this code to portions within addrbook, palmsync, and parts of import (i.e., not much at all). Not only is the SQLite backend going to benefit, but LDAP will in part as well (although I have not implemented enough of its methods), and probably the Outlook and OS X directories too.

Another large area where mailnews could use a pluggable interface is storage. In my last post, I discussed my thoughts on storage. To clarify one point: I am not against pluggable storage APIs or even implementations of most of those presented; it is just my opinion that mbox should probably be the default for lack of a better choice.
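To make "pluggable storage with a default" concrete, here is a minimal sketch in Python rather than in mailnews' own C++. Every name in it (MessageStore, InMemoryStore, STORES, open_store) is hypothetical and is not an existing mailnews interface; the in-memory store merely stands in for whatever the default backend would be.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Type


class MessageStore(ABC):
    """Hypothetical storage interface; callers never see the backend."""

    @abstractmethod
    def append(self, raw: bytes) -> int:
        """Store one message and return its key."""

    @abstractmethod
    def delete(self, key: int) -> None:
        """Remove the message stored under `key`."""

    @abstractmethod
    def messages(self) -> Iterator[bytes]:
        """Yield the stored messages in insertion order."""


class InMemoryStore(MessageStore):
    """Stand-in backend; an mbox or maildir store would plug in the same way."""

    def __init__(self) -> None:
        self._messages: Dict[int, bytes] = {}
        self._next_key = 0

    def append(self, raw: bytes) -> int:
        key = self._next_key
        self._messages[key] = raw
        self._next_key += 1
        return key

    def delete(self, key: int) -> None:
        del self._messages[key]

    def messages(self) -> Iterator[bytes]:
        return iter(list(self._messages.values()))


# Registry of available backends; new ones register themselves here.
STORES: Dict[str, Type[MessageStore]] = {"memory": InMemoryStore}
DEFAULT_STORE = "memory"


def open_store(kind: str = DEFAULT_STORE) -> MessageStore:
    """Instantiate the requested backend, falling back to the default."""
    return STORES[kind]()
```

The point of the sketch is only that the default is a registry entry, so arguing about which format should be the default does not have to block the API itself.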

A third area is the account manager. As far as I can tell, the only thing everyone can agree upon is that it does not work well as it stands. Try creating a new account type and you'll see what I mean. Killing RDF will help this in part, but the current consensus is to hold off a rewrite in this area until post-Thunderbird 3.

To keep this post short, I'll omit details in the other areas where we could use pluggability. The view pane could use it at least in part (see this site for more information). Supporting synchronization with mobile devices (several bugs exist here) would need one, if we want to go beyond palmsync.

Thursday, March 27, 2008

Mail storage

There are few things in the world that are universally agreed upon. Mail storage is not one of them: many people say that mbox is a poor format and would rather have some other form of mail storage. The suggestions I've seen include maildir (qmail-style or other), database storage, creating a false filesystem, or using IMAP and shunting the storage problem off to somebody else. Most of these have their own problems.

So why is mbox such a bad format? Supposedly, it doesn't scale. An mbox measuring a few GB causes problems because it's a large file. There's also the tricky problem of deleting: a one-byte change is cheap, and so is appending, but mid-file deletion or insertion is expensive, because everything after the change has to be rewritten.
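The asymmetry between appending and deleting can be sketched as follows. This is an illustration only, not how mailnews handles its folders: the function names are made up, the message splitting is a naive scan for "From " separator lines, and for simplicity the delete rewrites the whole file instead of only the part after the removed message.

```python
import os
import tempfile
from typing import List


def split_mbox(data: bytes) -> List[bytes]:
    """Split raw mbox bytes into messages on 'From ' separator lines."""
    messages, current = [], []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


def append_message(path: str, raw: bytes) -> int:
    """Append one message; only the new bytes are written."""
    with open(path, "ab") as f:
        return f.write(raw)


def delete_message(path: str, index: int) -> int:
    """Delete message `index`; returns how many bytes had to be written."""
    with open(path, "rb") as f:
        messages = split_mbox(f.read())
    del messages[index]
    kept = b"".join(messages)
    # Write a fresh copy next to the original, then swap it into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as f:
        f.write(kept)
    os.replace(tmp, path)
    return len(kept)
```

Appending writes a number of bytes proportional to the new message; deleting writes a number proportional to everything that is kept, which is what hurts on a multi-GB folder.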

In contrast, people sing the praises of maildir: by using one file per message, deletion is cheap. But there are hidden costs. Running stat over a directory to find new or deleted messages is relatively expensive. Also, modern filesystems attach metadata to each file. 1KB of metadata is not noticeable on a 1GB file, but 1KB of metadata for each of 50,000 files is 50MB, which can be noticeable.
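The metadata arithmetic is easy to check. The sketch below assumes decimal units (1KB = 1000 bytes) and takes the 1KB-per-file figure from the paragraph above as a round number, not as a measurement of any particular filesystem.

```python
KB = 1000
MB = 1000 * KB
GB = 1000 * MB


def metadata_overhead(file_count: int, per_file: int = 1 * KB) -> int:
    """Total bytes of per-file metadata for `file_count` files."""
    return file_count * per_file


mbox_overhead = metadata_overhead(1)          # one big file
maildir_overhead = metadata_overhead(50_000)  # one file per message

print(maildir_overhead // MB)   # 50
print(mbox_overhead / (1 * GB))  # fraction of a 1GB mbox spent on metadata
```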

Using databases for mail storage? Yes, people have suggested it (bug 361087), and one even has the gall to request it as blocking-thunderbird3 (Point of order: I would probably reject maildir as blocking and even pluggable storage APIs I would only go so far as to say wanted). The basic reason cited for doing so is that "databases... are very stable and robust." Note however that mboxes are older, more stable, and more robust in theory and probably in practice too. And scalability? Exact same problems with mbox, only slightly exacerbated (probably going to have more indexes).

The second-to-last option (false filesystem) has problems of its own. From the comments I read, it would appear to force mozilla to carry along another lib*** implementation that I suspect is ill-tested. I also suspect that no one has tried (at least very hard) to port it to Windows. I also suspect this holds the same scalability flaws (the argument for this is "individual mail storage is [not] the job of the MUA anymore," to be fair).

So where are we? The primary argument against mbox is that it scales poorly. Yet all of the other suggested replacements suffer the same problems, manifested in different ways. Echoing Churchill's comment on democracy, mbox is the worst mail storage format except for all the others. It actually has a lot going for it: it's simple and universal, more than the others can claim.

If you really want to fix scalability, there are two options. First, don't keep GB of mail. I may accumulate 100 MB of mail in a year (half of it spam, actually), but I clean my mail out at least yearly to prune conversations that are outdated. Option 2: keep your folders small. Mailing list archives start a new archive each month by default, which tends to keep each archive from getting large.

Wednesday, March 19, 2008

A blizzard of updates, part 2

Yesterday ended the second day of blizzard updates. Today I'm attending a hardware setup party, so I won't have a full day of blizzards. What I did do yesterday:

More on nsIAbCard
I now have a patch once again requesting review in this area. nsAbOSXCard drove me mad when I was going back over my changes. Something about aMember.Equals(aValue) didn't seem like it made sense; it turns out that it wasn't quite right, because the meaning of aMember had been changed ever so slightly.
nsIAbDirectory
I started looking into replacing more of nsIAddrDatabase in import, and immediately backed off. Crazy stuff happens there, so I'll need a few hours to work on that without distraction. I've also hooked up nsIAbCollection and am slowly starting to make it work right. Keyword: slowly.
nsNNTPProtocol
Getting the list of newsgroups is somewhat confusing, as it launches some callbacks to avoid hanging the UI thread. Life will be so much better when protocols go async.
libmime
I didn't spend as much time on this as I expected to, but the initial forays look promising. Its implementation is amazingly simple for having a custom C++-like format, so an automated rewrite should go smoothly. A naming scheme is rigidly enforced for the class/object hierarchy, and connecting the function implementations to the defined virtual function pointers looks trivial. Finding the inheritance list is simple, especially because it doesn't practice MI. The hardest part, though, is constructors. They may need to be done manually.
MorkReader
My biggest announcement is the one I'm saving 'till the end. I've started work on MorkReader.cpp, after spending a few hours going through the morkThumb, morkParser, and morkBuilder code to see how the most complete implementation of mork sees a file. Did you know that db/mork doesn't actually fully implement the mork specification? Also, groups.google.com appears to have spotty records of old netscape.mozilla.public.mail-news postings (I need the ones from late 1998/early 1999), and news.mozilla.org only goes back to 2003. Fun times...

And for today? A hardware party, followed by some more work on MorkReader. Hopefully with ample help from David Bienvenu.

Monday, March 17, 2008

A blizzard of updates, part 1

So it's the conclusion of the first day of my fun-filled week, and I've managed to have 3 updates today, as well as another 2 over the weekend. Here's the list:

nsIAbCard sanity
(bug 413260, pt. 1) Okay, this was done over the weekend, but still. From my previous patch, I converted some property strings to UTF-16 instead of the UTF-8 they were originally, as well as doing a bit more general cleanup in the vicinity.
nsIAbDirectory sanity
(bug 413260, pt. 2) Once again, over the weekend. Work was completed on yanking MDB-specific stuff out of everything outside the abook extended trio (addrbook, import, extensions/palmsync). In addition, I put some time into cleaning out the interface into the new stuff.
bug 400331
I finally got around to opening this back up, starting on Sunday. My list of preapproval has grown to include authentication code and newsrc. In my personal builds, it doesn't warn anymore on gcc 4.2, and I'm slowly improving on gcc 4.3 (I won't even try until I get PRInt32 converted to nsresult). Having received permission from David Bienvenu to change the code significantly, to the point where cvsblame becomes unhelpful, I've started moving functions around to make it easier to understand how stuff works. Finally, I'm trimming the size of the class (only trivially right now), and I'm slowly dismantling the horror that is SendFirstNNTPCommand. It will be a long time before I get to doing higher-level logical structure.
bug 11054
Now that all other code tripping me up has been completed, I returned to work on this for the first time in at least a month. Quickly, I discovered that Neil had added a bit of code that cascaded into über-failure and heavy database mangling (horrible for news code). Out of this (which took a few hours to get working again), I discovered that a reasonable assertion to fire becomes hard to sort out quickly when applying a trivial optimization (don't recurse into children of ignored messages, which means we can't find out how many are unapplied to correct), and that the UI code is causing future things to break in even more annoying ways. I'll have to look into fixing that breakage; I have an extremely easy way to reproduce said error.
bug 418551
(demork in panacea.dat) Mark Banner pushed his profile-directory creation changes in the middle of yesterday, allowing me to unbitrot this bug and get it working. Note that the patch that is posted needs one change to compile: change the .equals to a .Equals. I blame Java.
Bug triage.
Today I set up an Outlook/Outlook Express parity bug (bug 423488) based on some rather simple queries. Needless to say, the list is a bit long for my tastes (and I was pretty conservative about adding stuff to said list!). I've also started working on getting updates on news bugs, just as I now get updates on mailnews database bugs. Expect a mass QA-reassign shortly!

Long list for day one, isn't it? Tomorrow, I hope to be able to look into automating a rewrite of libmime, pushing out new changes to bug 413260 given Mark's updated interfaces, and starting to write that new morkreader that I need for 382876 and 11050.

One more thing: I'm starting to compile feedback on news server impls for some basic NNTP planning. I've heard back from Giganews, which has said that RFC 3977 is not on their list right now, and which also tells me that they support LIST OVERVIEW.FMT for the list of XHDR headers. INN is open source, so I can see what they have. Tornado/Typhoon/whatever the heck is the right news server I'm still waiting to hear from. I've picked these server impls because they represent the three news servers I use: news.mozilla.org, news.aioe.org, and news.verizon.net, respectively.
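For reference, the LIST OVERVIEW.FMT response is just a list of field names, one per line, terminated by a lone dot. A minimal parsing sketch, assuming the status line has already been consumed and using a sample response in the style RFC 3977 describes (the function name is made up, and the sample is not captured from any of the servers above):

```python
from typing import Iterable, List, Tuple


def parse_overview_fmt(lines: Iterable[str]) -> List[Tuple[str, bool]]:
    """Return (field, is_full) pairs from the body of a LIST OVERVIEW.FMT reply.

    Metadata items such as ':bytes' keep their leading colon; header fields
    lose their trailing colon. `is_full` is True for entries marked ':full'.
    """
    fields = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == ".":  # end of the multi-line response
            break
        if line.startswith(":"):
            fields.append((line, False))
        else:
            name, _, suffix = line.partition(":")
            fields.append((name, suffix.lower() == "full"))
    return fields


# Sample body, for illustration only.
sample = [
    "Subject:", "From:", "Date:", "Message-ID:", "References:",
    ":bytes", ":lines", "Xref:full", ".",
]
for field, is_full in parse_overview_fmt(sample):
    print(field, "(full)" if is_full else "")
```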

A fun-filled week

Having a nice long week without needing to do much else, I'll be putting in some quality time on mozilla this week. Of my seven or so tasks, I hope to get at least updated patches on all of them. No, I am not committing suicide.

Bug 413260
This is the address book rewrite, one of the to-be-core features of TB 3. Hopefully, part 1 (nsIAbCard rewrites) will be committed by the end of the week and part 2 (nsIAbDirectory rewrites) will be in review stages. The third part (mailing list sanification) should be posted at least in part. I'm not going to work on the hypothetical part 4: implement some of the functions for LDAP.
Demorkification
De-mork in panacea.dat -> msgFolderCache.sqlite will hopefully be complete and committed as well this week. After that, I'm going to start work on creating a better mork reader (nsMorkReader is insufficient to handle a .mab or .msf file), which blocks completion of bugs 382876 and 11050 (address book and message database, respectively).
Fakeserver implementation
I hope to have more flesh put on fakeserver this week, since I should have more time to actually figure out how to set up an account, which is blocking my work.
Libmime rewrite
With any luck, I should have some time to write some dehydra and elsa scripts that will profile libmime to infect it with the C++ virus. This should allow people to finally approach libmime to be able to hack it and bring it into the 21st century.
nsNNTPProtocol
To say I'm going over it with a fine-tooth comb is an understatement. I've expanded the scope of the rewrite to include whitespace updates, removal of accumulated cruft, function reordering for logical coherency, breaking up SendFirstNNTPCommand for clarity, documentation of what happens, identifying places where code should be updated, and shrinking the size of the class. I do not want to know what sizeof(nsNNTPProtocol) is right now; it's that large.
NNTP/Usenet wins
ROT-13 implementation and LIST PRETTYNAMES/LIST NEWSGROUPS are two low-risk, just-needs-UI wins. Filter-after-the-fact is a medium-risk win. Spam detection and combine-and-decode or other multipart are high-risk, high-value wins (imagine the elation of alt.binaries users or sci.math users). Bug 176238 is instructive to see the full list.
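To show why ROT-13 counts as low-risk: the transformation itself is a few lines, and the real work is the UI. A sketch in Python (not the mailnews code, which would be C++ or JS; the function name is arbitrary):

```python
import string

# Map each ASCII letter to the one 13 places further along, wrapping around.
_ROT13 = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13; everything else passes through unchanged."""
    return text.translate(_ROT13)


print(rot13("Hello, world!"))  # Uryyb, jbeyq!
```

Because 13 is half of 26, applying rot13 twice returns the original text, so one menu item can both hide and reveal a spoiler.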

Time for less yapping and more coding!