Wednesday, February 20, 2008

Mork is evil, but...

I decided today to finally start poking around the Mork reader code that was introduced to import history data, since the plan is to use it to migrate from mork to SQLite. I had read enough of it earlier to know that I would have to look into having morkreader handle multiple tables, but what I saw was just astounding. You see, morkreader is essentially a 580-line hack.

Don't get me wrong, I have nothing against hacks. My patch for fixing searching in base64-encoded messages was quite hacky as well. But a few things justify the hack. First, the legacy code needed some pretty severe refactoring to handle the recursive nature of MIME. Second, the improperly-handled cases should be rather rare in nature: the simplest way to generate a case is to forward-as-attached a message with a base64-encoded attachment. (I concede: a third reason is that I wrote it, but that pales in importance to the other two, I swear...) Morkreader, however, did not need to work around crufty legacy APIs, nor are its improperly-handled components uncommon.

The first assumption that morkreader makes is that there is only one table (mailnews loves having several tables). Adding in support for multiple tables would not be too difficult if the parser weren't already broken in other ways. The second assumption it makes is that no line is longer than 80 characters (which should be safe) and that no line is continued more than once (i.e., no more than 160 characters)... which is complete BS, as anyone who has ever subscribed to a mailing list can recognize (think of the References: header). Finally, the code will not handle aborted changesets properly; how common those are, I cannot determine. (Does mailnews code ever use the ! change type?)
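To make the continuation problem concrete, here is a minimal sketch (in Python rather than morkreader's C++, and not the actual parser) of joining a line with any number of continuations instead of at most one. It assumes that a trailing backslash marks a continued line; the function name and the sample data in the usage note are made up for illustration.

```python
def join_continued_lines(lines, marker="\\"):
    """Merge physical lines into logical lines.

    Assumption: a line that ends with `marker` continues on the next line.
    Unlike a parser that allows a single continuation, this follows as
    many continuations as the input contains.
    """
    logical = []
    pending = []  # pieces of the logical line currently being assembled
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith(marker):
            # Drop the marker and keep collecting pieces.
            pending.append(line[:-len(marker)])
        else:
            pending.append(line)
            logical.append("".join(pending))
            pending = []
    if pending:
        # Input ended in the middle of a continuation; keep what we have.
        logical.append("".join(pending))
    return logical
```

Feeding it three physical lines, the first two ending in a backslash, yields one logical line. The sketch deliberately ignores the case of an escaped backslash at the end of a line, which a real parser would have to tell apart from a continuation.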

So, to assess the accuracy of morkreader, I read the mork specification. Cross-referencing with some mork files of mine (a FF 2 history.dat, an abook.mab, and an inbox msf file), I discovered that the specification itself is inaccurate. Once again, two failings here: (atomScope=c) should be (a=c), and the spec implies that -[...] is the proper way to remove a row, whereas [-...] is the actual method. A subtle statement in the spec says that a + can be omitted from changesets.

I had to fix mork to get my first patch for bug 413260 working, so it looks like I'll have to fix morkreader as well. Oh well, magic will happen if Friday is as predicted...

Sunday, February 17, 2008

More on rewrites

No one can deny that mailnews needs some rewrites pretty badly. The address book is getting an overhaul right now. Message databases are planned to have a second overhaul soon; thoughts are starting to fly around for an account manager rewrite as well. RSS gets one too. Compose, MIME, and news code all need rewrites. Obviously, most are going to miss TB 3. Address book looks set to make it; ditto with kill-RDF; RSS will also probably slide in. Everyone else gets to wait for TB 4, or even later.

As I have mentioned before, I am in the midst of rewriting address book. The ultimate goal is to replace mork with mozStorage. But the interfaces are a large barrier to implementing this, so bug 382876 is blocked by bug 413260. No sane person would put all of the changes into one patch, though; there are just too many. I therefore expect bug 413260 to have three or four patches, each fixing up one part of the story. And these are not going to be small by any stretch of the imagination: the first part alone is -2000/+1000 lines of code. And all that does is modify nsIAbCard.

Second and third in bug 413260 involve two more interfaces. The second part will be to implement the new nsIAbDirectory, which will involve cleaning up usages of nsIAbMDBDirectory and nsIAddrDatabase. I expect that to end up with 1000 lines of changes at least. The third part is to clean up the mailing list mess; this change is, in my opinion, the most important change of the interface setup. Finally is the maybe-fourth part, implementing the refactored changes into LDAP code.

After getting three or four large patches for bug 413260 committed comes the large patch for bug 382876, which needs some modifications to morkreader as well (which I hopefully won't have to write!). Finishing that allows me to start on message databases. It looks as if some of the ideas surrounding bug 11050 won't be touched until TB 4 simply for the sake of not overloading people with so many rewrites in such a short time.

Finally comes the other slew of rewrites. jminta is so kindly doing kill-RDF. Other people are working on the RSS changes; I haven't used RSS on my Trunk builds yet, so I can't evaluate any changes since 2.0, nor will I likely do so for some time. Rewriting news code is a nice distraction when I'm frustrated at other code; however, I am waiting for permission to really axe large chunks of it before I do serious work on it. Compose and MIME get no love at the moment. And the account manager has to wait for agreement before it gets its rewrite: the most people can agree on at this point is that "it needs to change." And so life continues...

Monday, February 11, 2008

Anatomy of a Refactoring

The first part of bug 413260, refactoring nsIAbCard, is finally starting the review process, freeing me up to start on part two, nsIAbMDBDirectory. The goal here is to remove this heavily-used interface. For those of you who are only being introduced to large-scale refactorings, here is a simple step-by-step guide for refactoring.

  1. Pray that only a little JavaScript is involved. As much as people fall in love with JavaScript, I greatly prefer C++. Cases like this prove why: C++ complains at compile time when something is wrong; in JavaScript, these simple problems are deferred until actual execution. Sometimes, these problems crop up in the most out-of-reach places: one usage of nsIAbCard, unfound by grep, exists in msgHdrViewOverlay.js, one of the last places one would expect to find address book usages.
  2. Fire up grep and find usage characteristics. In the case of nsIAbMDBDirectory, I see that it is used outside of addrbook for two reasons: cardForEmail and to get the database. The former is simple to deal with via a minor refactoring, the latter requires some more in-depth analysis.
  3. Mark stuff as deprecated and compile. Note that gcc 4.3 or better is required to catch the most common case (nsCOMPtr stuff) and that, as of right now, my XPIDL/nscore.h patch is needed to mark IDL files as being deprecated. If you're using gcc 4.3, -Wno-conversion is highly recommended.
  4. Ensure that the tests use the new stuff. Tests are the simplest JavaScript to handle, primarily because everything is used. They also alert you to broken migration.
  5. Remove the deprecated stuff and change the other JS. Don't forget to test the crap out of it. This will by far take the longest to execute. Stuff can crop up in weird places; JavaScript analysis is on my list of things to do, but I'm looking at IDL/C++/JS+Mozilla+ctags+vim automagic first.
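
Step 2 above can be roughed out as a small script. This is only a sketch under stated assumptions: the function name, the default extension list, and the idea of grouping hits by file extension are my own choices for illustration, not part of any Mozilla tooling. Grouping by extension makes the JavaScript hits, which the compiler will never check, easy to pick out.

```python
import os
import re
from collections import defaultdict


def find_usages(root, identifier, extensions=(".cpp", ".h", ".idl", ".js", ".xul")):
    """Return {extension: [(path, line_number), ...]} for whole-word matches.

    Hypothetical helper: walks `root` and records every line that names
    `identifier` in a file with one of the given extensions.
    """
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    hits = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1]
            if ext not in extensions:
                continue
            path = os.path.join(dirpath, name)
            # Tolerate odd encodings rather than aborting the whole scan.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        hits[ext].append((path, lineno))
    return dict(hits)
```

Calling find_usages on a source checkout with "nsIAbMDBDirectory" would list the C++ and JS hits separately. It shares grep's blind spot, though: JavaScript that uses an object without ever naming the interface will not show up, which is exactly the kind of usage that only surfaces at execution.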

Tuesday, February 5, 2008

Politics and civility

There is one rule in particular I try to keep: to read over everything I get fully and carefully. With three email addresses (comprising a half-dozen mailing lists) I regularly check, two daily newspapers, one weekly news magazine, eleven newsgroups of varying daily post rates, and too many RSS feeds to even count anymore, this is a rule I break more often than I would like, despite spending well over an hour each day doing so. The frequency of my blog posting is proof enough of this—I would like to post once every two days, a feat which I have already given up on doing.

To make a long story short, this cutting back of in-depth reading has impacted one of the blogs I read, the Fact Checker for the Washington Post. I mostly skim the article and focus more on the comments these days. And similar to how I transitioned my reading of this blog over its lifetime, the comments have transitioned. Crucially, they have gotten worse as time continues.

It used to be that the comments were thoughtful and pointed out some of the factual errors. Now, the comments have turned nasty, with obvious political slants coming out. In the most recent posting (discussing the Republican candidates' repositioning on major issues), the first comment was a strongly anti-Republican remark that didn't really relate to the article. Fourth was another slamming comment, again irrelevant. Same with the 8th, 11th, 19th, 20th, 21st, and around a third of the comments in general. How many of the rest were the thoughtful, reasoned responses I saw at the beginning? A handful, although many were in response to the fringish comments posted earlier.

After shaking my head at this, I turn to one of my newsgroups, sci.math. Recently, a poster by the alias of JSH posted some stuff. This poster is not particularly well-liked in this newsgroup, having a reputation for doing shoddy mathematics, inflating claims, and ignoring objections. I am not sufficiently well-versed in the relevant fields to know the correctness of his mathematics (it looks suspect to me, but that doesn't count for much), but I do know that his refusal to attempt to factor an RSA number with his factoring algorithm casts suspicion on its correctness, and that he also did not reply to some of my requests for clarification.

With respect to this poster, I once awaited his posts, not because I was fascinated by the mathematics, but because they usually had some measure of debate to go with them. I found the posts on his return disappointing for much the same reason that I was irritated at the comments on the earlier blog. These debates had grown uninteresting. JSH was pontificating without responding, and other people just vehemently skewered him without remorse, as if their entire lives revolved around insulting him as much as possible.

Which brings me, albeit in a roundabout manner (an endemic problem of mine), to my point. It seems that the world at large has grown unable to speak civilly. I have always tried to keep my postings as civil as possible, but it seems that in many replies I look at, the poster made no such attempt. The most egregious violation of civility is in the political arena. Take a group of Democrats with only moderately-held beliefs and a group of Republicans with similarly moderately-held beliefs, and the resulting confrontation will shortly become a physical one without outside intervention. It seems to me that something about politics today has driven people to untenable extremes and is in part the cause of the lack of compromise in today's political world. I just can't see something like the Compromise of 1850 (which staved the American Civil War off for a decade) happening today...

Saturday, February 2, 2008

Changes to come

For the past few weeks, my main work has been involved with the address book rewrite. And a long job that has been: the patch touches over 30 files with a diff measuring some 1800 or so lines removed and about 1000 or so added (total savings seems to be in the 700's). And blimey, I've only modified one interface (to be fair, it is the most used interface...). Still, it isn't finished: import and palmsync almost undoubtedly break with this patch; LDAP may as well. Finally, it was recently bitrotted by another patch (given the scope of changes, bitrotting was likely to begin with).

My work was, however, sped along by another change (this one in the pipelines already). I added a deprecated attribute to XPIDL that allows me to mark a function as deprecated, rebuild, and see who uses that function. It is not reified to JS though (making JS usages as annoying as ever to work with). gcc 4.2 has a problem that makes this useless for virtual functions (essentially making it worthless for XPIDL); gcc 4.3 fixes this, but it is considerably more noisy in warnings and doesn't like linking with gcc 4.2 code. Go figure.

Change #3 is still in my conception pipeline. This one is to make the make alltags target a tad bit more correct. My idea is to only pipe dist/include and dist/idl into ctags (separately, though). The problem here is that XPIDL functions are typically declared with NS_DECL_NSIABITEM, for example, making the tags useless when I need to find the definition of functions. Then there is the other problem: I more often want to go to definitions than declarations. Ramping up my configuration magic in vim may be the way to go here.

Those were all things that I have worked on so far. Now comes the stuff that I plan to work on. The first on my list (not necessarily the first I will work on) goes back to the account manager. As anyone who has been reading recently should have discovered by now, the account manager is the source of a fair number of complaints. Between the use of RDF and some confusing UI (especially with regards to RSS), it desperately needs an overhaul. Another problem in the account manager is somewhat difficult to see. It is the server manager as well; trying to use a server without creating an account is impossible, and even with the account, it is difficult.

Number 2 of this second class is involved with filters. Recently, I came across SIEVE, and decided to look into it. In short, it is a specification for a mail filter language. Since it is a series of RFCs (with some draft RFCs including discussion with mail servers), it would probably be supported elsewhere. This conceptual idea is to use Sieve as the filtering backend, which may fix some problems and would definitely open up a few new questions.

Several more things weigh in on my pontification list that I have already mentioned. I've started collecting a list of mailing lists for my webscrape idea; I also have two forums lined up for testing purposes. Continuing work on redesigning my blog is a given. De-morkification I've covered in my recent posts, and my work on news filter overhauls is still stymied on bug 16913 going through. Ah well, they'll go through in time...