<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5947958124349996271</id><updated>2012-01-11T08:57:31.987-05:00</updated><category term='mozila'/><category term='abrewrite'/><category term='jshydra'/><category term='clang'/><category term='listarchive'/><category term='visualization'/><category term='mozilla mailnews accttype'/><category term='webscraper'/><category term='mork'/><category term='news'/><category term='politics'/><category term='llvm'/><category term='accttype'/><category term='ablation'/><category term='pork'/><category term='camping'/><category term='mozilla'/><category term='mailnews'/><category term='bug413260'/><category term='dxr'/><category term='blizzard'/><category term='codecoverage'/><category term='profiling'/><category term='opinions'/><title type='text'>Quetzalcoatal</title><subtitle type='html'>Random musings about life not at all related to misspelled Aztec gods.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default?start-index=101&amp;max-results=100'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>109</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-2155885471449484499</id><published>2012-01-10T10:19:00.000-05:00</published><updated>2012-01-10T21:59:33.361-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>How bugs get fixed</title><content type='html'>Recently, I have had the &lt;del&gt;misfortune&lt;/del&gt;&lt;ins&gt;opportunity&lt;/ins&gt; to fix &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=695309"&gt;bug 695309&lt;/a&gt;. This bug is perhaps a good exemplar of why "obvious" bugs take weeks or months to fix; it is also a good example of why merely reporting that a bug happens to you is not enough to get it fixed. In the hope that others might find this useful, I will explain, in a fake liveblogging format, how I fixed the bug (fake because I'm writing most of these entries well after they happened).
&lt;/p&gt;&lt;h5&gt;October 20&lt;/h5&gt;&lt;p&gt;
There's a bug titled "Thunderbird sometimes marks entire newsgroups as unread" in my email. Time to open up the bug report&amp;hellip; reported in version 8&amp;hellip; I'm pretty sure I've seen this once or twice before, so I don't think it's just a "the news server borked itself" issue. Time to file it away until I have more time.
&lt;/p&gt;&lt;h5&gt;&lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=695309#c10"&gt;November 12&lt;/a&gt;&lt;/h5&gt;&lt;p&gt;
Another comment reminded me that I have this tab open. I've definitely seen it a few times, but I need to remember to keep Thunderbird open with logging enabled to figure out what's going on. It's reported as a regression in version 8, when I checked in a slew of the &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=226890"&gt;NNTP URI parsing patches&lt;/a&gt;, so those seem like a probable place to look. The question is why URI parsing would cause an intermittent problem instead of a constant one.
&lt;/p&gt;&lt;h5&gt;&lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=695309#c32"&gt;December 3&lt;/a&gt;&lt;/h5&gt;&lt;p&gt;
Some NNTP logs. Nothing is obviously amiss (not terribly unexpected, since logging either tends to drown you in useless information or omit the things you really want to know). &lt;i&gt;Note after the fact: the NNTP logs in fact contain the smoking gun; it's just that the poster trimmed it out.&lt;/i&gt;
&lt;/p&gt;&lt;h5&gt;December 6&lt;/h5&gt;&lt;p&gt;
Bienvenu comments that no developer has seen the problem; this isn't true, as I have seen it (I had just taken to avoiding it as much as possible). At some point, I had figured out that the issue could be avoided by shutting down Thunderbird before putting my laptop to sleep. After being prodded over IRC, I finally sat down and attempted to debug it. The working hypothesis is that the newsrc files are getting corrupted, so I faithfully save several copies of the newsrc for confirmation. The msf files are also generally good candidates, but since other flags are untouched, they are highly unlikely to be the culprit in this bug.
&lt;/p&gt;&lt;p&gt;
I successfully trigger the bug once. The results are mixed: the newsrc didn't get trashed as expected at first, but later it did. However, there is a brief window of time after the bug happens during which you can undo it by shutting down Thunderbird. &lt;i&gt;Knowing what I now know, this is because the newsrc file is written out on a timer, so the newly-all-unread messages wouldn't have been saved to disk yet.&lt;/i&gt; Since I always think of what to log after the fact, I try to trigger it a few more times.
&lt;/p&gt;&lt;p&gt;
None of the later tests are as successful as the first one. It does start to dawn on me that the bug probably has two parts: a first step that puts the group into a pre-bug state, and a second step that actually produces the symptoms. &lt;i&gt;Omniscient addendum: the first step is where the bug happens; the second step is just the delay before the symptoms appear.&lt;/i&gt; &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=695309#c39"&gt;I report my findings&lt;/a&gt;, and subsequent comments all but confirm my hypothesis about one necessary ingredient.
&lt;/p&gt;&lt;h5&gt;December 12&lt;/h5&gt;&lt;p&gt;
This bug seems to happen for me only when I don't want it to. I had thought of more things to test earlier and had them ready to copy-paste into my error console to try to find a smoking gun. This test confirms that the fault originates with the newsrcLine (so something reset it); more investigation turns up only one truly likely scenario that could cause this. At this point, I am all but convinced that the bug happens in two parts. All of the debugging needs to focus on when the first part happens; the symptoms (everything being marked unread) are a result of Thunderbird attempting to repair damage that had already been done. Since most people will gather information when they see the symptoms, I'm probably not going to get anything useful from them.
&lt;/p&gt;&lt;h5&gt;December 13&lt;/h5&gt;&lt;p&gt;
Hey, I found it again. Tests confirm that the high-water mark was first set to 0. This means either we have memory corruption or we are initially setting the value incorrectly. Looking at the code, this is set from a call to &lt;tt&gt;sscanf&lt;/tt&gt;. A simple test shows that the value can end up as 0 if &lt;tt&gt;sscanf&lt;/tt&gt; doesn't find a number to scan. Now all I need to do is figure out what network traffic can cause this. Time to try to trigger it with xpcshell tests. And then get frustrated because I still can't do it reliably.
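&lt;/p&gt;&lt;p&gt;
The &lt;tt&gt;sscanf&lt;/tt&gt; behavior is easy to demonstrate in isolation. The sketch below is hypothetical: the helper name, the format string, and the sample lines are illustrative assumptions, not Thunderbird's actual parsing code. It relies on two standard facts. First, a conversion that fails to match leaves its argument untouched, so a pre-initialized 0 survives. Second, the return value counts only the conversions that succeeded, which gives a cheap way to notice that a line did not contain the expected numbers.
&lt;/p&gt;&lt;pre&gt;
```cpp
#include "cstdio"

// Hypothetical helper, loosely modeled on parsing a "211 count first last
// group" reply. Not Thunderbird's code; the names are illustrative only.
// Returns true only if all three numbers were actually scanned.
bool ParseGroupCounts(const char *line, long counts[3]) {
  // Pre-initialize: a failed conversion leaves these values untouched.
  counts[0] = counts[1] = counts[2] = 0;
  // %*d skips the status code without storing it (and without counting
  // toward the return value), so a well-formed reply yields 3.
  int converted = std::sscanf(line, "%*d %ld %ld %ld",
                              counts, counts + 1, counts + 2);
  return converted == 3;
}
```
&lt;/pre&gt;&lt;p&gt;
Given a made-up group reply such as &lt;tt&gt;211 1234 3000000 3001233 mozilla.dev.platform&lt;/tt&gt;, the helper returns true and the last value is 3001233. Given a made-up greeting such as &lt;tt&gt;200 news.example.com ready&lt;/tt&gt;, it returns false and every value is still 0. A caller that ignored the return value would go on to use that 0 as if it were a real high-water mark.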
&lt;/p&gt;&lt;h5&gt;December 18&lt;/h5&gt;&lt;p&gt;
With Wireshark (logging on Windows is rather annoying), I can finally track down the network traffic of interest. It turns out that we are off by one in handling the server's responses. This really should be enough to get xpcshell to show the error. However, it also means I should probably give up and go back to the NNTP logs to catch the error: it is painfully obvious that the internal state is futzed up. This also means that various other newly reported issues (more frequent auth failures, alerts like "Not in a newsgroup", etc.) are pretty much confirmed to be the same bug.
&lt;/p&gt;&lt;h5&gt;January 9&lt;/h5&gt;&lt;p&gt;
I finally buckled down and turned my very old hackish code for NSPR log management into &lt;a href="https://addons.mozilla.org/en-US/thunderbird/addon/loghelper/"&gt;an extension&lt;/a&gt;. This means that I don't have to futz with environment variables on Windows, and it also gives me an easy way to trim down my log size (since, unlike with environment variables, I can stop logging). I test and finally get the NNTP log of interest.
&lt;/p&gt;&lt;p&gt;
Now that I have a log, I can try to work backwards and figure out how this bug happens. The failure is easily ascribed to the code somehow believing the socket is already open at the moment it opens the socket; I have known this much for almost a month (Wireshark told me). What the NNTP log gives me is a finer-grained way to trace the code after the fact and figure out where the bug is.
&lt;/p&gt;&lt;h6&gt;Working backwards from the log&lt;/h6&gt;&lt;p&gt;
NNTP logs helpfully record the address of the NNTP protocol object when emitting output, so the best place to start is that object's first line. Since the addresses are hard to read, I replace them with a more obvious string like "evil." The next state was set to &lt;tt&gt;SEND_FIRST_NNTP_COMMAND&lt;/tt&gt;, when it clearly should have been &lt;tt&gt;NNTP_LOGIN_RESPONSE&lt;/tt&gt;, so let's see why the login state might be skipped. &lt;tt&gt;NNTP_LOGIN_RESPONSE&lt;/tt&gt; is set if &lt;tt&gt;m_socketIsOpen&lt;/tt&gt; is false, so perhaps something opened the socket (and set that flag) before this point.
&lt;/p&gt;&lt;p&gt;
This variable is set only by &lt;tt&gt;nsMsgProtocol&lt;/tt&gt;, specifically its &lt;tt&gt;LoadUrl&lt;/tt&gt; method. So who might call that? Still nothing out of the ordinary is popping out at me, so it's time to return to the logs (a particularly astute developer might be able to find the bug without returning to the log here, but I doubt most people would have the final flash of insight yet).
&lt;/p&gt;&lt;p&gt;
To figure out the problem, it's important (to me, at least) to lay out the time frames. The natural life cycle of a connection is to be created, to connect to the server, and then to go through several iterations of running URLs. Is this bad connection a new one or an old one (since I started the log after TB started up, I can't assume that it is new)? Thunderbird stupidly (but helpfully, in this case) tells me when a connection is created, since the constructor logs a "creating" message. That message doesn't appear, so the connection must be a reused one.
&lt;/p&gt;&lt;p&gt;
Wait, that message is there, tucked just underneath the lines that tell me &lt;tt&gt;LoadUrl&lt;/tt&gt; is run. As I am often wont to do, I announce my revelation to IRC: "&amp;lt; jcranmer&amp;gt; bienvenu: please slap me". The &lt;a href="http://hg.mozilla.org/comm-central/file/b3944d5783fc/mailnews/news/src/nsNNTPProtocol.cpp#l306"&gt;code for the constructor&lt;/a&gt; is pretty simple, but there is one little &lt;a href="http://hg.mozilla.org/comm-central/file/b3944d5783fc/mailnews/news/src/nsNNTPProtocol.cpp#l532"&gt;function call&lt;/a&gt; that ends up causing the problem. If I tell you &lt;a href="http://hg.mozilla.org/comm-central/rev/b3944d5783fc"&gt;the exact revision&lt;/a&gt; that caused the regression (specifically, the first hunk of the patch), can you find the regression? &lt;a href="javascript:alert('m_nntpServer is now initialized to a non-NULL value, causing the check in SetIsBusy to succeed.')"&gt;Click me if you can't&lt;/a&gt; 
&lt;/p&gt;&lt;h5&gt;Reproducing the bug&lt;/h5&gt;&lt;p&gt;
Now that I know the exact sequence of events to cause the bug, I can write a test to reliably reproduce the bug. I can also explain why the symptoms occur. First, a connection's socket dies, but it doesn't fail the timeout check in mailnews code, so it remains in the cache. Next, the server schedules a URL on the connection, which promptly dies. If there are many more URLs to be run (when we're doing, say, a timed update), then we have several elements in the queue waiting to be processed. Now, we attempt to run a new URL; with no connections available in the cache, we now create a new connection and run the new URL on that. Since the queue is not empty, the &lt;tt&gt;SetIsBusy(false)&lt;/tt&gt; call ends up pulling a queued URL and setting up the state (including opening up the connection) and running that. Then the constructor finishes, and the new URL for which the connection was constructed is used to initialize it. Since the connection is by now open, the code moves straight to the run-my-URL part of the state machine, which misinterprets the login response as the answer to the command it just sent. This ends up leading to all of the various badness subsequently observed.
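&lt;/p&gt;&lt;p&gt;
The chain of events above is intricate enough that a toy model may help. This is a deliberately simplified, hypothetical sketch, not the real &lt;tt&gt;nsNNTPProtocol&lt;/tt&gt; code. The class and its members only echo the roles of &lt;tt&gt;SetIsBusy&lt;/tt&gt;, &lt;tt&gt;LoadUrl&lt;/tt&gt; and &lt;tt&gt;m_socketIsOpen&lt;/tt&gt;. A server is reduced to a count of queued URLs and a socket to a boolean. The invented &lt;tt&gt;earlyServerInit&lt;/tt&gt; flag stands in for the regression: whether the server is already known when the constructor calls &lt;tt&gt;SetIsBusy(false)&lt;/tt&gt;.
&lt;/p&gt;&lt;pre&gt;
```cpp
// Hypothetical toy model of the sequence described above; NOT the real
// nsNNTPProtocol code. A server is reduced to a counter of queued URLs
// and a socket to a boolean.

enum class NextState { Idle, LoginResponse, SendFirstCommand };

class Connection {
 public:
  // earlyServerInit == true mimics the regression: the server (here, the
  // queue pointer) is already known when the constructor calls
  // SetIsBusy(false).
  Connection(int *pendingUrls, bool earlyServerInit) {
    if (earlyServerInit) pending_ = pendingUrls;
    SetIsBusy(false);  // harmless while pending_ is null or the queue is empty
    pending_ = pendingUrls;
  }

  // Marking the connection idle lets it pull the next queued URL.
  void SetIsBusy(bool busy) {
    if (busy) return;
    if (pending_ == nullptr) return;  // old ordering: no server yet
    if (*pending_ == 0) return;       // usual case: nothing queued
    --*pending_;
    LoadUrl();  // runs a queued URL while still inside the constructor
  }

  // A freshly opened socket is first greeted by the server, so the state
  // machine must expect a login response; an already-open socket is
  // assumed to be past that point.
  void LoadUrl() {
    if (!socketIsOpen_) {
      socketIsOpen_ = true;
      next_ = NextState::LoginResponse;
    } else {
      next_ = NextState::SendFirstCommand;
    }
  }

  NextState next() const { return next_; }

 private:
  int *pending_ = nullptr;
  bool socketIsOpen_ = false;
  NextState next_ = NextState::Idle;
};
```
&lt;/pre&gt;&lt;p&gt;
In this model, suppose a connection is constructed with a non-empty queue and &lt;tt&gt;earlyServerInit&lt;/tt&gt; set. The constructor then opens the socket for a queued URL. The caller's own &lt;tt&gt;LoadUrl()&lt;/tt&gt; therefore leaves the state at &lt;tt&gt;SendFirstCommand&lt;/tt&gt;, even though no greeting has been read. With the old ordering, or with an empty queue (the common case), the state stays &lt;tt&gt;LoginResponse&lt;/tt&gt;. This mirrors why the real bug needs both ingredients.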
&lt;/p&gt;&lt;p&gt;
The description of these events is pretty complicated. A condensed version is included in the test I constructed to get this to happen, which is called, fittingly enough, &lt;tt&gt;test_bug695309.js&lt;/tt&gt;. I am not going to try to come up with a better name, as I think most people would agree that this is where naming tests after bug numbers is most desirable. Naturally, the code to actually trigger this is complicated: I started with code to trigger just the unread effect, and then switched to monitoring the high-water mark instead (one fewer download of messages). I need to fill up the connection cache and then figure out how to kill the connections (restarting the server seems to do an adequate job). After that, I need to run enough URLs to fill the URL queue, and then trigger a load of the folder in question at precisely the right moment. All of this requires paying careful attention to what happens synchronously and what does not, and it all relies on knowing the precise steps of what will happen as a result of each action. Any of a small number of changes I could make in the future is probably going to "break" the test, in that it will cease to test anything meaningful.
&lt;/p&gt;&lt;h5&gt;Postmortem on fixing it&lt;/h5&gt;&lt;p&gt;
I'm sure some people might look at this bug and ask several questions. Let me try to anticipate some of them and answer them.
&lt;/p&gt;&lt;dl&gt;
&lt;dt&gt;Why did the reviewers fail to notice this change?&lt;/dt&gt;
&lt;dd&gt;This bug is the result of a confluence of several changes. In the beginning, &lt;tt&gt;SetIsBusy&lt;/tt&gt; was mostly a no-op, so calling it from the constructor was safe. Then the URL queue was added, and &lt;tt&gt;SetIsBusy&lt;/tt&gt; was used to indicate that the connection was ready to receive a new URL from the queue. This turned it into a time bomb, since the code was correct only because the server was null in the constructor and the queue should have been empty during connection construction anyway. The final change was moving the initialization of the server, which triggered the time bomb. But even then, it triggered the bomb only in a rather abnormal circumstance. At no time would the code have failed any automated tests or any of the "does your patch work" tests that a reviewer normally goes through: robustness during connection interruption is both sufficiently difficult to test and rare enough that it generally doesn't get tested.&lt;/dd&gt;
&lt;dt&gt;Why did this bug take so long to track down?&lt;/dt&gt;
&lt;dd&gt;The delay since December 1 is truly a result of a lack of time: I have had maybe one good week to work on anything due to outside factors. There are two, maybe three, people actively participating in Mozilla who know this code well, and of those, only one was able to somewhat reliably reproduce it (specifically, the one with the least amount of time on his hands). I could have relied on others to gather the information I needed, but my previous efforts in QA have taught me that getting people to give you necessary information is often very difficult, especially when the information collectable at the obvious point in time is completely useless. I realize in retrospect that there were some more technically inclined people in the bug (though by the time they appeared, I was already devoting as much attention to the bug as I felt I could spare). The bug is also particularly pernicious in that the most valuable information is gathered before any visible symptoms occur; only after I figured out how to tell that the bug was about to occur could I track it down further.&lt;/dd&gt;
&lt;dt&gt;Will the fix be backported to more stable branches?&lt;/dt&gt;
&lt;dd&gt;At this stage, I am looking at two fixes: the small one-liner that fixes the specific bug, and the larger patch which fixes the larger issue of "the NNTP URL connection queue is a ticking time bomb". The former I would definitely like to see backported to aurora and beta (and possibly release, if there are plans for another one before the next uplift), and I can't see any reason a small patch for the most highly voted TB bug filed since 01-01-2011 would get rejected (it is the newest bug to have more than 5 votes, and the runner-up has a third fewer votes and is almost twice as old).&lt;/dd&gt;
&lt;dt&gt;Why is this bug so hard to reproduce?&lt;/dt&gt;
&lt;dd&gt;If you think about the cause of the bug, it is a check that passes in a function that is always called. It therefore seems like this bug should be triggered the majority of the time, not an unpredictable minority of it. As I alluded to earlier, the problem is that this also has to interact with a buggy pending URL queue management system. Most of the time, this queue will be empty during the initial, buggy call; only with unlucky timing (less likely to happen on most developers' computers) will the queue be non-empty and cause the problem. Indeed, when crafting my test for reproduction, I discovered that letting the event loop spin in one particular location (after killing connections) would fail to trigger the bug. In the real world, that is one of the places where the event loop is most likely to spin the longest.&lt;/dd&gt;
&lt;/dl&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-2155885471449484499?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/2155885471449484499/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=2155885471449484499' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/2155885471449484499'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/2155885471449484499'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2012/01/how-bugs-get-fixed.html' title='How bugs get fixed'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-3742387881720792113</id><published>2011-12-14T23:34:00.003-05:00</published><updated>2011-12-16T21:30:33.128-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='clang'/><category scheme='http://www.blogger.com/atom/ns#' term='llvm'/><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><title type='text'>2011 LLVM Developers' Meeting</title><content type='html'>This feels a bit late to be talking about the &lt;a href="http://llvm.org/devmtg/2011-11/"&gt;2011 LLVM Developers' Meeting&lt;/a&gt; (seeing as how it happened almost exactly a month ago), but since the slides have been put up over the past week and the talks were only put up on 
&lt;a href="http://www.youtube.com/playlist?list=PL970A5BD02C11F80C"&gt;YouTube&lt;/a&gt; this week, I suppose I can finally back my notes up with links instead of talking so abstractly.
&lt;/p&gt;&lt;p&gt;
Of the talks I went to, I think by far the most interesting one was Doug Gregor's talk on &lt;a href="http://llvm.org/devmtg/2011-11/Gregor_ExtendingClang.pdf"&gt;Extending Clang&lt;/a&gt;. It covered various extension points in Clang and some of their capabilities with simple examples. It also ended with a section that might be impolitely titled "Where Extending Clang Sucks" and was politely titled "Help Wanted." This boils down to "plugins are hard to use" and "one-off rewriters are hard to write". Indeed, I think the refrain about problematic architecture for building further tools was repeated in more than a few presentations: Clang has the information you want and need, but squeezing the information out is inordinately difficult.
&lt;/p&gt;&lt;p&gt;
Probably the most popular talk was Chandler Carruth's talk on &lt;a href="http://www.youtube.com/watch?v=mVbDzTM21BQ&amp;list=UUwYXEwMhYcU6BiRyTi78DgQ&amp;index=6&amp;feature=plcp"&gt;Clang MapReduce&amp;mdash;Automatic C++ Refactoring at Google Scale&lt;/a&gt; (his slides appear to not yet be posted). From what I recall, the more interesting parts of the talk are closer to the end. He discussed developing a language for semantic queries to identify code that needs to be replaced (the example query was essentially "find all calls to Foo::get()"); the approaches people have tried before are regexes (C++ is not syntactically regular; it's recursively enumerable), XPath (unfortunately, ASTs, despite their name, aren't exactly tree structures), and pattern matching ASTs (not always sufficient textual clues). The team's idea was to use a matcher library on "AST" values as predicates. He also spoke a bit about their efforts on using Clang with Chromium.
&lt;/p&gt;&lt;p&gt;
There is a point brought up in Chandler's talk that I want to reiterate. One of the problems with writing refactoring tools (or static analysis tools in general) is getting people to trust that they work. For any sufficiently large codebase (millions of lines of code or more), it is almost certain that the project will run up against the nasty edge cases of parsing tools. If a tool is intended to run mostly automated analysis on such code, no one can look at the output and ensure complete correctness. However, most sufficiently large codebases come with massive test suites to help people believe their code is correct; if the same parser is capable of producing code that passes the entire test suite, then one only needs to trust that the analysis is done correctly, a much smaller task. Without such a capability, no one would be willing to trust the analysis; in other words, any parser that cannot also generate code is never going to be trusted as a basis for further work, at least not for million-line projects.
&lt;/p&gt;&lt;p&gt;
Speaking of Chromium and Clang, there was &lt;a href="http://llvm.org/devmtg/2011-11/Weber_Wennborg_UsingClangInChromium.pdf"&gt;a talk on this&lt;/a&gt; too. As I am sure most of my audience is well aware, Chromium uses Clang for several of its buildbots (and all of its recent Mac development); I wish Mozilla could get Clang to be a tier 2 or tier 1 platform for Mac and Linux. As for why they prefer it, the brief rundown is this: better diagnostics, faster compilation, smaller object sizes, the ability to write a style checker, and the ability to build better tools on top of it (like AddressSanitizer, which, unsurprisingly, itself had &lt;a href="http://llvm.org/devmtg/2011-11/Serebryany_FindingRacesMemoryErrors.pdf"&gt;a talk&lt;/a&gt;). Again, there were the complaints: building the rewriter proved to be a difficult task (notice a trend here?). Incidentally, I also attended the AddressSanitizer talk, but I'm getting tired of copying down notes of talks, so I'll let those slides and &lt;a href="http://www.youtube.com/watch?v=CPnRS1nv3_s&amp;list=UUwYXEwMhYcU6BiRyTi78DgQ&amp;index=14&amp;feature=plcp"&gt;the video&lt;/a&gt; speak for themselves.
&lt;/p&gt;&lt;p&gt;
Finally, I did give &lt;a href="http://www.youtube.com/watch?v=jsbfrx38djQ&amp;list=UUwYXEwMhYcU6BiRyTi78DgQ&amp;index=2&amp;feature=plcp"&gt;a talk on DXR&lt;/a&gt;. It seemed to be well-received; I had a few people coming up to me later in the day thanking me for the talk and asking more questions about DXR. Something I did discover when giving it, though, is just how difficult giving a talk really is. I had specific notes on the slides written in my notebook, only to discover that I wasn't able to gracefully retreat to the podium and read the notes long enough to figure out what to say next. If you're wondering what I forgot to mention in the talk, it's mostly a more coherent explanation during the undead demo.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-3742387881720792113?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/3742387881720792113/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=3742387881720792113' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3742387881720792113'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3742387881720792113'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/12/2011-llvm-developers-meeting.html' title='2011 LLVM Developers&apos; Meeting'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-2031812973498092379</id><published>2011-10-15T14:46:00.002-04:00</published><updated>2011-10-15T15:28:53.445-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>How I got involved with Mozilla</title><content type='html'>My journey to become a contributor to Mozilla started during the summer of 2007, which was when I started very heavily using Thunderbird. The initial impetus is probably best traced to &lt;a href="http://groups.google.com/group/comp.lang.java.programmer/browse_thread/thread/100ee519f0335092/13734a7fa97ff0b0"&gt;this thread in a newsgroup&lt;/a&gt;: a long, semi-flame war. I wanted to excise the flame war portions from the rest of the thread, so I filed &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=392404"&gt;bug 392404&lt;/a&gt; (the first bug I ever filed!). That bug was marked as a duplicate of &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=11054"&gt;bug 11054&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
At that time, I had a fair amount of free time (this was around the time my summer internship ended but before school started). I downloaded the source code to Thunderbird and tried to build it, only to discover that every build crashed in the linker. After bugging people about it on IRC, I learned that I needed a swap file, and I finally managed to get a successful build. Then, since I already had programming experience, I decided to just fix the bug myself. I honestly can't remember for certain, but I believe I built the original patch without much aid from people on IRC; it was only the failing build that I had needed help with.
&lt;/p&gt;&lt;p&gt;
I do remember that I needed guidance on how to get a patch committed. This being mailnews code around the time that Mozilla Messaging was being set up, the pool of potential reviewers and superreviewers was rather slim. I had been advised that David Bienvenu and Scott McGregor were my best candidates for review. Unfortunately, around this time, Scott had cut off all work with Mozilla; after two months of not getting any response from him whatsoever, I switched to another reviewer.
&lt;/p&gt;&lt;p&gt;
While I was waiting for the patch to be reviewed, I recall Dan Mosedale talking about &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=132340"&gt;bug 132340&lt;/a&gt; indirectly; later, when I asked if there was a fix that people would be interested in, that bug was what I was pointed towards. This patch took about two months and four review cycles to be accepted, but it eventually satisfied my reviewer and superreviewer and became my first contribution to Mozilla. That first patch I worked on? It turned out to be significantly more complicated than I first appreciated and finally was closed 9 months after I started work on it. After bug 132340, I started getting more and more involved with Mozilla development and have been a contributor since then.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-2031812973498092379?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/2031812973498092379/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=2031812973498092379' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/2031812973498092379'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/2031812973498092379'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/10/how-i-got-involved-with-mozilla.html' title='How I got involved with Mozilla'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-5315109489027564153</id><published>2011-09-19T22:17:00.002-04:00</published><updated>2011-09-19T22:41:00.720-04:00</updated><title type='text'>Public Service Announcement #2</title><content type='html'>A brief note to all application and operating system developers:
&lt;/p&gt;&lt;p&gt;
Failure to respond to a prompt does not indicate consent to perform the actions that would otherwise occur. Especially if those actions have a high risk of losing my data. So I would like to thank you (whichever product is responsible for my most recent undesired restart) for completely obliterating my class notes. At the very least, you did not destroy all of my research too, just a day's worth.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-5315109489027564153?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/5315109489027564153/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=5315109489027564153' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5315109489027564153'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5315109489027564153'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/09/public-service-announcement-2.html' title='Public Service Announcement #2'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-3166885380945424459</id><published>2011-08-03T23:49:00.003-04:00</published><updated>2011-08-04T03:41:13.962-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title 
type='text'>Not-so-random mozilla-central factoids</title><content type='html'>Back when I was looking at reducing disk usage of running DXR, I used the sizes of the generated CSV files to make a rough estimate of how many times the average line of mozilla-central needs to be parsed and compiled (the answer is around 20). Now, having just respun a build of mozilla-central with the newest version of DXR, I have a larger database with some fairly accurate statistics of its size. This all started when I wondered what our most common function name was, so I decided to make a list of factoids (it's also a chance to refresh my SQL memory). All of the statistics come from a build of Firefox today on Linux x86-64 with debug and tests disabled.
&lt;/p&gt;&lt;h4&gt;Notes on accuracy&lt;/h4&gt;&lt;p&gt;
DXR tries as hard as possible to correlate data back to the original source code, and it very likely gets confused in some weird circumstances. I don't yet have a good idea of where all of the buggy cases are, but I am aware of some broad strokes. Pretty much all information that deals with references is likely missing large swathes of the codebase. Information about callers only counts explicit calls (i.e., call expressions). The most accurate data I have is the macro information, and the least accurate is information involving a templated class in almost any fashion. I also suspect that scope information is malformed in a non-empty set of cases, and I'm pretty sure that the number of global objects is overcounted wherever it appears.
&lt;/p&gt;&lt;h4&gt;Type statistics&lt;/h4&gt;&lt;p&gt;
I count in mozilla-central just 33,555 distinct types. Of these, we have 13,648 typedefs, 6,523 structs, 6,470 classes, 5,147 enums, 1,410 interfaces, and 357 unions. Of all of these, 13,740 types are nested in another in some fashion, and another 4,524 types are templated. I'm not counting separate template instantiations as new types, but I am counting specializations individually.
&lt;/p&gt;&lt;p&gt;
Then there's the inheritance. I found 7,715 direct inheritance relations, all but about 200 of which are public. These relations account for 2,317 distinct base classes and 6,157 distinct subclasses. Naturally, some types are inherited much more than others. The winner, by a factor of 6, is &lt;tt&gt;nsISupports&lt;/tt&gt;, with a whopping 3,156 implementations. &lt;tt&gt;Pickle&lt;/tt&gt; and &lt;tt&gt;IPC::Message&lt;/tt&gt; tie for second place with 501 implementations each; &lt;tt&gt;nsIRunnable&lt;/tt&gt; takes 4th place with 282 implementations. 5th place goes to &lt;tt&gt;nsXPCOMCycleParticipant&lt;/tt&gt; (242). Rounding out the top 10 are &lt;tt&gt;nsIDOMElement&lt;/tt&gt; (226), &lt;tt&gt;nsRunnable&lt;/tt&gt; (224), &lt;tt&gt;nsScriptObjectTracer&lt;/tt&gt; (209), &lt;tt&gt;nsSupportsWeakReference&lt;/tt&gt; (154), and &lt;tt&gt;nsIObserver&lt;/tt&gt; with 144. Subclass relationships involving templates are not counted; including them might bump a few more classes into this list.
&lt;/p&gt;&lt;h4&gt;Macros&lt;/h4&gt;&lt;p&gt;
There are 42,457 distinct macro definitions, producing 38,045 distinct names. Of these names, 30,475 look like variables and 7,647 look like functions. That is 77 more than the number of names, which implies that 77 macros take on both forms depending on how they got defined.
&lt;/p&gt;&lt;p&gt;
How about invoking them? I count just 482,142 macro invocations, so each macro is invoked about 12 times on average. But 20,628 of our macros (about 48.6% of them) are never used, so the average for the macros that are used is closer to 25. Of course, some macros really get used. Here are the top 5:
&lt;/p&gt;&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Count&lt;/th&gt;&lt;th&gt;Macro name&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;23,109&lt;/td&gt;&lt;td&gt;nsnull&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;22,941&lt;/td&gt;&lt;td&gt;PR_FALSE&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;20,139&lt;/td&gt;&lt;td&gt;NS_OK&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;13,154&lt;/td&gt;&lt;td&gt;NS_IMETHOD&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;13,138&lt;/td&gt;&lt;td&gt;NS_IMETHODIMP&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/p&gt;&lt;h4&gt;Functions&lt;/h4&gt;&lt;p&gt;
There are 137,903 functions in mozilla-central, of which 68,867 have distinct names. Of these, I found 33,693 in the global scope and 8,943 that were members of a class. About 2,291 functions are directly templated. Just for fun, I found that there are 774 functions named exactly "Init" and a further 1,681 whose names begin with "Init" (case-insensitively in the latter case).
&lt;/p&gt;&lt;p&gt;
In terms of calling these functions, I found 291,598 distinct edges in the callgraph. This is definitely an underestimate, since I am missing a large number of cases. For my purposes, a callgraph is not a traditional directed graph but a hypergraph, where each edge goes from a single head to a set of nodes in the tail. These edges comprise 85,144 distinct callers and 67,098 distinct targets. Of the targets, 51,590 are distinct functions called statically, 13,274 are distinct virtual functions invoked, and 2,234 are distinct function pointers or pointers-to-member-functions being called. Broken down by call, 246,086 of the calls are static function calls, 41,770 are virtual function calls, and 3,742 are function pointer calls. I want to emphasize that information pertaining to templates, in particular &lt;tt&gt;nsCOMPtr&lt;/tt&gt;, is completely missing. As a result, many calls to &lt;tt&gt;nsISupports&lt;/tt&gt;'s methods are absent, which is going to horribly skew the statistics.
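&lt;/p&gt;&lt;p&gt;
To make the hypergraph idea concrete, here is a minimal Python sketch. It is not DXR's actual representation: the class &lt;tt&gt;CallHypergraph&lt;/tt&gt; and the function names in the example are hypothetical. Each call site contributes one edge from its caller to the set of functions it might reach. A static call has a one-element tail, while a virtual call's tail holds every implementation the analysis knows about.
&lt;/p&gt;&lt;pre&gt;
from collections import defaultdict


class CallHypergraph:
    """Sketch of a call hypergraph: each edge joins one caller (the head)
    to the set of functions that call site might reach (the tail)."""

    def __init__(self):
        # caller name mapped to a set of frozensets, one per distinct edge
        self.edges = defaultdict(set)

    def add_static_call(self, caller, callee):
        # A direct call has exactly one possible target.
        self.edges[caller].add(frozenset([callee]))

    def add_virtual_call(self, caller, implementations):
        # A virtual call may land in any known override, so its tail is a set.
        self.edges[caller].add(frozenset(implementations))

    def possible_callees(self, caller):
        # Union of the tails of every edge leaving this caller.
        return set().union(*self.edges.get(caller, ()))

    def stats(self):
        # (distinct edges, distinct callers, distinct targets)
        targets = set()
        for tails in self.edges.values():
            for tail in tails:
                targets.update(tail)
        n_edges = sum(len(tails) for tails in self.edges.values())
        return n_edges, len(self.edges), len(targets)


g = CallHypergraph()
g.add_static_call("main", "Init")
g.add_virtual_call("main", ["Foo::Release", "Bar::Release"])
print(g.stats())  # (2, 1, 3)
&lt;/pre&gt;&lt;p&gt;
Each tail is stored as a frozenset, so repeated calls from one caller to the same set of targets collapse into a single edge. This is one way to count distinct edges rather than individual call sites.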
&lt;/p&gt;&lt;p&gt;
Counting the function pointers, we have 65 pointers-to-member-functions and around 1,700-1,800 function pointers. Those numbers do not add up to what I should get above, but I'm not sure which count is in error. I count about 19,721 virtual functions that I generated target information for (a subset of all virtual functions), and 65,761 implementations of those virtual functions, so the average virtual function has about 3.3 implementations. In addition, I found that about 981 of these virtual functions were also called statically.
&lt;/p&gt;&lt;p&gt;
Having mentioned it earlier, I'm sure you too are now wondering what the most common function name in mozilla-central is. The answer should be pretty obvious when you consider the most heavily implemented class. And the winners are...
&lt;/p&gt;&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Count&lt;/th&gt;&lt;th&gt;Function name&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1,687&lt;/td&gt;&lt;td&gt;Release&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1,680&lt;/td&gt;&lt;td&gt;AddRef&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1,600&lt;/td&gt;&lt;td&gt;GetIID&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1,587&lt;/td&gt;&lt;td&gt;QueryInterface&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;774&lt;/td&gt;&lt;td&gt;Init&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;659&lt;/td&gt;&lt;td&gt;operator=&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;549&lt;/td&gt;&lt;td&gt;Log&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;517&lt;/td&gt;&lt;td&gt;Read&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;506&lt;/td&gt;&lt;td&gt;Write&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;396&lt;/td&gt;&lt;td&gt;GetType&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;&lt;p&gt;
The top 4 methods are related to XPCOM; the famed &lt;tt&gt;Init&lt;/tt&gt; method comes in a mere 5th place. The 659 assignment operators are also worth noting. I'm guessing some of these may be implicitly generated copy-assignment operators for non-POD classes.
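&lt;/p&gt;&lt;p&gt;
The DXR database schema isn't shown here, so the following is only a sketch. It assumes a hypothetical table &lt;tt&gt;functions(name TEXT)&lt;/tt&gt; with one row per function. With Python's built-in &lt;tt&gt;sqlite3&lt;/tt&gt; module, a name ranking like the table above and a case-insensitive prefix count like the &lt;tt&gt;Init&lt;/tt&gt; one come down to a &lt;tt&gt;GROUP BY&lt;/tt&gt; query and a &lt;tt&gt;LIKE&lt;/tt&gt; query:
&lt;/p&gt;&lt;pre&gt;
import sqlite3


def most_common_names(conn, limit=10):
    # Hypothetical schema: one row per function in functions(name TEXT).
    return conn.execute(
        "SELECT name, COUNT(*) AS n FROM functions "
        "GROUP BY name ORDER BY n DESC, name LIMIT ?",
        (limit,),
    ).fetchall()


def count_name_prefix(conn, prefix):
    # SQLite's LIKE ignores case for ASCII letters by default, so "init%"
    # also matches "Init", "InitFoo" and "initialize". Exact matches are
    # included, and any % or _ inside prefix would act as a wildcard.
    return conn.execute(
        "SELECT COUNT(*) FROM functions WHERE name LIKE ?", (prefix + "%",)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE functions (name TEXT)")
conn.executemany(
    "INSERT INTO functions VALUES (?)",
    [("Release",), ("Release",), ("AddRef",), ("Init",), ("InitFoo",), ("initialize",)],
)
print(most_common_names(conn, 2))  # [('Release', 2), ('AddRef', 1)]
print(count_name_prefix(conn, "init"))  # 3
&lt;/pre&gt;&lt;p&gt;
Ties are broken by name so the output is deterministic. Unlike the "further 1,681" figure, &lt;tt&gt;count_name_prefix&lt;/tt&gt; also counts exact matches, so those would need to be subtracted.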
&lt;/p&gt;&lt;h4&gt;Variables&lt;/h4&gt;&lt;p&gt;
There's not much to say here. We have some 623,237 variables of some kind. Most of these, naturally, are local variables or parameters: 516,627, to be precise. We additionally have 82,008 members of compound types and 24,602 global variables. It would be interesting to compute the number of variables whose names defy our naming convention, or the number of static constructors that need to run at startup, but that data is harder to get.
&lt;/p&gt;&lt;h4&gt;Warnings&lt;/h4&gt;&lt;p&gt;
I count 5,283 reported warnings for mozilla-central. Of these, 1,608 warn about non-virtual destructors in classes with virtual functions. Another 940 warn about our use of mismatched enumeration types. There are 1,348 warnings about unused things. Finally, there are 1,551 warnings about use of extensions and 466 "miscellaneous warnings".
&lt;/p&gt;&lt;h4&gt;Build statistics&lt;/h4&gt;&lt;p&gt;
The final set of statistics is simple size statistics for the build. There are about 8,581,746 non-empty lines of text comprising some 320MiB of data. These are organized into about 51,469 files, including 1,469 generated files in the build directory. My output SQLite file was 464MiB and held around 3.5 million rows of data. We also have about 38MiB of binary files (like PNGs) in the source tree. The generated included files amount to almost 15MiB, comprising around 354,893 non-empty lines of text.
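&lt;/p&gt;&lt;p&gt;
The method behind these counts isn't described here. As a rough, hypothetical sketch (not the script that produced the figures above), counts of this kind can be gathered with a short Python walk over a directory tree. It treats any file containing a NUL byte as binary, which is only a crude heuristic:
&lt;/p&gt;&lt;pre&gt;
import os


def text_tree_stats(root):
    """Count text files under root, their non-empty lines, and their bytes."""
    files = lines = size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                data = f.read()
            if b"\0" in data:
                continue  # crude heuristic: treat files with NUL bytes as binary
            files += 1
            size += len(data)
            # A line counts if anything other than whitespace remains.
            lines += sum(1 for line in data.splitlines() if line.strip())
    return files, lines, size


# Example: print(text_tree_stats("."))
&lt;/pre&gt;&lt;p&gt;
You could run it separately over the source tree and over the generated files in the build directory. With the figures above, 354,893 out of 8,581,746 non-empty lines is about 4.1%.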
&lt;/p&gt;&lt;p&gt;
I think the best way to summarize this data is "mozilla-central is a massive codebase." It also goes to show you why just looking at source code without compiling is a bad idea: around 3-5% of our source code actually isn't in the source tree to begin with but is instead automatically created at compile time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-3166885380945424459?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/3166885380945424459/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=3166885380945424459' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3166885380945424459'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3166885380945424459'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/08/not-so-random-mozilla-central-factoids.html' title='Not-so-random mozilla-central factoids'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-3013115740463427898</id><published>2011-08-02T20:23:00.003-04:00</published><updated>2011-08-02T20:36:29.530-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Fixing sqlite compatibility issues</title><content type='html'>While testing the separation of database generation and the 
actual webpage, I discovered a weird bug: SQLite cheerily opened the database but then complained that it was invalid. I found this strange and double-checked that the file had copied correctly: same size, correct permissions, and &lt;tt&gt;file&lt;/tt&gt; gives the same information (SQLite database, version 3). After a bit of thought, I found the problem:
&lt;/p&gt;
&lt;pre&gt;
[mozbuild@dm-dxr01 ~]$ sqlite3 -version
3.3.6
...
jcranmer@xochiquetzal ~ $ sqlite3 -version
3.7.7 2011-06-23 19:49:22 4374b7e83ea0a3fbc3691f9c0c936272862f32f2
&lt;/pre&gt;&lt;p&gt;
The SQLite versions were incompatible: the 3.3.6 installation on the server could not read the database written by 3.7.7. After some investigation, I found an easy way to fix this:
&lt;/p&gt;&lt;pre&gt;
jcranmer@xochiquetzal ~ $ sqlite3 spidermonkey.sqlite .dump &gt; /tmp/statements.sql
[mozbuild@dm-dxr01 ~]$ sqlite3 -init /tmp/statements.sql spidermonkey.sqlite
&lt;/pre&gt;&lt;p&gt;
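The same dump-and-reload can also be sketched with Python's built-in &lt;tt&gt;sqlite3&lt;/tt&gt; module, whose &lt;tt&gt;Connection.iterdump()&lt;/tt&gt; produces SQL text much like the shell's &lt;tt&gt;.dump&lt;/tt&gt;. This sketch runs both halves against the single SQLite library Python was built with, so by itself it does not bridge two SQLite versions. Across machines, you would write the script to a file with the newer library and execute it with the older one. The helper name &lt;tt&gt;dump_and_reload&lt;/tt&gt; is hypothetical.
&lt;/p&gt;&lt;pre&gt;
import sqlite3


def dump_and_reload(src_path, dst_path):
    """Rebuild src_path as a new database at dst_path from its SQL text dump.

    dst_path should not already exist; the dump recreates every table.
    """
    src = sqlite3.connect(src_path)
    # iterdump() yields the schema and data as SQL statements, much like
    # the sqlite3 shell's .dump command (wrapped in BEGIN TRANSACTION/COMMIT).
    script = "\n".join(src.iterdump())
    src.close()
    dst = sqlite3.connect(dst_path)
    dst.executescript(script)
    dst.close()
    return script
&lt;/pre&gt;&lt;p&gt;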
Those two shell commands worked wonderfully. So if you ever need to work around incompatible SQLite versions, that is how you dump a database to SQL text and import it again. While I'm on the topic, this is worth paying attention to:
&lt;/p&gt;&lt;pre&gt;
-rw-r--r-- 1 jcranmer   jcranmer   27394048 Aug  2 17:13 spidermonkey.sqlite
-rw-r--r-- 1 jcranmer   jcranmer   27419572 Aug  2 17:17 statements.sql
&lt;/pre&gt;&lt;p&gt;
The SQL dump is only 0.1% larger than the SQLite file. For the older database, the dump is actually smaller than the database file itself. Food for thought.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-3013115740463427898?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/3013115740463427898/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=3013115740463427898' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3013115740463427898'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3013115740463427898'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/08/fixing-sqlite-compatibility-issues.html' title='Fixing sqlite compatibility issues'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4690794888656810303</id><published>2011-07-26T13:29:00.003-04:00</published><updated>2011-07-26T15:14:29.927-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Callgraph</title><content type='html'>There remains, as of this blog post, one major regression from the original dehydra-based DXR, and that is the lack of support for the callgraph.
The main problem I have here is pinning down the scope of the feature. To give an idea of why this is hard, I pose the following challenge: how many different ways are there in C++ of invoking a function, as would be viewed from the generated assembly code? Here is my list:
&lt;/p&gt;&lt;dl&gt;
&lt;dt&gt;Global function&lt;/dt&gt;
&lt;dt&gt;Invocation of non-virtual member&lt;/dt&gt;
&lt;dt&gt;Nonvirtual invocation of virtual member&lt;/dt&gt;
&lt;dt&gt;Constructor and copy constructor invocations&lt;/dt&gt;
&lt;dt&gt;Implicit destructor invocation&lt;/dt&gt;
&lt;dt&gt;Functors (objects that implement &lt;tt&gt;operator()&lt;/tt&gt;)&lt;/dt&gt;
&lt;dt&gt;C++0x lambdas (which appear to be functors)&lt;/dt&gt;
&lt;dd&gt;Count all of these as "one," if you like, since they all more or less boil down to a &lt;tt&gt;call addr&lt;/tt&gt; instruction. It really comes down to how pedantic you get about the differences. Most of these also look fairly distinct at the source-code level. On the plus side, since they are all statically-known function calls, all of these are easy to handle.&lt;/dd&gt;
&lt;dt&gt;Virtual member function invoked virtually&lt;/dt&gt;
&lt;dd&gt;This is clearly different at the assembly level, since it requires a vtable lookup with the added possibility of thunking. From a handling perspective, though, this is more or less statically known. The possible targets of a virtual method call are limited to that method and its overrides in subclasses of the type it is called through, if we assume that people &lt;a href="http://dxr.mozilla.org/mozilla/mozilla-central/xpcom/reflect/xptcall/src/xptcall.cpp.html#l69"&gt;aren't sneaky&lt;/a&gt;.&lt;/dd&gt;
&lt;dt&gt;Function pointers&lt;/dt&gt;
&lt;dd&gt;Easy to dip into, impossible to think about off the top of your head (how many tries does it take you to write the type of a function pointer that returns a function pointer without using typedefs?), and it requires data-flow analysis to solve. The main question is whether it's even worth planning to support function pointers at all, or whether they should just be left out entirely.&lt;/dd&gt;
&lt;dt&gt;Pointer-to-member function pointers&lt;/dt&gt;
&lt;dd&gt;Theoretically easier than function pointers (because we again have a more limited state space), but they still suffer from the same need for data-flow analysis. Not to mention that they are much harder to use, and that their implementations are &lt;a href="http://www.codeproject.com/KB/cpp/FastDelegate.aspx"&gt;much different from function pointers&lt;/a&gt;.&lt;/dd&gt;
&lt;dt&gt;Template dependent name resolution&lt;/dt&gt;
&lt;dd&gt;Templates, as far as C++ is concerned, are essentially a macro system that is more powerful and more inane than the C preprocessor. So if the expression being called depends on the type parameter, then the function call can be any (or all) of the above, depending on what was passed in.&lt;/dd&gt;
&lt;dt&gt;&lt;tt&gt;throw&lt;/tt&gt;&lt;/dt&gt;
&lt;dd&gt;The ABI handles the entire exception-handling semantics of C++ as a combination of magic function calls and magic data sections. A throw is itself a function call (to &lt;tt&gt;__cxa_throw&lt;/tt&gt; in the Itanium C++ ABI), which then gets to do stack unwinding. Most of this stuff isn't exposed in the code; it just leaks out in semantics.&lt;/dd&gt;
&lt;/dl&gt;&lt;p&gt;
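Restricting virtual targets to subclasses is essentially class hierarchy analysis. Here is a minimal Python sketch of it, assuming single inheritance and a hypothetical in-memory hierarchy. &lt;tt&gt;base_of&lt;/tt&gt; maps each class to its parent, and &lt;tt&gt;defines&lt;/tt&gt; lists the methods each class declares or overrides. Every class derived from the type the call goes through contributes whichever definition an object of that class would actually run.
&lt;/p&gt;&lt;pre&gt;
def resolve(cls, method, base_of, defines):
    # Walk up a single-inheritance chain to the nearest class that defines
    # method; that definition is what an object of type cls would run.
    while cls is not None:
        if method in defines.get(cls, ()):
            return cls + "::" + method
        cls = base_of.get(cls)
    return None


def virtual_targets(static_class, method, base_of, defines):
    """Possible targets of a virtual call made through static_class, found by
    visiting static_class and every class derived from it."""
    subclasses = {}
    for child, parent in base_of.items():
        subclasses.setdefault(parent, []).append(child)
    targets, stack = set(), [static_class]
    while stack:
        cls = stack.pop()
        impl = resolve(cls, method, base_of, defines)
        if impl is not None:
            targets.add(impl)
        stack.extend(subclasses.get(cls, ()))
    return targets


base_of = {"A": "Base", "B": "A", "C": "Base"}  # child: parent
defines = {"Base": {"Run"}, "A": {"Run"}}         # B and C inherit Run
print(sorted(virtual_targets("Base", "Run", base_of, defines)))  # ['A::Run', 'Base::Run']
print(sorted(virtual_targets("A", "Run", base_of, defines)))     # ['A::Run']
&lt;/pre&gt;&lt;p&gt;
Like the argument above, this sketch trusts that nobody is being sneaky. It also ignores multiple inheritance, thunks, and whether a declaration is pure virtual.
&lt;/p&gt;&lt;p&gt;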
One goal I've had in mind is to make sure I can do some reasonable support for dynamic languages as well as static languages. To that end, I've looked up other callgraph implementations to see what they do in the realm of indirect function calls. The answer is a mixture of either "do runtime instrumentation to build the callgraph" or "ignore them altogether." The closest I've seen is one implementation that indicated when the address of a function was taken.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4690794888656810303?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4690794888656810303/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4690794888656810303' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4690794888656810303'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4690794888656810303'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/07/callgraph.html' title='Callgraph'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-7448441412371688024</id><published>2011-07-22T15:27:00.002-04:00</published><updated>2011-07-22T19:07:32.572-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' 
term='mozilla'/><title type='text'>Viewsource up and running</title><content type='html'>This morning, I made a &lt;a href="http://dxr.mozilla.org/viewsource/"&gt;public&lt;/a&gt; version of the &lt;a href="https://github.com/jcranmer/viewsource/"&gt;viewsource&lt;/a&gt; code available. Viewsource was originally a web tool to output a pretty-printed version of the &lt;a href="https://developer.mozilla.org/En/Dehydra"&gt;Dehydra&lt;/a&gt; objects for simple snippets of source code. Since I found it useful for debugging, I also added dumps for the now-defunct Pork rewriting toolset, &lt;a href="https://developer.mozilla.org/En/JSHydra"&gt;JSHydra&lt;/a&gt;, and now, finally, &lt;a href="http://clang.llvm.org"&gt;Clang&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
The odd tool out here is clang, as it has neither a debug-useful XML output nor a simple JS representation that can report the output, which requires me to make the tool a two-stage process: the first stage is a compiler plugin that figures out from &lt;tt&gt;clang::RecursiveASTVisitor&lt;/tt&gt; which functions and classes it needs to worry about dumping out, and then writes out the code for the second stage, another compiler plugin that actually dumps the generated AST information to the console. In other words, I have a plugin to write a plugin that is used to dump information to write plugins.
&lt;/p&gt;&lt;p&gt;
Eventually, I hope to be more complete in the information I can dump out (particularly type locations and type information), but this is complete enough to be at least somewhat useful for finding interesting things, such as understanding what the location information actually refers to. As I work on this tool more and more, this should enable me to find and fix many bugs in DXR.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-7448441412371688024?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/7448441412371688024/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=7448441412371688024' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7448441412371688024'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7448441412371688024'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/07/viewsource-up-and-running.html' title='Viewsource up and running'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-1893874934618479365</id><published>2011-07-21T20:05:00.003-04:00</published><updated>2011-07-22T13:01:15.519-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>DXR alpha 3 
release</title><content type='html'>I am pleased to announce the third release of &lt;a href="http://wiki.mozilla.org/DXR"&gt;DXR&lt;/a&gt;. Compared to the previous release, the UI has been tweaked and several bugs have been noticed and fixed. In particular, I have improved the support of links, and fixed the bug that prevented the type hierarchy from being properly realized. I have also introduced support for indexing the IDL code in Mozilla's codebase, as well as indexing generated code in general. I have also added yet another tree, &lt;a href="http://dxr.mozilla.org/comm-central"&gt;comm-central&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
Compared to the original implementation of DXR by Dave Humphrey, there remains one unimplemented feature, namely support for the callgraph. My plan for the next release will focus on implementing this feature as well as any other bugs I uncover via real-world testing. The other major features for the next release will be a JSON query API for data and improved support for multiple trees.
&lt;/p&gt;&lt;p&gt;
More generally, the plugin architecture has been much improved. There is now support for viewing code-coverage data in the source tree (I do not have a visible demo of this yet), and the support for IDL is entirely a separate plugin. Implementing these plugins has made me realize that there is a need for better hooks in the current architecture (especially with regard to the generated HTML), and these will also be forthcoming in the next release.
&lt;/p&gt;&lt;p&gt;
The final point to make is that I have moved the &lt;a href="http://dxr.mozilla.org/viewsource"&gt;viewsource&lt;/a&gt; code into &lt;a href="https://github.com/jcranmer/viewsource/"&gt;its own repository&lt;/a&gt; and added support for a limited subset of the clang AST. The website does not properly support clang at this time owing to problems in the server's current toolchain configuration; when these issues are resolved, I will demonstrate the tool more fully.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-1893874934618479365?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/1893874934618479365/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=1893874934618479365' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1893874934618479365'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1893874934618479365'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/07/dxr-alpha-3-release.html' title='DXR alpha 3 release'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6895654610238403993</id><published>2011-07-13T22:44:00.003-04:00</published><updated>2011-07-13T23:20:05.693-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category 
scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Random discovery</title><content type='html'>I've been playing around with the idea of replacing mork with leveldb, the two being at roughly the same "no feature database" level. The sanest way to do a decent head-to-head comparison is to pick the least powerful wrapper around this API (the message folder cache) and reimplement it twice. Actually, I originally intended to implement mdb with a leveldb backend as a drop-in replacement for mork. Five minutes of looking dissuaded me, as the mdb interfaces require you to implement about five interfaces just to get to "open a database".
&lt;/p&gt;&lt;p&gt;
For realistic testing, I grabbed a real database and instrumented all of the accesses in real use cases (i.e., startup and shutdown). The real data is what I love to call my Profile from Hell™, so there's enough data to get reasonable statistical accuracy in timing. This code is also on a notable path in Thunderbird startup, so I made sure to test cold startup, which, incidentally, is very annoying: a 1-second test takes several seconds to reload xpcshell.
&lt;/p&gt;&lt;p&gt;
The first performance results were very surprising to me. I expected that not touching the knobs would probably result in a mild performance regression (mork is fast by virtue of choosing the "space" option in the space/time tradeoff). What I wasn't expecting was something that was an order of magnitude slower. I eventually got off my butt and hooked up jprof to see what was going wrong. Unfortunately, I forgot that jprof's visualization of output results sucks, so I traipsed back to my very old post to pull down my script to turn the dumb HTML file into a more manageable dot file (which needed fixing, since the only commit to jprof in 3 years changed the output syntax on me).
&lt;/p&gt;&lt;p&gt;
Clearly, the output file implicates JS as being the hotpath. Since this is xpcshell and I had to hack the jprof initialization in, I decided to move the jprof initialization to the execution point to eliminate the effect of parsing my ~20K-line input file (I did say it loads a lot of data). When I &lt;a href="http://quetzalcoatal.blogspot.com/2008/07/profiling-made-visual.html"&gt;last used jprof&lt;/a&gt; for serious performance tracking, I noted that the main regression appeared to be caused by &lt;tt&gt;XPCThrower::ThrowBadResult&lt;/tt&gt; (about 60%, apparently). So here, the main regression appears to be... you guessed it, JS exceptions (the method names have changed in 3 years, though).
&lt;/p&gt;&lt;p&gt;
That was truly unexpected, since I'm not supposed to be changing any API. After looking back at the code and inspecting it much more deeply, I found out that I actually was changing the API. You see, &lt;tt&gt;nsIMsgFolderCacheElement&lt;/tt&gt; has two ways to access properties: as strings or as integers. Accessing integers clearly throws an error if the value isn't there, and accessing strings also appears to throw an error. Assuming this to be the case, I explicitly threw an error in both cases in my new implementation. It turns out that the current implementation actually doesn't throw for strings (I'm not sure whether it's intended to or not). Fixing that greatly improved the times. In brief, the real results are that mork is faster on startup, even on general property access, and slower on shutdown. Only the last of these is not surprising.
&lt;/p&gt;&lt;p&gt;
In summary: throwing JS exceptions, at least via XPCOM, is very, very, very slow. If that code is a potential hotpath, just say no to exceptions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6895654610238403993?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6895654610238403993/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6895654610238403993' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6895654610238403993'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6895654610238403993'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/07/random-discovery.html' title='Random discovery'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4968629716628640276</id><published>2011-07-12T20:26:00.002-04:00</published><updated>2011-07-12T21:33:06.234-04:00</updated><title type='text'>Location information</title><content type='html'>One of the hardest parts about writing a compiler is producing correct location information. Well, it is very easy to get the precise location of a token (most of the time, in any case), but the hard part is assigning token locations to actual AST productions, since they are composed of several tokens. 
Take a simple function definition: &lt;code&gt;void foo(int bar) {}&lt;/code&gt;. Nearly every token in that definition can be argued to be the right location of the function. Hence showing only line information is common for compilers: in most normal usage, you get the same information no matter which token you pick. However, there are times when the actual column number is crucial: if you need to correlate something to the original source code (e.g., for rewriting or for DXR), you want that column number to be spot on.
&lt;/p&gt;&lt;p&gt;
Now let's consider pathological cases. Suppose we define that function in a templated class. Do I want the location of the template or the location of the template instantiation? The former is often better, but there are times you want the latter. But you can also define the function in a macro: &lt;code&gt;#define FUNCTION(name) void name(int bar) {}&lt;/code&gt; and &lt;code&gt;FUNCTION(foo)&lt;/code&gt;. Do I want the location within the macro definition or the location of the macro instantiation? Macros let you be really evil: &lt;code&gt;#define FUNCTION(name) void start##name(int bar) {}&lt;/code&gt;. Now which location do I want? Oh yeah, and while we're on the topic, there is also the issue of autogenerated code (e.g., I want the location to be what it was in the original file). So which location do I use?
&lt;/p&gt;&lt;p&gt;
There are two orthogonal issues here: which token do I use in the "final" version of the text, and which version of the text do I grab it from. There are several versions of the text (think of the lexer not as a single unit but as a chain of steps that pass information between them); we don't care about some of the intermediate representations (e.g., what the code looks like after replacing trigraphs). The lowest level of these is the position in the last form of the text, just before it is parsed for real. It's also important to track the position through repeated invocations of macros (most of the time). We also want the location in the text file as it is first fed to the compiler, as well as the position in the original file that was fed to whatever tool generated the C/C++ source file.
&lt;/p&gt;&lt;p&gt;
Clang gives us some of this information for any &lt;tt&gt;SourceLocation&lt;/tt&gt;. The source of the actual token is the &lt;i&gt;spelling&lt;/i&gt; location. On the other hand, the &lt;i&gt;instantiation&lt;/i&gt; location is the location of the outermost macro invocation that generated the production. It is possible to get the tree of locations by repeatedly calling &lt;tt&gt;getImmediateSpellingLoc&lt;/tt&gt;, which only looks down one level of invocation. To get the result after following &lt;tt&gt;#line&lt;/tt&gt; directives, you need to use the &lt;i&gt;presumed&lt;/i&gt; location, which looks at the instantiation location. It is possible to pass the spelling location to &lt;tt&gt;getPresumedLoc&lt;/tt&gt; to produce the correct results, though. Unfortunately, precise location information goes out the window if your macro uses &lt;tt&gt;#&lt;/tt&gt; or &lt;tt&gt;##&lt;/tt&gt;: the resulting tokens are spelled in a "scratch space" buffer whose line numbers don't correlate well enough with the original source code for me to make sense of them.
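To make the spelling/instantiation/presumed distinction concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions and not Clang's API: the &lt;tt&gt;Loc&lt;/tt&gt; class and the helper functions are hypothetical, a location is reduced to a (file, line) pair, and only a single &lt;tt&gt;#line&lt;/tt&gt; directive per file is modeled. Walking &lt;tt&gt;invoked_at&lt;/tt&gt; one step at a time is meant to be analogous in spirit to stepping down one level of macro invocation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Pos = Tuple[str, int]  # (file, line)


@dataclass(frozen=True)
class Loc:
    """Toy source location (hypothetical; not clang::SourceLocation)."""
    file: str
    line: int
    # For a token produced by a macro: the location of the macro invocation,
    # which may itself sit inside another macro's expansion.
    invoked_at: Optional["Loc"] = None


def spelling(loc: Loc) -> Pos:
    # Where the characters of the token were actually written.
    return (loc.file, loc.line)


def expansion_chain(loc: Loc) -> List[Pos]:
    # Step outward one macro invocation at a time, innermost first.
    chain: List[Pos] = []
    cur: Optional[Loc] = loc
    while cur is not None:
        chain.append((cur.file, cur.line))
        cur = cur.invoked_at
    return chain


def instantiation(loc: Loc) -> Pos:
    # The outermost macro invocation that produced the token.
    return expansion_chain(loc)[-1]


# Hypothetical #line table: file -> (physical line holding the directive,
# presumed file name, presumed number of the line after the directive).
LineDirectives = Dict[str, Tuple[int, str, int]]


def presumed(pos: Pos, directives: LineDirectives) -> Pos:
    # Remap a physical position through the file's #line directive, if one
    # precedes it; positions before the directive are left alone.
    file, line = pos
    if file in directives:
        directive_line, new_file, first_line = directives[file]
        if line > directive_line:
            return (new_file, first_line + (line - directive_line - 1))
    return pos


# FUNCTION(foo) on line 7 of gen.cpp expands a macro body spelled on line 2.
tok = Loc("gen.cpp", 2, invoked_at=Loc("gen.cpp", 7))
directives = {"gen.cpp": (1, "gen.idl", 40)}  # '#line 40 "gen.idl"' on line 1
print(spelling(tok), instantiation(tok))
# ('gen.cpp', 2) ('gen.cpp', 7)
print(presumed(instantiation(tok), directives), presumed(spelling(tok), directives))
# ('gen.idl', 45) ('gen.idl', 40)
```

Feeding the instantiation position to &lt;tt&gt;presumed&lt;/tt&gt; mirrors the default behavior described above; feeding it the spelling position gives the other answer.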
&lt;/p&gt;&lt;p&gt;
The other direction is choosing which of the class's many methods to use to get the source location. A function declaration has a mere six of these: the start and end of the declaration, the location of the name identifier, the location of the type specifier, the location after the template stuff, and the location of the right brace. I would go into more detail, but I haven't yet catalogued all of the ways to get this information, so I can't be sure that the method names actually mean what they claim to mean.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4968629716628640276?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4968629716628640276/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4968629716628640276' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4968629716628640276'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4968629716628640276'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/07/location-information.html' title='Location information'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-7529003558371626850</id><published>2011-07-06T12:59:00.003-04:00</published><updated>2011-07-06T15:09:57.083-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>DXR alpha 2 release</title><content type='html'>I am pleased to announce a second alpha release of &lt;a href="https://wiki.mozilla.org/DXR"&gt;DXR&lt;/a&gt;. Several improvements have been made, including changes to the UI as well as better handling of macros. A complete list of changes can be found at the end. As this is the second release, I have expanded the live demo to include not one but two separate indexed trees:
&lt;a href="http://dxr.mozilla.org/clang"&gt;clang&lt;/a&gt; and &lt;a href="http://dxr.mozilla.org/mozilla/"&gt;mozilla-central&lt;/a&gt;, both of which are current as of last night.
&lt;/p&gt;&lt;p&gt;
The advantage of DXR is that it sees the source code by instrumenting the compiler at build time, so it is able to understand classes most of whose methods are defined by macros, like Clang's &lt;a href="http://dxr.mozilla.org/clang/clang/include/clang/AST/RecursiveASTVisitor.h.html"&gt;RecursiveASTVisitor&lt;/a&gt;. As in any source code browser, you can click on an identifier and then see information about it, such as its points of declaration or its definition. Unlike LXR and MXR, however, DXR knows what kind of identifier it's working with, so you won't be offered the definition of &lt;tt&gt;BuiltinType::getKind&lt;/tt&gt; if you look up the &lt;tt&gt;getKind&lt;/tt&gt; method of an &lt;tt&gt;Expr&lt;/tt&gt; object.
&lt;/p&gt;&lt;p&gt;
Another new feature is support for plugins, to allow people to use different compilers, add in support for more languages, or even augment the UI with new features like capturing line-coverage information. The current support is rudimentary, but there is already active work on a &lt;a href="https://github.com/groleo/dxr"&gt;dehydra-based indexer&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
A list of new features:
&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;Improved search speed at runtime, and dropped glimpse as a prerequisite&lt;/li&gt;
&lt;li&gt;The web UI works in newer versions of Firefox&lt;/li&gt;
&lt;li&gt;Support for plugins has been added&lt;/li&gt;
&lt;li&gt;Disk space requirements have been radically reduced&lt;/li&gt;
&lt;li&gt;Setup configuration is easier&lt;/li&gt;
&lt;li&gt;Link extents have been fixed in many cases&lt;/li&gt;
&lt;li&gt;Declarations can now be seen along with definitions&lt;/li&gt;
&lt;li&gt;Macros can now be browsed like source code&lt;/li&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-7529003558371626850?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/7529003558371626850/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=7529003558371626850' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7529003558371626850'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7529003558371626850'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/07/dxr-alpha-2-release.html' title='DXR alpha 2 release'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4413386534702480224</id><published>2011-06-30T19:20:00.003-04:00</published><updated>2011-06-30T19:23:56.996-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Google Groups Public Service Announcement</title><content type='html'>For the past few days, it appears that Google Groups has stopped mirroring newsgroups properly; this includes the various Mozilla mailing lists. It does, however, appear to be forwarding posted messages, so if you send a message via the UI and you don't see anything show up... don't panic, and don't repost it 5 times.
&lt;/p&gt;&lt;p&gt;
I don't know how long Google Groups will be down, but in the meantime you can use your favorite newsreader to read the newsgroups. Heck, &lt;a href="https://www.mozilla.org/en-US/thunderbird/"&gt;Thunderbird&lt;/a&gt; has a built-in news client. To read the Mozilla mailing lists, just add the server news.mozilla.org. You can then happily see everything that you would be able to see with Google Groups, if it were working.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4413386534702480224?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4413386534702480224/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4413386534702480224' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4413386534702480224'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4413386534702480224'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/google-groups-public-service.html' title='Google Groups Public Service Announcement'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-7862198853929372873</id><published>2011-06-29T17:33:00.004-04:00</published><updated>2011-07-28T14:30:55.739-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category 
scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Mozilla-central and DXR</title><content type='html'>Some of you may be wondering why it is taking me so long to build a DXR index of mozilla-central. Part of the answer is that I'm waiting to bring DXR on clang back up to feature parity with the original implementation. The other part of the answer is that mozilla-central is rather big to index. How big? Well, some statistics await&amp;hellip;
&lt;/p&gt;&lt;p&gt;
I first noticed a problem when I was compiling and found myself at 4 GiB of free space and dwindling fast. I'm used to fairly tight space restrictions (a result of badly sizing the partitions on my last laptop), so that's not normally a source of concern. Except that I had had 10 GiB of free space an hour earlier, before I started the run to index mozilla-central.
&lt;/p&gt;&lt;p&gt;
The original indexing algorithm is simple: for each translation unit (i.e., C++ file), I produce a corresponding CSV file of the "interesting" things I find in that translation unit&amp;mdash;which includes all of the header files it includes. As a result, I emit the data for every header file each time it is included by some compiled file. The total size of all data files produced by the mozilla-central run turned out to be around 12 GiB. When I collected all of the data and removed duplicate lines, the total size turned out to be about 573 MiB.
&lt;/p&gt;&lt;p&gt;
Step back and think about what this means for a moment. Since "interesting" things to DXR basically boil down to all warnings, declarations, definitions, and references (macros and templates underrepresented), this implies that every declaration, definition, and reference is parsed by the compiler, &lt;em&gt;on average&lt;/em&gt;, about 20-25 times. Or, if you take this as a proxy for the "interesting" lines of code, the compiler must read every line of code in mozilla-central about 20-25 times.
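The ratio behind that "20-25 times" estimate is easy to check from the figures above, taking 1 GiB as 1024 MiB (the 12 GiB figure is itself approximate):

```python
# Figures from the paragraphs above; 1 GiB is taken as 1024 MiB.
total_mib = 12 * 1024   # all per-translation-unit CSV output
unique_mib = 573        # the same data with duplicate lines removed

times_seen = total_mib / unique_mib
print(f"each interesting line was emitted about {times_seen:.1f} times")
# each interesting line was emitted about 21.4 times
```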
&lt;/p&gt;&lt;p&gt;
The solution for indexing is obviously not to output that much data in the first place. Since everyone who includes, say, &lt;tt&gt;nscore.h&lt;/tt&gt; does so in the same way, there's no reason to re-emit its data in each of the thousands of files that end up including it. However, a file like &lt;tt&gt;nsTString.h&lt;/tt&gt; (for the uninitiated, this is part of the standard internal string code interfaces, implemented using C's version of templates, more commonly known as "the preprocessor") can be included multiple times and produce different results: one for &lt;tt&gt;nsString&lt;/tt&gt; and one for &lt;tt&gt;nsCString&lt;/tt&gt; in the same place. Also, there is the question of making it work even when people compile with &lt;tt&gt;-jN&lt;/tt&gt;, since I have 4 cores that are begging for the work.
&lt;/p&gt;&lt;p&gt;
It was Taras who thought up the solution. We separate out all of the CSV data by the file it comes from. Then we store each file's CSV data in a separate file whose name is a function of both the source file and its contents (actually, it's &lt;tt&gt;&amp;lt;file&amp;gt;.&amp;lt;sha1(contents)&amp;gt;.csv&lt;/tt&gt;, for the curious). This also solves the problem of multiple compilers trying to write the same file at the same time: if we open with &lt;tt&gt;O_CREAT | O_EXCL&lt;/tt&gt; and the open fails because someone else created the file&amp;hellip; we don't need to do anything, because whoever did open the file will write the same data we wanted to write! Applying this technique brings the total generated CSV data down to around 1 GiB (declaration/definition mappings are why some duplicates are still needed), or about 2 times the real data size instead of 20 times. Hence the title of the commit that fixed this: &lt;a href="https://github.com/jcranmer/dxr/commit/866013d862b4752b7773c893004c860b1567597d"&gt;Generate much less data. MUCH LESS.&lt;/a&gt;
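Here is a minimal Python sketch of that write-once scheme, assuming each compiler process calls a helper like the hypothetical &lt;tt&gt;write_once&lt;/tt&gt; below for every file's chunk of CSV data. Flattening the path with underscores is my own assumption; the only thing taken from the description above is that the name is a function of the file and a SHA-1 of its contents, and that the file is opened with &lt;tt&gt;O_CREAT | O_EXCL&lt;/tt&gt;.

```python
import hashlib
import os


def csv_name(source_file: str, csv_data: str) -> str:
    # The name depends on both the file and the data emitted for it, so the
    # nsString and nsCString expansions of one header get different files.
    digest = hashlib.sha1(csv_data.encode("utf-8")).hexdigest()
    flat = source_file.replace("/", "_")  # assumption: flatten the path
    return f"{flat}.{digest}.csv"


def write_once(out_dir: str, source_file: str, csv_data: str) -> bool:
    """Create the CSV file exclusively; return False if it already exists."""
    path = os.path.join(out_dir, csv_name(source_file, csv_data))
    try:
        # O_EXCL makes creation atomic: exactly one process wins the race.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        # Someone else created it. Since the name encodes the contents,
        # they are writing exactly the data we would have written.
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(csv_data)
    return True
```

One caveat the sketch does not handle: a reader that runs while another process is still writing could see a partially written file, so it assumes the CSV data is only consumed after all of the compiles have finished.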
&lt;/p&gt;&lt;p&gt;
Of course, that doesn't solve all of my problems. I still have several hundred MiB of real data that need to be processed and turned into the SQL database and HTML files. Consuming the data in Python requires a steady state of about 3-3.5 GiB of resident memory, which is a problem for me since I stupidly gave my VM only 4 GiB of memory and no swap. Switching the blob serialization from &lt;tt&gt;cPickle&lt;/tt&gt; (which tries to keep track of duplicate references) to &lt;tt&gt;marshal&lt;/tt&gt; (which doesn't) allowed me to postpone crashing until I generate the SQL database. There, SQL allocates just enough memory to push me over the edge, despite my generating all of the SQL statements using iterators (knowing that duplicate processing is handled more implicitly). I also switched from using multiple processes to using multiple threads, to avoid Linux failing to fork Python for lack of memory (shouldn't it only need to copy-on-write?).
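To illustrate the difference between the two serializers, here is a small Python 3 sketch, where &lt;tt&gt;pickle&lt;/tt&gt; plays the role of the old &lt;tt&gt;cPickle&lt;/tt&gt;. The row data is made up, and no memory figures are claimed. It only shows that pickle tracks shared references (via a memo of every object it has written), while marshal format version 2 writes each occurrence out in full.

```python
import marshal
import pickle

row = ["nsString", "decl", "xpcom/string/nsTString.h", 42]  # made-up data
blob = [row, row]  # the same list object referenced twice

# pickle keeps a memo of every object it has written, so the shared
# reference survives the round trip (at the cost of holding that memo).
p = pickle.loads(pickle.dumps(blob))
print(p[0] is p[1])
# True

# marshal format version 2 has no back-references: the row is written twice
# and comes back as two separate (but equal) lists. Format version 3, added
# in Python 3.4, introduced support for object instancing.
m = marshal.loads(marshal.dumps(blob, 2))
print(m[0] is m[1], m[0] == m[1])
# False True
```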
&lt;/p&gt;&lt;p&gt;
Obviously, I need to do more work on shrinking the memory usage. I stripped out the use of SQL for HTML generation and achieved phenomenal speedups (I was sure I must have broken something, since it went from a minute to near-instantaneous). I probably need to move to a lightweight database solution, for example, &lt;a href="http://code.google.com/p/leveldb/"&gt;LevelDB&lt;/a&gt;, but what I have is not a simple (key, value) situation but a (file, plugin, table, key, value) one. That is still not hard to do, but it is more than I want to test for a first pass.
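One common way to squeeze a (file, plugin, table, key, value) layout into a plain key-value store is to concatenate the first four parts into a single key. The sketch below uses an ordinary Python dict as a stand-in for something like LevelDB; the separator, the function names, and the plugin and table names are all hypothetical.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical separator, assumed never to occur inside a key component.
SEP = "\x00"


def make_key(file: str, plugin: str, table: str, key: str) -> str:
    # Putting the components in this order makes every (file, plugin, table)
    # group a contiguous range of keys once the keys are sorted.
    return SEP.join((file, plugin, table, key))


def scan(store: Dict[str, str], file: str, plugin: str,
         table: str) -> Iterator[Tuple[str, str]]:
    # Ordered stores like LevelDB can seek to a prefix and iterate forward;
    # sorting the dict's keys emulates that ordering here.
    prefix = SEP.join((file, plugin, table)) + SEP
    for k in sorted(store):
        if k.startswith(prefix):
            yield k[len(prefix):], store[k]


store: Dict[str, str] = {}
store[make_key("nsString.cpp", "clang", "functions", "Assign")] = "line 12"
store[make_key("nsString.cpp", "clang", "functions", "Append")] = "line 40"
store[make_key("nsString.cpp", "clang", "types", "nsString")] = "line 3"

print(list(scan(store, "nsString.cpp", "clang", "functions")))
# [('Append', 'line 40'), ('Assign', 'line 12')]
```

Ending the prefix with the separator keeps a table named, say, "functions2" from matching a scan of "functions".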
&lt;/p&gt;&lt;p&gt;
In short, &lt;a href="http://dxr.mozilla.org/mozilla/"&gt;DXR indexing for mozilla-central&lt;/a&gt; is being recreated as I write this. Enjoy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-7862198853929372873?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/7862198853929372873/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=7862198853929372873' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7862198853929372873'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7862198853929372873'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/mozilla-central-and-dxr.html' title='Mozilla-central and DXR'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-5378703455734324333</id><published>2011-06-22T20:16:00.003-04:00</published><updated>2011-06-23T01:05:40.578-04:00</updated><title type='text'>Why autoconf sucks</title><content type='html'>I'm going to go out on a limb and guess that the GNU autotools are one of the most heavily used build systems in existence. What is not a guess is that, as a build system, they suck.
I will grant you that C and C++ code in particular is annoying to compile by virtue of the preprocessor mechanics, and I will also grant that autotools can be useful in working out the hairiness of supporting several close-but-not-quite-the-same platforms. But that doesn't justify making a build system this bad.
&lt;/p&gt;&lt;p&gt;
One of the jobs of autotools (the configure script in particular) is to figure out which compilers to use, how to invoke them, and what capabilities they support. For example, is &lt;tt&gt;char&lt;/tt&gt; &lt;tt&gt;signed&lt;/tt&gt; or &lt;tt&gt;unsigned&lt;/tt&gt;? Or does the compiler support &lt;tt&gt;static_assert(expr)&lt;/tt&gt;, or do you need to resort to &lt;tt&gt;extern void func(int arg[expr ? 1 : -1])&lt;/tt&gt;? Language standards progress, and, particularly in the case of C++, compilers can be slow to bring their implementations in line with the language.
&lt;/p&gt;&lt;p&gt;
The reason I bring this up is that I discovered today that my configure run failed because my compiler "didn't produce an executable." Having had to deal with this error several times (my earlier work with dxr spent a lot of time figuring out how to manipulate &lt;tt&gt;$CXX&lt;/tt&gt; and still get a workable compiler), I immediately opened up the log, expecting to find that configure had guessed the wrong file was the executable (last time, it was a &lt;tt&gt;.sql&lt;/tt&gt; file instead of the &lt;tt&gt;.o&lt;/tt&gt;). No, this time the problem was that the compiler had crashed (essentially because &lt;tt&gt;std::string&lt;/tt&gt; doesn't like being assigned &lt;tt&gt;NULL&lt;/tt&gt;). For reference, this is the program that autoconf uses to check that the compiler works:
&lt;/p&gt;&lt;pre&gt;
#line 1861 "configure"
#include "confdefs.h"

main(){return(0);}
&lt;/pre&gt;&lt;p&gt;
(I literally copied it from mozilla's configure script; go look around line 1861 if you don't believe me.) That is a problem because &lt;strong&gt;&lt;em&gt;it's not legal C99 code&lt;/em&gt;&lt;/strong&gt;: C99 removed implicit &lt;tt&gt;int&lt;/tt&gt;, so &lt;tt&gt;main&lt;/tt&gt; has to declare its return type. No, seriously, autoconf has decided to verify that my C compiler is working by relying on a feature so old that it was removed (not merely deprecated) in a 12-year-old specification. While I might understand that there are some incredibly broken compilers out there, I'm sure this line of code is far more likely to fail on working compilers than the correct code would be, especially considering that it is probably harder to write a compiler that accepts this code than one that doesn't. About the only way I can imagine making this "test" program more failtastic is to use trigraphs (which are legal C that gcc does not honor by default). Hey, you could be running on systems that don't have a `#' key, right?
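For contrast, here is a minimal Python sketch of a configure-style compiler check that uses a conforming C99 test program and keeps its scratch directory and command line around for inspection. The function name, file names, and the &lt;tt&gt;cc -o&lt;/tt&gt; style invocation are assumptions (a POSIX-style compiler driver), not what autoconf actually does.

```python
import os
import subprocess
import tempfile
from typing import Tuple

# A conforming C99 test program: main has an explicit int return type.
CONFTEST = "int main(void) { return 0; }\n"


def probe_compiler(cc: str) -> Tuple[bool, str]:
    """Compile and link CONFTEST with `cc`; return (ok, log).

    The scratch directory is deliberately not deleted, and its path and the
    exact command line are recorded in the log for later inspection.
    """
    workdir = tempfile.mkdtemp(prefix="conftest-")
    src = os.path.join(workdir, "conftest.c")
    exe = os.path.join(workdir, "conftest")
    with open(src, "w") as f:
        f.write(CONFTEST)
    cmd = [cc, "-o", exe, src]
    log = f"workdir: {workdir}\ncommand: {' '.join(cmd)}\n"
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as err:
        # E.g. the compiler binary does not exist or is not executable.
        return False, log + f"could not run compiler: {err}\n"
    log += result.stdout + result.stderr
    if result.returncode != 0:
        return False, log + f"compiler exited with status {result.returncode}\n"
    if not os.path.exists(exe):
        return False, log + "compiler reported success but wrote no executable\n"
    return True, log
```

When the check fails, the log names a directory that still holds &lt;tt&gt;conftest.c&lt;/tt&gt;, so the exact failing command can be rerun by hand.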
&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Addendum:&lt;/b&gt; Okay, yes, I'm aware that the annoying code is a result of autoconf 2.13 and that the newest autoconfs don't have this problem. In fact, after inspecting some source history (I probably have too much time on my hands), I found that the offending program was changed by a merge of the experimental branch in late 1999. But the subtler point, which I want to make clearer, is that the problem with autoconf is that it spends time worrying about arcane configurations that the projects that use it probably don't even support. It also wraps the checks for these configurations in scripts that render the actual faults incomprehensible, including "helpfully" cleaning up after itself so you can't actually see the offending command lines and results. Like, for example, the fact that your compiler never produced a .o file to begin with.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-5378703455734324333?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/5378703455734324333/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=5378703455734324333' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5378703455734324333'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5378703455734324333'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/why-autoconf-sucks.html' title='Why autoconf sucks'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-3821394380123775791</id><published>2011-06-17T13:44:00.003-04:00</published><updated>2011-06-17T16:54:10.119-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Alpha release of dxr</title><content type='html'>I am pleased to announce an alpha release of &lt;a href="https://wiki.mozilla.org/DXR"&gt;DXR&lt;/a&gt; built on top of &lt;a href="http://clang.llvm.org/"&gt;Clang&lt;/a&gt;. A live demo of DXR can be found at &lt;a href="http://dxr.mozilla.org/clang/"&gt;http://dxr.mozilla.org/clang/&lt;/a&gt;, which is an index of a relatively recent copy of the Clang source code. Since this is merely an alpha release, expect to find bugs and inconsistencies in the output. For more information, you can go to &lt;a href="irc://irc.mozilla.org/static"&gt;#static on irc.mozilla.org&lt;/a&gt; or contact the &lt;a href="news://news.mozilla.org/mozilla.dev.static-analysis"&gt;static&lt;/a&gt; &lt;a href="https://lists.mozilla.org/listinfo/dev-static-analysis"&gt;analysis mailing list&lt;/a&gt;. A list of most of the bugs I am aware of is at the end of this blog post.
&lt;/p&gt;&lt;p&gt;
So what is DXR? DXR is a smart code browser that uses instrumented compilers to build a database of what the compiler knows about the code. For C and C++ in particular, using an instrumented compiler is necessary, since it is the only reliable way to deal with macros. Take, for instance, &lt;a href="http://dxr.mozilla.org/clang/clang/include/clang/AST/RecursiveASTVisitor.h.html"&gt;&lt;code&gt;RecursiveASTVisitor&lt;/code&gt;&lt;/a&gt; in the Clang codebase. Most of its almost 1200 functions are defined via macros rather than written out directly; as a consequence, the &lt;a href="http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html"&gt;doxygen output&lt;/a&gt; for this class is useless: as far as I can tell, there are only five methods I can override to visit AST nodes. DXR, on the other hand, neatly tells me all of the methods that are defined, and can point me to the place where each function is defined (within the macro, of course).
&lt;/p&gt;&lt;p&gt;
Where can you get the code? DXR is available both as a &lt;a href="https://github.com/mozilla/dxr"&gt;github repository&lt;/a&gt; (use the dxr-clang branch) and as a &lt;a href="http://hg.mozilla.org/webtools/dxr"&gt;Mercurial repository&lt;/a&gt;. Instructions on how to use it can be found on the &lt;a href="https://wiki.mozilla.org/DXR"&gt;wiki page&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
The following is a list of known problems:&lt;ul&gt;
&lt;li&gt;Links occur at odd boundaries&lt;/li&gt;
&lt;li&gt;Some lines have &lt;tt&gt;id="l234"/&gt;&lt;/tt&gt; prepended&lt;/li&gt;
&lt;li&gt;Non-root installs (i.e., installing to &lt;tt&gt;http://dxr.mozilla.org/clang/&lt;/tt&gt;) cause issues. Interestingly, refreshing the page often causes things to work.&lt;/li&gt;
&lt;li&gt;There is a long list of scrolling text when compiling code. Ignore it.&lt;/li&gt;
&lt;li&gt;HTML generation produces &lt;tt&gt;IndexError&lt;/tt&gt;s&lt;/li&gt;
&lt;li&gt;&lt;tt&gt;.csv&lt;/tt&gt; files are created in the source directory and HTML code is generated.&lt;/li&gt;
&lt;li&gt;Inheritance searches don't match the full hierarchy, only one or two levels.&lt;/li&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-3821394380123775791?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/3821394380123775791/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=3821394380123775791' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3821394380123775791'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3821394380123775791'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/alpha-release-of-dxr.html' title='Alpha release of dxr'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-3743771855628575843</id><published>2011-06-10T16:46:00.002-04:00</published><updated>2011-06-10T17:02:47.846-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Documentation lies</title><content type='html'>For various reasons, I want to get the extent of a particular expression in clang. Most AST objects in clang have a method along the lines of &lt;tt&gt;getSourceRange()&lt;/tt&gt;, which, on inspection, returns start and end locations for the object in question. Naturally, it's the perfect fit (more or less), so I start using it. However, it turns out that the documentation is a bit of a liar. 
I would expect the end location to be the location (either file offset or file/line) of the end of the expression, give or take a character depending on whether people prefer half-open or fully closed ranges. Instead, it returns the location of the start of the last &lt;em&gt;token&lt;/em&gt; in the expression. Getting the true end requires this mess: &lt;code&gt;sm.getFileOffset(Lexer::getLocForEndOfToken(end, 0, sm, features))&lt;/code&gt;. Oh yeah, in case you couldn't guess, that causes it to relex the file starting at that point. For an idea of my current mood, pick a random image from &lt;a href="http://blogs.mozillamessaging.com/docs/2011/06/09/anatomy-of-a-bad-user-experience-with-rage-faces/"&gt;Jennifer's recent post&lt;/a&gt; and you'll likely capture it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-3743771855628575843?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/3743771855628575843/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=3743771855628575843' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3743771855628575843'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3743771855628575843'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/documentation-lies.html' title='Documentation lies'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-105918509696982860</id><published>2011-06-10T02:55:00.002-04:00</published><updated>2011-06-10T03:33:37.071-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>DXR updates</title><content type='html'>It's been almost a week since I last discussed DXR, but, if you look at the &lt;a href="https://github.com/jcranmer/dxr/commits/"&gt;commit log&lt;/a&gt;, you can see that I've not exactly been lying back and doing nothing. No, the main reason is that most of the changes I've been making so far aren't exactly ground-breaking.
&lt;/p&gt;&lt;p&gt;
In terms of UI, the only thing that's changed is that I now actually link the variable declarations correctly, a result of discovering, in the midst of the other changes, the typo that was keeping it from working. From a user's perspective, I've also radically altered the invocation of DXR. Whereas before it involved a long series of steps consisting of "build here, export these variables, run this script, build there, run that script, then run this file, and I hope you did everything correctly, otherwise you'll wait half an hour to find out you didn't" (most of which take place in different directories), it's now down to &lt;kbd&gt;. dxr/setup-env.sh&lt;/kbd&gt; (if you invoke that incorrectly, you get a loud error message telling you how to do it correctly; unfortunately, the nature of shells prevents me from just doing the right thing in the first place), building your code, and then running &lt;kbd&gt;python dxr/dxr-index.py&lt;/kbd&gt; in the directory with the proper &lt;tt&gt;dxr.config&lt;/tt&gt;. Vastly improved, then. But most of my changes have merely been code cleanup.
&lt;/p&gt;&lt;p&gt;
The first major change I made was replacing the &lt;tt&gt;libclang&lt;/tt&gt; API with a real clang plugin. There's only one thing I've found harder (figuring out inheritance, and I'm not sure I've gotten that right in the first place), and there are a lot of things I've found easier. Given how crazy people get with build systems, latching onto the compiler makes things go a lot more smoothly. The biggest downside I've seen with clang is that its documentation is lacking. &lt;a href="http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html"&gt;RecursiveASTVisitor&lt;/a&gt; is the main visitor API I'm using, but the documentation for that class is complete and utter crap, a result of doxygen's failure to comprehend macros. I ended up using the libindex-based dxr to dump a database of all of the functions in the class, of which there are something like 1289, or around 60 times the number listed in the documentation. Another example is &lt;a href="http://clang.llvm.org/doxygen/classclang_1_1Decl.html"&gt;Decl&lt;/a&gt;, where the inheritance picture is actually helpful. It does, however, fail to document &lt;a href="http://clang.llvm.org/doxygen/DeclCXX_8h_source.html"&gt;DeclCXX.h&lt;/a&gt;, which is a rather glaring omission if what you are working with is C++ source.
&lt;/p&gt;&lt;p&gt;
The last set of changes was rearchitecting the code to make it hackable by other people. I have started on a basic pluggable architecture for implementing new language modules, although most of the information is still hardcoded to use cxx-clang. In addition, I've begun ripping out SQL as the exchange medium of choice: the sidebar list is now generated directly from the post-processing ball of information, and linkification is now set up so that it can be generated independently of tokenization. In the process, I greatly sped up HTML generation, only to give back much of the gain by tokenizing each file twice (that regression will unfortunately remain until I get around to changing the tokenization API). It shouldn't take much longer for me to rip SQL fully out of the HTML builder and move SQL generation into a parallel process for improved end-to-end time.
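&lt;/p&gt;&lt;p&gt;
For readers wondering what a pluggable architecture for language modules might look like in python, here is a minimal sketch. It is not DXR's actual plugin API: the LANGUAGE_MODULES registry, the load_language_module helper, and the requirement that a plugin expose a tokenize() function are all hypothetical, chosen only to show loading a module by name with the standard importlib module.
&lt;/p&gt;&lt;pre&gt;
```python
import importlib

# Hypothetical registry: language name (e.g. "cxx-clang") -> plugin module.
LANGUAGE_MODULES = {}


def load_language_module(language, module_name):
    """Import module_name and register it as the plugin for language.

    Assumption for this sketch: a plugin is any module exposing a
    tokenize(text) callable. DXR's real interface may look different.
    """
    module = importlib.import_module(module_name)
    if not callable(getattr(module, "tokenize", None)):
        raise ValueError("%s does not provide tokenize()" % module_name)
    LANGUAGE_MODULES[language] = module
    return module
```
&lt;/pre&gt;&lt;p&gt;
Once registered, a plugin would be reached through the registry, e.g. LANGUAGE_MODULES["cxx-clang"].tokenize(source), so the HTML builder never needs to hardcode which languages exist.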
&lt;/p&gt;&lt;p&gt;
Some things I've discovered about python along the way. First, python (2.x) closures don't let you assign to closed-over variables: you can mutate the object a variable refers to, but not rebind the variable itself. That breaks things. Second, if you have a very large python object, don't pass it around as an argument to multiprocessing: just make it a global and let the kernel's copy-on-write code make cloning cheap. Third, apparently the fastest way to concatenate strings in python is to use join with a list comprehension. Finally, python dynamic module importing sucks big time.
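&lt;/p&gt;&lt;p&gt;
A small sketch of the first and third points. It targets Python 3, where the nonlocal statement (absent from python 2) does allow rebinding a closed-over variable; the python 2 workaround of boxing the value in a list is described in a comment. The string example simply builds the pieces with a list comprehension and joins once, as described above; it makes no claim about relative speed.
&lt;/p&gt;&lt;pre&gt;
```python
def make_counter():
    count = 0

    def increment():
        # Without this declaration, "count += 1" would treat count as a new
        # local variable and raise UnboundLocalError. In python 2, which has
        # no nonlocal, the usual workaround was count = [0]; count[0] += 1.
        nonlocal count
        count += 1
        return count

    return increment


def join_rows(values):
    # Build all the pieces first, then concatenate them with a single join.
    return ",".join([str(v) for v in values])


counter = make_counter()
counter()
print(counter())            # 2
print(join_rows(range(5)))  # 0,1,2,3,4
```
&lt;/pre&gt;&lt;p&gt;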
&lt;/p&gt;&lt;p&gt;
Where am I going from here? I'll have HTMLification (i.e., removing SQL queries and replacing them with ball lookups) fixed up tomorrow; after that, I'll make the cgi scripts work again. The next major change after that is getting inheritance and references in good shape, and then making sure that I can do namespaces correctly. At that point, I think I'll be able to make a prerelease of DXR.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-105918509696982860?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/105918509696982860/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=105918509696982860' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/105918509696982860'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/105918509696982860'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/dxr-updates.html' title='DXR updates'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-877492324360528050</id><published>2011-06-08T02:48:00.003-04:00</published><updated>2011-06-08T03:46:12.268-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Fakeserver and future 
changes</title><content type='html'>This is perhaps a bit belated, but I thought I'd give a brief overview of where I think fakeserver should be going over the next several months (i.e., when I find time to work on it). For those who don't know, &lt;a href="https://developer.mozilla.org/en/MailNews_fakeserver"&gt;Fakeserver&lt;/a&gt; is a generalized testing framework for the 4 major mail protocols: IMAP, NNTP, SMTP, and POP. For various reasons, I'm actually getting around to fixing problems with it right now. The following are the features I am going to be adding:
&lt;/p&gt;&lt;h4&gt;&lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=656984"&gt;Multiple connections&lt;/a&gt;&lt;/h4&gt;&lt;p&gt;
This is a long-standing bug in fakeserver, which I first discovered about 3 years ago, when I started testing the IMAP implementation. Fakeserver uses the same connection handler for every connection, which means connection state is shared among all connections; that is obviously very problematic for a protocol as stateful as IMAP. In my defense, I cribbed from the (more or less) stateless HTTP server, and used the barely-stateful NNTP as my first testbed. Fixing this bug is probably the most API-breaking change to fakeserver code: every invocation of the server must replace &lt;code&gt;new nsMailServer(new handler(daemon))&lt;/code&gt; with &lt;code&gt;new nsMailServer(function (d) { return new handler(d); }, daemon)&lt;/code&gt;. As a consequence, handlers are no longer easily exposed by the server, so all manipulation must happen on the daemon end. This causes major changes to the SMTP and NNTP code, for different reasons; pretty much everyone else hides behind helper functions that act as a firewall, minimizing code changes.
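&lt;/p&gt;&lt;p&gt;
The difference between passing one handler and passing a handler factory can be sketched in python (fakeserver itself is JavaScript, in maild.js). The MailServer, Handler, and Daemon classes below are hypothetical stand-ins for nsMailServer, a protocol handler, and a daemon; the point is only that a factory gives each connection its own handler while every handler still shares one daemon.
&lt;/p&gt;&lt;pre&gt;
```python
class Daemon:
    """Hypothetical shared server state (the mailboxes); one per server."""

    def __init__(self):
        self.folders = {"INBOX": []}


class Handler:
    """Hypothetical per-connection protocol state, such as the selected folder."""

    def __init__(self, daemon):
        self.daemon = daemon
        self.selected = None


class MailServer:
    """Takes a handler factory instead of a single shared handler instance."""

    def __init__(self, handler_factory, daemon):
        self.handler_factory = handler_factory
        self.daemon = daemon
        self.connections = []

    def accept(self):
        # Each new connection gets a fresh handler bound to the shared daemon.
        handler = self.handler_factory(self.daemon)
        self.connections.append(handler)
        return handler


server = MailServer(lambda d: Handler(d), Daemon())
first = server.accept()
second = server.accept()
first.selected = "INBOX"
print(second.selected)  # None: the second connection has its own state
```
&lt;/pre&gt;&lt;p&gt;
Because the server now creates handlers itself, a test has no single handler to poke at; shared state is reached through the daemon, which is why the manipulation moves to the daemon end.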
&lt;/p&gt;&lt;h4&gt;&lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=662176"&gt;Multi-process fakeserver&lt;/a&gt;&lt;/h4&gt;&lt;p&gt;
This change (which has no patch yet) will move the running fakeserver into another process. I have some working prototypes for manipulating objects across two different processes, but I don't yet have a good IPC mechanism (which Mozilla strangely lacks) for actually communicating this information. As for API changes, I haven't gotten far enough yet to know the impact of multiprocess fakeserver, but handler information will probably go from "hard to get" to "impossible to get". On the plus side, this should enable mozmill tests to use fakeserver.
&lt;/p&gt;&lt;h4&gt;&lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=662180"&gt;SSL fakeserver&lt;/a&gt;&lt;/h4&gt;&lt;p&gt;
I figure fakeserver needs SSL support: it's common in the mail world to use SSL instead of plaintext, so we need to support it. In terms of external API, the only change will be some added methods to start up an SSL server (and methods to get STARTTLS working too, provided my knowledge of how SSL works is correct). The API pseudo-internal to maild.js gets more convoluted, though: essentially, I need to support multiple pumping routines for communication. On the plus side, if the API I'm planning on implementing turns out to be feasible, we should also get generic charset decoding for i18n tests nearly for free.
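&lt;/p&gt;&lt;p&gt;
One way to read "multiple pumping routines" is as a stack of layers, each transforming the data before handing it on, so that an SSL layer and a charset-decoding layer can be slotted in independently. The python sketch below, with a hypothetical DecodingLayer class, shows only the charset part; it uses an incremental decoder from the standard codecs module because a multi-byte character can be split across two reads. It illustrates the idea and is not the design of maild.js.
&lt;/p&gt;&lt;pre&gt;
```python
import codecs


class DecodingLayer:
    """Hypothetical pump layer: byte chunks in, decoded text out."""

    def __init__(self, charset, downstream):
        # The incremental decoder buffers a partial multi-byte sequence
        # until the rest of it arrives in a later chunk.
        self.decoder = codecs.getincrementaldecoder(charset)()
        self.downstream = downstream

    def on_data(self, chunk):
        text = self.decoder.decode(chunk)
        if text:
            self.downstream(text)


received = []
layer = DecodingLayer("utf-8", received.append)
data = "caf\u00e9".encode("utf-8")
layer.on_data(data[:4])  # ends in the middle of the two-byte "é"
layer.on_data(data[4:])
print("".join(received))  # café
```
&lt;/pre&gt;&lt;p&gt;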
&lt;/p&gt;&lt;h4&gt;&lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=662192"&gt;LDAP fakeserver&lt;/a&gt;&lt;/h4&gt;&lt;p&gt;
LDAP address book code is almost completely untested. I'd like to see that fixed. However, LDAP is a binary protocol (&lt;a href="http://en.wikipedia.org/wiki/Basic_Encoding_Rules"&gt;BER&lt;/a&gt;, to be precise). If I can implement the SSL support in the API framework I want to, then it shouldn't be that much more to get a BER-to-text-ish layer thrown in on top of it. The downside is that I'm pretty much looking at using our own LDAP library to do BER encoding/decoding since I don't want to write that myself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-877492324360528050?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/877492324360528050/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=877492324360528050' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/877492324360528050'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/877492324360528050'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/fakeserver-and-future-changes.html' title='Fakeserver and future changes'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8008597681449050220</id><published>2011-06-04T18:40:00.002-04:00</published><updated>2011-06-04T21:18:48.779-04:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>DXR thinking aloud</title><content type='html'>A primary goal of DXR is to be easy to use. Like any simple statement, translating that into design decisions is inordinately difficult, and I best approach such issues by thinking out loud. Normally, I do this in IRC channels, but today I think it would be best to think in a louder medium, since the problem is harder.
&lt;/p&gt;&lt;p&gt;
There are three distinct things that need to be made "easy to use". The first is the generation of the database and the subsequent creation of the web interface. The second is extension of DXR to new languages, while the last is the customization of DXR to provide more information. All of them have significant issues.
&lt;/p&gt;&lt;h5&gt;Building with DXR&lt;/h5&gt;&lt;p&gt;
Starting with the first one, build systems are complicated. For a simple GNU-ish utility, &lt;kbd&gt;./configure &amp;amp;&amp;amp; make&lt;/kbd&gt; is a functioning build system. But where DXR is most useful is on the large, hydra-like projects where figuring out how to build the program is itself a nightmare: Mozilla, OpenJDK, Eclipse, etc. There is also a substantial number of non-autoconf based systems which throw great wrenches in everything. At the very least, I know this much about DXR: I need to set environment variables before configuring and building (i.e., replace the compiler), I need to "watch" the build process (i.e., follow warning spew), and I need to do things after the build finishes (post-processing). Potential options:
&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;Tell the user to run this command before the build and after the build. On the plus side, this means that DXR needs to know absolutely nothing about how the build system works. On the downside, this requires confusing instructions: in particular, since I'm setting environment variables, the user has to specifically type ". &amp;lt;shell file&amp;gt;" in the shell to get them set up properly. A lot of people lack the shell experience to understand why that is necessary, and the syntax differs just enough from running an ordinary script that people are liable to get it wrong.&lt;/li&gt;
&lt;li&gt;Guess what the build system looks like and try to do it all by ourselves. This is pretty much the opposite extreme, in that it foists all the work on DXR. If your program is "normal", this won't be a problem. If your program isn't... it will be a world of pain. Consider, too, that any automated approach is likely to fail hard on Mozilla code to begin with, which effectively makes this a non-starter.&lt;/li&gt;
&lt;li&gt;Have the user input their build system to a configuration file and go from there. A step down from the previous item, but it increases the need for configuration files.&lt;/li&gt;
&lt;li&gt;Have DXR spawn a shell for the build system. Intriguing, solves some problems but causes others.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;
Conclusion: well, I don't like any of those options. While the goal of essentially being able to "click" DXR and have it Just Work™ is nice, I have reservations about such an approach being able to work in practice. I think I'll go with #1 and punt on this issue to someone with more experience.
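&lt;/p&gt;&lt;p&gt;
The root of the sourcing problem in option 1 is that a process can set environment variables only for itself and its children, never for the shell that launched it. A wrapper that spawns the build itself (closer to option 4) avoids asking users to type ". file". The python sketch below assumes the compiler is chosen through the CXX variable and uses a hypothetical wrapper name, dxr-clang++; neither detail comes from DXR itself.
&lt;/p&gt;&lt;pre&gt;
```python
import os
import subprocess
import sys


def run_build(build_cmd, extra_env):
    """Run build_cmd with extra_env layered over the current environment.

    The child (and anything it spawns, such as make and the compiler) sees
    the new variables; the calling process and its shell do not.
    """
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(build_cmd, env=env, check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


# Stand-in "build" that just reports which compiler it was given.
# "dxr-clang++" is a hypothetical name used only for illustration.
report = run_build([sys.executable, "-c", "import os; print(os.environ['CXX'])"],
                   {"CXX": "dxr-clang++"})
print(report.strip())  # dxr-clang++
```
&lt;/pre&gt;&lt;p&gt;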
&lt;/p&gt;&lt;h5&gt;Multiple languages&lt;/h5&gt;&lt;p&gt;
I could devote an entire week's worth of blog posts to this topic, I think, and I would wager that this is more complicated and nebulous than even build systems are. In the end, all we really need to worry about with build systems is replacing compilers with our versions and getting to the end; with languages, we actually need to be very introspective and invasive to do our job.
&lt;/p&gt;&lt;p&gt;
Probably the best place to start is actually laying out what needs to be done. If the end goal is to produce the source viewer, then we need to at least be able to do syntax highlighting. That by itself is difficult, but people have done it before: I think my gut preference at this point is to basically ask authors of DXR plugins to expose something akin to vim's syntax highlighting instead of asking them to write a full lexer for their language.
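&lt;/p&gt;&lt;p&gt;
To make the vim comparison concrete: a vim-style syntax definition is essentially an ordered list of (highlight group, pattern) pairs, which a generic engine can apply without a hand-written lexer. Below is a minimal python sketch of such an engine; the rule table is a hypothetical, much-simplified one for python, and real vim syntax files support far more (regions, nesting, containment).
&lt;/p&gt;&lt;pre&gt;
```python
import re

# Hypothetical, simplified rules in the spirit of vim's "syntax match":
# (highlight group, regular expression). At any position, the earliest
# rule in the list that matches wins.
PYTHON_RULES = [
    ("comment", r"#[^\n]*"),
    ("string", r'"[^"\n]*"'),
    ("keyword", r"\b(?:def|class|return|if|else|for|in|import)\b"),
    ("number", r"\b\d+\b"),
]


def highlight(source, rules):
    """Split source into (group, text) spans; unmatched text has group None."""
    groups = [name for name, _ in rules]
    # One capturing group per rule, so m.lastindex identifies the rule that
    # matched. The rule patterns themselves must use only (?:...) groups.
    master = re.compile("|".join("(%s)" % pattern for _, pattern in rules))
    spans = []
    pos = 0
    for m in master.finditer(source):
        if m.start() != pos:
            spans.append((None, source[pos:m.start()]))
        spans.append((groups[m.lastindex - 1], m.group()))
        pos = m.end()
    if pos != len(source):
        spans.append((None, source[pos:]))
    return spans


print(highlight("x = 42  # answer", PYTHON_RULES))
# [(None, 'x = '), ('number', '42'), (None, '  '), ('comment', '# answer')]
```
&lt;/pre&gt;&lt;p&gt;
A plugin author would then only supply the rule table, and the HTML generator would map each group name to a CSS class.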
&lt;/p&gt;&lt;p&gt;
On the other end of the spectrum is generating the database. The idea is to use an instrumenting compiler, but while that works for C++ or Java, someone whose primary code is a website in Perl or a large Python utility has a hard time writing a compiler. Perhaps the best option here is just parsing the source code when we walk the tree. There is also the question about what to do with the build system: surely people might want help understanding what it is their Makefile.in is really doing.
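&lt;/p&gt;&lt;p&gt;
For a language like python, parsing the source code while walking the tree can lean on the language's own parser instead of an instrumenting compiler. A minimal sketch using python's standard ast module follows; the (kind, name, line) tuples are a hypothetical record format, not DXR's schema.
&lt;/p&gt;&lt;pre&gt;
```python
import ast


def collect_definitions(source, filename="example.py"):
    """Return (kind, name, line) tuples for functions and classes, by line."""
    tree = ast.parse(source, filename)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            found.append(("class", node.name, node.lineno))
    # ast.walk is breadth-first, so sort to get source order.
    return sorted(found, key=lambda item: item[2])


sample = "class Foo:\n    def bar(self):\n        pass\n\ndef baz():\n    return 1\n"
print(collect_definitions(sample))
# [('class', 'Foo', 1), ('function', 'bar', 2), ('function', 'baz', 5)]
```
&lt;/pre&gt;&lt;p&gt;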
&lt;/p&gt;&lt;p&gt;
So what does the database look like? For standard programming languages, we appear to have a wide-ranging and clear notion of types/classes, functions, and variables, with slightly vaguer notions of inheritance, macros (in both the lexical preprocessing sense and the type-based sense of C++'s templates), and visibility. Dynamic languages like JavaScript or Python might lack some reliable information (e.g., variables don't have types, although people often still act as if they have implicit type information), but they still uphold this general contract. If you consider instead things like CSS and HTML or Makefiles in the build system, this general scheme completely fails to hold, but you can still desire information in the database: for example, it would help to be able to pinpoint which CSS rules apply to a given HTML element.
&lt;/p&gt;&lt;p&gt;
This raises the question: how does one handle multiple languages in the database? As I ponder this, I realize that there are multiple domains of knowledge: what is available in one language is not necessarily available in another. Of the languages Mozilla uses, C, assembly, C++, and Objective-C[++] can all access any information written in the others; contrast this with JS code, which can only interact with native code through a subset of IDL or through native functions. IDL is a third space, a middle ground between native and JS code, but it is insufficiently compatible with either to be lumped in with one. Options:
&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;Dump each language into the same tables with no distinction. This has problems insofar as some languages can't be shoehorned into the same models, but I think that in such cases, one is probably looking for different enough information anyway that it doesn't matter. The advantage of this is that searching for an identifier will bring it up everywhere. The disadvantage... is that it gets brought up everywhere.&lt;/li&gt;
&lt;li&gt;Similar to #1, but make an extra column for language, and let people filter by language.&lt;/li&gt;
&lt;li&gt;Going a step further, take the extra language information and build up the notion of different bindings: this &lt;tt&gt;foo.bar&lt;/tt&gt; method on a python object may be implemented by this &lt;tt&gt;Python_foo_bar&lt;/tt&gt; C binding. In other words, add another table which lists this cross-pollination and takes it into account when searching or providing detailed information.&lt;/li&gt;
&lt;li&gt;Instead of the language column in tables, make different tables for every language.&lt;/li&gt;
&lt;li&gt;Instead of tables, use databases?&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;Hmm. I think the binding cross-reference is important. On closer thought, it's not really &lt;em&gt;languages&lt;/em&gt; themselves that are the issue here, it's essentially the &lt;em&gt;target bindings&lt;/em&gt;: if we have a system that is doing non-trivial build system work that involves cross-compiling, it matters if what we are doing is being done for the host or being done for the target. Apart from that, I think right now that the best approach is to have different tables.
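&lt;/p&gt;&lt;p&gt;
A minimal sketch of that conclusion, using python's sqlite3 module: one table per language domain plus a bindings table recording that a python-visible name is implemented by a native function (the foo.bar / Python_foo_bar example from the list of options). The table and column names, and the file names and line numbers in the sample rows, are hypothetical, and the schema ignores the host/target distinction.
&lt;/p&gt;&lt;pre&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table per language domain, plus a cross-reference
# table linking a python-visible name to the native function behind it.
conn.executescript("""
    CREATE TABLE native_functions (id INTEGER PRIMARY KEY, name TEXT, file TEXT, line INTEGER);
    CREATE TABLE python_functions (id INTEGER PRIMARY KEY, name TEXT, file TEXT, line INTEGER);
    CREATE TABLE bindings (
        python_id INTEGER REFERENCES python_functions(id),
        native_id INTEGER REFERENCES native_functions(id)
    );
""")
conn.execute("INSERT INTO native_functions VALUES (1, 'Python_foo_bar', 'foo.c', 10)")
conn.execute("INSERT INTO python_functions VALUES (1, 'foo.bar', 'foo.py', 3)")
conn.execute("INSERT INTO bindings VALUES (1, 1)")


def native_implementations(name):
    """Return (native name, file, line) for each binding of a python name."""
    return conn.execute("""
        SELECT n.name, n.file, n.line
        FROM bindings b
        JOIN python_functions p ON p.id = b.python_id
        JOIN native_functions n ON n.id = b.native_id
        WHERE p.name = ?
    """, (name,)).fetchall()


print(native_implementations("foo.bar"))  # [('Python_foo_bar', 'foo.c', 10)]
```
&lt;/pre&gt;&lt;p&gt;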
&lt;/p&gt;&lt;h5&gt;Extraneous information&lt;/h5&gt;&lt;p&gt;
The previous discussion bleeds into this final one, since they both ultimately concern themselves with one thing: the database. This time, the question is how to handle generation of information beyond the "standard" set of information. Information, as I see it, comes in a few forms. There is additional information at the granularity of identifiers (this function consumes 234 bytes of space or this is the documentation for the function), lines (this line was not executed in the test suite), files (this file gets compiled to this binary library), and arguably directories or other concepts not totally mappable to constructs in the source tree (e.g., output libraries).
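&lt;/p&gt;&lt;p&gt;
Those granularities suggest a simple shape for the extra data: a mapping from (granularity, key) to attributes. The python sketch below is purely illustrative; the ExtraInfo class, the granularity names, and the example keys are hypothetical, and the values just echo the examples given in the paragraph above.
&lt;/p&gt;&lt;pre&gt;
```python
from collections import defaultdict

# Granularities named in the text; directories or output libraries could be
# added the same way.
GRANULARITIES = ("identifier", "line", "file")


class ExtraInfo:
    """Hypothetical store of extra per-identifier, per-line, per-file data."""

    def __init__(self):
        # (granularity, key) -> {attribute name: value}
        self._data = defaultdict(dict)

    def add(self, granularity, key, attribute, value):
        if granularity not in GRANULARITIES:
            raise ValueError("unknown granularity: %s" % granularity)
        self._data[(granularity, key)][attribute] = value

    def lookup(self, granularity, key):
        # Return a copy so callers cannot mutate the store by accident.
        return dict(self._data.get((granularity, key), {}))


info = ExtraInfo()
info.add("identifier", "nsFoo::Bar", "size_bytes", 234)
info.add("line", ("nsFoo.cpp", 42), "executed_in_tests", False)
info.add("file", "nsFoo.cpp", "compiled_into", "libxul.so")
print(info.lookup("identifier", "nsFoo::Bar"))  # {'size_bytes': 234}
```
&lt;/pre&gt;&lt;p&gt;
Each extra data source would then only need to call add(); how such an API could be exposed from a C++ clang plugin is a separate question.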
&lt;/p&gt;&lt;p&gt;
The main question here is not the design of the database: that's only a matter of extra tables or extra columns (or both!). Instead, the real question is the design of the programmatic mechanisms. In dxr+dehydra, the simple answer is to load multiple scripts. For dxr+clang, however, the question becomes a lot more difficult since the code is written in C++ and doesn't dynamically load modules like dehydra does. It also raises the question of the exposed API. On the other hand, I'm not sure I know enough of the problem space to be able to actually come up with solutions. I think I'll leave this one for later.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8008597681449050220?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8008597681449050220/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8008597681449050220' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8008597681449050220'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8008597681449050220'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/dxr-thinking-aloud.html' title='DXR thinking aloud'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-3891041472247678741</id><published>2011-06-02T13:32:00.003-04:00</published><updated>2011-06-02T14:03:49.240-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Who's responsible for libxul</title><content type='html'>As you might have gathered from my last post, I spend a fair amount of time thinking up ways of visualizing software metrics. Today's visualization is inspired by a lunch conversation I had yesterday. Someone asked "how much space does SVG take up in libxul?" This isn't a terribly hard problem to solve: we have a tool that can tell us the size of symbols (i.e., codesighs), and I have a tool that gives me a database of where every symbol is defined in source code.
&lt;/p&gt;&lt;p&gt;
Actually, it's not so simple. codesighs, as I might have predicted, doesn't appear to work that well, so I fell back to &lt;tt&gt;nm&lt;/tt&gt;. And when I actually poked at the DXR database generated for mozilla-central using dehydra (I haven't had the guts to try my dxr-clang integration on m-c yet), I discovered that it appears to be missing just the information about functions: both static functions and C++ member functions. My suspicions of breakage are now confirmed. So I resorted to one of my "one-liner" scripts, as follows:&lt;/p&gt;&lt;pre&gt;
cat &amp;lt;&amp;lt;HEREDOC
&amp;lt;!DOCTYPE html&amp;gt;
&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;
&amp;lt;title&amp;gt;Code size by file&amp;lt;/title&amp;gt;
&amp;lt;script type="application/javascript" src="https://www.google.com/jsapi"&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;script type="application/javascript"&amp;gt;
google.load("visualization", "1", {packages: ["treemap"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
  var data = new google.visualization.DataTable();
  data.addColumn('string', 'File');
  data.addColumn('string', 'Path');
  data.addColumn('number', 'Code size (B)');
HEREDOC
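nm --format=bsd --size-sort /src/build/trunk/browser/dist/bin/libxul.so | cut -d' ' -f 3 &amp;gt; /tmp/libxul.symbols.txt
find . -name "*.o" | xargs nm --format=bsd --size-sort --print-file-name | sed -e 's/:/ /g' | python -c '
import sys
# Names of the symbols that actually ended up in libxul (newlines kept, to
# match the last field of the nm lines read below)
symbols = set(open("/tmp/libxul.symbols.txt").readlines())
files = {}
for line in sys.stdin:
  # Each input line looks like "path/file.o size type symbol"
  args = line.split(" ")
  # Only count text (t, T) and weak (v, V, w, W) symbols that made it into libxul
  if args[-2] in ("t", "T", "v", "V", "w", "W") and args[-1] in symbols:
    files[args[0]] = files.get(args[0], 0) + int(args[1], 16)
out_lines = set()
for f in files:
  f2 = f.replace("./", "")
  path = f2.split("/")
  # Emit a zero-sized row for every parent directory, then the file itself
  prefix = "src/"
  for p in path[:-1]:
    out_lines.add((prefix + p + "/", prefix, 0))
    prefix += p + "/"
  out_lines.add((f2, prefix, files[f]))
print("data.addRows([" + ",\n".join([repr(list(x)) for x in out_lines]) + "]);")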
'

cat &amp;lt;&amp;lt;HEREDOC
  data.addRow(["src/", null, 0]);
  var tree = new google.visualization.TreeMap(document.getElementById('tmap'));
  tree.draw(data, {maxDepth: 3, headerHeight: 0, noColor: "#ddd", maxColor: "#ddd",
    minColor: "#ddd", midColor: "#ddd"});
}
&amp;lt;/script&amp;gt;&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;
  &amp;lt;div id="tmap" style="width: 100%; height: 1000px;"&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
HEREDOC
&lt;/pre&gt;&lt;p&gt;
It's really 4 commands, which just happen to include two HEREDOC cats and a multi-line python script. The output is a &lt;del&gt;little&lt;/del&gt;&lt;ins&gt;not-so-little&lt;/ins&gt; HTML file that loads a treemap visualization using the google API. The end result &lt;a href="http://tjhsst.edu/~jcranmer/firefoxsize.html"&gt;can be seen here&lt;/a&gt;. To answer the question posed yesterday: SVG accounts for about 5-6% of libxul in content code, plus about another 1% in the layout component; altogether, that is about as much of libxul as our xpconnect code. For Thunderbird folks, I've also divvied up Thunderbird's libxul the same way, and that &lt;a href="http://tjhsst.edu/~jcranmer/tbsize.html"&gt;can be found here&lt;/a&gt;.
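&lt;/p&gt;&lt;p&gt;
Percentages like these come from summing the per-file sizes under a directory and dividing by the total. A python sketch of that arithmetic, assuming a dict mapping object-file path to byte count similar to the files dictionary the script builds; the paths and numbers in the example are made up purely to show the calculation and are not libxul measurements.
&lt;/p&gt;&lt;pre&gt;
```python
def share_by_prefix(sizes, prefixes):
    """Fraction of the total size contributed by files under any prefix."""
    total = sum(sizes.values())
    part = sum(size for path, size in sizes.items()
               if any(path.startswith(p) for p in prefixes))
    return part / total


# Toy numbers for illustration only.
toy_sizes = {
    "content/svg/a.o": 55,
    "layout/svg/b.o": 10,
    "js/xpconnect/c.o": 60,
    "other/d.o": 875,
}
print(share_by_prefix(toy_sizes, ["content/svg/", "layout/svg/"]))  # 0.065
```
&lt;/pre&gt;&lt;p&gt;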
&lt;/p&gt;&lt;p&gt;
P.S., if you're complaining about the lack of hard numbers, blame Google for giving me a crappy API that doesn't allow me to make custom tooltips.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-3891041472247678741?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/3891041472247678741/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=3891041472247678741' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3891041472247678741'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3891041472247678741'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/whos-responsible-for-libxul.html' title='Who&apos;s responsible for libxul'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8335430746651761147</id><published>2011-06-02T02:37:00.003-04:00</published><updated>2011-06-02T03:03:47.211-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='politics'/><title type='text'>Visualization toolkits suck</title><content type='html'>This is a rant in prelude to a blog post I expect to make tomorrow morning (or later today, depending on your current timezone). 
As someone who is mildly interested in data, I spend a fair amount of time thinking up things that would be interesting to just see dumped out visually. And I'm not particularly interested in small datasets: one of my smaller datasets I've been playing with is "every symbol in libxul.so"; one of my larger datasets is "every news message posted to Usenet this year, excluding binaries" (note: this is 13 GiB worth of data, and that's not 100% complete).
&lt;/p&gt;&lt;p&gt;
So while I have a fair amount of data, what I need is a simple way to visualize it. If what I have is a simple bar chart or scatter plot, it's possible to put it in Excel or LibreOffice... only to watch those programs choke on a mere few thousand data points (inputting a 120K file caused LibreOffice to hang for about 5 minutes to produce the scatter plot). But spreadsheet programs clearly lack the power for serious visualization; the most glaring chart they lack is the &lt;a href="http://en.wikipedia.org/wiki/Box-and-whiskers_plot"&gt;box-and-whiskers plot&lt;/a&gt;. Of course, if the data I'm looking at isn't simple 1D or 2D data, then they won't satisfy what I really need anyway.
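&lt;/p&gt;&lt;p&gt;
For reference, the numbers behind a box-and-whiskers plot are easy to compute even when the charting is not. Here is a python sketch of the five-number summary using Tukey's hinges; that is one of several quartile conventions, chosen here as an assumption, and spreadsheet or plotting tools may use a different one.
&lt;/p&gt;&lt;pre&gt;
```python
import statistics


def five_number_summary(values):
    """(minimum, lower hinge, median, upper hinge, maximum), Tukey style."""
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    n = len(data)
    half = n // 2
    # For odd n, the median belongs to both halves (Tukey's hinges).
    lower = data[:half + n % 2]
    upper = data[half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 3, 5, 7, 9)
```
&lt;/pre&gt;&lt;p&gt;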
&lt;/p&gt;&lt;p&gt;
I also need to visualize large tree data, typically in a tree map. Since the code for making squarified tree maps is more than I care to write for a simple vis project, I'd rather just use a simple toolkit. But which to use? Java-based toolkits (e.g., prefuse) require me to sit down and make a full-fledged application for what should be a quick data visualization, and I don't know any Flash to be able to use Flash toolkits (e.g., flare). For JavaScript, I've tried the &lt;a href="http://thejit.org/"&gt;JavaScript InfoVis Toolkit&lt;/a&gt;, &lt;a href="http://code.google.com/apis/chart/interactive/docs/index.html"&gt;Google's toolkit&lt;/a&gt;, and &lt;a href="http://vis.stanford.edu/protovis/"&gt;protovis&lt;/a&gt;, all without much luck. JIT and protovis both require too much baggage for a simple "what does this data look like", and Google's API is too inflexible to do anything more than "ooh, pretty treemap". That is why my previous foray into an application for viewing code coverage used a Java applet: it was the only thing I could get working.
&lt;/p&gt;&lt;p&gt;
What I want is a toolkit that gracefully supports large datasets, allows me to easily drop data in and play with views (preferably one that doesn't try to dynamically change the display based on partial options reconfiguration like most office suites attempt), supports a variety of datasets, and has a fine-grained level of customizability. Kind of like SpotFire or Tableau, just a tad bit more in my price range. Ideally, it would also be easy to put on the web, too, although supporting crappy old IE versions isn't a major feature I need. Is that really too much to ask for?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8335430746651761147?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8335430746651761147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8335430746651761147' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8335430746651761147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8335430746651761147'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/06/visualization-toolkits-suck.html' title='Visualization toolkits suck'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-940800933320250766</id><published>2011-06-01T18:34:00.002-04:00</published><updated>2011-06-01T19:28:28.179-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Building with dxr-clang</title><content type='html'>As I'm building yet another program from scratch (I've now run DXR on four separate trees in the past week, and I'm still looking for a good one to use for tests), let me take the time to explain how to set up DXR on your own computer. Rather than stuffing it all into this blog post, I've put most of the information on &lt;a href="https://wiki.mozilla.org/DXR"&gt;wiki.mozilla.org&lt;/a&gt;. One thing not on that page: the clang dumper currently writes its SQL files to the source directory instead of the object directory, so the DXR config needs to set the source and object directories to the same path, even if you don't actually build the source in-place!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-940800933320250766?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/940800933320250766/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=940800933320250766' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/940800933320250766'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/940800933320250766'/><link rel='alternate' type='text/html' 
href='http://quetzalcoatal.blogspot.com/2011/06/building-with-dxr-clang.html' title='Building with dxr-clang'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-5341679255825717557</id><published>2011-05-27T20:22:00.003-04:00</published><updated>2011-05-27T20:55:34.153-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>clang-DXR</title><content type='html'>As mentioned previously, the first thing I started on was getting DXR to use libclang to build the index. Right now, I have a tool that kind of works, in that it can successfully build a program but utterly fails to produce reliable SQL output due to constantly swallowed segfaults (I think). My test program builds with cmake, which "helpfully" hides the noise of what is actually going on, so debugging will have to wait until next week.
&lt;/p&gt;&lt;p&gt;
The main unexpected hurdle is that, for all of the wonderful praise lavished on clang, I have found it surprisingly difficult to use. I had expected to be able to do a more or less straightforward port of &lt;a href="http://hg.mozilla.org/webtools/dxr/file/0bc8e23c54c8/xref-scripts/dxr.js"&gt;dxr.js&lt;/a&gt;, with some cosmetic changes to handle differences in naming, etc., between dehydra and libclang. That expectation is now hopelessly gone; libclang provides nothing near the clean API that dehydra does. For example, to get the fully-qualified name of a class, instead of being able to say &lt;tt&gt;type.name&lt;/tt&gt;, I have to walk up the entire AST hierarchy of the class to find all of its containing classes and namespaces and assemble the name myself. If I want one for a function (which, in C++, requires differentiating between overloads, so I need the parameter types), I have to build my own type-to-string dumper. Indeed, no less than &lt;a href="https://github.com/jcranmer/dxr/blob/406c25ed5f29fd0f7de70d2de9a016ef6ac4671d/xref-tools/cxx-clang/dxr-index.cpp"&gt;half of my code&lt;/a&gt; so far is merely trying to build the type names I need to dump out. In addition, the clang developers seem dismissive of the idea of including a higher-level API than the one currently present.
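&lt;/p&gt;&lt;p&gt;
To illustrate what walking up the hierarchy involves, here is a minimal JavaScript sketch. It does not call libclang: the plain objects with &lt;tt&gt;kind&lt;/tt&gt;, &lt;tt&gt;name&lt;/tt&gt;, &lt;tt&gt;parent&lt;/tt&gt;, and &lt;tt&gt;params&lt;/tt&gt; fields are hypothetical stand-ins for AST cursors (libclang exposes the parent link through calls such as &lt;tt&gt;clang_getCursorSemanticParent&lt;/tt&gt;), and this is not the actual DXR indexer code. It joins the enclosing namespace and class names with &lt;tt&gt;::&lt;/tt&gt; and, for functions, appends the parameter types so that overloads stay distinct.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical stand-in for an AST cursor; real libclang cursors expose
// the parent relationship through function calls, not these fields.
function makeNode(kind, name, parent, params) {
  return { kind: kind, name: name, parent: parent || null, params: params || null };
}

// Walk up from a declaration to the translation unit, collecting the name
// of every enclosing namespace or class, then join them with "::".
function qualifiedName(node) {
  var parts = [];
  for (var cur = node; cur !== null; cur = cur.parent) {
    if (cur.kind === "translation-unit")
      break;
    parts.unshift(cur.name);
  }
  return parts.join("::");
}

// Functions also need their parameter types, or overloads would collide.
function functionSignature(fn) {
  var types = fn.params.map(function (p) { return p.type; });
  return qualifiedName(fn) + "(" + types.join(", ") + ")";
}

var tu = makeNode("translation-unit", "", null);
var ns = makeNode("namespace", "mozilla", tu);
var cls = makeNode("class", "Foo", ns);
var f1 = makeNode("function", "bar", cls, [{ type: "int" }]);
var f2 = makeNode("function", "bar", cls, [{ type: "const char *" }, { type: "bool" }]);

console.log(functionSignature(f1)); // mozilla::Foo::bar(int)
console.log(functionSignature(f2)); // mozilla::Foo::bar(const char *, bool)
&lt;/pre&gt;&lt;p&gt;
Real C++ adds wrinkles this sketch ignores, such as templates, anonymous namespaces, and typedef'd parameter types, each of which needs its own handling in a type-to-string dumper.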
&lt;/p&gt;&lt;p&gt;
In case you didn't notice from the links, by the way, DXR is now on &lt;a href="https://github.com/mozilla/dxr"&gt;github&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-5341679255825717557?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/5341679255825717557/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=5341679255825717557' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5341679255825717557'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5341679255825717557'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/05/clang-dxr.html' title='clang-DXR'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4673568854594508028</id><published>2011-05-23T17:14:00.003-04:00</published><updated>2011-05-23T17:47:18.138-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dxr'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Summer DXR work</title><content type='html'>So, for this summer, I will mostly be putting Thunderbird work aside and focusing my attention on reviving DXR. 
Sometime this week, I expect to update the installation on &lt;a href="http://dxr.mozilla.org"&gt;dxr.mozilla.org&lt;/a&gt; (simply so that it better matches what I'm running locally).
&lt;/p&gt;&lt;p&gt;
What will I be doing on DXR? First and foremost, I will be making it easier to run DXR on anything that is not Mozilla. Or anything that is Mozilla, for that matter. This will be done by cleanly separating the build/instrumentation steps from the rest of DXR. In other words, setting CC/CXX/LD/etc. should be all you need to do to build your program.
&lt;/p&gt;&lt;p&gt;
After that, I will be working on using Clang to get information instead of using gcc+dehydra. If I understand the current Apple development process, supporting clang is essential to supporting Mac OS X builds in the future. I will also work on getting IDL and JS support added.
&lt;/p&gt;&lt;p&gt;
One other thing I want to do, time permitting, is to get DXR to represent the results of different builds, so that you can get results other than just the current Linux build. I have a list of other ideas to play around with, but I'll leave those for later, since I don't know whether I'll have time to start them.
&lt;/p&gt;&lt;p&gt;
Don't expect dxr.mozilla.org to get all of the new features immediately. I'll update it when I feel like it (i.e., when I see enough stability), and it's easier for me to work locally on my laptop. I also find it easier to do rapid updates on a smaller project than on Mozilla, which takes quite some time to rebuild from a clobber; I don't know which project(s) I'll be using for my work yet, so don't bother asking.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4673568854594508028?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4673568854594508028/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4673568854594508028' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4673568854594508028'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4673568854594508028'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/05/summer-dxr-work.html' title='Summer DXR work'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4767034242178551170</id><published>2011-01-21T17:06:00.003-05:00</published><updated>2011-01-21T17:39:57.822-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><title 
type='text'>Usage share of newsreaders, update</title><content type='html'>A few months ago, I logged &lt;a href="http://quetzalcoatal.blogspot.com/2010/09/usage-share-of-newsreaders.html"&gt;the usage share of various newsreaders&lt;/a&gt; for roughly the month of August. Since then, I have run updated tests, alternating monthly between counting by users and by messages, which gives me statistics for September through November. Later months are not available because the news server I used has since gone away (without notifying users!), and I did not want to point this script at the new one, since the results would no longer be comparable.
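&lt;/p&gt;&lt;p&gt;
The two counting methods can give quite different answers, because a few prolific posters can dominate a per-message tally. A minimal JavaScript sketch of the difference follows; the record format (hypothetical &lt;tt&gt;from&lt;/tt&gt; and &lt;tt&gt;client&lt;/tt&gt; fields, with the client name assumed to come from headers such as User-Agent or X-Newsreader) is an assumption for illustration, not the script used to gather these statistics.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical log: one record per posted message, with the poster's
// address and the client name taken from the message headers.
var log = [
  { from: "alice@example.com", client: "Thunderbird" },
  { from: "alice@example.com", client: "Thunderbird" },
  { from: "alice@example.com", client: "Thunderbird" },
  { from: "bob@example.net", client: "Google Groups" },
  { from: "carol@example.org", client: "Forte Agent" }
];

// Convert a map of counts into percentages of the total.
function toShares(counts) {
  var total = 0;
  Object.keys(counts).forEach(function (k) { total += counts[k]; });
  var shares = {};
  Object.keys(counts).forEach(function (k) {
    shares[k] = 100 * counts[k] / total;
  });
  return shares;
}

// By messages: every post counts once.
function sharesByMessages(records) {
  var counts = {};
  records.forEach(function (r) {
    counts[r.client] = (counts[r.client] || 0) + 1;
  });
  return toShares(counts);
}

// By users: each (poster, client) pair counts once, however much they post.
function sharesByUsers(records) {
  var seen = {};
  var counts = {};
  records.forEach(function (r) {
    var key = r.from + "\n" + r.client;
    if (seen[key])
      return;
    seen[key] = true;
    counts[r.client] = (counts[r.client] || 0) + 1;
  });
  return toShares(counts);
}

console.log(sharesByMessages(log)); // Thunderbird 60, the other two 20 each
console.log(sharesByUsers(log));    // each client about 33.3
&lt;/pre&gt;&lt;p&gt;
Counting a (poster, client) pair once is only one possible definition of "by users"; under it, a poster who uses two clients is counted under both.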
&lt;/p&gt;&lt;p&gt;
One of the uses I had for the original statistics was to argue why NNTP support in Thunderbird still matters. During an IRC discussion, someone pointed out that August is a poor month for logging, since many people traditionally take their vacations then. The data for October, the last month for which I have this data, shows that approximately 720,000 messages were posted that month, confirming that August is indeed a poor month for gauging volume.
&lt;/p&gt;&lt;p&gt;
Have the statistics changed much? Google Groups and Thunderbird are both within 0.2% (absolute) of the shares I calculated last time (44.02% and 12.3%, respectively). Further down the list, things change: Outlook Express had 8.98%, followed by Forte Agent at 8.86%. Live Mail had 2.83% and MT-NewsWatcher had 2.51%. The tail is also longer than before, at 20.52%.
&lt;/p&gt;&lt;p&gt;
As my new server has a longer retention time, I no longer want to use the same script as before. My next goal is to log every header of every message posted this year, so that I can collect more information without having to decide up front exactly what I need, particularly information useful for determining the use of mail-to-news gateways and for identifying spam. I have lots of ideas for possible analyses, but first I want usable data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4767034242178551170?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4767034242178551170/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4767034242178551170' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4767034242178551170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4767034242178551170'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/01/usage-share-of-newsreaders-update.html' title='Usage share of newsreaders, update'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-2653529156263722413</id><published>2011-01-17T14:11:00.001-05:00</published><updated>2011-01-17T14:11:59.272-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>News URI handling</title><content type='html'>I have built a version of Thunderbird that should fix news URI handling issues; you can get it from &lt;a href="http://ftp.mozilla.org/pub/mozilla.org/thunderbird/tryserver-builds/Pidgeot18@gmail.com-00bba56d3767/"&gt;this site&lt;/a&gt;. It is built from a patch queue based on trunk just after the 3.3a2 release branch, so it should have all of the features of 3.3a2, if not the version number.
&lt;/p&gt;&lt;p&gt;
In particular, both news: and nntp: links of any type should work, including news URIs without a server. If any of those links do not work, please tell me, including the circumstances in which they failed (e.g., where you clicked the link, whether TB was open, whether you were subscribed to the group, etc.). Also, there is a chance that this could regress other handling in the news code or OS command-line handling, so if you see such regressions, please tell me as well.
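&lt;/p&gt;&lt;p&gt;
For reference, RFC 5538 allows several shapes of link: &lt;tt&gt;news:&lt;/tt&gt; URIs naming a group, all groups (&lt;tt&gt;*&lt;/tt&gt;), or a message-id, with or without a server, and &lt;tt&gt;nntp:&lt;/tt&gt; URIs, which always name a server and a group and may add an article number. The JavaScript sketch below classifies them. It is a simplified illustration, not Thunderbird's actual parsing code, and it skips percent-decoding and most validation.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Rough classifier for the news: and nntp: URI shapes in RFC 5538.
// Sketch only: no percent-decoding, no wildmats beyond "*", little validation.
function classifyNewsUri(uri) {
  var colon = uri.indexOf(":");
  var scheme = uri.slice(0, colon).toLowerCase();
  var rest = uri.slice(colon + 1);
  var server = null;

  // "//host[:port]" is optional for news: and required for nntp:.
  if (rest.indexOf("//") === 0) {
    var slash = rest.indexOf("/", 2);
    if (slash === -1) {
      server = rest.slice(2);
      rest = "";
    } else {
      server = rest.slice(2, slash);
      rest = rest.slice(slash + 1);
    }
  }

  if (scheme === "nntp") {
    if (server === null)
      throw new Error("nntp: URIs need a server: " + uri);
    // nntp://server/group[/article-number]
    var parts = rest.split("/");
    if (parts.length === 2)
      return { kind: "article", server: server, group: parts[0], number: Number(parts[1]) };
    return { kind: "group", server: server, group: parts[0] };
  }
  if (scheme !== "news")
    throw new Error("not a news URI: " + uri);

  // news: names a message-id (which always contains "@"), a group, or "*".
  if (rest.indexOf("@") !== -1)
    return { kind: "message-id", server: server, messageId: rest };
  if (rest === "*")
    return { kind: "all-groups", server: server };
  return { kind: "group", server: server, group: rest };
}

console.log(classifyNewsUri("news:mozilla.dev.apps.thunderbird").server); // null
console.log(classifyNewsUri("nntp://news.example.com/comp.lang.c/42").number); // 42
&lt;/pre&gt;&lt;p&gt;
A &lt;tt&gt;news:&lt;/tt&gt; link whose &lt;tt&gt;server&lt;/tt&gt; comes back &lt;tt&gt;null&lt;/tt&gt; is the server-less case mentioned above, which a client has to resolve against a locally configured default server.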
&lt;/p&gt;&lt;p&gt;
Thanks in advance for your testing!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-2653529156263722413?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/2653529156263722413/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=2653529156263722413' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/2653529156263722413'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/2653529156263722413'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/01/news-uri-handling.html' title='News URI handling'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8976386748205389478</id><published>2011-01-15T20:17:00.004-05:00</published><updated>2011-01-15T22:20:49.788-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>The Great Codec War</title><content type='html'>At the &lt;a href="http://blog.chromium.org/2011/01/html-video-codec-support-in-chrome.html"&gt;Second Battle of Chrome&lt;/a&gt;, WebM seems to have scored a surprise victory against H.264, when Google announced that it was dropping support for H.264 in &amp;lt;video&amp;gt;. Well, maybe it was only a pyrrhic victory. Reactions vary widely, but I think many of them miss the mark.
&lt;/p&gt;&lt;p&gt;
I've seen some people (I'm looking at you, Ars) claim that this will help kill HTML 5 video. This claim seems to me to be bogus: HTML 5 video effectively died years ago when no one could agree on a codec. Of the several video sites I use, the only one to support HTML 5 is YouTube; everyone else uses Flash. Since it's already dead, you can hardly kill it by switching codecs. This just shifts the balance more in favor of WebM. And as for H.264 working nearly everywhere, AppleInsider, &lt;a href="http://www.appleinsider.com/articles/11/01/15/google_reaffirms_intent_to_derail_html5_h_264_video_with_webm_browser_plugins.html"&gt;your chart&lt;/a&gt; is just &lt;a href="http://tvtropes.org/pmwiki/pmwiki.php/Main/BlatantLies"&gt;Blatant Lies&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
Another thing that people do is compare video codecs to image codecs, most particularly GIF. But H.264 is not GIF: GIF became wildly popular before Unisys seemed to realize that GIF's use of LZW infringed its patent. It is also unclear, looking back a decade after the fact, whether Unisys targeted only encoders or decoders as well (&lt;a href="http://lzw.info"&gt;lzw.info&lt;/a&gt; implies both, but Mozilla had &lt;a href="http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp&amp;rev=HEAD&amp;mark=1.108"&gt;GIF code long before the patent expired&lt;/a&gt;, and I don't see any information about LZW licensing). H.264's licensing, however, clearly covers the decoder. Furthermore, it was relatively easy to make a high-quality image codec that doesn't step on patents (which we now call PNG); the video codec field is so much more strangled by patents that doing the same is impossible.
&lt;/p&gt;&lt;p&gt;
While on the topic, I've also seen a few statements pointing out that H.264 is an ISO standard and WebM is not. Since people love to compare H.264 and GIF, I will point out that GIF is neither an ISO standard nor an RFC, ITU, W3C, or IEEE document (although the W3C hosts a copy of the specification on its website, it does not appear to be reachable through any internal links). The commentary about "open standards" typically means "I can implement it by reading this/these specification(s) without paying a fee to anybody", not "there exists an officially-approved, freely-available standard" (incidentally, ISO standards generally are NOT freely available).
&lt;/p&gt;&lt;p&gt;
But what about Flash, most people say, both regarding its support for H.264 (although it will support WebM as well) and regarding the fallacy of "open" support. The answer to that is two simple words: "Legacy content." Flash works for everybody but a few prissy control freaks, and so much stuff (more than just video, in fact) uses it that not supporting it is impractical. Remember, half the web's users do not have HTML 5-capable browsers.
&lt;/p&gt;&lt;p&gt;
All of that said, where things go in the future is very much an open question. I see several possible directions:
&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;The U.S. declares software patents invalid. &lt;a href="http://weblogs.mozillazine.org/roc/archives/2011/01/playing_the_gam.html"&gt;Mr. O'Callahan can tell you one scenario&lt;/a&gt; that could cause this. It's actually not implausible: the Supreme Court in Bilski seemed mildly skeptical of expansive patentability claims, and a relatively clean software patent case would probably allow it to make a coherent "narrow" ruling on software patents in general. And, though the U.S. is not the world, an anti-software-patent U.S. ruling would probably lead to the nullification of software patents worldwide.&lt;/li&gt;
&lt;li&gt;MPEG-LA changes its mind and allows royalty-free decoding (not encoding) of H.264. This solution is fairly implausible, unless MPEG-LA desperately decides to try this gambit to keep H.264 from becoming obsolete. The circumstances that would lead it to do this would probably involve a steep decline in H.264's popularity, so the actual value of this outcome would be minor.&lt;/li&gt;
&lt;li&gt;Apple caves in and allows either Flash or WebM on iOS. With alternative mobile browsers allowing these options, only about 17% of the mobile market has no support for video other than H.264. Depending on the success of other OSs, this may force Apple to support one of the two alternatives so that video works on iOS. I don't know how plausible this is, but given that Android is both newer and more popular than iOS, a long-term decline in Apple's fortunes is not unreasonable.&lt;/li&gt;
&lt;li&gt;The world continues as it does today, with no single solution supporting everybody. Not ideal, but it is the path of least resistance. Unfortunately, it's also probably the most likely.&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8976386748205389478?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8976386748205389478/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8976386748205389478' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8976386748205389478'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8976386748205389478'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/01/great-codec-war.html' title='The Great Codec War'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-314706627166576167</id><published>2011-01-10T18:44:00.003-05:00</published><updated>2011-01-10T19:24:52.023-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='accttype'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Developing new account types, Part 4: Displaying messages</title><content type='html'>This series of blog posts discusses the 
creation of a new account type implemented in JavaScript. Over the course of these posts, I use the development of my &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/webfora"&gt;Web Forums extension&lt;/a&gt; to explain the steps necessary to create new account types.
&lt;/p&gt;&lt;p&gt;
In the previous blog post, I showed how to implement the folder update. Our next step is to display the messages themselves. From this post onward, I will refer less frequently to my JSExtended-based framework (Kent James's SkinkGlue is a less powerful variant; a final version will likely be a hybrid of the two technologies); it will be slowly phased out over the rest of this series.
&lt;/p&gt;&lt;h4&gt;URLs and pseudo-URLs&lt;/h4&gt;&lt;p&gt;
As mentioned earlier, messages have several representations. Earlier, we used the message key and the message header as our representations; now, we will be using two more forms: message URIs and necko URLs &lt;a href="#note-4.1"&gt;[1]&lt;/a&gt;. The message URI is more or less a serialization of the (folder, key) pair that uniquely identifies a message. It has none of the other properties of a "regular" URL (hence the heading); most importantly, it is not (necessarily) something that necko can load. To convert it to a necko URL, you need to use the message service.
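&lt;/p&gt;&lt;p&gt;
To make "a serialization of the folder and key" concrete: Thunderbird's built-in message URIs take the folder URI, append &lt;tt&gt;-message&lt;/tt&gt; to its scheme, and add the key as a fragment, so a message with key 42 in &lt;tt&gt;imap://user@host/INBOX&lt;/tt&gt; becomes &lt;tt&gt;imap-message://user@host/INBOX#42&lt;/tt&gt;. The JavaScript sketch below builds and parses URIs of that shape. The &lt;tt&gt;webfora&lt;/tt&gt; scheme in it is hypothetical, and real code should get message URIs from the folder's &lt;tt&gt;getUriForMsg&lt;/tt&gt; rather than from string manipulation like this.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Build a message URI from a folder URI and a message key by appending
// "-message" to the scheme and the key as a fragment.
function messageUriFor(folderUri, key) {
  var colon = folderUri.indexOf(":");
  return folderUri.slice(0, colon) + "-message" + folderUri.slice(colon) + "#" + key;
}

// Split a message URI back into its folder URI and numeric key.
function parseMessageUri(msgUri) {
  var colon = msgUri.indexOf(":");
  var scheme = msgUri.slice(0, colon);
  var hash = msgUri.lastIndexOf("#");
  if (scheme.slice(-8) !== "-message" || hash === -1)
    throw new Error("not a message URI: " + msgUri);
  return {
    folderUri: scheme.slice(0, -8) + msgUri.slice(colon, hash),
    key: parseInt(msgUri.slice(hash + 1), 10)
  };
}

// "webfora" is a hypothetical scheme used only for illustration.
var uri = messageUriFor("webfora://forums.example.com/general", 17);
console.log(uri); // webfora-message://forums.example.com/general#17
console.log(parseMessageUri(uri).key); // 17
&lt;/pre&gt;&lt;p&gt;
Nothing in this round trip makes the URI loadable; turning it into something necko can open remains the message service's job.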
&lt;/p&gt;&lt;p&gt;
Because message URIs require an extra step to convert to necko URLs, most of the message service API uses the message URI instead of the URL (whenever you see a raw string or a variable named messageURI or, most of the time, URI, it refers to this pseudo-URL). Displaying messages involves a call to the aptly-named &lt;tt&gt;DisplayMessage&lt;/tt&gt;. Unfortunately, it's also not quite so aptly-named, in that it can also effectively mean "fetch the contents of this message to a stream," but I will discuss this later.
&lt;/p&gt;&lt;p&gt;
This is where the bad news starts. First off, mailnews is a bit lazy when it comes to out parameters. Technically, XPCOM requires that you pass in pointers for all outparams to receive the values; a lot of the calls to &lt;tt&gt;DisplayMessage&lt;/tt&gt; don't pass one because they ignore the result anyway. Second, one of the key calls needed in &lt;tt&gt;DisplayMessage&lt;/tt&gt; turns out to be a &lt;tt&gt;[noscript]&lt;/tt&gt; method on an internal Gecko object. What this means is that you can't actually implement the message service in JavaScript.
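&lt;/p&gt;&lt;p&gt;
For readers less familiar with XPCOM from JavaScript: XPConnect represents an out parameter as an object whose &lt;tt&gt;value&lt;/tt&gt; property the callee fills in, so a caller writes &lt;tt&gt;var out = {}; obj.someMethod(arg, out);&lt;/tt&gt; and then reads &lt;tt&gt;out.value&lt;/tt&gt;. The plain JavaScript simulation below shows why omitted outparams matter for an implementation: it only writes the result when a holder was actually passed. The &lt;tt&gt;displayMessage&lt;/tt&gt; function, its arguments, and the &lt;tt&gt;webfora-url:&lt;/tt&gt; prefix are hypothetical simplifications, not the real &lt;tt&gt;nsIMsgMessageService&lt;/tt&gt; signature.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Simulation of the out-parameter convention: the caller passes an object
// and the callee stores the result in its "value" property.
// displayMessage is a hypothetical, simplified stand-in.
function displayMessage(messageUri, consumer, aURL) {
  var url = "webfora-url:" + messageUri; // placeholder for a real necko URL
  consumer.push(url);
  // Lazy callers pass nothing for the outparam because they ignore it,
  // so only fill it in when a holder object was actually supplied.
  if (aURL)
    aURL.value = url;
}

var shown = [];

// A well-behaved caller supplies a holder and reads the result back.
var holder = {};
displayMessage("webfora-message://forums.example.com/general#17", shown, holder);
console.log(holder.value);

// A lazy caller omits the outparam entirely; this must not throw.
displayMessage("webfora-message://forums.example.com/general#18", shown);
console.log(shown.length); // 2
&lt;/pre&gt;&lt;p&gt;
This defensive check only addresses the lazy-caller problem; it does nothing about the &lt;tt&gt;[noscript]&lt;/tt&gt; method, which is what actually rules out a JavaScript message service.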
&lt;/p&gt;&lt;p&gt;
There is good news, however. Many of the methods in &lt;tt&gt;nsIMsgMessageService&lt;/tt&gt; are actually variants of "fetch the contents of this message"; indeed, the standard implementations typically funnel these methods into a single &lt;tt&gt;FetchMessage&lt;/tt&gt;. My solution is to reduce all of this to a single method that you have to implement, with your choice of two ways to deliver the contents. Owing to quirks of the implementation's design, I've done it both ways and can show you both.
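&lt;/p&gt;&lt;p&gt;
The funneling idea can be sketched in plain JavaScript: several entry points that all boil down to "fetch the contents of this message" delegate to one method that the account type supplies. The method names, the consumer objects, and the toy store below are hypothetical simplifications, not the actual &lt;tt&gt;nsIMsgMessageService&lt;/tt&gt; interface or the framework described in this series.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical base service: every "fetch this message" variant funnels
// into a single fetchMessage() that a concrete account type supplies.
function BaseMessageService() {}
BaseMessageService.prototype = {
  displayMessage: function (uri, display) {
    this.fetchMessage(uri, function (text) { display.show(text); });
  },
  copyMessage: function (uri, destination) {
    this.fetchMessage(uri, function (text) { destination.push(text); });
  },
  streamMessage: function (uri, listener) {
    this.fetchMessage(uri, function (text) {
      listener.onData(text);
      listener.onStop();
    });
  },
  fetchMessage: function (uri, deliver) {
    throw new Error("account types must implement fetchMessage");
  }
};

// A toy account type: only fetchMessage is written by hand.
function ToyService(store) { this._store = store; }
ToyService.prototype = Object.create(BaseMessageService.prototype);
ToyService.prototype.fetchMessage = function (uri, deliver) {
  deliver(this._store[uri]);
};

var service = new ToyService({ "toy-message://x/y#1": "Hello, world" });
var copies = [];
service.copyMessage("toy-message://x/y#1", copies);
service.displayMessage("toy-message://x/y#1", { show: function (t) { console.log(t); } });
// copies now holds the single string "Hello, world"
&lt;/pre&gt;&lt;p&gt;
This is the usual template-method arrangement: the shared plumbing lives in the base object, so each new account type writes only the fetch step.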
&lt;/p&gt;&lt;h4&gt;Body channel&lt;/h4&gt;&lt;p&gt;
The first way is to stream the body through a channel. This is probably not the preferred method. Declaring that you are doing this is simple:&lt;/p&gt;
&lt;pre class="lang-js"&gt;
wfService&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  getMessageContents&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;aMsgHdr&lt;span class="special"&gt;,&lt;/span&gt; aMsgWindow&lt;span class="special"&gt;,&lt;/span&gt; aCallback&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; task &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;new&lt;/span&gt; LoadMessageTask&lt;span class="special"&gt;(&lt;/span&gt;aMsgHdr&lt;span class="special"&gt;);&lt;/span&gt;
    aCallback&lt;span class="special"&gt;.&lt;/span&gt;deliverMessageBodyAsChannel&lt;span class="special"&gt;(&lt;/span&gt;task&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="string"&gt;"text/html"&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;
Seriously, that's all the code needed to say that you have a channel. The channel itself implements &lt;a href="https://developer.mozilla.org/en/nsIChannel"&gt;&lt;tt&gt;nsIChannel&lt;/tt&gt;&lt;/a&gt;, but we use only a few of its methods: &lt;tt&gt;asyncOpen&lt;/tt&gt; (we never open synchronously), &lt;tt&gt;isPending&lt;/tt&gt;, &lt;tt&gt;cancel&lt;/tt&gt;, &lt;tt&gt;suspend&lt;/tt&gt;, and &lt;tt&gt;resume&lt;/tt&gt;. The channel's primary purpose is just to funnel the input stream of the body (not the message headers; those are written based on the message header object). The channel implementation is moderately simple:&lt;/p&gt;
&lt;pre class="lang-js"&gt;
&lt;span class="keyword"&gt;function&lt;/span&gt; LoadMessageTask&lt;span class="special"&gt;(&lt;/span&gt;hdr&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_hdr &lt;span class="special"&gt;=&lt;/span&gt; hdr&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_uri &lt;span class="special"&gt;=&lt;/span&gt; hdr&lt;span class="special"&gt;.&lt;/span&gt;folder&lt;span class="special"&gt;.&lt;/span&gt;getUriForMsg&lt;span class="special"&gt;(&lt;/span&gt;hdr&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_server &lt;span class="special"&gt;=&lt;/span&gt; hdr&lt;span class="special"&gt;.&lt;/span&gt;folder&lt;span class="special"&gt;.&lt;/span&gt;server&lt;span class="special"&gt;;&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;
LoadMessageTask&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  runTask&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;protocol&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_listener&lt;span class="special"&gt;.&lt;/span&gt;onStartRequest&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_channelCtxt&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_pipe &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/pipe;1"&lt;/span&gt;&lt;span class="special"&gt;].&lt;/span&gt;createInstance&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIPipe&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_pipe&lt;span class="special"&gt;.&lt;/span&gt;init&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;false&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;false&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;4096&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;0&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;null&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="comment"&gt;/* load url */&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  onUrlLoaded&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;document&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; body &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="comment"&gt;/* body */&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_pipe&lt;span class="special"&gt;.&lt;/span&gt;outputStream&lt;span class="special"&gt;.&lt;/span&gt;write&lt;span class="special"&gt;(&lt;/span&gt;body&lt;span class="special"&gt;,&lt;/span&gt; body&lt;span class="special"&gt;.&lt;/span&gt;length&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_listener&lt;span class="special"&gt;.&lt;/span&gt;onDataAvailable&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_channelCtxt&lt;span class="special"&gt;,&lt;/span&gt;
      &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_pipe&lt;span class="special"&gt;.&lt;/span&gt;inputStream&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;0&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_pipe&lt;span class="special"&gt;.&lt;/span&gt;inputStream&lt;span class="special"&gt;.&lt;/span&gt;available&lt;span class="special"&gt;());&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  onTaskCompleted&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;protocol&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_listener&lt;span class="special"&gt;.&lt;/span&gt;onStopRequest&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_channelCtxt&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="type"&gt;Cr&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;NS_OK&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  QueryInterface&lt;span class="special"&gt;:&lt;/span&gt; XPCOMUtils&lt;span class="special"&gt;.&lt;/span&gt;generateQI&lt;span class="special"&gt;([&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIChannel&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIRequest&lt;span class="special"&gt;]),&lt;/span&gt;
  asyncOpen&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;listener&lt;span class="special"&gt;,&lt;/span&gt; context&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_listener&lt;span class="special"&gt;)&lt;/span&gt;
      &lt;span class="keyword"&gt;throw&lt;/span&gt; &lt;span class="type"&gt;Cr&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;NS_ERROR_ALREADY_OPENED&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_listener &lt;span class="special"&gt;=&lt;/span&gt; listener&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_channelCtxt &lt;span class="special"&gt;=&lt;/span&gt; context&lt;span class="special"&gt;;&lt;/span&gt;

    &lt;span class="comment"&gt;// Fire off the task!&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_server&lt;span class="special"&gt;.&lt;/span&gt;wrappedJSObject&lt;span class="special"&gt;.&lt;/span&gt;runTask&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;
There are some things to note. First, this code can call &lt;tt&gt;onStartRequest&lt;/tt&gt; synchronously from &lt;tt&gt;runTask&lt;/tt&gt;, which is a necko no-no. However, our magic glue channel handles this gracefully (by posting the call to &lt;tt&gt;asyncOpen&lt;/tt&gt; in another event). The input stream is fed through a pipe here, and this is a quick-and-easy implementation that ignores potential internationalization issues. I also haven't bothered to implement the remaining methods that the channel interfaces require, mostly because this code is primarily an artifact of an earlier approach whose only remaining purpose is to demonstrate channel-based loading.
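&lt;/p&gt;&lt;p&gt;
To make the internationalization caveat concrete: &lt;tt&gt;body.length&lt;/tt&gt; counts UTF-16 code units, while the count that &lt;tt&gt;write&lt;/tt&gt; expects is a byte count, so any non-ASCII character throws the two out of step. One possible fix, sketched here as an assumption rather than as what this code does, is to encode the body as UTF-8 first and pass the result as a string of single-byte characters. The helper name &lt;tt&gt;toByteString&lt;/tt&gt; is hypothetical, and &lt;tt&gt;TextEncoder&lt;/tt&gt; is a modern web API that the 2011-era code above does not use.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical helper: turn a JS string into a "byte string" whose length
// equals its UTF-8 size. Assumes TextEncoder is available.
function toByteString(body) {
  // Uint8Array holding the UTF-8 encoding of body.
  let bytes = new TextEncoder().encode(body);
  // Emit one character (code 0-255) per byte.
  return Array.from(bytes, function (b) {
    return String.fromCharCode(b);
  }).join("");
}
&lt;/pre&gt;&lt;p&gt;
With such a helper, the write in &lt;tt&gt;onUrlLoaded&lt;/tt&gt; would become &lt;tt&gt;let data = toByteString(body);&lt;/tt&gt; followed by &lt;tt&gt;write(data, data.length)&lt;/tt&gt;. Making sure the consumer then decodes the data as UTF-8 is a separate concern that this sketch does not address.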
&lt;/p&gt;&lt;h4&gt;Body input streams&lt;/h4&gt;&lt;p&gt;
The second way to implement this is simply to hand over the message body as an input stream:&lt;/p&gt;
&lt;pre class="lang-js"&gt;
getMessageContents&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;aMsgHdr&lt;span class="special"&gt;,&lt;/span&gt; aMsgWindow&lt;span class="special"&gt;,&lt;/span&gt; aCallback&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; pipe &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/pipe;1"&lt;/span&gt;&lt;span class="special"&gt;].&lt;/span&gt;createInstance&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIPipe&lt;span class="special"&gt;);&lt;/span&gt;
  pipe&lt;span class="special"&gt;.&lt;/span&gt;init&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;false&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;false&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;4096&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;0&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;null&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  aCallback&lt;span class="special"&gt;.&lt;/span&gt;deliverMessageBodyAsStream&lt;span class="special"&gt;(&lt;/span&gt;pipe&lt;span class="special"&gt;.&lt;/span&gt;inputStream&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="string"&gt;"text/html"&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  aMsgHdr&lt;span class="special"&gt;.&lt;/span&gt;folder&lt;span class="special"&gt;.&lt;/span&gt;server&lt;span class="special"&gt;.&lt;/span&gt;wrappedJSObject&lt;span class="special"&gt;.&lt;/span&gt;runTask&lt;span class="special"&gt;(&lt;/span&gt;
    &lt;span class="keyword"&gt;new&lt;/span&gt; LoadMessageTask&lt;span class="special"&gt;(&lt;/span&gt;aMsgHdr&lt;span class="special"&gt;,&lt;/span&gt; pipe&lt;span class="special"&gt;.&lt;/span&gt;outputStream&lt;span class="special"&gt;));&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;

&lt;span class="keyword"&gt;function&lt;/span&gt; LoadMessageTask&lt;span class="special"&gt;(&lt;/span&gt;hdr&lt;span class="special"&gt;,&lt;/span&gt; outstream&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_hdr &lt;span class="special"&gt;=&lt;/span&gt; hdr&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_outputStream &lt;span class="special"&gt;=&lt;/span&gt; outstream&lt;span class="special"&gt;;&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;
LoadMessageTask&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  runTask&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;protocol&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    protocol&lt;span class="special"&gt;.&lt;/span&gt;loadUrl&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="comment"&gt;/* url */&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; protocol&lt;span class="special"&gt;.&lt;/span&gt;_oneShot&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  onUrlLoaded&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;document&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; body &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="comment"&gt;/* body */&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_outputStream&lt;span class="special"&gt;.&lt;/span&gt;write&lt;span class="special"&gt;(&lt;/span&gt;body&lt;span class="special"&gt;,&lt;/span&gt; body&lt;span class="special"&gt;.&lt;/span&gt;length&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  onTaskCompleted&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;protocol&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_outputStream&lt;span class="special"&gt;.&lt;/span&gt;close&lt;span class="special"&gt;();&lt;/span&gt;
  &lt;span class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;
Here, the basic approach is still the same: we open up a pipe, stuff our body in one end, and give the other end to the stream code. However, we don't need to do the other work that comes with loading the URI, which streamlines the code greatly. If we so choose, we can also pass the callback method an underlying request that takes care of stopping the network load, etc., but that argument is optional.
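&lt;/p&gt;&lt;p&gt;
For readers unfamiliar with pipes, the following is a deliberately simplified, synchronous stand-in that shows the shape of the idea: the producer writes into one end, the consumer reads from the other, and closing the write end tells the reader that no more data is coming. &lt;tt&gt;SimplePipe&lt;/tt&gt; and &lt;tt&gt;isFinished&lt;/tt&gt; are hypothetical names; unlike the real &lt;tt&gt;nsIPipe&lt;/tt&gt;, this version has no segment limits, never blocks, stores a JS string rather than bytes, and exposes no XPCOM interfaces.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical, simplified stand-in for nsIPipe (not the real component):
// synchronous, unbounded, and storing a JS string rather than bytes.
function SimplePipe() {
  let buffer = "";
  let closed = false;

  // Producer end: the task writes the body here, then closes it.
  this.outputStream = {
    write: function (data, count) {
      if (closed)
        throw new Error("write after close");
      let chunk = data.substring(0, count);
      buffer += chunk;
      return chunk.length;
    },
    close: function () {
      closed = true;
    }
  };

  // Consumer end: handed to the code that displays the message.
  this.inputStream = {
    available: function () {
      return buffer.length;
    },
    read: function (count) {
      let chunk = buffer.substring(0, count);
      buffer = buffer.substring(chunk.length);
      return chunk;
    },
    // True once the writer has closed its end and all data has been read.
    isFinished: function () {
      return closed ? buffer.length === 0 : false;
    }
  };
}
&lt;/pre&gt;&lt;p&gt;
In the listing above, &lt;tt&gt;LoadMessageTask&lt;/tt&gt; plays the producer (&lt;tt&gt;write&lt;/tt&gt; in &lt;tt&gt;onUrlLoaded&lt;/tt&gt;, &lt;tt&gt;close&lt;/tt&gt; in &lt;tt&gt;onTaskCompleted&lt;/tt&gt;), while the stream code on the other side of &lt;tt&gt;deliverMessageBodyAsStream&lt;/tt&gt; is the consumer.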
&lt;/p&gt;&lt;h4&gt;More implementation&lt;/h4&gt;&lt;p&gt;
Naturally, you have to register a few more contracts to get all of the services working right. The following is a sample of my chrome.manifest as it stands:&lt;/p&gt;
&lt;pre&gt;component {207a7d55-ec83-4181-a8e7-c0b3128db70b} components/wfFolder.js
component {6387e3a1-72d4-464a-b6b0-8bc817d2bbbc} components/wfServer.js
component {74347a0c-6ccf-4b7a-a429-edd208288c55} components/wfService.js
contract @mozilla.org/nsMsgDatabase/msgDB-&lt;span class="special"&gt;webforum&lt;/span&gt; &lt;i&gt;{e8b6b6ca-cc12-46c7-9a2c-a0855c311e07}&lt;/i&gt;
contract @mozilla.org/rdf/resource-factory;1?name=&lt;span class="special"&gt;webforum&lt;/span&gt; {207a7d55-ec83-4181-a8e7-c0b3128db70b}
contract @mozilla.org/messenger/server;1?type=&lt;span class="special"&gt;webforum&lt;/span&gt; {6387e3a1-72d4-464a-b6b0-8bc817d2bbbc}
contract @mozilla.org/messenger/protocol/info;1?type=&lt;span class="special"&gt;webforum&lt;/span&gt; {74347a0c-6ccf-4b7a-a429-edd208288c55}
contract @mozilla.org/messenger/backend;1?type=&lt;span class="special"&gt;webforum&lt;/span&gt; {74347a0c-6ccf-4b7a-a429-edd208288c55}
contract @mozilla.org/messenger/messageservice;1?type=&lt;span class="special"&gt;webforum&lt;/span&gt;-message &lt;i&gt;{7e3d2918-d073-4c98-9ec7-f419a05c29de}&lt;/i&gt;
&lt;/pre&gt;&lt;p&gt;
As you'll notice, the first and last CIDs were not implemented by me (well, kind of). The first is the CID of &lt;tt&gt;nsMsgDatabase&lt;/tt&gt;, which I exposed in one of my comm-central patches; the last is the CID of my extension message service implementation. Also important: I registered a second contract ID for my service implementation. It is for my new interface &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/acctimpl-glue/file/tip/msgIAccountBackend.idl"&gt;&lt;tt&gt;msgIAccountBackend&lt;/tt&gt;&lt;/a&gt;, which declares the &lt;tt&gt;getMessageContents&lt;/tt&gt; method I implemented earlier, and which you also need to implement to get this working.
&lt;/p&gt;&lt;p&gt;
Finally, you need to generate the message URI properly. Fortunately, this requires implementing only one getter:
&lt;/p&gt;&lt;pre class="lang-js"&gt;
wfFolder&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; baseMessageURI&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(!&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"#mBaseMessageURI"&lt;/span&gt;&lt;span class="special"&gt;])&lt;/span&gt;
      &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"#mBaseMessageURI"&lt;/span&gt;&lt;span class="special"&gt;]&lt;/span&gt; &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="string"&gt;"&lt;span class="special"&gt;webforum&lt;/span&gt;-message"&lt;/span&gt; &lt;span class="special"&gt;+&lt;/span&gt;
        &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"#mURI"&lt;/span&gt;&lt;span class="special"&gt;].&lt;/span&gt;substring&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"&lt;span class="special"&gt;webforum&lt;/span&gt;"&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;length&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"#mBaseMessageURI"&lt;/span&gt;&lt;span class="special"&gt;];&lt;/span&gt;
  &lt;span class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;h4&gt;Under the hood&lt;/h4&gt;&lt;p&gt;
For those who wish to know more about what is actually going on, I am going to describe the full loading process, from the moment you click on the header to the time you see the output.
&lt;/p&gt;&lt;p&gt;
Clicking on the header (after some Gecko code that I'll elide) leads you to &lt;tt&gt;nsMsgDBView::SelectionChanged&lt;/tt&gt;. This code calls back into the
front-end via the &lt;tt&gt;summarizeSelection&lt;/tt&gt; method of &lt;tt&gt;nsIMsgWindow::commandUpdater&lt;/tt&gt;. In Thunderbird, this method handles clearing some updates and also decides whether or not to show the message summary (the answer is "yes if there is a collapsed thread, this pref is set, and this is not a news folder" &lt;a href="#note-4.2"&gt;[2]&lt;/a&gt;). I'll cover summarization later.
&lt;/p&gt;&lt;p&gt;
For a regular message, loading results in the message being displayed. The message URI is constructed and then passed to &lt;tt&gt;nsMessenger::OpenURL&lt;/tt&gt;, which calls either &lt;tt&gt;nsIMsgMessageService::DisplayMessage&lt;/tt&gt; or &lt;tt&gt;nsIWebNavigation::LoadURI&lt;/tt&gt;, depending on whether or not it can find the message service. The message service converts its URI into the necko URL and, since the docshell is passed in as the consumer, hands that URL to &lt;tt&gt;LoadURI&lt;/tt&gt; with slightly different flags. And thus begins the real message loading.
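&lt;/p&gt;&lt;p&gt;
The manifest line registering &lt;tt&gt;@mozilla.org/messenger/messageservice;1?type=webforum-message&lt;/tt&gt;, together with the &lt;tt&gt;webforum-message&lt;/tt&gt; prefix produced by &lt;tt&gt;baseMessageURI&lt;/tt&gt;, suggests that the message service is looked up by the scheme of the message URI. The sketch below models that lookup and the branch between &lt;tt&gt;DisplayMessage&lt;/tt&gt; and &lt;tt&gt;LoadURI&lt;/tt&gt; in plain JavaScript. It is not the actual &lt;tt&gt;nsMessenger&lt;/tt&gt; code, and &lt;tt&gt;openMessage&lt;/tt&gt;, &lt;tt&gt;hasService&lt;/tt&gt;, &lt;tt&gt;display&lt;/tt&gt;, and &lt;tt&gt;loadURI&lt;/tt&gt; are hypothetical stand-ins for the XPCOM calls.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Build the contract ID that a message service for this URI would be
// registered under, using the URI scheme (e.g. "webforum-message").
function messageServiceContractID(messageURI) {
  let colon = messageURI.indexOf(":");
  if (colon === -1 || colon === 0)
    throw new Error("message URI has no scheme: " + messageURI);
  return "@mozilla.org/messenger/messageservice;1?type=" +
         messageURI.substring(0, colon);
}

// Prefer the message service if one is registered; otherwise fall back to
// handing the URI straight to the docshell.
function openMessage(messageURI, hasService, display, loadURI) {
  let contractID = messageServiceContractID(messageURI);
  if (hasService(contractID))
    return display(contractID, messageURI);
  return loadURI(messageURI);
}
&lt;/pre&gt;&lt;p&gt;
For example, &lt;tt&gt;messageServiceContractID("webforum-message://host/General")&lt;/tt&gt; yields exactly the contract ID registered in the manifest, which is why the scheme in &lt;tt&gt;baseMessageURI&lt;/tt&gt; and the contract must agree.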
&lt;/p&gt;&lt;p&gt;
Loading URLs in the docshell is somewhat complicated, but it boils down to creating the channel and opening it via &lt;tt&gt;AsyncOpen&lt;/tt&gt;. When the channel is opened (i.e., when &lt;tt&gt;OnStartRequest&lt;/tt&gt; is called), the loading code tries to find something that can display it, based on the content type. It turns out that there is &lt;a href="http://mxr.mozilla.org/comm-central/source/mailnews/base/src/MailNewsDLF.cpp"&gt;a display handler&lt;/a&gt; in the core mailnews code that can display message/rfc822 messages, which it does by converting the text into text/html (via libmime) and using the standard HTML display widget. I'm going to largely treat libmime as a black box: it processes text as &lt;tt&gt;OnDataAvailable&lt;/tt&gt; is called and spits out HTML via a mixture of &lt;tt&gt;OnDataAvailable&lt;/tt&gt; calls and callbacks to a header sink, either the one on the channel's URL or the one on that URL's message window.
&lt;/p&gt;&lt;p&gt;
The special extension message service implementation goes a few steps further. By managing the display and channel code itself, it spares new implementors from worrying about some of the particular requirements of the loading process. Its &lt;tt&gt;AsyncOpen&lt;/tt&gt; method is guaranteed never to run &lt;tt&gt;OnStartRequest&lt;/tt&gt; synchronously, and it also properly manages the load groups and content type manipulation. Furthermore, the channel manually synthesizes the full RFC 822 envelope (with code inspired by some of the compose code), and ensures that the &lt;tt&gt;nsIStreamListener&lt;/tt&gt; methods are called with the proper request parameter (the request passed must be the channel that was originally loaded).
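&lt;/p&gt;&lt;p&gt;
The guarantee that &lt;tt&gt;OnStartRequest&lt;/tt&gt; never runs synchronously can be pictured as a listener wrapper that queues every callback instead of delivering it immediately. The sketch below is not the glue channel's code: &lt;tt&gt;DeferredListener&lt;/tt&gt; is a hypothetical name, and &lt;tt&gt;schedule&lt;/tt&gt; is an assumed function that runs its argument later, in a separate event, in first-in-first-out order. Queueing all three callbacks, not just &lt;tt&gt;onStartRequest&lt;/tt&gt;, keeps them in their original order.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical wrapper, not the glue channel's actual code. Every
// nsIStreamListener callback is queued through schedule() instead of being
// delivered immediately, so nothing reaches the real listener while
// asyncOpen is still on the stack, and the callbacks keep their order.
function DeferredListener(listener, schedule) {
  this._listener = listener;
  // Assumed contract: schedule(fn) runs fn later, in a separate event,
  // in first-in-first-out order.
  this._schedule = schedule;
}
DeferredListener.prototype = {
  _post: function (method, args) {
    let listener = this._listener;
    this._schedule(function () {
      listener[method].apply(listener, args);
    });
  },
  onStartRequest: function (request, context) {
    this._post("onStartRequest", [request, context]);
  },
  onDataAvailable: function (request, context, stream, offset, count) {
    this._post("onDataAvailable", [request, context, stream, offset, count]);
  },
  onStopRequest: function (request, context, status) {
    this._post("onStopRequest", [request, context, status]);
  }
};
&lt;/pre&gt;&lt;p&gt;
Deferring &lt;tt&gt;onDataAvailable&lt;/tt&gt; this way only works if the stream still holds the data when the queued call runs, as a pipe does. In a test, &lt;tt&gt;schedule&lt;/tt&gt; can simply push onto an array that is drained afterwards; inside Gecko it would presumably post an event to the current thread.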
&lt;/p&gt;&lt;h4&gt;Alternative implementation&lt;/h4&gt;&lt;p&gt;
It is still possible to do this without the helper implementation. In that case, the first thing to do is implement the network handler, for which you'll definitely need a protocol implementation, and probably a channel and URL as well. A URL that does not implement &lt;tt&gt;nsIMsgMailNewsUrl&lt;/tt&gt; and &lt;tt&gt;nsIMsgMessageUrl&lt;/tt&gt; is likely to run into problems with some parts of the code. You might get by without a message service for now, but I suspect it is necessary for other portions of the code. To get the message header display right, you need a message/rfc822 content type (which gets changed to text/html, so it has to be settable!).
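&lt;/p&gt;&lt;p&gt;
Because the content type gets changed from message/rfc822 to text/html during display, a JavaScript channel must expose &lt;tt&gt;contentType&lt;/tt&gt; as a read-write property. A minimal sketch of just that attribute follows; &lt;tt&gt;SketchChannel&lt;/tt&gt; and &lt;tt&gt;readOnlyChannel&lt;/tt&gt; are hypothetical names, and everything else a real channel needs is omitted.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical minimal channel showing only the contentType attribute.
function SketchChannel() {
  this._contentType = "message/rfc822";
}
SketchChannel.prototype = {
  get contentType() {
    return this._contentType;
  },
  // Needed because the display code rewrites the type to text/html.
  set contentType(aType) {
    this._contentType = aType;
  }
};

// Broken variant for comparison: with a getter but no setter, assigning
// contentType is silently ignored in sloppy-mode code and throws a
// TypeError in strict-mode code.
let readOnlyChannel = {
  get contentType() {
    return "message/rfc822";
  }
};
&lt;/pre&gt;&lt;p&gt;
The silent failure of the read-only variant in sloppy-mode code is what makes this mistake easy to miss.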
&lt;/p&gt;&lt;p&gt;
Another possible implementation would be to send a straight text/html channel for the body and then manually call the methods on the header sink, i.e., bypass libmime altogether. One word of caution about this approach: libmime can output different things depending on the query parameters in the URL, and I don't know which of those outputs are actually used.
&lt;/p&gt;&lt;h4&gt;Next steps&lt;/h4&gt;&lt;p&gt;
Now that message display works, we have a more or less working implementation of the whole process of getting new messages and displaying them. There are several directions I could go from here, but for now, I'll make part 5 deal with the account manager. Other parts I plan to do soon include subscription, filters, and other such code.
&lt;/p&gt;&lt;h4&gt;Notes&lt;/h4&gt;&lt;ol&gt;
&lt;li id="note-4.1"&gt;If you are not aware, "necko" refers to the networking portion of the Mozilla codebase. The terms "URI" and "URL" also have standard meanings, but for the purposes of this guide, they mean different things. I will try to keep them distinct, but I have a tendency to naturally prefer "URI" most of the time, so I may slip up.&lt;/li&gt;
&lt;li id="note-4.2"&gt;Unfortunately, a lot of the front-end code has taken it upon itself to hardcode checks for certain implementations to enable/disable features. Hopefully, as Kent James and I progress on this work, these barriers can be reduced.&lt;/li&gt;
&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-314706627166576167?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/314706627166576167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=314706627166576167' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/314706627166576167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/314706627166576167'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/01/developing-new-account-types-part-4.html' title='Developing new account types, Part 4: Displaying messages'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-1126446365993946210</id><published>2011-01-07T23:45:00.003-05:00</published><updated>2011-01-08T00:06:38.224-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='codecoverage'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Random code coverage statistics</title><content type='html'>Test coverage happened to come up in an IRC channel I frequent today, so I thought up some probably useless statistics on code coverage, and applied them to Thunderbird.
&lt;/p&gt;&lt;p&gt;
Since I lost the original lcov files, I re-extracted them from the output HTML data. Of the 112,024 lines, only 65,012 were actually run (which matches the summary output, so I'm good). It turns out that, in total, there were a whopping 168,563,629 line executions in the test run, or an average of 1,504.89 hits per line. Counting only the lines that were hit, the average number of runs is 2,592.81.
&lt;/p&gt;&lt;p&gt;
Given the general paucity of tests in comm-central, why is this number so high? Well, it turns out that some functions in libmime run a lot. The most-executed line was &lt;a href="http://www.tjhsst.edu/~jcranmer/c-ccov/mime/src/mimebuf.cpp.gcov.html"&gt;mimebuf.cpp's line 215&lt;/a&gt;, which ran no fewer than 3,223,519 times, followed closely by line 224 at a still-impressive 3,222,934. The most-executed lines outside of libmime (which swept the top 5) were &lt;a href="http://www.tjhsst.edu/~jcranmer/c-ccov/base/util/nsMsgLineBuffer.cpp.gcov.html"&gt;nsMsgLineBuffer.cpp's lines 140-150&lt;/a&gt;, at a count of 1,828,695. I think we can safely say that those lines are well-covered.
&lt;/p&gt;&lt;p&gt;
The numbers for functions seem similarly skewed: of 10,951 functions, 6,443 were actually run. Between them, there were a total of 26,588,690 function calls, or 2,427.96 or 4,126.76 calls per function (depending on how you count). The two most-called functions are both in libmime, in &lt;a href="http://www.tjhsst.edu/~jcranmer/c-ccov/mime/src/comi18n.cpp.gcov.html"&gt;comi18n.cpp&lt;/a&gt; to be specific: NextChar_UTF8 and utf8_nextchar, with 1,782,545 calls each. Outside of libmime (which again sweeps the top 5), the leader is &lt;a href="http://www.tjhsst.edu/~jcranmer/c-ccov/base/src/nsMsgFolderCache.h.gcov.html"&gt;nsMsgFolderCache::GetEnv&lt;/a&gt;, with a total of 450,400 function calls. I think we can safely say that said function is bug-free.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-1126446365993946210?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/1126446365993946210/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=1126446365993946210' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1126446365993946210'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1126446365993946210'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/01/random-code-coverage-statistics.html' title='Random code coverage statistics'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8990249663746492295</id><published>2011-01-05T16:01:00.004-05:00</published><updated>2011-01-05T16:56:41.737-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Predicted 2011 Mozilla work</title><content type='html'>Another year, another time to predict what work I want to get to this year. And, of course, another chance to fail to do that work.
&lt;/p&gt;&lt;h4&gt;News submodule&lt;/h4&gt;&lt;p&gt;
This year, I am going to give myself a goal of bringing the total number of open bugs in the MailNews Core: Networking: NNTP component to below 100, or, at the very least, below 104 to make it the least buggy of the mailnews core protocol implementations. I've laid out for myself a map of all bugs in the component, so I know what needs to be worked on. My current work on news URIs by itself should get me almost there: I have &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=226890"&gt;patches&lt;/a&gt; awaiting review that fix bugs 37465, 108297, 226890, 403242, 498321, and 617287, as well as patches in my queue that fix bugs 108970, 110841, and 224335, with patches for bugs 80972, 108107, 108877, 133793, 167991, 327885, 411568, and 530193 likely to come. In other words, that's &lt;strong&gt;supporting no-authority news URLs&lt;/strong&gt;.
&lt;/p&gt;&lt;p&gt;
Outside of that, I can easily pick up a few small bugs along the way to get that number down. What I'm likely not going to fix is venerable bug 43278, or any of the expired-article issues; in other words, any set of bugs that would require as much effort as the no-authority bug. I'm not quite leaving out the authentication-related bugs, but those are on the less likely side of the "maybe" pile.
&lt;/p&gt;&lt;h4&gt;Code coverage&lt;/h4&gt;&lt;p&gt;
My attempts to get decent JavaScript code coverage appear to have been thwarted once again (though I'm not giving up hope). If a few tweaks don't get my current approach working, I'll probably return to a simpler instrumentation-based approach (blech). I would still like to see at least the C++ code coverage analysis run on a more regular basis (at least weekly), so we can get a good timeline of code coverage through the ages. Building year-old versions of Mozilla after the fact is not fun, especially given the annoyance of mozmill versioning.
&lt;/p&gt;&lt;h4&gt;Unfinished things&lt;/h4&gt;&lt;p&gt;
Once I get no-authority news URIs in the tree, I would like to return to new account types. Actually, the work I did there has proven very helpful in removing the next roadblock (since I gained intimate knowledge of how URIs are actually run behind the scenes, as well as how necko works with them). On the downside, my attempts to get JS to extend C++ classes seem to get less stable the more I work with them, so I'll probably abandon that approach and turn to another one: writing an extension layer that makes it simpler to implement all of the functionality without getting down-and-dirty. I feel justified in saying that the less you look like an email server, the happier you'll be not having to deal with the messy glue implementations.
&lt;/p&gt;&lt;p&gt;
That said, I may still continue working in the vein of the blog posts, since they still nicely document what goes on just below the hood. In any case, the two things I most want to hide in the implementation are the database and the URL, one of which has already been discussed.
&lt;/p&gt;&lt;p&gt;
Some people have asked that I continue my guides to pork. However, I have become more persuaded that the current mime implementation needs to be tossed away and restarted from scratch, which reduces my primary motivation for learning it. Furthermore, pork (at least as built in elsa) is pretty much considered abandoned, although there are intentions to rebuild the tool on clang, now that it has a decent C++ parser.
&lt;/p&gt;&lt;p&gt;
I have been told by Mark Banner that he intends to get a new roadmap for the address book up in the near future. To the extent that it does not conflict with other goals I have, I will probably do some implementation under that. I may also decide to attempt again to work on address book integration with Linux desktops, given some experience I've had over the past year.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8990249663746492295?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8990249663746492295/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8990249663746492295' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8990249663746492295'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8990249663746492295'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2011/01/predicted-2011-mozilla-work.html' title='Predicted 2011 Mozilla work'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-5103655491617465895</id><published>2010-10-16T10:26:00.002-04:00</published><updated>2010-10-16T10:47:11.720-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='codecoverage'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category 
scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Updated code coverage</title><content type='html'>It's been about seven months since I last ran code coverage tools on Thunderbird, so I thought I would do it again. In these intervening seven months, Thunderbird has moved to libxul, and the mozmill tests have become more important, which means I get to change my methodology slightly.
&lt;/p&gt;&lt;p&gt;
Problematic caveats first, then. For various reasons, my build for code-coverage results runs on a 64-bit machine that I have to ssh into and on which I lack administrative privileges. Previously, the build setup prevented gcov from working properly for the mozilla-central code, so my build scripts needed hacks to cover only the comm-central code. It seems that something in the environment, the code, or the libxul build has since fixed that.
&lt;/p&gt;&lt;p&gt;
On the other hand, the environment prevented me from running the mozmill tests correctly. On this computer, user accounts are set up via LDAP, so the gtk initialization code tried to get user information from libc, which got it from NSS, which got it from LDAP, which tried to get it from another LDAP library. Unfortunately, it chose the directory/c-sdk LDAP library instead of the system one, causing a crash. The only solution was to disable LDAP, which took a few tricks to get working correctly. Oh, and somehow the tests failed if I didn't enable libxul.
&lt;/p&gt;&lt;p&gt;
My last issue was with mozmill. I had planned to use Xvfb as the display for mozmill (the intent being that I could automate mozmill tests on several computers overnight via screen). It turns out that something complained about needing Xrandr, so I got to run all of the mozmill tests over twice-forwarded X connections.
&lt;/p&gt;&lt;p&gt;
Anyways, the results are in. &lt;a href="http://www.tjhsst.edu/~jcranmer/c-ccov-no-mozmill/"&gt;These are the results before mozmill tests were run&lt;/a&gt;, and &lt;a href="http://www.tjhsst.edu/~jcranmer/c-ccov/"&gt;these are the results including mozmill tests&lt;/a&gt;. By comparison, &lt;a href="http://www.tjhsst.edu/~jcranmer/c-ccov-old/"&gt;these are the results from my last run&lt;/a&gt; (which do not include mozmill tests). For completeness' sake, 41 xpcshell-tests failed and 10 mozmill tests failed; I do not have a record of which ones failed, however.
&lt;/p&gt;&lt;p&gt;
Finally, here is the HD view of the code-coverage treemap results:&lt;br /&gt;
&lt;img src="http://www.tjhsst.edu/~jcranmer/c-ccov/coverage.png" /&gt;&lt;br /&gt;
By comparison, the old code-coverage treemap results:&lt;br /&gt;
&lt;img src="http://www.tjhsst.edu/~jcranmer/c-ccov-old/coverage.png" /&gt;&lt;br /&gt;
I hope you enjoy the results!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-5103655491617465895?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/5103655491617465895/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=5103655491617465895' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5103655491617465895'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5103655491617465895'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/10/updated-code-coverage.html' title='Updated code coverage'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4927020407906605916</id><published>2010-09-04T20:08:00.003-04:00</published><updated>2010-09-04T21:24:02.386-04:00</updated><title type='text'>Usage share of newsreaders</title><content type='html'>I have noted before, by a nonscientific and utterly biased survey, that Thunderbird appeared to account for a significant share of the newsreader market (testing &lt;a href="http://bugzilla.mozilla.org/show_bug.cgi?id=16913"&gt;bug 16913&lt;/a&gt; was what caused me to discover this fact). But finding any attempt to measure the usage share of newsreaders via Google has been rather frustrating.
You can easily find market shares for web browsers, desktop operating systems, server operating systems (though the numbers vary wildly), and mobile platforms, but not for email clients or newsreaders.
&lt;/p&gt;&lt;p&gt;
Okay, I am not about to find market shares of email clients. I have access to nothing close to a representative enough sample for that. But collecting newsreader market shares should not be that hard. After all, pretty much anyone can pick up a large, representative sample of news postings: just connect to an NNTP server of your choice. So, seeing as how it's a three-day weekend, I thought I might as well collect the data myself. The other reason for collecting this data was to demonstrate that a significant number of Thunderbird users use NNTP, so removing NNTP support would adversely affect the userbase.
&lt;/p&gt;&lt;h5&gt;Methodology&lt;/h5&gt;&lt;p&gt;
First off, I have to define what I mean by "usage share." Unlike in other media, a relatively small number of users account for a relatively large share of NNTP postings. I've decided to measure it by the number of posts generated by each NNTP client, since that is easier to calculate and, I think, more informative than measuring by individual users.
&lt;/p&gt;&lt;p&gt;
I also have to pick the subset to log. For this set of data, I collected every single news article in the Big-8 newsgroups on my school's NNTP server (news.gatech.edu), which has a retention time of a month (30 days exactly, I think). I did not even attempt to filter out spam messages, and I did not account for cross-posting (my script crashed due to races a few times, so the aggregate totals that would have accounted for cross-posting were never reported).
&lt;/p&gt;&lt;p&gt;
Essentially, this is what I did. I ran a Python script that collected every group in the Big 8 (determined by LIST ACTIVE wildmats) on the server. It then entered every group, performed an XOVER to find all messages, and XHDR'd User-Agent, X-Mailer, and X-Newsreader to figure out what the user agent was. For every group, the script wrote a count for each distinct UA string into a ~2MB CSV file.
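&lt;/p&gt;&lt;p&gt;
The Python script itself is not shown here. As a rough JavaScript sketch of just its counting step, assume the XHDR replies for each header have already been read into arrays of lines, each holding an article number, a space, and the header value, with articles lacking the header reported as (none), as INN-style servers commonly do. The precedence among the three headers (User-Agent, then X-Mailer, then X-Newsreader) and the names &lt;tt&gt;parseXhdr&lt;/tt&gt; and &lt;tt&gt;tallyUserAgents&lt;/tt&gt; are assumptions for illustration.

```javascript
// Sketch of the tallying step: pick one UA string per article from the
// XHDR results and count how often each distinct string occurs.
// Assumption: header precedence is User-Agent, then X-Mailer, then X-Newsreader.
const HEADER_PRIORITY = ["User-Agent", "X-Mailer", "X-Newsreader"];

// Turn XHDR response lines ("12345 Some Reader/1.0") into a Map of
// article number -> header value, skipping articles without the header.
function parseXhdr(lines) {
  const values = new Map();
  for (const line of lines) {
    const space = line.indexOf(" ");
    if (space === -1) continue;
    const value = line.slice(space + 1).trim();
    if (value === "" || value === "(none)") continue;
    values.set(line.slice(0, space), value);
  }
  return values;
}

// xhdrByHeader maps a header name to the XHDR lines for one group.
function tallyUserAgents(xhdrByHeader) {
  const uaByArticle = new Map();
  // Apply the lowest-priority header first so higher ones overwrite it.
  for (const header of [...HEADER_PRIORITY].reverse()) {
    for (const [article, ua] of parseXhdr(xhdrByHeader[header] || [])) {
      uaByArticle.set(article, ua);
    }
  }
  const counts = new Map();
  for (const ua of uaByArticle.values()) {
    counts.set(ua, (counts.get(ua) || 0) + 1);
  }
  return counts;
}
```

Running this once per group and writing each map out as CSV rows would give per-group, per-string totals of the kind described above; articles carrying none of the three headers simply do not appear.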
&lt;/p&gt;&lt;p&gt;
I collected all of the CSV data into OpenOffice.org Calc and then ran a macro that attempted to extract the program name and the version from the UA strings. Unsurprisingly, I had to do some hacks to get it to recognize SeaMonkey and Thunderbird correctly (Mnenhy was not helping). I output tables that broke readers down by version and by total program count.
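&lt;/p&gt;&lt;p&gt;
The macro is not reproduced here, but the idea can be sketched in JavaScript: test each UA string against an ordered list of patterns and pull out a program name and version. The patterns below are illustrative guesses at common UA formats, not the rules the original macro used, and &lt;tt&gt;classifyUserAgent&lt;/tt&gt; and &lt;tt&gt;totalsByProgram&lt;/tt&gt; are hypothetical names.

```javascript
// Illustrative rules only; real UA strings vary more than this.
// The first matching rule wins, so put more specific patterns first.
const UA_RULES = [
  { program: "SeaMonkey", pattern: /SeaMonkey\/([\d.]+)/ },
  { program: "Thunderbird", pattern: /Thunderbird[\/ ]([\d.]+)/ },
  { program: "Forte Agent", pattern: /Forte Agent ([\d.]+)/ },
  { program: "Microsoft Outlook Express", pattern: /Outlook Express ([\d.]+)/ },
];

// Map one UA string to { program, version }; unknown strings become "Other".
function classifyUserAgent(ua) {
  for (const { program, pattern } of UA_RULES) {
    const match = pattern.exec(ua);
    if (match) {
      return { program, version: match[1] };
    }
  }
  return { program: "Other", version: "" };
}

// Roll per-string counts (a Map of UA string -> posts) up into per-program totals.
function totalsByProgram(uaCounts) {
  const totals = new Map();
  for (const [ua, posts] of uaCounts) {
    const { program } = classifyUserAgent(ua);
    totals.set(program, (totals.get(program) || 0) + posts);
  }
  return totals;
}
```

Because only the first matching rule counts, a string carrying an extra token such as Mnenhy's is still classified by its main product token; grouping on both program and version instead of program alone would give the per-version tables.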
&lt;/p&gt;&lt;h5&gt;Results&lt;/h5&gt;&lt;p&gt;
It turns out that there is an incredibly long tail of newsreaders: I found about 250 different UA strings. Excluding &lt;a href="http://home.httrack.net/~nocem/"&gt;one particularly prevalent bot&lt;/a&gt; and those postings for which I could not find a UA string, I found around 430,000 messages (some others may have been dropped by copy-paste errors). Of these, the top 5 newsreaders account for just 79% of the total count. By contrast, the top 5 web browsers account for very nearly 100% of theirs. Some interesting newsreaders I found in the long tail:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;Mozilla 4.8 [en] (Windows NT 5.0; U)&lt;/li&gt;
&lt;li&gt;Mozilla 3.04 (WinNT; U)&lt;/li&gt;
&lt;li&gt;trn 4.0-test70 (17 January 1999)&lt;/li&gt;
&lt;li&gt;Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/0.8.12&lt;/li&gt;
&lt;li&gt;MyBB&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;
Finally, here is the table of the top newsreaders:
&lt;/p&gt;&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Newsreader&lt;/th&gt;&lt;th&gt;Total&lt;/th&gt;&lt;th&gt;Percent&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Google Groups&lt;/td&gt;&lt;td&gt;189536&lt;/td&gt;&lt;td&gt;43.94%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Thunderbird&lt;/td&gt;&lt;td&gt;52258&lt;/td&gt;&lt;td&gt;12.11%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Forte Agent&lt;/td&gt;&lt;td&gt;47100&lt;/td&gt;&lt;td&gt;10.92%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Microsoft Outlook Express&lt;/td&gt;&lt;td&gt;41042&lt;/td&gt;&lt;td&gt;9.51%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Microsoft Windows Live Mail&lt;/td&gt;&lt;td&gt;11196&lt;/td&gt;&lt;td&gt;2.60%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;MT-NewsWatcher&lt;/td&gt;&lt;td&gt;10872&lt;/td&gt;&lt;td&gt;2.52%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Other&lt;/td&gt;&lt;td&gt;79359&lt;/td&gt;&lt;td&gt;18.40%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;&lt;p&gt;
That Google Groups has the highest market share is not surprising, but I was surprised by the strong showing of Forte Agent and the poor showing of traditional newsreaders (e.g., tin and rn-based newsreaders). I guess this goes to show that Windows has a surprisingly large market share in the Big 8 newsgroups. For SeaMonkey enthusiasts, your newsreader has a mere 4,187 postings (with another ~5K from other Mozilla distributions, some of which cannot be identified... Mnenhy made processing UA strings difficult).
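&lt;/p&gt;&lt;p&gt;
Each percentage in the table is that row's count divided by the sum of all rows (431,363 posts). As a quick JavaScript check of the 79% top-five figure quoted earlier, the sketch below recomputes it from the table's counts, treating the Other row as the long tail rather than as a single newsreader.

```javascript
// Post counts copied from the table of top newsreaders.
const postCounts = {
  "Google Groups": 189536,
  "Thunderbird": 52258,
  "Forte Agent": 47100,
  "Microsoft Outlook Express": 41042,
  "Microsoft Windows Live Mail": 11196,
  "MT-NewsWatcher": 10872,
  "Other": 79359,
};

const sum = (values) => values.reduce((a, b) => a + b, 0);

// Share of all posts held by the n largest named newsreaders.
function topShare(counts, n) {
  const total = sum(Object.values(counts));
  const named = Object.entries(counts)
    .filter(([name]) => name !== "Other")
    .map(([, count]) => count)
    .sort((a, b) => b - a);
  return sum(named.slice(0, n)) / total;
}

console.log((100 * topShare(postCounts, 5)).toFixed(2) + "%"); // prints 79.08%
```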
&lt;/p&gt;&lt;p&gt;
In terms of individual versions, one of Outlook Express's versions clocks in at #1 with 31,617 total posts, with Thunderbird 3.1.2 trailing at a "mere" 23,661. Thunderbird has around 14,000 posts on the 2.x branch, 11,000 on the 3.0.x branch, and 25,000 on the 3.1.x branch. There is apparently some UA spoofing among SeaMonkey users as well (I found a dozen or so Firefox entries, which I presume are SeaMonkey users spoofing a Firefox UA string).
&lt;/p&gt;&lt;p&gt;
Another datum incidentally collected was the number of postings in each hierarchy. Here they are:
&lt;/p&gt;&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Hierarchy&lt;/th&gt;&lt;th&gt;Count&lt;/th&gt;&lt;th&gt;Largest Newsgroup&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;comp.*&lt;/td&gt;&lt;td&gt;64,360&lt;/td&gt;&lt;td&gt;comp.soft-sys.matlab (8,399)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;humanities.*&lt;/td&gt;&lt;td&gt;2,460&lt;/td&gt;&lt;td&gt;humanities.lit.authors.shakespeare (1,455)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;misc.*&lt;/td&gt;&lt;td&gt;28,518&lt;/td&gt;&lt;td&gt;misc.test (8,796)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;news.*&lt;/td&gt;&lt;td&gt;31,635&lt;/td&gt;&lt;td&gt;news.list.filters (26,238)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;rec.*&lt;/td&gt;&lt;td&gt;217,548&lt;/td&gt;&lt;td&gt;rec.games.pinball (14,707)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;sci.*&lt;/td&gt;&lt;td&gt;47,948&lt;/td&gt;&lt;td&gt;sci.electronics.design (6,076)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;soc.*&lt;/td&gt;&lt;td&gt;87,192&lt;/td&gt;&lt;td&gt;soc.retirement (6,053)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;talk.*&lt;/td&gt;&lt;td&gt;12,513&lt;/td&gt;&lt;td&gt;talk.origins (6,498)&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;&lt;p&gt;
Remind me again why we have the humanities hierarchy? Almost 60% of its messages come from a single newsgroup, and it has just 8 newsgroups.
&lt;/p&gt;&lt;h5&gt;Future Work&lt;/h5&gt;&lt;p&gt;
What could be done in the future is to expand this research into binary newsgroups. However, merely counting posts is a poorer metric there, because binary newsgroups use a lot of multipart messages: just because someone uploads a ginormous binary in 50 parts does not mean it should be counted 50 times. I also don't have access to any binary news servers.
&lt;/p&gt;&lt;p&gt;
Another improvement would be to discount spam. As a brief test, I looked only at newsgroups whose names contained "moderated"; this resulted in a paltry sample of 3,272 messages. The statistics did not appear to change much, but those newsgroups are likely not a representative sample of the Big 8 anyway.
&lt;/p&gt;&lt;p&gt;
Finally, this needs to be broadened and run repeatedly so that it collects snapshots of the data over time. This metric is poor at capturing historical data, but it could be an excellent way to get data every few months going forward, so long as someone keeps collecting it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4927020407906605916?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4927020407906605916/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4927020407906605916' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4927020407906605916'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4927020407906605916'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/09/usage-share-of-newsreaders.html' title='Usage share of newsreaders'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8004687123714073162</id><published>2010-06-07T18:30:00.004-04:00</published><updated>2010-06-07T18:42:29.071-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='accttype'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Developing new account types, Part 3: Updating folders (part 3)</title><content type='html'>This series of blog posts discusses the creation of a new account type implemented in JavaScript. Over the course of these blogs, I use the development of my &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/webfora"&gt;Web Forums extension &lt;/a&gt;to explain the necessary actions in creating new account types. I hope to add a new post once every two weeks (I cannot guarantee it, though).
&lt;/p&gt;&lt;p&gt;
This blog post continues my previous two posts; this step is being broken up into multiple segments to reduce the amount of text one has to read in a single sitting. The current step is to actually implement the folder update.
&lt;/p&gt;&lt;h4&gt;Only new messages&lt;/h4&gt;&lt;p&gt;
Now that we know how to add messages to the database, we need to figure out how to find the downloaded messages.
&lt;/p&gt;&lt;p&gt;
It should go without saying that the first thing this function should do is check whether the folder actually needs updating. In my extension, I need to download the front page of the specific board and check whether its topic list matches what is stored in the database.
&lt;/p&gt;&lt;p&gt;
For now, at least, I can rely on the forum to tell me the number of replies in a thread (one less than the total number of messages), since this is shown in the forum's thread index. I take the reply count I have already seen and subtract it from the listed count to get the number of new messages I need to download. Then I only need to look at the last few messages to add them to the database.
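&lt;/p&gt;&lt;p&gt;
A minimal JavaScript sketch of that subtraction, with a hypothetical function name: it assumes an undefined stored count means the thread has never been seen, in which case every post (the first one plus all replies) is new, and it clamps at zero in case posts were deleted on the forum.

```javascript
// listedReplies: reply count shown in the forum's thread index.
// seenReplies: reply count stored last time, or undefined for a new thread.
function countNewPosts(listedReplies, seenReplies) {
  if (seenReplies === undefined) {
    // A thread with N replies holds N + 1 posts, all of them new to us.
    return listedReplies + 1;
  }
  // Clamp at zero in case posts were removed since the last update.
  return Math.max(0, listedReplies - seenReplies);
}
```

For example, a thread listed with 12 replies whose stored count is 9 has 3 new posts, which are the last 3 in the thread.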
&lt;/p&gt;&lt;p&gt;
At this point, I have two main issues to worry about. First, I am working with paginated results, which means I actually need to load multiple documents. Second, I am getting not a list of messages but a list of threads; therefore, I need a database that is associated with threads &lt;a href="#note-3.8"&gt;[8]&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
The database I use is a simple JSON object that exists for each folder, and so far only has a mapping of threads to the reply count that I've seen; I may give it more in later iterations of this extension.
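&lt;/p&gt;&lt;p&gt;
Such a store might look like the following JavaScript sketch. The class name is hypothetical, and where the serialized string is actually kept is left open; the only assumption is that it round-trips through &lt;tt&gt;JSON.stringify&lt;/tt&gt; and &lt;tt&gt;JSON.parse&lt;/tt&gt;.

```javascript
// Hypothetical per-folder store: thread ID -> number of replies already seen.
class ThreadReplyStore {
  constructor(json) {
    this.replies = json ? JSON.parse(json) : {};
  }

  // Returns the stored reply count, or undefined if the thread is new.
  seenReplies(threadId) {
    return Object.prototype.hasOwnProperty.call(this.replies, threadId)
      ? this.replies[threadId]
      : undefined;
  }

  markSeen(threadId, replyCount) {
    this.replies[threadId] = replyCount;
  }

  // JSON.stringify(store) calls this, so only the mapping is saved.
  toJSON() {
    return this.replies;
  }
}
```

Returning undefined for unknown threads keeps a brand-new thread distinguishable from a thread that has been seen with zero replies.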
&lt;/p&gt;&lt;p&gt;
Pagination is where the implementation gets tricky. First, I look at the thread index for new messages; if I have seen all of the messages in the last thread on the page, I can stop looking at further pages. Otherwise, I grab the next page and keep recursing. Note that I can hit a thread that I've fully seen and still have threads I haven't seen: sticky threads may be updated infrequently yet still appear first in my list.
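&lt;/p&gt;&lt;p&gt;
The stopping rule can be sketched in JavaScript as a loop rather than recursion. Here &lt;tt&gt;fetchPage&lt;/tt&gt; is a hypothetical callback that returns one page of the board index as an object with a &lt;tt&gt;threads&lt;/tt&gt; array (each thread having an &lt;tt&gt;id&lt;/tt&gt; and a &lt;tt&gt;replies&lt;/tt&gt; count) and a &lt;tt&gt;hasNextPage&lt;/tt&gt; flag. The real extension loads pages asynchronously; this sketch is synchronous to keep the logic visible.

```javascript
// fetchPage(n): hypothetical; returns { threads: [{ id, replies }], hasNextPage }.
// seenReplies: object mapping thread ID to the reply count seen last time.
function findThreadsToUpdate(fetchPage, seenReplies) {
  const isFullySeen = (thread) =>
    seenReplies[thread.id] !== undefined ? seenReplies[thread.id] >= thread.replies : false;

  const updated = [];
  for (let page = 0; ; page++) {
    const { threads, hasNextPage } = fetchPage(page);
    for (const thread of threads) {
      if (!isFullySeen(thread)) updated.push(thread);
    }
    // Decide based on the last thread on the page: a sticky thread near the
    // top can be fully seen even though newer threads below it are not.
    const last = threads[threads.length - 1];
    if (last === undefined || isFullySeen(last) || !hasNextPage) break;
  }
  return updated;
}
```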
&lt;/p&gt;&lt;p&gt;
The other issue is loading threads. The link I end up scraping points to the &lt;i&gt;first&lt;/i&gt; page of messages for that thread, which I may already have seen, so I need to skip over pages until I find the first page with new messages. For now, I'm doing this na&amp;iuml;vely by actually loading each page and counting the posts rather than deconstructing URLs and calculating which page to load. I then need to look at the last posts on that page, not the first ones, so I calculate the start position and read forwards. Since &lt;tt&gt;querySelectorAll&lt;/tt&gt; returns an indexable list of results (a &lt;tt&gt;NodeList&lt;/tt&gt;), I don't need to throw away iterations; I can just start iterating in the middle.
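&lt;/p&gt;&lt;p&gt;
The page-skipping arithmetic can be sketched in JavaScript as follows, assuming the number of posts on each page of the thread has already been counted and that &lt;tt&gt;seenPosts&lt;/tt&gt; is the number of posts already stored (the seen reply count plus one). Both function names are hypothetical.

```javascript
// pageSizes: posts counted on each page of the thread, in order.
// seenPosts: posts of this thread already in the database.
// Returns the first page holding new posts and the index of the first new
// post on that page, or null if nothing is new.
function locateFirstNewPost(pageSizes, seenPosts) {
  let remaining = seenPosts;
  for (let page = 0; page !== pageSizes.length; page++) {
    if (pageSizes[page] > remaining) {
      return { page, start: remaining };
    }
    remaining -= pageSizes[page];
  }
  return null;
}

// A NodeList from querySelectorAll is indexable, so the new posts on a page
// are simply everything from the start index onwards.
function newPostsOnPage(postNodes, start) {
  return Array.prototype.slice.call(postNodes, start);
}
```

With pages of 10, 10, and 4 posts and 15 posts already seen, the first new post is at index 5 of the second page.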
&lt;/p&gt;&lt;p&gt;
Once all of that is implemented, we can put everything together into a proper implementation of &lt;tt&gt;updateFolder&lt;/tt&gt;, the function we started implementing earlier in this step. The end result is that, when all is said and done, you can load up the thread pane (the last column is the number of messages in the thread):
&lt;img alt="The thread pane after implementation" src="http://1.bp.blogspot.com/_qW4UNslWKZU/TA103Sz0IvI/AAAAAAAAACQ/BKcQZswIcVM/s1600/Thread_pane.png" height="254" width="807"&gt;
&lt;/p&gt;&lt;p&gt;
By comparison, here is an equivalent view of the forum that I loaded this from:
&lt;img alt="The equivalent forum list" src="http://3.bp.blogspot.com/_qW4UNslWKZU/TA11koml47I/AAAAAAAAACY/YiURkNwmIZ4/s1600/Forum_equiv.png" height="377" width="804"&gt;
&lt;/p&gt;&lt;p&gt;
Now I ask you: which user interface would you rather use to view the forum?
&lt;/p&gt;&lt;p&gt;
Some notes for implementors: be prepared to delete your msf files over and over again. I would recommend tackling the individual components in this order: first build a message, then your protocol object (I found it easier to test when the running tasks were already known to be working), and then start tying it all into the database. Leave issues like threading until the basics are in place, then tackle determining which messages are new if that is not implicit in what you do (i.e., you don't have a "get new messages" query you can readily use). Pagination should come last: everything is easier to test when you only have a small number of messages to deal with.
&lt;/p&gt;&lt;p&gt;
I apologize for the excessive length of this step; it happened to be pretty much the first step that needed most of the necessary technology. The next step is to actually display the messages in our database, which should be shorter.
&lt;/p&gt;&lt;h4&gt;Notes&lt;/h4&gt;&lt;ol start="8"&gt;
&lt;li id="note-3.8"&gt;&lt;a href="http://mesquilla.com/"&gt;Kent James&lt;/a&gt; and I are both developing new account type extensions (he an Exchange connector, I the one in this blog series); both of us have identified the narrow-mindedness of the database as an issue. It is therefore possible that my workaround here will not be necessary in the next few versions.
&lt;/li&gt;
&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8004687123714073162?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8004687123714073162/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8004687123714073162' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8004687123714073162'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8004687123714073162'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/06/developing-new-account-types-part-3.html' title='Developing new account types, Part 3: Updating folders (part 3)'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_qW4UNslWKZU/TA103Sz0IvI/AAAAAAAAACQ/BKcQZswIcVM/s72-c/Thread_pane.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-7835550926699157960</id><published>2010-05-21T17:32:00.003-04:00</published><updated>2010-05-21T17:47:17.044-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='accttype'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Developing new account types, Part 3: Updating folders (part 2)</title><content 
type='html'>This series of blog posts discusses the creation of a new account type implemented in JavaScript. Over the course of these blogs, I use the development of my &lt;a
href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/webfora"&gt;Web Forums extension &lt;/a&gt;to explain the necessary actions in creating new account types. I hope to add a new post once every two weeks (I cannot guarantee it, though).
&lt;/p&gt;&lt;p&gt;
This blog post continues my previous post; this step is being broken up into multiple segments to reduce the amount of text one has to read in a single sitting. The current step is to actually implement the folder update.
&lt;/p&gt;&lt;h4&gt;Folder updating&lt;/h4&gt;&lt;p&gt;
To achieve our goal of getting a correct message list, we are going to modify the implementation of updateFolder. This function is called whenever a folder is selected in the folder pane; conceptually, you can view it as resynchronizing the cached database with the actual folder. For example, this is where a local folder would reparse the mailbox if the database were incorrect or missing.
&lt;/p&gt;&lt;p&gt;
This function essentially consists of three steps: figure out new messages, process them (i.e., apply filters), and then announce to the world that they exist. Some account types (like IMAP) may need to do more involved message processing, but this is the general gist of what goes on &lt;a href="#note-3.4"&gt;[4]&lt;/a&gt;. I'll ignore the processing step until I start talking about filters.
&lt;/p&gt;&lt;h4&gt;Database Details Devil&lt;/h4&gt;&lt;p&gt;
To start with, I'll cover the last step. Announcing to the world that a message exists boils down to adding a new header to the database. So how do you add a new header to the database? It takes three easy steps: create the header, populate its fields, and then add it to the database. With the proper listener setup, all of the other notifications are sent for you automatically. But as they say, the devil is in the details.
&lt;/p&gt;&lt;p&gt;
Let me begin by explaining some things about messages. There are five different representations of the message: the message key, the message header, the message ID, the message URI, and the necko URL object. Siddharth Agarwal has &lt;a href="https://developer.mozilla.org/User:Sid0/Mailnews/Message_Representations"&gt;a nice diagram&lt;/a&gt; that shows how to convert between these representations. The last two are more concerned with displaying messages; it is the first three that are interesting right now.
&lt;/p&gt;&lt;p&gt;
Message keys are the database's internal key for a message; the database guarantees that the tuple (folder, key) is unique. Message keys are unsigned 32-bit integers (with &lt;tt&gt;0xFFFFFFFF&lt;/tt&gt;, or &lt;tt&gt;-1&lt;/tt&gt; in two's complement, reserved as the "no message here" key). In general, any time a property needs to refer to another message, it uses the message key; as a consequence, such properties cannot refer to messages in other folders.
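&lt;/p&gt;&lt;p&gt;
When keys come from an external numbering, such as forum post numbers, it is worth checking that each one fits. This JavaScript sketch assumes the candidate key is a plain number; &lt;tt&gt;NS_MSG_KEY_NONE&lt;/tt&gt; stands in for the reserved value (mailnews calls it &lt;tt&gt;nsMsgKey_None&lt;/tt&gt;), and &lt;tt&gt;toMsgKey&lt;/tt&gt; is a hypothetical helper.

```javascript
// nsMsgKey is an unsigned 32-bit integer; 0xFFFFFFFF is reserved to mean
// "no message here", so usable keys run from 0 to 0xFFFFFFFE.
const NS_MSG_KEY_NONE = 0xFFFFFFFF;

// Hypothetical helper: validate an external post number before using it as a key.
function toMsgKey(postNumber) {
  if (!Number.isInteger(postNumber) || 0 > postNumber || postNumber >= NS_MSG_KEY_NONE) {
    throw new RangeError("post number " + postNumber + " cannot be used as an nsMsgKey");
  }
  return postNumber;
}

// The reserved key is what -1 becomes when reinterpreted as unsigned 32-bit.
console.log((-1 >>> 0) === NS_MSG_KEY_NONE); // prints true
```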
&lt;/p&gt;&lt;p&gt;
Message IDs are the &lt;a href="http://tools.ietf.org/html/rfc5322"&gt;RFC 5322&lt;/a&gt; identifier for a message. These identifiers are supposed to be unique (for logical messages, not in a "the message at offset &lt;tt&gt;0x234f3d&lt;/tt&gt; in this file" sense). Their most important use is as a critical component of threading.
&lt;/p&gt;&lt;p&gt;
The message header object is an object of type &lt;a href="https://developer.mozilla.org/en/nsIMsgDBHdr"&gt;nsIMsgDBHdr&lt;/a&gt;. These objects are directly backed by the database. However, setting many of their properties does not fire the database's change notifications, so you generally do not want to set them directly. Like all generalities &lt;a href="#note-3.5"&gt;[5]&lt;/a&gt;, there are exceptions to this rule. Right now, we want to manipulate headers before adding them to the database, and we do not want to notify anyone of changes to not-yet-existing headers, so we actually want to use the fields of nsIMsgDBHdr directly.
&lt;/p&gt;&lt;p&gt;
So, the first thing you need to do is decide what your message key is. Message keys are going to be used to get the message URI, so the key should be a property that is easy to associate with messages. IMAP uses message UIDs, local folders the offset into the mbox &lt;a href="#note-3.6"&gt;[6]&lt;/a&gt;, and NNTP the article numbers in the group. In my case, the forum appears to assign each post a unique number, so that is what I'll use.&lt;/p&gt; &lt;p&gt;After the message key, the most important properties are the major ones for display. The &lt;tt&gt;author&lt;/tt&gt; attribute corresponds to the "From" header, &lt;tt&gt;subject&lt;/tt&gt; to the "Subject" header, and &lt;tt&gt;date&lt;/tt&gt; to the "Date" header. All of these are used to generate values in the thread pane columns; things would look strange without them.
&lt;/p&gt;&lt;p&gt;
The other major property in the display is &lt;tt&gt;flags&lt;/tt&gt;. As the name implies, it is an integer where each bit corresponds to a &lt;a href="https://developer.mozilla.org/en/nsMsgMessageFlags"&gt;different flag&lt;/a&gt;. The most important of these are probably &lt;tt&gt;HasRe&lt;/tt&gt;, &lt;tt&gt;Flagged&lt;/tt&gt;, and &lt;tt&gt;New&lt;/tt&gt;. Flags should be set with &lt;tt&gt;OrFlags&lt;/tt&gt; and &lt;tt&gt;AndFlags&lt;/tt&gt; instead of by manipulating the value directly. And don't set these values with the &lt;tt&gt;mark&lt;/tt&gt;* methods, as these cause notifications to be fired (remember that we haven't added the message to the database yet).
&lt;/p&gt;&lt;p&gt;
If you want to do real threading, you will want to set message IDs and references &lt;a href="#note-3.7"&gt;[7]&lt;/a&gt;. The References header is a space-separated list of message ID tokens (wrapped in angle brackets), although the database's parser does a pretty good job of ignoring any random crap. The list runs from the oldest ancestor down to the parent, so the last element is the message's parent, the second-to-last the grandparent, and so on.
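&lt;/p&gt;&lt;p&gt;
A simplified JavaScript model of reading that header is shown below. It is not the database's actual parser: it assumes message IDs contain no whitespace or angle brackets, skips anything outside the bracketed tokens, and applies the ordering rule just described to pick out ancestors. The function names are hypothetical.

```javascript
// Pull the bracketed message IDs out of a References value, in order,
// skipping any junk between them (\u003c and \u003e are the angle brackets).
function parseReferences(value) {
  const ids = [];
  const token = /\u003c([^\u003e\s]+)\u003e/g;
  let match;
  while ((match = token.exec(value)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}

// The last ID is the parent (generation 1), the one before it the
// grandparent (generation 2), and so on; undefined past the root.
function ancestor(ids, generation) {
  return ids[ids.length - generation];
}
```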
&lt;/p&gt;&lt;p&gt;
Threading is implemented in the following manner. First, the database walks the referenced message IDs in reverse order, trying to find a message for each one. If it finds one, that message becomes the parent header and threading stops. Otherwise, if correct threading is enabled, it tries to find a thread that has that message ID. Failing that, if strict threading is not enabled, a thread containing a message with the same subject (ignoring Re:) is used. If threading without Re is disabled, the message must have the &lt;tt&gt;HasRe&lt;/tt&gt; flag set for this last step to apply. Finally, if no thread has been found by this point, a new one is created.
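&lt;/p&gt;&lt;p&gt;
The decision procedure can be modeled in JavaScript as below. The maps (&lt;tt&gt;headersById&lt;/tt&gt;, &lt;tt&gt;threadsByMessageId&lt;/tt&gt;, &lt;tt&gt;threadsBySubject&lt;/tt&gt;) and the three booleans standing in for the correct-threading, strict-threading, and thread-without-Re preferences are hypothetical; this follows the description above, not the database's actual C++ code.

```javascript
// hdr: { messageId, references (root first), subject (Re: stripped), hasRe }
// db: { headersById, threadsByMessageId, threadsBySubject } (all Maps)
// prefs: { correctThreading, strictThreading, threadWithoutRe }
function chooseThread(hdr, db, prefs) {
  // Try each referenced ID, starting with the parent.
  for (const id of [...hdr.references].reverse()) {
    const parent = db.headersById.get(id);
    if (parent) {
      return { thread: parent.thread, parent };
    }
    if (prefs.correctThreading) {
      const thread = db.threadsByMessageId.get(id);
      if (thread) {
        return { thread, parent: null };
      }
    }
  }
  // Fall back to matching subjects unless strict threading is on.
  if (!prefs.strictThreading) {
    if (prefs.threadWithoutRe || hdr.hasRe) {
      const thread = db.threadsBySubject.get(hdr.subject);
      if (thread) {
        return { thread, parent: null };
      }
    }
  }
  // Nothing matched: start a new thread rooted at this message.
  return { thread: { root: hdr.messageId }, parent: null };
}
```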
&lt;/p&gt;&lt;p&gt;
To combine messages into a thread, then, the References field needs to be set on the messages. If correct threading is enabled (it is by default), you can use a simple trick: create a valid message ID for each thread and stuff it into the References header.
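&lt;/p&gt;&lt;p&gt;
For a forum with only two-level threads, the trick can be sketched in JavaScript by using the first post's message ID as the thread's ID. The ID format (post number, @, host name) and the helper names are assumptions for illustration; any scheme works as long as the IDs are unique and every reply references the same thread ID.

```javascript
// Hypothetical helpers for a forum with two-level threads: every post gets a
// message ID, and every reply lists the thread's first post in References.
function makeMessageId(postId, hostname) {
  return postId + "@" + hostname;
}

// postIds: the thread's post numbers in display order (first post first).
function threadHeaders(postIds, hostname) {
  const rootId = makeMessageId(postIds[0], hostname);
  return postIds.map((postId, index) => ({
    messageId: makeMessageId(postId, hostname),
    // References wraps each ID in angle brackets (\u003c and \u003e).
    references: index === 0 ? "" : "\u003c" + rootId + "\u003e",
  }));
}
```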
&lt;/p&gt;&lt;h4&gt;A practical example&lt;/h4&gt;&lt;p&gt;
In my case, I have an author (without email addresses), a subject (with possible non-ASCII text but without Re: stuff), a date in a standard format, as well as a simple per-thread unique identifier for message keys. I also want to make threads&amp;mdash;although this will only be two-level threads. Ideally, I should also be flagging the sticky threads, but I'll leave that for a later version. So what does this code look like?
&lt;/p&gt;&lt;pre class="lang-js"&gt;
_loadThread&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;document&lt;span class="special"&gt;,&lt;/span&gt; firstMsgId&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; database &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_folder&lt;span class="special"&gt;.&lt;/span&gt;getDatabase&lt;span class="special"&gt;();&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; conv &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;'@mozilla.org/messenger/mimeconverter;1'&lt;/span&gt;&lt;span class="special"&gt;]&lt;/span&gt;
               &lt;span class="special"&gt;.&lt;/span&gt;getService&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIMimeConverter&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; subject &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="comment"&gt;/* one for the thread */&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; hostname &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_folder&lt;span class="special"&gt;.&lt;/span&gt;server&lt;span class="special"&gt;.&lt;/span&gt;hostName&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; charset &lt;span class="special"&gt;=&lt;/span&gt; document&lt;span class="special"&gt;.&lt;/span&gt;characterSet&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="comment"&gt;/* for each new message */&lt;/span&gt;&lt;span class="special"&gt;&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; postID &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="comment"&gt;/* generate msg key */&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; author &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="comment"&gt;/* get author name */&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; date &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;Date&lt;/span&gt;&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="comment"&gt;/* get text string*/&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; msgHdr &lt;span class="special"&gt;=&lt;/span&gt; database&lt;span class="special"&gt;.&lt;/span&gt;CreateNewHdr&lt;span class="special"&gt;(&lt;/span&gt;postID&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="comment"&gt;// The trailing | makes the fake address invalid, so nothing can accidentally be delivered to it&lt;/span&gt;
    msgHdr&lt;span class="special"&gt;.&lt;/span&gt;author &lt;span class="special"&gt;=&lt;/span&gt; conv&lt;span class="special"&gt;.&lt;/span&gt;encodeMimePartIIStr_UTF8&lt;span class="special"&gt;(&lt;/span&gt;
      author &lt;span class="special"&gt;+&lt;/span&gt; &lt;span class="string"&gt;" &amp;lt;"&lt;/span&gt; &lt;span class="special"&gt;+&lt;/span&gt; author &lt;span class="special"&gt;+&lt;/span&gt; &lt;span class="string"&gt;"@"&lt;/span&gt; &lt;span class="special"&gt;+&lt;/span&gt; hostname &lt;span class="special"&gt;+&lt;/span&gt; &lt;span class="string"&gt;"|&amp;gt;"&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;true&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; charset&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;0&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;72&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
    msgHdr&lt;span class="special"&gt;.&lt;/span&gt;subject &lt;span class="special"&gt;=&lt;/span&gt; conv&lt;span class="special"&gt;.&lt;/span&gt;encodeMimePartIIStr_UTF8&lt;span class="special"&gt;(&lt;/span&gt;subject&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;false&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; charset&lt;span class="special"&gt;,&lt;/span&gt;
      &lt;span class="constant"&gt;0&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;72&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="comment"&gt;// PRTime is in &amp;micro;s, JS date in ms&lt;/span&gt;
    msgHdr&lt;span class="special"&gt;.&lt;/span&gt;date &lt;span class="special"&gt;=&lt;/span&gt; date &lt;span class="special"&gt;*&lt;/span&gt; &lt;span class="constant"&gt;1000&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    msgHdr&lt;span class="special"&gt;.&lt;/span&gt;Charset &lt;span class="special"&gt;=&lt;/span&gt; charset&lt;span class="special"&gt;;&lt;/span&gt;
    msgHdr&lt;span class="special"&gt;.&lt;/span&gt;messageId &lt;span class="special"&gt;=&lt;/span&gt; postID &lt;span class="special"&gt;+&lt;/span&gt; &lt;span class="string"&gt;"@"&lt;/span&gt; &lt;span class="special"&gt;+&lt;/span&gt; document&lt;span class="special"&gt;.&lt;/span&gt;documentURI&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;firstMsgId&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      msgHdr&lt;span class="special"&gt;.&lt;/span&gt;setReferences&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"&amp;lt;"&lt;/span&gt; &lt;span class="special"&gt;+&lt;/span&gt; firstMsgId &lt;span class="special"&gt;+&lt;/span&gt; &lt;span class="string"&gt;"&amp;gt;"&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
      msgHdr&lt;span class="special"&gt;.&lt;/span&gt;OrFlags&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsMsgMessageFlags&lt;span class="special"&gt;.&lt;/span&gt;HasRe&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt; &lt;span class="keyword"&gt;else&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      firstMsgId &lt;span class="special"&gt;=&lt;/span&gt; msgHdr&lt;span class="special"&gt;.&lt;/span&gt;messageId&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt;
    msgHdr&lt;span class="special"&gt;.&lt;/span&gt;OrFlags&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsMsgMessageFlags&lt;span class="special"&gt;.&lt;/span&gt;New&lt;span class="special"&gt;);&lt;/span&gt;
    database&lt;span class="special"&gt;.&lt;/span&gt;AddNewHdrToDB&lt;span class="special"&gt;(&lt;/span&gt;msgHdr&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;true&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;
First, we get a reference to the database. Remember that we implemented this in the last step, so this shouldn't present any problems. We also get the values shared by every post in the thread: the subject, the host name of the server, and the charset. For each post, we collect the post ID, the author, and the date as text strings, and then convert them into an integer, a string, and a date, respectively.
&lt;/p&gt;&lt;p&gt;
Using the &lt;tt&gt;CreateNewHdr&lt;/tt&gt; function, we get a new message header that we can manipulate. Since I'm trying to handle non-ASCII text correctly, I'm using the MIME encoding functions to prepare the author and subject. Remember that the MIME specifications (RFC 2047) require non-ASCII text in headers to be encoded; &lt;tt&gt;encodeMimePartIIStr_UTF8&lt;/tt&gt; is the simplest way to do that encoding.
&lt;/p&gt;&lt;p&gt;
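To show what that encoding produces, here is a minimal sketch of an RFC 2047 "B" (base64) encoded-word in plain JavaScript. It does not reproduce what &lt;tt&gt;encodeMimePartIIStr_UTF8&lt;/tt&gt; does internally. It assumes Node.js, which provides base64 support through &lt;tt&gt;Buffer&lt;/tt&gt;. It always labels the text as UTF-8. It also skips line folding, which the real call handles using the length limit passed as its last argument (72). &lt;tt&gt;encodeHeaderWord&lt;/tt&gt; is a made-up name.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Minimal sketch of an RFC 2047 "B" (base64) encoded-word.
// Assumptions: Node.js Buffer for base64, UTF-8 only, and no folding of
// long values; the real MIME converter handles all of that.
function encodeHeaderWord(text) {
  // Text that is pure ASCII can go into the header unchanged.
  if (!/[^\x00-\x7f]/.test(text))
    return text;
  var base64 = Buffer.from(text, "utf8").toString("base64");
  return "=?UTF-8?B?" + base64 + "?=";
}

console.log(encodeHeaderWord("Forum news"));  // prints "Forum news"
console.log(encodeHeaderWord("Caf\u00e9"));   // prints "=?UTF-8?B?Q2Fmw6k=?="
```
&lt;/pre&gt;&lt;p&gt;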
If you're not working with actual email, the From string can be contrived. What I did was create a fictitious email address that could theoretically be tied back to the author in a systematic way (for possible future compose code that does forum private messaging). The pipe character at the end is there to prevent accidental mail delivery. I also used the &lt;tt&gt;hostName&lt;/tt&gt; and not the &lt;tt&gt;realHostName&lt;/tt&gt;, so the address stays traceable even if the user changes the host name on me.
&lt;/p&gt;&lt;p&gt;
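The exact address scheme is up to you. The code above uses the raw author name as the local part. As a hypothetical variation, percent-encoding the name makes the mapping reversible, which is what a future private-messaging compose path would need. &lt;tt&gt;fakeAddress&lt;/tt&gt; and &lt;tt&gt;authorFromAddress&lt;/tt&gt; are made-up names, and the sketch builds only the bare address, not the display-name form.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Hypothetical helpers: an undeliverable but reversible address for a
// forum user, and the inverse mapping back to the user name.
function fakeAddress(author, hostName) {
  // Percent-encoding keeps spaces and '@' out of the local part; the
  // trailing '|' is meant to make accidental delivery fail.
  return encodeURIComponent(author) + "@" + hostName + "|";
}

function authorFromAddress(address) {
  // Drop the trailing '|', then everything from the last '@' onward.
  var trimmed = address.replace(/\|$/, "");
  var at = trimmed.lastIndexOf("@");
  if (at === -1)
    return null;
  return decodeURIComponent(trimmed.substring(0, at));
}

var addr = fakeAddress("Jane Doe", "forums.example.org");
console.log(addr);                     // prints "Jane%20Doe@forums.example.org|"
console.log(authorFromAddress(addr));  // prints "Jane Doe"
```
&lt;/pre&gt;&lt;p&gt;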
The message date I have is a formatted string; the &lt;tt&gt;Date&lt;/tt&gt; constructor is pretty handy at converting most forms of these strings into a usable JS Date object. A JS Date is measured in milliseconds, whereas the date attribute is a &lt;tt&gt;PRTime&lt;/tt&gt;, which is measured in microseconds, so I need to multiply by 1000 to set the property. Ironically, the date is actually stored in seconds in the database and is converted to and from microseconds on the fly.
&lt;/p&gt;&lt;p&gt;
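The unit juggling is easy to get wrong, so here is the arithmetic as a small standalone sketch. The helper names are made up. The sketch relies on two facts: a JS &lt;tt&gt;Date&lt;/tt&gt; counts milliseconds since the epoch, and a &lt;tt&gt;PRTime&lt;/tt&gt; counts microseconds since the epoch. It also adds a check that the loop above lacks. If &lt;tt&gt;Date&lt;/tt&gt; cannot parse a string, the result is &lt;tt&gt;NaN&lt;/tt&gt;, and that value should not end up in a header.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Hypothetical helpers for the millisecond/microsecond/second conversions.
var kUsPerMs = 1000;

// JS Date (milliseconds since the epoch) to PRTime (microseconds).
function dateToPRTime(date) {
  var ms = date.getTime();
  if (isNaN(ms))
    throw new Error("Unparseable post date");
  return ms * kUsPerMs;
}

// PRTime to whole seconds, the granularity the database keeps.
function prTimeToSeconds(prTime) {
  return Math.floor(prTime / 1e6);
}

var posted = new Date(Date.UTC(2010, 4, 21, 12, 0, 0)); // 21 May 2010, 12:00 UTC
console.log(dateToPRTime(posted));                    // prints 1274443200000000
console.log(prTimeToSeconds(dateToPRTime(posted)));   // prints 1274443200
```
&lt;/pre&gt;&lt;p&gt;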
The &lt;tt&gt;Charset&lt;/tt&gt; attribute, apparently only used for search right now, is derived from the character set as reported by the DOM. This means that it is the same character set as would be assumed by the layout engine, including character set overrides.
&lt;/p&gt;&lt;p&gt;
The message ID is simpler to generate: valid URIs are pretty much valid right-hand sides of a message ID. A post is pretty much representable as a tuple of the thread page and the path to the post in the DOM, so this message ID is also an easy way to get back to the message. References are generated as I described above: every post after the first refers to the first post of the thread. In a later version, I may try to sniff the quoted text to figure out who is replying to whom and recreate actual threads. Note that when setting the message ID, the outer angle brackets are optional.
&lt;/p&gt;&lt;p&gt;
The last thing I set is the flags. A complete listing of flags &lt;a href="https://developer.mozilla.org/en/nsMsgMessageFlags"&gt;can be found on MDC&lt;/a&gt;. In this case, the only flags I care about are HasRe (since I want to generate "Re:" headers) and New; most of the others will probably be set by the user in the UI.
&lt;/p&gt;&lt;p&gt;
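This sketch combines the message ID, threading, and flag rules. It computes the header fields for the posts of one thread page, using plain objects in place of &lt;tt&gt;nsIMsgDBHdr&lt;/tt&gt;. &lt;tt&gt;threadHeaders&lt;/tt&gt; is a made-up name. The flag values 0x10 and 0x10000 are my reading of &lt;tt&gt;nsMsgMessageFlags.HasRe&lt;/tt&gt; and &lt;tt&gt;.New&lt;/tt&gt;. Treat them as assumptions, and use the &lt;tt&gt;Ci.nsMsgMessageFlags&lt;/tt&gt; constants in real code. Unlike the &lt;tt&gt;setReferences&lt;/tt&gt; call, the sketch stores the referenced ID without angle brackets.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Stand-ins for Ci.nsMsgMessageFlags.HasRe and .New (assumed values).
var HAS_RE = 0x10;
var NEW = 0x10000;

// Compute header fields for the posts of one thread page. Every post gets
// an ID of the form postID@pageURI; every post after the first refers
// back to the first one and is flagged as a reply.
function threadHeaders(postIds, pageUri) {
  var firstMsgId = null;
  return postIds.map(function (postId) {
    var header = {
      messageId: postId + "@" + pageUri,
      references: null,
      flags: NEW
    };
    if (firstMsgId) {
      header.references = firstMsgId;
      header.flags |= HAS_RE;
    } else {
      firstMsgId = header.messageId;
    }
    return header;
  });
}

var headers = threadHeaders([101, 102, 103], "http://forum.example.org/t/7");
console.log(headers[2].messageId);  // prints "103@http://forum.example.org/t/7"
console.log(headers[2].references); // prints "101@http://forum.example.org/t/7"
```
&lt;/pre&gt;&lt;p&gt;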
Finally, we add the header to the database. The last parameter tells the database to notify any listeners that there is a new message. After we have loaded all of the messages, we need to commit the database:
&lt;/p&gt;&lt;pre class="lang-js"&gt;
database&lt;span class="special"&gt;.&lt;/span&gt;Commit&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsMsgDBCommitType&lt;span class="special"&gt;.&lt;/span&gt;kLargeCommit&lt;span class="special"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;
A brief note here: it doesn't really matter whether you do a large or a session commit, since both end up doing the same thing. Small commits end up doing nothing.
&lt;/p&gt;&lt;h4&gt;Notes&lt;/h4&gt;&lt;ol start="4"&gt;
&lt;li id="note-3.4"&gt;Like most synchronization stuff, you theoretically also have to deal with deletion on the remote side as well as read changes, etc. The more I think about it, the more I'm torn on whether or not I should implement it. For now, I'll recommend that you weigh the cost of trying to determine deleted messages versus the commonality of deletion or other modification.&lt;/li&gt;
&lt;li id="note-3.5"&gt;Except that, I am told, all French words ending in -tion are feminine.&lt;/li&gt;
&lt;li id="note-3.6"&gt;Incidentally, this is a major part of the reason why there is a 4 GiB limit on mailbox size in Thunderbird and SeaMonkey.&lt;/li&gt;
&lt;li id="note-3.7"&gt;What about &lt;tt&gt;In-Reply-To&lt;/tt&gt;, you may ask. This information is pretty much redundant with &lt;tt&gt;References&lt;/tt&gt;, so what happens is that, for the purposes of computing threading, this header is appended to the &lt;tt&gt;References&lt;/tt&gt; header. And you do this before calling on the database header.&lt;/li&gt;
&lt;/ol&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/7835550926699157960/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=7835550926699157960' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7835550926699157960'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7835550926699157960'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/05/developing-new-account-types-part-3_21.html' title='Developing new account types, Part 3: Updating folders (part 2)'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-421058381234702501</id><published>2010-05-12T10:55:00.005-04:00</published><updated>2010-05-12T11:19:17.305-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='accttype'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Developing new account types, Part 3: Updating folders (part 1)</title><content type='html'>This series of blog posts discusses the creation of a new account type implemented in JavaScript. Over the course of these blogs, I use the development of my &lt;a
href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/webfora"&gt;Web Forums extension &lt;/a&gt;to explain the necessary actions in creating new account types. I hope to add a new post once every two weeks (I cannot guarantee it, though).
&lt;/p&gt;&lt;p&gt;
In the previous blog post, I showed how to get an empty message list displayed in the folder pane. The next step is to actually implement the folder update. Since this involves several subtasks, I will break it into multiple blog posts.
&lt;/p&gt;&lt;h4&gt;Getting a DOM for HTML&lt;/h4&gt;&lt;p&gt;
In terms of webscraping, I treat the first step as simply turning a URI into a DOM. The developer center has &lt;a href="https://developer.mozilla.org/en/Code_snippets/HTML_to_DOM"&gt;some good resources on this&lt;/a&gt;, if you have access to a document object. The catch is getting a document object in the first place, since your code will likely be running from an XPCOM component &lt;a href="#note-3.1"&gt;[1]&lt;/a&gt;. What is needed, then, is a utility method for loading the DOM. This is the code I've been using:&lt;/p&gt;
&lt;pre class="lang-js"&gt;&lt;span class="keyword"&gt;function&lt;/span&gt; asyncLoadDom&lt;span class="special"&gt;(&lt;/span&gt;uri&lt;span class="special"&gt;,&lt;/span&gt; callback&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; doc &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;'@mozilla.org/appshell/window-mediator;1'&lt;/span&gt;&lt;span class="special"&gt;]&lt;/span&gt;
              &lt;span class="special"&gt;.&lt;/span&gt;getService&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIWindowMediator&lt;span class="special"&gt;)&lt;/span&gt;
              &lt;span class="special"&gt;.&lt;/span&gt;getMostRecentWindow&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"mail:3pane"&lt;/span&gt;&lt;span class="special"&gt;).&lt;/span&gt;document&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; frame &lt;span class="special"&gt;=&lt;/span&gt; doc&lt;span class="special"&gt;.&lt;/span&gt;createElement&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"iframe"&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  frame&lt;span class="special"&gt;.&lt;/span&gt;setAttribute&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"type"&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="string"&gt;"content"&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  frame&lt;span class="special"&gt;.&lt;/span&gt;setAttribute&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"collapsed"&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="string"&gt;"true"&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  doc&lt;span class="special"&gt;.&lt;/span&gt;documentElement&lt;span class="special"&gt;.&lt;/span&gt;appendChild&lt;span class="special"&gt;(&lt;/span&gt;frame&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; ds &lt;span class="special"&gt;=&lt;/span&gt; frame&lt;span class="special"&gt;.&lt;/span&gt;webNavigation&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="comment"&gt;// Fetch and parse only: no plugins, scripts, images or subframes, but follow meta redirects&lt;/span&gt;
  ds&lt;span class="special"&gt;.&lt;/span&gt;allowPlugins &lt;span class="special"&gt;=&lt;/span&gt; ds&lt;span class="special"&gt;.&lt;/span&gt;allowJavascript &lt;span class="special"&gt;=&lt;/span&gt; ds&lt;span class="special"&gt;.&lt;/span&gt;allowImages &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;false&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
  ds&lt;span class="special"&gt;.&lt;/span&gt;allowSubframes &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;false&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
  ds&lt;span class="special"&gt;.&lt;/span&gt;allowMetaRedirects &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;true&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
  frame&lt;span class="special"&gt;.&lt;/span&gt;addEventListener&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"load"&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;event&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="comment"&gt;// Ignore the load event for the frame's initial about:blank document&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;event&lt;span class="special"&gt;.&lt;/span&gt;originalTarget&lt;span class="special"&gt;.&lt;/span&gt;location&lt;span class="special"&gt;.&lt;/span&gt;href &lt;span class="special"&gt;==&lt;/span&gt; &lt;span class="string"&gt;"about:blank"&lt;/span&gt;&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;try&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      callback&lt;span class="special"&gt;(&lt;/span&gt;frame&lt;span class="special"&gt;.&lt;/span&gt;contentDocument&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt; &lt;span class="keyword"&gt;finally&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      &lt;span class="comment"&gt;// Detach the hidden frame even if the callback throws&lt;/span&gt;
      doc&lt;span class="special"&gt;.&lt;/span&gt;documentElement&lt;span class="special"&gt;.&lt;/span&gt;removeChild&lt;span class="special"&gt;(&lt;/span&gt;frame&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt; &lt;span class="keyword"&gt;true&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  frame&lt;span class="special"&gt;.&lt;/span&gt;contentDocument&lt;span class="special"&gt;.&lt;/span&gt;location&lt;span class="special"&gt;.&lt;/span&gt;href &lt;span class="special"&gt;=&lt;/span&gt; uri&lt;span class="special"&gt;;&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;The first argument is the URI to load, as a string, and the second argument is the function to call back with the DOM document as its sole argument. An added benefit of this method is that it is asynchronous, so you're not blocking the UI while you wait for the page to download. In practice, only the protocol object should call this code, since we probably want to throttle the number of pages loaded at once.
&lt;/p&gt;&lt;h4&gt;The protocol object&lt;/h4&gt;&lt;p&gt;
Earlier, I mentioned that one of the implemented objects wasn't actually mandatory: the protocol object. An instance of this object is meant to wrap an actual connection to the server; if you don't need to connect to a server, this object might not be worth implementing. In reality, it is still useful for a non-trivial account type: any time a task is more complicated than "load this thing and use it," a protocol object can help manage multiple subtasks.
&lt;/p&gt;&lt;p&gt;
For a wire protocol, the implementation of this object should be straightforward. It would essentially be a state machine with an idle state, entered once the connection has been set up, in which the instance can accept tasks. A state machine could also work for webscraping-based account types, but I am using a more queue-based approach because of how I have structured the web loads.
&lt;/p&gt;&lt;p&gt;
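For comparison, here is a minimal sketch of how a wire-protocol object might be structured. It starts out in a connecting state and becomes idle once the server greets it. It sends a queued task to the server only while idle. Everything in the sketch is hypothetical: &lt;tt&gt;WireProtocol&lt;/tt&gt;, the state names, the &lt;tt&gt;onServerLine&lt;/tt&gt; hook, and the assumption of one response line per command.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Hypothetical state machine for a connection-based protocol object.
var State = { CONNECTING: "connecting", IDLE: "idle", BUSY: "busy" };

function WireProtocol(send) {
  this._send = send;              // send(line) writes one command to the server
  this._state = State.CONNECTING;
  this._queue = [];               // tasks waiting for the idle state
  this._task = null;              // task whose response we are waiting for
}

WireProtocol.prototype = {
  // Accept a task; it is sent now if idle, otherwise once the connection frees up.
  runTask: function (task) {
    this._queue.push(task);
    this._next();
  },
  // Feed one line received from the server into the state machine.
  onServerLine: function (line) {
    if (this._state === State.CONNECTING) {
      this._state = State.IDLE;        // treat the first line as the greeting
    } else if (this._state === State.BUSY) {
      var task = this._task;
      this._task = null;
      this._state = State.IDLE;
      task.onResponse(line);           // assumes one response line per command
    }
    this._next();
  },
  // While idle, send the next queued task's command.
  _next: function () {
    if (this._state !== State.IDLE || this._queue.length === 0)
      return;
    this._task = this._queue.shift();
    this._state = State.BUSY;
    this._send(this._task.command);
  }
};
```
&lt;/pre&gt;&lt;p&gt;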
Server requests are organized at two levels. At the higher level, the application makes calls to functions like &lt;tt&gt;updateFolder&lt;/tt&gt;; I have decided to call these &lt;b&gt;tasks&lt;/b&gt;. At the lower level are the individual requests sent to the server; for lack of any better terminology, I will refer to these as &lt;b&gt;states&lt;/b&gt;&lt;a href="#note-3.2"&gt;[2]&lt;/a&gt;. In my implementation, I keep two queues, one for each level.
&lt;/p&gt;&lt;p&gt;
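To make the lower level concrete, here is a self-contained model of a throttled request queue. At most &lt;tt&gt;limit&lt;/tt&gt; loads are in flight at a time. Each finished load starts the next queued one, and a callback fires when the queue goes idle. This is a simplified sketch, not the extension's code. &lt;tt&gt;ThrottledQueue&lt;/tt&gt; is a made-up name, and the loader is passed in so that the logic can be exercised without a network.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Hypothetical model of a per-protocol queue of states (here: page loads).
function ThrottledQueue(limit, startLoad, onIdle) {
  this._limit = limit;          // maximum loads in flight (assumed >= 1)
  this._startLoad = startLoad;  // startLoad(url, done) begins one load
  this._onIdle = onIdle;        // called whenever nothing is queued or running
  this._pending = [];           // URLs waiting for a free slot
  this._running = 0;            // loads started but not yet done
}

ThrottledQueue.prototype = {
  add: function (url) {
    this._pending.push(url);
    this._pump();
  },
  // Start queued loads while a slot is free.
  _pump: function () {
    while (this._pending.length > 0) {
      if (this._running >= this._limit)
        return;
      this._startOne(this._pending.shift());
    }
  },
  _startOne: function (url) {
    var self = this;
    this._running++;
    this._startLoad(url, function done() {
      self._running--;
      self._pump();
      // _pump never leaves work queued while a slot is free, so no running
      // loads means the queue is empty too.
      if (self._running === 0)
        self._onIdle();
    });
  }
};
```
&lt;/pre&gt;&lt;p&gt;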
The task queue is best managed by the server object. The overall logic is rather simple:&lt;/p&gt;
&lt;pre class="lang-js"&gt;&lt;span class="keyword"&gt;const&lt;/span&gt; kMaxProtocols &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;2&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
wfServer&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
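  &lt;span class="comment"&gt;/* Queued tasks to run on the next open protocol, and the protocols
   * themselves. runTask creates both arrays per instance; arrays defined
   * on the prototype would be shared by every server. */&lt;/span&gt;
  _queuedTasks&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;null&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt;
  _protocols&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;null&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt;
  runTask&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;task&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(!&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_protocols&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_protocols &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;[];&lt;/span&gt;
      &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_queuedTasks &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;[];&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt;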
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_protocols&lt;span class="special"&gt;.&lt;/span&gt;length &lt;span class="special"&gt;&amp;lt;&lt;/span&gt; kMaxProtocols&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      &lt;span class="keyword"&gt;let&lt;/span&gt; protocol &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;new&lt;/span&gt; wfProtocol&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
      protocol&lt;span class="special"&gt;.&lt;/span&gt;loadTask&lt;span class="special"&gt;(&lt;/span&gt;task&lt;span class="special"&gt;);&lt;/span&gt;
      &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_protocols&lt;span class="special"&gt;.&lt;/span&gt;push&lt;span class="special"&gt;(&lt;/span&gt;protocol&lt;span class="special"&gt;);&lt;/span&gt;
      &lt;span class="keyword"&gt;return&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt;
    &lt;span class="keyword"&gt;for&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;let&lt;/span&gt; i &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;0&lt;/span&gt;&lt;span
class="special"&gt;;&lt;/span&gt; i &lt;span class="special"&gt;&amp;lt;&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_protocols&lt;span class="special"&gt;.&lt;/span&gt;length&lt;span class="special"&gt;;&lt;/span&gt; i&lt;span class="special"&gt;++)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(!&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_protocols&lt;span class="special"&gt;[&lt;/span&gt;i&lt;span class="special"&gt;].&lt;/span&gt;isRunning&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
        &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_protocols&lt;span class="special"&gt;[&lt;/span&gt;i&lt;span class="special"&gt;].&lt;/span&gt;loadTask&lt;span class="special"&gt;(&lt;/span&gt;task&lt;span class="special"&gt;);&lt;/span&gt;
        &lt;span class="keyword"&gt;return&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
      &lt;span class="special"&gt;}&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_queuedTasks&lt;span class="special"&gt;.&lt;/span&gt;push&lt;span class="special"&gt;(&lt;/span&gt;task&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  getNextTask&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_queuedTasks&lt;span class="special"&gt;.&lt;/span&gt;length &lt;span class="special"&gt;&amp;gt;&lt;/span&gt; &lt;span class="constant"&gt;0&lt;/span&gt;&lt;span class="special"&gt;)&lt;/span&gt;
      &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_queuedTasks&lt;span class="special"&gt;.&lt;/span&gt;shift&lt;span class="special"&gt;();&lt;/span&gt;
    &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="keyword"&gt;null&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;tt&gt;runTask&lt;/tt&gt; method is designed to be called with a task object; for the core mailnews protocols, the equivalent calls come primarily from the service &lt;a href="#note-3.3"&gt;[3]&lt;/a&gt;. For now, I've hard-coded the maximum number of protocol objects, but it would probably be better to make this value configurable via a per-server preference.
&lt;/p&gt;&lt;p&gt;
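A per-server preference for the limit might look something like this sketch. The preference name &lt;tt&gt;max_protocols&lt;/tt&gt; is hypothetical. The server is any object with a &lt;tt&gt;getIntValue(name)&lt;/tt&gt; method, modeled on &lt;tt&gt;nsIMsgIncomingServer.getIntValue&lt;/tt&gt;, so that the fallback logic can be exercised outside of Thunderbird. I am not sure whether a missing preference shows up as an exception or as 0, so the sketch treats both cases as "use the default". &lt;tt&gt;runTask&lt;/tt&gt; would then compare against &lt;tt&gt;maxProtocolsFor(this)&lt;/tt&gt; instead of &lt;tt&gt;kMaxProtocols&lt;/tt&gt;.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
var kDefaultMaxProtocols = 2;

// Per-server protocol limit, falling back to the default when the
// preference is missing, unreadable, or not a positive integer.
function maxProtocolsFor(server) {
  var value;
  try {
    value = server.getIntValue("max_protocols"); // hypothetical pref name
  } catch (e) {
    return kDefaultMaxProtocols;
  }
  if (typeof value !== "number" || !(value > 0) || Math.floor(value) !== value)
    return kDefaultMaxProtocols;
  return value;
}
```
&lt;/pre&gt;&lt;p&gt;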
The core implementation of the protocol object for webscraping is not too difficult:&lt;/p&gt;
&lt;pre class="lang-js"&gt;&lt;span class="keyword"&gt;const&lt;/span&gt; kMaxLoads &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;4&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
&lt;span class="keyword"&gt;function&lt;/span&gt; wfProtocol&lt;span class="special"&gt;(&lt;/span&gt;server&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
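  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_server &lt;span class="special"&gt;=&lt;/span&gt; server&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="comment"&gt;// Give each protocol its own queue; an array on the prototype would be shared&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_urls &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;[];&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;
wfProtocol&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="comment"&gt;/// Queued URLs; first kMaxLoads are the currently running&lt;/span&gt;
  _urls&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;null&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt;
  &lt;span class="comment"&gt;/// The current task&lt;/span&gt;
  _task&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;null&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt;
  &lt;span class="comment"&gt;/// Checked by wfServer.runTask: busy while any URL is still outstanding&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; isRunning&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_urls&lt;span class="special"&gt;.&lt;/span&gt;length &lt;span class="special"&gt;&amp;gt;&lt;/span&gt; &lt;span class="constant"&gt;0&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;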
  &lt;span class="comment"&gt;/// Load the next URL; if all URLs are finished, finish the task&lt;/span&gt;
  onUrlLoaded&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;url&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_urls&lt;span class="special"&gt;.&lt;/span&gt;length &lt;span class="special"&gt;&amp;gt;&lt;/span&gt; kMaxLoads&lt;span class="special"&gt;)&lt;/span&gt;
      &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_urls&lt;span class="special"&gt;[&lt;/span&gt;kMaxLoads&lt;span class="special"&gt;].&lt;/span&gt;runUrl&lt;span class="special"&gt;();&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_urls&lt;span class="special"&gt;.&lt;/span&gt;shift&lt;span class="special"&gt;();&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_urls&lt;span class="special"&gt;.&lt;/span&gt;length &lt;span class="special"&gt;==&lt;/span&gt; &lt;span class="constant"&gt;0&lt;/span&gt;&lt;span class="special"&gt;)&lt;/span&gt;
      &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;finishTask&lt;span class="special"&gt;();&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="comment"&gt;/**
   * Queue the next URL to load.
   * Any extra arguments will be passed to the callback method.
   * The callback is called with this protocol as the this object.
   */&lt;/span&gt;
  loadUrl&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;url&lt;span class="special"&gt;,&lt;/span&gt; callback&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; closure &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; task &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;new&lt;/span&gt; UrlRunner&lt;span class="special"&gt;(&lt;/span&gt;url&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; argcalls &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="keyword"&gt;null&lt;/span&gt;&lt;span class="special"&gt;];&lt;/span&gt;
    &lt;span class="keyword"&gt;for&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;let&lt;/span&gt; i &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;2&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; i &lt;span class="special"&gt;&amp;lt;&lt;/span&gt; arguments&lt;span class="special"&gt;.&lt;/span&gt;length&lt;span class="special"&gt;;&lt;/span&gt; i&lt;span class="special"&gt;++)&lt;/span&gt;
      argcalls&lt;span class="special"&gt;.&lt;/span&gt;push&lt;span class="special"&gt;(&lt;/span&gt;arguments&lt;span class="special"&gt;[&lt;/span&gt;i&lt;span class="special"&gt;]);&lt;/span&gt;
    task&lt;span class="special"&gt;.&lt;/span&gt;onUrlLoad &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;dom&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      argcalls&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="constant"&gt;0&lt;/span&gt;&lt;span class="special"&gt;]&lt;/span&gt; &lt;span class="special"&gt;=&lt;/span&gt; dom&lt;span class="special"&gt;;&lt;/span&gt;
      callback&lt;span class="special"&gt;.&lt;/span&gt;apply&lt;span class="special"&gt;(&lt;/span&gt;closure&lt;span class="special"&gt;,&lt;/span&gt; argcalls&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="special"&gt;};&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_urls&lt;span class="special"&gt;.&lt;/span&gt;push&lt;span class="special"&gt;(&lt;/span&gt;task&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_urls&lt;span class="special"&gt;.&lt;/span&gt;length &lt;span class="special"&gt;&amp;lt;=&lt;/span&gt; kMaxLoads&lt;span class="special"&gt;)&lt;/span&gt;
      task&lt;span class="special"&gt;.&lt;/span&gt;runUrl&lt;span class="special"&gt;();&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="comment"&gt;/// Run the task&lt;/span&gt;
  loadTask&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;task&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_task &lt;span class="special"&gt;=&lt;/span&gt; task&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_task&lt;span class="special"&gt;.&lt;/span&gt;runTask&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="comment"&gt;/// Handle a completed task&lt;/span&gt;
  finishTask&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; task &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_server&lt;span class="special"&gt;.&lt;/span&gt;getNextTask&lt;span class="special"&gt;();&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_task&lt;span class="special"&gt;.&lt;/span&gt;onTaskCompleted&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;task&lt;span class="special"&gt;)&lt;/span&gt;
      &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;loadTask&lt;span class="special"&gt;(&lt;/span&gt;task&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;
&lt;span class="comment"&gt;/// An object that represents a URL to be run&lt;/span&gt;
&lt;span class="keyword"&gt;function&lt;/span&gt; UrlRunner&lt;span class="special"&gt;(&lt;/span&gt;url&lt;span class="special"&gt;,&lt;/span&gt; protocol&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_url &lt;span class="special"&gt;=&lt;/span&gt; url&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_protocol &lt;span class="special"&gt;=&lt;/span&gt; protocol&lt;span class="special"&gt;;&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;
UrlRunner&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  runUrl&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; real &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    asyncLoadDom&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_url&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;dom&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      real&lt;span class="special"&gt;.&lt;/span&gt;onUrlLoad&lt;span class="special"&gt;(&lt;/span&gt;dom&lt;span class="special"&gt;);&lt;/span&gt;
      real&lt;span class="special"&gt;.&lt;/span&gt;_protocol&lt;span class="special"&gt;.&lt;/span&gt;onUrlLoaded&lt;span class="special"&gt;(&lt;/span&gt;real&lt;span class="special"&gt;.&lt;/span&gt;_url&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="special"&gt;});&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  onUrlLoad&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;dom&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;The protocol is initialized by calling &lt;tt&gt;loadTask&lt;/tt&gt;, which calls &lt;tt&gt;runTask&lt;/tt&gt; on the task object. That method makes some calls to &lt;tt&gt;loadUrl&lt;/tt&gt;, which starts loading each URL right away as long as the maximum number of concurrent loads has not been reached. When a URL has been loaded, via &lt;tt&gt;UrlRunner&lt;span class="special"&gt;.&lt;/span&gt;runUrl&lt;/tt&gt;, its callback function is called first,
and then the &lt;tt&gt;onUrlLoaded&lt;/tt&gt; function is called to remove the URL from the queue and start any remaining ones. When this function detects that no more URLs are being loaded, it calls &lt;tt&gt;finishTask&lt;/tt&gt; on the task object. (This is why the callback runs before &lt;tt&gt;onUrlLoaded&lt;/tt&gt;: any URLs the callback queues are already counted when the check happens.)
&lt;/p&gt;&lt;p&gt;
The workings of &lt;tt&gt;loadUrl&lt;/tt&gt; deserve special mention. The first argument is the URL (as a string) to be loaded. The second argument is the method on &lt;tt&gt;wfProtocol&lt;/tt&gt; to be called when the URL is loaded, which implies that the code actually implementing tasks lives mostly on &lt;tt&gt;wfProtocol&lt;/tt&gt; rather than on the task objects. Any further arguments are passed through to that callback; the callback's first argument is always the DOM document.
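&lt;/p&gt;&lt;p&gt;
To make the queueing behavior concrete, here is a minimal sketch in plain JavaScript. It is not the extension's code: the &lt;tt&gt;UrlQueue&lt;/tt&gt; class, its &lt;tt&gt;maxConcurrent&lt;/tt&gt; limit, the &lt;tt&gt;onIdle&lt;/tt&gt; hook (playing the role of &lt;tt&gt;finishTask&lt;/tt&gt;), and the injected &lt;tt&gt;loader(url, done)&lt;/tt&gt; function (standing in for &lt;tt&gt;asyncLoadDom&lt;/tt&gt;) are all assumptions made for illustration. It caps the number of loads in flight, forwards extra &lt;tt&gt;loadUrl&lt;/tt&gt; arguments to the callback after the DOM, and reports when nothing is loading or waiting.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Sketch only: a URL queue with a concurrency cap, loosely modeled on the
// loadUrl / onUrlLoaded / finishTask flow described above.
class UrlQueue {
  constructor(loader, maxConcurrent, onIdle) {
    this.loader = loader;               // assumed: loader(url, done), done(dom)
    this.maxConcurrent = maxConcurrent; // most loads allowed in flight at once
    this.onIdle = onIdle;               // stands in for finishTask
    this.pending = [];                  // queued loads waiting for a slot
    this.active = 0;                    // loads currently in flight
  }

  // callback(dom, ...extra) runs once url has loaded.
  loadUrl(url, callback, ...extra) {
    this.pending.push({ url, callback, extra });
    this._pump();
  }

  // Start queued loads until the cap is reached.
  _pump() {
    while (this.pending.length > 0) {
      if (this.active >= this.maxConcurrent) return;
      const job = this.pending.shift();
      this.active++;
      this.loader(job.url, (dom) => {
        job.callback(dom, ...job.extra); // callback first, like onUrlLoad
        this._onUrlLoaded();             // then release the slot
      });
    }
  }

  _onUrlLoaded() {
    this.active--;
    this._pump();
    // Finished only when nothing is loading and nothing is waiting.
    if (this.active === 0) {
      if (this.pending.length === 0) this.onIdle();
    }
  }
}
```
&lt;/pre&gt;&lt;p&gt;
Because the per-URL callback runs before its slot is released, any URLs it queues are already pending when the idle check happens, so the task is not finished prematurely.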
&lt;/p&gt;&lt;h4&gt;Notes&lt;/h4&gt;&lt;ol&gt;
&lt;li id="note-3.1"&gt;Well, there is an &lt;tt&gt;nsIDOMParser&lt;/tt&gt; which can turn text into a DOM without needing a document object. Unfortunately, it only supports XML. &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=102699"&gt;There is a patch for making it parse HTML&lt;/a&gt;, but it has gotten no traction in recent months.&lt;/li&gt;
&lt;li id="note-3.2"&gt;Just to muddle it all up, the URL instances in most mailnews implementations are actually how the &lt;b&gt;tasks&lt;/b&gt; are implemented, although I internally use a URL to represent a &lt;b&gt;state&lt;/b&gt; (kind of). A potentially clarifying discussion can be found &lt;a href="news://news.mozilla.org/P8qdnYmfVMg5l0LWnZ2dnUVZ_hSdnZ2d@mozilla.org"&gt;in mozilla.dev.apps.thunderbird&lt;/a&gt;.&lt;/li&gt;
&lt;li id="note-3.3"&gt;I am not totally happy with the current model of the protocol system in mailnews, particularly with the technique of crossing over to the service to make the calls to the protocol. In my implementation, I've made those functions static functions on the protocol object. Since this is somewhat different from the current
implementations and I'm not sure I want to keep this, I've couched my statements of how things work.&lt;/li&gt;
&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-421058381234702501?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/421058381234702501/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=421058381234702501' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/421058381234702501'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/421058381234702501'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/05/developing-new-account-types-part-3.html' title='Developing new account types, Part 3: Updating folders (part 1)'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-7558947156950584412</id><published>2010-04-27T21:45:00.003-04:00</published><updated>2010-04-27T21:56:35.305-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>A new folder tree view for real</title><content type='html'>Seeing as how the first build candidates of Thunderbird 3.1 beta 2 are currently being spun, it is time for me to update from the 3.0 builds to 3.1 builds (I have a policy of switching to the next branch of Thunderbird as my primary at the time of the last beta). 
I decided to take this opportunity to work out issues in &lt;a href="http://quetzalcoatal.blogspot.com/2010/03/new-folder-tree-view.html"&gt;my folder categories extension&lt;/a&gt; for real.
&lt;/p&gt;&lt;p&gt;
I've decided to give up on supporting 3.0 (trying to support two different broken versions of code is not fun), so the oldest supported version is now listed as 3.1 beta 2. In reality, it would work with any nightly since &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=554558"&gt;the broken code&lt;/a&gt; was fixed (it's still not fully fixed now, but it's just a cosmetic issue). The result is &lt;a href="https://addons.mozilla.org/en-US/thunderbird/addon/156142"&gt;an experimental addon&lt;/a&gt; on addons.mozilla.org. For those of you dying for screenshots, &lt;a href="http://foldercat.mozdev.org/screenshots.html"&gt;this page on mozdev&lt;/a&gt; should satiate you.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-7558947156950584412?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/7558947156950584412/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=7558947156950584412' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7558947156950584412'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7558947156950584412'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/04/new-folder-tree-view-for-real.html' title='A new folder tree view for real'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6817086371556825810</id><published>2010-04-11T07:47:00.002-04:00</published><updated>2010-04-11T08:08:30.635-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='codecoverage'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><title type='text'>Animated code coverage</title><content type='html'>I recently spent a fair amount of time collecting historical code coverage data for Thunderbird; the result is 312 distinct files of raw lcov data covering the first year of Thunderbird in a mercurial repository. I also recently wrote a program that makes a treemap for each day (thanks to the &lt;tt&gt;geninfo&lt;/tt&gt; man page and &lt;a href="http://treemap.sourceforge.net/"&gt;this treemap library&lt;/a&gt;), and then wrote another program to convert that treemap into a static PNG image:
&lt;/p&gt;&lt;pre&gt;
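// view: the treemap component built for one day's coverage data
view.setSize(1920,1080);
BufferedImage image = new BufferedImage(view.getWidth(),
  view.getHeight(), BufferedImage.TYPE_INT_RGB);
// Paint the component directly into the off-screen image
view.paint(image.getGraphics());
ImageIO.write(image, "png", new File(args[1]));  // args[1]: output PNG path
// Exit explicitly; AWT threads could otherwise keep the JVM running
System.exit(0);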
&lt;/pre&gt;&lt;p&gt;
I ran that tool to create images for every single day, and then I made another short script to add dates to each of the images (ImageMagick works really well here):
&lt;/p&gt;&lt;pre&gt;
DATE=$(echo $1 | cut -d'.' -f1)
convert -fill "#aaa" -pointsize 50 label:"$DATE" /tmp/label.png
composite -compose Multiply -gravity southwest /tmp/label.png $1 anno-$1
&lt;/pre&gt;&lt;p&gt;
Now, with 312 images on hand, I decided to make them into a video:
&lt;/p&gt;&lt;pre&gt;
mencoder mf://out/anno-*.png -mf w=1920:h=1080 -ovc lavc -lavcopts vcodec=ffv1 -of avi -ofps 3 -o output.avi
&lt;/pre&gt;&lt;p&gt;
I then converted the high-def, lossless AVI into an Ogg file, and produced the following animated video of historical code coverage:
&lt;/p&gt;&lt;video src="http://www.prism.gatech.edu/~jcranmer3/ccov-old.ogv" controls&gt;
Use a web browser that supports playing Ogg videos. It's the least you could do, considering how much time converting the AVI to Ogg took!
&lt;/video&gt;&lt;p&gt;
Okay, so no sound yet for the animation&amp;mdash;the encoding is painful enough that I don't want to try it out right now. I also didn't filter out any of the days where the tests failed early, so you will occasionally see flashes of red. The data also doesn't have recent stuff (I am holding off until I can figure out how to run mozmill tests and get JS code coverage). Anyways, enjoy!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6817086371556825810?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6817086371556825810/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6817086371556825810' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6817086371556825810'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6817086371556825810'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/04/animated-code-coverage.html' title='Animated code coverage'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8353056232333379238</id><published>2010-04-04T16:23:00.005-04:00</published><updated>2010-04-04T16:39:01.987-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='accttype'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category 
scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Developing new account types, Part 2: Message lists</title><content type='html'>This series of blog posts discusses the creation of a new account type implemented in JavaScript. Over the course of these blogs, I use the development of my &lt;a
href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/webfora"&gt;Web Forums extension&lt;/a&gt; to
explain the steps needed to create a new account type. I hope to add a new post every two weeks (though I cannot guarantee it).
&lt;/p&gt;&lt;p&gt;
In &lt;a href="http://quetzalcoatal.blogspot.com/2010/02/developing-new-account-types-part-1.html"&gt;the previous blog post&lt;/a&gt;, I showed how to get an account displayed in the folder pane. Now, we will prepare the necessary components of getting an empty message list displayed in the folder pane.
&lt;/p&gt;&lt;h4&gt;Database basics&lt;/h4&gt;&lt;p&gt;
As mentioned previously, the database is one of the key components of an account. It is essentially the object that stores the state of the messages in a folder, and even some of the folder's own attributes. The database is currently backed by a Mork database (the .msf files you see in your profile directory); in principle, you could build your own database from scratch without Mork, but that is likely a very bad idea. &lt;a href="#note-2.1"&gt;[1]&lt;/a&gt;
&lt;/p&gt;&lt;p&gt;
Originally, as I understand it, the database was merely a cache of the data in the actual mailbox. Its purpose was to store the data needed to drive the user interface, so that the potentially large mailbox did not have to be reparsed every time you started Netscape. The
implicit assumption was that blowing away the database was more or less lossless. Times have changed, and such actions are no longer lossless: pretty much every per-folder or finer-grained property is stored in the message database, and in many cases these properties are stored nowhere else.
&lt;/p&gt;&lt;p&gt;
The database itself is represented by the &lt;tt&gt;nsIMsgDatabase&lt;/tt&gt; interface. Messages and threads are represented by the &lt;tt&gt;nsIMsgDBHdr&lt;/tt&gt; and &lt;tt&gt;nsIMsgThread&lt;/tt&gt; interfaces, respectively. Per-folder property stores are represented by &lt;tt&gt;nsIDBFolderInfo&lt;/tt&gt;. Finally, databases are opened through &lt;tt&gt;nsIMsgDBService&lt;/tt&gt;. Most of the database
machinery just works; a subclass would need to override only a few of the default
methods.
&lt;/p&gt;&lt;h4&gt;Getting databases&lt;/h4&gt;&lt;p&gt;
There are two main entry points for getting databases: &lt;tt&gt;msgDatabase&lt;/tt&gt; and &lt;tt&gt;getDBFolderInfoAndDB&lt;/tt&gt;. Both of these must be implemented for anything to work; in the code below, the former is supported through the &lt;tt&gt;getDatabase&lt;/tt&gt; method:
&lt;/p&gt;&lt;pre class="lang-js"&gt;
wfFolder&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  getDatabase&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span
class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"#mDatabase"&lt;/span&gt;&lt;span class="special"&gt;])&lt;/span&gt;
      &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"#mDatabase"&lt;/span&gt;&lt;span class="special"&gt;];&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; dbService &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/msgDatabase/msgDBService;1"&lt;/span&gt;&lt;span class="special"&gt;]&lt;/span&gt;
                      &lt;span class="special"&gt;.&lt;/span&gt;getService&lt;span class="special"&gt;(&lt;/span&gt;&lt;span
class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIMsgDBService&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; db&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;try&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      db &lt;span class="special"&gt;=&lt;/span&gt; dbService&lt;span class="special"&gt;.&lt;/span&gt;openFolderDB&lt;span
class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="keyword"&gt;false&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt; &lt;span class="keyword"&gt;catch&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;e&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      db &lt;span class="special"&gt;=&lt;/span&gt; dbService&lt;span class="special"&gt;.&lt;/span&gt;createNewDB&lt;span
class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"#mDatabase"&lt;/span&gt;&lt;span class="special"&gt;]&lt;/span&gt; &lt;span class="special"&gt;=&lt;/span&gt; db&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;return&lt;/span&gt; db&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  getDBFolderInfoAndDB&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span
class="special"&gt;(&lt;/span&gt;folderInfo&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; db &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;getDatabase&lt;span class="special"&gt;();&lt;/span&gt;
    folderInfo&lt;span class="special"&gt;.&lt;/span&gt;value &lt;span class="special"&gt;=&lt;/span&gt; db&lt;span class="special"&gt;.&lt;/span&gt;dBFolderInfo&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;return&lt;/span&gt; db&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;
This portion of the code can turn out to be surprisingly complicated. What is shown here is generally the safe option: if the database is invalid (out of date or nonexistent), blow it away and later rebuild the information from other sources. The database is recreated in the &lt;tt&gt;catch&lt;/tt&gt; block. We then cache the database in the member variable (which the &lt;tt&gt;nsMsgDBFolder&lt;/tt&gt; code also uses) and return it. Retrieving the folder info should be self-explanatory.
&lt;/p&gt;&lt;p&gt;You may notice that when the database is invalid, all we do is create a new one: we don't try to repair it. That is because these calls need a usable database quickly: this is one of the calls the folder pane makes, and it is synchronous. Imagine what would happen if, say, a local folder with a 3GiB backing store had to be reparsed during this call. The actual recovery of the database would most likely happen when the folder is told to update.
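&lt;/p&gt;&lt;p&gt;
The split between a fast, synchronous getter and a slower recovery step can be sketched in plain JavaScript. Everything here is hypothetical: &lt;tt&gt;LazyFolderDB&lt;/tt&gt;, the &lt;tt&gt;service&lt;/tt&gt; object with &lt;tt&gt;open&lt;/tt&gt; and &lt;tt&gt;create&lt;/tt&gt; methods (loose stand-ins for &lt;tt&gt;openFolderDB&lt;/tt&gt; and &lt;tt&gt;createNewDB&lt;/tt&gt;), and the &lt;tt&gt;reparse&lt;/tt&gt; function are illustrations, not mailnews APIs. The getter only opens or creates a database and records whether it needs rebuilding; the expensive reparse waits until &lt;tt&gt;update&lt;/tt&gt; is called.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Sketch only: LazyFolderDB, service.open/create and reparse are
// hypothetical stand-ins, not mailnews APIs.
class LazyFolderDB {
  constructor(service, reparse) {
    this.service = service; // assumed: open(name) throws if stale; create(name)
    this.reparse = reparse; // assumed: slowly refills a db from the backing store
    this.db = null;         // cached database, like _inner["#mDatabase"]
    this.needsRebuild = false;
  }

  // Fast and synchronous: never reparses anything.
  getDatabase(name) {
    if (this.db) return this.db;
    try {
      this.db = this.service.open(name);
    } catch (e) {
      this.db = this.service.create(name); // cheap, empty database
      this.needsRebuild = true;            // recover it later
    }
    return this.db;
  }

  // Slow path, run when the folder is told to update.
  update(name) {
    const db = this.getDatabase(name);
    if (this.needsRebuild) {
      this.reparse(db);
      this.needsRebuild = false;
    }
    return db;
  }
}
```
&lt;/pre&gt;&lt;p&gt;
In a real folder, the equivalent of &lt;tt&gt;update&lt;/tt&gt; would presumably be &lt;tt&gt;updateFolder&lt;/tt&gt;, which is free to do its work asynchronously.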
&lt;/p&gt;&lt;p&gt;
Other work can be added to these calls, since not everything is necessarily stored in the database: news folders, for example, store their read information in the newsrc file, so they also need to sync it with the
database in this method.
&lt;/p&gt;&lt;h4&gt;Displaying an empty message list&lt;/h4&gt;&lt;p&gt;
If you just implement this code and run it, you will discover that it is not sufficient to load the database. The key is the &lt;tt&gt;getIncomingServerType&lt;/tt&gt; function, which tells the database service which implementation of &lt;tt&gt;nsIMsgDatabase&lt;/tt&gt; to use. For now, we can just use the default &lt;tt&gt;nsMsgDatabase&lt;/tt&gt; implementation, but we can't change the value this function returns (otherwise URIs will get messed up). The solution is to create a DB proxy:
&lt;/p&gt;&lt;pre class="lang-js"&gt;
&lt;span class="keyword"&gt;function&lt;/span&gt; wfDatabase&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{}&lt;/span&gt;
wfDatabase&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  contractID&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="string"&gt;"@mozilla.org/nsMsgDatabase/msgDB-&lt;/span&gt;&lt;span class="special"&gt;webforum&lt;/span&gt;&lt;span class="string"&gt;"&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt;
  _xpcom_factory&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    createInstance&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span
class="special"&gt;(&lt;/span&gt;outer&lt;span class="special"&gt;,&lt;/span&gt; iid&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
     &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;outer&lt;span class="special"&gt;)&lt;/span&gt;
        &lt;span class="keyword"&gt;throw&lt;/span&gt; &lt;span class="type"&gt;Cr&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;NS_ERROR_NO_AGGREGATION&lt;span class="special"&gt;;&lt;/span&gt;
      &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/nsMsgDatabase/msgDB-default"&lt;/span&gt;&lt;span class="special"&gt;].&lt;/span&gt;createInstance&lt;span class="special"&gt;(&lt;/span&gt;iid&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt;
  &lt;span class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;
What this does is use a custom XPCOM factory so that creating an instance of one contract ID actually creates an instance of the other. I have not yet used the extend-C++-in-JS glue to allow subclassing &lt;tt&gt;nsMsgDatabase&lt;/tt&gt;, because the &lt;tt&gt;nsIMsgDatabase&lt;/tt&gt; interface is more complicated than the others, its code is more C++-specific, and overriding its methods is generally less
useful.
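&lt;/p&gt;&lt;p&gt;
The forwarding trick can be illustrated outside XPCOM with a toy registry. This sketch only mimics the shape of a factory's &lt;tt&gt;createInstance(outer, iid)&lt;/tt&gt;; the registry, &lt;tt&gt;register&lt;/tt&gt;, &lt;tt&gt;createByContractID&lt;/tt&gt;, and &lt;tt&gt;dbContractID&lt;/tt&gt; are hypothetical, and building the contract ID by appending the server type is an assumption based on the &lt;tt&gt;msgDB-webforum&lt;/tt&gt; and &lt;tt&gt;msgDB-default&lt;/tt&gt; IDs used above.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Toy registry (not XPCOM): contract ID -> factory with createInstance(outer, iid).
const registry = new Map();

function register(contractID, factory) {
  registry.set(contractID, factory);
}

function createByContractID(contractID, iid) {
  const factory = registry.get(contractID);
  if (!factory) throw new Error("no factory for " + contractID);
  return factory.createInstance(null, iid);
}

// Assumed mapping from a server type to a database contract ID.
function dbContractID(serverType) {
  return "@mozilla.org/nsMsgDatabase/msgDB-" + serverType;
}

// Stand-in for the built-in implementation.
register(dbContractID("default"), {
  createInstance(outer, iid) {
    return { impl: "default", iid: iid };
  },
});

// The proxy: a request for msgDB-webforum is answered by the default factory.
register(dbContractID("webforum"), {
  createInstance(outer, iid) {
    if (outer) throw new Error("NS_ERROR_NO_AGGREGATION");
    return createByContractID(dbContractID("default"), iid);
  },
});
```
&lt;/pre&gt;&lt;p&gt;
Callers asking for the &lt;tt&gt;webforum&lt;/tt&gt; database thus receive an ordinary default database, while the server type, and hence the folder URIs, stay unchanged.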
&lt;/p&gt;&lt;p&gt;
The next step toward displaying the list is to write a minimal implementation of &lt;tt&gt;updateFolder&lt;/tt&gt; that immediately reports the folder as loaded (the default implementation, for some reason, does not do this &lt;a href="#note-2.2"&gt;[2]&lt;/a&gt;):
&lt;/p&gt;&lt;pre class="lang-js"&gt;
updateFolder&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;msgwindow&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;.&lt;/span&gt;NotifyFolderEvent&lt;span class="special"&gt;(&lt;/span&gt;atoms&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"FolderLoaded"&lt;/span&gt;&lt;span class="special"&gt;]);&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;
Here, &lt;tt&gt;atoms&lt;/tt&gt; is merely an associative array containing the atoms the code needs. The end result of all of these changes is the following screenshot:
&lt;/p&gt;
&lt;img src="http://4.bp.blogspot.com/_qW4UNslWKZU/S7j4FSwIlkI/AAAAAAAAACI/RUSi5iRVXUo/s400/Database.png" alt="The database of an empty folder" id="BLOGGER_PHOTO_ID_5456383718376117826" /&gt;&lt;p&gt;
In the next part, I'll cover how to replace that screenshot with one containing an actual folder list.
&lt;/p&gt;&lt;h4&gt;Notes&lt;/h4&gt;
&lt;ol&gt;
&lt;li id="note-2.1"&gt;As annoying as it would be, implementing &lt;tt&gt;nsIMsgIncomingServer&lt;/tt&gt; or &lt;tt&gt;nsIMsgFolder&lt;/tt&gt; from scratch is still somewhat feasible. I don't think the same holds true for &lt;tt&gt;nsIMsgDatabase&lt;/tt&gt; (or the other database helper interfaces): &lt;tt&gt;static_cast&lt;/tt&gt;s permeate the code here, with the note that it is a "closed system, cast ok".&lt;/li&gt;
&lt;li id="note-2.2"&gt;If you're wondering why this post took so long to be produced, this is a major reason why. It turns out that not having this implementation causes the folder display to not display the database load, so it just displayed the server page with the server name changed to the folder name. That, on top of having no time to debug it.&lt;/li&gt;
&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8353056232333379238?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8353056232333379238/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8353056232333379238' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8353056232333379238'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8353056232333379238'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/04/developing-new-account-types-part-2.html' title='Developing new account types, Part 2: Message lists'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_qW4UNslWKZU/S7j4FSwIlkI/AAAAAAAAACI/RUSi5iRVXUo/s72-c/Database.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6649176240635673169</id><published>2010-04-02T12:02:00.005-04:00</published><updated>2010-04-02T12:39:56.048-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='codecoverage'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><title type='text'>Code coverage to the extreme</title><content type='html'>If all goes well, sometime tonight I will have completed 362 builds of Thunderbird, one for each day 
from July 24, 2008 to July 23, 2009 excluding August 3, 2008, December 25, 2008, and July 4, 2009 (more may turn up as I get more data; bonus points if you can figure out the significance of each of those days!). Included for each build is either a build log to tell me why the build failed or a test log telling me what ran. Also included is a copy of the Thunderbird code coverage data.
&lt;/p&gt;&lt;p&gt;
What, you may ask, do I intend to do with a year's worth of code coverage data? I intend to use it to help answer some questions I have about our code coverage. I've already explored a more general overview of code coverage data (see &lt;a href="http://quetzalcoatal.blogspot.com/2010/03/visualizing-code-coverage.html"&gt;my last post&lt;/a&gt; for more details). Now, I want to ask questions such as these:
&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;Whose code is not covered?&lt;/li&gt;
&lt;li&gt;Who is adding code right now without making sure to cover it?&lt;/li&gt;
&lt;li&gt;Whose tests are responsible for most improving code coverage?&lt;/li&gt;
&lt;li&gt;How is code coverage being impacted over time?&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;
My approach to these questions involves taking snapshots of the code coverage data over time. That, however, proves to be a little more difficult than you'd imagine. First, as far as I can tell, hg doesn't support "update to what the repo looked like at this time" (&lt;tt&gt;hg up -d&lt;/tt&gt; goes to the revision that best matches that date, not to a snapshot at that time), so I had to write a few scripts to pull out the revisions to look at. Second, gloda ruined some of this data. Fortunately, those builds are easy to spot from their &amp;lt;1KB log files complaining about a missing client.mk. Then there's the issue of my revision logs containing m-c data, not m-1.9.1, so I have to work around Thunderbird's client.py trying to pull a different revision.
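&lt;/p&gt;&lt;p&gt;
A minimal sketch of that date-to-revision step follows; it is not the actual scripts. It assumes the pushes are available as objects with an ISO 8601 UTC &lt;tt&gt;time&lt;/tt&gt; string and a &lt;tt&gt;rev&lt;/tt&gt;, sorted oldest first, and &lt;tt&gt;snapshotRevisions&lt;/tt&gt; is a hypothetical name. For each day, it picks the newest revision pushed no later than the end of that day, which is exactly the snapshot that &lt;tt&gt;hg up -d&lt;/tt&gt; does not give you.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Sketch: map each day ("YYYY-MM-DD", ascending) to the newest revision
// pushed at or before the end of that day (UTC assumed).
// pushes: [{ time: "2008-07-24T13:05:00Z", rev: "..." }, ...], oldest first.
function snapshotRevisions(pushes, days) {
  const result = new Map();
  let i = 0;           // index of the next push not yet consumed
  let current = null;  // newest revision seen so far
  for (const day of days) {
    const endOfDay = day + "T23:59:59Z";
    // ISO strings in the same format compare correctly as plain strings.
    while (i !== pushes.length) {
      if (pushes[i].time > endOfDay) break;
      current = pushes[i].rev;
      i++;
    }
    if (current !== null) result.set(day, current);
  }
  return result;
}
```
&lt;/pre&gt;&lt;p&gt;
Days before the first push simply get no entry, and quiet days inherit the previous day's revision, as a true snapshot should.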
&lt;/p&gt;&lt;p&gt;
Another source of trouble was actually building and running everything. The computers I'm doing this on are all 64-bit Linux. There are a few m-c revisions that break 64-bit builds, and libthebes and gcov just can't seem to work together on 64-bit Linux. Plus, libpango has some breaking API changes between 2.22 and 2.24. One of the XPCOM tests seems to crash and sit there with a prompt saying "Do you want to debug me?" Finally, the test plugins seem to cause massive test failures due to assertions. On top of that, these machines don't have lcov installed, and I don't have sudo privileges (so I'm not running mozmill tests yet).
&lt;/p&gt;&lt;p&gt;
In short, it's somewhat surprising to me that this actually works. Just looking at the generated output shows some coarse trends: between October 2008 and June 2009, the compressed test log files grew sixfold in size, and the compressed lcov output nearly doubled over the same period. Lcov also reported that coverage increased from about 20% to around 40%.
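&lt;/p&gt;&lt;p&gt;
The percentages lcov reports are derived from its tracefiles. As a sketch, assuming the tracefile format described in the &lt;tt&gt;geninfo&lt;/tt&gt; man page, where each source file's record carries an &lt;tt&gt;LF:&lt;/tt&gt; (lines found) and an &lt;tt&gt;LH:&lt;/tt&gt; (lines hit) line, the overall line coverage of one &lt;tt&gt;$REV.info&lt;/tt&gt; file could be computed like this (&lt;tt&gt;lineCoverage&lt;/tt&gt; is a hypothetical helper, not part of lcov):
&lt;/p&gt;&lt;pre class="lang-js"&gt;
```javascript
// Sketch: overall line coverage (0..1) of an lcov tracefile's text,
// summing the per-file LF (lines found) and LH (lines hit) records.
function lineCoverage(infoText) {
  let found = 0;
  let hit = 0;
  for (const line of infoText.split("\n")) {
    if (line.startsWith("LF:")) found += Number(line.slice(3));
    else if (line.startsWith("LH:")) hit += Number(line.slice(3));
  }
  return found === 0 ? 0 : hit / found; // avoid dividing by zero
}
```
&lt;/pre&gt;&lt;p&gt;
Running it over each day's tracefile yields one number per day, which is all that is needed to chart the trend.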
&lt;/p&gt;&lt;p&gt;
Later, I hope to get mozmill tests working, as well as to make the JS code coverage actually work for Thunderbird (it chokes on E4X, and on some other files for no apparent reason). Since jscoverage works by modifying the JS code, I can run it without really needing the builds (archived nightlies plus tricking the build system will work). When all that data is collected, or sometime before, I'll make a nice little web app that shows all of this information so people can gawp at pretty pictures.
&lt;/p&gt;&lt;p&gt;
If you want to try this on your own, here is the shell script I used to actually collect data:
&lt;/p&gt;&lt;pre&gt;
#!/bin/bash

if [ -z "$1" ]; then
    echo "Need a date to build"
    exit 1
fi
DATE=$1

REV=$(grep $DATE comm-revs.log | cut -d' ' -f 3)
MOZREV=$(grep $DATE moz-revs.log | cut -d' ' -f 3)

if [ -z $REV -o -z $MOZREV ]; then
    echo "Illegal date"
    exit 2
fi

echo "Updating to $REV"
hg -R src update -r "$REV"
hg -R src/mozilla update -r "$MOZREV"
pushd src
python client.py --skip-comm --skip-mozilla checkout &amp;amp;&gt; ../config-$REV.log
make -f client.mk configure &amp;amp;&gt; ../config-$REV.log
popd

pushd obj/mozilla
#make -C .. clean &amp;amp;&gt;/dev/null
for f in $(ls config/autoconf.mk nsprpub/config/autoconf.mk js/src/config/autoconf.mk); do
    sed -e 's/-fprofile-arcs -ftest-coverage//' -e 's/-lgcov//' -i $f
done
echo "Building mozilla..."
make -j3 &amp;amp;&gt; ../../build-$REV.log
popd

pushd src
echo "Building comm-central..."
make -f client.mk build &amp;amp;&gt; ../build-$REV.log || exit
popd

LCOV=lcov-1.8/bin/lcov
$LCOV -z -d obj
pushd obj/
echo "Running tests..."
rm -f mozilla/dist/bin/plugins/*
make -k check &amp;amp;&gt;../tests-$REV.log
make -k xpcshell-tests 2&gt;&amp;amp;1 &gt;&gt;../tests-$REV.log
popd
$LCOV -c -d obj -o $REV.info
echo 'Done!'
&lt;/pre&gt;&lt;p&gt;
Don't bother complaining to me if it doesn't work for you. I just did what I needed to do to get it working reliably. And be prepared to wait a few hours to collect any non-trivial number of builds. It took me about 12 hours to get 6 months' worth of data using 6 different computers; the run for the next 6 months is still going on right now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6649176240635673169?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6649176240635673169/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6649176240635673169' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6649176240635673169'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6649176240635673169'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/04/code-coverage-to-extreme.html' title='Code coverage to the extreme'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8126026725036121225</id><published>2010-03-25T23:12:00.005-04:00</published><updated>2010-04-12T08:50:46.671-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='codecoverage'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' 
term='mozilla'/><title type='text'>Visualizing code coverage</title><content type='html'>One recent goal of Thunderbird development has been to increase test coverage. Murali Nandigama has prepared a nice document on &lt;a href="https://wiki.mozilla.org/QA:CodeCoverage"&gt;getting code coverage data&lt;/a&gt;. Following it for just the xpcshell tests on a recent build gave me &lt;a href="http://www.tjhsst.edu/~jcranmer/c-ccov/"&gt;this output&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
So the output of LCOV (which does the post-processing) is passable. With enough clicks, I can figure out which lines are being covered and which ones aren't. But if you step back and try to look at the big picture&amp;hellip; that's hard to do. Some directories sure seem good at code coverage: I mean, we hit both the lines of code in there. On the other hand, we seem bad at covering IMAP, only hitting around 11,000 lines of code (note the difference of scale). There's got to be a better way to see the big picture.
&lt;/p&gt;&lt;p&gt;
The answer I came up with was to use &lt;a href="http://en.wikipedia.org/wiki/Treemapping"&gt;a treemap&lt;/a&gt;. Basically, treemaps are a good way to display two key attributes of the leaves of a tree at once: one as color, the other as size (actually, under certain conditions you can probably squeeze in a third attribute by varying hue and saturation independently, but I'm not going that far here). In this case, the hierarchy is the folder hierarchy under mailnews (I'm not interested in m-c coverage), the leaves are individual files, the size is the number of functions in a file, and the color is the fraction of those functions that are covered. The result, using the same coverage data, is the following graphic:
&lt;/p&gt;
&lt;a href="http://2.bp.blogspot.com/_qW4UNslWKZU/S6wqgQEeIZI/AAAAAAAAACA/X_bXoK5t2tU/s1600/gcovmap.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 242px;" src="http://2.bp.blogspot.com/_qW4UNslWKZU/S6wqgQEeIZI/AAAAAAAAACA/X_bXoK5t2tU/s400/gcovmap.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5452779982396727698" /&gt;&lt;/a&gt;
&lt;p&gt;
I've also taken the liberty of labeling the top-level directories so you can read them even without mouseover support. Immediately, you can see some interesting points about mailnews:
&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;The IMAP code is the largest of the protocol code in terms of functions by a considerable amount. Local code, NNTP, and compose are all roughly equal in size by the same metric.&lt;/li&gt;
&lt;li&gt;Some of the extensions (SMIME and MDN, actually) are not tested at all. Import code is also poorly tested.&lt;/li&gt;
&lt;li&gt;Some of the MIME code is well tested; some of it isn't. In fact, it's hard to test one function in libmime without testing half the functions in its file. Perhaps we should have more encrypted messages in our tests?&lt;/li&gt;
&lt;li&gt;Speaking of libmime, its code is spread out across several files. In other components, functions are concentrated in fewer files: specifically, the protocol, server, and folder files. Wonder why? :-)&lt;/li&gt;
&lt;li&gt;nsAbCardProperty is quite well tested. LDAP files are not. RDF files everywhere are pretty poorly tested.&lt;/li&gt;
&lt;/ul&gt;
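&lt;p&gt;
The treemap layout itself is simple to compute. The JavaScript sketch below uses the slice-and-dice algorithm, which splits each rectangle among its children in proportion to their sizes and alternates between horizontal and vertical cuts at each level; that choice is an assumption for illustration, and the picture above may have been drawn with a different layout. The input tree and its numbers are made up, and &lt;tt&gt;size&lt;/tt&gt; and &lt;tt&gt;layout&lt;/tt&gt; are hypothetical names. With real data, the per-file function counts could come from the &lt;tt&gt;FNF&lt;/tt&gt; (functions found) and &lt;tt&gt;FNH&lt;/tt&gt; (functions hit) records in lcov's tracefiles.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Slice-and-dice treemap layout. Directories have `children`; files (the
// leaves) have `functions` (total) and `covered` counts.
function size(node) {
  return node.children
    ? node.children.reduce((sum, child) => sum + size(child), 0)
    : node.functions;
}

// Returns one rectangle per file. Area is proportional to the number of
// functions; `ratio` (covered / total) is what would choose the color.
function layout(node, x, y, w, h, vertical = false, out = []) {
  if (!node.children) {
    const ratio = node.functions === 0 ? 0 : node.covered / node.functions;
    out.push({ name: node.name, x, y, w, h, ratio });
    return out;
  }
  const total = size(node);
  let offset = 0; // fraction of this rectangle already handed out
  for (const child of node.children) {
    const share = total === 0 ? 0 : size(child) / total;
    if (vertical) {
      layout(child, x, y + offset * h, w, share * h, !vertical, out);
    } else {
      layout(child, x + offset * w, y, share * w, h, !vertical, out);
    }
    offset += share;
  }
  return out;
}

// Hypothetical numbers, not the real mailnews coverage data.
const tree = {
  name: "mailnews",
  children: [
    { name: "imap", children: [{ name: "a.cpp", functions: 60, covered: 15 }] },
    { name: "news", children: [
      { name: "b.cpp", functions: 30, covered: 30 },
      { name: "c.cpp", functions: 10, covered: 0 },
    ] },
  ],
};

for (const r of layout(tree, 0, 0, 100, 100)) {
  console.log(r.name, r.x, r.y, r.w, r.h, r.ratio);
}
&lt;/pre&gt;&lt;p&gt;
Slice-and-dice is easy to follow, but it tends to produce long, thin rectangles in wide or deep trees; squarified layouts trade some simplicity for more readable aspect ratios.
&lt;/p&gt;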
&lt;p&gt;I suppose I should also see how mozmill tests change these results. I'd also like to see how this changes over the history of hg. I can provide the source code to people on request, too.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8126026725036121225?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8126026725036121225/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8126026725036121225' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8126026725036121225'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8126026725036121225'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/03/visualizing-code-coverage.html' title='Visualizing code coverage'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_qW4UNslWKZU/S6wqgQEeIZI/AAAAAAAAACA/X_bXoK5t2tU/s72-c/gcovmap.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8796336369297884225</id><published>2010-03-24T21:01:00.004-04:00</published><updated>2010-03-24T21:20:33.429-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='jshydra'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>JSHydra 
and ASTs</title><content type='html'>One goal I've had for a while with respect to JSHydra was to have it actually spit out an easy-to-understand AST, akin to the kind of AST you get from Pork, as opposed to the parse tree from SpiderMonkey. After some reading around, I've written &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/jshydra/file/bcb0a8fc0be4/utils/astml.js"&gt;a postprocessing script&lt;/a&gt; to do so.
&lt;/p&gt;&lt;p&gt;
The output format is broadly modeled on the &lt;a href="http://code.google.com/p/es-lab/wiki/JsonMLASTFormat"&gt;JsonML AST format&lt;/a&gt;, with a mixture of Pork and "I think this is what's happening" thrown in. I quickly gave up on the literal &lt;tt&gt;["Type", {}, child1, child2]&lt;/tt&gt; format because it proved cumbersome to read; in the interest of keeping something akin to the Pork format, I moved to a more ad-hoc format, which loosely follows the visitor pattern that page describes.
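&lt;/p&gt;&lt;p&gt;
To make the two ideas concrete, here is a small JavaScript sketch of a JsonML-style tree, where each node is &lt;tt&gt;[type, attributes, ...children]&lt;/tt&gt;, and a visitor-style walk that dispatches on the node type. It only illustrates the concepts: the node names (&lt;tt&gt;BinaryExpr&lt;/tt&gt;, &lt;tt&gt;IdExpr&lt;/tt&gt;, &lt;tt&gt;LiteralExpr&lt;/tt&gt;) are made up for the example and may not match the es-lab format, the &lt;tt&gt;walk&lt;/tt&gt; helper is hypothetical, and none of this is the format astml.js emits.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// A JsonML-style tree for `a + b * 2`: every node is
// [type, attributes, ...children].
const ast = ["BinaryExpr", { op: "+" },
  ["IdExpr", { name: "a" }],
  ["BinaryExpr", { op: "*" },
    ["IdExpr", { name: "b" }],
    ["LiteralExpr", { value: 2 }]]];

// Visitor-style walk: call visitor["visit" + type](attrs, node) if the
// visitor defines it, then descend into the children.
function walk(node, visitor) {
  const [type, attrs, ...children] = node;
  const hook = visitor["visit" + type];
  if (hook) hook.call(visitor, attrs, node);
  for (const child of children) walk(child, visitor);
}

// Example visitor: record identifiers and operators in pre-order
// (each parent before its children).
const seen = [];
walk(ast, {
  visitIdExpr(attrs) { seen.push(attrs.name); },
  visitBinaryExpr(attrs) { seen.push(attrs.op); },
});
console.log(seen.join(" ")); // + a * b
&lt;/pre&gt;&lt;p&gt;
Even at this size, the positional children (the left operand is simply element 2 of the array) show why the raw array form gets hard to read, while a visitor lets each consumer handle only the node types it cares about.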
&lt;/p&gt;&lt;p&gt;
I've added this output format to the &lt;a href="http://www.tjhsst.edu/~jcranmer/static/webjshydra/"&gt;WebJSHydra&lt;/a&gt; reader (yes, it is partly copy-pasted from the webpork code), so you can play with it to your heart's content. Just don't feed it anything large. It also doesn't support E4X, and I'm not entirely sure of its correctness. Also, I don't support the visitor yet, nor do I have a C or C++ version of the AST for static analysis tools.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8796336369297884225?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8796336369297884225/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8796336369297884225' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8796336369297884225'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8796336369297884225'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/03/jshydra-and-asts.html' title='JSHydra and ASTs'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-7425180867933480393</id><published>2010-03-23T22:41:00.002-04:00</published><updated>2010-03-23T23:02:52.318-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category 
scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>A new folder tree view</title><content type='html'>One complaint I have made a few times is that my hierarchy of accounts does not necessarily match up with the logical structure. For example, I have Mozilla-related folders splayed out across three accounts: one news account and two email accounts. They're separate because, well, you can't combine mail folders, newsgroups, and RSS feeds all under one account.
&lt;/p&gt;&lt;p&gt;
Now, in Thunderbird 3, Joey Minta &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=414038"&gt;replaced the folder pane with a more extensible version&lt;/a&gt;. Having some time on my hands (I finally figured out the bug that was stopping me from completing part 2 of the ongoing Creating New Account Types series), I decided to try to make a simple extension that would create a categorized tree view. &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/foldercat/"&gt;So this is what I made&lt;/a&gt;. Notes, though:
&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;It doesn't actually work in Thunderbird 3, only in some of the newer nightlies. It turns out that the folder tree view code changed between Thunderbird 3 and Thunderbird 3.1, and I built the extension against the newer version.&lt;/li&gt;
&lt;li&gt;Speaking of which, it turns out that there is &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=554558"&gt;a bug in &lt;tt&gt;gFolderTreeView.load&lt;/tt&gt;&lt;/a&gt;. Just to make life fun, the strings in the bundles are different between Thunderbird 3 and 3.1. Argh!&lt;/li&gt;
&lt;li&gt;Categorizing works by setting a property on DBFolderInfo, for now at least. This means it doesn't appear to work on server folders.&lt;/li&gt;
&lt;li&gt;Uncategorized folders fall under the categories of their parents. So, at the beginning, everything is laid out like the All Folders view, just shifted one level down. As you categorize more folders, portions of the tree are spliced under different categories.&lt;/li&gt;
&lt;li&gt;Categories should be marked as having new or unread messages if any folders beneath them are so marked.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;
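The categorization rules above (uncategorized folders follow their parents, and categories report unread state from beneath) boil down to a small amount of logic. The JavaScript sketch below models just that logic on plain objects; the &lt;tt&gt;parent&lt;/tt&gt;, &lt;tt&gt;category&lt;/tt&gt;, and &lt;tt&gt;unread&lt;/tt&gt; fields, the helper names, and the "Uncategorized" bucket are assumptions for illustration, not the extension's actual code or the gFolderTreeView API.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical folder records: `category` stands in for the property the
// extension stores on DBFolderInfo; `parent` is the parent folder or null.
const DEFAULT_CATEGORY = "Uncategorized"; // assumed name for the default bucket

// Uncategorized folders take the category of their nearest categorized
// ancestor.
function effectiveCategory(folder) {
  for (let f = folder; f; f = f.parent) {
    if (f.category) return f.category;
  }
  return DEFAULT_CATEGORY;
}

// Map each category to the folders displayed directly beneath it. A folder
// whose parent has the same category stays nested under that parent.
function buildCategoryTree(folders) {
  const byCategory = new Map();
  for (const folder of folders) {
    const cat = effectiveCategory(folder);
    const parentCat = folder.parent ? effectiveCategory(folder.parent) : null;
    if (parentCat === cat) continue;
    if (!byCategory.has(cat)) byCategory.set(cat, []);
    byCategory.get(cat).push(folder);
  }
  return byCategory;
}

// A category shows unread mail if any folder displayed beneath it has some.
function categoryHasUnread(folders, cat) {
  return folders
    .filter((f) => effectiveCategory(f) === cat)
    .some((f) => f.unread > 0);
}

const inbox = { name: "Inbox", parent: null, unread: 0 };
const mozilla = { name: "mozilla", parent: inbox, category: "Mozilla", unread: 3 };
const bugs = { name: "bugs", parent: mozilla, unread: 0 };
const folders = [inbox, mozilla, bugs];

for (const [cat, tops] of buildCategoryTree(folders)) {
  console.log(cat, tops.map((f) => f.name), categoryHasUnread(folders, cat));
}
&lt;/pre&gt;&lt;p&gt;
With no categories assigned, every top-level folder lands in the default bucket, which reproduces the shifted-down layout described above.
&lt;/p&gt;&lt;p&gt;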
Once I can get it working in TB 3.0, I'll try to get it up onto amo.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-7425180867933480393?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/7425180867933480393/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=7425180867933480393' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7425180867933480393'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7425180867933480393'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/03/new-folder-tree-view.html' title='A new folder tree view'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-1846727436225030919</id><published>2010-02-05T12:14:00.010-05:00</published><updated>2010-02-05T18:25:18.597-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mozilla mailnews accttype'/><title type='text'>Developing new account types, part 1: The folder pane</title><content type='html'>This series of blog posts discusses the creation of a new account type implemented in JavaScript. 
Over the course of these posts, I use the development of my &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/webfora"&gt;Web Forums extension&lt;/a&gt; to explain the steps needed to create a new account type. I hope to add a new post once every two weeks (I cannot guarantee it, though).
&lt;/p&gt;&lt;p&gt;
In &lt;a href="http://quetzalcoatal.blogspot.com/2010/01/developing-new-account-types-part-0.html"&gt;the previous blog post&lt;/a&gt;, I gave a broad overview of the structure of the backend interfaces and the components of an account implementation. Now, we will prepare the components needed to get your extension's folders displayed in the folder pane.
&lt;/p&gt;&lt;h4&gt;Account implementation decisions&lt;/h4&gt;&lt;p&gt;
Before you start implementing, you have to decide how to structure the account. The first decision is what the &lt;span style="font-style: italic;"&gt;internal account type&lt;/span&gt; will be. This
will be the value of &lt;tt&gt;nsIMsgAccount::type&lt;/tt&gt; and will dictate the contract IDs for several interfaces. The next decision is what the &lt;span style="font-style: italic;"&gt;account URI scheme&lt;/span&gt; is. This will be the scheme for the URI and dictates the contract IDs for a few more interfaces; for &lt;span style="font-style: italic;"&gt;mailbox accounts&lt;/span&gt;, this scheme will be &lt;tt&gt;&lt;span class="special"&gt;mailbox&lt;/span&gt;&lt;/tt&gt;. For my extension, I have decided to choose &lt;tt&gt;&lt;span class="special"&gt;webforum&lt;/span&gt;&lt;/tt&gt; for both of these.
&lt;/p&gt;&lt;p&gt;
Another important decision is which server you will use for most of your initial testing. It should be something manageable for debugging purposes. In my case, I've decided to bestow this honor on &lt;a href="http://wysifauthoring.informe.com/forum/"&gt;the
Kompozer web forum&lt;/a&gt;, because it seems to have lower traffic than any other forum I'm reasonably interested in. As you may notice, I am starting my extension with the intention of focusing on phpBB access: it's sufficiently widely used that I expect that supporting only phpBB at first would still make a worthwhile extension.
&lt;/p&gt;&lt;p&gt;
Once you have decided that, you should take the time to study how things will be structured: what determines a folder? What determines a message? A thread? Replies? How are you going to carry out new actions, such as checking for new messages? What internal information will you need to save in order to access the content? Heck, what determines the "server" to begin with? In my case, the DOM inspector is an invaluable tool for answering these questions. Don't worry yet about how to figure out the list of possible subscribable folders. Subscription will come into play much later; we are going to start by just hardcoding this list somewhere.
&lt;/p&gt;&lt;p&gt;
In my case, I am choosing to structure the folders as a &lt;span style="font-style: italic;"&gt;Category&lt;/span&gt; &amp;rarr; &lt;span style="font-style: italic;"&gt;Forum&lt;/span&gt; hierarchy. I'll pick a few of the smaller forums to use so I don't overwhelm the debug logs.
&lt;/p&gt;&lt;h4&gt;Implementing protocol information&lt;/h4&gt;&lt;p&gt;
Since &lt;tt&gt;nsIMsgProtocolInfo&lt;/tt&gt; is the shortest and simplest of the interfaces, let me start by implementing it. There are a total of 12 attributes and 1 function on this interface, so the code will not be hard to write. The following is an implementation &lt;a href="#note-1.1"&gt;[1]&lt;/a&gt;:
&lt;/p&gt;
&lt;pre class="lang-js"&gt;&lt;span class="comment"&gt;&lt;/span&gt;wfService&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  contractID&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/messenger/protocol/info;1?type=&lt;/span&gt;&lt;span class="special"&gt;webforum&lt;/span&gt;&lt;span class="string"&gt;"&lt;/span&gt;&lt;span class="special"&gt;],&lt;/span&gt;
  QueryInterface&lt;span class="special"&gt;:&lt;/span&gt; XPCOMUtils&lt;span class="special"&gt;.&lt;/span&gt;generateQI&lt;span class="special"&gt;([&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIMsgProtocolInfo&lt;span class="special"&gt;]),&lt;/span&gt;

 &lt;span class="comment"&gt; // Used by the account wizard and account manager&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; defaultLocalPath&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; dirSvc &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/file/directory_service;1"&lt;/span&gt;&lt;span class="special"&gt;]&lt;/span&gt;
                   &lt;span class="special"&gt;.&lt;/span&gt;getService&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIProperties&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; file &lt;span class="special"&gt;=&lt;/span&gt; dirSvc&lt;span class="special"&gt;.&lt;/span&gt;get&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"ProfD"&lt;/span&gt;&lt;span
class="special"&gt;,&lt;/span&gt; &lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIFile&lt;span class="special"&gt;);&lt;/span&gt;
    file&lt;span class="special"&gt;.&lt;/span&gt;append&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"WebForums"&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(!&lt;/span&gt;file&lt;span class="special"&gt;.&lt;/span&gt;exists&lt;span class="special"&gt;())&lt;/span&gt;
      file&lt;span class="special"&gt;.&lt;/span&gt;create&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIFile&lt;span class="special"&gt;.&lt;/span&gt;DIRECTORY_TYPE&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="constant"&gt;0775&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;return&lt;/span&gt; file&lt;span class="special"&gt;;&lt;/span&gt;
  &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; serverIID&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span
class="special"&gt;.&lt;/span&gt;nsIMsgIncomingServer&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; defaultDoBiff&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;true&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; requiresUsername&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;false&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;},&lt;/span&gt;
  getDefaultServerPort&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;secure&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="special"&gt;-&lt;/span&gt;&lt;span class="constant"&gt;1&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; canDelete&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;true&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;},&lt;/span&gt;

&lt;span class="comment"&gt;  // Used by UI code&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; canLoginAtStartup&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;true&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; canGetMessages&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;true&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; canGetIncomingMessages&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;false&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; showComposeMsgLink&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;false&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;},&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; specialFoldersDeletionAllowed&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;false&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;
The meaning of each attribute is described in more detail on &lt;a href="https://developer.mozilla.org/en/nsIMsgProtocolInfo"&gt;the MDC page&lt;/a&gt;. The properties used by the account wizard mostly control initial preference values; those used by the UI code mostly control which UI elements are enabled. I have also left out of the implementation the attributes that are unused.
&lt;/p&gt;&lt;p&gt;
Perhaps the most leeway you have is in implementing &lt;tt&gt;defaultLocalPath&lt;/tt&gt;. Here, I have adapted the RSS implementation, which does not allow users to change this location. The other implementation (used by IMAP, POP, NNTP, Movemail, and Local Folders) uses a preference to return the default path. An example of that approach looks like this:
&lt;/p&gt;
&lt;pre class="lang-js"&gt;
&lt;span class="keyword"&gt;get&lt;/span&gt; defaultLocalPath&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
 &lt;span class="comment"&gt; // In real code, this would normally be set up once in the constructor&lt;/span&gt;
  &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_prefs &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/preferences-service;1"&lt;/span&gt;&lt;span class="special"&gt;]&lt;/span&gt;
                  &lt;span class="special"&gt;.&lt;/span&gt;getService&lt;span class="special"&gt;(&lt;/span&gt;&lt;span
class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIPrefService&lt;span class="special"&gt;)&lt;/span&gt;
                  &lt;span class="special"&gt;.&lt;/span&gt;getBranch&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"extensions.&lt;/span&gt;&lt;span class="special"&gt;webfora&lt;/span&gt;&lt;span class="string"&gt;."&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
 &lt;span class="comment"&gt; // Preference looks like [ProfD]WebForums&lt;/span&gt;
  &lt;span class="keyword"&gt;let&lt;/span&gt; pref &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_prefs&lt;span class="special"&gt;.&lt;/span&gt;getComplexValue&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"rootDir"&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt; &lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIRelativeFilePref&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="keyword"&gt;return&lt;/span&gt; pref&lt;span class="special"&gt;.&lt;/span&gt;file&lt;span class="special"&gt;;&lt;/span&gt;
&lt;span class="special"&gt;},&lt;/span&gt;&lt;/pre&gt;&lt;p&gt;
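The value of a relative file preference like &lt;tt&gt;rootDir&lt;/tt&gt; is a directory-service key in square brackets followed by a path relative to that directory. The JavaScript sketch below approximates how that string becomes a location; the &lt;tt&gt;knownDirs&lt;/tt&gt; table and the &lt;tt&gt;resolveRelativePref&lt;/tt&gt; helper are hypothetical, and in the extension &lt;tt&gt;nsIRelativeFilePref&lt;/tt&gt; does this work for you.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical directory-service keys and paths; in Thunderbird these
// come from the directory service, not from a literal table.
const knownDirs = { ProfD: "/home/user/.thunderbird/abcd1234.default" };

// Split a relative file preference such as "[ProfD]WebForums" into its
// directory key and relative path, then join them. Using "/" as the
// separator is a simplification; the real code builds an nsIFile.
function resolveRelativePref(value, dirs) {
  const match = /^\[(\w+)\](.*)$/.exec(value);
  if (!match) throw new Error("not a relative file pref: " + value);
  const [, key, relative] = match;
  if (!(key in dirs)) throw new Error("unknown directory key: " + key);
  return relative ? dirs[key] + "/" + relative : dirs[key];
}

console.log(resolveRelativePref("[ProfD]WebForums", knownDirs));
// /home/user/.thunderbird/abcd1234.default/WebForums
&lt;/pre&gt;&lt;p&gt;
Because the stored value is relative to the profile, the path keeps working if the profile directory is moved.
&lt;/p&gt;&lt;p&gt;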
Once you have completed that, you should test that the service implementations work as expected via test snippets in the Error Console. The account manager can be mean when it comes to unusable
account types &lt;a href="#note-1.2"&gt;[2]&lt;/a&gt;, so this will help fix the most obvious bugs before the account manager attempts to do it for you.
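&lt;/p&gt;&lt;p&gt;
One way to do that is a small helper that reads every attribute and collects whatever throws or comes back undefined. The JavaScript below is a hypothetical sketch, not part of the extension: &lt;tt&gt;checkProtocolInfo&lt;/tt&gt; is an illustrative name, the attribute list simply mirrors the implementation above, and the stub object stands in for the real service so the helper can be exercised anywhere.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Attributes implemented in the wfService code above.
var PROTOCOL_INFO_ATTRS = [
  "defaultLocalPath", "serverIID", "defaultDoBiff", "requiresUsername",
  "canDelete", "canLoginAtStartup", "canGetMessages",
  "canGetIncomingMessages", "showComposeMsgLink",
  "specialFoldersDeletionAllowed"
];

// Read each attribute and report the ones that throw or come back
// undefined, then check that getDefaultServerPort returns an integer.
function checkProtocolInfo(info) {
  var problems = [];
  PROTOCOL_INFO_ATTRS.forEach(function (name) {
    try {
      if (info[name] === undefined)
        problems.push(name + ": undefined");
    } catch (e) {
      problems.push(name + ": threw " + e);
    }
  });
  try {
    var port = info.getDefaultServerPort(false);
    if (typeof port != "number" || port % 1 != 0)
      problems.push("getDefaultServerPort: returned " + port);
  } catch (e) {
    problems.push("getDefaultServerPort: threw " + e);
  }
  return problems;
}

// A deliberately broken stand-in. With the real component you would pass
// Components.classes["@mozilla.org/messenger/protocol/info;1?type=webforum"]
//   .getService(Components.interfaces.nsIMsgProtocolInfo) instead.
var stub = {
  get defaultLocalPath() { throw new Error("no profile"); },
  canDelete: true,
  getDefaultServerPort: function (secure) { return -1; }
};
console.log(checkProtocolInfo(stub));
&lt;/pre&gt;&lt;p&gt;
An empty result only means every attribute returned something; whether the values are sensible still needs a look.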
&lt;/p&gt;&lt;h4&gt;Server and root folder discovery&lt;/h4&gt;&lt;p&gt;
Before I go any further with code, let me take a minute to explain how servers and folders interact. The server objects themselves do surprisingly little in the UI; the most commonly used properties are probably &lt;tt&gt;rootFolder&lt;/tt&gt; and &lt;tt&gt;type&lt;/tt&gt;. Even what you might think of as server attributes, such as the bold display name and the has-new-messages tree view properties, are found on the &lt;i&gt;root folder&lt;/i&gt; instead, which is a "fake" folder object. Most of what we care about in this part happens on the root folder rather than the server; however, if you browse the implementation in &lt;tt&gt;nsMsgDBFolder&lt;/tt&gt;, you can see that some property calls get forwarded back to the server for root folders.
&lt;/p&gt;&lt;p&gt;
The backend code creates server objects early on and holds onto them for the duration of the program (or until they are deleted). The server objects then create the root folders, which in turn create subfolders as necessary. Links that go backwards (parent links and server links) are weak references, to avoid refcount cycles. Most of this work is hidden in &lt;tt&gt;nsMsgDBFolder&lt;/tt&gt; for you. After creation, properties are accessed as needed; some of them are loaded from the database info (a topic for later).
&lt;/p&gt;&lt;p&gt;
In more concrete code terms, the following are the steps in loading the folder pane:
&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;The account manager loads the &lt;tt&gt;mail.accountmanager.accounts&lt;/tt&gt; preference; the values here are a comma-separated list of account keys.&lt;/li&gt;
&lt;li&gt;For each account key, an account is instantiated. Per-account data is read from the &lt;tt&gt;mail.account.&lt;i&gt;&amp;lt;key&amp;gt;&lt;/i&gt;&lt;/tt&gt; preference branch; specifically, the &lt;tt&gt;server&lt;/tt&gt; preference contains the server key to load, and the &lt;tt&gt;identities&lt;/tt&gt; preference is a comma-separated list of identity keys.&lt;/li&gt;
&lt;li&gt;The identities and servers are then bootstrapped. In the case of servers, the server is created as an object with the &lt;tt&gt;@mozilla.org/messenger/server;1?type=&lt;i&gt;&amp;lt;type&amp;gt;&lt;/i&gt;&lt;/tt&gt;
contract ID. The server pref branch is &lt;tt&gt;mail.server.&lt;i&gt;&amp;lt;key&amp;gt;&lt;/i&gt;&lt;/tt&gt;; key preferences here are &lt;tt&gt;type&lt;/tt&gt;, the type for the contract ID; &lt;tt&gt;userName&lt;/tt&gt;, the (optional) username of the server; and &lt;tt&gt;hostname&lt;/tt&gt;, the (required) host of the server.&lt;/li&gt;
&lt;li&gt;The account manager sets the &lt;tt&gt;key&lt;/tt&gt;, &lt;tt&gt;type&lt;/tt&gt;, &lt;tt&gt;username&lt;/tt&gt;, and &lt;tt&gt;hostName&lt;/tt&gt; properties on the server object instance, in that order, and then retrieves the &lt;tt&gt;port&lt;/tt&gt; property. The &lt;i&gt;(&lt;tt&gt;type&lt;/tt&gt;, &lt;tt&gt;username&lt;/tt&gt;, &lt;tt&gt;hostName&lt;/tt&gt;, &lt;tt&gt;port&lt;/tt&gt;) tuple&lt;/i&gt; is the unique identifier for a server: no two servers can have the same
combination of these values. Now your server is constructed and returned to the folder pane.&lt;/li&gt;
&lt;li&gt;The folder pane retrieves the &lt;tt&gt;rootFolder&lt;/tt&gt; of your server. If the server was saved in the expanded state, &lt;tt&gt;subFolders&lt;/tt&gt; is retrieved recursively for each folder that was saved as open. The folder pane also calls &lt;tt&gt;performExpand()&lt;/tt&gt; on the
server if the root folder is expanded.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;
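The steps above can be condensed into a short sketch. This is a hypothetical illustration rather than the account manager's actual code: the preferences are modeled as a plain object instead of being read through the preferences service, &lt;tt&gt;loadServerInfo&lt;/tt&gt; and &lt;tt&gt;splitList&lt;/tt&gt; are invented names, and the caller supplies the protocol's default port, which a real server roughly falls back to when no &lt;tt&gt;port&lt;/tt&gt; pref is set. It shows which pref names are read, how the contract ID is formed, and how the (&lt;tt&gt;type&lt;/tt&gt;, &lt;tt&gt;username&lt;/tt&gt;, &lt;tt&gt;hostName&lt;/tt&gt;, &lt;tt&gt;port&lt;/tt&gt;) tuple could be checked for uniqueness.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical sketch of the bootstrap steps described above.
// prefs is a plain object standing in for the preferences service.
function splitList(value) {
  // Comma-separated lists, e.g. mail.accountmanager.accounts.
  return (value || "").split(",")
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length; });
}

function loadServerInfo(prefs, defaultPort) {
  var servers = [];
  var seen = {};
  splitList(prefs["mail.accountmanager.accounts"]).forEach(function (accountKey) {
    var accountBranch = "mail.account." + accountKey + ".";
    var serverKey = prefs[accountBranch + "server"];
    var serverBranch = "mail.server." + serverKey + ".";
    var type = prefs[serverBranch + "type"];
    var port = prefs[serverBranch + "port"];
    var info = {
      key: serverKey,
      identities: splitList(prefs[accountBranch + "identities"]),
      contractID: "@mozilla.org/messenger/server;1?type=" + type,
      type: type,
      username: prefs[serverBranch + "userName"] || "",  // optional
      hostName: prefs[serverBranch + "hostname"],         // required
      // No port pref: fall back to the caller-supplied default port.
      port: port === undefined ? defaultPort : port
    };
    // (type, username, hostName, port) must be unique across servers.
    var id = JSON.stringify([info.type, info.username, info.hostName, info.port]);
    if (seen[id])
      throw new Error("Duplicate server: " + id);
    seen[id] = true;
    servers.push(info);
  });
  return servers;
}
&lt;/pre&gt;&lt;p&gt;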
So that explains how your server gets created; how do your folders get created? &lt;tt&gt;nsMsgIncomingServer::GetRootFolder&lt;/tt&gt; &lt;a href="#note-1.3"&gt;[3]&lt;/a&gt; calls &lt;tt&gt;nsMsgIncomingServer::CreateRootFolder&lt;/tt&gt;, which reads &lt;tt&gt;serverURI&lt;/tt&gt; and uses it to construct an RDF resource. By default, &lt;tt&gt;serverURI&lt;/tt&gt; returns a URI of the form &lt;tt&gt;localstoretype://[&amp;lt;username&amp;gt;@]&amp;lt;hostname&amp;gt;&lt;/tt&gt;. This URI is actually the URI of your root folder; other code assumes that this invariant holds (especially subscribe!). Other folders are created when you get the &lt;tt&gt;subFolders&lt;/tt&gt; property. When the
folder URI is parsed (which is pretty much the first time a useful property is accessed), &lt;tt&gt;getIncomingServerType&lt;/tt&gt; is called to get the type of the server.
&lt;/p&gt;&lt;p&gt;
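The default root-folder URI, and the way child folder URIs are built on top of it, can be sketched as follows. The helper names and the placeholder values are illustrative, escaping the username is my own assumption, and &lt;tt&gt;encodeURIComponent&lt;/tt&gt; is only a stand-in for &lt;tt&gt;nsINetUtil.escapeString&lt;/tt&gt; with &lt;tt&gt;ESCAPE_URL_PATH&lt;/tt&gt;; the two do not escape exactly the same set of characters.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical helpers illustrating the default URI layout.
// Root folder: localstoretype://[username@]hostname
function defaultServerURI(localStoreType, username, hostName) {
  var uri = localStoreType + "://";
  if (username)
    uri += encodeURIComponent(username) + "@";  // assumption: escape it
  return uri + hostName;
}

// Subfolder: parent URI + "/" + escaped folder name.
// encodeURIComponent stands in for
// netUtils.escapeString(name, Ci.nsINetUtil.ESCAPE_URL_PATH).
function childFolderURI(parentURI, name) {
  return parentURI + "/" + encodeURIComponent(name);
}
&lt;/pre&gt;&lt;p&gt;
For example, &lt;tt&gt;defaultServerURI("webforum", "someuser", "kompozer")&lt;/tt&gt; yields &lt;tt&gt;webforum://someuser@kompozer&lt;/tt&gt;, and a child folder named "Bug reports" under it becomes &lt;tt&gt;webforum://someuser@kompozer/Bug%20reports&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;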
In summary, you may need to implement &lt;tt&gt;localStoreType&lt;/tt&gt; and possibly &lt;tt&gt;serverURI&lt;/tt&gt; on your server, and &lt;tt&gt;subFolders&lt;/tt&gt;, &lt;tt&gt;getIncomingServerType&lt;/tt&gt;, and &lt;tt&gt;CreateBaseMessageURI&lt;/tt&gt; on your folder &lt;a href="#note-1.4"&gt;[4]&lt;/a&gt;. We'll start by getting the root folder to display:
&lt;/p&gt;&lt;pre class="lang-js"&gt;&lt;span class="keyword"&gt;function&lt;/span&gt; wfServer&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  JSExtendedUtils&lt;span class="special"&gt;.&lt;/span&gt;makeCPPInherits&lt;span class="special"&gt;(&lt;/span&gt;&lt;span
class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt;
    &lt;span class="string"&gt;"@mozilla.org/messenger/jsincomingserver;1"&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;
wfServer&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span
class="special"&gt;{&lt;/span&gt;
  contractID&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/messenger/server;1?type=&lt;/span&gt;&lt;span class="special"&gt;webforum&lt;/span&gt;&lt;span class="string"&gt;"&lt;/span&gt;&lt;span class="special"&gt;],&lt;/span&gt;
  QueryInterface&lt;span class="special"&gt;:&lt;/span&gt; JSExtendedUtils&lt;span class="special"&gt;.&lt;/span&gt;generateQI&lt;span class="special"&gt;([]),&lt;/span&gt;
  &lt;span class="keyword"&gt;get&lt;/span&gt; localStoreType&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="string"&gt;"&lt;/span&gt;&lt;span class="special"&gt;webforum&lt;/span&gt;&lt;span class="string"&gt;"&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span
class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;


&lt;span class="keyword"&gt;function&lt;/span&gt; wfFolder&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  JSExtendedUtils&lt;span class="special"&gt;.&lt;/span&gt;makeCPPInherits&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt;
    &lt;span class="string"&gt;"@mozilla.org/messenger/jsmsgfolder;1"&lt;/span&gt;&lt;span class="special"&gt;);&lt;/span&gt;
&lt;span class="special"&gt;}&lt;/span&gt;
wfFolder&lt;span class="special"&gt;.&lt;/span&gt;prototype &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
  contractID&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="string"&gt;"@mozilla.org/rdf/resource-factory;1?name=&lt;/span&gt;&lt;span class="special"&gt;webforum&lt;/span&gt;&lt;span class="string"&gt;"&lt;/span&gt;&lt;span class="special"&gt;,&lt;/span&gt;
  QueryInterface&lt;span class="special"&gt;:&lt;/span&gt; JSExtendedUtils&lt;span class="special"&gt;.&lt;/span&gt;generateQI&lt;span class="special"&gt;([]),&lt;/span&gt;
  getIncomingServerType&lt;span class="special"&gt;:&lt;/span&gt; &lt;span class="keyword"&gt;function&lt;/span&gt; &lt;span
class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="string"&gt;"&lt;/span&gt;&lt;span class="special"&gt;webforum&lt;/span&gt;&lt;span class="string"&gt;"&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt; &lt;span class="special"&gt;}&lt;/span&gt;
&lt;span class="special"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;
At this point, I recommend checking the Error Console again to make sure the resources are registering properly. With that done, it's time to modify your preferences manually. I recommend editing prefs.js while Thunderbird is not running, so that you
don't accidentally confuse the account manager. I'm using the keys &lt;tt&gt;account99&lt;/tt&gt; and &lt;tt&gt;server99&lt;/tt&gt; to make it plain which account is being edited. First, I copy the &lt;tt&gt;mail.identity.id3&lt;/tt&gt; pref branch (any identity would do) and change &lt;tt&gt;id3&lt;/tt&gt; to &lt;tt&gt;id99&lt;/tt&gt;. Then I copy the &lt;tt&gt;mail.account.account3&lt;/tt&gt; pref branch and change every &lt;tt&gt;3&lt;/tt&gt; to &lt;tt&gt;99&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;
The next changes are the server preferences, which differ the most from account to account. &lt;tt&gt;directory&lt;/tt&gt; and &lt;tt&gt;directory-rel&lt;/tt&gt; are set to a folder where I want to store data (&lt;tt&gt;[ProfD]WebForums/kompozer&lt;/tt&gt;, in my case). &lt;tt&gt;download_on_biff&lt;/tt&gt; and &lt;tt&gt;login_at_startup&lt;/tt&gt; are set to &lt;tt&gt;&lt;span class="constant"&gt;false&lt;/span&gt;&lt;/tt&gt; (to avoid
dealing with biff for a bit longer). &lt;tt&gt;name&lt;/tt&gt; is set to the display name of the server. &lt;tt&gt;hostname&lt;/tt&gt; and &lt;tt&gt;userName&lt;/tt&gt; are set to the appropriate values for this account &lt;a href="#note-1.5"&gt;[5]&lt;/a&gt;. Finally, I append &lt;tt&gt;account99&lt;/tt&gt; to the &lt;tt&gt;mail.accountmanager.accounts&lt;/tt&gt; preference. With those changes done, I start up Thunderbird to see the outcome:&lt;br&gt;
&lt;img alt="Root server in folder pane" src="http://4.bp.blogspot.com/_qW4UNslWKZU/S2yEcaNAx8I/AAAAAAAAAB4/qGPYqM8jCCc/s400/First_account.png" /&gt;&lt;br&gt;
Perhaps I should have chosen a shorter name for display.
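&lt;/p&gt;&lt;p&gt;
For reference, the resulting prefs.js additions look roughly like this. The display name, host name, user name, absolute &lt;tt&gt;directory&lt;/tt&gt; path, and the pre-existing account keys are placeholders, and the copied identity lines are abbreviated to a comment.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Identity: every mail.identity.id3.* line, copied with id3 renamed to id99.

// Account: the mail.account.account3 branch with 3 renamed to 99.
user_pref("mail.account.account99.identities", "id99");
user_pref("mail.account.account99.server", "server99");

// Server: type selects the @mozilla.org/messenger/server;1?type=webforum contract ID.
user_pref("mail.server.server99.type", "webforum");
user_pref("mail.server.server99.name", "KompoZer web forums");
user_pref("mail.server.server99.hostname", "kompozer");
user_pref("mail.server.server99.userName", "someuser");
user_pref("mail.server.server99.directory", "/path/to/profile/WebForums/kompozer");
user_pref("mail.server.server99.directory-rel", "[ProfD]WebForums/kompozer");
user_pref("mail.server.server99.download_on_biff", false);
user_pref("mail.server.server99.login_at_startup", false);

// Existing account keys (placeholders) with account99 appended.
user_pref("mail.accountmanager.accounts", "account1,account2,account3,account99");
&lt;/pre&gt;&lt;p&gt;
Keep Thunderbird closed while making these edits; it writes prefs.js back out when it exits, so changes made while it is running may be overwritten.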
&lt;/p&gt;&lt;h4&gt;Folder discovery&lt;/h4&gt;&lt;p&gt;
Now that the root folder is displayed, we need to get its subfolders added to the folder pane.
To do that, we need to know what the folder hierarchy looks like, which means it has to be stored in some file. To name two examples, the NNTP code stores its folder tree in the newsrc file, and local folders use the directory hierarchy on disk as their map.
&lt;/p&gt;&lt;p&gt;
In my code, I've chosen to store this data in a JSON file. I considered SQLite, but I don't really need synchronization (per-server files work nicely here), and I'm mostly doing simple lookups. Plus, I can probably handle automatic schema migration more easily in JSON than in SQLite.
&lt;/p&gt;&lt;p&gt;
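To make this concrete, here is one possible shape for such a per-server JSON file, plus a lookup that returns the list of children for a given folder path. The schema (a &lt;tt&gt;categories&lt;/tt&gt; array of objects with a &lt;tt&gt;name&lt;/tt&gt; and optional &lt;tt&gt;children&lt;/tt&gt;) and the &lt;tt&gt;findLevel&lt;/tt&gt; helper are hypothetical; the extension's real format may well differ.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical per-server file contents: nested categories, each with a
// display name and an optional list of children.
var serverDB = JSON.parse(
  '{"categories": [' +
  '  {"name": "General", "children": [{"name": "Announcements"}]},' +
  '  {"name": "Bug reports"}' +
  ']}');

// Walks the hierarchy along path (an array of folder names, starting just
// below the root) and returns that folder's children, or null if the path
// does not exist.
function findLevel(categories, path) {
  var level = categories;
  for (var name of path) {
    var match = null;
    for (var entry of level) {
      if (entry.name === name) {
        match = entry;
        break;
      }
    }
    if (!match)
      return null;
    level = match.children || [];
  }
  return level;
}
&lt;/pre&gt;&lt;p&gt;
With this data, &lt;tt&gt;findLevel(serverDB.categories, [])&lt;/tt&gt; returns the two top-level categories, and &lt;tt&gt;findLevel(serverDB.categories, ["General"])&lt;/tt&gt; returns the single "Announcements" entry. Keeping the whole tree in one object also makes it cheap to write back out with &lt;tt&gt;JSON.stringify&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;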
For this next part, we concentrate on a single property: &lt;tt&gt;subFolders&lt;/tt&gt;. This getter typically has two parts: it first checks whether the subfolders have been initialized (if so, it returns an enumerator over the stored values); if not, the rest of the getter, or perhaps a separate function altogether, creates the subfolders.
&lt;/p&gt;&lt;p&gt;
Some code to initialize these subfolders is as follows (the logic to retrieve the database is not included and can instead be found in &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/webfora/file/bc9ca61c5d0d/components/wfFolder.js"&gt;the source code for my extension&lt;/a&gt;):
&lt;/p&gt;&lt;pre class="lang-js"&gt;  &lt;span class="keyword"&gt;get&lt;/span&gt; subFolders&lt;span class="special"&gt;()&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_folders&lt;span class="special"&gt;)&lt;/span&gt;
      &lt;span class="keyword"&gt;return&lt;/span&gt; array2enum&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_folders&lt;span class="special"&gt;);&lt;/span&gt;

    &lt;span class="comment"&gt;// If we're here, we need to initialize.&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;.&lt;/span&gt;QueryInterface&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIMsgFolder&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; serverDB &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;.&lt;/span&gt;server&lt;span class="special"&gt;.&lt;/span&gt;wrappedJSObject&lt;span class="special"&gt;.&lt;/span&gt;_db&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="comment"&gt;// Uninitialized -&amp;gt; no subfolders&lt;/span&gt;
    &lt;span class="keyword"&gt;if&lt;/span&gt; &lt;span class="special"&gt;(!&lt;/span&gt;serverDB&lt;span class="special"&gt;.&lt;/span&gt;categories&lt;span class="special"&gt;)&lt;/span&gt;
      &lt;span class="keyword"&gt;return&lt;/span&gt; array2enum&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_folders &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;[]);&lt;/span&gt;

    &lt;span class="comment"&gt;// First find our level&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; level &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="comment"&gt;/* some logic */&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;

    &lt;span class="keyword"&gt;let&lt;/span&gt; URI &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_inner&lt;span class="special"&gt;.&lt;/span&gt;URI &lt;span class="special"&gt;+&lt;/span&gt; &lt;span class="string"&gt;'/'&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; folders &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="special"&gt;[];&lt;/span&gt;
    &lt;span class="comment"&gt;// Yes, we still use RDF&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; RDF &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/rdf/rdf-service;1"&lt;/span&gt;&lt;span class="special"&gt;].&lt;/span&gt;getService&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIRDFService&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;let&lt;/span&gt; netUtils &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="type"&gt;Cc&lt;/span&gt;&lt;span class="special"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"@mozilla.org/network/io-service;1"&lt;/span&gt;&lt;span class="special"&gt;]&lt;/span&gt;
                     &lt;span class="special"&gt;.&lt;/span&gt;getService&lt;span class="special"&gt;(&lt;/span&gt;&lt;span
class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsINetUtil&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="keyword"&gt;for each&lt;/span&gt;&lt;span class="special"&gt; (&lt;/span&gt;&lt;span class="keyword"&gt;let&lt;/span&gt; sub &lt;span class="keyword"&gt;in&lt;/span&gt; level&lt;span class="special"&gt;)&lt;/span&gt; &lt;span class="special"&gt;{&lt;/span&gt;
      &lt;span class="comment"&gt;// Some URIs may contain spaces, etc. -&amp;gt; escape&lt;/span&gt;
      &lt;span class="keyword"&gt;let&lt;/span&gt; folder &lt;span class="special"&gt;=&lt;/span&gt; RDF&lt;span class="special"&gt;.&lt;/span&gt;GetResource&lt;span class="special"&gt;(&lt;/span&gt;URI &lt;span class="special"&gt;+&lt;/span&gt; netUtils&lt;span class="special"&gt;.&lt;/span&gt;escapeString&lt;span class="special"&gt;(&lt;/span&gt;sub&lt;span class="special"&gt;.&lt;/span&gt;name&lt;span class="special"&gt;,&lt;/span&gt;
        &lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsINetUtil&lt;span class="special"&gt;.&lt;/span&gt;ESCAPE_URL_PATH&lt;span class="special"&gt;));&lt;/span&gt;
      folder&lt;span class="special"&gt;.&lt;/span&gt;QueryInterface&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="type"&gt;Ci&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;nsIMsgFolder&lt;span class="special"&gt;);&lt;/span&gt;
      folder&lt;span class="special"&gt;.&lt;/span&gt;parent &lt;span class="special"&gt;=&lt;/span&gt; &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;;&lt;/span&gt;
      folders&lt;span class="special"&gt;.&lt;/span&gt;push&lt;span class="special"&gt;(&lt;/span&gt;folder&lt;span class="special"&gt;);&lt;/span&gt;
    &lt;span class="special"&gt;}&lt;/span&gt;
    &lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_folders &lt;span class="special"&gt;=&lt;/span&gt; folders&lt;span class="special"&gt;;&lt;/span&gt;
    &lt;span class="keyword"&gt;return&lt;/span&gt; array2enum&lt;span class="special"&gt;(&lt;/span&gt;&lt;span class="keyword"&gt;this&lt;/span&gt;&lt;span class="special"&gt;.&lt;/span&gt;_folders&lt;span class="special"&gt;);&lt;/span&gt;
  &lt;span class="special"&gt;}&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;
There are a few major things to note. First, the new folders are created via the RDF service. Both Thunderbird and SeaMonkey use RDF for folder access, so it is still a good idea to create folders through the RDF service so that you don't confuse the calling code. Since a folder is identified by its URI, the subfolder name also needs to be escaped in that URI, hence the calls to &lt;tt&gt;nsINetUtil&lt;/tt&gt;. The auxiliary function &lt;tt&gt;array2enum&lt;/tt&gt; takes a JS array and returns a proper &lt;tt&gt;nsISimpleEnumerator&lt;/tt&gt; for the array. I've excluded its definition here due to its simplicity and the length of this document; if you want to see it, you can view it in the extension source code. The last thing to note is that this code uses &lt;tt&gt;this._inner&lt;/tt&gt;: this variable is a link to the &lt;tt&gt;nsMsgDBFolder&lt;/tt&gt; implementation that was created for us by the &lt;tt&gt;JSExtendedUtils&lt;/tt&gt; inheritance call. I will defer a more thorough treatment of this C++-JS glue until later.
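&lt;/p&gt;&lt;p&gt;
For readers who would rather not dig through the extension source, a helper of this kind could look like the following. It is a hypothetical sketch, not the extension's actual definition: it returns a plain object with the two &lt;tt&gt;nsISimpleEnumerator&lt;/tt&gt; methods, &lt;tt&gt;hasMoreElements&lt;/tt&gt; and &lt;tt&gt;getNext&lt;/tt&gt;. An object handed to XPCOM callers would also need a &lt;tt&gt;QueryInterface&lt;/tt&gt; (for example, one generated with &lt;tt&gt;XPCOMUtils.generateQI([Ci.nsISimpleEnumerator])&lt;/tt&gt;), which is left out here so that the sketch runs on its own.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical array2enum: wraps a JS array in an object exposing the
// nsISimpleEnumerator methods. QueryInterface is omitted in this sketch.
function array2enum(array) {
  var index = 0;
  return {
    hasMoreElements: function () {
      return index !== array.length;
    },
    getNext: function () {
      if (index === array.length)
        throw new Error("array2enum: no more elements");
      return array[index++];
    }
  };
}
&lt;/pre&gt;&lt;p&gt;
Callers consume it with the usual loop: call &lt;tt&gt;hasMoreElements()&lt;/tt&gt;, then &lt;tt&gt;getNext()&lt;/tt&gt;, until the former returns &lt;tt&gt;&lt;span class="constant"&gt;false&lt;/span&gt;&lt;/tt&gt;.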
&lt;/p&gt;&lt;h4&gt;Folder pane extras&lt;/h4&gt;&lt;p&gt;
At this point, you should have a simple, plain folder hierarchy, which is navigable if not fully usable. In terms of UI, though, it's not quite perfect: an inbox, if you have one, will be indistinguishable from other folders; similarly, "fake" folders (think of the [Gmail] folder if you use Gmail over IMAP) show up as regular folders. These issues are largely handled by CSS.
&lt;/p&gt;&lt;p&gt;
A full list of the available styling points for the Thunderbird folder pane &lt;a href="https://developer.mozilla.org/en/Extensions/Thunderbird/Styling_the_Folder_Pane"&gt;can be found on MDC&lt;/a&gt;. Extensions can also &lt;a href="https://developer.mozilla.org/en/Extensions/Thunderbird/Adding_views_to_the_Folder_Pane"&gt;modify the folder pane views&lt;/a&gt; or &lt;a href="https://developer.mozilla.org/en/Extensions/Thunderbird/Adding_items_to_the_Folder_Pane"&gt;add other, non-folder items&lt;/a&gt;. More information can be found at MDC's &lt;a href="https://developer.mozilla.org/en/Extensions/Thunderbird/Working_with_the_Folder_Pane"&gt;folder pane information page&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
I would provide some example styling code here, but while testing, I discovered some related assertion failures that I have not yet had time to grok. In the interest of keeping to a schedule of one post every two weeks, I am going to defer this until either a mini "part 1.5" or the beginning of part 2, depending on how much time I have available next week.
&lt;/p&gt;&lt;h4&gt;Notes&lt;/h4&gt;
&lt;ol&gt;
&lt;li id="note-1.1"&gt;In general, I will not post the full code for any of the classes, only enough to demonstrate what needs to be done. For example, the &lt;tt&gt;classID&lt;/tt&gt; property is omitted in this example. Note also that I have a local modification to XPCOMUtils that accepts an array of contract IDs instead of a single one (&lt;tt&gt;wfService&lt;/tt&gt; will implement more than one contract ID).&lt;/li&gt;
&lt;li id="note-1.2"&gt;Specifically, the account manager attempts to get the server; if that fails, it removes the account from the accounts pref. If you are compiling your own builds for your extension development profile, I recommend removing the lines in &lt;tt&gt;nsMsgAccountManager::LoadAccount&lt;/tt&gt; that remove the account on failure.&lt;/li&gt;
&lt;li id="note-1.3"&gt;In general, I will mix the IDL and C++ names for methods and properties in the course of the guide. As a basic rule of thumb, if you see a &lt;tt&gt;::&lt;/tt&gt; in the name, it's a C++ name; otherwise, it's the IDL name.
&lt;/li&gt;
&lt;li id="note-1.4"&gt;&lt;tt&gt;CreateBaseMessageURI&lt;/tt&gt; is an internal (non-IDL) method called by &lt;tt&gt;nsMsgDBFolder&lt;/tt&gt; during initialization to set up the URIs used for getting individual messages. It will be covered in more depth once we get messages working, but it is technically necessary for startup (a stub that does nothing is provided).&lt;/li&gt;
&lt;li id="note-1.5"&gt;A strong temptation for accounts whose sources are some web address (for example, RSS or my web forums account) is to put the base address in the hostname property. However, as you would quickly realize, that wreaks havoc with URI parsing, and &lt;tt&gt;nsMsgDBFolder::parseURI&lt;/tt&gt; is not virtual. A better option is probably to set the hostname to some identifier used only to guarantee uniqueness, and to store the base URI somewhere else. Since all of my folders have independent URIs associated with them, I can safely ignore the issue until account creation and subscription are covered.
&lt;/li&gt;
&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-1846727436225030919?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/1846727436225030919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=1846727436225030919' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1846727436225030919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1846727436225030919'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/02/developing-new-account-types-part-1.html' title='Developing new account types, part 1: The folder pane'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_qW4UNslWKZU/S2yEcaNAx8I/AAAAAAAAAB4/qGPYqM8jCCc/s72-c/First_account.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4923752132035740476</id><published>2010-01-22T11:54:00.003-05:00</published><updated>2010-01-22T12:06:23.461-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='accttype'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Developing new account types, part 0: An introduction</title><content type='html'>This 
series of blog posts discusses the creation of a new account type implemented in JavaScript. Over the course of the series, I use the development of my &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/webfora"&gt;Web Forums extension&lt;/a&gt; to
explain the steps needed to create a new account type. I hope to add a new post every two weeks (I cannot guarantee it, though).
&lt;/p&gt;&lt;p&gt;
Before I begin the actual discussion, let me give some background. The ability to develop new account types has been my biggest extension goal for about two years now. Probably because of its difficulty, I know of only two extensions that have tried to do it: &lt;a
href="http://mxr.mozilla.org/comm-central/source/mailnews/extensions/newsblog/"&gt;what is now the RSS code&lt;/a&gt;, and &lt;a href="http://webmail.mozdev.org/index.html"&gt;Webmail&lt;/a&gt;. In the first case, the implementer resorted to creating a binary component for the incoming server; in the second, the implementer wrote fake IMAP (as well as POP and SMTP) servers to proxy the information to the web interface.
&lt;/p&gt;&lt;p&gt;
Some preliminary points: making a new account type is &lt;strong&gt;not&lt;/strong&gt; a Good First Extension. You will need a fair amount of XPCOM experience, and probably decent experience delving into implementations of undocumented interfaces. How much XUL and DOM (for things like web scraping) you use is up to you. MDC has a guide on &lt;a href="https://developer.mozilla.org/en/Extensions/Thunderbird/Building_a_Thunderbird_extension"&gt;building a Thunderbird extension from scratch&lt;/a&gt;. It is also probably not a bad idea to get comfortable with manual preference editing.
&lt;/p&gt;&lt;p&gt;
I am also trying a different way of writing this guide: instead of my usual method of editing HTML by hand, I am writing in &lt;a href="http://www.kompozer.net"&gt;Kompozer&lt;/a&gt;. I'm also attempting to include more code in my posts, and hopefully some images as well (the last part will be hardest). As with my first guide, I expect that this will be adapted into a series of documents on MDC at some point. Some more
reference-oriented documentation will be posted on MDC as I write this.
&lt;/p&gt;&lt;p&gt;I personally use a debug build of Thunderbird on Linux, built from source very near the tip, as the basis for my extension development (my regular profile is some 370 MB of stuff I don't dare threaten with developmental work). This is the same build I do patch development on, so I will rely on patches in said tree from time to time. &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=514409"&gt;One patch in particular&lt;/a&gt; is required &lt;a href="#note-0.1"&gt;[1]&lt;/a&gt;, but otherwise, the code should work on 1.9.2 and probably 1.9.1 as well.
&lt;/p&gt;&lt;p&gt;
This guide is structured to first demonstrate the core functionality (e.g., displaying messages) and to cover configuration (e.g., the wizard to create a new account) only once the basics are working. Therefore, you will need to get comfortable with editing configuration files by hand if you follow these steps exactly.
&lt;/p&gt;&lt;h4&gt;Backend introduction&lt;/h4&gt;&lt;p&gt;
So, let's start with an overview of the backend interfaces in mailnews. The interfaces a front-end widget might use to talk to an account are: &lt;tt&gt;nsIMsgAccount&lt;/tt&gt;, &lt;tt&gt;nsIMsgDatabase&lt;/tt&gt;, &lt;tt&gt;nsIMsgDBView&lt;/tt&gt; &lt;a href="#note-0.2"&gt;[2]&lt;/a&gt;, &lt;tt&gt;nsIMsgFolder&lt;/tt&gt;, &lt;tt&gt;nsIMsgDBHdr&lt;/tt&gt;, &lt;tt&gt;nsIMsgIdentity&lt;/tt&gt;, &lt;tt&gt;nsIMsgIncomingServer&lt;/tt&gt;, &lt;tt&gt;nsIMsgMailNewsUrl&lt;/tt&gt;, &lt;tt&gt;nsIMsgMessageService&lt;/tt&gt;, &lt;a
href="https://developer.mozilla.org/en/nsIMsgProtocolInfo"&gt;&lt;tt&gt;nsIMsgProtocolInfo&lt;/tt&gt;&lt;/a&gt;,
&lt;tt&gt;nsIChannel&lt;/tt&gt;, &lt;tt&gt;nsIProtocolHandler&lt;/tt&gt;, and &lt;tt&gt;nsIRDFResource&lt;/tt&gt;. Many of these would need to be implemented, and some of them are far from small: implementing &lt;tt&gt;nsIMsgFolder&lt;/tt&gt; from scratch would require a total of 186 methods, setters, and getters (as of this writing), many of which are not well documented.
&lt;/p&gt;&lt;p&gt;
In practice, implementations do not start from scratch. Everything tends to boil down to either two or five implementing classes: the server, the service, the folder, the URL, and the database (there is typically a protocol implementation as well). Of these, only the service is implemented from scratch, and it has the simplest interfaces to implement. The "two or five" reflects the fact that there are actually two types of accounts. The first type,
which only has to implement a server and a service, can be called &lt;span style="font-style: italic;"&gt;mailbox accounts&lt;/span&gt;: all of the messages are downloaded into local folders &lt;a href="#note-0.3"&gt;[3]&lt;/a&gt;. The second type implements all of the above, as the messages are
generally stored on the server and downloaded on demand (or cached).
&lt;/p&gt;&lt;p&gt;
Of these two types, the first is the less interesting, so I will generally ignore it. If you want to make such an account type, look at the RSS implementation for guidelines. The primary distinction is that mailbox accounts lack their own folder types, and therefore their own databases and URLs. In such a case, all you need to worry about is delivering the messages.
&lt;/p&gt;&lt;p&gt;
Following is a description of the major implemented components:
&lt;/p&gt;&lt;dl&gt;
&lt;dt&gt;Server&lt;/dt&gt;
&lt;dd&gt;The server represents the source of messages for an account. It also serves as the per-account configuration store for implementers. For example, NNTP stores the maximum number of connections to a server on this object.&lt;/dd&gt;
&lt;dt&gt;Folder&lt;/dt&gt;
&lt;dd&gt;The folder represents a container of messages. Ultimately, the UI interacts more with folders than with servers, at least on a regular basis. This is the most complex interface to deal with, primarily because it can be hard to tell precisely what you need to implement
versus what (eventually) calls back into some other method.&lt;/dd&gt;
&lt;dt&gt;Database&lt;/dt&gt;
&lt;dd&gt;The database represents a store of a subset of message information. By default, it generally stores what NNTP would call &lt;span style="font-style: italic;"&gt;overview information&lt;/span&gt; (enough to create a threaded message list), plus some flags such as read status, as well as some information that extensions wish to preserve.
&lt;/dd&gt;
&lt;dt&gt;Service&lt;/dt&gt;
&lt;dd&gt;The service is more of a "how-to" guide for accounts. It is the external endpoint for copying messages, viewing messages, etc. Note that there is only a single service, so the actual server communication code typically lives in a different
implementation.
&lt;/dd&gt;
&lt;dt&gt;URL&lt;/dt&gt;
&lt;dd&gt;URLs are &lt;a href="http://tvtropes.org/pmwiki/pmwiki.php/Main/ExactlyWhatItSaysOnTheTin"&gt;what the name implies&lt;/a&gt;. It's how one refers to messages, folders, and servers, although only messages are typically instantiated with the object in question. They also tend to be used as the primary internal communication system.&lt;/dd&gt;
&lt;dt&gt;Protocol&lt;/dt&gt;
&lt;dd&gt;The protocol instance represents a connection to a server. Unlike the other implementations, this one is not mandatory and is typically not visible via the "main" interfaces (nsIChannel is perhaps the most useful one they export). I suspect this is primarily useful for binary protocols, but I have not yet delved far enough into creating a new account to say for certain.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h4&gt;Important interfaces and their interactions&lt;/h4&gt;
&lt;p&gt;The center of an account is represented by &lt;tt&gt;nsIMsgAccount&lt;/tt&gt;. To get an idea of the number of interfaces involved, look at the &lt;a
href="http://doxygen.db48x.net/comm-central/html/interfacensIMsgAccount.html"&gt;collaboration
diagrams for &lt;tt&gt;nsIMsgAccount&lt;/tt&gt;&lt;/a&gt; &lt;a
href="http://doxygen.db48x.net/comm-central/html/interfacensIMsgFolder.html"&gt;and
&lt;tt&gt;nsIMsgFolder&lt;/tt&gt;&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgAccount&lt;/tt&gt; represents an account. The interface itself is not terribly useful&amp;mdash;it's mostly just a step on the way to get to a server or an identity.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgIdentity&lt;/tt&gt; represents an identity. Identities are essentially a way of persisting compose settings; since their use is wholly related to compose code, I will not discuss them in detail until later parts of the guide.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgIncomingServer&lt;/tt&gt;, as mentioned earlier, represents a message source. This is one of the interfaces you will have to implement, although much of it is already done for you. Everything that is specific to a server hangs off of this interface; everything that is specific to a folder hangs off of &lt;tt&gt;nsIMsgFolder&lt;/tt&gt;; folders are accessible via the root folder of a server.
&lt;/p&gt;&lt;p&gt;
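As a rough illustration of how these interfaces chain together, the following hypothetical helper walks from an account to its server's root folder and then recursively through &lt;tt&gt;subFolders&lt;/tt&gt;. It only relies on the attribute names &lt;tt&gt;incomingServer&lt;/tt&gt;, &lt;tt&gt;rootFolder&lt;/tt&gt;, &lt;tt&gt;subFolders&lt;/tt&gt;, and &lt;tt&gt;prettyName&lt;/tt&gt;, so it can be exercised against simple mock objects; inside Thunderbird, each enumerated folder would usually need a &lt;tt&gt;QueryInterface&lt;/tt&gt; first.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical helper: walks from an account to its server's root folder,
// then recursively through subFolders (an nsISimpleEnumerator), returning
// one indented line per folder.
function listFolders(account) {
  var lines = [];
  function visit(folder, depth) {
    lines.push("  ".repeat(depth) + folder.prettyName);
    var subs = folder.subFolders;
    while (subs.hasMoreElements()) {
      // In Thunderbird you would typically write
      // subs.getNext().QueryInterface(Ci.nsIMsgFolder) here.
      visit(subs.getNext(), depth + 1);
    }
  }
  visit(account.incomingServer.rootFolder, 0);
  return lines;
}
&lt;/pre&gt;&lt;p&gt;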
&lt;tt&gt;nsIMsgFolder&lt;/tt&gt;, as mentioned earlier, represents a container of messages. This is one of the interfaces that has to be implemented, unless you are using a &lt;span style="font-style: italic;"&gt;mailbox account&lt;/span&gt;. All folders have a database.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgDatabase&lt;/tt&gt; represents the message store overview. This has to be implemented if you are implementing &lt;tt&gt;nsIMsgFolder&lt;/tt&gt; (unless you want to be sneaky). Databases are used to get at thread and header information, via &lt;tt&gt;nsIMsgThread&lt;/tt&gt; and &lt;tt&gt;nsIMsgDBHdr&lt;/tt&gt;, respectively. Messages themselves have numerous representations: URIs, header objects, message keys, and (sometimes) message IDs. Conversion between these forms is common.
&lt;/p&gt;&lt;p&gt;
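One common conversion is between a message URI and a (folder, message key) pair. The built-in account types form a message URI by appending &lt;tt&gt;#&lt;/tt&gt; and the message key to the folder's base message URI. The helpers below mimic that convention in plain JavaScript; their names, and the &lt;tt&gt;webforum-message&lt;/tt&gt; scheme used in the example, are assumptions for illustration. Inside Thunderbird, you would normally use &lt;tt&gt;nsIMsgFolder.getUriForMsg&lt;/tt&gt; and &lt;tt&gt;nsIMessenger.msgHdrFromURI&lt;/tt&gt; instead.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Hypothetical helpers mirroring the "baseMessageURI#key" convention used
// by the built-in account types.
function buildMessageURI(baseMessageURI, key) {
  return baseMessageURI + "#" + key;
}

// Splits a message URI back into its base URI and numeric message key.
// Returns null if the URI has no valid "#key" suffix.
function parseMessageURI(uri) {
  var hash = uri.lastIndexOf("#");
  if (hash === -1)
    return null;
  var suffix = uri.slice(hash + 1);
  if (!/^\d+$/.test(suffix))
    return null;
  return { baseMessageURI: uri.slice(0, hash), key: parseInt(suffix, 10) };
}
&lt;/pre&gt;&lt;p&gt;
For example, &lt;tt&gt;buildMessageURI("webforum-message://someuser@kompozer/General", 42)&lt;/tt&gt; gives &lt;tt&gt;webforum-message://someuser@kompozer/General#42&lt;/tt&gt;, and &lt;tt&gt;parseMessageURI&lt;/tt&gt; turns that back into the base URI and the key &lt;tt&gt;42&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;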
&lt;tt&gt;nsIDBFolderInfo&lt;/tt&gt; represents folder properties normally stored in the database. All of these properties are also stored in the folder cache (&lt;tt&gt;nsIMsgFolderCache&lt;/tt&gt;) to avoid opening up all of the databases just to figure out how many unread messages are in each folder.
&lt;/p&gt;&lt;p&gt;
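The idea behind the folder cache can be shown with a toy model: keep a small summary (here, just unread and total counts) per folder URI, and only open the expensive database on a cache miss. This sketch illustrates the concept only; it does not use the &lt;tt&gt;nsIMsgFolderCache&lt;/tt&gt; API, and &lt;tt&gt;FolderSummaryCache&lt;/tt&gt; and its &lt;tt&gt;openDatabase&lt;/tt&gt; callback are hypothetical names.
&lt;/p&gt;&lt;pre class="lang-js"&gt;
// Toy model of a per-folder summary cache, keyed by folder URI.
// openDatabase is a caller-supplied (hypothetical) function standing in for
// the expensive database open; it returns an object with unread and total.
function FolderSummaryCache(openDatabase) {
  this._openDatabase = openDatabase;
  this._entries = new Map();
}

FolderSummaryCache.prototype = {
  // Returns {unread, total}, opening the database only on a cache miss.
  getSummary: function (folderURI) {
    var entry = this._entries.get(folderURI);
    if (!entry) {
      var db = this._openDatabase(folderURI);
      entry = { unread: db.unread, total: db.total };
      this._entries.set(folderURI, entry);
    }
    return entry;
  },

  // Called when the folder's database changes so the cache stays current.
  update: function (folderURI, unread, total) {
    this._entries.set(folderURI, { unread: unread, total: total });
  }
};
&lt;/pre&gt;&lt;p&gt;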
&lt;tt&gt;nsIMsgAccountManager&lt;/tt&gt; and &lt;tt&gt;nsIMsgBiffManager&lt;/tt&gt; are two managers that handle account creation and the periodic mail download (generally called &lt;span style="font-style: italic;"&gt;biff&lt;/span&gt;), respectively. Expect to see these calling your code a lot.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgDBView&lt;/tt&gt; represents the thread pane view. This is going to be the primary consumer of &lt;tt&gt;nsIMsgDatabase&lt;/tt&gt;, and this is where to look to find out what happens when, e.g., you select a new message.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgFilterList&lt;/tt&gt;, &lt;tt&gt;nsIMsgFilterPlugin&lt;/tt&gt;, &lt;tt&gt;nsIMsgFilterService&lt;/tt&gt;, and &lt;tt&gt;nsIMsgFilter&lt;/tt&gt; are the interfaces that deal with filtering. None of these will have to be implemented to support filtering &lt;a href="#note-0.4"&gt;[4]&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
The &lt;tt&gt;nsIMsgSearch&lt;/tt&gt;* interfaces are those that deal with search (there are around 9 of them). Most of these will not have to be implemented to support searching. More on this when searching is discussed.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgWindow&lt;/tt&gt; represents the bridge to the front-end. It is passed into many functions, although it may be &lt;tt&gt;&lt;span class="constant"&gt;null&lt;/span&gt;&lt;/tt&gt;, typically when invoked from the backend.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgMailNewsUrl&lt;/tt&gt; represents the URL object that loads a message. This will generally have to be implemented if &lt;tt&gt;nsIMsgFolder&lt;/tt&gt; is.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgProtocolInfo&lt;/tt&gt; represents the basic information about an account type's capabilities. This interface must be implemented. As the name implies, it is generally geared towards the capabilities of the connection to the server.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;nsIMsgMessageService&lt;/tt&gt; and &lt;tt&gt;nsIMsgMessageFetchPartService&lt;/tt&gt; represent the ability to retrieve the message (and &lt;span style="font-style: italic;"&gt;message parts&lt;/span&gt;, more often known as attachments &lt;a href="#note-0.5"&gt;[5]&lt;/a&gt;). These are further interfaces that must be implemented if folders are being implemented.
&lt;/p&gt;&lt;p&gt;
The MIME, compose, and import interfaces are omitted from this list of backend interfaces, as these are topics that will not be discussed for a while, and I am not certain that they are useful to know about for making new account types at present.
&lt;/p&gt;&lt;h4&gt;Notes&lt;/h4&gt;
&lt;ol&gt;
&lt;li id="note-0.1"&gt;The purpose behind this patch is to enable extensions to reuse files from &lt;tt&gt;base/utils&lt;/tt&gt; as C++ components can. If you were to adapt this to use C++ instead of JS, this patch would not be necessary. As the comments in the linked bug indicate, there is no guarantee that this will be implemented for Thunderbird 3.1; if it is not, the specifically required binary components would be made available for reuse on some webpage. More on this when a decision is made.
&lt;/li&gt;
&lt;li id="note-0.2"&gt;Strictly speaking, this interface uses other interfaces in the list to talk to you. That said, a lot of interaction with folders and databases happens through this interface.&lt;/li&gt;
&lt;li id="note-0.3"&gt;I say &lt;span style="font-style: italic;"&gt;local folders&lt;/span&gt;&amp;mdash;not &lt;span style="font-style: italic;"&gt;Local Folders&lt;/span&gt;&amp;mdash;here because Global Inbox settings actually rely on POP-specific attributes. It is still possible, via a reimplementation, to change the delivery settings. Such a mechanism is outside the scope of this guide.&lt;/li&gt;
&lt;li id="note-0.4"&gt;It's not strictly necessary to implement these, but if you want to add custom filter terms or actions or custom search terms, some interfaces will need to be implemented. Such actions are beyond the scope of this guide.
&lt;/li&gt;
&lt;li id="note-0.5"&gt;Classifying all message parts as attachments is a pretty big oversimplification. In general, the only time &lt;em&gt;specific parts&lt;/em&gt; are requested in Thunderbird and SeaMonkey is when attachments are involved. For more information on message parts, please see &lt;a href="http://tools.ietf.org/html/rfc2045"&gt;RFC 2045&lt;/a&gt; and &lt;a href="http://tools.ietf.org/html/rfc2046"&gt;RFC 2046&lt;/a&gt; (two of the five MIME specifications), as well as the &lt;a href="http://tools.ietf.org/html/rfc3501#section-6.4.5"&gt;IMAP FETCH&lt;/a&gt; subsection (for part numbering).
&lt;/li&gt;
&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4923752132035740476?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4923752132035740476/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4923752132035740476' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4923752132035740476'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4923752132035740476'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/01/developing-new-account-types-part-0.html' title='Developing new account types, part 0: An introduction'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-1809055917445842405</id><published>2010-01-04T15:39:00.002-05:00</published><updated>2010-01-04T16:28:46.575-05:00</updated><title type='text'>Building packages: harder than they look</title><content type='html'>For a course I'm TAing, we (the other TAs and I) decided to revamp the tools so that students could more easily install them on their own computers. This was really my first look into actually producing packages for other people. Here is the long tale:
&lt;/p&gt;
&lt;h4&gt;Step 1: Build &lt;tt&gt;simpl&lt;/tt&gt;&lt;/h4&gt;
&lt;p&gt;Okay, the basic, core tools here compile and work easily. The more complicated tale is the GUI, built on Qt. Qt 3, to be precise. Except the autodetection thinks we want to try building against Qt 4. A single post-configure change gets this working. It only took a few hours here (trying to go the Qt 4 route didn't work so well, and we had interesting endeavors trying to figure out how to get the KDE headers to work).
&lt;/p&gt;&lt;h4&gt;Step 2: Build binutils&lt;/h4&gt;&lt;p&gt;
This wasn't all that hard at first. Configure ran nicely and without problems, and building... oops, there's a warning and someone turned on &lt;tt&gt;-Werror&lt;/tt&gt;. Another reconfigure gets this building quickly.
&lt;/p&gt;&lt;h4&gt;Step 3: Build (cross-compiling) gcc&lt;/h4&gt;&lt;p&gt;
Configure... build... fail... reconfigure... rebuild... fail... Repeat for several hours. Make that days. Do I want these options? Or those options? Still failing. Try editing files mid-build to see if that gets it to work. And, no. Okay, let's try binutils again. Solution: &lt;tt&gt;make install&lt;/tt&gt; binutils first, then build gcc. That works without problems.
&lt;/p&gt;&lt;h4&gt;Step 3.5: Test the build&lt;/h4&gt;&lt;p&gt;
I have a Makefile that just requires me to change a few lines to swap gcc versions and directories of everything. Do that, try it, and... it doesn't work. Something about libc not working correctly.
&lt;/p&gt;&lt;h4&gt;Step 4: Build newlib&lt;/h4&gt;&lt;p&gt;
By this point, I know the drill: copy the configure from elsewhere, configure, and build. Apparently there's a typo in one of the ARM assembly files. I teach myself a tiny bit more of ARM (this is turning out to be very educational!) and fix the file. Reconfigure, rebuild, install, and test again. This time, it's complaining about missing a few functions. I found some more documentation online and wrote my own &lt;tt&gt;sbrk&lt;/tt&gt; function (where "wrote" means copied from some file online and tweaked to make it build). Testing fails again, so I make myself a few more functions and everybody's happy.
&lt;/p&gt;&lt;h4&gt;Step 5: Build vba&lt;/h4&gt;&lt;p&gt;
As you might imagine, this one didn't work either. So many build errors. I look at what Debian did, ponder some more, talk it over with the other TAs, and give up. &lt;i&gt;Skritch, skritch&lt;/i&gt;.
&lt;/p&gt;&lt;h4&gt;Step 5: Build vbam&lt;/h4&gt;&lt;p&gt;
This fork builds... oh, wait, I need cmake. Okay, this fork builds without problems. They don't have versioned release downloads for my build script to pull, so I just have it yank a specific svn revision. A nice, simple package to work with after the mess that is cross-compiling.
&lt;/p&gt;&lt;h4&gt;Step 6: Build gdb&lt;/h4&gt;&lt;p&gt;
No problems here. Worked fine the first time, no patching, no need to rebuild. Even the testing had no problems. Stunned me.
&lt;/p&gt;&lt;h4&gt;Step 7: Package and test on school computer&lt;/h4&gt;
Problems:
&lt;ol&gt;
&lt;li&gt;Can't find &lt;tt&gt;libmpfr.so&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;&lt;tt&gt;cc1: /lib/libc.so.6: version `GLIBC_2.7' not found&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;&lt;tt&gt;as: /lib/libc.so.6: version `GLIBC_2.7' not found&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;(vbam) &lt;tt&gt;Segmentation fault&lt;/tt&gt;&lt;/li&gt;
&lt;/ol&gt;
Solutions:
&lt;ol&gt;
&lt;li&gt;Statically compile &lt;tt&gt;libmpfr.so&lt;/tt&gt;. Not too hard...&lt;/li&gt;
&lt;li&gt;Statically link &lt;tt&gt;gcc&lt;/tt&gt;. Not very trivial. Eventually, &lt;tt&gt;LDFLAGS=-static&lt;/tt&gt; in the configure arguments works.&lt;/li&gt;
&lt;li&gt;Statically link &lt;tt&gt;as&lt;/tt&gt; (and the other binutils). This requires manually copying the final link command and adding the &lt;tt&gt;-static&lt;/tt&gt; argument. Every time I rebuild binutils.&lt;/li&gt;
&lt;li&gt;Debug, find a backtrace. It's in pthreads, called from SDL. Try statically linking SDL (no luck). Try using different SDL versions. Rebuild vbam with debugging symbols. Notice that the primary reason for the fault is... no sound device. Patch vbam. Test again; it works!&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To put this in perspective, every time I had to test in the final step, I had to reupload the tarball, which started out at 49 MB and grew to 55 MB (thanks to static compilation). Sometimes I had to reupload it again if the connection died in the middle. (My internet connection started getting flaky... possibly related to the hundreds of MB I was uploading a day. Or maybe the hundreds of MB I was downloading: every time I restarted the script, it downloaded 100 MB of source archives...)
&lt;/p&gt;&lt;p&gt;
So, in short, I had to override build scripts for 2 different packages, patch another 2, and build 3 out of 5 packages statically. One package doesn't have a point release; the other three are spread out among three separate servers to download. Running the build script from scratch requires nearly 2 GB of disk space and takes several hours. At least I have now repackaged it in Makefile form, so you don't have to start over from square one if you forget to install cmake first. Building the final tarball takes a good minute on my system.
&lt;/p&gt;&lt;p&gt;
But, I've finally finished the experience. Plus, I won't have to do it again... after I build the 64-bit version.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-1809055917445842405?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/1809055917445842405/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=1809055917445842405' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1809055917445842405'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1809055917445842405'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/01/building-packages-harder-than-they-look.html' title='Building packages: harder than they look'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-3746253317053437225</id><published>2010-01-02T14:18:00.003-05:00</published><updated>2010-01-02T14:46:16.102-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Predicted work on Thunderbird</title><content type='html'>It's a new year, so it's time for me to predict (and probably overestimate, who knows) what I would like to do and see in the 
realm of Thunderbird (and SeaMonkey), along with other tidbits elsewhere in Mozilla.
&lt;/p&gt;
&lt;h4&gt;News submodule&lt;/h4&gt;
&lt;p&gt;Thunderbird 3 improved the filter story dramatically here; the next two biggest itches are the complete inanity that is news URIs (too many bugs to count), and the venerable old &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=43278"&gt;crossposts bug&lt;/a&gt;. I still contend that the latter would best be served by per-account database functionality; in any case, it does require some database changes to work properly. I doubt I'll find time to look at that bug in particular this year.
&lt;/p&gt;&lt;p&gt;
The URI issues are more tractable, but I don't think I'll find time to hit them for 3.1; in any case, I now consider them to be the highest priority news bugs. So, to anyone with time on their hands: feel free to take one or two of these and start fixing. You'll get much kudos from Thunderbird Usenet users.
&lt;/p&gt;&lt;p&gt;
Other various "nice-to-haves" on my list: fixing subscribe, cleaning up some of the gunk in the code, &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=244682"&gt;adding support for RFC 3977 CAPABILITIES&lt;/a&gt;, possibly changing how &lt;tt&gt;news:a.group&lt;/tt&gt; URIs work (open the folder view, not necessarily subscribing to them), among others. &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=60981"&gt;Combine-and-decode&lt;/a&gt; also falls under this list, but it's a lot less tractable than some of the other stuff.
&lt;/p&gt;
&lt;h4&gt;Analysis tools&lt;/h4&gt;
&lt;p&gt;jshydra could use some more love: I hope to get a converter to a more natural AST working by the end of the year, as well as an automated test suite to verify correctness whenever I change m-c versions. I've also been working on and off on getting symbols for DXR via MSVC, which should hopefully also be finished this year.
&lt;/p&gt;&lt;h4&gt;Other Mozilla/Mailnews work&lt;/h4&gt;&lt;p&gt;
As I've mentioned before, my biggest goal for 3.1 is to be able to specify new account types in JavaScript. I basically have the necessary framework completed locally; I just need to finish writing the tests and fix some bugs before getting it reviewed and committed. After that, I'll be writing a series of blog entries on developing an account type in JS, similar to (and hopefully better than) my pork guides. Speaking of which, I hope to finish those sometime this year as well, possibly during the summer again.
&lt;/p&gt;&lt;p&gt;
I've yet to see a roadmap for the address book in 3.1 and later, so I don't know what I'll be doing for the address book in the upcoming year. I expect, though, that I won't do anything near the scale of what I did for bug 413260. De-RDF and de-morkification are two more things I'd like to see worked on, but I don't expect to get to them this year either.
&lt;/p&gt;&lt;p&gt;
Time to see how much I'll actually get done this year!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-3746253317053437225?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/3746253317053437225/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=3746253317053437225' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3746253317053437225'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/3746253317053437225'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2010/01/predicted-work-on-thunderbird.html' title='Predicted work on Thunderbird'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6437672478420013969</id><published>2009-12-09T13:51:00.004-05:00</published><updated>2011-04-10T08:53:08.101-04:00</updated><title type='text'>Google Wave in Thunderbird 3</title><content type='html'>While it's not in the format I would ideally want, I recently got Google Wave inside Thunderbird 3. How, you might ask. Simple: &lt;a href="https://developer.mozilla.org/en/Thunderbird/Content_Tabs"&gt;the new content tabs feature&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
So, to do it, go into the Error Console and type this line in: &lt;tt&gt;Components.classes['@mozilla.org/appshell/window-mediator;1'].getService(Components.interfaces.nsIWindowMediator).getMostRecentWindow("mail:3pane").document.getElementById("tabmail").openTab("contentTab", {contentPage: "https://wave.google.com/wave/?nouacheck"});&lt;/tt&gt;. Note that Google Wave, for some idiotic reason, decides that Thunderbird isn't a valid UA to be using, so you have to convince it to skip the UA check with the &lt;tt&gt;?nouacheck&lt;/tt&gt; parameter. I thought browser sniffing died out years ago...
&lt;/p&gt;&lt;p&gt;
For bonus points, if you restart Thunderbird, the tab will stay open, so all you need to do is login again!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6437672478420013969?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6437672478420013969/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6437672478420013969' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6437672478420013969'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6437672478420013969'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/12/google-wave-in-thunderbird-3.html' title='Google Wave in Thunderbird 3'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-1012262860421265715</id><published>2009-11-29T17:58:00.002-05:00</published><updated>2009-11-29T17:59:18.694-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Thunderbird extensibility</title><content type='html'>With &lt;a href="http://www.mozillamessaging.com/en-US/about/press/archive/2009-11-24-01"&gt;Thunderbird 3.0rc1 now released&lt;/a&gt;, it is time, in my opinion, to start looking towards the goals that Thunderbird 3.next and future versions should have. 
One category of goals would be the extensibility of Thunderbird.
&lt;/p&gt;&lt;p&gt;
When I say "extensibility," I specifically refer to replacing backend components or adding new components to certain type categories. I do recognize that things like the power of the address book API are important for extensions, but I don't consider that "extensibility" (perhaps "API usability" is a better name?). I am also going to focus exclusively on mailnews components here.
&lt;/p&gt;&lt;p&gt;
This blog post is a set of my opinions on the state of extensibility in Thunderbird and SeaMonkey. With one exception, I am not recommending what should or shouldn't be worked on: I am just giving my biased beliefs on these topics. I also want to mention that I focus primarily on the backend, so I have a very distorted view of what can be done in extensions and how easily it can be done.
&lt;/p&gt;&lt;p&gt;
The first metric I consider is &lt;b&gt;possibility&lt;/b&gt;, the ability of one to adequately and easily make an extension in the given field in Thunderbird 3. Since it is theoretically possible to replace nearly any component given enough effort, a key factor is the ease with which one can do it. High numbers correspond to requiring a little bit of JS and a bit of UI to go along with it; low numbers generally indicate that thousands of lines of C++ reimplementing large swathes of platform code are needed.
&lt;/p&gt;&lt;p&gt;
The second metric I consider is &lt;b&gt;difficulty&lt;/b&gt;, the amount of effort it would take to make this facet of extensibility rather close to perfect. For this metric, having a lower number is better: a very low number indicates that it would only take a few hours to produce a patch (if necessary) to fix it, while a very high number indicates that several people would have to work for a few months to produce a patch, not including the review cycles.
&lt;/p&gt;&lt;p&gt;
The final metric I consider is &lt;b&gt;desirability&lt;/b&gt;, how useful it would be to improve this facet. As with possibility, higher numbers are better.
&lt;/p&gt;&lt;p&gt;
A final note about numbers: they're just qualitative indicators of relative value. Don't read too much into the actual values, as I more or less came up with these numbers and then tweaked them up or down as I wrote more of this post. A low value for &lt;b&gt;possibility&lt;/b&gt; in particular doesn't mean it's very difficult to do right now; it just means it's not as easy as some other things.
&lt;/p&gt;
Facets of extensibility:
&lt;ul&gt;
&lt;li&gt;&lt;a href="#ui"&gt;UI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#folderpane"&gt;Folder pane views&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dbview"&gt;Alternative database views&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#messages"&gt;Message visualization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#protocols"&gt;Protocol-level extensions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#import"&gt;Import&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sync"&gt;Synchronization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#search"&gt;Search backends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#filters"&gt;Filter backends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#abtype"&gt;Specifying new address book types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#msgstore"&gt;Message storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#metadata"&gt;Metadata storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#accttype"&gt;Specifying new account types&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div&gt;
&lt;h4 id="ui"&gt;UI:&lt;/h4&gt;
Possibility:  10&lt;br /&gt;
Difficulty:    1&lt;br /&gt;
Desirability: 10
&lt;p&gt;
Here, when I talk about UI, I specifically mean the ability to add, remove, replace, or rearrange elements of the UI, including toolbars, menus, and keyboard shortcuts, among others. To my knowledge, there aren't many problems with being able to modify the UI modulo the usability of XPCOM, Mozilla's toolkit, or other components; any problems that come up could be worked out mostly by adding an &lt;tt&gt;id&lt;/tt&gt; attribute or perhaps reshuffling a portion of the dialog. Then again, as I mostly stick to the backend, I could be dead wrong about this.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="folderpane"&gt;Folder pane views:&lt;/h4&gt;
Possibility:  9&lt;br /&gt;
Difficulty:   2&lt;br /&gt;
Desirability: 5
&lt;p&gt;
Right now my 18 accounts (19 if you include the "Smart Folders" account) are arrayed in a specific order with trees of various folders hanging off of them. Perhaps I don't want to see them like that. After all, there is rather little correlation between an account and the contents of its folders (in my case at least). None of the other layouts are particularly useful for my needs either. Why not make a layout of folders that suits my needs better? For that matter, why limit it to looking at folders at all?
&lt;/p&gt;&lt;p&gt;
In a supreme effort, &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=414038"&gt;Joey Minta replaced the folder pane implementation with an extensible JS version&lt;/a&gt; (with regressions tracked down and fixed by myriad others). This is perhaps the only major part where SeaMonkey and Thunderbird diverge greatly: SeaMonkey &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=507601"&gt;has yet to port the patch&lt;/a&gt;, for good reason (the original produced some 33 regressions, so it's a rather risky and complex bug to port). Aside from that, there's probably not much that would have to be done to get easy-to-implement extensions working here.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="dbview"&gt;Alternative database views:&lt;/h4&gt;
Possibility:  4&lt;br /&gt;
Difficulty:   7&lt;br /&gt;
Desirability: 3
&lt;p&gt;
This concept loosely ties into the creation of new account types, but I think many of the more exotic account types could use other ways of selecting messages than a tree of threads. Even mailing lists could sometimes use better visualization; &lt;a href="http://www.visophyte.org/blog/2008/11/29/ill-be-wanting-that-latte-machine-now/"&gt;here is a graphical thread view&lt;/a&gt; discussed by Andrew Sutherland that caught my eye a year ago.
&lt;/p&gt;&lt;p&gt;
If you don't want to set your sights so high, making what is essentially a new filtered view is a matter of mere C++ extension combined with a touch of UI; replacing the &lt;tt&gt;&amp;lt;tree&amp;gt;&lt;/tt&gt; XUL tag would probably be a lot more involved. Making the latter easier would involve removing or shifting around a fair amount of logic.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="messages"&gt;Message display:&lt;/h4&gt;
Possibility:  5&lt;br /&gt;
Difficulty:   8&lt;br /&gt;
Desirability: 4
&lt;p&gt;
Not all messages are best viewed as static rich text. For example, feed summaries in many cases &lt;a href="http://rss.csmonitor.com/feeds/scitech"&gt;don't include the full article&lt;/a&gt;, so it's sometimes better to think of such a message as the URI that the summary is associated with. I also imagine that there are some wacky visualizations for domain-specific messages that would be nice to play with. That said, I'm at a loss to think of any that would be particularly useful right now.
&lt;/p&gt;&lt;p&gt;
It's not terribly hard to change message display right now, as one can replace the preview pane or full window with another URI on message load; this is what RSS does right now. At the same time, though, a lot of information in libmime is rather locked away, and finer-grained forms of extension (changing how specific parts are handled in particular circumstances) would require changing mostly hardcoded libmime code. Which would probably require &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=421086"&gt;a C++ infection&lt;/a&gt; first.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="protocols"&gt;Protocol-level extensions:&lt;/h4&gt;
Possibility:  1&lt;br /&gt;
Difficulty:   5&lt;br /&gt;
Desirability: 2
&lt;p&gt;
There are &lt;a href="http://www.iana.org/assignments/imap4-capabilities"&gt;38 official IMAP extensions&lt;/a&gt;; the IMAP client code implements far fewer. In addition, there is also the possibility that servers may implement their own custom &lt;tt&gt;XAWESOMEIMAPCOMMAND&lt;/tt&gt; which might be ideal to implement in an extension. Of all of the facets I describe here, this is the most difficult to do in an extension right now: you'd have to rewrite the protocol objects themselves. Making this possible to do is also no cakewalk, as this is inherently connection-level, but the server/service interfaces tend to abstract away connection issues. As for desirability, I mostly consider this a "neat concept," but probably not worth spending time to sit down and actually do. There is one example which recently came up, though: the &lt;a href="http://tools.ietf.org/html/rfc5257"&gt;IMAP &lt;tt&gt;ANNOTATE&lt;/tt&gt;&lt;/a&gt; command is something that seems perfect to implement in an extension.
&lt;/p&gt;&lt;p&gt;
A subcategory of this which is probably more useful is the ability to merely add new authentication schemes. All of IMAP, NNTP, POP, and SMTP have a mechanism for generic SASL authentication schemes; HTTP authentication (for RSS and possibly others) also has a similar generic mechanism (I don't actually know how well HTTP-authed RSS feeds work, but I'd be surprised if there were a major problem).
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="import"&gt;Import:&lt;/h4&gt;
Possibility:  8&lt;br /&gt;
Difficulty:   4&lt;br /&gt;
Desirability: 4
&lt;p&gt;
Coding an importer into an extension isn't too hard: &lt;a href="https://wiki.mozilla.org/User:Jcranmer/Writing_an_Importer"&gt;I've done it before&lt;/a&gt;. The hardest issue I had was the fact that import is actually multithreaded (unlike most actions, which merely run on event queues). My understanding is that this implies that you cannot write it in JS, only C++.
&lt;/p&gt;&lt;p&gt;
That said, though, I'm not sure that it's something to be concerned with. Import is not something that I think a lot of people will be doing time and time again (if they did, they would pretty much have entered synchronization land). Instead, import matters more for people redistributing mailnews code, such as Linux distributions. The interesting cases, then, would mostly be other system applications, which generally implies that you're writing native code anyway. &lt;a href="https://developer.mozilla.org/en/JavaScript_code_modules/ctypes.jsm"&gt;JSCtypes&lt;/a&gt; can access those native APIs from JS, but only if they are written in C and not C++ (so no Qt) and you don't need to expand dozens of macros to get at a simple method (so no GTK).
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="sync"&gt;Synchronization:&lt;/h4&gt;
Possibility:  7&lt;br /&gt;
Difficulty:   3&lt;br /&gt;
Desirability: 9
&lt;p&gt;
By synchronization, I mean specifically the ability to synchronize data&amp;mdash;be it preferences, mail, contacts, calendar, or anything else people come up with&amp;mdash;between Thunderbird and any other accepting client, which includes other instances of Thunderbird or SeaMonkey, online web services, PDAs, and smartphones, among others. You can already do it now, as evidenced by &lt;a href="https://addons.mozilla.org/en-US/thunderbird/addon/6095"&gt;two of the&lt;/a&gt; &lt;a href="https://addons.mozilla.org/en-US/thunderbird/addon/4631"&gt;more popular extensions&lt;/a&gt; doing this. Indeed, &lt;a href="http://ccgi.standard8.plus.com/blog/archives/200"&gt;until recently&lt;/a&gt;, comm-central had Palm synchronization code in its tree. The problem is that some APIs could stand to cater more to this use case; I would even go so far as to say that a centralized framework might be useful.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="search"&gt;Search backends:&lt;/h4&gt;
Possibility:  6&lt;br /&gt;
Difficulty:   6&lt;br /&gt;
Desirability: 8
&lt;p&gt;
I've never precisely tabulated how much mail I get in a day, but I reliably receive around 100 messages a day from someone called "&lt;tt&gt;bugzilla-daemon@mozilla.org&lt;/tt&gt;", another 50 or so from various other mailing lists, on top of innumerable RSS feed messages (I subscribe to 60 or so feeds) and NNTP messages. Compared to &lt;a href="http://www.justdave.net/dave/"&gt;some other people&lt;/a&gt;, that list seems small. In short, many users may be suffering from acute information overload, and how does one remedy that? With comprehensive searching and filtering.
&lt;/p&gt;&lt;p&gt;
It seems to me, therefore, that search extensibility is vital. For example, for triage purposes, I might want to search for all bugzilla threads whose bugs are currently fixed (irrespective of the bug's status when I received the email). This area has improved considerably during Thunderbird 3 development, but I don't think it's quite there yet: searching on the current status of a bug in bugzilla, for instance, is not really feasible with the gloda search engine, although it may be possible with the conventional search. The difficulty rating is high because I think putting this where it truly ought to be requires some &lt;a href="http://bugzilla.mozilla.org/show_bug.cgi?id=11050"&gt;massive database overhauls&lt;/a&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="filters"&gt;Filter backends:&lt;/h4&gt;
Possibility:  5&lt;br /&gt;
Difficulty:   7&lt;br /&gt;
Desirability: 7
&lt;p&gt;
My rationale for the desirability here is similar to searching, but it's slightly lower since I think that filtering is slightly less useful than searching (then again, my modus operandi is typically a trigger-happy finger on the Delete key). Again, filters have had massive improvement in the course of Thunderbird 3 development, but I don't think they're quite where they should be.
&lt;/p&gt;&lt;p&gt;
What makes this more annoying than search in terms of ease of use and hacking is the fact that filters live in a neverland where the headers have not yet been added to the database, so some potentially useful operations (notably, thread lookup) won't work. Changing this would require one to audit carefully the entire new message process and handle the database manipulation in precise ways. Nevertheless, I think this stands out as one of the more likely things to be improved, since the APIs seem to be slowly moving this way as a result of the other new message bugs.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="abtype"&gt;Creating new address book types:&lt;/h4&gt;
Possibility:  6&lt;br /&gt;
Difficulty:   2&lt;br /&gt;
Desirability: 5
&lt;p&gt;
This was an idea that I was once a very big proponent of; in recent months, though, I've tempered my enthusiasm for this. Unlike accounts, I don't think there's a lot of reason to prefer a new address book type over synchronization. There is also the possibility of faking a new address book type by selecting an address book to be automatically and zealously synchronized.
&lt;/p&gt;&lt;p&gt;
At this point, it is quite possible to create a new type&amp;mdash;provided you write it in C++, you perform some internal sleights of hand (it's a MAPI address book! I swear! Or maybe "none," which (oddly enough) also seems to work&amp;hellip;), and you remember to make sure it's an RDF resource. In fact, getting a very rudimentary, read-only &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/ab_rewrite/rev/d377809a4498"&gt;SQL address book implementation&lt;/a&gt; to work took only a few hours. The remaining internal work, to my knowledge, is merely to finish eradicating RDF, overhaul &lt;tt&gt;nsDirPrefs&lt;/tt&gt;, and stop assuming the presence of &lt;tt&gt;nsIAddrDatabase&lt;/tt&gt; in so many places. I'm not including the related address book API overhauls in the difficulty metric, although it seems the hardest part there is getting people to review the changes.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="msgstore"&gt;Message storage:&lt;/h4&gt;
Possibility:  2&lt;br /&gt;
Difficulty:   9&lt;br /&gt;
Desirability: 5
&lt;p&gt;
Message storage is the ability to specify other ways to store messages. Predominantly this means the storage for local folders and offline storage, although one could theoretically consider the remote storage of new account types to fall under this, depending on how the API is designed. Simply put, right now this is really difficult to do in an extension: for all of mailnews's heavy usage of XPCOM, this is one area where the consuming classes create instances directly via &lt;a href="http://mxr.mozilla.org/comm-central/search?string=+%3D+new&amp;find=mailnews%2Fdb%2Fmsgdb&amp;findi=&amp;filter=^[^\0]*%24&amp;hitlimit=&amp;tree=comm-central"&gt;&lt;tt&gt;new&lt;/tt&gt;&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
This is probably the most difficult of the facets to make extensible, even more so than the ability to make new account types. Why? First you have to create a new API, and then make everything that does offline storage or local folders use it. Next, you have to consider the optimizations made under the hood to speed up local copies and moves, and work out how to preserve them with a more general API. Finally, you have to set up the UI. Flame wars over what to make the default may be involved as well.
&lt;/p&gt;&lt;p&gt;
With all that said, I don't think it's particularly desirable. For all of mbox's foibles, I'm not convinced that the proposed replacements are appreciably better. Still, a well-designed API for this feature would also make creating new account types easier, as it would likely simplify the &lt;tt&gt;nsIMsgFolder&lt;/tt&gt; interface.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="metadata"&gt;Metadata storage:&lt;/h4&gt;
Possibility:  2&lt;br /&gt;
Difficulty:   8&lt;br /&gt;
Desirability: 1
&lt;p&gt;
This facet of extensibility is very similar to what I call message storage, except this is concerned with the ability to replace the database storage format itself. Creating an extension to do this now is at least as hard as creating one that replaces the message storage, I think. On the other hand, the core work needed to make it easier to do seems to mostly involve creating a few new interfaces and a large automated rewrite (depending on how much you want to depart from the current mork interface). Switching between databases would probably be largely lossy, though.
&lt;/p&gt;&lt;p&gt;
Of all the ways to extend Thunderbird, I think this is one of the least interesting. Separate from message storage (which tends to imply a need to rework metadata as well), there is little value in being able to create new storage backends for metadata. That is not to say that this area couldn't use an entirely new format; I just think that the work needed to make it more pluggable is less desirable. I put it in here mostly to bring up for discussion &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=11050#c29"&gt;an approach mentioned in bug 11050&lt;/a&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;div&gt;
&lt;h4 id="accttype"&gt;Creating new account types:&lt;/h4&gt;
Possibility:   3&lt;br /&gt;
Difficulty:    9&lt;br /&gt;
Desirability: 10&lt;br /&gt;
&lt;p&gt;
Last, but not least, is what I would consider to be the holy grail of extensibility, to be able to create new account types. While I have several reasons for considering this so important, the best reason in my mind is that this, more than any other facet of extensibility, would be able to best showcase the ingenuity of the world at large.
&lt;/p&gt;&lt;p&gt;
Assigning possibility and difficulty numbers here is hard, as something like this has not really been fully attempted in recent years. Making a server of the ilk of POP, movemail, or RSS (one whose mail is downloaded and stored locally) is probably quite feasible and not too difficult; making one in the format of IMAP or NNTP appears to have been attempted only once, and in &lt;a href="http://webmail.mozdev.org/"&gt;that case&lt;/a&gt;, the author just ended up writing an IMAP server to bridge the two ends. I myself have tried more than once to create a working prototype; my furthest effort got to the downloading mail stage before I lost the will to continue.
&lt;/p&gt;&lt;p&gt;
As far as I know, there is nothing absolutely stopping somebody from just implementing this, at least in C++. That said, truly exotic account types (instant messaging or social networking are the common examples here) would probably require large contortions to work right, and the frontend is littered with manual checks for specific account types that would probably have to be overridden or made more generic. One large class of possible account types is hampered by the difficulty of turning HTML text into a DOM tree for easier manipulation (i.e., screenscraping). I am therefore assigning rather pessimistic values for possibility and difficulty, since implementation currently requires brute force, and the true difficulty of making this easier can only be ascertained with a prototype implementation.
&lt;/p&gt;&lt;p&gt;
On top of that, though, there is the issue that many account types would also like to be able to send stuff via the compose window. That part is probably the single hardest part, as compose cannot fathom the existence of anything other than SMTP or NNTP. One would therefore have to create new APIs for extensions, ramrod support for these into compose, and probably fix up &lt;tt&gt;nsIMsgIdentity&lt;/tt&gt; to not be so mail-centric. Right now, the easiest way to do it is probably to make an SMTP server that acts as a middleman. I doubt a good way to do this can be finished before 3.next, though.
&lt;/p&gt;&lt;p&gt;
With that said, of all the facets I mention, the ability to create new account types is certainly the one I will put the most effort into personally. I have a goal of producing a usable extension in this regard, implemented entirely in JS, by the first beta, and this is my highest-priority 3.next goal.
&lt;/p&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-1012262860421265715?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/1012262860421265715/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=1012262860421265715' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1012262860421265715'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1012262860421265715'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/11/thunderbird-extensibility.html' title='Thunderbird extensibility'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6853061368590866162</id><published>2009-08-22T18:40:00.002-04:00</published><updated>2009-08-22T18:44:01.313-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pork'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>A guide to pork, part 5</title><content type='html'>In the previous two sections of my guide, I discussed &lt;a href="http://quetzalcoatal.blogspot.com/2009/08/guide-to-pork-part-3.html"&gt;basics of functions&lt;/a&gt; and then went into &lt;a href="http://quetzalcoatal.blogspot.com/2009/08/guide-to-pork-part-4.html"&gt;the specifics of types and names&lt;/a&gt;, as 
represented by the Elsa AST nodes. In this section, I will cover classes and some more errata on declarations. As I have mentioned earlier, the subsections of this third step will be visited out of the order of their numbering.
&lt;/p&gt;&lt;p&gt;
What has been covered so far:&lt;br /&gt;
&lt;b&gt;Step 1&lt;/b&gt;: Building and running your tool&lt;br /&gt;
&lt;b&gt;Step 1.1&lt;/b&gt;: Running the patcher&lt;br /&gt;
&lt;b&gt;Step 2&lt;/b&gt;: Using the patcher&lt;br /&gt;
&lt;b&gt;Step 3&lt;/b&gt;: The structure of the Elsa AST&lt;br /&gt;
&lt;b&gt;Step 3.1&lt;/b&gt;: &lt;i&gt;Declarations and other top-level-esque fun&lt;/i&gt;&lt;br /&gt;
&lt;b&gt;Step 3.1.3&lt;/b&gt;: Function&lt;br /&gt;
&lt;b&gt;Step 3.1.5&lt;/b&gt;: Declaration&lt;br /&gt;
&lt;b&gt;Step 3.1.6&lt;/b&gt;: TypeSpecifier&lt;br /&gt;
&lt;b&gt;Step 3.1.8&lt;/b&gt;: Enumerator&lt;br /&gt;
&lt;b&gt;Step 3.1.11&lt;/b&gt;: Declarator&lt;br /&gt;
&lt;b&gt;Step 3.1.12&lt;/b&gt;: IDeclarator&lt;br /&gt;
&lt;b&gt;Step 3.4&lt;/b&gt;: &lt;i&gt;The AST objects that aren't classes&lt;/i&gt;&lt;br /&gt;
&lt;b&gt;Step 3.4.4&lt;/b&gt;: CVFlags&lt;br /&gt;
&lt;b&gt;Step 3.4.5&lt;/b&gt;: DeclFlags&lt;br /&gt;
&lt;b&gt;Step 3.4.8&lt;/b&gt;: PQName&lt;br /&gt;
&lt;b&gt;Step 3.4.9&lt;/b&gt;: SimpleTypeId&lt;br /&gt;
&lt;b&gt;Step 3.4.11&lt;/b&gt;: TypeIntr&lt;br /&gt;
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.9: &lt;tt&gt;MemberList&lt;/tt&gt; (inside a class)&lt;/h3&gt;
&lt;tt&gt;MemberList&lt;/tt&gt; has only a single variable: &lt;tt&gt;ASTList&amp;lt;Member&amp;gt; list&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.10: &lt;tt&gt;Member&lt;/tt&gt; (part of a class)&lt;/h3&gt;
&lt;tt&gt;Member&lt;/tt&gt; nodes represent a member of a class type. Like several other nodes, it is mostly represented by its subclasses, &lt;tt&gt;MR_decl&lt;/tt&gt;, &lt;tt&gt;MR_func&lt;/tt&gt;, &lt;tt&gt;MR_access&lt;/tt&gt;, &lt;tt&gt;MR_usingDecl&lt;/tt&gt;, and &lt;tt&gt;MR_template&lt;/tt&gt;. The node itself has two members, &lt;tt&gt;SourceLoc loc&lt;/tt&gt; and &lt;tt&gt;SourceLoc endloc&lt;/tt&gt;. The end location is the location just after the semicolon or closing brace, so the member's text spans the half-open range &lt;tt&gt;[loc, endloc)&lt;/tt&gt;; if you recall, this is the same convention that the patcher uses for ranges of locations.
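&lt;/p&gt;&lt;p&gt;
To make the half-open convention concrete, here is a small Python sketch. It assumes, purely for illustration, that locations are plain character offsets into the preprocessed text. Real &lt;tt&gt;SourceLoc&lt;/tt&gt; values are opaque, and the conversion is not shown. The helper &lt;tt&gt;member_text&lt;/tt&gt; is hypothetical.
&lt;/p&gt;&lt;pre&gt;
```python
def member_text(source: str, loc: int, endloc: int) -> str:
    """Return the source text covered by the half-open range [loc, endloc).

    Hypothetical helper: loc and endloc are treated as plain character
    offsets, which real Elsa SourceLoc values are not.
    """
    if not (len(source) >= endloc >= loc >= 0):
        raise ValueError("range out of bounds")
    # The character at loc is included; the one at endloc is not, so endloc
    # is the offset just past the member's ';' or '}'.
    return source[loc:endloc]


text = "class A { int x; void f() {} };"
start = text.index("int")
end = text.index(";") + 1  # one past the semicolon
print(member_text(text, start, end))  # int x;
```
&lt;/pre&gt;&lt;p&gt;
Because the range is half-open, &lt;tt&gt;endloc - loc&lt;/tt&gt; is the length of the member's text, and an empty range needs no special casing.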
&lt;/p&gt;&lt;p&gt;
Each subclass of &lt;tt&gt;Member&lt;/tt&gt; adds only one variable, which holds the construct that the member represents. &lt;tt&gt;MR_decl&lt;/tt&gt; adds &lt;tt&gt;Declaration d&lt;/tt&gt;, so it is used for all declarations within the class, be it a variable declaration like &lt;tt&gt;int x&lt;/tt&gt;, a type definition, or a function without a body. &lt;tt&gt;MR_func&lt;/tt&gt; adds &lt;tt&gt;Function f&lt;/tt&gt;, so it represents all functions that have their body in the class declaration (&lt;tt&gt;A();&lt;/tt&gt; is a declaration, but &lt;tt&gt;A() {}&lt;/tt&gt; is a function). &lt;tt&gt;MR_usingDecl&lt;/tt&gt; has as its member &lt;tt&gt;ND_usingDecl decl&lt;/tt&gt; (which is covered under the section &lt;tt&gt;NamespaceDecl&lt;/tt&gt;), and &lt;tt&gt;MR_template&lt;/tt&gt; uses &lt;tt&gt;TemplateDeclaration d&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;
The final subclass of &lt;tt&gt;Member&lt;/tt&gt; is &lt;tt&gt;MR_access&lt;/tt&gt;, whose member is &lt;tt&gt;AccessKeyword k&lt;/tt&gt;. This node represents access labels like &lt;tt&gt;private:&lt;/tt&gt;. Since the access information is stored in separate AST nodes, you may be wondering how to retrieve it given only a specific member. The answer, naturally, lies in the auxiliary classes to the AST, which I have so far avoided mentioning. Some nodes provide access to a &lt;tt&gt;Variable&lt;/tt&gt; member, one of whose methods retrieves the member's access. I will discuss this in more detail when I cover that class.
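&lt;/p&gt;&lt;p&gt;
The real route to a member's access goes through &lt;tt&gt;Variable&lt;/tt&gt;. Purely to illustrate how the separate &lt;tt&gt;MR_access&lt;/tt&gt; nodes relate to the members around them, here is a Python sketch of the alternative. It walks the member list in order and remembers the most recent access label, starting from the C++ defaults (private for &lt;tt&gt;class&lt;/tt&gt;, public for &lt;tt&gt;struct&lt;/tt&gt; and &lt;tt&gt;union&lt;/tt&gt;). The &lt;tt&gt;(kind, value)&lt;/tt&gt; tuples and the &lt;tt&gt;member_access&lt;/tt&gt; helper are hypothetical stand-ins, not Elsa's API.
&lt;/p&gt;&lt;pre&gt;
```python
from enum import Enum


class AccessKeyword(Enum):
    # Names follow Elsa's AK_* values; the Python values are arbitrary.
    AK_PUBLIC = "public"
    AK_PROTECTED = "protected"
    AK_PRIVATE = "private"


def member_access(class_key, members):
    """Return a dict mapping each member name to its access level.

    Hypothetical encoding: each entry of members is either
    ("access", AccessKeyword) for an MR_access node, or ("member", name)
    for any other kind of member.
    """
    # C++ defaults: class members are private, struct/union members public.
    if class_key == "class":
        current = AccessKeyword.AK_PRIVATE
    else:
        current = AccessKeyword.AK_PUBLIC
    access = {}
    for kind, value in members:
        if kind == "access":
            # A label applies to every following member until the next label.
            current = value
        else:
            access[value] = current
    return access


body = [
    ("member", "secret"),
    ("access", AccessKeyword.AK_PUBLIC),
    ("member", "get"),
    ("access", AccessKeyword.AK_PROTECTED),
    ("member", "hook"),
]
for name, level in member_access("class", body).items():
    print(name, level.value)
```
&lt;/pre&gt;&lt;p&gt;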
&lt;/p&gt;&lt;p&gt;
A final thing to note is that, by the time your visitor runs, Elsa will have added some nodes to the AST: the implicit member functions dictated by the C++ standard. You can tell that a member is implicit if its &lt;tt&gt;DeclFlags&lt;/tt&gt; contain &lt;tt&gt;DF_IMPLICIT&lt;/tt&gt;; these members also have the &lt;tt&gt;DF_MEMBER&lt;/tt&gt; flag set.
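&lt;/p&gt;&lt;p&gt;
A tool that patches source usually wants to skip these compiler-supplied members, since they have no text to patch. The Python sketch below models &lt;tt&gt;DeclFlags&lt;/tt&gt; as a flag enum. Only the flag names come from Elsa; the numeric values, the &lt;tt&gt;(name, flags)&lt;/tt&gt; pairs, and the &lt;tt&gt;user_written&lt;/tt&gt; helper are hypothetical.
&lt;/p&gt;&lt;pre&gt;
```python
from enum import IntFlag


class DeclFlags(IntFlag):
    # Hypothetical values; only the names mirror Elsa's DeclFlags.
    DF_NONE = 0
    DF_MEMBER = 1
    DF_IMPLICIT = 2


def user_written(members):
    """Keep only the members the programmer actually wrote.

    members is a hypothetical list of (name, DeclFlags) pairs.
    """
    return [name for name, flags in members
            if DeclFlags.DF_IMPLICIT not in flags]


members = [
    ("A", DeclFlags.DF_MEMBER | DeclFlags.DF_IMPLICIT),  # implicit constructor
    ("operator=", DeclFlags.DF_MEMBER | DeclFlags.DF_IMPLICIT),
    ("x", DeclFlags.DF_MEMBER),
]
print(user_written(members))  # ['x']
```
&lt;/pre&gt;&lt;p&gt;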
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.7: &lt;tt&gt;BaseClassSpec&lt;/tt&gt; (extending classes)&lt;/h3&gt;
&lt;tt&gt;BaseClassSpec&lt;/tt&gt; nodes represent a superclass for a class. It has three variables, &lt;tt&gt;bool isVirtual&lt;/tt&gt;, &lt;tt&gt;AccessKeyword access&lt;/tt&gt;, and &lt;tt&gt;PQName name&lt;/tt&gt;, all of which are self-explanatory.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.4.1: &lt;tt&gt;AccessKeyword&lt;/tt&gt; (controlling access)&lt;/h3&gt;
&lt;tt&gt;AccessKeyword&lt;/tt&gt; is an enumerated type with three important members: &lt;tt&gt;AK_PUBLIC&lt;/tt&gt;, &lt;tt&gt;AK_PROTECTED&lt;/tt&gt;, and &lt;tt&gt;AK_PRIVATE&lt;/tt&gt;, all of which represent what you think they represent. There is also an &lt;tt&gt;AK_UNSPECIFIED&lt;/tt&gt; member, but that should not be present by the time you get to the AST nodes. Naturally, there is also a &lt;tt&gt;const char *toString(AccessKeyword)&lt;/tt&gt; function for converting these values into strings.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.4.7: &lt;tt&gt;OperatorName&lt;/tt&gt; (operator overloading)&lt;/h3&gt;
These nodes are informational nodes about operators. You should only find them within the &lt;tt&gt;PQ_operator&lt;/tt&gt; type. The class &lt;tt&gt;OperatorName&lt;/tt&gt; only has a single method &lt;tt&gt;const char *getOperatorName()&lt;/tt&gt;. This is the basis for the &lt;tt&gt;PQ_operator&lt;/tt&gt; name strings, so its results are as mentioned there.
&lt;/p&gt;&lt;p&gt;
The first subclass, &lt;tt&gt;ON_newDel&lt;/tt&gt;, represents the memory operator overloads. It has two members, &lt;tt&gt;bool isNew&lt;/tt&gt; and &lt;tt&gt;bool isArray&lt;/tt&gt;. The first distinguishes &lt;tt&gt;operator new&lt;/tt&gt; from &lt;tt&gt;operator delete&lt;/tt&gt;; the second distinguishes the scalar forms from the array forms, such as &lt;tt&gt;operator new&lt;/tt&gt; from &lt;tt&gt;operator new[]&lt;/tt&gt;.
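&lt;/p&gt;&lt;p&gt;
The two booleans combine into four operators. This hypothetical Python helper is built only from the two flags described above and spells out each combination:
&lt;/p&gt;&lt;pre&gt;
```python
def new_del_name(is_new: bool, is_array: bool) -> str:
    """Spell the operator that an ON_newDel node stands for.

    Hypothetical helper modeled on the isNew/isArray members.
    """
    base = "operator new" if is_new else "operator delete"
    # The array forms simply append [] to the scalar spelling.
    return base + "[]" if is_array else base


for is_new in (True, False):
    for is_array in (False, True):
        print(new_del_name(is_new, is_array))
```
&lt;/pre&gt;&lt;p&gt;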
&lt;/p&gt;&lt;p&gt;
The second subclass, &lt;tt&gt;ON_operator&lt;/tt&gt;, represents the standard operator overloads. It has only one member, &lt;tt&gt;OverloadableOp op&lt;/tt&gt;, which represents the operator being overloaded. The names in the &lt;tt&gt;OverloadableOp&lt;/tt&gt; enum all begin with &lt;tt&gt;OP_&lt;/tt&gt; and can be idiosyncratic. Examples are &lt;tt&gt;OP_NOT&lt;/tt&gt;, &lt;tt&gt;OP_BITNOT&lt;/tt&gt;, &lt;tt&gt;OP_PLUSPLUS&lt;/tt&gt;, &lt;tt&gt;OP_STAR&lt;/tt&gt;, &lt;tt&gt;OP_AMPERSAND&lt;/tt&gt;, &lt;tt&gt;OP_DIV&lt;/tt&gt;, &lt;tt&gt;OP_LSHIFT&lt;/tt&gt;, &lt;tt&gt;OP_ASSIGN&lt;/tt&gt;, &lt;tt&gt;OP_MULTEQ&lt;/tt&gt;, &lt;tt&gt;OP_GREATEREQ&lt;/tt&gt;, &lt;tt&gt;OP_AND&lt;/tt&gt;, &lt;tt&gt;OP_ARROW_STAR&lt;/tt&gt;, &lt;tt&gt;OP_BRACKETS&lt;/tt&gt;, and &lt;tt&gt;OP_PARENS&lt;/tt&gt;. There are naturally more, but the other names should be derivable from this sample (pretty much all the idiosyncrasies are included in it); the full list is in &lt;tt&gt;cc_flags.h&lt;/tt&gt; if you need to see it. There is also the standard &lt;tt&gt;toString(OverloadableOp)&lt;/tt&gt; function if you are confused about a particular operator.
&lt;/p&gt;&lt;p&gt;
Note that some of the operators can be used in different ways. For example, &lt;tt&gt;OP_STAR&lt;/tt&gt; represents both the multiplication operator and the pointer dereference operator. The way to differentiate between the two is the number of arguments, keeping in mind that an operator declared as a class member has one fewer explicit parameter, since the object it is invoked on is the implicit first operand. The postfix increment and decrement operators are distinguished from the prefix forms by an extra &lt;tt&gt;int&lt;/tt&gt; parameter, which is always 0 unless you call the function explicitly.
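&lt;/p&gt;&lt;p&gt;
The counting rule can be sketched as follows. This is a Python model of my own, not part of Elsa. It takes an &lt;tt&gt;OverloadableOp&lt;/tt&gt; name, the number of declared parameters, and whether the operator is a class member, and it covers only the ambiguous names mentioned above.
&lt;/p&gt;&lt;pre&gt;
```python
# (unary meaning, binary meaning) for OverloadableOp names that cover two
# operators; only names mentioned in the guide are included.
UNARY_BINARY = {
    "OP_STAR": ("dereference", "multiplication"),
    "OP_AMPERSAND": ("address-of", "bitwise and"),
    # For ++, the "binary" form is postfix: its second operand is the
    # dummy int parameter.
    "OP_PLUSPLUS": ("prefix increment", "postfix increment"),
}


def operator_meaning(op: str, param_count: int, is_member: bool) -> str:
    """Disambiguate an overloaded operator by counting its operands.

    param_count is the number of declared parameters. A member operator
    has one extra, implicit operand: the object it is called on.
    """
    operands = param_count + (1 if is_member else 0)
    if operands not in (1, 2):
        raise ValueError(f"{op} cannot take {operands} operands")
    unary, binary = UNARY_BINARY[op]
    return unary if operands == 1 else binary


print(operator_meaning("OP_STAR", 0, True))       # member: T operator*()
print(operator_meaning("OP_STAR", 2, False))      # free: T operator*(T, T)
print(operator_meaning("OP_PLUSPLUS", 1, True))   # member: T operator++(int)
```
&lt;/pre&gt;&lt;p&gt;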
&lt;/p&gt;&lt;p&gt;
The final subclass is the type conversion operator, &lt;tt&gt;ON_conversion&lt;/tt&gt;. This contains a single member, &lt;tt&gt;ASTTypeId type&lt;/tt&gt;. The member type will have a terminal &lt;tt&gt;D_name&lt;/tt&gt; in its declaration with a null name; the main purpose of the declaration under the &lt;tt&gt;ASTTypeId&lt;/tt&gt; is to capture the pointers.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.4.2: &lt;tt&gt;ASTTypeId&lt;/tt&gt; (a less powerful version of &lt;tt&gt;Declaration&lt;/tt&gt;)&lt;/h3&gt;
&lt;tt&gt;ASTTypeId&lt;/tt&gt; is modelled after the &lt;tt&gt;Declaration&lt;/tt&gt; node, but it's used in places where multiple declarations are not usable. Indeed, its most common usage is to represent a type (nominally &lt;tt&gt;TypeSpecifier&lt;/tt&gt;) that can have pointers or references. It has two members, &lt;tt&gt;TypeSpecifier spec&lt;/tt&gt; and &lt;tt&gt;Declarator decl&lt;/tt&gt;, both of which act as their analogues in &lt;tt&gt;Declaration&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.4: &lt;tt&gt;MemberInit&lt;/tt&gt; (simple constructors)&lt;/h3&gt;
When working with constructors, the initialization of members is treated separately from the rest of the constructor. In Elsa, the nodes where this happens are the &lt;tt&gt;MemberInit&lt;/tt&gt; nodes. These nodes contain a few members:
&lt;/p&gt;&lt;dl class="AST-def"&gt;
&lt;dt&gt;SourceLoc loc&lt;/dt&gt;
&lt;dt&gt;SourceLoc endloc&lt;/dt&gt;
&lt;dt&gt;PQName *name&lt;/dt&gt;
&lt;dt&gt;FakeList&amp;lt;ArgExpression&amp;gt; *args&lt;/dt&gt;
&lt;/dl&gt;&lt;p&gt;
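The source and end locations have their standard meanings. &lt;tt&gt;name&lt;/tt&gt; is the name of the member being initialized, and &lt;tt&gt;args&lt;/tt&gt; holds the arguments of the function-call-like syntax; in &lt;tt&gt;A() : x(0) {}&lt;/tt&gt;, for instance, the name is &lt;tt&gt;x&lt;/tt&gt; and the sole argument is &lt;tt&gt;0&lt;/tt&gt;.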
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.14: &lt;tt&gt;Initializer&lt;/tt&gt; (the last part of declarations)&lt;/h3&gt;
The &lt;tt&gt;Initializer&lt;/tt&gt; nodes represent various ways to initialize an object. The class itself has a single member, &lt;tt&gt;SourceLoc loc&lt;/tt&gt;, but it has three subclasses, each representing some form of initialization.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;IN_expr&lt;/tt&gt; represents the standard forms of initialization people are used to seeing, something along the lines of &lt;tt&gt;int x = 3;&lt;/tt&gt;. These nodes have a single member in addition, the &lt;tt&gt;Expression e&lt;/tt&gt; member which represents the expression initializing the declaration.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;IN_ctor&lt;/tt&gt; represents the constructor-like initialization forms, such as &lt;tt&gt; int x(3);&lt;/tt&gt;. These have a single member, &lt;tt&gt;FakeList&amp;lt;ArgExpression&amp;gt; *args&lt;/tt&gt;, which represents the arguments within the parentheses.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;IN_compound&lt;/tt&gt; is the final form, representing brace-enclosed initialization of structs or arrays, such as &lt;tt&gt;int x[1] = { 0 };&lt;/tt&gt;. It, too, has a single member, &lt;tt&gt;ASTList&amp;lt;Initializer&amp;gt; inits&lt;/tt&gt;, which is the list of initializers within the braces. A word of caution, though: aggregate initialization can have unexpected results. Multidimensional arrays need not have nested braces, and, in C++0x (and in gcc for a long time, albeit with a warning), you can also omit the braces for nested structures. Unnamed bit-fields and static data members are skipped, and if there are fewer initializers than elements, the rest are "value-initialized" (i.e., the equivalent of 0). Unfortunately, Elsa does not help you any further in deducing which element is actually initialized by any given initializer.
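&lt;/p&gt;&lt;p&gt;
Since Elsa leaves this bookkeeping to you, here is a Python sketch of what it involves for the simplest case. It flattens an initializer for a multidimensional &lt;tt&gt;int&lt;/tt&gt; array, applying brace elision and zero-filling missing elements. It is a simplified model of the C++ rules, not anything Elsa provides. The helper names are hypothetical, and structs, strings, and braces around scalars are deliberately not modeled.
&lt;/p&gt;&lt;pre&gt;
```python
import math


def init_braced(dims, items):
    """Flatten a braced initializer for a multidimensional int array.

    dims is the array shape, e.g. (2, 3) for int m[2][3]; items is the
    braced list as nested Python lists of ints. Returns the elements in
    row-major order.
    """
    values, used = _init_elements(dims, items, 0)
    if used != len(items):
        raise ValueError("too many initializers")
    return values


def _init_elements(dims, items, i):
    # Initialize one aggregate of shape dims from items[i:], returning the
    # flattened values and the index of the first unused initializer.
    count, rest = dims[0], dims[1:]
    values = []
    for _ in range(count):
        if i == len(items):
            # Out of initializers: remaining elements are value-initialized.
            values.extend([0] * math.prod(rest))
        elif isinstance(items[i], list):
            if not rest:
                raise ValueError("braces around a scalar are not modeled")
            # A nested braced list initializes exactly one sub-aggregate.
            values.extend(init_braced(rest, items[i]))
            i += 1
        elif rest:
            # Brace elision: the sub-aggregate takes as many initializers
            # as it needs from the enclosing list.
            sub, i = _init_elements(rest, items, i)
            values.extend(sub)
        else:
            values.append(items[i])
            i += 1
    return values, i


print(init_braced((2, 3), [1, 2, 3, 4]))    # int m[2][3] = {1, 2, 3, 4};
print(init_braced((2, 3), [[1], [4, 5]]))   # int m[2][3] = {{1}, {4, 5}};
```
&lt;/pre&gt;&lt;p&gt;
The two calls show why the braces alone do not tell you which element an initializer lands in: the &lt;tt&gt;4&lt;/tt&gt; goes to &lt;tt&gt;m[1][0]&lt;/tt&gt; in both cases, but for different reasons.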
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.13: &lt;tt&gt;ExceptionSpec&lt;/tt&gt; (saying what you may throw)&lt;/h3&gt;
&lt;tt&gt;ExceptionSpec&lt;/tt&gt; nodes correspond to the &lt;tt&gt;throw&lt;/tt&gt; clauses on function declarations. These nodes contain only one member, &lt;tt&gt;FakeList&amp;lt;ASTTypeId&amp;gt; *types&lt;/tt&gt;, which lists the types that the function is declared to be able to throw.
&lt;/p&gt;&lt;p&gt;
That is all I have for this part of the pork guide. Part 6 should finish up sections 3.1 and 3.4, so I expect part 7 to start discussing statements and expressions. I will probably defer the auxiliary API until around part 9 or so, as I really need to play with it some more first.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6853061368590866162?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6853061368590866162/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6853061368590866162' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6853061368590866162'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6853061368590866162'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/08/guide-to-pork-part-5.html' title='A guide to pork, part 5'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-1153590540630888675</id><published>2009-08-17T10:38:00.002-04:00</published><updated>2009-08-17T11:07:14.443-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pork'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>A guide to pork, part 4</title><content type='html'>The &lt;a
href="http://quetzalcoatal.blogspot.com/2009/08/guide-to-pork-part-3.html"&gt;last portion&lt;/a&gt; of the guide started covering declarations. This week, I will be covering a lot more about declarations. In particular, types and names are covered in a lot more detail. I had intended to talk about classes in more detail as well, but the post was getting long enough as it was, so I'll save discussion for a fifth part.
&lt;/p&gt;&lt;p&gt;
What has been covered so far:&lt;br /&gt;
&lt;b&gt;Step 1&lt;/b&gt;: Building and running your tool&lt;br /&gt;
&lt;b&gt;Step 1.1&lt;/b&gt;: Running the patcher&lt;br /&gt;
&lt;b&gt;Step 2&lt;/b&gt;: Using the patcher&lt;br /&gt;
&lt;b&gt;Step 3&lt;/b&gt;: The structure of the Elsa AST&lt;br /&gt;
&lt;b&gt;Step 3.1&lt;/b&gt;: &lt;i&gt;Declarations and other top-level-esque fun&lt;/i&gt;&lt;br /&gt;
&lt;b&gt;Step 3.1.3&lt;/b&gt;: Function&lt;br /&gt;
&lt;b&gt;Step 3.1.6&lt;/b&gt;: TypeSpecifier&lt;br /&gt;
&lt;b&gt;Step 3.1.11&lt;/b&gt;: Declarator&lt;br /&gt;
&lt;b&gt;Step 3.1.12&lt;/b&gt;: IDeclarator&lt;br /&gt;
&lt;b&gt;Step 3.4&lt;/b&gt;: &lt;i&gt;The AST objects that aren't classes&lt;/i&gt;&lt;br /&gt;
&lt;b&gt;Step 3.4.5&lt;/b&gt;: DeclFlags&lt;br /&gt;
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Aside 1: An introduction to porky (continued)&lt;/h3&gt;
It seems that Chris Jones finally &lt;a href="http://blog.mozilla.com/cjones/2009/08/04/introducing-porky-py-low-fat-pork/"&gt;blogged about porky&lt;/a&gt;. If you're interested, go read about it.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Aside 2: Pork Web&lt;/h3&gt;
In the course of writing this guide, I got the idea of writing a tool to display the Elsa AST nodes without having to constantly fidget around with dumpAST. The result is &lt;a href="http://www.tjhsst.edu/~jcranmer/static/webpork/"&gt;Pork Web&lt;/a&gt;, which also demonstrates how far a little CSS will get you.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.6: &lt;tt&gt;TypeSpecifier&lt;/tt&gt; (continued)&lt;/h3&gt;
In the last article, I mentioned &lt;tt&gt;TypeSpecifier&lt;/tt&gt; but skipped the details of its subclasses, which hold the interesting information, because I had misunderstood some key pieces of it.
&lt;/p&gt;&lt;p&gt;
For projects that are sufficiently large to be considered good candidates for automated rewriting, chances are that basic types like &lt;tt&gt;int&lt;/tt&gt; are going to be rather rare, in favor of typedefs that give more precise storage sizes (such as mozilla's &lt;tt&gt;PRInt32&lt;/tt&gt;). The parsing of the AST in Elsa and pork happens at a different stage from the type verification, which means that typedefs have an impact on the structure of nodes. That is not to say that you can't get type information; it just means that you want to use Elsa's type information (embodied in &lt;tt&gt;Variable&lt;/tt&gt;) for more accuracy here. Naturally, &lt;tt&gt;#define&lt;/tt&gt; has no impact on type information, because we are dealing with preprocessed files.
&lt;/p&gt;&lt;p&gt;
Which subclass of &lt;tt&gt;TypeSpecifier&lt;/tt&gt; is used depends on the form of the specifier. If you are using a standard type keyword like &lt;tt&gt;int&lt;/tt&gt;, you get the &lt;tt&gt;TS_simple&lt;/tt&gt; flavor, which I covered last week. Types parsed as classes in C++ (i.e., &lt;tt&gt;class&lt;/tt&gt;, &lt;tt&gt;struct&lt;/tt&gt;, and &lt;tt&gt;union&lt;/tt&gt;) are all &lt;tt&gt;TS_classSpec&lt;/tt&gt; nodes; enums form &lt;tt&gt;TS_enumSpec&lt;/tt&gt; nodes. Class nodes that do not provide an actual definition are instead classified as &lt;tt&gt;TS_elaborated&lt;/tt&gt; nodes. If all you have is a simple name, then the node is a &lt;tt&gt;TS_name&lt;/tt&gt; node, regardless of whether that type is a class, enum, or other such type. Names will never be null; for anonymous constructs like &lt;tt&gt;enum {a} x;&lt;/tt&gt;, a unique string beginning with &lt;tt&gt;__&lt;/tt&gt; will be used instead.
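&lt;/p&gt;&lt;p&gt;
As a rough summary of these rules, here is a hypothetical Python function that guesses the subclass from the first token of the specifier and whether a body follows. Elsa's parser makes the real decision, and the token sets are my own. Treating an enum without a body (&lt;tt&gt;enum E e;&lt;/tt&gt;) like the class case is an assumption, based on &lt;tt&gt;TypeIntr&lt;/tt&gt; having a &lt;tt&gt;TI_ENUM&lt;/tt&gt; member.
&lt;/p&gt;&lt;pre&gt;
```python
# Keywords that produce a TS_simple specifier.
SIMPLE_KEYWORDS = {
    "char", "bool", "int", "long", "short", "signed", "unsigned",
    "wchar_t", "float", "double", "void",
}
CLASS_KEYS = {"class", "struct", "union"}


def type_specifier_kind(first_token: str, has_body: bool) -> str:
    """Guess which TypeSpecifier subclass Elsa would build."""
    if first_token in SIMPLE_KEYWORDS:
        return "TS_simple"
    if first_token in CLASS_KEYS:
        # A class-key followed by a { ... } body is a definition.
        return "TS_classSpec" if has_body else "TS_elaborated"
    if first_token == "enum":
        return "TS_enumSpec" if has_body else "TS_elaborated"
    # Anything else is a name: a typedef such as PRInt32, a class, an enum.
    return "TS_name"


for decl, token, body in [
    ("int x;", "int", False),
    ("PRInt32 y;", "PRInt32", False),
    ("struct S { int a; } s;", "struct", True),
    ("class A *p;", "class", False),
    ("enum { a } e;", "enum", True),
]:
    print(decl, "=", type_specifier_kind(token, body))
```
&lt;/pre&gt;&lt;p&gt;
Note how the &lt;tt&gt;PRInt32&lt;/tt&gt; case lands in &lt;tt&gt;TS_name&lt;/tt&gt;, which is exactly why typedef-heavy code needs Elsa's type information rather than the syntax alone.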
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;TS_name&lt;/tt&gt; has two variables: a &lt;tt&gt;PQName *name&lt;/tt&gt; and a &lt;tt&gt;bool typenameUsed&lt;/tt&gt;. Both are self-explanatory. For the curious, the latter is set when the name is elaborated with &lt;tt&gt;typename&lt;/tt&gt;, as in the following:&lt;br /&gt;
&lt;code&gt;template&amp;lt;class T&amp;gt; class Y { typename T::A a; };&lt;/code&gt;
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;TS_elaborated&lt;/tt&gt; again has two variables, the same &lt;tt&gt;PQName *name&lt;/tt&gt; variable, as well as a &lt;tt&gt;TypeIntr keyword&lt;/tt&gt; variable. The keyword variable is an explanation of which keyword was used as the elaboration.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;TS_enumSpec&lt;/tt&gt; again has two variables, this time a &lt;tt&gt;StringRef /*(const char *)*/ name&lt;/tt&gt; and a &lt;tt&gt;FakeList&amp;lt;Enumerator&amp;gt; elts&lt;/tt&gt;, which contains the elements of the enumeration.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;TS_classSpec&lt;/tt&gt; is the most complex of the subclasses, as it represents the definition of a class. It contains the same &lt;tt&gt;name&lt;/tt&gt; and &lt;tt&gt;keyword&lt;/tt&gt; variables as &lt;tt&gt;TS_elaborated&lt;/tt&gt;, but it also has the base classes in the form of a &lt;tt&gt;FakeList&amp;lt;BaseClassSpec&amp;gt; *bases&lt;/tt&gt; and its members in a &lt;tt&gt;MemberList *members&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.4.9: &lt;tt&gt;SimpleTypeId&lt;/tt&gt; (Primitives, if you come from Java)&lt;/h3&gt;
The &lt;tt&gt;SimpleTypeId&lt;/tt&gt; enum represents the primitive types defined by C++, namely &lt;tt&gt;char, bool, int, long, long long, short, wchar_t, float, double, and void&lt;/tt&gt;, as well as their signed and unsigned counterparts (where they exist). The name of each follows the general scheme &lt;tt&gt;ST_UNSIGNED_INT&lt;/tt&gt;, although &lt;tt&gt;short&lt;/tt&gt; and &lt;tt&gt;long&lt;/tt&gt; are &lt;tt&gt;ST_SHORT_INT&lt;/tt&gt; and &lt;tt&gt;ST_LONG_INT&lt;/tt&gt;, respectively (but not &lt;tt&gt;long long&lt;/tt&gt;!).
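&lt;/p&gt;&lt;p&gt;
The naming scheme is regular enough to write down. The following Python function is derived purely from the scheme just described, so treat its output as a guess and check &lt;tt&gt;cc_flags.h&lt;/tt&gt; for the authoritative names.
&lt;/p&gt;&lt;pre&gt;
```python
def simple_type_id_name(spelling: str) -> str:
    """Guess the SimpleTypeId enumerator for a primitive type's spelling."""
    words = spelling.split()
    # short and long gain an explicit INT, but long long does not.
    if words[-1] in ("short", "long") and words[-2:] != ["long", "long"]:
        words.append("int")
    return "ST_" + "_".join(word.upper() for word in words)


for spelling in ["unsigned int", "short", "unsigned long", "long long", "wchar_t"]:
    print(spelling, "->", simple_type_id_name(spelling))
```
&lt;/pre&gt;&lt;p&gt;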
&lt;/p&gt;&lt;p&gt;
That's not all, though. For simplicity, some places have fake type codes. The most common of these will be &lt;tt&gt;ST_ELLIPSIS&lt;/tt&gt;, the varargs portion of functions; there is also &lt;tt&gt;ST_CDTOR&lt;/tt&gt;, the return type for constructors and destructors. The source code also mentions GNU or C99 support for complex numbers, but I have not found the magic needed to get those to work.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.4.4: &lt;tt&gt;CVFlags&lt;/tt&gt; (CV-qualified IDs)&lt;/h3&gt;
Whenever something can be const or volatile, there is a &lt;tt&gt;CVFlags&lt;/tt&gt; enum. It can either be &lt;tt&gt;CV_NONE&lt;/tt&gt; (no qualifiers), &lt;tt&gt;CV_CONST&lt;/tt&gt;, &lt;tt&gt;CV_VOLATILE&lt;/tt&gt;, or both of the latter. There also exists a method &lt;tt&gt;sm::string toString(CVFlags cv)&lt;/tt&gt; that will print a string representation of such a variable. Need I say more?
&lt;/p&gt;&lt;p&gt;
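Since the qualifiers combine, &lt;tt&gt;CVFlags&lt;/tt&gt; works as a bit set. The following is a standalone sketch, not elsa's header: the enumerator values are invented, and &lt;tt&gt;hasFlag&lt;/tt&gt; is a hypothetical helper. It shows how qualifiers combine with &lt;tt&gt;|&lt;/tt&gt;, and one way to test for a qualifier: OR it in and check that nothing changed.
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```cpp
// Stand-in for elsa's CVFlags; these enumerator values are made up.
enum CVFlags { CV_NONE = 0, CV_CONST = 1, CV_VOLATILE = 2 };

// Combine qualifiers, e.g. CV_CONST | CV_VOLATILE for "const volatile".
constexpr CVFlags operator|(CVFlags a, CVFlags b) {
  return CVFlags(int(a) | int(b));
}

// Hypothetical helper: true if every bit of flag is already set in cv,
// i.e. OR-ing flag into cv leaves cv unchanged.
constexpr bool hasFlag(CVFlags cv, CVFlags flag) {
  return (cv | flag) == cv;
}

static_assert(hasFlag(CV_CONST | CV_VOLATILE, CV_CONST), "const is set");
static_assert(!hasFlag(CV_VOLATILE, CV_CONST), "const is not set");
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;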
&lt;h3&gt;Step 3.1.5: &lt;tt&gt;Declaration&lt;/tt&gt; (The outer part of declarations)&lt;/h3&gt;
Any time a variable is declared, one of the wrappers is &lt;tt&gt;Declaration&lt;/tt&gt; (which may itself be found in various places). This has just three members, a &lt;tt&gt;DeclFlags dflags&lt;/tt&gt; variable that represents the flags on the declaration, a &lt;tt&gt;TypeSpecifier *spec&lt;/tt&gt; that is the type of the declaration, and the &lt;tt&gt;FakeList&amp;lt;Declarator&amp;gt; *decllist&lt;/tt&gt; that contains the rest of the declaration. All of these have been covered in more detail earlier.
&lt;/p&gt;&lt;p&gt;
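As an illustration, here is one declaration annotated with how I would expect it to split into those three members. This is my reading of the node descriptions in this series, not a dump of elsa's output. Note that the &lt;tt&gt;*&lt;/tt&gt; and the &lt;tt&gt;[4]&lt;/tt&gt; each belong to a single declarator, while &lt;tt&gt;static&lt;/tt&gt; and &lt;tt&gt;const char&lt;/tt&gt; are shared.
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```cpp
// One declaration, annotated with an informal reading of how it splits up:
//   dflags   = DF_STATIC
//   spec     = TS_simple (id = ST_CHAR, cv = CV_CONST)
//   decllist = two Declarators:
//     x: D_pointer -> D_name   (pointer to const char)
//     y: D_array   -> D_name   (array of 4 const char, plus an Initializer)
static const char *x, y[4] = "abc";

static_assert(sizeof(y) == 4, "the [4] belongs to y alone");
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;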
&lt;h3&gt;Step 3.4.11: &lt;tt&gt;TypeIntr&lt;/tt&gt; (Differentiation between classes and structures)&lt;/h3&gt;
&lt;tt&gt;TypeIntr&lt;/tt&gt; is a little enum that has four members: &lt;tt&gt;TI_STRUCT, TI_CLASS, TI_UNION&lt;/tt&gt;, and &lt;tt&gt;TI_ENUM&lt;/tt&gt;. Their meanings are, I think, straightforward. There is also a top-level method to convert to a string representation, &lt;tt&gt;char const *toString(TypeIntr tr)&lt;/tt&gt;, which does what you think it does.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.8: &lt;tt&gt;Enumerator&lt;/tt&gt; (The members of enumerations)&lt;/h3&gt;
Within the definition of an enum is a &lt;tt&gt;FakeList&lt;/tt&gt; of &lt;tt&gt;Enumerator&lt;/tt&gt; nodes. These have a standard location and a &lt;tt&gt;StringRef name&lt;/tt&gt;. The value written in the source, if any, is in the potentially null &lt;tt&gt;Expression *expr&lt;/tt&gt; variable, while the actual numeric value is in &lt;tt&gt;int enumValue&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;
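The values that end up in &lt;tt&gt;enumValue&lt;/tt&gt; should follow the C++ rule: an enumerator with an initializer takes that value, one without takes the previous value plus one, and the first defaults to zero. The toy model below is hypothetical (&lt;tt&gt;ToyEnumerator&lt;/tt&gt; and &lt;tt&gt;enumValueAt&lt;/tt&gt; are my names, and a plain &lt;tt&gt;int&lt;/tt&gt; stands in for an already-evaluated &lt;tt&gt;Expression&lt;/tt&gt;). It computes those values and checks them against a real enum.
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```cpp
// Hypothetical stand-in for an Enumerator node; elsa's real class differs.
struct ToyEnumerator {
  const char *name;
  bool hasExpr;   // models "Expression *expr is non-null"
  int exprValue;  // the initializer's value, when hasExpr is true
};

// Value of the enumerator at index: its initializer if it has one,
// otherwise one more than the previous enumerator (the first defaults to 0).
constexpr int enumValueAt(const ToyEnumerator *elts, int index) {
  int value = -1;  // so an uninitialized first enumerator comes out as 0
  for (int i = 0; i != index + 1; ++i)
    value = elts[i].hasExpr ? elts[i].exprValue : value + 1;
  return value;
}

// Models: enum Color { RED, GREEN = 5, BLUE };
constexpr ToyEnumerator colorElts[] = {
  {"RED", false, 0}, {"GREEN", true, 5}, {"BLUE", false, 0}};

enum Color { RED, GREEN = 5, BLUE };

static_assert(enumValueAt(colorElts, 2) == BLUE, "BLUE follows GREEN = 5");
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;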
&lt;h3&gt;Step 3.4.8: &lt;tt&gt;PQName&lt;/tt&gt; (Everybody's name)&lt;/h3&gt;
In declarations and other places, instead of a plain string for a name, you get the AST node &lt;tt&gt;PQName&lt;/tt&gt;. The name stands for "possibly qualified." It exists because you frequently need to pick apart the different components of a name, such as the &lt;tt&gt;std&lt;/tt&gt; and the &lt;tt&gt;cout&lt;/tt&gt; in &lt;tt&gt;std::cout&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;
This class has four subclasses, &lt;tt&gt;PQ_qualifier&lt;/tt&gt;, &lt;tt&gt;PQ_name&lt;/tt&gt;, &lt;tt&gt;PQ_operator&lt;/tt&gt;, and &lt;tt&gt;PQ_template&lt;/tt&gt;. In addition to these, it has a plethora of functions intended to help you with the task of printing these names, as well as overloaded operators to aid in output (to &lt;tt&gt;std::ostream&lt;/tt&gt; and &lt;tt&gt;stringBuilder&lt;/tt&gt;). The shared members and methods are:
&lt;/p&gt;&lt;dl class="AST-def"&gt;
&lt;dt&gt;SourceLoc loc&lt;/dt&gt;
&lt;dt&gt;bool hasQualifiers()&lt;/dt&gt;
&lt;dt&gt;sm::string qualifierString()&lt;/dt&gt;
&lt;dt&gt;sm::string toString()&lt;/dt&gt;
&lt;dt&gt;sm::string toString_noTemplArgs()&lt;/dt&gt;
&lt;dt&gt;StringRef /* const char * */ getName()&lt;/dt&gt;
&lt;dt&gt;sm::string toComponentString()&lt;/dt&gt;
&lt;dt&gt;PQName *getUnqualifiedName() /* (And a const version) */&lt;/dt&gt;
&lt;dt&gt;bool templateUsed()&lt;/dt&gt;
&lt;/dl&gt;&lt;p&gt;
&lt;tt&gt;PQ_qualifier&lt;/tt&gt; represents a namespace or similar component to a qualified name. This is handled in a right-associative manner, such that &lt;tt&gt;std::tr1::shared_ptr&lt;/tt&gt; would be the qualifier std, which qualifies tr1, which qualifies shared_ptr. This class has three variables: &lt;tt&gt;StringRef qualifier&lt;/tt&gt; (the name to the left of the double-colon), &lt;tt&gt;TemplateArgument *templArgs&lt;/tt&gt; (which represents the template arguments for templated class qualifiers), and &lt;tt&gt;PQName *rest&lt;/tt&gt; (the right of the double-colon).
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;PQ_operator&lt;/tt&gt; represents a &lt;tt&gt;PQName&lt;/tt&gt; that is actually an operator overload. It has just two variables: &lt;tt&gt;OperatorName *o&lt;/tt&gt;, the operator in question, and &lt;tt&gt;StringRef fakeName&lt;/tt&gt;, a string representation of the operator. The latter is essentially the function's name with the spaces removed, except that &lt;tt&gt;operator new&lt;/tt&gt; and friends keep their usual spelling, and conversion operators all get the unhelpful name &lt;tt&gt;conversion-operator&lt;/tt&gt;.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;PQ_template&lt;/tt&gt; represents a name with template arguments. It again has two variables: the &lt;tt&gt;StringRef name&lt;/tt&gt; of the template itself and the &lt;tt&gt;TemplateArgument *templArgs&lt;/tt&gt; that holds its template arguments. Note that if you are naming a member of a templated class, the templated part of the name is a &lt;tt&gt;PQ_qualifier&lt;/tt&gt; node instead (with its own &lt;tt&gt;templArgs&lt;/tt&gt;).
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;PQ_name&lt;/tt&gt; is the last subclass of &lt;tt&gt;PQName&lt;/tt&gt; (note the minor spelling difference). It has a single variable, &lt;tt&gt;StringRef name&lt;/tt&gt;, which is the name itself. This is by far the most common name node, since every name that is not an instantiated template or an operator name will contain one somewhere.
&lt;/p&gt;&lt;p&gt;
For standard names, all of the various string output methods save &lt;tt&gt;qualifierString&lt;/tt&gt; (which returns the empty string) will return the same thing, the &lt;tt&gt;name&lt;/tt&gt; variable from &lt;tt&gt;PQ_name&lt;/tt&gt; or the &lt;tt&gt;fakeName&lt;/tt&gt; from &lt;tt&gt;PQ_operator&lt;/tt&gt;. The differences arise when you have templates or qualified names.
&lt;/p&gt;&lt;p&gt;
If you have a qualified name, the methods change rather predictably. &lt;tt&gt;qualifierString&lt;/tt&gt; returns the entire qualification before the tail node (e.g., &lt;tt&gt;std::auto_ptr&amp;lt;T&amp;gt;&lt;/tt&gt; becomes &lt;tt&gt;std::&lt;/tt&gt;). &lt;tt&gt;toString&lt;/tt&gt; and &lt;tt&gt;toString_noTemplArgs&lt;/tt&gt; return the fully qualified name (the latter dropping template arguments, subject to a caveat below). &lt;tt&gt;toComponentString&lt;/tt&gt; becomes equivalent to the &lt;tt&gt;qualifier&lt;/tt&gt; variable. &lt;tt&gt;getName&lt;/tt&gt; is essentially identical to &lt;tt&gt;getUnqualifiedName()-&amp;gt;getName()&lt;/tt&gt;: it returns the name of the rightmost component.
&lt;/p&gt;&lt;p&gt;
Templated names also modify stuff predictably. &lt;tt&gt;toString_noTemplArgs&lt;/tt&gt; and &lt;tt&gt;getName&lt;/tt&gt; return the base name, without template arguments; &lt;tt&gt;toString&lt;/tt&gt; and &lt;tt&gt;toComponentString&lt;/tt&gt; return the name with template arguments.
&lt;/p&gt;&lt;p&gt;
The interesting stuff happens when you have a templated qualifier (e.g., &lt;tt&gt;std::set&amp;lt;int&amp;gt;::iterator&lt;/tt&gt;). In that case, the &lt;tt&gt;toString_noTemplArgs&lt;/tt&gt; will not strip the template args from the qualifier.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;hasQualifiers()&lt;/tt&gt; is identical to the &lt;tt&gt;isPQ_qualifier()&lt;/tt&gt; method. &lt;tt&gt;templateUsed()&lt;/tt&gt; is true if the qualifier or template used the &lt;tt&gt;template&lt;/tt&gt; keyword. This is a feature that would be used for disambiguation purposes, such as this example (taken from the ISO C++ spec):
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
struct X {
  template&amp;lt;std::size_t&amp;gt; X* alloc();
  template&amp;lt;std::size_t&amp;gt; static X* adjust();
};
template&amp;lt;class T&amp;gt; void f(T* p) {
  // T* p1 = p-&amp;gt;alloc&amp;lt;200&amp;gt;(); (p-&amp;gt;alloc)&amp;lt;200&amp;gt;() -- syntax error
  T* p2 = p-&amp;gt;template alloc&amp;lt;200&amp;gt;();
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;
The last method to talk about is &lt;tt&gt;getUnqualifiedName&lt;/tt&gt;. This method simply returns the &lt;tt&gt;PQName&lt;/tt&gt; at the end of the name.
&lt;/p&gt;&lt;p&gt;
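To see the right-associative chain concretely, here is a toy model of the name &lt;tt&gt;std::set&amp;lt;int&amp;gt;::iterator&lt;/tt&gt;. It is a sketch, not elsa's classes: &lt;tt&gt;ToyPQName&lt;/tt&gt; and the helper functions are hypothetical, the nodes sit in an array with an index standing in for the &lt;tt&gt;rest&lt;/tt&gt; pointer, and template arguments are plain text. The helpers mimic &lt;tt&gt;getUnqualifiedName&lt;/tt&gt;, &lt;tt&gt;getName&lt;/tt&gt;, and &lt;tt&gt;hasQualifiers&lt;/tt&gt; as described above.
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```cpp
// Toy model of a PQName chain; the kinds mirror the post, the layout is made up.
enum PQKind { PQ_NAME, PQ_QUALIFIER, PQ_TEMPLATE, PQ_OPERATOR };

struct ToyPQName {
  PQKind kind;
  const char *text;       // name, qualifier, or fakeName
  const char *templArgs;  // template arguments as text, or nullptr
  int rest;               // index of the node right of "::", or -1
};

// Like getUnqualifiedName(): follow rest links until the tail node.
constexpr int unqualifiedIndex(const ToyPQName *nodes, int i) {
  while (nodes[i].kind == PQ_QUALIFIER) i = nodes[i].rest;
  return i;
}

// Like getName(): the text of the tail node, without template arguments.
constexpr const char *getName(const ToyPQName *nodes, int i) {
  return nodes[unqualifiedIndex(nodes, i)].text;
}

// Like hasQualifiers(): true exactly when the head node is a qualifier.
constexpr bool hasQualifiers(const ToyPQName *nodes, int i) {
  return nodes[i].kind == PQ_QUALIFIER;
}

// Character-by-character string equality usable in constant expressions.
constexpr bool sameText(const char *a, const char *b) {
  while (*a == *b) {
    if (*a == '\0') return true;
    ++a;
    ++b;
  }
  return false;
}

// The name from the text above: std qualifies set (with template
// argument int), which in turn qualifies iterator.
constexpr ToyPQName setIterator[] = {
  {PQ_QUALIFIER, "std", nullptr, 1},
  {PQ_QUALIFIER, "set", "int", 2},  // a templated qualifier keeps its arguments
  {PQ_NAME, "iterator", nullptr, -1},
};

static_assert(sameText(getName(setIterator, 0), "iterator"), "getName skips the qualifiers");
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;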
With all of the methods discussed, the question you are probably asking is: what is the easiest way to get the name out of a PQName? If you're trying to find a method or class name, &lt;tt&gt;getName&lt;/tt&gt; is your safest option. If you need the template arguments as well (say you're looking for particular instantiations), &lt;tt&gt;getUnqualifiedName()-&amp;gt;toString()&lt;/tt&gt; is a better option.
&lt;/p&gt;&lt;p&gt;
If you're looking at class members, you can probably use &lt;tt&gt;toString_noTemplArgs&lt;/tt&gt; when you are looking for a particular function, or &lt;tt&gt;qualifierString()&lt;/tt&gt; when you are looking for any function of the class. For cases where the namespace information is necessary, you probably want to investigate the name with elsa's parallel type APIs, unless you want to maintain your own state for &lt;tt&gt;using&lt;/tt&gt; declarations and other shenanigans that make the original type non-obvious.
&lt;/p&gt;&lt;p&gt;
Unfortunately, while I earlier said that I intended to reference classes in this part of pork, it looks like I have not the time this week to cover them as well. At this point, it seems likely that part 5 will cover classes and part 6 will cover templates and errata. I cannot say what part 7 will touch: it will either be more errata or a start on the expression and statement code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-1153590540630888675?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/1153590540630888675/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=1153590540630888675' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1153590540630888675'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1153590540630888675'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/08/guide-to-pork-part-4.html' title='A guide to pork, part 4'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4445260913167390963</id><published>2009-08-04T17:16:00.002-04:00</published><updated>2009-08-04T17:43:35.272-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='jshydra'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>More jshydra 
news</title><content type='html'>Taking a break from my ongoing series on pork, it's now time for an overdue update on jshydra.
&lt;/p&gt;&lt;p&gt;
First off, I have pushed Andrew Sutherland's changes (most of them, anyways) to my repo. These consist of the ability to push arguments, proper JS constructor handling (which I had locally but, for some reason I have long since forgotten, never committed), and some minor comment handling fixes. I also never mentioned my JS inheritance divination work from way back in May. Finally, I fixed a bug and got around to making the TOK_* variables visible in JS, like the JSOP_* variables.
&lt;/p&gt;&lt;p&gt;
&lt;a href="http://daviddahl.blogspot.com/"&gt;David Dahl&lt;/a&gt; has decided to use jshydra to &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=506128"&gt;find dead code in JS&lt;/a&gt;. That bug also has some discussion on how to run jshydra, as well as some work-in-progress files to grab JS information from the mozilla build system in a jshydra-friendly format.
&lt;/p&gt;&lt;p&gt;
Andrew Sutherland also used jshydra to generate documentation for TB. &lt;a href="http://www.visophyte.org/blog/2009/07/20/doccelerator-javascript-documentation-via-jshydra-into-couchdb-with-an-ajax-ui/"&gt;His blog posting&lt;/a&gt; does it more justice than I ever could.
&lt;/p&gt;&lt;p&gt;
Last night, I finally got around to creating an analogue of David Humphrey's &lt;a href="http://germany.proximity.on.ca/dehydra-web/process.cgi"&gt;Dehydra Web&lt;/a&gt; for JSHydra, predictably called &lt;a href="http://www.tjhsst.edu/~jcranmer/static/webjshydra/"&gt;JSHydra Web&lt;/a&gt;. As a note of caution, please don't upload your massive, 100+KB JS file for perusing: you'll find the output hard to read, and I don't want to overload the server.
&lt;/p&gt;&lt;p&gt;
As always, for more information on jshydra, you can either contact me by email (found on bugzilla), by IRC (most topical place of discussion is &lt;a href="irc://irc.mozilla.org/static"&gt;#static&lt;/a&gt;), or via the &lt;a href="news://news.mozilla.org/mozilla.dev.static-analysis"&gt;mozilla.dev.static-analysis newsgroup&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4445260913167390963?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4445260913167390963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4445260913167390963' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4445260913167390963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4445260913167390963'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/08/more-jshydra-news.html' title='More jshydra news'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8051276026907195346</id><published>2009-08-01T22:15:00.006-04:00</published><updated>2009-08-02T09:56:32.887-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pork'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>A guide to pork, part 3</title><content type='html'>In the &lt;a 
href="http://quetzalcoatal.blogspot.com/2009/07/guide-to-pork-part-2.html"&gt;previous installment&lt;/a&gt;, I covered details on the patcher API and the basics of the Elsa AST API. If I were writing this as a reference manual, this is the point at which I would be spewing out a lot of verbose information on the C++ AST, which would be hard to use for an introductory patch. Since one goal of this series is to produce something close to a reference manual, the steps are numbered in reference-manual order; I will, however, present the information in an order more likely to facilitate understanding.
&lt;/p&gt;&lt;p&gt;
In summary:&lt;br /&gt;
&lt;b&gt;Step 1&lt;/b&gt;: Building and running your tool&lt;br /&gt;
&lt;b&gt;Step 1.1&lt;/b&gt;: Running the patcher&lt;br /&gt;
&lt;b&gt;Step 2&lt;/b&gt;: Using the patcher&lt;br /&gt;
&lt;b&gt;Step 3&lt;/b&gt;: The structure of the Elsa AST&lt;br /&gt;
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Aside 1: An introduction to porky&lt;/h3&gt;
Having had more time to evaluate the new porky wrappers since my announcement of its commit last week, let me introduce you to it briefly. The tool takes in a list of rewrite specifications such as &lt;tt&gt;type PRCondVar** =&gt; mozilla::CondVar**&lt;/tt&gt;, which will automatically change everything of the first type to the second type. It can also transform method calls into other method calls or even new/delete expressions. I would go into more detail, but that's for Chris Jones to discuss. If he ever blogs about it.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3 (continued):&lt;/h3&gt;
The following is an image representation of the C++ AST, excluding specific Expression and Statement subclasses:&lt;br /&gt;
&lt;a href="http://4.bp.blogspot.com/_qW4UNslWKZU/SnT6YX50h5I/AAAAAAAAABw/sivIjiQWmh4/s1600-h/ast.png"&gt;&lt;img src="http://4.bp.blogspot.com/_qW4UNslWKZU/SnT6YX50h5I/AAAAAAAAABw/sivIjiQWmh4/s400/ast.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;
To examine AST examples in detail, let us consider the example of a typical hello world program in C++:
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
#include &amp;lt;iostream&amp;gt;
 
int main() {
  std::cout &amp;lt;&amp;lt; "Hello World!" &amp;lt;&amp;lt; std::endl;
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;
After being preprocessed, this small program becomes 19,153 lines of C++ goodness, thanks to the many header files being recursively included. Although we only define a single function, the headers that g++ pulls in define for us namespaces (with GCC-specific attributes), templates, template specializations, typedefs, structs, classes, unions, function declarations, and half of the other C++ language features.
&lt;/p&gt;&lt;p&gt;
Let us look at each of these in turn.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.3: &lt;tt&gt;Function&lt;/tt&gt; (Function definitions)&lt;/h3&gt;
The &lt;tt&gt;Function&lt;/tt&gt; AST node represents the definition of a function. It contains these variables:
&lt;/p&gt;&lt;dl class="AST-def"&gt;
&lt;dt&gt;DeclFlags dflags&lt;/dt&gt;
&lt;dt&gt;TypeSpecifier retspec&lt;/dt&gt;
&lt;dt&gt;Declarator nameAndParams&lt;/dt&gt;
&lt;dt&gt;S_compound &lt;i&gt;/* (Statement) */&lt;/i&gt; body&lt;/dt&gt;
&lt;dt&gt;FakeList&amp;lt;MemberInit&amp;gt; inits&lt;/dt&gt;
&lt;dt&gt;FakeList&amp;lt;Handler&amp;gt; handlers&lt;/dt&gt;
&lt;/dl&gt;&lt;p&gt;
Unlike most AST nodes, the Function node does not have a &lt;tt&gt;SourceLoc loc&lt;/tt&gt; member; instead, it has a &lt;tt&gt;getLoc()&lt;/tt&gt; method which returns the location of the &lt;tt&gt;nameAndParams&lt;/tt&gt; member, which would represent the location of the name. In our example, this would be line 3, column 5 (the beginning of `main').
&lt;/p&gt;&lt;p&gt;
Most of the members are self-explanatory: they represent, in order, the declaration flags (such as &lt;tt&gt;static&lt;/tt&gt; or &lt;tt&gt;inline&lt;/tt&gt;), the return type, the name-and-parameters agglomeration, and the statements of the body. The last two members are less common and are &lt;tt&gt;NULL&lt;/tt&gt; in the example I have here. The &lt;tt&gt;inits&lt;/tt&gt; member represents the member initializers of a constructor, while &lt;tt&gt;handlers&lt;/tt&gt; represents the catch clauses of a function-try-block, a try-catch that wraps the entire function body. An example of such a block:
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;string&amp;gt;

class Base {
public:
  Base(const char *name)
    try : data(name) { } catch (...) { std::cerr &amp;lt;&amp;lt; "Oops!" &amp;lt;&amp;lt; std::endl; }
private:
  const std::string data;
};
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;
Strictly speaking, the enclosing try-catch is not limited to constructors, although I doubt it would be used outside of them except for stress-testing compilers.
&lt;/p&gt;&lt;p&gt;
There is little that you will want to do with these objects that does not fall under the use of other objects. Probably the most burning question you have is how to get the name of the function. These are the shortest ways:
&lt;tt&gt;func-&gt;nameAndParams-&gt;var-&gt;fullyQualifiedName0().c_str()&lt;/tt&gt; (going to a &lt;tt&gt;const char *&lt;/tt&gt;) and &lt;tt&gt;func-&gt;nameAndParams-&gt;decl-&gt;getDeclaratorId()-&gt;toString()&lt;/tt&gt; (going to an &lt;tt&gt;sm::string&lt;/tt&gt;). The latter form will probably be more helpful if you are looking for specific methods, since &lt;tt&gt;sm::string&lt;/tt&gt; overloads the == operator.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.4.5: &lt;tt&gt;DeclFlags&lt;/tt&gt; (declaration modifiers)&lt;/h3&gt;
&lt;tt&gt;DeclFlags&lt;/tt&gt; is an enum that specifies certain flags about declarations. Each value has the form &lt;tt&gt;DF_&amp;lt;NAME&amp;gt;&lt;/tt&gt;, always uppercase (e.g., &lt;tt&gt;DF_STATIC&lt;/tt&gt;). The standard declaration specifiers (auto, register, static, extern, mutable, inline, virtual, explicit, friend, and typedef) all have values of this form. Other flags in the enum are used by the &lt;tt&gt;Variable&lt;/tt&gt; constructs, which I will not go into now.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.6: &lt;tt&gt;TypeSpecifier&lt;/tt&gt; (first half of declarations)&lt;/h3&gt;
A &lt;tt&gt;TypeSpecifier&lt;/tt&gt; node represents the type part of a declaration, such as &lt;tt&gt;char&lt;/tt&gt;. If you recall your C/C++ syntax, the declaration &lt;tt&gt;char *x, y;&lt;/tt&gt; declares only one variable as a pointer: the * is matched with the variable name and not the type itself. In the AST, therefore, &lt;tt&gt;TypeSpecifier&lt;/tt&gt; does not receive these pointers (they come about in &lt;tt&gt;Declarator&lt;/tt&gt; nodes).
&lt;/p&gt;&lt;p&gt;
On their own, these nodes have only a &lt;tt&gt;CVFlags cv&lt;/tt&gt; variable (representing the &lt;tt&gt;const&lt;/tt&gt;- and &lt;tt&gt;volatile&lt;/tt&gt;-ness of the type) and a &lt;tt&gt;SourceLoc loc&lt;/tt&gt; location member. The more specific attributes live in five subclasses.
&lt;/p&gt;&lt;p&gt;
&lt;i&gt;TS_name, TS_elaborated, TS_classSpec, and TS_enumSpec will be discussed in a future part.&lt;/i&gt;
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;TS_simple&lt;/tt&gt; nodes represent the built-in simple types, like &lt;tt&gt;char&lt;/tt&gt;. It has a single additional member, a &lt;tt&gt;SimpleTypeId id&lt;/tt&gt; member.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.11: &lt;tt&gt;Declarator&lt;/tt&gt; (the other half of declarations)&lt;/h3&gt;
A &lt;tt&gt;Declarator&lt;/tt&gt; node represents the non-type part of a declaration. It contains these variables and methods:
&lt;/p&gt;&lt;dl class="AST-def"&gt;
&lt;dt&gt;IDeclarator *decl&lt;/dt&gt;
&lt;dt&gt;Initializer *init&lt;/dt&gt;
&lt;dt&gt;PQName *getDeclaratorId() &lt;i&gt;/*(And a const version)*/&lt;/i&gt;&lt;/dt&gt;
&lt;/dl&gt;&lt;p&gt;
&lt;h3&gt;Step 3.1.12: &lt;tt&gt;IDeclarator&lt;/tt&gt; (the real declaration part)&lt;/h3&gt;
The &lt;tt&gt;IDeclarator&lt;/tt&gt; node does the heavy work in terms of annotating declarations. In some corner cases, the AST for a declarator can get very deep; the parallel type structure makes working with declarations much easier.
&lt;/p&gt;&lt;p&gt;
With the exception of the &lt;tt&gt;D_name&lt;/tt&gt; and &lt;tt&gt;D_bitfield&lt;/tt&gt; subclasses, all subclasses contain at least an &lt;tt&gt;IDeclarator *base&lt;/tt&gt; member. In addition, &lt;tt&gt;IDeclarator&lt;/tt&gt; holds these variables and methods:
&lt;/p&gt;&lt;dl class="AST-def"&gt;
&lt;dt&gt;SourceLoc loc&lt;/dt&gt;
&lt;dt&gt;PQName *getDeclaratorId() &lt;i&gt;/*(And a const version)*/&lt;/i&gt;&lt;/dt&gt;
&lt;dt&gt;IDeclarator *getBase() &lt;i&gt;/*(And a const version)*/&lt;/i&gt;&lt;/dt&gt;
&lt;dt&gt;IDeclarator *skipGroups()&lt;/dt&gt;
&lt;dt&gt;bool bottomIsDfunc()&lt;/dt&gt;
&lt;dt&gt;D_func getD_func()&lt;/dt&gt;
&lt;/dl&gt;&lt;p&gt;
The &lt;tt&gt;skipGroups&lt;/tt&gt; method skips through any excess groupings (i.e., parentheses layers). &lt;tt&gt;getBase&lt;/tt&gt; returns either the base of the declaration or NULL if it is a leaf. &lt;tt&gt;getD_func&lt;/tt&gt; returns the bottom-most &lt;tt&gt;D_func&lt;/tt&gt; node, while &lt;tt&gt;bottomIsDfunc&lt;/tt&gt; will tell you if the declaration is a function (but not a function pointer). &lt;tt&gt;getDeclaratorId&lt;/tt&gt; obviously returns the name of the object at the very base of the declaration tree.
&lt;/p&gt;&lt;p&gt;
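The following toy model may help in picturing these methods. It is hypothetical throughout: &lt;tt&gt;ToyDeclarator&lt;/tt&gt; stores the chain in an array with indices in place of &lt;tt&gt;base&lt;/tt&gt; pointers, and &lt;tt&gt;bottomIsFunc&lt;/tt&gt; is my reading of &lt;tt&gt;bottomIsDfunc&lt;/tt&gt; (look at the lowest non-grouping node above the name), not a copy of elsa's code. The two chains show why a function returning a pointer counts as a function while a function pointer does not.
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```cpp
// Toy model of an IDeclarator chain; the kinds mirror the post's subclass
// names, but the layout (array indices instead of pointers) is made up.
enum DKind { D_NAME, D_POINTER, D_REFERENCE, D_ARRAY, D_FUNC, D_GROUPING };

struct ToyDeclarator {
  DKind kind;
  int base;          // index of the base node, or -1 for the D_NAME leaf
  const char *name;  // only used by D_NAME
};

// Like skipGroups(): step past any D_GROUPING (parentheses) layers.
constexpr int skipGroups(const ToyDeclarator *d, int i) {
  while (d[i].kind == D_GROUPING) i = d[i].base;
  return i;
}

// Like getDeclaratorId(): follow base links down to the leaf's name.
constexpr const char *declaratorId(const ToyDeclarator *d, int i) {
  while (d[i].base != -1) i = d[i].base;
  return d[i].name;
}

// One reading of bottomIsDfunc(): is the last non-grouping node above
// the leaf a D_FUNC? True for functions, false for function pointers.
constexpr bool bottomIsFunc(const ToyDeclarator *d, int i) {
  int lastNonGroup = -1;
  while (d[i].base != -1) {
    if (d[i].kind != D_GROUPING) lastNonGroup = i;
    i = d[i].base;
  }
  return lastNonGroup != -1 ? d[lastNonGroup].kind == D_FUNC : false;
}

// int *g();      D_POINTER -> D_FUNC -> D_NAME  (a function returning int*)
constexpr ToyDeclarator funcReturningPtr[] = {
  {D_POINTER, 1, nullptr}, {D_FUNC, 2, nullptr}, {D_NAME, -1, "g"}};

// int (*fp)();   D_FUNC -> D_GROUPING -> D_POINTER -> D_NAME  (a pointer)
constexpr ToyDeclarator funcPtr[] = {
  {D_FUNC, 1, nullptr}, {D_GROUPING, 2, nullptr},
  {D_POINTER, 3, nullptr}, {D_NAME, -1, "fp"}};

static_assert(!bottomIsFunc(funcPtr, 0), "fp is a pointer, not a function");
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;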
The first subclass is &lt;tt&gt;D_name&lt;/tt&gt;, which is the typical leaf of the declaration. It has a single &lt;tt&gt;PQName *name&lt;/tt&gt; attribute, which returns the name of the variable or function being declared. The other leaf class is &lt;tt&gt;D_bitfield&lt;/tt&gt;, which has both the name attribute as well as an &lt;tt&gt;Expression *bits&lt;/tt&gt; representing the number of bits in the declaration.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;D_pointer&lt;/tt&gt; and &lt;tt&gt;D_reference&lt;/tt&gt; represent a pointer or reference indirection, respectively; &lt;tt&gt;D_pointer&lt;/tt&gt; additionally has a &lt;tt&gt;CVFlags cv&lt;/tt&gt; variable representing the qualifications of its pointer type.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;D_ptrToMember&lt;/tt&gt; is similar to &lt;tt&gt;D_pointer&lt;/tt&gt;, but it adds another &lt;tt&gt;PQName *nestedName&lt;/tt&gt; attribute to represent the construct whose member the pointer is pointing to. For those who don't recognize this feature, it can be demonstrated in this C++ snippet:
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
struct Foo { int Bar() { return 0; } };
typedef int (Foo::*ptrToMember)();
// The declarator tree (following base):
// D_func-&gt;D_grouping-&gt;D_ptrToMember-&gt;D_name
ptrToMember p = &amp;Foo::Bar;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;
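&lt;tt&gt;D_array&lt;/tt&gt; represents an array declaration. In addition to its base, it also has the possibly null &lt;tt&gt;Expression *size&lt;/tt&gt; member to represent the size of the array. Multidimensional arrays produce nested &lt;tt&gt;D_array&lt;/tt&gt; nodes with the rightmost dimension outermost: &lt;tt&gt;int a[2][3]&lt;/tt&gt; gives a &lt;tt&gt;D_array&lt;/tt&gt; of size 3 whose base is the &lt;tt&gt;D_array&lt;/tt&gt; of size 2.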
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;D_grouping&lt;/tt&gt; is a dummy node used mostly for the purposes of AST disambiguation during the parsing phase (it represents the use of parentheses in declarations). The &lt;tt&gt;skipGroups&lt;/tt&gt; function can be used to skip past these nodes.
&lt;/p&gt;&lt;p&gt;
&lt;tt&gt;D_func&lt;/tt&gt; is easily the most complex declarator. Its base is typically the name of the function, although function pointers can have nested declarators. It has a &lt;tt&gt;FakeList&amp;lt;ASTTypeId&amp;gt; *params&lt;/tt&gt; member for the parameters, a &lt;tt&gt;CVFlags cv&lt;/tt&gt; member for the qualifiers of const (or volatile) member functions, and an &lt;tt&gt;ExceptionSpec *exnSpec&lt;/tt&gt; member for the exception specification.
&lt;/p&gt;&lt;p&gt;
Predicting declarator trees is not difficult. In general, you can apply standard rules to find the declarator: each * and &amp; at the beginning creates the respective indirection node; a [] or () at the end creates an array or function node, respectively; and surrounding parentheses yield a grouping node. Pointers to members are created when that syntax is used (look for the ::*), and whether the leaf is a bitfield or a plain name is obvious from the source. One also needs to remember that the structures on the left are parsed before the ones on the right, unless overridden by parentheses, so the obscene &lt;tt&gt;int (*(*asdf)())[0]&lt;/tt&gt; yields the chain array, grouping, pointer, function, grouping, pointer, name. If you're wondering, it declares asdf as a pointer to a function returning a pointer to a zero-sized array of integers.
&lt;/p&gt;&lt;p&gt;
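If you want to double-check a reading like this, the compiler can do it for you. In the sketch below, the array size is 3 rather than 0, because zero-length arrays are a GCC extension rather than standard C++, and the typedef names are my own. The typedefs build the type up one step at a time starting from &lt;tt&gt;int&lt;/tt&gt;. The &lt;tt&gt;extern&lt;/tt&gt; redeclaration only compiles if both declarations have exactly the same type, which confirms that &lt;tt&gt;asdf&lt;/tt&gt; is a pointer to a function returning a pointer to an array of &lt;tt&gt;int&lt;/tt&gt;.
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```cpp
// The post's declarator, with the array size changed from 0 to 3.
int (*(*asdf)())[3];

typedef int IntArray[3];        // array of 3 int
typedef IntArray *IntArrayPtr;  // pointer to such an array
typedef IntArrayPtr Func();     // function returning that pointer
typedef Func *FuncPtr;          // pointer to such a function

// A redeclaration must use the identical type, so this line only
// compiles if asdf really is a FuncPtr.
extern FuncPtr asdf;

static_assert(sizeof(**asdf()) == sizeof(int), "the array elements are ints");
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;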
There is much more to talk about in the world of ASTs. This will get you started on function bodies and declarations; the next part of the guide will cover more in-depth knowledge on declarations and begin to introduce classes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8051276026907195346?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8051276026907195346/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8051276026907195346' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8051276026907195346'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8051276026907195346'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/08/guide-to-pork-part-3.html' title='A guide to pork, part 3'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_qW4UNslWKZU/SnT6YX50h5I/AAAAAAAAABw/sivIjiQWmh4/s72-c/ast.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4719962143483267425</id><published>2009-07-22T21:11:00.003-04:00</published><updated>2009-07-22T21:15:26.031-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pork'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title 
type='text'>A guide to pork, part 2</title><content type='html'>Last time, &lt;a href="http://quetzalcoatal.blogspot.com/2009/07/guide-to-pork-part-1.html"&gt;I covered&lt;/a&gt; the very basics of using &lt;a href="https://developer.mozilla.org/en/Pork"&gt;pork&lt;/a&gt;. In this portion of the guide, I will cover enough to get you to be able to write a small patch.
&lt;/p&gt;&lt;p&gt;
Since the time I wrote the first part of the guide, &lt;a href="http://blog.mozilla.com/cjones"&gt;Chris Jones&lt;/a&gt; committed some tool wrappers known collectively as porky, which may necessitate updates to first steps.
&lt;/p&gt;&lt;p&gt;
In summary:&lt;br/&gt;
&lt;b&gt;Step 1&lt;/b&gt;: Building and running your tool&lt;br /&gt;
&lt;b&gt;Step 1.1&lt;/b&gt;: Running the patcher
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 2: Using the patcher&lt;/h3&gt;
The patcher works internally (more or less) by keeping a list of ranges and their replacement text, which it eventually uses to build up hunks that it then spits out to an output stream. The public API it provides (as of the current tip, in any case) comes in two sections: some file utility functions and text replacement functions.
&lt;/p&gt;&lt;p&gt;
Locations can be represented by one of three different types. The first is &lt;tt&gt;SourceLoc&lt;/tt&gt;, which is a bit-packed integer that the elsa AST nodes give you. Then there is &lt;tt&gt;CPPSourceLoc&lt;/tt&gt;, which is an only slightly less manageable location format. The final form is &lt;tt&gt;UnboxedLoc&lt;/tt&gt;, which is the easiest one to work with.
&lt;/p&gt;&lt;p&gt;
As I mentioned earlier, the patcher actually works with pairs of these objects. &lt;tt&gt;PairLoc&lt;/tt&gt; and &lt;tt&gt;UnboxedPairLoc&lt;/tt&gt; are pairs of &lt;tt&gt;CPPSourceLoc&lt;/tt&gt; and &lt;tt&gt;UnboxedLoc&lt;/tt&gt;, respectively. The two are constructed in a pretty intuitive manner (although note that as &lt;tt&gt;UnboxedLoc&lt;/tt&gt;s do not store the file, you need to pass that into its pair type). Note that ranges include the left but not the right endpoint.
&lt;/p&gt;&lt;p&gt;
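The half-open convention is worth internalizing before writing patches. Here is a minimal sketch using hypothetical &lt;tt&gt;ToyLoc&lt;/tt&gt; and &lt;tt&gt;ToyRange&lt;/tt&gt; types in place of &lt;tt&gt;UnboxedLoc&lt;/tt&gt; and &lt;tt&gt;UnboxedPairLoc&lt;/tt&gt;, whose real interfaces are not reproduced here; it also assumes 1-based columns. It shows two practical consequences: the width of a single-line range is a plain subtraction, and a range ending where another begins does not overlap it. The example range covers the four characters of &lt;tt&gt;main&lt;/tt&gt; on line 3 of a file where that line reads &lt;tt&gt;int main() {&lt;/tt&gt;.
&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```cpp
// Hypothetical stand-ins for UnboxedLoc / UnboxedPairLoc (line and column only).
struct ToyLoc { int line; int col; };
struct ToyRange { ToyLoc begin; ToyLoc end; };  // half-open: [begin, end)

// True if a comes strictly before b in the file.
constexpr bool before(ToyLoc a, ToyLoc b) {
  return a.line != b.line ? b.line > a.line : b.col > a.col;
}

// begin is inside the range; end is not.
constexpr bool contains(ToyRange r, ToyLoc p) {
  if (before(p, r.begin)) return false;
  return before(p, r.end);
}

// Width of a range that starts and ends on the same line.
constexpr int width(ToyRange r) { return r.end.col - r.begin.col; }

// "main" occupies columns 5 through 8 of line 3; the excluded end is column 9.
constexpr ToyRange mainName = {{3, 5}, {3, 9}};

static_assert(width(mainName) == 4, "main is four characters wide");
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;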
The class &lt;tt&gt;Patcher&lt;/tt&gt; itself contains two methods for patching stuff: &lt;tt&gt;printPatch&lt;/tt&gt;, which replaces text, and &lt;tt&gt;insertBefore&lt;/tt&gt;, which inserts the text before a location. If you want to delete text, the answer is to replace a range with the empty string.
&lt;/p&gt;&lt;p&gt;
While this is nice, the patcher does suffer from a few flaws. The biggest of these that I've found is really a flaw in elsa: only statements and expressions have both start and end locations, so for other nodes I have had to roll my own search functions. Fortunately, the file API of the patcher helps here.
&lt;/p&gt;&lt;p&gt;
The other big flaw is the difficulty of coping with visually important but semantically meaningless cues, namely comments and whitespace. If you naïvely delete text, you may end up with comments whose referents no longer exist, or blocks of whitespace where code once was. Inserted text may also violate local code conventions. I have not yet expended the effort to get this to work; you will have to do it yourself, bug taras to do it, or possibly both.
&lt;/p&gt;&lt;p&gt;
Now, if you want to see some code in action:
&lt;pre&gt;&lt;code&gt;
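// Here, func is a pointer to an elsa AST expression node,
// type is a string holding its replacement text,
// and patcher is, of course, a Patcher object.
patcher.printPatch(type, PairLoc(func-&gt;loc, func-&gt;endloc));

// Elsewhere: make a range from loc through the next occurrence of
// toFind, searching forward line by line.
UnboxedPairLoc findAndMakePair(Patcher &amp;patcher, const SourceLoc &amp;loc,
    char toFind) {
  int lLine, lCol;
  StringRef file;
  sourceLocManager-&gt;decodeLineCol(loc, file, lLine, lCol);
  int lineNo = lLine;
  std::string::size_type col;
  do {
    // Note: this loops forever if toFind never appears before EOF.
    std::string line = patcher.getLine(lineNo, file);
    // On the first line, only search at or after the starting column.
    col = line.find(toFind, lineNo == lLine ? lCol - 1 : 0);
    lineNo++;
  } while (col == std::string::npos);

  // Assuming 1-based columns, toFind sits at column col + 1 (find() is
  // 0-based); ranges exclude their right endpoint, so end one past it.
  return UnboxedPairLoc(file, UnboxedLoc(lLine, lCol),
    UnboxedLoc(lineNo - 1, col + 2));
}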
&lt;/code&gt;&lt;/pre&gt;
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 3: The structure of the Elsa AST&lt;/h3&gt;
The core of pork is its ability to parse C++ into AST nodes. In general, these nodes fall into three categories: top-level declarations (possibly within classes or namespaces), statement and expression nodes, and utility nodes.
&lt;/p&gt;&lt;p&gt;
The basic structure of an AST node class is like this:
&lt;pre&gt;&lt;code&gt;
// A typical node type
class TypeSpecifier {
public:
  // Almost all nodes have a location; the few that don't
  // are ones where a location wouldn't make sense
  SourceLoc loc;

  // These methods are for nodes with subtypes:
  // ifX returns null if the node isn't an X, while asX throws
  char const *kindName() const;
  TS_name const *ifTS_nameC() const;
  TS_name *ifTS_name();
  TS_name const *asTS_nameC() const;
  TS_name *asTS_name();
  bool isTS_name() const;

  // There's another parameter that you'll never use
  void debugPrint(std::ostream &amp;, int indent);
  void traverse(ASTVisitor &amp;vis);
};
class TS_name: public TypeSpecifier {
public:
  // Typically has some more data nodes
  PQName *name;
  bool typenameUsed;
};
&lt;/code&gt;&lt;/pre&gt;
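&lt;/p&gt;&lt;p&gt;
The &lt;tt&gt;if&lt;/tt&gt;/&lt;tt&gt;as&lt;/tt&gt;/&lt;tt&gt;is&lt;/tt&gt; accessors follow a simple convention that can be imitated in plain C++. The sketch below is hypothetical: it borrows the names &lt;tt&gt;TypeSpecifier&lt;/tt&gt; and &lt;tt&gt;TS_name&lt;/tt&gt; from the listing above, but Elsa generates the real accessors from its AST description, the &lt;tt&gt;std::string&lt;/tt&gt; name and the &lt;tt&gt;std::logic_error&lt;/tt&gt; are stand-ins, and only the non-const variants are shown.
&lt;pre&gt;&lt;code&gt;
#include &amp;lt;stdexcept&gt;
#include &amp;lt;string&gt;

class TS_name;  // forward declaration for the downcast helpers

// Hypothetical, stripped-down base node.
class TypeSpecifier {
public:
  enum Kind { TS_NAME, TS_SIMPLE };
  explicit TypeSpecifier(Kind k) : kind(k) {}
  virtual ~TypeSpecifier() {}

  bool isTS_name() const { return kind == TS_NAME; }
  // ifX: returns null when the node is not of the requested subtype.
  TS_name *ifTS_name();
  // asX: assumes the node is of that subtype and throws otherwise.
  TS_name *asTS_name();

private:
  Kind kind;
};

class TS_name : public TypeSpecifier {
public:
  explicit TS_name(const std::string &amp;n) : TypeSpecifier(TS_NAME), name(n) {}
  std::string name;  // the real node holds a PQName * instead
};

TS_name *TypeSpecifier::ifTS_name() {
  return isTS_name() ? static_cast&amp;lt;TS_name *&gt;(this) : nullptr;
}

TS_name *TypeSpecifier::asTS_name() {
  if (!isTS_name()) throw std::logic_error("not a TS_name");
  return static_cast&amp;lt;TS_name *&gt;(this);
}

// Typical use: test with ifX, then work with the subtype.
std::string describe(TypeSpecifier *ts) {
  if (TS_name *n = ts-&gt;ifTS_name())
    return "named type " + n-&gt;name;
  return "some other type specifier";
}
&lt;/code&gt;&lt;/pre&gt;
Testing with &lt;tt&gt;ifX&lt;/tt&gt; and branching on the result is the common idiom; &lt;tt&gt;asX&lt;/tt&gt; is for cases where you already know the subtype and want a loud failure if that assumption is wrong.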
&lt;/p&gt;&lt;p&gt;
To use these nodes, pork follows a typical visitor pattern. The class &lt;tt&gt;ASTVisitor&lt;/tt&gt; visits all of the node types, while &lt;tt&gt;ExpressionVisitor&lt;/tt&gt; has individual methods for visiting the specific subtypes of statements and expressions. You can look at nodes in either a pre-order or a post-order traversal. A previsit function has the form:&lt;br/&gt;
&lt;tt&gt;virtual bool visitTypeSpecifier(TypeSpecifier *);&lt;/tt&gt;&lt;br/&gt;
(where the return value says whether to descend into the node's children), and a postvisit function has the form:&lt;br/&gt;
&lt;tt&gt;virtual void postvisitTypeSpecifier(TypeSpecifier *);&lt;/tt&gt;.
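&lt;/p&gt;&lt;p&gt;
The pre/postvisit contract can likewise be sketched without pork. Everything below (&lt;tt&gt;Node&lt;/tt&gt;, &lt;tt&gt;Visitor&lt;/tt&gt;, &lt;tt&gt;traverse&lt;/tt&gt;, &lt;tt&gt;OrderRecorder&lt;/tt&gt;) is hypothetical and only mirrors the convention described above: the previsit hook runs before a node's children and returns whether to descend into them, and the postvisit hook runs afterwards. In this sketch a node whose previsit returns false is skipped entirely, postvisit included; check the generated &lt;tt&gt;traverse&lt;/tt&gt; code in pork before relying on that detail.
&lt;pre&gt;&lt;code&gt;
#include &amp;lt;string&gt;
#include &amp;lt;vector&gt;

// A hypothetical tree node standing in for an Elsa AST node.
struct Node {
  std::string name;
  std::vector&amp;lt;Node&gt; children;
};

// Visitor with a previsit/postvisit pair, mirroring visitX/postvisitX.
class Visitor {
public:
  virtual ~Visitor() {}
  // Return false to skip this node's children.
  virtual bool visitNode(Node *) { return true; }
  virtual void postvisitNode(Node *) {}
};

// Previsit hook, then the children, then the postvisit hook.
void traverse(Node *node, Visitor &amp;vis) {
  if (!vis.visitNode(node))
    return;  // in this sketch, a pruned node gets no postvisit either
  for (Node &amp;child : node-&gt;children)
    traverse(&amp;child, vis);
  vis.postvisitNode(node);
}

// Example visitor: records the visit order but does not descend into "body".
class OrderRecorder : public Visitor {
public:
  std::vector&amp;lt;std::string&gt; pre, post;
  bool visitNode(Node *n) override {
    pre.push_back(n-&gt;name);
    return n-&gt;name != "body";
  }
  void postvisitNode(Node *n) override { post.push_back(n-&gt;name); }
};
&lt;/code&gt;&lt;/pre&gt;
For a tree whose root &lt;tt&gt;func&lt;/tt&gt; has children &lt;tt&gt;params&lt;/tt&gt; and &lt;tt&gt;body&lt;/tt&gt;, the recorder's previsit order is func, params, body, and its postvisit order is params, func, since the walk never enters (or post-visits) body.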
&lt;/p&gt;&lt;p&gt;
Hopefully, this is enough to get you started with pork. In the next part, I will cover the AST nodes in more detail.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4719962143483267425?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4719962143483267425/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4719962143483267425' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4719962143483267425'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4719962143483267425'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/07/guide-to-pork-part-2.html' title='A guide to pork, part 2'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-8087115905333943366</id><published>2009-07-20T21:10:00.003-04:00</published><updated>2009-07-20T22:24:08.661-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pork'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>A guide to pork, part 1</title><content type='html'>As one of the first people to have actually used pork (apparently the third, after taras and cjones), I feel obliged to write a guide to building an automatic patch generator, if only to keep people from asking
the same question a fourth time. This guide also contains some ranting about my annoyances with pork (&lt;tt&gt;sm::string&lt;/tt&gt;, I'm looking at you). So, without further ado, I present &lt;b&gt;Part 1: It works!&lt;/b&gt;
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;A brief introduction&lt;/h3&gt;
Pork essentially consists of three main areas of API (listed in the order I discovered them): the patcher API, the C++ AST structure that comes straight from the parser, and the annotated APIs that make finding information more than a bit easier. There is also a sort of fourth API: utilities that partially replicate functionality found in the STL.
&lt;/p&gt;&lt;p&gt;
My original interest in pork came from an idea to rewrite libmime, which is roughly a hand-rolled C++-style class system written in C, into the equivalent C++ code. Such a rewrite would be at the upper end of difficulty for a normal shell, Python, or awk script: I need to combine classes, rewrite function prototypes, rename variables, and refactor globs of code like&lt;br /&gt;
&lt;tt&gt;return ((MimeObjectClass*)&amp;amp;MIME_SUPERCLASS)-&gt;initialize(object);&lt;/tt&gt;&lt;br /&gt;
into&lt;br/&gt;
&lt;tt&gt;return MimeLeaf::initialize();&lt;/tt&gt;
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 1: Building and running your tool&lt;/h3&gt;
The first step is to build pork. &lt;a href="https://developer.mozilla.org/en/Installing_Pork"&gt;Taras's guide&lt;/a&gt; will likely be more up-to-date than any instructions I give. Once pork is installed, you can plug your own tool into its build structure. I've personally handled this by creating a tools/ subdirectory with a very neat Makefile that automatically picks up the files to be compiled into the tools.
&lt;/p&gt;&lt;p&gt;
If you use the pork-barrel script, your tool will eventually be invoked as &lt;tt&gt;tool &amp;lt;args&amp;gt; &lt;i&gt;filename&lt;/i&gt;&lt;/tt&gt;. All pork-barrel does is run the tool invocations one at a time and merge the resulting patches at the end; you don't need to use it while you are first developing your tool (and I recommend you don't). The files it runs on are preprocessed files, generally with the extension &lt;tt&gt;i&lt;/tt&gt; or &lt;tt&gt;ii&lt;/tt&gt;. Invoking gcc with &lt;tt&gt;-save-temps&lt;/tt&gt; is a nice way of generating these files. You don't need mcpp unless you are concerned about code lurking in macros.
&lt;/p&gt;&lt;p&gt;
&lt;h3&gt;Step 1.1: Running the patcher&lt;/h3&gt;
Once your tool has processed its arguments, it reads the C++ files and patches them. Here is some sample code to do that, which I provide without comment (it's just boilerplate):
&lt;/p&gt;&lt;tt&gt;&lt;pre&gt;
#include "piglet.h"
#include "expr_visitor.h"
#include "patcher.h"

class &lt;i&gt;MainVisitor&lt;/i&gt;: public ExpressionVisitor {
public:
  &lt;i&gt;MainVisitor&lt;/i&gt;(Patcher &amp;p): patcher(p) {}
private:
  Patcher &amp;patcher;
};

int main(int argc, char **argv) {
  PigletParser parser;
  Patcher p;
  &lt;i&gt;MainVisitor&lt;/i&gt; visitor(p);
  for (int i = 1; i &lt; argc; i++) {
    TranslationUnit *unit = parser.getASTNoExc(argv[i]);
    unit-&gt;traverse(visitor);
  }
  return 0;
}
&lt;/pre&gt;&lt;/tt&gt;&lt;p&gt;
The necessary APIs for the utilities will eventually be covered in more detail. Unfortunately, it's late, and you now have a working, if idempotent, pork utility. Next time, I'll discuss the basics of &lt;tt&gt;Patcher&lt;/tt&gt; and &lt;tt&gt;ExpressionVisitor&lt;/tt&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-8087115905333943366?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/8087115905333943366/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=8087115905333943366' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8087115905333943366'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/8087115905333943366'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/07/guide-to-pork-part-1.html' title='A guide to pork, part 1'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6123301578903846557</id><published>2009-05-06T21:20:00.003-04:00</published><updated>2009-05-06T22:07:07.522-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='jshydra'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>jshydra news</title><content type='html'>Since I happen to have some downtime thanks to the short space between college and a 
summer job, and some spare time with four wisdom teeth newly missing from my mouth, I have returned to work on jshydra. I am in the midst of simplifying the build structure for new or non-Mozilla hackers, and I am also patching up one of my main scripts, the documentation-association script.
&lt;/p&gt;&lt;p&gt;
On Monday, May 11, 2009, at 1:00 PM EDT (10:00 AM PDT, or 1700 UTC), I will be holding a "learn jshydra" day in the #static IRC channel for the benefit of users. You may of course ask me questions any time I'm actually on IRC (nick: jcranmer or derivatives thereof).
&lt;/p&gt;&lt;p&gt;
I hope to finish up the build structure by then and to get a wiki article on the topic started as well. A feature request list has already started, including the ability to extract JS from an HTML file. If you have other requests, communicate them to me somehow, and I'll stick them somewhere I can remember them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6123301578903846557?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6123301578903846557/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6123301578903846557' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6123301578903846557'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6123301578903846557'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/05/jshydra-news.html' title='jshydra news'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-5774176159213655613</id><published>2009-05-02T21:13:00.003-04:00</published><updated>2009-05-02T21:55:24.089-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='profiling'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Is this faster?</title><content type='html'>Last Wednesday morning (in the EDT timezone that I go to school in), I
took my final in Probability &amp; Statistics. It was a moment that I didn't think much about until this morning, when, in one of the newsgroups I frequent, there was a discussion about how much faster one option was than another. This post came up:
&lt;/p&gt;&lt;p&gt;
&lt;q&gt;A good idea would be to measure each iteration separately and then discard outliers by e.g. discarding those that exceed the abs diff between the mean and the stddev.&lt;/q&gt;
&lt;/p&gt;&lt;p&gt;
I leave it as an exercise for the reader to figure out why that's &lt;strong&gt;not&lt;/strong&gt; a good idea. In any case, the last unit of the class dealt with the fun part of statistics: actually evaluating whether observed data is statistically significant. The math involved isn't too bad, assuming someone hands you a cumulative distribution function for the &lt;a href="http://en.wikipedia.org/wiki/Student%27s_t-distribution"&gt;t-distribution&lt;/a&gt; (here is &lt;a href="http://www.google.com/codesearch/p?hl=en#5jiG3hroW30/trunk/boost.mod/src/boost/math/distributions/students_t.hpp&amp;l=106"&gt;Boost's code&lt;/a&gt;), but it's a bit convoluted to write out here, so read &lt;a href="http://en.wikipedia.org/wiki/Statistical_hypothesis_testing"&gt;Wikipedia's page&lt;/a&gt;. The correct tests are actually the two-sample t-tests.
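&lt;/p&gt;&lt;p&gt;
For a concrete starting point, here is a minimal C++ sketch of one such two-sample test, Welch's t-test (the variant that does not assume the two samples have equal variances). It computes the t statistic and the Welch-Satterthwaite degrees of freedom from two sets of timings; the function and struct names are hypothetical, and turning &lt;tt&gt;t&lt;/tt&gt; and &lt;tt&gt;df&lt;/tt&gt; into a p-value still requires a t-distribution CDF such as the Boost code linked above.
&lt;pre&gt;&lt;code&gt;
#include &amp;lt;cmath&gt;
#include &amp;lt;vector&gt;

// Result of Welch's two-sample t-test (hypothetical helper type).
struct WelchResult {
  double t;   // test statistic
  double df;  // Welch-Satterthwaite degrees of freedom
};

// Sample mean.
static double mean(const std::vector&amp;lt;double&gt; &amp;xs) {
  double sum = 0.0;
  for (double x : xs) sum += x;
  return sum / xs.size();
}

// Unbiased sample variance (divides by n - 1).
static double sampleVariance(const std::vector&amp;lt;double&gt; &amp;xs, double m) {
  double ss = 0.0;
  for (double x : xs) ss += (x - m) * (x - m);
  return ss / (xs.size() - 1);
}

// Each sample needs at least two values, and the two samples must not
// both have zero variance.
WelchResult welchTTest(const std::vector&amp;lt;double&gt; &amp;a,
                       const std::vector&amp;lt;double&gt; &amp;b) {
  double ma = mean(a), mb = mean(b);
  double va = sampleVariance(a, ma) / a.size();  // squared standard error
  double vb = sampleVariance(b, mb) / b.size();
  WelchResult r;
  r.t = (ma - mb) / std::sqrt(va + vb);
  r.df = (va + vb) * (va + vb) /
         (va * va / (a.size() - 1) + vb * vb / (b.size() - 1));
  return r;
}
&lt;/code&gt;&lt;/pre&gt;
For example, the samples {1, 2, 3} and {4, 5, 6} give t = -3/sqrt(2/3), roughly -3.67, with 4 degrees of freedom.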
&lt;/p&gt;&lt;p&gt;
But I got to thinking. One thing I've wanted to see for a while is a profiling extension, one that would run a JS code snippet multiple times and produce various reports on the profiling runs. Wouldn't it be nice if such an extension could compare two runs and determine whether there was a statistically significant difference between them?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-5774176159213655613?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/5774176159213655613/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=5774176159213655613' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5774176159213655613'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/5774176159213655613'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/05/is-this-faster.html' title='Is this faster?'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6957237552959258785</id><published>2009-04-01T08:28:00.002-04:00</published><updated>2009-04-01T08:37:13.676-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Bugday: help us unconfirm new bugs!</title><content type='html'>Yesterday, Reed
helpfully enabled the NEW -&gt; UNCO transition in &lt;a href="https://bugzilla.mozilla.org"&gt;Mozilla's bugzilla installation&lt;/a&gt;, which lets us reverse some mistaken decisions made in the past. Looking at the &lt;a href="https://bugzilla.mozilla.org/report.cgi?x_axis_field=bug_status&amp;y_axis_field=component&amp;z_axis_field=product&amp;query_format=report-table&amp;short_desc_type=allwordssubstr&amp;short_desc=&amp;product=Core&amp;product=MailNews+Core&amp;product=Thunderbird&amp;long_desc_type=substring&amp;long_desc=&amp;bug_file_loc_type=allwordssubstr&amp;bug_file_loc=&amp;status_whiteboard_type=allwordssubstr&amp;status_whiteboard=&amp;keywords_type=allwords&amp;keywords=&amp;bug_status=UNCONFIRMED&amp;bug_status=NEW&amp;bug_status=ASSIGNED&amp;bug_status=REOPENED&amp;resolution=---&amp;emailassigned_to1=1&amp;emailtype1=exact&amp;email1=&amp;emailassigned_to2=1&amp;emailreporter2=1&amp;emailqa_contact2=1&amp;emailtype2=exact&amp;email2=&amp;bugidtype=include&amp;bug_id=&amp;votes=&amp;chfieldfrom=&amp;chfieldto=Now&amp;chfieldvalue=&amp;format=table&amp;action=wrap&amp;field0-0-0=product&amp;type0-0-0=equals&amp;value0-0-0=Thunderbird&amp;field0-0-1=product&amp;type0-0-1=equals&amp;value0-0-1=MailNews+Core&amp;field0-0-2=component&amp;type0-0-2=anywords&amp;value0-0-2=News+IMAP+POP+SMTP"&gt;current list of bugs&lt;/a&gt;, we have way too many NEW bugs to be healthy (five thousand three hundred eighty-five, to be precise, at the time of writing).
&lt;/p&gt;&lt;p&gt;
So please help us triage our components by finding NEW bugs that do not have clear steps to reproduce and unconfirming them. While you're at it, you might want to try to do &lt;a href="https://wiki.mozilla.org/Thunderbird:Bug_Triage"&gt;other triage&lt;/a&gt; on the bugs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6957237552959258785?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6957237552959258785/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6957237552959258785' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6957237552959258785'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6957237552959258785'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/04/bugday-help-us-unconfirm-new-bugs.html' title='Bugday: help us unconfirm new bugs!'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6526677157871035521</id><published>2009-02-18T19:52:00.013-05:00</published><updated>2009-02-25T16:18:45.561-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>A database proposal</title><content type='html'>A sore point in the mailnews code is the message 
database code. Most of the backend ends up being a single amorphous blob with fuzzy boundaries and heavy coupling between a server's implementation and the database. Factor in that the database code (like most of mailnews, but worse) is often poorly documented, and that the documentation that does exist is sometimes just plain wrong, and you get a recipe for disaster. There's also the issue, probably the most important one, that the database has grown past its original intent.
&lt;/p&gt;&lt;p&gt;
Originally, the message database was merely a cache of the information needed for display. Since it was only a cache, it didn't matter much if it was blown away and reparsed from the original source. Well, there's the little matter of the ability to set arbitrary properties that aren't reflected in the mbox source. This capability, among other features, has made the message database a ticking time bomb. And, in essence, the bomb recently exploded when I attempted to make it usable from JavaScript.
&lt;/p&gt;&lt;p&gt;
So, in the mid-to-long term, the database needs serious fixing, not the incremental band-aids applied all over it. It needs a real design that fits its modern and future purposes. Naturally, the first question is what a database needs to do. The salient points follow:
&lt;/p&gt;&lt;dl&gt;
&lt;dt&gt;The database is really multiple, distinct components.&lt;/dt&gt;
&lt;dd&gt;One part of the database is a relational database: metadata for a message that is not reflected in the message itself. If an extension wants to keep information on certain message properties (like how junk-y it is), it would stick the information in this relational database. The second part of the database is a combination of the message store and cache. This part is what the database used to be: a store of information easily recoverable from the message store. Note that this part of the database needs to be at least partially coupled with the message store; more on this later.&lt;/dd&gt;

&lt;dt&gt;The relational database is separate from the cache database.&lt;/dt&gt;
&lt;dt&gt;The cache database exposes a unique, persistent identifier for messages.&lt;/dt&gt;
&lt;dd&gt;While the cache database can, and probably will, be regenerated often, the relational database is permanent. Indeed, the cache database blowing itself away should not require the relational database to do anything. At present, the cache uses ephemeral IDs as unique identifiers: IMAP UIDs (destroyed if UIDVALIDITY is sent), mbox file offsets (destroyed if the mbox changes), or NNTP article keys (can of worms there &lt;a href="#dbnote1" id="dbref1"&gt;[1]&lt;/a&gt;). In my proposal, the cache would map these IDs to more persistent ones. Yes, it makes reconstructing the database more difficult, but it makes everyone else's lives easier.&lt;/dd&gt;

&lt;dt&gt;The cache database may be rebuilt if the store is newer.&lt;/dt&gt;
&lt;dt&gt;The cache database rebuild should be incremental.&lt;/dt&gt;
&lt;dt&gt;The relational database should never be automatically rebuilt.&lt;/dt&gt;
&lt;dd&gt;One of the main problems as it stands is the rebuild of the cache database. It has generally been assumed that rebuilding the database would never lose information, but the database has become the only store of some information. I am not certain of the technical feasibility, but in general there is no need to reparse a 2GB mbox file if you compact out a few messages. Even in an IMAP UIDVALIDITY event, I expect that not all of the UIDs would change. Incrementalism would make the database more usable during painful rebuilds, but, naturally, it would require more careful coding.&lt;/dd&gt;

&lt;dt&gt;The cache database's rebuild policy is caller-configurable.&lt;/dt&gt;
&lt;dd&gt;By this, I mean that the cache database will be accessible via one of three calls: an immediate call that gets the database, even if it is invalid; a call that gets the database but spawns an asynchronous rebuild event &lt;a href="#dbnote2" id="dbref2"&gt;[2]&lt;/a&gt;; and a call that blocks until the database finishes rebuilding, if necessary. Asynchronous rebuilds would require the database to be thread-safe, but I expect that the future of the database already includes asynchronous calls. At the very least, thread safety might help in some cases where we've run into thread-safety issues in the past (such as import).&lt;/dd&gt;

&lt;dt&gt;The cache database has access to the message store.&lt;/dt&gt;
&lt;dt&gt;There are three types of store: local-only, local caching remote, and remote-only.&lt;/dt&gt;
&lt;dt&gt;The folder can only access the store through the database.&lt;/dt&gt;
&lt;dd&gt;These points are probably the ones I'm least comfortable with, but I think they're necessary. In the long term, pluggable message stores and the store-specific mechanisms of the database mean that the cache database needs intimate access to the store. Having explicit interfaces for the message store should allow us to avoid having to subclass &lt;tt&gt;nsIMsgDatabase&lt;/tt&gt; for the different storage types. Forcing the folder to go through the database to reach the store should help cut down the bloat on &lt;tt&gt;nsIMsgFolder&lt;/tt&gt;. On the other hand, it would probably make the code do a lot more round-tripping, which could lead to more leaks.&lt;/dd&gt;

&lt;dt&gt;The cache database is per-account, not per-folder.&lt;/dt&gt;
&lt;dd&gt;A cleverly-designed per-account store could alleviate some problems. It would make working with cross-posted messages easier, and could, in principle, use less disk if you move messages between folders on the same local stores or caches. Copied messages could point to the same messages (in the spirit of filesystem hard links), so long as we don't permit local message modification.&lt;/dd&gt;
&lt;/dl&gt;&lt;p&gt;
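To make the caller-configurable rebuild policy from the list above concrete, here is a hypothetical C++ sketch; it is not an existing mailnews interface. &lt;tt&gt;CacheDb&lt;/tt&gt;, &lt;tt&gt;AccessPolicy&lt;/tt&gt;, and the rebuild callback are invented for illustration, a plain &lt;tt&gt;int&lt;/tt&gt; stands in for the cached message data, and &lt;tt&gt;std::thread&lt;/tt&gt; stands in for whatever threading machinery a real implementation would use. The mutex around every access is the thread-safety cost that asynchronous rebuilds impose.
&lt;pre&gt;&lt;code&gt;
#include &amp;lt;condition_variable&gt;
#include &amp;lt;functional&gt;
#include &amp;lt;mutex&gt;
#include &amp;lt;thread&gt;
#include &amp;lt;utility&gt;

// The three access calls from the proposal, as a hypothetical enum.
enum class AccessPolicy {
  Immediate,        // return the database even if it is invalid
  AsyncRebuild,     // return it now, but start a background rebuild
  BlockUntilValid   // wait until any needed rebuild has finished
};

// Hypothetical cache database; an int stands in for the cached data.
class CacheDb {
public:
  explicit CacheDb(std::function&amp;lt;int()&gt; rebuild)
      : rebuild_(std::move(rebuild)) {}
  ~CacheDb() {
    if (worker_.joinable()) worker_.join();
  }

  // E.g. called when the underlying message store changes.
  void invalidate() {
    std::lock_guard&amp;lt;std::mutex&gt; lock(mu_);
    valid_ = false;
  }

  bool isValid() {
    std::lock_guard&amp;lt;std::mutex&gt; lock(mu_);
    return valid_;
  }

  // Returns a snapshot of the cached data, honoring the caller's policy.
  int get(AccessPolicy policy) {
    std::unique_lock&amp;lt;std::mutex&gt; lock(mu_);
    if (policy == AccessPolicy::Immediate || valid_) return data_;
    startRebuildLocked();
    if (policy == AccessPolicy::AsyncRebuild) return data_;  // may be stale
    cv_.wait(lock, [this] { return valid_; });
    return data_;
  }

private:
  // Caller must hold mu_. Starts at most one rebuild at a time.
  void startRebuildLocked() {
    if (rebuilding_) return;
    rebuilding_ = true;
    // Any previous worker has already left its locked section.
    if (worker_.joinable()) worker_.join();
    worker_ = std::thread([this] {
      int fresh = rebuild_();  // the expensive reparse runs without the lock
      std::lock_guard&amp;lt;std::mutex&gt; lock(mu_);
      // A real design would also handle invalidation during a rebuild.
      data_ = fresh;
      valid_ = true;
      rebuilding_ = false;
      cv_.notify_all();
    });
  }

  std::function&amp;lt;int()&gt; rebuild_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::thread worker_;
  int data_ = 0;
  bool valid_ = false;
  bool rebuilding_ = false;
};
&lt;/code&gt;&lt;/pre&gt;
&lt;/p&gt;&lt;p&gt;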
If I haven't missed anything, that is how I see a future database system. Obviously, implementation would not be easy; I expect it would take at least a year, or even two, of concentrated work to produce something close to these ideals. There are incremental steps toward this future, but in many cases they seem to me to be towering steps (for example, introducing the database to the store, or making it usable from different threads). In any case, I'm interested in hearing feedback on this proposal.&lt;/p&gt;
&lt;p&gt;&lt;a href="#dbref1" id="dbnote1"&gt;[1]&lt;/a&gt;In recent months, some of Giganews' binary newsgroups had begun to press distressingly close to the signed-32 bit limit, which raised the question of what to do. One proposal would have been to reset the ids or maybe wrap around. A news client should be able to handle this case if practical to do so, IMO.&lt;/p&gt;
&lt;p&gt;&lt;a href="#dbref2" id="dbnote2"&gt;[2]&lt;/a&gt;I expect that this method would use the invalid database, although it could be implemented by having the various method calls block until validity. Since it's possible that a caller could use the blocking-get-database call as well, this approach makes significantly less sense to me.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6526677157871035521?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6526677157871035521/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6526677157871035521' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6526677157871035521'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6526677157871035521'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/02/database-proposal.html' title='A database proposal'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-2622023119198208082</id><published>2009-01-25T15:36:00.006-05:00</published><updated>2011-04-11T08:08:28.460-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='jshydra'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>JSHydra</title><content type='html'>Over the past week or 
so, in between my myriad projects, I managed to find time to revive a partially completed tool called jshydra (properly capitalized as "JSHydra"). JSHydra, in both name and code, is derived from &lt;a href="https://developer.mozilla.org/en/Dehydra"&gt;Dehydra&lt;/a&gt; and &lt;a href="http://developer.mozilla.org/en/Treehydra"&gt;Treehydra&lt;/a&gt;: it is a static analyzer for JavaScript. I first decided that such a tool was needed when I was in the middle of &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=413260"&gt;a large refactoring of address book code&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
The source code for jshydra can be found &lt;a href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/jshydra"&gt;at my Mozilla user hg repo&lt;/a&gt;. Presently, it requires some hackery and file modifications just to build the system. I use SpiderMonkey's internal APIs (the parsing APIs, to be exact), which is where most of the hackery comes into play.
&lt;/p&gt;&lt;p&gt;
For the time being, I'm probably going to put aside doing more work on jshydra, as I have &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=444093"&gt;more&lt;/a&gt; &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=311774"&gt;pressing&lt;/a&gt; &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=118665"&gt;work&lt;/a&gt; on my plate at this time. If you have any questions, feel free to contact me via IRC (handle: &lt;tt&gt;jcranmer&lt;/tt&gt; (I should be in &lt;tt&gt;&lt;a href="irc://irc.mozilla.org/mmgc"&gt;#mmgc&lt;/a&gt;&lt;/tt&gt;)) or via my email address as listed in bugzilla.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-2622023119198208082?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/2622023119198208082/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=2622023119198208082' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/2622023119198208082'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/2622023119198208082'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/01/jshydra.html' title='JSHydra'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4088541694476266576</id><published>2009-01-20T00:12:00.002-05:00</published><updated>2009-01-20T00:17:27.962-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinions'/><category scheme='http://www.blogger.com/atom/ns#' term='politics'/><title type='text'>My other life</title><content type='html'>I have just opened up &lt;a href="http://jtcranmer.blogspot.com/"&gt;a new blog&lt;/a&gt; covering those topics that are not oriented towards my work with Mozilla or other similar technically-oriented projects.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4088541694476266576?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4088541694476266576/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4088541694476266576' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4088541694476266576'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4088541694476266576'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/01/my-other-life.html' title='My other life'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6764056945751113166</id><published>2009-01-13T22:27:00.009-05:00</published><updated>2009-01-14T08:44:50.290-05:00</updated><title type='text'>Eight things you didn't know about me</title><content type='html'>Apparently it's my turn for this meme, according to &lt;a href="http://blog.mozilla.com/tglek/2009/01/13/seven-things-you-may-not-know-about-me/"&gt;taras&lt;/a&gt;. Oh yes, and &lt;a href="http://home.kairo.at/blog/2009-01/53v3n_7h1ng5"&gt;KaiRo&lt;/a&gt; too. The rules:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Link back to your original tagger and list the rules in your post.&lt;/li&gt;
&lt;li&gt;Share seven facts about yourself.&lt;/li&gt;
&lt;li&gt;Tag some (seven?) people by leaving names and links to their blogs.&lt;/li&gt;
&lt;li&gt;Let them know they’ve been tagged.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
So let's get this show on the road! Eight things you didn't know about me (why eight? Because &lt;a href="http://thatguywiththeglasses.com/videolinks/thatguywiththeglasses/nostalgia-critic"&gt;I like to go one step further&lt;/a&gt;):
&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;When I was tested around first grade, my auditory processing skills were rated as "retarded." Possibly related to this fact, I was apparently deaf or near-deaf at the same time, and I'm still rather poor with auditory processing to this day.&lt;/li&gt;
&lt;li&gt;I nearly flunked 7th grade biology but got the second-highest grade in my 9th grade biology class, I believe averaging over 100%. Indeed, I probably still recall large hunks of biology although biology is my least favorite core science and I wish I could forget it.&lt;/li&gt;
&lt;li&gt;Despite having a high school GPA of 3.99 (where the best grade is a 4.0, and AP credit adds 0.5 to the final grade), the first time in my entire life that I achieved a perfect 4.0 was my first semester of college. My grades have often been littered with scores that just barely make the next grade level, such as a 93.52 in one year of French, or an 89.50 in Psychology.&lt;/li&gt;
&lt;li&gt;Nearly all of my programming knowledge is self-taught. The only languages I learned as part of a class were (in order) Python, FORTRAN, and Smalltalk. All the other languages (roughly speaking, Java, C/C++, x86 assembly, PHP, bash scripting, awk, sed, etc.) I taught myself before any course that used them could teach them to us. One day, for Senior Switch Day (a high school tradition), I taught a Computer Architecture lecture covering, in order, how vtables are implemented in gcc, how to use &lt;tt&gt;setjmp&lt;/tt&gt; and &lt;tt&gt;longjmp&lt;/tt&gt; for exception handling, and how to write a stack walker in C. And yes, I do know that I'm going to hell for using &lt;tt&gt;setjmp&lt;/tt&gt; and &lt;tt&gt;longjmp&lt;/tt&gt;.&lt;/li&gt;
&lt;li&gt;In 7th grade, I &lt;a href="http://www.mathleague.com/reports/2002_03/grade678/VA_7.HTM"&gt;tied for 10th place&lt;/a&gt; in the state for the Mathematics League. I personally know 11 of the other people who did at least as well. One of those people I actually referred to earlier in this post. And it's not Haitao.&lt;/li&gt;
&lt;li&gt;I was once in a fight in 7th grade. The fight happened like this. Step 1: I was knocked into a bench in a locker room. Step 2: I was knocked over this bench into the lockers in the same locker room. Fight over. I don't think I actually bled, but it hurt, IIRC.&lt;/li&gt;
&lt;li&gt;For my senior prom, I went with four girls, for a definition of "went" more exclusive than "went in the same dinner/limo group" (there were actually around 12 or so girls in our group). Incidentally, all four girls were Asian, which is fairly representative of the ethnic makeup of my high school (or at least among those I knew).&lt;/li&gt;
&lt;li&gt;Because of my speech impediment (or so I've heard), I've been fairly often asked if I'm British. Technically speaking, it's not a lisp&amp;mdash;my problems are with r's, not s's. I wonder if people here think I sound British (I certainly don't)?&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;
Time for poking people:
&lt;/p&gt;&lt;dl&gt;
&lt;dt&gt;&lt;a href="http://www.rumblingedge.com/"&gt;Gary Kwong&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;Fellow intern and bug triager, I wonder what he's done.&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://sid0.blogspot.com/"&gt;Siddharth Agarwal&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;His blog has been pretty empty.&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://blog.davidbienvenu.org/"&gt;David Bienvenu&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;He needs the excuse to blog.&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://www.squarefree.com/"&gt;Jesse Ruderman&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;For finding so many typos in my blog postings.&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://shawnwilsher.com/"&gt;Shawn Wilsher&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;&lt;tt&gt;mozStorage&lt;/tt&gt; writer, and one with whom I've spent some time trying to debug a TB news problem&amp;hellip;&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://viper.haque.net/~timeless/blog"&gt;timeless&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;Because&amp;hellip;he's awesome?&lt;/dd&gt;
&lt;dt&gt;Saul (whose blog is inaccessible)&lt;/dt&gt;&lt;dd&gt;Because it doesn't take a computer scientist to know the problem with pyramid schemes.&lt;/dd&gt;
&lt;/dl&gt;&lt;p&gt;
Hey look, I got through the entire post without mentioning demorkification! Oh cra&amp;mdash;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6764056945751113166?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6764056945751113166/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6764056945751113166' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6764056945751113166'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6764056945751113166'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/01/seven-things-you-didnt-know-about-me.html' title='Eight things you didn&apos;t know about me'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-890558068755443590</id><published>2009-01-13T13:53:00.007-05:00</published><updated>2009-01-13T18:20:21.131-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinions'/><title type='text'>A problem with feature requests</title><content type='html'>Among my many secret lives, I personally follow several mailing lists, newsgroups, and technology updates for reasons beyond the scope of this posting. I follow the WHATWG mailing list somewhat closely.
One issue that has been raging quite ferociously recently is whether or not to include RDFa, eRDF, RDF, or whatever in HTML 5.
&lt;/p&gt;&lt;p&gt;
Many of my readers, especially those for whom RDF is immediately associated with cumbersome, inflexible APIs that programs should be rid of, are probably wondering why anyone would want to include such a specification in HTML 5. Don't worry, you're not alone. Even Ian Hickson, the editor of the HTML 5 spec, is having trouble figuring out why. To wit:
&lt;/p&gt;
&lt;blockquote&gt;
One of the outstanding issues for HTML5 is the question of whether HTML5 should solve the problem that RDFa solves, e.g. by embedding RDFa straight into HTML5, or by some other method.

Before I can determine whether we should solve this problem, and before I can evaluate proposals for solving this problem, I need to learn what the problem is.
&lt;/blockquote&gt;
&lt;p&gt;
A bit of context: earlier in 2008, a thread about RDF ran to well in excess of 100 messages. Being a good editor, Hixie asked in this message what problem it was actually trying to solve. The response? Seventy-three email messages, most of which promptly ignored the question. Many sub-discussions centered on topics such as the quality of search engines and how it should be implemented. The question of why anyone should use it got lost in the wind.
&lt;/p&gt;&lt;p&gt;
And to me, this signifies a problem. One of the most important questions when deciding whether or not to include a feature is &lt;i&gt;why&lt;/i&gt;. And it seems to me that this is the question proponents of new features ponder least. The answer is often some variant of "it's obvious" or a description of what the feature does. The latter is like answering the question "Why do you want to put a door in the wall here?" with "Because we can have quicker access to the other side of the wall." At first glance, it's acceptable, but in reality, it doesn't justify the feature (why, then, do you want quicker access?).
&lt;/p&gt;&lt;p&gt;
I've seen this in other places, too. The one that aggravates me most is the proposal to add closures to Java. Several issues tend to get conflated in the controversy, so here's some background. There are three proposals for closures: CICE (usually with ARM included), which isn't really a closures proposal so much as a "let's decrease anonymous inner class verbosity" proposal; FCM, a "lite" version of closures which basically just makes methods first-class objects; and BGGA, which is full-blown closures support. By now, when people refer to a closures proposal, they mean the full BGGA proposal; FCM has all but disappeared, and CICE+ARM is generally only mentioned as a possible compromise.
&lt;/p&gt;&lt;p&gt;
The BGGA proposal can be viewed as roughly comprising three parts. The first is the idea of function pointers; the second is the ability to convert a function pointer to an interface, so long as the interface has a single method and the function signatures match; and the third is (in a nutshell) the ability to create control structures. I have seen a lot of valid arguments on both sides for the first two portions, but the third portion still mystifies me. Yet it is this third part that truly differentiates BGGA from FCM, and it is there that almost all of BGGA's complexity comes into play.
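&lt;/p&gt;&lt;p&gt;
The first two parts are easier to picture with a minimal Java sketch. It is not BGGA syntax. It uses the lambdas and method references that Java 8 shipped years later, and the &lt;tt&gt;IntOp&lt;/tt&gt; interface and &lt;tt&gt;fold&lt;/tt&gt; helper are hypothetical names invented for illustration. The sketch contrasts the anonymous inner class verbosity that CICE targeted with a function value being converted to a single-method interface whose signature matches.
&lt;/p&gt;&lt;pre&gt;
```java
public class ClosureConversionSketch {
    // Hypothetical single-method interface: the shape a function value can be converted to.
    interface IntOp {
        int apply(int a, int b);
    }

    // Combines all values with op, starting from start.
    static int fold(int[] values, int start, IntOp op) {
        int acc = start;
        for (int v : values) {
            acc = op.apply(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] xs = {3, 1, 4, 1, 5};

        // Pre-Java-8 style: an anonymous inner class (the verbosity CICE aimed to cut).
        IntOp addAnon = new IntOp() {
            @Override
            public int apply(int a, int b) {
                return a + b;
            }
        };

        // Java 8 lambda: accepted as an IntOp because the signatures match.
        IntOp addLambda = (a, b) -> a + b;

        // Method reference: an existing method used as a first-class value.
        IntOp max = Math::max;

        System.out.println(fold(xs, 0, addAnon));             // 14
        System.out.println(fold(xs, 0, addLambda));           // 14
        System.out.println(fold(xs, Integer.MIN_VALUE, max)); // 5
    }
}
```
&lt;/pre&gt;&lt;p&gt;
As far as &lt;tt&gt;fold&lt;/tt&gt; is concerned, all three values are interchangeable. Nothing in the sketch touches the third, control-structure part of BGGA.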
&lt;/p&gt;&lt;p&gt;
There's just one catch: why do you want or need the ability to create control structures? One control structure almost invariably pops out: the &lt;tt&gt;with&lt;/tt&gt; construct, or (in other words) syntactic sugar for a &lt;tt&gt;try { ... } finally { ... }&lt;/tt&gt; block. This is where the ARM portion of the CICE proposal comes in: it would add the only construct that any sizable number of people actually want. Okay then, that's one; how about another? And therein lies the problem. It's hard to come up with other examples. Proponents always mention &lt;tt&gt;with&lt;/tt&gt; and assume that everything else is self-evident. For something that is definitely going to greatly increase the complexity and difficulty of programming (e.g., &lt;tt&gt;return 5&lt;/tt&gt; would, in some cases, behave differently than &lt;tt&gt;return 5;&lt;/tt&gt;), you had better have more than one use case, especially when that one is easily written out by hand.
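&lt;/p&gt;&lt;p&gt;
For comparison, the one use case that keeps coming up can be sketched in Java. The sketch shows the hand-written &lt;tt&gt;try { ... } finally { ... }&lt;/tt&gt; form next to try-with-resources, the form ARM eventually took in Java 7 (well after this post). The &lt;tt&gt;Resource&lt;/tt&gt; class is a hypothetical stand-in that only logs when it is opened, used, and closed.
&lt;/p&gt;&lt;pre&gt;
```java
public class ArmSketch {
    // Hypothetical resource that records each step in a shared log.
    static class Resource implements AutoCloseable {
        private final StringBuilder log;

        Resource(StringBuilder log) {
            this.log = log;
            log.append("open;");
        }

        void use() {
            log.append("use;");
        }

        @Override
        public void close() { // narrower than AutoCloseable.close(): throws nothing
            log.append("close;");
        }
    }

    // What people write by hand today: try { ... } finally { ... }.
    static String manual() {
        StringBuilder log = new StringBuilder();
        Resource r = new Resource(log);
        try {
            r.use();
        } finally {
            r.close();
        }
        return log.toString();
    }

    // Java 7 try-with-resources: close() is called automatically.
    static String withArm() {
        StringBuilder log = new StringBuilder();
        try (Resource r = new Resource(log)) {
            r.use();
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(manual());  // open;use;close;
        System.out.println(withArm()); // open;use;close;
    }
}
```
&lt;/pre&gt;&lt;p&gt;
Both methods produce the same log, and that is the point. The sugar saves a few lines, but it does not require a general mechanism for defining new control structures.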
&lt;/p&gt;&lt;p&gt;
Another problem is underestimating what it takes to include a feature. &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=229686#c41"&gt;A 16 MB extension for a 6 MB program&lt;/a&gt; is completely untenable, so let's include it in the 6 MB program! How about introducing multiple tokens as a CSS unit token (think about it for a moment...)?
&lt;/p&gt;&lt;p&gt;
The final thing that irks me about feature requests is the importance people attach to them. The news that Java 7 will definitely not contain closures (adding a complex, controversial feature to a specification already behind schedule isn't exactly tenable) seems to have been met with proclamations that Java is dead or will die as a result. I somehow can't imagine millions of Java programmers suddenly switching because Java won't have closures; indeed, no programming language ever in the Top 5 has had closures.
&lt;/p&gt;&lt;p&gt;
Similarly, the announcement that JavaScript had been disabled in trunk Thunderbird builds for email was met with a few vocal opponents complaining (I don't know of any non-Mozilla product that ever &lt;i&gt;had&lt;/i&gt; support for JavaScript in email to begin with). And most of my readers are no doubt aware of the furor that &lt;a href="http://bugzilla.mozilla.org/show_bug.cgi?id=18574"&gt;removing MNG support&lt;/a&gt; caused.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-890558068755443590?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/890558068755443590/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=890558068755443590' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/890558068755443590'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/890558068755443590'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/01/problem-with-feature-requests.html' title='A problem with feature requests'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-1223015794484469288</id><published>2009-01-04T23:19:00.002-05:00</published><updated>2009-01-04T23:40:59.885-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category 
scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>News submodule roadmap</title><content type='html'>It's the new year! As the owner of the NNTP submodule of Mozilla Mailnews, let me lay out a rough roadmap of what I aim to get done on the submodule for some time to come.
&lt;/p&gt;&lt;p&gt;
For Thunderbird 3 (and SeaMonkey 2.0, of course), the &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=16913"&gt;ability to filter on any header&lt;/a&gt; is probably in and of itself sufficient to make people happy. The only other change I think could reasonably make it into 3.0 is &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=311774"&gt;making get messages actually get messages&lt;/a&gt;. More substantive changes will have to wait.
&lt;/p&gt;&lt;p&gt;
For 3.next, I think the most important thing to tackle is news URIs, as they are badly broken for most practical purposes. Of high importance is ensuring that &lt;tt&gt;news://&amp;lt;server&amp;gt;/message-id&lt;/tt&gt; and &lt;tt&gt;news://&amp;lt;server&amp;gt;/group&lt;/tt&gt; are usable via external protocol handlers and the command line. Making server-less URIs work would also be delightful.
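&lt;/p&gt;&lt;p&gt;
Both forms depend on one distinction: the news URI scheme (later written up as RFC 5538) tells a message-id apart from a newsgroup name by the presence of an &lt;tt&gt;@&lt;/tt&gt;, which only message-ids contain. The Java below is a rough sketch of that distinction. It is a hypothetical helper, not Thunderbird's actual URI handling. It uses &lt;tt&gt;java.net.URI&lt;/tt&gt; to split off the optional server and then classifies the rest. The message-id is made up.
&lt;/p&gt;&lt;pre&gt;
```java
import java.net.URI;

public class NewsUriSketch {
    enum Kind { GROUP, MESSAGE_ID }

    // server is null for the server-less form (news:...).
    record Target(String server, Kind kind, String value) {}

    static Target classify(String text) {
        URI uri = URI.create(text); // throws IllegalArgumentException if malformed
        if (!"news".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not a news URI: " + text);
        }
        String server = null;
        String rest;
        if (uri.isOpaque()) {
            // Server-less form, e.g. news:mozilla.dev.apps.thunderbird
            rest = uri.getSchemeSpecificPart();
        } else {
            // news://server/target: everything after the leading slash is the target
            server = uri.getHost();
            String path = uri.getPath();
            rest = path.startsWith("/") ? path.substring(1) : path;
        }
        // A message-id always contains '@'; a newsgroup name never does.
        Kind kind = rest.contains("@") ? Kind.MESSAGE_ID : Kind.GROUP;
        return new Target(server, kind, rest);
    }

    public static void main(String[] args) {
        System.out.println(classify("news://news.mozilla.org/mozilla.dev.apps.thunderbird"));
        System.out.println(classify("news://news.mozilla.org/abc123@example.com"));
        System.out.println(classify("news:abc123@example.com"));
    }
}
```
&lt;/pre&gt;&lt;p&gt;
A real handler would also need to deal with percent-encoded characters and with choosing a server for the server-less form, which this sketch leaves to the caller.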
&lt;/p&gt;&lt;p&gt;
Also important to me is more protocol support: &lt;tt&gt;AUTHINFO SASL&lt;/tt&gt;, &lt;tt&gt;STARTTLS&lt;/tt&gt;, and &lt;tt&gt;CAPABILITIES&lt;/tt&gt;, in roughly that order. While I'm at it, I'll also clean up some of the cruft surrounding &lt;tt&gt;nsNNTPProtocol&lt;/tt&gt;. I would also like to have much more comprehensive tests for 3.next.
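&lt;/p&gt;&lt;p&gt;
For context, &lt;tt&gt;CAPABILITIES&lt;/tt&gt; (RFC 3977) is how a server advertises extensions such as &lt;tt&gt;STARTTLS&lt;/tt&gt; (RFC 4642) and its SASL mechanisms (RFC 4643). The Java sketch below shows how a client might look up one capability. It assumes the response has already been read into an array of lines. The sample response is made up, labels are matched case-insensitively here, and this is not how &lt;tt&gt;nsNNTPProtocol&lt;/tt&gt; does it.
&lt;/p&gt;&lt;pre&gt;
```java
import java.util.Arrays;

public class NntpCapabilitiesSketch {
    /**
     * Looks up one capability label in a CAPABILITIES response.
     * Returns its arguments (possibly empty), or null if it is not listed.
     */
    static String[] lookup(String[] response, String label) {
        if (response.length == 0 || !response[0].startsWith("101")) {
            throw new IllegalArgumentException("expected a 101 capability list");
        }
        boolean statusLine = true;
        for (String raw : response) {
            if (statusLine) { // skip the "101 ..." status line
                statusLine = false;
                continue;
            }
            if (raw.equals(".")) {
                break; // a lone dot ends the multi-line block
            }
            // Undo dot-stuffing of lines that begin with a period.
            String line = raw.startsWith("..") ? raw.substring(1) : raw;
            String[] tokens = line.trim().split("\\s+");
            if (tokens[0].equalsIgnoreCase(label)) {
                return Arrays.copyOfRange(tokens, 1, tokens.length);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] response = {
            "101 Capability list:",
            "VERSION 2",
            "READER",
            "STARTTLS",
            "AUTHINFO SASL",
            "SASL PLAIN DIGEST-MD5",
            "."
        };
        System.out.println(lookup(response, "STARTTLS") != null);      // true
        System.out.println(Arrays.toString(lookup(response, "SASL"))); // [PLAIN, DIGEST-MD5]
        System.out.println(lookup(response, "MODE-READER") != null);   // false
    }
}
```
&lt;/pre&gt;&lt;p&gt;
The lookup returns an empty array for a bare label like &lt;tt&gt;STARTTLS&lt;/tt&gt; and null for an absent one. Callers can therefore tell "advertised without arguments" apart from "not advertised".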
&lt;/p&gt;&lt;p&gt;
What's not on my plate is more binaries support. Specifically, I'll not be working on &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=60981"&gt;combine-and-decode&lt;/a&gt; (as useful as it is in general). Offline love I think will have to wait for 3.next.next before I can find some time to work on it.
&lt;/p&gt;&lt;p&gt;
Finally, I'd like to get all of the news bugs neatly triaged into categories. As of me writing this message, I count some 150 bugs in the &lt;a href="https://bugzilla.mozilla.org/buglist.cgi?action=wrap&amp;bug_file_loc=&amp;bug_file_loc_type=allwordssubstr&amp;bug_id=&amp;bugidtype=include&amp;chfieldfrom=&amp;chfieldto=Now&amp;chfieldvalue=&amp;email1=&amp;email2=&amp;emailassigned_to1=1&amp;emailassigned_to2=1&amp;emailqa_contact2=1&amp;emailreporter2=1&amp;emailtype1=exact&amp;emailtype2=exact&amp;field0-0-0=product&amp;field0-0-1=product&amp;field0-0-2=component&amp;keywords=&amp;keywords_type=allwords&amp;long_desc=&amp;long_desc_type=substring&amp;resolution=---&amp;short_desc=&amp;short_desc_type=allwordssubstr&amp;status_whiteboard=&amp;status_whiteboard_type=allwordssubstr&amp;type0-0-0=equals&amp;type0-0-1=equals&amp;type0-0-2=anywords&amp;value0-0-0=Thunderbird&amp;value0-0-1=MailNews%20Core&amp;value0-0-2=News%20IMAP%20POP%20SMTP&amp;votes=&amp;product=Core&amp;product=MailNews%20Core&amp;product=Thunderbird&amp;component=Networking%3A%20News&amp;bug_status=UNCONFIRMED&amp;bug_status=NEW&amp;bug_status=ASSIGNED&amp;bug_status=REOPENED"&gt;Networking: News component&lt;/a&gt;, and I'm sure there are a few more bugs floating around in other components that I haven't found. I'd like to remove dupes and categorize all of these into specific categories (at least locally). And, more importantly, remove the cruft of these bugs. 
Many of the NEW bugs should be UNCO because no one ever really confirmed them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-1223015794484469288?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/1223015794484469288/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=1223015794484469288' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1223015794484469288'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1223015794484469288'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2009/01/news-submodule-roadmap.html' title='News submodule roadmap'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6847138694771769288</id><published>2008-12-08T14:53:00.002-05:00</published><updated>2008-12-08T15:47:11.870-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bug413260'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>A new address book public repo</title><content type='html'>Veteran readers may recall &lt;a href="http://quetzalcoatal.blogspot.com/2008/08/great-address-book-rewrite-now-on-your.html"&gt;my old posting&lt;/a&gt; of the &lt;a 
href="http://hg.mozilla.org/users/Pidgeot18_gmail.com/ab_rewrite/"&gt;address book rewrite repository&lt;/a&gt;. My original idea did not turn out so well: at one point I had about 5 heads because of review comments and simultaneous patches. Not to mention that the history (though fun to look at) was a pain to follow.
&lt;/p&gt;&lt;p&gt;
I therefore created a brand new version of the repo. In the new one, I'm structuring things a bit differently. Rather than trying to keep one branch going, I'm going to have multiple named branches.
&lt;/p&gt;&lt;p&gt;
The first one is &lt;tt&gt;default&lt;/tt&gt;, the comm-central import. The tip of the &lt;tt&gt;default&lt;/tt&gt; branch is the tip of comm-central as of when I imported it.
&lt;/p&gt;&lt;p&gt;
One set of branches consists of the experimental let's-add-a-new-address-book-type branches. I already have a prototype SQL AB development branch, &lt;tt&gt;sqlab&lt;/tt&gt;. When I start experimenting with Evolution support, there will be an &lt;tt&gt;evolutionab&lt;/tt&gt; branch. And so forth and so on. If you have custom AB types that you want to publish, email me a bundle and you'll get your very own named branch in this repo (for as long as it exists).
&lt;/p&gt;&lt;p&gt;
Another set of branches covers the bugs I'll be working on. These branches will tangle around each other as I fix one bug or another, since some bugs depend on other bugs. I'll try to keep a &lt;tt&gt;last-idl&lt;/tt&gt; tag up to date such that any changes on a branch after the last branch point or this tag (whichever comes later) do not change any IDL files.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-6847138694771769288?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/6847138694771769288/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=6847138694771769288' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6847138694771769288'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/6847138694771769288'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2008/12/new-address-book-public-repo.html' title='A new address book public repo'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-7174607290916154283</id><published>2008-12-05T19:34:00.003-05:00</published><updated>2008-12-05T20:16:41.878-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bug413260'/><category scheme='http://www.blogger.com/atom/ns#' term='mailnews'/><category
scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>Debugging palmsync</title><content type='html'>Also known as "An Exercise in Frustration and Self-Torture." As part of preliminary work for &lt;a href="http://quetzalcoatal.blogspot.com/2008/10/mailing-list-sanity.html"&gt;mailing list sanity&lt;/a&gt;, I need to make some changes to palmsync. It's not like I haven't &lt;a href="http://hg.mozilla.org/comm-central/rev/b9967a04b9bd"&gt;changed this code before&lt;/a&gt;, but the work I need to do here is more than a simple &lt;tt&gt;s/a/b/g&lt;/tt&gt; command. So, time to fire up the ol' Windows partition and test.
&lt;/p&gt;&lt;p&gt;
Pain point number one: getting the necessary pieces to work. First you need the Palm CDK. After getting that (complete with a throw-away email account), you need something to test the other end of palmsync: the PDA. Back at home, I have a monstrously old PDA whose functioning I would not guarantee. Fortunately, there's a nice Palm emulator which I can install.
&lt;/p&gt;&lt;p&gt;
Now I just needed to Hotsync. That was another non-trivial task, as my computer does not have two serial ports between which I could slap a cable (it doesn't even have one serial port&amp;hellip;). Once I figured that part out, another problem came up: palmsync failed to register. "No problem," I say, "I'll just reinstall it." And it still fails. It turns out that the necessary registry key used %ProgramFiles%. Don't you just love the registry?
&lt;/p&gt;&lt;p&gt;
That's all set up, so I click the sync button and&amp;hellip; it fails. That wouldn't seem to be a problem, as I've already had to diagnose many errors to get to this point. Except I seem to have hit the critical unsupported limit:
&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;I'm using a custom-built version of a program (that's really not saying much, though)&lt;/li&gt;
&lt;li&gt;Said program is built with a technically unsupported compiler (VC 7.1)&amp;hellip;&lt;/li&gt;
&lt;li&gt;&amp;hellip;which is unsupported on my OS (Windows Vista)&amp;hellip;&lt;/li&gt;
&lt;li&gt;&amp;hellip;using an old SDK (Windows 2000)&amp;hellip;&lt;/li&gt;
&lt;li&gt;&amp;hellip;and it's a 32-bit build on a 64-bit machine.&lt;/li&gt;
&lt;li&gt;The emulator officially supports neither Vista&amp;hellip;&lt;/li&gt;
&lt;li&gt;&amp;hellip;nor a 64-bit machine.&lt;/li&gt;
&lt;li&gt;Hotsync doesn't officially support 64-bit machines either.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;
Debugging this proved to be a pain. The conduit logs didn't seem to work. Trying to get the VS debugger's nose in there proved futile until I did a world rebuild, which takes 5.5 hours on my system (blame ext3). I finally got the debugger in, only to hit a barrier thanks to COM. Some more work managed to shove the debugger in there and finally allowed me to pinpoint the error. Sorry, errors. Suffice it to say that it's a mess in there.
&lt;/p&gt;&lt;p&gt;
So how did I get palmsync under the debugger? First, let me describe my VS setup: I have mailnews as a project in VS (an NMake project, to be precise), set up so that clicking the build button actually builds in the mailnews directory (thanks to some custom hacking around msys). Based on some external documentation for debugging Palm conduits, I created a new configuration (called "Palmsync," imaginative little me) whose command ran the hotsync program with &lt;tt&gt;-ic&lt;/tt&gt; as its sole argument. Starting that under the debugger the normal way in VS allowed me to break on Palm's side of the conduit. On the Mozilla side, I fire up an instance of TB under the debugger as usual, initiate the hotsync, and then break and debug as normal.
&lt;/p&gt;&lt;p&gt;
Modulo the fact that palmsync is in reality extremely fragile and apparently too buggy to support my nearly-empty testing profile, it works now. Now it's time to fix some more old bugs just to reach the point where I can start on the preliminary work for mailing list sanity.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-7174607290916154283?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/7174607290916154283/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=7174607290916154283' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7174607290916154283'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/7174607290916154283'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2008/12/debugging-palmsync.html' title='Debugging palmsync'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-4493614335813835890</id><published>2008-11-28T21:33:00.002-05:00</published><updated>2008-11-28T21:45:52.661-05:00</updated><title type='text'>ABC (meme)</title><content type='html'>Everyone's doing this, it seems.
&lt;/p&gt;&lt;dl&gt;
&lt;dt&gt;A&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/ms879772.aspx"&gt;ActiveSync&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;B&lt;/dt&gt;&lt;dd&gt;&lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=71728"&gt;Bug 71728&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;C&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://www.w3.org/TR/CSS21/"&gt;CSS 2.1 spec&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;D&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://developer.mozilla.org/en/Pork"&gt;Pork (on MDC)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;E&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://en.wikipedia.org/wiki/Hiragana"&gt;Hiragana (Wikipedia)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;F&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://failblog.org/"&gt;FailBlog&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;G&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://groups.google.com/group/comp.lang.java.programmer"&gt;&lt;tt&gt;comp.lang.java.programmer&lt;/tt&gt;&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;H&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://www.onemanga.com/Hunter_X_Hunter/"&gt;Hunter × Hunter (manga)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;I&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://icanhascheezburger.com/"&gt;I can has cheezburger?&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;J&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html"&gt;Java Language Specification, Third Edition&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;K&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://openjdk.java.net/"&gt;OpenJDK&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;L&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://icanhascheezburger.com/"&gt;I can has cheezburger?&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;M&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://machinepolitics.pbwiki.com/"&gt;[ School resource ]&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;N&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://www.nextbus.com/predictor/stopSelector.jsp"&gt;[ School resource ]&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;O&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://www.onemanga.com/Bleach/"&gt;Bleach (manga)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;P&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://planet.mozilla.org/"&gt;Planet Mozilla&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;Q&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://questionablecontent.net/"&gt;Questionable Content (webcomic)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;R&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://www.rawranime.com/"&gt;[ Anime video site ]&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;S&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://slashdot.org/"&gt;/. (Slashdot)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;T&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://tsquare.gatech.edu/"&gt;[ School resource ]&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;U&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://www.theregister.co.uk/odds/bofh/"&gt;BOFH (web... column)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;V&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://www.vgcats.com/comics/?strip_id=212"&gt;VGCats (webcomic)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;W&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://worsethanfailure.com/"&gt;Worse Than Failure &lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;X&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://xkcd.com/"&gt;XKCD (webcomic)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;Y&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://www.youtube.com/watch?v=3yfNTIsUnZw&amp;feature=PlayList&amp;p=2F2D10CE3CDB8AA7&amp;index=40"&gt;YouTube (specifically, part of a walkthrough for Golden Sun)&lt;/a&gt;&lt;/dd&gt;
&lt;dt&gt;Z&lt;/dt&gt;&lt;dd&gt;&lt;a href="http://www.escapistmagazine.com/videos/view/zero-punctuation"&gt;Zero Punctuation (web video columnist)&lt;/a&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5947958124349996271-4493614335813835890?l=quetzalcoatal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/4493614335813835890/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=4493614335813835890' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4493614335813835890'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/4493614335813835890'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2008/11/abc-meme.html' title='ABC (meme)'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-1212906113373494715</id><published>2008-11-28T21:21:00.002-05:00</published><updated>2008-11-28T21:31:44.204-05:00</updated><title type='text'>A public service announcement</title><content type='html'>If you are a programmer who is writing code that will be released to the world as open source, this announcement is for you.
&lt;/p&gt;&lt;p&gt;
If your code will be seen by the world at large, one of your first tasks should be to write documentation. Document all functions as soon as you write them (before is also helpful). Provide examples of how to use your code as soon as you finish a module (or earlier, if possible). Do not wait until your 5.0 release. Do not wait until your 1.0 release. Do not even wait until your 0.5 release. Do it as you write your code. The sooner, the better.
&lt;/p&gt;&lt;p&gt;
Users of your code will thank you profusely if you manage to provide comprehensive documentation along with the binaries of your reference builds. &lt;b&gt;DO NOT&lt;/b&gt; force them to scour source code or bug developers in IRC channels to figure out a simple task like "get all cards from an address book" or "how do I write a synchronization conduit?"</content><link rel='replies' type='application/atom+xml' href='http://quetzalcoatal.blogspot.com/feeds/1212906113373494715/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5947958124349996271&amp;postID=1212906113373494715' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1212906113373494715'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5947958124349996271/posts/default/1212906113373494715'/><link rel='alternate' type='text/html' href='http://quetzalcoatal.blogspot.com/2008/11/public-service-announcement.html' title='A public service announcement'/><author><name>Joshua Cranmer</name><uri>http://www.blogger.com/profile/02760318962075959780</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5947958124349996271.post-6153986062665280069</id><published>2008-11-05T20:22:00.001-05:00</published><updated>2008-11-05T20:23:38.562-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinions'/><category scheme='http://www.blogger.com/atom/ns#' term='politics'/><title type='text'>Why I bemoan 
Tuesday</title><content type='html'>On Tuesday, November 4, the United States held its presidential and congressional elections. As I am sure most of my readers know by now, the outcome was the election of the Democratic nominee, Obama, along with expanded Democratic majorities in the House and Senate (although probably not filibuster-proof ones). Many greeted this news with elation; I do not count myself as one of those people. Let me explain why.
&lt;/p&gt;&lt;p&gt;
First and foremost, I dislike the results because the United States missed an opportune moment to counteract its polarization. A Republican presidential victory would have left the government divided, a prospect I feel would be ideal. In lieu of a true multiparty system, we have two major parties that will, by necessity, tend to stake out opposite sides of issues and, furthermore, stake them out at opposite ends of the spectrum. Giving the entire government to a single party (worse, to one with majorities strong enough to pass bills over the objections of a few mild dissenters) would have the effect of hindering debate.
&lt;/p&gt;&lt;p&gt;
During the Constitutional Convention, one issue the drafters considered was the tyranny of the majority. One of the Federalist Papers, &lt;a href="http://en.wikisource.org/wiki/The_Federalist_Papers/No._10"&gt;No. 10&lt;/a&gt;, dealt with this topic, mostly by arguing that a larger republic would contain more diverse factions. However, the two-party system (among other factors) tends to dilute the protective effect of size, so another mechanism should be present.
&lt;/p&gt;&lt;p&gt;
This second mechanism is the tortuous process of lawmaking. A common criticism of Congress is that it is slow to act. Yet why should slowness be a bad thing? Expediting a bill means cutting something out, and almost all of the time it takes to pass a bill is spent on debate. Improving Congress's reaction time would therefore mean debating less and trusting that the bill is correct as written. We don't &lt;a href="http://en.wikipedia.org/wiki/Sarbanes-Oxley_Act"&gt;regret&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/National_Industrial_Recovery_Act"&gt;any&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Department_of_Homeland_Security"&gt;rushed&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/PATRIOT_Act"&gt;actions&lt;/a&gt;, do we?
&lt;/p&gt;&lt;p&gt;
A divided government would force moderation: the Democrats would not have enough votes to override a Republican veto, so a bill would have to be amenable to both parties (and therefore moderate in effect) to pass. With a Democratic president, extreme positions could more easily make their way into law without broad debate. The large gains in Congress give the Democrats more ability to cut off dissent; thankfully, though, the Senate l
