Monday, June 7, 2010

Developing new account types, Part 3: Updating folders (part 3)

This series of blog posts discusses the creation of a new account type implemented in JavaScript. Over the course of these blogs, I use the development of my Web Forums extension to explain the necessary actions in creating new account types. I hope to add a new post once every two weeks (I cannot guarantee it, though).

This blog post is a continuation of my previous two posts; this step has been broken up into multiple segments to reduce the amount of text one has to read in a single sitting. The current step is to actually implement the folder update.

Only new messages

Now that we know how to add messages to the database, we need to figure out which messages actually need to be downloaded.

It should go without saying that the first thing to do in this function is to check whether you actually need to update the folder. In my extension, I need to download the front page of the specific board and check whether the topic list matches what is stored in the database.

For now, at least, I can rely on the forum telling me about the number of replies in a thread (one less than the number of total messages), as this is shown in the thread index of a forum. What I do is grab the reply count that I've seen and subtract that from the number that is listed to get the number of new messages I need to download. Then I need only to look at the last few messages to add them to the database.
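A minimal sketch of that subtraction, assuming a hypothetical helper named countNewMessages. The handling of a thread that has never been seen (everything, including the first post, is new) and the clamp to zero are assumptions for illustration, not necessarily what the extension does:

```javascript
// Hypothetical helper: how many messages of a thread still need downloading.
// listedReplies: the reply count shown in the forum's thread index.
// seenReplies: the reply count recorded on the last update, or undefined
//              if the thread has never been seen.
function countNewMessages(listedReplies, seenReplies) {
  if (seenReplies === undefined) {
    // Never seen: every reply plus the opening post is new.
    return listedReplies + 1;
  }
  // Clamp at zero in case the forum now lists fewer replies than before
  // (for example, after posts were deleted).
  return Math.max(0, listedReplies - seenReplies);
}
```

For example, a thread listed with 5 replies of which 3 were already seen yields 2 new messages, which are the last 2 posts in the thread.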

At this point, I have two main issues to worry about. First, I am working with paginated return results. That means I actually need to load multiple documents. Second, I am not getting a list of messages, but a list of threads; therefore, I need a database that is associated with threads [8].

The database I use is a simple JSON object that exists for each folder, and so far only has a mapping of threads to the reply count that I've seen; I may give it more in later iterations of this extension.
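As a rough illustration, such a per-folder object might look like the sketch below. The function names and the thread keys (viewtopic.php?t=...) are hypothetical, and how the JSON text is read from and written to disk is left out:

```javascript
// Hypothetical per-folder thread database: thread key -> reply count seen.

// Turn the stored JSON text back into an object. Empty or missing text
// means nothing has been seen in this folder yet.
function parseThreadDb(jsonText) {
  return jsonText ? JSON.parse(jsonText) : {};
}

// Record the reply count seen for one thread.
function markSeen(db, threadKey, replyCount) {
  db[threadKey] = replyCount;
  return db;
}

// Produce the JSON text to store alongside the folder.
function serializeThreadDb(db) {
  return JSON.stringify(db);
}

// Usage: load the old state, record a newly seen thread, save again.
const db = parseThreadDb('{"viewtopic.php?t=101": 4}');
markSeen(db, "viewtopic.php?t=205", 0);
const saved = serializeThreadDb(db);
```

Keeping the object this flat makes it easy to add further per-thread fields later by turning each value into an object of its own.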

Pagination is where the trickery in implementation comes in. First, I need to look at the thread index for new messages; if I have seen all of the messages in the last thread on the page, I can stop looking at new pages. Otherwise, I have to grab the next page and continue recursing. Note that it is possible to hit a thread that I've fully seen and still have threads I've not seen: sticky threads can be infrequently updated yet still come first in my list of threads.
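The stopping rule can be sketched as follows. This is a simplified, synchronous version that uses a loop where the text describes recursion; a real implementation would load each index page asynchronously. The getIndexPage callback and the { id, replies } record shape are assumptions standing in for the page scraping:

```javascript
// Hypothetical sketch: walk the thread index page by page.
// getIndexPage(pageNo) returns an array of { id, replies } records scraped
// from that index page, or an empty array once the pages run out.
// seenReplies maps thread id -> reply count already in the database.
function findUpdatedThreads(getIndexPage, seenReplies) {
  const updated = [];
  for (let pageNo = 0; ; pageNo++) {
    const threads = getIndexPage(pageNo);
    if (!threads || threads.length === 0) break; // no more pages

    // Check every thread on the page: a fully seen thread (such as a
    // sticky) does not mean the threads after it are seen too.
    for (const thread of threads) {
      const seen = seenReplies[thread.id];
      const unseen = seen === undefined
        ? thread.replies + 1 // never seen: the first post is new as well
        : thread.replies - seen;
      if (unseen > 0) updated.push({ id: thread.id, unseen });
    }

    // Only the last thread on the page decides whether to keep going.
    const last = threads[threads.length - 1];
    const lastSeen = seenReplies[last.id];
    if (lastSeen !== undefined && lastSeen >= last.replies) break;
  }
  return updated;
}
```

Deciding on the last thread of each page, rather than on the first fully seen thread, is what keeps a stale sticky at the top of the index from ending the scan early.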

The other issue is when loading threads. The link I end up scraping is to the first page of messages for that thread, which I may already have seen. So I need to skip over pages until I find the first page that has new messages. For now, I'm doing this naïvely by actually loading each page and counting the number of posts rather than trying to deconstruct URLs and calculate where to load. I then need to look at the last set of posts, not the first set, so I calculate the start position and read forwards. Since I'm using querySelectorAll, I get an indexable list of results (a NodeList), so I don't have to iterate over and throw out the posts I've already seen; I can just start in the middle when iterating.
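A sketch of the naive page-skipping and the start-position calculation, under stated assumptions: getThreadPage is a hypothetical callback that returns the posts of one thread page in order, with plain arrays standing in for the indexable list that querySelectorAll returns, and the loading is shown synchronously for brevity:

```javascript
// Hypothetical sketch: collect only the posts not yet in the database.
// getThreadPage(pageNo) returns the posts on that page of the thread, in
// order, or an empty array once the pages run out.
// alreadySeen is the number of messages of this thread already stored.
function collectNewPosts(getThreadPage, alreadySeen) {
  const fresh = [];
  let postsBefore = 0; // posts counted on earlier pages
  for (let pageNo = 0; ; pageNo++) {
    const posts = getThreadPage(pageNo);
    if (!posts || posts.length === 0) break; // no more pages

    // Index of the first unseen post on this page. On a fully seen page
    // this is at or past posts.length, so the inner loop does nothing.
    const start = Math.max(0, alreadySeen - postsBefore);
    for (let i = start; i < posts.length; i++) {
      fresh.push(posts[i]);
    }
    postsBefore += posts.length;
  }
  return fresh;
}
```

With 5 messages already stored and pages of 3, 3, and 1 posts, the first page is skipped entirely, reading starts at index 2 on the second page, and the third page is read from its beginning.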

Once all of that is implemented, we can then put everything together to make a proper implementation of updateFolder, the function we started implementing a few pages ago. The end result is that, when all is said and done, you can load up the thread pane (the last column is the number of messages in the thread):

[Image: The thread pane after implementation]

By comparison, here is an equivalent view of the forum that I loaded this from:

[Image: The equivalent forum list]

Now, I wish to ask you, which user interface would you rather use to view the forum?

Some notes for implementors: be prepared to delete your msf files over and over again. I would recommend tackling the individual components in this order: first build a message, then your protocol object (I found it easier to test when the running tasks were already known to be working), and then start work on tying it all into the database. Leave issues like threading for after the basic stuff is laid out, then tackle determining which messages are new if it's not implicit in what you do (i.e., you don't have a "get new messages" query you can readily use). Pagination should be last: everything is easier to test if you only have a small number of messages you really need to test.

I apologize for the excessive length of this step; this happened to be pretty much the first step where most of the necessary technology had to be used. The next step is to actually be able to display the messages in our database, which should be shorter.

Notes

  8. Kent James and I are both working on developing new account type extensions (he is working on an Exchange connector and I on the extension for this blog series); both of us have identified the narrow-mindedness of the database as an issue. It is therefore possible that my workaround here will not be necessary in the next few versions.

3 comments:

Sevenspade said...

"Now, I wish to ask you, which user interface would you rather use to view the forum?"

This sounds like heresy against the web as a platform.

Anonymous said...

The new account type looks newsgroup-like, with every post as a single message. I am too lazy to navigate from message to message, which is why I prefer an inline view with many messages in a view where I just have to scroll (but having this account type to see this and automatically seeing new posts would be great).

Anonymous said...

I assume relying on the number of replies in a thread is only a "good enough for a first example" shortcut, right? Most non-trivial forums I inhabit have issues with spammers, leading to frequent post deletions. At the very least you should find the post-id for each post in a thread, and add any posts with unseen ids to the database.

Also, most forums nowadays have feeds for each of their threads and a global one for the forum; I'd look into using those primarily, only falling back to "scraping" this information if feeds aren't available (or don't go back long enough).