One of the uses I had for my original statistics collection was to argue that NNTP support in Thunderbird still matters. During an IRC discussion, it was pointed out that August is a poor month for logging, since it is traditionally a vacation month. The data for October, the most recent month for which I have data, shows approximately 720,000 messages posted that month, which suggests that August is indeed a poor month for gauging volume.
Have the statistics changed much? Google Groups and Thunderbird are both within 0.2 percentage points of the shares I calculated last time (44.02% and 12.3%, respectively). Further down the list, things change: Outlook Express had 8.98%, followed by Forte Agent at 8.86%. Live Mail had 2.83% and MT-NewsWatcher had 2.51%. The tail is also longer than before, at 20.52%.
As my new server has a longer retention time, I no longer wish to use the same script as before. My next goal is to log every header of every message posted this year, so that I can collect more information without having to list everything I need in advance, particularly information useful for determining the use of mail-to-news gateways and for gauging the spamminess of messages. I have lots of ideas for possible analyses of the data, but first I want usable data.
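To make the idea concrete, here is a minimal sketch of how client shares could be tallied once full headers are logged. It is not the script I actually use: the function names `client_of` and `client_shares` are made up for illustration, the choice of `User-Agent`, `X-Newsreader`, and `X-Mailer` as the headers to inspect is an assumption, and it does not group different versions of the same client together, which a real analysis would need to do.

```python
from collections import Counter
from email.parser import HeaderParser


def client_of(raw_headers: str) -> str:
    """Return the client string for one message's raw header block.

    Assumption: the client is named in User-Agent, X-Newsreader, or
    X-Mailer, checked in that order. No version grouping is attempted.
    """
    msg = HeaderParser().parsestr(raw_headers)
    agent = (
        msg.get("User-Agent")
        or msg.get("X-Newsreader")
        or msg.get("X-Mailer")
    )
    return agent.strip() if agent else "unknown"


def client_shares(messages):
    """Map each client string to its percentage of identified messages.

    Messages without a recognizable client header are dropped before
    the percentages are computed.
    """
    counts = Counter(client_of(m) for m in messages)
    counts.pop("unknown", None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Because every header is kept in the log, changing which headers count as the client, or adding a check for gateway or spam-related headers, only means editing `client_of` and rerunning over the stored data rather than collecting it again.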
2 comments:
Does the Google Groups number include the mass amount of spam that gets posted to Usenet via GG?
Yes. Except for the removal of unknown user agent strings and a certain post-happy bot, these numbers are merely the raw counts of servers.