Searching Apple’s Mailing Lists

You know about Apple’s mailing lists, right? No? Wellll…okay, I forgive you. In fact, I’m putting helpful links at the bottom of this post, so you can go check them out for yourself (or have an easy way to get back to them, if you are already in the know).

Signing up for these lists today is a good start. But what about searching the past archives to make sure a question you want to ask hasn’t already been asked? (You are always going to do that, right? No? This time, I don’t forgive you.)

The trouble is, Apple’s Web-based mailing list search is often excruciatingly slow, and doesn’t work very well anyway. If you want detailed, customizable searches, it seems to me a good technique is to cut out the middleman: download all the archive files yourself, and then search them with your own text editor. BBEdit and CodeWarrior, for example, can do directory searches and grep searches. Or you could use Apple’s content indexing and search from the Finder itself. Now, with circa 56,000 files in the current cocoa-dev archive, for example, this won’t be lightning fast until we have quantum hard drives and G9 processors, but it’s not that much slower than Apple’s site, and my impression is the search will be much more accurate.
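If you don't have a grep-capable editor handy, the same kind of directory search can be sketched in a few lines of Python. This is only an illustration, assuming the archives have been saved as plain-text files somewhere under one directory; the function name `search_archive` is made up for this example and is not part of any script mentioned in this post.

```python
import os
import re


def search_archive(root, pattern, ignore_case=True):
    """Yield (path, line_number, line) for each line under root that matches pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Old mail archives mix encodings, so decode leniently
            # instead of failing on the first odd byte.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")
```

Looping over something like `search_archive("cocoa-dev", r"NSTableView.*drag")` and printing each result gives roughly what a directory grep in a text editor would show. Every file is read on every search, so expect it to be about as slow as the editor approach.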

So how do you download all these files? I seem to recall, at some point in the past, making the herculean effort to download all the files by hand. This was when the archives were stored as a single file per day. I went to each archive Web page and saved each link to a file. One after another after another. Less than pleasant. Now, the archives appear to be stored as a single file per post, 30+ files for busy days!

A better solution involves an application you already have on your system, and a little help.

If you ask most Unix-heads what to use to download links from a Web page, they’ll mention wget. Mac OS X doesn’t ship with wget, but it has something similar: curl. To find out more about curl, you can go to its official Web site at http://curl.haxx.se/.

One thing you can find there is helpful sample scripts. The one we want is getlinks.pl, which extracts all the links from an HTML page.
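To give a feel for what "extracting all the links" involves, here is a minimal Python sketch of the idea. It is not a port of getlinks.pl and makes no claim about how that script works internally; the names `LinkCollector` and `extract_links` are hypothetical, and the page is assumed to use ordinary `<a href="...">` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones so they can be fetched.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Resolving each link against the page's own URL matters here, because archive index pages usually link to individual posts with relative paths.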

So we’re done? Not quite. We still need to point getlinks.pl at the right Web pages. For that, I (with gratefully accepted help from Dan Shiovitz and Gunther Schmidl) wrote my own Perl script, applelist.pl. Put both scripts in the directory where you want the downloaded mailing lists to go (be sure to rename getlinks.pl first, since it downloads as “getlinks.txt”). Then run applelist.pl from that directory, passing the names of the lists you want to download as command-line arguments.

For large lists like cocoa-dev and carbon-development, the download process will take 4+ hours on a broadband connection, or at least that was my experience, and will take up circa 250 MB of disk space.

Note that the scripts are written to be run repeatedly: they check whether a file has already been downloaded before fetching it again. Have a look to see for yourself how it works.
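The check-before-download idea is simple enough to sketch in Python. This is an illustration of the technique only, not the code in applelist.pl. The names `download_missing` and `fetch_url` are invented for the example, and it assumes the last path component of each URL makes a usable local file name.

```python
import os
import urllib.request


def fetch_url(url):
    """Default fetcher: return the raw bytes at url."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_missing(urls, dest_dir, fetch=fetch_url):
    """Download only the URLs whose local copy does not exist yet."""
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    for url in urls:
        name = url.rstrip("/").rsplit("/", 1)[-1]
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # already fetched on an earlier run
        data = fetch(url)
        # Write under a temporary name first, so an interrupted run
        # never leaves a half-written file that looks complete.
        partial = target + ".part"
        with open(partial, "wb") as handle:
            handle.write(data)
        os.replace(partial, target)
        downloaded.append(target)
    return downloaded
```

Because existing files are skipped, a run that dies three hours in can simply be started again, and a later run picks up only the posts added since the last one.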

Some parting thoughts:
– Yes, this solution is not for everyone, especially people with dial-up Internet connections, little need for repeated mailing list searches, or little patience.
– The script could probably use some improvement and augmentation. Feel free to improve it yourself, since it’s in the public domain.
– One improvement would be to strip out all the email header text when saving the files, and merge the files together into single-day, single-week, or single-month files. This would probably improve search speed quite a bit, at the expense of longer first-time downloading.
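
That last improvement might look something like the following Python sketch. It is untested against the real archives and rests on a few assumptions: each downloaded file holds exactly one message in standard email format (headers, a blank line, then the body), and each message has a parseable Date header. The function names, the `==== ... ====` separator line, and the choice to keep the Subject in it are all mine.

```python
import email
import os
from collections import defaultdict
from email.utils import parsedate_to_datetime


def body_text(message):
    """Return the message body without headers, flattening multipart posts."""
    if message.is_multipart():
        return "\n".join(body_text(part) for part in message.get_payload())
    return message.get_payload()


def merge_by_month(src_dir, dest_dir):
    """Strip headers from each post in src_dir and write one file per month."""
    buckets = defaultdict(list)
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as handle:
            message = email.message_from_string(handle.read())
        try:
            month = parsedate_to_datetime(message["Date"]).strftime("%Y-%m")
        except (TypeError, ValueError):
            month = "undated"  # missing or unparseable Date header
        # Keep one separator line per post so hits can be traced to a file.
        separator = "==== %s: %s ====" % (name, message.get("Subject", ""))
        buckets[month].append(separator + "\n" + body_text(message))
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    for month, chunks in sorted(buckets.items()):
        target = os.path.join(dest_dir, month + ".txt")
        with open(target, "w", encoding="utf-8") as handle:
            handle.write("\n".join(chunks))
        written.append(target)
    return written
```

Searching a few dozen monthly files instead of tens of thousands of tiny ones should cut down on per-file overhead, though I haven't measured by how much.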

Enjoy!

All Apple Mailing Lists:
http://lists.apple.com/mailman/listinfo

carbon-development

Web Recent Threads:
http://lists.apple.com/mhonarc/carbon-development/threads.html
Search:
http://search.lists.apple.com/carbon-development
Text Archives:
http://lists.apple.com/archives/carbon-development/

cocoa-dev

Web Recent Threads:
http://lists.apple.com/mhonarc/cocoa-dev/threads.html
Search:
http://search.lists.apple.com/cocoa-dev
Text Archives:
http://lists.apple.com/archives/cocoa-dev/