Searching Apple’s Mailing Lists

You know about Apple’s mailing lists, right? No? Wellll…okay, I forgive you. In fact, I’m putting helpful links at the bottom of this post, so you can go check them out for yourself (or have an easy way to get back to them, if you are already in the know).

Signing up for these lists today is a good start. But what about searching the past archives to make sure a question you want to ask hasn’t already been asked? (You are always going to do that, right? No? This time, I don’t forgive you.)

The trouble is, Apple’s Web-based mailing list search is often excruciatingly slow, and doesn’t work very well anyway. If you want detailed, customizable searches, it seems to me a good technique is to cut out the middleman: download all the archive files yourself, and then search them with your own text editor. BBEdit and CodeWarrior, for example, can do directory searches and grep searches. Or you could use Apple’s content indexing and search from the Finder itself. Now, with circa 56,000 files in the current cocoa-dev archive, for example, this won’t be lightning fast until we have quantum hard drives and G9 processors, but it’s not that much slower than Apple’s site, and my impression is the search will be much more accurate.
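
If you would rather script the search than drive it from an editor, here is a minimal sketch in Python of a grep-style directory search. The function name search_archive and the assumption that each post is a plain text file somewhere under one directory are mine, not anything Apple provides.

```python
import re
from pathlib import Path


def search_archive(root, pattern, ignore_case=True):
    """Yield (path, line_number, line) for each line under root matching pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Old list posts come in mixed encodings, so replace undecodable
        # bytes instead of stopping the whole search on one bad file.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # "cocoa-dev" is a placeholder for wherever the archive was downloaded.
    for path, number, line in search_archive("cocoa-dev", r"NSTableView"):
        print(f"{path}:{number}: {line}")
```

This reads every file on every search, so it will be no faster than an editor's directory search; the point is only that the pattern and the output are entirely under your control.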

So how do you download all these files? I seem to recall, at some point in the past, making the herculean effort to download all the files by hand. This was when the archives were stored as a single file per day. I went to each archive Web page and saved each link to a file. One after another after another. Less than pleasant. Now, the archives appear to be stored as a single file per post, 30+ files for busy days!

A better solution involves an application you already have on your system, and a little help.

If you ask most Unix-heads what to use to download links from a Web page, they’ll mention wget. Mac OS X doesn’t have wget, but it has something similar: curl. To find out about curl, you can go to its official Web site, at http://curl.haxx.se/.

One thing you can find there is helpful sample scripts. The one we want is getlinks.pl, which extracts all the links from an HTML page.
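
To give a feel for what link extraction involves, here is a rough Python sketch of the same idea. It is not a port of getlinks.pl; the names LinkCollector and extract_links are made up for illustration, and it only looks at the href attribute of anchor tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Archive index pages tend to use relative links,
                # so turn each one into an absolute URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all anchor links found in html, as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Given the HTML of an index page and the URL it came from, extract_links returns a list of absolute URLs that a downloader can then walk through.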

So we’re done? Not quite. We still need to point getlinks.pl at the right Web pages. For that, I (with gratefully accepted help from Dan Shiovitz and Gunther Schmidl) wrote my own Perl script, applelist.pl. It and getlinks.pl (be sure to change the name of that script file, since it downloads as “getlinks.txt”) should be put in the directory where you want the downloaded mailing lists to go, and applelist.pl should be run from there with the lists you want to download as command-line arguments.

For large lists like cocoa-dev and carbon-development, the download took 4+ hours on a broadband connection in my experience, and the files take up circa 250 MB of disk space.

Note the scripts have been written to be run repeatedly: they will check for the existence of downloaded files before downloading them again. Have a look to see for yourself how it works.
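
The check-before-download idea can be sketched in a few lines of Python. This is an illustration of the technique, not a transcription of applelist.pl: the names download_missing and fetch_url are hypothetical, and writing to a ".part" file first is my own addition so that an interrupted run does not leave behind a truncated file that later runs would skip.

```python
import os
import urllib.request
from urllib.parse import urlparse


def fetch_url(url):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_missing(urls, dest_dir, fetch=fetch_url):
    """Download each URL into dest_dir unless a file of that name already exists.

    Returns the list of paths that were newly written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    for url in urls:
        name = os.path.basename(urlparse(url).path)
        if not name:
            continue  # a directory link, not a post
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # fetched on an earlier run, so skip it
        data = fetch(url)
        # Write under a temporary name, then rename, so a half-written
        # file is never mistaken for a finished download.
        partial = target + ".part"
        with open(partial, "wb") as handle:
            handle.write(data)
        os.replace(partial, target)
        downloaded.append(target)
    return downloaded
```

Running it a second time over the same list of URLs fetches only the posts that have appeared since the first run.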

Some parting thoughts:
– Yes, this solution is not for everyone, esp. people with dial-up Internet connections, little need for repeated mailing list searches, or little patience.
– The script could probably use some improvement and augmentation. Feel free to improve it yourself, since it’s in the public domain.
– One improvement would be to strip out all the email header text when saving the files, and merge the files together into single-day, single-week, or single-month files. This would probably improve search speed quite a bit, at the expense of longer first-time downloading.
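
For anyone who wants to try that last improvement, here is one possible starting point in Python. It assumes each downloaded post is a text file whose headers end at the first blank line, as in a standard email message; the names strip_headers and merge_posts and the divider text are invented for this sketch. Grouping posts by day, week, or month is left out because it depends on how the archive files are named.

```python
from pathlib import Path


def strip_headers(text):
    """Return the message body: everything after the first blank line.

    If there is no blank line, the text is returned unchanged, because
    there is no way to tell where the headers stop.
    """
    normalized = text.replace("\r\n", "\n")
    head, separator, body = normalized.partition("\n\n")
    return body if separator else normalized


def merge_posts(post_paths, merged_path, divider="\n-----\n"):
    """Write the header-stripped bodies of post_paths into one file.

    Returns the number of posts merged.
    """
    bodies = []
    for path in post_paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        bodies.append(strip_headers(text).strip())
    Path(merged_path).write_text(divider.join(bodies) + "\n", encoding="utf-8")
    return len(bodies)
```

Note that this throws away the subject, sender, and date along with everything else in the header, so you may prefer to keep a line or two of each header as a marker.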

Enjoy!

All Apple Mailing Lists:
http://lists.apple.com/mailman/listinfo

carbon-development

Web Recent Threads:
http://lists.apple.com/mhonarc/carbon-development/threads.html
Search:
http://search.lists.apple.com/carbon-development
Text Archives:
http://lists.apple.com/archives/carbon-development/

cocoa-dev

Web Recent Threads:
http://lists.apple.com/mhonarc/cocoa-dev/threads.html
Search:
http://search.lists.apple.com/cocoa-dev
Text Archives:
http://lists.apple.com/archives/cocoa-dev/

Exchange Files Gotcha

Summary: FSpExchangeFiles() will happily exchange two files even if one is already open for writing, which can lead to some bad behavior.

In your cool whiz-bang application, you’re implementing “Save As” with the following steps:

  • Save document data to a temporary file.
  • If a file already exists at the “Save As” location, swap the temporary file with the real file.
  • Delete the temporary file.

This works swimmingly — unless the real file is open for writing in another application. Let’s call that app “BusyBody”.

You’d think, in that case, you’d get an OS error when you attempt to swap files, wouldn’t you? The filesystem should prevent such access when a file’s open…shouldn’t it?

Turns out it doesn’t prevent such access. It will happily swap open-for-writing files all the live-long day. You won’t get an error until you get to the last step. Then, the OS will tell you the temp file is “busy” (fBsyErr), because, as far as it’s concerned, the temp file is open in BusyBody.

So, if your app is handling errors correctly, at that point it will tell the user “Can’t do that, file’s busy.” But the damage has already been done.

When the user tries to “Save As” again to the same file (because she’s an idiot, or she’s a QA tester), the save succeeds. Why? Since BusyBody thinks it’s got that temp file open, the real file is free and clear for writing. This might not be a problem, but if BusyBody attempts to save again, instead of saving where the user expects, it will save to an invisible temp file. Oops!

The easiest solution to this that I can see is that, before you attempt such a switch, you try opening the real file for writing. If there’s an error with that open step, stop and signal the user, before any harm is done. That works as it should!
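
The order of operations is the whole point, so here it is as a small Python sketch. Nothing in it is real File Manager API: fs is a hypothetical object whose can_open_for_writing, exchange, and delete methods stand in for whatever calls your application actually makes, and FileBusyError is an invented exception.

```python
class FileBusyError(Exception):
    """Raised when the real file cannot be opened for writing."""


def replace_existing(fs, temp_path, real_path):
    """Swap a freshly saved temp file into place over an existing file.

    fs is a hypothetical wrapper around the file system calls; it must
    provide can_open_for_writing(path), exchange(a, b), and delete(path).
    """
    # Probe first: if another application has the real file open for
    # writing, stop here, before anything on disk has changed.
    if not fs.can_open_for_writing(real_path):
        raise FileBusyError(real_path)
    # Only now is it safe to swap the files and clean up.
    fs.exchange(temp_path, real_path)
    fs.delete(temp_path)
```

With the probe in front, a busy file produces an error while both files are still where the user expects them, instead of after the swap.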

I wonder how many developers do the “right thing” by using FSpExchangeFiles(), but fail to check for open files first? Hopefully not many.

Note: I have not tried this with FSExchangeObjects(), though I’ll be getting to that. When I know, I’ll update this!