I’ve had an average uptime over the last couple of months of approximately 48 hours on my MacBook Pro. This has been frustrating and very hard to diagnose, but I think I nailed it today.
Random “graphics errors”, for lack of a better term, would appear on the screen from time to time. (See picture.)
These looked very software-ish to me, and forcing a window to repaint (resizing it, etc.) would make them go away. Eventually, the entire screen would freeze except for the cursor, which would remain as a pointer or insert bar despite being moved to various parts of the screen. Given that iTunes or a DVD would keep making noise, I guessed that this was not a systemwide freeze, and I discovered that I could ssh in from another machine and run shutdown -h now or reboot as root and usually get a clean shutdown.
I later learned that I could use pmset hibernatemode 1 to make the machine hibernate instead of going into normal sleep mode. That gave me a way to un-freeze the GUI: log in remotely, force hibernation with pmset sleepnow, and then wake the machine up manually.
All this time I was still trying to figure out how to trigger a failure on demand, and to read log files to find some clues. I didn’t find any log file clues, but I did rule out a hardware problem using the Apple Hardware Diagnostic program that came on the installation DVDs with the computer. The recent 10.5.2 update and exciting-sounding “Leopard Graphics Update 1.0” didn’t solve the problem. In fact, after the 10.5.2 update and that graphics update, the time between failures dropped to about 5-6 hours.
I learned that opening ~150 windows and triggering Exposé (via F9) repeatedly would quickly freeze the GUI. At this point my question was: is this a generic Leopard quality problem, or a problem specific to all the junk I have installed? Or perhaps a hardware issue not detectable by the diagnostic software?
Safe Boot showed that third-party software was very likely to blame; the problem was gone when I Safe Booted. So then it was a matter of booting normally with the things that Safe Boot disables selectively turned off, until the problem went away.
I wrote a simple script to move all fonts not in the list of fonts installed with Leopard into a folder on my desktop; that didn’t make the problem go away. According to Font Book’s validation feature, I have about 2 dozen damaged fonts (including all of the music notation fonts installed by Sibelius, and Adobe’s Carta font family), but hey, Apple’s own Helvetica LT MM font fails validation too. In fact, all of the built-in “MM” fonts do. It seems that the Font Book validator considers any wacky glyph shapes (like the Multiple Master fonts that make PDFs possible, or music notation symbols, or dingbats) to be invalid.
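A script along these lines might look like the following sketch in Python. It is not the original script: the function name, the idea of passing the stock font file names in as a set, and the folder names in the usage note are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def quarantine_extra_fonts(font_dir, stock_names, quarantine_dir):
    """Move every entry in font_dir whose name is not in stock_names.

    stock_names is assumed to be a set of file names taken from a clean
    Leopard install; building that list is left to the reader.
    Returns the names of the entries that were moved, in sorted order.
    """
    font_dir = Path(font_dir)
    quarantine_dir = Path(quarantine_dir)
    quarantine_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for entry in sorted(font_dir.iterdir()):
        # Leave stock fonts and hidden files (such as .DS_Store) in place.
        if entry.name in stock_names or entry.name.startswith("."):
            continue
        shutil.move(str(entry), str(quarantine_dir / entry.name))
        moved.append(entry.name)
    return moved
```

It could be called as quarantine_extra_fonts("/Library/Fonts", stock_names, "Fonts-Quarantine"), where the destination folder name is hypothetical. Moving rather than deleting keeps the experiment reversible.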
My next step was to disable any startup items in /System/Library/StartupItems (which was empty) and /Library/StartupItems. There were some old HP printer and scanner drivers, some M-Audio Firewire audio drivers from my Firewire Solo (no Leopard drivers exist yet), Qmaster, Unlockupd, and BRESINKx86Monitoring. All seemed plausible as causes of problems but disabling all of them didn’t fix the problem. Now that I’ve looked up what they each do, though, I’m not going to put them back.
Finally, I disabled the non-Apple stuff in /System/Library/Extensions. There are almost 300 items in there, and most of them sound important. In fact, there’s even Dont Steal Mac OS X.kext which apparently contains a pretty funny easter egg (see the post by “kick52” on that page). I installed a fresh copy of Leopard from my install DVD onto an external drive and used ls -1 and diff to compare the directory contents. I also performed my torture test and verified that Leopard 10.5.0 all by itself would not freeze when pushed. OK, so it’s definitely not hardware, nor is it a problem inherent to Leopard.
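The ls -1 and diff comparison boils down to a set difference between two directory listings. Here is a minimal sketch of the same idea in Python; the function name and the example volume name in the usage note are hypothetical, and this is a stand-in for the shell commands rather than what was actually run.

```python
import os


def extra_entries(installed_dir, reference_dir):
    """Return names present in installed_dir but absent from reference_dir."""
    installed = set(os.listdir(installed_dir))
    reference = set(os.listdir(reference_dir))
    # Anything only in the installed copy was added after the base install.
    return sorted(installed - reference)
```

For example, extra_entries("/System/Library/Extensions", "/Volumes/Clean/System/Library/Extensions") would list the .kext bundles that a fresh install on an external volume (here assumed to be named "Clean") does not have.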
I googled the filenames of the .kext bundles in /System/Library/Extensions on my hard disk that weren’t in the base Leopard install, and found that there were some Parallels kernel extensions in there, from about two years ago when I was using Tiger. Uh oh. So I moved those to another directory. The modules I removed are called helper.kext, hypervisor.kext, Pvsnet.kext, and vmmain.kext. My Parallels.app version is 2.5 v3188, which is pretty old.
UPDATE: I missed one: ConnectUSB. I found it by running kextstat while ssh’d in from another machine, after it froze again a day later. It was listed as com.parallels.kext.ConnectUSB, which stood out among all the com.apple stuff. I had used locate parallels and missed the stuff that shows up with locate Parallels – I didn’t realize it was case sensitive. Hopefully this is the last bit.
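Rather than eyeballing the kextstat listing, the non-Apple entries can be filtered out automatically. The sketch below assumes that each kextstat line contains the bundle identifier as its first dotted token that starts with a letter; that column layout and the function name are assumptions, so treat this as a starting point. The prefix check is deliberately case-insensitive, given the locate mistake above.

```python
def third_party_kexts(kextstat_output, vendor_prefix="com.apple."):
    """Return bundle identifiers in kextstat text that lack vendor_prefix."""
    found = []
    for line in kextstat_output.splitlines():
        for field in line.split():
            # Assumption: the first dotted token starting with a letter is
            # the bundle identifier (addresses and versions do not qualify).
            if "." in field and field[0].isalpha():
                if not field.lower().startswith(vendor_prefix.lower()):
                    found.append(field)
                break
    return found
```

The input text can come from running kextstat over ssh and saving the output, or from subprocess.run(["kextstat"], capture_output=True, text=True).stdout on the machine itself.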
The problems seem to be gone now. Woo hoo! I can do my torture test with > 300 Finder windows, eight QuickTime movies open and playing, and DVD Player showing a movie behind a partially transparent Terminal window.
UPDATE #2: I continued to have crash problems after this point. I now believe that the Canon MX310 scanner driver was the problem. More details are here.
I could have saved myself a lot of pain with Archive and Install, but I think I’m still coming out ahead due to the time saved by not having to reinstall and relicense dozens of applications. However, for mainstream users (who aren’t willing or able to go through this amount of troubleshooting effort), I strongly recommend using Archive and Install when installing Leopard on a machine that currently has Tiger installed.
UPDATE #3: It was actually an intermittent hardware problem, and Apple has replaced the logic board of this computer.