It has been a while since I posted anything about Trouble; it has been just as long since I could bring myself to even look at the damned thing.
Still, on Sunday morning I did so, and was rewarded: finally I have eradicated the accursed SATA Disk Bug From Hell. Those of a non-geeky persuasion can skip this next text box. I publish it here for the benefit of all those who may, as I was, be stumbling through the net, lost in confusion, unable to obtain even the vaguest hint of an answer from Kernel Developers anywhere.
The SATA Disk Bug From Hell
In my case, this problem occurred with a Shuttle SD31P SFF PC, using Western Digital 250GB SATA II drives. These factors probably aren’t necessary, but I include them here just in case. The fault has only been verified to occur on Linux 2.6 series kernels prior to 2.6.16. In my case it involved the Intel ICH7 SATA controller, but this may also not be necessary.
It goes like this: The above hardware comes by default with the two case-top drive holders pre-cabled to connect to the first and second SATA plugs (of three total) on the motherboard. Under Linux this appears to make both drives come from the same SATA controller (ata1), with the second controller (ata2) disabled. When configured thus, the system works fine initially, but within minutes the first drive starts throwing a wide variety of nasty errors which look for all the world like the cable is unplugged or faulty. Other tests such as the Hitachi Disk Fitness Test will show that both drives are fine.
If you pull the drives out and swap them, the fault will continue to occur on only the first drive. Any inquiries launched at lkml or suchlike will likely come back with “Are your cables ok?” or “Are your disks faulty?” or such.
The answer is much simpler: Trace the SATA data cables, and on the motherboard, unplug drive 2 (the rear drive). Now plug it into SATA connector #3. Your dmesg output should now show the drives as being on separate ATA controllers. The bug will go away. The sun will shine, the birds will sing.
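For anyone who would rather not eyeball the dmesg output, here is a rough sketch in Python of how the check could be automated. It groups log lines by the ataN port they mention, so you can see at a glance whether more than one port is in use. The function name, the regular expression and the sample lines are all my own assumptions for illustration; the exact wording of libata messages varies between kernel versions, so treat it as a starting point and not as a definitive tool.

```python
import re
from collections import defaultdict

# Assumed pattern: a libata port prefix such as "ata1:" or "ata2.00:".
ATA_PREFIX = re.compile(r"\bata(\d+)(?:\.\d+)?:")


def lines_by_ata_port(dmesg_text):
    """Group kernel log lines by the ataN port number they mention."""
    ports = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = ATA_PREFIX.search(line)
        if match:
            ports[int(match.group(1))].append(line.strip())
    return dict(ports)


# Hypothetical sample lines, NOT real output from the machine described above.
SAMPLE = """\
ata1: example message for the first port
ata2.00: example message for a device on the second port
some unrelated kernel line
"""

if __name__ == "__main__":
    for port, lines in sorted(lines_by_ata_port(SAMPLE).items()):
        print(f"ata{port}: {len(lines)} line(s)")
```

To try it on a real machine, save the output of dmesg to a file and pass the file's contents to lines_by_ata_port in place of SAMPLE. If the ports reported match what you expect after recabling, the drives are where they should be.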
All of that being said, Double Trouble is down again today: my relentlessly reliable modem which never crashed with PeopleInternet is crashing every 48 hours or so with Internode. Time to review my options modem-wise.
Back to work then.