RAID 5 drive failure
:: Thursday, May 25th, 2006 @ 11:18:49 am
:: Tags: Computers

Shit happens, and hard drives fail. I woke up this morning to a “corrupted directory” notice and beeping noises coming from the server area under my stage. As a reminder, my main server holds an 8-disk RAID 5 configuration + a system drive, and has been running smoothly for quite a while now.
I’m in the process of rebuilding the array using the original drive (it is still spinning and accepts a rebuild option), but I’ve also ordered a replacement drive so I can shove the thing in if the RAID fails again. The rebuild will take 30 hours, so I have some time to wait for the replacement to arrive. :) The next time I build a RAID, I’m going to have a dedicated spare in the mix so I don’t have to stress out about sourcing a matching drive.
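A rough sketch of why a rebuild is possible at all: for each stripe, RAID 5 stores one parity block that is the byte-wise XOR of the data blocks, so any single missing block equals the XOR of the surviving blocks. The Python below is a minimal illustration under simplifying assumptions (fixed-size blocks, a single stripe, parity not rotated across drives as real controllers do); the function names are hypothetical, not any controller's API.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block (single parity)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_missing(stripe, missing_index):
    """Recompute the block at missing_index from every surviving block."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


# One stripe across an 8-disk array: 7 data blocks + 1 parity block.
data = [bytes([i] * 4) for i in range(1, 8)]
stripe = make_stripe(data)
lost = 3
print(rebuild_missing(stripe, lost) == stripe[lost])
```

The same XOR works whether the lost block held data or parity, which is why any one drive can be rebuilt, and also why a second loss in the same stripe cannot be.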
The failure orphaned hundreds of files, which were recovered and placed in a useless directory structure by CHKDSK. Chris Emura has always warned me about data loss when a RAID 5 array fails, and luckily I am always backed up (onto a ReadyNAS 600 system). When the RAID finishes its rebuild, I’ll do a restore from the backups.
I’m on track for my failure rate of 1-2 drives a year (out of the 20 or so that I use).
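1-2 failures a year out of about 20 drives works out to an annualized failure rate (AFR) of roughly 5-10% per drive. The sketch below turns that into the odds of a second drive dying during a 30-hour rebuild of the 8-disk array (7 surviving members). It assumes independent drives with a constant failure rate, which is probably optimistic: drives from the same batch, in the same case, under rebuild load tend to fail together, and unrecoverable read errors during a rebuild are not modeled at all.

```python
import math

HOURS_PER_YEAR = 24 * 365


def annual_failure_rate(failures_per_year, drive_count):
    """Fraction of drives expected to fail per year."""
    return failures_per_year / drive_count


def p_any_failure(afr, drives, hours):
    """P(at least one of `drives` fails within `hours`).

    Assumes independent drives with a constant (exponential) failure rate,
    calibrated so that a single drive fails within one year with probability afr.
    """
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-drives * rate_per_hour * hours)


for failures in (1, 2):
    afr = annual_failure_rate(failures, 20)
    p = p_any_failure(afr, drives=7, hours=30)
    print(f"AFR {afr:.0%}: P(second failure during 30 h rebuild) ~ {p:.2%}")
```

Under those assumptions it prints roughly 0.12% and 0.25%. That is small per event, but it is the window in which a single-parity array has no protection left.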
Eric - any more drive failures and I’ll be suggesting one of these:
http://www.apple.com/xserve/raid/
since it is certified to work with Microsoft Windows 2003 (and 2000) :-)
http://www.apple.com/xserve/ra.....tions.html
Per GB, it is pretty cheap and you can keep a hot spare in the 7th bay on each side of the RAID.
Victor - I’m not sure how an Xserve would prevent drive failure! :) They sure are sexy, though.
Yeah, but consider that Apple might end up introducing a widget that will attach to your iPod and which will wirelessly alert you when a drive goes down.
You make good points, Chester. :)
You may want to investigate a RAID controller which supports dual-parity ADG RAID-6, such as the HP Smart Array P600. :)
http://h18004.www1.hp.com/prod.....index.html
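For reference, dual parity is what lets RAID 6 survive two simultaneous drive losses: besides the XOR parity P, it stores a second syndrome Q computed with Galois-field GF(2^8) multiplication, giving two independent equations per byte. The sketch below uses the P+Q Reed-Solomon scheme used by Linux md RAID 6 (field polynomial 0x11d, generator 2); that choice is an assumption, since HP's ADG implementation may differ internally. The function names are hypothetical, and bit-by-bit field multiplication is used for clarity rather than speed.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254 == a^-1 for nonzero a in GF(2^8)."""
    return gf_pow(a, 254)


def compute_pq(data):
    """P = XOR of all data blocks, Q = XOR of g^i * D_i with g = 2."""
    length = len(data[0])
    p, q = bytearray(length), bytearray(length)
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(disks, p, q):
    """Rebuild the two data blocks marked None, using the survivors plus P and Q."""
    x, y = [i for i, blk in enumerate(disks) if blk is None]
    zero = bytes(len(p))
    # Missing disks contribute nothing, so this yields the survivors' partial P and Q.
    pxy, qxy = compute_pq([zero if blk is None else blk for blk in disks])
    gy = gf_pow(2, y)
    denom_inv = gf_inv(gf_pow(2, x) ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        a = p[k] ^ pxy[k]  # D_x xor D_y
        b = q[k] ^ qxy[k]  # g^x*D_x xor g^y*D_y
        dx[k] = gf_mul(b ^ gf_mul(gy, a), denom_inv)
        dy[k] = a ^ dx[k]
    return {x: bytes(dx), y: bytes(dy)}


# Example: 6 data disks + P + Q (8 drives); disks 1 and 4 fail together.
data = [bytes((i * 37 + k) % 256 for k in range(16)) for i in range(6)]
p, q = compute_pq(data)
damaged = list(data)
damaged[1] = damaged[4] = None
recovered = recover_two(damaged, p, q)
print(recovered[1] == data[1], recovered[4] == data[4])
```

Recovery solves (g^x ⊕ g^y)·D_x = Q' ⊕ g^y·P' for the first lost disk, where P' and Q' are the parity values with the surviving disks' contributions removed; the second lost disk is then P' ⊕ D_x.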
Adam - how am I going to get 2TB of online storage via SCSI? :) I think I’m going to have to stick with SATA. :) Performance isn’t important for me, since there are usually at most 2-3 machines accessing the data at once. It’s all over gigabit, so I’m capped at 26-28MB/s of transfer, anyway.
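To put the 26-28MB/s figure in context: gigabit Ethernet's raw line rate is 1000 Mb/s, or 125MB/s before protocol overhead. A quick sketch of what those rates mean for moving the 2TB mentioned above, assuming decimal units, sustained throughput, and 27MB/s as the midpoint of the observed range:

```python
def transfer_hours(total_bytes, bytes_per_second):
    """Hours needed to move total_bytes at a sustained rate."""
    return total_bytes / bytes_per_second / 3600


GIGABIT_RAW = 1_000_000_000 / 8  # 125 MB/s line rate, before protocol overhead
OBSERVED = 27e6                  # midpoint of the quoted 26-28 MB/s
TWO_TB = 2e12                    # decimal terabytes, as drive vendors count them

print(f"line rate: {transfer_hours(TWO_TB, GIGABIT_RAW):.1f} h")
print(f"observed:  {transfer_hours(TWO_TB, OBSERVED):.1f} h")
```

That is about 4.4 hours at line rate versus about 20.6 hours at the observed rate.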
Eric,
Re: your comment to me. Sadly, nothing can prevent drive failure (not even brushing your teeth 3 times a day, saying your prayers and eating your broccoli). What an enterprise-class storage device would do is reduce the impact that drive failure has (RAID 50, hot spare, battery backup, etc.).
With you being gone on multi-day/week trips, a system with 2+ hot spares that auto-rebuild the RAID might mean you could just stop home to do laundry, swap dead drives, have a meal with Vienna and head back to the airport. :-)
After seeing your setup, it almost seems that if there were a next (reinforcement) step up to be taken, an Xserve RAID (or similar) might be it. Plus, dang it, Apple’s RAID works for Windows - it is a rare time I get to recommend hardware from that little fruit company that means not buying a Mac.
hehehe.
Yeah, I understand your point, Victor. My system already does auto-rebuilds, but that requires a spare drive, which I don’t have at the moment because I need all the space I have!
During the next big upgrade (after I pass 2TB of data), I’ll be sure to configure a 7-drive RAID 5 + 1 hot spare.
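The capacity cost of that plan is easy to check: a RAID 5 group spends one drive's worth of space on parity, and a hot spare holds nothing until a rebuild starts. A minimal sketch, using a hypothetical 500GB drive size purely for illustration:

```python
def raid5_usable(drives_total, hot_spares, drive_gb):
    """Usable capacity (GB) of one RAID 5 group plus idle hot spares.

    One member's worth of space goes to parity; spares add no capacity.
    """
    members = drives_total - hot_spares
    if members < 3:
        raise ValueError("RAID 5 needs at least 3 member drives")
    return (members - 1) * drive_gb


DRIVE_GB = 500  # hypothetical drive size, for illustration only
print("8-drive RAID 5:          ", raid5_usable(8, 0, DRIVE_GB), "GB")
print("7-drive RAID 5 + 1 spare:", raid5_usable(8, 1, DRIVE_GB), "GB")
```

With 8 bays of 500GB drives, going from an 8-drive RAID 5 to a 7-drive RAID 5 + 1 spare drops usable space from 3500GB to 3000GB.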
Sorry to hear about bad drives. Since you’re never home, you might try a RAID 5 + 2 online spares.
Another method I’ve used in building arrays: a 10-drive array split into 2 x (RAID 5 + 1), i.e. two groups of 4 drives plus 1 online spare each. Each group is protected on its own, and you’ll get a faster rebuild, which shortens the time you spend unprotected.
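A sketch comparing that 10-drive layout with a single big group makes the trade-off concrete. The Layout class is hypothetical; it only counts drives, assuming each RAID 5 group gives up one drive's worth of space to parity and that a second failure is fatal only within the group that is rebuilding:

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    groups: int            # number of independent RAID 5 groups
    members: int           # drives per group (data + parity)
    spares_per_group: int  # online spares dedicated to each group

    def total_drives(self):
        return self.groups * (self.members + self.spares_per_group)

    def usable_drives(self):
        # each RAID 5 group spends one drive's worth of capacity on parity
        return self.groups * (self.members - 1)

    def at_risk_during_rebuild(self):
        # while one group rebuilds, a second failure among its surviving
        # members loses that group's data; other groups are unaffected
        return self.members - 1


layouts = [
    Layout("single 9-drive RAID 5 + 1 spare", groups=1, members=9, spares_per_group=1),
    Layout("2 x (4-drive RAID 5 + 1 spare)", groups=2, members=4, spares_per_group=1),
]
for layout in layouts:
    print(f"{layout.name}: {layout.total_drives()} drives, "
          f"{layout.usable_drives()} usable, "
          f"{layout.at_risk_during_rebuild()} at risk while rebuilding")
```

The split layout gives up two drives of usable capacity (6 vs 8) in exchange for having only 3 drives, instead of 8, whose failure during a rebuild would lose data.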
When you do buy another set of drives, buy 2 extra shelf spares. At work, we run through drives like crazy and have found a bunch of drives with different servo codes, firmware codes and sector sizes. The sector size difference is nasty during a rebuild process.
Also, instead of having all the drives in the same case as your motherboard, go with an external enclosure. I think it’s time for a 42U rackmount frame! When drives are all spinning in an array, they induce vibration, and as you add more drives to the mix, each one vibrates at a different frequency. Basically, all the drives can vibrate each other to death.
I just qualified the Hitachi Deskstar T7K500 500GB SATA 3.0Gb/s drives in our arrays. :) 500GB x 10! :)
I’m guessing that you already have UPSes, but I have a bunch of APC SU700 UPSes that I don’t need anymore. I think I have 6 more of them. Want them? They just need new batteries.
Thanks, Curtis. You’re the master of old gear. :) I have a big UPS under there and have been wanting to get another, since I have two devices that can shut themselves off via UPS control signal. Problem is that one of them requires USB, so a SU700 won’t work.
I wonder if the serial port in my main server is working… ;)
Eric, the same controller exists for SATA and SAS disks. I suppose you would have to invest in a SAS chassis that accepts SATA disks, though. Then buy a couple of these bad boys:
http://www.seagate.com/cda/pro.....43,00.html
Check out the P400 if you’re remotely interested in the RAID6 stuff: http://h18004.www1.hp.com/prod.....index.html
Can I have a UPS, Curtis? :D
[...] [ECHENG.COM] - RAID 5 drive failure - never let Eric anywhere near a HD you care about. [...]