[BlueOnyx:10786] Re: Blue Onyx 5106R Crash

Bill Hicks billhicks at netstep.net
Wed Jun 13 12:30:40 -05 2012


Thank you everyone for all the feedback; it helped me narrow down what to
look for. After trying to recover the drive with various software, I
decided to check whether the drive was really at fault. I took an unknown
80GB drive that was on the shelf and did a full erase and a test of every
sector without getting a single error. I then put in the drive that had
crashed and ran the same test: hundreds of r/w errors, and eventually
not-ready errors. So it does appear to be the drive.
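
For anyone who wants to repeat the test, a destructive full-surface
write/read pass can be done with badblocks. This wipes the disk, and
/dev/sdb here is only a placeholder for whatever device node the suspect
drive shows up as:

  # WARNING: -w destroys all data on the disk.
  badblocks -wsv /dev/sdb   # write test patterns, read back, report bad sectors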

Bill Hicks


> Hi Bill,
>
>> 6. I log in via SSH to see if maybe it is locked up on the update or
>> experiencing a DoS. When I get in and su to root, I do a "top" to see
>> what is happening and I get "Bus Error". Wow, never seen that before.
>
> I'm really sorry to hear that this box is giving you grief again. :-(
>
> I have seen "Bus Error" messages before on one of my own boxes. It was
> also a fairly old clunker, only used for some development and testing
> stuff. When Linux says "Bus Error", a process was killed by SIGBUS, and
> one common cause is the kernel failing to page part of a binary or
> library back in from a dying disk. So that points somewhere in the
> direction of the hard disk or its controller. The controller itself
> could be busted, it could be bad cabling, oxidized contacts, a problem
> with the circuit board that's screwed under the HD itself, or something
> wrong with the part of the motherboard that interfaces with the
> controller.
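>
> Before swapping any hardware, a quick look at the kernel log and the
> drive's SMART data can often confirm this. A rough sketch (device names
> are placeholders, and smartmontools may need installing first):
>
>   dmesg | grep -iE 'ata|i/o error'   # controller resets, timeouts, read failures
>   smartctl -H -A /dev/sda            # SMART health verdict and raw error counters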
>
> For testing purposes I'd put the disks into another box. This could be
> any PC - even a workstation. In my office I'd take a Windows box,
> disconnect the internal HDs, hook up one of the HDs from the failed
> server via USB, and boot the box off the BlueOnyx CD in rescue mode.
> That should allow you to check whether there is still good data on the
> hard disk, or whether the partition table is trashed, as the failed box
> claims.
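>
> Once booted in rescue mode, something along these lines should show
> quickly whether the partition table and filesystems survived (again,
> /dev/sdb and /dev/sdb1 are placeholders for the USB-attached disk):
>
>   fdisk -l /dev/sdb            # is the partition table still readable?
>   fsck -n /dev/sdb1            # read-only filesystem check, changes nothing
>   mount -o ro /dev/sdb1 /mnt   # mount read-only and inspect the data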
>
> If the disks turn out to be OK, you could even shove them into another
> server and use them there. Whether that still makes sense is another
> question: you mentioned 80GB disks, so I assume they are already pretty
> aged as well.
>
> All in all I agree with Chris: if the hardware starts to get flaky,
> it's time to bin it, or to retire it to unimportant tasks where a
> potential loss of data is no longer a critical or crippling issue. In
> my experience the typical hardware lasts me about four years, and then
> the mean time between failures goes through the roof. First the disks
> let go and the number of disk-related failures skyrockets, then the box
> crashes more often, and in the end something fails entirely and
> prevents startup. The longest I ever got out of a 24/7 server was seven
> years, but by then it was on its third set of HDs and its second set of
> RAM.
>
> -- 
>
> With best regards,
>
> Michael Stauber
> _______________________________________________
> Blueonyx mailing list
> Blueonyx at mail.blueonyx.it
> http://mail.blueonyx.it/mailman/listinfo/blueonyx
> 



