[BlueOnyx:10785] Re: Blue Onyx 5106R Crash

Bill Hicks billhicks at netstep.net
Wed Jun 13 09:45:27 -05 2012


> Hi Bill,
>
>> 6. I log in via ssh to see if maybe it is locked up on the update or
>> experiencing a DoS. When I get in and su to root I do a "top" to see
>> what is happening and I get "Bus Error". Wow, never seen that before.
>
> I'm really sorry to hear that this box is giving you grief again. :-(
>
> I have seen "Bus Error" error messages before on one of my own boxes.
> It was also a fairly old clunker only used for some development and
> testing stuff. When Linux says "Bus Error", then that points somewhere
> in the direction of the hard disk controller. It could be that the
> controller itself is busted, could be bad cabling, oxidized contacts, a
> problem with the circuit board that's screwed under the HD itself, or it
> could be something wrong with a part of the motherboard that interfaces
> with the controller.
>
> For testing purposes I'd put the disks into another box. This could be
> any PC - even a workstation. In my office I'd take a Windows box, would
> disconnect the internal HDs, would hook up one of the HDs from the
> failed server via USB and would boot the box off the BlueOnyx CD in
> rescue mode. That should allow you to check if there is still good data
> on the hard disk or if the partition table is trashed as the failed box
> claims.
>
> If the disks turn out OK, you could even shove them into another server
> and use them there. Whether that still makes sense is another question.
> You mentioned 80GB disks, so I assume they are also already pretty aged.
>
> All in all I agree with Chris: If the hardware starts to get flaky,
> then it's time to bin it, or to retire it to unimportant tasks where
> potential loss of data is no longer a critical or crippling issue. In my
> experience typical hardware lasts me about four years, and then the
> failure rate usually goes through the roof. First the disks let go and
> the number of disk-related failures skyrockets, then the box crashes
> more often, and in the end something lets go entirely, which
> prevents startup. The longest I ever got out of a 24/7 running server
> was seven years, but then it was operating on its third set of HDs and
> second set of RAM.
>
> -- 
>
> With best regards,
>
> Michael Stauber

Hi Michael - I did take the HD and put it in another PC and got the same
issue: no OS. I am pretty sure the server is the issue, as this is the 2nd HD
and this one was new (well, new in that it had never been used and was still
in the box). I will keep the server as a test box and am looking at upgrading
to better servers. These older servers, though powerful for what they are
used for, are a concern. Though I do have a Sun Raq4 that I have been using
non-stop since 1999 and it has never had a single hiccup, except for the
little fans quitting on it and my having to remove the cover so it wouldn't
run hot.
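
For anyone who wants to look at the disk more closely than "No OS", here is a
minimal sketch of checking the first sector for a valid boot signature and
listing the primary partition entries. It assumes the disk uses a classic MBR
partition table (not confirmed in this thread); the function names and any
device path passed to them are hypothetical.

```python
import struct

SECTOR_SIZE = 512


def inspect_mbr(sector):
    """Return (signature_ok, partitions) for the first 512 bytes of a disk.

    Assumes a classic MBR layout: four 16-byte partition entries starting
    at offset 446, followed by the 0x55 0xAA boot signature at offset 510.
    """
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    signature_ok = sector[510:512] == b"\x55\xaa"
    partitions = []
    for i in range(4):
        entry = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        boot_flag = entry[0]
        ptype = entry[4]
        # Start LBA and sector count are little-endian 32-bit values.
        start_lba, num_sectors = struct.unpack("<II", entry[8:16])
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "index": i + 1,
                "bootable": boot_flag == 0x80,
                "type": ptype,
                "start_lba": start_lba,
                "sectors": num_sectors,
            })
    return signature_ok, partitions


def read_first_sector(path):
    """Read the first sector from a device or image file (hypothetical path)."""
    with open(path, "rb") as fh:
        return fh.read(SECTOR_SIZE)
```

From a rescue system this could be called as
inspect_mbr(read_first_sector("/dev/sdb")) as root, where /dev/sdb is only a
placeholder for wherever the disk shows up. A missing signature or an empty
partition list would suggest the partition table did not survive, though a
read error at this point would more likely point at the disk or controller.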

As a last piece of information about the server, last night it sent me 2
email warnings: one from LogWatch, which I see occasionally on all the
servers, and also the following cron error:

/etc/cron.daily/logrotate:

/usr/bin/analog: analog version 5.32/Unix
/usr/bin/analog: Warning L: Large number of corrupt lines in logfile stdin:
  turn debugging on or try different LOGFORMAT
  (For help on all errors and warnings, see docs/errors.html)
    Current logfile format:
      %v %S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b "%f" "%B"\n
      %S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b "%f" "%B"\n
      %S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b "%f" "%B"\n
      %S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b "%f" "%B"\n

I am pretty sure this was a symptom of the crash and not the cause.
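
To get a rough idea of how many lines analog would reject, a log file can be
checked against the format it lists. The sketch below is only an approximation
of the Apache "combined" format with an optional leading virtual-host field,
not analog's own parser, and the function name is hypothetical.

```python
import re

# Approximation of the "combined" log format described by the LOGFORMAT
# lines in the warning, with an optional leading vhost field (analog's %v).
# Lines containing escaped quotes inside a field are not handled.
COMBINED = re.compile(
    r'^(?:\S+ )?'                 # optional virtual host
    r'\S+ \S+ \S+ '               # client host, ident, user
    r'\[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\] '
    r'"[^"]*" '                   # request line
    r'\d{3} (?:\d+|-) '           # status code, bytes sent
    r'"[^"]*" "[^"]*"$'           # referrer, user agent
)


def count_corrupt(lines):
    """Return (total, bad): non-empty lines seen, and those not matching."""
    total = bad = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        total += 1
        if not COMBINED.match(line):
            bad += 1
    return total, bad
```

Passing an open log file (opened with errors="replace" so binary garbage does
not raise) to count_corrupt() gives the two counts. A high share of bad lines
in a log written around the time of the crash would fit the idea that the
warning is a symptom rather than the cause.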

Bill H 



