<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 08/06/2012 05:45 PM, Richard Morgan wrote:
<blockquote cite="mid:04B71FB9BFCA4377AA8B9A679DCB1ACE@morganweb"
type="cite">
<meta http-equiv="Context-Type" content="text/html;
charset=iso-8859-1">
<blockquote>
<div>----- Original Message ----- </div>
<div><b>From:</b> <a moz-do-not-send="true"
title="gwaugh@frontstreetnetworks.com"
href="mailto:gwaugh@frontstreetnetworks.com">Gerald Waugh</a>
</div>
<div><b>To:</b> <a moz-do-not-send="true"
title="blueonyx@mail.blueonyx.it"
href="mailto:blueonyx@mail.blueonyx.it">BlueOnyx General
Mailing List</a> </div>
<div><b>Sent:</b> Monday, August 06, 2012 11:22 PM</div>
<div><b>Subject:</b> [BlueOnyx:11108] Re: SL6.2 no boot from
degraded RAID1... with fix... BTW 6.3 is OK</div>
<div><br>
</div>
On 08/06/2012 07:47 AM, Gerald Waugh wrote:
<blockquote cite="mid:501FBCF5.5050100@frontstreetnetworks.com"
type="cite">
<pre><b>I M P O R T A N T ! ! !
Note: copied from Scientific Linux list</b> Konstantin Olchanski <a moz-do-not-send="true" href="mailto:olchansk@triumf.ca">&lt;olchansk@triumf.ca&gt;</a>
"======================================
FYI, as a regression from SL6.0 and SL6.1, SL6.2 does not boot from degraded RAID1 devices.
If your "/" is on a RAID1 mirrored across 2 disks and <b>1 of the 2 disks dies,</b> <b>your system will
not boot</b> because dracut does not activate the required md devices.
This is a very serious problem because RAID1 (mirroring) of "/" and "swap" is a popular
solution for protecting against single-disk failures. The present bug defeats this protection
and makes the situation worse because failure of either of the 2 disks makes your system
unbootable.
It is astonishing that this problem was not caught by anybody's QA, did not receive
wide publicity <b>*and*</b> the solution was not pushed into the current release of SL.
Bug report against dracut was filed in January:
<a moz-do-not-send="true" href="https://bugzilla.redhat.com/show_bug.cgi?id=772926">https://bugzilla.redhat.com/show_bug.cgi?id=772926</a>
marked as duplicate of secret bug:
<a moz-do-not-send="true" href="https://bugzilla.redhat.com/show_bug.cgi?id=761584">https://bugzilla.redhat.com/show_bug.cgi?id=761584</a>
solution made available in July for (the best I can tell) the 6.3 release:
<a moz-do-not-send="true" href="http://rhn.redhat.com/errata/RHBA-2012-0839.html">http://rhn.redhat.com/errata/RHBA-2012-0839.html</a> (dracut-004-283.el6.src.rpm)
<a moz-do-not-send="true" href="http://rhn.redhat.com/errata/RHBA-2012-1078.html">http://rhn.redhat.com/errata/RHBA-2012-1078.html</a> (dracut-004-284.el6_3.src.rpm)
These RPMs are available in SL6 .../6rolling/x86_64/updates/fastbugs/
I confirm that dracut-004-284.el6_3 can boot SL6.2 from degraded "/" (one disk missing).
Note that applying the fix on affected systems is not trivial:
1) rpm -vh --upgrade dracut-004-284.el6_3.noarch.rpm dracut-kernel-004-284.el6_3.noarch.rpm
2) bad dracut is still present inside the /boot/initramfs files, your system is still broken
3) dracut -v -f ### this rebuilds the initramfs for the ***presently running*** kernel, not necessarily the one used for the next reboot
4) find /boot -name 'initramfs*.img' -print -exec lsinitrd {} \; | grep dracut-0 ### report dracut version inside all /boot/initramfs files
5) dracut -v -f /boot/initramfs-2.6.32-279.1.1.el6.x86_64.img 2.6.32-279.1.1.el6.x86_64 ### rebuild initramfs for the latest update kernel
"=======================================
<b>
It is fixed in SL6.3.
Looks like CentOS 6.3 is OK as well.
A sketch automating the rebuild steps follows below this quote.
</b>
</pre>
</blockquote>
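For anyone following along, a minimal sketch of steps 3-5 above, assuming the fixed dracut and dracut-kernel RPMs from step 1 are already installed; the loop over /boot/vmlinuz-* is my addition, so the next reboot picks up the fix no matter which kernel grub selects:<br>
<pre># Report the dracut version inside each initramfs in /boot (step 4)
find /boot -name 'initramfs*.img' -print -exec lsinitrd {} \; | grep dracut-0

# Rebuild the initramfs for every installed kernel, not just the
# currently running one (generalizes step 5)
for k in /boot/vmlinuz-*; do
    ver=${k#/boot/vmlinuz-}
    dracut -v -f "/boot/initramfs-$ver.img" "$ver"
done

# Re-run the report to confirm each image now contains the fixed dracut
find /boot -name 'initramfs*.img' -print -exec lsinitrd {} \; | grep dracut-0</pre>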
And no one cares to comment?<br>
It doesn't bother you that if one of the drives goes south and
you reboot the server,<br>
it won't boot up, and you have to go to the data center and swap
drives to get it to boot?<br>
<pre>--
Gerald </pre>
<pre>Hi Gerald, I can't speak for others, but your email prompted me to do a bit of research into this. I for one appreciated your message greatly.</pre>
<pre>I tried putting a test server together this evening to try out both the failure and the fix, but my spare disks are at my office. I'll try again tomorrow and report my findings.</pre>
<pre>Richard</pre>
</blockquote>
</blockquote>
I believe you will be OK.<br>
Just be sure that the working drive is in the 'a' position (1st
drive).<br>
<br>
I believe the problem is that a server may not reboot when one of
the drives/partitions has failed<br>
and you aren't available to move the drive to the correct position.<br>
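If you want to confirm which drive is still healthy before touching the hardware, something like this should show it (assuming the root array is /dev/md0; adjust the device name for your layout):<br>
<pre># Kernel's view of all md arrays and the state of their members
cat /proc/mdstat

# Per-array detail; look for "degraded" in the State line and check
# which member is marked faulty or removed
mdadm --detail /dev/md0</pre>
<br>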
<pre class="moz-signature" cols="72">--
Gerald</pre>
</body>
</html>