[BlueOnyx:21466] Bizarre 5209R loses network config
Chris Gebhardt - VIRTBIZ Internet
cobaltfacts at virtbiz.com
Tue Oct 3 15:13:47 -05 2017
Hi Gang,
This isn't really a question - more of a report. I'm posting this here
in case it comes up again and saves anyone from any extended brain damage.
Earlier today one of our customers on a 5209R server issued a reboot and
then reported the system failed to come back online. We got console
on the box and found it to be at the login prompt so an investigation began.
At first it appeared that the IP addresses were active on the box, but
there was no network communication. ifconfig reported the primary IP
and all of the associated aliases, but there was no link at the NIC.
After trouble-shooting the physical connections and eliminating the
switch as a problem, we moved on to diagnosing the network connection at
the OS level. One curious thing we found was that
/etc/udev/rules.d/70-persistent-net.rules had two distinct MAC entries
listed for eth0. That is the number of entries you would expect with
2 interfaces (typically eth0 and eth1), but both were named eth0.
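For anyone who wants to check for the same symptom, here is a minimal sketch of how the duplicate could be spotted. The function name dup_iface_names is just something made up for this example, and it assumes the rules file uses the usual NAME="ethX" assignments:

```shell
#!/bin/sh
# Hypothetical helper: print every interface name that is assigned more
# than once in a udev persistent-net rules file.
dup_iface_names() {
    rules_file="$1"
    # Drop comment lines, extract each NAME="..." assignment, strip the
    # wrapper, then keep only names that occur more than once.
    grep -v '^[[:space:]]*#' "$rules_file" \
        | grep -o 'NAME="[^"]*"' \
        | sed 's/^NAME="//; s/"$//' \
        | sort | uniq -d
}

# Example (on the affected box):
#   dup_iface_names /etc/udev/rules.d/70-persistent-net.rules
```

Empty output means no name is claimed twice. In our case it would have printed eth0.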
Our usual procedure at that point is to empty out
70-persistent-net.rules and reboot so that udev rebuilds it. But
after doing this, there were NO entries in 70-persistent-net.rules.
Since we knew the MAC address on the interface, we added a manual entry
and rebooted again. No joy.
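For reference, a manual entry of this kind usually follows the format that udev's own generator writes. The sketch below only prints such a line. The function name persistent_net_rule and the MAC address in the example are placeholders, not values from the affected box:

```shell
#!/bin/sh
# Hypothetical helper: print a persistent-net rule that binds a MAC
# address to an interface name, in the format udev normally generates.
persistent_net_rule() {
    # sysfs reports MAC addresses in lower case, so normalise the input.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    name="$2"
    printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{type}=="1", KERNEL=="eth*", NAME="%s"\n' \
        "$mac" "$name"
}

# Example with a placeholder MAC:
#   persistent_net_rule 00:11:22:33:44:55 eth0 \
#       >> /etc/udev/rules.d/70-persistent-net.rules
```

Note the KERNEL=="eth*" match in that line. A rule like this can only apply if the kernel presents the device under an ethX name in the first place.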
When that happened, I began to wonder why the system did not see an
eth0. We searched for other possible interface names but didn't see
any. I poked my head into the grub configuration, since on a RHEL7 box we
typically need to force the ethX naming convention there by specifying
the net.ifnames=0 kernel parameter. It was missing.
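A quick way to run the same check is sketched below. The two function names are made up for this example, and it assumes the grub defaults live in /etc/default/grub, which is the usual location on RHEL7-type systems:

```shell
#!/bin/sh
# Hypothetical helpers: report (via exit status) whether net.ifnames=0
# is present as a whole parameter.

# $1: a file holding a kernel command line, e.g. /proc/cmdline
cmdline_has_ifnames_off() {
    grep -Eq '(^|[[:space:]])net\.ifnames=0([[:space:]]|$)' "$1"
}

# $1: a grub defaults file, e.g. /etc/default/grub (assumed location)
grub_defaults_has_ifnames_off() {
    # Only look at uncommented GRUB_CMDLINE_LINUX* lines.
    grep '^GRUB_CMDLINE_LINUX' "$1" \
        | grep -Eq '(^|[[:space:]"])net\.ifnames=0([[:space:]"]|$)'
}

# Example:
#   cmdline_has_ifnames_off /proc/cmdline || echo "not on running kernel"
#   grub_defaults_has_ifnames_off /etc/default/grub || echo "not in grub defaults"
#   ls /sys/class/net    # shows which interface names the kernel created
```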
To fix it, we edited the kernel line in /etc/default/grub (it begins
with GRUB_CMDLINE_LINUX) and appended "net.ifnames=0" (no quotes) onto
the tail end of that line, inside the existing closing quote.
After that, the grub configuration was regenerated with the grub2-mkconfig
command. (We backed up the existing conf file first, just in case.)
After another reboot, the system came online just like it always did.
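The fix can be sketched roughly as follows. This is not a transcript of what we typed. The function name append_ifnames_off is made up, it assumes the GRUB_CMDLINE_LINUX value is wrapped in double quotes, and the grub.cfg path in the comment is the usual BIOS-boot location, which may differ on your box (UEFI systems in particular):

```shell
#!/bin/sh
# Hypothetical helper: append net.ifnames=0 to GRUB_CMDLINE_LINUX in a
# grub defaults file, keeping a backup copy next to it.
append_ifnames_off() {
    grub_defaults="$1"
    # Nothing to do if the parameter is already on the line.
    if grep '^GRUB_CMDLINE_LINUX=' "$grub_defaults" | grep -q 'net\.ifnames=0'; then
        return 0
    fi
    # Back up first, then rewrite the original from the backup.
    cp -p "$grub_defaults" "$grub_defaults.bak" || return 1
    sed 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 net.ifnames=0"/' \
        "$grub_defaults.bak" > "$grub_defaults"
}

# Example (as root; paths are assumptions, check them on your system):
#   append_ifnames_off /etc/default/grub
#   cp -p /boot/grub2/grub.cfg /boot/grub2/grub.cfg.bak
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   reboot
```

Running the function a second time is harmless, since it returns early once the parameter is present.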
It's unknown to us whether the customer or any of the customer's users
may have made edits to the grub config, or if something got changed in a
recent yum update, or some other fluke.
FWIW, the customer reported the reboot was initiated because httpd had
stopped responding. Rather than tackle that, the box got rebooted in a
bid to restore service faster than doing an investigation on the initial
problem. My guess is the httpd lock-up is related to other reports
that have been made here and isn't directly connected to the grub issue.
The only thing that concerns me, though, is whether those boxes
that had Apache lock up on them late last week / early this week are
going to do the same thing when they are rebooted. (Obviously this would
apply only to physical bare-metal installs and not Aventurin{e} virtuals.)
--
Chris Gebhardt
VIRTBIZ Internet Services
Access, Web Hosting, Colocation, Dedicated
www.virtbiz.com | toll-free (866) 4 VIRTBIZ