[BlueOnyx:20894] PKG Autoinstall defect - Summary
Michael Stauber
mstauber at blueonyx.it
Mon Apr 10 18:20:05 -05 2017
Hi all,
As you have seen today: We did something incredibly stupid and I
sincerely apologize to all of you.
If you are still affected by this, here is a quick summary of how to
fix it:
============================================================
1.) As "root" from SSH run:
yum clean all
yum update
That makes sure you're fully updated and have the fix.
2.) Removal of unwanted PKGs:
cd ~
wget http://devel.blueonyx.it/pub/BlueOnyx/.scripts/pkgRemoval.pl.txt
mv pkgRemoval.pl.txt pkgRemoval.pl
chmod 755 pkgRemoval.pl
Then edit pkgRemoval.pl and, in the list of unwanted PKGs, delete the
lines for all PKGs that you want to keep. Everything still on the list
will be removed. Once satisfied with the list, run this command:
./pkgRemoval.pl
============================================================
What caused the problem? A stupid beginner's mistake in coding. In a
for/next loop a variable was conditionally set and then re-used without
being reset between iterations.
If you then happened to have the "Active Monitor" component "Software
Updates" enabled, then the daily cronjob would poll the list of PKGs
available to you on NewLinQ to keep you informed of PKGs that are
installed and for which updates are available.
However: If the faulty code loop tripped over itself while updating CODB
with the new data from NewLinQ, then it would start to mark PKGs for
install. Depending on when it tripped, it might mark all PKGs; in other
cases it just marked everything past a certain point.
The next run of "Active Monitor" then installed all PKGs that had been
marked for install.
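
For illustration, here is a minimal sketch of this class of bug. It is
not the actual BlueOnyx code: the language (Python), the function names
and the package names are all made up for the example. It only shows how
a flag that is set conditionally inside a loop, but never reset, leaks
into every later iteration and ends up marking "everything past a
certain point".

```python
# Hypothetical illustration only; not the actual BlueOnyx code.
# All function and package names below are invented for this sketch.

def mark_buggy(packages, wanted):
    """Return the packages marked for install (faulty version)."""
    marked = []
    install = False                  # initialised once, outside the loop
    for name in packages:
        if name in wanted:
            install = True           # conditionally set ...
        # BUG: ... but never reset, so the value from an earlier
        # iteration is still in effect for every later package.
        if install:
            marked.append(name)
    return marked


def mark_fixed(packages, wanted):
    """Return the packages marked for install (corrected version)."""
    marked = []
    for name in packages:
        install = name in wanted     # recomputed on every iteration
        if install:
            marked.append(name)
    return marked


packages = ["pkg-a", "pkg-b", "pkg-c", "pkg-d"]
wanted = {"pkg-b"}

print(mark_buggy(packages, wanted))  # ['pkg-b', 'pkg-c', 'pkg-d']
print(mark_fixed(packages, wanted))  # ['pkg-b']
```

In the faulty version the point at which the flag first becomes true
decides how much gets marked: if that happens on the first entry, the
whole list is marked.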
During testing of the code changes on Friday, prior to release, everything
seemed to be fine and the problem didn't occur. On Saturday I spotted it
on a client box under maintenance contract. However, in that case just
two "unwanted" PKGs had been installed, and that client also had the "All
Packages Bundle", of which he used only half a dozen elements.
This allowed me to immediately identify the problem and to publish a
fix.
But this is actually where I screwed up again: I made the assumption
that the problem was minor and that it probably would not affect anyone
else in more drastic ways. But it did, because of the delay with which
the "Active Monitor" component "Software Updates" works.
Among the first casualties of the problem were the support ticket relay
system, the list servers, the YUM repository MySQL server backend and
the Solarspeed email server. They all had gotten "Dfix" installed, which
collided with the already present "Dfix2". Two MariaDB servers also
didn't survive the unmonitored updates to MariaDB-10.1, as some of them
were also still running with passwords in the old format.
So I only became aware of the problem when the phone started to ring
itself off the nightstand.
I spent the last 10 hours fixing our own broken stuff while
simultaneously fixing the boxes of everyone who contacted me in any
way. If you need help with the fix or still have pending issues on your
servers, then please let me know and I'll get to it ASAP.
--
With best regards
Michael Stauber