[BlueOnyx:18999] Important 5207R/5208R/5209R YUM updates published

Michael Stauber mstauber at blueonyx.it
Thu Jan 14 18:41:00 -05 2016


Hi all,

The following updates were just released for 5207R, 5208R and 5209R:

    sausalito-cce-server
    sausalito-cce-client
    base-alpine
    base-swupdate
    base-vsite (5209R only)

I'd also like to apologise for the problems we created with a
well-intended new feature between Christmas and New Year: Back then we
released a new extra (base-memcache) that was supposed to speed up CCEd.
Which it did. But it also caused issues and problems such as:

    - Sporadic Active Monitor emails in Spanish
    - Sporadic GUI login problems with weird error messages
    - Erratic behaviour of cronjobs that interface with CCEd
    - Runaway cced child processes
    - Expired Autoresponders that started to auto-respond again
    - Active Monitor emails to non-existing accounts
    - Other weird issues (too many to name)

We have rolled out six or seven Memcache-related fixes since New Year's
Eve, including an update that disabled Memcache entirely. Still: The
problems wouldn't go away, as CCEd (even with deactivated Memcache) would
behave erratically. Less erratically than with Memcache enabled, but we
certainly can't have that either.

So I just published updates that uninstall base-memcache and bring
sausalito-cce-* back to the same state as it was before the Memcache
feature was added.

This should end all of the mysterious problems that we have seen
cropping up in the last 2-3 weeks. And I thoroughly and sincerely
apologise for these issues. This is a lesson learned and we won't have
that happen again.

How did this all happen? Well, our intention was good. We wanted to
speed up access to the CODB database that CCEd (the GUI backend) uses.
Both CCEd and CODB are rock-solid pieces of technology. But they're
not exactly the fastest by design. Anyone familiar with database design
will know that a lack of proper indexing slows down all FIND requests,
because you then have to loop over all relevant database entries to find
the one(s) you're looking for. The bigger the database gets, the longer
it takes for a FIND query to finish.

The BlueOnyx GUI uses a lot of FIND requests. On some pages more, on
some pages less. Any SET or GET transaction is usually only done after a
FIND request has identified the database Object(s) that we need to
access. Therefore: Speeding up FIND transactions by providing proper
database indexing would speed things up considerably.
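
To illustrate the difference, here is a minimal Python sketch. It is not
CCEd or CODB code: the "objects" dictionary, the function names and the
index layout are all made up for this example. It only shows why a FIND
that scans every Object gets slower as the database grows, while an
indexed FIND just intersects a few small sets of IDs.

```python
from collections import defaultdict

# Hypothetical stand-in for CODB: object ID -> (class name, properties).
objects = {
    1: ("System", {"locale": "en_US"}),
    2: ("Vsite", {"fqdn": "www.example.com"}),
    3: ("User", {"name": "alice", "site": "site1"}),
    4: ("User", {"name": "bob", "site": "site1"}),
}


def find_scan(class_name, **criteria):
    """FIND without an index: inspect every object in the store."""
    return sorted(
        oid
        for oid, (cls, props) in objects.items()
        if cls == class_name
        and all(props.get(k) == v for k, v in criteria.items())
    )


def build_index():
    """Map (class, key, value) to the set of matching object IDs."""
    index = defaultdict(set)
    for oid, (cls, props) in objects.items():
        # (class, None, None) collects every object of that class.
        index[(cls, None, None)].add(oid)
        for key, value in props.items():
            index[(cls, key, value)].add(oid)
    return index


def find_indexed(index, class_name, **criteria):
    """FIND with an index: intersect a few small ID sets."""
    result = set(index.get((class_name, None, None), set()))
    for key, value in criteria.items():
        result &= index.get((class_name, key, value), set())
    return sorted(result)
```

Both functions return the same IDs. The scan touches every Object on
every call, whereas the indexed version only does a handful of lookups
once build_index() has run. The price is that the index has to be kept
in sync with every write.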

The Memcache feature was our attempt to achieve this speedup. For that
purpose CCEd got extended with methods that would use the service
"memcached" to create and maintain an index of database Objects and the
keys they contained. Any FIND request would first hit the cache, which
then (very speedily) returned the IDs of the Objects we were looking for.
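
The general pattern (look in the cache first, fall back to the slow
FIND on a miss) can be sketched in Python as follows. This is an
illustration under assumptions, not the code that was in CCEd: a plain
dictionary stands in for the memcached service, and the class and
method names are invented. The invalidate() method shows the general
bookkeeping such a cache needs after writes. It is not meant as a
diagnosis of the fault described in this mail.

```python
# Hypothetical stand-in for CODB: object ID -> (class name, properties).
store = {
    1: ("User", {"name": "alice"}),
    2: ("User", {"name": "bob"}),
}


def scan(class_name, **criteria):
    """Slow FIND: loop over every object in the store."""
    return sorted(
        oid
        for oid, (cls, props) in store.items()
        if cls == class_name
        and all(props.get(k) == v for k, v in criteria.items())
    )


class CachedFinder:
    """Cache-aside wrapper around a slow FIND function."""

    def __init__(self, slow_find):
        self._slow_find = slow_find
        self._cache = {}  # plain dict standing in for memcached
        self.hits = 0
        self.misses = 0

    def find(self, class_name, **criteria):
        key = (class_name, tuple(sorted(criteria.items())))
        if key in self._cache:
            self.hits += 1
            return list(self._cache[key])
        # Cache miss: run the slow FIND and remember the result.
        self.misses += 1
        ids = self._slow_find(class_name, **criteria)
        self._cache[key] = list(ids)
        return list(ids)

    def invalidate(self, class_name):
        """Drop cached results for a class after a write touches it."""
        for key in [k for k in self._cache if k[0] == class_name]:
            del self._cache[key]
```

A second identical find() call is answered from the cache without
touching the store. If an Object is created, changed or destroyed and
invalidate() is not called, the cache keeps returning the old list of
IDs.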

Sadly something did not go exactly right. We are still trying to
identify the origin of the fault. But the symptoms were like this:
At one point or another CCEd would enter a fault state where a GET
request to a valid Object would return an error message such as "301
UNKNOWN CLASS" even though the Object was valid. Most typically this
happened with the "System" Object, which contains configuration data
such as language settings and the general state and configuration of the
server.

Any GUI page and any GUI script, handler or constructor depends on the
presence and availability of the "System" Object. If that's
inaccessible, then all hell breaks loose and you see error messages and
very erratic behaviour.

Most unfortunately our CCEd exhibited these problems even if Memcache
had been disabled in the GUI. It just happened less frequently than with
Memcache enabled. This was as unexpected as it was unwelcome.

To address these issues we just rolled back almost all Memcache-related
changes:

CCEd got replaced with the same code that we were using before the
Memcache feature got added. Additionally the installation of this
updated sausalito-cce RPM will also remove the base-memcache module that
provided the GUI integration of Memcache. With that feature removed, we
don't want the GUI pages for it to remain behind either.

Where we'll be going from here:
================================

In the meantime we have been contemplating ideas, concepts and general
design changes that will help us to prevent these problems (and similar
ones) in the future.

Among the problems we identified is the need for proper indexing of the
CODB database to speed up FIND requests. We do have some ideas on how we
can achieve this without breaking CCEd. However, this will take some
time to code and naturally we'll test it properly before we even
consider a release.

Secondly we identified (and fixed) several "speed bumps" in existing GUI
pages and libraries. In the last couple of days I released an updated
base-alpine which reduces the amount of redundant FIND and GET requests
on all GUI pages by a factor of 5-6. In terms of speed (even without
Memcache) that boils down to processing and page loading that is 0.5-1.0
seconds faster on an average server.

We also identified other areas where slightly different database layout
or structuring would be beneficial and found pages that make redundant
FIND or GET requests which we can subsequently eliminate for speed gains.

But we're certainly not going to release drastic changes again without
knowing full well what implications they will have on production
servers and "real world" scenarios.

Reliable restarts of CCEd during YUM updates:
==============================================

Among the fixes released today is an updated base-swupdate module. This
tackles one long-standing issue that has plagued us: Certain RPMs need
to restart (or rehash) CCEd upon YUM updates. We need to do that to
push out configuration changes or minor and major modifications of the
CODB database schemas.

Pretty much any modest feature change needs a CCEd restart or a CCEd
rehash (which is a fast restart of CCEd).

The mechanism we used for that was a carry-over from the Cobalt Networks
times. It had a few conceptual problems and didn't work reliably enough,
even less so if the YUM update had been issued through the GUI.

To address this I wrote a YUM plugin that now restarts or rehashes CCEd
at the end of a "yum update" or "yum install" if the GUI RPMs require
it. That way we only restart or rehash CCEd once at the end of a "yum
update" (if at all) and do it with much greater reliability and
certainty. A lot of the support cases and problem reports on the
BlueOnyx list (or in tickets or by email) were from people who had
issues because a mandatory CCEd restart had not been performed after a
YUM update. The new YUM plugin will solve that particular problem once
and for all.
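
The idea of collecting restart requests and acting on them only once
can be sketched in Python like this. It is a simplified illustration
and not the plugin's actual code: the class, its method names and the
way a package reports its needs are hypothetical. It also assumes that
a full restart makes a separately requested rehash unnecessary.

```python
# Ordered so that a stronger request always wins over a weaker one.
NONE, REHASH, RESTART = 0, 1, 2


class CcedRestartTracker:
    """Collect restart/rehash requests, act once per transaction."""

    def __init__(self, run_action):
        # run_action is a callable that receives "restart" or "rehash".
        self._run_action = run_action
        self._pending = NONE

    def package_done(self, needs):
        """Record what one installed or updated package requires."""
        self._pending = max(self._pending, needs)

    def transaction_done(self):
        """Run at most one action, then reset for the next transaction."""
        action, self._pending = self._pending, NONE
        if action == RESTART:
            self._run_action("restart")
        elif action == REHASH:
            self._run_action("rehash")
        return action
```

However many packages in one transaction ask for a rehash or restart,
run_action is called at most once, and not at all if no package asked
for anything.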

Pending support tickets and support requests by email:
======================================================

To anyone who has an open support ticket or an unanswered email: My
sincere apologies. I'll get to them as soon as I can. But as you can
imagine: Fixing these stability issues took a lot of time and energy,
and generated a flood of support requests as well. Due to that both Greg
and I are totally backlogged with tickets. Working through them will
take some time, but we will get back to you as quickly as we can.

Thank you for your patience!

-- 
With best regards

Michael Stauber


