[BlueOnyx:21882] Re: EU-DSGVO - anonymize ip addresses in apache logfiles / other logfiles?

Michael Stauber mstauber at blueonyx.it
Fri Mar 23 11:36:27 -05 2018


Hi Dirk,

> EU-DSGVO is coming...
> Is there a possibility for you to add a feature to blueonyx to 
> anonymize ip addresses in apache logfiles or any other logfiles
> which collect ip addresses of individuals?

<Sigh>. Idiotic laws like this make me glad that I left Europe for good.
But yeah ... this is something to think about.

> Maybe a feature to activate in GUI like "anonymize ip addresses 
> in logfiles" and a function that ip addresses will be changed to
> 123.234.x.x (anonymize last two segments).

That would be possible. The sledgehammer method would be to change the
logging format in the various daemon config files so that we either no
longer log the IP or obfuscate it.

But it's such a bad idea on so many levels that it's mind-boggling.

During diagnostics, forensics, troubleshooting or when fighting SPAM or
blocking attackers you *do* want the full IP. There is no substitute for
that.

Having to obfuscate all IPs is like being told: "When someone breaks
into your house, you are no longer allowed to shoot the attacker. You
have to nuke the whole neighborhood from orbit."

It doesn't matter where we look: Fail2ban, Dfix2, APF, pam_abl, access
rules in Sendmail, RBLs ... they need the IP. Some get them directly
from the services, some only from the logs.

So changing the log format to hide/obfuscate the IPs immediately is not
such a good idea.

Are IPs "personal data" ("personenbezogene Daten")? The EuGH and the BGH
say "Yes!" to that question.

What does the law say about the storage of "personal data"?

------------------------------------------------------------------------
Data must be stored as briefly as possible. The period should take into
account the reasons for processing the data and legal obligations to
keep the data for a specified period (for example, if national labour,
tax or anti-fraud legislation requires your company/organisation to keep
personal data of employees for a specified period, duration of product
warranty, etc.).

Your company/organisation should set deadlines for the deletion or
verification of stored data.

Translated with www.DeepL.com/Translator

Source (German EU page):
https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/principles-gdpr/how-long-can-data-be-kept-and-it-necessary-update-it_de

------------------------------------------------------------------------

So ... even under the EU-DSGVO we are still allowed to have logfiles
(or other records) with IP addresses of originating traffic in them.
However: The retention period should be short, the purpose of the
collection must be lawful and access to the raw data must be restricted
to a "need to know" basis. Plus you, as an organization or corporate
entity, must document that policy somewhere.

Hence I'd say: We keep the logging as it is. We need the (recent) IPs.
BUT: IF we keep the logs (or other records) for longer periods, we need
to make sure that they are either expunged or that the IPs within them
are obfuscated after the fact, after a configurable period.
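
The after-the-fact obfuscation could look roughly like the sketch below.
It assumes the 123.234.x.x scheme suggested in the request (last two
segments replaced) and IPv4 only; IPv6 addresses would need their own
handling. The names obfuscate_ips and IPV4_RE are made up for this
illustration and are not part of BlueOnyx.

```python
import re

# One IPv4 octet, limited to the range 0-255.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Dotted quad that is not embedded in a longer dotted number such as
# "1.2.3.4.5". Only the first two octets are captured for reuse.
IPV4_RE = re.compile(
    rf"(?<![\d.])({_OCTET})\.({_OCTET})\.{_OCTET}\.{_OCTET}(?!\d|\.\d)"
)


def obfuscate_ips(text: str) -> str:
    """Replace the last two octets of every IPv4 address with 'x.x'."""
    return IPV4_RE.sub(r"\1.\2.x.x", text)


if __name__ == "__main__":
    print(obfuscate_ips('123.234.56.78 - - "GET / HTTP/1.1" 200 512'))
```

An already obfuscated address no longer matches the pattern, so running
this repeatedly over the same file does no harm.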

As it is, we're not keeping the combined logs under /var/log/
indefinitely, but I can't tell for sure without checking how long the
individual logs are actually kept. So this might involve changes to
logrotate.

All in all this is doable with a GUI page where this is configured and a
cronjob that runs daily to go through all related logs to obfuscate IPs
of older logfiles. However: This also affects
/home/sites/<VSite>/logs/web.log, where we keep a really long backlog of
Vsite related web accesses, which *include* the IP.
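
A daily job of that kind might be sketched as follows. This is an
illustration under assumptions, not the planned implementation: it only
handles uncompressed IPv4 text logs in a single directory, decides "older"
by file modification time, and the function names and the 30-day value in
the usage note are hypothetical.

```python
import os
import re
import shutil
import time
from pathlib import Path

_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(
    rf"(?<![\d.])({_OCTET})\.({_OCTET})\.{_OCTET}\.{_OCTET}(?!\d|\.\d)"
)


def obfuscate_ips(text: str) -> str:
    """Replace the last two octets of every IPv4 address with 'x.x'."""
    return IPV4_RE.sub(r"\1.\2.x.x", text)


def obfuscate_old_logs(log_dir, max_age_days, now=None):
    """Obfuscate IPs in files not modified for max_age_days days.

    Returns the list of files that were rewritten.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    changed = []
    for path in sorted(Path(log_dir).iterdir()):
        # Compressed rotations (.gz) would need gzip handling; skipped here.
        if not path.is_file() or path.suffix in (".gz", ".tmp"):
            continue
        stat = path.stat()
        if stat.st_mtime >= cutoff:
            continue  # recent log: keep the full IPs
        original = path.read_text(encoding="utf-8", errors="surrogateescape")
        cleaned = obfuscate_ips(original)
        if cleaned == original:
            continue  # nothing to do, or already obfuscated
        # Write a sibling temp file, then swap it in atomically.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(cleaned, encoding="utf-8", errors="surrogateescape")
        shutil.copymode(path, tmp)  # keeps permissions, not ownership
        # Keep the old timestamps so the file still counts as old.
        os.utime(tmp, (stat.st_atime, stat.st_mtime))
        os.replace(tmp, path)
        changed.append(path)
    return changed
```

Called from cron as, say, obfuscate_old_logs("/var/log/httpd", 30), it
would leave recent logs alone so that Fail2ban and friends still see full
IPs. A file that is appended to continuously, as web.log appears to be,
never looks old by modification time and would need line-by-line handling
based on the timestamps inside it.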

This goes even further than that: SpamAssassin's AWL stores IPs for
quite some time. Milter-GeoIP as well. Milter-Greylist stores them for a
configurable period, which is short enough to get a waiver.
SendmailAnalyzer stores data indefinitely as well. That doesn't include
IPs, but it does include sender email addresses, which are also
"personal data".

The pruning/obfuscation of SendmailAnalyzer's data will be particularly
fun, as the data-files are actual Perl-Modules *and* there are
daily/weekly/monthly/yearly files, which amounts to a hell of a lot of
file operations for a search and replace to obfuscate. It would make
more sense to say: We delete any SendmailAnalyzer-data older than X-days
and be done with it.
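
The "delete anything older than X days" approach is simple enough to
sketch. The layout of SendmailAnalyzer's data directory is not described
here, so this sketch simply assumes that file modification time is a good
enough measure of age; prune_older_than and its parameters are
hypothetical names.

```python
import time
from pathlib import Path


def prune_older_than(data_dir, max_age_days, now=None):
    """Delete files under data_dir not modified for max_age_days days.

    Returns the list of deleted files. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

One caveat with this assumption: aggregated weekly/monthly/yearly files
that keep getting updated would always look recent, so they might need to
be selected by the date in their path instead.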

All in all: This new law is pretty ridiculous. Especially considering
that even the German govt considers "personal data" of users aggregated
in large companies as "important marketable asset with considerable
competitive advantage". Ah, don't we love it? They preach water and
drink wine!

But yeah: I'll see to it. Anyone know what the deadline for the
implementation is? AFAIK 25th of May 2018 or thereabouts.

-- 
With best regards

Michael Stauber


