[BlueOnyx:01657] Re: filter email on language

Michael Stauber mstauber at blueonyx.it
Mon Jul 13 05:29:48 -05 2009


Hi Steffan,

> Does anybody know a way (say procmail) to filter e-mail on language
> I like to separate Dutch email from other languages

That is something that can't really be done reliably enough. The problem 
is that you can't trust email clients to encode emails correctly or to leave 
any indication of what language is used in the message body. In the end 
you really need something that evaluates the text in the message body and 
makes guesstimates (and they're really guesses!) about what language it is.

Which means those checks may sometimes be wrong. Chances are they're more 
often wrong on shorter emails (with less text to parse).
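
To illustrate what such a guess could look like, here is a minimal Python 
sketch that simply counts very common Dutch words in the body. The word list, 
the function names and the 0.15 threshold are assumptions made up for this 
example; they are not part of Procmail or SpamAssassin, and a real detector 
would need a proper language model.

```python
import re

# Hypothetical, hand-picked list of very common Dutch words (assumption).
DUTCH_STOPWORDS = {
    "de", "het", "een", "en", "van", "ik", "je", "dat", "niet",
    "is", "op", "te", "zijn", "voor", "met", "ook", "maar", "naar",
}

def dutch_ratio(text):
    """Return the fraction of words in text that are in the Dutch word list."""
    words = re.findall(r"[a-zA-Z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in DUTCH_STOPWORDS)
    return hits / len(words)

def guess_is_dutch(text, threshold=0.15):
    """Guess (and it is only a guess) whether text is written in Dutch."""
    return dutch_ratio(text) >= threshold
```

With this sketch a longer Dutch sentence is recognised, but the short English 
line "This is a test" scores 0.25 (one hit in four words, because "is" is 
also a Dutch word) and is wrongly taken for Dutch, while the short Dutch 
reply "Prima, bedankt!" contains no listed word and is missed.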

With Procmail alone this will therefore be rather complex, tricky and 
possibly unreliable.

With SpamAssassin you could possibly get functionality somewhere near what 
you want to achieve.

In SpamAssassin you can define so-called "ok_locales" (the locales that you 
want to accept email in). If an email arrives for that user (or you can also 
set it globally) in a locale that is not OK'ed, then the mail will trigger 
the rule CHARSET_FARAWAY. It may be possible to set up SpamAssassin with an 
ok_locales setting that covers Dutch and use a procmail rule to move anything 
not having CHARSET_FARAWAY to a specific folder destined for Dutch messages. 
How reliably SpamAssassin can detect Dutch messages I can't say.
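
The sorting step could be sketched roughly as follows in Python. This is only 
an illustration of the idea, not a procmail recipe: it assumes SpamAssassin 
has already added its usual X-Spam-Status header with a "tests=" list, and 
the folder names "Dutch" and "Other" are placeholders. In procmail itself the 
equivalent would be a condition that matches CHARSET_FARAWAY in that header.

```python
import email
import re

def spam_tests(raw_message):
    """Return the set of rule names listed in the X-Spam-Status header."""
    msg = email.message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # The header may be folded after a comma; rejoin the rule list first.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S*)", status)
    if not match:
        return set()
    return {name for name in match.group(1).split(",") if name}

def pick_folder(raw_message, dutch_folder="Dutch", other_folder="Other"):
    """Choose a folder; both folder names are placeholders (assumption)."""
    if "CHARSET_FARAWAY" in spam_tests(raw_message):
        return other_folder
    return dutch_folder
```

Keep in mind that the ok_locales setting works on character sets, so a check 
like this can at best separate mail in, say, Cyrillic or Asian character sets 
from mail in Western ones. An English message would land in the "Dutch" 
folder just the same.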

Another option is to use the SpamAssassin plugin RelayCountry. Our AV-SPAM v5 
and v5.1 use this since the latest update (3-4 days ago) to assign scores to 
emails from China, Korea, Russia, Romania and a few other countries. If you 
have our AV-SPAM (I think you do), then check 
/etc/mail/spamassassin/country.cf for the existing syntax. It'll give you some 
ideas. You can add scores for additional countries, too, such as a negative 
score for messages originating in NL. But that method won't catch just the 
messages written in Dutch. It'll catch *any* message that originated on a 
mailserver in the Netherlands, regardless of whether the message is in Dutch, 
English or any other language.
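
A small Python sketch makes that limitation visible. It assumes the relay 
country codes have been written into a message header; the header name 
"X-Spam-Relay-Countries" used here is an assumption and depends on how 
SpamAssassin is configured to add headers on your system.

```python
import email

def relay_countries(raw_message, header="X-Spam-Relay-Countries"):
    """Return the country codes found in the given header (name is an assumption)."""
    msg = email.message_from_string(raw_message)
    return msg.get(header, "").upper().split()

def relayed_via(raw_message, country="NL"):
    """True if any relay of the message was located in the given country."""
    return country in relay_countries(raw_message)
```

A message with "NL" among its relay countries matches no matter what its body 
says, so an English message sent through a Dutch mailserver is treated 
exactly like a Dutch one.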

But maybe someone else has another idea that helps you along.

-- 
With best regards

Michael Stauber



