[BlueOnyx:10180] Re: Yup - there's an app for that!

Michael Stauber mstauber at blueonyx.it
Wed Apr 18 17:46:41 -05 2012


Hi all,

> # Get rid of bad bots:
> RewriteEngine on
> RewriteCond %{HTTP_USER_AGENT} .*google.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*yahoo.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*bot.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*spider.* [OR]
> RewriteCond %{HTTP_USER_AGENT} "^Black.Hole"
> RewriteRule .* - [F]
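The quoted rules chain several RewriteCond lines with [OR]. If any one of them matches the User-Agent header, "RewriteRule .* - [F]" refuses the request with 403 Forbidden. As a rough illustration only, the same decision can be mimicked in Python. This is not how Apache evaluates it internally, and the function name is made up. Without the [NC] flag the patterns are case-sensitive, and a crawler that sends nothing but "Mozilla" matches none of them:

```python
import re

# Patterns copied from the quoted RewriteCond lines. As in mod_rewrite
# without the [NC] flag, matching here is case-sensitive and unanchored
# unless the pattern itself starts with ^.
BAD_BOT_PATTERNS = [
    r".*google.*",
    r".*yahoo.*",
    r".*bot.*",
    r".*spider.*",
    r"^Black.Hole",
]

def is_forbidden(user_agent: str) -> bool:
    """Hypothetical helper: True if any pattern matches, mimicking the
    [OR]-chained conditions that lead to 'RewriteRule .* - [F]'."""
    return any(re.search(p, user_agent) for p in BAD_BOT_PATTERNS)

if __name__ == "__main__":
    baidu_ua = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
                "+http://www.baidu.com/search/spider.html)")
    print(is_forbidden(baidu_ua))   # True: contains "spider"
    print(is_forbidden("Mozilla"))  # False: a bare "Mozilla" gets through
```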

Speaking of "bad robots": The Chinese Baidu search engine is the worst of the 
worst.

It ignores robots.txt, which is bad enough already.

But once I started blocking it with iptables (address range 180.76.0.0/16)
and the rewrite rules quoted above, it came back from different IP address
ranges:

#### Baidu Spider:
180.76.0.0/16

# ShenZhen Sunrise Technology Co.,Ltd.
202.46.32.0/10

# SADF - CN
123.125.71.0/24

# CHINANET-IDC-BJ
220.181.0.0/16

# BAIDUJP
119.63.192.0/21

# CTIHK - City Telecom (H.K.) Ltd
183.178.0.0/16

# Victor Villar - Montevideo (errrr ... ok)
200.40.50.0/24
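Before feeding a list like this into iptables, it can be worth sanity-checking it with Python's standard ipaddress module. The sketch below (helper names are hypothetical) tests whether an address falls into any listed range and flags entries whose address has host bits set for their prefix. One entry trips that check: 202.46.32.0/10 normalizes to 202.0.0.0/10, which covers 202.0.0.0 through 202.63.255.255, about 4.2 million addresses and probably much broader than intended. Which narrower prefix, if any, was meant is not stated in the post.

```python
import ipaddress

# Ranges from the list above, copied as written.
BLOCKED_RANGES = [
    "180.76.0.0/16",
    "202.46.32.0/10",
    "123.125.71.0/24",
    "220.181.0.0/16",
    "119.63.192.0/21",
    "183.178.0.0/16",
    "200.40.50.0/24",
]

# strict=False masks off host bits instead of raising, so
# "202.46.32.0/10" becomes the network 202.0.0.0/10.
NETWORKS = [ipaddress.ip_network(r, strict=False) for r in BLOCKED_RANGES]

def misaligned(ranges):
    """Hypothetical helper: entries that are not a proper network address."""
    bad = []
    for r in ranges:
        try:
            ipaddress.ip_network(r)  # strict=True by default
        except ValueError:
            bad.append(r)
    return bad

def is_blocked(ip: str) -> bool:
    """Hypothetical helper: True if ip lies in any of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)

if __name__ == "__main__":
    print(misaligned(BLOCKED_RANGES))  # ['202.46.32.0/10']
    print(is_blocked("180.76.15.5"))   # True
    print(is_blocked("202.1.2.3"))     # True, only because of the /10 entry
```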

It came back quickly. Within a matter of hours it was crawling again from all
over the place, sometimes disguised by sending nothing but "Mozilla" as its
user agent string instead of anything related to Baidu.

There was even a crawl from Russia that was most likely related to Baidu, so I
blocked that as well for good measure.

I wonder which part of "No" they didn't understand.

-- 
With best regards

Michael Stauber