Small-Time Tinkering
In my website's log directory, first find the IPs of the Sogou spider:

    # grep -h -F "Sogou web spider" * | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 5
    109766 220.181.94.231
    26244 220.181.125.69
    93 220.181.94.235
    90 220.181.125.107
    83 220.181.94.236

Then see which user agents show up from the IP with the most requests:

    # grep -h -F "220.181.94.231" * | grep -v -F "robots.txt" | awk '{ for (i=12; i<=NF; i++) printf("%s ", $i); printf("\n"); }' | sort | uniq -c | sort -nr
    109497 "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
    187 "Sogou-Test-Spider/4.0 (compatible; MSIE 5.5; Windows 98)"
    109 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Avant Browser; InfoPath.1; .NET CLR 2.0.50727; .NET CLR1.1.4322)"
    70 "Tsinghua AI Lab Robot 2.0"
    55 "Tsinghua AI Lab Robot"
    35 "-"
    21 "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.7) Gecko/2009031915 Gentoo Firefox/3.0.7"
    18 "Sogou Pic Spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
    1 "Sogou Mobile Spider1.0 (http://wap.sogou.com)"

How interesting.

...
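The second pipeline takes everything from whitespace field 12 onward as the user agent, which only holds if the referer contains no spaces. Here is a minimal sketch of a slightly more defensive variant. It assumes the logs are in the Apache-style "combined" format (an assumption; the log format is not stated here), and the function name `ua_by_ip` is made up for illustration. It splits each line on double quotes instead, so the user agent is the sixth quote-delimited field, and it matches the IP only as the first field of the line. It will still misparse lines whose request or referer contains an escaped double quote.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): count the user agents
# seen from one client IP. Assumes combined-format access logs.
# Usage: ua_by_ip IP FILE...
ua_by_ip() {
    ip=$1
    shift
    # With '"' as the field separator:
    #   $1 = 'IP - - [date] ', $2 = request line, $4 = referer, $6 = user agent.
    # The client IP is the first whitespace-separated token of $1.
    awk -F'"' -v ip="$ip" '
        { split($1, head, " ") }
        # Keep only this IP, and skip requests for robots.txt.
        head[1] == ip && $2 !~ /robots\.txt/ { print $6 }
    ' "$@" | sort | uniq -c | sort -nr
}
```

Run from the log directory, `ua_by_ip 220.181.94.231 *` would play the role of the second command. Unlike that command, the user agents are printed without their surrounding quotes, and the robots.txt filter looks only at the request line rather than the whole log line.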