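The site was newly set up and had very little content, yet it was already being hit by large numbers of junk spiders and AI scraping crawlers.
As a result, the server's CPU usage stayed high and bandwidth was constantly consumed. This tutorial shows how to block unwanted search-engine spiders and AI crawlers. The traditional robots.txt file is not very effective against them, because not every spider obeys a site's robots.txt rules, so relying on robots.txt alone is not recommended.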
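Blocking methods:
1: Block in the nginx configuration file.
Taking the BaoTa panel (宝塔) as an example: log in to the panel, go to Websites, open the settings for the site in question, paste the rules into its configuration file, and then restart nginx for them to take effect.
If nginx was installed by script rather than through the panel, you can add the rules to nginx.conf instead. Put them inside the relevant server block, since nginx only allows if in the server and location contexts.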
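As a rough sketch of where such a rule sits, the fragment below places a shortened version of this post's user-agent check inside a server block. The domain example.com, the listen port, and the root path are placeholders assumed for illustration only; they are not taken from any real setup.

```nginx
server {
    listen 80;
    server_name example.com;          # hypothetical domain

    # Checked for every request to this site, before any location is chosen.
    # ~* makes the match case-insensitive.
    if ($http_user_agent ~* (SemrushBot|AhrefsBot|MJ12bot)) {
        return 444;                   # nginx-specific: close the connection, send nothing
    }

    location / {
        root /www/wwwroot/example.com;   # assumed site root
        index index.html;
    }
}
```

Running nginx -t before reloading checks the syntax of the edited file, and nginx -s reload then applies it without dropping existing connections.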
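Here are some commonly seen crawlers and spiders. Paste the rule in, save, and reload nginx:
# Block junk spiders (~* matches case-insensitively)
if ($http_user_agent ~* (SemrushBot|python|MJ12bot|AhrefsBot|hubspot|petalbot|ImagesiftBot|opensiteexplorer|leiki|BLEXBot|webmeup)) {
    # 444 is nginx-specific: close the connection without sending a response
    return 444;
}
There is also a more detailed list, which can be pasted into the configuration file in the same way: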
# Note: ~ is case-sensitive here; the trailing ^$ also rejects requests with an empty User-Agent
if ($http_user_agent ~ "Neevabot|TTD-Content|FeedDemon|ThinkBot|MTRobot|SMTBot|LieBaoFast|Punkspider|MauiBot|Barkrowler|MegaIndex.ru|JikeSpider|TkBot|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|SemrushBot|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|MJ12bot|heritrix|EasouSpider|LinkpadBot|Ezooms|bsalsa|DotBot|DataXu|Daum|BLEXBot|Scrapy|PetalBot|proximic|GrapeshotCrawler|Mail.RU_Bot|Nimbostratus-Bot|ias-|AdsTxtCrawler|SeznamBot|evc-batch|AspiegelBot|Re-re Studio|^$") {
    return 403;
}
2: Block crawlers and spiders in Apache with an .htaccess file (method 2)
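If crawlers and junk spiders are left unchecked, the site's CPU usage stays high and its bandwidth gets saturated. With the rise of large AI models, AI crawlers have joined in as well, and since they bring no real visitors, blocking them is well worth doing.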
<IfModule mod_rewrite.c>
    Options +FollowSymlinks -Multiviews
    RewriteEngine On
    # Block junk spiders: flag matching User-Agent strings with the BADBOT variable (needs mod_setenvif)
    SetEnvIfNoCase ^User-Agent$ .*(DataForSeoBot/1.0|dataforseo-bot|SemrushBot|SemrushBot-SA|Scrapy/1.7.3|Bytespider|BLEXBot|CompSpyBot|Exabot|Adsbot/3.1|serpstatbot/2.1|ZoominfoBot|ExtLinksBot|AlphaBot|DotBot|MauiBot|MegaIndex.ru|CCBot/2.0|SiteExplorer|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|Jorgee|SWEBot|spbot|TurnitinBot-Agent|perl|Python|Wget|Xenu|thesis-research-bot|ImagesiftBot|my-tiny-bot|Scrapy/1.5.1|Go-http-client/1.1|Yisou|facebook|gptbot|amazonbot|ZmEu) BADBOT
    # Apache 2.2-style access control; on Apache 2.4 these three lines require mod_access_compat
    Order Allow,Deny
    Allow from all
    Deny from env=BADBOT
    ErrorDocument 404 /404.html
    # Block malicious requests by filtering common attack signatures (e.g. script injection and SQL injection attempts)
    RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
    RewriteCond %{QUERY_STRING} (eval\(|union.*select) [NC]
    RewriteRule .* - [F,L]
    # Normal rules: send requests that match no existing directory or file to index.php
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    # RewriteRule ^(.*)$ index.php/$1 [QSA,PT,L]
    RewriteRule ^(.*)$ index.php [L,E=PATH_INFO:$1]
</IfModule>
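The Order / Allow / Deny directives used above come from Apache 2.2. On Apache 2.4 the same idea can be written with Require directives instead. The fragment below is a minimal sketch under a few assumptions: Apache 2.4 with mod_setenvif and mod_authz_core loaded, and a deliberately shortened bot list taken from the names in this post.

```apache
# Flag unwanted crawlers by User-Agent (case-insensitive match).
<IfModule mod_setenvif.c>
    SetEnvIfNoCase User-Agent "(SemrushBot|AhrefsBot|MJ12bot|Bytespider|gptbot)" BADBOT
</IfModule>

# Allow everyone except requests carrying the BADBOT flag.
<RequireAll>
    Require all granted
    Require not env BADBOT
</RequireAll>
```

Blocked requests receive a 403 response. In an .htaccess file this only works if the server's AllowOverride setting permits these directives.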




