What to do when crawler traffic goes off the charts and your server can't withstand AI crawlers

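Lately, many sites have been producing more than ten GB of logs a day. Open the logs and most of the entries are traces of AI crawlers, both domestic and foreign, requesting frequently and without pause. The server simply can't cope. Restricting them with robots.txt doesn't help, because they ignore it and keep crawling anyway. Even a 16-core, 32 GB server sits at 100% CPU day after day. So what can you do?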

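The only option is to block these crawlers at the source: write an nginx configuration file that returns 403 or 444 to all of them (444 is nginx's non-standard code that closes the connection without sending a response). Once they fail to fetch anything enough times, they stop coming back. Here is my blocking configuration: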

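# Add these lines on the origin server (uncomment them if nginx sits behind a proxy or CDN and should see the real client IP)
#set_real_ip_from 127.0.0.1;
#real_ip_header X-Forwarded-For;
#real_ip_recursive on;
# Left disabled: this pattern would also match ordinary Windows browsers. The parentheses are escaped so they match literally.
#if ($http_user_agent ~* "feesbook.com|Mozilla/5.0 \(Windows NT 10.0; Win64; x64\)") { return 444; }
#if ($http_user_agent ~ "feesbook.com|Mozilla/5.0 \(Windows NT 10.0; Win64; x64\)") { return 444; }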

if ($http_user_agent ~* "Scrapy|HttpClient|Curl|Wget|Idm|Aria2|Axel|Thunder|Youtube-dl|Movgrab|rtorrent|ctorrent|Transmission-cli|vuze|petalsearch|TinyTestBot|TestBot|Amazonbot|test-bot|fidget-spinner-bot|my-tiny-bot|thesis-research-bot|Baispider|facebook|GPTBot|ClaudeBot|PanguBot|postman|serpstatbot|apple.com|BacklinksExtendedBot|YodaoBot|msnbot|BitSightBot|DuckDuckBot|duckduckbot|Birdcrawlerbot|SeekportBot|duckduckgo|DuckDuckGo-Favicons-Bot|coccocbot|RU_Bot|SurdotlyBot|2ip bot|CCBot|iaskspider|AwarioBot|PubMatic Crawler Bot|RepoLookoutBot|qwant.com|Qwantify|ChatGLM-Spider|chatglm.cn|vxiaotou-spider|Pandalytics|domainsbot.com|yandex.com|Ansuduspider|ansudu.com|KStandBot|URLSuMaBot|SMTBot|smtbot|Twitterbot|SEOkicks|seokicks.de|SafeDNSBot|your-search-bot|Exabot|sunwukong bot|WebwikiBot|SkyworkSpider|fynd.bot|CheckMarkNetwork|WWUSearchBot|BDCbot|J2L3x-Bot|intelx.io|io_bot|TurnitinBot|yacybot|ZumBot|Diffbot|oBot|ImagesiftBot|Safari/602.4.8 Spider|CensysInspect|feesbook.com") { return 444; }

if ($http_user_agent ~ "python-requests|\xB2\xBB\xCA\xCA\xD3\xC3UA|python|Python|aiohttp|The Knowledge AI|BLEXBot|SemrushBot|AhrefsBot|DotBot|Uptimebot|MJ12bot|MegaIndex.ru|ZoominfoBot|Mail.Ru|SeznamBot|ExtLinksBot|aiHitBot|Researchscan|DnyzBot|spbot|YandexBot|DataForSeoBot|PetalBot|TinyTestBot|TestBot|Amazonbot|test-bot|fidget-spinner-bot|my-tiny-bot|thesis-research-bot|Baispider|facebook|GPTBot|ClaudeBot|PanguBot|Postman|serpstatbot|applebot|BacklinksExtendedBot|YodaoBot|msnbot|BitSightBot|DuckDuckBot|duckduckbot|Birdcrawlerbot|SeekportBot|duckduckgo|DuckDuckGo-Favicons-Bot|coccocbot|RU_Bot|SurdotlyBot|2ip bot|CCBot|iaskspider|AwarioBot|PubMatic Crawler Bot|RepoLookoutBot|qwant.com|Qwantify|ChatGLM-Spider|chatglm.cn|vxiaotou-spider|Pandalytics|domainsbot.com|yandex.com|Ansuduspider|ansudu.com|KStandBot|URLSuMaBot|SMTBot|smtbot|Twitterbot|SEOkicks|seokicks.de|SafeDNSBot|your-search-bot|Exabot|sunwukong bot|WebwikiBot|SkyworkSpider|fynd.bot|CheckMarkNetwork|WWUSearchBot|BDCbot|J2L3x-Bot|intelx.io|io_bot|TurnitinBot|yacybot|ZumBot|Diffbot|oBot|ImagesiftBot|Safari/602.4.8 Spider|CensysInspect") { return 444; }

Then add this line to the site's configuration file (this assumes the rules above were saved as site.conf):

include site.conf;
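
For context, here is a rough sketch of where that include line might sit. The server name, port, and root path are placeholders, not taken from the original setup. Because the rules are `if` directives, the include has to go inside a `server` (or `location`) block, and a relative path such as `site.conf` is looked up relative to nginx's configuration directory.

```nginx
# Hypothetical site definition; example.com and /var/www/example are placeholders.
server {
    listen 80;
    server_name example.com;

    # Pull in the User-Agent blocking rules for this site.
    include site.conf;

    root /var/www/example;
}
```

After editing, `nginx -t` checks the syntax and `nginx -s reload` applies the change without dropping the running server.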

Tested and confirmed working.
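
As an optional variation (not the setup described above), the same idea can be sketched with nginx's `map` directive, which keeps one pattern per line and so makes a long list easier to maintain. The variable name `$block_ua` is made up for this example, and only a handful of the patterns from the list are shown. The `map` block belongs in the `http` context, while the `if` stays in the site's `server` block.

```nginx
# http context, e.g. nginx.conf.
# $block_ua is a hypothetical variable: 1 means "drop this request".
map $http_user_agent $block_ua {
    default              0;
    # "~*" = case-insensitive regex, "~" = case-sensitive regex
    "~*GPTBot"           1;
    "~*ClaudeBot"        1;
    "~*CCBot"            1;
    "~*Amazonbot"        1;
    "~python-requests"   1;
}

# server context, e.g. site.conf
if ($block_ua) {
    return 444;   # close the connection without sending a response
}
```

In nginx, `if ($block_ua)` is false when the value is empty or "0", so unmatched User-Agents pass through untouched.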

 
