Notes: how to block crawlers from specific page directories in Nginx and Apache (while users can still access them)

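Here is a quick note on the requirement. Spiders were crawling the site's paginated listing pages, which drove up the server load. The goal was to stop crawlers from fetching these low-value category pagination directories while still letting visitors page through them normally. The first attempt was to block them with robots.txt, but that had little effect: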

User-agent: *
Disallow: /*/*/page/

1. Nginx: http block

map $http_user_agent $is_bot {
    default 0;
    # "~" makes this a case-sensitive regex match against the User-Agent header
    ~crawl|Slurp|spider|bingbot|tracker|click|parser 1;
}

2. Nginx: server block

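location ~ /(\d+)/(\d+)/page/ {
    if ($is_bot) {
        return 403; # Please respect the robots.txt file!
    }
    # Other visitors are still served by this location, so add any
    # directives (for example try_files) these URLs normally rely on.
}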
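
Before reloading Nginx, it can help to check which user agents and paths the two patterns above would actually catch. The following is a minimal Python sketch, not part of the Nginx configuration. It assumes Python's `re` module behaves like Nginx's PCRE for these simple patterns; the function name `would_block` and the sample requests are hypothetical.

```python
import re

# Same alternatives as the map block; "~" in an Nginx map is case-sensitive.
BOT_PATTERN = re.compile(r"crawl|Slurp|spider|bingbot|tracker|click|parser")

# Same pattern as the location block: /<digits>/<digits>/page/ anywhere in the path.
PAGE_PATTERN = re.compile(r"/(\d+)/(\d+)/page/")


def would_block(user_agent: str, path: str) -> bool:
    """Return True when this sketch predicts a 403 for the request."""
    is_bot = BOT_PATTERN.search(user_agent) is not None
    is_paged = PAGE_PATTERN.search(path) is not None
    return is_bot and is_paged


if __name__ == "__main__":
    bing = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0"
    samples = [
        (bing, "/2020/05/page/2/"),     # bot on a paginated URL
        (browser, "/2020/05/page/2/"),  # ordinary visitor on the same URL
        (bing, "/2020/05/some-post/"),  # bot on a non-paginated URL
    ]
    for user_agent, path in samples:
        print(would_block(user_agent, path), path)
```

One thing this makes visible: a user agent such as `Googlebot/2.1` contains none of the listed substrings, so the map as written does not classify it as a bot. Adding another alternative to the pattern would be needed to cover it.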

And if the server is Apache, how should it be configured?

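# Rewrite rules only take effect when the engine is on (omit if already enabled elsewhere)
RewriteEngine On

# Block real search engines that do not respect robots.txt, while letting ordinary requests pass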
# Google
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Googlebot/2\.[01];\ \+http://www\.google\.com/bot\.html\)$ [NC,OR]
# Bing
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ bingbot/2\.[01];\ \+http://www\.bing\.com/bingbot\.htm\)$ [NC,OR]
# msnbot
RewriteCond %{HTTP_USER_AGENT} ^msnbot-media/1\.[01]\ \(\+http://search\.msn\.com/msnbot\.htm\)$ [NC,OR]
# Slurp
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Yahoo!\ Slurp;\ http://help\.yahoo\.com/help/us/ysearch/slurp\)$ [NC]

# block all page searches, the rest may pass
RewriteCond %{REQUEST_URI} ^(/[0-9]{4}/[0-9]{2}/page/) [OR]

# or with the wpmp_switcher=mobile parameter set
RewriteCond %{QUERY_STRING} wpmp_switcher=mobile

# ISSUE 403 / SERVE ERRORDOCUMENT
RewriteRule .* - [F,L]
# End if match
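
How the conditions above combine is easy to misread: consecutive RewriteCond lines joined by [OR] form a group, and the groups are then combined with the implicit AND. The rule therefore fires when (any of the four user agents matches) and (the URI is a paginated path or the query string contains wpmp_switcher=mobile). The following Python sketch mirrors that grouping for illustration only; the function name `is_forbidden` and the sample requests are hypothetical, and it assumes Python's `re` treats these patterns the same way Apache's PCRE does.

```python
import re

# The four User-Agent conditions; [NC] corresponds to re.IGNORECASE.
UA_CONDITIONS = [
    re.compile(r"^Mozilla/5\.0 \(compatible; Googlebot/2\.[01]; \+http://www\.google\.com/bot\.html\)$", re.I),
    re.compile(r"^Mozilla/5\.0 \(compatible; bingbot/2\.[01]; \+http://www\.bing\.com/bingbot\.htm\)$", re.I),
    re.compile(r"^msnbot-media/1\.[01] \(\+http://search\.msn\.com/msnbot\.htm\)$", re.I),
    re.compile(r"^Mozilla/5\.0 \(compatible; Yahoo! Slurp; http://help\.yahoo\.com/help/us/ysearch/slurp\)$", re.I),
]

# The two target conditions (no [NC], so case-sensitive).
URI_CONDITION = re.compile(r"^(/[0-9]{4}/[0-9]{2}/page/)")
QUERY_CONDITION = re.compile(r"wpmp_switcher=mobile")


def is_forbidden(user_agent: str, request_uri: str, query_string: str = "") -> bool:
    """Return True when this sketch predicts the [F] rule would answer 403."""
    # First OR group: any of the listed crawler user agents.
    ua_group = any(p.search(user_agent) for p in UA_CONDITIONS)
    # Second OR group: paginated archive path, or the mobile switcher parameter.
    target_group = (
        URI_CONDITION.search(request_uri) is not None
        or QUERY_CONDITION.search(query_string) is not None
    )
    # The two groups are joined by the implicit AND.
    return ua_group and target_group


if __name__ == "__main__":
    google = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(is_forbidden(google, "/2020/05/page/2/"))
    print(is_forbidden(google, "/2020/05/some-post/"))
    print(is_forbidden(google, "/about/", "wpmp_switcher=mobile"))
```

Because the User-Agent patterns are anchored at both ends, only those exact strings match; a crawler sending a different variant of its user agent would pass through. The URI pattern also requires a four-digit and a two-digit segment, which is stricter than the Nginx pattern shown earlier.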

Author: 小小编
