In this article we look at the web server crawler protocol, robots.txt, on Linux. The material is analyzed and described from a practical point of view; I hope you get something out of it after reading.
Below is a summary of a common robots.txt configuration. Place the file directly under the web root directory so that crawlers can fetch it at /robots.txt; you can test the rules with the robots tool in Baidu Webmaster Tools, or simply fetch the file yourself, as shown in the sketch below. In these rules, * matches any sequence of characters and a trailing $ anchors the pattern at the end of the URL, so Disallow: /*.jpg$ blocks every URL whose path ends in .jpg.
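As a quick check that the file is actually being served from the web root, here is a minimal Python sketch (not part of the original article; http://example.com is a placeholder for your own host):

from urllib.request import urlopen

# Fetch /robots.txt from the site root, exactly as a crawler would.
with urlopen("http://example.com/robots.txt") as resp:
    print(resp.status)                  # expect 200 if the file is in place
    print(resp.read().decode("utf-8"))  # the rules crawlers will see

On the server in this example, the file looks like this: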
root@ubuntu:/var/www# cat robots.txt
User-agent: Baiduspider
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
User-agent: Googlebot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
User-agent: 360Spider
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
User-agent: msnbot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
User-agent: Sosospider
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
User-agent: YoudaoBot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
User-agent: bingbot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
User-agent: Sogou web spider/4.0
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
User-agent: *
Disallow: /
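This file allows each of the major search engine spiders to crawl the site while keeping them away from image files, and the final User-agent: * section denies everything to any crawler not listed. To see how the wildcard rules behave, here is a minimal Python sketch (not from the original article; the sample paths are illustrative) that translates a robots.txt path pattern into a regular expression and tests paths against it:

import re

def robots_pattern_to_regex(pattern):
    # '*' matches any sequence of characters, and a trailing '$' anchors
    # the pattern at the end of the URL path; everything else is literal.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path, disallow_patterns):
    # Robots rules match from the start of the path, hence re.match.
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)

# The image-blocking rules from the robots.txt above.
patterns = ["/*.jpg$", "/*.jpeg$", "/*.gif$", "/*.png$", "/*.bmp$"]
print(is_blocked("/images/logo.png", patterns))  # True: image URLs are blocked
print(is_blocked("/index.html", patterns))       # False: ordinary pages pass

Note that Python's standard urllib.robotparser module implements the original robots exclusion standard and does not understand the * and $ extensions, which is why this sketch matches the patterns by hand; the major spiders listed above do honor them.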
The above is how to understand the web server crawler protocol, robots.txt, on Linux. If you happen to have similar questions, the analysis above should serve as a reference; if you want to learn more, you are welcome to follow the industry information channel.