
How to understand the robots.txt crawler protocol for a web server on Linux

Updated 2025-01-19. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/01

This article explains how to understand the robots.txt crawler protocol for a web server on Linux, working from a real configuration file.

Summary of a common robots.txt configuration. (Tip: place the file directly under the web root directory; you can test it with the robots tool in Baidu Webmaster Tools.)

User-agent: Baiduspider
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: Googlebot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: 360Spider
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: msnbot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: Sosospider
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: YoudaoBot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
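Because every group in this configuration repeats the same rules, the file can be generated rather than typed by hand. The following is a minimal sketch, not part of the original article: the function name build_robots_txt, its parameters, and the optional catch-all group are assumptions made for illustration.

```python
# Hypothetical helper (not from the original article): generate a robots.txt
# that admits each named crawler but blocks common image file extensions.
IMAGE_EXTENSIONS = ["jpg", "jpeg", "gif", "png", "bmp"]


def build_robots_txt(crawlers, extensions=IMAGE_EXTENSIONS, block_others=False):
    """Return robots.txt text with one group per crawler name."""
    groups = []
    for name in crawlers:
        lines = [f"User-agent: {name}", "Allow: /"]
        # '*' is a wildcard and '$' anchors the end of the URL path.
        lines += [f"Disallow: /*.{ext}$" for ext in extensions]
        groups.append("\n".join(lines))
    if block_others:
        # Optional catch-all group: refuse every crawler not named above.
        groups.append("User-agent: *\nDisallow: /")
    # Groups are separated by a blank line.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["Baiduspider", "Googlebot"]), end="")
```

The output could be written to the web root (for example /var/www/robots.txt on the host shown in this article) after reviewing it.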

root@ubuntu:/var/www# cat robots.txt

User-agent: Baiduspider
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: Googlebot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: 360Spider
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: msnbot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: Sosospider
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: YoudaoBot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: bingbot
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: Sogou web spider/4.0
Allow: /
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

User-agent: *
Disallow: /
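To see how one group of these rules is evaluated, the sketch below checks a URL path against a list of Allow/Disallow patterns. It is an illustration only and assumes the longest-match convention described in RFC 9309 and in Google's robots.txt documentation (the most specific matching pattern wins, and Allow wins a tie); individual crawlers listed in this file may behave differently. The names pattern_to_regex and is_allowed are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules: list of (directive, pattern) pairs, directive is 'allow' or 'disallow'.

    Assumed semantics: the longest matching pattern decides; Allow wins ties;
    a path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# The rules of one named group from the file above.
GROUP_RULES = [("allow", "/")] + [
    ("disallow", f"/*.{ext}$") for ext in ("jpg", "jpeg", "gif", "png", "bmp")
]

if __name__ == "__main__":
    for path in ("/index.html", "/images/logo.jpg", "/images/logo.jpg?v=2"):
        print(path, is_allowed(GROUP_RULES, path))
```

Under these assumptions, ordinary pages are allowed and paths ending in an image extension are blocked. Because of the `$` anchor, a path such as /images/logo.jpg?v=2 no longer ends in .jpg and falls back to the Allow: / rule.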

That concludes this look at the robots.txt crawler protocol for a web server on Linux. The configuration above admits each listed crawler but keeps it away from image files, and the final User-agent: * group blocks all other crawlers. If you have similar questions, it can serve as a reference.
