How to use a robots.txt file to optimize a website so that spiders crawl it better

2025-04-13 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

Many newcomers are unsure how to use a robots.txt file to optimize a site so that spiders crawl it better. To help with that, this article walks through the file in detail.

The robots.txt file lives in the root directory of the site and tells Baidu's spider which content should be crawled and which should not. Used correctly, it helps SEO. The core vocabulary of the file is the pair of directives Allow and Disallow. Baidu officially recognizes this file, and the Baidu webmaster platform has a Robots section where you can check whether your site's robots.txt is written correctly.
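As a rough illustration of how Allow and Disallow interact, the sketch below feeds a small rule set to Python's standard urllib.robotparser module. The rules, the example.com URLs, and the is_allowed helper are made up for this example. Python's parser applies the first matching rule in file order, which may not be exactly how Baidu resolves conflicts, so treat it as a local approximation only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; only plain path prefixes are used here.
RULES = """\
User-agent: *
Allow: /uploads/public/
Disallow: /uploads/
"""


def is_allowed(robots_text, user_agent, url):
    """Return True if the given spider may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/uploads/public/logo.png",
        "http://example.com/uploads/private.png",
        "http://example.com/about/",
    ):
        print(url, is_allowed(RULES, "Baiduspider", url))
```

Paths that match no rule at all are treated as crawlable, which is why a robots.txt file usually only needs to list what should be blocked.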

Instructions for using the Baidu Robots tool

1. robots.txt tells Baidu which pages of your site may be crawled and which may not.

2. You can use the Robots tool to create, validate, and update your robots.txt file, or to check whether your site's robots.txt has taken effect on Baidu.

3. The Robots tool does not currently support https sites.

4. The Robots tool currently checks up to 48k of file content. Make sure your robots.txt file is not too large and that no directory path is longer than 250 characters.
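Before submitting a file to the tool, a quick local check of its size can save a round trip. The sketch below is only an approximation: it reads "48k" as 48 * 1024 bytes of UTF-8 text, which is an assumption, and it reports the longest rule path without enforcing a limit so you can compare it against the tool's stated maximum yourself. The function names are hypothetical.

```python
MAX_ROBOTS_BYTES = 48 * 1024  # "48k" read as 48 KiB; this reading is an assumption


def fits_size_limit(robots_text, max_bytes=MAX_ROBOTS_BYTES):
    """Return True if the UTF-8 encoded file is within the size limit."""
    return len(robots_text.encode("utf-8")) <= max_bytes


def longest_rule_path(robots_text):
    """Return the length in characters of the longest Allow/Disallow path."""
    longest = 0
    for line in robots_text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() in ("allow", "disallow"):
            longest = max(longest, len(value.strip()))
    return longest


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /wp-admin/\n"
    print(fits_size_limit(sample), longest_rule_path(sample))
```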

In the example I gave above, the Disallow line is flagged as a problem because the English (ASCII) colon ":" was typed as the Chinese full-width colon "：".
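This kind of mistake is hard to spot by eye, so it can help to scan the file for it. A minimal Python sketch, assuming the robots.txt content is already loaded as a string (the function name and the sample text are made up for illustration):

```python
FULLWIDTH_COLON = "\uff1a"  # the Chinese full-width colon


def find_fullwidth_colons(robots_text):
    """Return (line_number, line) pairs for lines containing a full-width colon."""
    return [
        (number, line)
        for number, line in enumerate(robots_text.splitlines(), start=1)
        if FULLWIDTH_COLON in line
    ]


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow\uff1a/wp-admin/\nDisallow: /feed\n"
    for number, line in find_fullwidth_colons(sample):
        print(f"line {number}: {line}")
```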

Of course, you can also go to the root directory of the website and add the robots.txt file there directly.

User-agent: * specifies which search engine spiders the rules apply to. The * means all spiders and is the usual default setting.

Disallow: /category/*/page/ blocks the paginated links under category directories. For example, on stcash.com, after you enter the "popularizing operational experience" category and turn the page once, the URL takes the form stcash.com/category/tuiguangyunying/page/2.
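The * inside a rule acts as a wildcard that stands for any run of characters. The sketch below shows one way to approximate that matching in Python by turning a rule into a regular expression. It is a simplification written for this article, not Baidu's actual matcher: it treats every rule as a prefix match and supports only * and a trailing $.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")  # a trailing $ pins the end of the URL
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    """Return True if the rule pattern matches the URL path from its start."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    rule = "/category/*/page/"
    print(rule_matches(rule, "/category/tuiguangyunying/page/2"))
    print(rule_matches(rule, "/category/tuiguangyunying/"))
```

With this rule, the first page of a category stays crawlable while page 2 and later are blocked.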

Disallow: /?s=* and Disallow: /*/?s=* block the site search results pages and the category search results pages, which do not need to be crawled again.

Disallow: /wp-admin/, Disallow: /wp-content/ and Disallow: /wp-includes/ cover the three system directories, which are generally blocked from spiders.

Disallow: /*/trackback blocks trackback links.

Disallow: /feed, Disallow: /*/feed and Disallow: /comments/feed block the feed subscription links.

Disallow: /?p=* blocks article short links, which automatically redirect to the long link.

For example, Zhu Haitao's blog has had its short links indexed before.

Disallow: /*/comment-page-* and Disallow: /*?replytocom* are two rules I explained in a previous article. They block the links generated by comments, which easily cause the same page to be indexed more than once.
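To see why comment links lead to duplicate indexing, it helps to notice that they are just variants of one article URL. The sketch below collapses such variants to a single address. It assumes WordPress-style URLs of the form /post/comment-page-2/ and /post/?replytocom=45; the function name and the example.com addresses are hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_article_url(url):
    """Strip comment pagination and reply parameters from an article URL."""
    parts = urlsplit(url)
    # Drop a trailing /comment-page-N/ segment.
    path = re.sub(r"/comment-page-\d+/?$", "/", parts.path)
    # Drop the replytocom query parameter and keep any others.
    query = urlencode(
        [(key, value) for key, value in parse_qsl(parts.query) if key != "replytocom"]
    )
    # The fragment (for example #respond) is discarded as well.
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


if __name__ == "__main__":
    variants = [
        "http://example.com/post-1/",
        "http://example.com/post-1/comment-page-2/",
        "http://example.com/post-1/?replytocom=45#respond",
    ]
    print({canonical_article_url(url) for url in variants})
```

All three variants reduce to the same article, so letting spiders crawl each of them only adds duplicates.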

At the end of the robots.txt file, you can also point to the sitemap file with a line such as Sitemap: http://***.com/sitemap.txt.
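To confirm that the Sitemap line is readable by a parser, one option is Python's standard urllib.robotparser, whose site_maps() method is available from Python 3.8. The sample file and the example.com address below are placeholders, and the helper name is made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_text):
    """Return the sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow: /feed\n"
        "Sitemap: http://example.com/sitemap.txt\n"
    )
    print(sitemap_urls(sample))
```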

For the Sitemap address directive, the mainstream formats are txt and xml. Here I share a sitemap file in txt format written by Zhang Ge.

Save the above code to a txt file, upload it to the root directory of the site, and specify it in the robots.txt file.
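A txt sitemap is simply a list of page URLs, one per line. As a rough Python sketch of producing one (this is not Zhang Ge's code; the function names, the example.com URLs, and the output path are made up for illustration):

```python
from pathlib import Path


def build_txt_sitemap(urls):
    """Return txt sitemap content: one URL per line, duplicates removed in order."""
    unique_urls = dict.fromkeys(urls)  # dicts preserve insertion order
    return "\n".join(unique_urls) + "\n"


def write_txt_sitemap(urls, destination):
    """Write the sitemap content to the given file path as UTF-8."""
    Path(destination).write_text(build_txt_sitemap(urls), encoding="utf-8")


if __name__ == "__main__":
    pages = [
        "http://example.com/",
        "http://example.com/post-1/",
        "http://example.com/",
    ]
    print(build_txt_sitemap(pages), end="")
```

The resulting file would then be uploaded to the site root and referenced by the Sitemap line in robots.txt.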

Here I would like to share my robots.txt file with you.


User-agent: *
Disallow: /wp-admin/
Disallow: /*/comment-page-*
Disallow: /*?replytocom*
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /category/*/page/
Disallow: /*/trackback
Disallow: /feed
Disallow: /*/feed
Disallow: /comments/feed
Disallow: /?s=*
Disallow: /*/?s=*
Disallow: /attachment/
Disallow: /tag/*/page/
Sitemap: http://www.stcash.com/sitemap.xml
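Before relying on a rule set like this, it is worth trying a few representative paths against it. The sketch below does that in Python with a simplified matcher written for this article: it assumes a single User-agent: * group, handles only Disallow lines, and treats * as "any run of characters" with prefix matching. It is not Baidu's implementation, and the sample paths are made up.

```python
import re

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /*/comment-page-*
Disallow: /*?replytocom*
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /category/*/page/
Disallow: /*/trackback
Disallow: /feed
Disallow: /*/feed
Disallow: /comments/feed
Disallow: /?s=*
Disallow: /*/?s=*
Disallow: /attachment/
Disallow: /tag/*/page/
Sitemap: http://www.stcash.com/sitemap.xml
"""


def disallow_patterns(robots_text):
    """Collect the non-empty Disallow values in file order."""
    patterns = []
    for line in robots_text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def to_regex(pattern):
    """Turn a rule into a regex in which * matches any run of characters."""
    return re.compile("".join(".*" if ch == "*" else re.escape(ch) for ch in pattern))


def blocking_rule(path, patterns):
    """Return the first Disallow pattern that matches the path, or None."""
    for pattern in patterns:
        if to_regex(pattern).match(path):
            return pattern
    return None


if __name__ == "__main__":
    rules = disallow_patterns(ROBOTS_TXT)
    for path in (
        "/category/tuiguangyunying/",
        "/category/tuiguangyunying/page/2",
        "/?s=seo",
        "/post-1/?replytocom=45",
    ):
        print(path, "->", blocking_rule(path, rules))
```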
