Several ways to prevent website content from being indexed by search engines

2025-02-22 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article describes several practical ways to prevent website content from being indexed by search engines.

Usually the goal of building a website is to have search engines index it and so widen its reach. But if your site contains private or confidential pages that are not meant to be public, how do you keep search engines from crawling them? Taobao, for example, is a site that blocks search engines. This article covers several ways to block search engines from crawling a website's content.

Search engine spiders crawl the Internet continuously. If a site takes no steps to opt out, it will readily be indexed. Here is how to keep search engines from indexing website content.

First, the robots.txt method

Search engines follow the robots.txt protocol by default (although some rogue crawlers may ignore it). Create a plain-text file named robots.txt, place it in the root directory of the website, and edit it as follows:

User-agent: *

Disallow: /

The code above tells search engines not to crawl the site. Note that, used exactly as shown, it bars all search engines from every part of the site.
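One way to sanity-check such a file before deploying it is to feed the rules to a robots.txt parser. The sketch below uses Python's standard urllib.robotparser; the helper name is_allowed, the crawler names, and the example.com URL are placeholders chosen for illustration, and a real search engine's own parser may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt shown above, as a list of lines.
BLOCK_ALL_RULES = ["User-agent: *", "Disallow: /"]


def is_allowed(rules, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse in-memory lines; no network access needed
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical crawler names; every one of them should be refused.
    for agent in ("Baiduspider", "Googlebot", "SomeOtherBot"):
        allowed = is_allowed(BLOCK_ALL_RULES, agent, "https://example.com/any/page.html")
        print(agent, "allowed" if allowed else "blocked")
```

Because the only group uses the wildcard user-agent, every crawler name falls through to the same Disallow: / rule.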

To block only the Baidu search engine from crawling the site, edit the robots.txt file as follows:

User-agent: Baiduspider

Disallow: /

The robots.txt file above blocks all crawling by Baidu.

A note on Baidu's user-agents: what is the user-agent of Baiduspider?

Baidu uses a different user-agent for each product:

Product name                    User-agent
Wireless search                 Baiduspider
Image search                    Baiduspider-image
Video search                    Baiduspider-video
News search                     Baiduspider-news
Baidu bookmarks (Soucang)       Baiduspider-favo
Baidu Alliance                  Baiduspider-cpro
Business search                 Baiduspider-ads
Web pages and other searches    Baiduspider

You can set different crawling rules for each product's user-agent. The following robots.txt blocks all crawling by Baidu but allows image search to crawl the /image/ directory:

User-agent: Baiduspider

Disallow: /

User-agent: Baiduspider-image

Allow: /image/
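How overlapping groups such as Baiduspider and Baiduspider-image are resolved depends on the parser. The sketch below checks a more defensive variant of these rules with Python's standard urllib.robotparser, which applies the first group whose name matches the crawler. For that reason the variant lists the more specific Baiduspider-image group first and repeats Disallow: / inside it. This ordering, the helper name can_crawl, and the example.com host are assumptions of the sketch, not documented Baidu behavior.

```python
from urllib.robotparser import RobotFileParser

# Defensive variant (an assumption of this sketch): specific group first,
# and the image group carries its own Disallow so nothing else leaks through.
DEFENSIVE_RULES = """\
User-agent: Baiduspider-image
Allow: /image/
Disallow: /

User-agent: Baiduspider
Disallow: /
""".splitlines()


def can_crawl(user_agent, path, rules=DEFENSIVE_RULES):
    """Return True if `user_agent` may fetch `path` on a placeholder host."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(user_agent, "https://example.com" + path)


if __name__ == "__main__":
    print(can_crawl("Baiduspider-image", "/image/logo.png"))
    print(can_crawl("Baiduspider-image", "/private.html"))
    print(can_crawl("Baiduspider", "/image/logo.png"))
```

Other parsers may pick the most specific group regardless of order, so it is worth testing the file against the behavior of the crawler you actually care about.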

Please note: pages crawled by Baiduspider-cpro and Baiduspider-ads are not added to the index; these crawlers only carry out operations agreed with the customer. They therefore do not follow the robots protocol, and if that causes a problem you need to contact Baidu to resolve it.

To block only the Google search engine from crawling the site, edit the robots.txt file as follows:

User-agent: googlebot

Disallow: /
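The per-engine files above all follow one pattern, so they can be generated. A minimal sketch, assuming two hypothetical helpers: build_robots_txt, which takes the crawler names to block, and blocked, which checks the result with Python's standard urllib.robotparser. The example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    # Groups are separated by a blank line, as in the examples above.
    return "\n\n".join(groups) + "\n"


def blocked(robots_txt, user_agent, url="https://example.com/"):
    """Return True if `user_agent` is refused `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    text = build_robots_txt(["Baiduspider", "googlebot"])
    print(text)
    print("Baiduspider blocked:", blocked(text, "Baiduspider"))
    print("bingbot blocked:", blocked(text, "bingbot"))
```

Crawlers that are not named in the file match no group and remain free to crawl the site.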

Second, the page-code (meta tag) method

Add the following tag between <head> and </head> in the code of the site's home page. It tells search engines not to display a cached snapshot of the page:

<meta name="robots" content="noarchive">

To stop only the Baidu search engine from displaying snapshots of the page, add this tag between <head> and </head> instead:

<meta name="Baiduspider" content="noarchive">

To stop only the Google search engine from displaying snapshots of the page, add this tag between <head> and </head>:

<meta name="googlebot" content="noarchive">

Note that noarchive only suppresses the snapshot. It does not stop the page from being crawled or indexed; use robots.txt or a noindex directive for that.
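A page can be checked for such tags programmatically. The following is a minimal sketch using Python's standard html.parser. It assumes the tags take the form <meta name="robots" content="noarchive">, with Baiduspider or googlebot as the name when a single engine is targeted; the class and function names and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

# Meta names treated as crawler directives in this sketch (an assumption).
CRAWLER_META_NAMES = {"robots", "baiduspider", "googlebot"}


class RobotsMetaParser(HTMLParser):
    """Collect crawler meta directives, keyed by lowercase meta name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name in CRAWLER_META_NAMES:
            # content is a comma-separated list such as "noindex, follow"
            parts = attr.get("content", "").split(",")
            values = {p.strip().lower() for p in parts if p.strip()}
            self.directives.setdefault(name, set()).update(values)


def find_robots_meta(html):
    """Return {meta name: set of directives} found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


SAMPLE_PAGE = (
    '<html><head><meta name="Baiduspider" content="noarchive">'
    '<meta name="ROBOTS" content="NOINDEX, FOLLOW"></head><body></body></html>'
)

if __name__ == "__main__":
    print(find_robots_meta(SAMPLE_PAGE))
```

Names and directives are lowercased before comparison because meta robots values are commonly written in either case.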

In addition, some less common situations come up, such as the following:

1. A robots.txt file has been added to the site, but the site still shows up in Baidu search. Why?

Because updating the search engine's index database takes time. Although Baiduspider has stopped accessing the pages on your site, it may take months for index information already built in the Baidu database to be cleared. Also check that your robots.txt is configured correctly. If you urgently need the content removed, you can also submit a request through Baidu's complaint platform.

2. What should I do if I want my site's content indexed by Baidu but without a saved snapshot?

Baiduspider complies with the Internet meta robots protocol. You can use a page's meta settings so that Baidu indexes the page but does not display a snapshot of it in the search results. As with robots.txt updates, updating the index database takes time: even after you have used meta to prohibit snapshots, if index information for the page already exists in the Baidu database, it may take two to four weeks for the change to take effect online.

3. To have the site indexed by Baidu without a saved snapshot, use the following code:

<meta name="Baiduspider" content="noarchive">

4. To prevent all search engines from saving snapshots of your pages, use the following code:

<meta name="robots" content="noarchive">

Here are some common code combinations:

<meta name="robots" content="index,follow">
This page may be indexed, and the links on it may be followed.

<meta name="robots" content="noindex,follow">
This page may not be indexed, but the links on it may be followed.

<meta name="robots" content="index,nofollow">
This page may be indexed, but the links on it may not be followed.

<meta name="robots" content="noindex,nofollow">
This page may not be indexed, and the links on it may not be followed.
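The four combinations come from two independent on/off choices in a robots meta tag: index or noindex, and follow or nofollow. A minimal sketch that builds the tag from those two choices; robots_meta is a hypothetical helper name, and the tag layout is the conventional one rather than anything specific to this site.

```python
def robots_meta(index=True, follow=True, name="robots"):
    """Build a robots meta tag from the two on/off choices.

    index  -- whether the page itself may be indexed
    follow -- whether links on the page may be followed
    name   -- "robots" for all engines, or a single crawler's name
    """
    content = ",".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return f'<meta name="{name}" content="{content}">'


if __name__ == "__main__":
    # Print all four combinations in the same order as the list above.
    for index in (True, False):
        for follow in (True, False):
            print(robots_meta(index, follow))
```

Passing a crawler name such as Baiduspider as the name argument limits the directive to that engine.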

Thank you for reading! That concludes this article on how to prevent website content from being indexed by search engines. I hope it has been of some help.


© 2024 shulou.com SLNews company. All rights reserved.
