Why are articles repeatedly indexed by Baidu in website development?

2026-02-13 Update From: SLTechnology News&Howtos

Why does Baidu index the same article more than once? This article analyzes the problem in detail and gives answers, in the hope of helping others who face the same problem find a simple, workable fix.

Symptoms:

In Baidu's index, the following URLs appear in addition to the original page:

http://www.stcash.com/5273/comment-page-1

http://www.stcash.com/5273?replytocom=1989

The first URL unexpectedly has a third directory level, while the second looks like a dynamic URL with a query string.

Cause analysis:

I traced both URLs to their source: the links in the article's comment section. The replytocom=1989 URL is the reply link of one comment.

The article has four comments, and each comment has its own replytocom link. Baidu Spider shows some intelligence here: it indexed only one of the four replytocom URLs. It is not smart enough, however, to recognize that the reply link and the original link lead to the same article content.

The comment-page-1 URL also comes from the comment section; comment-page-1 is the first page of the comments. With a lot of comments, say 1000, one page cannot show them all, so there will also be comment-page-2, comment-page-3, and so on. This is the comment pagination feature. It keeps the page from growing too long when there are many comments, which would slow page loading and hurt the user experience. Unfortunately, Baidu Spider still cannot tell that these pages and the original article are the same content.
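To make the duplication concrete, here is a minimal Python sketch (standard library only, run outside WordPress) that reduces both kinds of comment URL to the article URL they duplicate. The function name canonical_url is hypothetical, and the pattern assumes only the /comment-page-N and ?replytocom=N forms shown above.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Matches a trailing comment-pagination segment such as /comment-page-2
COMMENT_PAGE = re.compile(r"/comment-page-\d+/?$")

def canonical_url(url: str) -> str:
    """Strip comment pagination and replytocom so duplicate URLs collapse."""
    parts = urlsplit(url)
    path = COMMENT_PAGE.sub("", parts.path)
    # Keep every query parameter except replytocom
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "replytocom"]
    # The fragment is dropped; crawlers do not treat it as part of the page URL
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))

urls = [
    "http://www.stcash.com/5273",
    "http://www.stcash.com/5273/comment-page-1",
    "http://www.stcash.com/5273?replytocom=1989",
]
# All three variants reduce to a single article URL
print({canonical_url(u) for u in urls})
```

A human reader sees at once that the three URLs are one article; the spider needs an explicit hint, which is what the fixes in this article provide.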

Solution:

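1. For the duplicate comment-page-1 URLs, there are two fixes.

1) Turn off comment pagination in the WordPress admin dashboard (the "Break comments into pages" option under Settings > Discussion).

2) Edit robots.txt and add a rule that blocks the comment pages. Because robots.txt rules are matched from the start of the URL path, a plain /comment-page- rule would not match /5273/comment-page-1, so use a wildcard:

Disallow: /*/comment-page-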

robots.txt lives in the root directory of the site. You can check the result of your settings by opening /robots.txt on your domain. If there is no such file in the root directory, WordPress generates a virtual one. The code that does this is in wp-includes/functions.php:

$output = "User-agent: *\n";
$public = get_option('blog_public');
if ('0' == $public) {
    // Site is set to discourage search engines: block everything
    $output .= "Disallow: /\n";
} else {
    $site_url = parse_url(site_url());
    // Prefix rules with the install path when WordPress lives in a subdirectory
    $path = (!empty($site_url['path'])) ? $site_url['path'] : '';
    $output .= "Disallow: $path/wp-admin/\n";
}

After the line $output .= "Disallow: $path/wp-admin/\n"; add the line $output .= "Disallow: $path/*/comment-page-\n"; (the * wildcard is needed because the comment pages sit under each article's path).

2. For the duplicate replytocom URLs, add this rule to robots.txt:

Disallow: /*?replytocom=

Or add the nofollow attribute to every link that contains replytocom:

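// Run add_nofollow on every comment reply link (priority 420, 4 arguments)
add_filter('comment_reply_link', 'add_nofollow', 420, 4);

function add_nofollow($link, $args, $comment, $post) {
    // Insert rel='nofollow' in front of the href attribute
    return str_replace("href=", "rel='nofollow' href=", $link);
}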
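Before deploying robots.txt rules, it helps to check which URLs they would actually match. The following Python sketch is a simplified matcher, not any crawler's real implementation. It assumes the widely documented convention that a Disallow value is matched as a prefix of the path plus query string, with * as a wildcard and a trailing $ as an end anchor. The helper names rule_to_regex and is_blocked are hypothetical, and Allow rules and rule precedence are not handled.

```python
import re
from typing import Iterable
from urllib.parse import urlsplit

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a Disallow value into a regex anchored at the path start."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything, then turn the escaped '*' back into "any characters"
    body = re.escape(rule).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(url: str, rules: Iterable[str]) -> bool:
    """Return True if any Disallow rule matches the URL's path and query."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # re.match only matches at the start of the string, like a prefix rule
    return any(rule_to_regex(rule).match(target) for rule in rules)

article = "http://www.stcash.com/5273"
paged = "http://www.stcash.com/5273/comment-page-1"
reply = "http://www.stcash.com/5273?replytocom=1989"

print(is_blocked(paged, ["/comment-page-"]))    # prefix rule without wildcard
print(is_blocked(paged, ["/*/comment-page-"]))  # wildcard rule
print(is_blocked(reply, ["/*?replytocom="]))
print(is_blocked(article, ["/*/comment-page-", "/*?replytocom="]))
```

Under these assumptions, a rule without a wildcard only matches paths that begin with /comment-page-, so the wildcard form is the one that covers comment pages nested under an article path, while the original article URL stays crawlable.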

That covers why Baidu indexed the same article more than once and how to stop it. I hope the above is of some help.
