
What does search engine crawl share mean?


This article explains in detail what search engine crawl share means. The editor finds it very practical and shares it here for your reference; I hope you get something out of reading it.

What is search engine crawl share?

As the name implies, crawl share is the upper limit on the total time a search engine spider will spend crawling pages on a given site. For any particular site, the total time the spider spends on it is relatively fixed; it will not crawl all of the site's pages without limit.

Google's English term for this is crawl budget, which translates literally as "crawling budget." I don't think that quite conveys what it means, so I use "crawl share" to express the concept.

What determines crawl share? It comes down to two things: crawl demand and the crawl rate limit.

Crawl demand

Crawl demand refers to how many pages a search engine "wants" to crawl on a particular site.

Two main factors determine crawl demand. The first is page weight: however many pages on the site reach the basic page-weight threshold, that is how many pages the search engine wants to crawl. The second is whether pages in the index have gone too long without being refreshed. In the end this also comes down to page weight, since high-weight pages are not left stale for long.

Page weight and website weight are closely related: raising the website's weight makes the search engine willing to crawl more of its pages.

Crawl rate limit

A search engine spider will not drag down a website's server just to crawl more pages, so it sets an upper limit on its crawl rate for each site. This crawl rate limit is the ceiling the server can bear: within it, the spider's crawling does not slow the server down or affect user access.

When the server responds quickly enough, the limit is raised a little and crawling speeds up; when server response slows, the limit drops, crawling slows down, and it may even stop.

The crawl rate limit thus caps the number of pages the search engine is able to crawl.
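
The behavior just described is essentially an adaptive rate limiter. As a rough illustration only (this is not any search engine's actual algorithm; the thresholds, step sizes, and HTTP client are all assumptions), here is a minimal sketch in Python:

```python
import time

import requests  # third-party HTTP client, assumed available

# Illustrative thresholds only; real crawlers tune these continuously.
FAST_RESPONSE = 0.3  # seconds; responses faster than this let us speed up
SLOW_RESPONSE = 1.5  # seconds; responses slower than this make us back off

def crawl(urls, delay=1.0, min_delay=0.1, max_delay=30.0):
    """Fetch URLs one by one, adapting the pause between requests
    to how quickly the server responds."""
    for url in urls:
        start = time.monotonic()
        response = requests.get(url, timeout=10)
        elapsed = time.monotonic() - start
        if elapsed < FAST_RESPONSE:
            delay = max(min_delay, delay * 0.8)  # fast server: crawl faster
        elif elapsed > SLOW_RESPONSE:
            delay = min(max_delay, delay * 2.0)  # slow server: back off
        yield url, response.status_code, elapsed
        time.sleep(delay)  # honor the current rate limit
```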

What determines crawl share?

Crawl share is the result of weighing crawl demand against the crawl rate limit: the number of pages the search engine both "wants" to crawl and "can" crawl.

With high website weight, high-quality page content, enough pages, and a fast enough server, the crawl share is large.
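
Put as a formula, crawl share is roughly the smaller of the two quantities. A trivial sketch with invented numbers:

```python
def crawl_share(pages_wanted_per_day: int, pages_server_can_handle_per_day: int) -> int:
    """Pages the search engine both wants to and is able to crawl per day."""
    return min(pages_wanted_per_day, pages_server_can_handle_per_day)

# Invented numbers: demand for 80,000 pages/day against capacity for 50,000.
print(crawl_share(80_000, 50_000))  # 50000 -- server capacity is the bottleneck
```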

Small websites need not worry about crawl share

A small website has few pages. Even if the site's weight is low and its server slow, search engine spiders usually still crawl at least a few hundred pages a day, so the whole site gets crawled once within a dozen days or so. Sites with a few thousand pages therefore need not worry about crawl share at all, and sites with tens of thousands of pages are generally not a problem either. And if a few hundred visits a day can slow the server down, SEO is not the main thing to worry about.

Large and medium-sized websites may need to consider crawl share

For large and medium-sized websites with hundreds of thousands of pages or more, it may be necessary to consider whether the crawl share is sufficient.

If the share is insufficient, for example a website with 10 million pages of which the search engine can only crawl tens of thousands a day, then crawling the site once could take several months or even a year. That can mean important pages never get crawled and therefore never rank, or that important pages are not refreshed in time.
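
Spelling out the arithmetic behind that example (the per-day figure is an assumption standing in for "tens of thousands"):

```python
total_pages = 10_000_000  # pages on the site
pages_per_day = 50_000    # assumed crawl share: "tens of thousands" a day

days_per_full_crawl = total_pages / pages_per_day
print(days_per_full_crawl)  # 200.0 -- well over six months for one full pass
```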

If you want the site's pages to be crawled promptly and fully, first make sure the server is fast enough and the pages are small enough. If the site has a large volume of high-quality data and its crawl share is limited by the crawl rate, then improving page speed directly raises the crawl rate limit, and with it the crawl share.

Baidu's webmaster platform and Google Search Console both report crawl data. The following figure shows the frequency of Baidu's crawls on one website:

The figure above is for a small site on the scale of the author's SEO blog. Its page crawl frequency shows little relation to crawl time (which depends on server speed and page size), indicating that the crawl share is not being used up. Nothing to worry about.

Sometimes there is a clear correspondence between crawl frequency and crawl time, as shown below for another, larger website:

As the figure shows, improvements in crawl time (achieved by reducing page size, speeding up the server, and optimizing the database) clearly led to higher crawl frequency, so more pages were crawled and indexed and a full pass over the site completed faster.

An example from Google Search Console for a larger site:

The top chart is the number of pages crawled per day and the middle is the volume of data downloaded; barring server errors, the two should track each other. The bottom is page download time. As you can see, with pages downloading fast enough, millions of pages get crawled every day.

Of course, as noted earlier, being able to crawl millions of pages is one side of it; whether the search engine wants to crawl them is the other.

Another reason large websites often need to consider crawl share is to avoid wasting a limited share on crawling meaningless pages, leaving no capacity for the important pages that should be crawled.

Typical pages that waste crawl share are:

Large numbers of faceted filter pages (discussed in detail a few years ago in a post on crawling and indexing of invalid URLs), duplicate content within the site, low-quality and junk content, and infinite page spaces such as calendars.

Pages like these can be crawled in large numbers and use up the crawl share, while the pages that actually should be crawled get no chance.

How to save crawl share?

The first step, of course, is to reduce page file size, speed up the server, and optimize the database, all of which cut crawl time.

Then try to avoid the wasteful page types listed above. Some are content-quality problems and some are site-structure problems. For structural problems, the simplest fix is to disallow crawling in the robots.txt file, though this wastes some page weight, since weight flows into the blocked pages but cannot flow back out.
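
For example, a robots.txt along these lines keeps spiders out of filter and calendar URL spaces (the directory paths are invented for illustration; substitute your site's actual problem areas):

```
User-agent: *
# Hypothetical paths for the wasteful page types listed above
Disallow: /filter/
Disallow: /calendar/
```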

In some cases, using the nofollow link attribute can save crawl share. For small websites it is pointless, since their crawl share is never used up anyway. For large websites, nofollow can control the flow and distribution of weight to some extent: well-planned nofollow lowers the weight of meaningless pages and raises the weight of important ones. When a search engine crawls, it works from a URL list sorted by page weight. As important pages gain weight they get crawled earlier, while meaningless pages may end up with weight so low that the search engine does not want to crawl them at all; a toy model of this ordering follows.
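
As a toy model of that weight-sorted crawl list (the URLs and weights are invented; real page weight is far more involved), a short Python sketch:

```python
import heapq

def crawl_order(url_weights: dict[str, float]) -> list[str]:
    """Return URLs in descending page-weight order, mimicking the
    prioritized crawl list described above (illustrative model only)."""
    heap = [(-weight, url) for url, weight in url_weights.items()]  # min-heap, so negate
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# The important page is crawled first; under a capped crawl share, the
# near-zero-weight page at the end of the list may never be reached.
print(crawl_order({"/important-page": 0.9, "/about": 0.4, "/tag/obscure-filter": 0.01}))
```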

A few final notes:

Adding nofollow to a link does not waste crawl share, though in Google it does waste weight.

Noindex tags do not save crawl share. For a search engine to learn that a page carries a noindex tag, it has to crawl the page first, so nothing is saved.

Canonical tags can sometimes save a little crawl share. As with noindex, the search engine must crawl a page to see its canonical tag, so no crawl share is saved directly; however, pages carrying canonical tags tend to be crawled less often over time, which does save a little. (Both tags are shown in the snippet below.)

Crawl rate and crawl share are not ranking factors, but a page that is never crawled cannot rank either.
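
For reference, these are the two tags the notes above refer to, with a placeholder URL:

```html
<!-- Placed in the page's <head>; example.com is a placeholder -->
<meta name="robots" content="noindex">
<link rel="canonical" href="https://www.example.com/preferred-url/">
```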

That is the end of this article on what search engine crawl share means. I hope the content above has been of some help and that you have learned something from it. If you think the article is good, please share it for more people to see.
