2025-01-17 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report --
This article explains techniques for keeping website content from being crawled by search engines. The methods introduced here are simple, fast, and practical.
Some readers may wonder: isn't the usual goal to have search engines crawl as many of a site's pages as possible? Why would anyone want to keep certain page content from being crawled?
First of all, the weight (link equity) a website can pass on is limited; even a PR10 site cannot divide its weight infinitely. This weight is spread across both links to other people's sites and internal links within your own site.
External links are beyond the scope of this article: unless you are deliberately trying to cheat the rankings, you generally want them crawled by search engines.
Internal links are another matter, because many websites contain a great deal of repetitive or redundant content, for example search result pages generated by filter conditions. B2C sites in particular often let visitors filter products by type, model, color, size, and so on, either on a dedicated query page or from a widget on every product page. These pages are very convenient for visitors, but very time-consuming for search engines to crawl, especially on sites with many pages. They also dilute page weight, which is bad for SEO.
In addition, pages such as the admin login page, backup pages, and test pages are content that webmasters do not want search engines to index.
It is therefore sometimes necessary to keep certain content, or certain whole pages, out of the search engine index.
First, several relatively effective methods:
1. Display content you don't want indexed in Flash
As is well known, search engines have only a limited ability to crawl Flash content and cannot fully extract everything from every Flash file. Unfortunately, there is no guarantee that content in Flash will never be crawled, because Google and Adobe have been working to improve Flash crawling technology.
2. Using a robots file
This is currently the most effective method, but it has a big drawback: the blocked page can neither pass nor receive any link value. As we all know, a healthy page in SEO terms should have links both in and out: external links pointing to it, and links from it out to other pages. A page blocked by the robots file is opaque to the search engine, which cannot see its content at all; such a page may be classified as low quality, and its weight may be penalized to some extent. This method is therefore mostly used for admin pages, test pages, and the like.
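As a sketch of how a well-behaved crawler interprets robots rules, the snippet below uses Python's standard `urllib.robotparser` against a hypothetical robots.txt that blocks the admin and test pages mentioned above (the paths `/admin/` and `/test/` and the domain `example.com` are illustrative assumptions, not from the original article):

```python
from urllib import robotparser

# Hypothetical robots.txt: block admin and test pages, allow everything else.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /test/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant spider checks can_fetch() before requesting a URL.
print(rp.can_fetch("*", "https://example.com/admin/login.html"))    # False (blocked)
print(rp.can_fetch("*", "https://example.com/products/item.html"))  # True (crawlable)
```

Note that robots.txt only asks crawlers not to fetch the page; it does not hide the URL itself, which is one reason blocked pages can still appear as low-quality, content-less entries.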
3. Mark links you don't want followed with nofollow
This method cannot completely guarantee that the target page will not be indexed, because search engines are not strictly required to obey the attribute. Moreover, if an external site links to the page without nofollow, it is still likely to be crawled.
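For illustration, nofollow is applied as a `rel` attribute on the individual links pointing at the page you want to keep out of the index (the `/admin/` path here is a hypothetical example):

```html
<!-- Ask search engines not to follow this particular link -->
<a href="/admin/" rel="nofollow">Site administration</a>
```

As the paragraph above notes, this only annotates your own links; any plain link to the same URL from another site can still lead a crawler there.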
4. Use a Meta noindex tag together with a follow tag
This method prevents the page from being indexed while still allowing it to pass weight; whether you want to pass weight on is up to the webmaster's own needs. The disadvantage is that spiders still waste a great deal of time crawling these pages.
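The combination described above is expressed as a single robots meta tag in the page head, roughly like this:

```html
<head>
  <!-- Do not index this page, but do follow (and pass weight through) its links -->
  <meta name="robots" content="noindex, follow">
</head>
```

Unlike a robots.txt block, the spider must actually fetch the page to see this tag, which is exactly the crawl-time cost the paragraph above mentions.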
5. Use a robots file together with an iframe tag
A robots file can block the host page while an iframe on it pulls in a separate document that remains crawlable. So you put the content you do not want indexed in the normal page body, and the content you do want indexed in the page loaded by the iframe tag.
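A minimal sketch of the arrangement the article describes, assuming a hypothetical blocked host page and a crawlable iframe source at `/indexable/description.html`:

```html
<!-- Host page: its own URL is disallowed in robots.txt -->
<p>Filter widgets and other content not meant for indexing go here.</p>

<!-- The iframe's source document is a separate URL, left crawlable -->
<iframe src="/indexable/description.html"></iframe>
```

The two documents have separate URLs, so robots rules can treat them differently; only the iframe source's URL is left out of the Disallow rules.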
Next, some methods that no longer work; these should not be used going forward:
1. Using forms
Google and Baidu can already crawl content placed inside forms, so forms no longer prevent indexing.
2. Using JavaScript and Ajax
With current technology, the final output of Ajax and JavaScript is still delivered to the browser as HTML for display, so this cannot prevent indexing either.
At this point, you should have a deeper understanding of the techniques for keeping website page content from being crawled. Try them out in practice, and follow us for more related content.