What directories should be blocked in the robots.txt file

This article explains in detail which directories should be blocked in the robots.txt file. The content is practical, so the editor shares it here for reference; I hope you will have a solid grasp of the topic after reading.
Do you really know whether your website's robots.txt settings are reasonable, which files or directories need to be blocked, and which settings benefit the operation of the site? With these questions in mind, the author gives a detailed answer below, hoping to help novice webmaster friends; veterans, please go easy.
What is robots.txt
The author quotes a passage from Baidu's webmaster tools to explain. Search engines use spider programs to automatically visit web pages on the Internet and obtain their information. When a spider visits a website, it first checks whether a plain-text file called robots.txt exists under the site's root domain; this file specifies the spider's crawling scope on your site. You can create a robots.txt for your website and declare in it the parts of the site that you do not want search engines to include, or specify that they include only certain parts.
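A minimal sketch of such a file, assuming a hypothetical /private/ directory you do not want crawled (robots.txt lives at the site root, e.g. https://www.example.com/robots.txt):

    # The rules below apply to all crawlers
    User-agent: *
    # Do not crawl anything under /private/
    Disallow: /private/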
What are the benefits of a robots.txt file for a website
1. It can help increase the site's weight and traffic.
2. Preventing some files from being crawled saves server bandwidth and improves the site's access speed.
3. It provides a concise, clear indexing environment for search engines.
Which website directories need to be blocked with the robots.txt file
1) Image directory
Images are the main elements that make up a website. Building a site is now more convenient than ever: with the emergence of a large number of CMS programs, anyone who can type can build a website. Precisely because of this convenience, a large number of homogenized template sites have appeared, reusing the same material over and over. Search engines definitely do not like such sites; even if your site is included, the effect will be poor. If you must use this kind of template, it is recommended that you block its image directory in the robots.txt file, as shown below. The usual image directory of a website is images or img.
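Assuming the image directory is named images or img (check your own site's structure first), the rules would look like:

    User-agent: *
    # Block the site's image directories (names are assumptions; adjust to your site)
    Disallow: /images/
    Disallow: /img/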
2) Website template directory
As mentioned in the image-directory section above, the power and flexibility of CMS programs has also led to the appearance and abuse of many homogenized website templates. Highly repetitive templates create a kind of redundancy for search engines, and template files are often highly similar to the generated pages, which easily produces duplicate content. This is very unfriendly to search engines and, in serious cases, gets a site consigned to the cold with no way back. Many CMS programs keep templates in an independent directory, so the template directory should be blocked. The template directory is usually named templets.
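Assuming the template directory is templets, as mentioned above, a corresponding rule would be:

    User-agent: *
    # Block the template directory so near-duplicate template files are not crawled
    Disallow: /templets/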
3) CSS and JS directories
CSS files are of no use to search engine crawling and provide no valuable information, so it is strongly recommended that webmaster friends block them in the robots.txt file to improve the search engines' indexing quality. Providing a concise, clear indexing environment for search engines makes it easier to improve the site's friendliness. CSS directories are usually named css or style.
JS files cannot be interpreted by search engines either, so it is likewise suggested that they be blocked. This has the same advantage: it provides a concise, clear indexing environment for search engines.
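Assuming the style and script directories are named css (or style) and js (names are assumptions; adjust to your site), the rules would be:

    User-agent: *
    # Block stylesheet and script directories
    Disallow: /css/
    Disallow: /style/
    Disallow: /js/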
4) Duplicate-content pages
Let's take DedeCMS as an example. As we all know, DedeCMS can use both static and dynamic URLs to reach the same content. If you generate a fully static site, you must block the dynamic URL links, as sketched below. There are two advantages: 1. search engines are friendlier to static URLs and include them more readily than dynamic ones; 2. it prevents the static and dynamic URLs from reaching the same article and being judged duplicate content by search engines. This is good for search engine friendliness.
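A sketch of such rules, assuming a DedeCMS-style site whose dynamic article pages go through /plus/view.php (the path is an assumption; the * wildcard is an extension honored by major engines such as Google and Baidu rather than part of the original standard):

    User-agent: *
    # Block the dynamic entry point so only static URLs are crawled
    Disallow: /plus/view.php
    # Block any URL containing a query string (wildcard support varies by engine)
    Disallow: /*?*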
5) Template cache directory
Many CMS programs have a cache directory. The benefits of a cache directory go without saying: it can effectively improve the site's access speed, reduce the site's bandwidth use, and is also very good for the user experience. However, such a cache directory also has a drawback: it causes search engines to crawl the same content repeatedly, and duplicate content within a site is a great sacrifice that does the site no good. Many friends who build sites with a CMS have not noticed this; it must be paid attention to.
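A sketch, assuming the CMS writes its cache to a directory named /cache/ (the name varies by CMS; check your program's documentation):

    User-agent: *
    # Block the cache directory to avoid repeated crawling of duplicate cached pages
    Disallow: /cache/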
6) Deleted directories
Too many dead links are fatal to search engine optimization and cannot but command a webmaster's attention. In the course of a website's development, deleting and adjusting directories is inevitable. If a directory on your website no longer exists, it must be blocked in robots, as sketched below, and the correct 404 error page must be returned. (Note: in IIS, some friends get the setting wrong when configuring the 404 error. In the custom error page, the message type for the 404 error should be set to Default or File, not URL; otherwise the server returns a 200 status code to search engines. There are many tutorials online on how to set this up; please search for them.)
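A sketch, assuming a column that has been removed at the hypothetical path /old-column/:

    User-agent: *
    # Block a directory that has been deleted from the site
    Disallow: /old-column/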
There is a controversial question about whether the website's back-end management directory needs to be blocked; in fact, this is optional. Provided the site's security is assured, if your website's operation is small in scale, it will not be a big problem even if the management directory appears in the robots.txt file; I have seen many websites set up this way. However, if your website operates at a large scale and faces heavy competition, it is strongly recommended that no information about your back-end management directory appear anywhere, lest it be exploited by people with ulterior motives to harm your interests. In fact, search engines are becoming more and more intelligent: they can identify a website's management directory well and skip indexing it. In addition, when building the site's back end, you can also add a robots meta tag to the pages, such as <meta name="robots" content="noindex,nofollow">, to block search engines from indexing and crawling them.
Finally, it should be noted that many webmaster friends like to put the sitemap address in the robots.txt file. Of course, this is not to block search engines but to let them quickly crawl the site's content through the sitemap when they index the site for the first time.
Two things need attention here: 1. the sitemap must be produced in the standard format; 2. the website must have high-quality content.
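A sketch of the directive, assuming the sitemap lives at the hypothetical address https://www.example.com/sitemap.xml (the Sitemap line requires an absolute URL and may appear anywhere in robots.txt):

    # Point crawlers at the sitemap
    Sitemap: https://www.example.com/sitemap.xml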
That is all on which directories should be blocked in the robots.txt file. I hope the above content is of some help to you and that you can learn more from it. If you think the article is good, you can share it for more people to see.