2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains "what are the anti-crawler strategies for changing IP software", that is, the anti-crawler measures websites use and how a crawler that switches IPs can deal with them. The methods introduced here are simple, fast and practical.
1. Anti-crawling based on user behavior.
Sites in this category typically watch for two patterns: many requests from the same IP within a short period, or the same account repeating an operation within a short period. Most sites are the former, which can be solved by using an IP proxy. Proxy IPs can be tested and saved to a file, but this is not ideal because proxy IPs fail often, so crawling them in real time from a dedicated proxy IP site is a better choice.
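As a minimal sketch of the proxy approach, the code below picks a random proxy that has not yet failed and builds a `urllib` opener that routes traffic through it. The names `pick_proxy` and `build_opener_for` are hypothetical, and the proxy addresses are placeholders from a documentation-only IP range, not real proxies.

```python
import random
import urllib.request


def pick_proxy(pool, failed):
    """Return a random proxy from pool that has not been marked as failed."""
    candidates = [p for p in pool if p not in failed]
    if not candidates:
        raise RuntimeError("no working proxies left; refresh the pool")
    return random.choice(candidates)


def build_opener_for(proxy):
    """Build a urllib opener that sends HTTP and HTTPS traffic through proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


# Placeholder addresses (assumption): 203.0.113.0/24 is reserved for documentation.
pool = ["http://203.0.113.10:8080", "http://203.0.113.11:3128"]
failed = {"http://203.0.113.10:8080"}

proxy = pick_proxy(pool, failed)
opener = build_opener_for(proxy)
# opener.open(url, timeout=10) would send the request through the proxy.
# If it raises, add the proxy to `failed` and pick another one.
```

Because proxies fail often, the `failed` set matters as much as the pool itself: a crawler would normally refill `pool` from the proxy source whenever `pick_proxy` runs out of candidates.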
In the second case, where the same account repeats an operation within a short period, the crawler can wait a random interval of several seconds after each request before sending the next one. Some websites also have logic flaws: the limit on one account repeating a request within a short period can be bypassed by making a few requests, logging out, logging in again, and then continuing.
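The random-interval idea can be sketched as follows. The function name `crawl_with_random_delay` and the 2 to 6 second defaults are assumptions chosen for illustration; `fetch` stands for whatever function actually downloads a page.

```python
import random
import time


def crawl_with_random_delay(urls, fetch, min_delay=2.0, max_delay=6.0,
                            sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait a random number of seconds before every request except the first.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

The `sleep` parameter defaults to `time.sleep`, and passing a stub instead makes the function easy to exercise without actually waiting.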
In addition, a site can check cookies to determine whether the visitor is a valid user, which is common on websites that require login. Further, some websites update and verify the login dynamically: a randomly assigned authenticity_token is issued with the login page and must be returned to the server along with the login and password submitted by the user.
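A minimal sketch of returning the token with the login, assuming the site delivers it as a hidden form field named `authenticity_token` in the login page HTML. The field names `login` and `password`, the sample HTML, and the helper names are hypothetical; a real site may use different names or deliver the token another way.

```python
from html.parser import HTMLParser


class TokenParser(HTMLParser):
    """Collect the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        if tag != "input" or self.token is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "authenticity_token":
            self.token = attributes.get("value")


def build_login_payload(login_page_html, login, password):
    """Return the form fields to POST, including the token read from the page."""
    parser = TokenParser()
    parser.feed(login_page_html)
    if parser.token is None:
        raise ValueError("authenticity_token not found in login page")
    return {
        "login": login,
        "password": password,
        "authenticity_token": parser.token,
    }


# Hypothetical login page fragment used only to exercise the parser.
sample_page = (
    '<form action="/session" method="post">'
    '<input type="hidden" name="authenticity_token" value="abc123" />'
    '<input type="text" name="login" />'
    '<input type="password" name="password" />'
    "</form>"
)
payload = build_login_payload(sample_page, "alice", "s3cret")
```

Since the token is assigned per login page, the page fetch and the login POST would normally share one cookie-preserving session so the server can match the token to the visitor.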
2. Anti-crawling through request Headers, the most commonly used strategy.
Many sites check the User-Agent in the request Headers, and some also check the Referer (the hotlink protection of some resource sites works by checking the Referer).
When you encounter this type of anti-crawler mechanism, you can add Headers directly to the crawler: copy the browser's User-Agent into the crawler's Headers, or set the Referer value to the target site's domain name. For anti-crawler checks that inspect Headers, modifying or adding Headers in the crawler is a good way to get past them.
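A short sketch of adding these Headers with the standard library's `urllib.request.Request`. The User-Agent string is only an example of what a browser might send, and the URLs and the helper name `build_request` are placeholders.

```python
import urllib.request

# Example browser User-Agent (assumption); copy the one your own browser sends.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url, referer):
    """Create a request that carries a browser User-Agent and a Referer."""
    headers = {"User-Agent": BROWSER_UA, "Referer": referer}
    return urllib.request.Request(url, headers=headers)


request = build_request("https://example.com/images/1.jpg", "https://example.com/")
# urllib.request.urlopen(request, timeout=10) would send it with these Headers.
```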
3. Restricting access from certain IPs.
Free proxy IPs can be obtained from many websites. Since crawlers can use these proxy IPs, websites can turn the same lists against them: by crawling the lists and saving the IPs on the server, a site can block those addresses and thus limit the use of proxy IPs for crawling.
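From the website's side, the check described above might look like the sketch below. It assumes the harvested list has one `ip` or `ip:port` entry per line and handles IPv4 only; the function names and sample addresses are hypothetical.

```python
import ipaddress


def load_blocklist(lines):
    """Parse 'ip' or 'ip:port' lines (IPv4 only) into a set of addresses."""
    blocked = set()
    for line in lines:
        host = line.strip().split(":")[0]  # drop an optional ":port" suffix
        try:
            blocked.add(ipaddress.ip_address(host))
        except ValueError:
            continue  # skip blank or malformed entries
    return blocked


def is_blocked(client_ip, blocked):
    """Return True when the client address is on the proxy blocklist."""
    return ipaddress.ip_address(client_ip) in blocked


# Placeholder entries (assumption) standing in for a crawled free-proxy list.
harvested = ["203.0.113.10:8080", "203.0.113.11", "not-an-ip", ""]
blocklist = load_blocklist(harvested)
```

This is also why free proxy lists are unreliable for crawling: any address that is easy for a crawler to find is just as easy for the site to block.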
4. Anti-crawling with dynamic pages.
Sometimes when you fetch the target page, you will find that the key content is blank and only the frame code is present. This is because the site loads the content dynamically through XHR requests after the page itself has been returned. The solution is to analyze the site's traffic with developer tools (FireBug, etc.), find the separate request that returns the content (for example as JSON), and fetch the information you want from that request directly.
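Once the content request has been found in the developer tools, its response can usually be parsed directly instead of scraping HTML. The sketch below assumes a JSON body with an `items` list whose entries have `id` and `title` fields; that shape, the sample text, and the function name are hypothetical and will differ per site.

```python
import json


def extract_items(json_text):
    """Return (id, title) pairs from a JSON body shaped like {"items": [...]}."""
    data = json.loads(json_text)
    return [(item["id"], item["title"]) for item in data.get("items", [])]


# Hypothetical response body; in practice this text would be the body returned
# by the XHR endpoint found in the developer tools.
sample_body = (
    '{"items": [{"id": 1, "title": "First post"},'
    ' {"id": 2, "title": "Second post"}]}'
)
rows = extract_items(sample_body)
```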
In addition, some sites encrypt their dynamic requests so that the parameters cannot be parsed or captured. In that case, driving a browser kernel through Mechanize or Selenium RC succeeds just as browsing with a real browser would, only at a cost in efficiency.
At this point, you should have a deeper understanding of "what are the anti-crawler strategies for changing IP software", and the best next step is to try these methods in practice.
© 2024 shulou.com SLNews company. All rights reserved.