2025-02-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
How do you get past Cloudflare's 5-second shield? Many newcomers are unsure how, so this article walks through it in detail.
Anyone who writes crawlers regularly will know Cloudflare's five-second shield. When you visit a protected site with anything other than a normal browser, it returns the following text:
Checking your browser before accessing xxx.
This process is automatic. Your browser will redirect to your requested content shortly.
Please allow up to 5 seconds …
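A crawler can recognize this interstitial by looking for its wording in the response body. The following is only a heuristic sketch: the function name `looks_like_challenge` and the marker tuple are assumptions introduced here, and the phrases are simply the ones quoted above, which may differ on other sites or newer versions of the page.

```python
# Marker phrases copied from the challenge text quoted above (assumed stable).
CHALLENGE_MARKERS = (
    "Checking your browser before accessing",
    "Please allow up to 5 seconds",
)


def looks_like_challenge(html):
    """Return True if the body contains any known challenge phrase."""
    lowered = html.lower()
    # Case-insensitive substring match against every marker phrase.
    return any(marker.lower() in lowered for marker in CHALLENGE_MARKERS)
```

Checking the body text this way lets a crawler tell a blocked response apart from a real article before it tries to parse anything.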
Even with complete request headers and a proxy IP, Cloudflare still detects the crawler. Let's look at an example. When the article "Mountain View Whisman students sent home after children test positive for COVID-19" [1] is opened in a normal browser, it displays as follows:
Viewing the page source directly shows that the news title and body text are present in the HTML, which means they are rendered on the back end rather than loaded asynchronously, as shown in the following figure:
Now let's visit the site using requests with a complete set of request headers, as shown in the following figure:
The website recognized the crawler and blocked the request. Many people are stuck at this point. Because the crawler's very first request is blocked, the site is not detecting it by IP address or request frequency, so a proxy IP does not help. And since the crawler is detected even with complete request headers, is there any way to bypass the check?
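For reference, a request of the kind described above might look like the sketch below. The header values and the function name `fetch_with_requests` are illustrative assumptions, not the values used in the original experiment.

```python
# Browser-like headers; the exact values are illustrative assumptions.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_with_requests(url):
    """Fetch url with plain requests and browser-like headers."""
    # Imported lazily so this module loads even if requests is not installed.
    import requests

    return requests.get(url, headers=BROWSER_HEADERS, timeout=10)
```

Based on the behaviour described above, calling `fetch_with_requests(url).text` on a protected page would be expected to return the challenge text rather than the article, however complete the headers are.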
In fact, bypassing the 5-second shield is simple: use a third-party library called cloudscraper, which can be installed with pip:
python3 -m pip install cloudscraper
Once it is installed, three lines of code are enough to bypass Cloudflare's 5-second shield:
import cloudscraper

# create_scraper() returns a session-like object that handles the challenge
scraper = cloudscraper.create_scraper()
resp = scraper.get('target website').text
Let's take the above website as an example:
import cloudscraper
from lxml.html import fromstring

scraper = cloudscraper.create_scraper()
resp = scraper.get('https://mv-voice.com/news/2021/05/04/mountain-view-whisman-students-sent-home-after-children-test-positive-for-covid-19').text

# Parse the returned HTML and extract the headline from the first <h2>
selector = fromstring(resp)
title = selector.xpath('//h2/text()')[0]
print(title)
The result of running it is shown in the following figure:
The shield has been bypassed.
cloudscraper [2] is very capable: it can get past the five-second shield on the free version of Cloudflare. Its interface is also consistent with requests, so however you wrote the code with requests, you only need to change requests.xxx to scraper.xxx.
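Because the interface matches requests, code written against a session object can accept either one. The following is a minimal sketch, assuming a helper named `fetch_text` (hypothetical, not part of either library) and a default timeout of 10 seconds chosen only for illustration:

```python
def fetch_text(session, url, **kwargs):
    """Fetch url with any requests-compatible session and return the body text.

    session may be a requests.Session or the object returned by
    cloudscraper.create_scraper(); both expose .get() in the same way.
    """
    # Apply a default timeout unless the caller supplied one.
    kwargs.setdefault("timeout", 10)
    resp = session.get(url, **kwargs)
    # Raise an exception for 4xx/5xx responses instead of returning their body.
    resp.raise_for_status()
    return resp.text


# Usage (requires the third-party packages to be installed):
#   import requests, cloudscraper
#   fetch_text(requests.Session(), url)              # plain requests
#   fetch_text(cloudscraper.create_scraper(), url)   # same call via cloudscraper
```

Passing the session in as an argument keeps the rest of the crawler unchanged when switching between the two libraries.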