In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article introduces the relevant knowledge of "Python crawler case analysis". In the operation process of actual cases, many people will encounter such difficulties. Next, let Xiaobian lead you to learn how to deal with these situations! I hope you can read carefully and learn something!
I. Discussion
These cases were previously written for some friends who wanted to enter the Python industry. Seeing that everyone was satisfied, I took them out again. If you have already started learning Python and have no clue about reptiles, you may wish to take a look at these cases!
II. Environmental preparation
Python 3
requests library, lxml library, beautifulsoup4 library
pip install XX XX installed together.
Third, Python crawler small case
1. Get the public IP address of this machine
Use python requests library + public network IP search interface to automatically obtain IP address
2. Using Baidu's search interface, Python writes url collection tools
Need to use the requests library, BeautifulSoup library, observe the URL link law of Baidu search structure, bypass the anti-crawler mechanism of Baidu search engine, set User-Agent request header in the program.
Python source code:
After writing the program in Python language, use the keyword inurl:/dede/login.php to extract the background address of a certain network cms in batches:
3. Use Python to create Sogou wallpaper to automatically download crawlers
Sogou wallpaper address is json format, so use json library to parse this set of data, crawler storage image disk path changed to the path to save the image can be.
Effect drawing:
Python automatically fills out questionnaires
As with most web pages, submitting data multiple times requires entering a Captcha, which is the anti-crawling mechanism.
As shown in the figure:
So how do you bypass Captcha anti-creep measures? Using X-Forwarded-For forged IP address access, Python code is as follows:
Effect:
5. Obtain the IP of the Western Thorn proxy and verify the possibility and delay time of blocking these proxies.
You can add the proxy IP crawled by Python to the proxy chain, and you can perform general penetration tasks. Here we call the linux system command ping -c 1 " + ip.string + "| awk 'NR==2{print}' -, To run this program in Windows, you need to modify the command in the third line from the bottom of os.popen to something Windows can execute.
The crawled data is shown in the figure below:
Demo:
"Python crawler case analysis" content is introduced here, thank you for reading. If you want to know more about industry-related knowledge, you can pay attention to the website. Xiaobian will output more high-quality practical articles for everyone!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.