Shulou (Shulou.com), 06/02 report; updated 2025-02-27
This article shows how to crawl data with Python. Many people are unsure how to do this in practice, so I went through a range of materials and put together a simple, easy-to-follow method. I hope it answers your questions; let's work through it together.
Goal: Crawl 100 pages of love stories from a given URL
Practice website
Preparation: Python 3.7; development tool: PyCharm; browser: Google Chrome
Approach:
Fetch the web pages with requests and extract the information with XPath.
Send request headers (at least a browser-like User-Agent) whether or not the site has an anti-crawling strategy; it is the simplest precaution.
Write the results to a txt file in a loop.
Wrap the main code in functions.
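The fetch and save steps above could be wrapped in two small functions along these lines. This is a minimal sketch, not the author's code: it assumes the third-party requests package is installed, and the User-Agent string, the 10-second timeout, and the names `HEADERS`, `fetch_html` and `append_lines` are illustrative choices.

```python
import requests

# Browser-like request headers; the User-Agent value is an arbitrary example.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def fetch_html(url, timeout=10):
    """Download one page with the headers above and return its decoded text."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    # Without a charset in Content-Type, requests may report ISO-8859-1 (or None);
    # fall back to the encoding detected from the response body in that case.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text


def append_lines(path, lines):
    """Append each extracted item to a UTF-8 text file, one item per line."""
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line.strip() + "\n")
```

Opening the file in append mode (`"a"`) lets the loop call `append_lines` once per page without overwriting the results of earlier pages.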
A few key points:
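Pay attention to the page's encoding, which you can usually check through response.encoding and response.headers. requests sets response.encoding from the Content-Type header: if the header includes a charset, that charset is used; if it is a text type without a charset, requests falls back to ISO-8859-1, which garbles Chinese pages; if there is no Content-Type at all, response.encoding is None and response.text guesses the encoding from the content. When the result is wrong, set response.encoding yourself (for example to 'utf-8') before reading response.text.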
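As a minimal standard-library sketch, the function below applies a simplified version of the header rule requests uses when it sets response.encoding: an explicit charset wins, a text/* type without one falls back to ISO-8859-1, and a missing Content-Type yields None so the caller has to guess. The name `encoding_from_headers` is hypothetical and is not part of requests.

```python
def encoding_from_headers(headers):
    """Pick a text encoding from HTTP response headers (simplified sketch).

    Returns the declared charset, "ISO-8859-1" for text/* types without a
    charset, or None when the headers say nothing useful.
    """
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type")
    if not content_type:
        return None  # no Content-Type: the caller must guess from the body
    mime_type, *params = [part.strip() for part in content_type.split(";")]
    for param in params:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset":
            return value.strip().strip("'\"")  # drop optional quotes
    if mime_type.lower().startswith("text/"):
        return "ISO-8859-1"  # legacy HTTP default for text types
    return None
```

With requests, if this kind of rule yields None or ISO-8859-1 for a page you know is UTF-8 (common on Chinese sites), assign `response.encoding = "utf-8"` or `response.encoding = response.apparent_encoding` before reading `response.text`.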
Study the page structure to work out how to implement page turning. This site is simple: just increment the page number in the URL by 1 for each new page.
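Because only the page number changes from one list page to the next, page turning reduces to filling a URL template with 1, 2, ..., 100. The practice site's real address is not preserved here, so `LIST_URL` below is a hypothetical pattern; substitute the actual format you see in the browser's address bar.

```python
# Hypothetical URL template: the real practice site's address is not given.
LIST_URL = "https://www.example.com/love/list_{page}.html"


def page_urls(template=LIST_URL, first=1, last=100):
    """Yield one URL per list page, incrementing the page number by 1."""
    for page in range(first, last + 1):
        yield template.format(page=page)
```

For example, `list(page_urls(last=3))` gives the URLs for pages 1 to 3. Some sites use a different address for page 1, so it is worth checking the first and last pages by hand before running the full loop.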
Whether to use Beautiful Soup or XPath depends on the situation; here I found it more convenient to locate elements directly with XPath.
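Here is a minimal sketch of XPath extraction with lxml, assuming the third-party lxml package is installed. The markup in `SAMPLE_HTML` and the `list` class name are hypothetical stand-ins, since the real site's structure is not described; adjust the expression after inspecting the page in Chrome's developer tools.

```python
from lxml import etree

# Hypothetical markup standing in for one list page of the site.
SAMPLE_HTML = """
<html><body>
  <ul class="list">
    <li><a href="/a/1.html">First story</a></li>
    <li><a href="/a/2.html">Second story</a></li>
  </ul>
</body></html>
"""


def extract_items(html):
    """Return (title, link) pairs for every list entry, located with XPath."""
    tree = etree.HTML(html)  # lenient HTML parser that tolerates broken markup
    items = []
    for a in tree.xpath('//ul[@class="list"]/li/a'):
        title = "".join(a.xpath("./text()")).strip()
        items.append((title, a.get("href")))
    return items
```

XPath can also select attributes or text directly, for example `//ul[@class="list"]/li/a/@href`, which is part of why it is convenient for this kind of positioning.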
Final result:
All 100 pages of love stories are crawled and saved to a txt file.
Main code
Output
Planned improvements:
Write the data to a database.
Build a website with the Flask framework.
Implement a word-cloud visualization of the results.
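For the first planned improvement, writing the data to a database, the standard-library sqlite3 module is one low-setup option. This is a minimal sketch under stated assumptions: the table name `stories`, its columns, and the (title, link) item shape are hypothetical, not the author's design.

```python
import sqlite3


def save_to_db(db_path, items):
    """Store (title, link) pairs in a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits the transaction on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS stories ("
                " id INTEGER PRIMARY KEY,"
                " title TEXT NOT NULL,"
                " link TEXT UNIQUE)"
            )
            # INSERT OR IGNORE skips links that are already stored,
            # so re-running the crawler does not create duplicates.
            conn.executemany(
                "INSERT OR IGNORE INTO stories (title, link) VALUES (?, ?)", items
            )
    finally:
        conn.close()  # the `with conn` block does not close the connection
```

The UNIQUE constraint on `link` is what makes repeated runs idempotent; a server database such as MySQL would need a driver and connection settings instead.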
That concludes this introduction to crawling data with Python. I hope it has answered your questions; theory sticks best when paired with practice, so go and try it yourself!