This article explains how to use requests to crawl photos. Interested readers may wish to have a look; the method introduced here is simple, fast, and practical. Now let the editor take you through how to crawl photos with requests.
What is Selenium?
Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily it is used for automating web applications for testing purposes, but it is certainly not limited to that: boring web-based administration tasks can (and should) also be automated.
Selenium has the support of some of the largest browser vendors, who have taken (or are taking) steps to make it a native part of their browsers. It is also the core technology in countless other browser automation tools, APIs, and frameworks.
In other words, Selenium's job is not crawling but automated testing of web applications. Using Selenium to crawl Zhihu, as in the earlier article, is really overkill, which also explains why the crawl speed there was not impressive.
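For context, here is a minimal sketch of what driving a browser with Selenium looks like. The question URL is illustrative, and you need a local Firefox plus its driver:

from selenium import webdriver

# launch a real browser, load a page, read the rendered title, shut down
driver = webdriver.Firefox()
driver.get("https://www.zhihu.com/question/12345678")
print(driver.title)
driver.quit()

Every page load pays the cost of a full browser render, which is exactly why a plain HTTP client is faster for bulk crawling.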
What is Requests?
Take a look at the Requests documentation: Requests is the only non-GMO HTTP library for Python, safe for human consumption. Warning: recreational use of other HTTP libraries may result in dangerous side effects, including security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, and even death.
From this humorous introduction, it is not hard to see that the Requests authors are very confident in the library, and we will use Requests to crawl Zhihu images to put that confidence to the test.
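Before the crawl itself, a minimal sketch of the Requests workflow (the httpbin URL is just a stand-in for any endpoint): one call to fetch, built-in JSON decoding.

import requests

# fetch a URL with query parameters; Requests handles the encoding
r = requests.get("https://httpbin.org/get", params={"q": "demo"})
print(r.status_code)   # 200 on success
print(r.json())        # response body decoded as JSON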
Let's go!
Following the usual practice, let's briefly lay out the crawler steps:
01 Find the relevant question and get the question id
As shown in the screenshot above, the id we need is in the red box: it is the numeric part of the question page's URL.
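A hedged sketch of pulling that id out of a question URL (the URL below is illustrative; Zhihu question pages follow the pattern zhihu.com/question/<numeric id>):

import re

url = "https://www.zhihu.com/question/12345678"
match = re.search(r"/question/(\d+)", url)
if match:
    id = match.group(1)   # the question id used in the API call below
    print(id)

The same id is concatenated into the API URL in the next step. The article's code calls the variable id, which shadows a Python built-in, so renaming it qid would be tidier.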
02 Parse the page with requests
Core code:
import json
import requests

# the include parameter lists the answer fields the v4 API should return
get_url = ('https://www.zhihu.com/api/v4/questions/' + id + '/answers'
           '?include=data[*].is_normal,admin_closed_comment,reward_info,'
           'is_collapsed,annotation_action,annotation_detail,collapse_reason,'
           'is_sticky,collapsed_by,suggest_edit,comment_count,can_comment,'
           'content,editable_content,voteup_count,reshipment_settings,'
           'comment_permission,created_time,updated_time,review_info,'
           'relevant_info,question,excerpt,relationship.is_authorized,'
           'is_author,voting,is_thanked,is_nothelp;data[*].mark_infos[*].url;'
           'data[*].author.follower_count,badge[*].topics'
           '&limit=5&offset=' + str(offset) + '&sort_by=default')

header = {
    'User-Agent': "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) "
                  "Gecko/20100101 Firefox/34.0",
    'Host': "www.zhihu.com",
}

# verify=False skips TLS certificate checks, as in the original snippet
r = requests.get(get_url, verify=False, headers=header)
content = r.content.decode("utf-8")
txt = json.loads(content)
The API's JSON response is now parsed into txt.
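Because limit=5, each request returns five answers, and offset pages through them. A hedged sketch of the paging loop follows; build_url and save_images are hypothetical helpers standing in for the URL construction above and the image extraction in step 03, and the paging/is_end field names assume the v4 response layout:

import json
import requests

offset = 0
while True:
    # build_url and save_images are hypothetical helpers (see lead-in)
    r = requests.get(build_url(offset), verify=False, headers=header)
    txt = json.loads(r.content.decode("utf-8"))
    save_images(txt)                       # step 03 below
    if txt.get("paging", {}).get("is_end"):
        break                              # no more answers
    offset += 5                            # next page of five answers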
03 Match the image addresses
Core code:
import re
import urllib.request

# the opening of the original regex was lost in transcription; a pattern
# capturing quoted image URLs from the answer HTML is assumed here
imgUrls = re.findall(r'data-original="([^"]+)"', str(txt))
imgUrls = list(set(imgUrls))              # de-duplicate
for imgUrl in imgUrls:
    try:
        splitPath = imgUrl.split('.')
        fTail = splitPath.pop()           # file extension, e.g. 'jpg'
        print(fTail)
        if len(fTail) > 3:
            fTail = 'jpg'                 # fall back for odd extensions
        fileName = path + "/" + str(number) + "." + fTail
        img_data = urllib.request.urlopen(imgUrl).read()
        # the original snippet is truncated here; writing the bytes to disk
        # completes the download (path and number come from earlier context)
        with open(fileName, 'wb') as f:
            f.write(img_data)
        number += 1
    except Exception as e:
        print(e)
After getting txt, we match the image addresses with a regular expression and then download each image locally from its address.
At this point, I believe you have a deeper understanding of how to use requests to crawl photos. You might as well try it out in practice. Follow us and keep learning!