How to use Python multithreading to crawl images from a weather website and save them

2025-02-24 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article explains how to use Python multithreading to crawl images from a weather website and save them. It is intended as a reference for readers interested in the technique.

1.1 Topic

Crawl all the images on a specified website, for example the China Meteorological Network (www.weather.com.cn), once with a single thread and once with multiple threads. Limit the number of images crawled to the last three digits of your student number.

Output: print the URL of each downloaded image to the console, store the downloaded images in the images subfolder, and provide a screenshot.
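As a rough illustration of the single-threaded versus multithreaded requirement, the sketch below downloads a capped number of images into an images folder and prints each URL. The helper names (fetch_bytes, save_image, crawl_single, crawl_threaded), the index-based file names, and the one-thread-per-image layout are assumptions made for illustration, not the article's own code.

```python
import os
import threading
import urllib.request


def fetch_bytes(url):
    """Download the raw bytes behind url (performs a real network request)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def save_image(url, index, folder="images", fetch=fetch_bytes):
    """Download one image, save it as <folder>/<index><ext>, return the path."""
    os.makedirs(folder, exist_ok=True)
    # Keep the URL's file extension if it has one; ".jpg" is an assumed fallback.
    ext = os.path.splitext(url.split("?")[0])[1] or ".jpg"
    path = os.path.join(folder, f"{index}{ext}")
    with open(path, "wb") as f:
        f.write(fetch(url))
    print(url)  # the task asks for each downloaded URL on the console
    return path


def crawl_single(urls, limit, folder="images", fetch=fetch_bytes):
    """Single-threaded: download at most `limit` images one after another."""
    return [save_image(u, i, folder, fetch) for i, u in enumerate(urls[:limit])]


def crawl_threaded(urls, limit, folder="images", fetch=fetch_bytes):
    """Multithreaded: start one thread per image, at most `limit` images."""
    threads = []
    for i, u in enumerate(urls[:limit]):
        t = threading.Thread(target=save_image, args=(u, i, folder, fetch))
        t.start()
        threads.append(t)
    # Wait until every download thread has finished.
    for t in threads:
        t.join()
```

For example, crawl_threaded(image_urls, 123) would download at most 123 images, where image_urls is a list of image addresses and 123 stands in for the last three digits of a student number. The fetch parameter exists only so the download step can be swapped out, for instance when trying the functions without network access.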

1.2 Approach

1.2.1 Send the request

Construct the request headers

import re
import requests
import urllib.request

# Browser-like headers so the site treats the crawler as a normal visitor
headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}
url = "http://www.weather.com.cn/"
# Build the request object that carries the headers
request = urllib.request.Request(url, headers=headers)

Send the request

request = urllib.request.Request(url, headers=headers)
r = urllib.request.urlopen(request)

1.2.2 Parse the web page

Read and decode the page, then replace the line breaks so that regular expressions can match the images more easily in the next step.

html = r.read().decode().replace
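The call above is cut off, so its arguments are not known. The sketch below shows one way the fetch-and-flatten step could look, assuming each line break is replaced with a single space; the function names fetch_html and flatten are hypothetical.

```python
import urllib.request


def fetch_html(url, headers):
    """Request the page with the given headers and return the decoded text."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as r:
        return r.read().decode()  # decode() defaults to UTF-8


def flatten(html):
    """Replace line breaks with spaces so tags never span several lines.

    Using a space (an assumption) rather than an empty string keeps
    attributes that were on separate lines from being glued together.
    """
    return html.replace("\r\n", "\n").replace("\n", " ")
```

With the headers and url defined earlier, flatten(fetch_html(url, headers)) would give a single-line string ready for regular-expression matching.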

1.2.3 Get the nodes

Use regular expressions to find all the a tags first, then extract all the images inside those a tags.

urlList = re.findall('
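The pattern in the line above is cut off in the source. As a sketch of the two-stage matching just described, the code below first collects every a element and then pulls the src of each img inside it. Both patterns and the function name extract_image_urls are assumptions, not the article's original expressions.

```python
import re

# Stage 1: every <a ...>...</a> element (non-greedy, case-insensitive).
A_TAG = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
# Stage 2: the src attribute of each <img> found inside an <a> element.
IMG_SRC = re.compile(
    r"<img\b[^>]*?\bsrc\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE
)


def extract_image_urls(html):
    """Return image URLs found inside <a> tags, in order, without duplicates."""
    urls = []
    for a_tag in A_TAG.findall(html):
        for src in IMG_SRC.findall(a_tag):
            if src not in urls:
                urls.append(src)
    return urls
```

Regular expressions are a pragmatic fit for a small exercise like this one; for arbitrary real-world HTML a dedicated parser is usually more robust.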
