

How to Scrape Emoji Packs with Multiple Threads in Python


This article explains how to scrape emoji packs with multiple threads in Python. The editor thinks it is quite practical and shares it here for your reference; I hope you get something out of it. Let's take a look.

Course highlights

Systematic analysis of the target web page

Parsing data from HTML tags

Saving large numbers of images in one go

Environment introduction

Python 3.8

PyCharm

Modules used

requests >>> pip install requests

parsel >>> pip install parsel

time >>> built-in module, used to record the run time (a timing sketch follows this list)
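As a quick reference, here is a minimal sketch of how the time module records the run time, mirroring the timing code used in the scripts later in this article:

import time

start_time = time.time()               # timestamp before the crawl starts
# ... the crawling work goes here ...
use_time = int(time.time() - start_time)
print('program time:', use_time)       # elapsed seconds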

Procedure

1. Analyze where the data we want can be obtained

Meme list page >>> the picture url address and the picture name

Use the browser developer tools to find where this data comes from

2. Code implementation steps

1. Send a request

Determine the url address to request

Determine the request method: GET or POST

Request header parameters: anti-hotlinking (usually the Referer header), cookie, … (a minimal request sketch follows this list)
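A minimal sketch of step 1, assuming the list-page URL pattern used later in this article; a browser-like User-Agent is the only header the tutorial needs:

import requests

# Send a GET request to page 1 of the meme list with a browser-like User-Agent.
url = 'https://fabiaoqing.com/biaoqing/lists/page/1.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
print(response.status_code)  # 200 means the request succeeded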

2. Get data

Get the data content returned by the server

response.text gets text data

response.json() gets json dictionary data

response.content gets binary data; saving pictures / audio / video / other binary file formats means fetching the binary content (a response sketch follows this list)
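A small sketch of these three forms of the response body, using httpbin.org (a public test service, not part of the original tutorial) as a stand-in endpoint, since the tutorial's target page returns HTML rather than JSON:

import requests

response = requests.get('https://httpbin.org/get')
print(response.text)     # decoded text, e.g. HTML source or plain text
print(response.json())   # dict parsed from a JSON body (raises an error if the body is not JSON)
print(response.content)  # raw bytes: what you save for pictures / audio / video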

3. Parsing data

Extract the data content we want

i. Direct string parsing

ii. json dictionary key-value pairs

iii. re regular expressions

iv. CSS selectors

v. XPath (a parsing sketch follows this list)
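A small sketch of the regex, CSS selector, and XPath options applied to one made-up HTML fragment; the class and attribute names below mirror the ones used in the code later in this article:

import re
import parsel

html = '<div class="ui image lazy" title="doge" data-original="https://example.com/doge.jpg"></div>'

# regular expression: works directly on the raw string
print(re.findall(r'data-original="(.*?)"', html))

# parsel wraps the string in a Selector that supports both CSS and XPath
selector = parsel.Selector(text=html)
print(selector.css('.ui.image.lazy::attr(title)').getall())
print(selector.xpath('//div[@class="ui image lazy"]/@data-original').getall())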

4. Save data

Text

Csv

Database

Local folder (a download sketch follows this list)
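A minimal sketch of the "local folder" option with a hypothetical image URL; the binary body from response.content is written straight to disk:

import os
import requests

img_url = 'https://example.com/doge.jpg'         # hypothetical picture address
os.makedirs('img', exist_ok=True)                # make sure the target folder exists
img_content = requests.get(url=img_url).content  # binary data of the picture
with open(os.path.join('img', 'doge.jpg'), mode='wb') as f:
    f.write(img_content)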

Import modules

import requests            # data request module, third-party: pip install requests
import parsel              # data parsing module, third-party: pip install parsel
import re                  # regular expression module
import time                # time module
import concurrent.futures  # thread pool module

Single-threaded crawl of 10 pages of data

1. Send a request

start_time = time.time()
for page in range(1, 11):
    url = f'https://fabiaoqing.com/biaoqing/lists/page/{page}.html'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36'
    }
    response = requests.get(url=url, headers=headers)  # response object; a 200 status code means the request succeeded

2. Get data: get the text data / web page source code

    # The element panel of the developer tools shows the corresponding tag data,
    # but the server does not necessarily return the same data after the request is sent.
    # We have to extract the data from what the server actually returns.
    # xpath parsing method: the parsel parsing module can call the xpath parsing method
    # print(response.text)

3. Parsing data

    # Parsing speed: bs4 is slower; to take values directly from string data, only regular expressions work.
    selector = parsel.Selector(response.text)  # convert the acquired html string into a Selector object
    title_list = selector.css('.ui.image.lazy::attr(title)').getall()
    img_list = selector.css('.ui.image.lazy::attr(data-original)').getall()
    # extract the elements of the two lists one by one: loop through them with zip
    for title, img_url in zip(title_list, img_list):

4. Save data

        # split() string segmentation takes a value by list index position
        # img_name_1 = img_url[-3:]  # slicing: index 0 counts from the left, -1 from the right
        # print(title, img_url)
        title = re.sub(r'[\\/:*?"|\n]', '_', title)  # replace characters that would make the file name invalid
        img_name = img_url.split('.')[-1]  # file extension, taken by index from the split() result
        img_content = requests.get(url=img_url).content  # binary content of the picture
        with open('img\\' + title + '.' + img_name, mode='wb') as f:  # the img folder must already exist
            f.write(img_content)
        print(title)

Multithreaded crawl of 10 pages of data

def get_response(html_url):
    """Send a request"""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    return response


def get_img_info(html_url):
    """Get the url address and name of each picture"""
    response = get_response(html_url)
    selector = parsel.Selector(response.text)  # convert the acquired html string into a Selector object
    title_list = selector.css('.ui.image.lazy::attr(title)').getall()
    img_list = selector.css('.ui.image.lazy::attr(data-original)').getall()
    zip_data = zip(title_list, img_list)
    return zip_data


def save(title, img_url):
    """Save the data"""
    title = re.sub(r'[\\/:*?"|\n]', '_', title)  # replace characters that would make the file name invalid
    img_name = img_url.split('.')[-1]  # file extension, taken by index from the split() result
    img_content = requests.get(url=img_url).content  # binary content of the picture
    with open('img\\' + title + '.' + img_name, mode='wb') as f:  # the img folder must already exist
        f.write(img_content)
    print(title)

Main program: crawl 10 pages of data with the thread pool

def main(html_url):
    zip_data = get_img_info(html_url)
    for title, img_url in zip_data:
        save(title, img_url)


if __name__ == '__main__':
    start_time = time.time()
    exe = concurrent.futures.ThreadPoolExecutor(max_workers=10)  # up to 10 pages download at the same time
    for page in range(1, 11):
        # 1. Send the request
        url = f'https://fabiaoqing.com/biaoqing/lists/page/{page}.html'
        exe.submit(main, url)  # schedule the page without waiting for it
    exe.shutdown()  # block until every submitted page has finished
    end_time = time.time()
    use_time = int(end_time - start_time)
    print('program time:', use_time)

The above is how to scrape emoji packs with multiple threads in Python. The editor believes some of these points may come up in everyday work, and hopes you can learn more from this article.
