How to download and save pictures with Scrapy in Python

2025-04-06 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article explains how to download and save pictures with Scrapy in Python. The steps are detailed but easy to follow, and the approach is quick to put into practice. Let's take a look.

In everyday crawler practice, the data we crawl needs to be saved. For images, Scrapy provides the ImagesPipeline class. It already wraps the download-and-store logic, so we can subclass it and use it directly.

When downloading image data with ImagesPipeline, we need to override three of the pipeline class's methods:

- get_media_requests issues a request for the image address

- file_path returns the file name under which the picture is saved

- item_completed returns the item, handing it to the next pipeline class to be executed
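Because file_path returns a name that is later built from the image's alt text, it can help to clean that text first. The following is a minimal sketch, not part of Scrapy: safe_image_name and its default arguments are hypothetical names introduced here for illustration. It replaces characters that Windows rejects in file names and falls back to a default when the alt text is missing.

```python
import re

# Characters that Windows does not allow in file names, plus control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_image_name(alt_text, default="unnamed", extension=".jpg"):
    """Turn an image's alt text into a file name that is safe to save.

    Hypothetical helper for illustration; it is not a Scrapy API.
    """
    # Treat a missing alt attribute (None) like an empty string.
    name = _INVALID_CHARS.sub("_", alt_text or "")
    # Leading/trailing spaces and dots cause trouble on Windows, so drop them.
    name = name.strip(" .")
    return (name or default) + extension


print(safe_image_name("campus/photo: 01"))
print(safe_image_name(None))
```

A helper like this could be called where the item's img_name is built, so that file_path only ever sees a usable name.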

What does the code look like? First, import the ImagesPipeline class in the pipelines.py file, then override the three methods mentioned above:

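    from scrapy.pipelines.images import ImagesPipeline
    import scrapy


    class ImgsPipLine(ImagesPipeline):
        def get_media_requests(self, item, info):
            # Request the image address and carry the item along in meta
            # so that file_path can read the picture name from it.
            yield scrapy.Request(url=item["img_src"], meta={"item": item})

        def file_path(self, request, response=None, info=None, *, item=None):
            # Newer Scrapy versions also pass the item as a keyword argument;
            # here it is read from request.meta, as set in get_media_requests.
            item = request.meta["item"]
            print("#", item)
            # The returned name is relative to IMAGES_STORE.
            return item["img_name"]

        def item_completed(self, results, item, info):
            # Hand the item to the next pipeline class.
            return item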
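The results argument of item_completed is not used above, but it describes what happened to each download. As a rough sketch, assuming the shape Scrapy documents for media pipelines (a list of (success, info) pairs, where info is a dict with keys such as "url", "path" and "checksum" on success), the saved paths could be collected like this. The helper name downloaded_paths and the sample values are made up for illustration, and a plain Exception stands in for the failure object Scrapy would pass.

```python
def downloaded_paths(results):
    """Return the stored paths of the images that downloaded successfully."""
    # Each entry is a (success, info) pair; info is only a dict on success.
    return [info["path"] for ok, info in results if ok]


# Made-up sample shaped like the list passed to item_completed.
sample_results = [
    (True, {"url": "http://example.com/a.jpg", "path": "a.jpg", "checksum": "0" * 32}),
    (False, Exception("download failed")),  # stand-in for a failed download
]

print(downloaded_paths(sample_results))  # ['a.jpg']
```

Inside item_completed, a list like this could be stored on the item or used to drop items whose image failed to download.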

Once the methods are defined, two settings are needed in the settings.py configuration file. The first specifies where the pictures are saved, IMAGES_STORE = "D:ImgPro"; the second enables the ImgsPipLine pipeline:

    ITEM_PIPELINES = {
        # 300 is the priority; the smaller the number, the higher the priority
        "imgPro.pipelines.ImgsPipLine": 300,
    }
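To make the priority rule concrete, here is a small sketch in plain Python of how the numbers translate into an execution order. It only mimics the ordering rule and does not call Scrapy; CleanupPipeline is a hypothetical second pipeline added for the example, and pipeline_order is an illustrative helper name.

```python
ITEM_PIPELINES = {
    "imgPro.pipelines.ImgsPipLine": 300,
    "imgPro.pipelines.CleanupPipeline": 100,  # hypothetical, for illustration only
}


def pipeline_order(pipelines):
    """Return pipeline paths in the order items pass through them (lowest number first)."""
    return sorted(pipelines, key=pipelines.get)


for path in pipeline_order(ITEM_PIPELINES):
    print(ITEM_PIPELINES[path], path)
```

With these values the hypothetical CleanupPipeline (100) would see each item before ImgsPipLine (300).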

Once this is set up, run the program and the saved pictures appear under "D:ImgPro".

The complete code is as follows:

Spider file code:

    # -*- coding: utf-8 -*-
    import scrapy
    from imgPro.items import ImgproItem


    class ImgSpider(scrapy.Spider):
        name = "img"
        allowed_domains = ["www.521609.com"]
        start_urls = ["http://www.521609.com/daxuemeinv/"]

        def parse(self, response):
            # Parse the picture address and picture name
            li_list = response.xpath('//div[@class="index_img list_center"]/ul/li')
            for li in li_list:
                item = ImgproItem()
                item["img_src"] = "http://www.521609.com/" + li.xpath("./a[1]/img/@src").extract_first()
                item["img_name"] = li.xpath("./a[1]/img/@alt").extract_first() + ".jpg"
                # print("*")
                # print(item)
                yield item
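The spider builds img_src by string concatenation, which can produce a doubled slash when the src attribute already starts with "/". As one possible alternative, the standard library's urljoin resolves the src against the site address. This is a sketch only: absolute_image_url is an illustrative helper name and the example paths are made up. Inside a spider, response.urljoin serves the same purpose.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.521609.com/"


def absolute_image_url(src, base=BASE_URL):
    """Resolve an image src (relative or absolute) against the site address."""
    return urljoin(base, src)


# Made-up src values for illustration.
print(absolute_image_url("/uploads/a.jpg"))
print(absolute_image_url("uploads/a.jpg"))
print(absolute_image_url("http://img.example.com/a.jpg"))
```

Root-relative and plain relative values resolve to the same address, and a src that is already absolute is returned unchanged.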

items.py file

    import scrapy


    class ImgproItem(scrapy.Item):
        # define the fields for your item here like:
        # name = scrapy.Field()
        img_src = scrapy.Field()
        img_name = scrapy.Field()

pipelines.py file

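    from scrapy.pipelines.images import ImagesPipeline
    import scrapy


    class ImgsPipLine(ImagesPipeline):
        def get_media_requests(self, item, info):
            # Request the image address and carry the item along in meta
            # so that file_path can read the picture name from it.
            yield scrapy.Request(url=item["img_src"], meta={"item": item})

        def file_path(self, request, response=None, info=None, *, item=None):
            # Newer Scrapy versions also pass the item as a keyword argument;
            # here it is read from request.meta, as set in get_media_requests.
            item = request.meta["item"]
            print("#", item)
            # The returned name is relative to IMAGES_STORE.
            return item["img_name"]

        def item_completed(self, results, item, info):
            # Hand the item to the next pipeline class.
            return item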

settings.py file

    import random

    BOT_NAME = "imgPro"

    SPIDER_MODULES = ["imgPro.spiders"]
    NEWSPIDER_MODULE = "imgPro.spiders"

    IMAGES_STORE = "D:ImgPro"  # File save path
    LOG_LEVEL = "WARNING"
    ROBOTSTXT_OBEY = False

    # Set the user agent
    USER_AGENTS_LIST = [
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
        "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
    ]
    # One user agent is picked at random when the settings are loaded.
    USER_AGENT = random.choice(USER_AGENTS_LIST)

    DEFAULT_REQUEST_HEADERS = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en",
        # "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
        "User-Agent": USER_AGENT,
    }

    # Enable the pipeline
    ITEM_PIPELINES = {
        "imgPro.pipelines.ImgsPipLine": 300,
    }

This concludes the article on "how to download and save pictures in scrapy in Python". Thank you for reading! I hope it has given you a working understanding of the topic.
