How Scrapy passes items through the Item Pipeline

2025-01-28 Update From: SLTechnology News&Howtos shulou

Shulou (Shulou.com) 06/02

This article explains how Scrapy delivers items to the Item Pipeline: what an item pipeline component is, how to write one, how to enable it in the project settings, and how to check the result.

When a Spider collects an Item, the Item is passed to the Item Pipeline, where the Item Pipeline components process it one after another in a defined order.

Each Item Pipeline component is a Python class that implements a few simple methods, for example to decide whether an Item is discarded or stored. Typical uses of an item pipeline include:

Validate crawled data (check that the item contains certain fields, such as a name field)

Check for duplicates (and discard them)

Save the crawl results to a file or database
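The first two uses in this list can be sketched as two small pipeline classes. This is a minimal illustration, not code from the original article: the class names and the "name" field are hypothetical, and items are assumed to be dict-like. Raising DropItem (from scrapy.exceptions) is how a Scrapy pipeline discards an item; the fallback class exists only so the sketch also runs where Scrapy is not installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so the sketch runs without Scrapy installed (assumption).
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class ValidateNamePipeline:
    """Discard items that lack a non-empty 'name' field (hypothetical field)."""

    def process_item(self, item, spider):
        if not item.get("name"):
            raise DropItem("missing name field")
        return item


class DuplicatesPipeline:
    """Discard items whose 'name' was already seen during this crawl."""

    def __init__(self):
        self.seen_names = set()

    def process_item(self, item, spider):
        name = item["name"]
        if name in self.seen_names:
            raise DropItem(f"duplicate item: {name}")
        self.seen_names.add(name)
        return item
```

A dropped item never reaches the pipeline components that come after the one that raised DropItem.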

Writing an item pipeline

Writing an item pipeline is simple. Each item pipeline component is a standalone Python class that must implement the process_item() method:

    # import whatever modules the pipeline needs

    class SomethingPipeline(object):
        def __init__(self):
            # Optional: parameter initialization, etc.
            pass

        def process_item(self, item, spider):
            # item (Item object): the crawled item
            # spider (Spider object): the spider that crawled the item
            # Required: every item pipeline component must implement this
            # method, and it must return an Item object. A discarded item
            # is not processed by subsequent pipeline components.
            return item

        def open_spider(self, spider):
            # spider (Spider object): the spider that was opened
            # Optional: called when the spider is opened.
            pass

        def close_spider(self, spider):
            # spider (Spider object): the spider that was closed
            # Optional: called when the spider is closed.
            pass
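As a concrete instance of this template, here is a minimal sketch of a pipeline that saves items to a file. The class name ItcastJsonPipeline and the file name teacher.json are the ones this article uses in its settings and final check; the method bodies are an assumption about how such a pipeline could be written (one JSON object per line), not the article's original code.

```python
import json


class ItcastJsonPipeline:
    """Write each item as one JSON object per line to teacher.json."""

    def open_spider(self, spider):
        # Called once when the spider is opened: open the output file.
        self.file = open("teacher.json", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and for scrapy.Item objects.
        line = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(line + "\n")
        # Return the item so the next pipeline component can process it.
        return item

    def close_spider(self, spider):
        # Called once when the spider is closed: release the file handle.
        self.file.close()
```

Opening the file in open_spider() and closing it in close_spider() keeps a single file handle for the whole crawl instead of reopening the file for every item.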

Enabling an Item Pipeline component

To enable an Item Pipeline component, add its class to the ITEM_PIPELINES setting in settings.py, as in the following example:

    # Configure item pipelines
    # See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
    ITEM_PIPELINES = {
        # 'mySpider.pipelines.SomePipeline': 300,
        "mySpider.pipelines.ItcastJsonPipeline": 300,
    }

The integer value assigned to each class determines the order in which the components run: an item passes through the pipelines from the lowest number to the highest. These numbers are customarily chosen in the range 0-1000, and the lower the value, the earlier the component runs.
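The ordering rule can be illustrated with plain Python. The sketch below is a simplified model of the rule, not Scrapy's internal implementation, and the three pipeline paths and their numbers are hypothetical.

```python
# Hypothetical pipeline settings; names and numbers are made up for illustration.
ITEM_PIPELINES = {
    "mySpider.pipelines.SaveJsonPipeline": 800,
    "mySpider.pipelines.ValidatePipeline": 100,
    "mySpider.pipelines.DuplicatesPipeline": 300,
}


def run_order(pipelines):
    """Return the pipeline paths sorted by their integer value, lowest first."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


for path in run_order(ITEM_PIPELINES):
    print(path)
# mySpider.pipelines.ValidatePipeline
# mySpider.pipelines.DuplicatesPipeline
# mySpider.pipelines.SaveJsonPipeline
```

Giving validation the lowest number and storage the highest means that invalid or duplicate items are dropped before anything is written.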

Restart the crawler

Change the parse() method to the code from the last approach in the introductory example, and then run the following command:

scrapy crawl itcast

Then check that teacher.json has been generated in the current directory.
