2025-01-28 Update From: SLTechnology News&Howtos > Development
Shulou (Shulou.com) 06/02 Report --
This article explains how Scrapy passes items through the item pipeline. Many newcomers are unsure how pipeline components process scraped items, so the sections below walk through what an item pipeline does, how to write one, and how to enable it.
After an Item is collected in a Spider, it is passed to the Item Pipeline, whose components process the Item in a defined order.
Each Item Pipeline component is a Python class that implements a few simple methods, such as deciding whether an Item is discarded or stored. Typical uses of an item pipeline include:
Validating crawled data (e.g., checking that an item contains certain fields, such as a name field)
Checking for (and discarding) duplicate data
Saving crawl results to a file or database
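The first two cases above can be sketched as standalone pipeline classes. This is a minimal sketch with a hypothetical "name" field; in a real project the DropItem exception would be imported from scrapy.exceptions, but a local stand-in keeps the snippet self-contained:

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem so the sketch runs without Scrapy."""
    pass


class ValidationPipeline(object):
    """Drop items that lack a required field (hypothetical 'name' field)."""

    def process_item(self, item, spider):
        if not item.get("name"):
            raise DropItem("Missing name field in %s" % item)
        return item


class DuplicatesPipeline(object):
    """Drop items whose 'name' value has already been seen."""

    def __init__(self):
        self.names_seen = set()

    def process_item(self, item, spider):
        if item["name"] in self.names_seen:
            raise DropItem("Duplicate item found: %s" % item)
        self.names_seen.add(item["name"])
        return item
```

Raising DropItem (rather than returning None) is the conventional way to stop an item from reaching later pipeline components.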
Writing an item pipeline
Writing an item pipeline is simple. An item pipeline component is a standalone Python class that must implement the process_item() method:
import something

class SomethingPipeline(object):
    def __init__(self):
        # Optional: parameter initialization, etc.
        pass

    def process_item(self, item, spider):
        # item (Item object) -- the crawled item
        # spider (Spider object) -- the spider that crawled the item
        # This method must be implemented; every item pipeline component
        # calls it, and it must return an Item object. A discarded item
        # is not processed by subsequent pipeline components.
        return item

    def open_spider(self, spider):
        # spider (Spider object) -- the spider that was opened
        # Optional: this method is called when the spider is opened.
        pass

    def close_spider(self, spider):
        # spider (Spider object) -- the spider that was closed
        # Optional: this method is called when the spider is closed.
        pass
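As a concrete example, the open_spider/close_spider hooks can be combined with process_item to write each item to a JSON Lines file. This is a hypothetical reconstruction of a pipeline like the ItcastJsonPipeline mentioned in the settings later in this article; the filename teacher.json is an assumption taken from this tutorial:

```python
import json


class ItcastJsonPipeline(object):
    """Hypothetical sketch: write every crawled item to teacher.json,
    one JSON object per line."""

    def open_spider(self, spider):
        # Called once when the spider opens: create the output file.
        self.file = open("teacher.json", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # dict(item) works for Scrapy Item objects as well as plain dicts.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # return the item so later pipelines can still see it

    def close_spider(self, spider):
        # Called once when the spider closes: release the file handle.
        self.file.close()
```

Opening the file in open_spider rather than __init__ ties the file's lifetime to the crawl, so the handle is not held before the spider actually starts.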
Enabling an Item Pipeline component
To enable an Item Pipeline component, you must add its class to the ITEM_PIPELINES setting in settings.py, as in the following example:
# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    # 'mySpider.pipelines.SomePipeline': 300,
    "mySpider.pipelines.ItcastJsonPipeline": 300,
}
The integer value assigned to each class determines the order in which the pipelines run: items pass through the pipelines from the lowest number to the highest. These values are conventionally kept in the 0-1000 range, but the range itself is arbitrary; what matters is that a lower value means a higher priority for that component.
Restart the crawler
Change the parse() method to the code from the earlier part of this tutorial, and then run the following command:
scrapy crawl itcast
Check whether teacher.json has been generated in the current directory.
© 2024 shulou.com SLNews company. All rights reserved.