
What are the knowledge points of Python's Scrapy framework?

Updated 2025-02-28. From: SLTechnology News&Howtos


This article explains the main knowledge points of Python's Scrapy framework: why to use it, what each of its components does, and how the components work together during a crawl.

I. Why use the Scrapy framework?

Scrapy is a fast, high-level framework for web crawling and screen scraping. It can be used for data mining, monitoring, and automated testing, and because it is open source, anyone can modify it as needed.

II. Introduction to the components of the Scrapy framework

1. Scrapy engine (Scrapy Engine): controls the flow of data between all components of the system and triggers events when the corresponding actions occur.

2. Scheduler (Scheduler): accepts requests from the engine and queues them so that it can hand them back to the engine when the engine asks for them later.

3. Downloader (Downloader): fetches web page data and provides it to the engine, which then passes it on to the Spider.

4. Spiders: classes written by Scrapy users to parse responses and extract items or additional URLs to follow. Each Spider is responsible for handling a specific site (or group of sites).

5. Item Pipeline: processes the items extracted by the Spider. Typical processing steps are cleaning, validation, and persistence.

6. Downloader middleware (Downloader Middlewares): specific hooks between the engine and the downloader that process the requests the engine sends to the downloader and the responses the downloader passes back to the engine. They provide a simple mechanism to extend Scrapy functionality by inserting custom code.

7. Spider middleware (Spider Middlewares): specific hooks between the engine and the Spider that process the Spider's input (responses) and output (items and requests). They provide a simple mechanism to extend Scrapy functionality by inserting custom code.
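As an illustration of the Item Pipeline component, here is a minimal sketch of a pipeline that performs the three typical steps named above (cleaning, validation, persistence). A Scrapy pipeline is an ordinary Python class with a `process_item(self, item, spider)` method. The class name `CleanAndStorePipeline`, the `title` field, and the in-memory `stored` list are assumptions made for this example; a real project would more likely persist to a file or database.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Stand-in so this sketch also runs where Scrapy is not installed.
    class DropItem(Exception):
        """Raised to discard an item that fails validation."""


class CleanAndStorePipeline:
    """Hypothetical pipeline: clean, validate, then persist each item."""

    def open_spider(self, spider):
        # Called once when the spider starts; set up the (in-memory) store.
        self.stored = []

    def process_item(self, item, spider):
        # Cleaning: strip surrounding whitespace from the title.
        title = (item.get("title") or "").strip()
        # Validation: discard items that have no usable title.
        if not title:
            raise DropItem("missing title")
        item["title"] = title
        # Persistence: an in-memory list stands in for real storage.
        self.stored.append(item)
        # Returning the item passes it on to the next pipeline, if any.
        return item
```

In a Scrapy project, a pipeline like this is enabled through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"myproject.pipelines.CleanAndStorePipeline": 300}`, where `myproject` is a placeholder module path.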

III. How the Scrapy framework works

1. The engine asks the spider for the initial URLs to crawl.

2. The engine hands the URLs to be crawled to the scheduler.

3. The scheduler places the request objects built from those URLs in its queue.

4. The engine takes the next request from the queue.

5. The engine passes the request to the downloader for processing.

6. The downloader sends the request and fetches the data from the Internet.

7. The downloader returns the data to the engine.

8. The engine passes the data on to the spider.

9. The spider parses the data, for example with XPath, and extracts items or further URLs.

10. The spider hands the items or URLs back to the engine.

11. The engine checks whether each result is an item or a URL: items go to the pipeline for processing, and URLs go to the scheduler.

12. When there are no requests left in the scheduler, the whole program stops.
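The twelve steps above can be sketched as a small, standard-library-only simulation. This is not Scrapy's actual implementation: `FAKE_WEB`, `download`, `parse`, and `crawl` are hypothetical names, the "Internet" is an in-memory dictionary, and the limited XPath support in `xml.etree.ElementTree` stands in for Scrapy's selectors. The `seen` set mimics the duplicate filtering that Scrapy's scheduler applies by default.

```python
from collections import deque
import xml.etree.ElementTree as ET

# Hypothetical in-memory "web": URL -> well-formed markup.
FAKE_WEB = {
    "http://example.test/": (
        "<html><body><h1>Home</h1>"
        "<a href='http://example.test/a'>A</a>"
        "<a href='http://example.test/b'>B</a></body></html>"
    ),
    "http://example.test/a": (
        "<html><body><h1>Page A</h1>"
        "<a href='http://example.test/'>Home</a></body></html>"
    ),
    "http://example.test/b": "<html><body><h1>Page B</h1></body></html>",
}


def download(url):
    """Downloader: fetch the page body for a request (steps 6-7)."""
    return FAKE_WEB[url]


def parse(url, body):
    """Spider: yield items (dicts) and follow-up URLs (strings) (steps 9-10)."""
    root = ET.fromstring(body)
    yield {"url": url, "title": root.findtext(".//h1")}
    for link in root.findall(".//a"):
        yield link.get("href")


def crawl(start_urls):
    """Engine loop: move data between scheduler, downloader, spider, pipeline."""
    scheduler = deque(start_urls)      # steps 1-3: queue the initial URLs
    seen = set(start_urls)             # skip URLs that were already queued
    pipeline_output = []               # stands in for the item pipeline
    while scheduler:                   # step 12: stop when the queue is empty
        url = scheduler.popleft()      # step 4: take the next request
        body = download(url)           # steps 5-7: downloader fetches the data
        for result in parse(url, body):        # steps 8-10: spider parses
            if isinstance(result, dict):       # step 11: item -> pipeline
                pipeline_output.append(result)
            elif result not in seen:           # step 11: URL -> scheduler
                seen.add(result)
                scheduler.append(result)
    return pipeline_output


if __name__ == "__main__":
    for item in crawl(["http://example.test/"]):
        print(item)
```

In real Scrapy the engine is asynchronous and requests pass through the downloader and spider middlewares on the way; the loop here is sequential only to keep the order of the steps visible.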

[Figure: the author's diagram of the Scrapy workflow described above]

Thank you for reading. That covers the main knowledge points of Python's Scrapy framework. After studying this article you should have a deeper understanding of them, but how to apply them in a specific project still needs to be verified in practice.
