
What is the execution process of scrapy framework in python3

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article introduces the execution process of the scrapy framework in python3. Many people run into difficulties with this in real projects, so let the editor walk you through how to handle these situations. I hope you read carefully and come away with something!

Scrapy framework overview: Scrapy is a fast, high-level screen scraping and web crawling framework written in Python, used to crawl web sites and extract structured data from pages. Scrapy has a wide range of uses: data mining, monitoring, and automated testing.

Create a project

Since PyCharm cannot create a scrapy project directly, the project must be created from the command line, so the following operations are carried out in PyCharm's terminal:

1. Install the scrapy module:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple scrapy

2. Create a scrapy project: scrapy startproject test_scrapy

3. Enter the project directory: cd test_scrapy

4. Generate a crawler: scrapy genspider itcast "itcast.cn"

5. Extract data: complete the spider, using XPath and similar methods

6. Save data: save the data in a pipeline

Commonly used commands

Create a project: scrapy startproject xxx

Enter the project: cd xxx # go to a folder

Create a crawler: scrapy genspider xxx (crawler name) xxx.com (crawl domain)

Generate a file: scrapy crawl xxx -o xxx.json (export the crawled data to a file of the given type)

Run the crawler: scrapy crawl xxx

List all crawlers: scrapy list

Get configuration information: scrapy settings [options]

Files under the Scrapy project

scrapy.cfg: the project's configuration file

test_scrapy/: the project's Python module. Put your code here (the core)

test_scrapy/items.py: the project's items file. (This is where containers are defined; crawled information is put into different containers.)

test_scrapy/pipelines.py: the project's pipelines file.

test_scrapy/settings.py: the project's settings file. (Used to set basic parameters, such as adding request headers and setting an encoding.)

test_scrapy/spiders/: the directory where the spider code is placed. (This is where the crawlers live.)
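As noted above, saving data happens in pipelines.py. A minimal sketch of a pipeline class follows; the JSON-lines format and the file name "items.jl" are assumptions chosen for illustration, since any pipeline only needs a process_item method (plus optional open_spider/close_spider hooks).

```python
import json


class JsonLinesPipeline:
    """Sketch of an item pipeline: open_spider/close_spider manage the
    output file, process_item writes one JSON line per crawled item.
    The file name 'items.jl' is an illustrative assumption."""

    def open_spider(self, spider):
        self.file = open("items.jl", "w", encoding="utf-8")

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # return the item so later pipelines can also see it

    def close_spider(self, spider):
        self.file.close()
```

To activate a pipeline, it must be registered in settings.py under the ITEM_PIPELINES setting, e.g. `ITEM_PIPELINES = {"test_scrapy.pipelines.JsonLinesPipeline": 300}`.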

The overall implementation process of the scrapy framework

1. The spider's yield sends a request to the engine

2. The engine passes the request, untouched, to the scheduler

3. The scheduler queues the request and hands it back to the engine when asked

4. The engine gets the request and sends it to the downloader via the downloader middleware

5. After the downloader gets the response, it sends it back to the engine via the middleware

6. After the engine gets the response, it returns it to the spider's parse() method; the spider processes the response and parses out items or new requests

7. The parsed items or requests are sent to the engine

8. The engine gets the items or requests, sends the items to the item pipeline, and sends the requests to the scheduler (note: the program stops only when there are no requests left in the scheduler, and scrapy will re-issue a request if it fails)

Introduction to the yield keyword

To put it simply, yield turns a function into a generator. A function containing yield is no longer an ordinary function; the Python interpreter treats it as a generator. When such a function reaches a yield, it returns a value for that iteration. On the next iteration, execution continues from the statement after the yield, and the function's local variables look exactly as they did before the last interruption. The function then runs on until it encounters yield again.

In plain terms: inside a function, when the program executes a yield statement, it pauses and returns the value of the expression after yield; on the next call, execution resumes from where the yield paused, looping like this until the function finishes.
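The pause-and-resume behaviour described above can be seen with a tiny generator (a plain-Python sketch, unrelated to any particular spider):

```python
def count_up(limit):
    """A generator: pauses at each yield and resumes on the next call."""
    n = 0
    while n < limit:
        yield n  # pause here, handing n back to the caller
        n += 1   # on the next call, execution resumes from this line

gen = count_up(3)   # calling the function only builds the generator
print(next(gen))    # 0 - runs until the first yield
print(next(gen))    # 1 - resumes after yield; local n was preserved
print(list(gen))    # [2] - the remaining values
```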

This concludes "what is the execution process of the scrapy framework in python3". Thank you for reading. If you want to learn more about the industry, follow this website; the editor will keep producing high-quality practical articles for you!



