How to write Python Scrapy crawler code

2025-02-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to write a basic Scrapy crawler in Python. The method is simple and quick to put into practice. The example walks through a complete spider that downloads two pages and saves each one to disk.

Scrapy crawler

    import scrapy


    class demo(scrapy.Spider):  # must inherit from scrapy.Spider
        name = "demo"  # the spider name

        def start_requests(self):  # requests the pages behind the links below
            # links to crawl
            urls = [
                'http://lab.scrapyd.cn/page/1/',
                'http://lab.scrapyd.cn/page/2/',
            ]
            for url in urls:
                # what happens to a crawled page? It is handed to the parse method
                yield scrapy.Request(url=url, callback=self.parse)

        def parse(self, response):
            '''
            start_requests has already downloaded the page, so how do we
            extract the content we want? That logic belongs in this method.
            Here nothing is extracted; the page is simply saved. Extraction
            with xpath, regular expressions, or css is covered later.
            This example shows the flow Scrapy follows:
            1. Define the links;
            2. Crawl (download) the pages through the links;
            3. Define rules, then extract data.
            '''
            page = response.url.split("/")[-2]  # page number from the link, e.g. /page/1/ gives 1
            filename = 'demo-%s.html' % page  # build the file name; for the first page it is demo-1.html
            with open(filename, 'wb') as f:  # ordinary Python file handling
                f.write(response.body)  # response.body is the page that was just downloaded
            self.log('save file: %s' % filename)  # write a log entry
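The file-name logic in parse can be checked without running Scrapy at all. The following is a minimal sketch, assuming a hypothetical helper named page_filename (it is not part of the original spider or of Scrapy) that repeats the same split and string formatting on the two links from the article:

```python
def page_filename(url):
    """Build the output file name the same way the spider's parse method does."""
    # 'http://lab.scrapyd.cn/page/1/'.split("/") ends with [..., 'page', '1', ''].
    # The trailing slash produces the empty last item, so index -2 is the page number.
    page = url.split("/")[-2]
    return 'demo-%s.html' % page


if __name__ == "__main__":
    for url in ['http://lab.scrapyd.cn/page/1/', 'http://lab.scrapyd.cn/page/2/']:
        print(url, '->', page_filename(url))
```

The index -2 relies on the link ending in a slash; a link without the trailing slash would need index -1 instead.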

Each line is explained in the comments, so read through them carefully; nothing more needs to be said here. Finally, run the crawler with the crawl command, which takes the spider name defined above: scrapy crawl demo
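The parse docstring lists data extraction as the third step and names xpath, regular expressions, and css as options, but the example stops at saving the page. As a rough sketch of the regular-expression route only, the code below runs on a made-up HTML string. SAMPLE_HTML, TEXT_PATTERN, and extract_texts are hypothetical names, and the markup is an assumption for illustration, not the actual structure of the pages on lab.scrapyd.cn.

```python
import re

# Hypothetical markup for illustration only; the real pages may be structured differently.
SAMPLE_HTML = """
<div class="quote"><span class="text">First saying</span></div>
<div class="quote"><span class="text">Second saying</span></div>
"""

# The non-greedy group captures everything between the opening and closing span tags.
# re.S lets "." match newlines in case the text spans several lines.
TEXT_PATTERN = re.compile(r'<span class="text">(.*?)</span>', re.S)


def extract_texts(html):
    """Return the stripped contents of every <span class="text"> element."""
    return [match.strip() for match in TEXT_PATTERN.findall(html)]


if __name__ == "__main__":
    print(extract_texts(SAMPLE_HTML))
```

Inside a spider, the input string would come from response.text rather than a constant. Regular expressions are fragile against markup changes, which is one reason xpath or css selectors are usually the more robust choice for real pages.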

That covers the basics of writing a Scrapy crawler in Python. The best way to consolidate it is to run the example yourself.
