How to scrape memes with Scrapy: a hands-on example

2025-03-30 Update From: SLTechnology News&Howtos

Shulou(Shulou.com) 06/01

This article shows how to scrape meme images with Scrapy in a hands-on example. It walks through extracting the links, saving the pictures, and working around the site's anti-scraping checks.

I. The approach to scraping memes (http://www.doutula.com)

1. Open the website and click the latest picture sets.

2. The page then lists the picture sets; we need to extract the link to each set.

3. With the links in hand, open each set's page and extract the pictures.

4. The site also intersperses advertisements among the content, so we need to filter them out.

II. Hands-on

We won't go over creating a new project again. If you don't know how, see the previous article.

1. First, we extract the URLs on the first page.

From the picture above, we can see that all the URLs we want are under the div whose class is col-sm-9.

The part in the red box is an advertisement. It is not an a tag, so no separate filtering is needed: we can directly select the a tags that are direct children of col-sm-9.

Write the following code:
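The original listing is not reproduced here. As a minimal sketch of the selection just described, the snippet below uses the standard library's ElementTree on a made-up HTML fragment so that it runs without Scrapy. The sample markup, the example URLs and the function name extract_set_links are assumptions. In a spider, the equivalent selection would presumably be response.xpath('//div[@class="col-sm-9"]/a/@href').

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the listing page markup.
SAMPLE_HTML = """
<html><body>
  <div class="col-sm-9">
    <a href="http://www.doutula.com/example-set-1">Set one</a>
    <div class="ad">advertisement</div>
    <a href="http://www.doutula.com/example-set-2">Set two</a>
  </div>
</body></html>
"""


def extract_set_links(markup):
    """Return the href of every <a> that is a direct child of div.col-sm-9."""
    root = ET.fromstring(markup.strip())
    # "/a" selects direct children only, so the advertisement <div> is skipped.
    anchors = root.findall(".//div[@class='col-sm-9']/a")
    return [anchor.get("href") for anchor in anchors]


links = extract_set_links(SAMPLE_HTML)
print(links)
```

Because only direct a children are selected, the advertisement block never reaches the result and needs no extra filter.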

Note that request header information needs to be added in settings.py, and the robots.txt setting (ROBOTSTXT_OBEY) needs to be changed to False.
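A settings.py fragment along these lines would cover both changes. The header values are only examples, since the article's actual values are not shown.

```python
# settings.py (fragment)

# Do not fetch and obey robots.txt before crawling.
ROBOTSTXT_OBEY = False

# Example browser User-Agent string; any realistic value can be used here.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# Extra headers sent with every request (example values).
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
```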

Let's set a breakpoint and debug it:

We can see that the information we wanted has been extracted.

Note: the meta parameter of Request is used to pass data on to the next callback method. It is used much like a dictionary.

2. Complete the item

We only need three fields: the series, the picture URL, and the picture name.
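The three fields could be declared as shown below. The names MemeItem, series, image_url and image_name are hypothetical, and the article's own item presumably subclasses scrapy.Item in items.py. A TypedDict is used here only so the sketch runs without Scrapy installed. Its instances are plain dicts, which Scrapy also accepts as items.

```python
from typing import TypedDict


class MemeItem(TypedDict):
    series: str      # name of the picture set the image belongs to
    image_url: str   # URL of the image file
    image_name: str  # display name, later used to build the filename


# Example values are made up for illustration.
item = MemeItem(
    series="cats",
    image_url="http://www.doutula.com/example.jpg",
    image_name="grumpy cat",
)
print(item["series"], item["image_name"])
```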

3. Extract the fields the item needs

4. Next page

5. Save

Because I have not looked into how Scrapy itself saves pictures, I wrote my own method to save them.

Add the following code to pipelines.py:
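The article's pipeline code is not shown, so the following is only a sketch of a hand-written saving pipeline. The class name SaveImagePipeline, the images directory, the item keys and the injectable fetch argument are assumptions. It downloads with plain urllib rather than Scrapy's downloader.

```python
import os
import urllib.request


def fetch_bytes(url):
    """Download a URL with urllib and return the raw response body."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


class SaveImagePipeline:
    """Write each item's image to <base_dir>/<series>/<image_name><ext>."""

    def __init__(self, base_dir="images", fetch=fetch_bytes):
        self.base_dir = base_dir
        self.fetch = fetch  # injectable so the class can be exercised offline

    def process_item(self, item, spider):
        folder = os.path.join(self.base_dir, item["series"])
        os.makedirs(folder, exist_ok=True)
        # Keep the extension from the URL (.jpg, .gif, ...); default to .jpg.
        ext = os.path.splitext(item["image_url"].split("?")[0])[1] or ".jpg"
        path = os.path.join(folder, item["image_name"] + ext)
        with open(path, "wb") as handle:
            handle.write(self.fetch(item["image_url"]))
        return item
```

A blocking download like this is simple but stalls the crawl while each file is fetched. Scrapy also ships a built-in ImagesPipeline that could be used instead.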

And add the following to settings.py:
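Enabling a pipeline is done through the ITEM_PIPELINES setting. In this fragment the project name doutula and the class name SaveImagePipeline are hypothetical placeholders for the real module path.

```python
# settings.py (fragment)

# Map the pipeline's import path to its order (lower numbers run first).
ITEM_PIPELINES = {
    "doutula.pipelines.SaveImagePipeline": 300,
}
```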

6. Run

It reports an error immediately. Because the site has an anti-scraping mechanism, we add header information in settings.py.

After running for a while the error appeared again, so it seems the User-Agent header needs to be changed at random.

A third-party library makes this convenient: pip3 install fake_useragent

After it installs successfully, import it in middlewares.py: from fake_useragent import UserAgent

Add the following code:
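The middleware itself is not shown in the article. As a rough sketch of the idea, a downloader middleware can overwrite the User-Agent header on every outgoing request. To keep the example runnable without third-party packages, it picks from a small hard-coded list. With fake_useragent, the random choice would instead come from UserAgent().random. The class name RandomUserAgentMiddleware and the sample strings are assumptions.

```python
import random

# Example User-Agent strings; stand-ins for what fake_useragent would supply.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


class RandomUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent per request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        # Returning None lets Scrapy continue processing the request normally.
        return None
```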

Then add the following to the settings.py file:
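A custom downloader middleware is registered through DOWNLOADER_MIDDLEWARES. The module path doutula.middlewares.RandomUserAgentMiddleware below is a hypothetical name. Disabling the built-in UserAgentMiddleware is optional and shown only as a common choice.

```python
# settings.py (fragment)

DOWNLOADER_MIDDLEWARES = {
    # Hypothetical path to the project's random User-Agent middleware.
    "doutula.middlewares.RandomUserAgentMiddleware": 543,
    # Optional: turn off Scrapy's built-in User-Agent middleware.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
```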

Run the main file:

That's it.

Summary:

Result:

Problems:

Four problems were encountered during the operation:

1. The link to the full-size picture was not obtained:

The site may serve two page layouts, so a single selector does not match on both.

Solution: use | (or) in the XPath expression to match either layout.

2. The picture name was not obtained

Solution: same as above.

3. Several pictures have the same name.

Solution: add an MD5 hash to the name, or use your own scheme.

4. The picture name contains illegal characters such as ? / \

Solution: filter them out with a regular expression. If an MD5 hash is used as the name, both problems are solved at once.
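Both fixes can be sketched with the standard library. The function names and the set of characters treated as illegal (those rejected in Windows filenames) are assumptions. Hashing the image URL is one possible way to get a unique, filesystem-safe name.

```python
import hashlib
import re

# Characters that are not allowed in Windows filenames.
ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|]')


def clean_name(name):
    """Replace illegal filename characters with underscores."""
    return ILLEGAL_CHARS.sub("_", name).strip()


def hashed_name(image_url):
    """Return a 32-character hex MD5 digest of the URL, safe as a filename."""
    return hashlib.md5(image_url.encode("utf-8")).hexdigest()


print(clean_name("what?/\\"))
print(hashed_name("http://www.doutula.com/example.jpg"))
```

The cleaned name stays readable but can still collide, while the hashed name is unique per URL at the cost of readability. Combining them, for example cleaned name plus a short hash suffix, is another option.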

Although some pictures were not obtained, the crawler still collected a lot.
