How to Set a Random User-Agent in Scrapy with Code

2025-04-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Today, let's look at how to set a random User-Agent in Scrapy with code, a topic many people may not know well. The content below summarizes the available approaches, and I hope you take something useful away from it.

Abstract: Coping with anti-crawling measures is an important part of writing a crawler, and setting a random User-Agent (UA) is one of the main countermeasures. Scrapy offers several ways to set a random UA, some complex and some simple. This article summarizes them and ends with a method that needs only one line of configuration.

Recently, while crawling a website with Scrapy, I ran into anti-crawling protection, so I started looking for countermeasures. Setting a random UA to disguise the request header is a common one: to some extent it keeps the website from immediately identifying you as a crawler and blocking you. There are many ways to set a random UA. Some take many lines of code and some take only one, and I'll go through them in turn.

▌ Setting a UA the usual way

Before turning to Scrapy, let's look at the usual way to get a random UA. The most convenient option is the fake_useragent package, which ships with a large number of UAs that can be swapped in at random. That is much easier than collecting a list yourself.

First, install the fake_useragent package with one command:

    pip install fake-useragent

Then test it:

    from fake_useragent import UserAgent

    ua = UserAgent()
    for i in range(10):
        print(ua.random)  # a randomly chosen UA string on each access

Here, the ua.random attribute returns a randomly chosen UA from a variety of browsers.

If you only want one browser, such as Chrome, use ua.chrome instead; each access then returns a random Chrome UA.

That is the usual way to generate a random UA outside Scrapy, and it is very convenient.

Next, let's go through several ways to set a random UA in Scrapy.

First, create a new project named wandoujia and use http://httpbin.org/get as the test site.

Let's first see what happens when no UA is set. The User-Agent echoed back by the site contains Scrapy, which exposes our crawler and makes it easy to block.
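To make that check concrete, here is a minimal sketch of reading the echoed UA out of an httpbin.org/get response body. The helper names reported_user_agent and exposes_scrapy are hypothetical, and the sample body (including the Scrapy version number) is illustrative rather than captured output. In a spider, the same helpers could be applied to response.text.

```python
import json


def reported_user_agent(body):
    """Return the User-Agent that httpbin.org/get echoed back in its JSON body."""
    return json.loads(body)["headers"]["User-Agent"]


def exposes_scrapy(body):
    """Return True if the echoed User-Agent still names Scrapy."""
    return "scrapy" in reported_user_agent(body).lower()


# Illustrative body shaped like an httpbin.org/get response (not captured output).
sample = json.dumps({
    "args": {},
    "headers": {
        "Host": "httpbin.org",
        "User-Agent": "Scrapy/2.11.0 (+https://scrapy.org)",
    },
    "url": "http://httpbin.org/get",
})

print(reported_user_agent(sample))
print(exposes_scrapy(sample))  # True: the default UA gives the crawler away
```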

Next, we add a UA.

▌ Setting the UA directly

The first method is to set the UA directly in the spider itself, generating it with fake_useragent as above, and then run the spider. You can print the UA that was actually sent to the site with the following expression:

    response.request.headers['User-Agent']

Each request gets a randomly generated UA. This method is fairly simple, but the UA has to be set on every single request, which is not very convenient. Since we are using Scrapy, which provides dedicated places to configure the UA, let's look at those next.
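The per-request approach can be sketched without Scrapy as follows. This is an illustration under assumptions: requests_with_random_ua and POOL are hypothetical names, and random.choice over POOL stands in for fake_useragent's ua.random so the snippet runs with the standard library alone.

```python
import random


def requests_with_random_ua(urls, pick_ua):
    """Yield (url, headers) pairs, calling pick_ua() afresh for every URL."""
    for url in urls:
        yield url, {"User-Agent": pick_ua()}


# Stand-in pool; with fake_useragent installed, pick_ua could be `lambda: ua.random`.
POOL = ["Mozilla/5.0 (example A)", "Mozilla/5.0 (example B)"]

urls = ["http://httpbin.org/get"] * 3
for url, headers in requests_with_random_ua(urls, lambda: random.choice(POOL)):
    # In a spider's start_requests this would become:
    #     yield scrapy.Request(url, headers=headers, callback=self.parse)
    print(url, headers["User-Agent"])
```

The sketch also shows the drawback: the headers have to be built and passed wherever a request is created.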

▌ Adding UAs manually

The second method is to add a list of UAs by hand in the settings.py file and pick one at random with random.choice. The drawback is that you have to collect the UAs yourself, and it adds more lines of code.
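A minimal sketch of this manual approach is shown below. The setting name USER_AGENT_LIST, the class name ManualRandomUserAgent, and the UA strings are all assumptions made for illustration; Scrapy does not define them. The from_crawler hook and settings.getlist are the parts that come from Scrapy itself.

```python
import random

# Hypothetical setting that would live in settings.py; the UA strings are examples.
USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


class ManualRandomUserAgent:
    """Downloader middleware that picks a UA from a hand-maintained list."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook when building the middleware; read the list from settings.
        return cls(crawler.settings.getlist("USER_AGENT_LIST"))

    def process_request(self, request, spider):
        # Overwrite the header with a randomly chosen entry from the list.
        request.headers["User-Agent"] = random.choice(self.user_agents)
```

Like any downloader middleware, the class would still need to be registered in DOWNLOADER_MIDDLEWARES before Scrapy uses it.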

▌ Setting the UA in middlewares.py

The third method is to use the fake-useragent package inside a downloader middleware: define a class with a process_request() method in middlewares.py by adding the following lines of code.

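    from fake_useragent import UserAgent

    class RandomUserAgent:
        def __init__(self):
            # Build the UA source once rather than on every request
            self.ua = UserAgent()

        def process_request(self, request, spider):
            # Overwrite the User-Agent header with a random one
            request.headers['User-Agent'] = self.ua.random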

Then go back to the settings.py file and enable the custom middleware. Note that the default UA middleware should be turned off first.

    DOWNLOADER_MIDDLEWARES = {
        # Turn off Scrapy's built-in UA middleware
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        # Enable the custom random-UA middleware
        'wandoujia.middlewares.RandomUserAgent': 543,
    }

Running the spider again shows that a random UA is now being sent.
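Why does mapping the built-in middleware to None turn it off? The sketch below is a simplified model, not Scrapy's actual implementation: the DOWNLOADER_MIDDLEWARES setting is merged over the built-in DOWNLOADER_MIDDLEWARES_BASE, entries whose value is None are dropped, and the rest are ordered by their number. The function name active_middlewares is hypothetical, BASE lists only two of the built-in entries, and the order values follow Scrapy's documented defaults, which may differ between versions.

```python
def active_middlewares(base, custom):
    """Model how a custom middleware dict overrides the built-in defaults."""
    merged = {**base, **custom}  # custom entries win over base entries
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)  # lower numbers come first


# Two entries excerpted from the built-in defaults.
BASE = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
}

CUSTOM = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "wandoujia.middlewares.RandomUserAgent": 543,
}

for path in active_middlewares(BASE, CUSTOM):
    print(path)
# wandoujia.middlewares.RandomUserAgent
# scrapy.downloadermiddlewares.retry.RetryMiddleware
```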

▌ Setting the UA with one line of code

The methods above are not especially convenient, and they take a fair amount of code. Is there a simpler way?

Yes. A package called scrapy-fake-useragent does it with a single line of configuration.

The package's PyPI page is https://pypi.org/project/scrapy-fake-useragent/. It is very easy to use: install it and enable it.

Run the following command to install it:

    pip install scrapy-fake-useragent

Then enable the random UA middleware in settings.py:

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,  # turn off the default method
        'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,  # turn on the random UA
    }

Printing the UA and the page response confirms that the random UA is applied successfully.

Those are the several ways to set a random UA in Scrapy. The simplest is the last one: install the scrapy-fake-useragent library, then add the following line to DOWNLOADER_MIDDLEWARES in settings.py:

    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400

Besides setting a random UA, another very important countermeasure against anti-crawling is setting a random IP.
