
How to crawl a blog with Scrapy in Python using browser cookies from browsercookie

2025-04-05 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains in detail how to combine Scrapy with the browsercookie library to crawl a blog using cookies taken from a browser. It is shared here as a reference, and we hope you come away with a working understanding of the topic.

browsercookie basics

The first step is to read browser cookies with browsercookie, which is installed with the command pip install browsercookie.

Next, read the cookies from Firefox. Chrome is not used here because, starting with version 80, Chrome changed how it encrypts cookies, so the browsercookie module fails with the following error:

win32crypt must be available to decrypt Chrome cookie on Windows

The code to get cookies is as follows:

import browsercookie

firefox_cookiejar = browsercookie.firefox()
for c in firefox_cookiejar:
    print(c)

Running the code prints one line per cookie, in the form <Cookie name=value for domain/path>.

Once the cookies are loaded, you can request pages that are only available after logging in (provided you have already logged in to the site in Firefox).

Here is an example using an admin center. After logging in through Firefox, load the cookies with browsercookie and you can request the back-end interface directly.

import browsercookie
import requests

firefox_cookiejar = browsercookie.firefox()
# for c in firefox_cookiejar:
#     print(c)
res = requests.get("https://img-home.csdnimg.cn/data_json/jsconfig/menu_path.json", cookies=firefox_cookiejar)
print(res.text)

The response contains the back-end menu data, with no separate login step in the script.
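To see what handing a cookie jar to a request amounts to, the sketch below uses only the standard library. It assumes that browsercookie.firefox() returns an ordinary http.cookiejar.CookieJar, puts one made-up cookie into such a jar, and lets the jar fill in the Cookie header on request objects that are never sent. The make_cookie helper and the cookie name and value are invented for illustration; real cookies would come from the browser.

```python
from http.cookiejar import Cookie, CookieJar
from urllib.request import Request


def make_cookie(name, value, domain, path="/"):
    """Build a simple session cookie (hypothetical helper, for illustration)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
jar.set_cookie(make_cookie("session", "abc", ".csdn.net"))  # made-up values

# Build requests without sending them; the jar only fills in the header.
matching = Request("https://blog.csdn.net/")
jar.add_cookie_header(matching)

other = Request("https://example.com/")
jar.add_cookie_header(other)

print(matching.get_header("Cookie"))  # session=abc
print(other.get_header("Cookie"))     # None
```

The cookie is attached only to the request whose host falls under the cookie's domain, which is why cookies from a logged-in Firefox session unlock pages on that site and nowhere else.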

Automatic likes using browsercookie

Scrapy already ships a built-in CookiesMiddleware for handling cookies. Here we subclass CookiesMiddleware and use the browsercookie library to build an automatic liker (only a single test case was run, with no concurrency).

Open the middlewares.py file and write the custom class:

from scrapy.downloadermiddlewares.cookies import CookiesMiddleware
import browsercookie


class BrowserCookiesDownloaderMiddleware(CookiesMiddleware):
    def __init__(self, debug=False):
        super().__init__(debug)
        self.load_browser_cookies()

    def load_browser_cookies(self):
        # The jar is stored under the key 'firefox'
        jar = self.jars['firefox']
        firefox_cookiejar = browsercookie.firefox()
        for cookie in firefox_cookiejar:
            jar.set_cookie(cookie)

The core of the class above is to use browsercookie to extract the browser's cookies and store them in jars, the middleware's dictionary of CookieJar objects, so that they are sent with subsequent requests.
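The jars dictionary can be pictured as a mapping that creates an empty jar the first time a key is looked up. The sketch below imitates that idea with a defaultdict of the standard library's http.cookiejar.CookieJar; it is a simplified stand-in under that assumption, not Scrapy's own code. The make_cookie helper, the load_browser_cookies function signature, and the cookie values are hypothetical.

```python
from collections import defaultdict
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain, path="/"):
    """Build a simple session cookie (hypothetical helper, for illustration)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def load_browser_cookies(jars, key, source):
    """Copy every cookie from source into the jar stored under key."""
    jar = jars[key]  # created on first access by the defaultdict
    for cookie in source:
        jar.set_cookie(cookie)
    return jar


# Stand-in for the jar that browsercookie.firefox() would return.
browser_jar = CookieJar()
browser_jar.set_cookie(make_cookie("session", "abc", ".csdn.net"))
browser_jar.set_cookie(make_cookie("theme", "dark", ".csdn.net"))

# One jar per key; looking up a new key yields a fresh, empty jar.
jars = defaultdict(CookieJar)
load_browser_cookies(jars, "firefox", browser_jar)

print(sorted(c.name for c in jars["firefox"]))  # ['session', 'theme']
print(len(jars["other"]))                       # 0
```

Keeping one jar per key is what lets a request pick its cookies by name later: a request that asks for the 'firefox' jar gets the browser cookies, while any other key starts empty.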

At the same time, disable the default CookiesMiddleware in the settings.py file and enable the custom class.

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': None,
    'csdn.middlewares.BrowserCookiesDownloaderMiddleware': 543,
}

When writing the spider, the key points are to send the request as a POST rather than a plain Request, and to pass meta={'cookiejar': COOKIEJAR} along with the form parameters, where COOKIEJAR is the key of the jar to use ('firefox' in this example).

The code is as follows:

import scrapy


class ClikeSpider(scrapy.Spider):
    name = 'clike'
    allowed_domains = ['csdn.net']
    like_url = 'https://blog.csdn.net/phoenix/web/v1/article/like'

    def start_requests(self):
        data = {
            "articleId": "120845464",
        }
        yield scrapy.FormRequest(url=self.like_url, formdata=data, meta={'cookiejar': 'firefox'})

    def parse(self, response):
        print(response.json())
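Assuming FormRequest behaves as its documentation describes (a POST with URL-encoded fields when formdata is given and no method is set), the request body for the spider above can be previewed with the standard library alone. This sketch does not involve Scrapy; it only shows the encoding of the same data dictionary.

```python
from urllib.parse import parse_qs, urlencode

data = {"articleId": "120845464"}

# URL-encode the form fields the way an HTML form submission would.
body = urlencode(data)
print(body)            # articleId=120845464

# Decoding the body recovers the fields (values come back as lists).
print(parse_qs(body))  # {'articleId': ['120845464']}
```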

After running the spider, the log shows that the like succeeded.

That covers how to use browsercookie with Scrapy to crawl a blog with browser cookies. We hope it is of some help. If you found the article useful, feel free to share it so that more people can see it.

© 2024 shulou.com SLNews company. All rights reserved.

12
Report