How a Python crawler can use the browser's cookies: browsercookie

Updated 2025-01-16, from SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/03

This article explains how a Python crawler can reuse the cookies saved by your browser through the browsercookie module. It is written as a step-by-step reference for anyone interested.

Many Python users have written a web crawler at some point. Collecting data from the web automatically is satisfying, and Python makes it easy. However, crawlers often run into login and verification barriers, which is frustrating (from the website's side, being hit by crawlers every day is frustrating too). Crawling and anti-crawling is a cat-and-mouse game in which each side keeps escalating against the other.

Because HTTP is a stateless protocol, login state is maintained by passing cookies. Once you log in through the browser, the browser saves the cookie that carries the login information. The next time you open the site, the browser automatically sends the saved cookies, and as long as they have not expired, the site still treats you as logged in.
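The mechanism can be seen without a browser or a network connection. The following is a minimal sketch using Python 3's standard http.cookiejar module; the cookie names, values, and the example.com domain are made up for illustration. It puts one valid and one expired cookie into a jar and lets the jar decide which ones to attach to a request.

```python
import time
import urllib.request
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain, expires=None):
    """Build a Cookie by hand; a browser or server normally does this."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
# Hypothetical login cookie that stays valid for another hour.
jar.set_cookie(make_cookie("sessionid", "abc123", "example.com",
                           expires=int(time.time()) + 3600))
# Hypothetical cookie that expired an hour ago.
jar.set_cookie(make_cookie("old", "stale", "example.com",
                           expires=int(time.time()) - 3600))

# No network traffic: we only build the request and let the jar add headers.
request = urllib.request.Request("http://example.com/account")
jar.add_cookie_header(request)

cookie_header = request.get_header("Cookie")
print(cookie_header)
```

Only the unexpired cookie ends up in the Cookie header, which is the behaviour described above: the site sees a logged-in session only while the saved cookie is still valid.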

The browsercookie module is a tool for extracting those saved cookies from the browser. It loads your browser's cookies into a cookiejar object, so a crawler can easily download web content that requires a login.

Installation

pip install browsercookie

On Windows, the built-in sqlite module throws an error when loading the Firefox database. An updated version of sqlite needs to be installed:

pip install pysqlite
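The sqlite dependency exists because Firefox keeps its cookies in a SQLite database. As a rough illustration of what a loader has to do, here is a sketch in Python 3 that reads rows from a cookie table and turns them into a cookiejar. It is not browsercookie's actual implementation: the table and column names (moz_cookies, host, isSecure, expiry, and so on) are assumptions modeled on Firefox's cookie store, and the database is an in-memory stand-in filled with made-up data.

```python
import sqlite3
from http.cookiejar import Cookie, CookieJar


def jar_from_sqlite(conn):
    """Read rows from a cookie table and return them as a CookieJar."""
    jar = CookieJar()
    # Assumed schema, modeled on Firefox's cookie database.
    query = "SELECT host, path, isSecure, expiry, name, value FROM moz_cookies"
    for host, path, is_secure, expiry, name, value in conn.execute(query):
        jar.set_cookie(Cookie(
            version=0, name=name, value=value,
            port=None, port_specified=False,
            domain=host, domain_specified=host.startswith("."),
            domain_initial_dot=host.startswith("."),
            path=path, path_specified=True,
            secure=bool(is_secure), expires=expiry, discard=False,
            comment=None, comment_url=None, rest={},
        ))
    return jar


# Stand-in database with made-up data; a real profile has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moz_cookies "
    "(host TEXT, path TEXT, isSecure INTEGER, expiry INTEGER, "
    "name TEXT, value TEXT)"
)
conn.execute(
    "INSERT INTO moz_cookies VALUES (?, ?, ?, ?, ?, ?)",
    (".example.com", "/", 1, 4102444800, "sessionid", "abc123"),  # far-future expiry
)

jar = jar_from_sqlite(conn)
names = [cookie.name for cookie in jar]
print(names)
```

In practice browsercookie does this work for you, including locating the browser profile, so the sketch is only meant to show why a working sqlite module is required.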

Usage

Here is an example of extracting a title from a web page:

>>> import re
>>> get_title = lambda html: re.findall('<title>(.*?)</title>', html, flags=re.DOTALL)[0].strip()

Downloading the page without logging in gives the following title:

>>> import urllib2
>>> url = 'https://bitbucket.org/'
>>> public_html = urllib2.urlopen(url).read()
>>> get_title(public_html)
'Git and Mercurial code management for teams'

Next, use browsercookie to load the cookies from a Firefox that is already logged in to Bitbucket, and download the page again:

>>> import browsercookie
>>> cj = browsercookie.firefox()
>>> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
>>> login_html = opener.open(url).read()
>>> get_title(login_html)
'richardpenman / home-Bitbucket'

The code above is for Python 2. In Python 3 the equivalent calls are:

>>> import urllib.request
>>> public_html = urllib.request.urlopen(url).read()
>>> opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

You can see that your user name appears in the title, which shows that the browsercookie module successfully loaded the cookies from Firefox.
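One detail matters when running the Python 3 variant: read() returns bytes, so a title helper built on a str pattern needs the page decoded first. The following sketch puts the pieces together for Python 3. The function names are illustrative, the UTF-8 decoding is an assumption about the page, and browsercookie is imported inside the download function so that the title helper can be tried without the module installed.

```python
import re
import urllib.request


def get_title(html):
    """Return the text of the first <title> element, accepting bytes or str."""
    if isinstance(html, bytes):
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = html.decode("utf-8", errors="replace")
    matches = re.findall(r"<title>(.*?)</title>", html, flags=re.DOTALL)
    return matches[0].strip() if matches else None


def fetch_title_with_firefox_cookies(url):
    """Download url with the cookies saved by Firefox and return its title."""
    import browsercookie  # third-party module, only needed for real downloads

    cj = browsercookie.firefox()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    with opener.open(url) as response:
        return get_title(response.read())


# Offline check of the title helper with a made-up page.
sample = b"<html><head><title>\n  Example page \n</title></head></html>"
print(get_title(sample))
```

Calling fetch_title_with_firefox_cookies('https://bitbucket.org/') would then perform the logged-in download shown earlier, provided Firefox holds a valid Bitbucket session.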

Here is an example of using requests. This time we load cookies from Chrome. Of course, you need to log in to Bitbucket with Chrome in advance:

>>> import requests
>>> cj = browsercookie.chrome()
>>> r = requests.get(url, cookies=cj)
>>> get_title(r.content)
'richardpenman / home-Bitbucket'
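The jar loaded from a browser holds cookies for every site the browser has stored, not just the one being crawled. If you would rather hand requests only the cookies for one site, or just inspect what is in the jar, a small filter is enough. The sketch below is in Python 3; cookies_for_domain and make_cookie are hypothetical helpers, and the hand-built jar stands in for the one a browser would provide.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Build a minimal session cookie by hand, for demonstration only."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookies_for_domain(jar, domain):
    """Copy into a new jar the cookies set for `domain` or one of its subdomains."""
    filtered = CookieJar()
    for cookie in jar:
        host = cookie.domain.lstrip(".")
        if host == domain or host.endswith("." + domain):
            filtered.set_cookie(cookie)
    return filtered


# Stand-in for a jar loaded from a browser, filled with made-up cookies.
full_jar = CookieJar()
for name, cookie_domain in [("a", "bitbucket.org"), (".b", ".bitbucket.org"),
                            ("c", "example.com"), ("d", "notbitbucket.org")]:
    full_jar.set_cookie(make_cookie(name.lstrip("."), "x", cookie_domain))

kept = sorted(c.name for c in cookies_for_domain(full_jar, "bitbucket.org"))
print(kept)
```

The filtered jar can be passed to requests in the same way as the full one, for example requests.get(url, cookies=cookies_for_domain(cj, 'bitbucket.org')).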

If you don't know or care which browser has the cookies you need, you can do this:

>>> cj = browsercookie.load()
>>> r = requests.get(url, cookies=cj)
>>> get_title(r.content)
'richardpenman / home-Bitbucket'

Support

Currently, the module supports the following platforms:

Chrome: Linux, OSX, Windows

Firefox: Linux, OSX, Windows

That concludes this look at how a Python crawler can use the browser's cookies with browsercookie. I hope it is of some help.