Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use python crawler to crawl the information on Renren

2025-03-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

This article mainly explains "how to use python crawler to get the information on Renren". The content in the article is simple and clear, and it is easy to learn and understand. below, please follow the editor's train of thought to study and learn "how to use python crawler to crawl the information above Renren".

Requests provides a class called session to implement session persistence between the client and the server.

Usage

1. Instantiate a session object

two。 Have session send a get or post request

Session = requests.session () session.get (url,headers)

Next, let's use Renren for actual combat.

# coding=utf-8import requestssession = requests.session () # Login form urlpost_url = "http://www.renren.com/PLogin.do"post_data = {" email ":" your_email "," password ":" your_password "} headers = {" User-Agent ":" Mozilla/5.0 (Macintosh) Intel Mac OS X 10 / 13 / 2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36 "} # sends post requests using session Cookie is saved in it session.post (post_url, data=post_data, headers=headers) # address that can only be accessed after requesting login using session # this is the personal home page urlr = session.get ("http://www.renren.com/327550088/profile", headers=headers) # Save the page to the local with open (" renren1.html "," w ", encoding=" utf-8 ") as f: f.write (r.content.decode ('utf-8'))

It's as simple as that, simulate logging in to Renren and get the personal home page information page and save it locally.

In fact, the login status of the website is recorded through the information carried in the cookie. If we send the request with the login cookie, whether we can access the page that can only be accessed by login, of course we can.

Please look at the code

# coding=utf-8import requestsheaders = {"User-Agent": "Mozilla/5.0 (Macintosh Intel Mac OS X 10 / 13 / 2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36 "," Cookie ":" your login cookie "} r = requests.get (" http://www.renren.com/327550088/profile",headers=headers)# save page with open ("renren2.html", "w", encoding= "utf-8") as f: f.write (r.content.decode ())

As you can see, Cookie can be placed in headers. In fact, there is also a parameter in requests to pass cookie. This parameter is cookies.

Please look at the code

# usage of dictionary generator cookies = {i.split ("=") [0]: i.split ("=") [1] for i in cookies.split (" ")} print (cookies) r = requests.get (" http://www.renren.com/327550088/profile",headers=headers,cookies=cookies) thank you for reading, the above is the content of "how to use python crawler to crawl the information on Renren". After the study of this article, I believe you have a deeper understanding of how to use python crawler to crawl the information above Renren, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report