
How to Use Python to Log In to Douban and Crawl Movie Reviews

Updated 2025-01-19. Source: SLTechnology News&Howtos (shulou), Internet Technology


This article explains how to use Python to log in to Douban and crawl movie reviews. Many people run into questions about this in day-to-day work, so I went through various materials and put together a simple, workable approach. I hope it answers your questions. Let's work through it together!

Getting started

1. Go to the target page and find the data you're looking for.

As soon as the page loaded, I reflexively opened the developer tools, and the request I needed was easy to spot.

Click the request for the page you want to crawl and look at its request and response headers first. The request method is GET and the response is the HTML of the page, which makes things easy: we can match the data with regular expressions. Regular expressions are a very good tool, so please do learn them. Then it's time to start writing code!
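As a minimal sketch of the regular-expression approach, the snippet below pulls review text out of an HTML string. The `<span class="short">` markup and the `extract_comments` helper are assumptions for illustration, not taken from the original code; check the real page structure in the developer tools before reusing the pattern.

```python
import re

# Assumed markup: each review sits in <span class="short">...</span>.
# re.S lets "." match newlines, so multi-line reviews are captured too.
COMMENT_RE = re.compile(r'<span class="short">(.*?)</span>', re.S)


def extract_comments(html):
    """Return the text of every matched review, with outer whitespace removed."""
    return [text.strip() for text in COMMENT_RE.findall(html)]


# Small offline sample standing in for a downloaded page.
sample_html = """
<div class="comment"><span class="short">Great film.</span></div>
<div class="comment"><span class="short">Too long,
but worth it.</span></div>
"""

comments = extract_comments(sample_html)
print(comments)
```

With a real page, `sample_html` would be replaced by the `text` of the response returned by `requests.get`.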

2. Get data with re+requests

Fetching the data

First, write the data to a .txt file. Open the file with the encoding set explicitly to UTF-8 to avoid encoding problems: the default encoding on a Chinese-locale Windows system is GBK, while the data is UTF-8.
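A short sketch of that file-writing step follows. The `save_comments` name and the sample reviews are hypothetical; the point is only the explicit `encoding="utf-8"` argument to `open`.

```python
import tempfile
from pathlib import Path


def save_comments(comments, path):
    """Append one review per line, always encoded as UTF-8."""
    # Without encoding="utf-8", open() falls back to the platform default,
    # which may not be able to represent every character in the reviews.
    with open(path, "a", encoding="utf-8") as f:
        for comment in comments:
            f.write(comment + "\n")


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "comments.txt"
    save_comments(["好看", "Great film."], out_path)
    saved = out_path.read_text(encoding="utf-8").splitlines()

print(saved)
```

Reading the file back with the same explicit encoding keeps the round trip independent of the operating system.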

Regular expressions and URLs

I ran it, and it fetched only two pages. Something was wrong, because this film has far more than two pages of comments.

Debugging showed that the request for the next page returned a "page does not exist" response, so my regular expression captured nothing and the crawl stopped after two pages. This looked like an anti-crawling measure. I went back to the page to see which request headers needed to be added, but even after copying every request header it still didn't work. This was a blind spot for me (embarrassed face), but I could always search Baidu. Someone there said that simulating a login solves it, so simulating a login it is!
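One way to make this failure visible is to stop, and say so, as soon as a page yields no matches. The sketch below assumes the comment pages are addressed by `start` and `limit` query parameters and takes the download function as an argument so it can run offline; `page_url`, `crawl`, the base URL, and the fake pages are all hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumed markup for a single review; adjust to the real page.
COMMENT_RE = re.compile(r'<span class="short">(.*?)</span>', re.S)


def page_url(base_url, page, page_size=20):
    """Build the URL of a zero-based page (query parameter names are assumed)."""
    query = urlencode({"start": page * page_size, "limit": page_size})
    return f"{base_url}?{query}"


def crawl(fetch, base_url, max_pages=50):
    """Call fetch(url) page by page; stop at the first page with no matches."""
    results = []
    for page in range(max_pages):
        html = fetch(page_url(base_url, page))
        found = COMMENT_RE.findall(html)
        if not found:
            # An empty page often means the site refused the request.
            print(f"page {page}: no matches, stopping")
            break
        results.extend(text.strip() for text in found)
    return results


# Offline stand-in for a real download: only two pages exist.
fake_pages = {0: '<span class="short">a</span>', 20: '<span class="short">b</span>'}


def fake_fetch(url):
    start = int(re.search(r"start=(\d+)", url).group(1))
    return fake_pages.get(start, "<html>page not found</html>")


demo = crawl(fake_fetch, "https://example.invalid/comments")
print(demo)
```

In a real run, `fetch` could be something like `lambda url: session.get(url, headers=headers).text`, with `session` and `headers` supplied by the caller.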

3. Simulating a Douban login

First, work out which parameters the login needs; they belong to the request sent to Douban's login URL. Open the login page and open the developer tools before you submit (otherwise you won't see the request you need afterward). Fill in your details, click log in, then select the login request and scroll down to the Form Data section. These are the parameters required for login.

Just copy them.

Then POST this information to the server to complete the login. But there is a catch: how do you keep the login state? That is what Session() is for. Note that you need to create only one session and reuse it, rather than creating a new one for every request. I made exactly this mistake at first and couldn't get it to work for a long time.

Then use the session to POST. Attention! Attention! Attention! The URL you POST to is the login URL, not the URL you want to crawl. This also trapped me for a long time when I was just starting out (why do I have so many problems?). Every other request must then go through the same session object (self.session) instead of a plain requests call.
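The two rules above (one shared session, and POST to the login URL) can be sketched as follows. This is not the author's original code: the `DoubanClient` class, the example URLs, and the form field names are hypothetical, and `FakeSession` is an offline stand-in that imitates only the `post`/`get` methods and cookie persistence of `requests.Session`.

```python
class DoubanClient:
    """Keeps one session for its whole lifetime so the login cookies survive."""

    def __init__(self, session, login_url):
        self.session = session      # created once, reused for every request
        self.login_url = login_url

    def login(self, form_data):
        # POST goes to the login URL, never to the page being crawled.
        return self.session.post(self.login_url, data=form_data)

    def get(self, url):
        # Every later request goes through the same session object.
        return self.session.get(url)


class FakeResponse:
    def __init__(self, text):
        self.text = text


class FakeSession:
    """Offline stand-in: remembers a cookie after post(), like a real session."""

    def __init__(self):
        self.cookies = {}
        self.calls = []

    def post(self, url, data=None):
        self.calls.append(("POST", url))
        self.cookies["sid"] = "demo"
        return FakeResponse("ok")

    def get(self, url):
        self.calls.append(("GET", url))
        return FakeResponse("reviews" if "sid" in self.cookies else "login required")


LOGIN_URL = "https://example.invalid/login"        # placeholder, not Douban's URL
COMMENTS_URL = "https://example.invalid/comments"  # placeholder

session = FakeSession()
client = DoubanClient(session, LOGIN_URL)
before = client.get(COMMENTS_URL).text
client.login({"name": "user", "password": "secret"})  # placeholder field names
after = client.get(COMMENTS_URL).text
print(before, "->", after)
```

For a real run, passing `requests.Session()` in place of `FakeSession()` should work without other changes, with the form fields copied from the Form Data panel.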

And finally, success. Only 500 comments come back, though, because Douban exposes only 500 comments and refuses to give a single one more.

4. Repeated logins require a CAPTCHA

Because I logged in and out so many times, Douban started asking me for a CAPTCHA. That still wasn't a problem: I analyzed the page again, found the CAPTCHA image, downloaded it, and typed in the answer myself. I'm not as capable as the experts who use artificial intelligence to solve it automatically.
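A rough sketch of that manual approach is shown below. It assumes the login page embeds the image as `<img id="captcha_image" src="...">` with `id` appearing before `src`; that markup, the helper names, and the sample URL are illustrative assumptions, so inspect the real login page first.

```python
import re

# Assumed markup: <img id="captcha_image" src="..."> (id before src).
CAPTCHA_IMG_RE = re.compile(r'<img[^>]*id="captcha_image"[^>]*src="([^"]+)"')


def find_captcha_url(login_html):
    """Return the CAPTCHA image URL, or None when the page shows no CAPTCHA."""
    match = CAPTCHA_IMG_RE.search(login_html)
    return match.group(1) if match else None


def solve_captcha_by_hand(session, login_html, image_path="captcha.jpg", ask=input):
    """Download the CAPTCHA image and ask the user to type what it shows."""
    url = find_captcha_url(login_html)
    if url is None:
        return None  # no CAPTCHA on this page, nothing to solve
    response = session.get(url)          # same session as the login itself
    with open(image_path, "wb") as f:
        f.write(response.content)        # raw image bytes
    return ask(f"Open {image_path} and type the characters: ").strip()


# Offline check of the parsing step only.
sample_login_html = '<img id="captcha_image" src="https://example.invalid/captcha.jpg" alt="captcha">'
captcha_url = find_captcha_url(sample_login_html)
print(captcha_url)
```

The typed answer would then be added to the login form data, under whatever field name the Form Data panel shows for it.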

The data can also be saved to a database. I won't post that code here, since it is almost the same as in the previous article.
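Since that code is not shown here, the following is only a generic sketch of storing reviews in a database. It uses SQLite from the standard library as a stand-in; the article does not say which database the author used, and the table name, column names, and `save_to_db` helper are assumptions.

```python
import sqlite3


def save_to_db(conn, comments):
    """Create the table if needed and insert one row per review."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    # Parameterized inserts avoid quoting problems inside review text.
    conn.executemany(
        "INSERT INTO comments (body) VALUES (?)",
        [(comment,) for comment in comments],
    )
    conn.commit()


# In-memory database for the demo; pass a file path to keep the data.
conn = sqlite3.connect(":memory:")
save_to_db(conn, ["Great film.", "好看"])
rows = conn.execute("SELECT body FROM comments ORDER BY id").fetchall()
print(rows)
conn.close()
```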

That concludes this walkthrough of how to use Python to log in to Douban and crawl movie reviews; I hope it has cleared up your questions. Theory works best when paired with practice, so go and try it! If you want to keep learning about related topics, keep following the site for more practical articles.
