How to crawl Douban movies with Python

Updated: 2025-01-16. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/02

This article explains how to crawl Douban movie data with Python. The method introduced here is simple, fast and practical, so let's walk through it.

Main goal

Environment: macOS + Python 3.6; IDE: PyCharm. The modules used are as follows.

    import requests
    import re
    import json

If Anaconda is installed on your system, the requests module is already installed there, but PyCharm may not recognize it. In that case, open PyCharm's Preferences, go to the interpreter's package list, click +, and install requests directly.
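When PyCharm does not see a package that Anaconda has installed, the usual cause is that the project runs a different interpreter. The following is a minimal sketch for checking this from inside the project; the helper name `module_available` is made up for illustration and is not part of the original article.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the interpreter running this script can import `name`."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which Python executable the IDE is actually using.
    print("Interpreter:", sys.executable)
    for name in ("requests", "re", "json"):
        print(name, "available:", module_available(name))
```

If the interpreter path printed here is not the Anaconda one, either point the project at the Anaconda interpreter or install requests into the interpreter that is shown.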

Crawl analysis

To fetch each page we use the requests library. Requests is an HTTP library written in Python on top of urllib (more precisely, urllib3) and released under the Apache 2.0 open source license. It is more convenient than urllib and saves a lot of work. In short, requests is arguably the easiest HTTP library to use from Python, and it is the recommended choice for crawlers. Note that requests is not part of a default Python installation: it has to be installed separately, either with pip or through PyCharm (as described above).
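Because every page is fetched separately, it helps to build the list of page URLs first. The sketch below assumes the target is the Douban Top 250 list, that it is served 25 films per page, and that the offset is passed in a `start` query parameter; the base URL, the page size and the helper name `page_urls` are assumptions that do not appear in the original article.

```python
# Assumptions (not stated in the original article): the list lives at this
# URL, holds 250 films, and is paged 25 at a time via the `start` parameter.
BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25
TOTAL = 250


def page_urls(base_url=BASE_URL, total=TOTAL, page_size=PAGE_SIZE):
    """Return the URL of every list page, in order."""
    return [
        f"{base_url}?start={offset}&filter="
        for offset in range(0, total, page_size)
    ]


if __name__ == "__main__":
    for url in page_urls():
        print(url)
```

Each URL produced this way can then be handed to the page-fetching function one at a time.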

We call the get method of requests directly on the target URL. Because a request can fail, the call is wrapped in a try/except block that catches the exception. In addition, if you crawl the same URL many times, your IP may be blocked and you will no longer be able to crawl anything. To work around this, the full version of the code also sets up proxy information. That part is not reproduced here; the author distributes the complete code through the WeChat public account "artificial intelligence and big data life" (data_circle), in reply to the message "Douban movie".

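    import requests
    from requests.exceptions import RequestException

    # Assumption: the article does not show how `headers` is defined;
    # a browser-like User-Agent is a common choice.
    headers = {'User-Agent': 'Mozilla/5.0'}


    def get_one_page(url):
        """Fetch the content of one page.

        :param url: URL of the page to request
        :return: the HTML text of the page, or None if the request fails
        """
        try:
            response = requests.get(url, headers=headers)
            if response.status_code == 200:
                return response.text
            return None
        except RequestException:
            return None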
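The proxy setup mentioned above is not shown in the article. As a rough sketch of how it could look, requests accepts a `proxies` mapping on each call. The proxy address, the `HEADERS` value and the function name `get_one_page_via_proxy` below are placeholders chosen for illustration, not the author's actual configuration.

```python
import requests
from requests.exceptions import RequestException

# Hypothetical proxy address: replace it with a proxy you are allowed to use.
PROXIES = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
}
# Assumed browser-like header; the article does not show its own value.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def get_one_page_via_proxy(url, proxies=PROXIES, timeout=10):
    """Request `url` through `proxies`; return the HTML text or None."""
    try:
        response = requests.get(
            url, headers=HEADERS, proxies=proxies, timeout=timeout
        )
    except RequestException:
        # Covers connection errors, proxy errors and timeouts.
        return None
    if response.status_code == 200:
        return response.text
    return None
```

The `timeout` argument is an addition of this sketch: without it, a dead proxy can keep the crawler waiting indefinitely.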

Page parsing
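The parsing code itself is not included in the article. Since `re` is among the imported modules, a regular-expression approach is one plausible reading. The sketch below runs against a small made-up HTML snippet; the markup, the pattern and the function name `parse_one_page` are illustrative assumptions and would need adjusting to the real page.

```python
import re

# Made-up, simplified markup for illustration only; the real page differs.
SAMPLE_HTML = """
<li>
  <em class="">1</em>
  <span class="title">Example Film A</span>
  <span class="rating_num">9.0</span>
</li>
<li>
  <em class="">2</em>
  <span class="title">Example Film B</span>
  <span class="rating_num">8.5</span>
</li>
"""

# re.S lets `.` match newlines, so one pattern can span a whole list item.
ITEM_PATTERN = re.compile(
    r'<em class="">(\d+)</em>.*?'
    r'<span class="title">(.*?)</span>.*?'
    r'<span class="rating_num">(.*?)</span>',
    re.S,
)


def parse_one_page(html):
    """Yield one dict per film found in `html`."""
    for rank, title, score in ITEM_PATTERN.findall(html):
        yield {"rank": int(rank), "title": title, "score": float(score)}


if __name__ == "__main__":
    for item in parse_one_page(SAMPLE_HTML):
        print(item)
```

Yielding plain dicts keeps the parser independent of how the records are stored afterwards.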

After parsing, each record is written to the output file with a write function, as shown in the following code.

    import json


    def write_to_file(content):
        """Append one record to the output file.

        :param content: a parsed record, for example a dict
        :return: None
        """
        # Each record is written as one JSON object per line, even though
        # the file name ends in .csv.
        with open('douban_movie_250.csv', 'a', encoding='utf-8') as f:
            f.write(json.dumps(content, ensure_ascii=False) + '\n')

By now you should have a better understanding of how to crawl Douban movies with Python. The best way to consolidate it is to try it out in practice.
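The write_to_file function above stores one JSON object per line, so the result is not comma-separated despite the .csv extension. If a real CSV file is wanted, the standard-library `csv` module is one alternative. This is a swapped-in technique, not the article's method, and the function name `write_rows_to_csv` and the column names are assumptions.

```python
import csv
import os


def write_rows_to_csv(rows, path="douban_movie_250.csv",
                      fieldnames=("rank", "title", "score")):
    """Append dict rows to `path`, writing the header only for a new file."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if need_header:
            writer.writeheader()
        writer.writerows(rows)
```

The csv module quotes fields that contain commas, which matters for film titles that include them.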
