2025-01-16 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report--
This article explains how to crawl Douban movie listings with Python. The method is simple, fast, and practical; interested readers may wish to follow along.
Main goal
Environment: macOS + Python 3.6; IDE: PyCharm. The modules used are listed below.
```python
import requests
import re
import json
from requests.exceptions import RequestException  # used by get_one_page below
```
If you have Anaconda installed, the requests module is already present on your system, but PyCharm may not recognize it. In that case, install it directly from PyCharm: open Preferences, go to the project interpreter settings, click +, and install requests.
Crawl analysis
To fetch each page we use the requests library. requests is an HTTP library written in Python on top of urllib and released under the Apache2 Licensed open-source protocol. It is more convenient than urllib and saves a great deal of work; in short, requests is the simplest HTTP library in Python, and it is the recommended choice for crawlers. It is not part of the standard library, so it must be installed separately, either with pip or through PyCharm (as described above).
We call the get method of the requests library directly on the target URL. To guard against a failed request, the call is wrapped in a try/except block that catches the exception. In addition, if you crawl the same URL many times, your IP may be blocked and you will no longer be able to fetch anything; to mitigate this, proxy information can be configured in the request.
```python
def get_one_page(url):
    """Grab the content of one page.

    :return: the HTML of the requested page, or None on failure.
    """
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None
```
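The article's actual proxy configuration is not reproduced here. As a hedged sketch, requests accepts a `proxies` dictionary on each call; the proxy address and the `get_one_page_with_proxy` helper below are placeholders for illustration, not the author's code.

```python
import requests
from requests.exceptions import RequestException

# Hypothetical proxy endpoint -- replace with a proxy you actually control.
PROXIES = {
    "http": "http://127.0.0.1:8888",
    "https": "http://127.0.0.1:8888",
}

HEADERS = {"User-Agent": "Mozilla/5.0"}


def get_one_page_with_proxy(url, proxies=None):
    """Like get_one_page, but optionally routes the request through a proxy."""
    try:
        response = requests.get(url, headers=HEADERS, proxies=proxies, timeout=10)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        # Any request failure (bad URL, refused proxy, timeout) yields None.
        return None
```

Rotating through several proxies between requests further reduces the chance of an IP ban.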
Page parsing
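The article does not reproduce its parsing code here. As a minimal sketch, parsing with the re module might look like the following; the sample HTML and the regular expression are simplified illustrations modeled loosely on a ranked movie list, not Douban's real markup.

```python
import re
import json

# Simplified snippet standing in for one list item of a ranked movie page.
SAMPLE_HTML = '''
<li>
<em class="">1</em>
<span class="title">The Shawshank Redemption</span>
<span class="rating_num">9.7</span>
</li>
'''

# Hypothetical pattern: capture rank, title, and rating from the sample markup.
PATTERN = re.compile(
    r'<em class="">(\d+)</em>.*?'
    r'<span class="title">(.*?)</span>.*?'
    r'<span class="rating_num">(.*?)</span>',
    re.S,  # let .*? match across newlines
)


def parse_one_page(html):
    """Yield one dict per movie found in the HTML."""
    for rank, title, rating in PATTERN.findall(html):
        yield {"rank": rank, "title": title, "rating": rating}


movies = list(parse_one_page(SAMPLE_HTML))
print(json.dumps(movies, ensure_ascii=False))
```

Each yielded dict can be passed straight to the write function shown below.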
After parsing, we write each record to a CSV file with a write function, as shown in the following code.
```python
def write_to_file(content):
    """Append one result to the CSV file.

    :param content: a dict describing one movie
    :return: None
    """
    with open('douban_movie_250.csv', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')
```

At this point, you should have a better understanding of how to crawl Douban movies with Python. You might as well try it in practice. More related content is available in the relevant channels; follow us and keep learning!