2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to use a Python web crawler to see which movies have been showing in cinemas recently. The walkthrough is short and easy to follow: it builds a small spider class step by step.
Project implementation
1. Define a class that inherits from object, with an __init__ method and a main method that both take self. Import the required libraries and set the URL template, as shown below.
    import requests
    from lxml import etree
    import time
    import random


    class MaoyanSpider(object):
        def __init__(self):
            # The offset placeholder is filled in for each page of the listing.
            self.url = "https://maoyan.com/films?showType=2&offset={}"

        def main(self):
            pass


    if __name__ == '__main__':
        spider = MaoyanSpider()
        spider.main()
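2. Generate a random User-Agent.
    for i in range(1, 50):
        # Read ua.random here, inside the loop, so that each request gets a
        # newly chosen User-Agent. Note: ua is not defined in this excerpt; it
        # is presumably a UserAgent() object from the fake_useragent package.
        self.headers = {
            'User-Agent': ua.random,
        }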
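The loop above reads `ua.random`, but the article does not show where `ua` is created. As a dependency-free sketch of the same idea, the example below picks a User-Agent with `random.choice` from a small hand-written pool. The pool contents and the helper name `build_headers` are assumptions made for illustration; they are not part of the original project.

```python
import random

# Hypothetical pool of User-Agent strings (illustrative values, not from the article).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_headers():
    """Return request headers with a User-Agent chosen at random on every call."""
    return {"User-Agent": random.choice(USER_AGENTS)}


if __name__ == "__main__":
    # Each call may yield a different User-Agent.
    for _ in range(3):
        print(build_headers())
```

Calling `build_headers()` right before each request keeps the choice per request rather than per program run.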
3. Send a request to get the page response.
    def get_page(self, url):
        # The random User-Agent must be chosen at this point so that a new
        # one is selected for each request.
        res = requests.get(url, headers=self.headers)
        res.encoding = 'utf-8'
        html = res.text
        # Hand the HTML straight to the parser.
        self.parse_page(html)
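One possible hardening of this step, shown only as a sketch: add a timeout and raise on HTTP error codes, so that a failed or blocked request does not hand an error page to the parser. The function name `fetch_html`, the timeout value, and the `get` parameter (which exists only so the function can be exercised without network access) are assumptions, not part of the original code.

```python
import requests


def fetch_html(url, headers, get=requests.get, timeout=10):
    """Fetch url and return its body as text, raising on HTTP errors.

    `get` defaults to requests.get; a stand-in can be passed for offline testing.
    """
    res = get(url, headers=headers, timeout=timeout)
    res.raise_for_status()   # raises requests.HTTPError on 4xx/5xx responses
    res.encoding = 'utf-8'   # decode the body as UTF-8, as in the article
    return res.text
```

A call would look like `fetch_html(url, self.headers)`, with the returned text passed on to the parsing step.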
4. Use XPath to parse the first-level page and extract the movie information.
1) Get the list of base XPath node objects.
    # Create the parsed object.
    parse_html = etree.HTML(html)
    # Base XPath: one <dd> node per movie.
    dd_list = parse_html.xpath('//dl[@class="movie-list"]//dd')
2) Loop over the node objects and extract the data from each one.
    for dd in dd_list:
        # Movie name
        name = dd.xpath('.//div[@class="movie-hover-title"]//span[@class="name noscore"]/text()')[0].strip()
        # Starring actors (third title block)
        star = dd.xpath('.//div[@class="movie-hover-info"]/div[@class="movie-hover-title"][3]/text()')[1].strip()
        # Movie type (second title block)
        type = dd.xpath('.//div[@class="movie-hover-info"]//div[@class="movie-hover-title"][2]/text()')[1].strip()
        # Relative link to the detail page
        dowld = dd.xpath('.//div[@class="movie-item-hover"]/a/@href')[0].strip()
        # print(movie_dict)
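To see how these XPath expressions behave, the sketch below runs them against a small hand-written HTML fragment. The fragment is an assumption that mimics the structure the expressions imply; it is not copied from maoyan.com, whose markup may differ or change. `parse_movies` and `joined_text` are hypothetical helper names. Instead of indexing a specific text node with `[1]`, the sketch joins all matched text nodes and strips the result, which avoids an IndexError when the whitespace nodes differ.

```python
from lxml import etree

# Hand-written sample markup (assumed structure, for illustration only).
SAMPLE_HTML = """
<dl class="movie-list">
  <dd>
    <div class="movie-item-hover">
      <a href="/films/1">
        <div class="movie-hover-info">
          <div class="movie-hover-title">
            <span class="name noscore">Sample Movie</span>
          </div>
          <div class="movie-hover-title">
            <span class="hover-tag">Type:</span>
            Drama
          </div>
          <div class="movie-hover-title">
            <span class="hover-tag">Starring:</span>
            Actor A / Actor B
          </div>
        </div>
      </a>
    </div>
  </dd>
</dl>
"""


def joined_text(node, path):
    """Join the text nodes matched by path and strip whitespace ('' if none)."""
    return "".join(node.xpath(path)).strip()


def parse_movies(html):
    """Return one dict per <dd> entry found in the movie list."""
    tree = etree.HTML(html)
    info = './/div[@class="movie-hover-info"]'
    movies = []
    for dd in tree.xpath('//dl[@class="movie-list"]//dd'):
        links = dd.xpath('.//div[@class="movie-item-hover"]/a/@href')
        movies.append({
            "name": joined_text(dd, './/span[@class="name noscore"]/text()'),
            "type": joined_text(dd, info + '//div[@class="movie-hover-title"][2]/text()'),
            "star": joined_text(dd, info + '/div[@class="movie-hover-title"][3]/text()'),
            "link": links[0].strip() if links else "",
        })
    return movies


if __name__ == "__main__":
    for movie in parse_movies(SAMPLE_HTML):
        print(movie)
```

Returning dictionaries rather than printing inside the loop keeps parsing separate from output, so the same data can be printed or saved later.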
5. Build the movie string and print it.
    movie = '''[Coming soon]
    Movie name: %s
    Starring: %s
    Type: %s
    Details link: https://maoyan.com%s
    ========================================
    ''' % (name, star, type, dowld)
    print(movie)
6. Use random.randint() to set a random delay between requests.
    # Wait 1 to 3 seconds before the next request.
    time.sleep(random.randint(1, 3))
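random.randint(1, 3) yields only whole seconds. A small variation, sketched below, uses random.uniform to draw a fractional delay and returns the value so that it can be logged. The helper name `polite_sleep` and its default bounds are assumptions chosen to match the 1 to 3 second range above.

```python
import random
import time


def polite_sleep(low=1.0, high=3.0):
    """Sleep for a random duration between low and high seconds and return it."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay
```

Calling `polite_sleep()` after each page request spaces the requests out irregularly.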
7. Call the method to run the crawler.
    # get_page() already passes the HTML to parse_page(), so one call is enough.
    self.get_page(url)
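The URL template from step 1 contains an `offset={}` placeholder, and the program asks for a start page and an end page when it runs. One way the two could be connected is sketched below. The page size of 30 entries is an assumption about the site's listing, and the names `PAGE_SIZE` and `build_page_urls` are hypothetical.

```python
URL_TEMPLATE = "https://maoyan.com/films?showType=2&offset={}"
PAGE_SIZE = 30  # Assumption: entries per listing page; adjust if the site differs.


def build_page_urls(start_page, end_page, page_size=PAGE_SIZE):
    """Return the listing URLs for pages start_page..end_page (1-based, inclusive)."""
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    return [
        URL_TEMPLATE.format((page - 1) * page_size)
        for page in range(start_page, end_page + 1)
    ]


if __name__ == "__main__":
    for url in build_page_urls(1, 3):
        print(url)
```

Each URL would then be passed to get_page, with the delay from step 6 between requests.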
Results
1. Click the small green triangle (Run) to start the program, then enter the start page and the end page.
2. After running the program, the results are displayed in the console, as shown in the following figure.
3. Click the blue details link in the output to open the movie's detail page in a browser.
Thank you for reading. That concludes "how to use Python web crawler to see which movies have been shown in cinemas recently". After working through this article you should have a better understanding of the approach; the specifics still need to be verified in practice.