Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use Python to crawl 2022 Spring Festival movie information

2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/01 Report--

This article will explain in detail how to use Python to climb the 2022 Spring Festival movie information. The editor thinks it is very practical, so I share it with you as a reference. I hope you can get something after reading this article.

Experimental environment Python 3.x (object-oriented high-level language) Resquest 2.14.2 (python third-party library) Pandas 1.1.0 (python third-party library) Time (python standard library) Lxml (python third-party library) specific steps target website

Https://movie.douban.com/cinema/later/shenzhen/

Analysis website

Press F12 to open the browser console

Press the Ctrl+Shift+C shortcut key

Press the Ctrl+F shortcut key, and a search box appears in the console

Copy Xpath

Xpath is / / * [@ id= "showing-soon"] / div [1] / div/h4/a

Paste into the search box to verify Xpath

Check HTML for commonalities

Find that the target elements are all in a div box. Modify the Xpath.

Xpath is modified to / / * [@ id= "showing-soon"] / div/div/h4/a

Remaining target elements, and so on

Finally, save it as a CSV file with Pandas

# using pandas to save the file df = pd.DataFrame () df ['release date'] = Ondatedf ['film title'] = namedf ['type'] = movie_classdf ['production country'] = areadf ['number of people you want to see'] = numdf ['hyperlink'] = href

Code implementation #-*-coding: utf-8-* "Created on Tue Jan 25 10:07:11 2022@author: TFX"import timeimport requests # request library import pandas as pdfrom lxml import etree# extraction information base # date today = time.strftime ('% {y}% m {m}% d {d}', time.localtime ()) .format (yearly 'year', masked 'month' Url = 'https://movie.douban.com/cinema/later/shenzhen/'# request header headers = {' User-Agent': 'Mozilla/5.0 (Windows NT 10.0) Win64 X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'} # send request response = requests.get (url=url,headers=headers) # data parsing Xpath can use a browser to check elements to get html = etree.HTML (response.text) # Type Transformation # Movie detailed hyperlink href = html.xpath ('/ * [@ id= "showing-soon"] / div/div/h4/a/@href') # release date Ondate = html.xpath ('/ / * [@ id= "showing-soon"] / div/div/ul/li [1] / text ()') # title name = html. Xpath ('/ * [@ id= "showing-soon"] / div/div/h4/a/text ()') # Type movie_class = html.xpath ('/ / * [@ id= "showing-soon"] / div/div/ul/li [2] / text ()') # production country area = html.xpath ('/ * [@ id= "showing-soon"] / div/div/ul/li [3] / text ()') # Want to see the number of people num = html.xpath ('/ * [@ id= "showing-soon"] / div/div/ul/li [4] / span/text ()') # save the file df = pd.DataFrame () df ['release date'] = Ondatedf ['film title'] = namedf ['type'] = movie_classdf ['production country'] = areadf ['number of people want to see'] = numdf ['hyperlink'] = hrefdf. To_csv ('2022 Spring Festival film _' + today+'.csv' Mode='w',index=None,encoding='gbk') print ('save complete!') Output result

This is the end of the article on "how to use Python to climb the movie information of the 2022 Spring Festival". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, please share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report