How to crawl popular movies with Python

Updated: 2025-04-04. Source: SLTechnology News&Howtos (shulou), Development

Shulou (Shulou.com), 06/01

This article explains how to crawl popular movies with Python. The method is simple, fast, and practical, so readers who are interested may want to follow along.

The code crawls the movie titles and their address paths. The steps are:

1. Fetch the content of the URL.
2. Use CSS selectors to pick out the elements.
3. Save the data you need.

Import the packages and write the crawler:

    import requests
    from bs4 import BeautifulSoup  # requests fetches the page, BeautifulSoup parses it
    import time  # used to pause, so the IP is not blocked for making too many requests
    import re  # imported in the original; not used below

    # Browser-like request header. Some sites refuse requests that identify as a Python script.
    HEAD = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                          "(KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36 Edg/87.0.664.47"}


    class Position():
        def __init__(self, position_name, position_require):
            # Build the object properties
            self.position_name = position_name
            self.position_require = position_require

        def __str__(self):
            # Overridden so the object prints as a string
            return '%s %s' % (self.position_name, self.position_require)


    class Aiqiyi():
        def iqiyi(self, url):
            html = requests.get(url, headers=HEAD)  # headers=HEAD makes the script look like a browser
            soup = BeautifulSoup(html.content, 'lxml', from_encoding='utf-8')  # parse the page
            # First filter: CSS selector for the list container
            # (if an element is unique, select it by id rather than by class)
            soupl = soup.select(".qy-list-wrap")
            results = []  # list that stores the data
            for e in soupl:
                biao = e.select('.qy-mod-li')  # second filter: one entry per movie
                for h in biao:
                    # Third filter: keep only the two text fields that are needed
                    p = Position(h.select_one('.qy-mod-link-wrap').get_text(strip=True),
                                 h.select_one('.title-wrap').get_text(strip=True))
                    results.append(p)
            return results

        def address(self, url):
            # Collect the link addresses
            html = requests.get(url, headers=HEAD)
            soup = BeautifulSoup(html.content, 'lxml', from_encoding='utf-8')
            # Find every <a> tag under the div block
            alist = soup.find('div', class_='qy-list-wrap').find_all("a")
            ls = []
            for i in alist:
                ls.append(i.get('href'))
            return ls


    if __name__ == '__main__':
        time.sleep(2)  # wait 2 seconds before making requests
        a = Aiqiyi()
        url = "https://list.*.com/www/1/-11-1-1-iqiyi--.html"
        # e:/exercise.txt is a file on the author's computer.
        # Mode a+ appends content without overwriting what is already there.
        with open(file='e:/exercise.txt', mode='a+', encoding='utf-8') as f:
            for item in a.iqiyi(url):
                line = f'{item.position_name}\t{item.position_require}\n'
                f.write(line)
            print("download complete")
        with open(file='e:/address.txt', mode='a+', encoding='utf-8') as f:
            for item in a.address(url):
                # Assumes the href values are protocol-relative ("//host/path")
                line = f'https:{item}\n'
                f.write(line)
            print("download complete")

You should now have a better understanding of how to crawl popular movies with Python. Try it out in practice.
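
The address step builds each link by putting a scheme in front of the raw href value. A slightly more defensive variant is sketched below, under stated assumptions: the function name normalize_links, the BASE_URL value, and the sample hrefs are all hypothetical and not taken from the article. It uses only urllib.parse.urljoin from the standard library, skips tags that have no href, and drops duplicate links.

```python
from urllib.parse import urljoin

# Hypothetical base URL used only for illustration; substitute the page you crawled.
BASE_URL = "https://www.example.com/list/index.html"


def normalize_links(hrefs, base_url=BASE_URL):
    """Turn raw href values into absolute URLs, dropping blanks and duplicates."""
    seen = set()
    absolute = []
    for href in hrefs:
        if not href:  # tag.get('href') returns None when an <a> tag has no href
            continue
        href = href.strip()
        if not href or href.startswith(("javascript:", "#")):
            continue  # skip empty values, script pseudo-links, and in-page anchors
        full = urljoin(base_url, href)  # resolves "//host/path", "/path", and "path"
        if full not in seen:
            seen.add(full)
            absolute.append(full)
    return absolute


# Made-up sample values standing in for the list returned by address()
sample = ["//www.example.com/v_1.html", "/v_2.html", None, "//www.example.com/v_1.html"]
links = normalize_links(sample)
print("\n".join(links))
```

The returned list could then be written to the address file one link per line, in place of the f-string that prepends the scheme by hand.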
