
How to use Python to crawl movies

2025-03-31 Update From: SLTechnology News&Howtos


This article mainly explains "how to use Python to crawl a movie". The content is simple, clear, and easy to learn and understand. Please follow the editor's train of thought to study "how to crawl a movie with Python".

First, I crawled all of the movie's on-screen comments with Python. The crawler is fairly simple, so I won't go into details and will go straight to the code:

import requests
import pandas as pd

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"}
url = 'https://mfm.video.qq.com/danmu?otype=json&target_id=6480348612%26vid%3Dh0035b23dyt'
# The parameters that control which on-screen comments are returned are target_id and timestamp;
# each request returns one packet covering 30 seconds of the film.

comids = []
comments = []
opernames = []
upcounts = []
timepoints = []
n = 15
while True:
    data = {"timestamp": n}
    response = requests.get(url, headers=headers, params=data, verify=False)
    res = eval(response.text)  # convert the response string into a dict
    con = res["comments"]
    if res['count'] != 0:  # when the packet is empty, the crawl has reached the end of the film
        for j in con:
            comids.append(j['commentid'])
            opernames.append(j["opername"])
            comments.append(j["content"])
            upcounts.append(j["upcount"])
            timepoints.append(j["timepoint"])
    else:
        break
    n += 30  # move on to the next 30-second packet

data = pd.DataFrame({'id': comids, 'name': opernames, 'comment': comments, 'up': upcounts, 'pon': timepoints})
data.to_excel('Wealth Diary on-screen comment.xlsx')
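Before running the full loop, it can help to fetch a single packet and check its structure. The snippet below is only an illustrative sketch, not part of the article's code: it assumes the endpoint still returns the JSON fields the loop above reads ('count' plus a 'comments' list with commentid, opername, content, upcount and timepoint), and it uses response.json() rather than eval to parse the body.

import requests

headers = {"User-Agent": "Mozilla/5.0"}
url = 'https://mfm.video.qq.com/danmu?otype=json&target_id=6480348612%26vid%3Dh0035b23dyt'
# Fetch one 30-second packet starting at timestamp 15 and peek at its contents.
resp = requests.get(url, headers=headers, params={"timestamp": 15}, verify=False)
packet = resp.json()                      # parse the JSON body (safer than eval)
print(packet['count'])                    # number of comments in this packet
for c in packet['comments'][:5]:          # look at the first few comments
    print(c['timepoint'], c['opername'], c['content'], c['upcount'])

If the printed fields look right, the full while-loop above can be run to collect every packet.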

Next, read the on-screen comment data back in with pandas:

import pandas as pd

data = pd.read_excel('Wealth Diary on-screen comment.xlsx')
data

There are nearly 40,000 on-screen comments across 5 columns: "comment id", "nickname", "content", "number of likes" and "on-screen comment position".
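A quick check that the frame matches that description can be done with the pandas calls already used above; this is just a sanity-check sketch, not part of the original article.

# data is the DataFrame read back from the Excel file above
print(data.shape)              # row count should be close to 40,000
print(data.columns.tolist())   # expect id, name, comment, up, pon (to_excel may also have saved an index column)
data.head()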

Now segment the movie into 6-minute intervals to see how the number of on-screen comments changes in each time period.
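As an illustrative alternative (not the article's code, which follows right after), the same bucketing can be expressed with pd.cut. This sketch assumes that 'pon' holds the comment position in seconds, as produced by the crawler, and that the film runs roughly 138 minutes (8280 seconds), matching the range used in the article's own code below.

import pandas as pd

# data is the DataFrame loaded above; 'pon' is the on-screen comment position in seconds
bins = list(range(0, 8280 + 360, 360))                       # 6-minute (360-second) bins
labels = ['{}-{}'.format(b // 60, (b + 360) // 60) for b in bins[:-1]]
per_period = pd.cut(data['pon'], bins=bins, labels=labels, right=False).value_counts().sort_index()
print(per_period)                                            # comments per 6-minute period

The article's own implementation, which builds an explicit list of period labels and filters the DataFrame for each one, is as follows: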

time_list = ['{}'.format(int(i / 60)) for i in list(range(0, 8280, 360))]   # period boundaries in minutes
pero_list = []
for i in range(len(time_list) - 1):
    pero_list.append('{0}-{1}'.format(time_list[i], time_list[i + 1]))
counts = []
for i in pero_list:
    # number of on-screen comments whose position falls inside this 6-minute period
    counts.append(data[(data.pon >= int(i.split('-')[0]) * 60) &
                       (data.pon < int(i.split('-')[1]) * 60)].shape[0])

Next, use jieba to cut the comment text into words, remove the stop words, and count word frequencies; wrap this in a helper so it can be reused for different slices of the data:

import jieba

def ciyun(text):
    segs = jieba.cut(text)          # word segmentation
    segment = []
    for seg in segs:
        if len(seg) > 1 and seg != '\r\n':
            segment.append(seg)
    # remove stop words (text denoising)
    words_df = pd.DataFrame({'segment': segment})
    stopwords = pd.read_csv("stopword.txt", index_col=False, quoting=3, sep="\t", names=['stopword'], encoding="utf8")
    words_df = words_df[~words_df.segment.isin(stopwords.stopword)]
    words_stat = words_df.groupby('segment').agg(count=pd.NamedAgg(column='segment', aggfunc='size'))
    words_stat = words_stat.reset_index().sort_values(by="count", ascending=False)
    return words_stat

Join the comments from the time windows of interest into single strings and feed them to the helper (three windows are used, starting at minutes 0, 54 and 120):

data_6_text = ''.join(data[(data.pon >= 0) & (data.pon <= 6 * 60)]['comment'].values.tolist())
data_60_text = ''.join(data[(data.pon >= 54 * 60) & (data.pon <= 60 * 60)]['comment'].values.tolist())
data_126_text = ''.join(data[(data.pon >= 120 * 60) & (data.pon <= 126 * 60)]['comment'].values.tolist())
words_stat = ciyun(data_6_text)     # repeat for the other windows as needed

Finally, find the viewers who posted more than 100 on-screen comments, gather all of their comments, and draw a word cloud with pyecharts:

data2 = data.groupby('name').agg(count=pd.NamedAgg(column='name', aggfunc='size')).reset_index()
data3 = data2[data2['count'] > 100]          # find people with more than 100 on-screen comments
data_text = ''
for i in data3['name'].values.tolist():
    data_text += ''.join(data[data.name == i]['comment'].values.tolist())
words_stat = ciyun(data_text)

from pyecharts import options as opts
from pyecharts.charts import WordCloud
from pyecharts.globals import SymbolType

words = [(i, j) for i, j in zip(words_stat['segment'].values.tolist(), words_stat['count'].values.tolist())]
c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title="{}".format('fan on-screen comments')))
)
c.render_notebook()

Thank you for reading. The above is the content of "How to use Python to crawl movies". After studying this article, I believe you have a deeper understanding of how to use Python to crawl movies; the specific usage still needs to be verified in practice. The editor will keep pushing more related articles and knowledge points for you. Welcome to follow!

