How to use Python to visually display the word-of-mouth and box office data of a movie

2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to use Python to visually display the word-of-mouth and box office data of movies. The method is simple, fast and practical, so let's walk through it step by step.

Data acquisition

1. Scoring data

Web page analysis

Looking at the source code of the web page, you can see the target data in the tag, which can be extracted with XPath. Let's go straight to the code!

Implementation

import requests
from lxml import etree

# Request headers; the cookie was copied from a logged-in browser session,
# so replace it with your own.
headers = {
    'Host': 'movie.douban.com',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36',
    'cookie': (
        'bid=uVCOdCZRTrM; douban-fav-remind=1; '
        '__utmz=30149280.1603808051.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); '
        '__gads=ID=7ca757265e2366c5-22ded2176ac40059:T=1603808052:RT=1603808052:S=ALNI_MYZsGZJ8XXb1oU4zxzpMzGdK61LFA; '
        'dbcl2="165593539:LvLaPIrgug0"; push_doumail_num=0; push_noty_num=0; '
        '__utmv=30149280.16559; ll="118288"; __yadk_uid=DnUc7ftXIqYlQ8RY6pYmLuNPqYp5SFzc; '
        '_vwo_uuid_v2=D7ED984782737D7813CC0049180E68C43|1b36a9232bbbe34ac958167d5bdb9a27; '
        'ct=y; ck=ZbYm; __utmc=30149280; __utmc=223695111; '
        '__utma=30149280.1867171825.1603588354.1613363321.1613372112.11; __utmt=1; '
        '__utmb=30149280.2.10.1613372112; ap_v=0,6.0; '
        '_pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1613372123%2C%22https%3A%2F%2Fwww.douban.com%2Fmisc%2Fsorry%3Foriginal-url%3Dhttps%253A%252F%252Fmovie.douban.com%252Fsubject%252F34841067%252F%253Ffrom%253Dplaying_poster%22%5D; '
        '_pk_ses.100001.4cf6=*; '
        '__utma=223695111.788421403.1612839506.1613363340.1613372123.9; '
        '__utmb=223695111.0.10.1613372123; '
        '__utmz=223695111.1613372123.9.4.utmcsr=douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/misc/sorry; '
        '_pk_id.100001.4cf6=e2e8bde436a03ad7.1612839506.9.1613372127.1613363387.'
    ),
}

url = "https://movie.douban.com/cinema/nowplaying/zhanjiang/"
r = requests.get(url, headers=headers)
r.encoding = 'utf8'
s = r.content
selector = etree.HTML(s)
# one <li> per movie that is currently showing
li_list = selector.xpath('//*[@id="nowplaying"]/div[2]/ul/li')

dict = {}  # movie name -> rating (name kept because later code uses it)
for item in li_list:
    name = item.xpath('.//*[@class="stitle"]/a/@title')[0].replace(" ", "").replace("\n", "")
    rate = item.xpath('.//*[@class="subject-rate"]/text()')[0].replace(" ", "").replace("\n", "")
    dict[name] = float(rate)
    print("Movie = " + name)
    print("rating = " + rate)
    print("-------")

The movie titles and ratings have now been crawled. They are sorted in descending order and visualized later.
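The crawl code above does not show the sorting step itself. A minimal sketch of one way to do it, assuming the ratings sit in a plain name-to-score dictionary like the one the crawler fills. The function name `sort_by_rating` is hypothetical, the 8.3 and 5.8 scores are the ones quoted in this article, and the third entry is an invented placeholder:

```python
# Hypothetical sample of the crawled data: movie name -> Douban rating.
ratings = {
    "Chinatown investigation 3": 5.8,  # score quoted in this article
    "Hello, Li Huanying": 8.3,         # score quoted in this article
    "placeholder movie": 7.0,          # invented value, for illustration only
}


def sort_by_rating(scores):
    """Return (name, rating) pairs ordered from highest to lowest rating."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = sort_by_rating(ratings)
# Two parallel lists, the shape the plotting code expects.
item_names = [name for name, _ in ranked]
datas = [rate for _, rate in ranked]

for name, rate in ranked:
    print(f"{name}: {rate}")
```

Keeping names and ratings in two parallel lists makes it easy to pass them straight to a bar chart as tick labels and bar heights.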

2. Duration and type of movie

Web page analysis

In the page source code, the movie length is in the tag with property="v:runtime", and the movie type is in the tag with property="v:genre".

Implementation

# duration and type
def getmovietime():
    url = "https://movie.douban.com/cinema/nowplaying/zhanjiang/"
    r = requests.get(url, headers=headers)
    r.encoding = 'utf8'
    s = r.content
    selector = etree.HTML(s)
    li_list = selector.xpath('//*[@id="nowplaying"]/div[2]/ul/li')
    for item in li_list:
        title = item.xpath('.//*[@class="stitle"]/a/@title')[0].replace(" ", "").replace("\n", "")
        href = item.xpath('.//*[@class="stitle"]/a/@href')[0]
        # open the detail page of each movie
        r = requests.get(href, headers=headers)
        r.encoding = 'utf8'
        s = r.content
        selector = etree.HTML(s)
        times = selector.xpath('//*[@property="v:runtime"]/text()')
        genres = selector.xpath('//*[@property="v:genre"]/text()')
        print(title)
        print(times)
        print(genres)
        print("-------")
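The function above only prints the raw runtime and type strings. To plot them, the runtime has to become a number. The sketch below shows one possible way, using only the standard library on a small, made-up page fragment. `SAMPLE_PAGE` and `parse_detail` are hypothetical, and the "136分钟" runtime format is an assumption about what the page returns:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a movie detail page.
SAMPLE_PAGE = """
<div id="info">
  <span property="v:genre">comedy</span>
  <span property="v:genre">suspense</span>
  <span property="v:runtime">136分钟</span>
</div>
"""


def parse_detail(page):
    """Return (runtime in minutes or None, list of genres) for one page."""
    root = ET.fromstring(page.strip())
    # ElementTree understands the same attribute predicate used with lxml.
    runtimes = [el.text for el in root.findall('.//*[@property="v:runtime"]')]
    genres = [el.text for el in root.findall('.//*[@property="v:genre"]')]
    minutes = None
    if runtimes:
        match = re.search(r"\d+", runtimes[0])  # first run of digits
        if match:
            minutes = int(match.group())
    return minutes, genres


minutes, genres = parse_detail(SAMPLE_PAGE)
print(minutes, genres)
```

Real pages are rarely well-formed XML, so on live data the lxml parser used in the article is the better fit; only the digit extraction carries over unchanged.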

3. Comment data

Web page analysis

In the page source, each comment sits in an element with class="comment-item", and its text is in the child element with class="short".

If you are not sure how to analyze the page, see the article "Python crawled 44130 pieces of user viewing data, analysis and mining of hidden information between users and the movie", which also analyzes Douban movies and explains the steps in detail.

Let's start crawling the review data of these seven movies!

Implementation

import time

# comment data
def getmoviecomment():
    url = "https://movie.douban.com/cinema/nowplaying/zhanjiang/"
    r = requests.get(url, headers=headers)
    r.encoding = 'utf8'
    s = r.content
    selector = etree.HTML(s)
    li_list = selector.xpath('//*[@id="nowplaying"]/div[2]/ul/li')
    for item in li_list:
        title = item.xpath('.//*[@class="stitle"]/a/@title')[0].replace(" ", "").replace("\n", "")
        href = item.xpath('.//*[@class="stitle"]/a/@href')[0].replace(" ", "").replace("\n", "").replace("/?from=playing_poster", "")
        print("movie = " + title)
        print("link = " + href)
        # one text file per movie
        with open(title + ".txt", "a+", encoding='utf-8') as f:
            # 10 pages of 20 comments each
            for k in range(0, 200, 20):
                url = href + "/comments?start=" + str(k) + "&limit=20&status=P&sort=new_score"
                r = requests.get(url, headers=headers)
                r.encoding = 'utf8'
                s = r.content
                selector = etree.HTML(s)
                comment_list = selector.xpath('//*[@class="comment-item"]')
                for items in comment_list:
                    text = items.xpath('.//*[@class="short"]/text()')[0]
                    f.write(str(text) + "\n")
                print("-------")
                # pause between requests
                time.sleep(4)

The comments are saved to text files, one per movie, and are later visualized with a different graphic for each movie.
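The paging logic in the crawler can be checked without sending any request. A small sketch under stated assumptions: `comment_page_urls` is a hypothetical helper, and the subject ID is the one that appears inside the cookie string earlier in this article, used here only as an example:

```python
def comment_page_urls(subject_url, total=200, page_size=20):
    """Build the comment-page URLs for one movie, page_size comments per page."""
    base = subject_url.rstrip("/")
    return [
        f"{base}/comments?start={start}&limit={page_size}&status=P&sort=new_score"
        for start in range(0, total, page_size)
    ]


urls = comment_page_urls("https://movie.douban.com/subject/34841067/")
print(len(urls))
print(urls[0])
print(urls[-1])
```

With the defaults this yields ten URLs whose start values run from 0 to 180, matching the `range(0, 200, 20)` loop in the crawler.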

Data visualization

1. Visualization of scoring data

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

# drawing (datas and itemNames hold the ratings and movie names)
font_size = 10  # font size
fig_size = (13, 10)  # chart size
data = [datas]
# update font size
mpl.rcParams['font.size'] = font_size
# update chart size
mpl.rcParams['figure.figsize'] = fig_size
# set bar width
bar_width = 0.35
index = np.arange(len(data[0]))
# draw the scores
rects1 = plt.bar(index, data[0], bar_width, color='#0072BC', label='Douban score')
# X axis labels, centered under each bar
plt.xticks(index, itemNames)
# Y axis range
plt.ylim(0, 10)
# chart title
plt.title(u'Douban score')
# legend is shown at the bottom of the chart
plt.legend(loc='upper center', bbox_to_anchor=(0.5, 0.03), fancybox=True, ncol=5)

# add data labels
def add_labels(rects):
    for rect in rects:
        height = rect.get_height()
        plt.text(rect.get_x() + rect.get_width() / 2, height, str(height), ha='center', va='bottom')
        # bar edges are drawn in white, purely for looks
        rect.set_edgecolor('white')

add_labels(rects1)
# save the chart locally
plt.savefig('Douban score.png')

Among the seven popular films, "Hello, Li Huanying" has the highest score (8.3), while "Chinatown investigation 3" has the lowest (5.8). This is a bit unexpected, since "Chinatown investigation 3" drew far more buzz than "Hello, Li Huanying".

2. Visualization of duration and type

Visualization of duration data

# visualization (itemNames and datas hold the movie names and durations)
itemNames.reverse()
datas.reverse()
# drawing
fig, ax = plt.subplots()
b = ax.barh(range(len(itemNames)), datas, color='#6699CC')
# add data labels to the right side of the horizontal bars
for rect in b:
    w = rect.get_width()
    ax.text(w, rect.get_y() + rect.get_height() / 2, '%d' % int(w), ha='left', va='center')
# set the tick labels on the Y axis
ax.set_yticks(range(len(itemNames)))
ax.set_yticklabels(itemNames)
plt.title('movie duration (minutes)', loc='center', fontsize=15, fontweight='bold', color='red')
# plt.show()
plt.savefig("movie duration (minutes)")

The movies in the chart run about 120 minutes on average.

The longest movie is Chinatown 3 (136 minutes), and the shortest is Bear in the Wild Continent (99 minutes).

Visualization of movie type data

# 2. Type visualization
# sort from small to large (dict maps each type to its count)
items = sorted(dict.items(), key=lambda kv: (kv[1], kv[0]))
print(items)
itemNames = []
datas = []
# walk the sorted list backwards so the largest count comes first
for i in range(len(items) - 1, -1, -1):
    itemNames.append(items[i][0])
    datas.append(items[i][1])

x = range(len(itemNames))
plt.plot(x, datas, marker='o', mec='r', mfc='w', label=u'Movie Type')
plt.legend()  # make the legend effective
plt.xticks(x, itemNames, rotation=45)
plt.margins(0)
plt.subplots_adjust(bottom=0.15)
plt.xlabel(u"Type")  # X-axis label
plt.ylabel("quantity")  # Y-axis label
plt.title("Movie Type Statistics")  # title
plt.savefig("Movie Type Statistics.png")

The chart counts the types of these seven films (some movies belong to several types, such as 'action', 'fantasy' and 'adventure'). Four of the seven films are comedies. Science fiction, crime, suspense and adventure also appear.
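Because one movie can carry several types, the counts have to be accumulated across all movies before plotting. A minimal sketch with `collections.Counter`; the movie names and genre lists below are invented placeholders, not the crawled data, and `count_genres` is a hypothetical helper:

```python
from collections import Counter

# Hypothetical per-movie genre lists, shaped like the v:genre results.
movie_genres = {
    "movie A": ["comedy", "suspense"],
    "movie B": ["comedy", "action", "fantasy"],
    "movie C": ["action", "fantasy", "adventure"],
}


def count_genres(genres_by_movie):
    """Count how many movies list each genre."""
    counts = Counter()
    for genres in genres_by_movie.values():
        counts.update(genres)
    return counts


genre_counts = count_genres(movie_genres)
# most_common() returns (genre, count) pairs, largest count first.
ranked = genre_counts.most_common()
item_names = [genre for genre, _ in ranked]
datas = [count for _, count in ranked]
print(ranked)
```

The resulting `item_names` and `datas` lists have the same shape as the ones passed to `plt.plot` and `plt.xticks` in the type chart.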

3. Word cloud visualization of comment data

Each of the seven movies gets a word cloud in a different pattern, so the drawing code is wrapped in a function!

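# word cloud code
import jieba

def jieba_cloud(file_name, icon):
    with open(file_name, 'r', encoding='utf8') as f:
        text = f.read()
        # strip line breaks, full-width spaces and punctuation
        text = text.replace('\n', "").replace("\u3000", "").replace(",", "").replace("。", "")
        # segment the text into words
        word_list = jieba.cut(text)
        result = " ".join(word_list)
        # the word cloud drawing that follows is missing from the source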
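The chained replace calls can also be written as a single pass with `str.translate`. This is an alternative sketch, not the article's code: `clean_comment_text` is a hypothetical name, and the set of characters to drop is an assumption that should be adjusted to the actual comments:

```python
def clean_comment_text(text):
    """Strip line breaks, full-width spaces and common punctuation."""
    # Characters to delete; the exact set is an assumption, adjust as needed.
    unwanted = "\n\r\u3000,。,."
    return text.translate(str.maketrans("", "", unwanted))


sample = "很好看,\n值得一看。\u3000Great movie.\n"
print(clean_comment_text(sample))
```

A translation table scans the text once regardless of how many characters are removed, which helps when the comment files grow large.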

Now draw the word clouds for the review data of these seven films.

# comment data word clouds
def commentanalysis():
    # these names must match the titles used when the comment files were saved
    lists = ['assassinate novelist', 'Hello, Li Huanying', 'crowds', 'serve God Ling', 'Chinatown investigation 3', "new gods list: Nezha's rebirth", 'bear haunts the wild mainland']
    for i in range(len(lists)):
        title = lists[i] + ".txt"
        jieba_cloud(title, (i + 1))

By now you should have a deeper understanding of how to use Python to visually display a movie's word-of-mouth and box office data. Try it out in practice!
