Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use python to analyze the Film Box Office

2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly introduces the relevant knowledge of "how to use python to analyze the movie box office". The editor shows you the operation process through the actual case. The operation method is simple, fast and practical. I hope this article "how to use python to analyze the movie box office" can help you solve the problem.

1. Asking questions this case comes from the TMDB 5000 Movie Dataset data set on kaggle. In order to explore the visualization of movie data and provide data support for film production, we mainly study the following questions:

How does the type of movie change over time?

What is the relationship between the type of film and profit?

How are the comparisons between Universal and Paramount?

How does the adapted film compare with the original film?

What is the relationship between the duration of the movie and the box office and rating of the movie?

Analyze movie keywords

2. Understanding data 1. Collecting data

Download the dataset from TMDB 5000 Movie Dataset on kaggle:

Https://www.kaggle.com/tmdb/tmdb-movie-metadata

2. Import data

3. View dataset information

The following is an introduction to the meaning of some fields in the moviedf dataset:

Id: identification number

Imdb id:IMDB identification number

Popularity: relative page views on Movie Database

Budget: budget (USD)

Revenue: revenue (USD)

Original_title: movie name

Cast: list of actors, separated by | maximum of 5 actors

Homepage: the URL of the home page of the movie

Director: list of directors, separated by |, up to 5 directors

Tagline: slogan of the movie

Keywords: keywords related to movies, separated by |, up to 5 keywords

Overview: summary of the plot

Runtime: the duration of the movie

Genres: list of styles, separated by |, up to 5 styles

Production_companies: list of production companies, separated by |, up to 5 companies

Release_date: first release date

Vote_count: number of times scored

Vote_average: average score release year: year of release

3. Data cleaning 1. Merge the data of credits dataset and moviedf dataset together, and then view the information of the merged dataset:

2. Select a subset

Because there is too much information in the dataset, some of the data is not the focus of our research, so select the data we need:

Since the later data analysis involves the profit calculation of the movie type, first calculate the profit of each movie, and add the profit data column to the data set moviesdf:

3. Missing value processing

Through the above data set information, we can know that there is less missing data in the whole data set, including 1 data missing in release_date (first release date) and 2 data missing in runtime (movie duration), which can be made up by online query.

Fill in release_date (premiere date) data:

Find the missing data for runtime (movie duration):

Fill in the missing runtime values:

4. Data format conversion

Genres column data processing:

Release_date column data processing:

Fourth, data analysis and visualization question 1: how does the type of film change over time?

1. Establish a relationship data box containing the year and the number of movie types:

2. Data visualization

Draw a quantity bar chart for various types of movies:

Draw pie charts for various types of movies:

Analysis conclusion:

As can be seen from the above results, Drama (drama) accounts for 18.9% of all film genres, followed by Comedy (comedy), accounting for 14.2% of all film genres.

Among all the movie genres, the top 5 movie types are: Drama (drama), Comedy (comedy), Thriller (thriller), Action (action) and Romance (adventure).

3. Analysis of the trend of film types changing with time:

Analysis conclusion:

It is observed from the picture that with the passage of time, all types of films have shown a growing trend, especially since 1992, all types of films have grown rapidly, of which Drama (drama) and Comedy (comedy) have grown the fastest and are still the most popular types of movies.

Question 2: the relationship between film type and profit?

First, find out the average profit of each type of film:

Visualization of average profit data of movie types:

Analysis conclusion:

It is observed from the picture that shooting Animation, Adventure and Fantasy films is the most profitable, while shooting Foreign, TV and Movie films is at risk of losing money.

Question 3: what is the comparison between Universal Pictures and Paramount Pictures in the release of films?

Universal Pictures (Universal Pictures) and Paramount Pictures (Paramount Pictures) are two giant American film companies.

1. Check the number of films distributed by Universal Pictures and Paramount Pictures.

Process the production_companies column data first:

Query production_companies data columns and count Universal Pictures and Paramount Pictures data:

Use a pie chart to compare the proportion of movies released by the two companies:

2. Analyze the trend of film distribution of Universal Pictures and Paramount Pictures.

Extract relevant data columns for processing:

The line chart of the film distribution of the two film companies:

Analysis conclusion:

It is observed from the picture that the film circulation of Universal Pictures and Paramount Pictures shows an increasing trend over time, especially after 1995, in which Universal Pictures has released more films than Paramount Pictures.

Question 4: how does the adapted film compare with the original film?

Processing keywords column data:

Draw a bar chart and compare the adapted film with the original film in terms of budget, revenue and profit:

Analysis conclusion:

As can be seen from the picture, the budget of the adapted film is slightly higher than that of the original film, but the box office revenue and profit of the adapted film is much higher than that of the original film, which may be that the adapted film has a certain fan base.

Question 5: the relationship between movie duration and film box office and rating

The relationship between the length of a movie and the box office:

The relationship between the length of a movie and its average score:

Analysis conclusion:

As can be seen from the picture, if a film wants to get a high box office and a good reputation, the duration of the film should be kept within 90-150 minutes.

Question 6: analysis of movie keywords

First extract movie keywords:

Generate word cloud image through word cloud package WordCloud:

So much for the introduction of "how to use python to analyze the Box Office of a Movie". Thank you for reading. If you want to know more about the industry, you can follow the industry information channel. The editor will update different knowledge points for you every day.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report