2025-02-22 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31
This article shows how to use Python to analyze movie box office data. It walks through the whole process on a real case, from cleaning the data to visualizing the results.
1. Asking questions
This case uses the TMDB 5000 Movie Dataset from Kaggle. To explore the movie data visually and provide data support for film production, we study the following questions:
How do movie genres change over time?
What is the relationship between a movie's genre and its profit?
How do Universal Pictures and Paramount Pictures compare in film releases?
How do adapted films compare with original films?
What is the relationship between a movie's duration and its box office and rating?
What do the movie keywords reveal?
2. Understanding data
1. Collecting data
Download the TMDB 5000 Movie Dataset from Kaggle:
https://www.kaggle.com/tmdb/tmdb-movie-metadata
2. Import data
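A minimal sketch of the import step with pandas. The file names in the comment are assumptions based on how the Kaggle download is usually named, and `load_tmdb`, `moviesdf` and `creditsdf` are illustrative names; the in-memory CSV text is a toy stand-in so the sketch runs without the download.

```python
import io

import pandas as pd


def load_tmdb(movies_src, credits_src):
    """Read the movies and credits CSVs (paths or file-like objects)."""
    movies = pd.read_csv(movies_src)
    credits = pd.read_csv(credits_src)
    return movies, credits


# With the real files (file names are an assumption):
# moviesdf, creditsdf = load_tmdb("tmdb_5000_movies.csv", "tmdb_5000_credits.csv")

# Toy stand-ins so the example is self-contained.
movies_csv = io.StringIO("id,title,budget,revenue\n1,Movie A,100,250\n2,Movie B,50,40\n")
credits_csv = io.StringIO("movie_id,title\n1,Movie A\n2,Movie B\n")
moviesdf, creditsdf = load_tmdb(movies_csv, credits_csv)
print(moviesdf.shape, creditsdf.shape)
```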
3. View dataset information
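One way to inspect a loaded DataFrame is sketched below. The toy rows are made up for illustration; on the real data the same calls report column types, non-null counts and the number of missing values per column.

```python
import pandas as pd

# Toy stand-in for the loaded movie data (values are made up).
moviesdf = pd.DataFrame({
    "id": [1, 2, 3],
    "original_title": ["Movie A", "Movie B", "Movie C"],
    "runtime": [120.0, None, 95.0],
})

moviesdf.info()          # column names, dtypes and non-null counts
print(moviesdf.head())   # first rows

missing = moviesdf.isnull().sum()  # missing values per column
print(missing)
```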
The meaning of some of the fields in the moviedf dataset:
id: identification number
imdb_id: IMDB identification number
popularity: relative page views on Movie Database
budget: budget (USD)
revenue: revenue (USD)
original_title: movie name
cast: list of actors, separated by |, up to 5 actors
homepage: URL of the movie's home page
director: list of directors, separated by |, up to 5 directors
tagline: slogan of the movie
keywords: keywords related to the movie, separated by |, up to 5 keywords
overview: summary of the plot
runtime: duration of the movie
genres: list of genres, separated by |, up to 5 genres
production_companies: list of production companies, separated by |, up to 5 companies
release_date: first release date
vote_count: number of votes
vote_average: average score
release_year: year of release
3. Data cleaning
1. Merge the credits dataset with the moviedf dataset, then view the information of the merged dataset:
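The merge could be sketched as follows, assuming the credits table identifies movies by a `movie_id` column that matches `id` in the movie table and that both carry a `title` column. `fulldf` is an illustrative name and the rows are toy data.

```python
import pandas as pd

# Toy stand-ins; column names are assumptions about the two CSV files.
moviesdf = pd.DataFrame({"id": [1, 2], "title": ["A", "B"], "budget": [100, 50]})
creditsdf = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"], "cast": ["[]", "[]"]})

# Drop the duplicated title column first, then join on the movie identifier.
fulldf = (
    moviesdf
    .merge(creditsdf.drop(columns="title"), left_on="id", right_on="movie_id", how="left")
    .drop(columns="movie_id")
)
fulldf.info()
```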
2. Select a subset
The dataset contains more information than this study needs, so select only the columns of interest:
The later analysis involves profit by movie genre, so first calculate the profit of each movie and add it to moviesdf as a profit column:
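A minimal sketch of both steps, assuming profit is defined as revenue minus budget. The list of kept columns is illustrative, and the rows are toy data.

```python
import pandas as pd

# Toy stand-in for the merged dataset.
fulldf = pd.DataFrame({
    "original_title": ["Movie A", "Movie B"],
    "homepage": ["http://a.example", None],
    "budget": [100, 50],
    "revenue": [250, 40],
})

# Keep only the columns needed for the analysis (illustrative selection).
wanted = ["original_title", "budget", "revenue"]
moviesdf = fulldf[wanted].copy()

# Profit per movie, assumed here to be revenue minus budget.
moviesdf["profit"] = moviesdf["revenue"] - moviesdf["budget"]
print(moviesdf)
```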
3. Missing value processing
The dataset information above shows that very little data is missing: 1 value in release_date (first release date) and 2 values in runtime (movie duration). These can be looked up online and filled in by hand.
Fill in release_date (premiere date) data:
Find the missing data for runtime (movie duration):
Fill in the missing runtime values:
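Locating and filling the gaps might look like the sketch below. The ids and the filled-in values are hypothetical placeholders, not the values for the real movies; on the real data they would come from the online lookup.

```python
import pandas as pd

# Toy stand-in with one missing release_date and one missing runtime.
moviesdf = pd.DataFrame({
    "id": [1, 2, 3],
    "release_date": ["2010-01-01", None, "2012-05-05"],
    "runtime": [120.0, 95.0, float("nan")],
})

# Find the rows with missing values.
print(moviesdf[moviesdf["release_date"].isnull()])
print(moviesdf[moviesdf["runtime"].isnull()])

# Fill each gap by id (values below are hypothetical placeholders).
moviesdf.loc[moviesdf["id"] == 2, "release_date"] = "2011-03-03"
moviesdf.loc[moviesdf["id"] == 3, "runtime"] = 101
```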
4. Data format conversion
Process the genres column:
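A sketch of one possible conversion, assuming each genres cell is stored as a JSON string holding a list of objects with a `name` key. If the column is already pipe-separated, as the field list describes, this step is not needed. `names_from_json` is an illustrative helper.

```python
import json

import pandas as pd


def names_from_json(cell):
    """Turn '[{"id": 28, "name": "Action"}, ...]' into 'Action|...'."""
    return "|".join(item["name"] for item in json.loads(cell))


# Toy stand-in; the JSON layout is an assumption about the raw CSV.
moviesdf = pd.DataFrame({
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        "[]",
    ]
})
moviesdf["genres"] = moviesdf["genres"].apply(names_from_json)
print(moviesdf["genres"].tolist())
```

The same helper can be reused for other columns with the same layout, such as keywords or production_companies.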
Process the release_date column:
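A minimal sketch, assuming release_date is stored as text in `YYYY-MM-DD` form. It converts the column to datetimes and derives a release_year column; the rows are toy data.

```python
import pandas as pd

# Toy stand-in; the date format is an assumption.
moviesdf = pd.DataFrame({"release_date": ["2009-12-10", "2012-07-16"]})

# Parse the text into datetimes, then pull out the year.
moviesdf["release_date"] = pd.to_datetime(moviesdf["release_date"], format="%Y-%m-%d")
moviesdf["release_year"] = moviesdf["release_date"].dt.year
print(moviesdf)
```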
4. Data analysis and visualization
Question 1: How do movie genres change over time?
1. Build a DataFrame that relates the release year to the number of movies of each genre:
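One way to build such a table is sketched below, assuming genres has already been reduced to pipe-separated names and a release_year column exists. `genre_year` and `genre_total` are illustrative names.

```python
import pandas as pd

# Toy stand-in for the cleaned data.
moviesdf = pd.DataFrame({
    "release_year": [2009, 2009, 2012],
    "genres": ["Action|Adventure", "Drama", "Action|Drama"],
})

# One 0/1 indicator column per genre, then sum the indicators per year.
dummies = moviesdf["genres"].str.get_dummies(sep="|")
genre_year = dummies.groupby(moviesdf["release_year"]).sum()

# Total number of movies per genre over all years.
genre_total = genre_year.sum().sort_values(ascending=False)
print(genre_year)
print(genre_total)
```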
2. Data visualization
Draw a bar chart of the number of movies in each genre:
Draw a pie chart of the share of each genre:
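Both charts could be drawn with matplotlib roughly as follows. The counts are toy values standing in for the per-genre totals, and the non-interactive `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # no display needed; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy per-genre totals (made-up numbers).
genre_total = pd.Series({"Drama": 5, "Comedy": 3, "Action": 2})

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(12, 5))

# Horizontal bar chart of movie counts per genre.
genre_total.sort_values().plot.barh(ax=ax_bar)
ax_bar.set_xlabel("Number of movies")

# Pie chart of each genre's share, labelled with percentages.
ax_pie.pie(genre_total.values, labels=list(genre_total.index), autopct="%1.1f%%")
ax_pie.set_title("Share of each genre")

share = genre_total / genre_total.sum()
print(share)
```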
Analysis conclusion:
The results show that Drama accounts for 18.9% of all genre entries, followed by Comedy with 14.2%.
The top 5 genres are Drama, Comedy, Thriller, Action and Romance.
3. Analysis of the trend of film types changing with time:
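A sketch of the trend plot, assuming a year-by-genre count table like the one described in step 1. The numbers are toy values, and only the two largest genres are plotted to keep the chart readable.

```python
import matplotlib

matplotlib.use("Agg")  # no display needed; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy year-by-genre counts (made-up numbers).
genre_year = pd.DataFrame(
    {"Drama": [3, 5, 9], "Comedy": [2, 4, 6], "Western": [1, 0, 1]},
    index=pd.Index([1990, 2000, 2010], name="release_year"),
)

# Pick the genres with the most movies overall and plot one line per genre.
top_genres = genre_year.sum().sort_values(ascending=False).head(2).index
fig, ax = plt.subplots(figsize=(10, 5))
genre_year[top_genres].plot(ax=ax)
ax.set_xlabel("Release year")
ax.set_ylabel("Number of movies")
```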
Analysis conclusion:
The chart shows that the number of films of every genre has grown over time, and that growth has been especially rapid since 1992. Drama and Comedy have grown the fastest and remain the most common genres.
Question 2: What is the relationship between film genre and profit?
First, find out the average profit of each type of film:
Visualization of average profit data of movie types:
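The per-genre average could be computed as in the sketch below, assuming pipe-separated genres and an existing profit column. A movie with several genres counts toward each of them. The rows are toy data.

```python
import pandas as pd

# Toy stand-in for the cleaned data.
moviesdf = pd.DataFrame({
    "genres": ["Animation|Adventure", "Drama", "Drama|Foreign"],
    "profit": [300, 50, -20],
})

# For every genre, average the profit of the movies tagged with it.
dummies = moviesdf["genres"].str.get_dummies(sep="|")
mean_profit = pd.Series(
    {genre: moviesdf.loc[dummies[genre] == 1, "profit"].mean() for genre in dummies.columns}
).sort_values()
print(mean_profit)

# With matplotlib installed, mean_profit.plot.barh() draws the bar chart.
```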
Analysis conclusion:
The chart shows that Animation, Adventure and Fantasy films are the most profitable on average, while Foreign and TV Movie films run the risk of losing money.
Question 3: How do Universal Pictures and Paramount Pictures compare in film releases?
Universal Pictures and Paramount Pictures are two giant American film companies.
1. Check the number of films distributed by Universal Pictures and Paramount Pictures.
Process the production_companies column data first:
Query production_companies data columns and count Universal Pictures and Paramount Pictures data:
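A sketch of the counting step, assuming production_companies has already been reduced to pipe-separated company names. It matches whole names rather than substrings, so a differently named subsidiary is not counted by accident. The rows are toy data.

```python
import pandas as pd

# Toy stand-in for the cleaned data.
moviesdf = pd.DataFrame({
    "production_companies": [
        "Universal Pictures|Amblin Entertainment",
        "Paramount Pictures",
        "Universal Pictures",
        "Walt Disney Pictures",
    ]
})

companies = ["Universal Pictures", "Paramount Pictures"]
name_lists = moviesdf["production_companies"].str.split("|")
for company in companies:
    # 1 if the company appears in the movie's list of producers, else 0.
    moviesdf[company] = name_lists.apply(lambda names, c=company: int(c in names))

counts = moviesdf[companies].sum()
print(counts)

# With matplotlib installed, counts.plot.pie(autopct="%1.1f%%") draws the comparison.
```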
Use a pie chart to compare the proportion of movies released by the two companies:
2. Analyze the trend of film distribution of Universal Pictures and Paramount Pictures.
Extract relevant data columns for processing:
The line chart of the film distribution of the two film companies:
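The yearly trend could be derived as sketched here, assuming one 0/1 indicator column per company and a release_year column. The rows are toy data.

```python
import pandas as pd

# Toy stand-in: one indicator column per company.
moviesdf = pd.DataFrame({
    "release_year": [1995, 1995, 2000, 2000, 2000],
    "Universal Pictures": [1, 0, 1, 1, 0],
    "Paramount Pictures": [0, 1, 0, 1, 0],
})

# Number of films per company and year.
companies = ["Universal Pictures", "Paramount Pictures"]
company_year = moviesdf.groupby("release_year")[companies].sum()
print(company_year)

# With matplotlib installed, company_year.plot() draws one line per company.
```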
Analysis conclusion:
The chart shows that the number of films released by both Universal Pictures and Paramount Pictures has increased over time, especially after 1995, and that Universal Pictures has released more films than Paramount Pictures.
Question 4: how does the adapted film compare with the original film?
Processing keywords column data:
Draw a bar chart and compare the adapted film with the original film in terms of budget, revenue and profit:
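A sketch of the comparison, under the assumption that an adapted film can be recognized by the keyword "based on novel". The source does not name the keyword, so treat it as a placeholder. The rows are toy data.

```python
import pandas as pd

# Toy stand-in for the cleaned data.
moviesdf = pd.DataFrame({
    "keywords": ["based on novel|magic", "space|alien", "based on novel"],
    "budget": [100, 80, 60],
    "revenue": [400, 120, 200],
})
moviesdf["profit"] = moviesdf["revenue"] - moviesdf["budget"]

# Flag adapted films by keyword (the keyword itself is an assumption).
is_adapted = moviesdf["keywords"].str.contains("based on novel", regex=False)
moviesdf["source"] = is_adapted.map({True: "adapted", False: "original"})

# Average budget, revenue and profit for each group.
comparison = moviesdf.groupby("source")[["budget", "revenue", "profit"]].mean()
print(comparison)

# With matplotlib installed, comparison.plot.bar() draws the grouped bar chart.
```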
Analysis conclusion:
The chart shows that the budget of adapted films is slightly higher than that of original films, but their box office revenue and profit are much higher, possibly because adapted films come with an existing fan base.
Question 5: What is the relationship between a movie's duration and its box office and rating?
The relationship between the length of a movie and the box office:
The relationship between the length of a movie and its average score:
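Both relationships could be examined with scatter plots, as in the sketch below. The rows are toy data, and the 90 to 150 minute band is used only to print a numeric summary next to the plots.

```python
import matplotlib

matplotlib.use("Agg")  # no display needed; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the cleaned data (made-up numbers).
moviesdf = pd.DataFrame({
    "runtime": [85, 100, 120, 150, 200],
    "revenue": [10, 300, 500, 400, 50],
    "vote_average": [5.5, 6.8, 7.4, 7.9, 6.0],
})

fig, (ax_rev, ax_vote) = plt.subplots(1, 2, figsize=(12, 5))
ax_rev.scatter(moviesdf["runtime"], moviesdf["revenue"])
ax_rev.set_xlabel("Runtime (minutes)")
ax_rev.set_ylabel("Revenue (USD)")
ax_vote.scatter(moviesdf["runtime"], moviesdf["vote_average"])
ax_vote.set_xlabel("Runtime (minutes)")
ax_vote.set_ylabel("Average score")

# Compare movies inside and outside the 90-150 minute band (bounds inclusive).
in_band = moviesdf["runtime"].between(90, 150)
summary = moviesdf.groupby(in_band)[["revenue", "vote_average"]].mean()
print(summary)
```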
Analysis conclusion:
The charts suggest that the films with both high box office and good ratings mostly run between 90 and 150 minutes.
Question 6: analysis of movie keywords
First extract movie keywords:
Generate a word cloud image with the WordCloud package:
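A sketch of both steps, assuming keywords has been reduced to pipe-separated text. The frequency count uses only the standard library; `save_wordcloud` is an illustrative helper that needs the third-party `wordcloud` package and is not called here, and its size and colour settings are arbitrary choices.

```python
from collections import Counter

import pandas as pd

# Toy stand-in for the cleaned data.
moviesdf = pd.DataFrame({"keywords": ["based on novel|magic", "space|alien", "based on novel", ""]})

# Count how often each keyword occurs across all movies.
keyword_counts = Counter(
    keyword
    for cell in moviesdf["keywords"]
    for keyword in cell.split("|")
    if keyword
)
print(keyword_counts.most_common(3))


def save_wordcloud(frequencies, path="keywords.png"):
    """Render the keyword frequencies as an image file."""
    from wordcloud import WordCloud  # imported lazily; requires `pip install wordcloud`

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(path)


# save_wordcloud(keyword_counts)  # uncomment once wordcloud is installed
```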
That concludes this introduction to analyzing movie box office data with Python. Thank you for reading.