How to analyze the development and sales of games in Jupyter

2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to analyze the development and sales of games in Jupyter. The explanation is simple and clear, and each step can be followed along in a notebook.

1. Import necessary libraries

Before analyzing the relevant data, import the necessary libraries:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# display figures inline in the notebook
%matplotlib inline
# needed to display Chinese text properly
from matplotlib import font_manager

2. Code body

Import data from a csv file

pandas, a Python data-analysis library, provides many simple and convenient methods that are very practical for data processing. The read_csv method reads the data from a CSV file into a DataFrame object, as follows:

# read the csv file
df = pd.read_csv('vgsales.csv')
# display the first five rows of the file
df.head()
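To try read_csv without the data file at hand, the following minimal sketch reads a small in-memory CSV instead. The rows are made-up sample values, not taken from vgsales.csv, and `sample_df` is a hypothetical name; only the column names mirror those used in this article.

```python
import io

import pandas as pd

# Hypothetical sample rows; the column names mirror those used in this article.
csv_text = """Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
Game A,Wii,2006,Sports,Pub X,1.0,0.5,0.2,0.1,1.8
Game B,NES,1985,Platform,Pub X,2.0,0.3,0.6,0.1,3.0
Game C,DS,,Racing,Pub Y,0.4,0.2,0.1,0.0,0.7
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head())
# Year is parsed as a float column because one entry is missing (NaN)
print(sample_df.dtypes)
```

A missing entry forces the Year column to a float type, which is why years appear as values such as 2020.0 later on.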

The results are as follows:

Data cleaning and tidying

Readers with web-scraping experience will know that when a large amount of data is scraped, some values are inevitably missing or wrong. The most important step after importing the data is therefore to check for such problems; analysis on cleaned data gives more accurate results.

Common cleaning methods are to fill in the missing or incorrect values, or to delete every row that contains them. This article takes the second approach, as follows:

# check for missing values; True means the column contains missing data
df.isnull().any()

# delete the rows with missing values
df = df.dropna()
df.info()
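The check-then-drop step can be tried on a tiny frame. This is a sketch on made-up data (`toy` and its values are hypothetical), showing what isnull().any() reports and how many rows dropna() removes.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with two missing values
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Year": [2006.0, np.nan, 2010.0],
    "Publisher": ["Pub X", "Pub Y", None],
})

has_missing = toy.isnull().any()     # one boolean per column
missing_counts = toy.isnull().sum()  # number of missing values per column
cleaned = toy.dropna()               # drops every row with at least one missing value

print(has_missing)
print(missing_counts)
print(len(toy), "rows before,", len(cleaned), "rows after")
```

Because dropna() removes a row as soon as any column is missing, it is worth looking at the per-column counts first to see how much data the step will discard.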

Data analysis

The first example analyzes the game platforms. Because the dataset is fairly large, we only look at platforms with more than 100 games.

First we select the Platform column from the DataFrame, then use the value_counts() method to count the games on each platform, and finally keep only the entries we need.

# count the games on each platform; sort_values() sorts the counts in ascending order
pf = df['Platform'].value_counts().sort_values()
# keep only the platforms with more than 100 games
pf = pf[pf > 100]
pf
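The count-then-filter pattern works the same way on a toy Series. In this sketch the platform list is made up, and the threshold is 1 rather than 100 so that the filter has a visible effect.

```python
import pandas as pd

# Hypothetical platform column
platforms = pd.Series(["Wii", "DS", "DS", "PS2", "PS2", "PS2"], name="Platform")

# count occurrences, then sort the counts in ascending order
counts = platforms.value_counts().sort_values()

# boolean indexing keeps only the entries above the threshold
popular = counts[counts > 1]
print(popular)
```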

# platform names become the y-axis data
pf_name = pf.index.tolist()
# game counts become the x-axis data
pf_number = pf.values.tolist()
# build the canvas
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
# y-axis positions
length = np.arange(len(pf_name))
# draw the horizontal bar chart
ax.barh(length, pf_number, tick_label=pf_name)
# set the title and axis label
ax.set_title("The top 20 of Platform", fontsize=18)
ax.set_xlabel("Number", fontsize=16)
# add the data labels: a is the bar position, b is the game count
for a, b in zip(length, pf_number):
    ax.text(b + 40, a - 0.15, b, ha="center", fontsize=12)
# the output path is garbled in the source; replace it with your own
plt.savefig('Erique JupyterUniplement resultUniverse Gameple sale1.jpg')
plt.show()
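The labeling loop is the least obvious part of the chart, so here it is in isolation. This is a sketch on made-up counts (`names` and `numbers` are hypothetical); it selects the non-interactive Agg backend so that it also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical platform names and game counts
names = ["Wii", "DS", "PS2"]
numbers = [120, 250, 310]

fig, ax = plt.subplots(figsize=(8, 4), dpi=80)
positions = np.arange(len(names))

# one horizontal bar per platform, labeled on the y-axis
bars = ax.barh(positions, numbers, tick_label=names)

# write each count just to the right of the end of its bar
for pos, value in zip(positions, numbers):
    ax.text(value + 5, pos, str(value), va="center", fontsize=12)

ax.set_xlabel("Number")
```

The first two arguments of ax.text are the x and y data coordinates, so the x value comes from the bar length and the y value from the bar position.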

The result is as follows:

So what should we do if we want to know the sales of games in different regions?

# check whether the Year column contains implausible values
df['Year'].value_counts().sort_index()

The following counts per year are obtained:

Looking at the index, the year 2020 stands out: it had not yet arrived, so this entry is a data error and the row needs to be removed. The method is as follows:

# ~ negates the boolean mask, so the rows whose Year is 2020 are dropped
df = df[~df["Year"].isin([2020.0])]
df['Year'].value_counts().sort_index()
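The negated isin() filter can be checked on a few rows. This is a sketch on made-up data; `years_df` is a hypothetical name.

```python
import pandas as pd

# Hypothetical rows, one of them with the implausible year
years_df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Year": [2006.0, 2020.0, 2010.0],
})

mask = years_df["Year"].isin([2020.0])  # True where the year is in the list
kept = years_df[~mask]                  # ~ flips the mask, keeping all other rows
print(kept)
```

isin() takes a list, so several bad values can be removed in one pass.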

Pay attention here: to get the total sales of all games in each region for each year, we use the cumsum method, which computes a cumulative sum. Note the data in the red box: the sales of each game are accumulated within its year.

# cumsum is cumulative: within each Year, the sales are added up row by row
df['sum_sales'] = df['Global_Sales'].groupby(df['Year']).cumsum()
df['NA_sum_sales'] = df['NA_Sales'].groupby(df['Year']).cumsum()
df['EU_sum_sales'] = df['EU_Sales'].groupby(df['Year']).cumsum()
df['JP_sum_sales'] = df['JP_Sales'].groupby(df['Year']).cumsum()
df['Other_sum_sales'] = df['Other_Sales'].groupby(df['Year']).cumsum()
df.head(10)
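A small made-up frame shows what the grouped cumsum produces: a running total that restarts for every year. The names `sales` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sales rows; the years are deliberately interleaved
sales = pd.DataFrame({
    "Year": [2006, 2006, 2007, 2006, 2007],
    "Global_Sales": [1.0, 2.0, 0.5, 3.0, 1.5],
})

# running total within each year, in the original row order
sales["sum_sales"] = sales["Global_Sales"].groupby(sales["Year"]).cumsum()
print(sales)
```

The result has one value per original row, so the last row of each year carries that year's full total.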

The results are as follows:

Think it through: the data we need is the total sales for each year, so we only need to keep the last row of each year. drop_duplicates removes the duplicates, and its keep parameter lets us keep the last row.

# de-duplicate: keep the last row of each year, which holds that year's totals
# (.copy() avoids a SettingWithCopyWarning when the Year column is reassigned below)
sale_df = df.drop_duplicates(subset=['Year'], keep='last').copy()
sale_df.head()
# cast the year to an integer type
sale_df['Year'] = sale_df['Year'].astype(int)
sale_df.head()
# sort by year; ascending switches between ascending and descending order
sale_df = sale_df.sort_values(by="Year", ascending=True)
sale_df.head()
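As a cross-check, the yearly totals obtained from cumsum plus drop_duplicates should match a plain grouped sum. The sketch below compares the two on made-up data; the grouped sum is not the method used in this article, only a way to verify it.

```python
import pandas as pd

# Hypothetical sales rows
sales = pd.DataFrame({
    "Year": [2006, 2006, 2007, 2006, 2007],
    "Global_Sales": [1.0, 2.0, 0.5, 3.0, 1.5],
})

# Route 1 (as in the article): running total, then keep the last row per year
sales["sum_sales"] = sales["Global_Sales"].groupby(sales["Year"]).cumsum()
last_rows = sales.drop_duplicates(subset=["Year"], keep="last").sort_values(by="Year")
totals_a = last_rows.set_index("Year")["sum_sales"]

# Route 2 (cross-check): sum directly within each year
totals_b = sales.groupby("Year")["Global_Sales"].sum()

print(totals_a)
print(totals_b)
```

Route 1 depends on row order, since it trusts the last row of each year to hold the total; the grouped sum does not, which makes it a useful sanity check.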

In order to make it easier to see the changing trend of sales, we draw the following curve:

# font used to display Chinese text (Windows font path)
my_font = font_manager.FontProperties(fname=r"c:\windows\fonts\simsun.ttc")
# prepare the data
y1 = sale_df['sum_sales'].values.tolist()
y2 = sale_df['NA_sum_sales'].values.tolist()
y3 = sale_df['EU_sum_sales'].values.tolist()
y4 = sale_df['JP_sum_sales'].values.tolist()
y5 = sale_df['Other_sum_sales'].values.tolist()
x1 = sale_df['Year'].values.tolist()
# x-axis range
x = range(len(x1))
# create the canvas
plt.figure(figsize=(20, 10), dpi=80)
# draw the line chart; label is the legend label
plt.plot(x, y1, label='global')
plt.plot(x, y2, label='North America')
plt.plot(x, y3, label='European')
plt.plot(x, y4, label='Japan')
plt.plot(x, y5, label='other')
# adjust the x-axis ticks: show every third year
_xtick_labels = ['{} year'.format(i) for i in x1]
plt.xticks(list(x)[::3], _xtick_labels[::3], fontproperties=my_font, fontsize=16)
# x-axis and y-axis labels
plt.xlabel('year', fontproperties=my_font, fontsize=16)
plt.ylabel('sales', fontproperties=my_font, fontsize=16)
plt.title('change curve of sales volume', fontproperties=my_font, fontsize=18)
# grid
plt.grid(alpha=0.5)
plt.legend(prop=my_font, loc='upper left')
# the output path is garbled in the source; replace it with your own
plt.savefig('Ejupytermax')
plt.show()

Get the change curve of sales:

Next, let's look at the world's top 10 game publishers. The method is similar to the first example; see the comments in the code:

# get the top ten publishers; ascending switches between descending and ascending order
pb = df['Publisher'].value_counts().sort_values(ascending=False).head(10)
# set the canvas size (the second dimension is garbled in the source; 8 is assumed)
plt.figure(figsize=(8, 8))
# use the publisher names as the labels of the corresponding data
labels = pb.index
# prepare the data
x = pb.values
# draw a hollow pie chart: an outer pie plus a white inner disc
# (the rest of the x1 list is garbled in the source; a single value draws a full disc)
x1 = [1]
# the pctdistance value is lost in the source, so the default is used here
plt.pie(x, radius=1.0, labels=labels, autopct='%1.1f%%')
plt.pie(x1, radius=0.5, colors=['w'])
plt.title('Top 10 of Publisher', fontsize=16)
# the output path is garbled in the source; replace it with your own
plt.savefig('Elux')
plt.show()
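The numbers behind the pie chart can be inspected without plotting. This sketch uses made-up publisher names and takes the top 2 rather than the top 10; it computes the same percentages that the pie chart prints on its slices.

```python
import pandas as pd

# Hypothetical publisher column
publishers = pd.Series(["Pub X"] * 5 + ["Pub Y"] * 3 + ["Pub Z"] * 2)

# value_counts() already returns the counts in descending order
top = publishers.value_counts().head(2)

# percentage of each publisher among the selected ones only
share = top / top.sum() * 100
print(share.round(1))
```

The percentages are relative to the slices that are plotted, so in the chart they describe each publisher's share among the top ten, not among all publishers in the dataset.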

The world's top 10 game publishers are as follows:

Tastes differ, and games are no exception, so which genres are the most popular? We need a DataFrame that contains only the game genre and the sales figures: first aggregate with the sum() method, then select the columns with loc, as follows:

# add up the sales of each region by game genre
# (numeric_only=True skips text columns such as Name)
Group = df.groupby(['Genre']).sum(numeric_only=True).loc[:, 'NA_Sales':'Other_Sales']
Group
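The aggregation and the label-based column slice can be seen on a small made-up frame. `genre_df` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows with the regional sales columns used in this article
genre_df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Genre": ["Action", "Sports", "Action"],
    "NA_Sales": [1.0, 2.0, 0.5],
    "EU_Sales": [0.5, 1.0, 0.25],
    "JP_Sales": [0.25, 0.5, 0.25],
    "Other_Sales": [0.25, 0.5, 0.25],
    "Global_Sales": [2.0, 4.0, 1.25],
})

# sum the numeric columns per genre, then keep NA_Sales through Other_Sales
by_genre = (
    genre_df.groupby("Genre")
    .sum(numeric_only=True)
    .loc[:, "NA_Sales":"Other_Sales"]
)
print(by_genre)
```

Unlike positional slicing, a label slice with loc includes both endpoints, so Other_Sales is kept and Global_Sales is left out.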

The preference results are as follows:

For a more intuitive view, draw a sales chart for the different genres and see which ones players prefer.

# font used to display Chinese text (Windows font path)
my_font = font_manager.FontProperties(fname=r"c:\windows\fonts\simsun.ttc")
# create the canvas
plt.figure(figsize=(8, 12))
# x-axis labels
xlabel = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']
# y-axis range and labels
y = range(len(Group.index))
ylabel = Group.index
# draw the heat map; cmap sets the color scheme
plt.imshow(Group, interpolation='nearest', cmap=plt.cm.pink, aspect='auto')
# adjust the x-axis and y-axis ticks
plt.xticks(list(range(4))[::1], xlabel[::1], fontproperties=my_font, fontsize=12)
plt.yticks(list(y)[::1], ylabel[::1], fontproperties=my_font, fontsize=12)
# set the x-axis and y-axis labels
plt.xlabel('sales', fontproperties=my_font, fontsize=14)
plt.ylabel('type of game', fontproperties=my_font, fontsize=14)
# shrink changes the length of the color bar
plt.colorbar(shrink=0.8)
plt.title('The Heat Map of Sales', fontproperties=my_font, fontsize=16)
# the output path is garbled in the source; replace it with your own
plt.savefig('Ejupyter')
plt.show()

The result is as follows: the left axis lists the game genres and the color shade indicates sales; the lighter the color, the higher the sales.

Thank you for reading. That covers how to analyze the development and sales of games in Jupyter; working through the steps above should give you a better understanding of the process.
