2025-02-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article introduces the basic configuration of histogram drawing in matplotlib. Many people have questions about it in day-to-day work, so the editor consulted a range of materials and collected simple, easy-to-use methods. Follow along to work through them.
Introduction to histograms
A histogram, also known as a quality distribution chart, is a statistical chart that represents data with a series of vertical bars or line segments of different heights. The horizontal axis generally represents the data values, grouped into intervals, and the vertical axis represents how the data is distributed across those intervals.
A histogram is an accurate graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous (quantitative) variable and was first introduced by Karl Pearson. Visually it is a kind of bar chart, although the bars represent numeric intervals rather than separate categories.
To build a histogram, the first step is to segment the range of values: divide the entire range into a series of intervals (bins), then count how many values fall into each interval. The intervals are usually specified as consecutive, non-overlapping ranges of the variable. They must be adjacent and are usually (but not necessarily) equal in size.
Histograms can also be normalized to show "relative" frequencies. Each bar then shows the proportion of cases that fall into its interval, and the heights of all bars add up to 1.
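The two steps above (splitting the range into bins, then counting) and the normalization can be sketched with NumPy. The sample values and the choice of four bins are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical sample values, used only for illustration.
values = np.array([1.0, 1.5, 2.0, 2.5, 2.7, 3.1, 3.3, 3.9, 4.2, 4.8])

# Step 1: split the range [1, 5] into four adjacent, equal-width bins.
edges = np.linspace(1.0, 5.0, 5)   # bin edges: 1, 2, 3, 4, 5

# Step 2: count how many values fall into each bin.
counts, _ = np.histogram(values, bins=edges)

# Normalize the counts so that the bar heights add up to 1.
relative = counts / counts.sum()

print(counts)     # [2 3 3 2]
print(relative)   # [0.2 0.3 0.3 0.2]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why the value 2.0 is counted in the second bin.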
Parameters for drawing a histogram (plt.hist())
Generally speaking, there are many ways to draw a histogram: with matplotlib, with the plotting functions built into pandas, or with other statistical plotting modules in Python. To get a good-looking chart you have to configure it yourself. Templates are useful, but if you copy and borrow them without understanding the principles, the result is usually poor.
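As a minimal sketch of the most commonly configured plt.hist() parameters, the example below uses randomly generated data (an assumption; it is not the order data used in this article) and the non-interactive Agg backend so that it runs without a display. Remove the matplotlib.use line and call plt.show() when working interactively.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption made so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 1000 normally distributed values with a fixed seed.
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots(figsize=(8, 4))
counts, edges, patches = ax.hist(
    data,
    bins=20,            # number of equal-width bins
    density=False,      # False: raw counts; True: probability density
    facecolor="cyan",   # fill color of the bars
    edgecolor="black",  # outline color of the bars
    alpha=0.7,          # transparency
)
ax.set_xlabel("interval")
ax.set_ylabel("frequency")

# hist returns the per-bin counts, the bin edges, and the drawn bars.
print(len(counts), len(edges), len(patches))   # 20 21 20
```

The number of edges is always one more than the number of bins, which is why a bar's left edge can be looked up as edges[i] when annotating bar i.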
Case
# -*- coding: utf-8 -*-
import matplotlib as mpl
import matplotlib.pyplot as plt
import pymysql

mpl.rcParams['font.sans-serif'] = ['SimHei']   # display Chinese characters
plt.rcParams['axes.unicode_minus'] = False     # display the minus sign correctly

# connect to the MySQL database
v1 = []
v2 = []
db = pymysql.connect(host='127.0.0.1', port=3306, database='mydb', user='root', password='root')
cursor = db.cursor()
# read the orders table and total the daily profit (in units of 10,000)
sql_str = "SELECT order_date, ROUND(SUM(profit)/10000, 2) FROM orders WHERE FY=2019 GROUP BY order_date"
cursor.execute(sql_str)
result = cursor.fetchall()
for res in result:
    v1.append(res[0])          # order_date
    v2.append(float(res[1]))   # daily profit; converted to float in case the driver returns Decimal
db.close()

plt.figure(figsize=(10, 5))   # set the figure size
# plt.hist returns an array of counts, an array of bin edges, and the drawn bars
cs, bs, bars = plt.hist(v2, bins=20, density=False, facecolor="cyan", edgecolor="black", alpha=0.7)
width = bs[1] - bs[0]
for i, c in enumerate(cs):
    plt.text(bs[i] + width/3, c, round(c))   # write each count above its bar
# horizontal axis label
plt.xlabel("interval", fontdict={'family': 'Fangsong', 'fontsize': 15})
# vertical axis label
plt.ylabel("frequency", fontdict={'family': 'Fangsong', 'fontsize': 15})
# chart title
plt.title("profit distribution histogram", fontdict={'family': 'Fangsong', 'fontsize': 20})
plt.show()

Drawing with the DataFrame plot function (universal template)
Generally speaking, the data we import for visualization is most likely tabular; we rarely plot standalone values. For that kind of standalone data, many people use the plotting software Origin. The biggest advantage of plotting in code is that intermediate results do not have to be exported and imported again, which saves time and improves efficiency.
# draw with DataFrame's plot function
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.rcParams['font.sans-serif'] = ['SimHei']   # display Chinese characters
plt.rcParams['font.sans-serif'] = 'KaiTi'      # set the global font to KaiTi (a Chinese regular-script font)
plt.rcParams['axes.unicode_minus'] = False     # display the minus sign correctly

datafile = r'../data/orders.csv'
data = pd.read_csv(datafile).query("FY==2019").groupby('ORDER_DATE')[['PROFIT']].sum()

# create the figure once and pass its axes to DataFrame.plot,
# which would otherwise open a second figure of its own
fig, ax = plt.subplots(figsize=(15, 5), dpi=130)
data.plot(kind='hist', bins=20, color='y', alpha=0.5, edgecolor='c', histtype='bar', ax=ax)
plt.xlabel("interval", fontdict={'family': 'Fangsong', 'fontsize': 15})
plt.ylabel("frequency", fontdict={'family': 'Fangsong', 'fontsize': 15})
plt.title("profit distribution histogram", fontdict={'family': 'Fangsong', 'fontsize': 20})
# set the various titles on the figure
plt.suptitle('histogram case', size=22, y=1.05)
plt.title("drawing date: 2022 nickname: Wang Xiaowang-123", loc='right', size=12, y=1.03)
plt.title("homepage: https://blog.csdn.net/weixin_47723732", loc='left', size=12, y=1.03)
plt.show()
Drawing multiple subplots (multi-subplot histogram template)
plt.tight_layout()  # automatic compact layout so that elements do not overlap
This is a very important call. It is usually added at the end, just before plt.show().
import pandas as pd
import matplotlib.pyplot as plt

datafile = r'../data/orders.csv'
data = pd.read_csv(datafile).query("FY==2019").groupby('ORDER_DATE')[['PROFIT']].sum()
fig = plt.figure(dpi=130)   # create the canvas
# subplot 1: the first cell of a grid with 1 row and 2 columns
ax1 = plt.subplot(121)
plt.title("CSDN blogger", loc='left', size=12)   # add a note
# subplot 2: the second cell of a grid with 1 row and 2 columns
ax2 = plt.subplot(122)
# set the various titles on the figure
plt.title("Wang Xiaowang-123", loc='right', size=12, y=1.03)   # add a note
# DataFrame.plot is a figure-level plotting function and creates a new figure by default;
# the ax parameter specifies which axes to draw on
data.plot(kind='hist', bins=20, color='c', alpha=0.5, edgecolor='c', histtype='bar', ax=ax1)
# plt.xlabel("interval", fontdict={'family': 'Fangsong', 'fontsize': 15})
ax1.set_xlabel("interval", fontdict={'family': 'Fangsong', 'fontsize': 15})   # label on ax1
# plt.ylabel("frequency", fontdict={'family': 'Fangsong', 'fontsize': 15})
ax1.set_ylabel("frequency", fontdict={'family': 'Fangsong', 'fontsize': 15})
ax1.set_title("cyan")
# print(ax1.get_xticks())
data.plot(kind='hist', bins=20, color='y', alpha=0.5, edgecolor='y', histtype='bar', ax=ax2)   # draw this one on ax2
# plt.xlabel is equivalent to plt.gca().set_xlabel(): plt.* functions act on the "current" axes,
# so pay attention to where they are executed
plt.xlabel("interval", fontdict={'family': 'Fangsong', 'fontsize': 15})
plt.ylabel("frequency", fontdict={'family': 'Fangsong', 'fontsize': 15})
plt.title("yellow")   # subplot title
plt.suptitle("profit distribution histogram", family='Fangsong', size=22)   # figure title
plt.tight_layout()   # automatic compact layout so that elements do not overlap
plt.show()
Probability distribution histogram (statistical graph)
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

# probability distribution histogram of a Gaussian distribution
mean = 0    # mean 0
sigma = 1   # standard deviation 1; reflects how concentrated or spread out the data is
x = mean + sigma * np.random.randn(10000)

fig, (ax0, ax1) = plt.subplots(nrows=2, figsize=(9, 6))
# the second argument is the number of bins: the larger it is, the narrower and denser the bars
ax0.hist(x, 40, density=True, histtype='bar', facecolor='yellowgreen', alpha=0.75)
# pdf: probability density, showing how the 10,000 values are spread over the intervals
ax0.set_title('pdf')
ax1.hist(x, 20, density=True, histtype='bar', facecolor='pink', alpha=0.75, cumulative=True, rwidth=0.8)
# cdf: cumulative=True accumulates the values, e.g. when the probability of values below 5 is needed
ax1.set_title("cdf")
fig.subplots_adjust(hspace=0.4)
plt.show()
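To make the meaning of density=True and cumulative=True concrete, the following sketch checks two properties numerically: the bars of a density histogram enclose a total area of 1, and the last bar of a cumulative density histogram reaches 1. The seed and sample are assumptions for illustration, and the Agg backend is used only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption made so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample: 10,000 standard normal values with a fixed seed.
rng = np.random.default_rng(1)
x = rng.standard_normal(10000)

fig, (ax0, ax1) = plt.subplots(nrows=2, figsize=(9, 6))
pdf_heights, pdf_edges, _ = ax0.hist(x, 40, density=True)
cdf_heights, cdf_edges, _ = ax1.hist(x, 20, density=True, cumulative=True)

# Area under the density histogram: sum of height * bin width.
area = np.sum(pdf_heights * np.diff(pdf_edges))

print(round(float(area), 6))              # 1.0
print(round(float(cdf_heights[-1]), 6))   # 1.0
```

The bar heights of a density histogram are not probabilities themselves; only height multiplied by bin width is. With narrow bins, individual heights can exceed 1.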
Displaying a distribution curve on the histogram
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy.stats import norm

mpl.rcParams['font.sans-serif'] = ['SimHei']   # display Chinese characters
plt.rcParams['font.sans-serif'] = 'KaiTi'      # set the global font to KaiTi (a Chinese regular-script font)
plt.rcParams['axes.unicode_minus'] = False     # display the minus sign correctly

np.random.seed(10680801)
mu = 100
sigma = 15
x = mu + sigma * np.random.randn(500)
num_bins = 60
fig, ax = plt.subplots(figsize=(17, 8), dpi=120)
# fig, ax = plt.subplots(ncols=2)
# ax1 = ax[0]
# ax2 = ax[1]
n, bins, patches = ax.hist(x, num_bins, density=True)
y = norm.pdf(bins, mu, sigma)   # normal density evaluated at the bin edges
ax.plot(bins, y, '-')
ax.set_xlabel('IQ')
ax.set_ylabel('probability density')
ax.set_title(r'IQ distribution histogram')
fig.tight_layout()
plt.show()
Stacked area histogram
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

crime = pd.read_csv(r"http://datasets.flowingdata.com/crimeRatesByState2005.csv")
fig, ax = plt.subplots()
# each call draws its own histogram on the same axes, so the two are overlaid with transparency
ax.hist(crime["robbery"], bins=12, histtype="bar", alpha=0.6, label="robbery", stacked=True)
ax.hist(crime["aggravated_assault"], bins=12, histtype="bar", alpha=0.6, label="aggravated_assault", stacked=True)
ax.legend()
# the tick steps below (60 and 4) are assumed values chosen for readability
ax.set_xticks(np.arange(0, 721, 60))
ax.set_xlim(0, 720)
ax.set_yticks(np.arange(0, 21, 4))
plt.show()
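Note that stacked=True only takes effect when several datasets are passed to a single hist call; two separate calls overlay their bars instead. A minimal sketch of single-call stacking follows, using two randomly generated series (hypothetical data and labels, not the crime dataset) and the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption made so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Two hypothetical series of 50 values each.
rng = np.random.default_rng(2)
series_a = rng.normal(150, 40, 50)
series_b = rng.normal(300, 80, 50)

fig, ax = plt.subplots()
# Passing both series in one call lets matplotlib share the bins and stack the bars.
heights, edges, _ = ax.hist(
    [series_a, series_b],
    bins=12,
    histtype="bar",
    alpha=0.6,
    label=["series_a", "series_b"],
    stacked=True,
)
ax.legend()

heights = np.asarray(heights)
# With stacking, row 0 holds the counts of the first series and
# row 1 holds the running total of both series in each bin.
print(heights.shape)            # (2, 12)
print(int(heights[-1].sum()))   # 100
```

Because the last row is a running total, its sum equals the combined number of values in both series.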
Drawing the numerical distribution of each type of crime data in separate subplots
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

crime = pd.read_csv(r"http://datasets.flowingdata.com/crimeRatesByState2005.csv")
# drop the national total and the District of Columbia
crime = crime.query("state!='United States'").query("state!='District of Columbia'")
plt.figure(figsize=(10, 5), dpi=120)
nrows = 2
ncols = 4
n = np.arange(nrows*ncols) + 1   # subplot indices 1..8, also used as column positions
for i in n:
    ax = plt.subplot(nrows, ncols, i)
    ax.hist(crime.iloc[:, i])          # column 0 is the state name, so data columns start at 1
    ax.set_title(crime.columns[i])
plt.suptitle("numerical distribution of various types of crime data", y=1.02)
plt.tight_layout()
plt.show()
Other cases
Frequency histogram of passenger age
# import third-party libraries
import pandas as pd
import matplotlib.pyplot as plt

# set a font that can display Chinese
plt.rcParams['font.sans-serif'] = ['SimHei']
# create the figure
plt.figure(figsize=(20, 8), dpi=80)
# prepare the data (read the Titanic dataset)
titanic = pd.read_csv(r'E:\PythonData\exercise_data\train.csv')
# check whether any ages are missing
print(any(titanic.Age.isnull()))
# remove observations with a missing age
titanic.dropna(subset=['Age'], inplace=True)
# draw the frequency histogram of passenger age
plt.hist(titanic.Age,             # data to plot
         bins=20,                 # number of bars in the histogram
         color='steelblue',       # fill color
         edgecolor='k',           # bar outline color
         label='histogram')       # legend label
# tick settings
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
# add descriptive labels
plt.xlabel('age: years', fontsize=20)
plt.ylabel('number of passengers', fontsize=20)
plt.title('passenger age distribution', fontsize=20)
# display the figure
plt.show()
Histogram of male and female passengers (two-dimensional data)
Setting the bin width and other parameters
# import libraries
import matplotlib.pyplot as plt
import numpy as np

# set the font
plt.rcParams['font.sans-serif'] = ['SimHei']
# create the figure
plt.figure(figsize=(20, 8), dpi=80)
# extract the ages by gender (titanic is the DataFrame loaded in the previous example)
age_female = titanic.Age[titanic.Sex == 'female']
age_male = titanic.Age[titanic.Sex == 'male']
# set the bin width of the histogram to 2
bins = np.arange(titanic.Age.min(), titanic.Age.max(), 2)
# age histogram of male passengers
plt.hist(age_male, bins=bins, label='male', edgecolor='k', color='steelblue', alpha=0.7)
# age histogram of female passengers
plt.hist(age_female, bins=bins, label='female', edgecolor='k', alpha=0.6, color='r')
# adjust the ticks
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
# set the title and axis labels
plt.title('age histogram of male and female passengers', fontsize=20)
plt.xlabel('age', fontsize=20)
plt.ylabel('number', fontsize=20)
# hide the ticks on the top and right edges
plt.tick_params(top=False, right=False)
# show the legend
plt.legend(loc='best', fontsize=20)
# display the figure
plt.show()
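One detail of building bin edges with np.arange is worth checking: the stop value is excluded, so the largest observation can fall outside the last bin and be silently dropped from the histogram. The sketch below uses a small hypothetical age array (not the Titanic data) to compare edges that stop at the maximum with edges extended by one bin width.

```python
import numpy as np

# Hypothetical ages, used only for illustration.
ages = np.array([1.0, 4.0, 22.0, 35.0, 58.0, 80.0])
width = 2

# np.arange excludes the stop value: the last edge here is 79, so 80 is not covered.
edges_short = np.arange(ages.min(), ages.max(), width)
# Extending the stop by one bin width makes the last bin cover the maximum.
edges_full = np.arange(ages.min(), ages.max() + width, width)

counts_short, _ = np.histogram(ages, bins=edges_short)
counts_full, _ = np.histogram(ages, bins=edges_full)

print(counts_short.sum(), counts_full.sum())   # 5 6
```

Whether the missing top bin matters depends on the data; comparing the sum of the counts with the number of observations is a quick way to find out.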
Movie duration distribution histogram
# import libraries
import matplotlib.pyplot as plt

# set the font
plt.rcParams['font.sans-serif'] = ['SimHei']
# create the figure
plt.figure(figsize=(20, 8), dpi=80)
# prepare the data
time = [131,98,125,131,124,139,131,117,128,108,135,138,131,102,107,114,119,128,121,142,127,130,124,101,110,116,117,110,128,128,115,99,136,126,
        134,95,138,117,111,78,132,124,113,150,110,117,86,95,144,105,126,130,126,130,126,116,123,106,112,138,123,86,101,99,136,123,117,119,105,
        137,123,128,125,104,109,134,125,127,105,120,107,129,116,108,132,103,136,118,102,120,114,105,115,132,145,119,121,112,139,125,138,109,
        132,134,156,106,117,127,144,139,139,119,140,83,110,102,123,107,143,115,136,118,139,123,112,118,125,109,119,133,112,114,122,109,106,
        123,116,131,127,115,118,112,135,115,146,137,116,103,144,83,123,111,110,111,
        100,154,136,100,118,119,133,134,106,129,126,110,111,109,
        141,120,117,106,149,122,122,110,118,127,121,114,125,126,114,140,103,130,141,117,106,114,121,114,133,137,92,121,112,146,97,137,105,98,
        117,112,81,97,139,113,134,106,144,110,137,137,111,104,117,100,111,101,110,105,129,137,112,120,113,133,112,83,94,146,
        133,101,131,116,111,84,137,115,122,106,144,109,123,116,111,111,133,150]
# set the bin width and derive the number of bins from it
bins = 2
groups = int((max(time) - min(time)) / bins)
# draw the histogram; edgecolor sets the bar outline color
# note: with density=True the vertical axis shows probability density, not a raw count
plt.hist(time, groups, color='b', edgecolor='k', density=True)
# adjust the ticks
plt.xticks(list(range(min(time), max(time)))[::2], fontsize=15)
plt.yticks(fontsize=15)
# add descriptive labels
plt.xlabel('movie duration: minutes', fontsize=20)
plt.ylabel('number of movies', fontsize=20)
# add a grid
plt.grid(True, linestyle='--', alpha=1)
# add a title
plt.title('movie duration distribution histogram', fontsize=20)
plt.show()
That concludes this look at the basic configuration of matplotlib histogram drawing. Combining theory with practice is the best way to learn, so try the examples yourself.
© 2024 shulou.com SLNews company. All rights reserved.