2025-01-14 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com)
This article presents Python code for commonly used Matplotlib charts. The examples are detailed and meant as a reference, so interested readers are encouraged to work through them.

# !pip install brewer2mpl
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings; warnings.filterwarnings(action='once')

large = 22; med = 16; small = 12
params = {'legend.fontsize': med,
          'figure.figsize': (16, 10),
          'axes.labelsize': med,
          'axes.titlesize': med,
          'xtick.labelsize': med,
          'ytick.labelsize': med,
          'figure.titlesize': large}
plt.rcParams.update(params)
# This style is named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
plt.style.use('seaborn-whitegrid')
sns.set_style("white")
%matplotlib inline

# Version
print(mpl.__version__)  #> 3.0.0
print(sns.__version__)  #> 0.9.0
1. Scatter plot
A scatter plot is a classic, fundamental chart for studying the relationship between two variables. If the data contains multiple groups, you may want to draw each group in a different color. In Matplotlib, this is easily done with plt.scatter().

# Import dataset
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# Prepare Data
# Create as many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

# Draw Plot for Each Category
plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')

for i, category in enumerate(categories):
    plt.scatter('area', 'poptotal',
                data=midwest.loc[midwest.category==category, :],
                s=20, c=colors[i], label=str(category))

# Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
plt.legend(fontsize=12)
plt.show()
2. Bubble plot with encircling
Sometimes you want to show a group of points inside a boundary to emphasize their importance. In this example, you select the records to be encircled from the dataframe and pass them to the encircle() function defined in the code below.

from matplotlib import patches
from scipy.spatial import ConvexHull
import warnings; warnings.simplefilter('ignore')
sns.set_style("white")

# Step 1: Prepare Data
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# As many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

# Step 2: Draw Scatterplot with unique color for each category
fig = plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')

for i, category in enumerate(categories):
    plt.scatter('area', 'poptotal',
                data=midwest.loc[midwest.category==category, :],
                c=colors[i], label=str(category),
                edgecolors='black', linewidths=.5)

# Step 3: Encircling
# https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
def encircle(x, y, ax=None, **kw):
    if not ax: ax = plt.gca()
    p = np.c_[x, y]                                # stack x and y into an (n, 2) array
    hull = ConvexHull(p)                           # outer boundary of the points
    poly = plt.Polygon(p[hull.vertices, :], **kw)
    ax.add_patch(poly)

# Select data to be encircled
midwest_encircle_data = midwest.loc[midwest.state=='IN', :]

# Draw polygon surrounding vertices
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)

# Step 4: Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.title("Bubble Plot with Encircling", fontsize=22)
plt.legend(fontsize=12)
plt.show()
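To make clearer what encircle() draws: a convex hull is the smallest convex polygon that contains all the selected points. The sketch below illustrates the idea in plain Python with Andrew's monotone chain algorithm. It is only an illustration, not what SciPy runs internally, and the function name convex_hull and the sample points are invented for this example.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise (left) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    # Build the lower chain from left to right
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper chain from right to left
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain
    return lower[:-1] + upper[:-1]


# Invented sample: four corners of a square plus two interior points
sample = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5)]
hull = convex_hull(sample)
print(hull)
```

Only the four corner points survive; the interior points are dropped, which is why the polygon drawn by encircle() hugs the outside of the selected group.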
3. Scatter plot with linear regression line of best fit
If you want to understand how two variables change with respect to each other, the line of best fit is the way to go. The plot below shows how the line of best fit differs among the groups in the data. To disable the grouping and draw a single line of best fit for the entire dataset, remove the hue="cyl" parameter from the sns.lmplot() call below.

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4, 8]), :]

# Plot
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select,
                     height=7, aspect=1.6, robust=True, palette='tab10',
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
plt.show()
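As a rough sketch of what a "line of best fit" is, the ordinary least-squares line can be computed by hand. Note that this is plain least squares, whereas the lmplot call in this section passes robust=True, which down-weights outliers, so the two will not agree exactly on real data. The function name fit_line and the sample numbers are invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (the common 1/n factor cancels out)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented sample lying exactly on y = 11 - 2x
displ = [1.0, 2.0, 3.0, 4.0, 5.0]
hwy = [9.0, 7.0, 5.0, 3.0, 1.0]

slope, intercept = fit_line(displ, hwy)
print(slope, intercept)
```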
Each regression line in its own column
Alternatively, you can show the line of best fit for each group in its own column. You do this by setting the col parameter (here col="cyl") in sns.lmplot().

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4, 8]), :]

# Each line in its own column
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy", data=df_select,
                     height=7, robust=True, palette='Set1', col="cyl",
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.show()
4. Jittered strip plot
Often, multiple data points have exactly the same X and Y values. As a result, the points are plotted on top of one another and hide each other. To avoid this, jitter the points slightly so that you can see them all. This is convenient to do with seaborn's stripplot().

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Draw Stripplot
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
sns.stripplot(x=df.cty, y=df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)

# Decorations
plt.title('Use jittered plots to avoid overlapping of points', fontsize=22)
plt.show()
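Conceptually, jittering just adds a small random offset to each position so that coinciding points separate. The sketch below shows that idea with the standard library only. It is an assumption-level illustration and not seaborn's actual implementation; the helper name jitter and the sample values are invented.

```python
import random


def jitter(values, width=0.25, seed=0):
    """Shift each value by uniform random noise in [-width, +width]."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    return [v + rng.uniform(-width, width) for v in values]


# Invented city-mileage values with repeats that would overlap when plotted
cty = [18, 18, 18, 21, 21]
jittered = jitter(cty)
print(jittered)
```

The width plays the same role as the jitter=0.25 argument above: larger values spread the points further apart, at the cost of blurring where they really sit.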
5. Counts plot
Another option to avoid overlapping points is to increase the size of each dot according to how many points lie at that spot. The larger the dot, the greater the concentration of points around it.

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')

# Draw Stripplot
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
sns.stripplot(x=df_counts.cty, y=df_counts.hwy, size=df_counts.counts*2, ax=ax)

# Decorations
plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22)
plt.show()
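The groupby(...).size() step is what turns overlapping points into marker sizes. A minimal stand-alone sketch of the same counting, using collections.Counter on invented (cty, hwy) pairs, might look like this:

```python
from collections import Counter

# Invented (cty, hwy) pairs; several of them coincide
pairs = [(18, 29), (18, 29), (21, 29), (18, 29), (21, 29), (16, 25)]

# How many observations fall on each distinct point
counts = Counter(pairs)

# Marker size grows with the number of coinciding points (same *2 scaling as above)
sizes = {pair: n * 2 for pair, n in counts.items()}
print(sizes)
```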
6. Marginal histogram
A marginal histogram has histograms of the X and Y variables along the axes. It is used to visualize the relationship between X and Y together with the univariate distribution of each variable on its own. This plot is often used in exploratory data analysis (EDA).

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Create Fig and gridspec
fig = plt.figure(figsize=(16, 10), dpi=80)
grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

# Define the axes
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

# Scatterplot on main ax
ax_main.scatter('displ', 'hwy', s=df.cty*4, c=df.manufacturer.astype('category').cat.codes,
                alpha=.9, data=df, cmap="tab10", edgecolors='gray', linewidths=.5)

# Histogram on the bottom
ax_bottom.hist(df.displ, 40, histtype='stepfilled', orientation='vertical', color='deeppink')
ax_bottom.invert_yaxis()

# Histogram on the right
ax_right.hist(df.hwy, 40, histtype='stepfilled', orientation='horizontal', color='deeppink')

# Decorations
ax_main.set(title='Scatterplot with Histograms displ vs hwy', xlabel='displ', ylabel='hwy')
ax_main.title.set_fontsize(20)
for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
    item.set_fontsize(14)

xlabels = ax_main.get_xticks().tolist()
ax_main.set_xticklabels(xlabels)
plt.show()
7. Marginal boxplot
The marginal boxplot serves a similar purpose to the marginal histogram. However, the boxplot helps to pinpoint the median and the 25th and 75th percentiles of X and Y.

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Create Fig and gridspec
fig = plt.figure(figsize=(16, 10), dpi=80)
grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

# Define the axes
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

# Scatterplot on main ax
ax_main.scatter('displ', 'hwy', s=df.cty*5, c=df.manufacturer.astype('category').cat.codes,
                alpha=.9, data=df, cmap="Set1", edgecolors='black', linewidths=.5)

# Add a boxplot in each marginal axes
sns.boxplot(y=df.hwy, ax=ax_right, orient="v")
sns.boxplot(x=df.displ, ax=ax_bottom, orient="h")

# Decorations
# Remove axis names for the boxplots
ax_bottom.set(xlabel='')
ax_right.set(ylabel='')

# Main Title, Xlabel and YLabel
ax_main.set(title='Scatterplot with Histograms displ vs hwy', xlabel='displ', ylabel='hwy')

# Set fontsize of different components
ax_main.title.set_fontsize(20)
for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
    item.set_fontsize(14)

plt.show()
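For reference, the three numbers a box marks (25th percentile, median, 75th percentile) can be computed directly with the standard library. This is a small sketch on invented values; the choice of method="inclusive" (linear interpolation between data points) is an assumption made for the example, and other quantile conventions give slightly different values on small samples.

```python
import statistics

# Invented highway-mileage values
hwy = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into quartiles and returns the three cut points
q1, median, q3 = statistics.quantiles(hwy, n=4, method="inclusive")
print(q1, median, q3)
```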
8. Correlogram
A correlogram is used to visually inspect the correlation metric between all possible pairs of numeric variables in a given dataframe (or 2D array).

# Import Dataset
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")

# Correlation matrix of the numeric columns only
corr = df.select_dtypes('number').corr()

# Plot
plt.figure(figsize=(12, 10), dpi=80)
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='RdYlGn', center=0, annot=True)

# Decorations
plt.title('Correlogram of mtcars', fontsize=22)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()
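Each cell of the heatmap is a Pearson correlation coefficient between two columns. As a hedged illustration of what that number is, here is the formula written out in plain Python on invented columns (the function name pearson and the data are made up for this sketch):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented columns: b rises with a, c falls as a rises
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [8.0, 6.0, 4.0, 2.0],
}

# Full pairwise matrix, analogous to what the heatmap displays
matrix = {
    (row, col): pearson(columns[row], columns[col])
    for row in columns
    for col in columns
}
print(matrix[("a", "b")], matrix[("a", "c")])
```

Values near +1 (green in the RdYlGn colormap) mean the two variables rise together, values near -1 (red) mean one falls as the other rises.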
9. Pairwise plot
The pairwise plot is a favorite in exploratory analysis for understanding the relationship between all possible pairs of numeric variables. It is a must-have tool for bivariate analysis.

# Load Dataset
df = sns.load_dataset('iris')

# Plot
plt.figure(figsize=(10, 8), dpi=80)
sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
plt.show()

# Load Dataset
df = sns.load_dataset('iris')

# Plot
plt.figure(figsize=(10, 8), dpi=80)
sns.pairplot(df, kind="reg", hue="species")
plt.show()
Deviation
10. Diverging bars
If you want to see how items vary based on a single metric and visualize the order and amount of this variance, diverging bars are a great tool. They help you quickly differentiate the performance of groups in your data, are quite intuitive, and convey the point instantly.
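The charts in this group all rest on the same preparation step: the metric is standardized to a z-score, so that values below the mean become negative and values above it positive, and the sign then picks the color. A minimal stand-alone sketch of that step, with invented mileage values, could be:

```python
import statistics

# Invented mileage values
mpg = [10.0, 20.0, 30.0]

mean = statistics.mean(mpg)
std = statistics.stdev(mpg)  # sample standard deviation, like pandas' default .std()

# z-score: distance from the mean in units of standard deviations
mpg_z = [(value - mean) / std for value in mpg]

# Below-average items are drawn red, the rest green
colors = ['red' if z < 0 else 'green' for z in mpg_z]
print(mpg_z, colors)
```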
# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df.loc[:, ['mpg']]
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 10), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5)

# Decorations
plt.gca().set(ylabel='$Model$', xlabel='$Mileage$')
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.show()

11. Diverging texts
Diverging texts are similar to diverging bars, and are preferred if you want to show the value of each item within the chart in a neat and presentable way.

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df.loc[:, ['mpg']]
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 14), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z)
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
    t = plt.text(x, y, round(tex, 2),
                 horizontalalignment='right' if x < 0 else 'left',
                 verticalalignment='center',
                 fontdict={'color':'red' if x < 0 else 'green', 'size':14})

# Decorations
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Text Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()

12. Diverging dot plot
The diverging dot plot is also similar to the diverging bars. However, compared with diverging bars, the absence of bars reduces the amount of contrast and disparity between the groups.

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df.loc[:, ['mpg']]
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = ['red' if x < 0 else 'darkgreen' for x in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 16), dpi=80)
plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors)
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
    t = plt.text(x, y, round(tex, 1), horizontalalignment='center',
                 verticalalignment='center', fontdict={'color':'white'})

# Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(.3)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.3)
plt.gca().spines["left"].set_alpha(.3)

plt.yticks(df.index, df.cars)
plt.title('Diverging Dotplot of Car Mileage', fontdict={'size':20})
plt.xlabel('$Mileage$')
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()

13. Diverging lollipop chart with markers
A lollipop chart with markers provides a flexible way of visualizing divergence: it lets you emphasize any significant data points you want to draw attention to and give the reasoning for them within the chart.

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df.loc[:, ['mpg']]
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = 'black'

# Color Fiat differently
df.loc[df.cars == 'Fiat X1-9', 'colors'] = 'darkorange'
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
import matplotlib.patches as patches

plt.figure(figsize=(14, 16), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=1)
plt.scatter(df.mpg_z, df.index, color=df.colors, s=[600 if x == 'Fiat X1-9' else 300 for x in df.cars], alpha=0.6)
plt.yticks(df.index, df.cars)
plt.xticks(fontsize=12)

# Annotate
plt.annotate('Mercedes Models', xy=(0.0, 11.0), xytext=(1.0, 11), xycoords='data',
             fontsize=15, ha='center', va='center',
             bbox=dict(boxstyle='square', fc='firebrick'),
             arrowprops=dict(arrowstyle='-[, widthB=2.0, lengthB=1.5', lw=2.0, color='steelblue'), color='white')

# Add Patches
p1 = patches.Rectangle((-2.0, -1), width=.3, height=3, alpha=.2, facecolor='red')
p2 = patches.Rectangle((1.5, 27), width=.8, height=5, alpha=.2, facecolor='green')
plt.gca().add_patch(p1)
plt.gca().add_patch(p2)

# Decorate
plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.show()

14. Area chart
By coloring the area between the axis and the line, the area chart emphasizes not only the peaks and troughs but also the duration of the highs and lows. The longer the highs last, the larger the area under the line.

import numpy as np
import pandas as pd

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100)
x = np.arange(df.shape[0])
y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100

# Plot
plt.figure(figsize=(16, 10), dpi=80)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] [...]

[...] 0 else 'green', marker='o', markersize=6)
    ax.add_line(l)
    return l

fig, ax = plt.subplots(1, 1, dpi=80)

# Vertical Lines
ax.vlines(x=1, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
ax.vlines(x=3, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')

# Points
ax.scatter(y=df['1952'], x=np.repeat(1, df.shape[0]), s=10, color='black', alpha=0.7)
ax.scatter(y=df['1957'], x=np.repeat(3, df.shape[0]), s=10, color='black', alpha=0.7)

# Line Segments and Annotation
for p1, p2, c in zip(df['1952'], df['1957'], df['continent']):
    newline([1, p1], [3, p2])
    ax.text(1-0.05, p1, c + ', ' + str(round(p1)), horizontalalignment='right', verticalalignment='center', fontdict={'size':14})
    ax.text(3+0.05, p2, c + ', ' + str(round(p2)), horizontalalignment='left', verticalalignment='center', fontdict={'size':14})

# 'Before' and 'After' Annotations
ax.text(1-0.05, 13000, 'BEFORE', horizontalalignment='right', verticalalignment='center', fontdict={'size':18, 'weight':700})
ax.text(3+0.05, 13000, 'AFTER', horizontalalignment='left', verticalalignment='center', fontdict={'size':18, 'weight':700})

# Decoration
ax.set_title("Slopechart: Comparing GDP Per Capita between 1952 vs 1957", fontdict={'size':22})
ax.set(xlim=(0, 4), ylim=(0, 14000), ylabel='Mean GDP Per Capita')
ax.set_xticks([1, 3])
ax.set_xticklabels(["1952", "1957"])
plt.yticks(np.arange(500, 13000, 2000), fontsize=12)

# Lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.0)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.0)
plt.show()
19. Dumbbell plot
The dumbbell plot conveys the "before" and "after" positions of various items, along with the rank ordering of the items. It is very useful if you want to visualize the effect of a particular project or initiative on different objects.

import matplotlib.lines as mlines

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv")
df.sort_values('pct_2014', inplace=True)
df.reset_index(inplace=True)

# Func to draw line segment
def newline(p1, p2, color='skyblue'):
    ax = plt.gca()
    l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]], color=color)
    ax.add_line(l)
    return l

# Figure and Axes
fig, ax = plt.subplots(1, 1, figsize=(14, 14), facecolor='#f7f7f7', dpi=80)

# Vertical Lines
ax.vlines(x=.05, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.10, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.15, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.20, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')

# Points
ax.scatter(y=df['index'], x=df['pct_2013'], s=50, color='#0e668b', alpha=0.7)
ax.scatter(y=df['index'], x=df['pct_2014'], s=50, color='#a3c4dc', alpha=0.7)

# Line Segments
for i, p1, p2 in zip(df['index'], df['pct_2013'], df['pct_2014']):
    newline([p1, i], [p2, i])

# Decoration
ax.set_facecolor('#f7f7f7')
ax.set_title("Dumbbell Chart: Pct Change - 2013 vs 2014", fontdict={'size':22})
ax.set(xlim=(0, .25), ylim=(-1, 27), ylabel='Mean GDP Per Capita')
ax.set_xticks([.05, .1, .15, .20])
# Labels match the tick positions set above
ax.set_xticklabels(['5%', '10%', '15%', '20%'])
plt.show()
20. Histogram for a continuous variable
A histogram shows the frequency distribution of a given variable. The representation below groups the frequency bars by a categorical variable, giving better insight into the continuous variable and the categorical variable together.

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Prepare data
x_var = 'displ'
groupby_var = 'class'
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [group[x_var].values.tolist() for _, group in df_agg]

# Draw
plt.figure(figsize=(16, 9), dpi=80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, 30, stacked=True, density=False, color=colors[:len(vals)])

# Decoration
plt.legend({group: col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])})
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 25)
plt.xticks(ticks=bins[::3], labels=[round(b, 1) for b in bins[::3]])
plt.show()
21. Histogram for a categorical variable
The histogram of a categorical variable shows the frequency distribution of that variable. By coloring the bars, you can visualize the distribution in connection with another categorical variable, which the colors represent.

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Prepare data
x_var = 'manufacturer'
groupby_var = 'class'
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [group[x_var].values.tolist() for _, group in df_agg]

# Draw
plt.figure(figsize=(16, 9), dpi=80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, df[x_var].nunique(), stacked=True, density=False, color=colors[:len(vals)])

# Decoration
plt.legend({group: col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])})
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 40)
# bins holds one more edge than there are categories, so drop the last edge
plt.xticks(ticks=bins[:-1], labels=np.unique(df[x_var]).tolist(), rotation=90, horizontalalignment='left')
plt.show()
22. Density plot
Density plots are a commonly used tool for visualizing the distribution of a continuous variable. By grouping them by the "response" variable, you can inspect the relationship between X and Y. The case below shows, for illustration, how the distribution of city mileage varies with the number of cylinders.

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot
plt.figure(figsize=(16, 10), dpi=80)
sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], shade=True, color="g", label="Cyl=4", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], shade=True, color="deeppink", label="Cyl=5", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], shade=True, color="dodgerblue", label="Cyl=6", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], shade=True, color="orange", label="Cyl=8", alpha=.7)

# Decoration
plt.title('Density Plot of City Mileage by Number of Cylinders', fontsize=22)
plt.legend()
plt.show()
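A density curve of this kind is a kernel density estimate: a small bell curve is centered on every observation and the curves are averaged. The sketch below writes that out in plain Python as an illustration only. The bandwidth is fixed by hand here, whereas seaborn chooses one automatically, and the helper name gaussian_kde_sketch and the data are invented for this example.

```python
import math


def gaussian_kde_sketch(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point."""
    n = len(samples)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum of Gaussian bumps, one centered on each observation
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


# Invented city-mileage values
cty = [15, 16, 18, 18, 19, 21, 24]
density = gaussian_kde_sketch(cty, bandwidth=1.5)

# Sanity check: a density should integrate to roughly 1 (Riemann sum over 0..40)
step = 0.1
grid = [i * step for i in range(400)]
area = sum(density(x) for x in grid) * step
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points; a larger one gives a smoother curve that hides detail.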
23. Density curves with histogram
A density curve with a histogram brings together the collective information conveyed by the two plots, so you can have both in a single figure instead of two.

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot
# Note: sns.distplot is deprecated since seaborn 0.11
plt.figure(figsize=(13, 10), dpi=80)
sns.distplot(df.loc[df['class'] == 'compact', "cty"], color="dodgerblue", label="Compact", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'suv', "cty"], color="orange", label="SUV", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'minivan', "cty"], color="g", label="minivan", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
plt.ylim(0, 0.35)

# Decoration
plt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22)
plt.legend()
plt.show()
24. Joy plot
The joy plot allows the density curves of different groups to overlap. It is a great way to visualize the distribution of a large number of groups in relation to each other. It looks pleasing to the eye and conveys just the right information clearly. It can be built easily with the joypy package, which is based on matplotlib.

# !pip install joypy
import joypy

# Import Data
mpg = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot
plt.figure(figsize=(16, 10), dpi=80)
fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by="class", ylim='own', figsize=(14, 10))

# Decoration
plt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22)
plt.show()
25. Distributed dot plot
The distributed dot plot shows the univariate distribution of points segmented by group. The darker the points, the higher the concentration of data points in that region. By coloring the median differently, the real positioning of the groups becomes apparent instantly.

import matplotlib.patches as mpatches

# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
cyl_colors = {4: 'tab:red', 5: 'tab:green', 6: 'tab:blue', 8: 'tab:orange'}
df_raw['cyl_color'] = df_raw.cyl.map(cyl_colors)

# Mean and Median city mileage by make
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').mean()
df.sort_values('cty', ascending=False, inplace=True)
df.reset_index(inplace=True)
df_median = df_raw[['cty', 'manufacturer']].groupby('manufacturer').median()

# Draw horizontal lines
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
ax.hlines(y=df.index, xmin=0, xmax=40, color='gray', alpha=0.5, linewidth=.5, linestyles='dashdot')

# Draw the Dots
for i, make in enumerate(df.manufacturer):
    df_make = df_raw.loc[df_raw.manufacturer==make, :]
    ax.scatter(y=np.repeat(i, df_make.shape[0]), x='cty', data=df_make, s=75, edgecolors='gray', c='w', alpha=0.5)
    ax.scatter(y=i, x='cty', data=df_median.loc[df_median.index==make, :], s=75, c='firebrick')

# Annotate
ax.text(33, 13, "red dots are the median", fontdict={'size':12}, color='firebrick')

# Decorations
red_patch = plt.plot([], [], marker="o", ms=10, ls="", mec=None, color='firebrick', label="Median")
plt.legend(handles=red_patch)
ax.set_title('Distribution of City Mileage by Make', fontdict={'size':22})
ax.set_xlabel('Miles Per Gallon (City)', alpha=0.7)
ax.set_yticks(df.index)
ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}, alpha=0.7)
ax.set_xlim(1, 40)
plt.xticks(alpha=0.7)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["bottom"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis='both', alpha=.4, linewidth=.1)
plt.show()
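The data preparation for this chart boils down to a mean and a median per manufacturer, with the rows ordered by the mean. A small sketch of the same summary with only the standard library, on invented rows, might look like this:

```python
import statistics
from collections import defaultdict

# Invented (manufacturer, city mileage) rows
rows = [
    ("audi", 18), ("audi", 21), ("audi", 20),
    ("ford", 14), ("ford", 15), ("ford", 19),
]

# Collect the mileage values of each manufacturer
by_make = defaultdict(list)
for make, cty in rows:
    by_make[make].append(cty)

# (mean, median) per manufacturer
summary = {
    make: (statistics.mean(values), statistics.median(values))
    for make, values in by_make.items()
}

# Row order of the chart: highest mean first
order = sorted(summary, key=lambda make: summary[make][0], reverse=True)
print(order, summary)
```

When the mean and the median of a group differ noticeably, as for the invented "ford" rows here, the distribution is skewed, which is the kind of detail the red median dot makes visible.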
That is all for this article on Python code for commonly used Matplotlib charts. Thank you for reading, and we hope the content is helpful to you.