2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article walks through the most useful charts for data visualization with Matplotlib. The material is meant to be clear and easy to follow, with code for each chart.
Introduction
The charts are grouped into seven scenarios according to the purpose of the visualization. For example, to visualize the relationship between two variables, look at the charts in the Correlation section; to show how a value changes over time, see the Change section, and so on.
Important features of an effective chart:
It conveys the correct and necessary information without distorting the facts.
Its design is simple, so the reader does not have to work hard to understand it.
Its aesthetics support the information rather than overshadow it.
It is not overloaded with information.
Preparatory work
Run the following setup before any of the chart code. Individual charts can still override these display settings.
# !pip install brewer2mpl
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings; warnings.filterwarnings(action='once')

large = 22; med = 16; small = 12
params = {'legend.fontsize': med,
          'figure.figsize': (16, 10),
          'axes.labelsize': med,
          'axes.titlesize': med,
          'xtick.labelsize': med,
          'ytick.labelsize': med,
          'figure.titlesize': large}
plt.rcParams.update(params)
# Matplotlib 3.6 and later name this style 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
sns.set_style("white")
%matplotlib inline

# Version
print(mpl.__version__)  # > 3.0.0
print(sns.__version__)  # > 0.9.0
3.0.2
0.9.0
I. Correlation
Correlation charts visualize the relationship between two or more variables, that is, how one variable changes with respect to another.
1 Scatter plot
The scatter plot is a classic, fundamental chart for studying the relationship between two variables. If the data contains multiple groups, you may want to draw each group in a different color. In matplotlib, you can do this conveniently with plt.scatter().
# Import dataset
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# Prepare Data
# Create as many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

# Draw Plot for Each Category
plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')

for i, category in enumerate(categories):
    # color= takes the single RGBA tuple chosen for this group;
    # cmap= expects a colormap object and is the wrong argument here
    plt.scatter('area', 'poptotal',
                data=midwest.loc[midwest.category==category, :],
                s=20, color=colors[i], label=str(category))

# Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
plt.legend(fontsize=12)
plt.show()
Figure 1
2 Bubble plot with encircling
Sometimes you want to draw a boundary around a group of points to emphasize their importance. In this example, the records to highlight are selected from the DataFrame and passed to the encircle() function defined in the code below.
from matplotlib import patches
from scipy.spatial import ConvexHull
import warnings; warnings.simplefilter('ignore')
sns.set_style("white")

# Step 1: Prepare Data
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# As many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

# Step 2: Draw Scatterplot with unique color for each category
fig = plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')

for i, category in enumerate(categories):
    # color= takes the single RGBA tuple chosen for this group (cmap= would expect a colormap)
    plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :],
                color=colors[i], label=str(category), edgecolors='black', linewidths=.5)

# Step 3: Encircling
# https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
def encircle(x, y, ax=None, **kw):
    # Draw the convex hull of the points (x, y) as a polygon patch
    if not ax: ax = plt.gca()
    p = np.c_[x, y]
    hull = ConvexHull(p)
    poly = plt.Polygon(p[hull.vertices, :], **kw)
    ax.add_patch(poly)

# Select data to be encircled
midwest_encircle_data = midwest.loc[midwest.state == 'IN', :]

# Draw polygon surrounding vertices
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)

# Step 4: Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Bubble Plot with Encircling", fontsize=22)
plt.legend(fontsize=12)
plt.show()
Figure 2
3 Scatter plot with linear regression line of best fit
To understand how two variables change with respect to each other, a line of best fit is a common approach. The plot below shows how the line of best fit differs between groups in the data. To disable grouping and draw a single line of best fit for the entire dataset, remove the hue='cyl' parameter from the sns.lmplot() call below.
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
# Keep only the 4- and 8-cylinder cars
df_select = df.loc[df.cyl.isin([4, 8]), :]

# Plot
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select,
                     height=7, aspect=1.6, robust=True, palette='tab10',
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
plt.show()
Figure 3
Each regression line in its own column
Alternatively, you can show the line of best fit for each group in its own column by setting the col=groupingcolumn parameter in sns.lmplot(), as follows:
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
# Keep only the 4- and 8-cylinder cars
df_select = df.loc[df.cyl.isin([4, 8]), :]

# Each line in its own column
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy",
                     data=df_select,
                     height=7,
                     robust=True,
                     palette='Set1',
                     col="cyl",
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.show()
Figure 3-2
4 Jittering with stripplot
Often, multiple data points have exactly the same X and Y values. As a result, the points are drawn on top of each other and hide one another. To avoid this, jitter the points slightly so that each one is visible. Seaborn's stripplot() makes this convenient.
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Draw Stripplot
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
# x and y are passed by keyword so the call does not depend on positional argument order
sns.stripplot(x=df.cty, y=df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)

# Decorations
plt.title('Use jittered plots to avoid overlapping of points', fontsize=22)
plt.show()
Figure 4
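What jittering does can be sketched without seaborn: add a small random offset to each point so that coincident points separate. The following is a minimal illustration on made-up data. The jitter() helper, its width and seed parameters, and the sample values are assumptions for illustration and are not part of seaborn's API.

```python
import numpy as np
import matplotlib.pyplot as plt


def jitter(values, width=0.25, seed=0):
    """Return a copy of values with uniform noise in [-width, width] added (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-width, width, size=values.shape)


# Made-up data: 60 observations that share only 3 distinct (x, y) pairs
x = np.repeat([1, 2, 3], 20)
y = np.repeat([10, 20, 30], 20)

fig, (ax_raw, ax_jit) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
ax_raw.scatter(x, y)
ax_raw.set_title("Raw: overlapping points")
ax_jit.scatter(jitter(x), y, alpha=0.6)  # jitter only the x coordinate
ax_jit.set_title("Jittered along x")
```

Call plt.show() to display the figure. Only x is jittered here so the y values stay exact; a larger width separates points more but blurs their true position.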
5 Counts plot
Another way to avoid overlapping points is to scale each point by the number of observations that fall on it. The larger the point, the more observations are concentrated at that location.
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
# Count how many observations share each (hwy, cty) pair
df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')

# Draw Stripplot
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
# Written for seaborn 0.9; newer seaborn releases may not accept an array for size
sns.stripplot(x=df_counts.cty, y=df_counts.hwy, size=df_counts.counts*2, ax=ax)

# Decorations
plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22)
plt.show()
Figure 5
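The same idea can be sketched with plain matplotlib: count how often each (x, y) pair occurs and pass the counts as marker areas. This is a minimal sketch on made-up points, not the article's dataset; the factor 40 that scales counts to marker area is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up observations with repeated (x, y) pairs
points = np.array([[1, 2], [1, 2], [1, 2], [3, 4], [3, 4], [5, 6]])

# Collapse duplicate rows and count how many observations fall on each location
unique_points, counts = np.unique(points, axis=0, return_counts=True)

fig, ax = plt.subplots(figsize=(6, 4))
# s is the marker area in points squared, so the area grows linearly with the count
ax.scatter(unique_points[:, 0], unique_points[:, 1], s=counts * 40)
ax.set(xlabel="x", ylabel="y",
       title="Marker size encodes the number of overlapping points")
```

Call plt.show() to display the figure. Because ax.scatter() accepts one size per point, this variant does not depend on how a particular seaborn version handles the size argument.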
6 Marginal histogram
A marginal histogram adds histograms of the X and Y variables along the respective axes. It shows the relationship between X and Y together with the univariate distribution of each. Such plots are often used in exploratory data analysis (EDA).
Figure 6
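No listing accompanies this chart here, so the following is a minimal sketch of one way to build a marginal histogram with matplotlib's GridSpec. It assumes synthetic data in place of the article's dataset; the marginal_histogram() helper, the 4x4 grid, and the bin count are illustrative choices rather than the original author's code.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec


def marginal_histogram(x, y, bins=30):
    """Scatter plot of x vs y with a histogram of each variable in the margins (hypothetical helper)."""
    fig = plt.figure(figsize=(8, 8))
    grid = GridSpec(4, 4, hspace=0.1, wspace=0.1)

    # Main scatter occupies the lower-left 3x3 cells; margins share its axes
    ax_main = fig.add_subplot(grid[1:, :-1])
    ax_top = fig.add_subplot(grid[0, :-1], sharex=ax_main)
    ax_right = fig.add_subplot(grid[1:, -1], sharey=ax_main)

    ax_main.scatter(x, y, s=15, alpha=0.6)
    ax_top.hist(x, bins=bins)                               # distribution of x
    ax_right.hist(y, bins=bins, orientation="horizontal")   # distribution of y

    # Hide the tick labels that duplicate those of the main axes
    ax_top.tick_params(labelbottom=False)
    ax_right.tick_params(labelleft=False)
    ax_main.set(xlabel="x", ylabel="y")
    return fig, ax_main, ax_top, ax_right


# Synthetic, correlated sample data
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 0.6 * x + rng.normal(scale=0.5, size=300)

fig, ax_main, ax_top, ax_right = marginal_histogram(x, y)
```

Call plt.show() to display the figure. Sharing the x axis with the top histogram and the y axis with the right one keeps the margins aligned with the scatter.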
7 Marginal boxplot
The marginal boxplot serves the same purpose as the marginal histogram. However, the boxplots help pinpoint the median and the 25th and 75th percentiles of X and Y.
Figure 7
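As with the marginal histogram, a rough sketch may help. The example below assumes synthetic data and a 4x4 GridSpec layout (both illustrative choices) and places a boxplot of each variable in the margin of a scatter plot. It also computes the quartiles with np.percentile so the values marked by the boxes can be read off directly.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Synthetic, correlated sample data
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=200)
y = 2.0 * x + rng.normal(scale=1.5, size=200)

fig = plt.figure(figsize=(8, 8))
grid = GridSpec(4, 4, hspace=0.3, wspace=0.3)
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_bottom = fig.add_subplot(grid[-1, :-1], sharex=ax_main)
ax_right = fig.add_subplot(grid[:-1, -1], sharey=ax_main)

ax_main.scatter(x, y, s=15, alpha=0.6)
ax_main.set(xlabel="x", ylabel="y")

# vert=False draws a horizontal box (recent Matplotlib releases also accept
# orientation="horizontal" for the same purpose)
box_x = ax_bottom.boxplot(x, vert=False)
box_y = ax_right.boxplot(y)

# The boxes sit at a dummy category position, so hide those ticks
ax_bottom.set_yticks([])
ax_right.set_xticks([])

# The box edges and the center line correspond to these values
q1_x, median_x, q3_x = np.percentile(x, [25, 50, 75])
q1_y, median_y, q3_y = np.percentile(y, [25, 50, 75])
print(f"x: Q1={q1_x:.2f}, median={median_x:.2f}, Q3={q3_x:.2f}")
print(f"y: Q1={q1_y:.2f}, median={median_y:.2f}, Q3={q3_y:.2f}")
```

Call plt.show() to display the figure. The dictionaries returned by boxplot() (box_x, box_y) hold the drawn artists, for example box_x["medians"], if the lines need restyling.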
8 Correlogram
A correlogram shows the correlation metric between every possible pair of numeric variables in a given DataFrame (or 2D array).
# Import Dataset
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")

# Correlation is defined only for numeric columns, so select those first
corr = df.select_dtypes(include='number').corr()

# Plot
plt.figure(figsize=(12, 10), dpi=80)
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='RdYlGn', center=0, annot=True)

# Decorations
plt.title('Correlogram of mtcars', fontsize=22)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()
Figure 8
9 Pairwise plot
The pairwise plot is a favorite in exploratory analysis for understanding the relationship between every possible pair of numeric variables. It is an essential tool for bivariate analysis.
# Load Dataset
df = sns.load_dataset('iris')

# Plot
plt.figure(figsize=(10, 8), dpi=80)
sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
plt.show()
Figure 9
# Load Dataset
df = sns.load_dataset('iris')

# Plot
plt.figure(figsize=(10, 8), dpi=80)
sns.pairplot(df, kind="reg", hue="species")
plt.show()
Figure 9-2
II. Deviation
10 Diverging bars
Diverging bars are a good tool if you want to see how items vary on a single metric and to visualize the order and size of that variation. They help to quickly tell apart the performance of groups in the data, are quite intuitive, and convey the point immediately.
# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
# Standardize mpg to a z-score so values diverge around 0
df['mpg_z'] = (x - x.mean()) / x.std()
df['colors'] = ['red' if z < 0 else 'green' for z in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 10), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, colors=df.colors, alpha=0.4, linewidth=5)

# Decorations
plt.gca().set(ylabel='$Model$', xlabel='$Mileage$')
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Bars of Car Mileage', fontdict={'size': 20})
plt.grid(linestyle='--', alpha=0.5)
plt.show()
Figure 10
11 Diverging texts
Diverging texts is similar to diverging bars. Use it if you want to show the value of each item within the chart in a neat and presentable way.
# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
df['mpg_z'] = (x - x.mean()) / x.std()
df['colors'] = ['red' if z < 0 else 'green' for z in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 14), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z)
# Label the end of each bar with its z-score, placed on the outer side of the bar
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
    t = plt.text(x, y, round(tex, 2), horizontalalignment='right' if x < 0 else 'left',
                 verticalalignment='center', fontdict={'color': 'red' if x < 0 else 'green', 'size': 14})

# Decorations
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Text Bars of Car Mileage', fontdict={'size': 20})
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()
Figure 11
12 Diverging dot plot
The diverging dot plot is also similar to diverging bars. However, compared with diverging bars, the absence of bars reduces the contrast and disparity between the groups.
# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
df['mpg_z'] = (x - x.mean()) / x.std()
df['colors'] = ['red' if z < 0 else 'darkgreen' for z in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 16), dpi=80)
plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors)
# Write the rounded z-score inside each dot
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
    t = plt.text(x, y, round(tex, 1), horizontalalignment='center',
                 verticalalignment='center', fontdict={'color': 'white'})

# Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(.3)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.3)
plt.gca().spines["left"].set_alpha(.3)

plt.yticks(df.index, df.cars)
plt.title('Diverging Dotplot of Car Mileage', fontdict={'size': 20})
plt.xlabel('$Mileage$')
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()
Figure 12
13 Diverging lollipop chart with markers
A lollipop chart with markers offers a flexible way to visualize divergence: you can highlight any significant data points you want to draw attention to and annotate the chart with the reasoning.
Figure 13
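A minimal sketch of the idea, assuming made-up values and hypothetical item names rather than the article's dataset: each value is standardized to a z-score, drawn as a thin stem with a dot at the end, and one point is highlighted with a ring and an annotation. The choice of which point to highlight and the annotation text are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up measurements for ten hypothetical items
labels = [f"item {i}" for i in range(10)]
rng = np.random.default_rng(1)
values = rng.normal(loc=20, scale=5, size=10)

# Standardize to z-scores (ddof=1 matches the default of pandas' std())
z = (values - values.mean()) / values.std(ddof=1)
order = np.argsort(z)
z_sorted = z[order]
labels_sorted = [labels[i] for i in order]
ypos = np.arange(len(z_sorted))
colors = ["firebrick" if v < 0 else "darkgreen" for v in z_sorted]

fig, ax = plt.subplots(figsize=(8, 6))
# Stems from 0 to each value, with a dot at the end of each stem
ax.hlines(y=ypos, xmin=0, xmax=z_sorted, colors=colors, alpha=0.5, linewidth=1.5)
ax.scatter(z_sorted, ypos, color=colors, s=80)

# Highlight the lowest value with a ring and an annotation
highlight = 0
ax.scatter(z_sorted[highlight], ypos[highlight], s=300,
           facecolors="none", edgecolors="black", linewidths=1.5)
ax.annotate("lowest value", xy=(z_sorted[highlight], ypos[highlight]),
            xytext=(0.5, ypos[highlight] + 1.5),
            arrowprops=dict(arrowstyle="->"))

ax.axvline(0, color="gray", linewidth=0.8)
ax.set_yticks(ypos)
ax.set_yticklabels(labels_sorted)
ax.set(xlabel="z-score", title="Diverging lollipop chart with a highlighted point")
```

Call plt.show() to display the figure. Sorting before plotting is what gives the chart its diverging shape; the marker and annotation can be pointed at any index of interest.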
14 Area chart
By coloring the area between the axis and the line, the area chart emphasizes not only the peaks and troughs but also how long the highs and lows last. The longer a high persists, the larger the area under the line.
import numpy as np
import pandas as pd

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100)
x = np.arange(df.shape[0])
# Percent change of psavert from one row to the next
y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100

# Plot
plt.figure(figsize=(16, 10), dpi=80)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:]
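The two-color area technique can also be shown end to end as a self-contained sketch. It assumes a synthetic series in place of economics.csv, and the red shading for negative values is an assumption made for illustration; only the green shading for non-negative values appears in the listing above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, strictly positive series standing in for the real data
rng = np.random.default_rng(42)
series = 10 * np.exp(np.cumsum(rng.normal(scale=0.03, size=100)))

# Percent change from one observation to the next
returns = np.diff(series) / series[:-1] * 100
x = np.arange(1, len(series))

fig, ax = plt.subplots(figsize=(10, 5))
# Shade the area between the curve and zero, using a different color on each side
ax.fill_between(x, returns, 0, where=returns >= 0,
                facecolor="green", interpolate=True, alpha=0.7)
ax.fill_between(x, returns, 0, where=returns <= 0,
                facecolor="red", interpolate=True, alpha=0.7)
ax.axhline(0, color="black", linewidth=0.8)
ax.set(xlabel="Observation", ylabel="Change (%)",
       title="Area chart of period-over-period change")
```

Call plt.show() to display the figure. With interpolate=True, the fill is cut where the curve crosses zero, so the two colors meet cleanly at the axis.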