What are the most valuable charts for Matplotlib visualization

Updated 2025-04-04. From: SLTechnology News&Howtos (shulou)

This article walks through the most valuable charts for Matplotlib visualization. The content is meant to be clear and easy to follow, and we hope it answers your questions.

Introduction

These charts are grouped into seven scenarios according to the purpose of the visualization. For example, if you want to visualize the relationship between two variables, look at the charts in the Association (Correlation) section. If you want to show how a value changes over time, check the Change section, and so on.

Important features of an effective chart:

It conveys the correct and necessary information without distorting the facts.

It is simple in design, so you don't have to strain to understand it.

Its aesthetics support the information rather than overshadow it.

It is not overloaded with information.

Preparatory work

Run the following setup before the chart code. Individual charts can still override these display settings.

# !pip install brewer2mpl
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings; warnings.filterwarnings(action='once')

large = 22; med = 16; small = 12
params = {'legend.fontsize': med,
          'figure.figsize': (16, 10),
          'axes.labelsize': med,
          'axes.titlesize': med,
          'xtick.labelsize': med,
          'ytick.labelsize': med,
          'figure.titlesize': large}
plt.rcParams.update(params)
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in matplotlib 3.6 and later
sns.set_style("white")
%matplotlib inline

# Version
print(mpl.__version__)  # > 3.0.0
print(sns.__version__)  # > 0.9.0

3.0.2
0.9.0

I. Association (Correlation)

The association chart is used to visualize the relationship between two or more variables. That is, how one variable changes relative to another.

1 Scatter plot

A scatter plot is a classic, fundamental chart for studying the relationship between two variables. If the data contains multiple groups, you may want to draw each group in a different color. In matplotlib, you can do this conveniently with plt.scatter().

# Import dataset
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# Prepare data
# Create as many colors as there are unique midwest['category'] values
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i / float(len(categories) - 1)) for i in range(len(categories))]

# Draw one scatter layer per category
plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')

for i, category in enumerate(categories):
    plt.scatter('area', 'poptotal',
                data=midwest.loc[midwest.category == category, :],
                s=20, color=colors[i], label=str(category))

# Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
plt.legend(fontsize=12)
plt.show()

Figure 1

2 Bubble plot with encircling

Sometimes you want to show a group of points within a boundary to emphasize their importance. In this example, the records to highlight are selected from the DataFrame and passed to the encircle() function defined in the code below, which draws the boundary.

from matplotlib import patches
from scipy.spatial import ConvexHull
import warnings; warnings.simplefilter('ignore')
sns.set_style("white")

# Step 1: Prepare Data
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# As many colors as there are unique midwest['category'] values
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i / float(len(categories) - 1)) for i in range(len(categories))]

# Step 2: Draw Scatterplot with unique color for each category
fig = plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')

for i, category in enumerate(categories):
    plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category == category, :],
                color=colors[i], label=str(category), edgecolors='black', linewidths=.5)

# Step 3: Encircling
# https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
def encircle(x, y, ax=None, **kw):
    """Draw the convex hull of the points (x, y) as a polygon on ax."""
    if not ax:
        ax = plt.gca()
    p = np.c_[x, y]
    hull = ConvexHull(p)
    poly = plt.Polygon(p[hull.vertices, :], **kw)
    ax.add_patch(poly)

# Select data to be encircled
midwest_encircle_data = midwest.loc[midwest.state == 'IN', :]

# Draw polygon surrounding vertices: a translucent fill first, then an outline
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)

# Step 4: Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Bubble Plot with Encircling", fontsize=22)
plt.legend(fontsize=12)
plt.show()

Figure 2
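
The encircle() helper relies on scipy's ConvexHull to find which points lie on the outer boundary. The following is a minimal sketch of that step on its own; the five points are made up for illustration and are not taken from the midwest dataset.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical points: four corners of a unit square plus one interior point.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])

hull = ConvexHull(points)

# hull.vertices holds the indices of the points on the boundary;
# for 2-D input they are listed in counterclockwise order.
boundary = points[hull.vertices]
print(sorted(hull.vertices.tolist()))
```

The interior point (index 4) is not part of the hull, so a polygon built from `boundary` encloses it, which is the effect encircle() produces for the selected records.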

3 Scatter plot with linear regression line of best fit

If you want to understand how two variables change with respect to each other, the line of best fit is a common approach. The plot below shows how the lines of best fit differ among the groups in the data. To disable grouping and draw a single line of best fit for the entire dataset, remove the hue='cyl' parameter from the sns.lmplot() call below.

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4, 8]), :]

# Plot (robust=True needs statsmodels installed)
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select,
                     height=7, aspect=1.6, robust=True, palette='tab10',
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
plt.show()

Figure 3
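
To make the idea of a best fit line concrete, here is a minimal sketch that fits a straight line by ordinary least squares with np.polyfit. This is a simpler technique than the robust regression that lmplot performs when robust=True, and the data is synthetic (a made-up slope of -3.5 and intercept of 40 plus noise), so treat it as an illustration only.

```python
import numpy as np

# Hypothetical data: y depends linearly on x, plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(1.5, 7.0, 50)
y = 40.0 - 3.5 * x + rng.normal(scale=1.0, size=x.size)

# A degree-1 polynomial fit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(x, y, deg=1)

# Points on the fitted line, ready to draw with plt.plot(x, y_fit).
y_fit = slope * x + intercept
print(round(slope, 2), round(intercept, 2))
```

The recovered slope and intercept should land close to the values used to generate the data; with grouped data you would repeat the fit once per group.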

Each regression line in its own column

Alternatively, you can show the line of best fit for each group in its own column by setting the col=groupingcolumn parameter in sns.lmplot(), as follows:

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4, 8]), :]

# Each line in its own column
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy",
                     data=df_select,
                     height=7,
                     robust=True,
                     palette='Set1',
                     col="cyl",
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.show()

Figure 3-2

4 Jittering with stripplot

Often, multiple data points have exactly the same X and Y values. As a result, the points are drawn on top of each other and hide one another. To avoid this, jitter the points slightly so that each one is visible. This is easy to do with seaborn's stripplot().

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Draw Stripplot
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
sns.stripplot(x=df.cty, y=df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)

# Decorations
plt.title('Use jittered plots to avoid overlapping of points', fontsize=22)
plt.show()

Figure 4
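
Jittering itself is just a small random offset added to each coordinate. The sketch below shows the idea by hand with NumPy; the jitter() helper and its width and seed parameters are hypothetical names chosen for this illustration, not part of seaborn, which applies its own jitter along the categorical axis.

```python
import numpy as np

def jitter(values, width=0.25, seed=0):
    """Return values shifted by uniform noise drawn from [-width, +width]."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-width, width, size=values.shape)

# Hypothetical data: repeated x values that would be drawn on top of each other.
x = np.array([10, 10, 10, 15, 15, 20])
x_jittered = jitter(x)
print(x_jittered)
```

Each point stays within 0.25 of its original position, so the groups remain recognizable while the individual points no longer coincide.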

5 Counts plot

Another way to avoid overlapping points is to scale the size of each dot by the number of data points that fall on that spot. The larger the dot, the more points are concentrated there.

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')

# Draw Stripplot (written against seaborn 0.9, where size may be given per point)
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
sns.stripplot(x=df_counts.cty, y=df_counts.hwy, size=df_counts.counts*2, ax=ax)

# Decorations
plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22)
plt.show()

Figure 5
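
The counting step can be checked in isolation. This minimal sketch uses a small made-up table rather than the mpg dataset; the marker_area column and the scaling factor of 40 are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical data with repeated (cty, hwy) pairs.
df = pd.DataFrame({"cty": [18, 18, 18, 21, 21, 16],
                   "hwy": [29, 29, 29, 30, 30, 25]})

# Count how many rows share each (cty, hwy) combination.
counts = df.groupby(["cty", "hwy"]).size().reset_index(name="counts")

# Marker area proportional to the count; 40 is an arbitrary scaling factor.
counts["marker_area"] = counts["counts"] * 40
print(counts)
```

The resulting table has one row per distinct point, and its marker_area column could be passed as the s argument of a matplotlib scatter call.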

6 Marginal histogram

A marginal histogram shows histograms of the X and Y variables along their respective axes. It visualizes the relationship between X and Y together with the univariate distribution of each. This kind of plot is often used in exploratory data analysis (EDA).

Figure 6
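
The code for this chart is not included here, so the following is only a sketch of one way to lay out such a figure, assuming synthetic data and a 4x4 GridSpec in which the scatter plot takes the upper-left 3x3 block and the histograms take the remaining row and column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: two loosely correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 0.6 * x + rng.normal(scale=0.8, size=300)

fig = plt.figure(figsize=(8, 8), dpi=80)
grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

# Main scatter plot plus one histogram axis on the right and one at the bottom.
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_right = fig.add_subplot(grid[:-1, -1], sharey=ax_main)
ax_bottom = fig.add_subplot(grid[-1, :-1], sharex=ax_main)

ax_main.scatter(x, y, s=15, alpha=0.6)
ax_bottom.hist(x, bins=30)                            # distribution of x
ax_right.hist(y, bins=30, orientation="horizontal")   # distribution of y
ax_main.set(title="Scatter plot with marginal histograms", ylabel="y")
ax_bottom.set(xlabel="x")
```

Sharing the axes keeps each histogram aligned with the scatter plot it summarizes. Call plt.show() to display the figure.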

7 Marginal boxplot

A marginal boxplot serves a similar purpose to the marginal histogram. However, the boxplot helps pinpoint the median and the 25th and 75th percentiles of X and Y.

Figure 7
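
As a small illustration of what the box in a boxplot encodes, the sketch below computes the three quartile values with np.percentile and draws a single boxplot for a made-up sample. It does not reproduce the marginal layout of the figure above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 10)  # hypothetical sample: 1, 2, ..., 9

# The three values the box is built from.
q1, median, q3 = np.percentile(x, [25, 50, 75])

fig, ax = plt.subplots(figsize=(3, 5), dpi=80)
box = ax.boxplot(x)

# The median line drawn by matplotlib sits at the same value.
median_line_y = box["medians"][0].get_ydata()[0]
print(q1, median, q3)  # 3.0 5.0 7.0
```

In a marginal boxplot, one such box is drawn for X along the bottom and one for Y along the side of the scatter plot.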

8 Correlogram

A correlogram shows the correlation metric between all possible pairs of numeric variables in a given DataFrame (or 2D array).

# Import Dataset
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")

# Correlations between the numeric columns only
corr = df.select_dtypes(include='number').corr()

# Plot
plt.figure(figsize=(12, 10), dpi=80)
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='RdYlGn', center=0, annot=True)

# Decorations
plt.title('Correlogram of mtcars', fontsize=22)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

Figure 8
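
The matrix behind a correlogram is a pairwise Pearson correlation table. Here is a minimal sketch on synthetic data, with a text column added to show why non-numeric columns are excluded first; the column names a, b, c and label are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)

df = pd.DataFrame({
    "a": a,
    "b": 2.0 * a + rng.normal(scale=0.1, size=n),  # moves with a
    "c": -a + rng.normal(scale=0.1, size=n),       # moves against a
    "label": ["x"] * n,                            # non-numeric column
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```

The diagonal is 1 because every variable is perfectly correlated with itself, and the matrix is symmetric, which is why some correlograms show only one triangle.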

9 Pairwise plot

Pairwise plots are a favorite in exploratory analysis for understanding the relationships between all possible pairs of numeric variables. They are a must-have tool for bivariate analysis.

# Load Dataset
df = sns.load_dataset('iris')

# Plot
plt.figure(figsize=(10, 8), dpi=80)
sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
plt.show()

Figure 9

# Load Dataset
df = sns.load_dataset('iris')

# Plot
plt.figure(figsize=(10, 8), dpi=80)
sns.pairplot(df, kind="reg", hue="species")
plt.show()

Figure 9-2

II. Deviation

10 Diverging bars

Diverging bars are a good tool if you want to see how items vary on a single metric and visualize the order and size of that variation. They make it quick to tell apart the performance of groups in the data, are quite intuitive, and get the point across immediately.

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
df['mpg_z'] = (x - x.mean()) / x.std()
df['colors'] = ['red' if z < 0 else 'green' for z in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 10), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5)

# Decorations
plt.gca().set(ylabel='$Model$', xlabel='$Mileage$')
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Bars of Car Mileage', fontdict={'size': 20})
plt.grid(linestyle='--', alpha=0.5)
plt.show()

Figure 10

11 Diverging texts

Diverging texts are similar to diverging bars. Use them if you want to show the value of each item in the chart in a clean, presentable way.

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
df['mpg_z'] = (x - x.mean()) / x.std()
df['colors'] = ['red' if z < 0 else 'green' for z in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 14), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z)
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
    # Place each label on the outer side of its bar
    t = plt.text(x, y, round(tex, 2),
                 horizontalalignment='right' if x < 0 else 'left',
                 verticalalignment='center',
                 fontdict={'color': 'red' if x < 0 else 'green', 'size': 14})

# Decorations
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Text Bars of Car Mileage', fontdict={'size': 20})
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()

Figure 11

12 Diverging dot plot

The diverging dot plot is also similar to diverging bars. However, compared with diverging bars, the absence of bars reduces the contrast and disparity between the groups.

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
df['mpg_z'] = (x - x.mean()) / x.std()
df['colors'] = ['red' if z < 0 else 'darkgreen' for z in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 16), dpi=80)
plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors)
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
    t = plt.text(x, y, round(tex, 1), horizontalalignment='center',
                 verticalalignment='center', fontdict={'color': 'white'})

# Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(.3)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.3)
plt.gca().spines["left"].set_alpha(.3)

plt.yticks(df.index, df.cars)
plt.title('Diverging Dotplot of Car Mileage', fontdict={'size': 20})
plt.xlabel('$Mileage$')
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()

Figure 12
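
The three diverging charts above all plot a standardized version of mpg (a z-score) and color each item by its sign. The sketch below isolates that step on a handful of made-up mileage values so the arithmetic is easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical mileage values.
mpg = pd.Series([21.0, 22.8, 18.7, 14.3, 30.4], name="mpg")

# z-score: distance from the mean in units of the sample standard deviation.
# pandas' std() uses ddof=1 by default.
mpg_z = (mpg - mpg.mean()) / mpg.std()

# Below-average items are red, the rest green.
colors = np.where(mpg_z < 0, "red", "green")
print(mpg_z.round(2).tolist(), colors.tolist())
```

After standardization the values have mean 0 and standard deviation 1, which is why the charts can share a fixed symmetric x-range such as (-2.5, 2.5).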

13 Diverging lollipop chart with markers

A lollipop chart with markers provides a flexible way to visualize divergence: it highlights any significant data points you want to draw attention to and lets you annotate the reasoning directly in the chart.

Figure 13
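
The code for this chart is not included here. As a rough sketch only, a lollipop chart can be assembled from hlines() for the stems and scatter() for the heads, with a larger marker and an annotation for the item to emphasize. The scores, labels, and the choice of which item to highlight are all made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical standardized scores, sorted so the chart reads bottom to top.
scores = pd.Series({"A": -1.4, "B": -0.3, "C": 0.2, "D": 0.9, "E": 1.8}).sort_values()
positions = np.arange(len(scores))

fig, ax = plt.subplots(figsize=(8, 5), dpi=80)
ax.hlines(y=positions, xmin=0, xmax=scores.values, color="gray", alpha=0.5)  # stems
ax.scatter(scores.values, positions, s=80,
           c=["red" if v < 0 else "green" for v in scores.values])           # heads

# Emphasize one item with a larger marker and a short annotation.
highlight = "E"
idx = scores.index.get_loc(highlight)
ax.scatter(scores[highlight], idx, s=300, color="orange", zorder=3)
ax.annotate("highest score", xy=(scores[highlight], idx),
            xytext=(0.3, idx - 1.0), arrowprops=dict(arrowstyle="->"))

ax.set_yticks(positions)
ax.set_yticklabels(scores.index)
ax.set_title("Diverging lollipop chart with a highlighted marker")
```

Call plt.show() to display the figure. The annotation text is where the reasoning for the highlight would go.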

14 Area chart

By coloring the area between the axis and the line, the area chart emphasizes not only the peaks and troughs but also how long the highs and lows last. The longer a high lasts, the larger the area under the line.

import numpy as np
import pandas as pd

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100)
x = np.arange(df.shape[0])
y_returns = (df.psavert.diff().fillna(0) / df.psavert.shift(1)).fillna(0) * 100

# Plot
plt.figure(figsize=(16, 10), dpi=80)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:]
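
The listing above is cut off in this copy. As a self-contained sketch of the same idea, the example below computes percentage returns for a short made-up series and fills the area above and below zero separately. The red color for the negative region and the sample values are assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical series of values over time.
values = pd.Series([10.0, 11.0, 9.9, 9.9, 12.0])

# Percentage change from the previous value; the first entry has no predecessor.
returns = (values.diff() / values.shift(1)).fillna(0) * 100
x = np.arange(len(values))

fig, ax = plt.subplots(figsize=(8, 5), dpi=80)
# Fill gains and losses separately so each region gets its own color.
ax.fill_between(x, returns.values, 0, where=returns.values >= 0,
                facecolor="green", interpolate=True, alpha=0.7)
ax.fill_between(x, returns.values, 0, where=returns.values <= 0,
                facecolor="red", interpolate=True, alpha=0.7)
ax.set(title="Percentage returns as an area chart", ylabel="% change")
```

With interpolate=True, the fill is clipped at the exact point where the line crosses zero instead of at the nearest data point. Call plt.show() to display the figure.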
