Source: Shulou (Shulou.com), SLTechnology News&Howtos, Development section. Report dated 06/03; updated 2025-03-31.
This article shows how to use Python to quickly reveal the relationships in a dataset. It works through practical examples with Pandas, Matplotlib, and Seaborn, so it is worth reading carefully and running the code as you go.
Exploratory Data Analysis (EDA) involves two basic steps:
Data analysis (data preprocessing, cleaning and processing).
Data visualization (using different types of diagrams to show the relationships in the data).
Pandas is the most commonly used data analysis library in Python. Python also offers many libraries for data visualization, of which Matplotlib is the most widely used: it gives complete control over a drawing and makes customization straightforward.
However, Matplotlib offers only limited integration with Pandas, so plotting from a DataFrame usually takes extra code. Seaborn fills this gap. It is a data visualization library built on top of Matplotlib and tightly integrated with Pandas.
Seaborn does this well, but it offers so many functions that it can be hard to know which one to use. This article sorts them out so that you can pick up the library quickly.
This article covers the following:
The different plot types available in Seaborn.
How the integration of Pandas and Seaborn makes it possible to draw complex multi-dimensional plots with very little code.
How to customize Seaborn plots with the help of Matplotlib.
I. Matplotlib
Many tasks need only the simplest Seaborn features, but it is still important to understand the basics of Matplotlib, for two reasons:
Seaborn uses Matplotlib under the hood to do the actual drawing.
Some customizations require using Matplotlib directly.
Here is a brief overview of the basics of Matplotlib. The following figure shows the various elements of a Matplotlib figure.
The three main classes you need to know are Figure, Axes, and Axis.
Figure: the entire drawing that you see. A single figure can contain multiple subplots (Axes). In the example above, one figure contains four subplots.
Axes: a single plot drawn inside a figure. A figure can hold several Axes, but a given Axes belongs to exactly one figure. In the example above, the figure holds four Axes.
Axis: the actual x-axis and y-axis of a particular Axes.
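The relationship between the three classes can be checked directly in code. The following is a minimal sketch, assuming a 2 x 2 grid of subplots like the four-subplot example described above; the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt
from matplotlib.axes import Axes
from matplotlib.axis import XAxis, YAxis
from matplotlib.figure import Figure

# One Figure holding a 2 x 2 grid of Axes (four subplots).
fig, axes = plt.subplots(nrows=2, ncols=2)

first_axes = axes[0, 0]
print(isinstance(fig, Figure))              # the whole drawing
print(isinstance(first_axes, Axes))         # one subplot inside it
print(isinstance(first_axes.xaxis, XAxis))  # the x-axis of that subplot
print(isinstance(first_axes.yaxis, YAxis))  # the y-axis of that subplot
print(len(fig.axes))                        # number of Axes attached to the Figure
```

Each Axes keeps a reference to its parent through its figure attribute, which is why a plot drawn on one Axes never leaks into its neighbours.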
Each example in this post assumes that the required modules and datasets have been loaded, as shown below:
    import pandas as pd
    import numpy as np
    import matplotlib
    from matplotlib import pyplot as plt
    import seaborn as sns
    # Load the two example datasets that ship with Seaborn.
    tips = sns.load_dataset('tips')
    iris = sns.load_dataset('iris')
    matplotlib.style.use('ggplot')
    tips.head()
    iris.head()
Let's use an example to understand the Figure and Axes classes.
    dates = ['1981-01-01', '1981-01-02', '1981-01-03', '1981-01-04', '1981-01-05',
             '1981-01-06', '1981-01-07', '1981-01-08', '1981-01-09', '1981-01-10']
    min_temperature = [20.7, 17.9, 18.8, 14.6, 15.8, 15.8, 15.8, 17.4, 21.8, 20.0]
    max_temperature = [34.7, 28.9, 31.8, 25.6, 28.8, 21.8, 22.8, 28.4, 30.8, 32.0]
    # One Figure containing a single Axes.
    fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10, 5))
    axes.plot(dates, min_temperature, label='Min Temperature')
    axes.plot(dates, max_temperature, label='Max Temperature')
    axes.legend()
plt.subplots() creates a Figure together with nrows x ncols Axes and returns both. In the example above we passed nrows=1 and ncols=1, so only one Axes was created and it is returned as a single object. If nrows > 1 or ncols > 1, a grid of Axes is created and returned as a NumPy array: 2-D with shape (nrows, ncols) when both are greater than 1, and 1-D when only one of them is (the default squeeze=True drops the length-1 dimension).
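The different return shapes are easy to confirm. The sketch below is illustrative only; squeeze=False is a standard plt.subplots() parameter that is not used elsewhere in this article, shown here because it always yields a 2-D array.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt
from matplotlib.axes import Axes

fig1, single = plt.subplots(nrows=1, ncols=1)   # a single Axes object, not an array
fig2, row = plt.subplots(nrows=1, ncols=3)      # 1-D array holding 3 Axes
fig3, grid = plt.subplots(nrows=2, ncols=3)     # 2-D array with shape (2, 3)
fig4, always_2d = plt.subplots(nrows=1, ncols=1, squeeze=False)  # 2-D even for one Axes

print(isinstance(single, Axes))
print(row.shape, grid.shape, always_2d.shape)
```

Passing squeeze=False is convenient when the grid size is computed at run time, because the indexing code axes[row][col] then works for every grid size.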
The most commonly used customization methods of the Axes class are:
Axes.set_xlabel(), Axes.set_ylabel()
Axes.set_xlim(), Axes.set_ylim()
Axes.set_xticks(), Axes.set_yticks()
Axes.set_xticklabels(), Axes.set_yticklabels()
Axes.set_title()
Axes.tick_params()
Here is an example of customization using some of the above methods:
    fontsize = 20
    fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(15, 7))
    axes.plot(dates, min_temperature, label='Min Temperature')
    axes.plot(dates, max_temperature, label='Max Temperature')
    # Axis labels and title.
    axes.set_xlabel('Date', fontsize=fontsize)
    axes.set_ylabel('Temperature', fontsize=fontsize)
    axes.set_title('Daily Min and Max Temperature', fontsize=fontsize)
    # One tick per date, with rotated labels.
    axes.set_xticks(dates)
    axes.set_xticklabels(dates)
    axes.tick_params('x', labelsize=fontsize, labelrotation=30, length=15)
    # The y-limits and tick step were garbled in the source; these values are a reconstruction.
    axes.set_ylim(10, 40)
    axes.set_yticks(np.arange(10, 41, 2))
    axes.tick_params('y', labelsize=fontsize)
    # Place the legend just outside the right edge of the Axes.
    axes.legend(fontsize=fontsize, loc='upper left', bbox_to_anchor=(1, 1))
That covers the basics of Matplotlib. Now let's move on to Seaborn.
II. Seaborn
Each plotting function in Seaborn is either a figure-level function or an axes-level function, so it is important to understand the difference between the two.
As mentioned earlier, a figure is the whole drawing you see, and an Axes is one specific subplot inside it.
An axes-level function draws onto a single Matplotlib Axes and does not affect the rest of the figure.
A figure-level function, on the other hand, controls the whole figure.
One way to think about it: a figure-level function can call different axes-level functions to draw different kinds of subplots on different Axes.
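The difference shows up in what each kind of function returns. The following is a minimal sketch, assuming a small hand-made DataFrame as a stand-in for the tips dataset (the values are hypothetical), with scatterplot() as the axes-level function and relplot() as the figure-level one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Small hypothetical dataset standing in for `tips`.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 4.0, 6.0],
    "smoker": ["Yes", "No", "Yes", "No"],
})

# Axes-level: draws on the Axes we hand over and returns that same Axes.
fig, (left, right) = plt.subplots(1, 2)
returned = sns.scatterplot(x="total_bill", y="tip", data=df, ax=left)
print(returned is left)      # same object that was passed in
print(right.has_data())      # the neighbouring Axes is left untouched

# Figure-level: creates its own figure and returns a FacetGrid that manages it.
grid = sns.relplot(x="total_bill", y="tip", col="smoker", data=df)
print(type(grid).__name__, grid.axes.shape)
```

Because the figure-level call owns its figure, it cannot be placed inside an existing subplot grid; for that, the axes-level function with ax= is the right tool.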
    sns.set_style('darkgrid')
1. Axes-level functions
The following are the main axes-level functions in Seaborn.
Relational plots:
scatterplot()
lineplot()
Categorical plots:
stripplot(), swarmplot()
boxplot(), boxenplot()
violinplot(), countplot()
pointplot(), barplot()
Distribution plots:
distplot() (deprecated since Seaborn 0.11 in favor of histplot() and displot())
kdeplot()
rugplot()
Regression plots:
regplot()
residplot()
Matrix plots:
heatmap()
To use any axes-level function, you need to know two things:
The different ways of providing input data to the function.
How to specify the Axes to draw on.
(1) Different ways of providing input data to axes-level functions
a. Lists, arrays, or Series
The most common way to pass data to an axes-level function is as array-like objects such as a list, a NumPy array, or a Pandas Series.
    total_bill = tips['total_bill'].values
    tip = tips['tip'].values
    fig = plt.figure(figsize=(10, 5))
    # Recent Seaborn releases require x and y to be passed by keyword.
    sns.scatterplot(x=total_bill, y=tip, s=15)

    tip = tips['tip'].values
    day = tips['day'].values
    fig = plt.figure(figsize=(10, 5))
    sns.boxplot(x=day, y=tip, palette='Set2')
b. A Pandas DataFrame plus column names
One of the main reasons Seaborn is popular is that it works directly with Pandas DataFrames. In this approach, the column names are passed to the x and y parameters and the DataFrame is passed to the data parameter.
    fig = plt.figure(figsize=(10, 5))
    sns.scatterplot(x='total_bill', y='tip', data=tips, s=50)

    fig = plt.figure(figsize=(10, 5))
    sns.boxplot(x='day', y='tip', data=tips)
c. Pass only the DataFrame
In this approach, only the DataFrame is passed to the data parameter, and every numeric column in the dataset is plotted. It works only with the following axes-level functions:
stripplot(), swarmplot()
boxplot(), boxenplot(), violinplot(), pointplot()
barplot(), countplot()
A common use case for this approach is showing the distributions of several numeric variables in a dataset side by side.
    fig = plt.figure(figsize=(10, 5))
    sns.boxplot(data=iris)
(2) Specifying the Axes to draw on
Every axes-level function in Seaborn takes an ax parameter, and the Axes passed to it is the one the plot is drawn on. This gives great flexibility in deciding where each plot goes. For example, suppose we want to look at the relationship between total_bill and tip (with a scatter plot) and at their distributions (with box plots), and we want both in the same figure but on different Axes.
    fig, axes = plt.subplots(1, 2, figsize=(10, 7))
    # Scatter plot on the right-hand Axes, box plots on the left-hand Axes.
    sns.scatterplot(x='total_bill', y='tip', data=tips, ax=axes[1])
    sns.boxplot(data=tips[['total_bill', 'tip']], ax=axes[0])
Each axes-level function also returns the Axes on which it drew. If an Axes was passed to the ax parameter, that same object is returned. You can then customize it further with methods such as Axes.set_xlabel() and Axes.set_ylabel().
If no Axes is passed to the ax parameter, Seaborn draws on the current (active) Axes.
    fig, curr_axes = plt.subplots()
    scatter_plot_axes = sns.scatterplot(x='total_bill', y='tip', data=tips)
    id(curr_axes) == id(scatter_plot_axes)
    True
In the example above, even though we did not explicitly pass curr_axes to the ax parameter, Seaborn still drew on it because it was the currently active Axes. id(curr_axes) == id(scatter_plot_axes) returns True, showing that they are the same object.
If no Axes is passed to the ax parameter and there is no currently active Axes, Seaborn creates a new Axes, draws on it, and returns it.
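Both behaviours, creating an Axes on demand and customizing the returned Axes, can be seen in a few lines. This is a minimal sketch, assuming a small hypothetical DataFrame in place of tips and using plt.close("all") to guarantee that no Axes is active beforehand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Small hypothetical dataset standing in for `tips`.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 4.0, 6.0],
})

plt.close("all")                    # make sure no figure or Axes is active
print(plt.get_fignums())            # list of open figure numbers

ax = sns.scatterplot(x="total_bill", y="tip", data=df)   # no ax= given
print(plt.get_fignums())            # a figure was created on demand
print(ax is plt.gca())              # the returned Axes is now the active one

# The returned Axes is a regular Matplotlib object and can be customized.
ax.set_xlabel("Total bill")
ax.set_ylabel("Tip")
ax.set_title("Tip vs. total bill")
```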
Axes-level functions in Seaborn have no parameter for controlling the size of the plot. However, because we can choose the Axes to draw on, we can control the size by creating a figure of the desired size and passing its Axes to the ax parameter, as shown below.
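    fig, axes = plt.subplots(1, 1, figsize=(10, 5))
    # The call was truncated in the source; the arguments follow the earlier examples.
    sns.scatterplot(x='total_bill', y='tip', data=tips, ax=axes)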
2. Figure-level functions
When exploring multi-dimensional data, one of the most common visualization tasks is drawing the same type of plot for each subset of the data.
The figure-level functions in Seaborn are tailored to this situation.
A figure-level function has complete control over the entire figure: each call creates a new figure that can contain multiple Axes.
The three most important figure-level building blocks in Seaborn are the FacetGrid, PairGrid, and JointGrid classes.
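The statement that every call creates its own figure can be verified with FacetGrid itself. The sketch below assumes a small hypothetical DataFrame; height and aspect are the FacetGrid parameters that set the size of each facet, so the figure size follows from them and the number of facets.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Small hypothetical dataset standing in for `tips`.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 4.0, 6.0],
    "smoker": ["Yes", "No", "Yes", "No"],
})

# Each grid owns a brand-new figure.
one_facet = sns.FacetGrid(df, height=3, aspect=2)
two_facets = sns.FacetGrid(df, col="smoker", height=3, aspect=2)

print(one_facet.figure is two_facets.figure)
print(one_facet.axes.shape, two_facets.axes.shape)
# Figure width = number of columns * height * aspect; figure height = number of rows * height.
print(one_facet.figure.get_size_inches(), two_facets.figure.get_size_inches())
```

This is the figure-level counterpart of passing figsize to plt.subplots(): the size is specified per facet rather than for the figure as a whole.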
(1) FacetGrid
Consider the following use case: we want to visualize the relationship between the total bill and the tip (with a scatter plot) on different subsets of the data. Each subset is defined by a unique combination of the values of the following variables:
Day of the week (Thursday, Friday, Saturday, Sunday)
Smoker (yes or no)
Sex (male or female)
As shown below, this can be done with Matplotlib and Seaborn:
    row_variable = 'day'
    col_variable = 'smoker'
    hue_variable = 'sex'
    row_variables = tips[row_variable].unique()
    col_variables = tips[col_variable].unique()
    num_rows = row_variables.shape[0]
    num_cols = col_variables.shape[0]
    # Step 1: one Axes per (day, smoker) combination.
    fig, axes = plt.subplots(num_rows, num_cols, sharex=True, sharey=True, figsize=(15, 10))
    # Step 2: split the dataset into subsets.
    subset = tips.groupby([row_variable, col_variable])
    for row in range(num_rows):
        for col in range(num_cols):
            ax = axes[row][col]
            row_id = row_variables[row]
            col_id = col_variables[col]
            ax_data = subset.get_group((row_id, col_id))
            # Step 3: draw the subset on its Axes. The arguments of this call were lost
            # in the source and are reconstructed from the surrounding text.
            sns.scatterplot(x='total_bill', y='tip', data=ax_data, hue=hue_variable, ax=ax)
            title = row_variable + ' : ' + row_id + ' | ' + col_variable + ' : ' + col_id
            ax.set_title(title)
The code above breaks down into three steps:
Create an Axes (subplot) for each subset of the data.
Divide the dataset into subsets.
On each Axes, draw a scatter plot using the subset that corresponds to it.
In Seaborn, these three steps can be reduced to two:
Step 1 is done by FacetGrid().
Steps 2 and 3 are done by FacetGrid.map().
With FacetGrid, we create the grid of Axes and use the row, col, and hue parameters to split the dataset along three dimensions. Once the FacetGrid exists, we pass a plotting function to FacetGrid.map() to draw the same type of plot on every Axes, together with the names of the DataFrame columns to plot.
    facet_grid = sns.FacetGrid(row='day', col='smoker', hue='sex', data=tips, height=2, aspect=2.5)
    facet_grid.map(sns.scatterplot, 'total_bill', 'tip')
    facet_grid.add_legend()
Matplotlib provides good support for drawing on multiple Axes, while Seaborn goes a step further and links the structure of the figure directly to the structure of the dataset.
With FacetGrid, we do not have to create an Axes for each subset explicitly, nor split the data into subsets ourselves. These tasks are handled internally by FacetGrid() and FacetGrid.map(), respectively.
We can pass different axes-level functions to FacetGrid.map().
In addition, Seaborn provides three figure-level functions (high-level interfaces) that use FacetGrid() and FacetGrid.map() under the hood:
relplot()
catplot()
lmplot()
Each of these figure-level functions uses FacetGrid() to create the grid of Axes, takes the choice of axes-level function through the kind parameter, and passes that function to FacetGrid.map() internally. The three functions differ in which axes-level functions they can dispatch to:
relplot(): FacetGrid() + lineplot() / scatterplot()
catplot(): FacetGrid() + stripplot() / swarmplot() / boxplot() / boxenplot() / violinplot() / pointplot() / barplot() / countplot()
lmplot(): FacetGrid() + regplot()
Using FacetGrid explicitly gives more flexibility than going through the high-level interfaces relplot(), catplot(), or lmplot(). For example, with FacetGrid() we can pass a custom function to FacetGrid.map(), whereas with the high-level interfaces the kind parameter can only select one of the built-in axes-level functions. If you do not need this flexibility, you can use the high-level interfaces directly.
    # The x and y arguments were lost or garbled in the source and are reconstructed here.
    grid = sns.relplot(x='total_bill', y='tip', row='day', col='smoker', hue='sex', data=tips, kind='scatter', height=3, aspect=2.0)

    sns.catplot(col='day', kind='box', data=tips, x='sex', y='total_bill', hue='smoker', height=6, aspect=0.5)

    sns.lmplot(col='day', data=tips, x='total_bill', y='tip', hue='sex', height=6, aspect=0.5)
(2) PairGrid
PairGrid is used to draw pairwise relationships between the variables in a dataset. Each subplot shows the relationship between one pair of variables. Consider the following use case: we want to visualize the relationship between every pair of variables with a scatter plot. This can be done in plain Matplotlib, but it is far more convenient with Seaborn.
    iris = sns.load_dataset('iris')
    g = sns.PairGrid(iris)
The implementation consists of two steps:
Create an Axes for each pair of variables.
On each Axes, draw a scatter plot using the data for that pair of variables.
Step 1 is done by PairGrid(). Step 2 is done by PairGrid.map().
In other words, PairGrid() creates an Axes for each pair of variables, and PairGrid.map() draws on each Axes using the data for the corresponding pair. We can pass different axes-level functions to PairGrid.map().
    grid = sns.PairGrid(iris)
    grid.map(sns.scatterplot)

    grid = sns.PairGrid(iris, diag_sharey=True, despine=False)
    grid.map_lower(sns.scatterplot)
    grid.map_diag(sns.kdeplot)

    grid = sns.PairGrid(iris, hue='species')
    # histplot() is used here in place of distplot(), which is deprecated since Seaborn 0.11.
    grid.map_diag(sns.histplot)
    grid.map_offdiag(sns.scatterplot)
The grid does not have to be square: rows and columns can be defined with separate lists of variables.
    x_vars = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    y_vars = ['sepal_length']
    grid = sns.PairGrid(iris, hue='species', x_vars=x_vars, y_vars=y_vars, height=3)
    grid.map_offdiag(sns.scatterplot, s=150)
    # grid.map_diag(sns.kdeplot)
    grid.add_legend()
(3) JointGrid
JointGrid is used when we want to draw the joint and marginal distributions of two variables in the same figure. The joint distribution can be visualized with a scatter plot, regplot, or kdeplot. The marginal distributions can be visualized with histograms and/or KDE plots.
The axes-level function for the joint distribution is passed to JointGrid.plot_joint().
The axes-level function for the marginal distributions is passed to JointGrid.plot_marginals(). JointGrid.plot() accepts both functions in a single call.
    # histplot() is used below in place of distplot(), which is deprecated since Seaborn 0.11.
    grid = sns.JointGrid(x='total_bill', y='tip', data=tips, height=8)
    grid.plot(sns.regplot, sns.histplot)

    grid = sns.JointGrid(x='total_bill', y='tip', data=tips, height=8)
    grid = grid.plot_joint(plt.scatter, color='.5', edgecolor='white')
    grid = grid.plot_marginals(sns.histplot, kde=True, color='.5')

    g = sns.JointGrid(x='total_bill', y='tip', data=tips, height=8)
    g = g.plot_joint(plt.scatter, color='g', marker=r'$\clubsuit$', edgecolor='white', alpha=.6)
    # The marginal Axes are ordinary Matplotlib Axes and can be drawn on directly.
    _ = g.ax_marg_x.hist(tips['total_bill'], color='b', alpha=.36, bins=np.arange(0, 60, 5))
    _ = g.ax_marg_y.hist(tips['tip'], color='r', alpha=.36, orientation='horizontal', bins=np.arange(0, 12, 1))
Add an annotation with a statistic that summarizes the bivariate relationship:
    from scipy import stats
    g = sns.JointGrid(x='total_bill', y='tip', data=tips, height=8)
    g = g.plot_joint(plt.scatter, color='b', alpha=0.36, s=40, edgecolor='white')
    g = g.plot_marginals(sns.histplot, kde=False, color='g')
    # R^2 is the square of the Pearson correlation coefficient.
    r_squared = stats.pearsonr(tips['total_bill'], tips['tip'])[0] ** 2
    # JointGrid.annotate() is no longer available in recent Seaborn releases,
    # so the label is written onto the joint Axes with Matplotlib instead.
    g.ax_joint.text(0.05, 0.95, f'$R^2$: {r_squared:.2f}', transform=g.ax_joint.transAxes, va='top', fontsize=12)
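The R² statistic used in the annotation is the squared Pearson correlation coefficient, which for a straight-line fit equals the share of variance in y explained by x. The sketch below checks this identity with NumPy only; the five data points are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Hypothetical bill/tip pairs, for illustration only.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([1.5, 3.0, 3.5, 6.0, 7.0])

# Route 1: square of the Pearson correlation coefficient.
r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2

# Route 2: 1 - SS_res / SS_tot for the least-squares line through the points.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared_fit = 1 - ss_res / ss_tot

print(r_squared, r_squared_fit)   # the two routes agree up to rounding error
```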
III. Summary
Exploratory Data Analysis (EDA) involves two basic steps:
Data analysis (data preprocessing, cleaning and processing).
Data visualization (using different types of diagrams to show the relationships in the data).
The integration of Seaborn and Pandas makes it possible to draw complex multi-dimensional plots with very little code.
Every plotting function in Seaborn is either an axes-level function or a figure-level function.
An axes-level function draws on a single Matplotlib Axes and does not affect the rest of the figure.
A figure-level function controls the entire figure.