2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article walks through an introductory case study of Python data visualization. The material is simple and easy to follow, so work through it step by step alongside the explanations.
First, which libraries do we use for plotting?
Matplotlib
matplotlib is the most basic Python visualization library. Python data visualization generally starts with matplotlib and then branches out to other libraries.
Seaborn
seaborn is a higher-level visualization library built on matplotlib, aimed at variable and feature selection in data mining and machine learning. With short code, seaborn can draw charts that describe data with more dimensions.
Other libraries include
Bokeh (a library for interactive visualization in the browser, which lets analysts interact with the data), Mapbox (a more powerful visualization toolkit for geographic data), and others.
This article mainly uses matplotlib for the case study.
Step 1: identify the problem and select the graphic
The business problem may be complex, but once it is broken down we need to identify the specific question we want the chart to answer. To train this kind of analytical thinking, study the methods in The McKinsey Way and The Pyramid Principle.
Summaries of how to choose a chart type are widely available online.
In Python, four basic visual elements cover most charts:
Points: scatter plot, for 2D data and simple 2D relationships
Lines: line plot, for 2D data such as time series
Bars: bar plot, for 2D data such as category statistics
Color: heatmap, for displaying a third dimension
Data can express distribution, composition, comparison, relationship, and trend over time. Choose the chart type that matches the relationship you want to show.
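As a rough sketch of the four elements, the snippet below draws one chart of each kind on a 2x2 grid. The data is random and purely illustrative, and the Agg backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; the data is illustrative only
x = np.arange(20)
y = rng.normal(size=20).cumsum()
categories = ["a", "b", "c", "d"]
counts = [5, 3, 8, 2]
grid = rng.random((4, 6))

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].scatter(x, y)            # points: simple 2D relationship
axes[0, 0].set_title("scatter")

axes[0, 1].plot(x, y)               # lines: ordered data such as a time series
axes[0, 1].set_title("line")

axes[1, 0].bar(categories, counts)  # bars: one value per category
axes[1, 0].set_title("bar")

# color: each cell's value is shown as a color, i.e. a third dimension
heat = axes[1, 1].imshow(grid, cmap="viridis", aspect="auto")
axes[1, 1].set_title("heatmap")
fig.colorbar(heat, ax=axes[1, 1])

fig.tight_layout()
```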
Step 2: convert data and apply functions
A great deal of the programming work in data analysis and modeling goes into data preparation: loading, cleaning, transforming, and reshaping. Before visualizing, we likewise need to tidy the data, convert it to the format we need, and then apply the plotting method.
Here are some common data conversion methods:
Merging: merge, concat, combine_first (similar to a full outer join in a database)
Reshaping: reshape; pivoting: pivot (similar to an Excel PivotTable)
Deduplication: drop_duplicates
Mapping: map
Filling and replacing: fillna, replace
Renaming axis indexes: rename
Others: get_dummies, which converts a categorical variable into a dummy-variable matrix; capping the values of a DataFrame column at a limit; and so on.
For the plotting function itself, look up the Python function that corresponds to the chart type chosen in step 1.
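The conversion methods listed above might be combined as in the following sketch. The two small tables and their column names (orders, names, city, amount) are invented for illustration, and clip is used here as one way to cap a column at a limit.

```python
import pandas as pd

# Hypothetical sample data: one duplicated row and one missing city
orders = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "city": ["bj", "sh", "sh", None],
    "amount": [10.0, 250.0, 250.0, 40.0],
})
names = pd.DataFrame({"id": [1, 2, 3], "name": ["ann", "bob", "cat"]})

df = orders.drop_duplicates()                        # drop the repeated row for id 2
df = df.merge(names, on="id", how="left")            # database-style join on "id"
df["city"] = df["city"].fillna("unknown")            # fill missing values
df["region"] = df["city"].map({"bj": "north", "sh": "east"})  # map values through a dict
df = df.rename(columns={"amount": "amount_usd"})     # rename a column
df["amount_usd"] = df["amount_usd"].clip(upper=100)  # cap the column at 100
dummies = pd.get_dummies(df["city"])                 # dummy-variable matrix, one column per city
```

Each step returns a new object rather than modifying the input in place, which keeps the original tables available for comparison.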
Step 3: set parameters so the chart is clear at a glance
Once the basic chart is drawn, we can adjust the color (color), line style (linestyle), and marker (marker), or other decorations such as the title (title), axis labels (xlabel, ylabel), axis ticks (set_xticks), and legend (legend), to make the chart easier to read.
Step 3 builds on step 2: it is the finishing work that makes the chart clearer. The specific parameters are documented with each plotting function.
Visualization basics
Matplotlib basics
# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Figure and Subplot
All matplotlib graphics live in a Figure (the canvas), and each Subplot provides a plotting area within it. You cannot draw on an empty Figure directly; you must first create one or more subplots, for example with add_subplot.
The figsize parameter specifies the figure size.
# create the canvas
fig = plt.figure()
# create a subplot; 221 means the first plot in a grid of 2 rows and 2 columns
ax1 = fig.add_subplot(221)
# it is now more common to create the canvas and the subplots in one call;
# 2, 2 means a 2x2 canvas that can hold four plots
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
# the sharex and sharey parameters of plt.subplots make all subplots use the same x- and y-axis ticks
Color, marker, and line style
matplotlib's plot function accepts a set of x and y coordinates, plus an optional format string that abbreviates the color and line style: 'g--' means the color is green and the line style is dashed ('--'). You can also specify them explicitly with keyword arguments.
Lines can also carry markers (marker) to highlight the position of each data point. The marker can go in the format string as well, conventionally after the color and before the line style, as in 'go--'.
plt.plot(np.random.randn(30), color='g', linestyle='--', marker='o')
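To make the two spellings concrete, the sketch below draws one line with the format string 'go--' and another with explicit keyword arguments, then relies on both producing the same style. The variable names and the random data are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(1).normal(size=30)  # illustrative data

fig, ax = plt.subplots()
# Format string: color 'g', marker 'o', line style '--'
(short,) = ax.plot(data, "go--")
# The same style spelled out with keyword arguments
(explicit,) = ax.plot(data + 3, color="g", marker="o", linestyle="--")
```

The keyword form is longer but easier to read, and it is the only option for colors that have no one-letter abbreviation.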
Ticks, labels and legends
The xlim and xticks functions of plt control the axis range and the tick positions (xticks can also set the tick labels). On an Axes object the corresponding methods are set_xlim, set_xticks, and set_xticklabels.
Called without arguments, these functions return the current value; called with arguments, they set it.
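The getter and setter behavior can be checked with a short sketch like this one, which reads the x-axis limits, changes them, and reads them again. The plotted data is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(30))

current = plt.xlim()   # no argument: returns the current (left, right) limits
plt.xlim([0, 15])      # with an argument: sets the limits
updated = plt.xlim()   # reads back the limits that were just set
```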
plt.plot(np.random.randn(30), color='g', linestyle='--', marker='o')
plt.xlim([0, 15])  # x-axis range changed to 0-15
Set the title, axis labels, ticks, and tick labels
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum())
ticks = ax.set_xticks([0, 250, 500, 750, 1000])  # set the tick positions
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'])  # set the tick labels
ax.set_title('My first Plot')  # set the title
ax.set_xlabel('Stage')  # set the x-axis label
Add a legend
The legend is another important tool for identifying chart elements. Pass the label parameter when you add each line to the subplot.
fig = plt.figure(figsize=(12, 5))
ax = fig.add_subplot(111)
# pass the label parameter to define each line's legend name (the names here are illustrative)
ax.plot(np.random.randn(1000).cumsum(), 'k', label='one')
ax.plot(np.random.randn(1000).cumsum(), 'k--', label='two')
ax.plot(np.random.randn(1000).cumsum(), 'k.', label='three')
# once the lines are drawn, calling legend displays the labels
ax.legend(loc='best')
# unless the placement requirements are strict, loc='best' lets matplotlib choose the location
Annotations
In addition to the standard chart objects, we can add our own text notes or arrows.
Annotations can be added with functions such as text, arrow, and annotate. The text function draws text at the specified x, y coordinates and accepts custom formatting.
plt.plot(np.random.randn(1000).cumsum())
plt.text(600, 10, 'test', family='monospace', fontsize=10)  # 'test' is placeholder annotation text
# Chinese annotations do not display correctly in the default environment; the configuration must be changed to use a font that supports Chinese. Look up the specific steps for your setup.
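For arrows, annotate is usually more convenient than text because it takes both the point being annotated and the position of the label. The following is a minimal sketch under assumed data (a square-root curve) with made-up label strings and offsets.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

y = np.sqrt(np.arange(100))  # illustrative curve whose maximum is at the last point
peak = int(np.argmax(y))

fig, ax = plt.subplots()
ax.plot(y)

# Plain text placed at data coordinates (10, 8)
txt = ax.text(10, 8, "plain text", family="monospace", fontsize=10)

# Label with an arrow: xy is the point being annotated, xytext is where the label sits
ann = ax.annotate(
    "maximum",
    xy=(peak, y[peak]),
    xytext=(peak - 40, y[peak] - 3),
    arrowprops=dict(arrowstyle="->"),
)
```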
Save the chart to a file
Use plt.savefig to save the current chart to a file. For example, to save the chart as a PNG file, pass plt.savefig a file path that ends in .png.
The file type is inferred from the extension. Other parameters include:
fname: a string containing the file path; the extension determines the file type
dpi: resolution in dots per inch, 100 by default
facecolor, edgecolor: background color of the image, 'w' (white) by default
format: explicitly sets the file format ('png', 'pdf', 'svg', 'ps', 'jpg', etc.)
bbox_inches: the part of the chart to keep. If set to 'tight', savefig tries to trim the white space around the image
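The parameters above could be used together as in this sketch, which writes a PNG into a temporary directory and an SVG into an in-memory buffer. The output locations and the dpi value are arbitrary choices for illustration.

```python
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# File path: the .png extension selects the format
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "plot.png")
fig.savefig(png_path, dpi=150, bbox_inches="tight", facecolor="w")

# File-like object: there is no extension, so the format is given explicitly
buffer = io.BytesIO()
fig.savefig(buffer, format="svg")
```

Writing to a buffer is handy when the image is sent over a network or embedded in a report rather than stored on disk.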
plt.savefig('./plot.jpg')  # save the chart as a JPG image named plot
Drawing functions in pandas
Matplotlib drawing
matplotlib is a relatively low-level tool: assembling a chart means calling each basic component separately. pandas provides many higher-level plotting methods built on matplotlib, so a chart that would otherwise take many lines of code needs only a few with pandas.
Here we use the plotting functions built into pandas.
import matplotlib.pyplot as plt
Line chart
Both Series and DataFrame have a plot method for generating various chart types. By default, it generates a line chart.
s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
s.plot()  # the index of the Series is passed to matplotlib as the x-axis
df = pd.DataFrame(np.random.randn(10, 4).cumsum(0), columns=['A', 'B', 'C', 'D'])
df.plot()  # plot automatically uses a different color for each column and adds a legend
Parameters of the Series.plot method
label: the label used in the legend
style: style string, such as 'g--'
alpha: fill opacity of the chart (0-1)
kind: chart type (bar, line, hist, kde, etc.)
xticks: sets the x-axis tick values
yticks: sets the y-axis tick values
xlim, ylim: sets the axis limits, such as [0, 10]
grid: displays grid lines, off by default
rot: rotates the tick labels
use_index: uses the index of the object as tick labels
logy: uses a logarithmic scale on the y-axis
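Several of these parameters can be combined in a single call. The sketch below is one possible combination on made-up positive data (positive so that the logarithmic y-axis is valid); the label text and limits are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Strictly positive, increasing values so a log-scaled y-axis makes sense
values = np.random.default_rng(2).random(10).cumsum() + 1
s = pd.Series(values, index=np.arange(0, 100, 10))

ax = s.plot(
    kind="line",
    style="g--",          # green dashed line
    label="cumulative",   # name shown in the legend
    alpha=0.7,
    xlim=[0, 90],
    grid=True,
    rot=45,               # rotate the tick labels by 45 degrees
    logy=True,            # logarithmic y-axis
)
ax.legend()
```

The call returns the matplotlib Axes, so any further tuning can continue with the usual Axes methods.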
Parameters of the DataFrame.plot method
In addition to the Series parameters, DataFrame.plot has some options of its own.
subplots: draws each DataFrame column in a separate subplot
sharex, sharey: share the x- or y-axis across subplots
figsize: controls the figure size
title: the chart title
legend: adds a legend, shown by default
sort_columns: draws the columns in alphabetical order, using the existing order by default (deprecated in pandas 1.5 and removed in pandas 2.0)
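As an illustration of the DataFrame-specific options, this sketch draws four random walks, one per subplot, with a shared x-axis. The column names, title, and figure size are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

walks = np.random.default_rng(3).normal(size=(10, 4)).cumsum(axis=0)
df = pd.DataFrame(walks, columns=list("ABCD"))

# subplots=True returns one Axes per column instead of a single Axes
axes = df.plot(
    subplots=True,
    sharex=True,
    figsize=(8, 6),
    title="Random walks",
    legend=True,
)
fig = axes[0].get_figure()
```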
Bar chart
Add kind='bar' or kind='barh' to the code that generates a line chart to get a vertical or horizontal bar chart instead.
fig, axes = plt.subplots(2, 1)
data = pd.Series(np.random.rand(10), index=list('abcdefghij'))
data.plot(kind='bar', ax=axes[0], rot=0, alpha=0.3)
data.plot(kind='barh', ax=axes[1], grid=True)
Bar charts have one very practical use:
Combine them with value_counts to show how often each value occurs in a Series or DataFrame.
For example, df.value_counts().plot(kind='bar')
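A small sketch of the value_counts pattern on a Series follows. The letter grades are invented sample data, chosen so that each value has a distinct frequency.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

grades = pd.Series(list("aaabbc"))   # hypothetical categorical data
counts = grades.value_counts()       # frequency of each value, most frequent first
ax = counts.plot(kind="bar", rot=0)  # one bar per distinct value
ax.set_ylabel("frequency")
```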
That covers the basic syntax of Python visualization; other chart types are drawn in much the same way.
The key is to follow the three steps: think through the problem, choose the chart, and apply the functions. Practice will make you more proficient.
Thank you for reading. That concludes this introductory case study of Python data visualization; the specifics are best confirmed through practice.