Shulou (Shulou.com), 06/02 report. Updated 2025-01-19, SLTechnology News&Howtos.
How to make efficient use of the Python visualization tool Matplotlib: this article analyzes the problem in detail and walks through a practical approach, in the hope of giving readers who face it a simpler path.
Matplotlib is one of the most commonly used visualization tools in Python. It makes it easy to create a wide range of 2D charts and some basic 3D charts. This article covers some of the challenges you face when learning Matplotlib, explains why it is still worth using, and recommends a sequence of steps for learning it.
Brief introduction
For beginners, entering the Python visualization world can sometimes be frustrating. Python has many different visualization tools, and choosing the right one is sometimes a challenge. After working with tools such as pandas, scikit-learn, and seaborn in the Python data science stack, I felt I had been a bit premature in dismissing matplotlib. To be honest, I did not understand matplotlib well before, and I did not know how to use it effectively in my workflow.
Now that I have taken the time to learn some of these tools and how to use them with matplotlib, I have started to see matplotlib as an indispensable tool. This article shows how I use matplotlib and gives some advice to beginners and to users who have not had time to learn it. I firmly believe that matplotlib is an essential part of the Python data science stack, and I hope this article helps you understand how to use it for your own visualizations.
Why all the negativity about matplotlib?
In my opinion, there are several main reasons why new users find matplotlib challenging to learn.
First, matplotlib has two interfaces. The first is based on MATLAB and uses a state-based interface. The second is an object-oriented interface. The reasons for having two interfaces are beyond the scope of this article, but knowing that there are two approaches is vitally important when plotting with matplotlib.
The two interfaces cause confusion because, with the Stack Overflow community and the wealth of information available through Google searches, new users run into multiple solutions to problems that look similar but are not the same. I can speak from experience: looking back at some of my old code, I find a mishmash of matplotlib styles that is confusing even to me, and I wrote it.
Key points
New users of matplotlib should learn and use the object-oriented interface.
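To make the difference between the two interfaces concrete, here is a minimal sketch that draws the same line both ways. The data in `values` is made up for illustration, and the `Agg` backend is selected only so the snippet runs without a display; neither comes from the original article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

values = [3, 1, 2]  # hypothetical data, for illustration only

# State-based (MATLAB-style) interface: pyplot tracks a "current" figure
# and axes behind the scenes, and every call acts on them implicitly.
plt.figure()
plt.plot(values)
plt.title("State-based interface")
state_axes = plt.gca()  # grab the implicit current axes so we can inspect it

# Object-oriented interface: you hold the Figure and Axes objects yourself
# and call methods on them directly.
fig, ax = plt.subplots()
ax.plot(values)
ax.set_title("Object-oriented interface")

print(state_axes.get_title())
print(ax.get_title())
```

Both halves produce an equivalent chart. The practical difference is that in the second half it is always explicit which plot a call modifies, which matters as soon as there is more than one.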
Another historic challenge for matplotlib is that some of the default style choices were quite unattractive. In the R world, you can use ggplot to generate some pretty cool plots, and matplotlib's defaults looked a little ugly by comparison. The good news is that matplotlib 2.0 has much nicer default styles and a very convenient way to theme your visualizations.
I think the third challenge with matplotlib is deciding whether to use matplotlib directly or to use a tool built on top of it, such as pandas or seaborn. Any time there are many ways to do something, it is hard for novices or infrequent users to follow the right path. Couple this confusion with the two different APIs and you have a recipe for frustration.
Why stick with matplotlib?
Despite these problems, I am glad to have matplotlib because it is very powerful. The library allows you to create almost any visualization you can imagine. In addition, there is a rich ecosystem of Python tools built around it, and many of the more advanced visualization tools use matplotlib as their base library. If you do any work in the Python data science stack, you will need a basic understanding of how to use matplotlib. That is the focus of the rest of this article: developing a basic approach for using matplotlib effectively.
Basic premise
If you take nothing else away from this article, I recommend the following steps for learning how to use matplotlib:
Learn the basic matplotlib terminology, specifically what a Figure and an Axes are
Always use the object-oriented interface, and get in the habit of using it from the start
Start your visualizations with basic pandas plotting
Use seaborn for more complex statistical visualizations
Use matplotlib to customize the pandas or seaborn visualization
This picture from the matplotlib FAQ is a classic and makes it easy to understand the different terms for the parts of a plot.
Most of the terms are straightforward, but the main thing to remember is that the Figure is the final image and may contain one or more Axes. Each Axes represents an individual plot (not to be confused with an axis, which is the x or y axis of a plot). Once you understand these objects and how to access them through the object-oriented API, the rest of the process starts to fall into place.
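The relationship between a Figure and its Axes can be inspected directly. The following is a small sketch, not taken from the original article, that creates one figure holding two plots and then walks the objects; the `Agg` backend is an assumption made so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# One Figure (the whole image) containing two Axes (two individual plots)
fig, axes = plt.subplots(nrows=2, ncols=1)

print(type(fig).__name__)  # the container object
print(len(fig.axes))       # how many plots the figure holds

# Each Axes knows which Figure it belongs to and owns its own x and y axis
first_plot = axes[0]
print(first_plot.figure is fig)
print(type(first_plot.xaxis).__name__, type(first_plot.yaxis).__name__)
```

Note the naming: `first_plot` is an Axes (a whole plot), while `first_plot.xaxis` and `first_plot.yaxis` are the individual axis objects inside it.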
This knowledge of terminology has another advantage: when you look at something online, you have a starting point. If you take the time to understand it, the rest of the matplotlib API will start to make sense. In addition, many advanced Python packages, such as seaborn and ggplot, rely on matplotlib, so learning the basics makes the more powerful frameworks easier to pick up.
To be clear, I am not saying that you should avoid other good tools such as ggplot (aka ggpy), bokeh, plotly, or altair. I just think you need a basic understanding of matplotlib + pandas + seaborn to start with. Once you understand the basic visualization techniques, you can explore other tools and make informed choices according to your own needs.
Getting started
The rest of this article is an introductory tutorial on how to create a basic visualization in pandas and customize the most common items using matplotlib. Once you understand the basic process, further customization is relatively straightforward.
I will focus on the most common plotting tasks I encounter, such as labeling axes, adjusting limits, updating plot titles, saving figures, and adjusting legends. If you would like to follow along, the notebook at https://github.com/chris1610/pbpython/blob/master/notebooks/Effectively-Using-Matplotlib.ipynb includes additional details that should be helpful.
To get started, I will import the libraries and read in some data:
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter

    df = pd.read_excel("https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=true")
    df.head()
The data consists of sales transactions for 2014. To keep it short, I will summarize the data to show the total sales and number of purchases for the top ten customers. I will also rename the columns for clarity in the plots.
    # Select the two columns with a list (double brackets), then aggregate per customer
    top_10 = (df.groupby('name')[['ext price', 'quantity']]
                .agg({'ext price': 'sum', 'quantity': 'count'})
                .sort_values(by='ext price', ascending=False))[:10].reset_index()
    top_10.rename(columns={'name': 'Name', 'ext price': 'Sales', 'quantity': 'Purchases'},
                  inplace=True)
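For readers who want to try the aggregation without downloading the spreadsheet, here is a small self-contained sketch of the same idea. The `sales` frame and its customer names are invented for illustration; only the column names `name`, `quantity`, and `ext price` mirror the article's data. It also uses pandas named aggregation, which is an alternative spelling of the group-sum-rename steps rather than the approach used in the article.

```python
import pandas as pd

# Hypothetical transactions; customer names and numbers are made up
sales = pd.DataFrame({
    "name": ["Acme", "Acme", "Birch", "Cedar", "Cedar", "Cedar"],
    "quantity": [2, 5, 1, 4, 3, 7],
    "ext price": [20.0, 50.0, 15.0, 40.0, 30.0, 70.0],
})

# Named aggregation computes and names the output columns in one step:
# Sales is the summed 'ext price', Purchases is the number of transactions.
summary = (
    sales.groupby("name")
    .agg(Sales=("ext price", "sum"), Purchases=("quantity", "count"))
    .sort_values(by="Sales", ascending=False)
    .head(2)          # keep the top customers (top 2 here, top 10 in the article)
    .reset_index()
    .rename(columns={"name": "Name"})
)
print(summary)
```

As in the article's code, `Purchases` counts transactions per customer rather than summing the quantities.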
The result is a simple ten-row table with Name, Sales, and Purchases columns.
Now that the data is formatted as a simple table, let's look at how to plot these results as a bar chart.
As I mentioned earlier, matplotlib has many different styles available for rendering plots. You can see which ones are available on your system with plt.style.available.
    plt.style.available

    ['seaborn-dark', 'seaborn-dark-palette', 'fivethirtyeight', 'seaborn-whitegrid',
     'seaborn-darkgrid', 'seaborn', 'bmh', 'classic', 'seaborn-colorblind',
     'seaborn-muted', 'seaborn-white', 'seaborn-talk', 'grayscale', 'dark_background',
     'seaborn-deep', 'seaborn-bright', 'ggplot', 'seaborn-paper', 'seaborn-notebook',
     'seaborn-poster', 'seaborn-ticks', 'seaborn-pastel']

Using a style is as simple as:

    plt.style.use('ggplot')
I encourage you to try different styles and see which ones you like.
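One convenient way to compare styles without changing the global setting is the `plt.style.context` context manager, which applies a style only inside a `with` block. The sketch below is an illustration rather than part of the original article: the three style names are ones that ship with matplotlib, the bar data is made up, and the `Agg` backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# Only try styles that are actually installed on this system
candidates = [s for s in ("ggplot", "bmh", "fivethirtyeight")
              if s in plt.style.available]

figures = {}
for style_name in candidates:
    # The style applies only inside this block; global settings are untouched
    with plt.style.context(style_name):
        fig, ax = plt.subplots()
        ax.bar(["a", "b", "c"], [3, 1, 2])  # hypothetical data
        ax.set_title(style_name)
        figures[style_name] = fig

print(sorted(figures))
```

Each figure in `figures` can then be displayed or saved side by side to pick a favorite before committing to it with `plt.style.use`.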
Now that we have a nicer style in place, the first step is to plot the data using the standard pandas plotting function:

    top_10.plot(kind='barh', y="Sales", x="Name")

I recommend using pandas plotting first because it is a quick and easy way to prototype a visualization. Since most people have probably already done some data manipulation or analysis in pandas, basic plots are a natural starting point.
Customizing the plot
Assuming you are comfortable with the gist of the plot, the next step is to customize it. Simple customizations, such as adding a title and labels, are easy to do with the pandas plot function. However, you will probably find your needs outgrowing that functionality at some point. That is why I recommend getting in the habit of doing this:
    fig, ax = plt.subplots()
    top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)

The resulting plot looks exactly the same as the original, but we added an additional call to plt.subplots() and passed ax to the plotting function. Why do this? Remember when I said it is critical to get access to the axes and figures in matplotlib? That is what we have accomplished here. Any future customization will be done through the ax or fig objects.
We have the benefit of a quick plot from pandas, and now we also have full access to the power of matplotlib. An example shows what this makes possible. In addition, by using this naming convention, it is fairly straightforward to adapt other people's solutions to your own needs.
Suppose we want to tweak the x limits and change some axis labels. Now that the axes are stored in the ax variable, we have a lot of control:
    fig, ax = plt.subplots()
    top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
    ax.set_xlim([-10000, 140000])
    ax.set_xlabel('Total Revenue')
    ax.set_ylabel('Customer')

Here is a shortcut that sets the title and both labels in one call:

    fig, ax = plt.subplots()
    top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
    ax.set_xlim([-10000, 140000])
    ax.set(title='2014 Revenue', xlabel='Total Revenue', ylabel='Customer')
To further demonstrate this approach, we can also adjust the size of the figure. With the plt.subplots() function, you can define figsize in inches. You can also remove the legend with ax.legend().set_visible(False).

    fig, ax = plt.subplots(figsize=(5, 6))
    top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
    ax.set_xlim([-10000, 140000])
    ax.set(title='2014 Revenue', xlabel='Total Revenue')
    ax.legend().set_visible(False)
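As a small illustration of what figsize means, the sketch below reads the size back from the figure and converts it to pixels using the figure's dpi. It also shows an alternative to hiding the legend afterwards: passing `legend=False` to the pandas plot call so the legend is never created. The `demo` data is made up, and the `Agg` backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the top_10 table
demo = pd.DataFrame({"Name": ["Acme", "Birch", "Cedar"],
                     "Sales": [70.0, 15.0, 140.0]})

fig, ax = plt.subplots(figsize=(5, 6))  # width, height in inches

# legend=False asks pandas not to create a legend in the first place
demo.plot(kind="barh", y="Sales", x="Name", ax=ax, legend=False)

width_in, height_in = fig.get_size_inches()
print(width_in, height_in)

# Pixel size is inches times dots per inch; dpi depends on your settings
print(width_in * fig.dpi, height_in * fig.dpi)
print(ax.get_legend())  # None, since no legend was created
```

Either way of dropping the legend works; `legend=False` is simply shorter when you know up front that you do not want one.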
There are plenty of things you might want to do to clean up this plot. One of the biggest eyesores is the formatting of the Total Revenue numbers. Matplotlib can help with this through FuncFormatter, which applies a user-defined function to a tick value and returns a nicely formatted string to place on the axis.
Here is a currency formatting function that gracefully handles US dollar amounts in the several-hundred-thousand range:
    def currency(x, pos):
        'The two args are the value and tick position'
        if x >= 1000000:
            return '${:1.1f}M'.format(x*1e-6)
        return '${:1.0f}K'.format(x*1e-3)
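Before attaching a formatter to an axis, it can be useful to call it directly on a few values and look at the strings it produces. The sketch below repeats the `currency` function so that it is self-contained; the sample values are arbitrary and chosen only to exercise both branches.

```python
from matplotlib.ticker import FuncFormatter


def currency(x, pos):
    """Format a tick value as dollars in thousands (K) or millions (M)."""
    if x >= 1_000_000:
        return "${:1.1f}M".format(x * 1e-6)
    return "${:1.0f}K".format(x * 1e-3)


formatter = FuncFormatter(currency)

# A FuncFormatter is callable with (value, position), just like matplotlib
# calls it for each tick, so it can be tested without drawing anything.
samples = [0, 50_000, 125_000, 2_500_000]  # arbitrary test values
labels = [formatter(value, pos) for pos, value in enumerate(samples)]
print(labels)  # ['$0K', '$50K', '$125K', '$2.5M']
```

The `pos` argument is unused by this function, but matplotlib always passes it, so the signature needs to accept it.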
Now that we have a formatter function, we need to define it and apply it to the x axis. Here is the full code:

    fig, ax = plt.subplots()
    top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
    ax.set_xlim([-10000, 140000])
    ax.set(title='2014 Revenue', xlabel='Total Revenue', ylabel='Customer')
    formatter = FuncFormatter(currency)
    ax.xaxis.set_major_formatter(formatter)
    ax.legend().set_visible(False)
This is much nicer and is a good example of the flexibility you have to define your own solution to a problem.
The final customization feature I will go through is adding annotations to the plot. To draw a vertical line, use ax.axvline(). To add custom text, use ax.text().
In this example, we will draw a line showing the average and label three new customers. Here is the full code, with comments, to pull it all together.
    # Create the figure and the axes
    fig, ax = plt.subplots()

    # Plot the data and get the average
    top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
    avg = top_10['Sales'].mean()

    # Set limits and labels
    ax.set_xlim([-10000, 140000])
    ax.set(title='2014 Revenue', xlabel='Total Revenue', ylabel='Customer')

    # Add a line for the average
    ax.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

    # Annotate the new customers
    for cust in [3, 5, 8]:
        ax.text(115000, cust, "New Customer")

    # Format the currency
    formatter = FuncFormatter(currency)
    ax.xaxis.set_major_formatter(formatter)

    # Hide the legend
    ax.legend().set_visible(False)

While this may not be the most exciting plot, it shows how much control you have when following this approach.
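The annotation loop above places text at hard-coded row positions (3, 5, 8). A variation, sketched here on made-up data, is to look the positions up by customer name instead. It relies on the fact that a pandas horizontal bar plot draws the bar for row i at y position i. The `demo` frame and the `new_customers` set are hypothetical, and the `Agg` backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the top_10 table
demo = pd.DataFrame({"Name": ["Acme", "Birch", "Cedar", "Dune"],
                     "Sales": [70.0, 15.0, 140.0, 95.0]})
new_customers = {"Birch", "Dune"}  # hypothetical list of customers to flag

fig, ax = plt.subplots()
demo.plot(kind="barh", y="Sales", x="Name", ax=ax, legend=False)

# Vertical dashed line at the average
avg = demo["Sales"].mean()
ax.axvline(x=avg, color="b", linestyle="--", linewidth=1)

# Row i of the frame is drawn at y position i, so enumerate gives the
# y coordinate for each customer's label.
label_x = demo["Sales"].max() * 1.02  # just to the right of the longest bar
for position, name in enumerate(demo["Name"]):
    if name in new_customers:
        ax.text(label_x, position, "New Customer", va="center")

print(avg, len(ax.texts))
```

Looking positions up by name keeps the labels attached to the right bars if the sort order or the set of customers changes.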
Figures and plots
Up until now, all the changes we have made have been to a single plot. Fortunately, we can also add multiple plots to a figure and save the entire figure using various options.
If we decide to put two plots on the same figure, we should have a basic understanding of how to do it. First create the figure, then the axes, then plot it all together. We can accomplish this with plt.subplots():
    fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(7, 4))

In this example, I use nrows and ncols to specify the layout because this is clearer to new users. In sample code you will often see just the positional values, such as 1, 2. I find named parameters a little easier to interpret when looking at the code later.
I also use sharey=True so that the y axes share the same labels.
This example is also handy because the individual axes are unpacked into ax0 and ax1. With these axes, you can plot just as in the examples above, putting one plot on ax0 and the other on ax1.
    # Get the figure and the axes
    fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(7, 4))
    top_10.plot(kind='barh', y="Sales", x="Name", ax=ax0)
    ax0.set_xlim([-10000, 140000])
    ax0.set(title='Revenue', xlabel='Total Revenue', ylabel='Customers')

    # Plot the average as a vertical line
    avg = top_10['Sales'].mean()
    ax0.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

    # Repeat for the unit plot
    top_10.plot(kind='barh', y="Purchases", x="Name", ax=ax1)
    avg = top_10['Purchases'].mean()
    ax1.set(title='Units', xlabel='Total Units', ylabel='')
    ax1.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

    # Title the figure
    fig.suptitle('2014 Sales Analysis', fontsize=14, fontweight='bold')

    # Hide the legends
    ax1.legend().set_visible(False)
    ax0.legend().set_visible(False)
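To see what sharey=True does in practice, the following sketch changes the y limits on one plot and reads them back from the other. The bar values are made up, numeric y positions are used to keep the example minimal, and the `Agg` backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# Two side-by-side plots whose y axes are linked
fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(7, 4))

positions = [0, 1, 2]                   # hypothetical bar positions
ax0.barh(positions, [70.0, 15.0, 140.0])  # e.g. revenue per customer
ax1.barh(positions, [2, 1, 3])            # e.g. units per customer

# Changing the y limits on one Axes propagates to the other
ax0.set_ylim(-1, 3)
print(ax1.get_ylim())

# The x axes are not shared, so each plot keeps its own scale
print(ax0.get_xlim() == ax1.get_xlim())
```

Sharing the y axis is what lets both panels line up row by row while still showing quantities on very different x scales.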
Up until now, I have been relying on the Jupyter notebook to display the figures via the %matplotlib inline directive. However, there will be plenty of times when you need to save a figure in a specific format and integrate it with other content.
Matplotlib supports saving files in many different formats. You can use fig.canvas.get_supported_filetypes() to see which formats your system supports:
    fig.canvas.get_supported_filetypes()

    {'eps': 'Encapsulated Postscript',
     'jpeg': 'Joint Photographic Experts Group',
     'jpg': 'Joint Photographic Experts Group',
     'pdf': 'Portable Document Format',
     'pgf': 'PGF code for LaTeX',
     'png': 'Portable Network Graphics',
     'ps': 'Postscript',
     'raw': 'Raw RGBA bitmap',
     'rgba': 'Raw RGBA bitmap',
     'svg': 'Scalable Vector Graphics',
     'svgz': 'Scalable Vector Graphics',
     'tif': 'Tagged Image File Format',
     'tiff': 'Tagged Image File Format'}
Since we have the fig object, we can save the figure using several options:

    fig.savefig('sales.png', transparent=False, dpi=80, bbox_inches="tight")

This saves the figure as a PNG with an opaque background. It also specifies the resolution (dpi), and bbox_inches="tight" minimizes excess white space.
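The same fig object can be written out in several formats in one pass, because savefig infers the format from the file extension. The sketch below is an illustration on a throwaway plot: it checks each extension against the supported file types and writes into a temporary directory, so the file names and the choice of png, pdf, and svg are assumptions rather than anything from the original article. The `Agg` backend is assumed so it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8])  # throwaway data
ax.set(title="Demo")

supported = fig.canvas.get_supported_filetypes()
out_dir = tempfile.mkdtemp()  # temporary folder so nothing is overwritten

saved = []
for ext in ("png", "pdf", "svg"):
    if ext in supported:
        path = os.path.join(out_dir, "sales." + ext)
        # The output format is inferred from the file extension
        fig.savefig(path, transparent=False, dpi=80, bbox_inches="tight")
        saved.append(path)

for path in saved:
    print(os.path.basename(path), os.path.getsize(path), "bytes")
```

The dpi setting matters for raster formats such as PNG; vector formats such as PDF and SVG scale without loss.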
Conclusion
Hopefully this process will help you understand how to use matplotlib more effectively in your daily data analysis. If you get into the habit of using this method when doing analysis, you should be able to quickly customize any image you need.
As a bonus, I have included a quick guide that summarizes all of these concepts. I hope it helps tie this article together and serves as a handy reference in the future.