Shulou (Shulou.com) 06/01 report, SLTechnology News&Howtos; updated 2025-02-24
This article explains how the bins and rwidth parameters of Matplotlib's hist function work, step by step and with a running example.
Scene introduction
In machine learning projects, we often need to analyze the sample distribution of a dataset, which calls for drawing histograms.
In Python, the hist function of matplotlib.pyplot draws a histogram with a single call. However, the function takes many parameters, and a few details of the resulting plot need attention.
To start, assume a federated learning scenario. We have an image dataset with 15 samples and four labels: cat, dog, car, and ship. The dataset has been split unevenly across three task nodes (clients), as shown below:
    N_CLIENTS = 3
    num_cls, classes = 4, ['cat', 'dog', 'car', 'ship']
    # Label list of the dataset (14 entries as printed here)
    train_labels = [0, 3, 2, 0, 3, 1, 0, 3, 3, 1, 0, 3, 2, 2]
    # Partition of the samples across clients; slice(11, 15) stops at the end of the list
    client_idcs = [slice(0, 4), slice(4, 11), slice(11, 15)]
We want to visualize the sample distribution on each task node. A first attempt might look like this:
    import matplotlib.pyplot as plt
    import numpy as np

    plt.figure(figsize=(5, 3))
    # One list of labels per client; an integer bins value sets the number of bins
    plt.hist([train_labels[idc] for idc in client_idcs], stacked=False,
             bins=num_cls,
             label=["Client {}".format(i) for i in range(N_CLIENTS)])
    plt.xticks(np.arange(num_cls), classes)
    plt.legend()
    plt.show()
The resulting plot is as follows:
Here we find that the labels on the x-axis are not aligned with the bars above them (the three bars belonging to one image category together make up one bin). Fixing this requires adjusting the bins parameter.
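The misalignment can be checked numerically. The sketch below assumes, as the Matplotlib documentation states, that hist bins its data with NumPy, and it uses numpy.histogram_bin_edges to reproduce the edges that an integer bins value produces for the label list above. The variable names edges and centers are introduced here only for illustration.

```python
import numpy as np

train_labels = [0, 3, 2, 0, 3, 1, 0, 3, 3, 1, 0, 3, 2, 2]
num_cls = 4

# An integer splits the data range [min, max] = [0, 3] into equal-width bins.
edges = np.histogram_bin_edges(train_labels, bins=num_cls)
centers = (edges[:-1] + edges[1:]) / 2

print(edges)    # [0.   0.75 1.5  2.25 3.  ]
print(centers)  # [0.375 1.125 1.875 2.625]
```

The bin centers fall at 0.375, 1.125, 1.875 and 2.625, while the tick labels sit at 0, 1, 2 and 3, which is why the bars drift away from their labels.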
Bins parameter
Before discussing the bins parameter, let's clarify what bin and bar mean in a hist plot. Here is a diagram that illustrates them:
Here \(x_1\) and \(x_2\) are objects on the x-axis. In our example, the first object sits at position 0 on the x-axis, the second at position 1, and so on. In this diagram, a bin (literally a container) is the rectangular drawing area that belongs to one x-axis object, and a bar is one of the bars drawn inside that area. As shown in the figure above, the bin for the first object covers the interval [-0.5, 0.5) and the bin for the second covers [0.5, 1.5). Note that hist bins are closed on the left and open on the right, except for the last bin, which also includes its right edge. Each object's bin contains three bars.
By looking at the matplotlib documentation, we know that the bins parameter is explained as follows:
bins: int or sequence or str, default: rcParams["hist.bins"] (default: 10)
If bins is an integer, it defines the number of equal-width bins in the range.
If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin; in this case, bins may be unequally spaced. All but the last (righthand-most) bin is half-open. In other words, if bins is:
[1, 2, 3, 4]
Then the first bin is [1,2) (including 1, but excluding 2) and the second [2,3). The last bin, however, is [3, 4], which includes 4.
If bins is a string, it is one of the binning strategies supported by numpy.histogram_bin_edges: 'auto', 'fd', 'doane', 'scott', 'stone', 'rice', 'sturges', or 'sqrt'.
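The half-open rule quoted above is easy to verify. This is a minimal sketch that calls numpy.histogram directly (the function that hist relies on for binning, according to the Matplotlib documentation); the sample values are made up for the illustration.

```python
import numpy as np

values = [1, 2, 3, 4]  # illustrative data, one value on every bin edge

counts, edges = np.histogram(values, bins=[1, 2, 3, 4])

# 1 falls in [1, 2), 2 falls in [2, 3), and both 3 and 4 fall in the closed last bin [3, 4].
print(counts)  # [1 1 2]
```

A value that lies exactly on an inner edge is counted in the bin to its right, and only the last bin keeps its right edge.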
To summarize: if bins is an integer, it sets the number of bins, that is, how many separate drawing areas the x-axis is divided into. We have four image categories, so we need four drawing areas. With an integer, however, the bins are equal-width slices of the data range, so their centers do not land on the x-axis ticks.
To control where each area sits, we need to pass bins as a sequence.
The values in the bins sequence are positions on the x-axis of the hist plot. The four categories in this task sit at x-axis positions [0, 1, 2, 3]. If we set the sequence to [0, 1, 2, 3, 4], the first drawing area covers [0, 1), the second covers [1, 2), the third covers [2, 3), and so on.
For a conventional look, we want the center of each area to line up with the corresponding x-axis tick, so the first area should cover [-0.5, 0.5), the second [0.5, 1.5), and so on. The final bins sequence is [-0.5, 0.5, 1.5, 2.5, 3.5]. We therefore modify the hist call as follows:
    # np.arange(-0.5, 4, 1) yields the edges [-0.5, 0.5, 1.5, 2.5, 3.5]
    plt.hist([train_labels[idc] for idc in client_idcs], stacked=False,
             bins=np.arange(-0.5, 4, 1),
             label=["Client {}".format(i) for i in range(N_CLIENTS)])
With this change, each drawing area is centered on its x-axis tick:
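As a quick check of the alignment, the following sketch computes the centers of the new bins and the per-client label counts with numpy.histogram. It reuses the dataset from the beginning of the article; per_client and centers are names introduced only for this illustration.

```python
import numpy as np

classes = ['cat', 'dog', 'car', 'ship']
train_labels = [0, 3, 2, 0, 3, 1, 0, 3, 3, 1, 0, 3, 2, 2]
client_idcs = [slice(0, 4), slice(4, 11), slice(11, 15)]

edges = np.arange(-0.5, 4, 1)            # [-0.5  0.5  1.5  2.5  3.5]
centers = (edges[:-1] + edges[1:]) / 2   # [0. 1. 2. 3.], exactly the tick positions

# Count the labels of each client inside the centered bins.
per_client = []
for i, idc in enumerate(client_idcs):
    counts, _ = np.histogram(train_labels[idc], bins=edges)
    per_client.append(counts.tolist())
    print("Client {}:".format(i), dict(zip(classes, counts.tolist())))

# Client 0: {'cat': 2, 'dog': 0, 'car': 1, 'ship': 1}
# Client 1: {'cat': 2, 'dog': 2, 'car': 0, 'ship': 3}
# Client 2: {'cat': 0, 'dog': 0, 'car': 2, 'ship': 1}
```

Because every integer label now lies in the middle of its bin, the counts above are also the bar heights shown over each tick.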
Stacked parameter
Sometimes there are many items on the x-axis, and drawing three bars side by side for each of them takes up a lot of plotting space. How can we use the space more economically? This is where the stacked parameter comes in handy. We set stacked to True:
    plt.hist([train_labels[idc] for idc in client_idcs], stacked=True,
             bins=np.arange(-0.5, 4, 1),
             label=["Client {}".format(i) for i in range(N_CLIENTS)])
You can see that the bars of each x-axis object are now stacked on top of one another:
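To make the stacking concrete, the sketch below reproduces the segment heights with NumPy alone: each client's counts are computed per bin and then accumulated, so every client's segment starts where the previous one ends. The names counts, tops and bottoms are illustrative, and the cumulative sum is a way to reason about the picture rather than a copy of Matplotlib's internals.

```python
import numpy as np

train_labels = [0, 3, 2, 0, 3, 1, 0, 3, 3, 1, 0, 3, 2, 2]
client_idcs = [slice(0, 4), slice(4, 11), slice(11, 15)]
edges = np.arange(-0.5, 4, 1)

# One row per client, one column per class (cat, dog, car, ship).
counts = np.array([np.histogram(train_labels[idc], bins=edges)[0]
                   for idc in client_idcs])

tops = counts.cumsum(axis=0)   # upper end of each client's segment
bottoms = tops - counts        # lower end of each client's segment

print(tops)
# [[2 0 1 1]
#  [4 2 1 4]
#  [4 2 3 5]]
```

The last row of tops is the total number of samples per class, which is the full height of each stacked bar.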
However, a new problem arises: the stacked bars of neighboring x-axis objects now touch each other, which looks crowded. Can we use the bins parameter to put space between the bins? No, because as mentioned earlier, bins only defines a sequence of edges, so consecutive bins always share an edge.
Taking a different approach, we can instead leave a gap between the bar and the edges of its bin. This is what the rwidth parameter is for.
Rwidth parameter
Let's look at how the documentation describes the rwidth parameter:
rwidth: float or None, default: None
The relative width of the bars as a fraction of the bin width. If None, automatically compute the width.
Ignored if histtype is 'step' or 'stepfilled'.
In other words, rwidth sets the width of the bar relative to the width of its bin. Here we set it to 0.5:
    plt.hist([train_labels[idc] for idc in client_idcs], stacked=True,
             bins=np.arange(-0.5, 4, 1), rwidth=0.5,
             label=["Client {}".format(i) for i in range(N_CLIENTS)])
The modified chart is as follows:
You can see that the bar of each x-axis object now takes up exactly half the width of its bin.
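The 1/2 ratio can also be read back from the drawn rectangles. The sketch below is one way to check it, assuming the Agg backend is acceptable (it is chosen here only so the code runs without a display). It inspects the bar containers returned by hist and compares each bar's width with the width of its bin; containers, bar_widths and ratios are names introduced for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

N_CLIENTS = 3
train_labels = [0, 3, 2, 0, 3, 1, 0, 3, 3, 1, 0, 3, 2, 2]
client_idcs = [slice(0, 4), slice(4, 11), slice(11, 15)]
edges = np.arange(-0.5, 4, 1)

fig, ax = plt.subplots(figsize=(5, 3))
_, _, containers = ax.hist(
    [train_labels[idc] for idc in client_idcs],
    stacked=True, bins=edges, rwidth=0.5,
    label=["Client {}".format(i) for i in range(N_CLIENTS)])

# Compare the width of each rectangle of the first client with its bin width.
bin_widths = np.diff(edges)
bar_widths = [rect.get_width() for rect in containers[0]]
ratios = [float(b / w) for b, w in zip(bar_widths, bin_widths)]

print(ratios)  # every ratio should equal rwidth, i.e. 0.5
plt.close(fig)
```

Changing rwidth to another value between 0 and 1 and rerunning the check is a quick way to see how much of each bin the bars occupy.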