2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to create five quick and easy data visualizations with Python code. It is fairly detailed, and we hope interested readers will find it a useful reference.
Data visualization is an important part of a data scientist's work. In the early stages of a project, you usually do exploratory data analysis (EDA) to gain some insight into the data. Creating visualizations really helps make things clearer and easier to understand, especially for larger, higher-dimensional datasets. At the end of the project, it is important to be able to present your final results in a clear, concise, and compelling manner that your audience (often non-technical clients) can understand.
Matplotlib is a popular Python library that makes it easy to create data visualizations. However, setting up the data, parameters, figures, and plotting calls can become quite messy and tedious every time you start a new project. In this blog post, we'll look at five data visualizations and write some quick and easy functions for them using Python's Matplotlib. Along the way, we'll also note which kind of data each visualization suits, so you can choose the right one for the job.
Scatter plot
A scatter plot is ideal for showing the relationship between two variables because you can directly see the raw distribution of the data. You can also view this relationship for different groups of data simply by color-coding the groups, as shown in the following figure. Want to visualize the relationship between three variables? No problem! Just use another parameter, such as the point size, to encode the third variable, as we can see in the second figure below.
Now let's look at the code. We first import Matplotlib's pyplot under the alias "plt". To create a new figure, we call plt.subplots(). We pass the x-axis and y-axis data to our function, which in turn passes them to ax.scatter() to draw the scatter plot. We can also set the point size, point color, and alpha transparency. You can even set the y-axis to a logarithmic scale. The title and axis labels are then set specifically for the figure. With one end-to-end function, creating a scatter plot is easy!
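The steps described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the original author's code: the function name scatterplot, its parameter names and defaults, and the synthetic data are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def scatterplot(x_data, y_data, x_label="", y_label="", title="",
                color="r", size=10, alpha=0.75, yscale_log=False):
    """Draw a scatter plot and return the figure and axes (hypothetical helper)."""
    # Create the figure and a single set of axes
    fig, ax = plt.subplots()

    # Plot the points; `size` may be a scalar or an array,
    # so it can also encode a third variable
    ax.scatter(x_data, y_data, s=size, color=color, alpha=alpha)

    # Optionally switch the y-axis to a logarithmic scale
    if yscale_log:
        ax.set_yscale("log")

    # Label the axes and set the title
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig, ax


# Synthetic, strictly positive data (made up for this example)
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = np.exp(0.5 * x) * rng.uniform(0.5, 1.5, 200)

fig, ax = scatterplot(x, y, x_label="x", y_label="y (log scale)",
                      title="Scatter plot sketch", yscale_log=True)
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure. Passing an array as size is one way to encode a third variable in the point size.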
Line plot
Line plots are best used when you can clearly see that one variable varies greatly with another, that is, the two have a high covariance. Take the following figure as an illustration: the percentages for all majors clearly vary a great deal over time. Plotting these with a scatter plot would be extremely cluttered and messy, making it hard to really understand and see what's going on. Line plots are ideal for this situation because they give a quick summary of the covariance of the two variables (percentage and time). As before, we can also group the data by color coding.
This is the code for the line plot. It is very similar to the scatter plot code above, with only minor changes to the variables.
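A line plot function along the same lines might look like the sketch below. The name lineplot, the dict-of-series argument used for color-coded groups, and the made-up "Major A" / "Major B" data are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np


def lineplot(x_data, y_series, x_label="", y_label="", title="", lw=2):
    """Draw one line per entry of y_series, a dict mapping label -> y values."""
    fig, ax = plt.subplots()

    # Each group gets its own color from Matplotlib's default color cycle
    for label, y_data in y_series.items():
        ax.plot(x_data, y_data, lw=lw, label=label)

    # Label the axes, set the title, and show which color is which group
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.legend(loc="best")
    return fig, ax


# Made-up percentages over time for two hypothetical majors
years = np.arange(1970, 2011)
shares = {
    "Major A": 10 + 0.8 * (years - 1970),
    "Major B": 45 - 0.5 * (years - 1970),
}

fig, ax = lineplot(years, shares, x_label="Year",
                   y_label="Share of degrees (%)", title="Line plot sketch")
```

The only real change from the scatter plot sketch is the call to ax.plot() in place of ax.scatter(), plus a legend to identify the groups.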
Histogram
Histograms are useful for viewing (or really discovering) the distribution of data points. Look at the histogram below, where we plot frequency against IQ. We can clearly see the concentration towards the center and where the median lies. We can also see that it roughly follows a Gaussian distribution. Using bars (rather than scatter points, for example) makes the relative difference between the frequencies of each bin easy to see. Binning (discretization) really helps us see the "bigger picture"; if we plotted every data point without discrete bins, there would probably be a lot of noise in the visualization, making it hard to see what is really going on.
The code for the histogram in Matplotlib is as follows. There are two parameters to note. First, the n_bins parameter controls how many discrete bins we want for our histogram. More bins give us finer detail but may also introduce noise and take us away from the bigger picture; fewer bins give us more of a "bird's-eye view" of what is going on, without the finer details. Second, the cumulative parameter is a Boolean that lets us choose whether our histogram is cumulative or not. This is essentially choosing between a view of the probability density function (PDF) and a view of the cumulative distribution function (CDF).
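A histogram function with these two parameters could be sketched as follows. The parameter names n_bins and cumulative come from the text; the function name, the other parameters, and the synthetic IQ-like data are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def histogram(data, n_bins=20, cumulative=False,
              x_label="", y_label="", title=""):
    """Draw a histogram and return the figure, axes, and per-bin counts."""
    fig, ax = plt.subplots()

    # n_bins sets the number of discrete bins; cumulative=True makes
    # each bar include the counts of all bins to its left
    counts, bin_edges, _ = ax.hist(data, bins=n_bins, cumulative=cumulative)

    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig, ax, counts


# Synthetic IQ-like scores (made up): mean 100, standard deviation 15
rng = np.random.default_rng(0)
iq = rng.normal(100, 15, 1000)

fig1, ax1, counts = histogram(iq, n_bins=20, x_label="IQ",
                              y_label="Frequency", title="Histogram sketch")
fig2, ax2, cum_counts = histogram(iq, n_bins=20, cumulative=True,
                                  x_label="IQ", y_label="Cumulative frequency",
                                  title="Cumulative histogram sketch")
```

Note that these bars show raw counts. To approximate a PDF or CDF in the strict sense, ax.hist() also accepts density=True, which normalizes the bars.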
Imagine that we want to compare the distributions of two variables in our data. One might think that you have to make two separate histograms and put them side by side to compare them. However, there is actually a better way: we can overlay the histograms with different transparency. Take a look at the following figure, where the uniform distribution is drawn with a transparency of 0.5 so that we can see what's behind it. This lets the viewer see the two distributions directly on the same figure.
There are a few things to set up in the code for the overlaid histogram. First, we set the horizontal range to accommodate the distributions of both variables. From this range and the desired number of bins, we can calculate the width of each bin. Finally, we draw the two histograms on the same figure, one of them slightly more transparent.
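Those three steps can be sketched as below. The function and parameter names are assumptions for illustration, and the shared bin edges are built with np.linspace so that the first and last edges fall exactly on the combined minimum and maximum.

```python
import matplotlib.pyplot as plt
import numpy as np


def overlaid_histogram(data1, data2, n_bins=20,
                       data1_name="data1", data2_name="data2",
                       x_label="", y_label="", title=""):
    """Overlay two histograms on shared bins; return figure, axes, bin edges."""
    # Step 1: a horizontal range that accommodates both distributions
    lo = min(np.min(data1), np.min(data2))
    hi = max(np.max(data1), np.max(data2))

    # Step 2: the bin width implied by that range and the number of bins
    bin_width = (hi - lo) / n_bins
    edges = np.linspace(lo, hi, n_bins + 1)  # consecutive edges are bin_width apart

    # Step 3: draw both histograms on the same axes, the second one
    # semi-transparent so the first remains visible behind it
    fig, ax = plt.subplots()
    ax.hist(data1, bins=edges, alpha=1.0, label=data1_name)
    ax.hist(data2, bins=edges, alpha=0.5, label=data2_name)

    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.legend(loc="best")
    return fig, ax, edges


# Synthetic data (made up): a normal and a uniform distribution
rng = np.random.default_rng(0)
normal_data = rng.normal(0, 1, 1000)
uniform_data = rng.uniform(-3, 3, 1000)

fig, ax, edges = overlaid_histogram(normal_data, uniform_data, n_bins=20,
                                    data1_name="Normal", data2_name="Uniform",
                                    x_label="Value", y_label="Frequency",
                                    title="Overlaid histogram sketch")
```

Sharing one set of bin edges between both calls to ax.hist() is what makes the bars line up and the two distributions directly comparable.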
Bar chart
When you try to visualize with a small amount (possibly