How to solve the problem of excessive data density with Python

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to use Python to deal with excessive data density in charts. Many people run into this problem when working on real cases, so let the editor walk you through how to handle it. Read carefully and you should come away with something useful!

What is a density map?

The so-called density map (Density Plot) shows how densely data is distributed, and is often used to show the distribution of data over a continuous interval, such as a period of time. Strictly speaking, it evolved from the histogram and resembles a histogram that has been smoothed and filled in.

Generally, a smooth curve is drawn over the value levels to observe the distribution; the peak marks where the data is most concentrated in that interval.

It is more widely applicable than the histogram: it is not affected by the number of bins (a histogram should not have too many bars), and it describes the shape of the distribution better.
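To make the comparison concrete, here is a minimal sketch on synthetic data (the two-peak sample is an assumption made for illustration, not data from this article). It shows that a histogram changes with the chosen number of bins, while a kernel density estimate from `scipy.stats.gaussian_kde` gives one smooth curve whose peak can be read off directly. Note that the smooth curve has its own smoothing parameter, the bandwidth, which `gaussian_kde` picks automatically by default.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic sample (assumption): a mixture of two normal distributions
values = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])

# The histogram's appearance depends on the chosen number of bins
counts_10, edges_10 = np.histogram(values, bins=10, density=True)
counts_50, edges_50 = np.histogram(values, bins=50, density=True)

# The kernel density estimate is a smooth curve that can be evaluated on any grid
kde = gaussian_kde(values)
grid = np.linspace(values.min(), values.max(), 200)
density = kde(grid)

# The position of the peak is where the data is most concentrated
peak_location = grid[np.argmax(density)]
print(f"highest density near x = {peak_location:.2f}")
```

Plotting `grid` against `density` with `plt.plot` (and `plt.fill_between` for the filled look) gives the density map described above.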

This article does not cover histograms; Lao Hai will summarize their use separately in a later article.

What is a 2D density map?

The density maps and histograms discussed so far both describe a single variable, that is, one-dimensional data.

Now let's take a look at the 2D density map, which shows how values are distributed across the ranges of two quantitative variables in a data set, and which helps avoid overplotting in a scatter chart.

When there are too many points, the 2D density map counts the number of observations in each specific area of the 2D space.

This area can be square or hexagonal (hexbin). Alternatively, a 2D kernel density estimate can be computed and represented with contour lines.
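The counting step can be sketched as follows, using synthetic data (an assumption made for illustration). `np.histogram2d` counts observations in square cells, and Matplotlib's `hexbin` does the same for hexagonal cells while also drawing them.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to display a window
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Synthetic, correlated point cloud (assumption)
x = rng.normal(size=5000)
y = 0.5 * x + rng.normal(scale=0.8, size=5000)

# Square cells: number of observations in each cell of a 30 x 30 grid
counts, x_edges, y_edges = np.histogram2d(x, y, bins=30)

# Hexagonal cells: hexbin counts the points per hexagon and draws the result
fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=30, cmap="Blues")
fig.colorbar(hb, ax=ax, label="observations per hexagon")
hex_counts = hb.get_array()  # one count per drawn hexagon
```

Each cell's color then encodes its count, so thousands of overlapping points reduce to a readable grid.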

This article mainly describes the use of 2D density maps.

Basic data style of 2D density map

Suggestions on the use of 2D density map

The density map is an alternative to the histogram and is often used to observe the distribution of continuous variables.

The 2D density map is mainly used to solve the problem of overly dense data points, so pay attention to whether the way the density is binned is reasonable.

When the data range is very concentrated and the values vary little, a density map often reveals very little.
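One rough way to judge whether the binning is reasonable is to look at how the cells are occupied. The sketch below is only a heuristic under stated assumptions: `bin_occupancy` is a hypothetical helper written for this illustration, and the data is synthetic. Too few bins lump everything into a handful of cells, while too many bins leave almost every cell empty or holding a single point.

```python
import numpy as np

def bin_occupancy(x, y, bins):
    """Hypothetical helper: return (share of non-empty cells,
    median count among the non-empty cells) for a bins x bins grid."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    non_empty = counts[counts > 0]
    return non_empty.size / counts.size, float(np.median(non_empty))

rng = np.random.default_rng(2)
# Synthetic point cloud (assumption)
x = rng.normal(size=2000)
y = rng.normal(size=2000)

for bins in (5, 50, 500):
    share, median_count = bin_occupancy(x, y, bins)
    print(f"bins={bins:>3}: {share:.1%} of cells occupied, "
          f"median count {median_count:.0f}")
```

A grid in which most occupied cells hold only one point adds little over a scatter chart, which is a hint to reduce the number of bins.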

Let's start with a specific operation case.

Preparatory work

As before, import the necessary toolkits

    import os
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    ## Setting the fonts up front avoids a lot of trouble later
    plt.rcParams['font.sans-serif'] = ['Source Han Sans CN']  # display Chinese characters correctly (Source Han Sans)
    plt.rcParams['font.size'] = 22  # global font size; individual elements can be adjusted later
    plt.rcParams['axes.unicode_minus'] = False  # display minus signs correctly

    ## Initialize chart size
    plt.rcParams['figure.figsize'] = (20.0, 8.0)  # set figure size

    ## Initialize chart resolution
    plt.rcParams['savefig.dpi'] = 300  # resolution when saving a chart
    plt.rcParams['figure.dpi'] = 300   # resolution when drawing a chart

    ## Custom chart colors
    colors = ['#dc2624', '#2b4750', '#45a0a2', '#e87a59', '#7dcaa9', '#649E7D',
              '#dc8018', '#C89F91', '#6c6d6c', '#4f6268', '#c7cccf']
    plt.rcParams['axes.prop_cycle'] = plt.cycler(color=colors)

    path = 'D:\\series articles\\'  # custom file path
    os.chdir(path)  # use it as the working directory, where the data source files are usually stored

Set chart style and file path

    Financial_data = pd.read_excel('virtual demo case data.xlsx', sheet_name='2D table')
    Financial_data

Read in data
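The workbook itself is not reproduced here. For readers who want to run the later steps without it, a stand-in `Financial_data` can be generated as in the sketch below. Everything in it is an assumption: the values are random, and the column names `material` and `management` simply follow the (translated) names used in this article's plotting code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5000  # enough points for a scatter chart to become overcrowded

# Synthetic stand-in values (assumption), loosely correlated with each other
material = rng.gamma(shape=2.0, scale=50.0, size=n)
management = 0.4 * material + rng.normal(scale=15.0, size=n)

Financial_data = pd.DataFrame({"material": material, "management": management})
print(Financial_data.describe())
```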

Six common types of density charts

    from scipy.stats import gaussian_kde  # kernel density estimation

    # For the demonstration, create a board of 6 subplots
    fig, axes = plt.subplots(3, 2, figsize=(20, 20))

    # Subplot 1: a basic scatter plot.
    # The scatter plot is the classic way to observe the relationship between two variables,
    # but when there is a lot of data the points stack and overlap, and we cannot explore further.
    axes[0][0].set_title('scatter plot')
    axes_0 = axes[0][0].plot(Financial_data['material'], Financial_data['management'], 'ko')

    # Subplot 2: a hexagonal honeycomb diagram.
    # When you look at the relationship between two numerical variables with a lot of data
    # and do not want the points stacked together, split the plane into honeycomb cells
    # and count the points in each hexagon to express density.
    num_bins = 50  # number of hexagons in the x direction
    axes[0][1].set_title('honeycomb hexagonal graph')
    axes_1 = axes[0][1].hexbin(Financial_data['material'], Financial_data['management'],
                               gridsize=num_bins,  # size of the hexagon grid
                               cmap="Blues")       # color map
    fig.colorbar(axes_1, ax=axes[0][1])  # show the color bar

    # Subplot 3: a 2D histogram.
    # 2D histograms are very useful for analyzing the relationship between two numerical
    # variables with a lot of data; they avoid the overcrowding of the scatter chart.
    num_bins = 50
    axes[1][0].set_title('2D histogram')
    axes_2 = axes[1][0].hist2d(Financial_data['material'], Financial_data['management'],
                               bins=(num_bins, num_bins), cmap="Blues")
    # hist2d returns (counts, xedges, yedges, image); the color bar needs the image:
    # fig.colorbar(axes_2[3], ax=axes[1][0])

    # Subplot 4: a Gaussian kernel density map.
    # Again we study the relationship between two numerical variables with many points.
    # The 2D kernel density estimate is computed from the number of points around each part
    # of the drawing area. Like a smoothed histogram, a point does not fall into one
    # particular bin; instead it adds weight to the surrounding area, deepening its color.
    k = gaussian_kde(Financial_data.loc[:, ['material', 'management']].values.T)
    xi, yi = np.mgrid[Financial_data['material'].min():Financial_data['material'].max():num_bins*1j,
                      Financial_data['management'].min():Financial_data['management'].max():num_bins*1j]
    zi = k(np.vstack([xi.flatten(), yi.flatten()]))
    axes[1][1].set_title('Gaussian kernel density map')
    axes_3 = axes[1][1].pcolormesh(xi, yi, zi.reshape(xi.shape), cmap="Blues")
    fig.colorbar(axes_3, ax=axes[1][1])  # show the color bar

    # Subplot 5: a 2D density map with shading
    axes[2][0].set_title('2D density map with shading')
    axes[2][0].pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap="Blues")

    # Subplot 6: a density map with shading and contour lines
    axes[2][1].set_title('2D density map with shading + contour')
    axes_5 = axes[2][1].pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap="Blues")
    fig.colorbar(axes_5, ax=axes[2][1])  # show the color bar
    axes[2][1].contour(xi, yi, zi.reshape(xi.shape))  # draw the contour lines

    plt.show()
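The grid evaluation shared by the last three subplots can be pulled out into a small function. The sketch below is one possible way to do that: `kde_grid` is a hypothetical helper name, and the data is synthetic rather than taken from `Financial_data`.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_grid(x, y, num_bins=50):
    """Hypothetical helper: evaluate a 2D Gaussian KDE on a
    num_bins x num_bins grid spanning the range of the data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    kernel = gaussian_kde(np.vstack([x, y]))
    # A complex step in np.mgrid means "this many points, endpoints included"
    xi, yi = np.mgrid[x.min():x.max():num_bins * 1j,
                      y.min():y.max():num_bins * 1j]
    zi = kernel(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)
    return xi, yi, zi

rng = np.random.default_rng(3)
# Synthetic, correlated point cloud (assumption)
x = rng.normal(size=1000)
y = x + rng.normal(scale=0.5, size=1000)

xi, yi, zi = kde_grid(x, y, num_bins=40)
```

The three returned arrays can be passed directly to `pcolormesh(xi, yi, zi)` or `contour(xi, yi, zi)`. A larger `num_bins` gives a finer picture at the cost of more kernel evaluations.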

Special mention: 2D kernel density estimation map

    sns.kdeplot(x=Financial_data['material'], y=Financial_data['management'])
    sns.despine()  # with no arguments, removes the top and right borders (plain Matplotlib needs each spine hidden by hand)
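For reference, plain Matplotlib can produce the same effect as `sns.despine()` by hiding the spines directly. This is a minimal sketch on a throwaway line chart, not part of the original case.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to display a window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Hide the top and right borders, which is what sns.despine() does by default
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
```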

    sns.kdeplot(x=Financial_data['material'], y=Financial_data['management'],
                cmap="Reds",
                fill=True,  # fill the density regions with color (older seaborn versions: shade=True)
                thresh=0,   # also fill the lowest contour level (older seaborn versions: shade_lowest=True)
                # bw_method=.15
                )
    sns.despine()  # with no arguments, removes the top and right borders (plain Matplotlib needs each spine hidden by hand)

That concludes "how to use Python to solve the problem of excessive data density". Thank you for reading.