How to define Python covariance and correlation coefficient

This article introduces how covariance and the correlation coefficient are defined, and how to compute and visualize them in Python. The walkthrough below works through the definitions step by step with a small example; I hope you find it useful.

The joint distribution contains a wealth of information. For example, the marginal distribution of each random variable can be extracted from the joint distribution, giving that variable's own distribution and, from it, its expectation and variance. But by restricting our view to a single random variable, we lose other useful information contained in the joint distribution, such as how different random variables interact. To understand the relationship between random variables, we need other descriptive quantities.

Covariance

Covariance expresses how two random variables vary together. As a sample space, take students' physical examination data: a student's height is a random variable X and the student's weight is a random variable Y. Their joint distribution is shown below.

         160cm   170cm   180cm
60kg     0.2     0.05    0.05
70kg     0.05    0.3     0.05
80kg     0.05    0.05    0.2

According to the table, a large height (180cm) and a large weight (80kg) are likely to occur together (probability 0.2), and a small height (160cm) and a small weight (60kg) are likewise likely to occur together (0.2). Larger heights tend to come with larger weights, and smaller heights with smaller weights. When "large" tends to accompany "large" and "small" tends to accompany "small", the two variables are positively correlated. According to this data, height and weight have a fairly strong positive correlation.
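
To make the table concrete, here is a minimal numpy sketch (the name pxy and the row/column conventions are chosen for illustration) that stores the joint distribution and recovers the marginal distributions of height and weight by summing columns and rows:

import numpy as np

# joint distribution P(weight, height); rows: 60/70/80 kg, columns: 160/170/180 cm
pxy = np.array([[0.20, 0.05, 0.05],
                [0.05, 0.30, 0.05],
                [0.05, 0.05, 0.20]])

print(pxy.sum())         # 1.0: the probabilities form a valid distribution
print(pxy.sum(axis=0))   # marginal distribution of height: [0.3 0.4 0.3]
print(pxy.sum(axis=1))   # marginal distribution of weight: [0.3 0.4 0.3]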

Conversely, if "large" tends to accompany "small" and "small" tends to accompany "large", the two random variables are negatively correlated. The "cutest height difference" between couples is an example of negative correlation (the sample space is couples' height data, with the boy's height as one random variable and the girl's height as another).

Like other descriptive quantities, covariance extracts information from the probability distribution and tells us something about how the distribution "behaves". For a known joint distribution, a covariance, which is a single number, can be computed for any two random variables.

Definition

The covariance is defined as follows. If X and Y are jointly distributed random variables with expectations μ_X and μ_Y, then the covariance of X and Y is

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]

The definition of covariance is based on expectation. Since expectation is defined for both discrete and continuous random variables, covariance applies directly to both.

We already know that expectation is a probability-weighted average of a random variable. Here the quantity being averaged is the product of X − μ_X and Y − μ_Y. The difference between a random variable and its expectation measures how far the value deviates from the center, which is the "large" or "small" we spoke of above: a positive deviation means "large" and a negative deviation means "small". If the variables are positively correlated, "large" pairs with "large" and "small" with "small", so the product is positive; if they are negatively correlated, the product is negative. The quantity (X − μ_X)(Y − μ_Y) therefore captures the correlation between X and Y.

Let us go back to the data above and calculate the covariance.

         160cm   170cm   180cm
60kg     0.2     0.05    0.05
70kg     0.05    0.3     0.05
80kg     0.05    0.05    0.2

Let height be X and weight be Y. From the joint distribution we can obtain the marginal distributions of X and Y (recall the earlier discussion of marginal distributions). The expectations of X and Y are 170 and 70, respectively. Now calculate (X − μ_X)(Y − μ_Y) for each cell:

         160cm   170cm   180cm
60kg     100     0       -100
70kg     0       0       0
80kg     -100    0       100

Multiplying the corresponding cells of the two tables and summing gives the covariance:

Cov(X, Y) = 0.2 × 100 + 0.2 × 100 + 0.05 × (−100) + 0.05 × (−100) = 30
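
The same calculation can be checked numerically. Here is a short sketch (with illustrative variable names) that recovers the expectations 170 and 70 from the marginals and then sums the probability-weighted products of deviations:

import numpy as np

heights = np.array([160., 170., 180.])            # values of X
weights = np.array([60., 70., 80.])               # values of Y
pxy = np.array([[0.20, 0.05, 0.05],               # joint distribution, rows = weight, columns = height
                [0.05, 0.30, 0.05],
                [0.05, 0.05, 0.20]])

mu_x = heights @ pxy.sum(axis=0)                  # E(X) = 170
mu_y = weights @ pxy.sum(axis=1)                  # E(Y) = 70
deviation_products = np.outer(weights - mu_y, heights - mu_x)   # (X - mu_X)(Y - mu_Y) per cell
cov = np.sum(pxy * deviation_products)            # probability-weighted sum over all cells
print(mu_x, mu_y, cov)                            # 170.0 70.0 30.0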

In this calculation, the cells that express positive correlation carry relatively large probabilities, so the resulting covariance is positive.

Using the properties of expectation, we can rewrite the covariance:

Cov(X, Y) = E[XY − Xμ_Y − Yμ_X + μ_Xμ_Y]
          = E(XY) − E(X)μ_Y − E(Y)μ_X + μ_Xμ_Y
          = E(XY) − E(X)E(Y)
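
This identity can be verified on the same table; a self-contained sketch (illustrative names again) that compares E(XY) with E(X)E(Y):

import numpy as np

heights = np.array([160., 170., 180.])
weights = np.array([60., 70., 80.])
pxy = np.array([[0.20, 0.05, 0.05],
                [0.05, 0.30, 0.05],
                [0.05, 0.05, 0.20]])

e_xy = np.sum(pxy * np.outer(weights, heights))   # E(XY) = 11930
e_x = heights @ pxy.sum(axis=0)                   # E(X) = 170
e_y = weights @ pxy.sum(axis=1)                   # E(Y) = 70
print(e_xy - e_x * e_y)                           # 30.0, the same covariance as before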

When X and Y are independent, E(XY) = E(X)E(Y), so Cov(X, Y) = 0.

(Note that Cov(X, Y) = 0 does not imply that X and Y are independent.)

Correlation coefficient

A positive covariance indicates positive correlation and a negative covariance indicates negative correlation. For a given pair of random variables, a covariance with a larger absolute value suggests a stronger relationship.

But a question follows: the covariance of height and weight is 30. Is that large? If the covariance between height and shoe size turns out to be 5, does that mean height is more strongly related to weight than to shoe size?

Such horizontal comparisons are beyond what covariance alone can support. From everyday experience, weight fluctuates over a range of roughly 20 kg, while shoe size may fluctuate by only about five sizes. For weight, a 5 kg deviation from the center is not large, whereas a five-size gap in shoe sizes may be the most extreme case. Suppose the strength of the relationship between height and weight is similar to that between height and shoe size; because weight itself fluctuates over a wider range, the computed covariance will still be larger. Here is another example: keep the height and weight data completely unchanged but switch the unit of weight from kilograms to grams, and the computed covariance becomes 1000 times its original value!

To make such horizontal comparisons, we need a uniform way to factor out how much each random variable fluctuates on its own. This is what the correlation coefficient does. The correlation coefficient is a "normalized" covariance, defined as follows:

ρ = Cov(X, Y) / √(Var(X) Var(Y))

That is, the correlation coefficient is the covariance divided by the product of the standard deviations of the two random variables. The correlation coefficient always lies between −1 and 1, and its value no longer blows up when the unit of measurement changes.

Still using the above height and weight data, we can calculate

Var(X) = 0.3 × (160 − 170)² + 0.3 × (180 − 170)² = 60

Var(Y) = 0.3 × (60 − 70)² + 0.3 × (80 − 70)² = 60

ρ = 30 / √(60 × 60) = 0.5
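
These values, together with the unit-invariance discussed above, can be checked with a short sketch (the helper cov_and_rho is an illustrative name, not a library function); switching weight from kilograms to grams multiplies the covariance by 1000 but leaves ρ unchanged:

import numpy as np

heights = np.array([160., 170., 180.])
weights = np.array([60., 70., 80.])
pxy = np.array([[0.20, 0.05, 0.05],
                [0.05, 0.30, 0.05],
                [0.05, 0.05, 0.20]])

def cov_and_rho(x_vals, y_vals, p):
    # covariance and correlation coefficient of a discrete joint distribution
    mu_x = x_vals @ p.sum(axis=0)
    mu_y = y_vals @ p.sum(axis=1)
    cov = np.sum(p * np.outer(y_vals - mu_y, x_vals - mu_x))
    var_x = ((x_vals - mu_x) ** 2) @ p.sum(axis=0)
    var_y = ((y_vals - mu_y) ** 2) @ p.sum(axis=1)
    return cov, cov / np.sqrt(var_x * var_y)

print(cov_and_rho(heights, weights, pxy))          # (30.0, 0.5)
print(cov_and_rho(heights, weights * 1000, pxy))   # (30000.0, 0.5): grams instead of kilograms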

Such a "normalized" correlation coefficient makes it easier for people to grasp the strength of the correlation, and it is easier to make a horizontal comparison of the correlation between different random variables.

Bivariate normal distribution

The bivariate normal distribution is a common joint distribution. It describes the probability distribution of two random variables X1 and X2. Its probability density is:

f(x1, x2) = 1 / (2π σ1 σ2 √(1 − ρ²)) × exp[−z / (2(1 − ρ²))]

where

z = (x1 − μ1)²/σ1² − 2ρ(x1 − μ1)(x2 − μ2)/(σ1σ2) + (x2 − μ2)²/σ2²

The marginal densities of X1 and X2 are both normal, namely the normal distributions N(μ1, σ1²) and N(μ2, σ2²).

On the other hand, unless ρ = 0, the joint density is not simply the product of the two marginal normal densities. It can be shown that ρ is the correlation coefficient of the two variables in the bivariate normal distribution.

We now plot the distribution. Unfortunately, the version of scipy.stats used here does not provide this distribution, so we need to write the density function ourselves.
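
(As an aside: recent SciPy releases do include a multivariate normal distribution. If your installation provides scipy.stats.multivariate_normal, the same density can be evaluated roughly as in the sketch below, with the covariance matrix assembled from σ1, σ2 and ρ; this is an alternative, not the approach used in the rest of the article.)

import numpy as np
from scipy.stats import multivariate_normal    # available in recent SciPy versions

mu1, mu2, sigma1, sigma2, rho = 0.0, 0.0, 1.0, 1.0, 0.8
cov = [[sigma1**2, rho * sigma1 * sigma2],
       [rho * sigma1 * sigma2, sigma2**2]]
rv = multivariate_normal(mean=[mu1, mu2], cov=cov)

x1, x2 = np.meshgrid(np.arange(-3, 3, 0.05), np.arange(-3, 3, 0.05))
density = rv.pdf(np.dstack((x1, x2)))          # grid of f(x1, x2) values
print(density.shape)                           # (120, 120)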

Choose the distribution to plot. For simplicity, let μ1 = 0, μ2 = 0, σ1 = 1, σ2 = 1.

First let ρ = 0, in which case the joint density is just the product of the two normal densities. Drawing the same distribution from different viewing angles gives the results below. As you can see, the probability surface is symmetric about the center.

Then let ρ = 0.8, so the correlation coefficient of the two random variables is 0.8. Drawing the same distribution from different angles gives the results below. The probability surface is no longer centrally symmetric: along the line Y = X the surface rises and the probability is clearly higher, while along the line Y = −X the probability is low. This is the positive correlation we described.

Now ρ has a more concrete, practical meaning for us. :-)

# By Vamei
import numpy as np

# build the pdf of a bivariate normal distribution
def bivar_norm(mu1, mu2, sigma1, sigma2, rho):
    # pdf of the bivariate normal
    def pdf(x1, x2):
        # the exponent term z
        part1 = (x1 - mu1)**2 / sigma1**2
        part2 = -2. * rho * (x1 - mu1) * (x2 - mu2) / (sigma1 * sigma2)
        part3 = (x2 - mu2)**2 / sigma2**2
        z = part1 + part2 + part3
        cof = 1. / (2. * np.pi * sigma1 * sigma2 * np.sqrt(1 - rho**2))
        return cof * np.exp(-z / (2. * (1 - rho**2)))
    return pdf

pdf1 = bivar_norm(0, 0, 1, 1, 0)      # rho = 0
pdf2 = bivar_norm(0, 0, 1, 1, 0.8)    # rho = 0.8

from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import matplotlib.pyplot as plt

# plot function
def space_surface(pdf, xp, yp, zlim, rot1=30, rot2=30):
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    X = np.arange(*xp)
    Y = np.arange(*yp)
    X, Y = np.meshgrid(X, Y)
    Z = pdf(X, Y)
    surf = ax.plot_surface(X, Y, Z, rstride=8, cstride=8, alpha=0.3)
    cset = ax.contour(X, Y, Z, zdir='z', offset=zlim[0], cmap=cm.coolwarm)
    cset = ax.contourf(X, Y, Z, zdir='x', offset=xp[0], cmap=cm.coolwarm)
    cset = ax.contourf(X, Y, Z, zdir='y', offset=yp[0], cmap=cm.coolwarm)
    ax.view_init(rot2, rot1)          # viewing angle: elevation rot2, azimuth rot1
    ax.set_zlim(*zlim)
    ax.zaxis.set_major_locator(LinearLocator(10))
    ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("f(x)")
    # fig.colorbar(surf, shrink=0.5, aspect=5)

xp = [-3, 3, 0.05]
yp = [-3, 3, 0.05]   # grid range and step; the Y range is assumed to match the X range
zlim1 = [-0.15, 0.15]
zlim2 = [-0.25, 0.15]
space_surface(pdf1, xp, yp, zlim1, 30, 20)
space_surface(pdf1, xp, yp, zlim1, 60, 45)
space_surface(pdf2, xp, yp, zlim2, 30, 20)
space_surface(pdf2, xp, yp, zlim2, 60, 45)
plt.show()   # display the figures

This concludes "How to define Python covariance and correlation coefficient". Thank you for reading.
