How to Implement 8 Probability Distribution Formulas in Python

Updated 2025-02-20. From: SLTechnology News&Howtos (Shulou.com)

Many readers are not familiar with the topics behind "How to implement 8 probability distribution formulas in Python", so this article summarizes them with detailed explanations and clear steps. Let's take a look.

Preface

Probability and statistics are at the core of data science and machine learning; we need them to collect, review, and analyze data effectively.

Many real-world phenomena are statistical in nature (weather data, sales data, financial data, and so on). This means that in many cases we can model them with mathematical functions that describe the characteristics of the data.

"A probability distribution is a mathematical function that gives the probabilities of the different possible outcomes of an experiment."

Understanding how data is distributed helps us model the world around us more accurately. It helps us determine the likelihood of various outcomes and estimate the variability of events. This is why knowing the different probability distributions is valuable in data science and machine learning.

1. Uniform distribution

The simplest distribution is the uniform distribution, a probability distribution in which all outcomes are equally likely. For example, if we roll a fair die, the probability of landing on any given number is 1/6. This is a discrete uniform distribution.

Not all uniform distributions are discrete, however; they can also be continuous, taking any real value within a specified range. The probability density function (PDF) of the continuous uniform distribution between a and b is:

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

Let's take a look at how to code them in Python:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Continuous uniform distribution on [a, b]
a = 0
b = 50
size = 5000
X_continuous = np.linspace(a, b, size)
# SciPy's uniform covers [loc, loc + scale], so scale = b - a
continuous_uniform = stats.uniform(loc=a, scale=b - a)
continuous_uniform_pdf = continuous_uniform.pdf(X_continuous)

# Discrete uniform distribution: a fair six-sided die (values 1..6)
X_discrete = np.arange(1, 7)
discrete_uniform = stats.randint(1, 7)  # upper bound is exclusive
discrete_uniform_pmf = discrete_uniform.pmf(X_discrete)

# Plot both distributions side by side
fig, ax = plt.subplots(nrows=1, ncols=2)

# Discrete plot
ax[0].bar(X_discrete, discrete_uniform_pmf)
ax[0].set_xlabel("X")
ax[0].set_ylabel("Probability")
ax[0].set_title("Discrete Uniform Distribution")

# Continuous plot
ax[1].plot(X_continuous, continuous_uniform_pdf)
ax[1].set_xlabel("X")
ax[1].set_ylabel("Probability density")
ax[1].set_title("Continuous Uniform Distribution")

plt.show()
```
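As a quick numerical check of the uniform formulas, the sketch below writes the continuous PDF 1/(b − a) by hand and compares it with SciPy, then simulates rolls of a fair die to see each face come up about 1/6 of the time. The helper name `uniform_pdf`, the test points, the random seed, and the number of rolls are illustrative assumptions, not part of the original article.

```python
import numpy as np
from scipy import stats

a, b = 0, 50


def uniform_pdf(x, a, b):
    """Hypothetical helper: continuous uniform PDF, 1 / (b - a) inside [a, b], 0 outside."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)


# Points outside, inside, and near the edges of the interval
x = np.array([-1.0, 10.0, 25.0, 49.0, 51.0])
print(uniform_pdf(x, a, b))
print(stats.uniform(loc=a, scale=b - a).pdf(x))

# Discrete case: simulate rolls of a fair die (integers 1..6, high end exclusive)
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=60_000)
# Relative frequency of each face 1..6 (index 0 of bincount is unused)
freqs = np.bincount(rolls, minlength=7)[1:] / rolls.size
print(freqs)
```

With this many rolls, each frequency should land close to 1/6 ≈ 0.1667.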

2. Gaussian distribution

The Gaussian distribution is probably the best-known distribution. It goes by several names: some call it the bell curve, because its probability plot looks like a bell; some call it the Gaussian distribution, after the German mathematician Carl Friedrich Gauss; and some call it the normal distribution, because early statisticians noticed that it occurred over and over again.

The probability density function of the normal distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where σ is the standard deviation and μ is the mean of the distribution. Note that in a normal distribution, the mean, median, and mode are all equal.

When we plot the density of a normally distributed random variable, the curve is symmetric around the mean: half of the values lie to the left of the center and half to the right. The total area under the curve is 1.
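To connect the formula with the code, here is a minimal sketch that implements the normal PDF, f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)), directly and checks it against `scipy.stats.norm.pdf`, together with the symmetry around the mean and the unit area under the curve. The helper `normal_pdf` and the values μ = 2, σ = 0.5 are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Hypothetical helper: normal PDF written directly from the formula."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))


mu, sigma = 2.0, 0.5
x = np.linspace(0, 4, 9)

# Same values as SciPy's implementation
print(np.allclose(normal_pdf(x, mu, sigma), stats.norm.pdf(x, mu, sigma)))  # True

# Symmetry: equal density at the same distance left and right of the mean
print(np.isclose(normal_pdf(mu - 1.3, mu, sigma), normal_pdf(mu + 1.3, mu, sigma)))  # True

# Total area under the curve, integrated numerically over the whole real line
area, _ = quad(normal_pdf, -np.inf, np.inf, args=(mu, sigma))
print(round(area, 6))  # 1.0
```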

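```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

mu = 0
variance = 1
sigma = np.sqrt(variance)
# Cover the mean plus or minus 3 standard deviations
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 100)

plt.subplots(figsize=(8, 5))
plt.plot(x, stats.norm.pdf(x, mu, sigma))
plt.title("Normal Distribution")
plt.show()
```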

For the normal distribution, a rule of thumb (the empirical, or 68-95-99.7, rule) tells us what percentage of the data falls within a given number of standard deviations of the mean:

About 68% of the data falls within one standard deviation of the mean.

About 95% of the data falls within two standard deviations of the mean.

About 99.7% of the data falls within three standard deviations of the mean.
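These percentages can be reproduced from the normal CDF. The sketch below computes P(μ − kσ < X < μ + kσ) for k = 1, 2, 3; since that probability does not depend on μ or σ, it uses the standard normal for simplicity.

```python
from scipy import stats

# Share of a normal distribution lying within k standard deviations of the mean
shares = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} sd: {share:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```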

3. Lognormal distribution

The lognormal distribution is the continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is lognormally distributed, then Y = ln(X) has a normal distribution.

This is the PDF of the lognormal distribution, where μ and σ are the mean and standard deviation of ln(X):

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0$$

A lognormally distributed random variable takes only positive real values, so the lognormal distribution produces a right-skewed curve.
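A minimal sketch of the lognormal PDF, f(x) = exp(−(ln x − μ)² / (2σ²)) / (xσ√(2π)), written by hand and compared with SciPy. Note that `scipy.stats.lognorm` takes the shape `s = σ` and `scale = exp(μ)`. The sketch also samples from the distribution to confirm that ln(X) has mean μ and standard deviation σ. The helper name, parameter values, seed, and sample size are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def lognormal_pdf(x, mu=0.0, sigma=1.0):
    """Hypothetical helper: lognormal PDF from the formula (valid for x > 0)."""
    return np.exp(-((np.log(x) - mu) ** 2) / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))


mu, sigma = 1.0, 0.5
x = np.linspace(0.1, 10, 50)

# SciPy parameterization: s = sigma, scale = exp(mu)
scipy_pdf = stats.lognorm.pdf(x, s=sigma, scale=np.exp(mu))
print(np.allclose(lognormal_pdf(x, mu, sigma), scipy_pdf))

# If X is lognormal, ln(X) should be normal with mean mu and std sigma
rng = np.random.default_rng(1)
samples = rng.lognormal(mean=mu, sigma=sigma, size=100_000)
log_samples = np.log(samples)
print(samples.min() > 0, log_samples.mean(), log_samples.std())
```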

Let's draw it in Python:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

X = np.linspace(0, 6, 500)
fig, ax = plt.subplots(figsize=(8, 5))

# SciPy's lognorm takes the shape s = σ and scale = exp(μ);
# loc would shift the whole curve instead of setting μ.
for mean, std in [(0, 1), (0, 0.5), (1, 1.5)]:
    lognorm_distribution = stats.lognorm(s=std, scale=np.exp(mean))
    lognorm_distribution_pdf = lognorm_distribution.pdf(X)
    ax.plot(X, lognorm_distribution_pdf, label=f"μ = {mean}, σ = {std}")

ax.set_xticks(np.arange(min(X), max(X)))
plt.title("Lognormal Distribution")
plt.legend()
plt.show()
```

4. Poisson distribution

The Poisson distribution is named after the French mathematician Siméon Denis Poisson. It is a discrete probability distribution, which means it takes countable values; in other words, it is a counting distribution. The Poisson distribution is therefore used to model the number of times an event occurs within a specified period of time.

If events occur at a constant average rate over time, the probability of observing n events in a given time interval can be described by the Poisson distribution. For example, customers may arrive at a cafe at an average rate of 3 per minute. We can use the Poisson distribution to calculate the probability that 9 customers arrive within one minute.

The probability mass function (PMF) is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

λ is the average number of events per unit of time; in our case, it is 3. k is the number of occurrences; in our case, it is 9. SciPy can carry out the calculation:

```python
from scipy import stats

# P(X = 9) for a Poisson distribution with rate λ = 3
print(stats.poisson.pmf(9, mu=3))
```

Output:

0.002700503931560479
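The same number can be obtained straight from the PMF formula, λ^k e^(−λ) / k!. The sketch below does that with the standard library and also shows how the rate scales with the interval: at 3 customers per minute, the expected count over 2 minutes is λ = 6. The helper name `poisson_pmf` is a hypothetical choice for illustration.

```python
import math

from scipy import stats


def poisson_pmf(k, lam):
    """Hypothetical helper: P(X = k) = lam**k * exp(-lam) / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)


# Matches SciPy for k = 9, λ = 3
print(math.isclose(poisson_pmf(9, 3), stats.poisson.pmf(9, mu=3)))  # True

# The rate scales with the interval length: 3 per minute -> λ = 6 over 2 minutes
p_two_minutes = stats.poisson.pmf(9, mu=3 * 2)
print(p_two_minutes)
```

Nine arrivals are far more plausible over two minutes (roughly 0.069) than over one minute (roughly 0.0027).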

The mean (and variance) of the Poisson distribution is λ, and its peak lies near λ. For large λ, its shape resembles a normal distribution.

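```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# 500 random draws from a Poisson distribution with λ = 3
X = stats.poisson.rvs(mu=3, size=500)

plt.subplots(figsize=(8, 5))
# One bin per integer value, centered on the integer
bins = np.arange(X.max() + 2) - 0.5
plt.hist(X, bins=bins, density=True, edgecolor="black")
plt.title("Poisson Distribution")
plt.show()
```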

5. Exponential distribution

The exponential distribution is the probability distribution of the time between events in a Poisson point process. Its probability density function is:

$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0$$

where λ is the rate parameter and x is the random variable. Note that SciPy's stats.expon is parameterized by scale = 1/λ.
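A short sketch linking the formula to SciPy. It assumes a rate of λ = 3 events per minute (an illustrative value echoing the cafe example), evaluates λe^(−λx) by hand, and compares it with `stats.expon.pdf` using `scale = 1/λ`. It also shows that the mean waiting time is 1/λ and that the probability of waiting longer than one minute is e^(−λ).

```python
import numpy as np
from scipy import stats

lam = 3.0  # assumed rate: 3 events per minute
x = np.linspace(0, 2, 21)

# PDF from the formula: f(x) = lam * exp(-lam * x) for x >= 0
manual_pdf = lam * np.exp(-lam * x)
# SciPy uses scale = 1 / lam
scipy_pdf = stats.expon.pdf(x, scale=1 / lam)
print(np.allclose(manual_pdf, scipy_pdf))  # True

# Mean waiting time and P(wait > 1 minute) via the survival function
mean_wait = stats.expon.mean(scale=1 / lam)
p_wait_over_1 = stats.expon.sf(1.0, scale=1 / lam)
print(mean_wait, p_wait_over_1, np.exp(-lam))
```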

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

X = np.linspace(0, 5, 5000)
# scale = 1 / λ, so scale=1 corresponds to λ = 1
exponential_distribution = stats.expon.pdf(X, loc=0, scale=1)

plt.subplots(figsize=(8, 5))
plt.plot(X, exponential_distribution)
plt.title("Exponential Distribution")
plt.show()
```

6. Binomial distribution

The binomial distribution describes the number of successes in a series of experiments, each of which either succeeds or fails. A common example is counting heads in repeated coin flips.

The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments. Each experiment asks a yes/no question and has a Boolean outcome: success or failure.

In essence, each experiment has two possible outcomes: one with probability p and the other with probability 1 − p.

This is the formula of the binomial distribution:

$$P(x) = \binom{n}{x} p^x q^{n-x}$$

where:

P = binomial probability

$\binom{n}{x}$ = number of combinations ("n choose x")

x = number of successes in n trials

p = probability of success in a single trial

q = probability of failure in a single trial (q = 1 − p)

n = number of trials
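A minimal sketch of the binomial formula P(x) = C(n, x) p^x q^(n−x) using `math.comb` (Python 3.8+), compared with `scipy.stats.binom.pmf`. The values n = 10 and p = 0.5 (3 heads in 10 fair coin flips) are illustrative assumptions, not from the article.

```python
import math

from scipy import stats


def binomial_pmf(x, n, p):
    """Hypothetical helper: P(x) = C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p**x * q ** (n - x)


n, p = 10, 0.5
print(binomial_pmf(3, n, p))    # 0.1171875
print(stats.binom.pmf(3, n, p))

# The probabilities over x = 0..n sum to 1
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(total)  # 1.0
```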

The visualization code is as follows:

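```python
import numpy as np
import matplotlib.pyplot as plt

# 1000 draws with n = 1 trial and success probability p = 0.5 (a single coin flip)
X = np.random.binomial(n=1, p=0.5, size=1000)

plt.subplots(figsize=(8, 5))
plt.hist(X)
plt.title("Binomial Distribution")
plt.show()
```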

7. Student's t-distribution

Student's t-distribution (or simply the t-distribution) is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown. It was developed by the British statistician William Sealy Gosset under the pseudonym "Student".

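The PDF is:

$$f(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$$

where n is a parameter called the degrees of freedom, sometimes abbreviated "d.o.f." As n increases, the t-distribution gets closer to the normal distribution.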
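The sketch below implements the t PDF, Γ((n+1)/2) / (√(nπ) Γ(n/2)) · (1 + t²/n)^(−(n+1)/2), with `math.gamma`, checks it against `scipy.stats.t.pdf`, and then measures the largest gap between the t density and the standard normal density for increasing degrees of freedom, which illustrates the convergence claim. The helper `t_pdf`, the grid, and the chosen d.o.f. values are illustrative assumptions.

```python
from math import gamma, pi, sqrt

import numpy as np
from scipy import stats


def t_pdf(t, n):
    """Hypothetical helper: Student's t PDF with n degrees of freedom."""
    return gamma((n + 1) / 2) / (sqrt(n * pi) * gamma(n / 2)) * (1 + t**2 / n) ** (-(n + 1) / 2)


x = np.linspace(-5, 5, 1001)
for n in (1, 3, 9):
    print(n, np.allclose(t_pdf(x, n), stats.t.pdf(x, df=n)))

# Largest absolute gap between the t density and the standard normal density
normal = stats.norm.pdf(x)
gaps = {n: np.max(np.abs(stats.t.pdf(x, df=n) - normal)) for n in (1, 3, 9, 30, 100)}
for n, gap in gaps.items():
    print(f"{n:>3} d.o.f: max |t - normal| = {gap:.4f}")
```

The gap shrinks steadily as the degrees of freedom grow.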

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Plot the exact densities rather than a KDE of only a few random samples
x = np.linspace(-5, 5, 500)

plt.subplots(figsize=(8, 5))
for df in (1, 3, 9):
    plt.plot(x, stats.t.pdf(x, df=df), label=f"{df} d.o.f")
plt.title("Student's t distribution")
plt.legend()
plt.show()
```

8. Chi-square distribution

The chi-square distribution is a special case of the gamma distribution: with k degrees of freedom, it is the distribution of the sum of the squares of k independent standard normal random variables.

The PDF is:

$$f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)} x^{k/2 - 1} e^{-x/2}, \quad x > 0$$

It is a widely used distribution, particularly in hypothesis testing and in constructing confidence intervals.
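Both characterizations of the chi-square distribution can be checked numerically. The sketch below assumes k = 3 (an illustrative value). It sums the squares of k standard normal samples and compares the sample mean and variance with the theoretical k and 2k, evaluates the PDF x^(k/2−1) e^(−x/2) / (2^(k/2) Γ(k/2)) by hand, and confirms that chi2(k) matches `stats.gamma` with shape k/2 and scale 2. The seed and sample size are arbitrary choices.

```python
from math import gamma

import numpy as np
from scipy import stats

k = 3
rng = np.random.default_rng(2)

# Sum of squares of k independent standard normal variables, 200,000 times
z = rng.standard_normal(size=(200_000, k))
q = (z**2).sum(axis=1)
print(q.mean(), q.var())  # should be close to k and 2k

# PDF from the formula, compared with SciPy's chi2 and gamma(a=k/2, scale=2)
x = np.linspace(0.1, 10, 50)
manual = x ** (k / 2 - 1) * np.exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))
print(np.allclose(manual, stats.chi2.pdf(x, df=k)))
print(np.allclose(stats.chi2.pdf(x, df=k), stats.gamma.pdf(x, a=k / 2, scale=2)))
```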

Let's plot it in Python for a few degrees of freedom:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Start just above 0: the density with 1 d.o.f. is infinite at x = 0
X = np.linspace(0.01, 6, 500)

plt.subplots(figsize=(8, 5))
for df in (1, 2, 3):
    plt.plot(X, stats.chi2.pdf(X, df=df), label=f"{df} d.o.f")
plt.title("Chi-squared Distribution")
plt.legend()
plt.show()
```

That concludes this article on how to implement 8 probability distribution formulas in Python. We hope it helps you understand these distributions and how to work with them in code.

© 2024 shulou.com SLNews company. All rights reserved.
