What are the basic mathematical knowledge in machine learning?

2025-02-14 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article introduces the basic mathematical knowledge used in machine learning. Many people run into these concepts in practice, so below I will walk you through how to handle them. I hope you read carefully and come away with something useful!

Note: the code for this article is written in Python 3.

Linear algebra (linear algebra)

The first formula: f(x) = xwᵀ + b

This is the most common formula in machine learning. I call it the first formula of machine learning; it is actually the linear classification function (linear classifier).

The goal of training the classifier is to find (w, b).

Where:

x is a one-row matrix [[x_1, x_2, …, x_n]].

w is a one-row matrix [[w_1, w_2, …, w_n]].

x and w have the same dimensions.

b is a scalar.

xwᵀ = ∑_{i=1}^{n} x_i w_i, called the dot product (dot product).

Sometimes this formula is also written in one of the following equivalent forms; their basic meaning is the same.

f(x) = wx + b

f(x) = wᵀx + b

f(x) = w⃗ · x⃗ + b

Note: here w is expressed as a one-dimensional array (or vector, vector (vector)) [w_1, w_2, …, w_n].

Note: a one-dimensional array can be understood mathematically as a vector, representing a point in a multi-dimensional space.

Note: in linear algebra, matrix multiplication does not commute (ab ≠ ba), so in the expression wᵀx, strictly speaking, the vector should be regarded as a one-column matrix (rather than a one-row matrix) in order to satisfy the mathematical definition.

Note: the expressions w⃗ · x⃗ and wx are correct because w and x are vectors, which satisfies the definition of vector operations.

Operation of matrix

Since this article is written from a mathematical point of view, we first focus on matrix operations.

Transposition (transpose)

Matrix transposition: the entries of the matrix are flipped across the main diagonal.

Mathematical notation: wᵀ

Code example:

import numpy

# Matrix Transpose
m = numpy.mat([[1, 2], [3, 4]])
print("Matrix.Transpose:")
print(m.T)
'''
Output:
Matrix.Transpose:
[[1 3]
 [2 4]]
'''

Matrix multiplication

The meaning of matrix multiplication

If one jin of apples costs 10 yuan, how much do 5 jin of apples cost? The answer is: 10 × 5 = 50.

If one jin of apples costs 10 yuan and one jin of pears costs 20 yuan, how much do 5 jin of apples and 2 jin of pears cost?

The answer is:

[[10, 20]] · [[5], [2]] = 10 × 5 + 20 × 2 = 90    (2)

Here we can see the constraint on matrix multiplication: the number of columns of the first factor must equal the number of rows of the second factor.

Matrix multiplication does not satisfy commutative law.

m1 · m2 ≠ m2 · m1    (3)

Let's take a look at the result of the calculation after the exchange of multipliers:

[[10], [20]] · [[5, 2]] = [[50, 20], [100, 40]]    (4)

For example, the entry 20 is the cost of 2 jin of apples (10 × 2).

Give an example of their differences:

m1 = [[1, 2]]    (5)

m2 = [[10], [20]]    (6)

m1 · m2 is calculated as follows:

m1 · m2 = [[1, 2]] · [[10], [20]] = [1 × 10 + 2 × 20] = [[50]]    (7)

m2 · m1 is calculated as follows:

m2 · m1 = [[10], [20]] · [[1, 2]] = [[10 × 1, 10 × 2], [20 × 1, 20 × 2]] = [[10, 20], [20, 40]]    (8)

Calculation formula

Matrix multiplication: the entry in row i, column j of the result is the dot product of row i of matrix 1 and column j of matrix 2.

An l × m matrix multiplied by an m × n matrix yields an l × n matrix.

x · y = [x_1 ⋯ x_n] · [y_1 ⋯ y_n]ᵀ = [∑_{i=1}^{n} x_i y_i]

x · y = [x_1 ⋯ x_m]ᵀ · [y_1 ⋯ y_n] = [[x_1 y_1 ⋯ x_1 y_n], ⋯, [x_m y_1 ⋯ x_m y_n]]

In general, for an m × n matrix x and an n × q matrix y, entry (i, j) of x · y is ∑_{k=1}^{n} x_{ik} y_{kj}    (9)

Code demonstration:

# Matrix Multiplication
print("Matrix Multiplication")
a = numpy.mat([1, 2])
b = numpy.mat([[10], [20]])
print(a * b)
print(a.T * b.T)
a = numpy.mat([[1, 2], [3, 4]])
b = numpy.mat([[10, 20], [30, 40]])
print(a * b)
'''
Output:
[[50]]
[[10 20]
 [20 40]]
[[ 70 100]
 [150 220]]
'''

The various matrix products, their mathematical symbols, and their Python calls:

Dot product (dot product): ab — a.dot(b) or numpy.dot(a, b)

ab = [[1, 2]] · [[10], [20]] = [1 × 10 + 2 × 20] = [[50]]    (10)

Inner product (inner product): a · b = ⟨a, b⟩ — numpy.inner(a, b)

a · b = abᵀ    (11)

Outer product (outer product): a ⊗ b — numpy.outer(a, b)

a ⊗ b = [[1], [2]] · [[10, 20]] = [[1 × 10, 1 × 20], [2 × 10, 2 × 20]] = [[10, 20], [20, 40]]    (12)

Element-wise product (element-wise product, point-wise product, Hadamard product): a ⊙ b — numpy.multiply(a, b)

a ⊙ b = [[1, 2], [3, 4]] ⊙ [10, 20] = [[1 × 10, 2 × 20], [3 × 10, 4 × 20]] = [[10, 40], [30, 80]]    (13)

Note: in Python, matrix data can be represented as matrix or ndarray.

The operations on the two types are very similar, but there are slight differences.

ndarray * operator: element-wise product.

matrix * operator: dot product.

numpy.multiply for ndarray: element-wise product. Same for both types.

numpy.multiply for matrix: element-wise product. Same for both types.

numpy.dot for ndarray: inner product for 1-d arrays.

numpy.dot for matrix: dot product; the shape of the result is determined by the operands.

numpy.inner for ndarray: inner product for 1-d arrays.

numpy.inner for matrix: inner product; the shape of the result is determined by the operands.

numpy.outer for ndarray: outer product. Same for both types.

numpy.outer for matrix: outer product. Same for both types.

Inner product

English: inner product, scalar product.

The inner product reduces the dimension of vectors, producing a number.

The inner product of a matrix is the inner product of each row and column.

xy = ⟨x, y⟩ = ∑_{i=1}^{n} x_i y_i    (14)

x = numpy.array([1, 2])
y = numpy.array([10, 20])
print("Array inner:")
print(numpy.inner(x, y))
'''
Output:
Array inner:
50
'''

x = numpy.mat([[1, 2], [3, 4]])
y = numpy.mat([10, 20])
print("Matrix inner:")
print(numpy.inner(x, y))
'''
Output:
Matrix inner:
[[ 50]
 [110]]
'''

Outer product

The outer product of an m-dimensional vector and an n-dimensional vector is an m × n matrix.

The outer product of an a1 × a2 matrix and a b1 × b2 matrix is an (a1 · a2) × (b1 · b2) matrix (both operands are flattened to vectors first).

x ⊗ y = [[x_1], [x_2], ⋯, [x_m]] · [[y_1, y_2, ⋯, y_n]] = [[x_1 y_1, ⋯, x_1 y_n], [x_2 y_1, ⋯, x_2 y_n], ⋯, [x_m y_1, ⋯, x_m y_n]]    (15)

For matrix operands, both are flattened to vectors first and the same formula applies.

x = numpy.array([1, 3])
y = numpy.array([10, 20])
print("Array outer:")
print(numpy.outer(x, y))
'''
Output:
Array outer:
[[10 20]
 [30 60]]
'''

x = numpy.mat([[1, 2], [3, 4]])
y = numpy.mat([10, 20])
print("Matrix outer:")
print(numpy.outer(x, y))
'''
Output:
Matrix outer:
[[10 20]
 [20 40]
 [30 60]
 [40 80]]
'''

Note: have you noticed that the matrix outer product is just the vector outer product of the flattened matrices?

Element product (element-wise product/point-wise product/Hadamard product)

Calculation formula

x ⊙ y = [x_1 ⋯ x_n] ⊙ [y_1 ⋯ y_n] = [x_1 y_1 ⋯ x_n y_n]

For two matrices of the same shape (or a vector broadcast over a matrix), multiplication is entry by entry:

x ⊙ y = [[x_11, ⋯, x_1n], ⋯, [x_m1, ⋯, x_mn]] ⊙ [[y_11, ⋯, y_1n], ⋯, [y_m1, ⋯, y_mn]] = [[x_11 y_11, ⋯, x_1n y_1n], ⋯, [x_m1 y_m1, ⋯, x_mn y_mn]]    (16)

x = numpy.array([1, 3])
y = numpy.array([10, 20])
print("Array element-wise product:")
print(x * y)
'''
Output:
Array element-wise product:
[10 60]
'''

Matrix addition also works entry by entry:

x = numpy.mat([[1, 2], [3, 4]])
y = numpy.mat([[10, 20], [30, 40]])
print("Matrix Add:")
print(x + y)
'''
Output:
Matrix Add:
[[11 22]
 [33 44]]
'''

Elementary mathematics

Summation formula

Everyone should know that.

∑_{i=1}^{N} x_i = x_1 + x_2 + ⋯ + x_n    (17)

The formula for finding the total product

∏_{i=1}^{N} x_i = x_1 × x_2 × ⋯ × x_n    (18)
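These two formulas map directly onto numpy's sum and prod; a quick sketch using the same library as the rest of this article:

```python
import numpy

x = numpy.array([1, 2, 3, 4])

# Summation: sum_{i=1}^{N} x_i
total = numpy.sum(x)     # 1 + 2 + 3 + 4 = 10

# Total product: prod_{i=1}^{N} x_i
product = numpy.prod(x)  # 1 * 2 * 3 * 4 = 24

print(total, product)
```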

Logarithm

The meaning of logarithm:

Mathematical expression

The length of a number (its number of digits is roughly its logarithm).

Turn multiplication into addition.

Solve the underflow problem: a problem caused by the multiplication of too many very small numbers.

log(x) ≐ log_10(x), log_2(x), or ln(x)    (19)

Since logarithms with different bases are proportional to each other, it sometimes does not matter which base is used.
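This proportionality is easy to check with Python's standard math module: the ratio of logs in two bases is the same constant for every input.

```python
import math

# log2(x) = log10(x) / log10(2), so the ratio log2(x)/log10(x)
# is the constant 1/log10(2) for every x.
for x in [2.0, 10.0, 123.0]:
    ratio = math.log2(x) / math.log10(x)
    print(ratio)  # always 1 / log10(2), about 3.3219
```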

Proportionality

a is proportional to b. It can be used when estimating algorithmic complexity.

a ∝ b    (20)

Floor (floor) and ceiling (ceil)

floor: ⌊x⌋    ceil: ⌈x⌉    (21)
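Python's standard math module implements both directly; note how they behave on negative numbers:

```python
import math

print(math.floor(2.7))   # 2
print(math.ceil(2.3))    # 3
print(math.floor(-2.3))  # -3 (floor rounds toward negative infinity)
print(math.ceil(-2.7))   # -2 (ceil rounds toward positive infinity)
```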

Norm (norm)

L1 norm

∥ w ∥ 1: L1 norm, that is, the sum of the absolute values of each item.

‖w‖_1 = ∑_{i=1}^{n} |w_i|    (22)

L2 norm

∥ w ∥ or ∥ w ∥ 2: L2 norm, that is, the square root of the sum of squares of each item.

‖w‖ = √(∑_{i=1}^{n} w_i²)    (23)
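Both norms can be computed by hand and cross-checked against numpy.linalg.norm; a minimal sketch:

```python
import numpy

w = numpy.array([3, -4])

# L1 norm: sum of absolute values
l1 = numpy.sum(numpy.abs(w))        # |3| + |-4| = 7
# L2 norm: square root of the sum of squares
l2 = numpy.sqrt(numpy.sum(w ** 2))  # sqrt(9 + 16) = 5

# numpy.linalg.norm computes the same values
print(l1, numpy.linalg.norm(w, ord=1))  # 7 7.0
print(l2, numpy.linalg.norm(w))         # 5.0 5.0
```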

Lagrange multiplier method and KKT condition

If the equation f(x) = wx + b has inequality constraints, the Lagrange multiplier method and the KKT conditions provide a way to calculate (w, b).

L(w, b, α)    (24)

For the Lagrange multiplier method and the KKT condition, see:

Deep understanding of Lagrange multiplier method (Lagrange Multiplier) and KKT condition

Differential representation

f′(x); or in Leibniz notation: ∂f(x)/∂x, dy/dx; or: ∇f(x), the gradient of f at x    (25)

Meaning

df(x)/dx = lim_{h→0} (f(x + h) − f(x)) / h, where d/dx is an operation applied to f(x)    (26)

The mathematical meaning: at the point x, the change in f(x) divided by the change in x.

Mathematically, it can be thought of as: the slope.

In machine learning, it is referred to as: the gradient.

After calculating the gradient and multiplying it by a ratio (the step size), we obtain a correction value, which is used in back propagation to correct the weights.

Partial differential (partial differential): the derivative of a function along one dimension; the other dimensions are treated as constants.

Rules of differentiation

Sum rule (sum rule): (f + g)′ = f′ + g′

∂(u + v)/∂x = ∂u/∂x + ∂v/∂x    (27)

Product rule (product rule): (f · g)′ = f′ · g + f · g′

∂(u · v)/∂x = u · ∂v/∂x + v · ∂u/∂x    (28)

Chain rule (chain rule of differentiation): (f(g(x)))′ = f′(g(x)) · g′(x)

∂z/∂x = ∂z/∂y · ∂y/∂x    (29)
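These rules are easy to verify numerically with a central-difference approximation. A sketch; the helper name numeric_derivative is made up for illustration:

```python
import math

def numeric_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
f = lambda t: t ** 2
g = lambda t: math.sin(t)

# Sum rule: (f + g)' = f' + g'
lhs = numeric_derivative(lambda t: f(t) + g(t), x)
rhs = numeric_derivative(f, x) + numeric_derivative(g, x)
print(abs(lhs - rhs) < 1e-6)  # True

# Product rule: (f * g)' = f' * g + f * g'
lhs = numeric_derivative(lambda t: f(t) * g(t), x)
rhs = numeric_derivative(f, x) * g(x) + f(x) * numeric_derivative(g, x)
print(abs(lhs - rhs) < 1e-6)  # True

# Chain rule: (f(g(x)))' = f'(g(x)) * g'(x)
lhs = numeric_derivative(lambda t: f(g(t)), x)
rhs = numeric_derivative(f, g(x)) * numeric_derivative(g, x)
print(abs(lhs - rhs) < 1e-6)  # True
```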

Common derivative formulas:

f(x) = ax → f′(x) = a
f(x) = xⁿ → f′(x) = n·xⁿ⁻¹
f(x) = x + c → f′(x) = 1
f(x) = eˣ → f′(x) = eˣ
f(x) = ln(x) → f′(x) = 1/x

Statistics / probability theory

Bayesian formula (Bayes formula)

P(A|B) = P(B|A) · P(A) / P(B)    (30)

where:
P(A): the probability of observing event A.
P(B): the probability of observing event B.
P(A|B): the probability of observing event A given that B is true.
P(B|A): the probability of observing event B given that A is true.

For example, in the algorithm for judging spam:

P (A): the probability of spam in all messages.

P (B): the probability of a word appearing.

P (B | A): the probability of a word appearing in spam.

P(A|B): the probability that a message containing the word is spam.
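Plugging illustrative (made-up) probabilities into the Bayes formula for the spam example:

```python
# Hypothetical counts; the formula is the point, not the numbers.
p_spam = 0.2               # P(A): fraction of all messages that are spam
p_word = 0.05              # P(B): fraction of all messages containing the word
p_word_given_spam = 0.15   # P(B|A): fraction of spam messages containing the word

# Bayes: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # 0.6: a message with this word is 60% likely to be spam
```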

Information theory: Shannon entropy (Shannon Entropy)

The definition of entropy

In information theory, entropy is the average amount of information contained in each message received, also known as information entropy, source entropy, and average self-information.

Entropy is defined as the expected value of information.

Entropy is, in effect, the mathematical expectation obtained by multiplying the information content (in bits) of each value of a random variable by its probability of occurrence and summing.

Entropy is usually measured in bits (bit or Sh(annon), base 2), but it is also measured in nat (base e, the natural logarithm) and Hart (base 10), depending on the base of the logarithm used in the definition.

The unit of entropy is not important (because logarithms of different bases are proportional; it does not matter if you do not follow this sentence).

Entropy is a value ≥ 0.

If it is 0, the result can be predicted exactly. As can be seen from the formula below, this happens when the probability is 1.

The characteristics of entropy

The smaller the probability of occurrence of information, the greater the entropy.

The entropy of common knowledge (a certain outcome) is 0.

From the point of view of calculating the loss: the greater the entropy value, the greater the loss.

Expected value

In probability theory and statistics, the expected value (or mathematical expectation, or mean; in physics, expectation value) of a discrete random variable is the sum, over each possible outcome of the experiment, of the probability of that outcome multiplied by its value.

For example, if you roll the dice, the expected value of the points is 3.5:

E(x) = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6 = 3.5
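The same calculation in Python:

```python
# Expected value of a fair die: the sum of (value * probability)
values = [1, 2, 3, 4, 5, 6]
expectation = sum(v * (1 / 6) for v in values)
print(expectation)  # 3.5
```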

A popular understanding

Information entropy is:

The sum over all values of (the probability of the value × the information length of the value).

The formula for the information entropy of a data set

H(X) = E[I(X)] = E[−ln P(X)] = ∑_{i=1}^{n} P(x_i) I(x_i) = −∑_{i=1}^{n} P(x_i) log P(x_i)    (31)

where:
H(X): the information entropy of the data set X.
E(): the expected value.
I(): the information value (surprisal). I(x_i) = −log(P(x_i)).
X: the data set.
x_i: an enumerated value of the label of the data set X.
P(x_i): the probability that x_i occurs; the probability mass function (probability mass function) of X. P(x_i) = count(x_i) / len(X).
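A minimal sketch of this formula in Python (the function name shannon_entropy is mine, not from the article; base-2 logs give entropy in bits):

```python
import math
from collections import Counter

def shannon_entropy(dataset):
    # H(X) = -sum(P(xi) * log2(P(xi))) over the label values in the dataset,
    # with P(xi) = count(xi) / len(dataset)
    counts = Counter(dataset)
    n = len(dataset)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy(["yes", "yes", "no", "no"]))  # 1.0 bit: two equally likely labels
print(shannon_entropy(["yes"] * 4) == 0)            # True: a certain outcome has zero entropy
```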

The role of entropy

Calculate loss (Loss function)

Used to adjust the step size of gradient descent. (If the entropy (loss) this time is larger than last time, the step size is too large.)

Used for decision tree

The greater the entropy, the stronger the feature's ability to split the data.

Game theory

Preference relation (preference relation)

Describes the preferences of a player: x ⪰ y means "x is at least as good as y".

Miscellaneous (things I did not know where else to put)

Find the maximum parameter

Mathematical representation

argmax_c P(c)

Explanation

It can be used to map a probability distribution to its most likely class.

It returns the value of c at which P(c) is maximal.

For example:

c ∈ {1, 2}; P(1) = 0.9, P(2) = 0.1; ∴ argmax_c P(c) = 1    (35)
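numpy expresses this directly with numpy.argmax; a small demo over an array of class probabilities:

```python
import numpy

# P(c) for classes c = 0, 1, 2
p = numpy.array([0.1, 0.7, 0.2])

# argmax returns the index of the most probable class;
# max returns the probability itself (see the next section)
print(numpy.argmax(p))  # 1
print(numpy.max(p))     # 0.7
```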

Return the maximum value

Mathematical representation

max_{a ∈ A} P(a)

Explanation

Over all a ∈ A, it returns the maximum value of P(a).

Constraint condition (Subject to)

Mathematical representation

y = 2x + 1, s.t. x > 0

Explanation

When the constraint x > 0 holds, y = 2x + 1.

Equal in definition

Mathematical representation

a ≐ b

Explanation

a is defined as b.

2 complement (2's complement)

A method of representing signed numbers in binary.

The first bit is the sign bit.

If it is 0, it counts as 0.

If it is 1, it counts as −2^(n−1), where n is the number of bits.

For example: 0010 is 2; 1010 is −6.
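A small sketch that interprets a bit string this way (the helper name twos_complement_value is made up for illustration):

```python
def twos_complement_value(bits):
    # The leading bit contributes -2^(n-1); the rest are ordinary binary.
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)
    if n > 1:
        value += int(bits[1:], 2)
    return value

print(twos_complement_value("0010"))  # 2
print(twos_complement_value("1010"))  # -6
```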

Machine learning

Activation function

Please read my other blog post:

Neural Network Learning Notes-the function, definition and differential proof of Activation function

Loss function

Please read my other blog post:

The definition and differential proof of Neural Network Learning Note-loss function

Appendix: Greek letters and their meanings in this context

1. Α α alpha
2. Β β beta
3. Γ γ gamma
4. Δ δ delta — delta value, deviation
5. Ε ε epsilon
6. Ζ ζ zeta
7. Η η eta
8. Θ θ theta
9. Ι ι iota
10. Κ κ kappa
11. Λ λ lambda
12. Μ μ mu
13. Ν ν nu
14. Ξ ξ xi — slack variable
15. Ο ο omicron
16. Π π pi — the circle constant
17. Ρ ρ rho
18. Σ σ sigma
19. Τ τ tau
20. Υ υ upsilon
21. Φ φ phi
22. Χ χ chi
23. Ψ ψ psi
24. Ω ω omega

Slack variable (slack variable): in SVM, a tolerance value introduced to handle outliers (points that fall in another category).

The meaning of mathematical symbols:

∂ (partial) — partial derivative
∞ (infinity) — infinite

This concludes "What is the basic mathematical knowledge in machine learning". Thank you for reading. If you want to know more about the industry, you can follow the website; the editor will output more high-quality practical articles for you!
