Machine Learning (4): supervised Learning

2025-03-29 Update From: SLTechnology News&Howtos


Supervised learning is a very important category of machine learning: the main starting point of ML is to use available data to compensate for unknown knowledge, so learning the patterns and rules in a training set is the most natural setting. Today I decided to spend about two weeks recording and organizing my machine learning notes. The main reference is Ethem Alpaydin's Introduction to Machine Learning. If there are any mistakes or omissions, please point them out. Today we will look at supervised learning from a macro point of view, covering the following points:

1. An example of supervised learning

2. The dimensions of a supervised learning algorithm

3. The capacity of a learning algorithm: the VC dimension

4. Determining the sample size of a learning algorithm: PAC learning

All right, to make a long story short, let's dive into supervised learning.

1. An example of supervised learning

Starting with an example is the easiest way to understand. Suppose our task is to recognize a "family car" based on two features: the price of the car and the power of its engine. In practice there may be more relevant factors; here we consider only these two for simplicity. The algorithm's task is to learn from the training set how to judge whether a new sample is a family car. We mark the cars considered family cars as positive examples and the others as negative examples. Class learning is then to find a description that contains all of the positive examples and none of the negative examples.

Formally, each input is a vector x = (x1, x2), where x1 is the price of the car and x2 is the engine power, and the output r is 1 for a positive example and 0 for a negative one. The training set of N samples is then X = {(x^t, r^t)}, t = 1, ..., N, where each pair consists of a sample's features x^t and its ground-truth label r^t. Our goal is an algorithm that finds, from the training set, a classifier consistent with all of it (covering every positive example and no negative example), and then uses this classifier to predict and judge new samples.
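As a sketch, the training set and a rectangle discriminant for this task can be written in a few lines of Python. All prices, powers, and rectangle bounds below are invented illustrative values, not numbers from the book:

```python
# A toy training set for the "family car" task.
# Each sample is (x1 = price, x2 = engine power, r = 1 positive / 0 negative).
training_set = [
    (15000, 100, 1),
    (18000, 120, 1),
    (22000, 150, 1),
    (8000,   60, 0),   # too cheap / too weak: not a family car
    (60000, 300, 0),   # luxury sports car: not a family car
]

def predict(hypothesis, x1, x2):
    """Apply an axis-aligned rectangle hypothesis (p1, p2, e1, e2):
    positive iff p1 <= price <= p2 and e1 <= power <= e2."""
    p1, p2, e1, e2 = hypothesis
    return 1 if (p1 <= x1 <= p2 and e1 <= x2 <= e2) else 0

h = (12000, 30000, 90, 200)   # one candidate rectangle
print([predict(h, x1, x2) for x1, x2, _ in training_set])  # [1, 1, 1, 0, 0]
```

Here the hypothesis `h` happens to be consistent with the whole toy training set: it classifies every positive example as positive and every negative example as negative.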

In a concrete implementation, one usually first fixes a hypothesis class. For example, we can take the set of axis-aligned rectangles: a car whose price lies in a certain range and whose engine power lies in a certain range is classified as a family car (this is the discriminant), and we look for a rectangle that contains all the positive examples but no negative example. Many rectangles may satisfy this condition. Among them there is a smallest one, the most specific hypothesis S: shrinking it any further would exclude a positive example. There is also a most general hypothesis G: enlarging it any further would include one or more negative examples. The hypothesis we pick should lie between S and G; a natural choice is halfway between them, because this yields a larger margin, the distance between the decision boundary and the instances nearest to it.
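The most specific hypothesis S is easy to compute: it is the tightest axis-aligned rectangle around the positive examples. A minimal sketch, again with invented sample values:

```python
# Toy samples: (price, engine power, label); values are illustrative only.
samples = [
    (15000, 100, 1), (18000, 120, 1), (22000, 150, 1),
    (8000, 60, 0), (60000, 300, 0),
]

def most_specific_hypothesis(data):
    """S: the tightest axis-aligned rectangle containing all positive examples,
    returned as (price_min, price_max, power_min, power_max)."""
    pos = [(x1, x2) for x1, x2, r in data if r == 1]
    xs = [p[0] for p in pos]
    ys = [p[1] for p in pos]
    return (min(xs), max(xs), min(ys), max(ys))

print(most_specific_hypothesis(samples))  # (15000, 22000, 100, 150)
```

Computing G is analogous but works outward from the negative examples; it is omitted here for brevity.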

Because many hypotheses are available between S and G, and different hypotheses may make different predictions on new samples, we face the question of generalization: how accurately our hypothesis classifies future instances that are not in the training set.

2. The dimensions of a supervised learning algorithm

Put simply, supervised learning lets the computer learn the rules and patterns in the data from a training set, and then perform classification or regression on new inputs. The training set is represented like the set X above, and the samples should be independent and identically distributed. For two-class learning the outputs are 0 and 1; for K-class learning the output is a K-dimensional vector in which exactly one component is 1 and the rest are 0, meaning any instance can belong to at most one class. For regression, the output is a real value. A simple way to distinguish the two problems: classification outputs a discrete value, while regression outputs a continuous value. Next, let's look at the dimensions of supervised learning, that is, its basic steps.
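The K-class output encoding described above (exactly one component 1, the rest 0, i.e. a one-hot vector) can be sketched as:

```python
def one_hot(k, K):
    """Encode class index k (0-based) among K classes as a K-dimensional
    vector with a single 1 component and 0 everywhere else."""
    return [1 if i == k else 0 for i in range(K)]

print(one_hot(2, 4))  # [0, 0, 1, 0]
```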

1. Choose a hypothesis class, for example a function model g(x, a), where a is a parameter vector and x is the sample input. We determine the best a by learning from the training set, so that the hypothesis can judge new samples.

2. Many hypotheses may satisfy the training set, so we must choose the most appropriate one. The criterion is a loss function L. For example, L can be the squared or absolute difference between the desired output r and the hypothesis output g(x, a); it expresses how far our hypothesis is from the training data, and we seek the hypothesis that makes it smallest. Other loss functions exist, but the basic idea is the same: measure the discrepancy between the hypothesis and the training data.
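A squared-error loss of this kind can be sketched as follows; the linear form of g and the (x, r) sample values are assumptions chosen for illustration:

```python
def squared_loss(data, g, a):
    """Empirical loss: sum over the training set of the squared
    difference between the desired output r and the hypothesis g(x, a)."""
    return sum((r - g(x, a)) ** 2 for x, r in data)

# A linear hypothesis g(x, a) = a[0] + a[1] * x and invented (x, r) pairs.
g = lambda x, a: a[0] + a[1] * x
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# Compare two candidate parameter vectors: the smaller loss fits better.
print(squared_loss(data, g, (0.0, 2.0)))
print(squared_loss(data, g, (0.0, 1.0)))
```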

3. With the loss function L in hand, we enter the optimization step: minimize L. There are many ways to do this, such as setting the partial derivatives of L with respect to all the parameters to zero, or using gradient descent, simulated annealing, or genetic algorithms.
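As an illustration of the optimization step, here is a minimal gradient-descent sketch on the one-parameter toy objective L(a) = (a - 3)^2 (an invented example, not from the text), whose gradient is 2(a - 3):

```python
def gradient_descent(grad, a0, lr=0.1, steps=100):
    """Generic gradient descent: repeatedly step against the gradient."""
    a = a0
    for _ in range(steps):
        a = a - lr * grad(a)
    return a

# Minimize L(a) = (a - 3)^2 starting from a = 0.
a_min = gradient_descent(lambda a: 2 * (a - 3), a0=0.0)
print(a_min)  # converges close to 3.0, the minimizer of L
```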

Different machine learning methods differ in the hypothesis class (the model, or inductive bias), in the loss function used, or in the optimization procedure. The hypothesis model, the loss measure, and the optimization procedure are thus the three basic dimensions of machine learning.

3. The capacity of a learning algorithm: the VC dimension

The capacity of a hypothesis class is measured by its VC (Vapnik-Chervonenkis) dimension, the maximum number of data points it can shatter. Suppose a data set has N points; labeling each point positive or negative yields 2^N different learning problems. If for every one of these labelings we can find a hypothesis h in the hypothesis class H that separates the positives from the negatives, we say that H shatters these N points. The VC dimension, the largest such N, thus measures the learning capacity of the hypothesis class.
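Shattering can be checked by brute force. The sketch below uses a simpler hypothesis class than the rectangles above: closed intervals [a, b] on the real line, which shatter any 2 distinct points but no set of 3 (their VC dimension is 2):

```python
from itertools import product

def intervals_shatter(points):
    """Check whether the class of closed intervals [a, b] on the line
    shatters the given points: every one of the 2^N labelings must be
    realized by some interval (points inside the interval = positive).
    It suffices to test the tightest interval around the positives."""
    for labels in product([0, 1], repeat=len(points)):
        pos = [p for p, r in zip(points, labels) if r == 1]
        if not pos:
            continue  # an interval far from all points realizes all-negative
        a, b = min(pos), max(pos)
        realized = tuple(1 if a <= p <= b else 0 for p in points)
        if realized != labels:
            return False
    return True

print(intervals_shatter([1.0, 2.0]))       # True  -> VC dimension >= 2
print(intervals_shatter([1.0, 2.0, 3.0]))  # False -> 3 points not shattered
```

The labeling that defeats 3 points is (+, -, +): any interval containing the two outer points must also contain the middle one.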

4. Determining the sample size of a learning algorithm: PAC learning

Probably approximately correct (PAC) learning is used, for a specific hypothesis class, to determine the minimum sample size needed to guarantee, with a given confidence, that the learned result is good enough. In other words: if we want a hypothesis whose error is at most ε with probability at least 1 − δ, how large must the training set be? Given the desired confidence and the hypothesis class, we can compute this minimum PAC sample size.
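For the axis-aligned rectangle class, the PAC bound given in Alpaydin's book (if I recall it correctly) is N >= (4/ε) ln(4/δ); treating that bound as given, the minimum sample size can be computed as:

```python
import math

def rectangle_pac_sample_size(eps, delta):
    """PAC sample-size bound for axis-aligned rectangles:
    with N >= (4 / eps) * ln(4 / delta) samples, the tightest rectangle S
    has error at most eps with probability at least 1 - delta."""
    return math.ceil((4 / eps) * math.log(4 / delta))

# Error at most 10% with 95% confidence:
print(rectangle_pac_sample_size(0.1, 0.05))
```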

All right, that's all for today's basic concepts, and continue tomorrow!

Refer:

Ethem Alpaydin, Introduction to Machine Learning, China Machine Press
