2025-01-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces the basic terms of machine learning. Many readers find these terms confusing in day-to-day work, so the editor has consulted a range of materials and compiled short, practical explanations. Hopefully they answer the question "what are the basic terms of machine learning?" Read on to work through them.
I. Dataset
Machine learning starts with data. A dataset is a collection of records, each describing one object under study.
If the collection of all alarm messages is the dataset, each individual alarm message is a sample.
II. Sample
A sample is also called an instance, and a set of samples forms a dataset.
III. Attributes
Each sample has several attributes (or features), such as an alarm message's trigger, grouping and classification. The value that an attribute takes is called its attribute value.
IV. Attribute space
Attribute space is also called sample space, or input space.
If an alarm message has four attributes (trigger, grouping, classification and alarm level) and each attribute is given its own axis, the four attributes span a four-dimensional space in which every alarm message has its own coordinates.
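V. Feature vector
The coordinate vector of a sample in the attribute space. Because every point in that space corresponds to one coordinate vector, a sample is also called a feature vector.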
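The attribute space and feature vector described above can be sketched in a few lines of Python. This is only an illustration: the attribute values (`cpu_high`, `web`, and so on) and the helper names `build_axes` and `to_feature_vector` are invented here and do not come from any real alarm system.

```python
# Hypothetical alarm messages; the attribute values are invented for illustration.
ATTRIBUTES = ("trigger", "grouping", "classification", "alarm_level")

alarms = [
    {"trigger": "cpu_high", "grouping": "web", "classification": "performance", "alarm_level": "warning"},
    {"trigger": "disk_full", "grouping": "db", "classification": "capacity", "alarm_level": "critical"},
    {"trigger": "cpu_high", "grouping": "db", "classification": "performance", "alarm_level": "critical"},
]

def build_axes(samples):
    """Give each attribute one axis: the sorted list of values observed for it."""
    return {a: sorted({s[a] for s in samples}) for a in ATTRIBUTES}

def to_feature_vector(sample, axes):
    """Return the sample's coordinates: the position of each attribute value on its axis."""
    return tuple(axes[a].index(sample[a]) for a in ATTRIBUTES)

axes = build_axes(alarms)
vectors = [to_feature_vector(s, axes) for s in alarms]
print(vectors)  # one 4-dimensional coordinate tuple per alarm message
```

Encoding categories as integer positions is just one simple choice; it imposes an arbitrary order on the values, which some learning algorithms may misread as meaningful.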
VI. Input space
The set of all possible feature vectors, i.e. the attribute (sample) space itself
VII. Notation
$D = \{\vec{x}_1, \vec{x}_2, \cdots, \vec{x}_m\}$: a dataset containing $m$ samples
$\vec{x}_i = (x_{i1}; x_{i2}; \cdots; x_{id})$: a vector in the $d$-dimensional sample space $\mathcal{X}$, with $\vec{x}_i \in \mathcal{X}$
$x_{ij}$: the value of $\vec{x}_i$ on the $j$-th attribute (later text may drop the arrow and write $x_i$ for $\vec{x}_i$)
$d$: the dimensionality of $\vec{x}_i$
VIII. Learning terms
Learning / training: the process of learning a model from data
Training data: the data used during training
Training sample: each sample in the training data
Hypothesis: the learned model, which corresponds to some underlying regularity in the data
Ground truth: the underlying regularity itself
Learner: another name for the model
Prediction: inferring the "result" of a sample; building a predictive model requires "result" information for the training samples
Label: the "result" information attached to a sample
Example: a sample together with its label
$(x_i, y_i)$: the $i$-th example, where $y_i \in \mathcal{Y}$ is the label of sample $x_i$ and $\mathcal{Y}$ is the set of all labels
Label space / output space: the set of all labels
Supervised learning
Classification: a learning task whose predictions are discrete values
Regression: a learning task whose predictions are continuous values
Binary classification: a classification task involving only two classes
Positive class and negative class: the two classes in binary classification
Multi-class classification: a classification task involving more than two classes
Prediction task: learn from the training set $\{(\vec{x}_1, y_1), (\vec{x}_2, y_2), \cdots, (\vec{x}_m, y_m)\}$ a mapping $f: \mathcal{X} \to \mathcal{Y}$ from the input space $\mathcal{X}$ to the output space $\mathcal{Y}$. For binary classification, usually $\mathcal{Y} = \{-1, +1\}$ or $\{0, 1\}$; for multi-class classification, $|\mathcal{Y}| > 2$; for regression, $\mathcal{Y} = \mathbb{R}$, where $\mathbb{R}$ is the set of real numbers.
Testing: the process of using the learned model to make predictions
Test sample: a sample to be predicted. For example, once $f$ has been learned, the predicted label of a test sample $\vec{x}$ is $y = f(\vec{x})$.
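To make the supervised learning terms concrete, here is a minimal sketch of training, the learned mapping f, and testing. It uses a 1-nearest-neighbour rule purely because it is short; that choice, the data points and the function names are assumptions for illustration and are not taken from the article.

```python
# Training set {(x_1, y_1), ..., (x_m, y_m)} with labels in Y = {-1, +1}; the data is invented.
training_set = [
    ((1.0, 1.0), -1),
    ((1.5, 2.0), -1),
    ((5.0, 5.0), +1),
    ((6.0, 5.5), +1),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train(samples):
    """'Learn' a 1-nearest-neighbour model: memorise the examples and return the mapping f."""
    def f(x):
        # Predict the label of the training example closest to x.
        _, nearest_label = min(samples, key=lambda s: squared_distance(s[0], x))
        return nearest_label
    return f

f = train(training_set)          # training: learn f from labeled examples
test_sample = (5.5, 4.5)         # a sample whose label is not known to the model
prediction = f(test_sample)      # testing: y = f(x)
print(prediction)
```

Here the label space is {-1, +1}, so this is a binary classification task; replacing the labels with real numbers and the prediction rule with an average of nearby labels would turn the same skeleton into a regression task.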
Unsupervised learning
Clustering: dividing the samples in the training set into several groups without using labels
Cluster: each group produced by clustering. A cluster may correspond to an underlying concept that is not known in advance.
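As a rough illustration of clustering, the sketch below groups unlabeled one-dimensional measurements with a tiny k-means loop. k-means is only one of many clustering methods and is used here as an assumed example; the data values and the initial centres are made up.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny k-means on 1-D data; `centers` gives the initial cluster centres."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centre.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its cluster (unchanged if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
    return clusters

measurements = [1.0, 1.2, 0.9, 4.8, 5.1, 5.0]  # hypothetical, unlabeled samples
clusters = k_means_1d(measurements, centers=[0.0, 6.0])
print(clusters)
```

No labels are supplied anywhere: the two groups emerge from the data alone, and it is up to the user to decide what concept, if any, each cluster represents.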
Advanced level
Generalization: the ability of a learned model to perform well on new samples
Distribution $\mathcal{D}$: all samples in the sample space are usually assumed to follow an unknown distribution $\mathcal{D}$
Independent and identically distributed (i.i.d.): each sample is drawn independently from this distribution
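The following sketch ties generalization to the i.i.d. assumption: training samples and new samples are drawn independently from the same distribution, and the model is judged on the new ones. The distribution, the threshold learner and all names are assumptions chosen to keep the example small.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def draw_sample():
    """Draw one labeled sample from a made-up distribution D: x uniform on [0, 10], y = 1 iff x > 5."""
    x = rng.uniform(0.0, 10.0)
    return x, (1 if x > 5.0 else 0)

def learn_threshold(samples):
    """Place the decision threshold midway between the largest negative and the smallest positive x."""
    negatives = [x for x, y in samples if y == 0]
    positives = [x for x, y in samples if y == 1]
    if not negatives or not positives:
        raise ValueError("training set must contain both classes")
    return (max(negatives) + min(positives)) / 2.0

training_set = [draw_sample() for _ in range(50)]   # i.i.d. draws from D
new_samples = [draw_sample() for _ in range(200)]   # further i.i.d. draws, unseen during training

threshold = learn_threshold(training_set)
correct = sum((1 if x > threshold else 0) == y for x, y in new_samples)
accuracy = correct / len(new_samples)
print(f"threshold={threshold:.3f}, accuracy on new samples={accuracy:.3f}")
```

Accuracy on `new_samples`, not on `training_set`, is what reflects generalization; if the new samples came from a different distribution, the measurement would say little about the learned threshold.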
Hypothesis space
Two basic means of scientific reasoning: induction and deduction
Induction: the process of generalization from the specific to the general, i.e. deriving general rules from specific facts
Deduction: the process of specialization from the general to the specific, i.e. deriving specific conclusions from basic principles
Inductive learning: learning from examples, which is an inductive process
Inductive learning in the broad sense: roughly equivalent to learning from examples
Inductive learning in the narrow sense: learning concepts from training data, hence also called "concept learning" or "concept formation"
Boolean concept learning: learning a target concept that is either "yes" or "no", which can be represented as a Boolean value of 0 or 1
Learning process: a search through the space of all hypotheses, whose goal is to find a hypothesis that fits (matches) the training set
Hypothesis space: the set of all candidate hypotheses that the learner can choose from
Version space: the set of hypotheses that are consistent with the training set
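The relationship between hypothesis space and version space can be sketched by brute force for a small Boolean concept. Everything concrete below is an assumption for illustration: the two attributes, their values, and the restriction to conjunctive hypotheses in which `"*"` means "any value" (the special hypothesis "no positive sample exists" is left out for brevity).

```python
from itertools import product

# Hypothetical attributes and values for a Boolean concept ("is this alarm serious?").
VALUES = {
    "level": ("warning", "critical"),
    "group": ("web", "db"),
}
ATTRS = tuple(VALUES)

# Hypothesis space: every conjunction of attribute constraints, where "*" matches any value.
hypothesis_space = list(product(*[VALUES[a] + ("*",) for a in ATTRS]))

def predict(h, x):
    """A hypothesis labels x positive when every non-wildcard constraint matches."""
    return all(c == "*" or c == x[a] for c, a in zip(h, ATTRS))

training_set = [
    ({"level": "critical", "group": "db"}, True),
    ({"level": "warning", "group": "web"}, False),
]

# Version space: the hypotheses that agree with every training example.
version_space = [
    h for h in hypothesis_space
    if all(predict(h, x) == y for x, y in training_set)
]
print(len(hypothesis_space), version_space)
```

Several hypotheses survive the filter, and they disagree on unseen samples such as a critical alarm from the web group, which is exactly the situation that calls for a preference among consistent hypotheses.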
Inductive bias
The version space may contain several hypotheses that are all consistent with the training set yet give different outputs for a new sample, so which model (hypothesis) should be used?
Inductive bias: the preference a learning algorithm shows for certain types of hypotheses during learning
Without an inductive bias, the learned model would predict inconsistently: for the same new sample it would sometimes say "good" and sometimes say "bad".
Occam's razor: if several hypotheses are consistent with the observations, choose the simplest one
No Free Lunch (NFL) theorem: no matter how clever learning algorithm $a$ is and how clumsy learning algorithm $b$ is, their expected performance is the same when averaged over all possible problems
Premise of the NFL theorem: all problems are equally likely to occur, or equivalently all problems are equally important
The most important implication of the NFL theorem: it is meaningless to ask "which learning algorithm is better" in the abstract, because across all potential problems every learning algorithm is equally good
Noise: data that should not occur, such as samples with identical attribute values but different classifications
The main subject of machine learning research: algorithms that produce a model from data on a computer, i.e. learning algorithms
Computer science studies "algorithms"; machine learning studies "learning algorithms".
In most cases, whether an algorithm's inductive bias matches the problem at hand directly determines whether the algorithm performs well.
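The No Free Lunch statement above can be checked by brute force on a toy problem. The sketch below enumerates every possible Boolean target function on a four-point domain, weights them equally (the NFL premise), and compares two learners on the points they did not see during training. The domain, the split and both learners are invented for illustration.

```python
from itertools import product

DOMAIN = (0, 1, 2, 3)
TRAIN_POINTS = (0, 1)   # points whose labels the learner sees
TEST_POINTS = (2, 3)    # unseen points used to measure off-training-set error

def learner_a(train):
    """Predict the majority training label for every unseen point (ties go to 1)."""
    ones = sum(y for _, y in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess

def learner_b(train):
    """A deliberately 'clumsy' learner: always predict the opposite of learner_a."""
    f = learner_a(train)
    return lambda x: 1 - f(x)

def mean_off_training_error(learner):
    """Average error on unseen points over every possible target function, each weighted equally."""
    errors = []
    for labels in product((0, 1), repeat=len(DOMAIN)):
        target = dict(zip(DOMAIN, labels))
        train = [(x, target[x]) for x in TRAIN_POINTS]
        h = learner(train)
        wrong = sum(h(x) != target[x] for x in TEST_POINTS)
        errors.append(wrong / len(TEST_POINTS))
    return sum(errors) / len(errors)

error_a = mean_off_training_error(learner_a)
error_b = mean_off_training_error(learner_b)
print(error_a, error_b)
```

The two averages coincide only because all 16 target functions count equally. In practice some targets are far more plausible than others, which is why an inductive bias that matches the problem still matters.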
That concludes this overview of the basic terms of machine learning. Hopefully it resolves your questions. Combining theory with practice helps the terms stick, so go and try them out. To learn more on related topics, keep following the website, where the editor will continue to publish practical articles.
© 2024 shulou.com SLNews company. All rights reserved.