This article introduces the decision tree (Decision Tree): what it is, how the ID3 and C4.5 algorithms choose split nodes, and how pruning controls over-fitting.
A decision tree (Decision Tree) is a decision-analysis method: given the known probabilities of various outcomes, it builds a tree to obtain the probability that the expected net present value is greater than or equal to zero, in order to evaluate project risk and judge feasibility. It is a graphical method that applies probability analysis intuitively; because the decision branches in the diagram resemble the branches of a tree, it is called a decision tree. In machine learning, a decision tree is a predictive model that represents a mapping from object attributes to object values, and it is a form of supervised learning.
I. Decision Tree Model
First of all, what is a decision tree? A decision tree is a tree structure similar to a flowchart: each internal node (branch node) represents a test on a feature or attribute, and each leaf node represents a class.
The main problem when growing a decision tree is that the choice of branch nodes is subjective. The solution is to replace subjective judgment with information entropy or information gain: we compute the entropy or gain of each candidate feature and rank the candidate splits accordingly.
The meaning of information gain: the change in information (entropy) before and after splitting the data set on a feature.
Entropy: in physics, a measure of how uniformly energy is distributed in a system. Information entropy: a measure of the uncertainty of information, H(X) = -Σ p(x) log p(x). The smaller the information entropy, the smaller the uncertainty, the greater the certainty, and the higher the purity of the information. H(D) is the entropy of dataset D:
H(D) = -Σ_k (|C_k| / N) log2(|C_k| / N), summed over the K classes.
Here |C_k| is the number of samples of class k in dataset D, N is the total number of samples, and K is the number of classes. H(D|A) is the conditional entropy of dataset D given feature A, i.e. the distribution of the labels Y within the subsets D_i induced by the values of A. It is calculated as:
H(D|A) = Σ_i (|D_i| / N) H(D_i), summed over the n subsets D_i.
Gain(A) (the information gain of A) = H(D) (total information entropy) - H(D|A) (entropy after splitting on node A). Branch-node selection in the decision tree: the larger the information gain, the smaller the remaining entropy, so the smaller the uncertainty, the greater the certainty, and the higher the purity. Putting the formulas together, the information gain is:
g(D, A) = H(D) - H(D|A)
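To make the formulas concrete, here is a minimal Python sketch of entropy, conditional entropy, and information gain; the sample labels and feature values are illustrative, not from the article.

from collections import Counter
from math import log2

def entropy(labels):
    # H(D) = -sum_k (|C_k|/N) * log2(|C_k|/N)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    # H(D|A) = sum_i (|D_i|/N) * H(D_i), splitting D by the values of A
    n = len(labels)
    subsets = {}
    for v, y in zip(feature_values, labels):
        subsets.setdefault(v, []).append(y)
    return sum((len(sub) / n) * entropy(sub) for sub in subsets.values())

def information_gain(feature_values, labels):
    # g(D, A) = H(D) - H(D|A)
    return entropy(labels) - conditional_entropy(feature_values, labels)

# Example: this feature separates the classes perfectly, so the gain equals H(D).
labels = ["yes", "yes", "no", "no"]
feature = ["a", "a", "b", "b"]
print(information_gain(feature, labels))  # 1.0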
The information gain ratio g_R(D, A) of feature A with respect to training set D is defined as:
g_R(D, A) = g(D, A) / H_A(D), where H_A(D) = -Σ_i (|D_i| / N) log2(|D_i| / N).
H_A(D) describes how finely feature A partitions the training set D. The gain ratio is an improvement because raw information gain is biased toward features with many values; C4.5 therefore uses the gain ratio to divide the decision tree further.
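A sketch of the gain ratio, reusing the entropy and information_gain helpers from the previous snippet; the example feature is deliberately many-valued to show the penalty.

from collections import Counter
from math import log2

def intrinsic_value(feature_values):
    # H_A(D) = -sum_i (|D_i|/N) * log2(|D_i|/N): the entropy of the split itself
    n = len(feature_values)
    return -sum((c / n) * log2(c / n) for c in Counter(feature_values).values())

def gain_ratio(feature_values, labels):
    # g_R(D, A) = g(D, A) / H_A(D)
    iv = intrinsic_value(feature_values)
    return information_gain(feature_values, labels) / iv if iv > 0 else 0.0

# A feature with one unique value per sample has a large H_A(D), so its
# gain ratio is penalized relative to its raw information gain.
feature = ["a", "b", "c", "d"]
labels = ["yes", "yes", "no", "no"]
print(information_gain(feature, labels))  # 1.0
print(gain_ratio(feature, labels))        # 0.5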
The decision algorithms above: the ID3 algorithm uses information gain; the C4.5 algorithm uses the information gain ratio. Decision-tree pruning strategies: pre-pruning and post-pruning, both used to address the over-fitting problem.
II. ID3 and C4.5 Partitioning Strategy
The partitioning idea of the ID3 and C4.5 algorithms: select the branch nodes of the decision tree by information gain (ID3) or information gain ratio (C4.5), and build the tree recursively.
The basic steps for building a decision tree:
(1) If all attributes have already been used for splitting, stop.
(2) Compute the information gain (or gain ratio) of all features, and choose the feature with the largest value (call it node a) as the split.
(3) If splitting on node a does not complete the tree, build the subtrees using the information gain of the remaining features, excluding node a (recursively building the decision tree; a sketch follows the stopping conditions below).
The decision tree stops growing when any of the following conditions holds:
All attributes have been used for splitting; if unsplit nodes remain, classify them by majority vote.
All samples at the node already belong to a single class.
A predefined maximum impurity threshold is met.
A predefined maximum number of leaf nodes is reached.
A predefined minimum number of samples per branch node is reached.
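The sketch below puts the steps and the first two stopping conditions together in a compact ID3-style recursion; the data layout (rows as feature dicts with a parallel label list) and the sample data are illustrative assumptions, and information_gain is the helper defined earlier.

from collections import Counter

def majority(labels):
    # Majority vote for nodes that can no longer be split
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, features):
    # Stop: all samples share one class, or no features remain (majority vote)
    if len(set(labels)) == 1:
        return labels[0]
    if not features:
        return majority(labels)
    # Step: pick the feature with the largest information gain
    best = max(features, key=lambda f: information_gain([r[f] for r in rows], labels))
    tree = {best: {}}
    remaining = [f for f in features if f != best]
    # Recurse on each subset D_i induced by the values of the chosen feature
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx], remaining)
    return tree

rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}]
print(build_tree(rows, ["no", "no", "yes"], ["outlook"]))
# e.g. {'outlook': {'sunny': 'no', 'rain': 'yes'}}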
III. Decision Tree Pruning
A fully grown decision tree considers every data point and may already be over-fitted: the more complex the tree, the higher the degree of over-fitting. Building a decision tree is a recursive, layer-by-layer process, so stopping conditions must be set; otherwise the process never terminates and the tree keeps growing.
Pre-pruning: end the growth of the decision tree early. Pre-pruning reduces the risk of over-fitting and cuts both the training time and the testing time of the decision tree, but it brings a risk of under-fitting.
Post-pruning: pruning after the decision tree has fully grown. Minimum error pruning (MEP), pessimistic error pruning (PEP), and cost-complexity pruning (CCP) often generalize better than pre-pruned decision trees, but their training time cost is much higher than that of unpruned or pre-pruned trees.
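As a concrete illustration of CCP, scikit-learn exposes cost-complexity pruning through the ccp_alpha parameter; a minimal sketch, assuming scikit-learn is installed (the dataset and split are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the full tree, then inspect the effective alphas along the pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit with each alpha; larger alphas prune more aggressively
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={clf.get_n_leaves()}  "
          f"test acc={clf.score(X_test, y_test):.3f}")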
Summary:
The advantage of classifying with a decision tree is that it is very intuitive, easy to understand, and efficient to execute: the tree is built once and can be reused repeatedly. However, it is most effective on small data sets; it handles continuous variables poorly, continuous fields are difficult to predict, and errors grow faster as the number of classes increases.
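To illustrate the "build once, reuse repeatedly" point, a small scikit-learn sketch (the dataset and depth limit are illustrative assumptions) prints the learned tree as plain if/else rules and reuses the fitted model for prediction:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# The learned tree reads as plain if/else rules, which is why it is so intuitive
print(export_text(clf, feature_names=list(data.feature_names)))
print(clf.predict(data.data[:2]))  # the fitted model can be reused repeatedly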