How to understand the basic principles of python decision tree

This article shares how to understand the basic principles of Python decision trees. The editor finds the content quite practical, so it is shared here for everyone to learn from; I hope you gain something after reading it. Without further ado, let's take a look.

A decision tree is a nonparametric supervised learning method, used mainly for classification and regression problems.

The decision tree model divides the feature space into finitely many disjoint sub-regions through a set of if-then decision rules. Samples falling in the same sub-region are given the same prediction value by the decision tree model.

The hierarchical relationship among these if-then decision rules forms a tree structure, called a decision tree. The disjoint sub-regions correspond one-to-one to the leaf nodes of the tree.

I. Overview of Decision Tree Principles

1. Hypothesis space

The basic principles of the decision tree algorithm can be described from three aspects: the hypothesis space, the objective function, and the optimization algorithm.

The hypothesis space encodes our prior assumptions about the form of the model; the model we ultimately obtain must conform to these prior assumptions.

The prior form of the decision tree model can be expressed, in a commonly used notation, as

f(x) = w_{q(x)}

where q(x) is a function that maps the feature space to the leaf-node numbering space, and w_t denotes the prediction value attached to leaf node t. The key to the decision tree model is that it divides the feature space into disjoint sub-regions, and samples falling in the same sub-region share the same prediction value.

In order to determine the complete structure of a decision tree, two aspects must be made clear: how the sub-regions are divided, and what prediction value each sub-region takes.
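As a toy illustration of this form (the feature names, thresholds, and prediction values below are made up; each branch that returns corresponds to one leaf node, i.e. one sub-region with its own prediction value):

def predict(x1, x2):
    # Two levels of if-then rules split the (x1, x2) feature space into three sub-regions.
    if x1 <= 2.5:
        return 0.3      # leaf 1: region {x1 <= 2.5}
    if x2 <= 1.0:
        return 0.8      # leaf 2: region {x1 > 2.5, x2 <= 1.0}
    return 0.1          # leaf 3: region {x1 > 2.5, x2 > 1.0}

print(predict(1.0, 5.0), predict(3.0, 0.5))  # 0.3 0.8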

2. Objective function

The objective function specifies the criterion by which we evaluate a model; it determines our preference among the models in the hypothesis space.

The objective function of a decision tree evaluates how good a given decision tree is. It should include two parts: a loss term reflecting how accurately the decision tree fits the sample data, and a regularization term reflecting the complexity of the decision tree model.

The regularization term can be taken as the number of leaf nodes of the model; that is, the more disjoint sub-regions the decision tree divides the feature space into, the more complex the model is considered to be.

For the loss term, a regression problem can use the squared error, and a classification problem can use impurity as the measure.
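One concrete way to write such an objective, in the spirit of CART's cost-complexity criterion (here |T| is the number of leaf nodes, N_t the number of samples on leaf t, H_t(T) the impurity of leaf t, and \alpha \ge 0 a hyperparameter trading fit against complexity), is

C_\alpha(T) = \sum_{t=1}^{|T|} N_t \, H_t(T) + \alpha \, |T|

For a regression tree, the first term can instead be the squared error \sum_{t} \sum_{x_i \in R_t} (y_i - w_t)^2, where R_t is the sub-region of leaf t and w_t its prediction value.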

Why impurity? Since all samples on the same leaf node of a decision tree receive the same prediction value, if the true labels of these samples take only one value, then the samples on that leaf node are very "pure": we can simply set the prediction value to that label value, and the prediction error is 0. On the other hand, if the label values of the samples on a leaf node are very mixed (as the saying goes, it is hard to please everyone), then no matter what prediction value we assign to that leaf node, there will always be a large prediction error.

So how do we measure impurity? There are generally three methods: information entropy, Gini impurity, and the classification error rate. The classification error rate is the error rate obtained when the most frequent label value on a leaf node is used as that node's prediction value. Information entropy and Gini impurity are discussed later.
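A minimal sketch of these three measures, computed from the empirical label frequencies on a leaf node (plain NumPy; the function name is ours, not from any particular library):

import numpy as np

def impurity_measures(labels):
    # Empirical class frequencies on a leaf node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    entropy = -np.sum(p * np.log2(p))   # information entropy
    gini = 1.0 - np.sum(p ** 2)         # Gini impurity
    error = 1.0 - p.max()               # classification error rate
    return entropy, gini, error

print(impurity_measures([1, 1, 1, 1]))  # a pure leaf: all three measures are 0
print(impurity_measures([1, 1, 0, 0]))  # a mixed leaf: (1.0, 0.5, 0.5)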

3. Optimization algorithm

The optimization algorithm refers to how we adjust the model structure or the model hyperparameter values so that the value of the objective function keeps decreasing.

The optimization algorithm determines what steps we use to find the right model in the hypothesis space.

For decision trees, the optimization algorithm consists of a tree generation strategy and a tree pruning strategy.

The tree generation strategy generally adopts a greedy idea, repeatedly selecting features with which to split the feature space.

Tree pruning strategies are generally divided into pre-pruning and post-pruning. Generally speaking, post-pruning produces decision trees that perform better, but its computational cost is also higher.
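An illustrative sketch with scikit-learn, whose tree module implements a CART-style decision tree (the dataset and parameter values here are arbitrary): pre-pruning corresponds to growth-limiting parameters such as max_depth and min_samples_leaf, while post-pruning corresponds to cost-complexity pruning via ccp_alpha.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: limit growth while the tree is being built.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow a full tree, then cut it back by cost-complexity pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
post_pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-2]).fit(X, y)

print(pre_pruned.get_n_leaves(), post_pruned.get_n_leaves())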

II. Comparison of ID3, C4.5, and CART Decision Trees

1. Different scope of application

The ID3 algorithm can only handle classification problems with discrete features; C4.5 can handle classification problems with both discrete and continuous features; and the CART algorithm can handle both classification and regression problems with discrete and continuous features.
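For a quick sketch of CART's wider scope, scikit-learn's tree module (which implements an optimized CART) provides both a classifier and a regressor that accept continuous features (the synthetic data below is ours):

import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # continuous features
y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)          # classification target
y_reg = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)    # regression target

clf = DecisionTreeClassifier(max_depth=3).fit(X, y_cls)
reg = DecisionTreeRegressor(max_depth=3).fit(X, y_reg)
print(clf.score(X, y_cls), reg.score(X, y_reg))      # accuracy and R^2 on the training data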

2. Different hypothesis spaces

The decision trees used by the ID3 and C4.5 algorithms can be multi-way trees, while the decision tree of the CART algorithm must be a binary tree.

3. Different objective functions

When dealing with classification problems, the three models use different criteria for deciding which feature to split on: the ID3 algorithm uses information gain, the C4.5 algorithm uses the information gain ratio, and the CART algorithm uses the Gini impurity gain.

4. Different optimization algorithms

The three algorithms have different pruning strategies.

The ID3 algorithm actually has no pruning strategy; the decision tree simply stops growing when all samples on a leaf node belong to the same category or when all features have been used.

The C4.5 algorithm uses a pre-pruning strategy: the decision tree stops growing when the gain from a split falls below a given threshold, when the number of samples on a leaf falls below a threshold, when the number of leaf nodes reaches a limit, or when the depth of the tree reaches a limit.

The CART decision tree mainly uses a post-pruning strategy.

5. Differences in effectiveness

ID3 is the earliest of the three decision trees; C4.5 is an improvement on it, and CART came later. In general, CART performs better than C4.5, and C4.5 performs better than ID3.

III. Entropy, Conditional Entropy, Information Gain, Information Gain Ratio

1. Entropy

Entropy is a measure of the uncertainty of a discrete random variable. Since it reflects uncertainty, our prior expectations are that the entropy should be 0 when the random variable can take only one value, and that the entropy should be larger when the random variable has more possible values and the probability distribution over those values is more even. The standard formula for entropy, H(Y) = -\sum_k p_k \log p_k, satisfies these prior properties. Note that entropy can only measure the uncertainty of discrete random variables.

In the application scenario of decision trees, we actually use the empirical entropy to measure the "purity" of the label value distribution; that is, the calculation uses the frequency distribution in place of the probability distribution.

2. Conditional entropy

Conditional entropy measures the uncertainty of a random variable Y given the value of a random variable X.

In the application scenario of decision trees, the meaning of conditional entropy is even clearer: the sample space is divided into multiple leaf nodes according to the values of a discrete feature X, and the conditional entropy is the weighted average of the entropy (impurity) of the label Y over the samples on each leaf node.
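A minimal sketch of this weighted average (plain NumPy; the function names and the toy feature/label arrays are ours):

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def conditional_entropy(feature, labels):
    # H(Y|X): weight each group's label entropy by the fraction of samples in the group.
    feature, labels = np.asarray(feature), np.asarray(labels)
    return sum(
        np.mean(feature == v) * entropy(labels[feature == v])
        for v in np.unique(feature)
    )

X = np.array(["sunny", "sunny", "rain", "rain", "rain"])
Y = np.array([0, 0, 1, 1, 0])
print(entropy(Y), conditional_entropy(X, Y))  # roughly 0.971 and 0.551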

3. Information gain

The information gain of a random variable X with respect to a random variable Y is defined as the difference between the entropy of Y and the conditional entropy of Y given X, that is, H(Y) - H(Y | X).

In the application scenario of decision trees, information gain measures how much the feature X contributes to reducing the uncertainty of the sample label Y.

Information gain is also called mutual information. Mutual information is symmetric: the mutual information of X with respect to Y equals the mutual information of Y with respect to X. Mutual information is a commonly used measure of the correlation between two discrete random variables.

The simple proof is as follows:
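Sketching the standard argument, with p(x, y) the joint distribution of X and Y and p(x), p(y) the marginals:

I(X; Y) = H(Y) - H(Y \mid X)
        = -\sum_{y} p(y) \log p(y) + \sum_{x, y} p(x, y) \log p(y \mid x)
        = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}

The last expression is symmetric in X and Y, so it also equals H(X) - H(X \mid Y); that is, the mutual information of X for Y and of Y for X are the same.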

4. Information gain ratio

The ID3 model uses information gain as the criterion for selecting the feature to split on, but information gain tends to favor features with many possible values. C4.5 avoids this tendency by using the information gain ratio as the criterion for selecting the feature to split on. It is worth noting that C4.5 still uses information gain when selecting the split point of a continuous feature.

The information gain ratio of X to Y is the ratio of the information gain of X to Y to the entropy of X.
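In symbols, writing g(Y, X) = H(Y) - H(Y \mid X) for the information gain (the notation is introduced here for convenience):

g_R(Y, X) = \frac{g(Y, X)}{H(X)} = \frac{H(Y) - H(Y \mid X)}{H(X)}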

IV. Gini Impurity and Gini Impurity Gain

1. Gini impurity

Gini impurity plays a role similar to entropy: it measures the uncertainty, or "impurity", of a random variable's value. It satisfies our prior expectations: the Gini impurity is 0 when the random variable has only one possible value, and it is larger when the random variable has more possible values and the probability distribution over those values is more even.

Gini impurity is defined as follows.
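With p_k denoting the probability (in practice, the frequency) of the k-th class value:

\mathrm{Gini}(Y) = \sum_{k} p_k (1 - p_k) = 1 - \sum_{k} p_k^2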

Gini impurity satisfies our prior assumptions about an uncertainty measure. In fact, Gini impurity is closely related to entropy: taking a first-order Taylor expansion of the logarithm in the entropy formula yields the defining formula of Gini impurity.
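Sketching this link: expanding the logarithm around p = 1 gives \log p \approx p - 1, so

H(Y) = -\sum_{k} p_k \log p_k \approx -\sum_{k} p_k (p_k - 1) = \sum_{k} p_k (1 - p_k) = \mathrm{Gini}(Y)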

2. Gini impurity gain

Gini impurity gain plays much the same role as information gain, and its calculation is very similar: simply replace entropy with Gini impurity when computing the gain from a split.

It is worth noting that CART decision trees are binary trees. When computing the Gini impurity gain of a discrete feature, CART tries splitting the samples into two parts according to whether the feature takes a particular value; when computing the Gini impurity gain of a continuous feature, it tries candidate split points that divide the samples into two parts.
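A minimal sketch of choosing the best binary split point of a continuous feature by Gini impurity gain (plain NumPy; the function names and toy data are ours):

import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    # Candidate split points are the midpoints between consecutive sorted feature values;
    # keep the one whose binary split yields the largest Gini impurity gain.
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    values = np.unique(x)
    base = gini(y)
    best_gain, best_point = 0.0, None
    for point in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= point], y[x > point]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        gain = base - weighted
        if gain > best_gain:
            best_gain, best_point = gain, point
    return best_point, best_gain

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))  # the split at 6.5 separates the two classes perfectly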

The above is how to understand the basic principles of Python decision trees. The editor believes that some of these knowledge points may be seen or used in daily work, and hopes you can learn more from this article.
