
What are the parameters of sklearn random forest


This article mainly introduces the parameters of the sklearn random forest. Many people have questions about these parameters in daily use, so we have consulted various sources and organized them into a simple, easy-to-follow reference. We hope it answers your doubts about the parameters of the sklearn random forest. Let's learn together!

random forest

A random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and averages their predictions to improve accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size, but if bootstrap=True (the default), the samples are drawn with replacement.

Let's look at the parameters of this class:

class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

Code example:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=2, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=0, verbose=0, warm_start=False)
print(clf.feature_importances_)
[ 0.17287856  0.80608704  0.01884792  0.00218648]
print(clf.predict([[0, 0, 0, 0]]))
[1]

The specific parameters are as follows:

Parameters:

n_estimators : integer, optional (default=10)

The number of (decision) trees in the forest.

criterion : string, optional (default="gini")

The function to measure the quality of a split. Supported criteria are "gini" for Gini impurity and "entropy" for information gain.

Note: This parameter is tree specific.

max_features : int, float, string or None, optional (default="auto")

The number of features to consider when looking for the best split:

- If int, consider max_features features at each split.
- If float, max_features is a percentage, and int(max_features * n_features) features are considered at each split.
- If "auto", then max_features = sqrt(n_features).
- If "sqrt", then max_features = sqrt(n_features) (same as "auto").
- If "log2", then max_features = log2(n_features).
- If None, then max_features = n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
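A minimal sketch of how these settings change the number of features inspected per split (toy data and values chosen for illustration, not taken from the original article):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=16, random_state=0)

# Each value limits how many of the 16 features are examined per split:
# 'sqrt' -> 4, 'log2' -> 4, None -> all 16, 4 -> 4, 0.5 -> 8.
for mf in ['sqrt', 'log2', None, 4, 0.5]:
    clf = RandomForestClassifier(max_features=mf, random_state=0).fit(X, y)
    print(mf, clf.score(X, y))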

max_depth : integer or None, optional (default = None)

The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_split : int, float, optional (default=2)

The minimum number of samples required to split an internal node:

- If int, consider min_samples_split as the minimum number.
- If float, min_samples_split is a percentage, and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

Changed in version 0.18: Added floating point values for percentages.

min_samples_leaf : int, float, optional (default=1)

The minimum number of samples required to be at a leaf node:

- If int, consider min_samples_leaf as the minimum number.
- If float, min_samples_leaf is a percentage, and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

Changed in version 0.18: Added floating point values for percentages.
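To see the int/float distinction for min_samples_split and min_samples_leaf in action, here is a small sketch (toy data; get_n_leaves requires a reasonably recent scikit-learn):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# An int is an absolute count; a float is relative to n_samples:
# min_samples_leaf=0.01 -> ceil(0.01 * 1000) = 10 samples per leaf.
clf = RandomForestClassifier(min_samples_split=20, min_samples_leaf=0.01,
                             random_state=0).fit(X, y)
print(clf.estimators_[0].get_n_leaves())  # far fewer leaves than the default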

min_weight_fraction_leaf : float, optional (default=0.)

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_leaf_nodes : int or None, optional (default=None)

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as those giving the largest relative reduction in impurity. If None, the number of leaf nodes is unlimited.

min_impurity_split : float

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold; otherwise it remains a leaf.

min_impurity_decrease : float, optional (default=0.)

A node will be split if the split induces a decrease of the impurity greater than or equal to this value.

bootstrap : boolean, optional (default=True)

Whether bootstrap samples (sampling with replacement) are used when building trees.

oob_score : bool (default=False)

Whether to use out-of-bag samples to estimate the generalization accuracy.
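For example, a minimal sketch of out-of-bag scoring (toy data):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Each sample is scored only by the trees that did not see it during
# bootstrap sampling, giving a generalization estimate with no held-out set.
clf = RandomForestClassifier(n_estimators=100, oob_score=True,
                             random_state=0).fit(X, y)
print(clf.oob_score_)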

n_jobs : integer, optional (default=1)

The number of jobs to run in parallel for both fit and predict. If -1, the number of jobs is set to the number of cores.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator itself; if None, the random number generator is the RandomState instance used by np.random.

verbose : int, optional (default=0)

Controls the verbosity of the tree-building process.

warm_start : bool, optional (default=False)

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
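A short sketch of growing a forest incrementally with warm_start (toy data):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

clf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
clf.fit(X, y)                 # trains the first 10 trees
clf.set_params(n_estimators=30)
clf.fit(X, y)                 # reuses the 10 existing trees, trains 20 more
print(len(clf.estimators_))   # 30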

class_weight : dict, list of dicts, "balanced" or None, optional (default=None)

Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.
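As an illustration, a minimal sketch using class_weight on an imbalanced toy problem (the values are arbitrary):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Roughly 90% of samples belong to class 0.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# 'balanced' reweights classes inversely to their frequency;
# an explicit dict such as {0: 1, 1: 9} would do the same thing manually.
clf = RandomForestClassifier(class_weight='balanced', random_state=0).fit(X, y)
print(clf.score(X, y))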

Attributes:

estimators_ : list of DecisionTreeClassifier

The collection of fitted sub-estimators.

classes_ : array of shape = [n_classes], or a list of such arrays

The class labels (single-output problem), or a list of arrays of class labels (multi-output problem).

n_classes_ : int or list

The number of classes (single-output problem), or a list containing the number of classes for each output (multi-output problem).

n_features_ : int

The number of features when fit is performed.

n_outputs_ : int

The number of outputs when fit is performed.

feature_importances_ : array of shape = [n_features]

The feature importances (the higher the value, the more important the feature).

oob_score_ : float

Score of the training dataset obtained using an out-of-bag estimate.

oob_decision_function_ : array of shape = [n_samples, n_classes]

Decision function computed with out-of-bag estimates on the training set. If n_estimators is small, it is possible that a data point was never left out during bootstrap sampling; in that case, oob_decision_function_ may contain NaN.
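A quick sketch that inspects these fitted attributes (toy data; oob_score=True is needed for the two out-of-bag attributes):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
clf = RandomForestClassifier(n_estimators=100, oob_score=True,
                             random_state=0).fit(X, y)

print(clf.n_classes_)                  # 2
print(clf.n_features_)                 # 4 (n_features_in_ in newer versions)
print(clf.feature_importances_)        # one importance value per feature
print(clf.oob_score_)                  # out-of-bag accuracy estimate
print(clf.oob_decision_function_[:3])  # OOB class probabilities, first 3 rows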

Notes:

The default values of the parameters that control the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and possibly very large unpruned trees on some datasets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, even with the same training data, max_features=n_features and bootstrap=False, the best split found may vary if the improvement of the criterion is identical for several splits enumerated during the search. To obtain deterministic behaviour during fitting, random_state has to be fixed.
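A small sketch of pinning random_state for reproducible fits (toy data):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

# With the same seed the two forests are fitted identically,
# so their predictions agree everywhere.
a = RandomForestClassifier(random_state=42).fit(X, y)
b = RandomForestClassifier(random_state=42).fit(X, y)
print((a.predict(X) == b.predict(X)).all())  # True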

Methods:

apply(X) Apply trees in the forest to X, return leaf indices.

decision_path(X) Return the decision path in the forest.

fit(X, y[, sample_weight]) Build a forest of trees from the training set (X, y).

get_params([deep]) Get parameters for this estimator.

predict(X) Predict class for X.

predict_log_proba(X) Predict class log-probabilities for X.

predict_proba(X) Predict class probabilities for X.

score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.

set_params(**params) Set the parameters of this estimator.
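Putting a few of these methods together, a brief sketch (toy data; the split and values are arbitrary):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict(X_test[:3]))        # hard class labels
print(clf.predict_proba(X_test[:3]))  # per-class probabilities
print(clf.score(X_test, y_test))      # mean accuracy on the test set
print(clf.apply(X_test[:3]).shape)    # leaf index per tree: (3, n_estimators)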

At this point, our study of the parameters of the sklearn random forest is complete. We hope it has resolved your doubts. Theory works best when paired with practice, so go and try it out! If you want to keep learning, please continue to follow the site; more practical articles are on the way!
