This article mainly introduces the Bayesian global optimization method for tuning LightGBM parameters. Many people have questions about how it works in practice, so this is a simple, practical walkthrough; I hope it helps. Please follow along!
Here, using a dataset from a Kaggle competition, I record the steps of using Bayesian global optimization with Gaussian processes to find the best parameters.
1. Install Bayesian global optimization library
Install the latest version from pip:
pip install bayesian-optimization
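A note I am adding here: the maximize call used later in this article passes acq, xi and alpha directly, which, to the best of my knowledge, matches older releases of the library (the 1.x API); newer releases configure the acquisition function on the optimizer object instead. If that call fails for you, check which version is installed:

pip show bayesian-optimization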
2. Load dataset
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold
from scipy.stats import rankdata
from sklearn import metrics
import lightgbm as lgb
import warnings
import gc

pd.set_option('display.max_columns', 200)

train_df = pd.read_csv('../input/train.csv')
test_df = pd.read_csv('../input/test.csv')
Check the distribution of the target variable:
target = 'target'
predictors = train_df.columns.values.tolist()[2:]
train_df.target.value_counts()
The problem is imbalanced. A stratified split that holds out 50% of the rows is used, so the validation set can be used to find the best parameters. 5-fold cross-validation will be used later for the final model fit.
bayesian_tr_index, bayesian_val_index = list(StratifiedKFold(n_splits=2,
    shuffle=True, random_state=1).split(train_df, train_df.target.values))[0]
These bayesian_tr_index and bayesian_val_index indices will be used as the training and validation dataset indices during Bayesian optimization.
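As a quick sanity check (a small addition of mine, not part of the original walkthrough), you can confirm that the stratified split keeps the class ratio roughly equal in the two halves:

print(len(bayesian_tr_index), len(bayesian_val_index))
print(train_df.iloc[bayesian_tr_index][target].mean())  # positive rate in the training half
print(train_df.iloc[bayesian_val_index][target].mean())  # positive rate in the validation half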
3. Black box function optimization (LightGBM)
With the data loaded, create a black box function for LightGBM that the optimizer will use to search for parameters.
def LGB_bayesian(
    num_leaves,  # int
    min_data_in_leaf,  # int
    learning_rate,
    min_sum_hessian_in_leaf,
    feature_fraction,
    lambda_l1,
    lambda_l2,
    min_gain_to_split,
    max_depth):  # int

    # LightGBM expects the next three parameters to be integers, so we cast them
    num_leaves = int(num_leaves)
    min_data_in_leaf = int(min_data_in_leaf)
    max_depth = int(max_depth)

    assert type(num_leaves) == int
    assert type(min_data_in_leaf) == int
    assert type(max_depth) == int

    param = {
        'num_leaves': num_leaves,
        'max_bin': 63,
        'min_data_in_leaf': min_data_in_leaf,
        'learning_rate': learning_rate,
        'min_sum_hessian_in_leaf': min_sum_hessian_in_leaf,
        'bagging_fraction': 1.0,
        'bagging_freq': 5,
        'feature_fraction': feature_fraction,
        'lambda_l1': lambda_l1,
        'lambda_l2': lambda_l2,
        'min_gain_to_split': min_gain_to_split,
        'max_depth': max_depth,
        'save_binary': True,
        'seed': 1337,
        'feature_fraction_seed': 1337,
        'bagging_seed': 1337,
        'drop_seed': 1337,
        'data_random_seed': 1337,
        'objective': 'binary',
        'boosting_type': 'gbdt',
        'verbose': 1,
        'metric': 'auc',
        'is_unbalance': True,
        'boost_from_average': False,
    }

    xg_train = lgb.Dataset(train_df.iloc[bayesian_tr_index][predictors].values,
                           label=train_df.iloc[bayesian_tr_index][target].values,
                           feature_name=predictors,
                           free_raw_data=False
                           )
    xg_valid = lgb.Dataset(train_df.iloc[bayesian_val_index][predictors].values,
                           label=train_df.iloc[bayesian_val_index][target].values,
                           feature_name=predictors,
                           free_raw_data=False
                           )

    num_round = 5000
    clf = lgb.train(param, xg_train, num_round, valid_sets=[xg_valid], verbose_eval=250, early_stopping_rounds=50)

    predictions = clf.predict(train_df.iloc[bayesian_val_index][predictors].values, num_iteration=clf.best_iteration)

    score = metrics.roc_auc_score(train_df.iloc[bayesian_val_index][target].values, predictions)

    return score
The above LGB_bayesian function will be used as the black box function for Bayesian optimization. I have defined the training and validation datasets for LightGBM inside the LGB_bayesian function.
The LGB_bayesian function receives the values of num_leaves, min_data_in_leaf, learning_rate, min_sum_hessian_in_leaf, feature_fraction, lambda_l1, lambda_l2, min_gain_to_split and max_depth from the Bayesian optimization framework. Keep in mind that num_leaves, min_data_in_leaf and max_depth must be integers for LightGBM, but Bayesian optimization proposes continuous values, so I force them to integers. I will only search for the best values of these parameters; readers can increase or decrease the number of parameters to be optimized.
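Before handing LGB_bayesian to the optimizer, it can be worth calling it once by hand with values inside the search ranges defined in the next step, just to confirm that the black box runs end to end and returns a single AUC score. This is an optional smoke test I am adding; the parameter values below are arbitrary points inside those ranges, not recommendations.

score = LGB_bayesian(num_leaves=10, min_data_in_leaf=10, learning_rate=0.05,
                     min_sum_hessian_in_leaf=0.001, feature_fraction=0.2,
                     lambda_l1=1.0, lambda_l2=1.0, min_gain_to_split=0.1,
                     max_depth=10)
print(score)  # a single float: the validation AUC for this parameter set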
Now you need to provide boundaries for these parameters so that Bayesian optimization searches only within boundaries.
bounds_LGB = {
    'num_leaves': (5, 20),
    'min_data_in_leaf': (5, 20),
    'learning_rate': (0.01, 0.3),
    'min_sum_hessian_in_leaf': (0.00001, 0.01),
    'feature_fraction': (0.05, 0.5),
    'lambda_l1': (0, 5.0),
    'lambda_l2': (0, 5.0),
    'min_gain_to_split': (0, 1.0),
    'max_depth': (3, 15),
}
Let's put them all in the BayesianOptimization object
from bayes_opt import BayesianOptimization
LGB_BO = BayesianOptimization(LGB_bayesian, bounds_LGB, random_state=13)
Now, let's look at the keys of the optimization space (the parameters):
print(LGB_BO.space.keys)
I created the BayesianOptimization object (LGB_BO), but it does nothing until maximize is called. Before calling it, let me explain the two parameters of the Bayesian optimization object (LGB_BO) that we can pass to maximize:
init_points: the number of initial random exploration steps we want to perform; LGB_bayesian is evaluated at init_points randomly chosen points.
n_iter: the number of Bayesian optimization steps to perform after the init_points random runs.
Now it's time to call maximize from the Bayesian optimization framework. I let the LGB_BO object run 5 init_points and 5 n_iter.
init_points = 5
n_iter = 5

print('-' * 130)

with warnings.catch_warnings():
    warnings.filterwarnings('ignore')
    LGB_BO.maximize(init_points=init_points, n_iter=n_iter, acq='ucb', xi=0.0, alpha=1e-6)
After the optimization is complete, let's see the maximum value we obtained.
LGB_BO.max['target']
The validation AUC for these parameters is 0.89. Let's look at the parameters:
LGB_BO.max['params']
Now we can use these parameters for our final model!
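Since these values feed the final model, it can also be convenient to persist them so the search does not have to be repeated. This is a small optional sketch I am adding (the file name best_params.json is arbitrary):

import json

# Save the best parameter set found by the optimizer for later reuse
with open('best_params.json', 'w') as f:
    json.dump(LGB_BO.max['params'], f, indent=2)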
There is also a cool option in the BayesianOptimization library: probing. You can probe the LGB_bayesian function if you already know something about likely good parameters, or if you have obtained parameters from another kernel. I will copy and paste parameters from another kernel here. You can probe as follows:
LGB_BO.probe(
    params={'feature_fraction': 0.1403,
            'lambda_l1': 4.218,
            'lambda_l2': 1.734,
            'learning_rate': 0.07,
            'max_depth': 14,
            'min_data_in_leaf': 17,
            'min_gain_to_split': 0.1501,
            'min_sum_hessian_in_leaf': 0.000446,
            'num_leaves': 6},
    lazy=True,
)
By default these points are explored lazily (lazy=True), which means they will only be evaluated the next time you call maximize. Let's call maximize on the LGB_BO object.
LGB_BO.maximize(init_points=0, n_iter=0)  # remember: no init_points or n_iter
Finally, you can get a list of all the probed parameters and their corresponding target values through the attribute LGB_BO.res.
for i, res in enumerate(LGB_BO.res):
    print("Iteration {}: \n\t{}".format(i, res))
We got a better validation score from the probed parameters! As before, I only ran LGB_BO 10 times. In practice, I increase it to 100.
LGB_BO.max['target']
LGB_BO.max['params']
Let's build the final model using these parameters.
4. Training LightGBM model
param_lgb = {
    'num_leaves': int(LGB_BO.max['params']['num_leaves']),  # remember to cast to int here
    'max_bin': 63,
    'min_data_in_leaf': int(LGB_BO.max['params']['min_data_in_leaf']),  # remember to cast to int here
    'learning_rate': LGB_BO.max['params']['learning_rate'],
    'min_sum_hessian_in_leaf': LGB_BO.max['params']['min_sum_hessian_in_leaf'],
    'bagging_fraction': 1.0,
    'bagging_freq': 5,
    'feature_fraction': LGB_BO.max['params']['feature_fraction'],
    'lambda_l1': LGB_BO.max['params']['lambda_l1'],
    'lambda_l2': LGB_BO.max['params']['lambda_l2'],
    'min_gain_to_split': LGB_BO.max['params']['min_gain_to_split'],
    'max_depth': int(LGB_BO.max['params']['max_depth']),  # remember to cast to int here
    'save_binary': True,
    'seed': 1337,
    'feature_fraction_seed': 1337,
    'bagging_seed': 1337,
    'drop_seed': 1337,
    'data_random_seed': 1337,
    'objective': 'binary',
    'boosting_type': 'gbdt',
    'verbose': 1,
    'metric': 'auc',
    'is_unbalance': True,
    'boost_from_average': False,
}
As you can see, I saved the best parameters found by LGB_BO into the param_lgb dictionary, and they will now be used to train the model with 5-fold cross-validation.
Number of folds:
nfold = 5

gc.collect()

skf = StratifiedKFold(n_splits=nfold, shuffle=True, random_state=2019)

oof = np.zeros(len(train_df))
predictions = np.zeros((len(test_df), nfold))

i = 1
for train_index, valid_index in skf.split(train_df, train_df.target.values):
    print("\nfold {}".format(i))

    xg_train = lgb.Dataset(train_df.iloc[train_index][predictors].values,
                           label=train_df.iloc[train_index][target].values,
                           feature_name=predictors,
                           free_raw_data=False
                           )
    xg_valid = lgb.Dataset(train_df.iloc[valid_index][predictors].values,
                           label=train_df.iloc[valid_index][target].values,
                           feature_name=predictors,
                           free_raw_data=False
                           )

    clf = lgb.train(param_lgb, xg_train, 5000, valid_sets=[xg_valid], verbose_eval=250, early_stopping_rounds=50)
    oof[valid_index] = clf.predict(train_df.iloc[valid_index][predictors].values, num_iteration=clf.best_iteration)
    predictions[:, i-1] += clf.predict(test_df[predictors], num_iteration=clf.best_iteration)

    i = i + 1
Print ("\ n\ nCV AUC: {: