What is the method of hyperparameter optimization in machine learning

2025-07-19 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

This article explains the main methods of hyperparameter optimization in machine learning and shows how each one can be applied to the same classification task.

Introduction

A machine learning model has two different types of parameters:

Hyperparameters = all the parameters that the user can set arbitrarily before training starts (for example, the number of estimators in a Random Forest).

Model parameters = parameters that are instead learned during model training (for example, the weights in neural networks or linear regression).

The model parameters define how the input data is used to obtain the desired output, and they are learned at training time. The hyperparameters, by contrast, determine how the model is structured in the first place.

Tuning a machine learning model is an optimization problem. We have a set of hyperparameters, and our goal is to find the combination of their values that yields the minimum (for example, loss) or the maximum (for example, accuracy) of a function.

This is especially important when comparing how different machine learning models perform on a dataset. For example, it would be unfair to compare an SVM model with the best hyperparameters against a Random Forest model that has not been optimized.

In this article, the following hyperparameter optimization methods are described:

Manual search

Random search

Grid search

Automatic hyperparameter tuning (Bayesian optimization, genetic algorithms)

Artificial neural network (ANN) tuning

To demonstrate how to perform hyperparameter optimization in Python, I decided to carry out a complete data analysis on the credit card fraud detection Kaggle dataset. The objective is to correctly classify which credit card transactions should be labelled as fraudulent or genuine (binary classification). The dataset was anonymized before distribution, so the meaning of most features is not disclosed.

In this case, I decided to use only a subset of the dataset, both to speed up training and to ensure a perfect balance between the two classes. In addition, only a small number of features are used, which makes the optimization task more challenging. The final dataset is shown in the following figure (figure 2).

Machine learning

First, we need to divide the dataset into a training set and a test set.
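The split can be sketched as follows. This is only an illustration: the Kaggle subset is not reproduced here, so a synthetic dataset from scikit-learn's make_classification stands in for it, and the sample count, feature count and 70/30 ratio are all assumptions rather than the article's values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, roughly balanced binary data standing in for the fraud subset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4, random_state=0
)

# Hold out 30% of the rows for testing; stratify keeps the class ratio equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)  # (700, 6) (300, 6)
```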

In this article, we will use a Random Forest classifier as the model to optimize.

A Random Forest model is formed by a large number of uncorrelated decision trees which, joined together, constitute an ensemble. In a Random Forest, each decision tree makes its own prediction, and the overall model output is the prediction that appears most frequently.
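The voting step described above can be illustrated with a few lines of plain Python. The function name and the five votes are hypothetical and serve only to show the idea.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for one transaction (1 = fraud, 0 = genuine).
votes = [1, 0, 1, 1, 0]
print(majority_vote(votes))  # prints 1
```

Note that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by counting hard votes, so this sketch shows the concept, not that library's exact procedure.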

Now, we can start by calculating the accuracy of a baseline model.
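A minimal baseline sketch, assuming the same kind of synthetic stand-in data (not the Kaggle subset), so the score it prints is not expected to match the article's result:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the article's dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Default hyperparameters; random_state only makes the run repeatable.
baseline = RandomForestClassifier(random_state=0)
baseline.fit(X_train, y_train)

baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
```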

Using the Random Forest classifier with the default scikit-learn parameters gives an overall accuracy of 95%. Now let's see whether applying some optimization techniques can improve on that.

Manual search

In manual search, we choose some model hyperparameters based on our judgment and experience. We then train the model, evaluate its accuracy, and start the process again. This loop is repeated until a satisfactory accuracy is reached.

The main parameters used by the random forest classifier are:

criterion = the function used to evaluate the quality of a split.

max_depth = the maximum number of levels allowed in each tree.

max_features = the maximum number of features considered when splitting a node.

min_samples_leaf = the minimum number of samples that can be stored in a tree leaf.

min_samples_split = the minimum number of samples a node must contain for it to be split.

n_estimators = the number of trees in the ensemble.

You can find more information about random forest parameters in the scikit-learn documentation.

As an example of manual search, I tried specifying the number of estimators in the model. Unfortunately, this did not lead to any improvement in accuracy.
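The manual loop might look like the sketch below. The candidate values 10, 50 and 200 are arbitrary guesses chosen for illustration, and the data is again a synthetic stand-in, so the scores say nothing about the article's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

manual_results = {}
for n_estimators in (10, 50, 200):  # hand-picked guesses
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(X_train, y_train)
    manual_results[n_estimators] = model.score(X_test, y_test)

for n_estimators, accuracy in manual_results.items():
    print(f"n_estimators={n_estimators}: accuracy={accuracy:.3f}")
```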

Random search

In random search, we create a grid of hyperparameters and train / test the model on only some random combinations of them. In this example, I additionally decided to perform cross-validation on the training set.

When performing machine learning tasks, we usually divide the dataset into a training set and a test set. This is done so that we can test the model after training it (in this way, we can check its performance on unseen data). When using cross-validation, we further divide the training set into N partitions to make sure that our model is not overfitting our data.

One of the most commonly used cross-validation methods is K-Fold validation. In K-Fold, we divide the training set into N partitions, then iteratively train the model on N-1 partitions and test it on the remaining one (the left-out partition changes at each iteration). Once the model has been trained N times, we average the results obtained in each iteration to get the overall training performance.
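The partitioning scheme just described can be sketched in plain Python. The helper name k_fold_indices is hypothetical; in practice scikit-learn's KFold class does this job.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs so that each fold is held out exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so that sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=10, n_folds=4))
for train_idx, val_idx in folds:
    print("validate on", val_idx, "and train on the other", len(train_idx), "samples")
```

The overall cross-validation score is then the mean of the N per-fold scores.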

It is important to use cross-validation when implementing hyperparameter optimization. In this way, we may avoid choosing hyperparameters that work very well on the training data but poorly on the test data.

Now we can start the random search by defining a grid of hyperparameters that will be randomly sampled when RandomizedSearchCV() is called. For this example, I decided to divide the training set into 4 folds (cv = 4) and to sample 80 combinations (n_iter = 80). Then, using the scikit-learn best_estimator_ attribute, we can retrieve the model fitted with the set of hyperparameters that performed best during training and use it for testing.
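A hedged sketch of such a random search is given below. The value ranges in param_grid are hypothetical, since the article's grid is not shown, the data is a synthetic stand-in, and n_iter is reduced from 80 to 10 so that the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Hypothetical candidate values for the six hyperparameters listed earlier.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4, 8, None],
    "max_features": ["sqrt", "log2", None],
    "min_samples_leaf": [1, 2, 4],
    "min_samples_split": [2, 5, 10],
    "n_estimators": [20, 50, 100],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=10,        # the article samples 80 combinations; 10 keeps this sketch fast
    cv=4,             # 4-fold cross-validation on the training set
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Test accuracy: {search.best_estimator_.score(X_test, y_test):.3f}")
```

Here best_params_ holds the winning hyperparameter values, while best_estimator_ is the model refitted on the whole training set with those values.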

After training the model, we can visualize how changing some of its hyperparameters affects the overall model accuracy (figure 4). In this case, I decided to observe how changing the number of estimators and the criterion affects the accuracy of our Random Forest.

We can then take this a step further by making the visualization more interactive. In the following chart, we can examine (using the slider) how changing the number of estimators affects the overall accuracy of the model for given values of the min_split and min_leaf parameters.

Now, we can evaluate how the model performed using random search. In this case, random search leads to a consistent improvement in accuracy compared with our baseline model.

Grid search

In grid search, we build a grid of hyperparameters and train / test our model on every possible combination.

To choose the parameters to use in grid search, we can look at which parameters worked best with random search and form a grid around them to see whether a better combination can be found.

Grid search can be implemented in Python using the scikit-learn GridSearchCV() function. In this case too, I decided to divide the training set into 4 folds (cv = 4).

When using grid search, every possible combination of the parameters in the grid is tried. In this case, 3,200 combinations (2 × 10 × 4 × 4 × 10) will be used during training. By contrast, in the previous random search example, only 80 combinations were used.
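The sketch below shows GridSearchCV on a deliberately tiny, hypothetical grid and uses scikit-learn's ParameterGrid to count the combinations before training. The grid values and the synthetic data are assumptions for illustration and are much smaller than the article's setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Synthetic stand-in for the training split (assumption).
X_train, y_train = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

# Hypothetical grid: the number of combinations is the product of the list lengths.
grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [4, 8, None],
    "n_estimators": [20, 50],
}
n_combinations = len(ParameterGrid(grid))  # 2 * 3 * 2 = 12

search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=4)
search.fit(X_train, y_train)

print(n_combinations, "combinations, each cross-validated on 4 folds")
print(search.best_params_)
```

Because the count grows multiplicatively with every added hyperparameter, checking it before calling fit() is a cheap way to estimate the training cost.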

Compared with random search, grid search is slower, but because it goes through the entire search space it can be more effective overall. Random search, on the other hand, can be faster, but it might miss some important points in the search space.

Automatic hyperparameter tuning

When using automatic hyperparameter tuning, the model hyperparameters to use are identified with techniques such as Bayesian optimization, gradient descent, and evolutionary algorithms.

Bayesian optimization

Bayesian optimization can be performed in Python using the Hyperopt library. Bayesian optimization uses probability to find the minimum of a function. The final aim is to find the input value of the function that gives the lowest possible output value.

Bayesian optimization is often more efficient than random, grid, or manual search. It can therefore lead to better performance in the testing phase and a shorter optimization time.

In Hyperopt, Bayesian optimization is implemented by passing three main parameters to the fmin() function.

Objective function = defines the loss function to minimize.

Domain space = defines the range of input values to be tested (in Bayesian optimization, this space creates a probability distribution for each hyperparameter used).

Optimization algorithm = defines the search algorithm used to select the best input value to use in each new iteration.

In addition, the maximum number of evaluations to perform can be defined in fmin().

Bayesian optimization can select input values by considering past results, thus reducing the number of search iterations. In this way, we can focus our search on values closer to the desired output from the start.

Now we can run the Bayesian optimizer using the fmin() function. We first create a Trials() object so that we can later visualize what happened while fmin() was running (for example, how the loss function changed and which hyperparameters were tried).

Now we can retrieve the best set of parameters identified and test the model using the best dictionary created during training. Some of the parameters are stored in the best dictionary as numeric indices, so we need to convert them back to strings before passing them to the Random Forest.
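The index-to-string conversion can be sketched as follows. The option lists and the contents of best are hypothetical values chosen to show the mechanism; they are not results from the article.

```python
# Hypothetical option lists, in the same order they would be given to hp.choice.
criterion_options = ["gini", "entropy"]
max_features_options = ["sqrt", "log2", None]

# Hypothetical optimizer output: choice-type hyperparameters come back as list indices,
# and numeric ones may come back as floats.
best = {"criterion": 1, "max_depth": 12.0, "max_features": 0, "n_estimators": 150.0}

rf_params = {
    "criterion": criterion_options[best["criterion"]],
    "max_depth": int(best["max_depth"]),
    "max_features": max_features_options[best["max_features"]],
    "n_estimators": int(best["n_estimators"]),
}
print(rf_params)
# {'criterion': 'entropy', 'max_depth': 12, 'max_features': 'sqrt', 'n_estimators': 150}
```

Hyperopt also provides a space_eval helper that maps such a result back onto the search space, which avoids keeping the option lists in sync by hand.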

The classification report using Bayesian optimization is shown below.

Genetic algorithms

Genetic algorithms try to apply natural selection mechanisms to machine learning contexts. They are inspired by the Darwinian process of natural selection, which is why they are also commonly called evolutionary algorithms.

Suppose we create a population of N machine learning models with some predefined hyperparameters. We can then calculate the accuracy of each model and decide to keep just half of them (the best performing ones). Next, we generate offspring with hyperparameters similar to those of the best models, so that we again have a population of N models. At this point, we calculate the accuracy of each model once more and repeat the cycle for a defined number of generations. In this way, only the best models survive at the end of the process.
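The loop described above can be sketched in plain Python. Everything here is a simplified illustration: the fitness function is a toy stand-in for model accuracy (it peaks at n_estimators = 200 and max_depth = 10), offspring are produced by mutation only, and all names and value ranges are hypothetical.

```python
import random


def fitness(params):
    """Toy stand-in for model accuracy; highest at n_estimators=200, max_depth=10."""
    return (-abs(params["n_estimators"] - 200) / 200
            - abs(params["max_depth"] - 10) / 10)


def random_individual(rng):
    return {"n_estimators": rng.randint(10, 500), "max_depth": rng.randint(1, 30)}


def mutate(parent, rng):
    """Create an offspring whose hyperparameters are close to the parent's."""
    return {
        "n_estimators": min(500, max(10, parent["n_estimators"] + rng.randint(-30, 30))),
        "max_depth": min(30, max(1, parent["max_depth"] + rng.randint(-3, 3))),
    }


def evolve(population_size=20, generations=15, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    history = []  # best fitness found in each generation
    for _ in range(generations):
        # Keep only the best-performing half of the population.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        history.append(fitness(survivors[0]))
        # Refill the population with offspring similar to the survivors.
        offspring = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + offspring
    return max(population, key=fitness), history


best, history = evolve()
print(best)
print("best fitness, first vs last generation:", history[0], history[-1])
```

Because the survivors are carried over unchanged, the best fitness can never get worse from one generation to the next.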

To implement genetic algorithms in Python, we can use the TPOT automated machine learning library. TPOT is built on top of the scikit-learn library and can be used for either regression or classification tasks.

The following code snippet shows the training report and the best parameters determined using the genetic algorithm.

The overall accuracy of our Random Forest model optimized with the genetic algorithm is shown below.

Artificial neural network (ANN) tuning

Using the KerasClassifier wrapper, grid search and random search can be applied to deep learning models in the same way as to scikit-learn machine learning models. In the following example, we will try to optimize some ANN parameters, such as how many neurons to use in each layer and which activation function and optimizer to use. More examples of deep learning hyperparameter optimization are provided here.
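The same idea can be illustrated without Keras. In the sketch below, scikit-learn's MLPClassifier is swapped in for the Keras model purely so that the example stays self-contained; it is not the KerasClassifier setup used in the article, and the grid values and synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the training split (assumption).
X_train, y_train = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

# Hypothetical grid over layer sizes, activation function and optimizer (solver).
grid = {
    "hidden_layer_sizes": [(8,), (16, 8)],
    "activation": ["relu", "tanh"],
    "solver": ["adam", "lbfgs"],
}

search = GridSearchCV(MLPClassifier(max_iter=300, random_state=0), grid, cv=3)
search.fit(X_train, y_train)  # may emit convergence warnings on some combinations

print(search.best_params_)
print(f"Best cross-validation accuracy: {search.best_score_:.3f}")
```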

The overall accuracy scored using our artificial neural network (ANN) can be seen below.

Evaluation

Now we can compare how all the different optimization techniques performed in this exercise. Overall, random search and evolutionary algorithms worked best.

The results obtained depend heavily on the chosen grid space and on the dataset used. In different situations, therefore, different optimization techniques may outperform the others.
