Today, I will talk to you about how to use TensorFlow and Keras. Many people may not know much about the topic, so to help you understand it better, the editor has summarized the following. I hope you can get something from this article.
Introduction
Artificial neural networks (ANNs) are an advanced machine learning technique and the core of deep learning. An artificial neural network involves the following concepts: the input and output layers, hidden layers, the neurons within the hidden layers, forward propagation, and backpropagation.
To put it simply, the input layer is the set of independent variables, the output layer represents the final output (the dependent variable), and the hidden layers are composed of neurons where equations and activation functions are applied. Forward propagation works through these equations to produce the final output, while backpropagation computes the gradients used to update the parameters accordingly.
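To make forward propagation concrete, here is a minimal NumPy sketch (added for illustration, not part of the original tutorial) of one hidden layer followed by an output layer; the layer sizes and the tanh/sigmoid choices are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(4, 3))    # batch of 4 samples, 3 input features
W1 = rng.normal(size=(3, 5))   # input layer -> hidden layer weights
b1 = np.zeros(5)               # hidden layer biases
W2 = rng.normal(size=(5, 1))   # hidden layer -> output layer weights
b2 = np.zeros(1)               # output bias

h = np.tanh(x @ W1 + b1)                   # hidden layer: linear step + activation
y_hat = 1 / (1 + np.exp(-(h @ W2 + b2)))   # output layer: sigmoid gives a probability
print(y_hat.shape)                         # (4, 1) - one prediction per sample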
Deep neural network
When an ANN contains a deep stack of hidden layers, it is called a deep neural network (DNN). A DNN has many weights and biases, each of which requires training. Backpropagation determines how to adjust every weight and bias of every neuron in order to reduce the error. This process is repeated until the network converges to a minimum error.
The steps of the algorithm are as follows:
Obtain training and test data to train and validate the output of the model. All statistical assumptions concerning correlation and outlier handling still hold and must be addressed.
The input layer consists of the independent variables and their respective values. The training set is divided into multiple batches; one complete pass over the training set is called an epoch. The more epochs, the longer the training time.
Each batch is passed to the input layer, which sends it to the first hidden layer. The output of all neurons in this layer is calculated (for each mini-batch). The result is passed to the next layer, and the process is repeated until we get the output of the last layer, the output layer. This is forward propagation: it is just like making a prediction, except that all intermediate results are retained, because they are required for backpropagation.
A loss function is then used to measure the output error of the network: it compares the expected output with the actual output of the network.
The contribution of each parameter to the error term is calculated.
The algorithm performs a gradient descent step according to the learning rate (backpropagation) to adjust the weights and biases, and the process is repeated.
It is important to initialize the weights of all hidden layers randomly, otherwise the training will fail.
For example, if you initialize all weights and biases to zero, all neurons in a given layer will be exactly the same, so backpropagation will affect them in exactly the same way and they will remain identical. In other words, although there are hundreds of neurons in each layer, your model will behave as if each layer had only one neuron: it will not be very smart. If instead you initialize the weights randomly, you break the symmetry and allow backpropagation to train the neurons differently.
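As a quick illustration of this point (an added sketch, not the article's original code), the snippet below builds the same small Keras layer twice, once with zero initialization and once with the default random (Glorot) initialization; the layer sizes are arbitrary.

import tensorflow as tf

# Zero-initialized layer: every neuron starts identical, so backpropagation
# updates them identically and the symmetry is never broken.
zero_layer = tf.keras.layers.Dense(4, kernel_initializer='zeros', bias_initializer='zeros')

# Default layer: kernel_initializer='glorot_uniform' draws random weights,
# which breaks the symmetry between neurons.
random_layer = tf.keras.layers.Dense(4)

x = tf.ones((1, 3))
print(zero_layer(x).numpy())    # all outputs equal (here all zeros)
print(random_layer(x).numpy())  # outputs differ across neurons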
Activation function
The activation function is key to gradient descent. Gradient descent cannot make progress on a flat surface, so it is important to have a well-defined, non-zero derivative that lets gradient descent make progress at each step. Sigmoid is commonly used for logistic regression problems, but there are other popular options.
Hyperbolic tangent function
This function is S-shaped and continuous, and its output range is between -1 and +1. At the beginning of training, the output of each layer is more or less centered on 0, which helps the network converge faster.
Rectified linear unit (ReLU)
It is non-differentiable at 0, and its gradient is zero for inputs less than 0. For other inputs it produces good output and, more importantly, is faster to compute. The function has no maximum output, which helps avoid some of the saturation problems that can occur during gradient descent.
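For a side-by-side feel of the three activations discussed above, here is a small added sketch (the sample inputs are arbitrary):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(sigmoid(x))  # squashed into (0, 1)
print(tanh(x))     # squashed into (-1, 1), centered on 0
print(relu(x))     # 0 for negative inputs, unbounded above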
Why do we need an activation function?
Suppose f(x) = 2x + 5 and g(x) = 3x - 1, two linear functions with different weights. When we chain them, we get f(g(x)) = 2(3x - 1) + 5 = 6x + 3, which is yet another linear equation. Without nonlinearity, a deep neural network is therefore equivalent to a single linear equation and cannot handle a complex problem space.
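The same point can be checked numerically. The added sketch below stacks the two linear "layers" f and g without an activation and confirms that the result is still a single linear equation:

import numpy as np

x = np.linspace(-2, 2, 5)

def f(x):
    return 2 * x + 5   # first linear "layer"

def g(x):
    return 3 * x - 1   # second linear "layer"

composed = f(g(x))     # stacking the two layers without an activation
single = 6 * x + 3     # the equivalent single linear equation
print(np.allclose(composed, single))  # True: depth adds nothing without nonlinearity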
Loss function
When dealing with regression problems, we do not need an activation function on the output layer. The loss function most commonly used for training regression problems is the mean squared error; however, outliers in the training set are better handled with the mean absolute error. Huber loss is also a widely used error function in regression-based tasks.
The Huber loss is quadratic when the error is smaller than a threshold t (often 1), and linear when the error is greater than t. Compared with the mean squared error, the linear part makes it less sensitive to outliers, while the quadratic part converges faster and gives more precise values than the mean absolute error.
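As an added illustration (using the built-in Keras loss classes and an arbitrary target set containing one outlier), the sketch below compares the three losses:

import tensorflow as tf

y_true = tf.constant([1.0, 2.0, 3.0, 100.0])   # last value is an outlier
y_pred = tf.constant([1.1, 1.9, 3.2, 4.0])

mse = tf.keras.losses.MeanSquaredError()
mae = tf.keras.losses.MeanAbsoluteError()
huber = tf.keras.losses.Huber(delta=1.0)        # quadratic below delta, linear above

print(mse(y_true, y_pred).numpy())    # dominated by the outlier
print(mae(y_true, y_pred).numpy())    # robust, but slower to converge near the minimum
print(huber(y_true, y_pred).numpy())  # a compromise between the two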
Classification problems usually use binary cross-entropy, categorical cross-entropy, or sparse categorical cross-entropy. Binary cross-entropy is used for binary classification, while categorical or sparse categorical cross-entropy is used for multi-class classification problems. You can find more details about the loss functions in the link below.
Note: categorical cross-entropy is used when the dependent variable is one-hot encoded, and sparse categorical cross-entropy is used when the labels are provided as integers.
https://keras.io/api/losses/
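The encoding difference can be seen in a short added sketch (the label values are arbitrary): the two losses give the same value when fed one-hot versus integer labels.

import tensorflow as tf

y_pred = tf.constant([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])     # predicted class probabilities

y_onehot = tf.constant([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])   # one-hot labels
y_integer = tf.constant([0, 1])             # the same labels as integers

cce = tf.keras.losses.CategoricalCrossentropy()
scce = tf.keras.losses.SparseCategoricalCrossentropy()

print(cce(y_onehot, y_pred).numpy())    # one-hot labels -> categorical cross-entropy
print(scce(y_integer, y_pred).numpy())  # integer labels -> sparse categorical, same value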
Developing an ANN with Python
We will use Kaggle's credit card data to develop a fraud detection model in a Jupyter Notebook. The same approach can be implemented in Google Colab.
The dataset contains transactions made with credit cards by European cardholders in September 2013. It covers transactions that occurred within two days, of which 492 out of 284,807 transactions are fraudulent. The dataset is highly unbalanced, with the positive class (fraud) accounting for 0.172% of all transactions.
https://www.kaggle.com/mlg-ulb/creditcardfraud
import tensorflow as tf
print(tf.__version__)
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, precision_recall_curve, auc
import matplotlib.pyplot as plt
from tensorflow.keras import optimizers
import seaborn as sns
from tensorflow import keras
import random as rn
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "3"
PYTHONHASHSEED = 0
tf.random.set_seed(1234)
np.random.seed(1234)
rn.seed(1254)
The dataset consists of the following attributes: time, the principal components, amount, and class. For more information, visit the Kaggle page.
file = tf.keras.utils
raw_df = pd.read_csv('https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv')
raw_df.head()
Since most attributes are principal components, their mutual correlations are always 0 (the components are orthogonal). The only column where outliers are likely to occur is Amount. Below is a brief statistical summary of this column.
count    284807.00
mean         88.35
std         250.12
min           0.00
25%           5.60
50%          22.00
75%          77.16
max       25691.16
Name: Amount, dtype: float64
Outliers are essential for detecting fraud, because the underlying assumption is that a high transaction amount may be a sign of fraudulent activity. However, the box plot does not reveal any specific trend to support this hypothesis.
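A minimal sketch of such a box plot (added here; it assumes the raw_df dataframe loaded above and the dataset's Class and Amount columns):

import matplotlib.pyplot as plt
import seaborn as sns

# Box plot of the transaction amount, split by class (0 = legitimate, 1 = fraud)
sns.boxplot(x='Class', y='Amount', data=raw_df)
plt.show()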
# Prepare input, output and train/test data
X_data = credit_data.iloc[:, :-1]
y_data = credit_data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.2, random_state=7)
X_train = preprocessing.normalize(X_train)
The Amount and principal component variables are on different scales, so the dataset is scaled. Scaling plays an important role in gradient descent: scaled data converges much faster.
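Note that preprocessing.normalize rescales each sample (row) to unit norm. If column-wise standardization (zero mean, unit variance per feature) is what is intended, a StandardScaler fitted on the training set is a common alternative; the following is only a sketch of that option, not the article's original code:

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply the same
# transformation to the test data to avoid information leakage.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)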
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
Output:
(227845, 29)   # number of records x number of columns
(56962, 29)
(227845,)
(56962,)

Develop the neural network layers
The above output shows that we have 29 independent variables to work with, so the shape of the input layer is 29. The general structure of any artificial neural network architecture is summarized below.
+----------------------------+----------------------------+------------------------------------------------+
| Hyperparameter             | Binary Classification      | Multiclass Classification                      |
+----------------------------+----------------------------+------------------------------------------------+
| # input neurons            | One per input feature      | One per input feature                          |
| # hidden layers            | Typically 1 to 5           | Typically 1 to 5                               |
| # neurons per hidden layer | Typically 10 to 100        | Typically 10 to 100                            |
| # output neurons           | 1 per prediction dimension | 1 per prediction dimension                     |
| Hidden activation          | ReLU, Tanh, Sigmoid        | ReLU, Tanh, Sigmoid                            |
| Output layer activation    | Sigmoid                    | Softmax                                        |
| Loss function              | Binary Cross Entropy       | Categorical / Sparse Categorical Cross Entropy |
+----------------------------+----------------------------+------------------------------------------------+

Inputs of the Dense function:
units: output size (the number of neurons in the layer)
activation: the activation function; if not specified, none (linear) is used
use_bias: Boolean, whether a bias term is used
kernel_initializer: initializer for the kernel weights
bias_initializer: initializer for the bias vector
model = Sequential(layers=None, name=None)
model.add(Dense(10, input_shape=(29,), activation='tanh'))
model.add(Dense(5, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))
sgd = optimizers.Adam(lr=0.001)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])

Architecture summary

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 10)                300
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 55
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 6
=================================================================
Total params: 361
Trainable params: 361
Non-trainable params: 0
_________________________________________________________________

Let's try to understand the above output (the explanation assumes the two hidden layers used here):
We create a neural network with one input layer, two hidden layers, and one output layer.
There are 29 input variables and 10 neurons in the first hidden layer, so the shape of its weight matrix is 10 x 29 and the shape of its bias vector is 10 x 1.
Total number of parameters in layer 1 = (10 x 29) + (10 x 1) = 300
The first layer produces 10 output values, using tanh as the activation function. The second layer has 5 neurons and 10 inputs, so its weight matrix is 5 x 10 and its bias vector is 5 x 1.
Total number of parameters in layer 2 = (5 x 10) + (5 x 1) = 55
Finally, the output layer has a single neuron, but it receives 5 inputs from hidden layer 2 and has one bias term, so its number of parameters = 5 + 1 = 6.
model.fit(X_train, y_train.values, batch_size=2000, epochs=20, verbose=1)

Epoch 1/20
114/114 [==============================] - 0s 2ms/step - loss: 0.3434 - accuracy: 0.9847
Epoch 2/20
114/114 [==============================] - 0s 2ms/step - loss: 0.1029 - accuracy: 0.9981
Epoch 3/20
114/114 [==============================] - 0s 2ms/step - loss: 0.0518 - accuracy: 0.9983
Epoch 4/20
114/114 [==============================] - 0s 2ms/step - loss: 0.0341 - accuracy: 0.9986
Epoch 5/20
114/114 [==============================] - 0s 2ms/step - loss: 0.0255 - accuracy: 0.9987
...
Epoch 20/20
114/114 [==============================] - 0s 1ms/step - loss: 0.0082 - accuracy: 0.9989

Evaluate the output

X_test = preprocessing.normalize(X_test)
results = model.evaluate(X_test, y_test.values)

1781/1781 [==============================] - 1s 614us/step - loss: 0.0086 - accuracy: 0.9989

Analyze the learning curve with TensorBoard
TensorBoard is a good interactive visualization tool that can be used to view learning curves during training, compare learning curves across multiple runs, analyze training metrics, and so on. The tool is installed automatically with TensorFlow.
import os

root_logdir = os.path.join(os.curdir, "my_logs")

def get_run_logdir():
    import time
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
model.fit(X_train, y_train.values, batch_size=2000, epochs=20, verbose=1, callbacks=[tensorboard_cb])

%load_ext tensorboard
%tensorboard --logdir=./my_logs --port=6006
Hyperparameter tuning
As mentioned earlier, there are no predefined rules for how many hidden layers or how many neurons are most appropriate for a problem space. We can use RandomizedSearchCV or GridSearchCV to tune some of the hyperparameters. The parameters that can be fine-tuned are summarized below:
Number of hidden layers
Neurons per hidden layer
Optimizer
Learning rate
Epochs
Declare a function to build the model
def build_model(n_hidden_layer=1, n_neurons=10, input_shape=29):
    # create model
    model = Sequential()
    model.add(Dense(10, input_shape=(29,), activation='tanh'))
    for layer in range(n_hidden_layer):
        model.add(Dense(n_neurons, activation='tanh'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
Clone the model using the wrapper class
from sklearn.base import clone

keras_class = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=build_model, nb_epoch=100, batch_size=10)
clone(keras_class)
keras_class.fit(X_train, y_train.values)
Create a random search grid
from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_hidden_layer": [1, 2, 3],
    "n_neurons": [20, 30],
    # "learning_rate": reciprocal(3e-4, 3e-2),
    # "opt": ['Adam']
}

rnd_search_cv = RandomizedSearchCV(keras_class, param_distribs, n_iter=10, cv=3)
rnd_search_cv.fit(X_train, y_train.values, epochs=5)
Check the best parameters
rnd_search_cv.best_params_
{'n_neurons': 30, 'n_hidden_layer': 3}

rnd_search_cv.best_score_

model = rnd_search_cv.best_estimator_.model
Optimizers should also be fine-tuned, because they affect gradient descent, convergence, and the automatic adjustment of the learning rate (a short compile-time sketch follows the list below).
Adadelta: a more robust extension of Adagrad that adapts the learning rate based on a moving window of gradient updates, rather than accumulating all past gradients.
Stochastic gradient descent (SGD): commonly used; the learning rate needs to be fine-tuned using a search grid.
Adagrad: with other optimizers the learning rate is constant for all parameters in every cycle, but Adagrad changes the learning rate η of each parameter at every time step t, based on the derivatives of the error function.
Adam: Adam (adaptive moment estimation) uses first-order and second-order momentum to avoid skipping over local minima and maintains an exponentially decaying average of past gradients.
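Swapping optimizers in Keras only requires changing the compile step. A short added sketch, with illustrative learning rates:

from tensorflow.keras import optimizers

# Any of these optimizer instances can be passed to model.compile()
sgd_opt = optimizers.SGD(learning_rate=0.01)
adagrad_opt = optimizers.Adagrad(learning_rate=0.01)
adadelta_opt = optimizers.Adadelta(learning_rate=1.0)
adam_opt = optimizers.Adam(learning_rate=0.001)

model.compile(optimizer=adam_opt, loss='binary_crossentropy', metrics=['accuracy'])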
In general, better output can be obtained by increasing the number of layers rather than the number of neurons in each layer.