This article explains how to implement CNN text classification in TensorFlow. If the topic is new to you, the walkthrough below covers the data, the model, and the full training procedure step by step.
1. Data and preprocessing
The dataset we will use in this post is the Movie Review data from Rotten Tomatoes, one of the datasets also used in the original paper. It contains 10,662 example review sentences, half positive and half negative, with a vocabulary of around 20k words. Note that because this dataset is quite small, a powerful model is likely to overfit. Also, the data does not come with an official train/test split, so we simply use 10% of it as a dev set. The original paper reported results from 10-fold cross-validation on the data.
The data preprocessing code is not discussed in detail here; it is available on Github and does the following (a sketch follows the list):
Loads positive and negative sentences from the raw data files.
Cleans the text data using the same code as the original paper.
Pads each sentence to the maximum sentence length (59). We append a special <PAD/> token to all shorter sentences to make them 59 words long. Padding sentences to the same length is useful because it allows us to efficiently batch our data, since each example in a batch must have the same length.
Builds a vocabulary index and maps each word to an integer between 0 and 18,765 (the vocabulary size). Each sentence becomes a vector of integers.
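A minimal sketch of what this preprocessing might look like, assuming the raw review files are named rt-polarity.pos and rt-polarity.neg (the actual helper code on Github does this more carefully):

```python
import re
import numpy as np

def clean_str(s):
    # Minimal cleanup; the original code applies the same regexes as the paper.
    s = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", s)
    s = re.sub(r"\s{2,}", " ", s)
    return s.strip().lower()

# Load positive and negative review sentences (one per line) and build labels.
positive = [clean_str(line) for line in open("rt-polarity.pos", encoding="latin-1")]
negative = [clean_str(line) for line in open("rt-polarity.neg", encoding="latin-1")]
sentences = [s.split(" ") for s in positive + negative]
labels = [[0, 1]] * len(positive) + [[1, 0]] * len(negative)

# Pad every sentence to the maximum length with a special <PAD/> token.
max_length = max(len(s) for s in sentences)
padded = [s + ["<PAD/>"] * (max_length - len(s)) for s in sentences]

# Build a vocabulary index; each sentence becomes a fixed-length integer vector.
vocabulary = {w: i for i, w in enumerate(sorted({w for s in padded for w in s}))}
x = np.array([[vocabulary[w] for w in s] for s in padded])
y = np.array(labels)

# Shuffle and hold out 10% of the data as the dev set.
shuffle = np.random.permutation(len(y))
x, y = x[shuffle], y[shuffle]
dev_size = int(0.1 * len(y))
x_train, x_dev = x[:-dev_size], x[-dev_size:]
y_train, y_dev = y[:-dev_size], y[-dev_size:]
```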
2. Model
The network structure from the original paper is as follows:
The first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors using multiple filter sizes, for example sliding over 3, 4 or 5 words at a time. Next, we max-pool the result of the convolutional layer into a long feature vector, add dropout regularization, and classify the result using a softmax layer.
Because this is an educational post, the model from the original paper is simplified in a few ways:
We will not use pre-trained word2vec vectors for our word embeddings. Instead, we learn the embeddings from scratch.
We will not enforce L2 norm constraints on the weight vectors. The paper "A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification" found that these constraints had little effect on the final result.
The original experiment used two input data channels: static and non-static word vectors. We use only one channel.
It is relatively straightforward to add these extensions to the code (a few dozen lines). Take a look at the exercises at the end of the post.
3. Code implementation
To allow various hyperparameter configurations, we put the code into a TextCNN class and build the model graph in its __init__ function.
To instantiate the class, we pass the following parameters:
sequence_length - the length of our sentences. Remember that we pad all sentences to the same length (59 for our dataset).
num_classes - the number of classes in the output layer; two in our case (negative and positive).
vocab_size - the size of our vocabulary. This is needed to define the size of our embedding layer, which will have shape [vocab_size, embedding_size].
embedding_size - the dimensionality of our embeddings.
filter_sizes - the number of words we want our convolutional filters to cover. We will have num_filters for each size specified here. For example, [3, 4, 5] means that we will have filters that slide over 3, 4 and 5 words respectively, for a total of 3 * num_filters filters.
num_filters - the number of filters per filter size (see above).
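As a sketch (the TensorFlow 1.x API is assumed throughout the code snippets in this post), the constructor might be declared like this, with the layers from the following sections filling in its body:

```python
import tensorflow as tf

class TextCNN(object):
    """A CNN for text classification: embedding, convolution + max-pooling, softmax."""

    def __init__(self, sequence_length, num_classes, vocab_size,
                 embedding_size, filter_sizes, num_filters):
        # Input placeholders, embedding layer, convolution + max-pooling,
        # dropout, output layer, loss and accuracy are defined below.
        ...
```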
3.1 INPUT PLACEHOLDERS
First, we define the input data placeholders for the network.
tf.placeholder creates a placeholder variable that we feed to the network when we execute it at train or test time. The second argument is the shape of the input tensor: None means that the length of that dimension can be anything. In our case the first dimension is the batch size, and using None allows the network to handle arbitrarily sized batches.
The probability of keeping a neuron in the dropout layer is also an input to the network, because we enable dropout only during training and disable it when evaluating the model (more on that later).
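Inside the TextCNN constructor, a sketch of these placeholders could look like this:

```python
# Placeholders for the input sentences (as word ids), the labels, and the
# dropout keep probability. None in the first dimension allows any batch size.
self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")
```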
3.2 EMBEDDING LAYER
The first layer we define is the embedding layer, which maps vocabulary word indices into low-dimensional vector representations. It is essentially a lookup table that we learn from data.
We use several features here:
tf.device("/cpu:0") forces an operation to be executed on the CPU. By default, TensorFlow will try to place the operation on the GPU if one is available, but the embedding implementation does not currently have GPU support and throws an error if placed on the GPU.
tf.name_scope creates a name scope called "embedding". The scope adds all operations into a top-level node called "embedding", so that you get a nice hierarchy when visualizing the network in TensorBoard.
W is the embedding matrix that we learn during training. We initialize it using a random uniform distribution. tf.nn.embedding_lookup creates the actual embedding operation. The result of the embedding operation is a 3-dimensional tensor of shape [None, sequence_length, embedding_size].
TensorFlow's convolution operation expects a 4-dimensional tensor with dimensions corresponding to batch, width, height and channel. The result of our embedding does not contain the channel dimension, so we add it manually, leaving us with a layer of shape [None, sequence_length, embedding_size, 1].
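A sketch of the embedding layer along these lines (inside the constructor):

```python
with tf.device("/cpu:0"), tf.name_scope("embedding"):
    # Embedding matrix, learned during training, initialized uniformly at random.
    W = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W")
    # Lookup result has shape [None, sequence_length, embedding_size].
    self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
    # Add a channel dimension for conv2d:
    # [None, sequence_length, embedding_size, 1].
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
```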
3.3 CONVOLUTION AND MAX-POOLING LAYERS
Now we are ready to build our convolutional layers followed by max-pooling. Remember that we use filters of different sizes. Because each convolution produces tensors of different shapes, we need to iterate over them, create a layer for each of them, and then merge the results into one big feature vector.
Here, W is our filter matrix and h is the result of applying the nonlinearity to the convolution output. Each filter slides over the whole embedding, but varies in how many words it covers. "VALID" padding means that we slide the filter over our sentence without padding the edges, performing a narrow convolution that gives us an output of shape [1, sequence_length - filter_size + 1, 1, 1]. Performing max-pooling over the output of a specific filter size leaves us with a tensor of shape [batch_size, 1, 1, num_filters]. This is essentially a feature vector, where the last dimension corresponds to our features. Once we have all the pooled output tensors from each filter size, we combine them into one long feature vector of shape [batch_size, num_filters_total]. Using -1 in tf.reshape tells TensorFlow to flatten the dimension when possible.
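A sketch of the convolution and max-pooling layers as described (inside the constructor):

```python
pooled_outputs = []
for filter_size in filter_sizes:
    with tf.name_scope("conv-maxpool-%s" % filter_size):
        # A filter covering `filter_size` words across the full embedding width.
        filter_shape = [filter_size, embedding_size, 1, num_filters]
        W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
        b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
        conv = tf.nn.conv2d(
            self.embedded_chars_expanded, W,
            strides=[1, 1, 1, 1], padding="VALID", name="conv")
        # Apply the nonlinearity to the convolution output.
        h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
        # Max-pool over the whole output: shape [batch_size, 1, 1, num_filters].
        pooled = tf.nn.max_pool(
            h, ksize=[1, sequence_length - filter_size + 1, 1, 1],
            strides=[1, 1, 1, 1], padding="VALID", name="pool")
        pooled_outputs.append(pooled)

# Combine all pooled features into one long vector [batch_size, num_filters_total].
num_filters_total = num_filters * len(filter_sizes)
self.h_pool = tf.concat(pooled_outputs, 3)
self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])
```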
3.4 DROPOUT LAYER
Dropout is perhaps the most popular method to regularize convolutional neural networks. The idea behind dropout is simple: a dropout layer stochastically "disables" a fraction of its neurons. This prevents neurons from co-adapting and forces them to learn individually useful features. The fraction of neurons we keep enabled is defined by the dropout_keep_prob input to our network. We set it to 0.5 during training and to 1 (disable dropout) during evaluation.
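The dropout layer itself is a one-liner; a sketch:

```python
with tf.name_scope("dropout"):
    # Keep probability is fed in at run time: ~0.5 for training, 1.0 for evaluation.
    self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)
```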
3.5 SCORES AND PREDICTIONS
Using the feature vector from max-pooling (with dropout applied), we can generate predictions by doing a matrix multiplication and picking the class with the highest score. We could also apply a softmax function to convert raw scores into normalized probabilities, but that would not change our final predictions.
Here, tf.nn.xw_plus_b is a convenience wrapper for performing the Wx + b matrix multiplication.
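A sketch of the output layer (inside the constructor):

```python
with tf.name_scope("output"):
    W = tf.Variable(
        tf.truncated_normal([num_filters_total, num_classes], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
    # xw_plus_b is a convenience wrapper for the Wx + b matrix multiplication.
    self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
    self.predictions = tf.argmax(self.scores, 1, name="predictions")
```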
3.6 LOSS AND ACCURACY
Using our scores, we can define the loss function. The loss is a measure of the error our network makes, and our goal is to minimize it. The standard loss function for classification problems is the cross-entropy loss.
Here, tf.nn.softmax_cross_entropy_with_logits is a convenience function that calculates the cross-entropy loss for each class, given our scores and the correct input labels. We then take the mean of the losses. We could also use the sum, but that would make it harder to compare the loss across different batch sizes and train/dev data.
We also define an expression for the accuracy, which is a useful quantity to track during training and testing.
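A sketch of the loss and accuracy expressions (inside the constructor):

```python
with tf.name_scope("loss"):
    # Per-example cross-entropy, averaged over the batch.
    losses = tf.nn.softmax_cross_entropy_with_logits(
        logits=self.scores, labels=self.input_y)
    self.loss = tf.reduce_mean(losses)

with tf.name_scope("accuracy"):
    correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
    self.accuracy = tf.reduce_mean(
        tf.cast(correct_predictions, "float"), name="accuracy")
```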
The resulting graph structure can be visualized in TensorBoard.
3.7 TRAINING PROCEDURE
Before we define the training procedure for our network, we need to understand some basics about how TensorFlow uses Sessions and Graphs. If you are already familiar with these concepts, feel free to skip this section.
In TensorFlow, a Session is the environment in which graph operations are executed, and it contains state about variables and queues. Each Session operates on a single graph. If you do not explicitly use a Session when creating variables and operations, the current default Session created by TensorFlow is used. You can change the default Session by executing commands within a session.as_default() block (see below).
A Graph contains operations and tensors. You can use multiple Graphs in your program, but most programs only need a single Graph. You can use the same Graph in multiple Sessions, but not multiple Graphs in one Session. TensorFlow always creates a default Graph, but you may also create one manually and set it as the new default, as shown below. Explicitly creating Sessions and Graphs ensures that resources are properly released when you no longer need them.
The allow_soft_placement setting allows TensorFlow to fall back to a device that has a given operation implemented when the preferred device does not exist. For example, if our code places an operation on the GPU and we run it on a machine without a GPU, not using allow_soft_placement would result in an error. If log_device_placement is set, TensorFlow logs which devices (CPU or GPU) it places operations on, which is very useful for debugging. Both flags are command-line arguments to our program.
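A sketch of the Graph/Session setup described above:

```python
with tf.Graph().as_default():
    session_conf = tf.ConfigProto(
        allow_soft_placement=True,   # fall back if an op has no kernel on the preferred device
        log_device_placement=False)  # set True to log where each op is placed
    sess = tf.Session(config=session_conf)
    with sess.as_default():
        # Model construction, training ops, and the training loop go here
        # (see the following sections).
        pass
```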
3.8 INSTANTIATING THE CNN AND MINIMIZING THE LOSS
When we instantiate our TextCNN model, all of the variables and operations we define are placed into the default graph and session we created above.
Next, we define how to optimize our network's loss function. TensorFlow has several built-in optimizers; we use the Adam optimizer.
Here, train_op is a newly created operation that we can run to perform a gradient update on our parameters. Each execution of train_op is one training step. TensorFlow automatically figures out which variables are "trainable" and computes their gradients. By defining a global_step variable and passing it to the optimizer, we let TensorFlow handle the counting of training steps for us. The global step is automatically incremented by one every time train_op is executed.
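A sketch, inside the graph/session block above (the hyperparameter values and the learning rate of 1e-3 are illustrative):

```python
# Instantiate the model; x_train and vocabulary come from the preprocessing step.
cnn = TextCNN(
    sequence_length=x_train.shape[1],
    num_classes=2,
    vocab_size=len(vocabulary),
    embedding_size=128,
    filter_sizes=[3, 4, 5],
    num_filters=128)

# Define the training procedure.
global_step = tf.Variable(0, name="global_step", trainable=False)
optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
# Each run of train_op applies the gradients and increments global_step by 1.
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
```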
3.9 SUMMARIES
TensorFlow has a notion of summaries, which let you keep track of and visualize various quantities during training and evaluation. For example, you probably want to keep track of how your loss and accuracy evolve over time. You can also track more complex quantities, such as histograms of layer activations. Summaries are serialized objects, and they are written to disk using a SummaryWriter.
Here, we keep track of summaries for training and evaluation separately. In our case these are the same quantities, but you may have quantities that you only want to track during training (such as parameter update values). tf.merge_summary is a convenience function that merges multiple summary operations into a single operation that we can execute.
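A sketch of the summary setup. Note that tf.merge_summary and SummaryWriter are names from an older TensorFlow release; later 1.x versions use tf.summary.merge and tf.summary.FileWriter, which is what the sketch below assumes (the runs/ output path is just an example):

```python
import os
import time

# Output directory for summaries (and, later, checkpoints).
out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", str(int(time.time()))))

loss_summary = tf.summary.scalar("loss", cnn.loss)
acc_summary = tf.summary.scalar("accuracy", cnn.accuracy)

# Separate merged summary ops and writers for training and dev evaluation.
train_summary_op = tf.summary.merge([loss_summary, acc_summary])
train_summary_writer = tf.summary.FileWriter(
    os.path.join(out_dir, "summaries", "train"), sess.graph)
dev_summary_op = tf.summary.merge([loss_summary, acc_summary])
dev_summary_writer = tf.summary.FileWriter(
    os.path.join(out_dir, "summaries", "dev"), sess.graph)
```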
3.10 CHECKPOINTING
Another TensorFlow feature you typically want to use is checkpointing: saving the parameters of your model so you can restore them later. Checkpoints can be used to continue training at a later point, or to pick the best parameter setting using early stopping. Checkpoints are created using a Saver object.
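A sketch of the checkpoint setup (the directory layout is illustrative):

```python
# TensorFlow assumes the checkpoint directory already exists, so create it.
checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints"))
checkpoint_prefix = os.path.join(checkpoint_dir, "model")
if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)
saver = tf.train.Saver(tf.global_variables(), max_to_keep=5)
```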
3.11 INITIALIZING THE VARIABLES
Before training the model, we also need to initialize the variables in our graph.
The global_variables_initializer function is a convenience function that runs all of the initializers we defined for our variables. You can also call a variable's initializer manually, which is useful if you want to initialize your embeddings with pre-trained values.
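A sketch:

```python
# Run the initializers of all variables defined in the graph.
sess.run(tf.global_variables_initializer())
```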
3.12 DEFINING A SINGLE TRAINING STEP
Now let's define a function for a single training step: evaluating the model on a batch of data and updating the model parameters.
feed_dict contains the data we pass to the placeholder nodes of our network. You must feed values for all placeholder nodes, or TensorFlow will throw an error. Another way to work with input data is to use queues, but that is beyond the scope of this post.
Next, we execute our train_op using session.run, which returns the values of all the operations we asked it to evaluate. Note that train_op returns nothing; it just updates the parameters of our network. Finally, we print the loss and accuracy of the current training batch and save the summaries to disk. Note that the loss and accuracy of a training batch may vary significantly from batch to batch if the batch size is small. And because we use dropout, your training metrics may start out worse than your evaluation metrics.
We write a similar function to evaluate the loss and accuracy on an arbitrary dataset, such as the validation set or the whole training set. Essentially this function does the same as the one above, but without the training operation. It also disables dropout. Both functions are sketched below.
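A sketch of the two functions along these lines (inside the session block; the 0.5 keep probability is an illustrative choice):

```python
def train_step(x_batch, y_batch):
    """A single training step on one batch of data."""
    feed_dict = {
        cnn.input_x: x_batch,
        cnn.input_y: y_batch,
        cnn.dropout_keep_prob: 0.5}
    _, step, summaries, loss, accuracy = sess.run(
        [train_op, global_step, train_summary_op, cnn.loss, cnn.accuracy],
        feed_dict)
    print("step {}, loss {:g}, acc {:g}".format(step, loss, accuracy))
    train_summary_writer.add_summary(summaries, step)

def dev_step(x_batch, y_batch, writer=None):
    """Evaluates the model on a dev batch: no train_op, dropout disabled."""
    feed_dict = {
        cnn.input_x: x_batch,
        cnn.input_y: y_batch,
        cnn.dropout_keep_prob: 1.0}
    step, summaries, loss, accuracy = sess.run(
        [global_step, dev_summary_op, cnn.loss, cnn.accuracy],
        feed_dict)
    print("step {}, loss {:g}, acc {:g}".format(step, loss, accuracy))
    if writer:
        writer.add_summary(summaries, step)
```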
3.13 TRAINING LOOP
Finally, we are ready to write the training loop. We iterate over batches of our data, call the train_step function for each batch, and occasionally evaluate and checkpoint our model:
Here, batch_iter is a helper function that batches the data, and tf.train.global_step is a convenience function that returns the value of global_step.
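A sketch of the loop, with a minimal batch_iter helper (the batch size, number of epochs, and the evaluate/checkpoint interval of 100 steps are illustrative):

```python
def batch_iter(data, batch_size, num_epochs):
    """Yields shuffled batches of the data for each epoch."""
    data = np.array(data, dtype=object)
    num_batches = (len(data) - 1) // batch_size + 1
    for _ in range(num_epochs):
        shuffled = data[np.random.permutation(len(data))]
        for i in range(num_batches):
            yield shuffled[i * batch_size:(i + 1) * batch_size]

for batch in batch_iter(list(zip(x_train, y_train)), batch_size=64, num_epochs=200):
    x_batch, y_batch = zip(*batch)
    train_step(x_batch, y_batch)
    current_step = tf.train.global_step(sess, global_step)
    if current_step % 100 == 0:
        print("\nEvaluation:")
        dev_step(x_dev, y_dev, writer=dev_summary_writer)
        path = saver.save(sess, checkpoint_prefix, global_step=current_step)
        print("Saved model checkpoint to {}\n".format(path))
```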
3.14 VISUALIZING RESULTS IN TENSORBOARD
Our training script writes summaries to an output directory, and by pointing TensorBoard to that directory we can visualize the graph and the summaries we created.
There are several things that stand out:
Our training metrics are not smooth because we use small batch sizes. If we used larger batches (or evaluated on the whole training set), we would get a smoother curve.
Because the dev accuracy is significantly lower than the training accuracy, our network seems to be overfitting the training data. This suggests that we need more data (the MR dataset is very small), stronger regularization, or fewer model parameters. For example, adding extra L2 penalties on the weights of the last layer was able to bump the accuracy up to 76%, close to that reported in the original paper.
Because dropout is applied during training, the training loss and accuracy start out noticeably worse than the dev metrics.
You can play around with the code and try running the model with various parameter configurations. The code and instructions are available on Github.
4. EXTENSIONS AND EXERCISES
Here are some exercises to improve the performance of the model:
Initialize the embeddings with pre-trained word2vec vectors. To make this work, you need to use 300-dimensional embeddings and initialize them with the pre-trained values.
Constrain the L2 norm of the weight vectors in the last layer, just like the original paper. You can do this by defining a new operation that updates the weight values after each training step.
Add L2 regularization to the network to combat overfitting, and also experiment with increasing the dropout rate. (The code on Github already includes L2 regularization, but it is disabled by default.)
Add histogram summaries for weight updates and layer operations and visualize them in TensorBoard.
Hopefully this post has given you a better understanding of how to implement CNN text classification in TensorFlow.