This article mainly introduces "how to implement the nlp natural language processing CBOW model class". It walks through the implementation with a concrete example; the steps are simple and practical, and hopefully this article helps you solve the problem.
Implement the CBOW model class
Initialization: the initialization method takes the vocabulary size vocab_size and the number of neurons in the hidden layer hidden_size as parameters. It first generates two weight matrices (W_in and W_out) and initializes them with small random values. Calling astype('f') makes the weights 32-bit floating-point numbers.
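As a quick illustration of this initialization step (the sizes 7 and 5 below are example values chosen for illustration, not part of the original code), small random weights are created and cast to float32 with astype('f'):

import numpy as np

V, H = 7, 5  # example sizes: vocab_size=7, hidden_size=5 (assumed for illustration)
W_in = 0.01 * np.random.randn(V, H).astype('f')   # input-side weight, float32
W_out = 0.01 * np.random.randn(H, V).astype('f')  # output-side weight, float32
print(W_in.dtype, W_in.shape)  # float32 (7, 5)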
Layer generation: generate two MatMul layers on the input side, one MatMul layer on the output side, and one Softmax with Loss layer.
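The MatMul and SoftmaxWithLoss layers come from the common.layers module and are not shown in this article. For reference, here is a minimal sketch of what such a MatMul layer typically looks like; this is an assumption based on the interface used below (weights kept in params, gradients in grads, and forward/backward methods), not the article's own code:

import numpy as np

class MatMul:
    def __init__(self, W):
        self.params = [W]                  # weight parameters
        self.grads = [np.zeros_like(W)]    # gradients, same shapes as params
        self.x = None

    def forward(self, x):
        W, = self.params
        out = np.dot(x, W)
        self.x = x                         # keep the input for backward
        return out

    def backward(self, dout):
        W, = self.params
        dx = np.dot(dout, W.T)             # gradient w.r.t. the input
        dW = np.dot(self.x.T, dout)        # gradient w.r.t. the weight
        self.grads[0][...] = dW            # overwrite in place so grads stays shared
        return dx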
Saving weights and gradients: the weight parameters and their gradients used in the neural network are collected into the list-type member variables params and grads, respectively.
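Keeping params and grads as parallel lists means an optimizer can update every parameter with a single loop. A minimal SGD-style optimizer sketch (illustrative only, not the article's code) shows why:

class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr  # learning rate

    def update(self, params, grads):
        # params[i] and grads[i] refer to the same parameter, in the same order
        for i in range(len(params)):
            params[i] -= self.lr * grads[i]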
Forward propagation, the forward() function: this function takes contexts and target as parameters and returns the loss. The structure of these two parameters is as follows.
contexts is a three-dimensional NumPy array: the number of elements along dimension 0 is the mini-batch size, the number of elements along dimension 1 is the window size of the context, and dimension 2 holds the one-hot vectors. So what does the following code take out?
h0 = self.in_layer0.forward(contexts[:, 0])
h1 = self.in_layer1.forward(contexts[:, 1])
Jym ran a quick test:
import sys
sys.path.append('..')
from common.util import preprocess  # , create_co_matrix, most_similar
from common.util import create_contexts_target, convert_one_hot

text = 'You say goodbye and I say hello.'
corpus, word_to_id, id_to_word = preprocess(text)
contexts, target = create_contexts_target(corpus, window_size=1)
# print(contexts)
# print(target)
vocab_size = len(word_to_id)
target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)
print(contexts[:, 0])
Output: as you can see from the output below, the word to the left of each target is taken out.
[[1 0 0 0 0 0 0]
 [0 1 0 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 0 1 0 0 0]
 [0 0 0 0 1 0 0]
 [0 1 0 0 0 0 0]]
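For comparison, contexts[:, 1] takes out the word to the right of each target. With the same test code, the output would be:

print(contexts[:, 1])  # the right-hand context word of each target
# [[0 0 1 0 0 0 0]
#  [0 0 0 1 0 0 0]
#  [0 0 0 0 1 0 0]
#  [0 1 0 0 0 0 0]
#  [0 0 0 0 0 1 0]
#  [0 0 0 0 0 0 1]]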
Back propagation, the backward() function: back propagation in a neural network propagates gradients in the direction opposite to forward propagation. Back propagation starts from dout = 1, which is passed to the Softmax with Loss layer. The output ds of the Softmax with Loss layer's back propagation is then passed to the MatMul layer on the output side. At a "×" node, back propagation "swaps" the inputs of the forward propagation and multiplies them by the incoming gradient; at a "+" node, the gradient is propagated "as is".
What this backward() function calls are the back propagation functions of the layers written earlier, such as loss_layer.backward(dout), so after backward() finishes, the gradient of each weight parameter is stored in the member variable grads (this is done inside the back propagation of each layer). Call the forward() function first, then call the backward() function, and the gradients in the grads list are updated.
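A tiny numeric check of the two rules mentioned above (illustrative only, using plain scalars rather than the model's arrays): at a "×" node the forward inputs are swapped and multiplied by the incoming gradient, and at a "+" node the incoming gradient passes through unchanged.

x, y = 2.0, 3.0
dout = 1.0

# "×" node: forward z = x * y; backward swaps the inputs
dx_mul, dy_mul = dout * y, dout * x   # 3.0, 2.0

# "+" node: forward z = x + y; backward passes dout through as is
dx_add, dy_add = dout, dout           # 1.0, 1.0

print(dx_mul, dy_mul, dx_add, dy_add)

With these pieces in place, the full SimpleCBOW class looks like this: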
import sys
sys.path.append('..')
import numpy as np
from common.layers import MatMul, SoftmaxWithLoss


class SimpleCBOW:
    def __init__(self, vocab_size, hidden_size):
        V, H = vocab_size, hidden_size

        # initialize weights
        W_in = 0.01 * np.random.randn(V, H).astype('f')
        W_out = 0.01 * np.random.randn(H, V).astype('f')

        # generate layers
        self.in_layer0 = MatMul(W_in)
        self.in_layer1 = MatMul(W_in)
        self.out_layer = MatMul(W_out)
        self.loss_layer = SoftmaxWithLoss()

        # organize all weights and gradients into lists
        layers = [self.in_layer0, self.in_layer1, self.out_layer]
        self.params, self.grads = [], []
        for layer in layers:
            self.params += layer.params
            self.grads += layer.grads

        # set the distributed representation of words as a member variable
        self.word_vecs = W_in

    def forward(self, contexts, target):
        h0 = self.in_layer0.forward(contexts[:, 0])
        h1 = self.in_layer1.forward(contexts[:, 1])
        h = (h0 + h1) * 0.5
        score = self.out_layer.forward(h)
        loss = self.loss_layer.forward(score, target)
        return loss

    def backward(self, dout=1):
        ds = self.loss_layer.backward(dout)
        da = self.out_layer.backward(ds)
        da *= 0.5
        self.in_layer1.backward(da)
        self.in_layer0.backward(da)
        return None

Trainer class implementation
Implementing the learning of the CBOW model: prepare the training data for the neural network, then compute the gradients and update the weight parameters step by step.
Trainer class: a class for learning.
Initialization: the class's initializer receives a neural network (model) and an optimizer (SGD, Momentum, AdaGrad, Adam, etc.).
Learning: call the fit() method to start learning. Parameters: x, the input data; t, the supervision labels; max_epoch, the number of epochs to run; batch_size, the mini-batch size; eval_interval, the interval at which to output the average loss (for example, with eval_interval=20 the average loss is computed every 20 iterations and printed); max_grad, the maximum norm of the gradient (when the gradient norm exceeds this value, the gradient is scaled down).
def fit(self, x, t, max_epoch=10, batch_size=32, max_grad=None, eval_interval=20):
The plot() method: plots the losses recorded by the fit() method (the average loss evaluated every eval_interval iterations).
import time
import numpy
import matplotlib.pyplot as plt
# remove_duplicate (merges the gradients of shared weights into one) and
# clip_grads (gradient clipping) are helper functions provided in the common package.


class Trainer:
    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer
        self.loss_list = []
        self.eval_interval = None
        self.current_epoch = 0

    def fit(self, x, t, max_epoch=10, batch_size=32, max_grad=None,
            eval_interval=20):
        data_size = len(x)
        max_iters = data_size // batch_size
        self.eval_interval = eval_interval
        model, optimizer = self.model, self.optimizer
        total_loss = 0
        loss_count = 0

        start_time = time.time()
        for epoch in range(max_epoch):
            # shuffle the data
            idx = numpy.random.permutation(numpy.arange(data_size))
            x = x[idx]
            t = t[idx]

            for iters in range(max_iters):
                batch_x = x[iters*batch_size:(iters+1)*batch_size]
                batch_t = t[iters*batch_size:(iters+1)*batch_size]

                # compute the gradients and update the parameters
                loss = model.forward(batch_x, batch_t)
                model.backward()
                params, grads = remove_duplicate(model.params, model.grads)  # merge shared weights into one
                if max_grad is not None:
                    clip_grads(grads, max_grad)
                optimizer.update(params, grads)
                total_loss += loss
                loss_count += 1

                # evaluation
                if (eval_interval is not None) and (iters % eval_interval) == 0:
                    avg_loss = total_loss / loss_count
                    elapsed_time = time.time() - start_time
                    print('| epoch %d | iter %d / %d | time %d[s] | loss %.2f'
                          % (self.current_epoch + 1, iters + 1, max_iters,
                             elapsed_time, avg_loss))
                    self.loss_list.append(float(avg_loss))
                    total_loss, loss_count = 0, 0

            self.current_epoch += 1

    def plot(self, ylim=None):
        x = numpy.arange(len(self.loss_list))
        if ylim is not None:
            plt.ylim(*ylim)
        plt.plot(x, self.loss_list, label='train')
        plt.xlabel('iterations (x' + str(self.eval_interval) + ')')
        plt.ylabel('loss')
        plt.show()
The Trainer class is used to perform the learning of the CBOW model.
The model passed in here is an instance of SimpleCBOW, which carries the member variables described above.
model = SimpleCBOW(vocab_size, hidden_size)
The following is the call to the Trainer class:
trainer = Trainer(model, optimizer)
trainer.fit(contexts, target, max_epoch, batch_size)
trainer.plot()

The full training script is as follows:

# coding: utf-8
import sys
sys.path.append('..')  # settings for importing files from the parent directory
from common.trainer import Trainer
from common.optimizer import Adam
from simple_cbow import SimpleCBOW
from common.util import preprocess, create_contexts_target, convert_one_hot

window_size = 1
hidden_size = 5
batch_size = 3
max_epoch = 1000

text = 'You say goodbye and I say hello.'
corpus, word_to_id, id_to_word = preprocess(text)

vocab_size = len(word_to_id)
contexts, target = create_contexts_target(corpus, window_size)
target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)

model = SimpleCBOW(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

trainer.fit(contexts, target, max_epoch, batch_size)
trainer.plot()

word_vecs = model.word_vecs
for word_id, word in id_to_word.items():
    print(word, word_vecs[word_id])
Results: trainer.fit() prints the average loss during training, and trainer.plot() draws the loss curve.
Among the member variables of the SimpleCBOW class, the weight matrix W_in is the distributed representation of the words:
# set the distributed representation of words as a member variable
self.word_vecs = W_in
Then you can look at the distributed representation of words.
word_vecs = model.word_vecs
for word_id, word in id_to_word.items():
    print(word, word_vecs[word_id])
The results are as follows. As you can see, each word is represented as a dense vector.
you [-0.9987413   1.0136298  -1.4921554   0.97300434  1.0181936 ]
say [ 1.161595   -1.1513934  -0.25779223 -1.1773298  -1.1531342 ]
goodbye [-0.88470864  0.9155085  -0.30859873  0.9318609   0.9092796 ]
and [ 0.7929211  -0.8148116  -1.8787507  -0.7845257  -0.8028278 ]
i [-0.8925459   0.95505357 -0.29667985  0.90895575  0.90703803]
hello [-1.0259517   0.97562104 -1.5057516   0.96239203  1.0297285 ]
. [ 1.2134467  -1.1766206   1.6439314  -1.1993438  -1.1676227 ]
Why are there five numbers here? It comes from the weight matrix W_in: in the SimpleCBOW class, the size of W_in is determined by the number of words and by hidden_size.
V, H = vocab_size, hidden_size
# initialize weights
W_in = 0.01 * np.random.randn(V, H).astype('f')
When the Trainer class is used to learn the CBOW model, hidden_size is set to 5, so each word ends up being represented as a vector of five numbers.
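A quick way to confirm this relationship (a hypothetical check, run after the training script above): word_vecs has one row of length hidden_size per word.

print(model.word_vecs.shape)  # (7, 5): vocab_size rows, hidden_size columns
print(len(word_to_id))        # 7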
Learning the CBOW model: adjust the weights so that the prediction becomes accurate. In other words, when the context is "you" and "goodbye", the correct label should be "say", so if the network has good weights, the score of "say" (the correct answer) should be higher.
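As a rough sanity check after training (illustrative only, reusing the variables from the training script above), you can run the forward pass by hand and look at which word gets the highest score for the first sample, whose context is "you" and "goodbye":

import numpy as np

h0 = model.in_layer0.forward(contexts[:, 0])   # left context words
h1 = model.in_layer1.forward(contexts[:, 1])   # right context words
h = (h0 + h1) * 0.5
score = model.out_layer.forward(h)

# if learning went well, the highest-scoring word for the first sample should be 'say'
print(id_to_word[int(np.argmax(score[0]))])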
In fact, the Softmax function and the cross-entropy error are used to train the neural network. The Softmax function converts the scores into probabilities, and the cross-entropy error between these probabilities and the supervision labels is computed and used as the loss for learning. Adding a Softmax layer and a Cross Entropy Error layer to the inference-only CBOW model yields this loss.
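Here is a small self-contained sketch of that loss computation. The softmax and cross_entropy_error helpers below are written out only for illustration (the article's code uses the combined SoftmaxWithLoss layer instead), and the toy score values are made up:

import numpy as np

def softmax(x):
    x = x - x.max(axis=1, keepdims=True)          # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_error(y, t):
    # t is a one-hot supervision label; 1e-7 avoids log(0)
    return -np.sum(t * np.log(y + 1e-7)) / y.shape[0]

# toy scores for one sample over the 7-word vocabulary; index 1 ('say') is the correct word
score = np.array([[0.2, 3.0, 0.1, 0.0, -0.5, 0.3, 0.1]])
t = np.array([[0, 1, 0, 0, 0, 0, 0]])

loss = cross_entropy_error(softmax(score), t)
print(loss)  # small when the probability of the correct word is high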
The weights on both the input side and the output side can be regarded as distributed representations of words; here only the input-side weight is used as the distributed representation of words.
Finally, here is the CBOW model class written earlier, in full:
class SimpleCBOW:
    def __init__(self, vocab_size, hidden_size):
        V, H = vocab_size, hidden_size

        # initialize weights
        W_in = 0.01 * np.random.randn(V, H).astype('f')
        W_out = 0.01 * np.random.randn(H, V).astype('f')

        # generate layers
        self.in_layer0 = MatMul(W_in)
        self.in_layer1 = MatMul(W_in)
        self.out_layer = MatMul(W_out)
        self.loss_layer = SoftmaxWithLoss()

        # organize all weights and gradients into lists
        layers = [self.in_layer0, self.in_layer1, self.out_layer]
        self.params, self.grads = [], []
        for layer in layers:
            self.params += layer.params
            self.grads += layer.grads

        # set the distributed representation of words as a member variable
        self.word_vecs = W_in

    def forward(self, contexts, target):
        h0 = self.in_layer0.forward(contexts[:, 0])
        h1 = self.in_layer1.forward(contexts[:, 1])
        h = (h0 + h1) * 0.5
        score = self.out_layer.forward(h)
        loss = self.loss_layer.forward(score, target)
        return loss

    def backward(self, dout=1):
        ds = self.loss_layer.backward(dout)
        da = self.out_layer.backward(ds)
        da *= 0.5
        self.in_layer1.backward(da)
        self.in_layer0.backward(da)
        return None

This is the end of "how to implement the nlp natural language processing CBOW model class". Thank you for reading.