This article brings you a summary and implementation of the Word2Vec papers. The content is analyzed and described from a practical perspective; I hope you gain something from reading it.
I. Summary
1. Word2vec can be trained efficiently on dictionaries of millions of words and datasets of hundreds of millions of tokens; the resulting word embeddings can be used to measure the similarity between words.
2. Behind word2vec is a shallow neural network. The word2vec algorithms or models refer to the CBOW model and the skip-gram model for computing word vectors.
3. Skip-Gram Formula and Explanation
When the time window size is m, the skip-gram model maximizes, over the whole corpus, the probability of generating the background (context) words given each central word:

$$\prod_{t=1}^{T} \prod_{-m \le j \le m,\; j \ne 0} P\left(w^{(t+j)} \mid w^{(t)}\right)$$
In the loss function, the probability that a central word $w_c$ generates a background word $w_o$ is defined with the softmax function:

$$P(w_o \mid w_c) = \frac{\exp\left(u_o^\top v_c\right)}{\sum_{i \in V} \exp\left(u_i^\top v_c\right)}$$
Description: $T$ is the corpus length, $w^{(t)}$ is the word at position $t$, $v_c$ is the vector of the central word $w_c$, $u_o$ is the vector of the background word $w_o$, and the denominator sums over every word index $i$ in the dictionary $V$.
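To make the softmax concrete, here is a minimal NumPy sketch, assuming small random toy matrices; V, U, and the word indices are illustrative stand-ins, not variables from the implementation in Section II.

import numpy as np

vocab_size, embed_size = 10, 4
V = np.random.randn(vocab_size, embed_size)   # central-word vectors v
U = np.random.randn(vocab_size, embed_size)   # background-word vectors u

def p_background_given_center(o, c):
    """P(w_o | w_c) = exp(u_o . v_c) / sum_i exp(u_i . v_c)"""
    scores = U @ V[c]                          # u_i . v_c for every word i in the dictionary
    scores -= scores.max()                     # subtract the max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

print(p_background_given_center(o=3, c=7))     # probability of word 3 given central word 7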
4. CBOW Formula and Explanation
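The CBOW model is the mirror image of skip-gram: the background (context) words predict the central word. Written in the same u/v notation as the skip-gram section above (an assumed notation, since the original figure is not reproduced here), the context vectors are averaged before the softmax:

$$P\left(w_c \mid w_{o_1}, \dots, w_{o_{2m}}\right) = \frac{\exp\left(u_c^\top \bar{v}_o\right)}{\sum_{i \in V} \exp\left(u_i^\top \bar{v}_o\right)}, \qquad \bar{v}_o = \frac{1}{2m} \sum_{k=1}^{2m} v_{o_k}$$

Training maximizes this probability for every central word in the corpus, just as the skip-gram objective maximizes the probability of the background words.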
5. Negative Sampling
With negative sampling, each training sample updates only a small fraction of the weights. In the paper the authors point out that 5-20 negative words work well for small datasets, while 2-5 negative words are enough for large datasets. Large-scale datasets take fewer negative samples for two reasons: (1) it reduces the amount of computation, and (2) it works well empirically. The objective then becomes a set of logistic regression problems in which positive samples are labeled 1 and negative samples are labeled 0.
Negative sampling words are selected according to the formula from the paper: each word's sampling probability is its unigram frequency raised to the 3/4 power, renormalized over the dictionary:

$$P(w_i) = \frac{f(w_i)^{3/4}}{\sum_{j} f(w_j)^{3/4}}$$
The greater a word's negative-sampling probability, the more likely it is to be selected. I understand negative sampling this way: suppose I need 20 negative samples and the corpus has 10,000 words, each with a probability computed from its frequency (the probabilities sum to 1). I draw a random number between 0 and 1, pick the word whose interval the number falls into, and repeat 20 times, as sketched below.
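A minimal sketch of that selection procedure, assuming a made-up frequency table rather than a real corpus:

import numpy as np

word_freq = {'i': 4, 'like': 3, 'dog': 2, 'coffee': 3, 'milk': 2}   # illustrative counts
words = list(word_freq)

# Unigram frequencies raised to the 3/4 power, renormalized to sum to 1
weights = np.array([word_freq[w] for w in words], dtype=float) ** 0.75
probs = weights / weights.sum()

# np.random.choice with p does exactly the "draw a number in [0, 1) and see
# whose interval it lands in" step, repeated 20 times
negatives = np.random.choice(words, size=20, p=probs)
print(negatives)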
6. Hierarchical Softmax
During training, a Huffman tree is built over the dictionary by Huffman coding, and a vector is assigned to every non-leaf node. What we want to compute is the probability of the target word w, that is, the probability of walking from the root node down to the leaf of w. At every non-leaf node n passed along the way (including the root), we therefore need the probabilities of going left and going right, which are given by a sigmoid of the inner product between the node vector and the central word vector:

$$P(\text{left} \mid n) = \sigma\left(u_n^\top v_c\right), \qquad P(\text{right} \mid n) = 1 - \sigma\left(u_n^\top v_c\right) = \sigma\left(-u_n^\top v_c\right)$$
From this, the word vectors v and the non-leaf-node vectors u in the dictionary can be learned iteratively with stochastic gradient descent, in both the skip-gram model and the continuous bag-of-words model. Because the Huffman tree is binary, the computational cost of each step drops from O(|V|) for the full softmax to O(log |V|), as sketched below.
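As an illustration of the path computation, here is a minimal sketch, assuming made-up node vectors and an arbitrary root-to-leaf path rather than a real Huffman tree:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def huffman_path_prob(v_c, inner_vectors, directions):
    """P(w | w_c): product of the left/right probabilities along the root-to-leaf path.

    v_c           -- vector of the central word
    inner_vectors -- vectors u_n of the non-leaf nodes on the path to word w
    directions    -- +1 for "go left", -1 for "go right" at each node
    """
    prob = 1.0
    for u_n, d in zip(inner_vectors, directions):
        # sigma(u_n . v_c) to go left, 1 - sigma(...) = sigma(-u_n . v_c) to go right
        prob *= sigmoid(d * np.dot(u_n, v_c))
    return prob

# A word at depth 3 needs only 3 sigmoids instead of a |V|-way softmax
v_c = np.random.randn(4)
path = [np.random.randn(4) for _ in range(3)]
print(huffman_path_prob(v_c, path, directions=[+1, -1, +1]))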
II. Implementation
Using a simple toy corpus, the implementation below trains skip-gram word vectors and visualizes the relationships between the words in the corpus, which provides a basis for similarity calculations.
# 1. Import base packages and global parameters
import torch
import numpy as np
import matplotlib.pyplot as plt
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 2. Corpus, vocabulary size, and word-index mappings
sentences = ['i like dog', 'jack hate coffee', 'i love milk',
             'jack study natural language process',
             'word2vec conclude skip-gram and cbow model',
             'jack like coffee', 'dog coffee milk']
word_list = ' '.join(sentences).split()
vocab = list(set(word_list))
vocab_size = len(vocab)
word2idx = {w: i for i, w in enumerate(vocab)}
idx2word = {i: w for i, w in enumerate(vocab)}

# 3. Window size, skip-gram pairs, input/output data
window = 2
batch_size = 8

# Generate (center, context) index pairs within the window
skip_gram = []
for center_idx in range(len(word_list)):
    center = word2idx[word_list[center_idx]]
    for context_idx in (list(range(center_idx - window, center_idx)) +
                        list(range(center_idx + 1, center_idx + 1 + window))):
        if context_idx < 0 or context_idx > len(word_list) - 1:
            continue
        else:
            context = word2idx[word_list[context_idx]]
            skip_gram.append([center, context])

def get_data():
    input_data = []
    target_data = []
    for i in range(len(skip_gram)):
        input_data.append(np.eye(vocab_size)[skip_gram[i][0]])  # one-hot central word
        target_data.append(skip_gram[i][1])                     # background word index
    return input_data, target_data

input, target = get_data()
input, target = torch.Tensor(input), torch.LongTensor(target)

# 4. DataLoader
dataset = TensorDataset(input, target)
dataloader = DataLoader(dataset, batch_size, True)

# 5. Model implementation, loss function, optimizer
class Word2Vec(nn.Module):
    def __init__(self):
        super(Word2Vec, self).__init__()
        self.embed_size = 2
        self.W = nn.Parameter(torch.randn(vocab_size, self.embed_size).type(torch.FloatTensor))
        self.V = nn.Parameter(torch.randn(self.embed_size, vocab_size).type(torch.FloatTensor))

    def forward(self, x):
        # x: [batch_size, vocab_size] one-hot vectors
        out = torch.mm(torch.mm(x, self.W), self.V)
        return out

model = Word2Vec().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# 6. Training
for epoch in range(2000):
    for i, (input_x, target_y) in enumerate(dataloader):
        input_x = input_x.to(device)
        target_y = target_y.to(device)
        pred = model(input_x)
        loss = criterion(pred, target_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (epoch + 1) % 500 == 0 and i == 0:
            print(epoch + 1, loss.item())

# 7. Show the relationships between the learned vectors as an image
for i, label in enumerate(vocab):
    W, WT = model.parameters()
    x, y = float(W[i][0]), float(W[i][1])
    plt.scatter(x, y)
    plt.annotate(label, xy=(x, y), xytext=(5, 2),
                 textcoords='offset points', ha='right', va='bottom')
plt.show()

The above is the summary and implementation of the Word2Vec papers. If you run into similar questions, refer back to the analysis above for understanding.