

What Are Word Vectors and Embedding?


This article mainly introduces what word vectors and Embedding are. Many people have doubts about this in everyday practice, so the editor has consulted a variety of materials and put together a simple, easy-to-follow explanation, in the hope of answering the question "what are word vectors and Embedding?". Now, please follow the editor through it!

In the weight matrix W = (w_ij), taking out rows 1 and 2 is exactly the same as looking up the so-called word-vector table (finding the corresponding word vector in the table). And in fact, that is exactly what it is! This is the so-called Embedding layer. The Embedding layer is just a fully connected layer whose input is a one-hot vector and whose number of hidden nodes is the word-vector dimension, and the parameters of this fully connected layer are precisely the "word-vector table"! At this level, the word vector does nothing new. It is one-hot in disguise, so stop laughing at one-hot's problems: the word vector is nothing but the parameters of the fully connected layer applied to one-hot!
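To make this concrete, here is a minimal NumPy sketch (the matrix values are random stand-ins, purely for illustration): multiplying a one-hot vector by the weight matrix W returns exactly one row of W, i.e. a table lookup.

```python
import numpy as np

vocab_size, embed_dim = 5, 3

# The "word-vector table": the parameters of the fully connected layer.
# Random stand-in values, purely for illustration.
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, embed_dim))

# One-hot vector for the word with index 2.
one_hot = np.zeros(vocab_size)
one_hot[2] = 1.0

# Fully connected layer (no bias): one_hot @ W ...
fc_output = one_hot @ W
# ... equals row 2 of W, i.e. a table lookup.
lookup = W[2]

assert np.allclose(fc_output, lookup)
print(fc_output)
```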

So, is there really nothing new in word vectors? There is, but it is operational. People observed that multiplying by a one-hot vector is the same as looking up a row of the matrix, so implementations perform the table lookup directly instead of carrying out the full matrix multiplication, which greatly reduces the amount of computation. Note, again: the computation drops not because word vectors appeared, but because the one-hot matrix multiplication is simplified into a table lookup. That is the computational side. On the conceptual side, once the parameters of that fully connected layer have been trained, they are used directly as features; that is, the layer's parameters themselves become the representations of the words, and these are the word vectors. Finally, these vectors turn out to have some interesting properties; for example, the cosine of the angle between two vectors reflects, to some extent, the similarity of the corresponding words.
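As a quick illustration of that last property, this is how cosine similarity between two rows of a word-vector table is typically computed (a minimal NumPy sketch; the table W here is again a random stand-in, not a trained one):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# W stands in for a trained word-vector table (one row per word);
# random values here, for illustration only.
rng = np.random.default_rng(1)
W = rng.normal(size=(5, 3))

# Similarity between word 1 and word 2.
print(cosine_similarity(W[1], W[2]))
```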

By the way, some people criticize Word2Vec for being only a three-layer model and therefore not really "deep" learning. In fact, counting the fully connected layer on top of the one-hot input, there are four layers, which can more or less be called deep.

Where do the parameters come from?

Wait: if the word vectors are taken as the parameters of a fully connected layer (dear reader, let me correct that: not "taken as" — they *are* those parameters), then I still haven't been told how to obtain those parameters! The answer is: there is no single answer. Don't the parameters of a neural network depend on your task? So ask your own task. "But isn't Word2Vec unsupervised?" Let me clarify that as well.

Strictly speaking, neural network training is supervised. Models such as Word2Vec actually train a language model in order to obtain the word vectors, and the precise term for this is "self-supervised". A language model predicts the probability of the next word given the previous n words; it is just a multi-class classifier. We feed in a one-hot vector, attach a fully connected layer, stack a few more layers, and finish with a softmax classifier — that is a language model — and then pour in a large amount of text for training. Afterwards, the parameters of the first fully connected layer are exactly the word-vector table. Word2Vec itself makes many simplifications, but they are all simplifications within the language model; its first layer is still a fully connected layer, and the parameters of that layer are the word-vector table.
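Here is a minimal sketch of such a language model in PyTorch, under made-up hyperparameters and layer sizes; nn.Embedding plays the role of the bias-free fully connected layer over one-hot input described above:

```python
import torch
import torch.nn as nn

class TinyLanguageModel(nn.Module):
    """Predict the next word from the previous n words."""
    def __init__(self, vocab_size, embed_dim, context_n, hidden_dim):
        super().__init__()
        # Equivalent to a bias-free fully connected layer on one-hot
        # input; its weight matrix is the word-vector table.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(context_n * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)  # softmax classifier

    def forward(self, context):           # context: (batch, n) word ids
        e = self.embed(context)           # (batch, n, embed_dim)
        h = torch.tanh(self.hidden(e.flatten(1)))
        return self.out(h)                # logits over the vocabulary

model = TinyLanguageModel(vocab_size=10000, embed_dim=128,
                          context_n=4, hidden_dim=256)
logits = model(torch.randint(0, 10000, (32, 4)))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10000, (32,)))
loss.backward()

# After training, the word-vector table is simply:
word_vectors = model.embed.weight  # (vocab_size, embed_dim)
```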

From this point of view, the problem is relatively simple: I don't have to use a language model to train the vectors, do I? Right — you can use other tasks, for example supervised training on a text sentiment classification task. As already said, the word vectors are just a fully connected layer, and what you attach after it is entirely up to you. Of course, labelled data is usually scarce, so such training overfits easily; that is why people generally first train the word vectors unsupervised on a large-scale corpus, to reduce the risk of overfitting. Note that the risk drops because an unlabelled corpus can be used to pretrain the word vectors — and unlabelled corpora can be enormous; with a large enough corpus there is little risk of overfitting. It has nothing to do with the word vectors themselves, which are just one layer of parameters waiting to be trained. What could they possibly do on their own to reduce the risk of overfitting?
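A sketch of that pretrain-then-finetune pattern in PyTorch (the pretrained table here is a random stand-in, and the classifier architecture is made up for illustration): the embedding layer is initialized from the pretrained table and frozen, so the small labelled set trains only the layers above it.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 10000, 128

# Stand-in for a table pretrained unsupervised on a large corpus.
pretrained = torch.randn(vocab_size, embed_dim)

class SentimentClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Initialize the embedding layer from the pretrained table and
        # freeze it, so the small labelled set cannot overfit it.
        self.embed.weight.data.copy_(pretrained)
        self.embed.weight.requires_grad = False
        self.out = nn.Linear(embed_dim, 2)   # positive / negative

    def forward(self, ids):                  # ids: (batch, seq_len)
        e = self.embed(ids)                  # (batch, seq_len, embed_dim)
        return self.out(e.mean(dim=1))       # average pooling, then classify

model = SentimentClassifier()
logits = model(torch.randint(0, vocab_size, (8, 20)))
```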

Finally, let me explain why these word vectors have properties such as the cosine of the angle between vectors, or the Euclidean distance between them, reflecting word similarity to some extent. When we train the language model unsupervised, we open a window and predict the probability of the next word from the previous n words; this n is the window size. Words in the same window receive similar updates, these updates accumulate, and words with similar co-occurrence patterns accumulate similar updates to a considerable degree. Let me give an example: the two characters of the Chinese word "忐忑" ("uneasy") are almost always used together, so whenever one of them is updated, the other is almost always updated at the same time; their updates are nearly identical, so their word vectors must end up nearly identical. "Similar patterns" means the words are interchangeable in the given language task. For example, in a general-purpose corpus, replacing "like" in "I like you" with "hate" still yields a valid sentence, so "like" and "hate" will have similar word vectors; but if the word vectors are trained through a sentiment classification task, then "like" and "hate" will end up with very different word vectors.
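To see this effect in code, one could train word vectors with a library such as gensim and compare similarities. A toy sketch (gensim 4.x parameter names; the corpus is a made-up stand-in and far too small for real training, so the numbers are only suggestive):

```python
# Toy sketch using gensim's Word2Vec; real corpora must be far larger.
from gensim.models import Word2Vec

sentences = [
    ["i", "like", "you"],
    ["i", "hate", "you"],
    ["i", "like", "movies"],
    ["i", "hate", "movies"],
]

model = Word2Vec(sentences, vector_size=16, window=2,
                 min_count=1, epochs=200, seed=0)

# "like" and "hate" occur in interchangeable contexts, so under this
# purely contextual task their vectors should come out similar.
print(model.wv.similarity("like", "hate"))
```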

At this point, our study of "What are word vectors and Embedding?" is over. I hope it has resolved your doubts. Pairing theory with practice helps you learn better, so go and try it! If you want to keep learning more related knowledge, please continue to follow this website; the editor will keep working hard to bring you more practical articles!
