Shulou (Shulou.com) report, 06/01; updated 2025-04-06.
This article explains how to understand indicator columns and embedding columns in TensorFlow. I hope it serves as a useful reference.
Generally speaking, the input data fed to machine learning and deep learning models must be numeric. If a column holds strings (categorical values), it has to be converted first. This article summarizes two ways TensorFlow does this: indicator columns and embedding columns.
An indicator column represents a value as a vector in which exactly one element is 1 and all the others are 0, so the representation is sparse. An embedding column represents a value as a vector whose elements can be arbitrary real numbers, so the representation is dense.
An indicator column uses one-hot encoding: the number of categories in the input determines the number of dimensions of the vector. If the input has 4 categories, categories 0, 1, 2, and 3 are each encoded as a 4-dimensional vector:
Category 0: [1, 0, 0, 0]
Category 1: [0, 1, 0, 0]
Category 2: [0, 0, 1, 0]
Category 3: [0, 0, 0, 1]
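The one-hot scheme described above can be sketched in plain Python. `one_hot` is a hypothetical helper written for illustration only; it is not part of TensorFlow.

```python
def one_hot(index, num_categories):
    """Return a one-hot list: 1 at position `index`, 0 everywhere else."""
    if not 0 <= index < num_categories:
        raise ValueError(f"index {index} out of range for {num_categories} categories")
    vector = [0] * num_categories
    vector[index] = 1
    return vector

# Encode each of the 4 categories (0, 1, 2, 3) as a 4-dimensional vector.
for category in range(4):
    print(category, one_hot(category, 4))
```

Each vector has exactly one non-zero entry, and its length always equals the number of categories, which is what makes this encoding grow with the vocabulary.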
In TensorFlow, create an indicator column by calling tf.feature_column.indicator_column:

```python
import tensorflow as tf

categorical_column = ...  # placeholder: a categorical column defined elsewhere
indicator_column = tf.feature_column.indicator_column(categorical_column)
```
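But suppose that instead of just four categories we have 10 million, or even a billion. As the number of categories grows, training a neural network on indicator columns becomes infeasible for several reasons: every example turns into a vector as long as the number of categories, almost all of whose entries are 0, so memory and computation are spent mostly on zeros.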
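A back-of-the-envelope calculation makes the cost concrete. This sketch assumes float32 values (4 bytes each), a batch of 32 examples, one-hot vectors stored densely, and an embedding dimension of 64; all of these numbers are illustrative assumptions, not figures from the article.

```python
BYTES_PER_FLOAT32 = 4

def batch_bytes(batch_size, vector_length):
    """Memory needed to hold a dense float32 batch of the given vector length."""
    return batch_size * vector_length * BYTES_PER_FLOAT32

num_categories = 10_000_000   # 10 million categories, as in the text
batch_size = 32               # assumed batch size
embedding_dim = 64            # assumed embedding dimension

one_hot_bytes = batch_bytes(batch_size, num_categories)
embedding_bytes = batch_bytes(batch_size, embedding_dim)

print(f"dense one-hot batch: {one_hot_bytes / 1e9:.2f} GB")   # 1.28 GB
print(f"embedding batch:     {embedding_bytes} bytes")        # 8192 bytes
```

Only 32 of the 320 million one-hot entries in that batch are non-zero, which is the waste the next paragraphs address.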
How can we get around this limitation when the number of categories explodes?
Embedding columns overcome this limitation. Instead of representing data as one-hot vectors with many dimensions, an embedding column represents it as a low-dimensional ordinary (dense) vector in which each cell can hold any number, not just 0 or 1. Because each cell can carry more information, an embedding column needs far fewer cells than an indicator column.
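One way to see the relationship between the two representations: an embedding is a table with one row of real numbers per category, and looking up row `i` gives the same result as multiplying the one-hot vector for `i` by that table. The sketch below shows this in plain Python; the table values are random placeholders, and `one_hot` and `vec_mat_mul` are hypothetical helpers, not TensorFlow APIs.

```python
import math
import random

def one_hot(index, num_categories):
    """One-hot vector of floats with a 1.0 at `index`."""
    vector = [0.0] * num_categories
    vector[index] = 1.0
    return vector

def vec_mat_mul(vector, matrix):
    """Multiply a row vector by a matrix stored as a list of rows."""
    cols = len(matrix[0])
    return [sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(cols)]

num_categories, embedding_dim = 4, 2
random.seed(0)
# Embedding table: one row of `embedding_dim` real numbers per category.
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
                   for _ in range(num_categories)]

category = 2
looked_up = embedding_table[category]                                # direct row lookup
via_matmul = vec_mat_mul(one_hot(category, num_categories), embedding_table)

print(looked_up)
print(via_matmul)
print(all(math.isclose(a, b) for a, b in zip(looked_up, via_matmul)))
```

The lookup skips building the one-hot vector entirely, which is why an embedding stays cheap even when the number of categories is huge.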
How is the dimension of each embedding vector determined? And how do the values in the vector get assigned, seemingly by magic?
1. Suppose the vocabulary contains 10,000 words. With an indicator column, each word is represented by a 10,000-dimensional vector; with an embedding column, each word might need only 10 dimensions, far fewer than one-hot encoding requires. A common rule of thumb is to set the embedding dimension to the fourth root of the number of categories (here, words); the fourth root of 10,000 is 10.
2. The embedding vectors start out filled with random numbers. Their values are then adjusted during training, so that the embedding vectors learn relationships between categories from the training data.
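Both steps can be sketched in plain Python: pick the dimension with the fourth-root rule of thumb, fill the table with random values, then apply one gradient-descent update. The Gaussian initializer with standard deviation 0.1, the learning rate, and the all-ones gradient are hypothetical stand-ins for what a real training loop would compute.

```python
import random

def suggested_embedding_dim(vocab_size):
    """Rule of thumb from the text: the fourth root of the vocabulary size."""
    return max(1, round(vocab_size ** 0.25))

def sgd_step(table, category, grad, learning_rate=0.1):
    """Apply one gradient-descent update; only the looked-up row changes."""
    table[category] = [w - learning_rate * g for w, g in zip(table[category], grad)]

vocab_size = 10_000
dim = suggested_embedding_dim(vocab_size)

# Step 1: random initialization; every word starts with a random vector.
random.seed(42)
embedding_table = [[random.gauss(0.0, 0.1) for _ in range(dim)]
                   for _ in range(vocab_size)]

# Step 2: during training, gradients nudge the rows of the words in a batch.
before = [row[:] for row in embedding_table[:2]]
hypothetical_grad = [1.0] * dim            # stand-in for a real loss gradient
sgd_step(embedding_table, 1, hypothetical_grad)

print(dim, len(embedding_table))           # 10 10000
print(embedding_table[0] == before[0])     # True: row 0 untouched
print(embedding_table[1] == before[1])     # False: row 1 updated
```

Because each training example only touches the rows of the categories it contains, the table's values are shaped gradually by how those categories co-occur with the training labels.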
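In TensorFlow, create an embedding column by calling tf.feature_column.embedding_column:

```python
import tensorflow as tf

categorical_column = ...  # placeholder: a categorical column defined elsewhere
dimension = ...           # placeholder: e.g. the fourth root of the vocabulary size
embedding_column = tf.feature_column.embedding_column(
    categorical_column=categorical_column, dimension=dimension)
```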
Finally, the article closes with an example illustrating the difference between indicator columns and embedding columns.
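As a hypothetical side-by-side sketch (not the article's own example), the code below maps a small word vocabulary to IDs, roughly as a vocabulary-based categorical column would, and then encodes the same batch both ways. The vocabulary, the 2-dimensional embedding size, and the random table values are all illustrative assumptions.

```python
import random

vocabulary = ["cat", "dog", "fish", "bird"]          # hypothetical 4-word vocabulary
word_to_id = {word: i for i, word in enumerate(vocabulary)}

def indicator(word):
    """Indicator (one-hot) encoding: sparse, length equals the vocabulary size."""
    vector = [0] * len(vocabulary)
    vector[word_to_id[word]] = 1
    return vector

random.seed(7)
embedding_dim = 2                                    # assumed dimension
embedding_table = [[round(random.uniform(-1, 1), 3) for _ in range(embedding_dim)]
                   for _ in vocabulary]

def embed(word):
    """Embedding encoding: dense, length equals embedding_dim, arbitrary real values."""
    return embedding_table[word_to_id[word]]

batch = ["dog", "bird", "dog"]
for word in batch:
    print(f"{word:>4}  indicator={indicator(word)}  embedding={embed(word)}")
```

The indicator vectors would grow with every word added to the vocabulary, while the embedding vectors keep their fixed length; repeated words ("dog") map to the same row in both cases.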