What Does One-Hot Encoding Mean?

2025-04-25 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains what one-hot encoding is. The content is fairly detailed; interested readers can use it as a reference, and I hope it is helpful to you.

Introduction

When you work with ML models, you will run into the term "one-hot encoding" everywhere. The sklearn documentation for its one-hot encoder says: "Encode categorical integer features using a one-hot aka one-of-K scheme." It's not very clear, is it? Or at least it wasn't for me. Let's see what one-hot encoding actually is.

One-hot encoding converts categorical variables into a form that can be fed to ML algorithms so that they make better predictions.

Suppose the dataset is as follows:

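╔═════════════╦═══════════════════╦═══════╗
║ CompanyName ║ Categorical value ║ Price ║
╠═════════════╬═══════════════════╬═══════╣
║ VW          ║ 1                 ║ 20000 ║
║ Acura       ║ 2                 ║ 10011 ║
║ Honda       ║ 3                 ║ 50000 ║
║ Honda       ║ 3                 ║ 10000 ║
╚═════════════╩═══════════════════╩═══════╝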

The categorical value is the integer assigned to each entry in the dataset. For example, if the dataset contained another company, that company would get the categorical value 4. As the number of unique entries grows, the categorical values grow with it.

The table above is only an illustration. In practice, the categorical values start at 0 and run up to N-1, where N is the number of categories.

As you probably already know, you can use sklearn's LabelEncoder to assign the categorical values.
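To make the 0 to N-1 assignment concrete, here is a minimal pure-Python sketch of label encoding. It is not sklearn's implementation; the variable names are hypothetical, and the only behaviour borrowed from LabelEncoder is that the categories are sorted before being numbered.

```python
# Company names taken from the example table above.
companies = ["VW", "Acura", "Honda", "Honda"]

# Give each unique category an integer from 0 to N-1.
# The categories are sorted first (LabelEncoder also sorts its classes),
# so the codes differ from the illustrative 1, 2, 3 in the table.
classes = sorted(set(companies))
code_of = {name: code for code, name in enumerate(classes)}
encoded = [code_of[name] for name in companies]

print(classes)  # ['Acura', 'Honda', 'VW']
print(encoded)  # [2, 0, 1, 1]
```

Note that the integers carry no meaning beyond identifying the category; which company ends up as 0 depends only on the sort order.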

Now let's go back to one-hot encoding. Suppose we one-hot encode the data following the instructions in the sklearn documentation, then do some cleanup, and get the following result:

╔════╦═══════╦═══════╦═══════╗
║ VW ║ Acura ║ Honda ║ Price ║
╠════╬═══════╬═══════╬═══════╣
║ 1  ║ 0     ║ 0     ║ 20000 ║
║ 0  ║ 1     ║ 0     ║ 10011 ║
║ 0  ║ 0     ║ 1     ║ 50000 ║
║ 0  ║ 0     ║ 1     ║ 10000 ║
╚════╩═══════╩═══════╩═══════╝

0 means the category is absent, and 1 means it is present.
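The transformation from the first table to this one can be sketched in a few lines of plain Python. This is only an illustration under the assumption that columns are ordered by first appearance (so the output matches the table); a library encoder may order its columns differently, and the helper name `one_hot` is hypothetical.

```python
# (CompanyName, Price) rows from the example dataset.
rows = [("VW", 20000), ("Acura", 10011), ("Honda", 50000), ("Honda", 10000)]

# Unique categories in order of first appearance: one output column each.
categories = list(dict.fromkeys(name for name, _ in rows))

def one_hot(name, categories):
    """Return a list with 1 in the position of `name` and 0 everywhere else."""
    return [1 if name == category else 0 for category in categories]

# Replace the company name with its one-hot columns and keep the price.
encoded_rows = [one_hot(name, categories) + [price] for name, price in rows]

print(categories + ["Price"])  # ['VW', 'Acura', 'Honda', 'Price']
for row in encoded_rows:
    print(row)
```

Each row contains exactly one 1 among the company columns, which is where the name "one-hot" comes from.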

Before we go any further, can you think of the reason? Why is label encoding alone not enough to train the model? Why do we need one-hot encoding?

The problem with label encoding is that it assumes that the higher the categorical value, the better the category. "Wait, what!?"

Let me explain: based on the categorical values, this encoding implies an ordering among the companies, VW < Acura < Honda. Suppose your model computes an average internally; then we get (1 + 3) / 2 = 2. This means that the average of VW and Honda is Acura. This is definitely a recipe for disaster, and the model's predictions will contain many errors.

This is why we use a one-hot encoder to "binarize" the category and include the resulting columns as features when training the model.
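The averaging argument can be checked with a small sketch. The label values are the illustrative 1, 2, 3 from the first table, and the one-hot vectors follow the column order VW, Acura, Honda; the variable names are hypothetical.

```python
# Label codes from the first table (illustrative values).
label = {"VW": 1, "Acura": 2, "Honda": 3}
mean_label = (label["VW"] + label["Honda"]) / 2
# Which company, if any, has a label equal to that mean?
collides_with = [name for name, code in label.items() if code == mean_label]
print(mean_label, collides_with)  # 2.0 ['Acura']

# The same average on one-hot vectors (columns: VW, Acura, Honda).
one_hot = {"VW": [1, 0, 0], "Acura": [0, 1, 0], "Honda": [0, 0, 1]}
mean_vector = [(a + b) / 2 for a, b in zip(one_hot["VW"], one_hot["Honda"])]
print(mean_vector)  # [0.5, 0.0, 0.5]
```

With labels, the mean of VW and Honda lands exactly on Acura. With one-hot vectors, the Acura column stays at 0, so no spurious relationship between the three companies is introduced.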

Another example: suppose you have a "flower" feature that can take the values "daffodil", "lily" and "rose". One-hot encoding converts the "flower" feature into three features, "is_daffodil", "is_lily" and "is_rose", all of which are binary.
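As a rough sketch of the flower example, the snippet below builds the three binary features by name. The sample list `flowers` and the helper `encode_flower` are made up for illustration; only the feature names come from the text.

```python
# The three values the "flower" feature can take.
values = ["daffodil", "lily", "rose"]

def encode_flower(flower):
    """Map one flower value to its is_<value> binary features."""
    return {f"is_{value}": int(flower == value) for value in values}

# Hypothetical sample column.
flowers = ["daffodil", "lily", "rose", "lily"]
records = [encode_flower(flower) for flower in flowers]

for flower, record in zip(flowers, records):
    print(flower, record)
```

For "lily", for instance, the result is `is_daffodil = 0`, `is_lily = 1`, `is_rose = 0`.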

See the following figure:

That is all on what one-hot encoding means. I hope the content above is of some help to you and that you learned something from it. If you found the article useful, feel free to share it so that more people can see it.
