How the CRF layer on BiLSTM works 07/12 Update SLTechnology News&Howtos


2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail how the CRF layer on top of a BiLSTM works. I hope you find it helpful.

Editor's note

Of the many introductions to CRFs I have read, this one feels the clearest. It is built around a practical application scenario, so it shows both what a CRF is useful for and how it is used.

Prerequisites

The only thing you need to know is what named entity recognition is. If you know nothing about neural networks, CRFs, or related topics, don't worry: I will explain everything as intuitively as possible.

1. Introduction

Neural-network-based methods are very common for named entity recognition. I'll use the BiLSTM-CRF model discussed in this article as an example to explain how the CRF layer works.

If you don't know the details of BiLSTM and CRF, just keep in mind that they are two different layers of the named entity recognition model.

1.1 Before you begin

Let's assume our dataset has two entity types, Person and Organization. In practice, though, the dataset uses five tags:

- B-Person
- I-Person
- B-Organization
- I-Organization
- O

In addition, suppose x is a five-word sentence: w0, w1, w2, w3, w4. In sentence x, [w0, w1] is a Person entity, [w3] is an Organization entity, and every other word is labeled "O".
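To make the labeling concrete, here is a small pure-Python sketch that writes sentence x's labels in the B-/I-/O scheme and groups them back into entities. The function name `bio_to_spans` is hypothetical, and the strings "w0" to "w4" are placeholders for real words.

```python
def bio_to_spans(words, tags):
    """Group B-/I-/O tags into (entity_type, [words]) spans.

    A stray I- tag that does not continue an open entity of the same
    type is treated like "O" in this sketch.
    """
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A new entity begins; close the previous one if it is still open
            if current_type is not None:
                spans.append((current_type, current_words))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the currently open entity of the same type
            current_words.append(word)
        else:
            # "O" (or a stray I- tag) closes any open entity
            if current_type is not None:
                spans.append((current_type, current_words))
            current_type, current_words = None, []
    if current_type is not None:
        spans.append((current_type, current_words))
    return spans


words = ["w0", "w1", "w2", "w3", "w4"]
tags = ["B-Person", "I-Person", "O", "B-Organization", "O"]
print(bio_to_spans(words, tags))
# [('Person', ['w0', 'w1']), ('Organization', ['w3'])]
```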

1.2 BiLSTM-CRF model

I will give a brief introduction to this model.

As shown in the following figure:

First, each word in sentence x is represented as a vector that combines a word embedding and a character embedding. Character embeddings are randomly initialized, while word embeddings are usually loaded from a pre-trained word embedding file. All embeddings are fine-tuned during training. Second, these embeddings are the input to the BiLSTM-CRF model, and the output is the predicted label for each word in sentence x.
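The embedding step can be sketched in plain Python as below. Everything here is an illustrative assumption rather than the model's actual design: the vector sizes, the `pretrained_words` dictionary standing in for a pre-trained embedding file, zero vectors for unknown words, and mean pooling of character embeddings (the real model may encode characters with its own network). In a real framework both tables would be trainable parameters, which is what lets them be fine-tuned.

```python
import random

WORD_DIM = 4  # hypothetical sizes, chosen only for illustration
CHAR_DIM = 3

# Hypothetical stand-in for vectors loaded from a pre-trained embedding file.
pretrained_words = {"w0": [0.1, 0.2, 0.3, 0.4]}

rng = random.Random(0)
char_table = {}  # character embeddings, randomly initialized on first use


def char_embedding(ch):
    if ch not in char_table:
        char_table[ch] = [rng.uniform(-0.1, 0.1) for _ in range(CHAR_DIM)]
    return char_table[ch]


def word_vector(word):
    # Word part: pre-trained vector, or zeros for unknown words (an assumption).
    w = pretrained_words.get(word, [0.0] * WORD_DIM)
    # Character part: average of the character embeddings. Mean pooling is a
    # simplification chosen for this sketch.
    chars = [char_embedding(c) for c in word]
    c = [sum(col) / len(chars) for col in zip(*chars)]
    # Concatenate: the result has WORD_DIM + CHAR_DIM values.
    return w + c


print(len(word_vector("w0")))  # 7
```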

Although you don't need to know the details of the BiLSTM layer, to understand the CRF layer more easily we do need to know what the output of the BiLSTM layer means.

The figure above shows that the output of the BiLSTM layer is the score for each tag. For example, for w0, the output of the BiLSTM node is 1.5 (B-Person), 0.9 (I-Person), 0.1 (B-Organization), 0.08 (I-Organization), and 0.05 (O), and these scores will be used as input to the CRF layer.

All the scores predicted by the BiLSTM layer are then fed into the CRF layer, which selects the tag sequence with the highest score as the best answer.
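Selecting the highest-scoring tag sequence can be sketched with Viterbi decoding. This assumes, as in a standard linear-chain CRF, that a sequence's score is the sum of the BiLSTM score for each word's tag plus a transition score for each pair of adjacent tags. The transition table and the two-word emission scores below are made up for illustration (a trained CRF would supply its own values), and the sketch omits start/end transitions for brevity.

```python
def viterbi(emissions, transitions, tags):
    """Return (best_score, best_tag_sequence).

    emissions:   list over words; each item maps tag -> BiLSTM score.
    transitions: maps (prev_tag, next_tag) -> score; missing pairs count as 0.
    """
    # best[t] = score of the best path that ends in tag t at the current word
    best = {t: emissions[0][t] for t in tags}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            # Best previous tag to move from into t
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, t), 0.0))
            new_best[t] = best[prev] + transitions.get((prev, t), 0.0) + scores[t]
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag
    last = max(tags, key=lambda t: best[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


tags = ["B-Person", "I-Person", "B-Organization", "I-Organization", "O"]

# Hypothetical transition scores: a large penalty whenever an I-X tag follows
# anything other than B-X or I-X.
transitions = {
    (p, t): -10.0
    for p in tags for t in tags
    if t.startswith("I-") and p[2:] != t[2:]
}

# Made-up BiLSTM scores for a two-word sentence.
emissions = [
    {"B-Person": 0.2, "I-Person": 0.1, "B-Organization": 0.9, "I-Organization": 0.3, "O": 0.1},
    {"B-Person": 0.1, "I-Person": 0.8, "B-Organization": 0.2, "I-Organization": 0.6, "O": 0.3},
]
print(viterbi(emissions, transitions, tags)[1])  # ['B-Organization', 'I-Organization']
```

Picking the top tag for each word independently would give "B-Organization I-Person" here; the transition penalty makes the decoder prefer the valid "B-Organization I-Organization" instead.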

1.3 What happens if there is no CRF layer

You may have noticed that we could train a BiLSTM named entity recognition model even without a CRF layer, as shown in the following figure.

Because the BiLSTM outputs a score for every label of every word, we can simply choose the highest-scoring label for each word.

For example, for w0, "B-Person" has the highest score (1.5), so we can choose "B-Person" as its best prediction label. Similarly, we can select "I-Person" for w1, "O" for w2, "B-Organization" for w3, and "O" for w4.
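This per-word choice can be sketched in a few lines of Python. Only w0's scores are given in the text (the scores for the other words appear only in the figure), so the example decodes w0 alone; the function name `greedy_decode` is hypothetical. Each word's label is picked without looking at its neighbors, which is why nothing in this approach prevents invalid neighboring labels.

```python
def greedy_decode(emissions):
    """Pick the highest-scoring tag for each word independently (no CRF)."""
    return [max(scores, key=scores.get) for scores in emissions]


# BiLSTM scores for w0, as given in the example above
w0_scores = {
    "B-Person": 1.5,
    "I-Person": 0.9,
    "B-Organization": 0.1,
    "I-Organization": 0.08,
    "O": 0.05,
}
print(greedy_decode([w0_scores]))  # ['B-Person']
```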

In this example we happen to get the correct labels for sentence x, but that is not always the case. Consider the example in the figure below.

This time the output is clearly invalid: it contains the sequences "I-Organization I-Person" and "B-Organization I-Person".

1.4 The CRF layer can learn constraints from training data

The CRF layer can add constraints to the final predicted tags to ensure that they are valid. The CRF layer learns these constraints automatically from the training data during training.

For example, the constraints could be:

- The label of the first word in a sentence should start with "B-" or "O", not "I-".
- In the pattern "B-label1 I-label2 I-label3 I-…", label1, label2, label3, … should be the same entity type. For example, "B-Person I-Person" is valid, but "B-Person I-Organization" is invalid.
- "O I-label" is invalid. The first label of a named entity should start with "B-", not "I-". In other words, the valid pattern is "O B-label".
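The constraints listed above can be written as a small checker. This is a hand-coded sketch (the function name `is_valid` is hypothetical); a CRF does not run explicit rules like these but learns transition scores that have a similar effect. Counting with it also gives a sense of how much the constraints narrow things down for a five-word sentence.

```python
from itertools import product

TAGS = ["B-Person", "I-Person", "B-Organization", "I-Organization", "O"]


def is_valid(tags):
    """Check a B-/I-/O tag sequence against the constraints listed above."""
    prev = "O"  # treat the sentence start like "O": an I- tag cannot follow it
    for tag in tags:
        if tag.startswith("I-"):
            # I-X must continue an entity of the same type X (after B-X or I-X)
            if prev == "O" or prev[2:] != tag[2:]:
                return False
        prev = tag
    return True


print(is_valid(["B-Person", "I-Person"]))        # True
print(is_valid(["B-Person", "I-Organization"]))  # False
print(is_valid(["O", "I-Person"]))               # False
print(is_valid(["I-Person", "O"]))               # False (first word)

# How many of the 5**5 tag sequences for a five-word sentence are valid?
valid = sum(is_valid(seq) for seq in product(TAGS, repeat=5))
print(valid, "of", len(TAGS) ** 5)  # 571 of 3125
```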

With these constraints, the number of invalid predicted tag sequences drops significantly.

That concludes this look at how the CRF layer on top of a BiLSTM works. I hope it helps.

