How to understand the gated recurrent unit (GRU)

2025-03-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail how to understand the gated recurrent unit (GRU). Interested readers can use it as a reference, and we hope it is helpful.

1. What is GRU?

When computing gradients in a recurrent neural network (RNN), we find that when the number of time steps is large or the time step is small, the gradient of the RNN tends to vanish or explode. Gradient clipping can deal with exploding gradients, but it cannot solve the vanishing gradient problem. For this reason, RNNs usually find it difficult in practice to capture dependencies between time steps that are far apart in a time series.
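As an illustration of why clipping helps only with explosion, here is a minimal NumPy sketch of clipping by global L2 norm. The function name `clip_gradient_norm` and the threshold `max_norm` are assumptions made for this example, not something the article specifies.

```python
import numpy as np

def clip_gradient_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        # Shrink every gradient by the same factor; directions are preserved.
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# Hypothetical gradients: one exploding, one vanishing.
exploding = [np.full((2, 2), 100.0)]
vanishing = [np.full((2, 2), 1e-8)]

clipped_big, norm_big = clip_gradient_norm(exploding, max_norm=1.0)
clipped_small, norm_small = clip_gradient_norm(vanishing, max_norm=1.0)
```

The exploding gradient is scaled down to the threshold, while the vanishing gradient passes through unchanged: clipping only ever shrinks gradients, so it has no way to restore a signal that has already decayed.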

The gated recurrent neural network was proposed to better capture dependencies between time steps that are far apart in a time series. It controls the flow of information through learnable gates. The gated recurrent unit (GRU) is one commonly used gated recurrent neural network.

2. Gated recurrent unit

2.1 Reset gate and update gate

The GRU introduces the concepts of a reset gate and an update gate, which modify how the hidden state is computed in a recurrent neural network.

The inputs to both the reset gate and the update gate in the GRU are the current time step input and the previous time step hidden state, and the output of each gate is computed by a fully connected layer whose activation function is the sigmoid function. As shown in the following figure:

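Specifically, assume that the number of hidden units is $h$, and that we are given the mini-batch input $X_t \in \mathbb{R}^{n \times d}$ at time step $t$ (the number of samples is $n$, the number of inputs is $d$) and the hidden state of the previous time step $H_{t-1} \in \mathbb{R}^{n \times h}$. The reset gate $R_t \in \mathbb{R}^{n \times h}$ and the update gate $Z_t \in \mathbb{R}^{n \times h}$ are calculated as follows:

$$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$$

$$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$$

Here $W_{xr}, W_{xz} \in \mathbb{R}^{d \times h}$ and $W_{hr}, W_{hz} \in \mathbb{R}^{h \times h}$ are weight parameters, and $b_r, b_z \in \mathbb{R}^{1 \times h}$ are bias parameters.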

The sigmoid function maps the value of each element to between 0 and 1. Therefore, every element of the reset gate $R_t$ and the update gate $Z_t$ lies in the range [0, 1].
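The gate computation described above can be sketched in NumPy as follows. This is a minimal illustration under assumed names: `X_t`, `H_prev`, `R_t`, `Z_t`, the weight names, and the random initialization are choices made for this example only.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

n, d, h = 4, 3, 5                      # samples, inputs, hidden units (illustrative sizes)
rng = np.random.default_rng(0)

X_t = rng.normal(size=(n, d))          # mini-batch input at the current time step
H_prev = rng.normal(size=(n, h))       # hidden state of the previous time step

# One fully connected layer per gate: input weights, hidden weights, bias.
W_xr, W_hr, b_r = rng.normal(size=(d, h)), rng.normal(size=(h, h)), np.zeros((1, h))
W_xz, W_hz, b_z = rng.normal(size=(d, h)), rng.normal(size=(h, h)), np.zeros((1, h))

R_t = sigmoid(X_t @ W_xr + H_prev @ W_hr + b_r)   # reset gate, shape (n, h)
Z_t = sigmoid(X_t @ W_xz + H_prev @ W_hz + b_z)   # update gate, shape (n, h)
```

Both gates have the same shape as the hidden state, so each gate element can act as a soft switch on one hidden state element.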

2.2 Candidate hidden state

Next, the gated recurrent unit calculates a candidate hidden state to assist in the later hidden state calculation. We multiply the output of the current time step's reset gate element-wise (symbol ⊙) with the hidden state of the previous time step. If an element of the reset gate is close to 0, the corresponding hidden state element is reset to 0, that is, the hidden state of the previous time step is discarded. If the element is close to 1, the hidden state of the previous time step is retained. The result of the element-wise multiplication is then concatenated with the input of the current time step, and the candidate hidden state is calculated through a fully connected layer with the tanh activation function, so all of its elements lie in the range [-1, 1].

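Specifically, the candidate hidden state $\tilde{H}_t \in \mathbb{R}^{n \times h}$ of time step $t$ is calculated as follows:

$$\tilde{H}_t = \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h)$$

Here $X_t$ is the current input, $H_{t-1}$ is the hidden state of the previous time step, $R_t$ is the reset gate, $W_{xh}$ and $W_{hh}$ are weight parameters, and $b_h$ is a bias parameter.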

As can be seen from the formula above, the reset gate controls how the hidden state of the previous time step flows into the candidate hidden state of the current time step. The hidden state of the previous time step may contain all the historical information of the time series up to that time step. Therefore, the reset gate can be used to discard historical information that is irrelevant to the prediction.
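To make the role of the reset gate concrete, the following NumPy sketch computes the candidate hidden state at the two extremes of the gate. The function name `candidate_hidden_state` and all variable names and sizes are assumptions for illustration.

```python
import numpy as np

def candidate_hidden_state(X_t, H_prev, R_t, W_xh, W_hh, b_h):
    """tanh layer over the input and the reset-gated previous hidden state."""
    return np.tanh(X_t @ W_xh + (R_t * H_prev) @ W_hh + b_h)

n, d, h = 2, 3, 4                      # illustrative sizes
rng = np.random.default_rng(1)
X_t = rng.normal(size=(n, d))
H_prev = rng.normal(size=(n, h))
W_xh = rng.normal(size=(d, h))
W_hh = rng.normal(size=(h, h))
b_h = np.zeros((1, h))

# Reset gate fully closed: the previous hidden state is discarded.
cand_reset = candidate_hidden_state(X_t, H_prev, np.zeros((n, h)), W_xh, W_hh, b_h)
input_only = np.tanh(X_t @ W_xh + b_h)

# Reset gate fully open: the previous hidden state is used unchanged.
cand_keep = candidate_hidden_state(X_t, H_prev, np.ones((n, h)), W_xh, W_hh, b_h)
plain_rnn = np.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)
```

With the gate at 0 the candidate depends on the current input alone, and with the gate at 1 it reduces to the hidden state update of an ordinary recurrent layer with tanh activation.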

2.3 Hidden state

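Finally, the hidden state $H_t \in \mathbb{R}^{n \times h}$ of time step $t$ is calculated by using the update gate $Z_t$ of the current time step to combine the hidden state $H_{t-1}$ of the previous time step with the candidate hidden state $\tilde{H}_t$ of the current time step:

$$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t$$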

It is worth noting that the update gate controls how the hidden state should be updated by the candidate hidden state, which contains the information of the current time step, as shown in the above figure. Suppose the update gate stays close to 1 between time steps $t'$ and $t$ ($t' < t$). In that case, the input information between time steps $t'$ and $t$ hardly flows into the hidden state $H_t$ of time step $t$. In effect, the hidden state $H_{t'-1}$ of the earlier time is saved and passed through time to the current time step $t$. This design can deal with the vanishing gradient problem in recurrent neural networks and better capture dependencies between time steps that are far apart in a time series.
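The carry-through behaviour can be checked with a small NumPy sketch. It assumes the common convention that the new hidden state is the update gate times the previous hidden state plus one minus the update gate times the candidate; the function name `update_hidden_state` and the random stand-in candidates are hypothetical.

```python
import numpy as np

def update_hidden_state(Z_t, H_prev, H_tilde):
    """Element-wise convex combination of the old state and the candidate state."""
    return Z_t * H_prev + (1.0 - Z_t) * H_tilde

n, h, steps = 2, 3, 10                 # illustrative sizes
rng = np.random.default_rng(2)
H_0 = rng.normal(size=(n, h))

# Update gate held at 1: candidates are ignored and the old state is carried forward.
H_carried = H_0.copy()
for _ in range(steps):
    H_tilde = np.tanh(rng.normal(size=(n, h)))   # stand-in candidate state
    H_carried = update_hidden_state(np.ones((n, h)), H_carried, H_tilde)

# Update gate held at 0: the state is overwritten by the latest candidate.
last_candidate = np.tanh(rng.normal(size=(n, h)))
H_replaced = update_hidden_state(np.zeros((n, h)), H_0, last_candidate)
```

After ten steps with the gate at 1, `H_carried` still equals `H_0`, which is the sense in which an early hidden state can be passed unchanged across many time steps.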

We summarize the design of the gated recurrent unit:

The reset gate helps capture short-term dependencies in time series.

The update gate helps capture long-term dependencies in time series.
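Putting the pieces together, one forward step of a GRU can be sketched in NumPy as below. This is an illustrative sketch rather than a reference implementation: the helper names `init_gru_params` and `gru_step`, the parameter layout, the initialization scale, and the sizes are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru_params(d, h, rng, scale=0.1):
    """Create (input weights, hidden weights, bias) for the two gates and the candidate."""
    def triple():
        return (rng.normal(scale=scale, size=(d, h)),
                rng.normal(scale=scale, size=(h, h)),
                np.zeros((1, h)))
    return {"r": triple(), "z": triple(), "h": triple()}

def gru_step(X_t, H_prev, params):
    """One GRU time step: gates, candidate hidden state, then the new hidden state."""
    W_xr, W_hr, b_r = params["r"]
    W_xz, W_hz, b_z = params["z"]
    W_xh, W_hh, b_h = params["h"]
    R_t = sigmoid(X_t @ W_xr + H_prev @ W_hr + b_r)               # reset gate
    Z_t = sigmoid(X_t @ W_xz + H_prev @ W_hz + b_z)               # update gate
    H_tilde = np.tanh(X_t @ W_xh + (R_t * H_prev) @ W_hh + b_h)   # candidate state
    return Z_t * H_prev + (1.0 - Z_t) * H_tilde                   # new hidden state

n, d, h, T = 2, 3, 4, 6                # batch, inputs, hidden units, sequence length
rng = np.random.default_rng(0)
params = init_gru_params(d, h, rng)
inputs = rng.normal(size=(T, n, d))    # a random sequence of T mini-batches

H = np.zeros((n, h))                   # initial hidden state
for X_t in inputs:
    H = gru_step(X_t, H, params)
```

Because the initial state is zero and each step is a convex combination of the previous state and a tanh output, every element of `H` stays within [-1, 1] throughout the sequence.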

That concludes this introduction to the gated recurrent unit (GRU). We hope the content above is helpful and that you can learn something from it. If you find the article useful, feel free to share it so that more people can see it.
