

What is the mathematical principle behind RNN?


What is the mathematical principle behind RNN? This article addresses that question with a detailed analysis, in the hope of helping readers who want to understand it find a simple, practical way in.

Introduction: there is more and more discussion of machine learning, deep learning, and artificial neural networks these days. But programmers often just want to use these magical frameworks, and most don't care how they work behind the scenes. If we grasp the principles behind them, however, we can use them better. Today we will talk about recurrent neural networks (RNN) and the basic mathematics behind them, which let recurrent neural networks do things other neural networks cannot.

The purpose of this article is to provide an intuitive understanding of the function and structure of recurrent neural networks.

A neural network usually takes an independent variable X (or a set of independent variables) and a dependent variable Y, and then learns the mapping between them (we call this training). Once training is complete, given a new independent variable, it can predict the corresponding dependent variable.

But what if the order of the data matters? Imagine that the order of all the independent variables is important.

Let me explain it intuitively.

Suppose each ant is an independent variable. If one ant moves in a different direction, it doesn't matter to the other ants, right? But what if the order of the ants is important?

In that case, if an ant breaks away from or leaves the column, it affects the ants behind it.

So, in machine learning, for which kinds of data does order matter?

- Word order in natural language data
- Speech data
- Time series data
- Video / music sequence data
- Stock market data
and so on.

So how does an RNN handle data where the overall order matters? We will use natural language text as an example to explain RNN.

Suppose I am doing sentiment analysis of user reviews of a movie.

"This movie is good" is positive; "This movie is bad" is negative.

We could classify these with a simple bag-of-words (BOW) model and predict the sentiment (positive or negative). But wait.

What if the review is "This movie is not good"?

The BOW model might say this is a positive signal, but it is not. An RNN understands the word order and predicts that it is negative.
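To make this concrete, here is a minimal sketch (my own illustration, not from the original article) showing why a bag-of-words representation cannot tell the two reviews apart: it keeps only word counts and throws the order away.

```python
from collections import Counter

# Bag-of-words keeps only word counts, so "not good" still looks
# like a review containing the positive word "good".
good = "this movie is good".split()
negated = "this movie is not good".split()

print(Counter(good))     # Counter({'this': 1, 'movie': 1, 'is': 1, 'good': 1})
print(Counter(negated))  # same words plus 'not': 1 -- the order is gone
```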

How does an RNN do it?

1 Types of RNN models

1. One to many

The RNN takes a single input, such as an image, and generates a sequence of words.

2. Many to one

The RNN takes a sequence of words as input and generates a single output.

3. Many to many

The RNN takes a sequence as input and generates a sequence as output (for example, machine translation).

In what follows we focus on the second type, many-to-one, where each element of the input is treated as a time step.

Example: input (X) = ["this", "movie", "is", "good"]

The time step of "this" is x(0), "movie" is x(1), "is" is x(2), and "good" is x(3).
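As an illustration (the toy vocabulary and helper below are my own assumptions, not the article's), this is how such a review could be turned into a sequence of one-hot time-step vectors:

```python
import numpy as np

# Hypothetical toy vocabulary for the running example.
vocab = ["this", "movie", "is", "good", "bad", "not"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Encode a word as a one-hot vector over the toy vocabulary."""
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

# The review becomes time steps x(0), x(1), x(2), x(3).
X = [one_hot(w) for w in ["this", "movie", "is", "good"]]
```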

2 Network architecture and mathematical formulas

Let's delve into the mathematical world of RNN.

First, let's understand what an RNN cell contains. I assume you already know the basics of feedforward neural networks (FFNN).

[Figure: a feedforward neural network with a single hidden layer, with a single neuron in that layer.]

In a feedforward neural network we have X (input), H (hidden), and Y (output). We can have as many hidden layers as we like, but the weights W of each hidden layer and the weights attached to each neuron's input are different.

Above, the weights Wy10 and Wy11 are the weights of two different layers with respect to the output Y, while Wh00 and Wh01 are the different weights of different neurons with respect to the input.

Because of time steps, an RNN unit contains a set of feedforward neural networks. The network has sequential input, sequential output, multiple time steps, and multiple hidden layers.

Unlike in a FFNN, here we compute the hidden layer's value not only from the input but also from the value at the previous time step, and the hidden layer's weight matrix W is the same for every time step. Below is the complete picture of the RNN and the mathematics involved.

In the picture, we compute the hidden layer's value at time step t:

h_t = g(U·x_t + W·h_{t-1})

Here x_t is the input at time step t, h_{t-1} is the hidden value from the previous time step, U is the input weight matrix, and W is the hidden-layer weight matrix, which, as I said, is the same for all time steps. The activation function g can be tanh, ReLU, sigmoid, and so on.
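A minimal numpy sketch of this single-step computation (the function name, the tanh choice, and the shapes are my assumptions, not the article's):

```python
import numpy as np

def rnn_cell(x_t, h_prev, U, W):
    """One time step: h_t = tanh(U @ x_t + W @ h_prev).
    U (input weights) and W (hidden weights) are shared by all steps."""
    return np.tanh(U @ x_t + W @ h_prev)
```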

Above we only computed h_t; in the same way we can compute the hidden value at every other time step.

Steps (a minimal code sketch of this forward pass follows the list):

1. Compute h_1 from x_1 and h_0.

2. Compute h_2 from x_2 and h_1.

3. Compute h_3 from x_3 and h_2.

4. Compute h_4 from x_4 and h_3, and so on; finally, compute the output ŷ from the last hidden state via the output weights V.
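Here is a minimal sketch of that forward pass for the many-to-one case (my own illustration; the tanh activation and softmax output are assumptions consistent with the formula above):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(X, U, V, W, h0):
    """Run the RNN over a sequence of inputs and return every hidden
    state plus the final prediction (many-to-one)."""
    h = h0
    hs = []
    for x_t in X:                      # steps 1, 2, 3, ... of the list above
        h = np.tanh(U @ x_t + W @ h)   # h_t from x_t and h_{t-1}
        hs.append(h)
    y_hat = softmax(V @ h)             # output from the last hidden state
    return hs, y_hat
```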

It is important to note that:

1. U and V are also weight vectors; the inputs and hidden states differ at each time step.

2. We can even compute the hidden states for all time steps first and then compute the output values.

3. The weight vectors are random at first (randomly initialized).

Once the feedforward pass is complete, we need to compute the error and backpropagate it. The cost function used is cross entropy.
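For reference, a one-line sketch of the cross-entropy cost (the eps guard is my addition to avoid log(0)):

```python
import numpy as np

def cross_entropy(y_hat, y_true):
    """Cross-entropy cost for a one-hot target vector."""
    eps = 1e-12   # guards against log(0)
    return -np.sum(y_true * np.log(y_hat + eps))

# e.g. prediction [0.9, 0.1] against the true class [1, 0]:
# cost = -log(0.9) ≈ 0.105
```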

3 BPTT (backpropagation through time)

If you know how an ordinary neural network works, the rest is simple. If not, you can refer to this account's earlier article on artificial neural networks.

We need to calculate the following:

1. How the total error changes with respect to the output (both hidden and output units).

2. How the output changes with respect to the weights (U, V, W).

Because W is the same for every time step, we need to propagate all the way back through time to update it.

[Figure: BPTT in an RNN.]

Remember that backpropagation in an RNN is the same as in an ordinary artificial neural network, except that here the current time step depends on the previous one, so we have to traverse backwards from the last time step to the first.

If we apply the chain rule to the error at a given time step (say t = 3), it looks like this:

∂E_3/∂W = (∂E_3/∂ŷ_3)·(∂ŷ_3/∂h_3)·(∂h_3/∂W) + (∂E_3/∂ŷ_3)·(∂ŷ_3/∂h_3)·(∂h_3/∂h_2)·(∂h_2/∂W) + (∂E_3/∂ŷ_3)·(∂ŷ_3/∂h_3)·(∂h_3/∂h_2)·(∂h_2/∂h_1)·(∂h_1/∂W) + …

W is the same in all time steps, so the further back we go, the more terms the chain rule expands into.

In Richard Socher's recurrent neural network lecture slides [1], we can see a similar but slightly different way of writing the formula.

Similar but more concise RNN formulas. The total error is the sum of the error at each time step t:

∂E/∂W = Σ_t ∂E_t/∂W

Applying the chain rule:

∂E_t/∂W = Σ_{k=1..t} (∂E_t/∂y_t)·(∂y_t/∂h_t)·(∂h_t/∂h_k)·(∂h_k/∂W)

So this is essentially the same as our expansion above.
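A hedged numpy sketch of BPTT for the many-to-one RNN above (my own implementation of the chain rule just described, assuming the tanh/softmax/cross-entropy choices from the earlier sketches):

```python
import numpy as np

def bptt(X, y_true, hs, y_hat, U, V, W, h0):
    """Accumulate gradients for U, V, W over every time step; the same
    W appears at each step, hence one chain-rule term per step."""
    dU, dW = np.zeros_like(U), np.zeros_like(W)
    dz = y_hat - y_true               # gradient of softmax + cross-entropy
    dV = np.outer(dz, hs[-1])         # output weights see the last state
    dh = V.T @ dz                     # error flowing into the last h_t
    for t in reversed(range(len(X))):
        da = dh * (1.0 - hs[t] ** 2)  # back through tanh
        h_prev = hs[t - 1] if t > 0 else h0
        dU += np.outer(da, X[t])
        dW += np.outer(da, h_prev)    # the per-step chain-rule term
        dh = W.T @ da                 # pass the error to h_{t-1}
    return dU, dV, dW
```

Used together with the earlier forward sketch: `hs, y_hat = rnn_forward(X, U, V, W, h0)` followed by `dU, dV, dW = bptt(X, y_true, hs, y_hat, U, V, W, h0)`.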

The weights can then be updated with any optimization algorithm, such as gradient descent.

4 Back to the example

Now let's return to our sentiment analysis task. Here is an RNN for it.

We provide a word vector or a one-hot encoded vector for each word as input, and run the feedforward pass and BPTT. Once training is complete, we can give the model new text for prediction. It will learn patterns such as not + positive word = negative.

The problem with RNN → the vanishing / exploding gradient problem

Because W is the same for all time steps, during backpropagation, as we go back through time to adjust the weights, the gradient signal becomes either too weak or too strong, causing either the vanishing or the exploding gradient problem.
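A tiny numeric illustration of this (my own, not from the article): backpropagating through many steps multiplies the error signal by W each time, so its norm collapses or blows up depending on the scale of W.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=10)               # some initial error signal

for scale, label in [(0.5, "vanishing"), (1.5, "exploding")]:
    W = scale * np.eye(10)            # hidden weights, shared by all steps
    signal = g.copy()
    for _ in range(50):               # backpropagate through 50 time steps
        signal = W.T @ signal
    print(label, np.linalg.norm(signal))  # ~1e-15 vs ~1e9
```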

That is the answer to the question of the mathematical principle behind RNN. I hope the above content has been helpful; if you still have questions, you can follow this channel to learn more.
