How to use tensorflow to build a long short-term memory (LSTM) network in python


2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to use tensorflow to build a long short-term memory (LSTM) network in python. The content is simple and clear, and easy to learn and understand. Follow along step by step to study how the LSTM works and how to build one.

Introduction to LSTM

1. The vanishing gradient problem of RNN

Previously we studied the recurrent neural network (RNN); its structure diagram is as follows:

The biggest problem is that when the values of W1, W2 and W3 are less than 1 in magnitude and the sentence is long enough, the signal shrinks at every step, so the gradient vanishes during back propagation (and the contribution of early inputs fades during forward propagation).

For example, 0.9^25 ≈ 0.07. If a sentence has 20 to 30 words, the hidden layer output of the first word is scaled down to roughly 0.07 times its original value, so it has far less influence than the last word.
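The shrinkage can be checked numerically. The sketch below assumes, purely for illustration, a single scalar weight applied once per time step (a stand-in for the weight matrices); the function name `repeated_scale` is hypothetical.

```python
def repeated_scale(weight, steps):
    """Factor by which a signal is scaled after `steps` successive
    multiplications by the same scalar `weight`."""
    return weight ** steps


# With a weight of 0.9 the signal decays quickly as the sequence gets longer.
for steps in (1, 5, 10, 25, 30):
    print(steps, round(repeated_scale(0.9, steps), 4))
```

After 25 steps only about 7% of the original value is left, which is why the first words of a long sentence contribute so little.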

The specific situation is as follows:

The long short-term memory (LSTM) network is designed to solve the vanishing gradient problem.

2. The structure of LSTM

The hidden layer of the original RNN has only one state h, passed from beginning to end, which is very sensitive to short-term input.

If we add another state c to preserve the long-term state, the problem can be solved.

A comparison of a single step unit of RNN and LSTM is as follows.

We unfold the structure of LSTM along the time dimension:

We can see that at time step n, the LSTM has three inputs:

1. The input value of the network at the current time step

2. The output value of the LSTM at the previous time step

3. The cell state at the previous time step.

The LSTM has two outputs:

1. The output value of the LSTM at the current time step

2. The cell state at the current time step.

3. LSTM's unique gate structure

LSTM uses two gates to control the contents of the cell state cn:

1. The forget gate, which determines how much of the previous cell state cn-1 is retained at the current time step.

2. The input gate, which determines how much of the network's input at the current time step is saved into the cell state cn.

LSTM uses one gate to control the contents of the current output value hn:

The output gate, which determines how much of the current cell state cn is passed to the output hn.
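The three gates can be put together in a single time step. The following is a minimal sketch of the standard LSTM update for one scalar cell in pure Python. It is not tensorflow's implementation; the function name `lstm_step` and the layout of the weight dictionary are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_n, h_prev, c_prev, w):
    """One scalar LSTM step.

    x_n: current input, h_prev: previous output, c_prev: previous cell state.
    w maps each gate name to a (w_x, w_h, bias) triple (hypothetical layout).
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x_n + w_h * h_prev + b

    f = sigmoid(pre("forget"))             # how much of c_prev to keep
    i = sigmoid(pre("input"))              # how much of the candidate to write
    o = sigmoid(pre("output"))             # how much of the cell state to expose
    c_tilde = math.tanh(pre("candidate"))  # candidate state from the current input
    c_n = f * c_prev + i * c_tilde         # new long-term (cell) state
    h_n = o * math.tanh(c_n)               # new output
    return h_n, c_n


# Arbitrary illustrative weights, the same for every gate.
weights = {name: (0.5, 0.5, 0.0)
           for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for x in (1.0, 0.5, -0.5):
    h, c = lstm_step(x, h, c, weights)
    print(round(h, 4), round(c, 4))
```

Because the cell state is updated additively (f * c_prev + i * c_tilde), a forget gate close to 1 lets it carry information across many steps without the repeated shrinking seen in the plain RNN.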

Related functions of LSTM in tensorflow

tf.contrib.rnn.BasicLSTMCell

tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None, name=None, dtype=None)

num_units: the number of neurons in the RNN cell, that is, the number of output neurons.

forget_bias: the bias added to the forget gate. When restoring from a CudnnLSTM-trained checkpoint, it must be manually set to 0.0.

state_is_tuple: if True, the accepted and returned states are 2-tuples of c_state and m_state; if False, they are concatenated along the column axis. False is about to be deprecated.

activation: the activation function of the inner states (tanh by default).

reuse: describes whether variables are reused in an existing scope. If it is not True and the existing scope already has the given variables, an error is thrown.

name: the name of the layer.

dtype: the data type of this layer.
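To see what forget_bias=1.0 does in practice, the sketch below assumes that the bias is simply added to the forget gate's pre-activation before the sigmoid, as the parameter description suggests. The function name `forget_gate` is hypothetical.

```python
import math


def forget_gate(pre_activation, forget_bias=1.0):
    """Sigmoid of the forget-gate pre-activation shifted by forget_bias."""
    return 1.0 / (1.0 + math.exp(-(pre_activation + forget_bias)))


# At the start of training the pre-activation is typically close to 0.
print(round(forget_gate(0.0, forget_bias=0.0), 3))  # 0.5
print(round(forget_gate(0.0, forget_bias=1.0), 3))  # 0.731
```

With the bias of 1.0 the gate starts out keeping about 73% of the previous cell state instead of 50%, which reduces forgetting early in training.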

When in use, it can be defined as:

lstm_cell = tf.contrib.rnn.BasicLSTMCell(self.cell_size, forget_bias=1.0, state_is_tuple=True)

After the definition is complete, you can initialize the state:

self.cell_init_state = lstm_cell.zero_state(self.batch_size, dtype=tf.float32)

tf.nn.dynamic_rnn

tf.nn.dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)

cell: the lstm_cell defined above.

inputs: the RNN input. If time_major == False (default), it must be a tensor of shape [batch_size, max_time, ...] or a nested tuple of such elements. If time_major == True, it must be a tensor of shape [max_time, batch_size, ...] or a nested tuple of such elements.

sequence_length: an int32/int64 vector of size [batch_size]. It is used to copy the state through and zero out the outputs once a batch element's sequence length has been passed, so it is more about correctness than performance.

initial_state: the cell_init_state defined above.

dtype: the data type.

parallel_iterations: the number of iterations to run in parallel. Operations that have no temporal dependency and can run in parallel will do so. This parameter trades space for time: values >> 1 use more memory but take less time, while smaller values use less memory but take longer to compute.

time_major: the shape format of the input and output tensors. If True, these tensors must have shape [max_time, batch_size, depth]. If False, they must have shape [batch_size, max_time, depth]. Using time_major=True is somewhat more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so this parameter defaults to False.

scope: the variable scope of the created subgraph; the default is "rnn".
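The effect of sequence_length can be illustrated without tensorflow. The sketch below is a toy model of the behavior described above, not the library's code: a running sum stands in for the recurrent cell, and the function name `run_with_lengths` is hypothetical.

```python
def run_with_lengths(batch, lengths):
    """Toy recurrent loop over padded sequences.

    batch: list of sequences (lists of floats), all padded to max_time.
    lengths: number of valid steps in each sequence.
    """
    outputs, final_states = [], []
    for seq, length in zip(batch, lengths):
        state = 0.0
        seq_out = []
        for t, x in enumerate(seq):
            if t < length:
                state = state + x    # normal cell update (running sum)
                seq_out.append(state)
            else:
                seq_out.append(0.0)  # zero the output past the sequence end;
                                     # the state is copied through unchanged
        outputs.append(seq_out)
        final_states.append(state)
    return outputs, final_states


outputs, states = run_with_lengths([[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]], [3, 2])
print(outputs)  # [[1.0, 3.0, 6.0], [4.0, 9.0, 0.0]]
print(states)   # [6.0, 9.0]
```

The second sequence has only two valid steps, so its final state is the state after step two rather than a state polluted by padding.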

Once the LSTM cell has been defined, call this function to run it and obtain the result:

self.cell_outputs, self.cell_final_state = tf.nn.dynamic_rnn(lstm_cell, self.l_in_y, initial_state=self.cell_init_state, time_major=False)

A tuple (outputs, state) is returned:

outputs: the output of the last LSTM layer, a tensor. If time_major == False, its shape is [batch_size, max_time, cell.output_size]. If time_major == True, its shape is [max_time, batch_size, cell.output_size].

state: the final state, that is, the state of the cell after the last step of the sequence. In general, its shape is [batch_size, cell.output_size], but when the cell is a BasicLSTMCell it is a 2-tuple of the cell state and the hidden state, each of shape [batch_size, cell.output_size], so the overall shape is [2, batch_size, cell.output_size].
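The two layouts of outputs differ only in the order of the first two axes. The sketch below uses nested Python lists in place of tensors to show the relationship; the helper names `to_time_major` and `shape` are hypothetical.

```python
def to_time_major(batch_major):
    """[batch_size][max_time][depth] -> [max_time][batch_size][depth]."""
    return [list(step) for step in zip(*batch_major)]


def shape(nested):
    """Shape of a regular nested list, e.g. [2, 3, 4]."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# 2 sequences, 3 time steps, depth 4; each value encodes (batch, time).
batch_major = [[[b * 10 + t] * 4 for t in range(3)] for b in range(2)]
time_major = to_time_major(batch_major)

print(shape(batch_major))  # [2, 3, 4]
print(shape(time_major))   # [3, 2, 4]

# Output of the last time step for every sequence: shape [batch_size, depth].
last_step = [seq[-1] for seq in batch_major]
print(shape(last_step))    # [2, 4]
```

For full-length sequences, the last time step of outputs has the same [batch_size, depth] shape as the hidden-state part of the final state.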

The whole LSTM definition process is as follows:

def add_input_layer(self):
    # the initial shape of X is (256 batch, 28 steps, 28 inputs)
    # converted to (256 batch * 28 steps, 28 inputs)
    l_in_x = tf.reshape(self.xs, [-1, self.input_size], name='to_2D')
    # get Ws and bs
    Ws_in = self._weight_variable([self.input_size, self.cell_size])
    bs_in = self._bias_variable([self.cell_size])
    # convert to (256 batch * 28 steps, 256 hidden)
    with tf.name_scope('Wx_plus_b'):
        l_in_y = tf.matmul(l_in_x, Ws_in) + bs_in
    # (batch * n_steps, cell_size) => (batch, n_steps, cell_size)
    # (256 * 28, 256) -> (256, 28, 256)
    self.l_in_y = tf.reshape(l_in_y, [-1, self.n_steps, self.cell_size], name='to_3D')

def add_cell(self):
    # cell_size is the number of neurons
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(self.cell_size, forget_bias=1.0, state_is_tuple=True)
    # batch_size is the size of each incoming batch
    with tf.name_scope('initial_state'):
        self.cell_init_state = lstm_cell.zero_state(self.batch_size, dtype=tf.float32)
    # time_major=False: the input is batch-major
    self.cell_outputs, self.cell_final_state = tf.nn.dynamic_rnn(
        lstm_cell, self.l_in_y, initial_state=self.cell_init_state, time_major=False)

def add_output_layer(self):
    # set Ws and bs
    Ws_out = self._weight_variable([self.cell_size, self.output_size])
    bs_out = self._bias_variable([self.output_size])
    # shape = (batch, output_size), that is, (256, 10)
    with tf.name_scope('Wx_plus_b'):
        # cell_final_state[-1] is the hidden state h of the final (c, h) tuple
        self.pred = tf.matmul(self.cell_final_state[-1], Ws_out) + bs_out

All code

This example performs handwritten digit recognition on MNIST. Each of the 28 rows of an image is fed in as one time step, and the input dimension of each step is the 28 columns of that row.
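How a flat 784-pixel image becomes a sequence of 28 steps can be sketched with plain Python lists. This is only an illustration of the row-as-time-step idea; the function name `image_to_steps` is hypothetical, and the real code does the same thing with a reshape.

```python
def image_to_steps(flat_pixels, time_steps=28, input_size=28):
    """Split a flat list of pixels into `time_steps` rows of `input_size` values."""
    if len(flat_pixels) != time_steps * input_size:
        raise ValueError("expected %d pixels" % (time_steps * input_size))
    return [flat_pixels[r * input_size:(r + 1) * input_size]
            for r in range(time_steps)]


# A dummy "image" whose pixel values are just their flat indices.
steps = image_to_steps(list(range(784)))
print(len(steps), len(steps[0]))  # 28 28
print(steps[1][:3])               # [28, 29, 30]
```

Step 0 is the top row of the image and step 27 is the bottom row, so the LSTM reads the digit from top to bottom.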

# Requires TensorFlow 1.x (tf.contrib was removed in TensorFlow 2.x).
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

BATCH_SIZE = 256    # number of samples per batch
TIME_STEPS = 28     # an image has 28 rows, fed in as 28 time steps
INPUT_SIZE = 28     # an image has 28 columns
OUTPUT_SIZE = 10    # 10 outputs in total
CELL_SIZE = 256     # number of neurons in the RNN hidden layer
LR = 1e-3           # learning rate


def get_batch():
    # get a training batch
    batch_xs, batch_ys = mnist.train.next_batch(BATCH_SIZE)
    batch_xs = batch_xs.reshape([BATCH_SIZE, TIME_STEPS, INPUT_SIZE])
    return [batch_xs, batch_ys]


class LSTMRNN(object):
    # build the LSTM class
    def __init__(self, n_steps, input_size, output_size, cell_size, batch_size):
        self.n_steps = n_steps
        self.input_size = input_size
        self.output_size = output_size
        self.cell_size = cell_size
        self.batch_size = batch_size

        # input / output
        with tf.name_scope('inputs'):
            self.xs = tf.placeholder(tf.float32, [None, n_steps, input_size], name='xs')
            self.ys = tf.placeholder(tf.float32, [None, output_size], name='ys')
        # add the input layer
        with tf.variable_scope('in_hidden'):
            self.add_input_layer()
        # add the LSTM cell
        with tf.variable_scope('LSTM_cell'):
            self.add_cell()
        # add the output layer
        with tf.variable_scope('out_hidden'):
            self.add_output_layer()
        # calculate the loss
        with tf.name_scope('cost'):
            self.compute_cost()
        # training
        with tf.name_scope('train'):
            self.train_op = tf.train.AdamOptimizer(LR).minimize(self.cost)

        # accuracy
        self.correct_pre = tf.equal(tf.argmax(self.ys, 1), tf.argmax(self.pred, 1))
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_pre, tf.float32))

    def add_input_layer(self):
        # the initial shape of X is (256 batch, 28 steps, 28 inputs)
        # converted to (256 batch * 28 steps, 28 inputs)
        l_in_x = tf.reshape(self.xs, [-1, self.input_size], name='to_2D')
        # get Ws and bs
        Ws_in = self._weight_variable([self.input_size, self.cell_size])
        bs_in = self._bias_variable([self.cell_size])
        # convert to (256 batch * 28 steps, 256 hidden)
        with tf.name_scope('Wx_plus_b'):
            l_in_y = tf.matmul(l_in_x, Ws_in) + bs_in
        # (batch * n_steps, cell_size) => (batch, n_steps, cell_size)
        # (256 * 28, 256) -> (256, 28, 256)
        self.l_in_y = tf.reshape(l_in_y, [-1, self.n_steps, self.cell_size], name='to_3D')

    def add_cell(self):
        # cell_size is the number of neurons
        lstm_cell = tf.contrib.rnn.BasicLSTMCell(self.cell_size, forget_bias=1.0, state_is_tuple=True)
        # batch_size is the size of each incoming batch
        with tf.name_scope('initial_state'):
            self.cell_init_state = lstm_cell.zero_state(self.batch_size, dtype=tf.float32)
        # time_major=False: the input is batch-major
        self.cell_outputs, self.cell_final_state = tf.nn.dynamic_rnn(
            lstm_cell, self.l_in_y, initial_state=self.cell_init_state, time_major=False)

    def add_output_layer(self):
        # set Ws and bs
        Ws_out = self._weight_variable([self.cell_size, self.output_size])
        bs_out = self._bias_variable([self.output_size])
        # shape = (batch, output_size), that is, (256, 10)
        with tf.name_scope('Wx_plus_b'):
            # cell_final_state[-1] is the hidden state h of the final (c, h) tuple
            self.pred = tf.matmul(self.cell_final_state[-1], Ws_out) + bs_out

    def compute_cost(self):
        self.cost = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits=self.pred, labels=self.ys))

    def _weight_variable(self, shape, name='weights'):
        initializer = np.random.normal(0.0, 1.0, size=shape)
        return tf.Variable(initializer, name=name, dtype=tf.float32)

    def _bias_variable(self, shape, name='biases'):
        initializer = np.ones(shape=shape) * 0.1
        return tf.Variable(initializer, name=name, dtype=tf.float32)


if __name__ == '__main__':
    # build the LSTMRNN model
    model = LSTMRNN(TIME_STEPS, INPUT_SIZE, OUTPUT_SIZE, CELL_SIZE, BATCH_SIZE)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    # train for 10000 iterations
    for i in range(10000):
        xs, ys = get_batch()    # extract batch data
        if i == 0:
            # first iteration: use the zero initial state
            feed_dict = {
                model.xs: xs,
                model.ys: ys,
            }
        else:
            feed_dict = {
                model.xs: xs,
                model.ys: ys,
                model.cell_init_state: state,   # keep state continuity
            }

        # training
        _, cost, state, pred = sess.run(
            [model.train_op, model.cost, model.cell_final_state, model.pred],
            feed_dict=feed_dict)

        # print the accuracy
        if i % 20 == 0:
            print(sess.run(model.accuracy, feed_dict={
                model.xs: xs,
                model.ys: ys,
                model.cell_init_state: state,   # keep state continuity
            }))

Thank you for reading. The above is the content of "how to use tensorflow to build a long short-term memory (LSTM) network in python". After studying this article, you should have a deeper understanding of how to build an LSTM with tensorflow in python; the specific usage still needs to be verified in practice.
