This article shows how to use an LSTM neural network architecture, built with Keras and TensorFlow, to make time series predictions, in particular on a stock market dataset, in order to provide momentum indicators of the stock price.
The code for this framework can be found in the following GitHub repo (it assumes Python 3.5.x and the package versions listed in the requirements.txt file; deviations from these versions may cause errors): github.com/jaungiers/LSTM-Neural-Network-for-Time-Series-Prediction
What is an LSTM neuron?
One of the fundamental problems that has long plagued traditional neural network architectures is how to interpret input sequences whose meaning depends on prior information and context. This information can be the previous words in a sentence, which help predict from context what the next word might be, or it can be the temporal information of a sequence.
In short, traditional neural networks take an independent data vector at each step and have no concept of memory to help them with tasks that require it.
An early attempt to solve this problem was to add a simple feedback loop to the neurons in the network, where the output is fed back into the input to provide context from the last input seen. These are called recurrent neural networks (RNNs). While these RNNs work to some extent, they have a considerable drawback: any significant use of them runs into the vanishing gradient problem. We will not expand on that here, other than to say that RNNs are unsuitable for most real-world problems because of it, so another solution is needed.
This is where long short-term memory (LSTM) neural networks come into play. Like RNN neurons, LSTM neurons keep a memory within their pipeline, allowing sequential and temporal problems to be solved without the vanishing gradient problem affecting their performance.
Many research papers and articles can be found online that discuss in mathematical detail how LSTM cells work. In this article, however, we will not discuss the complex inner workings of LSTMs, as we are more concerned with their use.
For context, below is a diagram of the typical inner workings of an LSTM neuron. It consists of several layers and pointwise operations that act as gates for the data input, output, and cell state, which provide the LSTM unit with its information. It is this cell state that maintains the network's long-term memory and context across inputs.
A simple sine wave example
To demonstrate the use of LSTM neural networks in predicting time series, let us start with the most basic thing we can think of that is a time series: the trusty sine wave. Let us create the data we need to train on many oscillations of this function so that the LSTM network can learn it.
The data folder of the code contains the sinewave.csv file we created, which holds 5001 time periods of a sine wave with amplitude and frequency of 1 (an angular frequency of 6.28) and a time delta of 0.01. The result when plotted looks like this:
The sine wave dataset
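For readers who want to regenerate this file themselves, here is a minimal sketch using the parameters described above (the column name and output path are assumptions, not necessarily those of the original repo):

import numpy as np
import pandas as pd

# 5001 samples of a sine wave with amplitude 1, frequency 1 (angular frequency ~6.28)
# and a time delta of 0.01 between points
t = np.arange(0, 5001) * 0.01
sine = np.sin(2 * np.pi * t)

# one value per row, as the data loading layer described below expects
pd.DataFrame({"sinewave": sine}).to_csv("data/sinewave.csv", index=False)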
Now that we have the data, what do we actually want to achieve? Simply put, we want the LSTM to learn the sine wave from a set window of data that we provide, so that we can then ask it to predict the next N steps in the series and have it keep outputting the sine wave.
We will start by transforming and loading the data from the CSV file into a pandas dataframe, which will then be used to produce the numpy arrays that feed the LSTM. A Keras LSTM layer takes a numpy array of 3 dimensions (N, W, F), where N is the number of training sequences, W is the sequence length and F is the number of features of each sequence. We chose a sequence length (read: window size) of 50 so that the network can see the shape of the sine wave in each sequence and, hopefully, build a pattern of the sequence based on the prior window it received.
The sequences themselves are sliding windows, so each one shifts by 1 and therefore constantly overlaps with the prior window. A typical training window with a sequence length of 50, when plotted, looks like this:
Sinewave dataset training window
To load this data, we created a DataLoader class in our code to provide an abstraction for the data loading layer. You will notice that when a DataLoader object is initialized, it is passed the filename, a split variable that determines the percentage of data used for training versus testing, and a columns variable that allows one or more columns of data to be selected for single- or multi-dimensional analysis.
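The actual DataLoader lives in the repository; the sketch below only illustrates the idea of cutting the loaded column(s) into overlapping (N, W, F) windows (class layout and method names here are assumptions, not the repo's exact code):

import numpy as np
import pandas as pd

class DataLoader:
    # Simplified sketch: load a CSV and cut the chosen columns into sliding windows.
    def __init__(self, filename, split, cols):
        df = pd.read_csv(filename)
        i_split = int(len(df) * split)
        self.data_train = df[cols].values[:i_split]
        self.data_test = df[cols].values[i_split:]

    def get_train_windows(self, seq_len):
        # each window holds seq_len input points plus one extra point used as the target
        windows = [self.data_train[i:i + seq_len + 1]
                   for i in range(len(self.data_train) - seq_len)]
        windows = np.array(windows).astype(float)
        x = windows[:, :-1]        # shape (N, seq_len, n_features)
        y = windows[:, -1, [0]]    # the next value of the first column
        return x, y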
Once we have a data object that allows us to load the data, we need to build the deep neural network model. Again, for abstraction, our code framework uses a Model class together with a config.json file to easily build an instance of the model from the architecture and hyperparameters stored in the configuration file. The main function that builds our network is the build_model() function, which takes the parsed configuration file.
This function's code is shown below and can easily be extended for future use on more complex architectures.
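The repository implements this inside its Model class; the following standalone sketch captures the idea of assembling layers from the parsed configuration (the config keys used here mirror the illustrative configuration shown later in this article, not necessarily the repo's exact schema):

from keras.layers import LSTM, Dense, Dropout
from keras.models import Sequential

def build_model(configs):
    # build a Sequential Keras model layer by layer from the parsed configuration
    model = Sequential()
    for layer in configs['model']['layers']:
        if layer['type'] == 'lstm':
            kwargs = {'return_sequences': layer.get('return_seq', False)}
            if 'input_timesteps' in layer:   # only the first layer declares its input shape
                kwargs['input_shape'] = (layer['input_timesteps'], layer['input_dim'])
            model.add(LSTM(layer['neurons'], **kwargs))
        elif layer['type'] == 'dropout':
            model.add(Dropout(layer['rate']))
        elif layer['type'] == 'dense':
            model.add(Dense(layer['neurons'], activation=layer.get('activation', 'linear')))
    model.compile(loss=configs['model']['loss'], optimizer=configs['model']['optimizer'])
    return model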
Having loaded the data and built the model, we can now proceed to train the model on our training data. For this we created a separate run module that brings together the data and model abstractions for training, output, and visualization.
Below is the generic run code used to train our model:
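The original run code is not reproduced here; the minimal sketch below ties together the DataLoader and build_model sketches above (the config keys and file paths are assumptions):

import json

# assumes the DataLoader and build_model sketches above are in scope
configs = json.load(open('config.json'))

data = DataLoader(configs['data']['filename'],
                  split=configs['data']['train_test_split'],
                  cols=configs['data']['columns'])
x_train, y_train = data.get_train_windows(seq_len=configs['data']['sequence_length'])

model = build_model(configs)
model.fit(x_train, y_train,
          epochs=configs['training']['epochs'],
          batch_size=configs['training']['batch_size'])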
For the output we will run two types of predictions. The first is a point-by-point prediction: we predict only a single point at a time, plot that point, then take the next window along from the full test data and predict the next single point once again.
The second prediction is a full-sequence prediction, where we initialize the window with the first part of the test data only once. The model then predicts the next point and we shift the window along, just as with the point-by-point method. The difference is that we then use the point we predicted previously when making the next prediction. In the second step only one data point (the last point) comes from a previous prediction; in the third step the last two data points come from previous predictions, and so on. After 50 predictions the model is predicting entirely on the basis of its own prior predictions. This allows us to use the model to forecast many time steps into the future, but since it is predicting on top of predictions, which are themselves based on predictions, the error rate grows the further ahead we predict.
Below we can see the code and the corresponding outputs for the point-by-point prediction and the full-sequence prediction.
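The repo exposes these as prediction methods on its Model class; the standalone sketch below illustrates the two loops (function names and details are illustrative assumptions):

import numpy as np

def predict_point_by_point(model, data):
    # predict each step from the true previous window of test data (data has shape (N, W, F))
    predicted = model.predict(data)
    return np.reshape(predicted, (predicted.size,))

def predict_sequence_full(model, data, prediction_len):
    # seed with the first test window, then feed each new prediction back into the window
    curr_frame = data[0].copy()              # shape (W, 1) for the single-feature case
    predicted = []
    for _ in range(prediction_len):
        pred = float(model.predict(curr_frame[np.newaxis, :, :])[0, 0])
        predicted.append(pred)
        curr_frame = np.vstack([curr_frame[1:], [[pred]]])   # drop oldest, append prediction
    return predicted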
Point-by-point sine wave prediction
Sine wave full-sequence prediction
For reference, the network architecture and hyperparameters used for the sine wave example can be seen in the configuration file below.
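The original article lists its config.json at this point; the block below is an illustrative reconstruction consistent with the settings described in this article (window size 50, a single training epoch, stacked LSTM layers with dropout), not the repository's exact file:

{
  "data": {
    "filename": "data/sinewave.csv",
    "columns": ["sinewave"],
    "sequence_length": 50,
    "train_test_split": 0.85,
    "normalize": false
  },
  "training": {
    "epochs": 1,
    "batch_size": 32
  },
  "model": {
    "loss": "mse",
    "optimizer": "adam",
    "layers": [
      {"type": "lstm", "neurons": 100, "input_timesteps": 50, "input_dim": 1, "return_seq": true},
      {"type": "dropout", "rate": 0.2},
      {"type": "lstm", "neurons": 100, "return_seq": false},
      {"type": "dropout", "rate": 0.2},
      {"type": "dense", "neurons": 1, "activation": "linear"}
    ]
  }
}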
Overlaying the predictions on the true data, we can see that with just 1 epoch and a fairly small training dataset, the LSTM deep neural network already predicts the sine function very well.
As you can see, as we predict more and more into the future, the error margin grows, since errors in earlier predictions are amplified when they are used for subsequent predictions. As such, we see that in the full-sequence example, the further out we predict, the less accurate the frequency and amplitude of our predictions are compared to the true data. However, since the sin function is a very simple, zero-noise oscillating function, the model can still predict it well without overfitting. This is important, because we could easily overfit the model by increasing the number of epochs and removing the dropout layers, making it almost perfectly accurate on training data that follows the same pattern as the test data; for other, real-world examples, overfitting the model to the training data would cause the test accuracy to plummet because the model would not generalize.
In the next step, we will try to use this model on such real data to see the effect.
The stock market is not so simple.
We predicted several hundred time steps of a sine wave accurately on a point-by-point basis. So we can now just do the same thing on a stock market time series and make an immediate profit, right? Unfortunately, in the real world it is not quite that simple.
Unlike a sine wave, a stock market time series is not a specific static function that can be mapped. The best property to describe the motion of a stock market time series is a random walk. As a stochastic process, a true random walk has no predictable pattern, so attempting to model it would be pointless. Fortunately, there is ongoing debate in many quarters over whether the stock market is a purely stochastic process, which allows us to theorize that the time series may well have some hidden patterns. It is these hidden patterns that make LSTM deep networks strong candidates for prediction.
The data this example uses is the sp500.csv file in the data folder. This file contains the open, high, low and close prices, as well as the daily trading volume, of the Standard & Poor's 500 stock index from January 2000 to September 2018.
In the first example, we will create a single-dimensional model using only the closing price. Adjusting the config.json file to reflect the new data, we will keep most of the parameters the same. One change that does need to be made, however, is that unlike the sine wave, which had values only in the range -1 to +1, the closing prices are the constantly changing absolute prices of the stock market. This means that if we tried to train the model on this data without normalizing it, it would never converge.
To solve this problem, we will take each training/test window of size n and normalize each one to reflect the percentage change from the start of that window (so the data at point i = 0 will always be 0). We will use the following equations to normalize, and then de-normalize at the end of the prediction process to obtain a real-world number from the prediction:
n = normalized list [window] of price changes
p = raw list [window] of adjusted daily return prices
Normalization: n_i = (p_i / p_0) - 1
De-normalization: p_i = p_0 * (n_i + 1)
We have added the normalize_windows() function to our DataLoader class to perform this transformation, and the configuration file contains a Boolean normalize flag that indicates whether these windows should be normalized.
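A minimal sketch of what such a function could look like, applying the equations above column by column to each window (an illustration of the idea, not the repo's exact code):

import numpy as np

def normalize_windows(window_data):
    # normalize each window so every column starts at 0: n_i = (p_i / p_0) - 1
    normalized = []
    for window in window_data:               # window shape: (seq_len, n_features)
        normalized.append(window / window[0] - 1)
    return np.array(normalized)

def denormalize(p0, normalized_window):
    # invert the transformation: p_i = p_0 * (n_i + 1)
    return p0 * (normalized_window + 1)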
With the windows normalized, we can now run the model just as we did with the sine wave data. We have, however, made an important change when running on this data: instead of using our framework's model.train() method, we use the model.train_generator() method we created. We do this because we found it easy to run out of memory when training on large datasets, as the model.train() function loads the full dataset into memory and then applies the normalization to every window in memory, easily causing a memory overflow. Instead we use Keras' fit_generator() function, which lets us train on the dataset with a Python generator that draws the data dynamically, so memory utilization is greatly reduced. The code below details the new run thread for running all three types of predictions (point-by-point, full sequence, and multi-sequence).
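The run-thread code itself is not reproduced here; the sketch below only illustrates the generator-based training idea with the (older) Keras fit_generator() API. The generator, the variable train_close, and the steps_per_epoch arithmetic are assumptions for illustration (newer Keras versions accept a generator directly in model.fit()):

import numpy as np

# train_close is assumed to be a 1-D numpy array of training closing prices from sp500.csv
def window_generator(series, seq_len, batch_size):
    # yield normalized (x, y) window batches indefinitely, one batch at a time
    while True:
        x_batch, y_batch = [], []
        for _ in range(batch_size):
            start = np.random.randint(0, len(series) - seq_len)
            window = series[start:start + seq_len + 1]
            window = window / window[0] - 1            # same normalization as above
            x_batch.append(window[:-1].reshape(-1, 1))
            y_batch.append(window[-1])
        yield np.array(x_batch), np.array(y_batch)

steps_per_epoch = (len(train_close) - 50) // 32        # assumed: window 50, batch 32
model.fit_generator(window_generator(train_close, seq_len=50, batch_size=32),
                    steps_per_epoch=steps_per_epoch,
                    epochs=configs['training']['epochs'])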
Running the data through a single point-by-point prediction gives, as mentioned above, something that matches the returns very closely. But this is slightly deceptive. On closer inspection, the prediction line is made up of singular prediction points that each have the entire prior window of true history behind them. Because of that, the network does not need to know much about the time series itself, other than that the next point most likely will not be too far from the last point. So even if it gets the prediction for one point wrong, the next prediction will take the true history into account and disregard the incorrect prediction, while yet again allowing an error to be made.
While this may not initially sound promising for exact forecasts of the next price point, it does have some important uses. Although it does not know what the exact next price will be, it does give a very accurate indication of the range the next price should fall in.
This information can be used in applications such as volatility forecasting (being able to predict periods of high or low volatility in the market can be extremely advantageous for a particular trading strategy), or, away from trading, for anomaly detection. Anomaly detection can be achieved by predicting the next point, then comparing it to the true data when it arrives; if the true value differs significantly from the predicted point, an anomaly flag can be raised for that data point.
Standard & Poor's 500 Index Point by Point Forecast
Moving on to the full-sequence prediction, it seems that this proves to be the least useful prediction for this type of time series (at least with the model trained on these hyperparameters). We can see a slight bump at the start of the prediction, where the model followed some sort of momentum, but very quickly we see the model decide that the most optimal pattern is to converge towards some equilibrium of the time series. At this stage this may not seem to offer much value, although mean-reversion traders might step in to claim that the model is simply finding the mean of the price series that it would revert to once volatility is removed.
Standard & Poor's 500 Index Full Series Forecast
Finally, we make a third type of prediction with this model, which I call a multi-sequence prediction. This is a blend of the full-sequence prediction: it still initializes the test window with test data, predicts the next point, and makes a new window with that next point. However, once the input window is composed entirely of past predictions, it stops, moves forward one full window length, resets the window with the true test data, and starts the process again. In essence, this gives multiple trend-line predictions over the test data, so we can analyze how well the model picks up future momentum trends.
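A simplified sketch of that loop, building on the predict_sequence_full() sketch above (names and details are again illustrative assumptions):

import numpy as np

def predict_sequences_multiple(model, data, prediction_len):
    # predict prediction_len steps ahead on top of past predictions,
    # then re-seed the window from the true test data and repeat
    prediction_seqs = []
    for i in range(len(data) // prediction_len):
        curr_frame = data[i * prediction_len].copy()   # true test window used as the seed
        predicted = []
        for _ in range(prediction_len):
            pred = float(model.predict(curr_frame[np.newaxis, :, :])[0, 0])
            predicted.append(pred)
            curr_frame = np.vstack([curr_frame[1:], [[pred]]])
        prediction_seqs.append(predicted)
    return prediction_seqs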
Standard & Poor's 500 index multi-series forecasting
We can see from the multi-series prediction that the network seems to predict trends (and trend magnitudes) correctly for the vast majority of time series. While not perfect, it does demonstrate the usefulness of LSTM deep neural networks in sequential and time series problems. With careful hyperparameter tuning, higher accuracy can certainly be achieved.
Conclusion
Although the purpose of this article was to give a working example of an LSTM deep neural network in practice, it has only scratched the surface of their potential and application in sequential and time series problems.
At the time of writing, LSTM has been successfully applied to a wide range of real-world problems, from the classic time-series problems described here to text auto-correction, anomaly detection, and fraud detection, as well as to the heart of developing self-driving car technology.
There are some limitations to the use of LSTMs as described above, particularly with financial time series, which themselves have non-stationary properties that are hard to model (although progress has been made in using Bayesian deep neural network methods to tackle the non-stationarity of time series). Also, for some applications, newer advances in attention-based neural network mechanisms have been found to outperform LSTMs (and LSTMs coupled with these attention-based mechanisms outperform either on its own).
To date, however, LSTMs have provided significant advances over more classical statistical time series methods, being able to model relationships non-linearly and to process data with multiple dimensions.
The complete source code for the framework we developed can be found under the MIT license on the following GitHub page: https://github.com/jaungiers/LSTM-Neural-Network-for-Time-Series-Prediction