This article walks through an example of Python time series forecasting based on LightGBM. It has real reference value, and interested readers are encouraged to follow along. I hope you gain a lot from reading it.
Preface
When we think of boosted trees for time series, we usually think of the M5 competition, where a large part of the top ten entries used LightGBM. However, when boosted trees are used on a single univariate series, their performance is very poor, because there is no large set of exogenous features to take advantage of.
The first thing to be clear about is that the runner-up in the M4 competition did use boosted trees, but only as a meta-model to ensemble other, more traditional time series methods. In the code released for M4, the benchmarks of all the standard boosted-tree models are quite poor, sometimes falling short of traditional forecasting methods. Here are the benchmark results from the excellent work done by the Sktime package and their paper:
Any model with "XGB" or "RF" in its name uses a tree-based ensemble. In that benchmark, Xgboost provides the best tree result of 10.9 on the hourly dataset. But these models are just simple out-of-the-box attempts within the Sktime framework, while the winner of M4 scored 9.3 on the same dataset. From this chart we need to remember a few numbers: the hourly dataset's 10.9 from XGB-s, and the "best" tree-model result on the weekly dataset, 9.0 from RF-t-s.
Our goal is to create a fast modeling procedure based on LightGBM, suitable for personal use, that can decisively beat these numbers while remaining comparable in speed to traditional statistical methods.
It sounds difficult, and our first thought might be that we have to optimize the trees. But boosted trees are very complex, the changes are time-consuming, and the results are not guaranteed to help. One advantage, though, is that we are fitting a single dataset at a time, so can we start with the features instead?
Features
When you look at other tree implementations in the univariate setting, you will see some feature engineering: binning, lagged values of the target, simple counters, seasonal dummy variables, and perhaps Fourier functions. This is great when the alternative is a traditional method such as exponential smoothing. But our goal today is to characterize the time element and represent it as tabular data to feed the tree model, and this is where LazyProphet comes in. In addition, LazyProphet contains one extra feature engineering element: "connecting" points.
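Before getting to the "connecting" idea, here is a minimal sketch of the standard lag-feature approach just described. This is illustrative only, not LazyProphet's internals; the helper name lag_features is made up:

import numpy as np
import pandas as pd

# Illustrative sketch: turn a univariate series into a tabular matrix
# of lagged targets that a tree model can be trained on.
def lag_features(y, lags):
    df = pd.DataFrame({'y': y})
    for lag in lags:
        df[f'lag_{lag}'] = df['y'].shift(lag)
    return df.dropna()  # drop rows where a lag is not yet available

y = pd.Series(np.sin(np.arange(120) / 5))  # toy series
print(lag_features(y, lags=[1, 2, 52]).head())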
The idea is simple: connect the first point of the time series with a line to a point in the middle, then connect that middle point to the last point. Repeat this several times while changing which point is used as the "kink" (the intermediate node); this is what we call "connecting".
The following picture illustrates it well: the blue line is the time series, and the other lines are just "connections" through different kink points:
It turns out that these are just weighted piecewise linear basis functions. One disadvantage is that the extrapolation of these lines can go astray. To solve this, a "decay" factor is introduced that penalizes the slope of each line from the kink to the last point.
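A minimal sketch of what such basis functions might look like, assuming a slope of 1 up to each kink and a slope shrunk by (1 - decay) beyond it. This is illustrative, not LazyProphet's exact construction:

import numpy as np

def linear_basis(n_points, n_basis, decay=0.99):
    # Each column rises with slope 1 up to its kink, then continues with
    # the slope multiplied by (1 - decay), i.e. 0.01 here: the decay
    # penalty on the "right side" described above.
    t = np.arange(n_points)
    kinks = np.linspace(0, n_points, n_basis + 2)[1:-1]  # interior kinks
    basis = np.zeros((n_points, len(kinks)))
    for i, k in enumerate(kinks):
        basis[:, i] = np.minimum(t, k) + (1 - decay) * np.maximum(t - k, 0)
    return basis

print(linear_basis(100, n_basis=10).shape)  # (100, 10)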
On top of this basis, adding lagged target values and Fourier basis functions brings us close to state-of-the-art performance on some problems. Because it requires so little, we call it "LazyProphet".
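For reference, a minimal sketch of the seasonal Fourier features (sine/cosine pairs), again illustrative rather than the library's exact code:

import numpy as np

def fourier_basis(n_points, seasonal_period, fourier_order):
    # One sine/cosine pair per harmonic, encoding seasonality as
    # plain tabular columns a tree model can split on.
    t = np.arange(n_points)
    cols = []
    for k in range(1, fourier_order + 1):
        cols.append(np.sin(2 * np.pi * k * t / seasonal_period))
        cols.append(np.cos(2 * np.pi * k * t / seasonal_period))
    return np.column_stack(cols)

print(fourier_basis(104, seasonal_period=52, fourier_order=10).shape)  # (104, 20)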
Let's take a look at the actual application results.
Code
The datasets used here are open source and published on the M-competitions github. The data is already split into training and test sets; we use the training csv for fitting and the test csv for evaluation with SMAPE. First, install LazyProphet:
pip install LazyProphet
After installation, start coding:
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm
import pandas as pd
from LazyProphet import LazyProphet as lp

train_df = pd.read_csv(r'm4-weekly-train.csv')
test_df = pd.read_csv(r'm4-weekly-test.csv')
train_df.index = train_df['V1']
train_df = train_df.drop('V1', axis=1)
test_df.index = test_df['V1']
test_df = test_df.drop('V1', axis=1)
Here we import the necessary packages and read the weekly data. Next, create the SMAPE function, which returns the SMAPE for a given forecast and the actual values:
def smape(A, F):
    return 100 / len(A) * np.sum(2 * np.abs(F - A) / (np.abs(A) + np.abs(F)))
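A quick sanity check with made-up values (not from the article): a perfect forecast should give 0.

print(smape(np.array([1., 2., 3.]), np.array([1., 2., 3.])))  # 0.0
print(smape(np.array([100.]), np.array([110.])))              # about 9.52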
For this experiment we take the average SMAPE across all time series and compare it with the other models. As a sanity check, we also compute the average SMAPE of a naive forecast, which ensures that what we do is consistent with what was done in the competition.
smapes = []
naive_smape = []
j = tqdm(range(len(train_df)))
for row in j:
    y = train_df.iloc[row, :].dropna()
    y_test = test_df.iloc[row, :].dropna()
    j.set_description(f'{np.mean(smapes)}, {np.mean(naive_smape)}')
    lp_model = lp.LazyProphet(scale=True,
                              seasonal_period=52,
                              n_basis=10,
                              fourier_order=10,
                              ar=list(range(1, 53)),
                              decay=.99,
                              linear_trend=None,
                              decay_average=False)
    fitted = lp_model.fit(y)
    predictions = lp_model.predict(len(y_test)).reshape(-1)
    smapes.append(smape(y_test.values, pd.Series(predictions).clip(lower=0)))
    naive_smape.append(smape(y_test.values, np.tile(y.iloc[-1], len(y_test))))
print(np.mean(smapes))
print(np.mean(naive_smape))
Before looking at the results, let's take a quick look at the LazyProphet parameters.
scale: this one is simple, just whether or not to scale the data. The default is True.
seasonal_period: this parameter controls the seasonal Fourier basis functions. Since this is weekly frequency data, we use 52.
n_basis: this parameter controls the weighted piecewise linear basis functions; it is just an integer for the number of functions to use.
fourier_order: the number of sine/cosine pairs used for seasonality.
ar: which lagged target values to use. It accepts a list; here we pass lags 1 through 52.
decay: the decay factor used to penalize the "right side" of our basis functions. A setting of 0.99 means the slope is multiplied by (1 - 0.99), or 0.01.
linear_trend: one of the main drawbacks of trees is that they cannot extrapolate beyond the range of the training data. To overcome this, there is an off-the-shelf test for polynomial trend, and if one is found a linear regression is fitted to de-trend the series. None means test for a trend, True means always de-trend, and False means no test and no linear trend (see the sketch after this list).
decay_average: not a useful parameter when using a decay rate. It is a trick, but don't use it. Passing True simply averages all future values of the basis functions. This was useful when fitting with elasticnet, but not for LightGBM in testing.
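As for linear_trend, here is a hedged sketch of the de-trend/re-trend idea (illustrative only, not LazyProphet's actual code):

import numpy as np

# Trees predict constants outside the training range, so: fit a linear
# trend, model the residuals with the tree, then add the trend back
# over the forecast horizon.
rng = np.random.default_rng(0)
y = 0.5 * np.arange(100) + rng.normal(size=100)  # toy trending series
t = np.arange(len(y))
slope, intercept = np.polyfit(t, y, 1)
residuals = y - (slope * t + intercept)          # the tree would fit these
horizon = 12
t_future = np.arange(len(y), len(y) + horizon)
trend_future = slope * t_future + intercept      # add back to tree forecasts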
Now let's continue with the hourly data:
train_df = pd.read_csv(r'm4-hourly-train.csv')
test_df = pd.read_csv(r'm4-hourly-test.csv')
train_df.index = train_df['V1']
train_df = train_df.drop('V1', axis=1)
test_df.index = test_df['V1']
test_df = test_df.drop('V1', axis=1)

smapes = []
naive_smape = []
j = tqdm(range(len(train_df)))
for row in j:
    y = train_df.iloc[row, :].dropna()
    y_test = test_df.iloc[row, :].dropna()
    j.set_description(f'{np.mean(smapes)}, {np.mean(naive_smape)}')
    lp_model = lp.LazyProphet(seasonal_period=[24, 168],
                              n_basis=10,
                              fourier_order=10,
                              ar=list(range(1, 25)),
                              decay=.99)
    fitted = lp_model.fit(y)
    predictions = lp_model.predict(len(y_test)).reshape(-1)
    smapes.append(smape(y_test.values, pd.Series(predictions).clip(lower=0)))
    naive_smape.append(smape(y_test.values, np.tile(y.iloc[-1], len(y_test))))
print(np.mean(smapes))
print(np.mean(naive_smape))
All we really had to modify were the seasonal_period and ar parameters. When a list is passed to seasonal_period, it builds a seasonal basis function for each element in the list. ar was adjusted to fit the new main seasonal period of 24.
Results
Comparing with the Sktime results above: LazyProphet beats Sktime's best models, which include several different tree-based approaches. It loses to the M4 winner on the hourly dataset, but on average it generally outperforms ES-RNN. The important thing to realize is that all of this was done with the default LightGBM parameters:
boosting_params = {
    "objective": "regression",
    "metric": "rmse",
    "verbosity": -1,
    "boosting_type": "gbdt",
    "seed": 42,
    "linear_tree": False,
    "learning_rate": 0.15,
    "min_child_samples": 5,
    "num_leaves": 31,
    "num_iterations": 50,
}
You can pass your own dictionary of parameters when creating the LazyProphet class, and you can optimize it for each time series for further gains.
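A hedged sketch of overriding these defaults. The keyword name boosting_params is assumed from the default dictionary shown above; check the LazyProphet docs if your version differs:

# Assumed usage: override LightGBM's defaults per series.
custom_params = {
    "objective": "regression",
    "metric": "rmse",
    "verbosity": -1,
    "boosting_type": "gbdt",
    "seed": 42,
    "linear_tree": False,
    "learning_rate": 0.05,   # slower learning than the 0.15 default
    "min_child_samples": 5,
    "num_leaves": 61,        # larger trees than the default 31
    "num_iterations": 200,
}
lp_model = lp.LazyProphet(scale=True,
                          seasonal_period=52,
                          n_basis=10,
                          fourier_order=10,
                          ar=list(range(1, 53)),
                          decay=.99,
                          boosting_params=custom_params)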
Compare our results with the goals mentioned above:
Zero parameter optimization (only slight modifications for different seasonalities)
Fit each time series separately
Predictions were generated "lazily" in about a minute on my local machine
Beat all other tree methods in the benchmark
It looks very successful so far, but the success may not replicate everywhere: on datasets with far fewer observations, this approach tends to degrade significantly. In testing, LazyProphet performed better on high-frequency data with a large number of observations. Even so, LazyProphet is a good option to try for time series modeling; it doesn't take long to code up and test, so the time is well worth it.
Thank you for reading this article carefully. I hope this example analysis of Python time series prediction based on LightGBM has been helpful to you.