
How to Use Time Series in Python


This article shows you how to work with time series in Python. The content is concise and, I hope, easy to follow; by the end you should have picked up something useful from the detailed walkthrough.

Time series are one of the most common data types encountered in daily life. Stock prices, sales figures, climate data, energy use, and even personal weight are all data that can be collected at regular intervals. Almost every data scientist will encounter time series in their work, and being able to handle them effectively is an important skill in the data science toolbox.

What follows is a brief introduction to working with time series in Python, covering time series concepts and some data operations using Pandas on energy consumption data from London household smart meters, along with code that I think might be useful. You can retrieve the data used in this post here: https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households

Let's start with the basics and look at the definition of a time series:

A time series is a collection of data points indexed, listed, or graphed in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time; it is therefore a sequence of discrete-time data.

Time series data is organized around reasonably precise timestamps, so compared with a random sample it may carry additional structure that we will try to extract.

Load and process time series

Data set

As an example, we use half-hourly energy consumption readings (in kilowatt hours), sampled from London households that took part in the "Low Carbon London" project led by UK Power Networks between November 2011 and February 2014. Some exploratory plots are a good way to get a feel for the structure and range of the data, and they will also help us find any missing values that need to be corrected.

For the rest of this article, we will focus only on the DateTime and kWh columns.
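A minimal sketch of loading and exploring the data; the file name is hypothetical (the actual download is split across several CSVs), and the column names follow the description above:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file name; adjust to match the files you downloaded.
data = pd.read_csv('smart_meter_london.csv', parse_dates=['DateTime'])
data = data.set_index('DateTime')[['kWh']]

# A quick exploratory plot to get a feel for structure, range, and gaps.
data['kWh'].plot(figsize=(12, 4), title='Half-hourly energy consumption (kWh)')
plt.show()
```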

Resampling

Let's start with a simple resampling technique. Resampling changes the frequency of the time series observations. One reason you might resample time series data is feature engineering: it can provide additional structure for a supervised learning model or surface insight into the learning problem. Resampling in pandas is similar to the groupby method, in that you are effectively grouping by a certain time span; you then specify a method of aggregation. Let's make resampling more concrete by looking at some examples. We will start with a weekly summary:
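The weekly summary is a one-liner, assuming the `data` DataFrame from the loading sketch above:

```python
# Group the half-hourly readings into weekly bins and total the kWh.
weekly = data['kWh'].resample('W').sum()
weekly.head()
```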

data.resample() resamples the kWh column of our DataFrame.

'W' indicates that we want to resample on a weekly basis.

.sum() indicates that we want the total kWh over each period.

We can do the same for a daily summary, and we can produce an hourly summary using the groupby and mean functions:
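Sketches of both summaries; the hourly one shown here averages by hour of day, which is one plausible reading of "groupby and mean":

```python
# Daily totals, same pattern as the weekly resample.
daily = data['kWh'].resample('D').sum()

# Average consumption profile by hour of day across the whole sample.
hourly_profile = data['kWh'].groupby(data.index.hour).mean()
```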

For further resampling, pandas comes with many built-in options, and you can even define your own periods. Common offset aliases include 'D' (calendar day), 'W' (weekly), 'M' (month end), 'Q' (quarter end), and 'H' (hourly), and common aggregation methods include mean, sum, min, max, and count.

Other explorations

There are plenty of other aspects of the data you can explore on your own from here.

Modeling with the Prophet framework

Facebook Prophet, released in 2017, is available for Python and R. Prophet is designed for analyzing time series with daily observations that display patterns on different time scales. It is robust to missing data and shifts in the trend, and typically handles outliers well. It also has advanced capabilities for modeling the effects of holidays on a time series and implementing custom changepoints, but I will stick to the basics to get a model up and running. I think Prophet is a good choice for producing quick forecasts, as it has intuitive parameters that can be adjusted by someone with good domain knowledge but limited technical skill in forecasting models. For more information about Prophet, see the official documentation here: https://facebook.github.io/prophet/docs/quick_start.html

Before using Prophet, we rename the columns in the data to the format it expects: the date column must be called 'ds', and the value column we want to predict 'y'. We use the daily summary data in the example below.
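The renaming, using the `daily` series from the resampling step (an assumption about the variable name):

```python
# Prophet requires the columns to be named 'ds' (date) and 'y' (value).
df = daily.reset_index()
df.columns = ['ds', 'y']
```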

Then we import Prophet, create a model, and fit it to the data. In Prophet, the changepoint_prior_scale parameter (https://facebook.github.io/prophet/docs/trend_changepoints.html) controls how sensitive the trend is to changes: higher values make it more sensitive, lower values less so. After trying a range of values, I raised this parameter from its default of 0.05 to 0.10.
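A minimal model setup along those lines (newer releases import from `prophet`, older ones from `fbprophet`):

```python
from prophet import Prophet

# Raised from the 0.05 default so the trend reacts more readily to changes.
m = Prophet(changepoint_prior_scale=0.10)
m.fit(df)
```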

To make a forecast, we need to create what Prophet calls a future dataframe. We specify how far into the future to predict (two months in our example) and the frequency of predictions (daily). Then we use the Prophet model we created and the future dataframe to make the forecast.
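Building the future dataframe and forecasting; 60 daily periods is my approximation of "two months":

```python
# 60 daily rows beyond the end of the history, then predict over everything.
future = m.make_future_dataframe(periods=60, freq='D')
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
```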

That's all there is to it! The forecast dataframe contains the estimated household consumption for the next two months. We can visualize the prediction with a plot:
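Plotting is built in:

```python
import matplotlib.pyplot as plt

fig = m.plot(forecast)  # dots: observations, line: yhat, band: uncertainty
plt.show()
```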

Black dots represent the actual values, the blue line represents the predicted values, and the light blue shaded region indicates the uncertainty interval.

As shown in the figure below, the uncertainty region grows the further into the future we forecast, since the initial uncertainty propagates and accumulates over time.

Prophet also makes it easy to visualize the overall trend and the component patterns:
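The component plots come from a single call:

```python
fig = m.plot_components(forecast)  # trend, weekly, and yearly panels
plt.show()
```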

The yearly pattern is interesting, as it seems to indicate that household consumption increases in autumn and winter and decreases in spring and summer. Intuitively, this is exactly what we would expect to see. The weekly trend suggests that consumption is higher on Sundays than on the other days of the week. Finally, the overall trend indicates that consumption increased for a year before slowly declining. Explaining this trend would require further investigation; in a follow-up post, we will try to find out whether it is related to the weather.

LSTM prediction

Long Short-Term Memory (LSTM) recurrent neural networks hold promise for learning long sequences of observations. The article "Understanding LSTM Networks" (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) does an excellent job of explaining the underlying complexity in an accessible way. The figure below depicts the internal cell architecture of an LSTM.

LSTMs seem well suited to time series prediction. Let's use our daily summary data again.

LSTMs are sensitive to the scale of the input data, especially when the sigmoid or tanh activation functions are used. It is generally good practice to rescale the data to the range [0, 1] or [-1, 1], a step also called normalization. We can easily normalize the dataset using the MinMaxScaler preprocessing class from the scikit-learn library.
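A sketch of the normalization step, assuming the daily values live in the `df['y']` column from the Prophet section:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# LSTMs train more stably when inputs sit in a small, fixed range.
values = df['y'].values.astype('float32').reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(values)
```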

Now we can split the ordered dataset into training and test sets. The code below computes the index of the split point and divides the data so that 80% of the observations are used to train our model and the remaining 20% to test it.
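The split itself is only a couple of lines:

```python
# Chronological 80/20 split; time series should not be shuffled.
train_size = int(len(dataset) * 0.80)
train, test = dataset[:train_size], dataset[train_size:]
print(len(dataset), len(train), len(test))
```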

We can define a function to create a new dataset, and use that function to prepare the training and test datasets for modeling.
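One common shape for that helper, the sliding-window form used in many LSTM tutorials; I am assuming this is close to the original:

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    """Turn an array of values into X (a window of past values) and y (the next value)."""
    X, y = [], []
    for i in range(len(dataset) - look_back):
        X.append(dataset[i:i + look_back, 0])
        y.append(dataset[i + look_back, 0])
    return np.array(X), np.array(y)

look_back = 2  # two time steps per sample, as described below
X_train, y_train = create_dataset(train, look_back)
X_test, y_test = create_dataset(test, look_back)
```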

The LSTM network expects the input data to be provided in a specific array shape: [samples, time steps, features].

Our data is currently in the shape [samples, features], and we frame the problem as two time steps per sample. We can transform the prepared training and test input data into the expected shape as follows:
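```python
# [samples, features] -> [samples, time steps, features], one feature per step.
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
```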

That's all! We are now ready to design and fit our LSTM network for this example.
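A small network of the kind the text implies; the layer width, epochs, and batch size here are illustrative choices, not the original hyperparameters:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(100, input_shape=(look_back, 1)),  # one recurrent layer
    Dense(1),                               # single-value regression output
])
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(X_train, y_train, epochs=20, batch_size=70,
                    validation_data=(X_test, y_test), verbose=2)
```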

From the loss plot, we can see that the model performs similarly on the training and test datasets.
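The loss plot comes straight from the training history:

```python
import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()
```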

In the figure below, we see that the LSTM does a very good job of fitting the test dataset.
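To compare predictions against actual consumption, remember to invert the scaling first:

```python
# Predict on the test window and map back to kWh.
y_pred = scaler.inverse_transform(model.predict(X_test))
y_true = scaler.inverse_transform(y_test.reshape(-1, 1))

plt.plot(y_true, label='actual')
plt.plot(y_pred, label='predicted')
plt.legend()
plt.show()
```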

Clustering

Last but not least, we can also use our sample data for clustering. There are many different ways to perform clustering; one of them is hierarchical clustering. You can build the hierarchy in two ways: splitting from the top down or merging from the bottom up. I decided to use the latter in this article.

Let's start with the data: we simply import the raw data and add two columns, the day of the year and the hour of the day.
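A sketch, reusing the hypothetical file and column names from earlier:

```python
import pandas as pd

raw = pd.read_csv('smart_meter_london.csv', parse_dates=['DateTime'])
raw['day_of_year'] = raw['DateTime'].dt.dayofyear
raw['hour_of_day'] = raw['DateTime'].dt.hour
```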

Linkage function and dendrogram

The linkage function groups objects into clusters based on their similarity and distance information. Newly formed clusters are then linked to one another to create larger clusters, and the process iterates until all the objects in the original dataset are linked together in a hierarchical tree.

Clustering our data:
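With scipy this is a single call; `X` here stands for whatever feature matrix you build from the columns above (an assumption, since the original feature construction did not survive):

```python
from scipy.cluster.hierarchy import linkage

# Bottom-up (agglomerative) clustering with Ward's criterion.
Z = linkage(X, 'ward')
```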

It's done! But what does "ward" mean? How does this actually work? As the scipy linkage documentation tells us, ward is one of the methods that can be used to calculate the distance between newly formed clusters; the "ward" linkage function uses the Ward variance minimization algorithm.

Now let's take a look at the dendrogram of this hierarchical clustering. A dendrogram is a hierarchical diagram of the clusters, in which the height of each branch represents the distance at which the next cluster merge happens.
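Plotting the dendrogram from the linkage matrix:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

plt.figure(figsize=(25, 10))
plt.title('Hierarchical clustering dendrogram')
plt.xlabel('sample index')
plt.ylabel('distance (Ward)')
dendrogram(Z, leaf_rotation=90., leaf_font_size=8.)
plt.show()
```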

If this is the first time you've seen a dendrogram, it can look intimidating, but don't worry, let's take it apart:

On the x-axis you see the labels. If you don't specify anything else (as I didn't), they are the indices of the samples in X.

On the y-axis you see the distances (of the Ward algorithm, in our case).

Horizontal lines are cluster merges.

Vertical lines tell you which clusters/labels were merged to form a new cluster.

The height of a horizontal line tells you the distance at which the new cluster was formed.

Even with that explanation, the previous dendrogram is still hard to read. We can "cut" it a bit so that we can look at the data better:
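scipy's dendrogram supports truncation for exactly this; `p=12` is an arbitrary choice:

```python
# Show only the last 12 merged clusters; contracted branches are summarized.
plt.figure(figsize=(25, 10))
dendrogram(Z, truncate_mode='lastp', p=12, show_contracted=True)
plt.show()
```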

Much better, isn't it? Take a look at the clustering documentation for more information and experiment with different parameters.

That covers how to use time series in Python. Did you pick up any new knowledge or skills? If you want to learn more or enrich your knowledge, stay tuned for future posts.
