
How to use Pandas and NumPy groupby to group data by timestamp

2025-01-18 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article explains how to use Pandas and NumPy to group data by timestamp with groupby. The walkthrough is short and follows the code step by step.

First, the requirement. I need to group the data by minute and output the data of each minute as one row. Because the number of data points differs from minute to minute, every row is aligned to the length of the longest group, and shorter rows are padded by repeating their last value.

Next, the data source. The columns that were not needed have already been removed, leaving only the data column that I want to use and the timestamp column, time. The timestamps are in seconds, and there are 407454 rows in total.

           data          time
0       6522.50  1.530668e+09
1       6522.66  1.530668e+09
2       6523.79  1.530668e+09
3       6523.79  1.530668e+09
4       6524.82  1.530668e+09
5       6524.35  1.530668e+09
6       6523.66  1.530668e+09
7       6522.64  1.530668e+09
8       6523.25  1.530668e+09
9       6523.88  1.530668e+09
10      6525.30  1.530668e+09
11      6525.70  1.530668
...         ...           ...
407443  6310.69  1.531302e+09
407444  6310.55  1.531302e+09
407445  6310.42  1.531302e+09
407446  6310.40  1.531302e+09
407447  6314.03  1.531302e+09
407448  6314.04  1.531302e+09
407449  6312.84  1.531302e+09
407450  6312.57  1.531302e+09
407451  6312.56  1.531302e+09
407452  6314.04  1.531302e+09
407453  6314.04  1.531302e+09

[407454 rows x 2 columns]

Start the data processing by defining a function that takes a DataFrame and the name of its time column.

import numpy
import pandas

def getdata_time(dataframe, name):
    dataframe[name] = dataframe[name] / 60  # convert time from seconds into minutes
    dataframe[name] = dataframe[name].astype('int64')  # integer minute key (modifies the input DataFrame)
    datalen = dataframe.groupby(name).count().max()  # get the maximum length of data in one group
    timeframe = dataframe.groupby(name).count().reset_index()  # turn the group keys into a DataFrame column
    timeseries = timeframe[name]
    array = []  # create an empty list to store the values of each group
    for time, group in dataframe.groupby(name):
        tmparray = numpy.array(group['data'])  # convert the series to an array and add it to the list
        array.append(tmparray)
    notimedata = pandas.DataFrame(array)
    # fill in the missing values; fillna(method=...) is deprecated since pandas 2.1,
    # where notimedata.ffill(axis=1, limit=...) is the replacement
    notimedata = notimedata.fillna(method='ffill', axis=1, limit=int(datalen.iloc[0]))
    notimedata[int(datalen.iloc[0]) + 1] = timeseries  # add time as the last column
    return notimedata
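For comparison only, the same requirement can probably be met without the explicit loop. The sketch below is not the method used in this article: it swaps in groupby().cumcount() plus DataFrame.pivot, and the function name minutes_wide, the helper columns minute and pos, and the toy timestamps are all assumptions made for illustration.

```python
import pandas as pd


def minutes_wide(frame, time_col="time", value_col="data"):
    """Return one row per minute, forward-filled to the longest minute."""
    out = frame[[time_col, value_col]].copy()  # leave the caller's frame untouched
    # Integer minute key: seconds floor-divided by 60.
    out["minute"] = (out[time_col] // 60).astype("int64")
    # Position of each row inside its minute: 0, 1, 2, ...
    out["pos"] = out.groupby("minute").cumcount()
    # One row per minute, one column per position; short minutes get NaN.
    wide = out.pivot(index="minute", columns="pos", values=value_col)
    # Repeat the last real value of each row across the trailing NaN cells.
    return wide.ffill(axis=1)


# Hypothetical toy data: three minutes holding 3, 1 and 2 values.
toy = pd.DataFrame({
    "data": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "time": [6000.0, 6010.0, 6059.0, 6060.0, 6125.0, 6130.0],
})
result = minutes_wide(toy)
print(result)
```

Here the minute key ends up in the index rather than in a trailing column; result.reset_index() would move it into a column if that layout is preferred.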

The code is analyzed line by line below. Grouping is done per minute, so the timestamp in seconds is first divided by 60 to get minutes and then converted to an integer for convenience. (I was not sure whether changing the type costs data accuracy; if anyone knows better, corrections are welcome. The cast only drops the fractional part of the minute value, which is what puts every timestamp from the same minute into one group, and the data column itself is not touched.)
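The effect of the integer cast can be checked on a few values. This is a minimal sketch with made-up timestamps (the sample values are assumptions, not rows from the data set above); it shows that the cast truncates the fractional minute and that floor division gives the same keys for non-negative timestamps.

```python
import pandas as pd

# Hypothetical sample: Unix timestamps in seconds, with fractional parts.
df = pd.DataFrame({
    "data": [6522.50, 6522.66, 6523.79, 6524.82],
    "time": [1530668100.2, 1530668159.9, 1530668160.0, 1530668221.5],
})

# Divide by 60 and cast to int64: the cast drops the fractional minute,
# so every timestamp inside the same minute gets the same integer key.
minute_key = (df["time"] / 60).astype("int64")

# Floor division produces the same keys for non-negative timestamps.
minute_floor = (df["time"] // 60).astype("int64")

print(minute_key.tolist())
print(minute_key.equals(minute_floor))
```

The first two rows fall into the same minute and share a key, while the last two each start a new minute. Only the grouping key is truncated; the values in data keep their full precision.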

datalen is the maximum number of data points in any one minute, which we need as the basis for alignment. DataFrame.groupby(...).count() gives the number of rows in each group, not the number of groups. To get the key of each group as a regular column after grouping, use the reset_index method on the next line. reset_index is called after count() rather than directly on the groupby result because the result of groupby is not a DataFrame. After count() (or any other operation on the grouped data whose result has one row per group key), you get a DataFrame indexed by the group key, and reset_index turns that index into one column, with the count result as the other column. Calling reset_index directly on the groupby result raises the following error:

AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

The following is the result of calling reset_index after count(). You can see that the data is divided into 10397 groups:

           time  data
0      25511135    33
1      25511136    18
2      25511137    25
3      25511138    42
4      25511139    36
5      25511140     7
6      25511141    61
7      25511142    45
8      25511143    46
9      25511144    19
10     25511145    21
...         ...   ...
10387  25521697     3
10388  25521698     9
10389  25521699    16
10390  25521700    13
10391  25521701     4
10392  25521702    34
10393  25521703   124
10394  25521704   302
10395  25521705    86
10396  25521706    52

[10397 rows x 2 columns]
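The relationship between count(), max() and reset_index() described above can be sketched on a small frame. The toy values below are assumptions chosen so that the groups have different sizes.

```python
import pandas as pd

# Hypothetical toy data: "time" already holds integer minute keys.
df = pd.DataFrame({
    "data": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "time": [100, 100, 100, 101, 102, 102],
})

grouped = df.groupby("time")

# count() returns a DataFrame indexed by the group key, one row per group.
counts = grouped.count()

# reset_index() moves the group key out of the index into a regular column.
timeframe = counts.reset_index()

# max() over the counts gives the size of the largest group
# (a Series with one entry per counted column).
datalen = counts.max()

print(timeframe)
print(int(datalen["data"]))
```

The number of groups is the number of rows in timeframe, while the largest group size comes from datalen.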

The extracted timeseries will be used in the final step, when the data is put together. Now extract the data of each group: first create an empty list for storage, then use a for loop to walk over the groups. In the loop, time is the group key and group is the content of that group. The values are taken from group['data'] and appended to the list, and after the loop the list is converted to a DataFrame. This DataFrame contains a large number of missing values, because its number of columns is set by the longest group. As follows:

             0        1        2        3  ...  1143  1144  1145  1146
0      6522.50  6522.66  6523.79  6523.79  ...   NaN   NaN
1      6523.95  6524.90  6525.00  6524.35  ...   NaN   NaN
2      6520.87  6520.00  6520.45  6520.46  ...   NaN   NaN
3      6516.34  6516.26  6516.21  6516.21  ...   NaN   NaN
4      6513.28  6514.00  6514.00  6514.00  ...   NaN   NaN
5      6511.98  6511.98  6511.99  6513.00  ...   NaN   NaN
6      6511.00  6511.00  6511.00           ...   NaN   NaN
7      6511.70  6511.78  6511.99  6511.99  ...   NaN   NaN
8      6509.51  6510.00  6510.80  6510.80  ...   NaN   NaN
9      6511.36  6510.00  6510.00  6510.00  ...   NaN   NaN
10     6507.00  6507.00  6507.00           ...   NaN   NaN
...        ...      ...
10386  6333.77  6331.31  6331.30  6333.19  ...   NaN   NaN
10387  6331.68  6331.30  6331.68      NaN  ...   NaN   NaN
10388  6331.30  6331.30  6331.00  6331.00  ...   NaN   NaN
10389  6330.93  6330.92  6330.92  6330.93  ...   NaN   NaN
10390  6330.83  6330.83  6330.90  6330.80  ...   NaN   NaN
10391  6327.57  6326.00  6326.00  6325.74  ...   NaN   NaN
10392  6327.57  6329.70  6328.85  6328.85  ...   NaN   NaN
10393  6323.54  6323.15  6323.15  6322.77  ...   NaN   NaN
10394  6311.00  6310.83  6310.83  6310.50  ...   NaN   NaN
10395  6311.45  6311.32  6310.01  6310.01  ...   NaN   NaN
10396  6310.46  6310.46  6310.56  6311.61  ...   NaN

[10397 rows x 1147 columns]

You can see that the number of rows equals the number of groups, and the total of 1147 columns equals the length of the largest group.
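Why the missing values appear can be seen on a small example. In this sketch the toy data is an assumption, and each group is collected as a plain Python list rather than a NumPy array, which is a substitution made here for simplicity; pandas pads the shorter rows with NaN when it builds the DataFrame.

```python
import pandas as pd

# Hypothetical toy data with groups of 3, 1 and 2 values.
df = pd.DataFrame({
    "data": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "time": [100, 100, 100, 101, 102, 102],
})

rows = []  # one entry per group, of varying length
for key, group in df.groupby("time"):
    rows.append(group["data"].tolist())

# Rows of unequal length are padded with NaN up to the longest row.
wide = pd.DataFrame(rows)

print(wide)
print(wide.shape)
```

The resulting frame has one row per group and as many columns as the longest group, with NaN in every cell that a shorter group could not fill.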

Then we fill the missing values by calling the fillna method. method='ffill' fills each missing value from the value before it, axis=1 fills along each row, and limit is the maximum number of consecutive values to fill. Finally, we add the timeseries obtained earlier as the last column, which gives the result that the requirement asked for.

             0        1        2  ...     1145     1146      1148
0      6522.50  6522.66  6523.79  ...  6522.14  6522.14  25511135
1      6523.95  6524.90  6525.00  ...  6520.00  6520.00  25511136
2      6520.87  6520.00  6520.45  ...  6517.00  6517.00  25511137
3      6516.34  6516.26  6516.21  ...  6514.00  6514.00  25511138
4      6513.28  6514.00  6514.00  ...  6511.97  6511.97  25511139
5      6511.98  6511.98  6511.99  ...  6511.00  6511.00  25511140
6      6511.00  6511.00  6511.00  ...  6510.90  6510.90  25511141
7      6511.70  6511.78  6511.99  ...  6512.09  6512.09  25511142
8      6509.51  6510.00  6510.80  ...  6512.09  6512.09  25511143
9      6511.36  6510.00  6510.00  ...  6507.04  6507.04  25511144
10     6507.00  6507.00  6507.00  ...  6508.57  6508.57  25511145
11     6507.16  6507.74  6507.74  ...  6506.35  6506.35  25511146
...        ...      ...
10388  6331.30  6331.30  6331.00  ...  6331.00  6331.00  25521698
10389  6330.93  6330.92  6330.92  ...  6330.99  6330.99  25521699
10390  6330.83  6330.83  6330.90  ...  6327.58  6327.58  25521700
10391  6327.57  6326.00  6326.00  ...  6325.74  6325.74  25521701
10392  6327.57  6329.70  6328.85  ...  6325.00  6325.00  25521702
10393  6323.54  6323.15  6323.15  ...  6311.00  6311.00  25521703
10394  6311.00  6310.83  6310.83  ...  6315.00  6315.00  25521704
10395  6311.45  6311.32  6310.01  ...  6310.00  6310.00  25521705
10396  6310.46  6310.46  6310.56  ...  6314.04  6314.04  25521706

[10397 rows x 1148 columns]
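The fill step and the final time column can be sketched on a toy frame. The values and the column label "time" are assumptions for illustration, and the sketch uses DataFrame.ffill, the newer spelling of a forward fill, without a limit; as far as I can tell this gives the same result here, since a limit equal to the longest row never cuts a fill short.

```python
import numpy as np
import pandas as pd

# Hypothetical ragged result: each row is one minute, padded with NaN.
wide = pd.DataFrame([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, np.nan],
    [5.0, 6.0, np.nan],
])
minute_keys = pd.Series([100, 101, 102])  # assumed group keys, one per row

# Forward-fill along each row (axis=1) so trailing gaps repeat the last value.
filled = wide.ffill(axis=1)

# Append the minute key as a final column; rows are matched by index.
filled["time"] = minute_keys

print(filled)
```

Every row now has the same length, with the last real value of each minute repeated to the end and the minute key in the final column.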
