How to deal with time in Pandas in Python

2025-04-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article introduces how to handle time in pandas, the Python data analysis library. It is meant as a reference, so let's walk through it together.

Brief introduction

Time is one of the most frequently used data types in data processing. pandas builds on the datetime64 and timedelta64 data types in NumPy and also consolidates features from other Python libraries such as scikits.timeseries.

Time classification

pandas has four time-related concepts:

Date times: a specific date and time, with time zone support. Similar to datetime.datetime in the standard library.

Time deltas: an absolute duration. Similar to datetime.timedelta in the standard library.

Time spans: a span of time defined by a point in time and its associated frequency.

Date offsets: a relative duration that respects calendar arithmetic. Similar to dateutil.relativedelta.relativedelta.

The following table summarizes them:

    Concept        Scalar class   Array class      pandas data type                        Primary creation method
    Date times     Timestamp      DatetimeIndex    datetime64[ns] or datetime64[ns, tz]    to_datetime or date_range
    Time deltas    Timedelta      TimedeltaIndex   timedelta64[ns]                         to_timedelta or timedelta_range
    Time spans     Period         PeriodIndex      period[freq]                            Period or period_range
    Date offsets   DateOffset     None             None                                    DateOffset

Take a look at an example of use (the sessions in this article assume import datetime, import numpy as np, and import pandas as pd):

    In [19]: pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))
    Out[19]:
    2000-01-01    0
    2000-01-02    1
    2000-01-03    2
    Freq: D, dtype: int64
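To make the four concepts more concrete, here is a minimal sketch that builds one scalar of each kind. The literal dates and durations are arbitrary values chosen for illustration and do not come from the article.

```python
import pandas as pd

# One scalar of each kind; the literal values are arbitrary illustrations.
stamp = pd.Timestamp("2012-05-01 09:30", tz="UTC")  # date time: a point in time
delta = pd.Timedelta(hours=36)                       # time delta: an absolute duration
span = pd.Period("2012-05", freq="M")                # time span: all of May 2012
offset = pd.DateOffset(months=1)                     # date offset: calendar-aware step

later = stamp + delta                                # 36 hours after the timestamp
next_month = pd.Timestamp("2012-01-31") + offset     # clipped to the end of February

print(later)
print(span.start_time, span.end_time)
print(next_month)
```

Note how the Period exposes both a start and an end, while the Timestamp is a single instant.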

All of these types use NaT as their null value:

    In [24]: pd.Timestamp(pd.NaT)
    Out[24]: NaT

    In [25]: pd.Timedelta(pd.NaT)
    Out[25]: NaT

    In [26]: pd.Period(pd.NaT)
    Out[26]: NaT

    # Equality acts as np.nan would
    In [27]: pd.NaT == pd.NaT
    Out[27]: False

Timestamp

Timestamp is the most basic type of time, and we can create it like this:

    In [28]: pd.Timestamp(datetime.datetime(2012, 5, 1))
    Out[28]: Timestamp('2012-05-01 00:00:00')

    In [29]: pd.Timestamp("2012-05-01")
    Out[29]: Timestamp('2012-05-01 00:00:00')

    In [30]: pd.Timestamp(2012, 5, 1)
    Out[30]: Timestamp('2012-05-01 00:00:00')

DatetimeIndex

A list of Timestamp objects is automatically converted to a DatetimeIndex when it is used as an index:

    In [33]: dates = [
       ....:     pd.Timestamp("2012-05-01"),
       ....:     pd.Timestamp("2012-05-02"),
       ....:     pd.Timestamp("2012-05-03"),
       ....: ]
       ....:

    In [34]: ts = pd.Series(np.random.randn(3), dates)

    In [35]: type(ts.index)
    Out[35]: pandas.core.indexes.datetimes.DatetimeIndex

    In [36]: ts.index
    Out[36]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

    In [37]: ts
    Out[37]:
    2012-05-01    0.469112
    2012-05-02   -0.282863
    2012-05-03   -1.509059
    dtype: float64

date_range and bdate_range

You can also use date_range to create a DatetimeIndex:

    In [74]: start = datetime.datetime(2011, 1, 1)

    In [75]: end = datetime.datetime(2012, 1, 1)

    In [76]: index = pd.date_range(start, end)

    In [77]: index
    Out[77]:
    DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
                   '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
                   '2011-01-09', '2011-01-10',
                   ...
                   '2011-12-23', '2011-12-24', '2011-12-25', '2011-12-26',
                   '2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30',
                   '2011-12-31', '2012-01-01'],
                  dtype='datetime64[ns]', length=366, freq='D')

date_range generates calendar days, while bdate_range generates business days:

    In [78]: index = pd.bdate_range(start, end)

    In [79]: index
    Out[79]:
    DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
                   '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
                   '2011-01-13', '2011-01-14',
                   ...
                   '2011-12-19', '2011-12-20', '2011-12-21', '2011-12-22',
                   '2011-12-23', '2011-12-26', '2011-12-27', '2011-12-28',
                   '2011-12-29', '2011-12-30'],
                  dtype='datetime64[ns]', length=260, freq='B')
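As a quick sanity check on the difference between the two functions, the following sketch counts the entries each one produces over the same interval and confirms that the gap is made up of weekend days. The variable names are illustrative only.

```python
import pandas as pd

calendar_days = pd.date_range("2011-01-01", "2012-01-01")
business_days = pd.bdate_range("2011-01-01", "2012-01-01")

# dayofweek is 0 for Monday through 6 for Sunday, so >= 5 means a weekend.
weekend_days = calendar_days[calendar_days.dayofweek >= 5]

print(len(calendar_days), len(business_days), len(weekend_days))
# Every calendar day is either a business day or a weekend day.
print(len(business_days) + len(weekend_days) == len(calendar_days))
```

Note that bdate_range only skips weekends by default; public holidays are still included unless a custom business-day frequency is supplied.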

Both methods can take start, end, and periods parameters.

    In [84]: pd.bdate_range(end=end, periods=20)

    In [83]: pd.date_range(start, end, freq="W")

    In [86]: pd.date_range("2018-01-01", "2018-01-05", periods=5)

origin

The origin parameter of to_datetime changes the reference point from which numeric values are counted:

    In [67]: pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("1960-01-01"))
    Out[67]: DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)
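A custom origin is mostly useful when a data source stores dates as day counts from its own epoch. The sketch below assumes a hypothetical column of such day counts and converts the same numbers with and without a custom origin, so the shift between the two results is visible.

```python
import pandas as pd

day_counts = [0, 1, 366]  # hypothetical values: days elapsed since the chosen origin

since_1960 = pd.to_datetime(day_counts, unit="D", origin=pd.Timestamp("1960-01-01"))
since_unix = pd.to_datetime(day_counts, unit="D")  # no origin given: counted from 1970-01-01

print(since_1960)
print(since_unix)
# The two results differ by the distance between the two origins.
print(since_unix[0] - since_1960[0])
```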

By default, origin='unix', that is, the starting point is 1970-01-01 00:00:00.

    In [68]: pd.to_datetime([1, 2, 3], unit="D")
    Out[68]: DatetimeIndex(['1970-01-02', '1970-01-03', '1970-01-04'], dtype='datetime64[ns]', freq=None)

Formatting

The format parameter tells to_datetime how to parse a string:

    In [51]: pd.to_datetime("2010/11/12", format="%Y/%m/%d")
    Out[51]: Timestamp('2010-11-12 00:00:00')

    In [52]: pd.to_datetime("12-11-2010 00:00", format="%d-%m-%Y %H:%M")
    Out[52]: Timestamp('2010-11-12 00:00:00')

Period

Period represents a time span and is usually used with freq:

    In [31]: pd.Period("2011-01")
    Out[31]: Period('2011-01', 'M')

    In [32]: pd.Period("2012-05", freq="D")
    Out[32]: Period('2012-05-01', 'D')

A Period supports arithmetic directly; adding or subtracting an integer shifts it by that many multiples of its own freq:

    In [345]: p = pd.Period("2012", freq="A-DEC")

    In [346]: p + 1
    Out[346]: Period('2013', 'A-DEC')

    In [347]: p - 3
    Out[347]: Period('2009', 'A-DEC')

    In [348]: p = pd.Period("2012-01", freq="2M")

    In [349]: p + 2
    Out[349]: Period('2012-05', '2M')

    In [350]: p - 1
    Out[350]: Period('2011-11', '2M')

Note that a Period can also be combined with offsets and timedelta-like objects, but only when the result can keep the same freq:

    In [352]: p = pd.Period("2014-07-01 09:00", freq="H")

    In [353]: p + pd.offsets.Hour(2)
    Out[353]: Period('2014-07-01 11:00', 'H')

    In [354]: p + datetime.timedelta(minutes=120)
    Out[354]: Period('2014-07-01 11:00', 'H')

    In [355]: p + np.timedelta64(7200, "s")
    Out[355]: Period('2014-07-01 11:00', 'H')
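The frequency restriction is easiest to see when it fails. The following sketch adds a compatible and an incompatible duration to an hourly Period. It assumes the lowercase "h" alias, which recent pandas versions prefer over "H", and it catches a broad pair of exception types because the base class of the incompatible-frequency error may differ between pandas versions.

```python
import datetime

import pandas as pd

p = pd.Period("2014-07-01 09:00", freq="h")  # hourly period ("H" in older sessions)

# Two whole hours fit the hourly frequency, so the result is still an hourly Period.
shifted = p + datetime.timedelta(hours=2)

# Thirty minutes is not a whole number of hours, so pandas refuses the addition.
try:
    p + datetime.timedelta(minutes=30)
    incompatible = False
except (ValueError, TypeError):  # assumption: the frequency error derives from one of these
    incompatible = True

print(shifted)
print(incompatible)
```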

A list of Period objects is automatically converted to a PeriodIndex when it is used as an index:

    In [38]: periods = [pd.Period("2012-01"), pd.Period("2012-02"), pd.Period("2012-03")]

    In [39]: ts = pd.Series(np.random.randn(3), periods)

    In [40]: type(ts.index)
    Out[40]: pandas.core.indexes.period.PeriodIndex

    In [41]: ts.index
    Out[41]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')

    In [42]: ts
    Out[42]:
    2012-01   -1.135632
    2012-02    1.212112
    2012-03   -0.173215
    Freq: M, dtype: float64

You can create a PeriodIndex with the pd.period_range method:

    In [359]: prng = pd.period_range("1/1/2011", "1/1/2012", freq="M")

    In [360]: prng
    Out[360]:
    PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
                 '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
                 '2012-01'],
                dtype='period[M]', freq='M')
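Periods and timestamps can be converted into each other, which helps when a span needs to be pinned to a concrete instant. The sketch below is one possible round trip, assuming monthly periods and the start of each month as the chosen instant.

```python
import pandas as pd

months = pd.period_range("2011-01", "2012-01", freq="M")  # both endpoints included

# Each Period covers a whole month; to_timestamp picks one instant per period.
month_starts = months.to_timestamp(how="start")

# Going the other way, each timestamp collapses into the month that contains it.
back_to_periods = month_starts.to_period("M")

print(len(months))
print(month_starts[:3])
print(back_to_periods.equals(months))
```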

You can also create one directly with the PeriodIndex constructor:

    In [361]: pd.PeriodIndex(["2011-1", "2011-2", "2011-3"], freq="M")
    Out[361]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')

DateOffset

A DateOffset is the object behind a frequency. Like a Timedelta it represents a duration, but it follows calendar rules. For example, a Timedelta of one day is always exactly 24 hours, whereas a DateOffset of one day lands on the same clock time the next day, which may be 23, 24, or 25 hours later depending on daylight saving time.

    # This particular day contains a daylight saving time transition
    In [144]: ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")

    # Respects absolute time
    In [145]: ts + pd.Timedelta(days=1)
    Out[145]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')

    # Respects calendar time
    In [146]: ts + pd.DateOffset(days=1)
    Out[146]: Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki')

    In [147]: friday = pd.Timestamp("2018-01-05")

    In [148]: friday.day_name()
    Out[148]: 'Friday'

    # Add 2 business days (Friday --> Tuesday)
    In [149]: two_business_days = 2 * pd.offsets.BDay()

    # Addition is the portable form of the older two_business_days.apply(friday),
    # which was removed in pandas 2.0
    In [151]: friday + two_business_days
    Out[151]: Timestamp('2018-01-09 00:00:00')

    In [152]: (friday + two_business_days).day_name()
    Out[152]: 'Tuesday'
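Daylight saving time is not the only place where absolute and calendar durations diverge. The following sketch, with a start date chosen purely for illustration, compares a fixed 30-day Timedelta with two calendar-based offsets at a month boundary.

```python
import pandas as pd

start = pd.Timestamp("2021-01-31")

fixed = start + pd.Timedelta(days=30)        # exactly 30 * 24 hours later
calendar = start + pd.DateOffset(months=1)   # same day next month, clipped to month end
month_end = start + pd.offsets.MonthEnd(1)   # the next calendar month end

print(fixed.date(), calendar.date(), month_end.date())
```

The Timedelta overshoots into March because February 2021 has only 28 days, while both offsets stay inside February.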

DateOffsets are closely tied to frequency strings. The table below lists the available DateOffset classes and their associated frequency strings as used in this article; pandas 2.2 and later rename several of these aliases (for example 'M' becomes 'ME', 'H' becomes 'h', 'T' becomes 'min', and 'S' becomes 's').

    Date Offset                                Frequency String   Description
    DateOffset                                 None               generic offset class
    BDay or BusinessDay                        'B'                business day (weekday)
    CDay or CustomBusinessDay                  'C'                custom business day
    Week                                       'W'                one week, optionally anchored on a day of the week
    WeekOfMonth                                'WOM'              the x-th day of the y-th week of each month
    LastWeekOfMonth                            'LWOM'             the x-th day of the last week of each month
    MonthEnd                                   'M'                calendar month end
    MonthBegin                                 'MS'               calendar month begin
    BMonthEnd or BusinessMonthEnd              'BM'               business month end
    BMonthBegin or BusinessMonthBegin          'BMS'              business month begin
    CBMonthEnd or CustomBusinessMonthEnd       'CBM'              custom business month end
    CBMonthBegin or CustomBusinessMonthBegin   'CBMS'             custom business month begin
    SemiMonthEnd                               'SM'               15th (or other day of month) and calendar month end
    SemiMonthBegin                             'SMS'              15th (or other day of month) and calendar month begin
    QuarterEnd                                 'Q'                calendar quarter end
    QuarterBegin                               'QS'               calendar quarter begin
    BQuarterEnd                                'BQ'               business quarter end
    BQuarterBegin                              'BQS'              business quarter begin
    FY5253Quarter                              'REQ'              retail (aka 52-53 week) quarter
    YearEnd                                    'A'                calendar year end
    YearBegin                                  'AS' or 'BYS'      calendar year begin
    BYearEnd                                   'BA'               business year end
    BYearBegin                                 'BAS'              business year begin
    FY5253                                     'RE'               retail (aka 52-53 week) year
    Easter                                     None               Easter holiday
    BusinessHour                               'BH'               business hour
    CustomBusinessHour                         'CBH'              custom business hour
    Day                                        'D'                one absolute day
    Hour                                       'H'                one hour
    Minute                                     'T' or 'min'       one minute
    Second                                     'S'                one second
    Milli                                      'L' or 'ms'        one millisecond
    Micro                                      'U' or 'us'        one microsecond
    Nano                                       'N'                one nanosecond
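A frequency string and its offset class are interchangeable wherever a freq is accepted. The sketch below checks this for a couple of aliases whose spelling has stayed stable across pandas versions; to_offset is the pandas helper that turns an alias into an offset object.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

by_alias = pd.date_range("2018-01-05", periods=3, freq="B")
by_class = pd.date_range("2018-01-05", periods=3, freq=pd.offsets.BDay())

print(by_alias.equals(by_class))                     # same index either way
print(to_offset("B") == pd.offsets.BDay())           # alias resolves to the class
print(to_offset("MS") == pd.offsets.MonthBegin())
print(list(by_alias.day_name()))                     # the weekend is skipped
```

Passing the offset object instead of the string is also a simple way to avoid depending on alias spellings that changed between releases.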

DateOffset also has two methods, rollforward() and rollback(), which move a timestamp forward or backward to the nearest valid offset date:

    In [153]: ts = pd.Timestamp("2018-01-06 00:00:00")

    In [154]: ts.day_name()
    Out[154]: 'Saturday'

    # BusinessHour's valid offset dates are Monday through Friday
    In [155]: offset = pd.offsets.BusinessHour(start="09:00")

    # Bring the date to the closest offset date (Monday)
    In [156]: offset.rollforward(ts)
    Out[156]: Timestamp('2018-01-08 09:00:00')

    # Date is brought to the closest offset date first and then the hour is added
    In [157]: ts + offset
    Out[157]: Timestamp('2018-01-08 10:00:00')
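The same two methods work for other offsets as well. This minimal sketch uses BDay with illustrative timestamps to show both directions, and that a timestamp already on a valid date is left unchanged.

```python
import pandas as pd

bday = pd.offsets.BDay()
saturday = pd.Timestamp("2018-01-06 14:30")
friday = pd.Timestamp("2018-01-05 14:30")

print(bday.is_on_offset(saturday))   # a Saturday is not a business day
print(bday.rollforward(saturday))    # moves to the following Monday
print(bday.rollback(saturday))       # moves to the preceding Friday
print(bday.rollforward(friday))      # already valid, so nothing changes
```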

By default these operations preserve the hour, minute, and other time-of-day information. To reset the time to 00:00:00, call the normalize() method:

    In [158]: ts = pd.Timestamp("2014-01-01 09:00")

    In [159]: day = pd.offsets.Day()

    # ts + day is the portable form of the older day.apply(ts),
    # which was removed in pandas 2.0
    In [160]: ts + day
    Out[160]: Timestamp('2014-01-02 09:00:00')

    In [161]: (ts + day).normalize()
    Out[161]: Timestamp('2014-01-02 00:00:00')

    In [162]: ts = pd.Timestamp("2014-01-01 22:00")

    In [163]: hour = pd.offsets.Hour()

    In [164]: ts + hour
    Out[164]: Timestamp('2014-01-01 23:00:00')

    In [165]: (ts + hour).normalize()
    Out[165]: Timestamp('2014-01-01 00:00:00')

    In [166]: (pd.Timestamp("2014-01-01 23:30") + hour).normalize()
    Out[166]: Timestamp('2014-01-02 00:00:00')

Time as an index

Time values can be used as an index, and a time-based index comes with some very convenient features.

You can use a time directly to look up the corresponding data (here ts is a Series indexed by the business month ends of 2011):

    In [99]: ts["1/31/2011"]
    Out[99]: 0.11920871129693428

    In [100]: ts[datetime.datetime(2011, 12, 25):]
    Out[100]:
    2011-12-30    0.56702
    Freq: BM, dtype: float64

    In [101]: ts["10/31/2011":"12/31/2011"]
    Out[101]:
    2011-10-31    0.271860
    2011-11-30   -0.424972
    2011-12-30    0.567020
    Freq: BM, dtype: float64

Get data for the whole year:

    In [102]: ts["2011"]
    Out[102]:
    2011-01-31    0.119209
    2011-02-28   -1.044236
    2011-03-31   -0.861849
    2011-04-29   -2.104569
    2011-05-31   -0.494929
    2011-06-30    1.071804
    2011-07-29    0.721555
    2011-08-31   -0.706771
    2011-09-30   -1.039575
    2011-10-31    0.271860
    2011-11-30   -0.424972
    2011-12-30    0.567020
    Freq: BM, dtype: float64

Get data for a month:

    In [103]: ts["2011-6"]
    Out[103]:
    2011-06-30    1.071804
    Freq: BM, dtype: float64
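Because the sessions above use random data, here is a deterministic sketch of the same partial-string lookups. It assumes a made-up daily Series for 2011 and goes through .loc, which makes it explicit that the strings select index labels.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2011-01-01", "2011-12-31", freq="D")
daily = pd.Series(np.arange(len(idx)), index=idx)  # values 0..364, one per day

june = daily.loc["2011-06"]                      # every day in June
fourth_quarter = daily.loc["2011-10":"2011-12"]  # both slice endpoints are included
one_day = daily.loc["2011-06-15"]                # a full date selects a single value

print(len(june), len(fourth_quarter), one_day)
```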

A DataFrame accepts time strings in loc (here dft is a single-column DataFrame with a minute-frequency index starting at 2013-01-01):

    In [105]: dft
    Out[105]:
                                A
    2013-01-01 00:00:00  0.276232
    2013-01-01 00:01:00 -1.087401
    2013-01-01 00:02:00 -0.673690
    2013-01-01 00:03:00  0.113648
    2013-01-01 00:04:00 -1.478427
    ...                       ...
    2013-03-11 10:35:00 -0.747967
    2013-03-11 10:36:00 -0.034523
    2013-03-11 10:37:00 -0.201754
    2013-03-11 10:38:00 -1.509067
    2013-03-11 10:39:00 -1.693043

    [100000 rows x 1 columns]

    In [106]: dft.loc["2013"]
    Out[106]:
                                A
    2013-01-01 00:00:00  0.276232
    2013-01-01 00:01:00 -1.087401
    2013-01-01 00:02:00 -0.673690
    2013-01-01 00:03:00  0.113648
    2013-01-01 00:04:00 -1.478427
    ...                       ...
    2013-03-11 10:35:00 -0.747967
    2013-03-11 10:36:00 -0.034523
    2013-03-11 10:37:00 -0.201754
    2013-03-11 10:38:00 -1.509067
    2013-03-11 10:39:00 -1.693043

    [100000 rows x 1 columns]

Time slices work as well:

    In [107]: dft["2013-1":"2013-2"]
    Out[107]:
                                A
    2013-01-01 00:00:00  0.276232
    2013-01-01 00:01:00 -1.087401
    2013-01-01 00:02:00 -0.673690
    2013-01-01 00:03:00  0.113648
    2013-01-01 00:04:00 -1.478427
    ...                       ...
    2013-02-28 23:55:00  0.850929
    2013-02-28 23:56:00  0.976712
    2013-02-28 23:57:00 -2.693884
    2013-02-28 23:58:00 -1.575535
    2013-02-28 23:59:00 -1.573517

    [84960 rows x 1 columns]

Slices and exact match

Consider the following Series object, whose index has minute resolution:

    In [120]: series_minute = pd.Series(
       .....:     [1, 2, 3],
       .....:     pd.DatetimeIndex(
       .....:         ["2011-12-31 23:59:00", "2012-01-01 00:00:00", "2012-01-01 00:02:00"]
       .....:     ),
       .....: )
       .....:

    In [121]: series_minute.index.resolution
    Out[121]: 'minute'

A string that is coarser than the index resolution (here, one that stops at the hour) is treated as a slice, so a Series is returned:

    In [122]: series_minute["2011-12-31 23"]
    Out[122]:
    2011-12-31 23:59:00    1
    dtype: int64

A string at minute resolution or finer is treated as an exact match, so a scalar is returned:

    In [123]: series_minute["2011-12-31 23:59"]
    Out[123]: 1

    In [124]: series_minute["2011-12-31 23:59:00"]
    Out[124]: 1

Similarly, if the index has second resolution, a string coarser than seconds returns a Series, while a string that specifies the seconds returns a scalar.
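The second-resolution case can be sketched as follows. The Series below is a hypothetical counterpart to series_minute, with timestamps chosen so that only one row falls inside the 23:59 minute.

```python
import pandas as pd

series_second = pd.Series(
    [1, 2, 3],
    pd.DatetimeIndex(
        ["2011-12-31 23:59:59", "2012-01-01 00:00:00", "2012-01-01 00:00:01"]
    ),
)

minute_match = series_second.loc["2011-12-31 23:59"]     # coarser than the index: a slice
second_match = series_second.loc["2011-12-31 23:59:59"]  # same resolution: exact match

print(series_second.index.resolution)
print(type(minute_match).__name__, len(minute_match))
print(second_match)
```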

Operation of time series

Shifting

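The shift method moves the values of a time series forward or backward along its index:

    # rng is not shown in the original session; a three-day range matches the output below
    In [274]: rng = pd.date_range("2012-01-01", "2012-01-03")

    In [275]: ts = pd.Series(range(len(rng)), index=rng)

    In [276]: ts = ts[:5]

    In [277]: ts.shift(1)
    Out[277]:
    2012-01-01    NaN
    2012-01-02    0.0
    2012-01-03    1.0
    Freq: D, dtype: float64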
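A common use of shifting values is comparing each observation with the previous one. The sketch below uses made-up prices to compute a day-over-day change; the names are illustrative.

```python
import pandas as pd

prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0],  # made-up values for illustration
    index=pd.date_range("2012-01-01", periods=4, freq="D"),
)

previous = prices.shift(1)   # yesterday's value aligned with today's date
change = prices - previous   # the first entry is NaN because nothing precedes it

print(change)
```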

When freq is specified, shift moves the index labels by that frequency instead of moving the values, so no NaN is introduced:

    In [278]: ts.shift(5, freq="D")
    Out[278]:
    2012-01-06    0
    2012-01-07    1
    2012-01-08    2
    Freq: D, dtype: int64

    In [279]: ts.shift(5, freq=pd.offsets.BDay())
    Out[279]:
    2012-01-06    0
    2012-01-09    1
    2012-01-10    2
    dtype: int64

    In [280]: ts.shift(5, freq="BM")
    Out[280]:
    2012-05-31    0
    2012-05-31    1
    2012-05-31    2
    dtype: int64

Frequency conversion

The frequency of a time series can be converted by calling the asfreq method:

    In [281]: dr = pd.date_range("1/1/2010", periods=3, freq=3 * pd.offsets.BDay())

    In [282]: ts = pd.Series(np.random.randn(3), index=dr)

    In [283]: ts
    Out[283]:
    2010-01-01    1.494522
    2010-01-06   -0.778425
    2010-01-11   -0.253355
    Freq: 3B, dtype: float64

    In [284]: ts.asfreq(pd.offsets.BDay())
    Out[284]:
    2010-01-01    1.494522
    2010-01-04         NaN
    2010-01-05         NaN
    2010-01-06   -0.778425
    2010-01-07         NaN
    2010-01-08         NaN
    2010-01-11   -0.253355
    Freq: B, dtype: float64
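For a deterministic view of what asfreq does when it moves to a finer frequency, the following sketch uses a made-up Series sampled every three days and converts it to daily data. The fill_value argument shown at the end is one option for giving the newly created rows a constant instead of NaN.

```python
import pandas as pd

sparse = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2010-01-01", periods=3, freq="3D"),  # Jan 1, 4 and 7
)

dense = sparse.asfreq("D")                        # new rows appear as NaN
zero_filled = sparse.asfreq("D", fill_value=0.0)  # new rows get a constant instead

print(dense)
print(zero_filled)
```

The original observations are never altered; asfreq only decides which index labels exist in the result.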

asfreq can also take a fill method to apply after the frequency change:

    In [285]: ts.asfreq(pd.offsets.BDay(), method="pad")
    Out[285]:
    2010-01-01    1.494522
    2010-01-04    1.494522
    2010-01-05    1.494522
    2010-01-06   -0.778425
    2010-01-07   -0.778425
    2010-01-08   -0.778425
    2010-01-11   -0.253355
    Freq: B, dtype: float64

Resampling

A given time series can be resampled by calling the resample method:

    In [286]: rng = pd.date_range("1/1/2012", periods=100, freq="S")

    In [287]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)

    In [288]: ts.resample("5Min").sum()
    Out[288]:
    2012-01-01    25103
    Freq: 5T, dtype: int64
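Since the session above aggregates random numbers, here is a deterministic sketch of the same idea. It assumes one-minute bins rather than five-minute ones so that the 100 seconds of data fall into two groups, and it uses the lowercase "s" and "min" aliases.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-01-01", periods=100, freq="s")
ts = pd.Series(np.arange(100), index=rng)  # 0..99, one value per second

per_minute_sum = ts.resample("1min").sum()    # first bin holds 0..59, second 60..99
per_minute_mean = ts.resample("1min").mean()

print(per_minute_sum)
print(per_minute_mean)
```

Each bin is labeled by its left edge, so the two rows are stamped 00:00:00 and 00:01:00.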

The object returned by resample supports many aggregation methods, such as sum, mean, std, sem, max, min, median, first, last, and ohlc.

    In [289]: ts.resample("5Min").mean()
    Out[289]:
    2012-01-01    251.03
    Freq: 5T, dtype: float64

    In [290]: ts.resample("5Min").ohlc()
    Out[290]:
                open  high  low  close
    2012-01-01   308   460    9    205

    In [291]: ts.resample("5Min").max()
    Out[291]:
    2012-01-01    460
    Freq: 5T, dtype: int64
