In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
Editor to share with you how to filter, display and statistics by date in Pandas, I believe most people do not know much about it, so share this article for your reference. I hope you will gain a lot after reading this article. Let's learn about it together.
The running environment is windows system, 64-bit, python3.5.
1 read and organize the data
First introduce the pandas library
1. Import pandas as pd
Read data from a csv file
1. Df = pd.read_csv ('date.csv', header=None)
Print (df.head (2))
1. 0 1
2. 0 2013-10-24 3
3. 1 2013-10-25 4
Collate the data
1. Df.columns = ['date','number']
2. Df ['date'] = pd.to_datetime (df [' date']) # convert data type to date type
3. Df = df.set_index ('date') # set date to index
4. Print (df.head (2))
Print (df.tail (2))
6. Print (df.shape)
1. Number
2. Date
3. 2013-10-24 3
4. 2013-10-25 4
5. Number
6. Date
7. 2017-02-14 6
8. 2017-02-22 6
9. (425,1)
The total number of rows in df is 425.
View the data type of Dataframe
1. Print (type (df))
2. Print (df.index)
3. Print (type (df.index))
1.
2. DatetimeIndex (['2013-10-24,' 2013-10-25, '2013-10-29,' 2013-10-30'
3. 2013-11-04, 2013-11-06, 2013-11-08, 2013-11-12
4. '2013-11-14,' 2013-11-25'
5....
6. '2017-01-03,' 2017-01-07, '2017-01-14, 2017-01-17'
7. '2017-01-23,' 2017-01-25, '2017-01-26, 2017-02-07'
8. '2017-02-14,' 2017-02-22]
9. Dtype='datetime64 [ns]', name='date', length=425, freq=None)
10.
Construct Series type data
1. S = pd.Series (df ['number'], index=df.index)
Print (type (s))
3. S.head (2)
1.
two。
3. Date
4. 2013-10-24 3
5. 2013-10-25 4
6. Name: number, dtype: int64
2 filter data by date
Obtain data on an annual basis
1. Print ('- get data for 2013 -')
2. Print (df ['2013'] .head (2)) # get the data for 2013
3. Print (df ['2013'] .tail (2)) # get data for 2013
1.-get data for 2013-
2. Number
3. Date
4. 2013-10-24 3
5. 2013-10-25 4
6. Number
7. Date
8. 2013-12-27 2
9. 2013-12-30 2
Get data from 2016 to 2017
1. Print ('- get data from 2016 to 2017 -')
2. Print (df ['20162013 / 2017'] .head (2)) # get data from 2016 to 2017
3. Print (df ['20162013 / 2017'] .tail (2)) # get data from 2016 to 2017
1.-access to data from 2016 to 2017-
2. Number
3. Date
4. 2016-01-04 4
5. 2016-01-07 6
6. Number
7. Date
8. 2017-02-14 6
9. 2017-02-22 6
Get the data of a month
1. Print ('- get a month's data -')
2. Print (df ['2013-11']) # get the data of a month
1.-get a month's data-
2. Number
3. Date
4. 2013-11-04 1
5. 2013-11-06 3
6. 2013-11-08 1
7. 2013-11-12 5
8. 2013-11-14 2
9. 2013-11-25 1
10. 2013-11-29 1
Get the data of a specific day
Please note that the data of dataframe type is somewhat different from that of series when obtaining data for a specific day, as shown in the following code:
1. # filter data by date
2. Print ('- get data for a specific day -')
3. # get the data of a specific day
4. Print (s ['2013-11-06])
5.
6. # if you get the data of a specific day, you will report an error when you directly select it with datafrme, but there will be no problem with the data of series.
7. # print (df ['2013-11-06'])
8.
9. # you can consider using intervals to get data for a certain day
10. Print (df ['2013-11-06])
1.-get data for a specific day-
2. 3
3. Number
4. Date
5. 2013-11-06 3
The truncate function of dataframe can get the data before or after a certain period, or the data of a certain time interval.
However, it is generally recommended to use slice directly, which is more intuitive and convenient.
1. The truncate function of # dataframe can obtain data before or after a certain period, or data in a certain time interval.
2. # but it is generally recommended to use slice directly, which is more intuitive and convenient
3. Print ('- obtain data before or after a certain period -')
4. Print ('- after-')
5. Print (df.truncate (after = '2013-11'))
6. Print ('- before-')
7. Print (df.truncate (before='2017-02'))
1.-obtain data before or after a certain period of time-
2.-after-
3. Number
4. Date
5. 2013-10-24 3
6. 2013-10-25 4
7. 2013-10-29 2
8. 2013-10-30 1
9.-before-
10. Number
11. Date
12. 2017-02-07 8
13. 2017-02-14 6
14. 2017-02-22 6
3 display data by date
3.1 to_period () method
Please note that the data type of df.index is DatetimeIndex
The data type of df_peirod is PeriodIndex
Show by month, but do not count
1. Df_period = df.to_period ('M') # is displayed on a monthly basis, but does not count
2. Print (type (df_period))
3.
4. Print (type (df_period.index))
5. # Please note that the data type of df.index is DatetimeIndex
6. The data type of df_peirod is PeriodIndex
7.
8. Print (df_period.head ())
1.
two。
3. Number
4. Date
5. 2013-10 3
6. 2013-10 4
7. 2013-10 2
8. 2013-10 1
9. 2013-11 1
Show by quarter, but do not count
1. Print (df.to_period ('Q'). Head ()) # shows on a quarterly basis, but does not count
1. Number
2. Date
3. 2013Q4 3
4. 2013Q4 4
5. 2013Q4 2
6. 2013Q4 1
7. 2013Q4 1
Displayed by year, but not counted
1. Print (df.to_period ('A'). Head ()) # shows by year, but does not count
1. Number
2. Date
3. 2013 3
4. 2013 4
5. 2013 2
6. 2013 1
7. 2013 1
3.2 asfreq () method
Display by annual frequency
1. Df_period.index.asfreq ('A') #'A' defaults to 'AmurDECO', others such as' AmurJan'
1. PeriodIndex (['2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013)
2. '2013', '2013'
3....
4. '2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017
5. '2017years,' 2017']
6. Dtype='period [A-DEC]', name='date', length=425, freq='A-DEC')
1. Df_period.index.asfreq ('Amurjan') #'A' defaults to 'Amura DEC', others such as' Ameljan'
1. PeriodIndex ([2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014)
2. '2014', '2014'
3....
4. '2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2018
5. '2018', '2018']
6. Dtype='period [A-JAN]', name='date', length=425, freq='A-JAN')
For the display of annual frequency in different situations, please refer to the following figure:
Display by quarterly frequency
1. Df_period.index.asfreq ('Q') #'Q' defaults to 'QMurDEC', others such as "Q-SEP" and "Q-FEB"
1. PeriodIndex (['2013 Q4,' 2013Q4, '2013Q4,' 2013Q4, '2013Q4,' 2013Q4'
2. '2013 Q4,' 2013Q4, '2013Q4,' 2013Q4
3....
4. '2017Q1regions,' 2017Q1bands, '2017Q1bands,' 2017Q1bands, '2017Q1bands,' 2017Q1s
5. '2017Q1regions,' 2017Q1bands, '2017Q1bands,' 2017Q1']
6. Dtype='period [Q-DEC]', name='date', length=425, freq='Q-DEC')
1. Df_period.index.asfreq ('QmurSEP') # can display different quarterly fiscal years, "Q-SEP", "Q-FEB"
2. # df_period.index = df_period.index.asfreq ('QmurDEC') # can display different quarterly fiscal years, "Q-SEP", "Q-FEB"
3. # print (df_period.head ())
1. PeriodIndex (['2014 Q1,' 2014 Q1, '2014Q1,' 2014Q1, '2014Q1,' 2014Q1
2. '2014Q1customers,' 2014Q1bands, '2014Q1bands,' 2014Q1'
3....
4. '2017Q2regions,' 2017Q2bands, '2017Q2bands,' 2017Q2bands, '2017Q2bands,' 2017Q2'
5. '2017Q2,' 2017Q2, '2017Q2,' 2017Q2]
6. Dtype='period [Q-SEP]', name='date', length=425, freq='Q-SEP')
For the display of quarterly frequency in different situations, please refer to the following figure:
Display by monthly frequency
1. Df_period.index.asfreq ('M') # is displayed by month
1. PeriodIndex (['2013-10,' 2013-10, '2013-10,' 2013-10, '2013-11,' 2013-11'
2. '2013-11,' 2013-11, '2013-11, 2013-11'
3....
4. 2017-01, 2017-01, 2017-01, 2017-01, 2017-01, 2017-01
5. '2017-01,' 2017-02, '2017-02,' 2017-02]
6. Dtype='period [M]', name='date', length=425, freq='M')
Display by working day
Method 1
1. Df_period.index.asfreq ('dated, how='start') # is displayed by working date
1. PeriodIndex (['2013-10-01,' 2013-10-01, '2013-10-01,' 2013-10-01'
2. '2013-11-01,' 2013-11-01, '2013-11-01, 2013-11-01'
3. '2013-11-01,' 2013-11-01'
4....
5. '2017-01-02,' 2017-01-02, '2017-01-02,' 2017-01-02
6. '2017-01-02,' 2017-01-02, '2017-01-02, 2017-02-01'
7. '2017-02-01,' 2017-02-01']
8. Dtype='period [B]', name='date', length=425, freq='B')
Method 2
1. Df_period.index.asfreq ('dated, how='end') # is displayed by working date
2. '2013-11-29,' 2013-11-29, '2013-11-29, 2013-11-29'
4....
5. '2017-01-31,' 2017-01-31, '2017-01-31,' 2017-01-31
6. '2017-01-31,' 2017-01-31, '2017-01-31, 2017-02-28'
7. '2017-02-28,' 2017-02-28]
8. Dtype='period [B]', name='date', length=425, freq='B')
4 Statistics by date
4.1 Statistics by date
Weekly statistics
1. Print (df.resample ('w'). Sum (). Head ())
2. "w", week
1. Number
2. Date
3. 2013-10-27 7.0
4. 2013-11-03 3.0
5. 2013-11-10 5.0
6. 2013-11-17 7.0
7. 2013-11-24 NaN
Monthly statistics
1. Print (df.resample ('M'). Sum (). Head ())
2. "MS" is the first day of each month as the start date, and "M" is the last day of each month
1. Number
2. Date
3. 2013-10-31 10
4. 2013-11-30 14
5. 2013-12-31 27
6. 2014-01-31 16
7. 2014-02-28 4
Quarterly statistics
1. Print (df.resample ('Q'). Sum (). Head ())
2. "QS" is the first day of each quarter as the start date, and "Q" is the last day of each quarter
1. Number
2. Date
3. 2013-12-31 51
4. 2014-03-31 73
5. 2014-06-30 96
6. 2014-09-30 136
7. 2014-12-31 148
Annual statistics
1. Print (df.resample ('AS'). Sum ())
2. "AS" is the first day of the year as the start date, and "An is the last day of the year
1. Number
2. Date
3. 2013-01-01 51
4. 2014-01-01 453
5. 2015-01-01 743
6. 2016-01-01 1552
7. 2017-01-01 92
For the type of date, select the appropriate staging frequency as shown in the following figure:
4.2 after statistics by date, display by year or quarter or month
Annual statistics and display
1. Print (df.resample ('AS'). Sum (). To_period (' A'))
2. # Statistics and display by year
1. Number
2. Date
3. 2013 51
4. 2014 453
5. 2015 743
6. 2016 1552
7. 2017 92
Statistics by quarter and display
1. Print (df.resample ('Q'). Sum (). To_period ('Q'). Head ()
2. # Statistics and display by quarter
1. Number
2. Date
3. 2013Q4 51
4. 2014Q1 73
5. 2014Q2 96
6. 2014Q3 136
7. 2014Q4 148
Statistics and display by month
1. Print (df.resample ('M'). Sum (). To_period ('M'). Head ()
2. # Statistics and display by month
1. Number
2. Date
3. 2013-10 10
4. 2013-11 14
5. 2013-12 27
6. 2014-01 16
7. 2014-02 4
The above is all the contents of the article "how to filter, display and Statistics by date in Pandas". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.