2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces the main points to know about Pandas in Python, as a reference for interested readers.
Pandas is a library built on NumPy and created for data analysis tasks. It brings together a number of libraries and standard data models, and it provides the tools needed to work with large datasets efficiently. Its many functions and methods make data processing quick and convenient, and it is one of the main reasons Python is a powerful and productive environment for data analysis.
1. Introduction to pandas data structures
Series: a one-dimensional labelled array, similar to a one-dimensional NumPy array and to list, Python's basic sequence type. A Series can hold strings, Boolean values, numbers, and other data types.
Time-Series: a Series indexed by time.
DataFrame: a two-dimensional tabular data structure. Many of its features resemble data.frame in R. You can think of a DataFrame as a container for Series.
Panel: a three-dimensional array that can be understood as a container for DataFrame. Panel was deprecated and has been removed from current pandas releases.
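The relationship between Series and DataFrame described above can be sketched as follows. This is a minimal illustration; the variable names and values are made up for the example.

```python
import pandas as pd

# A Series is a one-dimensional labelled array.
population = pd.Series([1.5, 1.7, 3.6], index=["a", "b", "c"])
year = pd.Series([2000, 2001, 2002], index=["a", "b", "c"])

# A DataFrame holds several Series that share one index.
frame = pd.DataFrame({"pop": population, "year": year})

# Selecting a single column gives back a Series.
column = frame["pop"]
print(type(column).__name__)
print(frame.shape)
```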
2. Series operations
2.1 Object creation
2.1.1 Direct creation

    import pandas as pd
    import numpy as np

    # Create a Series directly from an array plus an explicit index
    s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
    print(s)

    OUT:
    a   -0.620323
    b   -0.189133
    c    1.677690
    d   -1.480348
    e   -0.539061
    dtype: float64

2.1.2 Dictionary creation

    # Create a Series from dict data; the keys become the index labels
    s = pd.Series({'a': 10, 'b': 20, 'c': 30}, index=['a', 'b', 'c'])
    print(s)

    OUT:
    a    10
    b    20
    c    30
    dtype: int64
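When a dict and an explicit index are passed together, the index decides which keys are kept and in what order. The sketch below illustrates this; the label 'z' is a hypothetical key that is absent from the dict.

```python
import numpy as np
import pandas as pd

data = {"a": 10, "b": 20, "c": 30}

# The index selects and orders the dict keys; "z" has no value, so it becomes NaN.
s = pd.Series(data, index=["c", "a", "z"])
print(s)

# Because NaN is a float, the whole Series is stored as float64.
print(s.dtype)
```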
2.2 Viewing data: slicing, indexing, dict-style access
Because a Series is a one-dimensional array-like structure, it supports array-style indexing and slicing. Its labels also behave like dict keys, so it supports dictionary-style lookup and membership tests as well.

    import pandas as pd
    import numpy as np

    s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
    print(s)
    # Positional index (s[0] on a labelled Series is deprecated in recent pandas)
    print('positional index s.iloc[0] = %s' % s.iloc[0])
    # Dictionary-style access by label
    print("dictionary access s['b'] = %s" % s['b'])
    # Slicing
    print('slice s[2:]:\n%s' % s[2:])
    # Membership tests on the labels (the first key is garbled in the source; 'a' is assumed)
    print('a' in s)
    print('K' in s)

    OUT:
    a   -0.799676
    b   -1.581704
    c   -1.240885
    d    0.623757
    e   -0.234417
    dtype: float64
    positional index s.iloc[0] = -0.799676067487
    dictionary access s['b'] = -1.58170351838
    slice s[2:]:
    c   -1.240885
    d    0.623757
    e   -0.234417
    dtype: float64
    True
    False
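To keep label-based and position-based access apart, pandas offers the explicit .loc and .iloc accessors. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

first = s.iloc[0]             # by position
by_label = s.loc["b"]         # by label
pos_slice = s.iloc[1:3]       # positions 1 and 2; the end position is excluded
label_slice = s.loc["b":"d"]  # labels b, c, d; the end label is included

print(first, by_label)
print(list(pos_slice.index), list(label_slice.index))
```

Note the difference in slice semantics: positional slices exclude the end, label slices include it.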
2.3 Arithmetic operations on Series

    import pandas as pd
    import numpy as np

    s1 = pd.Series(np.random.randn(3), index=['a', 'b', 'c'])
    s2 = pd.Series(np.random.randn(3), index=['a', 'b', 'c'])

    # Arithmetic is element-wise and matches values by index label
    print(s1 + s2)
    print(s1 - s2)
    print(s1 * s2)
    print(s1 / s2)

    OUT:
    a    0.236514
    b   -0.132153
    c    0.203186
    dtype: float64
    a    0.305397
    b   -1.474441
    c   -1.697982
    dtype: float64
    a   -0.009332
    b   -0.539128
    c   -0.710465
    dtype: float64
    a   -7.867120
    b   -1.196907
    c   -0.786252
    dtype: float64
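The example above uses two Series with identical labels. When the labels differ, pandas aligns on the index and fills unmatched positions with NaN. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Labels present in only one operand produce NaN.
total = s1 + s2
print(total)

# The method form accepts fill_value to treat a missing side as 0.
filled = s1.add(s2, fill_value=0)
print(filled)
```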
3. DataFrame operations
3.1 Object creation

    # assumes: import pandas as pd
    In [70]: data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        ...:         'year': [2000, 2001, 2002, 2001, 2002],
        ...:         'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

    In [71]: data
    Out[71]:
    {'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
     'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
     'year': [2000, 2001, 2002, 2001, 2002]}

    # Build the DataFrame object
    In [72]: frame1 = pd.DataFrame(data)

    # The leftmost column is the automatically generated index
    In [73]: frame1
    Out[73]:
       pop   state  year
    0  1.5    Ohio  2000
    1  1.7    Ohio  2001
    2  3.6    Ohio  2002
    3  2.4  Nevada  2001
    4  2.9  Nevada  2002

A DataFrame can also be built from plain lists:

    >>> lista = [1, 2, 5, 7]
    >>> listb = ['a', 'b', 'c', 'd']
    >>> df = pd.DataFrame({'col1': lista, 'col2': listb})
    >>> df
       col1 col2
    0     1    a
    1     2    b
    2     5    c
    3     7    d
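The constructor also accepts explicit column order and row labels. The sketch below uses a shortened version of the data and hypothetical row labels to show both options.

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Nevada"],
    "year": [2000, 2001, 2001],
    "pop": [1.5, 1.7, 2.4],
}

# columns fixes the column order; index replaces the default 0..n-1 labels.
frame = pd.DataFrame(data, columns=["year", "state", "pop"],
                     index=["one", "two", "three"])
print(frame)

# A new column can be added by assignment.
frame["large"] = frame["pop"] > 1.6
print(frame.loc["two", "large"])
```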
3.2 Select data
    In [1]: import numpy as np
       ...: import pandas as pd
       ...: # constructor arguments reconstructed from the output below
       ...: df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list('abc'))

    In [2]: df
    Out[2]:
        a   b   c
    0   0   2   4
    1   6   8  10
    2  12  14  16
    3  18  20  22
    4  24  26  28
    5  30  32  34
    6  36  38  40
    7  42  44  46
    8  48  50  52
    9  54  56  58

    # loc selects by label
    In [3]: df.loc[0, 'c']
    Out[3]: 4

    In [4]: df.loc[1:4, ['a', 'c']]
    Out[4]:
        a   c
    1   6  10
    2  12  16
    3  18  22
    4  24  28

    # iloc selects by integer position
    In [5]: df.iloc[0, 2]
    Out[5]: 4

    In [6]: df.iloc[1:4, [0, 2]]
    Out[6]:
        a   c
    1   6  10
    2  12  16
    3  18  22
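The outputs above differ by one row because loc includes the end label while iloc excludes the end position. The sketch below checks this on the same kind of frame and adds a Boolean-mask selection, which is not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list("abc"))

by_label = df.loc[1:4, ["a", "c"]]   # labels 1..4 inclusive: 4 rows
by_pos = df.iloc[1:4, [0, 2]]        # positions 1..3: 3 rows

# Boolean mask: keep the rows where column a exceeds 40.
large = df[df["a"] > 40]

print(len(by_label), len(by_pos))
print(list(large.index))
```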
3.3 Function application

    frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                         index=['Utah', 'Ohio', 'Texas', 'Oregon'])

    # NumPy element-wise functions work directly on a DataFrame
    np.abs(frame)

    OUT:
                   b         d         e
    Utah    0.204708  0.478943  0.519439
    Ohio    0.555730  1.965781  1.393406
    Texas   0.092908  0.281746  0.769023
    Oregon  1.246435  1.007189  1.296221

    # apply calls the function once per column by default
    f = lambda x: x.max() - x.min()
    frame.apply(f)

    OUT:
    b    1.802165
    d    1.684034
    e    2.689627
    dtype: float64

    # The applied function may also return a Series with several values
    def f(x):
        return pd.Series([x.min(), x.max()], index=['min', 'max'])

    frame.apply(f)

For reference, the values of frame rounded to two decimals:

                b     d      e
    Utah    -0.20  0.48  -0.52
    Ohio    -0.56  1.97   1.39
    Texas    0.09  0.28   0.77
    Oregon   1.25  1.01  -1.30
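By default apply passes each column to the function; axis=1 passes each row instead. A small sketch with fixed values, chosen so the results are easy to verify by hand:

```python
import pandas as pd

frame = pd.DataFrame(
    [[1.0, -2.0, 3.0], [4.0, 5.0, -6.0]],
    columns=list("bde"),
    index=["Utah", "Ohio"],
)

def spread(x):
    """Range of the values in one column or one row."""
    return x.max() - x.min()

per_column = frame.apply(spread)        # one result per column
per_row = frame.apply(spread, axis=1)   # one result per row

print(per_column)
print(per_row)
```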
3.4 Statistical overview and calculation
    df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                      index=['a', 'b', 'c', 'd'], columns=['one', 'two'])
    df

    OUT:
        one  two
    a  1.40  NaN
    b  7.10 -4.5
    c   NaN  NaN
    d  0.75 -1.3

    df.info()       # column types and non-null counts
    df.describe()   # summary statistics; NaN values are skipped

    Index: 4 entries, a to d
    Data columns (total 2 columns):
    one    3 non-null float64
    two    2 non-null float64
    dtypes: float64(2)
    memory usage: 256.0+ bytes

    OUT:
                one       two
    count  3.000000  2.000000
    mean   3.083333 -2.900000
    std    3.493685  2.262742
    min    0.750000 -4.500000
    25%    1.075000 -3.700000
    50%    1.400000 -2.900000
    75%    4.250000 -2.100000
    max    7.100000 -1.300000
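The counts in the describe() output are lower than the number of rows because reductions skip NaN by default. The sketch below reuses the same frame to show the default behaviour and the skipna=False alternative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

col_sum = df.sum()                       # per column, NaN skipped
row_sum = df.sum(axis=1, skipna=False)   # per row, NaN propagates
non_null = df.count()                    # number of non-NaN values per column

print(col_sum)
print(row_sum)
print(non_null)
```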
3.5 Data reading

    # Read a CSV file
    data = pd.read_csv('./dataset/HR.csv')
    data.info()

    OUT:
    RangeIndex: 14999 entries, 0 to 14998
    Data columns (total 10 columns):
    satisfaction_level       14999 non-null float64
    last_evaluation          14999 non-null float64
    number_project           14999 non-null int64
    average_montly_hours     14999 non-null int64
    time_spend_company       14999 non-null int64
    Work_accident            14999 non-null int64
    left                     non-null int64
    promotion_last_5years    14999 non-null int64
    sales                    14999 non-null object
    salary                   14999 non-null object
    dtypes: float64(2), int64(6), object(2)
    memory usage: 1.1+ MB

    # Read a file with a custom multi-character separator (requires the python engine)
    data = pd.read_csv('./dataset/movielens/movies.dat', header=None,
                       names=['name', 'types'], sep='::', engine='python')
    data.head()

    OUT:
                                     name                         types
    1                    Toy Story (1995)   Animation|Children's|Comedy
    2                      Jumanji (1995)  Adventure|Children's|Fantasy
    3             Grumpier Old Men (1995)                Comedy|Romance
    4            Waiting to Exhale (1995)                  Comedy|Drama
    5  Father of the Bride Part II (1995)                        Comedy

    # Read a sheet from an Excel workbook (sheet_name=1 is the second sheet)
    data = pd.read_excel('./dataset/my_excel.xlsx', sheet_name=1)
    data.head()

    OUT:
             date  H1  H2  H3
    0  2014-06-01   1   2   3
    1  2014-06-02   2   3   4
    2  2014-06-03   3   4   5
    3  2014-06-04   4   5   6
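The files above are not included with the article, so the following sketch reads the same '::'-separated layout from an in-memory string. The two sample rows and the extra 'id' column name are assumptions made for the example.

```python
import io

import pandas as pd

# Two rows in the movies.dat layout: id::title::genres
raw = (
    "1::Toy Story (1995)::Animation|Comedy\n"
    "2::Jumanji (1995)::Adventure|Fantasy\n"
)

# A multi-character separator is handled by the slower python engine.
movies = pd.read_csv(
    io.StringIO(raw),
    sep="::",
    header=None,
    names=["id", "name", "types"],
    engine="python",
)
print(movies)
```

Naming all three fields keeps the id as an ordinary column. With only two names, as in the article, pandas uses the first field as the index.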
4. Time-Series operations
Generate a date range:

    import pandas as pd

    # arguments reconstructed from the output below; the original call is cut off
    pd.date_range('2019-03-13', periods=10)

    OUT:
    DatetimeIndex(['2019-03-13', '2019-03-14', '2019-03-15', '2019-03-16',
                   '2019-03-17', '2019-03-18', '2019-03-19', '2019-03-20',
                   '2019-03-21', '2019-03-22'],
                  dtype='datetime64[ns]', freq='D')
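A date range becomes useful once it indexes a Series. The sketch below builds such a Series with made-up values, then selects by date and aggregates by week; the weekly resampling step goes beyond what the article shows.

```python
import pandas as pd

idx = pd.date_range("2019-03-13", periods=10, freq="D")
ts = pd.Series(range(10), index=idx)

# Select a single day and a date window (both endpoints are included).
one_day = ts.loc[pd.Timestamp("2019-03-15")]
window = ts.loc["2019-03-14":"2019-03-16"]

# Sum the daily values into weekly bins ending on Sunday.
weekly = ts.resample("W").sum()

print(one_day)
print(window)
print(weekly)
```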
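5. Plotting

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # 1000 daily rows of random steps in four columns
    ts = pd.DataFrame(np.random.randn(1000, 4),
                      index=pd.date_range('20180101', periods=1000),
                      columns=list('abcd'))
    ts = ts.cumsum()              # turn the steps into random walks
    ts.plot(figsize=(12, 8))      # one line per column
    plt.show()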
Thank you for reading. I hope this article on the main points to know about Pandas in Python has been helpful.