Updated 2025-03-26. From: SLTechnology News&Howtos
Shulou (Shulou.com), 06/01 report
This article walks through the data structures of Python's Pandas library, with examples. It is shared here as a practical reference.
1 Introduction to Pandas
A library that Wes McKinney began developing in 2008
An open-source Python library for data mining and analysis
Built on NumPy, so it benefits from NumPy's high-performance computation
Integrates with Matplotlib, which makes plotting straightforward
Provides its own data structures
NumPy can already process data, and together with Matplotlib it can handle data presentation, so why learn Pandas?
Labeled rows and columns make tables and charts easier to read
Convenient data processing
Easy file reading
Wraps the plotting of Matplotlib and the computation of NumPy
2 Pandas data structure
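Pandas has two core data structures, Series and DataFrame, plus MultiIndex, a hierarchical index. Older versions also provided a three-dimensional Panel container, which has since been removed; a Series or DataFrame with a MultiIndex now covers that use case.
A Series is a one-dimensional data structure, a DataFrame is a two-dimensional tabular data structure, and a MultiIndex lets either of them represent data with three or more dimensions.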
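As a rough illustration of the dimensionality described above, the sketch below builds one object of each kind and inspects it. The variable names and sample values are made up for illustration.

```python
import pandas as pd

# One-dimensional: a Series holds a single labeled sequence of values.
s = pd.Series([1.5, 2.5, 3.5])

# Two-dimensional: a DataFrame holds labeled rows and labeled columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# A MultiIndex is a multi-level index; this one has two levels.
mi = pd.MultiIndex.from_tuples([(2012, 1), (2014, 4)], names=["year", "month"])

print(s.ndim, df.ndim, mi.nlevels)
```

The `ndim` attribute reports the number of dimensions, and `nlevels` reports how many index levels a MultiIndex carries.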
2.1 Series
A Series is similar to a one-dimensional array. It can hold any data type (integers, strings, floating-point numbers, and so on) and consists of a set of values and their associated index.
2.1.1 Creating a Series
# Import pandas
import pandas as pd
pd.Series(data=None, index=None, dtype=None)
Parameters:
data: the input data, which can be an ndarray, a list, a dict, and so on.
index: the index labels, which must be the same length as the data and are normally unique (pandas does not enforce uniqueness). If no index is passed, an integer index from 0 to N-1 is created by default.
dtype: the data type of the values.
Create with an explicit index:
pd.Series([6.7, 5.6, 3, 10, 2], index=[1, 2, 3, 4, 5])
Create from a dictionary:
color_count = pd.Series({'red': 100, 'blue': 200, 'green': 500, 'yellow': 1000})
color_count
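The three ways of creating a Series described above can be sketched side by side. This is a minimal example with made-up variable names (`s_default`, `s_labeled`, `s_dict`), not part of the original article.

```python
import numpy as np
import pandas as pd

# No index given: pandas creates the default integer index 0..N-1.
s_default = pd.Series(np.arange(3))

# Explicit index and dtype: labels and values must have the same length.
s_labeled = pd.Series([6.7, 5.6, 3, 10, 2], index=[1, 2, 3, 4, 5], dtype="float64")

# From a dictionary: keys become the index labels, values become the data.
s_dict = pd.Series({"red": 100, "blue": 200})

print(list(s_default.index))
print(s_labeled.dtype)
print(s_dict["blue"])
```

Passing an index whose length differs from the data raises a ValueError, which is why the labeled example uses five labels for five values.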
2.1.2 Attributes of Series
To make it easier to work with the index and the data of a Series, the object provides two attributes, index and values.
1.index
color_count.index
# result (recent pandas versions keep the dictionary's insertion order)
Index(['red', 'blue', 'green', 'yellow'], dtype='object')
2.values
color_count.values
# result
array([ 100,  200,  500, 1000])
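To show that the index and the values stay aligned, the sketch below pairs them back up. It uses `to_numpy()`, which the pandas documentation suggests as an alternative to `.values` when a NumPy array is wanted; the `labels`, `values`, and `pairs` names are illustrative.

```python
import pandas as pd

color_count = pd.Series({"red": 100, "blue": 200, "green": 500, "yellow": 1000})

labels = list(color_count.index)   # index labels as a plain list
values = color_count.to_numpy()    # underlying data as a NumPy array

# Pair each label with its value to confirm the two stay aligned.
pairs = dict(zip(labels, values.tolist()))
print(pairs)
```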
You can also look up a value by its index label:
color_count['red']
# result
100
For positional access, use .iloc, for example color_count.iloc[0].
2.2 DataFrame
A DataFrame is a two-dimensional, table-like object (similar to an Excel sheet) with both a row index and a column index.
Row index: identifies the rows (the horizontal index); called index, axis 0 (axis=0)
Column index: identifies the columns (the vertical index); called columns, axis 1 (axis=1)
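The two axes matter most when an operation has to pick a direction. The sketch below, using a small made-up table, sums along each axis to show the difference.

```python
import pandas as pd

df = pd.DataFrame({"math": [90, 80], "english": [70, 60]}, index=["amy", "bob"])

# axis=0 works down the rows, producing one result per column.
per_column = df.sum(axis=0)

# axis=1 works across the columns, producing one result per row.
per_row = df.sum(axis=1)

print(per_column.to_dict())
print(per_row.to_dict())
```

The result of `axis=0` is labeled by the column names, and the result of `axis=1` is labeled by the row index.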
2.2.1 Creating a DataFrame
# Import pandas
import pandas as pd
pd.DataFrame(data=None, index=None, columns=None)
Parameters:
index: row labels. If no index is passed, an integer index from 0 to N-1 is created by default.
columns: column labels. If no columns are passed, integer labels from 0 to N-1 are created by default.
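A small sketch of the default labels versus explicit ones, assuming a 2x3 NumPy array as input. The names `raw`, `plain`, and `labeled` are illustrative.

```python
import numpy as np
import pandas as pd

raw = np.arange(6).reshape(2, 3)

# With no labels, rows and columns both get default integer labels.
plain = pd.DataFrame(raw)

# With labels, the lengths must match the data's shape (2 rows, 3 columns).
labeled = pd.DataFrame(raw, index=["r0", "r1"], columns=["a", "b", "c"])

print(list(plain.index), list(plain.columns))
print(labeled.loc["r1", "b"])
```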
Example: create a student score sheet
import numpy as np
# Generate scores for 10 students in 5 courses
score = np.random.randint(40, 100, (10, 5))
# result (the values are random, so they differ on every run)
array([[46, 93, 49, 70, 53],
       [42, 86, 65, 50, 87],
       [41, 74, 44, 87, 64],
       [62, 57, 45, 46, 86],
       [82, 46, 72, 85, 63],
       [82, 77, 61, 55, 41],
       [48, 41, 48, 52, 58],
       [90, 53, 95, 96, 78],
       [77, 49, 51, 76, 56],
       [79, 91, 75, 95, 66]])
In this form it is hard to tell what the data represents, so readability is poor.
Question: how can the data be made more meaningful?
# Use a Pandas data structure
score_df = pd.DataFrame(score)
Add row and column indexes:
# Build the column labels
subjects = ["Chinese", "mathematics", "English", "physics", "chemistry"]
# Build the row labels
stu = ['classmate' + str(i) for i in range(score.shape[0])]
# Add the row and column labels
data = pd.DataFrame(score, columns=subjects, index=stu)
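Because the scores are random, results change from run to run. The sketch below is one way to make the example repeatable; it swaps in NumPy's seeded `default_rng` generator (an assumption, not what the article uses) and then reads a single score by its labels.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random scores repeatable between runs.
rng = np.random.default_rng(seed=0)
score = rng.integers(40, 100, size=(10, 5))  # upper bound 100 is exclusive

subjects = ["Chinese", "mathematics", "English", "physics", "chemistry"]
stu = ["classmate" + str(i) for i in range(score.shape[0])]
data = pd.DataFrame(score, columns=subjects, index=stu)

# Labels now identify each value: one student's score in one subject.
print(data.loc["classmate3", "physics"])
```

With labels in place, `data.loc[row_label, column_label]` replaces the positional lookup `score[3, 3]` on the raw array.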
2.2.2 Properties of DataFrame
1.shape
data.shape
# result
(10, 5)
2.index
The row index of the DataFrame
data.index
# result
Index(['classmate0', 'classmate1', 'classmate2', 'classmate3', 'classmate4', 'classmate5', 'classmate6', 'classmate7', 'classmate8', 'classmate9'], dtype='object')
3.columns
The column index of the DataFrame
data.columns
# result
Index(['Chinese', 'mathematics', 'English', 'physics', 'chemistry'], dtype='object')
4.values
Get the underlying values directly as a NumPy array
data.values
# result
array([[46, 93, 49, 70, 53],
       [42, 86, 65, 50, 87],
       [41, 74, 44, 87, 64],
       [62, 57, 45, 46, 86],
       [82, 46, 72, 85, 63],
       [82, 77, 61, 55, 41],
       [48, 41, 48, 52, 58],
       [90, 53, 95, 96, 78],
       [77, 49, 51, 76, 56],
       [79, 91, 75, 95, 66]])
5.T
Transpose: swap the rows and columns
data.T
The result has the five subjects as rows and the ten students as columns.
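The effect of the transpose can be checked on a smaller table. This sketch uses a made-up 3x2 frame so the before and after shapes are easy to compare.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["s0", "s1", "s2"],
    columns=["math", "english"],
)

transposed = data.T  # rows become columns and columns become rows

print(data.shape, transposed.shape)
print(list(transposed.index), list(transposed.columns))
```

Each value keeps its pair of labels, only in the opposite order: `data.loc["s1", "english"]` and `transposed.loc["english", "s1"]` refer to the same value.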
6.head(5): shows the first 5 rows (very commonly used)
Without an argument, the default is 5 rows. Pass N to show the first N rows.
data.head(5)
7.tail(5): shows the last 5 rows
Without an argument, the default is 5 rows. Pass N to show the last N rows.
data.tail(5)
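A quick sketch of the default and explicit row counts, using a made-up ten-row frame:

```python
import pandas as pd

data = pd.DataFrame({"score": range(10)})

first_default = data.head()   # no argument: first 5 rows
first_three = data.head(3)    # first 3 rows
last_two = data.tail(2)       # last 2 rows

print(len(first_default), list(first_three["score"]), list(last_two["score"]))
```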
2.2.3 Setting the DataFrame index
1. Modify the row and column index values
stu = ["student_" + str(i) for i in range(score_df.shape[0])]
# The index must be replaced as a whole
data.index = stu
Note: assigning to a single index entry does not work:
# Wrong: Index objects are immutable, so this raises a TypeError
data.index[3] = 'student_3'
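The sketch below demonstrates the failure and two working alternatives: replacing the whole index, or using `rename` to change selected labels. The small frame and the label names are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({"score": [90, 80, 70]}, index=["a", "b", "c"])

try:
    data.index[1] = "student_1"  # Index objects are immutable
    assigned = True
except TypeError:
    assigned = False

# Replace the whole index instead, or rename selected labels.
data.index = ["student_0", "student_1", "student_2"]
renamed = data.rename(index={"student_2": "student_x"})

print(assigned, list(renamed.index))
```

`rename` returns a new DataFrame and leaves `data` unchanged, so it suits cases where only a few labels need to differ.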
2. Reset the index
reset_index(drop=False)
Replaces the index with a new default integer index
drop: defaults to False, which keeps the original index as a column. If True, the original index values are discarded.
# Reset the index with drop=False
data.reset_index()
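The difference between the two `drop` settings shows up in the resulting columns. A minimal sketch with a made-up two-row frame:

```python
import pandas as pd

data = pd.DataFrame({"score": [90, 80]}, index=["amy", "bob"])

kept = data.reset_index()              # old index becomes a column named "index"
dropped = data.reset_index(drop=True)  # old index is discarded

print(list(kept.columns))
print(list(dropped.columns), list(dropped.index))
```

Both calls return a new DataFrame; `data` itself keeps its original index unless the result is assigned back.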
3. Set a column as the new index
set_index(keys, drop=True)
keys: a column label, or a list of column labels
drop: boolean, default True. Removes the column(s) used as the new index from the DataFrame's columns
df = pd.DataFrame({'month': [1, 4, 7, 10], 'year': [2012, 2014, 2013, 2014], 'sale': [55, 40, 84, 31]})
df = df.set_index(['year', 'month'])
Note: after this call, df is a DataFrame with a MultiIndex.
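To see what the MultiIndex looks like in practice, the sketch below compares a single-key and a two-key `set_index` on the same sales data and then selects one row by giving a label for each level. The names `single` and `multi` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 4, 7, 10],
    "year": [2012, 2014, 2013, 2014],
    "sale": [55, 40, 84, 31],
})

single = df.set_index("month")           # one key: an ordinary index
multi = df.set_index(["year", "month"])  # two keys: a MultiIndex

print(type(multi.index).__name__, list(multi.index.names))
# Select one row by giving a label for each level.
print(multi.loc[(2014, 10), "sale"])
```

With the default `drop=True`, the key columns leave the column list, so `multi` has only the `sale` column left.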