Example Analysis of Python Pandas data structure

2025-07-19 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article walks through the Pandas data structures in Python, with examples. The editor finds the material very practical and shares it here for reference; I hope you get something out of it.

1 Pandas introduction

A library developed by Wes McKinney in 2008

An open-source Python library dedicated to data mining

Built on NumPy, so it benefits from NumPy's high computational performance

Built on matplotlib, so plotting is straightforward

Provides its own distinctive data structures

NumPy can already process data, and together with matplotlib it can handle data presentation and similar problems, so why learn pandas?

It improves the readability of tables and charts

It offers convenient data-processing capabilities

It makes reading files easy

It wraps the plotting and computation features of matplotlib and NumPy

2 Pandas data structure

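Pandas has two core data structures, Series and DataFrame. Older versions also provided a three-dimensional Panel container, which has since been removed; MultiIndex now covers that use case.

Series is a one-dimensional data structure, DataFrame is a two-dimensional tabular data structure, and MultiIndex is a hierarchical (multi-level) index that lets a Series or DataFrame represent data with three or more dimensions.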

2.1 Series

A Series is a data structure similar to a one-dimensional array. It can hold any type of data, such as integers, strings, or floating-point numbers, and consists mainly of a set of data and its associated index.

2.1.1 Creation of Series

    # Import pandas
    import pandas as pd
    pd.Series(data=None, index=None, dtype=None)

Parameters:

data: the incoming data, which can be an ndarray, a list, etc.

index: the index labels. They must have the same length as the data; duplicate labels are allowed, although unique labels are usually easier to work with. If no index parameter is passed in, an integer index from 0 to N-1 is created automatically.

dtype: the data type

Create with a specified index:

    pd.Series([6.7, 5.6, 3, 10, 2], index=[1, 2, 3, 4, 5])

Create from dictionary data:

    color_count = pd.Series({'red': 100, 'blue': 200, 'green': 500, 'yellow': 1000})
    color_count
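The ways of building a Series described above can be compared side by side. This is a minimal sketch; the variable names default_idx, labeled, and length_mismatch are illustrative and do not come from the original article.

```python
import pandas as pd

# No index passed: pandas creates a default integer index 0..N-1.
default_idx = pd.Series([6.7, 5.6, 3, 10, 2])

# Explicit index: it must have the same length as the data.
labeled = pd.Series([6.7, 5.6, 3, 10, 2], index=[1, 2, 3, 4, 5])

# From a dict: keys become the index, values become the data.
color_count = pd.Series({'red': 100, 'blue': 200, 'green': 500, 'yellow': 1000})

# An index whose length differs from the data is rejected.
try:
    pd.Series([1, 2, 3], index=['a', 'b'])
    length_mismatch = False
except ValueError:
    length_mismatch = True

print(default_idx.index.tolist())  # [0, 1, 2, 3, 4]
print(labeled.dtype)               # float64 (the integers are stored as floats)
print(color_count['green'])        # 500
print(length_mismatch)             # True
```

Mixing integers and floats in one list yields a float64 Series unless dtype is given explicitly.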

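2.1.2 Attributes of Series

To make it easier to work with the index and the data in a Series object, Series provides two attributes: index and values.

1. index

    color_count.index
    # result: Index(['red', 'blue', 'green', 'yellow'], dtype='object')

2. values

    color_count.values
    # result: array([ 100,  200,  500, 1000])

Note: recent pandas versions keep the dictionary's insertion order, as shown here. Older versions sorted the keys, giving blue, green, red, yellow.

Of course, you can also get data by position. Use iloc for this, because plain color_count[2] is deprecated for a label-indexed Series in recent pandas versions:

    color_count.iloc[2]
    # result: 500

2.2 DataFrame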

A DataFrame is an object similar to a two-dimensional array or a table (such as an Excel sheet), with both a row index and a column index.

Row index: identifies the rows; it is called index and runs along axis 0 (axis=0)

Column index: identifies the columns; it is called columns and runs along axis 1 (axis=1)
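The two axes are easiest to see with an aggregation. The sketch below uses a small made-up table (the labels a, b, x, y, z are illustrative) and sums it along each axis.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}, index=['x', 'y', 'z'])

# axis=0 works down the rows, producing one value per column.
col_totals = df.sum(axis=0)

# axis=1 works across the columns, producing one value per row.
row_totals = df.sum(axis=1)

print(col_totals.index.tolist(), col_totals.tolist())  # ['a', 'b'] [6, 60]
print(row_totals.index.tolist(), row_totals.tolist())  # ['x', 'y', 'z'] [11, 22, 33]
```

The result of an axis=0 operation is labeled by the column index, and the result of an axis=1 operation is labeled by the row index.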

2.2.1 Creation of DataFrame

    # Import pandas
    import pandas as pd
    pd.DataFrame(data=None, index=None, columns=None)

Parameters:

index: row labels. If no index parameter is passed in, an integer index from 0 to N-1 is created automatically.

columns: column labels. If no columns parameter is passed in, an integer index from 0 to N-1 is created automatically.

Example: create a student score sheet

    import numpy as np
    # Generate scores for 10 students in 5 courses
    score = np.random.randint(40, 100, (10, 5))
    score
    # result (random, so the values differ from run to run)
    array([[46, 93, 49, 70, 53],
           [42, 86, 65, 50, 87],
           [41, 74, 44, 87, 64],
           [62, 57, 45, 46, 86],
           [82, 46, 72, 85, 63],
           [82, 77, 61, 55, 41],
           [48, 41, 48, 52, 58],
           [90, 53, 95, 96, 78],
           [77, 49, 51, 76, 56],
           [79, 91, 75, 95, 66]])

In this form it is hard to tell what the data represents, so readability is poor.

Question: how can we make the data more meaningful?

    # Use the data structure in pandas
    score_df = pd.DataFrame(score)

Add row and column indexes:

    # Construct the column index sequence
    subjects = ["Chinese", "mathematics", "English", "physics", "chemistry"]
    # Construct the row index sequence
    stu = ['classmate' + str(i) for i in range(score.shape[0])]
    # Add the row and column indexes
    data = pd.DataFrame(score, columns=subjects, index=stu)
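The benefit of the labels shows up when reading a single value. The sketch below swaps in a small fixed array for the random scores so the result is reproducible; the 2 x 3 shape and the names by_position and by_label are illustrative.

```python
import numpy as np
import pandas as pd

# Fixed scores for 2 students in 3 courses (a stand-in for the random data).
score = np.array([[46, 93, 49],
                  [42, 86, 65]])

subjects = ["Chinese", "mathematics", "English"]
stu = ["classmate" + str(i) for i in range(score.shape[0])]
data = pd.DataFrame(score, columns=subjects, index=stu)

# With the bare array the reader must remember what row 1, column 2 means.
by_position = score[1, 2]

# With the DataFrame the same cell is addressed by its labels.
by_label = data.loc["classmate1", "English"]

print(by_position, by_label)  # 65 65
```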

2.2.2 Properties of DataFrame

1. shape

    data.shape
    # result: (10, 5)

2. index

The row index of the DataFrame

    data.index
    # result
    Index(['classmate0', 'classmate1', 'classmate2', 'classmate3', 'classmate4',
           'classmate5', 'classmate6', 'classmate7', 'classmate8', 'classmate9'],
          dtype='object')

3. columns

The column index of the DataFrame

    data.columns
    # result
    Index(['Chinese', 'mathematics', 'English', 'physics', 'chemistry'], dtype='object')

4. values

Get the underlying array of values directly

    data.values
    # result
    array([[46, 93, 49, 70, 53],
           [42, 86, 65, 50, 87],
           [41, 74, 44, 87, 64],
           [62, 57, 45, 46, 86],
           [82, 46, 72, 85, 63],
           [82, 77, 61, 55, 41],
           [48, 41, 48, 52, 58],
           [90, 53, 95, 96, 78],
           [77, 49, 51, 76, 56],
           [79, 91, 75, 95, 66]])

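5. T

Transpose

    data.T

The result is a DataFrame with 5 rows and 10 columns: the subjects become the row index and the students become the column index.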

6. head(5): displays the first 5 rows (very commonly used)

If no argument is given, the default is 5 rows. Pass N to display the first N rows.

    data.head(5)

7. tail(5): displays the last 5 rows

If no argument is given, the default is 5 rows. Pass N to display the last N rows.

    data.tail(5)
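The properties and methods listed in this section can be checked together on a deterministic table. This is a minimal sketch: the data comes from np.arange rather than random scores, and the column names a to e are illustrative.

```python
import numpy as np
import pandas as pd

# 10 rows x 5 columns holding the numbers 0..49.
data = pd.DataFrame(np.arange(50).reshape(10, 5), columns=list("abcde"))

print(data.shape)                    # (10, 5)
print(data.T.shape)                  # (5, 10)
print(data.T.index.tolist())         # ['a', 'b', 'c', 'd', 'e']
print(len(data.head()))              # 5
print(data.head(2).values.tolist())  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(data.tail(3).index.tolist())   # [7, 8, 9]
```

head and tail return new DataFrames that keep the original row labels, which is why tail(3) reports labels 7, 8, and 9.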

2.2.3 Setting the DataFrame index

1. Modify the row and column index values

    stu = ["student_" + str(i) for i in range(score_df.shape[0])]
    # The index must be replaced as a whole
    data.index = stu

Note: the following modification is wrong

    # Wrong: an Index does not support item assignment
    data.index[3] = 'student_3'  # raises TypeError
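The difference between the two approaches can be verified directly. The sketch below uses a small made-up table; the rename call at the end is an addition not covered in the original article, shown as one possible way to change a single label without touching the Index in place.

```python
import pandas as pd

data = pd.DataFrame({'score': [46, 42, 41]},
                    index=['classmate0', 'classmate1', 'classmate2'])

# Assigning to one position of the Index fails because Index is immutable.
try:
    data.index[1] = 'student_1'
    item_assignment_failed = False
except TypeError:
    item_assignment_failed = True

# Replacing the whole index works.
data.index = ['student_' + str(i) for i in range(data.shape[0])]

# rename returns a new DataFrame with only the mapped label changed.
renamed = data.rename(index={'student_1': 'student_one'})

print(item_assignment_failed)  # True
print(data.index.tolist())     # ['student_0', 'student_1', 'student_2']
print(renamed.index.tolist())  # ['student_0', 'student_one', 'student_2']
```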

2. Reset the index

reset_index(drop=False)

Sets a new integer index starting from 0

drop: defaults to False, which keeps the original index as a column. If True, the original index values are discarded.

    # Reset the index with the default drop=False
    data.reset_index()
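The effect of the drop parameter is clearest when both settings are applied to the same table. A minimal sketch on made-up data; the names kept and dropped are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'score': [46, 42]}, index=['student_0', 'student_1'])

# drop=False (default): the old index is kept as a column named 'index'.
kept = data.reset_index()

# drop=True: the old index is discarded.
dropped = data.reset_index(drop=True)

print(kept.columns.tolist())     # ['index', 'score']
print(kept['index'].tolist())    # ['student_0', 'student_1']
print(dropped.columns.tolist())  # ['score']
print(dropped.index.tolist())    # [0, 1]
```

reset_index returns a new DataFrame in both cases, so data itself keeps its original labels.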

3. Set a column as the new index

set_index(keys, drop=True)

keys: a column name or a list of column names

drop: boolean, default True. Whether to delete the column(s) that become the new index.

    df = pd.DataFrame({'month': [1, 4, 7, 10],
                       'year': [2012, 2014, 2013, 2014],
                       'sale': [55, 40, 84, 31]})
    df = df.set_index(['year', 'month'])

Note: after this step, the DataFrame has a MultiIndex as its row index.
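A short sketch of what the resulting MultiIndex allows, using the same month, year, and sale data. The variable names indexed and kept are illustrative, and the lookups shown are just one way to read from a multi-level index.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# Two key columns produce a two-level MultiIndex.
indexed = df.set_index(['year', 'month'])

print(isinstance(indexed.index, pd.MultiIndex))  # True
print(list(indexed.index.names))                 # ['year', 'month']

# A full (year, month) key selects a single row.
print(indexed.loc[(2014, 10), 'sale'])           # 31

# A first-level key selects every row for that year.
print(indexed.loc[2014]['sale'].tolist())        # [40, 31]

# drop=False keeps the key column in the table as well.
kept = df.set_index('year', drop=False)
print(kept.columns.tolist())                     # ['month', 'year', 'sale']
```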

This concludes the article on "Example Analysis of Python Pandas data structure". I hope the content above is of some help to you. If you found the article useful, please share it so more people can see it.
