
Introduction to the Pandas data structures and how to create Series and DataFrame objects

2025-03-26 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/03

Many newcomers are unclear about the Pandas data structures and about how to create Series and DataFrame objects. This article explains both in detail for anyone who needs them.

Many Pandas tutorials online show how to load existing data (CSV or HDF5 files, for example) directly into Pandas data objects and then analyze it. Very often, however, we need to create Pandas data objects ourselves and fill them with data. Two common scenarios: we want to process existing data and generate a new Pandas data object, or we want to use the saving functions of Pandas (such as to_csv, to_json, to_hdf) to write data we have collected to storage. It is therefore important to understand the characteristics of Pandas objects and how to create them.

1. Two data types of Pandas

Pandas provides two main data structures, Series and DataFrame:

Series: a labeled one-dimensional array. It can hold many different types of data, but all values in one Series share a single data type.

DataFrame: a labeled two-dimensional array, that is, a table whose size can be changed. A DataFrame is made up of multiple Series, and each column is a Series.

In short, a Series is a collection of scalars, and a DataFrame is a collection of Series.

Consider the following example. Three one-dimensional Series store the name, age, and marks of several people, and each row corresponds to one person. Each cell in a Series (for example Prasadi, in the first row of the name Series) is a scalar. The 0, 1, 2, 3 in front of the rows are the index of the data, also called labels, which is why a Series is described as a labeled one-dimensional array.

As you can see, a Series stores only one type of data. The name Series holds strings, for example, while the age Series holds integers. To store name, age, and marks in one data structure, we need a DataFrame. A DataFrame resembles a table with rows and columns: its rows carry the same labels as the rows of the Series, and each column is one of the original Series.
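The relationship between the three Series and the DataFrame described above can be sketched as follows. Only the name Prasadi comes from the text; every other value is a made-up placeholder, and pd.concat is just one of several ways to combine Series column-wise.

```python
import pandas as pd

# Hypothetical sample values; only "Prasadi" appears in the article.
name = pd.Series(["Prasadi", "Alex", "Maria", "Chen"], name="name")
age = pd.Series([21, 22, 20, 23], name="age")
marks = pd.Series([81.5, 74.0, 90.0, 66.5], name="marks")

# Each Series carries exactly one dtype for all of its values.
print(name.dtype, age.dtype, marks.dtype)

# Combine the three Series side by side; they share the default labels 0..3.
df = pd.concat([name, age, marks], axis=1)
print(df)

# Selecting one column of the DataFrame gives back a Series.
age_column = df["age"]
print(type(age_column).__name__)
```

Each column keeps its own data type inside the DataFrame, which is what lets one table hold strings, integers, and floats together.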

2. Series type

As mentioned earlier, a Series can store many kinds of data: integers, strings, floating-point numbers, even Python objects. All values in one Series share a single data type, however. So how do we create a Series object?

From a NumPy array

One of the main uses of Pandas is data analysis, and Pandas itself is built on NumPy, so creating a Series from a NumPy array is very common.

import numpy as np
import pandas as pd
np_array = np.random.randn(5)
pd.Series(np_array, index=['a', 'b', 'c', 'd', 'e'])

The code above uses np.random.randn to generate a random NumPy array of length 5, and pd.Series then creates a Series from that array. The index argument sets the label of each row to a, b, c, d, e.
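A minimal sketch of how the labels behave once the Series exists. The seed is an arbitrary choice added here only so that repeated runs give the same numbers; it is not part of the original example.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only for repeatable output
values = np.random.randn(5)

s = pd.Series(values, index=["a", "b", "c", "d", "e"])
print(s)

# The same element can be reached by label or by position.
by_label = s["c"]
by_position = s.iloc[2]
print(by_label == by_position)

# Without an index argument, pandas falls back to the labels 0, 1, 2, ...
s_default = pd.Series(values)
print(list(s_default.index))

# The index must have the same length as the array.
try:
    pd.Series(values, index=["a", "b"])
except ValueError as exc:
    print("length mismatch:", exc)
```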

From a Python dictionary

In the example above, you may have noticed that a Series resembles the built-in Python dict type: the labels correspond to the keys of a dict, the data corresponds to the values, and the two are in one-to-one correspondence. It is natural, then, that a Series can be created directly from a Python dict.

d = {'b': 1, 'a': 0, 'c': 2}
pd.Series(d)

The code above first creates a Python dictionary d and then passes it to pd.Series to create a Series. In the result, the dictionary keys become the index and the dictionary values become the data.
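The key-to-label mapping can be checked with a short sketch. The second half passes an explicit index together with the dictionary, which goes slightly beyond the example above; the label z is a made-up key chosen to show what happens when a label has no matching entry.

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# Keys become labels; on current Python and pandas versions the
# insertion order of the dictionary is kept.
s = pd.Series(d)
print(s)

# With an explicit index, values are looked up by key.
# "z" is a hypothetical label with no entry in d, so it becomes NaN.
s_reindexed = pd.Series(d, index=["a", "b", "c", "z"])
print(s_reindexed)
```

Because NaN is a floating-point value, the second Series is stored as floats even though the dictionary held integers.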

From a scalar value

Besides the two approaches above, a Series can also be created from a single scalar value. Unlike the other two approaches, this one needs an explicit index.

pd.Series(5, index=['a', 'b', 'c', 'd', 'e'])

The code above passes the constant 5 together with the index a, b, c, d, e to pd.Series to create a Series object. Why is the index needed here? A Series has a length, which can be greater than 1, while a scalar always has length 1. The index controls the length of the generated Series, and the scalar 5 is repeated for every label. Without an index, pandas builds a Series of length 1. The result here is a Series of five entries, labeled a to e, each holding the value 5.
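A small sketch of how the index determines the length when the data is a scalar:

```python
import pandas as pd

# The scalar is repeated once per label.
s = pd.Series(5, index=["a", "b", "c", "d", "e"])
print(s)
print(len(s))

# With no index there is nothing to repeat over, so the result has one entry.
s_single = pd.Series(5)
print(len(s_single))
```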

The name attribute

When creating a Series, we can give it a name, which is stored in the name attribute of the Series. The rename method can change it later, as in the following code:

s = pd.Series(np.random.randn(5), name='this_is_name')
s

This creates a Series named this_is_name. Next, the rename method renames the Series to this_is_new_name:

s = s.rename('this_is_new_name')
s

In the output of both snippets, the name is printed below the values, next to the dtype: first this_is_name, then this_is_new_name.

So what is this name used for? As a preview: it comes into play with DataFrame (remember that a DataFrame is a collection of Series).
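One detail worth seeing in code: called with a plain string, rename returns a new Series and leaves the original untouched, which is why the snippet above assigns the result back to s. A minimal sketch, with the variable name renamed introduced here for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5), name="this_is_name")

# rename returns a new Series; s itself keeps its old name.
renamed = s.rename("this_is_new_name")

print(s.name)
print(renamed.name)
```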

3. DataFrame type

In section 1 we described a DataFrame as a two-dimensional table structure with rows and columns. Rows have row labels, and so that data can be looked up by column, each column also has a name, the column name. In the Series section we saw that a Series can represent one column of data. A DataFrame is a combination of multiple Series: each column corresponds to a Series, and each row can also be read as a Series. You can probably guess now what the name attribute of a Series is for: when a Series is used to create a column of a DataFrame, its name becomes the column name, and when a Series is used as a row, its name becomes the row label.

Next, let's look at how to create a DataFrame.

From a dictionary of one-dimensional NumPy arrays or Python lists

Think about it: if every value of a dictionary is an array or a list, the dictionary is in effect a table, and a DataFrame is exactly a table-structured data type. We can therefore create a DataFrame from such a dictionary, as in the following code.

d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

We pass the Python dictionary d to pd.DataFrame to create a new DataFrame, and the index argument sets its row labels. The result has two columns, one and two, and four rows labeled a, b, c, d.
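The sketch below builds the same DataFrame and then inspects it, to show where the dictionary keys and the index argument end up.

```python
import pandas as pd

d = {"one": [1., 2., 3., 4.], "two": [4., 3., 2., 1.]}
df = pd.DataFrame(d, index=["a", "b", "c", "d"])
print(df)

# Dictionary keys become column names; the index argument becomes row labels.
print(list(df.columns))
print(list(df.index))

# A column is a Series whose name is the column name.
col = df["one"]
print(col.name)

# A single cell is addressed by row label and column name.
cell = df.loc["b", "two"]
print(cell)
```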

From a Python list of records

Besides dictionaries, is there another way to represent tabular data? Yes: a Python list, as in the following code.

data = [(1, 2., 'Hello'), (2, 3., "World")]
pd.DataFrame(data)

A Python list such as data can represent a table, with each tuple forming one row. Unlike the dict shown earlier, a table written as a list carries no row or column names, so Pandas generates them automatically: the rows are labeled 0 and 1, and the columns 0, 1, and 2.

Automatically generated row and column names carry no meaning, of course. To make the data easier to work with, we can supply our own row labels or column names through the index and columns parameters of pd.DataFrame.
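A sketch of both variants. The column names id, score, word and the row labels row1, row2 are made up for illustration and do not come from the article.

```python
import pandas as pd

data = [(1, 2., "Hello"), (2, 3., "World")]

# Default labels: rows 0..1, columns 0..2.
df_default = pd.DataFrame(data)
print(df_default)

# Hypothetical labels, supplied through the index and columns parameters.
df_named = pd.DataFrame(
    data,
    index=["row1", "row2"],
    columns=["id", "score", "word"],
)
print(df_named)
print(df_named.loc["row2", "word"])
```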

From a Python list of dictionaries

What else can represent tabular data? A Python list whose elements are dictionaries can too, as in the following code.

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
pd.DataFrame(data)

Here data is a Python list, and each element of the list is a dictionary that describes one row. The dictionary keys become the column names a, b, and c. The first dictionary has no key c, so that cell of the first row is filled with NaN.

As before, the row labels and column names can be changed with the index and columns parameters.
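A minimal sketch of the missing-key behavior, followed by the index and columns parameters applied to the same data. The row labels first and second are placeholders chosen for this example.

```python
import pandas as pd

data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]

df = pd.DataFrame(data)
print(df)

# The first record has no "c", so that cell is NaN.
print(pd.isna(df.loc[0, "c"]))

# For a list of dictionaries, columns selects which keys to keep;
# index sets the row labels (hypothetical labels here).
df_subset = pd.DataFrame(data, index=["first", "second"], columns=["a", "b"])
print(df_subset)
```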

From Series

We keep saying that a DataFrame is a collection of Series (note that a Series can be either a row or a column of a DataFrame), so a DataFrame can also be created from Series, as in the following code.

s1 = pd.Series(np.random.randn(5), name='this_is_name')
df = pd.DataFrame(s1)
df

This uses the Series s1 to create a DataFrame with a single column, and the column is named this_is_name.

Recall the name attribute of Series mentioned earlier: it is used as the column name (or as the row label, as in the example below) when a DataFrame is created from Series. With two Series, we can also create a DataFrame as follows:

s1 = pd.Series(np.random.randn(5), name='this_is_name')
s2 = s1.rename('this_is_new_name')
df = pd.DataFrame([s1, s2])
df

Here the two Series named this_is_name and this_is_new_name are passed as a list, so each one becomes a row. The result is a DataFrame with two rows, labeled this_is_name and this_is_new_name, and five columns.
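The row-versus-column distinction can be made concrete with a short sketch. Using pd.concat with axis=1 to place Series side by side as columns is an addition here, not something the article itself shows.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.random.randn(5), name="this_is_name")
s2 = s1.rename("this_is_new_name")

# A single Series becomes one column, named after the Series.
single = pd.DataFrame(s1)
print(list(single.columns))

# A list of Series becomes rows; the names turn into row labels.
as_rows = pd.DataFrame([s1, s2])
print(as_rows.shape)
print(list(as_rows.index))

# concat along axis=1 places them as columns; the names turn into column names.
as_columns = pd.concat([s1, s2], axis=1)
print(as_columns.shape)
print(list(as_columns.columns))
```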

At this point, readers should have a good understanding of the data structures Pandas provides and be able to create their own Pandas objects to hold their own data. A common scenario: after collecting data with a crawler, we can store the unstructured data in a Pandas table. Note that data held in a Pandas data structure lives in memory and is lost when the program exits. To persist the data to disk or to a database, call one of the interfaces Pandas provides, such as to_csv, to_json, or to_hdf.
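A minimal sketch of persisting a DataFrame and loading it back. The file name data.csv and the temporary directory are choices made for this example; a real program would write to a path of its own.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"one": [1., 2., 3., 4.], "two": [4., 3., 2., 1.]},
    index=["a", "b", "c", "d"],
)

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")  # hypothetical file name
    df.to_csv(path)

    # index_col=0 tells read_csv that the first column holds the row labels.
    restored = pd.read_csv(path, index_col=0)

print(restored)

# Called without a path, to_json returns the JSON text as a string.
json_text = df.to_json()
print(json_text)
```

to_hdf works along the same lines but relies on the optional PyTables package, so it is left out of this sketch.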
