What are the most basic functions of pandas

2025-03-26 Update From: SLTechnology News&Howtos

This article explains the most basic functions of pandas, starting with reading and writing data and working up to joins, grouping, and row iteration.

Python is open source, which is great, but it comes with one of the inherent problems of open source: many packages do (or try to do) the same thing. If you are new to Python, it is hard to know which package is best for a particular task, and you need someone with experience to tell you. For data science, one package is absolutely essential: pandas.

The most interesting thing about pandas is that it wraps a lot of other packages. It is a core package that exposes functionality from many others. This is great because you often need only pandas to get the job done.

pandas is the Python equivalent of Excel: it works with tables (that is, DataFrames) and can apply all kinds of transformations to data, but it also offers much more.

If you are already familiar with Python, you can skip to the third paragraph.

Let's get started:

import pandas as pd

Don't ask why it's "pd" rather than "p"; that is simply the convention. Just use it :)

The most basic functions of pandas

Read data

data = pd.read_csv('my_file.csv')
data = pd.read_csv('my_file.csv', sep=';', encoding='latin-1', nrows=1000, skiprows=[2, 5])

sep is the delimiter. If you work with French data, the CSV delimiter used by Excel is ";", so you need to specify it explicitly. encoding='latin-1' is set so that French characters are read correctly. nrows=1000 reads only the first 1000 rows. skiprows=[2, 5] skips lines 2 and 5 of the file (line numbers are counted from 0).

Most commonly used functions: read_csv, read_excel

Some other great features: read_clipboard, read_sql
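
As a minimal sketch of the options above, the snippet below reads from an in-memory text buffer instead of my_file.csv. The sample rows are made up for illustration, and encoding is left out because the buffer is already decoded text.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content standing in for my_file.csv.
raw = "name;city\nAmélie;Paris\nBob;London\nChloé;Lyon\nDan;Berlin\nEve;Rome\n"

# sep=";" sets the delimiter, nrows=3 keeps at most three data rows, and
# skiprows=[2] drops the file line at 0-based position 2 (the header is line 0).
data = pd.read_csv(io.StringIO(raw), sep=";", nrows=3, skiprows=[2])
print(data)
```

Here the line for "Bob" is skipped, and the next three data rows are kept.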

Write data

data.to_csv('my_new_file.csv', index=False)

index=False writes the data as it is. If you leave it out, the file gets an extra first column containing the row index (0, 1, 2, 3, and so on down to the last row, for the default index).

I don't usually use the other writers, such as .to_excel, .to_json, and .to_pickle, because .to_csv does the job well and CSV is the most common way to save tables.
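
The effect of the index argument can be seen without touching the disk, since to_csv returns the CSV text when no path is given. This is a small sketch with a made-up two-row table.

```python
import pandas as pd

# Made-up sample table for illustration.
data = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

# With no path argument, to_csv returns the CSV content as a string.
with_index = data.to_csv()
without_index = data.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index column
print(without_index.splitlines()[0])  # header contains only the data columns
```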

Check the data

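data.shape

Gives the number of rows and columns as a tuple: (# rows, # columns)

data.describe()

Computes basic statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns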
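
A short sketch of both calls on a made-up table with one numeric and one text column. By default, describe() summarizes only the numeric column.

```python
import pandas as pd

# Made-up sample table for illustration.
data = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

print(data.shape)          # (number of rows, number of columns)

summary = data.describe()  # statistics for numeric columns only
print(summary)
```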

View data

data.head(3)

Prints the first three rows of the data. Similarly, .tail() returns the last rows.

data.loc[8]

Prints the row whose index label is 8 (with the default index, which starts at 0, that is the ninth row)

data.loc[8, 'column_1']

Prints the value in the column named "column_1" for the row labeled 8

data.loc[range(4, 6)]

Subset of the rows labeled 4 and 5 (range is closed on the left and open on the right, so 6 is excluded)
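
The selections above can be sketched on a made-up ten-row table with the default index (labels 0 to 9), which makes it easy to see that .loc works with labels rather than row counts.

```python
import pandas as pd

# Made-up table: ten rows with default index labels 0..9.
data = pd.DataFrame({"column_1": list("abcdefghij")})

first_three = data.head(3)        # rows labeled 0, 1, 2
last_two = data.tail(2)           # rows labeled 8, 9
row_8 = data.loc[8]               # the whole row labeled 8, as a Series
value = data.loc[8, "column_1"]   # a single cell
subset = data.loc[range(4, 6)]    # rows labeled 4 and 5

print(value)
print(subset)
```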

Basic functions of pandas

Logical operation

data[data['column_1'] == 'french']
data[(data['column_1'] == 'french') & (data['year_born'] == 1990)]
data[(data['column_1'] == 'french') & (data['year_born'] == 1990) & ~(data['city'] == 'London')]

A subset of the data is obtained through logical operations. To combine conditions with & (AND), ~ (NOT), and | (OR), you must wrap each condition in parentheses, "(" and ")".

data[data['column_1'].isin(['french', 'english'])]

Instead of chaining several ORs on the same column, you can use the .isin() function.
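
A minimal sketch of these filters on a made-up four-row table. The column names follow the examples above and the values are invented.

```python
import pandas as pd

# Made-up sample table for illustration.
data = pd.DataFrame({
    "column_1": ["french", "english", "french", "german"],
    "year_born": [1990, 1990, 1985, 1990],
    "city": ["Paris", "London", "Lyon", "Berlin"],
})

# Each comparison yields a boolean Series; parentheses are required around
# every condition because & and ~ bind more tightly than ==.
french = data[data["column_1"] == "french"]
french_1990 = data[(data["column_1"] == "french") & (data["year_born"] == 1990)]
born_1990_not_london = data[(data["year_born"] == 1990) & ~(data["city"] == "London")]

# .isin() replaces (col == "french") | (col == "english").
either = data[data["column_1"].isin(["french", "english"])]
print(either)
```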

Basic drawing

The matplotlib package makes this feature possible. As mentioned in the introduction, it can be used directly from pandas.

data['column_numerical'].plot()

[Figure: example of .plot() output]

data['column_numerical'].hist()

Draws the distribution of the data (histogram)

[Figure: example of .hist() output]

%matplotlib inline

If you are using Jupyter, don't forget to run the line above before plotting.

Update data

data.loc[8, 'column_1'] = 'english'

Sets the value of "column_1" to "english" in the row labeled 8

data.loc[data['column_1'] == 'french', 'column_1'] = 'French'

Changes the value in several rows with a single line of code
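
Both kinds of update can be sketched on a made-up three-row column: one assignment by row label, and one assignment through a boolean mask that touches every matching row.

```python
import pandas as pd

# Made-up sample column for illustration.
data = pd.DataFrame({"column_1": ["french", "german", "french"]})

# Update a single cell, addressed by row label and column name.
data.loc[1, "column_1"] = "english"

# Update every row where the condition holds.
data.loc[data["column_1"] == "french", "column_1"] = "French"

print(data)
```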

OK, you can now do the things that are easy to do in Excel. Let's look at some more powerful operations that Excel cannot handle.

Intermediate functions

Count the number of occurrences

data['column_1'].value_counts()

[Figure: example of .value_counts() output]
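
In place of the missing output, here is a small sketch on made-up values. The result is a Series indexed by the distinct values and sorted from most to least frequent.

```python
import pandas as pd

# Made-up sample column for illustration.
data = pd.DataFrame({
    "column_1": ["french", "english", "french", "german", "french"],
})

counts = data["column_1"].value_counts()
print(counts)
```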

Operate on all rows, columns, or all data

data['column_1'].map(len)

The len() function is applied to every element of the "column_1" column.

The .map() operation applies a function to each element in a column.

data['column_1'].map(len).map(lambda x: x / 100).plot()

A nice feature of pandas is method chaining (https://tomaugspurger.github.io/method-chaining). It lets you perform several operations (here .map() and .plot()) in a single line, more simply and efficiently.

data.apply(sum)

.apply() applies a function to each column.

.applymap() applies a function to every cell in the DataFrame. (From pandas 2.1 onward, DataFrame.applymap is deprecated in favor of DataFrame.map.)
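
The three levels (element of a column, whole column, every cell) can be sketched on made-up data as follows. The hasattr check is only an assumption-driven convenience so the same snippet runs on pandas versions before and after the rename of the cell-wise method.

```python
import pandas as pd

# Made-up sample data for illustration.
data = pd.DataFrame({"column_1": ["ab", "cde"]})
numbers = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# Series.map: one call per element of the column.
lengths = data["column_1"].map(len)
scaled = lengths.map(lambda x: x / 100)   # chained, as in the example above

# DataFrame.apply: one call per column (the function receives a Series).
column_sums = numbers.apply(sum)

# Cell-wise application: DataFrame.map where available, else applymap.
cellwise = numbers.map if hasattr(numbers, "map") else numbers.applymap
doubled = cellwise(lambda x: x * 2)

print(lengths.tolist(), column_sums.to_dict())
print(doubled)
```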

tqdm, the one and only

When working with large datasets, pandas can take a while to run operations such as .map(), .apply(), and .applymap(). tqdm is a package that shows a progress bar and an estimate of when these operations will finish (yes, I lied when I said we would only use pandas).

from tqdm import tqdm_notebook
tqdm_notebook().pandas()

Sets up tqdm with pandas (recent tqdm releases recommend importing from tqdm.notebook instead)

data['column_1'].progress_map(lambda x: x.count('e'))

Use .progress_map() in place of .map(). The same works for .apply() and .applymap() with .progress_apply() and .progress_applymap().

[Figure: the progress bar shown in Jupyter when using tqdm with pandas]

Correlation and scatter matrix

data.corr()
data.corr().applymap(lambda x: int(x * 100) / 100)

.corr() gives the correlation matrix. The second line truncates each coefficient to two decimal places to make the matrix easier to read.
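
A minimal sketch on made-up, perfectly correlated columns. It uses .round(2) to shorten the coefficients, which is a swapped-in alternative to the int(x * 100) / 100 expression above: it rounds to the nearest value instead of truncating.

```python
import pandas as pd

# Made-up data: y rises with x, z falls as x rises.
data = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 3, 2, 1],
})

corr = data.corr()        # pairwise Pearson correlation between columns
rounded = corr.round(2)   # two decimals, for readability
print(rounded)
```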

pd.plotting.scatter_matrix(data, figsize=(12, 8))

[Figure: example of a scatter matrix] It plots every pairwise combination of columns in a single figure.

Advanced operations in pandas

SQL-style joins

Joining tables in pandas is very, very simple.

data.merge(other_data, on=['column_1', 'column_2', 'column_3'])

Joining on three columns takes only one line of code
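
A small sketch of such a join on two made-up tables. With the default how="inner", only rows whose three key columns match in both tables are kept.

```python
import pandas as pd

# Made-up sample tables for illustration.
data = pd.DataFrame({
    "column_1": ["a", "b", "c"],
    "column_2": [1, 2, 3],
    "column_3": ["x", "y", "z"],
    "left_value": [10, 20, 30],
})
other_data = pd.DataFrame({
    "column_1": ["a", "b", "d"],
    "column_2": [1, 2, 4],
    "column_3": ["x", "y", "w"],
    "right_value": [100, 200, 400],
})

# Inner join on the three shared key columns.
merged = data.merge(other_data, on=["column_1", "column_2", "column_3"])
print(merged)
```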

Grouping

Grouping is not that simple at first. You need to master the syntax, and then you will find yourself using it all the time.

data.groupby('column_1')['column_2'].apply(sum).reset_index()

Groups by one column and applies a function to another column within each group. .reset_index() turns the result back into a regular table.
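
A minimal sketch of the grouping pattern on made-up data. It uses the built-in .sum() aggregation, which gives the same totals as .apply(sum) here.

```python
import pandas as pd

# Made-up sample table for illustration.
data = pd.DataFrame({
    "column_1": ["a", "b", "a", "b"],
    "column_2": [1, 2, 3, 4],
})

# Group rows by column_1, sum column_2 within each group, then move the
# group labels from the index back into an ordinary column.
totals = data.groupby("column_1")["column_2"].sum().reset_index()
print(totals)
```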

As explained earlier, chain your method calls on one line to keep the code compact.

Row iteration

dictionary = {}
for i, row in data.iterrows():
    dictionary[row['column_1']] = row['column_2']

.iterrows() loops over the rows with two variables: the row index and the row data (i and row above)
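
The loop can be sketched on a made-up two-row table. The last line shows a swapped-in alternative, building the same mapping with zip over the two columns, which avoids creating a Series for every row.

```python
import pandas as pd

# Made-up sample table for illustration.
data = pd.DataFrame({"column_1": ["a", "b"], "column_2": [1, 2]})

# iterrows() yields (index label, row as a Series) pairs.
dictionary = {}
for i, row in data.iterrows():
    dictionary[row["column_1"]] = row["column_2"]

# Alternative without an explicit row loop.
same_mapping = dict(zip(data["column_1"], data["column_2"]))

print(dictionary)
```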

All in all, pandas is one of the reasons why Python is such an excellent programming language.

I could have shown more interesting pandas features, but what is covered here is enough to understand why data scientists can't live without pandas. To sum up, pandas has the following advantages:

Easy to use: it hides the complex, abstract computations behind a simple interface

Intuitive

Fast: if not the fastest, it is still very fast

It helps data scientists read and understand data quickly and work more efficiently.
