2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains the most basic functions of pandas. The explanations are kept simple and clear so that they are easy to follow.
Python is open source, which is great, but it comes with an inherent problem of open source: many packages do (or try to do) the same thing. If you are new to Python, it is hard to know which package is best for a particular task, and you need someone with experience to tell you. For data science, one package is absolutely essential: pandas.
The most interesting thing about pandas is that it hides many other packages inside it. It is a core package that exposes features from other packages, which is great because you can often get the job done using pandas alone.
pandas is the equivalent of Excel in Python: it works with tables (called DataFrames) and can apply all kinds of transformations to the data, along with much more.
If you are already familiar with Python, you can skip to the third paragraph.
Let's get started:
import pandas as pd
Don't ask why it's "pd" and not "p"; that's just the convention. Just use it :)
The most basic functions of pandas
Read data
data = pd.read_csv('my_file.csv')
data = pd.read_csv('my_file.csv', sep=';', encoding='latin-1', nrows=1000, skiprows=[2, 5])
sep is the delimiter. If you work with French data, the CSV delimiter used by Excel is ";", so you need to specify it explicitly. encoding is set to latin-1 to read French characters. nrows=1000 reads only the first 1000 rows. skiprows=[2, 5] skips lines 2 and 5 of the file (line numbers are 0-indexed) when reading.
Most commonly used functions: read_csv, read_excel
Some other great features: read_clipboard, read_sql
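The read_csv options above can be tried without a file on disk. The following is a minimal sketch: the in-memory bytes and the column names "name" and "city" are made up for illustration and stand in for my_file.csv.

```python
import io

import pandas as pd

# Hypothetical contents standing in for my_file.csv:
# semicolon-delimited and encoded as latin-1.
raw_bytes = (
    "name;city\n"      # line 0: header
    "Amélie;Paris\n"   # line 1
    "Benoît;Lyon\n"    # line 2 (skipped below)
    "Chloé;Nice\n"     # line 3
    "Denis;Lille\n"    # line 4
    "Émile;Brest\n"    # line 5 (skipped below)
    "Fanny;Metz\n"     # line 6
).encode("latin-1")

data = pd.read_csv(
    io.BytesIO(raw_bytes),
    sep=";",              # delimiter used in the file
    encoding="latin-1",   # decode accented characters correctly
    nrows=1000,           # read at most 1000 data rows
    skiprows=[2, 5],      # skip file lines 2 and 5 (0-indexed)
)
print(data)
```

Because skiprows counts lines of the file starting at 0, the header is line 0 and the rows for Lyon and Brest are the ones dropped here.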
Write data
data.to_csv('my_new_file.csv', index=False)
index=False writes the data as it is. If you omit it, the file gets an extra first column holding the row index: 0, 1, 2, 3, and so on up to the last row.
I don't usually use the other writers, such as .to_excel, .to_json, and .to_pickle, because .to_csv does the job well and CSV is the most common way to save tables.
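To see what the index argument changes, to_csv can be called without a path, in which case it returns the CSV text as a string. This is a small sketch with a made-up two-row table; the column names are illustrative only.

```python
import pandas as pd

# Hypothetical table used only for this illustration.
data = pd.DataFrame({"column_1": ["french", "english"], "year_born": [1990, 1985]})

# With no path argument, to_csv returns the CSV content as a string.
with_index = data.to_csv()               # default: the row index is written too
without_index = data.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index column name
print(without_index.splitlines()[0])  # header holds only the real columns
```

The first header line begins with a comma because the index column has no name, which is the "extra first column" you see when the file is reopened.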
Check the data
data.shape
Gives the number of rows and columns as a tuple: (# rows, # columns)
data.describe()
Calculate basic statistics
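As a quick illustration of these two checks, here is a sketch on a small made-up table (the values are invented for the example). Note that describe() summarizes only the numeric columns by default.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({
    "column_1": ["french", "english", "french", "german"],
    "year_born": [1990, 1985, 1992, 1978],
})

n_rows, n_cols = data.shape   # shape is an attribute, not a method
summary = data.describe()     # count, mean, std, min, quartiles, max

print(n_rows, n_cols)
print(summary)
```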
View data
data.head(3)
Prints the first three rows of the data. Similarly, .tail() returns the last rows.
data.loc[8]
Prints the row whose index label is 8
data.loc[8, 'column_1']
Prints the value in row 8 of the column named "column_1"
data.loc[range(4, 6)]
Subset of the rows labeled 4 and 5 (range(4, 6) includes 4 and excludes 6)
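The selection calls above can be checked on a toy table. This sketch assumes the default integer index (0, 1, 2, ...), so a label passed to .loc matches the row's position; the data itself is invented.

```python
import pandas as pd

# Hypothetical 10-row table with the default index 0..9.
data = pd.DataFrame({"column_1": list("abcdefghij"), "column_2": range(10)})

first_three = data.head(3)         # rows labeled 0, 1, 2
last_two = data.tail(2)            # rows labeled 8, 9
row_8 = data.loc[8]                # one row, returned as a Series
cell = data.loc[8, "column_1"]     # a single value
subset = data.loc[range(4, 6)]     # rows labeled 4 and 5

print(cell)
print(subset)
```

.loc selects by label, not by position. If the index is not the default one, data.iloc selects by position instead.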
Basic functions of pandas
Logical operations
data[data['column_1'] == 'french']
data[(data['column_1'] == 'french') & (data['year_born'] == 1990)]
data[(data['column_1'] == 'french') & (data['year_born'] == 1990) & ~(data['city'] == 'London')]
Logical operations select a subset of the data. To combine conditions with & (AND), ~ (NOT), and | (OR), you must wrap each condition in parentheses, "(" before and ")" after.
data[data['column_1'].isin(['french', 'english'])]
Instead of writing several OR conditions on the same column, use the .isin() function.
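The filters above can be traced on a few rows. The table below is made up for illustration; the column names follow the ones used in the text.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({
    "column_1": ["french", "french", "english", "german", "french"],
    "year_born": [1990, 1985, 1990, 1990, 1990],
    "city": ["Paris", "Lyon", "London", "Berlin", "London"],
})

french = data[data["column_1"] == "french"]
french_1990 = data[(data["column_1"] == "french") & (data["year_born"] == 1990)]
not_london = data[
    (data["column_1"] == "french")
    & (data["year_born"] == 1990)
    & ~(data["city"] == "London")
]

# .isin() replaces a chain of OR conditions on the same column.
with_isin = data[data["column_1"].isin(["french", "english"])]
with_or = data[(data["column_1"] == "french") | (data["column_1"] == "english")]

print(not_london)
```

The parentheses matter because & and | bind more tightly than == in Python, so an unparenthesized condition is parsed differently and typically raises an error.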
Basic plotting
The matplotlib package makes this feature possible. As mentioned in the introduction, it can be used directly from pandas.
data['column_numerical'].plot()
Example of .plot() output
data['column_numerical'].hist()
Plots the distribution of the data (histogram)
Example of .hist() output
%matplotlib inline
If you are using Jupyter, don't forget to run the line above before plotting.
Update data
data.loc[8, 'column_1'] = 'english'
Replaces the value in row 8 of the column named "column_1" with "english"
data.loc[data['column_1'] == 'french', 'column_1'] = 'French'
Changes the values of multiple rows in one line of code
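Both kinds of update can be followed on a toy column. This is a sketch on invented data with the default integer index, so label 8 exists.

```python
import pandas as pd

# Hypothetical 10-row column: "french" at even labels, "german" at odd ones.
data = pd.DataFrame({"column_1": ["french", "german"] * 5})

# Update a single cell, addressed by row label and column name.
data.loc[8, "column_1"] = "english"

# Update every row matching a condition in one statement.
data.loc[data["column_1"] == "french", "column_1"] = "French"

print(data["column_1"].tolist())
```

Row 8 is no longer "french" when the second statement runs, so only rows 0, 2, 4, and 6 are capitalized.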
OK, so far you can do things that are also easy to do in Excel. Let's dig into some amazing operations that Excel cannot do.
Intermediate functions
Count the number of occurrences
data['column_1'].value_counts()
Example of .value_counts() output
Apply a function to every row, every column, or all the data
data['column_1'].map(len)
The len() function is applied to each element of the "column_1" column.
The .map() operation applies a function to each element of a column.
data['column_1'].map(len).map(lambda x: x / 100).plot()
A nice feature of pandas is method chaining (https://tomaugspurger.github.io/method-chaining). It lets you perform multiple operations (here .map() and .plot()) in a single line, which keeps the code simple and efficient.
data.apply(sum)
.apply() applies a function to each column.
.applymap() applies a function to every cell in the DataFrame. (Since pandas 2.1, .applymap() is deprecated in favor of DataFrame.map().)
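The counting, mapping, and chaining steps can be sketched on a small invented table. One assumption to note: .apply(sum) is restricted to the numeric columns "a" and "b" here (names made up for the example), because the built-in sum cannot add strings.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({
    "column_1": ["french", "english", "french"],
    "a": [1, 2, 3],
    "b": [10, 20, 30],
})

counts = data["column_1"].value_counts()   # occurrences of each distinct value
lengths = data["column_1"].map(len)        # len() applied element by element

# Method chaining: each call returns a new object that the next call works on.
scaled = data["column_1"].map(len).map(lambda x: x / 100)

# .apply() on a DataFrame calls the function once per column.
totals = data[["a", "b"]].apply(sum)

print(counts)
print(totals)
```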
tqdm, the one and only
When working with large datasets, pandas can take a while to run operations such as .map(), .apply(), and .applymap(). tqdm is a package that helps predict when these operations will finish (yes, I lied: I said we would only use pandas).
from tqdm import tqdm_notebook
tqdm_notebook().pandas()
Sets up tqdm with pandas
data['column_1'].progress_map(lambda x: x.count('e'))
Replace .map() with .progress_map(); the same goes for .apply() and .applymap().
The progress bar you get in Jupyter with tqdm and pandas
Correlation and scatter matrix
data.corr()
data.corr().applymap(lambda x: int(x * 100) / 100)
.corr() gives the correlation matrix. The second line truncates each coefficient to two decimal places.
pd.plotting.scatter_matrix(data, figsize=(12, 8))
Example of a scatter matrix. It plots every pairwise combination of columns in a single figure.
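A correlation matrix is easiest to read on data where the answer is known in advance. In this sketch the columns are invented so that y rises exactly with x and z falls exactly with x. It uses .round(2) in place of the truncating lambda; that is a swapped-in technique that rounds rather than truncates.

```python
import pandas as pd

# Hypothetical numeric data: y = 2 * x, and z decreases as x increases.
data = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]})

corr = data.corr()        # pairwise Pearson correlation between columns
rounded = corr.round(2)   # two decimals; rounds instead of truncating

print(rounded)
```

The matrix is symmetric with ones on the diagonal, so only the values above or below the diagonal carry information.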
Advanced operations in pandas
SQL-style joins
Joining tables is very, very simple in pandas.
data.merge(other_data, on=['column_1', 'column_2', 'column_3'])
Joining on three columns takes only one line of code.
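A join on several key columns can be sketched as follows. The two tables and the names left_value and right_value are hypothetical. By default merge performs an inner join, and the how argument selects other join types.

```python
import pandas as pd

# Hypothetical tables sharing the key columns column_1 and column_2.
left = pd.DataFrame({
    "column_1": [1, 2, 3],
    "column_2": ["a", "b", "c"],
    "left_value": [10, 20, 30],
})
right = pd.DataFrame({
    "column_1": [1, 2, 4],
    "column_2": ["a", "x", "d"],
    "right_value": [100, 200, 400],
})

# Inner join (default): keep rows whose keys match in both tables.
merged = left.merge(right, on=["column_1", "column_2"])

# Left join: keep every row of `left`, filling missing matches with NaN.
left_joined = left.merge(right, on=["column_1", "column_2"], how="left")

print(merged)
print(left_joined)
```

Only the pair (1, "a") appears in both tables, so the inner join has one row while the left join keeps all three rows of the left table.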
Grouping
It is not that simple at first: you need to master the syntax first, and then you will find yourself using this feature all the time.
data.groupby('column_1')['column_2'].apply(sum).reset_index()
Groups by one column and selects another column on which to run a function. .reset_index() reshapes the result back into a table.
As explained earlier, chain your function calls on one line to keep the code concise.
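The grouping chain can be followed step by step on a few invented rows. The built-in aggregation .sum() is shown alongside as a common alternative to .apply(sum).

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({
    "column_1": ["a", "b", "a", "b", "c"],
    "column_2": [1, 2, 3, 4, 5],
})

# Group rows by column_1, sum column_2 within each group,
# then turn the group labels back into a regular column.
grouped = data.groupby("column_1")["column_2"].apply(sum).reset_index()

# Same result using the built-in aggregation method.
grouped_builtin = data.groupby("column_1")["column_2"].sum().reset_index()

print(grouped)
```

Without .reset_index(), the result is a Series indexed by the group labels rather than a two-column table.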
Row iteration
dictionary = {}
for i, row in data.iterrows():
    dictionary[row['column_1']] = row['column_2']
.iterrows() loops over two variables together: the row index and the row data (i and row above).
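The loop above can be run on a tiny invented table. For this particular task, a non-looping alternative with zip builds the same dictionary and is shown for comparison.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({"column_1": ["a", "b", "c"], "column_2": [1, 2, 3]})

# Row-by-row iteration: i is the index label, row is a Series for that row.
dictionary = {}
for i, row in data.iterrows():
    dictionary[row["column_1"]] = row["column_2"]

# Alternative without an explicit row loop.
dictionary_zip = dict(zip(data["column_1"], data["column_2"]))

print(dictionary)
```

Row iteration tends to be slow on large tables, so it is usually kept for cases that cannot be expressed with column-wise operations.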
All in all, pandas is one of the reasons why Python is such an excellent programming language.
I could have shown many more interesting pandas features, but what is covered here is enough to understand why data scientists can't live without pandas. To sum up, pandas has the following advantages:
Easy to use: it hides all the complex and abstract computation behind the scenes
Intuitive
Fast: if not the fastest, it is still very fast
It helps data scientists read and understand data quickly, which makes them more productive