
What are the Pandas functions?

2025-01-16 Update From: SLTechnology News&Howtos


This article walks through commonly used Pandas functions with practical examples. Many people run into these situations in real work, so each function below is paired with a short snippet showing how to handle them.

1. Setup

If you want to run these examples yourself, download the Anime Recommendations Database dataset from Kaggle, extract it, and put it in the same folder as your Jupyter notebook.

Then run these instructions and you should be able to reproduce the results of any of the functions below.

import pandas as pd
import numpy as np

anime = pd.read_csv('anime-recommendations-database/anime.csv')
rating = pd.read_csv('anime-recommendations-database/rating.csv')
anime_modified = anime.set_index('name')

2. Input

Read a CSV (comma-separated values) file

Converts a CSV file directly into a DataFrame. Sometimes you also need to specify an encoding (for example, encoding='ISO-8859-1') to load the data. If the DataFrame contains unreadable characters, this is the first thing to try.

For spreadsheet files, there is a similar function called pd.read_excel.

anime = pd.read_csv('anime-recommendations-database/anime.csv')
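
As a minimal sketch of the encoding fallback described above, the following writes a tiny ISO-8859-1 file to a temporary folder and reads it back. The file name and the sample row are assumptions made up for illustration; they are not part of the Kaggle dataset.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: "é" is the single byte 0xE9 in ISO-8859-1,
# which is not valid UTF-8 on its own.
raw_bytes = "name,rating\nPokémon,7.3\n".encode("ISO-8859-1")

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")
    with open(csv_path, "wb") as fh:
        fh.write(raw_bytes)

    try:
        # read_csv assumes UTF-8 by default and rejects the 0xE9 byte.
        sample = pd.read_csv(csv_path)
        used_encoding = "utf-8"
    except UnicodeDecodeError:
        # Fall back to the encoding mentioned in the text.
        sample = pd.read_csv(csv_path, encoding="ISO-8859-1")
        used_encoding = "ISO-8859-1"

print(used_encoding)
print(sample)
```

Catching UnicodeDecodeError and retrying is only one possible approach; if you already know the file's encoding, passing it directly is simpler.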

Build a DataFrame from input data

This is useful when you want to instantiate simple data by hand, so you can easily see how the data changes as it moves through your code.

# sample values are illustrative
df = pd.DataFrame([[1, 'Bob', 'Builder'],
                   [2, 'Sally', 'Baker'],
                   [3, 'Scott', 'Candle Stick Maker']],
                  columns=['id', 'name', 'occupation'])

df.head()

Copy a DataFrame

Copying a DataFrame is useful if you want to keep the original intact while making changes. It is good practice to copy a DataFrame as soon as it is loaded.

anime_copy = anime.copy(deep=True)
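
To illustrate why the copy matters, here is a small sketch that changes a value in the copy and checks that the original is untouched. The two-row DataFrame is a made-up stand-in for the anime data.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame.
anime_small = pd.DataFrame({"name": ["A", "B"], "rating": [8.5, 7.0]})

anime_copy = anime_small.copy(deep=True)
anime_copy.loc[0, "rating"] = 0.0  # modify only the copy

print(anime_small.loc[0, "rating"])  # original value is unchanged
print(anime_copy.loc[0, "rating"])   # the copy holds the new value
```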

3. View and verify

Get the top or bottom n records

Displays the first or last n records in the DataFrame. I usually print the top records of a DataFrame somewhere in the notebook so that I can refer back to them if I forget what is inside.

anime.head(3)
rating.tail(1)

Count the number of rows

len() is not itself a pandas function, but it counts the rows, and the result can be saved in a variable for use elsewhere.

len(df)  # => 3

Count unique values

Counts the unique values in a column.

len(rating['user_id'].unique())
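
As an alternative sketch, pandas also provides Series.nunique(), which returns the count directly. The two approaches can differ when the column contains missing values, as the made-up data below shows: unique() keeps NaN as a value, while nunique() drops it by default.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings data with one missing user_id.
ratings_small = pd.DataFrame({"user_id": [1, 1, 2, 3, 3, np.nan]})

count_via_len = len(ratings_small["user_id"].unique())            # NaN counted
count_via_nunique = ratings_small["user_id"].nunique()            # NaN ignored
count_with_nan = ratings_small["user_id"].nunique(dropna=False)   # NaN counted

print(count_via_len, count_via_nunique, count_with_nan)
```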

Get DataFrame information

Useful for getting general information such as the column names, the number of non-null values, and the data type of each column. df.dtypes is similar but less informative: it is an attribute that returns only the column data types.

anime.info()

Get statistics

Getting statistics is useful if the DataFrame has many numeric values. Knowing the mean, minimum, and maximum of the rating column gives a general sense of how the DataFrame looks overall.

anime.describe()

Get counts of values

Gets the number of occurrences of each distinct value in a specific column.

anime.type.value_counts()
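
Since counting values and summing values are easy to confuse, the following sketch shows both side by side on a made-up four-row DataFrame (the column values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame.
anime_small = pd.DataFrame({
    "type": ["TV", "Movie", "TV", "OVA"],
    "episodes": [12, 1, 24, 2],
})

# value_counts: how many rows have each distinct value.
type_counts = anime_small["type"].value_counts()

# sum: the arithmetic total of a numeric column.
total_episodes = anime_small["episodes"].sum()

print(type_counts)
print(total_episodes)
```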

4. Output

Save to CSV format

This writes the file to the same directory as the notebook. I only save the first 10 rows below, but you do not need to do that. Similarly, you can use the df.to_excel() function to save the data as an Excel file.

rating[:10].to_csv('saved_ratings.csv', index=False)

5. Select

Get a list or Series of values for a column

This works when you need to pull the values in a column into X and y variables to fit a machine learning model.

anime['genre'].tolist()

anime['genre']

Get a list of index values

Creates a list of values from the index. Note that the anime_modified DataFrame is used here because its index values are more interesting.

anime_modified.index.tolist()

Get a list of column names

anime.columns.tolist()

6. Add / remove

Append a new column with a set value

I do this occasionally when the test set and training set are in two separate DataFrames and I want to mark which rows belong to which set before combining them.

anime['train set'] = True

Create a new DataFrame from a subset of columns

This is useful when you want to keep only a few columns from a giant DataFrame and do not want to list every column you would like to drop.

anime[['name', 'episodes']]

Drop specified columns

Useful when only a few columns need to be dropped. Otherwise, writing them all out can be tedious, and I prefer the previous option of selecting the columns to keep.

anime.drop(['anime_id', 'genre', 'members'], axis=1).head()

Add a row with the sum of the other rows

A small DataFrame is created by hand here because it is easier to inspect. The interesting part is df.sum(axis=0), which adds the values across rows and returns one total per column.

The same logic applies when calculating other aggregates such as the mean, for example df.mean(axis=0).

# sample values are illustrative
df = pd.DataFrame([[1, 'Bob', 8000],
                   [2, 'Sally', 9000],
                   [3, 'Scott', 20]], columns=['id', 'name', 'power level'])
# df.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([df, df.sum(axis=0).to_frame().T], ignore_index=True)
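
The axis argument is easy to mix up, so here is a small sketch on an all-numeric DataFrame (the column names a and b are made up) showing what axis=0 and axis=1 each return, and how a totals row can be appended with pd.concat.

```python
import pandas as pd

# Hypothetical numeric data.
scores = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 collapses the rows: one total per column.
col_totals = scores.sum(axis=0)

# axis=1 collapses the columns: one total per row.
row_totals = scores.sum(axis=1)

# Turn the per-column totals into a one-row DataFrame and append it.
with_total = pd.concat([scores, col_totals.to_frame().T], ignore_index=True)

print(col_totals)
print(row_totals)
print(with_total)
```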

7. Merge

Concatenate two DataFrames

Use this when you have two DataFrames with the same columns and want to combine them. Here, the DataFrame is split into two parts, which are then joined back together.

df1 = anime[0:2]
df2 = anime[2:4]
pd.concat([df1, df2], ignore_index=True)

Merge DataFrames

When you have two DataFrames and want to join them on a column, merge works like a SQL (Structured Query Language) join. Note that merge performs an inner join by default; pass how='left' to get the behavior of a SQL left join.

rating.merge(anime, left_on='anime_id', right_on='anime_id', how='left', suffixes=('_left', '_right'))
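
To make the difference between join types concrete, the sketch below merges two made-up DataFrames both ways. The rows and IDs are assumptions for illustration; anime_id 99 deliberately has no match.

```python
import pandas as pd

# Hypothetical stand-ins for the rating and anime DataFrames.
rating_small = pd.DataFrame({
    "user_id": [1, 1, 2],
    "anime_id": [10, 20, 99],   # 99 has no match in anime_small
    "rating": [8, 7, 9],
})
anime_small = pd.DataFrame({
    "anime_id": [10, 20],
    "name": ["A", "B"],
    "rating": [8.1, 7.4],
})

# Default join type is inner: unmatched rows are dropped.
inner = rating_small.merge(anime_small, on="anime_id",
                           suffixes=("_left", "_right"))

# how="left" keeps every row of the left DataFrame, filling gaps with NaN.
left = rating_small.merge(anime_small, on="anime_id", how="left",
                          suffixes=("_left", "_right"))

print(inner)
print(left)
```

Because both inputs have a rating column, the suffixes tell the two apart in the result (rating_left and rating_right).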

8. Filter

Retrieve rows that match index values

The index values in anime_modified are the names of the anime. Notice how these names are used to get specific rows.

anime_modified.loc[['Haikyuu!! Second Season', 'Gintama']]

Retrieve rows by numbered position

This differs from the previous function. With iloc, the first row has position 0, the second row has position 1, and so on, even if you have modified the DataFrame and now use string values in the index.

Use this when you want, for example, the first three rows of the DataFrame.

anime_modified.iloc[0:3]
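
The contrast between label-based and position-based selection can be sketched on a made-up three-row DataFrame indexed by name (the names and ratings are assumptions for illustration).

```python
import pandas as pd

# Hypothetical stand-in for anime_modified: the index holds strings.
anime_small = pd.DataFrame({
    "name": ["A", "B", "C"],
    "rating": [8.5, 7.0, 9.1],
}).set_index("name")

# loc selects by index label, in the order requested.
by_label = anime_small.loc[["C", "A"]]

# iloc selects by integer position, regardless of the labels.
by_position = anime_small.iloc[0:2]

print(by_label)
print(by_position)
```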

Get rows

Retrieves rows whose column value is in a given list. anime[anime['type'] == 'TV'] also works when matching a single value.

anime[anime['type'].isin(['TV', 'Movie'])]

Slice a DataFrame

This works just like slicing a list. Slice the DataFrame to get all rows before, between, or after particular positions.

anime[1:3]

Filter by value

Filters the DataFrame for rows that meet a condition. Note that this keeps the existing index values.

anime[anime['rating'] > 8]
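
The note about index values can be seen in a small sketch: after filtering, the surviving rows keep their original labels, and reset_index(drop=True) renumbers them if that is what you want. The four-row DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame.
anime_small = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "rating": [8.5, 7.0, 9.1, 6.2],
})

# The boolean mask keeps rows 0 and 2, along with their original labels.
high = anime_small[anime_small["rating"] > 8]

# Renumber from 0 and discard the old labels.
high_reset = high.reset_index(drop=True)

print(high.index.tolist())
print(high_reset.index.tolist())
```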

9. Sort

Sort with sort_values

Sorts the DataFrame by the values in a column.

anime.sort_values('rating', ascending=False)

10. Aggregate

Group and count

Counts the number of records for each distinct value in a column.

anime.groupby('type').count()

Group and aggregate columns in different ways

Note that I added reset_index(); otherwise the "type" column becomes the index. I recommend doing the same in most cases.

anime.groupby(["type"]).agg({"rating": "sum", "episodes": "count", "name": "last"}).reset_index()
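
The effect of reset_index() can be sketched on a made-up three-row DataFrame. The example also shows as_index=False, an alternative groupby option that keeps the grouping column as a regular column; which one to use is a matter of preference.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame.
anime_small = pd.DataFrame({
    "type": ["TV", "TV", "Movie"],
    "rating": [8.0, 7.0, 9.0],
    "episodes": [12, 24, 1],
    "name": ["A", "B", "C"],
})
agg_spec = {"rating": "sum", "episodes": "count", "name": "last"}

# Without reset_index, "type" becomes the index of the result.
indexed = anime_small.groupby("type").agg(agg_spec)

# reset_index moves "type" back into a regular column.
flat = anime_small.groupby("type").agg(agg_spec).reset_index()

# as_index=False produces the same layout in one step.
flat_alt = anime_small.groupby("type", as_index=False).agg(agg_spec)

print(indexed)
print(flat)
```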

Create a pivot table

There is nothing better than a pivot table for pulling a subset of data out of a DataFrame.

Note that I have heavily filtered the DataFrame so that the pivot table builds more quickly.

tmp_df = rating.copy()
tmp_df.sort_values('user_id', ascending=True, inplace=True)
tmp_df = tmp_df[tmp_df.user_id < 10]
tmp_df = tmp_df[tmp_df.anime_id < 30]
tmp_df = tmp_df[tmp_df.rating != -1]
pd.pivot_table(tmp_df, values='rating', index=['user_id'], columns=['anime_id'], aggfunc='sum', fill_value=0)

11. Clean

Set NaN cells to a value

Sets cells with NaN values to 0. In the example, I create the same pivot table as before, but instead of using fill_value=0, I fill the missing cells afterwards with fillna(0).

pivot = pd.pivot_table(tmp_df, values='rating', index=['user_id'], columns=['anime_id'], aggfunc='sum')
pivot.fillna(0)
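
As a small sketch of the two approaches, the following builds a pivot table from made-up ratings both ways and compares the values. User 2 has no rating for anime 20, so that cell is the one being filled.

```python
import pandas as pd

# Hypothetical ratings: user 2 never rated anime 20.
ratings_small = pd.DataFrame({
    "user_id": [1, 1, 2],
    "anime_id": [10, 20, 10],
    "rating": [8, 7, 9],
})

# Fill missing cells while building the pivot table.
pivot_fill_value = pd.pivot_table(ratings_small, values="rating",
                                  index="user_id", columns="anime_id",
                                  aggfunc="sum", fill_value=0)

# Build first (missing cells are NaN), then fill afterwards.
pivot_fillna = pd.pivot_table(ratings_small, values="rating",
                              index="user_id", columns="anime_id",
                              aggfunc="sum").fillna(0)

print(pivot_fill_value)
print(pivot_fillna)
```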

12. Other

Sample a DataFrame

I use this all the time to take a small sample from a larger DataFrame. With frac=1, it randomly shuffles the rows while retaining the index values.

anime.sample(frac=0.25)
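
Here is a minimal sketch of both uses on a made-up eight-row DataFrame. The random_state argument is an addition for reproducibility and is not part of the original call.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame.
anime_small = pd.DataFrame({"name": list("ABCDEFGH")})

# A 25% sample of 8 rows returns 2 rows.
quarter = anime_small.sample(frac=0.25, random_state=0)

# frac=1 returns every row in random order, keeping the original labels.
shuffled = anime_small.sample(frac=1, random_state=0)

# Renumber after shuffling if the old labels are not needed.
shuffled_reset = shuffled.reset_index(drop=True)

print(quarter)
print(shuffled.index.tolist())
```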

Iterate over rows

Iterates over the index and rows in the DataFrame.

for idx, row in anime[:2].iterrows():
    print(idx, row)
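
To show what each iteration yields, the sketch below collects the index and two fields per row from a made-up DataFrame. It also shows itertuples(), another pandas method for row iteration that yields named tuples instead of Series.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame.
anime_small = pd.DataFrame({"name": ["A", "B"], "rating": [8.5, 7.0]})

# iterrows yields (index label, row as a Series) pairs.
collected = []
for idx, row in anime_small.iterrows():
    collected.append((idx, row["name"], row["rating"]))

# itertuples yields one named tuple per row; the label is in .Index.
tuples = [(t.Index, t.name, t.rating) for t in anime_small.itertuples()]

print(collected)
print(tuples)
```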

Start Jupyter notebook

Starts Jupyter notebook with a very high data rate limit.

jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10
