Source: Shulou (Shulou.com), SLTechnology News&Howtos > Development. Report dated 06/02; updated 2025-01-16.
This article answers the question "What are the pandas functions?" through practical examples. Many people run into these situations in real work, so the sections below walk through how to handle each one.
1. Installation
If you want to run these examples yourself, download the Anime Recommendations Database dataset from Kaggle, extract it, and put it in the same folder as your Jupyter notebook.
Then run the following statements. You should then be able to reproduce the results of any of the functions below.
import pandas as pd
import numpy as np
anime = pd.read_csv('anime-recommendations-database/anime.csv')
rating = pd.read_csv('anime-recommendations-database/rating.csv')
anime_modified = anime.set_index('name')
2. Input
Read a CSV (comma-separated values) file
Converts the CSV directly to a DataFrame. Sometimes you also need to specify an encoding (for example, encoding='ISO-8859-1') to load the data. If the DataFrame contains unreadable characters, try this first.
For spreadsheet files, there is a similar function called pd.read_excel.
anime = pd.read_csv('anime-recommendations-database/anime.csv')
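The encoding advice above can be tried without any file on disk. This is a minimal sketch that assumes a small, made-up CSV held in memory; the column values are hypothetical and are not taken from the Kaggle dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content, encoded as ISO-8859-1 (Latin-1) instead of UTF-8
raw = "anime_id,name\n1,Pokémon\n2,Café Latte\n".encode("ISO-8859-1")

# Passing the matching encoding lets pandas decode the accented characters
latin = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(latin)
```

The same encoding argument applies when the first argument is a file path rather than an in-memory buffer.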
Build a DataFrame from input data
This is useful when you instantiate simple data by hand, because it makes it easy to see how the data changes at run time.
# Some row values were lost in the source: the names, 'Baker', and the 'occupation' column label are placeholders.
df = pd.DataFrame([[1, 'Person A', 'Builder'], [2, 'Person B', 'Baker'], [3, 'Person C', 'CandleStick Maker']], columns=['id', 'name', 'occupation'])
df.head()
Copy a DataFrame
Copying a DataFrame is useful if you want to keep the original intact while making changes. It is good practice to copy a DataFrame as soon as it is loaded.
anime_copy = anime.copy(deep=True)
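To see why the copy matters, here is a small sketch on a made-up two-row DataFrame (the names and ratings are hypothetical). It changes the original after copying and checks that the copy is unaffected.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame
original = pd.DataFrame({"name": ["A", "B"], "rating": [7.5, 8.0]})

# deep=True copies the data, so later edits to `original` do not reach `snapshot`
snapshot = original.copy(deep=True)
original.loc[0, "rating"] = 1.0

print(original.loc[0, "rating"], snapshot.loc[0, "rating"])
```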
3. View and verify
Get the top or bottom n records
Displays the first or last n records in the DataFrame. I usually print the top records somewhere in the notebook so that I can refer back to them if I forget what the DataFrame contains.
anime.head(3)
rating.tail(1)
Count the number of rows
len() is not itself a pandas function, but it is a convenient way to count the rows and store the result in a variable for use elsewhere.
len(df)  # => 3
Count unique values
Counts the unique values in a column.
len(rating['user_id'].unique())
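pandas also offers Series.nunique(), which returns the count directly. The sketch below compares the two on a made-up rating table (the values are hypothetical). One difference to keep in mind: unique() includes NaN as a value, while nunique() skips NaN by default.

```python
import pandas as pd

# Hypothetical ratings: three distinct users
rating = pd.DataFrame({"user_id": [1, 1, 2, 3, 3, 3],
                       "rating": [8, 9, 7, 10, 6, 5]})

n_via_len = len(rating["user_id"].unique())   # length of the array of distinct values
n_via_nunique = rating["user_id"].nunique()   # same count, returned directly

print(n_via_len, n_via_nunique)
```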
Get DataFrame information
Useful for getting general information such as the column names, the number of non-null values, and the data type of each column. df.dtypes is similar but less useful, since it provides only the column data types.
anime.info()
Get statistics
Statistics are useful when the DataFrame has many numeric values. Knowing the mean, minimum, and maximum of the rating column gives a general sense of the DataFrame.
anime.describe()
Get value counts
Gets the count of each distinct value in a specific column.
anime.type.value_counts()
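As a quick illustration, the sketch below runs value_counts on a made-up type column (the data is hypothetical). It also shows the normalize=True option, which returns shares instead of raw counts.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame
anime = pd.DataFrame({"type": ["TV", "TV", "Movie", "OVA", "TV"]})

counts = anime["type"].value_counts()                 # occurrences per value
shares = anime["type"].value_counts(normalize=True)   # fraction of rows per value

print(counts)
print(shares)
```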
4. Output
Save to CSV format
This writes the file to the same directory as the notebook. I save only the first 10 rows below, but you do not need to do that. Similarly, you can use df.to_excel() to save the data as an Excel spreadsheet.
rating[:10].to_csv('saved_ratings.csv', index=False)
5. Select
Get a column's values as a list or a Series
This works when you need to pull column values into X and y variables to fit a machine learning model.
anime['genre'].tolist()
anime['genre']
Get a list of index values
Creates a list of the index values. Note that anime_modified is used here because its index values are more interesting.
anime_modified.index.tolist()
Get a list of column names
anime.columns.tolist()
6. Add / remove
Append a new column with a set value
I occasionally do this when the test set and the training set are in two separate DataFrames and I want to mark which rows belong to which set before combining them.
anime['train set'] = True
Create a new DataFrame from a subset of columns
Use this when you want to keep only a few columns of a giant DataFrame and do not want to list every column to delete.
anime[['name', 'episodes']]
Drop the specified columns
Useful when only a few columns need to be dropped. Otherwise, writing them all out is tedious, and I prefer the previous approach.
anime.drop(['anime_id', 'genre', 'members'], axis=1).head()
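A brief sketch of the same operation on a made-up DataFrame (the values are hypothetical). It uses the columns= keyword, which is equivalent to passing axis=1, and shows that drop returns a new DataFrame rather than changing the original.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame
anime = pd.DataFrame({"anime_id": [1, 2],
                      "name": ["A", "B"],
                      "genre": ["Drama", "Action"],
                      "members": [100, 200]})

# columns=[...] is equivalent to passing the list with axis=1
trimmed = anime.drop(columns=["anime_id", "genre", "members"])

print(trimmed.columns.tolist())
print(anime.columns.tolist())  # the original still has all four columns
```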
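Add a row containing the sum of the other rows
A small DataFrame is created by hand here because it is easier to inspect. The interesting part is df.sum(axis=0), which adds the values down the rows and returns one total per column; df.sum(axis=1) adds across the columns and returns one total per row.
# The names and the first power level were lost in the source; the values used for them here are placeholders.
df = pd.DataFrame([[1, 'Person A', 8000], [2, 'Person B', 9000], [3, 'Person C', 20]], columns=['id', 'name', 'power level'])
# The source used df.append, which was removed in pandas 2.0; pd.concat produces the same result.
pd.concat([df, df.sum(axis=0).to_frame().T], ignore_index=True)
The same logic applies when calculating a count or a mean, for example:
df.mean(axis=0, numeric_only=True)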
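To make the axis argument concrete, here is a minimal sketch on a numeric-only DataFrame. The id and power level values are illustrative and not necessarily the article's original data. Because DataFrame.append was removed in pandas 2.0, the totals row is attached with pd.concat.

```python
import pandas as pd

# Illustrative numeric data
df = pd.DataFrame({"id": [1, 2, 3], "power level": [8000, 9000, 20]})

col_totals = df.sum(axis=0)   # one total per column (adds down the rows)
row_totals = df.sum(axis=1)   # one total per row (adds across the columns)

# Turn the column totals into a one-row DataFrame and attach it at the bottom
with_total = pd.concat([df, col_totals.to_frame().T], ignore_index=True)

print(with_total)
```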
7. Merge
Concatenate two DataFrames
Use this when you have two DataFrames with the same columns and want to combine them. Here the DataFrame is split into two parts, which are then put back together.
df1 = anime[0:2]
df2 = anime[2:4]
pd.concat([df1, df2], ignore_index=True)
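The effect of ignore_index is easier to see when the pieces are joined in a different order. This sketch assumes a made-up four-row DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame
anime = pd.DataFrame({"name": ["A", "B", "C", "D"]})
df1 = anime[0:2]
df2 = anime[2:4]

kept = pd.concat([df2, df1])                            # original index labels are kept
renumbered = pd.concat([df2, df1], ignore_index=True)   # index is rebuilt from 0

print(kept.index.tolist())
print(renumbered.index.tolist())
```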
Merge DataFrames
Use this when you have two DataFrames and want to join them on a column, as with a SQL (Structured Query Language) join. Note that merge performs an inner join by default; pass how='left' to get the behavior of a SQL left join.
rating.merge(anime, left_on='anime_id', right_on='anime_id', suffixes=('_left', '_right'))
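The join type changes which rows survive. The sketch below uses two made-up tables (all values are hypothetical) in which one rating refers to an anime_id that is missing from the anime table, and compares the default merge with how="left".

```python
import pandas as pd

# Hypothetical tables; anime_id 99 has no match in `anime`
rating = pd.DataFrame({"user_id": [1, 1, 2],
                       "anime_id": [10, 20, 99],
                       "rating": [8, 9, 7]})
anime = pd.DataFrame({"anime_id": [10, 20, 30],
                      "name": ["A", "B", "C"],
                      "rating": [8.5, 9.1, 7.0]})

# Default how="inner": only rows with a match on both sides
inner = rating.merge(anime, on="anime_id", suffixes=("_left", "_right"))

# how="left": every row of `rating` is kept; unmatched columns become NaN
left = rating.merge(anime, on="anime_id", how="left", suffixes=("_left", "_right"))

print(inner)
print(left)
```

Both tables have a rating column, so the suffixes distinguish them as rating_left and rating_right in the result.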
8. Filter
Retrieve rows that match index values
The index values in anime_modified are the anime names. Notice how these names are used to fetch specific rows.
anime_modified.loc[['Haikyuu!! Second Season', 'Gintama']]
Retrieve rows by numeric position
Unlike the function above, iloc selects by position: the first row is at position 0, the second at position 1, and so on. This holds even when the DataFrame has been modified and the index holds string values.
Use this function when you want to get the first three rows in the DataFrame.
anime_modified.iloc[0:3]
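A side-by-side sketch of label-based and position-based selection, using a made-up DataFrame indexed by name (the ratings are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for anime_modified: the index holds names, not numbers
anime_modified = pd.DataFrame({"rating": [9.0, 8.5, 7.0]},
                              index=["Gintama", "Monster", "Naruto"])

by_label = anime_modified.loc[["Gintama", "Naruto"]]   # select by index label
by_position = anime_modified.iloc[0:2]                 # select by row position

print(by_label.index.tolist())
print(by_position.index.tolist())
```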
Get rows by column value
Retrieves the rows whose column value appears in a given list. anime[anime['type'] == 'TV'] also works when matching a single value.
anime[anime['type'].isin(['TV', 'Movie'])]
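Both forms build a boolean mask and use it to select rows. A minimal sketch on made-up data (the names and types are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame
anime = pd.DataFrame({"name": ["A", "B", "C", "D"],
                      "type": ["TV", "Movie", "OVA", "TV"]})

tv_or_movie = anime[anime["type"].isin(["TV", "Movie"])]   # match any value in the list
only_tv = anime[anime["type"] == "TV"]                     # match a single value

print(tv_or_movie["name"].tolist())
print(only_tv["name"].tolist())
```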
Slice a DataFrame
This works like slicing a list. Slice the DataFrame to get all rows before, between, or after particular positions.
anime[1:3]
Filter by value
Filters the DataFrame down to the rows that meet a condition. Note that this keeps the existing index values.
anime[anime['rating'] > 8]
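The note about the index is easy to verify. In this sketch on made-up data (the names and ratings are hypothetical), the filtered rows keep their original labels until reset_index(drop=True) renumbers them.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame
anime = pd.DataFrame({"name": ["A", "B", "C", "D"],
                      "rating": [7.0, 8.5, 6.0, 9.1]})

high = anime[anime["rating"] > 8]          # rows keep their original index labels
renumbered = high.reset_index(drop=True)   # optional: renumber from 0

print(high.index.tolist())
print(renumbered.index.tolist())
```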
9. Sort
Sort with sort_values
Sorts the DataFrame by the values in a column.
anime.sort_values('rating', ascending=False)
10. Aggregate
Group and count
Counts the number of records for each distinct value in a column.
anime.groupby('type').count()
Group and aggregate columns in different ways
Note that I added reset_index(). Without it, the "type" column becomes the index. I recommend doing this in most cases.
anime.groupby(["type"]).agg({"rating": "sum", "episodes": "count", "name": "last"}).reset_index()
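A small worked version of the aggregation above on made-up data (all values are hypothetical), so the effect of each aggregate and of reset_index() can be checked by eye:

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame
anime = pd.DataFrame({"type": ["TV", "TV", "Movie"],
                      "rating": [8.0, 9.0, 7.5],
                      "episodes": [12, 24, 1],
                      "name": ["A", "B", "C"]})

summary = (anime.groupby("type")
                .agg({"rating": "sum", "episodes": "count", "name": "last"})
                .reset_index())   # turn the group key back into a regular column

print(summary)
```

Group keys are sorted by default, so "Movie" comes before "TV" in the result.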
Create a pivot table
A pivot table is hard to beat for extracting a subset of data from a DataFrame.
Note that I filtered the DataFrame heavily first so that the pivot table builds more quickly.
tmp_df = rating.copy()
tmp_df.sort_values('user_id', ascending=True, inplace=True)
tmp_df = tmp_df[tmp_df.user_id < 10]
tmp_df = tmp_df[tmp_df.anime_id < 30]
tmp_df = tmp_df[tmp_df.rating != -1]
pd.pivot_table(tmp_df, values='rating', index=['user_id'], columns=['anime_id'], aggfunc=np.sum, fill_value=0)
11. Clean
Set NaN cells to a value
Sets cells containing NaN to 0. In this example I create the same pivot table as before, but without fill_value=0, and then fill it with fillna(0).
pivot = pd.pivot_table(tmp_df, values='rating', index=['user_id'], columns=['anime_id'], aggfunc=np.sum)
pivot.fillna(0)
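The two ways of filling gaps can be compared directly. This sketch assumes a tiny made-up ratings table (hypothetical values) in which user 2 has not rated anime 20, and it passes the string "sum" as the aggregation function.

```python
import pandas as pd

# Hypothetical ratings; user 2 has no rating for anime 20
tmp_df = pd.DataFrame({"user_id": [1, 1, 2],
                       "anime_id": [10, 20, 10],
                       "rating": [8, 9, 7]})

# Option 1: fill the gaps while building the pivot table
filled_at_build = pd.pivot_table(tmp_df, values="rating", index="user_id",
                                 columns="anime_id", aggfunc="sum", fill_value=0)

# Option 2: build the table with NaN gaps, then fill them afterwards
filled_after = pd.pivot_table(tmp_df, values="rating", index="user_id",
                              columns="anime_id", aggfunc="sum").fillna(0)

print(filled_at_build)
print(filled_after)
```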
12. Other
Sample a DataFrame
I regularly take a small sample from a larger DataFrame. With frac=1, this randomly reorders the rows while keeping the index.
anime.sample(frac=0.25)
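A short sketch of both uses on a made-up eight-row DataFrame. The random_state argument is added here only to make the result repeatable; it is not part of the original example.

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame
anime = pd.DataFrame({"name": list("ABCDEFGH")})

quarter = anime.sample(frac=0.25, random_state=0)   # 25% of 8 rows -> 2 rows
shuffled = anime.sample(frac=1, random_state=0)     # all rows in random order, index labels kept

print(quarter)
print(shuffled.index.tolist())
```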
Iterate over rows
Iterates over the index and the rows in the DataFrame.
for idx, row in anime[:2].iterrows():
    print(idx, row)
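Each row yielded by iterrows is a Series, so its fields can be read by column name. A minimal sketch on made-up data (the names and ratings are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the anime DataFrame
anime = pd.DataFrame({"name": ["A", "B", "C"],
                      "rating": [7.0, 8.5, 9.0]})

seen = []
for idx, row in anime[:2].iterrows():
    # `row` is a Series keyed by column name
    seen.append((idx, row["name"], row["rating"]))

print(seen)
```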
Start Jupyter Notebook
Starts Jupyter Notebook with a very high data rate limit.
jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10