2025-02-21 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 Report
This article shows how to use pandas. The content is meant to be clear and easy to follow; the sections below walk through the main features step by step.
The required packages are imported as follows:
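A minimal sketch of those imports, assuming the conventional aliases np and pd. matplotlib is only needed for the plotting section, so it is left out here.

```python
# Conventional aliases, assumed throughout the sketches in this article
import numpy as np
import pandas as pd

print(pd.__version__)  # the installed pandas version
```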
I. Creating objects
For details on this topic, see the Data Structure Intro section.
1. Create a Series by passing a list; pandas creates an integer index by default:
2. Create a DataFrame by passing a NumPy array, a datetime index, and column labels:
3. Create a DataFrame by passing a dictionary of objects that can be converted to series-like structures:
4. View the data types of different columns:
5. If you are using IPython, Tab completion automatically picks up column names as well as public attributes. The following figure shows a subset of the attributes that are completed:
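Items 1 to 4 above can be sketched as follows. The dates, column names, and values are illustrative assumptions, not data from the original article.

```python
import numpy as np
import pandas as pd

# 1. Series from a list; a default integer index (0..n-1) is created
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# 2. DataFrame from a NumPy array, a datetime index, and column labels
dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# 3. DataFrame from a dict of objects that can be converted to series-like data
df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("2013-01-02"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})

# 4. Each column carries its own dtype
print(df2.dtypes)
```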
II. Viewing data
For more information, see the Basics section.
1. View the first and last rows of the frame:
2. Display the index, the columns, and the underlying NumPy data:
3. The describe() function gives a quick statistical summary of the data:
4. Transpose the data:
5. Sort by axis
6. Sort by value
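The six viewing operations above might look like this on a small frame. The frame itself is an assumed example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

print(df.head())         # 1. first five rows
print(df.tail(3))        #    last three rows
print(df.index)          # 2. row index
print(df.columns)        #    column labels
values = df.to_numpy()   #    underlying data as a NumPy array
print(df.describe())     # 3. quick statistical summary
transposed = df.T        # 4. transpose
by_axis = df.sort_index(axis=1, ascending=False)  # 5. sort by column labels
by_value = df.sort_values(by="B")                 # 6. sort rows by column B
```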
III. Selection
Although standard Python/NumPy expressions for selecting and setting work directly, for production code we recommend the optimized pandas data access methods: .at, .iat, .loc, and .iloc (the older .ix indexer was removed in pandas 1.0). See Indexing and Selecting Data and MultiIndex / Advanced Indexing for details.
- Getting
1. Select a single column, which returns a Series; this is equivalent to df.A:
2. Select with [], which slices the rows:
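A short sketch of both forms of getting, on an assumed example frame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

col = df["A"]                             # 1. single column -> Series (same as df.A)
rows = df[0:3]                            # 2. positional slice of the rows
by_date = df["2013-01-02":"2013-01-04"]   #    label slice of the rows (end included)
```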
- Selection by label
1. Get a cross-section using a label:
2. Select on multiple axes by label:
3. Slice by label (both endpoints are included):
4. Reduce the dimensions of the returned object:
5. Get a scalar value:
6. Get fast access to a scalar (equivalent to the previous method):
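The label-based selections above can be sketched with .loc and .at. The frame and its labels are assumptions for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

row = df.loc[dates[0]]                                    # 1. cross-section -> Series
sub = df.loc[:, ["A", "B"]]                               # 2. multiple axes by label
sliced = df.loc["2013-01-02":"2013-01-04", ["A", "B"]]    # 3. label slice, end included
reduced = df.loc[dates[1], ["A", "B"]]                    # 4. one row -> Series
scalar = df.loc[dates[0], "A"]                            # 5. scalar
fast = df.at[dates[0], "A"]                               # 6. same scalar, faster access
```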
- Selection by position
1. Select by the position of a passed integer (this selects a row):
2. Slice by integer position, similar to NumPy/Python:
3. Pass lists of integer positions, similar to NumPy/Python:
4. Slice the row
5. Slice the column
6. Get a specific value
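A sketch of the positional selections with .iloc and .iat. The frame holds the values 0 to 23 so the results are easy to check by eye; that choice is an assumption for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

row = df.iloc[3]                     # 1. fourth row as a Series
block = df.iloc[3:5, 0:2]            # 2. integer slices (end excluded, as in NumPy)
picked = df.iloc[[1, 2, 4], [0, 2]]  # 3. lists of integer positions
rows = df.iloc[1:3, :]               # 4. slice rows only
cols = df.iloc[:, 1:3]               # 5. slice columns only
value = df.iloc[1, 1]                # 6. a single value
fast = df.iat[1, 1]                  #    same value, faster access
```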
- Boolean indexing
1. Use the values of a single column to select data:
2. Use a where operation to select data:
3. Use the isin() method for filtering:
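The three boolean selections above, sketched on an assumed frame with values from -10 to 13 (the extra column E is hypothetical):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4) - 10.0, index=dates, columns=list("ABCD"))

positive_a = df[df["A"] > 0]   # 1. rows where column A is positive
masked = df[df > 0]            # 2. where-style: cells failing the test become NaN

df2 = df.copy()
df2["E"] = ["one", "one", "two", "three", "four", "three"]
filtered = df2[df2["E"].isin(["two", "four"])]   # 3. keep rows whose E is in the list
```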
- Setting
1. Set a new column (the data is aligned by index automatically):
2. Set a new value by label:
3. Set a new value by position:
4. Set a new set of values by assigning a NumPy array:
The results of the above operations are as follows:
5. Set new values with a where operation:
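A sketch of the five setting operations. The frame, the new column F, and the assigned values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4) - 10.0, index=dates, columns=list("ABCD"))

# 1. New column; values are aligned on the index, unmatched rows get NaN
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("2013-01-02", periods=6))
df["F"] = s1

df.at[dates[0], "A"] = 0                     # 2. set by label
df.iat[0, 1] = 0                             # 3. set by position
df.loc[:, "D"] = np.array([5.0] * len(df))   # 4. assign a NumPy array
print(df)

# 5. where-style setting: flip the sign of every positive cell
df2 = df.copy()
df2[df2 > 0] = -df2
```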
IV. Missing data
pandas primarily uses np.nan to represent missing data. By default, missing values are excluded from computations. See the Missing Data section for details.
1. The reindex() method lets you change, add, or delete the index on a specified axis. It returns a copy of the data:
2. Drop any rows that contain missing values:
3. Fill in the missing values:
4. Get a boolean mask showing where values are missing:
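The four missing-data steps above can be sketched as follows. The frame and the added column E are assumed examples.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

# 1. reindex returns a new object; the new column E starts out as all NaN
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1.loc[dates[0]:dates[1], "E"] = 1

dropped = df1.dropna(how="any")   # 2. drop rows containing any NaN
filled = df1.fillna(value=5)      # 3. replace NaN with 5
mask = pd.isna(df1)               # 4. True where a value is missing
```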
V. Operations
For more information, see the Basic section on Binary Ops.
- Statistics (operations generally exclude missing values)
1. Perform descriptive statistics:
2. Perform the same operation on the other axis:
3. Operate on objects that have different dimensions and need alignment. pandas automatically broadcasts along the specified dimension:
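A sketch of the three statistics steps, using an assumed frame holding 0 to 23 so the results can be checked by hand:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

col_means = df.mean()         # 1. one mean per column
row_means = df.mean(axis=1)   # 2. one mean per row

# 3. Subtract a Series from every column; it is aligned on the row index
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
diff = df.sub(s, axis="index")   # rows where s is NaN become all NaN
```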
- Apply
1. Apply functions to the data:
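Two assumed examples of applying a function column by column: a cumulative sum and a max-minus-min range.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24.0).reshape(6, 4), columns=list("ABCD"))

cumulative = df.apply(np.cumsum)               # running total down each column
spread = df.apply(lambda x: x.max() - x.min()) # one value per column
```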
- Histogramming
For more information, see Histogramming and Discretization.
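A minimal sketch of histogramming with value_counts, on assumed data:

```python
import pandas as pd

s = pd.Series([4, 2, 4, 6, 2, 4, 1])
counts = s.value_counts()   # occurrences of each distinct value, most frequent first
print(counts)
```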
- String methods
A Series has a set of string processing methods in its str attribute that make it easy to operate on each element of the array, as shown in the following code. For more details, see Vectorized String Methods.
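A minimal sketch using str.lower on assumed data; missing values pass through unchanged.

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
lowered = s.str.lower()   # applied element-wise; NaN stays missing
print(lowered)
```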
VI. Merging
pandas provides many ways to combine Series and DataFrame objects according to various kinds of set logic (Panel objects, supported by older versions, were removed in pandas 0.25). For more information, see the Merging section.
- Concat
- Join: SQL-style merges. For more information, see Database-style joining.
- Append: adds rows to a DataFrame. For more information, see Appending. Note that DataFrame.append was removed in pandas 2.0; pd.concat covers the same use case.
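The three merging operations above can be sketched as follows. The frames and the key column are assumptions for illustration, and the append step uses pd.concat because DataFrame.append no longer exists in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

# Concat: split a frame into pieces and glue them back together
df = pd.DataFrame(np.random.randn(10, 4))
pieces = [df[:3], df[3:7], df[7:]]
rebuilt = pd.concat(pieces)

# Join: SQL-style merge on a shared key column
left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
merged = pd.merge(left, right, on="key")

# Append: add one row to the end of a frame
base = pd.DataFrame(np.arange(32).reshape(8, 4), columns=list("ABCD"))
extra = base.iloc[[3]]   # a one-row DataFrame
appended = pd.concat([base, extra], ignore_index=True)
```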
VII. Grouping
By "group by" we mean a process involving one or more of the following steps:
- Splitting: divide the data into groups based on some criteria
- Applying: run a function on each group independently
- Combining: combine the results into a data structure
For more information, see the Grouping section.
1. Group the data and apply the sum function to each group:
2. Group by multiple columns, which forms a hierarchical index, and then apply the function:
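A sketch of both grouping steps on an assumed frame; the column names A to D and their values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [1, 2, 3, 4, 5, 6, 7, 8],
    "D": [10, 20, 30, 40, 50, 60, 70, 80],
})

# 1. Split on A, sum the numeric columns in each group, combine into one frame
by_a = df.groupby("A")[["C", "D"]].sum()

# 2. Split on two columns; the result has a hierarchical (two-level) index
by_ab = df.groupby(["A", "B"])[["C", "D"]].sum()
```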
VIII. Reshaping
For details, see Hierarchical Indexing and Reshaping.
- Stack
- Pivot tables. For details, see Pivot Tables.
You can easily generate a pivot table from this data:
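Both reshaping operations can be sketched on small assumed frames. The index names and the columns A to D below are hypothetical.

```python
import numpy as np
import pandas as pd

# Stack: move the column labels into a new innermost index level
tuples = list(zip(["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
wide = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["A", "B"])
stacked = wide.stack()         # Series with a three-level index
restored = stacked.unstack()   # inverse operation, back to two columns

# Pivot table: average D for each (A, B) pair, one column per value of C
df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 3,
    "B": ["A", "B", "C"] * 4,
    "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 2,
    "D": np.arange(12.0),
})
table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
```

Combinations that never occur in the data, such as ("two", "A") with "bar", show up as NaN in the pivot table.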
IX. Time series
pandas has simple, powerful, and efficient functionality for resampling during frequency conversion (for example, converting data sampled every second into 5-minute data). This kind of operation is very common in finance. See the Time Series section for details.
1. Time zone representation:
2. Time zone conversion:
3. Converting between time span representations:
4. Converting between period and timestamp makes some convenient arithmetic functions available:
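A sketch of resampling and the four conversions above. The dates, frequencies, and the America/New_York zone are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Resampling: 100 one-second samples collapsed into 5-minute bins
rng = pd.date_range("2012-01-01", periods=100, freq="s")
per_second = pd.Series(np.arange(100), index=rng)
five_min = per_second.resample("5min").sum()

# 1-2. Attach a time zone, then convert to another one
days = pd.date_range("2012-03-06", periods=5, freq="D")
ts = pd.Series(np.arange(5.0), index=days)
ts_utc = ts.tz_localize("UTC")
ts_ny = ts_utc.tz_convert("America/New_York")

# 3. Timestamps to periods and back
months = pd.date_range("2012-01-01", periods=5, freq="MS")
monthly = pd.Series(np.arange(5.0), index=months)
as_period = monthly.to_period("M")
as_stamp = as_period.to_timestamp()   # start of each month

# 4. Period arithmetic: shift every period forward by one month
next_month = as_period.index + 1
```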
X. Categoricals
Since version 0.15, pandas can include Categorical data in a DataFrame. For more information, see the categorical introduction and the API documentation.
1. Convert the raw grades to a Categorical data type:
2. Rename the categories to more meaningful names:
3. Reorder the categories and add the missing categories at the same time:
4. Sorting follows the order of the categories, not lexical order:
5. Grouping by a Categorical column also shows empty categories:
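The five categorical steps above, sketched on an assumed frame of raw grades. The category names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})

# 1. Convert to a categorical dtype; the categories are a, b, e
df["grade"] = df["raw_grade"].astype("category")

# 2. Rename the categories in their existing order
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# 3. Reorder the categories and add the ones that do not occur in the data
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"])

# 4. Sorting follows the category order, not alphabetical order
ordered = df.sort_values(by="grade")

# 5. Grouping keeps the empty categories (observed=False requests this explicitly)
sizes = df.groupby("grade", observed=False).size()
```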
XI. Plotting
For more information, see the Plotting docs.
For a DataFrame, plot() is a convenient way to plot all of the columns with their labels:
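A minimal sketch, assuming matplotlib is installed. The random-walk data is an assumption, and the Agg backend is chosen only so that the example also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import numpy as np
import pandas as pd

index = pd.date_range("2000-01-01", periods=1000)
df = pd.DataFrame(np.random.randn(1000, 4), index=index, columns=list("ABCD")).cumsum()

ax = df.plot()   # one line per column, with the column names in the legend
```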
XII. Importing and exporting data
- CSV. Reference: Writing to a csv file
1. Write to csv file:
2. Read from csv file:
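A sketch of a CSV round trip. The file name foo.csv and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.csv")
    df.to_csv(path, index=False)   # 1. write, leaving out the row index
    loaded = pd.read_csv(path)     # 2. read it back
```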
- HDF5. Reference: HDFStores
1. Write to HDF5 storage:
2. Read from HDF5 storage:
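A sketch of an HDF5 round trip. It needs the optional PyTables package, so the calls are guarded; the file name foo.h5 and the key "df" are assumptions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))
loaded = None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.h5")
    try:
        df.to_hdf(path, key="df")              # 1. write under the key "df"
        loaded = pd.read_hdf(path, key="df")   # 2. read the same key back
    except ImportError:
        print("PyTables is not installed; skipping the HDF5 example")
```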
- Excel. Reference: MS Excel
1. Write to an Excel file:
2. Read from an Excel file:
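A sketch of an Excel round trip. It needs an optional engine such as openpyxl, so the calls are guarded; the file name foo.xlsx and the sheet name are assumptions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))
loaded = None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.xlsx")
    try:
        df.to_excel(path, sheet_name="Sheet1", index=False)   # 1. write one sheet
        loaded = pd.read_excel(path, sheet_name="Sheet1")     # 2. read it back
    except ImportError:
        print("No Excel engine (for example openpyxl) is installed; skipping")
```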