Shulou
2025-04-03 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 report
This article introduces "what are the practical pandas knowledge points". In day-to-day work, many people have questions about practical pandas techniques, so the editor consulted a variety of materials and compiled simple, easy-to-use methods. I hope it helps answer the question "what are the practical pandas knowledge points?" Now, please follow along and study!
1 Introduction
As a powerful tool for data analysis, pandas offers a wide range of data-processing APIs that let us handle all kinds of data flexibly and conveniently, yet many of its practical methods are unknown to most users. Today I will introduce six practical pandas tips that are not widely known.
Figure 1
2 Six practical pandas tips
2.1 Conversion between Series and DataFrame
In many cases, an intermediate result of a computation is a Series, while the operations that follow, especially in "chained" syntax, require a DataFrame. In this case, we can convert the Series to a DataFrame:
Using to_frame() to convert a Series to a DataFrame:
import pandas as pd

s = pd.Series([0, 1, 2])
# Convert the Series to a DataFrame; the name parameter sets the resulting column name
s = s.to_frame(name='column name')
s
Figure 2
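To illustrate the chained use case, here is a minimal sketch. The sales table, its column names and the share computation are hypothetical, chosen only for illustration. After groupby().sum() the result is a Series, and to_frame() turns it back into a DataFrame so that DataFrame-only methods such as assign() can continue the chain.

```python
import pandas as pd

# Hypothetical sales data, used only for illustration
sales = pd.DataFrame({
    'region': ['N', 'S', 'N', 'S'],
    'amount': [10, 20, 30, 40],
})

totals = (
    sales.groupby('region')['amount']
    .sum()                     # a Series indexed by region
    .to_frame(name='total')    # converted back to a DataFrame
    .assign(share=lambda d: d['total'] / d['total'].sum())  # assign() exists only on DataFrame
)
print(totals)
```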
By the way, here is how to convert a DataFrame that consists of a single column back to a Series:
Using squeeze() to convert a single-column DataFrame to a Series:
# Only a DataFrame with a single column is converted to a Series
s.squeeze()
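squeeze() only drops axes of length 1, so its result depends on the shape of the input. The sketch below, using made-up frames, shows the possible cases. Passing axis=1 restricts squeezing to the columns, which is one way to get a Series even when the frame also has a single row.

```python
import pandas as pd

# Made-up frames of different shapes, for illustration only
one_col = pd.DataFrame({'x': [1, 2, 3]})
two_cols = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
one_cell = pd.DataFrame({'x': [7]})

as_series = one_col.squeeze()           # single column -> Series named 'x'
unchanged = two_cols.squeeze()          # no axis of length 1 -> still a DataFrame
as_scalar = one_cell.squeeze()          # 1x1 -> plain scalar
kept_series = one_cell.squeeze(axis=1)  # squeeze columns only -> length-1 Series

print(type(as_series).__name__, type(unchanged).__name__, as_scalar, type(kept_series).__name__)
```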
Figure 3
2.2 Randomly shuffling the row order of a DataFrame
Sometimes we need to shuffle the row order of an entire DataFrame. For example, when training a machine learning model, we may take the first rows as the training set and the rows after them as the test set, which only works if the rows are in random order. The sample() method in pandas does this quickly.
The sample() method draws row records from the original data, without replacement by default (replace=False). Its frac parameter controls the fraction of rows to sample, so setting frac=1 returns every row exactly once in random order, which is equivalent to shuffling:
df = pd.DataFrame({
    'V1': range(5),
    'V2': range(5)
})
df.sample(frac=1)
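Building on the shuffle, a train/test split can then be taken by position. This is a minimal sketch: the 80/20 ratio and random_state=42 are assumptions for illustration, and reset_index(drop=True) gives the shuffled rows a fresh 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({'V1': range(10), 'V2': range(10)})

# frac=1 returns every row once in random order; random_state makes it reproducible
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Assumed 80/20 split, chosen only for illustration
n_train = int(len(shuffled) * 0.8)
train_set = shuffled.iloc[:n_train]
test_set = shuffled.iloc[n_train:]
print(len(train_set), len(test_set))
```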
Figure 4
2.3 Using categorical data to reduce memory consumption
When a column of a DataFrame consists of a few distinct values repeated many times, storing it as ordinary strings consumes a lot of memory, as in the following example:
import numpy as np

pool = ['A', 'B', 'C', 'D']
# Column V1 consists of many repetitions of A, B, C and D
df = pd.DataFrame({
    'V1': np.random.choice(pool, 1000000)
})
# View memory usage
df.memory_usage(deep=True)
Figure 5
In this case, we can convert the column to pandas' category data type to greatly reduce memory consumption:
df['V1'] = df['V1'].astype('category')
df.memory_usage(deep=True)
Figure 6
As you can see, memory consumption dropped by nearly 98.3% after the conversion!
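The saving comes from how the category type stores data: each row keeps a small integer code, and the distinct labels are stored only once. The sketch below measures the column before and after conversion. It uses a seeded NumPy generator (an assumption, for repeatability), and because the exact percentage varies with the pandas version and how strings are stored, it computes the saving rather than assuming a figure.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator so the sketch is repeatable
pool = ['A', 'B', 'C', 'D']
df = pd.DataFrame({'V1': rng.choice(pool, 1_000_000)})

before = df['V1'].memory_usage(deep=True)
df['V1'] = df['V1'].astype('category')
after = df['V1'].memory_usage(deep=True)

print(f'{before:,} -> {after:,} bytes ({1 - after / before:.1%} saved)')
# Each row now holds a small integer code; the four labels are stored once
print(df['V1'].cat.categories.tolist(), df['V1'].cat.codes.dtype)
```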
2.4 The object type trap in pandas
When processing data with pandas, we often encounter the object data type. Many beginners treat it as a string type, but in fact object is a catch-all type in pandas that can hold values of any type; that is, a Series of dtype object can mix several data types:
s = pd.Series(['111100', '111100', 111100, '111100'])
s
Figure 7
View the distribution of element types:
s.apply(lambda x: type(x))
Figure 8
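Because an object column can hide several types, it can help to check the mix before calling any .str method. A minimal sketch on the same Series: it counts the Python type of each element, asks pandas.api.types.infer_dtype() for a finer label than object, and lists the positions of elements that are not strings.

```python
import pandas as pd

s = pd.Series(['111100', '111100', 111100, '111100'])

type_counts = s.map(type).value_counts().to_dict()  # how many elements of each Python type
inferred = pd.api.types.infer_dtype(s)              # a finer label than the 'object' dtype
non_str_positions = [i for i, v in enumerate(s) if not isinstance(v, str)]

print(s.dtype, type_counts, inferred, non_str_positions)
```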
If we rashly treat such a column as strings, elements that are not strings silently become missing values (NaN) instead of raising an error, which introduces hidden problems into the analysis:
s.str.replace('00', '11')
Figure 9
In this case, you must first convert the column to the appropriate type and then call the method:
s.astype('str').str.replace('00', '11')
Figure 10
2.5 Quickly determining whether each column contains missing values
In pandas, the hasnans property of a single Series tells us whether it contains missing values; combined with apply(), we can quickly see which columns of an entire DataFrame contain missing values:
df = pd.DataFrame({
    'V1': [1, 2, None, 4],
    'V2': [1, 2, 3, 4],
    'V3': [None, 1, 2, 3]
})
df.apply(lambda s: s.hasnans)
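An equivalent check can also be written with isna(), which additionally gives counts. A short sketch on the same DataFrame: isna().any() yields the same per-column True/False as apply() with hasnans, and isna().sum() counts the missing values in each column.

```python
import pandas as pd

df = pd.DataFrame({
    'V1': [1, 2, None, 4],
    'V2': [1, 2, 3, 4],
    'V3': [None, 1, 2, 3],
})

has_missing = df.isna().any()      # True/False per column, like apply(lambda s: s.hasnans)
missing_counts = df.isna().sum()   # number of missing values per column
cols_with_missing = has_missing[has_missing].index.tolist()

print(has_missing.to_dict(), missing_counts.to_dict(), cols_with_missing)
```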
Figure 11
2.6 Five strategies for computing ranks with rank()
In pandas, the rank() method computes the rank of each element in a column of data. Its method parameter controls how tied values are ranked, and there are five strategies, which should be chosen according to your needs:
"average"
Under the average strategy, tied elements all receive the average of the ranks they occupy:
s = pd.Series([1, 2, 2, 2, 3, 4, 4, 5, 6])
s.rank(method='average')
Figure 12
"min"
Under the min strategy, tied elements all receive the lowest rank within their group:
s.rank(method='min')
Figure 13
"max"
The max strategy is the opposite of min: tied elements all receive the highest rank within their group:
s.rank(method='max')
Figure 14
"dense"
The dense strategy is equivalent to ranking the deduplicated values and then assigning each value's rank to every element equal to it, so ranks increase by 1 from one group to the next with no gaps; this often matches practical needs better:
s.rank(method='dense')
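The dense strategy described above can be reproduced by hand, which makes the "rank the deduplicated values, then map back" idea concrete. This sketch is for illustration only; in practice rank(method='dense') is the direct route. The cast to float matches the float64 dtype that rank() returns.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 2, 3, 4, 4, 5, 6])

# Rank the distinct values 1, 2, 3, ... and map each element to its value's rank
distinct_rank = {v: r for r, v in enumerate(sorted(s.unique()), start=1)}
manual_dense = s.map(distinct_rank).astype(float)

print(manual_dense.equals(s.rank(method='dense')))
```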
Figure 15
"first"
Under the first strategy, tied elements are ranked in the order in which they appear in the Series:
s = pd.Series([2, 2, 2, 1, 3])
s.rank(method='first')
Figure 16
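To compare the five strategies at a glance, the sketch below puts them side by side on the Series used in the average, min, max and dense examples. It also checks one relationship between them: within each group of tied values, the average, min and max ranks are the mean, minimum and maximum of that group's first ranks.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 2, 3, 4, 4, 5, 6])
methods = ['average', 'min', 'max', 'dense', 'first']

# One column per strategy, side by side with the original values
comparison = pd.DataFrame({'value': s, **{m: s.rank(method=m) for m in methods}})
print(comparison)

# Within each tied group, average/min/max equal the mean/min/max of the 'first' ranks
first = s.rank(method='first')
for method, agg in [('average', 'mean'), ('min', 'min'), ('max', 'max')]:
    derived = first.groupby(s).transform(agg)
    print(method, derived.equals(s.rank(method=method)))
```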
This concludes our study of "what are the practical pandas knowledge points"; I hope it has answered your questions. Combining theory with practice is the best way to learn, so go and try it! If you want to learn more related material, please keep following the website; the editor will keep working to bring you more practical articles!
© 2024 shulou.com SLNews company. All rights reserved.