In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly explains "Python's method of data reprocessing and data statistics". Interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Let's let Xiaobian take you to learn "Python's method of data reprocessing and data statistics"!
I. To repeat the operation
The deduplication operation is indispensable in the process of processing data. For example, we want to know how many people in a class hold class positions, but it is possible that some people hold multiple positions, and we need to deweight the data. The following is a simple list of numbers as an example, and we introduce three simple operations to duplicate:
(1) Cyclic deduplication = [1,2,3,3,2,3,4,2,4,3,1,2,6,3,3,2,4,3,6,2,3,6,4,7,5,4,3,5,6,7,9,5,4,6,8,9,5,4,5,3,6,7,8,5,4,6,8,4,5,6,7]
new_array = []
for i in array:
if i in new_array:
continue
else:
new_array.append(i)
print(new_array)(2) set properties set()array = [1,2,3,3,2,4,3,1,2,6,3,2,4,3,6,2,3,4,6,3,6,4,7,5,4,3,5,6,7,9,5,4,6,8,9,5,4,5,3,6,7,8,5,4,6,8,4,5,6,7]
new_array = list(set(array))
print(new_array)
Output:
(3) Use keys() mode.array = [1,2,3,3,2,3,4,2,3,1,2,6,3,3,2,4,2,3,6,2,3,6,4,7,5,4,3,5,6,7,9,5,4,6,8,9,5,4,5,3,6,7,8,5,4,6,8,4,5,6,7]
new_array = list({}.fromkeys(array).keys())
print(new_array)
Compare the three ways of de-weighting:
(1) Cycle, obviously more cumbersome, and the results will not be automatically sorted.
(2) The set() method requires a certain understanding of the set, and the result is automatically sorted.
(3) keys() method, need to have a certain understanding of the dictionary, although much shorter than the loop, but like the loop, the result will not be automatically sorted.
II. Statistical operations
The first method is very simple, this is also because this array is relatively simple, you can directly use len(array) to take the length of the list.
However, I suggest using the second method, using the pandas library.
Note: To put it bluntly, if you have a deep interest in Python big data, it is necessary to learn the pandas library.
import pandas as pd
array = [1,2,3,3,2,3,4,2,4,3,1,2,6,3,3,2,4,2,3,4,6,2,3,6,4,7,5,4,3,5,6,7,9,5,4,6,8,9,5,4,5,3,6,7,8,5,4,6,8,4,5,6,7]
sr = pd.Series(array)
print(sr.head())
print(sr.count())
print(sr.value_counts())
print(sr.value_counts().count())
The head() method prints only the first 5 lines of the list, and if you want to print more, you can pass in parameters. For example, if you want to print 20 lines, you just need to pass in 20, head(20).
The count() method counts the list. As you can see from the printout, there are 53 values in this array.
The value_counts() method deduplicates the list and returns a new list, sorted by default from most to least. Then use the count() method to know that there are 9 values after deweighting.
At this point, I believe that everyone has a deeper understanding of "Python's method of reprocessing data and data statistics". Let's actually operate it! Here is the website, more related content can enter the relevant channels for inquiry, pay attention to us, continue to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 245
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.