2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to count word occurrences in Python. The editor finds it quite practical, so it is shared here for your reference. Let's take a look.
Problem:
Count the number of times each word appears in a file and list the five most frequent words.
Foreword:
This problem comes up often in practice, for example when counting the high-frequency words in past CET-4 and CET-6 exam papers. I remember that Li Xiaolai used his programming skills to write a best-selling vocabulary book that orders words by frequency, and it was very popular with students. That is a typical case of using programming to solve a real-world problem. Likewise, in data analysis, word clouds are essentially built on word frequency statistics, which determine the font size of each word. If you can solve this problem fluently in Python, you have really gotten started with the language.
Analysis
This topic mainly examines the following knowledge points:
1. How to read and write files correctly
In Python, files are read and written with the built-in function open(), which differs between Python 2 and Python 3. For example, Python 3's open() accepts an encoding argument for reading and writing text files, while Python 2's built-in open() does not. To stay compatible with both versions, we usually use the open() function from the io module. Check the documentation to see how they differ; it is a good way to build the habit of learning actively and looking things up.
Another point is that a file must be closed after it has been read or written. Besides a try...finally block, you can use the more elegant with ... as statement, which closes the file automatically.
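A minimal sketch of both ways of closing a file, assuming a UTF-8 text file. The helper names `read_text` and `read_text_try_finally` and the sample file `sample.txt` are hypothetical; the file is written to a temporary directory so the example runs on its own.

```python
import io
import os
import tempfile


def read_text(path):
    """Read a whole file as text, decoding it as UTF-8."""
    # io.open accepts an encoding argument on both Python 2 and 3;
    # in Python 3 it is the same function as the built-in open().
    # Leaving the with block closes the file, even if read() raises.
    with io.open(path, encoding="utf-8") as f:
        return f.read()


def read_text_try_finally(path):
    """Equivalent version that closes the file with try...finally."""
    f = io.open(path, encoding="utf-8")
    try:
        return f.read()
    finally:
        f.close()


if __name__ == "__main__":
    # Hypothetical sample file in a temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "sample.txt")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"Beautiful is better than ugly.\n")
    print(read_text(path))
    print(read_text(path) == read_text_try_finally(path))  # True
```

The with version is shorter and cannot forget the close() call, which is why it is usually preferred.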
2. How to sort data
sorted() is a built-in function that is used very often, and it is quite powerful because its behavior can be customized through the key parameter. This means you can sort not only numbers and strings but also lists, dictionaries and custom objects; you only need to tell sorted() what the sorting rule is. For a people object, for instance, you could sort by age, or by height or weight, so the function is very flexible. In addition, list objects have their own sort() method: list.sort() sorts the list in place and returns None, while sorted() returns a new list and accepts any iterable. If you can tell list.sort and sorted apart, you are using them with real fluency.
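A short sketch of the key parameter and of the difference between sorted() and list.sort(). The `Person` record and its fields are hypothetical, chosen to mirror the "people" example above.

```python
from collections import namedtuple

# Hypothetical record type used only for illustration.
Person = namedtuple("Person", ["name", "age", "height"])

people = [
    Person("Alice", 30, 165),
    Person("Bob", 25, 180),
    Person("Carol", 35, 172),
]

# sorted() returns a new list and leaves `people` untouched.
by_age = sorted(people, key=lambda p: p.age)
by_height_desc = sorted(people, key=lambda p: p.height, reverse=True)

# list.sort() sorts the list in place and returns None.
numbers = [3, 1, 2]
result = numbers.sort()

print([p.name for p in by_age])          # ['Bob', 'Alice', 'Carol']
print([p.name for p in by_height_desc])  # ['Bob', 'Carol', 'Alice']
print(numbers, result)                   # [1, 2, 3] None
```

Because list.sort() returns None, writing `numbers = numbers.sort()` is a common mistake that silently replaces the list with None.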
3. Use of dictionary data types
For word frequency statistics, a dictionary is clearly the most appropriate data type: each word is a key and the number of times it appears is the value, which makes it easy to record the frequency of every word. A dictionary is much like a telephone book, where each name is associated with a telephone number. Its biggest advantage is fast lookup: the time complexity is O(1) on average, though that is the ideal case, since hash collisions can slow it down. If you want to learn more about dictionaries, I suggest this article: https://www.laurentluce.com/posts/python-dictionary-implementation/
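A minimal sketch of counting with a plain dict, using the same `dict.get(word, 0) + 1` idiom as the implementation later in the article. The function name `count_words` and the word list are illustrative only.

```python
def count_words(words):
    """Map each word to the number of times it occurs."""
    counts = {}
    for word in words:
        # get() returns 0 for a word not seen yet, avoiding a KeyError.
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words(["simple", "is", "better", "than", "complex", "simple"])
print(counts["simple"])    # 2
print("nested" in counts)  # False; a membership test is a hash lookup
```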
4. The use of regular expressions
For text and string processing, regular expressions are an indispensable tool, widely used in both web scraping and data cleaning. Of course, regular expressions are not unique to Python; virtually every programming language supports them. Besides learning regular expression syntax, we also need to learn the API Python provides for it, because only when we are familiar with the API can we apply it to real scenarios. A recommended article on regular expressions: www.cnblogs.com/huxi/archive/2010/07/04/1771073.html. I also noticed that some students used jieba, a word segmentation library that is very useful for Chinese text; take a look if you are interested.
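A small sketch of splitting text into words with `re.findall` and the `\w+` pattern, which is the approach the implementation below uses. The sample sentences are illustrative; the last two lines show limitations of this simple pattern.

```python
import re

text = "Beautiful is better than ugly.\nExplicit is better than implicit."

# \w+ matches runs of letters, digits and underscores, so punctuation and
# whitespace act as separators. The r"" prefix keeps the backslash literal.
words = [w.lower() for w in re.findall(r"\w+", text)]
print(words)
# ['beautiful', 'is', 'better', 'than', 'ugly',
#  'explicit', 'is', 'better', 'than', 'implicit']

# The apostrophe is not a word character, so contractions get split.
print(re.findall(r"\w+", "aren't"))  # ['aren', 't']

# In Python 3, \w also matches Chinese characters, but a run of them comes
# back as a single token, which is why Chinese text needs a segmenter like jieba.
print(re.findall(r"\w+", u"你好世界"))  # ['你好世界']
```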
Implementation
Once the analysis is done, the implementation comes quickly. So when we get a requirement, we should first understand it clearly, think about which techniques can be used to meet it, and only then start writing code. In practice, actually writing code takes up less than half of our working time.
# -*- coding:utf-8 -*-
import io
import re


class Counter:
    def __init__(self, path):
        """
        :param path: file path
        """
        # word -> number of occurrences
        self.mapping = dict()
        with io.open(path, encoding="utf-8") as f:
            data = f.read()
            # Split on non-word characters and ignore case.
            words = [s.lower() for s in re.findall(r"\w+", data)]
            for word in words:
                self.mapping[word] = self.mapping.get(word, 0) + 1

    def most_common(self, n):
        assert n > 0, "n should be larger than 0"
        # Sort (word, count) pairs by count, highest first, and keep the top n.
        return sorted(self.mapping.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == '__main__':
    most_common_5 = Counter("importthis.txt").most_common(5)
    for item in most_common_5:
        print(item)
Output:
('is', 10)
('better', 8)
('than', 8)
('the', 6)
('to', 5)
Summary
Looking at everyone's code, I found that a lot of it still has irregular naming (PEP 8 is recommended reading) and messy layout that is hard to read (formatting it with PyCharm is recommended). Many implementations are also needlessly complex, and the more complex the code, the more bugs it tends to have. Of course, there is no single correct way to solve this problem.
For example, Python's standard library provides the collections.Counter class, a subclass of dict designed for counting, and some students used it in their solutions. If you look closely, you may notice that the Counter I implemented is very similar to collections.Counter. In fact, this was reinventing the wheel on purpose, because building wheels yourself exercises your programming thinking. At work, of course, there is no need to build your own wheel when a ready-made one exists, unless you are confident you can do better. It is still worth thinking about how you would solve the problem if Python did not provide Counter.
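For comparison, a sketch of the same task with the standard library's `collections.Counter`. The sample sentence is made up and chosen so the top counts have no ties; for a file you would pass the words read from it instead.

```python
import re
from collections import Counter

text = "The cat and the hat and the bat"

# Counter is a dict subclass: keys are the items, values are their counts.
counts = Counter(w.lower() for w in re.findall(r"\w+", text))

# most_common(n) returns the n most frequent (item, count) pairs,
# highest count first.
print(counts.most_common(2))  # [('the', 3), ('and', 2)]

# Unlike a plain dict, a missing key reads as 0 instead of raising KeyError.
print(counts["dog"])          # 0
```

This replaces both the counting loop and the sorted() call of the hand-written version, which is why the ready-made class is the better choice in everyday code.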
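In addition, the collections module provides OrderedDict, a dictionary that remembers the order in which keys were inserted. Note that it does not sort anything by itself: it only saves you from sorting again later if you insert the items in the order you want (and since Python 3.7, the built-in dict also preserves insertion order). Finally, I suggest you make a habit of summarizing everything mentioned above. If you can keep it up for 100 days, I believe mastering Python will come easily.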
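A sketch of what OrderedDict does and does not do, assuming the counts come from `collections.Counter`: the items are sorted once, and the OrderedDict then keeps that order. The sample sentence is illustrative.

```python
from collections import Counter, OrderedDict

counts = Counter("the cat and the hat and the bat".split())

# OrderedDict keeps keys in insertion order; it does not sort them.
# Sorting once before insertion fixes the order for later iteration.
by_count = sorted(counts.items(), key=lambda item: item[1], reverse=True)
ranked = OrderedDict(by_count)

print(list(ranked.items())[:2])  # [('the', 3), ('and', 2)]

# Since Python 3.7, a plain dict also preserves insertion order.
ranked_plain = dict(by_count)
print(list(ranked_plain) == list(ranked))  # True
```

Because sorted() is stable, words with equal counts (cat, hat, bat) stay in the order they were first seen.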
Thank you for reading! That is all on how to count word occurrences in Python. I hope the content above helps you learn something new. If you found the article useful, please share it so more people can see it!
© 2024 shulou.com SLNews company. All rights reserved.