Updated 2025-02-25. From: SLTechnology News&Howtos (Shulou)
Shulou (Shulou.com) report, 06/03
This article shows how to implement word frequency statistics in Python. The walkthrough is detailed, and I hope it is helpful to interested readers.
Functional requirements
This is code for a homework assignment from our teacher, which asks for word frequency statistics software with the following features:
1) Read data from a text file (file input and output).
2) Ignore case and remove special characters.
3) Count each word (e.g. about: 10) and the total number of words.
4) Sort the words by number of occurrences.
5) Output the 10 most frequent words and their counts.
6) Save the statistical results to a text file.
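The requirements above can also be met in a single file with the standard library's collections.Counter. The sketch below is an alternative to, not a copy of, the multi-module implementation that follows. The separator set, the function names count_words and main, and the choice to strip both leading and trailing apostrophes are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed separator set: whitespace plus common ASCII and full-width punctuation.
# The apostrophe is deliberately not a separator, so words like "don't" survive.
SPLIT_PATTERN = re.compile(r'[\s,.:;?!()\[\]{}"_=%\-，。：；？！（）【】“”]+')


def count_words(text):
    """Return (total_words, ranked_counts) for text, ignoring case."""
    # Lowercase, split on separator runs, and strip apostrophes at word edges
    words = [w.strip("'") for w in SPLIT_PATTERN.split(text.lower())]
    # Drop the empty strings that split() leaves at the start or end
    words = [w for w in words if w]
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return len(words), ranked


def main(src="read.txt", dst="result.txt"):
    """Hypothetical driver: read src, write totals and rankings to dst."""
    with open(src, encoding="utf-8") as f:
        total, ranked = count_words(f.read())
    with open(dst, "w", encoding="utf-8") as out:
        print("Total number of words:", total, file=out)
        print("Top 10 most frequent words:", file=out)
        for word, n in ranked[:10]:
            print(word, ":", n, file=out)
        print("Full statistics:", file=out)
        for word, n in ranked:
            print(word, ":", n, file=out)
```

To use it, place read.txt in the working directory and call main(). count_words can also be tried on a plain string. The tie-breaking key (-count, word) keeps the ranking reproducible when several words share the same count.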
The method is as follows
1. Read the file, convert it to lowercase, and remove special characters
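```python
# getfilewords.py
import re


def getword():
    # Read the file; "with" closes it automatically
    with open('read.txt', 'r', encoding='utf-8') as f:
        # Convert uppercase to lowercase so counting is case-insensitive
        word = f.read().lower()
    # Split on runs of whitespace and special characters (ASCII and full-width).
    # Each symbol in the character class is a separator; + matches one or more.
    words = re.split(r'[\s,.:;?!()\[\]{}"_=%\-，。：；？！（）【】“”]+', word)
    # Remove empty strings. After each removal the index is decremented so the
    # element that shifts into position i is not skipped.
    i = 0
    while i < len(words):
        if words[i] == '':
            words.remove(words[i])
            i -= 1
        i += 1
    # A for...in loop would not work here: if there are several empty strings,
    # the list changes while it is being traversed and some are not removed.
    # When iterating with for...in, avoid adding or deleting list elements.
    # Handle possessives such as others': strip a trailing apostrophe
    for i in range(len(words)):
        if words[i][-1] == "'":
            words[i] = words[i][:-1]
    return words
```

2. Counting and sorting

```python
# WordStatistics.py
from getfilewords import getword


def statistics():
    counts = {}  # empty dictionary, filled in as we go
    words = getword()
    for word in words:  # traverse the whole list
        # Check whether the current word is already a key in the dictionary.
        # Note: keys() returns all keys of a dictionary; values() returns all values.
        if word in counts:
            counts[word] = counts[word] + 1  # add 1 to the word's count
        else:
            counts[word] = 1  # first occurrence: store the word with count 1
    # Sort. counts.items() yields (word, count) tuples; sorted() passes each
    # tuple as x to lambda x: x[1], which returns the count, so the tuples are
    # ordered by their second element. reverse=True sorts in descending order
    # (False would be ascending).
    w_order = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    return w_order  # return the sorted list
```

3. Writing the results to a text file

```python
# writefile.py
from getfilewords import getword
from WordStatistics import statistics


def writefile():
    w_order = statistics()
    total = len(getword())
    with open('result.txt', 'w', encoding='utf-8') as f:
        # Write to the file and echo to the console
        print("Total number of words in the article:", total, file=f)
        print("Total number of words in the article:", total)
        print("Top 10 most frequent words and their counts", file=f)
        print("Top 10 most frequent words and their counts")
        # Take the first ten entries and output each key (word) and value (count)
        for key, values in w_order[:10]:
            print(key, ':', values, file=f)
            print(key, ':', values)
        # Write every entry in the list
        print("Statistical results", file=f)
        for key, values in w_order:
            print(key, ':', values, file=f)
```

4. Program entry point

```python
from writefile import writefile

print("Word frequency statistics software")
print("Counting...")
writefile()
print("Done. Results saved to result.txt")
print("Program finished")
# Portable replacement for os.system("pause"), which only works on Windows
input("Press Enter to exit...")
```

5. Run screenshots (not reproduced here): the text to be counted, and the output of running the program.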
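The comment in step 1 warns against deleting list elements inside a for...in loop. The small sketch below, using hypothetical helper names, shows the effect: each removal shifts the later elements one place to the left, so the loop skips the element right after it and an empty string survives. Building a new list with a comprehension avoids the problem and is the usual Python idiom for filtering.

```python
def remove_empty_in_for_loop(items):
    """Buggy: removes items while a for loop is iterating over the same list."""
    for item in items:
        if item == "":
            items.remove(item)
    return items


def remove_empty_with_comprehension(items):
    """Safe: builds a new list instead of mutating the one being iterated."""
    return [item for item in items if item != ""]


words = ["a", "", "", "b", ""]
print(remove_empty_in_for_loop(list(words)))         # ['a', 'b', '']
print(remove_empty_with_comprehension(list(words)))  # ['a', 'b']
```

The author's while loop with the i -= 1 correction also gives the right result, but the comprehension is shorter and runs in a single pass instead of calling remove() repeatedly.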
That concludes this walkthrough of word frequency statistics in Python. I hope it helps you learn something new. If you found the article useful, feel free to share it.
© 2024 shulou.com SLNews company. All rights reserved.