How to Use Python to Implement Word Frequency Statistics


2025-07-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article shows how to implement word frequency statistics in Python. The walkthrough is detailed, and I hope it is helpful to interested readers.

Functional requirements

This code was written for a homework assignment from our teacher. The comments in the assignment ask for a word frequency statistics program that can:

1) read data from a text file (file input and output)

2) ignore case and remove special characters

3) count each word, for example about: 10, and count the total number of words

4) sort the words by number of occurrences

5) output the 10 most frequent words and their counts

6) save the statistical results to a text file
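As a rough orientation before the step-by-step method, requirements 2 to 5 can be sketched on an in-memory string. This sketch uses `collections.Counter` from the standard library rather than the hand-written dictionary loop used in this article; the function name `top_words` and the sample sentence are assumptions made only for illustration.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return (total word count, the n most frequent (word, count) pairs)."""
    # Lowercase so counting is case-insensitive, then keep only runs of
    # letters, digits and apostrophes; everything else acts as a separator.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Strip apostrophes left at the edges, e.g. others' -> others
    words = [w.strip("'") for w in words]
    words = [w for w in words if w]
    counts = Counter(words)
    # most_common() orders the pairs by count, highest first
    return len(words), counts.most_common(n)


sample = "The cat and the dog. The CAT sat!"
total, top = top_words(sample, n=2)
print(total)  # 8
print(top)    # [('the', 3), ('cat', 2)]
```

File reading and writing (requirements 1 and 6) are left out here so the sketch can be run without any input file.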

The method is as follows

1. Read the file, ignore case, remove special characters

    # getfilewords.py
    import re

    def getword():
        # Read the file and convert uppercase to lowercase
        with open('read.txt', 'r', encoding='utf-8') as f:
            text = f.read().lower()
        # Remove special characters by splitting on runs of them
        # (whitespace is included so that line breaks also separate words)
        words = re.split(r'[\s,.:?()_="{}\[\]%-]+', text)
        # For possessives such as others', drop the trailing apostrophe
        words = [w[:-1] if w.endswith("'") else w for w in words]
        # Remove empty strings. With a for...in loop that calls remove(), the
        # list changes during iteration when there are several empty strings,
        # so some of them are skipped. It is best not to add or remove
        # elements while iterating over a list; build a new list instead.
        words = [w for w in words if w != '']
        return words

2. Count and sort

    # WordStatistics.py
    from getfilewords import getword

    def statistics():
        counts = {}  # empty dictionary, filled in step by step below
        words = getword()
        for word in words:  # traverse the whole list
            # Check whether the current word is already stored as a key.
            # Note: keys() returns all keys of a dictionary and values()
            # returns all of its values (see Test1 for details).
            if word in counts:
                counts[word] = counts[word] + 1  # add 1 to this word's count
            else:
                counts[word] = 1  # first occurrence: store the word with count 1
        # Sort. counts.items() yields (word, count) tuples; sorted() passes
        # each tuple as x to the lambda, which returns x[1], the count.
        # reverse=True sorts in descending order, False in ascending order.
        w_order = sorted(counts.items(), key=lambda x: x[1], reverse=True)
        return w_order  # the sorted list

3. Write the results to a text file

    # writefile.py
    from getfilewords import getword
    from WordStatistics import statistics

    def writefile():
        w_order = statistics()
        total = len(getword())
        # Write to the file and echo the summary to the console
        with open('result.txt', 'w', encoding='utf-8') as f:
            print("Total number of words in the article:", total, file=f)
            print("Total number of words in the article:", total)
            print("The 10 most frequent words and their counts", file=f)
            print("The 10 most frequent words and their counts")
            # Take the first ten entries and print each key (word) and value (count)
            w_order10 = w_order[:10]
            for key, values in w_order10:
                print(key, ':', values, file=f)
                print(key, ':', values)
            # Write all of the data in the list
            print("Statistical results", file=f)
            for key, values in w_order:
                print(key, ':', values, file=f)

4. Program entry point

    import os
    from writefile import writefile

    print("Word frequency statistics software")
    print("Counting...")
    writefile()
    print("Counting succeeded, results saved to result.txt")
    print("Program finished")
    os.system("pause")  # Windows only: keeps the console window open

5. Run screenshots

This is the text to be counted

Run the program

Running result
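One point from step 1 is worth seeing in isolation: the code comments there warn against adding or removing elements while iterating over a list with for...in. The following is a minimal sketch of that pitfall; the helper names `remove_empty_naive` and `remove_empty_safe` are hypothetical and are not part of the article's code.

```python
def remove_empty_naive(items):
    """Buggy on purpose: removes from the list it is iterating over."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for item in items:
        if item == "":
            # remove() shifts later elements left, so the loop's internal
            # index now points past the element that moved into this slot.
            items.remove(item)
    return items


def remove_empty_safe(items):
    """Build a new list instead of mutating during iteration."""
    return [item for item in items if item != ""]


tokens = ["a", "", "", "b"]
print(remove_empty_naive(tokens))  # ['a', '', 'b']
print(remove_empty_safe(tokens))   # ['a', 'b']
```

With two adjacent empty strings, the naive loop skips the second one, which is why step 1 does not remove elements inside a plain for...in loop.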

That is all on how to implement word frequency statistics in Python. I hope the content above is of some help to you. If you think the article is good, you can share it for more people to see.

