This article mainly explains how to use Python's pkuseg to generate a word cloud. The method introduced here is simple, fast, and practical. Interested readers are welcome to follow along and learn how to use pkuseg to segment text and build a word cloud from it.
Install pkuseg:
pip3 install pkuseg
The first step is to download the speech, save it to a txt file, and then load the content into memory
content = []
with open("yanjiang.txt", encoding="utf-8") as f:
    content = f.read()
I counted and found that the total number of words was 32546.
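The article does not show how that figure was obtained; a minimal sketch, assuming it is simply the length of the loaded text:

print(len(content))  # assumption: the count quoted above is the length of the loaded string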
Next, we use pkuseg to segment the content, and count the top 20 words with the highest frequency.
import pkuseg
from collections import Counter
import pprint

content = []
with open("yanjiang.txt", encoding="utf-8") as f:
    content = f.read()

seg = pkuseg.pkuseg()
text = seg.cut(content)
counter = Counter(text)
pprint.pprint(counter.most_common(20))
The output at this point is a jumble of mostly meaningless tokens. Don't worry. In the field of word segmentation there is a concept called stop words: words that carry no specific meaning in context, such as 'this', 'that', 'you', 'me', 'him', particles like 地, punctuation marks, and so on. Since nobody searches with these meaningless stop words, we need to filter them out to get better segmentation results, so let's find a stop-word list on the Internet.
The second version of the code:
import pkuseg
from collections import Counter
import pprint

content = []
with open("yanjiang.txt", encoding="utf-8") as f:
    content = f.read()

seg = pkuseg.pkuseg()
text = seg.cut(content)

stopwords = []
with open("stopword.txt", encoding="utf-8") as f:
    # split the file into individual stop words so membership tests match whole words
    stopwords = f.read().split()

new_text = []
for w in text:
    if w not in stopwords:
        new_text.append(w)

counter = Counter(new_text)
pprint.pprint(counter.most_common(20))
Printed result:
[('Wechat', 163),
 ('user', 112),
 ('products', 89),
 ('Friends', 81),
 ('tools', 56),
 ('Program', 55),
 ('socialize', 55),
 ('circle', 47),
 ('video', 40),
 ('hope', 39),
 ('time', 39),
 ('Game', 36),
 ('read', 33),
 ('content', 32),
 ('platform', 31),
 ('article', 30),
 ('Information', 29),
 ('team', 27),
 ('AI', 27),
 ('APP', 26)]
This looks much better than the first attempt, because the stop words have been filtered out. But the top 20 high-frequency words are still not accurate: some terms that should not be split, such as 'moments', 'official account', and 'Mini Program', have been broken apart, even though we consider each of them a single word.
For these proper nouns, we only need to supply a user dictionary; during segmentation, the words in the user dictionary are treated as fixed units and are not split further.
lexicon = ['Mini Program', 'moments', 'official account']  # proper nouns to keep intact
seg = pkuseg.pkuseg(user_dict=lexicon)  # load the model with the user dictionary
text = seg.cut(content)
The final result, after re-segmenting with the user dictionary, is that the top 50 high-frequency words look like this (a short sketch of how such a count/word listing can be printed follows the list):
163 Wechat
112 users
89 products
72 moments
56 tools
55 socializing
53 Mini Program
40 Video
39 Hope
39 hours
36 games
33 Reading
32 content
31 friends
31 platform
30 articles
29 Information
27 team
27 AI
26 APP
25 official account
25 service
24 good friends
22 photos
21 era
21 record
20 mobile phone
20 recommendation
20 enterprises
19 motive force
18 function
18 True
18 Life
17 flow
16 computer
15 space
15 discovery
15 creativity
15 embodiment
15 companies
15 value
14 version
14 share
14 Future
13 Internet
13 release
13 ability
13 discussion
13 dynamic
12 Design
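The article prints these results as plain count/word pairs rather than as tuples; a minimal sketch of how that listing might be produced, assuming the stop-word filtering step is repeated on the re-segmented text:

# re-apply the stop-word filter to the re-segmented text, then print count/word pairs
new_text = [w for w in text if w not in stopwords]
counter = Counter(new_text)
for word, count in counter.most_common(50):
    print(count, word)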
The words Zhang Xiaolong used most often are 'user', 'friends', 'motivation', 'value', 'sharing', 'creativity', 'discovery', and so on: 'user' appears 112 times, 'hope' 39 times, and 'friends' 31 times. These words embody the spirit of the Internet. If we turn these words into a word cloud, the effect should be even better.
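The article stops short of showing the word-cloud step itself. A minimal sketch, assuming the wordcloud package (pip3 install wordcloud) and the counter built above; the font path is an assumption and must point to a font that covers Chinese characters:

from wordcloud import WordCloud

# build a word cloud from the 50 most frequent words counted above
wc = WordCloud(font_path="simhei.ttf",  # assumption: any font file covering Chinese glyphs
               width=800, height=600,
               background_color="white")
wc.generate_from_frequencies(dict(counter.most_common(50)))
wc.to_file("yanjiang_wordcloud.png")  # hypothetical output file name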
At this point, I believe you have a deeper understanding of how to use Python and pkuseg to generate a word cloud. You might as well try it out yourself. Follow us for more related content and keep learning!