Shulou — SLTechnology News & Howtos, Internet Technology. Updated 2025-03-29.
This article shows how to use jieba, the Python word segmentation tool. I think it is very practical, so I am sharing it here; I hope you get something out of it.

jieba ("stutter" in Chinese) is one of the most popular word segmentation tools for Python and is widely used in natural language processing and similar scenarios. Because the documentation on GitHub is quite long, I have put together a condensed getting-started guide that you can apply directly after reading.
Installation

pip install jieba
Simple segmentation

import jieba

result = jieba.cut("我爱中国北京大学")  # "I love China's Peking University"
for word in result:
    print(word)
Output:

我 (I)
爱 (love)
中国 (China)
北京大学 (Peking University)

The sentence is segmented into four words.
Full-mode segmentation

result = jieba.cut("我爱中国北京大学", cut_all=True)
for word in result:
    print(word)
Output:

我 (I)
爱 (love)
中国 (China)
北京 (Beijing)
北京大学 (Peking University)
大学 (university)
Full mode lists every word the dictionary recognizes in the sentence, so it covers more (and overlapping) candidates.
Extracting keywords

Extract the top k keywords from a sentence or paragraph:
import jieba.analyse

result = jieba.analyse.extract_tags(
    "机器学习，需要一定的数学基础，需要掌握的数学基础知识特别多，"
    "如果从头到尾开始学，估计大多数人都没有时间，建议先学习最基础的数学知识",
    topK=5,
    withWeight=False)

import pprint
pprint.pprint(result)
Output:

['数学', '学习', '数学知识', '基础知识', '从头到尾']
(mathematics, learning, mathematical knowledge, basic knowledge, from beginning to end)
topK sets how many of the highest-weight keywords are returned.
withWeight, when True, returns each keyword together with its weight.
Removing stop words

Stop words are words that carry little meaning in a sentence, such as punctuation marks and demonstrative pronouns, and should be removed before analysis. The segmentation method cut does not support filtering stop words directly, so they must be handled manually; the keyword extraction method extract_tags does support stop-word filtering.
# set the stop-word file before extracting keywords
jieba.analyse.set_stop_words(file_name)
result = jieba.analyse.extract_tags(content, topK)
Here file_name is a plain text file containing one stop word per line.
That is how to use the Python word segmentation tool jieba. These are points you may well see or use in everyday work; I hope you learned something from this article.