How to use Python word Segmentation tool jieba

2025-03-29 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article shows how to use the Python word segmentation tool jieba. The editor finds it very practical and shares it here in the hope that you will get something out of it.

jieba (the name means "stutter" in Chinese) is one of the most popular word segmentation tools for Python and is widely used in natural language processing and other scenarios.

Because the documentation on GitHub is rather verbose, I have put together a short getting-started guide that you can use right away.

Install

pip install jieba

Simple segmentation

import jieba

result = jieba.cut("我爱中国北京大学")  # "I love China's Peking University"
for word in result:
    print(word)

Output

我 (I)
爱 (love)
中国 (China)
北京大学 (Peking University)

The sentence is divided into four words.

Full-mode segmentation

result = jieba.cut("我爱中国北京大学", cut_all=True)
for word in result:
    print(word)

Output

我 (I)
爱 (love)
中国 (China)
北京 (Beijing)
北京大学 (Peking University)
大学 (University)

Full mode emits every word found in the dictionary, so its output covers more (and overlapping) segments.

Extract keywords

Extract the top k keywords from a sentence or paragraph.

import jieba.analyse

# The original text is Chinese; the English translation is shown here
result = jieba.analyse.extract_tags(
    "Machine learning requires a certain mathematical foundation, and a lot of "
    "mathematical basic knowledge is needed. If you start from beginning to end, "
    "it is estimated that most people will not have time, so I suggest learning "
    "the most basic mathematical knowledge first.",
    topK=5,
    withWeight=False)

import pprint
pprint.pprint(result)

Output

['mathematics', 'learning', 'mathematical knowledge', 'basic knowledge', 'from beginning to end']

topK is the number of keywords to return, ranked from highest weight to lowest.

withWeight, if True, returns each keyword together with its weight.

Remove stop words

Stop words are words that carry little meaning in a sentence, such as punctuation marks and demonstrative pronouns, and should be removed before further processing. The segmentation function cut does not filter stop words directly, so they must be handled manually. The keyword-extraction function extract_tags does support stop-word filtering.

# Set the stop-word file before extracting keywords
jieba.analyse.set_stop_words(file_name)

result = jieba.analyse.extract_tags(content, topK)

file_name is the path to a plain-text file containing one stop word per line.

The above covers how to use the Python word segmentation tool jieba. These are points you may well encounter in everyday work, and I hope this article helps you learn more.


