
How to extract keywords and tag parts of speech with jieba, a basic NLP tool

2025-04-10 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

Many people new to jieba, a basic NLP tool, are unsure how to use it for keyword extraction and part-of-speech tagging. This article summarizes how both features work and how to call them.

Besides word segmentation, jieba can also extract keywords and tag parts of speech.

Imports:

import jieba  # word segmentation
import jieba.analyse as anls  # keyword extraction
import jieba.posseg as pseg  # part-of-speech tagging

There are two algorithms for keyword extraction:

The first is the TF-IDF algorithm (Term Frequency-Inverse Document Frequency). Its basic idea is that the more often a word appears in one article and the fewer documents in the whole collection contain it, the better that word represents the article.
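The idea can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the function name tf_idf, the tiny corpus, and the smoothed IDF formula are all chosen for illustration, and jieba itself looks up IDF values from a prebuilt table rather than computing them from a corpus you pass in.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Score each word of one tokenized document against a small corpus.

    corpus is a list of token lists. The smoothed IDF used here is one
    common variant, picked for illustration only.
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / total                              # frequency inside this document
        df = sum(1 for doc in corpus if word in doc)    # documents containing the word
        idf = math.log((1 + n_docs) / (1 + df)) + 1     # rarer across documents -> larger
        scores[word] = tf * idf
    return scores


# Hypothetical three-document corpus, already tokenized.
corpus = [
    ["jieba", "segmentation", "keyword"],
    ["segmentation", "tagging"],
    ["segmentation", "graph"],
]
scores = tf_idf(corpus[0], corpus)
```

In this example "segmentation" occurs in every document, so it scores lower than "jieba" and "keyword", which occur in only one.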

The second is the TextRank algorithm. Its basic steps are:

Segment the text from which keywords are to be extracted

Build a graph from the co-occurrence of words inside a fixed-size window (the default size is 5, adjustable through the span attribute)

Compute PageRank over the nodes of the resulting undirected weighted graph
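The steps above can be sketched in plain Python. This is a simplified illustration, not jieba's implementation: the function name textrank_scores, the damping factor of 0.85, the fixed number of iterations, and the choice to skip pairs of identical words are assumptions made for the example, and the input is assumed to be already segmented.

```python
from collections import defaultdict


def textrank_scores(tokens, span=5, damping=0.85, iterations=10):
    """Rank words by PageRank over an undirected weighted co-occurrence graph."""
    # Step 2: count co-occurrences of each word with the words that follow it
    # inside the window; the pair is sorted so the edge is undirected.
    weights = defaultdict(float)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + span, len(tokens))):
            if tokens[j] != word:
                weights[tuple(sorted((word, tokens[j])))] += 1.0

    neighbors = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w

    # Step 3: iterate the weighted PageRank update a fixed number of times.
    scores = {node: 1.0 / len(neighbors) for node in neighbors}
    out_sum = {node: sum(edges.values()) for node, edges in neighbors.items()}
    for _ in range(iterations):
        updated = {}
        for node, edges in neighbors.items():
            rank = sum(w / out_sum[other] * scores[other] for other, w in edges.items())
            updated[node] = (1 - damping) + damping * rank
        scores = updated
    return scores


# Hypothetical token sequence in which "hub" co-occurs with every other word.
demo = textrank_scores(["hub", "x", "hub", "y", "hub", "z"], span=2)
```

With span=2 only adjacent words are linked, so "hub" is connected to all three other words and receives the highest score.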

Code:

TF-IDF: jieba.analyse.extract_tags(sentence, topK=20, withWeight=True, allowPOS=())

TextRank: jieba.analyse.textrank(sentence, topK=20, withWeight=True)

Here topK is the number of keywords to return, withWeight controls whether each keyword's weight is returned with it, and allowPOS restricts the results to words with the listed part-of-speech tags (an empty tuple means no filtering).
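A minimal sketch of calling both extractors and printing their results follows. The helper names format_keywords and extract_both are made up for this example, and it assumes jieba is installed; the import is placed inside the function only so the formatting helper can be used on its own.

```python
def format_keywords(pairs):
    """Render (word, weight) pairs one per line, weight rounded to two decimals."""
    return "\n".join(f"{word} {weight:.2f}" for word, weight in pairs)


def extract_both(sentence, top_k=20):
    """Return (TF-IDF keywords, TextRank keywords), each a list of (word, weight)."""
    import jieba.analyse as anls  # assumption: jieba is installed

    tfidf = anls.extract_tags(sentence, topK=top_k, withWeight=True)
    textrank = anls.textrank(sentence, topK=top_k, withWeight=True)
    return tfidf, textrank
```

Calling print(format_keywords(extract_both(sentence)[0])) on a Chinese sentence prints the TF-IDF keywords in a word-then-weight layout like the listings in this article.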

Input sentence (Chinese in the original, shown here in translation): "Besides its most important function, word segmentation, jieba can also extract keywords and tag parts of speech."

Keywords output by TF-IDF (tokens shown in English translation):

Part of speech 0.91

jieba 0.85

- 0.85

Word segmentation 0.84

Tagging 0.66

Keyword 0.64

Extract 0.54

Outside 0.42

Function 0.39

Except 0.37

Important 0.29

And 0.29

Carry out 0.27

Can 0.25

Keywords output by TextRank (tokens shown in English translation):

Part of speech 1.00

Extract 0.99

Keyword 0.99

Function 0.90

Word segmentation 0.90

Carry out 0.76

Tagging 0.75

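By comparison, the TextRank keywords are cleaner: punctuation and function words such as "and" and "except" do not appear. One reason is that jieba.analyse.textrank by default keeps only words tagged ns, n, vn, or v, whereas extract_tags applies no part-of-speech filter unless allowPOS is set.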

Part-of-speech tagging

Use jieba.posseg to tag parts of speech.

Code:

import jieba.posseg

# The input is Chinese: "I came to Beijing Tsinghua University"
words = jieba.posseg.cut("我来到北京清华大学")

# Each item unpacks into a word and its part-of-speech tag
for x, w in words:
    print('%s %s' % (x, w))

Output (tokens shown in English translation):

I r

Came to v

Beijing ns

Tsinghua University nt
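In this output r marks a pronoun, v a verb, ns a place name, and nt an organization name. The tags can be used to filter words, as in the sketch below. The helper names tag_sentence and keep_by_flag are invented for this example, and tag_sentence assumes jieba is installed.

```python
def tag_sentence(sentence):
    """Return (word, tag) tuples for a sentence using jieba.posseg."""
    import jieba.posseg as pseg  # assumption: jieba is installed

    return [(pair.word, pair.flag) for pair in pseg.cut(sentence)]


def keep_by_flag(tagged, prefixes=("n",)):
    """Keep the words whose tag starts with one of the given prefixes."""
    return [word for word, flag in tagged if flag.startswith(prefixes)]


# Tagged pairs copied from the output above, in the original Chinese.
tagged = [("我", "r"), ("来到", "v"), ("北京", "ns"), ("清华大学", "nt")]
nouns = keep_by_flag(tagged)
```

Because ns and nt both start with n, the default prefix keeps the place name and the organization name and drops the pronoun and the verb.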

That covers keyword extraction and part-of-speech tagging with jieba. Thank you for reading!
