2025-04-10 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/01 Report--
How do you perform keyword extraction and part-of-speech tagging with jieba, a basic NLP tool? Many newcomers are unsure where to start, so this article summarizes how both features work and how to use them.
Besides word segmentation, jieba can also extract keywords and tag parts of speech.
Use:
import jieba  # import jieba
import jieba.analyse as anls  # keyword extraction
import jieba.posseg as pseg  # part-of-speech tagging
There are two algorithms for keyword extraction:
The first is the TF-IDF algorithm (Term Frequency-Inverse Document Frequency). Its basic idea is that the more often a word appears in one article and the fewer documents in the whole collection contain it, the better that word represents the article.
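To make the idea concrete, here is a minimal sketch of the TF-IDF score in plain Python. It is an illustration only, not jieba's implementation: the function name tf_idf, the toy corpus, and the unsmoothed log(N / df) formula are all assumptions chosen for brevity. In practice jieba relies on a precomputed IDF table rather than a corpus you pass in.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Score every term of one document against a small corpus.

    doc_tokens: list of tokens of the document being scored.
    corpus: list of token lists; assumed to include the document itself,
            so every term has a document frequency of at least 1.
    """
    term_counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        tf = count / len(doc_tokens)                   # term frequency in this document
        df = sum(1 for doc in corpus if term in doc)   # number of documents containing the term
        idf = math.log(n_docs / df)                    # rarer across documents -> larger idf
        scores[term] = tf * idf
    return scores


# Hypothetical toy corpus, already tokenized.
corpus = [
    ["jieba", "segments", "text"],
    ["the", "cat", "segments", "nothing"],
    ["text", "about", "text"],
]
scores = tf_idf(corpus[0], corpus)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(term, round(score, 3))
```

In this toy corpus "jieba" occurs in only one document, so it scores higher than "text" or "segments", which each occur in two.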
The second is the TextRank algorithm. Its basic idea is:
Segment the text from which keywords are to be extracted.
Build a graph from the co-occurrence of words within a fixed-size window (the default size is 5, adjustable through the span attribute).
Compute the PageRank of the nodes in the graph. Note that the graph is undirected and weighted.
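The three steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not jieba's code: the function name textrank_sketch, the damping factor of 0.85, the fixed iteration count, and the pre-tokenized input are all choices made for the example.

```python
from collections import defaultdict


def textrank_sketch(tokens, span=5, damping=0.85, iterations=30):
    """Rank tokens by PageRank over an undirected weighted co-occurrence graph."""
    # Step 2: each pair of distinct words inside the window adds 1 to their edge weight.
    edge_weights = defaultdict(float)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + span, len(tokens))):
            if tokens[j] != word:
                edge = tuple(sorted((word, tokens[j])))
                edge_weights[edge] += 1.0

    # Store the undirected graph as a symmetric adjacency map.
    neighbors = defaultdict(dict)
    for (a, b), weight in edge_weights.items():
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    # Step 3: iterate the weighted PageRank update a fixed number of times.
    rank = {node: 1.0 / len(neighbors) for node in neighbors}
    total_weight = {node: sum(adj.values()) for node, adj in neighbors.items()}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbors:
            incoming = sum(
                weight / total_weight[other] * rank[other]
                for other, weight in neighbors[node].items()
            )
            new_rank[node] = (1 - damping) + damping * incoming
        rank = new_rank
    return rank


# Step 1 (segmentation) is assumed to have been done already.
tokens = ["keyword", "extraction", "keyword", "tagging", "keyword", "segmentation"]
ranks = textrank_sketch(tokens, span=2)
top_word = max(ranks, key=ranks.get)
print(top_word)
```

With span=2 only adjacent words are linked, so "keyword" becomes the hub of the graph and receives the highest rank.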
Code:
TF-IDF: jieba.analyse.extract_tags(sentence, topK=20, withWeight=True, allowPOS=())
TextRank: jieba.analyse.textrank(sentence, topK=20, withWeight=True)
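Here topK is the number of keywords to return, and withWeight controls whether each keyword's weight is returned with it. allowPOS restricts the result to words with the given part-of-speech tags; an empty tuple means no filtering.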
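The effect of topK and withWeight on the shape of the result can be illustrated without jieba. The helper select_keywords below is hypothetical and only mimics the return shape: a list of words when withWeight is False, and a list of (word, weight) pairs when it is True. The scores are taken from the TF-IDF output shown in this article.

```python
def select_keywords(scores, topK=20, withWeight=False):
    """Hypothetical helper: keep the topK highest-scoring words."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:topK]
    if withWeight:
        return ranked                       # list of (word, weight) pairs
    return [word for word, _ in ranked]     # list of words only


scores = {"part of speech": 0.91, "jieba": 0.85, "keyword": 0.64, "extract": 0.54}
top2 = select_keywords(scores, topK=2)
top2_weighted = select_keywords(scores, topK=2, withWeight=True)
print(top2)
print(top2_weighted)
```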
Take as input the sentence "Besides its most important function, word segmentation, jieba can also extract keywords and tag parts of speech":
Keywords for TF-IDF output:
Part of speech 0.91
jieba 0.85
- 0.85
Word segmentation 0.84
Tagging 0.66
Keyword 0.64
Extract 0.54
Outside 0.42
Function 0.39
Except for 0.37
Important 0.29
And 0.29
Carry on 0.27
Can be 0.25
Keywords for TextRank output:
Part of speech 1.00
Extract 0.99
Keyword 0.99
Function 0.90
Word segmentation 0.90
Carry on 0.76
Tagging 0.75
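By comparison, the keywords from TextRank are cleaner: the list contains no punctuation and no function words such as "and" or "except for". This is largely because textrank filters by part of speech by default (allowPOS=('ns', 'n', 'vn', 'v')), while extract_tags applies no such filter unless allowPOS is set.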
Part of speech tagging
Use jieba.posseg for part of speech tagging.
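Code:
import jieba.posseg
# Chinese input; in English: "I came to Beijing Tsinghua University"
words = jieba.posseg.cut("我来到北京清华大学")
for x, w in words:
    print('%s %s' % (x, w))
Output:
我 r
来到 v
北京 ns
清华大学 nt
Here 我 ("I") is tagged r (pronoun), 来到 ("came to") is tagged v (verb), 北京 ("Beijing") is tagged ns (place name), and 清华大学 ("Tsinghua University") is tagged nt (organization name).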
That covers keyword extraction and part-of-speech tagging with jieba, a basic NLP tool.