2025-03-01 Update From: SLTechnology News&Howtos
Shulou(Shulou.com) 06/02 Report--
Many beginners do not know how to use Python's jieba module to extract keywords, so this article summarizes the problem and its solution. I hope it helps you solve this problem.
1. When reading all of a file's data, note the difference between read(), readline(), and readlines(): read() reads the entire contents of the file into a single string, readline() reads one line of the file at a time, and readlines() returns a list of all lines.
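A minimal sketch of the three read methods (the temporary demo file and its contents are made up for illustration):

```python
import os
import tempfile

# Create a small demo file with three lines
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("line1\nline2\nline3\n")

with open(path) as f:
    whole = f.read()        # the entire file as one string
with open(path) as f:
    first = f.readline()    # only the first line (trailing newline kept)
with open(path) as f:
    lines = f.readlines()   # a list of all lines

print(repr(whole))   # 'line1\nline2\nline3\n'
print(repr(first))   # 'line1\n'
print(lines)         # ['line1\n', 'line2\n', 'line3\n']
```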
2. Note how to join a list into a string: ','.join(list). For example, if list = ['1', '2', '3'], this outputs '1,2,3'.
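One caveat worth showing: str.join only accepts strings, so a list of numbers must be converted first. A short sketch:

```python
nums = [1, 2, 3]

# ','.join(nums) would raise TypeError, so convert each item to str first
joined = ",".join(str(n) for n in nums)
print(joined)  # 1,2,3
```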
The code is as follows:

Text analysis: keyword extraction (jieba tokenizer, TF-IDF model)

Keywords can be obtained in two ways:
1. After segmenting the text with jieba, get the keywords by word frequency: jieba.analyse.extract_tags(news, topK=10) returns the top 10 words as keywords.
2. Using TF-IDF weights: first build the term-frequency matrix for the text, then compute the TF-IDF value of each term from those vectors.
# -*- coding: utf-8 -*-
import jieba.analyse
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer

"""
TF-IDF weights:
1. CountVectorizer builds the term-frequency matrix.
2. TfidfTransformer computes the tf-idf weights.
3. The vocabulary gives the candidate keywords of the text.
4. toarray() gives the corresponding tf-idf matrix.
"""

# Read the file
def read_news():
    news = open('news.txt', encoding='utf-8').read()
    return news

# jieba tokenizer: get keywords by word frequency
def jieba_keywords(news):
    keywords = jieba.analyse.extract_tags(news, topK=10)
    print(keywords)

def tfidf_keywords():
    # 00. Read the file: each line is one document; collect all documents in a list
    corpus = []
    for line in open('news.txt', 'r', encoding='utf-8'):
        corpus.append(line)

    # 01. Build the term-frequency matrix, converting the words in the text into counts
    vectorizer = CountVectorizer()
    # X[i][j]: frequency of word j in the i-th document
    X = vectorizer.fit_transform(corpus)
    print(X)  # term-frequency matrix

    # 02. Compute the tf-idf weights
    transformer = TfidfTransformer()
    tfidf = transformer.fit_transform(X)

    # 03. Get the words of the bag-of-words model
    # (on scikit-learn < 1.0, use get_feature_names() instead)
    word = vectorizer.get_feature_names_out()
    # tf-idf matrix
    weight = tfidf.toarray()

    # Print the feature words
    print(len(word))
    for j in range(len(word)):
        print(word[j])

    # Print the weights
    for i in range(len(weight)):
        for j in range(len(word)):
            print(weight[i][j])

if __name__ == '__main__':
    news = read_news()
    jieba_keywords(news)
    tfidf_keywords()
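To make the second approach concrete, here is a hand-rolled sketch of the tf-idf weighting that scikit-learn's TfidfTransformer applies by default (smoothed idf, L2-normalized rows). The tiny corpus and the function name tfidf_matrix are made up for illustration; this is a teaching sketch, not sklearn's implementation:

```python
import math

def tfidf_matrix(corpus_tokens):
    """tf-idf as in sklearn's defaults: idf = ln((1+n)/(1+df)) + 1, rows L2-normalized."""
    vocab = sorted({w for doc in corpus_tokens for w in doc})
    n = len(corpus_tokens)
    # document frequency: in how many documents each word appears
    df = {w: sum(w in doc for doc in corpus_tokens) for w in vocab}
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for doc in corpus_tokens:
        row = [doc.count(w) * idf[w] for w in vocab]   # raw tf * idf
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])           # L2 normalization
    return vocab, rows

# Two toy "documents", already tokenized
docs = [["apple", "banana", "apple"], ["banana", "cherry"]]
vocab, weights = tfidf_matrix(docs)
print(vocab)
print(weights)
```

Note how "apple" (frequent, and specific to the first document) gets a higher weight there than "banana", which appears in every document and is therefore down-weighted by the idf term.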
Having read the above, have you mastered how to use the jieba module to extract keywords in Python? If you want to learn more skills or learn more about the topic, you are welcome to follow the industry information channel. Thank you for reading!