How to learn NLP in Python: basic operations with the bag-of-words model

Updated 2025-01-15. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article covers how to get started with natural language processing (NLP) in Python through one basic operation: the bag-of-words model.

Overview

Starting today, we begin a tour of natural language processing (NLP). NLP covers techniques for processing, understanding, and using human language, bridging the gap between machine language and human language.

Bag-of-words model

The bag-of-words model (Bag of Words Model) converts a sentence into a vector representation. It treats a text as an unordered collection of words and counts how many times each word occurs.
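As a minimal sketch of this idea, assuming the sentences have already been split into tokens, Python's standard collections.Counter can play the role of the bag. The two token lists are made-up examples, not part of the corpus used later in this article.

```python
from collections import Counter

# Hypothetical, already-tokenized sentences (illustrative only).
tokens_a = ["the", "cat", "chased", "the", "dog"]
tokens_b = ["the", "dog", "chased", "the", "cat"]

# A bag of words keeps each word's count and discards word order.
bag_a = Counter(tokens_a)
bag_b = Counter(tokens_b)

print(bag_a["the"])    # how many times "the" occurs in the first sentence
print(bag_a == bag_b)  # same words with same counts, so the bags are equal
```

The two sentences mean different things, yet their bags are identical, which shows exactly what information the model discards.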

Vectorization

The bag-of-words model first segments the text into words. Counting how many times each word appears in the text then gives word-based features of that text. Putting the words of each text sample together with their frequencies is what is commonly called vectorization.
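The steps above can be sketched in plain Python, without any NLP library. The corpus and the helper names build_vocabulary and vectorize are hypothetical and chosen only for illustration. Libraries such as gensim assign word ids in their own way, so the index order here is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical corpus that has already been segmented into words.
docs = [
    ["today", "weather", "nice"],
    ["tomorrow", "rain", "rain"],
    ["tomorrow", "thunder"],
]


def build_vocabulary(documents):
    """Give each distinct word an integer index, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in doc:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def vectorize(doc, vocab):
    """Return a count vector with one slot per vocabulary word."""
    counts = Counter(doc)
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:  # words outside the vocabulary are ignored
            vector[vocab[token]] = n
    return vector


vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]

print(vocab)
for vector in vectors:
    print(vector)
```

Each text becomes a fixed-length list of counts, so texts of different lengths can be compared slot by slot.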

Example:

import jieba
from gensim import corpora

# Punctuation marks to strip from the segmented corpus
punctuation = [",", ".", "?", "!"]

# Define the corpus
# (jieba segments Chinese text; the sentences are rendered in English here,
# so the tokens in the output below reflect the Chinese originals)
content = ["it's a nice day today!", "is it going to rain tomorrow?", "it's going to thunder the day after tomorrow."]

# Word segmentation
seg = [jieba.lcut(con) for con in content]
print("Corpus:", seg)

# Remove punctuation marks, building new lists so that seg is left unchanged
tokenized = [[token for token in s if token not in punctuation] for s in seg]
print("Punctuation removed:", tokenized)

# tokenized is the corpus after punctuation removal; build the dictionary from it
dictionary = corpora.Dictionary(tokenized)
print("Bag-of-words model:", dictionary)

# Save the dictionary
dictionary.save('deerwester.dict')

# View the mapping between tokens and their integer ids
print("Token IDs:", dictionary.token2id)

Output result:

Building prefix dict from the default dictionary...
Loading model from cache C:\Users\Windows\AppData\Local\Temp\jieba.cache
Loading model cost 1.140 seconds.
Prefix dict has been built successfully.
Corpus: [["today's weather", 'really nice', '!'], ['tomorrow', 'going to', 'rain', '?'], ['the day after tomorrow', 'going to', 'thunder', '.']]
Punctuation removed: [["today's weather", 'really nice'], ['tomorrow', 'going to', 'rain'], ['the day after tomorrow', 'going to', 'thunder']]
Bag-of-words model: Dictionary(7 unique tokens: ["today's weather", 'really nice', 'rain', 'tomorrow', 'going to']...)
Token IDs: {"today's weather": 0, 'really nice': 1, 'rain': 2, 'tomorrow': 3, 'going to': 4, 'the day after tomorrow': 5, 'thunder': 6}

That covers the basic operation of the bag-of-words model for NLP in Python. If you have run into similar questions, the walkthrough above should help you work through them.
