2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Today I will talk about which Python libraries make natural language preprocessing easy, a topic that many people may not fully understand. To make it clearer, I have summarized the following content, and I hope you find this article useful.
Natural language processing (NLP) is one of the broadest research fields, and many large companies have invested heavily in it. NLP gives companies the opportunity to understand their customers better by analyzing the text they write and the sentiment it expresses. Some of the best use cases for NLP are spam email detection, fake news classification, sentiment analysis, next-word prediction, autocorrection, chatbots, personal assistants, and so on.
7 terms to know before solving any NLP task
Tokenization: the process of splitting the whole text into small units called tokens. Tokenization can be done at the sentence level or at the word level.
text = "Hello there, how are you doing today? The weather is great today. Python is awesome"
## sentence tokenize (separated by sentence)
['Hello there, how are you doing today?', 'The weather is great today.', 'Python is awesome']
## word tokenize (separated by words)
['Hello', 'there', ',', 'how', 'are', 'you', 'doing', 'today', '?', 'The', 'weather', 'is', 'great', 'today', '.', 'Python', 'is', 'awesome']
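To make the idea concrete, here is a deliberately simplified tokenizer built only on Python's `re` module. It is a sketch, not how NLTK's tokenizers work internally: it assumes a sentence ends with `.`, `?` or `!` followed by whitespace, and treats each run of letters/digits or each single punctuation mark as a token, so abbreviations such as "Dr." would be split incorrectly. The function names are hypothetical.

```python
import re

def simple_sent_tokenize(text):
    """Split text into sentences after '.', '?' or '!' followed by whitespace (simplified rule)."""
    # The lookbehind keeps the punctuation attached to the sentence it ends.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]

def simple_word_tokenize(text):
    """Return runs of word characters, plus each punctuation mark as its own token."""
    return re.findall(r"\w+|[^\w\s]", text)

text = "Hello there, how are you doing today? The weather is great today. Python is awesome"
print(simple_sent_tokenize(text))
print(simple_word_tokenize(text))
```

On this sample text the two functions reproduce the sentence and word lists shown above.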
Stop words: words such as "is", "a" and "the" that generally add little meaning to a sentence. Many NLP pipelines remove stop words because they contribute little to the analysis. The English stop-word list shipped with NLTK contains 179 words (the count may differ between versions of its data).
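Removing stop words is essentially a set-membership filter. The sketch below uses a tiny, hand-picked stop list (`STOP_WORDS` is a hypothetical stand-in, far shorter than NLTK's English list) and compares tokens case-insensitively.

```python
# Hypothetical, deliberately tiny stop list; real lists (e.g. NLTK's) are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "am", "i", "you", "he", "she", "how", "there"}

def remove_stop_words(tokens):
    """Keep only tokens whose lowercase form is not in STOP_WORDS."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["He", "is", "a", "good", "boy"]))  # ['good', 'boy']
```

Using a `set` keeps each membership check constant-time, which matters when filtering large corpora.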
Stemming: the process of reducing a word to its root form (stem) by removing suffixes and prefixes.
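A stemmer applies mechanical rules without checking a dictionary. The toy suffix stripper below is a sketch, not the Porter algorithm used by NLTK's `PorterStemmer`; the suffix list and the `MIN_STEM_LENGTH` guard are assumptions chosen for illustration.

```python
# A toy suffix-stripping stemmer, NOT the Porter algorithm.
# Suffixes are tried in order; "es" must come before "s".
SUFFIXES = ("ing", "ed", "ly", "es", "s")
MIN_STEM_LENGTH = 3  # hypothetical guard so short words like "is" are left alone

def crude_stem(word):
    """Strip the first matching suffix if enough of the word remains."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LENGTH:
            return w[: -len(suffix)]
    return w

for w in ["learning", "played", "quickly", "cats", "studies", "is"]:
    print(w, "->", crude_stem(w))
```

Note that "studies" becomes "studi": a stem does not have to be a real word, which is the key difference from lemmatization.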
Lemmatization: it serves the same purpose as stemming, but the key difference is that it returns a meaningful dictionary word (the lemma). It is used mainly in building chatbots, question-answering bots, text prediction and so on.
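Real lemmatizers such as NLTK's `WordNetLemmatizer` rely on a dictionary (WordNet) plus morphological rules. The sketch below keeps only the dictionary part: `LEMMAS` is a small hypothetical lookup table mapping inflected forms to lemmas, and unknown words are returned unchanged.

```python
# Hypothetical, hand-written lemma table; a real lemmatizer consults a full
# dictionary such as WordNet and also uses the word's part of speech
# (e.g. "better" -> "good" only holds when "better" is an adjective).
LEMMAS = {
    "studies": "study",
    "studying": "study",
    "was": "be",
    "were": "be",
    "better": "good",
    "children": "child",
}

def lemmatize(word):
    """Return the dictionary form of word, or the lowercased word if unknown."""
    w = word.lower()
    return LEMMAS.get(w, w)

print([lemmatize(w) for w in ["Studies", "were", "children", "python"]])
# ['study', 'be', 'child', 'python']
```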
WordNet: a lexical database, a kind of dictionary, of English nouns, verbs, adjectives and adverbs, grouped into sets of synonyms called synsets. It is widely used in natural language processing.
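WordNet's central data structure is the synset, a set of words that share one sense; a word with several senses belongs to several synsets. The toy model below is hand-built (not real WordNet data, and the synset ids are hypothetical) and shows how synonyms fall out of that structure. With NLTK, the real database is queried through `nltk.corpus.wordnet` instead.

```python
from collections import defaultdict

# Hand-built toy data (NOT taken from WordNet): each synset groups words
# that share one meaning, tagged with a part of speech.
SYNSETS = {
    "vehicle_sense": ("noun", {"car", "automobile", "auto"}),
    "fast_sense": ("adjective", {"fast", "quick", "rapid"}),
    "run_sense": ("verb", {"run", "sprint"}),
}

# Index each word to the synsets containing it (a word may have several senses).
WORD_INDEX = defaultdict(set)
for synset_id, (_pos, words) in SYNSETS.items():
    for word in words:
        WORD_INDEX[word].add(synset_id)

def synonyms(word):
    """Collect every other word that shares at least one synset with word."""
    result = set()
    for synset_id in WORD_INDEX.get(word, set()):
        result |= SYNSETS[synset_id][1]
    result.discard(word)
    return sorted(result)

print(synonyms("car"))  # ['auto', 'automobile']
```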
Part-of-speech tagging: the process of converting a sentence into a list of tuples, each of the form (word, tag). The tag indicates whether the word is a noun, an adjective, a verb, and so on.
text = 'An sincerity so extremity he additions.'
[('An', 'DT'), ('sincerity', 'NN'), ('so', 'RB'), ('extremity', 'NN'), ('he', 'PRP'), ('additions', 'VBZ')]
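Statistical taggers, such as the trained tagger behind NLTK's `pos_tag`, learn tag probabilities from annotated corpora. The simplest possible baseline is a lookup tagger: the sketch below uses a hypothetical word-to-tag table with Penn Treebank tags (the same tag set as the example above) and falls back to `NN` for unknown words. It cannot resolve ambiguous words, which is exactly what trained taggers add.

```python
# Hypothetical lookup table using Penn Treebank tags.
TAG_LOOKUP = {
    "the": "DT", "a": "DT", "an": "DT",
    "he": "PRP", "she": "PRP",
    "is": "VBZ", "so": "RB",
    "good": "JJ", "great": "JJ",
}

def lookup_tag(tokens, default="NN"):
    """Return (word, tag) tuples; unknown words get the default tag."""
    return [(tok, TAG_LOOKUP.get(tok.lower(), default)) for tok in tokens]

print(lookup_tag(["He", "is", "a", "good", "boy"]))
# [('He', 'PRP'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('boy', 'NN')]
```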
Bag of words: a way of converting text into a numerical representation, for example by counting or one-hot encoding the vocabulary words that appear in each sentence.
sent1 = "he is a good boy"
sent2 = "she is a good girl"
After removing the stop words (he, she, is, a), the vocabulary is girl, good, boy:
        girl  good  boy
sent1    0     1     1
sent2    1     1     0
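The table above can be reproduced with a few lines of plain Python. The sketch assumes the vocabulary is fixed in advance (girl, good, boy, after stop-word removal) and counts how often each vocabulary word occurs; because no word repeats in these sentences, the counts coincide with 0/1 indicators.

```python
from collections import Counter

def bag_of_words(sentence, vocabulary):
    """Count occurrences of each vocabulary word in a whitespace-tokenized sentence."""
    counts = Counter(sentence.lower().split())
    # Counter returns 0 for words that never appear.
    return [counts[word] for word in vocabulary]

vocabulary = ["girl", "good", "boy"]
sent1 = "he is a good boy"
sent2 = "she is a good girl"
print(bag_of_words(sent1, vocabulary))  # [0, 1, 1]
print(bag_of_words(sent2, vocabulary))  # [1, 1, 0]
```

In practice the vocabulary is usually built from the corpus itself rather than written by hand.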
Now, let's go back to our topic and take a look at the libraries that can help you preprocess your data easily.
NLTK
NLTK is one of the best-known and most widely used libraries for natural language processing. NLTK is short for Natural Language Toolkit and was developed by Steven Bird and Edward Loper. It comes with many built-in modules for tokenization, lemmatization, stemming, parsing, chunking and part-of-speech tagging, and it provides access to more than 50 corpora and lexical resources.
Installation: pip install nltk
Let's use NLTK to preprocess a given text
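import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time downloads of the NLTK data this example needs:
# nltk.download('punkt')  # recent NLTK releases need 'punkt_tab' instead
# nltk.download('stopwords')

ps = PorterStemmer()
stop_words = set(stopwords.words('english'))  # build the lookup set once

text = 'Hello there,how are you doing today? I am Learning Python.'
# Replace every character that is not a letter or digit with a space
text = re.sub("[^a-zA-Z0-9]", " ", text)
tokens = word_tokenize(text)
# Drop stop words (the list is lowercase, so capitalized words are kept), then stem
text_with_no_stopwords = [ps.stem(word) for word in tokens if word not in stop_words]
text = " ".join(text_with_no_stopwords)
print(text)
# -- OUTPUT --
# hello today i learn python

TextBlob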
TextBlob is a simplified text-processing library. It provides a simple API for common NLP tasks such as part-of-speech tagging, sentiment analysis, classification and translation.
Installation: pip install textblob
spaCy
spaCy is one of the best natural language processing libraries for Python and is written in Cython. It ships pre-trained statistical models and supports tokenization for more than 49 languages. Its models use convolutional neural networks for tagging, parsing and named entity recognition.
Installation: pip install spacy
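import spacy

# Load the small English pipeline; install it first with:
#   python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
text = "I am Learning Python Nowadays"
doc = nlp(text)
# Print each token with its character offset (token.idx) in the original text
for token in doc:
    print(token, token.idx)
# -- OUTPUT --
# I 0
# am 2
# Learning 5
# Python 14
# Nowadays 21

Gensim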
Gensim is a Python library designed to find semantic similarity between documents. It uses vector space modeling and topic modeling to compare documents, and its algorithms are designed to handle large text corpora.
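The vector-space idea behind such comparisons can be shown without Gensim: represent each document as a word-count vector and measure the cosine of the angle between the two vectors. This sketch is not Gensim's API, and plain word counts only capture shared vocabulary; Gensim's topic models (for example LSI or LDA) and word embeddings go further by relating documents that use different words for similar ideas.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the word-count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # The dot product only needs words present in both documents.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

doc1 = "he is a good boy".split()
doc2 = "she is a good girl".split()
print(round(cosine_similarity(doc1, doc2), 2))  # 0.6
```

The two sentences share three of their five words, giving 3 / (sqrt(5) * sqrt(5)) = 0.6.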
Installation: pip install gensim
CoreNLP
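Stanford CoreNLP aims to make it simple to apply a range of linguistic analysis tools to a piece of text. The library is fast and well suited to product development. CoreNLP itself is written in Java, so the Python package below acts as a client that needs a local CoreNLP installation to connect to.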
Installation: pip install stanford-corenlp
After reading the above, do you have a better idea of which Python libraries can make natural language preprocessing easy? If you want to learn more, please follow the industry information channel. Thank you for your support.
© 2024 shulou.com SLNews company. All rights reserved.