How to use python spaCy

This article introduces how to use spaCy, a popular Python library for natural language processing. It covers spaCy's statistical models and processing pipeline, then works through common NLP tasks: part-of-speech tagging, dependency parsing, named entity recognition, and rule-based matching.
spaCy's statistical models
These models are the core of spaCy. They enable spaCy to perform NLP tasks such as part-of-speech tagging, named entity recognition, and dependency parsing.
Here are the different statistical models in spaCy and their specifications:
en_core_web_sm: English multi-task CNN trained on OntoNotes; size 11 MB
en_core_web_md: English multi-task CNN trained on OntoNotes, with GloVe word vectors trained on Common Crawl; size 91 MB
en_core_web_lg: English multi-task CNN trained on OntoNotes, with GloVe word vectors trained on Common Crawl; size 789 MB
Importing these models is very easy; we can load a model by executing spacy.load('model_name'), as follows:
import spacy

nlp = spacy.load('en_core_web_sm')
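Note that a model has to be downloaded before it can be loaded. A minimal sketch, assuming spaCy itself is already pip-installed, is to fetch the model from the command line:

python -m spacy download en_core_web_sm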
spaCy processing pipeline

The first step in using spaCy is to pass the text string to the nlp object. This object is essentially a pipeline of several text preprocessing operations through which the input text string must pass.
The pipeline has several components, such as the tokenizer, tagger, parser, and ner. The input text string therefore passes through all of these components before it is fully processed.
Let me show you how to create an nlp object:
import spacy

# Create the nlp object
nlp = spacy.load('en_core_web_sm')

doc = nlp("He went to play basketball")
You can find the active pipeline components with the following code:
nlp.pipe_names
Output: ['tagger', 'parser', 'ner']
If you want to disable pipeline components and keep only ner running, you can use the following code:
nlp.disable_pipes('tagger', 'parser')
Let's check the active pipeline components again:
nlp.pipe_names
Output: ['ner']
When you only need to tokenize text, you can disable the entire pipeline; tokenization then becomes very fast. For example, you can disable multiple pipeline components with the following line of code:
nlp.disable_pipes('tagger', 'parser')
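As a side note, nlp.disable_pipes can also be used as a context manager (this is the spaCy 2.x API; spaCy 3.x renames it to nlp.select_pipes), which restores the disabled components automatically when the block exits. A minimal sketch:

import spacy

nlp = spacy.load('en_core_web_sm')

# The components are disabled only inside the with block
with nlp.disable_pipes('tagger', 'parser'):
    doc = nlp("He went to play basketball")

# Outside the block the full pipeline is active again
print(nlp.pipe_names)  # ['tagger', 'parser', 'ner']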
spaCy in action

Now, let's practice. In this section, you will learn to use spaCy to perform various NLP tasks, starting with popular ones: part-of-speech tagging, dependency parsing, and named entity recognition.
1. Part-of-speech tagging
In English grammar, the part of speech tells us what a word's function is and how it is used in a sentence. The commonly used parts of speech in English are nouns, pronouns, adjectives, verbs, adverbs, and so on.
Part-of-speech (POS) tagging is the task of automatically assigning a POS tag to every word in a sentence. It is helpful for various downstream NLP tasks, such as feature engineering, language understanding, and information extraction.
Performing POS tagging in spaCy is a simple process:
import spacy

nlp = spacy.load('en_core_web_sm')

# Create a Doc object
doc = nlp("He went to play basketball")

# Iterate over the tokens
for token in doc:
    # Print the token and its part-of-speech tag
    print(token.text, "-->", token.pos_)
Output:
He --> PRON
went --> VERB
to --> PART
play --> VERB
basketball --> NOUN
The model correctly identifies the POS tags of all the words in the sentence. If you are unsure about any of these tags, you can simply use spacy.explain() to find out:
spacy.explain("PART")
Output: 'particle'
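Besides the coarse-grained token.pos_, each token also carries a fine-grained tag in token.tag_, which spacy.explain() can decode in the same way. A minimal sketch, reusing the doc from above:

# Coarse-grained POS tag vs. fine-grained tag for each token
for token in doc:
    print(token.text, token.pos_, token.tag_)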
2. Dependency parsing with spaCy
Every sentence has a grammatical structure, which can be extracted by dependency parsing. The result can be regarded as a directed graph in which the nodes correspond to the words in the sentence and the edges between nodes are the dependencies between those words.
Performing dependency parsing in spaCy is also very easy. We will use the same sentence as in POS tagging:
# Dependency parsing
for token in doc:
    print(token.text, "-->", token.dep_)
Output:
He --> nsubj
went --> ROOT
to --> aux
play --> advcl
basketball --> dobj
The dependency label ROOT denotes the main verb or action in the sentence; the other words are directly or indirectly connected to the root. You can find out what the other labels mean by executing the following code:
spacy.explain("nsubj"), spacy.explain("ROOT"), spacy.explain("aux"), spacy.explain("advcl"), spacy.explain("dobj")
Output:
('nominal subject', None, 'auxiliary', 'adverbial clause modifier', 'direct object')
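To inspect the graph structure directly, each token also exposes its syntactic parent through token.head. A minimal sketch that prints every edge of the dependency tree for the same doc:

# Print each edge of the dependency tree: child --label--> head
for token in doc:
    print(token.text, "--" + token.dep_ + "-->", token.head.text)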
3. Named entity recognition with spaCy
First of all, let's understand what an entity is. An entity is a word or phrase that represents information about a common thing, such as a person, place, or organization. These entities have proper names.
For example, consider a sentence mentioning "Donald Trump", "Google", and "New York City": each of these words and phrases is an entity.
Now let's take a look at how spaCy recognizes named entities in a sentence.
doc = nlp("Indians spent over $71 billion on clothes in 2018")

for ent in doc.ents:
    print(ent.text, ent.label_)
Output:
Indians NORP
over $71 billion MONEY
2018 DATE
spacy.explain("NORP")
Output: 'Nationalities or religious or political groups'
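spaCy also ships with the displacy visualizer, which can highlight the recognized entities in context. A minimal sketch; in a Jupyter notebook displacy.render displays the result inline, while in a plain script it returns the markup as an HTML string:

from spacy import displacy

# Highlight the named entities found in the doc
html = displacy.render(doc, style='ent')
print(html)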
4. Rule-based matching with spaCy
Rule-based matching is a newer feature of spaCy. With the spaCy matcher, you can find words and phrases in text using user-defined rules. It is similar to regular expressions, but while regular expressions find words and phrases using text patterns alone, the spaCy matcher can also use the lexical attributes of tokens, such as POS tags, dependency labels, lemmas, and so on.
Let's see how it works:
import spacy

# Import the spaCy Matcher
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')

# Initialize the Matcher with the spaCy vocabulary
matcher = Matcher(nlp.vocab)

doc = nlp("Some people start their day with lemon water")

# Define the rule
pattern = [{'TEXT': 'lemon'}, {'TEXT': 'water'}]

# Add the rule to the matcher
# (spaCy 2.x signature; in spaCy 3.x this is matcher.add('rule_1', [pattern]))
matcher.add('rule_1', None, pattern)
So, in the above code:
First, we import the spaCy Matcher
After that, we initialize the matcher object with the default spaCy vocabulary
Then, we pass the input through the nlp object as usual
In the next step, we will define rules for what we want to extract from the text.
Suppose we want to extract the phrase "lemon water" from the text, so our goal is to match "water" immediately following "lemon". Finally, we add the defined rule to the matcher object.
Now let's see what the matcher found:
matches = matcher(doc)
matches
Output: [(7604275899133490726, 6, 8)]
The output has three elements. The first element, 7604275899133490726, is the match ID. The second and third elements are the start and end token positions of the matched span.
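The match ID is a hash of the rule name you registered; you can convert it back to the original string through the vocabulary's string store. A minimal sketch:

# Convert each match ID hash back to its rule name
for match_id, start, end in matches:
    print(nlp.vocab.strings[match_id])  # 'rule_1'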
# Extract the matched text
for match_id, start, end in matches:
    # Get the matched span
    matched_span = doc[start:end]
    print(matched_span.text)
Output: lemon water
The pattern, then, is a list of token attributes. For example, "TEXT" is a token attribute that stands for the exact text of the token. In fact, spaCy has many other useful token attributes that can be used to define all kinds of rules and patterns.
The token attributes are listed below:
Attribute (type): description
ORTH (unicode): the exact verbatim text of the token
TEXT (unicode): the exact verbatim text of the token
LOWER (unicode): the lowercase form of the token text
LENGTH (int): the length of the token text
IS_ALPHA, IS_ASCII, IS_DIGIT (bool): the token text consists of alphabetic characters, ASCII characters, or digits
IS_LOWER, IS_UPPER, IS_TITLE (bool): the token text is in lowercase, uppercase, or title case
IS_PUNCT, IS_SPACE, IS_STOP (bool): the token is punctuation, whitespace, or a stop word
LIKE_NUM, LIKE_URL, LIKE_EMAIL (bool): the token text resembles a number, URL, or email address
POS, TAG, DEP, LEMMA, SHAPE (unicode): the token's part-of-speech tag, fine-grained tag, dependency label, lemma, or shape
ENT_TYPE (unicode): the token's entity label
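For example, here is a minimal sketch of a case-insensitive variant of the "lemon water" rule that matches on the LOWER attribute instead of TEXT (the rule name 'rule_lower' is just an illustrative label):

from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)

# Match 'lemon water' regardless of capitalization
pattern = [{'LOWER': 'lemon'}, {'LOWER': 'water'}]
matcher.add('rule_lower', None, pattern)  # spaCy 2.x signature

sample = nlp("Start your day with Lemon Water")
print([sample[start:end].text for _, start, end in matcher(sample)])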
Let's look at another use case for the spaCy matcher. Consider the following two sentences:
You can read this book
I will book my ticket
Now suppose we are interested in finding out whether a sentence contains the word "book". Looks pretty straightforward, right? But here is the catch: we want to find "book" only when it is used as a noun in the sentence.
In the first sentence above, "book" is used as a noun, while in the second it is used as a verb. Therefore, the spaCy matcher should extract it only from the first sentence. Let's try it:
doc1 = nlp("You read this book")
doc2 = nlp("I will book my ticket")

pattern = [{'TEXT': 'book', 'POS': 'NOUN'}]

# Initialize the matcher with the shared vocab
matcher = Matcher(nlp.vocab)
matcher.add('rule_2', None, pattern)

matches = matcher(doc1)
matches
Output: [(7604275899133490726, 3, 4)]
The matcher found the pattern in the first sentence.
matches = matcher(doc2)
matches
Output: []
Great! Although "book" appears in the second sentence, the matcher ignores it because it is not used as a noun there.
At this point, our study of how to use python spaCy is complete. I hope it has answered your doubts; theory paired with practice is the best way to learn, so go and try it out!