Full-text search function of SQL Server 07/08 Update SLTechnology News&Howtos

Full-text search function of SQL Server

2025-07-08 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article will explain in detail the full-text search function of SQL Server. The editor thinks it is very practical, so I share it for you as a reference. I hope you can get something after reading this article.

SQL Server's full-text search (Full-Text Search) is a text retrieval function based on word segmentation, which depends on full-text index. Different from the traditional balanced tree (B-Tree) index and column storage index, full-text index is composed of data tables, called inverted index (Invert Index), which stores the mapping relationship between participle and row unique key. Inverted indexes are automatically created and maintained by SQL Server when creating or updating full-text indexes. Full-text index mainly consists of three kinds of parsers: word splitter (Word Breaker), stem analyzer (stemmer) and synonym analyzer. The data stored in the full-text index is information such as word segmentation and its location. Word segmentation is based on the grammatical rules of a specific language, finding word boundaries according to specific symbols, and decomposing the text into "words", each word is called a term; full-text indexes sometimes extract word trunks and store multiple derived forms of word trunks as a single word stem, a process called extracting word trunks. According to the list of custom synonyms provided by the user, the related words are converted into synonyms, a process called extracting synonyms.

The generation of full-text index is to Word breaker and Stemmer the text data in the user table, convert synonyms (Thesaurus), filter out the stop words (Stopword) in the participle, and finally store the processed data into the full-text index. The process of storing data to full-text data is called the Populate or Crawl process, and the full-text index can be updated manually, automatically, or incrementally.

First, create full-text catalogs and unique indexes

Before you create a full-text index, you must create a full-text catalog (Full-Text Catalog), which is used to organize the full-text index and is the container for the full-text index. Each full-text index must belong to a full-text catalog. The full-text catalog is a logical structure, which is the same as the Schema of the database, regardless of where the full-text index is stored.

Create fulltext catalog catalog_testas default

In order to create a full-text index, there must be a unique (unique), single-column (single-column), non-empty (non-nullable) index on the underlying table. The full-text engine uses this index to map each row of data on the underlying table to a unique index key, and the inverted index stores the mapping relationship between the index key and the participle.

Create unique index uidx_dbLogID on [dbo]. [DatabaseLog] ([DatabaseLogID])

Second, create a full-text index

Only one full-text index can be created per table. When creating a full-text index, you must consider the filegroup stored in the full-text index, the list of stop words associated with the full-text index, the update method of the full-text index, and the language associated with the text. The full-text index column must be a text field, for example:

Create fulltext index on [dbo]. [DatabaseLog] ([tsql] language 1033) key index ui_dbLogIDon (catalog_test,filegroup [primary]) with (change_tracking=off, no population, stoplist=system)

1, language (language)

The option language is optional to specify the language at the column level, the value of which can be the name of the language or LCID, or if the language option is not specified, the default language of the SQL Server instance is used. View the languages supported by the system and their corresponding LCID and names from the system view sys.fulltext_languages (Transact-SQL).

2, full-text catalogue (fulltext_catalog)

The option fulltext_catalog_name specifies the grouping of the full-text index

3, filegroup (filegroup)

The option filegroup filegroup_name is used to specify the filegroup for the full-text index storage. If no filegroup is specified, the full-text index and the underlying table are stored in the same filegroup. Because updating the full-text index is an IO-intensive operation, in order to update the full-text index faster, it is best to store the full-text index on a physical hard disk or filegroup different from the underlying table to achieve maximum IO concurrency.

4. How to populate the full-text index

Like ordinary indexes, when the underlying table data is updated, the full-text index must be updated automatically, which is the default behavior of the system, or you can configure to update the full-text index manually or automatically at specific time intervals.

The option CHANGE_TRACKING specifies whether data updates (Update,Delete, or Insert) related to the full-text index column need to be synchronized to the full-text index.

CHANGE_TRACKING = MANUAL: update manually

CHANGE_TRACKING = AUTO: automatic update, default setting, when the underlying table data changes, the full-text index is automatically updated

CHANGE_TRACKING = OFF, NO POPULATION: no update, specify the option NO POPULATION, indicating that SQL Server will not update (populate) the full-text index after creating the full-text index; if the option NO POPULATION is not specified, SQL Server updates the full-text index after creating the full-text index.

5, stop words (STOPLIST)

Deactivated words (StopWord) are also known as noise words, and each full-text index is associated with a list of deactivated words, and by default, full-text indexes are associated with system deactivated words (system stoplist). The full-text engine removes the deprecated word from the participle so that the full-text index does not contain the deprecated word.

STOPLIST [=] {OFF | SYSTEM | stoplist_name}

Third, fill in the full-text index

Populating a full-text index is also called a crawl process, or a Population process. Since creating or populating a full-text index consumes a lot of system (IO, memory) resources, try to populate the full-text index when the system is idle. When creating a full-text index, by specifying the option CHANGE_TRACKING= MANUAL, or CHANGE_TRACKING= OFF, NO POPULATION, the newly created full-text index will not be populated immediately, and the user can choose to use the alter fulltext index statement to perform the fill operation when the system is idle. The full-text index contains the participle data of the underlying table only after the full-text index is populated.

Alter fulltext index on table_namestart {full | incremental | update} population

There are three ways to update a full-text index:

FULL POPULATION: populate it all, get each row from the underlying table, and reindex it into the full text

INCREMENTAL POPULATION: incremental padding, provided that the underlying table contains timestamp fields. Since the last padding, only the updated data will be indexed into the full text.

UPDATE POPULATION: update padding, re-indexing rows of data that perform update (insert, update, or delete) operations since the last padding

When creating a full-text index, if you specify CHANGE_TRACKING=AUTO or CHANGE_TRACKING= OFF, the newly created full-text index starts the populating process immediately.

Fourth, use contains predicate to query full-text index

If you want to use full-text indexing in a query, you usually use the CONTAINS predicate to call the full-text index to implement a more complex text matching query than the LIKE keyword, while the LIKE keyword is fuzzy matching and full-text indexing is not called.

For example, use the contains predicate to perform an exact match query for a single participle:

Select [tsql] from [dbo]. [DatabaseLog] where contains ([tsql], 'searchword', language 1033)

Compared with Like, full-text query is faster and supports more complex search functions. Using contains predicate, we can not only perform complete matching of participle or prefix matching of participle, but also execute query based on root, query based on custom synonyms, query based on distance and order of adjacent word segmentation. However, compared with Like, contains predicates cannot make suffix matching queries.

The result returned by the contains predicate is a Boolean, or TRUE; is returned if the full-text index column contains the specified keyword or lookup pattern (pattern). Otherwise, FALSE is returned.

Contains predicate supports word query and phrase query, word refers to a single participle, phrase (phrase) is composed of multiple word and spaced spaces, for phrases, you must use double quotation marks to combine multiple word into a phrase.

1. Logical combination query

Use and, and not, or or logical operators to match multiple word or phrase

CONTAINS (Name,'"Mountain" OR "Road"') CONTAINS (Name, 'Mountain OR Road')

2, prefix query

Prefix matching using contains predicates has the same function as like 'prefix%', except that contains predicates use "*" as a wildcard, "*" matches 0 prefix* 1 or more characters, and prefix matching is written as' "prefix*". Full-text indexes can only perform prefix matching.

CONTAINS (Name,'"Chain*"') CONTAINS (Name,'"chain*" OR "full*"')

3, query synonyms (thesaurus) or stem (stemmer)

Stemmer (word stem), for example, according to grammatical rules, English verbs vary according to the number (singular, plural), person, and tense, and these words are all homologous.

CONTAINS (Description, 'FORMSOF (INFLECTIONAL, ride)')

THESAURUS (synonym), you need to import XML for configuration. SQL Server provides a default Thesaurus file, which is Empty. If "Author", "Writer" and "journalist" are synonyms configured in Thesaurus file, as long as any one of the synonyms is satisfied in the fulltext index query, the match will be successful.

CONTAINS (Description, 'FORMSOF (THESAURUS, author)')

4, distance query

Use the near function to query the data rows that match adjacent participles. The near function is defined as follows, which is used for query modes that need to specify a distance in the query mode:

NEAR (({|}) [, … N]) [,] [])

For example, use the Near function to specify the distance and matching order of adjacent participles. Near ((term1,term2,term3), 5) means that the distance between any two term cannot exceed 5, and near ((term1,term2,term3), 5 real) means that the distance between any two term cannot exceed 5 and exists in the string in the order of term1,term2,term3.

-- regardless of the intervening distance and regardless of orderCONTAINS (column_name, 'NEAR (term1, "term3 term4")')-- searches for "AA" and "BB", in either order, within a maximum distance of fiveCONTAINS (column_name, 'NEAR ((AA,BB), 5)')-- in the specified order with regardless of the distanceCONTAINS (column_name, 'NEAR ((Monday, Tuesday, Wednesday), MAX, TRUE)')

For near ((term1,term2,term3), 5LJ true), there are up to 5 term between term1 and term5, excluding the internal search participle, "term2", for example:

CONTAINS (column_name, 'NEAR ((AA,BB,CC), 5)')

This query matches the following text. Note that the internal search participle CC does not calculate the distance:

BB one two CC three four five AA

For example, in the original text, the maximum distance between the participle bike and control cannot exceed 10, and the participle bike must appear before the participle control:

CONTAINS (Comments, 'NEAR ((bike,control), 10, TRUE)')

SQL Server provides full-text search functions, more abundant than LIKE keywords, with primary full-text search functions, fast speed, simple maintenance, the disadvantage is that the full-text search function is very limited, in the actual development, you can cooperate with open source full-text search engines, such as Solr,Elasticsearch, to develop more powerful full-text search functions.

This is the end of this article on "full-text search function of SQL Server". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, please share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.