How regular expressions match words

2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how regular expressions match whole words. The editor finds the topic very practical and shares it here as a reference, so follow along and have a look.

How a regular expression matches word boundaries:

The metacharacter "\b" is also an "anchor": it matches a position rather than a character. The match is therefore zero-length. Four positions count as "word boundaries":

1) the position before the first character of the string (if the first character of the string is a "word character")

2) the position after the last character of the string (if the last character of the string is a "word character")

3) between a "word character" and a "non-word character", where the "non-word character" immediately follows the "word character"

4) between a "non-word character" and a "word character", where the "word character" immediately follows the "non-word character"
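The four positions listed above can be checked directly. The following is a minimal sketch, assuming Python's re module and assuming the anchor under discussion is "\b"; the sample string is made up for illustration.

```python
import re

# Hypothetical sample string, chosen so that all four cases occur.
text = "ab, cd"

# Each zero-length match of \b marks one word-boundary position.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

# 0: before the first character "a"        (case 1)
# 2: between "b" and ","                   (case 3)
# 4: between the space and "c"             (case 4)
# 6: after the last character "d"          (case 2)
print(boundaries)  # [0, 2, 4, 6]
```

The positions are offsets between characters, counted from 0, which is why the last one equals the length of the string.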

"Word characters" are the characters that "\w" can match, and "non-word characters" are the characters that "\W" can match. In most regular expression implementations, "word characters" usually include the letters a-z and A-Z, the digits 0-9, and the underscore, that is, [a-zA-Z0-9_].
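Which characters count as word characters depends on the implementation. As a small illustration, assuming Python's re module (where the re.ASCII flag restricts "\w" to ASCII letters, digits, and the underscore), the sample string below is an invented one:

```python
import re

# Hypothetical sample: letter, underscore, digit, hyphen, accented letter, "!".
sample = "a_1-\u00e9!"

# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ascii_word = re.findall(r"\w", sample, flags=re.ASCII)

# Without the flag, Python str patterns also treat Unicode letters as word characters.
unicode_word = re.findall(r"\w", sample)

# \W matches exactly the characters that \w does not.
ascii_non_word = re.findall(r"\W", sample, flags=re.ASCII)

print(ascii_word)      # ['a', '_', '1']
print(unicode_word)    # ['a', '_', '1', 'é']
print(ascii_non_word)  # ['-', 'é', '!']
```

Because "\b" is defined in terms of "\w", the same flag also changes where word boundaries fall in text that contains non-ASCII letters.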

For example, "\b4\b" matches a single 4 that is not part of a larger number. This regular expression does not match either 4 in "44".
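A quick check of this behaviour, assuming the pattern meant here is "\b4\b" and using Python's re module; the test string is invented for the example:

```python
import re

# Hypothetical test string: a lone 4, "44", "a4", and "4.5".
text = "4 44 a4 4.5"

# Start offsets of every match of a 4 with a word boundary on both sides.
starts = [m.start() for m in re.finditer(r"\b4\b", text)]

print(starts)  # [0, 8]
```

The lone 4 at offset 0 and the 4 in "4.5" match, because "." is not a word character. Neither 4 in "44" matches, and the 4 in "a4" is rejected as well, since "a" is a word character.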

In other words, "\b" matches, roughly speaking, the positions at the beginning and end of an "alphanumeric sequence".

The negated counterpart of the "word boundary" is "\B". It matches every position that is not a word boundary, that is, a position between two "word characters" or between two "non-word characters".
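The complement can be illustrated in the same way. This sketch assumes the negated anchor is written "\B", as it is in Python's re module; the strings are examples only.

```python
import re

# Positions in "ab, cd" that are NOT word boundaries.
non_boundaries = [m.start() for m in re.finditer(r"\B", "ab, cd")]

# "is" only where it is preceded by another word character,
# i.e. where it starts inside a word.
inside = [m.start() for m in re.finditer(r"\Bis", "This island is beautiful")]

print(non_boundaries)  # [1, 3, 5]
print(inside)          # [2]
```

Offsets 1 and 5 lie between two word characters, and offset 3 lies between the comma and the space, two non-word characters. The second result finds only the "is" inside "This".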

Discussion on the principle of regular expression matching words:

◆ A closer look inside the regular expression engine

Let's look at what happens when the regular expression "\bis\b" is applied to the string "This island is beautiful". The engine processes the first token, "\b", first. Because "\b" is zero-length, it examines the position in front of the first character, "T". Because "T" is a "word character" and what precedes it is the void before the start of the string, "\b" matches this word boundary. The next token, the literal "i", then fails to match the first character "T".

The engine retries at each following position. "\b" fails between "T" and "h", between "h" and "i", and between "i" and "s", and next matches between the fourth character "s" and the fifth character, a space. However, "i" does not match the space. The engine moves on to the sixth character "i": "\b" matches between the space and this "i", and "i" and "s" then match the sixth and seventh characters. However, the position before the eighth character "l" lies between two word characters, so the second "\b" does not match and the attempt fails again.

The engine keeps advancing until it reaches the 13th character "i", which forms a "word boundary" with the preceding space, and "i" and "s" match "is". The engine then tries the second "\b". Because the 14th character "s" and the 15th character, a space, form a word boundary, the match succeeds. The engine stops there and returns the successful match: the word "is", skipping the earlier "is" in "This" and in "island".
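The walk-through can be reproduced with a small simulation. This is only a sketch, assuming the expression being traced is "\bis\b" and an ASCII-only notion of word characters. The helper names is_word_char, is_boundary, and trace are invented for illustration and do not reflect how any real engine is implemented. Offsets are counted from 0, so the 13th character is offset 12.

```python
import re


def is_word_char(ch):
    # ASCII-only word character: [a-zA-Z0-9_] (an assumption of this sketch).
    return ch.isascii() and (ch.isalnum() or ch == "_")


def is_boundary(text, pos):
    # A boundary exists where exactly one neighbour is a word character.
    before = pos > 0 and is_word_char(text[pos - 1])
    after = pos < len(text) and is_word_char(text[pos])
    return before != after


def trace(text, word="is"):
    # Try "\b" + word + "\b" at each start position, left to right.
    for start in range(len(text) + 1):
        if not is_boundary(text, start):
            continue  # the first \b fails here
        end = start + len(word)
        if text[start:end] != word:
            print(f"offset {start}: \\b matches, literal fails")
            continue
        if not is_boundary(text, end):
            print(f"offset {start}: {word!r} matches, second \\b fails")
            continue
        print(f"offset {start}: match {text[start:end]!r}")
        return start, end
    return None


text = "This island is beautiful"
result = trace(text)

# The real engine reports the same span.
print(result, re.search(r"\bis\b", text).span())  # (12, 14) (12, 14)
```

The printed trace shows the failed attempts at offsets 0, 4, 5, and 11 before the match at offset 12.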
