What are the common regular expression engines? 04/16 Update SLTechnology News&Howtos

What are the common regular expression engines?

2025-04-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

This article is about what common regular expression engines have. The editor thinks it is very practical, so share it with you as a reference and follow the editor to have a look.

Common regular expression engine

The engine determines the regular expression matching method and the internal search process, and it is important to understand it. At present, the main popular engines are: DFA,NFA two engines, let's make a comparison.

Engine distinguishing point DFA

Deterministic finite automaton

Deterministic finite automaton DFA engines they do not require backtracking (and therefore they never test the same character twice), so matching is fast! The DFA engine can also match the longest possible string. However, the DFA engine contains only a limited number of states, so it cannot match patterns with reverse references or capture subexpressions. The representatives are: awk,egrep,flex,lex,MySQL,ProcmailNFA

Non-deterministic finite automaton nondeterministic finite automaton, which is divided into traditional NFA,Posix NFA traditional NFA engine, runs the so-called "greedy" matching backtracking algorithm (longest-leftmost), tests all possible extensions of regular expressions in a specified order and accepts the first match. Traditional NFA backtracking can access exactly the same state multiple times, and in the worst case, it may be very slow to perform, but it supports submatching. Representatives are: GNU Emacs,Java,ergp,less,more,.NET language

PCRE library,Perl,PHP,Python,Ruby,sed,vi, etc., which are generally used in high-level languages.

DFA matches the search in regular expressions one by one with string characters, while NFA looks in strings one by one based on regular expressions. Although the speed is slow, it is easier for the operator, so it is more widely used! The following are all examples of the NFA engine, parsing process!

String composition in the eyes of the parsing engine

For the string "DEF", it includes three characters D, E, F and four numeric positions 0, 1, 2, 3: 0D1E2F3. For regular expressions, all source strings have characters and positions. The regular expression will match one by one from the position of 0.

Occupy characters and zero width

In the process of regular expression matching, if the subexpression matches the character content, not the position, and is saved in the final matching result, then the subexpression is considered to possess the character; if the subexpression matches only the position, or the matching content is not saved in the final matching result, then the subexpression is considered to be zero width. Possession characters are mutually exclusive and zero width is non-mutually exclusive. That is, a character can only be matched by one subexpression at a time, while a position can be matched by multiple zero-width subexpressions at the same time. Common zero-width characters are: ^, (? =), etc.

An example of the matching process of regular expression

We have mastered the above concepts, and let's analyze the next few common parsing processes. Combined with the use of software regexBuddy to analyze.

Demo1: source character DEF, corresponding tag is: 0D1E2F3, matching regular expression is: / DEF/

The process can be understood as: first, the control is obtained by the regular expression character / D/, matching starts from position 0, and the "D" is matched by / D/. If the match is successful, the control is handed over to the character / E /; because "D" has been matched by / D/, so / E / tries to match from position 1, and the match succeeds, and the control is handed over to / F /. "F" is matched by / F /, and the match is successful.

Demo2: source character DEF, corresponding tag is: 0D1E2F3, matching regular expression is: / D\ w+F/

The process can be understood as follows: first, the control is obtained by the regular expression character / D /, the match starts from position 0, and the "D" is matched by / D /. If the match is successful, the control is given to the character /\ wicked /. Since "D" has been matched by / D /, /\ w / tries to match from position 1,\ w + greedy mode, will record an alternative state, the default will match the longest character, directly match to EF, and the match is successful, the current position is 3. And give control to / F /; if the / F / match fails,\ w + the match will backtrack one bit, and the current position becomes 2. And hand over the control to / Fhand, and the / F / matching character F succeeds. So\ w + here matches the E character, the match is complete!

Demo3: source character DEF, corresponding tag: 0D1E2F3, matching regular expression is: / ^ (? = D) [Dmurf] + $/

The process can be understood as: metacharacters / ^ / and / $/ match only the position, and the sequence looks around / (? = D) / (matches the current position, whether the character "D" character appears on the right) only matches, does not occupy the character, and does not save the matching content to the final matching result, so it is zero width. First of all, the control is obtained by the metacharacter / ^ /. The match starts from position 0, and the starting position "position 0" is matched by / ^ /. If the match is successful, the control is handed over to the sequential look / (? = D) / / (? = D]) / requires that the right side of its position must be the letter "D" to match successfully, and zero-width subexpressions are not mutually exclusive, that is, the same position can be matched by multiple zero-width subexpressions at the same time, so it also attempts to match from position 0, the right side of position 0 is the character "D", meets the requirements, matches successfully, and the control is handed over to / [Daff] + /. Because / (? = D) / only matches, the content of the match is not saved to the final result, and / (? = D) / the location where the match succeeds is position 0, so / [Dmurf] + / also tries to match from position 0, / [Dmurf] + / first tries to match "D", and continues to try to match until the match is finished with "EF". At this time, it has been matched to position 3, and there are no characters on the right side of position 3. Control will be handed over to / $/, metacharacter / $/ to try to match from position 3, which matches the end position, that is, "position 3". The match is successful. At this point, the regular expression match is complete and the match is reported to be successful. The matching result is "DEF", with a start position of 0 and an end position of 3. Where / ^ / match position 0, / (? = D) / match position 0, / [Dmurf] + / match string "DEF", / $/ match position 3.

Thank you for reading! This is the end of this article on "what are the common regular expression engines?". I hope the above content can be of some help to you, so that you can learn more knowledge. If you think the article is good, you can share it for more people to see!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.