Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Introduction to the principle of regular expression

2025-03-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

The main content of this article is "introduction to the principle of regular expression". Interested friends may wish to take a look at it. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "introduction to the principles of regular expressions".

Regular expressions may be used by most people, but when you use them, have you ever thought about the principle behind regular expressions, or are you particularly surprised when I tell you that regular expressions may have performance problems that cause them to hang up online?

Let's take a look at an example of an online accident in early July because of a regular expression.

Https://blog.cloudflare.com/d...

In a nutshell, it's a regular expression with a performance problem that causes catastrophic backtracking and leads to a full load of cpu.

Regularity of performance problems

First take a look at the regularity of the problem.

The key part of the performance problem is. * (?:. * =. *). Let's ignore the non-capture group and regard the regularity of the performance problem as. *. * =. *.

Among them. It means to match any character except line breaks (many people make a mistake here, it is easy to bug), and. * means to greedily match any character any number of times.

What is backtracking?

When greedy matching or lazy matching or matching enters the matching path selection, the behavior of encountering a failed matching path and trying to take another matching path is called backtracking.

Can be understood as walking a maze, a road to the end, found that there is no way to go back to the last fork in the road to choose another road.

Retrospective phenomenon

/ / performance problem regularization / / paste the following code into the browser console to try const regexp = `[Amurz] +\\ d + (. *): (. *) + [A Methoz] +\\ d +`; const str = `A1VOBZE 1Magnum Glaze H1` const reg = new RegExp (regexp); start = Date.now (); const res = reg.test (str); end = Date.now () Console.log ('regular regular execution time:' + (end-start))

Now let's see what backtracking is all about.

Suppose we have a regular (. *) +\ d, and the input string is abcd. Note that only one string of length 4 is entered at this time. Let's analyze the matching backtracking process:

The above shows a backtracking matching process, roughly describing the first three rounds of matching.

Note (. *) + here it can be seen as multiple execution for the time being. (. *) {1,}

Match * * times, because. * you can match any character any time, so here you can choose to match empty, a, ab, abc, abcd. Because of the greedy nature of *, 4 characters of abcd are directly matched, + because there are no other characters behind, so just look at. * it will not match after eating abcd. Here, the value of + is 1, and then\ d nothing can match. Therefore, if the match fails, perform * backtracking times.

The second match, because backtracking, so go back to the last matching path selection, the last time. * matching is abcd, and the road is blocked, so this time can only try to match abc, this time there is a d at the end, then it can be understood as. * matched abc for the second time, and then because of (. *) +, the second match can be carried out. Here. * can match d, where the value of + is 2, and then\ d nothing can match, so the match fails and a second backtrack is performed.

The third match, because backtracking, so go back to the last matching path selection, the last match is abc, the second match is d, and the path is blocked, so the second match is not matched, and there is a d,\ d and d match failure at the end, and the third backtracking is carried out.

How to reduce or avoid backtracking

Optimize regular expressions: always pay attention to the performance impact of backtracking.

Regular expressions using the DFA regular engine

What is the DFA regular engine

Traditional regular engines are divided into NFA (non-deterministic finite state automaton) and DFA (deterministic finite state automaton).

DFA

For any given state and input character, DFA will only move to a certain state. And DFA does not allow state transitions without input characters.

For example, state 0, when entering the character A, there is only one end point, only to state 1.

NFA

For any state and input character, the state that NFA can transfer is a non-empty set.

For example, state 0, when entering the character A, there can be multiple endpoints, either to state 1 or to state 0.

The difference between the regular engine of DFA and NFA

So after all this talk, what's the difference between DFA and NFA regular engines? Or how do DFA and NFA implement regular engines?

DFA

The DFA engine in the rule actually converts the regular expression into an adjacency table of a graph, and then determines whether a string matches the regular in the form of a jump table.

/ / roughly simulate function machine (input) {if (typeof input! = = 'string') {console.log (' incorrect input'); return } / / for example, after regular: / abc/ is converted to DFA / / here we define four states, which are 0meme 1meme 2jue 3. The initial state is 0 const reg = {0: {a: 1,}, 1: {b: 3,}, 2: {isEnd: true,}, 3: {c: 2,},} Let status = 0; for (let I = 0; I < input.length; iTunes +) {const inputinputChar = input [I]; status = reg [status] [inputChar]; if (typeof status = 'undefined') {console.log (' match failed'); return false;}} const end = reg [status] If (end & & end.isEnd = true) {console.log ('match succeeded'); return true;} else {console.log ('match failed'); return false;}} const input = 'abc'; machine (input)

Advantages: no matter how badly regular expressions are written, the matching speed is fast.

Disadvantages: advanced features such as capture groups and assertions are not supported

NFA

The NFA engine in the regular is actually a directed graph constructed during syntax parsing. Then through the way of deep search, to the recursive attempt of a path.

Advantages: powerful, can get matching context information, support a variety of assertions to capture the group look around and other functions

Disadvantages: high requirements for regular development skills, need to pay attention to the performance problems caused by backtracking

Summary

Now back to the beginning of the question, let's take a look at why there is a performance problem with his regularization.

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

First of all, his regulars use NFA's regular engine (most languages' regular engines are NFA, and so is js, so pay attention to the impact of performance problems)

He wrote regular expressions with performance problems that could easily lead to catastrophic backtracking.

To avoid such problems, either raise the developer's awareness of regular performance issues, or switch to DFA's regular engine (fast, weak, no replenishment group assertions, etc.).

Matters needing attention

In the usual regular writing, write less fuzzy matching, the more accurate the better, fuzzy matching, greedy matching, lazy matching will bring backtracking problems, just choose a way with as little impact as possible. Just keep the concept of a performance problem in mind when writing rules.

At this point, I believe you have a deeper understanding of the "introduction to the principle of regular expressions". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 207

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report