
Case Analysis of the g Flag in JavaScript Regular Expressions

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article walks through a case analysis of the g flag in JavaScript regular expressions. The approach is simple, quick and practical, so interested readers may want to take a look.

One day I came across a question in the SegmentFault (Sifu) community, roughly as follows:

```javascript
const list = ['a', 'b', '-', 'c', 'd'];
const reg = /[a-z]/g;
const letters = list.filter(i => reg.test(i));
// letters === ['a', 'c'];
// Without the `g` flag, all the letters are returned.
// Why does adding `g` change that?
```

For this particular problem, each i in the iteration is a single character, so g is not needed at all.
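
Since each element is a single character, the simplest fix is to drop the `g` flag. A minimal sketch, assuming the same data as the question: a regex without `g` (or `y`) ignores `lastIndex` and starts every search at index 0.

```javascript
// Same data as in the question.
const list = ['a', 'b', '-', 'c', 'd'];

// Without `g` (or `y`), test() always starts matching at index 0.
const reg = /[a-z]/;
const letters = list.filter(i => reg.test(i));

console.log(letters); // [ 'a', 'b', 'c', 'd' ]
```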

But by my (admittedly shallow) understanding of regular expressions, g only means a global search that does not stop at the first match, so it should not affect the result here. That made me curious.

The problem above can be reduced to the following:

```javascript
const reg = /[a-z]/g;
reg.test('a'); // => true
reg.test('a'); // => false
reg.test('a'); // => true
reg.test('a'); // => false
reg.test('a'); // => true
```

Investigation

First of all, the behavior is clearly caused by g.

Search engine

I opened MDN to look more closely at what the g flag does, and it matched my understanding.

My guess was that g enables some kind of cache, and since reg is a global variable relative to the filter callback, I changed the code to:

```javascript
const list = ['a', 'b', '-', 'c', 'd'];
const letters = list.filter(i => /[a-z]/g.test(i));
// letters === ['a', 'b', 'c', 'd']
```

Declaring a fresh regex on each iteration gives the correct result, which confirms my guess. It also shows that the "cache" lives somewhere in the regex object itself.
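
The second version works because a regex literal creates a new RegExp object each time it is evaluated, so the callback gets an object with fresh state on every call. The sketch below checks that; `makeRegex` is a hypothetical helper used only for illustration.

```javascript
// Hypothetical helper: evaluates the same regex literal on every call.
const makeRegex = () => /[a-z]/g;

const r1 = makeRegex();
const r2 = makeRegex();
console.log(r1 === r2); // false: each evaluation creates a new RegExp object

r1.test('a');              // true; r1 now remembers where that match ended
console.log(r1.test('a')); // false: r1 resumes from its previous position
console.log(r2.test('a')); // true: r2 has its own, untouched state
```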

Next, I looked up the corresponding source code to find the cause.

Source code level

Since I have been reading Rust lately, I looked at the source of Boa, a JavaScript engine written in Rust.

After opening the project on GitHub, press . to switch to VS Code mode, then press Command+P and search for the keyword regexp.

Open the test.rs file and search for /g with Command+F; at line 90 there is a test named last_index():

```rust
#[test]
fn last_index() {
    let mut context = Context::default();
    let init = r#"
        var regex = /[0-9]+(\.[0-9]+)?/g;
        "#;

    // forward: runs the code in the context and returns the result as a string.
    eprintln!("{}", forward(&mut context, init));
    assert_eq!(forward(&mut context, "regex.lastIndex"), "0");
    assert_eq!(forward(&mut context, "regex.test('1.0foo')"), "true");
    assert_eq!(forward(&mut context, "regex.lastIndex"), "3");
    assert_eq!(forward(&mut context, "regex.test('1.0foo')"), "false");
    assert_eq!(forward(&mut context, "regex.lastIndex"), "0");
}
```
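
The same assertions can be replayed in plain JavaScript, which makes them easy to check in any engine. This is a sketch that relies only on standard RegExp behavior:

```javascript
const regex = /[0-9]+(\.[0-9]+)?/g;

console.log(regex.lastIndex);      // 0: a fresh regex starts at 0
console.log(regex.test('1.0foo')); // true: matches "1.0"
console.log(regex.lastIndex);      // 3: the index just past "1.0"
console.log(regex.test('1.0foo')); // false: nothing matches from index 3 on
console.log(regex.lastIndex);      // 0: the failed search resets lastIndex
```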

Seeing the lastIndex keyword, we can narrow the guess down: with the g flag, the regex records the index where the last match ended, and that is what causes the problem.

Next, switch to the mod.rs file and search for test.

The fn test() method is on line 631:

```rust
pub(crate) fn test(
    this: &JsValue,
    args: &[JsValue],
    context: &mut Context,
) -> JsResult<JsValue> {
    // 1. Let R be the this value.
    // 2. If Type(R) is not Object, throw a TypeError exception.
    let this = this.as_object().ok_or_else(|| {
        context.construct_type_error("RegExp.prototype.test method called on incompatible value")
    })?;

    // 3. Let string be ? ToString(S).
    let arg_str = args
        .get(0)
        .cloned()
        .unwrap_or_default()
        .to_string(context)?;

    // 4. Let match be ? RegExpExec(R, string).
    let m = Self::abstract_exec(this, arg_str, context)?;

    // 5. If match is not null, return true; else return false.
    if m.is_some() {
        Ok(JsValue::new(true))
    } else {
        Ok(JsValue::new(false))
    }
}
```
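
Steps 4 and 5 say that test() is just RegExpExec followed by a null check. For a regex whose `exec` has not been overridden, `re.test(s)` should therefore behave like `re.exec(s) !== null`, including the side effect on `lastIndex`. A sketch comparing the two; `testViaExec` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper mirroring steps 3-5 of RegExp.prototype.test.
function testViaExec(re, s) {
  const str = String(s);      // step 3: ToString(S)
  const match = re.exec(str); // step 4: roughly RegExpExec(R, string)
  return match !== null;      // step 5: true iff a match was found
}

const a = /[a-z]/g;
const b = /[a-z]/g;
// exec() updates lastIndex just like test(), so both show the same pattern.
const viaTest = [a.test('a'), a.test('a'), a.test('a')];
const viaExec = [testViaExec(b, 'a'), testViaExec(b, 'a'), testViaExec(b, 'a')];
console.log(viaTest); // [ true, false, true ]
console.log(viaExec); // [ true, false, true ]
```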

test() calls the Self::abstract_exec() method:

```rust
pub(crate) fn abstract_exec(
    this: &JsObject,
    input: JsString,
    context: &mut Context,
) -> JsResult<Option<JsObject>> {
    // 1. Assert: Type(R) is Object.
    // 2. Assert: Type(S) is String.

    // 3. Let exec be ? Get(R, "exec").
    let exec = this.get("exec", context)?;

    // 4. If IsCallable(exec) is true, then
    if let Some(exec) = exec.as_callable() {
        // a. Let result be ? Call(exec, R, « S »).
        let result = exec.call(&this.clone().into(), &[input.into()], context)?;

        // b. If Type(result) is neither Object nor Null, throw a TypeError exception.
        if !result.is_object() && !result.is_null() {
            return context.throw_type_error("regexp exec returned neither object nor null");
        }

        // c. Return result.
        return Ok(result.as_object().cloned());
    }

    // 5. Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
    if !this.is_regexp() {
        return context.throw_type_error("RegExpExec called with invalid value");
    }

    // 6. Return ? RegExpBuiltinExec(R, S).
    Self::abstract_builtin_exec(this, &input, context)
}
```
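
Steps 3 and 4 look up exec on the object before falling back to the built-in matcher, so test() honors an overridden exec. A small sketch demonstrating this; overriding exec on an instance is done here for illustration only.

```javascript
const re = /x/;

// Step 4: a callable `exec` found on the object is called instead of the built-in one.
re.exec = () => null;
console.log(re.test('x')); // false, even though /x/ matches "x"

// Step 4.b: exec must return an object or null, otherwise a TypeError is thrown.
re.exec = () => 42;
try {
  re.test('x');
} catch (err) {
  console.log(err instanceof TypeError); // true
}

// Deleting the own property makes Get(R, "exec") find RegExp.prototype.exec again.
delete re.exec;
console.log(re.test('x')); // true
```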

Self::abstract_exec() in turn calls the Self::abstract_builtin_exec() method:

```rust
pub(crate) fn abstract_builtin_exec(
    this: &JsObject,
    input: &JsString,
    context: &mut Context,
) -> JsResult<Option<JsObject>> {
    // 1. Assert: R is an initialized RegExp instance.
    let rx = {
        let obj = this.borrow();
        if let Some(rx) = obj.as_regexp() {
            rx.clone()
        } else {
            return context.throw_type_error("RegExpBuiltinExec called with invalid value");
        }
    };
    // 2. Assert: Type(S) is String.
    // 3. Let length be the number of code units in S.
    let length = input.encode_utf16().count();
    // 4. Let lastIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
    let mut last_index = this.get("lastIndex", context)?.to_length(context)?;
    // 5. Let flags be R.[[OriginalFlags]].
    let flags = &rx.original_flags;
    // 6. If flags contains "g", let global be true; else let global be false.
    let global = flags.contains('g');
    // 7. If flags contains "y", let sticky be true; else let sticky be false.
    let sticky = flags.contains('y');
    // 8. If global is false and sticky is false, set lastIndex to 0.
    if !global && !sticky {
        last_index = 0;
    }
    // 9. Let matcher be R.[[RegExpMatcher]].
    let matcher = &rx.matcher;
    // 10. If flags contains "u", let fullUnicode be true; else let fullUnicode be false.
    let unicode = flags.contains('u');
    // 11. Let matchSucceeded be false.
    // 12. Repeat, while matchSucceeded is false,
    let match_value = loop {
        // a. If lastIndex > length, then
        if last_index > length {
            // i. If global is true or sticky is true, then
            if global || sticky {
                // 1. Perform ? Set(R, "lastIndex", +0𝔽, true).
                this.set("lastIndex", 0, true, context)?;
            }
            // ii. Return null.
            return Ok(None);
        }
        // b. Let r be matcher(S, lastIndex).
        // Check if last_index is a valid utf8 index into input.
        let last_byte_index = match String::from_utf16(
            &input.encode_utf16().take(last_index).collect::<Vec<u16>>(),
        ) {
            Ok(s) => s.len(),
            Err(_) => {
                return context
                    .throw_type_error("Failed to get byte index from utf16 encoded string");
            }
        };
        let r = matcher.find_from(input, last_byte_index).next();
        match r {
            // c. If r is failure, then
            None => {
                // i. If sticky is true, then
                if sticky {
                    // 1. Perform ? Set(R, "lastIndex", +0𝔽, true).
                    this.set("lastIndex", 0, true, context)?;
                    // 2. Return null.
                    return Ok(None);
                }
                // ii. Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
                last_index = advance_string_index(input, last_index, unicode);
            }
            Some(m) => {
                // c. If r is failure, then
                #[allow(clippy::if_not_else)]
                if m.start() != last_index {
                    // i. If sticky is true, then
                    if sticky {
                        // 1. Perform ? Set(R, "lastIndex", +0𝔽, true).
                        this.set("lastIndex", 0, true, context)?;
                        // 2. Return null.
                        return Ok(None);
                    }
                    // ii. Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
                    last_index = advance_string_index(input, last_index, unicode);
                } else {
                    // d. Else,
                    // i. Assert: r is a State.
                    // ii. Set matchSucceeded to true.
                    break m;
                }
            }
        }
    };
    // 13. Let e be r's endIndex value.
    let mut e = match_value.end();
    // 14. If fullUnicode is true, then
    if unicode {
        // e is an index into the Input character list, derived from S, matched by matcher.
        // Let eUTF be the smallest index into S that corresponds to the character at element e of Input.
        // If e is greater than or equal to the number of elements in Input, then eUTF is the number of code units in S.
        // b. Set e to eUTF.
        e = input.split_at(e).0.encode_utf16().count();
    }
    // 15. If global is true or sticky is true, then
    if global || sticky {
        // a. Perform ? Set(R, "lastIndex", 𝔽(e), true).
        this.set("lastIndex", e, true, context)?;
    }
    // 16. Let n be the number of elements in r's captures List.
    //     (This is the same value as 22.2.2.1's NcapturingParens.)
    let n = match_value.captures.len();
    // 17. Assert: n < 2^32 - 1.
    debug_assert!(n < 2usize.pow(32) - 1);
    // 18. Let A be ! ArrayCreate(n + 1).
    // 19. Assert: The mathematical value of A's "length" property is n + 1.
    let a = Array::array_create(n + 1, None, context)?;
    // 20. Perform ! CreateDataPropertyOrThrow(A, "index", 𝔽(lastIndex)).
    a.create_data_property_or_throw("index", match_value.start(), context)
        .expect("this CreateDataPropertyOrThrow call must not fail");
    // 21. Perform ! CreateDataPropertyOrThrow(A, "input", S).
    a.create_data_property_or_throw("input", input.clone(), context)
        .expect("this CreateDataPropertyOrThrow call must not fail");
    // 22. Let matchedSubstr be the substring of S from lastIndex to e.
    let matched_substr = if let Some(s) = input.get(match_value.range()) { s } else { "" };
    // 23. Perform ! CreateDataPropertyOrThrow(A, "0", matchedSubstr).
    a.create_data_property_or_throw(0, matched_substr, context)
        .expect("this CreateDataPropertyOrThrow call must not fail");
    // 24. If R contains any GroupName, then
    // 25. Else,
    let named_groups = match_value.named_groups();
    let groups = if named_groups.clone().count() > 0 {
        // a. Let groups be ! OrdinaryObjectCreate(null).
        let groups = JsValue::from(JsObject::empty());
        // Perform 27.f here
        // f. If the ith capture of R was defined with a GroupName, then
        // i. Let s be the CapturingGroupName of the corresponding RegExpIdentifierName.
        // ii. Perform ! CreateDataPropertyOrThrow(groups, s, capturedValue).
        for (name, range) in named_groups {
            if let Some(range) = range {
                let value = if let Some(s) = input.get(range.clone()) { s } else { "" };
                groups
                    .to_object(context)?
                    .create_data_property_or_throw(name, value, context)
                    .expect("this CreateDataPropertyOrThrow call must not fail");
            }
        }
        groups
    } else {
        // a. Let groups be undefined.
        JsValue::undefined()
    };
    // 26. Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
    a.create_data_property_or_throw("groups", groups, context)
        .expect("this CreateDataPropertyOrThrow call must not fail");
    // 27. For each integer i such that i ≥ 1 and i ≤ n, in ascending order, do
    for i in 1..=n {
        // a. Let captureI be the ith element of r's captures List.
        let capture = match_value.group(i);
        let captured_value = match capture {
            // b. If captureI is undefined, let capturedValue be undefined.
            None => JsValue::undefined(),
            // c. Else if fullUnicode is true, then
            // d. Else,
            Some(range) => {
                if let Some(s) = input.get(range) { s.into() } else { "".into() }
            }
        };
        // e. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), capturedValue).
        a.create_data_property_or_throw(i, captured_value, context)
            .expect("this CreateDataPropertyOrThrow call must not fail");
    }
    // 28. Return A.
    Ok(Some(a))
}
```
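
Steps 18 to 27 build the array that exec() returns. The sketch below inspects those properties on one small example; the pattern and input are arbitrary choices for illustration.

```javascript
const re = /(?<digit>\d)(x)?/g;
const m = re.exec('ab7c');

console.log(m[0]);           // "7": matchedSubstr (step 23)
console.log(m.index);        // 2: where the match starts (step 20)
console.log(m.input);        // "ab7c": the original string (step 21)
console.log(m.groups.digit); // "7": the named group (steps 24-26)
console.log(m[2]);           // undefined: capture 2 did not participate (step 27.b)
console.log(m.length);       // 3: n + 1, with n = 2 captures (step 18)
console.log(re.lastIndex);   // 3: end of the match, stored because of `g` (step 15)
```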

Self::abstract_builtin_exec() handles both global and last_index, so this appears to be the method that finally does the work. Let's look at it more closely (the code is detailed, and each spec step is annotated).

In step 12:

If lastIndex exceeds the string length, lastIndex is reset to 0 (when global or sticky is set) and null is returned.

Otherwise, the matcher runs from lastIndex to get the match value (match_value).

If there is no match, lastIndex is set to the return value of advance_string_index() and the loop repeats.

advance_string_index() is outside the scope of this issue; see https://tc39.es/ecma262/#sec-...

Step 13 takes the endIndex of the match.

Step 15 sets lastIndex to that endIndex when global or sticky is set.
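
The lastIndex bookkeeping from steps 8, 12, 13 and 15 can be modeled in a few lines of JavaScript. This is a simplified sketch, not how any engine implements it: `builtinExecModel` is a hypothetical name, it uses a sticky (`y`) copy of the regex to try a match at exactly one position, and it advances by one code unit instead of calling AdvanceStringIndex (so it ignores the `u` flag).

```javascript
// Simplified model of RegExpBuiltinExec's lastIndex handling (ignores the `u` flag).
function builtinExecModel(re, s) {
  const global = re.flags.includes('g');
  const sticky = re.flags.includes('y');
  // Step 8: without g or y, always start at 0.
  let lastIndex = global || sticky ? re.lastIndex : 0;
  // A sticky copy only matches at exactly `lastIndex`; it plays the role of the matcher.
  const matcher = new RegExp(re.source, re.flags.replace(/[gy]/g, '') + 'y');

  // Step 12: try positions until a match succeeds.
  for (;;) {
    if (lastIndex > s.length) {               // 12.a
      if (global || sticky) re.lastIndex = 0; // 12.a.i
      return null;                            // 12.a.ii
    }
    matcher.lastIndex = lastIndex;
    const m = matcher.exec(s);                // 12.b
    if (m === null) {                         // 12.c: failure
      if (sticky) {
        re.lastIndex = 0;
        return null;
      }
      lastIndex += 1; // stand-in for AdvanceStringIndex
      continue;
    }
    const e = lastIndex + m[0].length;        // step 13: end index
    if (global || sticky) re.lastIndex = e;   // step 15
    return m;
  }
}

// Usage: reproduces the alternating results from the question.
const reg = /[a-z]/g;
console.log(['a', 'a', 'a'].map(s => builtinExecModel(reg, s) !== null)); // [ true, false, true ]
```

Comparing the model with the native exec() on the same inputs is a quick way to check it against a real engine.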

At this point, the meaning of the g flag is clear. Every regex object has its own lastIndex property. With g, a successful match does not reset lastIndex to 0; it is set to the end of the match instead, and the next call starts searching from that position.
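
If a shared global regex has to be kept (for example, because it is built once outside the loop), one option is to reset lastIndex to 0 before each call. A minimal sketch of that workaround applied to the code from the question:

```javascript
const list = ['a', 'b', '-', 'c', 'd'];
const reg = /[a-z]/g;

const letters = list.filter(i => {
  reg.lastIndex = 0; // discard the position left over from the previous call
  return reg.test(i);
});

console.log(letters); // [ 'a', 'b', 'c', 'd' ]
```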

Conclusion

Applying this to the code from the question:

```javascript
const reg = /[a-z]/g; // after declaration, lastIndex is 0
reg.test('a'); // => true; after the first match, lastIndex is 1
reg.test('a'); // => false; the search starts at lastIndex 1, but the string has
               //    only one character, so it fails and lastIndex is reset to 0
reg.test('a'); // => true; from here on, the logic of the first two calls repeats
reg.test('a'); // => false
reg.test('a'); // => true
```
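
To watch the state change directly, the same calls can be repeated while recording lastIndex after each one (a small sketch; the output is shown in the comments):

```javascript
const reg = /[a-z]/g;
const trace = [];
for (let call = 1; call <= 5; call++) {
  const result = reg.test('a');
  trace.push(`call ${call}: ${result}, lastIndex = ${reg.lastIndex}`);
}
console.log(trace.join('\n'));
// call 1: true, lastIndex = 1
// call 2: false, lastIndex = 0
// call 3: true, lastIndex = 1
// call 4: false, lastIndex = 0
// call 5: true, lastIndex = 1
```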


© 2024 shulou.com SLNews company. All rights reserved.
