Updated 2025-03-28. From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article describes in detail the best way to use .NET regular expressions, step by step, and is intended to answer common questions about their performance.
The regular expression engine in .NET is a powerful, full-featured tool that processes text based on pattern matches rather than on comparing and matching literal text. In most cases, it performs pattern matching quickly and efficiently. In some cases, however, the regular expression engine can appear to be slow. In extreme cases, it can even appear to stop responding as it processes a relatively small input over the course of hours or even days.
This topic outlines some of the best practices that developers can adopt to ensure that their regular expressions achieve the best performance.
Consider the input source
In general, regular expressions can accept two types of input: constrained input or unconstrained input. Constrained input is text that comes from a known or reliable source and follows a predefined format. Unconstrained input is text that comes from an unreliable source, such as Web users, and may not follow a predefined or expected format.
Regular expression patterns are usually written to match valid input. That is, developers check the text they want to match and then write regular expression patterns that match it. The developer then tests with multiple valid inputs to determine whether the pattern needs to be corrected or further refined. When the pattern matches all assumed valid inputs, it is declared production ready and can be included in the published application. This makes the regular expression pattern suitable for matching constrained input. However, it is not suitable for matching unconstrained input.
To match unconstrained input, regular expressions must be able to handle the following three types of text efficiently:
Text that matches the regular expression pattern.
Text that does not match the regular expression pattern.
Text that roughly matches the regular expression pattern.
The last type of text is particularly problematic for regular expressions written to handle constrained input. If the regular expression also relies on a lot of backtracking, the regular expression engine may spend a lot of time (in some cases, many hours or days) processing seemingly harmless text.
For example, consider a common but problematic regular expression for validating the alias of an e-mail address. The regular expression ^[0-9A-Z]([-.\w]*[0-9A-Z])*$ is written to process e-mail aliases that are considered valid: an alphanumeric character followed by zero or more characters that can be alphanumeric, periods, or hyphens, with the alias ending in an alphanumeric character. But as the following example shows, although this regular expression handles valid input easily, it is very inefficient when processing near-valid input.
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        Stopwatch sw;
        string[] addresses = { "AAAAAAAAAAA@contoso.com",
                               "AAAAAAAAAAaaaaaaaaaa!@contoso.com" };
        string pattern = @"^[0-9A-Z]([-.\w]*[0-9A-Z])*$";
        string input;

        foreach (var address in addresses)
        {
            // Test only the alias (the part before the @ sign).
            string mailBox = address.Substring(0, address.IndexOf("@"));
            int index = 0;
            // Grow the test string one character at a time, starting from the end.
            for (int ctr = mailBox.Length - 1; ctr >= 0; ctr--)
            {
                index++;
                input = mailBox.Substring(ctr, index);
                sw = Stopwatch.StartNew();
                Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
                sw.Stop();
                if (m.Success)
                    Console.WriteLine("{0,2}. Matched '{1,25}' in {2}",
                                      index, m.Value, sw.Elapsed);
                else
                    Console.WriteLine("{0,2}. Failed  '{1,25}' in {2}",
                                      index, input, sw.Elapsed);
            }
            Console.WriteLine();
        }
    }
}
// The example displays output similar to the following:
//     1. Matched '                        A' in 00:00:00.0007122
//     ...
//     3. Matched '                      AAA' in 00:00:00.0000042
//     4. Matched '                     AAAA' in 00:00:00.0000038
//     5. Matched '                    AAAAA' in 00:00:00.0000042
//     ...
//     7. Matched '                  AAAAAAA' in 00:00:00.0000042
//     8. Matched '                 AAAAAAAA' in 00:00:00.0000087
//     ...
//
//     1. Failed  '                        !' in 00:00:00.0000447
//     2. Failed  '                       a!' in 00:00:00.0000071
//     3. Failed  '                      aa!' in 00:00:00.0000071
//     4. Failed  '                     aaa!' in 00:00:00.0000061
//     5. Failed  '                    aaaa!' in 00:00:00.0000081
//     6. Failed  '                   aaaaa!' in 00:00:00.0000126
//     7. Failed  '                  aaaaaa!' in 00:00:00.0000359
//     8. Failed  '                 aaaaaaa!' in 00:00:00.0000414
//     9. Failed  '                aaaaaaaa!' in 00:00:00.0000758
//    10. Failed  '               aaaaaaaaa!' in 00:00:00.0001462
//    11. Failed  '              aaaaaaaaaa!' in 00:00:00.0002885
//    12. Failed  '             Aaaaaaaaaaa!' in 00:00:00.0005780
//    13. Failed  '            AAaaaaaaaaaa!' in 00:00:00.0011628
//    14. Failed  '           AAAaaaaaaaaaa!' in 00:00:00.0022851
//    15. Failed  '          AAAAaaaaaaaaaa!' in 00:00:00.0045864
//    16. Failed  '         AAAAAaaaaaaaaaa!' in 00:00:00.0093168
//    17. Failed  '        AAAAAAaaaaaaaaaa!' in 00:00:00.0185993
//    18. Failed  '       AAAAAAAaaaaaaaaaa!' in 00:00:00.0366723
//    19. Failed  '      AAAAAAAAaaaaaaaaaa!' in 00:00:00.1370108
//    20. Failed  '     AAAAAAAAAaaaaaaaaaa!' in 00:00:00.1553966
//    21. Failed  '    AAAAAAAAAAaaaaaaaaaa!' in 00:00:00.3223372
As the sample output shows, the regular expression engine processes a valid e-mail alias in about the same time interval regardless of its length. In contrast, once a near-valid e-mail alias has more than five characters, each additional character roughly doubles the processing time. This means that a nearly valid 28-character string would take an hour to process, and a nearly valid 33-character string would take nearly a day.
Because this regular expression was developed with only the format of the input to be matched in mind, it fails to take into account input that does not match the pattern. This in turn allows unconstrained input that nearly matches the pattern to degrade performance significantly.
To resolve this problem, do the following:
When developing a pattern, consider how backtracking might affect the performance of the regular expression engine, particularly if the regular expression is designed to process unconstrained input. For more information, see the section on controlling backtracking.
Fully test the regular expression with invalid input, near valid input, and valid input. To randomly generate input for a particular regular expression, you can use Rex, the regular expression discovery tool provided by Microsoft Research.
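The testing advice above can be sketched as a small helper that times one pattern against a single input under a time budget. The names InputProbe, ProbeResult, and Probe, as well as the choice to pass RegexOptions.IgnoreCase, are illustrative assumptions rather than part of the original article. The budget exists only so that a slow near-valid case cannot hang a test run.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the article): probes one input against one pattern.
public static class InputProbe
{
    // Outcome of one probe: whether the input matched, whether the engine
    // gave up because the budget ran out, and how long the attempt took.
    public readonly struct ProbeResult
    {
        public ProbeResult(bool matched, bool timedOut, TimeSpan elapsed)
        {
            Matched = matched;
            TimedOut = timedOut;
            Elapsed = elapsed;
        }

        public bool Matched { get; }
        public bool TimedOut { get; }
        public TimeSpan Elapsed { get; }
    }

    public static ProbeResult Probe(string input, string pattern, TimeSpan budget)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            // The static overload with a TimeSpan throws if the budget is exceeded.
            bool matched = Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase, budget);
            return new ProbeResult(matched, false, sw.Elapsed);
        }
        catch (RegexMatchTimeoutException)
        {
            return new ProbeResult(false, true, sw.Elapsed);
        }
    }
}
```

A test suite might call Probe once per sample in each of the three groups (valid, invalid, and near-valid) and flag any sample whose TimedOut value is true or whose Elapsed value grows steeply as the sample gets longer.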
Handle object instantiation appropriately
The core of the .NET regular expression object model is the System.Text.RegularExpressions.Regex class, which represents the regular expression engine. Often, the single biggest factor that affects regular expression performance is how the Regex engine is used. Defining a regular expression involves tightly coupling the regular expression engine with a regular expression pattern. That coupling process is expensive, whether it involves instantiating a Regex object by passing a pattern to its constructor or calling a static method by passing it the pattern along with the string to be parsed.
You can couple the regular expression engine with a specific regular expression pattern, and then use the engine to match text in several ways:
You can call a static pattern-matching method, such as Regex.Match(String, String). This does not require instantiating a regular expression object.
You can instantiate a Regex object and call an instance pattern-matching method of an interpreted regular expression. This is the default way to bind the regular expression engine to a regular expression pattern. It results when a Regex object is instantiated without an options argument that includes the RegexOptions.Compiled flag.
You can instantiate a Regex object and call an instance pattern-matching method of a compiled regular expression. When a Regex object is instantiated with an options argument that includes the RegexOptions.Compiled flag, the regular expression object represents a compiled pattern.
You can create a special-purpose Regex object that is tightly coupled to a particular regular expression pattern, compile it, and save it to a standalone assembly. To do this, call the Regex.CompileToAssembly method.
The particular way in which you call regular expression matching methods can have a significant impact on your application. The following sections discuss when to use static method calls, interpreted regular expressions, and compiled regular expressions to improve application performance.
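As a rough side-by-side view of the first three options, the sketch below counts matches of one pattern through a static call, an interpreted instance, and a compiled instance. The class name CouplingStyles, its method names, and the choice of pattern are assumptions made for illustration. The Regex members it calls (the constructors, Regex.Matches, and RegexOptions.Compiled) are the standard ones.

```csharp
using System.Text.RegularExpressions;

// Hypothetical illustration (not from the article) of three ways to bind a pattern.
public static class CouplingStyles
{
    // Words that begin with an uppercase letter.
    private const string Pattern = @"\b\p{Lu}\w*\b";

    // Interpreted instance: the default when RegexOptions.Compiled is absent.
    private static readonly Regex Interpreted = new Regex(Pattern);

    // Compiled instance: more work at construction, less work per call.
    private static readonly Regex Compiled = new Regex(Pattern, RegexOptions.Compiled);

    // Static call: no Regex object appears in user code.
    public static int CountStatic(string input) => Regex.Matches(input, Pattern).Count;

    public static int CountInterpreted(string input) => Interpreted.Matches(input).Count;

    public static int CountCompiled(string input) => Compiled.Matches(input).Count;
}
```

All three methods return the same count for the same input. They differ only in when the cost of preparing the pattern is paid. Holding the instances in static readonly fields is one way to pay the construction cost once rather than on every call.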
Static regular expressions
Static regular expression methods are recommended as an alternative to repeatedly instantiating a regular expression object with the same regular expression. Unlike the patterns used by regular expression objects, the operation codes or the compiled Microsoft intermediate language (MSIL) of patterns used in static method calls are cached internally by the regular expression engine.
For example, an event handler frequently calls another method to validate user input. This is reflected in the following code, in which the Click event of a System.Windows.Forms.Button control is used to call a method named IsValidCurrency, which checks whether the user has entered a currency symbol followed by at least one decimal digit.
public void OKButton_Click(object sender, EventArgs e)
{
    if (!String.IsNullOrEmpty(sourceCurrency.Text))
    {
        if (RegexLib.IsValidCurrency(sourceCurrency.Text))
            PerformConversion();
        else
            status.Text = "The source currency value is invalid.";
    }
}
The following example shows a very inefficient implementation of the IsValidCurrency method. Notice that each method call re-instantiates a Regex object with the same pattern. This in turn means that the regular expression pattern must be recompiled each time the method is called.
using System;
using System.Text.RegularExpressions;

public class RegexLib
{
    public static bool IsValidCurrency(string currencyValue)
    {
        string pattern = @"\p{Sc}+\s*\d+";
        // Inefficient: a new Regex object is built on every call.
        Regex currencyRegex = new Regex(pattern);
        return currencyRegex.IsMatch(currencyValue);
    }
}
This inefficient code should be replaced with a call to the static Regex.IsMatch(String, String) method. This eliminates the need to instantiate a Regex object each time you call a pattern-matching method, and allows the regular expression engine to retrieve a compiled version of the regular expression from its cache.
using System;
using System.Text.RegularExpressions;

public class RegexLib
{
    public static bool IsValidCurrency(string currencyValue)
    {
        string pattern = @"\p{Sc}+\s*\d+";
        // The static method lets the engine reuse its cached form of the pattern.
        return Regex.IsMatch(currencyValue, pattern);
    }
}
By default, the 15 most recently used static regular expression patterns are cached. For applications that require a larger number of cached static regular expressions, you can adjust the size of the cache by setting the Regex.CacheSize property.
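Adjusting the cache might look like the sketch below. The helper name CacheTuning.EnsureCacheSize and the "only ever grow" policy are assumptions made for illustration. The static Regex.CacheSize property itself is the one named in the text.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not from the article) around the Regex.CacheSize property.
public static class CacheTuning
{
    // Raises the static-method cache size if it is below the requested minimum
    // and returns the size now in effect. It never shrinks the cache.
    public static int EnsureCacheSize(int minimum)
    {
        if (Regex.CacheSize < minimum)
            Regex.CacheSize = minimum;
        return Regex.CacheSize;
    }
}
```

Calling CacheTuning.EnsureCacheSize(50) once at application startup would suit an application that cycles through several dozen distinct patterns by way of static methods. The value 50 is only an example.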
The regular expression \p{Sc}+\s*\d+ used in this example verifies that the input string contains a currency symbol and at least one decimal digit. The pattern is defined as shown in the following table.

Pattern    Description
\p{Sc}+    Matches one or more characters in the Unicode Symbol, Currency category.
\s*        Matches zero or more white-space characters.
\d+        Matches one or more decimal digits.

Interpreted and compiled regular expressions
Regular expression patterns that are not bound to the regular expression engine through the RegexOptions.Compiled option are interpreted. When a regular expression object is instantiated, the regular expression engine converts the regular expression to a set of operation codes. When an instance method is called, the operation codes are converted to MSIL and executed by the JIT compiler. Similarly, when a static regular expression method is called and the regular expression cannot be found in the cache, the regular expression engine converts the regular expression to a set of operation codes and stores them in the cache. It then converts these operation codes to MSIL so that the JIT compiler can execute them. Interpreted regular expressions reduce startup time at the cost of slower execution. Because of this, they are best used when the regular expression is used in a small number of method calls, or when the exact number of calls to regular expression methods is unknown but is expected to be small. As the number of method calls increases, the performance gain from reduced startup time is outstripped by the slower execution speed.
Regular expression patterns that are bound to the regular expression engine through the RegexOptions.Compiled option are compiled. This means that when a regular expression object is instantiated, or when a static regular expression method is called and the regular expression cannot be found in the cache, the regular expression engine converts the regular expression to an intermediary set of operation codes, which it then converts to MSIL. When a method is called, the JIT compiler executes the MSIL. In contrast to interpreted regular expressions, compiled regular expressions increase startup time but execute individual pattern-matching methods faster. As a result, the performance benefit of compiling the regular expression grows with the number of regular expression methods called.
In short, use interpreted regular expressions when you call regular expression methods with a specific regular expression relatively infrequently. Use compiled regular expressions when you call regular expression methods with a specific regular expression relatively frequently. The exact threshold at which the slower execution of interpreted regular expressions outweighs their reduced startup time, or at which the slower startup of compiled regular expressions is outweighed by their faster execution, is difficult to determine. It depends on a variety of factors, including the complexity of the regular expression and the specific data that it processes. To determine whether interpreted or compiled regular expressions offer the best performance for a particular application scenario, you can use the System.Diagnostics.Stopwatch class to compare their execution times.
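A minimal sketch of such a Stopwatch comparison follows. The class name RegexTiming, the method Measure, and the decision to include construction time in the measurement are assumptions made for illustration. Actual numbers depend on the pattern, the input, the runtime version, and the machine.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical timing helper (not from the article).
public static class RegexTiming
{
    // Builds a Regex with the given options, calls IsMatch `calls` times,
    // and returns the total elapsed time, construction included.
    public static TimeSpan Measure(string pattern, RegexOptions options, string input, int calls)
    {
        var sw = Stopwatch.StartNew();
        var rx = new Regex(pattern, options);
        for (int i = 0; i < calls; i++)
        {
            rx.IsMatch(input);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Running Measure once with RegexOptions.None and once with RegexOptions.Compiled, first for a handful of calls and then for many thousands, gives a direct view of where the crossover lies for a particular pattern and data set. A single run is noisy, so repeating each measurement several times is advisable.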
The following example compares the performance of compiled and interpreted regular expressions when reading the first ten sentences and when reading all the sentences in the text of Theodore Dreiser's The Financier. As the sample output shows, when only ten calls are made to regular expression matching methods, an interpreted regular expression offers better performance than a compiled regular expression. However, a compiled regular expression offers better performance when a large number of calls (in this case, more than 13,000) are made.
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"\b(\w+((\r?\n)|,?\s))*\w+[.?:;!]";
        Stopwatch sw;
        Match match;
        int ctr;

        StreamReader inFile = new StreamReader(@".\Dreiser_TheFinancier.txt");
        string input = inFile.ReadToEnd();
        inFile.Close();

        // Read first ten sentences with interpreted regex.
        Console.WriteLine("10 Sentences with Interpreted Regex:");
        sw = Stopwatch.StartNew();
        Regex int10 = new Regex(pattern, RegexOptions.Singleline);
        match = int10.Match(input);
        for (ctr = 0; ctr ...

If backtracking is not needed, you can use the (?>subexpression) language element (called an atomic group) to disable it. The following example parses an input string by using two regular expressions. The first regular expression, \b\p{Lu}\w*\b, relies on backtracking. The second regular expression, \b\p{Lu}(?>\w*)\b, disables backtracking. As the sample output shows, both regular expressions produce the same result.
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string input = "This this word Sentence name Capital";

        // Pattern that relies on backtracking.
        string pattern = @"\b\p{Lu}\w*\b";
        foreach (Match match in Regex.Matches(input, pattern))
            Console.WriteLine(match.Value);

        Console.WriteLine();

        // Pattern that disables backtracking with an atomic group.
        pattern = @"\b\p{Lu}(?>\w*)\b";
        foreach (Match match in Regex.Matches(input, pattern))
            Console.WriteLine(match.Value);
    }
}
// The example displays the following output:
//       This
//       Sentence
//       Capital
//
//       This
//       Sentence
//       Capital
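The two patterns above happen to agree, but an atomic group is not always a drop-in replacement: because the engine cannot re-enter the group to try another alternative, some strings that match with backtracking no longer match. The sketch below uses a pair of small patterns to show the difference. The patterns and the class name AtomicGroupDemo are illustrative assumptions, not taken from the original article.

```csharp
using System.Text.RegularExpressions;

// Hypothetical illustration (not from the article) of how atomic groups change matching.
public static class AtomicGroupDemo
{
    // Ordinary group: after "bc" is tried and the trailing "c" fails,
    // the engine backtracks and tries the alternative "b".
    public const string Backtracking = @"^a(?:bc|b)c$";

    // Atomic group: once "bc" has matched inside the group, the engine
    // does not go back to try "b".
    public const string Atomic = @"^a(?>bc|b)c$";

    public static bool MatchesWithBacktracking(string input) =>
        Regex.IsMatch(input, Backtracking);

    public static bool MatchesWithAtomicGroup(string input) =>
        Regex.IsMatch(input, Atomic);
}
```

For the input "abc", the backtracking pattern succeeds and the atomic pattern fails. For "abcc", both succeed. It is therefore worth confirming on representative input that disabling backtracking does not change which strings a pattern accepts.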
In many cases, backtracking is essential for matching a regular expression pattern to input text. However, excessive backtracking can severely degrade performance and create the impression that an application has stopped responding. In particular, this happens when quantifiers are nested and the text that matches the outer subexpression is a subset of the text that matches the inner subexpression.
For example, the regular expression pattern ^[0-9A-Z]([-.\w]*[0-9A-Z])*\$$ is intended to match a part number that consists of at least one alphanumeric character. Any additional characters can be alphanumeric characters, hyphens, underscores, or periods, but the last character must be alphanumeric. A dollar sign terminates the part number. In some cases, this regular expression pattern can exhibit very poor performance because the quantifiers are nested and because the subexpression [0-9A-Z] is a subset of the subexpression [-.\w]*.
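To make the shape of the problem concrete, the sketch below places the nested pattern next to a variant that uses a single quantifier followed by a lookbehind assertion. The variant, the class name PartNumberCheck, the IgnoreCase option, and the one-second timeout are assumptions made for illustration. They are one plausible way to express the same rule, not necessarily the exact rewrite the original article had in mind.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical comparison (not from the article) of two part-number patterns.
public static class PartNumberCheck
{
    // Pattern from the text: a quantified group that itself contains a quantifier.
    public const string Nested = @"^[0-9A-Z]([-.\w]*[0-9A-Z])*\$$";

    // Assumed variant: one quantifier, then a zero-width lookbehind that requires
    // the character just before the dollar sign to be alphanumeric.
    public const string Lookbehind = @"^[0-9A-Z][-.\w]*(?<=[0-9A-Z])\$$";

    public static bool IsValid(string partNumber, string pattern) =>
        Regex.IsMatch(partNumber, pattern, RegexOptions.IgnoreCase, TimeSpan.FromSeconds(1));
}
```

Both patterns accept "A1.B-2$" and reject "A1.B-$". The difference lies in how much work the engine does on long inputs that almost match, which is where the nested form can trigger the timeout.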
In these cases, you can optimize regular expression performance by removing the nested quantifiers and replacing the outer subexpression with a zero-width lookahead or lookbehind assertion. Lookahead and lookbehind assertions are anchors; they do not move the pointer in the input string, but instead look ahead or behind to check whether a specified condition is met. For example, you can rewrite the part number regular expression as ^[0-9A-Z][-.\w]*(? ...
(?<!subexpression): Zero-width negative lookbehind. Looks behind the current position to determine whether subexpression does not match the input string.
Use timeout values
If your regular expressions process input that nearly matches the regular expression pattern, they can often rely on excessive backtracking, which significantly affects performance. In addition to carefully considering your use of backtracking and testing the regular expression against near-matching input, you should always set a timeout value to minimize the impact of excessive backtracking, if it occurs.
The regular expression timeout interval defines the period of time that the regular expression engine will look for a single match before it times out. The default timeout interval is Regex.InfiniteMatchTimeout, which means that the regular expression does not time out. You can override this value and define a timeout interval as follows:
Provide a timeout value when you instantiate a Regex object by calling the Regex(String, RegexOptions, TimeSpan) constructor.
Call a static pattern-matching method that has a matchTimeout parameter, such as Regex.Match(String, String, RegexOptions, TimeSpan) or Regex.Replace(String, String, String, RegexOptions, TimeSpan).
For compiled regular expressions created by calling the Regex.CompileToAssembly method, you can call the constructor with parameters of type TimeSpan.
If a timeout interval is defined and no match is found at the end of the interval, the regular expression method throws a RegexMatchTimeoutException exception. In the exception handler, you can choose to use a longer timeout interval to retry the match, abandon the match attempt and assume that there is no match, or abandon the match attempt and record the exception information for future analysis.
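The second of the options listed above, combined with the "abandon the match" choice, can be sketched as follows. The class name TimedMatch, the method IsMatchOrNull, and the use of a nullable result to signal a timeout are assumptions made for illustration. The Regex.IsMatch overload that takes a TimeSpan and the RegexMatchTimeoutException type are the standard ones.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper (not from the article) around a static call with a timeout.
public static class TimedMatch
{
    // Returns true or false when the match attempt completes, or null when
    // the engine exceeded the timeout and the attempt was abandoned.
    public static bool? IsMatchOrNull(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // The caller decides what a timeout means: retry with a longer
            // interval, log the input for later analysis, or treat it as no match.
            return null;
        }
    }
}
```

Returning null rather than false keeps "did not match" distinct from "gave up", which matters when timeouts are logged for later analysis.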
The following example defines a GetWordData method that instantiates a regular expression with a timeout interval of 350 milliseconds to calculate the number of words and the average number of characters per word in a text file. If the matching operation times out, the timeout interval is increased by 350 milliseconds and the Regex object is re-instantiated. If the new timeout interval exceeds 1 second, the method rethrows the exception to the caller.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        RegexUtilities util = new RegexUtilities();
        string title = "Doyle-The Hound of the Baskervilles.txt";
        try
        {
            var info = util.GetWordData(title);
            Console.WriteLine("Words: {0:N0}", info.Item1);
            Console.WriteLine("Average Word Length: {0:N2} characters", info.Item2);
        }
        catch (IOException e)
        {
            Console.WriteLine("IOException reading file '{0}'", title);
            Console.WriteLine(e.Message);
        }
        catch (RegexMatchTimeoutException e)
        {
            Console.WriteLine("The operation timed out after {0:N0} milliseconds",
                              e.MatchTimeout.TotalMilliseconds);
        }
    }
}

public class RegexUtilities
{
    public Tuple<int, double> GetWordData(string filename)
    {
        const int MAX_TIMEOUT = 1000;   // Maximum timeout interval in milliseconds.
        const int INCREMENT = 350;      // Milliseconds increment of timeout.

        List<string> exclusions = new List<string>(new string[] { "a", "an", "the" });
        int[] wordLengths = new int[29];   // Allocate an array of more than ample size.
        string input = null;
        StreamReader sr = null;
        try
        {
            sr = new StreamReader(filename);
            input = sr.ReadToEnd();
        }
        catch (FileNotFoundException e)
        {
            string msg = String.Format("Unable to find the file '{0}'", filename);
            throw new IOException(msg, e);
        }
        catch (IOException e)
        {
            throw new IOException(e.Message, e);
        }
        finally
        {
            if (sr != null) sr.Close();
        }

        int timeoutInterval = INCREMENT;
        bool init = false;
        Regex rgx = null;
        Match m = null;
        int indexPos = 0;
        do
        {
            try
            {
                if (!init)
                {
                    // (Re)create the Regex with the current timeout and resume
                    // matching from the last known position.
                    rgx = new Regex(@"\b\w+\b", RegexOptions.None,
                                    TimeSpan.FromMilliseconds(timeoutInterval));
                    m = rgx.Match(input, indexPos);
                    init = true;
                }
                else
                {
                    m = m.NextMatch();
                }
                if (m.Success)
                {
                    if (!exclusions.Contains(m.Value.ToLower()))
                        wordLengths[m.Value.Length]++;

                    indexPos += m.Length + 1;
                }
            }
            catch (RegexMatchTimeoutException e)
            {
                if (e.MatchTimeout.TotalMilliseconds < MAX_TIMEOUT)
                {
                    timeoutInterval += INCREMENT;
                    init = false;
                }
                else
                {
                    // Rethrow the exception.
                    throw;
                }
            }
        } while (m.Success);

        // If regex completed successfully, calculate number of words and average length.
        int nWords = 0;
        long totalLength = 0;

        for (int ctr = wordLengths.GetLowerBound(0); ctr ...