How to use the Pattern class in Java regular expressions

2025-01-18 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02

This article explains how to use the Pattern class in Java regular expressions. The approach described here is simple and practical, and each flag is illustrated with a short test program.

Preface

In Java, the java.util.regex package defines the classes used for regular expressions. The two main classes are Pattern and Matcher:

Pattern is the compiled form of a regular expression. It is created by compiling the expression string.

Matcher applies the regular expression held by a Pattern instance to a target string. It is the object that actually performs the search.

The package also defines an exception class, PatternSyntaxException, which is thrown when an illegal pattern is encountered.

Overview of Pattern

Declaration: public final class Pattern implements java.io.Serializable

The Pattern class is declared final, so it cannot be subclassed.

Meaning: a compiled representation of a regular expression.

Note: instances of this class are immutable and can be used safely by multiple concurrent threads. Instances of Matcher are not safe for such use.
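The following is a minimal sketch of how the three classes fit together, not code from the original article. The class name PatternBasics and its helper methods are hypothetical names chosen for illustration. It compiles one Pattern that could be shared between threads, creates a fresh Matcher for every call, and treats PatternSyntaxException as the signal that an expression is illegal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names are assumptions made for this sketch.
class PatternBasics {
    // Pattern is immutable, so a single compiled instance can be shared.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every run of digits found in the input.
    static List<String> findNumbers(String input) {
        List<String> result = new ArrayList<>();
        // Matcher holds mutable state, so each call creates its own instance.
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Returns true if the expression compiles, false if its syntax is illegal.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("a1b22c333"));
        System.out.println(isValidRegex("(\\d+")); // unclosed group
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression on every call, which is usually the main reason to hold a Pattern in a static field.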

Pattern match flags

The compile() method has an overload that takes an additional parameter controlling the matching behavior of the regular expression:

public static Pattern compile(String regex, int flags)

Possible flag values

Field: description

Pattern.UNIX_LINES: Unix lines mode. On most systems a line ends with \n, but on some systems, such as Windows, it ends with the combination \r\n. When this mode is enabled, only \n is recognized as a line terminator, which affects the behavior of ^, $ and the period (.). Unix lines mode can also be enabled through the embedded flag expression (?d).

Pattern.CASE_INSENSITIVE: Enables case-insensitive matching. By default, case-insensitive matching applies only to characters in the US-ASCII character set. To match Unicode characters regardless of case, combine UNICODE_CASE with this flag. Case-insensitive matching can also be enabled through the embedded flag expression (?i). Specifying this flag may have some impact on performance.

Pattern.COMMENTS: Permits whitespace and comments in the pattern. In this mode, whitespace in the expression (meaning literal spaces, tabs, carriage returns and so on, not the "\\s" escape) and comments (starting at # and running to the end of the line) are ignored. Comments mode can also be enabled through the embedded flag expression (?x).

Pattern.MULTILINE: Multiline mode. By default, the input is treated as a single line even if it contains line terminators, so ^ and $ match only at the beginning and the end of the entire input. When multiline mode is enabled, ^ also matches just after a line terminator and $ also matches just before one. Multiline mode can also be enabled through the embedded flag expression (?m).

Pattern.LITERAL: Enables literal parsing of the pattern. When this flag is specified, the string that defines the pattern is treated as a sequence of literal characters, and metacharacters or escape sequences in it have no special meaning. The flags CASE_INSENSITIVE and UNICODE_CASE still affect matching when used together with this flag; the other flags become superfluous. There is no embedded flag expression for enabling literal parsing.

Pattern.DOTALL: In this mode, the expression . matches any character, including a line terminator. By default, . does not match line terminators. This mode can also be enabled through the embedded flag expression (?s) (s is a mnemonic for "single-line" mode, which is what this mode is called in Perl).

Pattern.UNICODE_CASE: When this flag is specified together with CASE_INSENSITIVE, case-insensitive matching is applied to Unicode characters. By default, case-insensitive matching applies only to the US-ASCII character set. This mode can also be enabled through the embedded flag expression (?u). Specifying this flag may have an impact on performance.

Pattern.CANON_EQ: Enables canonical equivalence. Two characters are considered to match only if their full canonical decompositions are identical. For example, with this flag the expression a\u030A matches the string \u00E5. By default, canonical equivalence is not taken into account. Specifying this flag may have an impact on performance.
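The less common flags are easier to understand when seen in action. The sketch below is an illustration under stated assumptions rather than code from the original article: the class name LessCommonFlags and the helper matches() are hypothetical, and the accented letters are written as Unicode escapes so the source file encoding does not matter. A flags value of 0 means that no flag is set.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions made for this sketch.
class LessCommonFlags {
    // Compiles the expression with the given flags and tests the whole input.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // UNIX_LINES: a carriage return is no longer a line terminator,
        // so the period is allowed to match it.
        System.out.println(matches(".", 0, "\r"));
        System.out.println(matches(".", Pattern.UNIX_LINES, "\r"));

        // UNICODE_CASE: required for case-insensitive matching outside US-ASCII.
        String lower = "\u00e9"; // e with acute accent
        String upper = "\u00c9"; // E with acute accent
        System.out.println(matches(lower, Pattern.CASE_INSENSITIVE, upper));
        System.out.println(matches(lower,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upper));

        // CANON_EQ: "a" followed by a combining ring above is canonically
        // equivalent to the single precomposed character a-with-ring.
        System.out.println(matches("a\u030A", 0, "\u00E5"));
        System.out.println(matches("a\u030A", Pattern.CANON_EQ, "\u00E5"));
    }
}
```

In each pair, the first call is expected to print false and the second true, which shows that the flag alone changes the outcome.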

Of these flags, Pattern.CASE_INSENSITIVE, Pattern.MULTILINE and Pattern.COMMENTS are the most useful (Pattern.COMMENTS also helps to organize and document an expression). Note that most modes can also be enabled by inserting an embedded flag expression into the pattern itself; these expressions are given with each flag in the list above. Insert the flag expression at the point where you want the mode to take effect.
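The position of an embedded flag expression matters. As a small sketch (the class name EmbeddedFlagScope is a hypothetical name, not from the original article), the example below places (?i) at the start and in the middle of an expression; only the part of the expression after the flag is matched case-insensitively.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption made for this sketch.
class EmbeddedFlagScope {
    public static void main(String[] args) {
        // (?i) at the start: the whole expression ignores case.
        System.out.println(Pattern.matches("(?i)ab", "AB"));  // true

        // (?i) in the middle: only "b" ignores case, "a" stays case-sensitive.
        System.out.println(Pattern.matches("a(?i)b", "aB"));  // true
        System.out.println(Pattern.matches("a(?i)b", "AB"));  // false
    }
}
```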

You can combine several of these flags with the bitwise OR (|) operator.
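A minimal sketch of combining flags, assuming a hypothetical class name CombinedFlags and helper countMatches(): the expression ^hello is compiled with both CASE_INSENSITIVE and MULTILINE, and the same behavior is then requested with the embedded form (?im).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions made for this sketch.
class CombinedFlags {
    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hello world\nhello java\nHELLO regex";

        // Both flags are packed into one int bit mask.
        Pattern p = Pattern.compile("^hello",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        System.out.println(countMatches(p, text));                 // 3

        // flags() returns the bit mask, so single flags can be tested with &.
        System.out.println((p.flags() & Pattern.MULTILINE) != 0);  // true

        // The same two modes requested inside the expression.
        Pattern q = Pattern.compile("(?im)^hello");
        System.out.println(countMatches(q, text));                 // 3
    }
}
```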

Code samples

Multiline mode: Pattern.MULTILINE example

Testing confirms that without the MULTILINE flag, ^ and $ match only at the beginning and end of the input sequence; with the flag, they also match at line terminators inside the input sequence. The test code is as follows:

import java.util.regex.*;

/**
 * Multiline mode
 */
public class ReFlags_MULTILINE {
    public static void main(String[] args) {
        // Note the line terminators
        String str = "hello world\r\n" + "hello java\r\n" + "hello java";

        System.out.println("= match the beginning of the string (non-multiline mode) =");
        Pattern p = Pattern.compile("^hello");
        Matcher m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group() + " location: [" + m.start() + "," + m.end() + "]");
        }

        System.out.println("= match the beginning of the string (multiline mode) =");
        p = Pattern.compile("^hello", Pattern.MULTILINE);
        m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group() + " location: [" + m.start() + "," + m.end() + "]");
        }

        System.out.println("= match the end of the string (non-multiline mode) =");
        p = Pattern.compile("java$");
        m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group() + " location: [" + m.start() + "," + m.end() + "]");
        }

        System.out.println("= match the end of the string (multiline mode) =");
        p = Pattern.compile("java$", Pattern.MULTILINE);
        m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group() + " location: [" + m.start() + "," + m.end() + "]");
        }
    }
}

Output:

= match the beginning of the string (non-multiline mode) =
hello location: [0,5]
= match the beginning of the string (multiline mode) =
hello location: [0,5]
hello location: [13,18]
hello location: [25,30]
= match the end of the string (non-multiline mode) =
java location: [31,35]
= match the end of the string (multiline mode) =
java location: [19,23]
java location: [31,35]

Ignore case: Pattern.CASE_INSENSITIVE example

Sometimes you need a match that ignores case. This example matches Celsius and Fahrenheit temperatures, and with case ignored it accepts values ending in C, c, F or f.

import java.util.regex.Pattern;

public class ReFlags_CASE_INSENSITIVE {
    public static void main(String[] args) {
        System.out.println("= API ignores case =");
        // Optional sign, digits, optional fraction, optional whitespace, unit letter
        String tempRegex = "[+-]?(\\d)+(\\.(\\d)*)?(\\s)*[CF]";
        Pattern p = Pattern.compile(tempRegex, Pattern.CASE_INSENSITIVE);
        System.out.println("-3.33c " + p.matcher("-3.33c").matches());
        System.out.println("-3.33C " + p.matcher("-3.33C").matches());

        System.out.println("= API does not ignore case =");
        tempRegex = "[+-]?(\\d)+(\\.(\\d)*)?(\\s)*[CF]";
        p = Pattern.compile(tempRegex);
        System.out.println("-3.33c " + p.matcher("-3.33c").matches());
        System.out.println("-3.33C " + p.matcher("-3.33C").matches());

        System.out.println("= embedded flag ignores case =");
        // (?i) switches on case-insensitive matching for the unit letter
        tempRegex = "[+-]?(\\d)+(\\.(\\d)*)?(\\s)*(?i)[CF]";
        p = Pattern.compile(tempRegex);
        System.out.println("-3.33c " + p.matcher("-3.33c").matches());
        System.out.println("-3.33C " + p.matcher("-3.33C").matches());

        System.out.println("= no embedded flag, case not ignored =");
        tempRegex = "[+-]?(\\d)+(\\.(\\d)*)?(\\s)*[CF]";
        p = Pattern.compile(tempRegex);
        System.out.println("-3.33c " + p.matcher("-3.33c").matches());
        System.out.println("-3.33C " + p.matcher("-3.33C").matches());
    }
}

Output:

= API ignores case =
-3.33c true
-3.33C true
= API does not ignore case =
-3.33c false
-3.33C true
= embedded flag ignores case =
-3.33c true
-3.33C true
= no embedded flag, case not ignored =
-3.33c false
-3.33C true

Enable comments: Pattern.COMMENTS example

When comments mode is enabled, whitespace in the regular expression and everything from # to the end of the line are ignored.

import java.util.regex.Pattern;

public class ReFlags_COMMENTS {
    public static void main(String[] args) {
        System.out.println("= API enables comments =");
        String comments = "(\\d)+ # this is comments.";
        Pattern p = Pattern.compile(comments, Pattern.COMMENTS);
        System.out.println("1234 " + p.matcher("1234").matches());

        System.out.println("= API does not enable comments =");
        comments = "(\\d)+ # this is comments.";
        p = Pattern.compile(comments);
        System.out.println("1234 " + p.matcher("1234").matches());

        System.out.println("= embedded flag enables comments =");
        comments = "(?x)(\\d)+ # this is comments.";
        p = Pattern.compile(comments);
        System.out.println("1234 " + p.matcher("1234").matches());

        System.out.println("= no embedded flag, comments not enabled =");
        comments = "(\\d)+ # this is comments.";
        p = Pattern.compile(comments);
        System.out.println("1234 " + p.matcher("1234").matches());
    }
}

Output:

= API enables comments =
1234 true
= API does not enable comments =
1234 false
= embedded flag enables comments =
1234 true
= no embedded flag, comments not enabled =
1234 false

As you can see, the comment from # to the end of the line and the whitespace in front of it are ignored. The embedded flag expression that enables comments mode is (?x).
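Comments mode is most useful when a longer expression is split over several lines. The following sketch is an illustration only: the class name CommentedDatePattern and the year-month-day format are assumptions chosen for the example, not something the article prescribes. Each part of the expression sits on its own line with a comment that ends at the line break.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the date format is an assumption for illustration.
class CommentedDatePattern {
    // Whitespace and "# ..." comments are ignored because of Pattern.COMMENTS.
    // Each comment ends at the "\n" that closes its line.
    static final Pattern DATE = Pattern.compile(
            "(\\d{4})  # year\n"
          + "-         # separator\n"
          + "(\\d{2})  # month\n"
          + "-         # separator\n"
          + "(\\d{2})  # day\n",
            Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m = DATE.matcher("2025-01-18");
        if (m.matches()) {
            System.out.println(m.group(1) + " / " + m.group(2) + " / " + m.group(3));
        }

        // A plain space in the expression is dropped in comments mode,
        // so "a b" behaves like "ab".
        Pattern spaced = Pattern.compile("a b", Pattern.COMMENTS);
        System.out.println(spaced.matcher("ab").matches());   // true
        System.out.println(spaced.matcher("a b").matches());  // false
    }
}
```

Because unescaped whitespace is dropped, an expression that must match a real space in comments mode has to express it differently, for example with \\s.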

Enable dotall mode: Pattern.DOTALL example

By default, the period (.) matches any character except a line terminator. When dotall mode is enabled, the period also matches line terminators.

import java.util.regex.Pattern;

public class ReFlags_DOTALL {
    public static void main(String[] args) {
        System.out.println("= API enables DOTALL =");
        String dotall = "(.)*";
        Pattern p = Pattern.compile(dotall, Pattern.DOTALL);
        // "\\r\\n " prints the label \r\n; the input is a real CR LF pair
        System.out.println("\\r\\n " + p.matcher("\r\n").matches());

        System.out.println("= API does not enable DOTALL =");
        dotall = "(.)*";
        p = Pattern.compile(dotall);
        System.out.println("\\r\\n " + p.matcher("\r\n").matches());

        System.out.println("= embedded flag enables DOTALL =");
        dotall = "(?s)(.)*";
        p = Pattern.compile(dotall);
        System.out.println("\\r\\n " + p.matcher("\r\n").matches());

        System.out.println("= no embedded flag, DOTALL not enabled =");
        dotall = "(.)*";
        p = Pattern.compile(dotall);
        System.out.println("\\r\\n " + p.matcher("\r\n").matches());
    }
}

Output:

= API enables DOTALL =
\r\n true
= API does not enable DOTALL =
\r\n false
= embedded flag enables DOTALL =
\r\n true
= no embedded flag, DOTALL not enabled =
\r\n false

Literal mode: Pattern.LITERAL example

When this mode is enabled, all metacharacters and escape sequences in the pattern are treated as ordinary characters and have no special meaning.
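A typical use of literal mode is searching for text that happens to contain metacharacters. The sketch below is an illustration, with the class name LiteralVersusQuote and the sample string as assumptions. It also shows Pattern.quote(), a method of the same class that the article does not cover, as an alternative way to get a literal match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are assumptions.
class LiteralVersusQuote {
    // Returns true if the compiled pattern is found anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "1+1=2"; // "+" is normally a quantifier

        // As a regular expression this means: one or more '1', then "1=2".
        System.out.println(found(Pattern.compile(text), text));                   // false

        // LITERAL: every character of the pattern stands for itself.
        System.out.println(found(Pattern.compile(text, Pattern.LITERAL), text));  // true

        // Pattern.quote() wraps the text so it is matched literally.
        System.out.println(found(Pattern.compile(Pattern.quote(text)), text));    // true

        // CASE_INSENSITIVE still applies in literal mode; the period does not
        // act as a wildcard.
        Pattern p = Pattern.compile("a.b", Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
        System.out.println(p.matcher("A.B").matches());  // true
        System.out.println(p.matcher("AxB").matches());  // false
    }
}
```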

import java.util.regex.Pattern;

public class ReFlags_LITERAL {
    public static void main(String[] args) {
        // The pattern is the two literal characters \ and d
        System.out.println(Pattern.compile("\\d", Pattern.LITERAL).matcher("\\d").matches());  // true
        System.out.println(Pattern.compile("\\d", Pattern.LITERAL).matcher("2").matches());    // false
        System.out.println(Pattern.compile("(\\d)+", Pattern.LITERAL).matcher("1234").matches());  // false
        // Without LITERAL the same expression is interpreted as a regex
        System.out.println(Pattern.compile("(\\d)+").matcher("1234").matches());  // true
        System.out.println(Pattern.compile("(\\d){2,3}", Pattern.LITERAL).matcher("(\\d){2,3}").matches());  // true
    }
}

Appendix: greedy matching and lazy matching

Consider the expression a.*b, which matches the longest string that starts with a and ends with b. If you use it to search aabab, it matches the entire string aabab. This is called greedy matching.

Sometimes we need lazy matching, that is, matching as few characters as possible. Any quantifier can be made lazy by appending a question mark to it. For example, .*? matches any number of repetitions, but uses the fewest repetitions that still allow the overall match to succeed.

a.*?b matches the shortest string that starts with a and ends with b. If you apply it to aabab, it matches aab and then ab.
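The aabab example can be checked directly. This is a minimal sketch with a hypothetical class name GreedyVersusLazy and helper findAll(), which collects every match that find() reports for an expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions made for this sketch.
class GreedyVersusLazy {
    // Returns all successive matches of the expression in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Greedy: .* takes as much as possible, so there is one long match.
        System.out.println(findAll("a.*b", "aabab"));   // [aabab]

        // Lazy: .*? takes as little as possible, so there are two short matches.
        System.out.println(findAll("a.*?b", "aabab"));  // [aab, ab]
    }
}
```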

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    public static void main(String[] args) {
        String str = "Beijing(Haidian District)(Chaoyang District)";
        // Greedy .* followed by a lookahead for an opening parenthesis
        String patternStr = ".*(?=\\()";
        Pattern pattern = Pattern.compile(patternStr);
        Matcher matcher = pattern.matcher(str);
        if (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}

The output of the above code is: Beijing(Haidian District)

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyDemo {
    public static void main(String[] args) {
        String str = "Beijing(Haidian District)(Chaoyang District)";
        // Lazy .*? stops at the first position followed by an opening parenthesis
        String patternStr = ".*?(?=\\()";
        Pattern pattern = Pattern.compile(patternStr);
        Matcher matcher = pattern.matcher(str);
        if (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}

The output of the above code is: Beijing

At this point, you should have a better understanding of how to use the Pattern class in Java regular expressions. The best way to consolidate it is to run the examples and experiment with the flags yourself.
