How to use regular expressions in Java

2025-01-20 Update From: SLTechnology News&Howtos

This article shares how regular expressions are used in Java. The editor finds it very practical and shares it here for study; hopefully you will take something away from it.

Regular expressions are generally used for string matching, string searching and string replacement. Do not underestimate them: using regular expressions flexibly when processing strings can greatly improve efficiency at work and in study.

Being handed a pile of matching rules all at once can be intimidating, so the use of regular expressions is explained below step by step, from simple to complex.

Understanding regular expression matching from simple examples

Code first.

    public class Demo1 {
        public static void main(String[] args) {
            // The string "abc" is matched against the regular expression "...",
            // where each "." stands for one character, so "..." stands for three characters
            System.out.println("abc".matches("..."));
            System.out.println("abcd".matches("..."));
        }
    }
    // Output:
    // true
    // false

The String class has a matches(String regex) method that returns a boolean telling whether the string matches the given regular expression.

In this example the regular expression is "...". Each "." stands for one character, so the whole expression means exactly three characters. The result is therefore true for "abc" and false for "abcd".
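The point that matches() tests the whole string, and that "." stands for exactly one character, can be checked with a few more inputs. The following is a minimal sketch; the class name DotSketch and the helper isThreeChars are illustrative and not part of the original article.

```java
class DotSketch {
    // True only when the entire string consists of exactly three characters
    // that "." accepts (by default "." does not match a line terminator such as \n).
    static boolean isThreeChars(String s) {
        return s.matches("...");
    }

    public static void main(String[] args) {
        System.out.println(isThreeChars("abc"));   // true
        System.out.println(isThreeChars("a c"));   // true: a space is a character too
        System.out.println(isThreeChars("ab"));    // false: too short
        System.out.println(isThreeChars("abcd"));  // false: matches() needs the whole string to fit
        System.out.println(isThreeChars("a\nc"));  // false: "." skips the newline by default
    }
}
```

The last case is the one hinted at by the phrase "may or may not match line terminators" in the Pattern documentation.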

Support for regular expressions in Java (corresponding implementations in various languages)

Under the java.util.regex package there are two main classes for regular expressions: Matcher and Pattern. The following example follows the typical use of these two classes given in the official Java documentation:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Demo2 {
        public static void main(String[] args) {
            // [a-z] is any lowercase letter, {3} means exactly three of them
            Pattern p = Pattern.compile("[a-z]{3}");
            Matcher m = p.matcher("abc");
            System.out.println(m.matches());
        }
    }
    // Output:
    // true

Delving into the principles behind regular expressions leads to automata theory from compiler construction, which is not covered here. To keep things easy to follow, more figurative language is used instead.

Pattern can be understood as a pattern that a string has to match. For example, in Demo2 the pattern is defined as a string of length 3 in which each character must be one of a-z.

Note that a Pattern object is created by calling the compile method of the Pattern class, that is, the regular expression passed in is compiled into a pattern object. Reusing the compiled pattern object avoids recompiling the expression on every match, and because the object is immutable, it can safely be used by multiple threads concurrently.

Matcher can be understood as the result of matching a pattern against a string. Matching a string against a pattern may produce many results, which is explained in a later example.

Finally, calling m.matches() returns whether the entire string matches the pattern.

The three lines above can be reduced to one line of code:

    System.out.println("abc".matches("[a-z]{3}"));

But if a regular expression needs to be matched repeatedly, this form is less efficient, because the expression is compiled again on every call.
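One common way to avoid the repeated compilation is to keep the compiled Pattern in a constant and create a new Matcher for each input. The sketch below assumes a hypothetical helper countMatches and class name ReuseSketch; only Pattern.compile, matcher() and matches() come from the article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ReuseSketch {
    // Compiled once. Pattern objects are immutable and can be shared between threads.
    private static final Pattern THREE_LOWER = Pattern.compile("[a-z]{3}");

    // Counts how many inputs consist of exactly three lowercase letters.
    static long countMatches(List<String> inputs) {
        long count = 0;
        for (String s : inputs) {
            // A Matcher holds per-match state, so a fresh one is created for each input.
            if (THREE_LOWER.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(Arrays.asList("abc", "abcd", "xyz", "A1c"))); // 2
    }
}
```

For a single check, "abc".matches("[a-z]{3}") remains the shortest form; the constant only pays off when the same expression is applied many times.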

A preliminary understanding of . + * ?

Before the introduction, note that the specific meanings of regular expression symbols do not need to be memorized. The meaning of each symbol is defined in detail in the Pattern class description in the official Java documentation, and online. Being familiar with them is of course even better.

    public class Demo3 {
        /**
         * To avoid writing out a print statement each time, the output statement is wrapped here.
         * @param o
         */
        private static void p(Object o) {
            System.out.println(o);
        }

        /**
         * .       Any character (may or may not match line terminators)
         * X?      X, once or not at all
         * X*      X, zero or more times
         * X+      X, one or more times
         * X{n}    X, exactly n times
         * X{n,}   X, at least n times
         * X{n,m}  X, at least n but not more than m times
         * @param args
         */
        public static void main(String[] args) {
            p("a".matches("."));
            p("aa".matches("aa"));
            p("aaaa".matches("a*"));
            p("aaaa".matches("a+"));
            p("".matches("a*"));
            p("a".matches("a?"));

            // \d  A digit: [0-9]. In a Java string "\" must itself be escaped with "\", hence "\\d"
            p("2345".matches("\\d{2,5}"));
            // \\. matches a literal "."
            p("192.168.0.123".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"));
            // [0-2] means the character must be a digit from 0 to 2
            p("192".matches("[0-2][0-9][0-9]"));
        }
    }
    // Output: all true

[] is used to describe a range of characters. Here are some examples:

    public class Demo4 {
        private static void p(Object o) {
            System.out.println(o);
        }

        public static void main(String[] args) {
            // [abc] matches one of the letters a, b, c
            p("a".matches("[abc]"));
            // [^abc] matches any character other than a, b, c
            p("1".matches("[^abc]"));
            // a-z or A-Z; the following three are all ways of writing "or"
            p("A".matches("[a-zA-Z]"));
            // note: inside [] the | is a literal character, so this form also accepts "|"
            p("A".matches("[a-z|A-Z]"));
            p("A".matches("[a-z[A-Z]]"));
            // [A-Z&&[REQ]] matches a character that is in A-Z and is also one of R, E, Q
            p("R".matches("[A-Z&&[REQ]]"));
        }
    }
    // Output: all true

Getting to know \s \w \d

These are the regular expressions for whitespace, letters and digits, the characters used most frequently in programming.

About \

Here we focus on the part that is hardest to understand. In a Java string literal, a special character must be escaped by putting \ in front of it.

For example, consider the string: The teacher said loudly: "Students, hand in your homework!" Without an escape character, the opening double quotation mark of the literal would end at the quotation mark after the colon, but we need double quotation marks inside the string, so escape characters are required.

With escape characters the literal is written "The teacher said loudly: \"Students, hand in your homework!\"", so that the original intention is recognized correctly.

By the same token, to use \ itself in a string it must also be preceded by a \, so it is written as "\\" in the string literal.

So how is \ matched in a regular expression? The answer is "\\\\".

Consider the two halves separately. In a regular expression, \ is also the escape character, so matching a literal \ takes the regular expression \\. Each of those two backslashes must in turn be written as \\ in a Java string literal: the first "\\" supplies the regular expression's escape character \, and the second "\\" supplies the \ being matched. Combined, "\\\\" represents a literal \ in the regular expression.

If that feels a little twisted, take a look at the following code.

    public class Demo5 {
        private static void p(Object o) {
            System.out.println(o);
        }

        public static void main(String[] args) {
            /**
             * \d  A digit: [0-9]
             * \D  A non-digit: [^0-9]
             * \s  A whitespace character: [ \t\n\x0B\f\r]
             * \S  A non-whitespace character: [^\s]
             * \w  A word character: [a-zA-Z_0-9] (letters, digits and underscore)
             * \W  A non-word character: [^\w]
             */
            // \\s{4} matches four whitespace characters
            p(" \n\r\t".matches("\\s{4}"));
            // \\S matches a non-whitespace character
            p("a".matches("\\S"));
            // \\w{3} matches three characters that are letters, digits or underscores
            p("a_8".matches("\\w{3}"));
            p("abc888&^%".matches("[a-z]{1,3}\\d+[%^&*]+"));
            // match a single \
            p("\\".matches("\\\\"));
        }
    }
    // Output: all true
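The two layers of escaping become easier to see when the lengths of the strings are printed. The sketch below is illustrative only: the class name BackslashSketch and its helpers are assumptions, and Pattern.quote is a standard library method the article does not mention, shown here as one way to avoid hand-escaping literal text.

```java
import java.util.regex.Pattern;

class BackslashSketch {
    // Four backslashes in the source, two characters in memory,
    // which the regex engine reads as "one literal backslash".
    static final String BACKSLASH_REGEX = "\\\\";

    static boolean isSingleBackslash(String s) {
        return s.matches(BACKSLASH_REGEX);
    }

    // Pattern.quote wraps the text so that every character in it is matched literally.
    static boolean matchesLiterally(String input, String literalText) {
        return input.matches(Pattern.quote(literalText));
    }

    public static void main(String[] args) {
        System.out.println(BACKSLASH_REGEX.length());        // 2
        System.out.println("\\".length());                   // 1
        System.out.println(isSingleBackslash("\\"));         // true
        System.out.println(matchesLiterally("a.b", "a.b"));  // true
        System.out.println(matchesLiterally("axb", "a.b"));  // false: the dot is no longer a wildcard
    }
}
```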

Boundary treatment

^ inside square brackets negates the character class ([^...]); outside square brackets it denotes the beginning of a string.

    public class Demo6 {
        private static void p(Object o) {
            System.out.println(o);
        }

        public static void main(String[] args) {
            /**
             * ^   The beginning of a line
             * $   The end of a line
             * \b  A word boundary, for example next to a space or a newline
             */
            p("hello sir".matches("^h.*"));                // true
            p("hellosir".matches(".*r$"));                 // true
            p("hello sir".matches("^h[a-z]{1,3}o\\b.*"));  // true: there is a word boundary after "hello"
            p("hellosir".matches("^h[a-z]{1,3}o\\b.*"));   // false: no word boundary between "o" and "s"
        }
    }

Exercise: match blank lines and email addresses

Given an article, how do you determine how many blank lines it contains? Regular expressions make this easy to match. Note that a blank line may contain spaces, tabs and so on.

    p(" \n".matches("^[\\s&&[^\\n]]*\\n$"));

Explanation: ^[\\s&&[^\\n]]* means any number of whitespace characters other than the newline character, and \\n$ means the line ends with a newline character.
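To actually count the blank lines of a text, the same character class can be applied line by line. This is a minimal sketch under stated assumptions: the class name BlankLineSketch and the method countBlankLines are hypothetical, and java.util.Scanner is used to split the text into lines, which removes the trailing newline so the pattern no longer needs \\n at the end.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class BlankLineSketch {
    // Whitespace other than a newline, repeated zero or more times.
    private static final Pattern BLANK = Pattern.compile("[\\s&&[^\\n]]*");

    static int countBlankLines(String text) {
        int count = 0;
        Scanner scanner = new Scanner(text);
        while (scanner.hasNextLine()) {
            // nextLine() returns the line without its line terminator
            String line = scanner.nextLine();
            if (BLANK.matcher(line).matches()) {
                count++;
            }
        }
        scanner.close();
        return count;
    }

    public static void main(String[] args) {
        // Lines: "first", "", two spaces plus a tab, "second"
        System.out.println(countBlankLines("first\n\n  \t\nsecond\n")); // 2
    }
}
```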

Here is the email address match.

    p("liuyj24@126.com".matches("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+"));

Explanation: [\\w[.-]]+ is one or more characters, each of which is a letter, digit, underscore, "." or "-"; @ is a literal @ symbol; then comes [\\w[.-]]+ again; then \\. matches a literal "."; and finally [\\w]+.
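It can help to see which inputs this expression accepts and rejects. The sketch below reuses the article's pattern unchanged; the class name EmailSketch and the method looksLikeEmail are illustrative. The last case is included to show that the pattern is a loose check and not a full validator of email syntax.

```java
import java.util.regex.Pattern;

class EmailSketch {
    // The article's pattern, compiled once.
    private static final Pattern EMAIL =
            Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");

    static boolean looksLikeEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("liuyj24@126.com"));        // true
        System.out.println(looksLikeEmail("first.last@example.org")); // true
        System.out.println(looksLikeEmail("no-at-sign.example.org")); // false: no @
        System.out.println(looksLikeEmail("user@localhost"));         // false: no "." after the @
        System.out.println(looksLikeEmail("..@-.a"));                 // true: the pattern is permissive
    }
}
```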

matches(), find() and lookingAt() of the Matcher class

The matches() method matches the entire string against the pattern.

find() matches starting from the current position. If find() is the first call after a string is passed in, the current position is the beginning of the string. For a detailed look at the current position, see the code example below.

The lookingAt() method always matches from the beginning of the string.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Demo8 {
        private static void p(Object o) {
            System.out.println(o);
        }

        public static void main(String[] args) {
            Pattern pattern = Pattern.compile("\\d{3,5}");
            String s = "123-34345-234-00";
            Matcher m = pattern.matcher(s);

            // First demonstrate matches(), which matches against the entire string
            p(m.matches()); // false: the whole string is clearly not just 3 to 5 digits

            // Then demonstrate find(). First use reset() to move the current position
            // back to the beginning of the string
            m.reset();
            p(m.find()); // true, matches 123
            p(m.find()); // true, matches 34345
            p(m.find()); // true, matches 234
            p(m.find()); // false, 00 does not match

            // Now call matches() and then find() without a reset() in between,
            // to see how the current position changes
            m.reset();      // reset first
            p(m.matches()); // false: the whole string does not match, and the current position has moved to the first "-"
            p(m.find());    // true, matches 34345
            p(m.find());    // true, matches 234
            p(m.find());    // false, 00 does not match
            p(m.find());    // false, nothing left to match

            // Demonstrate lookingAt(), which searches from the beginning
            p(m.lookingAt()); // true, finds 123
        }
    }

start() and end() in the Matcher class

If a match succeeds, start () is used to return the position where the match begins, and end () is used to return the position after the end character of the match.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Demo9 {
        private static void p(Object o) {
            System.out.println(o);
        }

        public static void main(String[] args) {
            Pattern pattern = Pattern.compile("\\d{3,5}");
            String s = "123-34345-234-00";
            Matcher m = pattern.matcher(s);

            p(m.find()); // true, matches 123
            p("start: " + m.start() + " - end:" + m.end());
            p(m.find()); // true, matches 34345
            p("start: " + m.start() + " - end:" + m.end());
            p(m.find()); // true, matches 234
            p("start: " + m.start() + " - end:" + m.end());
            p(m.find()); // false, 00 does not match
            try {
                // start() and end() throw when the last match attempt failed
                p("start: " + m.start() + " - end:" + m.end());
            } catch (Exception e) {
                System.out.println("wrong...");
            }
            p(m.lookingAt());
            p("start: " + m.start() + " - end:" + m.end());
        }
    }
    // Output:
    // true
    // start: 0 - end:3
    // true
    // start: 4 - end:9
    // true
    // start: 10 - end:13
    // false
    // wrong...
    // true
    // start: 0 - end:3
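Because end() is the index just past the last matched character, the pair (start(), end()) can be passed directly to String.substring to recover the matched text. The sketch below illustrates this with the same input; the class name SpanSketch and the method spans are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanSketch {
    // Returns every match as "start-end:text".
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // substring(start, end) excludes the character at index end,
            // which is exactly the convention start()/end() use
            String text = input.substring(m.start(), m.end());
            result.add(m.start() + "-" + m.end() + ":" + text);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spans("\\d{3,5}", "123-34345-234-00"));
        // prints [0-3:123, 4-9:34345, 10-13:234]
    }
}
```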

Replacement string

To replace part of a string, the part to be replaced must be found first. This introduces another method of the Matcher class, group(), which returns the matched string.

Let's look at an example of converting java in a string to uppercase.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Demo10 {
        private static void p(Object o) {
            System.out.println(o);
        }

        public static void main(String[] args) {
            Pattern p = Pattern.compile("java");
            Matcher m = p.matcher("java Java JAVA JAva I love Java and you");
            p(m.replaceAll("JAVA")); // replaceAll() replaces every matching substring
        }
    }
    // Output:
    // JAVA Java JAVA JAva I love Java and you

Upgrade: case-insensitive find and replace

To ignore case when matching, specify case insensitivity when creating the pattern:

    public static void main(String[] args) {
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE); // case-insensitive
        Matcher m = p.matcher("java Java JAVA JAva I love Java and you");
        p(m.replaceAll("JAVA"));
    }
    // Output:
    // JAVA JAVA JAVA JAVA I love JAVA and you

Upgrade again: case-insensitive, replacing specific matches

Here is a demonstration that converts the odd-numbered matches to uppercase and the even-numbered matches to lowercase.

This introduces a powerful method of the Matcher class, appendReplacement(StringBuffer sb, String replacement), which takes a StringBuffer that is used to build up the resulting string.

    public static void main(String[] args) {
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java JAVA JAva I love Java and you?");
        StringBuffer sb = new StringBuffer();
        int index = 1;
        while (m.find()) {
            // A more concise way to write the branch below:
            // m.appendReplacement(sb, (index++ & 1) == 0 ? "java" : "JAVA");
            if ((index & 1) == 0) { // even
                m.appendReplacement(sb, "java");
            } else {
                m.appendReplacement(sb, "JAVA");
            }
            index++;
        }
        m.appendTail(sb); // append the rest of the string
        p(sb);
    }
    // Output:
    // JAVA java JAVA java I love JAVA and you?

Grouping

Starting with a question, take a look at the following code:

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d{3,5}[a-z]{2}");
        String s = "123aa-5423zx-642oi-00";
        Matcher m = p.matcher(s);
        while (m.find()) {
            p(m.group());
        }
    }
    // Output:
    // 123aa
    // 5423zx
    // 642oi

The regular expression "\\d{3,5}[a-z]{2}" means three to five digits followed by two letters; the code prints each matching string.

What if you only want to print the digits in each matching string?

You might first think of running another match on each matched string, but that is too cumbersome. The grouping mechanism lets us form groups inside the regular expression itself.

Parentheses () are used for grouping. Here the digits and the letters are each put in a group: "(\\d{3,5})([a-z]{2})".

Then pass the group number when calling the m.group(int group) method.

Note that group numbers start at 0, and group 0 represents the entire regular expression. After 0, each left parenthesis in the regular expression, from left to right, starts a group. In this expression, group 1 is the digits and group 2 is the letters.
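The numbering rule can be checked by listing every group of a match. The following sketch is illustrative: the class name GroupSketch and the method groupsOfFirstMatch are assumptions, while groupCount() and group(int) are standard Matcher methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSketch {
    // Returns group 0, 1, 2, ... of the first match, or an empty list if nothing matches.
    static List<String> groupsOfFirstMatch(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            // groupCount() does not include group 0, hence <=
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two groups, numbered by their left parentheses
        System.out.println(groupsOfFirstMatch("(\\d{3,5})([a-z]{2})", "123aa-5423zx"));
        // prints [123aa, 123, aa]

        // An extra outer pair becomes group 1 and shifts the inner groups to 2 and 3
        System.out.println(groupsOfFirstMatch("((\\d{3,5})([a-z]{2}))", "123aa-5423zx"));
        // prints [123aa, 123aa, 123, aa]
    }
}
```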

    public static void main(String[] args) {
        // The regular expression is 3 to 5 digits followed by two letters
        Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
        String s = "123aa-5423zx-642oi-00";
        Matcher m = p.matcher(s);
        while (m.find()) {
            p(m.group(1));
        }
    }
    // Output:
    // 123
    // 5423
    // 642

Practice 1: grab the email addresses in a web page (crawler)

Suppose we have some high-quality resources on hand and want to share them with other users, so we publish a post on a forum asking people to leave their email addresses. Unexpectedly, the response is enthusiastic and nearly a hundred addresses are left. Copying and sending to each one by hand is too tiring, so we consider doing it with a program.

Sending the emails is not covered here; the focus is on using the regular expressions learned so far to extract all email addresses from the web page.

First, get the HTML of a post: pick any one, open it, and right-click in the browser to save the HTML file.

Let's take a look at the code:

    import java.io.BufferedReader;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Demo12 {
        public static void main(String[] args) {
            BufferedReader br = null;
            try {
                br = new BufferedReader(new FileReader("C:\\emailTest.html"));
                String line = "";
                while ((line = br.readLine()) != null) { // read every line of the file
                    parse(line); // extract the email addresses
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if (br != null) {
                    try {
                        br.close();
                        br = null;
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        private static void parse(String line) {
            Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
            Matcher m = p.matcher(line);
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }
    // Output:
    // 2819531636@qq.com
    // 2819531636@qq.com
    // 2405059759@qq.com
    // 2405059759@qq.com
    // 1013376804@qq.com
    // ...

Practice 2: a small code statistics program

The last practical case: count how many lines of code, comment lines and blank lines a project contains. You might as well run it on projects you have written and discover that, without noticing, you have already written thousands of lines of code.

I picked a project on GitHub, a small project written in pure Java, which makes the statistics easy.

Below is the code. Only the blank-line check uses a regular expression; code lines and comment lines are detected with the String class API.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;

    public class Demo13 {
        private static long codeLines = 0;
        private static long commentLines = 0;
        private static long whiteLines = 0;
        private static String filePath = "C:\\TankOnline";

        public static void main(String[] args) {
            process(filePath);
            System.out.println("codeLines: " + codeLines);
            System.out.println("commentLines: " + commentLines);
            System.out.println("whiteLines: " + whiteLines);
        }

        /**
         * Recursively look for files
         * @param pathStr
         */
        public static void process(String pathStr) {
            File file = new File(pathStr);
            if (file.isDirectory()) { // a folder: search it recursively
                File[] fileList = file.listFiles();
                for (File f : fileList) {
                    String fPath = f.getAbsolutePath();
                    process(fPath);
                }
            } else if (file.isFile()) { // a file: check whether it is a .java file
                if (file.getName().matches(".*\\.java$")) {
                    parse(file);
                }
            }
        }

        private static void parse(File file) {
            BufferedReader br = null;
            try {
                br = new BufferedReader(new FileReader(file));
                String line = "";
                while ((line = br.readLine()) != null) {
                    line = line.trim(); // strip whitespace at both ends of the line
                    // Note: the pattern does not end with \n, because br.readLine() removes it
                    if (line.matches("^[\\s&&[^\\n]]*$")) {
                        whiteLines++;
                    } else if (line.startsWith("/*") || line.startsWith("*") || line.endsWith("*/")) {
                        commentLines++;
                    } else if (line.startsWith("//") || line.contains("//")) {
                        commentLines++;
                    } else {
                        // import and package statements are not counted
                        if (line.startsWith("import") || line.startsWith("package")) {
                            continue;
                        }
                        codeLines++;
                    }
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if (null != br) {
                    try {
                        br.close();
                        br = null;
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }
    // Output:
    // codeLines: 1139
    // commentLines: 124
    // whiteLines: 172

Greedy mode and non-greedy mode

After two practical exercises, you should have a grasp of the basic use of regular expressions. Greedy and non-greedy modes are introduced next. Looking at the official API, the Pattern class gives the following definitions:

Greedy quantifiers (greedy mode)

    X?        X, once or not at all
    X*        X, zero or more times
    X+        X, one or more times
    X{n}      X, exactly n times
    X{n,}     X, at least n times
    X{n,m}    X, at least n but not more than m times

Reluctant quantifiers (non-greedy mode)

    X??       X, once or not at all
    X*?       X, zero or more times
    X+?       X, one or more times
    X{n}?     X, exactly n times
    X{n,}?    X, at least n times
    X{n,m}?   X, at least n but not more than m times

Possessive quantifiers (exclusive mode)

    X?+       X, once or not at all
    X*+       X, zero or more times
    X++       X, one or more times
    X{n}+     X, exactly n times
    X{n,}+    X, at least n times
    X{n,m}+   X, at least n but not more than m times

The three modes accept the same numbers of repetitions; they differ in how the matcher tries them. All the earlier examples used greedy mode. So how do the other two modes differ? The following code examples explain.

    public static void main(String[] args) {
        Pattern p = Pattern.compile(".{3,10}[0-9]");
        String s = "aaaa5bbbb6"; // 10 characters
        Matcher m = p.matcher(s);
        if (m.find()) {
            System.out.println(m.start() + "-" + m.end());
        } else {
            System.out.println("not match!");
        }
    }
    // Output:
    // 0-10

The regular expression means 3 to 10 characters followed by a digit. In greedy mode the matcher first swallows all 10 characters, then checks whether the next character is a digit and finds there are no characters left. It therefore spits one character back out and tries the digit again; this time the match succeeds, giving 0-10.

The following demonstrates non-greedy (reluctant) mode.

    public static void main(String[] args) {
        Pattern p = Pattern.compile(".{3,10}?[0-9]"); // one extra ?
        String s = "aaaa5bbbb6";
        Matcher m = p.matcher(s);
        if (m.find()) {
            System.out.println(m.start() + "-" + m.end());
        } else {
            System.out.println("not match!");
        }
    }
    // Output:
    // 0-5

In non-greedy mode the matcher first swallows only 3 characters (the minimum) and checks whether the next character is a digit. It is not, so one more character is swallowed and the check is repeated; this time the next character is a digit, so the output is 0-5.

Finally, possessive (exclusive) mode is demonstrated. It is usually used only in pursuit of efficiency and is rarely needed.

    public static void main(String[] args) {
        Pattern p = Pattern.compile(".{3,10}+[0-9]"); // one extra +
        String s = "aaaa5bbbb6";
        Matcher m = p.matcher(s);
        if (m.find()) {
            System.out.println(m.start() + "-" + m.end());
        } else {
            System.out.println("not match!");
        }
    }
    // Output:
    // not match!

Possessive mode swallows 10 characters at once and then checks whether the next character is a digit. Whether or not that check succeeds, it never spits characters back out, so here the match fails.
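The three modes can also be compared side by side with one small helper. This is a sketch under stated assumptions: the class name QuantifierSketch and the method firstSpan are hypothetical, and the tag-like input "<b>bold</b>" is an extra illustration that is not in the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
    // Returns "start-end" of the first match, or "not match!" if there is none.
    static String firstSpan(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + "-" + m.end() : "not match!";
    }

    public static void main(String[] args) {
        String s = "aaaa5bbbb6";
        System.out.println(firstSpan(".{3,10}[0-9]", s));  // 0-10 (greedy)
        System.out.println(firstSpan(".{3,10}?[0-9]", s)); // 0-5 (reluctant)
        System.out.println(firstSpan(".{3,10}+[0-9]", s)); // not match! (possessive)

        // A typical case where the difference matters: stopping at the first ">"
        String html = "<b>bold</b>";
        System.out.println(firstSpan("<.+>", html));  // 0-11: greedy runs to the last ">"
        System.out.println(firstSpan("<.+?>", html)); // 0-3: reluctant stops at the first ">"
    }
}
```

Reluctant quantifiers are the usual choice when the match should end at the nearest delimiter and not the farthest one.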

The above is how regular expressions are used in Java. Some of these points are likely to come up in everyday work, and hopefully this article helps you learn more about them.
