How to use regular expressions in Java

2025-01-18 Update From: SLTechnology News&Howtos

This article introduces the use of regular expressions in Java and is intended as a practical reference for interested readers.

Starting with Sun's JDK 1.4, Java ships with a package that supports regular expressions. This article gives an overview of how to use the java.util.regex package.

A rough estimate is that every Linux user, except the most occasional, will run into regular expressions. Regular expressions are an extremely powerful and flexible tool for string pattern matching and substitution. In the Unix world they are subject to few restrictions and are very widely used.

Regular expression engines are built into many common Unix tools, including grep, awk, vi, and Emacs. Many widely used scripting languages also support regular expressions, such as Python, Tcl, JavaScript, and, most famously, Perl.

I was a Perl hacker a long time ago, and if you're like me, you depend heavily on having these powerful text-munging tools at hand. In recent years, like many other developers, I have paid more and more attention to Java.

As a development language, Java has a lot to recommend it, but for a long time it had no built-in support for regular expressions. Until recently, Java could use regular expressions only through third-party class libraries, and these were inconsistent, poorly compatible with one another, and poorly maintained. This shortcoming long made me hesitate to choose Java as my primary development tool.

You can imagine how happy I was to learn that Sun's JDK 1.4 includes java.util.regex, a fully open, built-in regular expression package! Oddly enough, I had to spend quite some time digging to find this hidden gem. I am surprised that such a big improvement to Java was not given more publicity.

Java has now jumped into the world of regular expressions with both feet. The java.util.regex package supports regular expressions well, and Java provides detailed documentation for it, so the mystery surrounding regex is slowly being lifted. It even has some regular expression constructs (most notably, perhaps, combined character classes) that are not found in Perl.

The regex package contains two classes, Pattern and Matcher. A Pattern object expresses the search pattern you want, and a Matcher object actually performs the search. The package also adds a new exception class, PatternSyntaxException, which is thrown when an illegal search pattern is encountered.
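As a minimal sketch of how these three classes fit together (the class name RegexBasics, the helper method names, and the sample strings are assumptions made for illustration, not part of the original article):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexBasics {
    // Returns true if the regex is found anywhere in the input.
    static boolean contains(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // expresses the search pattern
        Matcher matcher = pattern.matcher(input);   // performs the search
        return matcher.find();
    }

    // Returns true if the regex compiles, false if its syntax is illegal.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(contains("\\d+", "room 42"));  // true
        System.out.println(isValid("(unclosed"));         // false
    }
}
```

PatternSyntaxException is an unchecked exception, so catching it is optional; doing so is mainly useful when the pattern comes from user input.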

If you are already familiar with regular expressions, you will find using them from Java quite easy. To be fair, for Perl enthusiasts spoiled by Perl's one-line matching, matching and replacing with Java's regex package can be more cumbersome than what they are used to.

This article is not a complete tutorial on regular expressions. Readers who want to learn more about them should read Jeffrey Friedl's Mastering Regular Expressions, published by O'Reilly. What follows are some examples that show how to use regular expressions in Java and how to make that use easier.

Designing an expression that matches any phone number can be complicated, because phone numbers are written in many formats, so you have to choose a pattern that works well in practice. For example, many people consider (212) 555-1212, 212-555-1212, and 212 555 1212 to be equivalent.

First, let's construct a regular expression. For simplicity, it will recognize phone numbers in the following format: (nnn) nnn-nnnn.

The first step is to create a Pattern object to match the substring above. Once the program is working, you can generalize this object if necessary. A regular expression matching the above format can be written as (\d{3})\s\d{3}-\d{4}, where the \d character class matches any digit from 0 to 9, and the {3} repetition operator indicates three consecutive digits, so \d{3} is equivalent to \d\d\d. \s is another useful character class; it matches whitespace such as spaces, tabs, and newlines.

Isn't it easy? However, two more things must be done before this pattern can be used in a Java program. For the Java compiler, the character following a backslash (\) in a string literal has a special meaning, and a regex construct such as \d is not a valid Java escape sequence. To make sure a backslash is passed through intact to the Pattern object, write it as a double backslash (\\). In addition, parentheses have a special meaning in regular expressions (grouping); if you want them to be interpreted literally, precede each of them with a double backslash (\\) as well. The pattern then looks like this:

\\(\\d{3}\\)\\s\\d{3}-\\d{4}
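To see the effect of the doubled backslashes, one can print the string that the compiler actually hands to the regex engine. The following is only a small sketch; the class name EscapeDemo and the constant name are illustrative assumptions:

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // In Java source, "\\" denotes a single backslash character in the string.
    static final String PHONE_REGEX = "\\(\\d{3}\\)\\s\\d{3}-\\d{4}";

    public static void main(String[] args) {
        // Prints the regex as the engine sees it: \(\d{3}\)\s\d{3}-\d{4}
        System.out.println(PHONE_REGEX);
        // pattern() returns the string the Pattern was compiled from.
        System.out.println(Pattern.compile(PHONE_REGEX).pattern());
        // The whole input matches the (nnn) nnn-nnnn format.
        System.out.println(Pattern.matches(PHONE_REGEX, "(212) 555-1212")); // true
    }
}
```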

Now let's see how to use this regular expression in Java code. Keep in mind that to use the regular expression package you need to import it before your class definition, with a line like this:

import java.util.regex.*;

The following code reads a text file line by line and searches each line for a phone number; when a match is found, it is printed to the console.

// Also requires: import java.io.*;
BufferedReader in;
// Matches phone numbers of the form (nnn) nnn-nnnn
Pattern pattern = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
in = new BufferedReader(new FileReader("phone"));
String s;
while ((s = in.readLine()) != null) {
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        // Print the text matched on this line
        System.out.println(matcher.group());
    }
}
in.close();

For those who have used regular expressions in Python or JavaScript, this code will look familiar. In those languages, as here, a regular expression is explicitly compiled once and can then be used wherever you need it. This looks like more work than Perl's single-step matching, but it does not take much effort.

The find() method, as you might imagine, searches the target string for text that matches the regular expression, and the group() method returns a string containing the matched text. Note that the code above works only if each line contains at most one matching phone number. Java's regular expression package can certainly handle a line that contains several matches, but the aim of this article is to give simple examples that encourage readers to explore the package further, so that case is not discussed in depth here.
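For readers who want to handle several phone numbers on one line, a minimal sketch is to call find() in a loop, since each call resumes after the previous match. The class name AllPhoneNumbers, the method findAll, and the sample line are assumptions for illustration; the code also uses generics, so it needs a JDK newer than the 1.4 release discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllPhoneNumbers {
    // Same (nnn) nnn-nnnn pattern as in the article.
    private static final Pattern PHONE =
        Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Collects every phone number in the line, not just the first one.
    static List<String> findAll(String line) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PHONE.matcher(line);
        while (matcher.find()) {          // each call continues after the last match
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [(212) 555-1212, (646) 555-0199]
        System.out.println(findAll("Call (212) 555-1212 or (646) 555-0199 today."));
    }
}
```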

This is quite neat! Unfortunately, it is only the beginning of a phone number matcher. Clearly, two things can be improved. First, there may or may not be a space between the area code and the local number. We can match both cases by writing \s? in the regular expression, where the ? metacharacter indicates that the pattern allows 0 or 1 whitespace characters.

Second, between the first three digits and the last four digits of the local number there may be a space rather than a hyphen, or no separator at all, so that the seven digits run together. For these cases we can use (-| )?. This construct is an alternation: within the parentheses, the pipe character | matches either a hyphen or a space, and the trailing ? metacharacter allows for no separator at all.

Finally, the area code may not be enclosed in parentheses. We could simply append the ? metacharacter after each parenthesis, but that is not a good solution because it would also match unbalanced parentheses, such as "(555" or "555)". Instead, we can use another alternation to require that the area code has either both parentheses or neither: (\(\d{3}\)|\d{3}). If we replace the regular expression in the code above with this improved one, it becomes a much more useful phone number matcher:

Pattern pattern = Pattern.compile("(\\(\\d{3}\\)|\\d{3})\\s?\\d{3}(-| )?\\d{4}");

You can, of course, try to improve the code above further yourself.
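One way to explore the improved pattern is to run it against a handful of sample strings. The sketch below is an illustration only: the class name ImprovedPhoneMatcher and the sample inputs are assumptions. It uses matches() rather than find(), because find() would still locate a valid number inside a string with an unbalanced parenthesis such as "(212 555-1212".

```java
import java.util.regex.Pattern;

class ImprovedPhoneMatcher {
    private static final Pattern PHONE =
        Pattern.compile("(\\(\\d{3}\\)|\\d{3})\\s?\\d{3}(-| )?\\d{4}");

    // True only if the whole string is a phone number in an accepted format.
    static boolean isPhoneNumber(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "(212) 555-1212", "(212)555-1212", "212 555 1212",
            "2125551212", "(212 555-1212", "212-555-1212"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isPhoneNumber(s));
        }
    }
}
```

The last two samples are rejected: the first has an unbalanced parenthesis, and the second uses a hyphen after the area code, a format this pattern does not yet cover and a natural candidate for further improvement.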

Now look at the second example, which is adapted from Friedl. It checks a text file for doubled words, a mistake that often occurs in typesetting and that grammar checkers also have to deal with.

Like most things, matching a word can be done with several different regular expressions. Perhaps the most straightforward is \b\w+\b, which has the advantage of using only a few regex metacharacters. The \w metacharacter matches any word character (a letter, a digit, or an underscore). The + metacharacter means one or more of the preceding character, and the \b metacharacter matches a word boundary, such as the position next to a space or a punctuation mark (comma, period, and so on).

Now, how do we check whether a given word is repeated? To accomplish this task, we take advantage of the well-known backreference feature of regular expressions. As mentioned earlier, parentheses have several different uses in regular expressions. One of them is to form a group, which saves the text it matched so that it can be used later, even within the same pattern. A single regular expression can (and often does) contain more than one group. The text matched by the nth group is available through the backreference \n. Backreferences make it easy to search for doubled words: \b(\w+)\s+\1\b.

The parentheses form a group, which is the first (and only) group in this regular expression. The backreference \1 refers to whatever word was matched by \w+. Our regular expression therefore matches a word followed by one or more whitespace characters and then the same word again. Note that the trailing anchor (\b) is essential to prevent errors: we want to match "Paris in the the spring" but not "Java's regex package is the theme of this article", where "the" is followed by "theme". Written as a Java string literal, the regular expression is: Pattern pattern = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

The final modification is to make our matcher case-insensitive, so that it also catches cases such as "The the theme of this article is the Java's regex package." This is easy to do by using the static flag CASE_INSENSITIVE predefined in the Pattern class:

Pattern pattern = Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);
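The doubled-word pattern can be tried out on the sentences used in the text. This is a minimal sketch, not the original author's program: the class name DuplicateWords and the method findDoubled are assumptions, and it works on strings in memory rather than on a text file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateWords {
    // A word, whitespace, then the same word again (ignoring case).
    private static final Pattern DOUBLED =
        Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);

    // Returns the first word of every doubled pair found in the text.
    static List<String> findDoubled(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) {
            found.add(m.group(1));   // group 1 is the text captured by (\w+)
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDoubled("Paris in the the spring"));        // [the]
        System.out.println(findDoubled("The the theme of this article"));  // [The]
        System.out.println(findDoubled("regex is the theme of this article")); // []
    }
}
```

The third sentence produces no match because the trailing \b rejects "the" followed by "theme".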

Regular expressions are a rich and complex topic, and Java's implementation is extensive, so the regex package deserves thorough study; what we have covered here is just the tip of the iceberg. Even if you are new to regular expressions, you will soon find them powerful and flexible once you start using the regex package. If you are a seasoned regular expression hacker from Perl or another language, the regex package will let you settle comfortably into the world of Java and become a must-have tool at hand.

CharSequence

JDK 1.4 defines a new interface called CharSequence. It provides an abstraction of a character sequence, drawn from the String and StringBuffer classes:

interface CharSequence {
    char charAt(int i);
    int length();
    CharSequence subSequence(int start, int end);
    String toString();
}

String, StringBuffer, and CharBuffer have all been modified to implement this new CharSequence interface. Many regular expression operations take a CharSequence as an argument.
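Because Pattern.matcher() accepts a CharSequence, the same pattern can be applied to any of these types without converting them to String first. A small sketch, with the class name CharSequenceDemo and the helper hasDigits chosen only for illustration:

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

class CharSequenceDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Accepts any CharSequence: String, StringBuffer, CharBuffer, and so on.
    static boolean hasDigits(CharSequence text) {
        return DIGITS.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDigits("JDK 1.4"));                          // true
        System.out.println(hasDigits(new StringBuffer("no digits here"))); // false
        System.out.println(hasDigits(CharBuffer.wrap("version 5")));       // true
    }
}
```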

Pattern and Matcher

Let me give you an example. The following program tests whether regular expressions match a string. The first argument is the string to search, followed by one or more regular expressions. In a Unix/Linux environment, regular expressions on the command line must be enclosed in quotation marks.

import java.util.regex.*;

public class TestRegularExpression {
    public static void main(String[] args) {
        // The message strings were lost in the source; the wording here is illustrative.
        if (args.length < 2) {
            System.out.println("Usage:\n" +
                "java TestRegularExpression " +
                "characterSequence regularExpression+");
            System.exit(0);
        }
        System.out.println("Input: \"" + args[0] + "\"");
        // Try each regular expression against the first argument
        for (int i = 1; i < args.length; i++) {
            System.out.println("Regular expression: \"" + args[i] + "\"");
            Pattern p = Pattern.compile(args[i]);
            Matcher m = p.matcher(args[0]);
            while (m.find()) {
                System.out.println("Match \"" + m.group() + "\" at positions " +
                    m.start() + "-" + (m.end() - 1));
            }
        }
    }
}

Java's regular expressions are implemented by the Pattern and Matcher classes of java.util.regex. A Pattern object represents a compiled regular expression. The static compile() method compiles a string representing a regular expression into a Pattern object. As the program above shows, you get a Matcher object simply by passing a string to the Pattern's matcher() method. In addition, Pattern has a static method for quickly checking whether regex matches the whole of input:

static boolean matches(regex, input)

It also has a split() method that returns a String array, produced by splitting the string around matches of regex.
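Both convenience methods can be sketched in a few lines. The class name MatchesAndSplit, the helper names, and the sample strings below are assumptions for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MatchesAndSplit {
    // Pattern.matches() succeeds only when the regex covers the entire input.
    static boolean isAllDigits(String input) {
        return Pattern.matches("\\d+", input);
    }

    // split() cuts the input wherever the regex (a comma plus optional spaces) matches.
    static String[] splitList(String input) {
        return Pattern.compile(",\\s*").split(input);
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12345"));     // true
        System.out.println(isAllDigits("123 abc"));   // false
        System.out.println(Arrays.toString(splitList("red, green,blue"))); // [red, green, blue]
    }
}
```

Pattern.matches() compiles the regex on every call, so for repeated use it is usually better to compile the Pattern once and reuse it.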

Once you have a Matcher object from Pattern.matcher(), you can use its methods to query the results of different matching operations:

boolean matches()

boolean lookingAt()

boolean find()

boolean find(int start)

matches() succeeds only if the pattern matches the entire string, while lookingAt() succeeds if the pattern matches the beginning of the string.
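The difference between these operations is easiest to see side by side. In this sketch (the class name MatcherMethods, the method probe, and the inputs are illustrative assumptions) a fresh Matcher is created for each call so that one operation does not affect where the next one starts:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MatcherMethods {
    private static final Pattern JAVA = Pattern.compile("Java");

    // Returns { matches(), lookingAt(), find() } for the given input.
    static boolean[] probe(String input) {
        boolean whole = JAVA.matcher(input).matches();      // entire string
        boolean prefix = JAVA.matcher(input).lookingAt();   // beginning of string
        boolean anywhere = JAVA.matcher(input).find();      // anywhere in string
        return new boolean[] { whole, prefix, anywhere };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probe("Java")));        // [true, true, true]
        System.out.println(Arrays.toString(probe("Java regex")));  // [false, true, true]
        System.out.println(Arrays.toString(probe("I like Java"))); // [false, false, true]
    }
}
```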

find()

Matcher.find() can be used to find multiple character sequences in a CharSequence that match the pattern. For example:

import java.util.regex.*;
import com.bruceeckel.simpletest.*;
import java.util.*;

public class FindDemo {
    private static Test monitor = new Test();
    public static void main(String[] args) {
        // The input string is missing from the source; this sentence is a stand-in.
        Matcher m = Pattern.compile("\\w+").matcher("one two three");
        // Iterate over every match from the beginning
        while (m.find())
            System.out.println(m.group());
        // Restart the search from each character position in turn
        int i = 0;
        while (m.find(i)) {
            System.out.print(m.group() + " ");
            i++;
        }
        // The expected output lines are also missing from the source.
        monitor.expect(new String[] { /* ... */ });
    }
}

"\\w+" means "one or more word characters", so it simply splits the string into words. find() works like an iterator, scanning the string from beginning to end. The second form of find() takes an int argument which, as you can see, tells the method where to start searching: at the character position given by the argument.

Groups

A group is a part of a regular expression enclosed in parentheses that can be referred to later by its group number. Group 0 represents the entire expression, group 1 the first parenthesized group, and so on. So in

A(B(C))D

there are three groups: group 0 is ABCD, group 1 is BC, and group 2 is C.

You can work with groups through the following Matcher methods:

public int groupCount() returns the number of groups in the matcher's pattern. Group 0 is not included.

public String group() returns group 0 (the whole match) from the last match operation (for example, find()).

public String group(int i) returns the given group from the last match operation. If the match succeeded but the specified group did not match any part of the input, null is returned.

public int start(int group) returns the start index of the group found in the last match.

public int end(int group) returns the index of the last character of the group found in the last match, plus one.

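import java.util.regex.*;
import com.bruceeckel.simpletest.*;

public class Groups {
    private static Test monitor = new Test();
    // The poem text and the pattern string are missing from the source.
    static public final String poem = ...;
    public static void main(String[] args) {
        Matcher m = Pattern.compile(...).matcher(poem);
        while (m.find()) {
            for (int j = 0; j ...
            // the listing breaks off here in the source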
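Since the listing above is incomplete, here is a separate, self-contained sketch of the group methods. It is not the original Groups program: the class name GroupDemo, the method describe, and the input are assumptions, and the pattern is simply the A(B(C))D example from the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    private static final Pattern NESTED = Pattern.compile("A(B(C))D");

    // Lists every group of the first match with its [start, end) positions.
    static String describe(String input) {
        Matcher m = NESTED.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        // groupCount() excludes group 0, so loop up to and including it.
        for (int g = 0; g <= m.groupCount(); g++) {
            sb.append("group ").append(g).append(" = ").append(m.group(g))
              .append(" [").append(m.start(g)).append(", ")
              .append(m.end(g)).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints:
        // group 0 = ABCD [1, 5)
        // group 1 = BC [2, 4)
        // group 2 = C [3, 4)
        System.out.print(describe("xABCDx"));
    }
}
```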
