How to realize position matching of regular expressions

2025-04-27 Update From: SLTechnology News&Howtos

This article is about position matching in regular expressions: how to match where something occurs (at a word boundary, at the start or end of a string, at the start or end of a line) rather than which characters occur. It is quite practical, so it is shared here for reference.

The details are as follows:

I. Introducing the problem

If we want to match a word in a paragraph of text (leaving aside multi-line mode, which is covered later), we might do something like this:

Yesterday is history, tomorrow is a mystery, but today is a gift.

Regular expression: is

Yesterday [is] h[is]tory, tomorrow [is] a mystery, but today [is] a gift.

Analysis: The intent was to match only the word is, but the is contained in another word (history) was matched as well. To solve this problem, use boundary delimiters: metacharacters that specify the position (or boundary) at which a match is allowed to occur.
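The over-matching described above is easy to reproduce. The following is a minimal sketch, not part of the original article: the class name PlainMatchDemo and the helper highlight are made up for illustration, and the helper simply wraps every match in square brackets.

```java
import java.util.regex.Pattern;

class PlainMatchDemo {
    // Hypothetical helper: wraps every match of the regex in square brackets.
    // In the replacement string, $0 stands for the whole matched text.
    static String highlight(String regex, String text) {
        return Pattern.compile(regex).matcher(text).replaceAll("[$0]");
    }

    public static void main(String[] args) {
        String text = "Yesterday is history, tomorrow is a mystery, but today is a gift.";
        // The plain pattern also hits the "is" inside "history".
        System.out.println(highlight("is", text));
    }
}
```

Running it prints the sentence with four bracketed hits, including the one inside history.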

II. Word boundaries

A common boundary is the word boundary, specified with the metacharacter \b, which matches the beginning or the end of a word. More precisely, \b matches the position between a character that can be part of a word (a letter, digit, or underscore, that is, a character matched by \w) and a character that cannot (a character matched by \W). It also matches at the start or end of the string when the adjacent character is a word character. \b matches a position, not a character, so it consumes nothing. Consider the previous example:

Yesterday is history, tomorrow is a mystery, but today is a gift.

Regular expression: \bis\b

Yesterday [is] history, tomorrow [is] a mystery, but today [is] a gift.

Analysis: In the original text, the word is has a space on each side, so there is a word boundary before the i and after the s, and \bis\b matches. Note that \b matches the position between the space and the letter, not the space itself. The is inside history is not matched: it is preceded by h and followed by t, both of which are word characters, so there is no word boundary on either side.

To match a position that is not a word boundary, use \B. For example:
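To see the effect of \b in code, here is a small sketch (the class WordBoundaryDemo and the method matchStarts are illustrative names, not from the article) that lists the start offset of every match for the plain pattern and for the bounded one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: returns the start offset of every match of regex in text.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "Yesterday is history, tomorrow is a mystery, but today is a gift.";
        // Plain pattern: also finds the "is" inside "history".
        System.out.println(matchStarts("is", text));
        // Bounded pattern: only the standalone word "is".
        System.out.println(matchStarts("\\bis\\b", text));
    }
}
```

In Java source the backslash has to be doubled, so the regular expression \bis\b is written as the string literal "\\bis\\b".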

Please enter the nine-digit id as it appears on your color - coded pass-key.

Regular expression: \B-\B

Result: Please enter the nine-digit id as it appears on your color [-] coded pass-key.

Analysis: \B-\B matches a hyphen that has no word boundary immediately before or after it. The hyphen in color - coded has a space on each side, so both positions lie between two non-word characters and the pattern matches. The hyphens in nine-digit and pass-key sit directly between letters, which puts a word boundary on each side of them, so they do not match.
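The behavior of \B can be checked with a short sketch. It assumes the sample sentence writes color - coded with a space on each side of the hyphen; the class NonBoundaryDemo and its helpers are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonBoundaryDemo {
    // Hypothetical helper: wraps every match in square brackets.
    static String highlight(String regex, String text) {
        return Pattern.compile(regex).matcher(text).replaceAll("[$0]");
    }

    // Hypothetical helper: counts the matches of regex in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Assumption: "color - coded" has spaces around its hyphen.
        String text = "Please enter the nine-digit id as it appears on your color - coded pass-key.";
        // \B-\B: hyphen with no word boundary on either side (space-hyphen-space).
        System.out.println(highlight("\\B-\\B", text));
        // \b-\b: the opposite case, a hyphen squeezed between two word characters.
        System.out.println(count("\\b-\\b", text));
    }
}
```

Only the hyphen surrounded by spaces is bracketed by \B-\B, while \b-\b finds the two hyphens in nine-digit and pass-key.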

III. String boundaries

Word boundaries can be used to match positions related to words (beginning, end, whole word, etc.). String boundaries serve a similar purpose, except that they are used to match positions related to strings (beginning, end, entire string, and so on). There are two metacharacters used to define string boundaries: ^, which defines the beginning of the string, and $, which defines the end of the string.

For example, one check on the validity of an XML document is that it begins with an XML declaration (the <?xml ... ?> tag). A pattern for this check starts with:

Regular expression: ^\s*

Analysis: ^ matches the beginning of the string, so ^\s* matches the beginning of the string followed by zero or more whitespace characters. The \s* is there because whitespace characters (spaces, tabs, newlines, and so on) are allowed before the opening tag.
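The article shows only the ^\s* prefix of the pattern. As a rough sketch of how the full check might look, the example below assumes the rest of the pattern matches an XML declaration, giving ^\s*<\?xml.*\?>. That tail, the class XmlStartCheck, and the method name are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

class XmlStartCheck {
    // Assumed pattern: start of string, optional whitespace, then an XML declaration.
    // Only the leading ^\s* comes from the article; the rest is a guess at its intent.
    private static final Pattern XML_START = Pattern.compile("^\\s*<\\?xml.*\\?>");

    static boolean startsWithXmlDeclaration(String document) {
        // find() is enough here because ^ already pins the match to the start.
        return XML_START.matcher(document).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithXmlDeclaration("  \n<?xml version=\"1.0\"?>\n<root/>"));
        System.out.println(startsWithXmlDeclaration("<root/>\n<?xml version=\"1.0\"?>"));
    }
}
```

Without the ^ anchor the second input would also be accepted, since the declaration appears later in the string.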

The $ metacharacter is used in exactly the same way as ^, except that it anchors the end of the string rather than the beginning. For example, a pattern that checks how an HTML page ends can finish with: \s*$
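As with the XML case, only the \s*$ tail is given in the article. The sketch below assumes the intended check is that a closing html tag is followed by nothing but whitespace; the pattern </html>\s*$, the case-insensitive flag, and the names HtmlEndCheck and endsWithClosingHtmlTag are assumptions for illustration.

```java
import java.util.regex.Pattern;

class HtmlEndCheck {
    // Assumed pattern: a closing html tag, optional whitespace, then end of string.
    // CASE_INSENSITIVE lets </HTML> and </html> both pass.
    private static final Pattern HTML_END =
            Pattern.compile("</html>\\s*$", Pattern.CASE_INSENSITIVE);

    static boolean endsWithClosingHtmlTag(String page) {
        return HTML_END.matcher(page).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithClosingHtmlTag("<html><body>hi</body></HTML>\n\n"));
        System.out.println(endsWithClosingHtmlTag("<html></html>\n<!-- trailing -->"));
    }
}
```

The second page is rejected because non-whitespace text follows the closing tag, so $ cannot match after \s*.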

IV. Multi-line matching mode

Some special sequences change the behavior of other metacharacters. One of them is (?m), which enables multi-line mode. In multi-line mode the regular expression engine treats line terminators as if they were string boundaries: ^ matches not only at the start of the string but also immediately after a line terminator (newline), and $ matches not only at the end of the string but also immediately before a line terminator.

When used, (?m) must appear at the start of the pattern. For example, a regular expression can find all the single-line comments (those starting with //) in a piece of Java code.
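The difference that multi-line mode makes can be seen on a tiny input. This is a minimal sketch with made-up names (MultilineFlagDemo, countMatches); it also shows that in Java the Pattern.MULTILINE flag has the same effect as the (?m) prefix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineFlagDemo {
    // Hypothetical helper: counts how many times the pattern matches in text.
    static int countMatches(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "alpha\nbeta\ngamma";
        // Default mode: ^ matches only at the start of the whole string.
        System.out.println(countMatches(Pattern.compile("^\\w+"), text));
        // Multi-line mode via the inline prefix: ^ matches at the start of every line.
        System.out.println(countMatches(Pattern.compile("(?m)^\\w+"), text));
        // The same mode requested through a compile flag instead of the prefix.
        System.out.println(countMatches(Pattern.compile("^\\w+", Pattern.MULTILINE), text));
    }
}
```

The first count is 1 and the other two are 3, one per line.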

Text:

public DownloadingDialog(Frame parent){
    //Call super constructor, specifying that dialog box is modal.
    super(parent, true);
    //Set dialog box title.
    setTitle("E-mail Client");
    //Instruct window not to close when the "X" is clicked.
    setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
    //Put a message with a nice border in this dialog box.
    JPanel contentPanel = new JPanel();
    contentPanel.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
    contentPanel.add(new JLabel("Downloading messages... "));
    setContentPane(contentPanel);
    //Size dialog box to components.
    pack();
    //Center dialog box over application.
    setLocationRelativeTo(parent);
}

Regular expression: (?m)^\s*//.*$

Results:

public DownloadingDialog(Frame parent){
【    //Call super constructor, specifying that dialog box is modal.】
    super(parent, true);
【    //Set dialog box title.】
    setTitle("E-mail Client");
【    //Instruct window not to close when the "X" is clicked.】
    setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
【    //Put a message with a nice border in this dialog box.】
    JPanel contentPanel = new JPanel();
    contentPanel.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
    contentPanel.add(new JLabel("Downloading messages... "));
    setContentPane(contentPanel);
【    //Size dialog box to components.】
    pack();
【    //Center dialog box over application.】
    setLocationRelativeTo(parent);
}

Analysis: ^\s*//.*$ matches the beginning of a string, then any amount of whitespace, then //, then any text up to the end of the string. On its own, this pattern can match at most once, because ^ matches only at the very start of the string; in this text, which begins with public, it would not match at all. With the (?m) prefix, each line terminator is treated as a string boundary, so the comment on every line can be matched.

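A Java implementation follows (the text is saved in the file text.txt):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineMatch {

    // Reads the whole file into a single string.
    public static String getTextFromFile(String path) throws Exception {
        StringBuilder sb = new StringBuilder();
        char[] cbuf = new char[1024];
        int len;
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            // read() returns -1 at end of file; call it exactly once per iteration.
            while ((len = br.read(cbuf)) != -1) {
                sb.append(cbuf, 0, len);
            }
        }
        return sb.toString();
    }

    // Prints every single-line comment found in the file.
    public static void multilineMatch() throws Exception {
        String text = getTextFromFile("E:/text.txt");
        // No space inside (?m), and nothing after the final $.
        String regex = "(?m)^\\s*//.*$";
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }

    public static void main(String[] args) throws Exception {
        multilineMatch();
    }
}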

The output is as follows:

//Call super constructor, specifying that dialog box is modal.

//Set dialog box title.

//Instruct window not to close when the "X" is clicked.

//Put a message with a nice border in this dialog box.

//Size dialog box to components.

//Center dialog box over application.
