What are the knowledge points of regular expressions? 07/02 Update SLTechnology News&Howtos

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article covers the key knowledge points of regular expressions. The editor finds them very practical and shares them here for reference; follow along to take a look.

1 Preface

Many people have heard of regular expressions, but the first impression is usually that they are hard to learn: at first glance there seem to be no rules to them, just a pile of special symbols.

In fact, this is only a matter of unfamiliarity. Once you understand regular expressions, you will find that they use only a small set of characters, which are not hard to remember and even easier to understand. The only real difficulty is that, once the characters are combined, the expression becomes hard to read. The purpose of this article is to give readers a basic understanding of regular expressions, so that they can read and write simple ones and meet the needs of daily development.

0\d{2}-\d{8}|0\d{3}-\d{7}

Let's start with the regular expression above. If you are not familiar with regular expressions, this string of characters probably means nothing to you. That does not matter: this article explains the meaning of each character in detail.

1.1 What is a regular expression

A regular expression is a special string pattern used to match a set of strings. It works like a mold used to make a product: the regular expression is the mold, defining a rule, and it matches the text that conforms to that rule.

1.2 Commonly used regex matching tools

Online matching tools:

1 http://www.regexpal.com/

2 http://rubular.com/

Regex matching software: McTracer

After trying several tools, I still find this one the best. It can convert a regex into the form required by languages such as Java, C# and JS, escaping it for you so that it can be copied and used directly, which is very convenient. It also explains how a regular expression works, for example which part is a capture group and which part is a greedy match. In short, it is a pleasure to use.

2 A brief introduction to regex characters

2.1 Metacharacters

"^": matches the starting position of a line or string, and sometimes the starting position of the entire document.

"$": matches the end of a line or string.

In the pictured example (image not preserved), the matched text must begin with "This" followed by a space, must end with "Regex", and must not have spaces or any other characters before or after it.
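A minimal sketch of the two anchors, assuming Python's re module as the environment (the article itself is not tied to Python). The pattern "^This is Regex$" is a hypothetical stand-in, since the pattern from the original picture is not available.

```python
import re

# Hypothetical pattern for illustration; the one in the original picture is unknown.
anchored = re.compile(r"^This is Regex$")

exact = anchored.search("This is Regex")            # fits exactly between ^ and $
leading_space = anchored.search(" This is Regex")   # extra character before "This"
trailing_text = anchored.search("This is Regex!")   # extra character after "Regex"

print(exact is not None, leading_space is None, trailing_text is None)
```

Only the first string matches, because "^" and "$" leave no room for anything before "This" or after "Regex".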

"\ b": does not consume any characters to match only one position, and is often used to match word boundaries. If I want to match the individual word "is" from the string "This is Regex", it will be written as "\ bis\ b".

\ b will not match the characters on both sides of is, but it will recognize whether the two sides of is are the boundaries of words.
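The word-boundary behaviour can be checked with a short sketch, again assuming Python's re module. It compares the plain pattern "is" with "\bis\b" on the same string and records where each match starts.

```python
import re

text = "This is Regex"

# Without boundaries, "is" is also found inside "This".
plain_matches = [(m.start(), m.group()) for m in re.finditer(r"is", text)]

# With \b on both sides, only the standalone word is found.
word_matches = [(m.start(), m.group()) for m in re.finditer(r"\bis\b", text)]

print(plain_matches)
print(word_matches)
```

The matched text is still just "is": the boundaries themselves contribute no characters to the result.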

"\ d": match the number

For example, to match a fixed format phone number with the first four digits starting with 0 and the last seven digits, such as 0737-5686123 regular: ^ 0\ d\ d\ d -\ d\ d $here is only to introduce the "\ d" character, there is actually a better way to write it below.
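A small sketch of the fixed-format phone check, assuming Python's re module. The names FIXED_PHONE and is_fixed_phone are hypothetical and exist only for this illustration.

```python
import re

# 0, three more digits, a hyphen, then exactly seven digits.
FIXED_PHONE = re.compile(r"^0\d\d\d-\d\d\d\d\d\d\d$")

def is_fixed_phone(text):
    """Return True when text has the form 0ddd-ddddddd."""
    return FIXED_PHONE.search(text) is not None

print(is_fixed_phone("0737-5686123"))
print(is_fixed_phone("1737-5686123"))
print(is_fixed_phone("0737-568612"))
```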

"\ w": matches letters, numbers, and underscores.

For example, I want to match the "a2345BCD__TTz" regular: "\ w +" where the "+" character is the number of times a quantifier refers to repetition, which will be described in more detail later.

"\ s": match spaces

For example, the character "a b c" is regular: "\ w\ s\ w\ s\ w" is followed by a space. If there are multiple spaces between characters, write "\ s" as "\ s +" and let the space repeat.

".: matches any character except a newline character

This is an enhanced version of "\ w". "\ w" cannot match spaces. If you add a string to a space with "\ w", it will be limited. How to match the character "a23 4 5 B C D__TTz" regular: ". +"
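The three metacharacters above can be compared side by side. This is a sketch assuming Python's re module; re.fullmatch is used so that each pattern has to cover the whole string.

```python
import re

# \w+ covers letters, digits and underscores.
word_only = re.fullmatch(r"\w+", "a2345BCD__TTz")

# \s stands for one whitespace character; \s+ allows several in a row.
single_spaces = re.fullmatch(r"\w\s\w\s\w", "a b c")
many_spaces = re.fullmatch(r"\w\s+\w\s+\w", "a   b  c")

# \w+ stops at a space, while .+ does not.
spaced_with_w = re.fullmatch(r"\w+", "a23 4 5 B C D__TTz")
spaced_with_dot = re.fullmatch(r".+", "a23 4 5 B C D__TTz")

# "." does not match a newline by default.
two_lines = re.fullmatch(r".+", "line one\nline two")

print(bool(word_only), bool(single_spaces), bool(many_spaces))
print(bool(spaced_with_w), bool(spaced_with_dot), bool(two_lines))
```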

"[abc]": character groups match characters that contain elements in parentheses

This is relatively simple to match only the characters that exist in parentheses, and it can also be written as [a murz] matches a to z, so the letters can be used to control only the input of English.
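A short sketch of character classes, assuming Python's re module. The sample strings are made up for illustration.

```python
import re

# [abc] picks out only the listed characters.
listed = re.findall(r"[abc]", "a1b2c3d4")

# [a-z]+ accepts a string made up entirely of lowercase English letters.
lowercase_ok = re.fullmatch(r"[a-z]+", "regex")
lowercase_bad = re.fullmatch(r"[a-z]+", "Regex2")

print(listed)
print(lowercase_ok is not None, lowercase_bad is None)
```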

2.2 Negated forms

The notation is simple: change the letter to uppercase and the meaning becomes the opposite of the original, so no examples are given here.

"\W" matches any character that is not a letter, digit, or underscore

"\S" matches any character that is not whitespace

"\D" matches any character that is not a digit

"\B" matches a position that is not the beginning or end of a word

"[^abc]" matches any character except a, b, and c

2.3 Quantifiers

First, three important concepts related to quantifiers:

Greedy: a greedy quantifier, such as "*", first tries to match as much of the string as possible. If the overall match then fails, it gives back one character and tries again. This process is called backtracking; it backs off one character at a time until a match is found or there are no characters left to give back. Because of this backtracking, greedy quantifiers can consume more resources than the two kinds below.

Lazy (reluctant): a lazy quantifier, written by adding "?" after a quantifier (for example "*?"), works the other way round. It starts from the beginning of the target and takes as few characters as possible, adding one character at a time and checking for a match, until a match is found or the end of the string is reached.

Possessive: a possessive quantifier, written by adding "+" after a quantifier (for example "*+") in engines that support it, takes as much of the target string as it can and then tries to match, but it tries only once and never backtracks. It is like grabbing a whole rock and then picking the gold out of it.
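The three behaviours can be compared on one string. This is a sketch assuming Python's re module; possessive quantifiers are only accepted there from Python 3.11 onward, so that part is guarded by a version check. The sample text is invented.

```python
import re
import sys

text = 'say "a" and "b"'

# Greedy: .* runs to the end, then backtracks to the last quote.
greedy = re.search(r'".*"', text).group()

# Lazy: .*? grows one character at a time and stops at the first closing quote.
lazy = re.search(r'".*?"', text).group()

# Possessive: .*+ takes the rest of the string and never gives the closing quote back,
# so the overall match fails. Older Python versions cannot compile this pattern,
# so the check is skipped there and the result is left as None.
if sys.version_info >= (3, 11):
    possessive = re.search(r'".*+"', text)
else:
    possessive = None

print(greedy)
print(lazy)
print(possessive)
```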

"*" (greed) repeat zero or more times

For example, "aaaaaaaa" matches all the a regularities in the string: "a *" gives all the characters "a"

"+" (lazy) repeat one or more times

For example, "aaaaaaaa" matches all a regularities in a string: "a +" takes all a characters in a character. "a +" differs from "a *" in that "+" is at least once while "*" can be 0 times.

I will talk to "?" later. The combination of characters reflects this difference.

"?" (possession) repeat zero or one time

For example, "aaaaaaaa" matches a regular in a string: "a?" It will only match once, that is, the result is only a single character a.

Repeat "{n}" n times

For example, matching the an of a string from "aaaaaaaa" and repeating the regular 3 times: "a {3}" results in getting 3 a characters "aaa"

Repeat "{n ~ m}" n to m times

For example, regular "a {3pr 4}" repeatedly matches a for 3 or 4 times, so the matching characters can be three "aaa" or four "aaaa" regularities.

Repeat "{n,}" n or more times

What differs from {nrecom} is that there is no upper limit on the number of matches, but at least n times, such as regular "a {3,}" a, must be repeated at least 3 times.
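The quantifiers listed above can be run against the same "aaaaaaaa" string. This is a sketch assuming Python's re module; re.match reports the first match starting at the beginning of the string.

```python
import re

text = "aaaaaaaa"  # eight "a" characters

patterns = [r"a*", r"a+", r"a?", r"a{3}", r"a{3,4}", r"a{3,}"]
results = {p: re.match(p, text).group() for p in patterns}

# The difference between "*" and "+" shows when there is nothing to repeat.
star_on_b = re.match(r"a*", "bbb").group()  # zero repetitions is still a match
plus_on_b = re.match(r"a+", "bbb")          # at least one "a" is required

for pattern, matched in results.items():
    print(pattern, "->", repr(matched))
print(repr(star_on_b), plus_on_b)
```

All of these quantifiers are greedy, which is why "a{3,4}" returns four characters rather than three when four are available.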

Now that quantifiers have been introduced, the regex for the phone number can be simplified: "^0\d\d\d-\d\d\d\d\d\d\d$" becomes "^0\d+-\d{7}$".

This is still not ideal, because the length of the area code is unrestricted and any number of digits is accepted, whereas area codes normally have only 3 or 4 digits.

So change it to "^0\d{2,3}-\d{7}$", which makes the area code 3 or 4 digits long.
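A sketch of the final phone pattern, assuming Python's re module and a trailing "$" so that nothing may follow the seven digits. The names AREA_PHONE and is_area_phone are hypothetical.

```python
import re

# 0 plus two or three digits (a 3- or 4-digit area code), a hyphen, then seven digits.
AREA_PHONE = re.compile(r"^0\d{2,3}-\d{7}$")

def is_area_phone(text):
    """Return True when text looks like 0dd-ddddddd or 0ddd-ddddddd."""
    return AREA_PHONE.search(text) is not None

print(is_area_phone("0737-5686123"))   # 4-digit area code
print(is_area_phone("010-5686123"))    # 3-digit area code
print(is_area_phone("07375-5686123"))  # 5-digit area code
```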

2.4 Lazy quantifiers

"*?" repeat any number of times, but as few times as possible

For example, on "acbacb" the regex "a.*?b" returns only the first "acb". Without the "?" it would match the whole string, but with it the regex matches as few characters as possible, and the shortest match in "acbacb" is "acb".

"+?" repeat one or more times, but as few times as possible

Same as above, except that there must be at least one repetition.

"??" repeat zero or one time, but as few times as possible

For example, on "aaacb" the regex "a.??b" returns only the last three characters, "acb".

"{n,m}?" repeat n to m times, but as few times as possible

For example, on "aaaaaaaa" the regex "a{0,m}?" returns an empty result, because the minimum is 0 repetitions.

"{n,}?" repeat n or more times, but as few times as possible

For example, on "aaaaaaa" the regex "a{1,}?" needs at least one repetition, so the result is "a".
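The lazy quantifiers can be checked with a short sketch, assuming Python's re module. The upper bound 3 in "a{0,3}?" is an arbitrary value standing in for m.

```python
import re

lazy_star = re.search(r"a.*?b", "acbacb").group()       # stops at the first "b"
lazy_plus = re.search(r"a.+?b", "acbacb").group()       # same, but needs at least one character
lazy_optional = re.search(r"a.??b", "aaacb").group()    # zero or one character between a and b
lazy_range = re.search(r"a{0,3}?", "aaaaaaaa").group()  # zero repetitions are enough
lazy_at_least = re.search(r"a{1,}?", "aaaaaaa").group() # one repetition is enough

print(repr(lazy_star), repr(lazy_plus), repr(lazy_optional))
print(repr(lazy_range), repr(lazy_at_least))
```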

3 Advanced regular expressions

3.1 Capture groups

First, understand the concept of a capture group: it is simply the part of a regex enclosed in parentheses, such as "(\d)" in "(\d)\d". A capture group can be referenced later in the same expression (a backreference): if the same content appears again, you can refer to the group defined earlier to simplify the expression. For example, in "(\d)\d\1", the "\1" is a backreference to "(\d)".

What are capture groups good for? An example makes it clear.

For the string "zery zery", the regex is "\b(\w+)\b\s\1\b", so the text matched by "\1" is the same "zery" that the group captured. To make a group more meaningful, you can give it a custom name.

"\b(?<name>\w+)\b\s\k<name>\b": use "?<name>" to define a custom group name, and remember to write "\k<name>" when referring back to the group. Once a group is named, the value it captures is stored under that name.

Common forms of grouping are listed below.

"(exp)" matches exp and captures the text into an automatically numbered group

"(?<name>exp)" matches exp and captures the text into a group named name

"(?:exp)" matches exp, but does not capture the matched text or assign a group number to it

The following are zero-width assertions.

"(?=exp)" matches the position before exp

For example, on "How are you doing" the regex "(?<txt>.+(?=ing))" takes all the characters before "ing" and defines a capture group named "txt", whose value is "How are you do".
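A sketch of the lookahead example, assuming Python's re module (so the named group is written "(?P<txt>...)").

```python
import re

text = "How are you doing"
match = re.search(r"(?P<txt>.+(?=ing))", text)

before_ing = match.group("txt")  # everything up to, but not including, "ing"
remainder = text[match.end():]   # the lookahead did not consume "ing"

print(repr(before_ing), repr(remainder))
```

Because the assertion is zero-width, "ing" is checked but stays outside the match.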

"(? 4 share the complete directory structure of the project project

In the process of project development, it is very important to save all kinds of data files in the project in order and establish a directory structure with clear classification and convenient management. Combining the previous project and the project structure of some friends, I compiled a catalog structure of the project that I think is pretty good. I would like to share with you here. You are welcome to put forward your valuable opinions and suggestions. If you like, please "recommend" it, thank you very much! The whole directory is set to a level 4 subdirectory.

The seventh young master posted comments at 15:48 on 2013-11-23.

The key information is obtained by constructing an HTTP request to fetch the page and then processing the data accordingly. Filtering HTML tags to extract the articles shows how powerful regular expressions are.

Most of the regex features covered above are used, such as "\s", "\w+", ".*?", capture groups, and zero-width assertions. Interested readers can try it out and see for themselves how the corresponding data is extracted with regular expressions. The regular expressions in the code are basic and simple, and their meaning and usage are explained in detail above.

using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string content = HttpUtility.HttpGetHtml();
        HttpUtility.GetArticles(content);
    }
}

internal class HttpUtility
{
    // Get the first page of data
    public static string HttpGetHtml()
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.cnblogs.com/");
        request.Accept = "text/plain, */*; q=0.01";
        request.Method = "GET"; // GET is the default
        request.Headers.Add("Accept-Language", "zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3");
        request.ContentLength = 0;
        request.Host = "www.cnblogs.com";
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Maxthon/4.1.3.5000 Chrome/26.0.1410.43 Safari/537.1";
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        Stream responStream = response.GetResponseStream();
        StreamReader reader = new StreamReader(responStream, Encoding.UTF8);
        string content = reader.ReadToEnd();
        return content;
    }

    public static List<Article> GetArticles(string htmlString)
    {
        List<Article> articleList = new List<Article>();
        Regex regex = null;
        Article article = null;
        regex = new Regex ("(?. *?) (? =" + @ "\ s*), RegexOptions.Singleline);
        if (regex.IsMatch(htmlString))
        {
            MatchCollection aritcles = regex.Matches(htmlString);
            foreach (Match item in aritcles)
            {
                article = new Article();
                // Recommendation count
                regex = new Regex (". * (?. *)" + @ "+". * ", RegexOptions.Singleline);
                article.DiggNum = regex.Match(item.Value).Groups["digNum"].Value;
                // Article title; the escape characters need to be removed
                regex = new Regex (" (?. *) ", RegexOptions.Singleline);
                string a = regex.Match(item.Value).Groups["a"].Value;
                regex = new Regex ("(?. *)", RegexOptions.Singleline);
                article.AritcleUrl = regex.Match(a).Groups["href"].Value;
                article.AritcleTitle = regex.Match(a).Groups["summary"].Value;
                // Author's picture
                regex = new Regex ("(? ]. * >) ", RegexOptions.Singleline);
                article.AuthorImg = regex.Match(item.Value).Groups["img"].Value;
                // Author's blog URL and the target attribute of the link
                regex = new Regex (". * ", RegexOptions.Singleline);
                article.AuthorUrl = regex.Match(item.Value).Groups["href"].Value;
                string urlTarget = regex.Match(item.Value).Groups["target"].Value;
                // Article summary
                // 1. First fetch all the content in the summary div
                regex = new Regex ("(?. *)", RegexOptions.Singleline);
                string summary = regex.Match(item.Value).Groups["summary"].Value;
                // 2. Take the summary text
                regex = new Regex (" (?)?
