What is the use of Python regular expressions?

2025-02-25 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02

This article explains what Python regular expressions are used for. The explanation is kept simple and clear, so follow along step by step.

What is a regular expression?

In short, regular expressions (regex) are used to find patterns in a given string.

The pattern we want to find can be anything.

You can create patterns that find e-mail addresses or mobile phone numbers. You can also create patterns that find strings that begin with a and end with z.

Consider the following example, which uses the sample sentence that appears throughout this article:

    import re

    string = "It was the best of times, it was the worst of times."
    pattern = r'[,., -]'
    print(len(re.findall(pattern, string)))

The pattern we want to find is r'[,., -]'. It matches any one of the four characters listed inside the brackets: a comma, a period, a space, or a hyphen.

Regex101 is a tool for testing patterns. When a pattern is applied to a target string there, the tool highlights every match in that string.

Use a tool like this whenever you need to test a regular expression. It is much faster and easier to debug than running Python over and over again.

Now that we can find these patterns in the target string, how do we really create them?

Create a pattern

When using regular expressions, the first thing you need to learn is how to create patterns.

Next, we will introduce some of the most commonly used patterns one by one.

The simplest pattern you can think of is a simple string.

    pattern = r'times'
    string = "It was the best of times, it was the worst of times."
    print(len(re.findall(pattern, string)))

But this is not very useful on its own. To help build more complex patterns, regular expressions provide special characters, also called operators. Let's take a look at these operators one by one.

1. [] operator

This operator was used in the first example. It matches a single character that is one of the characters listed inside the square brackets.

[abc] - finds every a, b, or c that appears in the text

[a-z] - finds every lowercase letter from a to z that appears in the text

[a-z0-9A-Z] - finds every uppercase letter from A to Z, lowercase letter from a to z, and digit from 0 to 9 that appears in the text

You can easily run the following code in Python:

    import re

    pattern = r'[a-zA-Z]'
    string = "It was the best of times, it was the worst of times."
    # Count every letter in the string
    print(len(re.findall(pattern, string)))
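The three bracket patterns listed above can be compared side by side. The following is a minimal sketch; the variable names are illustrative and are not part of the original article.

```python
import re

text = "It was the best of times, it was the worst of times."

# [abc] matches one character that is a, b, or c.
abc_matches = re.findall(r'[abc]', text)

# [a-z] matches one lowercase letter.
lowercase_count = len(re.findall(r'[a-z]', text))

# [a-z0-9A-Z] matches one letter (either case) or one digit.
alnum_count = len(re.findall(r'[a-z0-9A-Z]', text))

print(abc_matches)       # ['a', 'b', 'a']
print(lowercase_count)   # 38
print(alnum_count)       # 39
```

The only difference between the last two counts is the capital I at the start of the sentence.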

In addition to findall, the re module has many other functions, which are covered later.

2. Dot operator

The dot operator (.) matches any single character except a newline.

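The greatest advantage of operators is that they can be combined.

For example, suppose you want to find a 6-character substring that starts with a lowercase d or an uppercase D and ends with the letter e. One pattern that does this is [dD]....e: one character from [dD], four dots for any four characters, and a final e.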
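A minimal sketch of that combination, assuming a made-up sample sentence: the bracket operator picks the first character and four dots stand for the four characters in between. Because the dot matches any character, this pattern would also accept spaces or punctuation in the middle.

```python
import re

# Sample sentence invented for illustration.
text = "Decode the message, then delete it."

# [dD] : a lowercase or uppercase d
# .... : any four characters (except newline)
# e    : a literal e
matches = re.findall(r'[dD]....e', text)

print(matches)  # ['Decode', 'delete']
```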

3. Some metasequences

Some patterns are often used when using regular expressions. So regular expressions create some shortcuts for these patterns. The most common shortcuts are as follows:

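\w matches any letter, digit, or underscore. It is equivalent to [a-zA-Z0-9_].

\W matches any character that is not a letter, digit, or underscore.

\d matches any decimal digit. It is equivalent to [0-9].

\D matches any character that is not a decimal digit.

In Python 3 these equivalences hold exactly for ASCII text or with the re.ASCII flag; by default \w and \d also match Unicode letters and digits.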
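To see how the four shortcuts split a string, the sketch below applies each one to the same made-up sample. Every character is matched by exactly one of \w and \W, and by exactly one of \d and \D.

```python
import re

# Sample string invented for illustration.
sample = "Order 66: pay $9"

word_chars = re.findall(r'\w', sample)      # letters, digits, underscore
non_word_chars = re.findall(r'\W', sample)  # everything else
digits = re.findall(r'\d', sample)          # decimal digits
non_digits = re.findall(r'\D', sample)      # everything that is not a digit

print(''.join(word_chars))   # Order66pay9
print(non_word_chars)        # [' ', ':', ' ', ' ', '$']
print(digits)                # ['6', '6', '9']
print(len(non_digits))       # 13
```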

4. Plus sign and star operator

The dot operator matches only a single instance of any character. What if we want to match more instances?

The plus sign + matches one or more instances of the preceding character.

The asterisk * matches zero or more instances of the preceding character.

For example, if we want to find all substrings that begin with d and end with e, with zero or more characters between d and e, we can use: d\w*e

If we want to find all substrings that begin with d and end with e, with at least one character between d and e, we can use: d\w+e

You can also use a more general method: {}

\w{n} - repeat \w exactly n times.

\w{n,} - repeat \w at least n times.

\w{n1,n2} - repeat \w at least n1 times but no more than n2 times.
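The quantifiers above can be tried on one short string to see which substrings each accepts. This is an illustrative sketch; the sample text and variable names are assumptions made for the demonstration.

```python
import re

# Sample text invented for illustration.
text = "de dive dance d"

star = re.findall(r'd\w*e', text)          # zero or more characters between d and e
plus = re.findall(r'd\w+e', text)          # at least one character between d and e
exactly = re.findall(r'd\w{3}e', text)     # exactly three characters in between
at_least = re.findall(r'd\w{2,}e', text)   # two or more characters in between
between = re.findall(r'd\w{1,2}e', text)   # one or two characters in between

print(star)      # ['de', 'dive', 'dance']
print(plus)      # ['dive', 'dance']
print(exactly)   # ['dance']
print(at_least)  # ['dive', 'dance']
print(between)   # ['dive']
```

Note that "de" is matched only by the star version, because every other pattern requires at least one character between d and e.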

5. Caret ^ and dollar sign $

The caret ^ matches the beginning of the string, while the dollar sign $ matches the end of the string.
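A short sketch of anchoring, using a lowercase variant of the article's sample sentence (lowercased here so that "it" appears twice). Without an anchor the pattern matches everywhere; with one, only the occurrence at the start or end counts.

```python
import re

line = "it was the best of times, it was the worst of times."

anywhere = re.findall(r'it', line)         # every "it" in the line
at_start = re.findall(r'^it', line)        # only an "it" at the very beginning
at_end = re.findall(r'times\.$', line)     # only a "times." at the very end
not_at_start = re.search(r'^was', line)    # "was" exists, but not at the start

print(anywhere)       # ['it', 'it']
print(at_start)       # ['it']
print(at_end)         # ['times.']
print(not_at_start)   # None
```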

6. Word boundary

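This is an important concept.

Have you noticed that the examples above always match substrings, not whole words?

What if you want to find all the words that start with d?

Can we use the pattern d\w*? Trying it in the online tool shows that it does not work: it also matches a d in the middle or at the end of a word. The word-boundary metacharacter \b fixes this. It matches the empty position between a word character and a non-word character, so \bd\w* matches only words that begin with d.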
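The difference is easy to see in a quick experiment. The sketch below runs the pattern with and without \b on a made-up sentence.

```python
import re

# Sample sentence invented for illustration.
text = "A bad dog did a deed."

# Without a boundary, the trailing d of "bad" is matched as well.
without_boundary = re.findall(r'd\w*', text)

# \b requires the d to sit at the start of a word.
with_boundary = re.findall(r'\bd\w*', text)

print(without_boundary)  # ['d', 'dog', 'did', 'deed']
print(with_boundary)     # ['dog', 'did', 'deed']
```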

Regular expression function

So far, only the findall function in the re module has been used, but there are many other functions. Let's introduce them one by one.

1. findall

findall has already been used above. This is the one I use most often. Let's formally get to know this function.

Input: a pattern and a test string

Output: a list of strings.

    # USAGE:
    pattern = r'[iI]t'
    string = "It was the best of times, it was the worst of times."
    matches = re.findall(pattern, string)
    for match in matches:
        print(match)

Output:

    It
    it

2. search

Input: a pattern and a test string

Output: a match object for the first match, or None if there is no match.

    # USAGE:
    pattern = r'[iI]t'
    string = "It was the best of times, it was the worst of times."
    location = re.search(pattern, string)
    print(location)

You can use the following code to get the matched text out of this match object:

    print(location.group())

Output:

    It
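Besides group(), the match object returned by re.search also reports where the match was found. The sketch below shows start(), end(), and span(), and the None result when nothing matches; the variable names are illustrative.

```python
import re

string = "It was the best of times, it was the worst of times."

match = re.search(r'[iI]t', string)
if match:
    print(match.group())  # It
    print(match.start())  # 0
    print(match.end())    # 2
    print(match.span())   # (0, 2)

# search returns None when the pattern does not occur,
# so check the result before calling group().
missing = re.search(r'xyz', string)
print(missing)  # None
```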

3. sub (replace)

This function is also very important. In natural language processing you sometimes need to replace integers with X, or you may need to edit some files. Any text editor can find and replace; re.sub lets you do it from code.

Input: a search pattern, a replacement, and the target string

Output: the string with replacements applied

    string = "It was the best of times, it was the worst of times."
    string = re.sub(r'times', r'life', string)
    print(string)

Output:

    It was the best of life, it was the worst of life.
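The integer-masking case mentioned above can be sketched in a couple of lines. The sample sentence is made up; re.subn is the standard-library variant of re.sub that also returns how many replacements were made.

```python
import re

# Sample sentence invented for illustration.
text = "Order 66 shipped 3 items in 2024."

# \d+ matches each whole run of digits, so every integer becomes a single X.
masked = re.sub(r'\d+', 'X', text)

# subn returns a (new_string, replacement_count) tuple.
masked_again, count = re.subn(r'\d+', 'X', text)

print(masked)  # Order X shipped X items in X.
print(count)   # 3
```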

Case study

Regular expressions are used in many situations that require validation. We may see a prompt like this on the website: "this is not a valid email address." Although you can use multiple if and else conditions to write such prompts, regular expressions may have an advantage.

1. PAN number

In the United States, the SSN (Social Security number) is used for tax identification, while in India the PAN number is used. The basic validation rule for a PAN is that all of its letters must be uppercase and its characters must come in this order: five letters, then four digits, then one letter.

So the question is:

Is "ABcDE1234L" a valid PAN number?

Without regular expressions, how would you answer this question? You might write a for loop and check each character. With regular expressions, it is as simple as the following:

    match = re.search(r'[A-Z]{5}[0-9]{4}[A-Z]', 'ABcDE1234L')
    if match:
        print(True)
    else:
        print(False)

Output:

    False
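For validation it can be safer to require that the whole string matches, not just some part of it. The sketch below wraps the same pattern in a helper that uses re.fullmatch; the function name is_valid_pan and the test values are hypothetical and chosen only for illustration.

```python
import re

# Five uppercase letters, four digits, one uppercase letter.
PAN_PATTERN = re.compile(r'[A-Z]{5}[0-9]{4}[A-Z]')


def is_valid_pan(candidate: str) -> bool:
    """Return True only if the entire string has the PAN shape."""
    return PAN_PATTERN.fullmatch(candidate) is not None


print(is_valid_pan('ABcDE1234L'))      # False: lowercase c
print(is_valid_pan('ABCDE1234L'))      # True
print(is_valid_pan('XXABCDE1234LXX'))  # False: extra characters around it

# re.search would accept the last value, because it only needs
# the pattern to occur somewhere inside the string.
print(PAN_PATTERN.search('XXABCDE1234LXX') is not None)  # True
```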

2. Find domain names

Sometimes we have to find a phone number, email address, domain name, etc., from a large text document.

For example, suppose you have the following text:

^ ["Train (noun)"] (http://www.askoxford.com/concise_oed/train?view=uk). (definition-Compact OED) Oxford University Press. Retrieved 2008-03-18. ^ Atchison, Topeka and Santa Fe Railway (1948). Rules: Operating Department. P. 7. ^ [Hydrogen trains] (http://www.hydrogencarsnow.com/blog2/index.php/hydrogen-vehicles/i-hear-the-hydrogen-train-a-comin-its-rolling-round-the-bend/) ^ [Vehicle Projects Inc. Fuel cell locomotive] (http://www.bnsf.com/media/news/articles/2008/01/2008-01-09a.html) ^ Central Japan Railway (2006). Central Japan Railway Data Book 2006. P. 16. ^ ["Overview Of the existing Mumbai Suburban Railway"] (http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm). _ Official webpage of Mumbai Railway Vikas Corporation_. Archived from [the original] (http://www.mrvc.indianrail.gov.in/overview.htm) on 2008-06-20. Retrieved 2008-12-11.

You need to find all the domain names in the text above: askoxford.com, bnsf.com, hydrogencarsnow.com, mrvc.indianrail.gov.in, and web.archive.org.

How can this be done?

    # string holds the text shown above
    match = re.findall(r'http(s:|:)\/\/(www.|ww2.|)([0-9a-z.A-Z-]*\.\w{2,3})', string)
    for elem in match:
        print(elem)

Output:

    (':', 'www.', 'askoxford.com')
    (':', 'www.', 'hydrogencarsnow.com')
    (':', 'www.', 'bnsf.com')
    (':', '', 'web.archive.org')
    (':', 'www.', 'mrvc.indianrail.gov.in')
    (':', 'www.', 'mrvc.indianrail.gov.in')

The or operator | is used here. Because the pattern contains several groups in parentheses, findall returns a list of tuples, with one element for each group.
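If only the domain names are wanted, one option is to make the scheme and prefix groups non-capturing with (?:...), so that findall returns plain strings. This is a sketch of a variant, not the pattern from the original article; the sample text reuses two URLs from the passage above.

```python
import re

text = ("See http://www.bnsf.com/media/news and "
        "http://web.archive.org/web/ for details.")

# https?            : http or https
# (?:www\.|ww2\.)?  : optional www. or ww2. prefix, not captured
# ([0-9a-zA-Z.-]+\.\w{2,3}) : the domain, the only captured group
domain_pattern = r'https?://(?:www\.|ww2\.)?([0-9a-zA-Z.-]+\.\w{2,3})'

domains = re.findall(domain_pattern, text)

print(domains)  # ['bnsf.com', 'web.archive.org']
```

With a single capturing group, findall returns a list of that group's text, which avoids unpacking tuples afterwards.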

3. Find email addresses

The following regular expression finds e-mail addresses in a long text.

    match = re.findall(r'([\w0-9-._]+@[\w0-9-.]+[\w0-9]{2,3})', string)
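A runnable sketch of the same idea on a short sample. The addresses are made up (example.com and example.org are reserved example domains), and the pattern is a simplified variant that puts the hyphen last inside the brackets so it is read as a literal character. It is not a full e-mail validator.

```python
import re

# Sample text with invented addresses.
text = "Contact alice@example.com or bob.smith@mail.example.org today."

# [\w.-]+   : local part (letters, digits, underscore, dot, hyphen)
# @         : literal at sign
# [\w.-]+   : domain labels
# \.\w{2,3} : a dot followed by a 2 to 3 character ending
email_pattern = r'[\w.-]+@[\w.-]+\.\w{2,3}'

emails = re.findall(email_pattern, text)

print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```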

These are advanced examples, and the information provided is enough to help you understand them.

Thank you for reading. That concludes this look at what Python regular expressions are used for. The details are best confirmed through practice.
