
How to use Python to extract Chinese and English strings

2025-04-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail how to use Python to extract the Chinese, English, and numeric parts of a mixed string, using the sub, findall, and compile functions of the re module. Each step is shown with a short example. I hope it helps resolve your questions.

I. The sub function in re

Python's re module provides re.sub, which replaces the parts of a string that match a pattern.

re.sub(pattern, repl, string, count=0)

Parameter description:

pattern: the regular expression pattern string

repl: the replacement string

string: the original string in which matches are searched for and replaced

count: the maximum number of matches to replace. If omitted, the default is 0, which means all matches are replaced.
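As a minimal sketch of how count changes the result (the sample string and variable names here are illustrative assumptions, not taken from the article):

```python
import re

# Illustrative sample string; not the article's original data.
text = "a1b2c3"

# count=0 (the default) replaces every match of the pattern.
all_removed = re.sub(r"[0-9]", "", text)

# count=1 replaces only the first match, scanning left to right.
first_removed = re.sub(r"[0-9]", "", text, count=1)

print(all_removed)    # abc
print(first_removed)  # ab2c3
```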

1.1 Extract Chinese

The idea is simple: replace every character that is not Chinese with an empty string, so that only the Chinese characters remain.

For example

Import restr = "world" str = re.sub ("[A-Za-z0-9,.]" Str) print (str) output: children of God are singing

1.2 extract English

Import restr = "world" str = re.sub ("[u4e00-u9fa5-9,.]" , ", str) print (str) output: helloHworld

1.3 extract numbers

Import restr = "world" str = re.sub ("[A-Za-zu4e00-u9fa5,.]]" , str) print (str) output: 123 II. Findall function in re

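findall finds all the substrings matched by the regular expression in a string and returns them as a list, or an empty list if no match is found.

The syntax of the compiled-pattern method is:

pattern.findall(string[, pos[, endpos]])

The module-level form, re.findall(pattern, string, flags=0), takes the pattern as its first argument and does not accept pos or endpos.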

Parameters:

string: the string to search.

pos: optional; the index in the string at which the search starts. The default is 0.

endpos: optional; the index in the string at which the search stops. The default is the length of the string.

Find all the numbers in the string:
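A minimal sketch of finding all the numbers, using a compiled pattern so that pos and endpos can be passed (the sample string and variable names are illustrative assumptions):

```python
import re

# Compile once; \d+ matches runs of one or more digits.
digits = re.compile(r"\d+")

# Illustrative input string (an assumption, not from the article).
text = "abc 123 def 456"

everything = digits.findall(text)      # search the whole string
windowed = digits.findall(text, 0, 8)  # search only text[0:8], i.e. "abc 123 "

print(everything)  # ['123', '456']
print(windowed)    # ['123']
```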

Note: the re module also has match and search, which each return at most one match, whereas findall returns all matches. For more information, see the Runoob ("rookie") tutorial.
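The difference between the three functions can be sketched as follows (the sample string is an illustrative assumption):

```python
import re

text = "abc 123 def 456"  # illustrative sample string

# match only succeeds if the pattern matches at the very start of the string.
at_start = re.match(r"\d+", text)

# search scans forward and returns the first match anywhere in the string.
first = re.search(r"\d+", text)

# findall returns every non-overlapping match as a list of strings.
every = re.findall(r"\d+", text)

print(at_start)       # None
print(first.group())  # 123
print(every)          # ['123', '456']
```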

2.2 extract English

Popular writing method

Import string# provides the lowercase letter dd = "Child of God hello sings in H, world" # prepares the English character temp= "" letters=string.ascii_lowercase# contains the lowercase letter of Amurz for word in dd:#for loop takes out a single word if word.lower () in letters:# to determine whether the English temp+=word# is added to make up the English word print (temp) output: helloHworld

Regular pattern

# A-Za-zimport redd = "world out of 123jianghu hello" result = '.join (re.findall (r' [A-Za-z]', dd)) print (result) output: helloHworld

2.3 extract numbers

# 0-9 pay attention to not being able to precede this number, otherwise he will even count import redd = "hello of God 123 is singing H songs., world" result = '.join (re.findall (r' [0-9]', dd)) print (result) output: 123 III. Compile function in re

The compile function compiles a regular expression and returns a regular expression (Pattern) object that other functions can use.

The syntax format is:

re.compile(pattern[, flags])

Parameters:

pattern: a regular expression in string form

flags: optional; the matching mode, such as ignoring case or multiline mode. The available flags are:

re.I: ignore case

re.L: make the special sequences \w, \W, \b, \B, \s, \S depend on the current locale (in Python 3, only valid with bytes patterns)

re.M: multiline mode, so that ^ and $ also match at the start and end of each line

re.S: make . match any character, including a newline (by default . does not match a newline)

re.U: make the special sequences \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database (the default for str patterns in Python 3)

re.X: ignore whitespace in the pattern and allow comments after #, to improve readability
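A short sketch of compile with three of these flags; the patterns, sample strings, and variable names are illustrative assumptions:

```python
import re

# re.I makes the match case-insensitive; compile once and reuse the object.
word = re.compile(r"hello", re.I)
case_insensitive = word.findall("Hello, HELLO, hello")
print(case_insensitive)  # ['Hello', 'HELLO', 'hello']

# Without re.S, "." stops at a newline; with re.S it matches across it.
no_dotall = re.findall(r"a.b", "a\nb")
dotall = re.findall(r"a.b", "a\nb", re.S)
print(no_dotall)  # []
print(dotall)     # ['a\nb']

# re.X allows whitespace and # comments inside the pattern itself.
number = re.compile(r"""
    [0-9]+          # integer part
    (?:\.[0-9]+)?   # optional fractional part
""", re.X)
numbers = number.findall("pi is 3.14, e is about 2.7, n = 10")
print(numbers)  # ['3.14', '2.7', '10']
```

Compiling is mainly worthwhile when the same pattern is applied many times, since the Pattern object can be reused without passing the pattern string again.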

That concludes this introduction to extracting Chinese and English strings with Python. To really master the material, you still need to practice it on your own strings.
