Updated 2025-01-25. Source: SLTechnology News&Howtos (shulou.com)
In text mining, T-SQL wildcards are often not expressive enough. In that case, "CLR + regular expressions" is a very good choice. Regular expressions look complex, but once you are proficient with their metacharacters, you can use them flexibly to complete complex text-mining work.
First, the special characters of regular expressions
1, commonly used metacharacters
Used to match specific characters (letters, numbers, symbols). Note that letters are case-sensitive:
.: matches any character except a newline
\w: matches a letter, digit, underscore, or Chinese character
\s: matches any whitespace character
\d: matches a digit
\b: matches the beginning or end of a word
^: matches the beginning of the string
$: matches the end of the string
\k<name>: references a group by name, for example \k<group_name> references the group named group_name
\group_number: references a group by its number, where group_number is 1, 2, 3, and so on
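As a rough illustration of the metacharacters listed above, the following sketch uses Python's re module rather than the CLR regex engine the article targets. The two engines agree on these basic metacharacters as far as this example goes; the sample strings and variable names are invented for the illustration.

```python
import re

text = "Order 66 shipped_to 张三 on day 7"

# \d+ matches runs of digits
digits = re.findall(r"\d+", text)

# \w+ matches runs of letters, digits, underscores and Chinese characters
words = re.findall(r"\w+", text)

# \s+ matches runs of whitespace, used here as a separator
parts = re.split(r"\s+", text)

# ^ and $ anchor the match to the start and end of the string
starts_with_order = re.search(r"^Order", text) is not None
ends_with_digit = re.search(r"\d$", text) is not None

# \b marks a word boundary, so "day" is matched only as a whole word
whole_word = re.findall(r"\bday\b", "day by day, daylight")

# . matches any character except a newline
dot = re.findall(r"a.c", "abc a\nc a-c")

print(digits, words, whole_word, dot)
```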
2, repeating characters or groups
Specify how many times the preceding character or group is repeated:
*: repeat zero or more times
+: repeat one or more times
?: repeat zero or once
{n}: repeat n times
{n,}: repeat n or more times
{n,m}: repeat n to m times
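The repetition operators can be compared side by side with a small sketch. It is written in Python's re module as a stand-in for the CLR engine; the sample strings and the helper name matching are hypothetical and exist only for this example.

```python
import re

samples = ["ac", "abc", "abbc", "abbbc", "abbbbc"]

def matching(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

zero_or_more = matching(r"ab*c")      # b repeated zero or more times
one_or_more = matching(r"ab+c")       # b repeated one or more times
zero_or_one = matching(r"ab?c")       # b repeated zero times or once
exactly_two = matching(r"ab{2}c")     # b repeated exactly 2 times
two_or_more = matching(r"ab{2,}c")    # b repeated 2 or more times
two_to_three = matching(r"ab{2,3}c")  # b repeated 2 to 3 times

print(zero_or_one, exactly_two, two_to_three)
```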
3, grouping, escaping, branching, qualifiers
These characters have specific meanings and uses:
(): parentheses define a group
?<name>: defines the group name; the string between < and > is the group name
\: the escape character, which turns a special character into an ordinary one; for example, \( matches a literal parenthesis "(" and the parenthesis is no longer treated as a special character
|: branch; the expressions on either side are combined with "OR"
[]: specifies a list of allowed characters. The character must match one of the characters in the list, which is given inside the square brackets; for example, [aeiou] matches a character that is any one of a, e, i, o, u.
[^]: specifies a list of excluded characters. The character must not be any character in the list, which is given inside the square brackets after ^; for example, [^aeiou] matches a character that is none of a, e, i, o, u.
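A short sketch of grouping, escaping, branching, and character lists follows. It uses Python's re module in place of the CLR engine, and every sample string is made up for the example.

```python
import re

# | is a branch (OR); the parentheses limit it to "cat" or "dog"
pets = [m.group(0) for m in re.finditer(r"\b(cat|dog)s?\b", "cats, dog, cow, dogs")]

# \( and \) match literal parentheses instead of opening a group
literal = re.findall(r"\(\d+\)", "call f(12) and g(x) then h(345)")

# [aeiou] matches exactly one character from the list
vowels = re.findall(r"[aeiou]", "regular")

# [^aeiou] matches exactly one character that is not in the list
consonants = re.findall(r"[^aeiou]", "regular")

print(pets, literal, vowels, consonants)
```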
Second, group reference
A group is a subexpression enclosed in parentheses; a group reference refers back to that group later in the expression, which makes regular expressions more concise. By default, each group is automatically assigned a group number: numbering starts at 1 and increases by 1 from left to right (1-based), so the first group is number 1, the second group is number 2, and so on.
There are three forms of group definition:
(exp): automatically assigned a group number and referenced by that number
(?<name>exp): a named group, referenced by the group name
(?:exp): the group only matches text at the current position and cannot be referenced afterwards; it has no group name and no group number
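The three group forms can be seen together in one pattern. This is a sketch in Python, not CLR code, and it relies on one labeled assumption: Python spells a named group (?P<name>exp), whereas .NET spells it (?<name>exp). The date string and the group name month are invented for the example.

```python
import re

# Assumption: Python's (?P<month>...) stands in for .NET's (?<month>...).
# Group 1 is numbered, the second group is named, the third is non-capturing.
pattern = re.compile(r"(\d{4})-(?P<month>\d{2})-(?:\d{2})")
m = pattern.search("released on 2025-01-25")

whole = m.group(0)            # the full matched text
numbered = m.group(1)         # first capturing group, referenced by number
named = m.group("month")      # named group, referenced by name
group_count = pattern.groups  # the (?:...) group is not counted

print(whole, numbered, named, group_count)
```

The non-capturing group still takes part in the match (the day is in the matched text), but it cannot be retrieved or referenced on its own.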
1, referencing a group by number
Define a group (exp) earlier in the regular expression; later in the expression, the group can be referenced by its number. The syntax for the reference is: \group_number
For example: \b(\w+)\b\s+\1\b. This expression has only one group, (\w+), with group number 1. After the group, \1 refers back to it and must match the same text that the group captured, so the expression finds a word repeated twice in a row, such as "the the". It is not equivalent to \b(\w+)\b\s+(\w+)\b, which would match any two consecutive words.
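The behavior of \1 can be checked with a small sketch. Python's re module is used here in place of the CLR engine; the numbered backreference syntax is the same in both, and the sample sentences are invented.

```python
import re

repeated = re.compile(r"\b(\w+)\b\s+\1\b")
any_two_words = re.compile(r"\b(\w+)\b\s+(\w+)\b")

text = "this is is a test of the the pattern"

# Each match is a word immediately followed by the same word
doubles = [m.group(1) for m in repeated.finditer(text)]

# The backreference must repeat the captured text, so two different words fail
no_repeat = repeated.search("hello world") is None

# Without the backreference, any two consecutive words match
two_words = any_two_words.search("hello world").group(0)

print(doubles, no_repeat, two_words)
```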
2, referencing a group by name
A named group has the form (?<name>exp), where name is the group name; it is referenced with \k<name>. Referencing a group by name matches text in exactly the same way as referencing it by number.
For example: \b(?<name>\w+)\b\s+\k<name>\b. Later in the expression, \k<name> refers back to the group and must match the same text that the group captured, so this expression behaves the same as \b(\w+)\b\s+\1\b.
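A sketch comparing the named and numbered references follows. It assumes Python rather than the CLR: Python writes the named group as (?P<word>exp) and the named reference as (?P=word), where .NET writes (?<word>exp) and \k<word>. The group name word and the sample sentence are chosen only for this example.

```python
import re

# Assumption: Python's (?P<word>...) and (?P=word) stand in for
# .NET's (?<word>...) and \k<word>.
by_name = re.compile(r"\b(?P<word>\w+)\b\s+(?P=word)\b")
by_number = re.compile(r"\b(\w+)\b\s+\1\b")

text = "it was was a long long day"

named_hits = [m.group("word") for m in by_name.finditer(text)]
numbered_hits = [m.group(1) for m in by_number.finditer(text)]

print(named_hits, numbered_hits)
```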
3, groups that cannot be referenced
(?:exp): a group defined with this syntax cannot be referenced and only matches text at the current position. No group number is assigned to it.
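Third, assertion search
An assertion is a logical condition; the match succeeds only if the condition is true. Assertions are zero-width: the text they test is not included in the returned match, so the result comes back without the asserted prefix or suffix. In other words, an assertion is used to find text before or after a particular piece of "text". There are four assertion syntaxes:
(?=exp): asserts that exp follows the text; returns the text before the position of exp
(?<=exp): asserts that exp precedes the text; returns the text after the position of exp
(?!exp): asserts that what follows the text is not exp
(?<!exp): asserts that what precedes the text is not exp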
1, suffix matching
(?=exp): asserts that exp follows the text, returning the text before the position of exp. This is a suffix match, similar to T-SQL's '%ing'
For example, the regular expression: \b\w+(?=ing\b)
Analysis: the assertion requires the suffix ing followed by the end of the word (\b). It matches words ending in ing but returns only the front part of the word, the part before ing
For example, searching "I'm reading a book" finds "reading" because that word ends with ing; the regular expression returns read, since the returned text does not include the asserted suffix.
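The suffix example can be reproduced with a short sketch. Python's re module is used as a stand-in for the CLR engine; the (?=exp) syntax is the same in both.

```python
import re

text = "I'm reading a book"

# Words followed by "ing" at a word boundary; the suffix is not returned
stems = re.findall(r"\b\w+(?=ing\b)", text)

# The assertion consumes nothing: the match ends right where "ing" begins
m = re.search(r"\b\w+(?=ing\b)", text)
following = text[m.end():m.end() + 3]

print(stems, following)
```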
2, prefix matching
(?<=exp): asserts that exp precedes the text, returning the text after the position of exp.
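Prefix matching with (?<=exp) can be sketched as follows. The pattern and the sample sentences are assumptions made for this illustration, not taken from the article, and Python's re module stands in for the CLR engine. Note that Python requires the expression inside (?<=...) to have a fixed width.

```python
import re

# Text that follows the prefix "re" at the start of a word; the prefix is not returned
tails = re.findall(r"(?<=\bre)\w+", "I am reading a book")

# Digits that are preceded by a dollar sign; the sign itself is not returned
amounts = re.findall(r"(?<=\$)\d+", "cost $42 or 17 euros")

print(tails, amounts)
```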
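3, negative matching
(?!exp): asserts that what follows is not exp; (?<!exp): asserts that what precedes is not exp
3.1 for example, regular expression: \b\w+(?!ing\b)
Analysis: the intent is to skip words that end in ing, so that searching "I am reading a book" returns the text: I, am, a, book. As written, however, the greedy \w+ consumes the whole word before the assertion is tested, and what follows "reading" is not ing, so "reading" is returned as well. Testing the assertion at the start of the word, as in \b(?!\w*ing\b)\w+\b, gives the intended result.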
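A quick check in Python (standing in for the CLR engine) suggests that the position of the negative assertion matters. The second pattern below is an alternative proposed for this sketch, not one taken from the article.

```python
import re

text = "I am reading a book"

# Assertion tested after the greedy \w+ has consumed the whole word:
# what follows "reading" is a space, not "ing", so the word still matches.
at_end = re.findall(r"\b\w+(?!ing\b)", text)

# Assertion tested at the start of the word: reject any word ending in "ing".
at_start = re.findall(r"\b(?!\w*ing\b)\w+\b", text)

print(at_end)
print(at_start)
```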
3.2 for example, regular expression: (?
Analysis: do not match words that begin with re; searching "I am reading a book" returns the text: I, am, a, book
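The sketch below does not reproduce the expression from 3.2. It uses two stand-in patterns, both assumptions chosen for this illustration, written for Python's re module rather than the CLR engine: one skips words that begin with re, and one shows (?<!exp) on a simpler case.

```python
import re

# Stand-in pattern: skip words that begin with "re" by testing at the word start
not_re = re.findall(r"\b(?!re)\w+\b", "I am reading a book")

# (?<!exp): digits at a word boundary that are not preceded by a dollar sign
plain_numbers = re.findall(r"(?<!\$)\b\d+", "cost $42 or 17 euros")

print(not_re, plain_numbers)
```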