Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the syntax of SQL Anywhere regular expressions

2025-03-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/02 Report--

This article focuses on "what is the syntax of SQL Anywhere regular expressions". Interested friends may wish to have a look at it. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "what is SQL Anywhere regular expression grammar?"

Regular expression syntax

Regular expressions are supported through SIMILAR TO and REGEXP search criteria and the REGEXP_SUBSTR function. For SIMILAR TO, the regular expression syntax conforms to the ANSI/ISO SQL standard. For REGEXP and REGEXP_SUBSTR, the syntax and support for regular expressions conform to Perl 5.

REGEXP and SIMILAR TO use regular expressions to match strings, while REGEXP_SUBSTR uses regular expressions to match substrings. To implement the substring matching behavior of REGEXP and SIMILAR TO, specify wildcards on either side of the pattern you are trying to match. For example, REGEXP'. * car.*' matches car, carwash, and vicar. Alternatively, you can rewrite the query to use the REGEXP_SUBSTR function. See the REGEXP_SUBSTR function [String].

Regular expressions matched by SIMILAR TO are not case-sensitive or accent-sensitive. REGEXP and REGEXP_SUBSTR are not affected by the database's sensitivity to stress and case. See differences in LIKE, REGEXP, and SIMILAR TO: character comparisons.

Regular expressions: metacharacter

Metacharacters are symbols or characters that have a special meaning in regular expressions.

The handling of metacharacters varies depending on the following:

Whether regular expressions are used with SIMILAR TO or REGEXP search criteria, or with the REGEXP_SUBSTR function

Whether metacharacters are inside the character class of regular expressions

Before proceeding, you should understand the definition of character classes. A character class is a set of characters enclosed in square brackets based on which the characters in the string are matched. For example, in SIMILAR TO's ab [1-9] 'syntax, [1-9] is a character class that matches a number in the range 1 to 9, including 1 and 9. Metacharacters are handled differently in regular expressions, depending on whether metacharacters are placed inside the character class. Specifically, when metacharacters are placed inside the character class, most metacharacters are treated as regular characters.

For SIMILAR TO (SIMILAR TO only), the metacharacters *,?, +, _, |, (,), {must be escaped within the character class.

To include a minus sign (-), delimited character (^), or square bracket (]) character in a character class, you must escape the character.

The following is a list of supported regular expression metacharacters. When SIMILAR TO, REGEXP, and REGEXP_SUBSTR use metacharacters, almost all metacharacters are treated the same way:

Character other information [and] left and right square brackets are used to specify the character class. A character class is a set of characters to be matched by.

Except for hyphens (-) and delimited characters (^), the metacharacters and quantifiers specified in the character class (such as * and {m}, metacharacters and quantifiers, respectively) have no special meaning and can be operated as actual characters.

SQL Anywhere also supports subcharacter classes, such as POSIX character classes. See regular expressions: special subcharacter classes.

* asterisks can be used to match characters 0 or more times. For example, REGEXP'. * abc' matches a string that ends with abc and begins with any prefix. Therefore, aabc, xyzabc, and abc match, but bc and abcc do not. ? Question marks can be used to match characters 0 or 1 times. For example, 'colou?r' matches color and colour. The plus sign can be used to match a character one or more times. For example, 'bre+' matches bre and bree, but not br. -you can use a hyphen to represent a range in a character class. For example, REGEXP'[a mure] 'matches a, b, c, d, and e.

For more information about how REGEXP and SIMILAR TO evaluate ranges, see differences in LIKE, REGEXP, and SIMILAR TO: character comparisons.

The%% sign can be used with SIMILAR TO to match any number of characters.

The percent sign is not considered a metacharacter used by REGEXP and REGEXP_SUBSTR. When specified, it matches the percent sign (%).

_ (underscore character) can be used with an underscore with SIMILAR TO to match a single character.

Underscores are not considered metacharacters used by REGEXP and REGEXP_SUBSTR. When specified, it matches the underscore (_).

| the pipe symbol is used to specify the alternative mode to be used when matching strings. In a row of patterns separated by vertical bars, the vertical bar is interpreted as OR, and the matching process starts with the leftmost pattern and stops when the first match is found. Therefore, you should list the patterns in descending order of priority. You can specify any number of alternative modes. (and) when left and right parentheses are used for each combined part of a regular expression, they are metacharacters. For example, (ab) * matches duplicates of zero or more ab. As with mathematical expressions, you use combinations to control the order in which parts of regular expressions are evaluated. {and} when left and right curly braces are used to specify quantifiers, they are metacharacters. Quantifiers specify the number of times a pattern must be repeated to form a match. For example:

{m} matches a character exactly m times. For example, '519-[0-9] {3}-[0-9] {4}' matches a phone number in the 519 area code (assuming the data is formatted as defined in the syntax).

{m,} matches a character at least m times. For example,'[0-9] {5,} 'matches any string that contains five or more numbers.

{mdirection n} matches a character at least m times, but not more than n times. For example, SIMILAR TO'_ {5pm 10} 'matches any string of 5 to 10 characters (inclusive).

The\ backslash is used as an escape character for metacharacters. It can also be used to escape non-metacharacters. For REGEXP and REGEXP_SUBSTR, when delimiting is outside the character class, delimiting matches the beginning of the string. For example,'^ [hc] at' matches hat and cat, but only at the beginning of the string.

When used within a character class, the following behavior applies:

REGEXP and REGEXP_SUBSTR when delimited is the first character in the character class, it matches any character other than the character in the character set. For example, REGEXP'[^ abc] 'matches any character except a, b, or c.

If delimiting is not the first character in square brackets, it matches delineation. For example, REGEXP_SUBSTR'[a-e ^ c] 'matches a, b, c, d, e, and ^.

SIMILAR TO for SIMILAR TO, delimiting is treated as a minus operator. For example, SIMILAR TO'[a-e ^ c] 'matches a, b, d, and e.

When used with REGEXP and REGEXP_SUBSTR, matches the end character of the string. For example, SIMILAR TO 'cat$' matches cat, but not catfish.

When used with SIMILAR TO, it matches the question mark.

. When used with REGEXP and REGEXP_SUBSTR, matches any single character. For example, REGEXP 'a.cd' matches any string with four characters that begins with an and ends with cd.

When used with SIMILAR TO, it matches the period.

Use colons in the character set to specify subcharacter classes For example,'[: alnum:]]'. Regular expressions: special subcharacter class

Subcharacter classes are special character classes embedded in larger character classes. In addition to custom character classes where character sets to match are defined, for example, [abxq4] limits matching character sets to a, b, x, Q, and 4, SQL Anywhere also supports subcharacter classes, such as most POSIX character classes. For example, [[: alpha:]] represents a collection of all uppercase and lowercase letters.

The REGEXP search criteria and the REGEXP_SUBSTR function support all the syntax conventions in the following table, but SIMILAR TO search expressions do not. The conventions supported by SIMILAR TO have a Y in the SIMILAR TO column.

In REGEXP, when using the REGEXP_SUBSTR function, you can use delimiting to distinguish between subcharacter classes. For example, [: ^ alpha:]] matches a collection of all characters except alphabetic characters.

Other information in the subcharacter class SIMILAR TO [: alpha:] matches uppercase and lowercase alphabetic characters in the current category. For example,'[0-9] {3} [[: alpha:]] {2} 'matches three numbers followed by two letters. Y [: alnum:] matches the numeric, uppercase, and lowercase alphabetic characters in the current category. For example,'[[: alnum:]] + 'matches a string that contains one or more letters and numbers. Y [: digit:] matches the number in the current category. For example,'[[: digit:] -] + 'matches a string that contains one or more numbers or hyphens. Similarly,'[^ [: digit:] -] + 'matches a string that contains one or more characters that are not numbers or dashes. Y [: lower:] matches the lowercase characters in the current category. For example,'[[: lower:]] 'does not match A because An is uppercase. Y [: space:] matches a single space (''). For example, the following statement searches Contacts.City to find any city whose name is two words:

SELECT City FROM Contacts WHERE City REGEXP'. * [[: space:]]. *; Y [: upper:] matches the uppercase characters in the current category. For example,'[: upper:] ab] 'matches one of the following: any uppercase letter, a, or b. Y [: whitespace:] matches a white space character, such as space, tab, page feed, and carriage return. Y [: ascii:] matches any seven-bit ASCII character (a sequential value between 0 and 127).

[: blank:] matches a white space or horizontal tab.

[[: blank:]] is equivalent to [\ t].

[: cntrl:] matches ASCII characters (control characters) with sequence values less than 32 or character values of 127s. Control characters include line breaks, page breaks, backspace characters, and so on.

[: graph:] matches printed characters.

[[: graph:]] is equivalent to [: alnum:] [: punct:].

[: print:] matches printed characters and spaces.

[[: print:]] is equivalent to [: graph:] [: whitespace:].

[: punct:] matches one of the characters:! "# $% &'() * +, -. /:; @ [\] ^ _ `{|} ~.

[: punct:] subcharacter classes cannot include non-ASCII punctuation characters available in the current collection.

[: word:] matches letters, numbers, or underscore characters in the current category.

[[: word:]] is equivalent to [[: alnum:] _].

[: xdigit:] matches the characters in the character class [0-9A-Fa-f].

Regular expressions: other syntax conventions supported

The REGEXP search criteria and the REGEXP_SUBSTR function support the following syntax conventions, and they assume that the backslash is an escape character. SIMILAR TO search expressions do not support these conventions.

The regular expression syntax name and meaning\ 0xxx matches characters with a value of\ 0xxx, where xxx is any sequence of octal numbers and 0 is zero. For example,\ 0134 matches the backslash. \ a matches alarm characters. \ An is used outside the character set to match the beginning of the string.

Equivalent to ^ used outside the character set.

\ b matches backspace characters. \ B matches backslash characters (\). \ cX matches named control characters. For example,\ cZ stands for ctrl-Z. \ d matches a number in the current category. For example, the following statement searches Contacts.Phone to find all phone numbers that end with 00:

SELECT Surname, Surname, City, Phone FROM Contacts WHERE Phone REGEXP'\ d {8} 00'

\ d can be used either inside or outside the character class, which is equivalent to [: digit:].

\ D matches any character other than a number. Its effect is just the opposite of\ d.

\ D can be used either inside or outside the character class, which is equivalent to [^ [: digit:]].

Please be careful when using non-shorthand in square brackets. [\ D\ S] is not the same as [^\ d\ s]. The latter matches any character other than a number or space. So it matches x, but not 8. The former matches any character that is not a number or a space (one of two conditions is met). Because a number is not a space and a space is not a number, [\ D\ S] can match any character, number, space, or other character.

\ e matches escape characters. \ E stops the ability to treat metacharacters as non-metacharacters initiated by\ Q.

For a list of regular expression metacharacters, see regular expressions: metacharacters.

\ f matches the feed character. \ nmatches the newline character. \ Q treats all metacharacters as non-metacharacters until\ E is encountered. For example,\ Q [$\ E is equivalent to\ [\ $.

For a list of regular expression metacharacters, see regular expressions: metacharacters.

\ r match carriage return. \ s matches a space or character that is considered a white space. For example, the following statement returns all product names with at least one space in the name from the Products.ProductName:

SELECT Name FROM Products WHERE Name REGEXP'. *\\ s.girls'

\ s can be used either inside or outside the character class, which is equivalent to [: whitespace:]. See regular expressions: special subcharacter classes.

\ s matches non-white space characters. It works the opposite of\ d and is equivalent to [^ [: whitespace:]].

\ s can be used either inside or outside the character class. See regular expressions: special subcharacter classes.

Please be careful when using non-shorthand in square brackets. [\ D\ S] is not the same as [^\ d\ s]. The latter matches any character other than a number or space. So it matches x, but not 8. The former matches any character that is not a number or a space (one of two conditions is met). Because a number is not a space and a space is not a number, [\ D\ S] can match any character, number, space, or other character.

\ t matches horizontal tabs. \ v matches vertical tabs. \ w matches alphabetic characters, numbers, or underscores in the current category. For example, the following statement returns all last names that are exactly seven alphanumeric characters from Contacts.Surname:

SELECT Surname FROM Contacts WHERE Surname REGEXP'\\ w {7}'

\ w can be used either inside or outside the character class. See regular expressions: special subcharacter classes.

Equivalent to [[: alnum:] _]..

\ W matches any character other than alphabetic characters, numbers, or underscores in the current category. It works the opposite of\ w and is equivalent to [^ [: alnum:] _].

This regular expression can be used both inside and outside the character class. See regular expressions: special subcharacter classes.

\ xhh matches characters with a value of 0xhh, where hh is up to two hexadecimal digits. For example,\ x2D is equivalent to a hyphen.

Equivalent to\ x {hh}.

\ x {hhh} matches characters with values of 0xhhh, where hhh is up to three hexadecimal digits. \ z and\ Z match the position at the end of the string (not the character).

Equivalent to $.

Regular expressions: assertion

Asserts whether the test condition is true and affects where the match begins in the string. The assertion does not return characters; the assertion pattern is not included in the final match. The REGEXP search criteria and the REGEXP_SUBSTR function support these assertion patterns. SIMILAR TO search expressions do not support these conventions.

Lookahead and lookbehind assertions are useful for REGEXP_SUBSTR when trying to split a string. For example, you can return a list of street names (without street numbers) in the Address column of the Customers table by executing the following statement:

In SELECT REGEXP_SUBSTR (Street,'(? a *) a', (? > a *) matches (and consumes) aa, not just leading a. Therefore, the calculation of 'aa' REGEXP' (? > a*) a' is false.

(?: pattern) non-capture blocks this syntax is functionally equivalent to pattern and is provided for compatibility.

For example, in 'bb' REGEXP' (?: B *) b', (?: B *) matches (and consumes) bb. However, unlike the ownership local subexpression, the last b in the bb is discarded to allow the entire match to succeed (that is, to match the b found outside the non-capture block).

Similarly,'a (?: bc | b) c 'matches abcc and abc. When you match an abc, the last c in the bc is backtraced so that you can use c outside the group to make the match successful.

(? # text) is used for comments. The contents of the text will be ignored.

Regular expression example

The following table shows examples of the use of regular expressions. All examples apply to REGEXP, and some examples also apply to SIMILAR TO (as noted in the [example] column). The results vary depending on the search criteria you use for the search. For examples using SIMILAR TO, the results also vary depending on whether they are case-sensitive and accent-sensitive.

For a comparison of how REGEXP and SIMILAR TO handle matches and calculated ranges, see LIKE, REGEXP, and SIMILAR TO search criteria.

Note that if you use these examples in a text string (for example,'. +. +\. +), you should use a double backslash

Example matches sample credit card number (REGEXP only):

Visa:

4 [0-9] {3}\ s [0-9] {4}\ s [0-9] {4}

MasterCard:

5 [0-9] {3}\ s [0-9] {4}\ s [0-9] {4}

American Express:

37 [0-9] {2}\ s [0-9] {4}\ s [0-9] {4}

Discover:

6011\ s [0-9] {4}\ s [0-9] {4}

Match (Visa): 4123 6453 2222 1746

Mismatch (Visa):

3124 5675 4400 4567, 4123-6453-2222-1746

Similarly, MasterCard matches a set of 16-digit numbers, starting with 5, with a space between a subset of every four-digit number. American Express and Discover are the same, but must start with 37 and 6011, respectively.

Date (both REGEXP and SIMILAR TO apply):

([0-2] [0-9] | 30 | 31) / (0 [1-9] | 1 [0-2]) / [0-9] {4}

Match: 31Universe 04Compare 1999, 15Universe 12Compare 4567

Non-matching: 31Compact 4Compact 1999, 31-4-99, 1999-04-19, 42Accord 67Unix 25456

Windows absolute path (REGEXP only):

([A-Za-z]:\)\ [[: alnum:] [: whitespace:]! "# $% &'() +, -.\; = @\ [\] ^ _ `{} ~.] *

Match:\\ server\ share\ file

Mismatch:\ directory\ directory2, / directory2

Email address (REGEXP only):

[[: word:]\ -.] + @ [[: word:]\ -.] +. [: alpha:] {2jue 3}

Match: abc.123@def456.com, _ 123@abc.ca

Mismatch: abc@dummy, ab*cd@efg.hijkl

Email address (REGEXP only):

. + @. +\.. +

Match: * @ qrstuv@wxyz.12345.com, _ _ 1234 ^% @ @ abc.def.ghijkl

Mismatch: abc.123.*&ca, ^% abcdefg123

HTML hexadecimal color code (both REGEXP and SIMILAR TO apply):

[A-F0-9] {6}

Match: AB1234, CCCCCC, 12AF3B

Mismatch: 123G45, 12-44-CC

HTML hexadecimal color code (REGEXP only):

[A-F0-9] {2}\ s [A-F0-9] {2}\ s [A-F0-9] {2}

Match: AB 11 00, CC 12 D3

Mismatch: SS AB CD, AA BB CC DD, 1223AB

IP address (REGEXP only):

((2 (5 [0-5] | [0-4] [0-9])) | 1 ([0-9] [0-9]) | ([1-9] [0-9]) | [0-9]) {3} (2 (5 [0-5] | [0-4] [0-9])) | ([0-9] [0-9]) | [0-9]) | [0-9])

Match: 10.25.101.216

Mismatch: 0.0.0, 256.89.457.02

Java comments (REGEXP only):

/\ *.\ * / | / [^\ n] *

Matches the Java comment between / * and * /, or a line prefixed with / /.

Mismatch: axi1

Currency (REGEXP only):

(\ + | -)?\ $[0-9] *. [0-9] {2}

Match: $1.00,-$97.65

Mismatch: $1,1.00 $, $- 75.17

Positive, negative, and decimal values (REGEXP only):

(\ + | -)? [0-9] + (\. [0-9] +)?

Match: + 41,-412, 2, 7968412, 41, + 41.1,-3.141592653

Mismatch: + + 41, 41.1.19,-+ 97.14

Password (both REGEXP and SIMILAR TO apply):

[[: alnum:]] {4,10}

Match: abcd, 1234, A1b2C3d4, 1a2B3

Mismatch: abc, * ab12, abcdefghijkl

Password (REGEXP only):

[a-zA-Z]\ w {3pr 7}

Match: AB_cd, A1_b2c3, a123_

Mismatch: * & ^ g, abc, 1bcd

Phone number (both REGEXP and SIMILAR TO apply):

([2-9] [0-9] {2}-[2-9] [0-9] {2}-[0-9] {4}) | ([2-9] [0-9] {2}\ s [0-9] {4})

Matching: 519,883-6898, 519,888 6898

Mismatch: 8886898, 5198886898, 519 883-6898

Sentence (REGEXP only):

[A-Z0-9]. * (\. |\? |!)

Match: Hello, how are you?

Mismatch: i am fine

Sentence (REGEXP only):

[[: upper:] 0-9]. * [.?!]

Match: Hello, how are you?

Mismatch: i am fine

Social security number (both REGEXP and SIMILAR TO apply):

[0-9] {3}-[0-9] {2}-[0-9] {4}

Match: 123-45-6789

Mismatch: 123456789, 123456789, 1234-56-7891

URL (REGEXP only):

(http://)?www\.[a-zA-Z0-9]+\.[a-zA-Z]{2,3}

Match: http://www.sample.com, www.sample.com

Mismatch: http://sample.com, http://www.sample.comm

At this point, I believe you have a deeper understanding of "what is the syntax of SQL Anywhere regular expressions". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report