What are some super-useful JavaScript regular expressions?

2025-02-24 Update From: SLTechnology News&Howtos

This article introduces several very useful regular expressions for JavaScript. Many people run into the problems below in practice, so let's walk through how to handle each of them.

1. Convert URLs to links

Suppose there are one or more URLs in the text, none of which are HTML anchor elements, so they cannot be clicked. To automatically convert the URLs to links, you first need to find them and then wrap each one in an <a> element whose href attribute points to the URL:

const str = "Visit https://en.wikipedia.org/ for more info.";
str.replace(/\b(https?|ftp|file):\/\/\S+[\/\w]/g, '<a href="$&">$&</a>');
// => "Visit <a href="https://en.wikipedia.org/">https://en.wikipedia.org/</a> for more info."

Note: Be careful when using this regular expression. It leaves any trailing punctuation out of the match, so a URL that genuinely ends in a punctuation mark is cut short, and it may fail to match more complex URLs.

Here's how it works:

\b Matches at a position called a "word boundary"

(https?|ftp|file) Matches "https", "http", "ftp", or "file"

: Matches a colon character literally

\/ Matches a forward slash character literally

\S Matches a single character other than white space

+ Matches the previous item one or more times

[\/\w] Matches a forward slash or a word character. Without this, the regular expression would include any punctuation at the end of the URL in the match.

g Tells the regular expression engine to match all occurrences rather than stopping after the first match

$& In the second argument to replace(), inserts the matched substring into the replacement string
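As a rough sketch of how the pieces above fit together, the pattern can be wrapped in a small helper. The function name linkify and the constant URL_PATTERN are hypothetical, and the sketch assumes the input is trusted plain text, since nothing here escapes HTML.

```javascript
// Hypothetical helper: wraps each URL found in plain text in an <a> element.
// Assumes trusted input; no HTML escaping is performed.
const URL_PATTERN = /\b(https?|ftp|file):\/\/\S+[\/\w]/g;

function linkify(text) {
  // $& inserts the whole matched URL into the replacement string.
  return text.replace(URL_PATTERN, '<a href="$&">$&</a>');
}

console.log(linkify("Visit https://en.wikipedia.org/ for more info."));
// The trailing period stays outside the link because the match must end in "/" or a word character.
console.log(linkify("See https://example.com/docs."));
```

Text without a URL is returned unchanged, so the helper can be applied to any string.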

2. Delete duplicate words

It is not uncommon for articles and tutorials to contain unnecessary repetition of words. Even professional writers have to proofread these errors. A simple search for "the" on Google News reveals repeated "the" in articles by hundreds of prominent news organizations. Fortunately, regular expressions can fix this problem with one line of code:

const str = "This this sentence has has double words.";
str.replace(/\b(\w+)\s+\1\b/gi, '$1');
// => "This sentence has double words."

\b Matches at a "word boundary" position (a position followed or preceded by an ASCII letter, digit, or underscore)

\w Matches a word character (an ASCII letter, digit, or underscore)

+ Matches the previous item one or more times

\s Matches a white space character

+ Matches the previous item one or more times, so repeated words separated by several white space characters are also detected

\1 Is a backreference: it matches the same text that was matched by the first pair of parentheses

\b Matches a word boundary

g Tells the regular expression engine to match all occurrences rather than stopping after the first match

i Makes the search case-insensitive (ignores case differences)

$1 In the second argument of replace(), inserts the text matched by the first pair of parentheses
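The pattern above removes one repetition per match, so a word written three times in a row would still appear twice afterwards. A possible variation, shown here only as a sketch, repeats the "white space plus backreference" part so that a whole run collapses in one pass. The function name removeDuplicateWords is hypothetical.

```javascript
// Hypothetical helper: collapses runs of the same word ("the the the") into one.
// (?:\s+\1\b)+ repeats the "white space + same word" part one or more times.
const DUPLICATE_WORDS = /\b(\w+)(?:\s+\1\b)+/gi;

function removeDuplicateWords(text) {
  // $1 keeps the first occurrence, including its original capitalization.
  return text.replace(DUPLICATE_WORDS, '$1');
}

console.log(removeDuplicateWords("This this sentence has has double words."));
// => "This sentence has double words."
console.log(removeDuplicateWords("the the the end"));
// => "the end"
```

Keep in mind that some repetitions are intentional ("had had", "that that"), so a pattern like this is better suited to flagging candidates for review than to blind replacement.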

3. Remove invalid characters from file names

When providing a file for download, certain characters should not be included in the file name. For example, in Windows operating systems, the following characters are not valid in file names and should be deleted:

< (less than)

> (greater than)

: (colon)

" (double quotes)

/ (forward slash)

\ (backslash)

| (vertical line)

? (Question mark)

* (asterisk)

Deleting invalid characters using regular expressions is very simple. Consider an example:

const str = "https://en.wikipedia.org/";
str.replace(/[<>:"\/\\|?*]+/g, '');
// => "httpsen.wikipedia.org"

[], called a character class, matches a character between square brackets. Therefore, by placing all invalid characters in them and adding a global (g) flag to the end of the regular expression, you can effectively remove these characters from the string.

Note that in character classes, the backslash has a special meaning and must be escaped with another backslash: \\. The + quantifier repeats the character class so that a run of consecutive invalid characters is replaced in a single step, which helps performance. When the replacement is an empty string, it can be omitted without affecting the result.

Remember that the second argument to the replace() method must be an empty string unless you want to replace an invalid character with another character.
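To illustrate the role of the + quantifier, here is a small sketch that compares the pattern with and without it. The function names replaceEach and replaceRuns are hypothetical. With an empty replacement both give the same result, but with a non-empty replacement the outputs differ.

```javascript
// One match per invalid character.
const PER_CHAR = /[<>:"\/\\|?*]/g;
// One match per run of consecutive invalid characters.
const PER_RUN = /[<>:"\/\\|?*]+/g;

// Hypothetical helpers for comparing the two patterns.
function replaceEach(name, replacement) {
  return name.replace(PER_CHAR, replacement);
}

function replaceRuns(name, replacement) {
  return name.replace(PER_RUN, replacement);
}

console.log(replaceEach("a://b?.txt", ""));  // => "ab.txt"
console.log(replaceRuns("a://b?.txt", ""));  // => "ab.txt"
console.log(replaceEach("a://b?.txt", "_")); // => "a___b_.txt"
console.log(replaceRuns("a://b?.txt", "_")); // => "a_b_.txt"
```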

There are also several reserved names that are used internally by Windows for various tasks and are not allowed as file names. The reserved names are as follows:

CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9

To remove reserved names, run the following code:

str.replace(/^(CON|PRN|AUX|NUL|COM1|COM2|COM3|COM4|COM5|COM6|COM7|COM8|COM9|LPT1|LPT2|LPT3|LPT4|LPT5|LPT6|LPT7|LPT8|LPT9)$/i,'file');

Basically, this code tells the regular expression engine to replace the contents of str if they consist entirely of one of the words separated by the vertical bar (|). You cannot use an empty string as the second argument in this example because the file would then have no name.

Note that if the string contains any additional characters, it is not replaced. For example, "con" is replaced, but "concord", which is a valid file name, is not. This is achieved by using ^ and $ in the regular expression. ^ matches the beginning of the string, ensuring that no other characters precede the matched text. $ matches the end of the string.

You can also write the regular expression in a more compact way using character classes:

str.replace(/^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$/i,'file');

[1-9] Matches any digit from 1 to 9
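The two steps in this section can be combined into one helper. The following is a minimal sketch: the name sanitizeFileName, the fallback parameter, and the extra check for a name that becomes empty after stripping are assumptions added for illustration, not part of the original snippets.

```javascript
const INVALID_CHARS = /[<>:"\/\\|?*]+/g;
const RESERVED_NAMES = /^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$/i;

// Hypothetical helper combining both steps described above.
function sanitizeFileName(name, fallback = "file") {
  // Step 1: remove characters that are invalid in Windows file names.
  const stripped = name.replace(INVALID_CHARS, "");
  // Assumption: a name consisting only of invalid characters gets the fallback.
  if (stripped === "") return fallback;
  // Step 2: replace a name that is exactly a reserved word.
  return stripped.replace(RESERVED_NAMES, fallback);
}

console.log(sanitizeFileName("re:port?.txt")); // => "report.txt"
console.log(sanitizeFileName("con"));          // => "file"
console.log(sanitizeFileName("concord"));      // => "concord"
```

Stripping the invalid characters first means that an input such as "C|O|N" is also caught by the reserved-name check.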

4. Replace multiple spaces with a single space

When the web page is rendered, repeated white space characters appear as a single white space. However, there are times when it is necessary to clean up user input or other data and replace repeated white space with a single white space. Here's how to do this using regular expressions:

const str = "  My opinions  may have changed,  but not the fact   that I'm right.  "; // Ashleigh Brilliant
str.replace(/\s\s+/g, ' ');
// => " My opinions may have changed, but not the fact that I'm right. "

This regular expression contains only two metacharacters, one quantifier, and one flag:

\s Matches a single white space character, including ASCII spaces, tabs, line feeds, carriage returns, vertical tabs, and form feeds

\s Matches a single white space character again

+ Matches the previous item one or more times

g Commands the regular expression engine to match all occurrences rather than stopping after the first match

The result is that every run of two or more white space characters is replaced with a single space. Note that the result in the example above still has a white space character at the beginning and at the end, which should be removed. To do this, simply add the trim() method to the end of the statement:

str.replace(/\s\s+/g, ' ').trim();
// => "My opinions may have changed, but not the fact that I'm right."

Remember that this code replaces any type of white space character with a space (U+0020) character, including ASCII spaces, tabs, line feeds, carriage returns, vertical tabs, and form feeds. Therefore, a tab followed by a carriage return is replaced by a single space. If this is not what you want and you would rather collapse only runs of the same white space character, you can use the following code instead:

str.replace(/(\s)\1+/g,'$1').trim();

\1 is a backreference that matches the same character that was matched by the first pair of parentheses, (\s). The whole run is replaced with $1 in the second argument of replace(), which inserts the character matched inside the parentheses.
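The difference between the two approaches is easiest to see side by side. In this sketch the function names collapseWhitespace and collapseSameWhitespace are hypothetical wrappers around the two patterns from this section.

```javascript
// Replaces any run of two or more white space characters with one space.
function collapseWhitespace(text) {
  return text.replace(/\s\s+/g, " ").trim();
}

// Collapses only runs of the *same* white space character, keeping one copy of it.
function collapseSameWhitespace(text) {
  return text.replace(/(\s)\1+/g, "$1").trim();
}

const sample = "a  \t\t b";
console.log(JSON.stringify(collapseWhitespace(sample)));     // => "a b"
console.log(JSON.stringify(collapseSameWhitespace(sample))); // => "a \t b"
```

JSON.stringify is used only to make the tab visible in the output.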

5. Find sentences that contain specific words

Suppose you want to match all sentences in a text that contain a particular word, either to highlight these sentences in search results or to remove them from the text. The regular expression /[^.!?]*\bword\b[^.!?]*.?/gi can do this. Here's how it works:

const str = "The apple tree originated in Central Asia. It is cultivated worldwide. Apple matures in late summer or autumn."; // en.wikipedia.org/wiki/Apple
// find sentences that contain the word "apple"
str.match(/[^.!?]*\bapple\b[^.!?]*.?/gi);
// => ["The apple tree originated in Central Asia.", " Apple matures in late summer or autumn."]

This regular expression is explored step by step:

[^.!?] Matches any character other than ., !, and ?

* Matches the previous item zero or more times

\b Matches at a "word boundary" position (a position followed or preceded by an ASCII letter, digit, or underscore)

apple Matches the characters literally (matching is case-sensitive by default, which is why the i flag is added to the end of the regular expression)

\b Matches a word boundary

[^.!?] Matches any character other than ., !, and ?

* Matches the previous item zero or more times

. Matches any character except a line terminator

? Matches the previous item zero or one time

g Commands the regular expression engine to match all occurrences rather than stopping after the first match

i Make search case insensitive
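To search for an arbitrary word rather than a hard-coded one, the pattern can be built at run time. The sketch below is one way to do it; findSentences and escapeRegExp are hypothetical helper names, and the results are trimmed because a match after the first sentence starts with the space that follows the previous period.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns every sentence that contains the given word.
function findSentences(text, word) {
  const pattern = new RegExp(
    `[^.!?]*\\b${escapeRegExp(word)}\\b[^.!?]*.?`,
    "gi"
  );
  // match() returns null when nothing is found, so fall back to an empty array.
  return (text.match(pattern) || []).map((sentence) => sentence.trim());
}

const text =
  "The apple tree originated in Central Asia. It is cultivated worldwide. " +
  "Apple matures in late summer or autumn.";

console.log(findSentences(text, "apple"));
// => ["The apple tree originated in Central Asia.", "Apple matures in late summer or autumn."]
```

Because the pattern treats every ., !, and ? as a sentence boundary, text with abbreviations such as "e.g." or decimal numbers will be split in the wrong places.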

6. Limit user input to alphanumeric characters

A common task when developing web pages is to limit user input to alphanumeric characters (A-Z, a-z, and 0-9). Using regular expressions to accomplish this task is very simple: use a character class to define the range of characters allowed, and then add a quantifier to specify how many times a character from that class may occur:

const input1 = "John543";
const input2 = ":-)";
/^[A-Z0-9]+$/i.test(input1); // => true
/^[A-Z0-9]+$/i.test(input2); // => false

Note: This regular expression only applies to English and does not match accented letters or letters in other languages.

Here's how it works:

^ Matches the beginning of the string, ensuring that no other characters precede the text being matched.

[A-Z0-9] Matches a character between A and Z, or a character between 0 and 9. Because matching is case-sensitive by default, the i flag is added to the end of the regular expression. Alternatively, [A-Za-z0-9] can be used without the flag.

+ Matches the previous item one or more times, so the input must contain at least one alphanumeric character; otherwise, the match fails. To make a field optional, you can use the * quantifier, which matches the preceding item zero or more times.

$ Matches the end of the string.
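As the note above says, this pattern covers only unaccented English letters. For input in other languages, one option is Unicode property escapes with the u flag, as in the following sketch. The function names isAlphanumeric and isAlphanumericUnicode are hypothetical, and the second function assumes a JavaScript engine with ES2018 support.

```javascript
// ASCII-only check, as described above.
function isAlphanumeric(input) {
  return /^[A-Z0-9]+$/i.test(input);
}

// Assumption: ES2018 Unicode property escapes are available.
// \p{L} matches a letter in any script, \p{N} matches any kind of numeric character.
function isAlphanumericUnicode(input) {
  return /^[\p{L}\p{N}]+$/u.test(input);
}

console.log(isAlphanumeric("John543"));             // => true
console.log(isAlphanumeric(":-)"));                 // => false
console.log(isAlphanumeric("Zo\u00EB42"));          // => false ("ë" is not in A-Z)
console.log(isAlphanumericUnicode("Zo\u00EB42"));   // => true
```

Note that \p{N} also accepts digits from other scripts, so it is broader than 0-9.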

Taking the time to master regular expressions is definitely a worthwhile investment because it will help solve problems encountered while coding.