
How to use regular expressions to find entries that do not contain specific strings

Updated: 2025-04-06. From: SLTechnology News&Howtos (shulou), Internet Technology


Shulou(Shulou.com)06/02 Report--

This article shows how to use regular expressions to find entries that do not contain a specific string.

People who do log analysis often deal with thousands of log entries. Finding data that fits a specific pattern in that volume often requires complex regular expressions: for example, listing the entries in a log file that do not contain a particular string, or finding the entries that do not start with a particular string.

Using negative lookahead

Regular expressions have the concepts of lookahead and lookbehind, which vividly describe the matching behavior of the regex engine. Note that "ahead" and "behind" here differ slightly from everyday usage. For a piece of text, we usually call the direction toward its beginning the "front" and the direction toward its end the "back". The regex engine, however, scans from the head of the text to its tail (the scanning direction can be controlled by a regex option), so the text toward the end is "ahead" because the engine has not reached it yet, while the text toward the head is "behind" because the engine has already passed through it. The original article illustrates this with a figure.

Lookahead means that when the engine reaches a given position, it peeks ahead at the text it has not yet consumed to check whether that text matches (or does not match) a pattern. Lookbehind checks whether the text the engine has already passed matches (or does not match) a pattern. Requiring a match is called a positive assertion; requiring a non-match is called a negative assertion.

Modern regex engines generally support lookahead, but support for lookbehind is less widespread, so we use negative lookahead here to meet our needs.
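To make the four kinds of assertion concrete, here is a minimal sketch using Python's re module. The sample text and the helper name match_starts are made up for illustration and are not part of the original article.

```python
import re

# Hypothetical sample text, chosen only to illustrate the four assertions.
text = "foobar foobaz quxbar"

def match_starts(pattern, s):
    """Return the start offset of every non-overlapping match."""
    return [m.start() for m in re.finditer(pattern, s)]

ahead_pos = match_starts(r"foo(?=bar)", text)    # "foo" followed by "bar"
ahead_neg = match_starts(r"foo(?!bar)", text)    # "foo" NOT followed by "bar"
behind_pos = match_starts(r"(?<=foo)bar", text)  # "bar" preceded by "foo"
behind_neg = match_starts(r"(?<!foo)bar", text)  # "bar" NOT preceded by "foo"

print(ahead_pos, ahead_neg, behind_pos, behind_neg)
```

In every case the assertion only inspects the neighboring text; it consumes no characters, which is why the reported offsets point at "foo" or "bar" alone.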

Implementation

Test data:

2009-07-07 04:38:44 127.0.0.1 GET /robots.txt
2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt
2009-07-08 04:38:44 127.0.0.1 GET /

With the simple log entries above, we want to achieve two goals:

1. Filter out the entries dated the 8th (2009-07-08).

2. Find the entries that do not contain the string robots.txt (filter out any entry whose URL contains robots.txt).

The syntax for negative lookahead is:

(?!pattern)

Let's first achieve the first goal: matching entries that do not start with a specific string.

Here we want to exclude one continuous string, so the pattern is very simple: 2009-07-08. The implementation is as follows:

^(?!2009-07-08).*?$

Testing this with Expresso shows that the result does filter out the entries dated the 8th.
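The same check can be reproduced outside Expresso. The following is a minimal sketch in Python, assuming each log entry is held as a separate string; the variable names are illustrative only.

```python
import re

log_lines = [
    "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
    "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
    "2009-07-08 04:38:44 127.0.0.1 GET /",
]

# The lookahead is checked at the start of the line, before anything is consumed.
not_on_the_8th = re.compile(r"^(?!2009-07-08).*?$")

kept = [line for line in log_lines if not_on_the_8th.search(line)]

for line in kept:
    print(line)
```

Only the two entries dated 2009-07-07 are kept, because the lookahead rejects any line whose first ten characters are 2009-07-08.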

Next, let's achieve the second goal: excluding entries that contain a specific string.

Following the same approach as above, I wrote:

^.*?(?!robots\.txt).*?$

In plain language, this regular expression reads: start with any characters, not followed by the consecutive string robots.txt, then any characters up to the end of the string.

Running the test shows that it did not produce the effect we wanted. Why? Let's add two capture groups to the regular expression above to debug it:

^(.*?)(?!robots\.txt)(.*?)$

Test results:

The first group matches nothing, while the second group matches the entire string. Let's go back and analyze the regular expression. Call the text matched by the first .*? region A and the text after it region B. As soon as the engine starts on region A, the lookahead on region B is already evaluated. Because .*? is allowed to match the empty string, region A can be empty, and the lookahead condition is then satisfied: at that position region A is followed by the string "2009", not by robots.txt. The match therefore succeeds, and the whole process matches every entry.
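The debugging step can be replayed in a few lines. This is a minimal sketch in Python, run against the first test entry; it assumes the entry is written without a space after the slash.

```python
import re

line = "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt"

# Same pattern as the debugging version: two lazy groups around the lookahead.
m = re.match(r"^(.*?)(?!robots\.txt)(.*?)$", line)

first_group, second_group = m.groups()
print(repr(first_group))   # text consumed before the lookahead was checked
print(repr(second_group))  # text consumed after the lookahead succeeded
```

The first group comes back empty and the second holds the whole line, even though the line contains robots.txt: the lookahead was only ever tested at offset 0.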

Having found the cause, we modify the regular expression above and move .*? into the lookahead, as follows:

^(?!.*?robots).*$

Test results: this time the entry containing robots.txt is excluded, as intended.
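As a quick check of the corrected pattern, here is a minimal Python sketch over the three test entries. The list and variable names are illustrative; the pattern is the one from the article.

```python
import re

log_lines = [
    "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
    "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
    "2009-07-08 04:38:44 127.0.0.1 GET /",
]

# With .*? inside the lookahead, the engine searches the whole line for
# "robots" from the start, and the match fails if it is found anywhere.
without_robots = re.compile(r"^(?!.*?robots).*$")

kept = [line for line in log_lines if without_robots.search(line)]

for line in kept:
    print(line)
```

Note that the entry for /posts/robotfile.txt is kept: it contains "robot" but not "robots".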

Using a regular expression to exclude a string in PHP

preg_match("/^((?!abc).)*$/is", $str);

Complete code example:

<?php
$str = "dfadfadf765577abc55fd";
// i = case-insensitive; s = the dot also matches newlines
$pattern_url = "/^((?!abc).)*$/is";
if (preg_match($pattern_url, $str))
{
    echo "does not contain abc!";
}
else
{
    echo "contains abc!";
}

Result: preg_match returns 0 here, so the condition is false and the script prints "contains abc!".
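The pattern ^((?!abc).)*$ works one character at a time: before consuming each character, it asserts that "abc" does not start at that position. Below is a minimal sketch of the same idea in Python. The helper name lacks_substring is hypothetical, and re.escape is added so that the excluded string may contain regex metacharacters.

```python
import re

def lacks_substring(text, needle):
    """Return True if `needle` does not occur in `text` (case-insensitive)."""
    # (?!needle). consumes one character only where needle does not start.
    pattern = r"^((?!" + re.escape(needle) + r").)*$"
    # IGNORECASE and DOTALL mirror the PHP modifiers i and s.
    return re.search(pattern, text, re.IGNORECASE | re.DOTALL) is not None

print(lacks_substring("dfadfadf765577abc55fd", "abc"))
print(lacks_substring("dfadfadf76557755fd", "abc"))
```

The first call reports that the string does contain "abc", matching the PHP result; the second string has no "abc", so the pattern runs to the end and matches.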

A regular expression that matches strings containing "abc" but not "xyz" can combine a positive lookahead with the pattern above:

preg_match("/^(?=.*abc)((?!xyz).)*$/is", $str);
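One way to express "contains abc but does not contain xyz" is a positive lookahead for "abc" followed by the character-by-character exclusion of "xyz". The following Python sketch is an assumption about the intent, not the article's exact expression; the function name is hypothetical.

```python
import re

# (?=.*abc) requires "abc" somewhere; ((?!xyz).)* forbids "xyz" everywhere.
HAS_ABC_NO_XYZ = re.compile(r"^(?=.*abc)((?!xyz).)*$", re.IGNORECASE | re.DOTALL)

def has_abc_without_xyz(text):
    """Return True if text contains "abc" and does not contain "xyz"."""
    return HAS_ABC_NO_XYZ.search(text) is not None

for sample in ["abc123", "abcxyz", "123", "xyzabc"]:
    print(sample, has_abc_without_xyz(sample))
```

Only "abc123" satisfies both conditions; the other samples either lack "abc" or contain "xyz".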

The negative lookahead method is effective. I use it in the following form, with the excluded string placed inside the (?!) group:

(?:(?!).|\n)*?    // matches text that does not contain the excluded string

In the end, however, this method turned out to be extremely inefficient. It is acceptable for very short texts (where the part to be matched is a dozen or at most a few dozen characters long), but it should not be used to analyze large articles or when many places need to be matched. Consider other approaches instead: for example, first extract the text with a regular expression, then check separately whether it contains the string. Regular expressions are not very efficient at matching text segments that do not contain a specific string.
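The two-step alternative can be sketched as follows: a simple regular expression extracts the field of interest, and a plain substring test does the exclusion. This is a minimal Python sketch; the helper name url_lacks and the assumption that the request path follows "GET " and contains no spaces are mine, not the article's.

```python
import re

log_lines = [
    "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
    "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
    "2009-07-08 04:38:44 127.0.0.1 GET /",
]

# Step 1: a cheap pattern that only extracts the request path.
REQUEST_PATH = re.compile(r"GET (\S+)")

def url_lacks(line, needle):
    """Return True if the line has a request path that does not contain needle."""
    m = REQUEST_PATH.search(line)
    # Step 2: an ordinary substring test instead of a negative lookahead.
    return m is not None and needle not in m.group(1)

kept = [line for line in log_lines if url_lacks(line, "robots.txt")]

for line in kept:
    print(line)
```

The substring test is a single linear scan, so it avoids re-evaluating a lookahead at every character position.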

