How to solve the ReDOS problem 04/15 Update SLTechnology News&Howtos

How to solve the ReDOS problem

2025-04-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Network Security >

Shulou(Shulou.com)05/31 Report--

How to solve the ReDOS problem, many novices are not very clear, in order to help you solve this problem, the following editor will explain for you in detail, people with this need can come to learn, I hope you can gain something.

Overview of 0x01

In fact, when I was doing a Code Review project, I found a problem with ReDos, but due to laziness, I didn't record it. I'm more free recently, so take some time to record it.

0x02 knowledge foreshadowing

The so-called ReDOS (Regular expression Denial of Service) regular expression denial of service attack. In fact, developers use regular expressions to verify the validity of the data entered by users. When the regular expressions written for verification are flawed or not rigorous, attackers can construct special strings to consume a lot of server system resources, causing server services to be interrupted or stopped.

What a regular expression is, in fact, we should all know, or have used, but there are some things to mention here is the engine of regular expressions. First of all, a concept is involved here: finite state automata.

(FSM "finite state machine" or FSA "finite state automaton") is a computing model abstracted for the study of computing processes with limited memory and some language classes. Finite state automata have a finite number of states, each state can be migrated to zero or more states, and the input string determines which state to perform the transition.

Finite state automata can also be divided into deterministic and uncertain automata, and uncertain finite state automata can be transformed into deterministic finite state automata. Whisper bb, that's not the point. The key is that regular expression engines are divided into two categories: one is called DFA (deterministic finite state automata), and the other is called NFA (non-deterministic finite state automata).

For both types of engines to work smoothly, there must be a regular formula and a text string, one pinched in the hand and the other eaten. DFA pinches the text string to compare the regular, sees a sub-regular, marks all the possible matching strings, then looks at the next part of the regular, and updates the annotation according to the new matching result. On the other hand, NFA compares the text with the regular formula, eats a character, compares it with the regular formula, and writes down the match: "it matches somewhere in a certain year, a certain month and a certain day!" And then move on. Once it doesn't match, spit out the character you just ate and spit it out one by one until you get back to the last match.

There is actually a fundamental difference between the two engines:

DFA for each character in the text string only needs to be scanned once, faster, but less features; NFA to eat characters over and over again, spit characters, slow, but rich in features, so it is widely used, today's main regular expression engines, such as Perl, Ruby, Python's re module, Java and .NET regex libraries, are NFA.

Only NFA supports features such as lazy and backreference

NFA is eager to ask for credit, so the leftmost regular first matches successfully, so he occasionally misses the best matching results; DFA is the "longest left regular first matching success".

NFA uses greedy quantifiers by default (see item 4)

NFA may fall into the trap of recursive calls and perform poorly.

Here I quote the examples of others, because I think it is very good.

Now there is a text: perlmanbook, a regular expression is / perl | perlman/, and we can see the matching results of this regular expression under the two engines.

First, NFA, which is regular-oriented, how to say, at this time we have the first sub-regular expression / perl/, and the engine scans the perlmanbook string, starting from the left, when the progress reaches perlmanbook, the first part of the perl has already matched with the first sub-regular expression, and when the engine progress scans to the m character, it is found that it does not match the first sub-regular expression. So spit out m, report that it successfully matches perl, don't care about anything else, and don't try the second sub-regular / perlman/.

In the case of the DFA engine, it is text-oriented and scans from the left for the regularity. when / p / is scanned, it finds that the match is on, and then continues to match down. when the first sub-regular / perl/ is fully matched, it will throw the regularity away and eat the / p / of the second sub-regular. I ate it well, because it matched again, so I went on to eat. Until he finished eating the regular formula, he reported up contentedly that he had successfully matched 'perlman'.

In other words, we can sum up that the NFA engine has to eat and spit characters over and over again, is slow, and may fall into recursive danger, resulting in extremely poor performance. So regular expressions using the NFA engine may have ReDOS problems.

Sample 0x03 vulnerabilities

The types of languages currently supported by regular engines:

Engine type programs DFAawk (most versions), egrep (most versions), flex, lex, MySQL, Procmail traditional NFAGNU Emacs, Java, grep (most versions), less, more, .NET language, PCRE library, Perl, PHP (all three sets of regular libraries), Python, Ruby, set (most versions), viPOSIX NFAmawk, Mortice Lern System's utilities, GUN Emacs (explicitly specified) DFA/NFA mix GNU awk, GNU grep/egrep, Tcl

Let's look at a demo.

Briefly explain what this ^ (a +) + $regular does:

^: match the beginning of the input string, that is, the match starts at the beginning of the str_list string below.

A: it is the character a, which is not explained here.

+: "+" (lazy) repeat one or more times, for example, "aaaaaaaa" matches all the an in the string.

$:\ $matches the end of a line or string.

So let's verify it through a piece of code.

According to the PPT of Regular Expression Denial of Service, it is summarized as follows:

Repetitive grouping construction

Repetition should appear in the group: 1. Repeat, 2. Alternating overlap

A sample regular expression for ReDOS appears.

(a +) +

([a-zA-Z] +) *

(a | aa) +

(a | a?) +

(. * a) {x} | for x > 10

Payload: "aaaaaaaaaaaaaaaaaaX"

Of course, we can also use python's re module to verify the existence of ReDOS, because Python's re module uses NFA's engine.

This PPT provides some regular expressions that may exist in ReDOS in business scenarios.

Let's show some defect patterns that will be used in real-world business scenarios.

Person Name:

Regex: ^ [a-zA-Z] + (([\'\,\.\ -] [a-zA-Z])? [a-zA-Z] *) * $

Payload: aaaaaaaaaaaaaaaaaaaaaaaaaaaa!

Java Classname

Regex: ^ (([amerz]) +.) + [Amurz] ([amelz]) + $

Payload: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!

Email Validation

Regex: ^ ([0-9a-zA-Z] ([-.\ w] * [0-9a-zA-Z]) * @ (([0-9a-zA-Z]) + ([-\ w] * [0-9a-zA-Z]) *\.) + [a-zA-Z] {2jue 9}) $

Payload: a@aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!

Multiple Email address validation

Regex: ^ [a-zA-Z] + (([[\'\,\.\ -] [a-zA-Z])? [a-zA-Z] *) *\ sroommate; (\ w [-. _\ w] *\ w @\ w [-. _\ w] *\ w\.\ w {2jue 3}) & ampgt;$ | ^ (\ w [-. _\ w] *\ w @\ w [-. _\ w] *\ w.\ w {2jue 3}) $

Payload: aaaaaaaaaaaaaaaaaaaaaaaa!

Decimal validator

Regex: ^\ d * [0-9] (|.\ d * [0-9] |) * $

Payload: 1111111111111111111111111!

Pattern Matcher

Regex: ^ ([a-z0-9] + ([\-a-z0-9] * [a-z0-9] +)?\.) {0,} ([a-z0-9] + ([\-a-z0-9] * [a-z0-9] +)?) {1 a-z0 63} (\. [a-z0-9] {2cogn7}) + $

Payload: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!

We can use the following python code to assist with verification.

#! / usr/bin/env python# coding: utf-8import reimport times1 = time.time () regex = re.compile ('^ [a-zA-Z] + (([\'\,\.\ -] [a-zA-Z])? [a-zA-Z] *) * $') stringing Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Define separately

Stringing aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Stringing aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

The final result

Aaaaaaaaaaaaaaaaa!0.0207aaaaaaaaaaaaaaaaaaa!0.0776

Before, there was a prefwaf in Master Ph's code-breaking question, which actually involves this problem. First of all, the source code is like this.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.