Introduction to and Implementation of PHP Automated White-Box Audit Technology

Updated 2025-07-11. Source: SLTechnology News&Howtos (Shulou.com), Network Security section; reported 06/01.

This article introduces PHP automated white-box audit technology and describes one implementation of it.

0x00 Preface

There are few published technical materials on automated PHP auditing in China. By contrast, some excellent automated audit tools already exist abroad; RIPS, for example, performs a series of code analyses based on token streams. Traditional static analysis techniques such as data flow analysis and taint propagation analysis are rarely applied to dynamic scripting languages like PHP, yet they are key to implementing automated white-box auditing. This article presents the author's recent research and implementation results, in the hope that more domestic security researchers will devote their energy to the meaningful field of automated PHP auditing.

0x01 Basic knowledge

There are many ways to implement automated auditing. The simplest is to match locations directly against a library of regular expression rules, but its accuracy is the lowest. The most reliable approach is to design the tool around established static analysis techniques. The workflow of a typical static analysis security tool generally takes the form shown in the following figure:

The first thing to do in static analysis is to model the source code. Put simply, the source code, which is just a string, is converted into an intermediate representation that is convenient for subsequent vulnerability analysis, that is, a set of data structures that represent the code. Modeling generally borrows methods from compiler technology, such as lexical analysis to generate tokens, abstract syntax tree generation, and control flow graph generation. The quality of the modeling directly affects the subsequent taint propagation analysis and data flow analysis.
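
As a small illustration of the lexical analysis step, the sketch below uses PHP's built-in tokenizer (`token_get_all` and `token_name`) to turn a source string into a token stream. It is not taken from the tool described here; the sample source string and the `CHAR` label for single-character tokens are assumptions made for the example.

```php
<?php
// Lexical analysis sketch: turn PHP source text into a token stream
// with the built-in tokenizer extension.
$source = '<?php $cmd = $_GET["cmd"]; exec($cmd);';

$tokens = [];
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array tokens are [id, text, line]; whitespace is skipped for readability.
        if ($token[0] === T_WHITESPACE) {
            continue;
        }
        $tokens[] = [token_name($token[0]), $token[1]];
    } else {
        // Single-character tokens such as ';' or '(' are plain strings.
        // 'CHAR' is a label invented for this example.
        $tokens[] = ['CHAR', $token];
    }
}

foreach ($tokens as [$name, $text]) {
    echo $name, "\t", $text, "\n";
}
```

A token stream like this is the lowest-level model; an AST and a control flow graph add the structure that data flow analysis needs.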

Performing analysis means combining security knowledge to analyze the loaded code for vulnerabilities. Finally, the static analysis tool must generate a verdict to end this phase of work.

0x02 Implementation approach

After a period of effort, the author and his partners have roughly implemented an automated static analysis tool. The implementation builds on static analysis techniques; for a deeper understanding of the approach, see the previous article. In the tool, the automated audit process is as follows:

First, load all PHP files in the project directory that the user wants scanned, and classify them. If a scanned PHP file is a Main file, that is, a PHP file that actually handles user requests, analyze it for vulnerabilities. If it is not a Main file, for example a class definition or utility function definition file in the project, skip the analysis.

Next, collect global data, focusing on the class definitions in the project under scan: the file path of each class, its properties, its methods and their parameters, and so on. At the same time, a file summary is generated for each file. The summary focuses on the information in each assignment statement, along with the sanitization and encoding information of the variables involved.

After global initialization, the compiler front-end module runs: the open source tool PHP-Parser builds the abstract syntax tree (AST) of the PHP code under analysis. On top of the AST, a control flow graph (CFG) is constructed with a CFG construction algorithm, and summary information for each basic block is generated as the graph is built.

During the compiler front-end work, when a call to a sensitive function is found, the tool pauses to perform taint propagation analysis, using interprocedural and intraprocedural analysis to locate the corresponding tainted data. Then, based on the information collected during data flow analysis, the sanitization and encoding information is evaluated to decide whether the code is vulnerable.

If the previous step identifies vulnerable code, control passes to the vulnerability reporting module, which collects the vulnerable code segments. It works by maintaining a singleton result-set context object in the system environment; each vulnerability record that is generated is added to the result set. After the whole project has been scanned, Smarty outputs the result set to the front end, which visualizes the scan results.
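
The singleton result set mentioned above might look roughly like the following. This is only a sketch: the class name `ResultContext`, its methods, and the record fields are hypothetical and are not taken from the tool itself.

```php
<?php
// Hypothetical singleton result set; all names here are illustrative.
final class ResultContext
{
    private static ?ResultContext $instance = null;

    /** @var array<int, array{type: string, file: string, line: int}> */
    private array $records = [];

    // A private constructor prevents creating a second instance with "new".
    private function __construct()
    {
    }

    public static function getInstance(): ResultContext
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function addRecord(string $type, string $file, int $line): void
    {
        $this->records[] = ['type' => $type, 'file' => $file, 'line' => $line];
    }

    public function getRecords(): array
    {
        return $this->records;
    }
}

// Any analysis module can append to the same shared result set.
ResultContext::getInstance()->addRecord('SQLI', 'index.php', 12);
ResultContext::getInstance()->addRecord('XSS', 'view.php', 40);
echo count(ResultContext::getInstance()->getRecords()), " records\n";
```

Because every module reaches the same instance, the reporting step can read all records once the scan finishes without passing the result set through every call.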

0x03 Initialization work

In a real PHP audit, when we encounter a call to a sensitive function such as mysql_query, we instinctively analyze its parameters by hand to see whether they are controllable. In practice, many CMSs wrap their database query methods, for example in a class such as MysqlDB, to make calls convenient and keep the program logic clear. In that case we no longer search for the mysql_query keyword during the audit; instead we look for calls such as db->getOne.

So the question is: when an automated program performs the analysis, how does it know that db->getOne is a method of the database access class?

This requires collecting all classes and their defined methods across the whole project at an early stage of automated analysis, so that the program can find the method body it needs to follow during analysis.

The collection of class and method information should be completed as part of framework initialization and stored in the singleton context.
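
As a rough idea of what collecting class and method names involves, the sketch below scans a source string with PHP's built-in tokenizer instead of PHP-Parser, so that it stays self-contained. The function name `collectMethods` is hypothetical, and the code assumes that a class extends to the end of the file and that no comments sit between a keyword and the name that follows it.

```php
<?php
// Sketch: gather "Class::method" names from PHP source with the tokenizer.
// Simplification: a class is assumed to extend to the end of the file.
function collectMethods(string $source): array
{
    // Drop whitespace so "class Foo" and "function bar" become adjacent tokens.
    $tokens = array_values(array_filter(
        token_get_all($source),
        fn($t) => !is_array($t) || $t[0] !== T_WHITESPACE
    ));

    $methods = [];
    $class = null;
    foreach ($tokens as $i => $token) {
        $next = $tokens[$i + 1] ?? null;
        // Only "keyword followed by an identifier" pairs are of interest.
        if (!is_array($token) || !is_array($next) || $next[0] !== T_STRING) {
            continue;
        }
        if ($token[0] === T_CLASS) {
            $class = $next[1];
        } elseif ($token[0] === T_FUNCTION) {
            $methods[] = ($class === null ? '' : $class . '::') . $next[1];
        }
    }
    return $methods;
}

$source = '<?php class MysqlDB {
    public function getOne($sql) { return mysql_query($sql); }
    public function getAll($sql) { return mysql_query($sql); }
}';
print_r(collectMethods($source));
```

With such a table in the context, a call like db->getOne can be resolved to a method body that the analysis can follow.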

It is also necessary to identify whether a PHP file under analysis actually handles user requests, because in some CMSs the wrapper classes, such as database or file operation classes, are generally written in separate files. Taint propagation analysis is meaningless for these files, so they must be identified during framework initialization. The principle is simple: compute the proportion of call-type statements to definition-type statements and classify the file against a threshold. The error rate is very small.
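
The ratio test can be approximated at the token level. The sketch below is an assumption-laden stand-in for the statement-level analysis described above: the function name `isMainFile`, the way calls and definitions are counted, and the 0.5 threshold are all hypothetical.

```php
<?php
// Heuristic sketch: a file dominated by calls is treated as a "Main" file;
// one dominated by class/function definitions is not.
// The 0.5 threshold is an illustrative assumption.
function isMainFile(string $source, float $threshold = 0.5): bool
{
    $tokens = array_values(array_filter(
        token_get_all($source),
        fn($t) => !is_array($t) || $t[0] !== T_WHITESPACE
    ));

    $definitions = 0;
    $calls = 0;
    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            continue;
        }
        if ($token[0] === T_CLASS || $token[0] === T_FUNCTION) {
            $definitions++;
        } elseif ($token[0] === T_STRING && ($tokens[$i + 1] ?? null) === '(') {
            $prev = $tokens[$i - 1] ?? null;
            // "function name(" is a definition, not a call.
            if (!is_array($prev) || $prev[0] !== T_FUNCTION) {
                $calls++;
            }
        }
    }

    $total = $definitions + $calls;
    return $total > 0 && $calls / $total >= $threshold;
}

$library = '<?php class MysqlDB { function getOne($sql) { return mysql_query($sql); } }';
$page = '<?php require "db.php"; $row = $db->getOne($_GET["id"]); echo $row["title"];';

var_dump(isMainFile($library));
var_dump(isMainFile($page));
```

Counting calls inside method bodies, as this sketch does, is cruder than counting statements, so a real threshold would need tuning against actual projects.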

Finally, a summary operation is performed on each file. The purpose of this step is to support inter-file analysis when require, include, and similar statements are encountered later. The summary mainly collects variable assignment, variable encoding, and variable sanitization information.

0x04 User function processing

Common web vulnerabilities are generally caused by dangerous parameters that users can control. These are called taint-style vulnerabilities; SQLi and XSS are typical examples. Some PHP built-ins are inherently dangerous, such as echo, which can cause reflected XSS. In real code, however, developers often do not call such built-ins directly but wrap them in custom functions, such as:

    function myexec($cmd)
    {
        exec($cmd);
    }

In the implementation, our processing flow is as follows:

Locate the corresponding method code segment using the context information obtained during initialization.

Analyze this code segment and find the dangerous function (here, exec).

Locate the dangerous parameter of the dangerous function (here, cmd).

If no sanitization information is encountered during the analysis, the parameter can carry taint. The parameter cmd is then mapped to the user function myexec, and this user-defined function is stored in the context structure as a dangerous function.

Return recursively and start the taint analysis process.

To sum up, we follow the corresponding class methods, static methods, and functions, and check these code segments for calls to dangerous functions and parameters. The PHP built-in dangerous functions and their parameter positions are all listed in a configuration file. If such a function and parameter are found and the dangerous parameter is not filtered, the enclosing user-defined function is recorded as a user-defined dangerous function. When a call to one of these functions is found in subsequent analysis, taint analysis starts immediately.
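
The flow above can be sketched on a simplified model in which each user function is already summarized as its parameters, the calls it makes, and the parameters it sanitizes. That summary format, the function name `findUserSinks`, and the sample functions `run` and `safeQuery` are assumptions for illustration. The sketch also repeats a pass until nothing changes rather than recursing, which is a substitution for the recursive follow-up described in the text.

```php
<?php
// Sketch: promote user functions that pass a parameter, unsanitized, into a
// known dangerous function. The summary format is hypothetical.
function findUserSinks(array $builtinSinks, array $functions): array
{
    $sinks = $builtinSinks; // function name => dangerous argument positions
    do {
        $changed = false;
        foreach ($functions as $name => $fn) {
            foreach ($fn['calls'] as $call) {
                foreach ($sinks[$call['name']] ?? [] as $pos) {
                    $arg = $call['args'][$pos] ?? null;
                    $index = array_search($arg, $fn['params'], true);
                    // Skip arguments that are not parameters or are sanitized.
                    if ($index === false || in_array($arg, $fn['sanitized'], true)) {
                        continue;
                    }
                    if (!in_array($index, $sinks[$name] ?? [], true)) {
                        $sinks[$name][] = $index;
                        $changed = true;
                    }
                }
            }
        }
    } while ($changed); // repeat so wrappers of wrappers are found too

    return array_diff_key($sinks, $builtinSinks);
}

// Built-in dangerous functions and parameter positions (sample configuration).
$builtin = ['exec' => [0], 'mysql_query' => [0]];
$functions = [
    'run'       => ['params' => ['input'], 'calls' => [['name' => 'myexec', 'args' => ['input']]], 'sanitized' => []],
    'myexec'    => ['params' => ['cmd'], 'calls' => [['name' => 'exec', 'args' => ['cmd']]], 'sanitized' => []],
    'safeQuery' => ['params' => ['id'], 'calls' => [['name' => 'mysql_query', 'args' => ['id']]], 'sanitized' => ['id']],
];

$userSinks = findUserSinks($builtin, $functions);
print_r($userSinks);
```

Here myexec is promoted because cmd reaches exec unfiltered, run is promoted on the next pass because it forwards its parameter to myexec, and safeQuery is left out because its parameter is sanitized.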

0x05 Handling variable sanitization and encoding

In a real audit, once we find that a dangerous parameter is controllable, we immediately check whether the programmer has effectively filtered or encoded the variable, in order to determine whether a vulnerability exists. Automated auditing follows the same line of thought. In the implementation, we first catalog and configure PHP's security functions. During analysis, we backtrack and collect the necessary sanitization and encoding information for each piece of data flow information, for example:

    $a = $_GET['a'];
    $a = intval($a);
    echo $a;
    $a = htmlspecialchars($a);
    mysql_query($a);

The snippet above looks a little odd, but it serves only as a demonstration. As the code shows, the variable a is sanitized by intval and htmlspecialchars, and this information is collected according to the configuration file. A backtracking pass is then performed to merge in the sanitization and encoding information from the lines above the current one. For example, at the third line the sanitization information of variable a contains only intval, but at the fifth line it must be merged into the list [intval, htmlspecialchars]. This is done by backtracking and collecting all data flow information in the preceding code.

One detail: when the user calls a pair of functions such as base64_encode and base64_decode on the same variable, the base64 encoding of that variable cancels out. Likewise, escaping followed by unescaping should cancel out. But if the calls are in the wrong order, or only the decode is performed, the result is quite dangerous.
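
The merging and cancellation rules can be sketched as two small helpers. The names `mergeSanitizers`, `applyEncoding`, and `INVERSE_OF`, as well as the list of inverse pairs, are hypothetical; a real configuration would be larger.

```php
<?php
// Hypothetical table: decoder => the encoder it undoes.
const INVERSE_OF = [
    'base64_decode' => 'base64_encode',
    'stripslashes'  => 'addslashes',
    'urldecode'     => 'urlencode',
];

// Merge sanitizers inherited from earlier lines with those on the current line.
function mergeSanitizers(array $inherited, array $current): array
{
    return array_values(array_unique(array_merge($inherited, $current)));
}

// Track encodings as a stack: a decode directly after its matching encode
// cancels it; anything else is recorded as-is.
function applyEncoding(array $stack, string $function): array
{
    $inverse = INVERSE_OF[$function] ?? null;
    if ($inverse !== null && end($stack) === $inverse) {
        array_pop($stack);
        return $stack;
    }
    $stack[] = $function;
    return $stack;
}

// Sanitization info for variable a at lines 3 and 5 of the snippet above.
$line3 = mergeSanitizers([], ['intval']);
$line5 = mergeSanitizers($line3, ['htmlspecialchars']);

$balanced = applyEncoding(applyEncoding([], 'base64_encode'), 'base64_decode');
$reversed = applyEncoding(applyEncoding([], 'base64_decode'), 'base64_encode');
$decodeOnly = applyEncoding([], 'base64_decode');

print_r([$line5, $balanced, $reversed, $decodeOnly]);
```

An encode followed by its decode leaves the stack empty, while a reversed order or a lone decode stays on the stack, where a later check can treat it as dangerous.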

0x06 Variable backtracking and taint analysis

1. Variable backtracking

To find the origin of the parameter (traceSymbol) of each dangerous sink, all basic blocks connected to the current block are traced backward as follows:

Loop through all entry edges of the current basic block, and look up the name of the unsanitized traceSymbol in the DataFlow attribute of the basic block.

Once found, replace it with the symbol it maps to, and copy all the sanitization and encoding information of that symbol. Tracing then continues along all entry edges.

Finally, the results for the different paths through the CFG are returned.

The algorithm stops when traceSymbol maps to a static object such as a static string or a number, or when the current basic block has no entry edge. If traceSymbol is a variable or an array, check whether it is a superglobal array.
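
A minimal sketch of this backtracking follows, assuming a CFG stored as an array of blocks, each with a list of predecessors (`preds`) and a `dataflow` map from a symbol to the symbol it was assigned from. That layout, the function name `traceBack`, and the `SUPERGLOBALS` list are hypothetical, and the sketch assumes an acyclic graph (a loop in the CFG would need a visited set).

```php
<?php
const SUPERGLOBALS = ['$_GET', '$_POST', '$_COOKIE', '$_REQUEST'];

// Follow every entry edge backward and return one result per CFG path.
function traceBack(array $blocks, string $blockId, string $symbol, array $sanitizers = []): array
{
    $results = [];
    foreach ($blocks[$blockId]['preds'] as $predId) {
        $name = $symbol;
        $collected = $sanitizers;
        $flow = $blocks[$predId]['dataflow'][$name] ?? null;
        if ($flow !== null) {
            // Replace the symbol with the one it maps to and copy its sanitizers.
            $collected = array_merge($collected, $flow['sanitizers']);
            $name = $flow['from'];
        }
        $isInput = in_array($name, SUPERGLOBALS, true);
        $isLiteral = $flow !== null && $flow['literal'];
        // Stop at user input, at a literal, or at a block with no entry edge.
        if ($isInput || $isLiteral || $blocks[$predId]['preds'] === []) {
            $results[] = ['origin' => $name, 'userInput' => $isInput, 'sanitizers' => $collected];
            continue;
        }
        $results = array_merge($results, traceBack($blocks, $predId, $name, $collected));
    }
    return $results;
}

// $id comes from $_GET; only the "then" branch applies intval before the sink.
$blocks = [
    'entry' => ['preds' => [], 'dataflow' => ['$id' => ['from' => '$_GET', 'literal' => false, 'sanitizers' => []]]],
    'then'  => ['preds' => ['entry'], 'dataflow' => ['$id' => ['from' => '$id', 'literal' => false, 'sanitizers' => ['intval']]]],
    'else'  => ['preds' => ['entry'], 'dataflow' => []],
    'sink'  => ['preds' => ['then', 'else'], 'dataflow' => []],
];

$paths = traceBack($blocks, 'sink', '$id');
print_r($paths);
```

Returning one result per path matters here: the path through the then branch carries intval, while the path through the else branch reaches $_GET with no sanitization at all.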

2. Taint analysis

Taint analysis begins during interprocedural analysis, when built-in and user-defined functions are processed. If a sensitive function call is encountered during program analysis, the dangerous parameter node is obtained by backtracking or from the context, and taint analysis starts. Put simply, its job is to judge whether the dangerous parameter may lead to a vulnerability. Taint analysis is implemented in TaintAnalyser. After the dangerous parameter is obtained, the specific steps are as follows:

First, look for the assignment of the dangerous parameter in the current basic block and check whether the right-hand node of the DataFlow contains a user input source, such as the superglobal arrays $_GET and $_POST. Plug-in classes for the different vulnerability types are used to determine whether these nodes are safe.

If the source is not found in the current basic block, analysis proceeds across multiple basic blocks within the file. First, obtain all predecessor basic blocks of the current basic block; these may belong to parallel structures (if / else if / else) or non-parallel structures (ordinary statements). Dangerous-variable analysis is performed on each of them. When the basic block currently being processed has no predecessor, the algorithm ends.

If no vulnerability is found in the inter-block analysis, a final inter-file analysis is performed: the summaries of the files included before the current basic block are traversed to reach a verdict.

If a vulnerability is found in the steps above, control passes to the vulnerability reporting module. Otherwise, the system continues with the code analysis.
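
The per-type safety decision can be pictured as a lookup of which sanitizers are considered sufficient for each vulnerability type. This is only a sketch: the `SAFE_FOR` table, the function name `isVulnerable`, and the shape of the path record are hypothetical, and the sanitizer lists are sample configuration rather than a complete or authoritative set.

```php
<?php
// Hypothetical plug-in configuration: sanitizers treated as sufficient
// for each vulnerability type.
const SAFE_FOR = [
    'SQLI' => ['intval', 'addslashes', 'mysql_real_escape_string'],
    'XSS'  => ['intval', 'htmlspecialchars'],
];

// A path is reported when it starts at user input and none of the
// sanitizers collected along it is sufficient for the given type.
function isVulnerable(string $type, array $path): bool
{
    if (!$path['userInput']) {
        return false;
    }
    return array_intersect($path['sanitizers'], SAFE_FOR[$type] ?? []) === [];
}

$sanitized = ['origin' => '$_GET', 'userInput' => true, 'sanitizers' => ['intval', 'htmlspecialchars']];
$htmlOnly  = ['origin' => '$_GET', 'userInput' => true, 'sanitizers' => ['htmlspecialchars']];
$constant  = ['origin' => 'literal', 'userInput' => false, 'sanitizers' => []];

var_dump(isVulnerable('SQLI', $sanitized));
var_dump(isVulnerable('SQLI', $htmlOnly));
var_dump(isVulnerable('XSS', $htmlOnly));
var_dump(isVulnerable('SQLI', $constant));
```

Keeping the decision per type is what lets the same data flow be safe for one sink and unsafe for another: htmlspecialchars alone is accepted for an echo sink in this sample configuration but not for a mysql_query sink.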

0x07 Current results

We ran a test scan of simple-log_v1.3.12, with the following results:

Total: 76 XSS: 3 SQLI: 62 INCLUDE: 5 FILE: 3 FILEAFFECT: 1

The test code contains some obvious vulnerabilities and does not use an MVC framework. Issues such as character truncation or escape characters being swallowed are beyond what the current technique can support, although some can still be detected. During testing, bugs emerged one after another, mainly because many syntax structures and test cases were not considered in the early implementation, and because the algorithms are almost all recursive, which easily brings Apache down.

So the current code is really only an experiment. Making it robust would require countless refactorings and a large amount of testing, and the author does not have much time to maintain it.
