How to understand PHP Kernel Exploration: the principle of Hash Table collision attack 07/13 Update SLTechnology News&Howtos

How to understand PHP Kernel Exploration: the principle of Hash Table collision attack

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article introduces the knowledge of "how to understand PHP kernel exploration: the principle of hash table collision attack". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Recently, the topic of hash table collision attack (Hashtable collisions as DOS attack) has been mentioned constantly, and various languages have been attacked one after another. In this paper, combined with the PHP kernel source code, talk about the principle and implementation of this attack.

The basic principle of hash table collision attack

Hash table is a data structure with high search efficiency, and many languages implement hash table internally. The hash table in PHP is a very important data structure, which is not only used to represent the Array data type, but also used to store context information within the Zend virtual machine (the variables and functions of the execution context are stored in the hash table structure).

Ideally, the time complexity of hash table insert and lookup operations is O (1). Any data item can calculate a hash value (key) in a time independent of the length of the hash table, and then locate a bucket in constant time (the term bucket refers to a location in the hash table). Of course, this is ideal, because the length of any hash table is limited, so there must be cases where different data items have the same hash value, and different data items are defined to the same bucket, called collision. The implementation of the hash table needs to solve the collision problem, and there are generally two ways to solve the collision problem. The first is to define the collided data to other buckets according to certain principles, such as linear detection-- if the data collides during insertion, then sequentially look for the bucket behind this bucket and put it into the first bucket that is not used. The second strategy is that each bucket is not a location that can hold only a single data item, but a data structure that can hold multiple data (such as a linked list or a red-black tree), and all colliding data is organized in the form of some data structure.

No matter which collision resolution strategy is used, the time complexity of insert and search operations is no longer O (1). Take lookup as an example, it cannot end when you locate the bucket through key. You must also compare whether the original key (that is, the key before hashing) is equal. If not, continue searching using the same algorithm as insert, until you find a matching value or confirm that the data is not in the hash table.

PHP uses a single linked list to store collision data, so in fact, the average search complexity of the PHP hash table is O (L), where L is the average length of the bucket linked list; and the worst complexity is O (N), when all data collide and the hash table degenerates to a single linked list. The following diagram shows the normal hash table and the degenerate hash table in PHP.

The hash table collision attack is to make all the data collide and artificially turn the hash table into a degenerated single linked list by carefully constructing the data. At this time, the time of various operations of the hash table is increased by an order of magnitude, so it consumes a lot of CPU resources, resulting in the system can not respond to requests quickly, thus achieving the purpose of denial of service attack (DoS).

As you can see, the premise of a hash collision attack is that the hash algorithm is very easy to find the collision, and if it's MD5 or SHA1, it's almost impossible. Fortunately (and unfortunately) the hash algorithms used by most programming languages are very simple (for efficiency reasons), so you can easily construct attack data. The next section will analyze the Zend-related kernel code to find out how to attack the hash table collision attack PHP.

Internal implementation data structure of Zend hash table

A structure called Backet is used in PHP to represent buckets, and all buckets with the same hash are organized into a single linked list. The hash table is represented by the HashTable structure. The relevant source codes are under zend/Zend_hash.h:

Typedef struct bucket {ulong h; / * Used for numeric indexing * / uint nKeyLength; void * pData; void * pDataPtr; struct bucket * pListNext; struct bucket * pListLast; struct bucket * pNext; struct bucket * pLast; char arKey [1]; / * Must be last element * /} Bucket;typedef struct _ hashtable {uint nTableSize; uint nTableMask; uint nNumOfElements; ulong nNextFreeElement; Bucket * pInternalPointer; / * Used for element traversal * / Bucket * pListHead; Bucket * pListTail; Bucket * * arBuckets Dtor_func_t pDestructor; zend_bool persistent; unsigned char nApplyCount; zend_bool bApplyProtection;#ifZEND_DEBUG int inconsistent;#endif} HashTable

The field name clearly indicates its purpose, so it is not explained too much. Focus on the following fields: the "h" in Bucket is used to store the nTableMask in the original key;HashTable is a mask, which is generally set to nTableSize-1, which is closely related to the hash algorithm, which will be discussed in detail later in the hash algorithm; arBuckets points to an array of pointers, where each element is a header pointer to the Bucket linked list.

Hash algorithm

The minimum capacity of the PHP hash table is 8 (2 ^ 3), the maximum capacity is 0 × 80000000 (2 ^ 31), and is rounded to the integer power of 2 (that is, the length automatically expands to the integer power of 2, for example, the hash table length of 13 elements is 16 × 100 elements). NTableMask is initialized to the hash table length (rounded) minus 1. The specific code is in the _ zend_hash_init function of zend/Zend_hash.c, which intercepts the relevant parts of this article and adds a small amount of comments.

ZEND_API int_zend_hash_init (HashTable * ht, uintnSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC) {uinti = 3; Bucket * * tmp; SET_INCONSISTENT (HT_OK); / / Integer if (nSize > = 0x80000000) {/ * prevent overflow * / ht- > nTableSize = 0x80000000;} else {while ((1U nTableSize = 1nTableMask = ht- > nTableSize-1; / * some code is omitted here … * / returnSUCCESS;}

It is worth mentioning that the method of rounding PHP to the integer power of 2 is very clever and can be memorized and used when needed.

Zend HashTable's hashing algorithm is extremely simple:

The copy code is as follows:

Hash (key) = key&nTableMask

That is, simply bit by bit the original key of the data and the nTableMask of the HashTable.

If the original key is a string, the string is first transformed into an integer using the Times33 algorithm, and then bitwise with nTableMask.

The copy code is as follows:

Hash (strkey) = time33 (strkey) & nTableMask

Here is the code to find the hash table in the Zend source code:

ZEND_API int zend_hash_index_find (constHashTable * ht, ulong h, void * * pData) {uint nIndex; Bucket * p; IS_CONSISTENT (ht); nIndex = h & ht- > nTableMask; p = ht- > arBuckets [nIndex]; while (p! = NULL) {if ((p-> h = = h) & (p-> nKeyLength = = 0)) {* pData = p-> pData; returnSUCCESS;} p = p-> pNext;} returnFAILURE } ZEND_API int zend_hash_find (constHashTable * ht, constchar * arKey, uint nKeyLength, void * * pData) {ulong h; uint nIndex; Bucket * p; IS_CONSISTENT (ht); h = zend_inline_hash_func (arKey, nKeyLength); nIndex = h & ht- > nTableMask; p = ht- > arBuckets [nIndex] While (p! = NULL) {if ((p-> h = = h) & & (p-> nKeyLength = = nKeyLength)) {if (! memcmp (p-> arKey, arKey, nKeyLength)) {* pData = p-> pData; returnSUCCESS;}} p = p-> pNext;} returnFAILURE;}

Where zend_hash_index_find is used to find the case of the integer key and zend_hash_find is used to find the string key. The logic is basically the same, but the string key will be converted to an integer through zend_inline_hash_func key,zend_inline_hash_func encapsulates the times33 algorithm, the specific code will not be posted.

Attack basic attack

Knowing the algorithm of PHP internal hash table, you can use its principle to construct data for attack. One of the easiest ways to create collisions is to use masking rules. As mentioned above, the length nTableSize of Zend HashTable will be rounded to the integer power of 2. Suppose we construct a hash table of 2 ^ 16, then the binary representation of nTableSize is: 1 0000 0000 0000, and nTableMask = nTableSize-1 is: 0 1111 1111 1111. Next, with 0 as the initial value and 2 ^ 16 as the step size, enough data can be produced to speculate as follows:

0000 0000 0000 & 0 1111 1111 1111 = 0

0001 0000 0000 0000 & 0 1111 1111 1111 = 0

0010 0000 0000 0000 & 0 1111 1111 1111 = 0

0011 0000 0000 0000 & 0 1111 1111 1111 = 0

0100 0000 0000 0000 & 0 1111 1111 1111 = 0

……

Generally speaking, as long as the last 16 bits are guaranteed to be 0, all the hashes obtained after the mask will collide at position 0.

Here is a piece of attack code written using this principle:

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.