How to use IDAPython to find vulnerabilities

2025-07-09 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Today, I will talk about how to use IDAPython to find vulnerabilities, a topic many people may not know much about. To help you understand it better, the editor has put together the following summary. I hope you get something out of this article.

Overview

IDAPython is a powerful tool for automating tedious and complex reverse engineering tasks. While many articles cover using IDAPython to simplify basic reverse engineering, few discuss using it to audit binaries for vulnerabilities. Since this is not a new idea (Halvar Flake published an article on automating vulnerability research with IDA scripting in 2001), it is a bit surprising that more has not been written on the topic. This may be due in part to the increasing complexity of exploitation on modern operating systems. However, it is still valuable to be able to automate parts of the vulnerability research process.

We will begin by showing how to use basic IDAPython techniques to detect dangerous code that often leads to stack buffer overflows. I'll use the "ascii_easy" binary from http://pwnable[.]kr as the target for automatically detecting a basic stack buffer overflow. Although this binary is small enough to reverse manually, it is a good example, and the same IDAPython techniques apply to larger, more complex binaries.

Start

Before we start writing IDAPython, we must first determine what we want the script to look for. In this case, I chose a binary with one of the simplest types of vulnerabilities: a stack buffer overflow caused by using "strcpy" to copy a user-controlled string into a stack buffer. Now that we know what we are looking for, we can start thinking about how to find this type of vulnerability automatically.

To achieve this, we will divide it into two steps:

Find all function calls that may cause stack buffer overflows (in this case, "strcpy")

Analyze each of these calls to determine whether its usage meets our criteria (that is, whether it may result in an exploitable overflow)

Find function calls

To find all the calls to the "strcpy" function, we must first locate the "strcpy" function itself. This is easy to do with the IDAPython API. Using the following code, we can print out all the function names in the binary:

    for functionAddr in Functions():
        print(GetFunctionName(functionAddr))

Running this IDAPython script on the ascii_easy binary gives the following output. We can see that all the function names are printed in the output window of IDA Pro.

Next, we add code to filter the list of functions and find the "strcpy" function we are interested in. A simple string comparison does the job here. Because imported functions are often named slightly differently from one binary to the next (such as "strcpy" vs "_strcpy" in the sample program), it is best to check for a substring rather than an exact match.

Based on the previous code, we now have the following code:

    for functionAddr in Functions():
        if "strcpy" in GetFunctionName(functionAddr):
            print(hex(functionAddr))
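The difference between an exact and a substring comparison can be seen outside IDA with plain Python. This is only a sketch: the names below are made-up samples, not output from the ascii_easy binary.

```python
# Hypothetical symbol names, chosen only to illustrate the two comparisons.
names = ["main", "_strcpy", ".strcpy", "strcpy", "__strcpy_chk", "_strncpy", "_puts"]

# An exact comparison only finds the undecorated name.
exact = [name for name in names if name == "strcpy"]

# A substring comparison also finds the decorated variants.
substring = [name for name in names if "strcpy" in name]

print(exact)
print(substring)
```

The substring check is deliberately loose, so it can also pick up related wrappers (here the sample name "__strcpy_chk"). Each hit still needs to be reviewed before it is treated as a call to "strcpy" itself.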

Now that we have found the function we are looking for, we must determine all the locations where it is called. This involves several steps. First, we get all the cross-references to "strcpy", and then we examine each cross-reference to find out which ones are actual "strcpy" function calls. Putting all this together, we get the following code:

    for functionAddr in Functions():
        # Check each function to look for strcpy
        if "strcpy" in GetFunctionName(functionAddr):
            xrefs = CodeRefsTo(functionAddr, False)
            # Iterate over each cross-reference
            for xref in xrefs:
                # Check to see if this cross-reference is a function call
                if GetMnem(xref).lower() == "call":
                    print(hex(xref))

Running this script on the ascii_easy binary prints the address of every "strcpy" call in the binary. The results are as follows:

Function call analysis

Now, with the code above, we know how to get the addresses of all calls to "strcpy" in the program. Although the ascii_easy application contains only one call to "strcpy" (which also happens to be vulnerable), many applications contain a large number of calls to "strcpy", many of which are not vulnerable. We therefore need some way to analyze each call to "strcpy" and prioritize those that are more likely to be vulnerable.
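To feed each call into further analysis, the call-site search can be packaged as a function that returns the addresses instead of printing them. The sketch below is not part of the original script: the function name find_calls and its parameters are hypothetical, and the IDA lookups are passed in as callables so that the logic can be run without IDA against made-up data.

```python
def find_calls(functions, get_name, code_refs_to, get_mnem, needle="strcpy"):
    """Return the addresses of call instructions that reference any
    function whose name contains `needle`."""
    call_sites = []
    for func_addr in functions():
        if needle not in get_name(func_addr):
            continue
        # Keep only the cross-references that are actual call instructions
        for xref in code_refs_to(func_addr, False):
            if get_mnem(xref).lower() == "call":
                call_sites.append(xref)
    return call_sites


# Stand-in data (made up for this sketch) so the function runs outside IDA.
fake_names = {0x1000: "main", 0x2000: "_strcpy"}
fake_xrefs = {0x1000: [], 0x2000: [0x1010, 0x1040]}
fake_mnems = {0x1010: "call", 0x1040: "jmp"}

calls = find_calls(
    functions=lambda: sorted(fake_names),
    get_name=fake_names.get,
    code_refs_to=lambda addr, flow: fake_xrefs.get(addr, []),
    get_mnem=fake_mnems.get,
)
print([hex(c) for c in calls])
```

Inside IDA, the callables would be the API functions used in this article (Functions, GetFunctionName, CodeRefsTo, GetMnem). On IDA 7.x the corresponding names should be idautils.Functions, idc.get_func_name, idautils.CodeRefsTo and idc.print_insn_mnem; check them against the IDA version in use.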

A common feature of exploitable buffer overflows is that they often involve stack buffers. Although it is possible to exploit buffer overflows in the heap and elsewhere, stack buffer overflows are generally easier to exploit.

This involves some analysis of the destination argument of the strcpy function. We know that the destination is the first argument to strcpy, and we can find it in the disassembly of the function call. The following is a disassembly of the strcpy call.

When analyzing the above code, there are two ways to find the destination argument of the _strcpy function. The first relies on IDA Pro's automatic analysis, which annotates known function parameters. As we can see in the screenshot above, IDA Pro automatically detects the "dest" parameter of _strcpy and annotates it at the instruction that pushes it onto the stack.

Another simple way to detect function arguments is to walk backward through the assembly, starting from the function call, looking for "push" instructions. Each time we find one, we increment a counter until we reach the index of the argument we are looking for. In this case, since the "dest" argument happens to be the first argument, the search stops at the first "push" instruction before the function call.

In both cases, when we traverse the code backward, we must be careful to identify instructions that break sequential code flow. Instructions such as "ret" and "jmp" change the code flow, making it difficult to identify arguments accurately. In addition, we must make sure that we do not walk backward past the beginning of the current function. For now, we will simply detect non-sequential code flow while searching for arguments and stop the search if we find any.
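The backward walk and its stop conditions can be modelled in plain Python before touching the IDA API. This is a sketch under stated assumptions: each instruction is a hypothetical (mnemonic, operand) tuple, the list stands for a single function, and the sample call site is made up.

```python
def find_pushed_arg(instructions, call_index, arg_num, max_steps=100):
    """Walk backward from the call and return the operand of the
    arg_num-th push, or None if the walk has to stop early."""
    arg_count = 0
    index = call_index
    for _ in range(max_steps):
        index -= 1
        if index < 0:
            return None  # walked past the start of the function
        mnem, operand = instructions[index]
        if mnem in ("ret", "retn", "jmp", "b"):
            return None  # sequential code flow is broken, give up
        if mnem == "push":
            arg_count += 1
            if arg_count == arg_num:
                return operand
    return None


# Hypothetical call site for strcpy(dest, src): arguments are pushed
# right to left, so the last push before the call is the first argument.
listing = [
    ("lea", "eax, [ebp-0x24]"),
    ("push", "ebx"),      # src
    ("push", "eax"),      # dest
    ("call", "_strcpy"),
]

print(find_pushed_arg(listing, 3, 1))
print(find_pushed_arg(listing, 3, 2))
```

Returning None for every early exit keeps the caller simple: a missing operand just means the call needs manual review.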

We will use the second method to find arguments (looking for values that are pushed onto the stack). To help with this, we should create a helper function that walks backward from the address of the function call, tracks the arguments pushed onto the stack, and returns the operand corresponding to the specified argument.

So, for the call to _strcpy in ascii_easy shown above, our helper will return "eax", because the "eax" register holds the destination argument when it is pushed onto the stack as an argument to _strcpy. Using some basic Python together with the IDAPython API, we can build a function to do this, as shown below.

    def find_arg(addr, arg_num):
        # Get the start address of the function that we are in
        function_head = GetFunctionAttr(addr, idc.FUNCATTR_START)
        steps = 0
        arg_count = 0
        # It is unlikely the arguments are 100 instructions away, include this as a safety check
        while steps < 100:
            steps = steps + 1
            # Get the previous instruction
            addr = idc.PrevHead(addr)
            # Get the name of the previous instruction
            op = GetMnem(addr).lower()
            # Check to ensure that we haven't reached anything that breaks sequential code flow
            if op in ("ret", "retn", "jmp", "b") or addr < function_head:
                return
            if op == "push":
                arg_count = arg_count + 1
                if arg_count == arg_num:
                    # Return the operand that was pushed to the stack
                    return GetOpnd(addr, 0)

Using this helper function, we can determine that the "eax" register was used to hold the destination argument before the call to _strcpy. To determine whether eax points to a stack buffer when it is pushed onto the stack, we must now try to track where the value in "eax" came from. To do this, we use a search loop similar to the one in the previous helper function:

    # Assume _addr is the address of the call to _strcpy
    # Assume opnd is "eax"
    # Find the start address of the function that we are searching in
    function_head = GetFunctionAttr(_addr, idc.FUNCATTR_START)
    addr = _addr
    while True:
        _addr = idc.PrevHead(_addr)
        _op = GetMnem(_addr).lower()
        if _op in ("ret", "retn", "jmp", "b") or _addr < function_head:
            break
        elif _op == "lea" and GetOpnd(_addr, 0) == opnd:
            # We found the destination buffer, check to see if it is in the stack
            if is_stack_buffer(_addr, 1):
                print("STACK BUFFER STRCOPY FOUND at 0x%X" % addr)
                break
        # If we detect that the register that we are trying to locate comes from some other register
        # then we update our loop to begin looking for the source of the data in that other register
        elif _op == "mov" and GetOpnd(_addr, 0) == opnd:
            op_type = GetOpType(_addr, 1)
            if op_type == o_reg:
                opnd = GetOpnd(_addr, 1)
                addr = _addr
            else:
                break

In the code above, we search backward through the assembly for the instruction where the register holding the destination buffer gets its value. The code also performs a number of other checks, such as making sure that we have not searched past the start of the function and have not hit any instruction that could change the code flow. The code also tries to trace back the value of any other register that may be the source of the register we originally searched for. For example, the code attempts to account for the situation shown below.

    ...
    lea ebx, [ebp-0x24]
    ...
    mov eax, ebx
    ...
    push eax
    ...
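The register tracing can be tried outside IDA on exactly this lea/mov/push sequence. The following is a simplified model, not the script itself: instructions are hypothetical (mnemonic, destination, source) tuples, the walk stops at the first lea that writes the tracked register, and is_stack_operand is a made-up text heuristic that stands in for the IDA-based stack check.

```python
import re

# Assumption: operands written as "[ebp-0x24]" or "[esp+8]" are stack references.
STACK_REF = re.compile(r"^\[(ebp|esp)[+-](0x[0-9a-f]+|\d+)\]$", re.IGNORECASE)
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def is_stack_operand(text):
    """Text-based stand-in for the stack-variable check done through IDA."""
    return STACK_REF.match(text) is not None


def points_to_stack(instructions, call_index, reg):
    """Trace `reg` backward from the call and report whether its value
    was produced by a lea of a stack address."""
    index = call_index
    while True:
        index -= 1
        if index < 0:
            return False  # reached the start of the function
        mnem, dst, src = instructions[index]
        if mnem in ("ret", "retn", "jmp", "b"):
            return False  # sequential code flow is broken
        if mnem == "lea" and dst == reg:
            return is_stack_operand(src)
        if mnem == "mov" and dst == reg:
            if src in REGISTERS:
                reg = src  # keep tracing the register the value came from
            else:
                return False


listing = [
    ("lea", "ebx", "[ebp-0x24]"),
    ("mov", "eax", "ebx"),
    ("push", "eax", None),
    ("call", "_strcpy", None),
]
print(points_to_stack(listing, 3, "eax"))
```

Switching the tracked register on a register-to-register mov is what lets the walk follow eax back through ebx to the lea. Any other source, such as a memory load, ends the trace without a finding.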
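In addition, the code above references the function is_stack_buffer(). This function is the last piece of the script and is not defined in the IDA API. It is an additional helper function that we will write to help with our bug hunting. Its purpose is very simple: given the address of an instruction and the index of an operand, report whether the variable is a stack buffer. Although the IDA API does not provide this functionality directly, it does give us the means to check it in another way. By calling the get_stkvar function and checking whether the result is None or an object, we can effectively check whether the operand is a stack variable. Our helper function is shown in the code below:

    def is_stack_buffer(addr, idx):
        inst = DecodeInstruction(addr)
        return get_stkvar(inst[idx], inst[idx].addr) is not None

Note that the helper function above is not compatible with the IDA 7 API. In our next blog post, we will present a new way to check whether an argument is a stack buffer while maintaining compatibility with all recent versions of the IDA API.

Now we can put all of this into a single script, shown below, to find every instance of strcpy being used to copy data into a stack buffer. With this in place, we can extend these capabilities beyond strcpy to similar functions such as strcat, printf, etc. (see the Microsoft banned function list), and add further analysis to our script. The full version of the script can be found at the bottom of the article. Running the script successfully finds the vulnerable strcpy, as shown below.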

Script

    # These modules are normally preloaded by IDA; the imports make the script self-contained
    import idc
    from idaapi import *
    from idautils import *
    from idc import *

    def is_stack_buffer(addr, idx):
        inst = DecodeInstruction(addr)
        return get_stkvar(inst[idx], inst[idx].addr) is not None

    def find_arg(addr, arg_num):
        # Get the start address of the function that we are in
        function_head = GetFunctionAttr(addr, idc.FUNCATTR_START)
        steps = 0
        arg_count = 0
        # It is unlikely the arguments are 100 instructions away, include this as a safety check
        while steps < 100:
            steps = steps + 1
            # Get the previous instruction
            addr = idc.PrevHead(addr)
            # Get the name of the previous instruction
            op = GetMnem(addr).lower()
            # Check to ensure that we haven't reached anything that breaks sequential code flow
            if op in ("ret", "retn", "jmp", "b") or addr < function_head:
                return
            if op == "push":
                arg_count = arg_count + 1
                if arg_count == arg_num:
                    # Return the operand that was pushed to the stack
                    return GetOpnd(addr, 0)

    for functionAddr in Functions():
        # Check each function to look for strcpy
        if "strcpy" in GetFunctionName(functionAddr):
            xrefs = CodeRefsTo(functionAddr, False)
            # Iterate over each cross-reference
            for xref in xrefs:
                # Check to see if this cross-reference is a function call
                if GetMnem(xref).lower() == "call":
                    # Since the dest is the first argument of strcpy
                    opnd = find_arg(xref, 1)
                    function_head = GetFunctionAttr(xref, idc.FUNCATTR_START)
                    addr = xref
                    _addr = xref
                    while True:
                        _addr = idc.PrevHead(_addr)
                        _op = GetMnem(_addr).lower()
                        if _op in ("ret", "retn", "jmp", "b") or _addr < function_head:
                            break
                        elif _op == "lea" and GetOpnd(_addr, 0) == opnd:
                            # We found the destination buffer, check to see if it is in the stack
                            if is_stack_buffer(_addr, 1):
                                print("STACK BUFFER STRCOPY FOUND at 0x%X" % addr)
                                break
                        # If we detect that the register that we are trying to locate comes from some other register
                        # then we update our loop to begin looking for the source of the data in that other register
                        elif _op == "mov" and GetOpnd(_addr, 0) == opnd:
                            op_type = GetOpType(_addr, 1)
                            if op_type == o_reg:
                                opnd = GetOpnd(_addr, 1)
                                addr = _addr
                            else:
                                break

After reading the above, do you have a better understanding of how to use IDAPython to find vulnerabilities? If you want to learn more, please follow the industry information channel. Thank you for your support.

