
How to implement breakpoints in Linux Debugger

2025-01-19 Update From: SLTechnology News&Howtos


This article explains in detail how to implement source-level breakpoints in a Linux debugger. I found it very practical, so I am sharing it here as a reference, and I hope you get something out of it.

Series index

These links will go live as the later articles are released.

1. Preparing the environment
2. Breakpoints
3. Registers and memory
4. Elves and dwarves
5. Source code and signals
6. Source-level stepping
7. Source-level breakpoints
8. Call stack
9. Reading variables

DWARF

The "Elves and dwarves" part of this series described how DWARF debugging information works and how it can be used to map machine code back to high-level source code. Recall that DWARF contains the address ranges of functions and a line table that lets you translate code locations between abstraction levels. We will use these features to implement our breakpoints.

Function entry

Setting breakpoints on function names can get complicated once you consider overloading, member functions, and so on, but we will simply traverse all the compilation units and search for functions whose name matches the one we are looking for. The DWARF information looks like this:

    DW_TAG_compile_unit
      DW_AT_producer    clang version 3.9.1 (tags/RELEASE_391/final)
      DW_AT_language    DW_LANG_C_plus_plus
      DW_AT_name        /super/secret/path/MiniDbg/examples/variable.cpp
      DW_AT_stmt_list   0x00000000
      DW_AT_comp_dir    /super/secret/path/MiniDbg/build
      DW_AT_low_pc      0x00400670
      DW_AT_high_pc     0x0040069c

    LOCAL_SYMBOLS:
      DW_TAG_subprogram
        DW_AT_low_pc    0x00400670
        DW_AT_high_pc   0x0040069c
        DW_AT_name      foo
        ...
      DW_TAG_subprogram
        DW_AT_low_pc    0x00400700
        DW_AT_high_pc   0x004007a0
        DW_AT_name      bar
        ...

We want to match DW_AT_name and use DW_AT_low_pc (the starting address of the function) to set our breakpoint.

    void debugger::set_breakpoint_at_function(const std::string& name) {
        for (const auto& cu : m_dwarf.compilation_units()) {
            for (const auto& die : cu.root()) {
                if (die.has(dwarf::DW_AT::name) && at_name(die) == name) {
                    auto low_pc = at_low_pc(die);
                    auto entry = get_line_entry_from_pc(low_pc);
                    ++entry; // skip prologue
                    set_breakpoint_at_address(entry->address);
                }
            }
        }
    }

The only part of this code that looks a little strange is the ++entry. The problem is that a function's DW_AT_low_pc does not point to the start of the function's user code; it points to the start of the prologue. The compiler usually emits a prologue and an epilogue for a function, which save and restore registers, adjust the stack pointer, and so on. That is not very useful to us, so we increment the line entry by one to get the first line of user code instead of the prologue. The DWARF line table actually has a way of marking an entry as the first line after the function prologue, but not all compilers emit it, so I took the naive approach.
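To make the trade-off concrete, here is a minimal sketch of both strategies on a simplified model of a line table. The `line_entry` struct and `first_user_address` function are hypothetical names made up for this illustration, not libelfin types, and the sketch assumes `high_pc` is an absolute address as in the dump above. It prefers an explicit prologue-end marker when one exists and otherwise falls back to the "next entry" heuristic.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line-table row; not libelfin's actual type.
struct line_entry {
    std::uintptr_t address;
    unsigned line;
    bool prologue_end;  // true only if the compiler emitted the marker
};

// Finds the address to break at for the function covering [low_pc, high_pc).
// Returns false if no row starts exactly at low_pc.
// Assumes `table` is sorted by address.
bool first_user_address(const std::vector<line_entry>& table,
                        std::uintptr_t low_pc, std::uintptr_t high_pc,
                        std::uintptr_t& out) {
    // Locate the row for the function's first instruction.
    std::size_t i = 0;
    while (i < table.size() && table[i].address != low_pc) ++i;
    if (i == table.size()) return false;

    // Preferred: an explicit prologue-end marker inside the function.
    for (std::size_t j = i; j < table.size() && table[j].address < high_pc; ++j) {
        if (table[j].prologue_end) {
            out = table[j].address;
            return true;
        }
    }

    // Fallback: the "++entry" heuristic, i.e. the row right after low_pc,
    // as long as it still belongs to this function.
    if (i + 1 < table.size() && table[i + 1].address < high_pc) {
        out = table[i + 1].address;
    } else {
        out = table[i].address;  // single-row function: break at its start
    }
    return true;
}
```

The fallback can be wrong when the prologue spans more than one line-table row or when there is no prologue at all, which is why the marker is checked first.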

Source line

To set a breakpoint on a line of high-level source code, we need to convert the line number to an address in DWARF. We will traverse the compilation unit, looking for a compilation unit whose name matches the given file, and then looking for the entry corresponding to the given line.

DWARF looks a bit like this:

    .debug_line: line number info for a single cu
    Source lines (from CU-DIE at .debug_info offset 0x0000000b):

                NS new statement, BB new basic block, ET end of text sequence
                PE prologue end, EB epilogue begin
                IS=val ISA number, DI=val discriminator value
                [lno,col] NS BB ET PE EB IS= DI= uri: "filepath"
    0x004004a7  [   1, 0] NS uri: "/super/secret/path/a.hpp"
    0x004004ab  [   2, 0] NS
    0x004004b2  [   3, 0] NS
    0x004004b9  [   4, 0] NS
    0x004004c1  [   5, 0] NS
    0x004004c3  [   1, 0] NS uri: "/super/secret/path/b.hpp"
    0x004004c7  [   2, 0] NS
    0x004004ce  [   3, 0] NS
    0x004004d5  [   4, 0] NS
    0x004004dd  [   5, 0] NS
    0x004004df  [   4, 0] NS uri: "/super/secret/path/ab.cpp"
    0x004004e3  [   5, 0] NS
    0x004004e8  [   6, 0] NS
    0x004004ed  [   7, 0] NS
    0x004004f4  [   7, 0] NS ET

So if we want to set a breakpoint on line 5 of ab.cpp, we look up the entry for that line (0x004004e3) and set a breakpoint there.

    void debugger::set_breakpoint_at_source_line(const std::string& file, unsigned line) {
        for (const auto& cu : m_dwarf.compilation_units()) {
            if (is_suffix(file, at_name(cu.root()))) {
                const auto& lt = cu.get_line_table();

                for (const auto& entry : lt) {
                    if (entry.is_stmt && entry.line == line) {
                        set_breakpoint_at_address(entry.address);
                        return;
                    }
                }
            }
        }
    }

I used the is_suffix hack here so that you can type c.cpp to mean a/b/c.cpp. Of course you should really use a proper path-handling library or something similar, but I'm lazy. entry.is_stmt checks whether the line table entry is marked as the beginning of a statement, which the compiler sets on the address it considers the best target for a breakpoint.
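For illustration, here is one way the suffix check and the line lookup could be sketched on a simplified, self-contained model. The `line_row` struct and `find_line_address` function are hypothetical names for this example and are not part of libelfin. Unlike the loop above, this variation also compares the file recorded on each row, since a single compilation unit's line table can contain rows from headers as well as from the .cpp file, as the dump above shows.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// True if `s` is a suffix of `of`, e.g. is_suffix("c.cpp", "a/b/c.cpp").
bool is_suffix(const std::string& s, const std::string& of) {
    if (s.size() > of.size()) return false;
    auto diff = of.size() - s.size();
    return std::equal(s.begin(), s.end(), of.begin() + diff);
}

// Hypothetical, simplified line-table row; not libelfin's actual type.
struct line_row {
    std::uintptr_t address;
    std::string file;
    unsigned line;
    bool is_stmt;
};

// Returns the first statement address for file:line, or 0 if nothing matches.
std::uintptr_t find_line_address(const std::vector<line_row>& table,
                                 const std::string& file, unsigned line) {
    for (const auto& row : table) {
        if (row.is_stmt && row.line == line && is_suffix(file, row.file)) {
            return row.address;
        }
    }
    return 0;
}
```

A plain suffix comparison has no notion of path separators, so "b.cpp" would also match "/super/secret/path/ab.cpp". Checking that the character before the suffix is a '/' (or that the strings are equal) would be one way to tighten it.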

Symbol search

Down at the object file level, symbols are king. Functions are named by symbols, global variables are named by symbols, you get a symbol, we get a symbol, everyone gets a symbol. In a given object file, some symbols may refer to other object files or shared libraries, and the linker resolves these symbol references to create an executable program.

Symbols can be looked up in the aptly named symbol table, which is stored in ELF sections of the binary. Fortunately, libelfin has a good interface for this, so we don't have to deal with all the ELF details ourselves. To give you an idea of what we are dealing with, here is a dump of the .symtab section of a binary, produced with readelf:

       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 0000000000400238     0 SECTION LOCAL  DEFAULT    1
         2: 0000000000400254     0 SECTION LOCAL  DEFAULT    2
         3: 0000000000400278     0 SECTION LOCAL  DEFAULT    3
         4: 00000000004002c8     0 SECTION LOCAL  DEFAULT    4
         5: 0000000000400430     0 SECTION LOCAL  DEFAULT    5
         6: 00000000004004e4     0 SECTION LOCAL  DEFAULT    6
         7: 0000000000400508     0 SECTION LOCAL  DEFAULT    7
         8: 0000000000400528     0 SECTION LOCAL  DEFAULT    8
         9: 0000000000400558     0 SECTION LOCAL  DEFAULT    9
        10: 0000000000400570     0 SECTION LOCAL  DEFAULT   10
        11: 0000000000400714     0 SECTION LOCAL  DEFAULT   11
        12: 0000000000400720     0 SECTION LOCAL  DEFAULT   12
        13: 0000000000400724     0 SECTION LOCAL  DEFAULT   13
        14: 0000000000400750     0 SECTION LOCAL  DEFAULT   14
        15: 0000000000600e18     0 SECTION LOCAL  DEFAULT   15
        16: 0000000000600e20     0 SECTION LOCAL  DEFAULT   16
        17: 0000000000600e28     0 SECTION LOCAL  DEFAULT   17
        18: 0000000000600e30     0 SECTION LOCAL  DEFAULT   18
        19: 0000000000600ff0     0 SECTION LOCAL  DEFAULT   19
        20: 0000000000601000     0 SECTION LOCAL  DEFAULT   20
        21: 0000000000601018     0 SECTION LOCAL  DEFAULT   21
        22: 0000000000601028     0 SECTION LOCAL  DEFAULT   22
        23: 0000000000000000     0 SECTION LOCAL  DEFAULT   23
        24: 0000000000000000     0 SECTION LOCAL  DEFAULT   24
        25: 0000000000000000     0 SECTION LOCAL  DEFAULT   25
        26: 0000000000000000     0 SECTION LOCAL  DEFAULT   26
        27: 0000000000000000     0 SECTION LOCAL  DEFAULT   27
        28: 0000000000000000     0 SECTION LOCAL  DEFAULT   28
        29: 0000000000000000     0 SECTION LOCAL  DEFAULT   29
        30: 0000000000000000     0 SECTION LOCAL  DEFAULT   30
        31: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS init.c
        32: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
        33: 0000000000600e28     0 OBJECT  LOCAL  DEFAULT   17 __JCR_LIST__
        34: 00000000004005a0     0 FUNC    LOCAL  DEFAULT   10 deregister_tm_clones
        35: 00000000004005e0     0 FUNC    LOCAL  DEFAULT   10 register_tm_clones
        36: 0000000000400620     0 FUNC    LOCAL  DEFAULT   10 __do_global_dtors_aux
        37: 0000000000601028     1 OBJECT  LOCAL  DEFAULT   22 completed.6917
        38: 0000000000600e20     0 OBJECT  LOCAL  DEFAULT   16 __do_global_dtors_aux_fin
        39: 0000000000400640     0 FUNC    LOCAL  DEFAULT   10 frame_dummy
        40: 0000000000600e18     0 OBJECT  LOCAL  DEFAULT   15 __frame_dummy_init_array_
        41: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /super/secret/path/MiniDbg/
        42: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
        43: 0000000000400818     0 OBJECT  LOCAL  DEFAULT   14 __FRAME_END__
        44: 0000000000600e28     0 OBJECT  LOCAL  DEFAULT   17 __JCR_END__
        45: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS
        46: 0000000000400724     0 NOTYPE  LOCAL  DEFAULT   13 __GNU_EH_FRAME_HDR
        47: 0000000000601000     0 OBJECT  LOCAL  DEFAULT   20 _GLOBAL_OFFSET_TABLE_
        48: 0000000000601028     0 OBJECT  LOCAL  DEFAULT   21 __TMC_END__
        49: 0000000000601020     0 OBJECT  LOCAL  DEFAULT   21 __dso_handle
        50: 0000000000600e20     0 NOTYPE  LOCAL  DEFAULT   15 __init_array_end
        51: 0000000000600e18     0 NOTYPE  LOCAL  DEFAULT   15 __init_array_start
        52: 0000000000600e30     0 OBJECT  LOCAL  DEFAULT   18 _DYNAMIC
        53: 0000000000601018     0 NOTYPE  WEAK   DEFAULT   21 data_start
        54: 0000000000400710     2 FUNC    GLOBAL DEFAULT   10 __libc_csu_fini
        55: 0000000000400570    43 FUNC    GLOBAL DEFAULT   10 _start
        56: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
        57: 0000000000400714     0 FUNC    GLOBAL DEFAULT   11 _fini
        58: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
        59: 0000000000400720     4 OBJECT  GLOBAL DEFAULT   12 _IO_stdin_used
        60: 0000000000601018     0 NOTYPE  GLOBAL DEFAULT   21 __data_start
        61: 00000000004006a0   101 FUNC    GLOBAL DEFAULT   10 __libc_csu_init
        62: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT   22 __bss_start
        63: 0000000000601030     0 NOTYPE  GLOBAL DEFAULT   22 _end
        64: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT   21 _edata
        65: 0000000000400670    44 FUNC    GLOBAL DEFAULT   10 main
        66: 0000000000400558     0 FUNC    GLOBAL DEFAULT    9 _init

You can see many symbols used to set up the environment in the object file, and finally you can see the main symbol.

We are interested in the type, name, and value (address) of each symbol. We will use a symbol_type enum for the type, a std::string for the name, and a std::uintptr_t for the address:

    enum class symbol_type {
        notype,   // No type (e.g., absolute symbol)
        object,   // Data object
        func,     // Function entry point
        section,  // Symbol is associated with a section
        file,     // Source file associated with the object file
    };

    std::string to_string(symbol_type st) {
        switch (st) {
        case symbol_type::notype: return "notype";
        case symbol_type::object: return "object";
        case symbol_type::func: return "func";
        case symbol_type::section: return "section";
        case symbol_type::file: return "file";
        }
        return "unknown"; // not reached for valid enumerators; avoids falling off the end
    }

    struct symbol {
        symbol_type type;
        std::string name;
        std::uintptr_t addr;
    };

We need to map the symbol types we get from libelfin onto our own enum, because we don't want the dependency leaking into this interface. Fortunately, I chose the same names for everything, so this is simple:

    symbol_type to_symbol_type(elf::stt sym) {
        switch (sym) {
        case elf::stt::notype: return symbol_type::notype;
        case elf::stt::object: return symbol_type::object;
        case elf::stt::func: return symbol_type::func;
        case elf::stt::section: return symbol_type::section;
        case elf::stt::file: return symbol_type::file;
        default: return symbol_type::notype;
        }
    }

Finally, we need to look up the symbols. For illustrative purposes, I loop through the ELF sections looking for symbol tables and collect any matching symbols I find in them into a std::vector. A smarter implementation would build a map from names to symbols so that you only need to walk the data once.
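As a rough sketch of that smarter approach, the following builds a name index once and answers lookups from it. The `symbol_index` class is a hypothetical name for this illustration, and `symbol_type` and `symbol` are repeated here only so the block compiles on its own. How the input vector gets filled (for example, by one pass over the symbol table sections) is left out.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Repeated from the article so this block is self-contained.
enum class symbol_type { notype, object, func, section, file };

struct symbol {
    symbol_type type;
    std::string name;
    std::uintptr_t addr;
};

// Hypothetical helper: indexes symbols by name once, so repeated lookups
// do not have to rescan every symbol table.
class symbol_index {
public:
    explicit symbol_index(const std::vector<symbol>& all) {
        // A multimap is used because several symbols can share a name
        // (e.g. the same name in both .symtab and .dynsym).
        for (const auto& s : all) {
            m_by_name.emplace(s.name, s);
        }
    }

    // Returns every symbol with exactly this name (possibly none).
    std::vector<symbol> lookup(const std::string& name) const {
        std::vector<symbol> out;
        auto range = m_by_name.equal_range(name);
        for (auto it = range.first; it != range.second; ++it) {
            out.push_back(it->second);
        }
        return out;
    }

private:
    std::unordered_multimap<std::string, symbol> m_by_name;
};
```

The index costs memory proportional to the number of symbols, so for a debugger that only does a handful of lookups the plain loop used in this article may be perfectly adequate.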

    std::vector<symbol> debugger::lookup_symbol(const std::string& name) {
        std::vector<symbol> syms;

        for (auto& sec : m_elf.sections()) {
            // Only symbol table sections are of interest.
            if (sec.get_hdr().type != elf::sht::symtab && sec.get_hdr().type != elf::sht::dynsym)
                continue;

            for (auto sym : sec.as_symtab()) {
                if (sym.get_name() == name) {
                    auto& d = sym.get_data();
                    syms.push_back(symbol{to_symbol_type(d.type()), sym.get_name(), d.value});
                }
            }
        }

        return syms;
    }

Adding commands

As always, we need to add some more commands to expose this functionality to the user. For breakpoints, I use a GDB-style interface, where the kind of breakpoint is inferred from the argument you pass rather than requiring explicit switches:

    0x<hexadecimal>    -> address breakpoint
    <file>:<line>      -> line number breakpoint
    <anything else>    -> function name breakpoint

    else if (is_prefix(command, "break")) {
        if (args[1][0] == '0' && args[1][1] == 'x') {
            std::string addr {args[1], 2};
            set_breakpoint_at_address(std::stol(addr, 0, 16));
        }
        else if (args[1].find(':') != std::string::npos) {
            auto file_and_line = split(args[1], ':');
            set_breakpoint_at_source_line(file_and_line[0], std::stoi(file_and_line[1]));
        }
        else {
            set_breakpoint_at_function(args[1]);
        }
    }
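The dispatch rule can also be pulled out into a small function, which makes it easy to test on its own. The sketch below is only an illustration: `break_kind` and `classify_break_arg` are hypothetical names, and the `split` shown here is a stand-in that assumes the helper used by the command handler simply splits on a single delimiter character.

```cpp
#include <sstream>
#include <string>
#include <vector>

enum class break_kind { address, source_line, function };

// Classifies a `break` argument using the same order of checks as the
// command handler: hex address first, then file:line, then function name.
break_kind classify_break_arg(const std::string& arg) {
    if (arg.size() >= 2 && arg[0] == '0' && arg[1] == 'x') {
        return break_kind::address;
    }
    if (arg.find(':') != std::string::npos) {
        return break_kind::source_line;
    }
    return break_kind::function;
}

// Stand-in for the split helper (an assumption for this sketch):
// splits `s` on every occurrence of `delimiter`.
std::vector<std::string> split(const std::string& s, char delimiter) {
    std::vector<std::string> out;
    std::stringstream ss{s};
    std::string item;
    while (std::getline(ss, item, delimiter)) {
        out.push_back(item);
    }
    return out;
}
```

One consequence of inferring the kind from the argument is that a qualified C++ name such as ns::foo contains a colon and would be treated as a file and line, so a fuller implementation would need an extra rule for that case.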

For symbols, we look up the given name and print out any matches we find:

    else if (is_prefix(command, "symbol")) {
        auto syms = lookup_symbol(args[1]);
        for (auto&& s : syms) {
            std::cout << s.name << ' ' << to_string(s.type) << " 0x" << std::hex << s.addr << std::endl;
        }
    }

Testing

Start the debugger on a simple binary and set a source-level breakpoint. Setting a breakpoint on some function foo and seeing my debugger stop on it was one of the most rewarding moments of this project for me.

Symbol lookup can be tested by adding some functions or global variables to your program and looking up their names. Note that if you are compiling C++ code, you also need to take name mangling into account.
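To see what mangling means in practice, the sketch below turns a mangled name back into its readable form using `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with GCC and Clang on Linux. The `demangle` wrapper is a hypothetical helper written for this illustration, not part of the debugger built in this series.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of `mangled`, or the input unchanged if it
// does not look like an Itanium-ABI mangled name or cannot be demangled.
std::string demangle(const std::string& mangled) {
    // Mangled C++ names start with "_Z"; leave everything else alone
    // (C symbols such as main are not mangled).
    if (mangled.compare(0, 2, "_Z") != 0) {
        return mangled;
    }

    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }

    std::string result{out};
    std::free(out);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

A C++ function void foo() shows up in the symbol table as _Z3foov, so looking up the plain name foo with the symbol command would find nothing. Comparing the user's input against demangled names would be one way to handle that.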
