Linux HIDS Agent Summary and User-Mode Hook (1)


Author: u2400 @ Knownsec 404 Lab

Date: December 19, 2019

Original: https://paper.seebug.org/1102/

While recently studying HIDS agents on Linux, I found that although there is plenty of material, each article has its own emphasis, and few are both step-by-step and comprehensive, so I fell into many pitfalls along the way. This article takes process information collection as its starting point and explains in detail how to implement a HIDS agent. I hope it is helpful to everyone.

1. What is HIDS?

A host-based intrusion detection system (HIDS) is usually divided into two parts: the agent and the server.

The agent is responsible for collecting information and sending it to the server.

The server usually acts as the information center. It deploys rules written by security personnel (there is currently no written specification for HIDS rules), collects data from the various security components (which may also include WAF, NIDS, and so on), analyzes it, judges from the rules whether host behavior is abnormal, and raises alerts for abnormal behavior.

The purpose of a HIDS is to keep administrators from being overwhelmed by security events when managing a massive IDC, and to let them monitor the health of every host through the information center.

Related open source projects include OSSEC and osquery. OSSEC is a mature HIDS with both an agent side and a server side. It ships with its own rules, basic rootkit detection, alerts on modification of sensitive files, and other functions, and it is also included in an open source project called Wazuh. osquery is an open source project developed by Facebook that can serve as the agent side and collect host data, but the server and the rules have to be implemented separately.

Each company customizes its HIDS agent to its own needs and adds more or fewer features of its own. A basic HIDS agent generally needs to implement the following:

Collect process information

Collect network information

Periodically collect open ports

Monitor sensitive file modification
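As a small illustration of the port-collection item, the sketch below parses one data line of /proc/net/tcp and reports the local port when the socket is listening. The function name `listening_port_from_line` is hypothetical, and the sketch assumes the usual line layout (slot, hex local address and port, hex remote address and port, hex state) with the LISTEN state encoded as 0A. It is not taken from the original article.

```c
#include <stdio.h>

/* Hypothetical helper: given one data line of /proc/net/tcp, return the local
 * port if the socket is in the LISTEN state (st == 0x0A), otherwise -1. */
int listening_port_from_line(const char *line) {
    unsigned int slot, local_port, remote_port, state;
    char local_ip[33], remote_ip[33];   /* hex-encoded addresses */

    int n = sscanf(line, " %u: %32[0-9A-Fa-f]:%x %32[0-9A-Fa-f]:%x %x",
                   &slot, local_ip, &local_port,
                   remote_ip, &remote_port, &state);
    if (n != 6 || state != 0x0A) {
        return -1;   /* header line, malformed line, or not listening */
    }
    return (int)local_port;
}
```

An agent could call this for every line read from /proc/net/tcp on a timer and report the ports that are not -1.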

The rest of this article starts from the implementation of an agent and discusses how to implement the process information collection module of a HIDS agent.

2. Agent process monitoring module

2.1 The purpose of process monitoring

On the Linux operating system, almost all operation and maintenance work and intrusion behavior shows up in the commands that are executed, and executing a command essentially means starting a process. Monitoring processes is therefore monitoring command execution, which is of great help both to operation and maintenance work and to intrusion analysis.

2.2 Data that the process monitoring module should obtain

Before collecting information, you must be clear about what you need. Without knowing that, there is nothing to implement. Even if you first build a HIDS that only obtains basic information such as the pid, the interface will have to change frequently later for lack of planning, which wastes manpower. Here is a basic list of the information to obtain, based on the "Advanced Guide to Internet Enterprise Security". How each item in the table is obtained is filled in later.

Data name      Meaning
path           path of the executable file
ppath          path of the parent process's executable file
ENV            environment variables
cmdline        process startup command
pid            process id
ppid           parent process id
pgid           process group id
sid            process session id
uid            uid of the user who started the process
euid           euid of the user who started the process
gid            gid of the user who started the process
egid           egid of the user who started the process
mode           permissions of the executable file
owner_uid      uid of the file owner
owner_gid      gid of the file owner
create_time    file creation time
modify_time    most recent file modification time
pstart_time    time at which the process started running
prun_time      time for which the parent process has been running
sys_time       current system time
fd             file descriptor

2.3 Process monitoring methods

Process monitoring usually relies on hook technology, and hooks fall roughly into two categories:

Application level (working in ring 3, R3): the common approach is to hijack the libc library. It is usually simple, but it may be bypassed.

Kernel level (working in R0 or R1): kernel-level hooks are usually related to system calls or the VFS. They are more complex, compatibility problems may occur between distributions and kernel versions, and a serious error in the hook can cause a kernel panic, but in principle they cannot be bypassed.

Let's start with a simple application-level hook.

3. HIDS application-level hook

3.1 Hijacking the libc library

A library packages functions so that they can be used directly. Linux libraries are divided into static libraries and dynamic libraries. A dynamic library is loaded only when the application is loaded, and dynamic libraries are loaded in a defined order. By modifying /etc/ld.so.preload, you can make a dynamic link library load before the others. Inside that library you can define a replacement for a function, so that the program calls the replacement instead of the original. The replacement runs its own logic, then calls the original function and returns the result the original would have returned.

For those of you who want to know more, please refer to this article

There are several steps to hijack the libc library:

3.1.1 compile a dynamic link library

A simple dynamic link library that hooks execve is shown below.

The logic is very simple:

Customize a function named execve, and accept parameters of the same type as the original execve

Execute your own logic.

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    /* Same signature as the original execve, which returns int. */
    typedef int (*execve_func_t)(const char *filename, char *const argv[], char *const envp[]);
    static execve_func_t old_execve = NULL;

    int execve(const char *filename, char *const argv[], char *const envp[]) {
        /* Your own logic starts here: whatever should happen when a process calls execve. */
        printf("Running hook\n");
        /* Look up the original execve, call it, and return its result. */
        old_execve = (execve_func_t)dlsym(RTLD_NEXT, "execve");
        return old_execve(filename, argv, envp);
    }

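Compile it into a .so file with gcc. The -ldl option links libdl for dlsym; it is needed on older glibc versions and harmless on newer ones.

    gcc -shared -fPIC -o libmodule.so module.c -ldl

3.1.2 Modify ld.so.preload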

ld.so.preload (/etc/ld.so.preload) is the configuration-file counterpart of the LD_PRELOAD environment variable. Change the contents of the file to the path of the dynamic link library that should be preloaded.

Note that only root can modify ld.so.preload unless the default permissions are changed.
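A deployment script or the agent itself may want to confirm that the hook library is actually registered. The sketch below checks whether a given path appears as an entry in a preload file. The function name `preload_file_lists` and the paths in the usage note are hypothetical, and the sketch assumes entries are separated by whitespace or colons and does not handle comment lines.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if lib_path appears as an entry in the given
 * preload file, 0 if it does not, and -1 if the file cannot be opened.
 * Assumption: entries are separated by whitespace or colons. */
int preload_file_lists(const char *preload_file, const char *lib_path) {
    FILE *f = fopen(preload_file, "r");
    if (f == NULL) {
        return -1;
    }

    char line[4096];
    int found = 0;
    while (!found && fgets(line, sizeof(line), f) != NULL) {
        /* Split the line into entries and compare each one exactly. */
        for (char *tok = strtok(line, " \t\r\n:"); tok != NULL;
             tok = strtok(NULL, " \t\r\n:")) {
            if (strcmp(tok, lib_path) == 0) {
                found = 1;
                break;
            }
        }
    }
    fclose(f);
    return found;
}
```

For example, `preload_file_lists("/etc/ld.so.preload", "/opt/hids/libmodule.so")` would report whether a library installed at that (assumed) location is listed.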

A custom execve function is as follows:

    /* Additionally requires #include <unistd.h> for getpid(). */
    extern char **environ;

    int execve(const char *filename, char *const argv[], char *const envp[]) {
        /* Print every environment variable of the calling process. */
        for (int i = 0; *(environ + i); i++) {
            printf("%s\n", *(environ + i));
        }
        printf("PID:%d\n", getpid());
        old_execve = dlsym(RTLD_NEXT, "execve");
        return old_execve(filename, argv, envp);
    }

This prints the PID of the current process and all of its environment variables. After compiling, modify ld.so.preload, restart the shell, and run the ls command to see the output.

3.1.3 Advantages and disadvantages of the libc hook

Advantages: good performance, relatively stable, simpler than an LKM, highly adaptable, and usually effective against web-level intrusions.

Disadvantages: it can do nothing about statically compiled programs, and there is a risk of being bypassed.
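One way such a bypass can happen is that a program asks the kernel directly instead of calling the hooked libc wrapper. The sketch below shows the pattern with the harmless getpid system call as a stand-in. The function name `getpid_direct` is hypothetical. The same pattern with SYS_execve would skip a preloaded execve replacement, because the interposed symbol is never called.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the PID through the generic syscall() entry point.
 * The libc getpid() wrapper, and therefore any hook placed on that symbol,
 * is not involved. */
pid_t getpid_direct(void) {
    return (pid_t)syscall(SYS_getpid);
}
```

A statically linked binary has the same effect for a different reason: it carries its own copy of the libc code, so the preloaded library is never consulted.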

3.1.4 Hook and information acquisition

The purpose of a hook is to set up a monitoring point from which information about the process can be obtained. If too much work is done inside the hook, however, it slows down the normal business workload, which is unacceptable to the business. In a typical HIDS, any information that does not have to be obtained in the hook is obtained in the agent instead, so that information acquisition and business logic run concurrently and the impact on the business is reduced.

4. Information completion and acquisition

If the accuracy requirement for the information is not very high, and you want to avoid affecting the normal business on the monitored host as far as possible, the hook can obtain only the necessary data, such as the PID and environment variables, and hand it to the agent. The agent then obtains the remaining information about the process while the process keeps running, so the process does not have to wait for the agent to fill in the complete information table.
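The split described above can be sketched as a small record that the hook fills with values that are cheap to read, leaving everything else to the agent. The struct `hook_event`, the two function names, and the text format are hypothetical. How the record reaches the agent (pipe, socket, shared memory) is left open here.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record: only the values that are cheap to read inside the hook. */
struct hook_event {
    pid_t pid, ppid, pgid, sid;
    uid_t uid, euid;
    gid_t gid, egid;
};

/* Fill the record for the calling process using plain POSIX calls. */
struct hook_event collect_hook_event(void) {
    struct hook_event ev;
    ev.pid  = getpid();
    ev.ppid = getppid();
    ev.pgid = getpgrp();
    ev.sid  = getsid(0);
    ev.uid  = getuid();
    ev.euid = geteuid();
    ev.gid  = getgid();
    ev.egid = getegid();
    return ev;
}

/* Serialize the record into one text line; returns what snprintf returns. */
int format_hook_event(const struct hook_event *ev, char *buf, size_t len) {
    return snprintf(buf, len,
                    "pid=%ld ppid=%ld pgid=%ld sid=%ld uid=%lu euid=%lu gid=%lu egid=%lu",
                    (long)ev->pid, (long)ev->ppid, (long)ev->pgid, (long)ev->sid,
                    (unsigned long)ev->uid, (unsigned long)ev->euid,
                    (unsigned long)ev->gid, (unsigned long)ev->egid);
}
```

With the PID in hand, the agent can look up the slower items (executable path, command line, start time) under /proc on its own schedule.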

/proc/[pid]/stat

/proc is a pseudo-filesystem through which the kernel exposes an interface to user mode. It is accessed in the form of a directory of pseudo-files.

The information about each process is placed in a directory named after its pid, and commands such as ps also obtain process information by traversing the /proc directory.

The contents of a stat file are shown below. self is a shortcut provided by the /proc directory: every process can access /proc/self to see its own information.

    # cat /proc/self/stat
    3119 (cat) R 29973 3119 19885 34821 3119 4194304 107000000 20 01 0 5794695 5562368 176 1844674407370955154309027168256 94309027193225 140712677015200000001700009309027212390271393090533990401407312704821 140712704841 14071267704841 1407312677059 0

The data looks disorganized: the fields are separated only by spaces, and nothing in the file explains what each field means.

After some digging, I found an article with a list that explains the data type and meaning of each field. See Appendix 1.

In the end this was sorted into a structure with 52 fields of different types, which is still somewhat troublesome to fill in. I could not find a ready-made implementation online, so I wrote one myself.

The structure is defined as follows:

    struct proc_stat {
        int pid;                    /* process ID */
        char *comm;                 /* executable file name, surrounded by () */
        char state;                 /* process status */
        int ppid;                   /* parent process pid */
        int pgid;
        int session;                /* sid */
        int tty_nr;
        int tpgid;
        unsigned int flags;
        long unsigned int minflt;
        long unsigned int cminflt;
        long unsigned int majflt;
        long unsigned int cmajflt;
        long unsigned int utime;
        long unsigned int stime;
        long int cutime;
        long int cstime;
        long int priority;
        long int nice;
        long int num_threads;
        long int itrealvalue;
        long long unsigned int starttime;
        long unsigned int vsize;
        long int rss;
        long unsigned int rsslim;
        long unsigned int startcode;
        long unsigned int endcode;
        long unsigned int startstack;
        long unsigned int kstkesp;
        long unsigned int kstkeip;
        long unsigned int signal;   /* the bitmap of pending signals */
        long unsigned int blocked;
        long unsigned int sigignore;
        long unsigned int sigcatch;
        long unsigned int wchan;
        long unsigned int nswap;
        long unsigned int cnswap;
        int exit_signal;
        int processor;
        unsigned int rt_priority;
        unsigned int policy;
        long long unsigned int delayacct_blkio_ticks;
        long unsigned int guest_time;
        long int cguest_time;
        long unsigned int start_data;
        long unsigned int end_data;
        long unsigned int start_brk;
        long unsigned int arg_start;   /* start address of the arguments */
        long unsigned int arg_end;     /* end address of the arguments */
        long unsigned int env_start;   /* start address of the environment variables in memory */
        long unsigned int env_end;     /* end address of the environment variables */
        int exit_code;                 /* exit status code */
    };

Read the file and parse it into the structure:

    #include <stdio.h>

    struct proc_stat get_proc_stat(int Pid) {
        FILE *f = NULL;
        struct proc_stat stat = {0};
        /* static so that stat.comm stays valid after the function returns
           (the buffer is shared between calls, so this is not thread-safe) */
        static char tmp[100];
        char stat_path[32];
        char *pstat_path = stat_path;

        tmp[0] = '\0';
        stat.comm = tmp;
        if (Pid != -1) {
            snprintf(stat_path, sizeof(stat_path), "/proc/%d/stat", Pid);
        } else {
            pstat_path = "/proc/self/stat";
        }
        if ((f = fopen(pstat_path, "r")) == NULL) {
            printf("open file error");
            return stat;
        }
        fscanf(f, "%d ", &stat.pid);
        /* comm is wrapped in (); read up to the closing parenthesis */
        fscanf(f, "(%99[^)])", stat.comm);
        fscanf(f, " %c", &stat.state);
        fscanf(f, "%d", &stat.ppid);
        fscanf(f, "%d", &stat.pgid);
        fscanf(f, "%d", &stat.session);
        fscanf(f,
               "%d %d %u %lu %lu %lu %lu %lu %lu %ld %ld %ld %ld %ld %ld %llu "
               "%lu %ld %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu "
               "%d %d %u %u %llu %lu %ld %lu %lu %lu %lu %lu %lu %lu %d",
               &stat.tty_nr, &stat.tpgid, &stat.flags, &stat.minflt, &stat.cminflt,
               &stat.majflt, &stat.cmajflt, &stat.utime, &stat.stime, &stat.cutime,
               &stat.cstime, &stat.priority, &stat.nice, &stat.num_threads,
               &stat.itrealvalue, &stat.starttime, &stat.vsize, &stat.rss,
               &stat.rsslim, &stat.startcode, &stat.endcode, &stat.startstack,
               &stat.kstkesp, &stat.kstkeip, &stat.signal, &stat.blocked,
               &stat.sigignore, &stat.sigcatch, &stat.wchan, &stat.nswap,
               &stat.cnswap, &stat.exit_signal, &stat.processor, &stat.rt_priority,
               &stat.policy, &stat.delayacct_blkio_ticks, &stat.guest_time,
               &stat.cguest_time, &stat.start_data, &stat.end_data, &stat.start_brk,
               &stat.arg_start, &stat.arg_end, &stat.env_start, &stat.env_end,
               &stat.exit_code);
        fclose(f);
        return stat;
    }
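Two details of the stat format are easy to trip over: the comm field may itself contain spaces or parentheses, and starttime is expressed in clock ticks since boot rather than in seconds. The sketch below is one way to handle both, working on a line that has already been read into memory. It locates the last ')' to delimit comm and converts ticks with sysconf(_SC_CLK_TCK). The struct `stat_brief` and both function names are hypothetical, and only a few of the 52 fields are extracted.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical reduced record: a few of the 52 fields. */
struct stat_brief {
    int pid;
    char comm[64];
    char state;
    int ppid;
    unsigned long long starttime;   /* field 22, in clock ticks since boot */
};

/* Parse one line in /proc/[pid]/stat format.
 * Returns 0 on success, -1 on malformed input. */
int parse_stat_line(const char *line, struct stat_brief *out) {
    const char *lparen = strchr(line, '(');
    const char *rparen = strrchr(line, ')');   /* last ')' ends comm */
    if (lparen == NULL || rparen == NULL || rparen < lparen) {
        return -1;
    }
    if (sscanf(line, "%d", &out->pid) != 1) {
        return -1;
    }

    size_t len = (size_t)(rparen - lparen - 1);
    if (len >= sizeof(out->comm)) {
        len = sizeof(out->comm) - 1;
    }
    memcpy(out->comm, lparen + 1, len);
    out->comm[len] = '\0';

    /* After comm: state (3), ppid (4), fields 5..21 skipped, starttime (22). */
    int n = sscanf(rparen + 1,
                   " %c %d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %*lu %*lu"
                   " %*ld %*ld %*ld %*ld %*ld %*ld %llu",
                   &out->state, &out->ppid, &out->starttime);
    return n == 3 ? 0 : -1;
}

/* Convert clock ticks to whole seconds using the system tick rate. */
unsigned long long ticks_to_seconds(unsigned long long ticks) {
    long hz = sysconf(_SC_CLK_TCK);
    return hz > 0 ? ticks / (unsigned long long)hz : 0;
}
```

Applying `ticks_to_seconds` to starttime gives the number of seconds after boot at which the process started. Turning that into a wall-clock time or a running time additionally needs the boot time or the current uptime, which this sketch does not read.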

Compared with the list of data we need, the stat file gives us the following items:

ppid: parent process id
pgid: process group id
sid: process session id
start_time: time at which the parent process started running
run_time: time for which the parent process has been running

/proc/[pid]/exe

The path of the executable file is obtained through /proc/[pid]/exe. /proc/[pid]/exe is a symbolic link to the executable file, so the path it points to is read with the readlink function.

Note that if the file the link points to has been deleted, the string returned by readlink has an extra " (deleted)" appended to the file name. The agent cannot blindly strip that string from the end of the path, so this case has to be taken into account when writing the server rules.
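One option that keeps the agent from altering the path is to report the path unchanged and add a flag that the server rules can use. The sketch below only detects the marker. The function name `exe_marked_deleted` is hypothetical.

```c
#include <string.h>

/* Return 1 if a readlink() result ends with the " (deleted)" marker,
 * 0 otherwise. The path itself is left untouched. */
int exe_marked_deleted(const char *link_target) {
    static const char marker[] = " (deleted)";
    size_t n = strlen(link_target);
    size_t m = sizeof(marker) - 1;   /* marker length without the '\0' */

    return n >= m && strcmp(link_target + n - m, marker) == 0;
}
```

The flag is only a hint: a file whose real name ends in " (deleted)" produces the same string, which is exactly why stripping the suffix in the agent is unsafe.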

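    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>

    char *get_proc_path(int Pid) {
        char stat_path[32];
        char *pstat_path = stat_path;
        /* static so that the returned pointer stays valid after return
           (the buffer is shared between calls, so this is not thread-safe) */
        static char dir[PATH_MAX];

        if (Pid != -1) {
            snprintf(stat_path, sizeof(stat_path), "/proc/%d/exe", Pid);
        } else {
            pstat_path = "/proc/self/exe";
        }
        /* readlink does not append '\0', so terminate the string manually */
        ssize_t len = readlink(pstat_path, dir, PATH_MAX - 1);
        if (len < 0) {
            len = 0;
        }
        dir[len] = '\0';
        return dir;
    }

/proc/[pid]/cmdline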

The startup command of a process can be obtained by reading the contents of /proc/[pid]/cmdline. There are two pitfalls here.

Because the length of the startup command varies, to avoid an overflow we first have to get the length, allocate heap space with malloc, and only then read the data in.

In the /proc/self/cmdline file the arguments are separated by '\0' rather than by spaces or carriage returns, because the file holds the argument strings one after another, each terminated by a null byte. The separators therefore have to be converted manually, and a run of consecutive spaces on the original command line shows up as a single '\0'.

The way the length is obtained here is clumsy. Moving the file pointer to the end of the file with fseek yields a position of 0 every time, because files under /proc report a size of 0, so the bytes are simply counted first.

    #include <stdio.h>

    long get_file_length(FILE *f) {
        fseek(f, 0L, SEEK_SET);
        /* int rather than char, so that EOF can be told apart from every byte value */
        int ch = getc(f);
        long i;
        for (i = 0; ch != EOF; i++) {
            ch = getc(f);
        }
        i++;   /* one extra byte for the terminating '\0' */
        fseek(f, 0L, SEEK_SET);
        return i;
    }
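An alternative that avoids reading the file twice is to grow the buffer while reading. The sketch below is not the author's method. It reads byte by byte, doubles the buffer with realloc when it fills up, and turns the null separators into spaces on the way. The function name `read_cmdline_file` is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a cmdline-style file into a malloc'd string, replacing the '\0'
 * separators with spaces. Returns NULL on error; the caller must free()
 * the result. */
char *read_cmdline_file(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return NULL;
    }

    size_t cap = 256, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }

    int ch, last = EOF;
    while ((ch = getc(f)) != EOF) {
        if (len + 1 >= cap) {   /* keep one byte free for the final '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (ch == '\0') ? ' ' : (char)ch;
        last = ch;
    }
    fclose(f);

    if (len > 0 && last == '\0') {
        len--;   /* the terminator of the last argument is not a separator */
    }
    buf[len] = '\0';
    return buf;
}
```

Called as `read_cmdline_file("/proc/self/cmdline")`, it returns the command line of the calling process as a single space-separated string.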

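Get the contents of cmdline:

    #include <stdio.h>
    #include <stdlib.h>

    /* Returns a malloc'd string that the caller must free(), or NULL on error. */
    char *get_proc_cmdline(int Pid) {
        FILE *f;
        char stat_path[100] = {0};
        char *pstat_path = stat_path;

        if (Pid != -1) {
            snprintf(stat_path, sizeof(stat_path), "/proc/%d/cmdline", Pid);
        } else {
            pstat_path = "/proc/self/cmdline";
        }
        if ((f = fopen(pstat_path, "r")) == NULL) {
            printf("open file error");
            return NULL;
        }
        long len = get_file_length(f);   /* byte count plus one for '\0' */
        char *pcmdline = malloc((size_t)len);
        if (pcmdline == NULL) {
            fclose(f);
            return NULL;
        }
        int ch = getc(f);
        long i;
        for (i = 0; ch != EOF && i < len - 1; i++) {
            /* replace the '\0' separators with spaces */
            pcmdline[i] = (ch == 0) ? ' ' : (char)ch;
            ch = getc(f);
        }
        pcmdline[i] = '\0';
        fclose(f);
        return pcmdline;
    }

Summary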

What is described here is only one of the most common and simplest application-level hook methods. The full implementation and code have been placed on GitHub, and the code there will continue to be updated. The next article will share how to use an LKM to modify sys_call_table and hook system calls in order to implement the HIDS hook.

References

https://www.freebuf.com/articles/system/54263.html

http://abcdefghijklmnopqrst.xyz/2018/07/30/Linux_INT80/

https://cloud.tencent.com/developer/news/337625

https://github.com/g0dA/linuxStack/blob/master/%E8%BF%9B%E7%A8%8B%E9%9A%90%E8%97%8F%E6%8A%80%E6%9C%AF%E7%9A%84%E6%94%BB%E4%B8%8E%E9%98%B2-%E6%94%BB%E7%AF%87.md

Appendix 1

A complete explanation of the meaning of each file in the /proc directory can be found here:

http://man7.org/linux/man-pages/man5/proc.5.html
