In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >
Share
Shulou(Shulou.com)06/02 Report--
The basic concepts of the process:
A process can be understood as a copy of a running program. A program runs as a process through kernel scheduling, and the kernel is responsible for scheduling it to run on CPU to execute part or all of the code in the program, so the process is a running dynamic entity. The program is a file placed on the file system, as long as it is not deleted, it will exist forever, and the process has a life cycle, and each process has a period of time to create, run, and end. The code of the same program can be copied and run by multiple processes scheduled by the kernel, so it can be called a copy of the running program.
Process scheduling:
The most important part of process management is process scheduling, which can be understood as the arrangement and management of various details of each process's runtime. When an executable program file on the file system is triggered and runs as a process through kernel scheduling, the instructions, data and process-related attribute information (such as process owner, group, PID, etc.) in the process are loaded into the memory space by the kernel, and the instructions need to be run on CPU. At this time, the registers in CPU can record the status of running instructions. For example, fetching instructions, executing instructions, processing data, fetching data, wherein the instruction pointer register IP is used to store the memory address of the next instruction to be executed.
But there are many processes that need CPU to run instead of just one process, so the kernel divides CPU into multiple time slices and is responsible for assigning these time slices to each process according to priority. When a process runs on a certain time slice and CPU, this time slice is the time that the process is allowed to run. Once this time slice has passed, the process will be interrupted, then the kernel stores the intermediate state information of the process in memory according to a fixed format, reschedules another process to run on the CPU, and then the data used to store the process state in CPU will be overwritten by the next process. Among them, the process of storing the relevant state information of the process in memory is called saving the site, while the Linux kernel stores the process state information into a structure with a fixed format, which is task struct, and the task struct of multiple tasks (processes) form a linked list structure (task list). According to the different ways of composition, there are unidirectional linked list structure, bi-directional linked list structure, cyclic linked list structure, bi-directional cyclic list structure and so on. For example, we need a jar to hold drinks, and this fixed structure container, the jar, is the structure, so we can't hold the wine on the floor, can it? The structure of these jars arranged according to a certain organization can be compared to the linked list structure.
It should be noted that when a time slice on CPU ends, the kernel will schedule another process to run on CPU, but the question is how can the kernel pick out one of the many processes waiting to run and make it run on CPU? In order to determine which process executes first and which process executes later, it needs to be judged by the process priority. The priority range of the process is 0-139, which is divided into real-time priority (1-99) and static priority (100-139). The real-time priority is related to the system management operations performed by the kernel, and users cannot adjust the real-time priority. Users can adjust the static priority (which can be adjusted by the nice value). Ordinary users can only lower but not higher, and only system administrators can adjust it at will. Therefore, the kernel uses process priority as the basis for scheduling process order, but the kernel must first know the attributes (priority) of each process. therefore, every time a scheduled process executes, it has to traverse the information in all the structures in memory, which will consume a lot of time.
The kernel version 2.6 of Linux solves this problem through ingenious design: all processes are arranged into 140 queues, each queue corresponds to a priority, and each process belongs to the queue of its priority. Each time the kernel schedules the execution of the process, it only needs to scan the head of the 140 queues and pick out one process from these queues for execution. In this way, no matter how many processes there are, the kernel can quickly schedule which process is executed on the CPU by scanning 140 queue headers, so the scheduling time does not change with the number of processes. This working mode of the kernel is based on an algorithm that conforms to the O (1) ideal model of Big O in the program world, which means that with the increase of algorithm complexity, the time for the algorithm to solve the problem (in this case, the process scheduling speed problem) will not change.
But then there is another question: how should a process return to the team after a time slice has been executed? In Linux, processes are queued according to priority, and each queue is divided into running queues and expired queues. In fact, the kernel makes the judgment by scanning the headers of each running queue, while the process returns to the expired queue after CPU execution, while the running queue continues to wait for the kernel to schedule. Once all the processes in the run queue are executed (each process is executed in turn on the CPU), the run queue is transformed into an expired queue, and the original expired queue is transformed into a run queue waiting for the kernel to schedule again, repeating the above logic.
Process creation:
After the host is booted, the kernel is first loaded into memory and the kernel code starts running on the CPU. Then the initialization process init is created by the process, and the user space management transaction is handed over to the init process management. Then init creates child processes, and these child processes can create their own child processes, so all processes except init processes are created by their parents; init processes can be regarded as agents of the kernel, responsible for managing the creation and shutdown of user-space processes, and submit corresponding requests to the kernel, but init cannot execute privileged instructions on behalf of the kernel.
So under what circumstances does the parent process need to create a child process? When a parent process needs to complete complex tasks with the help of some programs, it usually needs to call an executable program file on the system to create it as a process, and to create a child process needs to submit a process to the kernel through the system call interface fork (), and then copies its own data to the child process through the clone () interface It should be noted that at this time, the parent process and the child process occupy the same space in memory, and when the child process needs to modify the data, the parent process will re-open up the same memory space for the child process, this is because the parent process and the child process no longer occupy the same memory space, this mechanism is called write-time copy mechanism (CoW).
When a child process completes a complex task, its parent process shuts down the child process and the kernel takes back memory.
The embodiment of init program in different versions: CentOS 5: SysV init, classic version, the defect is that when the system starts and boots, it creates child processes by writing scripts with the help of shell, because shell scripts are piled up of a large number of commands, and processes are created when each command is run, so thousands of processes need to be created each time you start, so the execution speed is very slow. CentOS 6: upstart, based on dbus, communicates with SysV init by running many commands to create many processes, but the difference is that upstart can start related service programs in parallel to achieve multi-line creation processes. If there are multiple CPU, it will be much faster, while SysV init can only start the startup program in serial mode. CentOS 7: systemd, complete the startup and boot of the entire system with one program, and the total number of processes that need to be started is only about 10, so the system starts and boots very quickly. Starting or shutting down service programs on CentOS7 needs to be done through systemctl, because these services are uniformly controlled by systemd; # different versions of the init program file name is init.
Process memory:
Following the above, when the kernel is running, it will occupy a certain amount of memory space, for example, 1G for Linux. After other processes are running, the kernel will divide the memory into multiple pieces of space for each process to use, where each "slice" refers to a page frame (page frame), which is used to store page data (memory process information). A page frame is a 4K block of memory space. However, it is obviously not appropriate to allocate the real physical memory space to each process continuously, because the free memory space is often discrete, and each process will produce all kinds of intermediate data during its running. this will also lead to an increase in the memory space occupied by the process. Therefore, the kernel collects the free fragments (page frames) on the underlying physical memory and virtualizes a continuous memory space for the upper process, which is also known as a logical address. let each process "think" that the memory space it occupies is continuously distributed, and this continuous logical address is called linear address space. The kernel will create an illusion for each process, that is, only two programs are running on the host: the kernel and the process itself, and "think" that in the linear address space, except for 1G of memory allocated to the kernel, the rest of the memory space can be used by the process itself. In fact, the process really uses only a small portion of the memory space and maps to continuous or discrete memory space on real physical memory. The mapping between logical and physical addresses is stored in task struct. When a process executes, there is a circuit in CPU dedicated to translating logical addresses into physical memory addresses, that is, memory management units (MMU, Memory Management Unit), so the kernel is required to load the memory mapping into MMU.
When a process is allocated memory, its instructions, data, and process-related attribute information will be loaded into memory, storing instructions (code) at a low address in memory, and the data can be organized and stored in memory by variables and other data structures, of course, it can also be ordinary data; the process will also use part of the space as heap memory and stack memory When the intermediate data of a process increases, or when more data on disk needs to be loaded into memory, the amount of data stored in heap memory and stack memory increases, and the heap memory space expands to the top of the stack, while the stack memory space expands to the heap; once the two meet during expansion, it means that there is not enough physical memory space. At this time, the swap space (swap) comes out. Swap can scan the data that is not commonly used in memory through the recent least use algorithm (LRU, Least Recently Used) and store it in swap temporarily, and the memory space vacated can be used by the process.
Of course, not all data in memory space can be exchanged to swap, critical data (such as program instruction code) cannot be exchanged, but non-critical data (such as some infrequent data in programs, data); we call the memory space that cannot be swapped as resident memory sets, and the memory space that can be swapped as virtual memory sets. However, if the memory space is swapped, the address of the physical memory space mapped in the linear address space will also change, so when the process needs to revisit the swap swapped to the disk or load data from other spaces on the disk to memory, it needs to request the kernel call first, and the kernel loads the data on the disk into the kernel memory, and then copies a copy to the process memory. Also, the mapping on the MMU needs to be updated when the process executes on the CPU.
Interprocess communication:
Although the process "thinks" that only the kernel and the process itself are running on the host, the communication between processes can be achieved through IPC (Inter-Process Communication) technology. However, inter-process communication is different on the same host and on different hosts.
IPC (Inter-Process Communication):
On the same host:
Signal: communicate by sending signals; shm: communicate by shared memory; semerphor: flag language, communicate by means of similar "gesture"
On different hosts:
Rpc:remote procecure call, that is, remote procedure call, in which the process communicates by calling the library function on the remote host or the result of data processing; rpc is based on socket, but is more abstract than socket; socket: communication is achieved through process monitoring, and both sides of the communication need to establish a virtual link (tcp connection) in advance. In the user space, the socket file is represented as a socket file. The socket files on both hosts save the IP/Port of the local host and the IP/Port of the other host. One user can write data to the socket file, while the other user can obtain the data by reading the socket file.
Process type:
Classify according to whether it is related to the terminal:
Daemon process: a process started during system boot, independent of the terminal; foreground process: a process started by the terminal, related to the terminal; # Note: processes started in the foreground can also be sent to the background and run in daemon mode
Classify according to whether the process occupies more CPU or IO:
CPU-bound: general non-interactive processes are CPU-intensive; IO-bound: general interactive processes are IO-intensive; IO-intensive processes should have higher priority than CPU-intensive processes; # according to one of Linux's philosophies: programs should avoid interacting with users as much as possible after startup, so processes like CPU-bound should allocate more CPU resources.
Process status:
(1) running state: running; (2) ready state: ready, which can be run but not on CPU; also known as sleeping state; (3) sleeping state: divided into interruptible sleep and uninterruptible sleep; interruptible sleep: interruptable, which can be awakened and run at any time; uninterruptible sleep: uninterruptable, such as waiting for the IO process (4) stopped state: stopped, that is, paused in memory, will not be scheduled unless started manually; (5) dead state: zombie, after the child process is finished, it needs to wait for its parent process to shut down, but if the parent process shuts down accidentally, the child process will wait forever. At this time, the child process is a zombie process, or is in a zombie state. # waiting for the IO process: when the data needed for process execution is not in memory, you need to apply to the kernel (the process cannot access the hardware directly through the system call API). The kernel loads the data on the disk into the kernel memory first, and then copies the data loaded in the kernel memory to the process memory.
The basic concepts of threads:
Generally speaking, each program can only have one execution flow when it runs; and if there is a lot of code within the program that needs to be executed and the functions of these codes are relatively independent of each other, in order to make the program execute faster, the program can be developed into an execution flow that can be independent of each other and can be executed independently, and this execution flow is a thread; a thread is a sub-unit of the process. When there are multiple CPU, the program starts to create multiple parallel execution threads (execution flow), these threads can run on different CPU. However, when the host has only one CPU, splitting the process into multiple threads slows down because the kernel consumes a lot of CPU time when switching scheduled processes (threads).
About service programs:
For Linux, complex tasks are basically accomplished through the combination of Mini Program with simple functions, so processes are lightweight on Linux, almost no different from threads. Although many service programs are written based on parallel programming mode, the response speed is still very slow when the server receives too many requests, so it is possible to program through ingenious design, so that one process can respond to multiple processes at the same time.
As a Linux operation and maintenance staff, you need to identify the running status of the current system, the consumption of resources, the number and mode of processes started, for example, when the user's web page opens slowly. We need to see which processes are running on the current system, whether there are any processes that we expect to run, what percentage of CPU and memory are expected to run, whether we want to adjust the priority of the process, and whether the CPU and memory resources are full. For larger service programs, such as programs written by Java, it is often necessary for operators to check whether the jvm virtual machine is running normally, whether to adjust its garbage collection policy, whether the upper or lower limit of its memory resource usage, and whether to start another jvm process to achieve parallel response. In short, for the operation and maintenance personnel, it is very important to have the ability to identify the running status of the current system, because the operation and maintenance are basically services, and services are provided through the process.
Process viewing and management tools on Linux systems: pstree, ps, pidof, pgrep, top, htop, glances, pmap, vmstat, dstat, kill, pkill, job, bg, fg, nohup, nice, renice, killall,...
Pstree command:
Display a tree of processes
Used to display the process tree.
Example:
Display the process tree on CentOS 6:
[root@osyunwei] # pstreeinit ─┬─ NetworkManager ├─ abrtd ├─ acpid ├─ atd ├─ auditd ─── {auditd} ├─ automount ─── 4* [{automount}] ├─ bluetoothd ├─ certmonger ├─ console-kit-dae ─── 63 * [{console-kit-da}] ├─ crond ├─ cupsd ├─ dbus-daemon ─── {dbus- Daemon} ├─ dmeventd ─── 2 * [{dmeventd}] ├─ gpm ├─ hald ─┬─ hald-runner ─┬─ hald-addon-acpi │ │ ├─ │ │ ├─ hald-addon-rfki │ │ └─ hald-addon-stor │ └─ {hald} ├─ 2 * [iscsid] ├─ iscsiuio ─── 2 * [{iscsiuio}] ├─ ksmtuned ─── sleep ├─ login ─── bash ├─ master ─┬─ pickup │ └─ qmgr ├─ 5* [mingetty] ├─ modem-manager ├─ pcscd ─── {pcscd} ├─ polkitd ├ ─ portreserve ├─ rpc.statd ├─ rpcbind ├─ rsyslogd ─── 3 * [{rsyslogd}] ├─ sshd ─┬─ sshd ─── bash │ └─ sshd ─── bash ─── pstree ├─ udevd ─── 2 * [udevd] └─ wpa_supplicant
Display the process tree on CentOS 7:
[root@www ~] # pstreesystemd ─┬─ NetworkManager ─┬─ dhclient │ └─ 2 * [{NetworkManager}] ├─ abrt-watch-log ├─ abrtd ├─ anacron ├─ atd ├─ auditd ─── {auditd} ├─ crond ├─ dbus-daemon ─── {dbus-daemon} ├─ firewalld ── ─ {firewalld} ├─ httpd ─── 6 * [httpd] ├─ irqbalance ├─ login ─── bash ├─ lsmd ├─ lvmetad ├─ master ─┬─ pickup │ └─ qmgr ├─ polkitd ─── 5 * [{polkitd}] ├─ rngd ├─ rpcbind ├─ rsyslogd ─── 2 * [{rsyslogd}] ├─ smartd ├─ sshd ─── sshd ─── bash ─── pstree ├─ systemd-journal systemd-logind ├─ systemd-udevd tuned ─── 4* [{tuned}] └─ vmtoolsd ─── {vmtoolsd}
Ps command:
Report a snapshot of the current processes.
Displays the running status of processes on the system the moment the ps command is executed.
Ps command to view the interface of kernel management process parameters-- > pseudo file system / proc
The process is managed by the kernel, and the relevant information of the kernel management process can be queried through the interface. On Linux, this interface is the / proc directory. The kernel parameters are simulated as file system types, and each file is the kernel parameter. The / proc file system is stored in memory and is used to store state information in the kernel.
There are two kinds of kernel parameters:
(1) parameters that can be set to adjust the operating characteristics of the kernel: these parameters are usually stored in the / proc/sys/ directory, but not files (parameters) located in the / proc/sys directory can be set, only files with write permissions (parameters) can be set. (2) State parameters: used to output statistics and status information in the kernel; only for viewing.
Each process has a directory under the / proc directory with the same name as its PID, and each file in this directory is a kernel parameter dedicated to storing information about the current process.
The directory corresponding to the init process is displayed as follows:
You can view the program file that started the process through the parameter comm:
[root@osyunwei ~] # cat / proc/1/comm init
You can view the mapping between logical address and physical memory address through the parameter maps:
[root@osyunwei] # cat / proc/1/maps7f10e779e000-7f10e77ab000 r-xp 00000000 fd:01 398008 / lib64/libnss_files-2.12.so # library function 7f10e77ab000-7f10e79aa000-p 0000d000 fd:01 398008 / lib64/libnss_files-2.12.so7f10e79aa000-7f10e79ab000 Ruki p 0000c000 fd:01 398008 / lib64/libnss_files-2.12. So7f10e79ab000-7f10e79ac000 rw-p 0000d000 fd:01 398008 / lib64/libnss_files-2.12.so7f10e79ac000-7f10e7b36000 r-xp 00000000 fd:01 462418 / lib64/libc-2.12.so7f10e7b36000-7f10e7d36000-p 0018a000 fd:01 462418 / lib64/libc-2.12.so. (omitted in the middle). 7f10e8c01000-7f10e8c24000 r-xp 00000000 fd:01 149577 / sbin/init7f10e8e23000-7f10e8e25000 Ruki p 00022000 fd:01 149577 / sbin/init7f10e8e25000 -7f10e8e26000 rw-p 00024000 fd:01 149577 / sbin/init7f10eaa25000-7f10eaa64000 rw-p 00000000 00:00 0 [heap] # heap memory 7ffc9a845000-7ffc9a85a000 rw-p 00000000 00:00 0 [stack] # Stack memory 7ffc9a8fd000-7ffc9a8fe000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Because the kernel parameters in the / proc directory are not easy for non-kernel developers to view, there are many view commands that specifically collect and display these kernel parameter information in an intuitive way. The ps command is one of the more classic commands.
Ps command format:
Ps [options]
There are three styles of ps command options:
(1) UNIX options: options must be preceded by'-'(2) BSD options: no'- 'between options. (3) GNU long format options: options must be preceded by'--'.
Common options:
A: displays all terminal-related processes
X: displays all processes independent of the terminal
U: user-centered organization of process status information display
O field1,field2,...: customize the list of fields to display; fields are separated by commas
-e: show all processes
-f: displays process status information in full format
-F: displays process status information in full format (more items than the-f option)
-H:Hierarchy, which displays information about the process in a hierarchical structure
Process startup method:
(1) automatic startup during system startup: processes independent of the terminal
(2) processes initiated by users through the terminal: processes related to the terminal
One of the common combinations of ①: aux
Displays all terminal-related processes:
[root@localhost ~] # ps a PID TTY STAT TIME COMMAND 2571 tty1 Ss+ 0:02-bash 13604 pts/0 Ss 0:00-bash 13742 pts/0 R + 0:00 ps a # Field meaning: PID: process number; TTY: terminal device associated with the process; STAT: state of the process; TIME: cumulative CPU time occupied by the process; COMMAND: program command (including options and parameters) to start the process
Displays all processes that are independent of the terminal:
[root@localhost ~] # ps x PID TTY STAT TIME COMMAND 1? Ss 0:28 / usr/lib/systemd/systemd-switched-root-system-deserialize 21 2? S 0:00 [kthreadd] 3? S 0:00 [ksoftirqd/0] 5? S < 0:00 [kworker/0:0H] 7? S 0:02 [migration/0] 8? S 0:00 [rcu_bh] 9? R 0:30 [rcu_sched] 10? S 0:00 [watchdog/0]. (omitted). 1982? Ss 0:12 / usr/sbin/httpd-DFOREGROUND 2571 tty1 Ss+ 0:02-bash 5213? Ssl 0:12 / usr/sbin/NetworkManager-no-daemon 7625? S < 0:00 [kworker/1:2H]. (omitted) .13739? S 0:00 [kworker/3:1] 13740? S < 0:00 [kworker/3:2H] 13755 pts/0 R + 0:00 ps xanthates It means it has nothing to do with the terminal; # according to the PID number, the meaning of each field is the same as above.
Show all processes:
[root@localhost ~] # ps ax
User-centric display of all processes:
[root@localhost ~] # ps auxUSER PID% CPU% MEM VSZ RSS TTY STAT START TIME COMMANDroot 1 0.00.6 193628 6732? Ss Feb13 0:28 / usr/lib/systemd/systemd-- switched-root-- system-- desroot 2000? S Feb13 0:00 [kthreadd] root 3 0.0 0.0 00? S Feb13 0:00 [ksoftirqd/0] root 5 0.0 0.0 00? S < Feb13 0:00 [kworker/0:0H] root 7 0.0 0.0 00? S Feb13 0:02 [migration/0] root 8 0.0 0.0 0 0? S Feb13 0:00 [rcu_bh] root 9 0.0 0.0 00? S Feb13 0:30 [rcu_sched] root 10 0.0 0.0 0 0? S Feb13 0:00 [watchdog/0] root 11 0.0 0.0 00? S Feb13 0:00 [watchdog/1] root 12 0.0 0.0 00? S Feb13 0:03 [migration/1] root 13 0.0 0.0 0 0? S Feb13 0:03 [ksoftirqd/1]. (omitted below). # meaning of each field: USER: user running the process; PID: process number;% CPU: percentage of CPU resources occupied by the process;% MEM: percentage of memory resources occupied by the process; VSZ:Virtual memory SiZe, virtual memory set; RSS:ReSident Size, resident memory set; TTY: terminal device associated with the process; STAT: process status START: the startup time of the process; TIME: the cumulative CPU time of the process; COMMAND: which command program starts the process, with'[] 'represents the kernel thread, and the pstree command can only display the process
Process status (STAT):
R:running, running state; S:interruptable sleeping, interruptible sleep; D:uninterruptable sleeping, uninterruptible sleep; T:stopped, stopped state; Z:zombie, dead state; +: foreground process (foreground refers to running through a terminal and needs to occupy a command prompt); l: multithreaded process; N: low priority process
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.