2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how Linux implements process ID numbers, walking through the relevant kernel data structures and code.
The code in this article is taken from Linux kernel version 5.15.13.
Linux assigns every process a number that uniquely identifies it within its namespace. This number is called the process ID, or PID for short. Each process created with fork or clone is automatically assigned a new, unique PID by the kernel.
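As a small user-space illustration of the last point, the sketch below forks once and returns the PID that the kernel assigned to the child. The function name spawn_and_reap_child is made up for this example; fork, waitpid and _exit are the standard POSIX calls.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork once and report the child's PID as seen by the parent.
 * Returns the child's PID (> 0), or -1 if fork() or waitpid() failed.
 */
pid_t spawn_and_reap_child(void)
{
    pid_t child = fork();

    if (child < 0)
        return -1;              /* fork failed, no child was created */

    if (child == 0)
        _exit(0);               /* child: exit immediately */

    /* parent: wait so the child does not linger as a zombie */
    if (waitpid(child, NULL, 0) != child)
        return -1;

    return child;
}
```

The value returned in the parent differs from the parent's own getpid(), since the child received a fresh PID when it was created.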
1. Process ID
1.1. Other IDs
Besides its PID, each process has several other IDs. The following types exist:
1. All processes in a thread group share a thread group ID (TGID). A thread group consists of the execution contexts that a process creates by calling clone with the CLONE_THREAD flag, as we will see later. If a process does not use threads, its PID and TGID are identical. The main process of a thread group is called the group leader. The group_leader member of the task_struct of every thread created through clone points to the task_struct instance of the group leader.
2. Independent processes can also be combined into a process group (using setpgrp, which on Linux is built on the setpgid system call). All members of a process group share the same process group ID (PGID), which is the PID of the process group leader. In the 5.15 kernel this ID is not a pgrp member of task_struct; it is reached through task->signal->pids[PIDTYPE_PGID]. Process groups make it easy to send a signal to all members of the group, which is useful in many system programming applications (see the system programming literature, such as [SR05]). Note that processes connected by pipes are contained in the same process group.
3. Several process groups can be combined into a session. All processes in a session share the same session ID (SID), which can be set with the setsid system call. In the 5.15 kernel it is reached through task->signal->pids[PIDTYPE_SID] rather than a session member of task_struct. Sessions are used in terminal programming.
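The ID types listed above can all be read from user space. The following is a minimal sketch; the struct process_ids type and the read_process_ids function are names invented for this example, while the system calls themselves are standard. Note the naming shift between kernel and user space: getpid() returns what the kernel calls the TGID, and the gettid system call returns the kernel's per-thread PID.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the four ID types discussed above. */
struct process_ids {
    pid_t tid;   /* kernel "PID": unique per thread */
    pid_t tgid;  /* thread group ID: what getpid() returns */
    pid_t pgid;  /* process group ID */
    pid_t sid;   /* session ID */
};

/* Collect the IDs of the calling thread using standard system calls. */
struct process_ids read_process_ids(void)
{
    struct process_ids ids;

    ids.tid  = (pid_t)syscall(SYS_gettid);
    ids.tgid = getpid();
    ids.pgid = getpgid(0);   /* 0 means "the calling process" */
    ids.sid  = getsid(0);
    return ids;
}
```

In a single-threaded program tid and tgid are equal, which matches the statement that PID and TGID coincide when a process does not use threads.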
1.2. Global ID and local ID
Namespaces add complexity to PID management. PID namespaces are organized in a hierarchy. When a new namespace is created, all PIDs used in it are visible to the parent namespace, but the child namespace cannot see the PIDs of the parent namespace. This means that some processes have several PIDs: one for each namespace in which the process is visible. The data structures must reflect this, so we must distinguish between local IDs and global IDs.
1. Global IDs are unique ID numbers within the kernel itself and in the initial namespace, to which the init process started at boot belongs. For each ID type, the global ID is guaranteed to be unique throughout the system.
2. Local IDs belong to a specific namespace and are not globally valid. For each ID type, they are valid within the namespace to which they belong, but an ID of the same type and value may appear in a different namespace.
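One way to observe several IDs for the same process from user space is the NSpid line of /proc/self/status, which lists one value per PID namespace the process is a member of, outermost first. The sketch below assumes a kernel and procfs that expose this field; read_nspid is a helper name made up for this example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Read the "NSpid:" line of /proc/self/status into out[].
 * Returns the number of IDs stored, 0 if the line is absent,
 * or -1 if the file cannot be opened.
 */
int read_nspid(pid_t *out, int max)
{
    char line[256];
    int count = 0;
    FILE *fp = fopen("/proc/self/status", "r");

    if (!fp)
        return -1;

    while (fgets(line, sizeof(line), fp)) {
        char *p;
        int consumed;
        long value;

        if (strncmp(line, "NSpid:", 6) != 0)
            continue;

        /* Values are whitespace-separated, outermost namespace first. */
        p = line + 6;
        while (count < max && sscanf(p, "%ld%n", &value, &consumed) == 1) {
            out[count++] = (pid_t)value;
            p += consumed;
        }
        break;
    }

    fclose(fp);
    return count;
}
```

For a process that is not inside a nested PID namespace the line holds a single value. Inside a nested namespace it holds one value per level, which is the user-visible counterpart of the local IDs described above.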
1.3. The implementation of ID
The global PID and TGID are stored directly in task_struct, in its pid and tgid members, which are declared in include/linux/sched.h:

struct task_struct {
    ...
    pid_t pid;
    pid_t tgid;
    ...
};

Both members have type pid_t, which is defined as __kernel_pid_t, which in turn is defined by each architecture. It is usually an int, so in principle 2^32 different values can be represented. In practice only positive values are used as IDs, and the kernel's pid_max setting caps the usable range well below that.
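The effective upper bound can be inspected at run time. As a minimal sketch, the helper below (read_pid_max is a name invented for this example) reads the limit from /proc/sys/kernel/pid_max, assuming procfs is mounted at /proc.

```c
#include <stdio.h>

/*
 * Return the current upper bound for PID values as reported by
 * /proc/sys/kernel/pid_max, or -1 if it cannot be read.
 */
long read_pid_max(void)
{
    long value = -1;
    FILE *fp = fopen("/proc/sys/kernel/pid_max", "r");

    if (!fp)
        return -1;

    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;

    fclose(fp);
    return value;
}
```

The value returned is the same pid_max that alloc_pid, shown later in this article, uses as the upper limit when it searches for a free number.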
2. Manage PID
A small subsystem called the PID allocator speeds up the allocation of new IDs. In addition, the kernel provides helper functions that find the task_struct of a process from an ID and its type, and functions that convert between the kernel's representation of an ID and the numeric value visible in user space.
2.1. The representation of PID namespaces
A PID namespace is represented by struct pid_namespace, which is defined in include/linux/pid_namespace.h as follows:

struct pid_namespace {
    struct idr idr;
    struct rcu_head rcu;
    unsigned int pid_allocated;
    struct task_struct *child_reaper;
    struct kmem_cache *pid_cachep;
    unsigned int level;
    struct pid_namespace *parent;
#ifdef CONFIG_BSD_PROCESS_ACCT
    struct fs_pin *bacct;
#endif
    struct user_namespace *user_ns;
    struct ucounts *ucounts;
    int reboot;    /* group exit code if this pidns was rebooted */
    struct ns_common ns;
} __randomize_layout;
Each PID namespace has a process that plays the role of the global init process. One of the tasks of init is to call wait4 on orphaned processes, and the namespace-local init variant must do the same. child_reaper holds a pointer to the task_struct of this process.
parent points to the parent namespace, and level gives the depth of the namespace in the hierarchy. The initial namespace has level 0, its children have level 1, their children have level 2, and so on. The level matters because IDs in namespaces with a higher level are visible in namespaces with a lower level. From a given level, the kernel can infer how many IDs a process is associated with: one for each namespace from level 0 up to its own level.
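The role of parent and level can be illustrated with a small user-space model. This is only a sketch: struct toy_pid_ns and the toy_ functions are made-up names that keep just the two hierarchy fields and none of the real allocator state.

```c
#include <stddef.h>

/* Toy stand-in for struct pid_namespace: only the hierarchy fields. */
struct toy_pid_ns {
    unsigned int level;          /* depth: 0 for the initial namespace */
    struct toy_pid_ns *parent;   /* NULL for the initial namespace */
};

/* Initialise ns as a child of parent (or as the root if parent is NULL). */
void toy_ns_init(struct toy_pid_ns *ns, struct toy_pid_ns *parent)
{
    ns->parent = parent;
    ns->level = parent ? parent->level + 1 : 0;
}

/*
 * A process created in ns is visible in ns and in every ancestor,
 * so it needs one local ID per level from 0 up to ns->level.
 */
unsigned int toy_ids_needed(const struct toy_pid_ns *ns)
{
    return ns->level + 1;
}

/* Return 1 if a process living in inner is visible from outer. */
int toy_ns_can_see(const struct toy_pid_ns *outer,
                   const struct toy_pid_ns *inner)
{
    const struct toy_pid_ns *ns;

    /* Walk from inner towards the root; outer must be on that path. */
    for (ns = inner; ns != NULL; ns = ns->parent)
        if (ns == outer)
            return 1;
    return 0;
}
```

With a root, a child and a grandchild namespace, a process in the grandchild needs three IDs, and it is visible from the root while the root's processes are not visible from the grandchild.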
2.2. Management of PID
2.2.1. Data structure of PID
PID management revolves around two data structures: struct pid is the kernel's internal representation of a PID, and struct upid represents the information visible in a particular namespace. Both structures are defined in include/linux/pid.h, as follows:

/*
 * What is struct pid?
 *
 * A struct pid is the kernel's internal notion of a process identifier.
 * It refers to individual tasks, process groups, and sessions.  While
 * there are processes attached to it the struct pid lives in a hash
 * table, so it and then the processes that it refers to can be found
 * quickly from the numeric pid value.  The attached processes may be
 * quickly accessed by following pointers from struct pid.
 *
 * Storing pid_t values in the kernel and referring to them later has a
 * problem.  The process originally with that pid may have exited and the
 * pid allocator wrapped, and another process could have come along
 * and been assigned that pid.
 *
 * Referring to user space processes by holding a reference to struct
 * task_struct has a problem.  When the user space process exits
 * the now useless task_struct is still kept.  A task_struct plus a
 * stack consumes around 10K of low kernel memory.  More precisely
 * this is THREAD_SIZE + sizeof(struct task_struct).  By comparison
 * a struct pid is about 64 bytes.
 *
 * Holding a reference to struct pid solves both of these problems.
 * It is small so holding a reference does not consume a lot of
 * resources, and since a new struct pid is allocated when the numeric pid
 * value is reused (when pids wrap around) we don't mistakenly refer to new
 * processes.
 */

/*
 * struct upid is used to get the id of the struct pid, as it is
 * seen in particular namespace. Later the struct pid is found with
 * find_pid_ns() using the int nr and struct pid_namespace *ns.
 */
struct upid {
    int nr;
    struct pid_namespace *ns;
};

struct pid {
    refcount_t count;
    unsigned int level;
    spinlock_t lock;
    /* lists of tasks that use this pid */
    struct hlist_head tasks[PIDTYPE_MAX];
    struct hlist_head inodes;
    /* wait queue for pidfd notifications */
    wait_queue_head_t wait_pidfd;
    struct rcu_head rcu;
    struct upid numbers[1];
};
In struct upid, nr holds the numeric value of the ID and ns points to the namespace to which that value belongs. Older kernels chained all upid instances into a hash table through a pid_chain member. The 5.15 structure has no such member; each namespace instead maps nr to its struct pid through the idr member of struct pid_namespace. In struct pid, count is a reference counter. tasks is an array in which each entry is a list head for one ID type. This is necessary because one ID may be used by several processes: all task_struct instances that share a given ID are linked on this list. PIDTYPE_MAX is the number of ID types:
enum pid_type {
    PIDTYPE_PID,
    PIDTYPE_TGID,
    PIDTYPE_PGID,
    PIDTYPE_SID,
    PIDTYPE_MAX,
};

2.2.2. PID and process connection
A process may be visible in several namespaces, and its local ID differs from one namespace to another. level gives the depth, in the namespace hierarchy, of the namespace that contains the process, so the process is visible in level + 1 namespaces. numbers is an array of upid instances with one entry per namespace. Note that the array is formally declared with a single entry, which is sufficient if a process is contained only in the global namespace. Because the array sits at the end of the structure, further entries can be added simply by allocating more memory for the structure.
Since all task_struct instances that share the same ID are linked on a list headed in struct pid, struct task_struct needs a list element for each ID type. It is declared in include/linux/sched.h:
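The layout of numbers and the way a local ID is looked up can be sketched in user space. The toy_ types and functions below are made up for this example. The lookup follows the idea of the kernel's pid_nr_ns helper, and the sketch uses a C99 flexible array member where the kernel declares numbers[1] and allocates extra space.

```c
#include <stdlib.h>

struct toy_ns {
    unsigned int level;
    struct toy_ns *parent;
};

struct toy_upid {
    int nr;
    struct toy_ns *ns;
};

struct toy_pid {
    unsigned int level;
    struct toy_upid numbers[];   /* flexible array: level + 1 entries */
};

/*
 * Allocate a toy_pid for a process created in ns. nrs[] holds one number
 * per level, indexed by level (nrs[0] is the global ID).
 */
struct toy_pid *toy_pid_create(struct toy_ns *ns, const int *nrs)
{
    struct toy_ns *tmp = ns;
    struct toy_pid *pid;
    int i;

    pid = malloc(sizeof(*pid) + (ns->level + 1) * sizeof(struct toy_upid));
    if (!pid)
        return NULL;

    pid->level = ns->level;
    /* Fill entries from the innermost namespace up to the root. */
    for (i = (int)ns->level; i >= 0; i--) {
        pid->numbers[i].nr = nrs[i];
        pid->numbers[i].ns = tmp;
        tmp = tmp->parent;
    }
    return pid;
}

/* Local ID of pid as seen from ns, or 0 if pid is not visible there. */
int toy_pid_nr_ns(const struct toy_pid *pid, const struct toy_ns *ns)
{
    if (ns->level <= pid->level && pid->numbers[ns->level].ns == ns)
        return pid->numbers[ns->level].nr;
    return 0;
}
```

A process created in a level 1 namespace with numbers {4711, 1} is reported as 4711 from the root namespace, as 1 from its own namespace, and as 0 (not visible) from an unrelated sibling namespace.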
struct task_struct {
    ...
    /* PID/PID hash table linkage. */
    struct pid *thread_pid;
    struct hlist_node pid_links[PIDTYPE_MAX];
    struct list_head thread_group;
    struct list_head thread_node;
    ...
};
pid_links connects the task_struct to the lists whose heads are stored in the tasks array of struct pid.
2.2.3. Find PID
Suppose a new struct pid instance has been allocated and set for a given ID type. It is attached to the task_struct as follows, in kernel/pid.c:
static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
{
    return (type == PIDTYPE_PID) ?
        &task->thread_pid :
        &task->signal->pids[type];
}

/*
 * attach_pid() must be called with the tasklist_lock write-held.
 */
void attach_pid(struct task_struct *task, enum pid_type type)
{
    struct pid *pid = *task_pid_ptr(task, type);
    hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
}
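This establishes a two-way connection. The task_struct reaches the pid instance through task->thread_pid (for PIDTYPE_PID) or task->signal->pids[type] (for the other types), as task_pid_ptr shows. Starting from the pid instance, the tasks[type] list can be traversed to find every task_struct that uses it. hlist_add_head_rcu is a standard kernel function that adds an element to the head of such a list in a way that is safe for concurrent RCU readers.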
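The two directions of this connection can be modelled in user space with a minimal hlist. All toy_ names below are invented for this sketch. It has no locking or RCU, and unlike the kernel's attach_pid it also stores the task-to-pid pointer itself to keep the example short.

```c
#include <stddef.h>

enum toy_pid_type { TOY_PID, TOY_TGID, TOY_PGID, TOY_SID, TOY_TYPE_MAX };

struct toy_hlist_node {
    struct toy_hlist_node *next;
    struct toy_hlist_node **pprev;   /* address of the pointer pointing at us */
};

struct toy_hlist_head {
    struct toy_hlist_node *first;
};

struct toy_pid {
    int nr;
    struct toy_hlist_head tasks[TOY_TYPE_MAX];
};

struct toy_task {
    const char *name;
    struct toy_pid *pids[TOY_TYPE_MAX];             /* task -> pid direction */
    struct toy_hlist_node pid_links[TOY_TYPE_MAX];  /* pid -> task direction */
};

/* Insert node at the front of the list (no RCU or locking in this sketch). */
static void toy_hlist_add_head(struct toy_hlist_node *n,
                               struct toy_hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Link task and pid in both directions for the given ID type. */
void toy_attach_pid(struct toy_task *task, enum toy_pid_type type,
                    struct toy_pid *pid)
{
    task->pids[type] = pid;
    toy_hlist_add_head(&task->pid_links[type], &pid->tasks[type]);
}

/* Count the tasks that use pid as an ID of the given type. */
int toy_count_tasks(const struct toy_pid *pid, enum toy_pid_type type)
{
    const struct toy_hlist_node *n;
    int count = 0;

    for (n = pid->tasks[type].first; n != NULL; n = n->next)
        count++;
    return count;
}
```

Attaching two tasks to the same toy_pid under TOY_PGID models two processes in one process group: both tasks point at the same pid object, and walking tasks[TOY_PGID] from that object finds both of them.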
3. Generate a unique PID
Besides managing PIDs, the kernel must provide a mechanism for generating unique PIDs. To do so it has to track which PIDs are allocated and which are still free. Older kernels used a large bitmap in which each PID is represented by one bit: the PID value follows from the position of the bit, allocating a free PID amounts to finding the first bit with value 0 and setting it to 1, and releasing a PID means switching the bit back from 1 to 0. The 5.15 code shown here instead records allocated PIDs in the idr member of each struct pid_namespace and hands them out cyclically with idr_alloc_cyclic. When a new process is created, it may be visible in several namespaces, and a local PID must be generated for each of them. This is handled by alloc_pid in kernel/pid.c:
struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
                      size_t set_tid_size)
{
    struct pid *pid;
    enum pid_type type;
    int i, nr;
    struct pid_namespace *tmp;
    struct upid *upid;
    int retval = -ENOMEM;

    /*
     * set_tid_size contains the size of the set_tid array. Starting at
     * the most nested currently active PID namespace it tells alloc_pid()
     * which PID to set for a process in that most nested PID namespace
     * up to set_tid_size PID namespaces. It does not have to set the PID
     * for a process in all nested PID namespaces but set_tid_size must
     * never be greater than the current ns->level + 1.
     */
    if (set_tid_size > ns->level + 1)
        return ERR_PTR(-EINVAL);

    pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
    if (!pid)
        return ERR_PTR(retval);

    tmp = ns;
    pid->level = ns->level;

    for (i = ns->level; i >= 0; i--) {
        int tid = 0;

        if (set_tid_size) {
            tid = set_tid[ns->level - i];

            retval = -EINVAL;
            if (tid < 1 || tid >= pid_max)
                goto out_free;
            /*
             * Also fail if a PID != 1 is requested and
             * no PID 1 exists.
             */
            if (tid != 1 && !tmp->child_reaper)
                goto out_free;
            retval = -EPERM;
            if (!checkpoint_restore_ns_capable(tmp->user_ns))
                goto out_free;
            set_tid_size--;
        }

        idr_preload(GFP_KERNEL);
        spin_lock_irq(&pidmap_lock);

        if (tid) {
            nr = idr_alloc(&tmp->idr, NULL, tid,
                           tid + 1, GFP_ATOMIC);
            /*
             * If ENOSPC is returned it means that the PID is
             * already in use. Return EEXIST in that case.
             */
            if (nr == -ENOSPC)
                nr = -EEXIST;
        } else {
            int pid_min = 1;
            /*
             * init really needs pid 1, but after reaching the
             * maximum wrap back to RESERVED_PIDS
             */
            if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
                pid_min = RESERVED_PIDS;

            /*
             * Store a null pointer so find_pid_ns does not find
             * a partially initialized PID (see below).
             */
            nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
                                  pid_max, GFP_ATOMIC);
        }
        spin_unlock_irq(&pidmap_lock);
        idr_preload_end();

        if (nr < 0) {
            retval = (nr == -ENOSPC) ? -EAGAIN : nr;
            goto out_free;
        }

        pid->numbers[i].nr = nr;
        pid->numbers[i].ns = tmp;
        tmp = tmp->parent;
    }

    /*
     * ENOMEM is not the most obvious choice especially for the case
     * where the child subreaper has already exited and the pid
     * namespace denies the creation of any new processes. But ENOMEM
     * is what we have exposed to userspace for a long time and it is
     * documented behavior for pid namespaces. So we can't easily
     * change it even if there were an error code better suited.
     */
    retval = -ENOMEM;

    get_pid_ns(ns);
    refcount_set(&pid->count, 1);
    spin_lock_init(&pid->lock);
    for (type = 0; type < PIDTYPE_MAX; ++type)
        INIT_HLIST_HEAD(&pid->tasks[type]);

    init_waitqueue_head(&pid->wait_pidfd);
    INIT_HLIST_HEAD(&pid->inodes);

    upid = pid->numbers + ns->level;
    spin_lock_irq(&pidmap_lock);
    if (!(ns->pid_allocated & PIDNS_ADDING))
        goto out_unlock;
    for ( ; upid >= pid->numbers; --upid) {
        /* Make the PID visible to find_pid_ns. */
        idr_replace(&upid->ns->idr, pid, upid->nr);
        upid->ns->pid_allocated++;
    }
    spin_unlock_irq(&pidmap_lock);

    return pid;

out_unlock:
    spin_unlock_irq(&pidmap_lock);
    put_pid_ns(ns);

out_free:
    spin_lock_irq(&pidmap_lock);
    while (++i <= ns->level) {
        upid = pid->numbers + i;
        idr_remove(&upid->ns->idr, upid->nr);
    }

    /* On failure to allocate the first pid, reset the state */
    if (ns->pid_allocated == PIDNS_ADDING)
        idr_set_cursor(&ns->idr, 0);

    spin_unlock_irq(&pidmap_lock);

    kmem_cache_free(ns->pid_cachep, pid);
    return ERR_PTR(retval);
}
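The allocation idea described before the listing (one bit per PID, search for a free bit, wrap around instead of starting again at 1) can be sketched in user space as follows. This is a simplified model and not the kernel's allocator: the toy_ names and the limit TOY_PID_MAX are invented for this example, and only the wrap-around value 300 mirrors the kernel's RESERVED_PIDS.

```c
#include <limits.h>
#include <string.h>

#define TOY_PID_MAX    1024   /* illustrative limit, not the kernel's pid_max */
#define TOY_RESERVED   300    /* mirrors RESERVED_PIDS in the kernel */
#define BITS_PER_WORD  ((int)(sizeof(unsigned long) * CHAR_BIT))
#define MAP_WORDS      ((TOY_PID_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct toy_pidmap {
    unsigned long bits[MAP_WORDS];  /* bit n set means PID n is in use */
    int cursor;                     /* next candidate to try */
};

static int toy_test_bit(const struct toy_pidmap *map, int nr)
{
    return (int)((map->bits[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

/* Next candidate after nr; after the maximum, wrap back to TOY_RESERVED. */
static int toy_next(int nr)
{
    return (nr + 1 < TOY_PID_MAX) ? nr + 1 : TOY_RESERVED;
}

void toy_pidmap_init(struct toy_pidmap *map)
{
    memset(map, 0, sizeof(*map));
    map->cursor = 1;                /* PID 0 is never handed out */
}

/* Allocate the first free PID at or after the cursor, or -1 if none is free. */
int toy_alloc_pid(struct toy_pidmap *map)
{
    int nr = map->cursor;
    int tries;

    /* Bounded loop: at most one full pass over the number space. */
    for (tries = 0; tries < TOY_PID_MAX; tries++) {
        if (!toy_test_bit(map, nr)) {
            map->bits[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
            map->cursor = toy_next(nr);
            return nr;
        }
        nr = toy_next(nr);
    }
    return -1;
}

/* Release a PID by clearing its bit. */
void toy_free_pid(struct toy_pidmap *map, int nr)
{
    map->bits[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}
```

Because allocation continues from the cursor, a freed PID is not reused immediately: after allocating 1 and 2 and freeing 1, the next allocation returns 3. Once the maximum is reached, the search resumes at TOY_RESERVED, in the same way that alloc_pid sets pid_min to RESERVED_PIDS after the cursor has passed that value.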