2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to use Linux namespaces through the clone(), setns(), and unshare() system calls and the /proc/PID/ns files, with worked examples.
Linux namespaces are a kernel-level mechanism for environment isolation. Officially, a namespace wraps a global system resource in an abstraction so that the processes inside the namespace appear to have their own independent instance of that resource. The feature attracted little attention at first; it was the rise of container technology that brought it back into view.
There are six categories of Linux Namespace:
Type                 System call flag   Kernel version
Mount namespaces     CLONE_NEWNS        Linux 2.4.19
UTS namespaces       CLONE_NEWUTS       Linux 2.6.19
IPC namespaces       CLONE_NEWIPC       Linux 2.6.19
PID namespaces       CLONE_NEWPID       Linux 2.6.24
Network namespaces   CLONE_NEWNET       started in Linux 2.6.24, completed in Linux 2.6.29
User namespaces      CLONE_NEWUSER      started in Linux 2.6.23, completed in Linux 3.8
The namespace API consists of three system calls and a set of /proc files, all of which are described in this article. To select the namespace type to operate on, pass one or more of the CLONE_NEW* constants (CLONE_NEWIPC, CLONE_NEWNS, CLONE_NEWNET, CLONE_NEWPID, CLONE_NEWUSER, and CLONE_NEWUTS) in the flags argument of the system call. Multiple constants can be combined with the | (bitwise OR) operator.
The three system calls, in brief:
clone(): creates a new process (it is also the system call that underlies thread creation). Passing the CLONE_NEW* flags above places the new process in new namespaces.
unshare(): moves the calling process out of its current namespace and into a newly created one.
setns(): adds the calling process to an existing namespace.
The details of each follow.
1. clone()
The prototype of clone() is as follows:
    int clone(int (*child_func)(void *), void *child_stack, int flags, void *arg);
child_func: the function in which the child process starts executing.
child_stack: the stack space used by the child process.
flags: the CLONE_* flag bits to apply.
arg: the argument passed to child_func.
Like fork(), clone() creates a copy of the current process, but it gives finer-grained control (through flags) over which resources are shared with the child, including virtual memory, open file descriptors, semaphores, and so on. When a CLONE_NEW* flag is specified, the corresponding namespace is created and the new process becomes a member of it.
The clone() prototype above is a library wrapper, not the raw system call. In the kernel versions this article was written against, the function that actually implements it is do_fork(), which looks like this:
    long do_fork(unsigned long clone_flags, unsigned long stack_start, unsigned long stack_size, int __user *parent_tidptr, int __user *child_tidptr);
Here clone_flags carries the flags described above.
Let's look at an example:
    /* demo_uts_namespaces.c

       Copyright 2013, Michael Kerrisk
       Licensed under GNU General Public License v2 or later

       Demonstrate the operation of UTS namespaces.
    */
    #define _GNU_SOURCE
    #include <sys/wait.h>
    #include <sys/utsname.h>
    #include <sched.h>
    #include <signal.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* A simple error-handling function: print an error message based
       on the value in 'errno' and terminate the calling process */
    #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

    static int              /* Start function for cloned child */
    childFunc(void *arg)
    {
        struct utsname uts;

        /* Change the hostname in the new UTS namespace */
        if (sethostname(arg, strlen(arg)) == -1)
            errExit("sethostname");

        /* Retrieve and display the hostname */
        if (uname(&uts) == -1)
            errExit("uname");
        printf("uts.nodename in child:  %s\n", uts.nodename);

        /* Keep the namespace open for a while, by sleeping.
           This allows some experimentation; for example, another
           process might join the namespace. */
        sleep(100);         /* Duration is missing in this copy; 100 seconds assumed */

        return 0;           /* Terminates child */
    }

    /* Define a 1 MB stack for clone() */
    #define STACK_SIZE (1024 * 1024)

    static char child_stack[STACK_SIZE];

    int
    main(int argc, char *argv[])
    {
        pid_t child_pid;
        struct utsname uts;

        if (argc < 2) {
            fprintf(stderr, "Usage: %s <child-hostname>\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        /* Call clone() to create a child in a new UTS namespace, passing
           a start function and a stack; the new process begins executing
           in the user-defined function childFunc() */
        child_pid = clone(childFunc,
                          child_stack + STACK_SIZE,  /* The stack grows downward, so pass a pointer to its top */
                          CLONE_NEWUTS | SIGCHLD, argv[1]);
        if (child_pid == -1)
            errExit("clone");
        printf("PID of child created by clone() is %ld\n", (long) child_pid);

        /* Parent falls through to here */

        sleep(1);           /* Give the child time to change its hostname */

        /* Display the hostname in the parent's UTS namespace; it differs
           from the hostname in the child's UTS namespace */
        if (uname(&uts) == -1)
            errExit("uname");
        printf("uts.nodename in parent: %s\n", uts.nodename);

        if (waitpid(child_pid, NULL, 0) == -1)      /* Wait for the child to terminate */
            errExit("waitpid");
        printf("child has terminated\n");

        exit(EXIT_SUCCESS);
    }

This program calls clone() with the CLONE_NEWUTS flag to create a UTS namespace. A UTS namespace isolates two system identifiers, the hostname and the NIS domain name, which are set with the sethostname() and setdomainname() system calls and retrieved with the uname() system call.
The key parts of the program are walked through below (error checking is omitted for simplicity).
The program takes one command-line argument. It creates a child process that runs in a new UTS namespace, and the child changes the hostname in that namespace to the value given on the command line.
The first key part of the main program is the clone() call that creates the child:

    child_pid = clone(childFunc,
                      child_stack + STACK_SIZE,   /* Points to start of downwardly growing stack */
                      CLONE_NEWUTS | SIGCHLD, argv[1]);
    printf("PID of child created by clone() is %ld\n", (long) child_pid);

The child starts executing in the user-defined function childFunc(), which receives the last argument of clone() (argv[1]) as its own argument. Because the flags include CLONE_NEWUTS, the child runs in a newly created UTS namespace.
The parent then sleeps for a moment so that the child has time to change the hostname in its UTS namespace. It then calls uname() to retrieve the hostname in its own UTS namespace and displays it:

    sleep(1);           /* Give child time to change its hostname */
    uname(&uts);
    printf("uts.nodename in parent: %s\n", uts.nodename);

Meanwhile, childFunc(), executed by the child that clone() created, first changes the hostname to the value given on the command line, then retrieves and displays the modified hostname:

    sethostname(arg, strlen(arg));
    uname(&uts);
    printf("uts.nodename in child:  %s\n", uts.nodename);

Before exiting, the child also sleeps for a while. This keeps the new UTS namespace from being torn down and leaves time for the experiments that follow.
Run the program and check whether the parent and the child are in different UTS namespaces:

    $ su                            # Privilege is required to create a UTS namespace
    Password:
    # uname -n
    antero
    # ./demo_uts_namespaces bizarro
    PID of child created by clone() is 27514
    uts.nodename in child:  bizarro
    uts.nodename in parent: antero

Creating any namespace other than a user namespace requires privilege; more precisely, it requires the corresponding Linux capability, CAP_SYS_ADMIN. This prevents programs with the SUID (Set User ID on execution) bit set from doing something foolish because they see an unexpected hostname. If you are not familiar with Linux capabilities, see the author's earlier article: Linux Capabilities 入门教程:概念篇 (Getting Started with Linux Capabilities: Concepts).
2. /proc files
Every process has a /proc/PID/ns directory whose files each represent one namespace; for example, user represents the user namespace. Since kernel 3.8, each file in this directory is a special symbolic link that points to $namespace:[$namespace-inode-number]. The first part is the name of the namespace, and the number in the second part is a handle for that namespace. The handle is used to perform certain operations on the namespace associated with the process.

    $ ls -l /proc/$$/ns         # $$ is the PID of the current shell
    total 0
    lrwxrwxrwx. 1 mtk mtk 0 Jan  8 04:12 ipc -> ipc:[4026531839]
    lrwxrwxrwx. 1 mtk mtk 0 Jan  8 04:12 mnt -> mnt:[4026531840]
    lrwxrwxrwx. 1 mtk mtk 0 Jan  8 04:12 net -> net:[4026531956]
    lrwxrwxrwx. 1 mtk mtk 0 Jan  8 04:12 pid -> pid:[4026531836]
    lrwxrwxrwx. 1 mtk mtk 0 Jan  8 04:12 user -> user:[4026531837]
    lrwxrwxrwx. 1 mtk mtk 0 Jan  8 04:12 uts -> uts:[4026531838]
One use of these symbolic links is to check whether two processes are in the same namespace. If the links of two processes point to the same namespace inode number, the processes are in the same namespace; otherwise they are in different namespaces. The files these links point to are special and cannot be accessed directly. They live in a file system called nsfs, which is not visible to users. You can call stat() and read the inode number from the st_ino field of the returned structure. In a shell, the stat command (which calls stat() internally) shows the inode information of the file a link points to:
    $ stat -L /proc/$$/ns/net
      File: /proc/3232/ns/net
      Size: 0          Blocks: 0          IO Block: 4096   regular empty file
    Device: 4h/4d   Inode: 4026531956  Links: 1
    Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2020-01-17 15:45:23.783304900 +0800
    Modify: 2020-01-17 15:45:23.783304900 +0800
    Change: 2020-01-17 15:45:23.783304900 +0800
     Birth: -
These symbolic links have another use. If one of the files is opened, the namespace continues to exist for as long as the file descriptor stays open, even after every process in the namespace has terminated. The same effect can be achieved by bind mounting the symbolic link to another location on the system:
    $ touch ~/uts
    $ mount --bind /proc/27514/ns/uts ~/uts
3. setns()
A process can join an existing namespace by calling setns(). Its prototype is as follows:
    int setns(int fd, int nstype);
More precisely, setns() disassociates the calling process from one instance of a particular namespace type and reassociates it with another instance of the same type.
fd is a file descriptor for the namespace to join. It can be obtained by opening one of the /proc/PID/ns symbolic links, or by opening a file onto which one of those links has been bind mounted.
nstype lets the caller check the type of namespace that fd refers to. It can be set to one of the CLONE_NEW* constants mentioned earlier; 0 means no check. Use 0 when the caller already knows the namespace type or does not care about it, and pass a specific constant to have the type verified automatically.
Combining setns() and execve() yields a simple but very useful tool: a program that joins a specified namespace and then executes a command inside it. Here is an example:
    /* ns_exec.c

       Copyright 2013, Michael Kerrisk
       Licensed under GNU General Public License v2 or later

       Join a namespace and execute a command in the namespace
    */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sched.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>

    /* A simple error-handling function: print an error message based
       on the value in 'errno' and terminate the calling process */
    #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

    int
    main(int argc, char *argv[])
    {
        int fd;

        if (argc < 3) {
            fprintf(stderr, "%s /proc/PID/ns/FILE cmd [arg...]\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        fd = open(argv[1], O_RDONLY);   /* Get a file descriptor for the namespace to join */
        if (fd == -1)
            errExit("open");

        if (setns(fd, 0) == -1)         /* Join that namespace */
            errExit("setns");

        execvp(argv[2], &argv[2]);      /* Execute the command in the joined namespace */
        errExit("execvp");
    }

The program takes two or more command-line arguments. The first is the path of a namespace symbolic link (or of a file onto which such a link has been bind mounted). The second is the name of the program to execute in the namespace that the link refers to, followed by any arguments that program needs. The key steps are:

    fd = open(argv[1], O_RDONLY);   /* Get a file descriptor for the namespace to join */
    setns(fd, 0);                   /* Join that namespace */
    execvp(argv[2], &argv[2]);      /* Execute the command in the joined namespace */

Recall that the UTS namespace created by demo_uts_namespaces was bind mounted at ~/uts earlier. Combining that with this program lets a new process run a shell in that UTS namespace:

    $ ./ns_exec ~/uts /bin/bash     # ~/uts is bind mounted to /proc/27514/ns/uts
    My PID is: 28788

Verify that the new shell is in the same UTS namespace as the child created by demo_uts_namespaces:

    $ hostname
    bizarro
    $ readlink /proc/27514/ns/uts
    uts:[4026532338]
    $ readlink /proc/$$/ns/uts      # $$ is the PID of the current shell
    uts:[4026532338]

In earlier kernel versions, setns() could not be used to join mount, PID, or user namespaces. Since kernel 3.8, setns() supports joining every namespace type.
The util-linux package provides the nsenter command, which runs a newly created process inside specified namespaces. Its implementation is simple: the -t option names the target process whose namespace symbolic links are to be entered, setns() places the current process in those namespaces, and clone() then runs the specified executable. strace shows what happens:

    # strace nsenter -t 27242 -i -m -n -p -u /bin/bash
    execve("/usr/bin/nsenter", ["nsenter", "-t", "27242", "-i", "-m", "-n", "-p", "-u", "/bin/bash"], [/* 21 vars */]) = 0
    ……………………
    open("/proc/27242/ns/ipc", O_RDONLY)    = 3
    open("/proc/27242/ns/uts", O_RDONLY)    = 4
    open("/proc/27242/ns/net", O_RDONLY)    = 5
    open("/proc/27242/ns/pid", O_RDONLY)    = 6
    open("/proc/27242/ns/mnt", O_RDONLY)    = 7
    setns(3, CLONE_NEWIPC)                  = 0
    close(3)                                = 0
    setns(4, CLONE_NEWUTS)                  = 0
    close(4)                                = 0
    setns(5, CLONE_NEWNET)                  = 0
    close(5)                                = 0
    setns(6, CLONE_NEWPID)                  = 0
    close(6)                                = 0
    setns(7, CLONE_NEWNS)                   = 0
    close(7)                                = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4deb1faad0) = 4968

4. unshare()
The last system call to cover is unshare(). Its prototype is as follows:
    int unshare(int flags);
unshare() is similar to clone(), but it operates on the calling process and does not create a new one. It first creates a new namespace for each CLONE_NEW* constant given in flags and then makes the caller a member of it. The net effect is that the caller leaves its current namespace and joins a new one.
The unshare command that ships with Linux is built on the unshare() system call. It is used as follows:
    $ unshare [options] program [arguments]
The options select the types of namespace to create. The core of the unshare command is:

    /* Code to initialize 'flags' according to the command-line options */
    unshare(flags);
    /* Now execute 'program' with 'arguments'; 'optind' is the index
       of the next command-line argument after options */
    execvp(argv[optind], &argv[optind]);

A complete, simplified implementation of the unshare command follows:

    /* unshare.c

       Copyright 2013, Michael Kerrisk
       Licensed under GNU General Public License v2 or later

       A simple implementation of the unshare(1) command: unshare
       namespaces and execute a command.
    */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>

    /* A simple error-handling function: print an error message based
       on the value in 'errno' and terminate the calling process */
    #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

    static void
    usage(char *pname)
    {
        fprintf(stderr, "Usage: %s [options] program [arg...]\n", pname);
        fprintf(stderr, "Options can be:\n");
        fprintf(stderr, "    -i   unshare IPC namespace\n");
        fprintf(stderr, "    -m   unshare mount namespace\n");
        fprintf(stderr, "    -n   unshare network namespace\n");
        fprintf(stderr, "    -p   unshare PID namespace\n");
        fprintf(stderr, "    -u   unshare UTS namespace\n");
        fprintf(stderr, "    -U   unshare user namespace\n");
        exit(EXIT_FAILURE);
    }

    int
    main(int argc, char *argv[])
    {
        int flags, opt;

        flags = 0;

        while ((opt = getopt(argc, argv, "imnpuU")) != -1) {
            switch (opt) {
            case 'i': flags |= CLONE_NEWIPC;  break;
            case 'm': flags |= CLONE_NEWNS;   break;
            case 'n': flags |= CLONE_NEWNET;  break;
            case 'p': flags |= CLONE_NEWPID;  break;
            case 'u': flags |= CLONE_NEWUTS;  break;
            case 'U': flags |= CLONE_NEWUSER; break;
            default:  usage(argv[0]);
            }
        }

        if (optind >= argc)
            usage(argv[0]);

        if (unshare(flags) == -1)
            errExit("unshare");

        execvp(argv[optind], &argv[optind]);
        errExit("execvp");
    }
Now use the unshare.c program to run a shell in a new mount namespace:
    $ echo $$                             # Show the PID of the current shell
    8490
    $ cat /proc/8490/mounts | grep mq     # Show one mount point in the current namespace
    mqueue /dev/mqueue mqueue rw,seclabel,relatime 0 0
    $ readlink /proc/8490/ns/mnt          # Show the ID of the current namespace
    mnt:[4026531840]
    $ ./unshare -m /bin/bash              # Start a new shell in a newly created mount namespace
    $ readlink /proc/$$/ns/mnt            # Show the ID of the new namespace
    mnt:[4026532325]
Comparing the output of the two readlink commands shows that the two shells are in different mount namespaces. Now change a mount point in the new namespace and check whether the mount points of both namespaces are affected:
    $ umount /dev/mqueue                  # Remove a mount point in the new namespace
    $ cat /proc/$$/mounts | grep mq       # Check that the removal took effect
    $ cat /proc/8490/mounts | grep mq     # Does the mount point still exist in the original namespace?
    mqueue /dev/mqueue mqueue rw,seclabel,relatime 0 0
As you can see, the mount point /dev/mqueue has disappeared from the new namespace, but it still exists in the original namespace.