What are the common process communication mechanisms provided by the Linux kernel

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

This article introduces the common process communication mechanisms provided by the Linux kernel through practical cases. Many people run into difficulties here, so let the editor walk you through how to handle these situations. I hope you read carefully and come away with something!

1. What is process communication?

As the name implies, interprocess communication (IPC) refers to the "exchange of information between processes". In fact, "process synchronization and mutual exclusion are essentially a kind of process communication" (which is why we will see semaphores and PV operations among the communication mechanisms later), but they transmit only semaphores. By modifying semaphores, processes establish connections, coordinate with each other and work together, but this "lacks the ability to transmit data".

In some cases the amount of information exchanged between processes is very small, such as a single piece of state information, so the process synchronization and mutual exclusion mechanism is fully up to the task. But in most cases, "a large amount of data needs to be exchanged between processes", such as a batch of information or entire files, and this needs to be done through a dedicated communication mechanism: the so-called process communication.

Let's take an intuitive look at process communication from the operating system level. To ensure security, the user address space of each process is independent, and generally one process cannot directly access the address space of another process. The kernel space, however, is shared by every process, so "if you want to exchange information between processes, you must go through the kernel."

Let's list the common process communication mechanisms provided by the Linux kernel:

Pipes (also known as the shared-file mechanism)
Message queues (also known as message passing)
Shared memory (also known as shared storage)
Semaphores and PV operations
Signals
Sockets (Socket)

2. Pipes

Anonymous pipes

If you have used Linux commands, you must be familiar with pipes. A Linux pipe uses the vertical bar | to connect multiple commands; | is called the pipe character.

$ command1 | command2

The line above forms a pipeline: the output of the first command (command1) becomes the input of the second command (command2). From this description we can see that "data in a pipe flows in only one direction", i.e., communication is half-duplex. If we want mutual communication (full-duplex communication), we need to create two pipes.

In addition, the pipe created by the pipe character | is anonymous and is destroyed automatically when no longer used. Moreover, anonymous pipes can only be used between related processes. In other words, anonymous pipes can only be used for communication between parent and child processes.

In actual Linux coding, an anonymous pipe is created with the pipe function, which returns 0 if the pipe is created successfully and -1 on failure:

int pipe(int fd[2]);

The function takes an array with room for two file descriptors:

fd[0] points to the read end of the pipe and fd[1] to the write end; data written to fd[1] becomes the input available at fd[0].

Roughly explain the steps to achieve interprocess communication through anonymous pipes:

1) The parent process creates two anonymous pipes: pipe 1 (fd1[0] and fd1[1]) and pipe 2 (fd2[0] and fd2[1]).

Because the data of the pipeline flows in one direction, two pipes, one in each direction, are needed to realize the two-way communication of data.

2) The parent process forks a child process. The child inherits both anonymous pipes, so it too has file descriptors pointing to both ends of each pipe.

3) The parent process closes fd1[0] of pipe 1 and fd2[1] of pipe 2, while the child process closes fd1[1] of pipe 1 and fd2[0] of pipe 2. Thus pipe 1 can only be written by the parent and read by the child, and pipe 2 can only be written by the child and read by the parent. Each pipe is implemented as a "ring queue" in which data flows in at the write end and out at the read end, so together the two pipes enable two-way communication between parent and child.
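The three steps above can be sketched in C. This is a minimal sketch, not the article's own code: the function name pipe_round_trip and the echoed message are illustrative, and error handling is kept to a minimum.

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Round-trip `msg` through a forked child: the parent writes on pipe 1
 * and the child echoes the bytes back on pipe 2. Returns bytes received. */
ssize_t pipe_round_trip(const char *msg, char *reply, size_t n) {
    int fd1[2], fd2[2];            /* pipe 1: parent -> child; pipe 2: child -> parent */
    if (pipe(fd1) == -1 || pipe(fd2) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {                /* child */
        close(fd1[1]);             /* step 3: child closes pipe 1's write end */
        close(fd2[0]);             /*         and pipe 2's read end          */
        char buf[256];
        ssize_t got = read(fd1[0], buf, sizeof(buf));
        if (got > 0)
            write(fd2[1], buf, (size_t)got);   /* echo back on pipe 2 */
        _exit(0);
    }
    close(fd1[0]);                 /* step 3: parent closes pipe 1's read end */
    close(fd2[1]);                 /*         and pipe 2's write end          */
    write(fd1[1], msg, strlen(msg));
    close(fd1[1]);                 /* closing the write end delivers EOF to the child */
    ssize_t got = read(fd2[0], reply, n);
    close(fd2[0]);
    waitpid(pid, NULL, 0);
    return got;
}
```

Closing fd1[1] after writing is what lets the child's read see end-of-file, which is why each process must close the ends it does not use.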

After reading the above, let's understand the nature of the pipe: for the processes on both sides, the pipe is a file (which is why the pipe mechanism is also called the shared-file mechanism). But it is not an ordinary file: it does not belong to any disk file system, but to a separate file system, and it exists only in memory.

To put it simply, "the essence of the pipe is that the kernel opens up a buffer in memory, which is associated with the pipe file, and the operation on the pipe file is converted by the kernel into the operation of this buffer."

Named pipes

Because anonymous pipes have no name, they can only be used for communication between parent and child processes. To overcome this disadvantage, the named pipe, also known as FIFO, was proposed; the name comes from the fact that data is transmitted on a first-in, first-out basis.

A named pipe provides a path name associated with the pipe, so even processes unrelated to the creator of the named pipe can communicate with each other through it, as long as they can access that path.

Use the Linux command mkfifo to create a named pipe:

$ mkfifo myPipe

myPipe is the name of the pipe. Next, we write data to the named pipe myPipe:

$ echo "hello" > myPipe

After executing this command, you will find that it blocks there, because the contents of the pipe have not been read; the command can exit normally only after the data in the pipe has been read. So we execute another command to read the data from this named pipe:

$ cat < myPipe
hello

3. Message queue

As you can see, although the pipe is simple to use, "it is relatively inefficient, unsuitable for frequent data exchange between processes, and can only transmit unformatted byte streams". For this reason, the message passing mechanism (called message queues in Linux) came into being. For example, if process A wants to send a message to process B, A simply places the data on the corresponding message queue and returns; process B reads the data from the message queue whenever it needs it. The same goes for B sending a message to A.

The essence of a message queue is a linked list of messages stored in memory, a message being essentially a user-defined data structure. When a process reads a message from the queue, that message is deleted from the queue. Compared with the pipe mechanism:

Message queues allow one or more processes to write or read messages to or from them.
Message queues support "random query" of messages: they need not be read in first-in-first-out order, but can also be read by message type. This is an advantage over the strictly first-in-first-out named pipe.
With a message queue, a process can write a message to the queue without another process having to be waiting on the queue for it to arrive. With a pipe, by contrast, it makes no sense for a writing process to write first unless the reading process already exists.
The life cycle of a message queue follows the kernel: unless the queue is released or the operating system shuts down, the queue persists. An anonymous pipe, on the other hand, is established when a process is created and destroyed when the process ends.

It is important to note that message queues are useful for exchanging small amounts of data, because there is no need to avoid conflicts. However, when a user process writes data to a message queue, the data is "copied" from user space into kernel space; likewise, when another user process reads the message data, it is copied from kernel space back to user space. Therefore, "if the amount of data is large, using message queues causes frequent system calls, that is, more time spent in the kernel."
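This copy-in/copy-out flow can be sketched with the System V message queue API (msgget, msgsnd, msgrcv). The struct layout follows the user-defined-data-structure idea from the text; the helper name msgq_round_trip and the payload size are illustrative:

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* A user-defined message: a type tag followed by the payload,
 * matching the layout System V message queues expect. */
struct msg {
    long mtype;
    char mtext[64];
};

/* Create a private queue, send one message, read it back.
 * Returns 0 on success; the queue is removed before returning. */
int msgq_round_trip(const char *text, char *out, size_t n) {
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    struct msg m = { .mtype = 1 };
    strncpy(m.mtext, text, sizeof(m.mtext) - 1);

    /* msgsnd copies the payload from user space into the kernel... */
    if (msgsnd(qid, &m, sizeof(m.mtext), 0) == -1)
        return -1;

    /* ...and msgrcv copies it back out to user space; the message is
     * then deleted from the queue. The 1 selects messages of type 1. */
    struct msg r;
    if (msgrcv(qid, &r, sizeof(r.mtext), 1, 0) == -1)
        return -1;
    strncpy(out, r.mtext, n - 1);
    out[n - 1] = '\0';

    msgctl(qid, IPC_RMID, NULL);   /* queues outlive processes; remove explicitly */
    return 0;
}
```

The explicit IPC_RMID reflects the life-cycle point above: the queue would otherwise persist in the kernel after the process exits.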

4. Shared memory

To avoid the message copying and frequent system calls of message queues, the shared memory mechanism emerged.

As the name implies, shared memory allows even unrelated processes to attach the same piece of physical memory to their respective address spaces, so that these processes can all access the same physical memory, which is then called shared memory. If one process writes data to the shared memory, the change is immediately visible to any other process that can access the same segment.

To connect this with memory management, let's look more deeply at the principle of shared memory. Each process has its own process control block (PCB) and logical address space, and a corresponding page table responsible for mapping the process's logical (virtual) addresses to physical addresses, managed through the memory management unit (MMU). The logical addresses of two different processes are mapped via their page tables to the same region of physical memory; that commonly pointed-to region is the shared memory.

Unlike message queues with their frequent system calls, the shared memory mechanism needs system calls only when the shared memory region is established. Once it is established, all access to it is ordinary memory access, without kernel involvement. Data no longer needs to be copied back and forth between processes, which makes this the fastest way for processes to communicate.
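A minimal System V shared-memory sketch of the mechanism just described. The function name shm_demo and the message text are illustrative, and waitpid stands in for proper synchronization (which the next section's semaphores provide):

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child attach the same System V segment; the child's
 * write is visible to the parent as plain memory, with no system
 * call per read or write. */
int shm_demo(char *out, size_t n) {
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);  /* syscall: create */
    if (shmid == -1)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        char *p = shmat(shmid, NULL, 0);   /* syscall: attach to child's address space */
        strcpy(p, "written by child");     /* ordinary store, no syscall */
        shmdt(p);
        _exit(0);
    }
    waitpid(pid, NULL, 0);                 /* crude sync: wait for the writer */
    char *p = shmat(shmid, NULL, 0);       /* attach the same segment in the parent */
    strncpy(out, p, n - 1);                /* ordinary load, no syscall */
    out[n - 1] = '\0';
    shmdt(p);
    shmctl(shmid, IPC_RMID, NULL);         /* remove the segment */
    return 0;
}
```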

5. Semaphore and PV operation

In fact, recent research on multi-CPU systems shows that message passing can actually outperform shared memory on such systems, because "message queues do not need to avoid conflicts, while the shared memory mechanism can conflict." That is, if multiple processes modify the same shared memory at the same time, the content written by the first process may be overwritten by a later one.

Moreover, in a multiprogrammed batch system, multiple processes execute concurrently, but because system resources are limited, a process does not run straight through to completion; it proceeds in a stop-and-go fashion, advancing at an unpredictable speed (asynchrony). Sometimes, however, we want multiple processes to cooperate closely and execute in a particular order to accomplish a common task.

For example, suppose two processes A and B are responsible for reading and writing data respectively; the two cooperate and depend on each other, so the data should be written before it is read. In practice, because of asynchrony, "read before write" may occur: since the buffer has not been written yet, reading process A has no data to read, so process A is blocked.

Therefore, to solve these two problems, i.e., to ensure that only one process accesses the shared memory at any time (mutual exclusion) and to let processes access it in a particular order (synchronization), we can use process synchronization and mutual exclusion mechanisms such as semaphores and PV operations.

"Synchronization and mutual exclusion of processes are really a protection mechanism for process communication: they are not used to transmit the actual content of communication between processes. But because they transmit semaphores, they are included in the category of process communication, where they are called low-level communication."

The following is similar to what was said in the previous article [after reading the process synchronization and mutual exclusion mechanism, I finally fully understand the PV operation], friends who have seen it can skip to the next title.

A semaphore is actually a variable. We can use a semaphore to represent the number of certain resources in the system. For example, if there is only one printer in the system, we can set a semaphore with an initial value of 1.

User processes can operate on semaphores by using a pair of primitives provided by the operating system, so that processes can be mutually exclusive or synchronized conveniently. This pair of primitives is the PV operation:

1) "P operation": decrease the semaphore value by 1, which means "applying to occupy one resource". If the result is less than 0, there are no available resources and the process performing the P operation is blocked. If the result is greater than or equal to 0, the existing resources are sufficient and the process performing the P operation continues.

It can be understood this way: when the value of the semaphore is 2, there are two resources available; when the value is -2, there are two processes waiting to use the resource. (I really couldn't understand the V operation before reading this sentence, and it clicked immediately afterward.)

2) "V operation": increase the semaphore value by 1, which means "releasing a resource", that is, returning the resource after using it. If the value of the semaphore afterwards is less than or equal to 0, some process is waiting for the resource; since we have just released one, we need to wake up a waiting process (moving it to the ready state) so it can run.

I think this is plain enough, but you may still be confused about the V operation. Let's look at two more questions about it:

Q: "A semaphore value greater than 0 indicates that a shared resource is available, so why don't you wake up the process at this time?"

A: Waking up a process means waking it from the blocking queue. A semaphore value greater than 0 indicates that a shared resource is available, which means that no process is blocked on this resource at the moment, so there is no one to wake up; processes simply run normally.

Q: "When the value of the semaphore equals 0 after a V operation, there are still no shared resources available; why wake up a process?"

A: The V operation first adds 1 to the semaphore value, i.e., the value becomes 0 only after the increment. Before that it was -1, meaning one process was waiting for this shared resource, and we need to wake it up.

A semaphore together with its PV operations is essentially a counter plus a queue of processes waiting on it, with P and V as atomic operations on that pair.
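The definition just described can be sketched in C-like pseudocode. block() and wakeup() are hypothetical scheduler primitives, not real APIs, and the value convention matches the text: a value of 2 means two free resources, a value of -2 means two processes are blocked waiting.

```c
/* Textbook semaphore record -- pseudocode, not runnable as-is. */
typedef struct {
    int value;               /* free resources (>= 0) or -(number of waiters) */
    struct process *queue;   /* processes blocked on this semaphore */
} semaphore;

void P(semaphore *s) {       /* "apply to occupy one resource" */
    s->value -= 1;
    if (s->value < 0)
        block(s->queue);     /* nothing left: block the calling process */
}

void V(semaphore *s) {       /* "release one resource" */
    s->value += 1;
    if (s->value <= 0)
        wakeup(s->queue);    /* someone was waiting: wake one process */
}
```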

Mutually exclusive access to shared memory

Mutually exclusive access to shared memory by different processes can be achieved in two steps:

Define a mutex semaphore and initialize it to 1.
Place the access to shared memory between the P operation and the V operation.

P and V operations must appear in pairs. A missing P operation cannot guarantee mutually exclusive access to the shared memory, and a missing V operation means the shared memory is never freed and waiting processes are never awakened.
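The two steps can be sketched with POSIX semaphores, where sem_wait plays the role of P and sem_post plays V. Two threads stand in for two processes here; run_two_workers and the iteration count are illustrative:

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t mutex;          /* step 1: a mutex semaphore */
static long counter = 0;     /* stand-in for shared memory */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);    /* P: apply to enter the critical section */
        counter++;           /* step 2: shared access between P and V */
        sem_post(&mutex);    /* V: release the critical section */
    }
    return NULL;
}

/* Run two contending workers; returns the final counter value. */
long run_two_workers(void) {
    sem_init(&mutex, 0, 1);  /* initialize the mutex semaphore to 1 */
    counter = 0;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&mutex);
    return counter;          /* 200000 when mutual exclusion holds */
}
```

Without the P/V pair around counter++, the two workers' increments would interleave and the final count would fall short, which is exactly the conflict the text warns about.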

Achieve process synchronization

Recall that process synchronization means that each concurrent process runs in an orderly manner as required.

For example, the following two processes P1 and P2 execute concurrently, and because of asynchrony the order in which they alternate is uncertain. Suppose "code 4" of P2 can only execute based on the results of "code 1" and "code 2" of P1; then we must ensure that "code 4" executes only after "code 2".


It is also very convenient to use semaphores and PV operations to synchronize processes. There are three steps:

Define a synchronization semaphore and initialize it to the current number of available resources.
Perform the V operation "after" the operation with higher priority (the one that must run first), releasing the resource.
Perform the P operation "before" the operation with lower priority (the one that must run later), applying to occupy the resource.

With the following picture, we can understand it intuitively:
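The arrangement can also be sketched with a POSIX semaphore initialized to 0 (no result available yet): the producer performs V after producing, the consumer performs P before consuming, so "code 4" always runs after "code 2". The names producer/consume and the value 42 are illustrative:

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;          /* synchronization semaphore */
static int result = 0;       /* data produced by "code 1" and "code 2" */

static void *producer(void *arg) {
    (void)arg;
    result = 42;             /* "code 1" + "code 2": produce the data */
    sem_post(&ready);        /* V after the operation that must run first */
    return NULL;
}

/* Consumer: blocks until the producer has finished. */
int consume(void) {
    sem_init(&ready, 0, 0);  /* step 1: initialize to 0 available resources */
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    sem_wait(&ready);        /* P before "code 4": wait for the result */
    int r = result;          /* safe: the producer has already run */
    pthread_join(t, NULL);
    sem_destroy(&ready);
    return r;
}
```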

6. Signal

Be careful! Signal and semaphore are two completely different concepts!

The signal is the only "asynchronous" communication mechanism among the process communication mechanisms: a signal can be sent to a process at any time. Sending a specified signal notifies the process that an asynchronous event has occurred and forces it to execute the signal handler; after the signal has been handled, the interrupted process resumes execution. Users, the kernel, and processes can all generate and send signals.

The main sources of signal events are hardware sources and software sources. A hardware source means key combinations typed on the keyboard: for example, the common combination Ctrl+C generates a SIGINT signal to terminate the process. A software source means sending signals to a process through the kill family of commands: for example, kill -9 1111 sends the SIGKILL signal to the process with PID 1111, making it end immediately. The full set of signals available in Linux can be listed with kill -l.

7. Socket

So far, the five methods described above all communicate between processes on the same host. What if you want to "communicate with processes on different hosts across the network"? This is what Socket communication does (of course, Socket can also be used between processes on the same host).

Socket originated from Unix, which originally means "socket". In the field of computer communication, Socket is translated as "socket", which is a convention or a way of communication between computers. Through the convention of Socket, one computer can receive data from other computers and send data to other computers.

From the computer network perspective, "the Socket is the cornerstone of network communication": it is the basic operational unit of network communication supporting the TCP/IP protocol. It is an abstract representation of an endpoint of network communication and contains the five pieces of information necessary for communication: the protocol used for the connection, the IP address of the local host, the port of the local process, the IP address of the remote host, and the port of the remote process.

The essence of Socket is a programming interface (API): an intermediate software abstraction layer between the application layer and the TCP/IP protocol family that encapsulates TCP/IP. It "hides the complex TCP/IP protocol family behind the Socket interface"; for the user, a simple set of APIs is enough to establish network connections.
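A hedged sketch of Socket communication on one host: the parent listens on the loopback address, a forked child connects and sends a message, and the same API would work across hosts once a real remote IP address is used. The name socket_round_trip is illustrative and error handling is omitted for brevity.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child connects to the parent's TCP listener and sends `msg`;
 * the parent reads it into `out` and returns the bytes received. */
ssize_t socket_round_trip(const char *msg, char *out, size_t n) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);        /* protocol: TCP over IPv4 */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);    /* local IP: 127.0.0.1 */
    addr.sin_port = 0;                                /* port 0: kernel picks one */
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    socklen_t len = sizeof(addr);
    getsockname(srv, (struct sockaddr *)&addr, &len); /* learn the chosen port */
    listen(srv, 1);

    pid_t pid = fork();
    if (pid == 0) {                                   /* child plays the client */
        int cli = socket(AF_INET, SOCK_STREAM, 0);
        connect(cli, (struct sockaddr *)&addr, sizeof(addr));
        write(cli, msg, strlen(msg));
        close(cli);
        _exit(0);
    }
    int conn = accept(srv, NULL, NULL);               /* parent accepts the peer */
    ssize_t got = 0, r;
    while ((r = read(conn, out + got, n - 1 - (size_t)got)) > 0)
        got += r;                                     /* read until the child closes */
    out[got] = '\0';
    close(conn);
    close(srv);
    waitpid(pid, NULL, 0);
    return got;
}
```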

8. Summary

Let's briefly summarize the six process communication mechanisms provided by the Linux kernel described above:

1) First of all, the simplest way is the "pipe", which is essentially a special file that exists only in memory. In other words, the kernel opens up a buffer in memory, the buffer is associated with the pipe file, and operations on the pipe file are converted by the kernel into operations on this buffer. Pipes are divided into anonymous pipes and named pipes: anonymous pipes can only be used between parent and child processes, while named pipes have no such restriction.

2) Although pipes are simple to use, they are inefficient, unsuitable for frequent data exchange between processes, and can only transmit unformatted byte streams. For this reason, the "message queue" came into being. The essence of a message queue is a linked list of messages stored in memory, a message being a user-defined data structure. When a process reads a message from the queue, that message is deleted from the queue.

3) Message queues are slower because every write and read copies the data between user space and kernel space. "Shared memory" solves this problem. With shared memory, the logical addresses of two different processes are mapped via their page tables to the same region of physical memory; that commonly pointed-to region is the shared memory. If one process writes data to it, the change is immediately visible to any other process that can access the same segment.

For the shared memory mechanism, system calls are required only when the shared memory region is established. Once it is established, all access to it is ordinary memory access, without kernel involvement. Data does not need to be copied back and forth between processes, which makes this the fastest way for processes to communicate.

4) although the speed of shared memory is very fast, there is a conflict problem. For this reason, we can use semaphores and PV operations to achieve mutually exclusive access to shared memory, and can also achieve process synchronization.

5) The "signal" and the semaphore are two completely different concepts! The signal is the only asynchronous communication mechanism among the process communication mechanisms: a signal can be sent to a process at any time. Sending a specified signal notifies the process that an asynchronous event has occurred and forces it to execute the signal handler; after the signal has been handled, the interrupted process resumes execution. Users, the kernel, and processes can all generate and send signals.

6) the five methods described above are all used for communication between processes on the same host, and "Socket" communication is required if you want to communicate with processes on different hosts across the network. In addition, Socket can also communicate with processes on the host.

This is the end of "What are the common process communication mechanisms provided by the Linux kernel". Thank you for reading. If you want to learn more about the industry, you can follow the site; the editor will keep producing high-quality practical articles for you!

