This article explains in detail the practice of Python multi-process parallel programming with mpi4py. I hope you come away with a solid understanding of the topic after reading it.
Preface
In high-performance computing projects we usually use more efficient compiled languages such as C, C++, and Fortran. But thanks to its flexibility and ease of use, Python is favored for developing and verifying algorithms, so it is often seen in the field of high-performance computing as well. This article briefly introduces how to do multi-process parallel computing on a cluster from Python through the MPI interface.
MPI (Message Passing Interface)
Let me first give a brief introduction to MPI. MPI stands for Message Passing Interface.
It is not a language but a library: we parallelize serial programs by calling the interface MPI provides from Fortran, C, or C++. You can also think of Fortran+MPI or C+MPI as a parallel language extended from the original serial one.
It is a standard rather than a specific implementation, and there can be many different implementations, such as MPICH, OpenMPI, and so on.
It is a message passing programming model, and as its name implies, it is dedicated to inter-process communication.
The way MPI works is easy to understand. We start a group of processes at the same time; within the same communication domain, each process has a distinct number (rank). Using the interfaces MPI provides, the programmer assigns different tasks to processes with different ranks and helps the processes communicate with each other until together they complete a common job. It is like a contractor giving each worker a number, handing out parts of the plan by number, and letting the workers coordinate with one another to finish the job.
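To make the rank-based division of labor concrete, here is a minimal sketch using mpi4py (the Python binding introduced below); every process runs the same script and identifies itself by its rank:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this worker's number within the communication domain
size = comm.Get_size()   # how many workers were started

# each numbered process could branch on its rank to take on a different task
print("worker {} of {} reporting for duty".format(rank, size))

Launching this with, say, mpiexec -np 4 python hello.py starts four copies of the script, each printing a different rank.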
Parallelism in Python
Because of the GIL in CPython, we cannot expect multiple threads to exploit multi-core resources for parallel computing; instead, we use multiple processes to make full use of the cores in Python.
In Python there are many ways to do multi-process programming, such as creating processes with os.fork() or, more conveniently, creating processes and process pools through the multiprocessing module. In the previous article, "Python multi-process parallel programming practice: the multiprocessing module", we used a process pool to manage Python processes conveniently and achieved multi-machine distributed computing by managing distributed processes with multiprocessing's Manager.
Unlike multiple threads, which share memory, processes are independent of one another, so inter-process communication plays a very important role in multi-process programs. In Python we can use the Pipe, Queue, Array, Value and other tools in the multiprocessing module for inter-process communication and data sharing, but writing such code is still rather constrained. Communication is exactly what MPI is good at, so being able to call MPI's interface from Python would be great, wouldn't it?
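For comparison, here is a minimal sketch of hand-rolled inter-process communication with multiprocessing's Pipe, the kind of plumbing MPI takes care of for us:

from multiprocessing import Process, Pipe

def worker(conn):
    # receive a task from the parent process and send back a result
    data = conn.recv()
    conn.send(sum(data))
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(range(10))
    print(parent_conn.recv())   # prints 45
    p.join()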
MPI and mpi4py
mpi4py is a Python library built on top of MPI, written mainly in Cython. It allows Python data structures to be passed conveniently between multiple processes.
mpi4py is a very powerful library that implements many interfaces of the MPI standard, including point-to-point communication, intra-group collective communication, non-blocking communication, repeated non-blocking communication, inter-group communication, and so on. Basically, every MPI interface I can think of has a corresponding implementation in mpi4py. It is not limited to Python objects: mpi4py also has good support for numpy arrays and transfers them very efficiently. It also provides interfaces for SWIG and F2PY, so that after we wrap our own Fortran or C/C++ programs into Python, we can still use mpi4py's objects and interfaces for parallel processing. The skill of mpi4py's author is indeed remarkable.
Mpi4py
Here I begin to introduce parallel programming using the interface of mpi4py in the Python environment.
MPI environment management
mpi4py provides the corresponding interfaces Init() and Finalize() to initialize and finalize the MPI environment. But mpi4py also initializes the MPI environment automatically when we run "from mpi4py import MPI", because the initialization call is placed in its __init__.py.
MPI_Finalize() is registered through Python's C interface Py_AtExit(), so MPI_Finalize() is called automatically when the Python process exits, and we do not need to call Finalize() explicitly.
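A quick way to confirm this behavior (Is_initialized() and Is_finalized() are part of mpi4py's module-level API):

from mpi4py import MPI

# the import above has already called MPI_Init for us
print(MPI.Is_initialized())   # True
print(MPI.Is_finalized())     # False; MPI_Finalize runs automatically at interpreter exit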
Communication domain (Communicator)
mpi4py directly provides Python classes for the communication domain: Comm is the base class, and Intracomm and Intercomm are its derived classes, mirroring MPI's C++ bindings.
It also provides two predefined communication domain objects:
COMM_WORLD, which contains all processes
COMM_SELF, which contains only the calling process itself
In [1]: from mpi4py import MPI

In [2]: MPI.COMM_SELF
Out[2]: <mpi4py.MPI.Intracomm object at 0x...>

In [3]: MPI.COMM_WORLD
Out[3]: <mpi4py.MPI.Intracomm object at 0x...>
The communication domain object provides the interfaces related to the communication domain, such as getting the current process's rank, getting the number of processes in the domain, getting the process group, and splitting and merging communication domains.
In [4]: comm = MPI.COMM_WORLD

In [5]: comm.Get_rank()
Out[5]: 0

In [6]: comm.Get_size()
Out[6]: 1

In [7]: comm.Get_group()
Out[7]: <mpi4py.MPI.Group object at 0x...>

In [9]: comm.Split(0, 0)
Out[9]: <mpi4py.MPI.Intracomm object at 0x...>
We will not elaborate on communication domain and process group operations here; you can refer to Introduction to Groups and Communicators.
Point-to-point communication
mpi4py provides point-to-point communication interfaces that let processes pass Python's built-in objects to each other (serialized with pickle), as well as pass buffer-like arrays directly (numpy arrays, at close to C efficiency).
If we need to pass a generic Python object, we use the lowercase methods of the communication domain object, such as send(), recv(), isend(), and so on.
If we need to pass a data buffer directly, we call the uppercase methods, such as Send(), Recv(), Isend(), and so on, whose spelling matches the C++ interface.
MPI has many kinds of point-to-point communication, including standard, buffered, synchronous, and ready mode, along with non-blocking asynchronous versions of each, and so on. All of these have corresponding Python interfaces in mpi4py, giving us great flexibility in handling inter-process communication. Here I will only use the blocking and non-blocking versions of standard communication as examples:
Blocking standard communication
Here I use mpi4py's interface to pass a Python list object between two processes.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = range(10)
    comm.send(data, dest=1, tag=11)
    print("process {} send {}...".format(rank, data))
else:
    data = comm.recv(source=0, tag=11)
    print("process {} recv {}...".format(rank, data))
Execution effect:
zjshao@vaio:~/temp_codes/mpipy$ mpiexec -np 2 python temp.py
process 0 send [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
process 1 recv [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
Non-blocking standard communication
For every blocking communication, MPI provides a non-blocking version. Just as we write asynchronous programs that do not block on time-consuming IO, MPI's non-blocking communication does not block while the message is transmitted, which makes fuller use of processor resources and improves the efficiency of the whole program.
(Figure: comparison between blocking and non-blocking communication.)
(Figure: message sending and receiving in non-blocking communication.)
Similarly, we can also write a non-blocking version of the above example.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = range(10)
    req = comm.isend(data, dest=1, tag=11)   # returns immediately with a Request
    print("process {} immediate send {}...".format(rank, data))
    req.wait()                               # complete the send before exiting
else:
    data = comm.recv(source=0, tag=11)
    print("process {} recv {}...".format(rank, data))
The execution result; note that a non-blocking send can be matched with a blocking receive:
zjshao@vaio:~/temp_codes/mpipy$ mpiexec -np 2 python temp.py
process 0 immediate send [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
process 1 recv [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
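The receive side can also be made non-blocking with irecv(), waiting on the returned Request object only when the data is actually needed; a minimal sketch:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    req = comm.isend(range(10), dest=1, tag=11)
    req.wait()                      # make sure the send has completed
else:
    req = comm.irecv(source=0, tag=11)
    data = req.wait()               # block here only when we need the message
    print("process {} recv {}...".format(rank, data))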
Support for Numpy arrays
One of the best features of mpi4py is its good support for numpy arrays: we can pass data buffers directly through an interface like the one C/Fortran provides. This is very efficient, essentially on par with calling the MPI interface directly from C or Fortran, both in style and in effect.
For example, suppose I want to pass an int array of length 10. The corresponding C++ interface of MPI is:
void Comm::Send(const void* buf, int count, const Datatype& datatype, int dest, int tag) const
Similarly, in mpi4py's interface, Comm.Send() receives a Python list as a parameter that describes the data being passed: its buffer, its length, and its MPI datatype.
Let's take an example of blocking standard communication:
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = np.arange(10, dtype='i')
    comm.Send([data, MPI.INT], dest=1, tag=11)
    print("process {} Send buffer-like array {}...".format(rank, data))
else:
    data = np.empty(10, dtype='i')
    comm.Recv([data, MPI.INT], source=0, tag=11)
    print("process {} recv buffer-like array {}...".format(rank, data))
Execution effect:
zjshao@vaio:~/temp_codes/mpipy$ /usr/bin/mpiexec -np 2 python temp.py
process 0 Send buffer-like array [0 1 2 3 4 5 6 7 8 9]...
process 1 recv buffer-like array [0 1 2 3 4 5 6 7 8 9]...
Group communication
An important difference between MPI group (collective) communication and point-to-point communication is that all processes in a process group participate in the communication at the same time. mpi4py provides convenient interfaces for intra-group collective communication in Python, which simplifies programming and improves the readability and portability of the program.
Here are a few common collective communications to try out.
Broadcast
Broadcast is a typical one-to-many communication: it copies the root process's data to every other process in the same group.
In Python I want to broadcast a list to other processes:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = range(10)
    print("process {} bcast data {} to other processes".format(rank, data))
else:
    data = None

data = comm.bcast(data, root=0)
print("process {} recv data {}...".format(rank, data))
Execution result:
zjshao@vaio:~/temp_codes/mpipy$ /usr/bin/mpiexec -np 5 python temp.py
process 0 bcast data [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] to other processes
process 0 recv data [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
process 1 recv data [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
process 3 recv data [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
process 2 recv data [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
process 4 recv data [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
Scatter
Unlike broadcast, scatter can send different pieces of data to different processes instead of copying the same data to all of them.
For example, I want to send 0-9 to different processes:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

recv_data = None
if rank == 0:
    send_data = range(10)
    print("process {} scatter data {} to other processes".format(rank, send_data))
else:
    send_data = None

recv_data = comm.scatter(send_data, root=0)
print("process {} recv data {}...".format(rank, recv_data))
Scatter result:
zjshao@vaio:~/temp_codes/mpipy$ /usr/bin/mpiexec -np 10 python temp.py
process 0 scatter data [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] to other processes
process 0 recv data 0...
process 3 recv data 3...
process 5 recv data 5...
process 8 recv data 8...
process 2 recv data 2...
process 7 recv data 7...
process 4 recv data 4...
process 1 recv data 1...
process 9 recv data 9...
process 6 recv data 6...
Gather
Gathering is the inverse of scattering: each process sends the message in its send buffer to the root process, and the root stores the messages in its own receive buffer ordered by the rank of the sending process.
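A minimal sketch that produces the output below, with each process contributing its own rank:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

send_data = rank
print("process {} send data {} to root...".format(rank, send_data))

# only the root receives the assembled list; the other processes get None
recv_data = comm.gather(send_data, root=0)
if rank == 0:
    print("process {} gather all data {}...".format(rank, recv_data))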
Gather result:
zjshao@vaio:~/temp_codes/mpipy$ /usr/bin/mpiexec -np 5 python temp.py
process 2 send data 2 to root...
process 3 send data 3 to root...
process 0 send data 0 to root...
process 4 send data 4 to root...
process 1 send data 1 to root...
process 0 gather all data [0, 1, 2, 3, 4]...
Other intra-group communications and reduction operations will not be covered here because of space constraints. If you are interested, you can take a look at MPI's official documentation and the corresponding tutorials.
Practice of mpi4py parallel programming
Here I take the double-loop map-drawing example from the previous article, "Python multi-process parallel programming practice: the multiprocessing module", and use mpi4py to accelerate it in parallel.
I intend to start 10 processes at the same time and send the data to be calculated and drawn along axis 0 to the different processes for parallel computation.
So I need to scatter the pO2s array across the 10 processes;
then, in each process, run the inner pCOs loop over the pO2s data that process received;
and finally gather the result of each process's calculation (the TOF):
comm.gather(tofs_1d, root=0)
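The parallel skeleton looks roughly like the sketch below; here pO2s, pCOs, and calc_tof() are stand-ins for the pressure grids and the model routine of the original article, not its actual code:

from mpi4py import MPI
import numpy as np

def calc_tof(pO2, pCO):
    # placeholder for the actual model calculation in the original article
    return pO2 * pCO

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

pCOs = np.linspace(1e-5, 0.5, 10)

if rank == 0:
    # split the axis-0 data into one chunk per process
    pO2s = np.linspace(1e-5, 0.5, 10 * size)
    chunks = np.array_split(pO2s, size)
else:
    chunks = None

# scatter one chunk of pO2 values to each process
local_pO2s = comm.scatter(chunks, root=0)

# run the inner pCO loop independently inside every process
tofs_1d = [calc_tof(pO2, pCO) for pO2 in local_pO2s for pCO in pCOs]

# the root process collects the partial results in rank order
tofs = comm.gather(tofs_1d, root=0)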
The complete code is domain-specific, so I won't list it in full beyond the sketch above. Running the modified mpi4py parallel version across 10 processes improved efficiency by roughly a factor of 10.
This article has briefly introduced multi-process programming in Python through mpi4py's interface. The MPI interface is very large, and mpi4py is correspondingly extensive. mpi4py also implements the corresponding wrapper files and type mappings for SWIG and F2PY, which can help us unify Python's message passing with that of real C/C++ and Fortran programs.
That concludes this look at the practice of Python multi-process parallel programming with mpi4py. I hope the content above is helpful; if you think the article is good, feel free to share it with others.