How to use the Python multiprocessing library to deal with 3D data is something many newcomers are not very clear about. To help you solve this problem, the editor explains it in detail below; anyone who needs it can come and learn, and hopefully you will gain something.
Today we will introduce some very convenient tools for dealing with large amounts of data. Instead of only repeating the general information you could find in a manual, I'll share a few tips I have discovered, such as using tqdm with multiprocessing imap, working with archives in parallel, plotting and processing 3D data, and searching for a similar mesh among many meshes when you have a point cloud.
So why do we reach for parallel computing? If you deal with any kind of data today, you will sooner or later face problems related to "big data". Whenever the data does not fit into RAM, we need to process it piece by piece. Fortunately, modern programming languages allow us to spawn multiple processes (or even threads) that make full use of multi-core processors. (Note: this does not mean that a single-core processor cannot handle multiprocessing; there is a Stack Overflow thread on that topic.)
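To make the idea concrete before we get to 3D data, here is a minimal sketch (not part of the original workflow) of feeding a large file to a pool of worker processes piece by piece; process_chunk and the file path "big_file.bin" are hypothetical placeholders.

from multiprocessing import Pool

def process_chunk(chunk):
    # hypothetical per-chunk work; here we just measure the chunk size
    return len(chunk)

def iter_chunks(path, chunk_size=1 << 20):
    # yield the file piece by piece so it never has to fit into RAM at once
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        total = sum(pool.imap(process_chunk, iter_chunks("big_file.bin")))
        print(total)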
Today we will tackle a frequent 3D computer vision task: calculating distances between meshes and point clouds. You may face this problem, for example, when you need to find, among all available meshes, the one that defines the same 3D object as a given point cloud.
Our data consists of .obj files stored in .7z archives, which is excellent in terms of storage efficiency. But when we need to access just a part of an archive, we have to make an effort. Here, I define a class that wraps the 7-zip archive and provides an interface to the underlying data.
from io import BytesIO

import py7zlib


class MeshesArchive(object):
    def __init__(self, archive_path):
        fp = open(archive_path, 'rb')
        self.archive = py7zlib.Archive7z(fp)
        self.archive_path = archive_path
        self.names_list = self.archive.getnames()
        self.cur_id = 0

    def __len__(self):
        return len(self.names_list)

    def get(self, name):
        # decompress a single member of the archive into memory
        bytes_io = BytesIO(self.archive.getmember(name).read())
        return bytes_io

    def __getitem__(self, idx):
        return self.get(self.names_list[idx])

    def __iter__(self):
        return self

    def __next__(self):
        if self.cur_id >= len(self.names_list):
            raise StopIteration
        name = self.names_list[self.cur_id]
        self.cur_id += 1
        return name, self.get(name)
This class relies on little more than the py7zlib package, which lets us decompress data each time we call the get method and tells us the number of files inside the archive. We also define __iter__ and __next__, which will help us start multiprocessing map over this object as over an iterable.
As you probably know, you can create a Python class from which you can instantiate iterable objects. Such a class should meet the following conditions: define __iter__ that returns self and __next__ that returns the subsequent elements. We fully follow this rule here.
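As a reminder of what that protocol looks like in isolation, here is a tiny illustrative class of my own (not part of the article's code) that satisfies it:

class Countdown:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # an iterable object returns itself as the iterator
        return self

    def __next__(self):
        # and hands out subsequent elements until StopIteration
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n

print(list(Countdown(3)))  # [2, 1, 0]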
This definition of the wrapper gives us the possibility of iterating over the archive, but does it allow us parallel random access to its contents? That is an interesting question, to which I could not find an answer online, but we can study the source code of py7zlib and try to answer it ourselves.
Here, I provide a code snippet from pylzma:
class Archive7z(Base):
    def __init__(self, file, password=None):
        # ...
        self.files = []
        # ...
        for info in files.files:
            # create an ArchiveFile instance that knows its location on disk
            file = ArchiveFile(info, pos, src_pos, folder, self, maxsize=maxsize)
            # ...
            self.files.append(file)
        # ...
        self.files_map.update([(x.filename, x) for x in self.files])

    # method that returns an ArchiveFile from the files_map dictionary
    def getmember(self, name):
        if isinstance(name, (int, long)):
            try:
                return self.files[name]
            except IndexError:
                return None
        return self.files_map.get(name, None)


class ArchiveFile(Base):
    def read(self):
        # ...
        for level, coder in enumerate(self._folder.coders):
            # ...
            # get the decoder and decode the underlying data
            data = getattr(self, decoder)(coder, data, level, num_coders)
        return data
In this code you can see the methods that are called while reading the next object from the archive. I believe it is clear from the above that there is no reason the archive cannot be read multiple times simultaneously.
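As a quick sanity check of that conclusion (my own sketch, not code from the article), one can give every worker process its own py7zlib.Archive7z instance and read members concurrently; the archive path below is an assumption matching the one used later.

from multiprocessing import Pool

import py7zlib

ARCHIVE_PATH = "./data/meshes.7z"  # assumed path

def read_member(name):
    # every process opens its own file handle, so readers share no state
    with open(ARCHIVE_PATH, 'rb') as fp:
        archive = py7zlib.Archive7z(fp)
        return name, len(archive.getmember(name).read())

if __name__ == "__main__":
    with open(ARCHIVE_PATH, 'rb') as fp:
        names = py7zlib.Archive7z(fp).getnames()
    with Pool(4) as pool:
        for name, size in pool.imap_unordered(read_member, names):
            print(name, size)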
Next, let's take a quick look at what meshes and point clouds are.
First of all, a mesh is a collection of vertices, edges, and faces. Vertices are defined by their coordinates in space and are assigned unique numbers. Edges and faces are groups of corresponding pairs and triples of points, referenced by those unique point ids. Usually, when we talk about "meshes", we mean "triangular meshes", i.e., surfaces made up of triangles. Working with meshes in Python is much easier with the trimesh library; for example, it provides an interface for loading .obj files into memory. To display a 3D object and interact with it in a jupyter notebook, you can use the K3D library.
So, with the following code snippet I answer the question: "how do you plot trimesh objects in jupyter with K3D?"
import trimesh
import k3d

with open("./data/meshes/stanford-bunny.obj") as f:
    bunny_mesh = trimesh.load(f, 'obj')

# build a K3D plot and add the triangular mesh to it
plot = k3d.plot()
mesh = k3d.mesh(bunny_mesh.vertices, bunny_mesh.faces)
plot += mesh
plot.display()
Second, a point cloud is a 3D array of points that represents an object in space. Many 3D scanners produce point clouds as the representation of the scanned object. For demonstration purposes, we can read the same mesh and display its vertices as a point cloud.
import trimesh
import k3d

with open("./data/meshes/stanford-bunny.obj") as f:
    bunny_mesh = trimesh.load(f, 'obj')

# display only the mesh vertices, rendered as a point cloud
plot = k3d.plot()
cloud = k3d.points(bunny_mesh.vertices, point_size=0.0001, shader="flat")
plot += cloud
plot.display()
Point cloud drawn by K3D
As mentioned above, a 3D scanner gives us a point cloud. Suppose we have a database of meshes, and we want to find, inside that database, a mesh that is aligned with the scanned object, i.e., with the point cloud. To solve this, we can propose a simple approach: for each mesh in our archive, search for the maximum distance between the points of the given point cloud and the mesh. If, for some mesh, this distance is smaller than 1e-4, we will consider that mesh to be aligned with the point cloud.
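In code, that matching rule boils down to a threshold test on the per-mesh maximum distance. The helper below is only an illustrative sketch of the criterion, with hypothetical names (the actual distance computation comes later in the article).

def find_aligned_meshes(names, max_dists, threshold=1e-4):
    # keep the meshes whose worst-case distance to the point cloud stays below the threshold
    return [name for name, dist in zip(names, max_dists) if dist < threshold]

# e.g. find_aligned_meshes(["bunny", "dragon"], [3e-5, 0.2]) -> ["bunny"]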
Finally, we come to the multiprocessing part. Keep in mind that our archive holds a large number of files that might not fit into memory together, which is why we prefer to process them in parallel. To achieve this, we will use a multiprocessing Pool, which handles multiple calls of a user-defined function with the map or imap/imap_unordered methods. The difference between map and imap that matters to us is that map converts the iterable into a list before sending it to the worker processes; if an archive is too big to fit into RAM, it should not be unpacked into a Python list. In other words, the execution speed of the two is similar.
[Load meshes: pool.map w/o manager] Pool of 4 processes elapsed time: 37.213207403818764 sec
[Load meshes: pool.imap_unordered w/o manager] Pool of 4 processes elapsed time: 37.219303369522095 sec
Above you can see the results of a simple read from an archive of meshes that fits into memory.
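The benchmark code itself is not reproduced here, but a rough sketch of how such a timing comparison can be set up with a toy worker (my own example; the numbers will of course differ from those above) looks like this:

import time
from multiprocessing import Pool

from tqdm import tqdm

def toy_worker(x):
    return x * x

if __name__ == "__main__":
    data = range(100_000)
    with Pool(4) as pool:
        start = time.time()
        list(pool.map(toy_worker, data))  # map materializes the whole input list first
        print("map:", time.time() - start, "sec")

        start = time.time()
        # imap_unordered streams the input and pairs naturally with tqdm
        list(tqdm(pool.imap_unordered(toy_worker, data, chunksize=1000), total=len(data)))
        print("imap_unordered:", time.time() - start, "sec")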
Moving further with imap, let's discuss how to achieve our goal of finding the mesh that is close to a point cloud. Here is our data: we have five different meshes from the Stanford models. We will simulate a 3D scan by adding noise to the vertices of the Stanford bunny mesh.
import numpy as np
from numpy.random import default_rng


def normalize_pc(points):
    # center the cloud and scale it by the largest distance from the origin
    points = points - points.mean(axis=0)[None, :]
    dists = np.linalg.norm(points, axis=1)
    scaled_points = points / dists.max()
    return scaled_points


def load_bunny_pc(bunny_path):
    STD = 1e-3
    with open(bunny_path) as f:
        bunny_mesh = load_mesh(f)  # load_mesh is defined in the next snippet
    # normalize the point cloud
    scaled_bunny = normalize_pc(bunny_mesh.vertices)
    # add some noise to the point cloud to simulate a 3D scan
    rng = default_rng()
    noise = rng.normal(0.0, STD, scaled_bunny.shape)
    distorted_bunny = scaled_bunny + noise
    return distorted_bunny
Of course, we normalize the point cloud here and, below, the mesh vertices in the same way, to scale both within the same 3D cube.
To calculate the distance between a point cloud and a mesh, we will use igl. To finish, we need to write a function that will be called in each process, together with its dependencies. Let's summarize everything with the following code snippet.
import itertools
import time
from multiprocessing import Pool

import igl
import numpy as np
import trimesh
from numpy.random import default_rng
from tqdm import tqdm


def load_mesh(obj_file):
    mesh = trimesh.load(obj_file, 'obj')
    return mesh


def get_max_dist(base_mesh, point_cloud):
    distance_sq, mesh_face_indexes, _ = igl.point_mesh_squared_distance(
        point_cloud,
        base_mesh.vertices,
        base_mesh.faces
    )
    return distance_sq.max()


def load_mesh_get_distance(args):
    # args is ((name, obj_file), point_cloud), as produced by the zip below
    (name, obj_file), point_cloud = args
    mesh = load_mesh(obj_file)
    mesh.vertices = normalize_pc(mesh.vertices)
    max_dist = get_max_dist(mesh, point_cloud)
    return max_dist


def read_meshes_get_distances_pool_imap(archive_path, point_cloud, num_proc, num_iterations):
    # process the meshes within a pool
    elapsed_time = []
    for _ in range(num_iterations):
        archive = MeshesArchive(archive_path)
        pool = Pool(num_proc)
        start = time.time()
        result = list(tqdm(pool.imap(
            load_mesh_get_distance,
            zip(archive, itertools.repeat(point_cloud)),
        ), total=len(archive)))
        pool.close()
        pool.join()
        end = time.time()
        elapsed_time.append(end - start)

    print(f'[Process meshes: pool.imap] Pool of {num_proc} processes elapsed time: {np.array(elapsed_time).mean()} sec')
    for name, dist in zip(archive.names_list, result):
        print(f"{name} {dist}")
    return result


if __name__ == "__main__":
    bunny_path = "./data/meshes/stanford-bunny.obj"
    archive_path = "./data/meshes.7z"
    num_proc = 4
    num_iterations = 3

    point_cloud = load_bunny_pc(bunny_path)
    read_meshes_get_distances_pool_imap(archive_path, point_cloud, num_proc, num_iterations)
Here, read_meshes_get_distances_pool_imap is the central function. It does the following:
Initialize MeshesArchive and multiprocessing.Pool
Use tqdm to observe the pool's progress, and time the whole pool run manually
Print the results of the execution
Notice how we pass the arguments to imap: we combine the archive and the point_cloud using zip(archive, itertools.repeat(point_cloud)). This allows us to glue the point cloud array onto each entry of the archive while avoiding converting the archive into a list.
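A tiny illustration of that zip + itertools.repeat pattern, with placeholder values standing in for the real archive and point cloud:

import itertools

names = iter(["mesh_a.obj", "mesh_b.obj", "mesh_c.obj"])  # stands in for the archive iterator
payload = "point_cloud"  # stands in for the shared numpy array

for pair in zip(names, itertools.repeat(payload)):
    # each archive entry is lazily paired with the same payload; no list is ever built
    print(pair)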
The execution results look as follows:
100%|##########| 5/5 [00:00]