How to Use the Python Multiprocessing Library to Process 3D Data

This article introduces a method of processing 3D data with the Python multiprocessing library. We will wrap a 7-zip archive of meshes in an iterable class, take a quick look at meshes and point clouds, and then match a simulated 3D scan against a mesh database in parallel.
Our data consists of .obj files stored in .7z archives, which is excellent in terms of storage efficiency. But when we need to access a specific part of an archive, things get harder. Here, I define a class that wraps the 7-zip archive and provides an interface to the underlying data.
from io import BytesIO
import py7zlib

class MeshesArchive(object):
    def __init__(self, archive_path):
        fp = open(archive_path, 'rb')
        self.archive = py7zlib.Archive7z(fp)
        self.archive_path = archive_path
        self.names_list = self.archive.getnames()
        self.cur_id = 0

    def __len__(self):
        return len(self.names_list)

    def get(self, name):
        bytes_io = BytesIO(self.archive.getmember(name).read())
        return bytes_io

    def __getitem__(self, idx):
        return self.get(self.names_list[idx])

    def __iter__(self):
        return self

    def __next__(self):
        if self.cur_id >= len(self.names_list):
            raise StopIteration
        name = self.names_list[self.cur_id]
        self.cur_id += 1
        return self.get(name)
This class relies on the py7zlib package, which lets us decompress data each time the get method is called, and tells us the number of files inside the archive. We also define __iter__, which will help us start a multiprocessing map on this object as on an iterable.
As you probably know, a Python class can be instantiated as an iterator if it meets the following conditions: __iter__ returns self, and __next__ returns the subsequent elements. Our class follows this rule exactly.
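To make this concrete, here is a minimal usage sketch, assuming an archive at ./data/meshes.7z (the path is an assumption for illustration):

archive = MeshesArchive("./data/meshes.7z")
print(f"{len(archive)} files in archive")
for mesh_bytes in archive:  # each item is a BytesIO with one .obj file
    pass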
The above definition gives us the possibility of iterating over the archive, but does it allow parallel random access to the contents? That is an interesting question, to which I couldn't find an answer online, but we can study the source code of py7zlib and try to answer it ourselves.
Here, I provide a code snippet from pylzma:
class Archive7z(Base):
    def __init__(self, file, password=None):
        # ...
        self.files = []
        # ...
        for info in files.files:
            # create an instance of ArchiveFile that knows its location on disk
            file = ArchiveFile(info, pos, src_pos, folder, self, maxsize=maxsize)
            # ...
            self.files.append(file)
        # ...
        self.files_map.update([(x.filename, x) for x in self.files])

    # method that returns an ArchiveFile from the files_map dictionary
    def getmember(self, name):
        if isinstance(name, (int, long)):
            try:
                return self.files[name]
            except IndexError:
                return None
        return self.files_map.get(name, None)

class ArchiveFile(Base):
    def read(self):
        # ...
        for level, coder in enumerate(self._folder.coders):
            # ...
            # get the decoder and decode the underlying data
            data = getattr(self, decoder)(coder, data, level, num_coders)
        return data
In this code, you can see the methods called while reading the next object from the archive. I believe it is clear from the above that there is no reason the archive cannot be read from multiple places at the same time, as long as it is only read.
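As a quick sanity-check sketch (the pool size and archive path are assumptions), each worker process can open its own handle to the same archive and read a member independently:

from multiprocessing import Pool

def read_first_member(archive_path):
    # each process builds its own MeshesArchive, so no file handle is shared
    archive = MeshesArchive(archive_path)
    return archive[0].getbuffer().nbytes

if __name__ == "__main__":
    with Pool(4) as pool:
        # four processes read from the same .7z archive concurrently
        sizes = pool.map(read_first_member, ["./data/meshes.7z"] * 4)
        print(sizes)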
Next, let's take a quick look at what meshes and point clouds are.
First of all, a mesh is a collection of vertices, edges, and faces. Vertices are defined by their coordinates in space and are assigned unique ids. Edges and faces are groups of corresponding point pairs and triples, defined by those unique vertex ids. Usually, when we talk about "meshes", we mean "triangular meshes", that is, surfaces made up of triangles. Working with meshes in Python is much easier with the trimesh library; for example, it provides an interface for loading .obj files into memory. To display a 3D object and interact with it in a jupyter notebook, you can use the K3D library.
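As a tiny illustration of those definitions (a hypothetical one-triangle mesh, not part of the original data), vertices are listed by coordinate and a face refers to them by id:

import numpy as np
import trimesh

# three vertices defined by coordinates; their ids are 0, 1, 2
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
# one triangular face defined by the vertex ids
faces = np.array([[0, 1, 2]])
triangle_mesh = trimesh.Trimesh(vertices=vertices, faces=faces)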
So, let me answer the question "how do you draw a trimesh object in jupyter with K3D?" with the following code snippet:
import trimesh
import k3d

with open("./data/meshes/stanford-bunny.obj") as f:
    bunny_mesh = trimesh.load(f, 'obj')

plot = k3d.plot()
mesh = k3d.mesh(bunny_mesh.vertices, bunny_mesh.faces)
plot += mesh
plot.display()
The Stanford Bunny mesh displayed by K3D
Second, a point cloud is a 3D array of points that represents an object in space. Many 3D scanners produce point clouds as representations of scanned objects. For demonstration purposes, we can read the same mesh and display its vertices as a point cloud.

import trimesh
import k3d

with open("./data/meshes/stanford-bunny.obj") as f:
    bunny_mesh = trimesh.load(f, 'obj')

plot = k3d.plot()
cloud = k3d.points(bunny_mesh.vertices, point_size=0.0001, shader="flat")
plot += cloud
plot.display()
The Stanford Bunny point cloud drawn by K3D
As mentioned above, a 3D scanner provides us with a point cloud. Suppose we have a database of meshes, and we want to find a mesh in this database that is aligned with the scanned object, i.e. with the point cloud.
To solve this problem, we can propose a simple approach: for each mesh in our archive, we will compute the maximum distance between the points of the given point cloud and the mesh surface.
If for some mesh this distance is smaller than 1e-4, we will consider that mesh to be aligned with the point cloud; a sketch of the rule in code follows.
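Here max_dists is a hypothetical dict mapping mesh names to the maximum point-to-mesh distances described above (the names and values are made up for illustration):

THRESHOLD = 1e-4
max_dists = {"stanford-bunny.obj": 9.2e-5, "dragon.obj": 0.31}  # hypothetical values
# keep the meshes whose maximum point-to-surface distance is below the threshold
matches = [name for name, dist in max_dists.items() if dist < THRESHOLD]
print(matches)  # ['stanford-bunny.obj']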
Finally, we come to the multiprocessing part. Keep in mind that our archive contains a large number of files that may not all fit into memory at once, which is why we prefer to process them in parallel.
To achieve this, we will use a multiprocessing Pool, which handles multiple calls of a user-defined function with the map or imap/imap_unordered methods.
The difference between map and imap that matters for us is that map converts the iterable to a list before sending it to the worker processes. If the archive is too large to fit into RAM, it should not be unpacked into a Python list. Apart from that, the execution speed of the two is similar.
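A minimal sketch of that difference, using a toy square function rather than the mesh workload:

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        # map materializes the whole input iterable as a list up front
        eager = pool.map(square, range(10))
        # imap consumes the iterable lazily, chunk by chunk
        lazy = list(pool.imap(square, range(10)))
    assert eager == lazy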
[Loading meshes: pool.map w/o manager] Pool of 4 processes elapsed time: 37.213207403818764 sec
[Loading meshes: pool.imap_unordered w/o manager] Pool of 4 processes elapsed time: 37.219303369522095 sec
Above, you can see the results of simply reading from a mesh archive that fits in memory.
Going further with imap, let's discuss how to achieve our goal of finding a mesh that is close to the point cloud. Here is the data: we have five different meshes from the Stanford models. We will simulate a 3D scan by adding noise to the vertices of the Stanford Bunny mesh.
import numpy as np
from numpy.random import default_rng

def normalize_pc(points):
    points = points - points.mean(axis=0)[None, :]
    dists = np.linalg.norm(points, axis=1)
    scaled_points = points / dists.max()
    return scaled_points

def load_bunny_pc(bunny_path):
    std = 1e-3
    with open(bunny_path) as f:
        bunny_mesh = load_mesh(f)
    # normalize the point cloud
    scaled_bunny = normalize_pc(bunny_mesh.vertices)
    # add some noise to the point cloud
    rng = default_rng()
    noise = rng.normal(0.0, std, scaled_bunny.shape)
    distorted_bunny = scaled_bunny + noise
    return distorted_bunny
Of course, we normalize the point cloud and the mesh vertices so that they are scaled to fit inside a unit 3D cube.
To calculate the distance between a point cloud and a mesh, we will use igl. To finish, we need to write a function that will be called in each process, along with its dependencies. Let's summarize with the following code snippet.
import itertools
import time

import numpy as np
from numpy.random import default_rng
import trimesh
import igl
from tqdm import tqdm
from multiprocessing import Pool

def load_mesh(obj_file):
    mesh = trimesh.load(obj_file, 'obj')
    return mesh

def get_max_dist(base_mesh, point_cloud):
    distance_sq, mesh_face_indexes, _ = igl.point_mesh_squared_distance(
        point_cloud,
        base_mesh.vertices,
        base_mesh.faces
    )
    return distance_sq.max()

def load_mesh_get_distance(args):
    obj_file, point_cloud = args[0], args[1]
    mesh = load_mesh(obj_file)
    mesh.vertices = normalize_pc(mesh.vertices)
    max_dist = get_max_dist(mesh, point_cloud)
    return max_dist

def read_meshes_get_distances_pool_imap(archive_path, point_cloud, num_proc, num_iterations):
    # do the mesh processing within a pool
    elapsed_time = []
    for _ in range(num_iterations):
        archive = MeshesArchive(archive_path)
        pool = Pool(num_proc)
        start = time.time()
        result = list(tqdm(
            pool.imap(
                load_mesh_get_distance,
                zip(archive, itertools.repeat(point_cloud)),
            ),
            total=len(archive)
        ))
        pool.close()
        pool.join()
        end = time.time()
        elapsed_time.append(end - start)
    print(f'[Process meshes: pool.imap] Pool of {num_proc} processes elapsed time: {np.array(elapsed_time).mean()} sec')
    for name, dist in zip(archive.names_list, result):
        print(f"{name} {dist}")
    return result

if __name__ == "__main__":
    bunny_path = "./data/meshes/stanford-bunny.obj"
    archive_path = "./data/meshes.7z"
    num_proc = 4
    num_iterations = 3
    point_cloud = load_bunny_pc(bunny_path)
    read_meshes_get_distances_pool_imap(archive_path, point_cloud, num_proc, num_iterations)
Here read_meshes_get_distances_pool_imap is the central function, which does the following:
Initialize MeshesArchive and multiprocessing.Pool
Use tqdm to observe the pool's progress, and manually profile the whole pool
Print the results of the execution
Notice how we pass parameters to imap with zip(archive, itertools.repeat(point_cloud)). This allows us to stick the point cloud array to each entry of the archive, avoiding the conversion of the archive to a list.
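A tiny illustration of that zip pattern, with placeholder strings standing in for real meshes and points:

import itertools

entries = ["mesh_a.obj", "mesh_b.obj", "mesh_c.obj"]   # stand-ins for archive members
pairs = zip(entries, itertools.repeat("POINT_CLOUD"))  # repeat() is lazy and endless; zip stops at the shorter input
print(list(pairs))
# [('mesh_a.obj', 'POINT_CLOUD'), ('mesh_b.obj', 'POINT_CLOUD'), ('mesh_c.obj', 'POINT_CLOUD')]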
The results of the execution look as follows:
100%|##########| 5/5 [00:00]