How to optimize the memory occupied by Python

2025-04-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article shows how to reduce the memory occupied by Python objects. It stores the same small record in several different ways and compares the memory footprint of each.

An example

Take a simple scenario: storing a 3D coordinate (x, y, z) in Python.

Dict

With Python's built-in dict, the example above is simple to implement:

>>> ob = {'x': 1, 'y': 2, 'z': 3}
>>> x = ob['x']     # read a field
>>> ob['y'] = 20    # update a field

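Check the memory size of ob. The exact values reported throughout this article vary with the Python version and platform:

>>> import sys
>>> print(sys.getsizeof(ob))
240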

Three simple integers take up a lot of memory. If there is a lot of such data to store, the total grows quickly:

Number of instances    Memory size
1 000 000              240 MB
10 000 000             2.40 GB
100 000 000            24 GB
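The table is simply the per-object size multiplied by the number of objects. A minimal sketch of that arithmetic, assuming decimal units (1 MB = 10^6 bytes) as the table appears to use; the helper name projected_size is made up for this illustration, and the size your interpreter reports may differ from the 240 bytes quoted above:

```python
import sys

def projected_size(bytes_per_object, count):
    """Return a readable total for `count` objects, in decimal units."""
    total = bytes_per_object * count
    if total >= 10**9:
        return f"{total / 10**9:.2f} GB"
    return f"{total / 10**6:.0f} MB"

ob = {'x': 1, 'y': 2, 'z': 3}
per_object = sys.getsizeof(ob)   # the dict container only; version dependent

for count in (1_000_000, 10_000_000, 100_000_000):
    # First column: the article's 240-byte figure; second: this interpreter's figure.
    print(count, projected_size(240, count), projected_size(per_object, count))
```

Note that sys.getsizeof reports the dict container only, so the integers it refers to are not part of these totals.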

Class

Programmers who favor object-oriented programming like to pack data into a class. The same requirement implemented with a class:

class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

>>> ob = Point(1, 2, 3)

The in-memory structure of a class instance is very different from that of a dict. Here is the memory occupied in this case:

Field            Size (bytes)
PyGC_Head        24
PyObject_HEAD    16
__weakref__      8
__dict__         8
TOTAL            56

__weakref__ refers to the list of weak references to the object (see the documentation of the weakref module), and __dict__ refers to the instance dictionary that stores attributes such as self.x. Starting with Python 3.3, the instance dictionaries of a class share storage for their keys, which reduces the size of instances in RAM.

>>> print(sys.getsizeof(ob), sys.getsizeof(ob.__dict__))
56 112

Number of instances    Memory size
1 000 000              168 MB
10 000 000             1.68 GB
100 000 000            16.8 GB

The class instance takes 56 + 112 = 168 bytes in total, somewhat less than the dict, but this is still far from enough.
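The two numbers have to be added because sys.getsizeof counts only the object it is given, not the objects it refers to. A small sketch of that bookkeeping; the variable names are illustrative and the printed values depend on the Python version:

```python
import sys

class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

ob = Point(1, 2, 3)

instance_size = sys.getsizeof(ob)        # the instance itself
dict_size = sys.getsizeof(ob.__dict__)   # the per-instance attribute dictionary
total = instance_size + dict_size        # the integers 1, 2, 3 are not counted

print(instance_size, dict_size, total)
```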

__slots__

The memory layout of a class instance shows that eliminating __dict__ and __weakref__ would significantly reduce the size of instances in RAM. This can be achieved with __slots__:

class Point:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

>>> ob = Point(1, 2, 3)
>>> print(sys.getsizeof(ob))
64

And you can see that memory usage has decreased dramatically.

Field            Size (bytes)
PyGC_Head        24
PyObject_HEAD    16
x                8
y                8
z                8
TOTAL            64

Number of instances    Memory size
1 000 000              64 MB
10 000 000             640 MB
100 000 000            6.4 GB

By default, instances of Python classes (both new-style and classic) have a __dict__ that stores the attributes of the instance. This is very flexible: you can set new attributes at run time at will. However, for small classes whose fixed set of attributes is known in advance, this dict wastes memory.

The problem becomes particularly acute when a large number of instances is created. One solution is to define a __slots__ attribute in a new-style class.

The __slots__ declaration lists the instance variables and reserves just enough space in each instance to hold a value for each of them; Python saves space by not creating a __dict__.
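The effect can be checked side by side. This is a minimal sketch; the class names PlainPoint and SlotPoint are invented for the comparison, and the absolute sizes depend on the Python version:

```python
import sys

class PlainPoint:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

class SlotPoint:
    __slots__ = ('x', 'y', 'z')

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

plain = PlainPoint(1, 2, 3)
slotted = SlotPoint(1, 2, 3)

# A plain instance pays for itself plus its attribute dictionary.
plain_total = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
# A slotted instance has no __dict__; its fields live inside the instance.
slot_total = sys.getsizeof(slotted)

print(hasattr(slotted, '__dict__'))
print(plain_total, slot_total)
```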

So should you always use __slots__? Not necessarily. It has side effects:

Each subclass needs to define __slots__ again; otherwise its instances get a __dict__ back.

An instance can only have the attributes listed in __slots__, which reduces flexibility. For example, if you set a new attribute such as instance.a = 1 and a is not in __slots__, Python raises an AttributeError. You have to keep extending __slots__ or solve the problem some other way.

Instances cannot be the target of weak references unless you add '__weakref__' to __slots__.
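The three side effects above can be reproduced in a few lines. This is an illustrative sketch; SubPoint, LoosePoint and WeakPoint are hypothetical names chosen for the demonstration:

```python
import weakref

class Point:
    __slots__ = ('x', 'y', 'z')

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

class SubPoint(Point):
    __slots__ = ('w',)            # subclass declares its own extra slot

class LoosePoint(Point):
    pass                          # no __slots__: instances get a __dict__ again

class WeakPoint(Point):
    __slots__ = ('__weakref__',)  # opt back in to weak references

p = Point(1, 2, 3)

try:
    p.a = 1                       # 'a' is not listed in __slots__
    new_attribute_allowed = True
except AttributeError:
    new_attribute_allowed = False

try:
    weakref.ref(p)                # Point has no __weakref__ slot
    weakref_allowed = True
except TypeError:
    weakref_allowed = False

sub = SubPoint(1, 2, 3)
sub.w = 4                         # allowed, 'w' is a declared slot

loose = LoosePoint(1, 2, 3)
loose.a = 1                       # allowed, stored in loose.__dict__

wp = WeakPoint(1, 2, 3)
ref = weakref.ref(wp)             # allowed because of the '__weakref__' slot

print(new_attribute_allowed, weakref_allowed)
```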

Finally, the namedlist and attrs packages can create classes with __slots__ automatically; try them if you are interested.

Tuple

Python also has a built-in type, tuple, for representing immutable data structures. A tuple is a fixed structure or record without field names; fields are accessed by index. When a tuple is created, all of its fields are bound to their value objects at once:

>>> ob = (1, 2, 3)
>>> x = ob[0]
>>> ob[1] = 20    # ERROR: tuples do not support item assignment

Tuple instances are quite compact:

>>> print(sys.getsizeof(ob))
72

That is only 8 bytes more than the __slots__ instance, because a tuple also stores its length (ob_size):

Field            Size (bytes)
PyGC_Head        24
PyObject_HEAD    16
ob_size          8
[0]              8
[1]              8
[2]              8
TOTAL            72
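The layout suggests that each additional item costs one pointer on top of a fixed header. A quick way to check this, as a sketch that assumes CPython and uses struct.calcsize('P') for the platform's pointer size:

```python
import struct
import sys

pointer_size = struct.calcsize('P')   # 8 on a 64-bit build

# Sizes of tuples holding 0, 1, 2, 3 and 4 items.
sizes = [sys.getsizeof(tuple(range(n))) for n in range(5)]

# Growth from one length to the next.
steps = [bigger - smaller for smaller, bigger in zip(sizes, sizes[1:])]

print(sizes)
print(steps)
```

The header part differs between Python versions, so the absolute sizes may not match the table, but the per-item step should equal the pointer size.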

Namedtuple

With namedtuple, the elements of a tuple can also be accessed by field name:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ('x', 'y', 'z'))

This creates a subclass of tuple that defines descriptors for accessing fields by name. For our example, it looks roughly like this:

class Point(tuple):
    # read-only access to each field by name
    @property
    def x(self):
        return self[0]

    @property
    def y(self):
        return self[1]

    @property
    def z(self):
        return self[2]

    def __new__(cls, x, y, z):
        return tuple.__new__(cls, (x, y, z))

All instances of this class have the same memory footprint as tuples. With a large number of instances, the footprint is slightly larger than with __slots__:

Number of instances    Memory size
1 000 000              72 MB
10 000 000             720 MB
100 000 000            7.2 GB
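A short check of the claims above, as a sketch using only the standard library: a namedtuple instance is a tuple, is readable by name or index, is the same size as a plain tuple, and remains immutable.

```python
import sys
from collections import namedtuple

Point = namedtuple('Point', ('x', 'y', 'z'))
ob = Point(1, 2, 3)

print(ob.x, ob[0])              # access by name or by index
print(isinstance(ob, tuple))    # a namedtuple instance is a tuple

same_size = sys.getsizeof(ob) == sys.getsizeof((1, 2, 3))
print(same_size)

try:
    ob.y = 20                   # fields are read-only
    assignable = True
except AttributeError:
    assignable = False

moved = ob._replace(y=20)       # builds a new instance instead
print(assignable, moved)
```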

Recordclass

The third-party library recordclass provides a data structure, recordclass.mutabletuple, that is almost identical to the built-in tuple but takes up less memory.

>>> from recordclass import recordclass
>>> Point = recordclass('Point', ('x', 'y', 'z'))
>>> ob = Point(1, 2, 3)

After instantiation, only PyGC_Head is missing from the layout:

Field            Size (bytes)
PyObject_HEAD    16
ob_size          8
x                8
y                8
z                8
TOTAL            48

The memory footprint is reduced further compared to __slots__:

Number of instances    Memory size
1 000 000              48 MB
10 000 000             480 MB
100 000 000            4.8 GB

Dataobject

recordclass provides an alternative solution: use the same in-memory layout as __slots__, but do not participate in cyclic garbage collection. recordclass.make_dataclass creates such classes:

>>> from recordclass import make_dataclass
>>> Point = make_dataclass('Point', ('x', 'y', 'z'))

Another way is to inherit from dataobject:

from recordclass import dataobject

class Point(dataobject):
    x: int
    y: int
    z: int

Instances of classes created this way do not participate in the cyclic garbage collection mechanism. The in-memory structure of an instance is the same as with __slots__, but without PyGC_Head:

Field            Size (bytes)
PyObject_HEAD    16
x                8
y                8
z                8
TOTAL            40

>>> ob = Point(1, 2, 3)
>>> print(sys.getsizeof(ob))
40
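The saving comes from dropping PyGC_Head, the header CPython attaches to objects that the cyclic garbage collector tracks. The standard library's gc.is_tracked shows which objects are tracked. This sketch uses an ordinary __slots__ class rather than recordclass itself, so it only illustrates the default behavior that dataobject avoids:

```python
import gc

class Point:
    __slots__ = ('x', 'y', 'z')

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

ob = Point(1, 2, 3)

slotted_tracked = gc.is_tracked(ob)   # instances of ordinary classes are tracked
int_tracked = gc.is_tracked(1)        # atomic objects such as ints are not

print(slotted_tracked, int_tracked)
```

Tracking is what lets the collector find reference cycles, so opting out is mainly reasonable for records whose fields hold simple values such as numbers.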

Fields are accessed through special descriptors, stored in the class dictionary, that locate each field by its offset from the beginning of the object:

mappingproxy({'__new__': , ....................................... 'x': , 'y': , 'z': })

Number of instances    Memory size
1 000 000              40 MB
10 000 000             400 MB
100 000 000            4.0 GB

Cython

Another approach is based on Cython. Its advantage is that fields can hold values of C atomic types. For example:

cdef class Point:
    cdef public int x, y, z

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

In this case, the memory footprint is smaller:

>>> ob = Point(1, 2, 3)
>>> print(sys.getsizeof(ob))
32

The memory structure is distributed as follows:

Field              Size (bytes)
PyObject_HEAD      16
x                  4
y                  4
z                  4
empty (padding)    4
TOTAL              32

Number of instances    Memory size
1 000 000              32 MB
10 000 000             320 MB
100 000 000            3.2 GB

However, every access from Python code converts between the C int and a Python object.

Numpy

When staying in Python, NumPy gives even better results for large amounts of data, for example with a structured dtype:

>>> import numpy
>>> Point = numpy.dtype([('x', numpy.int32), ('y', numpy.int32), ('z', numpy.int32)])

Create an array of N elements initialized to 0:

>>> points = numpy.zeros(N, dtype=Point)    # N is the number of points

Number of instances    Memory size
1 000 000              12 MB
10 000 000             120 MB
100 000 000            1.2 GB
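The 12 bytes per record follow from three 4-byte integers stored back to back, with no per-record Python object. A small sketch, assuming NumPy is installed and using a small illustrative N:

```python
import numpy

Point = numpy.dtype([('x', numpy.int32), ('y', numpy.int32), ('z', numpy.int32)])

N = 1000                                # illustrative size, not from the article
points = numpy.zeros(N, dtype=Point)

points[0] = (1, 2, 3)                   # assign a whole record
points['y'][1] = 20                     # assign one field of one record

print(Point.itemsize)                   # bytes per record
print(points.nbytes)                    # bytes for the whole data buffer
print(points[0], points['y'][:3])
```

Indexing by field name, as in points['y'], returns a view over that column, which is what makes whole-array operations on one coordinate cheap.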

Conclusion

As you can see, there is a lot you can do to reduce Python's memory usage. Python provides convenience, but that convenience costs extra resources. Choose the approach that fits your scenario to get better performance.
