Shulou (Shulou.com), SLTechnology News&Howtos. Updated 2025-02-23.
How can Python code be used to reduce the memory that Python requires? Many newcomers are unsure about this, so this article explains the available techniques in detail.
When executing a program, memory problems can occur if there are a large number of active objects in memory, especially if the total amount of memory available is limited. In this article, we will discuss ways to shrink objects to significantly reduce the memory required for Python.
For simplicity, take as an example a Python structure that represents a point, with x, y, and z coordinate values that can be accessed by name.
Dict
In small programs, especially scripts, it is simple and convenient to use the built-in dict to represent structured information:
>>> ob = {'x': 1, 'y': 2, 'z': 3}
>>> x = ob['x']
>>> ob['y'] = y
Since Python 3.6, dict has a more compact implementation that keeps keys in insertion order, which has made it even more popular. However, let's look at how much space a dict takes up in memory:
>>> import sys
>>> print(sys.getsizeof(ob))
240
As shown above, a dict takes up a lot of memory, which matters especially when you suddenly need to create a large number of instances.
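The cost of many instances can be estimated with the standard tracemalloc module. The following is a minimal sketch, not a rigorous benchmark: the helper names measure and make_dict are hypothetical, the figure includes the list that holds the objects and the integer values themselves, and the result depends on the Python version.

```python
import tracemalloc


def measure(factory, n=50_000):
    """Return (bytes allocated, object count) for a list of n objects."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    objects = [factory(i) for i in range(n)]  # kept alive while we measure
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before, len(objects)


def make_dict(i):
    # One point stored as a plain dict (hypothetical helper).
    return {"x": i, "y": i + 1, "z": i + 2}


total, count = measure(make_dict)
per_object = total / count
print(f"{count} dicts: about {total / 1e6:.1f} MB ({per_object:.0f} bytes each)")
```

The same measure helper can be pointed at any of the other representations discussed in this article by passing a different factory.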
Class instance
Those who prefer to encapsulate everything in classes tend to define the structure as a class whose fields are accessed by attribute name:
class Point:
    #
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

>>> ob = Point(1, 2, 3)
>>> x = ob.x
>>> ob.y = y
The structure of a class instance is interesting. In it, __weakref__ is a reference to the list of so-called weak references to the object, and the __dict__ field is a reference to the instance dictionary, which holds the values of the instance attributes (note that a reference occupies 8 bytes on a 64-bit platform). Starting with Python 3.3, the keys of the dictionaries of all instances of a class are stored in shared space. This reduces the size of instances in memory:
>>> print(sys.getsizeof(ob), sys.getsizeof(ob.__dict__))
56 112
As a result, a large number of class instances takes up less space in memory than the same number of regular dictionaries (dict).
It is not difficult to see that because the dictionary of the instance is very large, the instance still takes up a lot of memory.
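Because sys.getsizeof does not follow references, the instance and its dictionary have to be measured separately and added up. A small sketch of that bookkeeping follows; the exact numbers differ between Python versions, so none are shown here.

```python
import sys


class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


ob = Point(1, 2, 3)

# getsizeof reports only the object itself, not what it references,
# so the per-instance cost is the instance plus its attribute dict.
instance_size = sys.getsizeof(ob)
dict_size = sys.getsizeof(ob.__dict__)
total = instance_size + dict_size

print(f"instance: {instance_size}, __dict__: {dict_size}, total: {total}")
```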
Class instance with __slots__
To significantly reduce the size of class instances in memory, we can eliminate __dict__ and __weakref__. This is done with __slots__:
class Point:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

>>> ob = Point(1, 2, 3)
>>> print(sys.getsizeof(ob))
64
As a result, each object in memory is significantly smaller, and when __slots__ is used in the class definition, the memory occupied by a large number of instances drops significantly as well.
Currently, this is the main way to reduce the memory footprint of class instances.
The reduction is possible because references to the objects that hold the attribute values are stored in memory directly after the object header, and these values are accessed through special descriptors in the class dictionary:
>>> pprint(Point.__dict__)
mappingproxy({...
              'x': <member 'x' of 'Point' objects>,
              'y': <member 'y' of 'Point' objects>,
              'z': <member 'z' of 'Point' objects>})
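The trade-offs of __slots__ can be checked directly. The sketch below uses a hypothetical class name, SlotPoint, and shows that instances have neither __dict__ nor __weakref__, that each field is served by a descriptor stored in the class dictionary, and that attributes outside the declared slots are rejected.

```python
class SlotPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


ob = SlotPoint(1, 2, 3)

# No per-instance dictionary and no weak-reference list.
has_dict = hasattr(ob, "__dict__")
has_weakref = hasattr(ob, "__weakref__")

# Each slot is backed by a descriptor that lives in the class dictionary.
descriptor_type = type(SlotPoint.__dict__["x"]).__name__

# Attributes that are not listed in __slots__ cannot be added.
try:
    ob.w = 4
    rejected = False
except AttributeError:
    rejected = True

print(has_dict, has_weakref, descriptor_type, rejected)
```

If weak references to the instances are needed, '__weakref__' can be listed in __slots__ explicitly, at the cost of one more reference per instance.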
To automate the creation of classes with __slots__, you can use the namedlist library (https://pypi.org/project/namedlist). The namedlist.namedlist function creates a class with __slots__:
>>> Point = namedlist('Point', ('x', 'y', 'z'))
There is also the attrs package (https://pypi.org/project/attrs), which can automatically create classes with or without __slots__.
Tuple
Python also has a built-in type, tuple, for representing immutable data structures. A tuple is a fixed structure or record, but without field names. Fields are accessed by index. The fields of a tuple are bound to value objects once and for all, when the tuple instance is created:
>>> ob = (1, 2, 3)
>>> x = ob[0]
>>> ob[1] = y  # ERROR
Tuple instances are very compact:
>>> print(sys.getsizeof(ob))
72
Because a tuple in memory also stores the number of fields, it occupies 8 bytes more than an instance of a class with __slots__.
Named tuple
Because tuples are so widely used, sooner or later you will want to access their fields by name as well. The collections module meets this need with the namedtuple function.
The namedtuple function automatically generates this class:
>>> Point = namedtuple('Point', ('x', 'y', 'z'))
The code above creates a subclass of tuple that also defines descriptors for accessing the fields by name. For our example, the generated class looks roughly like this:
class Point(tuple):
    #
    @property
    def x(self):
        return self[0]
    @property
    def y(self):
        return self[1]
    @property
    def z(self):
        return self[2]
    #
    def __new__(cls, x, y, z):
        return tuple.__new__(cls, (x, y, z))
All instances of such classes have exactly the same memory footprint as tuples. A large number of instances, however, consumes slightly more memory.
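A short sketch of how a namedtuple behaves in practice, assuming CPython: fields are readable by name or by index, the instance is the same size as a plain tuple, assignment is rejected, and _replace builds a modified copy instead.

```python
import sys
from collections import namedtuple

Point = namedtuple("Point", ("x", "y", "z"))
ob = Point(1, 2, 3)

# Access by name and by index refer to the same field.
by_name, by_index = ob.x, ob[0]

# The instance is laid out like a plain tuple of the same length.
same_size = sys.getsizeof(ob) == sys.getsizeof((1, 2, 3))

# Named tuples are immutable: assignment raises AttributeError.
try:
    ob.x = 10
    mutable = True
except AttributeError:
    mutable = False

# _replace returns a new instance; the original is unchanged.
moved = ob._replace(x=10)

print(by_name, by_index, same_size, mutable, moved)
```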
Recordclass: mutable named tuples without cyclic GC
Because tuples, and therefore namedtuple classes, produce immutable objects, an attribute such as ob.x can no longer be assigned a new value, so sometimes a mutable variant of the named tuple is needed. Python has no built-in type that is identical to a tuple yet supports assignment, so many alternative implementations have appeared. Here we look at one of them, recordclass (https://pypi.org/project/recordclass), which is discussed on Stack Overflow (https://stackoverflow.com/questions/29290359/existence-of-mutable-named-tuple-in).
In addition, it can reduce the amount of memory consumed by objects to a level similar to that of tuple objects.
The recordclass package introduces the type recordclass.mutabletuple, which is almost equivalent to a tuple but supports assignment. On top of it, subclasses are created that are almost identical to namedtuple classes but also support assigning new values to attributes (without creating a new instance). Like the namedtuple function, the recordclass function creates these classes automatically:
>>> Point = recordclass('Point', ('x', 'y', 'z'))
>>> ob = Point(1, 2, 3)
The structure of the class instance is also similar to that of a tuple, but without PyGC_Head.
By default, the recordclass function creates a class that does not participate in the cyclic garbage collection mechanism. Typically, namedtuple and recordclass are used to generate classes that represent records or simple (non-recursive) data structures, and correct use of them in Python does not produce circular references. For this reason, instances of classes generated by recordclass do not contain a PyGC_Head fragment by default (the fragment needed to support the cyclic garbage collection mechanism). More precisely, the Py_TPFLAGS_HAVE_GC flag is not set by default in the flags field of the PyTypeObject structure of the created class.
The amount of memory consumed by a large number of instances is less than that of instances of a class with __slots__.
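What PyGC_Head pays for can be illustrated with the standard gc module alone. The sketch below does not use recordclass; it uses a hypothetical SlotPoint class to show that ordinary instances are tracked by the cyclic collector, and that this tracking is what allows a self-referencing object to be reclaimed. A record type that never forms cycles gains nothing from it.

```python
import gc


class SlotPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


ob = SlotPoint(1, 2, 3)

# Instances of ordinary classes are tracked by the cyclic collector,
# which is why each one carries a GC header in memory.
instance_tracked = gc.is_tracked(ob)

# Atomic objects such as ints cannot form cycles and are not tracked.
int_tracked = gc.is_tracked(42)

# A reference cycle: reference counting alone can never free this
# object, so the cyclic collector has to find it.
cyclic = SlotPoint(1, 2, 3)
cyclic.x = cyclic
del cyclic
collected = gc.collect()  # number of unreachable objects found

print(instance_tracked, int_tracked, collected >= 1)
```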
Dataobject
The basic idea of another solution offered by the recordclass library is to use the same in-memory structure as an instance of a class with __slots__, but without participating in the cyclic garbage collection mechanism. Such classes can be generated with the recordclass.make_dataclass function:
>>> Point = make_dataclass('Point', ('x', 'y', 'z'))
Classes created in this way default to generate modifiable instances.
Another way is to inherit from recordclass.dataobject:
class Point(dataobject):
    x: int
    y: int
    z: int
Instances of classes created this way do not participate in the cyclic garbage collection mechanism. The in-memory structure of an instance is the same as for a class with __slots__, but without PyGC_Head.
>>> ob = Point(1, 2, 3)
>>> print(sys.getsizeof(ob))
40
Fields are accessed through special descriptors that hold the field's offset from the beginning of the object; these descriptors are located in the class dictionary:
mappingproxy({'__new__': ...,
              ...
              'x': ...,
              'y': ...,
              'z': ...})
The amount of memory consumed by a large number of instances is the smallest possible in CPython.
Cython
There is also a solution based on Cython (https://cython.org/). Its advantage is that the fields can hold values of C atomic types. The descriptors for accessing the fields from pure Python are created automatically. For example:
cdef class Point:
    cdef public int x, y, z

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
In this example, the instance takes up less memory:
>>> ob = Point(1, 2, 3)
>>> print(sys.getsizeof(ob))
32
In memory, the instance consists of the object header followed by the three C int fields.
A large number of instances also takes up a small amount of memory.
However, keep in mind that each access from Python code causes a conversion between the C int type and a Python object.
Numpy
Using multidimensional arrays or record arrays for large amounts of data saves memory. However, to process the data efficiently from pure Python, you should rely on the functions provided by the numpy package.
>>> Point = numpy.dtype([('x', numpy.int32), ('y', numpy.int32), ('z', numpy.int32)])
An array with N elements initialized to zero can be created with the following function:
>>> points = numpy.zeros(N, dtype=Point)
The memory footprint is minimal.
In general, accessing array elements and rows from Python causes a conversion between Python objects and C int values. Extracting a single row from the array yields a result that contains one element, and its memory footprint is less compact:
>>> sys.getsizeof(points[0])
68
Therefore, as mentioned above, you need to use the functions provided by the numpy package to handle arrays in your Python code.