Int source code analysis of Python built-in type 07/06 Update SLTechnology News&Howtos

Int source code analysis of Python built-in type

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)05/31 Report--

Today, I would like to share with you the relevant knowledge points of Python built-in type int source code analysis, the content is detailed, the logic is clear, I believe most people still know too much about this knowledge, so share this article for your reference, I hope you can get something after reading this article, let's take a look at it.

Question: for the C language, what is the result of running the following program? Is it 1000000000000?

# include int main (int argc, char * argv []) {int value = 1000000; print ("% d\ n", value * value)}

The output is as follows:

-727379968

In a computer system, if the storage space of a certain type of variable is fixed, the range of values it can represent is limited. Take int as an example. In C language, the length of this type of variable is 32 bits, and the range of integers that can be represented is-2147483648. 1000000000000 is obviously out of range, that is, an integer overflow has occurred. However, for int in Python, this does not happen:

> 1000000 * 1000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 Design of int object 1.1 PyLongObject

The structure of the int object:

Typedef struct _ longobject PyLongObject;struct _ longobject {PyObject_VAR_HEAD digit ob_digit [1];}

Digit array

# if PYLONG_BITS_IN_DIGIT = = 30typedef uint32_t digit;//... # elif PYLONG_BITS_IN_DIGIT = = 15typedef unsigned short digit;//... # endif

What type of integer is used to implement the digit array? Python provides two versions, a 32-bit unit32_t and a 16-bit unsigned short, which can be specified through a macro definition. As for the reason for this design, this is mainly due to memory considerations, for a small range of integers, 16-bit integers can be expressed, using 32-bit will be more wasteful.

(note: you can see that the value of the PYLONG_BITS_IN_DIGIT macro is 30 or 15, which means that Python only uses 30 or 15 bits, which is why-- this is for Python's consideration of addition carry. If all 32 bits are used to hold absolute values, then in order to ensure that the addition does not overflow (generate carry), it needs to be cast to a 64-bit type before it is calculated. However, after sacrificing the highest 1 bit, the addition operation does not have to worry about carry overflow. So why sacrifice the highest 2 bits for 32 bits? It may be to be unified with the 16-bit integer scheme: if you choose a 16-bit integer, Python uses only 15 bits; 32-bit uses 30 bits. )

In fact, due to the existence of PyObject_VAR_HEAD headers, there is little difference between 32-bit and 16-bit choices:

Integer object Base Units 16-bit Base Units 2624 + 4 * 1 = 28100000024 + 2 * 2 = 2824 + 4 * 1 = 281000000000024 + 2 * 3 = 3024 + 4 * 2 = 32

The structure of the int object is shown below:

For larger integers, Python splits them into parts and saves them in the ob_digit array. However, we notice that in the structure definition, the length of the ob_digit array is fixed at 1. Why? The explanation here is: "because array length is not type information in C language, we can allocate enough memory for ob_digit array according to actual needs and operate it as an array of length n. This is also a common programming skill in C language."

But according to my understanding of the C language, the location of the array is determined by the base address + offset, and the initial length of the array is 1. Wouldn't it be a problem if the position beyond that length is forcibly indexed later? I don't know if I misunderstood it or something, but there will be further research here.

1.2 layout of integers

Integers are divided into positive, negative, and zero integers, and these three different integers are stored as follows:

Save the absolute value of an integer in the ob_digit array

The length of the ob_digit array is saved in the ob_size field. If the integer is negative, the ob_size is negative.

The ob_size of integer zero is 0. The array is empty.

Here are five typical examples of integer storage in different situations:

For the integer 0 zero obliteration size = 0 minutes obliteration size is empty, no assignment is required

For the integer 10, its absolute value is saved in the ob_digit array, and the array length is 1. The field is 1.

For the integer-10, its absolute value is saved in the ob_digit array, and the array length is 1. The field size is-1.

For the integer 1073741824 (that is, 2 ^ 30), because Python uses only the last 30 bits of 32 bits, the 2 ^ 30 power requires two array elements to store, and the length of the integer array is 2. The absolute value is calculated as follows:

2 ^ 0 * 0 + 2 ^ 30 * 1 = 1073741824

For the integer-4294967297 (that is,-(2 ^ 32 + 1)), an array of length 2 is also required, but the ob_size field is negative. The absolute value is calculated as follows:

2 ^ 0 * 1 + 2 ^ 30 * 4 = 4294967297

Summary: when the ob_digit array stores data, it is similar to the 230th calculation (or 215ary, depending on the type of array).

1.3 small integer static object pool

Question: through the study of the previous chapter, we know that the integer object is immutable, and the integer operation results are returned as the new object:

> a = 1 > id (a) 1497146464 > a + = 1 > id (a) 1496146496

A design like Python will bring a performance defect, and a large number of objects will be created and destroyed when the program is running, that is, it will bring a lot of memory allocation and recovery consumption, which will seriously affect the performance. For example, for a for loop that loops 100 times, you need to create 100 int objects, which is obviously unacceptable.

Python's solution is to create commonly used integer objects in advance and replace them later, which is the pool of small integer objects. (pool technology is used like float, but with a slight difference, which is also caused by the difference in actual use between int and float.)

Small integer object pool related source code:

# ifndef NSMALLPOSINTS#define NSMALLPOSINTS 257#endif#ifndef NSMALLNEGINTS#define NSMALLNEGINTS 5#endifstatic PyLongObject small_ ints [NSMALLNEGINTS + NSMALLPOSINTS]

NSMALLPOSINTS macros specify a positive number of object pools (including 0). By default, 257 NSMALLNEGINTS macros specify a negative number of object pools. By default, 5 small_ints is an array of integer objects, saving pre-created small integer objects.

Taking the default configuration as an example, after Python starts, it statically creates an array of integers containing 262 elements, and initializes the integer objects from-5 to-1, and from 1 to 256. The structure of the small integer object pool is as follows:

1.4 exampl

Example 1:

> a = 1 + 0 > > b = 1 * 1 > id (a), id (b) (1541936120048,1541936120048)

Because 1 + 0 evaluates to 1, in the range of small integers, Python takes the integer 1 / 0 directly from the pool of static objects. The names an and b are actually bound to an object (see this blog: Python Source Code Learning Notes: Python scope and namespace), that is, the integer 1 in the pool of small integer objects, so their id is the same.

Example 2:

> c = 1000 + 0 > > d = 1000 * 1 > id (c), id (d) (3085872130224,3085872130256)

1000 + 0 and 1000 * 1 both evaluate to 1000, but since 1000 is not within the pool of small integers, Python creates objects and returns them respectively, so the c and d bound objects id are different.

Note: if you use Pycharm to run here, you will find that their id is the same:

The reason here is essentially bytecode related. In IDLE, each command is compiled separately, while in Pycharm, the entire py file is compiled, in the same context.

2 large integer operation

Problem: previously we learned about the internal structure of integer objects and had a preliminary understanding of how Python can deal with the problem of "integer overflow". But the real difficulty lies in the realization of large integer mathematical operations.

2.1 Overview of integer operations

The operation of an integer object is determined by the three fields tp_as_number, tp_as_sequence, and tp_as_mapping in the integer type object. The PyLong_Type source code of integer type object is as follows: (only some fields are listed)

PyTypeObject PyLong_Type = {PyVarObject_HEAD_INIT (& PyType_Type, 0) "int", / * tp_name * / offsetof (PyLongObject, ob_digit), / * tp_basicsize * / sizeof (digit), / * tp_itemsize * / /. & long_as_number / * tp_as_number * / 0, / * tp_as_sequence * / 0, / * tp_as_mapping * / /...}

Integer objects only support numeric operations long_as_number:

Static PyNumberMethods long_as_number = {(binaryfunc) long_add, / * nb_add*/ (binaryfunc) long_sub, / * nb_subtract*/ (binaryfunc) long_mul, / * nb_multiply*/ long_mod, / * nb_remainder*/ long_divmod, / * nb_divmod*/ long_pow / * nb_power*/ (unaryfunc) long_neg, / * nb_negative*/ (unaryfunc) long_long, / * tp_positive*/ (unaryfunc) long_abs, / * tp_absolute*/ (inquiry) long_bool, / * tp_bool*/ (unaryfunc) long_invert, / * nb_invert*/ long_lshift / * nb_lshift*/ (binaryfunc) long_rshift, / * nb_rshift*/ long_and, / * nb_and*/ long_xor, / * nb_xor*/ long_or, / * nb_or*/ long_long, / * nb_int*/ 0 / * nb_reserved*/ long_float, / * nb_float*/ 0, / * nb_inplace_add * / 0, / * nb_inplace_subtract * / 0, / * nb_inplace_multiply * / 0 / * nb_inplace_remainder * / 0, / * nb_inplace_power * / 0, / * nb_inplace_lshift * / 0, / * nb_inplace_rshift * / 0, / * nb_inplace_and * / 0 / * nb_inplace_xor * / 0, / * nb_inplace_or * / long_div, / * nb_floor_divide * / long_true_divide, / * nb_true_divide * / 0, / * nb_inplace_floor_divide * / 0 / * nb_inplace_true_divide * / long_long, / * nb_index * /}

So far, we have identified all the mathematical operations supported by integer objects and the corresponding processing functions: (only some functions are listed)

Mathematical operation processing function example addition long_adda + b subtraction long_suba-b multiplication long_mula * b take module long _ moda% b divide long_divmoda / b index long_powa * * b

The relationship between integer objects, integer type objects, and integer mathematical operations:

2.2 large integer operation process

Take addition as an example to understand the processing process of large integer operation.

Addition processing function long_add ()

1.long_add () source code: static PyObject * long_add (PyLongObject * a, PyLongObject * b) {PyLongObject * z; CHECK_BINOP (a, b); if (Py_ABS (Py_SIZE (a)) ob_ ob_ [I]; z-> ob_ Digit [I] = carry & PyLong_MASK; carry > > = PyLong_SHIFT;} for (; I)

< size_a; ++i) { carry += a->

Ob_ digit [I]; z-> ob_ Digit [I] = carry & PyLong_MASK; carry > > = PyLong_SHIFT;} z-> ob_ digit [I] = carry; return long_normalize (z);}

Source code analysis:

Lines 10-15: if the length of the an array is relatively small, swap an and b, with the larger one in front (it feels like sometimes you need to swap the algorithm problem to facilitate unified processing)

Lines 16-18: create a new integer object to hold the results of the calculation (note that the length must be larger than a, because it may carry)

Lines 19-23: traverse the underlying array of b, correspond to a part of the camera ah and save it in z, you need to pay attention to carry (you can see that here it is calculated by bit and right shift, and it is also very efficient to deal with it through location calculation. Algorithm problems are also common)

Lines 24-28: traverse the rest of the underlying array of a, add it to carry and save it in z, also pay attention to the carry operation

Line 29: write carry to the highest bit of the z underlying array

Line 30: normalize z, remove the previous redundant zeros in the underlying array of calculation z

3 other large integers to float overflow

At this point, we have a certain understanding of int and float, and there is a natural problem: what if an overflow occurs when converting a large integer int to float?

Example:

> > I = 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111

Because float has a length limit and its size has an upper limit, when we convert a large int to float, an error will be reported if the upper limit is exceeded. We can use Decimal to solve this problem: (only the usage is introduced here, and you can learn about the specific principle.)

> from decimal import Decimal > d = Decimal (I) > f2 = float (d) > f2inf

You can see that when I is converted through Decimal (), the error will not be reported.

The above is all the content of this article "Python built-in type int source code analysis". Thank you for reading! I believe you will gain a lot after reading this article. The editor will update different knowledge for you every day. If you want to learn more knowledge, please pay attention to the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.