
What are the techniques for allowing Python performance to take off

2025-07-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--


How to measure the execution time of a program

Measuring a Python program's execution time accurately seems simple but is actually complex. Execution time depends on many factors, such as the operating system, the Python version, and the hardware (CPU performance, memory read/write speed). These factors are fixed when you run the same Python version on the same computer. Even so, timings still vary from run to run, and other programs running on the machine can interfere with the measurement. Strictly speaking, such experiments cannot be repeated exactly.

The two most representative timing modules I know of are time and timeit.

The time module provides three functions that can be used for timing, in seconds: time(), perf_counter() and process_time(). Variants with the suffix _ns (time_ns() and so on) return nanoseconds and are available from Python 3.7. An older clock() function was deprecated in Python 3.3 and removed in Python 3.8. The three differ as follows:

time() returns wall-clock time. It may have lower resolution and is affected by changes to the system clock, so it suits representing dates and times or timing long-running programs.

perf_counter() is a high-resolution counter suited to timing short pieces of code. It includes time spent in sleep().

process_time() is also suited to timing short pieces of code. However, it measures the CPU time of the current process and does not include time spent in sleep().
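You can see the difference between perf_counter() and process_time() directly. The following is a minimal sketch; busy_work() is a hypothetical helper that only burns some CPU time:

```python
import time

def busy_work(n):
    # Hypothetical helper: pure-Python CPU work so process_time() has something to count.
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.2)        # counted by perf_counter(), not by process_time()
busy_work(200_000)     # counted by both

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"perf_counter: {wall_elapsed:.3f}s")
print(f"process_time: {cpu_elapsed:.3f}s")
```

On a typical machine, the perf_counter() figure should exceed 0.2 s, while the process_time() figure covers only the CPU work.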

Compared with the time module, timeit has two advantages:

timeit uses the most suitable timer for your platform. Since Python 3.3 its default timer is time.perf_counter().

timeit temporarily disables garbage collection while timing.

Parameters of timeit.timeit(stmt='pass', setup='pass', timer=<default timer>, number=1000000, globals=None):

stmt='pass': the statement (or callable) to be timed.

setup='pass': code that runs once before stmt is timed. It is typically used to import modules or define necessary variables.

timer=<default timer>: the timer function, which defaults to time.perf_counter().

number=1000000: the number of times stmt is executed. The default is one million.

globals=None: the namespace in which the code is executed.
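Here is a sketch of the main timeit.timeit() parameters in use. The snippet and the number=10_000 value are illustrative choices, much smaller than the one-million default so the sketch finishes quickly. to_upper() is a hypothetical helper:

```python
import timeit

# setup runs once before the timed loop and is not included in the result.
setup_code = "words = ['life', 'is', 'short', 'i', 'choose', 'python']"

# stmt is executed `number` times; the total elapsed time in seconds is returned.
t_stmt = timeit.timeit(stmt="[w.upper() for w in words]",
                       setup=setup_code,
                       number=10_000)

def to_upper(words):
    # Hypothetical helper used to demonstrate the globals parameter.
    return [w.upper() for w in words]

words = ['life', 'is', 'short', 'i', 'choose', 'python']

# globals lets stmt use names from an existing namespace instead of a setup string.
t_globals = timeit.timeit(stmt="to_upper(words)",
                          globals=globals(),
                          number=10_000)

# stmt may also be a zero-argument callable.
t_callable = timeit.timeit(lambda: to_upper(words), number=10_000)

print(t_stmt, t_globals, t_callable)
```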

All timings in this article use timeit with the default number of one million executions.

Why a million executions? The test programs are very short, so with fewer executions the differences between them would barely show.

1. Use map() for function mapping

Exp1: convert the strings in a list from lowercase to uppercase.

Test array: oldlist = ['life', 'is', 'short', 'i', 'choose', 'python'].

Method one

newlist = []
for word in oldlist:
    newlist.append(word.upper())

Method two

list(map(str.upper, oldlist))

The first method takes 0.5267724000000005s, the second method 0.41462569999999843s, and the performance is improved by 21.29%.
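The improvement percentages in this article appear to be computed relative to the slower method, that is, (t1 - t2) / t1. For Exp1, (0.5268 - 0.4146) / 0.5268 ≈ 21.29%.

The sketch below shows how Exp1 could be reproduced under that assumption. The wrapper functions, the improvement() helper and number=100_000 are illustrative. The wrappers add a function call to each run, and absolute times will differ by machine and Python version:

```python
import timeit

oldlist = ['life', 'is', 'short', 'i', 'choose', 'python']

def method_one():
    newlist = []
    for word in oldlist:
        newlist.append(word.upper())
    return newlist

def method_two():
    return list(map(str.upper, oldlist))

def improvement(t1, t2):
    """Percentage by which t2 is faster than t1, relative to t1 (assumed formula)."""
    return (t1 - t2) / t1 * 100

# Both methods must produce the same result before their speed is compared.
assert method_one() == method_two()

t1 = timeit.timeit(method_one, number=100_000)
t2 = timeit.timeit(method_two, number=100_000)
print(f"method one {t1:.4f}s, method two {t2:.4f}s, "
      f"improvement {improvement(t1, t2):.2f}%")
```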

2. Use set() to find the intersection

Exp2: find the intersection of two lists.

Test array: a = [1, 2, 2, 3, 4, 5], b = [2, 4, 6, 6, 8, 10].

Method one

overlaps = []
for x in a:
    for y in b:
        if x == y:
            overlaps.append(x)

Method two

list(set(a) & set(b))

Method 1 takes 0.95072640000006s, method 2 takes 0.6148200999999993s, and the performance is improved by 35.33%.

In set syntax, |, & and - denote union, intersection and difference, respectively.
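This short sketch shows the set operators on the Exp2 data. It also shows that the two methods in Exp2 are not strictly equivalent: the nested loop keeps duplicate matches, while a set does not.

```python
a = [1, 2, 2, 3, 4, 5]
b = [2, 4, 6, 6, 8, 10]
sa, sb = set(a), set(b)

union = sa | sb          # {1, 2, 3, 4, 5, 6, 8, 10}
intersection = sa & sb   # {2, 4}
difference = sa - sb     # {1, 3, 5}

# The nested-loop version keeps duplicates: 2 appears twice in a.
overlaps = []
for x in a:
    for y in b:
        if x == y:
            overlaps.append(x)
# overlaps == [2, 2, 4]

# Sets are unordered; sort the result if a predictable list order is needed.
result = sorted(sa & sb)   # [2, 4]
```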

3. Sort using sort() or sorted()

There are many ways to sort a sequence, but writing your own sorting algorithm is rarely worth it. The built-in sort() and sorted() are good enough, and the key parameter makes them very flexible. The difference between them is that sort() is a method defined only on lists and sorts in place. sorted() is a built-in function that accepts any iterable and returns a new list.

Exp3: sort the same list using quicksort and the sort() method, respectively.

Test array: lists = [2, 1, 4, 4, 3, 0].

Method one

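def quick_sort(lists, i, j):
    if i >= j:
        return lists
    pivot = lists[i]
    low = i
    high = j
    while i < j:
        # Scan from the right for an element smaller than the pivot
        # and move it into the free slot on the left.
        while i < j and lists[j] >= pivot:
            j -= 1
        lists[i] = lists[j]
        # Scan from the left for an element larger than the pivot
        # and move it into the free slot on the right.
        while i < j and lists[i] <= pivot:
            i += 1
        lists[j] = lists[i]
    # Now i == j: this is the pivot's final position.
    lists[j] = pivot
    quick_sort(lists, low, i - 1)
    quick_sort(lists, i + 1, high)
    return lists

quick_sort(lists, 0, len(lists) - 1)

Method two

lists.sort()

The key parameter also makes complex orderings easy. When an ordering is most naturally written as a comparison function, functools.cmp_to_key() converts it into a key. In the example below, students are sorted by grade in ascending order, then by name. When grade and name are both the same, they are sorted by age in descending order:

import functools

def cmp(a, b):
    if a[1] != b[1]:
        return -1 if a[1] < b[1] else 1  # sort by grade in ascending order
    elif a[0] != b[0]:
        return -1 if a[0] < b[0] else 1  # same grade: sort by name in ascending order
    else:
        return -1 if a[2] > b[2] else 1  # same grade and name: sort by age in descending order

students = [('john', 'A', 15), ('john', 'A', 14), ('jane', 'B', 12), ('dave', 'B', 10)]
sorted(students, key=functools.cmp_to_key(cmp))

4. Use collections.Counter() for counting

Exp4: count how many times each character appears in a string.

Test data: sentence = 'life is short, i choose python'.

Method one

counts = {}
for char in sentence:
    counts[char] = counts.get(char, 0) + 1

Method two

from collections import Counter
Counter(sentence)

Method one takes 2.8105250000000055s, method two takes 1.6317423000000062s, and the performance is improved by 41.94%.

5. Use list comprehensions

List comprehensions are short and concise. In small code snippets the difference may be negligible, but in large programs it can save some time.

Exp5: square the odd numbers in a list (even numbers are dropped).

Test data: oldlist = range(10).

Method one

newlist = []
for x in oldlist:
    if x % 2 == 1:
        newlist.append(x**2)

Method two

[x**2 for x in oldlist if x % 2 == 1]

Method one takes 1.5342976000000021s, method two takes 1.4181957999999923s, and the performance is improved by 7.57%.

6. Use join() to concatenate strings

Most people are used to concatenating strings with +. This is inefficient, because each + creates a new string and copies the old contents into it. join() is a better way. For other string operations, likewise prefer built-in methods such as isalpha(), isdigit(), startswith() and endswith().

Exp6: join the elements of a list of strings.

Test data: oldlist = ['life', 'is', 'short', 'i', 'choose', 'python'].

Method one

sentence = ""
for word in oldlist:
    sentence += word

Method two

"".join(oldlist)

Method one takes 0.27489080000000854s, method two takes 0.08166570000000206s, and the performance is improved by 70.29%.

join() has another convenient feature: you can specify the separator. For example:

oldlist = ['life', 'is', 'short', 'i', 'choose', 'python']
sentence = "//".join(oldlist)
print(sentence)

Output:

life//is//short//i//choose//python

7. Use x, y = y, x to swap variables

Exp7: swap the values of x and y.

Test data: x, y = 100, 200.

Method one

temp = x
x = y
y = temp

Method two

x, y = y, x

Method one takes 0.027853900000010867s, method two takes 0.02398730000000171s, and the performance is improved by 13.88%.

8. Use while 1 instead of while True

When the exact number of iterations is unknown, the usual approach is an infinite while True loop that checks the termination condition inside the loop body. A common tip is that while 1 runs faster than while True. That was true in Python 2, where True was an ordinary built-in name that had to be looked up on every iteration. In Python 3, True is a keyword constant, and CPython compiles both forms to essentially the same bytecode. The small difference measured below is therefore within run-to-run noise.

Exp8: loop 100 times with while True and while 1, respectively.

Method one

i = 0
while True:
    i += 1
    if i > 100:
        break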

Method two

i = 0
while 1:
    i += 1
    if i > 100:
        break

Method 1 takes 3.679268300000004s, method 2 takes 3.607847499999991s, and the performance is improved by 1.94%.

9. Use decorator caching

Caching stores results so they can be returned quickly instead of being recomputed. Python supports caching through decorators: functools.lru_cache keeps a bounded cache of results in memory. Here we use lru_cache to cache a recursive Fibonacci function. The plain recursive fibonacci function repeats many calculations, such as fibonacci(1) and fibonacci(2). With lru_cache, each of these is computed only once, which greatly improves execution efficiency.

Exp9: compute a Fibonacci number.

Test data: fibonacci(7).

Method one

def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

Method two

import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

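Method 1 takes 3.955014900000009s, method 2 takes 0.05077979999998661s, and the performance is improved by 98.72%. Because the same call is repeated a million times, every call after the first is answered straight from the cache.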

Note:

The cache is keyed on the arguments. When a decorated function is called again with the same arguments, the cached result is returned and the function body does not run again, as long as the entry has not been evicted.

All arguments must be hashable. For example, a list cannot be passed as an argument to a function decorated with lru_cache.

import functools

@functools.lru_cache(maxsize=100)
def demo(a, b):
    print('I was executed')
    return a + b

if __name__ == '__main__':
    demo(1, 2)
    demo(1, 2)

Output:

I was executed

(demo(1, 2) was called twice, but the message is printed only once.)

from functools import lru_cache

@lru_cache(maxsize=100)
def list_sum(nums: list):
    return sum(nums)

if __name__ == '__main__':
    list_sum([1, 2, 3, 4, 5])

Output:

TypeError: unhashable type: 'list'

functools.lru_cache(maxsize=128, typed=False) has two optional parameters:

maxsize is the maximum number of results kept in the cache. When it is exceeded, the least recently used entries are discarded to make room for new results. If maxsize is None, the cache can grow without bound. Older versions of the Python documentation noted that the LRU feature performs best when maxsize is a power of two.

If typed is True, arguments of different types are cached separately. For example, f(3) and f(3.0) are treated as distinct calls.
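This sketch shows lru_cache's introspection helpers, cache_info() and cache_clear(), which are part of functools, together with the typed parameter. The double() function is a hypothetical example:

```python
import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    # Same recursion as method two, written compactly.
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

value = fibonacci(30)
info = fibonacci.cache_info()   # CacheInfo(hits, misses, maxsize, currsize)
print(value, info)

@functools.lru_cache(maxsize=None, typed=True)
def double(x):
    # Hypothetical example function.
    return x * 2

double(3)     # cached under the int argument 3
double(3.0)   # typed=True: cached separately under the float 3.0
print(double.cache_info().currsize)   # 2 entries

fibonacci.cache_clear()   # empty the cache, e.g. between benchmark runs
```

Each distinct n from 0 to 30 is computed once (31 misses), and every other recursive call is a cache hit.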

10. Reduce use of the dot operator (.)

The dot operator (.) is used to access an object's attributes and methods. Each access goes through __getattribute__() (with __getattr__() as a fallback). This usually involves dictionary lookups and adds overhead. It is especially worthwhile to reduce dot-operator use inside loops by moving such lookups out of the loop.

This suggests importing with from ... import ... where possible, rather than reaching a function through the dot operator every time it is needed. More generally, not only the dot operator but any unnecessary operation should be moved outside the loop when possible.

Exp10: convert the strings in a list from lowercase to uppercase.

Test array: oldlist = ['life', 'is', 'short', 'i', 'choose', 'python'].

Method one

newlist = []
for word in oldlist:
    newlist.append(str.upper(word))

Method two

newlist = []
upper = str.upper
for word in oldlist:
    newlist.append(upper(word))

Method 1 takes 0.7235491999999795s, method 2 takes 0.5475435999999831s, and the performance is improved by 24.33%.
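The same idea applies to bound methods such as list.append and to module attributes. The sketch below compares a loop that uses the dot operator on every iteration with one that looks the names up once. The data and the number of runs are arbitrary choices:

```python
import math
import timeit
from math import sqrt   # bind the function name once, at import time

values = list(range(1, 1001))

def with_dots():
    result = []
    for v in values:
        result.append(math.sqrt(v))   # two attribute lookups per iteration
    return result

def hoisted():
    result = []
    append = result.append            # look the bound method up once
    for v in values:
        append(sqrt(v))                # no dot operator inside the loop
    return result

assert with_dots() == hoisted()

t1 = timeit.timeit(with_dots, number=200)
t2 = timeit.timeit(hoisted, number=200)
print(f"with dots {t1:.4f}s, hoisted {t2:.4f}s")
```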

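11. Use a for loop instead of a while loop

When the number of iterations is known in advance, a for loop is a better choice than a while loop. range() advances the counter in C, instead of running an increment and a comparison as Python bytecode on every iteration.

Exp11: loop 100 times with while and for, respectively.

Method one

i = 0
while i < 100:
    i += 1

Method two

for _ in range(100):
    pass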

Method 1 takes 3.89468329999997s, method 2 takes 1.01980779999953s, and the performance is improved by 73.82%.

12. Use numba.jit to accelerate computation

Numba compiles Python functions to machine code just in time (via LLVM). This can greatly speed up execution, sometimes approaching the speed of C or Fortran. Numba works well with NumPy and can significantly improve performance for for loops and compute-heavy code. The first call to a jit-compiled function includes compilation time, so it is slower than later calls.

Exp12: sum the integers from 1 to 100.

Method one

def my_sum(n):
    x = 0
    for i in range(1, n + 1):
        x += i
    return x

Method two

from numba import jit

@jit(nopython=True)
def numba_sum(n):
    x = 0
    for i in range(1, n + 1):
        x += i
    return x

Method 1 takes 3.71999970000167s, method 2 takes 0.23769430000001535s, and the performance is improved by 93.61%.

13. Use NumPy vectorized arrays

Vectorization is a powerful NumPy feature that lets operations be expressed on entire arrays rather than on individual elements. Replacing explicit loops with array expressions in this way is commonly called vectorization.

Looping over an array, or any data structure, in Python carries a lot of overhead. NumPy's vectorized operations delegate the inner loops to highly optimized C and Fortran routines, which makes the code faster.

Exp13: multiply two sequences of the same length element by element.

Test array: a = [1, 2, 2, 3, 4, 5], b = [2, 4, 6, 6, 8, 10]

Method one

[a[i] * b[i] for i in range(len(a))]

Method two

import numpy as np

a = np.array([1, 2, 2, 3, 4, 5])
b = np.array([2, 4, 6, 6, 8, 10])
a * b

The first method takes 0.6706845000000214s, the second method takes 0.3070132000000001s, and the performance is improved by 54.22%.

14. Use in to check list membership

To check whether a list contains a given member, the in keyword is usually faster than writing the loop yourself.

Exp14: check whether a list contains a given member.

Test array: lists = ['life', 'is', 'short', 'i', 'choose', 'python']

Method one

def check_member(target, lists):
    for member in lists:
        if member == target:
            return True
    return False

Method two

if target in lists:
    pass

The first method takes 0.160384499999216s, the second method 0.04139250000000061s, and the performance is improved by 74.19%.
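The sketch below first checks that method one and method two agree. It then touches on choosing a data structure: in on a list still scans the elements one by one (in C rather than in Python). When the same collection is searched many times, converting it to a set usually pays off. The lookup words are arbitrary examples:

```python
lists = ['life', 'is', 'short', 'i', 'choose', 'python']

def check_member(target, lists):
    for member in lists:
        if member == target:
            return True
    return False

# Same answer either way; `in` performs the scan inside list.__contains__.
assert check_member('python', lists) == ('python' in lists)

# For many lookups against a fixed collection, a set avoids the linear scan.
members = set(lists)
found = [w for w in ['short', 'long', 'python'] if w in members]   # ['short', 'python']
```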

15. Iterate with the itertools library

itertools is a module of functions for working with iterators. They fall into three groups: infinite iterators, finite iterators (which stop when their input is exhausted), and combinatoric iterators.
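Here is one example from each of the three groups, as a quick sketch with arbitrary inputs:

```python
import itertools

# Infinite iterator: count() never stops on its own, so cap it with islice().
first_evens = list(itertools.islice(itertools.count(0, 2), 5))   # [0, 2, 4, 6, 8]

# Iterators that stop when their input runs out: accumulate(), chain(), ...
running_totals = list(itertools.accumulate([1, 2, 3, 4]))        # [1, 3, 6, 10]
chained = list(itertools.chain("ab", "cd"))                      # ['a', 'b', 'c', 'd']

# Combinatoric iterators: permutations(), combinations(), product(), ...
pairs = list(itertools.combinations("ABC", 2))   # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```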

Exp15: return all permutations of a list.

Test array: ["Alice", "Bob", "Carol"]

Method one

def permutations(lst):
    if len(lst) == 1 or len(lst) == 0:
        return [lst]
    result = []
    for i in lst:
        temp_lst = lst[:]
        temp_lst.remove(i)
        temp = permutations(temp_lst)
        for j in temp:
            j.insert(0, i)
            result.append(j)
    return result

Method two

import itertools

itertools.permutations(["Alice", "Bob", "Carol"])

Method 1 takes 3.867292899999484s, method 2 takes 0.3875405000007959s, and the performance is improved by 89.98%. Note that itertools.permutations() returns a lazy iterator of tuples. Timing method two as written therefore measures only the creation of the iterator. A like-for-like comparison would consume it, for example with list().
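Here is a sketch of a fairer comparison, assuming the goal is to materialize every permutation. It reuses method one unchanged. It also checks that both methods yield the same permutations in the same order (as lists versus tuples):

```python
import itertools
import timeit

def permutations(lst):
    # Method one, unchanged: returns a list of lists.
    if len(lst) == 1 or len(lst) == 0:
        return [lst]
    result = []
    for i in lst:
        temp_lst = lst[:]
        temp_lst.remove(i)
        for j in permutations(temp_lst):
            j.insert(0, i)
            result.append(j)
    return result

names = ["Alice", "Bob", "Carol"]

# list() forces the lazy iterator to produce every tuple.
eager = list(itertools.permutations(names))
assert [list(p) for p in eager] == permutations(names)

t_lazy = timeit.timeit(lambda: itertools.permutations(names), number=100_000)
t_eager = timeit.timeit(lambda: list(itertools.permutations(names)), number=100_000)
print(f"create only {t_lazy:.4f}s, create and consume {t_eager:.4f}s")
```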


Conclusion

Based on the test data above, I drew the following chart of the results, which shows the performance differences between the methods more intuitively.

As the chart shows, most of the techniques bring considerable performance gains, but a few bring only small ones. Examples are numbers 5, 7 and 8; for number 8 there is almost no difference between the two methods.

In summary, I think it comes down to two principles:

1. Use built-in library functions whenever possible

Built-in library functions are written by professional developers and have been thoroughly tested, and many of them are implemented in C underneath. These functions (such as sort() and join()) are therefore generally very efficient, and methods you write yourself will rarely beat them. Save the effort and don't reinvent the wheel, especially since a homemade wheel may well be worse. If a function already exists in the library, use it directly.

2. Use excellent third-party libraries whenever possible

Many excellent third-party libraries are implemented in C or Fortran internally, and using them costs you nothing. NumPy and Numba, mentioned earlier, bring impressive improvements. There are many more tools like these, such as Cython and PyPy; the examples here are only a starting point.

There are also many other ways to speed up Python code. Examples include avoiding global variables, using the latest Python version, choosing appropriate data structures, and taking advantage of short-circuit evaluation in if conditions.


