
What is the reason why multiplication is faster than bit operation in Python sometimes?

2025-04-06 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

This article looks at why multiplication in Python is sometimes faster than the corresponding bit operation. Many people have wondered about this, so below we first verify the behavior and then dig into CPython's source to explain it.

First of all, in the spirit of seeking truth from facts, let's first verify:

```
In [33]: %timeit 1073741825*2
7.47 ns ± 0.0843 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

In [34]: %timeit 1073741825<<1
```

Multiplication really can come out ahead of the shift. To see why, we have to look at how CPython implements integer arithmetic. Large-integer multiplication goes through the Karatsuba routine k_mul in Objects/longobject.c. The version below is abridged from CPython's source; some declarations, asserts, and debug code are elided:

```c
/* Karatsuba multiplication.  Ignores the input signs, and returns the
 * absolute value of the product (or NULL if error).
 * (CPython Objects/longobject.c, abridged.) */
static PyLongObject *
k_mul(PyLongObject *a, PyLongObject *b)
{
    Py_ssize_t asize = Py_ABS(Py_SIZE(a));
    Py_ssize_t bsize = Py_ABS(Py_SIZE(b));
    PyLongObject *ah = NULL, *al = NULL, *bh = NULL, *bl = NULL;
    PyLongObject *ret = NULL;
    PyLongObject *t1, *t2, *t3;
    Py_ssize_t shift;           /* the number of digits we split off */
    Py_ssize_t i;

    /* Ensure a is the smaller of the two. */
    if (asize > bsize) {
        t1 = a;  a = b;  b = t1;
        i = asize;  asize = bsize;  bsize = i;
    }

    /* Use gradeschool math when either number is too small. */
    i = a == b ? KARATSUBA_SQUARE_CUTOFF : KARATSUBA_CUTOFF;
    if (asize <= i) {
        if (asize == 0)
            return (PyLongObject *)PyLong_FromLong(0);
        else
            return x_mul(a, b);
    }

    /* If a is small compared to b, splitting on b gives a degenerate case
     * with ah == 0, and Karatsuba may be (even much) less efficient than
     * "grade school" then.  However, we can still win, by viewing b as a
     * string of "big digits", each of width a->ob_size.  That leads to a
     * sequence of balanced calls to k_mul. */
    if (2 * asize <= bsize)
        return k_lopsided_mul(a, b);

    /* Split a & b into hi & lo pieces. */
    shift = bsize >> 1;
    if (kmul_split(a, shift, &ah, &al) < 0) goto fail;
    assert(Py_SIZE(ah) > 0);            /* the split isn't degenerate */

    if (a == b) {
        bh = ah;  bl = al;
        Py_INCREF(bh);  Py_INCREF(bl);
    }
    else if (kmul_split(b, shift, &bh, &bl) < 0) goto fail;

    /* The plan:
     * 1. Allocate result space (asize + bsize digits: that's always
     *    enough).
     * 2. Compute ah*bh, and copy into result at 2*shift.
     * 3. Compute al*bl, and copy into result at 0.  Note that this
     *    can't overlap with #2.
     * 4. Subtract al*bl from the result, starting at shift.  This may
     *    underflow (borrow out of the high digit), but we don't care:
     *    we're effectively doing unsigned arithmetic mod
     *    BASE**(sizea + sizeb), and so long as the *final* result fits,
     *    borrows and carries out of the high digit can be ignored.
     * 5. Subtract ah*bh from the result, starting at shift.
     * 6. Compute (ah+al)*(bh+bl), and add it into the result starting
     *    at shift. */

    /* 1. Allocate result space. */
    ret = _PyLong_New(asize + bsize);
    if (ret == NULL) goto fail;

    /* 2. t1 <- ah*bh, and copy into high digits of result. */
    if ((t1 = k_mul(ah, bh)) == NULL) goto fail;
    memcpy(ret->ob_digit + 2*shift, t1->ob_digit,
           Py_SIZE(t1) * sizeof(digit));
    /* Zero-out the digits higher than the ah*bh copy. */
    i = Py_SIZE(ret) - 2*shift - Py_SIZE(t1);
    if (i)
        memset(ret->ob_digit + 2*shift + Py_SIZE(t1), 0,
               i * sizeof(digit));

    /* 3. t2 <- al*bl, and copy into the low digits. */
    if ((t2 = k_mul(al, bl)) == NULL) {
        Py_DECREF(t1);
        goto fail;
    }
    memcpy(ret->ob_digit, t2->ob_digit, Py_SIZE(t2) * sizeof(digit));
    /* Zero out remaining digits. */
    i = 2*shift - Py_SIZE(t2);          /* number of uninitialized digits */
    if (i)
        memset(ret->ob_digit + Py_SIZE(t2), 0, i * sizeof(digit));

    /* 4 & 5. Subtract ah*bh (t1) and al*bl (t2).  We do al*bl first
     * because it's fresher in cache. */
    i = Py_SIZE(ret) - shift;           /* # digits after shift */
    (void)v_isub(ret->ob_digit + shift, i, t2->ob_digit, Py_SIZE(t2));
    Py_DECREF(t2);
    (void)v_isub(ret->ob_digit + shift, i, t1->ob_digit, Py_SIZE(t1));
    Py_DECREF(t1);

    /* 6. t3 <- (ah+al)(bh+bl), and add into result. */
    if ((t1 = x_add(ah, al)) == NULL) goto fail;
    Py_DECREF(ah);  Py_DECREF(al);
    ah = al = NULL;
    if (a == b) {
        t2 = t1;
        Py_INCREF(t2);
    }
    else if ((t2 = x_add(bh, bl)) == NULL) {
        Py_DECREF(t1);
        goto fail;
    }
    Py_DECREF(bh);  Py_DECREF(bl);
    bh = bl = NULL;

    t3 = k_mul(t1, t2);
    Py_DECREF(t1);  Py_DECREF(t2);
    if (t3 == NULL) goto fail;

    /* Add t3.  It's not obvious why we can't run out of room here.
     * See the (*) comment after this function. */
    (void)v_iadd(ret->ob_digit + shift, i, t3->ob_digit, Py_SIZE(t3));
    Py_DECREF(t3);

    return long_normalize(ret);

  fail:
    Py_XDECREF(ret);
    Py_XDECREF(ah);  Py_XDECREF(al);
    Py_XDECREF(bh);  Py_XDECREF(bl);
    return NULL;
}
```
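The six-step "plan" in the comment above boils down to a single algebraic identity: with x = ah·B^s + al and y = bh·B^s + bl, the product is ah·bh·B^2s + ((ah+al)(bh+bl) − ah·bh − al·bl)·B^s + al·bl. A quick sanity check in Python (the base and split point here are illustrative choices, not pulled from a real k_mul call):

```python
# Verify the identity behind k_mul's plan with plain Python ints.
B = 2 ** 30          # one CPython "digit" holds 30 bits on 64-bit builds
shift = 2            # split position, in digits (illustrative)

x = 123456789012345678901234567890
y = 987654321098765432109876543210

ah, al = divmod(x, B ** shift)   # high and low halves of x
bh, bl = divmod(y, B ** shift)   # high and low halves of y

t1 = ah * bh                     # step 2
t2 = al * bl                     # step 3
t3 = (ah + al) * (bh + bl)       # step 6; t3 - t1 - t2 == ah*bl + al*bh

result = t1 * B ** (2 * shift) + (t3 - t1 - t2) * B ** shift + t2
assert result == x * y
```

Only three recursive multiplications (t1, t2, t3) are needed instead of four, which is the entire source of Karatsuba's speedup.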

We won't give a separate line-by-line explanation of the Karatsuba implementation here; interested readers can consult the references at the end of the article for the details.

In general, ordinary (schoolbook) multiplication takes O(n^2) time, where n is the number of digits, while the Karatsuba algorithm takes O(n^(log2 3)) ≈ O(n^1.585). Karatsuba therefore looks asymptotically better, so why doesn't Python use it across the board?
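To make the three-way recursion behind the n^1.585 bound concrete, here is a minimal pure-Python sketch of Karatsuba for non-negative integers. It is illustrative only: CPython's k_mul works on arrays of 30-bit digits, not decimal strings.

```python
# Minimal Karatsuba sketch: three recursive multiplications per level
# instead of four, giving O(n**log2(3)) single-digit multiplications.
def karatsuba(x, y):
    if x < 10 or y < 10:                 # base case: a single digit left
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    p = 10 ** half                       # split point, in decimal digits
    ah, al = divmod(x, p)
    bh, bl = divmod(y, p)
    t1 = karatsuba(ah, bh)               # high * high
    t2 = karatsuba(al, bl)               # low * low
    t3 = karatsuba(ah + al, bh + bl)     # cross terms via one multiply
    return t1 * p * p + (t3 - t1 - t2) * p + t2

assert karatsuba(1234, 5678) == 1234 * 5678
```

Note how each level does extra additions, subtractions, and splits around its three recursive calls; that overhead is the crux of the next point.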

Quite simply, Karatsuba only gains an advantage over ordinary multiplication once n is large enough. When n is small, its extra additions, splits, and memory accesses make it slower than multiplying directly, which is why k_mul falls back to x_mul below the KARATSUBA_CUTOFF threshold.

So let's take a look at the implementation of multiplication in Python:

The entry point long_mul has a fast path for operands that each fit in a single 30-bit digit (abridged from CPython's source):

```c
/* CPython Objects/longobject.c, abridged. */
static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;

    CHECK_BINOP(a, b);

    /* fast path for single-digit multiplication */
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
        return PyLong_FromLongLong((long long)v);
    }

    z = k_mul(a, b);
    /* Negate if exactly one of the inputs is negative. */
    if (((Py_SIZE(a) ^ Py_SIZE(b)) < 0) && z) {
        _PyLong_Negate(&z);
        if (z == NULL)
            return NULL;
    }
    return z;
}
```

A shift, by contrast, has no such short-circuit. It always computes the word and bit offsets, allocates a fresh int object, and copies every digit while moving bits across digit boundaries, as in the core loop of long_rshift1 (long_lshift1 has the same structure in the other direction; abridged, with the surrounding setup and error handling elided):

```c
/* CPython Objects/longobject.c, abridged: the masked digit-copy loop.
 * hishift == PyLong_SHIFT - remshift. */
digit lomask = (digit)((1UL << hishift) - 1);
digit himask = PyLong_MASK ^ lomask;
for (i = 0, j = wordshift; i < newsize; i++, j++) {
    z->ob_digit[i] = (a->ob_digit[j] >> remshift) & lomask;
    if (i + 1 < newsize)
        z->ob_digit[i] |= (a->ob_digit[j + 1] << hishift) & himask;
}
```

So for small operands multiplication can win: long_mul collapses single-digit values into one hardware multiply, while a shift always pays for the offset bookkeeping, the allocation of the result object, and the digit loop. Only for large numbers do the asymptotic costs of the algorithms themselves take over.
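If you want to reproduce the comparison outside IPython, a small timeit script works. This is a sketch, not proof: absolute numbers vary by machine and CPython version, and the gap may go either way. The operand is bound in the setup string so that CPython cannot fold the whole expression into a compile-time constant.

```python
# Rough probe of x*2 vs x<<1; treat the numbers as indicative only.
import timeit

setup = "x = 1073741825"   # needs two 30-bit digits internally
t_mul = timeit.timeit("x * 2", setup=setup, number=1_000_000)
t_shl = timeit.timeit("x << 1", setup=setup, number=1_000_000)
print(f"x * 2 : {t_mul:.4f} s    x << 1 : {t_shl:.4f} s")
```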


© 2024 shulou.com SLNews company. All rights reserved.
