2025-04-02 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to vectorize moving-window (sliding-window) operations on NumPy arrays. It first builds the window with a plain Python loop, then replaces the loop with array slicing, and finally compares the speed of the two approaches. Let's take a look.
What is a sliding window?
The following example shows a 3 × 3 sliding window. The array element marked in red is the target element: the position for which the sliding window computes a new value. For example, in the image below we can calculate the average of the nine elements in the gray window (here the average is also 8, the value of the target element) and assign it to the target element. Instead of the average, you could calculate the minimum (0), the maximum (16), or some other metric. This is repeated for each element in the array.
That is the basic principle of a sliding window. Of course, things can get more complicated: finite-difference methods can be applied to temporal and spatial data, conditional logic can be added, and you can use larger or non-square windows. At its core, though, moving-window analysis boils down to computing a summary, such as the average, of each element's neighbors.
Note that edge elements need special handling because they do not have nine neighboring elements. As a result, many analyses exclude edge elements. For simplicity, this article excludes them too.
[Figure: sample array and its 3 × 3 sliding window]
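One common adjustment for edge elements, which this article does not use, is to pad the array before windowing so that every element has nine neighbors. The sketch below is only an illustration of that idea: it uses np.pad with mode="edge" (which repeats the border values), and the variable names are made up for the example.

```python
import numpy as np

a = np.arange(49).reshape((7, 7))

# Repeat the border values once on every side: the 7 x 7 array becomes 9 x 9.
padded = np.pad(a, pad_width=1, mode="edge")

# After padding, a[0, 0] sits at padded[1, 1], so its 3 x 3 window is
# padded[0:3, 0:3]. The corner element now has nine neighbours too.
corner_mean = padded[0:3, 0:3].mean()
print(padded.shape, corner_mean)
```

Whether repeating border values is appropriate depends on the analysis; excluding the edges, as done in the rest of this article, avoids that decision.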
Create a NumPy array
To implement some simple examples, let's create the array shown above. First, import numpy.
    import numpy as np
Then use arange to create a 7 × 7 array with values ranging from 0 to 48. In addition, create a second array of the same shape, filled with a no-data value, to hold the results. In this case, I use -1 as the no-data value.
    a = np.arange(49).reshape((7, 7))
    b = np.full(a.shape, -1.0)
We will use these arrays to develop the following sliding window example.
Implementing a sliding window with a loop
You have no doubt heard that loops in Python are slow and should be avoided where possible, especially when working with large NumPy arrays. That is true. However, we will first look at an example that uses loops, because it is an easy way to conceptualize what happens in a moving-window operation. Once the concept is clear from the loop example, we will move on to a more efficient vectorized method.
To move the window, simply loop through all the internal array elements, identify the values of all adjacent elements, and use these values in specific calculations.
Adjacent values can be easily identified by row and column offsets. The offset of the 3 × 3 window is shown below.
[Figure: row offsets and column offsets of the 3 × 3 window]
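The row and column offsets can also be written down explicitly in code. The following is a minimal sketch under that idea, not the article's own implementation: the offsets list and the neighbours variable are illustrative names, and the loop simply averages whatever the offsets select.

```python
import itertools
import numpy as np

a = np.arange(49).reshape((7, 7))
b = np.full(a.shape, -1.0)

# Row and column offsets of a 3 x 3 window relative to its centre element:
# (-1, -1), (-1, 0), (-1, 1), (0, -1), ..., (1, 1).
offsets = list(itertools.product((-1, 0, 1), repeat=2))

for i in range(1, a.shape[0] - 1):
    for j in range(1, a.shape[1] - 1):
        # Collect the nine values (centre included) selected by the offsets.
        neighbours = [a[i + di, j + dj] for di, dj in offsets]
        b[i, j] = sum(neighbours) / len(neighbours)
```

Listing the offsets once makes it easy to switch to a different window shape by changing only the offsets list.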
Python code for a NumPy moving window in a loop
We can implement a moving window in three lines of code. This example calculates the average within the sliding window. First, loop through the interior rows of the array. Second, loop through the interior columns of the array. Third, calculate the average of the values in the sliding window and assign it to the corresponding element of the output array.
    for i in range(1, a.shape[0] - 1):
        for j in range(1, a.shape[1] - 1):
            b[i, j] = (a[i-1, j-1] + a[i-1, j] + a[i-1, j+1]
                       + a[i, j-1] + a[i, j] + a[i, j+1]
                       + a[i+1, j-1] + a[i+1, j] + a[i+1, j+1]) / 9.0
Results after the loop
You will notice that the interior of the result has the same values as the input array (the input increases linearly, so the mean of each window equals its center element). The outer elements keep the no-data value because they do not have nine neighboring elements.
    [[-1. -1. -1. -1. -1. -1. -1.]
     [-1.  8.  9. 10. 11. 12. -1.]
     [-1. 15. 16. 17. 18. 19. -1.]
     [-1. 22. 23. 24. 25. 26. -1.]
     [-1. 29. 30. 31. 32. 33. -1.]
     [-1. 36. 37. 38. 39. 40. -1.]
     [-1. -1. -1. -1. -1. -1. -1.]]
Vectorized sliding window
Looping over arrays in Python is generally inefficient. Efficiency can be improved by vectorizing operations that would otherwise be performed in a loop. A moving window can be vectorized by offsetting all the interior elements of the array at the same time.
This is shown in the following figures. Each image has a corresponding index expression. You will notice that the last image indexes all the interior elements, while each of the other images indexes the interior shifted toward one of the neighboring elements.
Offset indexes from left to right: [1:-1, :-2], [1:-1, 2:], [2:, 1:-1]
Offset indexes from left to right: [2:, :-2], [2:, 2:], [:-2, 1:-1]
Offset indexes from left to right: [:-2, 2:], [:-2, :-2], [1:-1, 1:-1]
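The nine offset slices can also be generated in code rather than typed out by hand. The following is a minimal sketch of that idea; the function name window_mean_3x3 and the shifts tuple are illustrative and not part of the original article.

```python
import numpy as np

def window_mean_3x3(a):
    """Mean of each 3 x 3 neighbourhood; edge elements are left at -1.0."""
    out = np.full(a.shape, -1.0)
    # Each slice selects the interior shifted by one offset: -1, 0 or +1.
    shifts = (slice(None, -2), slice(1, -1), slice(2, None))
    total = np.zeros((a.shape[0] - 2, a.shape[1] - 2))
    for rows in shifts:
        for cols in shifts:
            # Adds one shifted copy of the interior; nine additions in total.
            total += a[rows, cols]
    out[1:-1, 1:-1] = total / 9.0
    return out

a = np.arange(49).reshape((7, 7))
b = window_mean_3x3(a)
```

The two short loops run only nine times regardless of array size, so the per-element work still happens inside NumPy.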
Python code for a vectorized moving window on a NumPy array
With the offsets above, we can now implement a sliding window in a single statement. Simply set all the interior elements of the output array to an expression that calculates the desired output from the adjacent elements.
    b[1:-1, 1:-1] = (a[1:-1, 1:-1] + a[:-2, 1:-1] + a[2:, 1:-1]
                     + a[1:-1, :-2] + a[1:-1, 2:] + a[2:, 2:]
                     + a[2:, :-2] + a[:-2, :-2] + a[:-2, 2:]) / 9.0
Vectorized sliding window result
As you can see, this will get the same result as the loop.
    [[-1. -1. -1. -1. -1. -1. -1.]
     [-1.  8.  9. 10. 11. 12. -1.]
     [-1. 15. 16. 17. 18. 19. -1.]
     [-1. 22. 23. 24. 25. 26. -1.]
     [-1. 29. 30. 31. 32. 33. -1.]
     [-1. 36. 37. 38. 39. 40. -1.]
     [-1. -1. -1. -1. -1. -1. -1.]]
Speed comparison
The two methods produce the same results, but which is more efficient? I timed each method on square arrays ranging from 5 to 100 rows and columns. Each method was run 100 times for each array size. Here is the average time for each method.
Clearly, the vectorized method is more efficient. As the array grows, the loop's run time climbs steeply, because it executes Python-level code once per element. Also note that an array of 10,000 elements (100 rows and 100 columns) is very small.
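The timing experiment can be reproduced roughly as follows. This is a sketch, not the author's benchmark: the function names, the single 100 × 100 array, and the repeat count are assumptions, and absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

def loop_mean(a):
    b = np.full(a.shape, -1.0)
    for i in range(1, a.shape[0] - 1):
        for j in range(1, a.shape[1] - 1):
            # Sum of the 3 x 3 block around (i, j), i.e. the nine neighbours.
            b[i, j] = a[i - 1:i + 2, j - 1:j + 2].sum() / 9.0
    return b

def vector_mean(a):
    b = np.full(a.shape, -1.0)
    b[1:-1, 1:-1] = (a[:-2, :-2] + a[:-2, 1:-1] + a[:-2, 2:]
                     + a[1:-1, :-2] + a[1:-1, 1:-1] + a[1:-1, 2:]
                     + a[2:, :-2] + a[2:, 1:-1] + a[2:, 2:]) / 9.0
    return b

a = np.arange(100 * 100).reshape((100, 100))

# Check that both methods agree before comparing their speed.
same = np.allclose(loop_mean(a), vector_mean(a))

repeats = 3  # small, so the sketch finishes quickly
loop_time = timeit.timeit(lambda: loop_mean(a), number=repeats) / repeats
vector_time = timeit.timeit(lambda: vector_mean(a), number=repeats) / repeats

print(f"same result: {same}")
print(f"loop: {loop_time:.6f} s per call, vectorized: {vector_time:.6f} s per call")
```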
Summary
Moving-window calculations are very common in data analysis workflows. They are useful and easy to implement. However, implementing sliding-window operations with loops is very inefficient.
The vectorized moving-window implementation is not only more efficient but also uses fewer lines of code. Once you have mastered the vectorized approach to sliding windows, you can speed up your workflows with little effort.
Addendum: Python learning notes: sliding windows over NumPy arrays, implemented with as_strided
Why implement sliding windows in NumPy?
In quantitative investment analysis, analyzing historical data is an indispensable step, and sliding windows are central to it. For example, the moving average, the exponential moving average, MACD, DMA, and other price indicators all rely on a sliding window.
pandas, a very popular data analysis tool, provides a dedicated sliding-window method: DataFrame.rolling(). With it, implementing a moving average and similar algorithms is very easy. In some cases, however, pandas is not fast enough and we need the efficiency of NumPy to go further, so we need to implement the sliding window in NumPy.
Sliding windows in NumPy
Unfortunately, NumPy has not traditionally provided a simple, direct sliding-window method (releases from 1.20 onward add numpy.lib.stride_tricks.sliding_window_view). If you build the windows with a for loop, it is slow and, because every window is copied, the memory footprint is very large. NumPy does, however, provide a very low-level function that can be used to generate sliding windows: numpy.lib.stride_tricks.as_strided.
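For comparison with the low-level approach described next, here is a minimal sketch of a moving average built on numpy.lib.stride_tricks.sliding_window_view. It assumes NumPy 1.20 or later, and the prices array is made-up illustrative data rather than anything from the original notes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])  # hypothetical data

# Each row of `windows` is a read-only view of 3 consecutive prices;
# no data is copied when the windows are created.
windows = sliding_window_view(prices, window_shape=3)

# Averaging along the window axis gives the 3-period moving average.
moving_average = windows.mean(axis=1)
print(windows.shape)     # (4, 3)
print(moving_average)    # [11. 12. 13. 14.]
```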
Implementing a sliding window with as_strided
For example, first generate a two-dimensional array of 5000 rows and 200 columns. We want a sliding window 200 rows high over this array: the first window contains rows 0 to 199, the second rows 1 to 200, the third rows 2 to 201, and so on, for a total of 4801 windows:
    In: d = np.random.randint(100, size=(5000, 200))
To generate these sliding windows with the as_strided function, you need the following code. It produces a three-dimensional array containing 4801 matrices of 200 × 200, each of which is one sliding window:
    In: from numpy.lib.stride_tricks import as_strided
    In: %timeit sd = as_strided(d, (4801, 200, 200), (200*8, 200*8, 8))
    5.97 µs ± 33.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Now let's generate the same sliding windows with a for loop, to verify that the windows generated above are correct:
    In [108]: %%timeit
         ...: sd2 = np.zeros((4801, 200, 200))
         ...: for i in range(4801):
         ...:     sd2[i] = d[i:i+200]
         ...:
    722 ms ± 98.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    In [109]: np.allclose(sd, sd2)
    Out[109]: True
As the timings above show, using as_strided to generate the sliding windows is more than 100,000 times faster than the for loop! So how does as_strided do it?
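Part of the speed difference comes from the fact that as_strided only creates a view, while the loop copies every window. The sketch below checks this on a smaller array so that it runs quickly. The array size, the variable names, and the explicit int64 dtype are assumptions made for the illustration; the strides are taken from d.strides instead of being typed in by hand.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Smaller than the 5000 x 200 example; int64 is forced so items are 8 bytes.
d = np.random.randint(100, size=(50, 20)).astype(np.int64)
window = 20
n_windows = d.shape[0] - window + 1  # 31

row_bytes, item_bytes = d.strides    # (160, 8) for this C-contiguous array
sd = as_strided(d, shape=(n_windows, window, d.shape[1]),
                strides=(row_bytes, row_bytes, item_bytes),
                writeable=False)

# The loop version copies each window into a new array.
sd2 = np.zeros((n_windows, window, d.shape[1]))
for i in range(n_windows):
    sd2[i] = d[i:i + window]

print(np.allclose(sd, sd2))        # True: same values
print(np.shares_memory(sd, d))     # True: sd is a view onto d's buffer
print(np.shares_memory(sd2, d))    # False: sd2 is an independent copy
```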
Detailed Analysis of as_strided function
What is going on inside as_strided? Look at its docstring:
Signature: as_strided(x, shape=None, strides=None, subok=False, writeable=True)
Docstring:
Create a view into the array with the given shape and strides.

.. warning:: This function has to be used with extreme care, see notes.

Parameters
----------
x : ndarray
    Array to create a new.
shape : sequence of int, optional
    The shape of the new array. Defaults to ``x.shape``.
strides : sequence of int, optional
    The strides of the new array. Defaults to ``x.strides``.
subok : bool, optional
    If True, subclasses are preserved.
writeable : bool, optional
    If set to False, the returned array will always be readonly. Otherwise it will be writable if the original array was. It is advisable to set this to False if possible (see Notes).

Returns
-------
view : ndarray
The first argument the function accepts is an array, the second is the shape of the output, and the third is the strides. Both shape and strides are essential for controlling the output.
The meaning of shape is simple: it is the size of the output along each dimension (for example the number of groups, rows, and columns). It is a tuple whose length equals the number of dimensions of the output array.
The meaning of strides is more involved. A stride is a step size: the number of bytes to move in memory to get from one element to the next along a given dimension.
Because an array is stored in memory as a one-dimensional, linear sequence, accessing a particular element requires knowing how far to move in memory to reach it. An ndarray records these step sizes in its strides.
In the as_strided function, strides is also a tuple. It must have the same number of elements as shape, and each element gives the distance in memory, in bytes, between consecutive items along that dimension.
For example:
    In [188]: d = np.random.randint(10, size=(5, 3))
    In [189]: d
    Out[189]:
    array([[4, 4, 6],
           [2, 9, 3],
           [5, 1, 1],
           [2, 0, 0],
           [9, 2, 3]])
In memory, the 15 values are laid out one after another (addresses counted in elements, in hexadecimal):
    address  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E
    value    4  4  6  2  9  3  5  1  1  2  0  0  9  2  3
We see a two-dimensional array because the NumPy array has shape (5, 3) and strides (24, 8). The shape (5, 3) means the data has five rows and three columns. The strides mean that each row starts 24 bytes after the previous row (three numbers, since each integer in this session occupies 8 bytes) and each column starts 8 bytes after the previous column.
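The relationship between index, strides, and memory position can be checked directly. The sketch below rebuilds the example array with an explicit int64 dtype (an assumption, so that each item is 8 bytes on every platform); byte_offset is a hypothetical helper written for this illustration.

```python
import numpy as np

d = np.array([[4, 4, 6],
              [2, 9, 3],
              [5, 1, 1],
              [2, 0, 0],
              [9, 2, 3]], dtype=np.int64)

print(d.shape, d.strides)  # (5, 3) (24, 8)

def byte_offset(arr, i, j):
    """Bytes from the start of the buffer to element [i, j] of a 2-D array."""
    return i * arr.strides[0] + j * arr.strides[1]

offset = byte_offset(d, 2, 1)       # 2 * 24 + 1 * 8 = 56 bytes
flat_index = offset // d.itemsize   # 56 / 8 = 7, i.e. address 7 in the table
print(offset, flat_index, d.ravel()[flat_index], d[2, 1])  # 56 7 1 1
```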
With this in mind, it becomes clear how to generate a sliding window over the data. If we want a 2 × 3 window that slides down d, we can generate a view made of 4 groups of 2 rows and 3 columns. The first group covers rows 0 and 1 of d, the second rows 1 and 2, the third rows 2 and 3, and the fourth rows 3 and 4. This produces the effect of a sliding window: iterating over the new view visits every window position. The advantage is that no data needs to be moved or copied during the iteration, so it is very fast.
Following this idea, we need a new view with shape (4, 2, 3): 4 groups (the window takes 4 positions from top to bottom), each with 2 rows and 3 columns (the window size).
Next we need to determine the strides. As mentioned earlier, the strides are also a tuple, here of three elements. The first element is the memory distance between two consecutive groups: because the window slides down one row at a time, the group stride is three numbers, that is, 24 bytes. The row and column strides are the same as in the original array, because within each window we still read the numbers in their original order. The new strides are therefore (24, 24, 8).
Let's take a look at what this new data view looks like:
    In: as_strided(d, shape=(4, 2, 3), strides=(24, 24, 8))
    Out:
    array([[[4, 4, 6],
            [2, 9, 3]],
           [[2, 9, 3],
            [5, 1, 1]],
           [[5, 1, 1],
            [2, 0, 0]],
           [[2, 0, 0],
            [9, 2, 3]]])
Look! A data slide window appears correctly!
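The shape and strides worked out above can be derived from the array itself instead of being typed in as literal numbers. The following is a minimal sketch under that idea; rolling_rows is a hypothetical helper name, and the explicit int64 dtype is an assumption so that the example matches the 8-byte items discussed in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_rows(arr, window):
    """Read-only view of every run of `window` consecutive rows of a 2-D array."""
    if arr.ndim != 2 or not 1 <= window <= arr.shape[0]:
        raise ValueError("window must be between 1 and the number of rows")
    n_rows, n_cols = arr.shape
    row_stride, col_stride = arr.strides
    # Moving to the next window means moving down one row, so the
    # group stride equals the row stride.
    return as_strided(arr,
                      shape=(n_rows - window + 1, window, n_cols),
                      strides=(row_stride, row_stride, col_stride),
                      writeable=False)

d = np.array([[4, 4, 6],
              [2, 9, 3],
              [5, 1, 1],
              [2, 0, 0],
              [9, 2, 3]], dtype=np.int64)

windows = rolling_rows(d, 2)
print(windows.shape)  # (4, 2, 3)
print(windows[1])     # rows 1 and 2 of d
```

Computing the number of windows from the array's own shape keeps the view inside the original buffer, and reading the strides from arr.strides avoids hard-coding the item size.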
The dangers of using the as_strided function
The biggest problem with the as_strided function is the risk of reading memory you should not. When as_strided creates a new view, it works directly with memory addresses (much like pointer arithmetic in C) and does not check whether an address is out of bounds, so a careless call will read memory that does not belong to the array. Worse, unless you pass writeable=False, you can modify that memory directly through the view, which is very risky. Understanding this risk is critical to using the function correctly!
For example, using the following stride will directly overflow to another unknown memory address, read its value, and even modify it directly:
    In [194]: as_strided(d, shape=(5, 2, 3), strides=(24, 24, 8))
    Out[194]:
    array([[[4, 4, 6],
            [2, 9, 3]],
           [[2, 9, 3],
            [5, 1, 1]],
           [[5, 1, 1],
            [2, 0, 0]],
           [[2, 0, 0],
            [9, 2, 3]],
           [[9, 2, 3],
            [2251799813685248, 18963, 0]]])
Here the last row of the fifth group maps to three memory locations outside the array. If you inadvertently modify their contents, the consequences are unpredictable, up to and including a program crash.
For this reason, the official documentation warns that as_strided should be avoided whenever possible.
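When as_strided is used anyway, two simple precautions reduce the risk: compute the number of windows from the array's shape, and pass writeable=False. The sketch below illustrates both; the array and variable names are made up for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

d = np.arange(15, dtype=np.int64).reshape(5, 3)

window = 2
# 5 rows allow only 4 window positions; a fifth would leave the buffer.
n_windows = d.shape[0] - window + 1

safe = as_strided(d, shape=(n_windows, window, d.shape[1]),
                  strides=(d.strides[0], d.strides[0], d.strides[1]),
                  writeable=False)

# A read-only view rejects writes instead of silently changing memory.
try:
    safe[0, 0, 0] = 99
    write_blocked = False
except ValueError:
    write_blocked = True

print(write_blocked)  # True
```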