In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)05/31 Report--
In this article Xiaobian detailed introduction of "Python search algorithm how to achieve", the content is detailed, the steps are clear, the details are handled properly, I hope this "Python search algorithm how to achieve" article can help you solve doubts, following the editor's ideas slowly in-depth, together to learn new knowledge.
The search algorithm is used to retrieve whether there is a given data (keyword) in the sequence data (group). The commonly used search algorithms are:
Linear lookup: linear lookup, also known as sequential lookup, is used to find in an unordered sequence.
Binary search: binary search is also called half-and-half search, and its algorithm is used for ordered sequences.
Interpolation search: interpolation search is an improvement of binary search algorithm.
Block lookup: also known as index order lookup, it is an improved version of linear lookup.
Tree table lookup: tree table lookup can be divided into binary search tree and balanced binary tree search.
Hash lookup: hash lookup can find the required data directly through keywords.
Because tree table lookup and hash lookup require more space, they will not be explained in this article. This paper will introduce in detail the search algorithms except tree table and hash, analyze the advantages and disadvantages of each algorithm, and put forward the corresponding optimization scheme.
1. Linear search
Linear search is also called sequential search. Linear search belongs to primitive, exhaustive and violent search algorithm. Easy to understand and easy to code and implement. However, when there is a large amount of data, because the idea of the algorithm is simple and exhaustive, there is not much optimization design in the algorithm, so the performance will be very low.
Linear search idea:
Scan each data in the original list from beginning to end and compare it with a given keyword.
If the comparison is equal, the search is successful.
When the scan ends and no data equal to the given keyword is found, the lookup failure is declared.
According to the description of the linear lookup algorithm, it is easy to code:
'' Linear search algorithm parameter: nums: sequence key: keyword return value: keyword position in the sequence if not Return-1'''def line_find (nums, key): for i in range (len (nums)): if nums [I] = = key: return i return-1 testing linear algorithm''if _ _ name__ = = "_ main__": nums = [4, 1, 8, 10, 3 5] key = int (input ("Please enter the keyword to find:") pos = line_find (nums, key) print ("keyword {0} in the {1} position of the sequence" .format (key, pos))''output result: please enter the keyword you want to find: 3 keyword 3 in the 4 position of the sequence''
Analysis of average time complexity of linear search algorithm.
1. Best luck: if the keyword you are looking for happens to be in the first position of the series, you only need to look for it once.
For example, look for the keyword 4 in the series = [4, 1, 8, 10, 3, 5].
You only need to look for it once.
two。 Worst-luck scenario: the keyword is not found until it is scanned to the end of the sequence.
For example, look for the existence of keyword 5 in the series = [4, 1, 8, 10, 3, 5].
Then the number of times you need to find it is equal to the length of the sequence, which is 6 times here.
3. Bad luck: if the keyword you are looking for is somewhere in the middle of the sequence, the probability of finding it is 1.
N is the length of the sequence.
The average number of linear lookups should be = (1cm n) / 2. Replace it with a big O that means the rule is O (n).
The constant is ignored in the big O representation.
The worst case of linear lookup is that after scanning the entire sequence, there are no keywords to look for.
For example, look for the existence of keyword 12 in the series = [4, 1, 8, 10, 3, 5].
After scanning 6 times, I failed!
Improved linear search algorithm
The linear search algorithm can be optimized accordingly. Such as setting up an "outpost". The so-called "outpost" is to insert the keyword you are looking for into the end of the sequence before looking for it.
Def line_find_ (nums, key): I = 0 while nums [I]! = key: I + = 1 return-1 if I = = len (nums)-1 else iTunes' Test Linear algorithm'if _ _ name__ = = "_ _ main__": nums = [4, 1, 8, 10, 3, 5] key = int (input ("Please enter the keyword you want to find:") # before finding First store the keyword at the end of the column nums.append (key) pos = line_find_ (nums, key) print ("keyword {0} in the {1} position of the sequence" .format (key, pos))
The time complexity of the linear search algorithm optimized by "outpost" does not change, O (n). Or from the point of view of the two codes, there is not much change.
However, from the point of view of the actual operation of the code, the second scheme not only reduces the number of if instructions, but also reduces the number of compiled instructions, which reduces the number of instructions executed by CPU. This optimization belongs to micro-optimization, not the essential optimization of the algorithm.
The code written in computer programming language is pseudo-instruction code.
The compiled instruction code is called the CPU instruction set.
One optimization solution is to reduce the compiled instruction set.
two。 Binary search
Binary search belongs to ordered search, which means that the sequence to be searched must be ordered. For example, if you look for the existence of the keyword 4 in the sequence = [4, 1, 8, 8, 10, 3, 5, 12], the binary search cannot be used because the sequence is not ordered. If you want to use the binary search algorithm, you need to sort the sequence first.
Binary search uses the idea of binary (half) algorithm, in which there are two key information that need to be obtained at any time:
One is the middle position mid_pos of the sequence.
One is the median value of the sequence, mid_val.
Now we can understand the flow of the binary search algorithm by looking for the keyword 8 in the sequence nums= [1, 3, 4, 5, 8, 10, 12].
Before doing a binary search, define 2 location (pointer) variables:
The left pointer l_idx initially points to the leftmost number of the sequence.
The right pointer r_idx initially points to the rightmost number of the sequence.
Step 1: calculate the middle position mid_pos=3 of the series from the current position of the left and right pointers, and find out that the value mid_val= nums [mid _ pos] corresponding to the middle position of the series is 5 according to the value of mid_pos.
The core of the binary search algorithm is to find out the value in the middle of the sequence.
Step 2: compare the value in the middle of the sequence with the given keyword. Here the keyword is 8, the value in the middle position is 5, obviously 8 is greater than 5, because the sequence is ordered, you will naturally think that there is no need to compare with the number before 5 in the sequence, but to concentrate on comparing with the number after 5.
The range of sequences searched again after a comparison has been halved. This is the origin of the dichotomy algorithm.
Step 3: according to the comparison result, adjust the size of the series, the size adjustment here is not physical structural adjustment, but logical adjustment, the original series does not change after the adjustment. That is to change the size of the sequence logically by changing the position of the left or right pointer. The adjusted series is shown in the following figure.
The range of sequences in the binary search algorithm is determined by the length of the left pointer to the right pointer.
Step 4: repeat the above steps until they are found or cannot be found.
Implementation of binary search algorithm by coding
'' binary search algorithm''def binary_find (nums Key): # initial left pointer l_idx = 0 # initial pointer r_ldx = len (nums)-1 while l_idx key: # indicates that the search range should be reduced to the left of the original number r_ldx = mid_pos-1 else: l_idx = mid_pos + 1 # exit 2: not found Given keyword return-1 binary 'test binary search' if _ _ name__ = = "_ _ main__": nums = [1 3,4,5,8,10,12] key = 3 pos = binary_find (nums, key) print (pos)
Through the previous analysis of the flow of the binary algorithm, we can know that the subproblem of the binary search is the same logic as the original problem, so we can use recursive implementation:
Recursive implementation of binary search def binary_find_dg (nums, key, l_idx R_ldx): if l_idx > r_ldx: # exit 1: did not find the given keyword return-1 # calculate the intermediate position mid_pos = (r_ldx + l_idx) / / 2 # calculate the value of the intermediate position mid_val = nums [mid _ pos] # compare with the keyword if mid_val = = key: # exit 2: compare equal With this keyword Return the keyword location return mid_pos elif mid_val > key: # indicates that the search scope should be reduced to the left of the original number r_ldx = mid_pos-1 else: l_idx = mid_pos + 1 return binary_find_dg (nums, key, l_idx, r_ldx)'if _ _ name__ = = "_ main__": nums = [1,3 4,5,8,10,12] key = 8 pos = binary_find_dg (nums, key,0,len (nums)-1) print (pos)
Binary search performance analysis:
It is more intuitive to describe the process of binary search with a tree structure. When the search is finished, it is drawn that the tree structure is a binary tree.
1. As in the above code execution process, first find the intermediate number 5 in the sequence, and then use 5 as the root node to build a unique node tree.
2.5 is compared with keyword 8, and then the middle number 10 is found in the right column with the number 5 as the dividing line, and the tree structure becomes as shown in the following figure.
3.10 compare with keyword 8, and then look on the left side of 10.
After finding 8, it means that the binary search has found the result, and it only takes 3 times to find the final result.
It can be concluded intuitively from the structure of the binary tree that the number of binary search keywords is determined by the depth of the keyword in the binary tree structure.
4. The above is to find the given number 8, in order to find any number in the series, the final complete tree structure should be shown in the following figure.
Obviously, the tree structure is a standard binary tree. As can be seen from the tree structure, no matter looking for any number, the minimum is once, such as looking for the number 5, it only takes 3 times at most, which is much faster than linear search.
According to the characteristics of the binary tree, the depth of the tree with n nodes is h=log2, so the time complexity of the binary search algorithm is O (logn), which is the time degree of logarithm level.
When binary search is carried out on a sequence with a length of 1000, the number of times required is only 10 times at most, and the efficiency of the binary search algorithm is obviously efficient.
However, binary search needs to sort the sequence in advance, and the previous time complexity does not take into account the sorting time. Therefore, binary search is generally suitable for ordered sequences with stable changes in numbers.
3. Interpolation search
The essence of interpolation search is binary search. Interpolation search improves the calculation logic of finding the middle position in the binary search algorithm.
The logic of calculating the intermediate position in the original binary search algorithm: the intermediate position is equal to the left pointer position plus the right pointer position and divided by 2.
# calculate the intermediate position mid_pos = (r_ldx + l_idx) / / 2
The intermediate position logic calculated by the interpolation algorithm is as follows:
Key is the keyword you are looking for!
# calculate the intermediate position mid_pos = l_idx + (key-nums [l _ idx]) / / (nums [r _ idx]-nums [l _ idx]) * (r_idx-l_idx) in the interpolation algorithm
Coding to achieve interpolation lookup:
# interpolation search is based on dichotomy, but the mid calculation method is different def binary_search (nums, key): l_idx = 0 r_idx = len (nums)-1 old_mid =-1 mid_pos = None while l_idx
< r_idx and nums[0] = key and old_mid != mid_pos: # 中间位置计算 mid_pos = l_idx + (key - nums[l_idx]) // (nums[r_idx] - nums[l_idx]) * (r_idx - l_idx) old_mid = mid_pos if nums[mid_pos] == key: return "index is {}, target value is {}".format(mid_pos, nums[mid_pos]) # 此时目标值在中间值右边,更新左边界位置 elif nums[mid_pos] < key: l_idx = mid_pos + 1 # 此时目标值在中间值左边,更新右边界位置 elif nums[mid_pos] >Key: r_idx = mid_pos-1 return "Not find" li = [1,3,4,5,8,10,12] print (binary_search (li, 6))
When calculating the intermediate position of the interpolation algorithm, it is possible to calculate the intermediate position for many times and the result is the same. At this time, it can be considered that the search failed.
The performance of interpolation algorithm is between linear search and binary search.
When there are more numbers in the series and the distribution is more uniform, the average performance of the interpolation search algorithm is much better than that of half search. If the data distribution in the series is very uneven, the interpolation algorithm is not the best choice in this case.
4. Block search
Block lookup is similar to an index query in a database, so block lookup is also called index lookup. The core of the algorithm is linear search.
The existing original sequence nums= [5, 1, 9, 9, 11, 23, 16, 12, 18, 24, 32, 29, 25] needs to find out whether the keyword 11 exists.
Step 1: before using a chunked lookup, divide the original sequence into multiple blocks by region. As for the number of pieces, it can be defined according to the actual situation. When partitioning, there is a requirement that the maximum value in the previous block must be less than the minimum value of the latter block.
The interior of the block is out of order, but keep the entire sequence ordered by block.
Block search requires that the original sequence has an ascending or descending order trend as a whole. If the distribution of the sequence does not have a tendency, if you still want to use block search, you need to adjust it in an orderly manner.
Step 2: establish the index table based on the block information. The index table should have at least two fields, the maximum number in each block, and the starting address of each block. It is obvious that the numbers in the index table are ordered.
Step 3: when looking for a given keyword, look for the index table first, and the query keyword should be in that block. For example, the query keyword 29 should be in the third block, and then enter the third block series according to the address information of the third block provided in the index table, and find the specific location of 29 according to the linear matching algorithm.
Coding to achieve block search:
First, code to create an index table according to the number of blocks. Here, a two-dimensional list is used to save the information in the index table.
'' chunking: establishing index table parameters: nums raw sequence blocks block size''def create_index_table (nums, blocks): # Index table using list to save index_table = [] # number of each block n = len (nums) / / blocks for i in range (0, len (nums)) N): # each row of records in the index table tmp_lst = [] # maximum tmp_lst.append (max (Nums [I: I + nmur1])) # start address tmp_lst.append (I) # end address tmp_lst.append (I + nMel 1) # add to the index Index_table.append (tmp_lst) return index_table''' test block''nums = [5] in the citation table 1, 9, 11, 23, 16, 12, 18, 24, 32, 29, 25] it = create_index_table (nums, 3) print (it)''output result: [[11,0,3], [23,4,7], [32,8,11]]''
After the code is executed, the output is the same as the result of the analysis.
The above code only blocks the sequence in which the overall trend is ordered. If the whole is not orderly, it is necessary to provide a corresponding block sorting scheme, which can be done by those who are interested.
The above code is only to illustrate the block search algorithm.
The complete code for the chunk search:
'' chunking: establishing index table parameters: nums raw sequence blocks block size''def create_index_table (nums, blocks): # Index table using list to save index_table = [] # number of each block n = len (nums) / / blocks for i in range (0, len (nums)) N): tmp_lst = [] tmp_lst.append (max (Nums [I: I + n-1])) tmp_lst.append (I) tmp_lst.append (I + n-1) index_table.append (tmp_lst) return index_table''' uses a linear search algorithm to find 'def lind_find (nums, start, end): for i in range (start) in the corresponding block End): if key = = nums [I]: return i break return-1 blocks' test chunks''nums = [5, 1, 9, 11, 23, 16, 12, 18, 24, 32, 29, 25] key = index table it = create_index_table (nums 3) # record number of the index table pos =-query for n in range (len (it)-1) in the index table: # is if key in the first block
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.