
How to solve the TopK problem by heap sorting

2025-02-28 Update From: SLTechnology News&Howtos, Shulou (Shulou.com)

This article explains how heap sort can be used to solve the TopK problem. The method is simple, fast, and practical.

Find the K-th largest element in an unsorted array. Note that this is the K-th largest element in sorted order, not the K-th distinct element.

Example 1: input: [3, 2, 1, 5, 5, 6, 4] and k = 2, output: 5.

Example 2: input: [3, 2, 3, 1, 2, 4, 5, 5, 6] and k = 4, output: 4.

Classic TopK variants also include: the K largest (or smallest) numbers, the top K most frequent elements, and the K-th largest (or smallest) element.

The TopK problem is essentially a sorting problem. There are ten classic sorting algorithms, and many more that are not covered here.

Why is heap sort the best answer to the TopK problem? In terms of time and space, quick sort is usually considered the best general-purpose sorting algorithm, but applying it to 10 billion elements means sorting all of them from largest to smallest and only then outputting the first K values.
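As a point of comparison, the "sort everything, then take the first K" approach can be sketched as follows. This is only a baseline illustration: the function name is made up for this example, and Python's built-in `sorted` uses Timsort rather than quick sort, but the cost profile is the same (the whole input is sorted even when K is tiny).

```python
from typing import List


def kth_largest_by_sorting(nums: List[int], k: int) -> int:
    """Return the k-th largest element by fully sorting the input.

    Hypothetical helper for illustration only.
    """
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    # sorted() builds a new list, so this needs O(n) extra memory
    # and O(n log n) time no matter how small k is.
    ordered = sorted(nums, reverse=True)
    return ordered[k - 1]
```

The work done does not shrink when K shrinks, which is the weakness the rest of the article addresses.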

However, whether we use quick sort or heap sort, a full sort needs all the elements in memory. 10 billion 4-byte integers take up about 40 GB, which is more than an average consumer computer has (for example, the computer I am writing this article on has 32 GB).
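The 40 GB figure can be checked with a few lines of arithmetic. The sketch below assumes 4 bytes per integer (a 32-bit int, as in C or a packed array; Python's own int objects are larger) and decimal gigabytes; the function name is hypothetical.

```python
def memory_needed_gb(count: int, bytes_per_element: int = 4) -> float:
    """Estimate raw storage in decimal gigabytes (1 GB = 10**9 bytes).

    Assumes fixed-width elements; hypothetical helper for illustration.
    """
    return count * bytes_per_element / 10**9


# 10 billion 32-bit integers, as in the article's scenario.
total_gb = memory_needed_gb(10_000_000_000)

# Compare against the 32 GB machine mentioned in the text.
fits_in_32_gb = total_gb <= 32
```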

It is well known that both quick sort and heap sort achieve O(n log n) time complexity (on average, in the case of quick sort). Quick sort, however, accesses the data sequentially, while heap sort jumps around the array: one of its most important operations, heapify, moves between a node at index i and its children at 2i+1 and 2i+2. Sequential access is friendlier to the CPU cache, so in practice quick sort usually runs faster than heap sort even though the asymptotic complexity is the same.

But a quick sort that builds new arrays for its partitions needs much more extra space than heap sort, which works in place. For a large amount of data, heap sort should be preferred.

With the built-in heapq module, finding the K-th largest element in an array takes one line of code. heapq.nlargest wraps the heap logic and returns a list of the K largest elements in descending order, so the last item of that list is the answer.

    import heapq
    from typing import List

    class Solution:
        def findKthLargest(self, nums: List[int], k: int) -> int:
            # nlargest returns the k largest values, largest first;
            # the last one is the k-th largest.
            return heapq.nlargest(k, nums)[-1]

When writing the heap by hand, the usual rule is: to find the K-th largest element, build a min heap; to find the K-th smallest element, build a max heap.

Idea: "Take the first K elements of nums to build a min heap of size K, then keep the heap at capacity K. The K nodes in the heap are the K largest elements seen so far, and the top of the heap is the smallest of them."

Then traverse the rest of the array: once the heap holds K elements, whenever an element is greater than the heap top, pop the top and push the new element, so the heap always holds the K largest elements. At the end of the traversal, the top of the heap is the K-th largest element. The time complexity is O(n log K).

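    from typing import List

    class Solution:
        def findKthLargest(self, nums: List[int], k: int) -> int:
            heapsize = len(nums)

            # Sift a[i] down so the subtree rooted at i is a max heap.
            def maxheap(a, i, length):
                l = 2 * i + 1
                r = 2 * i + 2
                large = i
                if l < length and a[l] > a[large]:
                    large = l
                if r < length and a[r] > a[large]:
                    large = r
                if large != i:
                    a[large], a[i] = a[i], a[large]
                    maxheap(a, large, length)

            # Turn the whole array into a max heap, bottom up.
            def buildheap(a, length):
                for i in range(heapsize // 2, -1, -1):
                    maxheap(a, i, length)

            buildheap(nums, heapsize)
            # Heap-sort style: move the current maximum to the end k-1 times,
            # so the k-th largest value ends up at the top of the heap.
            for i in range(heapsize - 1, heapsize - k, -1):
                nums[0], nums[i] = nums[i], nums[0]
                heapsize -= 1
                maxheap(nums, 0, heapsize)
            return nums[0]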
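The size-K min heap idea described above can also be sketched with heapq, which implements a min heap on a plain list. This is a minimal illustration, not the article's own code: the function name is hypothetical, and the input is treated as any iterable so that the full data set never has to be held in memory at once.

```python
import heapq
from typing import Iterable, List


def kth_largest_streaming(values: Iterable[int], k: int) -> int:
    """Return the k-th largest value using a min heap of at most k items.

    Hypothetical helper for illustration only.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    heap: List[int] = []  # heap[0] is the smallest of the values kept so far
    for value in values:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Pop the smallest kept value and push the new one in one step.
            heapq.heapreplace(heap, value)
    if len(heap) < k:
        raise ValueError("input has fewer than k elements")
    return heap[0]
```

Because the heap never grows beyond K entries, memory use is O(K) and each element costs at most O(log K), which is where the O(n log K) bound comes from.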

Conversely, if the K-th smallest element is wanted, a max heap is used. So for TopK problems, heap sort is the go-to solution: whenever you see "the K-th element of an array", heap sort should come to mind immediately.

If the data set is small and the requirements on time and space complexity are not strict, there is no need for a "high-end" algorithm; simply writing a quick sort is fine.

A typical TopK application is a search engine that receives a large number of search requests every day. It records the keywords users enter and analyzes them offline to get the Top 10 most popular search keywords.
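For the K-th smallest case, a rough sketch with heapq follows. heapq only provides a min heap, so this example simulates a max heap by storing negated values; that trick and the function name are choices made for this illustration, not something taken from the article.

```python
import heapq
from typing import Iterable, List


def kth_smallest_streaming(values: Iterable[int], k: int) -> int:
    """Return the k-th smallest value using a max heap of at most k items.

    The max heap is simulated by negating values in a heapq min heap.
    Hypothetical helper for illustration only.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    heap: List[int] = []  # -heap[0] is the largest of the values kept so far
    for value in values:
        if len(heap) < k:
            heapq.heappush(heap, -value)
        elif value < -heap[0]:
            # Replace the largest kept value with the smaller newcomer.
            heapq.heapreplace(heap, -value)
    if len(heap) < k:
        raise ValueError("input has fewer than k elements")
    return -heap[0]
```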
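The keyword scenario can be sketched in a few lines, assuming the recorded queries are available as an iterable of strings. The function name and the sample data are made up for this example; `collections.Counter` does the counting and `heapq.nlargest` selects the most frequent entries.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple


def top_keywords(queries: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent keywords as (keyword, count) pairs.

    Hypothetical helper for illustration only.
    """
    counts = Counter(queries)
    # Select the n entries with the highest counts without sorting them all.
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


# Made-up sample log for demonstration.
sample_log = ["heap", "sort", "heap", "topk", "heap", "sort"]
top_two = top_keywords(sample_log, n=2)
```

`Counter.most_common(n)` would give a similar result; nlargest is used here to keep the connection to the heap-based approach visible.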

At this point, you should have a deeper understanding of how heap sort solves the TopK problem. Try it out in practice.
