In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/02 Report--
How to use TopN algorithm in 1 billion integers to find the first 1000 largest number, I believe many inexperienced people are helpless, for this reason this article summarizes the causes of the problem and solutions, through this article I hope you can solve this problem.
Interview question: How to find the first 1000 largest numbers among 1 billion integers.
We know that there are many sorting algorithms:
Bubble algorithm: through the two-layer for loop, the outer loop to find the first largest element in the array placed in the penultimate position, the second loop to find the second largest element placed in the penultimate position. The loop N times can find TopN.
Disadvantages: Bubble sort inner loop requires a lot of swapping elements. Complexity is between O(n) and O(n^2).
Quick sort: Select a base element, each sort can put this base element in the correct position, the left is smaller than the base element, the right is larger than the base element so as to divide the array into two parts, divide and rule. The same is true for the TopN problem, where you select a base element and place it in the correct position by quick sorting. If the number of elements on the left is less than 1000, you continue sorting from the right of the base. If the number of elements on the left is greater than 1000, you sort from the left of the base until the base is exactly at 1000.
Disadvantages: The first sort complexity is O(n), the second sort complexity is O(n/2), the third sort complexity is O(n/4)...
File storage, divide and conquer:
Store elements smaller than the benchmark in txt1, files larger than the benchmark in txt2, and then find TopN in a form similar to Method 2.
Disadvantages: Disk reads, excessive writes.
MapReduce: Single machine memory and performance are really limited, so we can store 1 billion segments on different machines, each machine calculates its own TopN, and finally summarize.
Cons: Space for time.
Optimal solution: small top pile
Maintain an array of length N in memory, according to the nature of the heap, each node is smaller than its left and right child nodes, first take the first N numbers and build a small top heap, then compare all data with the top of the heap, if smaller than the top of the heap, discard it directly, if larger than the top of the heap, replace the top of the heap, and rebuild the heap.
The process of building a small top heap: first find the last non-leaf node, the length of the array is 6, then the last non-leaf node is: length/2-1, that is, 6/2-1=2, and then the next step is to compare the node value with its left and right node values, if the node is greater than its left\right subtree value, swap (meaning put the smallest value into the node). If the node is not a leaf, the process recurs until the node becomes a leaf.
The specific execution code is as follows:
import java.util.Random;/** * @author vincent * @time 2019-08-07 11:59 */public class TopN { public static int N = 10; //Top10 public static int LEN = 10000000; //100 million integers public static int arrs[] = new int[LEN]; public static int result[] = new int[N]; //Maintain a small topheap of length N in memory public static int len = result.length; //valid element heapSize of elements in heap
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.