How to obtain the TopK from a large amount of data

2025-03-26 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Many people who are new to the problem do not know how to obtain the TopK from a large amount of data. This article summarizes where the difficulty comes from and how to solve it, in the hope that it helps you solve the problem yourself.

First: introduction

Finding the TopK (the K largest elements of a data set) is a problem we run into often. With a small amount of data, we can simply sort all of it and then read off the K largest elements. With a large amount of data, however, a comparison-based sort costs O(N log N), and N may be as large as 1 billion, so that cost is too high. What methods can reduce the time complexity? The following sections share a few.

Second: local elimination method-- obtaining TopK with the help of "bubble sort"

Idea:

We can avoid sorting all the data and sort only part of it.

Each round of bubble sort moves the largest remaining value to the end of the unsorted part, so K rounds are enough to obtain the TopK.

Time complexity and space complexity

Time complexity: one round of bubble sort is O(N), so K rounds cost O(KN) in total.

Space complexity: O(K) if the TopK is copied into a separate array, or O(1) if the last K elements of the original array are read in place. PS: for background on bubble sort, see: https://blog.csdn.net/CSDN___LYY/article/details/81478583
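The K-round idea above can be sketched as follows. This is a minimal illustration rather than the author's code: the class name `BubbleTopK` and the method `topK` are made up for the example, and the array is copied only so that the caller's data is left untouched (bubbling in place on the original array is what gives the O(1) extra space mentioned above).

```java
import java.util.Arrays;

class BubbleTopK {
    /** Returns the k largest values of data in ascending order, using only k bubble rounds. */
    static int[] topK(int[] data, int k) {
        // Hypothetical helper: work on a copy so the input is not reordered.
        int[] a = Arrays.copyOf(data, data.length);
        int n = a.length;
        k = Math.min(k, n);
        for (int round = 0; round < k; round++) {
            // One pass bubbles the largest value of a[0 .. n-1-round] to index n-1-round.
            for (int j = 0; j < n - 1 - round; j++) {
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        }
        // After k rounds the last k positions hold the TopK.
        return Arrays.copyOfRange(a, n - k, n);
    }

    public static void main(String[] args) {
        int[] data = {2, 3, 42, 11, 34, 67, 6, 243, 8, 246};
        System.out.println(Arrays.toString(topK(data, 3))); // prints [67, 243, 246]
    }
}
```

The inner loop shrinks by one element per round because the tail is already in its final position.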

Third: local elimination method-- obtaining TopK with the help of the "heap" data structure

Idea:

Heap: there are two kinds, the max-heap ("large top heap", whose top element is no smaller than any other element) and the min-heap ("small top heap", whose top element is no larger than any other element).

We use a min-heap here; why a max-heap is not suitable is explained below.

Take the first K elements, copy them into a separate array, and build a heap from these K elements.

Then loop over the remaining data starting at index K. Whenever an element is larger than the heap top, overwrite the heap top with that element and re-adjust the array so that it is a min-heap again.

When the loop ends, the heap array of K elements is the TopK we need.

Why use a min-heap?

In a min-heap the heap top is the minimum. Each time an element is larger than the heap top, we replace the heap top with it and re-adjust, so the heap top is always the smallest of the K values kept so far. When we compare the next element with the heap top and it is not larger, it cannot belong to the TopK. With a max-heap the top would be the largest kept value, and comparing against it would not tell us whether a new element beats the smallest one.
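The reasoning above can also be seen with `java.util.PriorityQueue`, which is a min-heap under natural ordering. The sketch below is only an illustration of the comparison against the heap top, not the article's own implementation; the class name `MinHeapTopK` and its method are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class MinHeapTopK {
    /** Returns the k largest values of data in ascending order. */
    static int[] topK(int[] data, int k) {
        if (k <= 0) {
            return new int[0];
        }
        // PriorityQueue orders Integers naturally, so peek() is the smallest kept value.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int x : data) {
            if (heap.size() < k) {
                heap.offer(x);          // still filling the first k slots
            } else if (x > heap.peek()) {
                heap.poll();            // drop the smallest of the kept values
                heap.offer(x);          // keep the larger newcomer
            }
            // otherwise x is not larger than the heap top, so it cannot be in the TopK
        }
        int[] result = new int[heap.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = heap.poll();    // poll() returns values smallest first
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topK(new int[]{5, 1, 9, 3, 7}, 2))); // prints [7, 9]
    }
}
```

In the run above, 3 is rejected with a single comparison because the heap top at that point is 5.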

Time complexity and space complexity

Time complexity: building the initial heap of K elements costs O(K log K). Each of the remaining N - K iterations costs at most O(log K) for the re-adjustment, so the total is O((K + (N - K)) log K), that is, O(N log K), where K is the number of top elements wanted and N is the total amount of data.

Space complexity: O(K). Only one new array of size K is needed to store the TopK.
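To get a feel for these bounds, here is a rough operation-count estimate. The figures N = 10^9 and K = 100 are assumed purely for illustration, logarithms are taken base 2, and constant factors are ignored.

```latex
\begin{aligned}
\text{full sort:}\quad & N\log_2 N \approx 10^{9}\times 29.9 \approx 3.0\times 10^{10}\\
K \text{ bubble rounds:}\quad & KN = 100\times 10^{9} = 10^{11}\\
\text{heap of size } K\text{:}\quad & N\log_2 K \approx 10^{9}\times 6.64 \approx 6.6\times 10^{9}
\end{aligned}
```

Under these assumptions the heap does several times less work than a full sort, while the bubble-sort variant only pays off when K is smaller than about log2 N (roughly 30 here).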

Applicable environment

Suitable for a single-machine, single-core environment; on its own it does not take advantage of multiple cores.

It can also be used to obtain the TopK of each partition in a divide-and-conquer scheme, which is described below.
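One way the divide-and-conquer use could look is sketched below: take the TopK of each chunk with a size-K min-heap, then take the TopK of the combined candidates. This works because every element of the global TopK is also in the TopK of its own chunk. The class `ChunkedTopK`, its methods, and the `chunkSize` parameter are hypothetical names for this example, and the chunks are processed sequentially here; in a multi-core or multi-machine setup each chunk would be handed to a separate worker.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class ChunkedTopK {
    /** Offers x to a min-heap that keeps at most k values (the k largest seen so far). */
    static void offer(PriorityQueue<Integer> heap, int x, int k) {
        if (heap.size() < k) {
            heap.offer(x);
        } else if (k > 0 && x > heap.peek()) {
            heap.poll();
            heap.offer(x);
        }
    }

    /** TopK of data[from, to) as a min-heap; this is the per-chunk step. */
    static PriorityQueue<Integer> topKOfRange(int[] data, int from, int to, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int i = from; i < to; i++) {
            offer(heap, data[i], k);
        }
        return heap;
    }

    /** Splits data into chunks, takes each chunk's TopK, then merges the candidates. */
    static int[] topK(int[] data, int k, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        PriorityQueue<Integer> merged = new PriorityQueue<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, data.length);
            // Only the chunk's k candidates can reach the global TopK.
            for (int candidate : topKOfRange(data, start, end, k)) {
                offer(merged, candidate, k);
            }
        }
        int[] result = new int[merged.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = merged.poll(); // ascending order
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {2, 3, 42, 11, 34, 67, 6, 6, 6, 243, 8, 246, 123, 32, 3451, 23};
        System.out.println(Arrays.toString(topK(data, 3, 5))); // prints [243, 246, 3451]
    }
}
```

The merge step only ever sees K candidates per chunk, so its cost depends on the number of chunks rather than on N.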

Code implementation

The implementation below is written in Java; each step is commented so that it is easy to follow.

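    import java.util.Arrays;

    /**
     * Finds the TopK of an array with a min-heap.
     */
    public class TopKStack {
        public static void main(String[] args) {
            // Find the TopK of a large data set with the heap data structure.
            // A truly large data set is hard to come by, so test with this array first.
            int[] datas = {2, 3, 42, 11, 34, 67, 6, 6, 6, 243, 8, 246, 123, 32, 3451, 23, 6, 5, 5, 6, 6, 234, 36};
            int[] re = getTopK(datas, 10);
            System.out.println(Arrays.toString(re));
        }

        /**
         * Gets the TopK.
         * @param datas original array
         * @param num   number of elements to keep (K)
         * @return heap array holding the num largest elements
         */
        static int[] getTopK(int[] datas, int num) {
            // Array that holds the first num elements and is used to build the heap
            int[] res = new int[num];
            // Initialize the array
            for (int i = 0; i < num; i++) {
                res[i] = datas[i];
            }
            // Build the initial heap
            for (int i = (num - 1) / 2; i >= 0; i--) {
                shift(res, i);
            }
            // Traverse the remaining data to find the num largest values
            for (int i = num; i < datas.length; i++) {
                if (datas[i] > res[0]) {
                    res[0] = datas[i];
                    shift(res, 0);
                }
            }
            return res;
        }

        /**
         * Adjusts the element at index so that the array satisfies the heap structure.
         * @param datas heap array
         * @param index position to adjust from
         * @return the adjusted heap array
         */
        static int[] shift(int[] datas, int index) {
            while (true) {
                // NOTE: the source text is cut off after "int left = (index".
                // The rest of this method is a reconstruction of a standard
                // min-heap sift-down, not the author's original lines.
                int left = (index << 1) + 1;  // left child
                int right = (index << 1) + 2; // right child
                int min = index;
                if (left < datas.length && datas[left] < datas[min]) {
                    min = left;
                }
                if (right < datas.length && datas[right] < datas[min]) {
                    min = right;
                }
                if (min == index) {
                    break; // heap property holds at this node
                }
                int tmp = datas[index];
                datas[index] = datas[min];
                datas[min] = tmp;
                index = min;
            }
            return datas;
        }
    }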
