In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article is about how C language uses heap to solve Topk problem. The editor thinks it is very practical, so share it with you as a reference and follow the editor to have a look.
Preface
Will explain in detail how to use the small root heap method to solve the TopK problem, so much data to deal with
The time complexity of the algorithm only needs
TopK problem
Introduction to TopK problem: find the largest top K among N numbers (for example, find the top 10 out of 1000 numbers)
Problem-solving method
Method 1: arrange the descending order first, and the first N is the largest.
Time complexity:
Method 2 the number of HeapPop N was inserted into the large heap in turn, and the number of the first K was taken from the top of the heap each time.
Time complexity:
Suppose N is very large, N is 1 billion, there is no room for these numbers in memory, they are stored in the file. K is 100, so neither method 1 nor method 2 can be used.
How much space does it take up if there are 1 billion integers?
1G = 1024MB
1G = 1024*1024KB
1G = 1024*1024*1024Byte
It takes up 1 billion bytes! So let's take a look at method 3.
Method 3:
① uses the previous K number to build a small heap of K numbers.
The number of Nmurks left in ② is compared with the data at the top of the stack in turn. If it is larger than the data at the top of the stack, replace the data at the top of the stack and adjust it downwards.
The K number in the last heap of ③ is the maximum K number.
Time complexity:
Why do you use small heaps instead of large ones here?
The largest number of the first K must be larger than the other numbers, as long as the number coming in is larger than the top data, replace it. Because it is a small pile (the small one is at the top and the big one is at the bottom), the largest number is bound to sink below, so it is impossible for a large number to get stuck at the top of the pile, and the larger the number, the deeper it sinks. Correspondingly, if you use a large heap, there will be a large number stuck at the top of the heap, and the rest of the number is smaller than this large number, so that the other numbers can not enter, and finally only the largest one can be selected.
Code implementation and explanation
Since we haven't started talking about C++ yet, we can't use the priority queue here, so we have to write a heap to use it manually. Of course, if you are too lazy to write, here is the code for the C language implementation heap.
Heap.h
# define _ CRT_SECURE_NO_WARNINGS 1#pragma once#include # include typedef int HPDataType; typedef struct Heap {HPDataType* array; / / points to dynamically opened array int size; / / number of valid data int capacity; / / size of capacity space} HP; / * initialization of heap * / void HeapInit (HP* php); / * destruction of heap * / void HeapDestroy (HP* php) / * print of heap * / void HeapPrint (HP* php); / * determine whether the heap is empty * / bool HeapIfEmpty (HP* hp); / * insert of heap * / void HeapPush (HP* php, HPDataType x); / * check capacity * / void HeapCheckCapacity (HP* php); / * exchange function * / void Swap (HPDataType* px, HPDataType* py) / * large root heap upregulation * / void BigAdjustUp (int* arr, int child); / * small root heap upregulation * / void SmallAdjustUp (int* arr, int child); / * deletion of heap * / void HeapPop (HP* php); / * small root heap downregulation * / void SmallAdjustDown (int* arr, int n, int parent); / * big root heap downregulation * / void BigAdjustDown (int* arr, int n, int parent) / * return heap top data * / HPDataType HeapTop (HP* php); / * count the number of heaps * / int HeapSize (HP* php)
Heap.c
# define _ CRT_SECURE_NO_WARNINGS 1#include "Heap.h" / * initialization of heap * / void HeapInit (HP* php) {assert (php); php- > array = NULL; php- > size = php- > capacity = 0;} / * destruction of heap * / void HeapDestroy (HP* php) {assert (php); free (php- > array); php- > capacity = php- > size = 0 } / * print of heap * / void HeapPrint (HP* php) {for (int I = 0; I)
< php->Size; iTunes +) {printf ("% d", php- > array [I]);} printf ("\ n");} / * determine whether the heap is empty * / bool HeapIfEmpty (HP* php) {assert (php); return php- > size = = 0 / / if size is 0, the insert * / * check capacity * / void HeapCheckCapacity (HP* php) {if (php- > size = = php- > capacity) {int newCapacity = php- > capacity = 0.4: (php- > capacity * 2) of the heap is empty. / / give 4 for the first time, expand 2 times in other cases HPDataType* tmpArray = (HPDataType*) realloc (php- > array, sizeof (HPDataType) * newCapacity); / / expand the array if (tmpArray = = NULL) {/ / check realloc printf ("realloc failed!\ n"); exit (EXIT_FAILURE) } / / Update their size php- > array = tmpArray; php- > capacity = newCapacity;}} / * Exchange function * / void Swap (HPDataType* px, HPDataType* py) {HPDataType tmp = * px; * px = * py; * py = tmp } / * void BigAdjustUp (int* arr, int child) {assert (arr); / / first calculate the father's subscript int parent = (child-1) / 2 according to the formula / / worst case: child=parent ends when child is the root node (root node is always 0) while (child > 0) {if (arr [child] > arr [parent]) {/ / if the child is larger than the father (not in accordance with the nature of the heap) / / swap their value Swap (& arr [child], & arr [parent]) / / go up child = parent; parent = (child-1) / 2;} else {/ / if the child is smaller than the father (in accordance with the nature of the heap) / / jump out of the loop break } / * small root heap upregulate * / void SmallAdjustUp (int* arr, int child) {assert (arr); / / first calculate the father's subscript int parent = (child-1) / 2 according to the formula / / worst case: set to root, child=parent ends when child is root node (root node is always 0) while (child > 0) {if (arr [child])
< arr[parent]) { // 如果孩子大于父亲(不符合堆的性质) // 交换他们的值 Swap(&arr[child], &arr[parent]); // 往上走 child = parent; parent = (child - 1) / 2; } else { // 如果孩子小于父亲(符合堆的性质) // 跳出循环 break; } } }void HeapPush(HP* php, HPDataType x) { assert(php); // 检查是否需要扩容 HeapCheckCapacity(php); // 插入数据 php->Array [php-> size] = x; php- > size++; / / upward adjust [target array, start position of adjustment position (data just inserted)] SmallAdjustUp (php- > array, php- > size-1);} / * deletion of heap * / / * small root heap downgrade * / void SmallAdjustDown (int* arr, int n, int parent) {int child = parent * 2 + 1 / / defaults to the left child while (child)
< n) { // 叶子内 // 选出左右孩子中小的那一个 if (child + 1 < n && arr[child + 1] < arr[child]) { child++; } if (arr[child] < arr[parent]) { // 如果孩子小于父亲(不符合小堆的性质) // 交换它们的值 Swap(&arr[child], &arr[parent]); // 往下走 parent = child; child = parent * 2 + 1; } else { // 如果孩子大于父亲(符合小堆的性质) // 跳出循环 break; } } } /* 大根堆下调 */ void BigAdjustDown(int* arr, int n, int parent) { int child = parent * 2 + 1; // 默认为左孩子 while (child < n) { // 叶子内 // 选出左右孩子中大的那一个 if (child + 1 < n && arr[child + 1] >Arr [child]) {child++;} if (arr [child] > arr [parent]) {/ / if the child is larger than the father (not in accordance with the nature of a large pile) / / swap their values Swap (& arr [child], & arr [parent]) / / go down parent = child; child = parent * 2 + 1;} else {/ / if the child is smaller than the father (with the nature of a large number) / / jump out of the loop break;}} void HeapPop (HP* php) {assert (php) Assert (! HeapIfEmpty (php)); / delete data Swap (& php- > array [0], & php- > array [php-> size- 1]); php- > size--; / / adjust down [target array, array size, start position of adjustment position] SmallAdjustDown (php- > array, php- > size, 0);} / * return heap top data * / HPDataType HeapTop (HP* php) {assert (php) Assert (! HeapIfEmpty (php)); return php- > array [0];} / * count the number of heaps * / int HeapSize (HP* php) {assert (php); return php- > size;}
Reference code for the third method:
# define _ CRT_SECURE_NO_WARNINGS 1#include "Heap.h" / * find the largest first K of N * / void PrintTopK (int* arr, int N, int K) {HP hp; / / create heap HeapInit (& hp); / / initialize heap for (int I = 0; I)
< K; i++) { // Step1: 创建一个K个数的小堆 HeapPush(&hp, arr[i]); } for (int i = K; i < N; i++) { // Step2: 剩下的N-K个数跟堆顶的数据比较 if (arr[i] >HeapTop (& hp) {/ / replace the heap HeapPop (& hp) if the data is larger than the heap top; / / this number is indeed larger than the heap top, delete the & hp, arr [I]) / / push this number into the heap, the larger the number, the deeper the sinking / * another way to write: manually replace hp.array [0] = arr [I]; SmallAdjustDown (hp.array, hp.size, 0); * /} HeapPrint (& hp); / / print K heap HeapDestroy (& hp) / / destroy heap} / * Simulation test TopK * / void TestTopK () {int N = 1000000; int* arr = (int*) malloc (sizeof (int) * N); srand (time (0)); / / set random number seed for (size_t I = 0; I
< N; i++) { // 产生随机数,每次产生的随机数都mod100w,这样产生的数一定是小于100w的 arr[i] = rand() % 1000000; } // 再去设置10个比100w大的数 arr[5] = 1000000 + 1; arr[1231] = 1000000 + 2; arr[5355] = 1000000 + 3; arr[51] = 1000000 + 4; arr[15] = 1000000 + 5; arr[2335] = 1000000 + 6; arr[9999] = 1000000 + 7; arr[76] = 1000000 + 8; arr[423] = 1000000 + 9; arr[3144] = 1000000 + 10; PrintTopK(arr, N, 10); //测试用,所以给10个} int main(void) { TestTopK(); return 0;}运行结果 函数解读PrintTopK 解读 ① 准备好我们实现好的堆之后,我们就可以写TopK的算法了。我们创建一个 PrintTopK 函数,函数需要接收存放数据的数组、数组的大小(N)和要找前多少个(K)。 ② 首先创建一个堆,用来存放K 。按照规矩我们先把 HeapInit(初始化)和 HeapDestroy(销毁)先写好,防止自己不小心忘记销毁。 ③ 核心步骤1:创建一个K个数的小堆,我们直接用 for 循环将数组中前K个值先逐个 HeapPush (堆的插入)进去。 这里不代表最后的结果,我们只是相当于 "默认" 认为这前K个数是最大的,方便我们后续进行比较替代。经过 HeapPush (堆的插入)后,这些数据会通过 SmallAdjustDown (小堆下调接口) 对数据进行相应的调整: for (int i = 0; i < K; i++) { // Step1: 创建一个K个数的小堆 HeapPush(&hp, arr[i]);} ④ 核心步骤2:除去K,将剩下的N-K个数据进行比较。我们再利用 for 循环将数组中剩下的N-K个数据进行遍历。 这里逐个进行判断,如果该数堆顶的数据 i<K(max),我们就进行替换操作。调用 HeapPop(堆的删除)删除堆顶的数据,给 让位。之后将其 HeapPush (堆的插入)推到堆中,就完成了替换的工作。值得一提的是,我们还可以不调用 Pop 和 Push 这两个操作,手动进行替换。hp.array [ 0 ] 就表示栈顶,我们将 赋值给它,随后再手动进行 SmallAdjustDown (小堆下调操作),传入相应的值即可: for (int i = K; i < N; i++) { // Step2: 剩下的N-K个数跟堆顶的数据比较 if (arr[i] >HeapTop (& hp) {/ / if the data is larger than the heap top, replace it into the heap HeapPop (& hp); / / if this number is indeed larger than the heap top, delete the heap top HeapPush (& hp, arr [I]); / / push this number into the heap, the larger the number, the deeper the sinking / * another way of writing: manually replace hp.array [0] = arr [I] SmallAdjustDown (hp.array, hp.size, 0); * /}}
⑤ when the for traverses all the data, the largest number of the first K of the N data is retained in the small heap. At this point, we call the heap print interface to print out the data in the small heap and we are done.
TestTopK interpretation
① this is a function used to test the TopK algorithm we wrote. Set the size of N to 100w, and dynamic memory opens up a space where so much data can be stored:
Int N = 1000000 * arr = (int*) malloc (sizeof (int) * N)
② then sets the random number seed according to time, fills each element with random number, and the random number generated each time is modularized to 100w, which ensures that the random number generated must be less than 100w.
Srand (time (0)); for (size_t I = 0; I < N; iTunes +) {arr [I] = rand ()% 1000000;}
③ randomly writes a few numbers greater than 100w to facilitate testing:
/ then set 10 numbers larger than 100w, arr [5] = 1000000 + 1 arr [1231] = 1000000 + 2 arr [5355] = 1000000 + 3 arr [51] = 1000000 + 4 arr [15] = 1000000 + 5 arr [2335] = 1000000 + 6; arr [76] = 1000000 + 8ash arr [423] = 1000000 + 9 arr [3144] = 1000000 + 10
④ calls the PrintTopK function we just implemented, and after submitting the corresponding parameters, we can test it. Here, to facilitate testing, our K is set to 10:
PrintTopK (arr, N, 10); thank you for reading! This is the end of the article on "how to solve the Topk problem with C language heap". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, you can share it out for more people to see!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.