
How to Sort the Bitmap for Distinct Count

Updated 2025-01-15. From: SLTechnology News&Howtos (shulou)


This article explains how to handle sorting when a bitmap is used for Distinct Count. The method described here is simple, fast, and practical.

Big data is an IT industry term for data sets that cannot be captured, managed, and processed with conventional software tools within an acceptable period of time. Such data is massive, fast-growing, and diverse, and it needs new processing models to support better decision-making, insight, and process optimization.

1. Bitmap introduction

Bitmap is a very useful data structure. A bitmap uses a single bit to hold the Value for an element, and the element itself is the Key. Because data is stored in units of bits, the memory footprint can be greatly reduced.

In short, one bit (0 or 1) indicates whether an element is present, and the bit's position in the bitmap corresponds to the element's index.
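To make the memory saving concrete, here is a minimal C sketch. The helper names `bitmap_bytes` and `int_array_bytes` are illustrative assumptions, not part of any library. It compares the space needed to record membership for values in [0, n) with one bit each against storing n values as 32-bit integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a bitmap with one bit per value in [0, n). */
size_t bitmap_bytes(size_t n) {
    return (n + 7) / 8;   /* round up to whole bytes */
}

/* Bytes needed to store n values as 32-bit integers. */
size_t int_array_bytes(size_t n) {
    return n * sizeof(int32_t);
}
```

For n = 10,000,000, `bitmap_bytes` gives 1,250,000 bytes (about 1.25 MB), while `int_array_bytes` gives 40,000,000 bytes (40 MB). The comparison favors the bitmap when a large share of the range is actually present; for a very sparse set, an uncompressed bitmap can be the larger of the two.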

An example of sorting with bitmap:

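    /* Copyright (C) 1999 Lucent Technologies */
    /* From 'Programming Pearls' by Jon Bentley */

    /* bitsort.c -- bitmap sort from Column 1
     *   Sort distinct integers in the range [0..N-1]
     */

    #include <stdio.h>

    #define BITSPERWORD 32
    #define SHIFT 5
    #define MASK 0x1F
    #define N 10000000
    int a[1 + N/BITSPERWORD];

    /* i >> SHIFT selects the word, i & MASK selects the bit inside it */
    void set(int i)  {        a[i>>SHIFT] |=  (1<<(i & MASK)); }
    void clr(int i)  {        a[i>>SHIFT] &= ~(1<<(i & MASK)); }
    int  test(int i) { return a[i>>SHIFT] &   (1<<(i & MASK)); }

    int main(void)
    {
        int i;
        for (i = 0; i < N; i++)
            clr(i);
        while (scanf("%d", &i) != EOF)
            set(i);
        for (i = 0; i < N; i++)
            if (test(i))
                printf("%d\n", i);
        return 0;
    }

The same structure can be used for Distinct Count on a Spark RDD, here with EWAHCompressedBitmap from the JavaEWAH library:

    import com.googlecode.javaewah.EWAHCompressedBitmap
    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag

    // Distinct count over an RDD of indexes.
    // NOTE: the head of this function is missing from the source text; the
    // name `distinctCount` and its signature are reconstructed assumptions.
    def distinctCount(rdd: RDD[Int]): Int = {
      val bitmap = rdd.aggregate[EWAHCompressedBitmap](new EWAHCompressedBitmap())(
        (u: EWAHCompressedBitmap, v: Int) => {
          u.set(v)
          u
        },
        (u1: EWAHCompressedBitmap, u2: EWAHCompressedBitmap) => u1.or(u2)
      )
      bitmap.cardinality()
    }

    // the tuple's _2 is the index
    def groupCount[K: ClassTag](rdd: RDD[(K, Int)]): RDD[(K, Int)] = {
      val grouped: RDD[(K, EWAHCompressedBitmap)] =
        rdd.combineByKey[EWAHCompressedBitmap](
          (v: Int) => EWAHCompressedBitmap.bitmapOf(v),
          (c: EWAHCompressedBitmap, v: Int) => {
            c.set(v)
            c
          },
          (c1: EWAHCompressedBitmap, c2: EWAHCompressedBitmap) => c1.or(c2))
      grouped.map(t => (t._1, t._2.cardinality()))
    }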

However, the set method of EWAHCompressedBitmap requires int values to be set in ascending order. For the calculation above, this means the indexes within each RDD partition must already be in ascending order, so each partition has to be sorted first:

    // sort pair RDD by value within each partition
    def sortPairRDD[K](rdd: RDD[(K, Int)]): RDD[(K, Int)] = {
      rdd.mapPartitions(iter => {
        iter.toArray.sortWith((x, y) => x._2.compare(y._2) < 0).iterator
      })
    }

To avoid sorting, you can generate a bitmap for each uid up front, and then OR the bitmaps together when computing the Distinct Count:

    // assuming rdd: RDD[(K, EWAHCompressedBitmap)], with the bitmaps built per uid
    rdd.reduceByKey(_ or _).mapValues(_.cardinality())

By now you should have a better understanding of how to sort the Bitmap of Distinct Count. Try it out in practice.
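As a language-neutral illustration of the OR-then-count step, here is a minimal C sketch. It uses a plain, uncompressed, fixed-size bitmap rather than EWAHCompressedBitmap, and every name in it (`bitmap_t`, `bitmap_set`, `bitmap_or`, `bitmap_cardinality`, `union_count_example`) is a hypothetical helper introduced for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS 4   /* 4 * 32 = 128 bits; an arbitrary size for illustration */

typedef struct { uint32_t w[WORDS]; } bitmap_t;

/* Mark index i (0 <= i < WORDS * 32) as present. */
void bitmap_set(bitmap_t *b, unsigned i) {
    b->w[i >> 5] |= (uint32_t)1 << (i & 31);
}

/* Union of two bitmaps: an index is present if it is in either input. */
bitmap_t bitmap_or(const bitmap_t *x, const bitmap_t *y) {
    bitmap_t r;
    for (size_t k = 0; k < WORDS; k++)
        r.w[k] = x->w[k] | y->w[k];
    return r;
}

/* Number of set bits, which is the distinct count. */
unsigned bitmap_cardinality(const bitmap_t *b) {
    unsigned n = 0;
    for (size_t k = 0; k < WORDS; k++)
        for (uint32_t v = b->w[k]; v != 0; v &= v - 1)  /* clear lowest set bit */
            n++;
    return n;
}

/* Two partial bitmaps share index 40; the union holds {1, 40, 100}. */
unsigned union_count_example(void) {
    bitmap_t p = {{0}}, q = {{0}};
    bitmap_set(&p, 40);   /* insertion order does not matter here */
    bitmap_set(&p, 1);
    bitmap_set(&q, 100);
    bitmap_set(&q, 40);
    bitmap_t u = bitmap_or(&p, &q);
    return bitmap_cardinality(&u);
}
```

Because OR is commutative and associative, partial bitmaps can be merged in any order, which is what makes it suitable as a reduce function. The duplicate index 40 is counted once, so `union_count_example` returns 3.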
