Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What does sorting mean in big data's development?

2025-04-03 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

This article mainly introduces what sorting means in big data development. The introduction in this article is very detailed and has certain reference value. Interested friends must read it!

O(n2) time complexity: O(n1) time complexity: O(n2) time complexity: O(This time let's talk about bucket sort, count sort and radix sort with time complexity O(n). Because their time complexity is linear, they are also called linear sort. The reason why they can achieve linear complexity is because they do not involve comparisons between elements when sorting, and their use conditions are also very strict.

Bucket sort

Bucket sorting, as the name implies, uses buckets to divide the data. Bucket sorting is to divide the array to be sorted into several ordered buckets, then sort the data in each bucket, and finally take out all the data in turn to complete the sorting.

Let's take a look at its time complexity. If n data are evenly divided into m buckets, each bucket has x=n/m data, and each bucket uses fast sorting, the time complexity is O(xlogx), and the time complexity of m buckets is O(m*xlogx). Because x=n/m, the time complexity is O(nlog(n/m)). When the number of buckets is close to n, log(n/m) will be a very small number, and the time complexity is close to O(n).

However, if the above two words are used, because there are many preconditions required to use bucket sorting, first the data needs to be easily divided into m buckets, and each bucket itself needs to have a sequence, so that the bucket does not need to be sorted again. Secondly, we have bolded the two words evenly above, only when the data in the bucket is relatively average, if the data gap in each bucket is very large, the time complexity of sorting in the bucket is not constant, and the most extreme case is to divide it into a bucket, then the time complexity becomes O(nlogn).

Bucket sort is better suited for external sort. The so-called external sort is that the data is stored in an external disk, the amount of data is relatively large, and the memory is limited, so it is impossible to load all the data into memory.

If there are 10 G order data, the order amount is a positive integer, to sort according to the amount, and the memory is relatively small, can not load all the data at the same time how to do?

The idea of bucket sorting is to scan the order first and look at the approximate amount distribution. If the amount is distributed between 1 and 10,000, we can divide the amount into 10 buckets. The first bucket is 1~1000, and so on. Their order is 0, 1, 2…9.

Ideally, they are evenly distributed in each bucket, and then we use quick sort in each bucket, and finally output them in turn. If the amount of 1000-2000 is relatively large, it still cannot be directly put into memory, then continue to use bucket sorting for segmentation until all sorting is completed.

Counting sort

Counting sort basically belongs to the special case of bucket sort. When the range to be sorted is not large, for example, there are x data, then we divide it into x buckets, and the data in each bucket is the same, so that we save the time of sorting in the bucket.

As for why it is called counting, this is related to the implementation method of counting sorting, but I have not understood it clearly, so I will put it down first, and then fill in this pit.

radix sort

Let's look at another sorting problem. If there are 100,000 mobile phone numbers, sort them from small to large. If you use the bucket sort and count sort above, its range is relatively large, and these two algorithms are obviously not suitable.

Now you can use the third sort method, cardinal sort. In the question just now, we can clearly see that when the first few digits of a number are obviously larger than another number, the latter ones do not need to be compared, that is to say, we need to sort the mobile phone numbers into the first digits from small to large, the first digits are the same, the second digits are the same, according to the third, and so on.

Here we use a new idea to consider, we first according to the size of the last digit to sort, and then according to the penultimate digit to sort, and so on, after 11 sorting, completed the sorting of the entire mobile phone number. It should be noted here that it must be implemented with a stable sorting algorithm. If it is an unstable sorting algorithm, the previous sorting has no meaning.

So if you use a few strings, it looks like this.

For example, for a batch of orders, we want to sort them according to the size of the amount. If the amount is the same, we will sort them according to the time when the order was placed. The same method is used.

If the data we need to sort is not of equal length, how should we deal with it? For example, if we want to sort words, some words are five letters, some words are fifteen letters, how should we deal with it?

We can add all letters to the same length, less than 0, because according to ASCII code we can find that all letters are after the number 0, it does not affect the normal sorting, so you can still use radix sorting to proceed.

The above is all the content of this article "What does sorting mean in big data development". Thank you for reading it! Hope to share the content to help everyone, more relevant knowledge, welcome to pay attention to the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report