This article walks through a series of application examples of common data structures and algorithms. I hope that after reading it you will come away with a solid understanding of the material.
A data structure is a format for organizing, managing, and storing data that lets us access and modify the data efficiently.
Data structure = data elements + the structure between the elements.
If the data structure is the skeleton of a building, the algorithm is the construction process. Different processes cost different amounts of time and resources. I will combine the two and briefly discuss their characteristics and applications.
Common data structures can be divided into linear structures, tree structures, and graph structures.
Common algorithms include recursion, sorting, binary search, search, hashing, greedy algorithms, divide and conquer, backtracking, dynamic programming, string matching, and so on.
Let's start with linear data structures, recursion and sorting algorithms.
Linear structure
Linear structure: data arranged like a line, where each node has at most one predecessor and one successor. Examples include arrays, linked lists, stacks, and queues.
Array
An array is a data structure consisting of a collection of elements of the same type stored in one contiguous block of memory. The storage address of any element can be computed directly from its subscript: address = base_address + i * element_size.
Advantages:
Contiguous memory gives the array a natural index: reading or modifying an element by subscript is O(1). Contiguity also lets the CPU cache prefetch neighboring data, so access is even faster in practice.
Disadvantages:
The array's index advantage is also its weakness: because subscripts map to positions in contiguous memory, inserting or deleting arr[i] is O(n), since the elements arr[i..n-1] must be shifted as a whole. In addition, a large array requires a large contiguous allocation.
The copying overhead and memory waste can be mitigated in several ways:
1. Lazy deletion: a delete only marks the element as deleted; the actual removal is deferred until the array runs out of space (see the sketch after this list).
2. Chunking: split one large block of memory into n smaller chunks and copy array memory chunk by chunk. For example, each Buffer Pool instance in MySQL's InnoDB engine consists of several chunks, and actual memory allocation is done per chunk.
3. Shrinking: when I interviewed at Ali, I was asked to design a shrinkable version of HashMap. Waste is shameful; thrift is honorable.
4. Switch to a linked list.
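Here is a minimal sketch of the lazy-deletion idea from point 1. It is an illustration under assumed names (LazyArray is not a library class): deletion only sets a tombstone flag, and a single O(n) compaction pass replaces many O(n) element shifts.

```java
import java.util.Arrays;

// Illustrative sketch: lazy deletion in a dynamic array of ints.
public class LazyArray {
    private int[] data;
    private boolean[] deleted; // parallel "tombstone" flags
    private int size;

    public LazyArray(int capacity) {
        data = new int[capacity];
        deleted = new boolean[capacity];
    }

    // O(1): mark the slot instead of shifting the tail of the array.
    public void removeAt(int index) {
        deleted[index] = true;
    }

    public void add(int value) {
        if (size == data.length) {
            compact(); // reclaim tombstoned slots only when we must
            if (size == data.length) { // nothing was reclaimed: grow instead
                data = Arrays.copyOf(data, data.length * 2);
                deleted = Arrays.copyOf(deleted, deleted.length * 2);
            }
        }
        data[size] = value;
        deleted[size] = false;
        size++;
    }

    // One O(n) pass replaces many O(n) per-delete shifts.
    private void compact() {
        int write = 0;
        for (int read = 0; read < size; read++) {
            if (!deleted[read]) {
                data[write] = data[read];
                deleted[write] = false;
                write++;
            }
        }
        size = write;
    }
}
```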
Linked list
The linked list exists to solve the array's costly, copy-heavy insertions and deletions. It does not need a contiguous block of memory; instead it strings together scattered memory blocks through pointers. Depending on how the pointers are arranged, there are singly linked lists, doubly linked lists, and circular linked lists.
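To see why pointer splicing makes insertion cheap, here is a minimal singly linked list node (a generic illustration, not code from the article): inserting after a known node changes two pointers and shifts nothing.

```java
// Minimal singly linked list node.
class ListNode {
    int val;
    ListNode next;

    ListNode(int val) { this.val = val; }

    // Splice a new node in right after this one: O(1), no shifting.
    ListNode insertAfter(int value) {
        ListNode node = new ListNode(value);
        node.next = this.next;
        this.next = node;
        return node;
    }
}
```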
Advantages:
Inserting or deleting at a known node is O(1), the list itself has no fixed size limit, and it naturally supports dynamic growth.
Disadvantages:
There is no "index", so lookup is O(n). The pointers must be maintained and consume extra memory, and because the memory is not contiguous, linked lists can aggravate memory fragmentation.
You can see that arrays and linked lists are a complementary pair of data structures. So how do we make up for the linked list's deficiencies?
The memory overhead is hard to eliminate; it is inherent to the pointers.
As for the missing index, we can build one:
1. Pair the list with a hash table that records the location of each node.
2. When the list grows too long, consider structures such as skip lists and red-black trees. Don't panic; I'll cover these later.
Application scenarios:
Arrays and linked lists are used everywhere; they are the foundation on which other data structures, such as stacks, queues, and sets, are built.
Stack
The stack is a restricted linear data structure: elements can only be accessed at the top of the stack. Access follows the first-in-last-out (FILO) discipline.
A stack implemented with an array is called a sequential stack; one implemented with a linked list is called a linked stack.
Some people may wonder: arrays and linked lists let me operate freely at both the head and the tail, so why use a stack that can only be operated at one end?
A FILO structure is, of course, only for FILO scenarios. By wrapping an array or linked list as a stack, callers can use only its push/pop API, and the implementation details stay hidden.
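A minimal sketch of a sequential stack along those lines (illustrative names, not a library class): the backing array is private, so callers see only push/pop and cannot break the FILO discipline.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Illustrative sequential (array-backed) stack of ints.
public class ArrayStack {
    private int[] items;
    private int top; // index of the next free slot

    public ArrayStack(int capacity) {
        items = new int[capacity];
    }

    public void push(int value) {
        if (top == items.length) {
            items = Arrays.copyOf(items, items.length * 2); // grow when full
        }
        items[top++] = value;
    }

    public int pop() {
        if (top == 0) throw new NoSuchElementException("stack is empty");
        return items[--top];
    }

    public boolean isEmpty() {
        return top == 0;
    }
}
```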
Application scenarios:
1. The editor's redo/undo operations.
2. The browser's forward/back navigation.
3. The compiler's parenthesis-matching check (see the sketch after this list).
4. Expression evaluation in mathematical calculation.
5. Function calls.
6. Recursion.
7. String reversal...
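As an illustration of item 3, here is a bracket-matching check that uses java.util.ArrayDeque as the stack: every opening bracket is pushed, and every closing bracket must match the most recently pushed opener.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BracketCheck {
    public static boolean isBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '(': case '[': case '{':
                    stack.push(c); // remember the opener
                    break;
                case ')':
                    if (stack.isEmpty() || stack.pop() != '(') return false;
                    break;
                case ']':
                    if (stack.isEmpty() || stack.pop() != '[') return false;
                    break;
                case '}':
                    if (stack.isEmpty() || stack.pop() != '{') return false;
                    break;
                default:
                    break; // non-bracket characters are ignored
            }
        }
        return stack.isEmpty(); // any leftover opener means a mismatch
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("a[b{c}](d)")); // true
        System.out.println(isBalanced("([)]"));       // false
    }
}
```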
Queue
The queue is also a restricted linear data structure, following the first-in-first-out (FIFO) discipline. As with stacks, a queue implemented with an array is called a sequential queue, and one implemented with a linked list is called a chained queue.
Depending on how the head and tail pointers are handled, queues can be divided into double-ended queues, circular queues, blocking queues, and concurrent queues.
Double-ended queue (deque): elements can be inserted, deleted, and accessed at both the head and the tail, which is more flexible; the FIFO restriction does not apply.
Circular queue: a sequential (array-backed) queue whose head and tail are connected so the storage wraps around into a ring (a minimal sketch follows).
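In the sketch below (illustrative, fixed capacity), one slot is kept empty so that head == tail means the queue is empty while (tail + 1) % length == head means it is full.

```java
// Illustrative circular queue of ints over a fixed array.
public class CircularQueue {
    private final int[] items;
    private int head; // index of the front element
    private int tail; // index of the next free slot

    public CircularQueue(int capacity) {
        items = new int[capacity + 1]; // one extra sentinel slot
    }

    public boolean enqueue(int value) {
        if ((tail + 1) % items.length == head) return false; // full
        items[tail] = value;
        tail = (tail + 1) % items.length; // wrap around the ring
        return true;
    }

    public Integer dequeue() {
        if (head == tail) return null; // empty
        int value = items[head];
        head = (head + 1) % items.length;
        return value;
    }
}
```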
Under concurrency, the critical sections of a queue are its enqueue and dequeue operations. Queues that guarantee safe access under concurrency fall into blocking queues and concurrent queues; the difference lies in the granularity of synchronization.
Blocking queue: enqueue and dequeue are protected by a mutex, so the lock granularity is coarse. Examples include the blocking queues in Java's JUC package.
Concurrent queue: an array-based circular queue that uses CAS atomic operations to make enqueue and dequeue safe.
Concretely, it relies on the optimistic pattern of volatile reads plus CAS to move the head and tail pointers safely. Because the synchronization cost of CAS is low, this is called a lock-free concurrent queue; the RingBuffer in the Disruptor framework works this way.
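The sketch below shows only that optimistic volatile-read-plus-CAS retry loop in isolation; it is not the Disruptor's actual code, and a real lock-free queue must also coordinate with consumers so the ring does not wrap. SequenceClaimer is a name I made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Each producer claims the next slot number without taking a lock.
public class SequenceClaimer {
    private final AtomicLong tail = new AtomicLong(0);

    public long claimNextSlot() {
        while (true) {
            long current = tail.get();        // volatile read of the tail
            if (tail.compareAndSet(current, current + 1)) {
                return current;               // this slot is now exclusively ours
            }
            // CAS failed: another thread advanced the tail first; retry.
        }
    }
}
```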
PS: many frameworks, such as Log4j2 and Canal, use Disruptor where a thread pool plus queue would otherwise be needed.
Application scenarios:
A queue does what a real-world queue does: when resources are scarce, requests wait in line. Typical uses:
1. Wherever task scheduling exists: CPU, disk, thread pools, task-scheduling frameworks.
2. Passing data between two processes: pipes, file IO, IO buffers.
3. Producer-consumer scenarios...
Recursive implementation
Recursion is a way of implementing an algorithm in code. It applies to depth-first search, pre-order/in-order/post-order binary tree traversal (a topic I'm leaving for a later article), and more. Since the merge sort and quicksort discussed below can both be implemented recursively, let's first look at how to write recursive code. Once you are familiar with the idea, recursion is actually a simple way to code.
As long as the problem satisfies the following three points, recursion can be used to solve it:
1. The solution to the problem can be decomposed into the solutions of several sub-problems.
2. The problem and its sub-problems are solved in exactly the same way; only the data size differs.
3. There is a termination condition for the recursion.
The key to writing recursive code is to find the rule for decomposing the big problem into smaller ones, write the recurrence formula from it, pin down the termination condition, and finally translate the formula and the termination condition into code.
Human brains are not good at simulating this kind of execution, so when writing recursive code we can simply ignore how the recursion unfolds. Just tell the program the recurrence formula and the termination condition, and things become easy~
Notes when using recursion:
1. Stack overflow: if the call chain grows too deep, the system stack or the virtual machine stack may overflow.
2. Repeated computation of sub-problems: in a previous article I mentioned that dynamic programming reduces time complexity by avoiding recomputation of sub-problems. One approach is recursion plus a memo that caches each sub-problem's solution (see the sketch after this list).
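To make both the recipe and note 2 concrete, here is the standard Fibonacci example (my choice of example, not the article's): the recurrence is f(n) = f(n-1) + f(n-2), the termination condition is f(0) = 0 and f(1) = 1, and a memo caches sub-problem answers so each n is computed only once, dropping the cost from exponential to O(n).

```java
import java.util.HashMap;
import java.util.Map;

public class Fib {
    private static final Map<Integer, Long> memo = new HashMap<>();

    static long fib(int n) {
        if (n <= 1) return n;                  // termination condition
        Long cached = memo.get(n);
        if (cached != null) return cached;     // reuse the sub-problem's answer
        long result = fib(n - 1) + fib(n - 2); // recurrence formula
        memo.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fib(50)); // 12586269025, instant thanks to the memo
    }
}
```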
Sorting algorithm
The first algorithm I ever learned was bubble sort, and I suspect many programmers have experienced the fear of being dominated by the "several classic sorting algorithms".
Sorting comes up all the time in real projects, for example in TopK and median problems. The difference between an ordered data set and an unordered one is that the former contains zero inversions ("reverse pairs").
Tip: if i < j and a[i] > a[j], then (a[i], a[j]) is called an inversion (reverse pair); in 1, 7, 3, 5, the pairs (7, 3) and (7, 5) are inversions. Otherwise it is an ordered pair, such as (1, 7) or (3, 5).
Different sorting algorithms eliminate inversions in different ways, and the differences show up in time and space complexity, sorting mode, stability, applicable scenarios, and so on.
Let me first borrow a sorting-algorithm comparison chart from the Internet:
[Figure: comparison of common sorting algorithms — time complexity (best/worst/average), space complexity, in-place vs. out-of-place, and stability]
When choosing a sorting algorithm, we should consider the execution efficiency, memory consumption, stability and other factors of the algorithm.
PS: the following content mainly draws on Wang Zheng's Geek Time course "The Beauty of Data Structures and Algorithms". My own ability is limited; consider this a quiet plug and a thumbs-up for the author.
How to analyze a sorting algorithm's execution efficiency
1. Best-case, worst-case, average time complexity
The initial order of the raw data affects sorting efficiency. For example, on nearly sorted input, insertion sort's time complexity approaches O(n). We therefore need to understand how each sorting algorithm performs on different inputs.
2. Coefficients, constants, and lower-order terms of the time complexity
When sorting small data sets, say 10, 100, or 1,000 elements, coefficients, constants, and lower-order terms must all be taken into account when choosing a suitable algorithm.
3. Number of comparisons and number of swaps (or moves)
A comparison-based sorting algorithm involves two kinds of operations: comparing elements and swapping (or moving) them. An analysis of execution efficiency should therefore count both the comparisons and the swaps (or moves).
Memory consumption of sorting algorithm
The chart above has a column for the sorting method: in-place versus out-of-place. The former means O(1) space complexity, needing no extra memory; the latter needs extra space to hold intermediate state. In-place algorithms also make better use of the CPU cache, which is an important consideration.
Tip: quicksort's space complexity comes from its recursive implementation. Each call uses only constant space, so the total space equals the recursion depth, O(log n).
Stability of sorting algorithm
Stability means that if the sequence to be sorted contains elements with equal values, their original relative order is preserved after sorting.
Why consider the stability of the sorting algorithm?
Because in real scenarios objects are often sorted along several dimensions. Suppose orders must be sorted by amount and, for equal amounts, by creation time. The simple approach is to sort by creation time first and then sort by amount with a stable algorithm: stability guarantees that two orders with the same amount keep their relative time order.
The rest of this article discusses how these sorting algorithms apply at different data scales.
Small-scale data sorting
For small data sets, bubble sort, selection sort, and insertion sort are all simple options; selection sort is ruled out because it is unstable. Insertion sort (think of ordering a hand of playing cards) beats bubble sort (which bubbles the largest element to the back on each pass) by making fewer swaps, so it sorts small inputs more efficiently.
Moreover, when the input is already largely ordered, insertion sort even outperforms the O(n log n) merge sort and quicksort. Insertion sort is therefore well suited to small-scale data (see the sketch below).
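A minimal insertion sort sketch showing the card-sorting idea: each new element is shifted left into its place within the already-sorted prefix, so nearly sorted input needs almost no work.

```java
// Insertion sort: stable, in-place, close to O(n) on nearly sorted input.
static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int key = a[i];          // next card to place
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];     // shift larger elements one slot right
            j--;
        }
        a[j + 1] = key;          // drop the card into its slot
    }
}
```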
Large-scale in-memory data sorting
Sorting large data sets calls for O(n log n) algorithms; merge sort and quicksort are discussed here.
Merge sort embodies divide and conquer: sorting the whole unordered sequence is broken down into sorting small unordered subsequences; each subsequence is sorted, then the ordered subsequences are merged until the whole sequence is in order.
Merge sort is an out-of-place sort. Each merge operation needs extra memory, which is released once the merge completes. At any moment the CPU executes only one merge, so only one temporary buffer is live at a time, never larger than n elements; hence the space complexity is O(n).
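A compact merge sort sketch over the half-open range [lo, hi); the tmp buffer is the extra O(n) space just discussed. Call mergeSort(a, 0, a.length) to sort the whole array.

```java
// Merge sort: split, recurse on each half, then merge the sorted halves.
static void mergeSort(int[] a, int lo, int hi) {
    if (hi - lo <= 1) return;               // 0 or 1 element: already sorted
    int mid = lo + (hi - lo) / 2;
    mergeSort(a, lo, mid);
    mergeSort(a, mid, hi);
    int[] tmp = new int[hi - lo];           // temporary buffer for the merge
    int i = lo, j = mid, k = 0;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++]; // <= keeps it stable
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    System.arraycopy(tmp, 0, a, lo, tmp.length);
}
```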
Quicksort also uses divide and conquer: local order eventually produces global order. It picks a partition element (the pivot) and splits the elements into three parts: those < pivot, those = pivot, and those > pivot. It then recurses on the < pivot and > pivot parts until the whole sequence is sorted.
If the pivot is chosen sensibly (and multiple pointers share the partitioning work), quicksort avoids its worst-case time complexity. Quicksort also sorts in place, while merge sort is out-of-place with its higher O(n) space cost, so quicksort is the more widely used of the two (a sketch follows).
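A quicksort sketch using a simple two-way (Lomuto-style) partition instead of the three-way partition described above; the recursive shape is the same. Call quickSort(a, 0, a.length - 1). Production code chooses the pivot more carefully (for example, median-of-three) to avoid the worst case.

```java
// Quicksort: partition around a pivot, then recurse on each side in place.
static void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi];                     // simplest pivot choice
    int p = lo;                            // boundary of the "< pivot" region
    for (int i = lo; i < hi; i++) {
        if (a[i] < pivot) {
            int t = a[i]; a[i] = a[p]; a[p] = t;
            p++;
        }
    }
    int t = a[p]; a[p] = a[hi]; a[hi] = t; // put the pivot between the two parts
    quickSort(a, lo, p - 1);
    quickSort(a, p + 1, hi);
}
```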
In Java, Arrays.sort is a hybrid sort with two implementation strategies (a usage sketch follows the two cases):
Case 1. The elements are of a primitive type
Quicksort is used, falling back to insertion sort when the data set is very small.
Case 2. The elements are of type Object
Merge sort is used, again with insertion sort for small data sets.
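A small usage sketch of the two cases. In modern JDKs the primitive overloads use a dual-pivot quicksort, while the Object overloads use TimSort, a stable derivative of merge sort; insertion sort kicks in for tiny ranges in both.

```java
import java.util.Arrays;

public class SortDemo {
    public static void main(String[] args) {
        int[] primitives = {5, 2, 9, 1};
        Arrays.sort(primitives);  // Case 1: quicksort variant, not stable

        Integer[] boxed = {5, 2, 9, 1};
        Arrays.sort(boxed);       // Case 2: TimSort (merge-sort based), stable

        System.out.println(Arrays.toString(primitives)); // [1, 2, 5, 9]
        System.out.println(Arrays.toString(boxed));      // [1, 2, 5, 9]
    }
}
```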
Large-scale external data sorting
When the data set is too large to load entirely into memory, we can consider the O(n) sorting algorithms that suit external sorting: bucket sort, counting sort, and radix sort. External sorting means the data resides on external disk.
Their time complexity is this low because all three are non-comparison sorts: they involve no comparisons between elements.
Bucket sort assigns elements to globally ordered buckets by some attribute and then sorts locally within each bucket. When there are enough buckets, the time complexity approaches O(n).
Counting sort is really a special case of bucket sort. When the n values to be sorted span a small range, say with maximum value k, we can split the data into k buckets whose elements all share the same value, which eliminates the in-bucket sorting step (a sketch follows).
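A counting sort sketch for non-negative integers bounded by k. This simple form rebuilds the array straight from the counts; the stable prefix-sum variant of counting sort is the building block that radix sort relies on.

```java
// Counting sort: O(n + k), no comparisons between elements.
static void countingSort(int[] a, int k) {
    int[] count = new int[k + 1];          // one "bucket" per possible value
    for (int v : a) count[v]++;            // tally each value
    int idx = 0;
    for (int value = 0; value <= k; value++) {
        while (count[value]-- > 0) {
            a[idx++] = value;              // write values back in order
        }
    }
}
```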
Radix sort sorts digit by digit, and it places requirements on the data: each key must decompose into independent "digits" with a positional hierarchy, so that if data a's high digit is larger than data b's, the lower digits need not be compared at all. In addition, the range of each digit must be small enough to sort with a linear-time algorithm; otherwise radix sort cannot reach O(n).
That concludes this analysis of application examples of common data structures and algorithms. I hope it has been helpful; if you found the article worthwhile, share it so more people can see it.