
What functions are included in the cassandra system

2025-01-19 Update From: SLTechnology News&Howtos


This article describes the main functions of the Cassandra system: how it writes data to disk, how it uses Bloom filters, and how its nodes stay synchronized through gossip.

Cassandra is a distributed NoSQL database originally developed at Facebook and now maintained as an Apache open source project. It is one of the most widely used NoSQL systems. Different people describe Cassandra's characteristics differently, but they can be summarized as follows:

It is distributed and highly fault tolerant as a cluster, it scales horizontally by adding nodes, its schema is flexible so that fields can be added and removed freely, and it supports range queries. The sections below explain the mechanisms behind these features.


1. The hard disk is the new tape

"Memory is the new hard disk, and the hard disk is the new tape" is a well-known saying attributed to Jim Gray. Most access to a (non-SSD) hard disk today is random, and random access on a spinning disk is slow because of seek time. If the disk is instead accessed sequentially, like a tape, its throughput is far higher.

Cassandra is designed around exactly this point. It buffers a certain amount of data in memory and then writes it to disk in one pass, which is a sequential write. Once a file is written it is never modified, so Cassandra avoids random in-place updates and can read each file back sequentially. This greatly improves the efficiency of the disk.

If Cassandra never changes data that has already been written, how is an update implemented? Cassandra appends a new record for the update. On a read, it gathers all the records written for that piece of data and resolves them in chronological order to determine the latest value.
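The write and read path described above can be illustrated with a minimal sketch. This is not Cassandra's code: the class `TinyLogStore`, the `memtable_limit` threshold, and the logical clock are all hypothetical names chosen for the example, and the "segments" are kept in a Python list rather than written to real files.

```python
import itertools


class TinyLogStore:
    """Toy append-only store: writes are buffered, then flushed as immutable segments."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # in-memory buffer: key -> (timestamp, value)
        self.segments = []                    # flushed segments, never modified afterwards
        self._clock = itertools.count(1)      # logical timestamps, an assumption of this sketch

    def put(self, key, value):
        # An update is simply another write with a newer timestamp.
        self.memtable[key] = (next(self._clock), value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Emit the buffer in key order as one immutable segment (a sequential write).
        if self.memtable:
            self.segments.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Gather every recorded version of the key, then keep the newest one.
        versions = []
        if key in self.memtable:
            versions.append(self.memtable[key])
        for segment in self.segments:
            versions.extend(record for k, record in segment if k == key)
        if not versions:
            return None
        return max(versions, key=lambda record: record[0])[1]


store = TinyLogStore(memtable_limit=2)
store.put("user:1", "alice")
store.put("user:2", "bob")         # buffer is full, so it is flushed as a segment
store.put("user:1", "alice-v2")    # the update is appended, the old segment is untouched
latest = store.get("user:1")
```

Note that the old value of `user:1` still exists in the flushed segment; the read simply prefers the record with the newer timestamp.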


2. Application of the Bloom filter algorithm

A Bloom filter is, put simply, a structure for testing whether a value is a member of a set. A common use is URL deduplication in search engine crawlers: if a URL is already in the list of URLs crawled during some period, it is not crawled again. The time and space costs are very small, since testing a value only requires computing a few hashes. The trade-off is a certain error rate. As long as the application can accept that error, a Bloom filter is a very good fit.
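A minimal sketch of such a filter, applied to the crawler example, might look like the following. The class name `BloomFilter`, the default sizes, and the choice of salting SHA-256 to derive several hash positions are assumptions made for illustration; real implementations typically use faster non-cryptographic hashes.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter backed by a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly in the set"; False means "definitely not in the set".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


crawled = BloomFilter()
crawled.add("http://example.com/a")
already_seen = crawled.might_contain("http://example.com/a")
```

A crawler would skip any URL for which `might_contain` returns true, accepting that a small fraction of new URLs may be skipped by mistake.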

Cassandra uses Bloom filters to determine whether a data block contains any update for a given value. As described above, a read gathers all the update records for a value and orders them chronologically to get the latest value. Meanwhile, each time the data buffered in memory reaches Cassandra's memory limit (which is configurable, but is usually set below physical memory for efficiency), the buffered data is written to disk as a new file. With a large amount of data, many such blocks accumulate, and searching every block for update records of a value would waste time and reduce efficiency. Cassandra therefore consults a Bloom filter to decide whether a block needs to be searched at all. The filter is stored on disk alongside the block in the Filter.db file; the Index.db file holds the index of keys rather than the filter.

As noted above, a Bloom filter can make errors, but only in one direction: it may report that a value is in the set when it is not (a false positive), but it never reports that a value in the set is absent. This error is tolerable here. At worst Cassandra searches one extra block that holds no record of the value, but it never misses a block that does.
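How often such a false positive occurs depends on the size of the filter. The sketch below uses the standard textbook approximation (1 - e^(-k*n/m))^k for a filter with m bits, k hash functions, and n stored items. The function name and the example sizes are hypothetical, and this is not a description of how Cassandra itself sizes its filters.

```python
import math


def estimated_false_positive_rate(num_bits, num_hashes, num_items):
    """Textbook approximation of a Bloom filter's false-positive probability."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


# Same number of items and hashes, two different filter sizes.
small_filter_rate = estimated_false_positive_rate(5_000, 7, 1_000)
large_filter_rate = estimated_false_positive_rate(10_000, 7, 1_000)
```

Giving the filter more bits per stored item lowers the estimated rate, which is the trade-off between memory use and wasted block reads.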

3. Multi-point synchronization based on gossip

Gossip is a peer-to-peer (P2P) protocol. Each node sends information to some of its neighbouring nodes, which pass it on in turn until all nodes hold the same information; the spread is viral. This achieves multi-point synchronization without any node needing to know the exact number of nodes, which makes it easy to scale the cluster horizontally. A multi-node distributed system of this kind also has good fault tolerance: the failure of one or several machines in the cluster does not affect the correctness of the overall data service. Cassandra's failure detection can also find dead nodes quickly so that they can be dealt with in time.
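The viral spread can be illustrated with a small simulation. This is a simplified sketch, not Cassandra's gossip implementation: the function `gossip_rounds`, the `fanout` parameter, and the rule that every informed node contacts random peers once per round are all assumptions made for the example.

```python
import random


def gossip_rounds(num_nodes, fanout=2, seed=0):
    """Return how many rounds pass until every node has received the information."""
    rng = random.Random(seed)
    informed = {0}  # node 0 starts out holding the new information
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        newly_informed = set()
        for _node in informed:
            # Each informed node tells `fanout` randomly chosen peers.
            peers = rng.sample(range(num_nodes), k=min(fanout, num_nodes))
            newly_informed.update(peers)
        informed |= newly_informed
    return rounds


rounds_for_50 = gossip_rounds(50)
rounds_for_5000 = gossip_rounds(5000)
```

Because the number of informed nodes can roughly multiply each round, the number of rounds grows slowly compared with the number of nodes, which is why this approach suits large clusters.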

These are the main functions of the Cassandra system: sequential, append-only disk writes, Bloom filters that avoid searching blocks unnecessarily, and gossip-based synchronization between nodes.
