
Comparison of the Main Design Ideas of Cassandra and HBase

2025-02-23 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

The comparison below contrasts Cassandra and HBase dimension by dimension.

Consistency

Cassandra:
1. Quorum/NRW strategy.
2. Keeps data consistent among cluster nodes by synchronizing Merkle trees over the Gossip protocol.
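The Quorum/NRW strategy mentioned above can be sketched as a simple rule (illustrative only, not Cassandra's actual API): with replication factor N, W write acknowledgements, and R replicas consulted per read, reads are guaranteed to see the latest write whenever R + W > N, because every read quorum then overlaps every write quorum.

```python
# Sketch of the Quorum/NRW rule (illustrative; function name is hypothetical).
# N = replication factor, W = write acks required, R = replicas read.
# If R + W > N, every read quorum overlaps every write quorum, so a
# read is guaranteed to include at least one up-to-date replica.

def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Return True when read and write quorums must overlap."""
    return r + w > n

# Typical settings with N = 3:
assert is_strongly_consistent(n=3, r=2, w=2)      # QUORUM reads + QUORUM writes
assert not is_strongly_consistent(n=3, r=1, w=1)  # ONE/ONE: fast but weak
```

Lowering R and W trades consistency for latency and availability, which is exactly the knob behind Cassandra's "weak consistency, high availability" position in the CAP verdict below.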

HBase: writes go through a single node with no peer replication at that layer, so consistency is strong.

Availability

Cassandra:
1. Consistent hashing replicates data to adjacent nodes on the ring, so each piece of data lives on multiple nodes and there is no single point of failure.

2. When a node goes down, new data hashed to it is automatically routed to the next node on the ring (hinted handoff); once the failed node recovers, the data is pushed back to it.

3. The Gossip protocol tracks the health of every node in the cluster and triggers synchronization requests to keep data consistent.

4. Storage is plain SSTable files, so single-node reliability is only average.
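The routing behavior in points 1 and 2 above can be sketched with a minimal consistent-hash ring (names and structure are illustrative, not Cassandra's implementation): a key hashes to a point on the ring and is owned by the next node clockwise; if that node is down, the request falls through to the following live node, which is the basis of hinted handoff.

```python
# Minimal consistent-hash ring sketch (illustrative, not Cassandra's code).
import bisect
import hashlib

def ring_hash(key: str) -> int:
    """Map a string to a point on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # One point per node, sorted clockwise around the ring.
        self.points = sorted((ring_hash(n), n) for n in nodes)
        self.down = set()

    def owner(self, key: str) -> str:
        """First live node at or after the key's position (clockwise)."""
        i = bisect.bisect(self.points, (ring_hash(key), ""))
        for step in range(len(self.points)):
            _, node = self.points[(i + step) % len(self.points)]
            if node not in self.down:
                return node
        raise RuntimeError("no live nodes")

ring = Ring(["node-a", "node-b", "node-c"])
primary = ring.owner("user:42")
ring.down.add(primary)              # primary goes down...
fallback = ring.owner("user:42")    # ...requests route to the next live node
assert fallback != primary
```

In real Cassandra the fallback node also stores a "hint" so the data can be replayed to the original owner when it comes back, as point 2 describes.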

HBase:
1. There is a single point of failure: when a RegionServer goes down, the regions it serves are unavailable for a short time until failover takes effect.

2. The Master tracks the health of each RegionServer and the distribution of regions.

3. Multiple Masters can run; if the active Master goes down, ZooKeeper's Paxos-style election picks the next one. Even if every Master is down, region reads and writes are unaffected: the Master acts only in an automated-operations role.

4. HDFS is a distributed storage engine that keeps three replicas of each block, giving high reliability and essentially zero data loss.

5. The HDFS NameNode is a single point of failure (SPOF).

Scalability

Cassandra:
1. Consistent hashing quickly locates the node that holds a given piece of data.

2. Expansion requires rebalancing data among the nodes on the hash ring.

HBase:
1. The client locates the target RegionServer through ZooKeeper, and from there the target region.

2. To expand, a new RegionServer publishes itself to the Master, which redistributes regions evenly across servers.
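Cassandra's side of the scalability story — rebalancing on the hash ring — can be sketched to show why expansion is cheap (illustrative code; node and key names are made up): when a node joins the ring, only the keys falling between it and its predecessor change owner, and every one of them moves to the new node; nothing else is shuffled.

```python
# Sketch: adding a node to a consistent-hash ring moves only the keys
# the new node claims; all other keys keep their owner (illustrative).
import bisect
import hashlib

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def owner(nodes, key):
    """Clockwise successor of the key's point on the ring."""
    points = sorted((ring_hash(n), n) for n in nodes)
    i = bisect.bisect(points, (ring_hash(key), ""))
    return points[i % len(points)][1]

keys = [f"row-{i}" for i in range(1000)]
before = {k: owner(["n1", "n2", "n3"], k) for k in keys}
after = {k: owner(["n1", "n2", "n3", "n4"], k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that moved is now owned by the new node n4.
assert all(after[k] == "n4" for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved")
```

Production systems use many virtual points per node so the load shed by expansion spreads evenly, but the principle — neighbors-only data movement — is the same.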

Load balancing

Cassandra: the client fetches the full cluster address list (from any contact node), then selects the appropriate node by consistent hashing; the client caches the cluster addresses.
HBase: the client asks ZooKeeper for the read/write routing table to locate the RegionServer; the Master updates this routing table, and the client itself caches part of the routing information.

Data difference comparison algorithm

Cassandra: Merkle tree and Bloom filter.
HBase: Bloom filter.

Locks and transactions

Cassandra: client timestamps (Dynamo uses vector clocks).
HBase: optimistic concurrency control.

Read/write performance

Cassandra: locating data for reads and writes is very fast.
HBase: locating data for reads and writes may take up to six network RPCs, so performance is lower.

CAP verdict

Cassandra:
1. Weak consistency; data loss is possible.
2. High availability.
3. Easy to expand.

HBase:
1. Strong consistency; zero data loss.
2. Lower availability.
3. Easy to expand.
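The Merkle-tree difference comparison listed for Cassandra above can be sketched as follows (illustrative, not Cassandra's implementation; assumes a power-of-two number of leaves): each replica hashes its key range into a binary tree, and the comparison recurses only into subtrees whose hashes disagree, so divergent keys are found without shipping the whole data set.

```python
# Sketch of Merkle-tree anti-entropy comparison (illustrative).
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build(leaves):
    """Return tree levels from leaf hashes (level 0) up to the root."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def diff(a_levels, b_levels, level=None, idx=0):
    """Indices of differing leaves, descending only into unequal subtrees."""
    if level is None:
        level = len(a_levels) - 1          # start at the root
    if a_levels[level][idx] == b_levels[level][idx]:
        return []                          # subtrees identical: prune here
    if level == 0:
        return [idx]                       # a differing leaf
    return (diff(a_levels, b_levels, level - 1, 2 * idx) +
            diff(a_levels, b_levels, level - 1, 2 * idx + 1))

a = [b"k0=v0", b"k1=v1", b"k2=v2", b"k3=v3"]
b = [b"k0=v0", b"k1=XX", b"k2=v2", b"k3=v3"]
assert diff(build(a), build(b)) == [1]     # only leaf 1 needs repair
```

When the roots match, one hash comparison proves two replicas agree; when they differ, the cost of locating the divergence is proportional to the number of differing ranges, not the data size.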
