
On distributed CAP Theorem

2025-07-15 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

As the Internet has grown, large data volumes and highly concurrent workloads have pushed most website projects toward distributed architectures. The defining characteristic of a distributed system is that its data is spread across nodes, so at any given moment the data on different network nodes may be inconsistent (for example, because updates have not yet been synchronized or data has been lost).

In 2000, Professor Eric Brewer put forward a conjecture at the PODC symposium: a distributed system cannot provide consistency, availability, and partition tolerance all at the same time; at most two of the three can be satisfied.

In 2002, Seth Gilbert and Nancy Lynch proved the conjecture, which elevated it to a theorem. This is known as the CAP theorem.

CAP is a basic design consideration for distributed databases. For example, the designs of Zookeeper, Redis, HBase and others can all be described in terms of CAP trade-offs.

CAP definition

CAP stands for three properties of a distributed system:

Consistency. Whether the data on all distributed nodes is the same at a given moment.

Availability. Whether the system can keep responding to requests when some nodes have problems (data inconsistency, node failure).

Partition tolerance. Whether the system keeps working when the network between nodes is partitioned, that is, when nodes cannot communicate and their data may therefore diverge.

In-depth understanding

Suppose there are three distributed database nodes: A, B, and C.

When the data on A, B, and C is exactly the same, the system satisfies Consistency in the theorem.

If the data on A differs from that on B, but the overall service (A, B, and C as a whole) is not down and can still serve external systems, the system satisfies Availability in the theorem.

A distributed database cannot keep the data on every node consistent at every moment. Suppose a user updates a record on database A. Until that change has been replicated, the data on A is inconsistent with the data on B and C, and if the network between the nodes is interrupted (a partition), the inconsistency lasts as long as the interruption. This situation is unavoidable in distributed databases. A system that keeps operating in spite of it satisfies Partition tolerance.
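The window of inconsistency described above can be illustrated with a minimal Python sketch. The `Replica` class and the `is_consistent` and `replicate` helpers are hypothetical names invented for this illustration; real databases replicate over a network, asynchronously, and with failure handling that is left out here.

```python
class Replica:
    """One database node holding a simple key-value store (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def is_consistent(replicas):
    """Return True when every replica holds exactly the same data."""
    first = replicas[0].data
    return all(r.data == first for r in replicas[1:])


def replicate(source, targets):
    """Copy the source's data to each target (a stand-in for real replication)."""
    for target in targets:
        target.data = dict(source.data)


a, b, c = Replica("A"), Replica("B"), Replica("C")
print(is_consistent([a, b, c]))   # True: all three start empty

a.data["user:1"] = "alice"        # the update reaches A first
print(is_consistent([a, b, c]))   # False: B and C have not seen it yet

replicate(a, [b, c])              # replication catches up
print(is_consistent([a, b, c]))   # True again
```

Between the write and the call to `replicate`, the three nodes disagree. A network partition simply makes that window longer.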

Because partitions and the resulting inconsistency cannot be ruled out, a distributed system must satisfy partition tolerance; one that does not is not a reliable distributed system.

When data is inconsistent, the system can choose to put consistency first. In that case, the first thing it does is synchronize the data, and it pauses responses until synchronization finishes. Such a system satisfies CP.

If the system instead gives priority to availability, it gives up consistency while the data is inconsistent so that the overall system can keep serving requests. Such a system satisfies AP.

So, in general, a distributed system satisfies either CP or AP.
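The difference between the two choices can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular database works: the `Node` class, its `mode` switch, and the `partitioned` flag are all hypothetical, and a real system would detect partitions through timeouts or quorum checks rather than a flag.

```python
class Unavailable(Exception):
    """Raised when a CP node refuses to answer rather than risk stale data."""


class Node:
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode          # "CP" or "AP" (hypothetical policy switch)
        self.value = None
        self.partitioned = False  # True when cut off from the other nodes

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse to serve possibly stale data.
            raise Unavailable(f"{self.name} cannot confirm it has the latest value")
        # AP (or no partition): always answer with the local copy.
        return self.value


def write_all(nodes, value):
    """Apply a write to every node that is still reachable."""
    for node in nodes:
        if not node.partitioned:
            node.value = value


def demo(mode):
    a, b = Node("A", mode), Node("B", mode)
    write_all([a, b], "v1")
    b.partitioned = True        # the network splits; B is isolated
    write_all([a, b], "v2")     # only A receives the update
    try:
        return b.read()
    except Unavailable:
        return "unavailable"


print(demo("CP"))  # unavailable: B refuses to answer
print(demo("AP"))  # v1: B answers, but with a stale value
```

In CP mode the isolated node gives up availability; in AP mode it stays available but returns data that no longer matches node A.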

Can a system satisfy CA? Yes: when there is only one node, there is no partition to tolerate (no P), so CA is satisfied naturally.

Examples

As mentioned above, a distributed system must satisfy partition tolerance, so the real trade-off is between consistency and availability. Which trade-offs do common distributed systems make?

Zookeeper

Zookeeper guarantees CP. When the master node fails, Zookeeper elects a new master. During the election Zookeeper is unavailable, and clients must wait for the election to finish before the registration service is provided again. Zookeeper therefore does not satisfy availability when a node fails. In production environments with complex network conditions, such failures do happen from time to time. When they do, the components that depend on Zookeeper stall, which in a large system can easily trigger an avalanche of cascading failures. This is one reason some large projects do not choose Zookeeper as their service registry.

Eureka

Eureka guarantees AP. In Eureka, every node is equal, and the nodes register with each other. If several nodes go down, the remaining ones can still provide registration services (the proportion of failed nodes it tolerates can be configured). If a client finds that the Eureka node it is connected to is unavailable, it automatically switches to another available node. The cost is that the data a service reads may not be up to date, because the node it reaches may not have had time to synchronize the latest registrations. So when strong consistency is not required, Eureka is the more dependable choice as a registry.

Git

Git can also be seen as a distributed database, and it guarantees CP. The remote Git repository must stay consistent with the local repository, and when the two are inconsistent Git requires them to be brought back in line before work continues. When you modify code locally and try to push to the remote repository, the push is rejected if the remote HEAD contains commits your local branch does not have. You first pull to bring the remote HEAD into your local branch, and then push your local HEAD to the remote. In the end, the two histories agree.
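The push flow described above can be modelled with a small Python sketch. It is a deliberately simplified toy, not Git's actual implementation: history is a flat list of commit names, and the hypothetical `pull` helper replays local-only commits on top of the remote history (closer to a rebase than to a merge commit).

```python
class Rejected(Exception):
    """Raised when a push is not a fast-forward of the remote history."""


class Repo:
    def __init__(self, commits=()):
        self.commits = list(commits)   # linear history, oldest first

    @property
    def head(self):
        return self.commits[-1] if self.commits else None


def push(local, remote):
    """Accept the push only if local history extends the remote's history."""
    n = len(remote.commits)
    if local.commits[:n] != remote.commits:
        raise Rejected("remote contains work that you do not have locally")
    remote.commits = list(local.commits)


def pull(local, remote):
    """Toy pull: replay local-only commits on top of the remote history."""
    local_only = [c for c in local.commits if c not in remote.commits]
    local.commits = list(remote.commits) + local_only


remote = Repo(["c1", "c2"])      # a teammate already pushed c2
local = Repo(["c1", "mine"])     # local work based on c1

try:
    push(local, remote)
except Rejected as err:
    print("push rejected:", err)

pull(local, remote)               # bring the remote HEAD into the local history
push(local, remote)               # now the push is a fast-forward
print(remote.commits)             # ['c1', 'c2', 'mine']
print(local.head == remote.head)  # True
```

The rejected push is the consistency-first behaviour: the remote refuses the write until the local side has caught up.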

For more technical articles and wonderful practical information, please follow us.

Personal blog: zackku.com

WeChat official account: Zack says code
