About five or six years ago I first came into contact with NoSQL, which was already a hot topic at the time. But back then I had only learned to use MySQL; NoSQL was still new to me. I never actually used it and did not really understand it, though I thought it was impressive. What left the deepest impression was this picture (I later found via Google that it comes from here):
This picture shows the relationship between databases (both traditional relational databases and NoSQL) and CAP theory. Having no practical experience with NoSQL and no deep understanding of it, I knew even less about CAP theory, so it was not clear to me why a particular database was placed in a particular camp.
After starting work I used MongoDB quite a lot and gained some understanding of it. Some time ago I came across this picture again and wanted to find out whether MongoDB really belongs to the CP camp, and why. The reason for my skepticism is that MongoDB's classic (officially recommended) deployment architecture uses replica sets, and a replica set provides high availability (Availability) through redundancy and automatic failover, so why would MongoDB sacrifice availability? I searched MongoDB's official documentation for "CAP" and found nothing, so I wanted to work this question out and give myself an answer.
This article first clarifies what CAP theory is, along with some articles about it, and then discusses the trade-off MongoDB makes between consistency and availability.
Original article: http://www.cnblogs.com/xybaby/p/6871764.html
CAP theory
As for CAP theory, I only knew what the three letters stood for, and my explanations came from articles on the Internet, which are not necessarily accurate. So first of all we should go back to the source and find the origin and precise statement of the theory. I think the best place to start is Wikipedia, which gives a fairly accurate introduction and, more importantly, many useful links, such as the origin of CAP theory and how it has developed.
The CAP theorem states that a distributed data store can satisfy at most two of the following three properties at the same time: consistency (C, Consistency), availability (A, Availability) and partition tolerance (P, Partition tolerance).
Consistency means that every read operation either returns the most recently written data or returns an error.
Availability means that every request receives a timely, non-error response, but there is no guarantee that the response is based on the most recently written data.
Partition tolerance means that the system as a whole continues to provide service (either consistency or availability) even when messages between nodes are dropped or delayed because of network problems.
Consistency and availability are both very broad terms whose precise meaning differs with the context. For example, in cap-twelve-years-later-how-the-rules-have-changed, Brewer points out that "consistency in CAP is not the same problem as consistency in ACID". Therefore, unless otherwise stated, consistency and availability in the rest of this article refer to their definitions in CAP theory; a discussion is meaningful only when everyone is clearly in the same context.
For distributed systems, network partitions are unavoidable, and there is always some delay in replicating data between nodes. If consistency must be guaranteed (every read returns the most recently written data), then the system is bound to be unavailable (unreadable) for some period of time, that is, availability is sacrificed; and vice versa.
According to Wikipedia, the ideas behind CAP first appeared around 1998. Brewer presented the CAP conjecture at PODC (Symposium on Principles of Distributed Computing) in 2000, and in 2002 two other scientists, Seth Gilbert and Nancy Lynch, proved it, turning the conjecture into a theorem [4].
The origin of CAP theory
In Towards Robust Distributed Systems, Brewer, the originator of CAP theory, pointed out that in distributed systems computation is relatively easy; the real difficulty is maintaining state. For distributed storage or data-sharing systems, the hard part is guaranteeing data consistency. Traditional relational databases put consistency before availability, which is why the ACID properties of transactions were proposed. Many distributed storage systems instead emphasize availability over consistency, which is captured by BASE (Basically Available, Soft state, Eventual consistency). The following figure shows the difference between ACID and BASE:
In short, BASE preserves service availability as much as possible through eventual consistency. Note the last sentence in the figure, "But I think it's a spectrum": ACID and BASE are a matter of degree, not two opposing extremes.
In 2002, in Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, the two authors proved the CAP conjecture using an asynchronous network model, upgrading Brewer's conjecture to a theorem. To be honest, though, I did not fully understand the paper.
In a 2009 article, brewers-cap-theorem, the author gives a relatively simple proof:
As shown in the figure above, two nodes N1 and N2 store the same data V, whose current state is V0. A safe, reliable write algorithm A runs on node N1, and an equally reliable read algorithm B runs on node N2; that is, N1 handles writes and N2 handles reads. Data written to N1 is also automatically synchronized to N2, and the synchronization message is called M. If a partition occurs between N1 and N2, there is no guarantee that message M reaches N2 within a bounded time.
Looking at the problem from a transactional point of view
Transaction α consists of operations α1 and α2, where α1 writes the data and α2 reads it. On a single node it is easy to ensure that α2 reads the data written by α1. In the distributed case there is no such guarantee unless the timing of α2 can be controlled, but any such control (blocking, centralizing the data, and so on) either destroys partition tolerance or loses availability.
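To make the argument concrete, here is a minimal Python sketch of the N1/N2 scenario above. It is my own illustration rather than anything from the cited papers; only the node names and the message M follow the figure.

```python
# Sketch of the N1/N2 proof scenario: N1 accepts the write, the sync message M
# is lost during a partition, and N2 must either answer with stale data
# (giving up C) or refuse to answer (giving up A).

class Node:
    def __init__(self, value):
        self.value = value

def send_m(src, dst, partitioned):
    """Deliver the synchronization message M unless the network is partitioned."""
    if not partitioned:
        dst.value = src.value

n1, n2 = Node("V0"), Node("V0")

# Transaction alpha: alpha1 writes on N1, alpha2 reads on N2.
n1.value = "V1"                    # alpha1 succeeds on N1
send_m(n1, n2, partitioned=True)   # M never reaches N2 while partitioned

# alpha2 on N2: staying available returns the stale value "V0"; preserving
# consistency would require returning an error until the partition heals.
print(n2.value)                    # -> V0
```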
The article also points out that in many cases availability is more important than consistency; for sites like Facebook and Google, even temporary unavailability means huge losses.
In a 2010 article, brewers-cap-theorem-on-distributed-systems, the author uses three examples to illustrate CAP: example 1, a single MySQL instance; example 2, two MySQL instances, each storing a different subset of the data (similar to sharding); example 3, two MySQL instances A and B, where an insert on A is considered complete only after it has also executed successfully on B (similar to a replica set). The author argues that example 1 and example 2 can guarantee strong consistency but not availability, while in example 3, because partitions can occur, a trade-off has to be made between consistency and availability.
In my opinion, CAP theory is best discussed under the premise of a distributed storage system, where availability means the availability of an individual node in the distributed system rather than of the overall service. From that perspective, the examples above do not seem entirely appropriate.
The development of CAP theory
In 2012 Brewer, the originator of CAP theory, wrote another article on it, "CAP Twelve Years Later: How the 'Rules' Have Changed". The article is long but clear and written from a commanding vantage point, and it is well worth reading. There is also a decent Chinese translation, "CAP Theory Twelve Years Later: The Rules Have Changed".
The main point of the article is that CAP theory does not mean two of the three must be given up permanently. First, although any distributed system may experience partitions, partitions are rare (otherwise the network or hardware needs to be improved), so CAP allows perfect C and A most of the time; only while a partition exists does a trade-off between C and A have to be made. Second, consistency and availability are a matter of degree rather than 0 or 1: availability can vary continuously from 0 to 100%, and consistency has many levels (for example, in Cassandra the consistency level can be configured). Therefore, the goal of contemporary CAP practice should be to maximize both data consistency and availability, within reasonable bounds, for the specific application.
The article also points out that a partition is a relative notion: when a communication deadline is exceeded, that is, when the system cannot reach data consistency within the time limit, a partition has effectively occurred, and a choice between C and A must be made for the current operation.
Because system availability is usually the primary goal in terms of revenue and contractual requirements, availability is typically optimized with caches or by checking and replaying update logs afterwards. Accordingly, when the designer chooses availability, the invariants broken during the partition must be restored once the partition ends.
In practice, most teams assume that there are no partitions within a single data center (a single location), so within one data center CA can be chosen; before CAP theory emerged, systems defaulted to this design assumption, including traditional databases.
During a partition, independent, self-sufficient groups of nodes can continue to operate, but there is no guarantee that global invariants are preserved. An example is data sharding: the designer splits the data across partition nodes in advance, and during a partition most individual shards can keep operating. Conversely, if the partitioned state is intrinsically intertwined, or if some global invariants must be maintained, then at best only one side of the partition can operate, and at worst no operation is possible at all.
The underlined part of the excerpt above closely resembles MongoDB's sharding. In MongoDB's sharded cluster mode, shards do not need to communicate with each other under normal circumstances.
In a 2013 article, better-explaining-cap-theorem, the author points out that "it is really just A vs C!", because:
(1) availability is in practice achieved by replicating data across different machines;
(2) consistency requires several nodes to be updated simultaneously before the data can be read;
(3) temporary partitions, that is, communication delays between nodes, can always happen, so a trade-off between A and C is required; but the trade-off only needs to be considered when a partition actually occurs.
In a distributed system network partitions are bound to occur, so "it is really just A vs C!"
MongoDB and CAP
The article "getting to know MongoDB by creating sharded cluster step by step" introduces MongoDB's features, including high performance, high availability and scalability (horizontal scaling), where MongoDB's high availability relies on replica set replication and automatic failover. MongoDB can be deployed in three modes: standalone, replica set and sharded cluster. The previous article described in detail how to build a sharded cluster.
A standalone is a single mongod to which the application connects directly; in this case there is no partition tolerance, and consistency is necessarily strong. In a sharded cluster, each shard is also recommended to be a replica set. Each shard in MongoDB maintains a separate subset of the data, so partitions between shards have little effect (they may still affect chunk migration), and the main concern is partitions within a shard's replica set. Therefore, this article's discussion of MongoDB's consistency and availability focuses on MongoDB replica sets.
In a replica set there is exactly one primary node, which accepts both write and read requests, while the secondary nodes accept only read requests. This is a single-write, multi-read setup, which is much simpler than a multi-write, multi-read one. For the discussion that follows, we also assume that the replica set consists of three nodes, one primary and two secondaries, all of which persist data (data-bearing).
MongoDB's trade-off between consistency and availability depends on three settings: write concern, read concern and read preference. The following discussion is based on MongoDB 3.2, because read concern was introduced in that version.
Write concern:
Write concern indicates under what conditions MongoDB acknowledges a write operation to the client. It consists of the following three fields (a driver-level sketch follows at the end of this subsection):
{ w: <value>, j: <boolean>, wtimeout: <number> }
w: the write is acknowledged to the client only after it has been processed by <value> MongoDB instances. Possible values:
1: the default; the write is acknowledged after the data has been written to a standalone mongod or to the primary of the replica set.
0: the write is acknowledged immediately without waiting for it to be applied; this gives high performance, but data may be lost. It can, however, be combined with j:true to improve durability.
> 1: only meaningful in a replica set environment; if the value is larger than the number of nodes in the replica set, the write may block.
'majority': the write is acknowledged after the data has been written to a majority of the replica set's nodes. In this case it is usually used together with read concern:
After the write operation returns with a w: "majority" acknowledgement to the client, the client can read the result of that write with a "majority" readConcern
j: the write is acknowledged to the client only after it has been written to the journal. The default is false. Two points to note:
If j:true is used with a MongoDB instance that does not have journaling enabled, an error is reported.
Starting with MongoDB 3.2, for w > 1, all the instances involved must write to the journal before the write is acknowledged.
wtimeout: the timeout for the write, in milliseconds; if the write cannot be acknowledged to the client within the specified time (meaningful when w is greater than 1), an error is returned.
The default is 0, which means the option is not set.
MongoDB 3.4 adds the writeConcernMajorityJournalDefault option, which changes how different combinations of w and j behave.
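As a concrete illustration, here is a minimal sketch of setting these write concern fields from a driver. It uses pymongo; the connection string, database and collection names are placeholders of my own, not anything from the original article.

```python
from pymongo import MongoClient, WriteConcern

# Placeholder URI for a three-node replica set (one primary, two secondaries).
client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")
db = client["testdb"]

# w:1 (default): acknowledged once the primary has applied the write.
default_wc = db.get_collection("orders", write_concern=WriteConcern(w=1))
default_wc.insert_one({"sku": "a1", "qty": 3})

# w:0: fire-and-forget; fastest, but the write may be silently lost.
unacked = db.get_collection("orders", write_concern=WriteConcern(w=0))
unacked.insert_one({"sku": "a2", "qty": 7})

# w:"majority", j:True, wtimeout:5000: acknowledged only after a majority of
# data-bearing nodes have journaled the write, or an error after 5 seconds.
durable = db.get_collection(
    "orders",
    write_concern=WriteConcern(w="majority", j=True, wtimeout=5000),
)
durable.insert_one({"sku": "a3", "qty": 1})
```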
Read preference:
As explained earlier, a replica set consists of one primary and several secondaries. The primary accepts writes, so its data is always the newest, while the secondaries apply writes via the oplog, so their data lags behind to some degree. For queries that are not very sensitive to freshness, reading from secondary nodes reduces the load on the cluster.
MongoDB points out that read preference is very flexible and can be chosen to fit the situation. MongoDB drivers support the following read preference modes (a short driver-level sketch follows the notes below):
primary: the default mode; all reads are routed to the primary of the replica set.
primaryPreferred: reads normally go to the primary and are routed to a secondary only when the primary is unavailable (failover).
secondary: all reads are routed to the secondaries of the replica set.
secondaryPreferred: reads normally go to a secondary and are routed to the primary only when no secondary is available.
nearest: reads go to the node with the lowest latency, whether it is a primary or a secondary. For distributed applications with MongoDB deployed across multiple data centers, nearest gives the best data locality.
When using secondary or secondaryPreferred, be aware of the following:
(1) Because of replication lag, the data read may not be the newest, and different secondaries may return different data.
(2) For sharded collections, where the balancer is enabled by default, a secondary may return missing or duplicated documents because of unfinished or abnormally terminated chunk migrations.
(3) When there are multiple secondary nodes, the one chosen is simply the "closest", that is, the node with the lowest average latency; see the Server Selection Algorithm for details.
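For illustration, here is a minimal pymongo sketch of choosing a read preference per collection handle, assuming the same placeholder three-node replica set as above.

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")
db = client["testdb"]

# Default: route every read to the primary.
primary_reads = db.get_collection("orders", read_preference=ReadPreference.PRIMARY)

# Offload reads to secondaries; results may lag behind the primary.
secondary_reads = db.get_collection(
    "orders", read_preference=ReadPreference.SECONDARY_PREFERRED
)

# Read from the lowest-latency member, whether primary or secondary.
nearest_reads = db.get_collection("orders", read_preference=ReadPreference.NEAREST)

doc = secondary_reads.find_one({"sku": "a1"})  # may return stale data
```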
Read-concern:
Read concern is a new feature added in MongoDB 3.2 that specifies what kind of data is returned for a replica set (including the shards of a sharded cluster, which are themselves replica sets). Different storage engines support read concern to different degrees.
Read concern has the following three levels:
local: the default; returns the newest data on the current node, where the current node depends on the read preference.
majority: returns the newest data that has been confirmed as written to a majority of the nodes. Using this option requires the WiredTiger storage engine with replica set protocol version 1, and the --enableMajorityReadConcern option specified when starting the MongoDB instance.
linearizable: introduced in version 3.4 and skipped here; interested readers can refer to the documentation.
The documentation contains this sentence:
Regardless of the read concern level, the most recent data on a node may not reflect the most recent version of the data in the system.
That is, even with readConcern: majority, the data returned is not necessarily the newest in the system; this is not the same thing as the NWR (quorum) model. The fundamental reason is that the value finally returned comes from a single MongoDB node, and which node that is depends on the read preference.
This article describes in detail why readConcern was introduced and how it is implemented; only the core point is quoted here:
The original purpose of readConcern was to solve the "dirty read" problem: for example, a user reads a piece of data from MongoDB's primary, but the data has not yet been replicated to a majority of nodes when the primary fails; after recovery, the primary rolls back the data that never reached a majority, so the user has read "dirty data".
With readConcern level majority, the data a user reads is guaranteed to have been written to a majority of nodes; such data will not be rolled back, which avoids the dirty read problem.
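The following is a minimal pymongo sketch of the combination described above, again assuming the placeholder replica set: writing with w:"majority" so the document cannot be rolled back, and reading with readConcern "majority" so the value returned is also majority-committed.

```python
from pymongo import MongoClient, WriteConcern
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")
db = client["testdb"]

# Write with w:"majority": acknowledged only after a majority has the data,
# so the insert below will not be rolled back after a failover.
writer = db.get_collection("accounts", write_concern=WriteConcern(w="majority"))
writer.insert_one({"_id": 1, "balance": 100})

# Read with readConcern:"majority": the returned document is guaranteed to be
# majority-committed, i.e. it cannot be a "dirty read" that later disappears.
reader = db.get_collection("accounts", read_concern=ReadConcern("majority"))
print(reader.find_one({"_id": 1}))
```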
Consistency or availability?
Recall the definitions of consistency and availability in CAP theory:
Consistency means that every read operation either returns the most recently written data or returns an error.
Availability means that every request receives a timely, non-error response, but there is no guarantee that the response is based on the most recently written data.
As mentioned earlier, the discussion of consistency and availability in this article is based on replica sets, and it applies whether or not a sharded cluster is used. In addition, the discussion assumes a single client; with multiple clients it becomes more of an isolation question, which falls outside the scope of CAP theory. Based on the discussion of write concern, read concern and read preference above, the following conclusions can be drawn.
With the defaults (w: 1, readConcern: local), if the read preference is primary, reads return the latest data and consistency is strong, but if the primary fails at that moment an error is returned and availability is not guaranteed.
With the defaults (w: 1, readConcern: local), if the read preference is secondary (secondaryPreferred, primaryPreferred), reads may return stale data but a response comes back immediately, so availability is better.
writeConcern: majority guarantees that written data will not be rolled back; readConcern: majority guarantees that the data read will not be rolled back.
However, even when reading from the primary, there is no guarantee of getting the latest data, so this by itself is weak consistency.
With (w: majority, readConcern: majority), reading from the primary is guaranteed to return the latest data, and that data will not be rolled back, but write availability is poorer; reading from a secondary cannot guarantee the latest data, so consistency is weak.
Looking back, the high availability MongoDB speaks of is availability in the more general sense: through data replication and automatic failover, the cluster as a whole can recover and keep working within a short time even when a physical failure occurs, and the recovery needs no manual intervention. In that sense it is indeed highly available.