[NoSQL] 01, NoSQL Foundation 04/27 Update SLTechnology News&Howtos


2025-04-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

I. ACID, Distributed Systems, CAP, and BASE Theory

Relational databases follow ACID rules

A transaction is a unit of work on the database, similar to a transaction in the real world. It has four properties:

A (Atomicity)

Atomicity is easy to understand: either all operations in a transaction are completed, or none of them are. A transaction succeeds only if every operation in it succeeds; if any one operation fails, the whole transaction fails and must be rolled back.

For example, a bank transfer of 100 yuan from account A to account B consists of two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. Either both steps are completed or neither is. If only the first step completes and the second fails, 100 yuan simply disappears.
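The transfer can be sketched with Python's built-in sqlite3 module, used here only as a convenient stand-in for a relational database. The table `accounts`, the function `transfer`, and the simulated failure (a missing destination account) are hypothetical. The point is that the first UPDATE is rolled back when the second step fails.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both UPDATEs commit together or neither does."""
    try:
        with conn:  # commits if the block finishes, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            if cur.rowcount != 1:
                # Step 2 failed (hypothetical failure: the target account does not exist).
                raise ValueError(f"no such account: {dst}")
    except ValueError:
        return False
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 0)])
conn.commit()

ok = transfer(conn, "A", "B", 100)      # both steps succeed
failed = transfer(conn, "A", "C", 100)  # step 2 fails, so step 1 is undone as well
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok, failed, balances)  # True False {'A': 400, 'B': 100}
```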

C (Consistency)

Consistency is also easy to understand: the database must always be in a consistent state, and executing a transaction must not violate the database's existing integrity constraints.

For example, given the integrity constraint a + b = 10, if a transaction changes a, it must also change b so that a + b = 10 still holds; otherwise the transaction fails.
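A minimal sketch of the a + b = 10 example, again assuming SQLite through Python's sqlite3 module. The table `pair` and the helper `update_pair` are hypothetical. Declaring the rule as a CHECK constraint lets the database itself reject any transaction that would break it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint a + b = 10 is part of the schema, so the database enforces it.
conn.execute(
    "CREATE TABLE pair (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, CHECK (a + b = 10))")
conn.execute("INSERT INTO pair VALUES (1, 4, 6)")
conn.commit()

def update_pair(new_a, new_b=None):
    """Change a (and optionally b) in one transaction; return True if it committed."""
    try:
        with conn:
            if new_b is None:
                conn.execute("UPDATE pair SET a = ? WHERE id = 1", (new_a,))
            else:
                conn.execute("UPDATE pair SET a = ?, b = ? WHERE id = 1", (new_a, new_b))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint would be violated
        return False

only_a = update_pair(3)   # 3 + 6 != 10: rejected, the row keeps a=4, b=6
both = update_pair(3, 7)  # 3 + 7 == 10: committed
row = conn.execute("SELECT a, b FROM pair").fetchone()
print(only_a, both, row)  # False True (3, 7)
```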

I (Isolation)

Isolation means that concurrent transactions do not affect each other. If one transaction reads data that another transaction is modifying, it is not affected by the other transaction's changes as long as that transaction has not committed.

For example, suppose a transaction is transferring 100 yuan from account A to account B. If B queries his account before that transaction has completed, he cannot see the newly added 100 yuan.

D (Durability)

Durability means that once a transaction is committed, its changes are permanently stored in the database and are not lost even if the system crashes.
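A small sketch of what durability promises, assuming SQLite with its default synchronous setting: a committed row survives closing the connection, while an uncommitted one does not. Closing without commit stands in for a crash here; the file and table names are hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "durable.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("INSERT INTO orders (item) VALUES ('book')")
conn.commit()  # after commit() returns, the change is stored in the database file
conn.execute("INSERT INTO orders (item) VALUES ('pen')")  # never committed
conn.close()   # closing without commit discards the pending insert, as a crash would

# A fresh connection (think: after a restart) sees only the committed row.
reopened = sqlite3.connect(path)
items = [row[0] for row in reopened.execute("SELECT item FROM orders ORDER BY id")]
print(items)  # ['book']
```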

Distributed systems

A distributed system consists of multiple computers and communicating software components connected through a computer network (local area network or wide area network).

A distributed system is a software system built on top of a network. Because this software layer hides the individual machines, a distributed system is highly cohesive and transparent to its users.

Thus, the difference between a network and a distributed system lies more in the high-level software (especially the operating system) than in the hardware.

Distributed systems can run on many platforms, such as PCs and workstations connected over local area networks or wide area networks.

Advantages of Distributed Computing

Reliability (fault tolerance):

An important advantage in distributed computing systems is reliability. A system crash on one server does not affect the rest of the servers.

Scalability:

More machines can be added to distributed computing systems as needed.

Resource sharing:

Shared data is essential for applications such as banking and reservation systems.

Flexibility:

Since the system is very flexible, it is easy to install, implement and debug new services.

Faster speed:

A distributed computing system can combine the computing power of multiple computers, making it faster than a single machine.

Open systems:

Because it is an open system, the service can be accessed locally or remotely.

Higher performance:

A cluster can provide higher performance (and a better price/performance ratio) than a centralized system.

Disadvantages of Distributed Computing

Troubleshooting:

Troubleshooting and diagnosing problems is more difficult.

Software:

Less software support is available, which is a major disadvantage of distributed computing systems.

Network:

Network infrastructure problems, including: transmission problems, high load, information loss, etc.

Security:

The openness of distributed computing systems brings problems such as data security and the risks of sharing data.

CAP theorem

In computer science, CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computing system to satisfy all three of the following:

Consistency

All nodes see the same data at the same time.

Availability

Every request receives a response, whether it succeeded or failed.

Partition tolerance

The system continues to operate even when messages are lost or part of the system fails.

The core of the CAP theorem is that a distributed system cannot satisfy consistency, availability, and partition tolerance at the same time; it can satisfy at most two of them.

Therefore, according to the CAP principle, NoSQL databases fall into three categories: CA, CP, and AP:

CA - Single-node clusters: systems that satisfy consistency and availability but generally scale poorly.

CP - Systems that satisfy consistency and partition tolerance; their performance is generally not particularly high.

AP - Systems that satisfy availability and partition tolerance; their consistency requirements are generally relaxed.
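The choice a system faces during a partition can be illustrated with a toy two-replica store. Everything here is hypothetical: the `Replica` class, its `mode` switch, and the string replies. A CP replica refuses the write so the copies stay identical, while an AP replica accepts it and lets the copies diverge. CA is not shown because it assumes partitions never happen, which in practice means a single node.

```python
class Replica:
    """Toy replica holding one value (hypothetical, for illustration only)."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP": what to do when the peer is unreachable
        self.value = None
        self.peer = None
        self.partitioned = False  # True while the network link to the peer is cut

    def write(self, value):
        if not self.partitioned:
            self.value = value
            self.peer.value = value  # normal case: replicate synchronously to the peer
            return "ok"
        if self.mode == "CP":
            # Keep the replicas identical by refusing the write (sacrifice availability).
            return "error: unavailable during partition"
        # AP: answer the client anyway; the replicas now disagree (sacrifice consistency).
        self.value = value
        return "ok (local only)"

results = {}
for mode in ("CP", "AP"):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    a.write("v1")                         # replicated to both while the network is fine
    a.partitioned = b.partitioned = True  # simulate a network partition
    reply = a.write("v2")
    results[mode] = (reply, a.value, b.value)
    print(mode, results[mode])
```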

BASE

BASE: Basically Available, Soft state, Eventually consistent. Defined by Eric Brewer.


BASE describes the relaxed availability and consistency requirements that NoSQL databases generally adopt:

Basically Available: the system remains available, at least partially, when failures occur

Soft state: soft state / soft transaction. "Soft state" can be understood as "connectionless", while "hard state" is "connection-oriented".

Eventual consistency: also called weak consistency; it is also the ultimate goal of ACID.

ACID vs BASE:

ACID: Atomicity, Consistency, Isolation, Durability

BASE: Basically Available, Soft state, Eventual consistency

ACID: strong consistency, isolation, a pessimistic and conservative approach, difficult to change

BASE: weak consistency, availability first, an optimistic approach, easy to adapt to change, simpler, faster
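To make eventual consistency concrete, here is a sketch of two replicas that accept writes independently and later converge through a background sync. The last-writer-wins rule, the logical timestamp counter, and all names are assumptions for illustration. Note that last-writer-wins silently drops one of two concurrent writes, which is why some systems track versions with vector clocks instead.

```python
import itertools

_clock = itertools.count(1)  # hypothetical source of increasing logical timestamps

class Replica:
    """Toy replica that stores key -> (timestamp, value) and syncs lazily."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        # The write is accepted locally at once (availability first), not coordinated.
        self.data[key] = (next(_clock), value)

    def get(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Anti-entropy: for every key keep the newer version (last writer wins).
        for key, entry in other.data.items():
            if key not in self.data or entry[0] > self.data[key][0]:
                self.data[key] = entry

r1, r2 = Replica(), Replica()
r1.put("cart", "book")
r2.put("cart", "pen")  # concurrent write arriving at another replica
diverged = (r1.get("cart"), r2.get("cart"))
r1.sync_from(r2)       # background synchronization in both directions
r2.sync_from(r1)
converged = (r1.get("cart"), r2.get("cart"))
print(diverged, converged)  # ('book', 'pen') ('pen', 'pen')
```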

Variants of eventual consistency:

Causal consistency

Read-your-writes consistency

Session consistency

Monotonic read consistency, timeline consistency (also known as monotonic write consistency)

Techniques for data consistency:

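NRW: quorum-based replication with N replicas, W write acknowledgments, and R replicas read

2PC: two-phase commit, a protocol that guarantees strong consistency

Paxos: a consensus algorithm for agreeing on a value among replicas

Vector clock: a logical clock used to track causal relationships between versions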
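In NRW-style quorum replication, N is the number of replicas of each item, W the number of replicas that must acknowledge a write, and R the number of replicas consulted on a read. When R + W > N, every read set overlaps every write set, so a read sees at least one copy of the latest write. The sketch below is a hypothetical in-memory model (replicas as dicts carrying a version number) that checks this overlap exhaustively and shows a fresh and a stale read.

```python
from itertools import combinations

def always_overlaps(n, r, w):
    """Exhaustively check that every R-replica read set meets every W-replica write set."""
    nodes = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(nodes, r)
               for ws in combinations(nodes, w))

N, W, R = 3, 2, 2
replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version, targets):
    """Store a new version on the replicas in `targets` (they acknowledge the write)."""
    for i in targets:
        replicas[i] = {"version": version, "value": value}

def read(targets):
    """Query the replicas in `targets` and return the value with the highest version."""
    newest = max((replicas[i] for i in targets), key=lambda rep: rep["version"])
    return newest["value"]

write("v1", version=1, targets=[0, 1])  # W = 2 replicas acknowledge
fresh = read([1, 2])                    # R = 2: overlaps the write at replica 1
stale = read([2])                       # R = 1: R + W = N, this read misses the write
print(always_overlaps(N, R, W), always_overlaps(N, 1, W), fresh, stale)  # True False v1 None
```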
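A minimal, single-process sketch of two-phase commit: in phase 1 the coordinator collects votes, and in phase 2 it tells everyone to commit only if all votes were yes. The `Participant` class and its `can_commit` flag are hypothetical. A real implementation also needs durable logs, timeouts, and recovery, and plain 2PC can block if the coordinator fails after participants have voted.

```python
class Participant:
    """A resource manager taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical: whether its local work succeeded
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local part of the transaction can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: voting
    decision = "committed" if all(votes) else "aborted"
    for p in participants:                         # phase 2: broadcast the decision
        if decision == "committed":
            p.commit()
        else:
            p.abort()
    return decision

good = [Participant("orders"), Participant("stock")]
bad = [Participant("orders"), Participant("stock", can_commit=False)]
good_result = two_phase_commit(good)
bad_result = two_phase_commit(bad)
print(good_result, [p.state for p in good])
print(bad_result, [p.state for p in bad])
```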
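Paxos lets a set of acceptors agree on a single value even when proposers compete. The following is a simplified single-decree sketch under strong assumptions: in-memory calls, no message loss, and unique proposal numbers supplied by the caller. Class and function names are hypothetical. It shows the key safety property: once a majority has accepted a value, a later proposer with a higher number adopts that value instead of its own.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0       # highest proposal number this acceptor has promised
        self.accepted_n = 0     # number of the proposal it accepted (0 = none yet)
        self.accepted_v = None  # value of that proposal

    def prepare(self, n):
        """Phase 1b: promise to ignore lower-numbered proposals, report any accepted value."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, value):
        """Phase 2b: accept unless a higher-numbered proposal has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, value
            return True
        return False

def propose(acceptors, reachable, n, value):
    """One proposer round with number n, talking only to acceptors in `reachable`.

    Returns the chosen value, or None if no majority was reached."""
    majority = len(acceptors) // 2 + 1
    promises = [acceptors[i].prepare(n) for i in reachable]  # phase 1a
    granted = [(an, av) for ok, an, av in promises if ok]
    if len(granted) < majority:
        return None
    highest_n, highest_v = max(granted, key=lambda p: p[0])
    if highest_n > 0:
        value = highest_v  # must re-propose the value accepted with the highest number
    acks = sum(acceptors[i].accept(n, value) for i in reachable)  # phase 2a
    return value if acks >= majority else None

acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, reachable=[0, 1], n=1, value="A")     # majority {0, 1} accepts "A"
second = propose(acceptors, reachable=[1, 2], n=2, value="B")    # learns "A" from acceptor 1
stale = propose(acceptors, reachable=[0, 1, 2], n=1, value="C")  # outdated number: rejected
print(first, second, stale)  # A A None
```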
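A vector clock maps each node to a counter. A node increments its own entry on each update, clocks are merged by taking the element-wise maximum, and comparing two clocks tells whether one version descends from the other or whether they are concurrent (a conflict). The dictionary representation and the node names x, y, z are assumptions for illustration.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced (a local update)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def merge(a, b):
    """Element-wise maximum: everything known by either clock."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def compare(a, b):
    """Return 'equal', 'before' (a happened before b), 'after', or 'concurrent'."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

v1 = increment({}, "x")             # version written on node x
v2 = increment(v1, "y")             # y updates v1, so v2 descends from v1
v3 = increment(v1, "z")             # z updates v1 independently
print(compare(v1, v2))              # before
print(compare(v2, v3))              # concurrent: a conflict to resolve
v4 = increment(merge(v2, v3), "x")  # x reconciles both branches
print(compare(v3, v4))              # before
```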

II. Overview of NoSQL

1. Introduction to NoSQL

1998: NoREL

2009: NoSQL officially proposed

NoSQL (Not Only SQL) means "more than SQL"; NoSQL is a family of technologies, not a specific technology.

Modern computing systems generate a large amount of data on the network every day, and much of it is handled by relational database management systems (RDBMS). E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks" made data modeling and application programming much easier. The relational model has proven well suited to client-server programming, with benefits far beyond what was expected, and today it is the dominant technology for structured data storage in web and business applications.

NoSQL is a completely new database revolution: the idea was proposed early on and gained momentum in 2009. NoSQL proponents advocate non-relational data stores, a fresh idea compared with the overwhelming dominance of relational databases.

What is NoSQL?

NoSQL refers to non-relational databases. NoSQL, sometimes read as Not Only SQL, is a generic term for database management systems that differ from traditional relational databases.

NoSQL is used to store very large-scale data. (Google and Facebook, for example, collect trillions of bits of data about their users every day.) These data stores do not require a fixed schema and can scale horizontally without extra operations.

Why use NoSQL ?

Today we can easily access and capture data through third-party platforms (e.g., Google and Facebook). Users' personal information, social networks, geolocation, user-generated content, and user action logs have multiplied. For mining this user data, SQL databases are no longer a good fit, whereas NoSQL databases were developed to handle such large data sets well.

2. The Big Data Problem (BigData, Massive Data)

1) Four kinds of big data management systems (storage)

Parallel database systems: traditional RDBMS with horizontal partitioning and partitioned queries

NoSQL database management systems: non-relational, distributed, and not following the ACID database design paradigm

NewSQL database management systems: attempt to provide ACID guarantees on a distributed architecture

Why is SQL so hard to distribute?

Logging: logging is an obstacle in distributed systems

Locks: pessimistic concurrency control uses a large number of locks; while one process is using a resource, any other process that needs it must wait

Buffer management: how buffers on multiple nodes exchange data

Open-source solutions:

Clustrix, GenieDB, ScaleArc, ScaleBase, NimbusDB, Drizzle

Cloud Data Management System:

2) Big data analysis and processing

MapReduce:
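MapReduce splits processing into a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that combines each group. The word-count sketch below runs all three steps in one Python process, whereas a real framework such as Hadoop runs map and reduce tasks in parallel across a cluster. The function names and input are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values for one key.
    return key, sum(values)

splits = ["NoSQL is not only SQL", "SQL is relational"]
intermediate = chain.from_iterable(map_phase(doc) for doc in splits)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
print(counts)
```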

Technical characteristics of NoSQL

Non-relational, distributed (CAP and BASE theory), no ACID guarantees

Simple data models (e.g. Key-Value)

Separation of metadata and application data

Metadata is used for data management and requires a dedicated metadata management node

Weak consistency, with eventual consistency achieved through technical solutions

High throughput

Higher scalability, with clusters built from low-end hardware

Advantages of NoSQL

Avoids unnecessary complexity

No object-relational mapping needed

Disadvantages of NoSQL

Data models and query languages lack a rigorous mathematical foundation

ACID transactions are not supported

Limited functionality

There is no unified query language (like SQL for relational databases)

III. Data Storage Models

NoSQL databases fall into different families according to their data storage model.

NoSQL database classification by type, with representative products and features:

Column storage
Representative products: HBase, Cassandra, Hypertable
Features: As the name suggests, data is stored by column. Its biggest strengths are that it conveniently stores structured and semi-structured data, compresses data well, and offers a large I/O advantage for queries on one or a few columns.

Document storage
Representative products: MongoDB, CouchDB
Features: Documents are generally stored in a JSON-like format. This makes it possible to index certain fields and implement some features of relational databases.

Key-value storage
Representative products: Tokyo Cabinet / Tyrant, Berkeley DB, MemcacheDB, Redis
Features: A value can be found quickly by its key. The store generally does not care about the format of the value and accepts it as-is. (Redis provides additional features.)

Graph storage
Representative products: Neo4j, FlockDB
Features: Optimized for storing graph relationships; solving such problems with a traditional relational database gives low performance and an awkward design.

Object storage
Representative products: db4o, Versant
Features: The database is manipulated through syntax similar to object-oriented languages, and data is accessed as objects.

XML database
Representative products: Berkeley DB XML, BaseX
Features: Efficient storage of XML data, with support for queries within XML documents.

Column model:

Application scenario: distributed data storage supporting random reads and writes on top of a distributed file system

Typical products: HBase, Hypertable, Cassandra

Data model: column-centric; data from the same column is stored together

Advantages: fast queries, high scalability, easy distributed scaling
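A sketch of the column-oriented idea: the same records laid out column by column, so that a query on one column (here, average age) touches only that column's values, and runs of repeated values compress well. The data and helper names are hypothetical; real systems such as HBase group columns into column families and are far more involved.

```python
rows = [
    {"id": 1, "name": "Alice", "city": "Beijing", "age": 30},
    {"id": 2, "name": "Bob", "city": "Beijing", "age": 25},
    {"id": 3, "name": "Carol", "city": "Shanghai", "age": 35},
]

# Column layout: all values of one column are stored together.
columns = {col: [row[col] for row in rows] for col in rows[0]}

# A query on one column reads only that column's values instead of whole rows.
ages = columns["age"]
average_age = sum(ages) / len(ages)

def run_length_encode(values):
    """Compress runs of equal adjacent values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

city_runs = run_length_encode(columns["city"])
print(average_age, city_runs)  # 30.0 [('Beijing', 2), ('Shanghai', 1)]
```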

Document model:

Application scenario: web applications without strict transaction requirements

Typical products: MongoDB, ElasticSearch, CouchDB, Couchbase Server

Data model: key-value model in which values are stored as documents

Advantages: Data models do not need to be defined in advance
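The schema-free storage and field indexing of document stores can be sketched with a toy in-memory store. `DocumentStore`, `create_index`, and `find` are hypothetical names, not the API of MongoDB or CouchDB. Documents in the same store may have different fields, and an index on a field replaces a full scan with a direct lookup.

```python
from collections import defaultdict

class DocumentStore:
    """Toy in-memory document store with optional secondary indexes (hypothetical API)."""

    def __init__(self):
        self.docs = {}     # _id -> document (a JSON-like dict)
        self.indexes = {}  # field -> {value: set of _ids}
        self.next_id = 1

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc):
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():  # keep existing indexes up to date
            if field in doc:
                index[doc[field]].add(doc_id)
        return doc_id

    def find(self, field, value):
        if field in self.indexes:  # indexed: look up matching ids directly
            ids = self.indexes[field].get(value, set())
        else:                      # not indexed: scan every document
            ids = {i for i, d in self.docs.items() if d.get(field) == value}
        return [self.docs[i] for i in sorted(ids)]

store = DocumentStore()
store.insert({"name": "Alice", "city": "Beijing", "tags": ["admin"]})
store.insert({"name": "Bob", "city": "Shanghai"})  # no "tags": no fixed schema required
store.create_index("city")
store.insert({"name": "Carol", "city": "Beijing"})
beijing = [d["name"] for d in store.find("city", "Beijing")]
print(beijing)  # ['Alice', 'Carol']
```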

Key-value model:

Application scenario: content caching for high-load scenarios with heavily parallel data access

Typical products: DynamoDB, Riak, Redis

Data model: key-value pairs implemented with a hash table

Advantages: fast lookups

Graph model:

Application scenarios: social networks, recommendation systems, relationship graphs

Typical products: Neo4j, InfiniteGraph

Data model: graph structure

Advantages: suited to graph computation scenarios
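Graph stores keep each node's relationships directly with the node, so traversals such as "friends of friends" in a recommendation system need no relational joins. A sketch using a plain adjacency-list dictionary and breadth-first search; the `follows` data and the function names are hypothetical, and this is not Neo4j's API or query language.

```python
from collections import deque

# Adjacency list: each node maps directly to the nodes it follows.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first search: nodes reachable from start in 1..max_hops edges, with distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return {n: d for n, d in seen.items() if n != start}

def recommend(graph, user):
    """Friends of friends that the user does not already follow directly."""
    reach = within_hops(graph, user, 2)
    return sorted(n for n, d in reach.items() if d == 2)

print(recommend(follows, "alice"))  # ['dave', 'erin']
```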
