2025-03-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Shulou(Shulou.com)06/01 Report--
I. ACID, Distributed Systems, CAP, and BASE Theory
Relational databases follow ACID rules
A database transaction is similar to a transaction in the real world. It has four properties:
A (Atomicity)
Atomicity means that the operations in a transaction either all complete or none do. A transaction succeeds only if every operation in it succeeds; if any one operation fails, the whole transaction fails and is rolled back.
For example, transferring 100 yuan from account A to account B takes two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. Both steps must complete together or not at all. If only the first step completed and the second failed, 100 yuan would vanish without explanation.
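The transfer example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the `accounts` table, the `make_bank`, `balances`, and `transfer` helpers, and the starting balances are all assumptions made for the example.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 200)])
    conn.commit()
    return conn

def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                # Step 2 failed, so step 1 must be undone as well.
                raise ValueError(f"unknown account: {dst}")
        return True
    except ValueError:
        return False
```

A transfer to an account that does not exist fails in the second step, and the withdrawal from the first step is rolled back with it, so no money disappears.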
C (Consistency)
Consistency means that the database always moves from one consistent state to another: a transaction never violates the integrity constraints defined on the database.
For example, given the integrity constraint a+b=10, a transaction that changes a must also change b so that a+b=10 still holds afterwards; otherwise the transaction fails.
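The a+b=10 rule can be expressed as a CHECK constraint. The sketch below uses sqlite3; the `pair` table and the helper names are assumptions for illustration. SQLite evaluates CHECK constraints per statement, so in this sketch a and b have to change in the same UPDATE.

```python
import sqlite3

def make_pair_db():
    """Create a one-row table whose CHECK constraint enforces a + b = 10."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pair (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, "
        "CHECK (a + b = 10))"
    )
    conn.execute("INSERT INTO pair VALUES (1, 4, 6)")
    conn.commit()
    return conn

def try_update(conn, sql):
    """Run one UPDATE in a transaction; return False if a constraint rejects it."""
    try:
        with conn:  # rolls back automatically when the constraint fails
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

def current_pair(conn):
    """Return the stored (a, b) tuple."""
    return conn.execute("SELECT a, b FROM pair WHERE id = 1").fetchall()[0]
```

Changing only a is rejected and leaves the row untouched, while changing a and b together so that they still sum to 10 is accepted.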
I (Isolation)
Isolation means that concurrent transactions do not affect each other. If one transaction reads data that another transaction is modifying, it does not see the other transaction's changes until that transaction commits.
For example, suppose a transaction is transferring 100 yuan from account A to account B. If B queries his account before this transaction has committed, he does not see the additional 100 yuan.
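A rough way to observe this is to open two sqlite3 connections to the same database file: one performs the transfer, the other plays the role of B checking his balance. The file name, table, balances, and function name are assumptions for the example, and the behavior shown is that of SQLite's default journal mode; other databases offer several isolation levels with different rules.

```python
import os
import sqlite3
import tempfile

def read_b_during_transfer():
    """Return B's balance as seen by a second connection before and after commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute(
                "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)"
            )
            writer.executemany(
                "INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 200)]
            )
            writer.commit()

            query = "SELECT balance FROM accounts WHERE name = 'B'"
            # The transfer starts but is not committed yet.
            writer.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'A'")
            writer.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'B'")
            before_commit = reader.execute(query).fetchall()[0][0]

            writer.commit()
            after_commit = reader.execute(query).fetchall()[0][0]
        finally:
            reader.close()
            writer.close()
    return before_commit, after_commit
```

The second connection reads the old balance while the transfer is still in progress and the new balance only after the commit.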
D (Durability)
Durability means that once a transaction is committed, its changes are stored permanently in the database and are not lost even if the system crashes or loses power.
Distributed systems
A distributed system consists of multiple computers and communicating software components connected through a computer network (a local area network or a wide area network).
Distributed systems are software systems built on top of networks. Because they are implemented in software, they are highly cohesive and transparent.
Thus, the difference between a network and a distributed system lies more in the high-level software (especially the operating system) than in the hardware.
Distributed systems can run on different platforms, such as PCs, workstations, local area networks, and wide area networks.
Advantages of Distributed Computing
Reliability (fault tolerance):
An important advantage in distributed computing systems is reliability. A system crash on one server does not affect the rest of the servers.
Scalability:
More machines can be added to distributed computing systems as needed.
Resource sharing:
Shared data is essential for applications such as banking and reservation systems.
Flexibility:
Since the system is very flexible, it is easy to install, implement and debug new services.
Faster speed:
A distributed computing system can have the computing power of multiple computers, making it faster than other systems.
Open systems:
Because it is an open system, the service can be accessed locally or remotely.
Higher performance:
A cluster can provide higher performance (and a better price/performance ratio) than a centralized computer system.
Disadvantages of Distributed Computing
Troubleshooting:
Troubleshooting and diagnosing problems is harder, because a fault may involve several machines and the network between them.
Software:
Less software support is a major disadvantage of distributed computing systems.
Network:
Network infrastructure problems, including: transmission problems, high load, information loss, etc.
Security:
Because they are open systems, distributed computing systems face risks around data security and data sharing.
CAP theorem
In computer science, the CAP theorem, also known as Brewer's theorem, states that a distributed computing system cannot simultaneously guarantee all three of the following:
Consistency
All nodes see the same data at the same time
Availability
Every request receives a response, whether it succeeds or fails
Partition tolerance
The system continues to operate even when messages between nodes are lost or part of the system fails
The core of the CAP theorem is that a distributed system cannot satisfy consistency, availability, and partition tolerance all at once; it can satisfy at most two of them at the same time.
Therefore, according to the CAP theorem, NoSQL databases can be divided into three categories, CA, CP, and AP:
CA - Single-site clusters: systems that satisfy consistency and availability, and are generally less scalable.
CP - Systems that satisfy consistency and partition tolerance, and generally do not have particularly high performance.
AP - Systems that satisfy availability and partition tolerance, and generally have weaker consistency requirements.
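The difference between CP and AP behavior during a network partition can be shown with a toy model. This is a deliberately simplified sketch: the `Replica` class, the two-node setup, and the `partitioned` flag are assumptions for illustration and do not correspond to any real database's implementation.

```python
class Replica:
    """One of two replicas that copy every write to each other."""

    def __init__(self, mode):
        self.mode = mode           # "CP" or "AP"
        self.value = None
        self.peer = None
        self.partitioned = False   # True while the peer is unreachable

    def write(self, value):
        """Apply a write; return True if it was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write (availability is lost).
                return False
            # AP: stay available by accepting locally (replicas now disagree).
            self.value = value
            return True
        # No partition: update both replicas.
        self.value = value
        self.peer.value = value
        return True

def make_pair(mode):
    """Create two connected replicas running in the given mode."""
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b

def set_partition(a, b, cut):
    """Cut or restore the link between the two replicas."""
    a.partitioned = b.partitioned = cut
```

With the link cut, the CP pair rejects the write and both replicas keep the same value, while the AP pair accepts it and the two replicas diverge until they are reconciled.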
BASE
BASE: Basically Available, Soft state, Eventually consistent. Defined by Eric Brewer.
BASE describes the weaker availability and consistency guarantees that NoSQL databases generally accept:
Basically Available - the system stays available for most requests, even when part of it has failed.
Soft state - also called soft transactions. "Soft state" can be understood as "connectionless," while "hard state" is "connection-oriented."
Eventual consistency - replicas converge to the same value over time (a form of weak consistency); this is also what ACID ultimately aims for.
ACID vs BASE
ACID | BASE
Atomicity | Basically Available
Consistency | Soft state
Isolation | Eventual consistency
Durability |
ACID: Strong consistency, isolation, pessimistic conservative approach, difficult to change
BASE: Weak consistency, availability first, optimistic approach, easy to adapt to change, simpler, faster
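One common way to reach eventual consistency is for replicas to exchange state periodically and keep the newest version of each key (last write wins). The sketch below assumes each value carries a timestamp that can be compared across replicas; the `merge` and `anti_entropy` names and the data layout are illustrative assumptions, not a specific product's protocol.

```python
def merge(local, remote):
    """Merge two replica states of the form {key: (timestamp, value)}.

    For every key, the entry with the larger timestamp wins.
    """
    merged = dict(local)
    for key, (timestamp, value) in remote.items():
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, value)
    return merged

def anti_entropy(replica_a, replica_b):
    """Synchronize two replicas in place so that both hold the merged state."""
    merged = merge(replica_a, replica_b)
    for replica in (replica_a, replica_b):
        replica.clear()
        replica.update(merged)
```

Between synchronization rounds the replicas may return different answers (weak consistency); after a round with no new writes they agree again.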
Variants of eventual consistency:
Causal consistency
Read-your-writes consistency
Session consistency
Monotonic read consistency
Timeline consistency (also known as monotonic write consistency)
Techniques for data consistency:
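NRW: with N replicas, a write must be acknowledged by W of them and a read must consult R of them; choosing R + W > N ensures that every read overlaps the most recent write
2PC: two-phase commit, a protocol that guarantees strong consistency
Paxos: a consensus protocol
Vector clock: a per-node vector of counters used to order updates and detect concurrent ones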
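Two of the techniques listed above are small enough to sketch directly. In the NRW scheme, N is the number of replicas, W the number of replicas that must acknowledge a write, and R the number consulted on a read; a vector clock keeps one counter per node. The function names and the dictionary representation are assumptions for illustration.

```python
def quorum_overlaps(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n

def increment(clock, node):
    """Return a copy of the vector clock with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for vector clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Neither clock dominates: the two updates happened independently.
    return "concurrent"
```

With N=3, choosing R=2 and W=2 gives overlapping quorums, whereas R=1 and W=1 does not. Two clocks that each contain an update the other has not seen compare as concurrent, which signals a conflict that the application or the store has to resolve.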
II. Overview of NoSQL
1. Introduction to NoSQL
1998: NoREL
2009: NoSQL officially proposed
NoSQL (NoSQL = Not Only SQL) means "more than SQL"; NoSQL is a family of technologies, not one specific technology
Modern computing systems generate a large amount of data on the network every day, and a large part of this data is handled by relational database management systems (RDBMS). E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks" introduced the relational model, which made data modeling and application programming easier. The relational model has proven to be well suited to client-server programming, with benefits far beyond what was expected, and today it is the dominant technology for structured data storage in web and business applications.
NoSQL is a new database movement. The idea was proposed early on and gained momentum in 2009. NoSQL advocates promote non-relational data stores, a fresh line of thinking compared with the overwhelming use of relational databases.
What is NoSQL?
NoSQL refers to a non-relational database. NoSQL, sometimes referred to as Not Only SQL, is a generic term for database management systems that differ from traditional relational databases.
NoSQL is used to store data at very large scale. (Google or Facebook, for example, collect trillions of bits of data about their users every day.) These data stores do not require a fixed schema and can scale horizontally without extra operations.
Why use NoSQL?
Today we can easily access and capture data through third-party platforms (e.g. Google, Facebook). Personal user information, social networks, geolocation, user-generated data, and user action logs have grown many times over. Traditional SQL databases are no longer a good fit for applications that mine this user data, whereas NoSQL databases were developed to handle such large data sets well.
2. The big data problem (BigData, massive data)
1) Four kinds of big data management systems (storage)
Parallel database systems: traditional RDBMSs with horizontal splitting and partitioned queries
NoSQL database management systems: non-relational and distributed, without support for the ACID database design paradigm
NewSQL database management systems: attempt to provide ACID on a distributed architecture
Why is SQL so hard to distribute?
Logging: logging is an obstacle in distributed systems
Locks: pessimistic concurrency control relies on a large number of locks; while one process holds a resource, any other process that needs it must wait
Buffer management: the buffers of multiple nodes must somehow exchange data
Open Source Solutions:
Clustrix, GenieDB, ScaleArc, ScaleBase, NimbusDB, Drizzle
Cloud Data Management System:
2) Big data analysis and processing
MapReduce:
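The MapReduce pattern can be illustrated with the classic word-count job. The sketch below runs in a single process and only mirrors the three logical phases (map, shuffle, reduce); the function names are assumptions, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each document is mapped independently and each key is reduced independently, both phases can be parallelized, which is what makes the pattern suitable for massive data sets.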
Technical characteristics of NoSQL
Non-relational, distributed (CAP,BASE theory), no ACID provided
Simple data models (e.g. Key-Value)
Separation of metadata and application data
Metadata is used for data management and requires a dedicated metadata management node
Weak consistency, with eventual consistency supported by technical solutions
High throughput
High horizontal scalability and clustering on low-end hardware
Advantages of NoSQL
Avoid unnecessary complexity
Do not use object-relational mapping
Disadvantages of NoSQL
Data models and query languages have not been mathematically validated
ACID (transactions) is not supported
Functionality is limited
There is no unified data query model (unlike SQL for relational databases)
III. Data storage model
The various schools of NoSQL are divided according to the data storage model
NoSQL database categories, with representative products and features:
Column storage:
Representative products: HBase, Cassandra, Hypertable
Features: As the name suggests, data is stored by column. It is convenient for storing structured and semi-structured data and for compressing data, and it has a very large I/O advantage for queries that touch only one or a few columns.
Document storage:
Representative products: MongoDB, CouchDB
Features: Documents are generally stored in a JSON-like format, and the stored content is document-oriented. This makes it possible to index certain fields and to provide some of the functions of a relational database.
Key-value storage:
Representative products: Tokyo Cabinet / Tyrant, Berkeley DB, MemcacheDB, Redis
Features: A value can be found quickly by its key. In general the store does not care about the format of the value and accepts it as it is. (Redis includes additional functions.)
Graph storage:
Representative products: Neo4j, FlockDB
Features: The best way to store graph relationships. Solving the same problem with a traditional relational database gives low performance and an awkward design.
Object storage:
Representative products: db4o, Versant
Features: The database is manipulated through syntax similar to object-oriented languages, and data is accessed as objects.
XML database:
Representative products: Berkeley DB XML, BaseX
Features: Efficient storage of XML data, with support for queries inside XML documents.
Column model:
Application scenario: distributed data storage that supports random reads and writes on top of a distributed file system
Typical products: HBase, Hypertable, Cassandra
Data model: column-centric storage, where the values of the same column are stored together
Advantages: fast queries, high scalability, easy to scale out across machines
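The difference between row-oriented and column-oriented layouts can be shown in a few lines. This sketch only illustrates the in-memory shape of the data and assumes every row has the same fields; `to_columns`, `column_sum`, and the field names are hypothetical.

```python
def to_columns(rows):
    """Convert a list of row dictionaries into {column name: list of values}."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def column_sum(columns, name):
    """Aggregate one column; the other columns are never read."""
    return sum(columns[name])
```

Summing one column touches a single contiguous list instead of every field of every row, which is the source of the I/O advantage for queries over one or a few columns. Storing similar values next to each other is also what makes column data easy to compress.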
Document model:
Application scenario: web applications with non-strong transaction requirements
Typical products: MongoDB, ElasticSearch, CouchDB, Couchbase Server
Data model: a key-value model in which each value is stored as a document
Advantages: the data model does not need to be defined in advance
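A toy document store shows both properties: documents with different fields can live side by side, and a chosen field can be indexed. The `DocumentStore` class and its methods are assumptions made for this sketch and are not the API of MongoDB or any other product; only scalar field values (strings, numbers) are indexed here.

```python
import json

class DocumentStore:
    """Stores JSON documents by id, with optional per-field indexes."""

    def __init__(self):
        self.docs = {}     # document id -> JSON text
        self.indexes = {}  # field name -> {field value -> set of document ids}

    def insert(self, doc_id, doc):
        """Store a new document; no schema is declared in advance."""
        if doc_id in self.docs:
            raise KeyError(f"duplicate id: {doc_id}")
        self.docs[doc_id] = json.dumps(doc)
        for field, index in self.indexes.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(doc_id)

    def create_index(self, field):
        """Build an index over one field from the documents stored so far."""
        index = {}
        for doc_id, text in self.docs.items():
            doc = json.loads(text)
            if field in doc:
                index.setdefault(doc[field], set()).add(doc_id)
        self.indexes[field] = index

    def find(self, field, value):
        """Return the sorted ids of documents whose `field` equals `value`."""
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())
        else:
            # No index: fall back to scanning every document.
            ids = [doc_id for doc_id, text in self.docs.items()
                   if json.loads(text).get(field) == value]
        return sorted(ids)
```

Queries return the same result with or without the index; the index only replaces the full scan with a dictionary lookup.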
Key-value model:
Application scenario: content caching, for high-load scenarios with heavily parallel data access
Typical products: DynamoDB, Riak, Redis
Data model: key-value pairs implemented on a hash table
Advantages: fast lookups
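A key-value store built on a hash table can be sketched with a Python dictionary, which is itself a hash table. The `KeyValueStore` class below is a hypothetical in-memory illustration; the value is treated as opaque bytes, so the store never inspects its format.

```python
class KeyValueStore:
    """In-memory key-value store backed by a hash table (a Python dict)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        """Store the value as opaque bytes under the key, replacing any old value."""
        self._table[key] = bytes(value)

    def get(self, key, default=None):
        """Look up a key in expected constant time."""
        return self._table.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        return self._table.pop(key, None) is not None
```

Lookups by key are fast, but there is no way to query by value short of scanning everything, which is the usual trade-off of this model.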
Graph model:
Application scenarios: social networks, recommendation systems, relationship graphs
Typical products: Neo4j, Infinite Graph
Data model: graph structure
Advantages: well suited to graph computation scenarios
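A social-network recommendation is a typical computation for this model. The sketch below stores the relationships as an adjacency list and suggests friends of friends, ranked by the number of mutual friends; the `recommend` function and the sample names are assumptions for illustration, not a graph database API.

```python
from collections import Counter

def recommend(graph, user):
    """Suggest people who are not yet friends of `user`, most mutual friends first.

    `graph` maps each person to the set of their friends.
    """
    friends = graph.get(user, set())
    mutual = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in friends:
                mutual[candidate] += 1
    # Sort by descending mutual-friend count, then by name for a stable order.
    return sorted(mutual, key=lambda person: (-mutual[person], person))
```

Each step follows edges directly from one node to its neighbors. In a relational database the same question requires self-joins on a friendship table, which becomes slower and more awkward as the number of hops grows.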