What is NoSQL?

2025-02-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--


Name of NoSQL

Before we define NoSQL, let's try to interpret the name. Taken literally, it suggests that the data manipulation interface of a NoSQL system is something other than SQL. In the NoSQL community, however, the term has a more inclusive meaning: "Not Only SQL". NoSQL systems provide storage models that differ from those of traditional relational databases, which gives developers an alternative to relational databases.

The inspiration of NoSQL

The NoSQL movement has been inspired by many research papers. Two of them are central: Google's BigTable paper and Amazon's Dynamo paper.

Overview of properties

NoSQL systems give up some features of the SQL standard in exchange for simpler, more flexible functionality. The design idea is to keep data operations as simple as possible and to make their execution cost as predictable as possible. When evaluating a NoSQL system, the following points are worth considering.

Data model and operation model: is your application-layer data model made of rows, objects, or documents? Can the system support statistical or aggregation work?

Reliability: when you update data, is the new data written to persistent storage immediately? Is it replicated to multiple machines?

Scalability: how much data do you have, and can it fit on a single machine? Can a single machine handle your read and write load?

Partitioning strategy: given your requirements for scalability, availability, or durability, does a piece of data need to be stored on multiple machines? Do you need to know, or can you know, which machine holds a given piece of data?

Consistency: is your data replicated to multiple machines? If so, how are the copies on these nodes kept consistent?

Transaction mechanism: does your business need ACID transactions?

Single-machine performance: if you plan to persist data on disk, which data structure suits your workload (is it read-heavy or write-heavy)? Will writes become a disk bottleneck?

Load analysis: for read-heavy applications, such as a web application that responds to user requests, we always pay close attention to load. You may need to monitor data volume and aggregate data across many users. Does your application need such a capability?

NoSQL data model and operation model

A database's data model describes how data is organized in it, and its operation model describes how that data is accessed. Common data models include the relational model, the key-value model, and various graph models. Operation languages include SQL, key-value lookups, MapReduce, and so on. NoSQL systems typically combine different data models and operation models to offer different architectures.

NoSQL data models based on key-value storage

In key-based systems, complex join queries and queries that combine multiple conditions are hard to implement, so you need a different way of thinking about how to design and use key names. For example, to get the information of all employees in department 20, the application layer can first fetch the list stored under the key employee_departments:20, then loop over the IDs in that list and fetch each employee's information with the key employee:ID.
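A minimal sketch of this access pattern follows. It uses a plain Python dict as a stand-in for the key-value store. The `kv` dict, the `get_department_employees` helper, and the sample employees are hypothetical. A real client would issue one lookup per key over the network.

```python
from typing import Any

# Hypothetical in-memory stand-in for a key-value store: key -> value.
kv: dict[str, Any] = {
    "employee:1": {"name": "Alice", "dept": 20},
    "employee:2": {"name": "Bob", "dept": 10},
    "employee:3": {"name": "Carol", "dept": 20},
    # Application-maintained "index" keys: department -> list of employee IDs.
    "employee_departments:20": [1, 3],
    "employee_departments:10": [2],
}


def get_department_employees(dept_id: int) -> list[dict]:
    """Emulate 'all employees WHERE dept = ?' with two rounds of key lookups."""
    # Step 1: fetch the list of employee IDs stored under the department key.
    ids = kv.get(f"employee_departments:{dept_id}", [])
    # Step 2: loop over the IDs and fetch each employee record by its own key.
    return [kv[f"employee:{emp_id}"] for emp_id in ids]


print([e["name"] for e in get_department_employees(20)])  # ['Alice', 'Carol']
```

In this design the application, not the store, keeps `employee_departments:20` in sync. For example, when an employee changes department, both the employee record and the two list keys must be updated.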

Key-Value storage

Key-value storage is arguably the simplest form of NoSQL storage. Each key maps to an arbitrary value, and the NoSQL system does not care what that value contains. For example, in an employee information database, the key employee:30 might map to a binary blob containing all of that employee's information. It makes no difference to the store whether the binary format is Protocol Buffers, Thrift, or Avro.
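The store treats values as opaque bytes, so the sketch below has the application serialize the employee record itself before storing it. `OpaqueKVStore` is a hypothetical class. JSON is used only because it ships with Python. From the store's point of view, Protocol Buffers, Thrift, or Avro bytes would be handled exactly the same way.

```python
import json
from typing import Optional


class OpaqueKVStore:
    """Hypothetical key-value store: maps str keys to raw bytes and never inspects them."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes; serialization is the caller's job")
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = OpaqueKVStore()
# The application chooses the encoding; the store only ever sees bytes.
record = {"id": 30, "name": "Dana", "dept": 20}
store.put("employee:30", json.dumps(record).encode("utf-8"))

raw = store.get("employee:30")
print(json.loads(raw.decode("utf-8"))["name"])  # Dana
```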

Key-structured data storage

A typical example of key-structured data storage is Redis, which turns the value in key-value storage into a structured data type. Value types include numbers, strings, lists, sets, and sorted sets. Besides set/get/delete, Redis provides many operations specific to these types, such as increment and decrement for numbers and push/pop for lists. By offering type-specific operations on a single value, Redis strikes a balance between functionality and performance.
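The in-memory class below imitates, in a much reduced form, the idea of type-specific operations on a single value. It is neither Redis nor a Redis client. Its method names only echo the Redis commands INCR, DECR, LPUSH, and LPOP, and its error handling is simplified to Python exceptions.

```python
from collections import deque
from typing import Any, Optional


class StructuredStore:
    """Toy in-memory store whose values are typed and support type-specific operations."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    # Generic operations, available whatever the value type.
    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    # Number operations: the store changes the value itself, so the client
    # does not need a separate read-modify-write round trip.
    def incr(self, key: str, amount: int = 1) -> int:
        value = self._data.get(key, 0)
        if not isinstance(value, int):
            raise TypeError(f"value at {key!r} is not an integer")
        self._data[key] = value + amount
        return self._data[key]

    def decr(self, key: str, amount: int = 1) -> int:
        return self.incr(key, -amount)

    # List operations: push and pop at the head of a list value.
    def lpush(self, key: str, *items: Any) -> int:
        lst = self._data.setdefault(key, deque())
        if not isinstance(lst, deque):
            raise TypeError(f"value at {key!r} is not a list")
        for item in items:
            lst.appendleft(item)
        return len(lst)

    def lpop(self, key: str) -> Optional[Any]:
        lst = self._data.get(key)
        if lst is None:
            return None
        if not isinstance(lst, deque):
            raise TypeError(f"value at {key!r} is not a list")
        return lst.popleft() if lst else None


s = StructuredStore()
s.incr("page_views")
print(s.incr("page_views", 5))   # 6
s.lpush("jobs", "job-1", "job-2")
print(s.lpop("jobs"))            # job-2
```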

Key-document storage

Representative key-document stores include CouchDB, MongoDB, and Riak. In this model, the value of each key-value pair is a structured document, usually stored as JSON or a JSON-like format. Documents can contain lists, key-value pairs, and nested documents with complex hierarchies.
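The sketch below illustrates the document model using Python dicts as JSON-like documents. The collection, the sample documents, and the `find` helper (which matches on a dotted field path) are hypothetical. They are not the query API of CouchDB, MongoDB, or Riak.

```python
import json
from typing import Any

# A "collection" of JSON-like documents keyed by document ID.
collection: dict[str, dict[str, Any]] = {
    "user:1": {
        "name": "Alice",
        "tags": ["admin", "dev"],                       # a list inside the document
        "address": {"city": "Berlin", "zip": "10115"},  # a nested document
    },
    "user:2": {
        "name": "Bob",
        "tags": ["dev"],
        "address": {"city": "Paris"},                   # documents need not share fields
    },
}


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'address.city' into nested documents."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(field_path: str, value: Any) -> list[str]:
    """Return the IDs of documents whose field at field_path equals value."""
    return [doc_id for doc_id, doc in collection.items() if get_path(doc, field_path) == value]


print(find("address.city", "Paris"))      # ['user:2']
print(json.dumps(collection["user:1"]))   # the whole document serialized as JSON
```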

BigTable column family storage

The data models of HBase and Cassandra are borrowed from Google's BigTable. This model is characterized by column-oriented storage: the items of each row are stored in different columns, and groups of columns are called column families. Each value in a column carries a timestamp, so multiple versions of the same data item can be kept.

One way to understand column storage is that the row ID, column family, column name, and timestamp are combined into a key, and values are stored sorted by that key. This key structure enables some special features. The most common one is keeping multiple versions of a value under different timestamps, which makes it easy to retain historical data. The structure also naturally stores sparse columns, that is, columns that have no data in many rows. On the other hand, for columns that are rarely NULL, it wastes space, because every stored value must carry its column identifier.
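The composite-key idea above can be sketched with a sorted list of keys. This is a toy model under simplifying assumptions. Keys are tuples `(row, family, column, -timestamp)`, and negating the timestamp is our own choice so that, within one cell, newer versions sort first. The `ColumnStore` class is hypothetical and is not the HBase or Cassandra API.

```python
import bisect
from typing import Any, Optional


class ColumnStore:
    """Toy BigTable-style store: values kept sorted by (row, family, column, -timestamp)."""

    def __init__(self) -> None:
        self._keys: list[tuple] = []
        self._values: list[Any] = []

    def put(self, row: str, family: str, column: str, ts: int, value: Any) -> None:
        key = (row, family, column, -ts)   # negated ts: newer versions sort first
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value         # same cell and timestamp: overwrite
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def versions(self, row: str, family: str, column: str) -> list[tuple[int, Any]]:
        """All (timestamp, value) versions of one cell, newest first."""
        lo = bisect.bisect_left(self._keys, (row, family, column, float("-inf")))
        hi = bisect.bisect_right(self._keys, (row, family, column, float("inf")))
        return [(-self._keys[i][3], self._values[i]) for i in range(lo, hi)]

    def get(self, row: str, family: str, column: str) -> Optional[Any]:
        """Latest value of a cell, or None if the cell was never written (sparse column)."""
        found = self.versions(row, family, column)
        return found[0][1] if found else None


store = ColumnStore()
store.put("emp:30", "info", "title", 100, "Engineer")
store.put("emp:30", "info", "title", 200, "Senior Engineer")
store.put("emp:31", "info", "name", 150, "Eve")   # emp:31 has no title: nothing stored
print(store.get("emp:30", "info", "title"))       # Senior Engineer
print(store.versions("emp:30", "info", "title"))  # [(200, 'Senior Engineer'), (100, 'Engineer')]
```

Each stored entry carries its full `(row, family, column)` identifier. A missing cell therefore costs nothing, but a densely populated column repeats that identifier in every row, which is the space trade-off described above.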

Graph structure storage

Graph storage is another NoSQL storage approach. Its guiding idea is that data items are not peers, so relational storage or key-value storage may not be the best way to store them. The graph is one of the fundamental structures of computer science. Neo4j and HyperGraphDB are examples of current graph databases.
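Real graph databases such as Neo4j have their own storage engines and query languages. The sketch below only illustrates the underlying idea, using an adjacency list and a breadth-first traversal to find everyone within a given number of "knows" hops of a person. The graph and the `within_hops` helper are hypothetical.

```python
from collections import deque

# Adjacency list: each person maps to the people they have a "knows" edge to.
graph: dict[str, list[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def within_hops(start: str, max_hops: int) -> set[str]:
    """Breadth-first search: nodes reachable from start in 1..max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    result: set[str] = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                result.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return result


print(sorted(within_hops("alice", 2)))  # ['bob', 'carol', 'dave', 'erin']
```

Following an edge here is a direct lookup in the adjacency list. A relational schema would typically need one self-join per hop to answer the same question.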

Complex queries

NoSQL storage systems support many operations more complex than key-value lookups. For example, MongoDB can build indexes on any field of a document, and you can use JavaScript syntax to express complex query conditions. BigTable-style systems usually support iterating over the data in a single row and filtering the data in a single column by specific criteria. CouchDB lets you create multiple views of the same data and implement more complex queries or updates by running MapReduce tasks. Many NoSQL systems also support large-scale data analysis in combination with Hadoop or other MapReduce frameworks.
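CouchDB views are defined by map and reduce functions, which CouchDB itself runs as JavaScript. The Python sketch below only mimics that map, group, and reduce flow in-process to show how a view can answer an aggregate query. The documents and function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Iterator

docs = [
    {"type": "order", "customer": "alice", "total": 30},
    {"type": "order", "customer": "bob", "total": 15},
    {"type": "order", "customer": "alice", "total": 25},
    {"type": "note", "text": "not an order"},
]


def map_orders(doc: dict[str, Any]) -> Iterator[tuple[str, int]]:
    """Map step: emit (customer, total) for order documents only."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def reduce_sum(values: list[int]) -> int:
    """Reduce step: combine all values emitted under one key."""
    return sum(values)


def run_view(documents: list[dict[str, Any]],
             mapper: Callable[[dict[str, Any]], Iterator[tuple[str, Any]]],
             reducer: Callable[[list[Any]], Any]) -> dict[str, Any]:
    grouped: dict[str, list[Any]] = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)  # group emitted values by key
    return {key: reducer(values) for key, values in sorted(grouped.items())}


print(run_view(docs, map_orders, reduce_sum))  # {'alice': 55, 'bob': 15}
```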

Transaction mechanism

Unlike relational databases, NoSQL systems usually prioritize performance and scalability over transaction support. Traditional SQL databases typically provide strong ACID transactions, which let applications know exactly what state their data is in. For many NoSQL systems, performance matters far more than ACID guarantees. A NoSQL system usually guarantees atomicity only at the row level. Operations on the data under the same key are executed serially, so no key-value pair is ever left in a corrupted state.
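One simple way to get per-key atomicity is to serialize all operations on a key behind a lock for that key, while operations on different keys proceed independently. The sketch below does this with one `threading.Lock` per key. The `PerKeyAtomicStore` class and its `update` method are hypothetical, and real systems use other techniques as well.

```python
import threading
from typing import Any, Callable


class PerKeyAtomicStore:
    """Key-value store where each read-modify-write on a single key is atomic."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self._locks: dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()  # protects creation of per-key locks

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def update(self, key: str, fn: Callable[[Any], Any], default: Any = None) -> Any:
        """Apply fn to the current value; concurrent updates to one key run serially."""
        with self._lock_for(key):
            new_value = fn(self._data.get(key, default))
            self._data[key] = new_value
            return new_value

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock_for(key):
            return self._data.get(key, default)


store = PerKeyAtomicStore()


def bump(times: int) -> None:
    for _ in range(times):
        store.update("counter", lambda v: v + 1, 0)


threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("counter"))  # 8000
```

Nothing here spans two keys. An operation such as moving a value from one key to another still needs application-level care, and that gap is what separates row-level atomicity from ACID transactions.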

Schema-free storage

Another feature that many NoSQL systems share is that they usually impose no mandatory schema. Even in document or column storage, a given column is not required to exist in every row.

Data reliability

Ideally, a database would immediately write every write operation to persistent storage and replicate it to several nodes in different geographic locations to prevent data loss. However, such data-safety guarantees cost performance, so different NoSQL systems strike different balances between data safety and performance.

Single-machine reliability

Single-machine reliability is easy to understand: a write must not be lost because of a machine restart or a power failure. It is usually guaranteed by writing data to disk, which often makes disk I/O the bottleneck of the whole system. Below are some ways to improve performance while keeping single-machine reliability.

Control the frequency of fsync calls

Redis provides several ways to control how often fsync is called. Developers can configure Redis to call fsync after every update, which is safer but slower. Redis can also be set to call fsync once per second for better performance. The cost is that a failure may lose up to about one second of writes. Where reliability requirements are low (for example, when Redis is used only as a cache), developers can even stop Redis from calling fsync and let the operating system decide when to flush data to disk. Redis can also disable its AOF (append-only file) log entirely. Separately, it can dump its in-memory data to RDB snapshot files, a persistence mechanism distinct from the AOF fsync policies above.
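The three policies in the paragraph (fsync after every write, fsync at most once per interval, and no explicit fsync) can be sketched as an append-only log. This is a hypothetical Python class, not Redis code. Redis selects among these behaviors with its `appendfsync` setting (`always`, `everysec`, `no`). The `clock` parameter exists only so the interval logic can be tested without sleeping.

```python
import os
import tempfile
import time
from typing import Callable


class AppendOnlyLog:
    """Append-only log with a configurable fsync policy: 'always', 'everysec', or 'no'."""

    def __init__(self, path: str, policy: str = "everysec", interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy {policy!r}")
        self._file = open(path, "ab")
        self._policy = policy
        self._interval = interval
        self._clock = clock
        self._last_sync = clock()
        self.fsync_calls = 0

    def append(self, record: bytes) -> None:
        self._file.write(record + b"\n")
        self._file.flush()  # hand bytes to the OS; they may still sit in its page cache
        if self._policy == "always":
            self._sync()    # safest and slowest: each write is on disk before returning
        elif self._policy == "everysec" and self._clock() - self._last_sync >= self._interval:
            self._sync()    # bounded loss: roughly `interval` seconds of writes at risk
        # policy "no": never fsync here; the OS flushes dirty pages when it decides to

    def _sync(self) -> None:
        os.fsync(self._file.fileno())
        self._last_sync = self._clock()
        self.fsync_calls += 1

    def close(self) -> None:
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
log = AppendOnlyLog(path, policy="always")
for i in range(3):
    log.append(f"SET key{i} value{i}".encode())
log.close()
print(log.fsync_calls)  # 3
```

In this sketch the interval is checked only when a new record arrives. A fuller implementation would also sync from a timer, so that the last writes before an idle period are not left waiting on the OS.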

Use a log-based data structure

Cassandra, HBase, Redis, and Riak all append writes sequentially to a log file. Because appending turns random disk writes into sequential ones, this log can be fsynced much more frequently than the other data structures in the storage system.
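Here is a minimal sketch of the idea, assuming a single data file. Every write is appended at the end of the file as a sequential write. An in-memory dict remembers the byte offset of each key's latest record, so reads can seek straight to it, and on restart the index is rebuilt by replaying the log. The `LogStructuredKV` class and its length-prefixed record format are hypothetical. The systems named above each use their own formats and add compaction, which this sketch omits.

```python
import os
import struct
import tempfile
from typing import Optional

# Hypothetical record format: 8-byte header (key length, value length), then key, then value.
_HEADER = struct.Struct(">II")


class LogStructuredKV:
    """Toy store: every write is a sequential append; an in-memory index finds the latest record."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._index: dict[bytes, int] = {}  # key -> byte offset of its latest record
        open(path, "ab").close()             # create the file if it does not exist
        self._end = self._replay()           # rebuild the index from the log
        self._file = open(path, "ab")

    def _replay(self) -> int:
        """Scan the log from the start; later records for a key override earlier ones."""
        size = os.path.getsize(self._path)
        offset = 0
        with open(self._path, "rb") as f:
            while offset + _HEADER.size <= size:
                klen, vlen = _HEADER.unpack(f.read(_HEADER.size))
                end = offset + _HEADER.size + klen + vlen
                if end > size:
                    break                    # torn record left by a crash mid-write
                self._index[f.read(klen)] = offset
                f.seek(vlen, os.SEEK_CUR)    # skip the value; only offsets are indexed
                offset = end
        if offset < size:
            os.truncate(self._path, offset)  # drop the torn tail so new appends stay parseable
        return offset

    def put(self, key: bytes, value: bytes) -> None:
        record = _HEADER.pack(len(key), len(value)) + key + value
        self._file.write(record)             # sequential write at the end of the file
        self._file.flush()
        os.fsync(self._file.fileno())        # make the append durable
        self._index[key] = self._end
        self._end += len(record)

    def get(self, key: bytes) -> Optional[bytes]:
        offset = self._index.get(key)
        if offset is None:
            return None
        with open(self._path, "rb") as f:
            f.seek(offset)
            klen, vlen = _HEADER.unpack(f.read(_HEADER.size))
            f.seek(klen, os.SEEK_CUR)
            return f.read(vlen)

    def close(self) -> None:
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "data.log")
db = LogStructuredKV(path)
db.put(b"employee:30", b"v1")
db.put(b"employee:30", b"v2")   # the old record stays on disk; only the index moves
db.close()

db = LogStructuredKV(path)      # "restart": the index is rebuilt by replaying the log
print(db.get(b"employee:30"))   # b'v2'
```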

Improve throughput by merging write operations

Cassandra has a mechanism that batches several concurrent writes arriving within a short time window into a single fsync call. This is called group commit.
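The following hypothetical sketch shows group commit under a simple "leader" scheme. The first writer to arrive waits a short window so that concurrent writers can join its batch. It then writes the whole batch and issues one fsync for all of them. Each writer returns only after the fsync covering its record has finished. This shows the idea only; it is not Cassandra's commit-log implementation, and the `window` value is arbitrary.

```python
import os
import tempfile
import threading
import time


class GroupCommitLog:
    def __init__(self, path: str, window: float = 0.005) -> None:
        self._file = open(path, "ab")
        self._window = window
        self._lock = threading.Lock()      # guards the pending batch
        self._io_lock = threading.Lock()   # serializes write + fsync of whole batches
        self._pending: list[tuple[bytes, threading.Event]] = []
        self._leader_active = False
        self.fsync_calls = 0

    def write(self, record: bytes) -> None:
        """Append record durably; blocks until an fsync covering it has completed."""
        done = threading.Event()
        with self._lock:
            self._pending.append((record, done))
            is_leader = not self._leader_active
            self._leader_active = True
        if is_leader:
            time.sleep(self._window)       # let concurrent writers join this batch
            with self._lock:
                batch, self._pending = self._pending, []
                self._leader_active = False  # the next writer starts a new batch
            with self._io_lock:
                for rec, _ in batch:
                    self._file.write(rec + b"\n")
                self._file.flush()
                os.fsync(self._file.fileno())  # one fsync for the whole batch
                self.fsync_calls += 1
            for _, event in batch:
                event.set()                # wake every writer whose record was in the batch
        done.wait()

    def close(self) -> None:
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "commitlog")
log = GroupCommitLog(path)
writers = [threading.Thread(target=log.write, args=(f"mutation-{i}".encode(),)) for i in range(20)]
for t in writers:
    t.start()
for t in writers:
    t.join()
log.close()
print(log.fsync_calls, "fsync calls for 20 writes")
```

How many fsync calls are saved depends on timing. Writes that arrive within one window share a single fsync, and the window adds a small amount of latency to every write in exchange.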

Multi-machine reliability

Because hardware can fail irreparably, single-machine reliability alone is sometimes not enough. For important data, keeping backups on other machines is a necessary safety measure. Some NoSQL systems support multi-machine reliability.

Redis uses traditional master-slave replication.

MongoDB provides a highly available architecture called Replica Sets.

Riak, Cassandra, and Voldemort offer more flexible, configurable policies. These include a configurable parameter N, the number of replicas kept for each piece of data. To survive the failure of an entire data center, replicas must also be placed across data centers.
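The hypothetical sketch below combines a configurable replication factor N with spreading replicas across data centers. It takes nodes round-robin from each data center in turn. Losing one whole data center then still leaves copies elsewhere, as long as N and the number of data centers allow it. The node names, the `place_replicas` helper, and the round-robin rule are illustrative assumptions, not the placement strategy of Riak, Cassandra, or Voldemort.

```python
from itertools import zip_longest


def place_replicas(nodes_by_dc: dict[str, list[str]], n: int) -> list[str]:
    """Pick n distinct nodes, cycling through data centers so replicas are spread out."""
    total = sum(len(nodes) for nodes in nodes_by_dc.values())
    if n > total:
        raise ValueError(f"need {n} replicas but only {total} nodes exist")
    # Interleave: first node of each DC, then the second node of each DC, and so on.
    interleaved = [
        node
        for group in zip_longest(*nodes_by_dc.values())
        for node in group
        if node is not None
    ]
    return interleaved[:n]


cluster = {
    "dc-east": ["e1", "e2", "e3"],
    "dc-west": ["w1", "w2"],
}
print(place_replicas(cluster, 3))  # ['e1', 'w1', 'e2']
```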

Scale-out brings performance improvement

The goal of scaling out is linear scalability: doubling the number of machines should roughly double the load the system can handle. The main problem to solve is how to distribute data across machines, which is done by sharding.

With sharding, no single machine handles all write requests, and no single machine handles read requests for all of the data. Below, we describe hash sharding and range sharding.

Do not shard unless necessary

Sharding greatly increases system complexity, so avoid it unless it is necessary. Usually, read-write separation and caching are enough to relieve read pressure. But if the write volume exceeds what a single node can handle, sharding may really be needed.

Sharding through a coordinator

One sharding strategy introduces an intermediate proxy layer that records which node holds each piece of data. All read and write requests are routed through this layer. Examples are two projects built around CouchDB, Lounge and BigCouch. Similarly, Twitter implemented its own coordinator, Gizzard, which provides data sharding and replication.
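This is a minimal sketch of the coordinator approach. It assumes the proxy keeps an explicit directory from key to node and places each new key on the node that currently holds the fewest keys. The `Coordinator` class, the placement rule, and the node names are hypothetical. Lounge, BigCouch, and Gizzard each have their own routing schemes.

```python
class Coordinator:
    """Hypothetical proxy layer that records which node owns each key and routes requests."""

    def __init__(self, node_names: list[str]) -> None:
        # Each node is simulated by its own dict; a real node would be a remote server.
        self.nodes: dict[str, dict[str, object]] = {name: {} for name in node_names}
        self.directory: dict[str, str] = {}  # key -> name of the node that stores it

    def _owner(self, key: str) -> str:
        if key not in self.directory:
            # New key: place it on the node currently holding the fewest keys.
            self.directory[key] = min(self.nodes, key=lambda name: len(self.nodes[name]))
        return self.directory[key]

    def put(self, key: str, value: object) -> None:
        self.nodes[self._owner(key)][key] = value

    def get(self, key: str) -> object:
        owner = self.directory.get(key)
        return None if owner is None else self.nodes[owner].get(key)

    def move(self, key: str, target: str) -> None:
        """Rebalance one key; clients are unaffected because they only talk to the coordinator."""
        source = self.directory[key]
        self.nodes[target][key] = self.nodes[source].pop(key)
        self.directory[key] = target


coord = Coordinator(["node-a", "node-b"])
for i in range(4):
    coord.put(f"user:{i}", i)
print(coord.directory["user:1"])  # node-b
coord.move("user:1", "node-a")
print(coord.get("user:1"))        # 1
```

Clients only talk to the coordinator, so data can be moved between nodes by updating the directory. One trade-off is that the coordinator and its directory become a component that must itself be made reliable.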

Consistent hash ring algorithm

Consistent hashing is a widely used technique, well known from its use in distributed hash tables (DHTs). Dynamo-style systems such as Cassandra, Voldemort, and Riak essentially all use the consistent hash ring algorithm.

As shown in Figure 1, the consistent hash ring algorithm uses a hash function H to map both the storage nodes and the data items to positions on a ring. Each node is responsible for storing all data whose hash falls between its own position and the position of the next node on the ring. As a result, when the number of nodes changes, most of the data does not need to be migrated.

Figure 1: Hash function of the consistent hash ring algorithm
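The ring in Figure 1 can be sketched as follows. It follows the convention in the paragraph above: a key belongs to the closest node position at or before the key's position, wrapping around the ring. MD5 stands in for H only because it is in Python's standard library. Replication is omitted, and all names are hypothetical.

```python
import bisect
import hashlib


def H(value: str) -> int:
    """Hash function placing both nodes and keys on a ring of size 2**32."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes: tuple[str, ...] = ()) -> None:
        self._positions: list[int] = []   # sorted node positions on the ring
        self._owner: dict[int, str] = {}  # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = H(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        pos = H(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def node_for(self, key: str) -> str:
        """The node at the last position <= H(key); before the first node, wrap to the last."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        i = bisect.bisect_right(self._positions, H(key)) - 1
        return self._owner[self._positions[i]]  # i == -1 wraps around to the last node


ring = HashRing(("node-a", "node-b", "node-c"))
keys = [f"employee:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys in node-d's new arc change owner, and every one of them moves to node-d.
print(len(moved), "of", len(keys), "keys moved")
```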

Contiguous range partitioning

To shard data by contiguous range partitioning, we need to maintain a mapping table that records which key ranges are stored on which machine. Like consistent hashing, contiguous range partitioning divides the key space into consecutive ranges. Each range is assigned to one node and redundantly backed up on other nodes.
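A mapping table for contiguous ranges can be sketched as a sorted list of range start keys searched with `bisect`. The split points, node names, and replica lists below are hypothetical. Each range has one primary node and, as the paragraph describes, redundant copies on other nodes.

```python
import bisect

# Each entry: (first key of the range, primary node, replica nodes).
# A range runs from its start key up to, but excluding, the next entry's start key.
RANGE_TABLE = [
    ("", "node-1", ["node-2", "node-3"]),    # keys < "g"
    ("g", "node-2", ["node-3", "node-1"]),   # "g" <= key < "p"
    ("p", "node-3", ["node-1", "node-2"]),   # keys >= "p"
]
_STARTS = [start for start, _, _ in RANGE_TABLE]


def lookup(key: str) -> tuple[str, list[str]]:
    """Return (primary, replicas) for the range containing key."""
    i = bisect.bisect_right(_STARTS, key) - 1
    _, primary, replicas = RANGE_TABLE[i]
    return primary, replicas


def scan_nodes(lo: str, hi: str) -> list[str]:
    """Primary nodes that must be contacted for a range scan lo <= key < hi."""
    first = bisect.bisect_right(_STARTS, lo) - 1
    last = bisect.bisect_left(_STARTS, hi) - 1
    return [RANGE_TABLE[i][1] for i in range(first, last + 1)]


print(lookup("employee:30"))  # ('node-1', ['node-2', 'node-3'])
print(scan_nodes("e", "h"))   # ['node-1', 'node-2']
```

Because neighboring keys live in the same or adjacent ranges, a range scan only needs the few nodes whose ranges it overlaps. This is one practical difference from hash-based placement.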

The BigTable approach

The Google BigTable paper describes a range partitioning scheme that divides data into blocks called tablets. Each tablet holds a contiguous range of key-value pairs and is stored on a tablet server. Tablet sizes are kept within a certain range: a tablet that grows too large is split in two, and tablets that become too small are merged into one. BigTable uses a service called Chubby to track node state. Similarly, the Hadoop ecosystem has a tool called ZooKeeper for this purpose.
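The split-and-merge rule can be sketched with tablets represented as sorted lists of keys. `TabletTable` is hypothetical, and so are the thresholds `MAX_KEYS` and `MIN_KEYS`. BigTable sizes tablets by data volume rather than by key count. Tablet servers, Chubby, and persistence are left out.

```python
import bisect

MAX_KEYS = 4  # split a tablet holding more keys than this (hypothetical threshold)
MIN_KEYS = 1  # merge a tablet holding fewer keys than this into a neighbor


class TabletTable:
    def __init__(self) -> None:
        # Each tablet is the sorted key list of one contiguous key range.
        self.tablets: list[list[str]] = [[]]

    def _tablet_index(self, key: str) -> int:
        # Tablet 0 starts at "" (all keys); every later tablet starts at its first key.
        starts = [""] + [t[0] for t in self.tablets[1:]]
        return bisect.bisect_right(starts, key) - 1

    def insert(self, key: str) -> None:
        i = self._tablet_index(key)
        tablet = self.tablets[i]
        if key not in tablet:
            bisect.insort(tablet, key)
        if len(tablet) > MAX_KEYS:
            mid = len(tablet) // 2
            self.tablets[i:i + 1] = [tablet[:mid], tablet[mid:]]  # split in two

    def delete(self, key: str) -> None:
        i = self._tablet_index(key)
        tablet = self.tablets[i]
        if key in tablet:
            tablet.remove(key)
        if len(tablet) < MIN_KEYS and len(self.tablets) > 1:
            j = i - 1 if i > 0 else i + 1                   # pick a neighboring tablet
            lo, hi = min(i, j), max(i, j)
            if len(self.tablets[lo]) + len(self.tablets[hi]) <= MAX_KEYS:
                self.tablets[lo:hi + 1] = [self.tablets[lo] + self.tablets[hi]]  # merge


table = TabletTable()
for k in "abcdefghij":
    table.insert(k)
print(table.tablets)  # [['a', 'b'], ['c', 'd'], ['e', 'f'], ['g', 'h', 'i', 'j']]
```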

Consistency

We have discussed storing redundant copies of data on different nodes to keep data safe and to spread load. This creates a problem: keeping the data consistent across multiple nodes is very difficult. Maintaining consistency across nodes is the subject of this section. Let's start with the well-known CAP theorem.

Consistency (C): whether all replicas of a piece of data in a distributed system hold the same value at the same moment.

Availability (A): whether the cluster as a whole can still respond to client read and write requests after some of its nodes fail.

Partition tolerance (P): whether the cluster as a whole can keep serving when some of its nodes cannot be reached.

The CAP theorem states that a distributed storage system can satisfy at most two of these three properties at the same time. Real network hardware inevitably suffers from delays and packet loss, so partition tolerance is a property we must have. As a result, we can only trade off between consistency and availability, and no NoSQL system can guarantee all three properties at once.

Consistency guarantees are usually classified as strong or weak. Among weak consistency models, eventual consistency is the most commonly implemented.

The NRW configuration uses three parameters. N is the number of replicas kept for each piece of data. R is the number of replicas on different nodes that a read must consult. W is the number of replicas on different nodes that must acknowledge a write for it to succeed. Choosing R + W > N guarantees strong consistency, because every set of replicas that is read then overlaps every set that was written.
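The overlap argument behind the condition R + W > N can be checked exhaustively on small examples. The sketch below makes three assumptions. Each replica stores a (version, value) pair, a write succeeds once W replicas hold the new version, and a read consults R replicas and returns the value with the highest version. The function name is hypothetical. Real Dynamo-style systems also need conflict resolution and repair, which are omitted here.

```python
from itertools import combinations


def read_sees_latest(n: int, r: int, w: int) -> bool:
    """True if every possible read quorum of size r sees a write acknowledged by w replicas."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        # Replicas in write_set hold version 2 ("new"); the rest still hold version 1 ("old").
        state = {i: (2, "new") if i in write_set else (1, "old") for i in replicas}
        for read_set in combinations(replicas, r):
            _, value = max(state[i] for i in read_set)  # newest version among those read
            if value != "new":
                return False  # this read quorum missed every replica that took the write
    return True


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (3, 2, 1)]:
    label = "R+W>N " if r + w > n else "R+W<=N"
    print(f"N={n} R={r} W={w} {label} read always sees latest: {read_sees_latest(n, r, w)}")
```

If R + W > N, any R replicas and any W replicas must share at least one node (pigeonhole principle), so the read always finds the new version. If R + W <= N, some read quorum can avoid every replica the write reached.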


© 2024 shulou.com SLNews company. All rights reserved.
