What are the distributed databases?

Updated 2025-07-01. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article looks at which distributed databases are in common use. Many readers may not be familiar with them, so it is shared here for reference.

Commonly used distributed databases include: 1. Elasticsearch, which can run on a single node or on multiple nodes; 2. Redis, which supports rich data types; 3. MongoDB, whose document model makes data easier to retrieve; 4. MySQL distributed cluster, which offers high availability.

Distributed databases include:

I. Elasticsearch database

1. Introduction to Elasticsearch

Elasticsearch is a distributed, real-time document store in which every field is indexed and searchable. It is also a distributed search engine with real-time analytics.

It can scale out to hundreds of servers and handle petabytes of structured or unstructured data.

2. Elasticsearch application scenarios

Elasticsearch serves as a distributed search engine and data analysis engine, covering full-text search, structured retrieval, and data analysis.

Typical uses are near-real-time processing of massive data sets, site search (e-commerce, recruitment sites, portals, etc.), search inside IT systems (OA, CRM, ERP, etc.), and data analysis.

3. Advantages and disadvantages of Elasticsearch

Disadvantages: older releases shipped without built-in user authentication and access control (recent versions include security features); there is no concept of transactions and no rollback, so data deleted by mistake cannot be restored; and it runs on the JVM, so it depends on a Java runtime.

Advantages: documents are divided into separate containers, called shards, which can live on a single node or be spread across multiple nodes.

Each shard is replicated to provide a backup copy, which prevents data loss caused by hardware failure.

A request sent to any node in the cluster is routed to the node that holds the data, so you always get the data you need. When nodes are added to the cluster or shards are redistributed, data from lost shards is recovered on the new node without downtime.
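
The sharding and replication ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `shard_for` and `place_shards` are hypothetical, and MD5 is used as a stand-in hash rather than whatever hash Elasticsearch uses internally. The point is that a document ID maps deterministically to one shard, and that a replica is placed on a different node from its primary.

```python
import hashlib


def shard_for(doc_id, num_primary_shards):
    """Map a document ID to a shard number (illustrative hash, not Elasticsearch's)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_primary_shards


def place_shards(num_primary_shards, nodes):
    """Assign each primary shard and one replica to different nodes, round-robin."""
    layout = {}
    for shard in range(num_primary_shards):
        primary = nodes[shard % len(nodes)]
        # With a single node there is nowhere else to put the replica.
        replica = nodes[(shard + 1) % len(nodes)] if len(nodes) > 1 else None
        layout[shard] = {"primary": primary, "replica": replica}
    return layout


if __name__ == "__main__":
    layout = place_shards(3, ["node-a", "node-b"])
    shard = shard_for("order-1001", 3)
    print("order-1001 lives on shard", shard, "->", layout[shard])
```

Because the shard number depends only on the document ID and the shard count, any node can compute where a document lives and forward the request there.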

4. Persistence scheme of Elasticsearch

The gateway represents the persistent storage of Elasticsearch indexes. By default, Elasticsearch first stores an index in memory and persists it to disk when memory is full. When the cluster is shut down and restarted, the index data is read back from the gateway. Early versions of Elasticsearch supported several gateway types: the local file system (the default), a distributed file system, Hadoop's HDFS, and Amazon's S3 cloud storage service.

In other words, Elasticsearch first stores index contents in memory and persists them to disk when memory runs short. It also keeps a queue that writes the index to disk automatically whenever the system is idle.
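
The "memory first, disk when full" behaviour described above can be illustrated with a toy buffer. This is a minimal sketch, not how Elasticsearch is implemented: the class name `BufferedIndex`, the `max_buffered` threshold, and the JSON-lines file format are all assumptions made for the example.

```python
import json
import os


class BufferedIndex:
    """Toy index: hold documents in memory, flush to disk when the buffer is full."""

    def __init__(self, path, max_buffered=3):
        self.path = path
        self.max_buffered = max_buffered
        self.buffer = []

    def add(self, doc):
        self.buffer.append(doc)
        if len(self.buffer) >= self.max_buffered:
            self.flush()

    def flush(self):
        # Append buffered documents to the on-disk file, one JSON object per line.
        with open(self.path, "a", encoding="utf-8") as f:
            for doc in self.buffer:
                f.write(json.dumps(doc) + "\n")
        self.buffer.clear()

    def load(self):
        # Read back everything that has reached disk (what a restart would see).
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```

An idle-time writer, as mentioned in the text, would simply call `flush()` from a background task when no requests are pending.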

II. Redis database

1. Introduction to Redis

Redis is an open-source, advanced key-value store (NoSQL), originally released under the BSD license. Besides simple key-value data, it stores data structures such as strings, hashes, lists, sets, and sorted sets (zset), so it is often described as a data structure server. Redis supports persistence: data in memory can be saved to disk and loaded again on restart. It also supports data backup through master-slave replication.

2. Redis application scenarios

A) Routine counting: number of followers, number of Weibo posts

B) Changes to user information

C) Caching, for example as a cache in front of MySQL

D) Queue systems, including queues with priority and log collection systems

3. Advantages and disadvantages of Redis

Advantages:

(1) It is fast, because the data is stored in memory. Like a HashMap, lookups and updates have O(1) time complexity.

(2) It supports rich data types: string, list, set, sorted set, and hash.

(3) It supports transactions, and operations are atomic. Atomicity means that changes to the data are either all performed or not performed at all.

(4) It has rich features: it can be used for caching and messaging, and an expiration time can be set per key, after which the key is deleted automatically.
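
Per-key expiration, mentioned in point (4), can be pictured with a small in-memory store. This is a sketch of the idea only, not Redis code: `ExpiringStore` and its `clock` parameter are hypothetical names, and the store deletes an expired key lazily when it is next read, which is one simple way to get the described behaviour.

```python
import time


class ExpiringStore:
    """Toy key-value store where each key can carry a time-to-live in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            # Lazy expiration: drop the key the first time it is read after expiry.
            del self._data[key]
            return None
        return value
```

Passing the clock in as a parameter makes the expiry logic easy to exercise without waiting for real time to pass.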

Disadvantages:

(1) Plain master-slave Redis has no automatic fault tolerance or recovery. If the master or a slave goes down, some front-end read and write requests fail until the machine restarts or the front end is manually switched to another IP.

(2) When the master goes down, some data may not have been synchronized to the slave in time. After the IP switch this introduces data inconsistency, which reduces the availability of the system.

(3) Master-slave replication in Redis uses full replication. During replication, the master forks a child process that takes a snapshot of memory, saves the snapshot as a file, and sends that file to the slave. The master must have enough free memory for this. If the snapshot file is large, it has a big impact on the service capacity of the cluster. Replication runs whenever a slave joins the cluster or reconnects after losing its network connection to the master, so a network fluctuation triggers a full data copy between master and slave, which causes a lot of trouble in production.

(4) Redis is hard to expand online, and expansion becomes very complex once the cluster reaches its capacity limit. To avoid this, operators must provision enough space when the system goes live, which wastes a lot of resources.

4. Persistence scheme of Redis

Redis provides two persistence mechanisms. RDB persistence periodically dumps the in-memory Redis data set to disk. AOF (append-only file) persistence writes the Redis operation log to a file by appending.

RDB persistence writes a snapshot of the in-memory data set to disk at specified intervals. In practice, Redis forks a child process that first writes the data set to a temporary file and then replaces the previous file with it; the file is stored in a compressed binary format.
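
The two approaches can be contrasted with a short sketch. It is illustrative only and makes several assumptions: the function names are hypothetical, JSON plus zlib stands in for the real RDB format, a JSON-lines file stands in for the real AOF format, and the snapshot is written in-process rather than by a forked child. What it does show is the "temporary file, then replace" step of a snapshot and the "append, then replay" idea of an operation log.

```python
import json
import os
import zlib


def save_snapshot(data, path):
    """Snapshot style: dump the whole data set to a temp file, then swap it in."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(zlib.compress(json.dumps(data).encode("utf-8")))
    # Replacing the file in one step means a reader never sees a half-written snapshot.
    os.replace(tmp_path, path)


def load_snapshot(path):
    with open(path, "rb") as f:
        return json.loads(zlib.decompress(f.read()).decode("utf-8"))


def append_command(path, op, key, value=None):
    """Log style: append one write operation to the end of the log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps([op, key, value]) + "\n")


def replay_log(path):
    """Rebuild the data set by re-applying the logged operations in order."""
    data = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            op, key, value = json.loads(line)
            if op == "SET":
                data[key] = value
            elif op == "DEL":
                data.pop(key, None)
    return data
```

A snapshot is compact but only as fresh as the last dump, while the log captures every write at the cost of a longer replay on restart.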

III. MongoDB database

1. Introduction to MongoDB

MongoDB is a non-relational database. Each record is a document, and each document is a set of key-value pairs. A MongoDB document is similar to a JSON object, and field values may themselves be other documents, arrays, and so on.
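
As a concrete picture of such a document, here is a small Python sketch. The `user` record and the `get_path` helper are made up for illustration; the helper only mimics the idea of reaching into nested fields with a dotted path and is not a MongoDB API.

```python
import json

# A hypothetical user document: fields hold scalars, an embedded document, and arrays.
user = {
    "_id": 1,
    "name": "Alice",
    "address": {"city": "Beijing", "zip": "100000"},  # embedded document
    "tags": ["vip", "early-adopter"],                 # array of scalars
    "orders": [{"sku": "A-100", "qty": 2}],           # array of documents
}


def get_path(doc, dotted_path):
    """Look up a nested field with a dotted path such as 'address.city'."""
    current = doc
    for part in dotted_path.split("."):
        # A numeric part indexes into an array; anything else is a field name.
        current = current[int(part)] if isinstance(current, list) else current[part]
    return current


print(json.dumps(user, indent=2))
print(get_path(user, "address.city"))
```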

2. MongoDB application scenarios

The main goal of MongoDB is to bridge key-value stores (which offer high performance and high scalability) and traditional RDBMS systems (which offer rich features), combining the advantages of both. MongoDB suits the following scenarios:

a. Website data: MongoDB is well suited to real-time inserts, updates, and queries, and has the replication and high scalability that real-time website data storage requires.

b. Caching: because of its high performance, MongoDB also works as a cache layer in the information infrastructure. After a system restart, the persistent cache built on MongoDB keeps the underlying data sources from being overloaded.

c. Large-volume, low-value data: storing such data in a traditional relational database can be expensive, so programmers previously tended to store it in plain files.

d. Highly scalable scenarios: MongoDB is well suited to databases made up of dozens or hundreds of servers.

e. Storage of objects and JSON data: MongoDB's BSON data format is well suited to storing and querying documents.

3. Advantages and disadvantages of MongoDB

Advantages:

(1) Weak consistency (eventual consistency), which helps keep access fast for users

(2) The document-structured storage model makes it more convenient to retrieve data.

(3) Built-in GridFS supports large-capacity storage.

(4) In practice, with tens of millions of document objects and nearly 10 GB of data, queries on an indexed ID are no slower than in MySQL, while queries on non-indexed fields win across the board.

Disadvantages:

(1) Transactions are not supported in older versions (multi-document transactions were added in MongoDB 4.0).

(2) It takes up a lot of disk space, much of it wasted.

(3) Reliability on a single machine is poor.

(4) When large amounts of data are inserted continuously, write performance fluctuates widely.

4. Persistence scheme / exception handling of MongoDB

When performing a write, MongoDB creates a journal entry containing the exact disk location and the changed bytes. If the server crashes suddenly, the journal is replayed at startup to apply any writes that were not flushed to disk before the crash.

Data files are flushed to disk every 60 s by default, so the journal only needs to hold the last 60 s of writes. The journal pre-allocates several empty files for this purpose in /data/db/journal, named _j.0, _j.1, and so on.

After MongoDB has been running for a long time, the journal directory contains files such as _j.6217, _j.6218, and _j.6219. These are the current journal files, and the numbers keep increasing as long as MongoDB keeps running. On a clean shutdown these files are removed, because the journal is no longer needed after a normal shutdown.

If the server crashes or is killed with kill -9, the journal files are replayed when MongoDB starts again, and verbose check lines are printed, indicating a normal recovery.
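
The journal-then-replay cycle described above can be sketched as a toy write-ahead log. Everything here is an assumption made for illustration (the `JournaledFile` class, the JSON journal format, a single fixed-size data file); it is not MongoDB's on-disk format. The sketch records the offset and changed bytes before anything else, applies changes to the data file only at flush time, and replays the journal after a simulated crash.

```python
import json
import os


class JournaledFile:
    """Toy write-ahead journal: log (offset, bytes) before touching the data file."""

    def __init__(self, data_path, journal_path, size=16):
        self.data_path = data_path
        self.journal_path = journal_path
        if not os.path.exists(data_path):
            with open(data_path, "wb") as f:
                f.write(b"\x00" * size)
        self.pending = []  # writes held in memory, not yet flushed to the data file

    def write(self, offset, payload):
        # 1. Record the exact location and the changed bytes in the journal first.
        with open(self.journal_path, "a", encoding="utf-8") as j:
            j.write(json.dumps({"offset": offset, "hex": payload.hex()}) + "\n")
            j.flush()
            os.fsync(j.fileno())
        # 2. Keep the change in memory until the next periodic flush.
        self.pending.append((offset, payload))

    def flush(self):
        # Periodic flush: apply pending writes, after which the journal is not needed.
        self._apply(self.pending)
        self.pending.clear()
        if os.path.exists(self.journal_path):
            os.remove(self.journal_path)

    def recover(self):
        # After a crash, replay whatever the journal still holds.
        if not os.path.exists(self.journal_path):
            return 0
        with open(self.journal_path, encoding="utf-8") as j:
            entries = [json.loads(line) for line in j]
        self._apply([(e["offset"], bytes.fromhex(e["hex"])) for e in entries])
        os.remove(self.journal_path)
        return len(entries)

    def _apply(self, writes):
        with open(self.data_path, "r+b") as f:
            for offset, payload in writes:
                f.seek(offset)
                f.write(payload)
```

Creating a second `JournaledFile` over the same paths without calling `flush()` on the first simulates a crash: the in-memory writes are gone, but `recover()` restores them from the journal.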

IV. MySQL distributed cluster

1. Introduction to MySQL distributed cluster

MySQL Cluster is a storage solution with a shared-nothing, distributed node architecture, designed to provide fault tolerance and high performance.

Data updates use the read committed isolation level to keep data consistent across all nodes, and a two-phase commit mechanism to ensure that all nodes hold the same data (if any write fails, the whole update fails).
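
The two-phase commit rule stated above (if any write fails, the update fails everywhere) can be sketched as follows. This is a generic textbook outline under stated assumptions, not MySQL Cluster's implementation: the `Node` class, its `fail_on_prepare` switch, and the `two_phase_commit` function are hypothetical names used only for the example.

```python
class Node:
    """A participant that can stage a write and later commit or abort it."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.data = {}
        self.staged = None

    def prepare(self, key, value):
        if self.fail_on_prepare:
            return False
        self.staged = (key, value)
        return True

    def commit(self):
        key, value = self.staged
        self.data[key] = value
        self.staged = None

    def abort(self):
        self.staged = None


def two_phase_commit(nodes, key, value):
    """Return True if every node committed, False if the update was aborted."""
    # Phase 1: ask every node to prepare; stop at the first refusal.
    prepared = []
    for node in nodes:
        if not node.prepare(key, value):
            for p in prepared:
                p.abort()
            return False
        prepared.append(node)
    # Phase 2: every node voted yes, so tell all of them to commit.
    for node in nodes:
        node.commit()
    return True
```

Either all nodes end up with the new value or none do, which is how the mechanism keeps the copies identical.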

Shared-nothing peer nodes make an update performed on one server immediately visible on the other servers. Updates are propagated through a sophisticated communication mechanism designed for high throughput across the network.

Load is distributed across multiple MySQL servers to maximize application performance, and data is stored in different locations to ensure high availability and redundancy.

2. MySQL distributed cluster application scenarios

It solves mass-storage problems; for example, JD.com uses a MySQL distributed cluster for B2B.

It suits databases that serve billions of page views (PV).

3. Advantages and disadvantages of MySQL distributed cluster

Advantages:

A) High availability

B) Fast automatic failover

C) Flexible distributed architecture with no single point of failure

D) High throughput and low latency

E) Strong scalability and support for online expansion

Disadvantages:

A) Many restrictions; for example, foreign keys are not supported in older versions

B) Deployment, management, and configuration are complex

C) It requires a lot of disk space and memory

D) Backup and recovery are inconvenient

E) On restart, data nodes take a long time to load data into memory

4. Persistence scheme of MySQL distributed cluster

Load balancing.

Manage node backups.
