Review of Yunqi practical information | Top NoSQL experts from across the industry gather for an in-depth NoSQL database special session


2025-12-24 Update From: SLTechnology News&Howtos


As one of the most important segments of the database market, the NoSQL database affects thousands of enterprises with its every move. This special session invited core NoSQL contributors from across the industry to look ahead at the future of NoSQL databases. Technical experts from Alibaba, MongoDB, Redisson, Douyu and other companies shared the enterprise-grade features and industry solutions of Aliyun's NoSQL databases.

Technical Analysis of Redis & MongoDB Cloud Database

Ye Xiang (Technical Director of the Database Product Division of Aliyun Intelligent Business Group; MongoDB China User Group, Hangzhou User Association) gave an in-depth analysis of Redis and MongoDB cloud database technology.

Redis enterprise database service

As an enterprise database, Redis needs to address four aspects:

- Distribution: it must support rapid business growth and cost reduction, scale flexibly, and move from master-slave mode to cluster mode.
- Compatibility: compatibility is an eternal topic; even if 100% consistency cannot be achieved, the service needs to get as close to it as possible.
- Security audit: security matters more and more in the cloud, and the security audit capability of open-source Redis is relatively weak; Aliyun Redis strengthens it.
- Data synchronization: the service must support hybrid cloud deployment, interoperate across third-party cloud vendors, IDCs and Aliyun, and support data migration and transformation, so that customers can decide flexibly whether to run on or off the cloud.

Redis's native Cluster architecture uses the Gossip protocol to synchronize routing tables, but this architecture has not been adopted quickly by the community or by enterprises. Although it has the advantages of a decentralized architecture and few component dependencies, it also has many problems: operation and maintenance are difficult, routing is non-deterministic, it relies on a Smart Client, it does not support Multi-Key operations, and migration from master-slave mode to cluster mode is hard, which makes upgrades difficult.

To solve these problems, Aliyun Redis does not adopt the Gossip protocol and instead introduces two new components: Proxy and Config Server. Aliyun Redis uses a configuration center to manage the routing table, which Config Server can schedule intelligently, while Proxy provides compatibility with non-Smart Clients, supports Multi-Key operations, and handles traffic management and read-write separation. Proxy and Config Server add architectural complexity, but managing large-scale complex architectures is what cloud vendors are good at, and the extra cost of the two components is amortized. With this cloud service architecture, users can migrate Redis seamlessly from the master-slave architecture to the cluster version.
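The division of labor between a central routing table and a proxy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Aliyun's implementation: the `ConfigServer` and `Proxy` classes, the equal-range slot split and the in-memory shards are hypothetical. The key-to-slot mapping (CRC16/XMODEM modulo 16384) is the one used by open-source Redis Cluster; whether Aliyun's Proxy uses the same mapping is an assumption, and hash tags are ignored here.

```python
from collections import defaultdict

SLOT_COUNT = 16384  # number of hash slots in open-source Redis Cluster


def crc16_xmodem(data: bytes) -> int:
    """CRC16 (XMODEM variant, polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to a hash slot (hash tags such as {user} are not handled)."""
    return crc16_xmodem(key.encode()) % SLOT_COUNT


class ConfigServer:
    """Hypothetical central owner of the slot -> shard routing table."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)

    def shard_for_slot(self, slot: int) -> str:
        # Assumption: slots are split into equal contiguous ranges.
        return self.shard_names[slot * len(self.shard_names) // SLOT_COUNT]


class Proxy:
    """Hypothetical proxy: clients send plain commands, the proxy routes them."""

    def __init__(self, config: ConfigServer):
        self.config = config
        # In-memory dicts stand in for real Redis shards.
        self.shards = {name: {} for name in config.shard_names}

    def _shard_name(self, key: str) -> str:
        return self.config.shard_for_slot(key_slot(key))

    def set(self, key, value):
        self.shards[self._shard_name(key)][key] = value

    def mget(self, *keys):
        # Multi-key support: group keys per shard, fetch each group,
        # then return the values in the order the caller asked for.
        groups = defaultdict(list)
        for key in keys:
            groups[self._shard_name(key)].append(key)
        found = {}
        for shard_name, shard_keys in groups.items():
            for key in shard_keys:
                found[key] = self.shards[shard_name].get(key)
        return [found[key] for key in keys]
```

A client that knows nothing about slots can call `proxy.mget("user:1", "user:2")` even when the two keys live on different shards, which is the property a non-Smart Client needs.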

As the Redis Cluster cloud service architecture was extended, a new concept emerged: the enterprise distributed matrix for the Redis cloud database. The matrix can be expanded vertically and horizontally. Vertically, it can be sharded, adding Shards to scale flexibly; horizontally, it supports read-write separation and Group isolation. Globally, it supports both the Memcache and Redis protocols and can switch smoothly between cluster and master/standby modes.

Aliyun Redis's Proxy introduces the concept of a Connection Session, which allows finer-grained connection management and reuse of long connections through connection pooling. It is compatible with multiple protocols and also improves short-connection performance through high-performance C code. The Proxy also supports hot upgrades, so a version can be upgraded without interrupting service.

Aliyun Redis encrypts the entire data link layer by layer, supporting SSL, whitelists, permission management, and the disabling and auditing of key commands, which strengthens Redis's security audit capability. Aliyun Redis also provides free open-source tools, such as the synchronization tool RedisShake and the data validation tool RedisFullCheck.

As a memory-based cache service, Redis also faces challenges such as limited capacity, high cost and weak persistence. To address them, Aliyun provides a hybrid-storage version of Redis that aims to give users a persistent and secure Redis service. It is built on RocksDB underneath and keeps memory usage within a controllable range by continuously exchanging hot and cold keys between memory and disk.
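The hot/cold key exchange behind hybrid storage can be illustrated with a small two-tier store. This is a minimal sketch, assuming a least-recently-used eviction policy; the `HybridStore` class is hypothetical, a plain dict stands in for the on-disk RocksDB tier, and the source does not say which policy Aliyun actually uses.

```python
from collections import OrderedDict


class HybridStore:
    """Keeps at most `hot_capacity` keys in memory; the rest live in a cold tier."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # in-memory tier, least recently used first
        self.cold = {}            # stand-in for an on-disk engine such as RocksDB

    def _put_hot(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        # Demote the coldest keys until memory is back within its budget.
        while len(self.hot) > self.hot_capacity:
            cold_key, cold_value = self.hot.popitem(last=False)
            self.cold[cold_key] = cold_value

    def set(self, key, value):
        self.cold.pop(key, None)  # avoid leaving a stale copy in the cold tier
        self._put_hot(key, value)

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:
            value = self.cold.pop(key)
            self._put_hot(key, value)  # a cold key that is read becomes hot again
            return value
        return None
```

With `hot_capacity=2`, writing keys `a`, `b` and `c` demotes `a` to the cold tier, and reading `a` afterwards promotes it again while `b` is demoted, so the number of keys held in memory never exceeds the configured budget.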

MongoDB enterprise database service

As an enterprise database, MongoDB needs to address four aspects: security audit, backup and recovery, data synchronization, and flexible scaling. MongoDB's security audit is largely the same as that of Redis, with the addition of TDE encryption. Aliyun MongoDB adds physical backup, which greatly improves the efficiency of backup and recovery, and incremental backup makes it possible to restore data to any point in time. On top of backup, Aliyun MongoDB also provides backup verification.
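Point-in-time recovery from a full backup plus incremental changes can be sketched as a replay loop. This is a simplified illustration, not Aliyun MongoDB's mechanism: the function name, the `(timestamp, op, doc_id, doc)` log format and the two operation types are assumptions made for the example, and the log is assumed to be sorted by timestamp.

```python
import copy


def restore_to_point_in_time(full_backup, backup_ts, change_log, target_ts):
    """Rebuild the data set as of target_ts from a full backup plus a change log.

    full_backup: dict of doc_id -> document, taken at time backup_ts
    change_log:  list of (timestamp, op, doc_id, doc) sorted by timestamp,
                 where op is "upsert" or "delete" (hypothetical format)
    """
    if target_ts < backup_ts:
        raise ValueError("target time is earlier than the full backup")
    state = copy.deepcopy(full_backup)  # never modify the backup itself
    for ts, op, doc_id, doc in change_log:
        if ts <= backup_ts:
            continue  # already reflected in the full backup
        if ts > target_ts:
            break     # everything after the target time is ignored
        if op == "upsert":
            state[doc_id] = doc
        elif op == "delete":
            state.pop(doc_id, None)
    return state
```

Restoring to a later target time only means replaying more of the same log, which is why a single full backup plus a continuous incremental log is enough to reach any point in between.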

Aliyun MongoDB also provides diagnostic analysis capabilities and the MongoShake tool for data synchronization. Aliyun MongoDB implements a shared-storage solution based on the RocksDB engine, which makes storage elastic, allows read-only nodes to be added in seconds, and solves the problem of global oplog locking. Such a solution also faces several challenges, such as compatibility with WiredTiger, performance jitter caused by Compaction, and the latency stability of shared storage.

Implementing a data middle platform based on MongoDB

Tang Jianfa, chief architect of MongoDB Greater China, described how to implement a data middle platform (often rendered as "data center" in translation) based on MongoDB.

If a business needs to integrate data from different customers, and the structure and types of that data differ, the work may take weeks or even months. Many existing solutions build a unified data platform and extract all data into it through ETL. What these approaches have in common is that data is collected in batches, assembled into data sets, and delivered by download. The drawbacks are stale platform data, slow response and a crude mode of interaction.

From a technical point of view, the data middle platform is defined as "unified data platform + data-as-a-service capability". The data comes from the business and needs to be collected in T+0 mode so that it is timely. The data must be delivered as API services rather than as packaged downloads. This allows the data middle platform to support the operational systems on which the enterprise depends for survival. Compared with analytical workloads, operational workloads are closer to the core of the business and do more to improve competitiveness, which is why the data middle platform is so popular. In short, a data middle platform is a data service technology platform that holds the enterprise's real-time, global data and mainly serves operational business applications.

The concept originated in China, and there are many schools of thought. Consulting companies say the data middle platform is a change in organizational structure, while solution providers say it is a technology platform product like Hadoop; different organizations start from different positions. For the 97% of Chinese enterprises that are small or micro businesses, the data middle platform is basically irrelevant. The remaining 3% consists of a small number of top-tier enterprises and about 1.2 million medium and large enterprises in the "waist" tier, which may have many developers but no data experts. These waist-tier enterprises may not have many systems and generally have no data team, so they cannot quickly build a complete data middle platform, yet their pain points are real: data silos, the need to connect data, and the need for rapid development. They can choose a technology-oriented architecture, weighing specific capabilities including data aggregation, data governance and modeling, data API services, and, most critically, storage that is massive, multi-model and high-performance. RDBMS, MPP, Hadoop, NoSQL and NewSQL each have their own strengths and weaknesses and can all serve as references when building a data middle platform. Enterprises need to choose according to their own situation.

MongoDB has never been the best choice for offline big data analysis; it is better suited to operational business scenarios. Because the data middle platform is oriented to operational applications, MongoDB is a good fit. It scales out automatically, supports multi-model, polymorphic data, and is API-friendly. Modeling with MongoDB also takes much less work than the traditional approach, which reduces cost. In addition, the MongoDB-based solution offers the necessary capabilities for data collection, visual modeling, no-code APIs, data visualization and so on. The reference architecture of the MongoDB data solution (shown in a figure in the original article) consists of three layers, from bottom to top: collection, storage and processing, and data services.

Building a data middle platform on MongoDB has several core advantages: seamless scale-out; a multi-type structured data model; a logical model that is also the storage model; real-time synchronization from heterogeneous databases; fast, no-code API publishing; and a solution that is simple, lightweight and fast.

Design and practice of the graph database GDB

Zhu Guoyun (Zong Dai), a senior technical expert at Alibaba, shared the design and practice of Alibaba's graph database GDB.

What is a graph database?

A graph database is a database designed for graph structures, not a database of pictures. What is a graph structure? Take a social network model as an example. In this model there are relationships between people, between people and forums, between people and posts, and between posts and forums. People, forums and posts are the points of the graph (vertices). A relationship between points is called an edge, and both vertices and edges can carry attributes (properties). Today, some of the best social applications store multi-dimensional data in a unified graph space for storage, query and analysis in order to give users a better experience.

In recent years data volumes have grown and the number of data dimensions has grown with them; the graph database was born against this background. Graph databases have been the fastest-growing category in the database field in recent years. How do they differ from relational databases? In general, a model that takes seven or eight tables in a relational database can be expressed more intuitively and naturally in a graph database. A graph database also makes relationship queries faster and offers more capabilities for exploration and discovery. The model described above is the property graph model; the graph data field also has the RDF model. The main difference between the two is that vertices and edges in RDF cannot have attributes.
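The property graph model from the social network example above can be written down as a very small data structure. This is a minimal in-memory sketch for illustration only; the `PropertyGraph` class and its method names are hypothetical and are unrelated to GDB's engine or to the Gremlin API.

```python
from collections import defaultdict


class PropertyGraph:
    """Vertices and edges, each with a label and free-form properties."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> {"label": ..., "props": {...}}
        self.out_edges = defaultdict(list)  # source id -> [(edge label, target id, props)]

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = {"label": label, "props": props}

    def add_edge(self, src, label, dst, **props):
        self.out_edges[src].append((label, dst, props))

    def out(self, vid, label):
        """Targets reachable from vid over outgoing edges with this label."""
        return [dst for edge_label, dst, _ in self.out_edges[vid] if edge_label == label]

    def friends_of_friends(self, vid):
        """Two-hop 'knows' traversal, excluding vid and its direct friends."""
        direct = set(self.out(vid, "knows"))
        second_hop = set()
        for friend in direct:
            second_hop.update(self.out(friend, "knows"))
        return second_hop - direct - {vid}


graph = PropertyGraph()
graph.add_vertex("alice", "person", age=30)
graph.add_vertex("bob", "person", age=28)
graph.add_vertex("carol", "person", age=35)
graph.add_vertex("post-1", "post", title="Hello graphs")
graph.add_edge("alice", "knows", "bob", since=2018)
graph.add_edge("bob", "knows", "carol", since=2019)
graph.add_edge("bob", "knows", "alice", since=2018)
graph.add_edge("alice", "posted", "post-1")
```

In this sketch a friends-of-friends lookup is two hops over adjacency lists rather than a self-join on a relationship table, which gives a feel for why relationship queries are a natural fit for the graph model.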

Graph databases have developed very quickly, so there are many kinds. They can be divided into four categories: knowledge graph / RDF, analytical graph, graph database, and multi-model graph database. These systems use roughly three mainstream query languages: Cypher, the earliest SQL-like query language, promoted by Neo4j; SPARQL, a query language used on RDF; and Gremlin, the most widely supported query language for property graphs.

What is the graph database GDB?

GDB is a graph database that mainly handles the storage and query of highly connected data. It supports the property graph model and the open-source TinkerPop Gremlin query language. Unlike other databases, GDB is a cloud-native database, built on Aliyun infrastructure from the very beginning, so it can be elastic, real-time and highly reliable. GDB grew out of the TairGraph subsystem of the Tair service and was later incubated and moved onto Aliyun to focus on problems in highly connected data scenarios.

Building on ten years of Tair technology, GDB implements a highly optimized, self-developed engine that supports real-time updates and queries that return within seconds, fully supports ACID transactions, and ensures high reliability through multiple replicas. It also provides high availability, with fast failover when a node fails; easy operation and maintenance, with out-of-the-box usability; and visualization, which helps in analyzing the internal relationships of data.

At the architectural level, GDB gives customers dedicated instances, which means resources are independent and there is no need to worry about contention. HA uses the classic active/standby architecture and provides read-only nodes to improve real-time query capacity. GDB supports the open-source TinkerPop Gremlin SDK. To filter millions of vertices and edges per second, GDB builds its own graph-friendly database engine, with many optimizations in query optimization and parallel execution, as well as support for transactions and automatic indexing. For the data channel, GDB also provides efficient import from a variety of data sources.

Scenarios and cases of GDB

Today GDB is widely used in social networks, financial fraud detection, real-time recommendation engines, knowledge graphs, and network / IT operations, and these scenarios are often intertwined. GDB can turn scenarios that previously ran offline into real-time or near-real-time ones. To sum up, now that data has more and more dimensions and is more and more closely related, GDB provides an effective form of graph storage that connects multi-dimensional data well and mines the hidden value of data intelligently and in real time through graph queries and graph algorithms.

From Java to cloud native: Redisson keeps exploring

Redisson co-founder Gu Rui shared Redisson's path of exploration from Java to cloud native.

Redisson is a Java in-memory data grid based on Redis. It is offered as Java interfaces rather than commands, which makes it very easy to use: anyone who can use Java can basically use Redisson. In addition, Redisson avoids multithreading problems through a thread-safe design and manages thread pools and connection pools, so a suitable style is available in both synchronous and asynchronous scenarios. Besides being easy to use, Redisson offers a rich set of features: 31 distributed collections, 14 distributed objects, 8 distributed locks and synchronizers, and 5 distributed services.

Redisson's architecture is divided into two main parts: basic functions, including client connection management and protocol parsing, and advanced functions, including distributed structures, distributed middleware and support for third-party features.

Architecturally, Redisson seems to conflict with the philosophy of Redis. Redis emphasizes simplicity, while Redisson's design is more complex; Redis provides nine data structures with clear boundaries, while Redisson provides about 60 with blurred boundaries; Redis faces users through commands, while Redisson faces users through a Java API. They seem to go their separate ways, but in fact they arrive at the same destination: both hide complexity and give the user a simple way to work.

Supporting only Java is both an advantage and a disadvantage. Java is a cage for Redisson: an advantage for application developers and a disadvantage for library developers. Redisson has therefore been thinking about how to escape this predicament and embrace other ecosystems.

In 2016, Redisson first experimented with the Vert.x framework. Vert.x offers a clustered runtime environment and multi-language interaction, is based on mature technologies, and places few restrictions on developers. Redisson ran experiments to make Redisson usable from other languages. However, the learning cost of this approach was very high and the actual benefit was low.

In 2018, Redisson took note of GraalVM from Oracle Labs. GraalVM is a Java runtime that comprises GraalVM and SubstrateVM; it allows other languages to be compiled, combined and executed in the JVM, with a bridge between them. SubstrateVM is what attracted Redisson most. It can be understood as an embedded virtual machine written in Java, which makes true cross-platform and cross-language operation possible.
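The contrast drawn above between Redis commands and Redisson's object-style API can be illustrated with a small analogy. Redisson itself is a Java library, so the Python below is only a sketch of the idea: `FakeRedis` is an in-memory stand-in that understands three hash commands, and `RemoteMap` is a hypothetical wrapper, not part of Redisson or of any Redis client.

```python
class FakeRedis:
    """In-memory stand-in that accepts command-style calls (not a real client)."""

    def __init__(self):
        self.hashes = {}

    def execute(self, command, *args):
        if command == "HSET":
            name, field, value = args
            fields = self.hashes.setdefault(name, {})
            is_new = field not in fields
            fields[field] = value
            return int(is_new)  # 1 if the field was created, 0 if overwritten
        if command == "HGET":
            name, field = args
            return self.hashes.get(name, {}).get(field)
        if command == "HLEN":
            (name,) = args
            return len(self.hashes.get(name, {}))
        raise ValueError(f"unsupported command: {command}")


class RemoteMap:
    """Hypothetical object-style wrapper: behaves like a dict, issues commands."""

    def __init__(self, client, name):
        self.client = client
        self.name = name

    def __setitem__(self, field, value):
        self.client.execute("HSET", self.name, field, value)

    def __getitem__(self, field):
        value = self.client.execute("HGET", self.name, field)
        if value is None:
            raise KeyError(field)
        return value

    def __len__(self):
        return self.client.execute("HLEN", self.name)


client = FakeRedis()
profile = RemoteMap(client, "user:1")
profile["name"] = "Ada"  # the caller never writes HSET by hand
```

The application code only sees familiar language constructs, while the wrapper hides the command protocol; this is the sense in which both approaches aim to hide complexity from the user.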

As a result, Redisson set out on its "escape road" and implemented redisson-native. A comparison of Java, Java with warm-up, and Native shows that redisson-native has a clear performance advantage.

This shows that escaping from Java through SubstrateVM is a very good solution: there is no need to deal with JNI and related problems, most of the work can be done in Java alone, the learning cost is low, no separate JVM has to be installed, the generated files are small, performance is high in cloud-native settings, and calling C is very simple. By extension, Redisson can be compiled to a native binary and wrapped again so that it can be used everywhere.

Big data storage and processing based on enterprise HBase

Shen Chunhui (Tianwu), Technical Director of the Database Product Division of Aliyun Intelligent Business Group and Apache HBase PMC member, shared how big data is stored and processed with enterprise HBase.

In the big data era, there is more and more data and the types of data are more and more varied. The growth in volume is easy to understand; the growth in variety can be seen along three dimensions. Statically, more and more digital devices are in use. Dynamically, more and more devices and services are running. In addition, reprocessing data produces new data, so data never stops growing. However large and varied the data, it is worthless if no value can be drawn from it. Over the past decade, people have come to understand the value of data better and better, and data is applied in more and more areas of life.

As data is put to use, systems face many challenges, which big data summarizes as the "4 Vs". For developers, a very large volume of data means the system needs high scalability; a very rich variety of data means the system needs high flexibility and must cope with new types of data generated anytime and anywhere; the timeliness of data means the system needs strong real-time performance and the ability to serve data online. The value of data means it has to be monetized, so the system needs to reduce the cost of storing and computing on data.

More than ten years ago, Google was the first to run into big data problems, and it published the Bigtable paper. HBase is a highly reliable, high-performance, scalable open-source big data NoSQL system designed on the basis of that paper. HBase gives up the transaction support of relational databases and focuses on scalability, flexibility, real-time response and low-cost storage for large amounts of data.
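The Bigtable paper describes its data model as a sparse, sorted, multi-dimensional map indexed by row key, column key and timestamp, and HBase follows the same model. The sketch below illustrates that model only; the `WideColumnTable` class is hypothetical, keeps everything in memory on one machine, and is not how HBase stores data internally.

```python
import bisect


class WideColumnTable:
    """Rows sorted by key; each cell keeps timestamped versions, newest first."""

    def __init__(self):
        self.row_keys = []  # sorted list of row keys, enables range scans
        self.rows = {}      # row key -> {column -> [(timestamp, value), ...]}

    def put(self, row, column, value, timestamp):
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row, column):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, stop_row):
        """Row keys in [start_row, stop_row), in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start_row)
        hi = bisect.bisect_left(self.row_keys, stop_row)
        return self.row_keys[lo:hi]


table = WideColumnTable()
table.put("user#2", "info:name", "Bo", timestamp=1)
table.put("user#1", "info:name", "Al", timestamp=1)
table.put("user#1", "info:name", "Alice", timestamp=2)
table.put("order#9", "info:total", "30", timestamp=1)
```

Rows may have entirely different columns (the map is sparse), and because row keys are kept sorted, a prefix such as `user#` can be read with one contiguous range scan, which is why row key design matters so much in this family of systems.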

Alibaba began investigating HBase in 2010, nearly ten years ago. Over a decade of exploration, Alibaba has broadened its use of HBase to messages, orders, Feed streams, monitoring, large-screen dashboards, trajectories, device status, AI storage, recommendation, search, BI reports and more. Alibaba's own use of HBase has reached a very large scale, and the experience accumulated in the product has shaped today's cloud HBase+X-Pack architecture. HBase alone cannot solve the complex problems of real business scenarios, so X-Pack extends cloud HBase with computing, search and multi-model capabilities, including Spark, Phoenix, Solr and OpenTSDB, forming a stable, easy-to-use and low-cost one-stop big data NoSQL platform.

The cloud HBase+X-Pack architecture provides low-cost data storage by running HBase on top of OSS while letting the overall interface model reuse HDFS capabilities. It also overcomes the limitations of OSS in file-oriented scenarios, allowing the object storage system to be used much like a cloud disk, which reduces storage cost by a factor of 3 to 7. In addition, it implements integrated hot/cold separation inside HBase in a way that is transparent to the business.

Beyond low-cost storage, Aliyun HBase has invested heavily in performance. Compared with the open-source version, it improves considerably on various performance metrics. Behind this is continuous optimization, such as changing the three-replica HDFS Pipeline for logs to an LLC mechanism and turning serial writes into parallel ones; changing serial lock acquisition into parallel acquisition; and achieving a 10x improvement in Java GC.

Finally, HBase belongs to the big data field and has to be combined with many other components, so ease of use is the most urgent need. Aliyun HBase links HBase and Spark data and integrates online and offline processing efficiently. Alibaba also provides an easy-to-use data migration system that supports smooth online migration. Aliyun HBase has improved greatly in stability, ease of use, performance and cost. In the future, it will reduce costs further through technologies such as shared block storage, launch Serverless capabilities, and use new hardware to speed up computing and lower costs.

Evolution of Douyu TV's hybrid cloud architecture from 0 to 1

Douyu Technical Director Ma Yong shared the evolution of Douyu TV's hybrid cloud architecture.

Douyu TV, founded in 2014, is a live streaming platform centered on games and e-sports. The platform has signed about 50 of the top 100 streamers in China, including 8 of the top 10 game streamers, and has 150 million monthly active users and about 6 million paying users as of Q1 2019. Douyu's business has three main characteristics: a hot-spot effect around top streamers, large swings in traffic levels, and many online interaction scenarios. On the technical side, the platform currently handles about 100 billion service calls per day, runs more than 2,000 Redis instances and clusters, and sees more than 200,000 QPS on a single interface.

Douyu TV has grown its monthly active users by more than 25% a year since 2016. At present there are three main technical difficulties: (1) "fried fish", where traffic to top streamers drags down the whole data center; (2) low utilization of server resources, with a large number of servers idle at everyday traffic levels; (3) the high cost of Redis maintenance and disaster recovery.

The evolution of Douyu's hybrid cloud architecture falls into three stages: in the exploration period, an attempt to move an independent business to the cloud; in the growth period, horizontal traffic expansion through IDC plus cloud; in the mature period, horizontal expansion that maximizes the use of resources.

The background of the exploration period was a long-term shortage of IDC hardware resources, R&D support that could not keep up with business growth, and a public cloud that was gradually maturing. At this stage, Douyu chose the advertising business as its cloud pilot and saw substantial benefits: system throughput rose sharply, the stability of dependencies improved significantly, and computing cost dropped significantly. However, the model was too narrow to copy directly to other business scenarios and applied only to a single data center, so Douyu moved on to the growth period.

The background of the growth period was the need to build a data channel from the IDC to the public cloud. To solve this problem, Douyu and Aliyun jointly built RedisShake, a data synchronization tool that supports full and incremental synchronization of Redis data, synchronization between data centers on and off the cloud, and second-level data monitoring. RedisFullCheck compares data along multiple dimensions, which largely ensures data consistency across the data path. The benefit of this stage was expanding data from a single data center to multiple data centers. Two points still needed improvement: the high cost of resource scheduling and the lack of fine-grained resource operations.

The main optimization directions in the mature period are separation of responsibilities and elastic scaling. The optimization plan covers four aspects: traffic classification, hot/cold data separation, elastic scaling and traffic scheduling. The scheduling strategies include manual scheduling, scheduled (timed) scheduling, scheduling based on resource consumption, and Hook scheduling.
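The idea of comparing the source and target along several dimensions after synchronization can be sketched as follows. This is a simplified illustration and not RedisFullCheck's actual algorithm: the `compare_datasets` function is hypothetical, and plain dicts stand in for the IDC and cloud Redis instances.

```python
def compare_datasets(source: dict, target: dict) -> dict:
    """Report keys missing on either side and keys whose values differ."""
    missing_in_target = sorted(key for key in source if key not in target)
    missing_in_source = sorted(key for key in target if key not in source)
    mismatched = sorted(
        key for key in source if key in target and source[key] != target[key]
    )
    return {
        "missing_in_target": missing_in_target,
        "missing_in_source": missing_in_source,
        "mismatched": mismatched,
        "consistent": not (missing_in_target or missing_in_source or mismatched),
    }


idc_data = {"room:1": "live", "room:2": "offline", "gift:9": "rocket"}
cloud_data = {"room:1": "live", "room:2": "live", "rank:3": "top"}
report = compare_datasets(idc_data, cloud_data)
```

Because incremental synchronization is still running while the check executes, a real checker would typically re-examine the reported keys after a short delay before treating them as genuine inconsistencies; that refinement is left out of this sketch.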
Douyu also summed up three lessons about hybrid cloud architecture:

- Evaluate fully and reasonably: cloud networking is quite different from an IDC, so it needs to be tested against the actual business to avoid impact.
- Input-output ratio: a hybrid cloud architecture places certain requirements on resource redundancy, or has a negative impact on it.
- Latency: enterprises should decide whether to build a hybrid cloud by evaluating how important the business is. Even with a dedicated line from the data center to the cloud, there is still some delay.

Analysis of Cassandra & X-Pack Spark cloud database technology

Cao Long (Fengshen), a senior technical expert at Aliyun Intelligence, analyzed Cassandra and X-Pack Spark cloud database technology.

Why choose Cassandra?

Cassandra is a completely decentralized database in which every node is a master node. Killing any single node does not affect the QPS or latency of the cluster. Besides the P2P quorum mechanism used by Cassandra, other availability mechanisms include HA, Raft, and a single in-memory copy plus shared storage; only Cassandra makes a node failure almost imperceptible, which is why Cassandra's slogan is "Always Online".
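Why a peer-to-peer quorum keeps serving requests when one node is down can be shown with a toy cluster. This is a minimal sketch, assuming N = 3 replicas with write quorum W = 2 and read quorum R = 2 (so R + W > N) and last-write-wins by timestamp; the `Replica` and `QuorumCluster` classes are hypothetical, and mechanisms of a real cluster such as hinted handoff and read repair are omitted.

```python
class Replica:
    def __init__(self):
        self.up = True
        self.data = {}  # key -> (timestamp, value)


class QuorumCluster:
    """Every replica is a peer; any live quorum can serve reads and writes."""

    def __init__(self, replicas=3, write_quorum=2, read_quorum=2):
        self.nodes = [Replica() for _ in range(replicas)]
        self.write_quorum = write_quorum
        self.read_quorum = read_quorum

    def _live(self):
        return [node for node in self.nodes if node.up]

    def write(self, key, value, timestamp):
        live = self._live()
        if len(live) < self.write_quorum:
            raise RuntimeError("not enough live replicas for the write quorum")
        for node in live:  # replicas that are down simply miss this write
            node.data[key] = (timestamp, value)

    def read(self, key):
        live = self._live()
        if len(live) < self.read_quorum:
            raise RuntimeError("not enough live replicas for the read quorum")
        # Ask a read quorum and keep the answer with the newest timestamp.
        answers = [n.data[key] for n in live[: self.read_quorum] if key in n.data]
        if not answers:
            return None
        return max(answers, key=lambda answer: answer[0])[1]


cluster = QuorumCluster()
cluster.write("room:1", "v1", timestamp=1)
cluster.nodes[0].up = False                 # one node dies
cluster.write("room:1", "v2", timestamp=2)  # the write still succeeds
cluster.nodes[0].up = True                  # node 0 returns with stale data
cluster.nodes[2].up = False                 # a different node dies
latest = cluster.read("room:1")             # quorum overlap still finds "v2"
```

Because R + W > N, every read quorum shares at least one replica with the most recent successful write quorum, so the stale copy on node 0 is outvoted by the newer timestamp and no failover step is needed.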

Cassandra scales smoothly. On the one hand, it can add nodes and even expand to multiple data centers (DCs); on the other hand, memory can also be increased on the cloud. Smooth expansion is an important Cassandra feature that other databases often find hard to achieve. Cassandra also supports global multi-DC deployment, which architects can adapt freely to their business.

In terms of learning cost, Cassandra provides CQL, which is similar to SQL, and a MySQL DBA or developer can basically learn Cassandra in a day. In terms of security, Cassandra, like mainstream databases, provides a sound authentication and authorization system. In terms of language support, Cassandra does not use Thrift; clients connect directly to servers, mainstream languages are supported, and performance is good. Finally, operation and maintenance are simple: Cassandra runs as a single process, with no Proxy, HA, ZK or other role nodes. Cassandra is feature-rich, especially in indexing: it supports materialized views and SASI full-text indexes, integrates Lucene for stronger full-text indexing, and supports CDC for connecting to streaming systems.

Cassandra has rich features and a rich ecosystem, and it works with other components such as Spark, Kafka, ES, Lucene and RocksDB. Cassandra ranks first in the world among wide-column databases, even though it has had little publicity in China. Cassandra has been developed for a decade and combines the strengths of AWS's DynamoDB and Google's BigTable. Alibaba released the Aliyun Cassandra database service in public beta in 2019 and improved native Cassandra in many respects, such as automated operation and maintenance, compatibility with DynamoDB, and a 100% performance improvement from link optimization.

To sum up, the cloud database Cassandra is an always-online, reliable NoSQL distributed database service with tunable consistency. It supports the SQL-like CQL syntax, offers powerful distributed indexing, provides enterprise-grade capabilities such as security, multi-active disaster recovery, monitoring, and backup and recovery, and is compatible with the DynamoDB protocol.

X-Pack Spark supports not only Cassandra but also HBase, Phoenix, RDS and MongoDB. It has strong connectivity and archiving capabilities and reduces computing and storage costs through ElasticNode. Cassandra+Spark suits a wide range of business scenarios, such as user profiles, Feed, small-object storage and recommendation platforms. In short, combining the strengths of Spark and Cassandra meets the needs of many business scenarios, offering Always Online availability, strong scalability, ease of use, rich features and ecosystem, and a closed data loop with Spark.

This article is original content from the Yunqi community and may not be reproduced without permission.

