How to select a database for microservices

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--


Your microservice architecture requires multiple data models. Should you choose polyglot persistence or a multi-model database?

Over the past decade, large-scale distributed systems have proliferated. This trend has driven a burst of creativity in the database field that is arguably unprecedented in the history of the software industry. The result is a healthy, competitive database market that offers a wide range of platforms to meet our needs. But how should we choose?

In this article, we will explore how to select proven data models based on the needs of the application (yes, there can be more than one choice!). We will also see how the choice of data model helps determine which technologies to select for the data layer.

Cloud architecture, NoSQL and micro-services architecture

As developers began to build web-scale applications, relational databases, which had historically dominated data architectures, began to show signs of strain. We built hugely popular social applications and began connecting more and more devices to the Internet of Things (IoT). The sheer number of users and devices reading and writing data created the need to scale out the data layer, which led to the emergence of new types of databases designed to meet these scalability requirements.

In many cases, these new "NoSQL" or "non-relational" databases are based on data models that differ from the traditional relational model. NoSQL databases include document, key-value, column-oriented, and even graph databases. In general, these databases give up some of the familiar features of relational databases, such as strong consistency, ACID transactions, and joins.

Alongside these changes in database technology, the service-oriented architecture (SOA) of the early 2000s has gradually evolved into the microservice architecture style. Many enterprises are moving away from heavyweight SOA infrastructure such as the enterprise service bus (ESB) toward a more decentralized approach. The appeal of a microservice architecture is that services can be developed, managed, and scaled relatively independently. This gives us a great deal of flexibility in implementation choices, including infrastructure technologies such as databases.

For example, assume that we are developing a microservice architecture and anticipate the need to scale massively. Whether the project is a new application or a refactoring of an existing one, we have the opportunity to make fresh choices about the database.

Hybrid persistence (Polyglot persistence)

A key benefit of the microservice architectural style is the encapsulation of persistence. We can choose a different persistence technology according to the needs of each service. Choosing a data store according to the characteristics of each data type is known as polyglot persistence, a term popularized by Martin Fowler and others. Polyglot persistence and microservice architecture are a natural fit.

The following figure shows a set of microservices and how we might choose a different data model for each one. I do not intend to identify every appropriate use case for each type of database in this article. My intention is to highlight the strengths of the various types of databases and why polyglot persistence is worth considering.

The team developing Service A, a core application service that manages data at large scale, might use a tabular (wide-column) database such as Apache Cassandra. For example, the inventory service of a retail application might be a good fit for Cassandra. Cassandra provides mechanisms such as tunable consistency, batches, and lightweight transactions that can serve as alternatives to full ACID transactions.

Service B supports looking up values by a well-known key, such as the descriptive data for a product catalog. This is a good use case for the key-value model, in which we look up a piece of data by a well-known key such as the product ID. Many in-memory caches use the key-value data model to support fast reads at large scale.

Service C may focus on serving semi-structured content, such as the forms or pages of a website, and a document store may be well suited to that type of data. Document stores have much in common with key-value stores, but a key difference is that a document store imposes additional structure on the data, which allows, for example, indexing specific attributes to support fast retrieval.

Service D may involve navigating complex relationships between data, such as customer data and the history of a customer's contacts with various departments in the organization. This may involve relationships between data types owned by other services. This is an interesting case because it starts to strain against the constraint mentioned above that each service owns its own data types. In this case, you can choose to create a graph for your service with read-only access to the underlying tables, and then handle all changes through the "front door", that is, by invoking the APIs of the other services that own those data types.

We may also have a legacy system or service that uses relational database technology, or a service that manages data that is small or changes infrequently. A relational database may be perfectly suited to these scenarios.
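The "front door" approach described for Service D can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed design: the class names CustomerService and ContactGraphService, their methods, and the in-memory dictionaries standing in for real databases are all hypothetical.

```python
from types import MappingProxyType
from typing import List, Mapping


class CustomerService:
    """Hypothetical owner of customer data; the only component that writes it."""

    def __init__(self) -> None:
        self._customers: dict = {}  # customer_id -> name

    def upsert_customer(self, customer_id: str, name: str) -> None:
        # The "front door": every change to customer data goes through here.
        self._customers[customer_id] = name

    def read_only_view(self) -> Mapping[str, str]:
        # A live, read-only view standing in for read access to the owner's tables.
        return MappingProxyType(self._customers)


class ContactGraphService:
    """Hypothetical Service D: navigates relationships, never writes customer data."""

    def __init__(self, owner: CustomerService) -> None:
        self._owner = owner
        self._customers = owner.read_only_view()
        self._contacts: dict = {}  # customer_id -> set of departments contacted

    def record_contact(self, customer_id: str, department: str) -> None:
        if customer_id not in self._customers:
            raise KeyError(f"unknown customer: {customer_id}")
        self._contacts.setdefault(customer_id, set()).add(department)

    def departments_contacted(self, customer_id: str) -> List[str]:
        return sorted(self._contacts.get(customer_id, set()))

    def rename_customer(self, customer_id: str, name: str) -> None:
        # Changes to data owned elsewhere are delegated to the owner's API.
        self._owner.upsert_customer(customer_id, name)
```

The graph service keeps its own relationship data but can only read customer records. Any attempt to assign through the read-only view raises a TypeError, so writes have to go through the owning service.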

Should a single service use hybrid persistence?

It is also possible to design a service that requires more than one database. For example, we could create a hotel service that uses a key-value store as an index that maps hotel names to IDs, while storing the descriptive data about each hotel in Cassandra.

Note that the name-to-ID mapping can also be implemented in Cassandra using a denormalized design, in which a separate table maintains the mapping. This uses more storage space, but it removes the operational complexity of managing a separate key-value store.
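As a rough sketch of the single-store alternative, the following Python models the two tables as in-memory dictionaries. The names Hotel, HotelService, add_hotel, and find_by_name are hypothetical, and a real implementation would issue queries against Cassandra tables rather than update dictionaries.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hotel:
    hotel_id: uuid.UUID
    name: str
    description: str


class HotelService:
    """Hypothetical hotel service backed by two tables in a single store."""

    def __init__(self) -> None:
        # Stand-in for the table of descriptive data, keyed by hotel ID.
        self._hotels_by_id: dict = {}
        # Stand-in for the separate name-to-ID mapping table.
        self._id_by_name: dict = {}

    def add_hotel(self, name: str, description: str) -> Hotel:
        hotel = Hotel(uuid.uuid4(), name, description)
        # One logical write updates both tables, so the name is stored twice.
        self._hotels_by_id[hotel.hotel_id] = hotel
        self._id_by_name[name] = hotel.hotel_id
        return hotel

    def find_by_name(self, name: str) -> Optional[Hotel]:
        # Two key lookups: name -> ID, then ID -> descriptive data.
        hotel_id = self._id_by_name.get(name)
        return None if hotel_id is None else self._hotels_by_id[hotel_id]
```

The extra storage mentioned above shows up as the duplicated name. In exchange, the service depends on only one kind of database.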

My recommendation is this: stick to a single data model (database) per microservice whenever that is feasible. If you find yourself thinking that a single service needs two different databases, consider whether the service has become too coarse-grained. You may need to split it into smaller services.

Tradeoffs and limitations of polyglot persistence

The main disadvantage of polyglot persistence is the cost of supporting multiple technologies, both during initial development and in ongoing operations.

The main development cost is the need to train developers on each new database technology. This matters most in teams where developers change frequently.

Another cost is the operational cost of supporting multiple databases. This can be a problem when databases are centrally managed and the operations team must maintain a high level of expertise in a variety of technologies. It is less of a problem in a DevOps environment, where each development team is expected to support the database it chose in production.

Multi-model databases

As an alternative or complement to polyglot persistence, database vendors have begun to build and promote multi-model databases. The term "model" refers to the core abstraction that a data store provides, such as tables (relational and non-relational), column stores, key-value pairs, documents, or graphs. We can think of an application built on polyglot persistence as one that uses multiple types of data store, while a multi-model database is a single database that supports multiple abstractions.

DataStax Enterprise (DSE) is one example of a multi-model database. At its core it supports Cassandra's partitioned row store (tabular) model, with a graph abstraction layer (DSE Graph) built on top of it. It is also straightforward to build key-value and document models on top of the DSE core model, as shown in the following figure. In this way, we can modify the polyglot persistence approach above to use a single underlying database engine for all of our services, while using separate Cassandra keyspaces to maintain clear boundaries between the data owned by different services.
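The idea of one engine with a separate keyspace per service can be illustrated with a toy model. The Engine and Keyspace classes below are hypothetical stand-ins, not a real driver API. They only show how a handle scoped to one keyspace keeps each service's data apart while everything lives in the same engine.

```python
class Engine:
    """Toy stand-in for one shared database engine (hypothetical API)."""

    def __init__(self) -> None:
        # keyspace name -> table name -> rows keyed by primary key
        self._keyspaces: dict = {}

    def keyspace(self, name: str) -> "Keyspace":
        return Keyspace(self._keyspaces.setdefault(name, {}))


class Keyspace:
    """Handle scoped to one service's keyspace; it cannot reach other keyspaces."""

    def __init__(self, tables: dict) -> None:
        self._tables = tables

    def table(self, name: str) -> dict:
        # Tables are created on first use in this toy model.
        return self._tables.setdefault(name, {})
```

Each service would receive only its own Keyspace handle, for example engine.keyspace("inventory") for Service A, so the ownership boundary is preserved even though operations manages a single platform.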

Here is how each model can be supported:

Table: Service A, our core application service, can interact directly with the DSE database through the Cassandra Query Language (CQL).

Key-value: although neither the DataStax distribution nor Apache Cassandra provides an explicit key-value API, services such as Service B can access Cassandra in key-value style through a table design that uses only a key column and a value column, for example:

CREATE TABLE hotel.hotels (key uuid PRIMARY KEY, value text); // or use the blob type

Document: Cassandra supports document-style interaction through JSON, which can be used for Service C. Note that because Cassandra requires a schema to be defined for each table, you cannot insert JSON containing arbitrary new attributes, as is commonly possible with document databases.

Graph: for highly connected data such as that of Service D, DSE Graph is a highly scalable graph database built on top of the DSE database. DSE Graph supports the powerful and expressive Gremlin API from the Apache TinkerPop project.
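To make the layering concrete, here is a small Python sketch of key-value and document facades built on a schema-defined table. It is an in-memory illustration only: Table, KeyValueStore, and DocumentStore are hypothetical names, and nothing here is DSE or Cassandra driver code. The point is that the layered models inherit the table's constraint that columns must be declared in advance.

```python
import json
from typing import Any, Iterable, Optional


class Table:
    """Toy schema-defined table: a fixed set of columns, rows keyed by primary key."""

    def __init__(self, primary_key: str, columns: Iterable[str]) -> None:
        self.primary_key = primary_key
        self.columns = set(columns) | {primary_key}
        self._rows: dict = {}

    def insert(self, row: dict) -> None:
        unknown = set(row) - self.columns
        if unknown:
            # Mirrors the schema constraint: undeclared attributes are rejected.
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        self._rows[row[self.primary_key]] = dict(row)

    def select(self, key: Any) -> Optional[dict]:
        row = self._rows.get(key)
        return None if row is None else dict(row)


class KeyValueStore:
    """Key-value model layered on a two-column table (key, value)."""

    def __init__(self) -> None:
        self._table = Table("key", ["value"])

    def put(self, key: str, value: str) -> None:
        self._table.insert({"key": key, "value": value})

    def get(self, key: str) -> Optional[str]:
        row = self._table.select(key)
        return None if row is None else row["value"]


class DocumentStore:
    """Document model layered on a table: JSON in, JSON out, schema enforced."""

    def __init__(self, table: Table) -> None:
        self._table = table

    def insert_json(self, document: str) -> None:
        self._table.insert(json.loads(document))

    def select_json(self, key: Any) -> Optional[str]:
        row = self._table.select(key)
        return None if row is None else json.dumps(row, sort_keys=True)
```

A document with only declared attributes is accepted, while one carrying an undeclared attribute raises a ValueError, which is the behavior the note about JSON and table schemas describes.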

Advantages and limitations of multi-model databases

When considering whether to invest in a multi-model database (or in the multi-model features of a database you already use), weigh the same development and operational costs that we discussed earlier for polyglot persistence.

Using a multi-model database can simplify operations. Even if different development teams use different APIs and interaction patterns against the back-end database platform, only one platform needs to be managed, which improves efficiency.

One question to consider when selecting a multi-model database is how the various models are supported. A common approach is for the database engine to have a single native underlying model, with the other models layered on top of it. The layered models tend to exhibit the characteristics of the underlying native model.

For example, the 16th edition of the ThoughtWorks Technology Radar discusses DSE Graph, which is built on top of Cassandra, and the tradeoffs involved:

"DSE Graph, built on top of Cassandra, targets the kind of large data sets where our long-time favorite Neo4j begins to show some limitations. This is a trade-off; for example, you lose the ACID transactions and run-time schema freedom of Neo4j, but access to the underlying Cassandra tables, the integration of Spark for analytical workloads, and the powerful TinkerPop/Gremlin query language make this an option worth considering."

If you consider the various data types in a web application, you may find that they have different consistency requirements, and that the number of data types that truly need immediate consistency is relatively small.
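With tunable consistency, each data type can pay only for the consistency it needs. A commonly cited rule for replicated stores such as Cassandra is that a read is guaranteed to overlap the latest acknowledged write when R + W > N, where N is the replication factor and R and W are the numbers of replicas that must respond to a read and a write. The helper names below are hypothetical; the sketch is just the arithmetic.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return write_replicas + read_replicas > replication_factor
```

With a replication factor of 3, quorum writes and quorum reads give 2 + 2 > 3, so reads observe the latest write. Writing to one replica and reading from one replica gives 1 + 1, which does not exceed 3, so those requests trade immediate consistency for lower latency.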

The ThoughtWorks quote above also touches on another important factor when considering a multi-model database: how the different models and engines integrate and interact, and how data can be accessed for both operational and analytical use cases. DSE supports analytics over graph data through Spark (DSE Analytics), and DSE Search provides the ability to create search indexes over the data in the DSE database.

Four steps for choosing microservice data models

Now that we have discussed the advantages and disadvantages of polyglot persistence and multi-model databases, how should we decide which data models suit a massively scalable microservice application? You can follow these steps:

Identify the main data types in your application, create a service for each of them, and let each service control its own persistence layer. Where possible, use a multi-model database for all services, while allowing each service to interact with the data through a different model.

Use a tabular model (such as the DSE database) as the primary model for web-scale scalability and availability, and then layer key-value and document models on top of it as needed. Be sure to consider the different ways data will be accessed in operational and analytical use cases, so that you can plan ahead for features such as search indexes and replication to analytics data centers.

Use a graph model (such as DSE Graph) to represent highly connected data, especially when the relationships between entities have as many attributes as the entities themselves, or more, or when you need to capture many-to-many relationships between the same entities.

Preserve legacy investments in relational database technology where there is no need to change. For example, use a traditional relational database when your use case does not require massive scale, low latency, and high availability.
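The steps above can be read as a rough decision procedure. The following Python sketch encodes one possible ordering of the checks. The DataTypeProfile fields, the Model enum, and the suggest_model function are hypothetical, and the precedence between the checks is an assumption made for illustration, not a rule stated by the steps themselves.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    TABULAR = "tabular"
    KEY_VALUE = "key-value"
    DOCUMENT = "document"
    GRAPH = "graph"
    RELATIONAL = "relational"


@dataclass(frozen=True)
class DataTypeProfile:
    """Illustrative traits of one main data type (all fields are assumptions)."""
    name: str
    legacy_relational_no_change_needed: bool = False
    highly_connected: bool = False
    lookup_by_key_only: bool = False
    semi_structured: bool = False


def suggest_model(profile: DataTypeProfile) -> Model:
    # Keep existing relational investments where nothing forces a change.
    if profile.legacy_relational_no_change_needed:
        return Model.RELATIONAL
    # Highly connected data is a candidate for a graph model.
    if profile.highly_connected:
        return Model.GRAPH
    # Key-value and document models are layered on the tabular core as needed.
    if profile.lookup_by_key_only:
        return Model.KEY_VALUE
    if profile.semi_structured:
        return Model.DOCUMENT
    # Default: the tabular model as the primary model.
    return Model.TABULAR
```

A profile with no special traits falls through to the tabular model, which matches the recommendation to treat it as the primary model and add the others only where a data type calls for them.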
