Three Hard Problems in Microservice Business Development: Decomposition, Transactions, and Queries (Part Two)

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

In the previous part, we explained that the key obstacles to adopting a microservice architecture are the domain model, transactions, and queries, all of which seem to resist functional decomposition. As soon as functionality is split across services, these three problems appear.

We then showed that one solution is to implement each service's business logic as a set of DDD aggregates. Each transaction then creates or updates exactly one aggregate, and events are used to maintain data consistency between aggregates (and between services).

In this part, we describe a new problem that arises when using events: how to update an aggregate and publish an event atomically. We then show how to solve it with event sourcing, an event-centric approach to business logic design and persistence. After that, we turn to the difficulty of implementing queries in a microservice architecture and introduce Command Query Responsibility Segregation (CQRS), an approach for implementing scalable, high-performance queries.

Reliably updating state and publishing events

On the surface, using events to maintain consistency between aggregates seems simple.

When a service creates or updates an aggregate in the database, it simply publishes an event.

There is a core problem, however: updating the database and publishing the event must be done atomically. Otherwise, if the service crashes after updating the database but before publishing the event, the system is left in an inconsistent state.

The traditional solution is a distributed transaction that spans the database and the message broker. However, for the reasons described in the previous part, two-phase commit (2PC) is not a viable option.

Fortunately, there are several ways to solve this problem without 2PC.

One solution is for the application to perform an update by publishing an event to a message broker such as Kafka. A message consumer that subscribes to the event then eventually updates the database. This approach guarantees that the database is updated and that the event is published.

The drawback is that it implements a much more complex consistency model, and the application cannot immediately read its own writes.

Figure 1-updating the database by publishing events to message broker

Another option, shown in figure 2, is for the application to tail the database transaction log (a.k.a. commit log), convert each recorded change into an event, and publish that event to the message broker. An important benefit of this approach is that it requires no changes to the application itself.

One drawback, however, is that it publishes low-level events rather than high-level business events. It can be difficult to reverse engineer the high-level business event (the reason for the database update) from the low-level changes to rows in the tables.

Figure 2-tailing the database transaction log

The third solution, shown in figure 3, is to use a database table as a temporary message queue. When a service updates an aggregate, it inserts an event into an EVENTS table as part of the local ACID transaction. A separate process then polls the EVENTS table and publishes the events to the message broker.

The advantage of this approach is that the service can publish high-level business events.

The disadvantage is that it is potentially error-prone, because the event publishing code must be kept in sync with the business logic.

Figure 3-using database tables as message queue
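The table-as-queue approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite stands in for the service's database, the `orders` and `events` schemas and the `published` flag are hypothetical, and a plain list stands in for the message broker.

```python
import json
import sqlite3

# Hypothetical schema: an orders table for state, an events table used as a queue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE events (
        event_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        event_data TEXT NOT NULL,
        published  INTEGER NOT NULL DEFAULT 0
    );
""")


def create_order(conn, order_id):
    """Insert the order and its event in one local ACID transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders (id, state) VALUES (?, 'CREATED')", (order_id,))
        conn.execute(
            "INSERT INTO events (event_type, event_data) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def publish_pending(conn, publish):
    """Poll for unpublished events, hand each to the broker, then mark it published."""
    rows = conn.execute(
        "SELECT event_id, event_type, event_data FROM events "
        "WHERE published = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, event_type, event_data in rows:
        # A crash after this call but before the UPDATE causes a re-publish.
        publish(event_type, json.loads(event_data))
        with conn:
            conn.execute("UPDATE events SET published = 1 WHERE event_id = ?", (event_id,))
    return len(rows)


sent = []  # stand-in for a real message broker
create_order(conn, 1)
publish_pending(conn, lambda event_type, data: sent.append((event_type, data)))
```

Because the poller marks an event as published only after handing it to the broker, a crash between those two steps leads to the event being published again, which is why consumers have to tolerate duplicates.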

All three options have significant drawbacks.

Publishing an event to the message broker and updating the database later does not provide read-your-writes consistency; it guarantees only eventual consistency.

Tailing the transaction log provides consistent reads but cannot easily publish high-level business events.

Using a database table as a message queue provides consistent reads and publishes high-level business events, but it relies on the developer remembering to publish an event whenever state changes.

Fortunately, there is another solution: event sourcing, an event-centric approach to persistence and business logic. The following sections explain it in detail.

Developing microservices with event sourcing

Event sourcing is an event-centric approach to persistence. It is not a new concept.

I first learned about event sourcing more than five years ago, but it remained a curiosity until I started developing microservices. As you will see, event sourcing is a good way to implement an event-driven microservice architecture.

A service that uses event sourcing persists each aggregate as a sequence of events.

When it creates or updates an aggregate, the service saves one or more events in the database, which is known as the event store.

It reconstructs the current state of an aggregate by loading its events and replaying them.

In functional programming terms, a service reconstructs the state of an aggregate by performing a functional fold (reduce) over its events.

Because the events are the state, you no longer have the problem of atomically updating state and publishing events.
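The fold can be made concrete with a minimal Python sketch. The `Order` class, its fields, and the event payloads are assumptions made for illustration; only the event names follow the order example used in this article.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class Order:
    state: str = "NEW"
    total: int = 0


def apply_event(order, event):
    """Return the order state that results from applying one event."""
    event_type, data = event
    if event_type == "OrderCreated":
        return replace(order, state="CREATED", total=data["total"])
    if event_type == "OrderApproved":
        return replace(order, state="APPROVED")
    if event_type == "OrderShipped":
        return replace(order, state="SHIPPED")
    return order  # ignore event types this aggregate does not know


events = [
    ("OrderCreated", {"total": 250}),
    ("OrderApproved", {}),
    ("OrderShipped", {}),
]

# Left fold: start from an empty Order and apply the events in order.
order = reduce(apply_event, events, Order())
print(order)  # Order(state='SHIPPED', total=250)
```

Keeping `apply_event` a pure function (it returns a new `Order` instead of mutating one) is a design choice that makes replaying the same events always produce the same state.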

Consider, for example, the Order Service. Instead of storing each order as a row in an ORDERS table, it persists each Order aggregate as a sequence of events, such as Order Created, Order Approved, and Order Shipped, in an EVENTS table. Figure 4 shows how these events might be stored in a SQL-based event store.

Figure 4-using event sourcing to persist an order

The columns have the following meaning:

entity_type and entity_id - together uniquely identify the aggregate

event_id - uniquely identifies the event

event_type - the type of the event

event_data - the serialized JSON representation of the event's attributes

Some events contain large amounts of data. For example, the order creation (Order Created) event contains the full order, including its order items, payment information, and delivery information. Other events, such as the order shipment (Order Shipped) event, contain little or no data and simply indicate a state transition.
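As a rough sketch of such a table, the following Python snippet creates an EVENTS table with the columns listed above and saves and loads the events of one aggregate. SQLite, the column types, and the helper names `save_events` and `load_events` are assumptions for illustration, not the schema of any particular event store.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type  TEXT NOT NULL,
        entity_type TEXT NOT NULL,
        entity_id   TEXT NOT NULL,
        event_data  TEXT NOT NULL  -- JSON-serialized event attributes
    )
""")


def save_events(conn, entity_type, entity_id, events):
    """Append events for one aggregate in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO events (event_type, entity_type, entity_id, event_data) "
            "VALUES (?, ?, ?, ?)",
            [(event_type, entity_type, entity_id, json.dumps(data))
             for event_type, data in events],
        )


def load_events(conn, entity_type, entity_id):
    """Return the events of one aggregate in the order they were saved."""
    rows = conn.execute(
        "SELECT event_type, event_data FROM events "
        "WHERE entity_type = ? AND entity_id = ? ORDER BY event_id",
        (entity_type, entity_id),
    )
    return [(event_type, json.loads(data)) for event_type, data in rows]


save_events(conn, "Order", "101", [("OrderCreated", {"total": 250}), ("OrderShipped", {})])
save_events(conn, "Customer", "7", [("CustomerCreated", {"credit_limit": 1000})])
order_events = load_events(conn, "Order", "101")
```

Here the Order Created event carries the order data while the Order Shipped event has an empty payload, mirroring the contrast described above.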

Event sourcing and publishing events

Strictly speaking, event sourcing simply persists aggregates as events. It is straightforward, however, to also use it as a reliable event publishing mechanism. Saving an event is an inherently atomic operation; all that remains is to ensure that the event store delivers the event to interested services.

For example, if events are stored in the EVENTS table shown above, subscribers can simply poll the table for new events. More sophisticated event stores use a different approach that offers better performance and scalability. For example, Eventuate Local tails the transaction log: it reads events inserted into the EVENTS table from the MySQL replication stream and publishes them to Apache Kafka.

Eventuate Local is an open-source project that is available on GitHub.
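The simple polling approach mentioned above can be sketched as follows. This is an assumption-laden illustration, not how Eventuate Local works: the `PollingSubscriber` class is hypothetical, SQLite stands in for the event store, and the subscriber keeps its position only in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "event_id INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT NOT NULL)"
)


class PollingSubscriber:
    """Delivers events with an ID greater than the last one it has seen."""

    def __init__(self, conn, handler):
        self.conn = conn
        self.handler = handler
        self.last_seen = 0  # a real subscriber would persist this position

    def poll(self):
        rows = self.conn.execute(
            "SELECT event_id, event_type FROM events "
            "WHERE event_id > ? ORDER BY event_id",
            (self.last_seen,),
        ).fetchall()
        for event_id, event_type in rows:
            self.handler(event_id, event_type)
            self.last_seen = event_id
        return len(rows)


received = []
subscriber = PollingSubscriber(conn, lambda event_id, event_type: received.append(event_type))

with conn:
    conn.execute("INSERT INTO events (event_type) VALUES ('OrderCreated')")
    conn.execute("INSERT INTO events (event_type) VALUES ('OrderApproved')")
first_batch = subscriber.poll()   # delivers the two events above
second_batch = subscriber.poll()  # nothing new since the last poll
```

A production subscriber would call `poll()` on a timer and store `last_seen` durably; polling by ID can also miss events when concurrent transactions commit out of order, which is one reason more sophisticated event stores tail the transaction log instead.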

Using snapshots to improve performance

An Order aggregate has relatively few state transitions, so it has only a small number of events.

It is therefore efficient to query the event store for those events and reconstruct the Order aggregate. Some aggregates, however, have many events. For example, a Customer aggregate could have a large number of Credit Reserved events. Over time, loading and folding those events becomes increasingly inefficient.

A common solution is to periodically persist a snapshot of the aggregate's state. The application restores the state of an aggregate by loading the most recent snapshot and then only those events that have occurred since the snapshot was created.

In functional terms, the snapshot is the initial value of the fold. If an aggregate has a simple, easily serializable structure, the snapshot can simply be its JSON serialization. More complex aggregates can be snapshotted using the Memento pattern.

The Customer aggregate in the online store example has a very simple structure: customer information, a credit limit, and credit reservations.

A snapshot of a Customer is simply the JSON serialization of its state. Figure 5 shows how to recreate a Customer from a snapshot that corresponds to the state of the Customer as of event #103. The Customer Service only needs to load the snapshot and the events that occurred after event #103.

Figure 5-using snapshots to optimize performance

The Customer Service recreates the Customer by deserializing the snapshot's JSON and then loading and applying events #104 through #106.
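A small Python sketch shows the snapshot acting as the initial value of the fold. The customer's name, the field names, and the event payloads are invented for illustration; only the idea of a snapshot as of event #103 followed by events #104 to #106 comes from the example above.

```python
import json
from functools import reduce


def apply_event(customer, event):
    """Apply one event to a customer represented as a plain dict."""
    event_type, data = event
    if event_type == "CreditReserved":
        reservations = {**customer["reservations"], data["order_id"]: data["amount"]}
        return {**customer, "reservations": reservations}
    return customer


# Snapshot taken as of event #103 (hypothetical contents).
snapshot_json = json.dumps(
    {"name": "Alice", "credit_limit": 1000, "reservations": {"o-1": 100}}
)

# Events #104 to #106, loaded from the event store after the snapshot.
later_events = [
    ("CreditReserved", {"order_id": "o-2", "amount": 200}),
    ("CreditReserved", {"order_id": "o-3", "amount": 50}),
    ("CreditReserved", {"order_id": "o-4", "amount": 25}),
]

# The deserialized snapshot is the initial value of the fold.
customer = reduce(apply_event, later_events, json.loads(snapshot_json))
available_credit = customer["credit_limit"] - sum(customer["reservations"].values())
```

Only three events are replayed here, regardless of how many events preceded the snapshot.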

Implementing event sourcing

An event store is a hybrid of a database and a message broker. It is a database because it has an API for inserting and retrieving an aggregate's events by primary key. It is also a message broker because it has an API for subscribing to events.

There are several different ways to implement an event store.

One option is to write your own event sourcing framework. You can, for example, persist events in an RDBMS. A simple, albeit low-performance, way to publish events is then for subscribers to poll the EVENTS table.

Another option is to use a special-purpose event store, which typically provides a richer set of features and better performance and scalability. Greg Young, an event sourcing pioneer, has a .NET-based open-source event store called Event Store. Lightbend, the company formerly known as Typesafe, has a microservices framework called Lagom that is based on event sourcing. My own startup, Eventuate, has an event sourcing framework for microservices that is available as a cloud service and as a Kafka/RDBMS-based open-source project.

Advantages and disadvantages of event sourcing

Event sourcing has both advantages and disadvantages.

A major benefit of event sourcing is that it reliably publishes events whenever the state of an aggregate changes. It is a good foundation for an event-driven microservice architecture. Also, because each event can record the identity of the user who made the change, event sourcing provides an accurate audit log. The stream of events can be used for a variety of other purposes as well, including sending notifications to users and application integration.

Another benefit of event sourcing is that it stores the entire history of each aggregate. You can easily implement temporal queries that retrieve the past state of an aggregate. To determine the state of an aggregate at a given point in time, you simply fold the events that occurred up to that point. It is straightforward, for example, to calculate the available credit of a customer at some point in the past.
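A temporal query of this kind might look like the sketch below. It assumes that each event carries a timestamp (the EVENTS table described earlier does not list such a column) and uses made-up dates and amounts.

```python
from datetime import datetime
from functools import reduce

# Hypothetical (timestamp, event_type, data) tuples for one customer, oldest first.
events = [
    (datetime(2024, 1, 1), "CustomerCreated", {"credit_limit": 1000}),
    (datetime(2024, 2, 1), "CreditReserved", {"amount": 300}),
    (datetime(2024, 3, 1), "CreditReserved", {"amount": 500}),
]


def apply_event(available, event):
    """Track the customer's available credit across events."""
    _, event_type, data = event
    if event_type == "CustomerCreated":
        return data["credit_limit"]
    if event_type == "CreditReserved":
        return available - data["amount"]
    return available


def available_credit_at(events, as_of):
    """Fold only the events that occurred up to the given point in time."""
    past = [event for event in events if event[0] <= as_of]
    return reduce(apply_event, past, 0)


mid_february = available_credit_at(events, datetime(2024, 2, 15))
year_end = available_credit_at(events, datetime(2024, 12, 31))
```

The same fold function serves both the current-state query and the historical one; only the set of events passed to it differs.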

Event sourcing also mostly avoids the O/R impedance mismatch problem, because it persists events rather than aggregates. Events typically have a simple, easily serializable structure. A service can snapshot a complex aggregate by serializing a memento of its state. The Memento pattern adds a level of indirection between an aggregate and its serialized representation.

A note on the O/R impedance mismatch:

The object-relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is used by an application written in an object-oriented programming language or style, particularly because objects or class definitions must be mapped to database tables defined by a relational schema.

Event sourcing is not a silver bullet, of course, and it has some drawbacks. It is a different and unfamiliar programming model, so there is a learning curve. For an existing application to use event sourcing, you must rewrite its business logic. Fortunately, this is a fairly mechanical transformation, which you can do when you migrate the application to microservices.

Another drawback of event sourcing is that message brokers usually guarantee at-least-once delivery. Event handlers that are not idempotent must detect and discard duplicate events. An event sourcing framework can help by assigning each event a monotonically increasing ID. An event handler can then detect duplicates by tracking the highest event ID it has seen.
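Duplicate detection by tracking the highest event ID can be sketched as follows. The `DeduplicatingHandler` class is hypothetical, and the sketch assumes that events from a single stream arrive in ID order, possibly with repeats.

```python
class DeduplicatingHandler:
    """Wraps a non-idempotent handler and drops events it has already seen."""

    def __init__(self, handle):
        self.handle = handle
        self.max_event_id = 0  # highest event ID processed so far

    def on_event(self, event_id, event):
        if event_id <= self.max_event_id:
            return False  # duplicate delivery: discard
        self.handle(event)
        self.max_event_id = event_id
        return True


reserved = []  # non-idempotent side effect: appending twice would double-count
handler = DeduplicatingHandler(lambda event: reserved.append(event["amount"]))

# At-least-once delivery: event 2 arrives twice.
deliveries = [(1, {"amount": 100}), (2, {"amount": 50}), (2, {"amount": 50}), (3, {"amount": 25})]
results = [handler.on_event(event_id, event) for event_id, event in deliveries]
```

In a real service the highest ID would need to be stored durably, ideally in the same transaction as the handler's own update, so that a restart does not reset it.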

Another challenge with event sourcing is that the schema of events (and snapshots!) evolves over time. Because events are stored forever, a service may need to fold events that correspond to multiple schema versions when it reconstructs an aggregate. One way to simplify the service is for the event sourcing framework to convert all events to the latest schema version when it loads them from the event store. The service then only needs to fold the latest version of each event.
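Converting events to the latest schema at load time can be sketched as a chain of small conversion functions. The three schema versions, the field names, and the default currency below are invented purely for illustration.

```python
LATEST_VERSION = 3


def v1_to_v2(data):
    """Hypothetical change: version 2 added an explicit currency."""
    return {**data, "currency": "USD"}


def v2_to_v3(data):
    """Hypothetical change: version 3 stores the amount in minor units."""
    data = dict(data)
    data["amount_minor"] = data.pop("amount") * 100
    return data


# Maps a schema version to the function that converts it to the next version.
UPCASTERS = {1: v1_to_v2, 2: v2_to_v3}


def upcast(event):
    """Convert a stored event to the latest schema, one version at a time."""
    version, data = event["version"], event["data"]
    while version < LATEST_VERSION:
        data = UPCASTERS[version](data)
        version += 1
    return {**event, "version": version, "data": data}


old_event = {"type": "CreditReserved", "version": 1, "data": {"amount": 25}}
current_event = upcast(old_event)
```

With this in place, the fold only has to understand version 3 events, and each new schema version adds a single conversion function.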

Another drawback of event sourcing is that querying the event store can be challenging. Imagine, for example, that you need to find customers with a low credit limit. You cannot simply write SELECT * FROM CUSTOMER WHERE CREDIT_LIMIT < ?, because there is no CREDIT_LIMIT column. Instead, you must use a more complex and potentially inefficient query with nested SELECTs that computes the credit limit by folding events. To make matters worse, a NoSQL-based event store typically supports only primary-key lookups. Consequently, queries must be implemented using the Command Query Responsibility Segregation (CQRS) approach.

The next section introduces CQRS.

Implementing queries with CQRS

Event sourcing is a major obstacle to implementing efficient queries in a microservice architecture, but it is not the only one. Consider, for example, a SQL query that finds new customers who have placed high-value orders.

SELECT *
FROM CUSTOMER c, ORDER o
WHERE c.id = o.ID
  AND o.ORDER_TOTAL > 100000
  AND o.STATE = 'SHIPPED'
  AND c.CREATION_DATE > ?

In a microservice architecture, you cannot join the CUSTOMER and ORDER tables. Each table is owned by a different service and is accessible only through that service's API. You cannot write traditional queries that join tables owned by multiple services. Event sourcing makes matters worse by preventing you from writing simple, straightforward queries. Let's look at a way to implement queries like this in a microservice architecture.

How to use CQRS

A good way to implement queries is to use an architectural pattern known as Command Query Responsibility Segregation (CQRS). As the name suggests, CQRS splits the application into two parts. The first part is the command side, which handles commands (for example, HTTP POST, PUT, and DELETE requests) that create, update, and delete aggregates. These aggregates are implemented using event sourcing. The second part is the query side, which handles queries (for example, HTTP GET requests) by querying one or more materialized views of the aggregates. The query side keeps the views synchronized with the aggregates by subscribing to the events published by the command side.

Query-side views can be implemented using whatever kind of database suits the requirements. Depending on those requirements, an application's query side might use one or more of the following databases:

Table 1. Query-side view database choices

In many ways, CQRS is an event-based generalization of the widely used approach of using an RDBMS as the system of record and a text search engine, such as Elasticsearch, to handle text queries. The difference is that the CQRS query side can use a broader range of database types, not just a text search engine. In addition, it updates the query-side views in near real time by subscribing to events.

Figure 6 shows the CQRS pattern applied to the online store example. The Customer Service and the Order Service are command-side services. They provide APIs for creating and updating customers and orders. The Customer View Service is a query-side service. It provides an API for querying customers.

Figure 6-using CQRS in an online store

The Customer View Service subscribes to the Customer and Order events published by the command-side services. It updates a view store that is implemented using MongoDB. The service maintains a MongoDB collection of documents, one per customer. Each document has attributes for the customer's details and for the customer's recent orders. This collection supports a variety of queries, including the one described above.
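The following Python sketch illustrates the idea of such a query-side view. It is not the Customer View Service's actual code: an in-memory dict stands in for the MongoDB collection, and the `CustomerView` class, the event payloads, and the field names are all assumptions.

```python
from datetime import date


class CustomerView:
    """Query-side view: one document per customer, updated from events."""

    def __init__(self):
        self.docs = {}  # customer_id -> document (stand-in for a MongoDB collection)

    def on_event(self, event_type, data):
        if event_type == "CustomerCreated":
            self.docs[data["customer_id"]] = {
                "name": data["name"],
                "creation_date": data["creation_date"],
                "orders": {},
            }
        elif event_type == "OrderCreated":
            orders = self.docs[data["customer_id"]]["orders"]
            orders[data["order_id"]] = {"total": data["total"], "state": "CREATED"}
        elif event_type == "OrderShipped":
            self.docs[data["customer_id"]]["orders"][data["order_id"]]["state"] = "SHIPPED"

    def new_customers_with_high_value_orders(self, created_after, min_total=100000):
        """Equivalent of the SQL join above, answered from the view alone."""
        return sorted(
            customer_id
            for customer_id, doc in self.docs.items()
            if doc["creation_date"] > created_after
            and any(
                order["state"] == "SHIPPED" and order["total"] > min_total
                for order in doc["orders"].values()
            )
        )


view = CustomerView()
for event in [
    ("CustomerCreated", {"customer_id": "c1", "name": "Alice", "creation_date": date(2024, 5, 1)}),
    ("CustomerCreated", {"customer_id": "c2", "name": "Bob", "creation_date": date(2024, 6, 1)}),
    ("OrderCreated", {"customer_id": "c1", "order_id": "o1", "total": 150000}),
    ("OrderCreated", {"customer_id": "c2", "order_id": "o2", "total": 200000}),
    ("OrderShipped", {"customer_id": "c1", "order_id": "o1"}),
]:
    view.on_event(*event)

matches = view.new_customers_with_high_value_orders(created_after=date(2024, 1, 1))
```

The query never touches the Customer Service or the Order Service; it reads only the view, which is what allows the join to be answered despite the data being owned by two services.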

Advantages and disadvantages of CQRS

CQRS has both benefits and drawbacks. A major benefit of CQRS is that it makes it possible to implement queries in a microservice architecture, especially one that uses event sourcing. It enables an application to efficiently support a diverse set of queries. Another benefit is that separating the command side from the query side decouples the two.

CQRS also has some drawbacks. One is that it requires extra effort to develop and operate the system. You must develop and deploy the query-side services that update and query the views. You also need to deploy the view stores.

Another drawback of CQRS is dealing with the lag between the command side and the query-side views. The query side is always slightly behind the command side. A client application that updates an aggregate and then immediately queries a view might see the previous version of the aggregate. It must therefore be written in a way that avoids exposing these potential inconsistencies to the user.

Summary

A major challenge when using events to maintain data consistency between services is atomically updating the database and publishing events. The traditional solution is to use a distributed transaction that spans the database and the message broker. However, 2PC is not a viable technology for modern applications. A better approach is to use event sourcing, an event-centric approach to business logic design and persistence.

Another challenge in a microservice architecture is implementing queries. Queries often need to join data that is owned by multiple services. However, joins are no longer straightforward because data is private to each service. Using event sourcing makes it even harder to implement queries efficiently because the current state is not explicitly stored. The solution is to use Command Query Responsibility Segregation (CQRS) and maintain one or more materialized views of the aggregates that can be easily queried.

