
Internet architecture: a tried and tested architecture troika

2025-01-16 Update — From: SLTechnology News&Howtos


The three carriages here are microservices, message queues, and scheduled tasks. The figure below shows the architecture of an Internet project driven by this troika.

In the figure, dotted boxes mark modules or projects that contain little business logic — pure "skins" that invoke services but never touch the database. Black arrows represent dependencies; the green and red arrows show the directions of MQ message publishing and subscription, respectively. The details are explained below.

Microservices

Microservices are not a new concept. I began practicing this architectural style ten years ago and have fully implemented microservices in projects at four companies, and I am increasingly convinced it suits Internet projects well. The point is not that services must be called remotely across physical machines, but that the business is deliberately segmented by domain from the very start. This gives us a better understanding of the business, makes it easy to work on different business modules in later iterations, and makes development progressively easier in several ways:

1.

2. If the division of services and responsibilities is clear, then for a new requirement we know exactly where to change which code, and there are fewer pitfalls from copy-and-paste.

3. Most business logic has already been developed and can be reused directly; a new business is often just an aggregation of existing logic. After a PRD review, the developer may conclude — hard as it is to believe — that the requirement only needs to combine calls to methods X, Y, and Z of services A, B, and C, plus one extra branch added to method Z in service C.

4. When a clear performance bottleneck appears, we can scale out individual services by adding machines, and because the system is divided into services, we understand where the bottleneck sits. Locating a problematic line among 10,000 lines of code is hard; but if those 10,000 lines are already split into 10 services, we first identify which service has the performance problem and then analyze that service, which greatly reduces the complexity of locating the issue.

5. If a business undergoes major changes and needs to be taken offline, we can be sure the underlying public services will not be discarded. We stop traffic at the aggregation services corresponding to the retired business, then take offline only the basic services involved. With a sound service governance platform, the whole operation may not even require code changes.

Achieving this requires following several principles:

1. Service granularity must be well controlled. My habit: divide by domain first — that is rarely wrong — then split more finely as the project progresses. For example, an Internet finance P2P business could initially be divided into:

A. Partner investment service PartnerInvestService: handles traffic from cooperating third-party financial platforms

B. General investment service NormalInvestService: the main flow of the most common asset type

C. Reserved investment service ReserveInvestService: the main flow of assets that require a reservation before investing

D. Auto-invest service AutoInvestService: the main flow of wealth-management products that automatically reinvest on a schedule

E. Investor trading service TradeService: handles investors' trading behavior, such as investing

F. Borrower trading service LoanService: handles borrowers' transactions, such as repayment

G. User service UserService: handles user registration, login, and so on

H. Asset service ProjectService: handles the relationship between assets and loan targets

I. Account service AccountService: handles users' accounts, sub-accounts, and accounting records

J. Marketing campaign service ActivityService: handles campaigns and the user points system

K. Membership service VipService: handles the user membership growth system

L. Bank depository service BankService: dedicated to integrating with the bank depository system

M. E-signature service DigSignService: dedicated to integrating with the third-party digital signature system

N. Message push service MessageService: dedicated to integrating with third-party SMS channels and push SDKs

2. Services must be three-dimensional, not flat. As shown in the figure above, our services sit at three levels:

Aggregate business services: high-level services, each representing a complete business form that strings a whole process together. Unlike basic business services, an aggregate service describes one facet of the business end to end and is typically assembled from several basic services. Different external partners and different product forms offered to users mean the business processes in aggregate services will differ; if that logic were pushed down into basic business services, those services would accumulate if-else branches for every product and user type, changing with each new partnership, and would quickly rot. To avoid this, we keep the highly changeable aggregate logic in separate services. Aggregate services generally do not call each other, because each represents an independent business process, but they invariably call many basic business services. The services marked in blue in the first point are of this kind. Logic at this level expresses the complexity and differences of business processes; it does not deal with account information, accounting records, user information, or the mechanics of investor and borrower transactions. For example, the reservation business form focuses on reserving an asset first and then letting the system invest automatically; the underlying transaction process is delegated entirely to the investor trading service.

Basic business services: services tied to a specific business domain. These may call each other; for example, the investor and borrower trading services inevitably talk to the user, asset, and account services for operations such as querying user information, querying target information, and bookkeeping. The services marked in green in the first point are of this kind. Although this level carries substantial business logic, it already enjoys a large degree of reuse of the common basic services, and logic only weakly coupled to its own domain tends not to accumulate here — more specialized basic services carry that part.

Common basic services: each is responsible for one foundational aspect, with no domain business logic inside. Such a service either handles its aspect autonomously or communicates with the outside world to provide it. These services do not call each other, but are called by aggregate and basic business services. The services marked in orange in the first point are of this kind. If an external partnership changes later, then because we have defined the service contract, we can simply swap the service behind it to replace the third party, with almost no changes elsewhere in the system. Every third-party integration should be isolated in its own common basic service. If the same capability integrates multiple third-party channels — for example, push notifications via both JPush and Getui — the common basic services can even sit behind an abstract aggregate push service that routes to the concrete JPush and Getui services.

I hope this clarifies the idea: dividing services into three levels is both interesting and necessary. After the division, it is best to keep a clear document describing each service's responsibilities, so that anyone can roughly locate which service owns a piece of business without reading the API, and the whole complex system becomes straightforward.

3. The underlying data tables of each service are independent and never cross-referenced; that is, data structures are not exposed directly, and data owned by another service must be accessed through its interface. The benefits mirror encapsulation in object-oriented design:

You can freely restructure the underlying data model or even switch the data source; as long as the interface is unchanged, the outside world will not notice.

When performance problems arise, it is easy to add caching, split tables, split databases, or archive data, since the data source has no external dependents.

To put it bluntly: I own my data, and nobody else touches it. When refactoring or building higher-level architecture (such as multi-site deployments), having no externally depended-on tables is invaluable. The downside, of course, is that cross-service calls mean data operations can no longer complete within one database transaction. This is less of a problem than it sounds: first, our split is not so fine-grained, so most business logic still completes inside one business service; second, as discussed later, cross-service calls — whether via MQ or direct invocation — are backed by compensation to achieve eventual consistency.

4. Account for the significant difference in stability between in-process method calls and cross-machine, cross-process service calls. For a local call we consider exceptions, but rarely timeouts, lost requests, or duplicate invocations. For remote calls, all of these must be considered, otherwise the system looks fine in the test environment but is riddled with problems online. For every service provided and consumed, ask a few more of the questions below, and carefully ensure a method is never executed multiple times, or only partially, because of network issues:

When providing a service, tell consumers not only its business capability but also its characteristics: whether it is idempotent (for order-type operations, the same order plus the same operation is strongly recommended to be idempotent, so callers can retry or compensate with confidence); whether it needs external compensation (you may ask why a service cannot compensate itself — internal sub-logic can, but sometimes the request never reaches the server because of the network, and the server obviously cannot compensate for a call it never saw); whether there is rate limiting; whether there are permission restrictions; how degradation is handled; and so on.

Conversely, when invoking another service, ask about its characteristics and design the corresponding compensation, consistency, and degradation logic. Remember that sometimes the failure is not on the server — the request may never have reached it at all.

A service often has complex logic of its own while also acting as a client invoking many external services, so the server and client roles are not fixed. When a service makes many outbound calls, every sub-step must be considered carefully; otherwise you end up with a service that is idempotent in some branches but not others, or consistent in some branches but not others.

You may say: it is too hard to consider all of this for so many services — can I just ignore distributed transactions, idempotency, and compensation? (It is no exaggeration that we sometimes spend 20% of our time implementing business logic and 80% implementing the surrounding reliability logic.) You can, but the business will then be riddled with holes in production. If the business has low reliability requirements or is not user-facing, skipping these points temporarily may be acceptable; but core flows such as orders, where inconsistency is unacceptable, must be considered in full.
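The idempotency advice in point 4 can be sketched minimally. This is an illustrative example, not the article's code: an order-type operation keyed by (order number, operation) that callers can retry safely; the class and method names are assumptions, and in production the "processed" table would be a database record written in the same transaction as the business change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Idempotent order operation: a retry with the same orderNo + operation
// returns the first result instead of re-executing the business logic.
public class OrderOps {
    // Stand-in for a DB table with a unique key on (orderNo, operation).
    private final Map<String, String> processed = new ConcurrentHashMap<>();
    private int executions = 0;

    public String execute(String orderNo, String operation) {
        String key = orderNo + ":" + operation;
        return processed.computeIfAbsent(key, k -> doBusiness(orderNo, operation));
    }

    private String doBusiness(String orderNo, String operation) {
        executions++; // the side effect happens exactly once per key
        return operation + " done for " + orderNo;
    }

    public int executionCount() { return executions; }

    public static void main(String[] args) {
        OrderOps ops = new OrderOps();
        ops.execute("O-1001", "CONFIRM");
        ops.execute("O-1001", "CONFIRM"); // caller retries after a timeout
        System.out.println(ops.executionCount()); // 1 — business ran once
    }
}
```

With this property, a nervous caller can retry on timeout without risking a double charge.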

5. Account for the significant difference in data transfer between in-process and cross-process calls. For a local call, parameters and return values are passed as references (or copies of pointers) to objects allocated on the heap; in most languages the transfer cost is negligible and there is no serialization or deserialization. For a cross-process call this cost is often not negligible, and if a lot of data must be returned, the interface definition usually needs special treatment:

Use pagination: return a fixed amount of data per call and let the client pull more as needed.

Pass an EnumSet-like structure in the parameters so the client can tell the server which level of data it needs. For example, a GetUserInfo API might offer BasicInfo, VIPInfo, InvestData, RechargeData, and WithdrawData, and a client can request BasicInfo | VIPInfo from the server as needed.
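The EnumSet-style parameter above can be sketched as follows. This is a hedged illustration: the section names follow the article's example, but the class, map-based return shape, and placeholder values are assumptions of this sketch.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// The client names the sections it needs, so the server never loads or
// serializes data nobody asked for.
public class UserInfoService {
    public enum Section { BASIC_INFO, VIP_INFO, INVEST_DATA, RECHARGE_DATA, WITHDRAW_DATA }

    /** Assembles only the requested sections of the user profile. */
    public static Map<String, Object> getUserInfo(long userId, EnumSet<Section> wanted) {
        Map<String, Object> result = new HashMap<>();
        if (wanted.contains(Section.BASIC_INFO))  result.put("basicInfo", "user-" + userId);
        if (wanted.contains(Section.VIP_INFO))    result.put("vipInfo", "LEVEL_3");
        if (wanted.contains(Section.INVEST_DATA)) result.put("investData", new long[]{100, 200});
        // RECHARGE_DATA / WITHDRAW_DATA would be loaded the same way
        return result;
    }

    public static void main(String[] args) {
        // "BasicInfo | VIPInfo" from the text, expressed as an EnumSet
        Map<String, Object> info =
            getUserInfo(42, EnumSet.of(Section.BASIC_INFO, Section.VIP_INFO));
        System.out.println(info.keySet());
    }
}
```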

6. This raises the question of method granularity. We can define GetUserInfo to return different data combinations depending on the parameters passed in, or define fine-grained interfaces such as GetUserBasicInfo, GetUserVIPInfo, and GetUserInvestData. Which granularity to choose depends on how consumers use the data — whether they tend to need a single type at a time or a composite, and so on.

7. Then there is interface upgrading. Changes to an interface should stay compatible with the previous one. If an interface must be retired, make sure every caller has migrated to the new one, and confirm that traffic to the old interface has been zero for some time before removing it from the code. Once a service is published, adjusting or retiring its interface is no longer entirely up to you, so external API design deserves care.
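One way to keep an interface change backward compatible, in the Java idiom, is a default method on the widened signature: old implementations and old callers keep working while traffic migrates. QuoteService and its methods are illustrative names, not from the article.

```java
// Evolving a published interface without breaking existing implementations.
public class InterfaceUpgradeDemo {
    public interface QuoteService {
        /** v1 API: quote in the platform's default currency. */
        double quote(String productId);

        /** v2 API added later; the default keeps v1 implementations valid. */
        default double quote(String productId, String currency) {
            return quote(productId); // old behavior until impls override it
        }
    }

    /** An implementation written against v1 — it never heard of currencies. */
    public static class LegacyImpl implements QuoteService {
        public double quote(String productId) { return 9.9; }
    }

    public static void main(String[] args) {
        QuoteService s = new LegacyImpl();
        // a new caller can already use the v2 signature against the old impl
        System.out.println(s.quote("P-1", "USD")); // 9.9
    }
}
```

The same principle applies to RPC contracts: only additive, defaulted changes; removals wait until measured traffic on the old signature is zero.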

8. Finally, once the whole company adopts microservices, cross-department service integration will inevitably involve some wrangling over API contracts: should I push the data to you, or should you pull it? Why should I be the one storing this data? Setting non-technical factors aside, several technical means help resolve these disputes:

Define each service's responsibility clearly, so it is obvious what a service should and should not perceive.

Keep cross-department interfaces very light: an interface carrying only an order number, or an "MQ notification + data pull" strategy (whoever owns the data provides the query interface, instead of pushing the full payload downstream in one shot).

Data providers can build a set of general-purpose data interfaces that satisfy multiple departments without customization, and can even pass through both persisted and non-persisted properties transparently on the interface.

By now you may feel dizzy: why do microservices demand so many extra considerations? Hasn't implementation complexity suddenly ballooned? Consider it from a different angle:

1. We do not need to consider every piece of logic carefully up front; cover the core flows and core logic first. Becoming both a provider and a consumer of services means many people besides yourself now depend on your capabilities, and they will ask all kinds of questions — which is good for designing reliable methods.

2. Even if we piled all the logic together without cross-service calls, that would not make the logic transactional or tightly implemented; cross-service calls merely magnify, to some extent, the chance that problems surface.

3. We also have a service framework, which typically provides extensive monitoring, tracing, and operations integration that exposes the internal method logic. In a microservice system with a mature monitoring platform, you often will not even feel that a call is remote when troubleshooting.

4. The biggest dividend was mentioned earlier: once we have a three-dimensional service system with clear business logic, any requirement can be dissected into a small number of code changes plus some combined service calls — and you know it will work, because the underlying services A through G are battle-tested. Experience this pleasure once and you will be hooked.

However, if service granularity or layering is divided unreasonably, if underlying data sources are crossed, if network call failures or data volumes are not accounted for, if interface definitions are poor, or if version upgrades are reckless, the whole system will suffer all kinds of scaling problems, performance problems, and bugs — a real headache. This is why we need a solid service framework to help locate all of these unreasonable spots; a later article on middleware will focus on service governance.

Message queues

Message queues (MQ) bring the following benefits, and we typically introduce MQ for these purposes:

1. Asynchronous processing: a flow such as ordering generally has a definable core path — the state machine of the core order — that should complete synchronously as fast as possible. Around the order, a series of follow-up processes then branch off (related to users, inventory, and so on) that do not need to run at the instant the user clicks submit. Placing an order is just the act of confirming that a legal order has been accepted; the many follow-up steps can flow slowly through dozens of modules, and even if that takes five minutes, users never notice.

2. Peak shaving: a hallmark of Internet projects is the occasional to-C promotion, which inevitably brings traffic spikes. If we put message queues between modules as buffers, backend services can passively consume at their own comfortable pace and will not be crushed by heavy traffic. Monitoring is essential here, of course; more on monitoring below.

3. Module decoupling: as a project grows, all kinds of events arise inside and outside it (user registration, login, investment, withdrawal, and so on). A variety of modules (marketing, campaigns) will keep caring about these important events, and having the core business system call each of these external modules would tangle the whole system internally. Decoupling through MQ — letting events flow loosely through the system while modules do their own jobs without perceiving one another — is the more appropriate approach.

4. One-to-many delivery: some messages have multiple receivers, and the set of receivers changes dynamically (something resembling a chain of responsibility is also possible). Wiring one-to-many coupling between upstream and downstream directly is painful; MQ decouples it. The upstream simply publishes "this just happened", and no matter how many downstream consumers care about the news, the upstream never knows about them.

These needs appear in essentially every Internet project, which makes the message queue a very important architectural tool. Several points deserve attention in practice:

1. I prefer to split out a dedicated listener project (rather than merging it into the server) to listen for messages. This module carries almost no logic: upon receiving a specific message, it simply calls the appropriate service API to process it. Listeners can run multiple instances for load balancing (depending on the MQ product used), though since there is almost no pressure here it is rarely necessary. Note that not every service needs a matching listener project: most common basic services are self-contained and need not perceive external business events, so they usually have no listener, and some basic business services skip it for similar reasons.

2. Important MQ messages should have a compensation line as backup: it catches leaked messages while the MQ cluster is healthy, and takes over entirely when the cluster is down. I have used RabbitMQ in projects handling tens of millions of orders per day. Although the QPS was far below what RabbitMQ can sustain, roughly 1 in 100,000 messages was still lost overall (I have also used Alibaba's RocketMQ, where I did not observe the problem, though the volume was small). Lost messages are picked up promptly by the compensation line. In the extreme case where the whole RabbitMQ cluster went down and messages from service A could not reach service B, the compensation Job kicked in, periodically pulling messages from service A in batches and feeding them to service B. Processing became batch-by-batch rather than real-time, but at least every message was eventually handled. This backup matters because we can never guarantee 100% middleware availability.

3. The compensation itself carries no business logic; let me spell out how it works. Suppose service A is the message producer and B-listener is the consumer: on receiving a message, it calls the concrete method handleXXMessage(XXMessage message) in B-server to execute the business logic. When MQ stops working, a Job (with configurable compensation window and pull size) periodically calls the dedicated method getXXMessages(LocalDateTime from, LocalDateTime to, int batchSize) provided by service A to pull messages, then calls B-server's handleXXMessage (possibly concurrently) to process them. This compensation Job is reusable and configurable, so there is no need to hand-write one per message type; the only per-message work is that service A must expose a message-pull interface. You may object: doesn't service A now have to maintain a database-backed message queue of its own — a second, passive queue? In practice, no: the "messages" are usually just a transformation. Service A already has rows in its database recording what changed; it only needs to convert that data into Message objects when asked. And since B-server's handleXXMessage is idempotent, duplicate processing does not matter — in an emergency we simply replay, mechanically, the data from the recent period.
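The compensation mechanism above can be sketched in a few lines. This is an illustrative reduction, not the article's framework: the provider and handler interfaces stand in for the getXXMessages / handleXXMessage pair, and messages are plain strings for brevity.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Reusable compensation Job: pull a time window of messages from the
// provider and replay them through the consumer's idempotent handler.
public class CompensationJobDemo {
    interface MessageProvider {
        List<String> getMessages(LocalDateTime from, LocalDateTime to, int batchSize);
    }
    interface MessageHandler { void handleMessage(String message); }

    /** Pulls one batch for the window and replays it; returns the batch size. */
    public static int runOnce(MessageProvider a, MessageHandler b,
                              LocalDateTime from, LocalDateTime to, int batchSize) {
        List<String> batch = a.getMessages(from, to, batchSize);
        for (String msg : batch) b.handleMessage(msg); // handler must be idempotent
        return batch.size();
    }

    public static void main(String[] args) {
        // A-service side: "messages" are just a view over existing DB rows
        List<String> rows = new ArrayList<>(List.of("order-1 paid", "order-2 paid"));
        MessageProvider a = (from, to, n) -> rows.subList(0, Math.min(n, rows.size()));

        // B-service side: idempotent handler, so replays are harmless
        Set<String> applied = new LinkedHashSet<>();
        MessageHandler b = applied::add;

        LocalDateTime now = LocalDateTime.now();
        runOnce(a, b, now.minusMinutes(10), now, 100);
        runOnce(a, b, now.minusMinutes(10), now, 100); // overlapping windows are fine
        System.out.println(applied.size()); // 2
    }
}
```

Because the handler is idempotent, overlapping compensation windows cost nothing, which is what lets a single generic Job serve every message type.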

4. Every message handler should be idempotent with respect to the same message. Even if some MQ products promise to deliver a message exactly once, implementing idempotency yourself makes life easier.

5. Some scenarios require delayed messages or delay queues; products such as RabbitMQ and RocketMQ can support this.

6. Broadly, MQ messages come in two flavors: one where (ideally) a message is consumed by exactly one consumer exactly once, and one where every subscriber handles it, with no limit on subscriber count. Different MQ middleware implements these two forms differently — some via message types, some via different exchanges, some via consumer groups (each group receives its own copy of the same message). Both are generally supported, but be sure to study the product's documentation and run experiments to confirm each message is handled the intended way, or you will meet some nasty problems.

7. Monitor your messages well. The most important metric is backlog: if messages are accumulating, promptly increase downstream processing capacity (machines, threads). Better still, render a heat-map topology of every message's flow direction and rate so that pressure points are visible at a glance. You might think that since MQ does not lose messages, backlog merely means slower processing. Messages may indeed accumulate moderately, but not massively: if the MQ system develops a storage problem, losing a large accumulated backlog is very painful, and some business systems are time-sensitive — messages that arrive too late are, by business rules, ignored.
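The simplest useful backlog check combines queue depth with consumption rate: alert when the queue is too deep, or when draining it would take longer than the business tolerates. The thresholds, names, and the drain-time heuristic are assumptions of this sketch, not from the article.

```java
// Backlog monitor: alert on absolute depth, stuck consumers, or a
// drain time that exceeds what the business can tolerate.
public class BacklogMonitor {
    /** True if depth exceeds the hard cap, or draining would take too long. */
    public static boolean shouldAlert(long depth, double consumeRatePerSec,
                                      long maxDepth, long maxDrainSeconds) {
        if (depth > maxDepth) return true;
        if (consumeRatePerSec <= 0) return depth > 0; // consumers are stuck
        return depth / consumeRatePerSec > maxDrainSeconds;
    }

    public static void main(String[] args) {
        // 50k messages consumed at 100/s -> 500s to drain; alert if cap is 300s
        System.out.println(shouldAlert(50_000, 100, 1_000_000, 300)); // true
    }
}
```

A heat-map topology is essentially this check applied per queue, with the drain time mapped to a color.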

8. The figure shows two MQ clusters, one internal and one external. The reasoning: permissions on the internal cluster can be relatively loose, but on the external cluster every Topic must be explicitly defined and maintained by designated owners, so Topics cannot be added or removed ad hoc and cause chaos. Hard isolation of internal and external messages also helps performance. Isolating internal and external MQ clusters in production is recommended.

Scheduled tasks

Requirements for scheduled tasks come in several types:

1. As mentioned earlier, cross-service calls and MQ notifications inevitably suffer unreachability, so we need a compensation mechanism.

2. Some businesses are driven by task tables; task-table design is described in detail below.

3. Some businesses are processed on a schedule and need no real-time handling at all (notifying users that a red packet is about to expire, end-of-day reconciliation with the bank, issuing statements to users, and so on). The difference from type 2 is that the execution times and frequencies here vary widely, whereas type 2 generally runs at a fixed frequency.

Let me explain what task-driven means. We create task tables in the database and use them to drive the core data-processing flow. This passive mode of operation is the most reliable — far more reliable than being driven by MQ or direct service calls — provided it is paired with load balancing, idempotent processing, and compensation all the way through. A task table can contain the following fields:

Auto-increment ID

Task type: the concrete type of task; alternatively, create a separate task table per type.

External order number: the unique order number of the associated external business entity.

Execution status: pending (waiting to be processed), processing (prevents the task being grabbed by another Job), success (terminal success), failure (temporary failure, will be retried), manual intervention (will never change again; requires human handling and an alert notification).

Retry count: tasks that have failed too many times can be classified as dead letters; after a dedicated dead-letter retry pass fails a few more times, raise an alert and intervene manually.

Processing history: a JSON list kept here for reference.

Last processing time: time of the last execution.

Last processing result: result of the last execution.

Creation time: maintained by the database.

Last modified time: maintained by the database.

Besides these fields, redundant copies of some business fields — order status, user ID, and so on — may be added. Task tables can be archived to keep data volume down. Since a task table acts as a message queue, we must monitor data backlog, imbalance between enqueue and dequeue rates, the appearance of dead-letter data, and so on. If a flow processes tasks in the order A, B, C, D, then because each task type has its own polling interval, this system wastes a little time compared with real-time chaining via MQ. But task processing usually means fetching data in batches and executing in parallel — unlike MQ's per-message processing — so overall throughput is generally comparable; only per-item latency differs. For businesses that value the passive stability of task-table-driven execution, this is a worthwhile option.
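The task-table row described above can be sketched as a small state machine. The field names mirror the list; the retry threshold of 5 and the class name are illustrative choices, and in production the claim would be an atomic conditional UPDATE rather than an in-memory check.

```java
// One row of the task table: status transitions plus the retry /
// dead-letter rule from the field list above.
public class Task {
    public enum Status { PENDING, PROCESSING, SUCCESS, FAILED, MANUAL }
    private static final int MAX_RETRIES = 5; // illustrative dead-letter threshold

    public final long id;
    public final String taskType;
    public final String externalOrderNo; // unique key for idempotent pickup
    public Status status = Status.PENDING;
    public int retryCount = 0;
    public String lastResult = "";

    public Task(long id, String taskType, String externalOrderNo) {
        this.id = id; this.taskType = taskType; this.externalOrderNo = externalOrderNo;
    }

    /** A Job claims the task; false if another Job already holds it. */
    public boolean claim() {
        if (status != Status.PENDING && status != Status.FAILED) return false;
        status = Status.PROCESSING;
        return true;
    }

    public void succeed(String result) { status = Status.SUCCESS; lastResult = result; }

    /** Failed attempts retry until the dead-letter threshold, then page a human. */
    public void fail(String reason) {
        lastResult = reason;
        retryCount++;
        status = retryCount >= MAX_RETRIES ? Status.MANUAL : Status.FAILED;
    }

    public static void main(String[] args) {
        Task t = new Task(1, "NOTIFY_BANK", "ORD-1");
        t.claim();
        for (int i = 0; i < 5; i++) { t.fail("timeout"); t.claim(); }
        System.out.println(t.status); // MANUAL after exhausting retries
    }
}
```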

A few more Job design principles:

1. Jobs can be driven by scheduling frameworks such as ElasticJob or Quartz. They should live in independent projects, not mixed into services, since multi-instance deployment of a service otherwise tends to cause duplicate runs. It is also not very hard to build your own task-scheduling framework, one that decides at execution time which machine runs each Job so cluster resources are used more sensibly. Bluntly, there are two forms: either the Job is deployed somewhere and the framework triggers it, or the code sits in one place and the framework launches the process.

2. The Job project is only a skin, with at most some configuration wiring. It holds no real business logic and never touches the database; in most cases it simply calls the API of a specific service. The Job project is responsible for configuration and frequency control.

3. Compensation-type Jobs must cap their retry count, to avoid the whole task queue being jammed by dead-letter data.
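The "Job is only a skin" principle can be sketched as follows: the scheduler periodically fires a trigger whose only job is to call a service API. ServiceApi, the batch size, and the period are assumptions of this sketch.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// A skinny Job: no business logic, no database — it only triggers a
// service method on a schedule.
public class SkinnyJob {
    interface ServiceApi { void processPendingTasks(int batchSize); }

    /** All real work lives in the service; the Job merely triggers it. */
    public static Runnable trigger(ServiceApi api, int batchSize) {
        return () -> api.processPendingTasks(batchSize);
    }

    /** Wire the trigger into any scheduler (here, a JDK one). */
    public static ScheduledFuture<?> schedule(ScheduledExecutorService pool,
                                              ServiceApi api, long periodSeconds) {
        return pool.scheduleAtFixedRate(trigger(api, 100), 0, periodSeconds,
                                        TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Runnable job = trigger(batch -> calls.incrementAndGet(), 100);
        job.run(); // what the scheduler would do every period
        System.out.println(calls.get()); // 1
    }
}
```

With ElasticJob or Quartz the `trigger` body becomes the job's execute method; the shape — skin calls service — stays the same.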

With the troika covered, let us finally lay out the module division of the whole project under this architecture:

Site:

front

console

app-gateway

Façade Service:

partnerinvestservice-api

partnerinvestservice-server

partnerinvestservice-listener

normalinvestservice-api

normalinvestservice-server

normalinvestservice-listener

reserveinvestservice-api

reserveinvestservice-server

reserveinvestservice-listener

autoinvestservice-api

autoinvestservice-server

autoinvestservice-listener

Business Service:

tradeservice-api

tradeservice-server

tradeservice-listener

loanservice-api

loanservice-server

loanservice-listener

userservice-api

userservice-server

projectservice-api

projectservice-server

accountservice-api

accountservice-server

accountservice-listener

activityservice-api

activityservice-server

activityservice-listener

vipservice-api

vipservice-server

vipservice-listener

Foundation Service:

bankservice-api

bankservice-server

digsignservice-api

digsignservice-server

messageservice-api

messageservice-server

Job:

scheduler-job

task-job

compensation-job

Each module is packaged as a separate artifact. The projects need not all share one workspace; they can be split into 20 projects, with each service's api + server + listener in one project. This is actually good for CI/CD; the disadvantage is having to open N projects when modifying code.

As I said at the start, this simple architecture leaves plenty of room to grow, and it is not much more complex or labor-intensive than an all-in-one architecture. You may disagree; in practice it depends on the team's accumulated experience. If everyone on the team knows this architecture and has worked with microservices for years, many of these concerns are handled naturally during coding. Design, in many ways, is a craft where practice makes perfect: do it enough and you know what goes where, how to split and how to merge, so the extra time cost is small. I believe this simple, practical troika architecture fits most Internet projects — though some projects will emphasize one carriage and de-emphasize another — and I hope this article is useful to you.
