
How to understand the process of system architecture evolution


This article explains the process of system architecture evolution. The explanation is simple and easy to follow, so let's walk through how a system's architecture evolves step by step.

One: the initial requirements

A few years ago, Xiaoming and Xiaopi started a business together to run an online supermarket. Xiaoming was in charge of program development, while Xiaopi was in charge of everything else. At the time, the Internet was still underdeveloped and online supermarkets were a blue ocean: as long as the features worked, the money came easily. So their requirements were very simple: a website hosted on the public network where users could browse and buy goods, plus a management backend for managing products, users, and order data.

Let's sort out the feature list:

Website
- User registration and login
- Product display
- Order placement

Management backend
- User management
- Product management
- Order management

Because the requirements were simple, Xiaoming whipped up the website in no time. For security reasons, the management backend was not built into the same application as the website, and Xiaoming put that second site together just as quickly. The overall architecture diagram is as follows:

Xiaoming waved his hand, found a cloud service provider to deploy on, and the website went live. After launch it received rave reviews and was deeply loved by homebodies of all kinds. Xiaoming and Xiaopi happily sat back and collected the money.

Two: as the business grows

The good times did not last long. Within a few days, online supermarkets of all kinds sprang up, hitting Xiaoming and Xiaopi's business hard.

Under competitive pressure, Xiaoming and Xiaopi decided to adopt some marketing tactics:

Run promotions, such as storewide discounts on New Year's Day, buy-two-get-one-free for Spring Festival, dog-food coupons for Valentine's Day, and so on.

Expand channels with mobile marketing. In addition to the website, they also need to develop a mobile app, WeChat Mini Programs, and so on.

Precision marketing. Use historical data to analyze users and provide personalized services.

……

These activities need development support. Xiaoming pulled his classmate Xiaohong into the team. Xiaohong is responsible for data analysis and mobile development, while Xiaoming is responsible for developing the promotion-related features.

Because the development tasks were urgent, Xiaoming and Xiaohong did not plan the architecture of the whole system properly. They made a snap decision to put promotion management and data analysis into the management backend and to build WeChat and the mobile app separately. After a few all-nighters, the new features and applications were more or less complete. The architecture diagram is as follows:

This stage has many problems:

The website and the mobile applications duplicate a lot of the same business logic.

Sometimes data is shared through the database and sometimes passed through interfaces, and the relationships between interface calls are messy.

To provide interfaces for other applications, the monolithic application keeps growing, accumulating a lot of logic that does not belong to it in the first place. Application boundaries are blurred and feature ownership is confused.

The management backend was designed with a lower reliability requirement. After the data analysis and promotion management features were added to it, performance bottlenecks appeared that affected the other applications.

The database table structure is depended on by multiple applications and cannot be refactored or optimized.

All applications operate on a single database, which becomes a performance bottleneck. In particular, database performance drops sharply whenever data analysis jobs run.

Development, testing, deployment, and maintenance become harder and harder. Even a small feature change requires releasing the whole application. Sometimes a release accidentally carries along untested code, or changing one feature breaks another unexpected place. To reduce the impact of release problems and of interrupting the online business, all applications are released at three or four in the morning, and after the release everyone has to stay up and watch the next day's traffic peak to verify that the application runs normally.

Buck-passing and bickering appear in the team. Which application a shared feature should be built in is often argued over for a long time, and in the end it is either built separately in each application, or dumped somewhere at random and left unmaintained.

Although there are many problems, the results of this stage cannot be denied: the system was built quickly in response to business changes. However, urgent and heavy tasks tend to push people into local, short-term thinking and compromise decisions. In this kind of architecture everyone cares only about their own patch, and overall, long-term design is missing. In the long run, building the system becomes harder and harder, and it can even fall into a cycle of being torn down and rebuilt over and over.

Three: it's time to make a change.

Fortunately, Xiaoming and Xiaohong are good young people with ideals and ambitions. After realizing the problem, they freed some of their energy from the trivial business requirements and began to sort out the overall architecture, preparing to tackle the problems.

To make changes, you first need enough energy and resources. If the demand side (business staff, project managers, the boss, and so on) is so focused on the progress of requirements that you cannot allocate extra energy and resources, you may not be able to do anything.

In the world of programming, the most important thing is the ability to abstract. The process of moving to microservices is really a process of abstraction. Xiaoming and Xiaohong sorted out the business logic of the online supermarket, abstracted out the shared business capabilities, and built several shared services:

User service

Product service

Promotion service

Order service

Data analysis service

Each application backend now only needs to fetch the data it needs from these services, which removes a great deal of redundant code and leaves only a thin control layer and the frontend. The architecture at this stage is as follows:

At this stage only the services are separated; the database is still shared, so the drawbacks of a siloed system remain:

The database becomes a performance bottleneck and there is a risk of a single point of failure.

Data management tends to become chaotic. Even with a good modular design at the start, over time there will always be cases where one service reaches directly into the database to grab another service's data.

A database table structure may be depended on by multiple services; touching one part affects everything, so it is hard to adjust.

If the shared-database pattern is kept forever, the whole architecture becomes more and more rigid and loses the point of the microservice architecture. So Xiaoming and Xiaohong gritted their teeth and split the database as well. All persistence layers are isolated from each other, and each service is responsible for its own. In addition, to improve real-time behavior, a message queue mechanism was added. The architecture is as follows:

After the complete split, each service can use heterogeneous technology. For example, the data analysis service can use a data warehouse as its persistence layer in order to do statistical calculations efficiently, and since the product and promotion services are accessed frequently, a cache mechanism is added for them.
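To make the cache mechanism concrete, here is a minimal cache-aside sketch in Go, assuming the product service uses Redis through the go-redis client. The Product fields, key naming, TTL, and loadProductFromDB stand-in are all invented for illustration; they are not part of the original design.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// Product is a simplified product record (hypothetical fields).
type Product struct {
	ID    string `json:"id"`
	Name  string `json:"name"`
	Price int64  `json:"price"`
}

var rdb = redis.NewClient(&redis.Options{Addr: "localhost:6379"})

// GetProduct reads through the cache: return the cached copy if present,
// otherwise load from the database and fill the cache with a TTL.
func GetProduct(ctx context.Context, id string) (*Product, error) {
	key := "product:" + id

	if raw, err := rdb.Get(ctx, key).Result(); err == nil {
		var p Product
		if json.Unmarshal([]byte(raw), &p) == nil {
			return &p, nil // cache hit
		}
	}

	p, err := loadProductFromDB(ctx, id) // hypothetical database query
	if err != nil {
		return nil, err
	}
	if raw, err := json.Marshal(p); err == nil {
		rdb.Set(ctx, key, raw, 5*time.Minute) // short TTL bounds staleness
	}
	return p, nil
}

// loadProductFromDB stands in for the product service's real persistence layer.
func loadProductFromDB(ctx context.Context, id string) (*Product, error) {
	return &Product{ID: id, Name: "demo product", Price: 100}, nil
}

func main() {
	p, err := GetProduct(context.Background(), "42")
	fmt.Println(p, err)
}
```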

Another way to abstract common logic is to turn it into a shared framework or library. This reduces the performance overhead of service calls, but the management cost of this approach is high, and it is hard to keep all applications on consistent versions.

Splitting the database also brings problems and challenges, such as the need for cross-database cascading queries and deciding the granularity of data access through services. But these problems can be solved with reasonable design, and on the whole the benefits of splitting the database outweigh the drawbacks.

The microservice architecture also has a non-technical benefit: it makes the division of labor and responsibility in the whole system clearer, and everyone is responsible for providing a better service to others. In the monolith era, shared business functions often had no clear owner. In the end, either everyone built their own copy, or a random person (usually the most capable or most enthusiastic one) built it into the application he happened to own. In the latter case, besides his own application, that person also ends up responsible for providing these shared functions to everyone else, functions that nobody is really in charge of, and he inexplicably takes the blame when they break, just because he is more capable or enthusiastic (a situation also known as "the capable do more work"). As a result, in the end nobody wants to provide shared functions. In the long run, team members drift into working independently and stop caring about the overall architecture.

Seen from this angle, adopting a microservice architecture also requires adjusting the organizational structure. A microservice transformation therefore needs the support of management.

After the transformation was complete, Xiaoming and Xiaohong each knew exactly what they were responsible for. The two were very satisfied, and everything was as beautiful and elegant as Maxwell's equations.

However.

Four: no silver bullets

Spring came, everything revived, and it was time for the annual shopping carnival. Watching the daily order count climb, Xiaopi, Xiaoming, and Xiaohong were all smiles. Unfortunately the good times did not last: joy turned to sorrow when, with a bang, the system went down.

In the past, with the monolithic application, troubleshooting usually meant looking at the logs and studying the error messages and call stacks. Under the microservice architecture the application is split into multiple services, which makes it very hard to locate the point of failure. Xiaoming checked the logs machine by machine and manually invoked the services one by one. After more than ten minutes of searching, he finally located the failure: the promotion service had stopped responding because it was receiving far too many requests. The other services called the promotion service directly or indirectly and went down with it. In a microservice architecture, the failure of one service can produce an avalanche effect that brings down the whole system. In fact, before the festival Xiaoming and Xiaohong had estimated the request volume and expected the server resources to be sufficient for the holiday traffic, so something had to be wrong. But the situation was urgent and every minute counted; with no time to dig into the cause, Xiaoming decided on the spot to spin up several new virtual machines in the cloud and deploy new promotion service nodes one by one. After a few minutes of work the system more or less returned to normal. It is estimated that hundreds of thousands in sales were lost during the outage, and the three of them felt their hearts bleed.

Afterwards, Xiaoming wrote a quick log analysis tool (the logs were so large that a text editor could barely open them, let alone be read by eye), went through the promotion service's access logs, and found that during the failure the product service, because of a code bug, had issued a huge number of requests to the promotion service in certain scenarios. The problem itself was not complicated; with a flick of his fingers, Xiaoming fixed this bug worth hundreds of thousands.

The problem was solved, but there is no guarantee that similar problems will not happen again. However perfect the microservice architecture may be in its logical design, it is like a magnificent palace built of blocks that cannot withstand a gust of wind. The microservice architecture solves the old problems, but it also introduces new ones:

The application is split into multiple services, so locating the point of failure is difficult.

Stability declines. A larger number of services raises the probability that at least one of them fails, and one service's failure can bring down the whole system. In fact, in a production scenario with heavy traffic, failures are always occurring.

The number of services is large, and the workload of deploying and managing them is heavy.

Development: how do you ensure that all services keep cooperating while each one is continuously evolving?

Testing: after the split, almost every feature involves multiple services. What used to be a test of a single program becomes a test of inter-service calls, and testing becomes more complex.

Xiaoming and Xiaohong learned from the painful experience and were determined to solve these problems. Failures are generally handled from two directions: on the one hand, reducing the probability of failure; on the other, reducing the impact when failures happen.

Five: monitoring - finding signs of failure

In high-concurrency, distributed scenarios, failures often erupt suddenly. Therefore a sound monitoring system must be built to spot the signs of failure as early as possible.

There are many components in a microservice architecture, and each needs different metrics monitored. For example, a Redis cache is monitored for memory usage and network traffic, a database for connection count and disk space, and a business service for concurrency, response latency, error rate, and so on. It is therefore unrealistic to build one all-encompassing monitoring system that understands every kind of component; its scalability would be poor. The usual practice is for each component to expose a metrics interface that reports its current status, with a consistent output format across components. A metrics collector component is then deployed that periodically pulls and stores the component states from these interfaces and provides a query service. Finally, a UI queries the metrics from the collector, draws monitoring dashboards, and raises alerts based on thresholds.

Most of these components do not need to be developed in-house; there are open-source ones available. Xiaoming downloaded Redis Exporter and MySQL Exporter, which provide metrics interfaces for the Redis cache and the MySQL database respectively. Each microservice implements its own custom metrics interface according to its business logic. Then Xiaoming set up Prometheus as the metrics collector and configured dashboards and email alerts in Grafana. With that, a microservice monitoring system was up:
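As an illustration of what a per-service metrics interface might look like, here is a minimal sketch assuming the service is written in Go and uses the official Prometheus client library; the metric names, labels, and port are invented for the example.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical business metrics for the promotion service.
var (
	requestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "promotion_requests_total",
		Help: "Total requests handled by the promotion service.",
	}, []string{"path", "status"})

	inFlight = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "promotion_in_flight_requests",
		Help: "Requests currently being processed.",
	})
)

func handler(w http.ResponseWriter, r *http.Request) {
	inFlight.Inc()
	defer inFlight.Dec()

	w.Write([]byte("ok"))
	requestsTotal.WithLabelValues(r.URL.Path, "200").Inc()
}

func main() {
	http.HandleFunc("/promotions", handler)
	// Prometheus scrapes this endpoint periodically.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```

Prometheus is then pointed at each service's /metrics endpoint in its scrape configuration, and the Grafana dashboards and alerts are built on top of the collected series.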

Six: locating problems - link tracing

In a microservice architecture, one user request often involves multiple internal service calls. To locate problems, we need to record, for each user request, how many service calls were made inside the microservices and what their call relationships were. This is called link tracing.

Let's use a link tracing example from the Istio documentation to see how it works:

Picture from: https://istio.io/zh/docs/tasks/telemetry/distributed-tracing/zipkin/

As you can see from the figure, this is a user request to the productpage page. During the request, the productpage service calls the interfaces of the details and reviews services in turn, and the reviews service calls the ratings interface while handling its part. The record of the entire trace forms a tree:

To implement link tracing, at least four pieces of data need to be recorded in the HTTP headers of every service call:

traceId: identifies a single user request's call chain. Calls with the same traceId belong to the same chain.

spanId: identifies a single service call, that is, a node in the trace.

parentId: the spanId of the parent node.

requestTime & responseTime: the request time and the response time.

In addition, components are needed to collect and store the call records, and UI components are needed to display the call chains.

The above is only a minimal explanation. The theoretical foundation of link tracing can be found in Google's Dapper paper.

After understanding the theory, Xiaoming chose Zipkin, an open-source implementation based on Dapper. Then, with a flick of the fingers, he wrote an HTTP request interceptor that generates these fields and injects them into the headers of every outgoing HTTP request, while asynchronously sending the call records to Zipkin's collector. As a side note, the HTTP interceptor can be implemented inside each microservice's code, or with a network proxy component (though then each microservice needs an extra layer of proxy).
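A minimal sketch of such an interceptor, written as a Go http.RoundTripper, might look like the following. The header names loosely follow Zipkin's B3 convention, the ID scheme is simplified, and the actual reporting to Zipkin's collector is only marked with a comment; treat it as an illustrative assumption rather than the article's original code.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"net/http"
	"time"
)

// tracingTransport wraps an http.RoundTripper and injects trace headers
// into every outgoing request, recording timing for the span.
type tracingTransport struct {
	base    http.RoundTripper
	traceID string // trace ID of the current user request, propagated downstream
	spanID  string // span ID of the current service call (acts as the parent)
}

func newID() string {
	b := make([]byte, 8)
	rand.Read(b)
	return hex.EncodeToString(b)
}

func (t *tracingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	childSpan := newID()
	// Header names loosely follow Zipkin's B3 convention (assumption).
	req.Header.Set("X-B3-TraceId", t.traceID)
	req.Header.Set("X-B3-SpanId", childSpan)
	req.Header.Set("X-B3-ParentSpanId", t.spanID)

	start := time.Now()
	resp, err := t.base.RoundTrip(req)
	elapsed := time.Since(start)

	// In a real interceptor, the span record (IDs, request time, response
	// time) would be sent asynchronously to Zipkin's collector here.
	_ = elapsed

	return resp, err
}

func main() {
	client := &http.Client{Transport: &tracingTransport{
		base:    http.DefaultTransport,
		traceID: newID(),
		spanID:  newID(),
	}}
	client.Get("http://promotion-service/coupons") // hypothetical downstream call
}
```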

Link tracing can only tell you which service has a problem; it cannot provide the specific error information. The ability to find specific error messages comes from the log analysis component.

Seven: analyzing problems - log analysis

Log analysis components were in wide use even before the rise of microservices. Even with a monolithic application, as traffic grows or the number of servers grows, log files swell to a size a text editor can hardly open, and worse, they are scattered across multiple servers. To troubleshoot a problem you have to log in to each server, fetch the log files, and slowly open and search each one for the information you want.

Therefore, as the application grows, we need a log "search engine" so that we can find the log entries we want precisely. We also need components on the data-source side to collect the logs and UI components to display the results:

Xiaoming investigated the options and adopted the well-known ELK stack. ELK is an acronym for Elasticsearch, Logstash, and Kibana.

Elasticsearch: the search engine, which also stores the logs.

Logstash: log collector, which receives log input, performs some pre-processing of the log, and then outputs it to Elasticsearch.

Kibana: the UI component, which queries the data through Elasticsearch's API and presents it to the user.

Finally, there is the small question of how to get logs to Logstash. One scheme is to call the Logstash API directly whenever a log entry is written, but that would mean modifying the code yet again... So Xiaoming chose another scheme: logs are still written to files, and an agent deployed alongside each service scans the log files and forwards new entries to Logstash.
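A rough sketch of such an agent: it tails a single log file and forwards complete lines to a Logstash TCP input. The file path, address, and single-file scope are simplifying assumptions; a real agent (Filebeat, for example) also handles log rotation, backpressure, and multiple files.

```go
package main

import (
	"bufio"
	"io"
	"net"
	"os"
	"time"
)

func main() {
	// Assumed locations; a real deployment would make these configurable.
	const logPath = "/var/log/order-service/app.log"
	const logstashAddr = "logstash:5000" // Logstash TCP input

	conn, err := net.Dial("tcp", logstashAddr)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	f, err := os.Open(logPath)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	f.Seek(0, io.SeekEnd) // start tailing from the end of the file

	reader := bufio.NewReader(f)
	var pending string
	for {
		chunk, err := reader.ReadString('\n')
		pending += chunk
		if err == io.EOF {
			time.Sleep(500 * time.Millisecond) // wait for new log lines
			continue
		}
		if err != nil {
			panic(err)
		}
		conn.Write([]byte(pending)) // forward the complete line to Logstash
		pending = ""
	}
}
```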

Eight: gateway - access control and service governance

After the split into microservices there are a large number of services and interfaces, and the call relationships become messy. During development it is easy to suddenly forget which service a certain piece of data should be fetched from, or to carelessly call a service that should not be called, so that a function meant to be read-only ends up modifying data.

To handle these situations, microservice calls need a gatekeeper: the gateway. A layer of gateway is added between callers and callees, and permissions are checked on every call. The gateway can also serve as a platform for publishing service interface documentation.

One question when using a gateway is what granularity to use. The coarsest-grained scheme is a single gateway for the whole microservice system: access from outside goes through the gateway, while calls inside the system are made directly. The finest-grained scheme is that every call, whether internal or external, must go through the gateway. A compromise is to divide the microservices into several zones by business domain: calls within a zone are made directly, and calls between zones go through the gateway.

Since the online supermarket does not have that many services, Xiaoming adopted the coarsest-grained scheme:
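To make the idea concrete, here is a minimal gateway sketch in Go: a reverse proxy that checks a caller credential before forwarding to the internal services. The route table, header-based check, and service addresses are illustrative assumptions, not the article's actual setup.

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// Hypothetical route table: URL prefix -> internal service address.
var routes = map[string]string{
	"/products/":   "http://product-service:8080",
	"/orders/":     "http://order-service:8080",
	"/promotions/": "http://promotion-service:8080",
}

// allowed is a stand-in for real access control, e.g. checking an API key
// or token against the caller's permissions.
func allowed(r *http.Request) bool {
	return r.Header.Get("X-Api-Key") != ""
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if !allowed(r) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		for prefix, backend := range routes {
			if strings.HasPrefix(r.URL.Path, prefix) {
				target, _ := url.Parse(backend)
				httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
				return
			}
		}
		http.NotFound(w, r)
	})
	http.ListenAndServe(":9000", nil)
}
```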

Nine: service registration and discovery - dynamic scaling

The components above are all aimed at reducing the probability of failure. However, failures will always happen, so the other thing to study is how to reduce the impact of a failure.

The crudest (and most commonly used) failure-handling strategy is redundancy. In general, a service is deployed as multiple instances, which share the load and improve performance, and even if one instance goes down, the others can still respond.

One question with redundancy is how many redundant instances to use. There is no fixed answer: the required number varies with the service's function and the time period. On ordinary weekdays four instances may be enough, while during a promotion, traffic surges and forty may be needed. So the redundancy count is not a fixed value but is adjusted in real time as needed.

Generally speaking, the operations to add new instances are:

Deploy a new instance

Register the new instance with the load balancer or DNS

There are only two steps, but if registering with the load balancer or DNS is a manual operation, it is not so simple. Imagine typing in forty IP addresses by hand after adding forty new instances.

The solution to this problem is automatic service registration and discovery. First, a service discovery service is deployed, which provides the address information of all registered services (DNS is also a kind of service discovery service). Each application service registers itself with the service discovery service automatically at startup, and after starting it also keeps a local copy of the other services' address lists, synchronized from the service discovery service in real time (that is, periodically). The service discovery service also checks the health of application services regularly and removes the addresses of unhealthy instances. This way, to add an instance you only need to deploy it, and when an instance goes offline you can simply shut it down; service discovery automatically picks up the additions and removals.

Service discovery also works together with client-side load balancing. Since an application service already has the address list synced locally, it can decide its own load-balancing policy when calling a microservice. Metadata (such as the service version) can even be attached at registration time, and the client-side load balancer can route traffic based on that metadata to implement features such as A/B testing and blue-green deployment.

There are many service discovery components to choose from, such as ZooKeeper, Eureka, Consul, and etcd. But Xiaoming felt he was up to it and wanted to show off, so he wrote his own based on Redis.
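A rough sketch of the idea behind a Redis-backed registry follows. The key naming, TTL, heartbeat interval, and use of KEYS are all simplifying assumptions, which is also a hint at why a ready-made component is usually the safer choice.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

var rdb = redis.NewClient(&redis.Options{Addr: "localhost:6379"})

// Register writes this instance's address under a per-service key with a TTL
// and keeps refreshing it as a heartbeat. If the instance dies, the key
// expires and the instance disappears from discovery automatically.
func Register(ctx context.Context, service, addr string) {
	key := fmt.Sprintf("services:%s:%s", service, addr)
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		rdb.Set(ctx, key, addr, 15*time.Second) // TTL = three missed heartbeats
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// Discover returns the currently registered addresses of a service.
// A real client would cache this list locally and refresh it periodically.
func Discover(ctx context.Context, service string) ([]string, error) {
	keys, err := rdb.Keys(ctx, "services:"+service+":*").Result()
	if err != nil {
		return nil, err
	}
	addrs := make([]string, 0, len(keys))
	for _, k := range keys {
		if addr, err := rdb.Get(ctx, k).Result(); err == nil {
			addrs = append(addrs, addr)
		}
	}
	return addrs, nil
}

func main() {
	ctx := context.Background()
	go Register(ctx, "promotion", "10.0.0.12:8080")

	time.Sleep(time.Second)
	addrs, _ := Discover(ctx, "promotion")
	fmt.Println("promotion instances:", addrs) // pick one with a load policy
}
```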

Ten: circuit breaking, service degradation, rate limiting

Circuit breaking

When a service stops responding for whatever reason, the caller usually waits for a while and then times out or receives an error. If the call chain is long, requests can pile up and the whole chain ends up holding resources while waiting for the downstream response. So when calls to a service fail repeatedly, the circuit should break: mark the service as down and return an error immediately, and only re-establish the connection after the service recovers.

The picture is from Micro Service Design.
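A minimal circuit-breaker sketch is shown below; the failure threshold and cool-down period are arbitrary illustration values, and mature libraries add half-open probing, sliding windows, and metrics on top of this basic idea.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// Breaker trips after a number of consecutive failures and rejects calls
// outright until a cool-down period has passed.
type Breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	openUntil   time.Time
	cooldown    time.Duration
}

var ErrOpen = errors.New("circuit breaker is open")

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if time.Now().Before(b.openUntil) {
		b.mu.Unlock()
		return ErrOpen // fail fast instead of waiting on a dead service
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openUntil = time.Now().Add(b.cooldown) // trip the breaker
			b.failures = 0
		}
		return err
	}
	b.failures = 0 // a success resets the count
	return nil
}

func main() {
	b := NewBreaker(3, 10*time.Second)
	callPromotion := func() error { return errors.New("promotion service timeout") }

	for i := 0; i < 5; i++ {
		if err := b.Call(callPromotion); err != nil {
			// After three failures, subsequent calls return ErrOpen immediately.
			println(err.Error())
		}
	}
}
```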

Service degradation

When a downstream service stops working and it is not core business, the upstream service should degrade to ensure that the core business is not interrupted. For example, the order interface of the online supermarket has a feature that recommends goods while placing an order; if the recommendation module dies, the ordering function should not go down with it, we only need to temporarily turn off the recommendation feature.
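The degradation in this recommendation example can be as simple as a fallback path around the call; the client function and timeout below are hypothetical stand-ins rather than the article's actual code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchRecommendations stands in for the call to the recommendation service.
func fetchRecommendations(ctx context.Context, userID string) ([]string, error) {
	return nil, errors.New("recommendation service unavailable")
}

// recommendationsOrEmpty degrades gracefully: if the recommendation service
// is slow or down, the order page simply shows no recommendations instead of
// failing the whole order flow.
func recommendationsOrEmpty(userID string) []string {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()

	recs, err := fetchRecommendations(ctx, userID)
	if err != nil {
		return nil // degraded: the core order flow continues without recommendations
	}
	return recs
}

func main() {
	fmt.Println("recommended items:", recommendationsOrEmpty("user-42"))
}
```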

Rate limiting

After a service goes down, upstream services or users will habitually retry. As a result, once the service comes back up it is likely to be knocked down again immediately by the flood of traffic, dying and reviving over and over. So a service needs to be able to protect itself: rate limiting. There are many rate-limiting strategies; the simplest is to drop the excess requests when there are too many in a unit of time. Partitioned rate limiting can also be considered, rejecting only the requests from the services that generate excessive traffic. For example, both the product service and the order service call the promotion service; if the product service floods it with requests because of a code bug, the promotion service throttles only the requests from the product service and responds normally to the order service.
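Here is a sketch of this partitioned rate limiting, using token buckets from golang.org/x/time/rate keyed by the calling service; the caller header and the per-caller limits are assumptions for illustration.

```go
package main

import (
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

// One token bucket per calling service; unknown callers share a default bucket.
var (
	mu       sync.Mutex
	limiters = map[string]*rate.Limiter{
		"product-service": rate.NewLimiter(50, 100), // throttled harder
		"order-service":   rate.NewLimiter(500, 1000),
	}
	defaultLimiter = rate.NewLimiter(100, 200)
)

func limiterFor(caller string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	if l, ok := limiters[caller]; ok {
		return l
	}
	return defaultLimiter
}

// rateLimit rejects excess requests per caller rather than globally,
// so one misbehaving upstream does not starve everyone else.
func rateLimit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		caller := r.Header.Get("X-Caller-Service") // assumed to be set by the caller or gateway
		if !limiterFor(caller).Allow() {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	promo := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("promotions"))
	})
	http.ListenAndServe(":8080", rateLimit(promo))
}
```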

Eleven: testing

Under the micro-service architecture, testing is divided into three levels:

End-to-end testing: covers the whole system, usually by simulating users through the interface.

Service testing: testing against the service interface.

Unit testing: tests against code units.

From top to bottom, the three kinds of tests become easier to implement, but their coverage decreases. End-to-end tests are the most time-consuming and laborious, but passing them gives us the most confidence in the system. Unit tests are the easiest to implement and the most efficient, but passing them cannot guarantee that the system as a whole is problem-free.

Because end-to-end tests are hard to implement, we generally only write them for core functions. When an end-to-end test fails, it needs to be broken down into unit tests: analyze the cause of the failure, then write unit tests that reproduce the problem so the same error can be caught more quickly next time.

The difficulty of service testing is that services often depend on other services. This problem can be solved with a mock server:
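As a small example of what a service test against a mock server might look like in Go (placed in a _test.go file), assuming the promotion service exposes a /discount endpoint; the API shape here is invented for illustration.

```go
package order

import (
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

// TestOrderUsesPromotion tests the order service's integration logic against a
// mock promotion service instead of the real one.
func TestOrderUsesPromotion(t *testing.T) {
	// Mock server pretending to be the promotion service (assumed API shape).
	promo := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/discount" {
			http.NotFound(w, r)
			return
		}
		io.WriteString(w, `{"discount": 0.9}`)
	}))
	defer promo.Close()

	// The code under test would be configured to call promo.URL instead of
	// the real promotion service; here we only check the mock wiring.
	resp, err := http.Get(promo.URL + "/discount")
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	if string(body) != `{"discount": 0.9}` {
		t.Fatalf("unexpected response: %s", body)
	}
}
```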

Unit testing is familiar to everyone. We generally write a large number of unit tests (including regression tests) to cover as much code as possible.

Twelve: microservice framework

Components such as the metrics interface, link tracing injection, log collection, service registration and discovery, routing rules, circuit breakers, and rate limiters all require adding some integration code to each application service. Having every service implement this on its own is very time-consuming and laborious. Following the DRY principle, Xiaoming developed a microservice framework that pulls the integration code for these components, along with other common code, into the framework, and all application services are developed on top of it.

Many custom capabilities can be implemented in such a framework. You can even inject program call-stack information into link tracing to achieve code-level tracing, or report thread pool and connection pool status to monitor the low-level state of a service in real time.

Using a unified microservice framework has one serious problem: the cost of upgrading the framework is high. Every framework upgrade requires all application services to upgrade along with it. Of course, a compatibility scheme is usually adopted, allowing a transition period during which old and new versions run in parallel while all application services upgrade. But if there are many application services, the upgrade period can be very long, and some very stable, rarely updated services have owners who may refuse to upgrade at all. Therefore, using a unified microservice framework requires sound version management and development management practices.

Thirteen: another way - Service Mesh

Another way to abstract the common code is to push it straight into a reverse proxy component. An instance of this proxy is deployed alongside each service, and all inbound and outbound traffic is processed and forwarded through it. This component is called a sidecar.

The sidecar does not incur extra network cost: it is deployed on the same host as the microservice node and shares the same virtual network interface, so communication between the sidecar and the microservice node is really just a memory copy.

The sidecar is only responsible for network communication; a component is also needed to manage the configuration of all sidecars centrally. In Service Mesh, the part responsible for network communication is called the data plane, and the part responsible for configuration management is called the control plane. Together, the data plane and the control plane form the basic architecture of Service Mesh.

Compared with the microservice framework, the advantage of Service Mesh is that it does not intrude into the code and is easier to upgrade and maintain. It is often criticized for performance: even though the loopback network does not generate real network requests, there is still the extra cost of memory copies, and some centralized traffic processing can also hurt performance.

The end is also the beginning

Microservices are not the end point of architectural evolution. Going finer, there are Serverless, FaaS, and other directions. On the other hand, some people are chanting that what is long divided must unite, and are rediscovering the monolithic architecture.

Thank you for reading. That is the process of system architecture evolution; after studying this article, I believe you have a deeper understanding of it. How to apply it still needs to be verified in practice.
