How to analyze the technology selection and design of container-based micro-service architecture

2025-04-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

How should the technology selection and design of a container-based microservice architecture be analyzed? This article walks through the analysis and the resulting solution in detail, in the hope of giving readers who face the same problem a simpler and more practical approach.

Background introduction

As a financial enterprise, UBS fund has for many years focused its IT work on operations and maintenance, with its main business systems largely outsourced. As the business grew, however, the business departments' customized requirements kept accumulating, and outsourcing alone could no longer meet the needs of business staff. At the end of 2016 the company began to build its own development team and, at the same time, selected and designed the architecture of the company's business development platform, aiming to unify the development platform, improve R&D efficiency, and speed up the handling of the business departments' requirements.

Below we share some of the experience behind the selection and design of the platform architecture, and the gradual improvement of the platform and its subsystems over the past two years.

Applicable object

The architecture is built entirely on open source platforms. After more than three years in production, the platform runs smoothly, is highly scalable and highly available, and meets the company's needs for the continued development of its financial business. It may therefore also serve as a reference for architecture selection at similar small and medium-sized enterprises.

Note: UFOS: fund operation system of UBS

Architecture design

Factors considered in architecture design and selection

In the initial platform architecture design and selection, we sorted out the key factors to be considered in the technical architecture selection according to the requirements of the existing business system:

The forward-looking or advanced nature of the architecture platform

The platform should be in line with current and future technology trends, with a healthy ecosystem and strong vitality, so that a poor choice of platform architecture does not force a rebuild later, with the large amount of migration and refactoring work that entails. The platform architecture needs to stay technically ahead for a period of time.

There are two points to pay attention to here. The first is technological maturity: adopting cutting-edge technology brings risk along with its benefits, much like the relationship between return and risk in financial investment, so we need to strike a balance between how advanced the system is and how stable it is. The second is that cutting-edge technology may be harder to work with: there may be little relevant material available in China, or it may be difficult to find successful cases to learn from. In many cases technical information can only be obtained from official documentation and forums, which puts timely delivery of the architecture at risk.

Scalability of the platform

The platform must be able to meet the needs of the fund industry's continuous business development and innovation, and should scale out horizontally and smoothly as far as possible. Meeting these requirements effectively dictates a distributed architecture; ideally, of course, a cloud native one.

Reliability and availability of the system

As a financial business system platform, it must keep the business systems running continuously and remain highly available. The automatic recovery capability of the cluster or platform is used to ensure that local errors do not affect the operation of the system as a whole. There are two levels here. First, the functional components of a business system are isolated from each other, so that the unavailability of one component does not affect the rest of the system. Second, the platform's base systems use a cluster architecture with automatic recovery, so that even if a node fails, the services on that node can be switched over and recovered within a very short time.

Cost

Different architecture / technology choices have different development costs, including the technology framework and the learning cost of the platform. We expect the platform to support heterogeneous technologies, so that developers can use a more suitable technology stack to quickly achieve the development of business functions.

Integrated development and operations (DevOps)

Consider operations and maintenance during design, so as to minimize operational complexity later on and reduce the long-term operations burden of the business systems.

Let developers focus on the functional requirements of the business. Non-functional requirements such as load balancing and high availability should be provided by the platform as far as possible and be transparent to developers, which improves development efficiency.

When a company is small and does not have the strength to implement some or all of its architecture itself, assembling the architecture from ready-made "wheels" is the natural choice. When choosing, it pays to think about how to meet business needs with more "standard" wheels, so that the business can be upgraded and expanded more easily in the future.

The scalability and high availability described above generally cannot be achieved without a distributed architecture, and a distributed architecture in turn generally relies on services as its building blocks.

Evolution based on service architecture

Service-based architecture design has been around for a long time. RPC-based service invocation, for example, can be traced back to CORBA and to Tuxedo, the early BEA framework that many financial companies used in their trading systems (where the main programming language was C/C++). More recent arrivals include Facebook's Thrift, Google's Protocol Buffers / gRPC, Alibaba's Dubbo framework, and so on. These frameworks support binary encoding of messages (serialization and deserialization) and are efficient, so they became the first choice for applications with demanding network transmission and concurrency requirements, such as mobile apps, games, and trading software.
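To make the efficiency argument for binary encoding concrete, here is a minimal sketch that serializes the same message once as fixed-width binary and once as JSON text. The order message and its three fields are hypothetical, and the byte layout is an assumption chosen for illustration; it is not the actual wire format of Thrift, Protocol Buffers, or Dubbo.

```python
import json
import struct

# Hypothetical message layout (an assumption for illustration only):
# order_id as unsigned 64-bit, price_cents and quantity as unsigned 32-bit.
# "!" selects network byte order with standard sizes and no padding.
ORDER_FORMAT = "!QII"


def encode_binary(order_id: int, price_cents: int, quantity: int) -> bytes:
    """Serialize the message into a fixed 16-byte binary payload."""
    return struct.pack(ORDER_FORMAT, order_id, price_cents, quantity)


def decode_binary(payload: bytes) -> tuple:
    """Deserialize the binary payload back into (order_id, price_cents, quantity)."""
    return struct.unpack(ORDER_FORMAT, payload)


def encode_json(order_id: int, price_cents: int, quantity: int) -> bytes:
    """Serialize the same message as UTF-8 JSON text for comparison."""
    message = {"order_id": order_id, "price_cents": price_cents, "quantity": quantity}
    return json.dumps(message).encode("utf-8")


if __name__ == "__main__":
    binary_payload = encode_binary(1234567890, 10250, 300)
    json_payload = encode_json(1234567890, 10250, 300)
    print(len(binary_payload), "bytes as binary")
    print(len(json_payload), "bytes as JSON")
```

The binary form is smaller and cheaper to parse, at the cost of both sides having to agree on the layout in advance, which is the role an interface definition plays in the frameworks named above.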

Later, with the wide adoption of the HTTP protocol, service-oriented architecture (SOA) emerged. It is generally used in complex, large-scale projects: to reuse functionality across heterogeneous systems, or for performance reasons, functional modules are separated into services that can be deployed in a distributed fashion and that call each other over the network through standard software interfaces. To standardize service invocation, SOA often introduces the concept of a data bus, through which services can be registered, looked up, and scheduled.
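The register and look-up role of the bus can be sketched in a few lines. This is only a toy in-process model under stated assumptions: the `ServiceBus` class, its method names, and the `fund-quote` service are all hypothetical, and a real bus would also handle transport, message formats, and scheduling.

```python
from typing import Any, Callable, Dict


class ServiceBus:
    """Toy in-process stand-in for an SOA bus: services register by name
    and callers reach them through the bus instead of calling them directly."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, handler: Callable[[Any], Any]) -> None:
        """Publish a service under a well-known name."""
        self._services[name] = handler

    def lookup(self, name: str) -> Callable[[Any], Any]:
        """Find a registered service; raises KeyError if it is unknown."""
        return self._services[name]

    def call(self, name: str, request: Any) -> Any:
        """Route a request to the named service and return its response."""
        return self.lookup(name)(request)


if __name__ == "__main__":
    bus = ServiceBus()
    # Hypothetical service: returns a canned quote for a fund code.
    bus.register("fund-quote", lambda code: {"code": code, "nav": 1.0})
    print(bus.call("fund-quote", "000001"))
```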

The services in SOA architecture are loosely coupled, and the granularity of services is relatively coarse, while the micro-services that have emerged in recent years can be regarded as a simplified, refined, or lightweight version of SOA services.

Micro service

When talking about microservices, people usually contrast them with monolithic applications. A monolithic application is one that packs too many functions into a single service, which is quite similar to the notion of a monolithic class (a class that implements too many functions) in object-oriented design. English uses the same term, "monolithic", to describe both.

If you compare microservices and classes carefully, you will find many similarities. For example, they share the same design principles: high cohesion (encapsulation) and loose coupling. High cohesion means being responsible for only one task, that is, the single responsibility principle. Loose coupling means keeping the interfaces between modules as simple as possible to reduce coupling, which also makes it easier to develop, deploy, and upgrade microservices independently.

Figure 1: Docker Swarm Service Discovery and load balancing

HTTP reverse proxy / Service Gateway

In addition to internal invocation and communication between microservices, microservices have to be exposed in some way before they can be accessed by external systems (such as Web applications, mobile applications, etc.). This involves the front-end routing of services, which is a channel connecting internal microservices and external application systems.

Tools such as HAProxy and Nginx can also act as an HTTP reverse proxy, but because of the following features, the open source HTTP reverse proxy and load balancer Traefik became our final choice:

Traefik is better suited to scenarios that require service discovery and service registration. It supports automatic discovery for a variety of back ends, such as Docker, Swarm, Kubernetes, and Consul. It can also watch back-end services for changes and update its configuration automatically in real time.

Rate limiting and automatic circuit breaking are supported.

Configuration hot updates are supported.
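For readers unfamiliar with circuit breaking, the following is a minimal sketch of the general pattern: after a number of consecutive failures the breaker rejects calls outright for a while, then lets a trial call through. It is not Traefik's implementation or configuration; the class name, the thresholds, and the injectable `clock` parameter are assumptions made for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the backend."""


class CircuitBreaker:
    """Generic circuit breaker sketch (not Traefik's actual implementation)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self._clock = clock                         # injectable so it can be tested
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; request rejected")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast in this way keeps a struggling back-end service from being flooded with requests while it recovers.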

It can be said that Traefik is very suitable for containerized micro-services. Adopting Traefik can bring the following benefits:

Service reverse routing: Traefik routes external requests to specific internal microservices. Although the platform contains a complex distributed microservice architecture, what an external system sees through the proxy looks like a single, complete service. This hides the complexity of the back-end services (similar to the Facade pattern) as well as their upgrades and changes.

Easier security control: external applications access the back-end microservices uniformly through the proxy, while the proxy reaches the microservices over the container-internal network. The microservices therefore do not have to expose ports outside their containers, and external applications cannot access them directly but must go through the Traefik proxy. The proxy holds the registration information of the microservices and can route each request to the container at the right IP and port according to the microservice name. Our security policy therefore only needs to focus on controlling the Traefik proxy.

Metrics in a variety of formats: Traefik exposes data such as request volume, call latency, and error counts, for example in the Prometheus format that we use, which supports performance tuning and capacity expansion of the back end.
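The reverse routing idea in the first benefit can be illustrated with a toy routing table that maps a request path to one of several registered back-end instances. This is a simplified model, not how Traefik is implemented or configured; the class, the path prefixes, and the addresses are hypothetical.

```python
import itertools
from typing import Dict, Iterable, Iterator, Optional


class ReverseProxyTable:
    """Toy routing table: path prefix -> round-robin over backend addresses."""

    def __init__(self) -> None:
        self._routes: Dict[str, Iterator[str]] = {}

    def register(self, prefix: str, backends: Iterable[str]) -> None:
        """Add or replace a route, e.g. when a service is discovered or rescaled."""
        self._routes[prefix] = itertools.cycle(list(backends))

    def deregister(self, prefix: str) -> None:
        """Remove a route when its service disappears."""
        self._routes.pop(prefix, None)

    def resolve(self, path: str) -> Optional[str]:
        """Pick a backend for the request path; the longest matching prefix wins."""
        matches = [prefix for prefix in self._routes if path.startswith(prefix)]
        if not matches:
            return None
        return next(self._routes[max(matches, key=len)])


if __name__ == "__main__":
    table = ReverseProxyTable()
    # Hypothetical internal addresses, reachable only on the container network.
    table.register("/accounts", ["10.0.0.2:8080", "10.0.0.3:8080"])
    for _ in range(3):
        print(table.resolve("/accounts/42"))
```

External callers only ever see the proxy's address; the internal addresses can change as containers move, as long as the table is kept up to date by service discovery.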

Figure 3: log subsystem

For logging we chose the popular open source ELK stack. Logs are usually written to the remote Elasticsearch in one of two ways. One is through a logging agent, such as the efficient Beats tools from Elastic; Beats can be deployed alongside business services, which suits third-party services (no source code available) or services whose development language has no standard logging component. The other is to write logs directly to a remote log service such as Logstash through the logging library's SocketAppender, which many standard logging components support, for example the standard Java logging libraries Log4j and Logback. This approach also suits microservices deployed in containers, since no additional logging tool needs to be deployed.

In our microservice platform we chose the high-performance Logback together with the matching Logstash output plug-in. Through this plug-in (acting as a proxy), Logback sends logs directly to the Logstash service over a socket. This requires no code changes, only simple configuration in a configuration file, and is completely transparent to the application microservices that write logs.

To make later log searches and the display of log data in Kibana easier, we need to standardize the log format so that the key information in each log entry is stored in Elasticsearch as key-value pairs. Standardization involves encoding and decoding the log text on both the application side and the Logstash side; the Logstash service can be configured to map and filter messages.
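What a standardized, key-value log entry looks like on the application side can be sketched as follows. The platform described here does this with Logback on the Java side; the Python formatter below is only a language-neutral illustration, and the field names (`timestamp`, `level`, `logger`, `service`, `message`) and the service name are assumptions rather than the platform's actual schema.

```python
import json
import logging


class KeyValueJsonFormatter(logging.Formatter):
    """Render each log record as one JSON object so that a log pipeline can
    index the fields individually instead of parsing free text."""

    def __init__(self, service_name: str) -> None:
        super().__init__()
        self.service_name = service_name

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "service": self.service_name,  # hypothetical field identifying the microservice
            "message": record.getMessage(),
        }
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(KeyValueJsonFormatter(service_name="demo-service"))
    logger = logging.getLogger("ufos.demo")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("order %s accepted", "A1")
```

Because every entry carries the same keys, searches in Kibana can filter on a field such as the service name or level rather than relying on text matching.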

If the log volume is large, a message buffer needs to be added between the log output and Logstash. Kafka is a high-throughput messaging system suited to this role, and Log4j2 has an Appender that writes directly to Kafka.

Monitoring subsystem

The monitoring system is an important part of platform service governance; an application system without monitoring is effectively running naked. Our original business platform already had a traditional monitoring system, Netgain, but it mainly monitors infrastructure and lacks real monitoring of the internal state of application systems, such as support for microservices and containers, so it cannot meet the needs of the UFOS microservice platform.

Prometheus was the second open source project to graduate from CNCF (the first was the container orchestration project Kubernetes; Prometheus was originally inspired by Borgmon, Google's internal monitoring system). It monitors services and containers well. Besides integrating seamlessly with Kubernetes, it also integrates well with Swarm; in particular, with labels and the global deployment option in Docker Swarm, it is very convenient to deploy remote application monitoring agents (exporters).

Because Prometheus is an open monitoring platform, a large number of official and third-party monitoring agents (exporters) are available. An exporter helps a third-party service that does not support the Prometheus data collection API to expose its own monitoring data. The following monitoring agents are mainly used in UFOS:

Figure 4: monitoring subsystem architecture diagram

Prometheus provides client libraries in a variety of languages, such as the official Java, Python, and Go libraries as well as third-party ones. With these libraries you can easily instrument your microservice with monitoring metrics. The metrics are exposed through the microservice's Web interface, from which the Prometheus server pulls them; a batch task can instead push the metrics it generates to the PushGateway service, which holds them until Prometheus pulls them. This makes it convenient for us to monitor business data.
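To show what Prometheus actually pulls from a microservice, here is a standard-library-only sketch that keeps a counter and renders it in the Prometheus text exposition format. In a real service you would normally use an official client library rather than hand-rolling this; the `SimpleCounter` class and the metric name `ufos_orders_total` are hypothetical, and label-value escaping is deliberately omitted.

```python
from typing import Dict, Tuple


class SimpleCounter:
    """Minimal counter that renders itself in the Prometheus text format.
    Illustration only: no escaping of label values, no thread safety."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: Dict[Tuple[Tuple[str, str], ...], float] = {}

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        """Increase the counter for one combination of label values."""
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        """Produce the body a /metrics endpoint would return for this metric."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = "{" + label_text + "}" if label_text else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    orders = SimpleCounter("ufos_orders_total", "Orders processed by the service.")
    orders.inc(status="ok")
    orders.inc(status="ok")
    orders.inc(status="rejected")
    print(orders.render(), end="")
```

A long-running service would serve this text over HTTP for the Prometheus server to scrape, while a batch job would push the same kind of data to the PushGateway before exiting.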

The monitoring interface is built with Grafana, an open source chart visualization system that supports a variety of time series databases such as InfluxDB and, of course, Prometheus. Grafana has a wealth of graphical display components, and its official website provides a large number of ready-made templates. UFOS uses it to monitor and display Swarm nodes, microservices, databases, alarms, and other resources.

High availability design

To keep the business stably available, the platform should be continuously available without unexplained downtime. There are two ways to get there. One is to detect and locate faults quickly through the monitoring mechanism, so that a problem can be solved before system users notice it. The other is to design the system to detect faults and fail over automatically, for example by using master/standby or cluster redundancy to avoid single points of failure. Here we focus on the latter and briefly describe the design choices that improve the platform's availability.

Access layer

The UFOS running platform is based on the Linux system, and the entrance of the platform is the HTTP reverse proxy Traefik. In order to achieve the high availability of the portal, we must ensure the redundant backup of Traefik.

Traefik itself supports cluster-based HA built on a configured K/V store, for which Consul is officially recommended. However, because our service platform is based on a Swarm cluster, Traefik runs as a Swarm service (restricted to Swarm Manager nodes) and can read the information about service instances running in Swarm through the Swarm Manager nodes. Swarm Managers, in turn, exchange information in real time through the Raft algorithm, so the service instance information obtained by multiple independent Traefik instances is up to date and identical across instances. We therefore do not need the K/V store described in the official guidelines to achieve high availability for Traefik.

To give Traefik automatic failover, we designed a VIP-based Linux cluster for the Swarm Manager nodes that run Traefik replica instances, using Pacemaker and Corosync: Corosync detects whether communication between nodes is normal, and Pacemaker manages the cluster resources. When a failure of any node in the Linux cluster is detected, the VIP automatically moves to another healthy node, and the entry point switches to the Traefik instance running on that node, which keeps the HTTP access proxy available.

Application service layer

All microservices are run as Swarm services on the Swarm container platform, and the high availability of microservices is provided by Swarm. The Swarm container orchestration system itself supports high availability. Three Manager nodes (which can withstand at most one Manager failure) are configured in the UFOS Swarm cluster, and the Manager is elected through Raft. This election ensures that the exception of a single node does not affect the operation of the entire Swarm cluster.
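The statement that three Manager nodes tolerate at most one failure follows from Raft's majority rule, which can be written out as a short calculation. The function names below are just for illustration.

```python
def quorum_size(managers: int) -> int:
    """Smallest number of managers that forms a strict majority."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """Managers that can fail while a majority is still available."""
    return (managers - 1) // 2


if __name__ == "__main__":
    for count in (1, 3, 4, 5, 7):
        print(count, "managers: quorum", quorum_size(count),
              "tolerates", tolerated_failures(count), "failure(s)")
```

Note that going from three to four managers does not improve fault tolerance, which is why an odd number of managers is the usual choice.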

The microservice containers running in Swarm are also highly available. First, high availability can be achieved by starting multiple instances of the same microservice: Swarm provides seamless load balancing and failover between the microservice containers through a VIP (which only forwards to healthy instances). Even with a single container instance, Swarm can still keep the microservice available. For example, if a microservice container fails because its node fails, the Swarm Manager automatically detects the node failure, moves the containers from the failed node to other healthy nodes in the cluster, and restarts the microservice there, so the microservice remains accessible. Two things make this possible: container orchestration provides dynamic discovery of containers, so a microservice remains reachable even after its container restarts on another node (though there may be a delay), and the microservice should be designed to be stateless.

Data layer

Oracle Database uses a typical RAC cluster, while MongoDB runs as a three-member replica set deployed in containers, based on the official container image.

Redis uses master-slave replication with three nodes (one master and two slaves) and an equal number of Redis Sentinel instances. The Sentinels work together to detect and confirm faults, perform failover, and notify the application side, which achieves real high availability.
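How several Sentinels "work together" can be sketched as two checks: enough Sentinels must agree that the master is down (the configured quorum), and a majority of all Sentinels must be reachable to authorize the failover. The model below is a simplification of that decision, not Sentinel's actual protocol, and the quorum value of 2 used in the example is an assumption; the source does not state the configured value.

```python
def can_fail_over(total_sentinels: int, reachable_sentinels: int,
                  reporting_down: int, quorum: int) -> bool:
    """Simplified model of the Sentinel failover decision.

    reporting_down: Sentinels that currently consider the master unreachable.
    quorum: configured number of Sentinels that must agree the master is down.
    """
    master_agreed_down = reporting_down >= quorum
    majority = total_sentinels // 2 + 1
    # A failover leader can only be elected with votes from a majority.
    leader_electable = reachable_sentinels >= majority
    return master_agreed_down and leader_electable


if __name__ == "__main__":
    # Three Sentinels, assumed quorum of 2.
    print(can_fail_over(3, 3, 2, 2))  # all reachable, two report the master down
    print(can_fail_over(3, 1, 1, 2))  # an isolated Sentinel cannot act alone
```

The second case is the reason for running at least three Sentinels: a single instance cut off from the others must not be able to promote a slave on its own.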

ActiveMQ follows the officially recommended approach and implements master-slave mode on top of an RDBMS: the slave message queue periodically checks, through a shared RDBMS table, whether the master message queue is still refreshing its entry. If the master is abnormal and fails to update the table within the specified time, the slave is promoted to master, completing the master-slave switch. Note that the system clocks of the master and slave nodes must be kept synchronized.
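The refresh-and-take-over mechanism, and the reason clock synchronization matters, can be sketched with a shared record that the master keeps refreshing. This is a simplified model of a lease held in a shared table, not ActiveMQ's actual implementation; the class names, the ten-second lease, and the in-memory stand-in for the RDBMS table are assumptions.

```python
from typing import Optional


class SharedLockTable:
    """In-memory stand-in for the shared RDBMS table holding the master lease."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.refreshed_at: Optional[float] = None


class Broker:
    def __init__(self, name: str, table: SharedLockTable, lease_seconds: float) -> None:
        self.name = name
        self.table = table
        self.lease_seconds = lease_seconds

    def tick(self, now: float) -> bool:
        """Run one periodic check using this broker's own clock reading.
        Returns True if this broker is the master afterwards."""
        table = self.table
        if table.holder == self.name:
            table.refreshed_at = now  # master refreshes its lease
            return True
        expired = table.holder is None or now - table.refreshed_at > self.lease_seconds
        if expired:
            table.holder = self.name  # slave promotes itself
            table.refreshed_at = now
            return True
        return False


if __name__ == "__main__":
    table = SharedLockTable()
    master = Broker("broker-a", table, lease_seconds=10)
    slave = Broker("broker-b", table, lease_seconds=10)
    print(master.tick(now=0))   # broker-a takes the lease
    print(slave.tick(now=5))    # lease still fresh, broker-b stays slave
    print(slave.tick(now=11))   # no refresh for more than 10 s, broker-b takes over
```

If the slave's clock ran 15 seconds ahead, its very first check would see a fresh lease as expired and it would take over from a healthy master, which is why the clocks of the two nodes need to be synchronized.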

The high availability of the file system is achieved through the NFS file system and underlying storage.

Through the practice of the production environment, with the continuous improvement of the platform and the continuous accumulation of operation and maintenance experience, the availability of the UFOS platform has gradually increased from 99.95% to 99.99%.
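To put these two availability figures in perspective, the allowed downtime per year can be computed directly. The sketch assumes a 365-day year and treats availability as a simple fraction of total time.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for level in (99.95, 99.99):
        print(f"{level}% -> {yearly_downtime_minutes(level):.1f} minutes per year")
```

Under these assumptions, 99.95% allows roughly 263 minutes (about 4.4 hours) of downtime per year, while 99.99% allows roughly 53 minutes.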

After more than three years of micro-service platform operation practice, it is concluded that the container-based micro-service architecture platform has brought the following benefits to our research and development:

Because it is completely based on open source system, it can be controlled independently.

The platform is basically transparent to developers, while DevOps makes operation and maintenance simple, effectively improves the efficiency of research and development, and saves investment in human resources.

The choice of development language for microservices on the platform is more flexible. At present the platform hosts three kinds of microservices by development language, and it can change languages as they evolve or as the market changes, which protects existing investment as far as possible and optimizes future investment.

The realization of the company's unified development service cloud platform can seamlessly integrate the services provided by the existing third-party service providers and make effective use of the resources of the platform in service governance.

It can facilitate the integration of third-party open source software systems for direct use of the platform, provide services for the platform, and effectively save manpower investment in development.

The container runtime environment is highly uniform, so environment differences can be ruled out as a cause when a microservice misbehaves, which makes problem analysis and troubleshooting easier.

The architecture platform runs stably, is highly available and highly extensible, and can be expanded dynamically according to business needs, so it can support the company's long-term business development. The technical architecture is also forward-looking, which avoids the large amount of migration and refactoring work that a poorly chosen platform architecture would later cause, and thereby protects the investment (in resources, including manpower).

The platform architecture can serve as a reference for small and medium-sized enterprises selecting a microservice platform architecture. You can of course use Kubernetes in place of Docker Swarm, since the latter has become a niche product (although if you want to start simple, Swarm is still attractive; for example, it can be picked up in a few days). The choices made for the other subsystems can also serve as a reference.
