The Evolution of Jinri Toutiao's Architecture: A Special Topic on Architecture Evolution Under High Pressure

2025-01-18 Update From: SLTechnology News&Howtos


Today I would like to share the evolution of Jinri Toutiao's architecture. The previous speakers covered a lot of practical material. My talk focuses on infrastructure and architectural ideas. Our approach is to help the architecture iterate better by providing better infrastructure.

From an architectural point of view, the pressure that the technical team is dealing with mainly comes from three aspects:

Service stability. Stable interfaces make the service more reliable.

Iteration speed. Iteration speed matters somewhat less for large companies, which have scale and face less survival pressure. For small and medium-sized companies, however, iteration speed must be guaranteed, because the time window is an important factor in determining success.

Quality of service. This mainly concerns user satisfaction and is also a particularly important topic.

Jinri Toutiao has grown very fast in a history of only four years. With headcount and scale growing rapidly, stability and availability are under great pressure. On the one hand, the business has to ship quickly; on the other, high-availability problems keep plaguing engineers: a release takes the service down, a large operational campaign crashes the service, a single machine cannot handle the load, or a minor service going live brings down a core service. How can the technical team better deal with problems like these?

To add my own view of architectural evolution: companies face different pressures at different stages. For a small company, the pressure may be that the business has not taken off, QPS is very low, and there is neither the environment nor the conditions for optimization. Once the company is big, servers may no longer be the constraint, but you have to keep tuning, keep coping with access pressure, and keep improving the infrastructure to provide a more stable development environment. The evolution of architecture is therefore a continuous process with no end.

Why is Jinri Toutiao under so much pressure? Its growth has been fast. As the growth chart shows, in the company's four years of existence its DAU doubled every year from 2014 to 2016. This is a great challenge for the business. At this scale, our original architecture struggles to scale linearly, and even the services that can scale linearly have many problems. Business growth is too fast, and the pressure on the back end is heavy.

A brief history of the Toutiao architecture: three stages

How did the architecture of Jinri Toutiao develop?

No perfect architecture can carry a system forever. An architecture is a dynamic system that changes continuously; quantitative change leads to qualitative change, so different stages need different architectures.

When do you need to make structural changes? When you suddenly find that there are more and more system problems, frequent accidents or alarms, reduced communication efficiency and other problems, it is very likely that there is something wrong with your architecture.

Software architecture has a problem: it takes a relatively long time to change. Once the model and ideas behind an architecture are set, the burden grows as the business grows. Anyone who has worked on infrastructure knows this: having a good idea is easy, but turning it into good software is hard. A technical overhaul is long, measured in years, so all we can do is make each architecture iteration a little faster. Finally, don't try to build a perfect architecture; just keep evolving in an agile way.

The architecture will inevitably deteriorate.

The first stage of Toutiao: three-tier architecture

When Jinri Toutiao first started, it was a simple web application: set up a database and implement the business. Toutiao's initial advantage was its recommendation engine, together with a separate system for data mining and offline computing. The front-end online service followed a fairly clear pattern with three tiers. In the early days this worked without problems; when traffic grew, scaling horizontally solved it.

The second stage of Toutiao: splitting

As in the architectural history of most companies, when the previous version ran into performance problems, the easiest fix was to split it. During optimization, the pieces that were too heavy were split out of the code. In the figure, A, B and C are different businesses. At the beginning they shared the same code. After one or two years of evolution, iterating on products had become very painful.

The architecture of that era barely considered growth in headcount or scale. At the beginning nobody specialized in optimizing the architecture; most people were devoted to the business and to adding features. For example, if recommendation quality was poor, we strengthened recommendation, but nobody was responsible for organizing and planning the overall structure.

By last year, the quarterly budget was being used up by the second month. Load reached 60% to 70% at peak times, which involved two problems: performance had degraded in some places, and business load was simply too high.

The architecture team needs to find ways to move faster and to keep our services running even when traffic spikes, load is high, and there are not enough machines. The business has kept advancing rapidly, the legacy burden is heavy, and the cost of rework is high. With these problems in mind, let's talk about our thinking for the next stage: microservices.

The third stage of Toutiao: microservices

At present, our idea is to build a new architecture through microservices. By splitting into subsystems, large applications are divided into small ones, and common layers are abstracted for code reuse.

The system's layering is typical. We focus on infrastructure, hoping it will improve rapid iteration, disaster recovery and related work, so that each business team can iterate on the business and adjust its architecture faster.

Microservice architecture

We see three key points in microservices:

Decoupling. This concerns how one service depends on another service, module, or sub-service.

Lightweight. Reduce the maintenance cost for staff.

Easy to manage.

In practice, the key to microservices is autonomy. Microservices are autonomous and self-contained, but they also need a hierarchy. For example, if a service you rely on is provided by an outside company such as Weibo, you cannot ask Weibo to change it for your service. Microservices should have boundaries, but within a company they should not be made too independent, because that increases communication costs. Infrastructure and specifications are best shared.

What does a real microservice look like?

An architecture must be something that can actually be put into practice. Microservices come with a development framework. Business developers do not need to care about disaster recovery at all, do not need to rebuild that machinery themselves, and do not need to care how it is deployed.

There must be process specifications as constraints. With specifications, global optimization becomes possible.

Microservices are delivered as a platform or as a set of tools.

The present and future of service orientation at Toutiao

Finally, I would like to introduce how the service-oriented ideas above are implemented at Jinri Toutiao. How do we provide services to the developers on each business team?

Toutiao's main ideas on service orientation are as follows:

Establish norms. What do norms cover? For example, once RPC is deployed, how does one service call another? I have nothing against innovation, but you have to consider the cost it imposes on others. Norms are still needed so that overall control is possible. For the stability and uniformity of services, an innovation has to bring real advantages; high performance is one point, but local priority is better.

Lay the foundation. Once the norms exist, service orientation can really start to land. For example, the base libraries for Nginx, Redis and MySQL are wrapped so that certain things are handled uniformly. With the development framework, you don't have to focus on data to optimize.

Proceed gradually. Split services out first, then iterate to optimize them.

Everything is a service. This fourth point differs slightly from other companies or teams: our idea is that everything is a service, and every node is abstracted as belonging to a specific service. Storage is a service too, and it provides not only an API or functions but also a quality of service, which makes it relatively easy for others to use.

Build a platform. What finally lands is a platform. How do we design the framework, and how do we combine it with services?

The first norm: everything is a service.

Resources are limited: request them on demand, through application and authorization

Simple to use: developers only need to focus on the business

There is only one way to locate a resource: global resource location

Finally, every service has an owner. This leans toward engineering architecture: a specification must be enforceable.

Our norms:

There must be a global center: services must be registered uniformly with Consul

Each service has a unique identifier that follows a naming norm: {product line}.{subsystem}.{module}, abbreviated P.S.M. The company has many departments, and we don't want communication gaps between them, so we need one overall scheme that makes services traceable.

Business services use Thrift to describe their interfaces and must pass standard parameters. If data is described loosely, without strong constraints, the client may end up with type errors in its data.

RPC uses a single, converged set of libraries

Nginx, Redis, MC, MySQL, and so on are all services
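
The P.S.M naming norm above can be checked mechanically. The following is a minimal sketch, not Toutiao's actual tooling: the character rule for each part (lowercase letters, digits, underscores) and the example name are assumptions made for illustration.

```python
import re
from typing import NamedTuple

# Assumed rule for illustration: each part starts with a lowercase letter and
# continues with lowercase letters, digits, or underscores.
_PART = r"[a-z][a-z0-9_]*"
_PSM_RE = re.compile(rf"({_PART})\.({_PART})\.({_PART})")


class PSM(NamedTuple):
    product: str
    subsystem: str
    module: str

    def __str__(self) -> str:
        return f"{self.product}.{self.subsystem}.{self.module}"


def parse_psm(name: str) -> PSM:
    """Split a service name into its three parts, rejecting malformed names."""
    match = _PSM_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid P.S.M name: {name!r}")
    return PSM(*match.groups())
```

A registration tool could call parse_psm before accepting a new service, so that every name in the global center has the same three-part shape.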

Service registration

Our services are started by loader or wrapper scripts; how exactly a service starts is decided by the business.

At startup a service has a name and registers itself under it. This may look like a constraint: can a MySQL database be started this way? What about Redis?

At startup, the service's own mode does not need to be regulated; it just goes through the same framework. With a new specification like this, existing services are easy to migrate. It is not a very strong specification, because we considered migration cost: a light specification is easy to adopt.
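
As a rough illustration of what a loader or wrapper script might do at startup, the sketch below builds a registration body and sends it to a local Consul agent through Consul's HTTP endpoint /v1/agent/service/register. The instance ID scheme, the owner metadata key, and the example values are hypothetical; the talk does not describe the real loader.

```python
import json
import urllib.request


def build_registration(psm: str, address: str, port: int, owner: str) -> dict:
    """Build the JSON body for Consul's agent service registration endpoint."""
    return {
        "ID": f"{psm}-{address}-{port}",  # hypothetical per-instance ID scheme
        "Name": psm,
        "Address": address,
        "Port": port,
        "Meta": {"owner": owner},  # hypothetical metadata key
    }


def register(payload: dict, agent: str = "http://127.0.0.1:8500") -> int:
    """PUT the payload to the Consul agent and return the HTTP status code."""
    request = urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        return response.status
```

A wrapper would call register(build_registration(...)) and then start the real process, which keeps the registration step independent of how the business launches its service.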

Service center

The service center holds service information, including what kind of service each one is, so that others can use the service easily. What quality of service does the service provide? The owner can manage this information. Redis becomes a service as well: it sits behind load balancing, serves a project, and is connected as a service.

Service relationships and authorization

There is a key concept between services: service authorization. Usually, once a service is up, anyone can connect to it by IP. A database has username authentication and can also restrict access by IP. Inside the intranet, however, many services have few restrictions, and not all of them have authorization and authentication. We want to record the relationships between services as a global topology, and make it enforceable.

A service provides an interface; the owner authorizes access, and other services can call it only after being authorized.

Description: what does this service look like? What is its maximum QPS? This descriptive information helps locate problems: for example, if the user information service cannot handle more load, it refuses the request. If the resources are allocated to other services, more can be done. Data center information can also be recorded here.
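
The owner-granted authorization and the maximum-QPS description can be pictured with a small in-memory model. This is a sketch under assumptions: the class names, the per-caller quota, and the rule that grants may not exceed the declared maximum are illustrative, not the real service center.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    name: str
    owner: str
    max_qps: int
    quotas: dict = field(default_factory=dict)  # caller name -> granted QPS


class ServiceCenter:
    """Toy registry: owners grant callers a share of a service's capacity."""

    def __init__(self):
        self._records = {}

    def register(self, name, owner, max_qps):
        self._records[name] = ServiceRecord(name, owner, max_qps)

    def authorize(self, service, caller, qps, granted_by):
        record = self._records[service]
        if granted_by != record.owner:
            raise PermissionError("only the owner can authorize callers")
        # Capacity already promised to other callers.
        promised = sum(q for c, q in record.quotas.items() if c != caller)
        if promised + qps > record.max_qps:
            raise ValueError("grant would exceed the service's maximum QPS")
        record.quotas[caller] = qps

    def may_call(self, service, caller):
        record = self._records.get(service)
        return record is not None and caller in record.quotas
```

Because every grant is recorded, the same data doubles as the list of caller-to-callee edges that makes up the service topology.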

The idea behind service authorization and authentication:

Add more authentication methods for important services, based on the service identifier

Collaborative authentication: the client itself assists in authentication.

Take Thrift as an example. In the diagram there are dotted lines on both sides, and the service center scales out well horizontally. The caller asks it for basic authorization information: may I call this service? The default is yes. The call is a Thrift packet that carries the caller's service identity, so the server knows who you are and can apply its own policy. Carrying the identity with the request also makes it possible to analyze whether a call is problematic, which is part of the specification. Developers don't need to care how the framework does this.

In the other case, the called service consults the service center and rejects you: it is under heavy QPS pressure and can no longer serve you. One advantage is that this avoids wasting resources; another is that it fits virtualization with Docker. The earlier idea was to authorize by IP: control each IP, offer something like anonymous service, and decide according to the IP the node belongs to. With Docker, identifying a caller this way is hard to do at the network layer. The intranet is trusted to a certain degree, so the caller voluntarily states who it is and then makes the call.

The approach we are currently working on for MySQL is shown in the figure. Unlike with Redis, a call to MySQL can carry information about who the caller is. For an important database, be sure to apply security authorization as well; what I described is only the normal case. These methods are combined, carrying the original caller information and, for Redis, doing a weighted check.

Redis cannot do this at the protocol layer, whereas adding this information to a MySQL call does not affect its semantics. If a server provides an HTTP interface, the information can be supplied in an HTTP header for authorization and authentication.
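
One way caller identity could travel with a request without changing its meaning is sketched below. The talk does not say how the MySQL call is tagged, so the SQL comment is an assumption, and the header name X-Caller-Service is hypothetical.

```python
def tag_sql(sql: str, caller: str) -> str:
    """Prefix a statement with a comment naming the calling service.

    A leading comment leaves the statement's semantics unchanged (assumed
    mechanism; the talk does not describe the real one).
    """
    if "*/" in caller:
        raise ValueError("caller name must not contain a comment terminator")
    return f"/* caller={caller} */ {sql}"


def tag_headers(headers: dict, caller: str) -> dict:
    """Return a copy of the headers with the caller identity added."""
    tagged = dict(headers)
    tagged["X-Caller-Service"] = caller  # hypothetical header name
    return tagged
```

In both cases the identity is self-declared, which matches the point that the intranet is trusted to some degree; stronger authentication would still be layered on for important services.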

With authorization relationships in place, all services together form a complete service topology. A service can be called only after prior authorization. Knowing the real topology in production, we can optimize alerting: with a Redis alert and a MySQL alert placed on such a topology, problems can be traced faster.

With this topology and the global meta-information about services, we can better assess the impact of service changes and optimize alerts.
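
Impact assessment over the topology can be sketched as a graph traversal. The example assumes authorizations are available as (caller, callee) pairs; the function name and the service names in the usage note are made up for illustration.

```python
from collections import deque


def affected_callers(authorizations, changed):
    """Return every service that depends, directly or transitively, on `changed`.

    `authorizations` is an iterable of (caller, callee) pairs.
    """
    callers_of = {}
    for caller, callee in authorizations:
        callers_of.setdefault(callee, set()).add(caller)

    seen = set()
    queue = deque([changed])
    while queue:
        service = queue.popleft()
        for caller in callers_of.get(service, ()):
            if caller != changed and caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

For instance, with edges feed -> user, ads -> user and user -> mysql.user, a change or alert on mysql.user reaches user, feed and ads, which tells on-call engineers where to look first.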

RPC development framework

We have developed an RPC framework ourselves. The framework helps us write code, which is something many teams do. Its main features include:

Rapid development: code generation

Service discovery: integrated with service orientation

Observability: logid, pprof, admin port

Disaster recovery and degradation: business degradation switches

Overload protection: circuit breaking, rate limiting

Multi-language support: Python/Go

For example, observability means every service can expose its internal state, which is a big advantage. Once a service is up, the platform can analyze its admin port or service port by default. Service status can be analyzed automatically based on the topology, and even performance analysis can be done. Developers don't need to care about any of this; their services acquire these capabilities naturally.
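
The logid item in the feature list suggests a request-scoped identifier attached to every log line. The sketch below shows one common way to do that in Python with contextvars and a logging filter; it is an assumption about the general technique, not a description of Toutiao's framework, and the function names are hypothetical.

```python
import contextvars
import logging
import uuid

# Holds the identifier of the request currently being handled.
_logid = contextvars.ContextVar("logid", default="-")


class LogIdFilter(logging.Filter):
    """Attach the current logid to every log record."""

    def filter(self, record):
        record.logid = _logid.get()
        return True


def start_request(incoming_logid=None):
    """Reuse the caller's logid if one was passed along, else create one."""
    logid = incoming_logid or uuid.uuid4().hex
    _logid.set(logid)
    return logid


def current_logid():
    return _logid.get()
```

A handler with this filter and a format string containing %(logid)s would then stamp each line, and forwarding the same logid on outgoing calls lets one request be followed across services.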

There is also disaster recovery degradation and overload protection. We have a platform to manage these relationships and degradations, so you can focus on the business.
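
To make circuit breaking with a degraded fallback concrete, here is a minimal single-threaded sketch. The thresholds, the class name and the injectable clock are assumptions for illustration; the real framework's behavior is not described in the talk.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a while and serve a fallback."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                return fallback()  # open: fail fast with the degraded result
            self._opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0  # a success closes the circuit again
        return result
```

The fallback plays the role of the business degradation switch: while the circuit is open, callers get a cheaper answer instead of piling more load onto a service that is already struggling.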

The module diagram shows this: by modularizing these features rather than embedding them in the framework, we keep our maintenance cost lower.

The service orientation described above embodies autonomy, and we also do containerized development with tools such as Docker. Just running services in containers is far from enough. We can connect containers with the service-oriented system, build our "attitude" private cloud, and let the platform handle the infrastructure, so that business units only care about the business.

We are now at the stage of refactoring services and building our private cloud. Front frame

Keep iterating.

Finally, how do we plan the virtualized PaaS platform?

We implement it in three layers, managed uniformly through the PaaS platform. It provides general SaaS services as well as a general app execution engine. The bottom layer is IaaS.

IaaS manages all the machines and integrates the public cloud. Toutiao has hot events that are pushed nationwide, when demand for network bandwidth is relatively high. With the help of the public cloud, whatever type of computing resources we need is abstracted into one pool. Infrastructure is combined with service-oriented ideas: for functions such as logging and monitoring, businesses enjoy the capabilities the infrastructure provides without attending to the details.

Q&A

Q: You just talked about splitting monolithic services into microservices. Doesn't that increase cost? How do you think about it?

Xia Xuhong: In the past I would just set up a database and run on it directly. Back then an upgrade meant upgrading the entire library; now only the affected part needs to be upgraded. When the business is relatively simple and small, a monolithic service really does cost less. As the business grows and the number of machines increases, the monolith becomes a bottleneck. If microservices are standardized, they can be managed by automated tools and platforms rather than by people, so the cost goes down.

Q: When a service runs in a container, does it register its own information with Consul, such as its container and IP, and update its authorized ACL?

Xia Xuhong: That is indeed one approach. We use Consul in a decentralized way, but it also adds a layer. If you need to control access and security for microservices, container nodes should be classified into different levels. For example, a service may be assigned to a small cluster that is isolated at the physical layer, and security is achieved that way. Consul alone is not enough.

Q: What do you use for RPC service discovery? Is the RPC implemented in-house?

Xia Xuhong: Service discovery is Consul. RPC is implemented on top of Thrift, and the circuit breaker mechanism is also implemented in service invocation.

Q: Why not choose open source when selecting technology? We are preparing to move our entire platform architecture to microservices and want to select a service architecture. You built this part yourselves, and we are choosing between open source and building our own. Can you give an example?

Xia Xuhong: It depends on the scenario. If you are starting with nothing, consider the various open-source solutions. We have some special scenarios: open-source components have to be integrated with internal services, so we need to weigh the integration cost against our own maintenance cost. Open-source projects are usually general-purpose, cover more features, and have relatively complex code; some functions we don't need, and we would have to make changes. On the whole, building our own is not complicated.

The authorization standard is also our own. It is based on service identification, and the server side does not consider interoperability scenarios.

Q: If we move to a microservice platform now, will the development model of business systems change a lot? Our platform will shift from development mode to design. You have made this change; what experience can you share?

Xia Xuhong: We haven't finished the transformation yet; it is very difficult. First set a general direction, because much of it comes down to communication and promotion: you need to communicate the direction well and reach agreement. As for how to transform, keep the required changes small, or implement most of the functionality well yourself so that teams only need a small migration, which reduces the migration cost.

The original release time is: 2018-08-17

Author: Xia Xuhong

This article comes from Cloud Community partner "data and Cloud". For more information, you can follow "data and Cloud".
