What is the evolution of Tuniu's server deployment and architecture?

2025-04-10 Update, from SLTechnology News&Howtos

Advancing servitization

Tuniu's servitization began in 2011 with the membership service. Search 2.0 followed in 2012. In 2013 servitization advanced substantially, covering search 3.0, the price center, the order center, product basic data, and other systems. In 2014, TSP (the Tuniu service governance platform), the common business systems, the resource search system, and others were turned into services. In 2015, product categories and the open API are being turned into services.

As this timeline shows, Tuniu's servitization was not achieved overnight. It was a long process, and each split was like changing the tires on a car moving at high speed. Note that Tuniu split out search 2.0 in 2012 and launched search 3.0 soon after, in 2013.

The two versions differ because Tuniu had no experience when it first built search 2.0. Although it used Solr, a very mature open source search engine, to build the search platform, the boundary between the search platform and the business systems was not clearly defined. The search platform's logic became so heavy that it served as a data aggregation platform, with both the list page data and the detail page data of the site coming out of search. This made the logic for searching and for obtaining data sources very complex. Search developers spent 70% of their time on integration logic with business systems, and indexing efficiency was relatively low, so performance was unstable and search 2.0 was gradually retired. Having learned this lesson, Tuniu built the search 3.0 platform, which provides only list search, unifies the list fields, and moves the data push logic outside of search so that each product system pushes its own data. Search itself focuses on performance and stability, and has gradually added intelligent ranking and manual intervention in search results. So far, search 3.0 is Tuniu's most stable system.

Two services stand out at the technical level in this process: the price calculation service and the service governance platform.

Price calculation service

Technically, the price calculation service faces two difficulties. First, the price of a tour period depends on many factors, and the dependency paths are deep. Second, the prices of those factors change frequently, especially in peak season. The design therefore has to deliver both large capacity and real-time results.

The price calculation service has been under construction since 2013, and its architecture has gone through four stages: synchronous, asynchronous, concurrent, and distributed, as shown in the following figure.

Synchronous architecture: systems interact mainly through interfaces. Other systems notify the price center to start a calculation by calling an interface, and the price center fetches all the resources a price depends on from other systems, also through interfaces. The whole calculation runs serially, so it is inefficient and can only meet the needs of small-scale computing.

Asynchronous architecture: systems interact through MQ, and the price center reads other systems' data from the databases it depends on, which speeds up data reading. The price calculation is split into two stages: first, for a resource with multiple suppliers, calculate the lowest cost of that resource; then calculate the lowest price of the product. This architecture reads data more efficiently than the synchronous one, and generating data in advance speeds up computation, improving overall performance threefold.

Concurrent architecture: first, the data in the price database (resource cost prices and product starting prices per tour period) is split across tables to increase the system's data capacity. Next, the calculation frequency is differentiated between hot and cold data according to how often a product is accessed: cold data is calculated less often and hot data more often. Finally, a data structure of tour periods, itineraries, and resources is built in memory to make reads and writes during calculation more efficient. Overall performance is 3.5 times higher than that of the asynchronous architecture, and the price calculation time for each tour period is kept below 200 ms.

Distributed architecture: the Binlog of the dependent databases is parsed and the data is transformed into a suitable in-memory database structure, which further improves read efficiency and removes the calculation's heavy dependence on the databases. Sharded MQ makes local access and local computing possible, and Unix domain communication (Unix domain sockets) keeps communication local. The resources and communication each computing instance depends on are confined to the local server as far as possible, which maximizes I/O capability and minimizes I/O loss. Overall performance is 2 times higher than that of the concurrent architecture, and the price calculation time for each tour period is kept below 100 ms.
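The two-stage calculation described for the asynchronous architecture can be sketched in a few lines. This is only an illustration under stated assumptions, not Tuniu's implementation: the function names, the data shapes, and the idea that a product price is the cheapest sum over candidate resource packages are all hypothetical.

```python
from typing import Dict, List


def lowest_resource_costs(supplier_quotes: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Stage 1: for each resource, keep the cheapest quote among its suppliers."""
    return {
        resource: min(quotes.values())
        for resource, quotes in supplier_quotes.items()
        if quotes  # skip resources that currently have no supplier quote
    }


def lowest_product_price(packages: List[List[str]], resource_costs: Dict[str, float]) -> float:
    """Stage 2: price every candidate package from stage-1 costs, return the minimum."""
    return min(sum(resource_costs[r] for r in package) for package in packages)


# Hypothetical quotes: resource -> supplier -> cost.
quotes = {
    "flight": {"supplier_a": 1200.0, "supplier_b": 1100.0},
    "hotel": {"supplier_a": 800.0, "supplier_c": 750.0},
    "hotel_deluxe": {"supplier_c": 1300.0},
}
costs = lowest_resource_costs(quotes)

# Hypothetical product that can be assembled from either of two packages.
packages = [["flight", "hotel"], ["flight", "hotel_deluxe"]]
starting_price = lowest_product_price(packages, costs)
```

Because stage 1 depends only on supplier quotes, its results can be generated ahead of time and reused by every product that shares a resource, which is the kind of precomputation the paragraph above credits for the speedup.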
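A minimal sketch of the two ideas in the concurrent architecture, table splitting and hot/cold calculation frequency. The table naming scheme, the table count, the threshold, and the intervals below are assumptions chosen for illustration; the source does not state the actual values.

```python
def price_table_for(product_id: int, table_count: int = 16) -> str:
    """Pick the price table for a product by taking its ID modulo the table count."""
    return f"product_price_{product_id % table_count:02d}"


def recalc_interval_seconds(daily_views: int, hot_threshold: int = 1000) -> int:
    """Hot products are recalculated often, cold products rarely (values are hypothetical)."""
    if daily_views >= hot_threshold:
        return 5 * 60        # hot: every 5 minutes
    return 6 * 60 * 60       # cold: every 6 hours


table = price_table_for(35)
hot_interval = recalc_interval_seconds(daily_views=5000)
cold_interval = recalc_interval_seconds(daily_views=10)
```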

After these stages of optimization, the overall architecture of the price calculation service is as shown in the following figure.

In it, the cost calculation nodes among the distribution nodes are preprocessing nodes that mainly calculate resource costs, and the computing nodes on the physical machines are the units that actually perform price calculation. The scheduling node assigns price calculations to different machines according to routing rules, and Binlog synchronization delivers data to the physical machines of the storage nodes according to similar rules, so that storage and computing are both local overall.

As of May 2015, the price calculation service performs about 900 million calculations per day, and each tour period is calculated more than twice a day on average. The price calculation service continues to be improved iteratively in I/O capability and computing efficiency, and we look forward to a better architecture in the future.
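The key point above is that calculation requests and Binlog-derived data follow the same routing rule, so a calculation lands on the machine that already holds its data. A minimal sketch, assuming a simple stable hash on the product ID (the node names, the hash choice, and the function names are hypothetical; the source does not describe the actual rule):

```python
import zlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical physical machines


def route(product_id: int, nodes=NODES) -> str:
    """Map a product to one node with a stable hash so every caller agrees."""
    key = str(product_id).encode("utf-8")
    return nodes[zlib.crc32(key) % len(nodes)]


def dispatch_calculation(product_id: int) -> str:
    """Scheduling node: choose where the price calculation runs."""
    return route(product_id)


def dispatch_binlog_row(row: dict) -> str:
    """Binlog sync: choose where the changed row is stored, using the same rule."""
    return route(row["product_id"])


calc_node = dispatch_calculation(42)
data_node = dispatch_binlog_row({"product_id": 42, "cost": 1100.0})
```

Since both dispatchers share `route`, `calc_node` and `data_node` are always the same machine for a given product, which is what makes local reads possible. A modulo-based rule like this reshuffles most keys when the node list changes; a production system would need a strategy for that.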

Service governance platform

As servitization deepened, each system exposed more and more interfaces, and the following problems gradually emerged across the system: interface calls formed a mesh; circular dependencies between interfaces could trigger an avalanche effect; service invocations were not monitored; and load balancing was done in hardware, which was hard to maintain. Given these problems, Tuniu urgently needed a service governance platform to manage all its services.

Building on an open source service governance platform, Tuniu made some customizations and quickly built a service governance platform that suits its needs, as shown in the following figure.

The registry is deployed as a cluster in master-slave mode: the "master" handles service address changes and maintains heartbeats, and the "slave" provides query services. The master and slave keep a long connection between them to exchange heartbeats. When the "master" goes down, the "slave" takes over and changes its identity. A registry instance can accept persistent connection requests from clients only after it acquires the "master" identity. When service providers and service consumers detect that the "master" is down, they try to connect to the "slave" and establish a long connection with it. The registry uses a SQLite database to persist the service list, uses a highly available in-memory cache to hold the list of available service addresses, and establishes long connections with service providers and service consumers to maintain heartbeats.

After a service provider starts, a common component notifies the registry of the services it provides, and the registry updates the list of available service addresses. If a service has no audit record, it is treated as a new service and is subject to review. After a new service is submitted, the registry does not update the list of available services; the service must be manually approved on the management page before it becomes usable and visible to service consumers.

If a service provider goes down and its heartbeat is interrupted, the registry updates the list of available service addresses, deletes all of that provider's services, and notifies consumers of the change. Heartbeats have a reconnection grace mechanism: a connection is dropped only after it has gone without a heartbeat for a certain period. The service provider uses connection pooling to control the number of persistent connections and sets a maximum. Once the maximum is reached, new connections are refused to protect the availability of the running system.

On the management page, you can query services, view service details and the list of available service addresses, view the list of service consumers, review newly launched services, take services offline, and adjust a service's load balancing policy in real time. For an individual service provider, you can reduce its weight, double its weight, disable it, or enable it.
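The takeover rule can be illustrated with a small in-memory model. This is a sketch under assumptions, not the platform's code: the class name, the 3-second timeout, and the use of explicit timestamps instead of real network heartbeats are all hypothetical.

```python
class RegistryInstance:
    """One registry node that is either the "master" or the "slave"."""

    def __init__(self, name: str, role: str):
        self.name = name
        self.role = role                  # "master" or "slave"
        self.last_peer_heartbeat = 0.0    # time of the last heartbeat from the peer

    def on_peer_heartbeat(self, now: float) -> None:
        """Record a heartbeat received over the master-slave long connection."""
        self.last_peer_heartbeat = now

    def check_peer(self, now: float, timeout: float = 3.0) -> str:
        """A slave promotes itself when the master's heartbeat is overdue."""
        if self.role == "slave" and now - self.last_peer_heartbeat > timeout:
            self.role = "master"
        return self.role

    def accepts_client_connections(self) -> bool:
        """Only the instance holding the master identity accepts persistent connections."""
        return self.role == "master"


slave = RegistryInstance("registry-b", role="slave")
slave.on_peer_heartbeat(now=10.0)
role_while_master_alive = slave.check_peer(now=12.0)   # heartbeat is 2 s old
role_after_master_down = slave.check_peer(now=14.0)    # heartbeat is 4 s old
```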
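The review gate for new services can be sketched as follows. The class and method names are hypothetical, and the real registry persists this state (the source mentions SQLite) rather than keeping it in memory.

```python
class Registry:
    """In-memory model of registration with manual approval for new services."""

    def __init__(self):
        self.approved = set()    # service names with an audit record
        self.pending = set()     # (service, address) pairs awaiting review
        self.available = {}      # service name -> set of available addresses

    def register(self, service: str, address: str) -> str:
        """Called when a provider starts and announces a service."""
        if service not in self.approved:
            # New service: hold it back until someone approves it.
            self.pending.add((service, address))
            return "pending_review"
        self.available.setdefault(service, set()).add(address)
        return "available"

    def approve(self, service: str) -> None:
        """Manual approval from the management page publishes pending addresses."""
        self.approved.add(service)
        for name, address in list(self.pending):
            if name == service:
                self.available.setdefault(service, set()).add(address)
                self.pending.discard((name, address))


registry = Registry()
first_status = registry.register("price-center", "10.0.0.1:8080")
registry.approve("price-center")
second_status = registry.register("price-center", "10.0.0.2:8080")
```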
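A minimal sketch of heartbeat expiry combined with a connection cap. The names, the 30-second timeout, and the cap of two connections are assumptions for illustration; the sketch also keeps both mechanisms in one class for brevity, whereas the text places the connection limit on the service provider side.

```python
class HeartbeatTable:
    """Tracks connected peers, refuses connections over the cap, expires silent peers."""

    def __init__(self, timeout: float = 30.0, max_connections: int = 2):
        self.timeout = timeout
        self.max_connections = max_connections
        self.last_seen = {}   # peer name -> time of last heartbeat

    def connect(self, peer: str, now: float) -> bool:
        """Accept a new persistent connection unless the cap is reached."""
        if len(self.last_seen) >= self.max_connections:
            return False
        self.last_seen[peer] = now
        return True

    def heartbeat(self, peer: str, now: float) -> None:
        """Refresh the timestamp of a peer that is already connected."""
        if peer in self.last_seen:
            self.last_seen[peer] = now

    def expire(self, now: float) -> list:
        """Drop peers that have been silent longer than the timeout; return their names."""
        dead = sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)
        for peer in dead:
            del self.last_seen[peer]
        return dead


table = HeartbeatTable(timeout=30.0, max_connections=2)
accepted = [table.connect(p, now=0.0) for p in ("provider-a", "provider-b", "provider-c")]
table.heartbeat("provider-a", now=20.0)
removed = table.expire(now=40.0)
```

In the flow described above, each name returned by `expire` would trigger removal of that provider's services from the available list and a change notification to consumers.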
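The per-provider operations map naturally onto a weighted load balancer. The sketch below assumes weighted random selection and interprets the reduce and double operations as halving and doubling a provider's weight; both the policy and the names are assumptions, since the source does not specify them.

```python
import random


class WeightedBalancer:
    """Weighted random choice over providers, with runtime weight adjustments."""

    def __init__(self, weights: dict):
        self.weights = dict(weights)   # provider address -> positive integer weight
        self.disabled = set()

    def halve(self, address: str) -> None:
        self.weights[address] = max(1, self.weights[address] // 2)

    def double(self, address: str) -> None:
        self.weights[address] *= 2

    def disable(self, address: str) -> None:
        self.disabled.add(address)

    def enable(self, address: str) -> None:
        self.disabled.discard(address)

    def pick(self, rng=random) -> str:
        """Choose one enabled provider with probability proportional to its weight."""
        candidates = [(a, w) for a, w in self.weights.items() if a not in self.disabled]
        if not candidates:
            raise RuntimeError("no enabled provider")
        addresses, weights = zip(*candidates)
        return rng.choices(addresses, weights=weights, k=1)[0]


balancer = WeightedBalancer({"10.0.0.1:8080": 4, "10.0.0.2:8080": 4})
balancer.double("10.0.0.1:8080")    # now receives more traffic
balancer.halve("10.0.0.2:8080")     # now receives less traffic
balancer.disable("10.0.0.1:8080")   # all traffic goes to the other provider
only_choice = balancer.pick()
```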

The pain of the Nanjing-Beijing data centers

This section introduces Tuniu's data center deployment strategy. Before 2014, Tuniu basically maintained a structure of two data centers, one in Nanjing and one in Beijing. Under the circumstances at the time this strategy was reasonable, but as application volume grew, problems gradually appeared. In 2015 Tuniu moved to a single data center in Nanjing, and in the future it will evolve toward a more stable and highly available structure of two sites and three centers.

When it was designed, the Nanjing-Beijing strategy met business needs well. Before 2010, more than 70% of Tuniu's orders were phone orders. In addition, the booking process for travel orders was complicated, and many steps required manual work by customer service. Tuniu needed to deploy the order system in the Nanjing data center to give its customer service staff a good experience. At the same time, to offer Internet users better hosting conditions, Tuniu needed to deploy its website in Beijing. Under this structure, Tuniu did a great deal of system optimization, mainly to solve data synchronization between the remote data centers.

First, because website data is read far more often than it is written, Tuniu adopted the following typical design for each subsystem, as shown in figure 4.

Data is synchronized between Nanjing and Beijing through database master-slave replication. Applications in the Beijing data center read from the Beijing database and write to the Nanjing database over the dedicated line, which keeps the data on both sides consistent.

This design runs well when system capacity is small, but when the dedicated line is unstable it causes more problems. The most common is data synchronization delay, for example users being unable to log in immediately after registering on the website. To address this, Tuniu adopted a circuit breaker design: a dedicated process monitors the database synchronization delay, and if the delay reaches the upper limit, synchronization is attempted over a public network VPN and then switched back once the dedicated line recovers.

In addition, all data synchronization uses compression to minimize the amount of data transferred. At the same time, Tuniu kept expanding the capacity of the dedicated line.

As the business kept growing and more and more data had to be synchronized, this deployment architecture faced increasing challenges, and Tuniu finally merged the two data centers in early 2015. The biggest challenge along the way was network conditions at the Nanjing data center. At that time, Nanjing had no multi-line BGP data center with good access conditions. To provide good network service to users across the country, Tuniu adopted a dynamic CDN scheme in which the egress of the Nanjing data center exposes only China Telecom IPs. China Unicom and China Mobile users are resolved through dynamic domain name resolution to the nearest local transit server, which then takes an optimized route to the China Telecom lines in Nanjing. This scheme provides good network service for users nationwide.

Overall server deployment cost fell by at least 30%. One reason is that the same system no longer has to be deployed twice, once in Nanjing and once in Beijing; the other is the large saving in dedicated line fees.

The current single data center strategy is a transitional scheme. To further ensure high availability and data security, Tuniu will later move to the standard deployment strategy of two sites and three centers.
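The switching logic of the circuit breaker can be sketched as a small state machine. The thresholds, the class name, and the health flag are assumptions; the source only says that synchronization moves to the VPN when the delay reaches an upper limit and moves back when the dedicated line improves.

```python
class SyncLinkSwitch:
    """Chooses the link used for database synchronization based on observed delay."""

    def __init__(self, max_delay_s: float = 60.0, recover_delay_s: float = 10.0):
        self.max_delay_s = max_delay_s          # trip point: fall back to the VPN
        self.recover_delay_s = recover_delay_s  # delay must drop below this to switch back
        self.link = "dedicated_line"

    def observe(self, replication_delay_s: float, dedicated_line_healthy: bool) -> str:
        """Feed one monitoring sample and return the link to use next."""
        if self.link == "dedicated_line" and replication_delay_s >= self.max_delay_s:
            self.link = "vpn"
        elif (
            self.link == "vpn"
            and dedicated_line_healthy
            and replication_delay_s <= self.recover_delay_s
        ):
            self.link = "dedicated_line"
        return self.link


switch = SyncLinkSwitch()
history = [
    switch.observe(5.0, dedicated_line_healthy=True),     # normal operation
    switch.observe(120.0, dedicated_line_healthy=False),  # delay over the limit
    switch.observe(30.0, dedicated_line_healthy=True),    # still catching up
    switch.observe(5.0, dedicated_line_healthy=True),     # recovered
]
```

Using a lower recovery threshold than the trip threshold keeps the switch from flapping between links when the delay hovers near the limit.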
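As a rough illustration of why compression helps here, the sketch below compresses a batch of repetitive row data before it would be sent. The source does not say which compression mechanism Tuniu used; zlib and the JSON batch format are assumptions.

```python
import json
import zlib

# Hypothetical batch of rows to ship to the other data center.
rows = [{"user_id": i, "status": "registered"} for i in range(1000)]

raw = json.dumps(rows).encode("utf-8")
packed = zlib.compress(raw, 6)                  # what would travel over the line
restored = json.loads(zlib.decompress(packed))  # what the receiving side rebuilds

ratio = len(packed) / len(raw)
```

Row data with repeated field names and values, as in this batch, compresses well, so `ratio` ends up well below 1.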

Performance optimization

This section introduces several tools Tuniu settled on in the course of performance optimization. Tuniu's approach is as follows. First, keep driving the evolution of the architecture, divide and organize systems, and expand resources ahead of demand to guarantee overall capacity. Then, keep improving monitoring, make performance indicators concrete, and find and solve problems to guarantee overall stability. This is implemented mainly with three tools: Codis, BWT, and OSS.

Codis is a distributed Redis cluster solution developed in Go and C and implemented as a proxy, and it is fully compatible with Twemproxy. The lower layer of Codis handles request forwarding, data migration without downtime, and so on, all transparently to the client. In short, the client can simply treat what it connects to as a Redis service with unlimited memory. From no caching, to file caching, to Memcache, to today's Codis, caching is a necessity in large architectures.

With Codis, applications no longer need to care where the cache is stored, how cache expansion and data migration are done, or how cache consistency is kept, which greatly improves the efficiency of application development and maintenance.

BWT is an active cache update service developed in-house by Tuniu. To further improve the efficiency of page generation, application systems send an update request to BWT whenever their data changes, and BWT updates the cache according to the configured update policy. Data pushed by application systems is generally updated after a delay of 3 minutes. In addition, hotspot data that BWT identifies through log analysis is refreshed automatically at configured times. During an update, if the load on the target machine is high, the update stops automatically.

OSS is a website operations monitoring system, also developed in-house by Tuniu. Its initial goal is to monitor and manage the performance, availability, and security of the website. Later it will become a standalone operations monitoring system that provides monitoring services for all systems. Figure 5 shows the system structure of OSS.

The main feature is that application systems send logs over UDP, which keeps the performance cost of logging on the application as low as possible. Logs are received through an NSQ queue, consumer processes written in Go aggregate the logs and store them in a DB, and the results are presented as statistical reports on a web page.

Website failures can be found quickly through error and performance charts. Monitoring mainly covers dependent interfaces, slow SQL queries, Memcache, Redis, and single-page performance.
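The delayed update behavior can be modeled as a small scheduler. This is a sketch under assumptions rather than BWT's design: the class and parameter names are hypothetical, the load threshold is invented, and collapsing repeated requests for the same key into one pending update is an assumption about what the delay is for.

```python
class DelayedCacheUpdater:
    """Queues cache refreshes and runs them after a delay, unless the target is busy."""

    def __init__(self, delay_s: float = 180.0, max_load: float = 0.8):
        self.delay_s = delay_s    # the source mentions a delay of about 3 minutes
        self.max_load = max_load  # hypothetical load ceiling for the target machine
        self.due = {}             # cache key -> time at which the refresh becomes due

    def request_update(self, key: str, now: float) -> None:
        """Record a change; repeated requests for a pending key are merged."""
        self.due.setdefault(key, now + self.delay_s)

    def run_due(self, now: float, current_load: float, refresh) -> list:
        """Refresh every key whose delay has elapsed, skipping the run under high load."""
        if current_load > self.max_load:
            return []
        ready = sorted(k for k, t in self.due.items() if t <= now)
        for key in ready:
            refresh(key)
            del self.due[key]
        return ready


refreshed = []
updater = DelayedCacheUpdater()
updater.request_update("product:1001", now=0.0)
updater.request_update("product:1001", now=60.0)   # merged with the pending request
too_early = updater.run_due(now=100.0, current_load=0.1, refresh=refreshed.append)
too_busy = updater.run_due(now=180.0, current_load=0.95, refresh=refreshed.append)
done = updater.run_due(now=180.0, current_load=0.1, refresh=refreshed.append)
```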
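Sending logs over UDP is cheap for the application because the sender does not wait for a connection or an acknowledgement. The sketch below shows the idea on the loopback interface; the `send_log` helper, the JSON payload, and the local receiver (standing in for whatever collects logs before they reach the queue) are assumptions for illustration.

```python
import json
import socket


def send_log(sock: socket.socket, address, event: dict) -> None:
    """Fire-and-forget: serialize the event and send it as one UDP datagram."""
    payload = json.dumps(event).encode("utf-8")
    try:
        sock.sendto(payload, address)
    except OSError:
        pass  # logging must never break the request that produced the event


# Local demonstration: a receiver bound to an ephemeral loopback port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_log(sender, receiver.getsockname(), {"page": "/tours", "elapsed_ms": 87})

data, _ = receiver.recvfrom(65535)
received = json.loads(data)

sender.close()
receiver.close()
```

The trade-off is that UDP gives no delivery guarantee, so a design like this accepts that some log lines may be lost in exchange for minimal overhead on the application.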

Evolution of app client technology

This section introduces practical experience from the development of the Tuniu app, focusing on online hot patches and static front-end resources.

1. Online hot patch

Because the app is released as a client-side package, fixing a bug in a package that has already shipped is very troublesome. The traditional fixes are: server-side shielding, that is, temporarily blocking the faulty function; jumping to H5, that is, redirecting the faulty page to the corresponding H5 page; and urgently releasing a new version. All of these have limitations. Server-side shielding increases the complexity of server code and hides part of the functionality; jumping to H5 degrades the user experience; and an emergency release raises operating costs and also degrades the user experience.

For this reason, Tuniu introduced Alibaba's online hot patch technology, so that when a problem occurs it can quickly release a patch package to fix it.

2. Static front-end resources

Because H5 has a short development cycle and is easy to deploy, the Tuniu app contains a large number of H5 pages, but the cost to user experience on H5 pages is also obvious. To render page elements and present them to users faster, Tuniu adopted a static front-end resource scheme whose main idea is to load the static resources of H5 pages in advance. The main implementation points are as follows:

Asynchronous loading of static resources: when users open the app, static files are downloaded or updated asynchronously.

Optimized rendering: reduce unnecessary overhead. By optimizing the DOM layout, loaded static resources are grouped; resources that can be packaged are rendered first, and those that must be fetched from the server are rendered afterward, which speeds up the first visit. Reduce the number of DOM elements rendered on the first screen, and use lazy loading to load content step by step. Optimize the rendering structure: because WebView performance inside the app is lower than in mobile browsers, cut unnecessary rendering overhead, for example by reducing expensive scrolling images.

Optimized interaction: interactive operations cause DOM reflow and repaint, so keep reflows as small as possible, and put content that needs new layers into layers separated from the original DOM structure. Use some 3D CSS so that the GPU helps with page repainting.

These are some of the practices Tuniu adopted as its architecture changed. Although they may seem scattered, they cover the following three aspects of the architecture.

Logical architecture: servitization, that is, how to abstract the common functions in the business and provide them to other systems as services.

Physical architecture: the original intent behind the Nanjing-Beijing data center design, the problems encountered, the solutions, and so on.

System architecture: non-functional architecture, such as performance optimization and practices for improving app client performance.
