Shulou (Shulou.com) report, 06/02. Updated 2025-04-05. From: SLTechnology News&Howtos, Internet Technology.
Every year, Singles' Day is an e-commerce event and a consumer carnival. This year's Singles' Day is of particular significance, as it has developed into a feast for e-commerce and consumers all over the world. For engineers, Singles' Day has undoubtedly become a major test of the overall architecture, basic middleware, operation and maintenance tools, and personnel.
Successful preparation is not only a matter of optimizing the system and architecture for the event itself, through flow control, caching strategy, dependency control, performance optimization, and so on. It is also inseparable from the long-term accumulation and polishing of technology. Below, I briefly introduce the overall architecture of Alipay to give a preliminary understanding, and then take Ant Huabei, which shone in this promotion, as an example to outline how a new business prepares for the promotion from scratch.
Architecture
The architecture design of Alipay has to take into account the particular demands of Internet finance, such as higher business continuity, better scalability, and faster support for new business development. At present, the architecture is as follows:
The entire platform is divided into three layers:
Operation and maintenance platform (IaaS): mainly provides scalable basic resources, such as network, storage, database, virtualization, and IDC, to ensure the stability of the underlying system platform.
Technology platform (PaaS): mainly provides scalable, highly available distributed transaction processing and service computing capabilities, supports flexible resource allocation and access control, and provides a runtime environment for basic middleware that shields applications from the complexity of the underlying resources.
Business platform (SaaS): provides highly available payment services anytime and anywhere, together with a secure, easy-to-use open platform for developing payment applications.
Architectural characteristics
Logical data center architecture
With business volume doubling on Singles' Day, Alipay faces growing challenges. System capacity keeps increasing, and servers, networks, databases, and data centers have all been expanded. This brings some big problems: the system keeps growing in scale and complexity, and the previous point-by-point scaling architecture can no longer meet the requirements. We need a holistic, scalable solution that expands in whole units, supports scaling out across regions, provides N+1 disaster recovery, and offers an integrated fault recovery system. Based on these requirements, we proposed the logical data center architecture. The core idea is to lift horizontal data splitting up to the access layer and the terminal, dividing the system into multiple units starting from the access layer. A unit has several characteristics:
Each unit is closed to the outside, including calls between systems and all kinds of storage access.
The real-time data of each unit is independent and not shared, while member or configuration data with low latency requirements can be shared.
Communication between units is uniformly controlled and made asynchronous through messages as far as possible; synchronous messages go through a unit proxy.
The following is a conceptual diagram of Alipay's logical data center architecture:
This architecture solves several key issues:
By minimizing cross-unit interaction and making it asynchronous, deployment across regions becomes possible. The scalability of the whole system is greatly improved and no longer depends on IDCs in the same city.
An N+1 remote disaster recovery strategy can be realized, which greatly reduces the cost of disaster recovery and ensures that the disaster recovery facilities are genuinely usable.
The whole system has no single point of failure, which greatly improves overall availability. Units deployed in the same city and in different regions can serve as disaster recovery facilities for one another, and with fast switching through the operation and maintenance control platform there is a chance of achieving 100% continuous availability.
Under this architecture, traffic entering and leaving at the business level passes through unified, controllable, routable control points, so the manageability of the overall system is greatly improved. Operation and maintenance practices that were previously hard to achieve, such as online stress testing, flow control, and grayscale release, can now be carried out easily.
The main framework of the new architecture was completed in 2013 and passed the test of Singles' Day, which validated the rollout of the whole architecture.
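The routing and switching behavior described above can be illustrated with a minimal Python sketch. The class `UnitRouter`, the unit names, and the CRC32-based mapping are assumptions made for illustration; the article does not describe Alipay's actual routing rules.

```python
import zlib


class UnitRouter:
    """Hypothetical router: maps each user to a home unit and supports failover."""

    def __init__(self, units):
        self.units = list(units)
        self.failover = {}  # failed unit -> unit currently taking over its traffic

    def home_unit(self, user_id: str) -> str:
        # A stable hash keeps the same user on the same unit across requests.
        index = zlib.crc32(user_id.encode("utf-8")) % len(self.units)
        return self.units[index]

    def route(self, user_id: str) -> str:
        # Follow the failover table if the home unit has been switched away.
        unit = self.home_unit(user_id)
        return self.failover.get(unit, unit)

    def switch(self, failed_unit: str, backup_unit: str) -> None:
        # Redirect all traffic of a failed unit to another live unit.
        if failed_unit not in self.units or backup_unit not in self.units:
            raise ValueError("unknown unit")
        if failed_unit == backup_unit:
            raise ValueError("backup must differ from the failed unit")
        self.failover[failed_unit] = backup_unit

    def restore(self, unit: str) -> None:
        # Send traffic back to the unit once it has recovered.
        self.failover.pop(unit, None)
```

Because every request passes through one routing decision, the same control point could in principle also carry flow control or grayscale rules, which is the manageability benefit the text refers to.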
In 2015, the "remote multi-active" architecture based on the logical data center was completed. "Remote multi-active" means that, building on the scaling capability of the logical data center, logical data centers are deployed in IDCs in different regions and every logical data center is "live": it actually carries online business, and traffic can be switched quickly between logical data centers when a failure occurs.
This guarantees business continuity better than the traditional "two places, three centers" architecture. Under the "remote multi-active" architecture, the disaster recovery IDC for any given IDC is itself a "live" IDC that carries normal online business day to day, which ensures that it is stable and that its business logic is correct.
The following is a schematic diagram of Alipay's "remote multi-active" architecture:
In addition to better fault response, the logical data center also lets us verify releases through "blue-green release" or "grayscale release". We divide a single logical data center (hereinafter LDC) into two functionally equivalent logical data centers, A and B. In daily operation, call requests are routed to A or B at random with equal probability. When blue-green mode is enabled, the upper-layer routing component adjusts its routing policy and isolates calls between A and B: applications in group A can only call each other and cannot call group B.
Then the blue-green release process is roughly as follows:
Step 1. Before the release, "blue" traffic is reduced to 0%, and all "blue" applications are released in two groups, in no particular order.
Step 2. 1% of traffic is directed to "blue" for observation; if nothing abnormal appears, the share is gradually increased to 100%.
Step 3. "Green" traffic is now 0%, and all "green" applications are released in two groups, in no particular order.
Step 4. Daily operation resumes, with the blue and green units each carrying 50% of online traffic.
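The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not Alipay's routing component: `BlueGreenRouter`, `release`, the `deploy` and `healthy` callbacks, the intermediate ramp percentages, and the rollback to 0% on an unhealthy observation are all hypothetical.

```python
import random


class BlueGreenRouter:
    """Hypothetical entry router for one logical data center split into blue and green."""

    def __init__(self, rng=None):
        self.blue_weight = 0.5  # share of entry traffic sent to blue
        self.isolated = False   # True while blue-green mode is enabled
        self.rng = rng or random.Random()

    def set_blue_traffic(self, percent: float) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.blue_weight = percent / 100

    def route_entry(self) -> str:
        # Entry traffic is split according to the current weight.
        return "blue" if self.rng.random() < self.blue_weight else "green"

    def route_internal(self, caller_group: str) -> str:
        # In blue-green mode, calls stay inside the caller's own group.
        return caller_group if self.isolated else self.route_entry()


def release(router, deploy, healthy, ramp=(1, 10, 50, 100)):
    """Run the four release steps; returns False if blue looks unhealthy."""
    router.isolated = True
    router.set_blue_traffic(0)       # Step 1: drain blue, then release it
    deploy("blue")
    for percent in ramp:             # Step 2: start at 1%, ramp up to 100%
        router.set_blue_traffic(percent)
        if not healthy("blue"):
            router.set_blue_traffic(0)  # assumed rollback: send traffic back to green
            return False
    deploy("green")                  # Step 3: green is at 0%, release it
    router.set_blue_traffic(50)      # Step 4: back to a 50/50 split
    router.isolated = False
    return True
```

In practice the `healthy` check would be an observation window with monitoring behind it; here it is just a callback so the control flow stays visible.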
Distributed data architecture
During the peak of Singles' Day in 2015, Alipay handled 85,900 payments per second, making it the largest payment system in the world and one of the largest OLTP processors. Its sensitivity to transactions sets Alipay's data architecture apart from that of other Internet companies, yet it also has the huge user base typical of Internet companies. Most importantly, Alipay is more sensitive to transaction cost than traditional financial companies, so the evolution of Alipay's data architecture has been a move toward a low-cost, linearly scalable, distributed data architecture.
Alipay's data architecture has now been upgraded from centralized minicomputers and high-end storage to a distributed PC server solution. The overall data architecture is vendor-independent and standardized.
The scalability strategy of Alipay distributed data architecture is mainly divided into three dimensions:
Vertical splitting by business type.
Horizontal splitting by customer request (the data sharding strategy that is often mentioned).
Read/write separation and data replication for data that is read far more often than it is written.
The following figure shows the scalability design of Alipay internal trading data:
The data of the transaction system is divided into three large database clusters:
The main transaction database cluster, where the creation and status changes of every transaction are completed first. The resulting changes are then replicated to the other two clusters through a reliable data replication center.
The consumer records database cluster, which serves consumers' queries and gives them a better user experience.
The merchant query database cluster, which serves merchants' queries and gives them a better user experience.
The data in each cluster is split horizontally into multiple parts. To ensure both scalability and high reliability, each node has a corresponding backup node and failover node, and can switch to the failover node within seconds in case of failure.
To keep the split data nodes transparent to upper-layer applications, we developed a set of data middleware products that ensure the elastic scaling of transaction data.
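As a rough illustration of how a data middleware layer could hide the three splitting dimensions from applications, here is a minimal Python sketch. `ShardRouter`, `Target`, the cluster names, and the CRC32-modulo sharding rule are assumptions; the article does not say how Alipay's middleware chooses shards.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    cluster: str  # vertical split: one cluster per business type
    shard: int    # horizontal split: shard chosen from the user ID
    role: str     # read/write separation: "primary" or "replica"


class ShardRouter:
    """Hypothetical router that resolves a logical request to a physical node."""

    def __init__(self, shards_per_cluster):
        self.shards_per_cluster = dict(shards_per_cluster)

    def locate(self, business: str, user_id: str, write: bool) -> Target:
        if business not in self.shards_per_cluster:
            raise KeyError(f"unknown business type: {business}")
        shard_count = self.shards_per_cluster[business]
        # The same user always maps to the same shard within a cluster.
        shard = zlib.crc32(user_id.encode("utf-8")) % shard_count
        # Writes go to the primary node; reads can be served by a replica.
        role = "primary" if write else "replica"
        return Target(business, shard, role)
```

The application only names the business type and the user; cluster, shard, and node role are resolved by the router, so the number of shards can change without touching application code.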
Reliability of data
Under a distributed data architecture, it is a great challenge to achieve high availability and scalability while preserving the ACID properties of transactions (atomicity, consistency, isolation, durability). Imagine making two payments at the same time: it would be unacceptable if the two transactions affected each other in a distributed environment, so that rolling back one of them also affected the other.
Guided by the CAP and BASE principles and the characteristics of the Alipay system, we designed a service-level distributed transaction framework. It supports the two-phase commit protocol but is heavily optimized to ensure the eventual consistency of transactions while preserving their ACID properties. We call this the "flexible transaction" strategy. The principle is as follows:
The following is a flowchart of the distributed transaction framework:
Implementation:
A complete business activity consists of one master business service and several slave business services. The master business service is responsible for initiating and completing the entire business activity. The slave business services provide TCC-type (Try-Confirm-Cancel) business operations. The business activity manager controls the consistency of the business activity: it registers the operations in the activity, invokes the confirm operations of all two-phase transactions when the activity commits, and invokes the cancel operations of all two-phase transactions when the activity is cancelled.
Compared with the 2PC protocol, there is no separate Prepare phase, so the cost of the protocol is lower, and the system has high fault tolerance and simple recovery.
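The roles described above can be sketched in Python. This is a simplified, in-memory illustration under assumptions: `AccountService`, `BusinessActivityManager`, and `transfer` are hypothetical names, and a real framework would also persist the activity log and retry confirm or cancel calls after failures.

```python
class AccountService:
    """Slave business service exposing TCC operations on one account."""

    def __init__(self, balance: int):
        self.balance = balance
        self.frozen = {}  # tx_id -> reserved change (negative for a debit)

    def try_(self, tx_id: str, delta: int) -> None:
        # Try: check and reserve resources without changing the balance.
        reserved_debits = sum(-d for d in self.frozen.values() if d < 0)
        if delta < 0 and self.balance - reserved_debits < -delta:
            raise ValueError("insufficient funds")
        self.frozen[tx_id] = delta

    def confirm(self, tx_id: str) -> None:
        # Confirm: apply the reserved change; repeating the call is harmless.
        self.balance += self.frozen.pop(tx_id, 0)

    def cancel(self, tx_id: str) -> None:
        # Cancel: release the reservation; repeating the call is harmless.
        self.frozen.pop(tx_id, None)


class BusinessActivityManager:
    """Registers try operations and drives confirm or cancel for all of them."""

    def __init__(self, tx_id: str):
        self.tx_id = tx_id
        self.registered = []

    def try_and_register(self, service: AccountService, delta: int) -> None:
        service.try_(self.tx_id, delta)
        self.registered.append(service)

    def commit(self) -> None:
        for service in self.registered:
            service.confirm(self.tx_id)

    def cancel(self) -> None:
        for service in self.registered:
            service.cancel(self.tx_id)


def transfer(tx_id: str, payer: AccountService, payee: AccountService, amount: int) -> bool:
    """Master business service: initiates and completes the business activity."""
    manager = BusinessActivityManager(tx_id)
    try:
        manager.try_and_register(payer, -amount)
        manager.try_and_register(payee, amount)
    except ValueError:
        manager.cancel()
        return False
    manager.commit()
    return True
```

The try step is an ordinary business operation that reserves resources, which is why no separate Prepare phase at the resource level is needed.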
The asynchronous reliable messaging strategy, one of the key components, works as follows:
Some of the key design points:
If a failure occurs in step 2, 3, or 4, the business system decides whether to roll back or to set up another compensation mechanism.
If an exception occurs in step 6 or 7, the message center needs to check back with the producer.
If an exception occurs in step 8, the message center needs to retry.
The confirmation message in step 6 is encapsulated by the message center component, so the application system does not need to be aware of it.
This mechanism ensures the integrity of the message data, and in turn the eventual consistency of system data communicated through asynchronous reliable messages.
Some services need pre-checks, for which the message center must provide a check-back mechanism based on specified conditions.
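The check-back and retry behavior can be illustrated with a small Python sketch. It does not reproduce the numbered steps of the original diagram, which is not available here; `MessageCenter`, its method names, and the retry limit are assumptions modeled on the common pattern of a pending message that is confirmed after the producer's local transaction.

```python
class MessageCenter:
    """Hypothetical message center with producer check-back and delivery retry."""

    def __init__(self, check_back, max_attempts: int = 3):
        # check_back(msg_id) asks the producer whether its local transaction committed.
        self.check_back = check_back
        self.max_attempts = max_attempts
        self.pending = {}  # stored but not yet confirmed by the producer
        self.ready = {}    # confirmed, waiting for delivery

    def prepare(self, msg_id: str, body: str) -> None:
        # The producer stores the message before running its local transaction.
        self.pending[msg_id] = body

    def confirm(self, msg_id: str, committed: bool) -> None:
        # Commit makes the message deliverable; rollback discards it.
        body = self.pending.pop(msg_id, None)
        if body is not None and committed:
            self.ready[msg_id] = body

    def resolve_pending(self) -> None:
        # If a confirmation was lost, check back with the producer.
        for msg_id in list(self.pending):
            self.confirm(msg_id, self.check_back(msg_id))

    def deliver(self, consumer) -> None:
        # Retry delivery on exceptions; undelivered messages stay in `ready`.
        for msg_id in list(self.ready):
            for _ in range(self.max_attempts):
                try:
                    consumer(msg_id, self.ready[msg_id])
                except Exception:
                    continue
                del self.ready[msg_id]
                break
```

Since a delivery may be repeated after a timeout, the consumer in such a scheme has to treat messages idempotently.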
Ant Huabei
Ant Huabei is a new payment tool added this year, and its "confirm receipt first, repay next month" payment experience has won the trust of more and more consumers. Like account balance and Yu'e Bao, Ant Huabei avoids the transaction links between banks and minimizes payment congestion. According to official data, in this year's Singles' Day promotion the success rate of Ant Huabei payments reached 99.99%, and each payment took an average of 0.035 seconds, working alongside the major bank channels to keep payments smooth.
Ant Huabei was launched less than a year ago but has grown very quickly, from 10 payments per second at launch to a peak of 21,000 payments per second on Singles' Day. The technical system supporting Ant Huabei's business has evolved continuously and is now built entirely on the financial cloud architecture of Ant Financial Services Group.
In December 2014, the Ant Huabei team finished optimizing the business system, set the system up on the financial cloud according to its standards, and connected the channel layer, business layer, core platform layer, and data layer in turn. This gives users a consistent experience across the whole process of marketing, placing orders, and paying with Huabei.
In April 2015, the Ant Huabei system followed the financial cloud's unit-based construction, namely LDC, which made it possible to deploy data and applications across regions, with good scalability and traffic control capabilities. In terms of availability, it is deeply integrated with the financial cloud accounting system and borrows that system's failover capability, so that through low-cost changes Ant Huabei gained high availability capabilities such as same-city and remote disaster recovery. If the database of any unit has a problem, disaster recovery switching is fast and users of that unit can still pay with Huabei. In terms of stability, relying on the high stability of the cloud customer platform, the contract data of Huabei customers was migrated and written into the cache of the cloud customer platform in advance, and the cache hit rate reached 100% during the peak period. At the same time, the full-link stress testing platform was used to test Huabei's capacity and sustained stability, and the performance bottlenecks that were found were optimized repeatedly, which contributed greatly to the smooth operation of the system on the day. Under the previous architecture, the system's per-second processing capacity could not be measured effectively, and simple traffic-diversion stress testing could not produce accurate and reliable data. Based on the financial cloud, full-link stress testing quickly gave the system a stable capacity of 40,000 payments per second.
The most critical part of the Ant Huabei business is the control of buyer credit and payment risk. From the moment the buyer places an order, the back end starts computing risk models in parallel, covering fake transactions, credit limits, cash-out, and usage risk. These models complete calculations and judgments over some 10 billion data items within 20 ms, so whether a transaction carries potential risk can be determined before the user reaches the cashier.
To ensure sufficient credit funds during the Singles' Day period, an institutional asset center was set up on the financial cloud and connected to the payment and clearing platform. On-balance-sheet credit assets are packaged into an asset pool for a certain period of time, and tradable securities are issued against this pool for financing; that is, sufficient funds are obtained through asset transfer. This innovation ensures that users can complete transactions through Huabei and diverts pressure from the bank channels. Asset securitization has not only helped more than 1 million small and micro enterprises obtain financing but also supports the consumer credit needs of Huabei customers. The Ant Micro Loan asset securitization platform can process asset transfers at a rate of more than 100 million per hour, with a total scale in the billions of yuan.
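The idea of evaluating several risk models in parallel can be shown with a short Python sketch. The model functions, the order fields, and the use of a thread pool are assumptions for illustration only; the article does not describe the real models or how they meet the 20 ms budget.

```python
from concurrent.futures import ThreadPoolExecutor


def check_limit(order: dict) -> bool:
    # Hypothetical rule: the order exceeds the buyer's credit limit.
    return order["amount"] > order["credit_limit"]


def check_cash_out(order: dict) -> bool:
    # Hypothetical rule: buyer and seller are the same party.
    return order["buyer"] == order["seller"]


RISK_MODELS = {"limit": check_limit, "cash_out": check_cash_out}


def assess(order: dict, models=RISK_MODELS) -> list:
    """Run all risk models concurrently and return the names of those that flag the order."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(model, order) for name, model in models.items()}
        flags = {name: future.result() for name, future in futures.items()}
    return [name for name, risky in flags.items() if risky]
```

An empty result means no model raised a flag, so the order could proceed to the cashier.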
Summary
After so many years of work on highly available architecture and promotion preparation, the Ant Financial technical team is able to "win first and then fight". This rests on technical accumulation in three areas: "scheme", "device", and "general".
"Scheme" is the overall architectural design and strategy.
"Device" is the basic middleware and basic components that support the technical work.