
Key Progress and Prospects of Alibaba's Big Data Technology

2025-03-26 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 report --

I. Customer value in the big data field has shifted: from "early adoption" to "inclusive benefit"

Big data technology has existed for roughly 20 years, and Alibaba's Apsara ("Feitian") platform has a 10-year history of its own. The picture above is the well-known Gartner Hype Cycle for Emerging Technologies. The horizontal axis is divided into five phases, starting from the Innovation Trigger, rising to the peak of expectations, descending into the cooling-off period, and then moving forward again. The different colors indicate how many years each technology needs to reach maturity. In 2014, Big Data had reached the end of the peak. In 2015, Big Data no longer appeared on the chart at all, and many people debated where it should be placed. Gartner analyst Betsy Burton finally summed it up in one sentence: "Big Data... has become prevalent in our lives." In other words, big data is no longer one specific technology; it has become a pervasive technology field.

Alibaba's view is that big data began moving from the early-adoption period to the inclusive period in 2014, and that this shift changes what customers value. The migration of the value proposition in the big data field, shown above, compares the two periods. The early-adoption period emphasizes getting started quickly. It also prizes flexibility: the platform, its supporting systems, and the toolchain are not particularly mature, so the ability to adjust and modify quickly to meet demand matters a great deal. Finally, the system only needs to achieve some goals; it does not have to be comprehensive or even stable, as long as it supports trial and error. The characteristics of the inclusive period are almost the opposite. Once the inclusive period begins, cost and performance become critical, especially cost: surveys show that users care not just about the absolute amount spent on big data processing, but about keeping cost within a controllable range as data volumes grow massively. Large-scale application in the inclusive period also makes enterprise-grade service capability critical. For example, Alibaba's big data platform generates merchant statements from Alipay every day, and settlement between merchants, between merchants and their upstream and downstream partners, and between merchants and banks must be foolproof. Moving from the early-adoption period to the inclusive period likewise requires a relatively rich and complete toolchain and ecosystem, and the two must be well integrated to deliver end-to-end performance.

From Alibaba's perspective, this is the development history of the Apsara platform and MaxCompute. MaxCompute is the base system of the Apsara platform and supports most of Apsara's data storage and computing needs. In 2002, Alibaba was building its data warehouse on Oracle, including its accounting systems; by 2006 it ran the largest Oracle RAC in Asia. In 2008 and 2009, the Hadoop-based system and the Apsara system were launched respectively, followed by the well-known "Moon Landing" project. In 2015, Moon Landing was completed and all of the company's data was brought together.
At the same time, the data foundation was established: a unified storage system at the bottom, a unified computing system in the middle, and the data middle platform above them. The whole system takes the middle platform as its core and became the integration point for big data inside Alibaba. In 2016, the MaxCompute 2.0 project was launched, replacing almost everything built between 2010 and 2015 while also beginning to serve cloud computing customers in China. 2019 brought the switch to MaxCompute 3.0. Beyond the continued focus on performance and cost, the optimization problems of the data field have, with the massive growth of data, grown almost beyond human scale: it is now very difficult for data warehouse engineers to complete modeling and optimization of the warehouse by hand. Alibaba therefore believes the platform must develop in the direction of intelligence; optimizing big data through intelligent methods is essential.

II. The development direction of the core technology can be analyzed from four angles. "High efficiency + low cost" spans four parts: the computing layer, the storage layer, the resource utilization layer, and the governance layer. "Enterprise-grade service" requires enterprise-grade stability, scalability, and disaster recovery capability. "Ecosystem and standardization" mainly concerns integration with the ecosystem and with open standards. The fourth angle is "intelligence".

"The MaxCompute big data cost curve" (value center or cost center?)

The picture above shows the results of a survey of hundreds of Alibaba Cloud customers. The yellow curve indicates the growth of the company's or department's business, and the blue curve shows big data usage, which develops steadily during the first year of adoption. In the inclusive period, once an organization has discovered big data's technology and value, usage begins to climb, and the early part of that climb is anything but smooth: it is a period of rapid growth. Then a problem appears. The growth of data volume, computation, and cost overtakes business growth and may keep rising in later stages. If a matching system is in place, with good optimization and governance, the cost curve comes back down and eventually settles at a rate that roughly matches application development, keeping cost sustainable: for example, the business grows fivefold while the cost merely doubles. If the cost cannot be brought down, the data middle platform becomes a cost center: there is a great deal of data and computation, but it is unclear what is actually valuable. Solving this problem requires better high-performance, low-cost service capabilities to reduce cost at the platform layer, data governance services to manage the data, and intelligent methods to optimize the big data system itself.

The challenges Alibaba faced in building a "high efficiency, low cost" computing platform fall into four parts:

1. Once the scale exceeds 10,000 machines, costs keep growing.
2. Data and computation explode, so hardware investment outgrows the business.
3. The technical needs of medium and large companies have entered the blind spot of open-source software.
4. Where one large cluster cannot be formed, a patchwork of many small clusters leads to low overall utilization.

Against these challenges, the Alibaba computing platform made four corresponding optimizations:

1. Engine optimization: the core engine is fully self-developed, giving full control and allowing continuous optimization.
2. Storage optimization: ensure data is not duplicated, with intelligent storage tiering (an effective replication factor of about 1.6) and tiered compression.
3. Resource optimization: a cloud-native unified resource pool (with the corresponding peak shaving and valley filling) plus online/offline co-location (a toy sketch of this idea follows the Double 11 example below). Note in particular that optimization at the resource level outweighs optimization of individual jobs: extreme performance for a single job is no longer Alibaba's main pursuit; the greatest pursuit is raising resource utilization as a whole.
4. Management and governance of data and computation.

The picture above is an example from Alibaba's Double 11 festivals from 2015 to 2018. The left chart shows daily job volume, the middle chart shows daily data processing volume, and the right chart shows the cost curve. In practice, through the Apsara platform and these technical capabilities, Alibaba has brought the rate of cost growth almost into line with the rate of business growth.
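As a rough, self-contained sketch of the "peak shaving and valley filling" idea behind online/offline co-location in point 3 above: batch jobs are admitted only up to the headroom the online workload leaves free each hour, which lifts whole-cluster utilization without touching the online peak. All numbers are invented for illustration; this is not MaxCompute's actual scheduler.

```python
# Hypothetical hourly CPU demand of latency-sensitive online services,
# as a fraction of total cluster capacity (values invented for illustration).
online = [0.7 if 9 <= h < 23 else 0.2 for h in range(24)]

HEADROOM = 0.1  # safety margin kept free for online bursts (assumed value)

# Batch jobs "fill the valleys": they may only use what the online side leaves free.
batch = [max(0.0, 1.0 - HEADROOM - load) for load in online]

util_online_only = sum(online) / 24
util_colocated = sum(o + b for o, b in zip(online, batch)) / 24
print(f"online only: {util_online_only:.0%} -> co-located: {util_colocated:.0%}")
```

With these made-up numbers, average utilization rises from about 49% to 90%, which is the shape of the gain the talk attributes to co-location, if not the exact figures.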
On this basis, the following optimization work was done:

1. Engine side: a native engine with LLVM CodeGen; vectorization + SIMD (a rough analogy appears in the sketch right after this list); CBO + HBO; and a dynamic DAG for input and shuffling of massive data. The newly introduced "rich structured data" allows data to be stored in Range/Hash-clustered form, supporting a first-level index and ordering.
2. Storage side: compatible with open-source Apache ORC, with a new C++ Writer and an improved C++ Reader; read performance is 50% faster than CFile2 and open-source ORC.
3. Resource side: a set of cross-cluster data, computing, and scheduling capabilities that turn many cluster servers into one computer.
4. Scheduling optimization: average cluster utilization reaches 70%. Beyond optimizing single-job metrics, the emphasis is on the throughput of the entire cluster.
5. Through co-location technology, the utilization of online servers can be raised above 50%, while still supporting the business elasticity of Double 11 scenarios.
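The "vectorization + SIMD" item in point 1 refers to batch-at-a-time columnar execution instead of row-at-a-time interpretation. A rough Python/NumPy analogy (NumPy's tight native loops stand in for the LLVM-generated code; the column and the filter are made up):

```python
import math
import numpy as np

# A hypothetical column of one million order amounts.
amounts = np.random.rand(1_000_000) * 100

def row_at_a_time(col):
    # Interpreted row-by-row evaluation: one value per loop iteration.
    total = 0.0
    for v in col:
        if v > 50.0:
            total += v
    return total

def vectorized(col):
    # The whole column is filtered and summed in tight native loops --
    # the batch-at-a-time style that lets an engine exploit SIMD units.
    return col[col > 50.0].sum()

# Both produce the same answer; the vectorized form is orders of magnitude faster.
assert math.isclose(row_at_a_time(amounts), vectorized(amounts), rel_tol=1e-6)
```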
Some data points and cases: in 2015, MaxCompute won the Sort Benchmark 100TB GraySort; in 2016, EMR won the Sort Benchmark 100TB CloudSort; in 2017, MaxCompute+PAI passed the world's first 100TB-scale TPCx-BigBench test; in 2018, MaxCompute+PAI continued to improve its BigBench results and held the highest score in the world; in 2018, the internal Flink version reached several times the performance of the community version and was open-sourced in 2019; in 2019, EMR posted the world's fastest TPC-DS 10TB result; and in 2019, MaxCompute+PAI kept improving its metrics to stay first in the world, running 30TB twice as fast at half the cost. The chart above shows BigBench results from 2017 to 2019: performance has almost doubled every year, and compared with other systems in the industry it is roughly twice as fast at roughly half the cost.

Building an "enterprise-grade" computing platform is largely back-end systems work, roughly divided into four parts:

1. A reliable data foundation (the "data chassis"). Since many companies' data is their core asset, data security is paramount. Specifically this includes: EB scale; scalability (three levels of expansion: single cluster, multi-cluster, global deployment); data reliability (having passed through the "usable" and "available" stages, the platform must now provide foolproof guarantees, such as DC-level disaster recovery); and security built into every layer, from storage, computing, management, and operations down to the data itself.
2. Disaster recovery, which each enterprise must address through its own choices and trade-offs, including: complete fault tolerance (software, hardware, network, human error) on cost-effective hardware, plus self-service operation and automated operations and maintenance.
3. Data management, sharing, and security. Privacy leaks are common in the industry, and these requirements are meant to prevent them at Alibaba. This includes: fine-grained authorization, disaster recovery backup, security guard, auditing, and storage encryption; data lineage and tracking, with lineage-based analysis and reporting; and the management and scheduling of many datasets and jobs with baseline-guaranteed scheduling capability.
4. Scheduling capability and scalability, as internal optimizations of the system. The details: a large-scale unified resource pool, overselling, baseline guarantees, scalability, and co-location capability.

Building an "ecosystem-integrated" computing platform: the integration of the Apsara MaxCompute platform is a case in point. The bottom layer is a unified storage layer, open not only to the MaxCompute engines but to other engines as well. The middle abstraction layer is the joint computing platform, which abstracts data, resources, and interfaces into one set of standard interfaces, so that engines such as Spark can plug in and form a complete ecosystem. The second line of the ecosystem is MaxCompute's outward-facing data-source ecology: there are many kinds of data sources, not only Alibaba's own storage but also database systems and file systems. In addition, users can interact with other systems without moving the data, which is called federated computing (a toy sketch follows at the end of this subsection).

Blink, meanwhile, began as Alibaba's separate branch of Flink, carrying Alibaba's internal best development practices; it was fully merged into the community's default code line in version 1.9, with major contributions to the SQL engine, the scheduling system, and Algo on Flink. Having acquired the company behind Flink, Alibaba will keep pushing Flink forward.

Finally, there is development at the storage level. The image above covers compression, reading and writing, and the evolution of data-related formats, all of which will be contributed to the community; the parts in orange are changes made against the design standard.
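A toy sketch of the federated-computing idea described above: a query runs against several sources through one abstract interface, and each source is read in place rather than copied into the warehouse first. The `Source` protocol and both toy sources are invented for illustration; MaxCompute's actual federation API is not shown here.

```python
from typing import Iterable, Iterator, Protocol

class Source(Protocol):
    """Anything that can stream records in place counts as a federated source."""
    def scan(self) -> Iterator[dict]: ...

class FileSource:
    def __init__(self, rows: list[dict]):  # stands in for a file-system table
        self.rows = rows
    def scan(self):
        yield from self.rows

class DatabaseSource:
    def __init__(self, rows: list[dict]):  # stands in for an external database
        self.rows = rows
    def scan(self):
        yield from self.rows

def federated_count(sources: Iterable[Source], predicate) -> int:
    # The "query" visits each source where the data lives; nothing is staged or moved.
    return sum(1 for src in sources for row in src.scan() if predicate(row))

files = FileSource([{"city": "HZ"}, {"city": "BJ"}])
db = DatabaseSource([{"city": "HZ"}])
print(federated_count([files, db], lambda r: r["city"] == "HZ"))  # 2
```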

From engine optimization to "autopilot"

Beyond optimizing the engine itself, computing-engine optimization also involves "autopilot". The picture above uses a car as an analogy to show the evolution of Apsara. The first stage is availability: for example, can the system carry a load as large as Double 11 and stay up? The second stage is the extreme pursuit of performance and cost. The third stage is to go further still and let the system optimize itself.

On the way to the intelligent cloud data warehouse (Auto Cloud Data Warehouse), three key challenges have emerged inside Alibaba:

1. EB-level data and millions of jobs are difficult to manage; the data platform team alone is no longer adequate (the traditional DBA model cannot scale to this).
2. With so many kinds of data brought together, no human can grasp the full value of data at this scale.
3. After years of development, making a "leap" in progress for the big data system requires architectural transformation.

From the Auto Cloud Data Warehouse point of view, optimization can proceed along three lines. The first is efficiency optimization, which includes HBO (history-based optimization): when a new job arrives and the system does not yet understand it, resources are allocated conservatively so the job can run; with each subsequent run the tuning moves closer and closer to the job's actual profile, and after about four days the job is considered to run well (a minimal sketch of this feedback loop appears at the end of this section). Through HBO, Alibaba's resource utilization reached 70%. Efficiency optimization also includes Learned Statistics, intelligent computation reuse, and intelligent data tiering. The second line is resource planning: with 100,000 machines on the cloud spread across different data centers, planning where data lives and how resources are mobilized cannot be a manual process. It is automated, and includes automatic classification of job execution modes (there are three distinct modes, covering very large jobs versus highly interactive jobs), dynamic quota adjustment, scaling in and out, job runtime prediction with automatic alerting, automatic job upgrade and downgrade, data arrangement, and cross-cluster scheduling. The third line is intelligent modeling, including recognizing similar jobs and data, automatic error correction, job runtime prediction with automatic alerting, and automatic job upgrade and downgrade. These three lines are the sustainable development directions in the field of the intelligent data warehouse. The functions Alibaba has announced, or will announce, are shown in the picture above.
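A minimal sketch of the HBO behavior described above: the first run gets a conservative allocation, and each later run is tuned toward the job's observed usage plus a safety margin, so the estimate converges over a few runs. All constants are assumptions for illustration, not MaxCompute's actual parameters.

```python
DEFAULT_ALLOC = 100.0   # conservative first-run allocation (assumed units)
MARGIN = 1.2            # safety factor over observed peak usage (assumed)
SMOOTHING = 0.5         # how fast history pulls the allocation (assumed)

history: dict[str, float] = {}  # job signature -> current allocation

def allocate(job_signature: str) -> float:
    # Unknown jobs are treated conservatively; known jobs use their history.
    return history.get(job_signature, DEFAULT_ALLOC)

def record_run(job_signature: str, observed_peak: float) -> None:
    # Move part of the way toward the observed profile after every run.
    target = observed_peak * MARGIN
    prev = history.get(job_signature, DEFAULT_ALLOC)
    history[job_signature] = prev + SMOOTHING * (target - prev)

for day in range(4):  # the talk mentions roughly four days of convergence
    alloc = allocate("daily_report")
    record_run("daily_report", observed_peak=30.0)
    print(f"day {day}: allocated {alloc:.1f}")
```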
Auto CDW - intelligent index recommendation builds a cost model from the running relationships between jobs, finds an optimal index tuning, and pushes it out. For example, on MaxCompute, 300,000 columns across 80,000 tables inside Alibaba Group were analyzed, and the best clustering scheme was recommended for 44,000 of those tables, yielding an average cost saving of 43%.

Auto Tiered Store - hot and cold data identification and management. On September 1 this year, the overall price of Alibaba's storage was cut by 30%, and part of that reduction came from the Auto Tiered Store technology shown above, including automatic separation of hot and cold data. Previously, data was separated in two ways. The first way is for the system to compress cold data automatically, which cuts its cost by about two thirds. The second way is to let users flag tables themselves. But when there are tens of millions of tables in the system, it is very hard for data developers to identify how each one is used. Instead, an economic model can relate access patterns to storage cost and automatically set the degree of hot and cold for each partition of each table. In this way, Alibaba reduced its effective storage replication factor from 3x to 1.6x and improved overall storage efficiency by 20%.

Yugong - intelligent global data placement and scheduling. The cloud system is deployed in many data centers around the world, data generation follows the business, and the relationships between datasets must not be broken; so deciding which data lives in which data center, and which jobs are scheduled where, to achieve the best effect is a global optimal-matching problem. Inside Alibaba, static data placement and dynamic job scheduling are integrated into one system called Yugong. The right side of the image above shows two schematic diagrams of it.

DPSAaS - a data sharing and analysis service based on differential privacy. Computing over sensitive data in this way is called confidential computation: the data should be usable for computation yet remain invisible. In the chart above, the first three columns are sensitive data and the last three are not. Through a privacy-preserving encoding, all of the sensitive data is hidden: the sensitive values themselves cannot be seen, yet computed results over all of the data remain correct. Alibaba is exploring how to strike a balance between data sharing and privacy in this way (a toy sketch of the underlying idea follows at the end of this section).

Other future-oriented explorations at Alibaba include computation over graph-structured relationships, finding the optimal balance between systems, privacy-preserving computation, better scheduling under multiple objectives, and doing better when data can be greatly reduced at the sampling level.
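A small sketch of the differential-privacy idea behind DPSAaS as described above: individual sensitive values are never revealed, but calibrated Laplace noise lets aggregate results stay approximately correct. The epsilon value and the data are invented; this is the textbook Laplace mechanism, not Alibaba's implementation.

```python
import random

def dp_sum(values, lower, upper, epsilon):
    """Release a sum via the Laplace mechanism.

    After clipping each record into [lower, upper], replacing any one
    record changes the sum by at most (upper - lower), so that is the
    sensitivity used to calibrate the noise.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    # The difference of two Exp(1) draws is a standard Laplace(0, 1) sample.
    noise = random.expovariate(1) - random.expovariate(1)
    return sum(clipped) + noise * (upper - lower) / epsilon

salaries = [8200, 9100, 12000, 7500]   # hypothetical sensitive column
print(dp_sum(salaries, lower=0, upper=20000, epsilon=1.0))
```

Smaller epsilon means stronger privacy but noisier results, which is exactly the sharing-versus-privacy balance the talk describes.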

This article is original content of the Yunqi community and may not be reproduced without permission.
