2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
In this article, the editor introduces how MaxCompute is applied to big data at Amap (Gaode). The article is rich in content and analyzes the topic from a professional point of view. We hope you get something out of it.
I. Amap's business and data
Describing a map requires a large amount of supporting data, including real road information, road shapes, road conditions and so on. The trajectory heat map below shows the road conditions around Beijing Union University as displayed in Amap, depicting three kinds of information: points, lines and areas. Regional heat is formed by overlaying trajectory data on map information, and tracks of different colors show the road conditions in the area at different times of the day.
Some of Amap's business scenarios are shown below. The first scenario is the Amap app that people use every day. Amap is Apple's strategic partner in China, and the second scenario is the travel service Amap provides for Apple. Amap has also opened its ecosystem capabilities to the entire Internet industry: the third scenario is the location service interface that Amap provides to app developers. Mobile applications currently built on this interface include Mobile Taobao, Jinri Toutiao and Xiaomi Sports. The fourth scenario is the complete location service solution Amap provides for in-vehicle devices.
Amap's business architecture can be described as "442". The first "4" is its four layers: client, middle tier, service engine and basic geographic information. The second "4" is its four business entrances: AppleMaps, the Amap app, third-party apps and in-vehicle devices. The "2" means that Amap relies on two data sources: basic geographic information, made up of self-collected data and industry partner data, and data generated by the service engines, such as track data and driving data. The "442" business architecture has brought a qualitative leap to Amap's development.
The following picture shows Mr. Liu Zhenfei, President of Amap, celebrating the historic moment when Amap's daily active users (DAU) on National Day (October 1) exceeded 100 million. During the National Day holiday, Amap provided users with a total navigation mileage of more than 13.5 billion kilometers, equivalent to 45 round trips between the sun and the earth. Behind these massive services is Amap's big data computing power: thousands of cluster nodes and cluster storage carrying more than 100 PB of data.
II. How to manage data well
SPA architecture
Amap's data architecture is called the "SPA architecture". "S" stands for Source, the data source layer, which holds all location, map and image data within Amap. "P" stands for Platform, the data platform, which provides data warehouse, data adaptation and data mining capabilities to support the data application layer above it, that is, Application ("A"). Within the SPA architecture, what Amap cares about most is data access: all data operations must comply with security specifications. Amap also requires every department to define its development goals and to develop with a unified set of platform tools.
Data research and development
The end-to-end data R&D pipeline includes data integration, data development, the operations and maintenance center, data quality, the data map, data security and data services. Amap requires the data platform not only to cover this full pipeline in an all-in-one fashion, but also to offer visual interaction so that development is more efficient. Taking the operations and maintenance center as an example, the tools should visualize scheduling nodes and make it easy to configure task dependencies across different time granularities. A visual data map for managing metadata is also needed, so that the upstream and downstream of any dataset can be viewed immediately. MaxCompute is a powerful product that meets these requirements of Amap's data business.
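The "upstream and downstream" view that a data map provides can be thought of as a traversal over a table dependency graph. The following is a minimal sketch of that idea, not the platform's implementation; the table names and the `EDGES` list are hypothetical and exist only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical table-level dependencies: (upstream_table, downstream_table).
EDGES = [
    ("ods_track_raw", "dwd_track_clean"),
    ("ods_road_base", "dwd_road_segment"),
    ("dwd_track_clean", "dws_segment_speed"),
    ("dwd_road_segment", "dws_segment_speed"),
    ("dws_segment_speed", "ads_city_congestion"),
]


def build_graph(edges):
    """Return (upstream, downstream) adjacency maps built from edge pairs."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def reachable(start, neighbors):
    """Breadth-first walk; returns every table reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


UPSTREAM, DOWNSTREAM = build_graph(EDGES)


def lineage(table):
    """All transitive upstream and downstream tables of one table."""
    return {
        "upstream": sorted(reachable(table, UPSTREAM)),
        "downstream": sorted(reachable(table, DOWNSTREAM)),
    }


if __name__ == "__main__":
    print(lineage("dws_segment_speed"))
```

Keeping two adjacency maps makes both directions equally cheap to query, which is what an interactive lineage view needs.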
Features of the MaxCompute platform
The MaxCompute platform used by Amap has the following three characteristics:
First, ease of use: zero learning cost, a mature IDE and other advantages.
Second, efficiency: Rubik's Cube, by far the largest shared project at Amap, was implemented with Alibaba Cloud (Aliyun) and MaxCompute.
Third, elasticity: Amap's traffic during the National Day holiday far exceeded expectations.
Ease of use-moving to the cloud
In 2014, Amap's data architecture relied on Flume for data collection, a Hadoop cluster of only a few hundred machines, and software such as Hive for data processing. In September 2014, Amap proposed "going to the cloud" ("Shangyun"): migrating its data to Alibaba Cloud so that operations previously handled ad hoc could be managed through defined processes. Compared with other, more complex data migration efforts, Amap's 2014 migration was close to "one-click": the synchronization of source data was switched from Flume to TimeTunnel, and the rest was handled through configuration. A migration normally comes with code changes, but Amap made only very few, such as modifying the interface in the old version M2. In the upper data storage layer, the storage media were replaced with cloud products such as OTS to support more stable front-end applications. It took Amap only two months to migrate all of its cluster data to the cloud.
Going to the cloud brought Amap substantial benefits. Figure 1 shows that all code is managed in the cloud after the migration; figure 2 shows one-click operations and maintenance management; figure 3 shows measurable computing resource management, which quantifies the resource usage of each task; figure 4 shows the visual workflow for security approvals. Between the 2014 migration and 2018, Amap developed rapidly, but some problems also surfaced.
Efficiency-Rubik's Cube
Too many data silos ("chimneys") is a troublesome problem in data warehousing, and Amap had it too. It could take a data user a month to find the department that owned the data, the relevant product owners and the R&D staff, and then to request the data. When Amap took stock of its data warehouses in 2017, it found 20 data warehouse projects inside the company, with data redundancy across them as high as 30%, which seriously hurt team efficiency. Amap's data warehouses also suffered from high latency: the core data could not be guaranteed to be produced by 7:00 every day (the "7 o'clock output"). To address these two problems, Amap launched the "Rubik's Cube" project, merging the 20 warehouses into one to achieve group-wide data governance.
A group-wide data governance project clearly faces serious challenges. First, the data volume is very large: the Rubik's Cube project required global governance of data at the 100 PB level. Second, there are many participants: the project involved all data developers across Amap's product lines, with a project team of more than 100 people. Finally, the schedule was tight: to ensure that the data architecture upgrade did not affect normal business, Amap required the main development work to be completed within two and a half months. Moreover, the sooner a data migration is finished, the greater the benefit to the company, so Amap wanted the project completed in as short a time as possible. The main idea for meeting these challenges was to introduce efficient R&D tools, so that teams could collaborate within a standardized process and work more efficiently.
To this end, Amap first unified its tool platform and introduced MaxCompute. The blue parts of the figure below mark the business benefits that MaxCompute brought. Unifying specifications across a team of several hundred people is undoubtedly difficult, and MaxCompute provides standardization modules such as a coding specification, a scheduling configuration specification and an R&D self-test specification. The coding specification module uses the SQL Scan tool to check automatically whether code conforms to the specification, and the scheduling configuration module provides a thorough user manual and various templates to help developers complete their configuration. A unified process requires customized management of the data development workflow, including R&D testing, developer self-testing, scheduling testing, QA testing and final production deployment. Unified modeling and language, and unified data approval standards, are also very important.
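To make the idea of an automated coding-specification check concrete, here is a minimal sketch of a rule-based SQL scanner. It is not the SQL Scan tool itself: the rule names, the messages and the assumption that tables are partitioned by a column called `ds` are all hypothetical, chosen only to illustrate the technique.

```python
import re

# Hypothetical pattern rules: (rule_id, compiled pattern, message).
PATTERN_RULES = [
    (
        "no-select-star",
        re.compile(r"\bselect\s+\*", re.IGNORECASE),
        "list columns explicitly instead of SELECT *",
    ),
]


def scan_sql(sql, partition_column="ds"):
    """Return a list of (rule_id, message) findings for one SQL statement.

    partition_column is an assumed naming convention, not a platform rule.
    """
    findings = []
    for rule_id, pattern, message in PATTERN_RULES:
        if pattern.search(sql):
            findings.append((rule_id, message))

    # Flag statements that read from a table without filtering on the
    # partition column, since they may scan every partition.
    reads_table = re.search(r"\bfrom\b", sql, re.IGNORECASE)
    filter_pattern = (
        r"\bwhere\b.*\b" + re.escape(partition_column)
        + r"\b\s*(=|>=|<=|>|<|in\b|between\b)"
    )
    has_filter = re.search(filter_pattern, sql, re.IGNORECASE | re.DOTALL)
    if reads_table and not has_filter:
        findings.append(
            ("missing-partition-filter",
             "add a WHERE condition on " + partition_column)
        )
    return findings


if __name__ == "__main__":
    for finding in scan_sql("SELECT * FROM dwd_track_clean"):
        print(finding)
```

Regular expressions are enough for a sketch, but a production checker would more plausibly work on a parsed syntax tree, since patterns like these can be fooled by comments and string literals.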
Alibaba Cloud provides several tools that help build a standardized process. First, a data lineage visualization tool helps the data development team trace a dataset's source, upstream and downstream promptly. Second, development and test workflows can run in parallel, which supports sound collaborative development and efficient operation. Third, a cloud-based code version management tool allows code changes and code management status to be viewed in real time, and supports rollback. Fourth, a one-click data exploration tool lets data developers, through simple configuration, inspect the null rate and valid-value rate of fields, the duplication rate of tables and other properties of massive datasets, which greatly improves their efficiency.
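The metrics named for the data exploration tool can be illustrated with a small profiling function. This is only a sketch under stated assumptions: the exact definitions the tool uses are not given in the article, so here the null rate is the share of rows where the field is missing, the valid-value rate is the share of rows whose value passes a caller-supplied check, and the duplication rate is the share of rows that repeat an earlier row exactly. The column names are hypothetical.

```python
def profile(rows, column, is_valid=lambda value: True):
    """Compute simple quality metrics for one column of a list of dict rows.

    The metric definitions are assumptions made for this sketch.
    """
    total = len(rows)
    if total == 0:
        return {"null_rate": 0.0, "valid_rate": 0.0, "duplicate_rate": 0.0}

    values = [row.get(column) for row in rows]
    non_null = [value for value in values if value is not None]
    valid = [value for value in non_null if is_valid(value)]

    # Rows are compared as whole records; keys are sorted so that dicts
    # with the same content in a different order count as equal.
    distinct_rows = {tuple(sorted(row.items())) for row in rows}

    return {
        "null_rate": (total - len(non_null)) / total,
        "valid_rate": len(valid) / total,
        "duplicate_rate": (total - len(distinct_rows)) / total,
    }


if __name__ == "__main__":
    sample = [
        {"segment_id": 1, "speed_kmh": 42.0},
        {"segment_id": 2, "speed_kmh": None},
        {"segment_id": 3, "speed_kmh": -5.0},
        {"segment_id": 1, "speed_kmh": 42.0},
    ]
    print(profile(sample, "speed_kmh", lambda v: 0 <= v <= 200))
```

On tables at the scale described here, the same three ratios would be computed with aggregate queries rather than in memory; the sketch only pins down what is being measured.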
With a standardized process and these efficiency tools, Amap completed the development of the Rubik's Cube project within the specified time, and the result was well received. Amap finally unified its data warehouses, reducing the monthly growth rate of its roughly 100 PB of internal data by 40% while improving data computing efficiency by 30%. Even under the traffic surge of the 2018 National Day holiday, Amap still met the "5 o'clock output" goal for core data (the core data calculation task needs to be completed from 5:00 to 7:00).
Elasticity-National Day holiday
During the National Day holiday in 2018, Amap's data processing load grew rapidly with the business, which severely tested both the performance of data computing tasks and the stability of the platform.
The data lineage visualization tool lets data developers view the system's resource configuration visually. The following figure shows the actual compute utilization level ("water level") of Amap's system on September 2, 2018, where the blue line is the system quota and the yellow line is the actual utilization. The elastic computing power provided by Alibaba Cloud guarantees normal computation and output as long as data volume fluctuates within a certain range. Alibaba Cloud also provides a stable computing environment, keeping computing tasks running efficiently while avoiding resource contention.
To make better use of computing resources, the Amap team proposed a plan of "raising the blue line and breaking up the yellow line": applying for a larger cluster resource quota to increase computing headroom, and spreading out actual resource usage through off-peak scheduling. For capacity expansion, MaxCompute gives Amap a one-click resource scaling capability, so that cluster expansion can be completed within hours. Finally, Amap also optimized its computations and arranged on-call staff as a further safeguard.
The following figure shows Amap's compute utilization on October 2, 2018. The system quota represented by the blue line is much higher than on September 2, indicating that the cluster expansion completed successfully. The actual resource usage represented by the yellow line is now entirely below the blue line, which better protects the computing tasks. The yellow peaks are also visibly spread out, and some important but non-core data tasks were rescheduled to 7:00, indicating that off-peak scheduling of computing resources also succeeded.
The one-click operations and maintenance scheduling tool provided by Alibaba Cloud makes off-peak scheduling easy and saves manpower. The elasticity that MaxCompute brought enabled Amap to achieve the remarkable result of a "3 o'clock output" for core data on October 2, 2018.
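The "raise the blue line, break up the yellow line" idea can be sketched as a small greedy scheduler: core tasks keep their hours, while flexible (non-core) tasks are pushed to later hours whenever running them at the preferred hour would exceed the quota. This is an illustrative sketch, not Amap's or MaxCompute's scheduler; the task names, load units and quota are hypothetical.

```python
def schedule_off_peak(core_load, flexible_tasks, quota):
    """Place flexible tasks so that hourly load stays under quota if possible.

    core_load: dict mapping hour -> units used by core tasks (fixed).
    flexible_tasks: list of (name, preferred_hour, units, latest_hour).
    Returns (plan, load): plan maps task name -> chosen hour, load maps
    hour -> total units after placement.
    """
    load = dict(core_load)
    plan = {}
    # Place the largest tasks first, while the most headroom is available.
    for name, preferred, units, latest in sorted(
        flexible_tasks, key=lambda task: -task[2]
    ):
        candidates = range(preferred, latest + 1)
        fitting = [h for h in candidates if load.get(h, 0) + units <= quota]
        if fitting:
            hour = fitting[0]  # earliest hour that stays under quota
        else:
            # Nothing fits: fall back to the least-loaded allowed hour.
            hour = min(candidates, key=lambda h: load.get(h, 0))
        plan[name] = hour
        load[hour] = load.get(hour, 0) + units
    return plan, load


if __name__ == "__main__":
    core = {1: 80, 2: 90, 3: 60, 4: 30, 5: 20, 6: 10, 7: 10}
    flexible = [
        ("report_a", 2, 30, 7),
        ("report_b", 2, 20, 7),
        ("report_c", 1, 15, 7),
    ]
    plan, load = schedule_off_peak(core, flexible, quota=100)
    print(plan)
    print(max(load.values()))
```

Raising `quota` corresponds to lifting the blue line, and the placement loop corresponds to breaking up the yellow peaks; the two measures are independent and can be combined, as the article describes.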
Business results
The following picture shows a road network coverage map of China. East, North and South China have essentially full road network coverage, while in more remote areas such as the west the road network is less developed and many roads are still under construction. Road network coverage is very important for Amap, which needs to discover new and expired roads automatically at as little cost as possible.
Amap combines its track data assets with its map construction capabilities to build a trajectory heat map, and then uses the existing road network together with data mining algorithms to find new and expired roads automatically. Amap also combines regional traffic flow with user-reported events to detect road closures and traffic incidents dynamically, further improving road network mining.
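One simple way to picture this comparison of a trajectory heat map against the existing road network is a grid: cells with many trajectory points but no known road are candidates for new roads, and road cells with no trajectory points are candidates for expired roads. The sketch below assumes this grid approach purely for illustration; the article does not describe Amap's actual algorithms, and the cell size, threshold and coordinates are hypothetical.

```python
import math
from collections import Counter


def to_cell(lon, lat, cell_size=0.001):
    """Map a coordinate to an integer grid cell (cell_size in degrees)."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def road_changes(track_points, road_cells, hot_threshold=3):
    """Compare trajectory heat with known road cells.

    track_points: iterable of (lon, lat) trajectory samples.
    road_cells: set of grid cells covered by the existing road network.
    Returns (new_candidates, expired_candidates) as sets of cells.
    """
    heat = Counter(to_cell(lon, lat) for lon, lat in track_points)
    new_candidates = {
        cell for cell, count in heat.items()
        if count >= hot_threshold and cell not in road_cells
    }
    # A Counter returns 0 for cells that never appeared in the tracks.
    expired_candidates = {cell for cell in road_cells if heat[cell] == 0}
    return new_candidates, expired_candidates


if __name__ == "__main__":
    known_roads = {(116407, 39904), (116500, 39950)}
    points = [(116.4075, 39.9045)] * 2 + [(116.4205, 39.9105)] * 3
    print(road_changes(points, known_roads))
```

Candidates produced this way would still need confirmation, for example from the user-reported events and traffic flow signals mentioned above, before the map is changed.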
Road condition forecasting is another important business for Amap: predicting road traffic conditions, congestion and so on in real time. The chart on the left shows Amap's forecast, from morning to night, of the average speed on the road section shown on the right. The red line is the historical average accumulated from data, the blue line is the model's predicted value, and the black line is the actual value. The blue and black lines largely coincide, which demonstrates Amap's applied data mining capability and what the unified data warehouse has made possible.
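How closely the blue line follows the black line can be quantified with an error metric such as mean absolute error (MAE), computed for both the model prediction and the historical average against the actual values. The sketch below shows the calculation; the speed values in the usage example are made up for illustration and are not Amap data.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(actual)


if __name__ == "__main__":
    # Hypothetical average speeds (km/h) at four times of day.
    actual = [30, 20, 40, 50]
    model = [32, 19, 41, 48]
    historical = [35, 30, 35, 40]
    print(mean_absolute_error(model, actual))
    print(mean_absolute_error(historical, actual))
```

A model is only adding value if its error is clearly lower than that of the historical-average baseline, which is the comparison the chart makes visually.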
Amap has also opened a city-level data product to all Internet users, allowing them to view data such as urban road congestion and city congestion indexes at any time on the http://report.amap.com/ page.
That is what the editor has to share about the application of MaxCompute to big data at Amap. If you have similar questions, the analysis above may serve as a reference.
© 2024 shulou.com SLNews company. All rights reserved.