Xie Yuchen is a big data platform architect at Meituan. At QCon Beijing 2016 he shared how the big data platform was built as a whole, covering who builds such a platform and how the various technologies are applied, rather than focusing on a single point, hoping to offer the audience some insight into big data.
Thank you very much for giving me the opportunity to deliver this talk. I joined Meituan in 2011 and was first responsible for building statistical reports and the data warehouse. In 2012 I pushed to make the data warehouse distributed, moving computation onto Hadoop; I then put the data development process onto a platform, and took charge of the offline platform team in 2014.
What I will cover today falls into four parts: first, the architecture of Meituan's big data platform; then a review of its history, seeing how the platform evolved over time and how each step was taken; next, some challenges and how we responded; and finally, a summary of my views on platforms.
1. The architecture of Meituan's big data platform
1.1 Overall architecture:
The picture above is the organization chart of Meituan's data system. Each vertical bar is a business line's data development team; below them is my team, the basic data platform team. At the bottom we rely on infrastructure provided by Meituan Cloud: virtual machines, physical machines, and data centers. At the same time, we have also helped Meituan Cloud explore big data cloud services.
1.2 Data flow architecture:
Next, I would like to introduce the Meituan data platform from the perspective of data flow, which is the diagram that best reflects the real architecture. On the far left, data flows from the business systems into the platform, and then into real-time computing and offline processing.
At the bottom of this flow sits the data development platform. This diagram is fairly detailed; it shows our overall data flow architecture, with data access on the far left, streaming computing above, and Hadoop offline computing after that.
Expanding the upper left corner of the diagram above, the first parts are data access and streaming computing. The data generated by the e-commerce system falls into two categories: append-only log data and relational (dimension) data. For the former we use a log collection system built on Flume, which is fairly standard and widely used. For the latter we recently adopted Alibaba's open-source Canal to capture changes, with three downstream consumers. All streaming data flows out through Kafka.
Data collection features:
On the data collection platform, log data comes in through multiple interfaces: you can append to a file and have the file tailed, or update a database table. Relational data is collected incrementally based on the Binlog; when building a data warehouse over a large number of relational databases, some changes cannot otherwise be detected, and Binlog capture solves this. Downstream distribution is centralized through Kafka message queues. We currently support more than 850 log types, with a peak of millions of messages ingested per second.
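As an illustration of this pattern, here is a minimal sketch of a downstream consumer. It assumes that Binlog change events produced by Canal land in Kafka as JSON and uses the kafka-python client; the topic name, message format, and consumer group are hypothetical, not Meituan's actual setup.

```python
import json
from kafka import KafkaConsumer  # assumes the kafka-python package

# Hypothetical topic that Canal-produced binlog changes are written to.
consumer = KafkaConsumer(
    "binlog.orders",
    bootstrap_servers=["kafka01:9092"],
    group_id="dw-incremental-loader",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    change = message.value
    # e.g. {"table": "orders", "type": "UPDATE", "before": {...}, "after": {...}}
    if change.get("type") in ("INSERT", "UPDATE", "DELETE"):
        # Apply the change to the warehouse's incremental staging area.
        print(change["table"], change["type"])
```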
Features of streaming computing platform:
When building the streaming computing platform, which is based on Storm, we took development complexity fully into account. There is an online development platform where testing and development are done; it provides encapsulation of common Storm application scenarios and a topology development framework. Because this is streaming computing, we also built delay statistics and alerting. The platform now supports more than 1,100 real-time topologies with second-level data flow delay.
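To make the delay monitoring concrete, here is a minimal sketch of one way such a statistic could be computed: compare the event timestamp carried in each message with the processing time and alert when the lag exceeds a threshold. The function names and the threshold are illustrative assumptions, not the platform's real implementation.

```python
import time

ALERT_THRESHOLD_SECONDS = 60  # illustrative; the platform targets second-level delay

def send_alert(message):
    # Placeholder for the platform's alarm channel (IM, SMS, and so on).
    print("[ALERT]", message)

def record_delay(topology_name, event_timestamp_ms):
    """Compute processing lag for one message and alert if it is too large."""
    lag_seconds = time.time() - event_timestamp_ms / 1000.0
    if lag_seconds > ALERT_THRESHOLD_SECONDS:
        send_alert(f"{topology_name}: lag {lag_seconds:.1f}s exceeds threshold")
    return lag_seconds

# Example: a message produced 5 seconds ago.
record_delay("order-topology", (time.time() - 5) * 1000)
```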
Developers can configure parameter sets within the company and compile and debug their code on the platform. That concludes the real-time computing and data access part; the following is about offline computing.
Offline computing: our data warehouse and data applications are built on Hadoop. This mainly shows the layered design of the data warehouse, from raw data access to the base layer of the core warehouse, including fact tables, derived fact tables, and dimension tables, with aggregated results on top. On the far right are the data applications: mining and other usage scenarios, and above them the reports and analysis libraries built by each business line.
This diagram shows the deployment architecture of the offline data platform, with three basic services at the bottom: Yarn, HDFS, and HiveMeta. Different computing scenarios are served by different compute engines; for a new company, these are in fact architecture choices to make. Cloud Table is our own packaging of HBase. We use Hive to build the data warehouse, Spark for data mining and machine learning, and Presto to support ad-hoc queries, where users may write fairly complex SQL. Note the relationship here: Presto is not deployed on Yarn but alongside it, while Spark runs on Yarn. Hive currently still relies on MapReduce, and we are testing Hive on Tez for production.
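As a rough illustration of the combination described above (Spark running on Yarn while sharing the Hive metastore with the warehouse), here is a minimal PySpark sketch; the table name and query are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("order-mining-sketch")
    .master("yarn")            # Spark jobs run on the Yarn cluster
    .enableHiveSupport()       # reuse HiveMeta so warehouse tables are visible
    .getOrCreate()
)

# Hypothetical warehouse table and aggregation.
daily_orders = spark.sql("""
    SELECT dt, city_id, COUNT(*) AS order_cnt
    FROM dw.fact_order
    WHERE dt = '2016-04-20'
    GROUP BY dt, city_id
""")
daily_orders.show(10)
```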
Offline computing platform features:
At present we have 42 PB+ of total storage, about 150,000 MapReduce and Spark jobs per day, and roughly 2,500 nodes, with deployment across three data centers. The warehouse, which I will introduce later, has about 16,000 data tables in total, which is quite complex.
1.3 Data management system:
Characteristics of the data management system:
Let me briefly describe the data management system, which provides the operational experience mainly for data developers. It includes a self-developed scheduling system, data quality monitoring, resource management and task auditing, and a development and configuration center. All of these sit in the data management system and will later be integrated into the overall data development platform.
We have implemented several key capabilities in the data management system.
The first is automatic dependency parsing between ETL tasks, based on SQL parsing.
The second is cost accounting for each business line, based on a resource reservation model: overall resources all go into Yarn, and each business line has committed (guaranteed) resources, some flexible headroom, and a budget.
The third, which is the focus of our work, is registering SLA guarantees for key tasks and monitoring the quality and timeliness of data content.
This shows the parsed dependencies: the red node is a selected task, with its chain of upstream tasks. This is also where our resource management system lives; it can analyse the resource usage of every task over time, aggregate it, and do cost accounting for each business line.
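A simplified sketch of the SQL-based dependency parsing mentioned above follows: pull the target table out of an INSERT statement and the source tables out of FROM/JOIN clauses, then link tasks whose sources match another task's target. The real system presumably uses a proper SQL parser; the regular expressions and table names here are only illustrative.

```python
import re

INSERT_RE = re.compile(r"insert\s+(?:overwrite|into)\s+table\s+([\w.]+)", re.I)
SOURCE_RE = re.compile(r"(?:from|join)\s+([\w.]+)", re.I)

def parse_etl_sql(sql):
    """Return (target_table, source_tables) for one ETL statement."""
    targets = INSERT_RE.findall(sql)
    sources = set(SOURCE_RE.findall(sql))
    return (targets[0] if targets else None), sources - set(targets)

sql = """
INSERT OVERWRITE TABLE dw.fact_order
SELECT o.id, o.user_id, d.city_name
FROM ods.orders o JOIN dim.city d ON o.city_id = d.id
"""
print(parse_etl_sql(sql))
# ('dw.fact_order', {'ods.orders', 'dim.city'})  -- one dependency edge per source table
```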
This is the data quality management center. The screenshot is rather small, but it lets you write simple SQL to monitor whether the results in a table match business expectations. Below that is data management: as just mentioned, key data tables have SLA tracking, and daily reports are issued regularly so we can watch for changes in their completion time.
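The quality rules described above can be pictured as something like the sketch below: each rule runs a simple metric SQL against a result table and compares the value with configured bounds. The rule definition, the bounds, and the run_query placeholder are assumptions for illustration.

```python
def run_query(sql):
    # Placeholder for submitting the SQL to Hive/Presto and returning one number.
    raise NotImplementedError

RULES = [
    {
        "table": "dw.fact_order",
        "sql": "SELECT COUNT(*) FROM dw.fact_order WHERE dt = '2016-04-20'",
        "min": 1_000_000,   # illustrative expected daily row count
        "max": 5_000_000,
    },
]

def check_rule(rule):
    value = run_query(rule["sql"])
    ok = rule["min"] <= value <= rule["max"]
    if not ok:
        print(f"[QUALITY ALERT] {rule['table']}: {value} outside "
              f"[{rule['min']}, {rule['max']}]")
    return ok
```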
1.4 BI products:
Above is the BI product layer, the data application scenarios. Queries are mainly served by a query center that fronts the Hive, MySQL, Presto, and Kylin engines, where we do SQL parsing and routing. In front of that are a series of BI products, most of them self-developed: a self-service query tool for users who can write SQL directly, analytical data products for looking at a given indicator over a period of time, and a data product for executives.
There is also an indicator extraction tool, similar in design to commercial online analytical front ends: you select dimension ranges and calculation calibres, and there is corresponding management of dimensions and calibres. When a single data table is not enough, there are dashboards.
We also developed the Star Sky display center, where you can configure pie charts, line charts, and bar charts based on the indicator extraction results, arrange them by drag and drop, and finally assemble a dashboard.
2. Platform evolution timeline
2.1 Platform development
Let's talk about the timeline of the data platform. When I joined Meituan in 2011, the company was only about a year old. At the beginning of 2011 our data statistics were mostly hand-written reports: for each requirement we would build a report page on top of the online data and write some tables. This brought serious problems. First, the work lived inside the internal information system, which was not a platform dedicated to data analysis; the system was shared with the business, isolation was very weak, and it was strongly coupled with the business. Every data requirement needed special development, and the development cycle was very long.
What did we do in this situation? We made a decision that has proved to be a good one to this day: rely heavily on SQL. We wrapped reporting tools around SQL and built ETL tools around SQL, mainly template tools at the SQL level that support time and other variables; external parameters are passed in and substituted into the SQL before execution.
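A minimal sketch of such a SQL template tool is shown below: the report or ETL SQL contains date variables, and the tool substitutes them from external parameters before execution. The ${...} variable syntax and the table names are illustrative assumptions.

```python
import datetime

def render_sql(template, run_date):
    """Replace date variables in a report/ETL SQL template."""
    values = {
        "date": run_date.strftime("%Y-%m-%d"),
        "yesterday": (run_date - datetime.timedelta(days=1)).strftime("%Y-%m-%d"),
    }
    sql = template
    for name, value in values.items():
        sql = sql.replace("${%s}" % name, value)
    return sql

template = """
SELECT dt, COUNT(*) AS order_cnt
FROM dw.fact_order
WHERE dt BETWEEN '${yesterday}' AND '${date}'
GROUP BY dt
"""
print(render_sql(template, datetime.date(2016, 4, 20)))
```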
In the second half of 2011 we introduced the concept of a data warehouse, combed through all the data flows, and designed the whole data system. After the warehouse was built we found that more and more ETL was being developed; ETL jobs depend on one another and managing them by hand was very costly, so we built a scheduling system ourselves. We also found the data volume growing so large that analysis on a single MySQL machine was no longer possible, so in 2012 we set up a Hadoop cluster of four machines, then a dozen or so, eventually growing to several thousand, which now supports all kinds of business use.
2.2 Latest developments
Another very important thing we did was the ETL development platform. Originally everything was managed in a Git repository, which was very costly to manage, and by then some business lines had begun to set up their own data development teams. We turned their whole development process into a platform so that each business line could build on it by itself. After that we ran into more and more business scenarios, especially real-time applications, so in 2014 we launched the real-time computing platform and changed relational table synchronization from full syncs to Binlog-based syncs. We were also relatively early in China to move to Yarn on Hadoop 2.0, which among other things made it easier to grow Spark. In addition, we deployed Hadoop across multiple data centers and clusters, and built OLAP support and synchronous development tools.
3. Recent challenges and responses
3.1 Hadoop across multiple data centers
Background of multi-data-center Hadoop:
The following focuses on three challenges and how we responded. The first is running Hadoop across multiple data centers. Why deploy Hadoop across data centers? Only Taobao had done this before. At the beginning of 2015 we were told that our existing data center could hold only about 500 nodes in total, and the data center we had planned to move into fell through because the provider defaulted. After negotiation, the new offline data center could only be delivered in September, while we needed 1,000 compute nodes by June 2015 and 1,500 by December, so capacity was certainly not enough. We considered splitting, but the business is tightly coupled, a quick split could not support rapid growth, and splitting the data warehouse would bring data copying and transmission costs. So at that point the only option was to deploy Hadoop across multiple data centers.
We thought about it: why can't Hadoop be deployed across data centers?
Actually, there are only two problems.
One is that cross-data-center bandwidth is small and relatively expensive: tens of gigabits, perhaps over a hundred, but still less than the core switching capacity inside a data center. The other is that Hadoop is a distributed system by nature: once it spans nodes in different data centers, cross-data-center traffic is unavoidable.
We combed through how Hadoop runs; there are basically three kinds of cross-node data flow.
The first is network traffic between Containers within an application, that is, communication between a task's Containers; the most typical case is between Map and Reduce.
The second is non-local DataNode reads. If the data being read lives in another data center, the read crosses data centers and consumes a lot of bandwidth.
The third is the three-replica pipeline built when writing data, which may span data centers and generate a lot of traffic.
Multi-data-center Hadoop architecture decisions:
Given the pressure at the time, we chose a multi-data-center plan without splitting NameSpaces first, which differs from Taobao's approach. Each node carries an attribute indicating which data center it belongs to, maintained basically by network segment. For the first problem above, our solution was to tag Yarn queues with a data-center label so that tasks in a given queue only run in a certain data center; this required modifying the Yarn FairScheduler code.
The second change was to modify HDFS's addBlock policy to return only the DataNode list of the data center where the client is located, so the write pipeline never crosses data centers, and reads also prefer the client's data center. Other scenarios still cross data centers; for example, the Balancer also migrates data between nodes (the Balancer talks to DataNodes directly, so that channel exists). In the end we also built our own Block file distribution tool.
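The two changes above can be pictured with the small conceptual sketch below (pure Python, not the actual Yarn/HDFS patches): each node is mapped to a data center by its network segment, and the write pipeline is restricted to the client's data center. The network segments and the replica count are hypothetical.

```python
import ipaddress

# Hypothetical mapping from network segment to data center.
DATACENTER_SEGMENTS = {
    "dc_offline": ipaddress.ip_network("10.1.0.0/16"),
    "dc_online": ipaddress.ip_network("10.2.0.0/16"),
}

def datacenter_of(host_ip):
    """Map a node's IP address to its data center by network segment."""
    addr = ipaddress.ip_address(host_ip)
    for name, segment in DATACENTER_SEGMENTS.items():
        if addr in segment:
            return name
    return "unknown"

def choose_pipeline(client_ip, candidate_datanodes, replicas=3):
    """Return a write pipeline confined to the client's own data center."""
    dc = datacenter_of(client_ip)
    local = [node for node in candidate_datanodes if datacenter_of(node) == dc]
    return local[:replicas]

print(choose_pipeline("10.1.3.7", ["10.1.5.1", "10.2.5.2", "10.1.5.3", "10.1.5.4"]))
# ['10.1.5.1', '10.1.5.3', '10.1.5.4']
```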
Multi-data-center Hadoop results:
In terms of results, the node count went from 300+ in March 2015 to 2,400+ in March 2016, and the different sections in the middle of the chart show the number of nodes carried by each data center at each point in time. Today we are back to a single data center: the whole multi-data-center solution was designed for a temporary state, so we used the Balancer-module interface of our tool to move all the data into one large offline data center.
Multi-data-center Hadoop architecture considerations:
In designing this architecture we mainly considered three things. First, code changes should be small, because our team did not yet have deep control over the Hadoop codebase and we wanted the impact of our changes on Hadoop's native logic to be controllable. Second, we needed to develop quickly and solve the shortage of node resources first. Third, the whole migration process had to be completely transparent: as long as a task's blocks are distributed to the data center we want the task to move to before its data is read, nothing else changes.
3.2 Task hosting and interactive development
Task hosting and interactive development background:
Our original approach was to distribute open-source, native Hadoop and Spark clients to the business lines.
Developers wrote and compiled code on their own machines, then copied it to online execution nodes, because those nodes hold the required credentials.
When deploying a new execution node, we had to apply for and allocate virtual machines, keys, and clients, which was very expensive to manage.
And when a team shares one virtual machine for development, sooner or later a task fills up the machine's memory; that problem also had to be solved.
Also, as we rolled out Spark we kept providing business teams with Spark technical support. When someone's code failed to compile or run, they often did not have strong debugging skills, and when we helped them debug we could not immediately get their build environment, execution environment, or compiled code, so the communication cost was very high. At the same time, while Spark development is very efficient, its learning cost is also high. So what did we do?
Task hosting and interactive development architecture decisions:
To address these problems, including the high learning cost, we did two things.
One is the task hosting platform, where task code is compiled, packaged, executed, tested, and finally run in production, all managed on a single platform.
The other is promoting interactive development tools. At the time we surveyed IPython Notebook + Spark and Zeppelin, and finally chose Zeppelin, which we felt was more mature. We did further development on it, fixed a series of bugs, and added login authentication. The result is that with the task hosting platform you write code locally and submit it to a company-hosted repository, then submit the task through the platform interface, where it is tested and executed uniformly; finally the same configuration can be handed over to the self-developed scheduling system mentioned earlier.
Interactive development may still require some secondary development at present, but it is worth trying. Business lines mainly use it in two scenarios. The first is exploratory data analysis: the ad-hoc SQL query interface we provide does not always meet their needs, some of the data they want to examine is too complex for SQL, and having to compile a Spark job each time, or manage Spark by hand, is very unintuitive.
The second is that some Spark pioneers have already written Spark applications. How can other colleagues see, learn from, and understand those applications and build their own scenarios on them? A platform that shows the code, the results, and the corresponding visualizations together also solves this interaction problem.
3.3 The OLAP engine
Requirements for the OLAP engine:
Finally, let's talk about our exploration of the OLAP engine. Around the end of 2015 we began to pay attention to business data marts: data volumes were already very large, and the number of dimensions and the size and complexity of tables were growing very fast. The business teams were struggling, and supporting these marts on MySQL and HBase required special-case work. We surveyed the requirements: roughly, the engine should support fact tables on the order of tens of millions of rows, fewer than 50 data cubes, and fewer than 20 dimensions per cube.
On the query side, the data marts are generally used by sales management teams to check performance, so latency requirements are high: our target at the time was TP99 under 3 seconds, that is, 99% of queries returning in under 3 seconds.
Queries aggregate over multiple dimension combinations, because analysts need to drill up and down through the business.
Another feature is that de-duplication metrics must be exact: performance-related indicators such as group-buy order counts or repeat-visit user counts will affect performance budgets if they deviate.
Candidate solutions for the OLAP engine:
We considered the solutions available in the industry at the time.
One is what we originally recommended: Presto, Hive, or Spark on ORCFile, which was the earliest approach.
Another is a solution an advanced business line already used: based on Hive's grouping sets feature, aggregates are computed for different dimension combinations and combined into one large table, which is imported into HBase, with secondary indexes built as needed (a sketch of this pre-aggregation pattern follows this list of options). In practice it still has some bottlenecks.
There are also Druid, Elasticsearch, and Kylin, which had emerged in the community. Faced with these options, and considering stability, maturity, our team's ability to control the product, and community activity, we gave priority to Kylin; our team has two Kylin contributors.
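Before moving on, here is a sketch of the grouping-sets pre-aggregation option above: generate one Hive query that materializes aggregates for several dimension combinations in a single pass, before loading the result into HBase. The table and column names are hypothetical.

```python
from itertools import combinations

def grouping_sets_sql(fact_table, dimensions, measure):
    """Build a Hive GROUPING SETS query covering all dimension combinations."""
    sets = []
    for r in range(len(dimensions), 0, -1):
        sets += ["(%s)" % ", ".join(combo) for combo in combinations(dimensions, r)]
    return (
        "SELECT {dims}, SUM({m}) AS total\n"
        "FROM {t}\n"
        "GROUP BY {dims}\n"
        "GROUPING SETS ({sets})"
    ).format(dims=", ".join(dimensions), m=measure, t=fact_table, sets=", ".join(sets))

print(grouping_sets_sql("dw.fact_order", ["dt", "city_id", "category_id"], "amount"))
```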
How we evaluated OLAP engines:
With so many candidate solutions, how could we be sure the one we chose was reliable? We built a Star Schema Benchmark (derived from TPC-H) to construct the OLAP scenario and test data, and used the same schema and data to test the performance and functionality of the different engines. During the rollout we kept sharing our research and stress-testing progress, collected the actual business scenario requirements first, and fed them back to refine the data mart requirements so they fit the business lines better. The figure below shows Kylin's interface.
Specifically, Kylin provides an interface where you declare your dimensions, facts, and measures, and how those measures are aggregated. It generates MapReduce jobs, and the results are pre-computed according to the design and imported into HBase. It also provides a SQL engine that translates queries into HBase lookups to fetch the results. On the whole it is quite mature.
This is the Star Schema Benchmark: one big fact table with many dimension tables hanging off it. We generated data at several different scales, and also used real data.
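The queries used against each engine look roughly like the illustrative SSB-style query below: one large fact table joined to several dimension tables with a grouped aggregation. The table and column names loosely follow the public SSB schema, and the runner is only a placeholder.

```python
SSB_STYLE_QUERY = """
SELECT d_year, p_brand, SUM(lo_revenue) AS revenue
FROM lineorder
JOIN dates    ON lo_orderdate = d_datekey
JOIN part     ON lo_partkey   = p_partkey
JOIN supplier ON lo_suppkey   = s_suppkey
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand
"""

def run_on_engine(engine_name, sql):
    # Placeholder: submit the SQL to Hive / Presto / Kylin and record the latency.
    raise NotImplementedError(engine_name)
```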
The current progress of the OLAP engine:
As for current progress, we have completed tests of Presto, Kylin 1.3, Kylin 1.5, and Druid. Druid does perform better than Kylin in some respects, but it targets special scenarios and does not natively support a SQL interface, so we will not use it heavily.
We use Kylin to support seven data cubes for a BI project. Each cube is basically one fact table with a set of dimensions, serving analysis in a particular scenario.
The business development cycle for building a set of aggregate tables, combing the aggregation results, and maintaining them has been shortened from seven days to one day.
There are 300 million rows of data running online; TP95 query response time is under 1 second and TP99 under 3 seconds, supporting about 20,000 queries per day from the delivery team. Since this is used by the takeout sales team, the query volume is very large.
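As a small aside on how figures like TP95 and TP99 are read, the sketch below computes them from a list of query response times using the nearest-rank method; the sample latencies are made up.

```python
def percentile(latencies, p):
    """Nearest-rank percentile: p is between 0 and 100."""
    ordered = sorted(latencies)
    rank = max(1, int(round(p / 100.0 * len(ordered))))
    return ordered[rank - 1]

sample = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 2.8]  # seconds, made up
print("TP95 =", percentile(sample, 95), "TP99 =", percentile(sample, 99))
```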
4. Summary: thoughts on the platform
4.1 The value of the platform:
Finally, after doing this for many years, how do I think about the data platform? For any platform team, data platform or not, the core values are really these three.
The first is doing repetitive things only once: the platform team does a thing well, specializes in it, and does it once, reducing duplicated investment.
The second is unification: the platform can push standards and data management models, reducing the cost of integration between businesses, which is a great value of the platform.
The most important is being accountable for the overall efficiency of the business, including development efficiency, iteration efficiency, the efficiency of maintaining and operating data pipelines, and overall resource utilization. The platform team is accountable to the business teams: whatever we push, we should first consider their business costs from the business point of view.
4.2 Development of the platform:
What does it take to grow into a good platform?
As I see it, there are three points:
First, supporting the business is the top priority; without the business, our platform cannot keep developing.
Second, walk with the advanced business lines, assisting them and accumulating technology. In a so-called platform company with multiple business lines, even when each line is fairly independent, some lines are pioneers with strong engineering capability. Our goal is to follow these leading business lines: walking with them, we assist them and help solve a series of problems, for example when they have sudden business needs.
Third, establish norms and use the accumulated technology to support later-developing businesses. In the process of moving forward with the pioneers, experience, technology, solutions, and norms gradually settle down. For newly built or slower-growing business lines, our basic strategy is to set a series of specifications and use what we accumulated with the priority business lines to support them; developing features this way also maintains the platform team's understanding of the business.
4.3 On open source:
Finally, a word about open source. As mentioned, we make improvements and refactor some open-source components, while using others directly, such as Zeppelin and Kylin.
Our strategy is to keep paying attention and, in effect, do forward-looking research on behalf of the business lines. Their teams look at data and industry news every day and will ask how to adopt some new project; whether or not they push us, we need to keep watching, design a series of research plans, and help these businesses do the evaluation. That way the same work is done only once.
For common fixes, especially bug patches, we keep an internal table for sharing issues, and it already contains dozens of patches. Large selective refactoring comes last; above all we emphasize starting from business needs, making selections and trade-offs rationally, and finally arriving at a plan that can be implemented reliably. That is all for my sharing. Thank you.