Article | Pan Guoqing, head of the real-time computing platform, Ctrip Big Data Platform
This article describes the architecture and practice of Ctrip's real-time computing platform in five parts: an overview of the Ctrip big data platform, architecture design and implementation, pitfalls encountered and how they were fixed, detailed real-time computing application scenarios, and future plans. We hope it can serve as a reference for companies and engineers who need to build a real-time data platform.
I. Overall architecture of the Ctrip big data platform
The Ctrip big data platform is structured in three layers:
Application layer: the development platform Zeus (comprising the scheduling system, the DataX data transmission system, the master data system and the data quality system), the query platform (ArtNova report system, Adhoc queries), machine learning (built on open-source frameworks such as TensorFlow and Spark, plus a GPU cloud platform based on K8S), and the real-time computing platform Muise.
Middle layer: built on open-source big data infrastructure, divided into distributed storage/computing frameworks and real-time computing frameworks.
The offline side is based on Hadoop: distributed storage on HDFS, distributed offline computing on Hive and Spark, KV storage on HBase, and Presto and Kylin for Adhoc queries and the report system.
The real-time computing frameworks sit on top of message queues: Hermes, Ctrip's system built on Kafka, and QMQ, a message queue developed in-house at Ctrip. QMQ is mainly used in the order and transaction systems, where it guarantees 100% that no message is lost.
Bottom layer: resource and operations monitoring, divided into the automated operations system, big data infrastructure monitoring and big data business monitoring.
II. Architecture design and implementation
1. Introduction to the Muise platform
1) What is Muise
Muise, named after the Muses, the goddesses of the arts in Greek mythology, is Ctrip's platform for real-time data analysis and processing. Under the hood, Muise is built on message queues and the open-source stream processing systems JStorm, Spark Streaming and Flink, and supports streaming data processing with second-level or even millisecond-level latency.
2) Functions of Muise
Data sources: Hermes Kafka/MySQL, QMQ
Data processing: Muise provides JStorm/Spark/Flink Core APIs for consuming Hermes or QMQ data; under the hood the data is processed by JStorm, Spark or Flink, and the platform exposes its own encapsulated API to users. The API connects to all the data source systems, so users can consume them directly.
Job management: the Portal manages JStorm, Spark Streaming and Flink jobs, including creating jobs, uploading jar packages and releasing to production
Monitoring and alerting: uses the metric frameworks provided by JStorm, Spark and Flink and supports custom metrics; metric information is managed centrally and fed into Ops's monitoring and alerting system, providing comprehensive monitoring and alerting so that users learn of job problems as soon as they occur.
2. Current status of the Muise platform
Current status of the platform:
JStorm 2.1.1, Spark 2.0.1, Flink 1.6.0, Kafka 2.0
Cluster size:
13 clusters, 200+ machines: 150+ JStorm, 50+ Yarn, 100+ Kafka
Job size:
11 business lines, 350+ JStorm jobs, 120+ Spark Streaming/Flink jobs
Message size:
1300+ topics, 100 TB+ of incremental data per day, 200K TPS on average, 900K TPS at peak
Message delay:
Within 200 ms inside Hermes, within 20 ms inside Storm
Message processing success rate:
99.99%.
3. Evolution of the Muise platform
2015 Q2-2015 Q3: built the real-time computing platform on Storm
2016 Q1-2016 Q2: migrated from Storm to JStorm and introduced StreamCQL
2017 Q1-2017 Q2: Spark Streaming evaluation and onboarding
2017 Q3-2018 Q1: Flink evaluation and onboarding.
4. Muise platform architecture
1) Muise platform architecture
Application layer: at present Muise Portal mainly manages Storm and Spark Streaming jobs, supporting features such as creating jobs, releasing jar packages, and starting and stopping jobs.
Middle layer: encapsulates the underlying infrastructure and provides users with APIs based on Storm, Spark and Flink, together with various supporting services.
Bottom layer: Hermes & QMQ serve as data sources; Redis, HBase, HDFS, DB, etc. act as external data stores; Graphite, Grafana and ES are mainly used for monitoring.
2) Muise real-time computing process
Producer side: users first apply for a Kafka topic and then write data to Kafka in real time.
Muise Portal side: users develop against the API we provide; once development is done, they configure, upload and start the job through the Muise Portal. After a job starts, its jar package is distributed to the corresponding cluster and begins consuming Kafka data.
Storage side: after consumption, data can be written back to QMQ or Kafka, or stored in external systems such as Redis, HBase, HDFS/Hive and DB.
5. Platform design: ease of use
First of all, a platform must be easy to use. We provide a comprehensive Portal that makes it easy for users to build and manage their jobs, so that real-time jobs can be developed and put online as quickly as possible.
Second, we encapsulate a rich set of Core APIs and support multiple real-time computing frameworks (a sketch of what a job written against such an API might look like follows this list):
Support for Hermes Kafka/MySQL and QMQ
Integration with JStorm, Spark Streaming and Flink
Job resource control
DB, Redis, HBase and HDFS output components
Multiple built-in metrics, based on each framework's metric system, for job monitoring and alerting
User-defined custom metrics for monitoring and alerting
At Least Once and Exactly Once semantics are supported.
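As an illustration only, the sketch below shows the general shape such a wrapped Core API could take: the user supplies per-record business logic while the platform owns sources, sinks, resources and metrics. All names here (StreamJob, ErrorFilterJob) are hypothetical and are not Muise's actual interfaces.

```scala
// Hypothetical sketch only: illustrates the shape of a wrapped "Core API",
// not Muise's real interfaces. All names are invented for illustration.
object CoreApiSketch {

  // A user job only supplies per-record business logic; the platform owns
  // source/sink wiring, resource limits, metrics and delivery semantics.
  trait StreamJob[IN, OUT] {
    def process(record: IN): Seq[OUT]                     // business logic
    def onMetric(name: String, value: Long): Unit = ()    // optional custom metric hook
  }

  // Example user job: keep only error events from a stream of log lines.
  class ErrorFilterJob extends StreamJob[String, String] {
    override def process(record: String): Seq[String] =
      if (record.contains("ERROR")) Seq(record) else Seq.empty
  }

  def main(args: Array[String]): Unit = {
    // On the real platform the job would be submitted through the Portal;
    // here we just run the logic on a few sample records.
    val job = new ErrorFilterJob
    val sample = Seq("INFO ok", "ERROR timeout", "WARN slow")
    sample.flatMap(job.process).foreach(println)           // prints: ERROR timeout
  }
}
```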
Ease of use was covered above; the next topic is the platform's fault tolerance, which ensures data correctness.
6. Platform design: fault tolerance
JStorm: At Least Once guaranteed via the Acker mechanism
Spark Streaming: Exactly Once based on checkpointing, and At Least Once based on replaying Kafka offsets
Flink: Exactly Once based on Flink's two-phase commit combined with Kafka 0.11 transactional support.
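As a concrete illustration of the Flink path, the sketch below enables checkpointing and writes to Kafka with the transactional (two-phase commit) producer from the Kafka 0.11 connector. The broker address and topic are placeholders; this is a minimal sketch rather than Muise's actual job code.

```scala
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper

object ExactlyOnceSinkSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Two-phase commit is driven by checkpoints, so checkpointing must be enabled.
    env.enableCheckpointing(60000)

    val props = new Properties()
    props.setProperty("bootstrap.servers", "kafka-broker:9092")  // placeholder
    // The transaction timeout must cover the checkpoint interval and must not
    // exceed the broker's transaction.max.timeout.ms (15 minutes by default).
    props.setProperty("transaction.timeout.ms", "900000")

    val producer = new FlinkKafkaProducer011[String](
      "output-topic",                                            // placeholder topic
      new KeyedSerializationSchemaWrapper(new SimpleStringSchema()),
      props,
      FlinkKafkaProducer011.Semantic.EXACTLY_ONCE)               // transactional writes

    env.fromElements("a", "b", "c").addSink(producer)
    env.execute("exactly-once-sink-sketch")
  }
}
```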
7. Exactly Once
1) Direct Approach
Today most Spark Streaming users consume Kafka via the Direct Approach:
Advantages: the offsets consumed by each batch are recorded, so a job can be replayed from those offsets
Disadvantages: saving the data and saving the offsets are two separate, non-atomic steps:
the data is saved successfully but the application goes down before the offsets are saved (resulting in duplicate data)
the offsets are saved successfully but the application goes down before the data is saved (resulting in data loss)
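A minimal sketch of the Direct Approach with the spark-streaming-kafka-0-10 connector is shown below; broker, topic, group and output path are placeholders. Committing offsets only after the batch output has been written narrows the window described above, but because the two steps are still not atomic, duplicates remain possible after a crash.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object DirectApproachSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-approach-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka-broker:9092",   // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "muise-demo-group",    // placeholder
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("input-topic"), kafkaParams))

    stream.foreachRDD { rdd =>
      // Offsets of this batch, available because the RDD came from the direct stream.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // 1) write the batch out first ...
      rdd.map(_.value()).saveAsTextFile(s"/tmp/output-${System.currentTimeMillis()}")
      // 2) ... then commit offsets; a crash between the two steps re-reads the batch.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```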
2) CheckPoint
Advantages: by default the running state and source data of each batch are recorded, and after a crash the job can be recovered from the checkpoint directory.
Disadvantages:
1. Exactly Once is not 100% guaranteed:
https://www.iteblog.com/archives/1795 describes a scenario where Exactly Once cannot be guaranteed.
https://issues.apache.org/jira/browse/SPARK-17606 reports blocks being lost during doCheckPoint.
2. Enabling checkpointing adds extra performance overhead.
3. If the streaming job's logic changes, the job cannot be recovered from the checkpoint.
Applicable scenarios: more suitable for stateful computing scenarios
How to use: it is recommended that the application store offsets on its own. After an outage, if the Spark code logic has not changed, recreate the StreamingContext from the checkpoint directory; if it has changed, create a new context from the offsets the application stored itself and set up a new checkpoint directory.
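The sketch below illustrates the two recovery paths just described: rebuilding the context from the checkpoint directory when the code has not changed, and seeding a fresh context from offsets the application stored itself when it has. The checkpoint path, topic and stored offsets are placeholders, and the offset store itself (Redis, DB, etc.) is left out.

```scala
import org.apache.kafka.common.TopicPartition
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object CheckpointRecoverySketch {
  val checkpointDir = "/tmp/cp-demo"   // placeholder path

  // Path A: code unchanged -> rebuild the context from the checkpoint directory.
  // In a real job the full DStream DAG (sources, transforms, outputs) must be
  // defined inside createContext() for this to restore correctly.
  def recoverFromCheckpoint(): StreamingContext =
    StreamingContext.getOrCreate(checkpointDir, createContext _)

  // Path B: code changed -> build a fresh context and seed it with offsets the
  // application persisted itself (e.g. in Redis or a DB); values are placeholders.
  def recoverFromStoredOffsets(kafkaParams: Map[String, Object]): StreamingContext = {
    val ssc = createContext()
    val storedOffsets = Map(new TopicPartition("input-topic", 0) -> 12345L)
    KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("input-topic"), kafkaParams, storedOffsets))
    // attach the processing logic to the stream as usual before starting ssc
    ssc
  }

  def createContext(): StreamingContext = {
    val ssc = new StreamingContext(new SparkConf().setAppName("cp-sketch"), Seconds(10))
    ssc.checkpoint(checkpointDir)
    ssc
  }
}
```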
8. Platform design: monitoring and alerting
Helping users discover job problems as soon as they occur is the top priority.
Cluster monitoring
Server monitoring: memory, CPU, disk I/O and network I/O are covered
Platform monitoring: Ganglia
Job monitoring
The native metric systems of the real-time computing frameworks
Custom metrics that reflect job status
Both native and custom metrics are collected for monitoring and alerting
Storage: Graphite; display: Grafana; alerting: Appmon
Among the many metrics we have defined, the most common are:
Fail: the number of JStorm processing failures or failed Spark tasks within a fixed time window
Ack: the amount of data processed within a fixed time window
Lag: the delay between when data is produced and when it is consumed, within a fixed time window (with Kafka 2.0 this is based on the record's built-in bornTime).
Ctrip has built its own alerting system; metrics are fed into it and alerts are fired based on rules, while job monitoring dashboards are used to view the related indicators. Taking Flink as an example, the metrics we care about most are all written to Graphite and then displayed through Grafana. On the job dashboard we can directly see the Kafka-to-Flink delay (Lag), that is, the time from when data is produced until it is consumed by the Flink job; here it is 62 milliseconds, which is quite fast. We also monitor how long each fetch from Kafka takes: since data is pulled in small chunks, we set the fetch size to 2 MB per pull, and the dashboard shows an average pull latency of 25 milliseconds with a maximum of 760 milliseconds.
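As an illustration of how such a Lag metric could be wired into Flink's metric system (and from there into Graphite/Grafana), here is a minimal sketch; the Event type with its bornTime field is a hypothetical stand-in for whatever timestamp the record actually carries.

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.Gauge

// Hypothetical event type: payload plus the timestamp attached when the record was produced.
case class Event(payload: String, bornTime: Long)

// Reports "lag" = now - bornTime of the latest record through Flink's metric system,
// which the platform can then scrape into Graphite/Grafana and alert on.
class LagReportingMap extends RichMapFunction[Event, Event] {
  @transient private var lastLagMs: Long = 0L

  override def open(parameters: Configuration): Unit = {
    getRuntimeContext.getMetricGroup
      .gauge[Long, Gauge[Long]]("lag", new Gauge[Long] {
        override def getValue: Long = lastLagMs
      })
  }

  override def map(value: Event): Event = {
    lastLagMs = System.currentTimeMillis() - value.bornTime
    value
  }
}
```

The function would simply be applied with .map(new LagReportingMap) somewhere in the job's pipeline.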
Next, we cover some of the pitfalls we have run into over the past few years and how we dealt with them.
III. Pitfalls and fixes
Pit 1: Hermes UBT has a huge volume of data and very large messages, putting both the server and the client under great pressure.
Solution: provide unified splitting jobs that route data to different topics based on specific rules and configuration.
Pit 2: Kafka cannot guarantee global ordering
Solution: where strict global ordering is required, use a single partition; where only partial ordering is needed, hash on a chosen field so that order is preserved within each partition.
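A minimal sketch of the partial-ordering approach: keying each record by the field whose order matters (here a hypothetical orderId) so that Kafka's default partitioner hashes it to a fixed partition.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object KeyedOrderingSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "kafka-broker:9092")            // placeholder
    props.put("key.serializer", classOf[StringSerializer].getName)
    props.put("value.serializer", classOf[StringSerializer].getName)

    val producer = new KafkaProducer[String, String](props)
    val orderId = "order-42"                                        // hypothetical key field
    // Records sharing the same key hash to the same partition, so they stay in order
    // relative to each other even though the topic as a whole is not globally ordered.
    producer.send(new ProducerRecord("order-events", orderId, "CREATED"))
    producer.send(new ProducerRecord("order-events", orderId, "PAID"))
    producer.close()
  }
}
```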
Pit 3: Kafka cannot accurately rewind consumption to a specific point in time
Solution: the platform provides filtering that drops data earlier than the configured start time (since Kafka 0.10 every record carries its own timestamp, so this problem disappears after upgrading Kafka).
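A small sketch of such a filter, assuming the stream is the ConsumerRecord DStream produced by the Direct Approach sketch earlier; since Kafka 0.10 every record exposes timestamp(), which can be compared against the requested start time.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.DStream

object TimeFilterSketch {
  // Drops records produced before `startMillis`, relying on the timestamp Kafka
  // attaches to every record since 0.10. `stream` would typically be the direct
  // stream created in the earlier sketch.
  def filterSince(stream: DStream[ConsumerRecord[String, String]],
                  startMillis: Long): DStream[ConsumerRecord[String, String]] =
    stream.filter(record => record.timestamp() >= startMillis)
}
```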
Pit 4: initially, all of Ctrip's Spark Streaming and Flink jobs ran on the host group, a large Hadoop cluster that now carries thousands of jobs with offline and real-time workloads mixed together. Whenever a large offline job kicks in, real-time jobs are affected; in addition, the Hadoop cluster is upgraded regularly, and restarting the NameNode or NodeManagers would sometimes kill jobs.
Solution: deploy separately. We built a dedicated real-time cluster that runs real-time jobs independently: offline stays with offline, real-time with real-time. The real-time cluster runs Spark Streaming and Flink jobs on its own Yarn, while the offline cluster handles only offline jobs.
With separate deployment a new problem appears: some real-time jobs need the output of offline jobs for joins or feature lookups, so they still have to reach host-group data, which amounts to a cross-cluster access problem.
Pit 5: the real-time Hadoop cluster needs cross-cluster access to the host group
Solution: configure dual namespaces, ns-prod and ns, in hdfs-site.xml, pointing to the local cluster and the host group respectively
On the Spark side, configure spark.yarn.access.namenodes or spark.yarn.access.hadoopFileSystems
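On the Spark side this can be expressed as a plain configuration entry so that YARN obtains delegation tokens for both clusters. A sketch with placeholder namespace names matching the ns/ns-prod convention above; spark.yarn.access.hadoopFileSystems is the newer name for the same setting in later Spark versions.

```scala
import org.apache.spark.SparkConf

object CrossClusterConfSketch {
  // Placeholder namespaces mirroring the ns / ns-prod naming above.
  val conf = new SparkConf()
    .setAppName("cross-cluster-sketch")
    // Older Spark releases use spark.yarn.access.namenodes; newer ones use
    // spark.yarn.access.hadoopFileSystems. Both list the remote namespaces.
    .set("spark.yarn.access.namenodes", "hdfs://ns,hdfs://ns-prod")
}
```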
Pit 6: both JStorm and Storm suffer from CPU over-use: a large, CPU-heavy job may be allocated one worker and one CPU core yet end up consuming three or even four cores.
Solution: enable cgroups to limit CPU usage.
IV. Application scenarios
1. Real-time report statistics
Real-time report statistics and display is also a widely used Spark Streaming scenario. Data can be aggregated by Processing Time or by Event Time. Because consecutive Spark Streaming batches can themselves be viewed as a tumbling window, and a single window may contain data from multiple time periods, there are some limitations when aggregating by Event Time with Spark Streaming. The common approach is therefore to compute, within each batch, the incremental values for the different time dimensions and write them to an external system such as ES; when the report is rendered, a secondary aggregation by time yields the complete cumulative values. The figure below shows the real-time dashboard that Ctrip IBU built on Spark Streaming.
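A sketch of the per-batch aggregation pattern described above: each batch produces partial counts keyed by (dimension, time bucket) and hands them to an external store (ES in the text), where the secondary aggregation happens at report time. The Booking event, the queue-based stand-in source and the println sink are placeholders.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RealtimeReportSketch {
  // Hypothetical event: a booking with an event time and an order-type dimension.
  case class Booking(orderType: String, eventTimeMillis: Long)

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("report-sketch"), Seconds(30))

    // Stand-in source so the sketch is self-contained; in practice this would be
    // the Kafka direct stream shown earlier.
    val queue = mutable.Queue[RDD[Booking]]()
    val bookings = ssc.queueStream(queue)

    bookings
      // Bucket by (dimension, minute of event time) within the current batch.
      .map(b => ((b.orderType, b.eventTimeMillis / 60000 * 60000), 1L))
      .reduceByKey(_ + _)
      // Each batch emits partial counts; the report layer sums them again by time
      // (the "secondary aggregation") when the dashboard is rendered.
      .foreachRDD(_.foreach { case ((orderType, minute), count) =>
        println(s"$orderType $minute -> $count")   // stand-in for the ES write
      })

    ssc.start()
    ssc.awaitTermination()
  }
}
```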
2. Real-time data warehouse
1) Spark Streaming stores data in near real time
There are mature tools on the market, such as Camus and Flume, that consume data from Kafka in real time, filter and clean it, and finally land it in the corresponding storage system. Compared with these, the advantage of Spark Streaming is that it supports more complex processing logic, and resource scheduling on Yarn makes its resource allocation more flexible; users therefore use Spark Streaming to write data to HDFS or Hive in near real time.
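A small sketch of the near-real-time landing pattern: each micro-batch is written under a time-bucketed HDFS path that a Hive external table partitioned on the same layout can expose; the path scheme is a placeholder.

```scala
import org.apache.spark.streaming.dstream.DStream

object NearRealtimeLandingSketch {
  // Writes each micro-batch under a time-bucketed HDFS directory; a Hive external
  // table partitioned on the same path layout can expose the data for queries.
  def landToHdfs(cleaned: DStream[String], basePath: String): Unit =
    cleaned.foreachRDD { (rdd, batchTime) =>
      if (!rdd.isEmpty()) {
        val hourBucket = batchTime.milliseconds / 3600000 * 3600000
        rdd.saveAsTextFile(s"$basePath/dt=$hourBucket/batch=${batchTime.milliseconds}")
      }
    }
}
```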
2) Rule-based data quality checks
Based on Spark Streaming's custom metric capability, data quality is checked and monitored in terms of data volume, number of fields, data format and duplicate records.
3) Real-time alerting based on custom metrics
Rules are defined through our encapsulated metric registration system; every batch is checked against these rules, and the result is emitted through the metric sink, which then drives monitoring. We also use Flink to load TensorFlow models for real-time prediction; in terms of timeliness, an alert can usually be sent within two seconds of the data arriving, which gives users a very good experience.
V. Future planning
1. Flink on K8S
Ctrip runs several different computing workloads, including real-time computing, machine learning and offline computing, so a unified underlying framework is needed to manage them; Flink will therefore be migrated onto K8S for unified resource management.
2. Flink SQL on the Muise platform
Although the Muise platform already supports Flink, users still write code by hand. We have developed a real-time feature platform where users only need to write SQL, that is, Flink-based SQL, to compute the features they need for their models in real time. The real-time feature platform will later be merged with the real-time computing platform, so that eventually users only need to write SQL to implement all of their real-time jobs.
3. Fully enable cgroups for JStorm
For historical reasons many jobs still run on JStorm, where resource allocation is currently uneven, so cgroups will be fully enabled there as well.
4. Online model training
Some Ctrip departments need online model training in real time: a model is first trained with Spark, then loaded in Spark Streaming for real-time interception or control, applied in scenarios such as risk control.
-end-