Airbnb serves millions of guests and hosts in our community. Every second, their activity on Airbnb.com, such as searching, booking, and messaging, generates a huge amount of data, which we anonymize and use to improve the community's experience on our platform.
Airbnb's data platform team is committed to using this data to improve the customer experience and optimize Airbnb's business. Our mission is to provide the infrastructure to collect, organize, and process this data (all in a private and secure manner) and to empower teams across Airbnb to get the analytics they need and make data-informed decisions.
Within the company, the primary way advanced analytics are exposed and shared is through various dashboards. Many people use these dashboards every day to make decisions. Dashboards also enable real-time tracking and monitoring of every aspect of our business and systems, so the timeliness of these dashboards is critical to the day-to-day operation of Airbnb. However, we face three challenges:
First, aggregating data in the warehouse at query time takes too long. Systems such as Hive and Presto generate the data these dashboards need at query time: Hive / Presto must read all of the data and aggregate it on demand, so every necessary computation is invoked when the query runs. Even when these engines are used to pre-compute aggregations and store them, the storage format is not optimized for the repeated slicing and dicing of data that analytical queries need.
Second, the system needs to be reliable and scalable. It supports Airbnb's core analytics use cases, so any downtime has a severe impact on the business and its employees. Moreover, the volume of data, queries, and users keeps growing, and our analytics system must be able to cope with the increasing demand.
Third, we need a system that integrates well with our data infrastructure, which is based on open source frameworks. For example, most of our datasets are stored in Hadoop, and we use Kafka and Spark Streaming to process our data streams.
That's why we adopted Druid.
Advantages of Druid
Fast query time
Druid provides sub-second query latency through predefined data sources and pre-computed aggregations. Dashboards built on top of Druid are noticeably faster than those built on other systems; compared with Hive and Presto, Druid can be an order of magnitude faster.
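To make the query pattern concrete, here is a minimal sketch (not Airbnb's actual dashboard code) of the kind of aggregation query a dashboard might send to a Druid Broker's native JSON-over-HTTP API; the Broker address, datasource, and metric names are hypothetical.

```python
# A minimal sketch of a Druid native "timeseries" query over HTTP.
# Broker address, datasource, and metric names are hypothetical examples.
import requests

query = {
    "queryType": "timeseries",
    "dataSource": "bookings_rollup",          # hypothetical pre-aggregated datasource
    "granularity": "hour",
    "intervals": ["2018-06-01/2018-06-02"],
    "aggregations": [
        {"type": "longSum", "name": "bookings", "fieldName": "bookings"},
        {"type": "hyperUnique", "name": "unique_guests", "fieldName": "guest_id_sketch"},
    ],
}

# Brokers accept native queries posted as JSON to /druid/v2/ (default port 8082).
resp = requests.post("http://druid-broker:8082/druid/v2/", json=query, timeout=10)
resp.raise_for_status()
for row in resp.json():
    print(row["timestamp"], row["result"])
```

Because the datasource is already rolled up at ingestion time, a query like this only scans pre-aggregated rows, which is what keeps latency in the sub-second range.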
An architecture that provides reliability and scalability
Druid's architecture is cleanly separated into distinct components for ingestion, serving, and overall coordination. We have found this component-based architecture to be reliable and stable for our workload, and it allows us to easily scale the system as needed.
Druid keeps long-term data in deep storage while Historical nodes cache data locally for serving, and this architecture works well for us. Storing the analytics data permanently in S3 gives us disaster recovery essentially for free and makes it easy to manage cluster hardware upgrades and maintenance (for example, switching node types to take advantage of the latest hardware).
Integration with open source framework
Druid also integrates smoothly with our open source data infrastructure, which is mainly based on Hadoop and Kafka:
1. Druid's API allows us to easily ingest data from Hadoop for batch analytics (see the sketch after this list).
2. Druid enables real-time analytics through stream processing engines. It provides a streaming client API, Tranquility, which integrates with streaming engines such as Samza or Storm and can be integrated with any other JVM-based streaming engine. At Airbnb, data streams are delivered to Druid for real-time analytics through Spark Streaming with the Tranquility client.
3. Druid integrates well with Apache Superset, an open source data visualization system developed and open-sourced by Airbnb. Superset is the interface through which users compose and execute analytical queries against Druid and visualize the results.
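To make the batch path in item 1 concrete, here is a minimal sketch (not Airbnb's actual pipeline code) of submitting a Hadoop-based ingestion spec to the Druid Overlord's task API, for example from an Airflow-scheduled job; the datasource, schema, HDFS path, and Overlord address are hypothetical.

```python
# Minimal sketch: submit a Hadoop batch ingestion task to the Druid Overlord.
# Datasource name, schema, HDFS path, and Overlord address are hypothetical.
import requests

ingestion_spec = {
    "type": "index_hadoop",
    "spec": {
        "dataSchema": {
            "dataSource": "listings_daily",
            "parser": {
                "type": "hadoopyString",
                "parseSpec": {
                    "format": "json",
                    "timestampSpec": {"column": "ds", "format": "yyyy-MM-dd"},
                    "dimensionsSpec": {"dimensions": ["market", "room_type"]},
                },
            },
            "metricsSpec": [{"type": "longSum", "name": "views", "fieldName": "views"}],
            "granularitySpec": {
                "segmentGranularity": "DAY",
                "queryGranularity": "HOUR",
                "intervals": ["2018-06-01/2018-06-02"],
            },
        },
        "ioConfig": {
            "type": "hadoop",
            "inputSpec": {"type": "static", "paths": "hdfs:///data/listings/ds=2018-06-01"},
        },
    },
}

# The Overlord accepts task specs via POST /druid/indexer/v1/task (default port 8090).
resp = requests.post("http://druid-overlord:8090/druid/indexer/v1/task",
                     json=ingestion_spec, timeout=10)
resp.raise_for_status()
print("Submitted ingestion task:", resp.json().get("task"))
```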
How Airbnb uses Druid: a dual-cluster configuration
At Airbnb, two Druid clusters are running in production. Two separate clusters provide dedicated support for different uses, even though a single Druid cluster could handle more data sources than we need. In total we have 4 Broker nodes, 2 Overlord nodes, 2 Coordinator nodes, 8 MiddleManager nodes, and 40 Historical nodes. In addition, our clusters are supported by one MySQL server and one ZooKeeper cluster of five nodes. Compared with other service clusters such as HDFS and Presto, the Druid clusters are relatively small and low-cost.
Of the two Druid clusters, one is dedicated to a centralized key-metrics service. It serves all of Airbnb's dashboards, and users can easily define their metrics through simple YAML files. Users can view their dashboards and metrics on Superset without knowing anything about Druid.
All batch jobs are scheduled with Airflow and extract their data from the Hadoop cluster.
All real-time ingestion and the other data sources for self-service users are handled by the other Druid cluster. Real-time data is ingested through a Spark Streaming + Tranquility client setup.
Improving how Airbnb uses Druid
While Druid provides many powerful and widely applicable features that satisfy most business needs, we have implemented functionality within or on top of Druid to better serve our particular use cases.
A framework for answering ad-hoc analytics queries in real time
Airbnb has a large number of data scientists embedded in different business teams. Each of them may have ad-hoc questions about the business that require insights from the data, which usually means aggregating the data in arbitrary ways.
To meet this need, we built a self-service system on top of Druid that allows teams to easily define how the data their application or service produces should be aggregated and exposed as a Druid data source. Data scientists and analysts can then query Druid to answer their ad-hoc questions.
Users define their data sources through a configuration file. Real-time data from Kafka and batch data from HDFS / S3 are ingested according to that configuration.
Druid aggregates the real-time data over a 5-minute window, plus a 1-minute allowance for pipeline delay.
Real-time streaming into Druid lets us offer users a number of sophisticated capabilities. One interesting use case for real-time ingestion is anomaly detection: by ingesting and aggregating real-time data in Druid quickly, we can detect, very quickly, anything in production that does not conform to the expected pattern.
Integration with Presto
In addition to the SQL support added in recent versions, Druid has a mature query mechanism using JSON over an HTTP RESTful API. However, one limitation of Druid is that it does not allow queries across data sources (in short, join queries): all aggregation queries are restricted to a single data source. At Airbnb, however, we do have scenarios where multiple data sources with overlapping dimensions need to be joined together for certain queries. The alternative, keeping all the data in a single data source, is not ideal in our scenario for a variety of reasons, including differences in the cadence of data generation and the fact that the data is produced by different services. Nevertheless, the need for cross-data-source queries is real and has recently become a hard requirement.
To handle these cases, we developed an internal solution based on Presto. Specifically, we introduced a Presto connector for Druid that can push single-data-source queries down to Druid, then retrieve and join the data to complete the execution of cross-data-source queries. The implementation details are still evolving and are beyond the scope of this article; we will share more details in a separate post.
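To give a flavor of what such a cross-data-source query looks like from the user's side (the connector itself is internal and not shown here), the sketch below joins two hypothetical Druid-backed tables through Presto using the PrestoDB Python client; the catalog, schema, and table names are assumptions.

```python
# Minimal sketch: a cross-datasource join executed through Presto.
# Catalog, schema, and table names are hypothetical; the Druid connector is internal.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator", port=8080, user="analyst",
    catalog="druid", schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT b.market, SUM(b.bookings) AS bookings, SUM(s.searches) AS searches
    FROM bookings_rollup b
    JOIN searches_rollup s
      ON b.market = s.market AND b.__time = s.__time
    WHERE b.__time >= TIMESTAMP '2018-06-01 00:00:00'
    GROUP BY b.market
""")
for market, bookings, searches in cur.fetchall():
    print(market, bookings, searches)
```

Conceptually, the connector pushes the single-data-source parts of the query down to Druid and lets Presto perform the join, matching the behavior described above.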
Improving backfill performance
The secret behind Druid's much faster queries is the cost paid at ingestion time: each segment must be generated by a MapReduce job before it can be queried. This works well under a write-once, read-many model, where the framework only needs to ingest the new data each day.
Problems arise, however, when the owner of a data source wants to redesign it and regenerate historical data. That means data from the past several years has to be re-ingested into Druid to replace the old data, which requires a very large ingestion job with a long-running MapReduce task. This makes backfills expensive, especially when errors occur partway through re-ingestion.
One possible solution is to split such a large ingestion into multiple requests for better reliability. However, query results would then be inconsistent, because they would be computed over a mixture of the existing old data and the newly ingested data. As user requirements and the capabilities of the ingestion framework evolve, backfill operations actually happen more frequently than we expected, which makes backfill performance a pain point that needs to be improved.
To solve this problem, we designed a solution that, in essence, keeps all newly ingested segments inactive until they are explicitly activated. This allows the ingestion framework to split the data source into smaller intervals of acceptable size. The framework then ingests these intervals in parallel (with as much parallelism as the YARN cluster resources allow). Because the newly ingested data remains inactive, the new segments stay hidden in the background, and query results computed while the backfill is still in progress do not mix different versions of the data. When we activate the latest version of the segments for the data source, it is refreshed to the new version without downtime. Splitting plus refreshing has greatly improved backfill performance: a backfill operation that used to take more than a day now completes within an hour.
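A rough sketch of the splitting step follows (the base spec and the final activation step stand in for Airbnb's internal ingestion framework and are hypothetical): the backfill range is cut into month-sized intervals and each interval is submitted to the Overlord as an independent ingestion task, with parallelism bounded by what the YARN cluster can absorb.

```python
# Minimal sketch: split a long backfill into per-month ingestion tasks and
# submit them in parallel. The base spec file, datasource, and Overlord
# address are hypothetical; segment activation is an internal step not shown.
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
import copy
import json
import requests

OVERLORD_TASK_API = "http://druid-overlord:8090/druid/indexer/v1/task"

with open("base_ingestion_spec.json") as f:   # hypothetical index_hadoop base spec
    BASE_SPEC = json.load(f)

def month_intervals(start: date, end: date):
    """Yield ISO-8601 interval strings, one per month, covering [start, end)."""
    cur = start
    while cur < end:
        nxt = (cur.replace(day=1) + timedelta(days=32)).replace(day=1)
        yield f"{cur.isoformat()}/{min(nxt, end).isoformat()}"
        cur = nxt

def submit(interval: str) -> str:
    spec = copy.deepcopy(BASE_SPEC)
    spec["spec"]["dataSchema"]["granularitySpec"]["intervals"] = [interval]
    resp = requests.post(OVERLORD_TASK_API, json=spec, timeout=10)
    resp.raise_for_status()
    return resp.json()["task"]

# Submit the chunks in parallel; YARN capacity bounds how many actually run at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    task_ids = list(pool.map(submit, month_intervals(date(2016, 1, 1), date(2018, 6, 1))))
print(f"Submitted {len(task_ids)} backfill tasks")

# Once every task succeeds, the new segment versions are activated atomically
# for the datasource (the internal step that makes the refresh downtime-free).
```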
Monitoring and operation
We continuously monitor Druid for reliable service and optimal performance. Druid is robust and resilient to node failures: most node failures are transparent and unnoticeable to users, and even when a single-point-of-failure role (such as the Coordinator, the Overlord, or even ZooKeeper) fails, the Druid cluster can still serve user queries. Still, to honor our SLA to users, any service interruption should be detected promptly, or even before a failure occurs.
As with our other clusters, we monitor every machine in the Druid cluster by collecting machine statistics and raising alerts when any instance reaches capacity or enters a bad state. To monitor overall cluster availability, we ingest a piece of canary data into Druid every 30 minutes, and every 5 minutes we check that the query result from each Broker node matches the latest ingested canary data. Any degradation in the service, whether in querying, ingestion, or the downstream HDFS, can be detected within the SLA.
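As a rough sketch of that freshness check (the real monitoring pipeline is internal), one could ask each Broker for the most recent canary timestamp via Druid's SQL endpoint and alert when any Broker lags past the SLA; the datasource name, Broker list, and threshold below are hypothetical.

```python
# Minimal sketch of a canary freshness check against each Broker.
# Datasource name, Broker hosts, and lag threshold are hypothetical.
from datetime import datetime, timedelta, timezone
import requests

BROKERS = ["druid-broker-1:8082", "druid-broker-2:8082"]
MAX_LAG = timedelta(minutes=35)   # canary data is ingested every 30 minutes

def latest_canary_time(broker: str) -> datetime:
    # Druid's SQL endpoint on the Broker: POST /druid/v2/sql
    resp = requests.post(
        f"http://{broker}/druid/v2/sql",
        json={"query": 'SELECT MAX(__time) AS latest FROM "canary_datasource"'},
        timeout=10,
    )
    resp.raise_for_status()
    latest = resp.json()[0]["latest"]          # e.g. "2018-06-03T12:30:00.000Z"
    return datetime.fromisoformat(latest.replace("Z", "+00:00"))

for broker in BROKERS:
    lag = datetime.now(timezone.utc) - latest_canary_time(broker)
    if lag > MAX_LAG:
        print(f"ALERT: canary data on {broker} is {lag} old")   # hook into alerting here
```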
Druid has been running at Airbnb for years and is one of the lowest-maintenance systems we operate. Its multi-role design makes operations simple and reliable: cluster administrators can adjust the cluster configuration and add or remove nodes based on monitoring metrics. As the data in our Druid cluster grows, we can keep adding Historical node capacity to cache and serve the larger volume with ease. If the real-time ingestion workload trends upward, we can easily add MiddleManager nodes accordingly. Likewise, if more capacity is needed to handle queries, we can increase the number of Broker nodes. Thanks to Druid's decoupled architecture, we were able to carry out a large operation that migrated all the data in deep storage from HDFS to S3 and rebuilt the cluster, with only a few minutes of downtime.
Challenges and future improvements
While Druid serves us well in our data platform architecture, new challenges arise as our use of Druid grows within the company.
One issue we deal with is the growth in the number of segment files that need to be loaded into the cluster every day. Segment files are the basic storage unit of Druid data and contain the pre-aggregated data ready for serving. At Airbnb, we run into scenarios where a large number of data sources sometimes need to be recomputed entirely, producing a huge number of segment files that must be loaded into the cluster at once. Currently, the Coordinator loads newly ingested segments centrally, in a single thread. As more and more segments are generated and the Coordinator cannot keep up, we see an increasing delay between the time an ingestion job finishes and the time the data becomes available for query (after the Coordinator has loaded it). Sometimes the delay can be several hours.
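One simple way to observe this lag from the outside (a hedged sketch, not our internal tooling) is to poll the Coordinator's load-status API after an ingestion job finishes and watch how long the datasource takes to become fully loaded; the Coordinator address and datasource name are hypothetical.

```python
# Minimal sketch: check how much of a datasource the Coordinator has loaded.
# Coordinator address and datasource name are hypothetical.
import requests

COORDINATOR = "http://druid-coordinator:8081"

# GET /druid/coordinator/v1/loadstatus reports, per datasource, the percentage
# of segments that are loaded on Historical nodes and available for query.
status = requests.get(f"{COORDINATOR}/druid/coordinator/v1/loadstatus", timeout=10).json()
print("listings_daily loaded:", status.get("listings_daily"), "%")
```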
The usual remedy is to increase the target segment size and thereby reduce the number of segments. In our usage, however, the amount of input data that the Hadoop workers running the ingestion task would have to process to produce larger segments is so high that the Hadoop jobs run for too long and fail repeatedly for various reasons.
We are currently exploring several solutions, including compacting segments after ingestion and before they are handed off to the Coordinator, as well as different configurations that increase the segment size without hurting ingestion job stability, where possible.
Conclusion
Druid is a big data analytics engine designed for scalability, maintainability, and performance. Its well-factored architecture makes Druid deployments easy to manage and scale, and its optimized storage format enables low-latency analytical queries. Druid is in use at well-known Internet companies, including Google, Facebook, Airbnb, Instagram, Amazon, and Pinterest abroad, and Alibaba, Xiaomi, 360, Youku, and Zhihu in China, among others, and its adoption continues to gain momentum. I believe that in the near future, Druid will become the de facto standard for the next generation of real-time OLAP analytics engines. Let's wait and see!
This article was translated and edited by Wu Jianglin. When republishing, please credit the source and retain this notice. Thank you!