This article gives a detailed introduction to the multi-process architecture of Apache Druid. Many readers may not be familiar with it, so the editor has summarized the following content to help you understand it better. Without further ado, let's get started.
Druid has a multi-process architecture, and each process type can be configured and scaled independently. This gives the cluster maximum flexibility. The design also provides strong fault tolerance: the failure of one component does not immediately affect the others.
Let's take a closer look at what types of processes Druid has and what role each process plays in the entire cluster.
Processes and Servers
Druid has a variety of process types, as follows:
The Coordinator process manages data availability in the cluster.
The Overlord process controls the assignment of data ingestion workloads.
The Broker process handles queries from external clients.
The Router process is optional; it can route requests to Brokers, Coordinators, and Overlords.
The Historical process stores queryable data.
The MiddleManager process is responsible for ingesting data.
You can deploy the above processes in any way you like. However, for ease of operation, the official recommendation is to organize them into three types of servers: Master, Query, and Data.
Master: runs the Coordinator and Overlord processes, and manages data availability and data ingestion.
Query: runs the Broker processes and the optional Router process, and handles external query requests.
Data: runs the Historical and MiddleManager processes, which execute ingestion tasks and store queryable data.
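To see these processes at work, here is a minimal Python sketch that checks whether each process answers on its /status/health endpoint. The hosts and ports are assumptions based on a single-machine deployment with default ports; adjust them for your own cluster.

import requests

# Assumed addresses for a single-machine deployment with default ports.
processes = {
    "coordinator": "http://localhost:8081",
    "overlord": "http://localhost:8090",
    "broker": "http://localhost:8082",
    "router": "http://localhost:8888",
    "historical": "http://localhost:8083",
    "middlemanager": "http://localhost:8091",
}

for name, base_url in processes.items():
    try:
        # /status/health returns a JSON boolean on every Druid process.
        healthy = requests.get(f"{base_url}/status/health", timeout=5).json()
    except requests.RequestException:
        healthy = False
    print(f"{name}: {'healthy' if healthy else 'unreachable'}")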
External dependencies
In addition to the built-in process types, Druid has three external dependencies.
Deep storage
Shared file storage that every Druid process can access. In a clustered deployment this is typically distributed storage (such as S3 or HDFS) or a network-mounted filesystem; in a single-machine deployment it is typically local disk. Druid uses deep storage to store any data that has been ingested into the cluster.
Druid uses deep storage only as a backup of your data and as a way to transfer data between Druid processes in the background. To answer queries, Historical processes do not read from deep storage; they read segments that have already been pulled down to local disk before any query is served. This means Druid never needs to access deep storage during a query, which helps it achieve the best possible query latency. It also means you must have enough disk space both in deep storage and on the Historical processes for the data you plan to load.
Deep storage is an important part of Druid's elastic, fault-tolerant design: even if Druid data servers lose their local data, it can be recovered from deep storage.
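As a rough illustration, on a single-machine deployment that uses local deep storage you can simply list the segment files that Druid has pushed there. This is a sketch only: the directory below is hypothetical and depends on the druid.storage.storageDirectory setting in your configuration.

import os

deep_storage_dir = "var/druid/segments"  # hypothetical local deep storage path

# Walk the deep storage directory and report each segment file and its size.
for dirpath, _dirnames, filenames in os.walk(deep_storage_dir):
    for filename in filenames:
        path = os.path.join(dirpath, filename)
        size_mb = os.path.getsize(path) / (1024 * 1024)
        print(f"{path} ({size_mb:.1f} MB)")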
Metadata storage
The metadata store holds various shared system metadata, such as segment usage information and task information. In a clustered deployment this is typically a traditional RDBMS such as PostgreSQL or MySQL; in a single-machine deployment it is typically local storage such as an Apache Derby database.
Zookeeper
Used for internal service discovery, coordination, and leader election.
Architecture Diagram (Architecture diagram)
The following figure shows how data is queried and written using the officially recommended Master/Query/Data server organization:
Storage design
Datasources and segments
Druid data is stored in "datasources", which are similar to tables in an RDBMS. Each datasource is partitioned by time and, optionally, further partitioned by other attributes. Each time range is called a "chunk" (for example, one per day if your datasource is partitioned by day). Within a chunk, data is partitioned into one or more "segments". Each segment is a single file, typically containing up to a few million rows of data. Once segments are stored in chunks, they are organized along a timeline as shown below:
A datasource may have anywhere from just a few segments up to hundreds of thousands, or even millions, of segments. Each segment starts its life when it is created by a MiddleManager; at that point, the segment is mutable and uncommitted. Building a segment involves the following steps, designed to produce a data file that is compact and supports fast queries:
Conversion to columnar format
Indexing with bitmap indexes
Compression using various algorithms: dictionary encoding with minimized id storage for String columns, bitmap compression for the bitmap indexes, and type-aware compression for all columns
Segments are committed and published periodically. At that point, the segment data is written to deep storage, becomes immutable, and moves from the MiddleManager process to the Historical process. An entry describing the segment is also written to the metadata store. This entry is self-describing metadata about the segment, including its schema, its size, and its location in deep storage. These entries are what the Coordinator uses to know which data is available in the cluster.
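To see what those metadata entries look like, here is a minimal Python sketch that asks the Coordinator for the published segments of one datasource. The Coordinator address assumes the default port 8081, the datasource name reuses the clarity-cloud0 example from later in this article, and the exact fields returned can vary between Druid versions.

import requests

coordinator = "http://localhost:8081"   # assumed Coordinator address
datasource = "clarity-cloud0"           # example datasource from this article

# Ask for the full metadata entry of every published segment in the datasource.
resp = requests.get(
    f"{coordinator}/druid/coordinator/v1/metadata/datasources/{datasource}/segments?full"
)
resp.raise_for_status()
for segment in resp.json():
    # Each entry is self-describing: identifier, size, version, deep storage
    # location (loadSpec), and so on.
    print(segment.get("identifier"), segment.get("size"), segment.get("version"))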
Indexing and handoff
Indexing is the mechanism by which each segment is created, and handoff is the mechanism by which published segments are picked up and served by Historical processes. On the indexing side, the mechanism works as follows:
An indexing task starts and builds a new segment. The segment's identifier must be determined before it is built. For a task that appends data (such as a Kafka task, or an index task in append mode), this is done by calling the Overlord's "allocate" API to potentially add a new partition to an existing set of segments. For a task that overwrites data (such as a Hadoop task, or an index task not in append mode), this is done by creating a new version number and a new set of segments for the interval.
If the indexing task is a real-time task (such as a Kafka task), the segment can be queried immediately. The data is available, but not yet published.
When the indexing task has finished reading data for the segment, it pushes the segment to deep storage and publishes it by writing a record to the metadata store.
If the indexing task is a real-time task, it then waits for a Historical process to load the segment; if it is not a real-time task, it exits immediately.
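An indexing task like the ones above can be watched from the outside by polling the Overlord's task status API. This is a minimal sketch only: the Overlord address assumes the default port 8090, the task id is a placeholder, and the exact fields in the status response vary slightly between Druid versions.

import time
import requests

overlord = "http://localhost:8090"   # assumed Overlord address
task_id = "example-task-id"          # placeholder task id

while True:
    resp = requests.get(f"{overlord}/druid/indexer/v1/task/{task_id}/status")
    resp.raise_for_status()
    status = resp.json().get("status", {})
    print(status)
    # Field name differs between versions: newer ones use "statusCode",
    # older ones use "status".
    state = status.get("statusCode") or status.get("status")
    if state in ("SUCCESS", "FAILED"):
        break
    time.sleep(10)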
This mechanism works on the Coordinator/Historical side as follows:
The Coordinator polls the metadata store periodically (by default, once a minute) for newly published segments.
When the Coordinator finds a segment that is published but not yet available, it chooses a Historical process to load the segment and instructs that Historical to do so.
The Historical loads the segment and begins serving it.
At this point, if the indexing task was still waiting for handoff, it exits.
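Handoff progress can be observed through the Coordinator's load status API. A minimal sketch, again assuming the default Coordinator port 8081: a value of 100.0 means every segment published so far for that datasource has been loaded by Historicals.

import requests

resp = requests.get("http://localhost:8081/druid/coordinator/v1/loadstatus")
resp.raise_for_status()
for datasource, percent_loaded in resp.json().items():
    print(f"{datasource}: {percent_loaded}% of published segments loaded")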
Data writing (indexing) and handoff:
Segment identifiers
A segment identifier consists of the following four parts:
Datasource name.
Time interval (the interval of time covered by the segment; it corresponds to the segmentGranularity specified at ingestion time).
Version number (usually an ISO 8601 timestamp, corresponding to when the segment set was first created).
Partition number (an integer, unique within a datasource + interval + version combination; not necessarily contiguous).
For example, this is the identifier of a segment in datasource clarity-cloud0, with time interval 2018-05-21T16:00:00.000Z/2018-05-21T17:00:00.000Z, version 2018-05-21T15:56:09.909Z, and partition number 1:
clarity-cloud0_2018-05-21T16:00:00.000Z_2018-05-21T17:00:00.000Z_2018-05-21T15:56:09.909Z_1
Segments with partition number 0 (the first partition in a chunk) omit the partition number. For example, the following identifier refers to a segment in the same time chunk as the one above, but with partition number 0 instead of 1:
clarity-cloud0_2018-05-21T16:00:00.000Z_2018-05-21T17:00:00.000Z_2018-05-21T15:56:09.909Z
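To make the four parts concrete, here is a small Python sketch that splits the two example identifiers above back into their components. It is illustrative only: it assumes the datasource name contains no underscores beyond the separators, while Druid has its own canonical segment-id handling.

def parse_segment_id(segment_id):
    parts = segment_id.split("_")
    # The partition number is omitted when it is 0, so check whether the
    # last token is an integer.
    if parts[-1].isdigit():
        partition = int(parts[-1])
        parts = parts[:-1]
    else:
        partition = 0
    interval_start, interval_end, version = parts[-3:]
    datasource = "_".join(parts[:-3])
    return {
        "datasource": datasource,
        "interval": f"{interval_start}/{interval_end}",
        "version": version,
        "partition": partition,
    }

print(parse_segment_id(
    "clarity-cloud0_2018-05-21T16:00:00.000Z_2018-05-21T17:00:00.000Z_2018-05-21T15:56:09.909Z_1"
))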
Segment versioning
You may be wondering what the "version number" described in the previous section is.
Druid supports batch-mode overwrites. In Druid, if all you ever do is append data, there is only one version per time chunk. But when you overwrite data, what happens behind the scenes is that a new set of segments is created with the same datasource and the same time interval, but a higher version number. This signals to the rest of the Druid system that the older version should be removed from the cluster and replaced by the new version.
To the user, the switch appears to happen instantaneously, because Druid handles it by first loading the new data (without allowing it to be queried), and then, as soon as all of the new data is loaded, switching new queries to the new segments. It drops the old segments a few minutes later.
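The rule that decides which segments remain visible can be sketched in a few lines of Python. This is a deliberate simplification of Druid's real overshadowing logic: it assumes whole-chunk overwrites and relies on ISO 8601 version strings sorting correctly as plain strings.

def visible_segments(segments):
    """Keep only the segments with the newest version for each (datasource, interval)."""
    newest = {}
    for seg in segments:
        key = (seg["datasource"], seg["interval"])
        if key not in newest or seg["version"] > newest[key]:
            newest[key] = seg["version"]
    return [
        seg for seg in segments
        if seg["version"] == newest[(seg["datasource"], seg["interval"])]
    ]

# Two versions of the same chunk: only the later one should remain visible.
example = [
    {"datasource": "clarity-cloud0",
     "interval": "2018-05-21T16:00:00.000Z/2018-05-21T17:00:00.000Z",
     "version": "2018-05-21T15:56:09.909Z", "partition": 0},
    {"datasource": "clarity-cloud0",
     "interval": "2018-05-21T16:00:00.000Z/2018-05-21T17:00:00.000Z",
     "version": "2018-06-01T00:00:00.000Z", "partition": 0},
]
print(visible_segments(example))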
Segment lifecycle
The lifecycle of each segment involves the following three main areas:
Metadata store: once a segment has been built, its metadata (a small JSON record, usually no more than a few KB) is stored in the metadata store. The act of inserting a segment record into the metadata store is called publishing; the used boolean flag in that record then marks the segment as usable. Segments created by real-time tasks are available before they are published, because they are only published once the segment is complete and will not accept any more data.
Deep storage: segment data files are pushed to deep storage as soon as the segment has been built, before its metadata is published to the metadata store.
Query availability: the segment is available for querying on some Druid data server, such as a real-time task or a Historical process.
You can use the Druid SQL sys.segments table to check the current segment status. It includes the following flags:
is_published: true if the segment's metadata has been published to the metadata store and used is true.
is_available: true if the segment is currently available for querying, whether from a real-time task or from a Historical process.
is_realtime: true if the segment is only available from real-time tasks. For datasources that use real-time ingestion, this generally starts out true and then becomes false as the segment is published and handed off.
is_overshadowed: true if the segment is published (with used set to true) but is completely overshadowed by other published segments. This is usually a transient state; segments in this state will soon have their used flag automatically set to false.
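Here is a minimal Python sketch that reads these flags through the Druid SQL endpoint. It assumes a Router (or Broker) reachable at http://localhost:8888 and reuses the clarity-cloud0 example datasource; adjust both for your deployment.

import requests

sql = """
SELECT "segment_id", "is_published", "is_available", "is_realtime", "is_overshadowed"
FROM sys.segments
WHERE "datasource" = 'clarity-cloud0'
LIMIT 10
"""
resp = requests.post("http://localhost:8888/druid/v2/sql", json={"query": sql})
resp.raise_for_status()
for row in resp.json():
    print(row)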
Query processing
A query first enters the Broker process. The Broker identifies which segments contain data relevant to the query (the segment list is always pruned by time, and may also be pruned by other attributes, depending on how the datasource is partitioned). The Broker then determines which Historicals and MiddleManagers are serving those segments and sends a rewritten subquery to each of those processes. The Historical/MiddleManager processes execute the subqueries and return their results, and the Broker merges the partial results into the final answer, which it returns to the client.
The Broker analyzes each query and optimizes it to minimize the amount of data that must be scanned. On top of the pruning done by the Broker, the index structures inside each segment allow Druid to figure out which rows, if any, match the filter set before looking at any row of data. Once Druid knows which rows match a particular query, it accesses only the specific columns required by that query, and within those columns it can skip from row to row, avoiding data that does not match the query filter.
Therefore, Druid uses three different techniques to optimize query performance:
Pruning the set of segments accessed for each query.
Within each segment, using indexes to identify which rows must be accessed.
Within each segment, reading only the rows and columns that are relevant to the query.
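All three techniques show up in a single native query. Below is a minimal sketch of a timeseries query posted to the Broker on the default port 8082; the datasource, dimension, and metric names are hypothetical and only serve to show where the interval, filter, and aggregations go.

import requests

query = {
    "queryType": "timeseries",
    "dataSource": "clarity-cloud0",   # hypothetical datasource
    # The interval limits which segments are even considered (segment pruning).
    "intervals": ["2018-05-21T16:00:00.000Z/2018-05-21T17:00:00.000Z"],
    "granularity": "hour",
    # The filter can be evaluated against each segment's bitmap indexes.
    "filter": {"type": "selector", "dimension": "service", "value": "broker"},
    # Only the columns referenced by the aggregators are read.
    "aggregations": [{"type": "longSum", "name": "events", "fieldName": "count"}],
}
resp = requests.post("http://localhost:8082/druid/v2/", json=query)
resp.raise_for_status()
print(resp.json())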
That is all for this look at the Apache Druid multi-process architecture. I hope the content above has been helpful and that you have learned something new. If you liked this article, feel free to share it so more people can see it.