
How to analyze big data with mdrill

2026-02-14 Update From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com) 05/31 report

Many people new to mdrill are unsure how to use it for big data analysis. This article introduces what mdrill is, summarizes its main features, and traces how the volume of data it manages at Alibaba has grown.

Project overview

mdrill is a data analysis system open-sourced by Alibaba. On terabyte-scale data it responds within seconds using only 10 machines, it can import data in real time, and it can filter on any combination of dimensions.

As an online data analysis and processing system, mdrill can analyze on the order of ten billion records, across any combination of dimensions, in anywhere from a few seconds to a few tens of seconds.
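To make "any combination of dimensions" concrete, here is a minimal in-memory sketch in Python of the query shape involved: filter rows on an arbitrary set of dimension values, then group by an arbitrary set of dimensions and count. The function name `query`, the dimensions (`thedate`, `province`, `category`), and the sample rows are hypothetical illustrations, not mdrill's API or schema. mdrill itself spreads this work across machines and relies on indexes rather than scanning a list.

```python
from collections import Counter

# Hypothetical sample rows; each row maps a dimension name to its value.
ROWS = [
    {"thedate": "20140201", "province": "zhejiang", "category": "books"},
    {"thedate": "20140201", "province": "jiangsu", "category": "books"},
    {"thedate": "20140202", "province": "zhejiang", "category": "toys"},
    {"thedate": "20140202", "province": "zhejiang", "category": "books"},
]


def query(rows, filters, group_by):
    """Count rows matching every filter, grouped by the given dimensions.

    filters:  dict of dimension -> required value (any combination allowed)
    group_by: tuple of dimension names (any combination allowed)
    """
    counts = Counter()
    for row in rows:
        if all(row.get(dim) == value for dim, value in filters.items()):
            key = tuple(row.get(dim) for dim in group_by)
            counts[key] += 1
    return dict(counts)


# Filter on one dimension, group by two others.
print(query(ROWS, {"province": "zhejiang"}, ("thedate", "category")))
# → {('20140201', 'books'): 1, ('20140202', 'toys'): 1, ('20140202', 'books'): 1}
```

Changing the keys in `filters` or `group_by` changes the query without any schema change. The cost of the full scan done here is what the indexing and distribution described in this article are meant to avoid.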

At Alibaba, 10 machines store 3 billion new records every day: 1 billion imported in real time and 2 billion imported offline. The cluster currently holds more than 100 billion records in total, in tables with 80 to 400 dimensions.
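The daily ingest figures in the paragraph above can be broken down per machine and per second. This is back-of-the-envelope arithmetic on the article's own numbers only, assuming the load is spread evenly over the 10 machines and real-time imports are spread evenly over 24 hours; real traffic is burstier than that.

```python
MACHINES = 10
DAILY_TOTAL = 3_000_000_000       # records stored per day
DAILY_REALTIME = 1_000_000_000    # of which imported in real time
DAILY_OFFLINE = 2_000_000_000     # of which imported offline
SECONDS_PER_DAY = 24 * 60 * 60

# The two import paths add up to the daily total.
assert DAILY_REALTIME + DAILY_OFFLINE == DAILY_TOTAL

# Assumes an even split across machines and an even spread over the day.
per_machine_per_day = DAILY_TOTAL // MACHINES
realtime_per_second = DAILY_REALTIME / SECONDS_PER_DAY
realtime_per_machine_per_second = realtime_per_second / MACHINES

print(f"{per_machine_per_day:,} records per machine per day")                 # 300,000,000
print(f"{realtime_per_second:,.0f} real-time records/s (whole cluster)")      # 11,574
print(f"{realtime_per_machine_per_second:,.0f} real-time records/s per machine")  # 1,157
```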

Characteristics of mdrill

1. Big data queries: adhoc takes in 3 billion records a day, and the total keeps growing over time. mdrill uses columnar storage, indexes, distributed processing, and appropriate partitioning so that users can still analyze the data online in real time.

2. Incremental updates: offline data in mdrill can be updated incrementally, one partition at a time.

3. Real-time import: with only 10 machines, mdrill supports real-time import of about 1 billion records per day, peaking at 200 million per hour.

4. Fast response: columnar storage, inverted indexes, efficient data compression, in-memory computation, several layers of caching, partitioning, and distributed processing together let mdrill analyze tens of billions of records in a few seconds to a few tens of seconds.

5. Low cost: Alibaba's adhoc deployment currently runs on only 10 PCs with 48 GB of memory each, yet it stores more than 100 billion records.

6. Full-text search mode: filter conditions are flexible and can be combined arbitrarily, and results for both simple and complex queries can be previewed within seconds, so 16 billion records of daily data can be filtered at will.
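Several of the features above (columnar storage, inverted indexes, partitions, and partition-level incremental updates) fit together roughly as follows. This is a minimal single-process sketch in Python under stated assumptions, not mdrill's implementation: data is assumed to be partitioned by a date string, each partition stores one list per column, each column gets an inverted index from value to ascending row ids, and an incremental update is modeled as rebuilding only the affected partition. The classes `Partition` and `PartitionedColumnStore` and all method names are hypothetical.

```python
from bisect import bisect_left


def contains(sorted_ids, row_id):
    """Binary search for row_id in an ascending posting list."""
    i = bisect_left(sorted_ids, row_id)
    return i < len(sorted_ids) and sorted_ids[i] == row_id


class Partition:
    """One partition: column-wise storage plus an inverted index per column."""

    def __init__(self, rows, columns):
        self.num_rows = len(rows)
        # Columnar layout: one list per column instead of one record per row.
        self.columns = {col: [row[col] for row in rows] for col in columns}
        # Inverted index: column -> value -> ascending list of row ids.
        self.index = {}
        for col, values in self.columns.items():
            postings = {}
            for row_id, value in enumerate(values):
                postings.setdefault(value, []).append(row_id)
            self.index[col] = postings

    def match(self, filters):
        """Row ids satisfying every (column, value) filter, i.e. an AND of conditions."""
        if not filters:
            return list(range(self.num_rows))
        # Start from the shortest posting list so the intersection stays small.
        lists = sorted(
            (self.index.get(col, {}).get(value, []) for col, value in filters.items()),
            key=len,
        )
        result = lists[0]
        for other in lists[1:]:
            result = [row_id for row_id in result if contains(other, row_id)]
        return result


class PartitionedColumnStore:
    """Hypothetical store keyed by partition (here a date string such as "20140201")."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.partitions = {}

    def load_partition(self, key, rows):
        """Incremental update: (re)build only this partition; the others are untouched."""
        self.partitions[key] = Partition(rows, self.columns)

    def count(self, partition_keys, filters):
        """Count matching rows, reading only the partitions named in the query."""
        total = 0
        for key in partition_keys:
            part = self.partitions.get(key)
            if part is not None:
                total += len(part.match(filters))
        return total


store = PartitionedColumnStore(["province", "category"])
store.load_partition("20140201", [
    {"province": "zhejiang", "category": "books"},
    {"province": "jiangsu", "category": "books"},
])
store.load_partition("20140202", [{"province": "zhejiang", "category": "toys"}])
print(store.count(["20140201", "20140202"], {"province": "zhejiang"}))  # 2

# Re-importing one day's data replaces only that day's partition.
store.load_partition("20140202", [
    {"province": "zhejiang", "category": "toys"},
    {"province": "zhejiang", "category": "books"},
])
print(store.count(["20140201", "20140202"], {"province": "zhejiang"}))  # 3
```

In this sketch, `count` touches only the partitions named in the query and, within each, only the inverted indexes of the filtered columns, so unrelated days and columns are never read. A production system would also compress the columns and posting lists, which this sketch does not attempt.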

Growth of mdrill's data volume

Time point | Data volume (records) | Event
December 2012 | Under 200 million | adhoc goes live for the first time
January 2013 | 2 to 3 billion | Cluster expanded from 2 machines to 10
May 2, 2013 | 10 billion | Data passes ten billion records for the first time
July 24, 2013 | 40 billion | mdrill is open-sourced for the first time
November 2013 | 100 billion | Full-text retrieval mode (ods_allpv_ad_d) launched
December 2013 | 150 billion | Real-time data and wireless data connected
February 2014 | 320 billion | Still only 11 machines, with hard disk utilization at 30%
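The milestones in the growth table can be turned into growth multiples and a rough per-machine figure. This is simple arithmetic on the table's own numbers. The first row (which precedes January 2013, and is labeled December 2012 here) gives only an upper bound of 200 million, so the overall multiple is a lower bound; the January 2013 row is skipped because it is a range; and the per-machine figure assumes records are spread evenly over the 11 machines.

```python
# (time point, record count) taken from the growth table.
# The first value is an upper bound ("under 200 million"); January 2013 is
# omitted because the table gives it only as a range (2 to 3 billion).
MILESTONES = [
    ("December 2012", 200_000_000),
    ("May 2, 2013", 10_000_000_000),
    ("July 24, 2013", 40_000_000_000),
    ("November 2013", 100_000_000_000),
    ("December 2013", 150_000_000_000),
    ("February 2014", 320_000_000_000),
]
MACHINES_FEB_2014 = 11

# Growth multiple between consecutive milestones.
for (prev_label, prev_count), (label, count) in zip(MILESTONES, MILESTONES[1:]):
    print(f"{prev_label} -> {label}: x{count / prev_count:.1f}")

# Because the starting value is an upper bound, this is a lower bound on growth.
overall = MILESTONES[-1][1] / MILESTONES[0][1]
# Assumes records are spread evenly across the machines.
per_machine = MILESTONES[-1][1] / MACHINES_FEB_2014

print(f"Overall growth: at least x{overall:,.0f}")                              # x1,600
print(f"Records per machine, February 2014: about {per_machine / 1e9:.1f} billion")  # 29.1
```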

