What Are the Characteristics of Kudu, the New Columnar Storage System in the Hadoop Ecosystem?


2026-02-08 Update. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)05/31 Report--

This article describes the characteristics of Kudu, a new columnar storage system in the Hadoop ecosystem. It is meant as a reference for readers who are not yet familiar with Kudu.

As the Hadoop ecosystem has developed, its storage layer has remained dominated by HDFS and HBase, without much of a breakthrough. For high-throughput batch scenarios we choose HDFS; for scenarios that need low latency and random reads and writes we choose HBase. Is there a system that combines the advantages of both and supports high throughput and low latency at the same time? Some people have tried to modify the HBase kernel to build such a system, that is, to keep the HBase data model while changing its underlying storage to pure column storage (at present, HBase can only be regarded as a column-family storage engine), but this modification is difficult. Kudu is expected to solve this problem.
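To make the appeal of pure column storage concrete, here is a minimal sketch in plain Python. It is not Kudu or HBase code; the records, field names, and helper functions are all made up for illustration. It shows that an aggregate over one field only needs a single column when the data is laid out column by column.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All names and data here are illustrative; this is not Kudu or HBase code.

rows = [
    {"host": "a", "ts": 1, "cpu": 0.50},
    {"host": "a", "ts": 2, "cpu": 0.75},
    {"host": "b", "ts": 1, "cpu": 0.25},
]

def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def avg_cpu_row_layout(records):
    # A row layout visits every whole record, even though only "cpu" is needed.
    return sum(r["cpu"] for r in records) / len(records)

def avg_cpu_column_layout(columns):
    # A column layout reads just the single "cpu" column.
    cpu = columns["cpu"]
    return sum(cpu) / len(cpu)

columns = to_columns(rows)
```

Both functions return the same average. The difference is how much data each one has to touch, which is what matters for analytic scans over wide tables.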

Kudu is an open source columnar storage engine from Cloudera with the following features:

Written in C++

Efficient handling of OLAP-style workloads

Friendly integration with MapReduce, Spark, and other components in the Hadoop ecosystem

Integration with Cloudera Impala, where it can replace the HDFS + Parquet combination commonly used with Impala

Flexible consistency model

Good performance even when sequential and random writes coexist

High availability, using the Raft protocol to store data reliably

Structured data model
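The Raft item in the list above rests on simple majority arithmetic: a write counts as committed once a majority of replicas has acknowledged it. The sketch below shows only that arithmetic. The function names are hypothetical and the replica counts are illustrative; none of this is taken from Kudu's code or configuration.

```python
# Majority-quorum arithmetic used by Raft-style replication.
# Function names are hypothetical; replica counts are illustrative only.

def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1

def is_committed(acks: int, replicas: int) -> bool:
    """A write is treated as committed once a majority has acknowledged it."""
    return acks >= majority(replicas)

def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority still exists."""
    return replicas - majority(replicas)
```

Under these assumptions, a group of three replicas keeps accepting writes with one replica down, and a group of five with two down.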

Kudu is expected to address a range of problems that are hard to solve in the current Hadoop ecosystem, such as:

Updating the results of real-time stream computations

Time-series applications, which specifically need to:

Query massive amounts of historical data

Query individual records and get a fast response

Predictive modeling, where the model is updated periodically and decisions are made quickly based on historical data
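The time-series requirements above (in-place updates, fast single-record lookups, and scans over history) can be sketched with a toy in-memory table keyed by series and timestamp. The class and method names below are hypothetical and only illustrate the access pattern. This is not the Kudu client API, and it says nothing about how Kudu stores data internally.

```python
import bisect

class ToyTimeSeriesTable:
    """In-memory stand-in for a table keyed by (series, timestamp).

    Hypothetical illustration only; it does not use or mimic the Kudu client API.
    """

    def __init__(self):
        self._keys = []    # sorted list of (series, timestamp) keys
        self._values = {}  # key -> value

    def upsert(self, series, ts, value):
        """Insert a new point, or overwrite the value of an existing one."""
        key = (series, ts)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, series, ts):
        """Point lookup by full key; returns None if the point is absent."""
        return self._values.get((series, ts))

    def scan(self, series, start_ts, end_ts):
        """Return (ts, value) pairs with start_ts <= ts < end_ts, in time order."""
        lo = bisect.bisect_left(self._keys, (series, start_ts))
        hi = bisect.bisect_left(self._keys, (series, end_ts))
        return [(k[1], self._values[k]) for k in self._keys[lo:hi]]

table = ToyTimeSeriesTable()
table.upsert("cpu", 1, 0.2)
table.upsert("cpu", 2, 0.4)
table.upsert("mem", 1, 0.9)
table.upsert("cpu", 2, 0.5)  # a late correction overwrites the earlier value
```

Keeping the keys sorted is what lets one structure serve both the range scan and the point lookup, which is the combination the use cases above ask for.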
