Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What are the basic frameworks of Hadoop

2025-02-25 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Share

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "what is the basic framework of Hadoop". In the operation of actual cases, many people will encounter such a dilemma. Next, let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Pig

A lightweight scripting language for operating hadoop, which was first introduced by Yahoo, but is now in decline. At the beginning, Yahoo slowly withdrew from the maintenance of pig and contributed it to the open source community to be maintained by all enthusiasts. However, some companies still use it, but I think it is better to use hive than to use pig.

Pig is a data flow language, which is used to process huge data quickly and easily.

Pig consists of two parts: Pig Interface,Pig Latin.

Pig can easily process the data of HDFS and HBase. Like Hive, Pig can deal with what it needs to do very efficiently. It can save a lot of labor and time by directly operating Pig query. You can use Pig when you want to make some transformations on your data and don't want to write MapReduce jobs.

Hive

Friends who do not want to use programming language to develop MapReduce, such as DB, friends who are familiar with SQL can use Hive for data processing and analysis offline.

Note that Hive is now suitable for data manipulation offline, that is, it is not suitable for real-time online queries or operations in a real production environment, because one word is "slow". On the contrary

It originated from the role of FaceBook,Hive as a data warehouse in Hadoop. Built at the top of the Hadoop cluster, it operates on the SQL-like interface of the data stored on the Hadoop cluster. You can use HiveQL for select,join, and so on.

If you have data warehouse requirements and you are good at writing SQL and do not want to write MapReduce jobs, you can use Hive instead.

HBase

HBase runs on top of HDFS as a column-oriented database, and HDFS lacks immediate read and write operations, which is why HBase appears. HBase is modeled on Google BigTable and stored in the form of key-value pairs. The goal of the project is to quickly locate and access the required data in billions of rows of data in the host.

HBase is a database, a NoSql database, like other databases to provide immediate read and write function, Hadoop can not meet the real-time needs, HBase can meet. If you need to access some data in real time, store it in HBase.

You can use Hadoop as a static data warehouse and HBase as a data store to store data that will change if you do something.

Pig VS Hive

Hive is more suitable for data warehouse tasks, while Hive is mainly used for static structures and tasks that require frequent analysis. The similarity between Hive and SQL makes it an ideal intersection of Hadoop and other BI tools.

Pig gives developers more flexibility in the big data set domain and allows the development of concise scripts to transform data streams for embedding into larger applications.

Pig is relatively lightweight compared to Hive, and its main advantage is that it can significantly reduce the amount of code compared to using Hadoop Java APIs directly. Because of this, Pig still attracts a large number of software developers.

Both Hive and Pig can be used in combination with HBase. Hive and Pig also provide high-level language support for HBase, which makes it very easy to process data statistics on HBase.

Hive VS HBase

Hive is a batch processing system based on Hadoop to reduce the writing work of MapReduce jobs, and HBase is a project to support a project to make up for Hadoop's shortcomings in real-time operation.

Imagine you are operating a RMDB database. If it is a full table scan, use Hive+Hadoop, and if it is an index access, use HBase+Hadoop.

Hive query is that MapReduce jobs can run from 5 minutes to more than a few hours. HBase is very efficient and certainly much more efficient than Hive.

This is the end of the content of "what are the basic frameworks of Hadoop". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Servers

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report