How to optimize the output of Hive files

Updated 2025-03-30. From: SLTechnology News&Howtos (shulou)

This article describes how to speed up exporting Hive query results to local files.

While developing an ETL tool to extract data, we ran into the following problem: Hive needed only about 1 minute to execute the SQL statement as a MapReduce job, but writing the resulting records to a local file took more than ten minutes. On closer study, Hive offers two ways to export files.

The first is the most commonly used. It redirects the output of hive -e and produces one large file. It is easy to use, but because the file is written by a single thread, most of the available CPU and I/O capacity goes unused.

    hive -e "use dw;
    select detail.siid siid, si.basic_class1_id, si.basic_class1_name,
           si.basic_class2_id, si.basic_class2_name, si.classify, si.bi_name,
           city_id, city.name city_name, operate_area, object_id,
           operate_type, operate_type_text,
           instock_amount, instock_price, instock_total,
           outstock_amount, outstock_price, outstock_total,
           stock_amount, stock_cost, stock_total,
           sale_price, sale_total, profit_total,
           operate_origin_amount, operate_origin_price,
           from_unixtime(operate_time, 'yyyyMMdd HH:mm:ss'),
           if(delivery_datekey is null, 0, delivery_datekey),
           financial_origin_adjustment, warehouse_origin_adjustment
    from dw.dw_wms_operate_flow_detail_v2 detail
    left join dw.dw_goods_standard_item si on detail.siid = si.id
    left join dim.dim_city city on city.id = detail.city_id
    where delivery_datekey >= 20151213" > /home/meicai/test/data.txt

The second way is to name the output directory inside Hive itself, by placing INSERT OVERWRITE LOCAL DIRECTORY '/home/meicai/test' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' in front of the query. The files are then written by the mappers or reducers. Multiple writers run concurrently, so the result is multiple files, and resources such as CPU and I/O are used fully.

    hive -e "use dw;
    INSERT OVERWRITE LOCAL DIRECTORY '/home/meicai/test'
    row format delimited FIELDS TERMINATED BY '\t'
    select detail.siid siid, si.basic_class1_id, si.basic_class1_name,
           si.basic_class2_id, si.basic_class2_name, si.classify, si.bi_name,
           city_id, city.name city_name, operate_area, object_id,
           operate_type, operate_type_text,
           instock_amount, instock_price, instock_total,
           outstock_amount, outstock_price, outstock_total,
           stock_amount, stock_cost, stock_total,
           sale_price, sale_total, profit_total,
           operate_origin_amount, operate_origin_price,
           from_unixtime(operate_time, 'yyyyMMdd HH:mm:ss'),
           if(delivery_datekey is null, 0, delivery_datekey),
           financial_origin_adjustment, warehouse_origin_adjustment
    from dw.dw_wms_operate_flow_detail_v2 detail
    left join dw.dw_goods_standard_item si on detail.siid = si.id
    left join dim.dim_city city on city.id = detail.city_id
    where delivery_datekey >= 20151212"
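If the export runs regularly, the second form can be wrapped so that the target directory and the start date become parameters. The following is only a minimal sketch, not the author's tool: the function name build_export_sql is hypothetical, the column list is shortened for readability, and it assumes that siid, city_id and delivery_datekey are all columns of the detail table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the HiveQL for a directory export.
# $1 = local output directory, $2 = first delivery_datekey to include.
# Assumption: the three selected columns all belong to the detail table.
build_export_sql() {
    local out_dir="$1" start_key="$2"
    # Unquoted heredoc: the shell variables are expanded, '\t' is kept literally.
    cat <<EOF
use dw;
INSERT OVERWRITE LOCAL DIRECTORY '${out_dir}'
row format delimited FIELDS TERMINATED BY '\t'
select siid, city_id, delivery_datekey
from dw.dw_wms_operate_flow_detail_v2
where delivery_datekey >= ${start_key};
EOF
}

# Usage (requires a working hive client):
#   hive -e "$(build_export_sql /home/meicai/test 20151212)"
```

Because the function only prints the statement, it can be inspected before it is handed to hive -e.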

In testing, the first method took about 11 minutes and the second about 20 seconds.

The second method produces multiple files. They can either be merged into a single local file or loaded into the database one by one. To merge them:

    cat ./* | sed 's/NULL/\\N/g' > all.data

This replaces NULL in the text with \N so that MySQL recognizes the field as empty (NULL).

Alternatively, load the files into the database one at a time in a loop.
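One caveat with the plain sed substitution is that it rewrites every occurrence of the letters NULL, including those inside a longer value such as NULLABLE. A stricter merge might look like the sketch below, assuming tab-delimited part files as written by the export above. The function name merge_parts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenates every file in a directory and replaces
# only fields that are exactly NULL with \N (the NULL marker MySQL expects
# in LOAD DATA input). Other text that merely contains "NULL" is untouched.
# $1 = directory holding the exported part files, $2 = merged output file.
merge_parts() {
    local src_dir="$1" out_file="$2"
    cat "$src_dir"/* | awk '
        BEGIN { FS = "\t"; OFS = "\t" }
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "NULL") { $i = "\\N" }
            }
            print
        }
    ' > "$out_file"
}

# Usage (the output path is an example):
#   merge_parts /home/meicai/test /home/meicai/all.data
```

Writing the merged file outside the export directory keeps a rerun from picking it up as input.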
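Such a loop can be sketched as follows. This is an illustration under assumptions: the database name dw_export and the table name operate_flow are hypothetical, the table is assumed to exist with matching columns, and LOAD DATA LOCAL INFILE must be permitted on both the MySQL client and server. The helper only prints one statement per file, so the output can be reviewed before it is piped to mysql.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one LOAD DATA statement per exported file.
# $1 = directory holding the part files, $2 = target table (assumed to exist).
print_load_statements() {
    local src_dir="$1" table="$2" f
    # The * glob skips hidden files, so dot-prefixed side files are ignored.
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        printf "LOAD DATA LOCAL INFILE '%s' INTO TABLE %s FIELDS TERMINATED BY '%s';\n" \
            "$f" "$table" '\t'
    done
}

# Usage (database and table names are placeholders):
#   print_load_statements /home/meicai/test operate_flow | mysql --local-infile=1 dw_export
```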
