
Notes on exporting big data from Hadoop


Requirement: export one month of data for test analysis.

Implementation:

Run the query directly with hive -e and redirect the output to a file:

hive -e "xxxxx" > testdata.txt

Then I watched the job output: the map progress ticked up (1%, 2%, 3%, ...) while reduce stayed at 0%. After more than ten hours of waiting it was still 0%, so I killed the process and retried several times, always with the same result. Each attempt meant another ten-plus hours of waiting; two days slipped by in a flash.

So I suspected that there was something wrong with the cluster and investigated it for a long time, but no problem was found.

I also suspected there was something wrong with the WHERE conditions, but after fiddling with them for a long time the result was the same.

Later I added a LIMIT to see whether any results came back at all; if they did, the syntax had to be correct. Sure enough, LIMIT 10 quickly returned 10 records, so the syntax was fine.
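A sanity check of that kind, reusing the same database, table, and date column that appear in the commands further down, might look like this:

hive -e "use dw;select * from aem where day = '2015-08-24' limit 10"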

Then I switched to Spark for the extraction, but it kept reporting an insufficient buffer; even after raising the buffer to ten times its original size, it still complained.
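The post does not name the exact buffer setting. If the failure came from Kryo serialization, a common source of "buffer insufficient" errors in Spark, raising the ceiling would look roughly like this; the 1024m value and the query are only illustrative:

spark-sql --conf spark.kryoserializer.buffer.max=1024m -e "use dw;select * from aem where day = '2015-08-24'"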

Is the data really that big?

Time to run a COUNT first. Wait, wait, wait! I had been wrong all along.
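The count itself is a one-liner; a sketch assuming the month in question was August 2015, as the later commands suggest:

hive -e "use dw;select count(*) from aem where day >= '2015-08-01' and day <= '2015-08-31'"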

So I first used Hive to export a single day's data. The file took about 20 minutes to write, and I wondered how many rows that could be. When it finished, wc -l showed more than 8 million lines and the file was 4 GB. That explained everything: it was not a cluster problem at all; there was simply so much data that the reduce phase was crawling.
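Checking a single day's export that way is straightforward; the file name here is illustrative:

hive -e "use dw;select * from aem where day = '2015-08-24'" > oneday.txt
wc -l oneday.txt
ls -lh oneday.txt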

Finally, I estimated each record at roughly 600 bytes and took 1,000 records per day for 7 consecutive days, giving a final file of about 4 MB (600 B × 1,000 × 7 ≈ 4.2 MB).

The commands are as follows:

Hive-e "set hive.cli.print.header=true;use dw;select * from aem where day = '2015-08-24' limit 1000" > aem_pg_8_24_30.txt

Hive-e "use dw;select * from aem where day = '2015-08-25' limit 1000" > > aem_pg_8_24_30.txt

Hive-e "use dw;select * from aem where day = '2015-08-26' limit 1000" > > aem_pg_8_24_30.txt

Hive-e "use dw;select * from aem where day = '2015-08-27' limit 1000" > > aem_pg_8_24_30.txt

Hive-e "use dw;select * from aem where day = '2015-08-28' limit 1000" > > aem_pg_8_24_30.txt

Hive-e "use dw;select * from aem where day = '2015-08-29' limit 1000" > > aem_pg_8_24_30.txt

Hive-e "use dw;select * from aem where day = '2015-08-30' limit 1000" > > aem_pg_8_24_30.txt

Takeaways:

Big data calls for a slightly different way of thinking: estimate the amount of data first, then choose the export method. If the data set is too large, reduce the granularity and export in multiple passes.
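That estimate can even be scripted; a sketch assuming ~600 bytes per record as above (hive -S suppresses progress messages so only the count is captured):

rows=$(hive -S -e "use dw;select count(*) from aem where day >= '2015-08-24' and day <= '2015-08-30'")
echo "estimated size: $(( rows * 600 / 1024 / 1024 )) MB"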

Thinking is very important! Thinking is very important! Thinking is very important!
