How to update the traditional hive data table by writing SQL 07/09 Update SLTechnology News&Howtos

How to update the traditional hive data table by writing SQL

2025-07-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/01 Report--

This article mainly shows you the "traditional hive data table how to write SQL way to achieve data update", the content is easy to understand, well-organized, hope to help you solve your doubts, the following let the editor lead you to study and learn "traditional hive data table how to write SQL to achieve data update" this article.

1. Here are two new tables student10,student10_temp (temporary tables, which can be regarded as business libraries). The table statement is as follows:

/ / Student information table create table student10 (id string,name string,age string) row format delimited fields terminated by',' stored as textfile;// student information table temporary table, changes in business relational database, generally import temporary table create table student10_temp (id string,name string,age string) row format delimited fields terminated by',' stored as textfile

two。 Data preparation, which can be understood as the full amount of data before the business database update, file test.txt, the data are as follows:

[root@salver158 ~] # cat / tmp/test.txt 1 cat / Name3 18

3. Here, load the data from test.txt into the student10_ temp table and execute the command:

Hive > load data local inpath'/ tmp/test.txt' into table student10_temp;Loading data to table lujs.student10_tempTable lujs.student10_temp stats: [numFiles=1, numRows=0, totalSize=33, rawDataSize=0] OKTime taken: 1.275 seconds

Execute insert into to insert student10_temp into the student10 table in full:

Hive > insert into table student10 select id,name,age from student10_temp

At this point, the data of the two tables are the same, which is equivalent to the import of full data for the first time in our production. At this time, we update the student10_ temp table (here we delete one piece of data and insert two pieces of data in the simulated business relational database), and load the file test1.txt directly:

[root@salver158 tmp] # cat / tmp/test1.txt 2,name2,173,name3,5104,name4,195,name5,20

In fact, this student10_temp generally represents the change of the relational database, and then we can extract all the changed data from the relational database to the student10_temp temporary table through data extraction tools such as sqoop.

Hive > select * from student10_temp;OK2 name2 173 name3 5104 name4 195 name5 20

4. We can compare the data through SQL, and then overwrite the non-incremental and non-updated data into the student10 table. After executing SQL, we can ensure that the student10 and student10_temp data are not duplicated (here the table must have a primary key, otherwise it cannot be associated)

Find out non-incremental and non-updated data, that is, deleted data, and execute the command:

Select b.id,b.name,b.age from student10 a left join student10_temp b on a.id=b.id where b.id is nul

Overwrite non-incremental, non-updated data to the student10 table and execute the command:

Insert overwrite table student10 select b.id,b.name,b.age from student10 a left join student10_temp b on a.id=b.id where b.id is null

After execution, there is only non-incremental and non-updated data in the table, which generally means that the business database has deleted data. Here we can choose to keep it as an identification.

5. After the comparison and deduplication in step 4, you can load the temporary table data and execute the command:

Hive > insert into table student10 select id,name,age from student10_temp

View the final data, including non-incremental, non-updated data, as well as updated and incremental data:

Hive > select * from student10;OK1 name1 162 name2 173 name3 5104 name4 195 name5 20Time taken: 0.436 seconds, Fetched: 5 row (s)

Here is just a simple data update scheme, there may be problems in the middle, the last backup should be done before the data update, so as not to lose data.

Supplementary SQL:

1. Compare the new and updated data SQL. You can check before step 4. You can add an identification field in the student10 table to identify whether the data has been deleted, updated, or added. It is not demonstrated here. If you are interested in studying it yourself:

Hive > select a.namemema. Name.age 8b56 case WHEN b.id IS null THEN 'added' ELSE 'update' END FROM student10_temp a LEFT JOIN student10 b on a.idmemb.iditterQuery ID = root_20200310194355_d5428597-77a7-40e6-8b56-6b636c3e137eTotal jobs = 1Launching Job 1 out of 1Status: Running (Executing on YARN cluster with App id application_1583204243863_0026)

-VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED-- -Map 1. SUCCEEDED 1 1 0 0 0 0Map 2. SUCCEEDED 1 10000-VERTICES: 02 ELAPSED TIME 02 [= = > >] 100% Universe: 6.59s-OK2 name2 17 Update 3 name3 510 Update 4 name4 19 New 5 name5 20 New above is "traditional hive data Table" All the contents of the article "how to update data by writing SQL" Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.