In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Yesterday the developer came to our DBA and asked us to write a Hive SQL.
Demand:
There is a t table, which mainly consists of two columns: the name of the airport airport and the latitude and longitude of the airport. I want to get the names of all two airports with distances less than 100s.
In fact, it is not very difficult to write the logic of this SQL. The difficulty is how to repeat the value.
I use MySQL to simulate a table, in fact, Hive syntax is similar to SQL, inserting three pieces of data, a, b, c represent three airport names, the structure is as follows:
Mysql > show create table t\ gateway * 1. Row * * Table: tCreate Table: CREATE TABLE `t` (`airport` varchar (10) DEFAULT NULL, `distant` int (11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf81 row in set (0.00 sec) mysql > select * from t +-+-+ | airport | distant | +-+-+ | a | 130 | b | 140 | c | 150 | +-+-+ 3 rows in set (0.00 sec)
Pass! = filter out the comparison between the airport itself, and use the abs function to take the absolute value to get two airports whose position is less than 100.
Mysql > select t1.airport, t2.airport from t T1 where t1.airport! = t2.airport and abs (t1.distant-t2.distant)
< 100;+---------+---------+| airport | airport |+---------+---------+| b | a || c | a || a | b || c | b || a | c || b | c |+---------+---------+6 rows in set (0.00 sec) 但是问题来了,(b,a) 与(a,b),(c,a)与(a,c),(c,b)与(b,c)这里被我们视为重复值,我们只需要得到其中某一行的数据,就知道是哪两个机场名了,那么,如何去掉这个重复值呢? 貌似distinct,group by都派不上用场了,最后咨询了一位资深的SQL高手,找到了这么一个函数hex(),可以把一个字符转化成十六进制,Hive也有对应的函数,效果如下: mysql>Select t1.from t1.airport Hex (t1.distant-t2.distant), t2.Airport where t1.airport Hex (t2.airport) from t 1jiggle T2 where t1.airport! = t2.airport and abs (t1.distant-t2.distant)
< 100;+---------+-----------------+---------+-----------------+| airport | hex(t1.airport) | airport | hex(t2.airport) |+---------+-----------------+---------+-----------------+| b | 62 | a | 61 || c | 63 | a | 61 || a | 61 | b | 62 || c | 63 | b | 62 || a | 61 | c | 63 || b | 62 | c | 63 |+---------+-----------------+---------+-----------------+6 rows in set (0.00 sec) 这样我们就可以通过比较机场1和机场2的大小,来去掉重复值了 mysql>Select t1.airport, t2.airport from t T1 where t1.airport! = t2.airport and hex (t1.airport)
< hex(t2.airport) and abs(t1.distant-t2.distant) < 100;+---------+---------+| airport | airport |+---------+---------+| a | b || a | c || b | c |+---------+---------+3 rows in set (0.00 sec) 最后再优化一下,结果如下: mysql>Select t1.airport, t2.airport from t T1 where hex (t1.airport) < hex (t2.airport) and abs (t1.distant-t2.distant) < 100 +-+-+ | airport | airport | +-+-+ | a | b | a | c | b | c | +-+-+ 3 rows in set (0.00 sec)
SQL is not complicated, there are not too many join and subqueries for tables, but it can be solved by removing duplicate values and using distinct or group by before, so it doesn't seem to be suitable this time, so record it and welcome to clap bricks.
Reference link
Https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_hex
Https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inFunctions
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.