2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to optimize COUNT(DISTINCT) in Hive.
Problem description
COUNT(DISTINCT xxx) can easily cause data skew in Hive. Many optimizations for this are described online, so they are not repeated here.
But sometimes data skew is almost unavoidable. Consider an example:
Suppose the client sessions of users visiting a website M are recorded in the table detail_sdk_session: each time user A opens the app client, one session record is written to the table. The table's granularity is one row per session, and each session records the user's unique identifier, uuid. The uuid is a very long string; assume it is 64 characters long. The requirement is to compute, every day, the number of users active in the current month so far, the "monthly active users" (anyone who visited the app in that month counts as active). We use January 2016 as the example, and now denotes the current date.
The easiest way
The logic is simple and the SQL is easy to write, for example:
SELECT COUNT(DISTINCT uuid)
FROM detail_sdk_session t
WHERE t.date >= '2016-01-01' AND t.date <= now
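For contrast with the direct query, here is a minimal sketch of one commonly cited rewrite: deduplicate with GROUP BY in a subquery, then count the resulting rows. It is an illustration only, not necessarily the approach this article goes on to use. The table and column names are taken from the example; the alias deduped, the output name monthly_active_users, and the Hive variable now (supplied, for instance, with --hivevar now=2016-01-20, a made-up date) are assumptions.

```sql
-- Sketch only: assumes the detail_sdk_session table from the example and a
-- Hive variable `now` holding the current date as 'yyyy-MM-dd'.
SELECT COUNT(*) AS monthly_active_users
FROM (
  -- GROUP BY lets multiple reducers deduplicate uuid values in parallel,
  -- instead of one reducer handling every distinct value.
  SELECT uuid
  FROM detail_sdk_session t
  -- `date` is backquoted because it is a reserved word in newer Hive versions.
  WHERE t.`date` >= '2016-01-01' AND t.`date` <= '${hivevar:now}'
  GROUP BY uuid
) deduped;
```

The outer COUNT(*) still ends in a single final aggregation, but it only has to count rows that are already deduplicated rather than compare long uuid strings.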