Shulou (Shulou.com), 05/31 report. Updated 2025-03-29, from SLTechnology News&Howtos.
This article explains in detail how to speed up retention calculations in ClickHouse.
User retention analysis is an essential feature of any big data analytics platform. Companies commonly use retention rate to measure how active their users are, and it directly reflects how much value the product delivers. Because retention rate is one of the most important indicators of user quality, calculating its variants is a basic data analysis skill. The following are practical examples of user retention analysis.
1. Preparation
Get familiar with the common ways of calculating retention rate, and note that ClickHouse provides the retention(cond1, cond2, …) aggregate function for calculating it.
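To make the behavior of retention(cond1, cond2, …) concrete, here is a minimal Python sketch of its documented semantics for one user's rows: the first flag is 1 if any row satisfies cond1, and each later flag is 1 only if the first flag is 1 and some row satisfies that condition. The helper name `retention_flags` and the sample dates are hypothetical and only serve as an illustration.

```python
from datetime import date, timedelta

def retention_flags(rows, conditions):
    """Mimic retention(cond1, cond2, ...) for the rows of a single user.

    flags[0] is 1 if any row satisfies conditions[0];
    flags[i] is 1 only if flags[0] is 1 and some row satisfies conditions[i].
    Assumes at least one condition is given.
    """
    hits = [any(cond(row) for row in rows) for cond in conditions]
    return [int(hits[0])] + [int(hits[0] and h) for h in hits[1:]]

base = date(2020, 8, 1)
# One condition per offset: "the login date equals base + k days"
conditions = [lambda d, k=k: d == base + timedelta(days=k) for k in (0, 1, 2)]

# Hypothetical user who logged in on the base day and two days later
flags = retention_flags([date(2020, 8, 1), date(2020, 8, 3)], conditions)
print(flags)  # [1, 0, 1]
```

Note that ClickHouse arrays are 1-indexed, so flags[0] here corresponds to r[1] in SQL.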
Create the table: the user login event table, login_event
-- user login event table
CREATE TABLE login_event
(
    `accountId` String COMMENT 'account ID: unique user ID',
    `ds` Date COMMENT 'date: user login date'
)
ENGINE = MergeTree
PARTITION BY accountId
ORDER BY accountId
Import data: insert the user login data for August
-- insert data (accountId is a String column, so the IDs are quoted)
INSERT INTO login_event VALUES
('10001', toDate('2020-08-01')), ('10001', toDate('2020-08-08')), ('10001', toDate('2020-08-09')),
('10001', toDate('2020-08-10')), ('10001', toDate('2020-08-12')), ('10001', toDate('2020-08-13')),
('10001', toDate('2020-08-14')), ('10001', toDate('2020-08-15')), ('10001', toDate('2020-08-16')),
('10001', toDate('2020-08-17')), ('10001', toDate('2020-08-18')), ('10001', toDate('2020-08-20')),
('10001', toDate('2020-08-22')), ('10001', toDate('2020-08-23')), ('10001', toDate('2020-08-24')),
('10002', toDate('2020-08-01')), ('10002', toDate('2020-08-06')), ('10002', toDate('2020-08-11')),
('10002', toDate('2020-08-12')), ('10002', toDate('2020-08-13')), ('10002', toDate('2020-08-15')),
('10002', toDate('2020-08-20')), ('10002', toDate('2020-08-22')), ('10002', toDate('2020-08-23')),
('10002', toDate('2020-08-24')), ('10002', toDate('2020-08-30')),
('10003', toDate('2020-08-01')), ('10003', toDate('2020-08-05')), ('10003', toDate('2020-08-08')),
('10003', toDate('2020-08-09')), ('10003', toDate('2020-08-10')), ('10003', toDate('2020-08-11')),
('10003', toDate('2020-08-13')), ('10003', toDate('2020-08-15')), ('10003', toDate('2020-08-16')),
('10003', toDate('2020-08-18')), ('10003', toDate('2020-08-20')), ('10003', toDate('2020-08-21')),
('10003', toDate('2020-08-22')), ('10003', toDate('2020-08-24')), ('10003', toDate('2020-08-25')),
('10003', toDate('2020-08-26')), ('10003', toDate('2020-08-27')), ('10003', toDate('2020-08-28')),
('10003', toDate('2020-08-29')), ('10003', toDate('2020-08-30')),
('10004', toDate('2020-08-01')), ('10004', toDate('2020-08-02')), ('10004', toDate('2020-08-03')),
('10004', toDate('2020-08-04')), ('10004', toDate('2020-08-05')), ('10004', toDate('2020-08-08')),
('10004', toDate('2020-08-09')), ('10004', toDate('2020-08-10')), ('10004', toDate('2020-08-11')),
('10004', toDate('2020-08-14')), ('10004', toDate('2020-08-15')), ('10004', toDate('2020-08-16')),
('10004', toDate('2020-08-17')), ('10004', toDate('2020-08-19')), ('10004', toDate('2020-08-20')),
('10004', toDate('2020-08-21')), ('10004', toDate('2020-08-22')), ('10004', toDate('2020-08-23')),
('10004', toDate('2020-08-24')), ('10004', toDate('2020-08-27')), ('10004', toDate('2020-08-30'))
2. Problem analysis
Calculate the next-day, 3-day, 7-day, 14-day, and 30-day retention of the users who were active on a given day. We split the problem into three steps:
Find the users who were active on the given day
Find whether those users logged in 1, 2, 6, 13, and 29 days later
Count the users who logged in on each of those days and divide by the number of users active on the given day to get the N-day retention rate.
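The three steps above can be sketched in plain Python before turning to SQL. This is only an illustration under stated assumptions: the function name `n_day_retention`, the user IDs, and the tiny data set are hypothetical and are not the article's sample data.

```python
from datetime import date, timedelta

# Hypothetical (accountId, login date) pairs, not the article's data set
logins = {
    ("u1", date(2020, 8, 1)), ("u1", date(2020, 8, 2)),
    ("u2", date(2020, 8, 1)), ("u2", date(2020, 8, 3)),
    ("u3", date(2020, 8, 1)),
    ("u4", date(2020, 8, 2)),  # not active on the base day, so never counted
}

def n_day_retention(logins, base_day, offsets):
    # Step 1: users active on the base day
    active = {uid for uid, d in logins if d == base_day}
    rates = {}
    for k in offsets:
        target = base_day + timedelta(days=k)
        # Step 2: which of those users logged in k days later
        retained = {uid for uid, d in logins if d == target and uid in active}
        # Step 3: retained users divided by base-day active users
        rates[k] = len(retained) / len(active) if active else 0.0
    return len(active), rates

active_count, rates = n_day_retention(logins, date(2020, 8, 1), [1, 2, 6])
print(active_count, rates)
```

Each of the SQL solutions that follow is a different way of expressing these same three steps.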
Solution 1:
-- Count how many of the users active on 2020-08-01 logged in 1, 2, 6, 13, and 29 days later, and calculate the retention rates
SELECT
    ds,
    count(accountIdD0) AS activeAccountNum,
    count(accountIdD1) / count(accountIdD0) AS `next-day retention`,
    count(accountIdD3) / count(accountIdD0) AS `3-day retention`,
    count(accountIdD7) / count(accountIdD0) AS `7-day retention`,
    count(accountIdD14) / count(accountIdD0) AS `14-day retention`,
    count(accountIdD30) / count(accountIdD0) AS `30-day retention`
FROM
(
    -- use LEFT JOIN to find which of the 2020-08-01 users logged in 1, 2, 6, 13, and 29 days later
    -- (an unmatched String column comes back as '', which is turned into NULL so that count() skips it)
    SELECT DISTINCT
        a.ds AS ds,
        a.accountIdD0 AS accountIdD0,
        IF(b.accountId = '', NULL, b.accountId) AS accountIdD1,
        IF(c.accountId = '', NULL, c.accountId) AS accountIdD3,
        IF(d.accountId = '', NULL, d.accountId) AS accountIdD7,
        IF(e.accountId = '', NULL, e.accountId) AS accountIdD14,
        IF(f.accountId = '', NULL, f.accountId) AS accountIdD30
    FROM
    (
        -- find the users active on 2020-08-01
        SELECT DISTINCT
            ds,
            accountId AS accountIdD0
        FROM login_event
        WHERE ds = '2020-08-01'
        ORDER BY ds ASC
    ) AS a
    LEFT JOIN login_event AS b ON (b.ds = addDays(a.ds, 1)) AND (a.accountIdD0 = b.accountId)
    LEFT JOIN login_event AS c ON (c.ds = addDays(a.ds, 2)) AND (a.accountIdD0 = c.accountId)
    LEFT JOIN login_event AS d ON (d.ds = addDays(a.ds, 6)) AND (a.accountIdD0 = d.accountId)
    LEFT JOIN login_event AS e ON (e.ds = addDays(a.ds, 13)) AND (a.accountIdD0 = e.accountId)
    LEFT JOIN login_event AS f ON (f.ds = addDays(a.ds, 29)) AND (a.accountIdD0 = f.accountId)
) AS temp
GROUP BY ds
Result:
┌─────────ds─┬─activeAccountNum─┬─next-day retention─┬─3-day retention─┬─7-day retention─┬─14-day retention─┬─30-day retention─┐
│ 2020-08-01 │                4 │               0.25 │            0.25 │               0 │              0.5 │             0.75 │
└────────────┴──────────────────┴────────────────────┴─────────────────┴─────────────────┴──────────────────┴──────────────────┘
1 rows in set. Elapsed: 0.022 sec.
Solution 2:
-- Determine how many of the users active on 2020-08-01 logged in 1, 2, 6, 13, and 29 days later, and calculate the retention rates
SELECT DISTINCT
    b.ds AS ds,
    ifnull(countDistinct(if(a.ds = b.ds, a.accountId, NULL)), 0) AS activeAccountNum,
    ifnull(countDistinct(if(a.ds = addDays(b.ds, 1), b.accountId, NULL)) / activeAccountNum, 0) AS `next-day retention`,
    ifnull(countDistinct(if(a.ds = addDays(b.ds, 2), b.accountId, NULL)) / activeAccountNum, 0) AS `3-day retention`,
    ifnull(countDistinct(if(a.ds = addDays(b.ds, 6), b.accountId, NULL)) / activeAccountNum, 0) AS `7-day retention`,
    ifnull(countDistinct(if(a.ds = addDays(b.ds, 13), b.accountId, NULL)) / activeAccountNum, 0) AS `14-day retention`,
    ifnull(countDistinct(if(a.ds = addDays(b.ds, 29), b.accountId, NULL)) / activeAccountNum, 0) AS `30-day retention`
FROM
-- use INNER JOIN to find the logins of the 2020-08-01 active users on and after that day
-- (the following 1 to 30 days in the sample data)
(
    SELECT
        ds,
        accountId
    FROM login_event
    WHERE ds >= '2020-08-01'
) AS a
INNER JOIN
-- find the users active on 2020-08-01
(
    SELECT DISTINCT
        accountId,
        ds
    FROM login_event
    WHERE ds = '2020-08-01'
) AS b ON a.accountId = b.accountId
GROUP BY ds
Result:
┌─────────ds─┬─activeAccountNum─┬─next-day retention─┬─3-day retention─┬─7-day retention─┬─14-day retention─┬─30-day retention─┐
│ 2020-08-01 │                4 │               0.25 │            0.25 │               0 │              0.5 │             0.75 │
└────────────┴──────────────────┴────────────────────┴─────────────────┴─────────────────┴──────────────────┴──────────────────┘
1 rows in set. Elapsed: 0.019 sec.
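The conditional distinct count used in Solution 2, countDistinct(if(condition, accountId, NULL)), can be pictured as one pass over the joined rows that drops each user ID into a bucket per day offset. The sketch below is a hedged Python illustration of that idea; `conditional_distinct_counts` and the sample rows are hypothetical, and the rows are assumed to be already restricted to users active on the base day, as the INNER JOIN does.

```python
from datetime import date

def conditional_distinct_counts(rows, base_day, offsets):
    """Count distinct users per day offset in a single pass over the rows."""
    buckets = {k: set() for k in offsets}
    for uid, d in rows:
        k = (d - base_day).days
        if k in buckets:          # rows on other days are ignored, like NULL in countDistinct
            buckets[k].add(uid)   # a set keeps each user once, like DISTINCT
    return {k: len(ids) for k, ids in buckets.items()}

# Hypothetical joined rows; u1's duplicate login on 2020-08-02 is counted once
rows = [
    ("u1", date(2020, 8, 1)), ("u1", date(2020, 8, 2)), ("u1", date(2020, 8, 2)),
    ("u2", date(2020, 8, 1)), ("u2", date(2020, 8, 7)),
]
counts = conditional_distinct_counts(rows, date(2020, 8, 1), [0, 1, 6])
rates = {k: counts[k] / counts[0] for k in (1, 6)}
print(counts, rates)  # {0: 2, 1: 1, 6: 1} {1: 0.5, 6: 0.5}
```

Compared with Solution 1, this shape needs a single join instead of one join per offset.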
Solution 3:
-- Use the array index, SUM(r[index]), to get how many of the users active on 2020-08-01 logged in 1, 2, 6, 13, and 29 days later, and calculate the retention rates
SELECT
    toDate('2020-08-01') AS ds,
    SUM(r[1]) AS activeAccountNum,
    SUM(r[2]) / SUM(r[1]) AS `next-day retention`,
    SUM(r[3]) / SUM(r[1]) AS `3-day retention`,
    SUM(r[4]) / SUM(r[1]) AS `7-day retention`,
    SUM(r[5]) / SUM(r[1]) AS `14-day retention`,
    SUM(r[6]) / SUM(r[1]) AS `30-day retention`
FROM
-- find whether each 2020-08-01 active user logged in 1, 2, 6, 13, and 29 days later: 1 = logged in, 0 = not logged in
(
    WITH toDate('2020-08-01') AS tt
    SELECT
        accountId,
        retention(
            toDate(ds) = tt,
            toDate(subtractDays(ds, 1)) = tt,
            toDate(subtractDays(ds, 2)) = tt,
            toDate(subtractDays(ds, 6)) = tt,
            toDate(subtractDays(ds, 13)) = tt,
            toDate(subtractDays(ds, 29)) = tt
        ) AS r
    -- find the login data of the 2020-08-01 active users over the following 1 to 30 days
    FROM login_event
    WHERE (ds >= '2020-08-01') AND (ds …
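The outer query in Solution 3 sums the per-user flag arrays column by column and divides each sum by the first one. A minimal Python sketch of that aggregation follows; the function name `rates_from_flags` and the flag values are hypothetical, and Python lists are 0-indexed, so index 0 here plays the role of r[1] in SQL.

```python
def rates_from_flags(per_user_flags):
    """Turn per-user 0/1 flag arrays into (active users, retention rates)."""
    # Column-wise sums, the counterpart of SUM(r[1]), SUM(r[2]), ...
    totals = [sum(column) for column in zip(*per_user_flags.values())]
    active = totals[0]
    rates = [t / active if active else 0.0 for t in totals[1:]]
    return active, rates

# Hypothetical flags: [active on base day, logged in 1 day later, logged in 2 days later]
per_user_flags = {
    "u1": [1, 1, 0],
    "u2": [1, 0, 1],
    "u3": [1, 1, 1],
    "u4": [0, 0, 0],  # not active on the base day, so every flag is 0
}
active, rates = rates_from_flags(per_user_flags)
print(active, rates)  # 3 [0.6666666666666666, 0.6666666666666666]
```

Because each user contributes one small array, this shape needs one scan and one aggregation rather than joins.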