Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use Mapping

2025-02-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

This article introduces the relevant knowledge of "how to use Mapping". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

one

Background background

In the login state, the login id (account) of the user is collected in the log, which can accurately identify the user; while in the anonymous state, the login id of the user is not collected in the log, and the user is accurately identified, which becomes an extremely tricky matter

In the event log, the fields that can identify the user are:

In the app log, there is deviceid,account

In the web log, there is cookieid,ip,account

In the wxapp log, there is openid,account

In reality, a user may be in the following extremely complex state:

Login status access app

Anonymous status access to app

Login status access web

Anonymous status access to web

Login status access to wx Mini Program

Anonymous status access to wx Mini Program

A user may have more than one terminal device

There may be multiple users on a terminal device.

A user may change his mobile phone after a period of time.

……

two

Solution

1 use device ID only

Applicable scenario

It is suitable for products that do not have a user registration system, or a very small number of users will log on to multiple devices, such as tool products, search engines, some small e-commerce, and so on.

This is also the only solution offered by most data analysis products.

Limitation

The use of the same user in different devices will be considered different users, which will have an impact on the subsequent analysis and statistics.

Different users using the same device will be considered as a user, which also has an impact on the subsequent analysis and statistics.

However, if it is not a common scenario for users to use cross-device or multi-user shared devices, the above problems can be ignored.

2 Associate device ID and login ID (one-to-one)

Applicable scenario

After successfully associating the device ID with the login ID, the user's behavior on the device ID or under the login ID will be consistent, which is considered to be a global ID. In the event, funnel, retention and other user-related analysis will also be counted as a user.

Although the method of associating device ID and logging in ID achieves more accurate user tracking, it also increases complexity.

Therefore, in general, we recommend that ID association be considered only if the following conditions are met:

You need to get through the behavior of a user before and after registering on a device.

You need to understand the behavior of a registered user after logging in on different devices.

Limitation

A device ID can only be associated with one login ID, when in fact a device may have multiple users.

A login ID can only be associated with one device ID, when in fact a user may log in on multiple devices with a login ID.

3 Associate device ID and login ID (many to one)

Applicable scenario

It is a common scenario for a user to log in on multiple devices, for example, both the Web side and the App side may need to log in. After supporting a login to associate a multi-device ID under an ID, the user's behavior under the multi-device will be consistent, which is considered to be the occurrence of an ID.

Limitation

A device ID can only be associated with one login ID, when in fact a device may have multiple users.

Once a device ID is associated with a login ID or a login ID is associated with a device ID, it cannot be undone (automatically).

In fact, the dynamic association between the device ID and the login ID should be more reasonable.

4 Associated device ID and login ID (dynamic correction)

Basic principles, same as option 3

Correction: after a device ID is bound to a login ID (A), if the device is used more frequently by a new login ID (B) in a subsequent period of time (for example, within a month), the device ID will be adjusted to bind login ID (B).

three

Realize

1 create a log table

Create table wedw_dw.test_id_mapping (device_id string comment 'device ID',user_id string comment' user ID',page_url string comment 'browsing page', login_time timestamp comment 'login time') partitioned by (date_id string) row format delimited fields terminated by', 'stored as textfile

2flume collects access logs on the first day

+-+ | device_id | user_id | page_url | login_time | date_id | +- +-+-+ | device01 | user01 | pageview | 2020-09-01 09 pageview | 2020-09-01 | | device01 | user01 | pageview | 2020-09-01 09 pageview | 2020-09 -01 | | device01 | user01 | pageview | 2020-09-01 09pageview | 2020-09-01 | device01 | user01 | pageview | 2020-09-01 09pageview 04lane 00.0 | 2020-09-01 | device02 | user02 | pageview | 2020-09-01 09lane 03lane 00.0 | 2020-09-01 | device02 | user02 | pageview | 2020-09-01 09Switzerland 04pur00.0 | 2020-09-01 | device02 | user02 | | | pageview | 2020-09-01 09user02 | pageview | 2020-09-01 | device02 | user02 | pageview | 2020-09-01 06lane 00.0 | 2020-09-01 | device02 | null | pageview | 2020-09-01 09Freight 07Ze00.0 | 20-09-01 | device02 | user03 | pageview | 2020-09-01 09Freight 08pur00.0 | 2020-09-01 | device03 | null | pageview | 2020-09 | 09-01 09 pageview | 2020-09-01 | device03 | null | 2020-09-01 | device04 | null | 2020-09-01 | 2020-09-01 | | device04 | null | pageview | 2020-09-01 09pageview | 2020-09-01 | + -+-+

Regardless of the number of access records of an account on a device, the score is calculated only once for the logged-in accounts on each device, and different scores are assigned to them in chronological order (decreasing by 10 points in turn). The example is as follows:

# calculate the global IDdrop table if exists wedw_tmp.login_info_1 corresponding to the device ID Create table wedw_tmp.login_info_1asselect t1.deviceSecretiddline t1.userroomidfrom (select device_id, user_id, page_url, login_time, row_number () over (partition by device_id order by login_time asc) as rn from wedw_dw.test_id_mapping where date_id = '2020-09-01' and user_id 'null') t1where rn = 1 +-- + | device_id | user_id | +-- + | device01 | user01 | | device02 | user02 | +-+ # calculate the login information of each user corresponding to the device ID And score drop table if exists wedw_tmp.login_info_2 Create table wedw_tmp.login_info_2asselectt4.device_id,collect_list (t4.login_info) as login_infofrom (select t3.device_id, concat (t3.username, t2.login_time) as login_infofrom (select t2.device_id, t2.user_id, t2.login_time) Row_number () over (partition by device_id order by login_time asc) as rn from (select device_id, user_id) Min (login_time) as login_time from wedw_dw.test_id_mapping where date_id = '2020-09-01' and user_id 'null' group by device_id,user_id) T2) T3) t4group by t4.device_id +-- + | device_id | login_info | | +-- + | device01 | ["user01- > 100-> 2020-09-01 09:01:00"] | | device02 | ["user02- > 100-> 2020-09-01 09:03:00" "user03- > 90-> 2020-09-01 09:08:00"] | +-+ # create guid mapping table drop table if exists wedw_dw.device_guid Create table wedw_dw.device_guid (device_id string comment 'device ID',login_info array comment' user login information', guid string comment 'global ID') row format delimited fields terminated by', 'stored as textfile # insert the initialized data into the guid mapping table insert into table wedw_dw.device_guidselect t1. DeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDevic +-- + | device_id | Login_info | guid | +-- -+ | device01 | ["user01- > 100-> 2020-09-01 09:01:00"] | user01 | | device02 | ["user02- > 100-> 2020-09-01 09:03:00" "user03- > 90-> 2020-09-01 09:08:00"] | user02 | +-+

3flume collects access logs on the second day

+-+ | device_id | user_id | page_url | login_time | date_id | +- +-+-- + | device01 | user01 | pageview | 2020-09-02 09VRV 11VUR 00.0 | 2020-09-02 | | device01 | user01 | pageview | 2020-09-02 09VRO 12VOR 00.0 | 2020-09 -02 | | device01 | user01 | pageview | 2020-09-02 09 pageview | 2020-09-02 | | device01 | user01 | pageview | 2020-09-02 09 pageview | 2020-09-02 | device02 | user03 | pageview | 2020-09-02 09 pageview | 2020-09-02 | device02 | user03 | pageview | 2020-09-02 09 pageview 140.0 | 2020-09-02 | device02 | user03 | | | pageview | 2020-09-02 09 user03 | pageview | 2020-09-02 | device02 | user03 | pageview | 2020-09-02 09V 16lane 00.0 | 2020-09-02 | | device02 | null | pageview | 2020-09-02 | 2020-09-02 | device02 | user02 | pageview | 2020-09-02 09Flange 18lane 00.0 | 20-09-02 | | device03 | null | pageview | 2020-02 | 09-02 09Freight 00.0 | 2020-09-02 | | device03 | user04 | pageview | 2020-09-02 09Freight 00.0 | 2020-09-02 | +- +-- + # calculate the global IDdrop table if exists wedw_tmp.login_info_3 corresponding to the device ID Create table wedw_tmp.login_info_3asselect t1.deviceSecretiddline t1.userroomidfrom (select device_id, user_id, page_url, login_time, row_number () over (partition by device_id order by login_time asc) as rn from wedw_dw.test_id_mapping where date_id = '2020-09-02' and user_id 'null') t1where rn = 1 +-- + | device_id | user_id | +-+ | device01 | user01 | | device02 | user03 | | device03 | user04 | +-+ # calculate each corresponding ID of this device Login information and score drop table if exists wedw_tmp.login_info_4 of each user Create table wedw_tmp.login_info_4asselectt4.device_id,collect_list (t4.login_info) as login_infofrom (select t3.device_id, concat (t3.username, t2.login_time) as login_infofrom (select t2.device_id, t2.user_id, t2.login_time) Row_number () over (partition by device_id order by login_time asc) as rn from (select device_id, user_id) Min (login_time) as login_time from wedw_dw.test_id_mapping where date_id = '2020-09-02' and user_id 'null' group by device_id,user_id) T2) T3) t4group by t4.device_id +-- + | device_id | login_info | | +-- + | device01 | ["user01- > 100-> 2020-09-02 09:11:00"] | | device02 | ["user03- > 100-> 2020-09-02 09:13:00" "user02- > 90-> 2020-09-02 09:18:00" | | device03 | ["user04- > 100-> 2020-09-02 09:12:00"] | + -+

Computational logic: 1. For the log data of the day, score rules for each account logged in on each device:

Regardless of the number of access records of an account on a device, the score is calculated only once on each device.

The logged-in accounts are assigned different scores in chronological order (decreasing by 10 points in turn). The example results are as follows:

# insert the calculated guid mapping result into the temporary table drop table if exists wedw_tmp.login_info_5;create table wedw_tmp.login_info_5select t1. DeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceDeviceMid as guidfromwedw_tmp.login_info_3 t1inner join wedw_tmp.login_info_4 t2on t1.device_id = t2.device_id, insert the calculated guid mapping result into the temporary table drop table if exists wedw_tmp.login_info_5;create table wedw_tmp.login_info_5select t. +-- + | device_id | Login_info | guid | +-- -+ | device01 | ["user01- > 100-> 2020-09-02 09:11:00"] | user01 | | device02 | ["user03- > 100-> 2020-09-02 09:13:00" "user02- > 2020-09-02 09:18:00" | user03 | | device03 | ["user04- > 100-> 2020-09-02 09:12:00"] | user04 | +-+ -- +

4 combine the result of the previous day and the result of that day to get the final mapping result of that day.

# twice result data rows are transferred and inserted into a temporary table drop table if exists wedw_tmp.login_info_6 Create table wedw_tmp.login_info_6selectdevice_id,split (tmp.sub,'- >') [0] as user_id,split (tmp.sub,'- >') [1] as score,split (tmp.sub,'- >') [2] as login_timefromwedw_dw.device_guidlateral view explode (login_info) tmp as subunion all selectdevice_id,split (tmp.sub,'- >') [0] as user_id,split (tmp.sub,'- >') [1] as score,split (tmp.sub '- >') [2] as login_timefromwedw_tmp.login_info_5lateral view explode (login_info) tmp as sub +-+-+ | device_id | user_id | score | login_time | +- -+-- + | device01 | user01 | 100 | 2020-09-01 09:01:00 | | device02 | user02 | 2020-09-01 09:03:00 | device02 | user03 | 90 | 2020-09-01 09:08:00 | | device01 | user01 | 100 | 2020-09-02 09:11:00 | device02 | user03 | 2020-09-02 09:13:00 | | device02 | | | user02 | 90 | 2020-09-02 09:18:00 | | device03 | user04 | 2020-09-02 09:12:00 | +-- +-- + # calculate the evaluation of each user under each device in the previous result data | Minute sum and earliest login time drop table if exists wedw_tmp.login_info_7 Create table wedw_tmp.login_info_7select device_id,user_id,sum (score) as score,min (login_time) as login_timefromwedw_tmp.login_info_6group by device_id,user_id +-+-+ | device_id | user_id | score | login_time | +- -+-- + | device01 | user01 | 200.0 | 2020-09-01 09:01:00 | | device02 | user02 | 190.0 | 2020-09-01 09:03:00 | | device02 | user03 | 190.0 | 2020-09-01 09:08:00 | | device03 | user04 | 100.0 | 2020-09-02 09:12:00 | +-- -+-- +-- + # merge the guiddrop table if exists wedw_tmp.login_info_8 corresponding to the result data calculation device ID twice Create table wedw_tmp.login_info_8select t2.device_id as device_id,t2.user_id as guidfrom (select t1.device_id, t1.user_id, t1.score, t1.login_time, row_number () over (partition by device_id order by t1.score desc) as rn from wedw_tmp.login_info_7 T1) t2where t2.rn = 1 +-- + | device_id | guid | +-+ | device01 | user01 | | device02 | user02 | | device03 | user04 | +-+ # calculate the result data twice after merging the ID pair of each device Required user login information drop table if exists wedw_tmp.login_info_9 Create table wedw_tmp.login_info_9selectt1.device_id,collect_list (t1.login_info) login_infofrom (select device_id, concat (user_id,'- >', score,'- >', login_time) as login_infofrom wedw_tmp.login_info_7) t1group by device_id +-+-+ | device_id | collect_list (login_info ) | +-- + | device03 | ["user04- > 100.0" -> 2020-09-02 09:12:00 "] | device02 | [" user02- > 190.0-> 2020-09-01 09:03:00 " "user03- > 190.0-> 2020-09-01 09:08:00"] | | device01 | ["user01- > 200.0-> 2020-09-01 09:01:00"] | + -- +-- + # insert the merged data into the guid mapping table insert overwrite table wedw_dw.device_guidselect t1.device_id as device_id T2.login_info as login_info,t1.guid as guidfrom wedw_tmp.login_info_8 t1inner join wedw_tmp.login_info_9 t2on t1.device_id = t2.device_id +-- + | device_id | Login_info | guid | + +-+-+ | device03 | ["user04- > 100.0-> 2020-09-02 09:12:00"] | user04 | | device02 | ["user02- > 190.0-> 2020-09-01 09:03:00" "user03- > 190.0-> 2020-09-01 09:08:00"] | user02 | | device01 | ["user01- > 200.0-> 2020-09-01 09:01:00"] | user01 | +-+-- This is the end of the introduction of "how to use Mapping". Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report