In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/01 Report--
How to use DolphinDB for Taobao user behavior analysis, I believe that many inexperienced people do not know what to do. Therefore, this paper summarizes the causes and solutions of the problem. Through this article, I hope you can solve this problem.
DolphinDB is a new generation of high-performance distributed time series database (time-series database), which has rich functions of data analysis and distributed computing. This tutorial uses DolphinDB to analyze the user behavior data of Taobao APP to further analyze business problems.
Data source: User Behavior Data from Taobao for Recommendation- dataset-Ali Yun Tianchi
We encapsulate DolphinDB database and the dataset we use into docker. Docker includes DolphinDB's distributed database dfs://user_behavior. It contains a table, user, which holds the behavior records of nearly 1 million Taobao APP users between November 25, 2017 and December 3, 2017. We use the combined partition method, the first layer is partitioned according to the date, one partition per day, and the second layer is divided into 180 partitions according to userID. The structure of the user table is as follows:
The meanings of the various types of user behavior are as follows:
Pv: browse the product details page
Buy: purchase of goods
Cart: add goods to the shopping cart
Fav: collect goods
1. Download the docker deployment package
This tutorial has encapsulated DolphinDB and the data used in a docker container. Make sure the docker environment is deployed before using it. Please refer to https://docs.docker.com/install/ for docker installation tutorials. Download the deployment package from http://www.dolphindb.cn/downloads/bigdata.tar.gz and execute the following code to the directory where the deployment package is located.
Extract the deployment package:
Gunzip bigdata.tar.gz
Import the container snapshot as a mirror:
Cat bigdata.tar | docker import-my/bigdata:v1
Get the ID of the image my/bigdata:v1:
Docker images
Start the container (replace images id according to the actual situation):
Docker run-dt-p 8888 bin/bash 8848-- name test / bin/bash. / dolphindb/start.sh
Enter the local IP address: 8888, such as localhost:8888, in the browser address bar to enter DolphinDB Notebook. The following code is executed in DolphinDB Notebook.
The DolphinDB license in this docker is valid until September 1, 2019. If the license file expires, you only need to download the community version from the DolphinDB official website and replace bigdata.tar/dolphindb/dolphindb.lic with the community version of license.
two。 User behavior analysis
View the amount of data:
Login ("admin", "123456") user=loadTable ("dfs://user_behavior", "user") select count (*) from user98914533
There are 98914533 records in the user table.
Analyze the behavior of users in the whole process from browsing to final purchase:
PV=exec count (*) from user where behavior= "pv" 88596903UV=count (exec distinct userID from user) 987984
During these nine days, the number of page visits to Taobao APP was 88596 and 903, and the number of unique visitors was 987984.
The exec used above is unique to DolphinDB and is similar to select. The difference between the two is that the select statement always returns a table, exec returns a vector when selecting a column, a scalar is returned when used with an aggregate function, and a matrix is returned when used with pivoy by, which facilitates subsequent calculation of the data.
Count the number of users who browse the page only once:
OnceUserNum=count (select count (behavior) from user group by userID having count (behavior) = 1) 92jumpRate=onceUserNum\ UV*1000.009312
Only 92 users left APP after only one page, accounting for 0.0093% of the total number of users, which is almost negligible, indicating that Taobao is attractive enough for users to stay in APP.
Count the number of individual user behaviors:
Behaviors=select count (*) as num from user group by behavior
Calculate the conversion rate from browsing to intended purchase:
Adding goods to the shopping cart and collecting items can be considered that users are interested in buying. Count the number of user behaviors that are interested in purchasing:
Fav_cart=exec sum (num) from behaviors where behavior= "fav" or behavior= "cart" 8318654intentRate=fav_cart\ PV*1009.389328
The conversion rate from browsing to intentional purchase is only 9.38%.
Buy= (exec num from behaviors where behavior= "buy") [0] 1998976buyRate=buy\ PV*1002.256259intent_buy=buy\ fav_cart*10024.030041
The conversion rate from browsing to final purchase is only 2.25%, and from intended purchase to final purchase is 24.03%, indicating that most users will collect or add their favorite products to the shopping cart, but will not necessarily buy them immediately.
Statistics of unique visitors to various user behaviors:
UserNums=select count (userID) as num from (select count (*) from user group by behavior,userID) group by behavior
Pay_user_rate= (exec num from userNums where behavior= "buy") [0]\ UV*10067.852313
During these nine days, 67.8% of the paying users used Taobao APP, indicating that most users will shop on Taobao APP.
Count the number of users with various user behaviors per day:
DailyUserNums=select sum (iif (behavior== "pv", 1Power0)) as pageView, sum (iif (behavior== "fav", 1Power0)) as favorite, sum (iif (behavior== "cart", 1mem0) as shoppingCart, sum (iif (behavior== "buy", 1mem0) as payment from user group by date (behaveTime) as date
Visits to Taobao APP increased significantly on Friday, Saturday and Sunday (2017.11.25,2017.11.26,2017.12.02, 2017.12.03).
Iif is the conditional operator of DolphinDB, its syntax is iif (cond, trueResult, falseResult), cond is usually a Boolean expression, returns trueResult if cond is satisfied, and falseResult if cond is not satisfied.
Count the number of various user behaviors at different times of the day. We provide the following two methods:
The first method is to count the data of each time period separately, and then combine the results. For example, count the number of user behaviors in different periods of the working day on Wednesday, November 29, 2017.
Re1=select first (behaveTime) as time, sum (iif (behavior== "pv", 1p0)) as pageView, sum (iif (behavior== "fav", 1Power0) as favorite, sum (iif (behavior== "cart", 1p0) as shoppingCart, sum (iif (behavior== "buy", 1p0) as payment from user where behaveTime between 2017.11.29T00:00:00: 2017.11.29T05:59:59re2=select first (behaveTime) as time, sum (iif (behavior== "pv", 1J0)) as pageView, sum (iif (behavior== "fav", 1J0)) as favorite Sum (iif (behavior== "cart", 1Power0)) as shoppingCart, sum (iif (behavior== "buy", 1Power0)) as payment from user where behaveTime between 2017.11.29T06:00:00: 2017.11.29T08:59:59re3=select first (behaveTime) as time, sum (iif (behavior== "pv", 1SCO) as pageView, sum (iif (behavior== "fav", 1jin0) as favorite, sum (iif (behavior== "cart", 1d0) as shoppingCart, sum (iif (behavior== "buy") 1) as payment from user where behaveTime between 2017.11.29T09:00:00: 2017.11.29T11:59:59re4=select first (behaveTime) as time, sum (iif (behavior== "pv", 1) as pageView, sum (iif (behavior== "fav", 1) as favorite, sum (iif (behavior== "cart", 1) as shoppingCart, sum (iif (behavior== "buy", 1) as payment from user where behaveTime between 2017.11.29T12:00:00: 2017.11.29T13:59:59re5=select first (behaveTime) as time Sum (iif (behavior== "pv", 1Power0)) as pageView, sum (iif (behavior== "fav", 1Power0)) as favorite, sum (iif (behavior== "cart", 1Magin0)) as shoppingCart, sum (iif (behavior== "buy", 1Magin0) as payment from user where behaveTime between 2017.11.29T14:00:00: 2017.11.29T17:59:59re6=select first (behaveTime) as time, sum (iif (behavior== "pv", 1d0) as pageView, sum (iif (behavior== "fav", 1Mag0)) as favorite, sum (iif (behavior== "behavior==") ) as shoppingCart, sum (iif (behavior== "buy", 1d0)) as payment from user where behaveTime between 2017.11.29T18:00:00: 2017.11.29T21:59:59re7=select first (behaveTime) as time, sum (iif (behavior== "pv", 1d0)) as pageView, sum (iif (behavior== "fav", 1pen0) as favorite, sum (iif (behavior== "cart", 1d0) as shoppingCart, sum (iif (behavior== "buy") 1) as payment from user where behaveTime between 2017.11.29T22:00:00: 2017.11.29T23:59:59re=unionAll ([re1,re2,re3,re4,re5,re6,re7], false)
This approach is relatively simple, but requires a lot of repetitive code. Of course, you can also encapsulate repetitive code into functions.
Def calculateBehavior (startTime,endTime) {return select first (behaveTime) as time, sum (iif (behavior== "pv")) as pageView, sum (iif (behavior== "fav", 1) as favorite, sum (iif (behavior== "cart", 1) as shoppingCart, sum (iif (behavior== "buy", 1) as payment from user where behaveTime between startTime: endTime}
You only need to specify the start time of the time period.
Another way to do this is through DolphinDB's Map-Reduce framework. For example, count the user behavior on Wednesday, November 29, 2017.
Def caculate (t) {return select first (behaveTime) as time, sum (iif (behavior== "pv", 1memo)) as pageView, sum (iif (behavior== "fav", 1p0)) as favorite, sum (iif (behavior== "cart", 1pc0) as shoppingCart, sum (iif (behavior== "buy", 1pc0) as payment from t} ds1 = repartitionDS (, `behaveTime, RANGE) 2017.11.29T00:00:00 2017.11.29T06:00:000 2017.11.29T09:00:00 2017.11.29T12:00:00 2017.11.29T14:00:00 2017.11.29T18:00:00 2017.11.29T22:00:00 2017.11.29T23:59:59) WedBehavior = mr (ds1, caculate, unionAll {, false})
We use the repartitionDS function to re-partition the user table according to the time range (without changing the original partition mode of the user table), and generate multiple data sources, and then use the mr function to calculate the data sources in parallel. DolphinDB applies the caculate function to each data source and then merges the results.
On weekdays, Taobao has the highest utilization rate of APP in the early morning (0: 00 to 6: 00), followed by the afternoon (14:00 to 16:00).
Statistics of user behavior on Saturday (2017.11.25) and Sunday (2017.11.26):
Ds2 = repartitionDS (, `behaveTime, RANGE,2017.11.25T00:00:00 2017.11.25T06:00:000 2017.11.25T09:00:00 2017.11.25T12:00:00 2017.11.25T14:00:00 2017.11.25T18:00:00 2017.11.25T22:00:00 2017.11.25T23:59:59) SatBehavior = mr (ds2, caculate, unionAll {, false})
Ds3 = repartitionDS (, `behaveTime, RANGE,2017.11.26T00:00:00 2017.11.26T06:00:000 2017.11.26T09:00:00 2017.11.26T12:00:00 2017.11.26T14:00:00 2017.11.26T18:00:00 2017.11.26T22:00:00 2017.11.26T23:59:59) SunBehavior = mr (ds3, caculate, unionAll {, false})
The utilization rate of Taobao APP on Saturdays and Sundays is higher than that on weekdays. Similarly, the peak of APP usage on Taobao on Saturdays and Sundays is in the early morning (0: 00 to 6: 00).
3. Commodity Analysis allItems=select distinct (itemID) from user4142583
In these nine days, a total of 4142583 commodities were involved.
Count the number of purchases of each item:
ItemBuyTimes=select count (userID) as times from user where behavior= "buy" group by itemID order by times desc
Count the top 20 goods sold:
SalesTop=select top 20 * from itemBuyTimes order by times desc
Goods with an ID of 3122135 had the highest sales, with a total of 1408 purchases.
Count the quantity of goods under each purchase number:
BuyTimesItemNum=select count (itemID) as itemNums from itemBuyTimes group by times order by itemNums desc
The results show that the vast majority (370747) of goods were purchased only once in these nine days, accounting for 8.94% of all goods. The more times you buy, the less the number of goods involved.
Count the number of user behaviors for all goods:
AllItemsInfo=select sum (iif (behavior== "pv", 1Power0)) as pageView, sum (iif (behavior== "fav", 1mem0)) as favorite, sum (iif (behavior== "cart", 1pc0) as shoppingCart, sum (iif (behavior== "buy", 1mem0) as payment from user group by itemID
Count the top 20 merchandise viewed:
PvTop=select top 20 itemID,pageView from allItemsInfo order by pageView desc
The product with the highest number of views was ID 812879, with 29720 views, but sales were only 135, not in the top 20.
Count the number of user behaviors of the top 20 products:
Select * from ej (salesTop,allItemsInfo, `itemID) order by times desc
The highest-selling product 3122135 had 1777 views, did not enter the top 20, and the conversion rate from browsing to purchase was as high as 79.2%. The product may be a product with rigid demand, and users do not need to browse too much to decide to buy.
Expand the exercise:
(1) calculate the purchase rate of Taobao APP per hour on November 25, 2017 (purchase rate = number of purchases / total number of actions * 100%)
(2) find out the user who buys the most and the goods he buys the most.
(3) calculate the number of purchases of goods with an ID of 3122135 in each time period
(4) count the number of actions in each category
(5) calculate the highest-selling goods in each category
After reading the above, have you mastered how to use DolphinDB to analyze the behavior of Taobao users? If you want to learn more skills or want to know more about it, you are welcome to follow the industry information channel, thank you for reading!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.