How to Build an Enterprise Personalized Recommendation Platform Based on DAYU

2025-04-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article shows how to build an enterprise personalized recommendation platform based on DAYU.

Abstract: the most time-consuming and labor-intensive part of building this platform is orchestrating, managing, and scheduling the batch and streaming jobs. All of this can be handled by the data development function of DAYU.

Most e-commerce enterprises build their own personalized recommendation systems. These use the enterprise's own user data, product data, user behavior data, and tag-based user profiles across many dimensions to calculate user preferences, recommend the most suitable products to each user, and maximize transactions.

A typical recommendation system consists of a batch computing layer, a real-time processing layer, and a recommendation application, which is a typical Lambda architecture.

The most time-consuming part of building this platform is orchestrating, managing, and scheduling the batch and streaming jobs, and all of this can be handled by the data development function of DAYU. You might ask: why not just use a dedicated personalized recommendation cloud service? We are not trying to argue the point. If an enterprise lacks the expertise to work with a range of recommendation algorithms, buying a recommendation service is the best choice. But if you want to tune the algorithms continuously and get the most out of them, building on your own framework is the more dependable option. The following example shows how to use DAYU to quickly put together a simple recommendation system. Besides DAYU's data development function, it also uses Huawei Cloud's DLI, DIS and MRS-HBase.

First, we introduce the two types of jobs in DAYU's data development function:

Batch job

A batch job can only be triggered by scheduling, and each run must finish within a finite time. In other words, the task cannot run indefinitely. A job is a pipeline composed of one or more operators, and the pipeline is scheduled as a whole.

Real-time job

The name "real-time job" is not entirely accurate: such a job can be a mixed stream and batch job, a pure streaming job, or a plain batch job. It is also a pipeline composed of multiple operators. Unlike in a batch job, each operator in a real-time job can be given its own scheduling strategy, and the tasks an operator starts may stay online indefinitely, which makes it possible to schedule always-on Flink and Spark Streaming jobs. In a real-time job, the arrows between operators only express business relationships; they do not represent the order of task execution, let alone data flow.
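
The scheduling difference between the two job types can be pictured with a small data model. This is only a sketch in Python: the class names, fields, and schedule strings are hypothetical and are not DAYU's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Operator:
    name: str
    # Only used inside a real-time job, where each operator
    # carries its own scheduling strategy (hypothetical values).
    schedule: Optional[str] = None


@dataclass
class Job:
    name: str
    kind: str  # "batch" or "realtime"
    operators: List[Operator] = field(default_factory=list)
    # Only used for a batch job, which is scheduled as a whole.
    schedule: Optional[str] = None

    def schedules(self) -> Dict[str, Optional[str]]:
        """Return the schedule that applies to each operator."""
        if self.kind == "batch":
            # The whole pipeline is triggered as one unit.
            return {op.name: self.schedule for op in self.operators}
        # Each operator in a real-time job decides for itself.
        return {op.name: op.schedule for op in self.operators}
```

In this model a batch job hands one schedule to every operator, while a real-time job can mix an always-on streaming operator with a daily batch operator in the same pipeline.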

The back end of this recommendation system is implemented as a real-time job. The panorama of this mixed stream and batch job is as follows:

[Figure: panorama of the mixed stream and batch job]

This covers the main calculation flow of a simple recommendation system. Task flows for further algorithms, such as model-based algorithms and recommendation algorithms based on deep learning, are not shown, and neither is the calculation of the various recommendation metrics. Interested readers can look these up on Baidu.

The whole job comprises 9 data processing flows: 6 batch flows and 3 real-time flows.

Batch flows

From top to bottom:

1) Calculate the recommendation list based on individual user attributes and tags

Cycle: once a day

Calculation: every day, user data is extracted from RDS to DLI through CDM. A recommendation list is then generated from each user's basic attributes (age, sex, occupation, income, region, and so on) together with tag information from a 360-degree user profile system, and saved to HBase.

2) Calculate the recommendation list based on product similarity

Cycle: once a day

Calculation: every day, new product information is extracted from RDS to DLI through CDM. A recommendation list based on similar product features is then calculated and stored in HBase.

3) Calculate users' current preferences and generate a daily recommendation list

Cycle: once a day

Calculation: a DIS dump task writes the real-time user behavior data collected by the website to OBS. Several Spark algorithms (batch user-based collaborative filtering, item-based collaborative filtering, content-based similarity, LR, and so on) then calculate the recommendation list from one day of behavior data and push it to HBase.

4) Calculate users' preferences for the week and generate a weekly recommendation list

Cycle: once a day

Calculation: same as above, except that the recommendation list is calculated from one week of behavior data.

5) Calculate preferences over the last 3 months and generate a long-term preference recommendation list

Cycle: once a day

Calculation: same as above, except that the recommendation list is calculated from 3 months of behavior data.

6) Calculate the list of popular products

Cycle: once a day or every few hours

Calculation: based on the user behavior data on OBS (clicks, searches, ratings, and other actions across all products), the Top 50 popular products are calculated per category. This list also serves as a fallback: it fills the site's recommendation slots when the other recommendation lists do not supply enough items.
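
The behavior-based batch flows share one pattern: restrict the behavior records to a time window, then aggregate. Below is a minimal Python sketch of the popular-products flow. It assumes each record is a hypothetical (timestamp, user_id, item_id, category, score) tuple already loaded from OBS. The record layout and the function name are illustrative and are not part of DAYU, DLI, or Spark.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (timestamp, user_id, item_id, category, score)
Event = Tuple[datetime, str, str, str, float]


def top_items_by_category(
    events: Iterable[Event],
    now: datetime,
    window_days: int = 1,
    top_n: int = 50,
) -> Dict[str, List[str]]:
    """Rank items per category by total score inside the time window."""
    cutoff = now - timedelta(days=window_days)
    totals: Dict[str, Counter] = defaultdict(Counter)
    for ts, _user, item, category, score in events:
        if cutoff < ts <= now:  # keep only behavior inside the window
            totals[category][item] += score
    # most_common() orders items by accumulated score, highest first
    return {
        category: [item for item, _ in counter.most_common(top_n)]
        for category, counter in totals.items()
    }
```

Setting window_days to 7 or 90 selects the input range used by the weekly and 3-month flows, although those flows would apply their own recommendation algorithms rather than a simple popularity count.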

Real-time stream processing flows

1) Real-time calculation of user preferences: Item-Based collaborative algorithm

Calculation: a Flink task consumes data from the DIS user behavior channel. It first converts each user behavior log into a standard behavior record (Time,userid,ItemID,Score), then calculates the recommendation list with a streaming Item-Based collaborative algorithm and updates it in HBase.

2) Real-time calculation of user preferences: User-Based collaborative algorithm

Calculation: same as above, except that a streaming User-Based collaborative algorithm calculates the recommendation list and updates it in HBase.

3) Real-time calculation of user preferences: Content-Based algorithm

Calculation: same as above, except that a streaming Content-Based algorithm calculates the recommendation list and updates it in HBase.
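
All three real-time flows begin by normalizing a raw behavior log into the standard (Time, userid, ItemID, Score) record. The sketch below shows that step followed by a simplified incremental item co-occurrence update. It is plain Python rather than Flink code, and the raw log fields, the action weights, and the co-occurrence scoring are all assumptions for illustration, since the source does not specify them.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set

# Hypothetical weights: the source does not say how actions map to scores.
ACTION_SCORES = {"click": 1.0, "favorite": 3.0, "purchase": 5.0}


class Behavior(NamedTuple):
    time: int
    userid: str
    item_id: str
    score: float


def normalize(raw: dict) -> Behavior:
    """Convert one raw log entry (assumed field names) to the standard record."""
    return Behavior(
        time=int(raw["ts"]),
        userid=raw["user"],
        item_id=raw["item"],
        score=ACTION_SCORES.get(raw["action"], 0.0),
    )


class StreamingItemCF:
    """Simplified item co-occurrence counts, updated one event at a time."""

    def __init__(self) -> None:
        self.user_items: Dict[str, Set[str]] = defaultdict(set)
        self.cooccur: Dict[str, Dict[str, float]] = defaultdict(
            lambda: defaultdict(float)
        )

    def update(self, event: Behavior) -> None:
        seen = self.user_items[event.userid]
        if event.item_id in seen:
            return  # count each (user, item) pair only once
        for other in seen:
            # Link the new item with everything this user touched before.
            self.cooccur[event.item_id][other] += event.score
            self.cooccur[other][event.item_id] += event.score
        seen.add(event.item_id)

    def recommend(self, userid: str, top_n: int = 10) -> List[str]:
        seen = self.user_items.get(userid, set())
        scores: Dict[str, float] = defaultdict(float)
        for item in seen:
            for other, weight in self.cooccur.get(item, {}).items():
                if other not in seen:
                    scores[other] += weight
        ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
        return [item for item, _ in ranked[:top_n]]
```

A production item-based algorithm would normally also normalize the co-occurrence counts and expire old events. Both are left out here to keep the update step visible.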

Once these jobs have run, HBase holds recommendation lists keyed by UserID and by Item, for example:

User-based recommendation list results:

Userid_001: item100, item899, item433, item666, ...

Userid_002: item220, item334, item720, item666, ...

Userid_003: item728, item899, item333, item632, ...

Each user has several such lists, one for each time span of real-time or historical behavior.

Product-based recommendation list results:

Item_0001: Item1000, Item333, Item5213, ...

Item_0002: Item1000, Item333, Item5213, ...

Item_0003: Item1000, Item333, Item5213, ...
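
A product-based list of this kind is what the product similarity batch flow produces. As a minimal sketch, assume each product is described by a hypothetical sparse feature vector (feature name to weight) and that cosine similarity is the chosen measure. The source names neither the features nor the measure, so both are assumptions.

```python
import math
from typing import Dict, List


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_items(
    features: Dict[str, Dict[str, float]], top_n: int = 10
) -> Dict[str, List[str]]:
    """For every item, list the most similar other items, best first."""
    result: Dict[str, List[str]] = {}
    for item, vec in features.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in features.items()
            if other != item
        ]
        # Highest similarity first; ties broken by item id for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[item] = [other for score, other in scored[:top_n] if score > 0]
    return result
```

This all-pairs comparison is quadratic in the number of products, so it only suits a small catalog. A daily batch job over a large catalog would need a distributed or approximate method.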

In addition, the recommendation platform needs a service that exposes a REST interface for the website's recommendation slots to call. When a user opens a web page, the page automatically requests the current user's recommendations from this service. The service reads from HBase the recommendation lists calculated by the jobs above, combines them into a single list according to some strategy, and returns it to the page, completing the end-to-end recommendation flow.
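
The source does not say which strategy combines the lists. One possible strategy, sketched here in Python, is to interleave the lists round-robin, drop duplicates, and top up from the popular-products list when slots remain. The function name and parameters are hypothetical, and the HBase reads are omitted: the lists are passed in as plain Python lists.

```python
from typing import List, Sequence


def merge_recommendations(
    lists: Sequence[List[str]],
    fallback: List[str],
    slots: int,
) -> List[str]:
    """Interleave ranked lists round-robin, drop duplicates, and
    top up from the fallback list if slots remain."""
    merged: List[str] = []
    seen = set()
    longest = max((len(lst) for lst in lists), default=0)
    for rank in range(longest):
        for lst in lists:  # earlier lists win at the same rank
            if rank < len(lst) and lst[rank] not in seen:
                seen.add(lst[rank])
                merged.append(lst[rank])
    for item in fallback:  # for example, the popular-products list
        if len(merged) >= slots:
            break
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged[:slots]
```

Putting the real-time lists first in the lists argument gives fresh behavior priority over the daily, weekly, and long-term lists.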

A complete recommendation system is more complex, and recommendation systems as a topic are not discussed further here. The example shows that DAYU has powerful orchestration and scheduling capabilities, and that a single job can cover very complex scenarios. In practice, a large recommendation platform still needs targeted customization, because it involves management processes that must be handled as a closed loop. Even so, building on the platforms and applications in the Huawei Cloud ecosystem, with DAYU as an assistant, makes data-related work more concise and efficient.
