What is the order-making algorithm based on Graph Embedding?


2025-04-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--


I. Background

Order making is an important part of the shopping-voucher guidance flow. It is designed to help users find goods that bring an order up to a discount threshold (for example, 50 off when spending 400), complete cross-store order making, and improve the experience of the whole voucher flow. Threshold-discount vouchers are the most widely used marketing tool, and their advantage over red packets, discounts and other promotions is considerable: they not only give users real savings, but also encourage users to buy more, raising the average order value. As a key step in coupon use, order making aims to help consumers find products that qualify for coupons with the same threshold.

Compared with previous years, order making has two major breakthroughs this year. The first is a change in product form. In previous years it was just a product recommendation page. This year it supports search, price filtering, category filtering, sorting by sales, sorting by price and other search functions. The second is a major breakthrough in the algorithm: bundle mining based on Graph Embedding, where "bundle" means buying items together as a package. We think the key order-making scenario is a user who has already purchased product A and wants to find a product B that can be bought together with it, rather than a product similar to A. Traditional u2i and similarity-based i2i cannot meet the needs of this scenario. To get away from the often-criticized "find similar items" experience, we cannot rely on u2i, similarity-based i2i or comparable logic at all. Bundle mining has therefore become the focus of order-making algorithm optimization: it both enriches the experience and improves conversion efficiency.

II. Core algorithm

1. Basic ideas

A graph is a highly abstract and expressive data structure that describes the relationships between entities through nodes and edges. Commonly used graphs include social networks, commodity networks and knowledge graphs.

User behavior forms a natural network graph, and its edges and nodes often carry rich information. Graph embedding learns a latent representation vector for each node, encoding the associations between nodes in a continuous vector space so that the relationship between any two nodes is easy to compute. A graph also has the ability to propagate: random walks can mine multi-degree relationships, which effectively improves coverage and expands recall.

Graph Embedding is an important research direction in academia. DeepWalk is a typical method that extends language models and unsupervised learning from word sequences to graph structures: it treats truncated random-walk sequences as sentences and trains the Skip-Gram model from word2vec on them to obtain an embedding vector for each node. LINE samples only edges, while node2vec has adjustable parameters that bias the sampling toward BFS or DFS.

So the basic idea of Graph Embedding is to sample the graph (Sampling) and then build a model on the sampled sequences (Embedding).
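To make the "walks as sentences" idea concrete, here is a minimal sketch of how sampled walk sequences could be turned into (center, context) pairs of the kind a Skip-Gram model is trained on. The function name `skipgram_pairs`, the `window` parameter and the toy walks are assumptions made for illustration; they do not come from the original system.

```python
from typing import List, Tuple


def skipgram_pairs(walks: List[List[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Turn each walk (a 'sentence' of node ids) into (center, context) pairs.

    Hypothetical helper: every node is paired with the nodes that lie at most
    `window` positions before or after it in the same walk.
    """
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs


# Toy walks over made-up item ids.
walks = [["A", "C", "D"], ["B", "A", "C"]]
pairs = skipgram_pairs(walks, window=1)
print(pairs)
```

With a larger `window`, nodes that are two or more hops apart in a walk also become training pairs, which is how multi-degree relationships enter the embedding.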

2. Main technology

In our scenario, co-purchase relationships could also be mined directly from item-item statistics or with traditional collaborative filtering, so why build a graph? Because a graph can propagate: it not only extracts direct associations effectively, but also digs out second-degree and third-degree relationships through the walk strategy. We believe that a friend of a friend is also a friend, with a weaker connection. Making good use of this propagation ability mitigates the sparsity of purchase data and greatly improves coverage.
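The "friend of a friend" idea can be illustrated with a small breadth-first search that labels each reachable item with its degree of separation from a starting item. This is only a sketch of the propagation concept, not the walk strategy itself; the function name `relations_by_degree` and the toy graph are hypothetical.

```python
from collections import deque


def relations_by_degree(graph, start, max_degree=3):
    """Map each item reachable within `max_degree` hops to its hop distance from `start`."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if degree[node] == max_degree:
            continue  # do not expand beyond the requested degree
        for neighbor in graph.get(node, ()):
            if neighbor not in degree:
                degree[neighbor] = degree[node] + 1
                queue.append(neighbor)
    del degree[start]
    return degree


# Toy co-purchase graph: A was bought with B and C, C with D, D with E.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C", "E"], "E": ["D"]}
print(relations_by_degree(graph, "A"))
```

Direct statistics would only see B and C as related to A. The graph additionally exposes D (second degree) and E (third degree) as weaker candidates.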

Our work has three main parts. First, we build a graph from user purchase behavior: nodes are commodities, edges are simultaneous-purchase behavior, and weights are the share of simultaneous purchases, which can be based on purchase count, purchase time, amount and so on. Second, weighted sampling (weighted walk) produces the candidates for positive samples, while negative samples are drawn at random from users' non-purchase behavior. Third, the embedding part upgrades the unsupervised model to a supervised one: item-item pairs are constructed from the sequences collected by the weighted walk and fed to a supervised model (DNN) for training. The following figure shows the framework of the algorithm.

Figure: algorithm framework diagram

A) Build Graph

As mentioned above, we need to mine the co-purchase relationships between goods (bundle mining), which is similar to the "bought this, also bought that" problem. The graph we build is therefore a weighted network of goods: nodes are commodities, edges are co-purchase relationships between goods, and weights reflect co-purchase count and purchase time.

Why do we need a weighted graph? Because traditional methods such as random walk are not well suited to commodity networks. Commodity nodes often number in the tens of millions, and most of them are only weakly related to each other. In other words, most goods are unpopular, and only a small number of goods in the graph are hot spots. If we sampled with a plain random walk, we would collect a lot of sequences of unpopular nodes. So we perform a weighted walk based on the edge weights, steering the sampling toward hot nodes as far as possible. The resulting samples have higher confidence.

Therefore, our input is a weighted graph, defined as G = (V, E, W), where V = vertices (nodes; in the bundle scenario, goods), E = edges (in the bundle scenario, joint purchases), and W = weights (edge weights: the number and time of joint purchases), as shown in the following figure. Next we carry out the sampling.

Figure: weighted graph of goods
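As a rough illustration of G = (V, E, W), the sketch below builds a weighted co-purchase graph from a list of orders. It assumes the weight is simply the number of orders that contain both items; the article says weights can also take purchase time and amount into account, which is left out here. The function name `build_copurchase_graph` and the sample orders are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_copurchase_graph(orders):
    """Return {item: {neighbor: weight}}.

    Assumption for this sketch: weight = number of orders containing both
    items, and edges are stored symmetrically.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for items in orders:
        # Deduplicate within an order so that each order counts once per pair.
        for a, b in combinations(sorted(set(items)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(neighbors) for node, neighbors in graph.items()}


# Made-up orders, each a list of item ids bought together.
orders = [
    ["shampoo", "conditioner"],
    ["shampoo", "conditioner", "towel"],
    ["towel", "soap"],
]
graph = build_copurchase_graph(orders)
print(graph["shampoo"])
```

A nested dictionary keeps neighbor lookup and weight updates cheap for a small example; a production system at the scale described in this article would need a distributed representation instead.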

B) Sampling

For example, suppose we walk 2 steps starting from node A, picking a neighbor node at each step. A plain random walk would move to node B or node C with equal probability. Our algorithm instead moves to node C with probability 7/8 and then to node D with probability 8/12, so with high probability the sampled sequence is walk = (A → C → D). In the original graph, A and D are not directly related, but the weighted walk effectively uncovers the relationship between them. The algorithm is as follows:

Algorithm 1 WeightedWalk(G, n)

Input: Graph G(V, E, W)
       Step n
Output: walk

Initialization: walk to empty
For each v_i ∈ V do
    Append v_i to walk
    For j = 1...n do
        v_j = GetNeighbor(G, v_i)
        Append v_j to walk
Return walk

Algorithm 2 GetNeighbor(G, v_i)

Input: Graph G(V, E, W)
       Node v_i
Output: next node v_j

v_j = WeightedSample(v_i, w)
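The two algorithms above can be sketched in Python as follows. This is a single-machine illustration under several assumptions, not the distributed implementation described in this article: each step continues from the most recently visited node, each start node gets its own walk instead of one flat list, a walk stops early at a node without outgoing edges, and the toy graph is directed with weights chosen only to reproduce the 7/8 and 8/12 probabilities from the example (the edges to B and E and their weights are made up). The helper `path_probability` is also hypothetical.

```python
import random
from typing import Dict, List, Optional

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}


def get_neighbor(graph: Graph, node: str, rng: random.Random) -> Optional[str]:
    """Pick the next node with probability proportional to the edge weight."""
    neighbors = graph.get(node)
    if not neighbors:
        return None  # dead end: no outgoing edges
    nodes = list(neighbors)
    weights = [neighbors[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def weighted_walk(graph: Graph, steps: int, seed: int = 0) -> List[List[str]]:
    """Start one walk of at most `steps` steps from every node."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        walk = [start]
        current = start
        for _ in range(steps):
            nxt = get_neighbor(graph, current, rng)
            if nxt is None:
                break
            walk.append(nxt)
            current = nxt
        walks.append(walk)
    return walks


def path_probability(graph: Graph, path: List[str]) -> float:
    """Probability that a weighted walk starting at path[0] follows exactly `path`."""
    prob = 1.0
    for u, v in zip(path, path[1:]):
        total = sum(graph[u].values())
        if total == 0:
            return 0.0
        prob *= graph[u].get(v, 0) / total
    return prob


# Hypothetical directed toy graph matching the 7/8 and 8/12 example.
graph: Graph = {
    "A": {"B": 1, "C": 7},
    "B": {},
    "C": {"D": 8, "E": 4},
    "D": {},
    "E": {},
}
walks = weighted_walk(graph, steps=2)
print(walks)
print(path_probability(graph, ["A", "C", "D"]))  # 7/8 * 8/12 = 7/12
```

Under a uniform random walk the same path would have probability 1/2 * 1/2 = 1/4, so weighting the walk more than doubles the chance of sampling the strongly connected sequence.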

The algorithm is implemented on ODPS Graph, a distributed graph computing platform. The offline graph has 200 million edges and 30 million nodes, and a full run over all the data takes 10 minutes. In the real-time part, the graph structure supports up to 100,000 (10w) edge updates per minute. Details of how the algorithm is implemented on the distributed ODPS Graph platform are covered in a separate ATA article.

C) Embedding

The previous part described how to construct the weighted probability graph and sample from it: weighted sampling (weighted walk) yields the candidates for positive samples, and negative samples are drawn at random from users' non-purchase behavior. This part introduces the embedding. From the sequences collected by the weighted walk, we construct item-item pairs and feed them to the embedding model. We build a supervised embedding model (DNN) to avoid the problem that the effect of an unsupervised model cannot be evaluated offline. The structure of the model is shown in the following figure.
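The sample construction described here can be sketched as below. It is an illustration under stated assumptions rather than the production pipeline: positives are ordered pairs of items that appear within `window` steps of each other in a walk, and for each positive one negative is drawn at random from items that were neither co-purchased with the first item nor paired with it by any walk. The names `build_training_pairs`, `co_purchased` and `window` are hypothetical, and the DNN itself is not shown.

```python
import random


def build_training_pairs(walks, co_purchased, window=2, seed=0):
    """Return (item_a, item_b, label) triples: label 1 = positive, 0 = negative."""
    rng = random.Random(seed)

    # Positives: item pairs that occur within `window` steps in some walk.
    positives = set()
    for walk in walks:
        for i, a in enumerate(walk):
            for b in walk[i + 1 : i + 1 + window]:
                if a != b:
                    positives.add((a, b))

    items = sorted({item for walk in walks for item in walk})
    samples = []
    for a, b in sorted(positives):
        samples.append((a, b, 1))
        # Negatives: items never bought with `a` and never paired with it by a walk.
        excluded = co_purchased.get(a, set()) | {p for (q, p) in positives if q == a} | {a}
        candidates = [x for x in items if x not in excluded]
        if candidates:
            samples.append((a, rng.choice(candidates), 0))
    return samples


# Made-up walks and direct co-purchase sets.
walks = [["A", "C", "D"], ["B", "A", "C"], ["E", "C", "D"]]
co_purchased = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D", "E"}, "D": {"C"}, "E": {"C"}}
samples = build_training_pairs(walks, co_purchased)
print(samples)
```

Because every pair carries a label, a model trained on such data can be scored offline with a metric such as AUC, which is the motivation the article gives for moving to a supervised model.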

2. Richness: compared with the baseline bucket, the number of leaf categories exposed per user increases by 88% and the number of level-1 categories exposed per user increases by 43%, as shown in the following figure.

The core advantage of a graph is that a friend of a friend is also a friend. Traditional i2i (the statistical version) only counts direct relationships, whereas we build a co-purchase graph and mine multi-degree relationships through walks, which compensates for the sparsity of purchase behavior and effectively improves coverage. In our offline experiments, the AUC improvement over the statistical version was very significant. We have also implemented real-time large-scale graph updates, handling up to 100,000 edge updates per minute, and the system runs smoothly even at Double 11 levels of QPS.
