Implementing content recommendation with tags

2025-01-19 Update From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01

This article shares a method for implementing content recommendation with tags. Most readers are probably not very familiar with it, so it is offered here for your reference.

Originally, for simplicity and convenience, the content recommendations on the article pages of my small website were filled by randomly selecting rows from the database. The recommended items therefore had no relevance to the article being read, and nothing guided users toward them.

Algorithm selection

How should similar content be recommended? The small website still runs on a virtual host (yes, I do not even have a server under my full control), so there are not many options: the only tools available are PHP and MySQL. What I came up with is to use tags to match similar articles for recommendation: if the tags of two articles are similar, the articles themselves are considered similar.

For example, the tags of article A are: [A, B, C, D, E]

The tags of article B are: [A, D, E, F, G]

The tags of article C are: [C, H, I, J, K]

By eye, it is easy to see that article B is more similar to article A, because they share three tags: [A, D, E]. How can a computer judge this similarity? Here we use the most basic measure, Jaccard similarity.

Jaccard similarity

The Jaccard coefficient of two sets A and B is defined as the ratio of the size of their intersection to the size of their union:

J(A, B) = |A ∩ B| / |A ∪ B|

The intersection of article A and article B is [A, D, E], of size 3, and their union is [A, B, C, D, E, F, G], of size 7, so the similarity is 3 / 7 ≈ 0.42857.

The intersection of article A and article C is [C], of size 1, and their union is [A, B, C, D, E, H, I, J, K], of size 9, so the similarity is 1 / 9 ≈ 0.11111.

From this we can conclude that articles A and B are more similar to each other than articles A and C. With this algorithm, a computer can judge the similarity between two articles.
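The worked example above can be checked with a few lines of code. This is a minimal sketch in Python rather than the PHP used on the site; the function name `jaccard` and the choice to return 0.0 for two empty tag sets are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: treat two empty tag sets as not similar
    return len(a & b) / len(union)


article_a = ["A", "B", "C", "D", "E"]
article_b = ["A", "D", "E", "F", "G"]
article_c = ["C", "H", "I", "J", "K"]

sim_ab = jaccard(article_a, article_b)  # 3 shared tags out of 7 distinct tags
sim_ac = jaccard(article_a, article_c)  # 1 shared tag out of 9 distinct tags
print(round(sim_ab, 5), round(sim_ac, 5))  # prints: 0.42857 0.11111
```

Converting to sets first means a tag that is accidentally listed twice does not change the score.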

Specific recommendation ideas

Given an article, get its tags, use the algorithm above to compute its similarity to every other article in the database, and recommend the N most similar articles.
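The idea can be sketched as follows, in Python instead of the site's PHP. The names `recommend` and `tags_by_article` are hypothetical, and the in-memory dictionary stands in for the data that would come from the database.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity of two tag collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(article_id, tags_by_article, top=5):
    """Return the IDs of the `top` articles most similar to `article_id`."""
    target = tags_by_article[article_id]
    scores = {
        other_id: jaccard(target, tags)
        for other_id, tags in tags_by_article.items()
        if other_id != article_id  # skip the article itself
    }
    # Highest similarity first; only the best `top` entries are kept.
    return heapq.nlargest(top, scores, key=scores.get)


tags_by_article = {
    "A": ["A", "B", "C", "D", "E"],
    "B": ["A", "D", "E", "F", "G"],
    "C": ["C", "H", "I", "J", "K"],
}
ranked = recommend("A", tags_by_article, top=2)
print(ranked)  # prints: ['B', 'C']
```

Every request compares the target against all other articles, so the cost grows linearly with the number of articles. That is acceptable for a small site, especially when results are cached.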

Implementation process

1. Obtaining the tags

The tags of an article are obtained with the TF-IDF algorithm: extract the high-frequency words in the article and keep N of them as tags. Chinese articles also require word segmentation. Because of the virtual host, I do this step with a local program written in Python (why Python? jieba word segmentation is really good), which segments all articles, computes word frequencies, generates the tags, and writes them back to the server database. Since this article is about the recommendation algorithm, word segmentation and tag generation are not covered in detail; different systems build their tags in different ways.
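For readers who want a rough picture of this step, here is a minimal TF-IDF sketch in plain Python. It is not the author's program: it assumes each article has already been segmented into a list of tokens (for Chinese text, by a segmenter such as jieba), and the function name `top_tags` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def top_tags(docs, n=5):
    """docs maps article ID -> list of tokens; returns article ID -> top-n tags."""
    doc_count = len(docs)
    # Document frequency: the number of articles each token appears in.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    result = {}
    for article_id, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        # TF-IDF: frequency within the article, discounted for tokens that
        # appear in many articles.
        scores = {
            tok: (cnt / total) * math.log(doc_count / df[tok])
            for tok, cnt in tf.items()
        }
        # Sort by score descending; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[article_id] = ranked[:n]
    return result


docs = {
    1: ["mysql", "index", "query", "mysql", "the"],
    2: ["php", "cache", "query", "php", "the"],
    3: ["python", "jieba", "the", "python"],
}
tags = top_tags(docs, n=2)
print(tags[1])  # prints: ['mysql', 'index']
```

A word such as "the" that occurs in every article gets a score of zero, which is why TF-IDF is preferred over raw word counts when picking tags.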

2. Storing the tags

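Create two tables to store the tags.

The tags table stores the names of all tags:

+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| tag   | text       | YES  |     | NULL    |       |
| count | bigint(20) | YES  |     | NULL    |       |
| tagid | int(11)    | NO   | PRI | 0       |       |
+-------+------------+------+-----+---------+-------+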

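The tag_map table stores the mapping between tags and articles:

+-----------+------------+------+-----+---------+-------+
| Field     | Type       | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+-------+
| id        | bigint(20) | NO   | PRI | 0       |       |
| articleid | bigint(20) | YES  |     | NULL    |       |
| tagid     | int(11)    | YES  |     | NULL    |       |
+-----------+------------+------+-----+---------+-------+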

The data stored in tag_map looks like this:

+----+-----------+-------+
| id | articleid | tagid |
+----+-----------+-------+
|  1 |       776 |   589 |
|  2 |       776 |   471 |
|  3 |       776 |  1455 |
|  4 |       776 |  1287 |
|  5 |       776 |    52 |
|  6 |       777 |  1386 |
|  7 |       777 |   588 |
|  8 |       777 |   109 |
|  9 |       777 |   603 |
| 10 |       777 |  1299 |
+----+-----------+-------+

In fact, similar recommendations only need the tag_map table, because tag IDs and tag names correspond one to one.

Specific code

1. Get the tag IDs of all articles

mysql> select articleid, GROUP_CONCAT(tagid) as tags from tag_map GROUP BY articleid;
+-----------+-----------------------+
| articleid | tags                  |
+-----------+-----------------------+
|        12 | 1178,1067,49,693,1227 |
|       ... | ...                   |
|        21 | 1953,1964,153,1393    |
+-----------+-----------------------+

With the SQL above, you can fetch all articles and their corresponding tags in a single query.
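The query can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption made for convenience: SQLite also provides GROUP_CONCAT, but the site itself runs MySQL. The rows are a subset of the sample tag_map data, and `tags_by_article` is a hypothetical name.

```python
import sqlite3

# In-memory SQLite database standing in for the site's MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag_map (id INTEGER PRIMARY KEY, articleid INTEGER, tagid INTEGER)"
)
rows = [(1, 776, 589), (2, 776, 471), (3, 776, 1455), (6, 777, 1386), (8, 777, 109)]
conn.executemany("INSERT INTO tag_map VALUES (?, ?, ?)", rows)

query = "SELECT articleid, GROUP_CONCAT(tagid) AS tags FROM tag_map GROUP BY articleid"
# Split each comma-separated string into a list of tag IDs, keyed by article ID.
tags_by_article = {
    articleid: [int(t) for t in tags.split(",")]
    for articleid, tags in conn.execute(query)
}
conn.close()

print(sorted(tags_by_article))  # prints: [776, 777]
```

The order of IDs inside each concatenated string is not guaranteed, which does not matter here because Jaccard similarity treats the tags as a set.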

In PHP, we can turn tags into an array.

    public function getAllGroupByArticleId()
    {
        // Cache the query result. This is full-table data that only changes when
        // articles are updated; fetching it from the database on every
        // recommendation would definitely hurt performance.
        if ($cache = CacheHelper::getCache()) {
            return $cache;
        }
        $query_result = $this->query('select articleid, GROUP_CONCAT(tagid) as tags from tag_map GROUP BY articleid');
        $result = [];
        foreach ($query_result as $key => $value) {
            // Use articleid as the key; the value is the array of all tag IDs for that article.
            $result[$value['articleid']] = explode(",", $value['tags']);
        }
        CacheHelper::setCache($result, 86400);
        return $result;
    }

With this result, the rest is straightforward: apply the Jaccard similarity algorithm. See the code for details.

    /**
     * Return similar-article recommendations for the specified article.
     *
     * @param int $articleid ID of the specified article
     * @param int $top       number of recommendations to return
     * @return array
     */
    function getArticleRecommend($articleid, $top = 5)
    {
        if ($cache = CacheHelper::getCache()) {
            return $cache;
        }
        try {
            $articleid = intval($articleid);
            $m = new TagMapModel();
            $all_tags = $m->getAllGroupByArticleId(); // call the function above to get the tags of all articles
            $finded = $all_tags[$articleid]; // all articles are included, so the current article must be present
            unset($all_tags[$articleid]); // remove the current article, otherwise it would be most similar to itself
            $jaccard_arr = []; // stores the similarity scores
            foreach ($all_tags as $key => $value) {
                $intersect = array_intersect($finded, $value); // intersection
                $union = array_unique(array_merge($finded, $value)); // union
                $jaccard_arr[$key] = (float) (count($intersect) / count($union));
            }
            arsort($jaccard_arr); // sort by similarity, most similar first
            $jaccard_keys = array_keys($jaccard_arr); // the array keys are article IDs, so take the keys
            array_splice($jaccard_keys, $top); // keep only the first N recommendations
            // We now have the IDs of the N most similar articles. Next, use these
            // IDs to query the article details from the database.
            $articleModels = new \Api\Model\ArticleModel();
            $recommendArticles = $articleModels->getRecommendByTag($jaccard_keys);
            CacheHelper::setCache($recommendArticles, 604800); // cache for 7 days
            return $recommendArticles;
        } catch (\Exception $e) {
            // Pass the original exception along as the previous one so the cause is not lost.
            throw new \Exception("error getting recommended article", 0, $e);
        }
    }

That is the whole method for implementing content recommendation with tags. Thank you for reading!
