The relationship between recommendation systems and search engines

Updated 2025-03-26. Source: SLTechnology News&Howtos (shulou)


By Chen Yunwen

From the perspective of information acquisition, search and recommendation are the two main ways users obtain information, and both exist online as well as offline. So what is the relationship between recommendation systems and search engines? Where do they differ, and where do they overlap? The author has first-line product development experience with both search engines and recommendation systems, and draws on that practice here to explain how the two relate. (Dr. Chen Yunwen)

Figure 1: Search engines and recommendation systems are two different ways to obtain information

Active or passive: choosing between search engines and recommendation systems

Access to information is a basic human need for understanding the world, surviving, and developing. Search is the most explicit way to meet it: it embodies the act of "going out to find", whether that means finding food or finding a place. In the Internet era, the search engine is the best tool for finding information. You type what you are looking for into the search box (the query), and the search engine quickly returns the best matching results. This basic need gave rise to Internet giants such as Google and Baidu.

Besides search, there is another way to obtain information: the recommendation system (RecSys for short). Recommendation is also a basic skill as old as human society. You have surely been in this situation: newly arrived somewhere, you ask a local friend, "Hi, can you recommend some good food and fun places nearby?" Knowledge and information spread through recommendations, which makes recommendation a way of obtaining information too.

The difference between search and recommendation is shown in Figure 1. Search is an active behavior, and the user's need is clear. Users can also judge clearly whether the results meet that need by browsing and clicking. With a recommendation system, by contrast, the user receives information passively, and the need is vague. Take a shopping mall as an example. A user who walks in without a clear goal needs a recommendation system to point out which goods are worth a look and what might suit them. A user who already knows exactly which brand and model to buy can simply go to the right store, which corresponds to search.

Figure 2: Search terms show that users have many needs for personalized recommendations

Many Internet products have to meet both needs at once. Websites that offer music, news, or e-commerce services must provide search, so that a user who wants a specific song or product can enter its name and find it. They also need recommendations good enough to improve the experience when users just want to hear good songs or pass the time reading news, without knowing in advance which items they want.

The degree of personalization

Beyond active versus passive, another interesting difference is the degree of personalization. Search engines can be personalized to some extent, but the room for it is relatively small, because when the need is clear the right results usually do not vary much from person to person. For example, when a user searches for "weather", the search engine can add the user's location and return the local weather, but the result after this personalized supplement is still unambiguous.

Users' personalized needs for information

Recommendation systems have far more room for personalization. Take "recommend a good movie" as an example: 100 users have 100 different tastes, and there is no "standard" answer. A recommendation system can use each user's viewing history, ratings, and similar records to produce the result that is most valuable to that particular user, and this is its distinctive appeal. Although there are many kinds of recommendation (related recommendations, personalized recommendations, and so on), personalization is so important that people often simply call the recommendation system "personalized recommendation" or even "intelligent recommendation".
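As a rough illustration of how rating records can yield different results for different users, the sketch below uses user-based collaborative filtering with cosine similarity. This is one common technique chosen for illustration; the article does not say which algorithm is used in practice. The user names, movie titles, ratings, and function names are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two {movie: rating} dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend_movies(ratings, user, top_k=2):
    """Score movies the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:top_k]]

# Hypothetical rating records (1 to 5 stars).
ratings = {
    "ann": {"Alien": 5, "Heat": 4},
    "bob": {"Alien": 5, "Heat": 4, "Blade Runner": 5},
    "cat": {"Amelie": 5, "Heat": 1, "Notting Hill": 4},
}

print(recommend_movies(ratings, "ann"))
print(recommend_movies(ratings, "cat"))
```

The same request ("recommend a good movie") produces a different list for each user, which is the point of the paragraph above: there is no single standard answer.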

Quick gratification or continuous service?

Anyone who has built a search engine knows that an important criterion for judging search results is whether users find what they need as quickly as possible, click, and leave. When designing a search ranking algorithm, we try to put the best results at the top, and the first three results of a search engine often collect the vast majority of user clicks. Put simply, a "good" search algorithm lets users obtain information more efficiently and spend less time on the page.

Recommendation is the opposite. The recommendation algorithm and the recommended content (goods, news, and so on) are usually tightly integrated, and the process of receiving recommendations can be continuous and long-term. Whether a recommendation system is good enough is often measured by whether users stay longer (buying more goods, reading more news, and so on). The more deeply the system mines users' interests and the better it "understands" them, the higher the success rate of its recommendations, and the more likely users are to stay in the product.

Therefore, for many content-based applications, building an excellent recommendation system is an important means of improving business performance.

Recommendation systems meet needs that are hard to express in words

Mainstream search engines still take queries in the form of text. Text is the most concise and direct way for people to describe what they need, and most of the content that search engines crawl and index is also organized as text.

Because of this, most search queries entered by users are short: queries with five or fewer terms account for more than 98% of total query volume. For example, the query "outlook data address" contains two terms, "outlook data" and "address".

On the other hand, many user needs are hard to put into a few concise words, such as "a Sichuan restaurant close to me that costs less than 100 yuan" or "other dresses in the same style as the one I am looking at, but at a better price".

Few users are willing to type that many words to find results (users are naturally lazy), and search engines do not yet understand semantics deeply enough. These needs are therefore better met through recommendation features (such as "related recommendations" and "guess you like" modules on the page) combined with user interactions (filtering, sorting, clicking, and so on), which continuously accumulate and mine user preferences and so satisfy needs that are hard to express in words.

Figuratively speaking, the recommendation engine is sometimes called silent search: users do not actively enter a query, but the engine automatically builds complex query conditions by analyzing the user's historical behavior and the current context, and then computes and returns recommended results.
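The "silent search" idea can be sketched as follows: an implicit query is assembled from the tags of items the user has viewed plus a context filter, and then run against a catalog. This is only a minimal sketch; the item fields (`id`, `tags`, `in_stock`), the query structure, and the function names are assumptions made for illustration and are not taken from any particular engine.

```python
from collections import Counter

def build_implicit_query(viewed_items, context, top_k=2):
    """Build a query from the user's history and context, with no typed input."""
    tag_counts = Counter(tag for item in viewed_items for tag in item["tags"])
    top_tags = [tag for tag, _ in tag_counts.most_common(top_k)]
    return {"should_match": top_tags, "filters": dict(context)}

def run_query(catalog, query, seen_ids):
    """Return unseen items that pass the filters, ranked by number of matching tags."""
    results = []
    for item in catalog:
        if item["id"] in seen_ids:
            continue
        if any(item.get(key) != value for key, value in query["filters"].items()):
            continue
        score = len(set(item["tags"]) & set(query["should_match"]))
        if score > 0:
            results.append((score, item["id"]))
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in results]

# Hypothetical catalog and browsing history.
catalog = [
    {"id": "d1", "tags": ["dress", "floral", "summer"], "in_stock": True},
    {"id": "d2", "tags": ["dress", "floral"], "in_stock": True},
    {"id": "d3", "tags": ["dress", "floral"], "in_stock": True},
    {"id": "d4", "tags": ["dress"], "in_stock": True},
    {"id": "d5", "tags": ["dress", "floral"], "in_stock": False},
    {"id": "s1", "tags": ["shoes"], "in_stock": True},
]
viewed = catalog[:2]
query = build_implicit_query(viewed, context={"in_stock": True})
print(query)
print(run_query(catalog, query, seen_ids={"d1", "d2"}))
```

The user never types anything; the query conditions come entirely from behavior and context, and the retrieval step is ordinary search.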

Matthew effect and long tail theory

The Matthew effect refers to the phenomenon in which the strong get stronger and the weak get weaker. On the Internet, popular products receive ever more attention, while unpopular content is forgotten. The name comes from a parable in the Gospel of Matthew in the New Testament: "For whoever has will be given more, and they will have an abundance. Whoever does not have, even what they have will be taken from them."

Search engines clearly exhibit the Matthew effect. In the Google click heat map below, redder areas indicate more clicks and more purple areas indicate fewer. The vast majority of clicks are concentrated on the top few results, while lower results and results on later pages get very little attention. This also explains why Google and Baidu ads make so much money, and why corporate customers put so much effort into SEM or SEO to improve their rankings: only results near the top of the page have a real chance of being clicked.

The Matthew effect fully embodied by search engines: high-quality content attracts the vast majority of clicks

Interestingly, corresponding to the "Matthew effect", there is also a very influential theory called "long tail theory".

Long tail theory (the Long Tail) was first proposed by Chris Anderson, editor of Wired magazine, in his October 2004 article "The Long Tail". The long tail is in effect a colloquial extension of the power law and the Pareto distribution in statistics, used to describe how popular and unpopular items are distributed. Looking at the data, Chris Anderson found that in the Internet era, because network technology lets people reach more information and more choices at very low cost, more and more non-hit items that had previously been "forgotten" were attracting attention again on many websites. In fact, nobody's tastes match the mainstream completely. As Chris points out, the more we discover, the more we realize that we need more choices. If the search engine embodies the Matthew effect, then long tail theory explains the value of the recommendation system.

Recommendation systems and long tail theory

A practical example is the comparison between Amazon's online bookstore and traditional large bookstores. Millions of books have been published, but most of them never make it onto the shelves of a traditional bookstore because physical space is limited, and those placed in prominent positions (such as the bestseller shelves) are rarer still. The business model of traditional bookstores is therefore mostly centered on bestsellers. Online bookstores such as Amazon, however, offer almost unlimited space for long-tail books, and browsing and buying them is far more convenient than in a traditional store. As a result, thousands of niche books sell in the Internet era. Each may sell only one or two copies at a time, but because there are so many more niche titles than popular ones, forming a long tail, their cumulative sales can even surpass those of bestsellers. As Steve Kaiser of Amazon said: "If I have 100,000 books, even if I sell only one at a time, their sales will exceed that of the newly published Harry Potter in 10 years' time!"
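The arithmetic behind this comparison can be sketched with a simple power-law model. The sketch assumes, purely for illustration, that the sales of the title at popularity rank r are proportional to 1 / r^s (a Zipf-style distribution); the catalog size, the cutoff for the "head", and the exponent are hypothetical parameters, not figures from Amazon or from the article.

```python
def tail_share(n_items, head_size, exponent=1.0):
    """Fraction of total sales contributed by items ranked below the head.

    Assumes sales at rank r are proportional to 1 / r**exponent.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]
    total = sum(weights)
    head = sum(weights[:head_size])
    return (total - head) / total

# Hypothetical store: 100,000 titles, of which a physical shop could shelve 1,000.
share = tail_share(n_items=100_000, head_size=1_000, exponent=1.0)
print(f"Share of sales from titles outside the top 1,000: {share:.1%}")
```

Under these assumed parameters, the 99,000 titles that a physical store cannot shelve still account for well over a third of total sales (about 38%). A steeper exponent shrinks the tail, so the real share depends on the actual distribution.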

As a new economic model, long tail theory has been applied successfully in the Internet economy. Activating and exploiting long-tail resources is exactly what recommendation systems are good at. Users are usually unfamiliar with long-tail content and cannot actively search for it, so recommendation is the only way to draw their attention to it, uncover their interests, and help them make a final choice.

Activating long-tail content also matters a great deal to businesses. A rich ecosystem in which many kinds of content flourish keeps the business healthy. Imagine a company that relies on only 0.1% of its goods or content, the "hit" items, to attract traffic. Over time those hits stop being popular, and if new hits do not arrive in time, the company's performance is bound to fluctuate sharply.

Relying only on the most popular content carries another, less obvious danger: the loss of potential users. Hits attract one group of users (call them Class A), but they quietly drive away users who are not interested in that popular content (Class B). According to long tail theory, Class B is not a small group. Moreover, Class A users gradually turn into Class B users over time, because people like the new and tire of the old. A business should therefore rely on its recommendation system to meet users' personalized and differentiated needs and to expose long-tail content at the right moments. This keeps the ecosystem healthy and makes operations more stable and less volatile.

Similarities and differences of evaluation methods

Search engines are usually evaluated under the Cranfield paradigm, using metrics common in information retrieval such as nDCG (normalized Discounted Cumulative Gain), precision and recall (or their combination, F1), and P@N. For details, see the article "How to quantitatively evaluate the quality of search engine results" by Chen Yunwen, published on InfoQ. Overall, the evaluation focuses on placing high-quality results as close to the top as possible, and the top 10 results (the first page of search results) cover almost everything that matters in search engine evaluation. The core goal is to let users find what they want with the fewest clicks and in the shortest time.
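The ranking metrics named above can be sketched in a few lines. This is a minimal sketch under stated assumptions: DCG uses the linear-gain form with a log2 position discount (other formulations, such as an exponential gain, are also in use), and the document ids and relevance grades are made up for illustration.

```python
import math

def precision_at_n(ranked, relevant, n):
    """P@N: fraction of the top-n results that are relevant."""
    return sum(1 for doc in ranked[:n] if doc in relevant) / n

def recall_at_n(ranked, relevant, n):
    """Fraction of all relevant documents that appear in the top-n results."""
    return sum(1 for doc in ranked[:n] if doc in relevant) / len(relevant)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def dcg(gains):
    """Discounted cumulative gain: gain at position i is divided by log2(i + 1), i starting at 1."""
    return sum(gain / math.log2(pos + 2) for pos, gain in enumerate(gains))

def ndcg_at_n(ranked, gain_by_doc, n):
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    actual = [gain_by_doc.get(doc, 0) for doc in ranked[:n]]
    ideal = sorted(gain_by_doc.values(), reverse=True)[:n]
    ideal_dcg = dcg(ideal)
    return 0.0 if ideal_dcg == 0 else dcg(actual) / ideal_dcg

# Hypothetical judged query: graded relevance for three documents.
gain_by_doc = {"a": 3, "c": 2, "e": 1}
ranked = ["a", "b", "c", "d"]   # what the engine returned
relevant = set(gain_by_doc)

p = precision_at_n(ranked, relevant, 3)
r = recall_at_n(ranked, relevant, 3)
print(p, r, f1_score(p, r), ndcg_at_n(ranked, gain_by_doc, 3))
```

Because of the position discount, nDCG rewards placing the most relevant document first, which matches the emphasis on the top of the result list described above.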

The evaluation scope of a recommendation system is much broader: there are usually many more recommended results, and the positions and scenarios in which they appear are more complex. Quantitatively, for Top-N recommendation, MAP (Mean Average Precision) or CTR (click-through rate, widely used in computational advertising) are common metrics. For rating prediction problems, RMSE (Root Mean Squared Error) or MAE (Mean Absolute Error) are common.

Because a recommendation system is more tightly bound to the actual business, there are also many indirect, business-oriented evaluation methods that vary with the form of the business, such as incremental clicks, the number of successful recommendations, the lift in transaction conversion, and the increase in user dwell time.
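The quantitative metrics for recommendation can be sketched as below. The function names and the sample lists are illustrative assumptions; average precision here divides by the total number of relevant items, which is the usual convention but not the only one.

```python
import math

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)

def mean_average_precision(runs):
    """MAP over a list of (ranked_list, relevant_set) pairs, one per user."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)

def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions if impressions else 0.0

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical Top-N lists for two users, and hypothetical rating predictions.
runs = [(["a", "b", "c"], {"a", "c"}), (["x", "y"], {"y"})]
print(mean_average_precision(runs))
print(ctr(clicks=30, impressions=1000))
print(rmse([4, 3, 5], [5, 3, 3]), mae([4, 3, 5], [5, 3, 3]))
```

RMSE penalizes large rating errors more heavily than MAE, which is why the two are often reported together for rating prediction.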

The blending of search and recommendation

Although search and recommendation differ in many ways, both are application branches of big data technology, and they overlap a great deal. In recent years, search engines have gradually incorporated recommendation results, such as "related recommendations" on the right side of the page and "related search terms" at the bottom, borrowing the product ideas and operating methods of recommendation systems (see the area circled in red in the figure below).

On platform e-commerce sites, where a query returns a large number of results with no obvious difference in relevance, there is some room for personalized ranking of search results. The personalized recommendation techniques used here also help drive transactions.

Recommendation system elements fused in search engine

Recommendation systems in turn make extensive use of search engine technology. The inverted index is a key data structure that search engines rely on for computational performance. In recommendation systems, one important class of algorithms is content-based recommendation, which makes heavy use of inverted indexes, queries, and result merging. In addition, click feedback algorithms are widely used in both fields to improve results.
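A minimal sketch of this shared machinery: the same inverted index answers a keyword query and drives a content-based "similar items" recommendation by treating an item's own terms as the query and merging the posting lists. Tokenization by whitespace, the scoring by count of shared terms, and the sample documents are simplifying assumptions for illustration.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND query: documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

def recommend_similar(index, docs, seed_id, top_k=2):
    """Content-based recommendation: use the seed item's terms as the query,
    merge the posting lists, and rank other items by the number of shared terms."""
    scores = Counter()
    for term in set(docs[seed_id].lower().split()):
        for doc_id in index[term]:
            if doc_id != seed_id:
                scores[doc_id] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Hypothetical product descriptions.
docs = {
    "d1": "red summer dress",
    "d2": "blue summer dress",
    "d3": "red running shoes",
    "d4": "winter coat",
}
index = build_inverted_index(docs)
print(sorted(search(index, "summer dress")))
print(recommend_similar(index, docs, "d1"))
```

The only difference between the two functions is where the query comes from: the user types it in `search`, while `recommend_similar` derives it from the item being viewed.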

About Dagan Data

Dagan Data is a high-tech start-up focused on enterprise big data application services. It provides data mining services, including recommendation systems and search engines, to e-commerce, new media, finance, and other enterprises, and aims to help partner companies improve their performance, service quality, and competitiveness through its accumulated technical experience.

Summary

As two major applications of big data, search engines and recommendation systems accompany and influence each other while meeting different product needs. As connectors in Internet products, the bridge between people, information, and services, search and recommendation each have their own characteristics. This article has described the relationship between them and analyzed their similarities and differences. Both draw on long-established disciplines such as data mining, information retrieval, and computational statistics, and both relate to cognitive science, prediction theory, marketing, and other fields. Interested readers can explore these related disciplines for a deeper understanding. (Text: Chen Yunwen)

Attachment: http://down.51cto.com/data/2367228
