2025-01-21 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article, published under the title "What are the ten key points of software development", walks through ten key points in building a recommendation system. Each point is covered in turn below.
1. Make full use of explicit and implicit feedback data
Data is the foundation of every recommendation system, and good recommendations depend on rich, accurate data. This includes basic information about users and about the items to be recommended. (What counts as an item depends on the scenario: goods, movies, music, news and so on. In friend recommendation, the item is itself a user.) It also includes the behavior and relationship data that links users and items within the website or application, which matters just as much because it reflects each user's real preferences and habits. Collecting this data, then cleaning and preprocessing it well, is the cornerstone of the whole recommendation system.
User behavior data falls into two groups: explicit feedback and implicit feedback. Explicit feedback is behavior that clearly expresses a user's likes and dislikes, such as purchases, favorites and ratings. Implicit feedback is behavior that does not directly reflect preference, such as clicking, browsing, dwelling on a page, navigating away or closing it. Explicit feedback gives a clear picture of user preferences, but in many applications it is very sparse, which limits how deeply preferences can be mined. The problem is especially pronounced in newly launched applications and for unpopular items or inactive users. In these cases implicit feedback becomes particularly important: behavior such as clicking is noisy, but it carries a great deal of information. In the Netflix Prize, the well-known international recommendation competition that ran from 2006 to 2009, Yehuda Koren, a member of the winning team, found that converting the records of videos rented by users into feature vectors and injecting them into the singular value decomposition (SVD) algorithm, so that they influence the users' interest vectors, improved recommendation accuracy.
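The idea of injecting implicit feedback into the user's interest vector can be sketched with the SVD++ prediction rule. This is a minimal illustration, not the competition code: the function name, argument names and toy numbers are assumptions, and the factors are taken as already trained.

```python
import math

def svdpp_predict(mu, b_u, b_i, p_u, q_i, implicit_factors):
    """Predict a rating with the SVD++ rule.

    mu: global mean rating; b_u, b_i: user and item biases.
    p_u, q_i: explicit user and item factor vectors.
    implicit_factors: one vector y_j per item j on which the user left
    implicit feedback (hypothetical input, e.g. items the user rented).
    """
    user_vec = list(p_u)
    if implicit_factors:
        # Normalise by 1/sqrt(|N(u)|) so heavy users do not dominate.
        norm = 1.0 / math.sqrt(len(implicit_factors))
        for y_j in implicit_factors:
            for f in range(len(user_vec)):
                user_vec[f] += norm * y_j[f]
    return mu + b_u + b_i + sum(q * u for q, u in zip(q_i, user_vec))
```

With an empty `implicit_factors` list the rule reduces to plain biased matrix factorization, which makes the contribution of the implicit signal easy to see in isolation.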
Preprocessing the basic data also does a lot to improve recommendations. Take this year's KDD Cup international data mining competition as an example: negative samples made up 92.82% of the training set, but careful analysis showed that a large share of them were noise. After a series of session-based analysis and screening steps, only 11.2% of the samples were retained for subsequent recommendation mining, which both improved recommendation accuracy and greatly reduced the amount of computation. The first key point, then, is to make full use of every kind of explicit and implicit data and to preprocess it so that the input data is of high quality.
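The article does not describe the session screening rules, so the following is only one plausible sketch of the idea. It assumes, as a labeled heuristic, that a negative sample is trustworthy only when the same session also contains a positive action, which suggests the user was actually paying attention. The field names are hypothetical.

```python
def filter_noisy_negatives(samples):
    """Keep all positives, and keep negatives only from attentive sessions.

    samples: list of dicts with hypothetical keys
    "user", "session", "item" and "label" (1 = positive, 0 = negative).
    Assumption: a session with at least one positive is "attentive".
    """
    attentive = {(s["user"], s["session"]) for s in samples if s["label"] == 1}
    return [
        s for s in samples
        if s["label"] == 1 or (s["user"], s["session"]) in attentive
    ]
```

Other screening rules (minimum session length, dwell time and so on) would slot into the same filter shape.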
2. Multi-model fusion
After years of development, many recommendation algorithms have been proposed. The common approaches fall into a few broad categories, including memory-based methods (driven by historical behavior), model-based methods and content-based methods. Memory-based methods can be further divided into item-based collaborative filtering, user-based collaborative filtering, association rules and so on. Commonly used model-based methods include Random Walk, pLSA, SVD and SVD++. Each method also has many implementation variants for different problems. In item-based collaborative filtering, for example, the formula used to compute the similarity between items can itself take many forms.
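As one concrete instance of the similarity formula mentioned above, here is a minimal sketch of cosine similarity between items over binary interaction data. Cosine is only one of the many possible variants, and the input layout is an assumption made for illustration.

```python
import math
from collections import defaultdict

def item_cosine_similarity(user_items):
    """Cosine similarity between items from binary user-item interactions.

    user_items: dict mapping user -> set of items the user interacted with.
    Returns a dict mapping (item_a, item_b) -> similarity, in both directions.
    """
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sims = {}
    items = sorted(item_users)
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            common = len(item_users[a] & item_users[b])
            if common:
                s = common / math.sqrt(len(item_users[a]) * len(item_users[b]))
                sims[(a, b)] = sims[(b, a)] = s
    return sims
```

Swapping the last formula for Jaccard, or adding a penalty for very active users, gives other members of the same family.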
Besides algorithmic recommendation, there is also the traditional approach of recommendation by experts. These experts may be experienced editors or opinion leaders in the community. In many practical applications, their picks are a useful supplement to the algorithm's results.
In practice, no single method holds an overwhelming advantage. Each has its own strengths and its own suitable scenarios, so choosing methods to fit the scenario and combining them well can greatly improve recommendation quality. Common fusion methods include Restricted Boltzmann Machines (RBM), Gradient Boosted Decision Trees (GBDT) and Logistic Regression (LR). Many write-ups from past recommendation competitions cover this topic, and they show that letting the results of different algorithms complement one another is an extremely effective way to improve recommendations.
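Fusion with logistic regression can be sketched as a small blender that learns one weight per base model from their scores. This is an illustrative sketch with plain batch gradient descent; the function names, learning rate and epoch count are assumptions, and a production system would normally use an established library instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_lr_blender(score_rows, labels, lr=0.5, epochs=500):
    """Learn one weight per base model, plus a bias, by gradient descent.

    score_rows: list of rows, each holding the scores that the base
    recommenders gave to one (user, item) pair.
    labels: 1 if the user accepted the recommendation, else 0.
    """
    n_models = len(score_rows[0])
    weights = [0.0] * n_models
    bias = 0.0
    n = len(score_rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(score_rows, labels):
            err = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def blend(weights, bias, row):
    """Fused probability for one row of base-model scores."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
```

The learned weights show how much each base model is trusted, which is also a quick diagnostic for models that add nothing to the ensemble.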
3. Applying the time factor
User behavior follows strong temporal patterns. In daily life we eat lunch around the same time, rest on weekends and go home for family reunions. User behavior inside applications has patterns that can be mined in the same way, and making good use of time as a feature helps a great deal in many recommendation scenarios.
User behavior logs usually record a timestamp for each action, and this timestamp can be analyzed from both the user side and the item side. From the user side, interests often change over time, so what a user liked a few years ago may differ from what they like now. User behavior also follows cycles: weekdays tend to resemble one another, weekends look different, and even within a single day a user's behavior and preferences differ between morning and evening.
From the item side, popularity rises and falls in regular ways over time. By continuously mining the interaction records between users and items over a period, we can often discover these patterns and then use them to predict user behavior at a later time and improve recommendation accuracy.
Common ways to handle the time factor include: 1) adding a time factor to the formula that collaborative filtering uses to compute user or item similarity; 2) discretizing time into natural slices such as month, week, day and hour, computing statistics for each slice, and then feeding the accumulated statistics into a regression model to guide prediction; 3) treating time as a continuous variable when training the model parameters.
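The first two approaches above can be sketched in a few lines. The exponential half-life form is an assumed choice of time factor, and the 30-day default and the (weekday, hour) slice are illustrative, not values from the article.

```python
from datetime import datetime, timezone

def decay_weight(event_ts, now_ts, half_life_days=30.0):
    """Weight in (0, 1]: an event half_life_days old counts half as much.

    Multiply each co-occurrence by this weight when accumulating a
    collaborative filtering similarity so that recent behavior dominates.
    """
    age_days = max(0.0, (now_ts - event_ts) / 86400.0)
    return 0.5 ** (age_days / half_life_days)

def time_slice(ts):
    """Map a Unix timestamp to a (weekday, hour) slice in UTC.

    weekday follows Python's convention: Monday is 0, Sunday is 6.
    """
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.weekday(), dt.hour
```

Statistics such as click counts can then be aggregated per slice and used as features in a regression model.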
4. Using regional attributes in specific scenarios
Some recommendation scenarios are closely tied to the user's location, especially LBS and O2O applications, where intelligent recommendation is impossible without regional information. When recommending a restaurant, for example, ignoring the user's current location makes the recommendation worthless: a restaurant may match the user's taste very well, but if it is far away, nobody will go.
At present, the use of regional features in recommendation systems is still fairly primitive. Users usually have to filter the results by area manually (province, city, district, county and so on) or ask for results within a certain radius. This is tedious, and it also lacks any detailed analysis of the regional information. For example, locations A and B may be far apart on the map yet connected by a direct subway line, while location C may be very close to A on the map yet require a detour to reach. In addition, a user's daily range of activity is regular: during working hours it is usually near the workplace, and in the evening it is near home.
Applications based on geographic information need to mine users' regional preferences more intelligently, and these preferences are often closely tied to time. In user-based collaborative filtering, for example, the behavior of users who are active in similar areas can serve as the basis for recommendation, on the assumption that users with similar activity areas may share preferences. Alternatively, following the idea of item-based collaborative filtering, regional features can be introduced when computing the similarity between items. In a latent factor model, it is feasible to treat the user's active regions as implicit feedback that acts on the user feature vector.
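As a baseline for location-aware ranking, the sketch below discounts each candidate's taste score by its straight-line distance from the user. The haversine formula is standard; the exponential discount and the 3 km scale are assumptions. As the text notes, straight-line distance is a crude proxy, and a travel-time estimate could replace it in the same structure.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rerank_by_distance(user_pos, candidates, scale_km=3.0):
    """Order candidates by taste score discounted by distance.

    user_pos: (lat, lon) of the user.
    candidates: list of (name, taste_score, lat, lon) tuples.
    scale_km: hypothetical distance at which the score drops to 1/e.
    """
    scored = []
    for name, taste, lat, lon in candidates:
        d = haversine_km(user_pos[0], user_pos[1], lat, lon)
        scored.append((taste * math.exp(-d / scale_km), name))
    return [name for _, name in sorted(scored, reverse=True)]
```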
The mobile phone is the best carrier for location-based recommendation. As mobile Internet applications become more popular, we look forward to more recommendation products built on regional information.
5. Use of SNS relationships
Social networks have grown by leaps and bounds in recent years. Users are no longer simply receivers of content; they actively build relationships with one another. These relationships can be divided into explicit relations and implicit relations. An explicit relation is one that the user has deliberately established, such as following or being followed on Weibo, or adding someone as a friend in a community. An implicit relation means that users have interacted in some way, but the interaction does not clearly establish a relationship: for example, a user clicks on, comments on or forwards another user's post on Weibo, or chats or competes with another player in an online game. Implicit relations are less definite than explicit ones, but they are far more abundant. In scenarios that demand high recommendation accuracy, explicit relations should play the main role. In scenarios where recall and diversity need to improve, and especially when explicit relations suffer from data sparsity (a common problem in recommendation applications), making full use of implicit relations can work very well. Take this year's KDD Cup competition as an example: on the Tencent Weibo friend recommendation task, adding implicit relations to the SVD++ model to counter data sparsity improved recommendation accuracy by 5.5%.
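One simple way to combine the two kinds of relations is to merge them into a single weighted graph before feeding it to a model. The sketch below is a hypothetical illustration: the weights 1.0 and 0.2 are arbitrary placeholders that would need tuning, and this is not the method used in the competition entry.

```python
def tie_strength(follows, interactions, explicit_weight=1.0, implicit_weight=0.2):
    """Merge explicit and implicit relations into one weighted edge map.

    follows: set of (src, dst) pairs for explicit relations such as "follow".
    interactions: list of (src, dst, kind) tuples for implicit relations,
    where kind might be "click", "comment" or "forward".
    The two weights are illustrative assumptions, not tuned values.
    """
    strength = {}
    for pair in follows:
        strength[pair] = strength.get(pair, 0.0) + explicit_weight
    for src, dst, _kind in interactions:
        strength[(src, dst)] = strength.get((src, dst), 0.0) + implicit_weight
    return strength
```

Pairs that never followed each other still receive a non-zero strength from repeated interactions, which is how implicit relations help when explicit ones are sparse.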
In addition, the spread of the mobile Internet has made SNS relationships more convenient to use. Combined with geographic information, it has produced novel mobile applications such as WeChat, and using SNS relationships together with regional features will certainly help recommendation systems produce results that users welcome.
6. Ways to deal with the cold start problem
Cold start is the longest-standing problem in recommendation systems, present since the field began. The key to improving a recommendation system lies in the data, and when new users or new items have only just appeared, most methods struggle to work at first because so little data has accumulated.
The cold start problem can be divided into user cold start and item cold start. User cold start is common on sites such as short video sites, where users are not in the habit of logging in before browsing, so a large share of visitors are anonymous and identified only by a cookie. How to recommend to these users matters a great deal. Common ideas include: 1) Fall back on popular recommendations (rankings). A ranking looks simple, but designing a good one is harder than it seems; how the ranking is computed and which statistics it is based on both deserve careful study. 2) Make full use of the limited information available to capture preferences quickly, for example the user's source IP, the access time and the first results the user clicks. 3) Give new users a simple taste test and collect their preferences from the answers. A common approach is to present some pre-designed options and quickly build a user model from the user's choices. When designing the options: a) they must be representative; b) they should be relatively popular or familiar to users; c) they should be clearly distinct from one another.
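To illustrate why a ranking is less simple than it looks, the sketch below ranks items by a Bayesian average instead of a raw mean, so that an item with a single top rating does not outrank a consistently well-rated one. This is one possible statistic among many; the function names and the prior strength of 5 are assumptions.

```python
def bayesian_average(ratings, prior_mean, prior_count):
    """Shrink an item's mean rating toward the global mean.

    prior_count acts like that many imaginary ratings at prior_mean.
    """
    return (prior_mean * prior_count + sum(ratings)) / (prior_count + len(ratings))

def popular_ranking(item_ratings, prior_count=5):
    """Rank items for new users. item_ratings: dict item -> list of ratings."""
    all_ratings = [r for rs in item_ratings.values() for r in rs]
    prior_mean = sum(all_ratings) / len(all_ratings)
    scores = {
        item: bayesian_average(rs, prior_mean, prior_count)
        for item, rs in item_ratings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

For example, an item with one rating of 5 has the highest raw mean, but after shrinkage it ranks below an item with ten ratings of 4.5.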
Item cold start is common in applications where items are updated frequently. Some e-commerce sites, for example, keep adding new products, and these are hard to recommend because they have no clicks yet. This is where content-based methods often play a key role: the relatedness between items can be computed from their initial attributes, such as category, tags and keywords. Many comparative tests show that content-based algorithms are often less accurate than other methods, but their natural advantage in handling item cold start makes them well worth using in engineering practice.
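A content-based relatedness score needs no click history at all. The sketch below uses Jaccard similarity over tag sets as one simple choice; the catalog layout and function names are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of tags."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_items(new_item_tags, catalog, top_n=3):
    """Find existing items most related to a new item by its tags.

    catalog: dict mapping existing item -> collection of tags.
    Items with no tag in common are left out.
    """
    scored = [(jaccard(new_item_tags, tags), item) for item, tags in catalog.items()]
    scored.sort(reverse=True)
    return [item for score, item in scored[:top_n] if score > 0]
```

A new product can then be shown wherever its most similar existing items are already being recommended.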
7. How to present recommendation results
A recommendation system is more than its algorithms and architecture; it is a complete system. How the results are presented is a very important part of that system. Engineers often overlook it during development, yet where the results finally appear in front of users, and what information accompanies them, often plays a major role.
The key points here include: 1) Different kinds of items call for different presentations, because users care about different things, and the display should foreground whatever the user focuses on. When recommending clothing, for example, the thumbnail has a very large effect on willingness to click, whereas for service products such as travel routes, users focus on the number of days, the price and the discount. 2) The scenario and position in which recommendations are shown need to match users' habits. A comparative experiment at LinkedIn, the job-oriented social networking site, compared showing recommendations before and after a user applied for a job, and the former had 10 times the click-through rate of the latter. Likewise, placing the results in the middle of the page versus in the right sidebar changed the click-through rate by as much as 5 times.
Another extremely important aspect of presentation is giving reasons for a recommendation. Showing the reason wins the user's trust and makes the user more willing to accept the result. For example, simply offering video V to user A as a guess at what she might like may not be convincing. But if a reason is given at the same time, such as "B and C, who have similar tastes to yours, have favorited this video", the user's trust increases. The reason is also a good supplementary description of the result. When recommending a novel, for example, the traditional approach of showing only the title and cover gives readers too little information, but a reason such as "best seller this week" or "the latest work by XXX, a platinum author on Qidian" does a great deal to improve the success rate.
A suitable presentation requires technology, product, UI and UED to work closely together, along with a detailed understanding of users' needs and mindset. Done well, it often yields twice the result for half the effort.
8. Define optimization objectives and evaluation methods
Building a first version of a recommendation system is not hard. The hard part is improving on its initial results, and the key to that is settling on optimization objectives and evaluation methods. First, we need to decide what the system is optimizing for. Some recommendation systems pursue the click-through rate of their results. Some also consider the actual conversion or transactions after the click. Some scenarios care more about novelty, that is, showing users more of the items newly added to the site. Others care more about the diversity of results.
Once the goal is clear, the next question is how to measure it quantitatively. Traditional rating prediction problems usually use root mean square error (RMSE) or mean absolute error (MAE). In practice, however, Top-N recommendation is more common, and there NDCG (Normalized Discounted Cumulative Gain) and MAP (Mean Average Precision) are the commonly used measures.
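The four measures named above can be sketched directly from their definitions. The NDCG below uses linear gain (relevance divided by log2 of position plus one); an exponential-gain variant also exists. MAP is the mean of `average_precision` over all users. The function names are illustrative.

```python
import math

def rmse(pred, actual):
    """Root mean square error for rating prediction."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def mae(pred, actual):
    """Mean absolute error for rating prediction."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def ndcg_at_k(relevances, k):
    """NDCG@k; relevances lists graded relevance in ranked order."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0
```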
Recommendation systems often borrow techniques from related fields such as advertising and search, so measures from those fields, such as predicted click-through rate (pCTR) in advertising or the precision-recall curve in search, are also frequently used to evaluate recommendations. Some systems recast recommendation directly as a machine learning problem, in which case the evaluation methods become those of the corresponding problem.
In real systems, several indicators usually act together (click-through rate, accuracy, coverage, diversity, novelty and so on), and results are evaluated through a weighted compromise that reflects what the product actually needs. Testing methods also include online approaches such as A/B testing, as well as manual evaluation. Whichever method is used, a mature recommendation system must rest on clear optimization objectives and a clear evaluation scheme, which serve as the ruler for measuring each step of the system's progress.
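When an A/B test compares the click-through rate of two recommendation variants, a two-proportion z-test is one common way to check whether the difference is more than noise. The article does not prescribe a test, so this choice and the function name are assumptions.

```python
import math

def ab_ctr_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test on click-through rates.

    Returns (z, p_value) where z > 0 means variant B has the higher CTR
    and p_value is two-sided under the normal approximation.
    """
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

The normal approximation assumes reasonably large samples in both groups; with very few clicks an exact test is the safer choice.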
9. Timeliness problem
As the saying goes, "Of all the martial arts under heaven, only speed cannot be beaten." A recommendation system should catch changes in user needs promptly, feed them back into the model, and respond to user requests in real time. Users are picky and impatient, new users especially, and they will leave quickly if the system cannot adapt its results to them within a short time.
Timeliness shows first in the ability to capture user feedback sensitively. That includes both positive feedback (what users like) and negative feedback (what users are not interested in). Many recommendation systems neglect to collect negative samples, but collecting both positive and negative feedback and training on the contrast gives a more complete and accurate picture of user preferences.
Beyond collecting feedback, the system must also update the back-end recommendation model promptly. In many recommendation systems the user and item models are mined from large volumes of user logs, which is computationally expensive, so a well-designed system combines offline mining with the online serving system. The offline system can be "thick": complex algorithms, a huge model and slow updates. The online system should be lightweight and flexible, so that it can pass along captured positive and negative feedback in time, adjust the online model, track short-term changes in user interest, and quickly revise the recommendation results.
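The split between a thick offline model and a light online layer can be sketched as a small adjuster that keeps the offline scores fixed and applies a decaying short-term correction from live feedback. Everything here is a hypothetical illustration: the class name, the step of 0.3 and the decay of 0.9 are assumptions.

```python
class OnlineAdjuster:
    """Adjusts slow offline scores with fast, decaying online feedback."""

    def __init__(self, offline_scores, step=0.3, decay=0.9):
        self.offline = dict(offline_scores)  # item -> score from the offline model
        self.boost = {}                      # item -> short-term adjustment
        self.step = step
        self.decay = decay

    def feedback(self, item, positive):
        """Record one positive or negative feedback event for an item."""
        # Fade older signals first so that recent feedback dominates.
        self.boost = {i: b * self.decay for i, b in self.boost.items()}
        delta = self.step if positive else -self.step
        self.boost[item] = self.boost.get(item, 0.0) + delta

    def top(self, n):
        """Return the n best items after applying the online adjustment."""
        def score(item):
            return self.offline[item] + self.boost.get(item, 0.0)
        return sorted(self.offline, key=score, reverse=True)[:n]
```

When the offline job finishes its next run, the adjuster can simply be rebuilt from the fresh scores, which discards the accumulated short-term state.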
10. Big data mining and performance optimization
Big data mining has been a research hotspot in recent years. Thanks to the widespread use of distributed computing, systems handle ever larger volumes of data, offline mining grows more capable, and processing massive amounts of user behavior data becomes more convenient. In recommendation mining, however, there is always tension between the computing power the system can provide and what the workload actually requires, so allocating computing resources sensibly is very important. In particular, the depth of mining should be allocated deliberately: key users and items can receive more resources and deeper mining. The same holds for the underlying data, where high-quality data deserves more detailed analysis and low-value data may need only simplified processing.
The back-end offline system usually has to update its models regularly, and whether to update in full or incrementally is another point to consider. Taking user models as an example, not every user's personalization model needs frequent updates; active, high-contribution users should be updated more often. The same goes for items: popular and unpopular items can be updated on different cycles. There are also some common techniques for optimizing the performance of big-data recommendation systems, such as using inverted indexes and making full use of caching.
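The two techniques mentioned last, an inverted index and a cache, can be sketched together. The index maps each tag to the items that carry it, so candidate lookup does not scan the catalog, and `functools.lru_cache` memoizes repeated lookups. The class name, the toy catalog and the cache size are illustrative assumptions.

```python
from collections import defaultdict
from functools import lru_cache

class TagIndex:
    """Inverted index from tag to the set of items carrying that tag."""

    def __init__(self, item_tags):
        self.index = defaultdict(set)
        for item, tags in item_tags.items():
            for tag in tags:
                self.index[tag].add(item)

    def candidates(self, tags):
        """Items sharing at least one tag, found without a catalog scan."""
        found = set()
        for tag in tags:
            found |= self.index.get(tag, set())
        return found

# Hypothetical toy catalog used only for illustration.
INDEX = TagIndex({
    "shoe1": {"shoes", "running"},
    "shoe2": {"shoes", "running", "trail"},
    "shirt1": {"shirt", "cotton"},
})

@lru_cache(maxsize=1024)
def cached_candidates(tags):
    """Cached lookup; tags must be hashable, so pass a sorted tuple."""
    return frozenset(INDEX.candidates(tags))
```

If the index is rebuilt after a catalog update, `cached_candidates.cache_clear()` should be called so that stale results are not served.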
© 2024 shulou.com SLNews company. All rights reserved.