2025-02-22 Update. From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 Report
Preface
The past few decades have been an era of rapid Internet growth. As the industry has grown, roles have become more specialized and have evolved into distinct positions. With market competition growing ever fiercer, strengthening internal capabilities and improving product effectiveness have become key priorities for companies. How should product effectiveness be evaluated? How can user experience be measured? This article describes the main responsibilities of this new position at Gaode, how it has developed and evolved, and the methods and system it uses to evaluate product effectiveness.
When you type the word "evaluation" into a search engine, the related searches usually look like this:
These suggestions reflect what most people know about evaluation: apart from reviews of games, mobile phones, cars and household products, people know little about it. And what does someone whose job title is "evaluation" actually do at an Internet company? Even fewer people know.
After more than three years of doing evaluation, the soul-searching question we faced most often in the first year was: "What exactly do you evaluate?" Answering it is much like answering philosophy's three ultimate questions: "Who are you? Where do you come from? Where are you going?"
Who is evaluation? That is its positioning. Where does evaluation come from? That is its foundation and origin. Where is evaluation going? That is its goal and direction of development.
Who is evaluation?
Simply put, evaluation is a team that assesses product effectiveness. It aims to take the user's point of view, verify the effect of requirements before launch, and build a three-dimensional product evaluation system, that is, the evaluation system, through comprehensive analysis of our own product, user data and competing products.
Where does evaluation come from?
The answer to this question is really the answer to another: why evaluate at all?
Just as we care about performance with every version update, we care about product effectiveness whenever we launch a new strategy. How should that effectiveness be evaluated? Once a strategy-related requirement has been developed, does the actual effect match the product manager's expectations? Does it match users' expectations? Ideally, there should be no gap among the three. Still, we need a way to measure whether a gap exists and to conclude whether the change in effect is positive, so that the user experience is better protected.
Moreover, even when everyone agrees before launch that a requirement will greatly improve the user experience, it is still users who decide what the real product experience is. For larger changes, an A/B experiment can expose a small group of users to the change, quickly collect user data, and further evaluate whether the effect is positive. Alternatively, the change can go fully online and be evaluated afterward by analyzing behavior data and user feedback.
At the same time, competitive analysis is essential for a product to find its position in the market.
These needs for effectiveness evaluation and analysis are why the evaluation team exists.
How to evaluate
Common evaluation methods include offline effect evaluation and analysis before launch, A/B experiments and their analysis, metric monitoring and problem analysis after launch, problem mining, and competitor monitoring and analysis.
1. Offline evaluation
Before launch, evaluation is responsible for analyzing and verifying the product's effect in various ways according to the product requirements, concluding whether it meets the launch standard, and analyzing the most significant problems at the same time.
When the technical evaluation team was first established, the main work was threefold: defining the collaboration process, building professional evaluation capability, and building evaluation tools.
Collaboration process
A standard version of a project moves from requirement confirmation to development, then to test verification, and finally to launch. Evaluation starts at the requirement review stage, identifying which requirements change the product's effect. The team then drafts an evaluation plan based on those changes and checks whether existing tools can support it; if not, the needed tool enters a rapid development stage. Next, the evaluation data are collected and the evaluation and verification stage begins. Finally, a report is sent stating whether each requirement passes evaluation, and the problems found are summarized and classified.
The evaluation process is roughly the same across business lines, but because the businesses differ, the evaluation schemes and methods vary greatly.
Evaluation scheme
Based on the product requirements, clarify the scope affected by the change in effect, and from that determine the evaluation samples, evaluation methods and evaluation standards.
Evaluation sample
Depending on the scope affected by a requirement, evaluation samples are usually divided into a random corpus and a specific corpus.
The specific corpus is usually extracted according to the dimensions and types the requirement modifies, to ensure the evaluation task has adequate coverage. The random corpus reflects the requirement's real scope of influence. When an evaluation task needs a specific corpus, it is usually recommended to use a random corpus alongside it: the specific corpus ensures adequate coverage, while the random corpus shows the real scope of influence and helps ensure there are no unexpected changes.
Besides real corpora, self-constructed corpora are used in specific situations. The common reasons are: 1) no real online corpus exists before the strategy goes online; 2) the affected scenario is too rare to find enough cases in the real corpus.
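A minimal Python sketch of the sampling rule above, assuming the corpus is a list of query strings and that a hypothetical predicate `is_affected` recognizes cases touched by the requirement. A fixed seed keeps the sample identical when the old and new versions are compared.

```python
import random
from typing import Callable, Sequence

def build_eval_sample(
    corpus: Sequence[str],
    is_affected: Callable[[str], bool],  # hypothetical: does this case fall in the changed scope?
    n_specific: int,
    n_random: int,
    seed: int = 0,
) -> dict[str, list[str]]:
    """Draw a specific sample (cases the change targets) plus a random sample (whole corpus)."""
    rng = random.Random(seed)  # fixed seed so both versions are scored on the same cases
    affected = [q for q in corpus if is_affected(q)]
    specific = rng.sample(affected, min(n_specific, len(affected)))  # coverage of the change
    random_part = rng.sample(list(corpus), min(n_random, len(corpus)))  # real scope of influence
    return {"specific": specific, "random": random_part}
```

Scoring both parts lets the specific sample confirm the targeted change while the random sample surfaces unexpected side effects elsewhere.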
Evaluation standard
Evaluation standards usually involve the concept of a true value (ground truth). When a type of data has a single correct answer in the real world, there is an absolute truth, as with factual data. For this kind of data, the evaluation standard is whether it matches the true value.
The other kind is relative truth, which can come from user logs. For example, to judge whether the estimated time of arrival (ETA) shown to users is accurate, we can take users' actual travel times between the same start and end points as the true value and compare our estimate against them. However, a single user's actual driving time is affected by personal driving habits and the conditions of that particular trip, so it is not completely accurate; hence it is a relative truth. In search and other business lines, user clicks can likewise serve as a relative true value and thus as a standard for effect evaluation.
When settling on an evaluation standard, we need to judge whether a true value exists, whether it is easy to obtain, and whether it can be obtained automatically at scale.
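For data with an absolute truth, automatic evaluation can reduce to an exact-match rate. The sketch below assumes both the version's output and the labeled truth are dictionaries keyed by a hypothetical record ID; the exact-match rule is illustrative, and real data may need normalization before comparison.

```python
def accuracy_against_truth(predicted: dict[str, str], truth: dict[str, str]) -> float:
    """Share of labeled records whose value matches the true value exactly."""
    shared = predicted.keys() & truth.keys()  # only score records that have a labeled truth
    if not shared:
        raise ValueError("no overlapping records to score")
    hits = sum(predicted[k] == truth[k] for k in shared)
    return hits / len(shared)
```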
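Because one trip is noisy, one way to turn trip logs into a relative truth is to aggregate several observed trips between the same origin and destination. This sketch is an assumption, not Gaode's method: it takes the median observed travel time as the relative truth and reports the ETA's relative error against it.

```python
from statistics import median

def eta_error(eta_seconds: float, actual_trip_seconds: list[float]) -> float:
    """Relative error of an ETA against the median of observed trip times."""
    if not actual_trip_seconds:
        raise ValueError("need at least one observed trip")
    truth = median(actual_trip_seconds)  # median damps individual habits and one-off conditions
    return abs(eta_seconds - truth) / truth
```

For example, an ETA of 660 s against observed trips of 540, 600 and 660 s gives a median of 600 s and a relative error of 0.1.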
Evaluation mode
We use different offline evaluation methods for different evaluation purposes. For businesses with a true value, automatic evaluation is possible by acquiring or labeling the true value automatically. For business lines without a true value, judging whether the effect is good or bad costs more and usually requires manual or semi-automatic evaluation.
Manual evaluation, as the name implies, is scoring done by people. Search companies were probably the first to evaluate the effectiveness of their products this way; Google, Microsoft, Baidu, Apple and others have all adopted similar approaches to assess quality.
Google has published 164 pages of manual quality assessment guidelines, and Baidu and Bing have released similar documents.
When Apple introduced its own evaluation system, it specifically explained why it tracks Human Judgement metrics:
Version problems can be found before launch.
Manual evaluation metrics are closely related to quantitative metrics.
They define the overall quality of a version and allow effect changes to be followed continuously.
They are more detailed than user feedback and make problems easier to locate.
The shortcomings of manual evaluation, such as high cost, small coverage and low efficiency, need little elaboration. Because of its advantages, it remains an indispensable part of a company's evaluation system, and it works well when combined with other evaluation methods.
Three things ensure the quality and efficiency of manual evaluation: standards, process and tools.
Standard documents, like operating manuals, are meant to reduce training costs and minimize differences in understanding on hard-to-judge cases. They should therefore be as foolproof as possible: clearly defined, with examples for all special and exceptional scenarios, repeatedly tested in practice, and frequently updated. A dedicated person should maintain the document on a clear update cycle and communicate each update to all evaluators.
Human error is inevitable; no one achieves 100% accuracy. Moreover, objects that require manual evaluation usually lack an objective, unified answer, so differences in judgment are unavoidable. These problems must be addressed through process. For example, each case can be labeled by multiple people, keeping only cases with a high agreement rate and discarding the rest. Alternatively, a first-review/re-review system can be used, in which less experienced staff do the first review and senior staff re-review.
Blind review is usually used in comparisons: the labels identifying the old and new (or left and right) versions are removed and results are presented in random order, so that evaluators stay objective and are not swayed by subjective factors.
Evaluators in manual evaluation usually play one of two roles: ordinary user or expert. Experts judge from a more professional perspective, drawing conclusions from their understanding of and experience with the business. Ordinary users judge whether results are good or bad from their own perspective; this kind of testing captures a broad range of user experience and feedback while yielding real data to support iterative optimization. Because map navigation is highly specialized, it usually needs expert evaluation.
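The multi-rater rule can be sketched as follows. The 80% agreement threshold and the minimum of three raters are assumed values for illustration, not figures from the source.

```python
from collections import Counter

def consensus_labels(
    labels_by_case: dict[str, list[str]],
    min_agreement: float = 0.8,  # assumed threshold
    min_raters: int = 3,  # assumed minimum number of independent judgments
) -> dict[str, str]:
    """Keep a case only if enough raters labeled it and the majority label is strong enough."""
    kept = {}
    for case_id, labels in labels_by_case.items():
        if len(labels) < min_raters:
            continue  # too few judgments to trust
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            kept[case_id] = label  # high-consistency case is retained
    return kept  # low-consistency cases are discarded
```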
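A minimal sketch of blind review for a pairwise comparison: the side each version appears on is randomized, the rater sees only `left` and `right`, and a separate key maps the choice back to old or new. Function and field names are hypothetical.

```python
import random

def blind_pair(case_id: str, old: str, new: str, rng: random.Random) -> tuple[dict, dict]:
    """Return what the rater sees (no version labels) and a hidden key for unblinding."""
    if rng.random() < 0.5:
        left, right, key = old, new, {"left": "old", "right": "new"}
    else:
        left, right, key = new, old, {"left": "new", "right": "old"}
    shown = {"case": case_id, "left": left, "right": right}  # version names are not shown
    return shown, {"case": case_id, **key}

def unblind(preference: str, key: dict) -> str:
    """Map a rater's 'left'/'right'/'same' choice back to 'old'/'new'/'same'."""
    return "same" if preference == "same" else key[preference]
```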
Evaluation tool
Evaluation tools underpin evaluation efficiency and quality. Core functions include a data warehouse, task management, task data capture and analysis, diff statistics and filtering, case display, scoring, task transfer, sampling, distribution, result management and automatic reporting.
Task types, scoring methods and case formats outside the general process can all be customized. Since most tasks are comparisons, how the diff is computed is also critical: the diff should separate out each aspect the business focuses on, so that the scope of an iteration's impact can be understood quickly and problems located quickly. Expert evaluators also need data and tools to assist their analysis and judgment when locating problems, and integrating such tools often greatly improves evaluation efficiency.
Once manual evaluation was running smoothly and the team had accumulated evaluation experience and business understanding, we began building semi-automatic and automatic evaluation.
The methods include defining metric fluctuation thresholds, smoke evaluation against extreme cases, and an automatic scoring model that simulates manual evaluation.
By learning the characteristics of manual evaluation, the automatic scoring model produces GSB (Good/Same/Bad) scores automatically, aggregates the results, and makes a preliminary judgment on the effect of the evaluation task. At present it serves as a reference to assist judgment.
Smoke evaluation first defines the scenarios and dimensions the business cares most about and sets metrics for them. Acceptable fluctuation thresholds are calculated from past evaluation experience, and bad cases that are unacceptable in an effect change are defined. For experiments that need quick online verification, this shortens the evaluation cycle while ensuring no abnormal effects. In this way, some business lines have achieved an automated release and launch process.
Metric analysis combined with anomaly detection is currently one of the best practices for offline evaluation of businesses without a true value. Overall metrics, scenario metrics and anomaly metrics together form a more comprehensive metric system, used to observe the new version's overall fluctuation and its distribution under different conditions. Along the way, abnormal cases are screened out and checked manually. Finally, a conclusion is drawn from the metric changes and the manual check results; if there are no anomalies, the evaluation can be passed quickly.
Finally, road testing is the ultimate means of verifying the effect of navigation products, experiencing and evaluating the whole process from the user's perspective. Although it is costly and inefficient, it is indispensable, and combining it with other methods is another way to safeguard the effect before launch.
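As one illustration of a comparison diff, the sketch below classifies each query by how its top-k results changed between versions, so that only changed cases need to be sent for review. The top-k focus and the category names are assumptions; a real business would diff the specific fields it cares about.

```python
def diff_cases(old: dict[str, list[str]], new: dict[str, list[str]], top_k: int = 3) -> dict[str, str]:
    """Classify each case present in both versions by how its top-k results changed."""
    out = {}
    for case_id in old.keys() & new.keys():
        a, b = old[case_id][:top_k], new[case_id][:top_k]
        if a == b:
            out[case_id] = "unchanged"  # can be skipped by raters
        elif set(a) == set(b):
            out[case_id] = "reordered"  # same results, different order
        else:
            out[case_id] = "changed"  # new or dropped results in the top-k
    return out
```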
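Whether GSB labels come from people or from a scoring model, they are typically aggregated into counts and a net score. The sketch assumes ratings are the strings `"G"`, `"S"` and `"B"` for the new version compared with the old one, and uses the net score (G - B) / (G + S + B), a common convention that the source does not specify.

```python
from collections import Counter

def gsb_summary(ratings: list[str]) -> dict[str, float]:
    """Count Good/Same/Bad ratings and compute an assumed net score (G - B) / total."""
    counts = Counter(ratings)
    unknown = set(counts) - {"G", "S", "B"}
    if unknown:
        raise ValueError(f"unexpected ratings: {unknown}")
    total = len(ratings)
    if total == 0:
        raise ValueError("no ratings")
    g, s, b = counts["G"], counts["S"], counts["B"]
    return {"G": g, "S": s, "B": b, "net": (g - b) / total}  # > 0 leans toward the new version
```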
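A minimal sketch of a smoke check, assuming the acceptable fluctuation for each metric is derived from past version-to-version changes as |mean| + k·stdev (k = 3 is an assumed default), and that bad cases have already been detected by separate, hypothetical rules and are passed in as IDs. The check passes only if no metric exceeds its threshold and no bad case was found.

```python
from statistics import mean, stdev

def fluctuation_threshold(historical_deltas: list[float], k: float = 3.0) -> float:
    """Acceptable absolute change derived from past version-to-version changes."""
    return abs(mean(historical_deltas)) + k * stdev(historical_deltas)  # needs >= 2 points

def smoke_check(
    deltas: dict[str, float],
    history: dict[str, list[float]],
    bad_case_ids: list[str],
    k: float = 3.0,
) -> tuple[bool, list[str]]:
    """Pass only if every metric stays within its threshold and no bad case was found."""
    reasons = []
    for metric, delta in deltas.items():
        limit = fluctuation_threshold(history[metric], k)
        if abs(delta) > limit:
            reasons.append(f"{metric}: change {delta:+.4f} exceeds +/-{limit:.4f}")
    reasons.extend(f"bad case: {case_id}" for case_id in bad_case_ids)
    return not reasons, reasons
```

A passing result with an empty reason list is what would allow a release to proceed automatically.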
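The screening step can be sketched as below, assuming each case carries a numeric metric for the old and new versions (for example a route's ETA) and a scene label. Per-scene and overall average relative changes show how the shift is distributed, and cases whose change exceeds an assumed 30% limit are listed for manual checking.

```python
from collections import defaultdict
from statistics import mean

def scene_report(cases: list[dict], rel_limit: float = 0.3) -> tuple[float, dict[str, float], list[str]]:
    """Overall and per-scene mean relative change, plus cases to check manually."""
    by_scene: dict[str, list[float]] = defaultdict(list)
    anomalies = []
    for case in cases:
        change = (case["new"] - case["old"]) / case["old"]  # assumes old != 0
        by_scene[case["scene"]].append(change)
        if abs(change) > rel_limit:
            anomalies.append(case["id"])  # large per-case shifts go to manual checking
    overall = mean(ch for changes in by_scene.values() for ch in changes)
    per_scene = {scene: mean(changes) for scene, changes in by_scene.items()}
    return overall, per_scene, anomalies
```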
2. A/B experiments
Some requirements, especially model tuning, must go online before their effect can be observed. After quickly passing offline evaluation, they enter the A/B stage for effect evaluation.
The core steps of an A/B experiment are traffic splitting and tagging, metric observation, and producing the experimental conclusion; the key is scientific rigor in the experiment design. For effect evaluation, having A/B capability is not hard, but building an A/B experiment platform is a long-term effort and is not discussed here.
3. Online verification
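The traffic-splitting and conclusion steps can be illustrated with two standard techniques: deterministic hash-based bucketing, so a user always lands in the same group, and a two-proportion z-test on a conversion-style metric. This is a generic sketch; the experiment name, 50/50 split and metric are assumptions, and it says nothing about how Gaode's A/B platform works.

```python
import hashlib
import math

def assign_bucket(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control' for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    position = int(digest[:8], 16) / 0x100000000  # first 32 bits scaled to [0, 1)
    return "treatment" if position < treatment_share else "control"

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between control (a) and treatment (b) rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

Hashing the experiment name together with the user ID keeps assignments independent across concurrent experiments.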
Once offline verification and A/B experiments have shown a positive effect, the requirement is usually launched in full. What is its effect after launch? We need to analyze online metrics and watch user feedback to see whether core metrics show the expected gains and whether any metrics change abnormally.
The core of a product is meeting user needs and creating user value. Whether user needs are met, how satisfied users are, and where the product stands in the market are therefore questions that product creators must watch closely and keep answering over the long term. This is how we try to answer them.
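One simple way to check for the expected gain and for abnormal changes after launch is to compare post-launch daily values of a core metric with a pre-launch baseline. This sketch assumes higher is better, an expected relative lift set by the product team, and an alert whenever a day falls more than three baseline standard deviations below the pre-launch mean.

```python
from statistics import mean, stdev

def post_launch_check(pre: list[float], post: list[float], expected_lift: float, z_alert: float = 3.0) -> dict:
    """Compare post-launch daily values of a core metric with the pre-launch baseline."""
    base, spread = mean(pre), stdev(pre)  # pre needs at least two days for stdev
    lift = (mean(post) - base) / base
    # Days that fall unusually far below the baseline (assumes higher is better).
    alert_days = [day for day, value in enumerate(post) if value < base - z_alert * spread]
    return {"lift": lift, "met_expectation": lift >= expected_lift, "alert_days": alert_days}
```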
Conclusion
Building evaluation is really the process of building a three-dimensional system for evaluating product effectiveness. Every Internet company needs someone to take on this responsibility, whether that role sits in testing, product or operations. At Gaode, the role is independent because of the emphasis on user experience and product effectiveness. Of course, this system is far from perfect; we are still building and evolving it, and we hope our continued efforts will make travel better.
© 2024 shulou.com SLNews company. All rights reserved.