Xia Kai
Xia Kai graduated in computer science from Carnegie Mellon University and worked on the Evernote data team and in Microsoft Bing's search-advertising division. After returning to China, he joined Xiaohongshu as an early member, working successively on big data, user growth, and project and team management.
I first worked on search advertising in the United States. After returning to China, I joined Xiaohongshu to build its basic data services and data platform. On a startup team you want to do data mining, data statistics, and growth from day one, but at first there is no data, so we had to assemble the technical framework for data collection, management, computation, and storage before we could do the further analysis that supports decisions and drives product iteration at every level of the company. So I will share that path: from technical architecture to analysis, growth, and products.
I. What is Xiangwushuo?
First, let me introduce what we are doing now. Xiangwushuo is both a Mini Program and an app, used mainly as a community where users exchange items with one another, somewhat like second-hand trading. Because everything is free and users exchange things through our own "Little Red Flower" points, user stickiness and engagement are very high and growth has been very fast. Riding the rapid spread of Mini Programs, we went from zero to 30 million users in ten months. On the B side, businesses can give away items on the platform to promote themselves and give back to fans. We also do some public-welfare work, such as donating libraries.
II. The data platform from 0 to 1
1. The entrepreneurial story of the data team
As the data grows, it poses a great challenge to the data team. For us, the initial challenge was that the business grew extremely fast within ten months, and it is very hard to build a complete data platform from zero to one to support a business growing that fast. So, starting from the business, our first question was what business data-analysis requirements we had to support; the second was how to iterate an MVP that supports those requirements in a short time. Based on that thinking, we used cloud-computing tools and third-party services, plus the efforts of our own engineers, and iterated the data platform on product feedback and on usage data from the business units.
2. The demand for data
The first point: improve the product and the user experience. When making decisions on feature design, feed layout, page design, and so on, we want to rely on data more than on experience. But to support decisions with data analysis, we need tools for collection, computation, storage, and analysis. Here is one scene: in product and business design, we argued intensely over whether the feed should be a single-column waterfall flow or a two-column layout.
A single column fills the screen with one picture-and-text card that you can browse immersively, but you see only one item at a time; a two-column layout shows several items per screen, with click-in and return, but that may interrupt the user's browsing. Today these correspond roughly to the Douyin and Kuaishou layouts, but Douyin had not yet emerged for us to reference. We allocated 10% of traffic to each layout, and after a period of time, by looking at the details of usage duration and user stickiness, we adopted the two-column layout based on the data. Later, after Douyin appeared, we redesigned a single-column layout to make the user experience more direct, immersive, and interactive.
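As a concrete illustration of that experiment, here is a minimal sketch of deterministic traffic bucketing. It is a hypothetical example, not our production code: the experiment name, bucket shares, and user IDs are all made up.

```python
# Minimal sketch: hash each user into a stable slot so that the same user
# always sees the same layout for the lifetime of the experiment.
import hashlib

def assign_bucket(user_id: str, experiment: str) -> str:
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    slot = int(digest, 16) % 100          # stable slot in [0, 100)
    if slot < 10:
        return "single_column"            # 10% of traffic
    if slot < 20:
        return "double_column"            # another 10%
    return "control"                      # the remaining 80% keep the old layout

print(assign_bucket("user_42", "feed_layout_test"))
```

Hashing on the user ID rather than randomizing per session keeps each user's experience consistent, which is what makes duration and stickiness comparisons between buckets meaningful.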
The second point: if you can't measure it, you can't manage it. In corporate management and decision analysis, it is easy to fall into subjective discussion and dispute without data support. So when we iterate a product flow or feature, we draw up the iteration plan from data-analysis results, with the help of analysis tools, to avoid falling into meaningless argument.
Finally, raised to the cultural level: initiative + IQ + a rational environment + information transparency = reliable decision making. I personally think four factors help a whole company or organization reach reliable decisions: initiative, meaning sufficient self-drive; IQ; and two factors closely tied to data, namely a rational environment and information transparency. In a relatively rational environment, everyone can have a voice, everyone's opinion is respected, and decisions are made through data. The other factor is information transparency: everyone who makes or participates in a decision judges on the basis of enough data and information.
3. The demand for a data platform
In an enterprise, the departments with the strongest demand for data are marketing, operations, sales, and so on. The marketing department spends a lot of money every year and computes ROI through data analysis, so it has very high requirements for data quality and channel-traffic quality. The operations department needs flexibility: if it runs an activity today, how do we measure the activity's effect on new customers and its lift for regular users, and judge the quality of the recall effect on different users through layering? That requires multi-dimensional analysis, with flexibility in both the data and the metrics, as in the sketch below.
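To make the layering concrete, here is a hedged sketch of comparing a campaign's recall effect across user segments with pandas; the segment labels and numbers are invented for illustration.

```python
# Toy example: recall rate of one campaign, broken down by user segment.
import pandas as pd

recall = pd.DataFrame({
    "segment":     ["new", "regular", "dormant_30d", "dormant_90d"],
    "reached":     [5000, 8000, 6000, 4000],      # users the campaign touched
    "returned_7d": [900, 2400, 720, 200],         # users active again within 7 days
})
recall["recall_rate"] = recall["returned_7d"] / recall["reached"]
print(recall.sort_values("recall_rate", ascending=False))
```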
4. Iterating the solution
The R&D team mostly supports the online business, so it depends heavily on the high availability of the data platform. If recommendation consumes your data results, whether computing users' short- and medium-term profiles or feeding search-ranking decisions directly from data, you must keep the data highly available and provide a programmable interface for direct integration. Product requirements come from the product managers' own KPI systems, and different product managers watch different metrics: those responsible for the whole product watch GMV, while those on specific features watch registrations, conversions, clicks, and test results, all of which need different drill-down capabilities and metric analysis.
At first we ran queries in the business database: an engineer wrote a script, dumped the results into Excel, and handed them to the business unit. This was very inflexible and affected online business access. Even when the data was synchronized to another database, there were still only final business results and no process information. Take order data: it is hard to trace how an order moved through the received, in-transit, and delivered states, because only final results such as the current user status, transaction status, and payment status are stored.
What we needed in iteration was the user behavior log: record state changes as logs, put some of the result data into the logs as well, and do the analysis on top of that. This raised new requirements for log collection, synchronous computation, and structure. A minimal sketch of such state-change logging follows.
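The field names and file-based transport below are illustrative assumptions; in practice events like these would feed a log-collection pipeline rather than a local file.

```python
# Sketch: record every order state transition as an append-only event,
# instead of overwriting a single status column in the business database.
import json
import time

def log_state_change(order_id: str, old_status: str, new_status: str,
                     log_path: str = "order_events.log") -> None:
    event = {
        "order_id": order_id,
        "old_status": old_status,
        "new_status": new_status,
        "ts": time.time(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(event) + "\n")

# With every transition kept, analysts can reconstruct the full
# received -> in-transit -> delivered funnel, not just the final state.
log_state_change("order_1001", "received", "in_transit")
log_state_change("order_1001", "in_transit", "delivered")
```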
5. Designing the data platform
At the time, part of the business ran on Tencent Cloud, and later we adopted a hybrid-cloud approach. Our data-platform architecture was upgraded alongside business volume through several iterations; let me share how we thought during those iterations and why we decided as we did.
The initial analysis requirements were relatively simple. The urgent demand came from the business departments: the number of visits to certain interfaces, the number of views of certain features, click analysis. An engineer could finish such requests in a day. But as business volume grew, all kinds of flexible analysis needs arrived, and we could no longer cope.
Traffic comes from the front end: client, Mini Program, mobile, PC, and so on. The data eventually lands in two places: user behavior data lands in the user access logs, which are then collected, while business data is periodically synchronized to file storage. Eventually everything lands in AWS and flows directly into the data warehouse, which could be structured and stood up quickly at a relatively low cost of maintenance and construction.
There were two problems at that time. One was limited system flexibility: with real-time, high-volume requirements and the need for fast analysis results, computation could not keep up with presentation. The other was that AWS then offered only a separate offline region, so data was delayed; once the region opened up, we migrated quickly. For the real-time requirements we used AWS's hosted data synchronization, but large data volumes still caused compute and storage problems.
Given the above, we set up our own EMR data cluster and produced analysis and offline computation results there that business departments could use and access quickly, supporting logistics computation and KPI computation. But when the data volume from different sources grew rapidly, this approach became unstable.
At the beginning, our engineers wrote the tasks themselves, maintained many scripts, and ran all kinds of mutually dependent jobs, which consumed a great deal of engineering energy, especially with only two or three engineers. At that point we came into contact with DataPipeline: they offered a relatively mature data-management platform, deployed privately on our own servers with a complete solution, which to a large extent solved many of the problems our engineers had hit. It could stably support our recommendation, search, front-line analysis, business analysis, and other scenarios.
6. Data platform tools
We supported the entire data architecture with only a few engineers so that business units could analyze. We currently use Airflow for task scheduling and Tableau for visual analysis; a minimal Airflow sketch follows.
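For readers unfamiliar with Airflow, here is a minimal sketch of the kind of daily scheduling described above. The DAG, task names, and job bodies are hypothetical placeholders, not our actual pipelines.

```python
# Sketch of a daily Airflow DAG: sync business tables first, then build the report.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def sync_business_tables():
    print("sync business tables into the warehouse")

def build_daily_report():
    print("aggregate metrics for the daily data bulletin")

with DAG(
    dag_id="daily_metrics",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    sync = PythonOperator(task_id="sync_tables", python_callable=sync_business_tables)
    report = PythonOperator(task_id="build_report", python_callable=build_daily_report)
    sync >> report  # the report waits for the sync to finish
```

Declaring dependencies this way is what relieves engineers of hand-managing many back-and-forth dependent scripts.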
III. An analytical framework for growth
Let me now share some content about growth. As the business grows, with data and analysis tools in hand, we must analyze continuously toward the growth targets, including customer acquisition, activation, recall, and retention. What I share today leans toward the data-analysis framework, briefly summarized in the following five steps.
Step 1: monitoring
You need a data platform or data tool that supports monitoring. Through it, you can analyze what is happening and therefore decide how to optimize and iterate. The ideal state of monitoring is that in the first five minutes after entering the office, I scan the data bulletin and form a general assessment of the whole platform and product.
Step 2: dig / guess
When you find the product's condition changing for the better or worse, break those indicators down further, analyze them, and dig deeper.
The following figure is a growth case. Say I care about the product's total usage time: I watch the indicator change day by day, but because weekends fall between weekdays it fluctuates a lot, while the seven-day average duration is relatively flat. This is watching the trend at the time level; a sketch of the smoothing follows.
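Here is what that smoothing looks like with synthetic data: daily totals swing with the weekday/weekend cycle, while the seven-day rolling mean exposes the underlying trend.

```python
# Synthetic example: daily usage minutes with a weekend bump, smoothed over 7 days.
import numpy as np
import pandas as pd

days = pd.date_range("2019-01-01", periods=28, freq="D")
minutes = 100 + 10 * (days.dayofweek >= 5) + np.random.randn(28) * 3
usage = pd.Series(minutes, index=days, name="total_minutes")

trend = usage.rolling(window=7).mean()  # the relatively flat line the talk refers to
print(pd.DataFrame({"daily": usage, "7d_avg": trend}).tail())
```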
Once you see the trend getting worse or better, analyze it by dismantling the goals and metrics. A simple decomposition of usage duration looks at content generation versus content consumption: how many people contribute content that adds duration, and how many people consume content and add duration. Xiangwushuo, for example, is a content-consumption platform whose content is published by users and includes topics, notes, and items. I might split along dimensions such as who, when, and how: new versus old users; users in which regions; working day or holiday, night or daytime; content category, such as mother-and-baby, clothing, makeup, or others; and the publishing path, from an activity page or a feature page, via Mini Program, app, Android, iOS, or PC.
The reason for the dimension split is that when we see a big change in a metric, we cannot immediately tell whether it is an overall change or one confined to a certain segment or time. After the split, if the metric changed in every dimension, we can judge that it is probably an overall change. If it changed only in a few dimensions, for example only among new male users in Shanghai, we can form basic hypotheses to verify, such as whether Shanghai recently ran an offline event, or whether a bug appeared in the iOS version. A toy version of this split follows.
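The columns and numbers below are invented for illustration; the point is the mechanics of localizing a metric change by dimension.

```python
# Break one aggregate metric change down by dimension to see whether it is
# global or confined to a single segment.
import pandas as pd

events = pd.DataFrame({
    "platform": ["ios", "ios", "android", "android", "mini_program", "mini_program"],
    "region":   ["shanghai", "beijing", "shanghai", "beijing", "shanghai", "beijing"],
    "duration_today":     [80, 120, 110, 115, 95, 100],
    "duration_last_week": [100, 118, 112, 113, 96, 101],
})
events["delta_pct"] = (events["duration_today"] / events["duration_last_week"] - 1) * 100

# If only one cell moved sharply (here: iOS in Shanghai), the change is local,
# suggesting hypotheses like an offline event in Shanghai or an iOS-only bug.
print(events.sort_values("delta_pct").head(3))
```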
Step 3: test
After the metric has been split by dimension, the hypotheses from digging and guessing are verified against the data, and in the process some product functions are improved or iterated.
Step 4: read
After completing the "test" part, put the change you want to make into the product for verification, run it for a period of time, and read a precise measurement from the A/B test and the data. A minimal reading sketch follows.
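One common way to read such a result is a two-proportion z-test; the sketch below uses statsmodels with made-up counts.

```python
# Did the variant move conversion beyond noise? Two-proportion z-test sketch.
from statsmodels.stats.proportion import proportions_ztest

conversions = [540, 610]      # converted users: control vs. variant
exposures   = [10000, 10000]  # users exposed to each bucket

stat, p_value = proportions_ztest(conversions, exposures)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g. < 0.05) suggests a real effect rather than fluctuation.
```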
Step 5: iteration
Iteration means repeating this process, making improvements and tests on the product's style, button copy, and layout. A qualitative leap in the product or a metric is hard to produce quickly, but continuous optimization gradually pushes the product in a better direction.
That is a simple framework for data-based growth analysis. The last thing I will share is how to tell stories with data.
IV. How to tell stories with data
When we need to present data or an argument, showing raw numbers alone makes it hard for others to see what you are trying to express; visualizing the data makes it much easier. Take the following typical e-commerce visualization: the horizontal axis is the number of item views, the vertical axis is actual sales, the circle colors represent item categories, and the circle sizes represent profit margins. It shows how many times an item is viewed before it sells. The lower right corner holds things many users see but nobody buys; the upper left corner holds items users buy as soon as they see them. Looking at this chart gives an immediate feel for conversion.
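A synthetic sketch of that chart with matplotlib, using invented categories and numbers:

```python
# Bubble chart: x = item views, y = units sold, bubble size = profit margin,
# one color per category. Bottom-right = seen a lot but rarely bought;
# top-left = bought almost on sight.
import matplotlib.pyplot as plt

items = {
    "toys":     {"views": 5000, "sales": 400, "margin": 0.30},
    "clothing": {"views": 9000, "sales": 150, "margin": 0.15},
    "books":    {"views": 1200, "sales": 300, "margin": 0.20},
}
for name, d in items.items():
    plt.scatter(d["views"], d["sales"], s=d["margin"] * 3000, label=name, alpha=0.6)

plt.xlabel("item views")
plt.ylabel("units sold")
plt.legend(title="category")
plt.title("views vs. sales (bubble size = profit margin)")
plt.show()
```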
Why is visualization, or telling stories with data, so important? Visualization is only a very small part of a data analyst's or data scientist's workflow; far more attention goes to ensuring data correctness and the complete process of collection, cleaning, organization, computation, and modeling. Yet visualization and data storytelling form the most perceptible link, and the link with the least ready-made experience, textbooks, or directly learnable skills. So let me share some experience in this respect.
1. Two kinds of data analysis requirements
Data analysis generally divides into two categories: explanatory analysis and exploratory analysis. Exploratory analysis is when we find a problem through data reports or daily analysis, form hypotheses around it, and use data to verify them. Once a hypothesis looks sound, we need explanatory analysis to show it to the people concerned: why we hold this hypothesis, and why we should pursue different solutions A, B, or C. At that point we use data to explain the previously proposed scenarios and arguments. Both processes are usually essential.
2. Turning data into stories
When we present a data report or slides, first, the audience will be different people; second, the scene will differ. An engineer, for example, may care most about how your analysis was derived and why it produced those results; he will strongly doubt the credibility of the reasoning and the production process, and you must give him very strong evidence. A salesperson or marketer does not pay special attention to that process; they care about what your analysis yields for them and how to use it. So you need to say different things to different people.
On the other hand, some people come with preconceived biases. For example, when you discuss data-analysis results with a product manager, he may rebut your reasoning with the direct results of user feedback. Why does this happen? Because user feedback is direct and intuitive, while data is a comparatively cold thing sitting there. There is also survivorship bias: the people willing to give you feedback, especially negative feedback, are overrepresented. Those who use the product happily may say nothing, while the unhappy come and complain; people who dislike a feature express their opinions more strongly, so you notice them more. This is a point to watch when balancing user feedback against data-research results in decision making.
3. Elevator test
A common example is the elevator test. Whether starting a business, reporting to a superior in a company, or trying to get a new plan approved, the process is the same: in a short time, get the other person to the point you want to express. When he is willing to communicate further, you give more evidence or data as proof. And when he ends the conversation, he walks away with only one or two conclusions from you. That is the whole process of the elevator test.
Presenting data is the same. When you present data to someone, you only want him to remember the one or two points that carry what your data means.
Let me explain with two examples. The first is a record of customer-service ticket completion: monthly tickets, distinguishing tickets received from tickets processed. As a report there is nothing wrong with it, but we cannot see what needs attention or what to optimize. What we actually want to say is that in recent months we have received more and more tickets and cannot process them all. So list tickets received against tickets processed directly, present the recent months as plain, intuitive numbers, and post the conclusion right there, telling superiors that headcount must increase sharply because the backlog of unfinished tickets keeps growing. Only then does the chart say, at first glance, what it is trying to express and what it means.
The same goes for pie charts. In my opinion the pie chart is a very poor data tool; try not to use it. Suppose that after an optimization more people are satisfied: because several categories are mixed together, the reader first has to decode which color means what before discovering that satisfaction rose. Worse, if I want to convey a completely different opinion, I can manipulate the impression simply by rendering the data as a 3D pie chart.
The pictures below are exactly the same pie chart: rendered as 3D pies from different angles, the left one appears to show more supporters and the right one more opponents.
The content above basically covers the different areas I have worked in: how to support the business through data architecture and data-engineering iteration at the start, how to drive further growth through data, and finally, across varied analysis scenarios, how to make analysis and communication more efficient.
Thank you. That's all I have to share with you.