The essence of data work: from the business to the business

2026-02-13 Update. From: SLTechnology News&Howtos

Introduction: Data work is relatively simple in its structure and process, because the field is still young and its division of labor is not yet fine-grained. Broadly speaking, I see data work as three interconnected parts that form a closed loop: collecting data, organizing data, and using data. The needs of the using stage drive the collecting stage and impose specific operational requirements on it.

The book "Data Source Thinking" takes this essence as its core and proposes a set of simple, practical methods for managing the value of data work.

Collecting data

Data collection solves the problem of where the data comes from. Specifically, it consists of the following tasks:

[1] Design and implement the methods and rules for collecting data in the product.

[2] While the product is running, obtain data from the product side in real time or periodically.

[3] Transmit, receive, and verify the data.

[4] Format, archive, and store the data.
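Steps [3] and [4] above can be sketched roughly as follows. This is only an illustration, not the author's pipeline: the field names (`user_id`, `event`, `ts`), the JSON-lines format, and the function names are all assumptions made for the example.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema: field name -> expected Python type(s).
REQUIRED_FIELDS = {"user_id": str, "event": str, "ts": (int, float)}


def verify(record):
    """Return a list of problems found in one raw record (empty if valid)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"bad type for {field}")
    return problems


def to_archive_line(record):
    """Format a verified record as one JSON line with a UTC ISO timestamp."""
    out = dict(record)
    out["ts"] = datetime.fromtimestamp(record["ts"], tz=timezone.utc).isoformat()
    return json.dumps(out, sort_keys=True)


def ingest(raw_lines):
    """Split received lines into archive-ready lines and rejected inputs."""
    archived, rejected = [], []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected.append((line, ["not valid JSON"]))
            continue
        if not isinstance(record, dict):
            rejected.append((line, ["not an object"]))
            continue
        problems = verify(record)
        if problems:
            rejected.append((line, problems))
        else:
            archived.append(to_archive_line(record))
    return archived, rejected
```

Keeping the rejected lines together with the reasons, rather than silently dropping them, makes it possible to trace collection problems back to the product side.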

Once collection is done, we have the raw data generated by business operations. Raw data is extremely rich and can be classified in many ways. From the user's point of view it falls roughly into two kinds. One is data the user knowingly and actively provides, such as registration data or published text. The other is data provided passively, which the user does not easily notice, such as the IP address or operation actions (mouse movement on a PC, swiping on a mobile phone).

Data workers do not usually classify raw data along this dimension. I use it here to remind product managers that product design also needs a little data thinking: if you can get the data in the second way, you do not have to bother the user in the first way.

Organizing data

This stage sorts out the data. It is not strictly necessary, and startups in particular often use raw data directly. One reason is that in the start-up period the business does not focus on data issues. Another is that some raw data is already structured and can be used as soon as it lands in the database, such as user registration data. However, as data grows richer and the business focus shifts, organizing becomes more and more important, because most raw data, such as IP addresses or text, cannot be used directly for analysis and reuse.

The landmark task of the organizing stage is building a data warehouse that extracts, summarizes, and abstracts the raw data over multiple layers. If data is collected from users and used in the business, then organizing is the bridge between the two: it converts the raw materials that come from users into data components and semi-finished products that can be used for research, analysis, or building data products. This involves data mining. For example, the IP address mentioned above cannot be used directly, so it is generally translated into a region name using an IP address database. This turns a piece of raw technical data into meaningful business information.
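A minimal sketch of the IP-to-region translation described above, using Python's standard `ipaddress` module. The lookup table here is hypothetical (it uses documentation-only address ranges and made-up region names); a real system would load its ranges from an actual IP geolocation database.

```python
import ipaddress

# Hypothetical lookup table: network -> region name.
# The ranges below are reserved documentation ranges, used only as placeholders.
IP_REGIONS = [
    (ipaddress.ip_network("203.0.113.0/24"), "Region A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Region B"),
]


def ip_to_region(ip_text, default="unknown"):
    """Translate a raw IP string into a business-level region name."""
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        # Malformed input is kept as a known category instead of raising.
        return default
    for network, region in IP_REGIONS:
        if addr in network:
            return region
    return default
```

A linear scan is enough for a sketch; with a large table, a sorted range lookup or a prefix tree would be the more usual choice.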

There is no strict boundary between data mining in the organizing stage and data mining in later stages. The general view is that the main task of this stage is to mine, from the raw data, the information that is most commonly needed and most widely used, so as to reduce the workload later on. One example is mining basic attributes such as a user's gender and age. Most Internet products let users fill in these fields themselves, but what users fill in is still raw data. If you use that raw data directly, it may look as if you have skipped the organizing step, but in fact you are applying an organizing rule or model whose output simply equals its input. The development and application cost of this model is zero, but the opportunity cost is for you to judge.

Once the database and data warehouse have the components and semi-finished products ready, data work enters its most visible stage: use.

Using data

Data is used in two directions. One is decision support for the internal work of the enterprise. The other is offering users standalone data products, or new product features backed by data.

BI is probably the first thing that comes to mind when decision support is mentioned. In a narrow sense, traditional BI mainly takes the internal data generated by the operation of the enterprise and turns it into tables and bar, column, and line charts, which is rather dull. Decision support on the modern Internet is much more interesting because its data sources are different.

For example, we once built a recruiting product for a company's human resources department: given the hiring requirements, it used Weibo data to find matching candidates precisely. Of course, finding people is only the first step. Data work can also help assess candidates' abilities and behavior habits, industry salary levels, and so on. You can even combine data from multiple sources to give early warning of employee turnover. Decision support based on Internet data can therefore serve every aspect of an enterprise's work. In Internet companies, for example, decision-support applications of data include the following:

1. Product optimization decision

The main job of a product manager is to capture users' needs and then design a product or service that meets them. Although discovering needs is often empirical, qualitative work, data work can still support better decisions in two ways:

First, describe the overall preferences and habits of mainstream users, or of a particular category of users, to help product managers understand users more deeply. For example: which kinds of users prefer audio in which scenarios, in which scenarios they prefer to read text, and in which scenarios a video is more likely to be opened. This matters a great deal when the product manager chooses which need of which user group to target first.

Second, assess the possible market size and growth curve.

After a new product or feature launches, the product manager needs data feedback to judge how well users accept the design. Aggregate indicators such as PV and DAU do reflect users' attitudes toward the new product or feature, but precisely because they are aggregates, their changes mix in too many other factors, such as promotion spending and operational campaigns. To evaluate the product more accurately, a better choice is to measure user feedback through changes in per-user indicators such as return visit rate, usage duration, usage frequency, exit/bounce rate, and conversion.
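As a rough illustration of per-user indicators, the sketch below derives active days, total usage time, and a simple return visit rate from a visit log. The input layout `(user_id, visit_date, seconds_used)` and the definition of "returned" (active on more than one day in the window) are assumptions chosen for the example, not definitions from the article.

```python
from collections import defaultdict
from datetime import date


def per_user_metrics(visits):
    """Aggregate a visit log into per-user indicators.

    visits: iterable of (user_id, visit_date, seconds_used) tuples.
    """
    days = defaultdict(set)
    seconds = defaultdict(int)
    for user_id, day, secs in visits:
        days[user_id].add(day)
        seconds[user_id] += secs
    return {
        user: {
            "active_days": len(days[user]),
            "total_seconds": seconds[user],
            # Assumed definition: seen on more than one distinct day.
            "returned": len(days[user]) > 1,
        }
        for user in days
    }


def return_visit_rate(metrics):
    """Share of users in the window who came back on a later day."""
    if not metrics:
        return 0.0
    return sum(m["returned"] for m in metrics.values()) / len(metrics)


log = [
    ("u1", date(2024, 1, 1), 120),
    ("u1", date(2024, 1, 3), 60),
    ("u2", date(2024, 1, 1), 30),
]
metrics = per_user_metrics(log)
rate = return_visit_rate(metrics)
```

Comparing these per-user numbers before and after a launch, for the same cohort, is less sensitive to promotion-driven traffic than comparing PV or DAU.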

Besides after-the-fact monitoring, A/B tests are sometimes used to compare the effectiveness of different designs, so as to learn user preferences in advance and reduce the market risk of new products or features. This requires cooperation with the data collection work. When deploying an A/B test, select two groups of similar users, push the test content to them under defined conditions, and observe the actual effect without the users being aware of it.
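One common way to form two comparable groups is deterministic hash-based assignment, followed by a two-proportion z-test on the conversion rates. The sketch below illustrates that general approach; it is not the author's procedure, and the function names and the experiment label are assumptions.

```python
import hashlib
import math


def assign_group(user_id, experiment, groups=("A", "B")):
    """Deterministically assign a user to a group for a given experiment.

    Hashing the experiment name together with the user id keeps the
    assignment stable across sessions and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rate (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups converted at exactly 0% or 100%: no measurable difference.
        return 0.0
    return (p_b - p_a) / se
```

An absolute z value above roughly 1.96 corresponds to the conventional 5% two-sided significance level. Hash-based assignment only makes the groups similar on average, so it is still worth checking that the two groups look alike on key attributes before trusting the comparison.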

2. Operational support

Operating an Internet product mainly covers user operations, content operations, event operations, and customer service. Data work can provide basic support in each of these areas.

For example, an important part of keeping users active is preventing churn. This raises the question of how to define churn: how long must a user stay away before counting as churned? The focus of this research is not actually on the churned users, because they cannot tell you how long an absence means churn. The focus is on users who stayed away for a long time but eventually came back naturally (that is, without any recall campaign or promotional activity). From this group we can learn how long users typically stay silent before returning, and conversely, how long a silence indicates churn. In practice you will find that some users come back after half a year or more, and experience says such returns are almost certainly not natural. So deciding whether a return is natural becomes a new problem, and one data source that helps solve it is the access source, that is, how the user arrived.
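The reasoning above can be sketched as follows: collect the silence gaps that ended in a natural return, then pick a threshold that covers most of them. The source labels in `NATURAL_SOURCES`, the 95% default coverage, and the function names are assumptions for illustration; the article does not specify them.

```python
import math
from datetime import date

# Hypothetical access-source labels treated as "natural" returns.
NATURAL_SOURCES = {"direct", "bookmark"}


def natural_return_gaps(visits):
    """Silence gaps (in days) that ended with a natural return.

    visits: list of (visit_date, access_source) tuples for one user.
    """
    ordered = sorted(visits)
    gaps = []
    for (prev_day, _), (day, source) in zip(ordered, ordered[1:]):
        gap = (day - prev_day).days
        # Skip same-day visits and returns triggered by recalls or campaigns.
        if gap > 0 and source in NATURAL_SOURCES:
            gaps.append(gap)
    return gaps


def churn_threshold(gaps, coverage=0.95):
    """Smallest gap length that covers `coverage` of the natural returns."""
    if not gaps:
        raise ValueError("need at least one observed gap")
    ordered = sorted(gaps)
    index = max(math.ceil(coverage * len(ordered)) - 1, 0)
    return ordered[index]


history = [
    (date(2024, 1, 1), "direct"),
    (date(2024, 1, 4), "direct"),
    (date(2024, 3, 1), "push_notification"),
]
gaps = natural_return_gaps(history)
```

A user whose current silence exceeds the threshold can then be labeled as churned. In a real study the gaps from all users would be pooled before calling `churn_threshold`.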

Of course, computing a churn time threshold does nothing by itself to prevent churn. Its practical use is to select the samples for churn research and to build a churn early-warning model from them. The model uses changes in users' behavior while they are still active to predict their probability of churning, which gives user operations a basis for deciding the next step.

3. Marketing anti-cheating

Cheating and anti-cheating form an adversarial pair, in a constant state of "virtue rises one foot, vice rises ten": each side keeps learning from and constraining the other. As cheating methods keep evolving, many methods for countering cheating and identifying fake users have appeared. Most are discriminant models built from manual experience or from machine learning. They are efficient, cheap to implement, and widely used, but they have a fatal weakness. Because they are supervised and their experience comes from historical data, they stay highly effective only as long as the channels' cheating methods stay the same. The trouble is that once you identify a channel's cheating and refuse to pay for it, the channel immediately knows that you can detect its current methods, and it upgrades them. It will also demand evidence of the cheating. If you provide it, you reveal your detection method, which makes it even easier to bypass, and vice has risen its ten. In the end you need an unsupervised way to fight cheating.
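The article does not say which unsupervised method to use. As one simple illustration of the idea, the sketch below flags channels whose metric deviates strongly from the other channels, using a robust z-score based on the median and the median absolute deviation (MAD). The metric (next-day retention per channel), the 3.5 cutoff, and the function name are assumptions for the example.

```python
import statistics


def robust_outliers(value_by_channel, threshold=3.5):
    """Flag channels whose metric is far from the median of all channels.

    Uses the modified z-score 0.6745 * (x - median) / MAD, which needs no
    labeled examples of cheating and is not distorted by the outliers
    themselves the way a mean and standard deviation would be.
    """
    values = list(value_by_channel.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the channels share the same value: no usable scale.
        return []
    return [
        channel
        for channel, value in value_by_channel.items()
        if abs(0.6745 * (value - med) / mad) > threshold
    ]


# Hypothetical next-day retention rates per acquisition channel.
retention = {"c1": 0.30, "c2": 0.32, "c3": 0.29, "c4": 0.31, "c5": 0.02}
suspicious = robust_outliers(retention)
```

Because the baseline is the current behavior of the other channels rather than past labeled fraud, a flagged channel is only a candidate for investigation, not proof of cheating.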

Beyond these, sales, human resources, strategic decision-making, and other areas are all stages for data applications.

Data applications do not only play a supporting role in decision support; they can also play the lead. Examples include data products such as Baidu's search popularity rankings and Weibo's trending terms. More common still are features such as "guess you like" and "related products", which are presented to users with the direct support of data work.

After this introduction, I hope you can see the essence, or fundamental value, of data work: it comes from the business and goes back to the business. Unless you are a practitioner who is content with only the internal technical processing of data, you need a clear understanding of this.

If you want to go deeper, the book "Data Source Thinking" is a good choice.
