2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article looks at how Dataphin helps enterprises build an extraction data center. The topic may be unfamiliar to many readers, so the main points are summarized below.
Dataphin is the product carrier of the OneData methodology (OneModel, OneID, OneService) of Alibaba's data center. It helps enterprises build three major data centers: a vertical data center based on data integration, a public data center based on the data assets precipitated through data development, and an extraction data center based on the label factory. The sections below describe how Dataphin builds an extraction data center on the OneID idea and connects it with upstream and downstream applications to create more value for enterprises.
Why build an extraction data center: increasing the value density of data
First, let's look at why Dataphin helps enterprises build their own extraction data centers.
In the era of big data, even a small amount of data can produce considerable value. As an intelligent data construction and management platform, Dataphin provides core functions such as standardized modeling and data processing. These help enterprises efficiently integrate massive data from different business databases, precipitate data assets, build their own data center, and meet the big data challenges of Volume, Variety and Velocity. Compared with traditional small data, however, the greater value of big data lies in mining, from a large amount of seemingly unrelated data, the data that is meaningful for prediction and analysis. This increases the value density of the data, and applying the result to guide production helps enterprises improve efficiency and reduce cost. Dataphin's data extraction function provides this capability.
From a business perspective, daily production and marketing activities, whether crowd selection, site selection or personalized delivery, all rely on labels. A label is a multi-dimensional portrayal of an entity. An entity is not limited to people: anything that can be described and analyzed, such as goods or companies, can be an entity. Labels of different dimensions describe an entity from different angles. From a retail perspective, for example, consumers can be described by natural attributes (such as gender, age), social attributes (such as economic status, marital status), interest preferences (such as a clean environment, a desire for beautiful teeth) and industry consumption preferences (such as makeup preference, mother-and-baby preference). High-quality, comprehensive labels effectively abstract the full picture of an entity, which lays the foundation for precision marketing.
Data produces greater value only when it is integrated. Enterprises want not only to analyze and apply big data, but also to obtain data that is connected across business units and finely extracted. To this end, the Dataphin data extraction module works on the original data of the business databases and on the data assets precipitated through modeling and development. It identifies and connects the master data of the whole system, that is, the core objects that run through each isolated business, breaks down the islands of business data, and further refines high-value label data that can be applied directly. This helps enterprises build their own extraction data center and connect it to upstream applications (QuickAudience, etc.) to further guide production and marketing activities.
How to build an extraction data center efficiently: visual configuration, automated production
The data extraction function under the Dataphin R&D module connects behavior data and performs label extraction. At this stage, priority is given to supporting consumer-oriented data systems. The function consists of three parts: the ID center, the behavior center and the label center (the ID center is not yet online). In addition, a separate extraction operation and maintenance sub-module is provided under the operation and maintenance module, so that extraction-related scheduling tasks can be viewed from a business perspective. The following sections go through these functional modules to show how Dataphin helps enterprises build their own extraction data centers.
1) ID Center: automatic identification and connection of related IDs
Based on the OneID idea, Dataphin uniquely identifies data that comes from different platforms, systems and channels. Parameters are configured through a visual interface. Dataphin then extracts from the full data set and uses algorithms to automatically identify the mapping relationships between the various types of ID (shopping member ID, video viewer ID, shopping device MAC, viewing device IP, etc.), and connects the different types of ID that belong to the same entity through a unique OneID. Labels produced from individual IDs can therefore be aggregated onto the same entity, so that the entity is described more accurately and comprehensively.
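As a rough illustration of the ID-connection idea (not Dataphin's actual algorithm, which the article does not describe), the sketch below uses a union-find structure to group IDs that were observed together and assigns each group one shared identifier. The pair list, the ID strings, the `IdLinker` class and the `oneid-` naming are all hypothetical.

```python
from collections import defaultdict


class IdLinker:
    """Groups IDs that belong to the same entity using union-find."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen IDs as their own root.
        self.parent.setdefault(x, x)
        # Walk to the root, halving the path as we go.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def link(self, a, b):
        # Record that two IDs were observed for the same entity.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        # Map each root to the sorted list of IDs connected to it.
        result = defaultdict(list)
        for x in list(self.parent):
            result[self.find(x)].append(x)
        return {root: sorted(ids) for root, ids in result.items()}


# Hypothetical ID pairs, e.g. taken from logs that show two IDs together.
observed_pairs = [
    ("member:1001", "mac:AA-BB-CC"),
    ("viewer:77", "ip:10.0.0.8"),
    ("mac:AA-BB-CC", "viewer:77"),
    ("member:2002", "mac:DD-EE-FF"),
]

linker = IdLinker()
for left, right in observed_pairs:
    linker.link(left, right)

# Assign one illustrative OneID per connected group of raw IDs.
one_ids = {}
for n, ids in enumerate(sorted(linker.groups().values()), start=1):
    for raw_id in ids:
        one_ids[raw_id] = f"oneid-{n}"
```

Here the member, device, viewer and IP identifiers in the first three pairs end up under one OneID because they are transitively connected, while the fourth pair forms a separate entity. A production system would also need to weigh how reliable each observed pair is, which this sketch ignores.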
2) Behavior Center: precipitate behavior elements and construct behavior rules
Dataphin currently supports behavior data centered on person-related IDs. Through form-based configuration in a visual interface, it extracts source behavior data and aggregates it by business domain (such as e-commerce shopping or video viewing).
First, we sort out the behavior data from a business perspective and extract the reusable behavior elements (behavior domain, business line, action, object, object attribute). Different behaviors are then defined by combining these elements (behavior domain-business line-action-object). The behavior domain aggregates behavior data with the same business meaning, such as the e-commerce domain and the entertainment domain. The business line further subdivides the behavior data within a behavior domain, and each business line is relatively independent, such as the Taobao business line and the Tmall business line. The action is what the subject does, such as purchasing or browsing. The object is the specific thing the subject acts on, such as goods or movies. The object attribute is descriptive information about the object, such as name, brand or year. By precipitating behavior elements, we can divide and combine the source data more cleanly and obtain behaviors with a clear business meaning, such as e-commerce domain-Taobao-purchase-goods or entertainment domain-Youku-browsing-movies. This also standardizes the source data and reduces repeated construction and manpower investment.
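The combination of behavior elements can be pictured with a small data model. The following is only a sketch: the `Behavior` class, its field names and the `OBJECT_ATTRIBUTES` mapping are hypothetical and are not taken from Dataphin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """A behavior defined by combining four reusable elements."""

    domain: str         # behavior domain, e.g. "e-commerce"
    business_line: str  # e.g. "Taobao"
    action: str         # e.g. "purchase"
    obj: str            # object the subject acts on, e.g. "goods"

    @property
    def name(self) -> str:
        # Join the elements in the order domain-business line-action-object.
        return "-".join((self.domain, self.business_line, self.action, self.obj))


# Object attributes are descriptive fields attached to an object type.
OBJECT_ATTRIBUTES = {
    "goods": ["name", "brand"],
    "movies": ["name", "year"],
}

taobao_purchase = Behavior("e-commerce", "Taobao", "purchase", "goods")
youku_browse = Behavior("entertainment", "Youku", "browsing", "movies")
```

Because the elements are plain reusable values, a new behavior such as a Tmall purchase only needs a different business line, not a new definition of the action or the object.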
Selecting different source tables for the same behavior and adding the configuration generates different behavior rules; a rule is uniquely determined by the behavior plus the source table. Subsequent label production depends on the behaviors and behavior rules that have been built. A rule configuration mainly covers the behavior subject ID, the object, the object attributes and the behavior occurrence time, each mapped to the corresponding field of the source table. The behavior rule then runs as a periodically scheduled task, which yields continuously updated behavior data as the source for label production.
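A behavior rule can be thought of as a field mapping from one source table onto a standard behavior record. The sketch below is an assumption-laden illustration: the `BehaviorRule` class, the table name `ods_taobao_orders`, the column names and the record layout are all hypothetical, and the rule is assumed to map a time-of-occurrence field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorRule:
    """Maps the columns of one source table onto a standard behavior record."""

    behavior: str            # name of an already defined behavior
    source_table: str        # hypothetical source table name
    subject_id_field: str    # column holding the behavior subject ID
    object_field: str        # column holding the object
    time_field: str          # column holding when the behavior occurred
    attribute_fields: tuple = ()  # columns holding object attributes

    @property
    def key(self):
        # A rule is uniquely determined by behavior + source table.
        return (self.behavior, self.source_table)

    def apply(self, rows):
        # Project each source row onto the standard record layout.
        return [
            {
                "behavior": self.behavior,
                "subject_id": row[self.subject_id_field],
                "object": row[self.object_field],
                "attributes": {f: row[f] for f in self.attribute_fields},
                "occurred_at": row[self.time_field],
            }
            for row in rows
        ]


rules = {}


def register(rule):
    # Reject a second rule for the same behavior and source table.
    if rule.key in rules:
        raise ValueError(f"rule already exists for {rule.key}")
    rules[rule.key] = rule


rule = BehaviorRule(
    behavior="e-commerce-Taobao-purchase-goods",
    source_table="ods_taobao_orders",
    subject_id_field="buyer_id",
    object_field="item",
    time_field="paid_at",
    attribute_fields=("brand",),
)
register(rule)

# Hypothetical source rows; a scheduler would feed fresh rows each cycle.
source_rows = [
    {"buyer_id": "member:1001", "item": "stroller", "brand": "AcmeBaby", "paid_at": "2025-03-01"},
]
records = rule.apply(source_rows)
```

Running `apply` on each scheduling cycle would produce the continuously updated behavior records that label production reads from.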
3) Label Center: efficient label production
Once the behaviors and behavior rules are in place, the generation rules for labels are defined through simple interface configuration on top of an algorithm model.
Label configuration takes two steps. The first step is to select, from the defined behaviors, the behavior data that the label depends on, and then to configure the expected label value and the marking method. The second step is to set the time decay mode for the selected behavior data and to assign different weights to different behaviors according to their business meaning. For example, suppose we believe that users who "buy maternal and child products" and "watch parent-child videos" can be labeled as "mother and baby population". In the first step, we select the data for these two behaviors and set the expected label value to "mother and baby population". In the second step, because we believe recent behavior is a better reference than earlier behavior, we choose the linear decay mode so that recent behavior receives more time weight. At the same time, business experience tells us that "buying maternal and child products" identifies target users more accurately than "watching parent-child videos", so we assign more weight to the "buy maternal and child products" behavior. This completes the production of a shopping preference label such as "mother and baby".
Unlike traditional label production, users of Dataphin data extraction only need to care about the business meaning and rules of a label, not about the implementation of the underlying algorithm. A label is configured through simple interface operations, and the code and periodic scheduling tasks are generated automatically, which greatly lowers the difficulty and threshold of label production.
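The two-step configuration in this example can be sketched as a weighted score with linear time decay. This is only an illustration under stated assumptions: the article does not give Dataphin's scoring formula, so the weights, the 90-day window, the 0.5 threshold and the function names below are all hypothetical.

```python
from datetime import date

# Hypothetical behavior weights: purchases count more than video views.
BEHAVIOR_WEIGHTS = {
    "buy maternal and child products": 0.7,
    "watch parent-child videos": 0.3,
}
WINDOW_DAYS = 90   # assumed decay window
THRESHOLD = 0.5    # assumed score needed to assign the label


def linear_decay(event_day, today, window_days=WINDOW_DAYS):
    # Time weight falls linearly from 1.0 today to 0.0 at the window edge.
    age = (today - event_day).days
    return min(1.0, max(0.0, 1.0 - age / window_days))


def label_score(events, today):
    # Sum of behavior weight times time weight over all selected events.
    return sum(
        BEHAVIOR_WEIGHTS.get(behavior, 0.0) * linear_decay(day, today)
        for behavior, day in events
    )


def assign_label(events, today, label="mother and baby population"):
    # Return the label value when the score reaches the threshold.
    return label if label_score(events, today) >= THRESHOLD else None


today = date(2025, 3, 29)
recent_buyer = [
    ("buy maternal and child products", date(2025, 3, 20)),
    ("watch parent-child videos", date(2025, 3, 25)),
]
old_viewer = [("watch parent-child videos", date(2025, 1, 10))]

recent_label = assign_label(recent_buyer, today)
old_label = assign_label(old_viewer, today)
```

With these assumed values, the user with a recent purchase and a recent view receives the label, while the user whose only activity is an old video view does not. Changing the weights or the decay window changes who qualifies, which is the kind of choice the business user makes in the interface.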
4) Extraction operation and maintenance
Finally, the behavior rules and labels configured in the extraction module generate periodic tasks that are scheduled automatically. Under the "extraction operation and maintenance" sub-module of the "operation and maintenance" interface, we can view these tasks and the instances they generate from a business perspective, and restore production after a scheduling failure through operations such as backfilling data. Business staff can therefore configure and view extraction tasks themselves, which greatly reduces their dependence on technical staff.
After the launch of the Dataphin data extraction function, the time needed to produce a batch of more than a dozen labels of the same type was shortened from two weeks to about two days. Label production tasks can also be monitored, so both speed and correctness have improved greatly. The participants have changed as well: where the work used to be done by data product managers, data R&D engineers and data scientists, business roles can now take part or even lead.
An extraction data center built with Dataphin helps enterprises identify and connect the related IDs of a target object, aggregate all of the target object's behaviors in a standardized structure, and quickly create the target object's label attributes. Enterprises can thus quickly build their own user data assets, connect them to data application products and carry out marketing delivery.