How to build data Center based on DataWorks 02/14 Update SLTechnology News&Howtos


2026-02-14 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 report

This article explains how to build a data center based on DataWorks. "Data center" here means an enterprise's shared data platform (often called a "data middle platform"), not a physical facility. Many readers may not be familiar with the topic, so the key points are summarized below.

First, the business model of new retail enterprises

If a new retail enterprise wants to build a data center, the first and most important requirement is to understand the business. A student once asked me why building a data center is so difficult. In my view, data and business are closely linked: when building the entire data platform, we must first understand the business very deeply.

New retail enterprises run many forms of business, such as online e-commerce platforms, offline stores, an official app, distribution channels and the supply chain. We do not need to collect and unify the data from every channel right at the start, which is what building a data center means. What we need to understand first is the business model of the whole enterprise. Based on that model, we define the forms of business to pursue, and only then do we start planning the enterprise's new retail data center. Here is an example.

For example, many new retail enterprises used to rely mainly on offline stores and have now added an online app or e-commerce business, but their online and offline inventories are out of sync, or their e-commerce model differs from their offline model. Such an enterprise is really still running a traditional retail business; it has simply opened an additional online channel. The first thing a data center requires is to break the enterprise's original business model and design a form of business that truly integrates online and offline. That is why we often say the data center is a top-level project that must be driven by the enterprise's leadership.

Once the business model is settled, new retail enterprises still have many forms of business to choose from, and everyone is experimenting. For example, some fresh-food businesses promise delivery within XX minutes; enterprises with offline stores drive offline traffic to their online channels; some use offline stores as warehouses for online orders; and some let customers buy online and pick up in a store, keeping online and offline prices the same. Once these business patterns are determined, we can discuss how the data center supports them and closes the loop of the whole business model by connecting the data.

Second, the product technical architecture design of new retail enterprises

Technical architecture diagram of business products

After determining the business model, the next step is to design the product technical architecture itself. Many retail enterprises hesitate at this point, because traditional software vendors already offer off-the-shelf systems such as ERP and WMS, built for retailers, stores and supermarkets. Is it fine for an enterprise to simply buy one?

Some traditional ERP and logistics software has also been digitized, but there is an important difference: the digitization a data center provides is not merely digitizing and structuring data. More importantly, it gives the strategy layer above it intelligent support for traffic, logistics execution, process optimization and financial strategy. Here is a brief example. We studied some large supermarket chains with offline stores. They also run an online app, but their online and offline inventories are isolated: if they have 100 fish in total, 10 are pre-allocated to the app, and once those 10 sell out, the fish are gone online. With connected data, all 100 fish are sold first-come, first-served across online and offline channels. On top of that, algorithms can drive inventory early warning, discounting, cross-selling, supply chain adjustment and so on. Instead of crudely splitting stock into two pools, the data center connects all the data and commodities across online and offline and reshapes parts of the business. That is why we say a data center is not simply about structuring data.

If an enterprise has sufficient technical capability, we recommend developing all core business systems in-house, because a new retail enterprise needs to comprehensively digitize many traditional businesses, including transactions, stores, warehousing, transportation and distribution, procurement, the supply chain and labor. If you buy systems externally, make sure, based on your business model, that they form a closed loop from transactions and stores through warehousing and freight to procurement, the supply chain and labor. Do not let the app, the stores and e-commerce each run on a separate system; otherwise, by the time you build the data center, the barriers between the data will already be very high.

A very important piece in closing the whole loop is the data layer on the far right of the diagram. Beyond the design of the business systems, without a unified data center it is very difficult to support the enterprise-wide effort. This is the part we focus on today.

Introduction of the New Retail data Center team

In our view, the data center is not only a solution but also a team function. Enterprises should build an independent data center team to support the business. For an enterprise, data is as important an asset as goods, members and equipment. The members of the data center team build, manage and operate these assets, and use them to drive an intelligent upgrade of the whole retail supply chain from end to end. By collecting, managing and building up data, they make it more useful to the business.

The overall structure of the new retail data center

The figure above shows the overall structure of a general data center. Some parts of it are specific to new retail, and some are generic.

First, the generic parts. The infrastructure is basically built on Alibaba Cloud (Aliyun). DataWorks + MaxCompute on Alibaba Cloud has supported the construction of Alibaba Group's data center for 11 years. Within the data layer, the source data comes mostly from the business systems, while the access layer is relatively complex. Many enterprises now talk about omni-channel coverage: the app, offline stores, and for some enterprises their own distributors, electric vehicles, IoT device data and in-store human resources data. This produces a lot of structured and unstructured data. The data processing layer processes this data, including the unstructured part, and eventually forms a very important data asset layer.

Once built, the data asset layer carries business meaning, and the business can use this data directly. However, on top of the data asset layer we define a data service layer so that the data is even more convenient to use out of the box. Even the service layer may still be invisible to business users, and from the business side we want them to use the data directly instead of querying many tables. So above the data service layer, in the data application layer, the data center team builds data products and delivers them to the business as products, so that the data is actually used. These products take many forms on different endpoints, including PCs, DingTalk, handheld terminals, and many small IoT devices, some of which may be a small black-and-white screen that the data is pushed to. On the far right of the data center there is a management system, through which the operation and O&M of the whole enterprise's data can be carried out effectively. This diagram is our understanding of a business-oriented, layered architecture for the data center.

The technical architecture of the new retail data center

Based on the business-oriented layered architecture just described, we next design the technical architecture of the data center. Anyone who has worked on big data will have faced the question of how to handle offline and real-time computing at the same time during data collection. For offline computing, we recommend MaxCompute on Alibaba Cloud. Almost all of Alibaba's offline data is on MaxCompute, and its daily data processing volume during Double 11 in 2020 reached the 1.7 EB level. For real-time computing, we recommend Flink, which has handled peaks of 4 billion messages per second, so its computing performance is also very strong. Beyond computing, the data must also be stored. For example, data aggregated by Realtime Compute for Flink can be stored in MaxCompute Interactive Analytics (Hologres) to build a real-time data warehouse; Hologres supports a peak write rate of 596 million records per second along with sub-second queries on PB-scale data. Online search can use Elasticsearch. These stores then back the data services, which expose detailed indicators, features, labels and so on. This data can reach the most commonly used devices, operating platforms, DingTalk mobile office, intelligent management and more; that is the runtime level. At the operations level of the whole data mart there are metadata, data quality, disaster recovery management, data governance and so on. This technical architecture diagram is really a technical requirements diagram: it lists what the technical team of a new retail enterprise needs to deliver when building a data center.

Third, the solution of the new retail data center based on DataWorks

Once the enterprise's business model, the technical architecture of its business products and the technical requirements of the data center are clear, we begin technology selection and research for the data center: which products and which system can support the new retail enterprise's entire technical architecture. For the business systems we recommended in-house development, but the data center technology does not need to be built in-house, because Alibaba Cloud already offers a very mature product suite for new retail enterprises to build their own data center. As discussed, for the big data computing engines, the offline data warehouse can use MaxCompute, and the real-time data warehouse can use Realtime Compute for Flink plus MaxCompute Interactive Analytics (Hologres). These three products combine seamlessly into a complete, integrated real-time and offline data warehouse. For data development and governance tools, you can choose DataWorks, which serves almost all business units of Alibaba Group: every day, tens of thousands of operations staff, product managers, data engineers, algorithm engineers and developers in the group use it, and it also serves a large number of users on Alibaba Cloud. The overall architecture of DataWorks is as follows:

Overall architecture diagram of DataWorks

Data integration is the first step in building a data center. DataWorks provides data integration covering batch, incremental, real-time and whole-database synchronization, and it supports the many complex data sources found in enterprises. At the time of writing, DataWorks Data Integration supports 50 types of data sources for offline synchronization and 10 types for real-time synchronization, and it achieves secure, stable, flexible and fast integration whether the sources are on the public internet, in an IDC or in a VPC. DataWorks also has a unified metadata management service, supports unified task scheduling, and provides a rich set of one-stop data development tools covering the whole data development lifecycle, which can greatly improve an enterprise's data development efficiency. The upper layers include data governance, data services and so on, plus a very important open platform. Most enterprises have business systems that are either developed in-house or purchased, and through the DataWorks OpenAPI many functions can be extended and integrated with these in-house and project systems; for example, alert information can be pushed to the enterprise's own monitoring and alerting system. The more than 100 OpenAPI operations DataWorks currently provides make such requirements easy to meet.

Construction of New Retail data Center based on DataWorks

Mapping the data center's technical requirements diagram onto DataWorks, the data acquisition part corresponds to DataWorks Data Integration, which can basically meet all the data synchronization requirements on the left of the diagram. In the data development layer, DataWorks handles offline, online and real-time data development through DataStudio, HoloStudio and StreamStudio. It also provides data services and open interfaces, and it integrates with the enterprise's existing systems and products through OpenAPI. Crucially, DataWorks also provides Data Map and data governance capabilities. These may look like peripheral features, but they play a key role in building an enterprise-wide data center, as discussed later.

The goal of the data center

The steps above can be seen as preparation for the data center: understanding the enterprise's business, designing the product system and selecting the technology. Next, we need to set the goal of the enterprise data center. A goal here is not a KPI; it can also be a mission or an original intention. The goal of data center construction is to establish an intermediate layer with rich data (full-link, multi-dimensional), reliable quality (standard calibers, that is, metric definitions, and accurate results) and stable operation (timely output and no failures). Many people will say this is just a data mart; that doesn't matter, it is a middle tier. In addition, the data center must provide reliable data services, data products and business applications to the business above it. This requirement is what makes it neither a simple data warehouse nor a simple data mart, but a data center that the business uses continuously. If an enterprise just synchronizes data into MaxCompute, open-source Hadoop or a database, it still has only a warehouse. What we call a data center can be used directly by the business and even creates business value for it.

With the goal defined, we break it down step by step. When a business team raises a requirement, it may simply ask the data team for "sales data", but that sales figure comes with restrictions: over what period? Does it include refunds? Is it limited to a region? So the first step is for the data center to design an indicator system, and this indicator system should be produced by the data center team. The second step is data model design, because the business does not consume raw table fields; modeling makes the enterprise's data more standardized. The third step is to develop the data processing tasks based on the designed model. Finally, we open the data to the business through data services, which are not limited to tables, APIs and reports and can even be a product or something else.
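To make the "sales data" example concrete, the sketch below models an indicator definition that carries its caliber explicitly: the statistical period, whether refunds count, and which regions are included. It is a hypothetical illustration in Python (the class, fields and sample orders are invented, not part of DataWorks); the point is that two requests that both say "sales in May" yield different numbers once their calibers are written down.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical indicator spec: a measure plus every caliber restriction."""
    name: str
    measure: str                       # column to sum, e.g. "amount"
    start: date                        # first day of the statistical period (inclusive)
    end: date                          # last day of the statistical period (inclusive)
    include_refunds: bool              # does refunded revenue count?
    regions: frozenset = frozenset()   # empty set means "all regions"

    def matches(self, order: dict) -> bool:
        """Return True if an order falls inside this metric's caliber."""
        if not (self.start <= order["order_date"] <= self.end):
            return False
        if order["is_refunded"] and not self.include_refunds:
            return False
        if self.regions and order["region"] not in self.regions:
            return False
        return True

    def compute(self, orders: list) -> float:
        return sum(o[self.measure] for o in orders if self.matches(o))


orders = [
    {"order_date": date(2021, 5, 1), "region": "east", "amount": 100.0, "is_refunded": False},
    {"order_date": date(2021, 5, 2), "region": "east", "amount": 50.0, "is_refunded": True},
    {"order_date": date(2021, 5, 2), "region": "west", "amount": 80.0, "is_refunded": False},
    {"order_date": date(2021, 6, 1), "region": "east", "amount": 30.0, "is_refunded": False},
]

# Two requests that both say "sales in May" but differ in caliber
gross_may = MetricDefinition("gross_sales_may", "amount", date(2021, 5, 1), date(2021, 5, 31), True)
net_may_east = MetricDefinition("net_sales_may_east", "amount", date(2021, 5, 1), date(2021, 5, 31),
                                False, frozenset({"east"}))
print(gross_may.compute(orders), net_may_east.compute(orders))  # 230.0 100.0
```

Keeping the caliber inside the definition, rather than in someone's memory, is what lets the data center team publish one agreed number per indicator.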

Overall model architecture of data Mart-overall layering

Overall model architecture of data Mart-functional positioning

The figures above show the ODS, DWD, DWS and ADS layering that appears in many diagrams of data models and data mart construction on the internet. There are many concepts and ideas around it, and everyone understands these layers differently, so we must define each layer strictly and clearly, with its own characteristics and responsibilities. In brief, our view is:

ADS must be business-oriented rather than development-oriented: the business should be able to understand this data in the shortest possible time, or even use it directly.

DWS holds the indicators. It is the carrier of the indicator system mentioned earlier: all indicators are built in DWS, and DWS summaries are basically what ADS is built on.

DWD is the detail layer. How should it be built? We recommend dimensional modeling, with dimension tables and fact tables. Dimension tables come in several kinds, such as hierarchical dimensions and enumeration dimensions, and fact tables include periodic snapshot tables. One point matters here: every DWD field must be directly understandable and unambiguous. Any ambiguity causes problems when DWS uses the field, which in turn causes problems for all the applications built on top of it.

Basically everyone agrees that ODS should be consistent with the source, that is, business data should be synchronized as-is. Some architectures have changed, however, and people like to do preliminary ETL in ODS, which makes the ODS data inconsistent with the enterprise's business data. We recommend against this for a simple reason: ODS must stay consistent with the business database so that when a problem appears, we can quickly locate its cause. Once ETL is involved, a bug in the ETL step can make the two sides inconsistent. If the enterprise strictly forbids any logical processing between the business database and ODS, then any discrepancy can only come from middleware or other storage issues, not from business logic.
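One way to enforce the "ODS must match the business database" rule is a reconciliation check after each sync. The sketch below is a hypothetical Python illustration that assumes both sides of one partition can be read as lists of row tuples; in practice the counting and hashing would be pushed down into SQL on each system. Because ODS does no ETL, any mismatch points at the sync path rather than at business logic.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum); the checksum ignores row order."""
    count = 0
    checksum = 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        # Summing (mod 2**64) is order-independent and, unlike XOR,
        # does not let two identical rows cancel each other out.
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % 2**64
        count += 1
    return count, checksum


def reconcile(source_rows, ods_rows):
    """Compare a business-database partition with its ODS copy."""
    src_count, src_sum = table_fingerprint(source_rows)
    ods_count, ods_sum = table_fingerprint(ods_rows)
    return {
        "row_count_match": src_count == ods_count,
        "content_match": src_sum == ods_sum,
    }


source = [(1, "apple", 3), (2, "pear", 5)]
print(reconcile(source, list(reversed(source))))              # same rows, different order
print(reconcile(source, [(1, "apple", 3), (2, "PEAR", 5)]))    # an "ETL" edit slipped in
```

A row-count check alone would miss the second case, which is exactly the kind of silent change that ETL inside ODS introduces.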

Fourth, build a new retail data center based on DataWorks

DataWorks data development platform

So far we have covered ideas, design, architecture, goals and requirements for building a data center. Next I will share how to use DataWorks to build one, along with some experience with the platform. Since 2009, DataWorks has served not only customers on Alibaba Cloud but also almost all of Alibaba Group's business units, so much of its product design leans toward being open, general and flexible. As a result, enterprises can run into problems caused by too much flexibility or a lack of standards when using DataWorks. The following shares some of our experience.

Data development-data synchronization

We recommend synchronizing the data of all business databases into a single project, hm_ods, for unified storage and management.

To save storage, the same data is synchronized only once.

For data backtracking and auditing, the data lifecycle is set to permanent retention.

Data synchronization is the first step in building a data center: if the data cannot get into the warehouse, there is no data center. We apply several requirements to synchronization. All of the enterprise's business data is synchronized into one project, each source is synchronized only once, and repeated synchronization is not allowed. This makes management easier, reduces cost and ensures that the data is unambiguous. If the data source is wrong, everything downstream of it is wrong, so the data center must guarantee that its source data is 100% correct. In addition, for data backtracking and auditing, the data lifecycle is set to permanent. Even if the business system archives or deletes data in some online databases because of load, the historical data can be restored intact from the ODS layer when it is needed again.
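The three synchronization rules (one target project, one copy per source table, permanent retention) can be expressed as a registration check. This is a hypothetical Python sketch, not DataWorks functionality: it models a sync task as (source database, source table, target project, lifecycle), reuses the hm_ods project name from the text, and represents permanent retention as "no lifecycle" (None). The database and table names are invented.

```python
from dataclasses import dataclass
from typing import Optional

ODS_PROJECT = "hm_ods"  # the single ODS project recommended in the text


@dataclass(frozen=True)
class SyncTask:
    source_db: str
    source_table: str
    target_project: str
    lifecycle_days: Optional[int]  # None models "no lifecycle", i.e. keep permanently


class SyncRegistry:
    """Hypothetical guard that enforces the three synchronization rules."""

    def __init__(self):
        self._tasks = {}

    def register(self, task: SyncTask) -> None:
        # Rule 1: every business table lands in the one ODS project
        if task.target_project != ODS_PROJECT:
            raise ValueError(f"sync target must be {ODS_PROJECT}, got {task.target_project}")
        # Rule 3: ODS data is kept permanently for backtracking and auditing
        if task.lifecycle_days is not None:
            raise ValueError("ODS tables must not have a lifecycle")
        # Rule 2: each source table is synchronized exactly once
        key = (task.source_db, task.source_table)
        if key in self._tasks:
            raise ValueError(f"{key[0]}.{key[1]} is already synchronized")
        self._tasks[key] = task


registry = SyncRegistry()
registry.register(SyncTask("trade_db", "orders", "hm_ods", None))
try:
    registry.register(SyncTask("trade_db", "orders", "hm_ods", None))
except ValueError as err:
    print(err)  # trade_db.orders is already synchronized
```

Rejecting the duplicate at registration time is cheaper than discovering two diverging copies of the same source table later.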

Data development-data processing code development

Data processing is the implementation of business logic.

It must ensure not only that the business logic is correct, but also that data output is stable and timely.

The second part is data development, which tests individual skill; basically everyone uses SQL. Our experience here can be summarized simply: data processing is the implementation of business logic, so it must guarantee not only correct business logic but also stable, timely and reasonable data output. Besides good coding support, the DataWorks data development editor offers visual views of the processing flow that help enterprises with code review and even some checks, which we find very helpful in daily use.

Data development-example of code functionality

Keep business logic encapsulated in the detail layer as far as possible, to ensure data consistency and simplify downstream use.

When the source changes, code or format conversion can keep the detail layer's structure stable and avoid passing too many changes downstream.

A good model also requires collaboration with the upstream business systems: they need a reasonable design, and their changes must be communicated in time.

Because I also write Java, I know that every kind of programming has its own paradigm; similarly, the whole data development process can be abstracted into several steps.

The first step is transcoding. What is it for? As mentioned, many business systems are designed to complete a business process and contain a lot of custom handling. Internet businesses in particular, to solve performance or filtering problems, often use JSON fields, map fields, delimiter-separated values and so on, and such content is ambiguous. In development we apply code conversion, for example turning enumerated codes into values we can actually understand: what exactly does 0 mean? What about 2, or a? There is also format conversion. Enterprise business systems are hard to standardize; take time, for instance: some store a timestamp, some store strings, some store yymm. They all represent time, but in different formats. Building a data mart requires consistent data formats, so format conversion turns non-standard formats into a standard one.
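The two kinds of transcoding can be sketched as two small functions. This Python example is hypothetical: the status code table is invented, the "yymm"-style value is represented as a six-digit YYYYMM string, and the UTC+8 business time zone is an assumption you would replace with your own convention.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical code table for an order-status column; the raw system
# stores opaque codes such as 0, 2 or "a".
ORDER_STATUS = {"0": "created", "1": "paid", "2": "shipped", "a": "cancelled"}

# Assumption: business time is UTC+8; adjust to your own convention.
BUSINESS_TZ = timezone(timedelta(hours=8))
STANDARD_FORMAT = "%Y-%m-%d %H:%M:%S"


def decode_status(code) -> str:
    """Code conversion: map a raw enum code to a readable value."""
    return ORDER_STATUS.get(str(code), "unknown")


def normalize_time(value) -> str:
    """Format conversion: epoch seconds, 'YYYY-MM-DD HH:MM:SS' or 'YYYYMM' -> one standard format."""
    if isinstance(value, (int, float)):
        dt = datetime.fromtimestamp(value, tz=BUSINESS_TZ)
    elif isinstance(value, str) and len(value) == 6 and value.isdigit():
        dt = datetime.strptime(value, "%Y%m")           # month-only value: first day of the month
    elif isinstance(value, str):
        dt = datetime.strptime(value, STANDARD_FORMAT)  # already standard; parsing validates it
    else:
        raise ValueError(f"unsupported time value: {value!r}")
    return dt.strftime(STANDARD_FORMAT)


print(decode_status(0), decode_status("a"), decode_status(9))
print(normalize_time(1622419200), normalize_time("202105"))
```

Mapping unknown codes to an explicit "unknown" rather than passing them through keeps an unexpected new code visible to downstream quality checks.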

The second step is business judgment, which derives a business result from conditions. For example, a business system will certainly not have a field or rule called "young people", but if age data exists, we can define people under 30 as young people when modeling. This is what we call business judgment.

The third step is data connection, which is simple: associating (joining) tables to supplement the data.

The fourth step is data aggregation, which enterprises use heavily when building DWS.

The fifth step is data filtering: we often encounter invalid data and handle it by filtering it out.

The sixth step is conditional selection, essentially WHERE clauses, which is somewhat similar to data filtering.
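Steps two to six usually live in a single SQL statement (JOIN, CASE WHEN, WHERE, GROUP BY). To show what each step contributes, the sketch below spells them out in plain Python over in-memory rows. The tables, columns and the "young member sales by store" indicator are hypothetical; only the under-30 rule comes from the example above.

```python
from collections import defaultdict

YOUNG_AGE_LIMIT = 30  # from the article's example: under 30 counts as "young"


def young_sales_by_store(orders, members):
    """Sum paid order amounts from young members, grouped by store."""
    totals = defaultdict(float)
    for order in orders:
        # Data filtering: drop invalid rows (non-positive amounts, unknown members)
        if order["amount"] <= 0 or order["member_id"] not in members:
            continue
        # Conditional selection: WHERE status = 'paid'
        if order["status"] != "paid":
            continue
        # Data connection: join the member dimension to get the age
        age = members[order["member_id"]]["age"]
        # Business judgment: derive a label the source system does not store
        is_young = age < YOUNG_AGE_LIMIT
        # Data aggregation: GROUP BY store
        if is_young:
            totals[order["store"]] += order["amount"]
    return dict(totals)


members = {101: {"age": 25}, 102: {"age": 41}, 103: {"age": 19}}
orders = [
    {"order_id": 1, "member_id": 101, "store": "S1", "amount": 30.0, "status": "paid"},
    {"order_id": 2, "member_id": 102, "store": "S1", "amount": 50.0, "status": "paid"},
    {"order_id": 3, "member_id": 103, "store": "S2", "amount": 20.0, "status": "paid"},
    {"order_id": 4, "member_id": 101, "store": "S2", "amount": -5.0, "status": "paid"},       # invalid
    {"order_id": 5, "member_id": 103, "store": "S1", "amount": 40.0, "status": "cancelled"},
]
print(young_sales_by_store(orders, members))
```

In the layered model, the join and the "young" label would normally be materialized in DWD, and only the aggregation would happen in DWS.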

The last step is business parsing, which enterprises use most often, because MySQL and NoSQL stores now support JSON, and some business teams even use MongoDB, packing many business attributes into one large field. In recent years, when we build the DWD layer of a data mart, we must parse such JSON or map fields into fixed columns, because, as said, the content must be consistent so that users can read it directly. A lesson here: keep business logic encapsulated in the detail layer as far as possible, to ensure data consistency and simplify downstream use. When the source changes, code or format conversion keeps the detail layer's structure stable and avoids passing more changes downstream. A good model also requires collaborative development with the upstream business systems: first, the business systems must be reasonably designed; second, their changes must be known in time. So building a data center is not a job for the data team alone; it must work hand in hand with the business teams.
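Business parsing can be sketched as flattening a JSON extension field into a fixed set of DWD columns. The column list and the "ext" field name below are hypothetical. The design choice worth noting is that the output schema is fixed by the DWD model, not by whatever keys the source happens to send, which is how the detail layer stays stable when the source changes.

```python
import json

# Hypothetical attributes that the DWD table promotes to fixed columns
EXT_COLUMNS = ("coupon_id", "delivery_type", "channel")


def parse_ext(row: dict) -> dict:
    """Business parsing: flatten a JSON 'ext' field into fixed DWD columns."""
    ext = json.loads(row.get("ext") or "{}")
    flat = {k: v for k, v in row.items() if k != "ext"}
    for col in EXT_COLUMNS:
        # A missing key becomes None (NULL), so every row has the same columns;
        # keys outside EXT_COLUMNS are ignored, keeping the DWD schema stable
        # when the source system adds new attributes.
        flat[col] = ext.get(col)
    return flat


raw = {"order_id": 1, "ext": '{"coupon_id": "C9", "channel": "app", "gift_wrap": true}'}
print(parse_ext(raw))
```

A new source attribute such as gift_wrap is deliberately dropped until the model is updated to include it, rather than silently changing the table's shape.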

Data development-task scheduling configuration

Everything above belongs to the development phase. If DataWorks only did this, it would be just an IDE. But as a one-stop platform for big data development and governance, its core job is to keep the platform running: how do we make sure the enterprise's data development code actually runs? Through DataWorks task scheduling. A new retail business is very complex: fresh food delivered in 30 minutes, e-commerce with next-day or three-day delivery, pre-orders and so on, and a simple scheduling system may not support all of this. The strength of DataWorks is its very flexible choice of scheduling cycles, such as monthly, weekly and daily, and it can stably schedule 15 million tasks per day on Singles' Day, so it performs very well in both flexibility and stability. As designed earlier, the enterprise's new retail business is a closed loop in which every business is related; likewise, the enterprise's data tasks are interrelated, and the whole task scheduling chain becomes very complex.

We have made many attempts and innovations along the way and fallen into many pits, which I would like to share. If a DataWorks task node is not scheduled, or is scheduled at the wrong time, data can be missing or wrong. Enterprise data development teams must deal promptly with every problem in every online task, because each one causes a data problem. A reasonable scheduling strategy ensures both the correctness and the timeliness of data output. Data produced once a day should not be turned into hourly output: rather than generating it 12 times, schedule it by day. If data is produced every three days, set up a three-day schedule.
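Two of these points can be checked mechanically: tasks must run after their upstream dependencies, and a task should not run more often than the data it consumes is refreshed. The sketch below uses Python's standard graphlib module (Python 3.9+) for the dependency order. The task names, the cycle_hours field and the warning rule are hypothetical and do not reflect how DataWorks scheduling is actually configured.

```python
from graphlib import CycleError, TopologicalSorter


def plan(tasks):
    """Return (run_order, warnings) for tasks given as
    name -> {"cycle_hours": int, "upstream": [names]}."""
    graph = {name: set(spec["upstream"]) for name, spec in tasks.items()}
    try:
        # static_order() yields every task after all of its upstream tasks
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc
    warnings = []
    for name, spec in tasks.items():
        for up in spec["upstream"]:
            up_cycle = tasks[up]["cycle_hours"]
            # Running more often than the input refreshes just recomputes the same result
            if spec["cycle_hours"] < up_cycle:
                warnings.append(f"{name} runs every {spec['cycle_hours']}h "
                                f"but {up} refreshes every {up_cycle}h")
    return order, warnings


tasks = {
    "ods_trade_orders": {"cycle_hours": 24, "upstream": []},
    "dwd_trade_orders": {"cycle_hours": 24, "upstream": ["ods_trade_orders"]},
    "dws_store_sales": {"cycle_hours": 1, "upstream": ["dwd_trade_orders"]},   # hourly on daily data
    "ads_sales_report": {"cycle_hours": 72, "upstream": ["dws_store_sales"]},  # three-day schedule
}
order, warnings = plan(tasks)
print(order)
print(warnings)
```

Running this kind of check before publishing a change catches both a broken dependency chain and the "daily data scheduled hourly" waste described above.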

Data operation and maintenance-governance-data quality monitoring

The purpose of data quality monitoring is to ensure the correctness of the output of data assets.

The monitoring scope includes changes in table size and row count, field enumeration values (such as a new "takeout" type), primary key conflicts (two rows with the same SKU), invalid formats (such as email format), and so on.

Abnormal values trigger an alert or interrupt the data processing flow, giving the on-duty engineer a chance to intervene.

Normally, once a project or requirement has gone through these steps, we would consider the data development engineer's job done. In general, though, that is not the case: the data center is close to the commercial side of the business, so when it fails the impact is especially large. Just as the group has group-level core systems, department-level core systems, business-line core systems and non-core systems, each needing a different level of protection, with fault levels defined from P1 to P4, the same applies to the data business. What distinguishes data services from ordinary business systems is that the data center team relies on DataWorks to keep all online big data tasks stable. A very important DataWorks module here is data quality monitoring. It lets enterprises find problems sooner and make sure we know as soon as the business is affected (business use often lags, so the data team frequently learns of a problem only when the business comes to them). The purpose of data quality monitoring is to ensure correct data output, and its scope must be comprehensive: not limited to changes in table size and row count, field enumeration values and primary key conflicts, but also covering invalid formats. Abnormal values trigger alerts or interrupt the data processing flow, and the on-duty staff should intervene as soon as possible.
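The monitoring rules listed above can be sketched as a small rule set that returns issues and, when the rules are marked as blocking, stops the pipeline. This is a hypothetical Python illustration of the idea, not the DataWorks data quality API; the 30% row-count threshold, the column names and the simple email pattern are assumptions.

```python
import re
from collections import Counter

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple format check


def check_quality(rows, prev_row_count, allowed_types, max_change=0.3):
    """Return a list of human-readable issues for one table partition."""
    issues = []
    # Row-count fluctuation compared with the previous partition
    if prev_row_count:
        change = abs(len(rows) - prev_row_count) / prev_row_count
        if change > max_change:
            issues.append(f"row count changed by {change:.0%}")
    # Enumeration check: catches new values such as a new "takeout" type
    unexpected = {r["order_type"] for r in rows} - set(allowed_types)
    if unexpected:
        issues.append(f"unexpected order_type values: {sorted(unexpected)}")
    # Primary key check: two rows with the same SKU
    duplicates = sorted(k for k, n in Counter(r["sku"] for r in rows).items() if n > 1)
    if duplicates:
        issues.append(f"duplicate sku: {duplicates}")
    # Format check on a field with a known shape
    bad_emails = [r["sku"] for r in rows if not EMAIL_RE.match(r["email"] or "")]
    if bad_emails:
        issues.append(f"invalid email for sku: {bad_emails}")
    return issues


def quality_gate(issues, blocking):
    """Alert on any issue; a blocking rule also interrupts downstream processing."""
    for issue in issues:
        print("ALERT:", issue)
    if blocking and issues:
        raise RuntimeError("data quality check failed; downstream tasks stopped")


rows = [
    {"sku": "A1", "order_type": "delivery", "email": "a@example.com"},
    {"sku": "A1", "order_type": "takeout", "email": "b@example.com"},
    {"sku": "B2", "order_type": "pickup", "email": "not-an-email"},
]
found = check_quality(rows, prev_row_count=10, allowed_types={"delivery", "pickup"})
quality_gate(found, blocking=False)
```

Splitting rules into alert-only and blocking ones mirrors the two reactions in the text: notify the on-duty engineer, or stop bad data before it reaches downstream tables.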

Data operation and maintenance-governance-business baseline management

The purpose of the baseline is to ensure the timeliness of data asset output.

Priority determines not only the hardware resources a task is guaranteed, but also the on-duty operations support it receives.

Important tasks are placed under baseline management; core tasks get the highest priority, level 8.

So much for monitoring. But too many monitors lead to a flood of alerts. DataWorks provides another capability for this: task baseline management. As mentioned, businesses have levels, and an enterprise's data business also has important and less important tasks; we use baselines to separate them. Our experience is that baselines ensure the timely output of data assets, and their priority determines both the system's hardware resource guarantees and the on-duty operators' support. The most important business must be on a level 8 baseline, which ensures that your most important tasks are produced first. DataWorks also has a useful data backfill tool, which lets me quickly re-run and backfill data when something goes wrong with a baseline or it is breached. If you enable DataWorks intelligent monitoring, it uses the current task status and the historical run times of the tasks under a baseline to predict in advance whether the baseline is at risk of being breached. For example, suppose a dataset is normally produced at midnight, and an upstream dataset should be produced by 6 p.m. With intelligent monitoring enabled, if the upstream task still has not produced its data by 7 p.m. and the algorithm judges that the final data cannot be produced normally by midnight, intelligent monitoring raises an alert at 7 p.m. so that the engineers can intervene early, instead of waiting until midnight when the output is actually late. This kind of intelligent monitoring and risk prediction is very useful for the stability of the enterprise's business.
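The idea behind intelligent baseline monitoring can be illustrated with a much simpler estimate. The Python sketch below is not DataWorks' algorithm: it assumes the remaining tasks on the critical path run one after another, estimates each from the mean of its historical run times, and flags the baseline if the predicted finish falls within a safety margin of the committed time. Task names, run times and the 30-minute margin are hypothetical; the 7 p.m. check and the midnight deadline follow the example in the text.

```python
from datetime import datetime, timedelta
from statistics import mean


def predict_finish(now, remaining_tasks, history_minutes):
    """Estimate when the last task finishes, assuming the tasks run serially."""
    remaining = sum(mean(history_minutes[t]) for t in remaining_tasks)
    return now + timedelta(minutes=remaining)


def baseline_at_risk(now, remaining_tasks, history_minutes, deadline,
                     margin=timedelta(minutes=30)):
    """True if the predicted finish is later than the deadline minus a safety margin."""
    return predict_finish(now, remaining_tasks, history_minutes) > deadline - margin


history = {                      # historical run times in minutes (hypothetical)
    "dwd_trade_orders": [250, 270, 260],
    "dws_store_sales": [60, 70],
    "ads_sales_report": [20, 20],
}
chain = ["dwd_trade_orders", "dws_store_sales", "ads_sales_report"]
deadline = datetime(2021, 6, 1, 0, 0)   # data committed for midnight

# At 7 p.m. the upstream task that should have finished by 6 p.m. has not
# produced its data, so the whole chain is still ahead.
check_time = datetime(2021, 5, 31, 19, 0)
print(predict_finish(check_time, chain, history))             # 2021-06-01 00:45:00
print(baseline_at_risk(check_time, chain, history, deadline))  # True: alert now, not at midnight
```

Even this crude estimate moves the alert five hours earlier; a production system would also account for parallel branches, queueing for resources and run-time variance.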

Data operation and maintenance & governance-data asset governance

The main goal is to optimize storage and computing, reduce costs and improve the efficiency of resource utilization.

Technical teams own multiple projects, so governance must be done together with them.

The means include taking unused applications offline, table lifecycle management, governance of repeated computation, governance of brute-force scans, and so on.

With data quality monitoring and baselines in place, the stable operation of the enterprise's big data tasks and business is largely assured; the next step is data asset governance. Alibaba is a strongly data-driven company, and a major milestone in its transformation was the point at which its hardware cost for data storage and computation exceeded that of its business systems. This led Alibaba's CTO to make data asset governance a core task. DataWorks is the largest, and effectively the only, platform for data work across Alibaba Group, and it provides a data asset module called UDAP. UDAP lets you view overall resource usage along many dimensions, from projects down to tables and even individuals, and gives users a health score, so you can see how every person in each business unit ranks. The simplest way to govern is to tackle the head first: start with those whose health scores are lowest, raise their scores, and the overall level improves with them. UDAP also provides many data visualization tools so you can quickly see the effect of governance. I have a few lessons of my own to share as well.
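The "tackle the head first" approach can be sketched as ranking owners by a health score and starting with the lowest. The article does not say how UDAP computes its score, so the three metrics and their weights below are purely hypothetical; only the ranking idea comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OwnerStats:
    owner: str
    lifecycle_coverage: float   # share of owned tables with a lifecycle set, 0..1
    accessed_ratio: float       # share of owned tables read recently, 0..1
    clean_scan_ratio: float     # share of jobs that avoid brute-force scans, 0..1


# Hypothetical weights; not UDAP's real formula.
WEIGHTS = {"lifecycle_coverage": 0.3, "accessed_ratio": 0.4, "clean_scan_ratio": 0.3}


def health_score(s: OwnerStats) -> float:
    """Weighted average of the metrics, scaled to 0..100."""
    return round(100 * sum(w * getattr(s, k) for k, w in WEIGHTS.items()), 1)


def worst_first(stats: list[OwnerStats], n: int) -> list[tuple[str, float]]:
    """The n owners with the lowest health scores: the 'head' to govern first."""
    ranked = sorted(((s.owner, health_score(s)) for s in stats), key=lambda x: x[1])
    return ranked[:n]


stats = [
    OwnerStats("alice", 0.9, 0.8, 0.95),
    OwnerStats("bob", 0.2, 0.3, 0.5),
    OwnerStats("carol", 0.5, 0.6, 0.4),
]
print(worst_first(stats, 2))
```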

First, the main goal is to optimize storage and computation, reduce costs, and improve resource utilization. Technical teams create many project spaces on their own, so the data center team needs to work with them to carry out data governance. Useful means include taking unused applications offline, managing table lifecycles, and governing duplicate computation; most important of all, brute-force scans that waste computing resources must be strictly prohibited. Some UDAP functions are also available in the DataWorks resource optimization module, such as identifying duplicate tables and managing duplicate data development and data integration tasks.
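Brute-force scans typically show up as jobs that read most or all partitions of a large partitioned table, usually because a partition filter is missing. A minimal detection sketch follows, assuming a hypothetical job log that records partitions read versus partitions available; the 50% ratio and 30-partition floor are illustrative thresholds, not values from the article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScanRecord:
    job_id: str
    table: str
    partitions_read: int
    partitions_total: int


def flag_full_scans(records: list[ScanRecord], max_ratio: float = 0.5,
                    min_partitions: int = 30) -> list[str]:
    """Flag jobs that read more than `max_ratio` of a large partitioned table.

    Small tables (fewer than `min_partitions` partitions) are ignored.
    """
    flagged = []
    for r in records:
        if r.partitions_total < max(min_partitions, 1):
            continue
        if r.partitions_read / r.partitions_total > max_ratio:
            flagged.append(r.job_id)
    return flagged


log = [
    ScanRecord("job_1", "dwd_orders", 365, 365),   # no partition filter
    ScanRecord("job_2", "dwd_orders", 1, 365),     # reads one day only
    ScanRecord("job_3", "dim_city", 10, 12),       # small table, ignored
]
print(flag_full_scans(log))
```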

Data operations and governance: data security management

There are four layers of data security: the platform (MaxCompute) level, the project level, the table level, and the field level.

Besides studying and passing an exam on security regulations, outsourced staff also need special approval and must sign a confidentiality agreement.

Permissions are automatically revoked when an employee leaves.
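The four layers can be pictured as a chain of checks that must all pass. This is a minimal sketch under a hypothetical permission model (a platform flag, granted projects, granted tables, and a numeric clearance compared against a per-field security level); it is not the MaxCompute or DataWorks authorization API. Fields without an explicit level are assumed to be level 1.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    platform_access: bool
    projects: set[str] = field(default_factory=set)
    tables: set[str] = field(default_factory=set)   # "project.table"
    clearance: int = 1                               # hypothetical levels 1..4


# Hypothetical field security levels, keyed "project.table.column".
FIELD_LEVELS = {"crm.customers.city": 1, "crm.customers.id_card_no": 4}


def can_read(user: User, project: str, table: str, column: str) -> bool:
    """Check platform, project, table, then field level, in that order."""
    if not user.platform_access:
        return False
    if project not in user.projects:
        return False
    if f"{project}.{table}" not in user.tables:
        return False
    level = FIELD_LEVELS.get(f"{project}.{table}.{column}", 1)
    return user.clearance >= level


analyst = User("analyst", True, {"crm"}, {"crm.customers"}, clearance=2)
print(can_read(analyst, "crm", "customers", "city"),
      can_read(analyst, "crm", "customers", "id_card_no"))
```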

With all of this in place, we think the data center is essentially complete, but one more point remains: data security management. As the Internet has developed, China has enacted new Internet-related laws almost every year, such as the E-Commerce Law and the Cybersecurity Law, and complying with the law is very important for any enterprise. As the single unified entry and exit point for Alibaba's big data, DataWorks provides many data security controls. Access can be controlled at the engine level, the project level, the table level, and even the field level. At the field level, every field has a security level, and some high-level fields can only be used after the department head or even the company president approves access. Even after approval, some data remains risky, such as ID card numbers and mobile phone numbers. For these, the DataWorks Data Protection Umbrella provides data desensitization: sensitive values are masked, so users can still run statistics and analysis on them but cannot see the raw values.
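Desensitization that "does not affect statistics or analysis" can be illustrated with two common techniques: partial masking for display, and a keyed hash that maps equal inputs to equal tokens so distinct counts and joins still work. This is a minimal sketch, not the Data Protection Umbrella's implementation; the keep-3/keep-4 and keep-6/keep-4 patterns and the demo key are assumptions.

```python
from __future__ import annotations

import hashlib
import hmac


def mask_middle(value: str, keep_head: int, keep_tail: int, fill: str = "*") -> str:
    """Keep the first `keep_head` and last `keep_tail` characters, mask the rest."""
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * hidden + value[len(value) - keep_tail:]


def mask_mobile(number: str) -> str:
    return mask_middle(number, 3, 4)   # e.g. an 11-digit mainland mobile number


def mask_id_card(number: str) -> str:
    return mask_middle(number, 6, 4)   # e.g. an 18-character resident ID number


def pseudonymize(value: str, key: bytes) -> str:
    """Stable keyed token: equal inputs give equal tokens, raw value stays hidden."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


KEY = b"demo-key-keep-in-a-secret-store"   # hypothetical; never hard-code real keys
print(mask_mobile("13812345678"), mask_id_card("110101199003071234"))
```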

Alibaba Group has a unified data management mechanism linked to its organizational structure: when employees leave or transfer, their permissions are automatically revoked. Personnel changes are very frequent in any enterprise, Alibaba included, and with such functions and processes an enterprise can make better use of its data while keeping it secure.
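Linking permissions to the organizational structure means HR events drive revocation. The sketch below assumes a hypothetical event format (`type`, `user`, `from_dept`) and assumes each grant records the department that sponsored it: leaving revokes everything, while a transfer revokes only the grants tied to the old department.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    user: str
    resource: str
    department: str   # department on whose behalf the grant was approved


def apply_hr_event(grants: set[Grant], event: dict) -> set[Grant]:
    """Return the grant set after an HR event; unknown event types change nothing."""
    user = event["user"]
    if event["type"] == "leave":
        return {g for g in grants if g.user != user}
    if event["type"] == "transfer":
        old_dept = event["from_dept"]
        return {g for g in grants if not (g.user == user and g.department == old_dept)}
    return set(grants)


grants = {
    Grant("alice", "sales.orders", "sales"),
    Grant("bob", "finance.ledger", "finance"),
    Grant("bob", "sales.orders", "sales"),
}
grants = apply_hr_event(grants, {"type": "leave", "user": "alice"})
grants = apply_hr_event(grants, {"type": "transfer", "user": "bob", "from_dept": "finance"})
print(sorted((g.user, g.resource) for g in grants))
```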

Fifth, the value of building a data center based on DataWorks

How does the data center support the business

So far we have discussed building a new retail data center on DataWorks, and we said at the outset that the data center must serve the business. Now I will introduce some ways it can do so. Enterprises use data progressively, from shallow to deep. Everyone starts the same way: first we simply look at the data to see what we have, then use it to spot problems and support manual decisions. But many new retail businesses expand very fast, opening more than 100 stores a year and covering more than 200 cities nationwide. When the business changes on that scale, simple reports and data visualization can no longer support it. At this stage, enterprises can use data for fine-grained control, such as category diagnosis and inventory health checks, which tell the business what problems it has now instead of leaving people to find problems in reports.
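An inventory health check can be as simple as computing days of cover (stock on hand divided by recent average daily sales) and reporting only the items outside a healthy band, so the business sees problems rather than raw reports. This is a minimal sketch; the 2-day and 14-day thresholds and the store and SKU names are hypothetical, and real thresholds would differ by category.

```python
from __future__ import annotations

from statistics import mean


def days_of_cover(on_hand: float, daily_sales: list[float]) -> float:
    """How many days current stock lasts at the recent average daily sales rate."""
    avg = mean(daily_sales)
    return float("inf") if avg == 0 else on_hand / avg


def diagnose(on_hand: float, daily_sales: list[float],
             low: float = 2.0, high: float = 14.0) -> str:
    cover = days_of_cover(on_hand, daily_sales)
    if cover < low:
        return "stockout risk"
    if cover > high:
        return "overstock"
    return "healthy"


inventory = {
    ("store_001", "milk_1l"): (10, [10, 12, 8]),
    ("store_001", "rice_5kg"): (300, [10, 9, 11]),
    ("store_002", "eggs_12"): (50, [10, 10, 10]),
}
issues = {}
for key, (stock, sales) in inventory.items():
    status = diagnose(stock, sales)
    if status != "healthy":   # surface problems only, not every number
        issues[key] = status
print(issues)
```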

Fresh-food business, for example, differs sharply from e-commerce: it is especially sensitive to external factors. Weather, holidays, or even a traffic accident can affect it, and inventory problems lead to spoilage losses. Enterprises can therefore build many data-driven forecasting applications, such as sales forecasting. Fresh sales can be forecast at hourly granularity and re-forecast every hour, and some enterprises even build simulation systems: when the weather changes suddenly, the simulation predicts what risks this creates so that adjustments can be made. Then there are "daily fresh" goods, which must be sold on the day they arrive. Operations and sales staff already have a great deal to do every day, and with so many daily fresh products across so many stores, there is no way for people to spot problems and react efficiently. Instead, we can eliminate those hundreds of reports and move every scenario in which people read data to find problems into the business system itself. Suppose the data center detects that a daily fresh product will not sell out with only three hours left before the store closes, so a discount is needed. No human needs to be involved: the data center's forecasts and algorithms trigger the discount automatically so the product sells. Applications that combine BI and AI in this way let the data center produce real value, and enterprises can design different data application products to match the stage of data use they are currently at, so that data truly empowers the business.
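The automatic markdown for daily fresh goods can be sketched as: forecast the remaining sales until closing, and if stock exceeds the forecast inside the final window, apply a discount. The article does not specify the forecasting model; this sketch substitutes a naive per-hour historical average, and the three-hour window comes from the text while the discount tiers and sample data are hypothetical. The return value is a price multiplier, or `None` when no markdown is needed.

```python
from __future__ import annotations

from statistics import mean


def forecast_remaining(history: dict[int, list[float]], now_hour: int, close_hour: int) -> float:
    """Naive forecast: sum of historical mean sales for each remaining hour."""
    return sum(mean(history.get(h, [0.0])) for h in range(now_hour, close_hour))


def markdown_decision(stock: float, history: dict[int, list[float]], now_hour: int,
                      close_hour: int, trigger_hours: int = 3) -> float | None:
    """Return a price multiplier if stock is expected to be left over, else None."""
    if close_hour - now_hour > trigger_hours:
        return None                    # too early to act
    expected = forecast_remaining(history, now_hour, close_hour)
    if stock <= expected:
        return None                    # forecast says it will sell out
    surplus_ratio = (stock - expected) / stock
    if surplus_ratio > 0.5:            # hypothetical discount tiers
        return 0.5
    if surplus_ratio > 0.2:
        return 0.7
    return 0.9


# Units sold per hour on recent days (hypothetical); the store closes at 22:00.
history = {19: [8, 10, 12], 20: [6, 6, 6], 21: [3, 4, 5]}
print(markdown_decision(50, history, now_hour=19, close_hour=22))
```

In practice the forecast would come from the hourly sales model described above, and the discount would be computed against margin and spoilage cost rather than fixed tiers.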
