How does Hologres perfectly support the real-time data warehouse of Double 11 intelligent customer service?

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains in detail how Hologres supports the real-time data warehouse behind Double 11 intelligent customer service.

Business background

In 2016, CCO began applying real-time data to its business, at first mainly to power the large-screen displays in the Double 11 War Room. (Note: the Double 11 War Room, also known as Guangming Ding, is Alibaba's general command room during Double 11. Its large screens carry the operational command system for the whole group during the event, forming an "operational command map" that links together the technology, products, and services of Alibaba's operational teams.)

In 2017, real-time data applications expanded substantially. They were no longer limited to major promotions and came into wide use in real-time monitoring for day-to-day customer service agent management, internal operational data products, online product data applications, and online algorithm-model scenarios. Since 2018, the number of high-guarantee real-time jobs has been close to 400. During major promotions, the large screens in the Double 11 command room also dropped quasi-real-time applications entirely and became fully real-time. To date, the number of real-time jobs has exceeded 800. In terms of job scale, the variety of engines and middleware in use, and the coverage of business scenarios, real-time data at CCO has reached a highly diversified stage.

Overall, CCO's real-time data applications have several characteristics:

High data complexity: the data covers omni-channel business scenarios and data dependencies, from users through purchase, order, payment, and after-sales refund.

Large data volume: from Mobile Taobao logs (peak of 10 million/s) to transactions (peak of several million/s) to consultations (peak of hundreds of thousands/s).

Rich application scenarios: real-time monitoring screens, real-time interactive analysis in data products, and To B and To C online applications.

As scenarios multiply, data volumes surge, and the business side's query requirements keep changing, the system must respond quickly to business needs with highly reliable, low-latency data services while continuing to run smoothly. The technical system behind it is constantly challenged.

The Evolution of Real-time Technology Architecture

The evolution of CCO's real-time architecture can be divided into three stages: the database stage, the traditional data warehouse stage, and the real-time data warehouse stage.

1) Database stage

The first stage is the database stage. It used a typical Lambda architecture: data collection -> processing -> service. Pipelines were built as silos ("chimneys"), one per business scenario. The data architecture had no layering, and each application scenario was supported by its own tasks. All data was preprocessed and stored in OLTP and KV engines, and served externally through Point Query.

In the data processing layer, Flink performed multi-stream joins and dimension-table lookups against HBase, preprocessed the streaming data to the required granularity, and persisted it to the database that served the application.
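The preprocessing pattern described here (enrich each event with a dimension-table lookup, then aggregate to the granularity the serving store expects) can be sketched in plain Python. This is only an illustration and not Flink or HBase code: the `AGENT_DIM` dictionary stands in for the dimension table, and the field names `agent_id`, `team`, and `session` are hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical dimension table: agent_id -> attributes (stands in for an HBase lookup).
AGENT_DIM: Dict[str, Dict[str, str]] = {
    "a1": {"team": "refunds"},
    "a2": {"team": "logistics"},
}


def enrich(events: Iterable[dict], dim: Dict[str, Dict[str, str]]) -> Iterator[dict]:
    """Attach dimension attributes to each event (left-join semantics)."""
    for event in events:
        attrs = dim.get(event["agent_id"], {})
        # Events without a matching dimension row are kept and labeled "unknown".
        yield {**event, "team": attrs.get("team", "unknown")}


def aggregate_by_team(rows: Iterable[dict]) -> Dict[str, int]:
    """Pre-aggregate enriched events to one counter per team."""
    counts: Dict[str, int] = {}
    for row in rows:
        counts[row["team"]] = counts.get(row["team"], 0) + 1
    return counts


events = [
    {"agent_id": "a1", "session": 1},
    {"agent_id": "a2", "session": 2},
    {"agent_id": "a1", "session": 3},
    {"agent_id": "zz", "session": 4},  # no dimension row for this agent
]
result = aggregate_by_team(enrich(events, AGENT_DIM))
```

Because the result is already at serving granularity, the application only needs a key lookup, which is why this style gives high QPS and low RT but has to be rebuilt for every new scenario.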

When there were few scenarios and few tasks, this end-to-end approach was flexible and cost-effective, and it delivered high QPS and low RT. As business scenarios grew more complex, however, operations and development costs kept rising. An approach in which everything is preprocessed and every developer builds end to end from the source could no longer keep up with business change.

2) Traditional data warehouse stage

As real-time data applications came online at scale and the pain points of the database stage became obvious, the architecture moved on to the traditional data warehouse stage. Its optimizations were as follows:

Introduction of an OLAP engine: detail data of small volume, light summaries, and similar data are stored in AnalyticDB, which supports OLAP queries at higher QPS.

Data model and task processing layers: in the DWD layer, data from different sources is integrated by topic and written to Lindorm. An Hlog subscription then triggers a streaming task to look the fact table back up, align the wide-table fields, and output the result to TT as DWD middle-tier storage. A reusable DWS layer models commonly used dimensions and metrics by topic for downstream reuse, which reduces siloed development.

Introducing the layered data warehouse architecture and MPP technology improved the flexibility of the real-time data architecture and the reusability of data. As data volume and scale grew, however, we found that the number of tasks ballooned and engine costs rose year after year. Because storage and serving were so diverse, the data in the warehouse we had built did not truly flow between layers, and a large amount of siloed business logic and data was inevitably duplicated across tasks and engines.

To meet business requirements for different stability SLA levels, we split the KV and OLAP engines into separate instances and isolated resources by business guarantee level. To keep major promotions safe, we had to process the same data repeatedly and write it to different instances and engines, which made the data highly redundant. Running multiple systems also brought high operations and development costs.

3) Real-time data warehouse stage

In the traditional data warehouse stage, as the number of tasks kept growing, data developers had to maintain many tasks and jobs, while business applications demanded real-time data ever more strongly. A series of data development problems gradually emerged: how to improve development efficiency, how to improve data reusability, and how to reduce data costs. This forced us to keep optimizing and evolving the architecture, which led to the third stage, the real-time data warehouse stage.

First, let's analyze the main difficulties in evolving a traditional data warehouse into a real-time data warehouse:

Task duplication: the common practice is to split instances by business scenario, split instances by guarantee level, and route to different engines, such as KV or OLAP, by service type. Tasks therefore have to be duplicated, and there is a trade-off between duplicated construction and stability. In practice, we often choose the second or third way so that stability comes first. The reason is that if one task adds multiple sinks that write to different instances, a problem in any one instance causes back pressure or failover for the whole task and affects the stability of the other instances.

Data storage redundancy: in real scenarios we need to provide Point Query, Adhoc Query, OLAP Query, and other services, so we have to store at least two copies of the data, one in KV storage and one in MPP storage. This creates a lot of unnecessary storage, and storage costs only go up.

Metadata management: traditional KV engines are schema-free, so we cannot manage table and field metadata efficiently or in a user-friendly way.

Complex processing links: there are two typical scenarios. First, to align the fields of the DWD-layer wide table, the only option is to rely on Lindorm's KV semantics: multiple streams update the same row by the same primary key, Hlog captures each change to that key, and each change triggers a streaming task that looks the wide row back up in Lindorm and emits the entire row. Second, for data written to the MPP engine, the engine often does not support re-subscribing to and consuming written data, so an extra sink has to be added to the upstream task to write to message middleware before secondary consumption is possible. Both increase the complexity of the link.
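The first scenario above (several streams updating one wide row by primary key, a change log capturing each update, and a reverse lookup emitting the whole row) can be modeled with a toy in-memory table. This is a minimal sketch under stated assumptions: `WideTable` and `apply_and_emit` are hypothetical names, and nothing here uses the real Lindorm or Hlog APIs.

```python
from typing import Dict, List, Tuple


class WideTable:
    """Toy KV wide table keyed by primary key, with an append-only change log."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, object]] = {}
        self.change_log: List[str] = []  # primary keys, in the order they changed

    def upsert(self, pk: str, fields: Dict[str, object]) -> None:
        # Each stream writes only its own columns; other columns are preserved.
        self.rows.setdefault(pk, {}).update(fields)
        self.change_log.append(pk)


def apply_and_emit(table: WideTable, pk: str, fields: Dict[str, object]) -> Tuple[str, Dict[str, object]]:
    """Apply one partial update, then do what the extra streaming task does:
    read the newest change-log entry and look the entire row back up."""
    table.upsert(pk, fields)
    changed_pk = table.change_log[-1]
    return changed_pk, dict(table.rows[changed_pk])


table = WideTable()
# The order stream and the payment stream update the same primary key.
first = apply_and_emit(table, "o1", {"order_amount": 100})
second = apply_and_emit(table, "o1", {"paid": True})
```

Each partial update costs one write plus one extra read of the full row, which is the additional hop that makes this link complex.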

Real-time data warehouse architecture

Given these difficulties and the urgency of building a real-time data warehouse, we kept investigating which products could solve these problems. In the process we learned about Hologres, which is positioned as the integration of serving and analytics and matches our planned technical direction. After in-depth discussions with the team and thorough early-stage product testing, we chose Hologres as the main carrier of the real-time data warehouse. Why Hologres? Which of its capabilities can be put to work in CCO scenarios?

Row and column storage with HSAP hybrid serving: existing Point Query scenarios can use row storage, and typical OLAP scenarios can use column storage.

High-throughput real-time writes: in our tests, row-store writes met our business's throughput requirement of tens of millions/s. In column-store OLAP scenarios, Hologres easily handled our requirement of hundreds of thousands/s of highly aggregated data writes.

Binlog subscription and dimension-table lookup on row storage: once data is written to a Hologres row-store table, Flink can easily consume it through Binlog subscription and the Hologres connector. In Flink task development, common-layer detail data is selectively processed again and written back to Hologres for different application scenarios. To some extent this balances computing load between the Hologres engine and the Blink engine and addresses the high-QPS problem.

Cloud native: Hologres supports elastic scaling and high scalability. This year it let us expand resources to several times the usual level within a few minutes, so we could easily meet both daily and promotion-period elastic business needs.
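Why row storage suits Point Query while column storage suits OLAP can be shown with two in-memory layouts of the same data. This is a conceptual sketch only: it says nothing about how Hologres stores data internally, and the table contents and function names are made up for illustration.

```python
from typing import Dict, List

records: List[dict] = [
    {"order_id": 1, "agent": "a1", "amount": 30},
    {"order_id": 2, "agent": "a2", "amount": 50},
    {"order_id": 3, "agent": "a1", "amount": 20},
]

# Row layout: one complete record per primary key, so a key lookup returns
# every field of the record in a single access.
row_store: Dict[int, dict] = {r["order_id"]: r for r in records}

# Column layout: one list per column, so an aggregate reads only the
# columns it needs and skips the rest.
column_store: Dict[str, List[object]] = {
    name: [r[name] for r in records] for name in records[0]
}


def point_query(order_id: int) -> dict:
    """Serving-style lookup by primary key against the row layout."""
    return row_store[order_id]


def amount_by_agent() -> Dict[str, int]:
    """Analytical aggregate that touches only the 'agent' and 'amount' columns."""
    totals: Dict[str, int] = {}
    for agent, amount in zip(column_store["agent"], column_store["amount"]):
        totals[agent] = totals.get(agent, 0) + amount
    return totals
```

Calling `point_query(2)` returns the whole order record, while `amount_by_agent()` never reads the `order_id` column at all.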

The following is the current CCO real-time data warehouse architecture built on Hologres:

Unified storage: tables that need Point Query use row storage in Hologres and hold the common detail layer and the common light summary layer. Tables that need OLAP queries use column storage and hold application-layer details and application-layer summaries.

Simplified real-time link: common-layer data is stored in Hologres row-store tables, and the application layer consumes it a second time through Binlog subscription. This replaces the old link, which subscribed to Lindorm logs and then used an additional task to fetch the whole row.

Unified service: Point Query is routed to row-store tables, and OLAP Query is routed to column-store tables.

Stream and batch integration: small dimension tables are no longer accelerated by importing them from heterogeneous sources. Instead, an external table is created directly in Hologres, and online OLAP scenarios are served through federated queries (joins) between external and internal tables.

Business value

From first contact with Hologres to its actual adoption in specific CCO scenarios, including the Double 11 Guangming Ding command screen and daily operations, Hologres has brought significant business value, mainly as follows:

1) Real-time architecture upgrade

Real-time data closed-loop flow

To date, 60% of real-time jobs run on the new real-time data warehouse architecture, and maintenance of common-layer detail data has switched entirely to consumption through Hologres Binlog subscription. This year, to keep the system stable, we still serve some core-business Point Query traffic from Lindorm and keep the two engines synchronized in real time with Flink tasks that consume the Binlog. In stress testing, the Hologres connector currently supports writes to and reads from a single table at more than 10 million/s, which exceeds our business needs.

Peak shaving during major promotions to reduce cost

In this year's promotion, transaction writes to the row-store table peaked at several million per second. We merged writes using the Hologres server's ability to merge updates within the same batch together with the Hologres connector's batching capability. This shaved the peak by 30% and reduced server cost.
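The idea behind batching and same-batch merging can be sketched as follows: updates are accumulated into batches, and within a batch all updates to the same primary key collapse into a single write. This is a generic illustration, not the Hologres connector's implementation. The function names, the batch size of 4, and the sample updates are assumptions, and the reduction it shows is unrelated to the 30% figure above.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Update = Tuple[str, Dict[str, object]]  # (primary key, changed fields)


def batched(updates: Iterable[Update], batch_size: int) -> Iterator[List[Update]]:
    """Accumulate incoming updates into fixed-size batches."""
    batch: List[Update] = []
    for update in updates:
        batch.append(update)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def merge_batch(batch: List[Update]) -> Dict[str, Dict[str, object]]:
    """Collapse a batch to one row per primary key; later fields overwrite earlier ones."""
    merged: Dict[str, Dict[str, object]] = {}
    for pk, fields in batch:
        merged.setdefault(pk, {}).update(fields)
    return merged


updates: List[Update] = [
    ("o1", {"status": "created"}),
    ("o2", {"status": "created"}),
    ("o1", {"status": "paid"}),
    ("o1", {"status": "shipped"}),
    ("o3", {"status": "created"}),
    ("o2", {"status": "paid"}),
]
merged_batches = [merge_batch(b) for b in batched(updates, batch_size=4)]
physical_writes = sum(len(m) for m in merged_batches)
```

Here six logical updates become four physical writes. The saving grows with how often the same key is updated within one batch, which is typical of order-status changes at peak.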

2) Fast response for self-service analysis

FBI+Vshow+Hologres self-service real-time large screens

We synchronize the existing common-layer detail data to Hologres column-store tables in real time. Business staff configure their own large screens in FBI, which gives the business self-service analysis of real-time data and closes the gap, encountered every year, between business demand and data development resources.

Flexible indexing mechanism

Indexes can be tailored to each scenario. Through the distribution key, clustering key, and segment key, tables can be tuned for multi-dimensional analysis such as sorting, retrieval, and aggregation, which greatly improves query performance.
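As an illustration of how these keys are declared, the helper below builds DDL text that sets them through Hologres's `set_table_property` procedure. Treat it as a sketch to check against the current Hologres documentation: the table `dws_agent_service`, its columns, and the key choices are hypothetical, and the helper only produces a string without connecting to any database.

```python
from typing import List, Optional, Tuple


def hologres_table_ddl(
    table: str,
    columns: List[Tuple[str, str]],
    primary_key: List[str],
    orientation: str,
    distribution_key: Optional[str] = None,
    clustering_key: Optional[str] = None,
    segment_key: Optional[str] = None,
) -> str:
    """Build CREATE TABLE plus set_table_property calls inside one transaction."""
    column_lines = [f"    {name} {sql_type}" for name, sql_type in columns]
    column_lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    statements = [
        "BEGIN;",
        f"CREATE TABLE {table} (\n" + ",\n".join(column_lines) + "\n);",
    ]
    properties = {
        "orientation": orientation,            # 'row' or 'column'
        "distribution_key": distribution_key,  # decides which shard a row lands on
        "clustering_key": clustering_key,      # sort order inside files
        "segment_key": segment_key,            # file-level pruning, usually a time column
    }
    for prop, value in properties.items():
        if value is not None:  # only emit the properties that were requested
            statements.append(f"CALL set_table_property('{table}', '{prop}', '{value}');")
    statements.append("COMMIT;")
    return "\n".join(statements)


# Hypothetical column-store summary table for OLAP queries.
olap_ddl = hologres_table_ddl(
    table="dws_agent_service",
    columns=[("agent_id", "TEXT"), ("stat_time", "TIMESTAMPTZ"), ("sessions", "BIGINT")],
    primary_key=["agent_id", "stat_time"],
    orientation="column",
    distribution_key="agent_id",
    clustering_key="agent_id",
    segment_key="stat_time",
)
```

Printing `olap_ddl` gives a script that can be reviewed before it is run. In this sketch the distribution key is a subset of the primary key and the segment key is the time column, a common pairing for time-filtered aggregates.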

Table group and shard count optimization

Tables that need to be joined are placed in the same table group according to the business scenario. Local joins then reduce shuffle, which can greatly improve OLAP query response time. Creating a sentinel table makes it easy for developers to add, modify, and delete tables directly. In practice, placing a table in a table group with as small a shard count as possible greatly reduces the startup overhead of each SQL statement and shortens response time. In our own tuning, one aggregation over customer service agents went from seconds to milliseconds.
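The reason co-located tables can be joined without shuffle is that rows with the same distribution key hash to the same shard in both tables, so each shard can join its own data independently. The sketch below demonstrates this with a stand-in hash (`zlib.crc32`). The real Hologres hashing is internal, and the table contents and function names are hypothetical.

```python
import zlib
from typing import Dict, List


def shard_of(key: str, shard_count: int) -> int:
    """Stand-in hash placement; both tables must use the same function and shard count."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def place(rows: List[dict], key: str, shard_count: int) -> List[List[dict]]:
    """Distribute rows to shards by their distribution key."""
    shards: List[List[dict]] = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_of(row[key], shard_count)].append(row)
    return shards


def local_join(left: List[List[dict]], right: List[List[dict]], key: str) -> List[dict]:
    """Join shard i of the left table only with shard i of the right table."""
    joined: List[dict] = []
    for left_shard, right_shard in zip(left, right):
        index: Dict[str, List[dict]] = {}
        for row in right_shard:
            index.setdefault(row[key], []).append(row)
        for row in left_shard:
            for match in index.get(row[key], []):
                joined.append({**row, **match})
    return joined


sessions = [{"agent_id": f"a{i % 5}", "session": i} for i in range(20)]
agents = [{"agent_id": f"a{i}", "team": f"team{i % 2}"} for i in range(5)]

SHARD_COUNT = 4  # same shard count for both tables, as in one table group
joined = local_join(
    place(sessions, "agent_id", SHARD_COUNT),
    place(agents, "agent_id", SHARD_COUNT),
    "agent_id",
)
```

No row ever crosses a shard boundary, yet every session finds its agent. If the two tables used different shard counts or keys, matching rows could land on different shards and the data would have to be redistributed first.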

3) Systematization of service resources

More than a thousand large screens for on-site management of service resources support reasonable staffing, forecasting and shift scheduling, and real-time monitoring and early warning. They help dozens of SP service providers, a number of government and enterprise customers, and dozens of schools and enterprises greatly improve their scheduling capacity, so that tens of thousands of customer service agents can respond quickly to requests from merchants and consumers.

4) Intelligent experience engine

Built on CCO business data, consumers' omni-channel chat data, and behavioral operation data, the experience engine focuses on the reverse full-link transaction scenario. It combines the buyer and seller sides and crosses structured with unstructured data to gain deep insight into root causes and resolve problems quickly. In the past, finding a problem took a great deal of manpower, effort, and resources before it could be solved. Now the experience engine locates problems quickly, leaving more time and energy for solving them, often within a few minutes, which improves the user experience across the whole process.

5) Overall cost savings of nearly 30%

Cost is also an important consideration for the business, especially as data volume keeps growing. After switching to Hologres, overall cost savings across the Double 11 period are estimated at several million, about 30% less than before. CCO is still in the migration transition. To keep the system stable, some businesses have not yet been fully switched over, and full migration will be pushed forward later. More resources are expected to be saved once the migration is complete.
