Data Management and Analysis of Massive E-commerce Order Units Based on TableStore


2025-04-21 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article introduces data management and analysis of massive e-commerce order data based on TableStore, and looks at how to handle the difficulties that commonly come up in real-world order systems.

I. Background

Order systems exist in many industries: e-commerce orders, bank transaction records, telecom operators' phone bills, and so on. They are a broad, general class of system, and a classic set of practices has formed around them over the past decade. However, with the growth of the Internet and enterprises' increasing attention to data, more and more order data needs to be stored and persisted. The growing importance and scale of the data bring new challenges, and whether the original systems can continue to meet demand has become the central question.

Demand scenario

E-commerce platform A needs to persist all the order data generated on the platform. Based on the full set of order data, the system also needs to provide diversified query services for several roles: consumers, merchants and the platform itself. Consumers can query their own historical orders, merchants can find their best-selling products, and the platform can analyze user behavior, transaction scale and so on. The main query patterns are multi-dimensional retrieval of orders and statistical analysis of order data, for example:

For consumers: [consumer A] * [last 1 year] * [product name contains 'computer'] order query

For merchants: [store B] * [last 1 month] * [per product] sales volume ranking
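The two example queries above can be sketched against a single relational table. This is only an illustration using Python's built-in sqlite3 module; the table layout, column names and sample rows are assumptions made for the example and do not come from any real order system.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id     TEXT PRIMARY KEY,
        consumer_id  TEXT,
        store_id     TEXT,
        product_name TEXT,
        quantity     INTEGER,
        order_date   TEXT  -- ISO date, so string comparison matches date order
    )
    """
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("o1", "consumer_a", "store_b", "laptop computer", 1, "2024-03-10"),
        ("o2", "consumer_a", "store_b", "phone case", 2, "2024-03-12"),
        ("o3", "consumer_c", "store_b", "phone case", 3, "2024-03-20"),
        ("o4", "consumer_a", "store_d", "desktop computer", 1, "2022-01-05"),
        ("o5", "consumer_c", "store_b", "laptop computer", 1, "2024-01-15"),
    ],
)


def consumer_orders(consumer_id, since, keyword):
    """Orders of one consumer since a date whose product name contains keyword."""
    rows = conn.execute(
        "SELECT order_id FROM orders "
        "WHERE consumer_id = ? AND order_date >= ? AND product_name LIKE ? "
        "ORDER BY order_date",
        (consumer_id, since, f"%{keyword}%"),
    )
    return [r[0] for r in rows]


def store_sales_ranking(store_id, since):
    """Per-product sales volume of one store since a date, best sellers first."""
    rows = conn.execute(
        "SELECT product_name, SUM(quantity) AS sold FROM orders "
        "WHERE store_id = ? AND order_date >= ? "
        "GROUP BY product_name ORDER BY sold DESC, product_name",
        (store_id, since),
    )
    return rows.fetchall()
```

The consumer query combines an equality filter, a range filter and a substring match; the merchant query adds grouping and sorting. Both patterns are easy to express in SQL on a single node, and they are the capabilities any replacement storage scheme has to preserve.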

Technical points

The technical points that usually need to be considered in order scenarios are:

Query ability: rich query types are required, such as multi-dimensional, range and fuzzy queries, as well as sorting, statistics and similar functions.

Data volume: massive data must be stored while meeting requirements such as strong consistency, high availability and low cost.

Service performance: the system must cope with high concurrency while keeping latency low.

II. Evolution of the solution

E-commerce platforms usually handle the order scenario with the traditional MySQL solution. With the powerful query ability of a relational database, users can run multi-dimensional queries and statistics on order data directly through SQL statements. Order data expands in two directions: horizontally, as new field dimensions are introduced iteration after iteration, and vertically, as the total amount of stored data grows. Faced with both kinds of expansion, a MySQL-only solution becomes more and more strained. The combined SQL + NoSQL scheme (hereinafter: the combination scheme) emerged in response, using the respective strengths of the two kinds of database to meet the needs of different scenarios. However, the combination scheme brings new problems of its own: it costs extra storage space, increases development workload and operation and maintenance complexity, and incurs additional overhead to keep the data consistent.

Let's take a look at the following common schemes:

Common schemes

1. MySQL sub-database and sub-table (sharding) scheme

MySQL itself has powerful query and analysis functions, and an order system built on MySQL can handle multi-dimensional query and statistics scenarios for order data. As order data grows, users adopt the sub-database and sub-table scheme, a pseudo-distributed approach, to solve the problems caused by data expansion. However, once the data reaches the capacity bottleneck, a larger set of sub-databases has to be created and all the data migrated, and this trouble keeps recurring. MySQL struggles to overcome the problems caused by data iteration and expansion, and the shortcomings of a traditional order scheme that relies only on MySQL become prominent:

1. Vertical expansion of data (data scale): with the sub-database and sub-table scheme, the number and size of the sub-databases must be estimated at deployment time. Once the amount of data reaches the upper limit, the system has to be redeployed and all the data migrated.

2. Horizontal expansion of data (field dimensions): the schema must be defined in advance, and adding new fields iteratively is complex. Once the number of dimensions reaches a certain level, database performance suffers.
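The migration cost behind the first point can be illustrated with a small routing sketch. It assumes the simplest common routing rule, hashing the order ID modulo the number of shards; real deployments may route differently, and the function names here are hypothetical.

```python
import hashlib


def shard_for(order_id: str, shard_count: int) -> int:
    """Route an order to a shard by hashing its ID (modulo routing)."""
    digest = hashlib.md5(order_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def moved_fraction(order_ids, old_count: int, new_count: int) -> float:
    """Fraction of orders whose shard changes when the shard count changes."""
    moved = sum(
        1 for oid in order_ids
        if shard_for(oid, old_count) != shard_for(oid, new_count)
    )
    return moved / len(order_ids)


# Synthetic order IDs, for illustration only.
order_ids = [f"order-{i}" for i in range(10_000)]
fraction = moved_fraction(order_ids, old_count=4, new_count=5)
```

Under modulo routing, growing from 4 to 5 shards is expected to send roughly four fifths of the orders to a different shard, which is why hitting the capacity limit tends to mean a near-full data migration rather than an incremental one.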

2. MySQL+HBase scheme

The dual-store scheme emerged in response. By storing real-time data and historical data separately, it solves the data expansion problem to a certain extent. The scheme splits the data into two parts, real-time data and historical data, and a data synchronization service moves expired data into the historical store.

1. Real-time order data (for example, orders from the last 3 months): real-time orders are stored in a MySQL database. Because the total volume of real-time orders grows only within a bounded window, the multi-dimensional query and analysis capability for real-time data is preserved.

2. Historical order data (for example, orders older than 3 months): historical orders are stored in HBase. As a distributed NoSQL database, HBase copes effectively with the growth of order data and also guarantees the persistence of historical orders.

However, this scheme sacrifices the value of historical order data to users, merchants and the platform, because it assumes that historical data is queried very rarely. Once such a query is needed, it requires a full table scan, which is slow and very expensive in I/O. In addition, maintaining data synchronization brings its own problems, such as data consistency and sharply higher operation and maintenance costs for the synchronization service.
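The real-time/historical split and its two weak points can be sketched with two in-memory dictionaries standing in for MySQL and HBase. This is a simplified sketch, not how either database is actually accessed; the 90-day window, the record layout and the function names are assumptions for illustration.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=90)  # assumed stand-in for the "3 months" cut-off


def archive_expired(hot: dict, cold: dict, today: date) -> int:
    """Move orders older than the retention window from hot to cold storage."""
    cutoff = today - RETENTION
    expired = [oid for oid, order in hot.items() if order["date"] < cutoff]
    for oid in expired:
        cold[oid] = hot[oid]  # write to the historical store first
        del hot[oid]          # then delete from the real-time store
    return len(expired)


def find_by_consumer(cold: dict, consumer_id: str) -> list:
    """Non-key lookup on the historical store: every row has to be scanned."""
    return sorted(oid for oid, o in cold.items() if o["consumer"] == consumer_id)


# Hypothetical sample data.
hot_store = {
    "o1": {"consumer": "a", "date": date(2024, 1, 1)},
    "o2": {"consumer": "b", "date": date(2024, 5, 20)},
}
cold_store = {}
archive_expired(hot_store, cold_store, today=date(2024, 6, 1))
```

The copy-then-delete step is not atomic across two systems: a failure between the two statements leaves the order in both stores, which is the kind of consistency problem the synchronization service has to handle. The lookup by consumer also shows why queries on anything other than the key are costly in the historical store.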

3. MySQL+Elasticsearch scheme

Another combination scheme is MySQL+Elasticsearch. It also stores the data in two parts, which solves the problem of growing index dimensions to some extent. Users maintain their own data synchronization service to keep the two parts consistent.

1. Full data: the full set of order data is stored in the MySQL database, keyed by order ID with the rest of the order stored as fields. The full data serves as the persistent store and is also used to look up non-indexed fields by order ID.

2. Query data: only the fields that need to be searchable are stored in Elasticsearch (a distributed search engine built on Lucene). Elasticsearch's indexing ability provides order retrieval that can cope with dimension expansion; when the complete order is needed, MySQL is then queried by order ID.

This scheme copes with the problems caused by expanding data dimensions, but as the number of orders keeps growing, MySQL's poor scalability is exposed again. In addition, developing, operating and maintaining the data synchronization to Elasticsearch is very costly, so this choice has drawbacks of its own.

Capability analysis (MySQL, HBase, Elasticsearch, TableStore)

Storage mode: row storage; column storage; + index storage

Scalability: stand-alone, poor scalability; horizontal expansion; (automatic) horizontal expansion

Consistency: strong consistency; timing consistency; strong consistency and temporal consistency

Supported data volume: MySQL ~1 TB, ~100 million rows; HBase ~10 PB, ~1 trillion rows; Elasticsearch ~1 PB, ~100 billion rows; TableStore ~10 PB, ~1 trillion rows

TableStore scheme

The search index (SearchIndex) scheme developed by Table Storage (TableStore) addresses the problems described above. TableStore is ready to use out of the box and billed on a pay-as-you-go basis, and a search index can be created at any time, which makes it a high-quality solution for managing massive e-commerce order data.

As a fully managed, distributed NoSQL data storage service provided by Alibaba Cloud (Aliyun), TableStore offers [massive data storage], [automatic sharding of hot-spot data], [multi-dimensional retrieval over massive data] and similar functions, which directly address the challenge of exploding order data.

At the same time, the SearchIndex function preserves the high availability of user data while providing multi-dimensional search, statistics and other capabilities. Different indexes can be created for different scenarios to support different retrieval patterns, and users only need to create and enable an index when they actually need it. TableStore itself ensures the consistency of data synchronization between the table and its indexes, which greatly reduces the user's workload in scheme design, service operation and maintenance, and code development.
