Order center: an architecture for 100 million records

Updated 2025-04-16. Source: SLTechnology News&Howtos.

The order center is a typical "multi-key" service in Internet businesses: user ID, merchant ID, order ID, and other keys all have business query requirements.

As data volume and concurrency grow, how should the architecture of a "multi-key" service such as the order center be designed, and which factors need to be considered? This article works through those questions systematically.

What is "multi-key" business?

A "multi-key" business is one in which several attributes of a single piece of metadata each have foreground (online) query requirements.

What does the order center do, and what are its typical business requirements?

The order center is a very common "multi-key" business that mainly provides order query and modification services. Its core metadata is:

Order (oid, buyer_uid, seller_uid, time, money, detail …)

Where:

oid is the order ID and the primary key

buyer_uid is the buyer's uid

seller_uid is the seller's uid

time, money, detail, and so on are order attributes

In database design, a business that is just starting out can generally meet its metadata storage and query requirements with a single database plus indexes on the queried fields.

order-center: the order center service, which provides a friendly RPC interface to callers

order-db: stores the order data, with indexes on order, buyer, seller, and so on
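A minimal sketch of this single-database stage, using Python's built-in sqlite3 module as a stand-in for the real database. The column names follow the Order tuple above; the table name, index names, and helper functions are assumptions made for illustration only.

```python
import sqlite3


def create_order_db() -> sqlite3.Connection:
    """Create a hypothetical order-db: one table, indexed on each query key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            oid        INTEGER PRIMARY KEY,
            buyer_uid  INTEGER NOT NULL,
            seller_uid INTEGER NOT NULL,
            time       INTEGER NOT NULL,
            money      INTEGER NOT NULL,
            detail     TEXT
        );
        CREATE INDEX idx_orders_buyer  ON orders (buyer_uid, time);
        CREATE INDEX idx_orders_seller ON orders (seller_uid, time);
        """
    )
    return conn


def list_buyer_orders(conn, buyer_uid, page, page_size=10):
    """Paged order list for one buyer, newest first."""
    rows = conn.execute(
        "SELECT oid FROM orders WHERE buyer_uid = ? "
        "ORDER BY time DESC LIMIT ? OFFSET ?",
        (buyer_uid, page_size, page * page_size),
    )
    return [oid for (oid,) in rows]


conn = create_order_db()
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 666, 9001, 100, 2500, "book"),
        (2, 666, 9002, 200, 1200, "pen"),
        (3, 777, 9001, 300, 9900, "lamp"),
    ],
)
print(list_buyer_orders(conn, 666, page=0))  # [2, 1]
```

While everything lives in one database, all three query keys can be served by ordinary indexes, which is why the sharding question only appears later.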

As the number of orders grows, the database needs to be split horizontally. Since several keys have query requirements, which field should be used for sharding?

If you shard by oid, queries on buyer_uid and seller_uid have to traverse multiple databases

If you shard by buyer_uid or seller_uid, queries on the other attributes have to traverse multiple databases
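The trade-off can be made concrete with a small in-memory sketch. The shard count of 4, the list-based "shards", and the function names are assumptions for illustration; the point is only to count how many shards each kind of query touches when oid is the shard key.

```python
NUM_SHARDS = 4  # assumed shard count for the illustration

# Each shard is a plain list of order dicts, sharded by oid % NUM_SHARDS.
shards = [[] for _ in range(NUM_SHARDS)]


def insert_order(order):
    shards[order["oid"] % NUM_SHARDS].append(order)


def get_by_oid(oid):
    """A lookup on the shard key touches exactly one shard."""
    touched = 1
    for order in shards[oid % NUM_SHARDS]:
        if order["oid"] == oid:
            return order, touched
    return None, touched


def list_by_buyer(buyer_uid):
    """buyer_uid is not the shard key, so every shard has to be scanned."""
    touched = 0
    found = []
    for shard in shards:
        touched += 1
        found.extend(o for o in shard if o["buyer_uid"] == buyer_uid)
    return found, touched


for oid, buyer in [(1, 666), (2, 666), (3, 777), (7, 666)]:
    insert_order({"oid": oid, "buyer_uid": buyer})

print(get_by_oid(7)[1])       # 1 shard touched
print(list_by_buyer(666)[1])  # 4 shards touched
```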

In short, a one-size-fits-all solution is hard to find. Before settling on a technical solution, first sort out the query requirements.

Any architectural design detached from business requirements is irresponsible.

What are the typical business query requirements of the order center?

The first category is foreground access, with three typical requirements:

Order entity query: look up an order entity by oid. 90% of the traffic is of this type.

User order list query: page through a user's historical orders by buyer_uid. 9% of the traffic is of this type.

Merchant order list query: page through a merchant's historical orders by seller_uid. 1% of the traffic is of this type.

What are the characteristics of foreground access?

Throughput is high and the service must be highly available. Users require strong consistency when accessing their orders, while merchants' consistency requirements are relatively low and they can accept some delay.

The second category is background access, whose access patterns vary with product and operations requirements:

Queries by time, price, commodity, details, and so on

What are the characteristics of background access?

Queries on the operations side are mostly batch paging queries. Because this is an internal system, the number of requests is very low, availability requirements are modest, consistency requirements are less strict, and query latency of a second or even ten seconds is acceptable.

What kind of architecture can meet these two different sets of requirements?

Point 1: separate the foreground and background architectures.

If the foreground and background businesses share the same services and the same database, the "inefficient" "batch queries" from "a handful of requests" in the background can occasionally drive the database CPU to 100% for an instant, affecting normal users in the foreground (for example, order queries time out).

Foreground and background access have different query requirements and place different demands on the system, so the two should be decoupled by "separating the foreground and background" in the architecture.

The foreground architecture stays unchanged: site access, a layered service tier, and a horizontally sharded database.

The background business gets its own independent web/service/db, decoupled from the foreground systems. For a background business with "complex business logic", "low concurrency", "no need for high availability", and "tolerance for some delay":

The service layer can be removed, and the web layer of the operations backend can access the data layer directly through a DAO

Reverse proxies and cluster redundancy are unnecessary

Data can be synchronized asynchronously through MQ or offline, sacrificing some real-time freshness

An "external index" or "Hive" design can be used, which suits large data volumes and tolerates higher latency
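As a rough illustration of the asynchronous synchronization option, the sketch below uses Python's standard queue and threading modules as stand-ins for a real message queue and consumer. The dictionaries foreground_db and backend_db, and all function names, are hypothetical; a real deployment would use an actual MQ product and separate processes.

```python
import queue
import threading

# Stand-ins for the real systems (all hypothetical): the foreground database,
# the message queue, and the separate store used by the operations backend.
foreground_db = {}
backend_db = {}
mq = queue.Queue()


def create_order(order):
    """Foreground write path: commit locally, then publish a message."""
    foreground_db[order["oid"]] = order
    mq.put(dict(order))  # the backend copy is updated later, asynchronously


def backend_sync_worker():
    """Consume messages and apply them to the backend store."""
    while True:
        msg = mq.get()
        if msg is None:  # sentinel: stop the worker
            mq.task_done()
            break
        backend_db[msg["oid"]] = msg
        mq.task_done()


worker = threading.Thread(target=backend_sync_worker, daemon=True)
worker.start()

create_order({"oid": 1, "buyer_uid": 666, "money": 2500})
create_order({"oid": 2, "buyer_uid": 777, "money": 9900})

mq.join()     # the demo waits here; a real backend would simply lag behind
mq.put(None)  # ask the worker to exit
worker.join()
print(sorted(backend_db))  # [1, 2]
```

The foreground write never waits for the backend store, which is the sense in which some real-time freshness is traded away for isolation.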

With the background access requirements solved, how should the foreground's oid, buyer_uid, and seller_uid queries be handled when the database is split horizontally?

Point 2: multi-dimensional queries are complex; a complex system design should be broken down and solved piece by piece.

Suppose there were no seller_uid: how would you handle the query requirements on oid and buyer_uid? With queries only on oid and buyer_uid, the order center degenerates into a "1-to-many" scenario, where one buyer_uid owns many oids. For a "1-to-many" business, horizontal sharding should use the "gene method".

Point 3: the gene method is a common solution for horizontally sharding the database of a "1-to-many" business.

What is a sharding gene?

Suppose the data is sharded by buyer_uid into 16 databases and routed with buyer_uid % 16. Taking a value modulo 16 amounts to letting the last four bits of buyer_uid decide which database a row falls into. These four bits are the sharding gene.
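The equivalence between "modulo 16" and "the last four bits" can be checked directly. The function names below are illustrative only.

```python
NUM_DBS = 16   # shard count from the text
GENE_BITS = 4  # 2**4 == 16, so four bits are enough to name a shard


def shard_by_modulo(buyer_uid: int) -> int:
    return buyer_uid % NUM_DBS


def shard_by_low_bits(buyer_uid: int) -> int:
    # Keep only the lowest GENE_BITS bits of the uid.
    return buyer_uid & ((1 << GENE_BITS) - 1)


print(shard_by_modulo(666))                    # 10
print(format(shard_by_low_bits(666), "04b"))   # 1010
```

The two functions agree for every non-negative uid because 16 is a power of two; with a shard count that is not a power of two, the gene would no longer be a fixed group of bits.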

What is sharding by gene?

When an order's oid is generated, the sharding gene is appended to the end of the oid, so that all orders of the same buyer_uid carry the same gene and fall into the same shard.

For example, as in the original figure, the user with buyer_uid=666 places an order:

The row is routed by buyer_uid to decide which database it is inserted into

The sharding gene is the last four bits of buyer_uid, that is, 1010

When generating the order ID oid, a distributed ID generation algorithm first produces the leading 60 bits (the green part of the figure)

The sharding gene is placed in the last four bits of the oid (the pink part of the figure), giving the final 64-bit oid (the blue part of the figure)

This guarantees that all oids of orders placed by the same user fall into the same database and share the same last four bits, so:

The database can be located through buyer_uid

The database can also be located through oid
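The steps above can be sketched as follows. The text does not say which distributed ID algorithm produces the leading 60 bits, so next_prefix here is a placeholder backed by a local counter, purely so the example runs; the other names are also illustrative.

```python
import itertools

GENE_BITS = 4
GENE_MASK = (1 << GENE_BITS) - 1  # 0b1111
PREFIX_BITS = 60

# Placeholder for a distributed ID generator (an assumption, not the
# author's algorithm): a local counter truncated to 60 bits.
_counter = itertools.count(1)


def next_prefix() -> int:
    return next(_counter) & ((1 << PREFIX_BITS) - 1)


def gene_of(buyer_uid: int) -> int:
    """The sharding gene: the last four bits of buyer_uid."""
    return buyer_uid & GENE_MASK


def make_oid(buyer_uid: int) -> int:
    """60-bit prefix followed by the buyer's 4-bit gene: a 64-bit oid."""
    return (next_prefix() << GENE_BITS) | gene_of(buyer_uid)


def db_for_buyer(buyer_uid: int) -> int:
    return gene_of(buyer_uid)


def db_for_oid(oid: int) -> int:
    return oid & GENE_MASK


oids = [make_oid(666) for _ in range(3)]
print([db_for_oid(oid) for oid in oids], db_for_buyer(666))  # [10, 10, 10] 10
```

Because the gene travels inside the oid, a request that carries only an oid can be routed without first looking up the buyer.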

Suppose there were no oid: how would you handle the query requirements on buyer_uid and seller_uid? With queries only on buyer_uid and seller_uid, the order center degenerates into a "many-to-many" scenario. For a "many-to-many" business, horizontal sharding should use "data redundancy".

As the original figure shows:

When an order is created, it is sharded by buyer_uid, the sharding gene is embedded in the oid, and the row is written to the DB-buyer database

The data is then copied redundantly to the DB-seller database by an offline asynchronous process based on binlog + canal

The buyer database is sharded by buyer_uid and the seller database by seller_uid. The former serves queries on oid and buyer_uid, while the latter serves queries on seller_uid
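A simplified model of this flow is sketched below. The change_log list stands in for the binlog stream that canal would consume; it does not use canal's actual API, and every name in it is an assumption for illustration.

```python
NUM_DBS = 16  # assumed shard count on each side

buyer_dbs = [dict() for _ in range(NUM_DBS)]   # DB-buyer, sharded by buyer_uid
seller_dbs = [dict() for _ in range(NUM_DBS)]  # DB-seller, sharded by seller_uid
change_log = []                                # stand-in for the binlog stream


def write_order(order):
    """Online write path: only DB-buyer is written, and the change is logged."""
    buyer_dbs[order["buyer_uid"] % NUM_DBS][order["oid"]] = order
    change_log.append(dict(order))


def replay_change_log(position):
    """Offline job: copy entries logged since `position` into DB-seller."""
    for entry in change_log[position:]:
        seller_dbs[entry["seller_uid"] % NUM_DBS][entry["oid"]] = entry
    return len(change_log)  # position to resume from next time


def list_by_seller(seller_uid):
    shard = seller_dbs[seller_uid % NUM_DBS]
    return sorted(o["oid"] for o in shard.values() if o["seller_uid"] == seller_uid)


# The oids below already carry the buyer's gene in their last four bits.
write_order({"oid": 26, "buyer_uid": 666, "seller_uid": 9001})
write_order({"oid": 39, "buyer_uid": 775, "seller_uid": 9001})

before = list_by_seller(9001)    # the seller copy has not caught up yet
position = replay_change_log(0)
after = list_by_seller(9001)
print(before, after)  # [] [26, 39]
```

The gap between before and after is the delay that merchants are assumed to tolerate.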

There are several ways to make the data redundant:

Synchronous double write by the service

Asynchronous double write by the service

Offline asynchronous double write (the approach described above)
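The difference between the first two options is when the caller gets control back. The sketch below models it with in-memory dictionaries and a ThreadPoolExecutor from the standard library; the function names are hypothetical, and a real service would typically hand the second write to a message queue rather than a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

buyer_db = {}
seller_db = {}
_executor = ThreadPoolExecutor(max_workers=1)


def _write_seller_copy(order):
    seller_db[order["oid"]] = dict(order)


def write_sync(order):
    """Synchronous double write: return only after both copies are written."""
    buyer_db[order["oid"]] = order
    _write_seller_copy(order)


def write_async(order):
    """Asynchronous double write: return after the first copy is written;
    the second copy is written in the background."""
    buyer_db[order["oid"]] = order
    return _executor.submit(_write_seller_copy, order)


write_sync({"oid": 26, "buyer_uid": 666, "seller_uid": 9001})
future = write_async({"oid": 42, "buyer_uid": 666, "seller_uid": 9002})
future.result()  # the demo waits; a real caller would not
_executor.shutdown(wait=True)
print(sorted(buyer_db), sorted(seller_db))  # [26, 42] [26, 42]
```

Synchronous double write lengthens every request by the second write, while the asynchronous variant keeps latency low at the cost of a window in which the two copies differ.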

Point 4: data redundancy is a common solution for horizontally sharding the database of a "many-to-many" business.

Whichever method is used, the two write steps cannot be made atomic, so data inconsistency is always possible. High-throughput distributed transactions remain an unsolved problem in the industry. The architectural direction is therefore eventual consistency: rather than guaranteeing that the data is consistent at every moment, detect inconsistencies as early as possible and repair them.

Point 5: eventual consistency is the common practice for consistency in high-throughput Internet businesses.

There are three common ways to ensure the eventual consistency of redundant data:

Full scan of the redundant data

Incremental scan of the redundant data's logs

Real-time online detection based on messages
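The first two approaches can be sketched as repair jobs over two dictionaries. This is a toy model under stated assumptions: the primary copy is treated as the source of truth, orders are never deleted, and write_log is a hypothetical list of oids in write order rather than a real log format.

```python
def full_scan(primary, replica):
    """Compare every primary record with its copy and repair differences.
    Returns the oids that were repaired."""
    repaired = []
    for oid, order in primary.items():
        if replica.get(oid) != order:
            replica[oid] = dict(order)
            repaired.append(oid)
    return sorted(repaired)


def incremental_scan(primary, replica, write_log, position):
    """Check only the oids written since `position` in the log.
    Returns the repaired oids and the new log position."""
    repaired = set()
    for oid in write_log[position:]:
        order = primary.get(oid)
        if order is not None and replica.get(oid) != order:
            replica[oid] = dict(order)
            repaired.add(oid)
    return sorted(repaired), len(write_log)


primary = {
    1: {"oid": 1, "money": 100},
    2: {"oid": 2, "money": 200},
    3: {"oid": 3, "money": 300},
}
replica = {
    1: {"oid": 1, "money": 100},
    3: {"oid": 3, "money": 999},  # stale copy
}                                  # oid 2 is missing entirely
write_log = [1, 2, 3]

recent_fixes, position = incremental_scan(primary, replica, write_log, 2)
all_fixes = full_scan(primary, replica)
print(recent_fixes, all_fixes)  # [3] [2]
```

The incremental scan is cheap but only sees what the log window covers; the full scan is expensive but catches everything, which is why the two are usually seen as complementary.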

What if oid, buyer_uid, and seller_uid all have query requirements at the same time?

Combine the solutions above:

Without seller_uid, the "multi-key" business degenerates into a "1-to-many" business, which should be sharded with the "gene method": shard by buyer_uid and embed the sharding gene in the oid

Without oid, the "multi-key" business degenerates into a "many-to-many" business, which should be sharded with "data redundancy": shard one copy by buyer_uid and another by seller_uid, so that the redundant copies serve queries on different attributes

When oid, buyer_uid, and seller_uid all exist, combining the two schemes solves the horizontal sharding problem of a "multi-key" business
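Putting the two schemes together, a toy in-memory model might look like the sketch below. The OrderStore class, its method names, and the counter-based ID source are all assumptions for illustration; replicate stands in for whichever redundancy mechanism is chosen.

```python
import itertools

NUM_DBS = 16
GENE_BITS = 4
GENE_MASK = (1 << GENE_BITS) - 1


class OrderStore:
    """Gene method for oid and buyer_uid, data redundancy for seller_uid."""

    def __init__(self):
        self.buyer_dbs = [dict() for _ in range(NUM_DBS)]
        self.seller_dbs = [dict() for _ in range(NUM_DBS)]
        self.pending = []               # orders not yet copied to the seller side
        self._seq = itertools.count(1)  # placeholder for a distributed ID source

    def create(self, buyer_uid, seller_uid):
        oid = (next(self._seq) << GENE_BITS) | (buyer_uid & GENE_MASK)
        order = {"oid": oid, "buyer_uid": buyer_uid, "seller_uid": seller_uid}
        self.buyer_dbs[oid & GENE_MASK][oid] = order
        self.pending.append(order)
        return oid

    def replicate(self):
        """Stand-in for the asynchronous copy into the seller shards."""
        while self.pending:
            order = self.pending.pop(0)
            shard = self.seller_dbs[order["seller_uid"] % NUM_DBS]
            shard[order["oid"]] = dict(order)

    def get(self, oid):
        return self.buyer_dbs[oid & GENE_MASK].get(oid)

    def list_by_buyer(self, buyer_uid):
        shard = self.buyer_dbs[buyer_uid & GENE_MASK]
        return sorted(o for o, v in shard.items() if v["buyer_uid"] == buyer_uid)

    def list_by_seller(self, seller_uid):
        shard = self.seller_dbs[seller_uid % NUM_DBS]
        return sorted(o for o, v in shard.items() if v["seller_uid"] == seller_uid)


store = OrderStore()
a = store.create(buyer_uid=666, seller_uid=9001)
b = store.create(buyer_uid=666, seller_uid=9002)
c = store.create(buyer_uid=777, seller_uid=9001)
store.replicate()

print(store.list_by_buyer(666))    # [26, 42]
print(store.list_by_seller(9001))  # [26, 57]
```

Each of the three query types touches exactly one shard: get and list_by_buyer use the gene, and list_by_seller uses the redundant copy.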

Summary of main points

Foreground and background requirements differ, so separate the foreground and background architectures

A complex system design should be broken down and solved piece by piece

The gene method is a common solution for horizontally sharding the database of a "1-to-many" business

Data redundancy is a common solution for horizontally sharding the database of a "many-to-many" business

Eventual consistency is the common practice for consistency in high-throughput Internet businesses

[This article is an original contribution by Shen Jian of 58, a 51CTO columnist. Please contact the original author before reprinting.]
