
How to understand the domain model and data model in big data

2025-01-19 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 report


I vaguely remember the first time I designed a system. I drew a bunch of UML diagrams and then got stuck on the class diagram, which was really a domain model. For a long time I could not work out how to turn it into an implementation. If I mapped the class diagram directly onto database tables, the schema looked strange and somewhat cumbersome. But if I did not follow the class diagram, I could not see what it was for.

In retrospect, my dilemma came from confusing two important concepts: the domain model and the data model. I have recently found that this confusion is not an isolated case but a very common one.

At the small end, it leads to poorly reasoned module designs. At the large end, it can distort major technical decisions such as building a business middle platform. If the underlying logic, concepts, and theory are unclear, the systems built on top of them will have problems, sometimes very serious ones.

Few people have studied and discussed this topic in depth. I think it is worth taking the time to clarify these two concepts carefully, to help everyone make better design decisions at work.

Conceptual definitions of domain models and data models

The domain model focuses on domain knowledge: the core entities of the business domain, the key concepts in the problem domain, and the relationships between those concepts. The key test of a domain model is whether it expresses business semantics explicitly and clearly. Extensibility comes second.

The data model focuses on data storage. Every business depends on data and on CRUD operations over that data. Data modeling decisions are driven mainly by non-functional attributes such as scalability and performance. There is no need to over-emphasize how well the data model expresses business semantics.

According to Robert C. Martin in Clean Architecture, the domain model is the core and the data model is a technical detail. In practice, however, both matter.

These two models are easily confused because both emphasize entities and relationships. That is not surprising: traditional database data modeling is itself based on the entity-relationship (ER) diagram.

They do have things in common. Sometimes the domain model and the data model look alike or even converge, which is normal. More often, though, the two differ. The right approach is to distinguish them consciously and design each separately, because each is modeled for a different goal.

As shown in the figure below, the data model is responsible for data storage, and its essential concerns are scalability, flexibility, and performance. The domain model is responsible for implementing business logic. Its essential concerns are expressing business semantics explicitly and making full use of object-oriented features, so that the code can represent the business well.

In reality, however, many business system designs do not distinguish the two well. This leads to two common mistakes: treating the domain model as the data model, and treating the data model as the domain model.

Mistaking the Domain Model for the Data Model

Recently I have been working on a quotation optimization project that involves quotation rules. Different commodities are distinguished by category, brand, supplier type, and so on, and each gets its own price ranges. Those ranges are then used to decide whether a merchant's quotation is automatically approved or blocked.

For this rule, the domain model is very simple: it provides the configuration data needed for price control, as shown in the following figure:

If we design storage to follow this domain model, we naturally end up with two tables, price_rule and price_range: one stores price rules and the other stores price ranges.

Designing the data model this way is the mistake of treating the domain model as the data model. Here a single table is more appropriate. The price ranges are stored in one JSON field of price_rule, which holds the information for multiple price ranges, as shown in the figure below.

The benefits are obvious:

First, maintaining one database table is cheaper than maintaining two.

Second, the data is more extensible. Suppose a new requirement calls for a suggested price range. With two tables, I have to add two new fields to price_range. With JSON storage, the data model can stay unchanged.
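A minimal sketch of the single-table design and its extensibility benefit follows, using Python's built-in sqlite3 module. The column names, the JSON layout (range name mapped to min/max), and the range names auto_approve and suggested are all hypothetical; the article does not give the real schema. It shows that adding the suggested range changes only the JSON payload, not the table's columns.

```python
import json
import sqlite3

# Hypothetical single-table schema: all price ranges live in one JSON text column
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_rule ("
    " id INTEGER PRIMARY KEY,"
    " category_id INTEGER NOT NULL,"
    " price_ranges TEXT NOT NULL)"  # JSON object: range name -> {"min", "max"}
)


def column_names(connection):
    # PRAGMA table_info yields one row per column; field 1 is the column name
    return [row[1] for row in connection.execute("PRAGMA table_info(price_rule)")]


columns_before = column_names(conn)

# A rule written before the new requirement existed
old_rule = {"auto_approve": {"min": 90.0, "max": 110.0}}
# The new requirement adds a suggested range: only the JSON payload grows, no ALTER TABLE
new_rule = {
    "auto_approve": {"min": 90.0, "max": 110.0},
    "suggested": {"min": 95.0, "max": 105.0},
}
conn.executemany(
    "INSERT INTO price_rule (category_id, price_ranges) VALUES (?, ?)",
    [(1001, json.dumps(old_rule)), (1002, json.dumps(new_rule))],
)

columns_after = column_names(conn)

# Readers treat the new key as optional, so old and new rows coexist in one table
suggested_by_category = {
    category_id: json.loads(payload).get("suggested")
    for category_id, payload in conn.execute(
        "SELECT category_id, price_ranges FROM price_rule ORDER BY id"
    )
}
```

The trade-off is that the database no longer enforces the shape of the ranges, so the code that reads the JSON has to validate it.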

Working directly with JSON in business code, however, is not pleasant. We should convert the JSON data into domain objects that carry business semantics. That way we keep the convenience of an extensible data model without losing the readability that the domain model's explicit business semantics gives the code.
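The conversion from stored JSON to a domain object can be sketched as follows. The class names PriceRange and PriceRule, the JSON keys, and the approve-or-block policy are hypothetical and chosen for illustration. Only to_domain knows the JSON layout; the rest of the code works with named, typed objects.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriceRange:
    low: float
    high: float

    def contains(self, price: float) -> bool:
        return self.low <= price <= self.high


@dataclass(frozen=True)
class PriceRule:
    category_id: int
    auto_approve_range: PriceRange
    suggested_range: Optional[PriceRange] = None  # newer rules may carry a suggested range

    def should_auto_approve(self, quote: float) -> bool:
        # Hypothetical policy: quotes inside the auto-approve range pass, all others are blocked
        return self.auto_approve_range.contains(quote)


def _parse_range(raw: dict) -> PriceRange:
    return PriceRange(low=float(raw["min"]), high=float(raw["max"]))


def to_domain(category_id: int, price_ranges_json: str) -> PriceRule:
    # The only place that knows the JSON layout; business code sees PriceRule only
    raw = json.loads(price_ranges_json)
    suggested = raw.get("suggested")
    return PriceRule(
        category_id=category_id,
        auto_approve_range=_parse_range(raw["auto_approve"]),
        suggested_range=_parse_range(suggested) if suggested is not None else None,
    )


rule = to_domain(1001, '{"auto_approve": {"min": 90, "max": 110}}')
print(rule.should_auto_approve(100.0), rule.should_auto_approve(120.0))  # True False
```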

Mistaking the Data Model for the Domain Model

Indeed, the data model should be as extensible as possible. Changing the database is a big undertaking: adding or removing fields and adding or dropping tables all involve a lot of work.

When it comes to classic designs for data model extensibility, Alibaba's business platform has to be mentioned. Its four core tables, for commodities, orders, payments, and logistics, benefit from a well-designed extensibility scheme. That scheme supports thousands of business scenarios across dozens of Alibaba businesses.

Take the commodity middle platform as an example. It meets every business's requirements for extensible commodity data storage with a single vertical table, auction_extend. In theory, this data model can support unlimited business expansion.
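The vertical-table idea can be sketched with Python's built-in sqlite3 module. The column names auction_id, attr_key and attr_value are hypothetical; the article does not describe the real auction_extend schema. Each extension attribute is a row, so a new kind of commodity with new attributes needs no DDL change.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Hypothetical vertical (key-value) extension table; columns are illustrative,
# not the real auction_extend schema
conn.execute(
    "CREATE TABLE auction_extend ("
    " auction_id INTEGER NOT NULL,"
    " attr_key TEXT NOT NULL,"
    " attr_value TEXT,"
    " PRIMARY KEY (auction_id, attr_key))"
)
conn.executemany(
    "INSERT INTO auction_extend VALUES (?, ?, ?)",
    [
        (1, "color", "red"),  # an apparel item
        (1, "size", "XL"),
        (2, "page_count", "432"),  # a book: different attributes, same table, no DDL change
    ],
)


def load_extensions(connection):
    # Pivot key-value rows back into one attribute map per auction (commodity)
    attrs = defaultdict(dict)
    for auction_id, key, value in connection.execute(
        "SELECT auction_id, attr_key, attr_value FROM auction_extend"
    ):
        attrs[auction_id][key] = value
    return dict(attrs)


extensions = load_extensions(conn)
```

The price of this flexibility is visible in the result: every value comes back as an untyped string under a string key.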

JSON fields and vertical tables are good solutions for scaling data storage. It is best not to treat these extension mechanisms as domain objects, however. Otherwise your code is not object-oriented programming at all but extension-field-oriented programming, which is exactly the mistake of treating the data model as the domain model. A better approach is to convert data objects into domain objects before processing them.

As shown below, this code is full of getFeature and addFeature calls. It is a typical example of mistaking the data model for the domain model.

The code shown above was written by a colleague who wrote business code on a certain platform, and he sent it to me on his last day. He said he was fed up with this kind of messy code. As a junior engineer at the bottom of the P-level ladder, though, he could not change the situation, and in the end he felt his only option was to leave.
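The contrast between feature-oriented code and domain-oriented code can be sketched in Python. The original code is not reproduced here, so everything below is hypothetical, including ItemDO, its get_feature/add_feature methods (modeled loosely on the getFeature/addFeature calls the article mentions), the feature keys "presale" and "deposit", and PresaleItem.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ItemDO:
    # Hypothetical data object: extension attributes stored as untyped string features
    item_id: int
    features: Dict[str, str] = field(default_factory=dict)

    def get_feature(self, key: str) -> Optional[str]:
        return self.features.get(key)

    def add_feature(self, key: str, value: str) -> None:
        self.features[key] = value


def needs_deposit_feature_style(item: ItemDO) -> bool:
    # Feature-oriented: the business rule hides behind magic keys and string parsing
    return item.get_feature("presale") == "1" and int(item.get_feature("deposit") or "0") > 0


@dataclass(frozen=True)
class PresaleItem:
    # Hypothetical domain object: the same facts with explicit names and types
    item_id: int
    deposit_cents: int

    def needs_deposit(self) -> bool:
        return self.deposit_cents > 0


def to_presale_item(item: ItemDO) -> Optional[PresaleItem]:
    # The conversion is the only code that knows feature keys and their encodings
    if item.get_feature("presale") != "1":
        return None
    return PresaleItem(item.item_id, int(item.get_feature("deposit") or "0"))


item_do = ItemDO(7)
item_do.add_feature("presale", "1")
item_do.add_feature("deposit", "5000")
presale = to_presale_item(item_do)
```

Both versions compute the same answer. The difference is that in the second, the magic strings are confined to one conversion function instead of spreading through every piece of business logic.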

Domain models and data models each do their own job

The sections above show the problems caused by confusing domain models and data models. The right approach is to distinguish between them and let each do its own job, so that our application systems are structured more soundly.

The domain model is domain-oriented and should be as concrete as possible. Its primary task is to express business semantics explicitly, with extensibility second. The data model is storage-oriented and should be as extensible as possible.

In implementation, we can adopt the architectural approach of COLA and use a gateway to translate between Data Objects and Entities. Besides translation, the gateway also acts as an anti-corruption layer. It removes the business code's direct dependence on underlying data (DOs, DTOs, etc.) and so improves the system's maintainability.
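The gateway layering can be sketched in Python. COLA itself is a Java framework, and this sketch only mirrors the idea described above. Every class and function name (PriceRule, PriceRuleGateway, PriceRuleDO, PriceRuleGatewayImpl, review_quote) is hypothetical, and an in-memory dict stands in for a real DAO.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


# ---- domain layer: business code depends only on the entity and the gateway interface ----
@dataclass(frozen=True)
class PriceRule:
    category_id: int
    min_price: float
    max_price: float

    def should_auto_approve(self, quote: float) -> bool:
        return self.min_price <= quote <= self.max_price


class PriceRuleGateway(ABC):
    @abstractmethod
    def find_by_category(self, category_id: int) -> Optional[PriceRule]:
        ...


# ---- infrastructure layer: knows the storage shape (data objects, JSON) ----
@dataclass
class PriceRuleDO:
    category_id: int
    price_ranges: str  # raw JSON column exactly as stored


class PriceRuleGatewayImpl(PriceRuleGateway):
    def __init__(self, rows: Dict[int, PriceRuleDO]):
        self._rows = rows  # stands in for a real DAO or ORM mapper

    def find_by_category(self, category_id: int) -> Optional[PriceRule]:
        data_object = self._rows.get(category_id)
        if data_object is None:
            return None
        # Translate DO -> Entity here, so JSON never leaks into business code
        approve = json.loads(data_object.price_ranges)["auto_approve"]
        return PriceRule(data_object.category_id, float(approve["min"]), float(approve["max"]))


# ---- application layer: sees only PriceRule and PriceRuleGateway ----
def review_quote(gateway: PriceRuleGateway, category_id: int, quote: float) -> str:
    rule = gateway.find_by_category(category_id)
    if rule is None:
        raise LookupError(f"no price rule for category {category_id}")
    return "APPROVE" if rule.should_auto_approve(quote) else "BLOCK"


gateway = PriceRuleGatewayImpl(
    {1001: PriceRuleDO(1001, '{"auto_approve": {"min": 90, "max": 110}}')}
)
```

Because review_quote depends only on the abstract gateway, the storage format can change (for example, from JSON to a vertical table) by rewriting PriceRuleGatewayImpl alone.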

Textbooks also teach us that relational database design should satisfy the third normal form (3NF). In practice, however, we often break this principle deliberately for performance and scalability. For example, we add redundant data to improve access performance, and we use metadata, vertical tables, and extension fields to make tables more extensible.

Different business scenarios place different demands on data extension. For simple configuration data such as price_rule, JSON is sufficient. For more complex cases, a vertical table like auction_extend is also a good choice.

Some readers may object: this makes the data extensible, but how do we query it? We can't always rely on joins or LIKE.

For configuration data, or any small volume of data, LIKE is perfectly acceptable. For massive data such as Alibaba's commodities and transactions, LIKE is of course out of the question. That problem can be solved fairly easily by separating reads from writes and building a search index.
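Read/write separation with a search index can be sketched in Python, with an in-memory inverted index standing in for a real search engine. The class names ItemStore and AttributeIndex and the attribute data are hypothetical. The write side stores extension attributes as-is, and every save is projected into an index keyed by (attribute, value), so queries never scan the extension data.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class AttributeIndex:
    """Hypothetical read model: inverted index from (key, value) to item ids."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def apply(self, item_id: int, old: Dict[str, str], new: Dict[str, str]) -> None:
        # Remove postings for the previous version, then add postings for the new one
        for pair in old.items():
            self._postings[pair].discard(item_id)
        for pair in new.items():
            self._postings[pair].add(item_id)

    def search(self, key: str, value: str) -> Set[int]:
        return set(self._postings.get((key, value), set()))


class ItemStore:
    """Hypothetical write model: item id -> extension attributes (like a vertical table)."""

    def __init__(self, index: AttributeIndex) -> None:
        self._rows: Dict[int, Dict[str, str]] = {}
        self._index = index

    def save(self, item_id: int, attrs: Dict[str, str]) -> None:
        old = self._rows.get(item_id, {})
        self._rows[item_id] = dict(attrs)
        # Synchronous here for brevity; in production this propagation is often asynchronous
        self._index.apply(item_id, old, attrs)


index = AttributeIndex()
store = ItemStore(index)
store.save(1, {"color": "red", "size": "XL"})
store.save(2, {"color": "blue"})
store.save(1, {"color": "blue", "size": "XL"})  # update moves item 1 from red to blue
```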

More thoughts on expansion

Finally, one more thought.

The data extensions mentioned above are limited extensions within a known domain. If I don't know what the business domain is, can I still extend the data? Yes: Salesforce's force.com does exactly that.

Its underlying data store is entirely metadata-driven. It uses a single table of 500 anonymous fields to support all of its SaaS services, and metadata describes what each field actually means. As shown in the figure below, value0 to value500 are reserved generic fields whose specific meanings are defined by metadata.
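The mapping from anonymous value slots to named fields can be sketched in Python. This does not reproduce Salesforce's actual schema or metadata format. FieldMeta, INVOICE_META, the "Invoice" object, and materialize are all hypothetical, and they only illustrate the principle of metadata assigning meaning to generic columns.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FieldMeta:
    name: str                    # business-facing field name
    slot: int                    # which generic valueN column holds it
    parse: Callable[[str], Any]  # generic columns hold text; metadata says how to read it


# Hypothetical metadata for one tenant's custom "Invoice" object
INVOICE_META: List[FieldMeta] = [
    FieldMeta("customer", 0, str),
    FieldMeta("amount", 1, float),
    FieldMeta("paid", 2, lambda text: text == "true"),
]


def materialize(row: Dict[str, str], meta: List[FieldMeta]) -> Dict[str, Any]:
    # Map anonymous slots (value0, value1, ...) to named, typed fields
    return {f.name: f.parse(row[f"value{f.slot}"]) for f in meta}


generic_row = {"value0": "ACME", "value1": "1250.50", "value2": "true", "value3": ""}
invoice = materialize(generic_row, INVOICE_META)
```

Note that the result is still a dictionary of named values, not a domain object with behavior. The gap between the two is where the mapping question comes in.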

To be honest, this implementation is a very thoughtful and bold design. It does support thousands of SaaS applications, and a company worth hundreds of billions of dollars.

Still, I don't know how Salesforce maps metadata to domain objects. Is it done through their Apex language? If there are no domain objects, how do they write business code? In any case, according to people who work as Salesforce vendors, their so-called low-code platform still involves a lot of code written in Apex, and its maintainability is mediocre.

Most of our applications, however, target a well-defined problem domain and do not need Salesforce-style "infinite" extensibility. In that case, I think domain objects are the best bridge between the data model and the business logic.
