Shulou (Shulou.com), 06/03 report; updated 2025-03-28
Background: over several years of work I have designed databases for many systems and products, in stand-alone, local area network and Internet environments. Even within the same environment, the design differs with the business and with the data volume. I recently designed the database (a MySQL service) for an Internet product. While working out the ID design for its tables, I did a lot of analysis, comparison and study, and came away with a more systematic and deeper understanding of ID design. Here I summarize what I learned from study and practice and share it with you.
Choosing the primary key (ID)
For a relational database, the first step in designing any table is to decide its primary key, i.e. its ID. By "common sense", the usual candidates are an auto-incrementing integer id, a string UUID, or some other unique business key. Does that mean any column whose values are guaranteed to be unique can serve as the primary key, or is even the better choice for it?
First of all, let's clarify a few concepts:
Logical primary key (surrogate or proxy primary key): a field that carries none of the table's business information and exists only to serve as its primary key; also called a pseudo primary key.
Business primary key (natural primary key): a field that has business meaning and is used as the table's primary key.
Take a very common example: a user table with columns such as id, user name and mobile phone number, where the user name and the mobile number are both unique because either one can be used as the login account. Here id can serve as the logical primary key, while the user name or the mobile number can serve as a business primary key. Can we simply pick any one of them, even choosing a business primary key and having no logical primary key at all?
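A minimal sketch of this user table, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the table name app_user, the columns and the sample data are hypothetical). The logical primary key id carries no business meaning, while both business keys stay protected by UNIQUE constraints:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE app_user (
        id       INTEGER PRIMARY KEY,   -- logical (surrogate) primary key, no business meaning
        username TEXT NOT NULL UNIQUE,  -- business key: can be used as the login account
        mobile   TEXT NOT NULL UNIQUE   -- business key: can also be used as the login account
    )
    """
)

def try_insert(username: str, mobile: str) -> bool:
    """Insert a user; return False if either business key is already taken."""
    try:
        conn.execute(
            "INSERT INTO app_user (username, mobile) VALUES (?, ?)", (username, mobile)
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("alice", "13800000000"))  # True: first user
print(try_insert("alice", "13900000000"))  # False: username already exists
print(try_insert("bob", "13800000000"))    # False: mobile already exists
```

Uniqueness of the login accounts does not depend on which column is chosen as the primary key. The UNIQUE constraints enforce it either way.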
Let's first look at the sharply divided opinions on logical versus business primary keys:
In favor of the logical primary key
A table guarantees the uniqueness of each record through its primary key, and the primary key should carry no business meaning, because any column with business meaning is likely to change. One of the most important principles of relational database theory is: do not give keys any business meaning. If a key has business meaning and users decide to change that meaning (perhaps by adding a few digits to it, or by switching it from digits to letters), then the key itself must be modified. Because the primary key of one table may be used as a foreign key by other tables, even a simple change such as adding a digit to a customer number can incur significant maintenance overhead.
To keep a table's primary key free of business meaning, one solution is to use a surrogate primary key. This means defining a field with no business meaning, such as ID (or any other name), dedicated solely to serving as the table's primary key.
Sun Weiqin, Proficient in Hibernate: Java Object Persistence Technology, p. 8
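To make the customer-number example in the quote concrete, here is a small sketch that counts how many rows must be rewritten when one customer number gains a digit (C1001 becomes C01001). It uses sqlite3 as a stand-in for MySQL; the tables customer and orders and the sample values are hypothetical:

```python
import sqlite3

def rows_rewritten(natural_key: bool, n_orders: int = 3) -> int:
    """Change one customer number (C1001 -> C01001) and count the rows that must be rewritten."""
    # SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON; this sketch leaves
    # it off so the order of the two UPDATEs below does not matter.
    conn = sqlite3.connect(":memory:")
    if natural_key:
        # The customer number itself is the primary key, so orders store it as their foreign key.
        conn.executescript("""
            CREATE TABLE customer (customer_no TEXT PRIMARY KEY);
            CREATE TABLE orders (id INTEGER PRIMARY KEY,
                                 customer_no TEXT NOT NULL REFERENCES customer(customer_no));
            INSERT INTO customer VALUES ('C1001');
        """)
        conn.executemany("INSERT INTO orders (customer_no) VALUES (?)", [("C1001",)] * n_orders)
        child = conn.execute(
            "UPDATE orders SET customer_no = 'C01001' WHERE customer_no = 'C1001'").rowcount
        parent = conn.execute(
            "UPDATE customer SET customer_no = 'C01001' WHERE customer_no = 'C1001'").rowcount
        return child + parent
    # Surrogate key: orders reference a meaningless id, so only the customer row changes.
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_no TEXT NOT NULL UNIQUE);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER NOT NULL REFERENCES customer(id));
        INSERT INTO customer (id, customer_no) VALUES (1, 'C1001');
    """)
    conn.executemany("INSERT INTO orders (customer_id) VALUES (?)", [(1,)] * n_orders)
    return conn.execute("UPDATE customer SET customer_no = 'C01001' WHERE id = 1").rowcount

print(rows_rewritten(natural_key=True))   # 4: three orders plus the customer row
print(rows_rewritten(natural_key=False))  # 1: only the customer row
```

With the natural key, the cost grows with the number of referencing rows and tables. With the surrogate key, it stays at one row no matter how many orders exist.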
The main reason for using a logical primary key is that once a business primary key changes, every part of the system that references it must be modified as well, and the more references there are, the larger the change. With a logical primary key, only the business logic that directly uses the corresponding business key has to change, which narrows the impact on the system. Changes in business logic are inevitable: "the only thing that never changes is change", and no company or business stays fixed. The most typical examples are number upgrades and the replacement of driver's license numbers. Moreover, duplicate * numbers really do occur, so using the * number as a primary key is hard to handle. Of course, there are many ways to cope with change. One is to build a brand-new system to keep pace with the times, which is certainly good news for software companies.
Another reason for using a logical primary key is that a business primary key is often too large, which costs more in transmission, processing and storage. As a rule of thumb, if a business primary key is longer than 8 bytes, consider a logical primary key instead. INT is 4 bytes and BIGINT is 8 bytes, whereas business keys are usually strings, and even at the same 8 bytes a BIGINT is naturally more efficient to transmit and process than a string. Just imagine the difference, down at the machine-instruction level, between evaluating id = '12345678' and id = 12345678: the first is a byte-by-byte string comparison (possibly under a collation), the second a single integer comparison. Of course, a logical primary key is not necessarily INT or BIGINT, and a business primary key is not necessarily a string (it may be an INT or DATETIME); nor is it always the primary key that gets transmitted. Each case needs its own analysis, but the principle is the same, and here we discuss only the usual situation. In addition, if other tables reference the primary key, they must store it too, so their storage overhead differs accordingly. Those referencing columns are usually foreign keys, or at least indexed to speed up lookups, which widens the storage difference further; this also needs case-by-case analysis.
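A back-of-the-envelope sketch of the storage point above. It assumes an 8-byte BIGINT surrogate key versus a hypothetical fixed-width 32-byte string business key (for example CHAR(32) in a single-byte character set), referenced by a child table of 10 million rows. The child table stores the key once in its foreign-key column and once more in the index on that column. Row headers, page fill and compression are ignored, so treat the numbers as orders of magnitude only:

```python
BIGINT_BYTES = 8          # MySQL BIGINT is always 8 bytes
BUSINESS_KEY_BYTES = 32   # hypothetical CHAR(32) business key, single-byte charset

def key_copies_bytes(key_bytes: int, rows: int, copies_per_row: int = 2) -> int:
    """Bytes spent on key values alone: one copy in the FK column, one in its index."""
    return key_bytes * rows * copies_per_row

ROWS = 10_000_000  # hypothetical size of the referencing table
surrogate = key_copies_bytes(BIGINT_BYTES, ROWS)
business = key_copies_bytes(BUSINESS_KEY_BYTES, ROWS)
print(surrogate, business, business - surrogate)
print(f"extra space: {(business - surrogate) / 2**20:.0f} MiB")
```

In InnoDB the effect compounds: every secondary index entry also stores the table's primary key, so a wide primary key inflates every secondary index of the table itself, not just the referencing tables.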
Another reason for using a logical primary key is that joins on an INT or BIGINT foreign key run faster than joins on a string foreign key. The reasons are the same as above, so I won't repeat them here.
A further reason for using logical primary keys is that users or maintainers may mistype data in a business primary key. For example, if RMB is entered as RXB, every reference points at the wrong value, and correcting it later is very troublesome. With a logical primary key the problem is easy to fix; with a business primary key, the foreign-key data in other tables is affected as well. Cascading updates can solve this, but not every reference can be cascaded.
-- Summary of SwitchBlade
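A minimal sketch of the cascading-update fix for the RMB/RXB typo, using sqlite3 as a stand-in (MySQL's InnoDB also supports ON UPDATE CASCADE on declared foreign keys). The tables currency and payment are hypothetical. The cascade only reaches references declared as foreign keys; copies of the value held elsewhere (logs, other systems) are not updated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    CREATE TABLE currency (code TEXT PRIMARY KEY);   -- business key used as the primary key
    CREATE TABLE payment (
        id       INTEGER PRIMARY KEY,
        currency TEXT NOT NULL REFERENCES currency(code) ON UPDATE CASCADE
    );
""")
conn.execute("INSERT INTO currency VALUES ('RXB')")  # the typo is entered...
conn.executemany("INSERT INTO payment (currency) VALUES (?)", [("RXB",)] * 3)  # ...and referenced

# Correcting the parent row propagates the new key to every child row.
conn.execute("UPDATE currency SET code = 'RMB' WHERE code = 'RXB'")
print(conn.execute("SELECT DISTINCT currency FROM payment").fetchall())  # [('RMB',)]
```

Even with the cascade, the database still rewrites every referencing row. With a surrogate key, only the single currency row would change.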
In favor of the business primary key
If your table contains a column that is unique, non-null, and can be used to locate a record, don't feel obliged to add a pseudo primary key just because of tradition.
Bill Karwin, SQL Antipatterns, p. 41
The main reason for using a business primary key is that a logical primary key adds a field unrelated to the business, while users usually search by business fields (such as an employee's job number or a book's ISBN). Besides indexing the logical primary key, we must therefore also index those business fields, which degrades database performance and increases storage overhead. So for basic data that rarely changes, a business primary key is the better choice. Basic data also sees few inserts, updates and deletes, so even if such a key does change, the cost is limited. If you are still worried about the business logic changing, you can use a logical primary key here too; it depends on the specific case.
Another argument for a business primary key is that users work with business fields, so a logical primary key would require an extra mapping step on every operation. I think this worry is unnecessary. You can query directly by the business key and get the result without involving the logical primary key at all, as long as the business key is unique. Moreover, if the design uses a logical primary key from the start, the code is written around that key, and transmission, processing and storage inside the system all use the same key, so no conversion arises. A conversion problem appears only when an existing system built on business keys is migrated to logical primary keys; for the moment I can't think of any other scenario that needs such a conversion.
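A small sketch of querying directly by the business key while the table still has a logical primary key, again with sqlite3 as a stand-in. The employee table, its job_no column and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"       # logical primary key, used internally (e.g. for joins)
    " job_no TEXT NOT NULL UNIQUE,"  # business key the user actually types in
    " name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (job_no, name) VALUES (?, ?)",
    [("E001", "Zhang San"), ("E002", "Li Si")],
)

def find_by_job_no(job_no: str):
    """Look a row up by its business key; the UNIQUE index serves this lookup directly."""
    return conn.execute(
        "SELECT id, job_no, name FROM employee WHERE job_no = ?", (job_no,)
    ).fetchone()

print(find_by_job_no("E002"))  # (2, 'E002', 'Li Si')
```

No separate mapping step is needed: the id simply comes back with the row and can be used for any follow-up operations inside the system.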
Another argument for a business primary key comes from banking systems, where security matters more than performance. There the business key can serve either as the primary key or as redundant data, which avoids the loss of associations that logical primary keys can suffer. If the link between a master table and its child table is lost for some reason, the bank faces irreparable losses. To prevent this, the business key must be stored redundantly in important tables, and the most direct way to do that is to use the business key as the primary key, for example the * * number, passbook number or card number. That is why banking systems usually require business primary keys: the requirement is driven by security, not performance.
-- Summary of SwitchBlade
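A minimal sketch of why the redundant business key protects the association, using sqlite3 as a stand-in. The account and txn tables, the card number and the amount (in cents) are all hypothetical. The child rows keep the card number next to the surrogate link, so a lost link can be rebuilt:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, card_no TEXT NOT NULL UNIQUE);
    -- card_no is stored redundantly next to the surrogate link account_id
    CREATE TABLE txn (id INTEGER PRIMARY KEY, account_id INTEGER,
                      card_no TEXT NOT NULL, amount_cents INTEGER NOT NULL);
    INSERT INTO account (id, card_no) VALUES (1, '6222000011112222');
    INSERT INTO txn (account_id, card_no, amount_cents) VALUES (1, '6222000011112222', 50000);
""")

# Simulate the surrogate link being lost, e.g. by a faulty data migration.
conn.execute("UPDATE txn SET account_id = NULL")

# The redundant business key is enough to rebuild the association.
conn.execute("""
    UPDATE txn
    SET account_id = (SELECT a.id FROM account a WHERE a.card_no = txn.card_no)
    WHERE account_id IS NULL
""")
print(conn.execute("SELECT account_id, amount_cents FROM txn").fetchall())  # [(1, 50000)]
```

Using the business key directly as the primary key, as the bank example suggests, makes this redundancy automatic, because the foreign key itself is the business key.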
This shows that choosing between a logical and a business primary key is not a snap decision. It is a decision made according to the application scenario and its requirements.
What is the problem with using an auto-incrementing integer id as the primary key?
When a table grows very large, it often has to be split horizontally (sharded) later on, and at that point an auto-increment primary key becomes an obstacle. (There are solutions to this as well; see the article "re-Paipai Network Architecture in the sub-library design".)
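A small simulation of the sharding problem, plus one common workaround. This is not necessarily the solution in the article cited above. MySQL's auto_increment_offset and auto_increment_increment system variables let each server hand out interleaved IDs; the Python generator below only imitates that behavior, and the shard count is hypothetical:

```python
from itertools import count

def shard_id_generator(offset: int, increment: int):
    """Yield offset, offset + increment, ... (imitates auto_increment_offset/_increment)."""
    return count(offset, increment)

# Naive: every shard runs its own sequence starting at 1, so IDs collide across shards.
naive = [shard_id_generator(1, 1) for _ in range(2)]
naive_ids = [[next(g) for _ in range(3)] for g in naive]
print(naive_ids)  # [[1, 2, 3], [1, 2, 3]]

# Interleaved: shard k starts at k + 1 and steps by the shard count.
N_SHARDS = 2
shards = [shard_id_generator(k + 1, N_SHARDS) for k in range(N_SHARDS)]
ids = [[next(g) for _ in range(3)] for g in shards]
print(ids)  # [[1, 3, 5], [2, 4, 6]]
```

The catch is that the step is fixed to the shard count chosen up front, so adding shards later means renumbering or reserving spare slots. This is one reason globally unique IDs such as UUIDs are attractive.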
Choosing the ID data type
Now let's look at the choice of primary key from another angle: the data type.
Integer type:
Integer types are usually the best choice for ID columns, because they are the most efficient and can use the database's auto-increment feature.
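When an integer auto-increment ID is chosen, it is worth checking whether INT or BIGINT is wide enough. The sketch below computes how long each would last at a constant insert rate. The maxima are the standard ranges of MySQL INT UNSIGNED and signed BIGINT, and the rate of 1,000 inserts per second is a hypothetical figure:

```python
SECONDS_PER_DAY = 86_400

# Maximum values of MySQL integer types commonly used for IDs.
INT_UNSIGNED_MAX = 2**32 - 1    # 4,294,967,295 (4 bytes)
BIGINT_SIGNED_MAX = 2**63 - 1   # 9,223,372,036,854,775,807 (8 bytes)

def days_until_exhausted(max_id: int, inserts_per_second: int) -> float:
    """Days before an auto-increment column reaches max_id at a constant insert rate."""
    return max_id / inserts_per_second / SECONDS_PER_DAY

RATE = 1_000  # hypothetical sustained insert rate
print(f"INT UNSIGNED: {days_until_exhausted(INT_UNSIGNED_MAX, RATE):,.1f} days")
print(f"BIGINT:       {days_until_exhausted(BIGINT_SIGNED_MAX, RATE) / 365:,.0f} years")
```

At this rate a 4-byte INT UNSIGNED runs out in under two months, while an 8-byte BIGINT lasts hundreds of millions of years. For fast-growing tables, the extra 4 bytes are usually well spent.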
String type:
String types certainly take more space than integer types and are slower to operate on. I mainly use MySQL, and I recommend reading the explanation of this topic on p. 125 of High Performance MySQL, 3rd edition.
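To put numbers on the space cost for the most common string ID, a UUID, this sketch compares its 36-character text form with its 16-byte binary form using Python's standard uuid module. Storing the binary form in BINARY(16) is one option in MySQL; MySQL 8.0 provides UUID_TO_BIN() and BIN_TO_UUID() for the conversion:

```python
import uuid

u = uuid.uuid4()
as_text = str(u)      # canonical text form, e.g. stored in CHAR(36)
as_binary = u.bytes   # raw 16 bytes, e.g. stored in BINARY(16)

print(len(as_text), len(as_binary))  # 36 16

# The binary form loses nothing: it converts back to the same UUID.
assert uuid.UUID(bytes=as_binary) == u
```

Even the compact 16-byte form is four times the size of a 4-byte INT and twice that of an 8-byte BIGINT, which is the space cost the paragraph above refers to.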
My solution (MySQL): use an auto-increment id as the primary key to keep inserts efficient, and use a UUID as the logical ID. The UUID gives many of the benefits of a logical primary key and makes later horizontal sharding easier to handle.
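One way to read this scheme, sketched with sqlite3 as a stand-in for MySQL. The MySQL column types in the comments, the item table and the two in-memory "shards" are assumptions for illustration. Each shard keeps a compact, sequential auto-increment key for inserts (InnoDB clusters rows by primary key, so sequential keys append at the end of the index), while the UUID stays globally unique across shards:

```python
import sqlite3
import uuid

def make_shard() -> sqlite3.Connection:
    """One shard: auto-increment primary key for insert locality, UUID as the logical ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        " id   INTEGER PRIMARY KEY AUTOINCREMENT,"  # MySQL: BIGINT UNSIGNED AUTO_INCREMENT
        " uid  BLOB NOT NULL UNIQUE,"               # MySQL: BINARY(16) holding a UUID
        " name TEXT NOT NULL)"
    )
    return conn

shards = [make_shard(), make_shard()]
for shard, name in zip(shards, ["a", "b"]):
    shard.execute("INSERT INTO item (uid, name) VALUES (?, ?)", (uuid.uuid4().bytes, name))

# Both shards hand out id = 1, but the UUIDs remain distinct,
# so rows can later be merged or moved between shards by uid.
rows = [r for s in shards for r in s.execute("SELECT id, uid, name FROM item")]
print([r[0] for r in rows])       # [1, 1]
print(len({r[1] for r in rows}))  # 2
```

Other tables would then reference the uid rather than the shard-local id wherever a reference may cross shards.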