How to realize the primary key in database design

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article explains how to choose and implement primary keys in database design. The methods introduced are simple and practical; let's walk through them.

When designing for a relational database, you usually need to specify a primary key for each table. A primary key is the attribute, or group of attributes, that uniquely identifies a row in the table. A table can have only one primary key, though it may have several candidate keys (unique indexes). Because the primary key uniquely identifies a row, it ensures that updates and deletes affect exactly the intended record; other unique fields can also serve this purpose and help detect concurrency conflicts, but that is not the focus of this article. Beyond identification, primary keys combine with foreign keys to form referential integrity constraints that prevent inconsistent data. The primary key therefore plays a central role in database design. Common strategies for choosing one are: automatic growth (auto-increment), manual growth, UniqueIdentifier, compound (business-field) keys, time series + random number, and the "COMB (Combine)" type.

I. Automatic growth

Many database designers like auto-increment fields because they are simple to use. With automatic growth we can insert data without supplying a primary key value: after the record is inserted, the database assigns a value automatically and guarantees it is unique. In SQL Server, the @@IDENTITY global variable returns the key value the system assigned after an insert (SCOPE_IDENTITY() is usually the safer choice, since @@IDENTITY can be affected by triggers).

Although auto-increment fields save a lot of tedious work, they carry a potential problem: it is difficult to fill in primary and foreign key values when data entry is buffered on the client. Suppose there are two master/detail tables:

Order (OrderID, OrderDate) — order master table

OrderDetail (OrderID, LineNum, ProductID, Price) — order detail table

The OrderID in the Order table is an auto-increment field. Suppose we now need to enter an order, which means inserting one record into the Order table and several records into the OrderDetail table. Because OrderID is auto-incremented, we cannot know its value before the record is actually inserted into the database; only after the insert do we learn what value the database assigned. This leads to the following contradictions:

First, to put the correct value into the OrderID column of OrderDetail, we must first insert into the Order table to obtain the OrderID the system assigned, then fill the OrderID column of OrderDetail with that value, and finally insert the OrderDetail rows. But to guarantee consistency, Order and OrderDetail must be updated inside a single transaction — either both succeed or both fail — which conflicts with needing the Order insert to complete first.

Secondly, when replicating data among multiple databases (SQL Server's replication and subscription mechanism allows copying data between databases), auto-increment fields can cause primary key conflicts and broken table associations during merges. Imagine copying the Order table from one database into the Order table of another: should OrderID be regenerated? If it is, the association with the child OrderDetail table is lost; if it is not, it may collide with keys already present in the target table. A contradiction either way.

Thirdly, auto-increment requires the system to maintain a global counter that is incremented on every insert. In a concurrent environment, each increment must lock that global value to guarantee uniqueness, which creates a concurrency bottleneck and reduces insert performance.
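The serialization point described above can be sketched in a few lines. The Java program below (an illustrative analogue, not the database's actual implementation; all names are invented) keeps one global counter per table and atomically increments it for every insert — every concurrent writer funnels through this single point of contention:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of what a database does internally for an auto-increment column:
// one global counter, atomically incremented on every insert.
public class AutoIncrementCounter {
    private static final AtomicLong orderId = new AtomicLong(0);

    public static long nextOrderId() {
        // incrementAndGet is atomic: concurrent callers are serialized here,
        // which is exactly the bottleneck described above.
        return orderId.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Long> seen = new HashSet<>();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    long id = nextOrderId();
                    synchronized (seen) { seen.add(id); }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(seen.size()); // 4000 distinct ids, no duplicates
    }
}
```

The uniqueness guarantee and the contention come from the same place: the single shared counter.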

Finally, if a table grows large enough, or frequent deletes and inserts exhaust the range of the key's integer type, the value can overflow. This is rarely encountered, but it must still be considered when designing the table.

II. Manual growth fields

Since auto-increment fields cause so much trouble, we might consider manually grown fields instead: the primary key values are maintained by the application, usually with a separate table storing the current key values. Continuing the example above, create a table named IntKey with two columns, KeyName and KeyValue. Like a hash table, given a KeyName you can look up the current KeyValue and then increment it. In SQL Server, a stored procedure can make fetching the next key value a single atomic step. The code is as follows:

CREATE PROCEDURE [GetKey]
    @KeyName char(10),
    @KeyValue int OUTPUT
AS
    -- Assigns the output variable and increments the column in one atomic statement
    UPDATE IntKey SET @KeyValue = KeyValue = KeyValue + 1 WHERE KeyName = @KeyName
GO

By calling this stored procedure we always get the latest key value, with no duplicates. If the OrderID field is treated as a manually grown field, the program proceeds as follows: first call the stored procedure to obtain an OrderID, then use that OrderID to populate the Order and OrderDetail rows, and finally write both tables inside one transaction.

Using manually grown fields as primary keys also makes data replication between databases safe: as long as different key segments are assigned to different databases, merges cannot produce key conflicts. However, manual growth adds load on the network and the database, because every new key requires an extra database round trip to fetch the current value — a serious drawback in low-speed or disconnected environments. Manual key maintenance must also handle concurrency conflicts and similar factors, which increases system complexity.
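The IntKey mechanism can be sketched as an in-memory analogue, here in Java (illustrative only — in the real design this state lives in a database table, and the class and method names are invented):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory analogue of the IntKey table and the GetKey stored procedure:
// each KeyName maps to its current KeyValue, incremented atomically.
public class KeyTable {
    private final ConcurrentHashMap<String, AtomicInteger> keys = new ConcurrentHashMap<>();

    // Analogue of EXEC GetKey @KeyName, @KeyValue OUTPUT:
    // returns the next value for the named key, starting from 1.
    public int getKey(String keyName) {
        return keys.computeIfAbsent(keyName, k -> new AtomicInteger(0)).incrementAndGet();
    }
}
```

An order-entry flow would call getKey("Order") once, use the returned value for both the Order row and all its OrderDetail rows, then commit both inserts in one transaction.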

III. Using UniqueIdentifier

SQL Server provides the UniqueIdentifier data type and the generation function NEWID(); calling NEWID() produces a unique UniqueIdentifier. A UniqueIdentifier occupies 16 bytes in the database, and the probability of a repeat is practically zero, which is why it is called a globally unique identifier. Values such as 768427bf-9b37-4776-97ca-000365e160d5 or {45F0EB02-0727-4F2E-AAB5-E8AEDEE0CEC5}, often seen in the registry or when debugging Windows programs, are UniqueIdentifiers; Windows uses them to identify COM components and interfaces and prevent duplication. In .NET the type is called GUID (Globally Unique Identifier). In C#, a GUID is generated like this:

Guid u = System.Guid.NewGuid();

For the Order and OrderDetail scenario above, choosing UniqueIdentifier as the primary key completely avoids the extra network round trip: the program generates the GUID and fills the key directly, with no concern about duplicates. But the UniqueIdentifier type also has serious drawbacks. First, it is 16 bytes long — four times the size of an integer — and takes up considerable storage. Worse, generated values are irregular, i.e. unordered, so maintaining an index on the column (and most databases index the primary key) is expensive. Experiments show that as tables grow, inserting the same amount of data is slower with a UniqueIdentifier primary key than with an integer one. So, table associations aside, for efficiency we should avoid UniqueIdentifier primary keys where possible; still, with ever-faster modern hardware, they remain a workable option for small and medium-sized projects.
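The same client-side generation works outside C# too. A minimal Java sketch (UUID.randomUUID() plays the role of SQL Server's NEWID(); the class name is invented for illustration):

```java
import java.util.UUID;

// UUID.randomUUID() is the Java counterpart of NEWID(): a 128-bit (16-byte)
// version-4 value whose bytes are random, hence unordered — the very property
// that makes indexing such keys expensive.
public class GuidDemo {
    public static String newGuidString() {
        return UUID.randomUUID().toString(); // e.g. 768427bf-9b37-4776-97ca-000365e160d5
    }
}
```

Because the value is generated on the client, the key can be written into both the master and detail rows before anything touches the database.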

IV. Compound primary keys from business fields

Designers working with database tools such as Delphi and PowerBuilder customarily combine business-meaningful fields into a compound primary key. Business keys do have inherent advantages. By default, the database builds a clustered index on the primary key, ordered ascending by its value. The difference is imperceptible with little data, but with large tables the clustered index lets inserts and updates locate the physical disk position precisely, reducing head seek time and improving performance, while also guaranteeing integrity in a business sense and increasing program reliability. However, a compound index over several business fields occupies more disk space, and its index pages occupy more memory, lowering the cache hit rate. Moreover, if business key values can change, the program must track the mapping between new and original values and compare them on update, which complicates the code.

V. Time series + random number primary key

Using a timestamp accurate to milliseconds (or even nanoseconds) plus a two-digit random number as the primary key — e.g. 200911282311528 followed by a two-digit random suffix — is an effective approach. The resulting keys avoid both the disorder of UniqueIdentifier primary keys and the replication and data-import troubles of auto-increment keys. However, in a real-time networked system with many users, uniqueness cannot be guaranteed across time and space.
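A sketch of this scheme in Java (illustrative; the class and method names are invented) also makes the collision risk concrete: two inserts in the same millisecond still collide with probability 1/100, which is why heavy concurrency defeats it.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ThreadLocalRandom;

// Millisecond timestamp + two-digit random suffix, as described above.
public class TimeRandomKey {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // 17-digit timestamp + 2 random digits, e.g. 20091128231152800 + "37".
    public static String nextKey() {
        String ts = LocalDateTime.now().format(FMT);
        int suffix = ThreadLocalRandom.current().nextInt(100);
        return ts + String.format("%02d", suffix);
    }
}
```

The keys sort naturally by creation time, which keeps index inserts close to the end of the clustered index.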

VI. Using the "COMB (Combine)" type

Since each of the five strategies above has its shortcomings, is there a better way? Yes. The COMB type (not a real database type — it was designed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys") strikes a good balance among the strategies above.

The basic idea of the COMB type is this: since the irregularity of UniqueIdentifier values makes indexing inefficient and hurts performance, why not keep the first 10 bytes of the UniqueIdentifier and use the last 6 bytes to encode the time the GUID was generated (a DateTime)? Combining time information with the UniqueIdentifier preserves near-uniqueness while adding ordering, which improves index efficiency. Some may worry that shrinking the random part to 10 bytes invites duplicates. In practice there is little to fear: the time in the last 6 bytes is accurate to 1/300 of a second, so two COMB values can collide only if two GUIDs generated within the same 1/300-second window share identical first 10 bytes — which is practically impossible. In SQL Server the idea can be implemented with this SQL:

DECLARE @aGuid UNIQUEIDENTIFIER
SET @aGuid = CAST(CAST(NEWID() AS BINARY(10))
    + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

Testing shows that a COMB primary key is still slower than an INT primary key for retrieval, insertion, update, and deletion, but faster than a plain UniqueIdentifier. Besides generating COMB values with SQL, we can also generate them in C#, so that all primary key generation happens on the client.

The C# code is as follows:

/// <summary>
/// Returns a GUID for database use; the embedded time code improves retrieval efficiency.
/// </summary>
/// <returns>A COMB (GUID combined with time) value</returns>
public static Guid NewComb()
{
    byte[] guidArray = System.Guid.NewGuid().ToByteArray();

    DateTime baseDate = new DateTime(1900, 1, 1);
    DateTime now = DateTime.Now;

    // Get the days and milliseconds which will be used to build the byte string
    TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = new TimeSpan(now.Ticks - new DateTime(now.Year, now.Month, now.Day).Ticks);

    // Convert to a byte array
    // Note that SQL Server is accurate to 1/300th of a second, so we divide by 3.333333
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333));

    // Reverse the bytes to match SQL Server's ordering
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);

    // Copy the bytes into the guid
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);

    return new System.Guid(guidArray);
}

/// <summary>
/// Extracts the time information from a COMB GUID returned by SQL Server.
/// </summary>
/// <param name="guid">A COMB GUID containing time information</param>
/// <returns>The embedded time</returns>
public static DateTime GetDateFromComb(System.Guid guid)
{
    DateTime baseDate = new DateTime(1900, 1, 1);
    byte[] daysArray = new byte[4];
    byte[] msecsArray = new byte[4];
    byte[] guidArray = guid.ToByteArray();

    // Copy the date parts of the guid to the respective byte arrays
    Array.Copy(guidArray, guidArray.Length - 6, daysArray, 2, 2);
    Array.Copy(guidArray, guidArray.Length - 4, msecsArray, 0, 4);

    // Reverse the arrays to put them into the appropriate order
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);

    // Convert the bytes to ints
    int days = BitConverter.ToInt32(daysArray, 0);
    int msecs = BitConverter.ToInt32(msecsArray, 0);

    DateTime date = baseDate.AddDays(days);
    date = date.AddMilliseconds(msecs * 3.333333);

    return date;
}

At this point, you should have a deeper understanding of how to implement primary keys in database design. Try these approaches in practice to see the trade-offs for yourself.
