
A Summary of Relational Database Primary Key Strategies


When designing a relational database, you usually need to specify a primary key for each table. A primary key is the attribute, or group of attributes, that uniquely identifies a row in the table. A table can have only one primary key, although it may have multiple candidate indexes. Because the primary key uniquely identifies a row, it ensures that updates and deletes touch exactly the intended records. Other fields can also help resolve concurrency conflicts during these operations, but that is not the focus of this article and will not be discussed in detail. Beyond this, primary keys often combine with foreign keys to form referential integrity constraints that prevent inconsistent data, so the primary key plays a very important role in database design. Common strategies for choosing a primary key are: auto-increment, manual growth, UniqueIdentifier, composite (joint) business keys, time series + random number, and the "COMB (Combine)" type.

I. Auto-increment fields

Many database designers like auto-increment fields because they are simple to use. With an auto-increment field we can add data without supplying a primary key value: after the record is inserted, the database automatically assigns a value and guarantees it will never repeat. In SQL Server, we can also read the @@IDENTITY global variable after the insert to obtain the primary key value the system assigned.
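For concreteness, here is a minimal C# sketch of reading back an auto-assigned key with ADO.NET. The table, columns, and connection string are assumptions for illustration; the query uses SCOPE_IDENTITY() because, unlike @@IDENTITY, it is not affected by triggers that insert into other tables.

using System;
using System.Data.SqlClient;

class IdentityDemo
{
    static void Main()
    {
        // Hypothetical connection string and schema: [Order] has an
        // auto-increment (IDENTITY) OrderID column and an OrderDate column.
        using (var conn = new SqlConnection("Server=.;Database=Shop;Integrated Security=true"))
        {
            conn.Open();
            var cmd = new SqlCommand(
                "INSERT INTO [Order] (OrderDate) VALUES (GETDATE()); " +
                "SELECT CAST(SCOPE_IDENTITY() AS int);", conn);
            // The key is only known after the insert has executed.
            int orderId = (int)cmd.ExecuteScalar();
            Console.WriteLine($"Database assigned OrderID = {orderId}");
        }
    }
}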

Although auto-increment fields save us a lot of tedious work, they carry a potential problem: it is difficult to fill in primary and foreign key values ahead of time when working in a buffered (disconnected) data mode. Suppose there are two master-detail tables:

Order (OrderID, OrderDate) -- the order header table

OrderDetail (OrderID, LineNum, ProductID, Price) -- the order detail table

OrderID in the Order table is an auto-increment field. Suppose we now need to enter an order, which means inserting one record into the Order table and several records into the OrderDetail table. Because OrderID is auto-incremented, we cannot know its value before the record is actually inserted into the database; only after the insert do we learn what value the database assigned. This leads to the following contradictions:

First, to put the correct value into the OrderID column of OrderDetail, we must insert into the Order table first and fetch the OrderID the system assigned, then use that value to populate the OrderID column of OrderDetail, and only then update the OrderDetail table. Yet to guarantee data consistency, Order and OrderDetail must be updated inside a single transaction, i.e. either both tables are updated or both fail, so the two requirements pull against each other.
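In practice this is usually resolved by reading the generated key back inside the same transaction, so the pair of inserts still succeeds or fails as a unit. A sketch, assuming the Order/OrderDetail schema above and an open SqlConnection:

using System.Data.SqlClient;

public static void InsertOrder(SqlConnection conn, int productId, decimal price)
{
    using (SqlTransaction tx = conn.BeginTransaction())
    {
        // Insert the parent first; only now does the database reveal the key.
        var insertOrder = new SqlCommand(
            "INSERT INTO [Order] (OrderDate) VALUES (GETDATE()); " +
            "SELECT CAST(SCOPE_IDENTITY() AS int);", conn, tx);
        int orderId = (int)insertOrder.ExecuteScalar();

        // Use the freshly obtained key to fill the child row.
        var insertDetail = new SqlCommand(
            "INSERT INTO OrderDetail (OrderID, LineNum, ProductID, Price) " +
            "VALUES (@id, 1, @pid, @price);", conn, tx);
        insertDetail.Parameters.AddWithValue("@id", orderId);
        insertDetail.Parameters.AddWithValue("@pid", productId);
        insertDetail.Parameters.AddWithValue("@price", price);
        insertDetail.ExecuteNonQuery();

        tx.Commit(); // both rows commit together or not at all
    }
}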

Second, when we need to replicate data across multiple databases (SQL Server's publication and subscription mechanism lets us copy data between databases), auto-increment fields can produce primary key conflicts and lost table associations during the merge. Imagine replicating the Order table of one database into the Order table of another: should OrderID keep auto-incrementing? If it does, the association with its child table OrderDetail is lost; if it does not, it will collide with primary keys already in the target table. A contradiction either way.

Third, an auto-increment value requires the system to maintain a single global counter that is incremented on every insert. In a concurrent environment that must produce unique identities, each caller has to lock that global value to obtain its increment, which creates a concurrency bottleneck and hurts performance.
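The bottleneck is easy to picture. The sketch below is only an analogy in C#, not how any database engine actually implements identity columns: a single global counter forces every concurrent writer through the same lock.

class GlobalKeyCounter
{
    private static readonly object _keyLock = new object();
    private static long _currentKey = 0;

    public static long NextKey()
    {
        lock (_keyLock)            // all concurrent inserts queue up here
        {
            return ++_currentKey;  // increment under the lock to stay unique
        }
    }
}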

In addition, when a table grows large enough, or frequent deletes and re-inserts exhaust the range of the key's data type, the value can overflow. This situation is rarely encountered, but it is still something we must consider when designing a table.

In actual development it is also easy to run into a primary key conflict, i.e. a duplicate primary key value.

For example, suppose t_user and t_admin both have an auto-increment primary key id, and we copy rows from one to the other:

INSERT INTO t_user (id)
SELECT id FROM t_admin;

If some id value, say 7, already exists in both tables, the insert fails because the primary keys conflict.

One workaround:

UPDATE t_admin SET id = -id
WHERE id IS NOT NULL AND id > 0;
COMMIT;

II. Manual growth fields

Since auto-increment fields cause so much trouble, we might consider manually grown fields instead: primary key values are maintained by ourselves, usually in a separate table that stores the current key values. For narrative convenience, we continue with the example above. Create a new table named IntKey with two columns, KeyName and KeyValue. Much like a HashTable, given a KeyName you can look up the current KeyValue, then increment it by hand. In SQL Server, a stored procedure can make fetching the next key value happen automatically. The code is as follows:

CREATE PROCEDURE [GetKey]
@KeyName CHAR(10),
@KeyValue INT OUTPUT
AS
-- T-SQL allows assigning to both the variable and the column in one
-- statement, so reading and incrementing the key is a single atomic update.
UPDATE IntKey SET @KeyValue = KeyValue = KeyValue + 1 WHERE KeyName = @KeyName
GO

By calling this stored procedure we always obtain the latest key value and are guaranteed no duplicates. If OrderID is made a manual growth field, our program can proceed in these steps: first call the stored procedure to obtain an OrderID, then use that OrderID to populate the Order and OrderDetail tables, and finally update both tables inside one transaction.
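For illustration, here is a minimal C# sketch of the first step, calling the GetKey procedure above through ADO.NET and reading its output parameter (connection handling is omitted, and the key segment name passed in is an assumption):

using System.Data;
using System.Data.SqlClient;

public static int GetNextKey(SqlConnection conn, string keyName)
{
    var cmd = new SqlCommand("GetKey", conn) { CommandType = CommandType.StoredProcedure };
    cmd.Parameters.Add("@KeyName", SqlDbType.Char, 10).Value = keyName;
    var keyValue = cmd.Parameters.Add("@KeyValue", SqlDbType.Int);
    keyValue.Direction = ParameterDirection.Output;
    cmd.ExecuteNonQuery();
    return (int)keyValue.Value; // the freshly incremented key
}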

Manual growth fields also make data replication between databases safe: as long as different tables (or databases) are assigned different key segments, merging data produces no key conflicts. The downside is that every new key requires an extra database access, which increases the load on both the network and the database; in a low-speed or disconnected network environment this approach is at a serious disadvantage. Manually maintaining keys must also account for concurrency conflicts and other factors, which adds complexity to the system.

III. UniqueIdentifier

SQL Server provides the UniqueIdentifier data type and the generation function NEWID(), which produces a unique UniqueIdentifier value. A UniqueIdentifier occupies 16 bytes in the database, and the probability of a repeat is practically zero, hence the name "globally unique identifier". We often see values like 768427bf-9b37-4776-97ca-000365e160d5 or {45F0EB02-0727-4F2E-AAB5-E8AEDEE0CEC5} when debugging registry or Windows errors: these are UniqueIdentifiers, which Windows uses to identify COM components and interfaces and prevent duplication. In .NET the UniqueIdentifier is called a GUID (Globally Unique Identifier). In C#, you can generate one as follows:

Guid u = System.Guid.NewGuid();

For the Order and OrderDetail program above, choosing UniqueIdentifier as the primary key completely avoids the extra network round trip described earlier: the program generates the GUID and fills in the primary key directly, without worrying about duplicates. But the UniqueIdentifier type also has serious drawbacks. First, it is 16 bytes long, four times the size of an integer, so it consumes considerable storage. Worse, UniqueIdentifier values are generated without any order, so building an index on them (and most databases index the primary key) is a time-consuming operation. Experiments have shown that once a table holds a fair amount of data, inserting the same volume of rows takes longer with a UniqueIdentifier primary key than with an integer one, even before table associations are considered. For efficiency we should avoid UniqueIdentifier primary keys where possible; that said, with computers getting ever faster, using them in small and medium-sized projects remains a workable option.
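A small sketch of the advantage just described: with GUID keys, the parent and all of its children can be fully keyed on the client before a single database round trip (the row shapes below are illustrative):

using System;

class GuidKeyDemo
{
    static void Main()
    {
        // The parent key is known immediately, with no insert-then-read-back step.
        Guid orderId = Guid.NewGuid();
        // Child rows can reference it right away.
        var line1 = new { OrderID = orderId, LineNum = 1, ProductID = 100, Price = 9.99m };
        var line2 = new { OrderID = orderId, LineNum = 2, ProductID = 200, Price = 4.50m };
        Console.WriteLine($"Order {orderId} carries lines {line1.LineNum} and {line2.LineNum}.");
    }
}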

IV. Business-field composite primary keys

Database designers working with tools such as DELPHI and POWERBUILDER have customarily combined business-meaningful fields into a composite primary key. Business keys do have inherent advantages: database systems generally create a clustered index by default, ordered ascending on the primary key. The difference is imperceptible while the data volume is small, but once it grows, a clustered index defined on a business-meaningful key pays off. Each access can then pinpoint the physical disk location where data is to be inserted or updated, reducing head-seek time and improving database performance, while also enforcing data integrity in the business sense and increasing the program's reliability. On the other hand, a composite index over several business fields consumes more disk space, and its index pages occupy more memory pages, lowering the query hit rate. Moreover, with a business key, whenever the key data itself must be modified the program has to record a mapping between the old and new values and compare them during updates, which complicates the code.

V. Time series + random number

Using a timestamp accurate to the millisecond or even the nanosecond, plus a two-digit random number, as the primary key, e.g. 200911282311528 followed by a two-digit random suffix, is another effective approach. The resulting key avoids the disorder of UniqueIdentifier primary keys while also sidestepping the replication and data-import trouble caused by auto-increment keys. However, in a real-time networked system with many users, uniqueness in time and space cannot be guaranteed.
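A minimal C# sketch of this strategy, assuming a millisecond-precision yyyyMMddHHmmssfff timestamp plus a two-digit random suffix (the exact format is an assumption, and, as noted above, heavy concurrency can still produce collisions):

using System;

class TimeRandomKey
{
    private static readonly Random _rand = new Random();

    public static string NewKey()
    {
        string stamp = DateTime.Now.ToString("yyyyMMddHHmmssfff"); // time part
        int suffix;
        lock (_rand) { suffix = _rand.Next(0, 100); } // Random is not thread-safe
        return stamp + suffix.ToString("D2");         // two-digit random part
    }
}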

Use the "COMB (Combine)" type

Since each of the five strategies above has its shortcomings, is there a good way out? Yes. The COMB type (not an actual database type; Jimmy Nilsson designed it in his article "The Cost of GUIDs as Primary Keys") strikes a good balance among the strategies above.

The basic idea of the COMB type is this: since the irregularity of UniqueIdentifier data makes indexing inefficient and hurts system performance, why not keep the first 10 bytes of the UniqueIdentifier and use the last 6 bytes to encode the time the GUID was generated (a DateTime)? Combining time information with the UniqueIdentifier adds ordering while largely preserving uniqueness, improving index efficiency. Some may worry that shrinking the unique portion to 10 bytes invites duplicate data. In practice this is not a concern: the time in the last 6 bytes is accurate to 1/300 of a second, so two COMB values can collide only if two GUIDs generated within the same 1/300 of a second happen to share their first 10 bytes, which is next to impossible. In SQL Server this idea can be implemented with the following SQL:

DECLARE @aGuid UNIQUEIDENTIFIER
SET @aGuid = CAST(CAST(NEWID() AS BINARY(10))
    + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

Testing shows that a COMB primary key is still slower than an INT primary key for retrieval, insertion, update, and deletion, but faster than UniqueIdentifier. Besides generating COMB data in a stored procedure, we can also generate it in C#, so that all primary key generation happens on the client side.

The C# code is as follows:


// ================================================================
/// <summary>
/// Returns a GUID for database operations; the embedded time code
/// improves retrieval efficiency.
/// </summary>
/// <returns>A COMB (GUID mixed with time) value</returns>
public static Guid NewComb()
{
    byte[] guidArray = System.Guid.NewGuid().ToByteArray();
    DateTime baseDate = new DateTime(1900, 1, 1);
    DateTime now = DateTime.Now;
    // Get the days and milliseconds which will be used to build the byte string
    TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = new TimeSpan(now.Ticks - new DateTime(now.Year, now.Month, now.Day).Ticks);
    // Convert to a byte array
    // Note that SQL Server is accurate to 1/300th of a millisecond so we divide by 3.333333
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333));
    // Reverse the bytes to match SQL Server's ordering
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);
    // Copy the bytes into the guid
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);
    return new System.Guid(guidArray);
}

// ================================================================
/// <summary>
/// Recovers the time information from a COMB GUID returned by SQL Server.
/// </summary>
/// <param name="guid">A COMB value containing time information</param>
/// <returns>The embedded time</returns>
public static DateTime GetDateFromComb(System.Guid guid)
{
    DateTime baseDate = new DateTime(1900, 1, 1);
    byte[] daysArray = new byte[4];
    byte[] msecsArray = new byte[4];
    byte[] guidArray = guid.ToByteArray();
    // Copy the date parts of the guid to the respective byte arrays
    Array.Copy(guidArray, guidArray.Length - 6, daysArray, 2, 2);
    Array.Copy(guidArray, guidArray.Length - 4, msecsArray, 0, 4);
    // Reverse the arrays to put them into the appropriate order
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);
    // Convert the bytes to ints
    int days = BitConverter.ToInt32(daysArray, 0);
    int msecs = BitConverter.ToInt32(msecsArray, 0);
    DateTime date = baseDate.AddDays(days);
    date = date.AddMilliseconds(msecs * 3.333333);
    return date;
}
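A short usage sketch, assuming both methods above live in the same class:

public static void Main()
{
    Guid combId = NewComb();                    // generate a time-ordered GUID
    DateTime created = GetDateFromComb(combId); // recover the embedded timestamp
    Console.WriteLine($"{combId} was generated around {created:yyyy-MM-dd HH:mm:ss.fff}");
}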

Summing up the six primary key strategies above, I consider the "COMB (Combine)" type the most broadly appropriate choice. In actual use, however, you should select the primary key that fits your real circumstances and requirements, never copying a recipe mechanically, lest the result backfire.
