What is the function of Collation in SQL Server 07/19 Update SLTechnology News&Howtos

What is the function of Collation in SQL Server

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

What is the role of Collation in SQL Server? aiming at this question, this article introduces the corresponding analysis and solution in detail, hoping to help more partners who want to solve this problem to find a more simple and feasible way.

The composition of Collation

After talking about the storage form of characters, we can talk about what collation is. Collation describes what rules data follow to describe characters in the database, and how characters are sorted and compared. In SQL Server, Collation consists of two parts, for example, a collation in China is Chinese_PRC_CI_AI_WS, the first half refers to the supported character set, corresponding to code page, such as Chinese_PRC corresponding to the code page is 936, all the characters that can be used are defined in this code page. The second half of the CI_AI_WS is used to represent collations, such as:

_ CI (CS) indicates whether letters are case-sensitive, CI is not case-sensitive, CS is case-sensitive. If case-sensitive, lowercase letters are sorted before uppercase; if they are not case-sensitive, they are sorted as case-sensitive letters.

_ AI (AS) indicates whether to distinguish stress, AI does not distinguish, AS does not distinguish. If the stress is not distinguished, then "a" and "a" are sorted as the same characters.

_ KI (KS) indicates whether to distinguish the types of katakana, KI does not distinguish, KS does not distinguish. It is used in Japanese.

_ WI (WS) indicates whether to distinguish full half angles, WI does not distinguish, and WS distinguishes. Half-width is single byte, full-width is double-byte.

Four levels of Collation

Collation has a total of four levels, namely server-level, database-level, column-level and expression-level.

Server-level Collations

The server-level collation is specified when the database instance is installed, and if not specified, windows collation is used as the server-level collation. Windows collation is determined by the regional language in the operating system.

Because we chose Chinese (Simplified,PRC), our default server-level collations is: Chinese_PRC_. Server-level collation is also the default collation for system and user databases. In general, once server-level collation is set, it cannot be changed unless you export all the objects and data in the database, create a master, and then import the data back.

Database-level collations

Database-level collations can be found in create database. When collate, you specify that if you want to modify database-level collations, you can use alter database. Collate to modify. In general, you cannot modify the collations of the system database (master, etc.) unless you use the previously mentioned method of modifying server-level collations to modify the system database.

Column-level collations

When you create or change a table, you can use the COLLATE clause to specify a collation for each string column. Of course, you can also modify column-level collations.

Expression-level collations

Expression-level means to specify collations when executing a sql statement, such as:

SELECT name FROM customer ORDER BY name COLLATE Latin1_General_CS_AI

This query statement indicates that it is sorted according to the collation of Latin1_General_CS_AI. One of the benefits of Expression-level collations is its flexibility.

Lab 1: resolve conflicts caused by collation mismatch

An error is reported when joining tables in two databases with different collations levels. It is possible to use expression-level collations to specify which collations to use to solve the problem. For example, using Collate Database_Default will resolve the conflict by defining or converting the field to the default collation of the current database.

Step1:

Create two tables, the first using the default collation and the second specifying collation on the stuname column.

Create table student1

(

Stuid int not null

Stuname nvarchar (20) not null

)

Create table student2

(

Stuid int not null

Stuname nvarchar (20) COLLATE Latin1_General_CS_AI not null

)

-- find the table connection Step2:

Select s1.century journal s2.* from student1 s1century student 2 s2 where s1.stuname=s2.stuname

Execute the above query and report an error as follows:

Cannot resolve the collation conflict between "Latin1_General_CS_AI" and "Chinese_PRC_CI_AS" in the equal to operation.

Then use Collate Database_Default in expression-level

Select s1.century journal s2.* from student1 s1century student 2 s2 where s1.stuname=s2.stuname Collate Database_Default

The above query was executed successfully.

It is important to note that collation can only be used on columns of string type, and an error will be reported if you use collate on an int column.

Experiment 2: the impact of changing collation on the database.

Objective 1:

Create a database to see if the default database collation of the data is the same as server collation.

Objective 2:

Insert Chinese into the database where database collation is Chinese_PRC_CI_AS, and then change collation to Latin1_General_CS_AI to see if the saved data has changed. If you change collation back to Chinese_PRC_CI_AS again, what will happen?

Purpose 3:

In the case of Latin1_General_CS_AI collation, insert Chinese, what will happen and how to solve it.

Experiment 1: test the storage length of nvarchar and varchar

-- create a database TESTDB3 with the default collation of Chinese_PRC_CI_AS

USE TESTDB1

CREATE TABLE test

(

Lastname NVARCHAR (8) NOT NULL,--nvarchar type, double byte storage

Title VARCHAR (8) NOT NULL,-- varchar type, single byte storage

)

Insert into test values ('name 1' title 1')

Select * from test

Insert into test values ('123456789);-- String or binary data would be truncated.

Insert into test values ('12345678')

Insert into test values (12345678')

Insert into test values ('one two three four five six seven eight', 'one two three four')

Select * from test

-- Summary:

/ *

1.nvarchar (n), which is stored in characters, whether English or Chinese. It can store up to n Chinese or English, but the storage space occupied is 2n+2 bytes.

2.varchar (n) is stored in bytes, with a maximum of n English letters and 2 Chinese characters. But the storage space occupied is n bytes.

* /

-- experiment on the impact of changes in 2:collation on data.

USE TESTDB1

Select * from test

-- step1: change the collation of the modified database from the default Chinese_PRC_CI_AS to Latin1_General_CS_AI. This collation is used in English-speaking countries.

Use master

Alter database TESTDB1 collate Latin1_General_CS_AI

-- step2: the database in the instance that is different from the default collation can be found through the following statement, and our previous TESTDB1 can be found.

Use master

SELECT

NAME AS DATABASE_NAME

, DATABASEPROPERTYEX (NAME,'COLLATION') AS DBCOLLATION

, SERVERPROPERTY ('COLLATION') AS SERVERCOLLATION

FROM SYS.DATABASES

WHERE CONVERT (SYSNAME,DATABASEPROPERTYEX (NAME,'COLLATION')) SERVERPROPERTY ('COLLATION')

-- step3: look at the data in the table after modifying the collation and find that the data has not changed.

USE TESTDB1

Select * from test

-- step4: insert data under the new collation.

Insert into test values ('name 1' title 1');-- after insertion, I found that Chinese has become garbled "?"

Select * from test

-- step5: because lastname is of type nvarchar, we specify nvarchar when inserting.

Insert into test values (N' name 2 'title 2');-- at this time found that the name is not garbled, the title is needless to say or garbled?

Select * from test

-- step6: try inserting nvarchar data in a field of type varchar.

Insert into test values (N 'name 3 minutes N' title 3');-- found that the title is not garbled.

Select * from test

-- step7: change the collation from Latin1_General_CS_AI to the default Chinese_PRC_CI_AS.

Use master

Alter database TESTDB1 collate Chinese_PRC_CI_AS

-- step8: look at the data in the table after modifying the collation and find that the data has not changed.

USE TESTDB1

Select * from test

-- step9: insert characters without specifying unicode type. The query is displayed in Chinese, indicating that the current collation uses unicode encoding by default.

Insert into test values ('name 414' title 4')

Select * from test

Summary:

The change of collation does not change the data originally stored in the database, what it is, or what it is after modification, has not changed.

Latin1_General_CS_AI defaults to non-unicode, so inserting Chinese into garbled code under this collation must be specified when inserting data in the form of unicode, that is, adding the keyword "N", while Chinese_PRC_CI_AS this collation uses double-byte code page, which defines all Chinese characters, so there is no need to specify the keyword "N" when inserting data.

You can insert nvarchar data into columns of the varchar data type, that is, using varcha to store unicode data.

The answer to the question about the role of Collation in SQL Server is shared here. I hope the above content can be of some help to you. If you still have a lot of doubts to be solved, you can follow the industry information channel to learn more about it.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.