In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
View query of Oracle character set and settings of Oracle character set modify view database available character set parameter settings
SELECT * FROM v$nls_valid_values ORDER BY parameter,VALUE
This article mainly discusses the following parts: how to view and query the oracle character set, modify the set character set, and common oracle utf8 character set and oracle exp character set problems.
What is the Oracle character set
The Oracle character set is a set of symbols for the interpretation of byte data, which can be divided into sizes and contain each other. ORACLE's architecture that supports national languages allows you to use localized languages to store, process, and retrieve data. It automatically adapts database tools, error messages, sort order, dates, times, currencies, numbers, and calendars to localized languages and platforms.
The most important parameter that affects the Oracle database character set is the NLS_LANG parameter:
It has the following format: NLS_LANG = language_territory.charset
It has three components (language, region, and character set), each of which controls the characteristics of the NLS subset.
Where:
Language: specifies the language of the server message, which affects whether the prompt message is in Chinese or English
Territory: specifies the date and number format of the server
Charset: specifies the character set.
For example, AMERICAN _ AMERICA. ZHS16GBK
From the composition of NLS_LANG, we can see that it is the third part that really affects the database character set.
So the character sets between the two databases can import and export data to each other as long as the third part is the same, and the previous one only affects whether the message is in Chinese or English.
How do I check the database version?
Select * from v$version contains version information, core version information, digit information (32-bit or 64-bit) and other digit information. On Linux/unix platform, you can view it through file
Such as: file $ORACLE_HOME/bin/oracle
2. View database character set
The database server character set select * from nls_database_parameters;, which derives from props$, is the character set that represents the database.
The client character set environment select * from nls_instance_parameters;, which is derived from v$parameter and represents the setting of the character set on the client side, which may be a parameter file, an environment variable, or a registry
Instance character set environment select * from nls_session_parameters; it mainly involves the values of NLS_LANGUAGE and NLS_TERRITORY. NLS_INSTANCE_PARAMETERS comes from v$parameter, note: a lot of materials on the Internet say that "NLS_INSTANCE_PARAMETERS represents the setting of the client's character set, which can be a parameter file, an environment variable or a registry", and everyone is on the Internet. Remember that it is the character set environment that represents the instance.
The character set of the client needs to be the same as that of the server in order to display the non-Ascii characters of the database correctly. If multiple settings exist, alter session > environment variables > registry > parameter file
The character set is required to be consistent, but the language setting can be different, and the language setting is recommended in English. If the character set is zhs16gbk, nls_lang can be American_America.zhs16gbk.
A character set involving three aspects
1. Character set on oracel server side
2. Character set on oracle clien t side
3. The character set of the dmp file.
When doing data import, all three character sets need to be consistent in order to import correctly.
2.1 query the character set on oracle server
There are many ways to find out the character set on the oracle server side. The more intuitive query method is as follows:
SQL > select userenv ('language') from dual
USERENV ('LANGUAGE')
SIMPLIFIED CHINESE_CHINA.ZHS16GBK
2.2 how to query the character set of dmp files
The dmp file exported with oracle's exp tool also contains the character set information, and the second and third bytes of the dmp file record the character set of the dmp file. If the dmp file is small, such as only a few megabytes or dozens of megabytes, you can open it with UltraEdit (hexadecimal), look at the contents of the second and third bytes, such as 0354, and then use the following SQL to find its corresponding character set:
SQL > select nls_charset_name (to_number ('0354)) from dual
ZHS16GBK
If the dmp file is large, such as more than 2G (which is also the most common case), and it is slow or impossible to open it with a text editor, you can use the following command (on the unix host):
Cat exp.dmp | od-x | head-1 | awk'{print $2 $3}'| cut-c 3-6
Then using the above SQL, you can also get its corresponding character set.
2.3 query the character set on the oracle client side
Under the windows platform, it is the NLS_LANG of the corresponding OracleHome in the registry. You can also set it yourself in the dos window, such as set nls_lang=AMERICAN_AMERICA.ZHS16GBK
This only affects the environment variables in this window.
On the unix platform, it is the environment variable NLS_LANG:
$echo $NLS_LANG
AMERICAN_AMERICA.ZHS16GBK
If the result of the check finds that the character set on the server side is not consistent with the character set on the client side, please modify it to the same character set as the server side.
Add:
(1)。 Database server character set
Select * from nls_database_parameters
Derived from props$, is the character set that represents the database.
(2)。 Client character set environment
Select * from nls_instance_parameters
It comes from v$parameter and indicates the setting of the client's character set, which may be a parameter file, an environment variable, or a registry.
(3)。 Session character set environment
Select * from nls_session_parameters
Derived from v$nls_parameters, indicates the setting of the session itself, which may be the environment variable of the session or the completion of alter session. If the session has no special settings, it will be consistent with nls_instance_parameters.
(4)。 The character set of the client needs to be the same as that of the server in order to display the non-Ascii characters of the database correctly.
If multiple settings exist, NLS role priority: Sql function > alter session > environment variables or registry > parameter file > database default parameters
The character set is required to be consistent, but the language setting can be different, and the language setting is recommended in English. If the character set is zhs16gbk, nls_lang can be American_America.zhs16gbk.
III. Modify the oracle character set
8i and above can modify the character set through alter database, but it is only limited from subset to superset. It is not recommended to modify the props$ table, which may lead to serious errors.
Startup nomount
Alter database mount exclusive
Alter system enable restricted session
Alter system set job_queue_process=0
Alter database open
Alter database character set zhs16gbk
As mentioned above, the database character set cannot be changed in principle after creation. Therefore, it is important to consider which character set to use at the beginning of your design and installation. For the database server, incorrect modification of the character set will lead to many unpredictable consequences, which may seriously affect the normal operation of the database, so be sure to confirm whether there is a subset and superset relationship between the two character sets before modification. In general, we do not recommend modifying the character set on the server side of the oracle database unless you have to. In particular, there is no subset and superset relationship between the two most commonly used character sets, ZHS16GBK and ZHS16CGB231280, so theoretically, the conversion between the two character sets is not supported.
However, there are two ways to modify the character set.
1. It usually needs to be converted by exporting database data, rebuilding the database, and then importing the database data.
two。 Modify the character set through the ALTER DATABASE CHARACTER SET statement, but it is limited to modify the character set after creating the database. Only if the new character set is a superset of the current character set can you modify the database character set. For example, UTF8 is the superset of US7ASCII, and you can use ALTER DATABASE CHARACTER SET UTF8 to modify the database character set.
3.1 modify the server character set (not recommended)
1. Shut down the database
SQL > SHUTDOWN IMMEDIATE
two。 Boot to Mount
SQL > STARTUP MOUNT
SQL > ALTER SYSTEM ENABLE RESTRICTED SESSION
SQL > ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0
SQL > ALTER SYSTEM SET AQ_TM_PROCESSES=0
SQL > ALTER DATABASE OPEN
-- here you can go from parent set to subset
SQL > ALTER DATABASE CHARACTER SET ZHS16GBK
SQL > ALTER DATABASE NATIONAL CHARACTER SET AL16UTF16
-- if you are from a subset to a parent set, you need to use the INTERNAL_USE parameter to skip the super subset detection.
SQL > ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8
SQL > ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE AL16UTF16
SQL > SHUTDOWN IMMEDIATE
SQL > STARTUP
Note: if there are no large objects, there is no effect on language conversion during use. (keep in mind that the character set must be supported by ORACLE, otherwise you cannot start.) just follow the above method.
If a prompt such as' ORA-12717: Cannot ALTER DATABASE NATIONAL CHARACTER SET when NCLOB data exists' 'appears
There are two ways to solve this problem.
1. Modify the locale using the INTERNAL_USE keyword
two。 Use re-create, but re-create is a little complicated, so please use internal_use
SQL > SHUTDOWN IMMEDIATE
SQL > STARTUP MOUNT EXCLUSIVE
SQL > ALTER SYSTEM ENABLE RESTRICTED SESSION
SQL > ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0
SQL > ALTER SYSTEM SET AQ_TM_PROCESSES=0
SQL > ALTER DATABASE OPEN
SQL > ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE UTF8
SQL > SHUTDOWN immediate
SQL > startup
If you follow the above approach, there will be no problem with the locale of National charset
3.2 modify the character set of dmp files
As mentioned above, the second and third bytes of the dmp file record the character set information, so directly modifying the contents of the second and third bytes of the dmp file can 'deceive' the oracle check. In theory, it can only be modified from subset to superset, but in many cases, it can also be modified without the relationship between subset and superset. Some commonly used character sets, such as US7ASCII,WE8ISO8859P1,ZHS16CGB231280,ZHS16GBK, can be changed. Because the change is only the dmp file, so it has little impact.
There are many specific modification methods, the simplest of which is to modify the second and third bytes of the dmp file directly with UltraEdit.
For example, if you want to change the character set of a dmp file to ZHS16GBK, you can use the following SQL to find the hexadecimal code corresponding to the character set: SQL > select to_char (nls_charset_id ('ZHS16GBK'),' xxxx') from dual
0354
Then change the 2 or 3 bytes of the dmp file to 0354.
If the dmp file is too large to be opened with ue, you need to use the program method.
3.3 client character set setting method
1) UNIX environment
$NLS_LANG= "simplified chinese" _ china.zhs16gbk
$export NLS_LANG
Edit the profile file for the oracle user
2) Windows environment
Edit the registry
Regedit.exe-"HKEY_LOCAL_MACHINE -" SOFTWARE-"ORACLE-HOME
Or set it in the window:
Set nls_lang=AMERICAN_AMERICA.ZHS16GBK
Four. knowledge of character set:
4.1 character set
In essence, according to a certain character coding scheme, a set of different numerical codes is assigned to a specific set of symbols. The earliest coding scheme supported by the Oracle database is US7ASCII.
The character set naming of Oracle follows the following naming rules:
That is:
For example, ZHS16GBK represents a 16-bit (two-byte) simplified Chinese character set in GBK encoding format.
4.2 character coding scheme
4.2.1 single byte encoding
(1) single-byte 7-bit character set, which can define 128characters. The most commonly used character set is US7ASCII.
(2) single-byte 8-bit character set, which can define 256 characters, which is suitable for most European countries.
For example: WE8ISO8859P1 (Western European, 8-bit, ISO standard 8859P1 code)
4.2.2 multibyte coding
(1) variable length multibyte coding
Some characters are represented by one byte, others by two or more characters, and variable length multibyte codes are often used to support Asian languages, such as Japanese, Chinese, Hindi, etc.
For example: AL32UTF8 (where AL stands for ALL, meaning applicable to all languages), zhs16cgb231280
(2) fixed length multibyte coding
Each character uses a fixed-length byte encoding scheme. At present, the only fixed-length multibyte encoding supported by oracle is AF16UTF16, which is also used only for national character sets.
4.2.3 unicode coding
Unicode is a single encoding scheme that covers all the known characters currently used in the world, that is, Unicode provides a unique encoding for each character. UTF-16 is a 16-bit encoding of unicode, a fixed-length multi-byte coding, which uses 2 bytes to represent a unicode character, and AF16UTF16 is a UTF-16 coded character set.
UTF-8 is an 8-bit encoding of unicode and a variable length multibyte coding. This encoding can represent a unicode character with 1, 2, 3 bytes. AL32UTF8,UTF8 and UTFE are UTF-8 coded character sets.
4.3 character set Super
When the coding values of one character set (character set A) contain the coding values of all other character sets (character set B), and the same encoding values of the two character sets represent the same characters, character set An is the super of character set B. or character set B is a subset of character set A.
A subset-Super comparison Table (subset-superset pairs) is available in the official documentation of Oracle8i and oracle9i, for example, WE8ISO8859P1 is a subset of WE8MSWIN1252. Because US7ASCII is the earliest Oracle database encoding format, many character sets are supersets of US7ASCII, such as WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK are all supersets of US7ASCII.
4.4 Database character set (oracle server-side character set)
The database character set is specified when the database is created and usually cannot be changed after creation. When you create a database, you can specify the character set (CHARACTER SET) and the national character set (NATIONAL CHARACTER SET).
4.4.1 character set
(1) used to store CHAR, VARCHAR2, CLOB, LONG and other types of data
(2) used to mark such as table names, column names and PL/SQL variables
(3) used to store SQL and PL/SQL program units, etc.
4.4.2 National character set:
(1) used to store NCHAR, NVARCHAR2, NCLOB and other types of data
(2) the national character set is essentially an additional character set selected for oracle, the main function is to enhance the character processing capacity of oracle, because the NCHAR data type can provide support for the use of fixed-length multi-byte coding in Asia, while the database character set cannot. The national character set is redefined in oracle9i and can only be selected in AF16UTF16 and UTF8 in unicode encoding. The default value is AF16UTF16.
4.4.3 query character set parameters
You can query the following data dictionary or view to view the character set settings
Nls_database_parameters 、 props$ 、 v$nls_parameters
In the query results, NLS_CHARACTERSET represents the character set and NLS_NCHAR_CHARACTERSET represents the national character set.
4.4.4 modify the database character set
As mentioned above, the database character set cannot be changed in principle after creation. But there are two ways to do it.
1. If you need to modify the character set, you usually need to export the database data, rebuild the database, and then import the database data to convert it.
two。 Modify the character set through the ALTER DATABASE CHARACTER SET statement, but it is limited to modify the character set after creating the database. Only if the new character set is a superset of the current character set can you modify the database character set. For example, UTF8 is the superset of US7ASCII, and you can use ALTER DATABASE CHARACTER SET UTF8 to modify the database character set.
4.5 client character set (NLS_LANG parameter)
4.5.1 client character set meaning
The client character set defines the encoding method of the client character data. Any character data sent from or sent to the client is encoded using the character set defined by the client. The client can be regarded as a variety of applications that can be directly connected to the database, such as sqlplus,exp/imp and so on. The client character set is set by setting the NLS_LANG parameter.
4.5.2 NLS_LANG parameter format
NLS_LANG=_.
Language: displays oracle messages, parity, date naming
Territory: specify default date, number, currency, etc. Format
Client character set: specifies the character set that the client will use
For example: NLS_LANG=AMERICAN_AMERICA.US7ASCII
AMERICAN is the language, AMERICA is the region, and US7ASCII is the client character set.
4.5.3 client character set setting method
1) UNIX environment
$NLS_LANG= "simplified chinese" _ china.zhs16gbk
$export NLS_LANG
Edit the profile file for the oracle user
2) Windows environment
Edit the registry
Regedit.exe-"HKEY_LOCAL_MACHINE -" SOFTWARE-"ORACLE-HOME
4.5.4 NLS parameter query
Oracle provides several NLS parameters to customize the database and user machine to adapt to the local format, such as NLS_LANGUAGE,NLS_DATE_FORMAT,NLS_CALENDER, which can be viewed by querying the following data dictionary or v$ view.
NLS_DATABASE_PARAMETERS: displays the current NLS parameter values of the database, including the values of the database character set
NLS_SESSION_PARAMETERS: displays the parameters set by NLS_LANG or the values changed by alter session (excluding the client character set set by NLS_LANG)
NLS_INSTANCE_PARAMETE: displays the parameters defined by the parameter file init.ora
V$NLS_PARAMETERS: displays the value of the current NLS parameter in the database
4.5.5 modify NLS parameters
You can modify the NLS parameter using the following methods
(1) modify the initialization parameter file used when the instance is started
(2) modify the environment variable NLS_LANG
(3) use the ALTER SESSION statement to modify the oracle session
(4) use some SQL functions
NLS role priority: Sql function > alter session > environment variables or registry > parameter files > database default parameters
V. EXP/IMP and character set
5.1 EXP/IMP
Export and Import are a pair of tools for reading and writing Oracle data. Export outputs the data from the Oracle database to the operating system files, and Import reads the data from these files to the Oracle database. Because when using exp/imp for data migration, the character set is involved in the process of data transfer from the source database to the target database. If the character sets of these four links are not consistent, character set conversion will occur.
EXP
_ _ _
| | imp Import File | | Database character set |
The four character sets are
(1) Source database character set
(2) user session character set during Export (set by NLS_LANG)
(3) user session character set during Import (set by NLS_LANG)
(4) Target database character set
5.2 exported conversion process
During the Export process, if the source database character set does not match the Export user session character set, a character set conversion occurs and the ID number of the Export user session character set is stored in the first few bytes of the exported file. Data loss may occur during this conversion process.
Example: if the source database uses ZHS16GBK, and the Export user session character set uses US7ASCII, because ZHS16GBK is a 16-bit character set, and US7ASCII is a 7-bit character set, during this conversion process, Chinese characters cannot find equivalent characters in US7ASCII, so all Chinese characters will be lost and become "?" form, so the data loss has occurred in the Dmp file generated after conversion.
Therefore, if you want to export the source database data correctly, the user session character set in the Export process should be equal to the source database character set or the superset of the source database character set. Export steps:
1. Check the source database character set: env | grep NLS_LANG or type SELECT t.* from nls_database_parameters t in plsql.
two。 Change the character set of the current session to the same character set as the source database: NLS_LANG=Simplified Chinese_China.UTF8 export NLS_LANG.
3. Perform an export action.
5.3 imported conversion process
(1) determine the exported database character set environment
By reading the header of the exported file, you can obtain the character set settings of the exported file
(2) determine the character set to import session, that is, the NLS_LANG environment variables used by Session.
(3) IMP reads the exported file
Read the exported file character set ID and compare it with the NLS_LANG of the import process
(4) if the export file character set is the same as the import Session character set, then there is no need to convert in this step, if different, you need to convert the data into the character set used by the import Session. As you can see, there are two character set conversions in the process of importing data into the database
The first time: the conversion between the character set of the import file and the character set used by the import Session. If the conversion process is not completed correctly, the import process of Import to the target database cannot be completed.
The second time: import the conversion between Session character set and database character set.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.