In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
Editor to share with you how to use load data infile in mysql. I hope you will get something after reading this article. Let's discuss it together.
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
[REPLACE | IGNORE]
INTO TABLE tbl_name
[FIELDS
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number LINES]
[(col_name_or_user_var,...)]
[SET col_name = expr,...]]
The LOAD DATA INFILE statement reads rows from a text file into a table at a high speed. The file name must be a text string.
For more information about the efficiency of INSERT and LOAD DATA INFILE and improving the speed of LOAD DATA INFILE, refer to the accelerated INSERT statement in the administrator's manual.
The character set indicated by the system variable character_set_database is used to interpret the information in the file. SET NAMES and setting character_set_client do not affect the interpretation of the input.
Users can also load data files using the import utility; it does so by sending a LOAD DATA INFILE command to the server. The-- local option causes import to read the data file from the client host. If the client and server support compression protocols, users can specify the-- compress option to achieve better performance in slower networks. Refer to the import-data importer in the administrator's manual.
If the user specifies a keyword, the execution of the LOW_PRIORITY,LOAD DATA statement will be delayed until no other client is reading the table.
If a GSSYS table meets the condition for simultaneous insertion (that is, the table has free blocks in the middle), and you specify a CONCURRENT for the GSSYS table, other threads retrieve data from the table while the LOAD DATA is executing. Even if no other thread is using this table at the same time, using this option will slightly affect the performance of LOAD DATA.
If the LOCAL keyword is specified, it interprets the connected client:
If LOCAL is specified, the client component on the client host reads the file and sends it to the server. You can give the full path to the file to determine its exact location. If a relative path is given, the file name is relative to the directory in which the client component was started.
If no LOCAL is specified, the file is located on the host of the server and is read directly by the server.
When locating files from the server host, the server uses the following rules:
If a complete path is given, the server uses that pathname.
If the relative path of one or more front components is given, the server searches for the file with the data directory relative to the server.
Given a file name that does not have a preamble, the server searches for the file from the current database directory.
Note:
These rules mean that a file given as'/ gsfile.txt' is read from the server's data directory, while a file given as `gsfile.txt' is read from the current database's data directory. For example, the following LOAD DATA statement reads the file 'data.txt', from the db1 database directory because db1 is the current database, even if the statement explicitly loads the file into a table in the db2 database, it will be read from the db1 directory:
Sqlcli > USE db1
Sqlcli > LOAD DATA INFILE 'data.txt' INTO TABLE db2.gs_table
Note:
When specifying a Windows pathname, you use a slash instead of a backslash. If you want to use a backslash, you must write double.
For security reasons, when reading a text file located on the server, the file must be in the database directory or be readable by all users. That is, when LOAD DATA INFILE is performed on a file on the server, the user must obtain FILE permission.
Refer to the permissions provided by GBase in the administrator's manual.
Using LOCAL is slightly slower than when the server accesses the file directly, because the contents of the file must be transferred over a client-to-server connection. For local files, on the other hand, you do not need to obtain FILE permissions.
LOCAL works only if both the server and the client allow it. For example, if when kernel starts,-- local-infile=0, then LOCAL does not work. Refer to the LOAD DATA LOCAL security issues in the administrator's manual.
The REPLACE and IGNORE keywords handle input records that duplicate existing primary key values.
If REPLACE is specified, the input row will replace the existing row (that is, the row with the same primary index value will be the existing row). Refer to the REPLACE syntax.
If IGNORE is specified, input lines that duplicate the existing row primary key values are skipped. If you do not specify either of the two, the action behavior depends on whether the LOCAL keyword is specified. If no LOCAL is specified, if a duplicate key value is found, an error is generated and the rest of the text file is ignored. If LOCAL is specified, the default operation behavior will be the same as that of the specified IGNORE; this is because the server has no way to terminate the transfer of files during the operation.
If you want to ignore foreign key constraints in the load operation, you can execute the SET FOREIGN_KEY_CHECKS=0 statement before executing the LOAD DATA.
If the user uses LOAD DATA INFILE on an empty GsSYS table, all non-unique indexes are created in batches (like REPAIR). This usually makes LOAD DATA INFILE faster when there are many indexes. Normally very fast, but in some extreme cases, users can use ALTER TABLE.. DISABLE KEYS closes them and uses ALTER TABLE after loading the files. ENABLE KEYS rebuilds the index, thereby speeding up index creation. Refer to the accelerated INSERT statement in the administrator's manual.
LOAD DATA INFILE is SELECT. Reverse operation of INTO OUTFILE. Refer to the SELECT syntax. Use SELECT... INTO OUTFILE writes data from a database to a file. Use LOAD DATA INFILE to read files into the database. The syntax of the FIELDS and LINES clauses of both commands is the same. Both clauses are optional, but if both are specified at the same time, the FIELDS clause must appear before the LINES clause.
If the user specifies a FIELDS clause, its clauses (TERMINATED BY, [OPTIONALLY] ENCLOSED BY, and ESCAPED BY) are also optional, but the user must specify at least one of them.
If the user does not specify a FIELDS clause, the default is the following statement:
FIELDS TERMINATED BY't 'ENCLOSED BY' 'ESCAPED BY'\'
If the user does not specify a LINES clause, the default is the following statement:
LINES TERMINATED BY'\ n 'STARTING BY''
In other words, when reading input, the default LOAD DATA INFILE looks like this:
Find the boundary of the line at the newline.
Do not omit any line prefixes.
Separate rows into fields at tabs.
Fields are not considered to be encapsulated by any quotation mark characters.
Interprets a locator, newline, or `\ 'that begins with "\" as a text character with a field value.
Conversely, when writing output, the default value causes SELECT... The INTO OUTFILE performance is as follows:
Add tabs between field values.
Do not encapsulate fields with any quotation mark characters.
Use "\" to escape locator, newline, or `\ 'character instances that appear in the field value.
Add a newline character at the end of the line.
Note:
In order to write FIELDS ESCAPED BY'\', the user must specify two backslashes, which are read as one backslash.
Note:
If it is a text file from a Windows system, you may have to use LINES TERMINATED BY'\ r\ n' to read the file, because the Windows system is characterized by the use of two characters as line Terminators. In some programs,\ r may be used as a line Terminator when writing a document, such as a WordPad. When reading such files, you need to use LINES TERMINATED BY'\ r'.
If all the lines to be read have prefixes that the user wants to ignore, you can use LINES STARTING BY 'prefix_string' to skip the prefix (and anything before it). If a line has no prefix, the entire line is skipped. Note that prefix_string may be in the middle of the line!
For example:
Sqlcli > LOAD DATA INFILE'/ tmp/test.txt'
-> INTO TABLE test LINES STARTING BY "xxx"
Use it to read files that contain the following:
Xxx "Row", 1
Something xxx "Row", 2
You can get the data ("row", 1) and ("row", 2).
The IGNORE number LINES option can be used to ignore lines at the beginning of the file. For example, you can use IGNORE 1 LINES to skip the first row with column names:
Sqlcli > LOAD DATA INFILE'/ tmp/test.txt'
-> INTO TABLE test IGNORE 1 LINES
When users use SELECT back and forth. When INTO OUTFILE and LOAD DATA INFILE write data from a database to a file and then read it into the database from the file, the fields and line processing options of the two commands must match. Otherwise, LOAD DATA INFILE will not interpret the contents of the file correctly. Suppose the user uses SELECT... INTO OUTFILE writes data to a file in comma-separated fields:
Sqlcli > SELECT * INTO OUTFILE 'data.txt'
-> FIELDS TERMINATED BY','
-> FROM table2
In order to read back a comma-separated file, the correct statement should be:
Sqlcli > LOAD DATA INFILE 'data.txt' INTO TABLE table2
-> FIELDS TERMINATED BY','
If the user tries to read the file with the statement shown below, it will not work because the command LOAD DATA INFILE distinguishes the field value with a locator:
Sqlcli > LOAD DATA INFILE 'data.txt' INTO TABLE table2
-> FIELDS TERMINATED BY'\ t'
The possible result is that each input line will be interpreted as a separate field.
LOAD DATA INFILE can also be used to read files obtained from external sources. For example, files in dBASE format have fields separated by commas and enclosed in double quotation marks. If the line in the file terminates with a newline character, the fields and line processing options that the user will use to load the file are described below:
Sqlcli > LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
-> FIELDS TERMINATED BY', 'ENCLOSED BY' "'
-> LINES TERMINATED BY'\ n'
Any field and row processing option can specify an empty string (''). If not empty, the FIELDS [OPTIONALLY] ENCLOSED BY and FIELDS ESCAPED BY values must be a single character. FIELDS TERMINATED BY and LINES TERMINATED BY values can be more than one character. For example, to write to a line terminated by a carriage return newline, or to read a file that contains such a line, you should specify a LINES TERMINATED BY'\ r\ n 'clause.
Reference to the FIELDS [OPTIONALLY] ENCLOSED BY control field. For the output (SELECT... INTO OUTFILE), if the user omits the word OPTIONALLY, all fields are surrounded by ENCLOSED BY characters. An example of such an output file with a comma as a field delimiter is as follows:
"1", "a string", "100.20"
"2", "a string containing a, comma", "102.20"
"3", "a string containing a\" quote "," 102.20 "
"4", "a string containing a\", quote and comma "," 102.20 "
If the user specifies that OPTIONALLY,ENCLOSED BY characters are only used to wrap fields such as fields that contain string types (such as CHAR,BINARY,TEXT or ENUM):
1, "a string", 100.20
2, "a string containing a comma", 102.20
3, "a string containing a\" quote ", 102.20
4, "a string containing a\", quote and comma ", 102.20
Note:
An ENCLOSED BY character that appears in a field value is escaped by using the ESCAPED BY character as its prefix. Also note that if the user specifies an empty ESCAPED BY value, it may result in an output file that cannot be read correctly by LOAD DATA INFILE. For example, if the escape character is empty, the output shown above becomes the output shown below. Notice the second field in the fourth line, which contains a comma followed by a quotation mark that looks like the end of a field:
1, "a string", 100.20
2, "a string containing a comma", 102.20
3, "a string containing a" quote ", 102.20
4, "a string containing a", "quote and comma", 102.20
For input, the ENCLOSED BY character, if present, is stripped from the end of the field value. This is true regardless of whether OPTIONALLY is specified or not; OPTIONALLY does not affect input interpretation. If the ESCAPED BY character exists before the ENCLOSED BY character, it is interpreted as part of the current field value
If a field begins with an ENCLOSED character, this character instance is considered to terminate a field value as long as it is followed by a field or line TERMINATED BY sequence. To be clear, if you want to use ENCLOSED BY characters in a field, you can repeat that character twice, and they will be interpreted as a single ENCLOSED BY character. For example, if you specify ENCLOSED BY'"', the quotation marks will be treated as follows:
"The"BIG"boss"-> The "BIG" boss
The "BIG" boss-> The "BIG" boss
The "" BIG "" boss-> The "" BIG "" boss
FIELDS ESCAPED BY controls how special characters are written or read. If the FIELDS ESCAPED BY character is not empty, it is used as a prefix for the following output characters:
FIELDS ESCAPED BY character
FIELDS [OPTIONALLY] ENCLOSED BY character.
The first character of the FIELDS TERMINATED BY and LINES TERMINATED BY values.
ASCII 0 (actually write ASCII '0characters after the escape character instead of a zero byte).
If the FIELDS ESCAPED BY character is empty, no characters will be escaped and the NULL value will still be output as NULL instead of\ N. It may not be a good idea to specify an empty escape character, especially if the user's data field value contains any of the characters in the list.
For input values, if the FIELDS ESCAPED BY character is not a null character, it is stripped when it occurs, and the following characters are then used as part of the field value. The exception is the escaped'0' or'N' (for example, or\ N, where the escape character is'). These sequences are understood as ASCII NUL (a zero byte) and NULL. The rules for NULL processing are described later in this section.
For more "\" escape syntax information, check the text value.
In some cases, fields interact with row processing:
If LINES TERMINATED BY is an empty string and FIELDS TERMINATED BY is not empty, the lines end with FIELDS TERMINATED BY.
If the FIELDS TERMINATED BY and FIELDS ENCLOSED BY values are both empty (''), then a fixed line (undelimited) format is used. When using a fixed row format, no delimiters are used between fields (but the user still has a line Terminator). The writing and reading of column values uses the display width of the column. For example, if a column is defined as INT (7), the value of the column will be written using a 7-character field. For input, the column value is obtained by reading 7 characters.
LINES TERMINATED BY is still used to detach rows. If a row does not contain all the fields, the rest of the column is set to their default values. If the user does not have a line Terminator, the user should set it to''. In this case, the text file must contain all the fields for each line.
The fixed line format also affects the processing of NULL values; see below. Note that if the user is using a multi-byte character set, the fixed-length format will not work.
There are many ways to handle NULL values, depending on the FIELDS and LINES options used by the user:
For the default FIELDS and LINES values, NULL is written as\ N when output, and\ N is read as NULL when read in (assuming the ESCAPED BY character is "\").
If FIELDS ENCLOSED BY is not a null value, the field containing the literal word NULL is read as the NULL value. This is different from the word NULL, which is surrounded by FIELDS ENCLOSED BY characters. The word is read as the string 'NULL'.
If FIELDS ESCAPED BY is empty, the NULL value is written as the word NULL.
In a fixed-line format (when both FIELDS TERMINATED BY and FIELDS ENCLOSED BY are null), NULL is written as an empty string. Note that this causes both the NULL value in the table and the empty string to be indistinguishable when being written to the file, because both are written as empty strings. If the user needs to be able to distinguish between the two when reading the file and returning it, the user should not use a fixed line format.
Some situations that cannot be supported by LOAD DATA INFILE:
Fixed-size record rows (FIELDS TERMINATED BY and FIELDS ENCLOSED BY are both empty) and BLOB or TEXT columns.
If the user specifies that one delimiter is the same as another, or the prefix of another, LOAD DATA INFILE may not interpret the input correctly. For example, the following FIELDS clause will cause problems:
FIELDS TERMINATED BY'"'ENCLOSED BY'"'
If FIELDS ESCAPED BY is empty, a field value that contains the value of FIELDS ENCLOSED BY or LINES TERMINATED BY followed by FIELDS TERMINATED BY will cause LOAD DATA INFILE to stop reading a field or line prematurely. This is because LOAD DATA INFILE does not know where the field or row value ends.
The following example loads all the columns of the persondata table:
Sqlcli > LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata
By default, when no field column is provided after the LOAD DATA INFILE statement, LOAD DATA INFILE assumes that the input row contains all the fields in the table column. If the user wants to load some columns in the table, specify a list of fields:
Sqlcli > LOAD DATA INFILE 'persondata.txt'
-> INTO TABLE persondata (col1,col2,...)
If the order of the fields in the input file is different from that of the columns in the table, the user must also specify a list of fields. Otherwise, GBase does not know how to match the input fields to the columns in the table.
Column lists can contain column names or user variables and support SET clauses. This makes it possible to assign values to user variables with input values and transform those values before assigning the results to the column.
The user variable in the SET clause can be used for many purposes. The following example takes the first column of the data file directly as the value of t1.column1 and assigns the second column to a user variable that performs a division operation before being the value of t2.column2
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @ var1)
SET column2 = @ var1/100
The SET clause can provide values that are not derived from the input file. The following statement sets column3 to the current date and time:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, column2)
SET column3 = CURRENT_TIMESTAMP
By assigning an input value to a user variable, you can discard the input value and not assign this value to a column of the table:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @ dummy, column2, @ dummy, column3)
Using column / variable lists and SET clauses is subject to the following restrictions:
The assignment column name in the SET clause should only be to the left of the assignment operator.
In the SET assignment statement, you can use subqueries. This query returns a value that will be assigned to the column, which may simply be a scalar query. You cannot use subqueries to query the tables that will be imported.
For column / variable lists or SET clauses, rows that are ignored because of the use of the IGNORE clause are not processed.
Because the user variable does not display the width, the user variable cannot be used when the imported data is in a fixed row format.
When processing input rows, LOAD DATA divides the rows into fields, and if a list of columns / variables and SET clauses are provided, these values are used according to it. The resulting rows are then inserted into the table. If this table has BEFORE INSERT or AFTER INSERT triggers, they are activated before and after rows are inserted.
If the record row entered has too many fields, the extra fields will be ignored and the number of warnings will be increased.
If an input row has fewer fields, the column without the input field is set to the default value. The default assignment is described in the CREATE TABLE syntax.
An empty field value is different from the explanation for the loss of field value:
For string types, the column is set to an empty string.
For numeric types, the column is set to 0.
For date and time types, the column is set to a zero value that is appropriate for the column type. Reference date and time type.
If you explicitly assign an empty string to a string, number, or date or time type in an INSERT or UPDATE statement, the user will get the same result as above.
The TIMESTAMP column is set to the current date and time only in two cases. In one case, when there is a NULL value in the column (that is,\ N); in the other case (for the first TIMESTAMP column only), when a field list is specified, the TIMESTAMP column is omitted from the field list.
LOAD DATA INFILE assumes that all inputs are strings, so users cannot set numeric values for ENUM or SET columns in the form of INSERT statements. All ENUM and SET must be specified as a string!
When the LOAD DATA INFILE query ends, it returns an information string in the following format:
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
After reading this article, I believe you have a certain understanding of "how to use load data infile in mysql". If you want to know more about it, you are welcome to follow the industry information channel. Thank you for reading!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.