2025-04-19 Update. From: SLTechnology News&Howtos, shulou (Shulou.com), 06/03 report

iconv, enconv, enca, convmv, unix2dos, dos2unix file format conversion, plus od/cut/wc/dd/diff/uniq/nice/du and other commands

1. View the file encoding in Vim

:set fileencoding

This displays the encoding of the current file.

If you only want to view files in other encodings, or want to fix garbled text when opening files in Vim, add the following to your ~/.vimrc file:

set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936

With this setting Vim detects the file encoding automatically (for example UTF-8 or GBK files). In practice it simply tries each encoding in the fileencodings list in order. If none of them fits, Vim falls back to the value of encoding; many configurations append latin1 to the end of the list so that it acts as the final fallback (latin-1 is a single-byte superset of ASCII, not ASCII itself).

To convert a file's line endings to Unix format, run :set ff=unix and then save the file.

2. Transcoding a file in Vim

To convert a file's encoding directly in Vim, for example to UTF-8, set the option and then save the file with :w:

:set fileencoding=utf-8

3. iconv file transcoding

iconv converts a file from one encoding to another. For example, to convert a GBK-encoded (or GB2312-encoded) file to UTF-8:

iconv -f GBK -t UTF-8 file1 -o file2

iconv -f GB2312 -t UTF-8 test.txt -o test2.txt

Download address:

ftp://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.8.tar.gz
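The iconv usage above can be tried end to end with a small round trip. This is only a sketch: the file names and the temporary directory are made up for illustration, and it writes through shell redirection instead of -o so that it does not depend on a particular iconv build.

```shell
# Work in a throwaway directory (hypothetical file names).
workdir=$(mktemp -d)

# A short UTF-8 sample containing two Chinese characters.
printf '中文 test\n' > "$workdir/utf8.txt"

# UTF-8 -> GBK, then GBK -> UTF-8 again.
iconv -f UTF-8 -t GBK "$workdir/utf8.txt" > "$workdir/gbk.txt"
iconv -f GBK -t UTF-8 "$workdir/gbk.txt" > "$workdir/roundtrip.txt"

# Compare the sizes: the byte count changes with the encoding.
utf8_size=$(wc -c < "$workdir/utf8.txt")
gbk_size=$(wc -c < "$workdir/gbk.txt")
echo "UTF-8: $utf8_size bytes, GBK: $gbk_size bytes"

# The round trip should give back exactly the original bytes.
if cmp -s "$workdir/utf8.txt" "$workdir/roundtrip.txt"; then
    roundtrip=same
else
    roundtrip=different
fi
echo "round trip: $roundtrip"
```

Each Chinese character takes three bytes in UTF-8 and two in GBK, which is why the two files differ in size even though they hold the same text.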

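4. enconv file transcoding

enconv converts the encoding of a file in place. The -L option names the language of the text (not its encoding), and -x names the target encoding. For example, to convert a GBK- or GB2312-encoded Chinese file to UTF-8:

enconv -L zh_CN -x UTF-8 filename

enconv -L zh_CN -x UTF-8 test.txt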

5. enca: view the file encoding

If the command is not installed on your system, you can install it with sudo yum install -y enca.

$ enca filename

filename: Universal transformation format 8 bits; UTF-8
  CRLF line terminators

Note that enca does not recognize some GBK-encoded files reliably. In that case it reports:

Unrecognized encoding

6. convmv file name transcoding

When files are copied from Linux to Windows or from Windows to Linux, Chinese file names sometimes turn into garbled text. The reason is that Windows encodes Chinese file names in GBK by default, while Linux uses UTF-8 by default. Because the encodings differ, the names are displayed incorrectly, and the file names have to be transcoded to fix this.

Linux provides a dedicated tool, convmv, for converting file name encodings. It can convert a file name from GBK to UTF-8, or from UTF-8 to GBK.
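To see why the same name looks garbled on the other system, it can help to look at the raw bytes. The sketch below does not use convmv itself; it only uses iconv and od on a sample string (the helper name hex_bytes is made up here) to show that one name has two different byte sequences.

```shell
# Helper (hypothetical name): print stdin as one continuous hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# The bytes of the name as Linux would typically store it (UTF-8).
utf8_hex=$(printf '中文' | hex_bytes)

# The bytes of the same name as Windows would typically store it (GBK).
gbk_hex=$(printf '中文' | iconv -f UTF-8 -t GBK | hex_bytes)

echo "UTF-8 bytes: $utf8_hex"
echo "GBK bytes:   $gbk_hex"
```

A system that interprets the GBK bytes as UTF-8 (or the reverse) cannot decode them into the intended characters, which is the garbling that file name transcoding repairs.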

yum -y install convmv

The general usage of convmv is:

convmv -f source-encoding -t target-encoding [options] filename

Common options:

-r process subdirectories recursively

--notest actually perform the rename; by default convmv only does a dry run and does not touch any files

--list display all supported encodings

--unescape unescape sequences in file names, for example turning %20 into a space

For example, to convert a UTF-8-encoded file name to GBK, the command is:

convmv -f UTF-8 -t GBK --notest utf8-encoded-filename

After the conversion the file name is encoded in GBK. Only the encoding of the name changes; the file content is untouched.

7. unix2dos / dos2unix conversion

Use od -c -t x1 abc.txt to view the special characters in a text file. DOS/Windows uses \r\n as the end-of-line sequence, while Unix uses \n:

unix2dos < unix.txt > dos.txt

converts a plain text file in Unix format into a plain text file in DOS/Windows format.

dos2unix < dos.txt > unix.txt

converts a plain text file in DOS/Windows format into a plain text file in Unix format.

OpenOffice handles both line-ending styles transparently. If you see symbols such as ^M in vi, you can filter them out with tr or sed.

Text that has normal line breaks under Linux may lose them when it is opened on Windows.

A line break under Windows consists of two characters: carriage return (\r) and line feed (\n). Under Linux it is a single line feed (\n).

The conversion can be done with the unix2dos and dos2unix commands.

Options:

-k keep the date and time stamp of the output file the same as that of the input file

-o file old-file mode (the default): convert file and write the result back to file

-n infile outfile new-file mode: convert infile and write the result to outfile

1. unix2dos

Suppose a new text file a.txt is created with vi and contains 123456.

[root@centos test]# ls -l a.txt
-rw-r--r-- 1 root root 7 Jan 7 21:31 a.txt
[root@centos test]# hexdump -c a.txt
0000000   1   2   3   4   5   6  \n
0000007
[root@centos test]# unix2dos -n a.txt b.txt
unix2dos: converting file a.txt to file b.txt in DOS format.
[root@centos test]# ls -l
total 8
-rw-r--r-- 1 root root 7 Jan 7 21:31 a.txt
-rw------- 1 root root 8 Jan 7 21:34 b.txt
[root@centos test]# hexdump -c a.txt
0000000   1   2   3   4   5   6  \n
0000007
[root@centos test]# hexdump -c b.txt
0000000   1   2   3   4   5   6  \r  \n
0000008

b.txt is the converted DOS-format file.

2. dos2unix

[root@centos test]# dos2unix -n b.txt c.txt
dos2unix: converting file b.txt to file c.txt in UNIX format.
[root@centos test]# ls -l
total 12
-rw-r--r-- 1 root root 7 Jan 7 21:31 a.txt
-rw------- 1 root root 8 Jan 7 21:34 b.txt
-rw------- 1 root root 7 Jan 7 21:38 c.txt
[root@centos test]# hexdump -c b.txt
0000000   1   2   3   4   5   6  \r  \n
0000008
[root@centos test]# hexdump -c c.txt
0000000   1   2   3   4   5   6  \n
0000007

c.txt is the converted Unix-format text file.
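Where unix2dos and dos2unix are not installed, the same conversion can be approximated with standard tools, in the spirit of the tr/sed filtering mentioned earlier. This is a sketch, not a replacement for the real tools; the file names mirror the session above but live in a temporary directory.

```shell
# Throwaway directory with a Unix-format sample file.
crlfdir=$(mktemp -d)
printf '123456\n' > "$crlfdir/a.txt"

# Unix -> DOS: end every line with CR LF instead of LF.
awk '{ printf "%s\r\n", $0 }' "$crlfdir/a.txt" > "$crlfdir/b.txt"

# DOS -> Unix: delete every carriage return.
tr -d '\r' < "$crlfdir/b.txt" > "$crlfdir/c.txt"

# File sizes in bytes; the DOS file is one byte longer per line.
a_size=$(wc -c < "$crlfdir/a.txt")
b_size=$(wc -c < "$crlfdir/b.txt")
c_size=$(wc -c < "$crlfdir/c.txt")
echo "a.txt=$a_size b.txt=$b_size c.txt=$c_size"

# Show the line terminators of the DOS file.
od -c "$crlfdir/b.txt"
```

Note that tr -d '\r' removes every carriage return, including any that are not part of a line ending, so it is only suitable for ordinary text files.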

The od command

The od command is typically used to view the contents of a file in a special format. By choosing different options you can display a file in decimal, octal, hexadecimal, or as ASCII characters.

Syntax: od [options] file...

Options:

-A specifies the radix of the offsets (addresses):
d decimal
o octal (the default)
x hexadecimal
n do not print offsets

-t specifies the display format of the data. The main types are:
c ASCII character or backslash escape
d signed decimal
f floating point
o octal (the default is o2)
u unsigned decimal
x hexadecimal

Every type except c can be followed by a decimal number n that gives the number of bytes in each displayed value.

Description: od displays octal by default, which is where the name comes from (Octal Dump). Octal is rarely the most useful view, however; combining ASCII characters with hexadecimal usually gives more informative output.

od and hexdump display the bytes of a file or stream in octal, hexadecimal, or other encodings. They are useful for inspecting characters in a file that cannot be displayed directly on the terminal.

-w8 displays only 8 bytes per line:

[tim@L gx]$ od -Ad -tax1 -w8 a.txt
0000000   1   2   3   4   5   6  cr  nl
         31  32  33  34  35  36  0d  0a
0000008   a   b   c   d   e   f  cr  nl
         61  62  63  64  65  66  0d  0a
0000016   h   e   l   l   o   ,   w   o
         68  65  6c  6c  6f  2c  77  6f
0000024   r   l   d  cr  nl
         72  6c  64  0d  0a
0000029

-j2 skips the first two bytes of the input:

[tim@L gx]$ od -Ad -tax1 -j2 a.txt
0000002   3   4   5   6  cr  nl   a   b   c   d   e   f  cr  nl   h   e
         33  34  35  36  0d  0a  61  62  63  64  65  66  0d  0a  68  65
0000018   l   l   o   ,   w   o   r   l   d  cr  nl
         6c  6c  6f  2c  77  6f  72  6c  64  0d  0a
0000029

-N2 displays only two bytes, here in character and hexadecimal form:

[tim@L gx]$ od -Ad -tax1 -N2 a.txt
0000000   1   2
         31  32
0000002
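The od session above can be reproduced without an existing a.txt. The sketch below builds an equivalent sample with printf (the temporary file and variable names are illustrative) and then uses -N and -j together to pick out byte ranges.

```shell
# Recreate the three-line DOS-format sample used in the od session.
sample=$(mktemp)
printf '123456\r\nabcdef\r\nhello,world\r\n' > "$sample"

# Full dump: decimal offsets, named characters plus hex bytes.
od -Ad -tax1 "$sample"

# Total size in bytes (matches the final offset printed by od).
sample_size=$(wc -c < "$sample")

# -N2: only the first two bytes, as hex without offsets.
first_two=$(od -An -tx1 -N2 "$sample" | tr -d ' \n')

# -j2 -N4: skip two bytes, then show the next four.
next_four=$(od -An -tx1 -j2 -N4 "$sample" | tr -d ' \n')

echo "size=$sample_size first_two=$first_two next_four=$next_four"
```

Using -An drops the offset column, which makes the output easier to post-process in scripts.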

Use the wc command to count text content:

Command: wc
Usage: wc [-clw] file
Description: counts bytes (-c), lines (-l), and words (-w), depending on the options given. See man wc for more examples.

Example: count the entries in the current directory:

ls -l | wc -l

Note that ls -l also prints a leading "total" line, which is included in the count.

PS: this command has fewer options than the previous ones. Some people have implemented wc in C; you can try that as well.
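A quick way to check what each wc option counts is to feed it a known string. The sample text below is made up for illustration.

```shell
# Two lines, three words, fourteen bytes (including both newlines).
wc_lines=$(printf 'one two\nthree\n' | wc -l)
wc_words=$(printf 'one two\nthree\n' | wc -w)
wc_bytes=$(printf 'one two\nthree\n' | wc -c)

echo "lines=$wc_lines words=$wc_words bytes=$wc_bytes"
```

Reading from a pipe instead of naming a file keeps the file name out of the output, so the result is a bare number.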

Use the sort command to sort text content:

Command: sort
Usage: sort [-bcdfimMnr] [-o outfile] [-t separator] [+start -end] [--help] [--version] [file]

Options (see man sort for the full list):

-n sort by numeric value
-r sort in descending order
-u remove duplicate lines
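The effect of the three sort options is easiest to see on a small list of numbers. The input values here are arbitrary, and paste is used only to join the result onto one line.

```shell
# Default sort compares text, so "10" comes before "2".
lexical=$(printf '10\n2\n33\n2\n' | sort | paste -sd ' ' -)

# -n compares numeric values.
numeric=$(printf '10\n2\n33\n2\n' | sort -n | paste -sd ' ' -)

# -n -r: numeric, descending.
descending=$(printf '10\n2\n33\n2\n' | sort -n -r | paste -sd ' ' -)

# -n -u: numeric, duplicates removed.
unique=$(printf '10\n2\n33\n2\n' | sort -n -u | paste -sd ' ' -)

echo "lexical:    $lexical"
echo "numeric:    $numeric"
echo "descending: $descending"
echo "unique:     $unique"
```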

Use the uniq command to view and remove duplicate lines in text:

Command: uniq
Usage: uniq [options] file
Description: reports or filters repeated lines. uniq only compares adjacent lines, so the input is normally sorted first.

Options (see man uniq for the full list):

-c prefix each line with the number of times it occurs
-d show only lines that are repeated
-u show only lines that are not repeated
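Because uniq only looks at neighbouring lines, it is usually combined with sort. A minimal sketch on made-up input:

```shell
# Unsorted input with repeated lines.
input=$(printf 'b\na\nb\nc\na\nb\n')

# -c: count occurrences of each distinct line.
printf '%s\n' "$input" | sort | uniq -c

# How often "b" occurs, extracted from the -c output.
b_count=$(printf '%s\n' "$input" | sort | uniq -c | awk '$2 == "b" { print $1 }')

# -d: lines that occur more than once.
repeated=$(printf '%s\n' "$input" | sort | uniq -d | paste -sd ' ' -)

# -u: lines that occur exactly once.
single=$(printf '%s\n' "$input" | sort | uniq -u | paste -sd ' ' -)

echo "b occurs $b_count times; repeated: $repeated; single: $single"
```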

Use the diff command to compare text:

Command: diff
Usage: diff [options] file1 file2
Description: compares two files line by line and reports the differences.

Options (see man diff for the full list):

-i ignore differences between uppercase and lowercase
-b ignore differences in the amount of whitespace
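diff reports its result through the exit status as well as through its output: 0 means no differences were found, 1 means the files differ. The sketch below uses three hypothetical files to show how -i and -b change that result.

```shell
# Three small files that differ only in case or in spacing.
diffdir=$(mktemp -d)
printf 'Hello World\n'   > "$diffdir/a.txt"
printf 'hello world\n'   > "$diffdir/b.txt"
printf 'Hello   World\n' > "$diffdir/c.txt"

# Plain comparison: the case difference counts.
diff "$diffdir/a.txt" "$diffdir/b.txt" > /dev/null && plain_status=0 || plain_status=$?

# -i: case differences are ignored.
diff -i "$diffdir/a.txt" "$diffdir/b.txt" > /dev/null && case_status=0 || case_status=$?

# -b: differences in the amount of whitespace are ignored.
diff -b "$diffdir/a.txt" "$diffdir/c.txt" > /dev/null && space_status=0 || space_status=$?

echo "plain=$plain_status ignore-case=$case_status ignore-space=$space_status"
```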

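Use the du command to report the disk space used by directories or files:

Command: du
Usage: du [options] directory-or-file

Options (see man du for the full list):

-k / -m show sizes in KB or MB (some implementations, such as BSD du, also offer -g for GB)
-s show only a total for each specified directory, without listing its subdirectories
-h show sizes in human-readable units; often combined as -sh

du -S | sort -n lists directories sorted by the space they occupy, excluding their subdirectories, with the largest last.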
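The difference between a per-directory listing and a -s summary can be seen on a small scratch tree. The directory layout below is made up; the sizes du reports depend on the file system, so the sketch only prints them.

```shell
# Scratch tree with one 64 KB file in a subdirectory.
dudir=$(mktemp -d)
mkdir "$dudir/sub"
dd if=/dev/zero of="$dudir/sub/data.bin" bs=1024 count=64 2>/dev/null

# One line per directory, sizes in KB, smallest first.
du -k "$dudir" | sort -n

# -s: a single summary line for the whole tree.
size_kb=$(du -sk "$dudir" | cut -f1)
echo "total: ${size_kb} KB"
```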

Use the cut command to extract the data you need:

Command: cut
Usage: cut [options] file

Options:

-b select bytes
-c select characters: cut -c1-15 selects columns 1 to 15; cut -c1-4,8- selects columns 1 to 4 and everything from column 8 to the end of the line
-f select fields: cut -f1 selects the first field. Fields are separated by Tab by default (use -d to specify another delimiter), and adding -s skips lines that do not contain the delimiter

PS: take care when cutting Chinese text by byte, because a Chinese character occupies more than one byte (two in GBK, three in UTF-8).
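A few one-liners make the column and field options concrete. The sample strings are invented for illustration.

```shell
# -c: character columns 1-4 and 8 to the end of the line.
columns=$(echo 'abcdefghij' | cut -c1-4,8-)

# -f: second Tab-separated field (Tab is the default delimiter).
second_field=$(printf 'alpha\tbeta\tgamma\n' | cut -f2)

# -d: use ":" as the delimiter and pick fields 1 and 3.
picked=$(printf 'a:b:c\n' | cut -d: -f1,3)

# -s: drop lines that contain no delimiter at all.
only_delimited=$(printf 'no delimiter here\nx\ty\n' | cut -s -f1)

echo "$columns | $second_field | $picked | $only_delimited"
```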

Use the dd command to test disk speed and create new files:

Command: dd
Description: copies data from a specified input to a specified output.
Usage: bs specifies the size of each block, and count specifies the number of blocks to copy.

Create a 2 MB file:

# dd if=/dev/zero of=/home/test/2M.txt bs=1024 count=2048

The same approach tests disk write speed:

# dd if=/dev/zero of=/home/rwspeed.ret bs=1024 count=1048576

Copy data to another device:

# dd if=/home/test/my_fiter of=/<other device> bs=512 count=256

PS: Windows also has a command for creating files of a specified size, which is fsutil.
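The relationship between bs, count, and the resulting file size can be checked safely in a temporary directory. The output path below is hypothetical, chosen so that the sketch does not write under /home.

```shell
# Write 2048 blocks of 1024 bytes each into a scratch file.
dd_out="$(mktemp -d)/2M.bin"
dd if=/dev/zero of="$dd_out" bs=1024 count=2048 2>/dev/null

# The file size is bs * count bytes.
dd_size=$(wc -c < "$dd_out")
echo "created $dd_size bytes"
```

When dd's own status output is not redirected, it reports how long the copy took and the throughput, which is what the disk speed test relies on.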

Use the nice command to adjust the priority of program execution:

Command: nice
Description: runs a program with an adjusted scheduling priority.
Usage: the niceness of a Linux process ranges from -20 to 19. The smaller the number, the higher the priority, that is, the more CPU time the process gets. Ordinary users can only lower the priority of a program, while root can both raise and lower it.

# nice

prints the current default niceness.

# nice ./a.out

runs a.out with its niceness increased by 10 (the default adjustment), so it is allocated less CPU time.

# nice -n -20 a.out

runs a.out with the highest priority.

Unix/Linux offers a great many commands like the ones above, the product of many people and programmers around the world. Knowing the commands the system provides and using them well often yields twice the result with half the effort. Only a few of them are listed here; for other commands, refer to the introductions on this website.
