2025-01-21 Update From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report
This article explains in detail how to use the inotify+rsync synchronization tools under Linux: first rsync on its own, then inotify-tools, and finally the two combined for real-time synchronization.
Using the synchronization tool inotify+rsync under Linux
1. rsync
1.1 What is rsync
rsync is a remote data synchronization tool that quickly synchronizes files between multiple hosts over a LAN or WAN. It uses the so-called "rsync algorithm" to synchronize files between local and remote hosts: only the parts of a file that differ are transferred, not the whole file each time, so it is quite fast. For this reason it is commonly used as a backup tool.
The machine running the rsync server is also called the backup server. One rsync server can back up the data of multiple clients at the same time, and multiple rsync servers can back up the data of one client. rsync can run over ssh or in daemon mode. In daemon mode the rsync server listens on port 873 and waits for rsync clients to connect. On connection, the server checks the password; if the check passes, the file transfer starts. The first synchronization transfers every file in full, and later synchronizations transfer only the parts that differ between the two sides.
Basic features:
It can mirror an entire directory tree or file system
It easily preserves the permissions, timestamps, soft and hard links, etc. of the original files
It can be installed without special privileges
Its transfer process is optimized, so file transfer is efficient
Files can be transferred through a remote shell such as rsh or ssh, or through a direct socket connection
Anonymous transfer is supported
Command syntax:
The rsync command can take the following six forms:
rsync [OPTION]... SRC DEST
rsync [OPTION]... SRC [USER@]HOST:DEST
rsync [OPTION]... [USER@]HOST:SRC DEST
rsync [OPTION]... [USER@]HOST::SRC DEST
rsync [OPTION]... SRC [USER@]HOST::DEST
rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
Leaving aside the first, purely local form, the remaining command forms fall into two modes of operation:
Shell mode: rsync connects through a remote shell program such as ssh or rsh. This mode is used when the host name in the source or destination path is followed by a single colon. It works as soon as rsync is installed, with no daemon to start. (The author has not tried this mode.)
Daemon mode: rsync connects directly to an rsync daemon over TCP. This mode is used when the host name in the source or destination path is followed by two colons, or when an rsync:// URL is given. No remote shell is needed, but an rsync daemon must be running on one machine, by default on port 873. The daemon can run as a standalone process started with rsync --daemon, or be managed through the xinetd super-server.
When rsync runs as a daemon it needs a user identity. Root privileges are required to enable chroot, to bind to a port below 1024 (such as the default 873), or to set file ownership. Without chroot, the daemon can run as a non-root user, but that user must be able to read and write the data, log, and lock files of the corresponding module. In daemon mode rsync also needs a configuration file, rsyncd.conf. The daemon does not need to be restarted after this file is modified, because the file is reread on every client connection.
Generally speaking, the remote side that holds DEST is called the rsync server, and the side holding SRC, where the rsync command is run, is called the client.
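The colon rules above can be sketched as a small bash function. This is only an illustration: rsync_mode is a hypothetical helper, not part of rsync, and it assumes that a path starting with /, ./ or ../ is always local.

```shell
#!/usr/bin/env bash
# Classify an rsync SRC or DEST argument by the transport it implies.
# rsync_mode is a hypothetical helper, not an rsync command.
rsync_mode() {
    case "$1" in
        rsync://*)   echo daemon ;;  # rsync:// URL: talks to an rsync daemon
        /*|./*|../*) echo local ;;   # explicit filesystem path, even if it contains ':'
        *::*)        echo daemon ;;  # HOST::MODULE: talks to an rsync daemon
        *:*)         echo shell ;;   # HOST:PATH: goes through ssh or rsh
        *)           echo local ;;   # no host part: plain local copy
    esac
}

for arg in /root/ sean@172.29.88.223:/tmp/rsync_bak2 \
           sean@172.29.88.223::module_test rsync://172.29.88.223/module_test; do
    printf '%-40s %s\n' "$arg" "$(rsync_mode "$arg")"
done
```

The order of the case branches matters: the double-colon test must come before the single-colon test, otherwise every daemon-style argument would be reported as shell mode.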
Installation:
rsync is installed by default on CentOS 6. If it is missing, install it with yum install rsync -y. The server side and the client side use the same package.
    rsync -h
1.2 Synchronization test
For descriptions of the many rsync command options, see another article, "rsync and inotifywait commands and configuration options".
1.2.1 Local folder synchronization
    rsync -auvrtzopgP --progress /root/ /tmp/rsync_bak/
You will see the list of files transferred from /root/ to /tmp/rsync_bak/ together with the transfer rate. If you run the command again, nothing is listed under "sending incremental file list" because nothing needs to be copied. Touch a file under /root/ and run it once more to see that only the modified file is synchronized.
A few points need to be kept in mind:
Deleting files under /root/ does not delete them under /tmp/rsync_bak/ unless the --delete option is added
A file is treated as modified when its size or modification time changes; with -a, changes to attributes such as permissions are propagated as well
Because of the -u option, a file that is newer in the destination directory than in the source directory is not overwritten
A trailing slash on the source path changes the meaning: with a slash, only the contents of the directory are copied; without a slash, the directory itself is copied along with its contents
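The trailing-slash and --delete behaviour can be checked safely on throwaway directories. The following is a minimal sketch, assuming rsync is installed; the directory names under the temporary folder are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: trailing-slash semantics and --delete, using throwaway directories.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/with_slash" "$work/without_slash"
echo hello > "$work/src/a.txt"
echo world > "$work/src/sub/b.txt"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash: copy the contents of src into the destination.
    rsync -a "$work/src/" "$work/with_slash/"
    # No trailing slash: copy the directory itself, creating without_slash/src/.
    rsync -a "$work/src" "$work/without_slash/"

    # Deleting a source file is only mirrored when --delete is given.
    rm "$work/src/a.txt"
    rsync -a "$work/src/" "$work/with_slash/"
    kept=$( [ -e "$work/with_slash/a.txt" ] && echo yes || echo no )
    rsync -a --delete "$work/src/" "$work/with_slash/"
    gone=$( [ -e "$work/with_slash/a.txt" ] && echo no || echo yes )
    echo "kept without --delete: $kept; removed with --delete: $gone"
else
    echo "rsync not installed; skipping demo"
fi
```

Because --delete removes files on the destination, it is worth trying a new command with --dry-run first.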
1.3 Synchronize to a remote server
When rsync transfers files between servers, one host must run the rsync service. The service needs two configuration files: the main configuration file and a password file. The main configuration file states the user and group the daemon runs as, which matter when file permissions and ownership are changed; if they are wrong, permission problems can occur. It also defines the modules and the security settings of each module. Module names are chosen freely. For each module you can add user name and password verification, restrict client IPs, set whether the directory is writable, and so on. Different modules are used to synchronize directories with different requirements.
1.3.1 Server configuration files
/etc/rsyncd.conf:
    # 2014-12-11 by Sean
    uid=root
    gid=root
    use chroot=no
    max connections=10
    timeout=600
    strict modes=yes
    port=873
    pid file=/var/run/rsyncd.pid
    lock file=/var/run/rsyncd.lock
    log file=/var/log/rsyncd.log

    [module_test]
    path=/tmp/rsync_bak2
    comment=rsync test logs
    auth users=sean
    uid=sean
    gid=sean
    secrets file=/etc/rsyncd.secrets
    read only=no
    list=no
    hosts allow=172.29.88.204
    hosts deny=0.0.0.0/32
This configuration transfers files over a direct socket connection on port 873. [module_test] starts the definition of a module, which specifies the directory that receives the synchronized files (path), the authorized users, the password file, the client IP that is allowed to send, and so on. For a detailed description of the options in the configuration file, see "rsync and inotifywait commands and configuration options".
Testing showed that a comment cannot be appended with # at the end of a line in this configuration file; a comment has to occupy a line of its own.
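Since trailing comments are not accepted, it can help to scan a configuration file for them before deploying it. The following is a rough sketch: find_trailing_comments is a hypothetical helper, and it would also flag a value that legitimately contains a # character.

```shell
#!/usr/bin/env bash
# find_trailing_comments is a hypothetical helper: it prints every line in
# which a '#' follows other content, i.e. a comment appended to a parameter.
find_trailing_comments() {
    # Whole-line comments and blank lines do not match this pattern.
    grep -n -E '^[[:space:]]*[^#[:space:]][^#]*#' "$1"
}

conf=$(mktemp)    # stand-in for /etc/rsyncd.conf
cat > "$conf" <<'EOF'
# whole-line comments like this one are fine
uid=root
port=873   # listening port
[module_test]
path=/tmp/rsync_bak2
EOF

find_trailing_comments "$conf" || echo "no trailing comments found"
```

Each reported line is prefixed with its line number, so the offending comment can be moved to its own line.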
/etc/rsyncd.secrets:
    sean:passw0rd
One user per line, in the form user name:password. These user names and passwords have nothing to do with operating system accounts and can be chosen freely; they correspond to auth users in /etc/rsyncd.conf.
Restrict the permissions: chmod 600 /etc/rsyncd.secrets.
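With strict modes=yes the daemon checks the permissions of the secrets file, so creating the file with tight permissions from the start avoids authentication failures. A minimal sketch follows; write_secret is a hypothetical helper, and a temporary file stands in for /etc/rsyncd.secrets.

```shell
#!/usr/bin/env bash
# write_secret is a hypothetical helper: it stores one "user:password"
# line in a file that only the owner can read or write.
write_secret() {
    local file=$1 user=$2 password=$3
    # Create the file without group/other access.
    ( umask 077; printf '%s:%s\n' "$user" "$password" > "$file" )
    # Also tighten the mode if the file already existed.
    chmod 600 "$file"
}

secrets=$(mktemp)                   # stand-in for /etc/rsyncd.secrets
write_secret "$secrets" sean passw0rd
mode=$(stat -c '%a' "$secrets")     # GNU stat: prints the octal mode
echo "mode of $secrets: $mode"
```

The same helper can produce the client-side password file by writing only the password, which is the format that --password-file expects.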
1.3.2 Start the rsync daemon on the server
Edit the /etc/xinetd.d/rsync file and change disable to no:
    # default: off
    # description: The rsync server is a good addition to an ftp server, as it \
    #   allows crc checksumming etc.
    service rsync
    {
        disable         = no
        flags           = IPv6
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/bin/rsync
        server_args     = --daemon
        log_on_failure  += USERID
    }
Running service xinetd restart also restarts the rsync daemon, which reads /etc/rsyncd.conf by default. Alternatively, start the daemon directly with /usr/bin/rsync --daemon --config=/etc/rsyncd.conf.
To keep rsync from writing too many useless entries to /var/log/messages (which fills up easily and buries important information), it is recommended to comment out the log_on_success line in /etc/xinetd.conf:
    # log_on_success = PID HOST DURATION EXIT
If you use a firewall, add a rule that allows the client IP to reach port 873, then check that the rule is active and that the port is listening:
    -A INPUT -p tcp -m state --state NEW -m tcp --dport 873 -j ACCEPT
    # iptables -L
    # netstat -anp | grep 873
It is recommended to turn off SELinux, whose strict access control may cause synchronization errors.
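Once the daemon is running and the firewall rule is in place, reachability of port 873 can also be probed from the client. The sketch below relies on bash's /dev/tcp feature and the coreutils timeout command; port_open is a hypothetical helper, and 127.0.0.1 is a placeholder to be replaced with the server address.

```shell
#!/usr/bin/env bash
# port_open is a hypothetical helper: it succeeds if a TCP connection to
# host:port can be opened within 3 seconds.
port_open() {
    local host=$1 port=$2
    # /dev/tcp/HOST/PORT is a bash feature, so the probe runs in an explicit bash.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Replace 127.0.0.1 with the address of the rsync server.
if port_open 127.0.0.1 873; then
    echo "rsync daemon port is reachable"
else
    echo "port 873 is closed or filtered"
fi
```

A successful probe only shows that something accepts connections on the port; authentication and module settings are tested by the actual synchronization in the next step.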
1.3.3 Test synchronization from the client
For one-way synchronization, the client only needs a file that contains the password.
/etc/rsync_client.pwd:
    passw0rd
chmod 600 /etc/rsync_client.pwd
Commands:
Synchronize the local /root/ directory to the remote /tmp/rsync_bak2 directory on 172.29.88.223 (selected through module_test):
    /usr/bin/rsync -auvrtzopgP --progress --password-file=/etc/rsync_client.pwd /root/ sean@172.29.88.223::module_test
You can also synchronize in the other direction, from the remote /tmp/rsync_bak2 directory to the local directory /root/:
    /usr/bin/rsync -auvrtzopgP --progress --password-file=/etc/rsync_client.pwd sean@172.29.88.223::module_test /root/
As these two commands show, the distinction between server and client is blurry here. In both cases the rsync daemon runs on the remote host 172.29.88.223. The first command actively pushes the local directory to the remote host, which acts as the backup server. The second command pulls files from the remote host to the local machine, which then acts as the backup; it can also be regarded as restoring data to the local server.
1.4 Shortcomings of rsync
Compared with traditional backup methods such as cp and tar, rsync offers good security, fast backups, and support for incremental backup. It covers backup needs that are not strongly real-time, such as periodically backing up file server data to a remote server or periodically mirroring a local disk.
As application systems grow, the requirements for data security and reliability rise as well, and rsync shows several shortcomings in demanding business systems. First, when rsync synchronizes data it has to scan all files and compare them before it can transfer the differences. If the number of files reaches millions or tens of millions, the scan is very time-consuming, and since usually only a small part of the files has changed, this is inefficient. Second, rsync cannot monitor and synchronize data in real time. Synchronization can be triggered through crontab, but there is always a gap between two runs, so server and client data may be inconsistent and the data cannot be fully recovered after an application failure. These limitations are what led to the rsync+inotify combination.
2. inotify-tools
2.1 What is inotify
inotify is a powerful, fine-grained, asynchronous file system event monitoring mechanism, introduced in Linux kernel 2.6.13. It lets a monitoring program open a single file descriptor and watch one or more files or directories for a set of events such as open, close, move/rename, delete, create, or attribute change.
CentOS 6 already supports it:
Run ll /proc/sys/fs/inotify and check whether the following three entries are listed; if they are not, inotify is not supported.
    total 0
    -rw-r--r-- 1 root root 0 Dec 11 15:23 max_queued_events
    -rw-r--r-- 1 root root 0 Dec 11 15:23 max_user_instances
    -rw-r--r-- 1 root root 0 Dec 11 15:23 max_user_watches
/proc/sys/fs/inotify/max_queued_events is the maximum number of events that can be queued in an inotify instance created by inotify_init. Events beyond this limit are dropped, but an IN_Q_OVERFLOW event is generated.
/proc/sys/fs/inotify/max_user_instances is the upper limit on the number of inotify instances that can be created per real user ID.
/proc/sys/fs/inotify/max_user_watches is the upper limit on the number of watches (monitored files or directories) per real user ID. If a large number of files is being monitored, increase this value as appropriate.
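The three limits can be read in one go with a small loop. The following is a minimal sketch; show_inotify_limits is a hypothetical helper, and the value in the commented sysctl line is only an example, not a recommendation from the article.

```shell
#!/usr/bin/env bash
# show_inotify_limits is a hypothetical helper: it prints each inotify limit
# found in the given directory (default: /proc/sys/fs/inotify).
show_inotify_limits() {
    local dir=${1:-/proc/sys/fs/inotify} name
    for name in max_queued_events max_user_instances max_user_watches; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s=unavailable\n' "$name"
        fi
    done
}

show_inotify_limits

# Raising the watch limit needs root; 1048576 is only an example value.
# sysctl -w fs.inotify.max_user_watches=1048576
```

A change made with sysctl -w lasts until reboot; a persistent setting would normally go into /etc/sysctl.conf.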
inotify-tools:
inotify-tools provides a C library interface to inotify under Linux, together with a set of command-line tools for monitoring file system events. It is written in C and has no dependencies other than inotify support in the kernel. It ships two tools: inotifywait, which monitors changes to files or directories, and inotifywatch, which gathers statistics on file system access events.
Download inotify-tools-3.14-1.el6.x86_64.rpm and install it with rpm:
    # rpm -ivh /apps/crm/soft_src/inotify-tools-3.14-1.el6.x86_64.rpm
    warning: /apps/crm/soft_src/inotify-tools-3.14-1.el6.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 4026433f: NOKEY
    Preparing...                #
       1:inotify-tools          #
    # rpm -qa | grep inotify
    inotify-tools-3.14-1.el5.x86_64
2.2 inotifywait usage example
Monitor the /root/tmp directory for file changes:
    /usr/bin/inotifywait -mrq --timefmt '%Y/%m/%d-%H:%M:%S' --format '%T %w%f' \
        -e modify,delete,create,move,attrib /root/tmp/
This command continuously monitors the /root/tmp directory and its subdirectories for file changes (modification, deletion, creation, move, and attribute change) and prints each event to the screen. After it is started, creating or modifying a file under /root/tmp produces output like this:
    2014/12/11-15:40:04 /root/tmp/new.txt
    2014/12/11-15:40:22 /root/tmp/.new.txt.swp
    2014/12/11-15:40:22 /root/tmp/.new.txt.swx
    2014/12/11-15:40:22 /root/tmp/.new.txt.swx
    2014/12/11-15:40:22 /root/tmp/.new.txt.swp
    2014/12/11-15:40:22 /root/tmp/.new.txt.swp
    2014/12/11-15:40:23 /root/tmp/.new.txt.swp
    2014/12/11-15:40:31 /root/tmp/.new.txt.swp
    2014/12/11-15:40:32 /root/tmp/4913
    2014/12/11-15:40:32 /root/tmp/new.txt
    2014/12/11-15:40:32 /root/tmp/new.txt~
    2014/12/11-15:40:32 /root/tmp/new.txt
    ...
3. rsync combined with inotify-tools for real-time synchronization
The core of this step is a script on the client side, rsync.sh, which uses inotifywait to monitor changes in a local directory and triggers rsync to transfer the changed files to the remote backup server. To stay close to a real deployment, some subdirectories, such as /root/tmp/log, and temporary files are excluded from synchronization.
3.1 Create a list of files excluded from synchronization
There are two ways to exclude files or directories that do not need to be synchronized. In the first, inotify monitors the entire directory and the exclusions are applied only in rsync, which is simple. In the second, inotify itself skips the excluded directories and rsync applies the same exclusions, which avoids unnecessary network bandwidth and CPU consumption. We choose the second.
3.1.1 inotifywait exclusion
This is done on the client side. Assume that the files in the /tmp/src/mail/2014/ and /tmp/src/mail/2015/cache/ directories do not need to be synchronized and therefore do not need to be monitored, while all other files and directories under /tmp/src/ are synchronized. (For temporary files that are held open, you can listen for the close_write event instead of modify.)
inotifywait can exclude paths from monitoring in two ways, --exclude and --fromfile, which can be combined. The former takes a regular expression, while the latter accepts only specific directories or files.
    # vi /etc/inotify_exclude.lst:
    /tmp/src/pdf
    @/tmp/src/2014
With --fromfile, only absolute paths can be used, and patterns such as * are not matched; a leading @ marks a path as excluded.
If the exclusion rule is complex and needs a regular expression, it has to be passed to inotifywait as an option, for example --exclude '(.*/*\.log|.*/*\.swp)$|^/tmp/src/mail/(2014|201*/cache*)'. This excludes the 2014 directory under /tmp/src/mail/, all files or directories with cache in their name under the 201* directories, and all files under /tmp/src ending in .log or .swp.
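Because the --exclude option takes a POSIX extended regular expression, a pattern can be previewed against sample paths with grep -E before it is handed to inotifywait. The sketch below uses a deliberately simplified pattern, not the article's exact expression, and is_excluded is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Simplified pattern for illustration, not the article's exact expression:
# skip *.log and *.swp files, and everything under /tmp/src/mail/2014.
exclude_re='\.(log|swp)$|^/tmp/src/mail/2014(/|$)'

# is_excluded is a hypothetical helper: succeeds if the path matches the pattern.
is_excluded() {
    printf '%s\n' "$1" | grep -q -E "$exclude_re"
}

for path in /tmp/src/app/error.log /tmp/src/.new.txt.swp \
            /tmp/src/mail/2014/inbox /tmp/src/mail/2015/inbox; do
    if is_excluded "$path"; then
        echo "skip   $path"
    else
        echo "watch  $path"
    fi
done
```

Anchoring the directory part with ^ and ending it with (/|$) keeps the rule from also matching sibling names that merely start with 2014.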
3.1.2 rsync exclusion
If inotifywait excludes certain directories from monitoring, rsync must exclude the same directories; otherwise, whenever a synchronization is triggered, directories that should not be synchronized are transferred as well. As with inotifywait, rsync offers two ways to exclude: --exclude and --exclude-from.
I prefer to keep the excluded directories in a separate list file, which is easier to manage. When using --exclude-from=FILE, give the list file itself as an absolute path, but use relative paths for the entries inside FILE, for example:
/etc/rsyncd.d/rsync_exclude.lst:
    mail/2014/
    mail/201*/201*/201*/.??*
    mail??*
    src/*.html*
    src/js/
    src/ext3/
    src/2014/20140[1-9]/
    src/201*/201*/201*/.??*
    membermail/
    membermail??*
    membermail/201*/201*/201*/.??*
The excluded content includes the 2014 directory under mail, temporary or hidden files under paths such as 2015/201501/20150101/, and so on.
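How relative entries in an exclude list behave can be tried on a throwaway tree. The following is a minimal sketch, assuming rsync is installed; the file names and the two-line exclude list are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: relative entries in an --exclude-from list, on throwaway directories.
work=$(mktemp -d)
mkdir -p "$work/src/mail/2014" "$work/src/mail/2015" "$work/dst"
echo old > "$work/src/mail/2014/a.eml"
echo new > "$work/src/mail/2015/b.eml"
echo tmp > "$work/src/mail/2015/.b.eml.swp"

# Entries are relative to the directory being transferred ($work/src/).
cat > "$work/exclude.lst" <<'EOF'
mail/2014/
.??*
EOF

if command -v rsync >/dev/null 2>&1; then
    rsync -a --exclude-from="$work/exclude.lst" "$work/src/" "$work/dst/"
    # List what actually arrived in the destination.
    (cd "$work/dst" && find . -type f | sort)
else
    echo "rsync not installed; skipping demo"
fi
```

The entry with a trailing slash applies to directories only, while .??* is matched against the last path component, so it catches hidden files at any depth.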
3.2 Client script for synchronizing to the remote server: rsync.sh
The following is a complete synchronization script; tailor it as needed. rsync.sh:
    #!/bin/bash
    #rsync auto sync script with inotify
    #2014-12-11 Sean
    #variables
    current_date=$(date +%Y%m%d_%H%M%S)
    source_path=/tmp/src/
    log_file=/var/log/rsync_client.log

    #rsync
    rsync_server=172.29.88.223
    rsync_user=sean
    rsync_pwd=/etc/rsync_client.pwd
    rsync_module=module_test
    INOTIFY_EXCLUDE='(.*/*\.log|.*/*\.swp)$|^/tmp/src/mail/(2014|201*/cache*)'
    RSYNC_EXCLUDE='/etc/rsyncd.d/rsync_exclude.lst'

    #rsync client pwd check
    if [ ! -e "${rsync_pwd}" ]; then
        echo -e "rsync client password file ${rsync_pwd} does not exist!"
        exit 1    # non-zero status signals the failure to the caller
    fi

    #inotify_function
    inotify_fun(){
        # the pattern is quoted so the shell does not expand its * and | characters
        /usr/bin/inotifywait -mrq --timefmt '%Y/%m/%d-%H:%M:%S' --format '%T %w%f' \
            --exclude "${INOTIFY_EXCLUDE}" -e modify,delete,create,move,attrib "${source_path}" \
            | while read -r file
        do
            # every reported event triggers one rsync run over the whole source path
            /usr/bin/rsync -auvrtzopgP --exclude-from="${RSYNC_EXCLUDE}" --progress --bwlimit=200 --password-file="${rsync_pwd}" "${source_path}" "${rsync_user}@${rsync_server}::${rsync_module}"
        done
    }

    #inotify log
    inotify_fun >> "${log_file}" 2>&1 &
--bwlimit=200 caps the transfer rate at 200 KB/s. In practice, without a rate limit the CPU consumption turned out to be very high.
Run the script on the client side with ./rsync.sh to synchronize the directory in real time.
Open question
One doubt remains about synchronizing very large file sets with rsync. Suppose that, even after excluding the unmonitored and unsynchronized directories, there are still 100,000 files and the file list alone reaches 10 MB. Every file creation or modification triggers a synchronization, so most of the work is transferring and comparing the file list. Spending that much network bandwidth and CPU to synchronize a single small file is expensive, especially on a poor network, where the previous file list may not have finished transferring before a new file triggers another one. Does rsync handle this case internally?
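One possible mitigation, which is not from the article and does not answer the question about rsync internals, is to coalesce bursts of events so that rsync runs once per burst rather than once per event. The sketch below assumes bash; coalesce_events is a hypothetical helper, and the quiet period of one second is arbitrary.

```shell
#!/usr/bin/env bash
# coalesce_events is a hypothetical helper: it reads event lines on stdin
# and runs the given command once per burst instead of once per line.
# Usage: coalesce_events QUIET_SECONDS COMMAND [ARGS...]
coalesce_events() {
    local quiet_secs=$1 line
    shift
    while IFS= read -r line; do
        # Swallow follow-up events until the stream is quiet for quiet_secs (or ends).
        while IFS= read -r -t "$quiet_secs" line; do :; done
        "$@" < /dev/null    # keep the command from consuming the event stream
    done
}

# Self-contained demo: three events arrive at once, the command runs once.
runs=0
count_run() { runs=$((runs + 1)); }
coalesce_events 1 count_run < <(printf 'a\nb\nc\n')
echo "sync runs: $runs"
```

In the synchronization script, the output of inotifywait would be piped into coalesce_events with the rsync command as its arguments. The trade-off is latency: a stream of events that never pauses for the quiet period would postpone the synchronization indefinitely, so a real script would also need an upper bound on the wait.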
Other topics: bi-directional synchronization, and sersync2 for real-time synchronization to multiple remote servers.