
How to achieve high availability with the distributed file system FastDFS


FastDFS is an open-source, lightweight distributed file system written in C. It manages files and provides file storage, file synchronization, and file access (upload and download), solving the problems of mass storage and load balancing. It is especially suitable for online services that use files as their carrier, such as photo album websites and video websites.

FastDFS is tailored for Internet applications: it takes full account of mechanisms such as redundant backup, load balancing and linear scaling, and emphasizes indicators such as high availability and high performance. With FastDFS it is easy to build a high-performance file server cluster that provides file upload and download services.

How does it differ from Hadoop?

Hadoop also includes a distributed file system, but Hadoop deals with big data — that is, very large volumes of data. When a large amount of data cannot be stored on one disk, it has to be spread across multiple disks and managed uniformly, which requires a distributed file system. FastDFS addresses the same idea for files such as images: when there are too many pictures for one server's limited capacity, they are stored across multiple servers and managed uniformly, which again calls for a distributed file system — and that is exactly what FastDFS is. FastDFS is suited to serving small and medium files such as images (recommended range: 4KB < file_size < 500MB).

The Tracker server handles scheduling and load balancing. The metadata it manages is very small — essentially a mapping table from each group to its list of storage servers — and it is kept entirely in memory. Moreover, the metadata on the tracker is generated from the information reported by the storage servers, so the tracker itself never needs to persist any data. This makes the tracker very easy to scale: adding tracker machines directly extends it into a tracker cluster. Every tracker in the cluster is a full peer of the others; all trackers accept heartbeat messages from the storage servers and build metadata from them to serve read and write requests.

The Storage server's job is file storage: files uploaded by clients are ultimately stored on Storage servers. A Storage server does not implement its own file system but manages files through the operating system's file system. The storage system consists of one or more groups; files in different groups are independent of one another, and the total file capacity of the system is the sum of the capacities of all groups. A volume (group) can consist of one or more storage servers. The storage servers within a group hold identical files, so the machines in a group provide redundant backup and load balancing for one another. Since the data is mutually backed up, the usable space of a group is determined by the storage server with the smallest capacity in that group, so it is recommended to configure the storage servers within a group as identically as possible to avoid wasting storage space.

There can also be multiple Clients, i.e. several clients can access the FastDFS cluster at the same time. The Tracker is the coordinator that mediates the interaction between Client and Storage; to achieve high availability, several Trackers should be deployed. Storage is dedicated to storing the data and does so in groups: each group can contain several machines whose contents are completely identical, again for high availability. When the capacity of the existing groups is no longer sufficient, we can scale horizontally by adding groups. Note that if the machines in a group have different capacities — say machine A has 80 GB and machine B has 100 GB — the group's capacity is that of the smaller machine, so once more than 80 GB has been stored, nothing further can be written to that group. While interacting with Storage, the Client also interacts with the Tracker cluster; put simply, each Storage reports and registers with the Tracker cluster, telling the Tracker which locations are still free and how much space remains.

File upload flow

When a Client wants to upload a picture, it first asks the Tracker. The Tracker checks its registration information, decides which storage currently has free space, and returns that storage's IP and port to the Client. Once the Client has the IP and port it no longer needs to go through the Tracker and uploads the picture directly to the Storage. While saving the picture, the Storage reports back to the Tracker whether it still has free space and how much. After reporting, the Storage returns the address under which the picture is stored to the Client, and the Client can use this address to access the picture. More precisely, after the upload the storage server returns a file ID to the client, and this file ID is the index used to access the file later. The file index information consists of the group name, the virtual disk path, the two-level data directories and the file name:

Group name: the name of the storage group the file was uploaded to; it is returned by the storage server after a successful upload and must be saved by the client itself.

Virtual disk path: the virtual path configured on the storage, corresponding to the store_path* disk options. store_path0 maps to M00, store_path1 to M01, and so on.

Two-level data directories: the two levels of directories that the storage server creates under each virtual disk path to hold data files.

File name: different from the name the file had when it was uploaded; it is generated by the storage server from specific information and encodes the source storage server's IP address, the file creation timestamp, the file size, a random number and the file extension.

File download flow

1. The client asks the tracker which storage to download the file from, passing the file identifier (group name and file name) as the parameter.

2. The tracker returns an available storage.

3. The client talks to that storage directly and completes the download.

Setting up FastDFS

Components to install: nginx + FastDFS + fastdfs-nginx-module

Deployment layout:

tracker: 192.168.80.32
storage0: 192.168.80.32
storage1: 192.168.80.30
storage2: 192.168.80.31
nginx: 192.168.80.32

Download the latest versions of the following packages:

fastdfs-master.zip: the FastDFS source code
libfastcommon-master.zip: the common C function library extracted from FastDFS and FastDHT
fastdfs-nginx-module-master.zip: the nginx module for the HTTP service on the storage nodes
nginx.tar.gz: the Nginx installation package

Firewall

Open the tracker server port (default 22122) in the firewall:

shell> vi /etc/sysconfig/iptables

Add the following port line:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 22122 -j ACCEPT

The storage servers also need their port opened:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 23000 -j ACCEPT

Restart the firewall:

shell> service iptables restart
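These rules assume the classic iptables service. On newer systems that use firewalld instead (not covered by the original setup), roughly equivalent commands would be:

firewall-cmd --permanent --add-port=22122/tcp   # tracker port
firewall-cmd --permanent --add-port=23000/tcp   # storage port
firewall-cmd --reload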

Install the libevent toolkit on all servers:

yum -y install libevent

Then install the libfastcommon toolkit on all servers.

First upload fastdfs-master.zip and libfastcommon-master.zip to the /opt directory on all servers.

1. Decompress: unzip libfastcommon-master.zip

Compile and install:

2. cd libfastcommon-master

3. ./make.sh

4. ./make.sh install

5. Copy /usr/lib64/libfastcommon.so to /usr/lib/ (32-bit systems only)

Install the Tracker service

Decompress: unzip fastdfs-master.zip

Compile and install:

cd fastdfs-master
./make.sh
./make.sh install

After installation, the compiled binaries are the files beginning with fdfs in the /usr/bin/ directory.

The configuration files are all placed in the /etc/fdfs directory.

Copy all the configuration files from /opt/fastdfs-master/conf to /etc/fdfs:

cp -r /opt/fastdfs-master/conf/* /etc/fdfs

Create the file storage paths:

mkdir -p /data/fastdfs/tracker   # tracker data path (required on the tracker server)
mkdir -p /data/fastdfs/storage   # storage data path
mkdir -p /data/fastdfs/client    # client data path

Configure the tracker service

Modify the /etc/fdfs/tracker.conf file.

vim /etc/fdfs/tracker.conf   # edit the tracker configuration file

bind_addr=                        # IP address to bind (used when the server has several IPs but should serve on only one; leave empty to listen on all, which is usually fine — an experienced SA will recognize this option from many other systems and applications)
port=22122                        # tracker service port
base_path=/data/fastdfs/tracker   # base directory; the data (storage server information) and logs subdirectories are created here automatically (the root directory itself must already exist)

Other content is default. For configuration instructions, please refer to the supplementary content at the end of the article.

Configure the storage service

Configure the storage service on the storage servers; the prerequisite, as above, is that libfastcommon and FastDFS are already installed on them.

Modify the /etc/fdfs/storage.conf file.

group_name=group1                    # storage group name
client_bind=true                     # whether to bind to bind_addr when this storage connects out to other servers
port=23000                           # storage port
base_path=/data/fastdfs/storage      # base path for data and log files
store_path0=/data/fastdfs/storage    # storage path; if there are several directories or disks, add store_path1, store_path2, ... accordingly
tracker_server=192.168.80.32:22122   # the tracker1 server
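For the high-availability goal mentioned earlier (multiple trackers), storage.conf accepts more than one tracker_server line, one per tracker. A sketch, assuming a hypothetical second tracker at 192.168.80.33 that is not part of this deployment:

tracker_server=192.168.80.32:22122   # tracker1
tracker_server=192.168.80.33:22122   # hypothetical tracker2, for illustration only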

Other content is default. For configuration instructions, please refer to the supplementary content at the end of the article.

Edit only the client.conf on the tracker server

vim /etc/fdfs/client.conf

base_path=/data/fastdfs/client       # base path for data and log files
tracker_server=192.168.80.32:22122   # the tracker1 server

That completes the FastDFS configuration. Now start the tracker on server 32 and the storage service on servers 32, 30 and 31.

Start the tracker:

/usr/bin/fdfs_trackerd /etc/fdfs/tracker.conf

To restart, use the command:

/usr/bin/fdfs_trackerd /etc/fdfs/tracker.conf restart

Check to see if FastDFS Tracker Server starts successfully:

ps -ef | grep fdfs_trackerd

Stop:

/etc/init.d/fdfs_trackerd stop

Set the tracker service to start at boot:

chkconfig fdfs_trackerd on

Start the storage service:

/usr/bin/fdfs_storaged /etc/fdfs/storage.conf restart

Stop the storage service:

/etc/init.d/fdfs_storaged stop

Set the storage service to start at boot:

chkconfig fdfs_storaged on

Test the service:

/usr/bin/fdfs_test /etc/fdfs/client.conf upload /etc/fdfs/anti-steal.jpg
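Besides fdfs_test, two other tools installed with FastDFS are handy here — a sketch, assuming the client.conf above and a sample image /opt/test.jpg that you provide yourself:

/usr/bin/fdfs_upload_file /etc/fdfs/client.conf /opt/test.jpg   # on success prints a file ID such as group1/M00/00/00/xxx.jpg
/usr/bin/fdfs_monitor /etc/fdfs/client.conf                     # lists the groups and each storage server's status and free space

The returned file ID has exactly the structure described earlier: group name, virtual disk path (M00), the two-level data directories, and the generated file name.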

The file ID returned by the test upload indicates that the upload succeeded, but the URL cannot yet be opened directly in a browser, so we use the nginx proxy set up below to provide web access.

Set up nginx to provide http service

Function description of fastdfs-nginx-module

FastDFS stores files on the Storage server through the Tracker server, but file replication is required between the storage servers in the same group, so there is a problem of synchronization delay.

Suppose the Tracker server directs an upload to 192.168.80.30 and the file ID is returned to the client as soon as the upload succeeds. The FastDFS storage cluster then synchronizes this file to 192.168.80.31 in the same group. If that copy has not completed yet and the client uses the file ID to fetch the file from 192.168.80.31, it will get an error saying the file cannot be accessed.

fastdfs-nginx-module solves this by redirecting the request to the source storage server to retrieve the file, which spares the client the "file inaccessible" error caused by replication delay.
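To my knowledge this behaviour is controlled in mod_fastdfs.conf (configured below) by the response_mode setting, which takes proxy (the local nginx fetches the file from the source storage and returns it itself) or redirect (the client is redirected to the source storage's address); a sketch:

response_mode=proxy   # or: redirect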

You can use the official nginx module; to use it, nginx must be recompiled with the module.

Change fastdfs-nginx-module configuration

Upload fastdfs-nginx-module.tar.gz

1. Extract the plug-in package

2. Modify the /opt/fastdfs-nginx-module/src/config file and remove "local" from the paths in it (the latest version no longer contains "local").

Copy mod_fastdfs.conf to /etc/fdfs and configure it

cd fastdfs-nginx-module/src
cp mod_fastdfs.conf /etc/fdfs/        # copy mod_fastdfs.conf to /etc/fdfs for configuration

vim /etc/fdfs/mod_fastdfs.conf

base_path=/tmp                        # log storage path
tracker_server=192.168.80.32:22122    # tracker server address and port
storage_server_port=23000
url_have_group_name=true              # whether the URL contains the group name
store_path0=/data/fastdfs/storage     # file storage path, must match the storage configuration
group_name=group1

Compile and install Nginx
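If the same nginx serves more than one group (see the multi-group location below), mod_fastdfs.conf can, as far as I know, describe each group in its own section; a sketch under that assumption, with a hypothetical group2:

group_count=2

[group1]
group_name=group1
storage_server_port=23000
store_path_count=1
store_path0=/data/fastdfs/storage

[group2]
group_name=group2
storage_server_port=23000
store_path_count=1
store_path0=/data/fastdfs/storage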

Install dependency packages

yum -y install zlib zlib-devel openssl openssl-devel pcre pcre-devel

Install nginx

Upload and decompress nginx

cd nginx

Configure nginx:

./configure \
  --prefix=/opt/nginx \
  --pid-path=/opt/nginx/nginx.pid \
  --lock-path=/opt/nginx.lock \
  --error-log-path=/opt/nginx/log/error.log \
  --http-log-path=/opt/nginx/log/access.log \
  --with-http_gzip_static_module \
  --http-client-body-temp-path=/opt/nginx/client \
  --http-proxy-temp-path=/opt/nginx/proxy \
  --http-fastcgi-temp-path=/opt/nginx/fastcgi \
  --http-uwsgi-temp-path=/opt/nginx/uwsgi \
  --http-scgi-temp-path=/opt/nginx/scgi \
  --add-module=/opt/fastdfs-nginx-module/src

Then compile and install:

make && make install

Configure nginx

vim /opt/nginx/conf/nginx.conf

Add a server block to the nginx configuration file:

server {
    listen 8888;
    server_name 192.168.80.32;
    location /group1/M00/ {
        # root /home/FastDFS/fdfs_storage/data;
        ngx_fastdfs_module;
    }
}

However, the location above is hard-coded for group1's M00. If there are multiple groups, each with its own M00 path on Storage, the access path carries the group name (for example /group1/M00/00/00/xxx), so use the following location instead:

location ~ /group([0-9])/M00 {
    ngx_fastdfs_module;
}
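Putting the pieces together, the server block for a multi-group setup would look roughly like this (reusing the listen port and server_name from above):

server {
    listen 8888;
    server_name 192.168.80.32;
    location ~ /group([0-9])/M00 {
        ngx_fastdfs_module;
    }
}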

Note:

The port value 8888 must correspond to http.server_port=8888 in /etc/fdfs/storage.conf, since http.server_port defaults to 8888; if you want to use 80 instead, change both accordingly.

If downloads keep returning 404, change the first line of nginx.conf from user nobody to user root and restart nginx.

Open port 8888 of Nginx in the firewall

vi /etc/sysconfig/iptables

Add:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 8888 -j ACCEPT

Restart the firewall:

shell> service iptables restart

Start nginx automatically at boot

That is, add the startup code to rc.local.

vi /etc/rc.local

Add one line: /opt/nginx/sbin/nginx

Set execution permissions:

chmod 755 /etc/rc.local
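For reference, a complete rc.local tail that brings everything up on the 192.168.80.32 machine might look like this (a sketch using the paths from earlier in this article; the fdfs services are already covered by the chkconfig commands above, so listing them here is optional):

/usr/bin/fdfs_trackerd /etc/fdfs/tracker.conf
/usr/bin/fdfs_storaged /etc/fdfs/storage.conf
/opt/nginx/sbin/nginx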

Start Nginx

shell> /opt/nginx/sbin/nginx

On startup the module prints: ngx_http_fastdfs_set pid=xxx

The nginx restart command is:

/opt/nginx/sbin/nginx -s reload

Access test pictures

Access the file uploaded during the test through the browser. The ID of the file returned after the test upload is: group1/M00/00/00/wKhQIFoKF3KAfw8wAABdrZgsqUU551_big.jpg, and the address for browsing access is: http://192.168.80.32:8888/group1/M00/00/00/wKhQIFoKF3KAfw8wAABdrZgsqUU551_big.jpg.
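The same check can be done from the command line; for example, with the file ID above:

curl -I http://192.168.80.32:8888/group1/M00/00/00/wKhQIFoKF3KAfw8wAABdrZgsqUU551_big.jpg   # an HTTP/1.1 200 OK response means nginx is serving the file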

Note: never use kill -9 to kill the FastDFS processes, or binlog data may be lost.

At this point a simple cluster has been set up; the configuration above should be adjusted to your actual environment.

Supplementary content

tracker.conf configuration notes:

disabled=false                        # whether this configuration file is disabled
bind_addr=192.168.6.102               # IP address to bind
port=22122                            # service port
connect_timeout=30                    # connection timeout in seconds
network_timeout=60                    # tracker server network timeout in seconds
base_path=/home/yangzi                # base directory; the data (storage server information) and logs subdirectories are created here
max_connections=256                   # maximum number of connections the system provides
work_threads=4                        # number of worker threads, usually set to the number of CPUs
store_lookup=2                        # how the group (volume) for an upload is chosen: 0 = round robin, 1 = a specified group, 2 = load balanced (the group with the most free space); if the application layer already specifies a fixed group for the upload, this parameter is bypassed
store_group=group1                    # when store_lookup=1 (group specified by name), this must be a group name that exists in the system; for the other upload modes it has no effect
store_server=0                        # which storage server in the group receives the upload (that server becomes the file's source server and pushes the file to the rest of the group for synchronization): 0 = round robin, 1 = the first server sorted by IP address (the lowest IP), 2 = sorted by priority (the priority is set on the storage server via upload_priority)
store_path=0                          # which directory on the storage server receives the upload; a storage server can have several base paths (think multiple disks): 0 = round robin across the directories, 2 = the directory with the most free space (free space is dynamic, so the chosen directory or disk may change over time)
download_server=0                     # which storage server serves downloads: 0 = round robin (any storage server holding the file may serve it), 1 = the file's source storage server (see store_server above for how the source is determined)
reserved_storage_space=4GB            # space reserved on each storage server for the system or other applications; because the servers in a group back each other up, the smallest capacity in the group prevails, so the group counts as full as soon as any one of its servers reaches this threshold
log_level=info                        # log level
run_by_group=                         # OS user group that runs FastDFS
run_by_user=                          # OS user that runs FastDFS
allow_hosts=*                         # IP range allowed to connect to this tracker server (applies to all connection types, including clients and storage servers)
sync_log_buff_interval=10             # interval, in seconds, at which log information is flushed to disk; the tracker's log is buffered in memory first rather than written to disk immediately
check_active_interval=120             # interval, in seconds, for checking whether storage servers are alive; storage servers send heartbeats to the tracker regularly, and if no heartbeat arrives within one check_active_interval the storage server is considered offline, so this value must be greater than the heartbeat interval configured on the storage servers — usually 2 or 3 times that interval
thread_stack_size=64KB                # thread stack size; the FastDFS server is multi-threaded, and the tracker's thread stack should be no less than 64KB (not 512KB). The larger the stack, the more resources each thread consumes; to run more threads (max_connections in V1.x, work_threads in V2.0) this value can be reduced appropriately
storage_ip_changed_auto_adjust=true   # whether the cluster adjusts automatically when a storage server's IP address changes; the adjustment only completes after the storage server process is restarted
storage_sync_file_max_delay=86400     # introduced in V2.0; maximum delay for synchronizing files between storage servers, default one day — adjust to the actual situation
storage_sync_file_max_time=300        # introduced in V2.0; maximum time a storage server may take to synchronize a single file, default 300 seconds (5 minutes)
http.disabled=true                    # whether the built-in HTTP service is disabled (here the WITH_HTTPD macro was removed at compile time anyway)
http.server_port=80                   # HTTP service port
# the following parameters matter only when the HTTP service is enabled
http.check_alive_interval=30
http.check_alive_type=tcp
http.check_alive_uri=/status.html
http.need_find_content_type=true

Storage.conf configuration instructions:

disabled=false                        # whether this configuration file is disabled
group_name=group1                     # the group (volume) this storage belongs to
bind_addr=192.168.6.100               # IP address to bind (the other storage server is 192.168.6.101)
client_bind=true                      # whether to bind bind_addr when this server connects out to other servers as a client; only effective when bind_addr is specified
port=23000                            # storage service port
connect_timeout=30                    # connection timeout in seconds, for the socket connect call
network_timeout=60                    # storage server network timeout in seconds
heart_beat_interval=30                # heartbeat interval in seconds
stat_report_interval=60               # interval, in seconds, at which the storage server reports disk usage to the tracker server
base_path=/home/eric                  # base directory; the root must exist, and the data (file storage) and logs subdirectories are created automatically
max_connections=256                   # maximum number of connections
buff_size=256KB                       # buffer size of a queue node
work_threads=4                        # number of worker threads
disk_rw_separated=true                # whether disk IO reads and writes are separated (the default)
disk_reader_threads=1                 # number of read threads per storage path, default 1
disk_writer_threads=1                 # number of write threads per storage path, default 1
sync_wait_msec=200                    # when synchronizing files, if no file to sync is read from the binlog, sleep this many milliseconds before reading again; 0 means do not sleep and retry immediately
sync_interval=0                       # interval, in milliseconds, between finishing one file sync and starting the next; 0 means sync the next file immediately
sync_start_time=00:00
sync_end_time=23:59                   # time window during which synchronization is allowed (default all day); typically used to avoid syncing during peak hours — an SA will recognize the pattern
write_mark_file_freq=500              # how often (in seconds) storage's mark files are synced to disk
store_path_count=1                    # number of base paths (e.g. disks) for storing files; usually only one directory
store_path0=/home/eric                # the store_path entries, indexed from 0 (store_path0, store_path1, ...); entries 0 through store_path_count-1 must all be configured; if store_path0 is omitted it falls back to the same path as base_path
subdir_count_per_path=32              # FastDFS stores files in two-level directories; this sets how many subdirectories are created per level
tracker_server=192.168.6.188:22122    # tracker_server list; remember to include the port
log_level=info                        # log level
run_by_group=                         # OS user group that runs the storage service
run_by_user=                          # OS user that runs the storage service
allow_hosts=*                         # IP list allowed to connect
file_distribute_path_mode=0           # how files are distributed into the directories under data: 0 = round robin, 1 = random
file_distribute_rotate_count=100      # effective only when file_distribute_path_mode=0 (round robin): once a directory holds this many files, subsequent uploads go to the next directory
fsync_after_written_bytes=0           # when writing large files, call fsync to force the contents to disk every N bytes written; 0 means never call fsync
sync_log_buff_interval=10             # interval, in seconds, at which log information is flushed to disk
sync_binlog_buff_interval=60          # interval, in seconds, at which the binlog (update operation log) is flushed to disk
sync_stat_file_interval=300           # interval, in seconds, at which storage stat files are synced to disk
thread_stack_size=512KB               # thread stack size; the FastDFS server is multi-threaded, and the larger the stack, the more resources each thread consumes
upload_priority=10                    # priority of this storage server as a source server for uploads; may be negative, and the lower the value, the higher the priority; corresponds to store_server=2 in tracker.conf
if_alias_prefix=
check_file_duplicate=0                # whether to check if an uploaded file already exists; if it does, the content is not stored again and a symbolic link is created to save disk space; used together with FastDHT; 1 = check, 0 = do not check (0 here, since FastDHT is not used)
key_namespace=FastDFS                 # the namespace in FastDHT, used when the previous parameter is 1 (or yes/true/on)
keep_alive=0                          # how connections to the FastDHT servers are handled (whether they are persistent)
# the following are HTTP settings, not covered here
http.disabled=true
http.domain_name=
http.server_port=80
http.trunk_size=256KB
http.need_find_content_type=true

Problem encountered:

The pcre package is required to compile and install nginx. If it is not installed, you will be prompted as follows:

./configure: error: the HTTP rewrite module requires the PCRE library.
You can either disable the module by using --without-http_rewrite_module
option, or install the PCRE library into the system, or build the PCRE library
statically from the source with nginx by using --with-pcre=<path> option.

You need to install pcre's devel package, pcre-devel. Install it with yum (the following command also installs other dependencies such as openssl and zlib):

yum -y install zlib zlib-devel openssl openssl-devel pcre pcre-devel
