Distributed storage has been studied for many years, but it was not applied in engineering practice at scale until recent years, with the rise of cloud computing and big data at Internet companies such as Google, Amazon and Alibaba. Google's distributed file system GFS and its distributed table system Bigtable, Amazon's object storage S3, Alibaba's TFS and others are representative examples, and this wave also produced a number of excellent open-source distributed storage systems, including Ceph, Swift, Lustre and GlusterFS.
Distributed storage system
Distributed storage systems are commonly classified, according to their storage interface, into three types: file storage, block storage and object storage.
File storage
File storage usually supports a POSIX interface (for example GlusterFS; GFS and HDFS expose non-POSIX interfaces), so it can be accessed like an ordinary file system (such as ext4) while offering more parallel access capability and more redundancy than an ordinary file system. The main distributed file storage systems are TFS, CephFS, GlusterFS, HDFS and so on. File storage mainly holds unstructured data such as ordinary files, pictures, audio and video; it can be accessed over protocols such as NFS and CIFS, which makes it easy to share. NAS is a file storage type.
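For instance, mounting a NAS export on a Linux client typically looks like the following; this is only a minimal sketch, and the server address 192.168.80.200 and the export/share names are placeholders:
# mount an NFS export (requires the nfs-utils package)
mount -t nfs 192.168.80.200:/export/share /mnt/nfs
# or mount a CIFS/SMB share (requires the cifs-utils package)
mount -t cifs //192.168.80.200/share /mnt/smb -o username=guest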
Block storage
The block interface usually exists in the form of a QEMU driver or a kernel module and is mainly accessed through QEMU or the iSCSI protocol. The main block storage systems are Ceph block storage (RBD), Sheepdog and so on. Block storage mainly holds structured data, such as database data, and sharing the data between hosts is inconvenient. DAS and SAN are both block storage types.
Object storage
Object storage combines the advantages of NAS and SAN: it offers the high-speed direct access of SAN and the data sharing of NAS. It takes the object as the basic storage unit, exposes a RESTful read/write interface, and is usually consumed as a network service. The main object storage systems are Amazon S3, Swift and Ceph object storage. Object storage is mainly used for unstructured data.
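As an illustration of the RESTful interface, reading and writing an object with curl looks roughly as follows. This is a hedged sketch in the Swift style; the endpoint objectstore.example.com, the account AUTH_demo, the container photos and the $TOKEN variable are all hypothetical, and the exact URL scheme differs between object stores:
# upload (PUT) a local file as an object into a container
curl -X PUT -T ./photo.jpg -H "X-Auth-Token: $TOKEN" http://objectstore.example.com:8080/v1/AUTH_demo/photos/photo.jpg
# download (GET) the object back
curl -H "X-Auth-Token: $TOKEN" -o photo.jpg http://objectstore.example.com:8080/v1/AUTH_demo/photos/photo.jpg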
Glusterfs
GlusterFS is an open-source distributed file system with strong scale-out ability; it can support several petabytes of storage capacity and thousands of clients. Many inexpensive x86 hosts are interconnected over InfiniBand RDMA or TCP/IP into a parallel network file system. It is scalable, high-performance and highly available.
Overview of GlusterFS
GlusterFS is a scalable network file system. Compared with other distributed file systems, it offers high scalability, high availability, high performance and horizontal scaling, and its metadata-server-free design removes any single point of failure from the service.
Terminology:
Brick: a storage unit in GFS, an export directory on a server in the trusted storage pool; it is identified by hostname and directory name, such as 'SERVER:EXPORT'.
Client: the device on which a GFS volume is mounted.
Extended Attributes (xattr): a file system feature that allows users or programs to associate metadata with files and directories.
FUSE (Filesystem in Userspace): a loadable kernel module that allows non-privileged users to create their own file systems without modifying kernel code; the file system code runs in user space and is bridged to the kernel through FUSE (see the sketch after this list).
Geo-Replication: continuous, asynchronous replication of data from one Gluster volume to another, typically across sites over a LAN, WAN or the Internet.
GFID: each file or directory in a GFS volume is associated with a unique 128-bit identifier, which is used to simulate an inode.
Namespace: each Gluster volume exports a single namespace as the POSIX mount point.
Node: a device that hosts several bricks.
RDMA: remote direct memory access, which lets one host read or write another host's memory directly without involving either operating system.
RRDNS (round robin DNS): a method of returning different devices in rotation from DNS in order to balance load.
Self-heal: a background process that detects inconsistencies between files and directories across the replicas of a replicated volume and resolves them.
Split-brain: the state in which the replicas of a file or directory have diverged (for example after a network partition) and GlusterFS cannot decide on its own which copy is correct.
Translator: a stackable module through which GlusterFS processes I/O requests; features such as distribution, replication and caching are each implemented as translators.
Volfile: the configuration file of a glusterfs process, usually located under /var/lib/glusterd/vols/volname.
Volume: a logical collection of bricks.
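Several of these concepts can be inspected on a running cluster. The commands below are only a sketch: the volume name models and the brick path /opt/gluster/data match the example built later in this article, somefile is a placeholder, and getfattr comes from the attr package:
# check that the FUSE kernel module is loaded on a client
lsmod | grep fuse || modprobe fuse
# read the GFID extended attribute of a file directly on a brick (run on a server node)
getfattr -n trusted.gfid -e hex /opt/gluster/data/somefile
# list entries that still need self-heal on a replicated volume
gluster volume heal models info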
1. Design without metadata.
Metadata describes where a file or a given chunk is stored in a distributed file system; in short, it records the location of the file or chunk. Traditional distributed file systems set up metadata servers, or management servers with a similar function, mainly to manage the mapping between files and the locations of their data blocks. Unlike other distributed file systems, GlusterFS has no centralized or distributed metadata service; it uses an elastic hashing algorithm instead. Any server or client in the cluster can compute a file's location from the hash algorithm, the path and the file name, and then read and write the data directly.
The advantage of this design is that it greatly improves scalability and also improves the performance and reliability of the system. Another notable property is that, given a file name, locating the file is very fast. However, listing files or directories degrades performance considerably, because the client has to query every node and aggregate the results; for that kind of query, a distributed file system with a metadata service is much more efficient.
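The hash ranges that the elastic hashing (DHT) layer assigns are stored as extended attributes on the directories of each brick, so they can be observed directly. A small illustrative check, assuming the brick path /opt/gluster/data used later in this article and that the attribute has already been created by a lookup through the mount:
# on a server node: show the DHT layout range assigned to this brick's directory
getfattr -n trusted.glusterfs.dht -e hex /opt/gluster/data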
2. Deployment between servers
In earlier versions, the servers are peers, that is, every node holds the configuration information of the whole cluster. The advantage is that each node is highly autonomous and all information can be queried locally; configuration updates on one node are advertised to the others to keep the information consistent across nodes. However, when the cluster grows large and has many nodes, the efficiency of this synchronization drops and the probability of inconsistent node information rises sharply, so future versions of GlusterFS tend toward centralized management.
3. Client access process
When a client accesses GlusterFS storage, a program first reads and writes data through the mount point. To users and programs the cluster file system is transparent: they cannot tell whether the file system is local or on a remote server. Read and write operations are handed to the VFS (Virtual File System), the VFS passes the request to the FUSE kernel module, and FUSE passes the data to the GlusterFS client through the device /dev/fuse. The GlusterFS client then computes the target location and finally sends the request or data to the GlusterFS server over the network.
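This path can be observed on a mounted client: the mount appears as a fuse.glusterfs entry and a user-space glusterfs process holds the connections to the servers. A quick check, assuming the mount point /opt/gfsmount used later in this article:
# the GlusterFS mount is a FUSE mount
grep fuse.glusterfs /proc/mounts
# the user-space client process that talks to the GlusterFS servers
ps -ef | grep [g]lusterfs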
Third, GlusterFS cluster volume modes
The volume mode of a GlusterFS cluster is simply the structure in which data is laid out across the cluster, similar to the RAID level of a disk array.
1. Distributed volumes (Distributed Volume)
Also known as a hash volume and similar to RAID0: files are not sliced; each file is written in full to one node's disk according to the hash algorithm. The advantage is large capacity; the disadvantage is no redundancy.
2. Replicated volumes (Replicated Volume)
Equivalent to RAID1. The number of replicas determines how many bricks are required, and replicated volumes are usually combined with distributed or striped volumes to make up for the lack of redundancy in those two volume types. The disadvantage is low disk utilization.
A replicated volume is created with a specified number of replicas, usually 2 or 3; the copies are stored on different bricks of the volume, so at least that many bricks must be provided. When one server fails, the data can still be read from another server, so a replicated GlusterFS volume improves data reliability while also providing data redundancy.
3. Distributed replication volumes (Distributed Replicated Volume)
A distributed replicated GlusterFS volume combines the characteristics of distributed and replicated volumes. It looks similar to RAID10, but RAID10 is essentially striped, whereas a distributed replicated volume is not.
4. Stripe Volume (Striped Volume)
Equivalent to RAID0: files are split into chunks and written evenly across the disks of the nodes. The advantage is distributed reads and writes and good overall performance; the disadvantage is no redundancy, and random reads and writes of the fragments may saturate the disks' IOPS.
5. Distributed stripe volume (Distributed Striped Volume)
When individual files are very large and there are many clients, a plain striped volume can no longer meet the demand, so combining distribution with striping is a better choice. Its performance scales with the number of servers.
Build a GlusterFS cluster with three nodes, and use a fourth machine as a client.
Set the hostname on each machine, for example (the rest of this walkthrough assumes the hostnames master, slave1, slave2 and client):
hostnamectl set-hostname node1.lq.com
Turn off the firewall and SELinux on every machine and add the host entries:
systemctl stop firewalld
setenforce 0
vi /etc/hosts
192.168.80.100 master
192.168.80.101 slave1
192.168.80.102 slave2
192.168.80.103 client
On 192.168.80.100 (master), generate an SSH key and copy it to the other machines:
ssh-keygen -t rsa
ssh-copy-id -i slave1
ssh-copy-id -i slave2
ssh-copy-id -i client
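Passwordless SSH can then be verified from master with a quick loop; this assumes the /etc/hosts entries above are in place:
# each command should print the remote hostname without asking for a password
for h in slave1 slave2 client; do ssh "$h" hostname; done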
Configure the yum repository to use the Aliyun mirror (run on every node):
yum install -y wget
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
Refresh the yum cache:
yum clean all
yum makecache
Install GlusterFS
Install GlusterFS on the master, slave1 and slave2 nodes:
yum install -y centos-release-gluster
yum install -y glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma
systemctl start glusterd
systemctl enable glusterd
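Before building the trusted pool it is worth confirming on each node that the management daemon is actually running, for example:
# verify the glusterd service and the installed version
systemctl status glusterd --no-pager
glusterfs --version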
On the master node, add the nodes to the Gluster trusted pool:
gluster peer probe master
gluster peer probe slave1
gluster peer probe slave2
Check the cluster status (on master):
gluster peer status
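gluster pool list gives a more compact view of the same information and should show all three nodes as Connected:
# run on master: lists every node in the trusted pool with its UUID, hostname and state
gluster pool list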
Create a data storage directory (run on all three nodes):
mkdir -p /opt/gluster/data
View the volume status:
gluster volume info
Create a GlusterFS volume:
gluster volume create models replica 3 master:/opt/gluster/data slave1:/opt/gluster/data slave2:/opt/gluster/data force
replica 3 means three copies are kept; the arguments that follow are the storage directories (bricks) on the servers.
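The force keyword is needed here because the bricks sit on the root file system, which gluster warns against. In production a dedicated file system per brick is the usual practice; a sketch, assuming a spare disk /dev/sdb1 (run on each server node before creating the brick directory):
# format a dedicated disk for the brick and mount it under /opt/gluster
mkfs.xfs /dev/sdb1
mkdir -p /opt/gluster
mount /dev/sdb1 /opt/gluster
mkdir -p /opt/gluster/data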
Notes on the GlusterFS volume modes:
First, the default mode, DHT, also called a distributed volume: each file is placed, by hash, on one server node and stored there in full.
Command: gluster volume create test-volume server1:/exp1 server2:/exp2
Second, replicated mode, AFR. Creating a volume with replica x keeps x copies of each file, one on each of x nodes.
Command: gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2
Third, striped mode, Striped. Creating a volume with stripe x cuts each file into blocks and stores them across x nodes (similar to RAID0).
Command: gluster volume create test-volume stripe 2 transport tcp server1:/exp1 server2:/exp2
Fourth, distributed striped mode (combined), which requires at least 4 servers. Creating a volume with stripe 2 over 4 nodes combines DHT and Striped.
Command: gluster volume create test-volume stripe 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Fifth, distributed replicated mode (combined), which requires at least 4 servers. Creating a volume with replica 2 over 4 nodes combines DHT and AFR (brick ordering is illustrated after this list).
Command: gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Sixth, striped replicated mode (combined), which requires at least 4 servers. Creating a volume with stripe 2 replica 2 over 4 nodes combines Striped and AFR.
Command: gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Seventh, all three modes mixed, which requires at least 8 servers. With stripe 2 replica 2, every four bricks form one group.
Command: gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6 server7:/exp7 server8:/exp8
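Note that for the combined modes the order of the bricks matters: consecutive bricks form one replica (or stripe) set. As a sketch of the distributed replicated command above:
# replica 2 over 4 bricks: server1/server2 hold one copy pair, server3/server4 the other;
# files are then distributed by hash across the two replica pairs
gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4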
Check the volume information again:
gluster volume info
Start the models volume:
gluster volume start models
GlusterFS performance tuning:
Enable the quota for the specified volume (models is the volume name):
gluster volume quota models enable
Limit the root directory (/) of the models volume to at most 10 MB:
gluster volume quota models limit-usage / 10MB
Set the cache size (adjust this to the actual situation; if it is set too large, clients may later fail to mount):
gluster volume set models performance.cache-size 512MB
Enable asynchronous background flushing (flush-behind):
gluster volume set models performance.flush-behind on
Set the number of I/O threads to 32:
gluster volume set models performance.io-thread-count 32
Enable write-behind (writes go to the cache first and are then flushed to disk):
gluster volume set models performance.write-behind on
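To confirm that these options took effect they can be read back. gluster volume get is available in current releases (on very old versions, gluster volume info also lists the reconfigured options):
# show the performance-related options of the models volume
gluster volume get models all | grep performance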
Volume information after tuning:
gluster volume info
On the client:
Deploy the GlusterFS client and mount the GlusterFS file system:
yum install -y glusterfs glusterfs-fuse
Create a mount point:
mkdir -p /opt/gfsmount
Mount command:
mount -t glusterfs 192.168.80.100:models /opt/gfsmount/
Check with df: df -h
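To make the mount survive a reboot, an /etc/fstab entry of roughly this form is commonly used; the backup-volfile-servers option is optional and simply names a second node to fetch the volume description from if 192.168.80.100 is down:
# /etc/fstab entry for the GlusterFS mount
192.168.80.100:/models  /opt/gfsmount  glusterfs  defaults,_netdev,backup-volfile-servers=192.168.80.101  0 0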
Test:
Write a file into this directory, and then check the storage on the gluster servers:
time dd if=/dev/zero of=/opt/gfsmount/hello bs=100M count=1
Check the /opt/gfsmount directory on the client machine.
Check the /opt/gluster/data directory on the master machine.
Check the /opt/gluster/data directory on the slave1 machine.
Check the /opt/gluster/data directory on the slave2 machine.
You can see that the file exists on every node of the gluster cluster, which matches the three-copy (replica 3) setting configured earlier.
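The three copies can also be compared directly on the bricks; a quick check (run on master, slave1 and slave2):
# the file written through the mount point should exist, with identical checksums, on every brick
ls -lh /opt/gluster/data/hello
md5sum /opt/gluster/data/hello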
Other commands
View all volumes in GlusterFS:
gluster volume list
Delete a GlusterFS volume:
gluster volume stop models    # stop the volume named models
gluster volume delete models  # delete the volume named models
Note: after deleting the volume, you must also delete the .glusterfs/ and .trashcan/ directories inside the brick directory (/opt/gluster/data).
Otherwise, reusing the directory for a new volume of the same name will lead to files not being distributed, or being distributed to the wrong place.
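In practice the cleanup usually also has to remove the volume-id and gfid extended attributes that gluster leaves on the brick root; otherwise creating a new volume on the same path fails with a "brick is already part of a volume" error. A sketch:
# run on every node whose brick directory will be reused
rm -rf /opt/gluster/data/.glusterfs /opt/gluster/data/.trashcan
setfattr -x trusted.glusterfs.volume-id /opt/gluster/data
setfattr -x trusted.gfid /opt/gluster/data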
Detach a node from the GlusterFS trusted pool:
gluster peer detach swarm-node-2
Set access restrictions for a volume:
gluster volume set models auth.allow 10.6.0.*,10.7.0.*
Add a GlusterFS node:
gluster peer probe swarm-node-3
gluster volume add-brick models swarm-node-3:/opt/gluster/data
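Two caveats apply here: for a replicated volume, bricks must be added in multiples of the replica count, and after expanding a distributed volume a rebalance is normally run so that existing data spreads onto the new brick (the full set of rebalance commands is listed at the end of this article):
# after adding brick(s), rebalance the volume and watch the progress
gluster volume rebalance models start
gluster volume rebalance models status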
Configure volume options:
gluster volume set <VOLNAME> <OPTION> <VALUE>
Shrink a volume:
First migrate the data to the other available bricks, then remove the bricks once the migration has finished:
gluster volume remove-brick models slave1:/opt/gluster/data slave2:/opt/gluster/data start
After running start, use the status command to check the progress of the removal:
gluster volume remove-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data status
Remove the bricks without migrating the data (the data on them may be lost):
gluster volume remove-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data commit
Note: for replicated or striped volumes, the number of bricks removed at a time must be an integer multiple of the replica or stripe count.
Expand a volume:
gluster volume add-brick models swarm-node-2:/opt/gluster/data
Repair (replace a failed brick):
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data commit force
Migrate a volume (move the data of one brick to another brick):
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data start
Use pause to suspend the migration:
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data pause
Use abort to terminate the migration:
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data abort
Use status to view the migration status:
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data status
Use commit after the migration finishes to make it take effect:
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data commit
Rebalance a volume (typically after adding or removing bricks):
gluster volume rebalance models fix-layout start
gluster volume rebalance models start
gluster volume rebalance models start force
gluster volume rebalance models status
gluster volume rebalance models stop