Distributed storage has been studied for many years, but it was not applied in engineering practice at scale until recent years, with the rise of cloud computing and big data at Internet companies such as Google, Amazon and Alibaba. Google's distributed file system GFS and its distributed table system Bigtable, Amazon's object storage S3, Alibaba's TFS and others are representative examples, and this wave also produced a number of excellent open-source distributed storage systems, including Ceph, Swift, Lustre and GlusterFS.
Distributed storage system
Distributed storage systems are commonly classified, according to their storage interface, into three types: file storage, block storage and object storage.
File storage
File storage usually supports a POSIX interface (for example GlusterFS; GFS and HDFS expose non-POSIX interfaces), so it can be accessed like an ordinary file system (such as ext4) while offering more parallel access capability and more redundancy than an ordinary file system. The main distributed file storage systems are TFS, CephFS, GlusterFS, HDFS and so on. File storage mainly holds unstructured data such as ordinary files, pictures, audio and video; it can be accessed over protocols such as NFS and CIFS, which makes it easy to share. NAS is a file storage type.
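For instance, mounting a NAS export on a Linux client typically looks like the following; this is only a minimal sketch, and the server address 192.168.80.200 and the export/share names are placeholders:
# mount an NFS export (requires the nfs-utils package)
mount -t nfs 192.168.80.200:/export/share /mnt/nfs
# or mount a CIFS/SMB share (requires the cifs-utils package)
mount -t cifs //192.168.80.200/share /mnt/smb -o username=guest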
Block storage
The block interface usually exists in the form of a QEMU driver or a kernel module and is mainly accessed through QEMU or the iSCSI protocol. The main block storage systems are Ceph block storage (RBD), Sheepdog and so on. Block storage mainly holds structured data, such as database data, and sharing the data between hosts is inconvenient. DAS and SAN are both block storage types.
Object storage
Object storage combines the advantages of NAS and SAN: it offers the high-speed direct access of SAN and the data sharing of NAS. It takes the object as the basic storage unit, exposes a RESTful read/write interface, and is usually consumed as a network service. The main object storage systems are Amazon S3, Swift and Ceph object storage. Object storage is mainly used for unstructured data.
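As an illustration of the RESTful interface, reading and writing an object with curl looks roughly as follows. This is a hedged sketch in the Swift style; the endpoint objectstore.example.com, the account AUTH_demo, the container photos and the $TOKEN variable are all hypothetical, and the exact URL scheme differs between object stores:
# upload (PUT) a local file as an object into a container
curl -X PUT -T ./photo.jpg -H "X-Auth-Token: $TOKEN" http://objectstore.example.com:8080/v1/AUTH_demo/photos/photo.jpg
# download (GET) the object back
curl -H "X-Auth-Token: $TOKEN" -o photo.jpg http://objectstore.example.com:8080/v1/AUTH_demo/photos/photo.jpg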
Glusterfs
GlusterFS is an open-source distributed file system with strong scale-out ability; it can support several petabytes of storage capacity and thousands of clients. Many inexpensive x86 hosts are interconnected over InfiniBand RDMA or TCP/IP into a parallel network file system. It is scalable, high-performance and highly available.
Overview of GlusterFS
GlusterFS is a scalable network file system. Compared with other distributed file systems, it offers high scalability, high availability, high performance and horizontal scaling, and its metadata-server-free design removes any single point of failure from the service.
Terminology:
Brick: a storage unit in GFS, an export directory on a server in the trusted storage pool; it is identified by hostname and directory name, such as 'SERVER:EXPORT'.
Client: the device on which a GFS volume is mounted.
Extended Attributes (xattr): a file system feature that allows users or programs to associate metadata with files and directories.
FUSE (Filesystem in Userspace): a loadable kernel module that allows non-privileged users to create their own file systems without modifying kernel code; the file system code runs in user space and is bridged to the kernel through FUSE (see the sketch after this list).
Geo-Replication: continuous, asynchronous replication of data from one Gluster volume to another, typically across sites over a LAN, WAN or the Internet.
GFID: each file or directory in a GFS volume is associated with a unique 128-bit identifier, which is used to simulate an inode.
Namespace: each Gluster volume exports a single namespace as the POSIX mount point.
Node: a device that hosts several bricks.
RDMA: remote direct memory access, which lets one host read or write another host's memory directly without involving either operating system.
RRDNS (round robin DNS): a method of returning different devices in rotation from DNS in order to balance load.
Self-heal: a background process that detects inconsistencies between files and directories across the replicas of a replicated volume and resolves them.
Split-brain: the state in which the replicas of a file or directory have diverged (for example after a network partition) and GlusterFS cannot decide on its own which copy is correct.
Translator: a stackable module through which GlusterFS processes I/O requests; features such as distribution, replication and caching are each implemented as translators.
Volfile: the configuration file of a glusterfs process, usually located under /var/lib/glusterd/vols/volname.
Volume: a logical collection of bricks.
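Several of these concepts can be inspected on a running cluster. The commands below are only a sketch: the volume name models and the brick path /opt/gluster/data match the example built later in this article, somefile is a placeholder, and getfattr comes from the attr package:
# check that the FUSE kernel module is loaded on a client
lsmod | grep fuse || modprobe fuse
# read the GFID extended attribute of a file directly on a brick (run on a server node)
getfattr -n trusted.gfid -e hex /opt/gluster/data/somefile
# list entries that still need self-heal on a replicated volume
gluster volume heal models info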
1. Design without metadata.
Metadata describes where a file or a given chunk is stored in a distributed file system; in short, it records the location of the file or chunk. Traditional distributed file systems set up metadata servers, or management servers with a similar function, mainly to manage the mapping between files and the locations of their data blocks. Unlike other distributed file systems, GlusterFS has no centralized or distributed metadata service; it uses an elastic hashing algorithm instead. Any server or client in the cluster can compute a file's location from the hash algorithm, the path and the file name, and then read and write the data directly.
The advantage of this design is that it greatly improves scalability and also improves the performance and reliability of the system. Another notable property is that, given a file name, locating the file is very fast. However, listing files or directories degrades performance considerably, because the client has to query every node and aggregate the results; for that kind of query, a distributed file system with a metadata service is much more efficient.
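The hash ranges that the elastic hashing (DHT) layer assigns are stored as extended attributes on the directories of each brick, so they can be observed directly. A small illustrative check, assuming the brick path /opt/gluster/data used later in this article and that the attribute has already been created by a lookup through the mount:
# on a server node: show the DHT layout range assigned to this brick's directory
getfattr -n trusted.glusterfs.dht -e hex /opt/gluster/data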
2. Deployment between servers
In earlier versions, the servers are peers, that is, every node holds the configuration information of the whole cluster. The advantage is that each node is highly autonomous and all information can be queried locally; configuration updates on one node are advertised to the others to keep the information consistent across nodes. However, when the cluster grows large and has many nodes, the efficiency of this synchronization drops and the probability of inconsistent node information rises sharply, so future versions of GlusterFS tend toward centralized management.
3. Client access process
When a client accesses GlusterFS storage, a program first reads and writes data through the mount point. To users and programs the cluster file system is transparent: they cannot tell whether the file system is local or on a remote server. Read and write operations are handed to the VFS (Virtual File System), the VFS passes the request to the FUSE kernel module, and FUSE passes the data to the GlusterFS client through the device /dev/fuse. The GlusterFS client then computes the target location and finally sends the request or data to the GlusterFS server over the network.
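This path can be observed on a mounted client: the mount appears as a fuse.glusterfs entry and a user-space glusterfs process holds the connections to the servers. A quick check, assuming the mount point /opt/gfsmount used later in this article:
# the GlusterFS mount is a FUSE mount
grep fuse.glusterfs /proc/mounts
# the user-space client process that talks to the GlusterFS servers
ps -ef | grep [g]lusterfs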
Third, GlusterFS cluster volume modes
The volume mode of a GlusterFS cluster is simply the structure in which data is laid out across the cluster, similar to the RAID level of a disk array.
1. Distributed volumes (Distributed Volume)
Also known as a hash volume and similar to RAID0: files are not sliced; each file is written in full to one node's disk according to the hash algorithm. The advantage is large capacity; the disadvantage is no redundancy.
2. Replicated volumes (Replicated Volume)
Equivalent to RAID1. The number of replicas determines how many bricks are required, and replicated volumes are usually combined with distributed or striped volumes to make up for the lack of redundancy in those two volume types. The disadvantage is low disk utilization.
A replicated volume is created with a specified number of replicas, usually 2 or 3; the copies are stored on different bricks of the volume, so at least that many bricks must be provided. When one server fails, the data can still be read from another server, so a replicated GlusterFS volume improves data reliability while also providing data redundancy.
3. Distributed replication volumes (Distributed Replicated Volume)
A distributed replicated GlusterFS volume combines the characteristics of distributed and replicated volumes. It looks similar to RAID10, but RAID10 is essentially striped, whereas a distributed replicated volume is not.
4. Stripe Volume (Striped Volume)
Equivalent to RAID0: files are split into chunks and written evenly across the disks of the nodes. The advantage is distributed reads and writes and good overall performance; the disadvantage is no redundancy, and random reads and writes of the fragments may saturate the disks' IOPS.
5. Distributed stripe volume (Distributed Striped Volume)
When individual files are very large and there are many clients, a plain striped volume can no longer meet the demand, so combining distribution with striping is a better choice. Its performance scales with the number of servers.
Build a GlusterFS cluster with three nodes, and use a fourth machine as a client.
Set the hostname on each machine, for example (the rest of this walkthrough assumes the hostnames master, slave1, slave2 and client):
hostnamectl set-hostname node1.lq.com
Turn off the firewall and SELinux on every machine and add the host entries:
systemctl stop firewalld
setenforce 0
vi /etc/hosts
192.168.80.100 master
192.168.80.101 slave1
192.168.80.102 slave2
192.168.80.103 client
On 192.168.80.100 (master), generate an SSH key and copy it to the other machines:
ssh-keygen -t rsa
ssh-copy-id -i slave1
ssh-copy-id -i slave2
ssh-copy-id -i client
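Passwordless SSH can then be verified from master with a quick loop; this assumes the /etc/hosts entries above are in place:
# each command should print the remote hostname without asking for a password
for h in slave1 slave2 client; do ssh "$h" hostname; done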
Configure the yum repository to use the Aliyun mirror (run on every node):
yum install -y wget
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
Refresh the yum cache:
yum clean all
yum makecache
Install GlusterFS
Install GlusterFS on the master, slave1 and slave2 nodes:
yum install -y centos-release-gluster
yum install -y glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma
systemctl start glusterd
systemctl enable glusterd
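Before building the trusted pool it is worth confirming on each node that the management daemon is actually running, for example:
# verify the glusterd service and the installed version
systemctl status glusterd --no-pager
glusterfs --version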
On the master node, add the nodes to the Gluster trusted pool:
gluster peer probe master
gluster peer probe slave1
gluster peer probe slave2
Check the cluster status (on master):
gluster peer status
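gluster pool list gives a more compact view of the same information and should show all three nodes as Connected:
# run on master: lists every node in the trusted pool with its UUID, hostname and state
gluster pool list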
Create a data storage directory (run on all three nodes):
mkdir -p /opt/gluster/data
View the volume status:
gluster volume info
Create a GlusterFS volume:
gluster volume create models replica 3 master:/opt/gluster/data slave1:/opt/gluster/data slave2:/opt/gluster/data force
replica 3 means three copies are kept; the arguments that follow are the storage directories (bricks) on the servers.
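The force keyword is needed here because the bricks sit on the root file system, which gluster warns against. In production a dedicated file system per brick is the usual practice; a sketch, assuming a spare disk /dev/sdb1 (run on each server node before creating the brick directory):
# format a dedicated disk for the brick and mount it under /opt/gluster
mkfs.xfs /dev/sdb1
mkdir -p /opt/gluster
mount /dev/sdb1 /opt/gluster
mkdir -p /opt/gluster/data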
Notes on the GlusterFS volume modes:
First, the default mode, DHT, also called a distributed volume: each file is placed, by hash, on one server node and stored there in full.
Command: gluster volume create test-volume server1:/exp1 server2:/exp2
Second, replicated mode, AFR. Creating a volume with replica x keeps x copies of each file, one on each of x nodes.
Command: gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2
Third, striped mode, Striped. Creating a volume with stripe x cuts each file into blocks and stores them across x nodes (similar to RAID0).
Command: gluster volume create test-volume stripe 2 transport tcp server1:/exp1 server2:/exp2
Fourth, distributed striped mode (combined), which requires at least 4 servers. Creating a volume with stripe 2 over 4 nodes combines DHT and Striped.
Command: gluster volume create test-volume stripe 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Fifth, distributed replicated mode (combined), which requires at least 4 servers. Creating a volume with replica 2 over 4 nodes combines DHT and AFR (brick ordering is illustrated after this list).
Command: gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Sixth, striped replicated mode (combined), which requires at least 4 servers. Creating a volume with stripe 2 replica 2 over 4 nodes combines Striped and AFR.
Command: gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Seventh, all three modes mixed, which requires at least 8 servers. With stripe 2 replica 2, every four bricks form one group.
Command: gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6 server7:/exp7 server8:/exp8
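Note that for the combined modes the order of the bricks matters: consecutive bricks form one replica (or stripe) set. As a sketch of the distributed replicated command above:
# replica 2 over 4 bricks: server1/server2 hold one copy pair, server3/server4 the other;
# files are then distributed by hash across the two replica pairs
gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4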
Check the volume information again:
gluster volume info
Start the models volume:
gluster volume start models
GlusterFS performance tuning:
Enable the quota for the specified volume (models is the volume name):
gluster volume quota models enable
Limit the root directory (/) of the models volume to at most 10 MB:
gluster volume quota models limit-usage / 10MB
Set the cache size (adjust this to the actual situation; if it is set too large, clients may later fail to mount):
gluster volume set models performance.cache-size 512MB
Enable asynchronous background flushing (flush-behind):
gluster volume set models performance.flush-behind on
Set the number of I/O threads to 32:
gluster volume set models performance.io-thread-count 32
Enable write-behind (writes go to the cache first and are then flushed to disk):
gluster volume set models performance.write-behind on
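To confirm that these options took effect they can be read back. gluster volume get is available in current releases (on very old versions, gluster volume info also lists the reconfigured options):
# show the performance-related options of the models volume
gluster volume get models all | grep performance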
Volume information after tuning:
gluster volume info
On the client:
Deploy the GlusterFS client and mount the GlusterFS file system:
yum install -y glusterfs glusterfs-fuse
Create a mount point:
mkdir -p /opt/gfsmount
Mount command:
mount -t glusterfs 192.168.80.100:models /opt/gfsmount/
Check with df: df -h
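To make the mount survive a reboot, an /etc/fstab entry of roughly this form is commonly used; the backup-volfile-servers option is optional and simply names a second node to fetch the volume description from if 192.168.80.100 is down:
# /etc/fstab entry for the GlusterFS mount
192.168.80.100:/models  /opt/gfsmount  glusterfs  defaults,_netdev,backup-volfile-servers=192.168.80.101  0 0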
Test:
Write a file into this directory, and then check the storage on the gluster servers:
time dd if=/dev/zero of=/opt/gfsmount/hello bs=100M count=1
Check the /opt/gfsmount directory on the client machine.
Check the /opt/gluster/data directory on the master machine.
Check the /opt/gluster/data directory on the slave1 machine.
Check the /opt/gluster/data directory on the slave2 machine.
You can see that the file exists on every node of the gluster cluster, which matches the three-copy (replica 3) setting configured earlier.
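The three copies can also be compared directly on the bricks; a quick check (run on master, slave1 and slave2):
# the file written through the mount point should exist, with identical checksums, on every brick
ls -lh /opt/gluster/data/hello
md5sum /opt/gluster/data/hello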
Other commands
View all volumes in GlusterFS:
gluster volume list
Delete a GlusterFS volume:
gluster volume stop models    # stop the volume named models
gluster volume delete models  # delete the volume named models
Note: after deleting the volume, you must also delete the .glusterfs/ and .trashcan/ directories inside the brick directory (/opt/gluster/data).
Otherwise, reusing the directory for a new volume of the same name will lead to files not being distributed, or being distributed to the wrong place.
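In practice the cleanup usually also has to remove the volume-id and gfid extended attributes that gluster leaves on the brick root; otherwise creating a new volume on the same path fails with a "brick is already part of a volume" error. A sketch:
# run on every node whose brick directory will be reused
rm -rf /opt/gluster/data/.glusterfs /opt/gluster/data/.trashcan
setfattr -x trusted.glusterfs.volume-id /opt/gluster/data
setfattr -x trusted.gfid /opt/gluster/data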
Detach a node from the GlusterFS trusted pool:
gluster peer detach swarm-node-2
Set access restrictions for a volume:
gluster volume set models auth.allow 10.6.0.*,10.7.0.*
Add a GlusterFS node:
gluster peer probe swarm-node-3
gluster volume add-brick models swarm-node-3:/opt/gluster/data
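Two caveats apply here: for a replicated volume, bricks must be added in multiples of the replica count, and after expanding a distributed volume a rebalance is normally run so that existing data spreads onto the new brick (the full set of rebalance commands is listed at the end of this article):
# after adding brick(s), rebalance the volume and watch the progress
gluster volume rebalance models start
gluster volume rebalance models status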
Configure volume options:
gluster volume set <VOLNAME> <OPTION> <VALUE>
Shrink a volume:
First migrate the data to the other available bricks, then remove the bricks once the migration has finished:
gluster volume remove-brick models slave1:/opt/gluster/data slave2:/opt/gluster/data start
After running start, use the status command to check the progress of the removal:
gluster volume remove-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data status
Remove the bricks without migrating the data (the data on them may be lost):
gluster volume remove-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data commit
Note: for replicated or striped volumes, the number of bricks removed at a time must be an integer multiple of the replica or stripe count.
Expand a volume:
gluster volume add-brick models swarm-node-2:/opt/gluster/data
Repair (replace a failed brick):
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data commit force
Migrate a volume (move the data of one brick to another brick):
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data start
Use pause to suspend the migration:
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data pause
Use abort to terminate the migration:
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data abort
Use status to view the migration status:
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data status
Use commit after the migration finishes to make it take effect:
gluster volume replace-brick models swarm-node-2:/opt/gluster/data swarm-node-3:/opt/gluster/data commit
Rebalance a volume (typically after adding or removing bricks):
gluster volume rebalance models fix-layout start
gluster volume rebalance models start
gluster volume rebalance models start force
gluster volume rebalance models status
gluster volume rebalance models stop