I. What is a distributed file system
A distributed file system (Distributed File System) is a file system whose physical storage resources are not necessarily attached directly to the local node; instead, they are connected to the nodes over a computer network. Its design is based on the client/server model.
As shown in the figure above, the application servers and file servers sit on the network, which can be a single subnet or several different subnets. The servers access files over the network, which breaks through the capacity limits of ordinary local storage devices.
II. Commonly used distributed file systems
1. Lustre
Lustre is a large-scale, secure, reliable and highly available cluster file system developed and maintained by Sun. The goal of the project is to build a next-generation cluster file system that can support more than 10,000 nodes and petabyte-scale storage.
2. Hadoop
Hadoop is not only a distributed file system for storage, but also a framework for running distributed applications on large clusters of general-purpose computing devices. It is currently used mainly in fields such as big data and blockchain.
3. FastDFS
FastDFS is an open-source distributed file system that manages files. Its functions include file storage, file synchronization and file access (upload and download), and it solves the problems of mass storage and load balancing. It is especially suitable for online services that use files as their carrier, such as photo album and video websites.
4. Ceph
Ceph is a distributed file system with high scalability, high availability and high performance that can provide object storage, block storage and file storage. It can offer PB-level storage space, and in theory its capacity is unlimited.
III. Introduction to Ceph
Ceph is a distributed storage system with high scalability, high availability and high performance. By scenario, Ceph can provide object storage, block device storage and file system services. In the virtualization field, Ceph block device storage is the most common use; for example, in OpenStack, Ceph block storage can serve as the back end for Cinder volumes, Glance image storage and virtual machine data. More intuitively, a Ceph cluster can provide raw-format block storage that acts as the hard disk of a virtual machine instance.
Ceph's advantage over other storage systems is that it does not merely store data; it also makes full use of the computing power of the storage nodes. When storing each piece of data, it computes where that data should be placed and balances the data distribution as much as possible. Thanks to Ceph's design, which uses the CRUSH algorithm, hash rings and similar techniques, it has no traditional single point of failure, and performance does not degrade as the cluster scales out.
IV. Ceph composition
The core components of Ceph include: Ceph OSD (object storage device), Ceph Monitor (monitor), Ceph MDS (metadata server), Object, PG, RADOS, Librados, CRUSH, RBD, RGW and CephFS.
OSD: short for Object Storage Device, the component that actually stores the data. Generally, each disk that participates in storage needs its own OSD process; if a server has 10 data disks, it will run 10 OSD processes.
MON: the monitor component keeps the cluster state in a series of cluster maps. To avoid a single point of failure, an odd number of monitors (3 or more) is needed; when they disagree, a vote is taken and the majority wins.
MDS: short for Ceph Metadata Server, the metadata server; only CephFS needs it.
Object: the lowest-level storage unit of Ceph. Each object contains metadata and raw data.
PG: short for Placement Groups, a logical concept; one PG maps onto multiple OSDs. The PG layer is introduced to distribute and locate data more efficiently (see the example after this list).
RADOS: short for Reliable Autonomic Distributed Object Store, the essence of a Ceph cluster. It is the foundation of Ceph storage and ensures that everything is stored in the form of objects.
Librados: a library for accessing RADOS. Because RADOS is a protocol that is hard to access directly, the upper layers RBD, RGW and CephFS all access it through librados. Currently PHP, Ruby, Java, Python, C and C++ bindings are provided.
CRUSH: the data distribution algorithm used by Ceph, similar to consistent hashing, which places data where it is expected to be.
RBD: short for RADOS Block Device, which provides the block storage service.
RGW: short for RADOS Gateway, which provides object storage with interfaces compatible with S3 and Swift.
CephFS: provides storage at the file system level.
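Once the cluster from the deployment section below is up, a few read-only commands make these components concrete. This is only an illustration; the pool name cephrbd and the object name test-obj are placeholders:
[root@ceph-a ~] # ceph osd tree # lists the OSDs, one per data disk, grouped by host
[root@ceph-a ~] # ceph quorum_status # shows the MON quorum and its members
[root@ceph-a ~] # ceph osd map cephrbd test-obj # shows which PG and OSDs CRUSH places an object on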
V. Ceph deployment
1. Ceph topology
2. Server planning
3. Server environment preparation
A. Configure hostname and IP address resolution; execute the following commands on each of the 6 servers:
[root@ceph-a ~] # echo -e "192.168.20.144 ceph-a" >> /etc/hosts
[root@ceph-a ~] # echo -e "192.168.20.145 ceph-b" >> /etc/hosts
[root@ceph-a ~] # echo -e "192.168.20.146 ceph-c" >> /etc/hosts
[root@ceph-a ~] # echo -e "192.168.20.147 ceph-d" >> /etc/hosts
[root@ceph-a ~] # echo -e "192.168.20.148 ceph-e" >> /etc/hosts
[root@ceph-a ~] # echo -e "192.168.20.149 ceph-f" >> /etc/hosts
B. Configure passwordless login
Scan the host keys of servers A through F, mainly to avoid the yes/no prompt when connecting over ssh or when later ceph commands run. Execute the following command on Ceph-A:
[root@ceph-a ~] # for i in {a..f}; do ssh-keyscan ceph-$i >> /root/.ssh/known_hosts; done
As shown below:
Generate a key pair on Ceph-A by executing the following command:
[root@ceph-a ~] # ssh-keygen -f /root/.ssh/id_rsa -N ''
As shown in the figure:
Note: the -f parameter specifies the path of the key file, and -N '' sets an empty passphrase, which makes the process non-interactive so that we do not have to press Enter manually.
Copy the public key to the Ceph-B through Ceph-F servers by executing the following command:
[root@ceph-a ~] # for i in {b..f}; do ssh-copy-id ceph-$i; done
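To confirm that passwordless login works before continuing, a quick check (not part of the original steps) is to run a trivial command on each node and make sure no password prompt appears:
[root@ceph-a ~] # for i in {b..f}; do ssh ceph-$i hostname; done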
C. Configure NTP network time synchronization service
Install the chrony package on 6 servers and execute the following command on server A:
[root@ceph-a ~] # for i in {a..f}; do ssh ceph-$i yum -y install chrony; done
Configure Ceph-A as the NTP server:
[root@ceph-a ~] # vim /etc/chrony.conf
As shown in the figure, modify the following three parts (a minimal sketch of the resulting lines follows this list):
①, comment out the default NTP servers and add a domestic NTP server; here the NTP server of Beijing University of Posts and Telecommunications is used.
②, allow time synchronization from the 192.168.20.0/24 network segment.
③, uncomment the local stratum line so this host can act as a local time server.
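For reference, the relevant lines on Ceph-A might end up looking like the following minimal sketch (the exact BUPT server hostname is an assumption, since the screenshot is not reproduced here):
# /etc/chrony.conf on ceph-a (sketch)
server s1a.time.edu.cn iburst # ① a domestic NTP server (assumed hostname)
allow 192.168.20.0/24 # ② allow this network segment to synchronize from Ceph-A
local stratum 10 # ③ keep serving local time even if the upstream server is unreachable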
Configure the Ceph-B through Ceph-F servers to synchronize time from Ceph-A:
[root@ceph-b ~] # vim /etc/chrony.conf
As shown in the figure, add the NTP server in the red box (that is, the Ceph-A we just configured) and comment out the other, now useless server lines, for example:
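A minimal sketch of the client-side change, assuming only the server line is replaced:
# /etc/chrony.conf on ceph-b (sketch)
server 192.168.20.144 iburst # synchronize from Ceph-A; the default pool/server lines are commented out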
Save and exit, then copy the modified configuration file to Ceph-C through Ceph-F:
[root@ceph-b ~] # for i in {c..f}; do scp /etc/chrony.conf ceph-$i:/etc/; done
Restart the chrony service from Ceph-A to Ceph-F and execute the following command in Ceph-A:
[root@ceph-a ~] # for i in {a..f}; do ssh ceph-$i systemctl restart chronyd; done
Synchronize the time of Ceph-A
[root@ceph-a ~] # ntpdate ntp.sjtu.edu.cn
Synchronize the time from Ceph-B to Ceph-F, and execute it in Ceph-A:
[root@ceph-a ~] # for i in {b..f}; do ssh ceph-$i ntpdate 192.168.20.144; done
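To verify that Ceph-B through Ceph-F are actually tracking Ceph-A, chrony itself can also be queried (an optional check, not in the original steps):
[root@ceph-a ~] # for i in {b..f}; do ssh ceph-$i chronyc sources; done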
4. Configure the yum source
We deployed a local yum repository earlier and will reuse it here.
Remove the repo files that come with the system, then edit /etc/yum.repos.d/localhost.repo and add the following content to the file:
[Centos-Base]
name=Centos-Base-Ceph
baseurl=http://192.168.20.138
enabled=1
gpgcheck=1
priority=2
type=rpm-md
gpgkey=http://192.168.20.138/ceph-key/release.asc
Empty the metadata cache and rebuild
[root@ceph-a ~] # yum clean all
[root@ceph-a ~] # yum makecache
5. Install the Ceph service; Ceph-A is used as the admin (management) node
A. Install ceph-deploy in Ceph-A
[root@ceph-a ~] # yum -y install ceph-deploy
B. Create the ceph working directory on Ceph-A and enter it
[root@ceph-a ~] # mkdir /etc/ceph && cd /etc/ceph
C. Create the cluster configuration for nodes Ceph-A through Ceph-C. The Ceph-D through Ceph-F servers are reserved for other uses for now and are not involved in the following operations.
[root@ceph-a ~] # ceph-deploy new ceph-{a..c}
As shown in the figure:
D. Install the ceph package in the three nodes from Ceph-A to Ceph-C
[root@ceph-a ceph] # ceph-deploy install ceph-{a..c}
As shown in the figure:
E. Initialize the mon service
[root@ceph-a ceph] # ceph-deploy mon create-initial
As shown in the figure:
F. Create OSD device
In the server planning, each server has 4 hard disks in total; sda is used as the system disk, which leaves three disks: sdb, sdc and sdd. Here the sdb disk is used as the journal (log) disk, and sdc and sdd as data disks.
①, format the sdb disks on Ceph-A through Ceph-C with a GPT partition table
[root@ceph-a ~] # for i in {a..c}; do ssh ceph-$i parted /dev/sdb mklabel gpt; done
Note: if this command reports an error, split it into two steps: log in to each of Ceph-A, Ceph-B and Ceph-C over ssh, and then run the following command there:
parted /dev/sdb mklabel gpt
②, create two partitions on the sdb disk
[root@ceph-a ~] # for i in {a..c}; do ssh ceph-$i parted /dev/sdb mkpart primary 1M 50%; done
[root@ceph-a ~] # for i in {a..c}; do ssh ceph-$i parted /dev/sdb mkpart primary 50% 100%; done
You can use the lsblk command to check whether the partition is successful or not:
[root@ceph-a ~] # lsblk
As shown in the figure:
③, change the owner and group of the sdb1 and sdb2 partitions to ceph
[root@ceph-a ~] # for i in {a..c}; do ssh ceph-$i chown ceph.ceph /dev/sdb?; done
Note: the following error may occur here
[ceph-a] [ERROR] admin_socket:exception getting command descriptions: [Errno 2] No such file or directory
Solutions are as follows:
Edit ceph.conf and add the following line at the bottom:
public_network = 192.168.20.0/24
Save and exit, then execute the following command to push the configuration file to the three server nodes, overwriting the existing one:
[root@ceph-a ceph] # ceph-deploy --overwrite-conf config push ceph-a ceph-b ceph-c
④, at this point we can check the ceph status with the command ceph health or ceph -s
[root@ceph-a ceph] # ceph health
Normally this outputs HEALTH_OK; if not, the following situations may occur.
HEALTH_WARN clock skew detected on mon
This warning indicates a problem with time synchronization between the servers. First, synchronize the time again:
[root@ceph-a ~] # for i in {b..f}; do ssh ceph-$i ntpdate 192.168.20.144; done
If there is still no resolution after time synchronization, use the following workaround:
1. Edit ceph.conf and add the following two lines under the [global] section:
[root@ceph-a ceph] # vim ceph.conf
mon clock drift allowed = 2
mon clock drift warn backoff = 30
2. Push the configuration file to the mon nodes that need it; here it is pushed to Ceph-A, Ceph-B and Ceph-C, but other, non-consecutive nodes can also be appended:
[root@ceph-a ceph] # ceph-deploy --overwrite-conf config push ceph-a ceph-b ceph-c
3. Restart the mon service (in a CentOS 7 environment):
[root@ceph-a ceph] # for i in {a..c}; do ssh ceph-$i systemctl restart ceph-mon.target; done
4. Verify:
[root@ceph-a ceph] # ceph health
If it shows HEALTH_OK, everything is normal.
When using ceph -s to view the status, the following may also occur:
As shown in the figure:
As shown in the red box, it reports no active mgr; in this case we need to create the manager daemon manually. The solution is as follows:
[root@ceph-a ceph] # ceph-deploy mgr create ceph-{a..c}
Check the status again, as shown in the following figure
This time it reports application not enabled on 1 pool(s); in this case we execute:
[root@ceph-a ceph] # ceph osd pool application enable cephrbd image
cephrbd: the pool we created manually (its creation is shown in the RBD section below).
image: the application name used to tag the pool; here the name of the manually created image is reused (for RBD pools the conventional application name is rbd).
Check the status again
The status now shows OK, which means everything is normal.
⑤, initialize the sdc and sdd disks and wipe their data
[root@ceph-a ceph] # for i in {a..c}; do ceph-deploy disk zap ceph-$i /dev/sdc; done
[root@ceph-a ceph] # for i in {a..c}; do ceph-deploy disk zap ceph-$i /dev/sdd; done
As shown in the figure:
Note: in versions prior to 13.2.0, the following command should be used:
[root@ceph-a ceph] # for i in {a..c}; do ceph-deploy disk zap ceph-$i:/dev/sdc ceph-$i:/dev/sdd; done
However, that format does not work with version 13.2.2 used in this example; otherwise the following error appears. It took a long time to discover that the command format was wrong (the official documentation still shows the old format and has not been updated, so many places in it do not apply to the latest version).
⑥, create the OSD storage. Here sdb1 is used as the journal disk for sdc, and sdb2 as the journal disk for sdd.
[root@ceph-a ceph] # for i in {a..c}; do ceph-deploy osd create --data /dev/sdc --journal /dev/sdb1 ceph-$i; done
[root@ceph-a ceph] # for i in {a..c}; do ceph-deploy osd create --data /dev/sdd --journal /dev/sdb2 ceph-$i; done
As shown in the figure:
--data: specifies the data disk
--journal: specifies the journal disk
The command format also differs by version here; for versions prior to 13.2.0, use the following commands:
[root@ceph-a ceph] # for i in {a..c}; do ceph-deploy osd create ceph-$i:sdc:/dev/sdb1; done
[root@ceph-a ceph] # for i in {a..c}; do ceph-deploy osd create ceph-$i:sdd:/dev/sdb2; done
If you use the above command in this example, the following error will be reported, as shown in the figure
After the above steps, our Ceph installation is complete. Next we briefly cover how to use Ceph; more detailed applications are analyzed in later chapters.
VI. Use RBD (RADOS block devices)
1. First, a look at the storage methods Ceph supports. There are three:
A. Block storage: this is also the most widely used method.
B. CephFS: you only need to know about this method for now; it is not recommended in production because it is not yet mature enough.
C. Object storage: likewise only worth knowing about at this stage, since it is not yet mature and stable. Only the OpenStack Swift and Amazon S3 interfaces are supported. We will cover it later if needed.
2. Second, a look at what Ceph block storage is:
A. A Ceph block device is also called a RADOS block device (RBD).
B. The RBD driver is well integrated into the Linux kernel.
C. RBD provides enterprise features such as snapshots and COW clones.
D. RBD also supports in-memory caching, which can greatly improve performance.
E. The Linux kernel can access Ceph block storage directly.
F. KVM can access it by means of librbd.
3. Use RBD
①, View Storage Pool
[root@ceph-a ceph] # ceph osd lspools
Generally there used to be a default pool No. 0, but no default pool shows up here; recent Ceph versions (Luminous and later) no longer create one automatically, so this needs further observation.
②, create a ceph OSD storage pool
[root@ceph-a ~] # ceph osd pool create cephrbd 512
As shown in the figure:
Here, cephrbd is the name of the storage pool.
You usually need to override the default pg_num before creating a pool. The official recommendations are:
● Fewer than 5 OSDs: set pg_num to 128.
● 5 to 10 OSDs: set pg_num to 512.
● 10 to 50 OSDs: set pg_num to 4096.
● More than 50 OSDs: refer to pgcalc for the calculation (a worked example follows this list).
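As a worked example of the pgcalc approach (the rule of thumb below is an assumption on my part, not something stated above): the calculator is based on total PGs ≈ (number of OSDs × 100) / replica count, rounded up to the next power of two. For instance, 60 OSDs with 3 replicas give 60 × 100 / 3 = 2000, which rounds up to pg_num = 2048.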
③, create an image named image with a size of 10G
[root@ceph-a ~] # rbd create cephrbd/image --image-feature layering --size 10G
cephrbd/image: creates an image named image in the storage pool cephrbd.
--image-feature: specifies which image features to enable; they do not all have to be enabled. Since we only need snapshots, layering and similar features, enabling layering is sufficient.
④, check whether there are any images in the cephrbd storage pool
[root@ceph-a ~] # rbd ls cephrbd
As shown in the figure:
⑤, view image information
[root@ceph-a ~] # rbd info cephrbd/image
As shown in the figure:
⑥, write a udev rule so that the owner and group of sdb1 and sdb2 are still ceph after a reboot
[root@ceph-a ~] # vim /etc/udev/rules.d/87-cephdisk.rules
ACTION=="add", KERNEL=="sdb?", OWNER="ceph", GROUP="ceph"
Copy rules to Ceph-B and Ceph-C
[root@ceph-a ~] # for i in {b..c}; do scp /etc/udev/rules.d/87-cephdisk.rules ceph-$i:/etc/udev/rules.d/; done
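To make the rule take effect without a reboot, udev can be asked to reload and re-trigger (an optional step, not in the original):
[root@ceph-a ~] # for i in {a..c}; do ssh ceph-$i 'udevadm control --reload-rules && udevadm trigger'; done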
4. Image operations
①, expand capacity
[root@ceph-a ~] # rbd resize --size 15G cephrbd/image
Expand the image to 15G.
②, shrink capacity
[root@ceph-a ~] # rbd resize --size 11G cephrbd/image --allow-shrink
Shrink the image to 11G.
③, delete an image
[root@ceph-f ceph] # rbd rm cephrbd/demo-img
5. Use the Ceph client
Here we use the Ceph-F server as the Ceph client and map the previously created image to Ceph-F to use as a disk.
A. Install Ceph client software in Ceph-F
[root@ceph-f ~] # yum -y install ceph-common
B. Copy the ceph.conf and ceph.client.admin.keyring of Ceph-A to Ceph-F
[root@ceph-a ceph] # scp ceph.c* ceph-f:/etc/ceph
Ceph.client.admin.keyring is the key file of the client.admin user
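If you want to double-check which key the client will use, the admin user's key and capabilities can be printed from any node that has the keyring (an optional check, not in the original):
[root@ceph-f ~] # ceph auth get client.admin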
C. View status
[root@ceph-f ~] # ceph -s
D. Map the image to the local host
[root@ceph-f ceph] # rbd map cephrbd/image
E. View the information of the mapped disk
[root@ceph-f ceph] # rbd showmapped
F. Format and mount /dev/rbd0
①, format / dev/rbd0
[root@ceph-f ceph] # mkfs.ext4 /dev/rbd0
②, mount /dev/rbd0 on /image and add an automatic mount at boot
[root@ceph-f ceph] # mkdir /image
[root@ceph-f ceph] # mount /dev/rbd0 /image
[root@ceph-f ceph] # echo "/dev/rbd0 /image ext4 defaults 0 0" >> /etc/fstab
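Note that /dev/rbd0 only exists after the image has been mapped, so the fstab entry above will not mount by itself after a reboot. One common pattern (a sketch, assuming the rbdmap service shipped with ceph-common) is to register the image in /etc/ceph/rbdmap, enable the service, and add _netdev to the options of the fstab line above so the mount waits for the mapping:
[root@ceph-f ceph] # echo "cephrbd/image id=admin,keyring=/etc/ceph/ceph.client.admin.keyring" >> /etc/ceph/rbdmap
[root@ceph-f ceph] # systemctl enable rbdmap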
G. Write data to / image and check whether the data is written successfully
[root@ceph-f ceph] # echo "hello world" > / image/log.txt [root@ceph-f ceph] # cat / image/log.txt
H. Create an image snapshot: create a snapshot named image-sn1 for the image image
[root@ceph-f ceph] # rbd snap create cephrbd/image --snap image-sn1
I. View image snapshots
[root@ceph-f ceph] # rbd snap ls cephrbd/image
J. Delete mirror snapshot
[root@ceph-f ceph] # rbd snap remove cephrbd/image@image-sn1
K. Image snapshot operations
①, recover data from a snapshot
Delete the /image/log.txt file
[root@ceph-f ceph] # rm -rf /image/log.txt
Unmount mount point
[root@ceph-f ceph] # umount / dev/rbd0
Restore a snapshot using image-sn1
[root@ceph-f ceph] # rbd snap rollback cephrbd/image --snap image-sn1
Then mount / dev/rbd0 to see if the file exists
[root@ceph-f ceph] # mount /dev/rbd0 /image
[root@ceph-f ceph] # ll /image/
②, snapshot cloning
If you want to create a new image from a snapshot, you can use snapshot cloning.
Note: before cloning, the snapshot must be protected. A protected snapshot cannot be deleted; to delete it you must unprotect it first.
[root@ceph-f ceph] # rbd snap protect cephrbd/image --snap image-sn1
Try to delete the snapshot you just protected
[root@ceph-f ceph] # rbd snap remove cephrbd/image@image-sn1
As shown in the image above, the snapshot cannot be deleted
Clone the image-sn1 snapshot; the clone is named image-clone
[root@ceph-f ceph] # rbd clone cephrbd/image --snap image-sn1 cephrbd/image-clone --image-feature layering
View the status of a clone snapshot
[root@ceph-f ceph] # rbd info cephrbd/image-clone
The description in the red box means: a clone of the snapshot image-sn1 of the image image in the pool cephrbd.
③, flatten (merge) the cloned image
[root@ceph-f ceph] # rbd flatten cephrbd/image-clone
View merged information
[root@ceph-f ceph] # rbd info cephrbd/image-clone
From the figure above we can see that it has become an independent image.
View the images in the cephrbd pool
[root@ceph-f ceph] # rbd ls cephrbd
④, Unmapping
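If the image is still mounted at /image from the earlier steps, unmount it first; otherwise unmapping will typically fail because the device is busy:
[root@ceph-f ceph] # umount /image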
[root@ceph-f ceph] # rbd unmap / dev/rbd/cephrbd/image
I will elaborate on more applications of Ceph in a later article, and that's all for today.
For the application of Ceph block devices, please refer to another blog post: https://blog.51cto.com/4746316/2330070
For the application of CephFS file system, please refer to another blog post: https://blog.51cto.com/4746316/2330186
For Ceph object storage, please refer to my other blog post: https://blog.51cto.com/4746316/2330455
VII. Summary of Ceph deployment
Overall, deploying Ceph is fairly simple. But precisely because it looks simple, we need to pay more attention: the simpler something seems, the easier it is to stumble into pitfalls in daily operation and maintenance. For example, several of the problems encountered during the deployment above are not even mentioned in the official documentation, and working around them took a lot of thought, but at least the causes were found.