Ceph Block Storage


1. Install the Ceph block storage client

Ceph block devices, formerly called RADOS block devices, provide reliable, distributed, and high-performance block storage disks for clients.

The RADOS block device (RBD) leverages the librbd library and stores blocks across multiple OSDs in the Ceph cluster. RBD is built on Ceph's RADOS layer, so each block device is distributed across multiple Ceph nodes, providing high performance and excellent reliability. RBD is natively supported by the Linux kernel.

Any ordinary Linux host can act as a Ceph client. The client interacts with the Ceph storage cluster over the network to store or retrieve user data. Ceph RBD support was added to the Linux mainline kernel starting with version 2.6.34.

Perform the following steps on the client, 192.168.3.158.

1.1 Modify the hostname

[root@localhost ~]# cat /etc/hosts
192.168.3.165 ceph265
192.168.3.166 ceph266
192.168.3.167 ceph267
192.168.3.158 ceph258
[root@localhost ~]# hostnamectl set-hostname ceph258

1.2 Modify the ceph repo file

# wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo

1.3 Create the directory

# mkdir -p /etc/ceph

1.4 Install ceph

# yum -y install epel-release
# yum -y install ceph
# cat /etc/ceph/ceph.client.rbd.keyring

# create ceph block client user name and authentication key

[ceph@ceph265 my-cluster]$ ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' | tee ./ceph.client.rbd.keyring
[client.rbd]
    key = AQBLBwRepKVJABAALyRx67z6efeI4xogPqHkyw==
Note: client.rbd is the client name; what follows mon and osd are the authorization capabilities granted to it.
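If you want to confirm the capabilities that were just granted, the authorization can be read back from the cluster. This is an optional check run from the admin node, not part of the original steps:

[ceph@ceph265 my-cluster]$ ceph auth get client.rbd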

Copy the configuration file and key to the client

[ceph@ceph265 my-cluster]$ scp ceph.client.rbd.keyring root@192.168.3.158:/etc/ceph
[ceph@ceph265 my-cluster]$ scp ceph.conf root@192.168.3.158:/etc/ceph

# check whether it meets the environmental requirements of the block device

# uname -r
# modprobe rbd

# install ceph client

# wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo

View key file

[root@ceph258 ~]# cat /etc/ceph/ceph.client.rbd.keyring
[client.rbd]
    key = AQBLBwRepKVJABAALyRx67z6efeI4xogPqHkyw==
[root@ceph258 ~]# ceph -s --name client.rbd

2. Create and map block devices on the client

Execute the following command on server 192.168.3.165

(1) create a block device

By default, block devices are created in the rbd pool, but ceph-deploy does not create this pool during installation, so it must be created first.

# create pools and blocks

$ ceph osd lspools    # view the cluster's storage pools
$ ceph osd pool create rbd 50
pool 'rbd' created
# 50 is the number of placement groups; since more PGs are needed for subsequent tests, it is set to 50 here.

The value of pg_num must be specified explicitly because it cannot be calculated automatically. Here are a few commonly used values (a small worked sketch follows this list):

When there are fewer than 5 OSDs, set pg_num to 128.

When there are 5 to 10 OSDs, set pg_num to 512.

When there are 10 to 50 OSDs, set pg_num to 4096.

When there are more than 50 OSDs, you need to understand the trade-offs and calculate the pg_num value yourself.
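For clusters outside these ranges, the heuristic commonly cited in the Ceph documentation is total PGs ≈ (number of OSDs × 100) / replica count, rounded up to a power of two. A minimal bash sketch of that calculation follows; the OSD and replica numbers are illustrative, not taken from this cluster:

#!/bin/bash
# Illustrative pg_num estimate: (OSDs * 100) / replicas, rounded up to a power of 2
num_osds=3          # example value
replicas=3          # example value
raw=$(( num_osds * 100 / replicas ))
pg_num=1
while [ "$pg_num" -lt "$raw" ]; do
    pg_num=$(( pg_num * 2 ))
done
echo "suggested pg_num: $pg_num"    # prints 128 for these example values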

(2) the client creates a block device

Create an RBD block device with a capacity of 5105 MB

[root@ceph258 ~]# rbd create rbd2 --size 5105 --name client.rbd

View the rbd2 block device from the client, 192.168.3.158

[root@ceph258 ~]# rbd ls --name client.rbd
rbd2
[root@ceph258 ~]# rbd ls -p rbd --name client.rbd
rbd2
[root@ceph258 ~]# rbd list --name client.rbd
rbd2

View rbd2 block device information

[root@ceph258 ~]# rbd --image rbd2 info --name client.rbd

# map it to the client; an error is expected here

[root@ceph258 ~]# rbd map --image rbd2 --name client.rbd
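The exact message depends on the kernel, but on a CentOS 3.10 kernel the failure typically looks something like the following (illustrative output, not captured from this host):

rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address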

- layering: layering support

- exclusive-lock: exclusive lock support

- object-map: object map support (requires exclusive-lock)

- deep-flatten: snapshot flatten support

- fast-diff: fast diff calculation support (requires object-map)

Because we are using the krbd (kernel RBD) client on CentOS with kernel 3.10, we cannot map the block device image: this kernel does not support object-map, deep-flatten, or fast-diff (support was introduced in kernel 4.9). To solve this problem, we disable the unsupported features; there are several ways to do so:

1) Disable the features dynamically

rbd feature disable rbd2 exclusive-lock object-map deep-flatten fast-diff --name client.rbd

2) Enable only the layering feature when creating the RBD image

rbd create rbd2 --size 10240 --image-feature layering --name client.rbd

3) Disable the features in the Ceph configuration file

rbd_default_features = 1
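For reference, rbd_default_features is a bitmask and the value 1 corresponds to layering only. A sketch of where the setting would go on the client; the section placement in ceph.conf may vary with your setup:

# /etc/ceph/ceph.conf on the client (sketch)
[client]
rbd_default_features = 1    # 1 = layering only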

# We disable it dynamically here

[root@ceph258 ~]# rbd feature disable rbd2 exclusive-lock object-map fast-diff deep-flatten --name client.rbd

Map rbd2

[root@ceph258 ~]# rbd map --image rbd2 --name client.rbd

View the rbd images that have been mapped on this machine

[root@ceph258 ~]# rbd showmapped --name client.rbd

View disk rbd0 size

Format rbd0
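The commands for these two steps are not shown in the original; assuming the image was mapped to /dev/rbd0 and an XFS filesystem is used (xfs_growfs is used later for expansion), they would typically be:

[root@ceph258 ~]# fdisk -l /dev/rbd0     # view the size of the mapped disk
[root@ceph258 ~]# mkfs.xfs /dev/rbd0     # format it with XFS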

Create a mount directory and mount it

[root@ceph258 ~]# mkdir /mnt/ceph-disk1
[root@ceph258 ~]# mount /dev/rbd0 /mnt/ceph-disk1/

# write data test

[root@ceph258 ~]# dd if=/dev/zero of=/mnt/ceph-disk1/file1 count=100 bs=1M

# turn it into a service so it mounts automatically at boot

[root@ceph258 ~]# wget -O /usr/local/bin/rbd-mount https://raw.githubusercontent.com/aishangwei/ceph-demo/master/client/rbd-mount

# vim /usr/local/bin/rbd-mount
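The content of the downloaded rbd-mount script is not reproduced here; below is a minimal sketch of what such a script usually does, with the pool, image, and mount point assumed to match this walkthrough:

#!/bin/bash
# Minimal sketch of an rbd-mount helper (assumed values for this setup)
rbdpool=rbd
rbdimage=rbd2
mountpoint=/mnt/ceph-disk1
# map the image (udev creates /dev/rbd/<pool>/<image>), then mount it
rbd map $rbdimage --pool $rbdpool --name client.rbd
mount /dev/rbd/$rbdpool/$rbdimage $mountpoint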

[root@ceph258 ~]# chmod +x /usr/local/bin/rbd-mount
[root@ceph258 ~]# wget -O /etc/systemd/system/rbd-mount.service https://raw.githubusercontent.com/aishangwei/ceph-demo/master/client/rbd-mount.service
[root@ceph258 ~]# systemctl daemon-reload
[root@ceph258 ~]# systemctl enable rbd-mount.service
Created symlink from /etc/systemd/system/multi-user.target.wants/rbd-mount.service to /etc/systemd/system/rbd-mount.service.

Unmount the manually mounted directory and test automatic mounting via the service

[root@ceph258 ~]# umount /mnt/ceph-disk1/
[root@ceph258 ~]# systemctl status rbd-mount
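systemctl status only reports the service state; to actually exercise the automatic mount right away, one would typically start the service and check the mount. This is an assumed follow-up, not part of the original steps:

[root@ceph258 ~]# systemctl start rbd-mount.service
[root@ceph258 ~]# df -h | grep rbd0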

Ceph: RBD online capacity expansion

Operations on the Ceph management node

Query the total capacity and allocated capacity of pool

[root@ceph265 ~] # ceph df

View the pools that already exist

[root@ceph265 ~] # ceph osd lspools

Check the existing RBD images

Start dynamic expansion of rbd2

[root@ceph265 ~]# rbd resize rbd/rbd2 --size 7168
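To confirm the new size from the management side, the image can be inspected again. This is an optional check using the standard rbd command:

[root@ceph265 ~]# rbd info rbd/rbd2    # size should now show 7168 MB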

Operations on the Ceph client

[root@ceph258 ~] # rbd showmapped

[root@ceph258 ~]# df -h

[root@ceph258 ~]# xfs_growfs -d /mnt/ceph-disk1
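xfs_growfs applies because the device was formatted with XFS; if the image had instead been formatted with ext4, the analogous step would be the following (alternative shown for completeness):

[root@ceph258 ~]# resize2fs /dev/rbd0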

3. Ceph cluster troubleshooting

3.1 Inconsistent configuration file content between nodes

Running the ceph-deploy mon create-initial command to obtain the keys should generate several keyrings in the current directory (in my case, /etc/ceph/), but the error below appeared instead. It means that the configuration files on the two failed nodes differ from the one on the current node, so the tool prompts you to use the --overwrite-conf parameter to overwrite the inconsistent configuration files.

# ceph-deploy mon create-initial
......
[ceph3][DEBUG ] remote hostname: ceph3
[ceph3][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 2 monitors
......

Enter the following command (I have configured three nodes in total here, ceph2 through ceph4):

# ceph-deploy --overwrite-conf mon create ceph{2,3,4}
......
[ceph3][DEBUG ] remote hostname: ceph3
[ceph3][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph3][DEBUG ] create the mon path if it does not exist
[ceph3][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph3/done
......

After the configuration is successful, you can continue to initialize the disk.
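An equivalent way to bring the nodes back in sync is to push the admin node's configuration to them before retrying, using ceph-deploy's config push subcommand; the node names below are assumed from this cluster:

# ceph-deploy --overwrite-conf config push ceph2 ceph3 ceph4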

3.2 "too few PGs per OSD (21 < min 30)" warning

[root@ceph2 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_WARN
            too few PGs per OSD (21 < min 30)
  services:
    mon: 3 daemons, quorum ceph3,ceph2,ceph4
......

From the cluster status above, the number of PGs per OSD is 21, which is below the required minimum of 30. Creating additional pools adds PGs and raises the per-OSD count:

[root@ceph2 ceph]# ceph osd pool create mytest 8
pool 'mytest' created
[root@ceph2 ceph]# ceph osd pool create mytest1 8
pool 'mytest1' created
[root@ceph2 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph3,ceph2,ceph4
    mgr: ceph3(active), standbys: ceph2, ceph4
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active
  data:
    pools:   6 pools, 48 pgs
    objects: 219 objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     48 active+clean

The cluster health status now shows as normal.

3.3 Cluster status is HEALTH_WARN: application not enabled on 1 pool(s)

If at this point the cluster status shows HEALTH_WARN application not enabled on 1 pool(s):

[root@ceph2 ceph]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
[root@ceph2 ceph]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'mytest'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.

Running ceph health detail shows that the newly added pool mytest has not been tagged with an application. Since an RGW instance was added earlier, simply tag mytest with rgw as prompted:

[root@ceph2 ceph]# ceph osd pool application enable mytest rgw
enabled application 'rgw' on pool 'mytest'

Checking the cluster status again shows it has returned to normal:

[root@ceph2 ceph]# ceph health
HEALTH_OK

3.4 Error when deleting a storage pool

Taking the deletion of the mytest pool as an example, running ceph osd pool rm mytest reports an error indicating that the pool name must be passed twice, followed by the --yes-i-really-really-mean-it parameter:

[root@ceph2 ceph]# ceph osd pool rm mytest
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool mytest. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.

After repeating the pool name and adding the suggested parameter, the command still fails:

[root@ceph2 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool

The error message shows that pool deletion is disabled: before deleting, the mon_allow_pool_delete option must be added to ceph.conf and set to true. So log in to every node and modify its configuration file, as follows:

[root@ceph2 ceph]# vi ceph.conf
[root@ceph2 ceph]# systemctl restart ceph-mon.target

Add the following parameter at the bottom of ceph.conf and set it to true; after saving and exiting, restart the service with systemctl restart ceph-mon.target:

[mon]
mon allow pool delete = true

Do the same on the remaining nodes:

[root@ceph3 ceph]# vi ceph.conf
[root@ceph3 ceph]# systemctl restart ceph-mon.target
[root@ceph4 ceph]# vi ceph.conf
[root@ceph4 ceph]# systemctl restart ceph-mon.target

Deleting again now succeeds in removing the mytest pool:

[root@ceph2 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
pool 'mytest' removed

3.5 Troubleshooting node recovery after cluster nodes go down

After shutting down and restarting each of the three nodes in the Ceph cluster, the cluster status looked like this:

[root@ceph2 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            324/702 objects misplaced (46.154%)
            Reduced data availability: 126 pgs inactive
            Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
  services:
    mon: 3 daemons, quorum ceph3,ceph2,ceph4
    mgr: ceph2(active), standbys: ceph3, ceph4
    mds: cephfs-1/1/1 up {0=ceph2=up:creating}
    osd: 3 osds: 3 up, 3 in; 162 remapped pgs
  data:
    pools:   8 pools, 288 pgs
    objects: 234 objects, 2.8 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     43.750% pgs not active
             144/702 objects degraded (20.513%)
             324/702 objects misplaced (46.154%)
             162 active+clean+remapped
             123 undersized+peered
             3   undersized+degraded+peered

View the health details:

[root@ceph2 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 324/702 objects misplaced (46.154%); Reduced data availability: 126 pgs inactive; Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsceph2(mds.0): 9 slow metadata IOs are blocked > 30 secs, oldest blocked for 42075 secs
OBJECT_MISPLACED 324/702 objects misplaced (46.154%)
PG_AVAILABILITY Reduced data availability: 126 pgs inactive
    pg 8.28 is stuck inactive for 42240.369934, current state undersized+peered, last acting [0]
    pg 8.2a is stuck inactive for 45566.934835, current state undersized+peered, last acting [0]
    pg 8.2d is stuck inactive for 42240.371314, current state undersized+peered, last acting [0]
    pg 8.2f is stuck inactive for 45566.913284, current state undersized+peered, last acting [0]
    pg 8.32 is stuck inactive for 42240.354304, current state undersized+peered, last acting [0]
    ......
    pg 8.28 is stuck undersized for 42065.616897, current state undersized+peered, last acting [0]
    pg 8.2a is stuck undersized for 42065.613246, current state undersized+peered, last acting [0]
    pg 8.2d is stuck undersized for 42065.951760, current state undersized+peered, last acting [0]
    pg 8.2f is stuck undersized for 42065.610464, current state undersized+peered, last acting [0]
    pg 8.32 is stuck undersized for 42065.959081, current state undersized+peered, last acting [0]
    ......

As can be seen, inactive and undersized PG states appear in the output, which is abnormal.

Solution:

① Handle the inactive PGs:

Just restart the osd service

[root@ceph2 ~]# systemctl restart ceph-osd.target

Checking the cluster status again shows that the inactive PGs have returned to normal, leaving only the undersized PGs.

[root@ceph2 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 filesystem is degraded
            241/723 objects misplaced (33.333%)
            Degraded data redundancy: 59 pgs undersized
  services:
    mon: 3 daemons, quorum ceph3,ceph2,ceph4
    mgr: ceph2(active), standbys: ceph3, ceph4
    mds: cephfs-1/1/1 up {0=ceph2=up:rejoin}
    osd: 3 osds: 3 up, 3 in; 229 remapped pgs
    rgw: 1 daemon active
  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.4 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     241/723 objects misplaced (33.333%)
             224 active+clean+remapped
             59  active+undersized
             5   active+clean
  io:
    client: 1.2 KiB/s rd, 1 op/s rd, 0 op/s wr

② Handle the undersized PGs:

First look at the health status details. Careful analysis shows that although the configured replica count is 3, each of the PGs in pool 12 has only two copies, stored on two of OSDs 0, 1, and 2.

[root@ceph2 ~]# ceph health detail
HEALTH_WARN 241/723 objects misplaced (33.333%); Degraded data redundancy: 59 pgs undersized
OBJECT_MISPLACED 241/723 objects misplaced (33.333%)
PG_DEGRADED Degraded data redundancy: 59 pgs undersized
    pg 12.8 is stuck undersized for 1910.001993, current state active+undersized, last acting [2,0]
    pg 12.9 is stuck undersized for 1909.989334, current state active+undersized, last acting [2,0]
    pg 12.a is stuck undersized for 1909.995807, current state active+undersized, last acting [0,2]
    pg 12.b is stuck undersized for 1910.009596, current state active+undersized, last acting [1,0]
    pg 12.c is stuck undersized for 1910.010185, current state active+undersized, last acting [0,2]
    pg 12.d is stuck undersized for 1910.001526, current state active+undersized, last acting [1,0]
    pg 12.e is stuck undersized for 1909.984982, current state active+undersized, last acting [2,0]
    pg 12.f is stuck undersized for 1910.010640, current state active+undersized, last acting [2,0]

Looking further at the cluster OSD tree, after ceph3 and ceph4 recovered from their outages, the osd.1 and osd.2 processes are no longer running on ceph3 and ceph4.

[root@ceph2 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9       0.16159     host centos7evcloud
 1   hdd 0.08080         osd.1               up  1.00000 1.00000
 2   hdd 0.08080         osd.2               up  1.00000 1.00000
-3       0.08080     host ceph2
 0   hdd 0.08080         osd.0               up  1.00000 1.00000
-5             0     host ceph3
-7             0     host ceph4

View the osd.1 and osd.2 service status, respectively.

Solution:

Restart the osd.1 and osd.2 services on the ceph3 and ceph4 nodes respectively, so that the two OSDs are remapped back to the ceph3 and ceph4 nodes.

[root@ceph2 ~]# ssh ceph3
[root@ceph3 ~]# systemctl restart ceph-osd@1.service
[root@ceph3 ~]# ssh ceph4
[root@ceph4 ~]# systemctl restart ceph-osd@2.service

Finally, look at the cluster osd state tree and find that the two services are remapped to the ceph3 and ceph4 nodes.

[root@ceph4 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9             0     host centos7evcloud
-3       0.08080     host ceph2
 0   hdd 0.08080         osd.0               up  1.00000 1.00000
-5       0.08080     host ceph3
 1   hdd 0.08080         osd.1               up  1.00000 1.00000
-7       0.08080     host ceph4
 2   hdd 0.08080         osd.2               up  1.00000 1.00000

The cluster status also shows HEALTH_OK again, which had not been seen for a while.

[root@ceph4 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph3,ceph2,ceph4
    mgr: ceph2(active), standbys: ceph3, ceph4
    mds: cephfs-1/1/1 up {0=ceph2=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active
  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.6 KiB
    usage:   3.1 GiB used, 245 GiB / 248 GiB avail
    pgs:     288 active+clean

3.6 Error when remounting CephFS after unmounting

The mount command is as follows:

mount -t ceph 10.0.86.246:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

After unmounting CephFS, remounting it reports an error: mount error(2): No such file or directory

Note: first check whether the /mnt/mycephfs/ directory exists and is accessible. Mine existed, but the No such file or directory error persisted. After restarting the osd service, however, CephFS could surprisingly be mounted normally.

[root@ceph2 ~]# systemctl restart ceph-osd.target
[root@ceph2 ~]# mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

As can be seen, the mount is now successful:

[root@ceph2 ~]# df -h
Filesystem                                            Size  Used Avail Use% Mounted on
/dev/vda2                                              48G  7.5G   41G  16% /
devtmpfs                                              1.9G     0  1.9G   0% /dev
tmpfs                                                 2.0G  8.0K  2.0G   1% /dev/shm
tmpfs                                                 2.0G   17M  2.0G   1% /run
tmpfs                                                 2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                                 2.0G   24K  2.0G   1% /var/lib/ceph/osd/ceph-0
tmpfs                                                 396M     0  396M   0% /run/user/0
10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/  249G  3.1G  246G   2% /mnt/mycephfs
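As a side note, passing the key on the command line exposes it in the shell history; mount.ceph also accepts a secretfile option, roughly like this (the path /etc/ceph/admin.secret is an assumption):

[root@ceph2 ~]# echo 'AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==' > /etc/ceph/admin.secret
[root@ceph2 ~]# mount -t ceph 10.0.86.246:6789:/ /mnt/mycephfs/ -o name=admin,secretfile=/etc/ceph/admin.secret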

