2025-01-19 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 report
How do Ceph RBD volumes actually work in Docker and Kubernetes? In particular, does using a Ceph RBD block device in Docker or Kubernetes incur extra performance loss compared with using it directly on the host? This article analyzes the underlying mechanisms to answer these questions.
Mount namespaces and mount propagation in Linux
The Linux mount namespace isolates file systems by isolating their mount points. It was the first Linux namespace in history, which is why its flag has the special name CLONE_NEWNS. After isolation, changes to the mount tree in different mount namespaces do not affect each other. /proc/[pid]/mounts lists all file systems mounted in a process's namespace, and /proc/[pid]/mountstats shows per-device statistics, including the mounted device name, file system type, mount point, and so on.
When a process creates a mount namespace, the current mount tree is copied into the new namespace. From then on, mount operations in the new namespace affect only its own view of the file system and have no effect on the outside. This gives very strict isolation, but it does not suit every case. For example, if a process in the parent namespace mounts a CD-ROM, the copied mount tree in a child namespace cannot automatically gain that CD-ROM mount, because doing so would require the event to cross namespace boundaries.
Mount propagation (Mount Propagation), introduced in 2006, solves this problem. It defines relationships between mount objects (Mount Object) that the kernel uses to decide how a mount event on one object propagates to others (reference: http://www.ibm.com/developerworks/library/l-mount-namespaces/). A propagation event is a mount or unmount on one mount object that triggers a corresponding mount or unmount on other mount objects.
Shared relationship (Share Relationship). If two mount objects have a shared relationship, a mount event on either one propagates to the other, and vice versa.
Slave relationship (Slave Relationship). If two mount objects have a master-slave relationship, a mount event on the master propagates to the slave, but not the other way around; the slave is purely a recipient of events.
A mount state may be one of the following:
Shared mount (Shared)
Slave mount (Slave)
Shared and slave mount (Shared And Slave)
Private mount (Private)
Unbound mount (Unbindable)
A mount object that propagates events is called a shared mount (Shared Mount); one that receives propagated events is called a slave mount (Slave Mount). A mount object that neither propagates nor receives events is a private mount (Private Mount). A special case is the unbindable mount (Unbindable Mount), which behaves like a private mount but additionally cannot be bind-mounted, meaning it is not copied when a mount namespace is created.
The use case for shared mounts is obvious: they are the mode required for sharing file data. Slave mounts matter mostly in "read-only" style scenarios where events should flow one way. Private mounts are pure isolation and exist as independent objects. Unbindable mounts help prevent unnecessary copies; for example, a user data directory may need a "cannot be copied" option, both for privacy and for practicality, when the root directory is copied recursively.
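These propagation states can be read directly from /proc/self/mountinfo: the optional fields between the mount options and the "-" separator carry shared:N and master:N tags. As a minimal illustration (not from the original article; the sample line is made up and the parser is simplified), a Python sketch classifying one mountinfo line:

```python
# Sketch: classify a mount's propagation type from one /proc/self/mountinfo
# line. Simplified: assumes the "-" separator token never appears earlier.

def propagation_type(mountinfo_line: str) -> str:
    """Return 'shared', 'slave', 'shared+slave', or 'private'."""
    fields = mountinfo_line.split()
    # Optional fields sit between field 6 and the "-" separator.
    sep = fields.index("-")
    optional = fields[6:sep]
    shared = any(f.startswith("shared:") for f in optional)
    slave = any(f.startswith("master:") for f in optional)
    if shared and slave:
        return "shared+slave"
    if shared:
        return "shared"
    if slave:
        return "slave"
    return "private"

line = ("549 40 253:0 /opt/tmp /mnt/tmp rw,relatime shared:1 "
        "- xfs /dev/mapper/centos-root rw")
print(propagation_type(line))  # → shared
```

The same logic is what tools like findmnt use to display the PROPAGATION column.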
By default, all mounts are private. The command to mark a mount as shared is:

$ mount --make-shared <mountpoint>
Mount objects cloned from a shared mount are also shared; they propagate mount events to one another.
The command to mark a mount as a slave is:

$ mount --make-slave <mountpoint>
A mount object cloned from a slave mount is also a slave, and it remains subordinate to the master of the original slave mount.
To turn a slave mount into a shared-and-slave mount, either run the following command or move it under a shared mount object:

$ mount --make-shared <mountpoint>
To mark a modified mount object as private again:

$ mount --make-private <mountpoint>
To mark a mount object as unbindable:

$ mount --make-unbindable <mountpoint>

Mount binding test on Linux
View native block devices:
$ lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0 222.6G  0 disk
├─sda1            8:1    0   200M  0 part /boot
└─sda2            8:2    0 222.4G  0 part
  ├─centos-root 253:0    0 122.4G  0 lvm  /
  └─centos-home 253:1    0   100G  0 lvm  /home
Create and bind directories:
$ mkdir /opt/tmp /mnt/tmp /mnt/tmp1 /mnt/tmp2
$ mount --bind /opt/tmp /mnt/tmp
$ mount --bind /mnt/tmp1 /mnt/tmp2
View the details of the binding directory:
$ cat /proc/self/mountinfo | grep /mnt/tmp
549 40 253:0 /opt/tmp /mnt/tmp rw,relatime shared:1 - xfs /dev/mapper/centos-root rw,seclabel,attr2,inode64,noquota
583 40 253:0 /mnt/tmp1 /mnt/tmp2 rw,relatime shared:1 - xfs /dev/mapper/centos-root rw,seclabel,attr2,inode64,noquota
You can see that both bind mounts are shared, with peer group ID 1, and that their parent mount sits on device 253:0.
The main ways to use data volumes in Docker
Reference documentation:
Manage data in Docker
Use bind mounts
Mount modes supported by Docker:
Create a data volume (volume mount):
$ docker run --rm -it -v /data1 centos:7 bash
# Or
$ docker run --rm -it -v data1:/data1 centos:7 bash
# Or
$ docker run --rm -it --mount target=/data1 centos:7 bash
# Or
$ docker run --rm -it --mount type=volume,target=/data1 centos:7 bash
# Or
$ docker run --rm -it --mount type=volume,source=data1,target=/data1 centos:7 bash

$ docker ps | awk 'NR==2 {print $1}' | xargs -i docker inspect -f '{{.State.Pid}}' {} | xargs -i cat /proc/{}/mountinfo | grep data1
1029 1011 253:0 /var/lib/docker/volumes/239be79a64f7fa6ec815b1d9f2a7773a678ee5c8c1150f03ca81b0d5177b36a0/_data /data1 rw,relatime master:1 - xfs /dev/mapper/centos-root rw,seclabel,attr2,inode64,noquota
Map an external volume (bind mount):
$ docker run --rm -it -v /opt:/data2 centos:7 bash
# Or
$ docker run --rm -it --mount type=bind,source=/opt,target=/data2 centos:7 bash

$ docker ps | awk 'NR==2 {print $1}' | xargs -i docker inspect -f '{{.State.Pid}}' {} | xargs -i cat /proc/{}/mountinfo | grep data2
1029 1011 253:0 /opt /data2 rw,relatime - xfs /dev/mapper/centos-root rw,seclabel,attr2,inode64,noquota
Use a data container (volume mount):
$ docker create --name vc -v /data1 centos:7
$ docker run --rm -it --volumes-from vc centos:7 bash
$ docker ps | awk 'NR==2 {print $1}' | xargs -i docker inspect -f '{{.State.Pid}}' {} | xargs -i cat /proc/{}/mountinfo | grep data1
1029 1011 253:0 /var/lib/docker/volumes/fe71f2d0ef18beb92cab7b99afcc5f501e47ed18224463e8c1aa1e8733003803/_data /data1 rw,relatime master:1 - xfs /dev/mapper/centos-root rw,seclabel,attr2,inode64,noquota
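The mount modes shown so far differ mainly in the flags passed to docker run. As an illustration only (this is plain string construction, not a real Docker API), a small Python helper that builds the --mount flag for the three mount types:

```python
# Illustrative helper: build the `docker run --mount` flag string for the
# three mount types discussed here (volume, bind, tmpfs).

def mount_flag(mtype: str, target: str, source: str = None) -> str:
    if mtype not in ("volume", "bind", "tmpfs"):
        raise ValueError(f"unsupported mount type: {mtype}")
    if mtype == "bind" and source is None:
        raise ValueError("bind mounts require a host source path")
    parts = [f"type={mtype}"]
    if source is not None:
        parts.append(f"source={source}")
    parts.append(f"target={target}")
    return "--mount " + ",".join(parts)

print(mount_flag("bind", "/data2", "/opt"))
# → --mount type=bind,source=/opt,target=/data2
```

Note that a bind mount requires an existing host path as source, while a volume mount may omit the source and let Docker create an anonymous volume, matching the transcripts above.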
Container with packaged data (volume mount):
Edit Dockerfile
FROM busybox:latest
ADD htdocs /usr/local/apache2/htdocs
VOLUME /usr/local/apache2/htdocs
Create a container
$ mkdir htdocs
$ echo `date` > htdocs/test.txt
$ docker build -t volume-test .
$ docker create --name vc2 -v /data1 volume-test
$ docker run --rm -it --volumes-from vc2 volume-test sh
/ # cat /proc/self/mountinfo | grep htdocs
1034 1011 253:0 /var/lib/docker/volumes/54f47af60b8fb25602f022dcd8ad5b3e1a93a2d20c1045184a70391d9bed69b6/_data /usr/local/apache2/htdocs rw,relatime master:1 - xfs /dev/mapper/centos-root rw,seclabel,attr2,inode64,noquota

$ docker ps | awk 'NR==2 {print $1}' | xargs -i docker inspect -f '{{.State.Pid}}' {} | xargs -i cat /proc/{}/mountinfo | grep htdocs
1034 1011 253:0 /var/lib/docker/volumes/54f47af60b8fb25602f022dcd8ad5b3e1a93a2d20c1045184a70391d9bed69b6/_data /usr/local/apache2/htdocs rw,relatime master:1 - xfs /dev/mapper/centos-root rw,seclabel,attr2,inode64,noquota
Use temporary external volumes (temporary mount):
$ docker run --rm -it --mount type=tmpfs,target=/data1 centos:7 bash
$ docker ps | awk 'NR==2 {print $1}' | xargs -i docker inspect -f '{{.State.Pid}}' {} | xargs -i cat /proc/{}/mountinfo | grep data1
1029 1011 0:160 / /data1 rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,seclabel

Tests of block device usage in Docker
A host block device can only be used in a container through a directory that has already been formatted and mounted on the host and then bind-mounted into the container:
$ docker run --rm -it -v /data1 -v /opt:/data2 centos:7 bash
[root@4282b3df2417 /]# mount | grep data
/dev/sdb1 on /data2 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda1 on /data1 type xfs (rw,relatime,attr2,inode64,noquota)
$ docker inspect 4282b3df2417 | grep -i pid
    "Pid": 12797,
    "PidMode": "",
    "PidsLimit": 0,
$ cat /proc/12797/mounts | grep data
/dev/sdb1 /data2 xfs rw,relatime,attr2,inode64,noquota 0 0
/dev/sda1 /data1 xfs rw,relatime,attr2,inode64,noquota 0 0
A block device passed into the container with --device can be read and written, but not mounted:
$ docker run --rm -it --device /dev/sdc:/dev/sdc centos:7 bash
[root@55423f5eaeea /]# mkfs -t minix /dev/sdc
21856 inodes
65535 blocks
Firstdatazone=696
Zonesize=1024
Maxsize=268966912
[root@55423f5eaeea /]# mknod /dev/sdd b 8 48
[root@55423f5eaeea /]# mkfs -t minix /dev/sdd
mkfs.minix: cannot open /dev/sdd: Operation not permitted
[root@55423f5eaeea /]# rm /dev/sdc
rm: remove block special file '/dev/sdc'? y
[root@55423f5eaeea /]# mknod /dev/sdc b 8 32
[root@55423f5eaeea /]# mkfs -t minix /dev/sdc
21856 inodes
65535 blocks
Firstdatazone=696
Zonesize=1024
Maxsize=268966912
[root@55423f5eaeea /]# mount /dev/sdc mnt/
mount: permission denied
[root@55423f5eaeea /]# dd if=/dev/sdc of=/dev/null bs=512 count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.000664491 s, 7.7 MB/s
[root@55423f5eaeea /]# dd if=/dev/zero of=/dev/sdc bs=512 count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00124138 s, 4.1 MB/s
In privileged mode, the container can read, write, and mount block devices:
$ docker run --rm -it --privileged=true centos:7 bash
[root@b5c40e199476 /]# mount /dev/sdc mnt
mount: unknown filesystem type 'minix'
[root@b5c40e199476 /]# yum install -y xfsprogs
[root@b5c40e199476 /]# mkfs.xfs /dev/sdc -f
meta-data=/dev/sdc               isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@b5c40e199476 /]# mount /dev/sdc mnt
[root@b5c40e199476 /]# df -h
Filesystem               Size  Used Avail Use% Mounted on
overlay                   30G   19G   12G  62% /
tmpfs                    910M     0  910M   0% /dev
tmpfs                    910M     0  910M   0% /sys/fs/cgroup
/dev/sda1                 30G   19G   12G  62% /etc/hosts
shm                       64M     0   64M   0% /dev/shm
/dev/sdc                 100G   33M  100G   1% /mnt
[root@b5c40e199476 /]# echo `date` > /mnt/time.txt
[root@b5c40e199476 /]# cat /mnt/time.txt
Wed Mar 6 12:23:05 UTC 2019

Use and implementation of block devices in Kubernetes
Looking at the source code where kubelet initializes its root directory /var/lib/kubelet, you can see that kubelet uses the syscall.MS_SHARED | syscall.MS_REC flags, so all mounts under this directory are shared by default (equivalent to running mount --make-rshared /var/lib/kubelet):
// pkg/kubelet/kubelet.go
// setupDataDirs creates:
// 1. the root directory
// 2. the pods directory
// 3. the plugins directory
// 4. the pod-resources directory
func (kl *Kubelet) setupDataDirs() error {
	if err := kl.mounter.MakeRShared(kl.getRootDir()); err != nil {
		return fmt.Errorf("error configuring root directory: %v", err)
	}
	...
}

// pkg/util/mount/nsenter_mount.go
func (n *NsenterMounter) MakeRShared(path string) error {
	return doMakeRShared(path, hostProcMountinfoPath)
}

// pkg/util/mount/mount_linux.go
// doMakeRShared is the common implementation of MakeRShared on Linux. It checks
// if path is shared and bind-mounts it as rshared if needed.
func doMakeRShared(path string, mountInfoFilename string) error {
	shared, err := isShared(path, mountInfoFilename)
	if err != nil {
		return err
	}
	if shared {
		klog.V(4).Infof("Directory %s is already on a shared mount", path)
		return nil
	}
	klog.V(2).Infof("Bind-mounting %q with shared mount propagation", path)
	// mount --bind /var/lib/kubelet /var/lib/kubelet
	if err := syscall.Mount(path, path, "" /*fstype*/, syscall.MS_BIND, "" /*data*/); err != nil {
		return fmt.Errorf("failed to bind-mount %s: %v", path, err)
	}
	// mount --make-rshared /var/lib/kubelet
	if err := syscall.Mount(path, path, "" /*fstype*/, syscall.MS_SHARED|syscall.MS_REC, "" /*data*/); err != nil {
		return fmt.Errorf("failed to make %s rshared: %v", path, err)
	}
	return nil
}
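The decision logic of doMakeRShared can be sketched outside Go as well. The following Python snippet (an illustration, not kubelet code) checks whether a path is already a shared mount point in a mountinfo dump and, if not, returns the two mount commands that would be needed; actual syscalls are represented as command strings since they require root:

```python
# Sketch of kubelet's doMakeRShared decision: if the path is not already a
# shared mount in mountinfo, bind-mount it onto itself and make it rshared.
# Simplified: only matches exact mount points, unlike kubelet's isShared.

def make_rshared_commands(path: str, mountinfo: str) -> list:
    """Return the mount commands needed, or [] if already shared."""
    for line in mountinfo.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[4] != path:
            continue
        sep = fields.index("-")
        if any(f.startswith("shared:") for f in fields[6:sep]):
            return []  # already a shared mount, nothing to do
    # equivalent of: mount --bind <path> <path> && mount --make-rshared <path>
    return [f"mount --bind {path} {path}",
            f"mount --make-rshared {path}"]
```

On a fresh host, /var/lib/kubelet is usually not a separate mount, so kubelet takes the bind-then-rshared branch exactly once at startup.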
Create a Pod that provisions an RBD volume through a PVC:
$ echo 'apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
spec:
  containers:
  - name: nginx
    image: nginx:latest
    volumeMounts:
    - name: nginx-test-vol1
      mountPath: /data/
      readOnly: false
  volumes:
  - name: nginx-test-vol1
    persistentVolumeClaim:
      claimName: nginx-test-vol1-claim' | kubectl create -f -
pod/nginx-test created
View the status of PVC:
$ kubectl get pvc
NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-test-vol1-claim   Bound    pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54   10Gi       RWO            ceph-rbd       114s
$ kubectl describe pvc nginx-test-vol1-claim
Name:          nginx-test-vol1-claim
Namespace:     default
StorageClass:  ceph-rbd
Status:        Bound
Volume:        pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/rbd
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    nginx-test
Events:
  Type    Reason                 Age    From                         Message
  ----    ------                 ----   ----                         -------
  Normal  ProvisioningSucceeded  6m36s  persistentvolume-controller  Successfully provisioned volume pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54 using kubernetes.io/rbd
View the status of PV:
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                           STORAGECLASS   REASON   AGE
pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54   10Gi       RWO            Delete           Bound    default/nginx-test-vol1-claim   ceph-rbd                105s
$ kubectl describe pv pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54
Name:            pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54
Labels:          <none>
Annotations:     kubernetes.io/createdby: rbd-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller: yes
                 pv.kubernetes.io/provisioned-by: kubernetes.io/rbd
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ceph-rbd
Status:          Bound
Claim:           default/nginx-test-vol1-claim
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:
Source:
    Type:          RBD (a Rados Block Device mount on the host that shares a pod's lifetime)
    CephMonitors:  [172.29.201.125]
    RBDImage:      kubernetes-dynamic-pvc-db7fcd29-446c-11e9-af81-6c92bf74be54
    FSType:
    RBDPool:       k8s
    RadosUser:     k8s
    Keyring:       /etc/ceph/keyring
    SecretRef:     &SecretReference{Name:ceph-k8s-secret,Namespace:,}
    ReadOnly:      false
Events:            <none>
View the RBD created and mapped:
$ rbd ls -p k8s
kubernetes-dynamic-pvc-db7fcd29-446c-11e9-af81-6c92bf74be54
$ lsblk | grep rbd0
rbd0 252:0 0 10G 0 disk /var/lib/kubelet/pods/18a8fb7b-446d-11e9-bbd8-6c92bf74be54/volumes/kubernetes.io~rbd/pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54
View the mount information of RBD:
$ cat /proc/self/mountinfo | grep rbd0
313 40 252:0 / /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/k8s-image-kubernetes-dynamic-pvc-db7fcd29-446c-11e9-af81-6c92bf74be54 rw,relatime shared:262 - ext4 /dev/rbd0 rw,seclabel,stripe=1024,data=ordered
318 40 252:0 / /var/lib/kubelet/pods/18a8fb7b-446d-11e9-bbd8-6c92bf74be54/volumes/kubernetes.io~rbd/pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54 rw,relatime shared:262 - ext4 /dev/rbd0 rw,seclabel,stripe=1024,data=ordered
You can see that the RBD is mounted in two locations, the Pod's volume directory and the RBD plugin directory, and that both carry shared:262, indicating the two mounts are bound together in the same peer group.
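This peer-group relationship can be extracted mechanically: group mountinfo lines by their shared:N tag and every group with more than one entry is a set of mount points that propagate to each other. A small Python sketch (illustrative sample data, shortened paths, not the host's real mountinfo):

```python
# Sketch: group mountinfo lines by peer group id (shared:N) to see which
# mount points propagate events to each other.

def peer_groups(mountinfo: str) -> dict:
    """Map each shared:N tag to the list of mount points carrying it."""
    groups = {}
    for line in mountinfo.splitlines():
        fields = line.split()
        sep = fields.index("-")
        for f in fields[6:sep]:
            if f.startswith("shared:"):
                groups.setdefault(f, []).append(fields[4])
    return groups

sample = "\n".join([
    "313 40 252:0 / /var/lib/kubelet/plugins/rbd rw shared:262 - ext4 /dev/rbd0 rw",
    "318 40 252:0 / /var/lib/kubelet/pods/vol rw shared:262 - ext4 /dev/rbd0 rw",
])
print(peer_groups(sample))
# → {'shared:262': ['/var/lib/kubelet/plugins/rbd', '/var/lib/kubelet/pods/vol']}
```

Run against the real /proc/self/mountinfo on the node, the shared:262 group would contain exactly the two RBD mount points shown above.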
View the location of the RBD mount directory:
$ cat /proc/self/mountinfo | grep "^40 "
40 0 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/centos-root rw,seclabel,attr2,inode64,noquota
You can see that the parent of the RBD mounts, mount ID 40, is the host's root directory on device 253:0.
View the volume directory mounted by Pod:
$ cat /proc/self/mountinfo | grep 18a8fb7b-446d-11e9-bbd8-6c92bf74be54
303 40 0:56 / /var/lib/kubelet/pods/18a8fb7b-446d-11e9-bbd8-6c92bf74be54/volumes/kubernetes.io~secret/default-token-zn95h rw,relatime shared:233 - tmpfs tmpfs rw,seclabel
318 40 252:0 / /var/lib/kubelet/pods/18a8fb7b-446d-11e9-bbd8-6c92bf74be54/volumes/kubernetes.io~rbd/pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54 rw,relatime shared:262 - ext4 /dev/rbd0 rw,seclabel,stripe=1024,data=ordered
$ cat /proc/self/mountinfo | grep shared:233
303 40 0:56 / /var/lib/kubelet/pods/18a8fb7b-446d-11e9-bbd8-6c92bf74be54/volumes/kubernetes.io~secret/default-token-zn95h rw,relatime shared:233 - tmpfs tmpfs rw,seclabel
You can see that the Pod has two volumes mounted: besides the RBD seen earlier, there is also a tmpfs volume holding its Secret.
View the mount directory in the Docker container of Pod:
$ docker inspect $(docker ps | grep nginx_nginx-test | awk '{print $1}') | grep Mounts -A33
"Mounts": [
    {
        "Type": "bind",
        "Source": "/var/lib/kubelet/pods/18a8fb7b-446d-11e9-bbd8-6c92bf74be54/volumes/kubernetes.io~rbd/pvc-d6f6b6f8-446c-11e9-bbd8-6c92bf74be54",
        "Destination": "/data",
        "Mode": "Z",
        "RW": true,
        "Propagation": "rprivate"
    },
    {
        "Type": "bind",
        "Source": "/var/lib/kubelet/pods/18a8fb7b-446d-11e9-bbd8-6c92bf74be54/volumes/kubernetes.io~secret/default-token-zn95h",
        "Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
        "Mode": "ro,Z",
        "RW": false,
        "Propagation": "rprivate"
    },
    {
        "Type": "bind",
        "Source": "/var/lib/kubelet/pods/18a8fb7b-446d-11e9-bbd8-6c92bf74be54/etc-hosts",
        "Destination": "/etc/hosts",
        "Mode": "Z",
        "RW": true,
        "Propagation": "rprivate"
    },
    {
        "Type": "bind",
        "Source": "/var/lib/kubelet/pods/18a8fb7b-446d-11e9-bbd8-6c92bf74be54/containers/nginx/190cc168",
        "Destination": "/dev/termination-log",
        "Mode": "Z",
        "RW": true,
        "Propagation": "rprivate"
    }
]
You can see that these Docker volumes are all ultimately bind mounts, and their mount propagation mode is rprivate.
Check the Mount in the container of Pod:
$ docker exec -it $(docker ps | grep nginx_nginx-test | awk '{print $1}') df -h
Filesystem               Size  Used Avail Use% Mounted on
overlay                  123G  4.7G  118G   4% /
tmpfs                     64M     0   64M   0% /dev
tmpfs                    189G     0  189G   0% /sys/fs/cgroup
/dev/rbd0                9.8G   37M  9.7G   1% /data
/dev/mapper/centos-root  123G  4.7G  118G   4% /etc/hosts
shm                       64M     0   64M   0% /dev/shm
tmpfs                    189G   12K  189G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                    189G     0  189G   0% /proc/acpi
tmpfs                    189G     0  189G   0% /proc/scsi
tmpfs                    189G     0  189G   0% /sys/firmware
$ docker exec -it $(docker ps | grep nginx_nginx-test | awk '{print $1}') cat /proc/self/mountinfo | grep -e rbd -e serviceaccount
617 599 252:0 / /data rw,relatime - ext4 /dev/rbd0 rw,seclabel,stripe=1024,data=ordered
623 599 0:56 / /run/secrets/kubernetes.io/serviceaccount ro,relatime - tmpfs tmpfs rw,seclabel
You can see that inside the Pod's container, the main extra mounts are the RBD volume and the Secret directory.
Analysis and summary
In Docker, whichever way you use data volumes, they are ultimately implemented with Linux's mount --bind bind-mount feature.
When an RBD volume is used in Kubernetes, the image is first mapped to the host with rbd map and formatted, then mounted into a host directory, and finally that host directory is bind-mounted (mount --bind) into the container's target directory.
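The sequence above can be sketched as a command list. This is an illustration of the article's description only, with made-up pool, image, and directory names; real kubelet performs these steps natively through its volume plugin rather than by shelling out like this:

```python
# Hedged sketch of the host-side sequence for attaching a Kubernetes RBD
# volume: map the image, format it (first use only), mount it into the
# plugin directory, then bind-mount it into the Pod's volume directory.

def rbd_attach_commands(pool: str, image: str,
                        plugin_dir: str, pod_vol_dir: str) -> list:
    dev = f"/dev/rbd/{pool}/{image}"  # udev symlink created by `rbd map`
    return [
        f"rbd map {pool}/{image}",
        f"mkfs.ext4 {dev}",                        # only on first use
        f"mount {dev} {plugin_dir}",
        f"mount --bind {plugin_dir} {pod_vol_dir}",
    ]

for cmd in rbd_attach_commands(
        "k8s", "pvc-demo",
        "/var/lib/kubelet/plugins/rbd-mount",
        "/var/lib/kubelet/pods/pod-uid/volumes/rbd-vol"):
    print(cmd)
```

Every step after rbd map operates on ordinary block devices and directories, which is why the container sees the same I/O path as the host.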
From this analysis it follows that reading and writing an RBD on the host and doing so inside Docker or Kubernetes take essentially the same I/O path, and that Docker and Kubernetes themselves do not affect RBD performance (a full performance test I later ran with fio is consistent with this conclusion).