What are the standards of Ceph distributed storage hardware

Today I would like to share some knowledge about the hardware standards for Ceph distributed storage. The content is detailed and clearly organized; since many readers may not be familiar with this topic yet, this article is offered for reference. I hope you gain something from reading it.

Ceph is a reliable, scalable, unified distributed storage system. It can provide three kinds of storage at the same time: object storage (RADOSGW), block storage (RBD, RADOS Block Device), and file system storage (CephFS). When planning a Ceph distributed storage cluster, the choice of hardware is very important, because it determines the performance of the entire Ceph cluster.

1) CPU selection

The Ceph metadata server redistributes its load dynamically and is CPU sensitive, so the metadata server should have relatively strong processor performance (for example, a quad-core CPU). Ceph OSDs run the RADOS service, use CRUSH to calculate data placement, replicate data, and maintain their copy of the cluster map, so OSDs also need reasonable processing power. Ceph monitors simply maintain the master copy of the cluster map, so they are not CPU sensitive.

2) RAM selection

Metadata servers and monitors must be able to serve their data quickly, so they need sufficient memory (for example, 1 GB of RAM per daemon instance). OSDs do not need much memory for normal operation (for example, 500 MB of RAM per daemon instance), but they do need a large amount of memory during recovery (roughly 1 GB per 1 TB of storage per daemon). In general, the more memory, the better.
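
As a rough illustration of these rules of thumb, here is a small Python sketch that estimates the RAM one host needs during recovery. The daemon counts and drive size in the example are hypothetical, and the per-daemon figures are the guidelines quoted above rather than hard Ceph requirements.

# Rough RAM sizing sketch based on the rules of thumb above.
# The per-daemon figures are this article's guidance, not fixed Ceph requirements.

def estimate_host_ram_gb(num_osds, tb_per_osd, num_mons=0, num_mds=0):
    """Estimate the RAM (GB) one host may need during recovery."""
    osd_baseline = 0.5 * num_osds                # ~500 MB per OSD for normal operation
    osd_recovery = 1.0 * num_osds * tb_per_osd   # ~1 GB per 1 TB of storage during recovery
    mon_mds = 1.0 * (num_mons + num_mds)         # ~1 GB per monitor / metadata daemon
    return osd_baseline + osd_recovery + mon_mds

if __name__ == "__main__":
    # Hypothetical host: 8 OSDs on 4 TB drives, plus one monitor.
    print(estimate_host_ram_gb(num_osds=8, tb_per_osd=4, num_mons=1))  # -> 37.0 GB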

3) Data Storage selection

The tradeoff between cost and performance should be considered when planning data storage. Running the operating system and multiple daemons against a single drive at the same time significantly degrades performance. There are also file system considerations: BTRFS is not yet stable enough for production, although it can write journal data and object data in parallel, so XFS and EXT4 are the safer choices.

Tip: running multiple OSDs on partitions of a single disk is not recommended, and running an OSD together with a monitor or metadata server on partitions of a single disk is also not recommended.

Storage drives are limited by seek time, access time, read and write time, and total throughput. These physical limitations affect the performance of the whole system, especially during recovery. We recommend using a dedicated drive for the operating system and software, and one drive for each OSD daemon running on the host. Most "slow OSD" problems arise from running the operating system, multiple OSDs, and/or multiple journals on the same drive.

Since the cost of troubleshooting performance problems can easily exceed the cost of a few extra disk drives, you can optimize your cluster design by resisting the temptation to overload the OSD storage drives.

You can run multiple Ceph OSD daemons on the same hard drive, but this leads to resource contention and lowers overall throughput. You can also store the journal and object data on the same drive, but this increases the time spent journaling writes and acknowledging them to the client, because Ceph must write every operation to the journal before it can send an ACK for the write.

The BTRFS file system can write journal data and object data at the same time, while XFS and EXT4 cannot. Ceph's recommended practice is to keep the operating system, OSD data, and OSD journals on separate drives.

4) Solid state disk (SSD) selection

One opportunity for performance improvement is to use solid state drives (SSDs) to reduce random access time and read latency and to increase throughput. SSDs often cost more than 10 times as much per GB as hard drives, but their access times are often at least 100 times faster.

SSDs have no moving mechanical parts, so they are not subject to the same kinds of limitations as hard disk drives. That said, SSDs do have significant limitations, and it is important to consider their sustained sequential read and write performance. When storing journals for multiple OSDs, an SSD with 400 MB/s of sequential write throughput performs much better than a mechanical disk capable of only 120 MB/s.
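
As a back-of-envelope illustration of the figures above, the sketch below estimates how many OSD journals a single device can absorb; the assumed per-journal write rate of 100 MB/s is only an illustration, not a measured value.

# How many OSD journals can one device sustain, given its sequential
# write throughput? The per-journal rate is an assumed figure.

def max_journals(device_seq_write_mb_s, per_journal_write_mb_s=100):
    return device_seq_write_mb_s // per_journal_write_mb_s

print(max_journals(400))  # SSD at 400 MB/s -> about 4 journals
print(max_journals(120))  # HDD at 120 MB/s -> about 1 journal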

Using SSDs for OSD object storage is expensive, but you may see a significant performance improvement by storing the OSD journal on an SSD and the OSD object data on a separate hard disk drive. The OSD journal path defaults to /var/lib/ceph/osd/$cluster-$id/journal; you can mount this path on an SSD or an SSD partition so that the journal and the data files live on different disks.
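
The following is a minimal sketch, assuming a locally mounted OSD data directory and the default journal location, that checks whether an OSD's journal and its data directory sit on the same underlying device; the OSD path used here is a hypothetical example.

# Check whether an OSD's journal shares a device with its data directory.
import os

def same_device(path_a, path_b):
    """True if both paths resolve to the same underlying device."""
    return os.stat(os.path.realpath(path_a)).st_dev == \
           os.stat(os.path.realpath(path_b)).st_dev

osd_data = "/var/lib/ceph/osd/ceph-0"            # example OSD data directory
osd_journal = os.path.join(osd_data, "journal")  # default journal location

if same_device(osd_data, osd_journal):
    print("journal and data share one device; consider moving the journal to an SSD partition")
else:
    print("journal and data are on different devices")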

5) Network selection

We recommend at least two gigabit NICs per machine. Most commodity hard drives can deliver roughly 100 MB/s of throughput, and the NICs should be able to handle the aggregate throughput of the host's OSD disks, so at least two gigabit NICs are recommended: one for the public network and one for the cluster network. A cluster network (preferably not connected to the Internet) handles the extra load of data replication and helps prevent denial-of-service conditions that keep placement groups from returning to the active+clean state while OSDs replicate data. Also consider deploying 10-gigabit NICs: replicating 1 TB of data over a 1 Gbps network takes about 3 hours, and 3 TB (a typical drive configuration) takes about 9 hours, whereas over 10 Gbps the same replication takes roughly 20 minutes and 1 hour respectively.
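
To sanity-check the replication times quoted above, here is a small calculation. The effective throughputs (about 100 MB/s usable on a 1 Gbps link and about 1 GB/s on a 10 Gbps link) are assumptions for illustration; real-world numbers vary with protocol overhead and disk speed.

# Rough replication-time check under assumed effective link throughput.

def replication_hours(data_tb, effective_mb_per_s):
    return data_tb * 1e6 / effective_mb_per_s / 3600   # TB -> MB, then seconds -> hours

print(round(replication_hours(1, 100), 1))    # ~2.8 h  (1 TB over 1 Gbps)
print(round(replication_hours(3, 100), 1))    # ~8.3 h  (3 TB over 1 Gbps)
print(round(replication_hours(1, 1000), 2))   # ~0.28 h (1 TB over 10 Gbps)
print(round(replication_hours(3, 1000), 1))   # ~0.8 h  (3 TB over 10 Gbps)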

In a petabyte-scale cluster, OSD disk failures are the norm rather than the exception, and at a reasonable price/performance ratio the administrator wants placement groups (PGs) to recover from the degraded state to active+clean as quickly as possible, so 10-gigabit networking is worth considering. In addition, the links from each rack's top-of-rack switch to the core routers should have higher throughput, for example 40 Gbps to 100 Gbps.

6) Other considerations

You can run multiple OSD processes on one host, but you should make sure that the aggregate throughput of the OSD disks does not exceed the network bandwidth available for serving client reads and writes. You should also consider what percentage of the cluster's data each host stores: if a particular host stores too large a share and then fails, it can lead to problems such as Ceph halting operations to prevent data loss.
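
As a simple illustration of this balance, the sketch below compares a host's aggregate OSD disk throughput with its NIC capacity, using the roughly 100 MB/s per-disk figure quoted earlier; the OSD counts and NIC speeds are hypothetical examples.

# Compare aggregate OSD disk throughput with NIC capacity on one host.

def nic_is_bottleneck(num_osds, nic_gbps, disk_mb_per_s=100):
    disk_total_mb_s = num_osds * disk_mb_per_s
    nic_mb_s = nic_gbps * 1000 / 8          # Gbps -> MB/s (approximate)
    return disk_total_mb_s > nic_mb_s

print(nic_is_bottleneck(num_osds=8, nic_gbps=1))   # True:  800 MB/s of disk vs ~125 MB/s of NIC
print(nic_is_bottleneck(num_osds=8, nic_gbps=10))  # False: a 10 Gbps link keeps up with 8 disks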

When running multiple OSD processes per host, you also need to make sure the kernel is up to date. When a host runs many OSD daemons (for example, more than 20), a large number of threads are created, especially during recovery and rebalancing. Many Linux kernels default to a relatively small maximum thread count (for example, 32k). If you run into this problem, consider raising kernel.pid_max; the theoretical maximum is 4194303.
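
Here is a minimal sketch of that check, reading the current kernel.pid_max from /proc; actually raising the value would typically be done with sysctl (for example, sysctl -w kernel.pid_max=4194303), which requires root privileges.

# Check the current kernel.pid_max against the recommendation above.

RECOMMENDED_PID_MAX = 4194303  # theoretical maximum mentioned above

with open("/proc/sys/kernel/pid_max") as f:
    current = int(f.read().strip())

if current < RECOMMENDED_PID_MAX:
    print(f"kernel.pid_max is {current}; consider raising it on hosts running many OSDs")
else:
    print(f"kernel.pid_max is already {current}")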

That is all of the content of the article "What are the standards of Ceph distributed storage hardware". Thank you for reading! I hope you gained something from this article; for more knowledge, please follow the industry information channel.
