An Example Analysis of Ceph Distributed Storage

This article explains Ceph distributed storage in detail through an example analysis. It is shared here for reference, and I hope you come away with something useful after reading it.

Ceph: a Linux PB-level distributed file system

Ceph began as a PhD research project on storage systems by Sage Weil at the University of California, Santa Cruz (UCSC). Since late March 2010, Ceph has been part of the mainline Linux kernel (starting with version 2.6.34). Although Ceph may not yet be suitable for production environments, it is very useful for testing. This article explores the Ceph file system and the unique features that make it an attractive alternative for scalable distributed storage.

Ceph goal

Developing a distributed file system takes considerable effort, but it is immensely valuable if the problem is solved correctly. Ceph's goals can be simply defined as:

Easy scalability to multi-petabyte capacity

High performance across varying workloads (input/output operations per second [IOPS] and bandwidth)

High reliability

Unfortunately, these goals can compete with one another (for example, scalability can reduce or inhibit performance, or affect reliability). Ceph developed some very interesting concepts (for example, dynamic metadata partitioning, and data distribution and replication) that are discussed only briefly in this article. Ceph's design also incorporates fault tolerance to protect against single points of failure, on the assumption that at large scale (petabytes of storage) failures are the norm rather than the exception. It is not designed around any particular workload, but includes the ability to adapt to changing workloads and deliver the best possible performance. It accomplishes all of this with POSIX compatibility, allowing applications that rely on POSIX semantics today to be deployed transparently (with improvements targeted at Ceph). Finally, Ceph is open source distributed storage and part of the mainline Linux kernel (2.6.34).

Ceph architecture

Now let's explore Ceph's architecture and its core elements at a high level. Then I'll go down a level and explain some of the key aspects of Ceph in more detail.

The Ceph ecosystem can be roughly divided into four parts (see figure 1): clients (users of the data), metadata servers (which cache and synchronize the distributed metadata), an object storage cluster (which stores both data and metadata as objects and performs other key functions), and finally the cluster monitors (which perform monitoring functions).

Figure 1. Conceptual architecture of the Ceph ecosystem

As shown in figure 1, clients use a metadata server to perform metadata operations (to determine where data is located). The metadata server manages the location of data and where new data is stored. Note that the metadata itself is stored in the storage cluster (labeled "metadata I/O"). Actual file I/O occurs between the client and the object storage cluster. In this way, higher-level POSIX functions (such as open, close, and rename) are handled by the metadata server, while POSIX functions such as read and write are handled directly by the object storage cluster.
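
To make this split concrete, here is a minimal sketch of how a client might route POSIX operations between the two paths. The class and method names are hypothetical, chosen for illustration; this is not Ceph's actual client code.

```python
# Minimal sketch (not Ceph's implementation) of splitting POSIX operations
# between the metadata server and the object storage cluster.

class HypotheticalCephClient:
    def __init__(self, mds, osd_cluster):
        self.mds = mds                  # metadata server: namespace operations
        self.osd_cluster = osd_cluster  # object storage cluster: file data

    def open(self, path):
        # Higher-level POSIX calls (open, close, rename) go to the metadata
        # server, which returns the inode and layout needed for later data I/O.
        return self.mds.open(path)

    def rename(self, old_path, new_path):
        return self.mds.rename(old_path, new_path)

    def read(self, handle, offset, length):
        # Data I/O bypasses the metadata server and goes to the object store.
        return self.osd_cluster.read(handle.inode, offset, length)

    def write(self, handle, offset, data):
        return self.osd_cluster.write(handle.inode, offset, data)
```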

Another architectural view is provided in figure 2. A collection of servers accesses the Ceph ecosystem through a client interface, which understands the relationship between metadata servers and object-level storage. The distributed storage system can be viewed in several layers, including a storage device format (the Extent and B-tree-based Object File System [EBOFS] or an alternative) and an overlying layer that manages data replication, failure detection, recovery, and subsequent data migration, called Reliable Autonomic Distributed Object Storage (RADOS). Finally, monitors identify component failures and issue the corresponding notifications.

Figure 2. A simplified hierarchical view of the Ceph ecosystem

Ceph component

With the conceptual architecture of Ceph in mind, you can dig down another level to the main components implemented in Ceph. One important difference between Ceph and traditional file systems is that the intelligence is spread across the ecosystem rather than concentrated in the file system itself.

Figure 3 shows a simple Ceph ecosystem. The Ceph Client is the user of the Ceph file system. The Ceph Metadata Daemon provides the metadata server, while the Ceph Object Storage Daemon provides the actual storage (for both data and metadata). Finally, the Ceph Monitor provides cluster management. Note that there can be many Ceph clients, many object storage endpoints, and many metadata servers (depending on the capacity of the file system), plus at least a redundant pair of monitors. So how is this file system distributed?

Figure 3. Simple Ceph ecosystem

Ceph client

Because Linux exposes a common file system interface (through the virtual file system switch [VFS]), the user's view of Ceph is transparent. The administrator's view is certainly different, given that the storage system may comprise many servers (see the Resources section for more information on creating a Ceph cluster). From the user's point of view, they access a high-capacity storage system without needing to know about the metadata servers, monitors, and independent object storage devices aggregated beneath it into one large storage pool. The user simply sees a mount point at which standard file I/O can be performed.

The Ceph file system, or at least the client interface, is implemented in the Linux kernel. Note that in most file systems, all of the control and intelligence lives in the kernel's file system code itself. In Ceph, however, the file system's intelligence is distributed across the nodes, which both simplifies the client interface and gives Ceph the ability to scale massively (even dynamically).

Rather than relying on allocation lists (metadata that maps a given file to its blocks on disk), Ceph uses an interesting alternative. From the Linux perspective, a file is assigned an inode number (INO) by the metadata server, which is a unique identifier for the file. The file is then carved into some number of objects (depending on its size). From the INO and an object number (ONO), each object is assigned an object ID (OID). A simple hash of the OID assigns each object to a placement group. The placement group (identified by a PGID) is a conceptual container for objects. Finally, placement groups are mapped to object storage devices pseudo-randomly, using an algorithm called Controlled Replication Under Scalable Hashing (CRUSH). In this way, the mapping of placement groups (and their replicas) to storage devices does not depend on any metadata, only on a pseudo-random mapping function. This is ideal because it minimizes storage overhead and simplifies distribution and data lookup.

The final component of this distribution scheme is the cluster map. The cluster map is an efficient representation of the devices that make up the storage cluster. With a PGID and the cluster map, you can locate any object.
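
The addressing chain described above can be sketched in a few lines of code. This is a simplified illustration under assumed conventions: the OID format, the hash, and the device selection are stand-ins; in particular, the device choice here is a generic pseudo-random (rendezvous-style) pick, not the real CRUSH algorithm, which walks a weighted hierarchy of devices.

```python
import hashlib

def object_id(ino: int, ono: int) -> str:
    # Each object is named by the file's inode number plus its object number.
    return f"{ino:x}.{ono:08x}"

def placement_group(oid: str, pg_count: int) -> int:
    # A simple hash of the OID assigns the object to one of pg_count placement groups.
    digest = hashlib.sha1(oid.encode()).digest()
    return int.from_bytes(digest[:4], "big") % pg_count

def choose_osds(pgid: int, cluster_map: list[str], replicas: int = 3) -> list[str]:
    # Pseudo-random but deterministic: every client holding the same cluster map
    # computes the same set of devices, so no central lookup table is needed.
    ordered = sorted(
        cluster_map,
        key=lambda osd: hashlib.sha1(f"{pgid}:{osd}".encode()).digest(),
    )
    return ordered[:replicas]

cluster_map = [f"osd.{i}" for i in range(8)]
oid = object_id(ino=0x1234, ono=2)          # third object of the file with INO 0x1234
pgid = placement_group(oid, pg_count=128)
print(oid, pgid, choose_osds(pgid, cluster_map))
```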

Ceph metadata server

The job of the metadata server (cmds) is to manage the file system's namespace. Although both metadata and data are stored in the object storage cluster, they are managed separately to support scalability. In fact, metadata is further split across a cluster of metadata servers, which can adaptively replicate and distribute the namespace to avoid hotspots. As shown in figure 4, each metadata server manages a portion of the namespace, and these portions can overlap (for redundancy and performance). The mapping of metadata servers to the namespace is performed in Ceph using dynamic subtree partitioning, which allows Ceph to adapt to changing workloads (migrating parts of the namespace between metadata servers) while preserving locality for performance.

Figure 4. Partition of the Ceph namespace of the metadata server

But because each metadata server simply manages the namespace for a population of clients, its primary role is that of an intelligent metadata cache (the actual metadata ultimately lives in the object storage cluster). Metadata writes are cached in a short-term journal, which is eventually pushed to physical storage. This lets the metadata server serve the most recent metadata back to clients (which is common in metadata operations). The journal is also useful for failure recovery: if a metadata server fails, its journal can be replayed to ensure that the metadata is safely stored on disk.
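
The journal-then-flush behavior can be illustrated with a small sketch. This is an assumed, simplified model, not Ceph's metadata daemon: the object store interface (apply_metadata) and the entry format are hypothetical.

```python
# Minimal sketch of a short-term metadata journal with flush and replay.

class MetadataJournal:
    def __init__(self, object_store):
        self.object_store = object_store
        self.entries = []          # in-memory journal of recent metadata updates

    def record(self, op, path, attrs):
        # Writes are appended to the journal first, so the metadata server can
        # answer subsequent lookups with the newest metadata.
        self.entries.append((op, path, attrs))

    def flush(self):
        # Periodically push journaled updates into the object storage cluster,
        # where the authoritative metadata lives.
        for op, path, attrs in self.entries:
            self.object_store.apply_metadata(op, path, attrs)
        self.entries.clear()

    def replay(self, persisted_entries):
        # On failure, a recovering metadata server replays the persisted journal
        # so that no acknowledged metadata update is lost.
        for op, path, attrs in persisted_entries:
            self.object_store.apply_metadata(op, path, attrs)
```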

The metadata server manages the inode space, converting file names into metadata. It translates a file name into an inode number, a file size, and the striping layout that the Ceph client uses for file I/O.
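
Here is a minimal sketch of what such a lookup might hand back to the client, and how the client could use the layout to turn a byte offset into an object number (ONO) for data I/O. The field names and the single-value layout are assumptions for illustration, not Ceph's actual layout structure.

```python
from dataclasses import dataclass

@dataclass
class FileLayout:
    ino: int            # inode number assigned by the metadata server
    size: int           # file size in bytes
    object_size: int    # bytes stored per object (the stripe unit here)

    def object_for_offset(self, offset: int) -> tuple[int, int]:
        ono = offset // self.object_size      # which object holds this byte
        within = offset % self.object_size    # offset inside that object
        return ono, within

layout = FileLayout(ino=0x1234, size=10 * 2**20, object_size=4 * 2**20)
print(layout.object_for_offset(9 * 2**20))    # -> (2, 1048576): third object
```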

Ceph monitor

Ceph includes monitors that manage the cluster map, but some elements of fault management are performed in the object store itself. When an object storage device fails, or a new device is added, the monitors detect this and maintain a valid cluster map. This function is performed in a distributed fashion, in which map updates are communicated alongside existing traffic. Ceph uses Paxos, a family of distributed consensus algorithms.
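
A small sketch of epoch-based map propagation is shown below, assuming that every message carries the sender's map version so that stale peers can fetch a newer map as part of ordinary traffic. The Paxos machinery the monitors would use to agree on each update is deliberately omitted; all names here are illustrative.

```python
# Minimal sketch of versioned cluster-map propagation piggybacked on traffic.

class ClusterMap:
    def __init__(self, epoch, osds):
        self.epoch = epoch    # monotonically increasing map version
        self.osds = osds      # e.g. {"osd.0": "up", "osd.1": "down"}

class Node:
    def __init__(self, cluster_map):
        self.map = cluster_map

    def handle_message(self, peer_epoch, fetch_map):
        # If a peer advertises a newer epoch, pull the newer map before
        # proceeding; updates thus spread without a global broadcast.
        if peer_epoch > self.map.epoch:
            self.map = fetch_map(peer_epoch)
```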

Ceph object storage

As with traditional object storage, Ceph storage nodes provide not only storage but also intelligence. A traditional drive is a simple target that only responds to commands from an initiator. An object storage device, by contrast, is an intelligent device that can act as both target and initiator, enabling it to communicate and cooperate with other object storage devices.

From a storage perspective, Ceph object storage devices perform the object-to-block mapping (a task traditionally done in the client's file system layer). This lets the local entity decide, in the best possible way, how to store an object. Earlier versions of Ceph implemented a custom low-level file system on local storage called EBOFS. It provided a non-standard interface to the underlying storage, tuned for object semantics and other features such as asynchronous notification of commits to disk. Today, the B-tree file system (Btrfs) can be used at the storage nodes, and it already implements some of the necessary features (such as embedded integrity).

Because Ceph clients implement CRUSH and know nothing about the block mapping of files on disk, the underlying storage devices can safely manage the object-to-block mapping themselves. This allows the storage nodes to replicate data (when a device failure is detected). Distributing failure recovery also allows the storage system to scale, because failure detection and recovery are spread across the ecosystem. Ceph calls this RADOS (see figure 3).
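
The distributed recovery idea can be sketched as follows: when a device is marked down, the surviving members of each placement group recompute the placement from the new cluster map and re-replicate onto any newly chosen device. The pseudo-random placement function stands in for CRUSH; nothing here is Ceph's actual implementation.

```python
import hashlib

def choose_osds(pgid, osds, replicas=3):
    # Deterministic pseudo-random placement (stand-in for CRUSH).
    ordered = sorted(osds, key=lambda o: hashlib.sha1(f"{pgid}:{o}".encode()).digest())
    return ordered[:replicas]

def recovery_plan(pgids, old_map, new_map, replicas=3):
    # For each placement group, list the devices that must receive a new copy.
    plan = {}
    for pgid in pgids:
        old = set(choose_osds(pgid, old_map, replicas))
        new = set(choose_osds(pgid, new_map, replicas))
        added = new - old
        if added:
            plan[pgid] = sorted(added)    # surviving replicas push data here
    return plan

old_map = [f"osd.{i}" for i in range(8)]
new_map = [o for o in old_map if o != "osd.3"]   # osd.3 has failed
print(recovery_plan(range(16), old_map, new_map))
```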

Other interesting features

As if the dynamic and adaptive nature of the file system weren't enough, Ceph also provides some interesting user-visible features. For example, users can create snapshots on any subdirectory in Ceph (including all of its contents). File and capacity accounting are available at the subdirectory level, reporting the total storage size and number of files for a given subdirectory (and everything nested beneath it).
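
The subdirectory accounting idea amounts to rolling per-directory totals up through the tree, as in the small sketch below. This is illustrative only, under the assumption of a simple in-memory tree; it is not Ceph's recursive-statistics code.

```python
# Minimal sketch of per-subdirectory accounting: each directory reports the
# file count and total bytes for itself and everything nested beneath it.

class Dir:
    def __init__(self, name):
        self.name = name
        self.files = {}        # file name -> size in bytes
        self.subdirs = {}      # directory name -> Dir

    def totals(self):
        nfiles = len(self.files)
        nbytes = sum(self.files.values())
        for sub in self.subdirs.values():
            sub_files, sub_bytes = sub.totals()
            nfiles += sub_files
            nbytes += sub_bytes
        return nfiles, nbytes

root = Dir("/")
logs = Dir("logs")
logs.files["a.log"] = 4096
root.subdirs["logs"] = logs
root.files["readme"] = 512
print(root.totals())    # -> (2, 4608)
```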

The status and Future of Ceph

Although Ceph is now integrated into the mainline Linux kernel, it is still marked experimental there. File systems in this state are useful for testing but not ready for production environments. But given Ceph's inclusion in the Linux kernel and its creators' motivation to continue developing it, it should soon be ready to serve your mass storage needs.

Other distributed file systems

Ceph is not unique in the distributed file system space, but it is unique in how it manages a large storage ecosystem. Other examples of distributed file systems include the Google File System (GFS), the General Parallel File System (GPFS), and Lustre, to name just a few. The ideas behind Ceph point to an interesting future for distributed file systems, as massive scale introduces unique challenges to the mass storage problem.

Looking to the future

Ceph is not just a file system but an object storage ecosystem with enterprise-class capabilities. In the Resources section you will find information on setting up a simple Ceph cluster, including the metadata servers, object storage servers, and monitors. Ceph fills a gap in distributed storage, and it will be interesting to see how this open source offering evolves in the future.

This concludes the article on "An Example Analysis of Ceph Distributed Storage". I hope the content above has been helpful and has taught you something new. If you found the article useful, please share it so more people can see it.
