Example Analysis of the FileStore Storage Engine Implementation in Ceph


2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article walks through a sample analysis of FileStore, a storage engine implementation in Ceph.

Ceph is a highly available, strongly consistent software-defined storage system. To use it well, it is important to understand its internal IO path and storage implementation.

ObjectStore

ObjectStore is one of the most important concepts in the Ceph OSD: it encapsulates all IO operations on the underlying storage. As the figure above shows, all IO requests originate on the client side. After the Message layer parses them uniformly, the OSD layer dispatches them to the PG layer. Each PG has its own queue, and a thread pool processes the queues.
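The dispatch step can be pictured with a small Python model. This is only a sketch under stated assumptions: `ToyOSD`, `dispatch`, and `process_all` are hypothetical names, not Ceph's classes, and the real OSD work queue is considerably more involved. The point it illustrates is that requests for the same PG stay in arrival order because each PG queue is drained by a single worker at a time.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor


class ToyOSD:
    """Hypothetical model of an OSD: one FIFO queue per PG, drained by a thread pool."""

    def __init__(self, workers=2):
        self.pg_queues = defaultdict(deque)  # pg id -> queued requests
        self.workers = workers

    def dispatch(self, pg_id, request):
        # The OSD layer routes each parsed request to the queue of its PG.
        self.pg_queues[pg_id].append(request)

    def _drain(self, pg_id):
        # One worker drains one PG queue, so per-PG order is preserved.
        queue = self.pg_queues[pg_id]
        handled = []
        while queue:
            handled.append(queue.popleft())
        return pg_id, handled

    def process_all(self):
        # Different PGs may be processed in parallel by the pool.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return dict(pool.map(self._drain, list(self.pg_queues)))


osd = ToyOSD()
osd.dispatch("1.a", "write obj1")
osd.dispatch("2.b", "read obj2")
osd.dispatch("1.a", "read obj1")
results = osd.process_all()
```

Here `results["1.a"]` holds the write before the read, which is the ordering the PG queue is meant to protect.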

When an IO request is dequeued from a PG queue, it is processed according to its type and associated parameters. A read request fetches the corresponding content through the API provided by ObjectStore. A write request uses the transaction API provided by ObjectStore to combine all of its write operations into one atomic transaction, which is then submitted to ObjectStore. ObjectStore offers different isolation levels to the upper layer through its interface. At present, the PG layer uses only the Serializable level, to guarantee the ordering of reads and writes.
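To make the "combine all write operations into an atomic transaction" idea concrete, here is a minimal in-memory sketch. `ToyStore` and `Transaction` are hypothetical names chosen for illustration, and the copy-then-swap approach is an assumption made for brevity; it is not how Ceph achieves atomicity.

```python
import copy


class Transaction:
    """Hypothetical transaction: records write operations without applying them."""

    def __init__(self):
        self.ops = []

    def write(self, name, offset, data):
        def op(objects):
            buf = objects.setdefault(name, bytearray())
            if len(buf) < offset:
                buf.extend(bytes(offset - len(buf)))  # zero-fill any gap
            buf[offset:offset + len(data)] = data
        self.ops.append(op)

    def remove(self, name):
        def op(objects):
            del objects[name]  # raises KeyError if the object does not exist
        self.ops.append(op)


class ToyStore:
    """Hypothetical object store: reads go direct, writes go through transactions."""

    def __init__(self):
        self.objects = {}  # object name -> bytearray

    def read(self, name, offset, length):
        return bytes(self.objects[name][offset:offset + length])

    def apply_transaction(self, txn):
        # Apply every op to a private copy; publish only if all of them succeed.
        staged = copy.deepcopy(self.objects)
        for op in txn.ops:
            op(staged)
        self.objects = staged


store = ToyStore()
good = Transaction()
good.write("obj1", 0, b"hello")
good.write("obj1", 5, b" ceph")
store.apply_transaction(good)

bad = Transaction()
bad.write("obj1", 0, b"XXXXX")
bad.remove("missing")  # fails, so the write above must not become visible
try:
    store.apply_transaction(bad)
except KeyError:
    pass
```

After the failed transaction, reading `obj1` still returns the content written by the first transaction: either every operation in a transaction takes effect or none does.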

The main ObjectStore interface has three parts. The first is reading and writing Object data, which resembles part of the POSIX interface. The second is reading and writing an Object's xattrs, which are kv pairs associated with that Object. The third is kv operations associated with the Object (called omap in Ceph); this is very similar to the second part, but the implementation may differ.
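The three parts of the interface can be laid out as an abstract class. The sketch below is an assumption-laden illustration: the class and method names (`ToyObjectStore`, `get_xattr`, `omap_set`, and so on) are hypothetical and do not mirror the real C++ ObjectStore signatures.

```python
from abc import ABC, abstractmethod


class ToyObjectStore(ABC):
    """Hypothetical three-part interface; names are illustrative only."""

    # Part 1: object data, similar to a subset of POSIX file IO.
    @abstractmethod
    def read(self, obj, offset, length): ...

    @abstractmethod
    def write(self, obj, offset, data): ...

    # Part 2: xattrs, kv pairs attached to an object.
    @abstractmethod
    def get_xattr(self, obj, key): ...

    @abstractmethod
    def set_xattr(self, obj, key, value): ...

    # Part 3: omap, also kv pairs per object, but free to use another backend.
    @abstractmethod
    def omap_get(self, obj, key): ...

    @abstractmethod
    def omap_set(self, obj, key, value): ...


class MemStore(ToyObjectStore):
    """Trivial in-memory implementation used only to exercise the interface."""

    def __init__(self):
        self.data, self.xattrs, self.omap = {}, {}, {}

    def read(self, obj, offset, length):
        return bytes(self.data.get(obj, b"")[offset:offset + length])

    def write(self, obj, offset, data):
        buf = self.data.setdefault(obj, bytearray())
        if len(buf) < offset:
            buf.extend(bytes(offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data

    def get_xattr(self, obj, key):
        return self.xattrs[(obj, key)]

    def set_xattr(self, obj, key, value):
        self.xattrs[(obj, key)] = value

    def omap_get(self, obj, key):
        return self.omap[(obj, key)]

    def omap_set(self, obj, key, value):
        self.omap[(obj, key)] = value


mem = MemStore()
mem.write("obj1", 0, b"payload")
mem.set_xattr("obj1", "owner", b"client.1")
mem.omap_set("obj1", "index/0001", b"entry")
```

Keeping xattr and omap as separate method groups, even though both store kv pairs, leaves an implementation free to back them with different storage.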

At present, the main implementation of ObjectStore is FileStore, which implements the ObjectStore API on top of the file system's POSIX interface. Each Object is a file at the FileStore layer, and the Object's xattrs are stored in the file's xattrs. Because some file systems (such as Ext4) limit xattr length, metadata that exceeds the limit is stored in DBObjectMap instead. The Object's omap is implemented directly by DBObjectMap. xattr and omap operations are therefore interchangeable to a degree: from the user's point of view, the former holds values of limited length, while the latter is more general (the API does not enforce either property).
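The spill-over rule for oversized xattrs can be sketched as follows. Everything here is a stand-in: two dictionaries play the roles of the file system's xattrs and DBObjectMap, `ToyFileStoreAttrs` is a hypothetical name, and the 64-byte limit is an arbitrary value chosen for the example, not a real Ext4 or Ceph setting.

```python
class ToyFileStoreAttrs:
    """Hypothetical xattr handling: short values inline, long values spill over."""

    def __init__(self, max_inline_len=64):  # arbitrary limit for illustration
        self.max_inline_len = max_inline_len
        self.file_xattrs = {}    # stands in for the file system's xattrs
        self.db_object_map = {}  # stands in for DBObjectMap

    def set_xattr(self, obj, key, value):
        # Drop any previous copy so a value never lives in both places.
        self.file_xattrs.pop((obj, key), None)
        self.db_object_map.pop((obj, key), None)
        if len(value) <= self.max_inline_len:
            self.file_xattrs[(obj, key)] = value
        else:
            self.db_object_map[(obj, key)] = value

    def get_xattr(self, obj, key):
        # Callers do not need to know where the value ended up.
        if (obj, key) in self.file_xattrs:
            return self.file_xattrs[(obj, key)]
        return self.db_object_map[(obj, key)]


attrs = ToyFileStoreAttrs()
attrs.set_xattr("obj1", "small", b"x" * 10)
attrs.set_xattr("obj1", "large", b"y" * 200)
```

In this run, `small` stays in `file_xattrs` and `large` lands in `db_object_map`, while `get_xattr` returns both the same way.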

DBObjectMap

DBObjectMap is a part of FileStore. It implements the third part of the ObjectStore API on top of a KeyValue database. The main complexity of DBObjectMap is that it implements clone without copying. ObjectStore provides a clone API that makes a full clone of an Object (including the Object's properties and omap). DBObjectMap keeps a Header for each Object, and every omap entry (kv pair) of an Object is associated with that Header. On clone, two new Headers are generated, and the original Header becomes the parent of both. From then on, both the original Object and the cloned Object consult the parent when querying or writing, which gives copy-on-write behavior. So how is a Header associated with its omap (kv pairs)? Each Header has a unique seq, and every key belonging to the Header contains that seq. The ordered prefix retrieval provided by KeyValueDB can then be used to traverse the omap.
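The Header, seq, and parent scheme can be illustrated with a toy model. This is a sketch under several assumptions: `ToyDBObjectMap` and its methods are hypothetical names, a plain dictionary with sorted keys stands in for KeyValueDB, the key layout `<seq>.<key>` is invented for the example, and deletion (which would need some way to hide parent entries) is left out.

```python
import bisect
import itertools


class ToyDBObjectMap:
    """Hypothetical no-copy clone: headers with a seq, a parent link, and prefixed keys."""

    def __init__(self):
        self.kv = {}         # flat namespace: "<seq>.<key>" -> value
        self.parent = {}     # header seq -> parent header seq (or None)
        self.header_of = {}  # object name -> current header seq
        self._next_seq = itertools.count(1)

    def _new_header(self, parent=None):
        seq = next(self._next_seq)
        self.parent[seq] = parent
        return seq

    def _header(self, obj):
        if obj not in self.header_of:
            self.header_of[obj] = self._new_header()
        return self.header_of[obj]

    @staticmethod
    def _prefix(seq):
        return f"{seq:016x}."  # fixed width keeps prefixes unambiguous

    def set(self, obj, key, value):
        # Writes always land under the object's own header, never the parent's.
        self.kv[self._prefix(self._header(obj)) + key] = value

    def _scan(self, seq):
        # Ordered prefix scan over the sorted key space.
        prefix = self._prefix(seq)
        keys = sorted(self.kv)
        start = bisect.bisect_left(keys, prefix)
        for k in keys[start:]:
            if not k.startswith(prefix):
                break
            yield k[len(prefix):], self.kv[k]

    def items(self, obj):
        # Walk from the object's header up through its ancestors.
        merged = {}
        seq = self._header(obj)
        while seq is not None:
            for key, value in self._scan(seq):
                merged.setdefault(key, value)  # child entries shadow the parent's
            seq = self.parent[seq]
        return dict(sorted(merged.items()))

    def clone(self, src, dst):
        # No kv pair is copied: both objects get a fresh header under the old one.
        old = self._header(src)
        self.header_of[src] = self._new_header(parent=old)
        self.header_of[dst] = self._new_header(parent=old)


omap = ToyDBObjectMap()
omap.set("a", "k1", "v1")
omap.clone("a", "b")
omap.set("b", "k1", "changed")
omap.set("a", "k2", "v2")
```

After these calls, object `a` sees `k1 = v1` and `k2 = v2`, while object `b` sees only `k1 = changed`, and the kv store contains three entries in total because the clone itself copied nothing.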

As mentioned above, FileStore treats each Object as a file. Some attributes of the Object are combined with the Object name to form the file name, and the PG that the Object belongs to is used as the directory. When the number of files in a PG's directory exceeds a threshold (too many files in one directory degrades the file system's lookup performance), the directory is split into subdirectories.
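The naming scheme and the threshold check can be sketched in a few lines. The layout below is hypothetical: the file name pattern, the `_head` directory suffix, the attribute choice (snapshot and hash), and the threshold value are all assumptions made for the example and are not taken from the FileStore source.

```python
from pathlib import PurePosixPath


def object_path(root, pg_id, name, snap, hash_value):
    """Hypothetical layout: the PG names the directory, name plus attributes name the file."""
    filename = f"{name}__{snap}_{hash_value:08X}"
    return PurePosixPath(root) / f"{pg_id}_head" / filename


def needs_split(file_count, threshold=1000):
    """Return True once a directory holds more files than the (arbitrary) threshold."""
    return file_count > threshold


path = object_path("/var/lib/osd", "1.a", "rbd_data.1", "head", 0xBEEF)
```

With these inputs, `path` is a file under the `1.a_head` directory whose name carries the object name, the snapshot, and the hash.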


