
Example Analysis of the Three-Replica Write Path in Ceph Jewel

Updated 2025-03-31. Source: SLTechnology News&Howtos (shulou.com)


This article analyzes how a write is processed in a Ceph Jewel pool with three replicas. It covers the call flow on the primary OSD, the call flow on the replica OSDs, and the completion callbacks that tie the two together.

1. Primary OSD write processing flow

OSD::ms_fast_dispatch()
|_ OSD::dispatch_session_waiting()
|_ OSD::dispatch_op_fast()
|_ OSD::handle_op()
|__ OSD::get_pg_or_queue_for_pg(): looks up the PG and pool information for the OpRequest
|_ OSD::enqueue_op()
|_ PG::queue_op()
|__ OSD::ShardedThreadPool::ShardedWQ::queue(): puts the PG and the op into the queue together
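
The sharded queue exists so that ops on different PGs can run in parallel while ops on the same PG stay in order. The OSD spreads PGs over a fixed number of shards, and every op for a given PG lands in the same shard. The C++ sketch below models only that mapping. `ShardedQueueSketch`, `QueuedOp` and `pg_seed` are hypothetical names. Choosing the shard as the PG's placement seed modulo the shard count is an assumption about what `OSD::ShardedOpWQ::_enqueue()` does. The real per-shard queue is also prioritized rather than the plain FIFO used here.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// A queued item: the PG's placement seed plus a label standing in for the OpRequest.
struct QueuedOp {
  uint32_t pg_seed;  // hypothetical stand-in for the PG id's placement seed
  std::string name;  // hypothetical stand-in for the OpRequestRef
};

// Simplified model of the OSD's sharded op queue (all names are hypothetical).
class ShardedQueueSketch {
 public:
  explicit ShardedQueueSketch(uint32_t num_shards) : shards_(num_shards) {
    assert(num_shards > 0);
  }

  // Assumed shard choice: placement seed modulo the number of shards, so
  // every op of one PG always lands in the same shard.
  uint32_t shard_index(uint32_t pg_seed) const {
    return pg_seed % static_cast<uint32_t>(shards_.size());
  }

  // Counterpart of PG::queue_op() -> ShardedWQ::queue(): append to the PG's shard.
  void queue(QueuedOp op) {
    shards_[shard_index(op.pg_seed)].push_back(std::move(op));
  }

  // Counterpart of one worker's _process(): take the oldest op of its shard.
  bool dequeue(uint32_t shard, QueuedOp* out) {
    std::deque<QueuedOp>& q = shards_.at(shard);
    if (q.empty()) return false;
    *out = std::move(q.front());
    q.pop_front();
    return true;
  }

 private:
  std::vector<std::deque<QueuedOp>> shards_;  // FIFO per shard in this sketch
};
```

For example, with 5 shards, PGs with seeds 7 and 8 map to shards 2 and 3. Two writes to the PG with seed 7 are then dequeued by the shard 2 worker in the order they were submitted.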

OSD::ShardedOpWQ::_process(): processes the ops queued in OSD::ShardedThreadPool::ShardedWQ
|__ PGQueueable::RunVis::operator()(const OpRequestRef &op)
|_ OSD::dequeue_op()
|_ ReplicatedPG::do_request()
|__ if the PG is flushing or peering, put the op in the waiting_for_peered queue until the PG becomes available
|__ if the PG is not active, put the op in the waiting_for_active queue
|__ if the PG is in the REPLAY state, put the op in the waiting_for_active queue
|_ ReplicatedPG::do_op()
|__ ReplicatedPG::do_pg_op(): handles PG-level operations, i.e. requests flagged with CEPH_OSD_RMW_FLAG_PGOP
|__ build the hobject_t (head) object from the op request
|__ check whether the object name length, the object locator key length, or the object locator namespace length exceeds osd_max_object_name_len
|__ check with the object store (FileStore) whether the head object is valid
|__ check whether the client address that sent the op is blacklisted in the OSDMap
|__ for write requests, check whether the data size exceeds osd_max_write_size
|__ if the head object is currently unreadable, put the op in the waiting_for_unreadable_object queue and call maybe_kick_recovery() to try to start recovery
|__ ReplicatedPG::is_degraded_or_backfilling_object(): checks whether the head object is being recovered or backfilled
|__ ReplicatedPG::wait_for_degraded_object(): if so, puts the op in the waiting_for_degraded_object queue
|__ if the head object is in objects_blocked_on_degraded_snap, put the op in the waiting_for_degraded_object queue
|__ if the head object is in objects_blocked_on_snap_promotion, put the op in the waiting_for_blocked_object queue
|__ if the head object is in objects_blocked_on_cache_full, put the op in the waiting_for_cache_not_full queue
|__ if the head object's snapdir is unreadable, put the op in the waiting_for_unreadable_object queue for the snapdir and call maybe_kick_recovery() to try to start recovery
|__ if the head object's snapdir is being recovered or backfilled, put the op in the waiting_for_degraded_object queue for the snapdir
|__ if the write request is already recorded in the PGLog (a duplicate of an earlier request): if that write has already completed, reply to the client immediately with an MOSDOpReply carrying CEPH_OSD_FLAG_ACK; otherwise put the op in the waiting_for_ack or waiting_for_ondisk queue
|__ ReplicatedPG::find_object_context(): gets the object context
|__ if the object context is blocked for I/O, put the op in the waiting_for_blocked_object or waiting_for_degraded_object queue
|_ ReplicatedPG::execute_ctx()
|_ ReplicatedPG::prepare_transaction()
|__ create the MOSDOpReply message
|__ ReplicatedPG::calc_trim_to(): calculates how far the PGLog can be trimmed
|__ ReplicatedPG::register_on_applied(): registers the on_applied callback. If the op wants an ack and neither an ack (sent_ack) nor a commit reply (sent_disk) has been sent to the client yet, the callback creates an MOSDOpReply with CEPH_OSD_FLAG_ACK set and sends it to the client.
|__ ReplicatedPG::register_on_commit(): registers the on_committed callback. If the op wants an on-disk (commit) reply and no commit reply (sent_disk) has been sent to the client yet, the callback creates an MOSDOpReply with CEPH_OSD_FLAG_ACK and CEPH_OSD_FLAG_ONDISK set and sends it to the client.
|__ ReplicatedPG::register_on_success(): registers the on_success callback
|__ ReplicatedPG::register_on_finish(): registers the on_finish callback
|__ ReplicatedPG::new_repop(): creates a RepGather object
|_ ReplicatedPG::issue_repop()
|__ create a C_OSD_RepopCommit object, the callback that runs once all replicas have committed; it sets repop->all_committed = true and then calls ReplicatedPG::eval_repop()
|__ create a C_OSD_RepopApplied object, the callback that runs once all replicas have applied; it sets repop->all_applied = true and then calls ReplicatedPG::eval_repop()
|_ ReplicatedBackend::submit_transaction()
|_ ReplicatedBackend::issue_op()
|__ ReplicatedBackend::generate_subop(): creates an MOSDRepOp message
|_ ReplicatedPG::send_message_osd_cluster()
|__ OSD::send_message_osd_cluster(): sends the MOSDRepOp message to the OSDs that hold the replicas
|__ create a C_OSD_OnOpApplied object, the callback that runs when the local apply completes
|__ create a C_OSD_OnOpCommit object, the callback that runs when the local commit completes
|_ ReplicatedPG::queue_transactions()
|_ ObjectStore::queue_transactions()
|_ FileStore::queue_transactions()
|_ JournalingObjectStore::_op_journal_transactions()
|__ FileJournal::submit_entry(): submits the journal write to the journal's work queue; once the entry has been written, the C_JournaledAhead callback runs
|_ ReplicatedPG::eval_repop()
|__ check whether repop->rep_done is already set
|__ check repop->all_committed, i.e. whether all replicas have written their journals; if so, run the on_committed callbacks
|__ check repop->all_applied, i.e. whether all replicas have applied the write; if so, run the on_applied callbacks
|__ if both repop->all_committed and repop->all_applied are set, i.e. all replicas have completed the write, run the repop->on_success callbacks
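
The completion logic at the end of this flow reduces to two flags and three callback lists. The commit and applied callbacks set repop->all_committed and repop->all_applied. Each eval_repop() call then runs whatever those flags have unblocked, and on_success runs only once both flags are set. The C++ sketch below is a simplified model. `RepGatherSketch`, `run_and_clear()` and `eval_repop_sketch()` are hypothetical names. The real RepGather also carries the op context, versions and reference counting, which are omitted here.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, heavily simplified stand-in for ReplicatedPG's RepGather.
struct RepGatherSketch {
  bool all_committed = false;  // set by the C_OSD_RepopCommit-style callback
  bool all_applied = false;    // set by the C_OSD_RepopApplied-style callback
  bool rep_done = false;       // set once on_success has run
  std::vector<std::function<void()>> on_committed;  // e.g. send the ACK|ONDISK reply
  std::vector<std::function<void()>> on_applied;    // e.g. send the ACK reply
  std::vector<std::function<void()>> on_success;
};

// Run every callback in the list once, then empty the list so it cannot fire again.
void run_and_clear(std::vector<std::function<void()>>& callbacks) {
  for (auto& cb : callbacks) {
    assert(cb);  // calling an empty std::function would throw
    cb();
  }
  callbacks.clear();
}

// Simplified eval_repop(): fire whatever the completed stages have unblocked.
void eval_repop_sketch(RepGatherSketch& repop) {
  if (repop.rep_done) return;  // already finished, nothing left to do
  if (repop.all_committed) run_and_clear(repop.on_committed);
  if (repop.all_applied) run_and_clear(repop.on_applied);
  if (repop.all_committed && repop.all_applied) {
    run_and_clear(repop.on_success);
    repop.rep_done = true;
  }
}
```

In this model, if every replica applies before every replica commits, the first eval_repop() call runs only the on_applied callbacks. The on_committed and on_success callbacks follow on the next call.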

Processing flow after the local journal write completes

C_JournaledAhead::finish()
|_ FileStore::_journaled_ahead()
|__ FileStore::queue_op(): puts the write request into FileStore's op_wq queue
|__ invoke the C_OSD_OnOpCommit callback

Local data write (apply) processing flow

FileStore::_do_op()
|__ take the write request from the op_wq queue
|__ FileStore::_do_transactions(): performs the actual data write

Processing flow after the local data write completes

FileStore::_finish_op()
|__ invoke the C_OSD_OnOpApplied callback
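
Taken together, the three flows above describe FileStore's write-ahead journaling. A transaction is first written to the journal, which hands it to op_wq and triggers the commit callback. Only afterwards is it applied to the filesystem, which triggers the applied callback. The single-threaded C++ sketch below steps through those stages by hand. `WriteAheadStoreSketch`, `TxnSketch`, `journal_one()` and `apply_one()` are hypothetical names. The real FileStore runs the journal writer, the op_wq workers and the completion callbacks on separate threads.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transaction: a payload plus the two completion callbacks the OSD passes down.
struct TxnSketch {
  std::string data;
  std::function<void()> on_commit;   // role of C_OSD_OnOpCommit / C_OSD_RepModifyCommit
  std::function<void()> on_applied;  // role of C_OSD_OnOpApplied / C_OSD_RepModifyApply
};

// Single-threaded model of write-ahead journaling; each stage is stepped by hand.
class WriteAheadStoreSketch {
 public:
  // queue_transactions(): only hands the transaction to the journal.
  void queue_transactions(TxnSketch t) { journal_q_.push_back(std::move(t)); }

  // Journal writer + _journaled_ahead(): persist one entry, queue it on op_wq,
  // then fire the commit callback.
  bool journal_one() {
    if (journal_q_.empty()) return false;
    TxnSketch t = std::move(journal_q_.front());
    journal_q_.pop_front();
    journal_.push_back(t.data);
    std::function<void()> commit_cb = std::move(t.on_commit);
    op_wq_.push_back(std::move(t));
    if (commit_cb) commit_cb();
    return true;
  }

  // _do_op() + _finish_op(): apply one journaled transaction, then fire on_applied.
  bool apply_one() {
    if (op_wq_.empty()) return false;
    TxnSketch t = std::move(op_wq_.front());
    op_wq_.pop_front();
    assert(applied_.size() < journal_.size());  // write-ahead: journal before apply
    applied_.push_back(t.data);
    if (t.on_applied) t.on_applied();
    return true;
  }

  const std::vector<std::string>& journal() const { return journal_; }
  const std::vector<std::string>& applied() const { return applied_; }

 private:
  std::deque<TxnSketch> journal_q_;   // waiting for the journal write
  std::deque<TxnSketch> op_wq_;       // journaled, waiting to be applied
  std::vector<std::string> journal_;  // what has been "written" to the journal
  std::vector<std::string> applied_;  // what has been "written" to the filesystem
};
```

The same sequence applies on each replica OSD, with C_OSD_RepModifyCommit and C_OSD_RepModifyApply as the two callbacks.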

2. Replica OSD handling of the write request sent by the primary OSD (message MOSDRepOp, message type MSG_OSD_REPOP)

OSD::ms_fast_dispatch()
|_ OSD::dispatch_session_waiting()
|_ OSD::dispatch_op_fast()
|_ OSD::handle_replica_op()
|__ check that the sender is valid
|__ OSD::get_pg_or_queue_for_pg(): looks up the PG and pool information for the OpRequest
|_ OSD::enqueue_op()
|_ PG::queue_op()
|__ OSD::ShardedThreadPool::ShardedWQ::queue(): puts the PG and the op into the queue together

OSD::ShardedOpWQ::_process(): processes the ops queued in OSD::ShardedThreadPool::ShardedWQ
|__ PGQueueable::RunVis::operator()(const OpRequestRef &op)
|_ OSD::dequeue_op()
|_ ReplicatedPG::do_request()
|_ ReplicatedBackend::handle_message()
|_ ReplicatedBackend::sub_op_modify()
|_ ReplicatedPG::log_operation()
|__ PG::append_log(): appends the entries to the PGLog
|__ create a C_OSD_RepModifyCommit object, the callback that runs when the journal commit completes
|__ create a C_OSD_RepModifyApply object, the callback that runs when the data has been applied to the object store
|_ ReplicatedPG::queue_transactions()
|_ FileStore::queue_transactions()
|_ JournalingObjectStore::_op_journal_transactions()
|__ FileJournal::submit_entry(): submits the journal write to the journal's work queue; once the entry has been written, the C_JournaledAhead callback runs

Processing flow after the local journal write completes

C_JournaledAhead::finish()
|_ FileStore::_journaled_ahead()
|__ FileStore::queue_op(): puts the write request into FileStore's op_wq queue
|__ invoke the C_OSD_RepModifyCommit callback

Local data write (apply) processing flow

FileStore::_do_op()
|__ take the write request from the op_wq queue
|__ FileStore::_do_transactions(): performs the actual data write

Processing flow after the local data write completes

FileStore::_finish_op()
|__ invoke the C_OSD_RepModifyApply callback

3. Primary OSD handling of the MOSDRepOpReply messages sent by the replica OSDs (message type MSG_OSD_REPOPREPLY)

OSD::ms_fast_dispatch()
|_ OSD::dispatch_session_waiting()
|_ OSD::dispatch_op_fast()
|_ OSD::handle_replica_op()
|__ check that the sender is valid
|__ OSD::get_pg_or_queue_for_pg(): looks up the PG and pool information for the OpRequest
|_ OSD::enqueue_op()
|_ PG::queue_op()
|__ OSD::ShardedThreadPool::ShardedWQ::queue(): puts the PG and the op into the queue together

OSD::ShardedOpWQ::_process(): processes the ops queued in OSD::ShardedThreadPool::ShardedWQ
|__ PGQueueable::RunVis::operator()(const OpRequestRef &op)
|_ OSD::dequeue_op()
|_ ReplicatedPG::do_request()
|_ ReplicatedBackend::handle_message()
|_ ReplicatedBackend::sub_op_modify_reply()
|__ if the reply has CEPH_OSD_FLAG_ONDISK set, remove the replying OSD from the waiting_for_commit set
|__ remove the replying OSD from the waiting_for_applied set
|__ if waiting_for_commit is now empty, invoke the C_OSD_RepopCommit callback
|__ if waiting_for_applied is now empty, invoke the C_OSD_RepopApplied callback
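
On the primary, this reply handling works together with the local op_commit() and op_applied() callbacks described in section 4. The combined effect is two sets of OSDs that shrink as completions arrive, and each set fires its callback once when it becomes empty. Following the flow above, an ONDISK reply clears both the commit wait and the applied wait, while a plain ACK clears only the applied wait. The C++ sketch below uses plain int OSD ids in place of pg_shard_t and std::function in place of Context objects. `InProgressOpSketch`, `handle_reply()`, `local_commit()` and `local_applied()` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>

// Hypothetical stand-in for ReplicatedBackend's in-progress op record.
// OSD ids are plain ints instead of pg_shard_t.
struct InProgressOpSketch {
  std::set<int> waiting_for_commit;
  std::set<int> waiting_for_applied;
  std::function<void()> on_commit;   // role of C_OSD_RepopCommit
  std::function<void()> on_applied;  // role of C_OSD_RepopApplied

  InProgressOpSketch(const std::set<int>& acting, std::function<void()> commit_cb,
                     std::function<void()> applied_cb)
      : waiting_for_commit(acting), waiting_for_applied(acting),
        on_commit(std::move(commit_cb)), on_applied(std::move(applied_cb)) {}

  // Reply from a replica (sub_op_modify_reply): ONDISK clears the commit wait,
  // and any reply clears the applied wait.
  void handle_reply(int from, bool ondisk) {
    if (ondisk) {
      assert(waiting_for_commit.count(from) == 1);
      waiting_for_commit.erase(from);
    }
    waiting_for_applied.erase(from);
    fire_if_empty(waiting_for_applied, on_applied);
    fire_if_empty(waiting_for_commit, on_commit);
  }

  // The primary's own journal commit (op_commit) and apply (op_applied).
  void local_commit(int self) {
    waiting_for_commit.erase(self);
    fire_if_empty(waiting_for_commit, on_commit);
  }
  void local_applied(int self) {
    waiting_for_applied.erase(self);
    fire_if_empty(waiting_for_applied, on_applied);
  }

 private:
  // Complete the callback at most once, similar to completing a Context.
  static void fire_if_empty(const std::set<int>& waiting, std::function<void()>& cb) {
    if (waiting.empty() && cb) {
      std::function<void()> done = std::move(cb);
      cb = nullptr;
      done();
    }
  }
};
```

Consider an acting set {0, 1, 2} with OSD 0 as the primary. The commit callback fires only after the primary's own commit and both replicas' ONDISK replies have been seen, and it fires exactly once.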

4. Callback class processing flows

C_OSD_RepModifyCommit class processing flow

C_OSD_RepModifyCommit::finish()
|_ ReplicatedBackend::sub_op_modify_commit()
|__ create an MOSDRepOpReply message with CEPH_OSD_FLAG_ONDISK set
|_ ReplicatedPG::send_message_osd_cluster()
|__ OSD::send_message_osd_cluster(): sends the MOSDRepOpReply message back to the primary OSD

C_OSD_RepModifyApply class processing flow

C_OSD_RepModifyApply::finish()
|_ ReplicatedBackend::sub_op_modify_applied()
|__ create an MOSDRepOpReply message with CEPH_OSD_FLAG_ACK set, but only if the commit (ONDISK) reply has not already been sent
|_ ReplicatedPG::send_message_osd_cluster()
|__ OSD::send_message_osd_cluster(): sends the MOSDRepOpReply message back to the primary OSD
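
On a replica, the commit and apply callbacks can run in either order. The primary, however, clears its applied wait on any reply (section 3), so an ONDISK reply already implies the ACK. The C++ sketch below models the replica side under one assumption: sub_op_modify_applied() skips the ACK reply when the commit reply has already gone out. `ReplicaOpSketch`, `kAck` and `kOnDisk` are hypothetical. The flag values are illustrative only and are not the real CEPH_OSD_FLAG_* constants.

```cpp
#include <cassert>
#include <vector>

// Illustrative flag bits only; the real CEPH_OSD_FLAG_ACK / CEPH_OSD_FLAG_ONDISK
// values are defined in Ceph's headers and are not reproduced here.
enum ReplyFlag : unsigned { kAck = 1u << 0, kOnDisk = 1u << 1 };

// Hypothetical replica-side op state shared by the commit and apply callbacks.
struct ReplicaOpSketch {
  bool committed = false;
  bool applied = false;
  std::vector<unsigned> sent;  // flags of each MOSDRepOpReply "sent" to the primary

  // C_OSD_RepModifyCommit -> sub_op_modify_commit(): always report ONDISK.
  void on_commit() {
    assert(!committed);  // the commit callback runs once per op
    committed = true;
    sent.push_back(kOnDisk);
  }

  // C_OSD_RepModifyApply -> sub_op_modify_applied(): send ACK only if no ONDISK
  // reply has gone out yet (assumed behaviour of the Jewel code).
  void on_applied() {
    assert(!applied);  // the apply callback runs once per op
    applied = true;
    if (!committed) sent.push_back(kAck);
  }
};
```

When the apply finishes first, the primary receives an ACK and then an ONDISK reply. When the commit finishes first, only the ONDISK reply is sent, and that one message clears both of the primary's waits for this replica.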

C_OSD_OnOpCommit processing flow

C_OSD_OnOpCommit::finish()
|_ ReplicatedBackend::op_commit()
|__ remove the local (primary) OSD from the waiting_for_commit set
|__ if waiting_for_commit is now empty, invoke the C_OSD_RepopCommit callback

C_OSD_OnOpApplied processing flow

C_OSD_OnOpApplied::finish()
|_ ReplicatedBackend::op_applied()
|__ remove the local (primary) OSD from the waiting_for_applied set
|__ if waiting_for_applied is now empty, invoke the C_OSD_RepopApplied callback
