2025-01-17 Update From: SLTechnology News&Howtos
This article walks through the MongoDB mmap storage engine in detail and is intended as a reference for interested readers.
Through the 3.0 release, MongoDB used the mmap engine (MMAPv1) as its default storage engine; WiredTiger became the default in 3.2. This article analyzes the mmap engine from the point of view of the source code. Opinions in the industry have long been divided over 10gen's decision to build a storage engine on mmap; that debate is outside the scope of this article.
Storage is divided into directories by db. Each db directory contains a {dbname}.ns file and data files named {dbname}.0, {dbname}.1, and so on. The journal directory holds the WAL (write-ahead log) used for crash recovery. The directory structure is as follows:
db
|-- journal
|   |-- j._0
|   |-- lsn
|-- local
|   |-- local.ns
|   |-- local.0
|   |-- local.1
|-- mydb
    |-- mydb.ns
    |-- mydb.0
    |-- mydb.1
These three types of files (the .ns file, the numbered data files, and the journal) make up the persistence unit of the mmap engine. This article analyzes the structure of each file type at the code level.
Namespace metadata management
.ns file mapping
When the mmap engine loads a database, it first initializes namespaceIndex, which acts as the metadata entry point for the database.
mongo/db/storage/mmap_v1/catalog/namespace_index.cpp
    DurableMappedFile _f{MongoFile::Options::SEQUENTIAL};
    std::unique_ptr<NamespaceHashTable> _ht;
    ...
    const std::string pathString = nsPath.string();
    ...
    ... (pathString);
    ...
    p = _f.getView();
    ...
    _ht.reset(new NamespaceHashTable(p, (int)len, "namespace index"));
As shown above, the engine creates an mmap for the .ns file and hands the mapped memory view directly to the hash table, with no parsing step in between. The .ns file is therefore a byte-for-byte image of the in-memory hash table.
The hash table maps string -> NamespaceDetails (namespace_details.h) and resolves collisions with open addressing.
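To make the open-addressing idea concrete, here is a minimal sketch of a hash table whose slots are fixed-size, pointer-free nodes, which is the property that lets such a table live directly in a mapped file. The names Node and FlatHashTable, the 64-byte key, the int value standing in for NamespaceDetails, and the use of linear probing are all assumptions made for illustration; this is not MongoDB's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Hypothetical fixed-size slot: no pointers inside, so an array of these
// could be backed by a mapped file instead of a std::vector.
struct Node {
    std::size_t hash = 0;  // 0 marks an unused slot in this sketch
    char key[64] = {0};    // namespace name, NUL-terminated
    int value = 0;         // stands in for NamespaceDetails
};

class FlatHashTable {
public:
    explicit FlatHashTable(std::size_t slots) : nodes_(slots) {}

    // Inserts or overwrites. Returns false if the key is too long
    // or every slot is taken by another key.
    bool put(const std::string& key, int value) {
        if (key.size() >= sizeof(Node::key)) return false;
        const std::size_t h = hashOf(key);
        for (std::size_t probe = 0; probe < nodes_.size(); ++probe) {
            Node& n = nodes_[(h + probe) % nodes_.size()];
            if (n.hash == 0 || (n.hash == h && key == n.key)) {
                n.hash = h;
                std::strncpy(n.key, key.c_str(), sizeof(n.key) - 1);
                n.value = value;
                return true;
            }
        }
        return false;
    }

    // Linear probing: walk forward from the home slot until the key or an
    // unused slot is found. Nothing is ever deleted here, so an unused
    // slot proves the key is absent.
    const int* get(const std::string& key) const {
        const std::size_t h = hashOf(key);
        for (std::size_t probe = 0; probe < nodes_.size(); ++probe) {
            const Node& n = nodes_[(h + probe) % nodes_.size()];
            if (n.hash == 0) return nullptr;
            if (n.hash == h && key == n.key) return &n.value;
        }
        return nullptr;
    }

private:
    static std::size_t hashOf(const std::string& key) {
        const std::size_t h = std::hash<std::string>{}(key);
        return h == 0 ? 1 : h;  // reserve 0 for "unused"
    }

    std::vector<Node> nodes_;
};
```

Because every slot has the same size and refers to nothing outside itself, the whole table can be written to or read from a file as one contiguous block.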
    int NamespaceHashTable::_find(const Namespace& k, bool& found) const {
        ...
        while (1) {
            if (!_nodes(i).inUse()) {
                if (firstNonUsed < 0)
                    firstNonUsed = i;
            }

            if (_nodes(i).hash == h && _nodes(i).key == k) {
                if (chain > ...
                    log() ...
...
        ... writing(drec(newDelLoc));
        newDel->extentOfs() = dr->extentOfs();
        newDel->lengthWithHeaders() = remainingLength;
        newDel->nextDeleted().Null();

        addDeletedRec(txn, newDelLoc);
    }
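The code above splits a free block: the new record takes the space it needs, and the remaining space becomes a new deleted record (newDel) that is added back to the free list through addDeletedRec. The original article illustrated this splitting step with a figure.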
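The splitting step can be sketched in isolation as follows. This is a simplified first-fit allocator over a single list; FreeBlock, Allocation, FreeList, and the minRemainder threshold are hypothetical names chosen for this example, and the rule of handing over the whole block when the remainder is too small is an assumption of the sketch rather than something shown in the excerpt above.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct FreeBlock {
    std::size_t offset;  // position inside the extent
    std::size_t length;  // bytes available
};

struct Allocation {
    std::size_t offset;
    std::size_t length;
};

class FreeList {
public:
    void add(FreeBlock b) { blocks_.push_back(b); }

    // Takes the first block that is large enough. If the leftover is at
    // least minRemainder bytes, it is split off and returned to the list;
    // otherwise the caller receives the whole block.
    bool allocate(std::size_t want, std::size_t minRemainder, Allocation* out) {
        assert(out != nullptr);
        for (auto it = blocks_.begin(); it != blocks_.end(); ++it) {
            if (it->length < want) continue;
            const FreeBlock found = *it;
            blocks_.erase(it);
            const std::size_t remaining = found.length - want;
            if (remaining < minRemainder) {
                *out = Allocation{found.offset, found.length};
            } else {
                add(FreeBlock{found.offset + want, remaining});
                *out = Allocation{found.offset, want};
            }
            return true;
        }
        return false;  // caller would have to obtain more space
    }

    std::size_t totalFree() const {
        std::size_t sum = 0;
        for (const FreeBlock& b : blocks_) sum += b.length;
        return sum;
    }

private:
    std::list<FreeBlock> blocks_;
};
```

Starting from one 256-byte block, a 100-byte request leaves a 156-byte block on the list; a later 140-byte request with a 32-byte threshold receives all 156 bytes because the 16-byte leftover is not worth tracking.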
If allocation from the existing free lists fails, the engine tries to allocate a new extent, adds it to the free list for the largest size class, and then tries to allocate from the free lists again.
    const int RecordStoreV1Base::bucketSizes[] = {
        ...
        MaxAllowedAllocation,      // 16.5M
        MaxAllowedAllocation + 1,  // Only MaxAllowedAllocation sized records go here.
        INT_MAX,                   // "oversized" bucket for unused parts of extents.
    };
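A size-class table like this is typically consulted by scanning for the first bucket whose limit exceeds a given length. The sketch below shows that lookup with a shortened, made-up table; kBucketSizes, kBuckets, and bucketFor are hypothetical names, and the sizes are not the values elided from the array above.

```cpp
#include <cassert>
#include <climits>

// Hypothetical, shortened size classes; the final INT_MAX entry plays the
// role of the "oversized" bucket.
const int kBucketSizes[] = {32, 64, 128, 256, 512, 1024, INT_MAX};
const int kBuckets = sizeof(kBucketSizes) / sizeof(kBucketSizes[0]);

// Returns the index of the first bucket whose limit is strictly greater
// than len, so every block filed under bucket i is at least as large as
// kBucketSizes[i - 1].
int bucketFor(int len) {
    assert(len >= 0);
    for (int i = 0; i < kBuckets; ++i) {
        if (kBucketSizes[i] > len) return i;
    }
    return kBuckets - 1;  // only reached when len == INT_MAX
}
```

With this table a 100-byte free block is filed under index 2 (limit 128), while a freshly created extent of several megabytes lands in the last bucket.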
The above is an overview of memory management in the mmap engine. Records are not allocated at a fixed size: when a block taken from the free list is larger than needed, the surplus is put back on a deleted list, and when a record is freed it is likewise linked to the deleted list for its size class. Over time this produces a large number of fragments, so the mmap engine also provides a compact process that reclaims fragments and improves space utilization.
Fragment Compact
Compact is exposed to clients as a command. It operates on one collection at a time, and its implementation works at extent granularity.
The compact process has two steps. The first detaches the existing extents from the free list. The second copies the in-use space of each old extent into a new extent, packing the records contiguously, which is what achieves the compaction.
OrphanDeletedList process
The deleted lists under the collection's namespace are emptied, so that newly created records are not allocated from the existing extents.
    WriteUnitOfWork wunit(txn);
    // Orphaning the deleted lists ensures that all inserts go to new extents rather than
    // the ones that existed before starting the compact. If we abort the operation before
    // completion, any free space in the old extents will be leaked and never reused unless
    // the collection is compacted again or dropped. This is considered an acceptable
    // failure mode as no data will be lost.
    log() ...
    ... setLastExtentSize(txn, 0);

    // create a new extent so new records go there
    increaseStorageSize(txn, _details->lastExtentSize(txn), true);
    ...
    for (std::vector<...>::iterator it = extents.begin(); it != extents.end(); it++) {
        txn->checkForInterrupt();
        invariant(_details->firstExtent(txn) == *it);
        // empties and removes the first extent
        _compactExtent(txn, *it, extentNumber++, adaptor, options, stats);
        invariant(_details->firstExtent(txn) != *it);
        pm.hit();
    }
During _compactExtent, the records of the old extent are inserted one by one into the new extent and their space is released as they go. Once every record has been moved out, the old extent is a completely unused extent again. The original article illustrated this with a figure.
    while (!nextSourceLoc.isNull()) {
        txn->checkForInterrupt();

        WriteUnitOfWork wunit(txn);
        MmapV1RecordHeader* recOld = recordFor(nextSourceLoc);
        RecordData oldData = recOld->toRecordData();
        nextSourceLoc = getNextRecordInExtent(txn, nextSourceLoc);
        ...
        CompactDocWriter writer(recOld, rawDataSize, allocationSize);
        StatusWith<...> status = insertRecordWithDocWriter(txn, &writer);
        ...
        _details->incrementStats(txn, -(recOld->netLength()), -1);
    }
This is the loop in _compactExtent that walks the records of the extent, inserts each one into another extent, and gradually releases the old space (the incrementStats call at the end of the loop body).
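The two compact steps can be modeled with a small in-memory sketch. Slot, Extent, Collection, and compact are hypothetical names invented for this example, and std::vector stands in for on-disk extents; the point is only the order of operations (orphan the free list first, then drain each old extent into a fresh one), not the real data layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Slot {
    bool live;         // false marks a hole left by a deleted record
    std::string data;
};

using Extent = std::vector<Slot>;

struct Collection {
    std::vector<Extent> extents;
    std::vector<std::size_t> freeList;  // stands in for the deleted lists
};

void compact(Collection& c) {
    // Step 1: orphan the free list so no new record lands in an old extent.
    c.freeList.clear();

    // Step 2: create a fresh extent and drain every old extent into it.
    std::vector<Extent> old;
    old.swap(c.extents);
    c.extents.emplace_back();
    for (Extent& e : old) {
        for (Slot& s : e) {
            if (s.live) c.extents.back().push_back(std::move(s));
        }
        e.clear();  // the old extent is now completely unused
    }
}

// Counts live records across all extents; used to check nothing was lost.
std::size_t liveCount(const Collection& c) {
    std::size_t n = 0;
    for (const Extent& e : c.extents)
        for (const Slot& s : e)
            if (s.live) ++n;
    return n;
}
```

After compact, all live records sit next to each other in the new extent and the holes are gone, which mirrors the packing effect described above.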
Mmap data writeback
When describing the structure of the .ns file above, we noted that it is mapped to an in-memory hash table through mmap, and that the mapping is implemented by DurableMappedFile. Let's look at how this module persists data.
In finishInit of the mmap engine:
    void MMAPV1Engine::finishInit() {
        dataFileSync.go();
        ...
This starts the periodic task of the DataFileSync class, which flushes data to disk at regular intervals in a background thread.
    while (!inShutdown()) {
        ...
        if (storageGlobalParams.syncdelay == 0) {
            // in case at some point we add an option to change at runtime
            sleepsecs(5);
            continue;
        }

        sleepmillis(
            (long long)std::max(0.0, (storageGlobalParams.syncdelay * 1000) - time_flushing));
        ...
        Date_t start = jsTime();
        StorageEngine* storageEngine = getGlobalServiceContext()->getGlobalStorageEngine();

        dur::notifyPreDataFileFlush();
        int numFiles = storageEngine->flushAllFiles(true);
        dur::notifyPostDataFileFlush();
        ...
    }
    }
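The sleep computation in this loop subtracts the time spent on the previous flush from the configured delay, so flushes start roughly every syncdelay seconds rather than syncdelay seconds after the previous one finished. A small sketch of that scheduling rule, with the flush and the sleep injected as callbacks so it can run without real I/O or waiting; nextSleepMillis and runFlushLoop are hypothetical names, not MongoDB functions.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Milliseconds to wait before the next flush: the configured delay minus
// the duration of the previous flush, never negative.
long long nextSleepMillis(double syncDelaySecs, long long lastFlushMillis) {
    return static_cast<long long>(
        std::max(0.0, syncDelaySecs * 1000 - static_cast<double>(lastFlushMillis)));
}

// Runs a fixed number of rounds. flushAndTime performs a flush and reports
// how long it took in milliseconds; sleepMillis performs the wait.
// Returns the total time spent sleeping.
long long runFlushLoop(int rounds,
                       double syncDelaySecs,
                       const std::function<long long()>& flushAndTime,
                       const std::function<void(long long)>& sleepMillis) {
    long long timeFlushing = 0;
    long long totalSlept = 0;
    for (int i = 0; i < rounds; ++i) {
        const long long pause = nextSleepMillis(syncDelaySecs, timeFlushing);
        sleepMillis(pause);
        totalSlept += pause;
        timeFlushing = flushAndTime();
    }
    return totalSlept;
}
```

With a 60-second delay and flushes that each take one second, the first wait is 60000 ms and every later wait is 59000 ms. A flush that takes longer than the delay makes the next wait zero.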
flushAllFiles eventually calls the flush method of each memory-mapped file:
    void MemoryMappedFile::flush(bool sync) {
        if (views.empty() || fd == 0 || !sync)
            return;

        bool useFsync = !ProcessInfo::preferMsyncOverFSync();

        if (useFsync ? fsync(fd) != 0 : msync(viewForFlushing(), len, MS_SYNC) != 0) {
            // msync failed, this is very bad
            log() ...
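The choice between fsync on the file descriptor and msync on the mapped view can be tried out with plain POSIX calls. The sketch below writes a string through a shared mapping and forces it to disk with whichever call is selected; writeThroughMapping and readBack are hypothetical helpers, the 4096-byte file size is arbitrary, and the code assumes a POSIX system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Creates (or truncates) the file, maps it MAP_SHARED, copies text into the
// view, and flushes with msync or fsync. Returns true if every step worked.
bool writeThroughMapping(const char* path, const std::string& text, bool preferMsync) {
    const std::size_t len = 4096;
    if (text.size() >= len) return false;

    const int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    if (::ftruncate(fd, static_cast<off_t>(len)) != 0) {
        ::close(fd);
        return false;
    }

    void* view = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (view == MAP_FAILED) {
        ::close(fd);
        return false;
    }

    // Writing to the view only dirties pages in memory.
    std::memcpy(view, text.c_str(), text.size() + 1);

    // Force the dirty pages to disk, either through the mapping or the fd.
    const int rc = preferMsync ? ::msync(view, len, MS_SYNC) : ::fsync(fd);

    ::munmap(view, len);
    ::close(fd);
    return rc == 0;
}

// Reads the file through ordinary I/O and returns its leading C string.
std::string readBack(const char* path) {
    char buf[4097] = {0};
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return std::string();
    const std::size_t n = std::fread(buf, 1, 4096, f);
    std::fclose(f);
    buf[n] = '\0';
    return std::string(buf);
}
```

Either call makes the write durable; which one performs better depends on the platform, which is presumably why the original code asks ProcessInfo for a preference.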