How does MongoDB store data

2025-01-16 update. From: SLTechnology News & Howtos (shulou.com), Database section.

Shulou (Shulou.com) 05/31 report

This article explains how MongoDB stores data on disk and is shared here as a practical reference. It describes the classic MMAPv1 storage engine (with .ns files and extents); WiredTiger became the default engine in MongoDB 3.2, and MMAPv1 was removed in 4.2.

Preface

Before delving into how MongoDB stores data, one concept must be clear: memory-mapped files.

Memory-Mapped Files

The following figure shows how the database interacts with the underlying operating system.

A memory-mapped file is a data file that the OS maps, via mmap, into a region of the process's virtual memory, so that the file's contents can be accessed as if they were in memory.

For a process, virtual memory is an abstraction over physical memory. On a 64-bit system the address space is nominally 2^64 bytes (current x86-64 processors implement 48-bit virtual addresses).

The operating system uses mmap to map all the data the process needs into this address space (red line), and then maps the portion currently in use onto physical memory (gray line).

When a process accesses data whose page is not currently in physical memory, a page fault is triggered, and the OS loads that page from disk into physical memory and updates the mapping.

If physical memory is full, pages must be swapped out to make room, and some data has to be written back to disk. Anonymous pages (pure in-memory data with no backing file) are written to the swap partition; dirty pages of a memory-mapped file are written back to that file on disk, and clean ones can simply be discarded.
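To make the mechanism above concrete, here is a minimal Python sketch using the standard-library mmap module (this is not MongoDB code). It maps a temporary file into memory, modifies it through the mapping as if it were a byte array, and then reads the file through the normal file API to show that the change reached the file. The file contents are arbitrary.

```python
import mmap
import os
import tempfile


def demo_mmap() -> bytes:
    # Create a small file on disk to stand in for a data file.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello, mmap!")
        with open(path, "r+b") as f:
            # Map the whole file (length 0) into the process's address space.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
                # Reads and writes are plain memory operations; the OS pages
                # the data in from disk on first access.
                mm[0:5] = b"HELLO"
                mm.flush()  # ask the OS to write dirty pages back to the file
        # Read through the ordinary file API: the change is in the file.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(demo_mmap())  # b'HELLO, mmap!'
```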

Storage Model of MongoDB

With memory-mapped files, the data MongoDB accesses appears to be in memory, which simplifies the logic MongoDB uses to read and modify data.

MongoDB's reads and writes deal only with virtual memory; the OS takes care of the rest.

Virtual memory size = total size of all data files + other overhead (connections, thread stacks)

If journaling is enabled, the virtual memory size roughly doubles.
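The sizing rule above can be written as a rough estimate. This sketch assumes the journal doubling comes from the data files being mapped a second time; the function name is hypothetical, and the 100 MB overhead in the example is an illustrative figure, not a measurement.

```python
MB = 1024 * 1024


def estimate_virtual_memory(file_sizes: list[int], other_overhead: int,
                            journal: bool = True) -> int:
    mapped = sum(file_sizes)
    # Assumption: with journaling on, the data files are mapped a second
    # time, so the mapped portion counts twice.
    if journal:
        mapped *= 2
    # Connections, thread stacks and other memory not backed by data files.
    return mapped + other_overhead


files = [64 * MB, 128 * MB, 256 * MB]  # e.g. test.0 .. test.2
overhead = 100 * MB                    # hypothetical overhead figure
print(estimate_virtual_memory(files, overhead) // MB)                 # 996
print(estimate_virtual_memory(files, overhead, journal=False) // MB)  # 548
```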

Advantages of using MMF:
1. MongoDB does not need to manage memory and disk scheduling itself.
2. Page replacement follows the OS's LRU strategy.
3. The cache survives a mongod restart, because the pages remain in the OS page cache.

Disadvantages of using MMF:
1. RAM usage is affected by disk fragmentation, and a high readahead setting also affects it.
2. The scheduling algorithm cannot be optimized; MongoDB is limited to the OS's LRU.
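The swap-out behavior and the LRU policy mentioned above can be illustrated with a toy model of physical memory. This is only a sketch: real kernels (Linux, for example) use approximations of LRU rather than an exact ordered list, and the class and its fields are hypothetical.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy model of physical memory with a fixed number of page frames."""

    def __init__(self, frames: int) -> None:
        self.frames = frames
        self.resident = OrderedDict()  # page number -> dirty flag
        self.faults = 0
        self.evicted = []              # (page, was_dirty) in eviction order

    def access(self, page: int, write: bool = False) -> None:
        if page in self.resident:
            # Hit: the page becomes the most recently used one.
            self.resident.move_to_end(page)
            self.resident[page] = self.resident[page] or write
            return
        # Miss: a page fault, so the page must be loaded from disk.
        self.faults += 1
        if len(self.resident) >= self.frames:
            # Memory is full: evict the least recently used page. A dirty
            # page would be written back (to its file or to swap) first.
            victim, dirty = self.resident.popitem(last=False)
            self.evicted.append((victim, dirty))
        self.resident[page] = write


cache = LRUPageCache(frames=3)
for page, write in [(1, False), (2, False), (3, False), (1, True),
                    (4, False), (5, False), (6, False)]:
    cache.access(page, write)
print(cache.faults, list(cache.resident), cache.evicted)
```

Page 1 is touched again (and dirtied) before page 4 arrives, so page 2 is evicted first; page 1 is evicted last and is the only one that would need a write-back.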

The files on disk are made up of extents, and space for collections is also allocated in extents.

A collection has one or more extents.

The namespace record in the .ns file points to the first extent of that collection.
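The relationship between a namespace record and its extents can be sketched as a linked list. The classes, fields and offsets below are hypothetical simplifications chosen for illustration; they are not MongoDB's actual on-disk format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extent:
    # Hypothetical fields: which data file (test.0, test.1, ...) the extent
    # lives in, where it starts, and how many bytes it spans.
    file_no: int
    offset: int
    length: int
    next_extent: Optional[Extent] = None  # next extent of the same collection


@dataclass
class NamespaceRecord:
    name: str                              # e.g. "test.accesslog"
    first_extent: Optional[Extent] = None  # what the .ns entry points to


def add_extent(ns: NamespaceRecord, ext: Extent) -> None:
    """Append a newly allocated extent to the end of the collection's chain."""
    if ns.first_extent is None:
        ns.first_extent = ext
        return
    tail = ns.first_extent
    while tail.next_extent is not None:
        tail = tail.next_extent
    tail.next_extent = ext


def extents_of(ns: NamespaceRecord) -> List[Extent]:
    """Walk the chain starting from the extent the namespace record points to."""
    result = []
    ext = ns.first_extent
    while ext is not None:
        result.append(ext)
        ext = ext.next_extent
    return result


ns = NamespaceRecord("test.accesslog")
add_extent(ns, Extent(file_no=0, offset=8192, length=4096))
add_extent(ns, Extent(file_no=1, offset=8192, length=16384))
print([(e.file_no, e.length) for e in extents_of(ns)])  # [(0, 4096), (1, 16384)]
```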

Data file and space allocation

When you create a database (MongoDB has no explicit command for creating one; a database is created automatically the first time you write data to a collection in it), MongoDB allocates a set of data files on disk that store all of the database's collections, indexes, and other metadata. The data files are placed in the dbpath specified at startup, /data/db by default. A typical file layout looks like this:

$ cd /data/db
$ ls -al
-rw------- 1 root root   16777216 09-18 00:54 local.ns
-rw------- 1 root root   67108864 09-18 00:54 local.0
-rw------- 1 root root 2146435072 09-18 00:55 local.1
-rw------- 1 root root 2146435072 09-18 00:56 local.2
-rw------- 1 root root 2146435072 09-18 00:57 local.3
-rw------- 1 root root 2146435072 09-18 00:58 local.4
-rw------- 1 root root 2146435072 09-18 00:59 local.5
-rw------- 1 root root 2146435072 09-18 01:01 local.6
-rw------- 1 root root 2146435072 09-18 01:02 local.7
-rw------- 1 root root 2146435072 09-18 01:03 local.8
-rw------- 1 root root 2146435072 09-18 01:04 local.9
-rw------- 1 root root 2146435072 09-18 01:05 local.10
-rw------- 1 root root   16777216 09-18 01:06 test.ns
-rw------- 1 root root   67108864 09-18 01:06 test.0
-rw------- 1 root root  134217728 09-18 01:06 test.1
-rw------- 1 root root  268435456 09-18 01:06 test.2
-rw------- 1 root root  536870912 09-18 01:06 test.3
-rw------- 1 root root 1073741824 09-18 01:07 test.4
-rw------- 1 root root 2146435072 09-18 01:07 test.5
-rw------- 1 root root 2146435072 09-18 01:09 test.6
-rw------- 1 root root 2146435072 09-18 01:11 test.7
-rw------- 1 root root 2146435072 09-18 01:13 test.8
...
-rwxr-xr-x 1 root root          6 09-18 13:54 mongod.lock
drwxr-xr-x 2 root root       4096 11-13 18:39 journal
drwxr-xr-x 2 root root       4096 11-13 19:02 _tmp

The server's process ID is stored in mongod.lock, which is a process lock file. Data files are named after the database they belong to.

test.ns is the first file generated (the .ns extension stands for namespace). Each collection and each index in the database has its own namespace, and the metadata for every namespace is stored in this file. By default the .ns file is fixed at 16 MB and can hold about 24,000 namespaces; that is, the total number of collections and indexes in the database cannot exceed roughly 24,000. The size can be customized with mongod's --nssize option.
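As a rough capacity check, the sketch below scales the ~24,000 figure linearly with the --nssize value. Linear scaling is an assumption made for illustration (the exact count depends on the server's namespace entry size), and both function names are hypothetical.

```python
DEFAULT_NSSIZE_MB = 16
APPROX_NAMESPACES_AT_DEFAULT = 24_000  # figure quoted for a 16 MB .ns file


def approx_namespace_capacity(nssize_mb: int) -> int:
    # Assumption: capacity scales linearly with the size of the .ns file.
    return APPROX_NAMESPACES_AT_DEFAULT * nssize_mb // DEFAULT_NSSIZE_MB


def fits_in_ns_file(num_collections: int, num_indexes: int,
                    nssize_mb: int = DEFAULT_NSSIZE_MB) -> bool:
    # Each collection and each index occupies one namespace.
    return num_collections + num_indexes <= approx_namespace_capacity(nssize_mb)


# The "test" database in the db.stats() output has 37 collections and 86 indexes.
print(fits_in_ns_file(37, 86))  # True
```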

Files such as test.0, whose names end in an integer starting from 0, are the data files for collections and indexes. MongoDB preallocates several files at the start, even if the database holds only a single document. Preallocation keeps data as contiguous as possible and reduces disk fragmentation. As data is added, MongoDB allocates more data files, each twice the size of the previous one (64 MB -> 128 MB -> 256 MB ...), until the per-file cap of 2 GB is reached (2146435072 bytes in the listing above). The assumption behind this is that if the total data size grows at a steady rate, the space allocated to data files should grow progressively as well. Preallocation can be turned off with --noprealloc, but this is not recommended in a production environment.
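The doubling rule can be reproduced in a few lines. The two constants are taken from the test.* files in the listing above (64 MB for test.0, and 2146435072 bytes, i.e. 2047 MB, for test.5 onward); the function name is hypothetical.

```python
MB = 1024 * 1024
FIRST_FILE_BYTES = 64 * MB   # size of test.0 in the listing
MAX_FILE_BYTES = 2146435072  # size of test.5 onward in the listing (2047 MB)


def data_file_sizes(n_files: int) -> list[int]:
    """Sizes of the first n_files data files under the doubling rule."""
    sizes = []
    size = FIRST_FILE_BYTES
    for _ in range(n_files):
        sizes.append(size)
        # Each new file is twice the previous one, capped at MAX_FILE_BYTES.
        size = min(size * 2, MAX_FILE_BYTES)
    return sizes


print(data_file_sizes(7))
# [67108864, 134217728, 268435456, 536870912, 1073741824, 2146435072, 2146435072]
```

The printed sizes match test.0 through test.6 in the listing.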

The local database is created by default and does not participate in replication. When mongod is a member of a replica set, the local database contains a preallocated capped collection named oplog.rs, whose default size is 5% of free disk space; the size can be adjusted with --oplogSize. The oplog is used to replicate operations from the primary to the secondary members, and its size limits how long a secondary can fall behind the primary before it must perform a full resynchronization.
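The 5% default and the out-of-sync window can be approximated with simple arithmetic. This sketch assumes a constant oplog write rate and ignores any minimum or maximum bounds a particular MongoDB version may apply to the default size; both function names and the example figures are hypothetical.

```python
GB = 1024 ** 3


def default_oplog_bytes(free_disk_bytes: int) -> int:
    # 5% of disk space, per the text; version-specific bounds are ignored.
    return free_disk_bytes * 5 // 100


def replication_window_hours(oplog_bytes: int, oplog_bytes_per_hour: int) -> float:
    # The oplog is a capped collection: once it wraps, the oldest entries are
    # overwritten, so a secondary lagging longer than this must fully resync.
    return oplog_bytes / oplog_bytes_per_hour


oplog = default_oplog_bytes(1000 * GB)          # hypothetical 1000 GB free
print(oplog // GB)                              # 50
print(replication_window_hours(oplog, 2 * GB))  # 25.0 at a hypothetical 2 GB/hour
```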

The journal directory holds the journal files. Journaling has been enabled by default on 64-bit builds since version 2.0.

You can use db.stats() to check how space is used and allocated:

{
  "db": "test",
  "collections": 37,
  "objects": 317894523,             # total number of documents
  "avgObjSize": 232.3416429039893,  # in bytes
  # dataSize: total size of all data in the database, including the padding allocated to each document to allow it to grow. It does not decrease when documents shrink; it decreases only when documents are deleted or after a compact or repairDatabase operation.
  "dataSize": 73860135744,
  # storageSize: space allocated to collections, including space reserved for growth and unused space left by deleted documents; it does not decrease when documents shrink or are deleted. Space is allocated from the data files in blocks called extents, so this is the total size of the assigned extents.
  "storageSize": 97834319392,
  "numExtents": 385,
  "indexes": 86,
  "indexSize": 58687466992,
  "fileSize": 182380920832,         # sum of all data file sizes, excluding the namespace (.ns) file
  "nsSizeMB": 16,
  "dataFileVersion": {"major": 4, "minor": 5},
  "ok": 1
}
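The db.stats() fields are related to one another, and a quick consistency check makes those relationships explicit. The sketch below copies values from the output above into a dictionary; with a live server, pymongo's `db.command("dbstats")` returns a similar document. Treating fileSize minus storageSize and indexSize as space not yet assigned to any extent is an assumption about MMAPv1, where indexes also live in the data files. The function name is hypothetical.

```python
# Values copied from the db.stats() output above.
db_stats = {
    "objects": 317894523,
    "avgObjSize": 232.3416429039893,
    "dataSize": 73860135744,
    "storageSize": 97834319392,
    "indexSize": 58687466992,
    "fileSize": 182380920832,
}


def summarize_db_stats(s: dict) -> dict:
    return {
        # avgObjSize is dataSize divided by the number of documents.
        "avg_obj_size": s["dataSize"] / s["objects"],
        # Fraction of allocated extent space that actually holds data.
        "data_to_storage_ratio": s["dataSize"] / s["storageSize"],
        # Extent space not holding document data (reserved growth, deleted space).
        "storage_minus_data": s["storageSize"] - s["dataSize"],
        # Assumption: data-file space not yet assigned to collection or index extents.
        "file_space_unassigned": s["fileSize"] - s["storageSize"] - s["indexSize"],
    }


summary = summarize_db_stats(db_stats)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Roughly three quarters of the allocated extent space holds document data; the rest is growth headroom and space freed by deletions.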

Use db.accesslog.stats() to check the space usage of a single collection:

{
  "ns": "test.accesslog",
  "count": 145352932,
  "size": 37060264352,          # actual data size, excluding indexes
  "avgObjSize": 254.967435758365,
  "storageSize": 45794676448,   # preallocated data storage space
  "numExtents": 42,
  "nindexes": 4,
  "lastExtentSize": 2146426864,
  # paddingFactor: when an update makes a document larger, padding reserved in advance lets it grow in place, which reduces fragmentation; a value of 1 means no extra padding.
  "paddingFactor": 1,
  "systemFlags": 1,
  "userFlags": 0,
  "totalIndexSize": 31897944512,
  "indexSizes": {
    "_id_": 6722168208,
    "action_1_time_1": 8606482752,
    "gz_id_1_action_1_time_1": 10753778336,
    "time_1": 5815515216
  },
  "ok": 1
}
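A similar check works for the collection statistics: totalIndexSize should equal the sum of indexSizes, and avgObjSize should equal size divided by count. Values are copied from the output above; with pymongo, `db.command("collstats", "accesslog")` returns a comparable document. The function name is hypothetical.

```python
# Values copied from the db.accesslog.stats() output above.
coll_stats = {
    "count": 145352932,
    "size": 37060264352,
    "avgObjSize": 254.967435758365,
    "storageSize": 45794676448,
    "totalIndexSize": 31897944512,
    "indexSizes": {
        "_id_": 6722168208,
        "action_1_time_1": 8606482752,
        "gz_id_1_action_1_time_1": 10753778336,
        "time_1": 5815515216,
    },
}


def check_coll_stats(s: dict) -> dict:
    index_total = sum(s["indexSizes"].values())
    return {
        # totalIndexSize is the sum of the per-index sizes.
        "index_sizes_match": index_total == s["totalIndexSize"],
        # avgObjSize is size divided by the document count.
        "avg_obj_size": s["size"] / s["count"],
        # Allocated extent space not holding document data.
        "unused_storage": s["storageSize"] - s["size"],
    }


result = check_coll_stats(coll_stats)
print(result)
```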

© 2024 shulou.com SLNews company. All rights reserved.
