2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Today I will talk about the bucket pattern in MongoDB dataset design. Many people may not know much about it, so to help you understand it better, I have summarized the following; I hope you get something out of this article.
For dataset design patterns, MongoDB provides detailed references in the use-cases section of the official documentation: https://docs.mongodb.com/ecosystem/.
The bucket pattern is one such paradigm for MongoDB dataset design.
Design principles of the bucket pattern
The so-called bucket optimization means that instead of creating a document for every piece of data, we aggregate the measurements from a certain time period into a single document, making use of the embedded arrays and subdocuments that MongoDB provides.
Much sensor data is time-series data. For example, wind sensors, tide monitoring, and location tracking all collect data of the same shape: a timestamp, a collector name/ID, and a collected value. For time-series data we can adopt an optimization strategy called the time bucket.
Time series data
Simply put, a time series is the sequence of values formed at successive points in time, and time-series analysis predicts future values by observing historical data. When a dataset is written with a bucket design, the elements typically use time as the sorting key and are written and read in order.
There is an official translated article devoted to the bucket design pattern.
Usage scenario
The basic dataset looks like this:
{
    sensor_id: 12345,
    timestamp: ISODate("2019-01-31T10:00:00.000Z"),
    temperature: 40
}
{
    sensor_id: 12345,
    timestamp: ISODate("2019-01-31T10:01:00.000Z"),
    temperature: 40
}
{
    sensor_id: 12345,
    timestamp: ISODate("2019-01-31T10:02:00.000Z"),
    temperature: 41
}
The improved document looks like this:
{
    sensor_id: 12345,
    start_date: ISODate("2019-01-31T10:00:00.000Z"),
    end_date: ISODate("2019-01-31T10:59:59.000Z"),
    measurements: [
        {
            timestamp: ISODate("2019-01-31T10:00:00.000Z"),
            temperature: 40
        },
        {
            timestamp: ISODate("2019-01-31T10:01:00.000Z"),
            temperature: 40
        },
        ...
        {
            timestamp: ISODate("2019-01-31T10:42:00.000Z"),
            temperature: 42
        }
    ],
    transaction_count: 42,
    sum_temperature: 2413
}
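To make the write path concrete, here is a minimal sketch of how an application could build the filter/update pair for an hourly bucket upsert. The function name and the collection name in the comment are hypothetical; the field names follow the sample document above, and the actual write would use pymongo's `update_one(..., upsert=True)`.

```python
from datetime import datetime, timedelta

def bucket_upsert_spec(sensor_id, ts, temperature):
    """Build the filter/update documents for an hourly time-bucket upsert.

    One document per sensor per hour: each measurement is $push-ed into
    the embedded array, while $inc keeps the pre-aggregated totals fresh
    and $setOnInsert stamps the bucket boundary on first creation.
    """
    start = ts.replace(minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=1) - timedelta(seconds=1)
    filter_doc = {"sensor_id": sensor_id, "start_date": start}
    update_doc = {
        "$push": {"measurements": {"timestamp": ts, "temperature": temperature}},
        "$inc": {"transaction_count": 1, "sum_temperature": temperature},
        "$setOnInsert": {"end_date": end},
    }
    return filter_doc, update_doc

# With pymongo (collection name "sensor_buckets" is an assumption):
# db.sensor_buckets.update_one(filter_doc, update_doc, upsert=True)
```

Because the filter keys on `(sensor_id, start_date)`, all measurements from the same hour land in the same document, and the upsert creates a fresh bucket automatically when a new hour begins.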
When the program writes documents, we can do some simple calculation and arrangement: according to time segments and business needs, the many documents within a time window are merged, which avoids scattered aggregation and queries later when the data is used. Such a time window can be understood as a bucket.
When processing time-series data, knowing the average temperature between 2:00 and 3:00 in Corning, California, on July 13, 2018 is usually more meaningful and important than knowing the temperature at 2:03. By organizing the data into buckets and pre-aggregating it, we can provide this information much more easily.
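The pre-aggregated fields make such questions cheap to answer: the hourly average falls out of two scalar fields, with no need to scan the embedded `measurements` array. A minimal sketch (the function name is mine, the field names come from the sample bucket above):

```python
def bucket_average(bucket_doc):
    """Average temperature for one hourly bucket, computed from the
    pre-aggregated running totals rather than the raw measurements."""
    return bucket_doc["sum_temperature"] / bucket_doc["transaction_count"]

# Usage: fetch one bucket by (sensor_id, start_date) and divide,
# instead of aggregating 60 per-minute documents at query time.
sample = {"sum_temperature": 120, "transaction_count": 3}
print(bucket_average(sample))  # 40.0
```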
Officially, there is a recommended article on an IoT usage scenario, https://www.mongodb.com/customers/bosch, which can be used as a reference.
Bucketing comments in the design pattern
The comment-bucketing scenario is described under the Hybrid Schema Design node at https://docs.mongodb.com/ecosystem/use-cases/storing-comments/.
First, let's look at the dataset schema.
{
    _id: ObjectId(...),
    discussion_id: ObjectId(...),
    bucket: 1,
    count: 42,
    comments: [
        {
            slug: '34db',
            posted: ISODateTime(...),
            author: {id: ObjectId(...), name: 'Rick'},
            text: 'This is so bogus ...'
        },
        ...
    ]
}
In my article on dataset design, I mentioned the bucket-pattern design scenario, which is mainly used for preprocessing and block storage of time series. A time series is sorted by time and written in order. The chunking criterion can be a time window, such as a day or an hour, or a count, such as the number of comments.
As the documentation puts it, 100 comments is a soft limit for the number of comments per bucket. This value is arbitrary: choose a value that will prevent the maximum document size from growing beyond the 16MB BSON document size limit.
In other words, the number of elements per bucket is not fixed; it is a measure chosen after evaluating the actual workload during application development. However, you must keep MongoDB's 16MB-per-document limit in mind.
For an application, this design pattern requires some simple logic to decide which bucket to write to, along with a simple calculation, as follows:
if bucket['count'] > 100:
    db.discussion.update(
        {'discussion_id': discussion['_id'],
         'num_buckets': discussion['num_buckets']},
        {'$inc': {'num_buckets': 1}})
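The write side of that logic can be sketched the same way as the sensor example: append the comment to the newest bucket and bump its counter in one atomic update, letting the upsert create the bucket the first time it is addressed. The function name is mine; the field names follow the comment-bucket schema above.

```python
def comment_push_spec(discussion_id, num_buckets, comment):
    """Build the filter/update pair that appends a comment to the
    newest bucket of a discussion.

    The filter keys on (discussion_id, bucket); with upsert=True the
    bucket document is created on first use.  $inc keeps the per-bucket
    'count' that the rollover check above compares against 100.
    """
    filter_doc = {"discussion_id": discussion_id, "bucket": num_buckets}
    update_doc = {"$push": {"comments": comment}, "$inc": {"count": 1}}
    return filter_doc, update_doc

# With pymongo (collection name "comment_buckets" is an assumption):
# db.comment_buckets.update_one(filter_doc, update_doc, upsert=True)
```

When the bucket's `count` crosses the soft limit, the `$inc` on `num_buckets` shown above points subsequent writes at a fresh bucket.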
A slide deck from the 2019 MongoDB China User Conference gives an even clearer picture of the bucket paradigm.
The views in this article are not rigorous, and comments and discussion are welcome. Learn, practice, consult references, improve, and grow through solving problems.
After reading the above, do you have a better understanding of the bucket pattern in MongoDB dataset design? If you want to learn more, please follow the industry information channel; thank you for your support.