

Grouping and taking the latest record per group with MongoDB aggregation: a case and implementation

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

Preface

Today a developer brought us an urgent requirement: from the collection mt_resources_access_log, group by the field refererDomain, take the most recently inserted record in each group, and then import the qualifying data into the collection mt_resources_access_log_new.

Receiving this request made me a little apprehensive, for two reasons: first, it was a business need with a tight deadline; second, implementing it with the MongoDB aggregation framework felt somewhat complicated, requiring several stages.

The data record format is as follows:

Record 1
{ "_id": ObjectId("5c1e23eaa66bf62c0c390afb"), "_class": "C1", "resourceUrl": "/static/js/p.js", "refererDomain": "1234", "resourceType": "static_resource", "ip": "17.17.13.13", "createTime": ISODate("2018-12-22T19:45:46.015+08:00"), "disabled": 0 }

Record 2
{ "_id": ObjectId("5c1e23eaa66bf62c0c390afb"), "_class": "C1", "resourceUrl": "/static/js/p.js", "refererDomain": "1234", "resourceType": "Dome_resource", "ip": "17.17.13.14", "createTime": ISODate("2018-12-21T19:45:46.015+08:00"), "disabled": 0 }

Record 3
{ "_id": ObjectId("5c1e23eaa66bf62c0c390afb"), "_class": "C2", "resourceUrl": "/static/js/p.js", "refererDomain": "1235", "resourceType": "static_resource", "ip": "17.17.13.13", "createTime": ISODate("2018-12-20T19:45:46.015+08:00"), "disabled": 0 }

Record 4
{ "_id": ObjectId("5c1e23eaa66bf62c0c390afb"), "_class": "C2", "resourceUrl": "/static/js/p.js", "refererDomain": "1235", "resourceType": "Dome_resource", "ip": "17.17.13.13", "createTime": ISODate("2018-12-20T19:45:46.015+08:00"), "disabled": 0 }

The above are 4 sample records; the collection holds about 15 million similar documents.

Because the business side needed this data urgently, and I had no immediate idea how to do it with the aggregation framework alone, I looked for an alternative approach.

The solution I arrived at is as follows.

Step 1: use the aggregation framework to group by refererDomain, and output the newly generated data to the collection mt_resources_access_log20190122 (95 documents in total).

The implementation code is as follows:

db.log_resources_access_collect.aggregate([
  { $group: { _id: "$refererDomain" } },
  { $out: "mt_resources_access_log20190122" }
])

Step 2: loop over mt_resources_access_log20190122 and mt_resources_access_log with two nested forEach operations.

How the code works: the documents of mt_resources_access_log20190122 are read one by one (95 in total). For each one, its _id field (which holds the refererDomain value produced by the Step 1 grouping) is matched against the refererDomain field of mt_resources_access_log; the matching documents are sorted by _id in descending order and only the first one is kept. Finally, each selected document is inserted into the collection mt_resources_access_log_new.

The new collection likewise ends up with 95 documents.

Performance was not a problem: the query completed in about 1 second.

db.mt_resources_access_log20190122.find({}).forEach(function (x) {
  db.mt_resources_access_log.find({ "refererDomain": x._id }).sort({ _id: -1 }).limit(1).forEach(function (y) {
    db.mt_resources_access_log_new.insert(y)
  })
})
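The two nested forEach loops above amount to a group-by / take-latest pass. A minimal sketch of the same logic in plain JavaScript, using hypothetical in-memory records in place of a live collection (the _id strings below are made up for illustration):

```javascript
// In-memory stand-in for the two-step job: group log records by refererDomain
// and keep only the most recently inserted record per group. Hex ObjectId
// strings of equal length compare chronologically as plain strings, because
// their leading bytes are a big-endian timestamp.
function latestPerDomain(records) {
  const latest = {};
  for (const r of records) {
    const cur = latest[r.refererDomain];
    if (cur === undefined || r._id > cur._id) {
      latest[r.refererDomain] = r;
    }
  }
  return Object.values(latest);
}

// Illustrative records (hypothetical _id values, newest first for "1234").
const logs = [
  { _id: "5c1e23eaa66bf62c0c390af1", refererDomain: "1234", ip: "17.17.13.13" },
  { _id: "5c1c80c6a66bf62c0c390af2", refererDomain: "1234", ip: "17.17.13.14" },
  { _id: "5c1b2f42a66bf62c0c390af3", refererDomain: "1235", ip: "17.17.13.13" },
];

// One record per domain: the newest one for "1234", the only one for "1235".
const result = latestPerDomain(logs);
```

The string comparison on _id mirrors the shell's sort({_id: -1}).limit(1): both rely on ObjectIds being monotonically increasing over insertion time.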

Step 3: query and validate the newly generated collection mt_resources_access_log_new; it meets the business requirements.

Before the run, the collection mt_resources_access_log contained more than 15 million documents.

After the run, the newly generated collection mt_resources_access_log_new contains 95 documents.

Note: the requirement called for sorting by time, but some documents lack a createTime field and no index had been created on createTime, so we fell back to sorting by _id (sort({_id: -1})) as a workaround, since _id still carries time information. The following background on MongoDB's _id explains why.
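For reference, on MongoDB versions that support $replaceRoot (3.4 and later), the same result can usually be produced in a single pipeline by sorting before grouping and using the $first accumulator. This is a sketch against the collections named above, not the author's original method; a sort over 15 million documents may need allowDiskUse:

```javascript
// Sort newest-first by _id, then keep the first (i.e. latest) whole document
// per refererDomain, and write the results out in one pass.
db.mt_resources_access_log.aggregate([
  { $sort: { _id: -1 } },
  { $group: { _id: "$refererDomain", doc: { $first: "$$ROOT" } } },
  { $replaceRoot: { newRoot: "$doc" } },
  { $out: "mt_resources_access_log_new" }
], { allowDiskUse: true })
```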

Most importantly, the first 4 bytes contain a standard Unix timestamp. The next three bytes are the machine ID, followed by a 2-byte process ID; the last three bytes hold a per-process counter, which guarantees that IDs generated by the same process in the same second do not collide.
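The first 4 bytes (8 hex characters) of an ObjectId can therefore be decoded directly into a date, which is why sorting by _id approximates sorting by insertion time. A small sketch, using the _id from record 1 above:

```javascript
// The leading 8 hex characters of an ObjectId are a big-endian Unix
// timestamp in seconds, so a creation date can be recovered from the string.
function objectIdToDate(oidHex) {
  const seconds = parseInt(oidHex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Using the _id from record 1:
const created = objectIdToDate("5c1e23eaa66bf62c0c390afb");
// created.toISOString() === "2018-12-22T11:45:46.000Z"
// i.e. 2018-12-22T19:45:46 at UTC+08:00, matching that record's createTime.
```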

Summary

That is the whole content of this article. I hope it offers some reference and learning value for your study or work. If you have any questions, feel free to leave a comment. Thank you for your support.
