2025-01-22 Update From: SLTechnology News&Howtos
Background
MapReduce is a flexible and powerful data aggregation tool. Its advantage is that an aggregation task can be split into many small tasks, which are distributed to multiple servers and processed in parallel.
MongoDB also provides MapReduce; the map and reduce functions must be written in JavaScript. MapReduce in MongoDB has the following phases:
1. Map: apply an operation to every document in the collection.
2. Shuffle: group the documents by Key and generate, for each distinct Key, a list of values with one or more elements.
3. Reduce: process the elements in a list of values until only one element remains. The result is then returned to the Shuffle step, and the loop repeats until each Key has exactly one list of values containing exactly one element. That element is the MR result.
4. Finalize: this step is optional. After the final MR result is obtained, it can be used to do some final "trimming" of the data.
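The four phases above can be sketched in plain JavaScript, without a database. This is only an in-memory illustration, not how MongoDB is implemented: runMapReduce, the module-level emit variable, and the sample documents are all hypothetical names introduced here.

```javascript
// In-memory sketch of the four phases. NOT MongoDB's implementation.
let emit; // stand-in for MongoDB's global emit(); set by runMapReduce below

function runMapReduce(docs, map, reduce, finalize) {
  // 1. Map: call map once per document, with `this` bound to that document.
  const pairs = [];
  emit = (key, value) => pairs.push({ key, value });
  docs.forEach((doc) => map.call(doc));

  // 2. Shuffle: group the emitted values by key.
  const groups = new Map();
  for (const { key, value } of pairs) {
    const id = JSON.stringify(key); // lets compound (object) keys be grouped too
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  }

  // 3. Reduce: collapse each list of values to one value.
  //    A key with a single value skips reduce entirely.
  // 4. Finalize (optional): post-process the reduced value.
  const results = [];
  for (const { key, values } of groups.values()) {
    let value = values.length === 1 ? values[0] : reduce(key, values);
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value });
  }
  return results;
}

// Hypothetical sample documents with user, sku and price fields.
const sampleDocs = [
  { user: "Joe", sku: 0, price: 2.5 },
  { user: "Joe", sku: 1, price: 4 },
  { user: "Ken", sku: 2, price: 3 },
];

const countMap = function () {
  emit(this.user, { count: 1 });
};
const countReduce = function (key, values) {
  const res = { count: 0 };
  values.forEach((v) => { res.count += v.count; });
  return res;
};

const counts = runMapReduce(sampleDocs, countMap, countReduce);
console.log(counts);
```

With these sample documents the sketch yields a count of 2 for Joe and 1 for Ken. Ken's single emitted value never passes through countReduce, which is one reason the reduced value must have the same shape as the emitted one.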
MongoDB uses the emit function to provide Key/Value pairs to MapReduce.
The Reduce function takes two parameters: Key and emits. Key is the Key passed to the emit function. emits is an array whose elements are the Values passed to emit for that Key.
The return value of the Reduce function may be fed back into Reduce, so it must have the same structure as the elements of emits.
In the Map function, the this keyword refers to the document currently being mapped.
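The structure requirement on the Reduce return value can be checked by hand. The sketch below is plain JavaScript with hypothetical sample values: reducing everything in one pass and reducing a partial batch first should give the same answer. brokenReduce is a deliberately wrong variant added here for contrast.

```javascript
// A reduce whose return value has the same shape as the emitted values.
function reduce(key, values) {
  const res = { amount: 0, count: 0 };
  values.forEach((val) => {
    res.amount += val.amount;
    res.count += val.count;
  });
  return res;
}

// Hypothetical values emitted for one key.
const emitted = [
  { amount: 3, count: 1 },
  { amount: 5, count: 1 },
  { amount: 4, count: 1 },
];

// Reduce everything in one pass...
const onePass = reduce("Joe", emitted);
// ...or reduce a partial batch first, then feed that result back in.
const partial = reduce("Joe", emitted.slice(0, 2));
const twoPass = reduce("Joe", [partial, emitted[2]]);

// Broken variant: it counts array elements instead of summing val.count,
// so an already-reduced partial result is counted as a single purchase.
function brokenReduce(key, values) {
  return { count: values.length };
}
const brokenOnePass = brokenReduce("Joe", emitted);
const brokenPartial = brokenReduce("Joe", emitted.slice(0, 2));
const brokenTwoPass = brokenReduce("Joe", [brokenPartial, emitted[2]]);

console.log(onePass, twoPass);             // same totals either way
console.log(brokenOnePass, brokenTwoPass); // counts disagree
```

Here onePass and twoPass agree (amount 12, count 3), while the broken variant reports 3 in one pass and 2 when a partial result is re-reduced.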
Example
Test data: this collection records the products purchased by three users and their prices.
Code
    for (var i = 0; i ...

    > db.mr2.find()
    { "_id" : { "user" : "Joe", "sku" : 0 }, "value" : { "count" : 103 } }
    { "_id" : { "user" : "Joe", "sku" : 1 }, "value" : { "count" : 106 } }
    { "_id" : { "user" : "Joe", "sku" : 2 }, "value" : { "count" : 102 } }
    { "_id" : { "user" : "Joe", "sku" : 3 }, "value" : { "count" : 105 } }
    { "_id" : { "user" : "Josh", "sku" : 4 }, "value" : { "count" : 87 } }
    { "_id" : { "user" : "Josh", "sku" : 5 }, "value" : { "count" : 107 } }
    { "_id" : { "user" : "Josh", "sku" : 6 }, "value" : { "count" : 93 } }
    { "_id" : { "user" : "Ken", "sku" : 7 }, "value" : { "count" : 98 } }
    { "_id" : { "user" : "Ken", "sku" : 8 }, "value" : { "count" : 83 } }
    { "_id" : { "user" : "Ken", "sku" : 9 }, "value" : { "count" : 116 } }
3. How many products did each user purchase, and what was the total amount? (Reduce result with a compound value)
Code
    // SQL implementation
    select user, count(sku), sum(price) from test group by user

    // MapReduce implementation
    map = function () {
        emit(this.user, { amount: this.price, count: 1 })
    }
    reduce = function (key, values) {
        var res = { amount: 0, count: 0 }
        values.forEach(function (val) {
            res.amount += val.amount;
            res.count += val.count
        });
        return res
    }
    db.test.mapReduce(map, reduce, { out: "mr3" })

    > db.mr3.find()
    { "_id" : "Joe", "value" : { "amount" : 2053.8899999999994, "count" : 395 } }
    { "_id" : "Josh", "value" : { "amount" : 1409.260000002, "count" : 292 } }
    { "_id" : "Ken", "value" : { "amount" : 1547.77000000002, "count" : 313 } }
4. The amounts returned in 3 should be rounded to two decimal places, and the average product price should also be calculated. (Use Finalize to post-process the reduce result set)
Code
    // SQL implementation
    select user,
           cast(sum(price) as decimal(10,2)) as amount,
           count(sku) as [count],
           cast((sum(price) / count(sku)) as decimal(10,2)) as avgPrice
    from test group by user

    // MapReduce implementation
    map = function () {
        emit(this.user, { amount: this.price, count: 1, avgPrice: 0 })
    }
    reduce = function (key, values) {
        var res = { amount: 0, count: 0, avgPrice: 0 }
        values.forEach(function (val) {
            res.amount += val.amount;
            res.count += val.count
        });
        return res;
    }
    finalizeFun = function (key, reduceResult) {
        // toFixed returns a string; the division below converts it back to a number
        reduceResult.amount = (reduceResult.amount).toFixed(2);
        reduceResult.avgPrice = (reduceResult.amount / reduceResult.count).toFixed(2);
        return reduceResult
    }
    db.test.mapReduce(map, reduce, { out: "mr4", finalize: finalizeFun })

    > db.mr4.find()
    { "_id" : "Joe", "value" : { "amount" : "2053.89", "count" : 395, "avgPrice" : "5.20" } }
    { "_id" : "Josh", "value" : { "amount" : "1409.26", "count" : 292, "avgPrice" : "4.83" } }
    { "_id" : "Ken", "value" : { "amount" : "1547.77", "count" : 313, "avgPrice" : "4.94" } }
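One detail of the result above is that amount and avgPrice come back as strings, because toFixed returns a string. If numeric values are preferred, a finalize function could round with Math.round instead. The sketch below is one possible variant, not the author's method; finalizeAsNumbers is a hypothetical name, and the input is Joe's reduce result from example 3.

```javascript
// Hypothetical alternative finalize that keeps the results numeric.
function finalizeAsNumbers(key, reduceResult) {
  const avg = reduceResult.amount / reduceResult.count;
  return {
    amount: Math.round(reduceResult.amount * 100) / 100,
    count: reduceResult.count,
    avgPrice: Math.round(avg * 100) / 100,
  };
}

// Joe's reduce result as printed in example 3.
const reduced = { amount: 2053.8899999999994, count: 395, avgPrice: 0 };

const asString = reduced.amount.toFixed(2);            // a string
const asNumbers = finalizeAsNumbers("Joe", reduced);   // numbers throughout

console.log(typeof asString, asString);
console.log(asNumbers);
```

Math.round and toFixed can disagree in rare edge cases near a rounding boundary, so this is a trade-off: numbers stay sortable and summable, while toFixed guarantees exactly two printed decimals.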
5. For SKUs with a unit price greater than 6, count the number of purchases per user. (Filter a subset of the data for MR)
This is relatively simple: add a filter query when calling MR, and everything else stays the same.
Code
    db.test.mapReduce(map, reduce, { query: { price: { "$gt": 6 } }, out: "mr5" })
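The effect of the query option is that only matching documents ever reach the map function. A plain JavaScript sketch of that idea, using hypothetical sample documents and an ordinary array filter in place of the MongoDB query:

```javascript
// Hypothetical sample documents with user, sku and price fields.
const docs = [
  { user: "Joe", sku: 0, price: 7.5 },
  { user: "Joe", sku: 1, price: 2.0 },
  { user: "Josh", sku: 2, price: 3.0 },
  { user: "Ken", sku: 3, price: 9.0 },
  { user: "Ken", sku: 4, price: 6.0 },
];

// In-memory equivalent of query: { price: { "$gt": 6 } }.
// $gt is strictly greater than, so a price of exactly 6 is excluded.
const matched = docs.filter((doc) => doc.price > 6);

// Count the matching purchases per user, as map and reduce would.
const perUser = {};
for (const doc of matched) {
  perUser[doc.user] = (perUser[doc.user] || 0) + 1;
}

console.log(perUser);
```

Josh has no purchase above 6 in this sample, so no entry is produced for him at all, rather than an entry with a count of 0.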
Summary
The MR tool in MongoDB is very powerful, and the examples in this article are only basic ones. It shows its real strength when combined with Sharding, where multiple servers process the data in parallel.
If I have time later, I hope to summarize and share more about MongoDB and SQL Server.