2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Aggregate
Aggregation in MongoDB is mainly used to process data (for example, computing averages and sums) and return the computed results, similar to aggregate functions such as count(*) in SQL statements.
The syntax is as follows:
db.collection.aggregate()
db.collection.aggregate(pipeline, options)
db.runCommand({
  aggregate: "<collection>",
  pipeline: [ <stage>, <...> ],
  explain: <boolean>,
  allowDiskUse: <boolean>,
  cursor: <document>
})
Before using aggregate to implement aggregation operations, let's first take a look at a few common pipeline stages.
$project: renames keys in the result set, controls whether keys are displayed, and computes new columns.
$match: filters the result set and outputs only documents that meet the criteria.
$skip: skips the first n documents and returns the rest.
$sort: sorts the result set.
$limit: limits the size of the result set.
$unwind: splits an array field in a document into multiple documents, each containing one value from the array.
$geoNear: outputs documents ordered by proximity to a geographic location.
$group: groups documents and computes aggregates such as sum, average, maximum, minimum, first, and last.
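To make the $unwind description concrete, here is a plain JavaScript sketch (not MongoDB code) of what the stage does to an array field, as far as its default behaviour goes. The `unwind` helper and the `posts` documents are invented for illustration only.

```javascript
// Plain JavaScript illustration of $unwind's default behaviour; not MongoDB code.
// `unwind` and the sample documents are hypothetical names for this sketch.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Documents where the field is missing or null produce no output.
    if (value === undefined || value === null) continue;
    // A non-array value is treated like a one-element array.
    const items = Array.isArray(value) ? value : [value];
    // One output document per array element; an empty array yields none.
    for (const item of items) {
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

const posts = [
  { _id: 1, title: "intro", tags: ["mongodb", "nosql"] },
  { _id: 2, title: "empty", tags: [] },
];

const unwound = unwind(posts, "tags");
console.log(unwound);
```

Each output document keeps the other fields of its source document, which is why $unwind is often followed by $group on the unwound field.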
Expressions available in $group, each with a description and an example:
$sum: calculates the sum.
    db.mycol.aggregate([{$group: {_id: "$by_user", num_tutorial: {$sum: "$likes"}}}])
$avg: calculates the average.
    db.mycol.aggregate([{$group: {_id: "$by_user", num_tutorial: {$avg: "$likes"}}}])
$min: gets the minimum of the corresponding values across the documents in the group.
    db.mycol.aggregate([{$group: {_id: "$by_user", num_tutorial: {$min: "$likes"}}}])
$max: gets the maximum of the corresponding values across the documents in the group.
    db.mycol.aggregate([{$group: {_id: "$by_user", num_tutorial: {$max: "$likes"}}}])
$push: inserts a value into an array in the result document.
    db.mycol.aggregate([{$group: {_id: "$by_user", url: {$push: "$url"}}}])
$addToSet: inserts a value into an array in the result document, but does not create duplicates.
    db.mycol.aggregate([{$group: {_id: "$by_user", url: {$addToSet: "$url"}}}])
$first: gets the value from the first document, according to the order of the source documents.
    db.mycol.aggregate([{$group: {_id: "$by_user", first_url: {$first: "$url"}}}])
$last: gets the value from the last document, according to the order of the source documents.
    db.mycol.aggregate([{$group: {_id: "$by_user", last_url: {$last: "$url"}}}])
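The difference between these accumulators is easier to see side by side. The following is a plain JavaScript sketch, not MongoDB code: the `groupByUser` helper and the three documents are invented, and only the field names (`by_user`, `likes`, `url`) come from the examples above.

```javascript
// Plain JavaScript sketch of a few $group accumulators; not MongoDB code.
// The documents are invented; only the field names come from the examples above.
const mycol = [
  { by_user: "alice", likes: 10, url: "http://example.com/a" },
  { by_user: "alice", likes: 30, url: "http://example.com/a" },
  { by_user: "bob", likes: 5, url: "http://example.com/b" },
];

function groupByUser(docs) {
  // Collect the raw values per user first.
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.by_user)) {
      groups.set(doc.by_user, { likes: [], urls: [] });
    }
    const g = groups.get(doc.by_user);
    g.likes.push(doc.likes);
    g.urls.push(doc.url);
  }
  // Then derive one result document per user.
  const result = [];
  for (const [user, g] of groups) {
    const sum = g.likes.reduce((a, b) => a + b, 0);
    result.push({
      _id: user,
      sum,                               // $sum
      avg: sum / g.likes.length,         // $avg
      min: Math.min(...g.likes),         // $min
      max: Math.max(...g.likes),         // $max
      push: g.urls,                      // $push keeps duplicates
      addToSet: [...new Set(g.urls)],    // $addToSet drops duplicates
      first: g.urls[0],                  // $first
      last: g.urls[g.urls.length - 1],   // $last
    });
  }
  return result;
}

const byUser = groupByUser(mycol);
console.log(byUser);
```

For "alice" the `push` array holds the same URL twice while `addToSet` holds it once, which is the practical difference between the two.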
Example:
db.createCollection("emp")
db.emp.insert({_id: 1, "ename": "tom", "age": 25, "department": "Sales", "salary": 6000})
db.emp.insert({_id: 2, "ename": "eric", "age": 24, "department": "HR", "salary": 4500})
db.emp.insert({_id: 3, "ename": "robin", "age": 30, "department": "Sales", "salary": 8000})
db.emp.insert({_id: 4, "ename": "jack", "age": 28, "department": "Development", "salary": 8000})
db.emp.insert({_id: 5, "ename": "Mark", "age": 22, "department": "Development", "salary": 6500})
db.emp.insert({_id: 6, "ename": "marry", "age": 23, "department": "Planning", "salary": 5000})
db.emp.insert({_id: 7, "ename": "hellen", "age": 32, "department": "HR", "salary": 6000})
db.emp.insert({_id: 8, "ename": "sarah", "age": 24, "department": "Development", "salary": 7000})
Count the employees in each department:
> use company
switched to db company
> db.emp.aggregate(
... {$group: {_id: "$department", dpct: {$sum: 1}}}
... )
{ "_id" : "Development", "dpct" : 3 }
{ "_id" : "HR", "dpct" : 2 }
{ "_id" : "Planning", "dpct" : 1 }
{ "_id" : "Sales", "dpct" : 2 }

Total and average salary of each department:
> db.emp.aggregate(
... {$group: {_id: "$department", salct: {$sum: "$salary"}, salavg: {$avg: "$salary"}}}
... )
{ "_id" : "Development", "salct" : 21500, "salavg" : 7166.666666666667 }
{ "_id" : "HR", "salct" : 10500, "salavg" : 5250 }
{ "_id" : "Planning", "salct" : 5000, "salavg" : 5000 }
{ "_id" : "Sales", "salct" : 14000, "salavg" : 7000 }

Employees younger than 25:
> db.emp.aggregate(
... {$match: {age: {$lt: 25}}}
... )
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 8, "ename" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }

Filter first (age greater than 25), then group:
> db.emp.aggregate(
... {$match: {age: {$gt: 25}}},
... {$group: {_id: "$department", salct: {$sum: "$salary"}, salavg: {$avg: "$salary"}}}
... )
{ "_id" : "HR", "salct" : 6000, "salavg" : 6000 }
{ "_id" : "Development", "salct" : 8000, "salavg" : 8000 }
{ "_id" : "Sales", "salct" : 8000, "salavg" : 8000 }

Group first, then filter the grouped results (average salary greater than 6000):
> db.emp.aggregate(
... {$group: {_id: "$department", salct: {$sum: "$salary"}, salavg: {$avg: "$salary"}}},
... {$match: {salavg: {$gt: 6000}}}
... )
{ "_id" : "Development", "salct" : 21500, "salavg" : 7166.666666666667 }
{ "_id" : "Sales", "salct" : 14000, "salavg" : 7000 }

The three youngest employees:
> db.emp.aggregate(
... {$sort: {age: 1}}, {$limit: 3}
... )
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }

The three oldest employees:
> db.emp.aggregate({$sort: {age: -1}}, {$limit: 3})
{ "_id" : 7, "ename" : "hellen", "age" : 32, "department" : "HR", "salary" : 6000 }
{ "_id" : 3, "ename" : "robin", "age" : 30, "department" : "Sales", "salary" : 8000 }
{ "_id" : 4, "ename" : "jack", "age" : 28, "department" : "Development", "salary" : 8000 }

Sort by age in descending order and skip the first four documents:
> db.emp.aggregate({$sort: {age: -1}}, {$skip: 4})
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
{ "_id" : 8, "ename" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }

Rename ename to name and hide _id:
> db.emp.aggregate({$project: {"name": "$ename", "age": "$age", "department": "$department", "salary": "$salary", _id: 0}})
{ "name" : "tom", "age" : 25, "department" : "Sales", "salary" : 6000 }
{ "name" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
{ "name" : "robin", "age" : 30, "department" : "Sales", "salary" : 8000 }
{ "name" : "jack", "age" : 28, "department" : "Development", "salary" : 8000 }
{ "name" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "name" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "name" : "hellen", "age" : 32, "department" : "HR", "salary" : 6000 }
{ "name" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }

Project, then keep only salaries greater than 6000:
> db.emp.aggregate({$project: {"name": "$ename", "age": "$age", "department": "$department", "salary": "$salary", _id: 0}}, {$match: {"salary": {$gt: 6000}}})
{ "name" : "robin", "age" : 30, "department" : "Sales", "salary" : 8000 }
{ "name" : "jack", "age" : 28, "department" : "Development", "salary" : 8000 }
{ "name" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "name" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }
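The position of $match relative to $group changes what gets filtered: before $group it filters employees, after $group it filters the per-department results. The plain JavaScript sketch below (not MongoDB code) mimics the two orderings on the same emp data; `groupByDepartment` is a hypothetical stand-in for the $group stage used above.

```javascript
// Plain JavaScript sketch of stage ordering; not MongoDB code.
// `emp` mirrors the documents inserted above; the helper is hypothetical.
const emp = [
  { _id: 1, ename: "tom", age: 25, department: "Sales", salary: 6000 },
  { _id: 2, ename: "eric", age: 24, department: "HR", salary: 4500 },
  { _id: 3, ename: "robin", age: 30, department: "Sales", salary: 8000 },
  { _id: 4, ename: "jack", age: 28, department: "Development", salary: 8000 },
  { _id: 5, ename: "Mark", age: 22, department: "Development", salary: 6500 },
  { _id: 6, ename: "marry", age: 23, department: "Planning", salary: 5000 },
  { _id: 7, ename: "hellen", age: 32, department: "HR", salary: 6000 },
  { _id: 8, ename: "sarah", age: 24, department: "Development", salary: 7000 },
];

// Stand-in for:
// {$group: {_id: "$department", salct: {$sum: "$salary"}, salavg: {$avg: "$salary"}}}
function groupByDepartment(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const g = groups.get(doc.department) || { salct: 0, n: 0 };
    g.salct += doc.salary;
    g.n += 1;
    groups.set(doc.department, g);
  }
  return [...groups].map(([dept, g]) => ({
    _id: dept,
    salct: g.salct,
    salavg: g.salct / g.n,
  }));
}

// $match first: filter employees, then group the survivors.
const matchThenGroup = groupByDepartment(emp.filter((d) => d.age > 25));

// $match last: group everyone, then filter the per-department results.
const groupThenMatch = groupByDepartment(emp).filter((g) => g.salavg > 6000);

console.log(matchThenGroup);
console.log(groupThenMatch);
```

The first result has three departments with one employee each; the second keeps only Development and Sales, computed over all of their employees.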
Map Reduce
Map-Reduce is a computing model that breaks a large job (data) into pieces (MAP) and then merges the partial results into the final result (REDUCE).
The Map-Reduce provided by MongoDB is very flexible and useful for large-scale data analysis.
The following is the basic syntax of MapReduce:
> db.collection.mapReduce(
   function() { emit(key, value); },  // map function
   function(key, values) { return reduceFunction },  // reduce function
   {
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)
Using MapReduce means implementing two functions, the Map function and the Reduce function. The Map function is invoked for each record in the collection and calls emit(key, value); the emitted keys and values are then passed to the Reduce function for processing.
The Map function must call emit(key, value) to return a key-value pair.
Parameter description:
map: the mapping function (generates a sequence of key-value pairs that become the arguments of the reduce function).
reduce: the statistics function. Its task is to turn key-values into key-value, that is, to reduce the values array to a single value.
out: the collection that stores the results (if not specified, a temporary collection is used, which is automatically deleted when the client disconnects).
query: a filter; only documents that meet the criteria are passed to the map function. (query, limit, and sort can be combined at will.)
sort: a sort parameter used together with limit (documents are sorted before being sent to the map function), which can optimize the grouping mechanism.
limit: the upper limit on the number of documents sent to the map function (without limit, using sort alone is of little use).
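The flow described by these parameters can be pictured with a small in-memory emulation. This is a plain JavaScript sketch, not MongoDB's implementation: the `mapReduce` helper and the `staff` array are hypothetical, and unlike the shell, `emit` is passed to the map function as an argument instead of being a global.

```javascript
// Plain JavaScript emulation of the map/reduce flow; not MongoDB's implementation.
// Unlike the shell, `emit` is passed to map as an argument here.
function mapReduce(docs, map, reduce) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Map phase: call map once per document, with `this` bound to the document.
  for (const doc of docs) map.call(doc, emit);
  // Reduce phase: collapse each key's values array into a single value.
  const results = [];
  for (const [key, values] of emitted) {
    // MongoDB does not call reduce for a key that has only a single value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Hypothetical sample data.
const staff = [
  { department: "Sales" },
  { department: "HR" },
  { department: "Sales" },
];

const counts = mapReduce(
  staff,
  function (emit) { emit(this.department, 1); },
  function (key, values) { return values.reduce((a, b) => a + b, 0); }
);
console.log(counts);
```

The single-value shortcut in the reduce phase matters whenever the value emitted by map has a different shape from the value returned by reduce.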
Use the built-in sum function to return the number of people in each department:
> db.emp.mapReduce(function() {emit(this.department, 1);}, function(key, values) {return Array.sum(values)}, {out: "depart_summary"}).find()
{ "_id" : "Development", "value" : 3 }
{ "_id" : "HR", "value" : 2 }
{ "_id" : "Planning", "value" : 1 }
{ "_id" : "Sales", "value" : 2 }

Use the built-in avg function to return the average salary of each department:
> db.emp.mapReduce(function() {emit(this.department, this.salary);}, function(key, values) {return Array.avg(values)}, {out: "depart_summary"}).find()
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Planning", "value" : 5000 }
{ "_id" : "Sales", "value" : 7000 }

Keep two decimal places. Planning has only one employee, so reduce is not called for it and its value stays the emitted number 5000:
> db.emp.mapReduce(function() {emit(this.department, this.salary);}, function(key, values) {return Array.avg(values).toFixed(2)}, {out: "depart_summary"}).find()
{ "_id" : "Development", "value" : "7166.67" }
{ "_id" : "HR", "value" : "5250.00" }
{ "_id" : "Planning", "value" : 5000 }
{ "_id" : "Sales", "value" : "7000.00" }

Use the built-in sum function to return the total salary of each department:
> db.emp.mapReduce(function() {emit(this.department, this.salary);}, function(key, values) {return Array.sum(values)}, {out: "depart_summary"}).find()
{ "_id" : "Development", "value" : 21500 }
{ "_id" : "HR", "value" : 10500 }
{ "_id" : "Planning", "value" : 5000 }
{ "_id" : "Sales", "value" : 14000 }

Manually calculate the total number of employees in each department. Again, Planning keeps the raw emitted value because reduce is skipped for a single value:
> db.emp.mapReduce(function() {emit(this.department, {count: 1});}, function(key, values) {var sum = 0; values.forEach(function(val) {sum += val.count}); return sum;}, {out: "depart_summary"}).find()
{ "_id" : "Development", "value" : 3 }
{ "_id" : "HR", "value" : 2 }
{ "_id" : "Planning", "value" : { "count" : 1 } }
{ "_id" : "Sales", "value" : 2 }

Manually calculate the total number of employees and the total salary in each department:
> db.emp.mapReduce(function() {emit(this.department, {salct: this.salary, count: 1});}, function(key, values) {var res = {salct: 0, sum: 0}; values.forEach(function(val) {res.sum += val.count; res.salct += val.salct}); return res;}, {out: "depart_summary"}).find()
{ "_id" : "Development", "value" : { "salct" : 21500, "sum" : 3 } }
{ "_id" : "HR", "value" : { "salct" : 10500, "sum" : 2 } }
{ "_id" : "Planning", "value" : { "salct" : 5000, "count" : 1 } }
{ "_id" : "Sales", "value" : { "salct" : 14000, "sum" : 2 } }

Manually calculate the average salary of each department:
> db.emp.mapReduce(function() {emit(this.department, {salct: this.salary, count: 1});}, function(key, values) {var res = {salct: 0, sum: 0}; values.forEach(function(val) {res.sum += val.count; res.salct += val.salct}); return res.salct / res.sum;}, {out: "depart_summary"}).find()
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Planning", "value" : { "salct" : 5000, "count" : 1 } }
{ "_id" : "Sales", "value" : 7000 }

Filter the computed values, showing only departments whose average salary is greater than 5000:
> db.emp.mapReduce(function() {emit(this.department, this.salary);}, function(key, values) {return Array.avg(values)}, {out: "depart_summary"}).find({value: {$gt: 5000}})
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Sales", "value" : 7000 }

Sort the computed values in ascending order:
> db.emp.mapReduce(function() {emit(this.department, this.salary);}, function(key, values) {return Array.avg(values)}, {out: "depart_summary"}).find({value: {$gt: 5000}}).sort({value: 1})
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Sales", "value" : 7000 }
{ "_id" : "Development", "value" : 7166.666666666667 }

Sort the computed values in descending order:
> db.emp.mapReduce(function() {emit(this.department, this.salary);}, function(key, values) {return Array.avg(values)}, {out: "depart_summary"}).find({value: {$gt: 5000}}).sort({value: -1})
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "Sales", "value" : 7000 }
{ "_id" : "HR", "value" : 5250 }

Sort the computed values in descending order and take the first two:
> db.emp.mapReduce(function() {emit(this.department, this.salary);}, function(key, values) {return Array.avg(values)}, {out: "depart_summary"}).find({value: {$gt: 5000}}).sort({value: -1}).limit(2)
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "Sales", "value" : 7000 }

Filter the data before grouping (age greater than 25), then compute per group. Each department has exactly one matching employee, so every value is the raw emitted document:
> db.emp.mapReduce(function() {emit(this.department, {count: 1});}, function(key, values) {var sum = 0; values.forEach(function(val) {sum += val.count}); return sum;}, {out: "depart_summary", query: {age: {$gt: 25}}}).find()
{ "_id" : "Development", "value" : { "count" : 1 } }
{ "_id" : "HR", "value" : { "count" : 1 } }
{ "_id" : "Sales", "value" : { "count" : 1 } }

Filter and sort the data before grouping, then compute per group (sorting has no effect on the result in this example):
> db.emp.mapReduce(function() {emit(this.department, {count: 1});}, function(key, values) {var sum = 0; values.forEach(function(val) {sum += val.count}); return sum;}, {out: "depart_summary", query: {age: {$gt: 22}}, sort: {age: 1}}).find()
{ "_id" : "Development", "value" : 2 }
{ "_id" : "HR", "value" : 2 }
{ "_id" : "Planning", "value" : { "count" : 1 } }
{ "_id" : "Sales", "value" : 2 }
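In several outputs above, Planning shows {"count": 1} where the other departments show a plain number. This is consistent with MongoDB not calling reduce for a key that has only one emitted value, so the raw emitted value is returned as-is. One possible remedy, offered here as a suggestion and not taken from the original examples, is to make reduce return the same shape that map emits. The plain JavaScript sketch below calls two reduce functions directly, without MongoDB; `finish` is a hypothetical stand-in for the engine's single-value shortcut.

```javascript
// Plain JavaScript sketch; the reduce functions are called directly, without MongoDB.

// As in the examples above: map emits {count: 1} but reduce returns a number.
function reduceToNumber(key, values) {
  let sum = 0;
  values.forEach((val) => { sum += val.count; });
  return sum;
}

// Alternative: reduce returns the same shape that map emits.
function reduceSameShape(key, values) {
  const res = { count: 0 };
  values.forEach((val) => { res.count += val.count; });
  return res;
}

// Hypothetical stand-in for the engine: reduce is skipped for a single value.
function finish(reduce, key, values) {
  return values.length === 1 ? values[0] : reduce(key, values);
}

const one = [{ count: 1 }];
const three = [{ count: 1 }, { count: 1 }, { count: 1 }];

console.log(finish(reduceToNumber, "Planning", one));       // { count: 1 }
console.log(finish(reduceToNumber, "Development", three));  // 3
console.log(finish(reduceSameShape, "Planning", one));      // { count: 1 }
console.log(finish(reduceSameShape, "Development", three)); // { count: 3 }

// The same-shape version can also be re-applied to its own output.
const partial = reduceSameShape("Development", three.slice(0, 2));
const rereduced = reduceSameShape("Development", [partial, three[2]]);
console.log(rereduced); // { count: 3 }
```

With the same-shape version every department ends up as an object with a `count` field, whether or not reduce ran for it.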
Group
The basic syntax is as follows:
db.runCommand({group: {
  ns: collection name,
  key: grouping key object,
  initial: initial value of the accumulator,
  $reduce: reduce function run for each document in a group,
  condition: filter condition,
  finalize: function run once per group after grouping finishes
}})
Documents are first grouped by key, and then the $reduce method is executed for each document in each group. It receives two parameters: the current document in the group and the accumulator.
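The roles of key, initial, $reduce, and finalize can be sketched with a small in-memory emulation. This is plain JavaScript, not MongoDB's implementation: the `group` helper, its single-field `key` string, and the `sample` documents are simplifications invented for illustration.

```javascript
// Plain JavaScript emulation of the group flow; not MongoDB's implementation.
// `group`, its option names, and `sample` are hypothetical; `key` is a single field name here.
function group({ docs, key, initial, reduce, finalize }) {
  const groups = new Map();
  for (const doc of docs) {
    const keyValue = doc[key];
    if (!groups.has(keyValue)) {
      // Each group starts from its own copy of `initial`.
      groups.set(keyValue, { [key]: keyValue, ...initial });
    }
    // reduce receives the current document and the group's accumulator,
    // and mutates the accumulator in place.
    reduce(doc, groups.get(keyValue));
  }
  const retval = [...groups.values()];
  // finalize runs once per group, after all documents have been reduced.
  if (finalize) retval.forEach((prev) => finalize(prev));
  return retval;
}

const sample = [
  { ename: "tom", department: "Sales", salary: 6000 },
  { ename: "eric", department: "HR", salary: 4500 },
  { ename: "robin", department: "Sales", salary: 8000 },
];

const retval = group({
  docs: sample,
  key: "department",
  initial: { salct: 0, count: 0 },
  reduce: (oriDoc, prev) => {
    prev.salct += oriDoc.salary;
    prev.count += 1;
  },
  finalize: (prev) => {
    prev.avg = prev.salct / prev.count;
  },
});
console.log(retval);
```

Computing the average in finalize, once per group, avoids recomputing it on every document inside the reduce function.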
Example:
Group by department and calculate the sum of salaries for each department, as follows:
> db.runCommand(
... {group: {ns: "emp", key: {"department": true}, initial: {salct: 0},
... $reduce: function(oriDoc, prev) {prev.salct += oriDoc.salary}
... }})
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 14000 },
    { "department" : "HR", "salct" : 10500 },
    { "department" : "Development", "salct" : 21500 },
    { "department" : "Planning", "salct" : 5000 }
  ],
  "count" : NumberLong(8),
  "keys" : NumberLong(4),
  "ok" : 1
}

Count the employees and total the salaries in each department, as follows:
> db.runCommand({group: {ns: "emp", key: {"department": true}, initial: {salct: 0, count: 0}, $reduce: function(oriDoc, prev) {prev.salct += oriDoc.salary; prev.count += 1}}})
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 14000, "count" : 2 },
    { "department" : "HR", "salct" : 10500, "count" : 2 },
    { "department" : "Development", "salct" : 21500, "count" : 3 },
    { "department" : "Planning", "salct" : 5000, "count" : 1 }
  ],
  "count" : NumberLong(8),
  "keys" : NumberLong(4),
  "ok" : 1
}

Count the employees, total the salaries, and compute the average salary in each department, as follows:
> db.runCommand({group: {ns: "emp", key: {"department": true}, initial: {salct: 0, count: 0, avg: 0}, $reduce: function(oriDoc, prev) {prev.salct += oriDoc.salary; prev.count += 1; prev.avg = (prev.salct / prev.count).toFixed(2)}}})
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 14000, "count" : 2, "avg" : "7000.00" },
    { "department" : "HR", "salct" : 10500, "count" : 2, "avg" : "5250.00" },
    { "department" : "Development", "salct" : 21500, "count" : 3, "avg" : "7166.67" },
    { "department" : "Planning", "salct" : 5000, "count" : 1, "avg" : "5000.00" }
  ],
  "count" : NumberLong(8),
  "keys" : NumberLong(4),
  "ok" : 1
}

Find the maximum salary in each department, as follows:
> db.runCommand({group: {ns: "emp", key: {"department": true}, initial: {salct: 0}, $reduce: function(oriDoc, prev) {if (oriDoc.salary > prev.salct) {prev.salct = oriDoc.salary}}}})
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 8000 },
    { "department" : "HR", "salct" : 6000 },
    { "department" : "Development", "salct" : 8000 },
    { "department" : "Planning", "salct" : 5000 }
  ],
  "count" : NumberLong(8),
  "keys" : NumberLong(4),
  "ok" : 1
}

Find the maximum salary in each department, considering only documents whose salary is greater than 5000 (condition filters the documents before they are grouped), as follows:
> db.runCommand({group: {ns: "emp", key: {"department": true}, initial: {salct: 0}, $reduce: function(oriDoc, prev) {if (oriDoc.salary > prev.salct) {prev.salct = oriDoc.salary}}, condition: {salary: {$gt: 5000}}}})
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 8000 },
    { "department" : "Development", "salct" : 8000 },
    { "department" : "HR", "salct" : 6000 }
  ],
  "count" : NumberLong(6),
  "keys" : NumberLong(3),
  "ok" : 1
}

Add a description to the results with finalize, as follows:
> db.runCommand({group: {ns: "emp", key: {"department": true}, initial: {salct: 0},
... $reduce: function(oriDoc, prev) {if (oriDoc.salary > prev.salct) {prev.salct = oriDoc.salary}},
... condition: {salary: {$gt: 5000}},
... finalize: function(prev) {prev.salct = "Department of the highest salary is " + prev.salct}
... }})
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : "Department of the highest salary is 8000" },
    { "department" : "Development", "salct" : "Department of the highest salary is 8000" },
    { "department" : "HR", "salct" : "Department of the highest salary is 6000" }
  ],
  "count" : NumberLong(6),
  "keys" : NumberLong(3),
  "ok" : 1
}

Format the grouping key with a function: if the keys Department and department both exist in the collection, grouping becomes troublesome. The solution is as follows:
> db.emp.insert({
... "_id": 9, "ename": "sophie", "age": 28, "Department": "HR", "salary": 18000
... })
WriteResult({ "nInserted" : 1 })
> db.emp.find()
{ "_id" : 1, "ename" : "tom", "age" : 25, "department" : "Sales", "salary" : 6000 }
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
{ "_id" : 3, "ename" : "robin", "age" : 30, "department" : "Sales", "salary" : 8000 }
{ "_id" : 4, "ename" : "jack", "age" : 28, "department" : "Development", "salary" : 8000 }
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 7, "ename" : "hellen", "age" : 32, "department" : "HR", "salary" : 6000 }
{ "_id" : 8, "ename" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }
{ "_id" : 9, "ename" : "sophie", "age" : 28, "Department" : "HR", "salary" : 18000 }
> db.runCommand({group: {ns: "emp",
... $keyf: function(oriDoc) {if (oriDoc.Department) {return {department: oriDoc.Department}} else {return {department: oriDoc.department}}},
... initial: {salct: 0},
... $reduce: function(oriDoc, prev) {if (oriDoc.salary > prev.salct) {prev.salct = oriDoc.salary}},
... condition: {salary: {$gt: 5000}},
... finalize: function(prev) {prev.salct = "Department of the highest salary is " + prev.salct}
... }})
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : "Department of the highest salary is 8000" },
    { "department" : "Development", "salct" : "Department of the highest salary is 8000" },
    { "department" : "HR", "salct" : "Department of the highest salary is 18000" }
  ],
  "count" : NumberLong(7),
  "keys" : NumberLong(3),
  "ok" : 1
}