In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
I. brief introduction
The aggregation framework of MongoDB is mainly used to transform and combine the documents in the collection, so as to analyze the data and make use of it.
The basic idea of the aggregation framework is:
Use multiple components to create a pipe for processing a series of documents.
These components include:
Filter (filtering), project (projecting), group (grouping), sort (sorting), limit (limiting), and skip (skipping).
How to use the aggregation framework:
Db. Collection. Components (component 1, component 2... )
Note: because the aggregate result is to be returned to the client, the aggregate result must be limited to 16m, which is the maximum response message size supported by MongoDB.
Second, use examples
2.1. Prepare sample data
For (var iTuno db.scores.aggregate ({"$match": {"score": {$gte:80}}, {$project: {"studentId": 1}})
3: sort the names of students. Once a student's name appears, add one to him.
> db.scores.aggregate ({"$match": {"score": {$gte:80}}, {$project: {"studentId": 1}}, {$group: {"_ id": "$studentId", "count": {$sum:1})
4: sort the result sets in descending order according to count
> db.scores.aggregate ({"$match": {"score": {$gte:80}}, {$project: {"studentId": 1}}, {$group: {"_ id": "$studentId", "count": {$sum:1}}, {"$sort": {"count":-1})
5: return the first three pieces of data
Db.scores.aggregate ({"$match": {"score": {$gte:80}}, {$project: {"studentId": 1}}, {$group: {"_ id": "$studentId", "count": {$sum:1}}, {"$sort": {"count":-1}}, {"$limit": 3})
III. Pipe operator
Each operator accepts a series of documents, processes them accordingly, and then passes the converted document as a result to the next operator. The last operator returns the result.
Different pipe operators can be combined in any order and in any number.
3.1filter command $match
Used to filter a collection of documents, where all regular query operators can be used. It is usually placed at the front of the pipe for the following reasons:
1: quickly filter unneeded documents to reduce the amount of data for subsequent operations
2: filter before projection and grouping, queries can use indexes
3.2.Projective command $project
Used to extract fields from a document, you can specify include and exclude fields, or rename fields. For example, to change studentId to sid, as follows:
Db.scores.aggregate ({"$project": {"sid": "$studentId"}})
Pipe operators can also use expressions to meet more complex requirements.
Mathematical expression for the pipe operator $project:
For example, add 20 points to the collective score, as follows:
> db.scores.aggregate ({"$project": {"studentId": 1, "newScore": {$add: ["$score", 20]})
Supported operators and corresponding syntax:
1:$add: [expr1 [, expr2, … Exprn]]
2:$subtract: [expr1,expr2]
3:$multiply: [expr1 [, expr2, … Exprn]]
4:$divice: [expr1,expr2]
5:$mod: [expr1,expr2]
Date expression for the pipe operator $project:
The aggregation framework contains some expressions for extracting date information, as follows:
$year, $month, $week, $dayOfMonth, $dayOfWeek, $dayOfYear, $hour, $minute, $second.
Note: these can only manipulate date fields, not data. Use examples:
{"$project": {"opeDay": {"$dayOfMonth": "$recoredTime"}
String expression for the pipe operator $project:
1:$substr: [expr, start position, number of bytes to fetch]
2:$concat: [expr1 [, expr2, … Exprn]]
3:$toLower:expr
4:$toUpper:expr
For example: {"$project": {"sid": {$concat: ["$studentId", "cc"]}
Logical expression for the pipe operator $project:
1:$cmp: [expr1,expr2]: compare two expressions. 0 means equal, large before positive number and large after negative number.
2:$strcasecmp: [string1,string2]: compare two strings, case-sensitive, and valid only for strings made up of Roman characters
3:$eq, $ne, $gt, $gte, $lt, $lte: [expr1,expr2]
4:$and, $or, $not
5:$cond: [booleanExpr,trueExpr,falseExpr]: if the boolean expression is true, return the true expression, otherwise return the false expression
6:$ifNull: [expr,otherExpr]: if expr is null, return otherExpr, otherwise return expr
For example: db.scores.aggregate ({"$project": {"newScore": {$cmp: ["$studentId", "sss"]})
3.3.Group command $group
Used to group documents according to the different values of a particular field. Once the grouping fields are selected, you can pass them to the "_ id" field of the $group function. For example:
Db.scores.aggregate ({"$group": {"_ id": "$studentId"}})
Or
Db.scores.aggregate ({"$group": {"_ id": {"sid": "$studentId", "score": "$score"})
Operators supported by $group:
1:$sum:value: for each document, add the value to the calculation result
2:$avg:value: returns the average of each packet
3:$max:expr: returns the maximum value within the group
4:$min:expr: returns the minimum value within the group
5:$first:expr: returns the first value of the group, ignoring other values. Generally, this operation is meaningful only when the data order is clearly known after sorting.
6:$last:expr: contrary to the previous one, returns the last value of the grouping
7:$addToSet:expr: if the current array does not contain expr, add it to the array
8:$push:expr: add expr to the array
3.4.The split command $unwind
Used to split each value in the array into a separate document.
3.5. sort command $sort
You can sort by any field, the same syntax as in a normal query. If you want to sort a large number of documents, it is strongly recommended that you sort at the first stage of the pipeline, where you can use an index.
3.6. Common aggregate functions
1:count: used to return the number of documents in the collection
2:distinct: find all the different values for a given key, and you must specify a collection and key when using it, for example:
Db.runCommand ({"distinct": "users", "key": "userId"})
IV. MapReduce
In MongoDB's aggregation framework, you can also use MapReduce, which is very powerful and flexible, but has some complexity, and is designed to implement some complex aggregation functions.
MapReduce in MongoDB uses JavaScript as the query language, so it can express arbitrary logic, but it runs very slowly and should not be used in real-time data analysis.
4.1.The HelloWorld of MapReduce
Find out all the keys in the collection and count the number of times each key appears.
The 1:Map function uses the emit function to return the value to be processed, as shown in the following example:
Var map = function () {for (var key in this) {emit (key, {count:1});}}
This represents a reference to the current document.
The 2:reduce function needs to process the data of the Map phase or the previous reduce, so the document returned by reduce must be an element of the second parameter of reduce, as shown in the following example:
Var reduce = function (key,emits) {var total = 0; for (var i in emits) {total + = emits [I] .count;} return {"count": total};}
3: run MapReduce. The example is as follows:
> var mr = db.runCommand ({"mapreduce": "scores", "map": map, "reduce": reduce, "out": "mrout"})
Description: scores is the set name, map is the map function, reduce is the reduce function, and mrout is the output variable name.
4: the final result of the query. An example is as follows:
Db.mrout.find ()
You can also change it, such as counting the median value of studentId and the number of times each value appears, and you can do the following:
1: modify the map function. The example is as follows:
Var map = function () {emit (this.studentId, {count:1});}
The 2:reduce function does not need to be changed
3: re-execute
Db.runCommand ({"mapreduce": "scores", "map": map, "reduce": reduce, "out": "mrout"})
4: view the final result
Db.mrout.find ()
4.2.More MapReduce optional keys
1:finalize:function: the result of reduce can be sent to finalize, which is the last step of the whole process.
2:keeptemp:boolean: whether to save a temporary result collection when the connection is closed
3:query:document: filter the document before sending it to map
4:sort:document: sorts documents before sending them to map
5:limit:integer: the maximum number of documents sent to the map function
6:scope:document: variables that can be used in javascript
7:verbose:boolean: whether to record detailed server logs
Example:
Var query = {"studentId": {"$lt": "S2"} var sort = {"studentId": 1}; var finalize = function (key,value) {return {"mykey": key, "myV": value};}; var mr = db.runCommand ({"mapreduce": "scores", "map": map, "reduce": reduce, "out": "mrout", "query": query, "sort": sort, "limit": 2, "finalize": finalize})
5. Aggregate command group
It is used to group collections, and after grouping, documents within each group are aggregated.
For example, to group studentId and find the highest score for each student, you can do the following:
Db.runCommand ({"group": {"ns": "scores", "key": {"studentId": 1}, "initial": {"score": 0}, "$reduce": function (doc,prev) {if (doc.score > prev.score) {prev.score = doc.score;}))
Ns: specifies the collection to be grouped
Key: specifies the key for the grouping
Initial: when the reduce function of each group is called, it is called once at the beginning to initialize
$reduce: executed on each document in each group, the system automatically passes in two parameters. Doc is the currently processed document, and prev is the result document of the previous execution in this group.
You can also add conditions to group by adding condition, for example:
Db.runCommand ({"group": {"ns": "scores", "key": {"studentId": 1}, "initial": {"score": 0}, "$reduce": function (doc,prev) {if (doc.score > prev.score) {prev.score = doc.score;}}, "condition": {"studentId": {$lt: "S2"}))
You can also use finalizer to make the final processing of the results of reduce. For example, if you ask for an average score for each student, you can first group according to studentId to find a total score, and then calculate the average score in finalizer:
Db.runCommand ({"group": {"ns": "scores", "key": {"studentId": 1}, "initial": {"total": 0}, "$reduce": function (doc,prev) {prev.total + = doc.score;}, "condition": {"studentId": {"$lt": "S2"}}, "finalize": function (prev) {prev.avg = prev.total/3;})
Note: finalize is called only once before each set of results is returned to the user, that is, each set of results is called only once
When the key of a group is complex, you can also use a function as a key. For example, if the key is not sensitive to size, it can be defined as follows:
Db.runCommand ({"group": {"ns": "scores", $keyf:function (doc) {return {studentId:doc.studentId.toLowerCase ()};}, "initial": {"total": 0}, "$reduce": function (doc,prev) {prev.total + = doc.score) }, "condition": {"$or": [{"studentId": {"$lt": "S2"}, {"studentId": "S0"}]}, "finalize": function (prev) {prev.avg = prev.total/3;})
Note: to use $keyf to define the function as the key, be sure to return the format of the object
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.