2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
The official MongoDB website provides a United States census data set keyed by zip code, which can be downloaded at
http://media.mongodb.org/zips.json
Data examples:
[root@localhost cluster]# head zips.json
{"_id": "01001", "city": "AGAWAM", "loc": [-72.622739, 42.070206], "pop": 15338, "state": "MA"}
{"_id": "01002", "city": "CUSHMAN", "loc": [-72.51564999999999, 42.377017], "pop": 36963, "state": "MA"}
{"_id": "01005", "city": "BARRE", "loc": [-72.10835400000001, 42.409698], "pop": 4546, "state": "MA"}
{"_id": "01007", "city": "BELCHERTOWN", "loc": [-72.41095300000001, 42.275103], "pop": 10579, "state": "MA"}
{"_id": "01008", "city": "BLANDFORD", "loc": [-72.936114, 42.182949], "pop": 1240, "state": "MA"}
{"_id": "01010", "city": "BRIMFIELD", "loc": [-72.188455, 42.116543], "pop": 3706, "state": "MA"}
{"_id": "01011", "city": "CHESTER", "loc": [-72.988761, 42.279421], "pop": 1688, "state": "MA"}
{"_id": "01012", "city": "CHESTERFIELD", "loc": [-72.833309, 42.38167], "pop": "state": "MA"}
{"_id": "01013", "city": "CHICOPEE", "loc": [-72.607962, 42.162046], "pop": 23396, "state": "MA"}
{"_id": "01020", "city": "CHICOPEE", "loc": [-72.576142, 42.176443], "pop": 31495, "state": "MA"}
Import the data into a MongoDB database using mongoimport
[root@localhost cluster]# mongoimport -d test -c "zipcodes" --file zips.json -h 192.168.199.219:27020
2016-01-16T18:31:29.424+0800 connected to: 192.168.199.219:27020
2016-01-16T18:31:32.420+0800 [#.] test.zipcodes 2.1 MB/3.0 MB (68.5%)
2016-01-16T18:31:34.471+0800 [#] test.zipcodes 3.0 MB/3.0 MB (100.0%)
2016-01-16T18:31:34.471+0800 imported 29353 documents
I. Single-purpose aggregation operations
These are simple operations such as count and distinct.
Example 1.1: find the number of documents in the zipcodes collection
db.zipcodes.count()
Example 1.2: find the number of documents for the state MA
db.zipcodes.count({state: "MA"})
Example 1.3: find out which states appear in zipcodes
db.zipcodes.distinct("state")
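To make the semantics concrete, here is a minimal plain-JavaScript sketch (no MongoDB server needed) of what count and distinct compute. The docs array is a hand-copied subset of the sample data plus one hypothetical row, and the helper names count and distinct are illustrative only; this is not how MongoDB implements them.

```javascript
// Subset of the zips.json sample shown earlier, plus one hypothetical row.
const docs = [
  { _id: "01001", city: "AGAWAM", pop: 15338, state: "MA" },
  { _id: "01002", city: "CUSHMAN", pop: 36963, state: "MA" },
  { _id: "01005", city: "BARRE", pop: 4546, state: "MA" },
  { _id: "99999", city: "EXAMPLE", pop: 100, state: "XX" }, // hypothetical
];

// count(filter): number of documents whose fields equal every field in the filter.
function count(collection, filter = {}) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  ).length;
}

// distinct(field): unique values of one field, in first-seen order.
function distinct(collection, field) {
  return [...new Set(collection.map((doc) => doc[field]))];
}

const total = count(docs);                 // 4
const inMA = count(docs, { state: "MA" }); // 3
const states = distinct(docs, "state");    // ["MA", "XX"]
console.log(total, inMA, states);
```

The sketch only supports equality filters; real query documents accept operators such as $gt as well.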
II. Using the aggregation framework (aggregate) for more complex aggregation operations
Example 2.1: find the total population of each state
db.zipcodes.aggregate([{$group: {_id: "$state", total: {$sum: "$pop"}}}])
Aggregation queries are run with the collection's aggregate method.
The $group stage takes the field to group by (a field reference must carry the $ prefix) together with one or more accumulator expressions such as $sum.
_id is the required key of $group: it holds the grouping value and acts as the primary key of each document in the result set.
The equivalent SQL of this query is
select state as _id, sum(pop) as total from zipcodes group by state
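As a rough illustration of what this $group stage does, the following plain-JavaScript sketch groups an in-memory array by one field and sums another. The helper name groupSum and the "XX" row are assumptions made for the example; the MA rows come from the sample data.

```javascript
// Three rows from the sample data plus one hypothetical row for a second state.
const docs = [
  { city: "AGAWAM", pop: 15338, state: "MA" },
  { city: "CUSHMAN", pop: 36963, state: "MA" },
  { city: "BARRE", pop: 4546, state: "MA" },
  { city: "EXAMPLE", pop: 100, state: "XX" }, // hypothetical
];

// Mimics {$group: {_id: "$<keyField>", total: {$sum: "$<sumField>"}}}.
function groupSum(collection, keyField, sumField) {
  const totals = new Map();
  for (const doc of collection) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) || 0) + doc[sumField]);
  }
  // One output document per distinct key, shaped like the aggregate result.
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

const totals = groupSum(docs, "state", "pop");
console.log(totals);
// [ { _id: 'MA', total: 56847 }, { _id: 'XX', total: 100 } ]
```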
Example 2.2: find the total population of each city in each state
db.zipcodes.aggregate([{$group: {_id: {state: "$state", city: "$city"}, pop: {$sum: "$pop"}}}])
When grouping by more than one field, give each field a name inside _id, such as state: "$state"
Example 2.3: for each state, find the total population of the areas (documents) with a population of more than 10000
db.zipcodes.aggregate([{$match: {"pop": {$gt: 10000}}}, {$group: {_id: {state: "$state"}, pop: {$sum: "$pop"}}}])
The $match stage takes the filter criteria applied to the documents before they are grouped. This statement is equivalent to the following SQL
select state, sum(pop) as pop from zipcodes where pop > 10000 group by state
Example 2.4: find the states with a total population of more than 10 million
db.zipcodes.aggregate([{$group: {_id: {state: "$state"}, pop: {$sum: "$pop"}}}, {$match: {"pop": {$gt: 10000000}}}])
Placing $match after $group means the grouping is performed first and the grouped results are then filtered, like HAVING in SQL. The equivalent SQL is as follows
select state, sum(pop) as pop from zipcodes group by state having sum(pop) > 10000000
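The position of $match relative to $group changes the result. The plain-JavaScript sketch below contrasts the two orders on a tiny in-memory sample; the "XX" rows, the helper name sumByState, and the low threshold LIMIT are assumptions chosen so the difference is visible.

```javascript
// MA rows come from the sample data; the XX rows are hypothetical.
const docs = [
  { state: "MA", pop: 15338 },
  { state: "MA", pop: 36963 },
  { state: "MA", pop: 4546 },
  { state: "XX", pop: 9000 },
  { state: "XX", pop: 8000 },
];
const LIMIT = 10000; // illustrative threshold used for both variants

// Sum pop per state, like {$group: {_id: "$state", pop: {$sum: "$pop"}}}.
function sumByState(collection) {
  const totals = {};
  for (const doc of collection) {
    totals[doc.state] = (totals[doc.state] || 0) + doc.pop;
  }
  return totals;
}

// $match before $group (SQL WHERE): filter documents, then sum what is left.
const whereStyle = sumByState(docs.filter((doc) => doc.pop > LIMIT));

// $match after $group (SQL HAVING): sum everything, then filter the totals.
const havingStyle = Object.fromEntries(
  Object.entries(sumByState(docs)).filter(([, total]) => total > LIMIT)
);

console.log(whereStyle);  // { MA: 52301 }
console.log(havingStyle); // { MA: 56847, XX: 17000 }
```

In the first variant XX disappears because none of its documents pass the filter; in the second it stays because its total does.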
Example 5: find the average city population in each state
db.zipcodes.aggregate([{$group: {_id: {state: "$state", city: "$city"}, pop: {$sum: "$pop"}}}, {$group: {_id: "$_id.state", avgPop: {$avg: "$pop"}}}])
The aggregation pipeline can chain several stages, each one consuming the output of the previous one. The equivalent SQL of this statement is
select state, avg(pop) as avgPop from (select state, city, sum(pop) pop from zipcodes group by state, city) group by state
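The two $group stages are needed because a city can span several zip codes. The following plain-JavaScript sketch, using three rows from the sample data, shows the two passes and how the result differs from a naive average over the raw documents. Variable names are illustrative.

```javascript
// CHICOPEE appears under two zip codes in the sample data.
const docs = [
  { state: "MA", city: "AGAWAM", pop: 15338 },
  { state: "MA", city: "CHICOPEE", pop: 23396 },
  { state: "MA", city: "CHICOPEE", pop: 31495 },
];

// Stage 1: sum pop per (state, city), like the first $group.
const cityTotals = new Map();
for (const { state, city, pop } of docs) {
  const key = JSON.stringify([state, city]); // compound key as a string
  cityTotals.set(key, (cityTotals.get(key) || 0) + pop);
}

// Stage 2: average the city totals per state, like the second $group.
const perState = new Map();
for (const [key, pop] of cityTotals) {
  const [state] = JSON.parse(key);
  const acc = perState.get(state) || { sum: 0, n: 0 };
  perState.set(state, { sum: acc.sum + pop, n: acc.n + 1 });
}
const avgPop = [...perState].map(([state, { sum, n }]) => ({
  _id: state,
  avgPop: sum / n,
}));

// Naive single-pass average over zip code documents, for comparison.
const naiveAvg = docs.reduce((sum, doc) => sum + doc.pop, 0) / docs.length;

console.log(avgPop);   // [ { _id: 'MA', avgPop: 35114.5 } ]
console.log(naiveAvg); // about 23409.67, an average per zip code, not per city
```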
Example 2.5: find the names of the most and least populous cities in each state, with the corresponding populations
db.zipcodes.aggregate([{$group: {_id: {state: "$state", city: "$city"}, cityPop: {$sum: "$pop"}}}, {$sort: {cityPop: 1}}, {$group: {_id: "$_id.state", biggestCity: {$last: "$_id.city"}, biggestPop: {$last: "$cityPop"}, smallestCity: {$first: "$_id.city"}, smallestPop: {$first: "$cityPop"}}}])
The first $group calculates the population of each city, grouping by state and city.
The $sort stage sorts the cities by population in ascending order.
The second $group groups by state. Because the documents in each group arrive sorted by cityPop, the first document of each group ($first) is the least populous city and the last document ($last) is the most populous city.
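The sort-then-pick idea can be sketched in plain JavaScript as follows. The input array stands in for the output of the first $group (the MA figures are derived from the sample data, the XX row is hypothetical), and the variable names are illustrative.

```javascript
// Stand-in for the output of the first $group: one document per (state, city).
const cityTotals = [
  { _id: { state: "MA", city: "CHICOPEE" }, cityPop: 54891 },
  { _id: { state: "MA", city: "BARRE" }, cityPop: 4546 },
  { _id: { state: "MA", city: "AGAWAM" }, cityPop: 15338 },
  { _id: { state: "XX", city: "EXAMPLE" }, cityPop: 100 }, // hypothetical
];

// {$sort: {cityPop: 1}}: ascending by population.
const sorted = [...cityTotals].sort((a, b) => a.cityPop - b.cityPop);

// Second $group: the first document seen per state is $first,
// and every later document overwrites $last.
const byState = new Map();
for (const { _id, cityPop } of sorted) {
  const entry = byState.get(_id.state);
  if (!entry) {
    byState.set(_id.state, {
      _id: _id.state,
      smallestCity: _id.city,
      smallestPop: cityPop,
      biggestCity: _id.city,
      biggestPop: cityPop,
    });
  } else {
    entry.biggestCity = _id.city;
    entry.biggestPop = cityPop;
  }
}
const extremes = [...byState.values()];
console.log(extremes.find((e) => e._id === "MA"));
// smallestCity BARRE (4546), biggestCity CHICOPEE (54891)
```

A state with a single city, like the hypothetical XX, reports the same city as both smallest and biggest.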
Example 2.6: use $project to reshape the result
db.zipcodes.aggregate([{$group: {_id: {state: "$state", city: "$city"}, cityPop: {$sum: "$pop"}}}, {$sort: {cityPop: 1}}, {$group: {_id: "$_id.state", biggestCity: {$last: "$_id.city"}, biggestPop: {$last: "$cityPop"}, smallestCity: {$first: "$_id.city"}, smallestPop: {$first: "$cityPop"}}}, {$project: {_id: 0, state: "$_id", biggestCity: {name: "$biggestCity", pop: "$biggestPop"}, smallestCity: {name: "$smallestCity", pop: "$smallestPop"}}}])
Example 2.7: aggregate over the contents of an array
Assume there is a collection of students and the courses they take. Example data:
db.course.insert({name: "Zhang San", age: 10, grade: "fourth grade", course: ["math", "English", "politics"]})
db.course.insert({name: "Li Si", age: 9, grade: "third grade", course: ["math", "Chinese", "nature"]})
db.course.insert({name: "Wang Wu", age: 11, grade: "fourth grade", course: ["math", "English"]})
db.course.insert({name: "Zhao Liu", age: 9, grade: "fourth grade", course: ["math", "history", "politics"]})
Find how many students take each course.
db.course.aggregate([{$unwind: "$course"}, {$group: {_id: "$course", sum: {$sum: 1}}}, {$sort: {sum: -1}}])
$unwind expands the array, producing one document per array element, and the documents are then grouped on the unwound value. In the MongoDB version used here $group has no count accumulator (later releases add $count), so {$sum: 1} is used to count.
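A minimal plain-JavaScript sketch of the same three steps, assuming the four students above with course names normalized to a single spelling; the variable names are illustrative.

```javascript
// The four students from the example, with only the fields the pipeline uses.
const students = [
  { name: "Zhang San", course: ["math", "English", "politics"] },
  { name: "Li Si", course: ["math", "Chinese", "nature"] },
  { name: "Wang Wu", course: ["math", "English"] },
  { name: "Zhao Liu", course: ["math", "history", "politics"] },
];

// $unwind: one output document per array element, other fields copied.
const unwound = students.flatMap((student) =>
  student.course.map((course) => ({ ...student, course }))
);

// $group with {$sum: 1}: add 1 for every document in the group.
const counts = new Map();
for (const row of unwound) {
  counts.set(row.course, (counts.get(row.course) || 0) + 1);
}

// $sort: {sum: -1}: most popular course first.
const ranked = [...counts]
  .map(([_id, sum]) => ({ _id, sum }))
  .sort((a, b) => b.sum - a.sum);

console.log(unwound.length); // 11
console.log(ranked[0]);      // { _id: 'math', sum: 4 }
```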
Example 2.8: list the cities in each state
db.zipcodes.aggregate([{$group: {_id: "$state", cities: {$addToSet: "$city"}}}])
$addToSet collects the distinct city values of each group into an array.
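The deduplicating behaviour can be sketched in plain JavaScript with a Set per group. The MA rows mirror the sample data (CHICOPEE occurs under two zip codes), the XX row is hypothetical, and the names are illustrative.

```javascript
// CHICOPEE appears twice on purpose; the XX row is hypothetical.
const docs = [
  { state: "MA", city: "AGAWAM" },
  { state: "MA", city: "CHICOPEE" },
  { state: "MA", city: "CHICOPEE" },
  { state: "XX", city: "EXAMPLE" },
];

// One Set per state: adding a city that is already present has no effect.
const citiesByState = new Map();
for (const { state, city } of docs) {
  if (!citiesByState.has(state)) citiesByState.set(state, new Set());
  citiesByState.get(state).add(city);
}

// Shape the result like the aggregate output.
const result = [...citiesByState].map(([_id, cities]) => ({
  _id,
  cities: [...cities],
}));
console.log(result);
```

MongoDB does not guarantee the order of the elements in an $addToSet array, so code consuming the result should not rely on it.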
Suppose we have the following data structure
db.book.insert({_id: 1, title: "MongoDB Documentation", tags: ["Mongodb", "NoSQL"], year: 2014, subsections: [{subtitle: "Section 1: Install MongoDB", tags: ["NoSQL", "Document"], content: "Section 1: This is the content of section 1."}, {subtitle: "Section 2: MongoDB CRUD Operations", tags: ["Insert", "Mongodb"], content: "Section 2: This is the content of section 2."}, {subtitle: "Section 3: Aggregation", tags: ["Aggregate"], content: {text: "Section 3: This is the content of section3.", tags: ["MapReduce", "Aggregate"]}}]})
This document describes the chapters of a book. Each chapter has a tags field, and the book itself also has a tags field.
Suppose a customer wants to find books tagged Mongodb and, within them, to see only the chapters tagged Mongodb. The find() method cannot do this.
db.book.find({$or: [{tags: {$in: ['Mongodb']}}, {"subsections.tags": {$in: ['Mongodb']}}]})
A query like the one above returns every matching document in full, so sections that do not carry the Mongodb tag are displayed as well.
The aggregation framework provides a $redact stage that can trim the result.
db.book.aggregate([{$redact: {$cond: {if: {$gt: [{$size: {$setIntersection: ["$tags", ["Mongodb"]]}}, 0]}, then: "$$DESCEND", else: "$$PRUNE"}}}])
$$DESCEND: if the condition is met, keep the fields at the current level and then apply the same condition to each embedded document beneath it.
$$PRUNE: if the condition is not met, exclude the current document or embedded document entirely.
The query results are as follows
{"_id": 1, "title": "MongoDB Documentation", "tags": ["Mongodb", "NoSQL"], "year": 2014, "subsections": [{"subtitle": "Section 2: MongoDB CRUD Operations", "tags": ["Insert", "Mongodb"], "content": "Section 2: This is the content of section 2."}]}
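The descend-or-prune logic can be sketched as a small recursive function in plain JavaScript. This is a simplified model under stated assumptions, not MongoDB's implementation: the helper names redact, hasTag and isDocument are made up, the content strings are shortened, and a level without a tags array is simply pruned here.

```javascript
// The book document from the example, with content strings shortened.
const book = {
  _id: 1,
  title: "MongoDB Documentation",
  tags: ["Mongodb", "NoSQL"],
  year: 2014,
  subsections: [
    { subtitle: "Section 1: Install MongoDB", tags: ["NoSQL", "Document"] },
    { subtitle: "Section 2: MongoDB CRUD Operations", tags: ["Insert", "Mongodb"] },
    {
      subtitle: "Section 3: Aggregation",
      tags: ["Aggregate"],
      content: { text: "Section 3 text", tags: ["MapReduce", "Aggregate"] },
    },
  ],
};

const isDocument = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

// The condition of the example: the level's tags include "Mongodb".
const hasTag = (node) => Array.isArray(node.tags) && node.tags.includes("Mongodb");

function redact(node, keep) {
  if (!keep(node)) return undefined; // plays the role of $$PRUNE
  const result = {};                 // plays the role of $$DESCEND
  for (const [field, value] of Object.entries(node)) {
    if (Array.isArray(value)) {
      // Embedded documents inside arrays are checked recursively; scalars are kept.
      result[field] = value
        .map((item) => (isDocument(item) ? redact(item, keep) : item))
        .filter((item) => item !== undefined);
    } else if (isDocument(value)) {
      const kept = redact(value, keep);
      if (kept !== undefined) result[field] = kept;
    } else {
      result[field] = value;
    }
  }
  return result;
}

const redacted = redact(book, hasTag);
console.log(redacted.subsections.map((s) => s.subtitle));
// [ 'Section 2: MongoDB CRUD Operations' ]
```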
III. Using mapReduce
Example 3.1: find the total population of each state
db.zipcodes.mapReduce(
  function () { emit(this.state, this.pop) },     // map function
  (key, values) => { return Array.sum(values) },  // reduce function
  {out: "zipcodes_groupby_state"}
)
mapReduce takes at least three parameters: the map function, the reduce function, and the out option.
In the map function, this refers to the document currently being processed. The emit function passes a key-value pair on to the reduce function.
reduce receives the output of the map function as its input. The values argument of reduce is a list. In the example above, state is the key, and the pop values of all records with the same state form the list. It looks like this
state = "CA" values = [51841, 40629, ...]
The reduce function does not need to return the key, which is carried over automatically. Its return value, here the sum of the entries in values, becomes the value stored for that key.
out: the collection in which the output is saved
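The data flow between map, emit and reduce can be sketched with a small in-memory emulation in plain JavaScript. This is a simplified model under stated assumptions: the helper mapReduce is hypothetical, emit is passed to the map function as an argument (in the mongo shell it is a global), and the NY row is made up; the two CA values are the ones shown above.

```javascript
// The two CA values come from the text; the NY row is hypothetical.
const docs = [
  { state: "CA", pop: 51841 },
  { state: "CA", pop: 40629 },
  { state: "NY", pop: 1000 },
];

// Hypothetical in-memory stand-in for db.collection.mapReduce.
function mapReduce(collection, map, reduce) {
  const grouped = new Map(); // serialized key -> { key, values }
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!grouped.has(id)) grouped.set(id, { key, values: [] });
    grouped.get(id).values.push(value);
  };
  // Call map once per document, with `this` bound to the document.
  for (const doc of collection) map.call(doc, emit);
  return [...grouped.values()].map(({ key, values }) => ({
    _id: key,
    // As in MongoDB, reduce is not called for a key with a single value.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const output = mapReduce(
  docs,
  function (emit) { emit(this.state, this.pop); },    // map function
  (key, values) => values.reduce((a, b) => a + b, 0)  // reduce function
);
console.log(output);
// [ { _id: 'CA', value: 92470 }, { _id: 'NY', value: 1000 } ]
```

Array.sum is a mongo shell helper, so the sketch uses Array.prototype.reduce instead.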
Example 3.2: find the population of each city and the number of documents for each city
db.zipcodes.mapReduce(
  function () {
    var key = {state: this.state, city: this.city}
    emit(key, {count: 1, pop: this.pop})
  },  // map function
  (key, values) => {
    var retval = {count: 0, pop: 0}
    for (var i = 0; i < values.length; i++) {
      retval.count += values[i].count
      retval.pop += values[i].pop
    }
    return retval
  },  // reduce function
  {out: "zipcodes_groupby_state_city"}
)
In the map function, the {state, city} object is passed to emit as the key and the {count: 1, pop: this.pop} object is passed as the value.
The reduce function then accumulates count and pop over the values and returns the totals.
The equivalent SQL is as follows
select state, city, count(*) as count, sum(pop) as pop from zipcodes group by state, city
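One reason the map function emits count: 1 instead of letting reduce use values.length is that MongoDB may call reduce more than once for the same key, feeding earlier reduce results back in as values. The plain-JavaScript sketch below illustrates this with the two CHICOPEE values from the sample data and one hypothetical third value; naiveReduce is a deliberately flawed variant added for contrast.

```javascript
// Values emitted for one key; the third entry is hypothetical.
const emitted = [
  { count: 1, pop: 23396 },
  { count: 1, pop: 31495 },
  { count: 1, pop: 1000 },
];
const key = { state: "MA", city: "CHICOPEE" };

// Same logic as the reduce function of Example 3.2.
function reduce(key, values) {
  const retval = { count: 0, pop: 0 };
  for (const value of values) {
    retval.count += value.count;
    retval.pop += value.pop;
  }
  return retval;
}

// Flawed variant: counts list entries instead of summing their count fields.
function naiveReduce(key, values) {
  return {
    count: values.length,
    pop: values.reduce((sum, value) => sum + value.pop, 0),
  };
}

// Reducing everything at once, or in two steps, must give the same answer.
const allAtOnce = reduce(key, emitted);
const inTwoSteps = reduce(key, [reduce(key, emitted.slice(0, 2)), emitted[2]]);
const naiveTwoSteps = naiveReduce(key, [
  naiveReduce(key, emitted.slice(0, 2)),
  emitted[2],
]);

console.log(allAtOnce);     // { count: 3, pop: 55891 }
console.log(inTwoSteps);    // { count: 3, pop: 55891 }
console.log(naiveTwoSteps); // { count: 2, pop: 55891 }, the count is wrong
```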