Available in 2.1 development release. Will be stable for production in the 2.2 release
Built by Chris Westin (@cwestin63)
MongoDB provides an easy way to store documents that aren’t just flat records like SQL, but that also contain arrays and nested documents. Another common problem users have told us about is that it can be difficult to manipulate these sub-parts of documents easily. The new aggregation mechanism is also taking on making this easier, by providing operators that can be used to disassemble and re-assemble document sub-parts.
After many requests, we have also included expression evaluation to return computed values—for virtual fields.
The new aggregation framework is declarative. Rather than writing code to aggregate data, you specify a chain of operations, called a “pipeline,” to apply to your documents. This is similar to a pipe in a unix shell.
We’re now going to show you a few short examples (using the mongo shell) of this new functionality to get you started.
$match is a pipeline operator that filters out documents that don’t match the specified condition. For example
This very simple one step pipeline doesn’t introduce any new functionality yet, but shows how to put a query at the beginning of your pipeline. For those of you familiar with MongoDB, this is the equivalient of
> db.article.find({ author : "dave" });
$project is a pipeline operator that let’s you select which fields you want to include or exclude from a result. You can also create new virtual fields from computed values. Here is a simple example, building on the previous one:
While this functionality was available before from find(), what it does demonstrate here is how to put together more than one aggregation pipeline operator. The value of pipeline is an array consisting of the operators to apply. Conceptually, the aggregation framework behaves as if it was feeding the result of scanning the target collection through that pipeline.
Now let’s take a look at some new functionality that wasn’t available before.
$unwind hands out the elements of an array one at a time; each element is surrounded by the document that contains the original array. For this example, we’ll first show what a document from the article collection looks like. Suppose we had saved this:
db.article.save( { title : "this is your title" , author : "dave" , posted : new Date(4121381470000) , pageViews : 7 , tags : [ "fun" , "nasty" ] , comments : [ { author :"barbara" , text : "this is interesting" } , { author :"jenny" , text : "i like to play pinball", votes: 10 } ], other : { bar : 14 } });
{ "result" : [ { "_id" : ObjectId("4eeeb5fef09a7c9170df094b"), "title" : "this is your title", "author" : "dave", "posted" : ISODate("2100-08-08T04:11:10Z"), "pageViews" : 7, "tags" : "fun", "comments" : [ { "author" : "barbara", "text" : "this is interesting" }, { "author" : "jenny", "text" : "i like to play pinball", "votes" : 10 } ], "other" : { "bar" : 14 } }, { "_id" : ObjectId("4eeeb5fef09a7c9170df094b"), "title" : "this is your title", "author" : "dave", "posted" : ISODate("2100-08-08T04:11:10Z"), "pageViews" : 7, "tags" : "nasty", "comments" : [ { "author" : "barbara", "text" : "this is interesting" }, { "author" : "jenny", "text" : "i like to play pinball", "votes" : 10 } ], "other" : { "bar" : 14 } } ], "ok" : 1 }
Notice that the single document has been duplicated (see the identical _id values). However, the tags array in each copy of the document has been replaced with one value from the original tagsarray.
$unwind isn’t useful on its own, but is useful when combined with $match, which can be used to filter out copies of the document without values of interest, and then the array can be reassembled using other operators such as $push or $addToSet. $unwind is also useful when combined with $group.
$group groups elements with a common key together, and allows the application of aggregation functions. For the purposes of a small example, let’s add a second article to our collection:
db.article.save( { title : "this is some other title" , author : "jane" , posted : new Date(978239834000) , pageViews : 6 , tags : [ "nasty" , "filthy" ] , comments : [ { author :"will" , text : "i don't like the color" } , { author :"jenny" , text : "can i get that in green?" } ], other : { bar : 14 } });
If these articles represent content on a web site, I might want to know what tags users are applying to these articles. I can use $group to find that out like so:
The grouping key for a $group is defined by specifying a field (or subdocument of multiple fields) for the _id. The grouping key is used to define buckets into which documents are put. Other fields are aggregate functions that can compute values across the entries in each bucket.
In this case, one is added to count for each document that matches the group key. That counts up the number of items in each bucket. authors will be an array, and each unique author in the bucket will be added to it.
One document has the “filthy” tag, and it was written by jane. One document has the “fun” tag, and it was written by dave. Two documents have the “nasty” tag, and both jane and dave wrote documents with that tag.
The new aggregation functionality is available only by compiling from source. It is available in the 2.1 development release but is not meant for production. MongoDB 2.2 will be the first stable release with the aggregation framework. For more examples and information, check out the talk and demo from MongoSV 2011, or the documentation on the MongoDB wiki.