Map-reduce and group by in Mongoose

Source: Internet | Editor: 程序博客网 | Date: 2024/06/12 01:46

Original article: How to Map-Reduce with Mongoose, mongoDB, Express, Node.js

mongoDB has good support for Map-Reduce. Doing it through Mongoose, Express, and Node.js takes the following steps.

In this example, we have the following data:

www.yahoo.com
www.msn.com
www.google.com
www.yahoo.com
www.yahoo.com
www.msn.com

We want to turn it into the following form:

www.yahoo.com,  3
www.msn.com,    2
www.google.com, 1

This is the result a SQL GROUP BY on url with COUNT(*) would produce.
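To pin down the target before any database is involved, here is a minimal plain-JavaScript sketch of the same GROUP BY/COUNT over the six urls. countByUrl is a hypothetical helper for illustration only, not part of Mongoose or MongoDB.

```javascript
// In-memory sketch of the target transformation (no MongoDB involved):
// count how many times each url appears, like
//   SELECT url, COUNT(*) FROM pings GROUP BY url
function countByUrl(urls) {
  const counts = new Map(); // a Map keeps keys in first-insertion order
  for (const url of urls) {
    counts.set(url, (counts.get(url) || 0) + 1);
  }
  return Array.from(counts, ([url, count]) => ({ url: url, count: count }));
}

const urls = [
  "www.yahoo.com", "www.msn.com", "www.google.com",
  "www.yahoo.com", "www.yahoo.com", "www.msn.com",
];

for (const { url, count } of countByUrl(urls)) {
  console.log(`${url}, ${count}`);
}
// www.yahoo.com, 3
// www.msn.com, 2
// www.google.com, 1
```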

First, we set up the model with Node.js and Mongoose.

var express = require('express');
var mongoose = require('mongoose');
var app = express();

mongoose.connect('mongodb://localhost/db'); // assumes mongoDB is running on localhost and the database is named 'db'
var Schema = mongoose.Schema;
var ObjectId = Schema.ObjectId;

var PingSchema = new Schema({
  url    : String,
  active : { // each url has a start and end date during which it is active
    start : Date,
    end   : Date
  }
});

mongoose.model('Ping', PingSchema); // register the Ping schema with mongoose
var Ping = mongoose.model('Ping');  // look up the registered Ping model

app.get('/', function(req, res) { // set up an Express route
  // ...the code discussed below goes here
});

With the model in place, we turn to map-reduce, which takes two steps: (1) run the map-reduce to produce a new collection of results, and (2) query that new collection.

For background on map-reduce, see this post about how map-reduce works. The map-reduce itself is run with the following code:

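mongoose.connection.db.executeDbCommand(command, function(err, dbres) {
  if (err) { return console.error(err); } // the map-reduce command failed
  // If you need to alert users, etc. that the map-reduce has run, put that code here.
  // Query 'pingjar' from inside this callback so it only runs after the map-reduce finishes.
});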
command is defined as follows:

var command = {
  mapreduce: "pings", // the collection to map-reduce; mongoose stores the Ping model defined above in the lowercased, pluralized collection 'pings'
  query: { 'active.end': { $gt: new Date() } }, // only map-reduce documents whose active.end is in the future (filters on a field other than the map-reduced one)
  map: urlMap.toString(), // the map function, defined next
  reduce: urlReduce.toString(), // the reduce function, defined next
  sort: { url: 1 }, // sort the input ascending by the emit key so fewer reduce calls are needed; the sort key must be indexed
  out: "pingjar" // the collection that will hold the results; use a different collection than the map-reduce input
};
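The query and sort options act on the input documents before map is ever called. A minimal sketch of what they select, assuming three hypothetical Ping documents and a fixed "current time" so the result is deterministic (the dates are made up for illustration):

```javascript
// Hypothetical sample documents shaped like PingSchema (url + active.start/end).
const now = new Date("2024-01-01T00:00:00Z"); // fixed stand-in for new Date()
const pings = [
  { url: "www.yahoo.com",  active: { start: new Date("2023-01-01"), end: new Date("2025-01-01") } },
  { url: "www.msn.com",    active: { start: new Date("2023-01-01"), end: new Date("2023-06-01") } }, // already expired
  { url: "www.google.com", active: { start: new Date("2023-01-01"), end: new Date("2024-06-01") } },
];

// query: { 'active.end': { $gt: now } } keeps only documents whose window ends after now
const selected = pings.filter((p) => p.active.end > now);

// sort: { url: 1 } orders the selected input ascending by url, the emit key
selected.sort((a, b) => (a.url < b.url ? -1 : a.url > b.url ? 1 : 0));

console.log(selected.map((p) => p.url)); // [ 'www.google.com', 'www.yahoo.com' ]
```

Only the documents that survive this step are passed to the map function; the expired www.msn.com entry never reaches it.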

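Next we define the functions urlMap and urlReduce. Define them before building command, because command calls their toString() when it is created: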

var urlMap = function() { // map function: MongoDB calls it once per input document, with 'this' set to that document
  emit(this.url, 1); // send the url as the key and 1 as the value to the reduce step
};

var urlReduce = function(key, values) { // reduce function: receives one key and the array of values emitted for it
  var count = 0;
  for (var i = 0; i < values.length; i++) { // 'values' can hold many 1s, or partial counts from an earlier reduce pass
    count += values[i]; // add each value to the running total
  }
  return count;
};
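To see how MongoDB drives these two functions, here is a minimal in-memory sketch. It only mimics the documented contract: map runs with this bound to each document and calls emit(key, value), reduce(key, values) receives the values emitted for one key, and a key with a single value is not reduced. runMapReduce and the temporary global emit shim are hypothetical illustrations, not MongoDB's implementation.

```javascript
// Minimal in-memory imitation of a map-reduce run (NOT MongoDB's implementation).

const urlMap = function () {
  emit(this.url, 1); // `emit` is supplied by runMapReduce below
};

const urlReduce = function (key, values) {
  let count = 0;
  for (let i = 0; i < values.length; i++) {
    count += values[i]; // values may be 1s or partial counts from an earlier reduce
  }
  return count;
};

// Hypothetical driver: groups emitted values by key, then reduces each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values, in first-emit order
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc); // `this` inside map is the document
  } finally {
    delete globalThis.emit; // remove the temporary global shim
  }
  const out = [];
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key that has only one value
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value: value }); // same shape as the documents in 'pingjar'
  }
  return out;
}

const docs = [
  "www.yahoo.com", "www.msn.com", "www.google.com",
  "www.yahoo.com", "www.yahoo.com", "www.msn.com",
].map((url) => ({ url: url }));

const result = runMapReduce(docs, urlMap, urlReduce);
for (const { _id, value } of result) console.log(_id, value);
// www.yahoo.com 3
// www.msn.com 2
// www.google.com 1

// Re-reduce: feeding reduce its own earlier output must give the same total.
console.log(urlReduce("www.yahoo.com", [urlReduce("www.yahoo.com", [1, 1]), 1])); // 3
```

The final call shows why urlReduce has to tolerate partial counts: MongoDB may call reduce again with an earlier reduce result as one of the values, so the function must return the same total however the values are split.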

If everything runs smoothly, a new collection, 'pingjar', is created holding the map-reduce results. Mongoose offers no model for this collection, so we read it with the native MongoDB driver commands:

mongoose.connection.db.collection('pingjar', function(err, collection) { // query the new map-reduced collection
  collection.find({}).sort({'value': -1}).limit(10).toArray(function(err, pings) { // only pull in the top 10 results, sorted descending by number of pings
    res.render('home', { // tell Express to render the page with the database results (pings) and the title "PingJar"
      'title': 'PingJar',
      'pings': pings
    });
  });
});
The resulting 'pings' array contains:

{ "_id" : "www.yahoo.com", "value" : 3 }
{ "_id" : "www.msn.com", "value" : 2 }
{ "_id" : "www.google.com", "value" : 1 }
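Each output document stores the map key in _id and the reduced count in value, which is why the query sorts on 'value'. A small sketch of the find/sort/limit chain applied to in-memory copies of those three documents; topN is a hypothetical helper, not a driver method.

```javascript
// In-memory copies of the 'pingjar' documents, deliberately in ascending order.
const pingjar = [
  { _id: "www.google.com", value: 1 },
  { _id: "www.msn.com", value: 2 },
  { _id: "www.yahoo.com", value: 3 },
];

// Hypothetical equivalent of find({}).sort({ value: -1 }).limit(n)
function topN(docs, n) {
  return docs
    .slice()                           // copy so the input array is left untouched
    .sort((a, b) => b.value - a.value) // value: -1 means descending
    .slice(0, n);                      // limit(n)
}

for (const doc of topN(pingjar, 10)) {
  console.log(doc._id, doc.value);
}
// www.yahoo.com 3
// www.msn.com 2
// www.google.com 1
```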

Some readers may wonder why I didn't use the mongoDB group command: I only needed a count of each url. If you would rather use group, you can run the following code:

var command = {
  'group': { // mongodb group command
    'ns': 'pings', // the collection to query
    'cond': { 'active.end': { $gt: new Date() } }, // active.end must be in the future
    'initial': { 'count': 0 }, // initial values for each group's accumulator
    '$reduce': 'function(doc, out) { out.count++ }', // called once per matching document; increments that group's count
    'key': { 'url': 1 } // fields to group by
  }
};

mongoose.connection.db.executeDbCommand(command, function(err, dbres) {
  if (err) { return console.error(err); } // the group command failed
  var ret = dbres.documents[0].retval; // retval holds the array of grouped results
  for (var i = 0; i < ret.length; i++) {
    console.log(ret[i]);
  }
});
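The group command's options map onto a simple loop: for each distinct key, start from a copy of initial, skip documents that fail cond, and call $reduce(doc, out) once per remaining document. A minimal in-memory sketch of those semantics, assuming hypothetical sample documents that are all still active; groupSketch is an illustration, not MongoDB's code.

```javascript
// In-memory sketch of the group command's semantics (not MongoDB's implementation).
function groupSketch(docs, { keyField, cond, initial, reduce }) {
  const groups = new Map(); // key value -> accumulator object
  for (const doc of docs) {
    if (!cond(doc)) continue;                  // like 'cond'
    const k = doc[keyField];
    if (!groups.has(k)) {
      groups.set(k, { [keyField]: k, ...initial }); // like 'key' + a copy of 'initial'
    }
    reduce(doc, groups.get(k));                // like '$reduce'
  }
  return Array.from(groups.values());          // like 'retval'
}

const now = new Date("2024-01-01T00:00:00Z");    // fixed stand-in for new Date()
const future = new Date("2025-01-01T00:00:00Z"); // hypothetical active.end for every doc
const docs = [
  "www.yahoo.com", "www.msn.com", "www.google.com",
  "www.yahoo.com", "www.yahoo.com", "www.msn.com",
].map((url) => ({ url: url, active: { end: future } }));

const retval = groupSketch(docs, {
  keyField: "url",
  cond: (doc) => doc.active.end > now,
  initial: { count: 0 },
  reduce: (doc, out) => { out.count++; },
});

for (const row of retval) console.log(row.url, row.count);
// www.yahoo.com 3
// www.msn.com 2
// www.google.com 1
```

Unlike map-reduce, the accumulator here is mutated in place per document, so there is no separate re-reduce step to keep consistent.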