MongoDB新的数据统计框架介绍

　　目前的MongoDB在进行复杂的数据统计计算时都需要写MapReduce来实现，包括在SQL中比较常用的group by查询也需要写一个reduce才能实现，这是比较麻烦的。在MongoDB2.1中，将会引入一套全新的数据统计计算框架，让用户更方便的进行统计操作。

　　下面我们就来看看几个新的操作符：

　　$match

　　$match的作用是过滤数据，通过设置一个条件，将数据进行筛选过滤，例子：

db.runCommand({ aggregate : “article”, pipeline : [
{ $match : { author : “dave” } }
]});

　　这相当于将article这个collection中的记录进行筛选，筛选条件是author属性值为dave，其作用其实相当于普通的find命令，如：

> db.article.find({ author : “dave” });

　　所以，那这个命令有什么用呢?与find不同，find的结果是直接作为最终数据返回，而$match只是pipeline中的一环，它筛选的结果数据可以再进行下一级的统计操作。

　　$project

　　$project命令用于设定数据的筛选字段，就像我们SQL中select需要的字段一样。例子：

db.runCommand({ aggregate : “article”, pipeline : [
    { $match : { author : “dave” } },
    { $project : {
        _id : 0,
author : 1,
        tags : 1
    }}
]});

　　上面就是将所有author为dave的记录的author和tags两个字段取出来。(_id:0 表示去掉默认会返回的_id字段)

　　其实上面这个功能也能用我们平时用的find命令来实现，如：

> db.article.find({ author : “dave” }, { _id : 0, author : 1, tags : 1);

　　$unwind

　　$unwind命令很神奇，他可以将某一个为array类型字段的数据拆分成多条，每一条包含array中的一个属性。

　　比如你使用下面命令添加一条记录：

db.article.save( {
    title : “this is your title” ,
    author : “dave” ,
    posted : new Date(4121381470000) ,
    pageViews : 7 ,
    tags : [ “fun” , “nasty” ] ,
    comments : [
        { author :”barbara” , text : “this is interesting” } ,
        { author :”jenny” , text : “i like to play pinball”, votes: 10 }
    ],
    other : { bar : 14 }
});

　　这里面tags字段就是一个array。下面我们在这个字段上应用$unwind操作

　　db.runCommand({ aggregate : “article”, pipeline : [
　　{ $unwind : “$tags” }
　　]});

　　上面命令的意思就是按tags字段来拆分，此命令执行的结果如下：

{
        “result” : [
                {
                        “_id” : ObjectId(“4eeeb5fef09a7c9170df094b”),
                        “title” : “this is your title”,
                        “author” : “dave”,
                        “posted” : ISODate(“2100-08-08T04:11:10Z”),
                        “pageViews” : 7,
                        “tags” : “fun”,
                        “comments” : [
                                {
                                        “author” : “barbara”,
                                        “text” : “this is interesting”
                                },
                                {
                                        “author” : “jenny”,
                                        “text” : “i like to play pinball”,
                                        “votes” : 10
                                }
                        ],
                        “other” : {
                                “bar” : 14
                        }
                },
                {
                        “_id” : ObjectId(“4eeeb5fef09a7c9170df094b”),
                        “title” : “this is your title”,
                        “author” : “dave”,
                        “posted” : ISODate(“2100-08-08T04:11:10Z”),
                        “pageViews” : 7,
                        “tags” : “nasty”,
                        “comments” : [
                                {
                                        “author” : “barbara”,
                                        “text” : “this is interesting”
                                },
                                {
                                        “author” : “jenny”,
                                        “text” : “i like to play pinball”,
                                        “votes” : 10
                                }
                        ],
                        “other” : {
                                “bar” : 14
                        }
                }
        ],
        “ok” : 1
}

　　我们可以看到，原来的tags字段是一个包含两个元素的数组，通过$unwind命令后，被拆分成两条记录，每一条记录的tags字段是原来数组中的一个元素。

　　$group

　　$group命令比较好理解，功能就是按某一个key将key值相同的多条数据组织成一条。

　　比如我们使用下面命令再往article这个collection中写入一条记录，这时候我们就有两条记录了：

db.article.save( {
    title : “this is some other title” ,
    author : “jane” ,
    posted : new Date(978239834000) ,
    pageViews : 6 ,
    tags : [ “nasty” , “filthy” ] ,
    comments : [
        { author :”will” , text : “i don’t like the color” } ,
        { author :”jenny” , text : “can i get that in green?” }
    ],
    other : { bar : 14 }
});

　　我们可以先用上面的$unwind按tags将记录拆成多条，然后再将记录按tags字段重新组织，将同一个tag对应的所有author放在一个array中。只需要像下面这样写：

db.runCommand({ aggregate : “article”, pipeline : [
    { $unwind : “$tags” },
    { $group : {
_id : “$tags”,
        count : { $sum : 1 },
authors : { $addToSet : “$author” }
    }}
]});

　　这时候你就能得到如下结果了

{
        “result” : [
                {
                        “_id” : “filthy”,
                        “count” : 1,
                        “authors” : [
                                “jane”
                        ]
                },
                {
                        “_id” : “fun”,
                        “count” : 1,
                        “authors” : [
                                “dave”
                        ]
                },
                {
                        “_id” : “nasty”,
                        “count” : 2,
                        “authors” : [
                                “jane”,
                                “dave”
                        ]
                }
        ],
        “ok” : 1
}

　　上面是2.1版本将会推出的一些新的统计类命令的介绍，在易用性方面它们提供给我们很多便利，但是MongoDB MapReduce的最大硬伤，单个mongod中无法并行执行，貌似还是没有解决。虽然其命令中采用了pipeline 的组织模式，但是貌似还是完全串行且分降段完成的。

我们一直都在努力坚持原创.......请不要一声不吭，就悄悄拿走。

我原创，你原创，我们的内容世界才会更加精彩！

【所有原创内容版权均属TechTarget，欢迎大家转发分享。但未经授权，严禁任何媒体（平面媒体、网络媒体、自媒体等）以及微信公众号复制、转载、摘编或以其他方式进行使用。】

微信公众号

TechTarget

官方微博

TechTarget中国

取消回复

作者

: nosqlfan

MongoDB新的数据统计框架介绍

取消回复

作者

nosqlfan

相关推荐

MongoDB与Cassandra数据库对比

OpenWorld18大会：Ellison宣布数据库的搜寻和破坏任务

eHarmony公司利用Redis NoSQL数据库进行热存储

ObjectRocket着力发展Azure MongoDB服务