MapReduce with MongoDB and Python[ZT]
Hi all,
- SQL is not used as the query API (interfaces are built instead around formats such as JSON and BSON).
- No guarantee of atomic operations.
- Distributed and horizontally scalable.
- No need to predefine schemas (schema-less).
- Non-tabular data storage (e.g. key-value, object, graph).
Map-Reduce
PyMongo
Map-Reduce in Action
- The MapReduce engine may invoke reduce functions iteratively; thus, these functions must be idempotent. That is, the following must hold for your reduce function:
for all k,vals : reduce( k, [reduce(k,vals)] ) == reduce(k,vals)
- Currently, the return value from a reduce function cannot be an array (it's typically an object or a number)
- If you need to perform an operation only once, use a finalize function.
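The idempotence requirement above can be checked outside MongoDB. Below is a minimal Python sketch, not the article's original code: `reduce_counts` is a hypothetical stand-in for a word-count reduce function, and the sample values are made up. It returns an object (a dict) rather than an array, with the same shape as each input value, so its output can be fed back into it.

```python
from typing import Dict, List


def reduce_counts(key: str, values: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum partial counts; the result has the same shape as each input value."""
    return {"count": sum(v["count"] for v in values)}


# Hypothetical intermediate values emitted for one key.
vals = [{"count": 1}, {"count": 1}, {"count": 3}]

once = reduce_counts("mongodb", vals)
twice = reduce_counts("mongodb", [once])

# reduce(k, [reduce(k, vals)]) == reduce(k, vals)
assert once == twice

# Re-reducing a partial result together with the remaining values
# must give the same answer as reducing everything in one pass.
partial = reduce_counts("mongodb", vals[:2])
assert reduce_counts("mongodb", [partial, vals[2]]) == once
```

A reduce function that, say, counted the number of values instead of summing their `count` fields would fail the first assertion, which is the kind of bug this rule guards against.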
As you can see, the 'this' variable refers to the context from which the function is called. MongoDB calls the map function on each document in the collection we are querying, with 'this' pointing to that document, so a field such as 'text' can be read as this.text. The map function doesn't return a list; instead, it calls an emit function, which it expects to be defined. The parameters of this function (key, value) are grouped with the intermediate results of other map evaluations that share the same key, giving (key, [value1, value2]), and passed to the reduce function that we will define now.
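To make the emit-and-group flow concrete, here is a small pure-Python simulation. It is only a sketch of the model, not how MongoDB is implemented: `run_map_reduce`, `word_map` and `word_reduce` are hypothetical names, the document is passed as an argument instead of being bound to 'this', and the two sample documents are invented.

```python
import re
from collections import defaultdict


def word_map(doc, emit):
    """Emit (word, {"count": 1}) for every word in the document's 'text' field."""
    for word in re.findall(r"\w+", doc.get("text", "")):
        emit(word.lower(), {"count": 1})


def word_reduce(key, values):
    """Combine all values emitted for one key into a single value."""
    return {"count": sum(v["count"] for v in values)}


def run_map_reduce(docs, map_fn, reduce_fn):
    grouped = defaultdict(list)

    def emit(key, value):
        # Values emitted under the same key end up in the same list.
        grouped[key].append(value)

    for doc in docs:
        map_fn(doc, emit)

    # Each (key, [value1, value2, ...]) pair is handed to the reduce function.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


docs = [{"text": "to be or not to be"}, {"text": "be quick"}]
result = run_map_reduce(docs, word_map, word_reduce)
```

Here `result["be"]` is `{"count": 3}`, because "be" is emitted twice by the first document and once by the second.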
Now let's code our word count example using the PyMongo client and passing the map/reduce functions to the server.
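What follows is a hedged sketch of such a client, not the article's original listing. The connection URI, the database name `test` and the collection name `texts` are assumptions; only the `text` field comes from the discussion above. Because `Collection.map_reduce` was removed in PyMongo 4.0, the sketch sends the `mapReduce` database command directly; that command is itself deprecated since MongoDB 5.0, so treat this as an illustration for servers that still accept it.

```python
# JavaScript run by the server; 'this' is the current document.
MAP_JS = r"""
function () {
    var words = (this.text || "").match(/\w+/g);
    if (words === null) { return; }
    for (var i = 0; i < words.length; i++) {
        emit(words[i].toLowerCase(), { count: 1 });
    }
}
"""

# Returns an object with the same shape as the emitted values.
REDUCE_JS = """
function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    return { count: total };
}
"""


def word_count(uri="mongodb://localhost:27017", db_name="test", coll_name="texts"):
    """Run the word count on the server and return a {word: count} dict."""
    # Imported here so the definitions above load even without PyMongo installed.
    from bson.code import Code
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        reply = client[db_name].command(
            "mapReduce",
            coll_name,
            map=Code(MAP_JS),
            reduce=Code(REDUCE_JS),
            out={"inline": 1},  # return results in the reply, not in a collection
        )
        return {row["_id"]: int(row["value"]["count"]) for row in reply["results"]}
    finally:
        client.close()


# Usage, assuming a MongoDB server is reachable at the default address:
# counts = word_count()
# for word, count in sorted(counts.items(), key=lambda kv: -kv[1]):
#     print(word, count)
```

Inline output keeps the example short, but it is limited by the maximum BSON document size; for large result sets, `out` would name a target collection instead.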
With map-reduce, the word frequency count runs inside the database, and in a distributed (sharded) environment the work can be split across servers. This brief experiment shows the potential of the map-reduce model for distributed computing, especially on large data sets.
All code used in this article can be downloaded here.
My next posts will be about performance evaluation of machine learning techniques. Stay tuned!
Marcel Caraciolo
Reposted from: https://www.cnblogs.com/zhengyun_ustc/archive/2010/08/22/1805849.html