Contents

  • MIT 6.824 Distributed Systems Lab 1: MapReduce
  • Notes
  • What wordcount's MapReduce model looks like
  • A simple sequential MapReduce implementation (mrsequential.go)

MIT 6.824 Distributed Systems Lab 1: MapReduce

Lab handout: https://pdos.csail.mit.edu/6.824/labs/lab-mr.html
MapReduce paper (via the course schedule): https://pdos.csail.mit.edu/6.824/schedule.html
Zhihu notes: zhuanlan.zhihu.com/p/54243727
Blog: https://www.cnblogs.com/haoweizh/p/10395016.html

Notes

Word-count module: the lab handout says, "We also provide you with a couple of MapReduce applications: word-count in mrapps/wc.go". The lab also supplies a simple sequential MapReduce implementation (mrsequential.go).

[kou@python mrapps]$ pwd
/home/kou/MIT/6.824-golabs-2020/src/mrapps
[kou@python mrapps]$ ls
wc.go

What wordcount's MapReduce model looks like

[kou@python mrapps]$ cat wc.go
package main

//
// a word-count application "plugin" for MapReduce.
//
// go build -buildmode=plugin wc.go
//

import "../mr"
import "unicode"
import "strings"
import "strconv"

//
// The map function is called once for each file of input. The first
// argument is the name of the input file, and the second is the
// file's complete contents. You should ignore the input file name,
// and look only at the contents argument. The return value is a slice
// of key/value pairs.
//
func Map(filename string, contents string) []mr.KeyValue {
	// function to detect word separators.
	ff := func(r rune) bool { return !unicode.IsLetter(r) }

	// split contents into an array of words.
	words := strings.FieldsFunc(contents, ff)

	kva := []mr.KeyValue{}
	for _, w := range words {
		kv := mr.KeyValue{w, "1"}
		kva = append(kva, kv)
	}
	return kva
}

//
// The reduce function is called once for each key generated by the
// map tasks, with a list of all the values created for that key by
// any map task.
//
func Reduce(key string, values []string) string {
	// return the number of occurrences of this word.
	return strconv.Itoa(len(values))
}
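
To make the data flow concrete, here is a minimal, self-contained sketch of what Map and Reduce do to a short string. It is not the lab's code: the local KeyValue type is an assumed stand-in for mr.KeyValue, the sample sentence is made up, and the values for one key are gathered by hand in main, which is the step a MapReduce framework would normally perform.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is a local stand-in for mr.KeyValue (assumption: two string fields).
type KeyValue struct {
	Key   string
	Value string
}

// Map emits one {word, "1"} pair per word, in the same way wc.go does.
func Map(filename string, contents string) []KeyValue {
	notLetter := func(r rune) bool { return !unicode.IsLetter(r) }
	kva := []KeyValue{}
	for _, w := range strings.FieldsFunc(contents, notLetter) {
		kva = append(kva, KeyValue{Key: w, Value: "1"})
	}
	return kva
}

// Reduce returns how many values were collected for the key.
func Reduce(key string, values []string) string {
	return strconv.Itoa(len(values))
}

func main() {
	kva := Map("sample.txt", "the cat saw the dog")
	fmt.Println(kva) // [{the 1} {cat 1} {saw 1} {the 1} {dog 1}]

	// Gather every value emitted for "the"; the framework normally does this.
	values := []string{}
	for _, kv := range kva {
		if kv.Key == "the" {
			values = append(values, kv.Value)
		}
	}
	fmt.Println("the", Reduce("the", values)) // the 2
}
```

Note that Reduce never parses the "1" strings. It only counts them, so the count for a word equals the number of pairs Map emitted for it.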

A simple sequential MapReduce implementation (mrsequential.go)

mrsequential.go is a simple sequential MapReduce implementation. It reads its input from the text files named pg-xxx.txt and leaves its output in the file mr-out-0.

[kou@python main]$ cat mrsequential.go
package main

//
// simple sequential MapReduce.
//
// go run mrsequential.go wc.so pg*.txt
//

import "fmt"
import "../mr"
import "plugin"
import "os"
import "log"
import "io/ioutil"
import "sort"

// for sorting by key.
type ByKey []mr.KeyValue

// for sorting by key.
func (a ByKey) Len() int           { return len(a) }
func (a ByKey) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }
func (a ByKey) Less(i, j int) bool { return a[i].Key < a[j].Key }

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintf(os.Stderr, "Usage: mrsequential xxx.so inputfiles...\n")
		os.Exit(1)
	}

	mapf, reducef := loadPlugin(os.Args[1]) // os.Args[1] wc.so

	//
	// read each input file,
	// pass it to Map,
	// accumulate the intermediate Map output.
	//
	intermediate := []mr.KeyValue{}
	for _, filename := range os.Args[2:] { // os.Args[2:] pg*.txt
		file, err := os.Open(filename)
		if err != nil {
			log.Fatalf("cannot open %v", filename)
		}
		content, err := ioutil.ReadAll(file)
		if err != nil {
			log.Fatalf("cannot read %v", filename)
		}
		file.Close()
		kva := mapf(filename, string(content))
		intermediate = append(intermediate, kva...)
	}

	//
	// a big difference from real MapReduce is that all the
	// intermediate data is in one place, intermediate[],
	// rather than being partitioned into NxM buckets.
	//

	sort.Sort(ByKey(intermediate))

	oname := "mr-out-0"
	ofile, _ := os.Create(oname)

	//
	// call Reduce on each distinct key in intermediate[],
	// and print the result to mr-out-0.
	//
	i := 0
	for i < len(intermediate) {
		j := i + 1
		for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
			j++
		}
		values := []string{}
		for k := i; k < j; k++ {
			values = append(values, intermediate[k].Value)
		}
		output := reducef(intermediate[i].Key, values)

		// this is the correct format for each line of Reduce output.
		fmt.Fprintf(ofile, "%v %v\n", intermediate[i].Key, output)

		i = j
	}

	ofile.Close()
}

//
// load the application Map and Reduce functions
// from a plugin file, e.g. ../mrapps/wc.so
//
func loadPlugin(filename string) (func(string, string) []mr.KeyValue, func(string, []string) string) {
	p, err := plugin.Open(filename)
	if err != nil {
		log.Fatalf("cannot load plugin %v", filename)
	}
	xmapf, err := p.Lookup("Map")
	if err != nil {
		log.Fatalf("cannot find Map in %v", filename)
	}
	mapf := xmapf.(func(string, string) []mr.KeyValue)
	xreducef, err := p.Lookup("Reduce")
	if err != nil {
		log.Fatalf("cannot find Reduce in %v", filename)
	}
	reducef := xreducef.(func(string, []string) string)

	return mapf, reducef
}
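
The comment in main points out that the sequential version keeps all intermediate data in one slice instead of partitioning it into NxM buckets. The following is a rough sketch of what that partitioning step could look like for a single map task's output. Everything here is an assumption for illustration: the local KeyValue type stands in for mr.KeyValue, and the names ihash, partition and nReduce are hypothetical helpers (an FNV-based hash is one common choice, not necessarily the one the distributed version must use).

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// KeyValue is a local stand-in for mr.KeyValue (assumption).
type KeyValue struct {
	Key   string
	Value string
}

// ihash maps a key to a non-negative int using FNV-1a (hypothetical helper).
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// partition splits one map task's output into nReduce buckets, so that
// every pair with the same key lands in the same bucket.
func partition(kva []KeyValue, nReduce int) [][]KeyValue {
	buckets := make([][]KeyValue, nReduce)
	for _, kv := range kva {
		r := ihash(kv.Key) % nReduce
		buckets[r] = append(buckets[r], kv)
	}
	return buckets
}

func main() {
	kva := []KeyValue{{"the", "1"}, {"cat", "1"}, {"the", "1"}}
	for r, b := range partition(kva, 3) {
		fmt.Printf("reduce task %d gets %d pairs\n", r, len(b))
	}
}
```

With N map tasks each producing M = nReduce buckets, reduce task r would read bucket r from every map task, then sort and group by key just as the sequential code does over intermediate[].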
