A note up front: this article is for testing and learning only. I do not recommend 2.0 in production, because 2.0 adopts YARN, and anything that depends on MapReduce v1 (Hive, HBase, Mahout, and so on) may not run on Hadoop 2.0, or may misbehave.

On May 23, Apache released the alpha of Hadoop 2.0. Stuck at home with nothing better to do, I took MapReduce v2 for a short test drive.

Environment: an Ubuntu Server 12.04 guest in VirtualBox, with OpenJDK 7.
A quick introduction: 2.0.0 evolved from Hadoop 0.23.x. The JobTracker and TaskTracker are gone, or rather, their roles have been folded into containers, and YARN replaces the original MapReduce framework.
YARN is billed as second-generation MapReduce: faster than the first generation and able to run on larger clusters. Hadoop 0.20.x and the 1.0.x line that grew out of it are recommended for clusters of around 3,000 nodes, topping out near 4,000, whereas Hadoop 2.0 with YARN claims support for 6,000 to 10,000 nodes and up to 200,000 CPU cores. In cluster size and raw computing capacity, that appears to be a substantial step up. It also adds NameNode HA, that is, high availability. I say "appears" because I have not measured the speed in a real production environment; and since this is a virtual-machine test, I only glanced at NameNode HA rather than actually exercising it.
The 2.0 directory layout has changed from 1.0 and is clearer: client executables sit in bin/, server start scripts have moved to sbin/, and the MapReduce, streaming, and pipes JARs live under share/. Everything is easy to find.
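For orientation, the unpacked tree looks roughly like this; the /opt/hadoop prefix is the one used in the commands later in this post, and the one-line descriptions are my own summary rather than an official listing:

ls /opt/hadoop
#bin  etc  include  lib  libexec  sbin  share
#bin/   - hadoop, hdfs, yarn and the other client executables
#etc/   - configuration files (etc/hadoop/*.xml)
#sbin/  - daemon start/stop scripts such as hadoop-daemon.sh and yarn-daemon.sh
#share/ - jars, e.g. share/hadoop/tools/lib/hadoop-streaming-2.0.0-alpha.jar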
After unpacking the archive, go into etc/hadoop/ and set up the usual single-node configuration files. core-site.xml and hdfs-site.xml are still there, but mapred-site.xml is gone, replaced by yarn-site.xml.
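As a reference point, a minimal single-node configuration might look like the sketch below. The hdfs://localhost:9000 address is the one that shows up in the job log at the end of this post; the remaining property names and values are my assumptions based on 2.0.x-era defaults, so treat them as a starting point rather than the definitive settings:

<!-- core-site.xml; fs.defaultFS is the 2.0 name, the old fs.default.name still works but warns as deprecated -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: one replica is enough on a single node -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- yarn-site.xml: register the shuffle service that MapReduce jobs need -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>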
Assuming the single-node configuration is in place, go into $HADOOP_HOME/bin/
and run the following:
./hadoop namenode -format
#format the namenode first
cd ../sbin/
#enter sbin/, where all the server start scripts now live
./hadoop-daemon.sh start namenode
./hadoop-daemon.sh start datanode
./hadoop-daemon.sh start secondarynamenode
#the secondary namenode is optional and nothing breaks without it, but it is handy for trying the HA feature
#the next part matters: 2.0 drops the jobtracker and tasktracker in favor of YARN, so anything like 'start jobtracker' will fail
#also, the hadoop, hdfs, and map/reduce roles now have separate scripts, so hadoop-daemon.sh can no longer start everything
./yarn-daemon.sh start resourcemanager
#the resourcemanager takes over from the old jobtracker: it allocates compute resources and can share a box with the namenode
./yarn-daemon.sh start nodemanager
#the nodemanager takes over from the old tasktracker; start one on every datanode (slave) server
Run ps aux: if you see the Java processes for the daemons you started (four of them, five if you also started the secondary namenode), startup succeeded. Visit http://localhost:50070 to check on HDFS. And since the jobtracker is gone, there is no longer a port 50030 page for watching jobs; more on that another time.
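The JDK's jps tool gives a quicker check than ps aux; on this setup it should print something close to the following (the PIDs are made up, the class names are what the daemons above register as):

jps
#2312 NameNode
#2398 DataNode
#2481 SecondaryNameNode
#2554 ResourceManager
#2639 NodeManager
#2701 Jps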
Now for a first MapReduce v2 program. In terms of how the code is written there is no difference from v1; only the way the job is launched changes. For compatibility, Hadoop 2.0 keeps the user-facing interface the same as before.
Take a data set like this:
20120503        04      2012-05-03 04:49:22                     222.139.35.72   Log_ASF ProductVer="5.12.0425.2111"
20120503        04      2012-05-03 04:49:21                     113.232.38.239  Log_ASF ProductVer="5.09.0119.1112"
Assume just these two distinct lines, 20 records in total.
As before, the map and reduce scripts are written in Python:
#!/usr/bin/python
#-*- encoding:UTF-8 -*-
#map.py
import sys

debug = True
if debug:
    lzo = 0
else:
    lzo = 1

count = '0'
for line in sys.stdin:
    try:
        flags = line[:-1].split('\t')
        if len(flags) == 0:
            break
        # a valid record has 6 tab-separated fields (one more when an
        # lzo index column shifts everything), since we read field
        # 5+lzo below
        if len(flags) != 6 + lzo:
            continue

        stat_date = flags[2 + lzo].split(' ')[0]
        version = flags[5 + lzo].split('"')[1]

        # emit "date,version" as the key; the value is a placeholder,
        # since the reducer just counts occurrences of each key
        print stat_date + ',' + version + '\t' + count
    except Exception, e:
        print e

------------------------------------------------------------------

#!/usr/bin/python
#-*- encoding:UTF-8 -*-
#reduce.py
import sys

# dictionary mapping each "date,version" key to its occurrence count
res = {}

for line in sys.stdin:
    try:
        flags = line[:-1].split('\t')
        if len(flags) != 2:
            continue
        field_key = flags[0]
        if field_key not in res:
            res[field_key] = 0
        res[field_key] += 1
    except Exception:
        pass

for key in res.keys():
    print key + ',' + '%s' % (res[key])
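
Since streaming just pipes records through stdin/stdout, the two scripts can be smoke-tested locally with an ordinary shell pipeline before going anywhere near the cluster; the sort stands in for the shuffle phase, and I am assuming the scripts are saved as /opt/hadoop/mrs/map.py and /opt/hadoop/mrs/red.py, the paths the job commands below use:

chmod +x /opt/hadoop/mrs/map.py /opt/hadoop/mrs/red.py
cat /root/asf | /opt/hadoop/mrs/map.py | sort | /opt/hadoop/mrs/red.py
#should print the same date,version,count lines as the real job does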

Then copy the sample data onto HDFS:
./hadoop fs -mkdir /tmp
./hadoop fs -copyFromLocal /root/asf /tmp/asf
Run a test; it works just like the old Hadoop did, and either of the two streaming entry points works:
./hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.0.0-alpha.jar -mapper /opt/hadoop/mrs/map.py -reducer /opt/hadoop/mrs/red.py -input /tmp/asf -output /asf

or

./yarn jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.0.0-alpha.jar -mapper /opt/hadoop/mrs/map.py -reducer /opt/hadoop/mrs/red.py -input /tmp/asf -output /asf
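
Here both scripts are referenced by absolute local paths, which is fine on a single node where the job runs in the LocalJobRunner (visible in the log below). On a real multi-node cluster the scripts have to travel with the job; a sketch of that, using streaming's -file option to ship them, with the same paths as above:

./yarn jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.0.0-alpha.jar -file /opt/hadoop/mrs/map.py -file /opt/hadoop/mrs/red.py -mapper map.py -reducer red.py -input /tmp/asf -output /asf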

Then view the output file:
./hadoop fs -cat /asf/part-00000
2012-05-03,5.09.0119.1112,2
2012-05-03,5.12.0425.2111,18

The results are correct.
Appendix: the MapReduce v2 execution log:
root@localhost:/opt/hadoop/bin# ./yarn jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.0.0-alpha.jar -mapper /opt/hadoop/mrs/map.py -reducer /opt/hadoop/mrs/red.py -input /tmp/asf -output /asf
12/06/01 23:26:40 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
12/06/01 23:26:41 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
12/06/01 23:26:41 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/06/01 23:26:41 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/06/01 23:26:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/06/01 23:26:42 WARN snappy.LoadSnappy: Snappy native library not loaded
12/06/01 23:26:42 INFO mapred.FileInputFormat: Total input paths to process : 1
12/06/01 23:26:42 INFO mapreduce.JobSubmitter: number of splits:1
12/06/01 23:26:42 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
12/06/01 23:26:42 WARN conf.Configuration: mapred.create.symlink is deprecated. Instead, use mapreduce.job.cache.symlink.create
12/06/01 23:26:42 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
12/06/01 23:26:42 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
12/06/01 23:26:42 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
12/06/01 23:26:42 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/06/01 23:26:42 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
12/06/01 23:26:42 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
12/06/01 23:26:42 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
12/06/01 23:26:42 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
12/06/01 23:26:42 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
12/06/01 23:26:42 WARN mapred.LocalDistributedCacheManager: LocalJobRunner does not support symlinking into current working dir.
12/06/01 23:26:42 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
12/06/01 23:26:42 INFO mapreduce.Job: Running job: job_local_0001
12/06/01 23:26:42 INFO mapred.LocalJobRunner: OutputCommitter set in config null
12/06/01 23:26:42 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
12/06/01 23:26:42 INFO mapred.LocalJobRunner: Waiting for map tasks
12/06/01 23:26:42 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000000_0
12/06/01 23:26:42 INFO mapred.Task:    Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52b5ef94
12/06/01 23:26:42 INFO mapred.MapTask: numReduceTasks: 1
12/06/01 23:26:42 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
12/06/01 23:26:42 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
12/06/01 23:26:42 INFO mapred.MapTask: soft limit at 83886080
12/06/01 23:26:42 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
12/06/01 23:26:42 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
12/06/01 23:26:42 INFO streaming.PipeMapRed: PipeMapRed exec [/opt/hadoop/mrs/map.py]
12/06/01 23:26:42 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
12/06/01 23:26:42 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
12/06/01 23:26:42 WARN conf.Configuration: map.input.start is deprecated. Instead, use mapreduce.map.input.start
12/06/01 23:26:42 WARN conf.Configuration: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
12/06/01 23:26:42 WARN conf.Configuration: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
12/06/01 23:26:42 WARN conf.Configuration: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
12/06/01 23:26:42 WARN conf.Configuration: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
12/06/01 23:26:42 WARN conf.Configuration: map.input.length is deprecated. Instead, use mapreduce.map.input.length
12/06/01 23:26:42 WARN conf.Configuration: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
12/06/01 23:26:42 WARN conf.Configuration: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
12/06/01 23:26:42 WARN conf.Configuration: map.input.file is deprecated. Instead, use mapreduce.map.input.file
12/06/01 23:26:42 WARN conf.Configuration: mapred.job.id is deprecated. Instead, use mapreduce.job.id
12/06/01 23:26:43 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/06/01 23:26:43 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/06/01 23:26:43 INFO streaming.PipeMapRed: MRErrorThread done
12/06/01 23:26:43 INFO streaming.PipeMapRed: Records R/W=20/1
12/06/01 23:26:43 INFO streaming.PipeMapRed: mapRedFinished
12/06/01 23:26:43 INFO mapred.LocalJobRunner:
12/06/01 23:26:43 INFO mapred.MapTask: Starting flush of map output
12/06/01 23:26:43 INFO mapred.MapTask: Spilling map output
12/06/01 23:26:43 INFO mapred.MapTask: bufstart = 0; bufend = 560; bufvoid = 104857600
12/06/01 23:26:43 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214320(104857280); length = 77/6553600
12/06/01 23:26:43 INFO mapred.MapTask: Finished spill 0
12/06/01 23:26:43 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of committing
12/06/01 23:26:43 INFO mapred.LocalJobRunner: Records R/W=20/1
12/06/01 23:26:43 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/06/01 23:26:43 INFO mapred.LocalJobRunner: Finishing task: attempt_local_0001_m_000000_0
12/06/01 23:26:43 INFO mapred.LocalJobRunner: Map task executor complete.
12/06/01 23:26:43 INFO mapred.Task:    Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@25d71236
12/06/01 23:26:43 INFO mapred.Merger: Merging 1 sorted segments
12/06/01 23:26:43 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 574 bytes
12/06/01 23:26:43 INFO mapred.LocalJobRunner:
12/06/01 23:26:43 INFO streaming.PipeMapRed: PipeMapRed exec [/opt/hadoop/mrs/red.py]
12/06/01 23:26:43 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/01 23:26:43 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/06/01 23:26:43 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/06/01 23:26:43 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/06/01 23:26:43 INFO streaming.PipeMapRed: Records R/W=20/1
12/06/01 23:26:43 INFO streaming.PipeMapRed: MRErrorThread done
12/06/01 23:26:43 INFO streaming.PipeMapRed: mapRedFinished
12/06/01 23:26:43 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of committing
12/06/01 23:26:43 INFO mapred.LocalJobRunner:
12/06/01 23:26:43 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/06/01 23:26:43 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9000/asf/_temporary/0/task_local_0001_r_000000
12/06/01 23:26:43 INFO mapred.LocalJobRunner: Records R/W=20/1 > reduce
12/06/01 23:26:43 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/06/01 23:26:43 INFO mapreduce.Job: Job job_local_0001 running in uber mode : false
12/06/01 23:26:43 INFO mapreduce.Job:    map 100% reduce 100%
12/06/01 23:26:43 INFO mapreduce.Job: Job job_local_0001 completed successfully
12/06/01 23:26:43 INFO mapreduce.Job: Counters: 32
                File System Counters
                                FILE: Number of bytes read=205938
                                FILE: Number of bytes written=452840
                                FILE: Number of read operations=0
                                FILE: Number of large read operations=0
                                FILE: Number of write operations=0
                                HDFS: Number of bytes read=252230
                                HDFS: Number of bytes written=59
                                HDFS: Number of read operations=13
                                HDFS: Number of large read operations=0
                                HDFS: Number of write operations=4
                Map-Reduce Framework
                                Map input records=20
                                Map output records=20
                                Map output bytes=560
                                Map output materialized bytes=606
                                Input split bytes=81
                                Combine input records=0
                                Combine output records=0
                                Reduce input groups=2
                                Reduce shuffle bytes=0
                                Reduce input records=20
                                Reduce output records=2
                                Spilled Records=40
                                Shuffled Maps =0
                                Failed Shuffles=0
                                Merged Map outputs=0
                                GC time elapsed (ms)=12
                                CPU time spent (ms)=0
                                Physical memory (bytes) snapshot=0
                                Virtual memory (bytes) snapshot=0
                                Total committed heap usage (bytes)=396361728
                File Input Format Counters
                                Bytes Read=126115
                File Output Format Counters
                                Bytes Written=59
12/06/01 23:26:43 INFO streaming.StreamJob: Output directory: /asf
Of course, MapReduce v2 offers more than this, and it deserves deeper study. Although 2.0 grew out of 0.23, there are differences: 0.23 exposed an ApplicationManager, which 2.0 no longer seems to surface, perhaps because it too has been folded into the container layer. The XML configuration options also look quite different from 0.20.x, though I have not examined them closely yet. The HA feature supports multiple namenodes, with different namenodes managing different sets of datanodes, and allows a manual switch from one namenode to another. That provides high availability, and automatic failure detection and switchover is reportedly planned.
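
I did not exercise the failover itself, but for the record, the manual switch is driven through the hdfs haadmin tool; a sketch of the idea, where nn1 and nn2 are hypothetical namenode IDs taken from the HA configuration:

./hdfs haadmin -transitionToStandby nn1
./hdfs haadmin -transitionToActive nn2
#manually demote one namenode and promote the other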
