Hadoop Installation and Deployment (Pseudo-Distributed and Cluster)

@(HADOOP)[hadoop]

  • Hadoop installation and deployment (pseudo-distributed and cluster)
  • Part 1: Pseudo-distributed
    • I. Environment preparation
    • II. Installing HDFS
    • III. Installing YARN
  • Part 2: Cluster installation
    • I. Planning
      • (1) Hardware resources
      • (2) Basic information
    • II. Environment configuration
      • (1) Unify usernames/passwords and grant jediael permission to run all commands
      • (2) Create the directory /mnt/jediael
      • (3) Set the hostname and edit /etc/hosts
      • (4) Set up passwordless login
      • (5) Install Java on all 3 machines
      • (6) Download and unpack hadoop-2.6.0
    • III. Editing the configuration files
    • IV. Start and verify
    • V. Run a complete MapReduce job

Part 1: Pseudo-Distributed

I. Environment Preparation

1. Install Linux and the JDK.
2. Download hadoop-2.6.0 and unpack it.
3. Set up passwordless SSH.
(1) Check whether passwordless login already works:

$ ssh localhost

(2) If not, generate a key pair and authorize it:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
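
After this, ssh localhost should log in without prompting for a password; a quick re-test:

$ ssh localhost
$ exit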

4. Add the following to /etc/profile:

#hadoop setting
export PATH=$PATH:/mnt/jediael/hadoop-2.6.0/bin:/mnt/jediael/hadoop-2.6.0/sbin
export HADOOP_HOME=/mnt/jediael/hadoop-2.6.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
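
To apply the new variables in the current shell and sanity-check the install (hadoop version simply prints the build information):

$ source /etc/profile
$ hadoop version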

II. Installing HDFS

1. Configure etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

2. Configure etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

3. Format the namenode:

$ bin/hdfs namenode -format

4. Start HDFS:

$ sbin/start-dfs.sh

5. Open the web UI to verify that HDFS is up:
http://localhost:50070/
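
You can also verify from the command line: jps should list NameNode, DataNode, and SecondaryNameNode, and the dfsadmin report should show one live datanode:

$ jps
$ bin/hdfs dfsadmin -report
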
6. Run the bundled examples.
(1) Create directories:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/jediael

(2) Copy files:

$ bin/hdfs dfs -put etc/hadoop input

(3) Run the example:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'

(4) Check the output:

$ bin/hdfs dfs -cat output/*
6       dfs.audit.logger
4       dfs.class
3       dfs.server.namenode.
2       dfs.period
2       dfs.audit.log.maxfilesize
2       dfs.audit.log.maxbackupindex
1       dfsmetrics.log
1       dfsadmin
1       dfs.servers
1       dfs.replication
1       dfs.file

(5) Stop HDFS:

 $ sbin/stop-dfs.sh

III. Installing YARN

1. Configure etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

2. Configure etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

3. Start YARN:

$ sbin/start-yarn.sh

4. Open the web UI to check YARN:
http://localhost:8088/
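
As a quick cross-check from the shell, jps should now also list ResourceManager and NodeManager alongside the HDFS daemons:

$ jps
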
5. Run a MapReduce job:

$ bin/hadoop fs -mkdir /input
$ bin/hadoop fs -copyFromLocal /etc/profile /input
$ cd /mnt/jediael/hadoop-2.6.0/share/hadoop/mapreduce
$ /mnt/jediael/hadoop-2.6.0/bin/hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output

View the result:

$ /mnt/jediael/hadoop-2.6.0/bin/hadoop fs -cat /output/*
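
Note that re-running the job fails if /output already exists (standard MapReduce behavior); remove it first:

$ bin/hadoop fs -rm -r /output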

Part 2: Cluster Installation

I. Planning

(1) Hardware resources

10.171.29.191 master
10.171.94.155 slave1
10.251.0.197 slave3

(2) Basic information

User: jediael
Directory: /mnt/jediael/

II. Environment Configuration

(1) Unify usernames and passwords, and grant jediael permission to run all commands

# passwd
# useradd jediael
# passwd jediael
# vi /etc/sudoers

Add the following line:

jediael ALL=(ALL) ALL
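
A safer way to edit the sudoers file is visudo, which syntax-checks it before saving:

# visudo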

(2) Create the directory /mnt/jediael

$ sudo chown jediael:jediael /mnt
$ cd /mnt
$ mkdir jediael

Note: /mnt must be owned by jediael; otherwise formatting the namenode later will fail.
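
A quick ownership check (both entries should show jediael as owner and group):

$ ls -ld /mnt /mnt/jediael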

(3) Set the hostname and edit /etc/hosts

1. Edit /etc/sysconfig/network:

NETWORKING=yes
HOSTNAME=*******

2. Edit /etc/hosts:

10.171.29.191 master
10.171.94.155 slave1
10.251.0.197 slave3

Note: the hosts file must not map these hostnames to 127.0.0.1, or connections will fail with exceptions like: org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.171.29.191:9000. Already trie

3. Run the hostname command:

hostname ****

(4) Set up passwordless login
Run the following commands on master as the jediael user:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Then copy authorized_keys to slave1 and slave3:

scp ~/.ssh/authorized_keys slave1:~/.ssh/
scp ~/.ssh/authorized_keys slave3:~/.ssh/

Notes
(1) If ssh complains that the .ssh directory does not exist, the machine has simply never run ssh; run it once and the directory is created.
(2) ~/.ssh must have permission 700 and authorized_keys permission 600; sshd rejects the key if the permissions are either looser or stricter.
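
A minimal fix-up, to run on every node:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys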

(5) Install Java on each of the three machines and set the related environment variables.
See http://blog.csdn.net/jediael_lu/article/details/38925871

(6) Download hadoop-2.6.0.tar.gz and unpack it into /mnt/jediael
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar -zxvf hadoop-2.6.0.tar.gz

III. Editing the Configuration Files
[Do this on all 3 machines. In practice, finish the configuration on one machine first, then push it to the others with scp; see the sketch after the slaves file below.]
(1) hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_51

(2) core-site.xml

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/mnt/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>

(3) hdfs-site.xml

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

(4) mapred-site.xml

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>master:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://master:9001</value>
    </property>

(5) yarn-site.xml

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>

(6) slaves
Edit etc/hadoop/slaves:

slave1
slave3
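
With all six files in place, a minimal sketch for pushing the finished configuration from master to the slaves (assuming hadoop-2.6.0 was already unpacked to /mnt/jediael on every host):

for host in slave1 slave3; do
  scp -r /mnt/jediael/hadoop-2.6.0/etc/hadoop ${host}:/mnt/jediael/hadoop-2.6.0/etc/
done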

IV. Start and Verify

1. Format the namenode:

[jediael@master hadoop-2.6.0]$ bin/hadoop namenode -format

2. Start Hadoop [this step is needed on master only]:

[jediael@master hadoop-2.6.0]$ sbin/start-all.sh

3. Verification 1: write to HDFS:

[jediael@master hadoop-2.6.0]$ bin/hadoop fs -ls /
[jediael@master hadoop-2.6.0]$ bin/hadoop fs -mkdir /test
[jediael@master hadoop-2.6.0]$ bin/hadoop fs -ls /
Found 1 items
drwxr-xr-x   - jediael supergroup          0 2015-04-19 23:41 /test

4. Verification 2: log into the web UI
NameNode: http://ip:50070

5. Check the Java processes on each host:
(1) master:

$ jps
3694 NameNode
3882 SecondaryNameNode
7216 Jps
4024 ResourceManager

(2)slave1:

$ jps
1913 NodeManager
2673 Jps
1801 DataNode

(3)slave3:

$ jps
1942 NodeManager
2252 Jps
1840 DataNode
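
If a daemon is missing from the jps output, check its log under $HADOOP_HOME/logs; with the default naming scheme the file is hadoop-<user>-<daemon>-<hostname>.log (the exact path below is an assumption based on this install's layout):

$ tail -n 50 /mnt/jediael/hadoop-2.6.0/logs/hadoop-jediael-datanode-slave1.log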

V. Run a complete MapReduce job: the bundled wordcount example

$ bin/hadoop fs -mkdir /input
$ bin/hadoop fs -ls /
Found 2 items
drwxr-xr-x   - jediael supergroup          0 2015-04-20 18:04 /input
drwxr-xr-x   - jediael supergroup          0 2015-04-19 23:41 /test
$ bin/hadoop fs -copyFromLocal etc/hadoop/mapred-site.xml.template /input
$ pwd
/mnt/jediael/hadoop-2.6.0/share/hadoop/mapreduce
$ /mnt/jediael/hadoop-2.6.0/bin/hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
15/04/20 18:15:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/20 18:15:48 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/04/20 18:15:48 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/04/20 18:15:49 INFO input.FileInputFormat: Total input paths to process : 1
15/04/20 18:15:49 INFO mapreduce.JobSubmitter: number of splits:1
15/04/20 18:15:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local657082309_0001
15/04/20 18:15:50 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/04/20 18:15:50 INFO mapreduce.Job: Running job: job_local657082309_0001
15/04/20 18:15:50 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/04/20 18:15:50 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/04/20 18:15:50 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/20 18:15:50 INFO mapred.LocalJobRunner: Starting task: attempt_local657082309_0001_m_000000_0
15/04/20 18:15:50 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/04/20 18:15:50 INFO mapred.MapTask: Processing split: hdfs://master:9000/input/mapred-site.xml.template:0+2268
15/04/20 18:15:51 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/04/20 18:15:51 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/04/20 18:15:51 INFO mapred.MapTask: soft limit at 83886080
15/04/20 18:15:51 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/04/20 18:15:51 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/04/20 18:15:51 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/04/20 18:15:51 INFO mapred.LocalJobRunner:
15/04/20 18:15:51 INFO mapred.MapTask: Starting flush of map output
15/04/20 18:15:51 INFO mapred.MapTask: Spilling map output
15/04/20 18:15:51 INFO mapred.MapTask: bufstart = 0; bufend = 1698; bufvoid = 104857600
15/04/20 18:15:51 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26213916(104855664); length = 481/6553600
15/04/20 18:15:51 INFO mapred.MapTask: Finished spill 0
15/04/20 18:15:51 INFO mapred.Task: Task:attempt_local657082309_0001_m_000000_0 is done. And is in the process of committing
15/04/20 18:15:51 INFO mapred.LocalJobRunner: map
15/04/20 18:15:51 INFO mapred.Task: Task 'attempt_local657082309_0001_m_000000_0' done.
15/04/20 18:15:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local657082309_0001_m_000000_0
15/04/20 18:15:51 INFO mapred.LocalJobRunner: map task executor complete.
15/04/20 18:15:51 INFO mapred.LocalJobRunner: Waiting for reduce tasks
15/04/20 18:15:51 INFO mapred.LocalJobRunner: Starting task: attempt_local657082309_0001_r_000000_0
15/04/20 18:15:51 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/04/20 18:15:51 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@39be5e01
15/04/20 18:15:51 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
15/04/20 18:15:51 INFO reduce.EventFetcher: attempt_local657082309_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
15/04/20 18:15:51 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local657082309_0001_m_000000_0 decomp: 1566 len: 1570 to MEMORY
15/04/20 18:15:51 INFO reduce.InMemoryMapOutput: Read 1566 bytes from map-output for attempt_local657082309_0001_m_000000_0
15/04/20 18:15:51 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 1566, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->1566
15/04/20 18:15:51 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
15/04/20 18:15:51 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/04/20 18:15:51 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
15/04/20 18:15:51 INFO mapred.Merger: Merging 1 sorted segments
15/04/20 18:15:51 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1560 bytes
15/04/20 18:15:51 INFO reduce.MergeManagerImpl: Merged 1 segments, 1566 bytes to disk to satisfy reduce memory limit
15/04/20 18:15:51 INFO reduce.MergeManagerImpl: Merging 1 files, 1570 bytes from disk
15/04/20 18:15:51 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
15/04/20 18:15:51 INFO mapred.Merger: Merging 1 sorted segments
15/04/20 18:15:51 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1560 bytes
15/04/20 18:15:51 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/04/20 18:15:51 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
15/04/20 18:15:51 INFO mapreduce.Job: Job job_local657082309_0001 running in uber mode : false
15/04/20 18:15:51 INFO mapreduce.Job:  map 100% reduce 0%
15/04/20 18:15:51 INFO mapred.Task: Task:attempt_local657082309_0001_r_000000_0 is done. And is in the process of committing
15/04/20 18:15:51 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/04/20 18:15:51 INFO mapred.Task: Task attempt_local657082309_0001_r_000000_0 is allowed to commit now
15/04/20 18:15:51 INFO output.FileOutputCommitter: Saved output of task 'attempt_local657082309_0001_r_000000_0' to hdfs://master:9000/output/_temporary/0/task_local657082309_0001_r_000000
15/04/20 18:15:51 INFO mapred.LocalJobRunner: reduce > reduce
15/04/20 18:15:51 INFO mapred.Task: Task 'attempt_local657082309_0001_r_000000_0' done.
15/04/20 18:15:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local657082309_0001_r_000000_0
15/04/20 18:15:51 INFO mapred.LocalJobRunner: reduce task executor complete.
15/04/20 18:15:52 INFO mapreduce.Job:  map 100% reduce 100%
15/04/20 18:15:52 INFO mapreduce.Job: Job job_local657082309_0001 completed successfully
15/04/20 18:15:52 INFO mapreduce.Job: Counters: 38
        File System Counters
                FILE: Number of bytes read=544164
                FILE: Number of bytes written=1040966
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=4536
                HDFS: Number of bytes written=1196
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Map-Reduce Framework
                Map input records=43
                Map output records=121
                Map output bytes=1698
                Map output materialized bytes=1570
                Input split bytes=114
                Combine input records=121
                Combine output records=92
                Reduce input groups=92
                Reduce shuffle bytes=1570
                Reduce input records=92
                Reduce output records=92
                Spilled Records=184
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=123
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=269361152
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=2268
        File Output Format Counters

$ /mnt/jediael/hadoop-2.6.0/bin/hadoop fs -cat /output/*
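
Note: the job id job_local657082309_0001 and the LocalJobRunner lines above show this run used the local runner rather than YARN. A plausible cause (an assumption, not something the original run confirms) is that the client only had mapred-site.xml.template, so mapreduce.framework.name=yarn was never read; creating the real file from the template and re-adding the property from section III(4) would route jobs to YARN:

$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml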
