Deploying a Hadoop 2.7.3 distributed cluster on a single machine with Docker 17.03.1
Notice
These articles are my own technical notes. Please credit the source when reposting:
[1] https://segmentfault.com/u/yzwall
[2] blog.csdn.net/j_dark/
0 Docker and Hadoop versions
Host: Ubuntu 16.04.1 LTS
Docker version: 17.03.1-ce, OS/Arch: linux/amd64
Hadoop version: hadoop-2.7.3
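1 Configuring and building the Hadoop image in Docker
1.1 Create the Docker container
Create a container named container from the ubuntu image. By default Docker pulls the latest minimal Ubuntu image from the official registry.
sudo docker run -ti --name container ubuntu /bin/bash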
1.2 Edit /etc/apt/sources.list
Edit the default APT source file /etc/apt/sources.list and replace the official mirror with one hosted in China.
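The mirror swap can be scripted rather than done by hand. The following is a minimal sketch, assuming the stock file points at archive.ubuntu.com and security.ubuntu.com; the helper name use_apt_mirror and the default host mirrors.aliyun.com are illustrative assumptions, not taken from the original notes.

```shell
# Sketch: point the Ubuntu archive entries in an APT sources file at a mirror.
# use_apt_mirror is a hypothetical helper; mirrors.aliyun.com is only an example host.
use_apt_mirror() {
    file=$1
    mirror=${2:-mirrors.aliyun.com}
    # Keep a backup next to the original before editing in place.
    cp "$file" "$file.bak"
    # Requires GNU sed (-i), which is what the Ubuntu image ships.
    sed -i "s|archive\.ubuntu\.com|$mirror|g; s|security\.ubuntu\.com|$mirror|g" "$file"
}
# Usage inside the container:
#   use_apt_mirror /etc/apt/sources.list && apt-get update
```

If the chosen mirror misbehaves, the .bak copy restores the original file.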
1.3 Install Java 8
# To stay small, the Docker image omits many components that ship with a regular Ubuntu install; run `apt-get update` first so they can be installed
apt-get update
apt-get install software-properties-common python-software-properties # provides add-apt-repository
add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-java8-installer
java -version
1.4 Install hadoop-2.7.3 in the container
1.4.1 Download the hadoop-2.7.3 release
# Create the nested directory
mkdir -p /software/apache/hadoop
cd /software/apache/hadoop
# Download and unpack Hadoop
wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar xvzf hadoop-2.7.3.tar.gz
1.4.2 Configure environment variables
Edit ~/.bashrc and append the following settings to the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/software/apache/hadoop/hadoop-2.7.3
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Run
source ~/.bashrc
to apply the environment variables.
Note: with ~/.bashrc configured this way, hadoop-env.sh needs no further changes.
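If daemons launched over ssh nevertheless report that JAVA_HOME is not set (non-interactive shells do not always read ~/.bashrc), a common fallback is to hard-code the path in hadoop-env.sh. This is a sketch under the assumption that the stock file contains a line starting with "export JAVA_HOME="; set_hadoop_java_home is a hypothetical helper name.

```shell
# Sketch: pin JAVA_HOME inside hadoop-env.sh.
# Assumes the file already has a line beginning with "export JAVA_HOME=".
set_hadoop_java_home() {
    env_file=$1
    java_home=$2
    # Rewrite the existing export line in place (GNU sed).
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
}
# Usage:
#   set_hadoop_java_home "$HADOOP_CONFIG_HOME/hadoop-env.sh" /usr/lib/jvm/java-8-oracle
```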
1.5 Configure Hadoop
Configuring Hadoop mainly means editing four files: core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
Create the namenode, datanode and tmp directories under $HADOOP_HOME:
cd $HADOOP_HOME
mkdir tmp
mkdir namenode
mkdir datanode
1.5.1 Configure core-site.xml
The key hadoop.tmp.dir points at the tmp directory. The key fs.default.name points at the master node and is set to hdfs://master:9000 (this key still works in Hadoop 2.x but is deprecated in favor of fs.defaultFS).
<configuration>
  <property>
    <!-- hadoop temp dir -->
    <name>hadoop.tmp.dir</name>
    <value>/software/apache/hadoop/hadoop-2.7.3/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <!-- Size of read/write buffer used in SequenceFiles. -->
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    <final>true</final>
    <description>The name of the default file system.</description>
  </property>
</configuration>
1.5.2 Configure hdfs-site.xml
dfs.replication is the number of replicas kept for each block, not the number of nodes. The cluster has 1 NameNode and 3 DataNodes, so the replication factor is set to 3. dfs.namenode.name.dir and dfs.datanode.data.dir are set to the NameNode and DataNode directories created earlier.
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <final>true</final>
    <description>Default block replication.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/software/apache/hadoop/hadoop-2.7.3/namenode</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/software/apache/hadoop/hadoop-2.7.3/datanode</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
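1.5.3 Configure mapred-site.xml
Create mapred-site.xml from the bundled template with cp (the template lives in $HADOOP_CONFIG_HOME, i.e. $HADOOP_HOME/etc/hadoop):
cd $HADOOP_CONFIG_HOME
cp mapred-site.xml.template mapred-site.xml
Then edit mapred-site.xml. In Hadoop 1.x the key mapred.job.tracker pointed at the master node; it is not needed here:
In Hadoop 2.x.x, users no longer need to configure mapred.job.tracker, because the JobTracker no longer exists and its role is taken over by the MRAppMaster component. Instead, mapreduce.framework.name specifies the runtime framework, here yarn.
Source: 《Hadoop技术内幕:深入解析YARN架构设计与实现原理》
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>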
1.5.4 Configure yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
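Before baking these files into an image, it can help to read a few values back and confirm they are what was intended. The following is a rough sketch, not a real XML parser: hadoop_conf_get is a hypothetical helper that assumes simple property blocks like the ones above and GNU sed.

```shell
# Sketch: print the <value> of one property from a Hadoop *-site.xml file.
# hadoop_conf_get is a hypothetical helper name, not a Hadoop command.
hadoop_conf_get() {
    key=$1
    file=$2
    # Put every tag on its own line, then pair each <name> with the next <value>.
    sed 's/>[[:space:]]*</>\n</g' "$file" | awk -v key="$key" '
        /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
        /<value>/ && n == key { v = $0; gsub(/.*<value>|<\/value>.*/, "", v); print v; exit }
    '
}
# Usage:
#   hadoop_conf_get dfs.replication "$HADOOP_CONFIG_HOME/hdfs-site.xml"
```

Once Hadoop is on the PATH, hdfs getconf -confKey dfs.replication reports the value Hadoop itself resolves, which is the more authoritative check.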
1.5.5 Install vim, ifconfig and ping
Install vim and the packages that provide the ifconfig and ping commands:
apt-get update
apt-get install vim
apt-get install net-tools # for ifconfig
apt-get install inetutils-ping # for ping
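1.5.6 Build the Hadoop base image
Assuming the current container is named container, save it as the base image ubuntu:hadoop. Every container in the Hadoop cluster is later created from this image, so the configuration does not have to be repeated.
sudo docker commit -m "hadoop installed" container ubuntu:hadoop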
2. Building the distributed Hadoop cluster
2.1 Create the cluster containers from the Hadoop base image
From the base image ubuntu:hadoop, create the master container and the slave1 to slave3 containers; each container's hostname matches its container name.
Create master: docker run -ti -h master --name master ubuntu:hadoop /bin/bash
Create slave1: docker run -ti -h slave1 --name slave1 ubuntu:hadoop /bin/bash
Create slave2: docker run -ti -h slave2 --name slave2 ubuntu:hadoop /bin/bash
Create slave3: docker run -ti -h slave3 --name slave3 ubuntu:hadoop /bin/bash
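The four commands differ only in the node name, so they can be generated. The sketch below prints the commands instead of running them; print_cluster_cmds is a hypothetical helper, and the -d flag (detached) is an addition so that a loop does not block on the first interactive shell.

```shell
# Sketch: print one "docker run" command per node instead of typing them by hand.
# -d (detached) is an assumption added here; attach later with
# "docker attach <name>" or "docker exec -ti <name> /bin/bash".
print_cluster_cmds() {
    image=$1
    slaves=$2
    echo "docker run -d -ti -h master --name master $image /bin/bash"
    i=1
    while [ "$i" -le "$slaves" ]; do
        echo "docker run -d -ti -h slave$i --name slave$i $image /bin/bash"
        i=$((i + 1))
    done
}
# Review the output, then pipe it to sh to actually create the containers:
#   print_cluster_cmds ubuntu:hadoop 3 | sh
```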
2.2 Configure the hosts file in each container
Add the following entries to /etc/hosts in every container (one IP address followed by its hostname per line); each container's IP address can be checked with ifconfig:
172.17.0.2 master
172.17.0.3 slave1
172.17.0.4 slave2
172.17.0.5 slave3
Note: after a Docker container restarts, the hosts entries may be lost. For lack of a better approach, I simply avoid restarting containers often; otherwise the hosts file has to be configured by hand again.
See http://dockone.io/question/400:
1. /etc/hosts, /etc/resolv.conf and /etc/hostname: these three files in a container are not part of the image. They live under /var/lib/docker/containers/<container_id> and are mounted into the container when it starts. Changes made to them inside the container are therefore not stored in the container's top layer but written directly to those three physical files.
2. Why are the changes gone after a restart? Because Docker rebuilds /etc/hosts every time it starts a container. And why is that? Because the container's IP address may change on restart, so the old addresses in the hosts file become invalid; the file has to be regenerated, otherwise it would hold stale data.
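Because the entries may have to be re-added after a restart, a small idempotent helper saves some typing. This is a sketch only; add_host_entry is a hypothetical name, and the addresses in the usage lines are the ones observed in this setup, which can differ on another machine.

```shell
# Sketch: append "IP hostname" to a hosts file only if the hostname is not there yet.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # Look for the name as a whole whitespace-delimited field after the IP column.
    if ! awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}
# Usage inside each container:
#   add_host_entry /etc/hosts 172.17.0.2 master
#   add_host_entry /etc/hosts 172.17.0.3 slave1
```

Running it twice leaves a single line per hostname, so it is safe to rerun after every restart.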
2.3 SSH configuration for the cluster nodes
2.3.1 All nodes: install ssh
apt-get update
apt-get install ssh
apt-get install openssh-server
2.3.2 All nodes: generate a key pair
# Generate a key pair with an empty passphrase; the keys are written to ~/.ssh
ssh-keygen -t rsa -P ""
2.3.3 master node: create the authorized_keys file
Append the generated public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
2.3.4 All nodes: edit sshd_config
Edit sshd_config so that ssh can log in remotely as root on the other nodes:
vim /etc/ssh/sshd_config
# Change "PermitRootLogin prohibit-password" to "PermitRootLogin yes"
# Restart the ssh service
service ssh restart
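The same change can be made without opening an editor, which is convenient when it has to be repeated on every node. A minimal sketch, assuming GNU sed; permit_root_login is a hypothetical helper name.

```shell
# Sketch: force "PermitRootLogin yes" in an sshd_config file.
permit_root_login() {
    cfg=$1
    if grep -Eq '^#?[[:space:]]*PermitRootLogin' "$cfg"; then
        # Rewrite the existing (possibly commented-out) directive.
        sed -i -E 's/^#?[[:space:]]*PermitRootLogin.*/PermitRootLogin yes/' "$cfg"
    else
        # No directive present: append one.
        echo 'PermitRootLogin yes' >> "$cfg"
    fi
}
# Usage:
#   permit_root_login /etc/ssh/sshd_config && service ssh restart
```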
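2.3.5 master node: copy authorized_keys to the slave nodes with scp
Copy authorized_keys from master into ~/.ssh on each slave node, overwriting any file of the same name. Every node then trusts master's public key, so master can log in to all nodes over ssh without a password, which is what the Hadoop start scripts rely on.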
cd ~/.ssh
scp authorized_keys root@slave1:~/.ssh/
scp authorized_keys root@slave2:~/.ssh/
scp authorized_keys root@slave3:~/.ssh/
2.3.6 slave nodes: fix the permissions on authorized_keys so it takes effect
chmod 600 ~/.ssh/authorized_keys
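With its default StrictModes setting, sshd ignores authorized_keys when the file or the ~/.ssh directory is writable by other users, so it is worth tightening both. A small sketch; fix_ssh_perms is a hypothetical helper name.

```shell
# Sketch: apply the permissions sshd expects on the key directory and key file.
fix_ssh_perms() {
    ssh_dir=${1:-$HOME/.ssh}
    chmod 700 "$ssh_dir"                  # directory: owner only
    chmod 600 "$ssh_dir/authorized_keys"  # key list: owner read/write only
}
# Usage on each slave:
#   fix_ssh_perms
```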
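Notes
Check whether the ssh service is running:
ps -e | grep ssh
Start the ssh service:
service ssh start
Restart the ssh service:
service ssh restart
After steps 2.3.1 through 2.3.6, master can reach every container over ssh.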
2.4 Configure the master node
On the master node, list the slave nodes in the slaves file:
cd $HADOOP_CONFIG_HOME/
vim slaves
Replace its contents with:
slave1
slave2
slave3
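For a different number of slaves, the file can be generated rather than edited. A minimal sketch, assuming the slaveN naming used throughout these notes; write_slaves_file is a hypothetical helper name.

```shell
# Sketch: overwrite a slaves file with slave1..slaveN, one hostname per line.
write_slaves_file() {
    file=$1
    count=$2
    : > "$file"                      # truncate (or create) the file
    i=1
    while [ "$i" -le "$count" ]; do
        echo "slave$i" >> "$file"
        i=$((i + 1))
    done
}
# Usage on master:
#   write_slaves_file "$HADOOP_CONFIG_HOME/slaves" 3
```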
2.5 Start the Hadoop cluster
On the master node, run
hdfs namenode -format
Output similar to the following indicates that the NameNode was formatted successfully:
common.Storage: Storage directory /software/apache/hadoop/hadoop-2.7.3/namenode has been successfully formatted.
Run
start-all.sh
to start the cluster:
root@master:/# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
The authenticity of host 'master (172.17.0.2)' can't be established.
ECDSA key fingerprint is SHA256:OewrSOYpvfDE6ixf6Gw9U7I9URT2zDCCtDJ6tjuZz/4.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,172.17.0.2' (ECDSA) to the list of known hosts.
master: starting namenode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-namenode-master.out
slave3: starting datanode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave3.out
slave2: starting datanode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /software/apache/hadoop/hadoop-2.7.3/logs/yarn-root-resourcemanager-master.out
slave3: starting nodemanager, logging to /software/apache/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave3.out
slave1: starting nodemanager, logging to /software/apache/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /software/apache/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave2.out
Run jps on the master and on each slave node.
master:
root@master:/# jps
2065 Jps
1446 NameNode
1801 ResourceManager
1641 SecondaryNameNode
slave1:
1107 NodeManager
1220 Jps
1000 DataNode
slave2:
241 DataNode
475 Jps
348 NodeManager
slave3:
500 Jps
388 NodeManager
281 DataNode
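The jps listings above can be checked mechanically. The sketch below reads jps output on standard input and reports the first expected daemon that is missing; has_daemons is a hypothetical helper name.

```shell
# Sketch: succeed only if every expected daemon appears in jps output on stdin.
has_daemons() {
    out=$(cat)
    for d in "$@"; do
        # jps prints "<pid> <ClassName>"; compare the class name as a whole field
        # so that NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$out" | awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "missing: $d" >&2
            return 1
        fi
    done
}
# Usage:
#   jps | has_daemons NameNode SecondaryNameNode ResourceManager   # on master
#   jps | has_daemons DataNode NodeManager                         # on each slave
```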
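3. Running wordcount
Create the input directory /hadoopinput in HDFS and store the input file LICENSE.txt (shipped in $HADOOP_HOME) in it:
root@master:/# hdfs dfs -mkdir -p /hadoopinput
root@master:/# hdfs dfs -put $HADOOP_HOME/LICENSE.txt /hadoopinput
Change to $HADOOP_HOME/share/hadoop/mapreduce and submit the wordcount job to the cluster, writing the result to /hadoopoutput in HDFS: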
root@master:/# cd $HADOOP_HOME/share/hadoop/mapreduce
root@master:/software/apache/hadoop/hadoop-2.7.3/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /hadoopinput /hadoopoutput
17/05/26 01:21:34 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.2:8032
17/05/26 01:21:35 INFO input.FileInputFormat: Total input paths to process : 1
17/05/26 01:21:35 INFO mapreduce.JobSubmitter: number of splits:1
17/05/26 01:21:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1495722519742_0001
17/05/26 01:21:36 INFO impl.YarnClientImpl: Submitted application application_1495722519742_0001
17/05/26 01:21:36 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1495722519742_0001/
17/05/26 01:21:36 INFO mapreduce.Job: Running job: job_1495722519742_0001
17/05/26 01:21:43 INFO mapreduce.Job: Job job_1495722519742_0001 running in uber mode : false
17/05/26 01:21:43 INFO mapreduce.Job: map 0% reduce 0%
17/05/26 01:21:48 INFO mapreduce.Job: map 100% reduce 0%
17/05/26 01:21:54 INFO mapreduce.Job: map 100% reduce 100%
17/05/26 01:21:55 INFO mapreduce.Job: Job job_1495722519742_0001 completed successfully
17/05/26 01:21:55 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=29366
        FILE: Number of bytes written=295977
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=84961
        HDFS: Number of bytes written=22002
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2922
        Total time spent by all reduces in occupied slots (ms)=3148
        Total time spent by all map tasks (ms)=2922
        Total time spent by all reduce tasks (ms)=3148
        Total vcore-milliseconds taken by all map tasks=2922
        Total vcore-milliseconds taken by all reduce tasks=3148
        Total megabyte-milliseconds taken by all map tasks=2992128
        Total megabyte-milliseconds taken by all reduce tasks=3223552
    Map-Reduce Framework
        Map input records=1562
        Map output records=12371
        Map output bytes=132735
        Map output materialized bytes=29366
        Input split bytes=107
        Combine input records=12371
        Combine output records=1906
        Reduce input groups=1906
        Reduce shuffle bytes=29366
        Reduce input records=1906
        Reduce output records=1906
        Spilled Records=3812
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=78
        CPU time spent (ms)=1620
        Physical memory (bytes) snapshot=451264512
        Virtual memory (bytes) snapshot=3915927552
        Total committed heap usage (bytes)=348127232
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=84854
    File Output Format Counters
        Bytes Written=22002
The result is stored in /hadoopoutput/part-r-00000. To view it:
root@master:/# hdfs dfs -ls /hadoopoutput
Found 2 items
-rw-r--r-- 3 root supergroup 0 2017-05-26 01:21 /hadoopoutput/_SUCCESS
-rw-r--r-- 3 root supergroup 22002 2017-05-26 01:21 /hadoopoutput/part-r-00000
root@master:/# hdfs dfs -cat /hadoopoutput/part-r-00000
""AS 2
"AS 16
"COPYRIGHTS 1
"Contribution" 2
"Contributor" 2
"Derivative 1
"Legal 1
"License" 1
"License"); 1
"Licensed 1
"Licensor" 1
...
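The raw listing is sorted by word, which makes it hard to see which words dominate. A small sketch for ranking by count, assuming each output line is a word followed by whitespace and its count; top_words is a hypothetical helper name.

```shell
# Sketch: show the N most frequent words from wordcount output read on stdin.
top_words() {
    # Sort numerically on the second field (the count), largest first.
    sort -k2,2nr | head -n "${1:-10}"
}
# Usage:
#   hdfs dfs -cat /hadoopoutput/part-r-00000 | top_words 5
```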
With that, the Hadoop 2.7.3 cluster is successfully deployed on a single machine with Docker 17.03.1.
References
[1] http://tashan10.com/yong-dockerda-jian-hadoopwei-fen-bu-shi-ji-qun/
[2] http://blog.csdn.net/xiaoxiangzi222/article/details/52757168