Distributed Big Data Platform Setup: Hadoop Cluster Configuration
Section 1: File Checklist
- hadoop-2.8.4.tar.gz
- jdk-8u181-linux-x64.tar.gz
- Xshell 7 Home Edition
- Xftp 7 Home Edition
Section 2: Download Links
[JDK download]: https://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8u211-later-5573849.html
[Hadoop download]: http://archive.apache.org/dist/hadoop/core/
[Xshell/Xftp download]: https://www.netsarang.com/zh/free-for-home-school/
Section 3: Installation and Deployment
- Part I - JDK Installation
Step 1: Extract the JDK to /opt/cluster, where cluster is a directory created specifically for this cluster setup
[root@localhost opt]# mkdir cluster
[root@localhost ~]# tar -zxvf jdk-8u181-linux-x64.tar.gz -C /opt/cluster/
After extraction, a JDK directory should exist under /opt/cluster/:
[root@localhost cluster]# ll
total 4
drwxr-xr-x. 7 10 143 4096 Jul 7 2018 jdk1.8.0_181
Step 2: Configure the JAVA_HOME environment variable
[root@localhost ~]# cd /etc
[root@localhost etc]# vi profile
Append the JAVA_HOME settings at the end of the file:
export JAVA_HOME=/opt/cluster/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
Save and exit, then run the following to apply the changes:
[root@localhost etc]# source profile
Test whether Java is configured correctly; if the version number appears, the configuration succeeded:
[root@localhost etc]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
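The manual vi edit above can also be scripted. A sketch, assuming the JDK was unpacked to /opt/cluster/jdk1.8.0_181 as in Step 1:

```shell
# Append the JAVA_HOME settings to /etc/profile without opening an editor.
# The quoted 'EOF' keeps $PATH and $JAVA_HOME literal in the file.
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/opt/cluster/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
EOF
# Confirm the lines were written
grep JAVA_HOME /etc/profile
```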
- Part II - Hadoop Cluster
Step 1: Cluster IP overview
Role | Master | Slave1 | Slave2 |
---|---|---|---|
IP | 192.168.137.128 | 192.168.137.129 | 192.168.137.130 |
Hostname | BlogMaster | BlogSlave1 | BlogSlave2 |
JDK installed on | BlogMaster | BlogSlave1 | BlogSlave2 |
Hadoop installed on | BlogMaster | BlogSlave1 | BlogSlave2 |
Step 2: Configure Hadoop on the master node
- Step 2.1: Extract the Hadoop archive to /opt/cluster/
[root@localhost ~]# tar -zxvf hadoop-2.8.4.tar.gz -C /opt/cluster/
- Step 2.2: Configure core-site.xml (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)
Open the file and add the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.137.128:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/cluster/hadoop-2.8.4/tmp</value>
  </property>
</configuration>
Note: afterwards, create a tmp folder under the Hadoop installation directory, i.e. /opt/cluster/hadoop-2.8.4.
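The note above amounts to a single command; mkdir -p is harmless if the directory already exists:

```shell
# Create the directory referenced by hadoop.tmp.dir in core-site.xml
mkdir -p /opt/cluster/hadoop-2.8.4/tmp
```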
- Step 2.3: Configure hadoop-env.sh (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)
Open the file and set JAVA_HOME to the root directory of the JDK installed earlier:
# The java implementation to use.
export JAVA_HOME=/opt/cluster/jdk1.8.0_181
- Step 2.4: Configure hdfs-site.xml (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)
Open the file and add the following:
<configuration>
  <!-- NameNode metadata directory -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/cluster/hadoop-2.8.4/dfs/name</value>
  </property>
  <!-- DataNode block storage directory -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/cluster/hadoop-2.8.4/dfs/data</value>
  </property>
  <!-- replication factor -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- secondary namenode -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>BlogSlave1:50090</value>
  </property>
</configuration>
Note: afterwards, under the Hadoop installation directory /opt/cluster/hadoop-2.8.4, create a dfs folder and then name and data folders inside it:
[root@localhost hadoop-2.8.4]# mkdir dfs
[root@localhost hadoop-2.8.4]# cd dfs
[root@localhost dfs]# mkdir name
[root@localhost dfs]# mkdir data
[root@localhost dfs]# ll
total 0
drwxr-xr-x. 2 root root 6 Nov 11 15:08 data
drwxr-xr-x. 2 root root 6 Nov 11 15:08 name
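The four commands above can be collapsed into one; this equivalent one-liner relies on bash brace expansion:

```shell
# Create dfs/name and dfs/data in a single step (equivalent to the mkdirs above)
mkdir -p /opt/cluster/hadoop-2.8.4/dfs/{name,data}
```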
- Step 2.5: Configure the slaves file (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)
Open the file and add the following:
BlogSlave1
BlogSlave2
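The slaves file can likewise be written from the shell. A sketch, assuming the install path used throughout this guide (the mkdir -p only matters if the directory does not exist yet):

```shell
# Write the DataNode host list into the slaves file
conf=/opt/cluster/hadoop-2.8.4/etc/hadoop
mkdir -p "$conf"
cat > "$conf/slaves" <<'EOF'
BlogSlave1
BlogSlave2
EOF
```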
- Step 2.6: Configure yarn-site.xml (located in /opt/cluster/hadoop-2.8.4/etc/hadoop); Step 2.9 below gives the complete version of this file
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>BlogMaster</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
- Step 2.7: Configure yarn-env.sh (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)
# some Java parameters
export JAVA_HOME=/opt/cluster/jdk1.8.0_181
- Step 2.8: Configure mapred-site.xml (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)
This directory contains no mapred-site.xml, only a mapred-site.xml.template. Copy the template and rename it to create mapred-site.xml:
[root@localhost hadoop]# cp mapred-site.xml.template mapred-site.xml
Then add the following:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
- Step 2.9: Configure yarn-site.xml (located in /opt/cluster/hadoop-2.8.4/etc/hadoop); this is the complete version of the file from Step 2.6
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>BlogMaster</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>BlogMaster:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>BlogMaster:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>BlogMaster:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>BlogMaster:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>BlogMaster:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
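With all four XML files edited, a quick well-formedness check can catch a missing tag before the NameNode is formatted. A sketch using xmllint (from libxml2); files that were not created are simply skipped:

```shell
# Validate the edited Hadoop config files; counts malformed ones.
CONF=/opt/cluster/hadoop-2.8.4/etc/hadoop
bad=0
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  [ -f "$CONF/$f" ] || continue          # skip files that are absent
  xmllint --noout "$CONF/$f" || bad=$((bad+1))
done
echo "malformed files: $bad"
```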
- Step 2.10: Configure environment variables, adding HADOOP_HOME to PATH
Building on the JAVA_HOME entry, add HADOOP_HOME, YARN_HOME, and related variables to /etc/profile:
export JAVA_HOME=/opt/cluster/jdk1.8.0_181
export HADOOP_HOME=/opt/cluster/hadoop-2.8.4
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=/opt/cluster/hadoop-2.8.4
export YARN_CONF_DIR=$YARN_HOME/etc/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Remember to run source profile afterwards so the changes take effect; hadoop version should then print the Hadoop version banner.
- Step 2.11: Disable the firewall to simplify the later configuration
[root@localhost etc]# systemctl stop firewalld
[root@localhost etc]# systemctl disable firewalld
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
Step 3: Use VMware Workstation to make two full clones of the master node, which become the Slave nodes
Shut down the master node before cloning, and be sure to choose a FULL clone. This takes a while; you can listen to Zella Day's "1965" in the meantime, enjoy…
Step 4: Set the IPs of the Master and Slave nodes
- Step 4.1: Configure the BlogMaster node
First, overwrite the hostname file under /etc with:
BlogMaster
Next, add the following entries to the hosts file under /etc:
192.168.137.128 BlogMaster
192.168.137.129 BlogSlave1
192.168.137.130 BlogSlave2
Finally, reboot the VM so the changes take effect.
- Step 4.2: Configure the hostname of the BlogSlave1 node (file located under /etc)
BlogSlave1
Then reboot the VM so the change takes effect.
- Step 4.3: Configure the hostname of the BlogSlave2 node (file located under /etc)
BlogSlave2
Then reboot the VM so the change takes effect.
- Step 4.4: Set up passwordless SSH login between the Master and Slave nodes
Before configuring passwordless login, run ip addr on all three machines to confirm their IP addresses are as expected.
Important: first make sure the firewall is fully disabled on every node. The complete commands are:
[root@BlogMaster ~]# systemctl stop firewalld
[root@BlogMaster ~]# systemctl disable firewalld
[root@BlogSlave1 ~]# systemctl stop firewalld
[root@BlogSlave1 ~]# systemctl disable firewalld
[root@BlogSlave2 ~]# systemctl stop firewalld
[root@BlogSlave2 ~]# systemctl disable firewalld
Then, each node runs the following four commands:
[root@BlogMaster ~]# ssh-keygen
[root@BlogMaster ~]# ssh-copy-id 192.168.137.128
[root@BlogMaster ~]# ssh-copy-id 192.168.137.129
[root@BlogMaster ~]# ssh-copy-id 192.168.137.130
- Step 4.4.1: Run the following on all three machines to generate a key pair (press Enter at every prompt until the key's randomart image appears)
[root@BlogMaster .ssh]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
aa:3f:dc:a1:8b:20:70:48:c0:ce:f4:43:32:93:0c:3c root@BlogMaster
The key's randomart image is:
+--[ RSA 2048]----+
|B . |
|.E . |
|+.B |
|.+ o |
|o . . S |
|.. .. |
|. . ..o . |
| . . o+ . |
| o.oo |
+-----------------+
- Step 4.4.2: Run the following on all three machines so each node's key is authorized on every other node
Fuller output looks like this:
[root@BlogMaster .ssh]# ssh-copy-id 192.168.137.128
The authenticity of host '192.168.137.128 (192.168.137.128)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.137.128's password:
Number of key(s) added: 1

Now try logging into the machine, with: "ssh '192.168.137.128'"
and check to make sure that only the key(s) you wanted were added.

[root@BlogMaster .ssh]# ssh-copy-id 192.168.137.129
The authenticity of host '192.168.137.129 (192.168.137.129)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.137.129's password:
Number of key(s) added: 1

Now try logging into the machine, with: "ssh '192.168.137.129'"
and check to make sure that only the key(s) you wanted were added.

[root@BlogMaster .ssh]# ssh-copy-id 192.168.137.130
The authenticity of host '192.168.137.130 (192.168.137.130)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.137.130's password:
Number of key(s) added: 1

Now try logging into the machine, with: "ssh '192.168.137.130'"
and check to make sure that only the key(s) you wanted were added.
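Once the keys are distributed, passwordless login can be verified in one pass. A sketch using BatchMode so ssh fails instead of prompting for a password (hostnames assume the /etc/hosts entries from Step 4.1):

```shell
# Verify passwordless SSH to every node; BatchMode forbids password prompts.
ok=0
for h in BlogMaster BlogSlave1 BlogSlave2; do
  if ssh -o BatchMode=yes -o ConnectTimeout=3 "$h" true 2>/dev/null; then
    echo "$h: OK"
    ok=$((ok+1))
  else
    echo "$h: FAIL (unreachable or password still required)"
  fi
done
echo "$ok/3 nodes reachable without a password"
```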
- Step 4.5: Distribute the master node's hosts file to the corresponding directory on both Slave nodes
Overall: run the following commands to push the updated IP-to-hostname hosts file to the other two Slave nodes.
[root@BlogMaster etc]# scp -r hosts BlogSlave1:$PWD
[root@BlogMaster etc]# scp -r hosts BlogSlave2:$PWD
In detail:
First, for BlogSlave1, run the following on the BlogMaster node:
[root@BlogMaster etc]# scp -r hosts BlogSlave1:$PWD
The authenticity of host 'blogslave1 (192.168.137.129)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'blogslave1,192.168.137.129' (ECDSA) to the list of known hosts.
root@blogslave1's password:
hosts 100% 240 0.2KB/s 00:00
Next, for BlogSlave2, run the following on the BlogMaster node:
[root@BlogMaster etc]# scp -r hosts BlogSlave2:$PWD
The authenticity of host 'blogslave2 (192.168.137.130)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'blogslave2,192.168.137.130' (ECDSA) to the list of known hosts.
root@blogslave2's password:
hosts 100% 240 0.2KB/s 00:00
Step 5: Run the format command on the master node BlogMaster
[root@BlogMaster etc]# hadoop namenode -format
Output like the following indicates the configuration succeeded:
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
19/11/11 16:27:31 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = root
STARTUP_MSG: host = BlogMaster/192.168.137.128
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.8.4
(The output is fairly long; only the beginning is shown here)…
Part III - Starting the Hadoop Cluster and Checking Its Status
Step 1: Start the Hadoop cluster
[root@BlogMaster ~]# start-dfs.sh
Output:
Starting namenodes on [BlogMaster]
The authenticity of host 'blogmaster (192.168.137.128)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
BlogMaster: Warning: Permanently added 'blogmaster' (ECDSA) to the list of known hosts.
BlogMaster: starting namenode, logging to /opt/cluster/hadoop-2.8.4/logs/hadoop-root-namenode-BlogMaster.out
BlogSlave1: starting datanode, logging to /opt/cluster/hadoop-2.8.4/logs/hadoop-root-datanode-BlogSlave1.out
BlogSlave2: starting datanode, logging to /opt/cluster/hadoop-2.8.4/logs/hadoop-root-datanode-BlogSlave2.out
Starting secondary namenodes [BlogSlave1]
BlogSlave1: starting secondarynamenode, logging to /opt/cluster/hadoop-2.8.4/logs/hadoop-root-secondarynamenode-BlogSlave1.out
Step 2: Check the cluster's running state
- Step 2.1: jps check; output like the following on each node means the startup succeeded
Run the jps command on each node.
Master node BlogMaster:
[root@BlogMaster name]# jps
4469 Jps
4235 NameNode
Slave node BlogSlave1:
[root@BlogSlave1 data]# jps
3057 DataNode
3153 SecondaryNameNode
3199 Jps
Slave node BlogSlave2:
[root@BlogSlave2 data]# jps
2709 Jps
2631 DataNode
- Step 2.2: Check via the web UI
First, add the following entries to the hosts file under C:\Windows\System32\drivers\etc on your own PC:
192.168.137.128 BlogMaster
192.168.137.129 BlogSlave1
192.168.137.130 BlogSlave2
Then open a browser and go to http://192.168.137.128:50070/ to view the cluster status.
Part IV - Possible Errors and Solutions
Problem 1: Formatting the cluster more than once
Before re-formatting, first clear everything inside the name and data folders under /opt/cluster/hadoop-2.8.4/dfs on every node of the cluster.
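The cleanup described above can be scripted. A sketch, assuming the dfs layout from Step 2.4 (run it on every node before re-formatting):

```shell
# Remove old NameNode/DataNode state so a fresh format will not conflict.
# rm -f silently ignores paths that do not exist.
for d in name data; do
  rm -rf /opt/cluster/hadoop-2.8.4/dfs/$d/*
done
```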