Hadoop technical principles:

HDFS main modules: NameNode, DataNode
YARN main modules: ResourceManager, NodeManager
HDFS main modules and how they work:

1) NameNode:

Function: the management node of the entire filesystem. It maintains the filesystem's directory tree, the metadata of every file and directory, and the list of data blocks that make up each file. It also receives client requests.

2) DataNode:

Function: stores the actual file data as blocks, serves block read/write requests from clients, and periodically reports the blocks it holds to the NameNode. (The SecondaryNameNode, which appears later in the jps output, is not a hot standby of the NameNode; it only merges the edit log into the fsimage checkpoint.)

I. Hadoop standalone test

1. Create the user and install the software

[root@server1 ~]# ls
anaconda-ks.cfg  hadoop-3.0.3.tar.gz  jdk-8u181-linux-x64.tar.gz
[root@server1 ~]# useradd -u 1000 hadoop
[root@server1 ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@server1 ~]# mv hadoop-3.0.3.tar.gz jdk-8u181-linux-x64.tar.gz /home/hadoop/
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.8.0_181/ java
[hadoop@server1 ~]$ tar zxf hadoop-3.0.3.tar.gz
[hadoop@server1 ~]$ ls
hadoop-3.0.3         java          jdk-8u181-linux-x64.tar.gz
hadoop-3.0.3.tar.gz  jdk1.8.0_181
[hadoop@server1 ~]$ ln -s hadoop-3.0.3 hadoop
[hadoop@server1 ~]$ ls
hadoop        hadoop-3.0.3.tar.gz  jdk1.8.0_181
hadoop-3.0.3  java                 jdk-8u181-linux-x64.tar.gz

2. Configure environment variables

[hadoop@server1 hadoop]$ cd /home/hadoop/hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim hadoop-env.sh
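The edit to hadoop-env.sh is not shown; it presumably sets JAVA_HOME to the JDK unpacked earlier. A minimal sketch, assuming the java symlink created in step 1:

export JAVA_HOME=/home/hadoop/java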

[hadoop@server1 hadoop]$ cd
[hadoop@server1 ~]$ vim .bash_profile
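The .bash_profile edit is likewise not shown; for jps and the hadoop commands below to resolve, it presumably appends the JDK (and optionally Hadoop) bin directories to PATH. A sketch under that assumption:

PATH=$PATH:$HOME/java/bin:$HOME/hadoop/bin
export PATH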

3. Test

[hadoop@server1 ~]$ source .bash_profile
[hadoop@server1 ~]$ jps
942 Jps
[hadoop@server1 ~]$ cd /home/hadoop/hadoop
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input/
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS
[hadoop@server1 output]$ cat *
1   dfsadmin

II. Pseudo-distributed mode
1. Edit the configuration files

[hadoop@server1 output]$ cd /home/hadoop/hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim core-site.xml
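The core-site.xml content is not shown here; for pseudo-distributed operation it presumably points the default filesystem at a single-node HDFS, as in the stock Hadoop single-node setup. A sketch under that assumption:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>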

[hadoop@server1 hadoop]$ vim hdfs-site.xml 
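hdfs-site.xml presumably sets the replication factor to 1, since there is only one DataNode at this stage. A sketch:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>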

2. Generate a key pair for passwordless SSH

[hadoop@server1 sbin]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
8e:f1:1d:4e:29:e2:56:eb:42:4f:79:f2:f3:89:42:69 hadoop@server1
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|           .     |
|      o So+      |
|     ..BE*..     |
|     .+=++o      |
|     ...o o. .   |
|       ....oo    |
+-----------------+
[hadoop@server1 sbin]$ ssh-copy-id localhost
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@localhost's password:
Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'localhost'"
and check to make sure that only the key(s) you wanted were added.
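A quick check that the key took effect: the login below should complete without any password prompt.

[hadoop@server1 sbin]$ ssh localhost hostname
server1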

3. Format the NameNode and start the services

[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 sbin]$ ./start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [server1]
[hadoop@server1 sbin]$ jps
11060 NameNode
11159 DataNode
11341 SecondaryNameNode
11485 Jps

4. View in a browser

In Hadoop 3.x the NameNode web UI listens on port 9870 (it was 50070 in 2.x):

http://172.25.11.1:9870

5. Test: create a directory and upload files

[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir -p /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2019-05-19 01:47 input

6. Delete the local input and output directories and rerun the job

[hadoop@server1 hadoop]$ rm -fr input/ output/
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'
[hadoop@server1 hadoop]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share

This time input and output do not appear in the local directory; the job read from and wrote to the distributed filesystem, and both directories can be seen in the web UI.
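To confirm from the command line, list the HDFS home directory; both input and output should appear:

[hadoop@server1 hadoop]$ bin/hdfs dfs -ls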

[hadoop@server1 hadoop]$ bin/hdfs dfs -cat output/*
1   dfsadmin
[hadoop@server1 hadoop]$ bin/hdfs dfs -get output  ## fetch the output directory from HDFS
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS

III. Fully distributed mode
1. Stop the services and clear the old data (hadoop.tmp.dir defaults under /tmp, so the pseudo-distributed HDFS data lives there)

[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [server1]
[hadoop@server1 hadoop]$ cd /tmp/
[hadoop@server1 tmp]$ ls
hadoop  hadoop-hadoop  hsperfdata_hadoop
[hadoop@server1 tmp]$ rm -fr *

2. Bring up two new virtual machines as worker nodes
    Create the hadoop user (the UID must match server1's, since /home/hadoop will be shared over NFS)

[root@server2 ~]# useradd -u 1000 hadoop
[root@server3 ~]# useradd -u 1000 hadoop

Install nfs-utils

[root@server1 ~]# yum install -y nfs-utils
[root@server2 ~]# yum install -y nfs-utils
[root@server3 ~]# yum install -y nfs-utils
[root@server1 ~]# systemctl start rpcbind
[root@server2 ~]# systemctl start rpcbind
[root@server3 ~]# systemctl start rpcbind

3. Start and configure the NFS server on server1

[root@server1 ~]# systemctl start nfs-server
[root@server1 ~]# vim /etc/exports
/home/hadoop   *(rw,anonuid=1000,anongid=1000)
[root@server1 ~]# exportfs -rv
exporting *:/home/hadoop
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop *

4. Mount on server2 and server3

[root@server2 ~]# vim /etc/hosts
[root@server2 ~]# mount 172.25.11.1:/home/hadoop /home/hadoop
[root@server2 ~]# df
Filesystem               1K-blocks    Used Available Use% Mounted on
/dev/sda3                 18351104 1092520  17258584   6% /
devtmpfs                    498480       0    498480   0% /dev
tmpfs                       508264       0    508264   0% /dev/shm
tmpfs                       508264    6736    501528   2% /run
tmpfs                       508264       0    508264   0% /sys/fs/cgroup
/dev/sda1                   508580  110596    397984  22% /boot
tmpfs                       101656       0    101656   0% /run/user/0
172.25.11.1:/home/hadoop  18351104 2790656  15560448  16% /home/hadoop
[root@server3 ~]# mount 172.25.11.1:/home/hadoop /home/hadoop
[root@server3 ~]# df
Filesystem               1K-blocks    Used Available Use% Mounted on
/dev/sda3                 18351104 1092468  17258636   6% /
devtmpfs                    498480       0    498480   0% /dev
tmpfs                       508264       0    508264   0% /dev/shm
tmpfs                       508264    6736    501528   2% /run
tmpfs                       508264       0    508264   0% /sys/fs/cgroup
/dev/sda1                   508580  110596    397984  22% /boot
tmpfs                       101656       0    101656   0% /run/user/0
172.25.11.1:/home/hadoop  18351104 2790656  15560448  16% /home/hadoop
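The /etc/hosts edit above is not shown; it presumably maps the lab hostnames to the addresses used throughout, along these lines:

172.25.11.1  server1
172.25.11.2  server2
172.25.11.3  server3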

Because the home directory (including ~/.ssh) is now shared over NFS, the hadoop user can SSH directly from server1 to server2 and server3 without a password.

5. Re-edit the configuration files

[root@server1 hadoop]# pwd
/home/hadoop/hadoop/etc/hadoop
[root@server1 hadoop]# vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.25.11.1:9000</value>
  </property>
</configuration>
[root@server1 hadoop]# vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>   ## two replicas, one per DataNode
  </property>
</configuration>
[root@server1 hadoop]# vim workers
[root@server1 hadoop]# cat workers
172.25.11.2
172.25.11.3

Edit the file in one place and, thanks to the NFS share, every node sees the change:

[hadoop@server2 hadoop]$ cat workers
172.25.11.2
172.25.11.3
[root@server3 ~]# cat /home/hadoop/hadoop/etc/hadoop/workers
172.25.11.2
172.25.11.3

6. Format the NameNode and start the services

[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1]
Starting datanodes
172.25.11.2: Warning: Permanently added '172.25.11.2' (ECDSA) to the list of known hosts.
172.25.11.3: Warning: Permanently added '172.25.11.3' (ECDSA) to the list of known hosts.
Starting secondary namenodes [server1]
[hadoop@server1 hadoop]$ vim /etc/hosts
[hadoop@server1 hadoop]$ jps
14000 SecondaryNameNode ## the SecondaryNameNode is up
13814 NameNode
14153 Jps

On the worker nodes, jps shows the DataNode process:

[hadoop@server2 ~]$ jps
10336 Jps
10273 DataNode
[hadoop@server3 ~]$ jps
1141 DataNode
1230 Jps

7. Test

[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir -p /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir input
[hadoop@server1 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'

8. The web UI lists both DataNodes, and the data has been uploaded
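The same information is available from the command line; dfsadmin -report prints capacity and usage for each live DataNode:

[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report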

9. server4 simulates a client and joins as a new DataNode

[root@server4 ~]# useradd -u  1000 hadoop
[root@server4 ~]# vim /etc/hosts
[root@server4 ~]# yum install -y nfs-utils
[root@server4 ~]# systemctl start rpcbind
[root@server4 ~]# mount 172.25.11.1:/home/hadoop /home/hadoop
[root@server4 ~]# su - hadoop
[hadoop@server4 ~]$ cd /home/hadoop/hadoop/etc/hadoop/
[hadoop@server4 hadoop]$ vim workers
[hadoop@server4 hadoop]$ cat workers
172.25.11.2
172.25.11.3
172.25.11.4
[hadoop@server4 hadoop]$ sbin/hadoop-daemon.sh start datanode
[hadoop@server4 hadoop]$ jps
2609 Jps
2594 DataNode
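Note that hadoop-daemon.sh is deprecated in Hadoop 3.x; the equivalent modern invocation, run from the Hadoop install directory, would be:

[hadoop@server4 hadoop]$ bin/hdfs --daemon start datanode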

The browser confirms the node was added successfully.

[hadoop@server4 hadoop]$ dd if=/dev/zero of=bigfile bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 25.8653 s, 20.3 MB/s
[hadoop@server4 hadoop]$ bin/hdfs dfs -put bigfile
The web UI shows that bigfile was uploaded successfully.
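The upload and its block placement can also be verified from the CLI; fsck lists each block of the file and the DataNodes holding its replicas (with the default 128 MB block size, a 500 MB file spans four blocks):

[hadoop@server4 hadoop]$ bin/hdfs fsck /user/hadoop/bigfile -files -blocks -locations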
