Hadoop Basics: Getting Started
NameNode web UI: http://localhost:50070/
YARN ResourceManager web UI: http://localhost:8088/cluster
1 Troubleshooting
➜ /Users/zhaoshuai11/work/hadoop-2.7.3 hadoop fs -ls /
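22/07/30 09:43:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The warning only means Hadoop falls back to its built-in Java classes. To silence it, add this line to log4j.properties:
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR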
22/07/30 09:58:59 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Why does running an MR job contact YARN first?
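2 HDFS distributed file system
Large data volumes: the cluster scales out horizontally across machines, so capacity is unbounded in theory.
Metadata: records each file and where it is stored, so a file's location can be found quickly.
Block storage: a file is split into blocks stored on different machines; operating on the blocks in parallel improves throughput.
Replication: copies of each block are kept on different machines; this redundancy protects the data.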
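The block, replica, and metadata ideas above can be sketched in a few lines of Python. This is only a toy model: the 128 MB block size and replication factor of 3 are the Hadoop 2.x defaults, while the function name plan_blocks, the datanode names, and the round-robin placement are assumptions made for illustration (real HDFS uses a rack-aware placement policy).

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default for dfs.blocksize
REPLICATION = 3                 # default for dfs.replication


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return toy NameNode-style metadata: one entry per block with its size and replica hosts."""
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")
    num_blocks = math.ceil(file_size / block_size)
    blocks = []
    for i in range(num_blocks):
        # Every block is full-sized except possibly the last one.
        size = min(block_size, file_size - i * block_size)
        # Round-robin placement: a hypothetical stand-in for HDFS's rack-aware policy.
        hosts = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        blocks.append({"block": i, "size": size, "hosts": hosts})
    return blocks


# A 300 MB file on four hypothetical datanodes.
metadata = plan_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
for entry in metadata:
    print(entry)
```

The returned list plays the role of the metadata: given a file, it says which blocks exist and which machines hold each copy, so a reader can go straight to the right hosts.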
➜ /Users/zhaoshuai11 hadoop fs -put java_error_in_idea_14849.log /itcast
➜ /Users/zhaoshuai11 hadoop fs -ls -h /itcast
Found 4 items
-rw-r--r-- 1 zhaoshuai11 supergroup 21 2022-07-30 09:55 /itcast/hello.txt
-rw-r--r-- 1 zhaoshuai11 supergroup 641.9 K 2022-07-30 09:48 /itcast/java_error_in_idea_11861.log
-rw-r--r-- 1 zhaoshuai11 supergroup 98.9 K 2022-07-30 10:34 /itcast/java_error_in_idea_14849.log
drwxr-xr-x - zhaoshuai11 supergroup 0 2022-07-30 09:59 /itcast/wordcount
➜ /Users/zhaoshuai11 hadoop fs -cat /itcast/hello.txt
hadoop hadoop hadoop
➜ /Users/zhaoshuai11 hadoop fs -get /itcast/wordcount/output/part-r-00000 ./
➜ /Users/zhaoshuai11 ll
-rw-r--r-- 1 zhaoshuai11 staff 9B 7 30 10:41 part-r-00000
➜ /Users/zhaoshuai11 cat part-r-00000
hadoop 3
Merging small files:
➜ /Users/zhaoshuai11 hadoop fs -appendToFile login.sh login-ext.sh /itcast/hello.txt
➜ /Users/zhaoshuai11 hadoop fs -cat /itcast/hello.txt
hadoop hadoop hadoop
#!/usr/bin/expect -f
set host "relay.baidu-int.com"
set username "zhaoshuai11"
set password "zs19961211."
spawn ssh $username@$host
expect "*Please input user's password*" {send "$password\r"}
interact#!/bin/sh
basepath=$(cd `dirname $0`; pwd)
export LC_CTYPE=en_US
#expect脚本所在位置
filepath=$1
if [ -f $filepath ]; then
expect $filepath
else
echo "$filepath not exits"
fi%
➜ /Users/zhaoshuai11
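A small sketch of what the merge does, assuming appendToFile simply appends each local file's bytes to the target in argument order without inserting separators. The function append_to_file and the two file contents are hypothetical; they only mimic the situation above, where the first script has no trailing newline.

```python
def append_to_file(target: bytes, *local_files: bytes) -> bytes:
    """Append each local file's bytes to the target in argument order, adding no separators."""
    for data in local_files:
        target += data
    return target


hello = b"hadoop hadoop hadoop\n"
first = b"spawn ssh\ninteract"     # hypothetical content, no trailing newline
second = b"#!/bin/sh\necho ok\n"   # hypothetical content
merged = append_to_file(hello, first, second)
print(merged.decode())
```

Because nothing is inserted between the files, the last line of one file and the first line of the next end up on the same line, which is how a token like interact#!/bin/sh can appear in the merged file.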
3 HDFS roles
4 Uploading a file to HDFS
5 MR
6 MR characteristics
7 MR limitations
8 MR runtime processes
9 MR official example: estimating pi
➜ /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 2 2
Number of Maps = 2
Samples per Map = 2
Wrote input for Map #0
Wrote input for Map #1
Starting Job
22/07/30 15:38:57 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/07/30 15:38:57 INFO input.FileInputFormat: Total input paths to process : 2
22/07/30 15:38:58 INFO mapreduce.JobSubmitter: number of splits:2
22/07/30 15:38:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1659144775026_0003
22/07/30 15:38:59 INFO impl.YarnClientImpl: Submitted application application_1659144775026_0003
22/07/30 15:38:59 INFO mapreduce.Job: The url to track the job: http://127.0.0.1:8088/proxy/application_1659144775026_0003/
22/07/30 15:38:59 INFO mapreduce.Job: Running job: job_1659144775026_0003
22/07/30 15:39:09 INFO mapreduce.Job: Job job_1659144775026_0003 running in uber mode : false
22/07/30 15:39:09 INFO mapreduce.Job: map 0% reduce 0%
22/07/30 15:39:15 INFO mapreduce.Job: map 100% reduce 0%
22/07/30 15:39:24 INFO mapreduce.Job: map 100% reduce 100%
22/07/30 15:39:25 INFO mapreduce.Job: Job job_1659144775026_0003 completed successfully
22/07/30 15:39:25 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=357444
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=540
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=8543
        Total time spent by all reduces in occupied slots (ms)=4822
        Total time spent by all map tasks (ms)=8543
        Total time spent by all reduce tasks (ms)=4822
        Total vcore-milliseconds taken by all map tasks=8543
        Total vcore-milliseconds taken by all reduce tasks=4822
        Total megabyte-milliseconds taken by all map tasks=8748032
        Total megabyte-milliseconds taken by all reduce tasks=4937728
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=304
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=201
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=534773760
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=236
    File Output Format Counters
        Bytes Written=97
Job Finished in 28.108 seconds
Estimated value of Pi is 4.00000000000000000000
➜ /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce
Note the log line "INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032": the client first contacts the YARN ResourceManager to request resources for the job.
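The estimate of 4.0 printed above is what a sample of only 2 x 2 = 4 points gives. The sketch below is a plain-Python approximation of the idea, assuming the example draws points from a two-dimensional Halton sequence (bases 2 and 3) and counts how many fall inside a circle of radius 0.5 centred in the unit square; the function names halton, map_task, and estimate_pi are illustrative and not taken from the Hadoop source.

```python
def halton(index: int, base: int) -> float:
    """Radical inverse of index in the given base: the index-th Halton value in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def map_task(offset: int, samples: int):
    """Count the points inside and outside the circle for one map task's slice of the sequence."""
    inside = outside = 0
    for i in range(offset + 1, offset + samples + 1):
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if x * x + y * y > 0.25:
            outside += 1
        else:
            inside += 1
    return inside, outside


def estimate_pi(num_maps: int, samples_per_map: int) -> float:
    """Reduce step: sum the per-map counts and scale the inside fraction by 4."""
    results = [map_task(m * samples_per_map, samples_per_map) for m in range(num_maps)]
    inside = sum(r[0] for r in results)
    total = num_maps * samples_per_map
    return 4.0 * inside / total


print(estimate_pi(2, 2))     # all four points land inside the circle, so this prints 4.0
print(estimate_pi(10, 100))  # more samples move the estimate towards pi
```

With so few samples every point happens to fall inside the circle, so the ratio is 4/4 and the estimate is 4. Passing larger arguments (for example pi 10 100 on the command line) should give a value much closer to 3.14.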
10 MR wordcount: counting words
➜ /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /itcast/hello.txt /itcast/hello-count
22/07/30 15:49:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/07/30 15:49:26 INFO input.FileInputFormat: Total input paths to process : 1
22/07/30 15:49:27 INFO mapreduce.JobSubmitter: number of splits:1
22/07/30 15:49:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1659144775026_0004
22/07/30 15:49:27 INFO impl.YarnClientImpl: Submitted application application_1659144775026_0004
22/07/30 15:49:27 INFO mapreduce.Job: The url to track the job: http://127.0.0.1:8088/proxy/application_1659144775026_0004/
22/07/30 15:49:27 INFO mapreduce.Job: Running job: job_1659144775026_0004
22/07/30 15:49:37 INFO mapreduce.Job: Job job_1659144775026_0004 running in uber mode : false
22/07/30 15:49:37 INFO mapreduce.Job: map 0% reduce 0%
22/07/30 15:49:43 INFO mapreduce.Job: map 100% reduce 0%
22/07/30 15:49:49 INFO mapreduce.Job: map 100% reduce 100%
22/07/30 15:49:50 INFO mapreduce.Job: Job job_1659144775026_0004 completed successfully
22/07/30 15:49:50 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=607
        FILE: Number of bytes written=238801
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=510
        HDFS: Number of bytes written=441
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3442
        Total time spent by all reduces in occupied slots (ms)=3708
        Total time spent by all map tasks (ms)=3442
        Total time spent by all reduce tasks (ms)=3708
        Total vcore-milliseconds taken by all map tasks=3442
        Total vcore-milliseconds taken by all reduce tasks=3708
        Total megabyte-milliseconds taken by all map tasks=3524608
        Total megabyte-milliseconds taken by all reduce tasks=3796992
    Map-Reduce Framework
        Map input records=18
        Map output records=47
        Map output bytes=591
        Map output materialized bytes=607
        Input split bytes=103
        Combine input records=47
        Combine output records=40
        Reduce input groups=40
        Reduce shuffle bytes=607
        Reduce input records=40
        Reduce output records=40
        Spilled Records=80
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=122
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=332398592
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=407
    File Output Format Counters
        Bytes Written=441
➜ /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce
➜ /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce hadoop fs -get /itcast/hello-count ./
➜ /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce/hello-count cat part-r-00000
"$filepath 1
"$password\r"} 1
"*Please 1
"relay.baidu-int.com" 1
"zhaoshuai11" 1
"zs19961211." 1
#!/usr/bin/expect 1
#expect脚本所在位置 1
$0`; 1
$filepath 2
$username@$host 1
-f 2
LC_CTYPE=en_US 1
[ 1
]; 1
`dirname 1
basepath=$(cd 1
echo 1
else 1
exits" 1
expect 2
export 1
fi 1
filepath=$1 1
hadoop 3
host 1
if 1
input 1
interact#!/bin/sh 1
not 1
password 1
password*" 1
pwd) 1
set 3
spawn 1
ssh 1
then 1
user's 1
username 1
{send 1
➜ /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce/hello-count
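The output above can be understood with a small in-memory imitation of the job. The sketch assumes the example splits each line on whitespace only, which would explain why tokens such as "$filepath keep their quotes and punctuation; the function names and the three sample lines are illustrative, and the real job runs these phases as separate tasks rather than in one process.

```python
from collections import Counter, defaultdict


def map_phase(line):
    """Emit (word, 1) for every whitespace-separated token in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Pre-aggregate counts on the map side so fewer records are sent to the reducer."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return sorted(local.items())


def reduce_phase(pairs):
    """Group values by key, then sum each group; keys come out sorted."""
    groups = defaultdict(list)
    for word, n in pairs:
        groups[word].append(n)
    return {word: sum(ns) for word, ns in sorted(groups.items())}


lines = ["hadoop hadoop hadoop", "if [ -f $filepath ]; then", "expect $filepath"]
map_output = [pair for line in lines for pair in map_phase(line)]
combined = combine(map_output)
result = reduce_phase(combined)

print(len(map_output), "map output records,", len(combined), "after combining")
for word, count in result.items():
    print(word, count)
```

The same effect is visible in the job counters: 47 combine input records shrink to 40 combine output records, one per distinct token, which matches the 40 lines in part-r-00000.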
11 Map
12 Reduce
The reduce side actively pulls its data from the map tasks.
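A minimal sketch of this pull model, assuming each map task writes its output into one bucket per reducer and each reducer then fetches its own bucket from every map task. The names partition, run_map_task, and reducer_pull are hypothetical, and zlib.crc32 is used only as a deterministic stand-in for the key hash that a real partitioner would use.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key; crc32 is a stand-in for the real key hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(words, num_reducers):
    """A map task keeps its output locally, split into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for word in words:
        buckets[partition(word, num_reducers)].append((word, 1))
    return buckets


def reducer_pull(reducer_id, all_map_outputs):
    """A reducer fetches its own bucket from every map task, then merges and sums."""
    totals = {}
    for map_output in all_map_outputs:
        for word, n in map_output[reducer_id]:
            totals[word] = totals.get(word, 0) + n
    return dict(sorted(totals.items()))


map_outputs = [
    run_map_task(["hadoop", "yarn", "hadoop"], 2),
    run_map_task(["hdfs", "hadoop"], 2),
]
results = [reducer_pull(r, map_outputs) for r in range(2)]
for reducer_id, totals in enumerate(results):
    print(reducer_id, totals)
```

Because the partition depends only on the key, every occurrence of a word ends up at the same reducer no matter which map task produced it, so each reducer can compute final counts on its own.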
13 shuffle
14 YARN
15 Interaction flow when submitting a program to a YARN cluster
16 The resource scheduler and scheduling policies