Operation procedure

The Snappy codec is not available in local mode (it needs Hadoop's native libraries), so this code has to be run on the cluster.
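Before going to the cluster you can confirm that native Snappy support is actually visible to Hadoop, either with the command hadoop checknative -a or with a tiny probe like the one below. This is a minimal sketch, not part of the original code: the class name SnappyCheck is made up for illustration, and it assumes the Hadoop 2.x NativeCodeLoader API.

package com.czxy.day20191118.demo02;

import org.apache.hadoop.util.NativeCodeLoader;

public class SnappyCheck {
    public static void main(String[] args) {
        // True only when libhadoop was found and loaded by this JVM
        boolean nativeLoaded = NativeCodeLoader.isNativeCodeLoaded();
        System.out.println("native hadoop library loaded: " + nativeLoaded);
        if (nativeLoaded) {
            // Safe to call only after the native library is loaded;
            // reports whether libhadoop was built with Snappy support
            System.out.println("snappy supported: " + NativeCodeLoader.buildSupportsSnappy());
        }
    }
}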
First, here is the operation log.

[root@hadoop01 home]# hadoop fs -mkdir /aaaaa
Create a /aaaaa directory on the cluster.
[root@hadoop01 home]# hadoop fs -put a.txt /aaaaa/
Create an a.txt with arbitrary content and upload it to the /aaaaa directory on the cluster.
[root@hadoop01 home]# hadoop fs -ls /aaaaa
See what is inside.
Found 1 items
-rw-r--r--   2 root supergroup         50 2019-11-18 09:34 /aaaaa/a.txt
[root@hadoop01 home]# rz
Upload the jar built from our code to Linux.
rz waiting to receive. (zmodem, Ctrl+C to cancel)  100%   26014 KB  26014 KB/s  00:00:01       0 Errors
[root@hadoop01 home]# ll
Check that the upload succeeded; test5.jar is the jar just uploaded.
total 52020
-rw-r--r-- 1 root root       50 Nov 18 09:33 a.txt
-rw-r--r-- 1 root root 26623563 Nov 16 14:48 test4.jar
-rw-r--r-- 1 root root 26639104 Nov 18 09:27 test5.jar
[root@hadoop01 home]# hadoop jar test5.jar com.czxy.day20191118.demo02.MoreFileDriver
Run the jar.
19/11/18 09:35:22 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.100.201:8032
19/11/18 09:35:23 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/11/18 09:35:24 INFO input.FileInputFormat: Total input paths to process : 1
19/11/18 09:35:24 INFO mapreduce.JobSubmitter: number of splits:1
19/11/18 09:35:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1574040817860_0001
19/11/18 09:35:24 INFO impl.YarnClientImpl: Submitted application application_1574040817860_0001
19/11/18 09:35:25 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1574040817860_0001/
19/11/18 09:35:25 INFO mapreduce.Job: Running job: job_1574040817860_0001
19/11/18 09:35:32 INFO mapreduce.Job: Job job_1574040817860_0001 running in uber mode : true
19/11/18 09:35:32 INFO mapreduce.Job:  map 100% reduce 0%
19/11/18 09:35:34 INFO mapreduce.Job:  map 100% reduce 100%
19/11/18 09:35:34 INFO mapreduce.Job: Job job_1574040817860_0001 completed successfully
19/11/18 09:35:34 INFO mapreduce.Job: Counters: 52
        File System Counters
                FILE: Number of bytes read=320
                FILE: Number of bytes written=496
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=400
                HDFS: Number of bytes written=165860
                HDFS: Number of read operations=35
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=9
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=0
                Total time spent by all reduces in occupied slots (ms)=0
                TOTAL_LAUNCHED_UBERTASKS=2
                NUM_UBER_SUBMAPS=1
                NUM_UBER_SUBREDUCES=1
                Total time spent by all map tasks (ms)=292
                Total time spent by all reduce tasks (ms)=1475
                Total vcore-milliseconds taken by all map tasks=0
                Total vcore-milliseconds taken by all reduce tasks=0
                Total megabyte-milliseconds taken by all map tasks=0
                Total megabyte-milliseconds taken by all reduce tasks=0
        Map-Reduce Framework
                Map input records=11
                Map output records=11
                Map output bytes=116
                Map output materialized bytes=144
                Input split bytes=97
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=144
                Reduce input records=11
                Reduce output records=11
                Spilled Records=22
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=0
                CPU time spent (ms)=1340
                Physical memory (bytes) snapshot=766976000
                Virtual memory (bytes) snapshot=6122287104
                Total committed heap usage (bytes)=556793856
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=50
        File Output Format Counters
                Bytes Written=889
[root@hadoop01 home]# hadoop fs -ls /ccccc
After the job finishes, check the result in the generated /ccccc directory: the output file ends in .snappy, so it worked.
Found 2 items
-rw-r--r--   2 root supergroup          0 2019-11-18 09:35 /ccccc/_SUCCESS
-rw-r--r--   2 root supergroup         84 2019-11-18 09:35 /ccccc/part-r-00000.snappy
[root@hadoop01 home]# hadoop fs -cat /ccccc/part-r-00000.snappy
Check whether the content has been compressed: it prints as garbled bytes, so the compression clearly succeeded (a small Java read-back sketch follows this log).
tLta.txt       f
324    e
dr     fa    cd
xz     r234reaFvzcxv
asf:ihello.
[root@hadoop01 home]#
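The bytes printed by cat are the raw Snappy stream, which is why they look garbled. Depending on the Hadoop version, hadoop fs -text may be able to decode the file by its extension; the snippet below does the same in Java. It is a minimal sketch, not part of the original post: the class name ReadSnappyOutput is made up, and it assumes it runs on a node where HDFS is configured and the native Snappy library is available.

package com.czxy.day20191118.demo02;

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class ReadSnappyOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path file = new Path("/ccccc/part-r-00000.snappy");
        FileSystem fs = FileSystem.get(conf);

        // Pick the codec by file extension (.snappy -> SnappyCodec)
        CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);

        // Wrap the HDFS stream in a decompressing stream and dump it to stdout
        try (InputStream in = codec.createInputStream(fs.open(file))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}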

Java code

MoreFileDriver

package com.czxy.day20191118.demo02;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MoreFileDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Enable the compression codec through the Configuration
        Configuration conf = new Configuration();
        // Compress the map output with Snappy
        conf.set("mapreduce.map.output.compress", "true");
        conf.set("mapreduce.map.output.compress.codec",
                "org.apache.hadoop.io.compress.SnappyCodec");
        // Compress the reduce (job) output with Snappy
        conf.set("mapreduce.output.fileoutputformat.compress", "true");
        conf.set("mapreduce.output.fileoutputformat.compress.type", "RECORD");
        conf.set("mapreduce.output.fileoutputformat.compress.codec",
                "org.apache.hadoop.io.compress.SnappyCodec");

        Job job = Job.getInstance(conf, "MoreFile");
        job.setJarByClass(MoreFileDriver.class);

        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("/aaaaa"));

        job.setMapperClass(MoreFileMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setReducerClass(MoreFileReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("/ccccc"));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new MoreFileDriver(), args);
    }
}
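A side note on the job log above: the WARN line "Hadoop command-line option parsing not performed" appears because run() builds a fresh Configuration instead of reusing the one ToolRunner injects through Configured. The fragment below is an optional cleanup sketch, not part of the original post; it only shows the two spots that would change, and the compression settings stay the same.

// Inside run(): reuse the Configuration injected by ToolRunner (via setConf()),
// so that -D key=value options from the command line are picked up.
Configuration conf = getConf();
conf.set("mapreduce.map.output.compress", "true");
// ... the remaining conf.set(...) calls are unchanged ...

// In main(): pass an initial Configuration to ToolRunner and propagate the exit code.
public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new Configuration(), new MoreFileDriver(), args);
    System.exit(exitCode);
}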

MoreFileMapper

package com.czxy.day20191118.demo02;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

import java.io.IOException;

public class MoreFileMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // The context gives access to the input split, from which we get the name of the file this line belongs to
        FileSplit inputSplit = (FileSplit) context.getInputSplit();
        String filename = inputSplit.getPath().getName();
        context.write(new Text(filename), value);
    }
}

MoreFileReducer

package com.czxy.day20191118.demo02;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MoreFileReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Iterate over the values and write each one out
        for (Text value : values) {
            context.write(key, value);
        }
    }
}

Run result

[root@hadoop01 home]# hadoop fs -ls /ccccc
Found 2 items
-rw-r--r--   2 root supergroup          0 2019-11-18 09:35 /ccccc/_SUCCESS
-rw-r--r--   2 root supergroup         84 2019-11-18 09:35 /ccccc/part-r-00000.snappy
[root@hadoop01 home]# hadoop fs -cat /ccccc/part-r-00000.snappy
tLta.txt       f
324    e
dr     fa    cd
xz     r234reaFvzcxv
asf:ihello.
[root@hadoop01 home]#

Walk through the operation log in the first step together with the code to see how Snappy compression is applied.

Summary

The most important code is shown below.
Setting these keys on the Configuration in the Driver class is all it takes to enable the compression codec.
That is why the fully qualified Java class name of the codec must be memorized.

// Enable the compression codec through the Configuration
Configuration conf = new Configuration();
// Compress the map output with Snappy
conf.set("mapreduce.map.output.compress", "true");
conf.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");
// Compress the reduce (job) output with Snappy
conf.set("mapreduce.output.fileoutputformat.compress", "true");
conf.set("mapreduce.output.fileoutputformat.compress.type", "RECORD");
conf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");
