Sqoop Java API Import: A Case Study
Environment:
Linux + JDK 1.7
Sqoop 1.4.6-cdh5.5.2
hadoop-core 2.6.0-mr1-cdh5.5.2
hadoop-common 2.6.0-cdh5.5.2
hadoop-mapreduce-client-core 2.6.0-cdh5.5.2
Requirement:
Import a table from Oracle into HDFS.
Implementation:
First, assemble the Sqoop command arguments. Note that with --query, the WHERE clause must contain the literal $CONDITIONS placeholder, which Sqoop replaces with the split-range predicate for each mapper:
String[] args = new String[] {
        // Oracle connection info
        "--connect", "jdbc:oracle:thin:@***:1522/**",
        "-username", "***",
        "-password", "***",
        // Query SQL
        "--query", "select * from TABLE_NAME where $CONDITIONS and create_date>=date'2017-05-01' and create_date<date'2017-06-01' ",
        "-split-by", "id",
        "--hive-overwrite",
        "--fields-terminated-by", "'\\001'",
        "--hive-drop-import-delims",
        "--null-string", "'\\\\N'",
        "--null-non-string", "'\\\\N'",
        "--verbose",
        "--target-dir", "/user/hive/warehouse/test.db/H_TABLE_NAME"
};
Then execute the Sqoop job; Sqoop.runSqoop returns the tool's exit status, so 0 means success:
// Required imports (Sqoop 1.4.x / Hadoop 2.x package layout);
// 'args' is the argument array built above, 'log' is the application's logger.
import org.apache.hadoop.conf.Configuration;
import org.apache.sqoop.Sqoop;
import org.apache.sqoop.tool.SqoopTool;
import org.apache.sqoop.util.OptionsFileUtil;

String[] expandArguments = OptionsFileUtil.expandArguments(args);
SqoopTool tool = SqoopTool.getTool("import");
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://nameservice1"); // HDFS service address
Configuration loadPlugins = SqoopTool.loadPlugins(conf);
Sqoop sqoop = new Sqoop((com.cloudera.sqoop.tool.SqoopTool) tool, loadPlugins);
int res = Sqoop.runSqoop(sqoop, expandArguments);
if (res == 0)
    log.info("Import succeeded");
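To sanity-check the result programmatically, the target directory can be listed through the HDFS FileSystem API. This is a sketch of our own (not part of the original demo); it reuses the nameservice URI and target path from the arguments above:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifyImport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same HDFS service address as in the import code above.
        FileSystem fs = FileSystem.get(URI.create("hdfs://nameservice1"), conf);
        // List the files Sqoop wrote under the --target-dir.
        for (FileStatus status : fs.listStatus(new Path("/user/hive/warehouse/test.db/H_TABLE_NAME"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
        fs.close();
    }
}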
Once the coding was done, the program was deployed to the test environment, where Sqoop reported a compilation error during dynamic code generation:
2017-07-26 15:10:15 [ERROR] [http-0.0.0.0-8080-6] [org.apache.sqoop.tool.ImportTool.run(ImportTool.java:613)] Encountered IOException running import job: java.io.IOException: Error returned by javac
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:217)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:108)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
Unless specially configured, the output of this dynamic compilation step is not emitted through log4j, so the actual compiler errors have to be looked up in the system logs:
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:7: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.BytesWritable;
^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:9: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Writable;
^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:37: error: cannot access Writable
public class QueryResult extends SqoopRecord implements DBWritable, Writable {
As the errors above suggest, the classpath used for dynamic compilation presumably did not contain the hadoop-common jar. The following method was found in CompilationManager:
private String findHadoopJars() {
    String hadoopMapRedHome = options.getHadoopMapRedHome();
    if (null == hadoopMapRedHome) {
        LOG.info("$HADOOP_MAPRED_HOME is not set");
        return Jars.getJarPathForClass(JobConf.class);
    }
    if (!hadoopMapRedHome.endsWith(File.separator)) {
        hadoopMapRedHome = hadoopMapRedHome + File.separator;
    }
    File hadoopMapRedHomeFile = new File(hadoopMapRedHome);
    LOG.info("HADOOP_MAPRED_HOME is " + hadoopMapRedHomeFile.getAbsolutePath());
    Iterator<File> filesIterator = FileUtils.iterateFiles(hadoopMapRedHomeFile,
        new String[] { "jar" }, true);
    StringBuilder sb = new StringBuilder();
    while (filesIterator.hasNext()) {
        File file = filesIterator.next();
        String name = file.getName();
        if (name.startsWith("hadoop-common")
            || name.startsWith("hadoop-mapreduce-client-core")
            || name.startsWith("hadoop-core")) {
            sb.append(file.getAbsolutePath());
            sb.append(File.pathSeparator);
        }
    }
    if (sb.length() < 1) {
        LOG.warn("HADOOP_MAPRED_HOME appears empty or missing");
        return Jars.getJarPathForClass(JobConf.class);
    }
    String s = sb.substring(0, sb.length() - 1);
    LOG.debug("Returning jar file path " + s);
    return s;
}
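Jars.getJarPathForClass resolves a class to the jar it was loaded from. A minimal sketch of the same idea (our illustration, not Sqoop's actual implementation):

import java.security.CodeSource;

public class JarLocator {
    // Resolve the jar (or directory) a class was loaded from, in the same
    // spirit as Sqoop's Jars.getJarPathForClass(JobConf.class).
    public static String jarPathForClass(Class<?> klass) {
        CodeSource source = klass.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return null; // e.g. classes from the bootstrap class loader
        }
        return source.getLocation().getPath();
    }

    public static void main(String[] args) {
        // With this CDH stack, prints something like
        // .../WEB-INF/lib/hadoop-core-2.6.0-mr1-cdh5.5.2.jar
        System.out.println(jarPathForClass(org.apache.hadoop.mapred.JobConf.class));
    }
}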
The guess was that, because the hadoopMapRedHome option was not set, this method could only fall back to the jar containing JobConf.class, i.e. the hadoop-core jar. Running with DEBUG logging enabled confirmed this:
2017-07-26 15:10:14 [INFO] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.findHadoopJars(CompilationManager.java:85)] $HADOOP_MAPRED_HOME is not set
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:171)] Current sqoop classpath = :/usr/local/tomcat6/bin/bootstrap.jar
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:195)] Adding source file: /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/QueryResult.java
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:199)] Invoking javac with args:
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] -sourcepath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] -d
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] -classpath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] :/usr/local/tomcat6/bin/bootstrap.jar:/var/www/webapps/***/WEB-INF/lib/hadoop-core-2.6.0-mr1-cdh5.5.2.jar:/var/www/webapps/***/WEB-INF/lib/sqoop-1.4.6-cdh5.5.2.jar
The jars were indeed missing. In a Tomcat webapp, java.class.path holds only the JVM's launch classpath (here, Tomcat's bootstrap.jar), not WEB-INF/lib, so the webapp's Hadoop jars never reach javac on their own; only the hadoop-core and sqoop jars explicitly appended by CompilationManager show up. The classpath is assembled in CompilationManager as follows:
String curClasspath = System.getProperty("java.class.path");
LOG.debug("Current sqoop classpath = " + curClasspath);
args.add("-sourcepath");
args.add(jarOutDir);
args.add("-d");
args.add(jarOutDir);
args.add("-classpath");
args.add(curClasspath + File.pathSeparator + coreJar + sqoopJar);
Either of the following two approaches can put the missing jars on that classpath:
1. Modify java.class.path directly (a runtime variant is sketched after this list):
String curClasspath = System.getProperty("java.class.path");
curClasspath = curClasspath
        + File.pathSeparator
        + "/var/www/webapps/***/WEB-INF/lib/hadoop-common-2.6.0-cdh5.5.2.jar"
        + File.pathSeparator
        + "/var/www/webapps/***/WEB-INF/lib/hadoop-mapreduce-client-core-2.6.0-cdh5.5.2.jar";
System.setProperty("java.class.path", curClasspath);
2. Add a command-line option (not tried):
--hadoop-mapred-home <dir>
which specifies the $HADOOP_MAPRED_HOME directory that findHadoopJars() scans for the Hadoop jars.
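As a variant of the first fix, the jar paths can also be derived at runtime instead of being hard-coded. This is a hedged sketch: the Hadoop class names are real, but the appendJarsOf helper is our own illustration, not part of Sqoop:

import java.io.File;

public class ClasspathFix {
    // Hypothetical helper: append the jars containing the given classes to
    // java.class.path so that CompilationManager's javac invocation sees them.
    // Must run before Sqoop.runSqoop(), which triggers the code generation.
    static void appendJarsOf(Class<?>... classes) {
        StringBuilder cp = new StringBuilder(System.getProperty("java.class.path"));
        for (Class<?> k : classes) {
            cp.append(File.pathSeparator)
              .append(k.getProtectionDomain().getCodeSource().getLocation().getPath());
        }
        System.setProperty("java.class.path", cp.toString());
    }

    public static void main(String[] args) {
        // Writable lives in hadoop-common; Job lives in hadoop-mapreduce-client-core.
        appendJarsOf(org.apache.hadoop.io.Writable.class,
                     org.apache.hadoop.mapreduce.Job.class);
        System.out.println(System.getProperty("java.class.path"));
    }
}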
With the first approach in place, the import ran successfully:
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1547)] Job complete: job_local703153215_0001
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:566)] Counters: 18
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)] File System Counters
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of bytes read=15015144
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of bytes written=15688984
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of write operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of bytes read=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of bytes written=1536330810
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of read operations=40
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of write operations=36
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)] Map-Reduce Framework
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Map input records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Map output records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Input split bytes=455
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Spilled Records=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] CPU time spent (ms)=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Physical memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Virtual memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Total committed heap usage (bytes)=4080271360
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:184)] Transferred 1.4308 GB in 71.5332 seconds (20.4822 MB/sec)
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:186)] Retrieved 3272909 records.
This completes the Sqoop Java API import demo.