Environment:

Linux + JDK 1.7

Sqoop 1.4.6-cdh5.5.2

hadoop-core 2.6.0-mr1-cdh5.5.2

hadoop-common 2.6.0-cdh5.5.2

hadoop-mapreduce-client-core 2.6.0-cdh5.5.2

Requirement:

Import a table from Oracle into HDFS.

Implementation:

First, assemble the Sqoop command-line arguments:

String[] args = new String[] {
        // Oracle database connection info
        "--connect", "jdbc:oracle:thin:@***:1522/**",
        "-username", "***",
        "-password", "***",
        // query SQL
        "--query", "select * from TABLE_NAME where $CONDITIONS and create_date>=date'2017-05-01' and create_date<date'2017-06-01' ",
        "-split-by", "id",
        "--hive-overwrite",
        "--fields-terminated-by", "'\\001'",
        "--hive-drop-import-delims",
        "--null-string", "'\\\\N'",
        "--null-non-string", "'\\\\N'",
        "--verbose",
        "--target-dir", "/user/hive/warehouse/test.db/H_TABLE_NAME"
};

Execute the Sqoop task:

String[] expandArguments = OptionsFileUtil.expandArguments(args);
SqoopTool tool = SqoopTool.getTool("import");
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://nameservice1"); // set the HDFS service address
Configuration loadPlugins = SqoopTool.loadPlugins(conf);
Sqoop sqoop = new Sqoop((com.cloudera.sqoop.tool.SqoopTool) tool, loadPlugins);
int res = Sqoop.runSqoop(sqoop, expandArguments);
if (res == 0)
    log.info("success");
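
Since Sqoop.runSqoop only reports an exit status, it is worth surfacing failures instead of only logging success. A minimal sketch (the exception type and message are choices of this write-up, not part of the original demo):

int res = Sqoop.runSqoop(sqoop, expandArguments);
if (res != 0) {
    // 0 means success; any nonzero status means the tool failed.
    throw new IllegalStateException("Sqoop import failed with exit code " + res);
}
log.info("success");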

Once coding was finished, the application was deployed to the test environment for testing, where Sqoop reported an error from javac during dynamic compilation:

2017-07-26 15:10:15 [ERROR] [http-0.0.0.0-8080-6] [org.apache.sqoop.tool.ImportTool.run(ImportTool.java:613)] Encountered IOException running import job: java.io.IOException: Error returned by javac
    at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:217)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:108)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)

Unless specially configured, the output of the dynamic compilation step is not routed through log4j, so the actual compiler errors have to be dug out of the system log:

/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:7: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.BytesWritable;
                           ^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
                           ^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:9: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Writable;
                           ^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:37: error: cannot access Writable
public class QueryResult extends SqoopRecord  implements DBWritable, Writable {
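
As an aside, Sqoop drops the generated sources into a per-run hash directory under /tmp, which makes them awkward to find. While debugging, they can be pinned to fixed directories with Sqoop's standard --outdir and --bindir code-generation options; a small sketch (the paths are placeholders):

// Debugging aid: fix where Sqoop writes the generated QueryResult.java and
// its compiled classes. --outdir/--bindir are standard Sqoop options; the
// directories below are examples only.
String[] codegenArgs = new String[] {
        "--outdir", "/tmp/sqoop-gen-src",   // generated .java files
        "--bindir", "/tmp/sqoop-gen-bin"    // compiled .class / .jar output
};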

Judging from these errors, the classpath used for dynamic compilation was presumably missing the hadoop-common jar. The following turned up in CompilationManager:

private String findHadoopJars() {
    String hadoopMapRedHome = options.getHadoopMapRedHome();
    if (null == hadoopMapRedHome) {
        LOG.info("$HADOOP_MAPRED_HOME is not set");
        return Jars.getJarPathForClass(JobConf.class);
    }
    if (!hadoopMapRedHome.endsWith(File.separator)) {
        hadoopMapRedHome = hadoopMapRedHome + File.separator;
    }
    File hadoopMapRedHomeFile = new File(hadoopMapRedHome);
    LOG.info("HADOOP_MAPRED_HOME is " + hadoopMapRedHomeFile.getAbsolutePath());
    Iterator<File> filesIterator = FileUtils.iterateFiles(hadoopMapRedHomeFile,
        new String[] { "jar" }, true);
    StringBuilder sb = new StringBuilder();
    while (filesIterator.hasNext()) {
        File file = filesIterator.next();
        String name = file.getName();
        if (name.startsWith("hadoop-common")
            || name.startsWith("hadoop-mapreduce-client-core")
            || name.startsWith("hadoop-core")) {
            sb.append(file.getAbsolutePath());
            sb.append(File.pathSeparator);
        }
    }
    if (sb.length() < 1) {
        LOG.warn("HADOOP_MAPRED_HOME appears empty or missing");
        return Jars.getJarPathForClass(JobConf.class);
    }
    String s = sb.substring(0, sb.length() - 1);
    LOG.debug("Returning jar file path " + s);
    return s;
}

The suspicion was that, because the hadoopMapRedHome option was not configured, this method could only fall back to the jar containing JobConf.class, i.e. the hadoop-core jar. Turning on DEBUG logging to verify, the following showed up:

2017-07-26 15:10:14 [INFO] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.findHadoopJars(CompilationManager.java:85)] $HADOOP_MAPRED_HOME is not set
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:171)] Current sqoop classpath = :/usr/local/tomcat6/bin/bootstrap.jar
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:195)] Adding source file: /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/QueryResult.java
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:199)] Invoking javac with args:
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   -sourcepath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   -d
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   -classpath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   :/usr/local/tomcat6/bin/bootstrap.jar:/var/www/webapps/***/WEB-INF/lib/hadoop-core-2.6.0-mr1-cdh5.5.2.jar:/var/www/webapps/***/WEB-INF/lib/sqoop-1.4.6-cdh5.5.2.jar

Sure enough, jars were missing. CompilationManager assembles the classpath like this:

String curClasspath = System.getProperty("java.class.path");
LOG.debug("Current sqoop classpath = " + curClasspath);
args.add("-sourcepath");
args.add(jarOutDir);
args.add("-d");
args.add(jarOutDir);
args.add("-classpath");
args.add(curClasspath + File.pathSeparator + coreJar + sqoopJar);
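
Since the -classpath handed to javac is built from the java.class.path system property, a quick check from the application itself shows exactly what the compiler will see (log is assumed to be the application's logger):

// Print what dynamic compilation will see: CompilationManager derives its
// javac -classpath directly from this system property.
for (String entry : System.getProperty("java.class.path")
        .split(java.io.File.pathSeparator)) {
    log.info("compile classpath entry: " + entry);
}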

The missing jars can be added in either of two ways:

1. Modify java.class.path directly:

String curClasspath = System.getProperty("java.class.path");
curClasspath = curClasspath
        + File.pathSeparator
        + "/var/www/webapps/***/WEB-INF/lib/hadoop-common-2.6.0-cdh5.5.2.jar"
        + File.pathSeparator
        + "/var/www/webapps/***/WEB-INF/lib/hadoop-mapreduce-client-core-2.6.0-cdh5.5.2.jar";
System.setProperty("java.class.path", curClasspath);
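
Hard-coding the WEB-INF/lib paths works, but breaks on every version bump. A more portable sketch (a variation of this write-up, not from the original post) derives the jar locations from classes the webapp classloader has already loaded; the classes are loadable even though their jars are absent from java.class.path:

import java.io.File;

public final class SqoopClasspathFix {

    // Locate the jar a class was loaded from via its CodeSource; this is
    // essentially what Sqoop's Jars.getJarPathForClass does for JobConf.
    private static String jarOf(Class<?> cls) {
        return new File(cls.getProtectionDomain().getCodeSource()
                .getLocation().getPath()).getAbsolutePath();
    }

    // Append hadoop-common and hadoop-mapreduce-client-core without
    // version-specific paths: Writable lives in hadoop-common, and
    // org.apache.hadoop.mapreduce.Job lives in hadoop-mapreduce-client-core.
    public static void apply() {
        String cp = System.getProperty("java.class.path")
                + File.pathSeparator + jarOf(org.apache.hadoop.io.Writable.class)
                + File.pathSeparator + jarOf(org.apache.hadoop.mapreduce.Job.class);
        System.setProperty("java.class.path", cp);
    }
}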

2. Add a command-line option (not tried):

--hadoop-mapred-home <dir>   specifies the path to $HADOOP_MAPRED_HOME
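
If this second route were taken, the option would simply be appended to the argument array, and findHadoopJars() would then scan that directory for the hadoop-common / hadoop-mapreduce-client-core / hadoop-core jars. Untried here; the directory shown is only a guess at a typical CDH parcel layout:

// Untried alternative: point findHadoopJars() at a directory to scan for
// hadoop-* jars. The path below is a typical CDH parcel location and is
// purely illustrative.
String[] argsWithMapredHome = new String[] {
        // ... the import options shown earlier ...
        "--hadoop-mapred-home", "/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce"
};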

With the first approach in place, the import runs successfully:

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1547)] Job complete: job_local703153215_0001
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:566)] Counters: 18
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)]   File System Counters
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of bytes read=15015144
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of bytes written=15688984
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of write operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of bytes read=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of bytes written=1536330810
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of read operations=40
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of write operations=36
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)]   Map-Reduce Framework
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Map input records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Map output records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Input split bytes=455
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Spilled Records=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     CPU time spent (ms)=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Physical memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Virtual memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Total committed heap usage (bytes)=4080271360
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:184)] Transferred 1.4308 GB in 71.5332 seconds (20.4822 MB/sec)
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:186)] Retrieved 3272909 records.

With that, the Sqoop Java API import demo is complete.

References:

http://shiyanjun.cn/archives/624.html  Sqoop-1.4.4 import and export tools explained in detail

http://blog.csdn.net/sl1992/article/details/53521819  Operating Sqoop from Java

Reposted from: https://www.cnblogs.com/claren/p/7240735.html
