Alex's Hadoop Tutorial for Beginners, Lesson 12: Sqoop1 Installation / Import / Export
Original post: http://blog.csdn.net/nsrainbow/article/details/41575807
What is Sqoop?
Sqoop is a tool for moving data between traditional relational databases and HDFS, in both directions. Sqoop2 exists, but as of this writing it is still half-finished: it does not support HBase and has far fewer features, so this lesson covers Sqoop1.
Installing Sqoop1
- yum install -y sqoop
Run help to verify the installation:
- # sqoop help
- Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
- Please set $HCAT_HOME to the root of your HCatalog installation.
- Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
- Please set $ACCUMULO_HOME to the root of your Accumulo installation.
- 14/11/28 11:33:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
- usage: sqoop COMMAND [ARGS]
- Available commands:
- codegen Generate code to interact with database records
- create-hive-table Import a table definition into Hive
- eval Evaluate a SQL statement and display the results
- export Export an HDFS directory to a database table
- help List available commands
- import Import a table from a database to HDFS
- import-all-tables Import tables from a database to HDFS
- job Work with saved jobs
- list-databases List available databases on a server
- list-tables List available tables in a database
- merge Merge results of incremental imports
- metastore Run a standalone Sqoop metastore
- version Display version information
- See 'sqoop help COMMAND' for information on a specific command.
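The HCatalog and Accumulo warnings above are harmless if you do not use those tools. If they are installed, the warnings can be silenced by pointing the environment variables at the installation roots. A minimal sketch, assuming typical CDH package paths (adjust to your actual layout):

```shell
# Hypothetical install roots -- adjust to wherever HCatalog/Accumulo
# actually live on your machine, if they are installed at all.
export HCAT_HOME=/usr/lib/hive-hcatalog
export ACCUMULO_HOME=/usr/lib/accumulo
```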
Copy the JDBC driver to /usr/lib/sqoop/lib
MySQL JDBC driver download page
After downloading, extract the archive, find the driver jar, upload it to the server, and move it into Sqoop's lib directory:
- mv /home/alex/mysql-connector-java-5.1.34-bin.jar /usr/lib/sqoop/lib
Import
Preparing the data
- create database sqoop_test;
- CREATE TABLE `employee` (
- `id` int(11) NOT NULL,
- `name` varchar(20) NOT NULL,
- PRIMARY KEY (`id`)
- ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Insert a few rows:
- insert into employee (id,name) values (1,'michael');
- insert into employee (id,name) values (2,'ted');
- insert into employee (id,name) values (3,'jack');
Importing from MySQL to HDFS
List the databases and tables. First, allow root to connect from hosts other than localhost, since the map tasks Sqoop launches reach MySQL over the network:
- mysql> use mysql
- mysql> update user set Host='%' where Host='127.0.0.1' and User='root';
- mysql> flush privileges;
- # sqoop list-databases --connect jdbc:mysql://host1:3306/sqoop_test --username root --password root
- Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
- Please set $HCAT_HOME to the root of your HCatalog installation.
- Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
- Please set $ACCUMULO_HOME to the root of your Accumulo installation.
- 14/12/01 09:20:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
- 14/12/01 09:20:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 14/12/01 09:20:28 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- information_schema
- cacti
- metastore
- mysql
- sqoop_test
- wordpress
- zabbix
- # sqoop list-tables --connect jdbc:mysql://host1/sqoop_test --username root --password root
- Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
- Please set $HCAT_HOME to the root of your HCatalog installation.
- Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
- Please set $ACCUMULO_HOME to the root of your Accumulo installation.
- 14/11/28 11:46:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
- 14/11/28 11:46:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 14/11/28 11:46:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- employee
- student
- workers
No driver class is needed in this command because Sqoop supports MySQL out of the box. To name the JDBC driver class explicitly, add --driver:
- # sqoop list-tables --connect jdbc:mysql://localhost/sqoop_test --username root --password root --driver com.mysql.jdbc.Driver
Importing data into HDFS
- sqoop import --connect jdbc:mysql://host1:3306/sqoop_test --username root --password root --table employee --m 1 --target-dir /user/test3
- import selects the import tool
- --connect the JDBC connection URL
- --username the database user
- --password the password (as the warning above notes, -P is safer: it prompts instead of exposing the password on the command line)
- --table the source table to import
- --m the number of parallel map tasks, set to 1 here
- --target-dir the HDFS directory the imported files are written to
- Full parameter reference: http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_purpose
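Two more options worth knowing (my own sketch, not from the original post; --where and --fields-terminated-by are standard Sqoop 1.4 flags, and the target directory name here is made up):

```shell
# Hypothetical variant: import only part of the table, tab-delimited.
# --where pushes a filter into the generated SELECT;
# --fields-terminated-by changes the delimiter used in the HDFS
# output files (the default is ',').
sqoop import \
  --connect jdbc:mysql://host1:3306/sqoop_test \
  --username root -P \
  --table employee \
  --where 'id >= 2' \
  --fields-terminated-by '\t' \
  --m 1 \
  --target-dir /user/test3_filtered
```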
- [root@host1 hadoop-hdfs]# sqoop import --connect jdbc:mysql://host1:3306/sqoop_test --username root --password root --table employee --m 1 --target-dir /user/test3
- Warning: /usr/lib/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
- Please set $HCAT_HOME to the root of your HCatalog installation.
- Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
- Please set $ACCUMULO_HOME to the root of your Accumulo installation.
- 15/01/23 06:48:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.1
- 15/01/23 06:48:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 15/01/23 06:48:11 INFO manager.SqlManager: Using default fetchSize of 1000
- 15/01/23 06:48:11 INFO tool.CodeGenTool: Beginning code generation
- 15/01/23 06:48:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
- 15/01/23 06:48:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
- 15/01/23 06:48:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
- Note: /tmp/sqoop-root/compile/0989201fc3275ff35dc9c41f1031ea42/employee.java uses or overrides a deprecated API.
- Note: Recompile with -Xlint:deprecation for details.
- 15/01/23 06:48:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/0989201fc3275ff35dc9c41f1031ea42/employee.jar
- 15/01/23 06:48:47 WARN manager.MySQLManager: It looks like you are importing from mysql.
- 15/01/23 06:48:47 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
- 15/01/23 06:48:47 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
- 15/01/23 06:48:47 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
- 15/01/23 06:48:47 INFO mapreduce.ImportJobBase: Beginning import of employee
- 15/01/23 06:48:57 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
- 15/01/23 06:49:12 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
- 15/01/23 06:49:13 INFO client.RMProxy: Connecting to ResourceManager at host1/192.168.199.126:8032
- 15/01/23 06:50:10 INFO db.DBInputFormat: Using read commited transaction isolation
- 15/01/23 06:50:10 INFO mapreduce.JobSubmitter: number of splits:1
- 15/01/23 06:50:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421771779239_0003
- 15/01/23 06:50:22 INFO impl.YarnClientImpl: Submitted application application_1421771779239_0003
- 15/01/23 06:50:23 INFO mapreduce.Job: The url to track the job: http://host1:8088/proxy/application_1421771779239_0003/
- 15/01/23 06:50:23 INFO mapreduce.Job: Running job: job_1421771779239_0003
- 15/01/23 06:57:10 INFO mapreduce.Job: Job job_1421771779239_0003 running in uber mode : false
- 15/01/23 06:57:16 INFO mapreduce.Job: map 0% reduce 0%
- 15/01/23 06:58:13 INFO mapreduce.Job: map 100% reduce 0%
- 15/01/23 06:58:19 INFO mapreduce.Job: Job job_1421771779239_0003 completed successfully
- 15/01/23 06:58:33 INFO mapreduce.Job: Counters: 30
- File System Counters
- FILE: Number of bytes read=0
- FILE: Number of bytes written=128844
- FILE: Number of read operations=0
- FILE: Number of large read operations=0
- FILE: Number of write operations=0
- HDFS: Number of bytes read=87
- HDFS: Number of bytes written=23
- HDFS: Number of read operations=4
- HDFS: Number of large read operations=0
- HDFS: Number of write operations=2
- Job Counters
- Launched map tasks=1
- Other local map tasks=1
- Total time spent by all maps in occupied slots (ms)=74359
- Total time spent by all reduces in occupied slots (ms)=0
- Total time spent by all map tasks (ms)=74359
- Total vcore-seconds taken by all map tasks=74359
- Total megabyte-seconds taken by all map tasks=76143616
- Map-Reduce Framework
- Map input records=3
- Map output records=3
- Input split bytes=87
- Spilled Records=0
- Failed Shuffles=0
- Merged Map outputs=0
- GC time elapsed (ms)=501
- CPU time spent (ms)=2680
- Physical memory (bytes) snapshot=107692032
- Virtual memory (bytes) snapshot=654852096
- Total committed heap usage (bytes)=17760256
- File Input Format Counters
- Bytes Read=0
- File Output Format Counters
- Bytes Written=23
- 15/01/23 06:58:35 INFO ipc.Client: Retrying connect to server: host1.localdomain/192.168.199.126:39437. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
- 15/01/23 06:58:36 INFO ipc.Client: Retrying connect to server: host1.localdomain/192.168.199.126:39437. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
- 15/01/23 06:58:37 INFO ipc.Client: Retrying connect to server: host1.localdomain/192.168.199.126:39437. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
- 15/01/23 06:58:37 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
- 15/01/23 06:59:16 INFO mapreduce.ImportJobBase: Transferred 23 bytes in 601.9783 seconds (0.0382 bytes/sec)
- 15/01/23 06:59:16 INFO mapreduce.ImportJobBase: Retrieved 3 records.
Check the result:
- # hdfs dfs -ls /user/test3
- Found 2 items
- -rw-r--r-- 2 root supergroup 0 2014-12-01 14:16 /user/test3/_SUCCESS
- -rw-r--r-- 2 root supergroup 16 2014-12-01 14:16 /user/test3/part-m-00000
- # hdfs dfs -cat /user/test3/part-m-00000
- 1,michael
- 2,ted
- 3,jack
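By default Sqoop writes plain text files with ',' as the field delimiter, so the output is easy to post-process with standard tools. As a local sketch (recreating the file contents shown above so it runs without a cluster), each row can be mechanically turned back into an INSERT statement:

```shell
# Recreate the imported file locally with the row values shown above.
printf '1,michael\n2,ted\n3,jack\n' > part-m-00000

# Sqoop's default field delimiter is ','; turn each row back into an INSERT.
awk -F',' '{ printf "insert into employee (id,name) values (%s,'\''%s'\'');\n", $1, $2 }' part-m-00000
```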
Problems I ran into
- 14/12/01 10:12:42 INFO mapreduce.Job: Task Id : attempt_1406097234796_0017_m_000000_0, Status : FAILED
- Error: employee : Unsupported major.minor version 51.0
Running ps aux | grep hadoop showed that Hadoop was using JDK 1.6; "Unsupported major.minor version 51.0" means the class file was compiled for Java 7 but run on an older JVM. I hit this with CDH 5.0.1 and Sqoop 1.4.4.
To change the JDK, stop all the services, switch the Java version, then bring everything back up in order:
- for x in `cd /etc/init.d ; ls hive-*` ; do sudo service $x stop ; done
- for x in `cd /etc/init.d ; ls hbase-*` ; do sudo service $x stop ; done
- for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done
- /etc/init.d/zookeeper-server stop
- /etc/init.d/zookeeper-server start
- for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x start ; done
- for x in `cd /etc/init.d ; ls hbase-*` ; do sudo service $x start ; done
- for x in `cd /etc/init.d ; ls hive-*` ; do sudo service $x start ; done
Exporting from HDFS to MySQL
Preparing the data — empty the target table first so the exported rows are easy to verify:
- mysql> truncate employee;
Exporting the data to MySQL
- # sqoop export --connect jdbc:mysql://host1:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/test3
- export selects the export tool
- --connect the JDBC connection URL
- --username the database user
- --password the password
- --table the MySQL table to export into
- --m the number of parallel map tasks
- --export-dir the HDFS source directory to export from
- Full parameter reference: http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_purpose_3
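Note that a plain export issues INSERTs only, so re-running it against a non-empty table fails on the primary key. Sqoop 1.4 also supports update-style exports via --update-key and --update-mode; a sketch (not from the original post):

```shell
# Upsert-style export: rows whose id already exists in MySQL are
# UPDATEd, new ids are INSERTed. --update-key names the key column(s).
sqoop export \
  --connect jdbc:mysql://host1:3306/sqoop_test \
  --username root -P \
  --table employee \
  --export-dir /user/test3 \
  --update-key id \
  --update-mode allowinsert \
  --m 1
```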
- [root@host1 hadoop-hdfs]# sqoop export --connect jdbc:mysql://host1:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/test3
- Warning: /usr/lib/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
- Please set $HCAT_HOME to the root of your HCatalog installation.
- Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
- Please set $ACCUMULO_HOME to the root of your Accumulo installation.
- 15/01/23 07:04:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.1
- 15/01/23 07:04:44 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 15/01/23 07:04:45 INFO manager.SqlManager: Using default fetchSize of 1000
- 15/01/23 07:04:45 INFO tool.CodeGenTool: Beginning code generation
- 15/01/23 07:04:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
- 15/01/23 07:04:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
- 15/01/23 07:04:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
- Note: /tmp/sqoop-root/compile/4e6318352dc0beeb6e1e7724c8a6d935/employee.java uses or overrides a deprecated API.
- Note: Recompile with -Xlint:deprecation for details.
- 15/01/23 07:05:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/4e6318352dc0beeb6e1e7724c8a6d935/employee.jar
- 15/01/23 07:05:07 INFO mapreduce.ExportJobBase: Beginning export of employee
- 15/01/23 07:05:11 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
- 15/01/23 07:05:24 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
- 15/01/23 07:05:24 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
- 15/01/23 07:05:24 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
- 15/01/23 07:05:25 INFO client.RMProxy: Connecting to ResourceManager at host1/192.168.199.126:8032
- 15/01/23 07:06:00 INFO input.FileInputFormat: Total input paths to process : 1
- 15/01/23 07:06:00 INFO input.FileInputFormat: Total input paths to process : 1
- 15/01/23 07:06:01 INFO mapreduce.JobSubmitter: number of splits:1
- 15/01/23 07:06:01 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
- 15/01/23 07:06:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421771779239_0004
- 15/01/23 07:06:05 INFO impl.YarnClientImpl: Submitted application application_1421771779239_0004
- 15/01/23 07:06:06 INFO mapreduce.Job: The url to track the job: http://host1:8088/proxy/application_1421771779239_0004/
- 15/01/23 07:06:06 INFO mapreduce.Job: Running job: job_1421771779239_0004
- 15/01/23 07:08:03 INFO mapreduce.Job: Job job_1421771779239_0004 running in uber mode : false
- 15/01/23 07:08:03 INFO mapreduce.Job: map 0% reduce 0%
- 15/01/23 07:12:15 INFO mapreduce.Job: map 100% reduce 0%
- 15/01/23 07:12:49 INFO mapreduce.Job: Job job_1421771779239_0004 completed successfully
- 15/01/23 07:12:52 INFO mapreduce.Job: Counters: 30
- File System Counters
- FILE: Number of bytes read=0
- FILE: Number of bytes written=128509
- FILE: Number of read operations=0
- FILE: Number of large read operations=0
- FILE: Number of write operations=0
- HDFS: Number of bytes read=147
- HDFS: Number of bytes written=0
- HDFS: Number of read operations=4
- HDFS: Number of large read operations=0
- HDFS: Number of write operations=0
- Job Counters
- Launched map tasks=1
- Rack-local map tasks=1
- Total time spent by all maps in occupied slots (ms)=253584
- Total time spent by all reduces in occupied slots (ms)=0
- Total time spent by all map tasks (ms)=253584
- Total vcore-seconds taken by all map tasks=253584
- Total megabyte-seconds taken by all map tasks=259670016
- Map-Reduce Framework
- Map input records=3
- Map output records=3
- Input split bytes=121
- Spilled Records=0
- Failed Shuffles=0
- Merged Map outputs=0
- GC time elapsed (ms)=3872
- CPU time spent (ms)=3390
- Physical memory (bytes) snapshot=97640448
- Virtual memory (bytes) snapshot=652566528
- Total committed heap usage (bytes)=15585280
- File Input Format Counters
- Bytes Read=0
- File Output Format Counters
- Bytes Written=0
- 15/01/23 07:12:52 INFO mapreduce.ExportJobBase: Transferred 147 bytes in 448.1491 seconds (0.328 bytes/sec)
- 15/01/23 07:12:52 INFO mapreduce.ExportJobBase: Exported 3 records.
- mysql> select * from employee;
- +----+---------+
- | id | name |
- +----+---------+
- | 1 | michael |
- | 2 | ted |
- | 3 | jack |
- +----+---------+
- 3 rows in set (0.12 sec)