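Alex's Hadoop Beginner Tutorial: Lesson 14, Exporting from HBase to MySQL with Sqoop1
Original post: http://blog.csdn.net/nsrainbow/article/details/41697763
Exporting from HBase to MySQL
The route used in this lesson is: HBase table, then a Hive external table on top of it, then a native Hive table stored as text, then sqoop export into MySQL.
Data preparation
Create an empty table in MySQL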
CREATE TABLE `employee` (
  `rowkey` int(11) NOT NULL,
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Create the employee table in HBase
hbase(main):005:0> create 'employee','info'
0 row(s) in 0.4740 seconds

=> Hbase::Table - employee
hbase(main):006:0> put 'employee',1,'info:id',1
0 row(s) in 0.2080 seconds

hbase(main):008:0> scan 'employee'
ROW    COLUMN+CELL
 1     column=info:id, timestamp=1417591291730, value=1
1 row(s) in 0.0610 seconds

hbase(main):009:0> put 'employee',1,'info:name','peter'
0 row(s) in 0.0220 seconds

hbase(main):010:0> scan 'employee'
ROW    COLUMN+CELL
 1     column=info:id, timestamp=1417591291730, value=1
 1     column=info:name, timestamp=1417591321072, value=peter
1 row(s) in 0.0450 seconds

hbase(main):011:0> put 'employee',2,'info:id',2
0 row(s) in 0.0370 seconds

hbase(main):012:0> put 'employee',2,'info:name','paul'
0 row(s) in 0.0180 seconds

hbase(main):013:0> scan 'employee'
ROW    COLUMN+CELL
 1     column=info:id, timestamp=1417591291730, value=1
 1     column=info:name, timestamp=1417591321072, value=peter
 2     column=info:id, timestamp=1417591500179, value=2
 2     column=info:name, timestamp=1417591512075, value=paul
2 row(s) in 0.0440 seconds
Create a Hive external table
hive> CREATE EXTERNAL TABLE h_employee(key int, id int, name string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, info:id,info:name")
    > TBLPROPERTIES ("hbase.table.name" = "employee");
OK
Time taken: 0.324 seconds
hive> select * from h_employee;
OK
1 1 peter
2 2 paul
Time taken: 1.129 seconds, Fetched: 2 row(s)
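The hbase.columns.mapping string is matched to the Hive columns by position: the first entry, :key, maps the HBase row key to the Hive column key, and each family:qualifier entry after it maps to the next Hive column. The following Python sketch only illustrates that pairing. pair_columns is a hypothetical helper, not part of Hive, and stripping the stray space in the mapping string is an assumption about how the entry is meant to be read.

```python
def pair_columns(hive_columns, mapping):
    """Pair each Hive column with its HBase source, by position."""
    # Assumption: surrounding whitespace in an entry is not significant.
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("mapping must have one entry per Hive column")
    return dict(zip(hive_columns, entries))


pairs = pair_columns(["key", "id", "name"], ":key, info:id,info:name")
for column, source in pairs.items():
    print(f"{column} <- {source}")
```

This is why select * from h_employee returns the row key first, followed by info:id and info:name.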
Create a native Hive table
CREATE TABLE h_employee_export(key INT, id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054';
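'\054' is the octal escape for a comma, so the fields are comma-separated. Here is what the stored text file looks like once the table has been loaded (which happens in the next step):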
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_0
1,1,peter
2,2,paul
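To make the delimiter concrete, here is a small Python sketch that checks what '\054' stands for and splits the two lines shown above into typed fields. parse_row is a hypothetical helper written for this illustration; it is not how Sqoop itself parses the file.

```python
FIELD_DELIMITER = "\054"  # octal 054 is decimal 44, the ASCII code of ','


def parse_row(line):
    """Split one line of the Hive text file into (key, id, name)."""
    key, emp_id, name = line.rstrip("\n").split(FIELD_DELIMITER)
    return int(key), int(emp_id), name


rows = [parse_row(line) for line in ["1,1,peter\n", "2,2,paul\n"]]
print(FIELD_DELIMITER == ",")
print(rows)
```

Because the file is plain comma-separated text, sqoop export can read it with its default input field delimiter, which is why the export command later needs no extra delimiter option.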
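Start the export
Load data from the source Hive table into the staging table
The first step is to copy the data from h_employee (the external table backed by HBase) into h_employee_export (the native Hive table).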
hive> insert overwrite table h_employee_export select * from h_employee;
hive> select * from h_employee_export;
OK
1 1 peter
2 2 paul
Time taken: 0.359 seconds, Fetched: 2 row(s)
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_0
1,1,peter
2,2,paul
Export the data from Hive to MySQL
$ sqoop export --connect jdbc:mysql://host1:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/hive/warehouse/h_employee_export/
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/05 08:49:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/05 08:49:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/05 08:49:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/05 08:49:35 INFO tool.CodeGenTool: Beginning code generation
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/05 08:49:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.jar
14/12/05 08:49:39 INFO mapreduce.ExportJobBase: Beginning export of employee
14/12/05 08:49:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/12/05 08:49:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.111.78.111:8032
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO mapreduce.JobSubmitter: number of splits:1
14/12/05 08:49:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406097234796_0037
14/12/05 08:49:46 INFO impl.YarnClientImpl: Submitted application application_1406097234796_0037
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
14/12/05 08:49:46 INFO mapreduce.Job: Running job: job_1406097234796_0037
14/12/05 08:49:59 INFO mapreduce.Job: Job job_1406097234796_0037 running in uber mode : false
14/12/05 08:49:59 INFO mapreduce.Job: map 0% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job: map 100% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job: Job job_1406097234796_0037 completed successfully
14/12/05 08:50:10 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=99761
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=166
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8805
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=8805
        Total vcore-seconds taken by all map tasks=8805
        Total megabyte-seconds taken by all map tasks=9016320
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=144
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=97
        CPU time spent (ms)=1360
        Physical memory (bytes) snapshot=167555072
        Virtual memory (bytes) snapshot=684212224
        Total committed heap usage (bytes)=148897792
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Transferred 166 bytes in 27.0676 seconds (6.1328 bytes/sec)
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Exported 2 records.
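The log above warns that putting the password on the command line is insecure and suggests -P, which makes Sqoop prompt for the password instead. As a sketch of that advice, the Python snippet below assembles the same export command without an inline password. build_export_command is a hypothetical helper for illustration, and it uses -m, the documented short form of --num-mappers, where the command above wrote --m.

```python
import shlex


def build_export_command(jdbc_url, username, table, export_dir, mappers=1):
    """Assemble the sqoop export argument list without an inline password."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password, as the warning in the log suggests
        "--table", table,
        "-m", str(mappers),
        "--export-dir", export_dir,
    ]


command = build_export_command(
    "jdbc:mysql://host1:3306/sqoop_test",
    "root",
    "employee",
    "/user/hive/warehouse/h_employee_export/",
)
print(shlex.join(command))
```

The printed line can be pasted into a shell. The list form is kept so that it could also be handed to subprocess.run without any shell quoting concerns.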
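Checking the job status
The log contains this line:
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
It means you can open this address in a browser to see how the job is running. If your job hangs for a long time without finishing, something has probably gone wrong, and this page is where you can read the detailed error log.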
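If you capture the Sqoop output in a file or a script, the tracking address can be pulled out automatically. A minimal Python sketch, assuming the log keeps the exact wording "The url to track the job:" shown above; find_tracking_url is a hypothetical helper name.

```python
import re

TRACKING_URL = re.compile(r"The url to track the job: (\S+)")


def find_tracking_url(log_lines):
    """Return the first job tracking URL found in the log, or None."""
    for line in log_lines:
        match = TRACKING_URL.search(line)
        if match:
            return match.group(1)
    return None


log = [
    "14/12/05 08:49:46 INFO impl.YarnClientImpl: Submitted application application_1406097234796_0037",
    "14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/",
]
url = find_tracking_url(log)
print(url)
```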
mysql> select * from employee;
+--------+----+-------+
| rowkey | id | name  |
+--------+----+-------+
|      1 |  1 | peter |
|      2 |  2 | paul  |
+--------+----+-------+
2 rows in set (0.00 sec)

The export succeeded.
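Handling null values
If the data may contain null values, add these two arguments to the export command:
--input-null-string "\\\\N" --input-null-non-string "\\\\N"
The command then becomes: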
sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/hive/warehouse/h_employee_export/ --input-null-string "\\\\N" --input-null-non-string "\\\\N"
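Hive writes a null value as \N in its text files. These two arguments tell Sqoop to treat \N in the input as NULL and insert null into MySQL, the first for string columns and the second for all other columns. Without them, the export can fail when it meets a null value.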
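The four backslashes are easier to follow in two steps. The Python sketch below first uses shlex to mimic what a POSIX shell passes to Sqoop for the quoted argument, then shows the intended effect on a data line. The second half is an illustration only: decode_fields is a hypothetical helper, the sample line with a null name is made up for this example, and the statement that Sqoop reads the remaining \\N as an escaped \N reflects my understanding of its behaviour rather than anything shown in the log.

```python
import shlex

# Step 1: inside double quotes the shell collapses each pair of backslashes
# into one, so sqoop receives two backslashes followed by N.
argv = shlex.split(r'--input-null-string "\\\\N" --input-null-non-string "\\\\N"')
print(argv)

# Step 2 (assumption): sqoop unescapes that once more and ends up with the
# marker that Hive writes for NULL in a text-format table.
NULL_TOKEN = r"\N"


def decode_fields(line, delimiter=","):
    """Split a line and turn every NULL marker into None."""
    return [None if field == NULL_TOKEN else field
            for field in line.rstrip("\n").split(delimiter)]


# Made-up sample line: a third employee whose name is null.
print(decode_fields("3,3,\\N\n"))
```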