1. Note: when a command is copied from Windows straight into Linux, characters such as the leading -- are easily mangled (for example pasted as a single long dash), so retype them before running. A pasted "—username" with one long dash will not be recognized and has to be retyped as "--username" with two ASCII hyphens.

sqoop-list-databases --connect jdbc:mysql://122.206.79.212:3306/ --username root -P

  

First, list the databases to see what is there. Oddly, only the databases that this connection can actually see can be imported.
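Before importing, the tables in the target database can be listed the same way. A minimal sketch reusing the connection details from the import command below (sqoop-list-tables is the companion script to sqoop-list-databases):

sqoop-list-tables --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P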

2. Import into HDFS

sqoop import  --connect jdbc:mysql://122.206.79.212:3306/dating --username root --password 123456 --table t_rec_top --driver com.mysql.jdbc.Driver

The command specifies the database, the port, the username, the password and the table; the --driver option does not actually have to be added (see the warning in the log below). No HDFS destination is given, so the data ends up in a default location.
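As the log below also warns, putting the password on the command line is insecure. A variant of the same import that prompts for the password with -P and drops the optional --driver flag, but is otherwise unchanged, would be:

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P --table t_rec_top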

From the output below you can see that there are only map tasks and no reduce task.

Warning: /home/hxsyl/Spark_Relvant/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hxsyl/Spark_Relvant/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/03/15 11:05:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/03/15 11:05:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/03/15 11:05:12 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/03/15 11:05:12 INFO manager.SqlManager: Using default fetchSize of 1000
17/03/15 11:05:12 INFO tool.CodeGenTool: Beginning code generation
17/03/15 11:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
17/03/15 11:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
17/03/15 11:05:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hxsyl/Spark_Relvant/hadoop-2.6.4/share/hadoop/mapreduce
Note: /tmp/sqoop-hxsyl/compile/ddeeb02cdbd25cddc2662317b89c80f1/t_rec_top.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/03/15 11:05:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hxsyl/compile/ddeeb02cdbd25cddc2662317b89c80f1/t_rec_top.jar
17/03/15 11:05:18 INFO mapreduce.ImportJobBase: Beginning import of t_rec_top
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hxsyl/Spark_Relvant/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hxsyl/Spark_Relvant/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/03/15 11:05:19 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/03/15 11:05:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
17/03/15 11:05:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/03/15 11:05:21 INFO client.RMProxy: Connecting to ResourceManager at CentOSMaster/192.168.58.180:8032
17/03/15 11:05:28 INFO db.DBInputFormat: Using read commited transaction isolation
17/03/15 11:05:28 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM t_rec_top
17/03/15 11:05:28 INFO mapreduce.JobSubmitter: number of splits:1
17/03/15 11:05:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489547007191_0001
17/03/15 11:05:30 INFO impl.YarnClientImpl: Submitted application application_1489547007191_0001
17/03/15 11:05:31 INFO mapreduce.Job: The url to track the job: http://CentOSMaster:8088/proxy/application_1489547007191_0001/
17/03/15 11:05:31 INFO mapreduce.Job: Running job: job_1489547007191_0001
17/03/15 11:05:48 INFO mapreduce.Job: Job job_1489547007191_0001 running in uber mode : false
17/03/15 11:05:48 INFO mapreduce.Job:  map 0% reduce 0%
17/03/15 11:06:06 INFO mapreduce.Job:  map 100% reduce 0%
17/03/15 11:06:07 INFO mapreduce.Job: Job job_1489547007191_0001 completed successfully
17/03/15 11:06:07 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=127058
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=99
		HDFS: Number of bytes written=21
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=13150
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=13150
		Total vcore-milliseconds taken by all map tasks=13150
		Total megabyte-milliseconds taken by all map tasks=13465600
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=99
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=183
		CPU time spent (ms)=1200
		Physical memory (bytes) snapshot=107761664
		Virtual memory (bytes) snapshot=2069635072
		Total committed heap usage (bytes)=30474240
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=21
17/03/15 11:06:07 INFO mapreduce.ImportJobBase: Transferred 21 bytes in 46.7701 seconds (0.449 bytes/sec)
17/03/15 11:06:07 INFO mapreduce.ImportJobBase: Retrieved 1 records.

  

Sqoop creates a directory under /user/<username>, and the t_rec_top directory inside it holds our data, though without a header row. The output file name ends in m, showing that the job finished with map tasks only.
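A minimal sketch for inspecting the result, assuming the default import path /user/hxsyl/t_rec_top (hxsyl is the user seen in the logs) and the usual part-m-00000 output file name:

hdfs dfs -ls /user/hxsyl/t_rec_top

hdfs dfs -cat /user/hxsyl/t_rec_top/part-m-00000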

The wc00 directory is the word count of a configuration file; its contents are listed below:

"AS  1
"License");   1
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. 1
(the    1
-->  3
2.0 1
<!-- 3
</configuration>  1
</description>    1
</property>   15
<?xml    1
<configuration>   1
<description>Amount   1
<description>List 1
<description>Number   1
<description>The  7
<description>Where    1
<description>Whether  1
<description>fair-scheduler   1
<description>the  1
<name>yarn.log-aggregation-enable</name>    1
<name>yarn.nodemanager.aux-services</name>  1
<name>yarn.nodemanager.local-dirs</name>    1
<name>yarn.nodemanager.remote-app-log-dir</name>    1
<name>yarn.nodemanager.resource.cpu-vcores</name>   1
<name>yarn.nodemanager.resource.memory-mb</name>    1
<name>yarn.resourcemanager.address</name>   1
<name>yarn.resourcemanager.admin.address</name> 1
<name>yarn.resourcemanager.hostname</name>  1
<name>yarn.resourcemanager.resource-tracker.address</name>  1
<name>yarn.resourcemanager.scheduler.address</name> 1
<name>yarn.resourcemanager.scheduler.class</name>   1
<name>yarn.resourcemanager.webapp.address</name>    1
<name>yarn.resourcemanager.webapp.https.address</name>  1
<name>yarn.scheduler.fair.allocation.file</name>    1
<property>    15
<value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>    1
<value>${yarn.resourcemanager.hostname}:8030</value>    1
<value>${yarn.resourcemanager.hostname}:8031</value>    1
<value>${yarn.resourcemanager.hostname}:8032</value>    1
<value>${yarn.resourcemanager.hostname}:8033</value>    1
<value>${yarn.resourcemanager.hostname}:8088</value>    1
<value>${yarn.resourcemanager.hostname}:8090</value>    1
<value>/home/hxsyl/Spark_Relvant/yarn/local</value> 1
<value>/tmp/logs</value>    1
<value>12</value>   1
<value>30720</value>    1
<value>CentOSMaster</value> 1
<value>mapreduce_shuffle</value>    1
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>   1
<value>true</value> 1
ANY 1
An  1
Apache  1
BASIS,  1
CONDITIONS  1
CPU 1
Configs 1
IS"    1
Individual  1
KIND,   1
LICENSE 1
License 3
License,    1
License.    2
Licensed    1
MB, 1
Manager 1
OF  1
OR  1
RM  3
RM.</description> 2
Resource    1
See 2
Site    1
Unless  1
Version 1
WARRANTIES  1
WITHOUT 1
YARN    1
You 1
a   1
a-zA-Z0-9_  1
accompanying    1
adddress    1
address 4
admin   1
aggregate   1
aggregation</description> 1
agreed  1
allocated   2
an  1
and 2
applicable  1
application's  1
application.</description>    2
applications    1
as  1
at  1
be  4
by  1
called  1
can 3
class   1
compliance  1
conf    1
configuration   1
contain 1
container_${contid},    1
containers'    1
containers.</description> 2
copy    1
cores   1
directories 1
directories,    1
directory   1
distributed 2
either  1
enable  1
except  1
express 1
file    2
file.   1
files   1
for 3
found   1
governing   1
hostname    1
http    1
http://www.apache.org/licenses/LICENSE-2.0  1
https   1
implied.    1
in  4
in. 1
in: 1
interface   1
interface.</description>  2
is  1
language    1
law 1
limitations 1
localized   2
location</description>    1
log 1
logs    1
manager 1
may 2
memory, 1
name    1
not 2
numbers</description> 1
obtain  1
of  11
on  1
only    1
or  2
permissions 1
physical    1
properties  1
required    1
resource    1
scheduler   1
scheduler.</description>  1
service 1
should  1
software    1
specific    2
start   1
store   1
subdirectories  1
that    2
the 15
this    1
this.   1
to  5
to.</description> 1
under   3
use 2
valid   1
version="1.0"?>   1
web 2
will    2
with    2
work    1
writing,    1
you 1

  

--target-dir /path puts the data under that path; -m sets the number of mappers.

Opening the file on HDFS shows that the default field separator is a comma. --fields-terminated-by '\t' changes the separator Sqoop uses when writing the records to HDFS (for describing the delimiter of data Sqoop has to parse, e.g. on export, the corresponding option is --input-fields-terminated-by).
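Combining the two options above, a minimal sketch (the target path /sqoop/t_rec_top is just an illustrative choice, and -P prompts for the password):

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P \
  --table t_rec_top --target-dir /sqoop/t_rec_top -m 1 --fields-terminated-by '\t'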

--columns 'id,account,income' imports only the listed columns.

Only rows that match a condition are imported: --where "id>2 and id<9".
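As an illustration, a sketch that combines column selection with a row filter (assuming the table actually has the columns id, account and income; the target directory is again an arbitrary choice):

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P \
  --table t_detail --columns 'id,account,income' --where 'id>2 and id<9' \
  --target-dir /sqoop/t_detail_filtered -m 1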

To pull data with an arbitrary SQL statement (for example joining several tables), use --query "select * from t_detail where id>5 and $CONDITIONS"; the $CONDITIONS token must always be included.

But if -m is greater than 1, Sqoop has to know how to divide the rows among the mappers, so a split column must be given with --split-by t_detail.id; $CONDITIONS is where Sqoop injects the range condition it derives from that column, so each mapper reads its own slice of the data.

Single quotes around the query are recommended; with double quotes the $ has to be escaped (\$CONDITIONS). Options written with -- are the long form, those with a single - are the short form.
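A minimal sketch of a free-form query import with more than one mapper, using single quotes as recommended (note that --target-dir is required together with --query; the directory name here is only illustrative):

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P \
  --query 'select * from t_detail where id>5 and $CONDITIONS' \
  --split-by t_detail.id --target-dir /sqoop/t_detail -m 2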

The biggest difference between single and double quotes is that double quotes still expand variables, whereas inside single quotes everything is just ordinary characters and nothing has special meaning. An example: suppose you define a variable name=VBird and now want to use it to define myname so that it prints "VBird its me". How do you do that?

[root@linux ~]# name=VBird 
[root@linux ~]# echo $name 
VBird 
[root@linux ~]# myname="$name its me" 
[root@linux ~]# echo $myname 
VBird its me 
[root@linux ~]# myname='$name its me' 
[root@linux ~]# echo $myname 
$name its me

See it? Exactly: with single quotes, $name loses its variable value and is printed as plain characters. Be especially careful about this.

Reposted from: https://www.cnblogs.com/hxsyl/p/6553033.html
