This article covers the installation, configuration, and basic usage of Sqoop 1.4.6 on Hadoop 2.7.4.
I. Installation and Configuration
1. Installing Sqoop

[hadoop@hdp01 ~]$ wget http://mirror.bit.edu.cn/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[hadoop@hdp01 ~]$ tar -xzf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[hadoop@hdp01 ~]$ mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /u01/sqoop
--Edit the Sqoop environment script
[hadoop@hdp01 ~]$ cd /u01/sqoop/conf
[hadoop@hdp01 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@hdp01 conf]$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/u01/hadoop
export HADOOP_MAPRED_HOME=/u01/hadoop
export HBASE_HOME=/u01/hbase
export HIVE_HOME=/u01/hive
export ZOOCFGDIR=/u01/zookeeper/conf
--Comment out the following block in configure-sqoop (this silences the startup warnings about missing HCatalog and Accumulo installations):
#if [ -z "${HCAT_HOME}" ]; then
#  if [ -d "/usr/lib/hive-hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hive-hcatalog
#  elif [ -d "/usr/lib/hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hcatalog
#  else
#    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
#    if [ ! -d ${HCAT_HOME} ]; then
#       HCAT_HOME=${SQOOP_HOME}/../hcatalog
#    fi
#  fi
#fi
#if [ -z "${ACCUMULO_HOME}" ]; then
#  if [ -d "/usr/lib/accumulo" ]; then
#    ACCUMULO_HOME=/usr/lib/accumulo
#  else
#    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
#  fi
#fi
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#
#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
--Edit the user's environment variables
[hadoop@hdp01 ~]$ vi .bash_profile
export SQOOP_HOME=/u01/sqoop
export SQOOP_CONF_DIR=$SQOOP_HOME/conf
export SQOOP_CLASSPATH=$SQOOP_CONF_DIR
export PATH=$PATH:$SQOOP_HOME/bin
[hadoop@hdp01 ~]$ source .bash_profile
--Verify the Sqoop installation
[hadoop@hdp01 ~]$ sqoop version
2017-12-28 09:30:01,801 [myid:] - INFO  [main:Sqoop@92] - Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
Alternatively, run sqoop-version.
--Copy the JDBC drivers
Copy the MySQL, PostgreSQL, and Oracle JDBC driver jars into $SQOOP_HOME/lib.
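For example, assuming the driver jars have already been downloaded to the home directory (the jar file names below are placeholders; use whichever versions match your databases):

[hadoop@hdp01 ~]$ cp mysql-connector-java-5.1.44-bin.jar $SQOOP_HOME/lib/
[hadoop@hdp01 ~]$ cp postgresql-42.1.4.jar $SQOOP_HOME/lib/
[hadoop@hdp01 ~]$ cp ojdbc7.jar $SQOOP_HOME/lib/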

II. Using Sqoop
1. Testing the JDBC connections
1.1 Connecting Sqoop to MySQL

[hadoop@hdp01 bin]$ sqoop list-tables --username root -P --connect jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false
2017-12-28 09:38:19,587 [myid:] - INFO  [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 09:38:23,067 [myid:] - INFO  [main:MySQLManager@69] - Preparing to use a MySQL streaming resultset.
Phone
TestPhone
history_store
tbAreaprefix
tbAreaprefix_bak
tbBill
tbBilltmp
tbCat
tbContact
tbDataPath
tbDeliverMsg
tbDeliverMsg2
tbDest
tbLocPrefix
tbMessage
tbPrice
tbReceiver
tbSSLog
tbSendState
tbSendState2
tbSmsSendState
tbTest
tbUser
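Note that the ? in the JDBC URL is a shell glob character, so it is safer to quote the URL. The interactive -P prompt can also be replaced with --password-file, which reads the password from a file on HDFS (Sqoop expects the file to be readable only by its owner; echo -n avoids storing a trailing newline). A minimal sketch, assuming a hypothetical password file at /user/hadoop/.mysql.pwd:

[hadoop@hdp01 ~]$ echo -n 'secret' | hdfs dfs -put - /user/hadoop/.mysql.pwd
[hadoop@hdp01 ~]$ hdfs dfs -chmod 400 /user/hadoop/.mysql.pwd
[hadoop@hdp01 ~]$ sqoop list-tables --username root --password-file /user/hadoop/.mysql.pwd --connect 'jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false'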

1.2 Connecting Sqoop to PostgreSQL

[hadoop@hdp01 ~]$ sqoop list-tables --username rhnuser -P --connect jdbc:postgresql://192.168.120.93:5432/rhndb
2017-12-28 09:40:24,842 [myid:] - INFO  [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 09:40:29,775 [myid:] - INFO  [main:SqlManager@98] - Using default fetchSize of 1000
rhnservergroupmembers
rhntemplatestring
rhnservergrouptypefeature
rhnserverhistory
qrtz_fired_triggers
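The PostgreSQL connector works against the public schema by default. For tables in another schema, Sqoop accepts an extra --schema argument placed after a bare -- separator; a sketch for a later import, using a hypothetical schema named reporting:

[hadoop@hdp01 ~]$ sqoop import --connect jdbc:postgresql://192.168.120.93:5432/rhndb --username rhnuser -P --table rhnserverhistory -m 1 --target-dir /tmp/rhnserverhistory -- --schema reporting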

1.3 Connecting Sqoop to Oracle

[hadoop@hdp01 ~]$ sqoop list-tables --username spwuser -P --connect jdbc:oracle:thin:@192.168.120.121:1521/rhndb --driver oracle.jdbc.driver.OracleDriver
2017-12-28 10:01:43,337 [myid:] - INFO  [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 10:01:43,425 [myid:] - INFO  [main:SqlManager@98] - Using default fetchSize of 1000
rhnservergroupmembers
rhntemplatestring
rhnservergrouptypefeature
rhnserverhistory
qrtz_fired_triggers
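Passing --driver explicitly makes Sqoop fall back to its generic JDBC manager (hence the generic SqlManager in the log above). Omitting it lets Sqoop select the Oracle-specific manager from the jdbc:oracle: URL prefix, which is generally preferable:

[hadoop@hdp01 ~]$ sqoop list-tables --username spwuser -P --connect jdbc:oracle:thin:@192.168.120.121:1521/rhndb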

1.4 Connecting Sqoop to Hive
Create a table named rhnpackagefile in Hive from the PostgreSQL table of the same name, without importing any data; data import is covered below.

[hadoop@hdp01 ~]$ sqoop create-hive-table --connect jdbc:postgresql://192.168.120.93:5432/rhndb --table rhnpackagefile --username rhnuser -P --hive-database hivedb
2017-12-28 10:32:01,376 [myid:] - INFO  [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 10:32:04,699 [myid:] - INFO  [main:BaseSqoopTool@1353] - Using Hive-specific delimiters for output. You can override
2017-12-28 10:32:04,699 [myid:] - INFO  [main:BaseSqoopTool@1354] - delimiters with --fields-terminated-by, etc.
2017-12-28 10:32:04,819 [myid:] - INFO  [main:SqlManager@98] - Using default fetchSize of 1000
2017-12-28 10:32:05,015 [myid:] - INFO  [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM "rhnpackagefile" AS t LIMIT 1
2017-12-28 10:32:05,674 [myid:] - INFO  [main:HiveImport@194] - Loading uploaded data into Hive
2017-12-28 10:32:09,089 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Class path contains multiple SLF4J bindings.
2017-12-28 10:32:09,090 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,090 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,090 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,091 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/tez/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,091 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,091 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2017-12-28 10:32:09,095 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-12-28 10:32:11,996 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] -
2017-12-28 10:32:11,996 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - Logging initialized using configuration in jar:file:/u01/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
2017-12-28 10:32:16,650 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - OK
2017-12-28 10:32:16,783 [myid:] - INFO  [Thread-6:LoggingAsyncSink$LoggingThread@85] - Time taken: 3.433 seconds
2017-12-28 10:32:17,248 [myid:] - INFO  [main:HiveImport@242] - Hive import complete.
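The new (still empty) table can be sanity-checked from the Hive CLI:

[hadoop@hdp01 ~]$ hive -e 'USE hivedb; SHOW TABLES; DESCRIBE rhnpackagefile;'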

2. Data Migration
2.1 PostgreSQL → Hive

[hadoop@hdp01 ~]$ sqoop import --connect jdbc:postgresql://192.168.120.93:5432/rhndb --table rhnpackagefile --username rhnuser -P --fields-terminated-by ',' --hive-import --hive-database hivedb --columns package_id,capability_id,device,inode,file_mode,username,groupname,rdev,file_size,mtime,checksum_id,linkto,flags,verifyflags,lang,created,modified --split-by modified -m 4
2017-12-28 11:24:46,666 [myid:] - INFO  [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 11:24:48,891 [myid:] - INFO  [main:SqlManager@98] - Using default fetchSize of 1000
2017-12-28 11:24:48,894 [myid:] - INFO  [main:CodeGenTool@92] - Beginning code generation
2017-12-28 11:24:49,091 [myid:] - INFO  [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM "rhnpackagefile" AS t LIMIT 1
2017-12-28 11:24:49,127 [myid:] - INFO  [main:CompilationManager@94] - HADOOP_MAPRED_HOME is /u01/hadoop
Note: /tmp/sqoop-hadoop/compile/ca09f6bb133fa32808220902aedc0437/rhnpackagefile.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2017-12-28 11:24:50,481 [myid:] - INFO  [main:CompilationManager@330] - Writing jar file: /tmp/sqoop-hadoop/compile/ca09f6bb133fa32808220902aedc0437/rhnpackagefile.jar
2017-12-28 11:24:50,493 [myid:] - WARN  [main:PostgresqlManager@119] - It looks like you are importing from postgresql.
2017-12-28 11:24:50,493 [myid:] - WARN  [main:PostgresqlManager@120] - This transfer can be faster! Use the --direct
2017-12-28 11:24:50,494 [myid:] - WARN  [main:PostgresqlManager@121] - option to exercise a postgresql-specific fast path.
2017-12-28 11:24:50,495 [myid:] - INFO  [main:ImportJobBase@235] - Beginning import of rhnpackagefile
2017-12-28 11:24:50,496 [myid:] - INFO  [main:Configuration@1019] - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-28 11:24:50,634 [myid:] - INFO  [main:Configuration@1019] - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-12-28 11:24:51,160 [myid:] - INFO  [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 11:24:51,506 [myid:] - INFO  [main:TimelineClientImpl@123] - Timeline service address: http://hdp01:8188/ws/v1/timeline/
2017-12-28 11:24:51,696 [myid:] - INFO  [main:AHSProxy@42] - Connecting to Application History server at hdp01.thinkjoy.tt/192.168.120.96:10201
2017-12-28 11:24:53,801 [myid:] - INFO  [main:DBInputFormat@192] - Using read commited transaction isolation
2017-12-28 11:24:53,805 [myid:] - INFO  [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 11:24:53,805 [myid:] - INFO  [main:DataDrivenDBInputFormat@147] - BoundingValsQuery: SELECT MIN("modified"), MAX("modified") FROM "rhnpackagefile"
2017-12-28 11:25:14,854 [myid:] - WARN  [main:TextSplitter@64] - Generating splits for a textual index column.
2017-12-28 11:25:14,854 [myid:] - WARN  [main:TextSplitter@65] - If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
2017-12-28 11:25:14,854 [myid:] - WARN  [main:TextSplitter@67] - You are strongly encouraged to choose an integral split column.
2017-12-28 11:25:14,903 [myid:] - INFO  [main:JobSubmitter@396] - number of splits:6
2017-12-28 11:25:14,997 [myid:] - INFO  [main:JobSubmitter@479] - Submitting tokens for job: job_1514358672274_0009
2017-12-28 11:25:15,453 [myid:] - INFO  [main:YarnClientImpl@236] - Submitted application application_1514358672274_0009
2017-12-28 11:25:15,485 [myid:] - INFO  [main:Job@1289] - The url to track the job: http://hdp01:8088/proxy/application_1514358672274_0009/
2017-12-28 11:25:15,486 [myid:] - INFO  [main:Job@1334] - Running job: job_1514358672274_0009
2017-12-28 11:25:24,763 [myid:] - INFO  [main:Job@1355] - Job job_1514358672274_0009 running in uber mode : false
2017-12-28 11:25:24,764 [myid:] - INFO  [main:Job@1362] -  map 0% reduce 0%
2017-12-28 11:26:00,465 [myid:] - INFO  [main:Job@1362] -  map 17% reduce 0%
2017-12-28 11:26:01,625 [myid:] - INFO  [main:Job@1362] -  map 50% reduce 0%
2017-12-28 11:26:03,643 [myid:] - INFO  [main:Job@1362] -  map 83% reduce 0%
2017-12-28 11:34:22,028 [myid:] - INFO  [main:Job@1362] -  map 100% reduce 0%
2017-12-28 11:34:22,035 [myid:] - INFO  [main:Job@1373] - Job job_1514358672274_0009 completed successfully
2017-12-28 11:34:22,162 [myid:] - INFO  [main:Job@1380] - Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=860052
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=913
        HDFS: Number of bytes written=3985558014
        HDFS: Number of read operations=24
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=12
    Job Counters
        Killed map tasks=1
        Launched map tasks=7
        Other local map tasks=7
        Total time spent by all maps in occupied slots (ms)=1208611
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=1208611
        Total vcore-seconds taken by all map tasks=1208611
        Total megabyte-seconds taken by all map tasks=4331661824
    Map-Reduce Framework
        Map input records=18680041
        Map output records=18680041
        Input split bytes=913
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=4453
        CPU time spent (ms)=180780
        Physical memory (bytes) snapshot=1957969920
        Virtual memory (bytes) snapshot=30116270080
        Total committed heap usage (bytes)=1611661312
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=3985558014
2017-12-28 11:34:22,170 [myid:] - INFO  [main:ImportJobBase@184] - Transferred 3.7118 GB in 571.0001 seconds (6.6566 MB/sec)
2017-12-28 11:34:22,174 [myid:] - INFO  [main:ImportJobBase@186] - Retrieved 18680041 records.
2017-12-28 11:34:22,215 [myid:] - INFO  [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM "rhnpackagefile" AS t LIMIT 1
2017-12-28 11:34:22,245 [myid:] - INFO  [main:HiveImport@194] - Loading uploaded data into Hive
2017-12-28 11:34:28,609 [myid:] - INFO  [Thread-98:LoggingAsyncSink$LoggingThread@85] -
2017-12-28 11:34:28,609 [myid:] - INFO  [Thread-98:LoggingAsyncSink$LoggingThread@85] - Logging initialized using configuration in jar:file:/u01/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
2017-12-28 11:34:31,619 [myid:] - INFO  [Thread-98:LoggingAsyncSink$LoggingThread@85] - OK
2017-12-28 11:34:31,622 [myid:] - INFO  [Thread-98:LoggingAsyncSink$LoggingThread@85] - Time taken: 1.666 seconds
2017-12-28 11:34:32,026 [myid:] - INFO  [Thread-98:LoggingAsyncSink$LoggingThread@85] - Loading data to table hivedb.rhnpackagefile
2017-12-28 11:36:14,783 [myid:] - INFO  [Thread-98:LoggingAsyncSink$LoggingThread@85] - OK
2017-12-28 11:36:14,908 [myid:] - INFO  [Thread-98:LoggingAsyncSink$LoggingThread@85] - Time taken: 103.285 seconds
2017-12-28 11:36:15,363 [myid:] - INFO  [main:HiveImport@242] - Hive import complete.
2017-12-28 11:36:15,372 [myid:] - INFO  [main:HiveImport@278] - Export directory is contains the _SUCCESS file only, removing the directory.
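Two notes on the log above. The TextSplitter warning appears because modified is not an integral column; an integral column (e.g. a numeric primary key) is a safer --split-by choice. Sqoop also suggests --direct, which uses the PostgreSQL-specific fast path instead of generic JDBC reads; it requires the psql client binary and supports only a subset of the import options. A hedged variant of the same import, writing to a hypothetical plain HDFS directory:

[hadoop@hdp01 ~]$ sqoop import --connect jdbc:postgresql://192.168.120.93:5432/rhndb --username rhnuser -P --table rhnpackagefile --direct --fields-terminated-by ',' --target-dir /tmp/rhnpackagefile_direct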

2.2 MySQL → HDFS

[hadoop@hdp01 ~]$ sqoop import --connect jdbc:mysql://192.168.120.92:3306/smsqw --username smsqw -P --table tbDest --columns iMsgID,cDest,tTime,cSMID,iReSend,tLastProcess,cEnCode,tCreateDT,iNum,iResult,iPriority,iPayment,cState,tGpTime --split-by tGpTime --target-dir /user/DataSource/MySQL/tbDest
2017-12-28 14:36:52,550 [myid:] - INFO  [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 14:36:55,496 [myid:] - INFO  [main:MySQLManager@69] - Preparing to use a MySQL streaming resultset.
2017-12-28 14:36:55,497 [myid:] - INFO  [main:CodeGenTool@92] - Beginning code generation
Thu Dec 28 14:36:55 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2017-12-28 14:36:56,233 [myid:] - INFO  [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM `tbDest` AS t LIMIT 1
2017-12-28 14:36:56,253 [myid:] - INFO  [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM `tbDest` AS t LIMIT 1
2017-12-28 14:36:56,260 [myid:] - INFO  [main:CompilationManager@94] - HADOOP_MAPRED_HOME is /u01/hadoop
Note: /tmp/sqoop-hadoop/compile/4a4024e6b2baa336939a9310f627636a/tbDest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2017-12-28 14:36:57,637 [myid:] - INFO  [main:CompilationManager@330] - Writing jar file: /tmp/sqoop-hadoop/compile/4a4024e6b2baa336939a9310f627636a/tbDest.jar
2017-12-28 14:36:57,650 [myid:] - WARN  [main:MySQLManager@107] - It looks like you are importing from mysql.
2017-12-28 14:36:57,650 [myid:] - WARN  [main:MySQLManager@108] - This transfer can be faster! Use the --direct
2017-12-28 14:36:57,650 [myid:] - WARN  [main:MySQLManager@109] - option to exercise a MySQL-specific fast path.
2017-12-28 14:36:57,650 [myid:] - INFO  [main:MySQLManager@189] - Setting zero DATETIME behavior to convertToNull (mysql)
2017-12-28 14:36:57,652 [myid:] - INFO  [main:ImportJobBase@235] - Beginning import of tbDest
2017-12-28 14:36:57,653 [myid:] - INFO  [main:Configuration@1019] - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-28 14:36:57,820 [myid:] - INFO  [main:Configuration@1019] - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-12-28 14:36:58,229 [myid:] - INFO  [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 14:36:58,581 [myid:] - INFO  [main:TimelineClientImpl@123] - Timeline service address: http://hdp01:8188/ws/v1/timeline/
2017-12-28 14:36:58,770 [myid:] - INFO  [main:AHSProxy@42] - Connecting to Application History server at hdp01.thinkjoy.tt/192.168.120.96:10201
Thu Dec 28 14:37:01 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2017-12-28 14:37:01,123 [myid:] - INFO  [main:DBInputFormat@192] - Using read commited transaction isolation
2017-12-28 14:37:01,124 [myid:] - INFO  [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 14:37:01,124 [myid:] - INFO  [main:DataDrivenDBInputFormat@147] - BoundingValsQuery: SELECT MIN(`tGpTime`), MAX(`tGpTime`) FROM `tbDest`
2017-12-28 14:37:17,446 [myid:] - INFO  [main:JobSubmitter@396] - number of splits:4
2017-12-28 14:37:17,541 [myid:] - INFO  [main:JobSubmitter@479] - Submitting tokens for job: job_1514358672274_0012
2017-12-28 14:37:17,966 [myid:] - INFO  [main:YarnClientImpl@236] - Submitted application application_1514358672274_0012
2017-12-28 14:37:17,996 [myid:] - INFO  [main:Job@1289] - The url to track the job: http://hdp01:8088/proxy/application_1514358672274_0012/
2017-12-28 14:37:17,996 [myid:] - INFO  [main:Job@1334] - Running job: job_1514358672274_0012
2017-12-28 14:37:26,149 [myid:] - INFO  [main:Job@1355] - Job job_1514358672274_0012 running in uber mode : false
2017-12-28 14:37:26,150 [myid:] - INFO  [main:Job@1362] -  map 0% reduce 0%
2017-12-28 14:39:52,733 [myid:] - INFO  [main:Job@1362] -  map 25% reduce 0%
2017-12-28 14:40:14,978 [myid:] - INFO  [main:Job@1362] -  map 75% reduce 0%
2017-12-28 14:40:43,183 [myid:] - INFO  [main:Job@1362] -  map 100% reduce 0%
2017-12-28 14:40:43,191 [myid:] - INFO  [main:Job@1373] - Job job_1514358672274_0012 completed successfully
2017-12-28 14:40:43,321 [myid:] - INFO  [main:Job@1380] - Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=573248
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=609
        HDFS: Number of bytes written=5399155888
        HDFS: Number of read operations=16
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Killed map tasks=2
        Launched map tasks=6
        Other local map tasks=6
        Total time spent by all maps in occupied slots (ms)=724670
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=724670
        Total vcore-seconds taken by all map tasks=724670
        Total megabyte-seconds taken by all map tasks=2597217280
    Map-Reduce Framework
        Map input records=31037531
        Map output records=31037531
        Input split bytes=609
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=3675
        CPU time spent (ms)=588590
        Physical memory (bytes) snapshot=4045189120
        Virtual memory (bytes) snapshot=20141694976
        Total committed heap usage (bytes)=1943535616
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=5399155888
2017-12-28 14:40:43,329 [myid:] - INFO  [main:ImportJobBase@184] - Transferred 5.0284 GB in 225.0893 seconds (22.8755 MB/sec)
2017-12-28 14:40:43,335 [myid:] - INFO  [main:ImportJobBase@186] - Retrieved 31037531 records.
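The imported data lands as part-m-* files under the target directory and can be inspected directly:

[hadoop@hdp01 ~]$ hdfs dfs -ls /user/DataSource/MySQL/tbDest
[hadoop@hdp01 ~]$ hdfs dfs -cat /user/DataSource/MySQL/tbDest/part-m-00000 | head -5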

2.3 HDFS → MySQL

[hadoop@hdp01 ~]$ sqoop export --connect jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false --username smsqw -P --table tbDest2 --export-dir /user/DataSource/MySQL/tbDest
2017-12-28 16:03:18,922 [myid:] - INFO  [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 16:03:21,934 [myid:] - INFO  [main:MySQLManager@69] - Preparing to use a MySQL streaming resultset.
2017-12-28 16:03:21,934 [myid:] - INFO  [main:CodeGenTool@92] - Beginning code generation
2017-12-28 16:03:22,343 [myid:] - INFO  [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM `tbDest2` AS t LIMIT 1
2017-12-28 16:03:22,365 [myid:] - INFO  [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM `tbDest2` AS t LIMIT 1
2017-12-28 16:03:22,373 [myid:] - INFO  [main:CompilationManager@94] - HADOOP_MAPRED_HOME is /u01/hadoop
Note: /tmp/sqoop-hadoop/compile/332a6c4b30e942c56cf7f507cdff5761/tbDest2.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2017-12-28 16:03:23,752 [myid:] - INFO  [main:CompilationManager@330] - Writing jar file: /tmp/sqoop-hadoop/compile/332a6c4b30e942c56cf7f507cdff5761/tbDest2.jar
2017-12-28 16:03:23,762 [myid:] - INFO  [main:ExportJobBase@378] - Beginning export of tbDest2
2017-12-28 16:03:23,762 [myid:] - INFO  [main:Configuration@1019] - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-28 16:03:24,011 [myid:] - INFO  [main:Configuration@1019] - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-12-28 16:03:24,738 [myid:] - INFO  [main:Configuration@1019] - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2017-12-28 16:03:24,742 [myid:] - INFO  [main:Configuration@1019] - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2017-12-28 16:03:24,743 [myid:] - INFO  [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 16:03:25,087 [myid:] - INFO  [main:TimelineClientImpl@123] - Timeline service address: http://hdp01:8188/ws/v1/timeline/
2017-12-28 16:03:25,269 [myid:] - INFO  [main:AHSProxy@42] - Connecting to Application History server at hdp01.thinkjoy.tt/192.168.120.96:10201
2017-12-28 16:03:27,400 [myid:] - INFO  [main:FileInputFormat@281] - Total input paths to process : 4
2017-12-28 16:03:27,406 [myid:] - INFO  [main:FileInputFormat@281] - Total input paths to process : 4
2017-12-28 16:03:27,484 [myid:] - INFO  [main:JobSubmitter@396] - number of splits:4
2017-12-28 16:03:27,493 [myid:] - INFO  [main:Configuration@1019] - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2017-12-28 16:03:27,493 [myid:] - INFO  [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 16:03:27,577 [myid:] - INFO  [main:JobSubmitter@479] - Submitting tokens for job: job_1514358672274_0020
2017-12-28 16:03:28,062 [myid:] - INFO  [main:YarnClientImpl@236] - Submitted application application_1514358672274_0020
2017-12-28 16:03:28,091 [myid:] - INFO  [main:Job@1289] - The url to track the job: http://hdp01:8088/proxy/application_1514358672274_0020/
2017-12-28 16:03:28,092 [myid:] - INFO  [main:Job@1334] - Running job: job_1514358672274_0020
2017-12-28 16:17:18,663 [myid:] - INFO  [main:Job@1355] - Job job_1514358672274_0020 running in uber mode : false
2017-12-28 16:17:18,665 [myid:] - INFO  [main:Job@1362] -  map 0% reduce 0%
2017-12-28 16:17:34,148 [myid:] - INFO  [main:Job@1362] -  map 1% reduce 0%
2017-12-28 16:17:43,200 [myid:] - INFO  [main:Job@1362] -  map 2% reduce 0%
2017-12-28 16:17:55,269 [myid:] - INFO  [main:Job@1362] -  map 3% reduce 0%
......
2017-12-28 16:40:15,427 [myid:] - INFO  [main:Job@1362] -  map 100% reduce 0%
2017-12-28 16:40:32,491 [myid:] - INFO  [main:Job@1373] - Job job_1514358672274_0020 completed successfully
2017-12-28 16:40:32,659 [myid:] - INFO  [main:Job@1380] - Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=571960
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=5401517442
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=70
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=4
        Other local map tasks=1
        Rack-local map tasks=3
        Total time spent by all maps in occupied slots (ms)=4931826
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4931826
        Total vcore-seconds taken by all map tasks=4931826
        Total megabyte-seconds taken by all map tasks=17675664384
    Map-Reduce Framework
        Map input records=31037531
        Map output records=31037531
        Input split bytes=2192
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=21815
        CPU time spent (ms)=1522470
        Physical memory (bytes) snapshot=3453595648
        Virtual memory (bytes) snapshot=20112125952
        Total committed heap usage (bytes)=477102080
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
2017-12-28 16:40:32,667 [myid:] - INFO  [main:ExportJobBase@301] - Transferred 5.0306 GB in 2,227.9141 seconds (2.3122 MB/sec)
2017-12-28 16:40:32,671 [myid:] - INFO  [main:ExportJobBase@303] - Exported 31037531 records.
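sqoop export does not create the target table; tbDest2 must already exist in MySQL with a column layout matching the exported files. To upsert rather than blindly insert, --update-key with --update-mode allowinsert can be used (a sketch; iMsgID is assumed here to be a unique key, since allowinsert relies on MySQL's INSERT ... ON DUPLICATE KEY UPDATE):

[hadoop@hdp01 ~]$ sqoop export --connect 'jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false' --username smsqw -P --table tbDest2 --export-dir /user/DataSource/MySQL/tbDest --update-key iMsgID --update-mode allowinsert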

Finally, a quick reference of commonly used import and export options:
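The summary below is compiled from the Sqoop 1.4.6 user guide rather than the original article's table:

--connect <jdbc-uri>              JDBC connection string
--username <user>                 Database user
-P / --password-file <path>       Password: interactive prompt, or a file on HDFS
--table <name>                    Source table (import) or target table (export)
--columns <c1,c2,...>             Subset of columns to transfer
--where <clause>                  Row filter applied during import
--query <sql>                     Free-form import query (must contain $CONDITIONS)
--split-by <column>               Column used to partition work across mappers
-m, --num-mappers <n>             Number of parallel map tasks
--target-dir <hdfs-path>          HDFS output directory for import
--fields-terminated-by <char>     Field delimiter for text output
--direct                          Database-specific fast path (mysqldump/psql)
--hive-import                     Load the imported data into Hive
--hive-database / --hive-table    Destination database/table in Hive
--export-dir <hdfs-path>          HDFS source directory for export
--update-key <column>             Key column(s) for UPDATE-based export
--update-mode <mode>              updateonly (default) or allowinsert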

Reposted from: https://blog.51cto.com/candon123/2055559
