DataX

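DataX is an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, SQL Server, Oracle, PostgreSQL, HDFS, Hive, HBase, OTS, and ODPS.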

Features

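As a data synchronization framework, DataX abstracts synchronization between data sources into Reader plugins, which read data from the source, and Writer plugins, which write data to the target. In theory the framework can therefore support synchronization for any type of data source. The plugin system also works as an ecosystem: every newly added data source can immediately exchange data with all the data sources that are already supported.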
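The Reader/Writer split can be illustrated with a small conceptual sketch. This is not DataX's actual plugin API (DataX plugins are written in Java); the class names `Reader`, `Writer`, `ListReader`, `ListWriter`, and the `sync` function are hypothetical and exist only to show why any reader can be paired with any writer.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, Tuple

# One row of column values; a stand-in for a record passed between plugins.
Record = Tuple[object, ...]


class Reader(ABC):
    """Hypothetical reader plugin: pulls records from a source."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...


class Writer(ABC):
    """Hypothetical writer plugin: pushes records to a target."""

    @abstractmethod
    def write(self, records: Iterable[Record]) -> int:
        ...


class ListReader(Reader):
    """Toy source backed by an in-memory list."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        return iter(self.rows)


class ListWriter(Writer):
    """Toy target that collects rows in memory."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before


def sync(reader: Reader, writer: Writer) -> int:
    """The framework only connects a reader to a writer and knows nothing else about them."""
    return writer.write(reader.read())
```

Because `sync` depends only on the two abstract interfaces, adding one new reader makes it usable with every existing writer, which is the interoperability property described above.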

System Requirements

Linux
JDK (1.8 or later; 1.8 recommended)
Python (Python 2.6.x recommended)
Apache Maven 3.x (to compile DataX)
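Before going further it can help to confirm that the required commands are on the PATH. The helper below is a minimal sketch, assuming a Python 3 interpreter is available to run it (separate from the Python used to launch DataX); the function names `find_tools` and `missing_tools` are made up for this example, and it checks only for presence, not for versions.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Commands used later in this walkthrough.
REQUIRED_TOOLS = ("java", "python", "mvn")


def find_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each required command to its resolved path, or None if it is not on PATH."""
    return {tool: which(tool) for tool in REQUIRED_TOOLS}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the commands that could not be located."""
    return [tool for tool, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_tools()
    for tool, path in found.items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```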

Quick Start

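Tool deployment

Method 1: Download the DataX package directly (DataX download link), extract it to a local directory, change into the bin directory, and run a synchronization job:

$ cd {YOUR_DATAX_HOME}/bin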
$ python datax.py {YOUR_JOB.json}
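If jobs are launched from another script rather than by hand, the same command line can be assembled programmatically. This is a sketch under stated assumptions: `build_command` and `run_job` are hypothetical helper names, the `python` executable on the PATH is assumed to be the one DataX should run under, and whether `datax.py` reports failures through its exit code is something to verify in your own environment.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List


def build_command(datax_home: str, job_file: str, python: str = "python") -> List[str]:
    """Build the argument list for: python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_JOB.json}"""
    launcher = PurePosixPath(datax_home) / "bin" / "datax.py"
    return [python, str(launcher), job_file]


def run_job(datax_home: str, job_file: str) -> int:
    """Run the job and return the process exit code unchanged."""
    return subprocess.run(build_command(datax_home, job_file)).returncode
```

Passing the arguments as a list avoids shell quoting problems when the job path contains spaces.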

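Method 2: Download the DataX source code and build it yourself (DataX source code).

①. Install the JDK

tar xvf jdk-8u151-linux-x64.tar.gz
mv jdk1.8.0_151 /usr/local/
vim /etc/profile.d/jdk.sh
# add the following lines to /etc/profile.d/jdk.sh:
export JAVA_HOME=/usr/local/jdk1.8.0_151
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_BIN
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# save the file, then reload it:
source /etc/profile.d/jdk.sh

Check that the installation succeeded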

[root@oracle ~]# java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

②. Check the Python version; if it does not meet the requirement, install a suitable version yourself

[root@oracle ~]# python -V
Python 2.6.6

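③. Install Maven

Download: http://maven.apache.org/download.cgi

Install and configure

# run from /usr/local/src (assumed), so that ../maven is /usr/local/maven
tar xvf apache-maven-3.6.1-bin.tar.gz
mv apache-maven-3.6.1 ../maven
vim /etc/profile
# add the following lines to /etc/profile:
export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:/u01/mysql/bin:${PATH}
# save the file, then reload it:
source /etc/profile

Verify that Maven is installed correctly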

[root@oracle src]# mvn -v
Apache Maven 3.6.1 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /usr/local/maven
Java version: 1.8.0_151, vendor: Oracle Corporation
Java home: /usr/local/jdk1.8.0_151/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-862.el7.x86_64", arch: "amd64", family: "unix"

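The system requirements are now in place. Next, install DataX from source. Get the source in either of two ways: download the ZIP archive from the repository page, or clone the repository with git.

Download: https://github.com/alibaba/DataX
git clone git@github.com:alibaba/DataX.git

Build from source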

unzip DataX-master.zip
mv DataX-master ../
cd ../DataX-master
mvn -U clean package assembly:assembly -Dmaven.test.skip=true

This step takes a very long time. When packaging succeeds, the output ends as follows:

[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------------------------
[INFO] Total time: 08:12 min
[INFO] Finished at: 2015-12-13T16:26:48+08:00
[INFO] Final Memory: 133M/960M
[INFO] -----------------------------------------------------------------

After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax/datax/, with the following structure:

cd /usr/local/DataX-master
[root@oracle DataX-master]# ls -a ./target/datax/datax/
.  ..  bin  conf  job  lib  log  log_perf  plugin  script  tmp

Configuration example: the template below reads from stream and prints to the console; the job actually run afterwards synchronizes an Oracle table to MySQL.

Step 1. Create the job configuration file (JSON format). You can print a configuration template with: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

cd /usr/local/DataX-master/target/datax/datax/bin
./datax.py -r streamreader -w streamwriter
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the streamreader document:
     https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md

Please refer to the streamwriter document:
     https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {"name": "streamreader", "parameter": {"column": [], "sliceRecordCount": ""}},
                "writer": {"name": "streamwriter", "parameter": {"encoding": "", "print": true}}
            }
        ],
        "setting": {"speed": {"channel": ""}}
    }
}

Based on the template, write the JSON for your own job (saved here as job/oracle11mysql8.json, the file used in Step 2). Note that JSON does not allow a trailing comma after the last key in an object:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "column": ["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:oracle:thin:@192.168.11.91:1521:orcl"],
                                "table": ["JDZLJCXM"]
                            }
                        ],
                        "password": "jgzdwffz",
                        "username": "bjxxjgxt"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.11.75:3336/prison_practical_platform",
                                "table": ["JDZLJCXM"]
                            }
                        ],
                        "password": "root",
                        "username": "root"
                    }
                }
            }
        ],
        "setting": {"speed": {"channel": "5"}}
    }
}
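Hand-edited job files are easy to break, for example by leaving a trailing comma after the last key. One way to avoid this is to build the job as a Python dictionary and let the `json` module serialize it. The sketch below is illustrative only: `build_job` and `is_valid_json` are hypothetical helpers, the host names and credentials are placeholders, and the key layout simply mirrors the oraclereader/mysqlwriter configuration shown above.

```python
import json
from typing import Any, Dict, List


def build_job(columns: List[str], table: str,
              reader_param: Dict[str, Any], writer_param: Dict[str, Any],
              channel: str = "5") -> Dict[str, Any]:
    """Assemble a job dictionary with the same shape as the configuration above."""
    return {
        "job": {
            "content": [{
                "reader": {"name": "oraclereader",
                           "parameter": dict(reader_param, column=columns)},
                "writer": {"name": "mysqlwriter",
                           "parameter": dict(writer_param, column=columns)},
            }],
            "setting": {"speed": {"channel": channel}},
        }
    }


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as strict JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Placeholder connection details; replace with your own.
job = build_job(
    columns=["zybh", "xmbh"],
    table="JDZLJCXM",
    reader_param={
        "connection": [{"jdbcUrl": ["jdbc:oracle:thin:@oracle.example:1521:orcl"],
                        "table": ["JDZLJCXM"]}],
        "username": "reader_user", "password": "reader_password",
    },
    writer_param={
        "connection": [{"jdbcUrl": "jdbc:mysql://mysql.example:3306/target_db",
                        "table": ["JDZLJCXM"]}],
        "username": "writer_user", "password": "writer_password",
    },
)
job_text = json.dumps(job, indent=4)
```

Writing `job_text` to a file under the job directory gives a configuration that is valid JSON by construction; `is_valid_json` can also be used to check an existing hand-written file before running it.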

Step 2. Start DataX

cd /usr/local/DataX-master/target/datax/datax/bin
./datax.py ../job/oracle11mysql8.json

The synchronization completes and the log looks as follows. The final summary, which DataX prints in Chinese, lists the job start time, job end time, total elapsed time, average throughput, record write speed, total records read, and total read/write failures:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2019-05-24 11:59:06.065 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-05-24 11:59:06.075 [main] INFO  Engine - the machine info  =>

    osInfo:    Oracle Corporation 1.8 25.161-b12
    jvmInfo:    Linux amd64 3.10.0-862.el7.x86_64
    cpu num:    1

    totalPhysicalMemory:    -0.00G
    freePhysicalMemory:    -0.00G
    maxFileDescriptorCount:    -1
    currentOpenFileDescriptorCount:    -1

    GC Names    [Copy, MarkSweepCompact]

    MEMORY_NAME                    | allocation_size                | init_size
    Eden Space                     | 273.06MB                       | 273.06MB
    Code Cache                     | 240.00MB                       | 2.44MB
    Survivor Space                 | 34.13MB                        | 34.13MB
    Compressed Class Space         | 1,024.00MB                     | 0.00MB
    Metaspace                      | -0.00MB                        | 0.00MB
    Tenured Gen                    | 682.69MB                       | 682.69MB

2019-05-24 11:59:06.093 [main] INFO  Engine -
{"content":[{"reader":{"name":"oraclereader","parameter":{"column":["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],"connection":[{"jdbcUrl":["jdbc:oracle:thin:@192.168.11.91:1521:orcl"],"table":["JDZLJCXM"]}],"password":"********","username":"bjxxjgxt"}},"writer":{"name":"mysqlwriter","parameter":{"column":["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],"connection":[{"jdbcUrl":"jdbc:mysql://192.168.11.75:3336/prison_practical_platform","table":["JDZLJCXM"]}],"password":"****","username":"root"}}}],"setting":{"speed":{"channel":"5"}}
}
2019-05-24 11:59:06.111 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2019-05-24 11:59:06.121 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2019-05-24 11:59:06.121 [main] INFO  JobContainer - DataX jobContainer starts job.
2019-05-24 11:59:06.122 [main] INFO  JobContainer - Set jobId = 0
2019-05-24 11:59:06.488 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@192.168.11.91:1521:orcl.
2019-05-24 11:59:06.605 [job-0] INFO  OriginalConfPretreatmentUtil - table:[JDZLJCXM] has columns:[ZYBH,XMBH,JSBH,XMMC,SFJE,FYLX,BZ,SJC,ZLLX].
2019-05-24 11:59:06.952 [job-0] INFO  OriginalConfPretreatmentUtil - table:[JDZLJCXM] all columns:[
ZYBH,XMBH,JSBH,XMMC,SFJE,FYLX,BZ,SJC,ZLLX
].
2019-05-24 09:48:44.768 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx) VALUES(?,?,?,?,?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://192.168.11.75:3336/prison_practical_platform?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2019-05-24 09:48:44.768 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do prepare work .
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2019-05-24 09:48:44.772 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
2019-05-24 09:48:44.772 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2019-05-24 09:48:44.820 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2019-05-24 09:48:44.830 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2019-05-24 09:48:44.832 [job-0] INFO  JobContainer - Running by standalone Mode.
2019-05-24 09:48:44.866 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2019-05-24 09:48:44.876 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2019-05-24 09:48:44.877 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2019-05-24 09:48:44.898 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2019-05-24 09:48:44.901 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx from JDZLJCXM
] jdbcUrl:[jdbc:oracle:thin:@192.168.11.91:1521:orcl].
2019-05-24 09:48:45.050 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx from JDZLJCXM
] jdbcUrl:[jdbc:oracle:thin:@192.168.11.91:1521:orcl].
2019-05-24 09:48:46.419 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[1538]ms
2019-05-24 09:48:46.420 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2019-05-24 09:48:54.873 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.124s | Percentage 100.00%
2019-05-24 09:48:54.874 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do post work.
2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2019-05-24 09:48:54.875 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /usr/local/DataX-master/target/datax/datax/hook
2019-05-24 09:48:54.875 [job-0] INFO  JobContainer -
     [total cpu info] =>
        averageCpu                     | maxDeltaCpu                    | minDeltaCpu
        -1.00%                         | -1.00%                         | -1.00%

     [total gc info] =>
         NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
         Copy                 | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
         MarkSweepCompact     | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2019-05-24 09:48:54.875 [job-0] INFO  JobContainer - PerfTrace not enable!
2019-05-24 09:48:54.876 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.124s | Percentage 100.00%
2019-05-24 09:48:54.876 [job-0] INFO  JobContainer -
任务启动时刻                    : 2019-05-24 09:48:43
任务结束时刻                    : 2019-05-24 09:48:54
任务总计耗时                    :                 11s
任务平均流量                    :            7.72KB/s
记录写入速度                    :            155rec/s
读出记录总数                    :                1554
读写失败总数                    :                   0
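When jobs run unattended, it is often useful to pull the record and error counts out of the log rather than read them by eye. The following sketch parses the `StandAloneJobContainerCommunicator` statistics line shown in the log above. The regular expression is derived only from that single sample line, so treat the format as an assumption that may not hold across DataX versions; `parse_stats` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional, Union

# Pattern derived from the sample "Total ... records" line in the log above.
STATS = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes \| "
    r"Speed (?P<speed>[^,\s]+), (?P<records_per_s>\d+) records/s \| "
    r"Error (?P<error_records>\d+) records, (?P<error_bytes>\d+) bytes"
)


def parse_stats(line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Extract counters from a statistics line; return None if the line does not match."""
    match = STATS.search(line)
    if match is None:
        return None
    # Keep the speed as text (it carries a unit); convert the counters to int.
    return {key: (value if key == "speed" else int(value))
            for key, value in match.groupdict().items()}


sample = (
    "2019-05-24 09:48:54.873 [job-0] INFO  StandAloneJobContainerCommunicator - "
    "Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | "
    "Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  "
    "All Task WaitReaderTime 0.124s | Percentage 100.00%"
)
stats = parse_stats(sample)
```

A wrapper script could then fail the run whenever `stats["error_records"]` is non-zero.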

Problems encountered during setup, with the errors shown below:

[ERROR] Failed to execute goal on project otsstreamreader: Could not resolve dependencies for project com.alibaba.datax:otsstreamreader:jar:1.0.0-SNAPSHOT: Could not find artifact
com.aliyun.openservices:tablestore-streamclient:jar:1.0.0-SNAPSHOT -> [Help 1]

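This error is caused by a snapshot version mismatch. Since OTS is rarely needed, the simplest fix is to remove <module>ots</module> from pom.xml. Alternatively, change the version used in otsstreamreader: the default is 0.0.1; change it to 1.0.0.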

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-jar-plugin:2.4:jar (default-jar) on project ocswriter: Error assembling JAR: /Users/FengZhen/Desktop/Hadoop/DataX/源码/DataX/ocswriter/pom.xml isn't a file. -> [Help 1]

Comment out ocs and run the packaging step again.
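Commenting out a module by hand is straightforward, but if the build is repeated often the edit can be scripted. The sketch below is a hypothetical helper, not part of DataX or Maven: `comment_out_module` wraps a matching `<module>` line in an XML comment. It assumes each module entry sits on its own line in the top-level pom.xml, and the module name `ocswriter` is taken from the project name in the error message above.

```python
import re


def comment_out_module(pom_text: str, module: str) -> str:
    """Wrap the <module>NAME</module> line for the given module in an XML comment."""
    pattern = re.compile(
        r"^([ \t]*)(<module>" + re.escape(module) + r"</module>)[ \t]*$",
        re.MULTILINE,
    )
    # Keep the original indentation (group 1) and comment out the element (group 2).
    return pattern.sub(r"\1<!-- \2 -->", pom_text)


pom = (
    "<modules>\n"
    "    <module>mysqlwriter</module>\n"
    "    <module>ocswriter</module>\n"
    "</modules>\n"
)
patched = comment_out_module(pom, "ocswriter")
```

Other modules are left untouched, and a module name that does not appear in the file leaves the text unchanged.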

References: https://github.com/alibaba/DataX/blob/master/userGuid.md

     http://www.cnblogs.com/EnzoDin/p/9979583.html

Reposted from: https://www.cnblogs.com/Roobbin/p/10917350.html
