pg_chameleon is available for download on PyPI.

The documentation is available on pythonhosted.

A discussion forum is available on Google Groups.

Please submit your bug reports on GitHub.

Platform and versions

The tool is developed on Linux Slackware 14.2 and is currently tested with Python 2.7 and Python 3.6.

The database servers used for development run on FreeBSD with:

  • MySQL: 5.6
  • PostgreSQL: 9.5

Possible applications

  • Analytics (data mining, data warehousing)
  • Migrations
  • Data aggregation from multiple MySQL databases

Features

  • Read the schema and data from MySQL and restore them into a target PostgreSQL schema
  • Set up PostgreSQL to act as a MySQL slave
  • Basic DDL support (CREATE/DROP/ALTER TABLE, DROP PRIMARY KEY/TRUNCATE)
  • Discard rubbish data, which is saved in the table sch_chameleon.t_discarded_rows
  • Replica from multiple MySQL schemas or servers
  • Basic replica monitoring
  • Detach replica from MySQL

Requirements

Python: CPython 2.7/3.3+ on Linux (CPython is the official reference implementation, not a derivative)

MySQL: 5.5+

PostgreSQL: 9.5+

Required libraries:

  • PyMySQL
  • argparse
  • mysql-replication
  • psycopg2
  • PyYAML
  • tabulate

Optional packages for building the documentation:

  • sphinx
  • sphinx-autobuild

Caveats

The replica requires the tables to have a primary key. Tables without a primary key are initialised during the init_replica process but the replica doesn't update them.

Multiple replica sources are supported. However, each replica requires a separate process and must have a unique destination schema in PostgreSQL.

The copy_max_memory is just an estimate. The average row size is extracted from MySQL's information schema and can be outdated. If the copy process fails with a memory error, check the failing table's row length and the number of rows for each slice.

The batch is processed every time the replica stream is empty, when a DDL is captured or when MySQL switches to another log segment (ROTATE EVENT). Therefore the replica_batch_size is the limit for when a write happens in PostgreSQL. The parameter also controls the size of the batch replayed by pg_engine.process_batch.
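
The batch triggers described above can be sketched as a small decision function. This is only an illustration of the stated rules, not the tool's code: the function name and its arguments are hypothetical.

```python
def should_write_batch(buffered_rows, replica_batch_size,
                       stream_empty=False, ddl_captured=False, log_rotated=False):
    """Return True when the buffered rows should be written to PostgreSQL.

    Hypothetical helper: the names are assumptions made for this sketch.
    """
    # replica_batch_size is the upper limit: never buffer more rows than that.
    if buffered_rows >= replica_batch_size:
        return True
    # Otherwise any of the three triggers named in the text forces the write.
    return stream_empty or ddl_captured or log_rotated


if __name__ == "__main__":
    print(should_write_batch(1000, 1000))                   # limit reached
    print(should_write_batch(10, 1000, stream_empty=True))  # stream drained
    print(should_write_batch(10, 1000))                     # keep buffering
```

A smaller replica_batch_size therefore means more frequent, smaller writes, while a larger one only matters when the stream stays busy.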

The current implementation is sequential: read the replica -> store the rows -> replay the stored rows.

Version 2.0 will improve this aspect.

Python 3 is supported but only from version 3.3, as required by mysql-replication.

The lag is determined using the last received event timestamp and the PostgreSQL timestamp. If the MySQL server is read only, the lag will increase because no replica event is coming in.

The detach replica process resets the sequences in PostgreSQL to let the database work standalone.

The foreign keys from the source MySQL schema are extracted and initially created as NOT VALID. The foreign keys are created without the ON DELETE or ON UPDATE clauses. A second run tries to validate the foreign keys. If an error occurs it is logged according to the source configuration.
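
The lag caveat above can be made concrete with a few lines of arithmetic. The sketch below assumes the lag is simply the PostgreSQL timestamp minus the timestamp of the last received event; the function name is hypothetical and the timestamps are made up.

```python
from datetime import datetime, timezone


def replica_lag_seconds(last_event_ts, pg_now):
    """Lag in seconds: PostgreSQL's clock minus the last received event timestamp.

    Hypothetical helper written for this example only.
    """
    return (pg_now - last_event_ts).total_seconds()


if __name__ == "__main__":
    last_event = datetime(2017, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
    # On a read-only MySQL no new event arrives, so last_event never moves
    # while the PostgreSQL clock keeps advancing: the reported lag grows.
    for minute in (1, 5, 30):
        pg_now = datetime(2017, 5, 1, 12, minute, 0, tzinfo=timezone.utc)
        print(minute, replica_lag_seconds(last_event, pg_now))
```

A growing lag on an idle source is therefore expected and does not by itself mean the replica is broken.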

Quick Setup

  • Create a virtual environment (e.g. python3 -m venv venv)
  • Activate the virtual environment (e.g. source venv/bin/activate)
  • Install pg_chameleon with pip install pg_chameleon. If you get an error, upgrade pip first.
  • Create a user in MySQL for the replica (e.g. usr_replica)
  • Grant the user access to the replicated database (e.g. GRANT ALL ON sakila.* TO 'usr_replica';)
  • Grant RELOAD privilege to the user (e.g. GRANT RELOAD ON *.* TO 'usr_replica';)
  • Grant REPLICATION CLIENT privilege to the user (e.g. GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';)
  • Grant REPLICATION SLAVE privilege to the user (e.g. GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';)

Configuration parameters

The system wide install is now supported correctly.

The first time chameleon.py is executed it creates a configuration directory in $HOME/.pg_chameleon. Inside the directory there are three subdirectories.

  • config is where the configuration files live. Use config-example.yaml as a template for the other configuration files. Please note that logs and pid directories with a relative path no longer work: either use an absolute path or provide the home alias. Again, check config-example.yaml for an example.
  • pid is where the replica pid file is created. It can be changed in the configuration file.
  • logs is where the replica logs are saved if log_dest is file. It can be changed in the configuration file.

The file config-example.yaml is stored in ~/.pg_chameleon/config and should be used as a template for the other configuration files.

Do not use config-example.yaml directly. The tool skips this filename because the file is overwritten when pg_chameleon is upgraded.

It is possible to have multiple configuration files for configuring the replica from multiple source databases. It's compulsory to choose different destination schemas in PostgreSQL.

Each source requires to be started in a separate process (e.g. a cron entry).
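
As a rough sketch of the one-process-per-source rule, the helper below takes a mapping of configuration names to destination schemas, refuses duplicates, and returns one start_replica command line per source. The function name and the example source names are hypothetical; only the chameleon.py start_replica --config form comes from this document.

```python
from collections import Counter


def build_start_commands(sources):
    """Return one argv list per source; `sources` maps config name -> dest_schema.

    Hypothetical helper: raises ValueError if two sources share a schema.
    """
    counts = Counter(sources.values())
    duplicates = sorted(schema for schema, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError("dest_schema used by more than one source: "
                         + ", ".join(duplicates))
    # One command, and therefore one process, for each configuration file.
    return [["chameleon.py", "start_replica", "--config", name]
            for name in sorted(sources)]


if __name__ == "__main__":
    for argv in build_start_commands({"shop": "sch_shop", "crm": "sch_crm"}):
        print(" ".join(argv))
```

Each returned list could be handed to a separate cron entry or to subprocess.Popen, so that every source runs in its own process.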

The configuration file is a YAML file. Each parameter controls the way the program acts.

  • my_server_id the server id for the MySQL replica. It must be unique within the replica cluster.
  • copy_max_memory the maximum amount of memory to use when copying the table into PostgreSQL. The value can be specified in (k)ilobytes, (M)egabytes or (G)igabytes by adding the suffix (e.g. 300M).
  • my_database the MySQL database to replicate. A schema with the same name will be initialised in the PostgreSQL database.
  • pg_database the destination database in PostgreSQL.
  • copy_mode the allowed values are 'file' and 'direct'. With direct the copy happens on the fly. With file the table is first dumped to a csv file and then reloaded into PostgreSQL.
  • hexify a YAML list with the data types that require conversion to hex (e.g. blob, binary). The conversion happens on the copy and on the replica.
  • log_dir the directory where the logs are stored.
  • log_level the logging verbosity. Allowed values are debug, info, warning, error.
  • log_dest the log destination: stdout for debugging purposes, file for normal activity.
  • my_charset the MySQL charset for the copy. Please note that the replica library always reads in utf8.
  • pg_charset the PostgreSQL connection's charset.
  • tables_limit a YAML list with the tables to replicate. If the list is empty the entire MySQL database is replicated.
  • sleep_loop the seconds to sleep between two replica batches.
  • pause_on_reindex determines whether to pause the replica if a reindex process is found in pg_stat_activity.
  • sleep_on_reindex the seconds to sleep when a reindex process is found.
  • reindex_app_names lists the application names to check for reindex (e.g. reindexdb). This is a workaround which is required for keeping the replication user unprivileged.
  • source_name this must be unique across the list of sources. The tool detects a duplicate when registering a new source.
  • dest_schema this is also a unique value. Once the source is registered the dest_schema can't be changed.
  • log_days_keep specifies how many days to keep the logs, which are rotated automatically on a daily basis.
  • batch_retention the maximum retention for the replayed batch rows in t_replica_batch. The field accepts any valid PostgreSQL interval.
  • out_dir the directory where the csv files are dumped during the init_replica process if the copy mode is file.
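
To illustrate how a copy_max_memory value such as 300M relates to the slice size mentioned in the caveats, here is a minimal sketch. It assumes binary multiples for the suffixes and a simple "memory budget divided by average row length" rule; both are assumptions made for this example, and the function names are hypothetical.

```python
# Assumption: the suffixes are binary multiples (1k = 1024 bytes).
UNITS = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert '300M', '512k', '2G' or a plain number into bytes."""
    text = str(value).strip()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def rows_per_slice(copy_max_memory, avg_row_length):
    """Estimate how many rows fit in one copy slice (at least one row)."""
    budget = parse_size(copy_max_memory)
    # avg_row_length would come from MySQL's information schema and may be
    # outdated, which is why the result is only an estimate.
    return max(1, budget // max(1, avg_row_length))


if __name__ == "__main__":
    print(parse_size("300M"))            # 314572800
    print(rows_per_slice("300M", 1024))  # 307200
```

If the real rows are much wider than the recorded average, a slice sized this way can exceed the intended budget, which matches the memory error described in the caveats.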

Reindex detection example setup

# Pause the replica for the given amount of seconds if a reindex process is found
pause_on_reindex: Yes
sleep_on_reindex: 30
# List the application names which are supposed to reindex the database
reindex_app_names:
  - 'reindexdb'
  - 'my_custom_reindex'
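
The effect of these three settings can be sketched as follows. The code assumes the replica keeps sleeping while a listed application is active; fetch_app_names is a hypothetical callable standing in for a query on the application_name column of pg_stat_activity, and the function names are not taken from pg_chameleon.

```python
import time


def reindex_running(active_app_names, reindex_app_names):
    """True if any active application name is in the configured list."""
    watched = set(reindex_app_names)
    return any(name in watched for name in active_app_names)


def wait_for_reindex(fetch_app_names, config, sleep=time.sleep):
    """Pause while a reindex application is active; return the number of pauses.

    fetch_app_names is a hypothetical callable returning the application
    names currently connected to the database.
    """
    pauses = 0
    if not config.get("pause_on_reindex"):
        return pauses
    while reindex_running(fetch_app_names(), config["reindex_app_names"]):
        sleep(config["sleep_on_reindex"])
        pauses += 1
    return pauses


if __name__ == "__main__":
    snapshots = iter([["psql", "reindexdb"], ["psql"]])
    config = {"pause_on_reindex": True, "sleep_on_reindex": 30,
              "reindex_app_names": ["reindexdb", "my_custom_reindex"]}
    # A fake sleep keeps the demo instantaneous.
    print(wait_for_reindex(lambda: next(snapshots), config, sleep=lambda s: None))
```

Matching on the application name is what allows the replication user to stay unprivileged, since it only needs to read the names of the connected applications.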

MySQL connection parameters

mysql_conn:
    host: localhost
    port: 3306
    user: replication_username
    passwd: never_commit_passwords

PostgreSQL connection parameters

pg_conn:
    host: localhost
    port: 5432
    user: replication_username
    password: never_commit_passwords

Usage

The script chameleon.py requires one of the following commands.

  • drop_schema Drops the service schema sch_chameleon with the cascade option.
  • create_schema Creates the service schema sch_chameleon.
  • upgrade_schema Upgrades an existing schema sch_chameleon to a newer version.
  • init_replica Creates the table structure from MySQL in a PostgreSQL schema with the same name as the MySQL database. The MySQL tables are locked in read only mode and the data is copied into the PostgreSQL database. The master's coordinates are stored in the PostgreSQL service schema. The command drops and recreates the service schema.
  • start_replica Starts the replication from MySQL to PostgreSQL using the master data stored in sch_chameleon.t_replica_batch. The master's position is updated every time a new batch is processed. The command upgrades the service schema if required.
  • list_config Lists the available configurations and their status ('ready', 'initialising', 'initialised', 'stopped', 'running').
  • add_source Registers a new configuration file as a source.
  • drop_source Removes the configuration from the registered sources.
  • stop_replica Ends the replica process gracefully.
  • disable_replica Ends the replica process and disables the restart.
  • enable_replica Enables the replica process.
  • sync_replica Syncs the data between MySQL and PostgreSQL without dropping the tables.
  • show_status Displays the replication status for each source, with the lag in seconds and the last received event.
  • detach_replica Stops the replica stream, discards the replica setup and resets the sequences in PostgreSQL to work as a standalone db.

The optional argument --config, followed by the configuration file name without the yaml suffix, selects a different configuration. If omitted, the configuration defaults to default.
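
The way a --config value maps to a file can be sketched with argparse. This is not the tool's own parser: resolve_config is a hypothetical function that only combines the rules stated in this document (default name, yaml suffix, ~/.pg_chameleon/config directory, config-example being skipped).

```python
import argparse
import os


def resolve_config(argv):
    """Return (command, path of the configuration file) for a command line.

    Hypothetical helper built from the rules described in the text.
    """
    parser = argparse.ArgumentParser(prog="chameleon.py")
    parser.add_argument("command")
    parser.add_argument("--config", default="default")
    args = parser.parse_args(argv)
    # The template is overwritten on upgrade, so it is never a valid choice.
    if args.config == "config-example":
        raise ValueError("config-example is a template, copy it first")
    path = os.path.join(os.path.expanduser("~"), ".pg_chameleon",
                        "config", args.config + ".yaml")
    return args.command, path


if __name__ == "__main__":
    print(resolve_config(["start_replica"]))
    print(resolve_config(["init_replica", "--config", "sakila"]))
```

With no --config the first call resolves to default.yaml, which is why the default configuration can be left out on the command line.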

Example

Create a virtualenv and activate it

python3 -m venv venv
source venv/bin/activate

Install pg_chameleon

pip install pg_chameleon

Run the script in order to create the configuration directory.

chameleon.py

cd into ~/.pg_chameleon/config and copy config-example.yaml to default.yaml. Please note this is the default configuration, so --config can be omitted when executing the chameleon.py script.

In MySQL create a user for the replica.

CREATE USER usr_replica ;
SET PASSWORD FOR usr_replica=PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* to 'usr_replica';
GRANT REPLICATION CLIENT ON *.* to 'usr_replica';
GRANT REPLICATION SLAVE ON *.* to 'usr_replica';
FLUSH PRIVILEGES;

Add the configuration for the replica to my.cnf (requires mysql restart)

binlog_format= ROW
binlog_row_image=FULL
log-bin = mysql-bin
server-id = 1

If you are using a cascading replica configuration ensure the parameter log_slave_updates is set to ON.

log_slave_updates= ON

In PostgreSQL create a user for the replica and a database owned by the user

CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;

Check you can connect to both databases from the replication system.

For MySQL

mysql -p -h derpy -u usr_replica sakila
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 116
Server version: 5.6.30-log Source distribution

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

For PostgreSQL

psql  -h derpy -U usr_replica db_replica
Password for user usr_replica:
psql (9.5.5)
Type "help" for help.
db_replica=>

Setup the connection parameters in default.yaml

---
# global settings
my_server_id: 100
replica_batch_size: 1000
my_database: sakila
pg_database: db_replica

# mysql connection's charset.
my_charset: 'utf8'
pg_charset: 'utf8'

# include tables only
tables_limit:

# mysql slave setup
mysql_conn:
    host: derpy
    port: 3306
    user: usr_replica
    passwd: replica

# postgres connection
pg_conn:
    host: derpy
    port: 5432
    user: usr_replica
    password: replica

Initialise the schema and the replica with

chameleon.py create_schema
chameleon.py add_source --config default
chameleon.py init_replica --config default

Start the replica with

chameleon.py start_replica --config default

Detaching the replica from MySQL

chameleon.py detach_replica --config default

Reposted from: https://my.oschina.net/innovation/blog/903312
