MySQL Group Replication Memory Usage Analysis and Optimization (Part 1)

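This article examines the in-memory caches that mysqld gains under MySQL Group Replication (MGR) compared with traditional MySQL replication. It illustrates, with examples, the problems these caches can cause and introduces potential optimizations.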

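Unlike traditional MySQL master-slave replication, mysqld uses more memory when it runs in MySQL Group Replication mode. If you run MGR in a memory-constrained environment such as a cloud host or a Docker container, pay particular attention to this and plan memory according to your workload.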

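MGR adds several in-memory caches, each serving a different purpose. This article covers only the two that matter most during normal operation: the conflict detection database of the transaction certification module (the certification_info object) and the paxos cache of the underlying xcom module.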

The certification_info object

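We start with the conflict detection database, that is, the certification_info object. It stores the writesets of transactions. For implementation details, see the previous article by 温正湖 (Wen Zhenghu), "MySQL Group Replication冲突检测机制再剖析" (a further look at the MGR conflict detection mechanism), on zhuanlan.zhihu.com.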

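A transaction's writesets can be purged from certification_info only after the transaction has been executed or applied on every MGR member, so the larger the replication lag between members, the more writesets accumulate. Each writeset entry consists of two parts: the hash of a unique key and the gtid_executed at the time the transaction executed. The former has a fixed size, but the size of the latter cannot be determined precisely. For example: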

gtid_executed a:

    6c6aa49f-22dd-11e9-82e2-c81f66e48c6e:1-696578346

gtid_executed b:

    597672ea-a498-11e6-9dd2-246e9627d610:1-13276447793,
    9041b15e-a686-11e8-8bf2-246e9672a2f0:1-1234925114,
    cbaf3b47-bcb3-11e8-aaa6-246e96c4fc68:1-8077131748,
    e8ea688c-2b73-11e7-a9d6-246e9627f950:1-6362,
    ec82f85a-145d-11e7-979e-246e96280570:1-45540504450,
    f73f342f-a496-11e6-a74a-246e9627cdc8:1-5118868458

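b takes several times as much memory as a. If the gtid_executed carried by every writeset looks like b, memory consumption grows accordingly. So when using MGR, the fewer server_uuid values gtid_executed contains, the better.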
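To compare two GTID sets quickly, one can count the server UUIDs and intervals in their textual form. The sketch below is only a rough proxy: the in-memory representation used by mysqld differs from the text form, and the helper names `gtid_uuid_count` and `gtid_interval_count` are hypothetical, not part of MySQL.

```c
#include <stddef.h>

/* Number of server UUIDs in a textual GTID set such as
   "uuid1:1-5:7-9,uuid2:1-3". UUID entries are separated by commas. */
size_t gtid_uuid_count(const char *gtid_set) {
  if (gtid_set == NULL || *gtid_set == '\0') return 0;
  size_t count = 1;
  for (const char *p = gtid_set; *p != '\0'; ++p) {
    if (*p == ',') ++count;
  }
  return count;
}

/* Number of intervals in a textual GTID set. Every interval is
   introduced by a ':'; the UUID itself contains only '-' separators. */
size_t gtid_interval_count(const char *gtid_set) {
  if (gtid_set == NULL) return 0;
  size_t count = 0;
  for (const char *p = gtid_set; *p != '\0'; ++p) {
    if (*p == ':') ++count;
  }
  return count;
}
```

Applied to the two examples above, a has one UUID and one interval while b has six of each, which matches the observation that b costs several times more.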

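MGR's flow control module currently throttles based on the number of transactions waiting for certification and waiting to be applied, but the number of writesets does not correspond to the number of transactions. One transaction may touch a single row, or it may touch hundreds or thousands of rows. MGR's flow control therefore cannot bound the number of writesets in the certification_info object, and the writeset count in memory can already be very large even though few transactions have executed.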

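A multi-threaded concurrent prepare with sysbench 1.0 reproduces this scenario easily. During data prepare, each transaction is about 1 MB and contains 500 rows. Each table has 2 indexes, and each index record generates 2 writesets, one with collation and one without, so a single transaction generates 2,000 writesets. Suppose we create 128 tables and prepare them with 128 threads: even at 1 TPS per thread, more than 250,000 writesets (128 × 2,000 = 256,000) are generated every second. With MySQL Community Edition purging once every 60 seconds, about 15 million writesets can accumulate. That is a considerable memory cost.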
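The estimate above can be reproduced with a few multiplications. This is a minimal sketch using the figures from the sysbench scenario; the function names are hypothetical and the model ignores the size of each entry, which depends on gtid_executed.

```c
#include <stdint.h>

/* Writesets generated by one transaction:
   rows x indexes per table x writeset versions per index record. */
uint64_t writesets_per_txn(uint64_t rows, uint64_t indexes,
                           uint64_t versions) {
  return rows * indexes * versions;
}

/* Writesets that pile up between two purges of certification_info,
   assuming every thread sustains the same transaction rate. */
uint64_t writesets_accumulated(uint64_t per_txn, uint64_t threads,
                               uint64_t tps_per_thread,
                               uint64_t purge_interval_seconds) {
  return per_txn * threads * tps_per_thread * purge_interval_seconds;
}
```

With 500 rows, 2 indexes and 2 versions, `writesets_per_txn` gives 2,000. With 128 threads at 1 TPS, `writesets_accumulated` gives 256,000 for a 1-second window and 15,360,000 for the 60-second purge interval.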

paxos cache

The following variables are related to the paxos cache.

    /*
      We require that the number of elements in the cache is big enough that
      it is always possible to find instances that are not busy.
      Under normal circumstances the number of busy instances will be
      less than event_horizon, since the proposers only consider
      instances which belong to the local node.
      A node may start proposing no_ops for instances belonging
      to other nodes, meaning that event_horizon * NSERVERS instances may be
      involved. However, for the time being, proposing a no_op for an instance
      will not mark it as busy. This may change in the future, so a safe upper
      limit on the number of nodes marked as busy is event_horizon * NSERVERS.
    */
    #define CACHED 50000

    static lru_machine cache[CACHED]; /* The Paxos instances, plus a link for the LRU chain */

In MySQL Community Edition, the paxos cache holds 50,000 lru_machine entries.

    /* {{{ Paxos machine cache */
    struct lru_machine {
      linkage lru_link;
      pax_machine pax;
    };

lru_machine is a thin wrapper around pax_machine, which is defined as follows:

    /* Definition of a Paxos instance */
    struct pax_machine {
      linkage hash_link;
      lru_machine *lru;
      synode_no synode;
      double last_modified; /* Start time */
      linkage rv; /* Tasks may sleep on this until something interesting happens */

      struct {
        ballot bal;            /* The current ballot we are working on */
        bit_set *prep_nodeset; /* Nodes which have answered my prepare */
        ballot sent_prop;
        bit_set *prop_nodeset; /* Nodes which have answered my propose */
        pax_msg *msg;          /* The value we are trying to push */
        ballot sent_learn;
      } proposer;

      struct {
        ballot promise; /* Promise to not accept any proposals less than this */
        pax_msg *msg;   /* The value we have accepted */
      } acceptor;

      struct {
        pax_msg *msg; /* The value we have learned */
      } learner;

      int lock; /* Busy ? */
      pax_op op;
      int force_delivery;
    };

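Each pax_machine is an independent paxos instance. It contains three sub-objects, proposer, acceptor and learner, which correspond to the three roles in the Paxos consensus protocol. pax_msg *msg holds the writesets of one or more transactions (a msg contains several transactions when paxos batches them).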

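So the total size of the paxos cache also depends on the number of writesets in each transaction and cannot be calculated precisely. Compared with the conflict detection database, though, MySQL handles this part better. It introduces a cache size limit:

    /* cache size limit and interval */
    size_t cache_limit;

The default is 1 GB:

    /* Reasonable initial cache limit */
    #define CACHE_LIMIT 1000000000ULL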

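The logic is as follows: after a paxos message that has reached consensus is delivered up to MySQL for execution, the current cache size is checked, and if it exceeds 1 GB a cache cleanup is triggered.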

    /*
      Loop through the LRU (protected_lru) and deallocate objects until the size of
      the cache is below the limit.
      The freshly initialized objects are put into the probation_lru, so we can always start
      scanning at the end of protected_lru.
      lru_get will always look in probation_lru first.
    */
    void shrink_cache()
    {
      FWD_ITER(&protected_lru, lru_machine,
        if (above_cache_limit() && can_deallocate(link_iter)) {
          last_removed_cache = link_iter->pax.synode;
          hash_out(&link_iter->pax); /* Remove from hash table */
          link_into(link_out(&link_iter->lru_link), &probation_lru); /* Put in probation lru */
          init_pax_machine(&link_iter->pax, link_iter, null_synode);
        } else {
          return;
        }
      );
    }

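The cleanup walks the protected_lru list and uses can_deallocate to find lru_machine entries that can be freed. One of the criteria is that every member has already received the agreed message. It then calls init_pax_machine to release the memory of the pax_msg objects attached to the entry. The walk stops as soon as the cache drops below 1 GB or it reaches an entry that cannot be deallocated. Under normal conditions, then, the paxos cache size hovers around 1 GB. But if network latency between members is high and one member lags far behind, the cache can grow beyond the hard-coded 1 GB threshold.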
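The effect of a lagging member can be seen in a toy model of this loop. The sketch below is not the xcom implementation: `toy_entry`, its `deliverable` flag (a stand-in for can_deallocate) and `toy_shrink` are hypothetical names, and the limit check is assumed to be a strict "greater than".

```c
#include <stdbool.h>
#include <stddef.h>

/* One cached paxos instance in the toy model. */
typedef struct {
  size_t msg_bytes; /* payload currently held by this entry */
  bool deliverable; /* stand-in for can_deallocate(): all members have it */
} toy_entry;

/* Walk the entries oldest first and free payloads while the cache is
   above the limit. Like shrink_cache, stop at the first entry that
   cannot be freed. Returns the cache size after the walk. */
size_t toy_shrink(toy_entry *lru, size_t count, size_t cache_bytes,
                  size_t limit) {
  for (size_t i = 0; i < count; ++i) {
    if (cache_bytes > limit && lru[i].deliverable) {
      cache_bytes -= lru[i].msg_bytes;
      lru[i].msg_bytes = 0;
    } else {
      return cache_bytes;
    }
  }
  return cache_bytes;
}
```

With three 400-byte entries that are all deliverable and a limit of 1,000, the walk frees one entry and stops at 800 bytes. If the oldest entry is not yet deliverable because one member lags, nothing is freed and the cache stays at 1,200 bytes, above the limit.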

Moreover, once the total cache size has exceeded the threshold, it can keep growing. The code below shows why:

    /*
      Get a machine for (re)use.
      The machines are statically allocated, and organized in two lists.
      probation_lru is the free list.
      protected_lru tracks the machines that are currently in the cache in
      least recently used order.
    */
    static lru_machine *lru_get()
    {
      lru_machine *retval = NULL;
      // !above_cache_limit() added by InnoSQL, to make sure the cache size will be no larger than the cache limit
      if (!link_empty(&probation_lru)) {
        retval = (lru_machine *) link_first(&probation_lru);
      } else {
        /* Find the first non-busy instance in the LRU */
        FWD_ITER(&protected_lru, lru_machine,
          if (!is_busy_machine(&link_iter->pax)) {
            retval = link_iter;
            /* Since this machine is in the cache, we need to update
               last_removed_cache */
            last_removed_cache = retval->pax.synode;
            break;
          }
        )
      }
      assert(retval && !is_busy_machine(&retval->pax));
      return retval;
    }

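As the comment says, lru_get obtains an lru_machine from the paxos cache. If probation_lru, which is the free list, still has idle machines, the allocation is served from it first. If the total cache size is already above the threshold at that point, allocating from the probation_lru list increases the cache size further.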
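The InnoSQL comment in the listing hints at one possible guard: take a machine from the free list only while the cache is below its limit, and otherwise reuse an existing machine. The sketch below models only that decision. It is an assumption about what such a guard could look like, not the actual patch, and `toy_source` and `toy_pick_source` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Where the next machine comes from in the toy model. */
typedef enum { TOY_TAKE_FREE, TOY_REUSE_OLDEST } toy_source;

/* Decide where lru_get would take the next machine from.
   free_count:  machines currently on the free list (probation_lru)
   cache_bytes: current total cache size
   limit:       configured cache limit
   guard:       whether the extra "not above limit" check is enabled */
toy_source toy_pick_source(size_t free_count, size_t cache_bytes,
                           size_t limit, bool guard) {
  bool above_limit = cache_bytes > limit;
  if (free_count > 0 && !(guard && above_limit)) {
    return TOY_TAKE_FREE; /* grows the cache by one more payload */
  }
  return TOY_REUSE_OLDEST; /* recycles a non-busy machine instead */
}
```

Without the guard, a non-empty free list is always used, even when the cache is already above the limit. With the guard enabled, the same situation falls through to reusing a non-busy machine, so the cache does not grow further.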

Summary

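This article introduced the memory modules that MGR adds compared with ordinary MySQL replication and analyzed their potential problems. The next article will discuss how to optimize them.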
