MySQL physical optimization: an analysis of the MySQL physical optimizer's cost model [Original]

1. Introduction

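When the MySQL server layer retrieves data for a WHERE condition, there are usually several possible access methods. It estimates the cost of each method and picks the one with the lowest cost.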

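For example, given the condition where col1=x and col2=y and col3>z, with four indexes inx_col1, inx_col2, inx_col3 and inx_col1_col2_col3, the server must decide: 1) which index to use; 2) whether to use a range scan or a ref scan on that index; and 3) whether a full table scan is a viable alternative.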

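MySQL compares the following access strategies and reads the table with the cheapest one: 1) a range scan on each candidate index; 2) a ref scan on each candidate index; 3) a full table scan. How these costs are computed is the main subject of this article.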

Total cost = CPU cost + IO cost.
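As a minimal illustration of this split, the C++ sketch below keeps the two components apart and only adds them at the end. CostEstimate is a hypothetical, simplified type (not MySQL's own Cost_estimate class), and the 1.0 per page and 0.2 per row figures are assumed values used only for the arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical, simplified cost accumulator (not MySQL's Cost_estimate class).
struct CostEstimate {
  double io_cost = 0.0;   // page reads from the buffer pool or from disk
  double cpu_cost = 0.0;  // row evaluation, key comparison, etc.

  void add_io(double c) {
    assert(c >= 0.0);
    io_cost += c;
  }
  void add_cpu(double c) {
    assert(c >= 0.0);
    cpu_cost += c;
  }
  // Total cost = CPU cost + IO cost.
  double total_cost() const { return io_cost + cpu_cost; }
};

// Example: read 3 pages at an assumed 1.0 each, evaluate 2 rows at an assumed 0.2 each.
inline double example_total() {
  CostEstimate c;
  c.add_io(3 * 1.0);
  c.add_cpu(2 * 0.2);
  return c.total_cost();
}
```

Keeping IO and CPU separate until the end makes it possible to see which part dominates a plan, which is also how the optimizer trace reports some of its estimates.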

2. Cost constants

MySQL keeps an in-memory copy of its cost constants in two classes, Server_cost_constants and SE_cost_constants. Their data members are listed below.

MySQL server cost constants

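Server_cost_constants {
m_row_evaluate_cost // cost of evaluating the WHERE predicate on one row
m_key_compare_cost // cost of comparing two keys
m_memory_temptable_create_cost // cost of creating an in-memory temporary table
m_memory_temptable_row_cost // per-row cost of an in-memory temporary table
m_disk_temptable_create_cost // cost of creating an on-disk temporary table
m_disk_temptable_row_cost // per-row cost of an on-disk temporary table
}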

Storage engine cost constants

SE_cost_constants {
m_memory_block_read_cost // cost of reading one page that is already in the buffer pool
m_io_block_read_cost // cost of reading one page from the file system (buffer pool miss)
m_memory_block_read_cost_default
m_io_block_read_cost_default
}
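For reference, the sketch below writes down the values we believe are MySQL 5.7's compiled-in defaults for these constants (MySQL 8.0 lowered several of them, so check mysql.server_cost and mysql.engine_cost on your own version). The struct names and the helpers memory_temptable_cost() and disk_temptable_cost() are hypothetical; the helpers only show how a one-time create cost and a per-row cost combine.

```cpp
#include <cassert>
#include <cmath>

// Assumed MySQL 5.7 defaults; hypothetical structs mirroring the classes above.
struct ServerCostDefaults {
  double row_evaluate_cost = 0.2;
  double key_compare_cost = 0.1;
  double memory_temptable_create_cost = 2.0;
  double memory_temptable_row_cost = 0.2;
  double disk_temptable_create_cost = 40.0;
  double disk_temptable_row_cost = 1.0;
};

struct EngineCostDefaults {
  double memory_block_read_cost = 1.0;  // page found in the buffer pool
  double io_block_read_cost = 1.0;      // page read from disk
};

// Hypothetical helpers: one-time creation cost plus a per-row cost.
inline double memory_temptable_cost(const ServerCostDefaults& c, double rows) {
  assert(rows >= 0.0);
  return c.memory_temptable_create_cost + rows * c.memory_temptable_row_cost;
}

inline double disk_temptable_cost(const ServerCostDefaults& c, double rows) {
  assert(rows >= 0.0);
  return c.disk_temptable_create_cost + rows * c.disk_temptable_row_cost;
}
```

Under these assumed values a 10-row in-memory temporary table costs 2.0 + 10 × 0.2 = 4.0, while the same table on disk costs 40.0 + 10 × 1.0 = 50.0, more than ten times as much.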

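A second copy of the cost constants is persisted in the system tables mysql.server_cost and mysql.engine_cost, where each row (identified by cost_name) corresponds to one member of the in-memory classes. A DBA can benchmark the actual hardware, work out the most suitable values and UPDATE the corresponding rows, then run FLUSH OPTIMIZER_COSTS to load the changes into memory. Sessions that connect after the flush read the new in-memory values and compute costs with them.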
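The reload behavior described above (sessions already connected keep the constants they started with, new sessions pick up the flushed values) can be modeled with immutable snapshots behind a shared pointer. This is a hypothetical sketch of the semantics only; CostConstantCache, get() and reload() are names invented here, not MySQL internals.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical model: only two constants are kept to keep the example short.
struct CostConstants {
  double row_evaluate_cost;
  double io_block_read_cost;
};

class CostConstantCache {
 public:
  explicit CostConstantCache(CostConstants initial)
      : current_(std::make_shared<CostConstants>(initial)) {}

  // Called when a session starts; the session keeps this snapshot for its lifetime.
  std::shared_ptr<const CostConstants> get() const {
    std::lock_guard<std::mutex> lock(mu_);
    return current_;
  }

  // Models FLUSH OPTIMIZER_COSTS: install a new snapshot. Sessions holding
  // the old snapshot are unaffected; later get() calls see the new one.
  void reload(CostConstants fresh) {
    assert(fresh.row_evaluate_cost >= 0.0 && fresh.io_block_read_cost >= 0.0);
    std::shared_ptr<const CostConstants> next = std::make_shared<CostConstants>(fresh);
    std::lock_guard<std::mutex> lock(mu_);
    current_ = std::move(next);
  }

 private:
  mutable std::mutex mu_;
  std::shared_ptr<const CostConstants> current_;
};
```

Because each snapshot is immutable, a session never sees a half-updated set of constants, and the old snapshot is freed automatically once the last session using it disconnects.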

How the cost constants could adapt automatically to the actual hardware and workload is an important research topic.

3. Statistics

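The statistics the server layer needs are provided by the InnoDB storage engine and are obtained by calling InnoDB's API; the second half of this article lists those API functions. InnoDB can persist its statistics in system tables when needed: mysql.innodb_table_stats stores table statistics and mysql.innodb_index_stats stores index statistics.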

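Columns of mysql.innodb_table_stats:

database_name: database name
table_name: table name
n_rows: number of rows in the table
clustered_index_size: number of pages in the clustered index
sum_of_other_index_sizes: total number of pages in the other (non-primary) indexes
last_update: time these statistics were last updated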

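Columns of mysql.innodb_index_stats:

database_name: database name
table_name: table name
index_name: index name
stat_name: name of the statistic
stat_value: value of the statistic
sample_size: number of pages sampled
last_update: time this statistic was last updated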

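The values of stat_name include:

n_diff_pfxNN: cardinality of the first NN columns of the index, i.e. the number of distinct values of that column prefix
n_leaf_pages: number of leaf pages in the index
size: total number of pages in the index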
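These statistics are what turn an equality condition into a row estimate. A common simplification, assumed here rather than taken from the InnoDB source, is that each distinct prefix value matches n_rows / n_diff_pfxNN rows; the hypothetical helper rows_per_key() below applies it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: expected rows for an equality lookup on the first NN
// index columns, assuming values are spread uniformly.
// n_rows:     from mysql.innodb_table_stats
// n_diff_pfx: n_diff_pfxNN from mysql.innodb_index_stats
inline double rows_per_key(double n_rows, double n_diff_pfx) {
  assert(n_rows >= 0.0);
  if (n_diff_pfx <= 0.0) return n_rows;  // no statistics: assume every row may match
  const double est = n_rows / n_diff_pfx;
  return est < 1.0 ? 1.0 : est;          // an existing key matches at least one row
}
```

For example, 1000 rows with n_diff_pfx01 = 100 gives an estimate of 10 rows per lookup on the first column.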

4. Cost formulas

CPU cost

double row_evaluate_cost(double rows)
{
  // CPU cost of evaluating the WHERE condition on `rows` rows
  return rows * m_server_cost_constants->row_evaluate_cost();
}

Table scan IO cost

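Cost_estimate handler::table_scan_cost()
{
  // pages in the clustered index times the cost of reading one page
  const double io_cost= scan_time() * table->cost_model()->page_read_cost(1.0);
  Cost_estimate cost;
  cost.add_io(io_cost);  // the CPU cost of evaluating rows is added by the caller
  return cost;
}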
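Plugging in the trace from section 7: the table has 5 rows and, we assume, a clustered index of a single page. With the assumed 5.7 defaults (page read 1.0, row evaluation 0.2) the scan costs 1.0 IO + 1.0 CPU. The trace reports 4.1; the remaining 2.1 is consistent with the range optimizer adding fixed adjustments of 1.1 (IO) and 1.0 (CPU) to the table-scan estimate, which is how we read MySQL 5.7's test_quick_select(), but treat those two constants as an assumption. ScanCost and both functions are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumed MySQL 5.7 default cost constants.
constexpr double kRowEvaluateCost = 0.2;  // per row
constexpr double kPageReadCost = 1.0;     // page_read_cost(1.0)

struct ScanCost {
  double io;
  double cpu;
  double total() const { return io + cpu; }
};

// scan_pages: ha_innobase::scan_time(), i.e. pages in the clustered index.
// rows:       estimated number of rows in the table.
inline ScanCost table_scan_cost(double scan_pages, double rows) {
  assert(scan_pages >= 0.0 && rows >= 0.0);
  ScanCost c;
  c.io = scan_pages * kPageReadCost;  // handler::table_scan_cost()
  c.cpu = rows * kRowEvaluateCost;    // row_evaluate_cost(rows)
  return c;
}

// The "table_scan" figure in the range optimizer's trace, ASSUMING fixed
// adjustments of +1.1 (IO) and +1.0 (CPU) on top of table_scan_cost().
inline double traced_table_scan_cost(double scan_pages, double rows) {
  ScanCost c = table_scan_cost(scan_pages, rows);
  c.io += 1.1;
  c.cpu += 1.0;
  return c.total();
}
```

traced_table_scan_cost(1, 5) evaluates to 2.1 + 2.0 = 4.1, the table_scan cost shown in the trace.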

IO cost of ref and range scans

IO cost of a scan on the clustered index

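Cost_estimate handler::read_cost(uint index, double ranges, double rows)
{
  // read_time() returns an estimate in pages; convert it into a cost
  const double io_cost= read_time(index, static_cast<uint>(ranges),
                                  static_cast<ha_rows>(rows)) *
                        table->cost_model()->page_read_cost(1.0);
  Cost_estimate cost;
  cost.add_io(io_cost);
  return cost;
}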
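The range alternatives in the section 7 trace show how read_cost() combines with the CPU term. For a non-clustered index, read_time() is taken here to be ranges + rows (one page per range plus one per row), so with one range inx_clo2 (2 rows) reads 3 pages and inx_clo3 (1 row) reads 2. Adding rows × 0.2 plus a small 0.01 term reproduces the traced 3.41 and 2.21. The 0.01 term and the default constants are assumptions inferred from these numbers, and range_scan_cost() is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

constexpr double kRowEvaluateCost = 0.2;  // assumed 5.7 default
constexpr double kPageReadCost = 1.0;     // assumed page_read_cost(1.0)

// Hypothetical helper: cost of a range scan that must fetch full rows.
// read_time_pages: what read_time(index, ranges, rows) returned.
inline double range_scan_cost(double read_time_pages, double rows) {
  assert(read_time_pages >= 0.0 && rows >= 0.0);
  const double io = read_time_pages * kPageReadCost;  // handler::read_cost()
  const double cpu = rows * kRowEvaluateCost + 0.01;  // assumed small extra CPU term
  return io + cpu;
}
```

For inx_clo2 the call is range_scan_cost(1 + 2, 2), and for inx_clo3 it is range_scan_cost(1 + 1, 1).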

IO cost of a covering scan on a secondary index (no lookup back to the clustered index)

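Cost_estimate handler::index_scan_cost(uint index, double ranges, double rows)
{
  // pages touched in the secondary index only (no clustered-index lookups)
  const double io_cost= index_only_read_time(index, rows) *
                        table->cost_model()->page_read_cost_index(index, 1.0);
  Cost_estimate cost;
  cost.add_io(io_cost);
  return cost;
}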
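The covering alternative inx_clo2_clo3 in the trace costs 1.21. The sketch below follows the shape of handler::index_only_read_time() as we understand it (index entries are assumed to fill half a page, so keys_per_block = block_size / 2 / (key_length + ref_length) + 1), then adds the same assumed CPU term as the non-covering range. For a single row the page estimate is exactly 1 whatever the key length, so the result is 1.0 + 0.2 + 0.01 = 1.21. The key_length of 40 and ref_length of 6 used in the test are made-up values, and covering_range_cost() is hypothetical.

```cpp
#include <cassert>
#include <cmath>

constexpr double kRowEvaluateCost = 0.2;    // assumed 5.7 default
constexpr double kIndexPageReadCost = 1.0;  // assumed page_read_cost_index(index, 1.0)

// Pages read by a covering scan of `records` index entries, following the shape
// of handler::index_only_read_time(): entries are assumed to fill half a page.
inline double index_only_read_time(unsigned block_size, unsigned key_length,
                                   unsigned ref_length, double records) {
  assert(key_length + ref_length > 0);
  // Integer division, matching what we understand to be the original's unsigned arithmetic.
  const unsigned keys_per_block = block_size / 2 / (key_length + ref_length) + 1;
  return (records + keys_per_block - 1) / keys_per_block;
}

// Hypothetical helper: covering range scan = index pages + assumed CPU term.
inline double covering_range_cost(double index_pages, double rows) {
  assert(index_pages >= 0.0 && rows >= 0.0);
  return index_pages * kIndexPageReadCost + rows * kRowEvaluateCost + 0.01;
}
```

Note that the page estimate is not rounded up: 1000 entries with 179 keys per block give about 6.58 pages rather than 7.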

IO cost of a non-covering scan on a secondary index (each matching row requires a lookup back to the clustered index)

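min(table->cost_model()->page_read_cost(tmp_fanout), tab->worst_seeks)

Here tmp_fanout is the estimated number of rows matched per lookup, each of which may cost one clustered-index page read, and worst_seeks is an upper bound on the cost of these random reads.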
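The ref costs in the section 7 trace fit this formula. Assuming worst_seeks evaluates to 2.0 for this tiny table (we believe MySQL 5.7 puts a floor of page_read_cost(2.0) on it for small tables, but treat that as an assumption) and adding row_evaluate_cost() of the fanout as the CPU part, ref on inx_clo2 (2 rows) costs min(2, 2) + 0.4 = 2.4 and ref on inx_clo3 (1 row) costs min(1, 2) + 0.2 = 1.2, as traced. ref_access_cost() is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr double kRowEvaluateCost = 0.2;  // assumed 5.7 default
constexpr double kPageReadCost = 1.0;     // assumed cost per page

inline double page_read_cost(double pages) { return pages * kPageReadCost; }

// ref access through a non-covering secondary index.
// tmp_fanout:  estimated rows matched by one lookup
// worst_seeks: cap on the IO cost of the clustered-index lookups
inline double ref_access_cost(double tmp_fanout, double worst_seeks) {
  assert(tmp_fanout >= 0.0 && worst_seeks >= 0.0);
  const double io = std::min(page_read_cost(tmp_fanout), worst_seeks);
  const double cpu = tmp_fanout * kRowEvaluateCost;  // row_evaluate_cost()
  return io + cpu;
}
```

The cap only affects IO: with 10 matching rows the IO part stays at 2.0 while the CPU part still grows to 2.0.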

Estimated cost of reading `pages` clustered-index pages: the number of pages multiplied by the page-read cost constant.

double Cost_model_table::page_read_cost(double pages)

Estimated cost of reading `pages` pages of the given index:

double Cost_model_table::page_read_cost_index(uint index, double pages)
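As we understand the MySQL 5.7 implementation, "pages multiplied by a cost constant" is refined by first splitting the pages with the in-memory estimate (table_in_memory_estimate() or index_in_memory_estimate(), listed in section 5) into a buffer-pool part charged at memory_block_read_cost and an on-disk part charged at io_block_read_cost. A hypothetical sketch of that split, with the fraction and both constants passed in explicitly:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: split `pages` by the estimated in-memory fraction and
// charge each part with its own cost constant.
inline double page_read_cost(double pages, double in_memory_fraction,
                             double memory_block_read_cost,
                             double io_block_read_cost) {
  assert(pages >= 0.0);
  assert(in_memory_fraction >= 0.0 && in_memory_fraction <= 1.0);
  const double pages_in_mem = pages * in_memory_fraction;
  const double pages_on_disk = pages - pages_in_mem;
  return pages_in_mem * memory_block_read_cost +
         pages_on_disk * io_block_read_cost;
}
```

With both constants at their assumed default of 1.0 the split has no effect. It only matters once a DBA lowers memory_block_read_cost, for example to 0.25, in which case 10 pages with half in memory cost 5 × 0.25 + 5 × 1.0 = 6.25.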

5. InnoDB statistics API

Number of pages in the clustered index (primary key), i.e. the pages a full table scan reads:

double ha_innobase::scan_time()

Estimated number of pages read when scanning `rows` records on the clustered index:

double ha_innobase::read_time(uint index, uint ranges, ha_rows rows)
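A sketch of the branches we believe ha_innobase::read_time() takes in MySQL 5.7, written as a free function with the handler state passed in as parameters (innodb_read_time() and its parameter names are hypothetical): non-clustered indexes fall back to the generic ranges + rows; very small row counts are charged one page per row; otherwise the clustered-index scan time is prorated by the fraction of rows read, plus one seek per range.

```cpp
#include <cassert>
#include <cmath>

// is_clustered: the index is the primary key
// scan_time:    pages in the clustered index (ha_innobase::scan_time())
// total_rows:   estimated upper bound on the number of rows in the table
inline double innodb_read_time(bool is_clustered, double ranges, double rows,
                               double scan_time, double total_rows) {
  assert(ranges >= 0.0 && rows >= 0.0);
  if (!is_clustered) return ranges + rows;  // generic handler estimate
  if (rows <= 2) return rows;               // tiny reads: one page per row
  if (total_rows < rows) return scan_time;  // reading more than the table: full scan
  return ranges + rows / total_rows * scan_time;
}
```

For example, reading 50 of 1000 rows in one range from a 100-page clustered index is estimated at 1 + 0.05 × 100 = 6 pages.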

Estimated number of index pages read by a covering scan (no clustered-index lookup) of `records` entries on index keynr:

double handler::index_only_read_time(uint keynr, double records)

Estimated number of records in index keynr within the range (min_key, max_key):

ha_rows ha_innobase::records_in_range(
	uint		keynr,		/*!< in: index number */
	key_range	*min_key,	/*!< in: start key value of the range, may also be 0 */
	key_range	*max_key)	/*!< in: range end key val, may also be 0 */
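records_in_range() is what produces the rows figures for the range alternatives in the trace (and, with index_dives_for_eq_ranges, for equality ranges as well). As a toy illustration of its contract only, the function below counts keys in [min_key, max_key] exactly over a sorted vector. InnoDB instead estimates the count by diving into the B-tree at the range endpoints, as we understand it, and toy_records_in_range() is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy illustration only: count keys k with min_key <= k <= max_key in a
// sorted vector standing in for an index.
inline long toy_records_in_range(const std::vector<int>& sorted_keys,
                                 int min_key, int max_key) {
  assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
  if (min_key > max_key) return 0;
  const auto lo = std::lower_bound(sorted_keys.begin(), sorted_keys.end(), min_key);
  const auto hi = std::upper_bound(sorted_keys.begin(), sorted_keys.end(), max_key);
  return static_cast<long>(hi - lo);
}
```

An equality range such as "hu <= clo2 <= hu" corresponds to min_key == max_key, which is why an index dive can also be used for equality conditions.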

Estimated fraction of the clustered index's pages that are in memory:

double handler::table_in_memory_estimate()

Estimated fraction of a secondary index's pages that are in memory:

double handler::index_in_memory_estimate(uint keyno)

6. Enabling the optimizer trace

set session optimizer_trace="enabled=on";

explain <your query>;

select * from information_schema.optimizer_trace;

7. Optimizer trace example

"steps": [
{"rows_estimation": [
{"table": "`tab`","range_analysis": {"table_scan": {"rows": 5,"cost": 4.1},"potential_range_indexes": [
{"index": "PRIMARY","usable": false,"cause": "not_applicable"},
{"index": "inx_clo2","usable": true,"key_parts": ["clo2","clo1"]},
{"index": "inx_clo3","usable": true,"key_parts": ["clo3","clo1"]},
{"index": "inx_clo2_clo3","usable": true,"key_parts": ["clo2","clo3","clo1"]}
],"best_covering_index_scan": {"index": "inx_clo2_clo3","cost": 2.0606,"chosen": true},"setup_range_conditions": [
],"group_index_range": {"chosen": false,"cause": "not_group_by_or_distinct"},"analyzing_range_alternatives": {"range_scan_alternatives": [
{"index": "inx_clo2","ranges": ["hu <= clo2 <= hu"],"index_dives_for_eq_ranges": true,"rowid_ordered": true,"using_mrr": false,"index_only": false,"rows": 2,"cost": 3.41,"chosen": false,"cause": "cost"},
{"index": "inx_clo3","ranges": ["huan <= clo3 <= huan"],"index_dives_for_eq_ranges": true,"rowid_ordered": true,"using_mrr": false,"index_only": false,"rows": 1,"cost": 2.21,"chosen": false,"cause": "cost"},
{"index": "inx_clo2_clo3","ranges": ["hu <= clo2 <= hu AND huan <= clo3 <= huan"],"index_dives_for_eq_ranges": true,"rowid_ordered": true,"using_mrr": false,"index_only": true,"rows": 1,"cost": 1.21,"chosen": true}
],"analyzing_roworder_intersect": {"intersecting_indexes": [
{"index": "inx_clo2_clo3","index_scan_cost": 1,"cumulated_index_scan_cost": 1,"disk_sweep_cost": 0,"cumulated_total_cost": 1,"usable": true,"matching_rows_now": 1,"isect_covering_with_this_index": true,"chosen": true}
],"clustered_pk": {"clustered_pk_added_to_intersect": false,"cause": "no_clustered_pk_index"},"chosen": false,"cause": "too_few_indexes_to_merge"}
},"chosen_range_access_summary": {"range_access_plan": {"type": "range_scan","index": "inx_clo2_clo3","rows": 1,"ranges": ["hu <= clo2 <= hu AND huan <= clo3 <= huan"]
},"rows_for_plan": 1,"cost_for_plan": 1.21,"chosen": true}
}}
]},
{"considered_execution_plans": [
{"plan_prefix": [
],"table": "`tab`","best_access_path": {"considered_access_paths": [
{"access_type": "ref","index": "inx_clo2","rows": 2,"cost": 2.4,"chosen": true},
{"access_type": "ref","index": "inx_clo3","rows": 1,"cost": 1.2,"chosen": true},
{"access_type": "ref","index": "inx_clo2_clo3","rows": 1,"cost": 1.2,"chosen": false},
{"rows_to_scan": 1,"access_type": "range","range_details": {"used_index": "inx_clo2_clo3"},"resulting_rows": 1,"cost": 1.41,"chosen": false}
]
},"condition_filtering_pct": 40,"rows_for_plan": 0.4,"cost_for_plan": 1.2,"chosen": true}
]},
{"attaching_conditions_to_tables": {"original_condition": "((`tab`.`clo2` = 'hu') and (`tab`.`clo3` = 'huan'))","attached_conditions_computation": [
],"attached_conditions_summary": [
{"table": "`tab`","attached": "(`tab`.`clo2` = 'hu')"}
]
}},
{"refine_plan": [
{"table": "`tab`"}
]
}
]
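The considered_access_paths step in the trace above comes down to keeping the cheapest candidate. The hypothetical sketch below feeds the four traced costs to a strict less-than comparison: ref on inx_clo3 (1.2) wins, and its tie with ref on inx_clo2_clo3 (also 1.2) is resolved in favor of the earlier candidate, which matches the chosen flags in the trace. MySQL's best_access_path() weighs more factors than this. The traced rows_for_plan of 0.4 is the 1 estimated row scaled by condition_filtering_pct (40%), which the second helper reproduces.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical record of one candidate from considered_access_paths.
struct AccessPath {
  std::string access_type;
  std::string index;
  double rows;
  double cost;
};

// Keep the cheapest candidate; on a tie the earlier one stays (strict <).
inline const AccessPath* cheapest(const std::vector<AccessPath>& paths) {
  const AccessPath* best = nullptr;
  for (const auto& p : paths)
    if (best == nullptr || p.cost < best->cost) best = &p;
  return best;
}

// Rows passed on to the next table after applying condition filtering.
inline double rows_for_plan(double rows, double condition_filtering_pct) {
  assert(condition_filtering_pct >= 0.0 && condition_filtering_pct <= 100.0);
  return rows * condition_filtering_pct / 100.0;
}
```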
