sql server 主键

In this blog post, we are going to talk about another cardinality estimation model enhancement in SQL Server 2014 – Overpopulated Primary Key (OPK).

在此博客文章中,我们将讨论SQL Server 2014中的另一个基数估计模型增强功能–人口过多的主键(OPK)。

Consider a fact table that contains information about some sales, for example, and a date dimension table. Usually, a fact table contains the data about the current year and past years, but a dimension table usually contains the data for the next few years also.

考虑一个事实表,其中包含有关某些销售的信息(例如)和日期维度表。 通常,事实表包含有关当年和过去几年的数据,但是维度表通常也包含未来几年的数据。

If we are joining two tables fact and dimension and filtering on dimension table we usually demand the rows that do exist in a fact table, for example, we ask for the last month sales, but not for the next year sales. We also remember that a filter influences a join estimation, the join column statistics information should be modified according to the filter selectivity and cardinality. I don’t know the precise formula that is used for that in SQL Server, however, the algorithm is called Selection Without Replacement and is taken from the probability theory (the exact method doing maths in SQL Server is sqllang!CCardUtilSQL12::ProbSampleWithoutReplacement). You may read some details about the similar algorithm in Oracle further in useful links.

如果我们要连接两个表事实和维度并在维度表上进行过滤,则通常需要事实表中确实存在的行,例如,我们要求上个月的销售额,而不要求下一年的销售额。 我们还记得过滤器会影响联接估计,联接列统计信息应根据过滤器的选择性和基数进行修改。 我不知道用于SQL Server的精确公式,但是该算法称为“无替换选择”,它取自概率论(SQL Server中进行数学运算的确切方法是sqllang!CCardUtilSQL12 :: ProbSampleWithoutReplacement) 。 您可以在有用的链接中进一步阅读有关Oracle中类似算法的一些详细信息。

What is important for that post, that the number of distinct values in the join columns is used to calculate the join selectivity after filtering by another column. As we said before, a dimension table usually contains more distinct values than a fact table, because it is filled with the data for the next few years also, but we usually select only the relevant data, that is present in a fact table, i.e. those overpopulated distinct values should be somehow compensated and not concerned when we estimating the join over filtered dimension. That is what Overpopulated Primary Key model assumption about.

对于该帖子很重要的是,连接列中不同值的数量用于在被另一列过滤后计算连接选择性。 如前所述,维表通常比事实表包含更多不同的值,因为维表也填充了未来几年的数据,但是我们通常只选择事实表中存在的相关数据,即当我们估计过滤后的维度的联接时,应该以某种方式补偿那些过度填充的不同值,而不用担心。 这就是人口过多的主键模型假设。

Let’s see this in action. We will use AdventureWorksDW2012 database and issue two identical queries, the first one – uses the new CE framework (I also add TF 2363, to view some diagnostic output), the second query uses the old framework (because we run it with TF 9481 – that forces the old CE behavior).

让我们看看实际情况。 我们将使用AdventureWorksDW2012数据库并发出两个相同的查询,第一个查询–使用新的CE框架(我还添加了TF 2363,以查看某些诊断输出),第二个查询使用旧的框架(因为我们使用TF 9481运行它–强制执行旧的CE行为)。

alter database AdventureWorksDW2012 set compatibility_level = 120;
go
set statistics xml on
selectsum(s.SalesAmount)
fromdbo.FactInternetSales sjoin dbo.DimDate d on d.DateKey = s.DueDateKey
whered.CalendarYear = 2008
option(querytraceon 3604, querytraceon 2363)
go
selectsum(s.SalesAmount)
fromdbo.FactInternetSales sjoin dbo.DimDate d on d.DateKey = s.DueDateKey
whered.CalendarYear = 2008
option(querytraceon 9481)
set statistics xml off
go

The actual number of rows for both of the queries equals 34 229, let’s look at the estimates.

这两个查询的实际行数等于34 229 ,让我们看一下估计值。

The estimation with the new CE framework is much closer to the reality (26 154 new vs. 10 047 old vs. 34 229 actual). This is because the join selectivity was scaled to compensate the key overpopulation. If you look at the diagnostic output, you’ll see the message that tells you, that this action was taken.

与新的CE架构的估计更接近于现实(26 154新主场迎战1004734 229实际)。 这是因为按比例选择了连接选择性以补偿关键的人口过剩。 如果查看诊断输出,将看到一条消息,告诉您已采取了该操作。

模型变化 (The Model Variation)

As you may suppose in this case the model variation would be not to use the OPK assumption and do not adjust the selectivity. To do it, you should run the query with TF 9482.

您可能会在这种情况下假设模型变化是不使用OPK假设并且不调整选择性。 为此,您应该使用TF 9482运行查询。

set statistics xml on
--New - OPK disabled
selectsum(s.SalesAmount)
fromdbo.FactInternetSales sjoin dbo.DimDate d on d.DateKey = s.DueDateKey
whered.CalendarYear = 2008
option(querytraceon 9482)
go
-- Old
selectsum(s.SalesAmount)
fromdbo.FactInternetSales sjoin dbo.DimDate d on d.DateKey = s.DueDateKey
whered.CalendarYear = 2008
option(querytraceon 9481)
set statistics xml off
go

If we now examine the estimates we will see that they are very close, you also won’t see the “Scaling join selectivity…” output in the diagnostic console messages.

如果现在检查估算值,我们会发现它们非常接近,您也将不会在诊断控制台消息中看到“伸缩连接选择性…”输出。

Again, I should warn everybody. This is undocumented and should not be used in production.

同样,我应该警告大家。 这是无证的,不应在生产中使用。

The final thing to mention, that there is nothing special about a fact and a dimension table for the OPK to be used. The decision to adjust the selectivity is made based on the difference between the distinct value counts. That means that you may observe this behavior, not only in DW databases, but in OLTP also, depending on your data distribution.

最后要提到的是,要使用的OPK的事实和维度表没有什么特别的。 基于不同值计数之间的差异来做出调整选择性的决定。 这意味着,不仅在DW数据库中,而且在OLTP中,您都可能会观察到此行为,具体取决于您的数据分布。

That’s all for this post. Next, we will talk about one of my favorite and really helpful changes in the new CE – the Ascending Key situation.

这就是这篇文章的全部内容。 接下来,我们将讨论我在新的CE中最喜欢的且真正有用的更改之一-升序关键情况。

目录 (Table of contents)

Cardinality Estimation Role in SQL Server
Cardinality Estimation Place in the Optimization Process in SQL Server
Cardinality Estimation Concepts in SQL Server
Cardinality Estimation Process in SQL Server
Cardinality Estimation Framework Version Control in SQL Server
Filtered Stats and CE Model Variation in SQL Server
Join Containment Assumption and CE Model Variation in SQL Server
Overpopulated Primary Key and CE Model Variation in SQL Server
Ascending Key and CE Model Variation in SQL Server
MTVF and CE Model Variation in SQL Server
SQL Server中的基数估计角色
基数估计在SQL Server优化过程中的位置
SQL Server中的基数估计概念
SQL Server中的基数估计过程
SQL Server中的基数估计框架版本控制
SQL Server中的筛选后的统计信息和CE模型变化
在SQL Server中加入包含假设和CE模型变化
SQL Server中人口过多的主键和CE模型的变化
SQL Server中的升序密钥和CE模型变化
SQL Server中的MTVF和CE模型变化

参考资料 (References)

  • Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator 使用SQL Server 2014基数估计器优化查询计划
  • Testing cardinality estimation models in SQL server 在SQL Server中测试基数估计模型
  • Cost Based Oracle: Fundamentals Chapter 10: Join Cardinality 基于成本的Oracle:基础知识第10章:联接基数

翻译自: https://www.sqlshack.com/overpopulated-primary-key-and-ce-model-variation/

sql server 主键

sql server 主键_SQL Server中人口过多的主键和CE模型的变化相关推荐

  1. sql server新增列_SQL Server 2016安装中的新增功能?

    sql server新增列 SQL Server 2016 introduced officially on the 1st of June 2016. It comes with many new ...

  2. sql 自定义函数 示例_SQL Server SESSION_CONTEXT()函数与示例

    sql 自定义函数 示例 This article explores the SQL Server session context function, SESSION_CONTEXT() and pe ...

  3. sql server高可用_SQL Server始终在线可用性组采访问题与解答

    sql server高可用 In this article, we will discuss a number of interview questions that you could be ask ...

  4. sql库缓存命中率_SQL Server内存性能指标–第4部分–缓冲区高速缓存命中率和页面寿命期望

    sql库缓存命中率 In SQL Server performance metrics – part 3, we presented some of the SQL Server Buffer Man ...

  5. sql服务器内存不足_SQL Server内存性能指标–第2部分–可用字节,总服务器和目标服务器内存

    sql服务器内存不足 first part of this series, we started with most important SQL Server performance memory m ...

  6. sql server高可用_SQL Server 2019常规可用性和安装概述

    sql server高可用 介绍 (Introduction) On November 4th, 2019, during the Ignite conference at Orlando, Micr ...

  7. sql server表分区_SQL Server中的FORCESCAN和分区表

    sql server表分区 I would like to share one curios case that I recently came across. 我想分享一下我最近遇到的一个古玩案例. ...

  8. sql server新增列_SQL Server 2017中的新增功能

    sql server新增列 SQL Server 2017 is considered a major release in the history of the SQL Server life cy ...

  9. sql 触发器嵌套条件_SQL Server中的嵌套触发器

    sql 触发器嵌套条件 Nested Triggers in SQL Server are actions that automatically execute when a certain data ...

最新文章

  1. 弗洛伊德算法(Floyd)简介
  2. 周志华、张潼亲自辅导AI课程,DeeCamp 2019正式启动
  3. Openresty最佳案例 | 第9篇:Openresty实现的网关权限控制
  4. 结构体指针和数组理解
  5. Symfony2插件StofDoctrineExtensionsBundle的使用说明
  6. java clock计时_Java Clock类| systemDefaultZone()方法与示例
  7. activex for chrome扩展程序 下载”_Chrome扩展程序一键生成网页骨架屏
  8. 什么是条件组合覆盖_什么是综合评价?高中生报名有哪些好处?
  9. mysql性能优化学习_mysql学习——性能优化之sql优化
  10. mybatis-plus中like的使用说明
  11. 从入门到入土:基于C语言实现并发Web服务器|父进程子进程|代码展示
  12. java简历下载_java个人简历模板下载
  13. linux md5校验脚本,Linux md5sum命令的使用方法
  14. [Noi online-j]T1 切蛋糕
  15. 令人眼前一亮的IDEA 2021
  16. python(3.10,Win10 64位)的wordcloud安装
  17. android显卡效果吗,安卓模拟器显卡渲染模式中DirectX和OpenGL两个有什么区别?
  18. html ppt在线浏览,怎么在网页中在线浏览PPT文档?
  19. word之表格如何自动换页?
  20. android系统的市场占有率,安卓系统市场占有率竟然比苹果iOS高了这么多

热门文章

  1. python连接redis集群如何释放内存_python 连接redis集群
  2. 08-图8 How Long Does It Take
  3. Docker(一):Docker入门
  4. 如何建立DB2分区数据库?(转)
  5. Ubuntu下配置使用maven
  6. RSA3:预提取数据
  7. 浅谈axios.interceptors拦截器
  8. DeprecationWarning:current URL string parser is deprecated, and will be removed in a future version.
  9. python 结构体数组_关于python:将结构化数组转换为常规NumPy数组
  10. 四十五岁了,存款三十万,是放手一搏,还是安于现状?