Introducing SQL Server Incremental Statistics for Partitioned Tables

If you maintain a very large database, you are probably well aware of how painful it is to update statistics on a very large table.

This article introduces incremental statistics, a feature available from SQL Server 2014 onwards that greatly simplifies statistics management on very large partitioned tables.

SQL Server Incremental Statistics

Accurate statistics are essential for the query optimizer to generate a good enough query plan. On a very large partitioned table, updating table statistics requires sampling rows across all partitions, and the resulting statistics reflect the data distribution of the table as a whole. Updating statistics consumes a lot of I/O and CPU resources, and it can take a very long time.

Imagine that the data distribution remains the same for all older partitions, and SQL Server only needs to learn the changed data distribution of a newly created or newly loaded partition. This is a common scenario, and you can now handle it efficiently with incremental statistics, which are built into SQL Server 2014 and later.

Prior to SQL Server 2014, a comparable workaround was to manually create filtered statistics for each partition and then update only the statistics of the partition that changed.
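As a rough sketch of that pre-2014 workaround, the statements below create and later refresh a filtered statistics object covering a single partition. The statistics name and the 2016 date range are assumptions made for illustration; the filter would have to match the actual boundaries of your partition function.

```sql
USE WideWorldImporters;
GO
-- Hypothetical name; one filtered statistics object would be needed per partition.
-- The date range is assumed to match the boundaries of exactly one partition.
CREATE STATISTICS ST_CustomerTransactions_TransactionDate_2016
ON Sales.CustomerTransactions (TransactionDate)
WHERE TransactionDate >= '20160101' AND TransactionDate < '20170101'
WITH FULLSCAN;
GO
-- After loading data into that partition, refresh only its filtered statistics.
UPDATE STATISTICS Sales.CustomerTransactions (ST_CustomerTransactions_TransactionDate_2016)
WITH FULLSCAN;
GO
```

The drawback of this approach is that every new partition needs its own statistics object, which has to be created and maintained by hand.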

This article uses the WideWorldImporters database on SQL Server 2016 Developer Edition Service Pack 1 to demonstrate how incremental statistics are used.

Brief on partitioned tables

WideWorldImporters is a great sample database as it comes with 2 partitioned tables – Purchasing.SupplierTransactions and Sales.CustomerTransactions.

For simplicity, we will focus on just one table in this article, Sales.CustomerTransactions, which is the table used in all of the examples that follow.
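To get a feel for how the table is partitioned before looking at its statistics, a query along the following lines can list each partition with its row count and boundary value. This is a sketch that assumes the clustered index (index_id 1) is partitioned on the table's partition scheme.

```sql
USE WideWorldImporters;
GO
-- One row per partition of the heap or clustered index.
-- boundary_value is the boundary separating this partition from the next one;
-- it is NULL for the last partition.
SELECT p.partition_number,
       p.rows,
       prv.value AS boundary_value
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
LEFT JOIN sys.partition_range_values AS prv
    ON prv.function_id = ps.function_id
   AND prv.boundary_id = p.partition_number
WHERE p.object_id = OBJECT_ID(N'Sales.CustomerTransactions')
  AND i.index_id IN (0, 1)
ORDER BY p.partition_number;
```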

Incremental statistics only work for partition-aligned indexes: STATISTICS_INCREMENTAL = ON can be set only when the index uses the same partition scheme and partitioning column as the table. The query below lists each index on the table together with its data space and whether its statistics are incremental.


USE WideWorldImporters
GO
SELECT i.name AS Index_name, i.Type_Desc AS Type_Desc, ds.name AS DataSpaceName, ds.type_desc AS DataSpaceTypeDesc, st.is_incremental
FROM sys.objects AS o
JOIN sys.indexes AS i
ON o.object_id = i.object_id
JOIN sys.data_spaces ds
ON ds.data_space_id = i.data_space_id
JOIN sys.stats st
ON st.object_id = o.object_id AND st.name = i.name
LEFT OUTER JOIN sys.dm_db_index_usage_stats AS s
ON i.object_id = s.object_id
AND i.index_id = s.index_id AND s.database_id = DB_ID()
WHERE o.type = 'U'
AND i.type <= 2
AND o.object_id = OBJECT_ID('Sales.CustomerTransactions')

If you try to update partition-level statistics on an index statistics object that has not been set to use incremental statistics, SQL Server raises an error.


UPDATE STATISTICS [WideWorldImporters].[Sales].[CustomerTransactions]
(CX_Sales_CustomerTransactions) WITH RESAMPLE ON PARTITIONS(1)

Msg 9111, Level 16, State 1, Line 34
UPDATE STATISTICS ON PARTITIONS syntax is not supported for non-incremental statistics.

Note that the RESAMPLE argument is required when updating partition-level statistics; FULLSCAN and SAMPLE number PERCENT are not supported with ON PARTITIONS. RESAMPLE recomputes the leaf-level statistics of the specified partitions using the same sample rate as before and merges the result back into the main statistics histogram.

Partition statistics built with different sample rates cannot be merged together, and this syntax restriction ensures that it never happens.

Enabling Incremental Statistics

There is a database-level setting to enable incremental statistics. When the INCREMENTAL option is turned on at the database level, newly auto-created column statistics use incremental statistics on partitioned tables by default.


USE [master]
GO
ALTER DATABASE [databaseName] SET AUTO_CREATE_STATISTICS ON (INCREMENTAL = ON)
GO
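To confirm that the option has taken effect, the current setting can be read from sys.databases. The database name below is only an example; substitute your own.

```sql
-- 1 in is_auto_create_stats_incremental_on means auto-created column statistics
-- on partitioned tables will be incremental.
SELECT name,
       is_auto_create_stats_on,
       is_auto_create_stats_incremental_on
FROM sys.databases
WHERE name = N'WideWorldImporters';
```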

Existing index or column statistics are not affected by this database option. You have to switch existing statistics on a partitioned table to incremental manually. The command is straightforward, as shown below.


UPDATE STATISTICS [WideWorldImporters].[Sales].[CustomerTransactions]
(CX_Sales_CustomerTransactions) WITH FULLSCAN, INCREMENTAL = ON
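Incremental statistics can also be requested when an index is rebuilt or when a column statistics object is created. The sketch below shows both forms; the statistics name ST_CustomerTransactions_CustomerID is hypothetical, and rebuilding a clustered index on a large table is an expensive operation, so this is an illustration rather than a recommendation.

```sql
USE WideWorldImporters;
GO
-- Option 1: rebuild the partition-aligned index and make its statistics incremental.
ALTER INDEX CX_Sales_CustomerTransactions
ON Sales.CustomerTransactions
REBUILD WITH (STATISTICS_INCREMENTAL = ON);
GO
-- Option 2: create a new incremental column statistics object (hypothetical name).
CREATE STATISTICS ST_CustomerTransactions_CustomerID
ON Sales.CustomerTransactions (CustomerID)
WITH FULLSCAN, INCREMENTAL = ON;
GO
```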

Once incremental statistics are enabled for an index statistics object, its is_incremental value is set to 1 in the sys.stats catalog view.


USE WideWorldImporters
GO
SELECT OBJECT_NAME(object_id) TableName, name , is_incremental, stats_id
FROM sys.stats
WHERE name = 'CX_Sales_CustomerTransactions'

Now that incremental statistics is enabled on CX_Sales_CustomerTransactions, we can update the index statistics at the partition level.

From SQL Server 2014 SP2 and SQL Server 2016 SP1 onwards, you can use the documented DMF sys.dm_db_incremental_stats_properties to view the properties of incremental statistics.


USE [WideWorldImporters]
GO
UPDATE STATISTICS Sales.CustomerTransactions(CX_Sales_CustomerTransactions) WITH RESAMPLE ON PARTITIONS(3)
GO
SELECT OBJECT_NAME(a.object_id) TblName, a.stats_id, b.partition_number, b.last_updated, b.rows, b.rows_sampled, b.steps
FROM sys.stats a
CROSS APPLY sys.dm_db_incremental_stats_properties(a.object_id, a.stats_id) b
WHERE a.name = 'CX_Sales_CustomerTransactions'

There are 5 partitions, and each partition that contains data has its own statistics with a maximum of 200 steps. We have only updated the statistics for partition 3, which is reflected by the newer timestamp in the last_updated column.

Incremental statistics are not used by CE

It is great to know that each partition can have a histogram of up to 200 steps. However, SQL Server does not use these partition-level statistics for cardinality estimation (CE). CE is the predicted number of rows in a query result, derived primarily from the histograms that are built when indexes or statistics are created. The statistics SQL Server actually uses are the main statistics, into which the partition-level statistics are merged.

To prove this statement, we will use DBCC SHOW_STATISTICS to get the statistics histogram of the main statistics and the incremental statistics and test the CE with a simple query.

Main statistics

At the time of writing, the only way to see the detailed content of a statistics histogram is DBCC SHOW_STATISTICS. The index statistics CX_Sales_CustomerTransactions has 200 steps; the screenshot is truncated to show only the beginning and the end of the histogram.


DBCC SHOW_STATISTICS('Sales.CustomerTransactions', CX_Sales_CustomerTransactions) WITH
HISTOGRAM

A simple query filtering on TransactionDate = 2016-05-18, a value that has its own step in the statistics histogram with a matching EQ_ROWS, returns 101 rows. The Actual Number of Rows matches the Estimated Number of Rows in the query plan.


SELECT TransactionDate FROM [Sales].[CustomerTransactions]
WHERE TransactionDate = '2016-05-18'
OPTION (RECOMPILE)
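One way to compare the estimate with the actual row count without opening the graphical plan is SET STATISTICS PROFILE, which returns the plan as a result set that includes the Rows and EstimateRows columns for each operator. A minimal sketch using the same query:

```sql
USE WideWorldImporters;
GO
SET STATISTICS PROFILE ON;
GO
-- Compare the Rows (actual) and EstimateRows columns in the extra result set.
SELECT TransactionDate
FROM Sales.CustomerTransactions
WHERE TransactionDate = '2016-05-18'
OPTION (RECOMPILE);
GO
SET STATISTICS PROFILE OFF;
GO
```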

Partition Level Incremental Statistics

We will use an undocumented trace flag 2309 to view the incremental statistics histogram. This trace flag allows an additional node_id parameter to be specified as an input into DBCC SHOW_STATISTICS command.

The node_id for a particular partition can be obtained using an undocumented DMF [sys].[dm_db_stats_properties_internal].


USE [WideWorldImporters]
GO
SELECT node_id, last_updated, steps, next_sibling, left_boundary, right_boundary, partition_number
FROM [sys].[dm_db_stats_properties_internal](OBJECT_ID('Sales.CustomerTransactions'),1)
ORDER BY [node_id];

As an example, we will display the incremental statistics histogram of partition 4, which has 152 steps. The third argument passed to DBCC SHOW_STATISTICS below is the node_id that the previous query reports for that partition.


DBCC TRACEON(2309);
GO
DBCC SHOW_STATISTICS('Sales.CustomerTransactions','CX_Sales_CustomerTransactions', 5);
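Because trace flag 2309 is undocumented, it seems prudent to switch it off again once the histogram has been inspected. A small sketch:

```sql
-- Turn the session-level trace flag back off and confirm its status.
DBCC TRACEOFF(2309);
GO
DBCC TRACESTATUS(2309);
GO
```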

Re-executing the SELECT query with a filter on TransactionDate = 2016-05-27 shows an Estimated Number of Rows of 87.6, whereas the actual number of rows is 134. The value 134 is accurately recorded as EQ_ROWS in the incremental statistics histogram, but SQL Server CE does not use it.

In the main statistics histogram, 87.6 is the AVG_RANGE_ROWS value of the step for TransactionDate = 2016-05-31, the step whose range includes 2016-05-27. So SQL Server uses the main statistics histogram for CE, not the incremental statistics histogram.


SELECT TransactionDate FROM [Sales].[CustomerTransactions]
WHERE TransactionDate = '2016-05-27'
OPTION (RECOMPILE)

Incremental Statistics in Action

We will insert 10 rows each into partition 1 and partition 5. The INSERT does not trigger an automatic statistics update because the number of rows inserted is very small relative to the total number of rows in the table.


USE [WideWorldImporters]
GO
INSERT INTO [Sales].[CustomerTransactions] (CustomerTransactionID, CustomerID, TransactionTypeID, InvoiceID,
PaymentMethodID, TransactionDate, AmountExcludingTax, TaxAmount, TransactionAmount,
OutstandingBalance, FinalizationDate, LastEditedBy, LastEditedWhen)
SELECT TOP 10 CustomerTransactionID + 1000000, CustomerID, TransactionTypeID, InvoiceID,
PaymentMethodID, '20 Jan 2017', AmountExcludingTax, TaxAmount, TransactionAmount,
OutstandingBalance, FinalizationDate, LastEditedBy, LastEditedWhen
FROM [Sales].[CustomerTransactions]
UNION ALL
SELECT TOP 10 CustomerTransactionID + 2000000, CustomerID, TransactionTypeID, InvoiceID,
PaymentMethodID, '2 Jan 2013', AmountExcludingTax, TaxAmount, TransactionAmount,
OutstandingBalance, FinalizationDate, LastEditedBy, LastEditedWhen
FROM [Sales].[CustomerTransactions]
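To see which partitions the insert touched, the modification_counter column of sys.dm_db_incremental_stats_properties can be inspected. The sketch below assumes incremental statistics are already enabled on CX_Sales_CustomerTransactions; after the insert above, one would expect non-zero counters for partitions 1 and 5 only.

```sql
USE WideWorldImporters;
GO
-- modification_counter counts changes to the leading statistics column
-- since the partition's statistics were last updated.
SELECT isp.partition_number,
       isp.rows,
       isp.modification_counter,
       isp.last_updated
FROM sys.stats AS s
CROSS APPLY sys.dm_db_incremental_stats_properties(s.object_id, s.stats_id) AS isp
WHERE s.object_id = OBJECT_ID(N'Sales.CustomerTransactions')
  AND s.name = N'CX_Sales_CustomerTransactions'
ORDER BY isp.partition_number;
```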

The index statistics CX_Sales_CustomerTransactions has not been updated yet, so the query plan below does not reflect the 10 rows inserted for TransactionDate = 2017-01-20.


SELECT TransactionDate FROM [Sales].[CustomerTransactions]
WHERE TransactionDate = '2017-01-20'
OPTION (RECOMPILE)

We now update the statistics for partition 5 only and check the main statistics.


UPDATE STATISTICS Sales.CustomerTransactions(CX_Sales_CustomerTransactions) WITH
RESAMPLE ON PARTITIONS(5)
GO
DBCC SHOW_STATISTICS('Sales.CustomerTransactions', CX_Sales_CustomerTransactions) WITH
HISTOGRAM

The main statistics now reflect the refreshed statistics for partition 5 only; the histogram steps covering partition 1 through partition 4 remain the same.

Re-executing the same query on TransactionDate = 2017-01-20 would now reflect a more accurate estimation of rows returned.


SELECT TransactionDate FROM [Sales].[CustomerTransactions]
WHERE TransactionDate = '2017-01-20'
OPTION (RECOMPILE)

Because the index statistics CX_Sales_CustomerTransactions was last updated with FULLSCAN, updating partition-level statistics with RESAMPLE also uses a full scan.

Manually updating partition 1 and partition 5 statistics took 39 ms.


SET STATISTICS TIME ON
GO
UPDATE STATISTICS Sales.CustomerTransactions(CX_Sales_CustomerTransactions) WITH
RESAMPLE ON PARTITIONS(1, 5)

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 39 ms.

The conventional approach without incremental statistics, updating the index statistics CX_Sales_CustomerTransactions with FULLSCAN, took 82 ms. Even at this very small scale, the full update is about twice as slow as updating the incremental statistics of just 2 partitions (82 ms versus 39 ms).

It is easy to imagine the benefit when the table is orders of magnitude larger.


SET STATISTICS TIME ON
GO
UPDATE STATISTICS Sales.CustomerTransactions(CX_Sales_CustomerTransactions) WITH
FULLSCAN

SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 82 ms.
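In a maintenance job, the partition list does not have to be hard-coded. The following sketch, which is one possible approach rather than an established procedure, builds the ON PARTITIONS list from the partitions whose modification_counter is greater than zero and then runs the update through dynamic SQL. The variable names are illustrative.

```sql
USE WideWorldImporters;
GO
DECLARE @partitions nvarchar(max), @sql nvarchar(max);

-- Build a comma-separated list of partitions that have changed
-- (FOR XML PATH is used because STRING_AGG is not available on SQL Server 2016).
SELECT @partitions = STUFF((
    SELECT N', ' + CAST(isp.partition_number AS nvarchar(10))
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_incremental_stats_properties(s.object_id, s.stats_id) AS isp
    WHERE s.object_id = OBJECT_ID(N'Sales.CustomerTransactions')
      AND s.name = N'CX_Sales_CustomerTransactions'
      AND isp.modification_counter > 0
    ORDER BY isp.partition_number
    FOR XML PATH('')), 1, 2, N'');

-- Only run the update when at least one partition has pending modifications.
IF @partitions IS NOT NULL
BEGIN
    SET @sql = N'UPDATE STATISTICS Sales.CustomerTransactions (CX_Sales_CustomerTransactions) '
             + N'WITH RESAMPLE ON PARTITIONS (' + @partitions + N');';
    PRINT @sql;
    EXEC sys.sp_executesql @sql;
END;
GO
```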

Summary

Incremental Statistics are only relevant for partitioned tables, and this feature is a clever way to allow more efficient statistics management for very large partitioned tables.

Although the partition-level statistics are not used by SQL Server CE, the finer-grained control to refresh only the parts of the main statistics that belong to partitions with changed data helps tremendously with the performance of statistics maintenance.

Source: https://www.sqlshack.com/introducing-sql-server-incremental-statistics-for-partitioned-tables/
