SQL Server Index Performance Tuning Using Built-In Index Utilization Metrics

Description

Indexing is key to efficient query execution. Knowing what indexes are unneeded, incorrectly used, or unused can allow us to reduce disk usage and improve write performance at little cost to your organization.


This is the first part in a series that will culminate in an automated index analysis solution, allowing you to understand index usage proactively, rather than waiting for things to break in order to resolve them.


Introduction

Adding indexes to our important tables is likely a regular part of our performance tuning regimen. When we identify a frequently executed query that is scanning a table or causing an expensive key lookup, one of our first considerations is whether an index can solve the problem.


While indexes can improve query execution speed, the price we pay is in index maintenance. Update and insert operations need to update the index with new data whenever the columns covered by the index are affected. This means that writes will slow down slightly with each index we add to a table. For example, if we were to insert a row into Production.Product (in AdventureWorks), the resulting execution plan for the insert would look like this:


Note the step “Clustered Index Insert” and the note that 4 non-clustered indexes were also inserted into, in addition to the clustered index. Any indexes we add would subsequently add to the write operations necessary to complete the overall operation.

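If you want to verify how many indexes a write against a given table must maintain, sys.indexes shows them directly (a minimal sketch against the same AdventureWorks table; swap in any table of interest):

-- List the indexes that any write against Production.Product must maintain.
SELECT
    indexes.name AS Index_Name,
    indexes.type_desc AS Index_Type
FROM sys.indexes
INNER JOIN sys.tables
ON tables.object_id = indexes.object_id
INNER JOIN sys.schemas
ON schemas.schema_id = tables.schema_id
WHERE schemas.name = 'Production'
AND tables.name = 'Product';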

To offset the frequent need to add or update indexes, we need to monitor index usage and identify when an existing index is no longer needed. This allows us to keep our indexing relevant and trim enough to ensure that we don’t waste disk space and IO on write operations to any unnecessary indexes.


Index Utilization Metrics Collection

The first step towards monitoring and intelligently responding to index usage needs is to collect and maintain a simple and easy-to-use data set of index usage metrics. This data set should allow us to quickly search for common scenarios in which we might consider removing or altering an index:


  1. Unused indexes.
  2. Minimally used indexes.
  3. Indexes that are written to significantly more than they are read.
  4. Indexes that are scanned often, but rarely the target of seeks.
  5. Indexes that are very similar and can be combined.

SQL Server provides a dynamic management view that tracks all index usage: sys.dm_db_index_usage_stats. This view is a cumulative total of operations against indexes and is reset when SQL Server services are restarted. There was also a bug in previous versions of SQL Server in which an index rebuild would trigger a reset to these stats. Details on this bug and its resolution can be found in the references at the end of this article.


Since this data is not maintained by SQL Server indefinitely, we need to create our own storage mechanism to ensure it is persisted through server restarts, allowing us to make smart decisions with a long-term data set.


A SELECT against this view returns the following data on my local server:

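A minimal version of such a query, joining the DMV to sys.indexes for the current database so that the output has readable names, looks like this:

-- Raw index usage stats for the current database, with readable object and index names.
SELECT
    OBJECT_NAME(dm_db_index_usage_stats.object_id) AS Table_Name,
    indexes.name AS Index_Name,
    dm_db_index_usage_stats.user_seeks,
    dm_db_index_usage_stats.user_scans,
    dm_db_index_usage_stats.user_lookups,
    dm_db_index_usage_stats.user_updates,
    dm_db_index_usage_stats.last_user_seek,
    dm_db_index_usage_stats.last_user_scan,
    dm_db_index_usage_stats.last_user_lookup,
    dm_db_index_usage_stats.last_user_update
FROM sys.dm_db_index_usage_stats
INNER JOIN sys.indexes
ON indexes.object_id = dm_db_index_usage_stats.object_id
AND indexes.index_id = dm_db_index_usage_stats.index_id
WHERE dm_db_index_usage_stats.database_id = DB_ID();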

Totals are given for seeks, scans, lookups, and updates, which allows us to accurately gauge the overall read/write operations against any given index. We also can see the last time an index had those operations performed against them, where NULL indicates that none have happened since the last instance restart.


In order to collect this data, we will follow a relatively simple process:


  1. Create a table to store index metrics detail data. This will persist all data from each collection point-in-time and do so for a limited history. This detail can be useful for troubleshooting or seeing the status of an index at a given point-in-time.
  2. Create a table to store aggregate index summary data. This provides a long-term view of how indexes have been used since they were created. We can clear this data at any point-in-time if we decide that we’d like to begin counting these metrics anew.
  3. Create and execute a stored procedure that will populate these tables.

Once created, a collection stored procedure can be run at regular intervals using any automated process, such as SQL Server Agent.

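As a rough sketch of that scheduling (the job, schedule, and database names here are illustrative assumptions, not part of the solution itself), a SQL Server Agent job that runs the collection every six hours could be created like this:

USE msdb;
GO

-- Illustrative job and schedule names; @database_name assumes the utility database that holds the stored procedure.
EXEC dbo.sp_add_job
    @job_name = N'Collect Index Utilization Metrics';

EXEC dbo.sp_add_jobstep
    @job_name = N'Collect Index Utilization Metrics',
    @step_name = N'Run collection stored procedure',
    @subsystem = N'TSQL',
    @command = N'EXEC dbo.Populate_Index_Utilization_Data;',
    @database_name = N'DBAUtility';

EXEC dbo.sp_add_schedule
    @schedule_name = N'Every 6 Hours',
    @freq_type = 4,             -- daily
    @freq_interval = 1,
    @freq_subday_type = 8,      -- repeat on an hourly interval
    @freq_subday_interval = 6;  -- every 6 hours

EXEC dbo.sp_attach_schedule
    @job_name = N'Collect Index Utilization Metrics',
    @schedule_name = N'Every 6 Hours';

EXEC dbo.sp_add_jobserver
    @job_name = N'Collect Index Utilization Metrics';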

Schema Creation

Our first task is to create a holding table for detail data:



CREATE TABLE dbo.Index_Utiliztion_Details
(
    Index_Utiliztion_Metrics_Id INT NOT NULL IDENTITY(1,1) CONSTRAINT PK_Index_Utiliztion_Metrics PRIMARY KEY CLUSTERED,
    Index_Utiliztion_Details_Create_Datetime DATETIME NOT NULL,
    [Database_Name] SYSNAME,
    [Schema_Name] SYSNAME,
    Table_Name SYSNAME,
    Index_Name SYSNAME,
    User_Seek_Count BIGINT,
    User_Scan_Count BIGINT,
    User_Lookup_Count BIGINT,
    User_Update_Count BIGINT,
    Last_User_Seek DATETIME,
    Last_User_Scan DATETIME,
    Last_User_Lookup DATETIME,
    Last_User_Update DATETIME
);

CREATE NONCLUSTERED INDEX IX_Index_Utiliztion_Details_indexUtiliztionDetailsCreateDatetime
    ON dbo.Index_Utiliztion_Details (Index_Utiliztion_Details_Create_Datetime);

This table maintains a list of index metrics for each index inspected and tags it with the creation date/time, allowing us to trend usage over time, if needed. Data types for counts are chosen liberally as an index being hit 2.15 billion times is in no way an unattainable feat!


The second table we need will store summary data, which will contain aggregate stats over a longer period of time:



CREATE TABLE dbo.Index_Utiliztion_Summary
(
    Index_Utiliztion_Summary_Id INT NOT NULL IDENTITY(1,1) CONSTRAINT PK_Index_Utiliztion_Summary PRIMARY KEY CLUSTERED,
    [Database_Name] SYSNAME,
    [Schema_Name] SYSNAME,
    Table_Name SYSNAME,
    Index_Name SYSNAME,
    User_Seek_Count BIGINT,
    User_Scan_Count BIGINT,
    User_Lookup_Count BIGINT,
    User_Update_Count BIGINT,
    Last_User_Seek DATETIME,
    Last_User_Scan DATETIME,
    Last_User_Lookup DATETIME,
    Last_User_Update DATETIME,
    Index_Utiliztion_Summary_Create_Datetime DATETIME NOT NULL,
    Index_Utiliztion_Summary_Last_Update_Datetime DATETIME NOT NULL,
    User_Seek_Count_Last_Update BIGINT,
    User_Scan_Count_Last_Update BIGINT,
    User_Lookup_Count_Last_Update BIGINT,
    User_Update_Count_Last_Update BIGINT
);

We include a create and a last update time, allowing the viewer to know when an index was first tracked and when it was last updated by the process. The four columns at the end track the last value for each aggregated count that was reported by the process. This ensures that when a restart occurs, we know exactly how to handle our aggregation, i.e., whether we should sum values or find the difference between them in order to determine the change since the last update.


Our last step in schema creation is to write a stored procedure that will perform this collection for us. By default, we will check index statistics on all tables in all databases. Indexed views are not included here, but could very easily be added if you had a frequent need to track their use.


This stored procedure will iterate through all non-system databases, read from dm_db_index_usage_stats, join that data to other system views, and report it back to our permanent tables that were created above. This process is quite fast, as the volume of data we are looking at is bounded by the number of indexes you have. If only a single database or set of databases matters to you, then filtering can be performed in order to limit the database list to those of interest.

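For example, if only a handful of databases matter, the INSERT into @Database_List inside the procedure below could be given a whitelist instead of the exclusion list (a sketch only; the database names are placeholders):

-- Sketch: restrict collection to specific databases rather than excluding system databases.
DECLARE @Database_List TABLE
(
    [Database_Name] SYSNAME NOT NULL,
    Is_Processed BIT NOT NULL
);

INSERT INTO @Database_List
    ([Database_Name], Is_Processed)
SELECT
    databases.name AS [Database_Name],
    0 AS Is_Processed
FROM sys.databases
WHERE databases.name IN ('AdventureWorks', 'WideWorldImporters');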


IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'Populate_Index_Utilization_Data')
BEGIN
    DROP PROCEDURE dbo.Populate_Index_Utilization_Data;
END
GO
/*
    This stored procedure is intended to run semi-regularly (every 4-6 hours is likely sufficient) and will populate the tables
    dbo.Index_Utiliztion_Details and dbo.Index_Utiliztion_Summary with index usage data from the appropriate DMVs. This information
    can then be used in researching which indexes are not used, underused, or misused.
*/
CREATE PROCEDURE dbo.Populate_Index_Utilization_Data
    @Retention_Period_for_Detail_Data_Days TINYINT = 30,
    @Truncate_All_Summary_Data BIT = 0
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove old detail data based on the proc parameter. There is little need to save this data long-term.
    DELETE Index_Utiliztion_Details
    FROM dbo.Index_Utiliztion_Details
    WHERE Index_Utiliztion_Details.Index_Utiliztion_Details_Create_Datetime < DATEADD(DAY, -1 * @Retention_Period_for_Detail_Data_Days, CURRENT_TIMESTAMP);

    IF @Truncate_All_Summary_Data = 1
    BEGIN
        TRUNCATE TABLE Index_Utiliztion_Summary;
    END

    DECLARE @Database_List TABLE
    (
        [Database_Name] SYSNAME NOT NULL,
        Is_Processed BIT NOT NULL
    );

    DECLARE @Sql_Command NVARCHAR(MAX);
    DECLARE @Current_Database_Name SYSNAME;

    INSERT INTO @Database_List
        ([Database_Name], Is_Processed)
    SELECT
        databases.name AS [Database_Name],
        0 AS Is_Processed
    FROM sys.databases
    WHERE databases.name NOT IN ('master', 'msdb', 'model', 'tempdb', 'ReportServerTempDB', 'ReportServer');

    CREATE TABLE #Index_Utiliztion_Details
    (
        Index_Utiliztion_Details_Create_Datetime DATETIME NOT NULL,
        [Database_Name] SYSNAME,
        [Schema_Name] SYSNAME,
        Table_Name SYSNAME,
        Index_Name SYSNAME,
        User_Seek_Count BIGINT,
        User_Scan_Count BIGINT,
        User_Lookup_Count BIGINT,
        User_Update_Count BIGINT,
        Last_User_Seek DATETIME,
        Last_User_Scan DATETIME,
        Last_User_Lookup DATETIME,
        Last_User_Update DATETIME
    );

    WHILE EXISTS (SELECT * FROM @Database_List Database_List WHERE Database_List.Is_Processed = 0)
    BEGIN
        SELECT TOP 1
            @Current_Database_Name = Database_List.[Database_Name]
        FROM @Database_List Database_List
        WHERE Database_List.Is_Processed = 0;

        SELECT @Sql_Command = '
            USE [' + @Current_Database_Name + ']

            INSERT INTO #Index_Utiliztion_Details
                (Index_Utiliztion_Details_Create_Datetime, [Database_Name], [Schema_Name], Table_Name, Index_Name, User_Seek_Count,
                 User_Scan_Count, User_Lookup_Count, User_Update_Count, Last_User_Seek, Last_User_Scan, Last_User_Lookup, Last_User_Update)
            SELECT
                CURRENT_TIMESTAMP AS Index_Utiliztion_Details_Create_Datetime,
                ''' + @Current_Database_Name + ''' AS [Database_Name],
                schemas.name AS [Schema_Name],
                tables.name AS Table_Name,
                indexes.name AS Index_Name,
                dm_db_index_usage_stats.user_seeks AS User_Seek_Count,
                dm_db_index_usage_stats.user_scans AS User_Scan_Count,
                dm_db_index_usage_stats.user_lookups AS User_Lookup_Count,
                dm_db_index_usage_stats.user_updates AS User_Update_Count,
                dm_db_index_usage_stats.last_user_seek AS Last_User_Seek,
                dm_db_index_usage_stats.last_user_scan AS Last_User_Scan,
                dm_db_index_usage_stats.last_user_lookup AS Last_User_Lookup,
                dm_db_index_usage_stats.last_user_update AS Last_User_Update
            FROM ' + @Current_Database_Name + '.sys.dm_db_index_usage_stats
            INNER JOIN ' + @Current_Database_Name + '.sys.indexes
            ON indexes.object_id = dm_db_index_usage_stats.object_id
            AND indexes.index_id = dm_db_index_usage_stats.index_id
            INNER JOIN ' + @Current_Database_Name + '.sys.tables
            ON tables.object_id = indexes.object_id
            INNER JOIN ' + @Current_Database_Name + '.sys.schemas
            ON schemas.schema_id = tables.schema_id
            WHERE dm_db_index_usage_stats.database_id = (SELECT DB_ID(''' + @Current_Database_Name + '''));';

        EXEC sp_executesql @Sql_Command;

        UPDATE Database_List
            SET Is_Processed = 1
        FROM @Database_List Database_List
        WHERE [Database_Name] = @Current_Database_Name;
    END

    INSERT INTO dbo.Index_Utiliztion_Details
        (Index_Utiliztion_Details_Create_Datetime, [Database_Name], [Schema_Name], Table_Name, Index_Name, User_Seek_Count,
         User_Scan_Count, User_Lookup_Count, User_Update_Count, Last_User_Seek, Last_User_Scan, Last_User_Lookup, Last_User_Update)
    SELECT
        *
    FROM #Index_Utiliztion_Details;

    MERGE INTO dbo.Index_Utiliztion_Summary AS Utilization_Target
    USING (SELECT * FROM #Index_Utiliztion_Details) AS Utilization_Source
    ON (
            Utilization_Target.[Database_Name] = Utilization_Source.[Database_Name]
        AND Utilization_Target.[Schema_Name] = Utilization_Source.[Schema_Name]
        AND Utilization_Target.Table_Name = Utilization_Source.Table_Name
        AND Utilization_Target.Index_Name = Utilization_Source.Index_Name
       )
    WHEN MATCHED
        THEN UPDATE
        SET User_Seek_Count =
                CASE WHEN Utilization_Source.User_Seek_Count = Utilization_Target.User_Seek_Count_Last_Update
                        THEN Utilization_Target.User_Seek_Count
                     WHEN Utilization_Source.User_Seek_Count >= Utilization_Target.User_Seek_Count
                        THEN Utilization_Source.User_Seek_Count + Utilization_Target.User_Seek_Count - Utilization_Target.User_Seek_Count_Last_Update
                     WHEN Utilization_Source.User_Seek_Count < Utilization_Target.User_Seek_Count
                      AND Utilization_Source.User_Seek_Count < Utilization_Target.User_Seek_Count_Last_Update
                        THEN Utilization_Target.User_Seek_Count + Utilization_Source.User_Seek_Count
                     WHEN Utilization_Source.User_Seek_Count < Utilization_Target.User_Seek_Count
                      AND Utilization_Source.User_Seek_Count > Utilization_Target.User_Seek_Count_Last_Update
                        THEN Utilization_Source.User_Seek_Count + Utilization_Target.User_Seek_Count - Utilization_Target.User_Seek_Count_Last_Update
                END,
            User_Scan_Count =
                CASE WHEN Utilization_Source.User_Scan_Count = Utilization_Target.User_Scan_Count_Last_Update
                        THEN Utilization_Target.User_Scan_Count
                     WHEN Utilization_Source.User_Scan_Count >= Utilization_Target.User_Scan_Count
                        THEN Utilization_Source.User_Scan_Count + Utilization_Target.User_Scan_Count - Utilization_Target.User_Scan_Count_Last_Update
                     WHEN Utilization_Source.User_Scan_Count < Utilization_Target.User_Scan_Count
                      AND Utilization_Source.User_Scan_Count < Utilization_Target.User_Scan_Count_Last_Update
                        THEN Utilization_Target.User_Scan_Count + Utilization_Source.User_Scan_Count
                     WHEN Utilization_Source.User_Scan_Count < Utilization_Target.User_Scan_Count
                      AND Utilization_Source.User_Scan_Count > Utilization_Target.User_Scan_Count_Last_Update
                        THEN Utilization_Source.User_Scan_Count + Utilization_Target.User_Scan_Count - Utilization_Target.User_Scan_Count_Last_Update
                END,
            User_Lookup_Count =
                CASE WHEN Utilization_Source.User_Lookup_Count = Utilization_Target.User_Lookup_Count_Last_Update
                        THEN Utilization_Target.User_Lookup_Count
                     WHEN Utilization_Source.User_Lookup_Count >= Utilization_Target.User_Lookup_Count
                        THEN Utilization_Source.User_Lookup_Count + Utilization_Target.User_Lookup_Count - Utilization_Target.User_Lookup_Count_Last_Update
                     WHEN Utilization_Source.User_Lookup_Count < Utilization_Target.User_Lookup_Count
                      AND Utilization_Source.User_Lookup_Count < Utilization_Target.User_Lookup_Count_Last_Update
                        THEN Utilization_Target.User_Lookup_Count + Utilization_Source.User_Lookup_Count
                     WHEN Utilization_Source.User_Lookup_Count < Utilization_Target.User_Lookup_Count
                      AND Utilization_Source.User_Lookup_Count > Utilization_Target.User_Lookup_Count_Last_Update
                        THEN Utilization_Source.User_Lookup_Count + Utilization_Target.User_Lookup_Count - Utilization_Target.User_Lookup_Count_Last_Update
                END,
            User_Update_Count =
                CASE WHEN Utilization_Source.User_Update_Count = Utilization_Target.User_Update_Count_Last_Update
                        THEN Utilization_Target.User_Update_Count
                     WHEN Utilization_Source.User_Update_Count >= Utilization_Target.User_Update_Count
                        THEN Utilization_Source.User_Update_Count + Utilization_Target.User_Update_Count - Utilization_Target.User_Update_Count_Last_Update
                     WHEN Utilization_Source.User_Update_Count < Utilization_Target.User_Update_Count
                      AND Utilization_Source.User_Update_Count < Utilization_Target.User_Update_Count_Last_Update
                        THEN Utilization_Target.User_Update_Count + Utilization_Source.User_Update_Count
                     WHEN Utilization_Source.User_Update_Count < Utilization_Target.User_Update_Count
                      AND Utilization_Source.User_Update_Count > Utilization_Target.User_Update_Count_Last_Update
                        THEN Utilization_Source.User_Update_Count + Utilization_Target.User_Update_Count - Utilization_Target.User_Update_Count_Last_Update
                END,
            Last_User_Seek =
                CASE WHEN Utilization_Source.Last_User_Seek IS NULL THEN Utilization_Target.Last_User_Seek
                     WHEN Utilization_Source.Last_User_Seek < Utilization_Target.Last_User_Seek THEN Utilization_Target.Last_User_Seek
                     ELSE Utilization_Source.Last_User_Seek
                END,
            Last_User_Scan =
                CASE WHEN Utilization_Source.Last_User_Scan IS NULL THEN Utilization_Target.Last_User_Scan
                     WHEN Utilization_Source.Last_User_Scan < Utilization_Target.Last_User_Scan THEN Utilization_Target.Last_User_Scan
                     ELSE Utilization_Source.Last_User_Scan
                END,
            Last_User_Lookup =
                CASE WHEN Utilization_Source.Last_User_Lookup IS NULL THEN Utilization_Target.Last_User_Lookup
                     WHEN Utilization_Source.Last_User_Lookup < Utilization_Target.Last_User_Lookup THEN Utilization_Target.Last_User_Lookup
                     ELSE Utilization_Source.Last_User_Lookup
                END,
            Last_User_Update =
                CASE WHEN Utilization_Source.Last_User_Update IS NULL THEN Utilization_Target.Last_User_Update
                     WHEN Utilization_Source.Last_User_Update < Utilization_Target.Last_User_Update THEN Utilization_Target.Last_User_Update
                     ELSE Utilization_Source.Last_User_Update
                END,
            Index_Utiliztion_Summary_Last_Update_Datetime = CURRENT_TIMESTAMP,
            User_Seek_Count_Last_Update = Utilization_Source.User_Seek_Count,
            User_Scan_Count_Last_Update = Utilization_Source.User_Scan_Count,
            User_Lookup_Count_Last_Update = Utilization_Source.User_Lookup_Count,
            User_Update_Count_Last_Update = Utilization_Source.User_Update_Count
    WHEN NOT MATCHED BY TARGET
        THEN INSERT
            ([Database_Name], [Schema_Name], Table_Name, Index_Name, User_Seek_Count, User_Scan_Count, User_Lookup_Count, User_Update_Count, Last_User_Seek,
             Last_User_Scan, Last_User_Lookup, Last_User_Update, Index_Utiliztion_Summary_Create_Datetime, Index_Utiliztion_Summary_Last_Update_Datetime,
             User_Seek_Count_Last_Update, User_Scan_Count_Last_Update, User_Lookup_Count_Last_Update, User_Update_Count_Last_Update)
        VALUES
            (Utilization_Source.[Database_Name],
             Utilization_Source.[Schema_Name],
             Utilization_Source.Table_Name,
             Utilization_Source.Index_Name,
             Utilization_Source.User_Seek_Count,
             Utilization_Source.User_Scan_Count,
             Utilization_Source.User_Lookup_Count,
             Utilization_Source.User_Update_Count,
             Utilization_Source.Last_User_Seek,
             Utilization_Source.Last_User_Scan,
             Utilization_Source.Last_User_Lookup,
             Utilization_Source.Last_User_Update,
             CURRENT_TIMESTAMP,
             CURRENT_TIMESTAMP,
             Utilization_Source.User_Seek_Count,
             Utilization_Source.User_Scan_Count,
             Utilization_Source.User_Lookup_Count,
             Utilization_Source.User_Update_Count);

    DROP TABLE #Index_Utiliztion_Details;
END

The stored procedure above accepts 2 parameters:


@Retention_Period_for_Detail_Data_Days: The number of days for which data in Index_Utilization_Details will be kept.


@Truncate_All_Summary_Data: A flag that indicates whether the summary data should be removed so that aggregation can start anew. This could be useful after significant server or application settings in which you want a fresh measure of activity.


How much data to retain is completely up to your needs and how far back you’ll typically want index usage history. The default is 30 days, but far more can be kept without consuming any significant amount of storage.

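Running the collection is then a single call; for example, keeping 90 days of detail history and leaving the summary data intact:

-- Collect current index usage, keep 90 days of detail, and do not reset the summary data.
EXEC dbo.Populate_Index_Utilization_Data
    @Retention_Period_for_Detail_Data_Days = 90,
    @Truncate_All_Summary_Data = 0;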

The temporary table #Index_Utiliztion_Details is used within the stored procedure in order to stage the current set of index usage data. This allows for us to insert this data into the detail table and merge it into the summary table without needing to access all of the system views a second time. It also allows for our messy MERGE statement to be a bit less messy.


The MERGE statement is complex as it needs to ensure that our new running totals are accurate, regardless of when the server was last restarted and what any current values are. Instead of checking the server restart time and attempting to gauge the appropriate action, we compare the current value for a count to the previous value and the last collected value. This allows us to determine if this counter was reset since our last stored procedure run and accurately determine how to calculate the new value.


For example, if the aggregate count of seeks on an index is 100, the new reading from the DMV is 25, and our previously collected value was 10, then we can determine that we have had 15 seeks since the last reading. It is possible the server restarted and we have had 25 since that last reading, but micromanaging our numbers to that extent is not necessary for a good idea of how our indexes are used. Unless a server is restarted hourly or daily, our metrics will provide enough accuracy to be effective. It also avoids the need to worry about the bug in previous versions that zeroed out this view upon index rebuilds.

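To make that arithmetic concrete, here is the seek example above as a standalone calculation (the variables are illustrative and simplify the full CASE logic in the MERGE):

-- Running-total logic: stored aggregate = 100, previously collected value = 10, new DMV reading = 25.
DECLARE @Aggregate_Seek_Count BIGINT = 100;   -- User_Seek_Count in the summary table
DECLARE @Last_Collected_Value BIGINT = 10;    -- User_Seek_Count_Last_Update
DECLARE @Current_DMV_Value BIGINT = 25;       -- user_seeks from sys.dm_db_index_usage_stats

SELECT CASE
           WHEN @Current_DMV_Value >= @Last_Collected_Value
               THEN @Aggregate_Seek_Count + @Current_DMV_Value - @Last_Collected_Value  -- 100 + (25 - 10) = 115
           ELSE @Aggregate_Seek_Count + @Current_DMV_Value                              -- counter was reset; add the full new reading
       END AS New_Aggregate_Seek_Count;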

Using the Process: What Can We Learn?

How often to execute this process is an important question to consider. Typically, once per day or a few times per day is adequate. If more granularity is desired and you’d like to be able to compare index usage counts at, say, 5pm vs. 8am, then the timings can be customized to meet your needs.


Once the process has been running for a few days, we can look at the summary data and learn about our index use, which will trend back until the last instance restart:


From here, we can see index activity that stretches back to a few days ago, when I last restarted my laptop. To act on this data reliably, we’d want weeks or months of history, but we can already begin to form an idea of which indexes may be unused or underused. One simple check we can perform is to determine any indexes that have no reads against them:



SELECT
    *
FROM dbo.Index_Utiliztion_Summary
WHERE Index_Utiliztion_Summary.User_Seek_Count = 0
AND Index_Utiliztion_Summary.User_Scan_Count = 0
AND Index_Utiliztion_Summary.User_Lookup_Count = 0;

These indexes are the most likely candidates for removal as they have no reads within our aggregated data:


An index that is written to, but never read is essentially useless and can be removed, assuming it is truly unused, which is the important fact that we need to verify. Some indexes may be used infrequently, such as for a quarterly finance report or a yearly executive summary. In addition to ensuring that an index is not used in any commonly executed queries, we need to make sure that it is not needed for less frequent processes.

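Once an index has been verified as genuinely unused, the summary table can even generate the cleanup statements for review (a sketch only: inspect every statement before running it, and remember that clustered indexes and primary key constraints need to be handled via ALTER TABLE rather than DROP INDEX):

-- Generate (but do not execute) DROP INDEX statements for indexes with no recorded reads.
SELECT
    'DROP INDEX [' + Index_Utiliztion_Summary.Index_Name + '] ON ['
        + Index_Utiliztion_Summary.[Database_Name] + '].[' + Index_Utiliztion_Summary.[Schema_Name] + '].['
        + Index_Utiliztion_Summary.Table_Name + '];' AS Drop_Statement
FROM dbo.Index_Utiliztion_Summary
WHERE Index_Utiliztion_Summary.User_Seek_Count = 0
AND Index_Utiliztion_Summary.User_Scan_Count = 0
AND Index_Utiliztion_Summary.User_Lookup_Count = 0;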

Indexes with zero reads are rare and unlikely in a real-world database. More likely, there will be indexes that are written to far more often than read, but are still read. To find indexes that are used, but are inefficient, we can adjust our query above:



SELECT
    Index_Utiliztion_Summary.User_Seek_Count + Index_Utiliztion_Summary.User_Scan_Count + Index_Utiliztion_Summary.User_Lookup_Count AS Total_Reads,
    CAST((Index_Utiliztion_Summary.User_Seek_Count + Index_Utiliztion_Summary.User_Scan_Count + Index_Utiliztion_Summary.User_Lookup_Count) * 100.00 /
         (Index_Utiliztion_Summary.User_Seek_Count + Index_Utiliztion_Summary.User_Scan_Count + Index_Utiliztion_Summary.User_Lookup_Count + Index_Utiliztion_Summary.User_Update_Count) AS DECIMAL(6,3)) AS Percent_Reads,
    *
FROM dbo.Index_Utiliztion_Summary
ORDER BY CAST((Index_Utiliztion_Summary.User_Seek_Count + Index_Utiliztion_Summary.User_Scan_Count + Index_Utiliztion_Summary.User_Lookup_Count) * 100.00 /
              (Index_Utiliztion_Summary.User_Seek_Count + Index_Utiliztion_Summary.User_Scan_Count + Index_Utiliztion_Summary.User_Lookup_Count + Index_Utiliztion_Summary.User_Update_Count) AS DECIMAL(6,3)) ASC;

This returns all indexes ordered by the percentage of reads vs. total operations on each. This allows us to understand which indexes are used most efficiently vs. those which are potentially costing us more than they are worth:


This can be another useful metric when determining how effective an index is as we can gauge read vs. write operations and determine the best course of action, whether an index is unused or not.


One other way to view this data is to compare scan operations vs. seeks. This can allow us to understand if an index is being used frequently for queries that are scanning, rather than seeking, the index. Similarly, we can check the lookup count to see if an index is resulting in bookmark lookups frequently. An index that is scanned heavily may be an indication that a common query that uses it can be optimized further, or that a new index could supplement it. Excessive lookups may indicate queries that would benefit from adding INCLUDE columns to the existing index.



SELECT
    CASE WHEN (Index_Utiliztion_Summary.User_Seek_Count + Index_Utiliztion_Summary.User_Scan_Count + Index_Utiliztion_Summary.User_Lookup_Count) = 0 THEN 0
         ELSE CAST(Index_Utiliztion_Summary.User_Scan_Count * 100.00 /
              (Index_Utiliztion_Summary.User_Seek_Count + Index_Utiliztion_Summary.User_Scan_Count + Index_Utiliztion_Summary.User_Lookup_Count) AS DECIMAL(6,3)) END AS Percent_Scans,
    CASE WHEN (Index_Utiliztion_Summary.User_Seek_Count + Index_Utiliztion_Summary.User_Scan_Count + Index_Utiliztion_Summary.User_Lookup_Count) = 0 THEN 0
         ELSE CAST(Index_Utiliztion_Summary.User_Lookup_Count * 100.00 /
              (Index_Utiliztion_Summary.User_Seek_Count + Index_Utiliztion_Summary.User_Scan_Count + Index_Utiliztion_Summary.User_Lookup_Count) AS DECIMAL(6,3)) END AS Percent_Lookups,
    *
FROM dbo.Index_Utiliztion_Summary
ORDER BY Index_Utiliztion_Summary.User_Lookup_Count + Index_Utiliztion_Summary.User_Scan_Count - Index_Utiliztion_Summary.User_Seek_Count DESC;

This query will return the percentage of all reads that are scans, as well as the percentage that are lookups:


This allows us to see indexes that might benefit from further optimization, or data points that justify the addition of new indexes. Scans on a clustered index indicate that no nonclustered index was able to satisfy a query, and therefore a new index could be a great way to speed those queries up (if needed). Our next article on missing indexes will allow us to collect far more information that can be used to justify the decision to add new indexes when we find one that is potentially missing.

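For example, clustered indexes in the current database that are scanned more than they are sought can be isolated by joining the summary data back to the catalog views (a sketch that assumes the collection tables are reachable from the database being examined):

-- Clustered indexes in the current database that are scanned more often than they are sought.
SELECT
    Index_Utiliztion_Summary.Table_Name,
    Index_Utiliztion_Summary.Index_Name,
    Index_Utiliztion_Summary.User_Scan_Count,
    Index_Utiliztion_Summary.User_Seek_Count
FROM dbo.Index_Utiliztion_Summary
INNER JOIN sys.tables
ON tables.name = Index_Utiliztion_Summary.Table_Name
INNER JOIN sys.schemas
ON schemas.schema_id = tables.schema_id
AND schemas.name = Index_Utiliztion_Summary.[Schema_Name]
INNER JOIN sys.indexes
ON indexes.object_id = tables.object_id
AND indexes.name = Index_Utiliztion_Summary.Index_Name
WHERE Index_Utiliztion_Summary.[Database_Name] = DB_NAME()
AND indexes.type_desc = 'CLUSTERED'
AND Index_Utiliztion_Summary.User_Scan_Count > Index_Utiliztion_Summary.User_Seek_Count
ORDER BY Index_Utiliztion_Summary.User_Scan_Count DESC;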

Not all scans are bad. Occasionally we will have tables that are built to be fully read each time, and such a design is intentional. While rare, this scenario would be one in which we see index usage that indicates the need for indexing changes, but in reality none are required. Further research into these stats would invariably arrive at the same conclusion. A good example of this sort of behavior would be a configuration table that is read in its entirety by an application when it is started, and then not accessed again until the next time it is turned on.


An important consideration is that index usage is indicative of table usage. If all of the indexes on a table (including the clustered index) are never read, then we know that a table is not used often (if at all). Confirming that a table is unused and being able to remove it could be a beneficial way to clean up unused objects and reclaim valuable space. A table that is scanned often with few updates could also be an excellent candidate for compression. In addition, a table that is heavily scanned can be indicative of queries that SELECT *, or are pulling far more columns than they need.

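Along the same lines, rolling the summary data up to the table level highlights tables whose indexes are never read at all (a sketch using the summary table created earlier):

-- Tables where no index (clustered or nonclustered) has any recorded reads.
SELECT
    Index_Utiliztion_Summary.[Database_Name],
    Index_Utiliztion_Summary.[Schema_Name],
    Index_Utiliztion_Summary.Table_Name,
    SUM(Index_Utiliztion_Summary.User_Update_Count) AS Total_Writes
FROM dbo.Index_Utiliztion_Summary
GROUP BY Index_Utiliztion_Summary.[Database_Name], Index_Utiliztion_Summary.[Schema_Name], Index_Utiliztion_Summary.Table_Name
HAVING SUM(Index_Utiliztion_Summary.User_Seek_Count) = 0
AND SUM(Index_Utiliztion_Summary.User_Scan_Count) = 0
AND SUM(Index_Utiliztion_Summary.User_Lookup_Count) = 0;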

In summary, when analyzing index statistics, we can learn quite a bit about our data and how it is used. This knowledge can allow us to make smart decisions about how we maintain our indexes and allow us to not pay the price of maintenance on unneeded objects.


Conclusion

Understanding index utilization allows us to track and learn about how our data is accessed and how effectively our indexes are being used to service our queries.


This data can allow us to figure out what indexes are not needed and can be dropped. In addition, these metrics can help us determine if an index is not used enough to justify its existence, or if it is not being used as effectively as we would expect it to be.


Since indexes are the primary method by which queries access our data, having an effective set of indexes that are used regularly will ensure that our read operations perform adequately and that writes are not hampered by needing to maintain an extensive list of unused indexes.


Next articles in this series:


  • Collecting, aggregating, and analyzing missing SQL Server Index Stats
  • SQL Server reporting – SQL Server Index Utilization Description

Downloads

  • Index Stats Usage

Translated from: https://www.sqlshack.com/sql-server-index-performance-tuning-using-built-in-index-utilization-metrics/
