Performance tuning: Nested and Merge SQL Loops with Execution Plans


In this article, we will explore Nested and Merge SQL Loops in the SQL execution plan from a performance tuning perspective.

Even though reading an execution plan is a technical skill, it is more an art than a science. The main iterator used when joining tables is a loop, and the Nested Loop and the Merge Loop (shown in a plan as Nested Loops and Merge Join) are two of the most common. A plan can even contain a loop without joining tables, when a Seek needs a Lookup to fetch additional columns. Learning to read these loops helps with performance tuning and debugging T-SQL. Once over the hump of reading a plan, going from beginner to intermediate is simple.

The first loop to look at is the Nested SQL Loop. Figure 1 is a Nested Loop produced by the INNER JOIN of the SalesOrderHeader and Customer tables in the AdventureWorks database.

The actual T-SQL is in Code 1 below. The example shows a Clustered Index Seek finding one row in the Customer table for the WHERE clause CustomerID = 11091.

-- Code 1
SELECT cust.CustomerID, soh.SalesOrderID
FROM Sales.Customer cust
INNER JOIN Sales.SalesOrderHeader soh
    ON soh.CustomerID = cust.CustomerID
WHERE cust.CustomerID = 11091

The WHERE clause in the T-SQL filters on the key of the Customer table. The Clustered Index Seek in Figure 1 returns one row for the customer from the PK_Customer_CustomerID index, which is the primary key (and clustered index) of the Sales.Customer table. Figure 2 shows the detailed properties of the Seek iterator. These properties include cost, row counts, descriptions and many others that are helpful. The cost is separated into I/O, CPU, Subtree and Operator.
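To look at these properties yourself, one option is to have SQL Server return the actual plan and the I/O counters along with the query results. The following is a minimal sketch, assuming the sample database is installed under the name AdventureWorks and that the script runs in a client such as SSMS or sqlcmd, where GO is a batch separator.

```sql
-- Assumption: the sample database is named AdventureWorks on this server.
USE AdventureWorks;
GO

SET STATISTICS IO ON;   -- report logical and physical reads per table
SET STATISTICS XML ON;  -- return the actual execution plan as XML
GO

-- The query from Code 1
SELECT cust.CustomerID, soh.SalesOrderID
FROM Sales.Customer AS cust
INNER JOIN Sales.SalesOrderHeader AS soh
    ON soh.CustomerID = cust.CustomerID
WHERE cust.CustomerID = 11091;
GO

SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
GO
```

The plan XML carries the same estimated I/O, CPU and subtree costs that the graphical properties window displays for each iterator.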

The one row from the Customer seek is then passed to the Nested SQL Loop to find the matching data in the joined table SalesOrderHeader. The outer part of the loop processes the rows coming from the Clustered Index Seek. In this case, only one row traverses the outer loop. The inner loop takes each row from the outer loop and looks up the matching rows on the other side of the join.
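The outer/inner relationship can be mirrored in the query text itself. The sketch below rewrites the Code 1 join with CROSS APPLY, which reads logically as "for each outer row, run the inner query". It is only an illustration of the idea: it returns the same rows as the INNER JOIN, and the optimizer remains free to pick any physical join for either form.

```sql
-- Outer input: the single Customer row found by the seek.
SELECT cust.CustomerID, orders.SalesOrderID
FROM Sales.Customer AS cust
CROSS APPLY (
    -- Inner input: evaluated (logically) once per outer row,
    -- using that row's CustomerID as the seek value.
    SELECT soh.SalesOrderID
    FROM Sales.SalesOrderHeader AS soh
    WHERE soh.CustomerID = cust.CustomerID
) AS orders
WHERE cust.CustomerID = 11091;
```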

The SalesOrderHeader table is joined to supply the SalesOrderID column requested in the SELECT statement. The second iterator in Figure 1 is an Index Seek on the index IX_SalesOrderHeader_CustomerID of the SalesOrderHeader table. This non-clustered index seek finds the data related to that one customer. The plan can use the CustomerID index because SalesOrderID is also stored in that index, and it is stored there because SalesOrderID is the clustered index (primary key) of the SalesOrderHeader table. The script for both indexes is in Code 2 below.

-- Code 2
ALTER TABLE [Sales].[SalesOrderHeader]
ADD CONSTRAINT [PK_SalesOrderHeader_SalesOrderID]
    PRIMARY KEY CLUSTERED ([SalesOrderID] ASC);

CREATE NONCLUSTERED INDEX [IX_SalesOrderHeader_CustomerID]
ON [Sales].[SalesOrderHeader] ([CustomerID] ASC);

At first, some people are confused that the SalesOrderID column does not appear in the CREATE statement of the non-clustered index IX_SalesOrderHeader_CustomerID. This is where an understanding of clustered indexes is needed. SQL Server stores the clustered index key column(s) in every non-clustered index on the table so that it can look up the rest of the row. It makes a lot of sense once this is understood. If the table has no clustered index (a heap), the non-clustered index instead stores a row identifier (RID), an internal value that locates the row in its data page, and the plan uses a RID Lookup.
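This can be checked from the catalog views. The sketch below lists the declared columns of the two indexes from Code 2 and then reads SalesOrderID through the non-clustered index alone. The table hint is there only to make the experiment explicit and is not something the original query needs.

```sql
-- Declared columns of both indexes. The catalog shows only what the
-- CREATE statements declared, so the non-clustered index should list
-- CustomerID only, even though it also carries the clustering key.
SELECT i.name      AS index_name,
       i.type_desc AS index_type,
       c.name      AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'Sales.SalesOrderHeader')
  AND i.name IN (N'PK_SalesOrderHeader_SalesOrderID',
                 N'IX_SalesOrderHeader_CustomerID')
ORDER BY i.name, ic.key_ordinal;

-- Reading the clustering key through the non-clustered index.
-- If the key is stored in the index, the plan needs no Key Lookup.
SELECT soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
     WITH (INDEX (IX_SalesOrderHeader_CustomerID))
WHERE soh.CustomerID = 11091;
```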

If the columns AccountNumber and OrderDate are added to the T-SQL SELECT, the plan changes because the non-clustered index used in Figure 1 does not contain these values.

-- Code 3
SELECT cust.CustomerID, soh.SalesOrderID, soh.AccountNumber, soh.OrderDate
FROM Sales.Customer cust
INNER JOIN Sales.SalesOrderHeader soh
    ON soh.CustomerID = cust.CustomerID
WHERE cust.CustomerID = 11091

Figure 3 shows the new plan with an additional Nested Loop that gets the new columns from a lookup on the clustered index of the SalesOrderHeader table. The Index Seek is now paired, through that Nested Loop, with a Key Lookup on the clustered index to fetch the columns missing from the non-clustered index.

Figure 3 shows that for each of the 28 rows found in index IX_SalesOrderHeader_CustomerID, a row is retrieved from the clustered index PK_SalesOrderHeader_SalesOrderID with a Key Lookup iterator. If the Key Lookup on the clustered index costs too much, a covering index can be created to improve performance. Code 4 shows a covering index that would help this query.

-- Code 4
CREATE NONCLUSTERED INDEX [IX_SalesOrderHeader_CustomerID_IncludeAcctNumOrderDate]
ON [Sales].[SalesOrderHeader] ([CustomerID] ASC)
INCLUDE (AccountNumber, OrderDate)

This new index now "covers" the query, because it contains the additional columns. Figure 4 shows the new plan, which no longer has the second Nested SQL Loop and uses the index created in Code 4 to get the additional information.
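A simple way to see the difference is to compare logical reads with each index in turn. The sketch below assumes the index from Code 4 has already been created, and it simplifies the query to the SalesOrderHeader side of the join so that an index hint can be applied. The hints are for experimentation only.

```sql
SET STATISTICS IO ON;

-- Original index: the plan should need a Key Lookup
-- for AccountNumber and OrderDate.
SELECT soh.SalesOrderID, soh.AccountNumber, soh.OrderDate
FROM Sales.SalesOrderHeader AS soh
     WITH (INDEX (IX_SalesOrderHeader_CustomerID))
WHERE soh.CustomerID = 11091;

-- Covering index from Code 4: all three columns come from the index.
SELECT soh.SalesOrderID, soh.AccountNumber, soh.OrderDate
FROM Sales.SalesOrderHeader AS soh
     WITH (INDEX (IX_SalesOrderHeader_CustomerID_IncludeAcctNumOrderDate))
WHERE soh.CustomerID = 11091;

SET STATISTICS IO OFF;
```

The Messages output reports the logical reads for each statement, which gives a concrete number to weigh against the extra storage and write cost of the wider index.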

NOTE: If this query changes, for example by adding more columns to the SELECT, the plan might change back to the one in Figure 3. Always monitor the usage of indexes on a database.
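One way to monitor index usage is the sys.dm_db_index_usage_stats dynamic management view. The following is a minimal sketch for the SalesOrderHeader table. Keep in mind that these counters are reset when the instance restarts, and that reading the view typically requires the VIEW SERVER STATE permission.

```sql
-- Seeks, scans, lookups and updates per index on Sales.SalesOrderHeader.
-- The LEFT JOIN keeps indexes that have not been used since the last
-- restart; their counters come back as NULL.
SELECT i.name AS index_name,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates,
       us.last_user_seek
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON us.object_id   = i.object_id
   AND us.index_id    = i.index_id
   AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'Sales.SalesOrderHeader')
ORDER BY i.name;
```

An index with many user_updates but few seeks or scans costs more to maintain than it returns, while a high user_lookups count on the clustered index can point to queries that a covering index might help.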

The Merge Loop is simpler than a Nested Loop. Both input streams must be sorted on the join columns; if an input is not already in that order, the plan adds a Sort before the join. The merge then works like a zipper: the two streams are read in order, and wherever their join values match, the rows are joined together.

-- Code 5
SELECT P.Name, total_qty = SUM(I.Quantity)
FROM Production.Product P
JOIN Production.ProductInventory I
    ON I.ProductID = P.ProductID
GROUP BY P.Name

The T-SQL in Code 5 joins the Product table with the ProductInventory table. The join column is ProductID, and each table has a clustered or non-clustered index that can return its rows in that order: ProductID is the first key column of those indexes. Figure 5 shows a Merge Join in the execution plan.
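To compare the two loop types on the same query, the join algorithm can be forced with a query hint. This is a sketch for experimentation only; in normal use the optimizer's own choice is usually the better starting point.

```sql
-- Force a Merge Join for the Code 5 query.
SELECT P.Name, total_qty = SUM(I.Quantity)
FROM Production.Product AS P
JOIN Production.ProductInventory AS I
    ON I.ProductID = P.ProductID
GROUP BY P.Name
OPTION (MERGE JOIN);

-- Force Nested Loops for the same query.
SELECT P.Name, total_qty = SUM(I.Quantity)
FROM Production.Product AS P
JOIN Production.ProductInventory AS I
    ON I.ProductID = P.ProductID
GROUP BY P.Name
OPTION (LOOP JOIN);
```

Running both with the actual execution plan enabled shows how the join iterator, any added Sort, and the relative costs differ between the two forms.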

The outer part of the SQL loop goes through the ProductID values of the rows from ProductInventory and joins them with rows from the Product table. ProductID is unique in Product because it is that table's primary key. The query needs the Product table to satisfy the GROUP BY on Name: the ProductInventory table does not have the product Name column, but it is needed for the SUM of Quantity.

The join between the two tables is made on ProductID, as the plan indicates. Watch for a Sort iterator in this kind of execution plan. Sorts are costly, and it might be wiser to avoid a GROUP BY or DISTINCT that forces one.

-- Code 6: add some columns
SELECT P.Name, I.LocationID, total_qty = SUM(I.Quantity)
FROM Production.Product P
JOIN Production.ProductInventory I
    ON I.ProductID = P.ProductID
GROUP BY P.Name, I.LocationID

The T-SQL in Code 6 adds the column LocationID to the SELECT and the GROUP BY. The resulting changes to the actual execution plan are shown in Figure 7.

The new plan adds a Sort for the combination of product Name and LocationID, but it still uses the Merge Join. The costs in Figure 7 show that the Sort is 47% of the query. The Stream Aggregate has moved to the left of the Merge Join because of the added LocationID column. If the Sort is too costly, returning to the original T-SQL will produce the first plan again, as in Figure 8.
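If LocationID has to stay in the result, one alternative worth testing is to aggregate ProductInventory by its own key columns before joining to Product. The sketch below assumes that Product.Name is unique per ProductID (verify this in your copy of the database), so that grouping by ProductID gives the same groups as grouping by Name. Whether the Sort actually disappears depends on the optimizer and the available indexes, so compare the plans rather than taking it for granted.

```sql
-- Aggregate first, on columns ProductInventory can supply in order,
-- then join to Product only to pick up the Name.
SELECT P.Name, I.LocationID, I.total_qty
FROM Production.Product AS P
JOIN (
    SELECT ProductID,
           LocationID,
           total_qty = SUM(Quantity)
    FROM Production.ProductInventory
    GROUP BY ProductID, LocationID
) AS I
    ON I.ProductID = P.ProductID;
```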

Seemingly simple additions can change a plan. Sometimes the change is to the SQL loop, and sometimes it is to another part of the plan, like the new Sort iterator added for the new column. Knowing how to read a plan helps diagnose when a covering index can help, or when adding a column introduces a costly iterator. Even when a costly iterator is added, the cost might not be high enough to be a concern. Usually the end user will notify IT if something is running too slowly.


Translated from: https://www.sqlshack.com/performance-tuning-nested-and-merge-loops-with-execution-plans/
