并行循环和嵌套循环_并行嵌套循环连接–嵌套循环连接和残留谓词的内侧

并行循环和嵌套循环

This article is the second part of the Nested Loop Join Series. In the first part, Introduction of Nested Loop Join, we gave a brief introduction to Native, Indexed and Temporary Index Nested Loop Joins along with CPU cost details.

本文是嵌套循环连接系列的第二部分。在第一部分“嵌套循环连接简介”中，我们简要介绍了本机，索引和临时索引嵌套循环连接以及CPU成本详细信息。

The second part shows how parallelism is implemented in the Nested Loop Join, a brief introduction about a few Outer Row Optimizations and interesting concepts about the inner side of the Nested Loop Join.

第二部分展示了如何在嵌套循环连接中实现并行性，简要介绍了一些外部行优化，以及有关嵌套循环连接内侧的有趣概念。

并行嵌套循环连接 (Parallel Nested Loop Join)

In this section we will concentrate on parallel processing in the Nested Loop Join. Before we start this article, I would like to explain a bit about the Parallel Page Supplier, because it helps in understanding the Nested Loop Join as a whole.

在本节中，我们将集中讨论嵌套循环连接中的并行处理。在开始本文之前，我想先解释一下并行页面供应商，因为它有助于整体理解嵌套循环联接。

Whenever parallel scan/seek is initiated on the table, a storage engine feature called the Parallel Page Supplier performs its task of distributing a set of pages to the available threads on the basis of the demand based schema.

每当在表上启动并行扫描/搜索时，称为“并行页面供应商”的存储引擎功能便会根据基于需求的架构执行将页面集分配给可用线程的任务。

When a thread finishes with its assigned set of pages, it requests the next set of pages from the parallel page supplier. The demand based schema balances the load efficiently, if one thread runs slower than the other threads for any reason. For example, if a thread is busy doing some other processing, then that thread simply requests fewer pages while the other threads pick up the extra work.

当线程完成其分配的页面集时，它将向并行页面供应商请求下一组页面。如果一个线程由于某种原因运行速度慢于其他线程，则基于需求的架构可以有效地平衡负载。例如，如果某个线程正忙于执行其他一些处理，则该线程仅请求较少的页面，而其他线程则承担了额外的工作。

For a better understanding about Parallel Nested Loop Join, we will insert more rows to the DBO.T1 table and create a new NL_Parallel table. The script for this can be found below.

为了更好地了解并行嵌套循环联接，我们将在DBO.T1表中插入更多行，并创建一个新的NL_Parallel表。可以在下面找到该脚本。


Insert into [DBO].[T1]
Select Primarykey +100000 ,Primarykey +100000 , Primarykey +100000 , SomeData   from [DBO].[T1]Create Table NL_Parallel
(PrimaryKey INT NOT NULL CONSTRAINT PK_PrimaryKey PRIMARY KEY , KeyCol varchar(50) , SearchCol int , SomeData char(8000) )Insert INTO NL_Parallel
SELECT
*  FROM T1
WHERE Primarykey < 50001
GO

In the below query, we are controlling the query to choose the specific execution plan by using Option clause, so that we can get the identical execution plan. But if in case you are unable to get it then please run the script of statistics copy of the above tables on your system from below the article.

在下面的查询中，我们通过使用Option子句控制查询以选择特定的执行计划，以便我们获得相同的执行计划。但是，如果万一无法获得，请从文章下面在您的系统上运行上述表格的统计副本脚本。


SELECT
*
FROM NL_Parallel    OT1
JOIN T1    ot2
ON ot1.Keycol = ot2.Keycol
WHERE OT1.SearchCol IN (101,102,103,104,105,106,107,108,109,110, 111,112)OPTION (LOOP JOIN -- Choose only nested loop join as physical join type,RECOMPILE  -- Compile and create execution plan again,Querytraceon 8649 -- Undocumented: Force query optimizer to choose a parallel plan,Force Order -- to match the logical expressed query to physical query plan,MaxDop 4    -- Run with 4 logical processors)

Let us first look at the estimated execution plan.

让我们首先看一下估计的执行计划。

意外的表假脱机（如果您想严格了解嵌套循环联接中的并行执行计划，可以跳过此小节） (The unexpected Table Spool (if you want to learn strictly about the parallel execution plan in the nested loop join you can skip this subsection))

The major features of this execution plan is the Table Spool (Lazy spool, not to be confused with the index spool). A Table Spool (Lazy spool) is a part of the optimization process. The Query Processor creates a hidden temporary table in the Tempdb database, which inserts the rows in the Table Spool only when requested by the parent iterator, and it is a non blocking operator. But is it required here?

该执行计划的主要功能是表假脱机（惰性假脱机，不要与索引假脱机混淆）。表假脱机（惰性假脱机）是优化过程的一部分。查询处理器在Tempdb数据库中创建一个隐藏的临时表，该表仅在父迭代器请求时才在表假脱机中插入行，并且它是非阻塞运算符。但这是必需的吗？

To give a non-theoretical answer, we will execute two versions of the query one with Table Spool and another without it and we will investigate the results.

为了给出非理论性的答案，我们将执行两个版本的查询，一个使用Table Spool，另一个不使用它，我们将调查结果。


SET  STATISTICS TIME,IO ONSELECT
*
FROM NL_Parallel    OT1
JOIN T1    ot2
ON ot1.Keycol = ot2.Keycol
WHERE OT1.SearchCol IN (101,102,103,104,105,106,107,108,109,110, 111,112)OPTION (LOOP JOIN -- Choose only nested loop join as physical join type,RECOMPILE  -- Compile and create execution plan again,Querytraceon 8649 -- Force query optimizer to choose a parallel plan,Force Order -- to match the logical expressed query to physical query plan,MaxDop 4    -- Run with 4 logical processors)Print '2nd Query started'
SELECT
*
FROM NL_Parallel    OT1
JOIN T1    ot2
ON ot1.Keycol = ot2.Keycol
WHERE OT1.SearchCol IN (101,102,103,104,105,106,107,108,109,110, 111,112)OPTION (LOOP JOIN -- Choose only nested loop join as physical join type,RECOMPILE  -- Compile and create execution plan again,Querytraceon 8649 -- Force query optimizer to choose a parallel plan,Querytraceon 8690 --disable spooling in query plan,Force Order -- to match the logical expressed query to physical query plan,MaxDop 4    -- Run with 4 logical processors)

In the second query we used trace flag 8690 which is an undocumented trace flag and is not recommended at all in the production server. Statistics IO and Statistics Time information is also gathered for better understanding of the execution related metrics.

在第二个查询中，我们使用了跟踪标志8690 ，这是一个未记录的跟踪标志，在生产服务器中根本不建议使用。还收集统计信息IO和统计时间信息，以更好地了解与执行相关的指标。

Estimates of the query are pretty accurate, and the subtree cost of the top and bottom query is 44.7569 and 46.8445 respectively. If observed closely we can see that the difference came because of inner side of nested loop join.

该查询的估计非常准确，并且顶部和底部查询的子树成本分别为44.7569和46.8445。如果仔细观察，我们可以看到差异是由于嵌套循环连接的内侧造成的。

On the top query, Table Spool’s subtree cost is estimated 1.53917 units.
At bottom of the execution plan, on the inner side of nested loop join it is 3.62677.

在最上面的查询中，表假脱机的子树成本估计为1.53917单位。
在执行计划的底部，嵌套循环联接的内部是3.62677。

Now let us look at the message tab, for top query, there are two Worktables (Worktables are internal tables that hold intermediate results). One of them represents the Table Spool directly, on my system for the top query, 546866 additional logical reads are reported and it executes in 1630 ms, whereas for the bottom query there is no additional IO for the worktable and the query is executed in 551 ms.

现在让我们看看在消息选项卡，热门查询，有两个的工作表（工作表被持有中间结果的内部表）。其中之一直接代表表假脱机，在我的系统上针对最上面的查询，报告了546866个其他逻辑读取 ，并且在1630 ms内执行，而对于最下面的查询，该工作表没有其他IO，并且在551中执行了查询。毫秒

In my opinion, for the above scenario, there is no reason to have a Table Spool. The Table Spool is just additional overhead, since there is no filter predicate for the inner table DBO.T1. If there was any filter, DBO.T1 would have passed only some pages to Table Spool and further execution would have occurred from the newly created Table Spool. But in the above execution plan, it is passing the whole table to Table Spool, which would lead some additional IOs, TempDB Overhead and some other Table Spool related overheads.

我认为，对于上述情况，没有理由使用表假脱机。表假脱机只是额外的开销，因为内部表DBO.T1没有过滤谓词。如果有任何筛选器，则DBO.T1只能将某些页面传递给Table Spool，而新创建的Table Spool可能会进一步执行。但是在上述执行计划中，它将整个表传递给Table Spool，这将导致一些额外的IO，TempDB Overhead和其他与Table Spool相关的开销。

SQL Server uses cost based optimization to choose the final execution plan for the query, and in the optimizer reasoning, using a Table Spool would be slightly better than non-Table Spool version of the query. The Query Optimizer uses some assumptions which influence the cost of the execution plan. One of the major assumption which influence the plan choice is that every query starts with cold cache, also known as cold cache assumptions. So, it is possible that in the last execution cache was warm, hence try to execute it this time with the cold cache.

SQL Server使用基于成本的优化来选择查询的最终执行计划，并且在优化器推理中，使用表假脱机会比查询的非表假脱机版本稍好。查询优化器使用一些会影响执行计划成本的假设。影响计划选择的主要假设之一是每个查询均以冷缓存开始，也称为冷缓存假设。因此，有可能在上次执行缓存时预热了，因此这次尝试使用冷缓存来执行它。

冷速测试 (COLD CACHE TEST)

In this test we will run both the above queries, but this time with cold cache. To clean the buffer cache we will use the DBCC DropCleanBuffers command, which is not recommended on the production server.

在此测试中，我们将运行上述两个查询，但是这次使用冷缓存。为了清理缓冲区缓存，我们将使用DBCC DropCleanBuffers命令，这在生产服务器上不建议使用。


CHECKPOINT;
Dbcc Dropcleanbuffers;SELECT
*
FROM NL_Parallel    OT1
JOIN T1    ot2
ON ot1.Keycol = ot2.Keycol
WHERE OT1.SearchCol IN (101,102,103,104,105,106,107,108,109,110, 111,112)OPTION (LOOP JOIN -- Choose only nested loop join as physical join type,RECOMPILE  -- Compile and create execution plan again,Querytraceon 8649 -- Force query optimizer to choose a parallel plan,Querytraceon 8690 --disable spooling in query plan,Force Order -- to match the logical expressed query to physical query plan,MaxDop 4    -- Run with 4 logical processors)Print '2nd Query started'CHECKPOINT;Dbcc Dropcleanbuffers;SELECT
*
FROM NL_Parallel    OT1
JOIN T1    ot2
ON ot1.Keycol = ot2.Keycol
WHERE OT1.SearchCol IN (101,102,103,104,105,106,107,108,109,110, 111,112)OPTION (LOOP JOIN -- Choose only nested loop join as physical join type,RECOMPILE  -- Compile and create execution plan again,Querytraceon 8649 -- Force query optimizer to choose a parallel plan,Force Order -- to match the logical expressed query to physical query plan,MaxDop 4    -- Run with 4 logical processors)

To conclude the test, I ran both the scripts 10 times. The Top query which is without a spool ran in 46.86 seconds while the bottom query which has a Table Spool ran in 47.01 seconds. However, we should not trust the close numbers because when the query is running, the machine’s state matters. Even in an idle system, some background processes run, which consumes resources.

为了结束测试，我两次运行了两个脚本。没有假脱机的顶部查询在46.86秒内运行，而有表假脱机的底部查询在47.01秒内运行。但是，我们不应该信任结束号，因为当查询运行时，计算机的状态很重要。即使在空闲系统中，也会运行一些后台进程，这会消耗资源。

So, this justifies the execution plan costs to a limit, but still there is some additional Tempdb consumption, which is related to the Table Spool.

因此，这可以证明执行计划的成本是合理的，但是仍然有一些额外的Tempdb消耗，这与Table Spool有关。

However, this topic was not the main thrust of this article so, now, we will segue back to a discussion of parallel nested loop joins.

但是，该主题不是本文的重点，因此，现在，我们将回到对并行嵌套循环联接的讨论。

并行嵌套循环连接 (Parallel Nested Loop Join)

To remove complexity from the execution plan, we are removing the Table Spool using the trace flag 8690. Again, remember that this is an undocumented trace flag and is not recommended for use on a production server.

为了消除执行计划的复杂性，我们使用跟踪标志8690除去了表假脱机。同样，请记住，这是一个未记录的跟踪标志，不建议在生产服务器上使用。


SELECT
*
FROM NL_Parallel   OT1
JOIN T1    ot2
ON ot1.Keycol = ot2.Keycol
WHERE OT1.SearchCol IN (101,102,103,104,105,106,107,108,109,110, 111,112)OPTION (LOOP JOIN -- Choose only nested loop join as physical join type,RECOMPILE  -- Compile and create execution plan again,Querytraceon 8649 -- Force query optimizer to choose a parallel plan,Querytraceon 8690 --disable spooling in query plan,Force Order -- to match the logical expressed query to physical query plan,MaxDop 4    -- Run with 4 logical processors)

Have a look at the above query plan’s outer table. There exists a parallel scan, thus parallel page supplier (parallel page supplier is not a part of the relational engine so it cannot be seen in the execution plan) distributes a bunch of rows to all the threads. Then the filter is applied on all the active threads. All the rows that satisfy the filter predicate are in the thread 1 (Depends on the timings and data structure of the table, if you keep running the query you might see the rows in different threads).

看一下上面查询计划的外部表。存在并行扫描，因此并行页面提供者（并行页面提供者不是关系引擎的一部分，因此在执行计划中无法看到）将一堆行分配给所有线程。然后，将过滤器应用于所有活动线程。满足筛选谓词的所有行都在线程1中（取决于表的时间和数据结构，如果继续运行查询，可能会在不同的线程中看到行）。

The inner side of the parallel Nested Loop Join executes as per the outer side of the thread’s execution. To have a better understanding we have elaborated the inner side of an execution plan.

并行嵌套循环连接的内侧按照线程执行的外侧执行。为了更好地理解，我们详细说明了执行计划的内在方面。

As we can see in the outer table, only thread 1 has all the filtered rows, other threads do not have any rows at all. Thus, on the inner side of the loop join only thread 1 will take part in this activity while the rest of the threads won’t able to take part in this.

正如我们在外部表中看到的那样，只有线程1具有所有已过滤的行，其他线程根本没有任何行。因此，在循环连接的内侧，只有线程1会参与此活动，而其余线程则无法参与。

Let us change the constant literal values for the above query and see the execution plan.

让我们更改以上查询的常量文字值，并查看执行计划。


SELECT
*
FROM NL_Parallel    OT1
JOIN T1    ot2
ON ot1.Keycol = ot2.Keycol
WHERE OT1.SearchCol IN (101,102,103,504,505,506 ,1007,1008 ,1009,2010 , 2011,2012)OPTION (LOOP JOIN -- Choose only nested loop join as physical join type,RECOMPILE  -- Compile and create execution plan again,Querytraceon 8649 -- Force query optimizer to choose a parallel plan,Querytraceon 8690 --Disable spooling in query plan,Force Order -- To match the logical expressed query to physical query plan,MaxDop 4    -- Run with 4 logical processors)

Now look at the above execution plan on the NL_Parallel table. All four threads have equal amount of rows and, because of that, the inner side of execution processed is parallel. This execution plan is slightly better than the above execution plan, as it uses the Parallelism evenly in the inner table.

现在，在NL_Parallel表上查看上面的执行计划。这四个线程的行数相等，因此，执行的内部是并行的。该执行计划比上面的执行计划稍好，因为它在内部表中均匀使用了Parallelism。

Prior to SQL Server 2008, the Nested Loop Join has a problem in processing the inner side, in a similar way as it brought rows from the outer table scan/seek. Usually this is not a problem when a lot of rows are encountered in the outer table, but it is not optimal in situations where when the rows are uneven in the threads. But now, in SQL Server 2008 and later versions, the Query Optimizer can add the exchange iterator to distribute outer tables’s rows evenly, so that the inner side can take full advantage of the parallelism. This process is called Few Outer Rows Optimization.

在SQL Server 2008之前，嵌套循环联接在处理内部方面存在问题，其处理方式类似于从外部表扫描/查找中获取行。通常，在外部表中遇到很多行时，这不是问题，但是在线程中行不均匀的情况下，这并不是最佳选择。但是现在，在SQL Server 2008和更高版本中，查询优化器可以添加交换迭代器以均匀分布外部表的行，以便内部可以充分利用并行性。此过程称为“ 很少的外排优化” 。

Alert readers might be thinking that if a Few Outer Rows Optimization was available for SQL Server 2008 and above versions, then why this was not observed on the above test machine.

警报读者可能会认为，如果SQL Server 2008及更高版本可以使用“很少的外部行优化”，那么为什么在上述测试机上却没有看到这种优化。

There are actually some cases where Few Outer Rows Optimization does not start and the above case was one such example. NL_Parallel table has LOB data type (somedata char(8000)), therefore let us change the data size and then examine the query plan again.

实际上，在某些情况下，很少有外部行优化未启动，而上述情况就是这样的一个例子。 NL_Parallel表具有LOB数据类型（somedata char（8000）），因此让我们更改数据大小，然后再次检查查询计划。


ALTER TABLE NL_Parallel ALTER COLUMN SomeData char(20)
ALTER TABLE NL_Parallel REBUILD
--------------------------------------------------------------------------------------------
SELECT
*
FROM NL_Parallel   OT1
JOIN T1    ot2
ON ot1.Keycol = ot2.Keycol
WHERE OT1.SearchCol IN (101,102,103,104,105,106,107,108,109,110, 111,112)OPTION (LOOP JOIN -- Choose only nested loop join as physical join type,RECOMPILE  -- Compile and create execution plan again,Querytraceon 8649 -- Force query optimizer to choose a parallel plan,Querytraceon 8690 --disable spooling in query plan,Force Order -- to match the logical expressed query to physical query plan,MaxDop 4    -- Run with 4 logical processors)

In the above execution plan, at the outer input, it is the additional exchange iterator “round robin” type, which distributes all the rows evenly to all the available threads and because of that all the threads run evenly at the inner side as well. The exchange iterator is not a part of optimization, it is actually post optimization rewrite.

在上面的执行计划中，在外部输入处是附加的交换迭代器“循环”类型，该类型将所有行均匀地分配给所有可用线程，并且因为所有线程也均匀地在内侧运行。交换迭代器不是优化的一部分，它实际上是优化后的重写。

估算值与内侧的实际行数 (Estimation VS Actual Row count At the Inner Side )

Whenever we tune a long running query, we check the execution plan quality by looking at the different aspects of the execution plan, one of them is the estimated and actual row count difference.

每当我们调整长时间运行的查询时，我们都会通过查看执行计划的不同方面来检查执行计划的质量，其中之一是估计行数与实际行数之差。

If the difference of actual and estimated row count is huge and if it degrades the quality of the plan, we try to optimize that particular portion of the query plan.

如果实际行数和估计行数之差巨大，并且如果它降低了计划的质量，我们将尝试优化查询计划的特定部分。

But usually the Nested Loop Join’s inner side is an exception, even if the histogram represents the data distribution accurately, the difference between estimated and the actual row count is still observed. To demonstrate that we are using two tables DBO.T1 and DBO.T2 from this series:

但是通常，嵌套循环连接的内侧是个例外，即使直方图准确地表示了数据分布，仍会观察到估计行数与实际行数之间的差异。为了演示我们正在使用该系列中的两个表DBO.T1和DBO.T2：


SELECT
*
FROM DBO.T1
JOIN DBO.T2
ON T1.Primarykey = T2.Primarykey
WHERE T1.Primarykey > 0
AND T1.Primarykey < 35
ORDER BY T1.Primarykey
OPTION (RECOMPILE)

In the above execution plan, the difference of actual and estimated row count at the outer table should be observed. It is found to be perfect. Now take a look into the inner table’s estimation, and we can see the difference. On the inner side the estimated number of rows is 1 and the actual number of rows is 34.

在上述执行计划中，应注意外部表上实际行数与估计行数的差。发现它是完美的。现在看一下内表的估计，我们可以看到差异。在内侧，估计的行数为1，而实际的行数为34。

Let us try to debug the inner side, and analyze the origin of this difference? If you look at the query, the JOIN condition and WHERE predicate, both are on the primary key column. Look at the inner side on the DBO.T2 table, the same predicate is also applied, but this is not specified in the query.

让我们尝试调试内侧，并分析这种差异的根源吗？如果查看查询，则JOIN条件和WHERE谓词都在主键列上。看一下DBO.T2表的内侧，也应用了相同的谓词，但是在查询中未指定。

Actually, it is a part of the optimization process called Simplification, which we will look in detail in the next section of the article. The question is, why there is the difference of row count on the inner side of query. To find out the estimated number of rows, let us break the above query and apply the predicate directly to DBO.T2 table.

实际上，这是名为“简化”的优化过程的一部分，我们将在本文的下一部分中对其进行详细介绍。问题是，为什么查询内侧的行数有所不同。为了找出估计的行数，让我们打破上面的查询，并将谓词直接应用于DBO.T2表。


SELECT * FROM DBO.T2
WHERE Primarykey > 0
AND Primarykey < 35

There is no difference between estimated and actual row count values here. So what is the problem with the inner side of nested loop join, why is it misestimated the number of rows, is it buggy?

此处估计行数与实际行数之间没有区别。那么嵌套循环联接的内侧是什么问题，为什么它会错误地估计行数，它是否有错误？

No, we don’t need to worry about that because for Nested Loop Join’s inner side is calculated as per execution, irrespective of outer table’s rows. Actually, the term used “Estimated number of rows” is misleading here, it should have “Estimated number of rows per execution”.

不，我们不必担心，因为对于嵌套循环联接，内侧是根据执行计算的，而与外侧表的行无关。实际上，这里使用的术语“估计的行数”在这里是误导的，它应该具有“每次执行的估计的行数”。

There is a connect item filed by Hugo Kornelis for this issue, which can be found here.

Hugo Kornelis针对此问题提交了一个连接项，可以在此处找到。

If the Naive Nested Loop Join is used, then the estimated rows of the inner side would be the table’s cardinality (total estimated number of rows in the table) and if the Index Nested Loop Join is applied and the inner side of index is marked as a unique index (Joining Column(s)), then estimated rows would be 1. But if the inner side of the loop join index is not unique then estimated depends on the cardinality of the joining output, unless row goal applied.

如果使用朴素的嵌套循环联接，则内侧的估计行将是表的基数（表中估计的总行数），并且如果应用了索引嵌套循环联接并且索引的内侧被标记为唯一索引（联接列），则估算的行将为1。但是，如果循环联接索引的内侧不是唯一的，则估算将取决于联接输出的基数，除非应用行目标。

It can be calculated as:

可以计算为：

Estimated Output Rows / Inner Table’s Estimated Number of Execution = Inner Table’s Estimated Rows.

估计的输出行数/内部表的估计执行数=内部表的估计的行数。

不必要的残留谓词（仅当JOIN和WHERE列相同时） (Unnecessary Residual Predicates (Only when the JOIN and WHERE column(s) are same))

In this section of the article we see an unusual case, which only occurs with a Nested Loop Join, when the Join and the Where predicates are on the same column(s). We simply use the above query, and explore the case. The query is provided below with execution plan:

在本文的此部分中，我们看到一个不寻常的情况，只有在连接和Where谓词在同一列上时，才发生在嵌套循环连接中。我们只使用上面的查询，并探讨这种情况。下面为查询提供执行计划：


SELECT
*
FROM DBO.T1
JOIN DBO.T2
ON T1.Primarykey = T2.Primarykey
WHERE T1.Primarykey > 0
AND T1.Primarykey < 35
ORDER BY T1.Primarykey
OPTION (RECOMPILE)

The explanation of the execution is same as above. The part worth noticing is the inner side of Nested Loop Join’s predicate on the DBO.T2 table, which was not a part of the original query.

执行说明与上述相同。值得注意的部分是DBO.T2表上的嵌套循环联接谓词的内侧，它不是原始查询的一部分。

It is actually an undesired residual predicate, which is just an additional overhead. Because after seeking the row in the Inner side of the Indexed Nested Loop Join, an additional filter is applied during the execution, which is unnecessary in prospect of the above execution.

实际上，这是不希望的残留谓词，这只是一个额外的开销。因为在索引下的嵌套循环连接的内侧查找了行之后，在执行期间应用了一个附加的过滤器，因此对于执行上述操作是不必要的。

The residual predicate usually collapses with the child iterator, that is why we can’t see it in the execution plan, but there is an undocumented trace flag 9130 which can be used to see this residual predicate in the query plan. It is not recommended to use this trace flag in production though, as it is undocumented.

残留谓词通常与子迭代器一起折叠，这就是为什么我们在执行计划中看不到它的原因，但是有一个未记录的跟踪标志9130可用于查看查询计划中的残留谓词。但是，不建议在生产中使用此跟踪标志，因为它没有记录。


SELECT
*
FROM DBO.T1
JOIN DBO.T2
ON T1.Primarykey = T2.Primarykey
WHERE T1.Primarykey > 0
AND T1.Primarykey < 35
ORDER BY T1.Primarykey
OPTION ( QueryTraceON 9130 )

As far as I know this additional predicate cannot be avoided and ideally this residual predicate should not be present here. The question then arises that why is this residual predicate present in the above execution plan?

据我所知，这个附加谓词是不可避免的，理想情况下，此残留谓词不应在此处出现。随之产生的问题是，为什么上述执行计划中存在这个剩余谓词？

To understand it, we have to go back to the query optimization stages and look into the process where the query text is translated into the logical tree. To see this process, we will again use some undocumented trace flags 3604 and 8606 which are not recommended to run on the production systems.

要理解它，我们必须返回到查询优化阶段，并研究将查询文本转换为逻辑树的过程。要查看此过程，我们将再次使用一些未记录的跟踪标志3604和8606 ，建议不要在生产系统上运行它们。


SELECT
*
FROM DBO.T1
JOIN DBO.T2
ON T1.Primarykey = T2.Primarykey
WHERE T1.Primarykey > 0
AND T1.Primarykey < 35
Order BY T1.Primarykey
Option (Querytraceon 3604 , QUERYTRACEON 8606);

Look at the output of logical tree, we have shortened the output a little bit for better illustration. It is also called an input or converted tree. It is a text conversion of the above query. There is a lot of information in the input tree, but we will focus to the where condition only.

看一下逻辑树的输出，为了更好地说明，我们将输出缩短了一点。它也称为输入树或转换树。这是上述查询的文本转换。输入树中有很多信息，但是我们仅关注where条件。

You can see, it is clearly stated that x_cmpGt (Compare greater than) is applied to the T1.primarykey Column with the value of 0. In short, it is implied as T1.Primarykey > 0.

可以清楚地看到， x_cmpGt （大于）比较值应用于值为T0.primarykey的T1.primarykey列。总之，它隐含为T1.Primarykey> 0。

The x_cmpLt (Compare less than) is applied to the T1.primarykey Column with the value of 35. In short it implies as T1.Primarykey < 35.

x_cmpLt （小于）比较值应用于T1.primarykey列，值为35。简而言之，它表示为T1.Primarykey <35。

The simplification is the next stage in the query optimizer pipeline. It tries to rewrite the logical operations around to help this transformation in later stages. It is a good help to the optimizers, sometimes even it stops the optimizer from doing any work from here onwards to save resources in later stages.

简化是查询优化器管道中的下一个阶段。它尝试重写逻辑操作，以在以后的阶段中帮助此转换。这对优化器是一个很好的帮助，有时甚至会阻止优化器从此处开始进行任何工作以节省后期资源。

In the future, I will write a full length article to explore simplification in more detail. For now, look at the output after simplification phase:

将来，我将写一篇全长文章来更详细地探讨简化问题。现在，看一下简化阶段之后的输出：

You can clearly see a predicate is applied to the both tables and yes, it is a good decision, because at this stage the Query Optimizer has no idea what physical join will choose after completion of the optimization process. It is a possibility that it might choose the other two physical joins, Merge Join or Hash Match.

您可以清楚地看到两个表都使用了谓词，是的，这是一个不错的决定，因为在此阶段，Query Optimizer尚不知道在优化过程完成后将选择哪种物理联接。它可能会选择其他两个物理联接，即“合并联接”或“哈希匹配”。

If the final selection of join type is Hash or Merge Join, then it is always better to filter the rows as early as possible and pass the least number of rows to the physical join Iterator.

如果联接类型的最终选择是散列联接或合并联接，则总是最好尽早过滤行并将最少的行数传递给物理联接迭代器。

But for the indexed Nested Loop Join there is slight overhead and I assume Microsoft is aware of that situation, but it is ignored as it does not cost much in terms of the execution time.

但是对于索引嵌套循环联接来说，开销很小，我认为微软已经意识到了这种情况，但是由于执行时间不多，因此它被忽略了。

In this article, we discussed some interesting features about the Nested Loop Join like Parallel Nested Loop Join, Few Outer Row Optimization, Inner side of Nested Loop Join and Residual Predicate.

在本文中，我们讨论了有关嵌套循环连接的一些有趣功能，例如并行嵌套循环连接，很少的外行优化，嵌套循环连接的内部和残留谓词。

This series has not ended with this. We will come up shortly with the next article of this series.

这个系列还没有结束。我们很快会提出本系列的下一篇文章。

I am ending this article with a question. Which physical Join type does not require an additional memory during execution? Please answer in the comment box below.

我以一个问题结束本文。哪种物理联接类型在执行期间不需要额外的内存？请在下面的评论框中回答。

Download the copy of the statistics.

下载统计信息的副本。

The previous article in this series

本系列的上一篇文章

Introduction to a Nested Loop Join in SQL ServerSQL Server中的嵌套循环联接简介

有用的链接 (Useful Links)

Parallel processing in SQL Server SQL Server中的并行处理
Parallel Page Supplier 平行页面供应商
Option Clause (Transact-SQL) 选项条款（Transact-SQL）
Table Spool in Execution Plan 执行计划中的表假脱机
Trace flag 8690 跟踪标志8690
Statistics IO and 统计IO和Statistics Time 统计时间
Worktables 工作台
DBCC DropCleanBuffers DBCC DropCleanBuffers
Few Outer Rows Optimization 外排很少优化
Row goal in SQL Server SQL Server中的行目标
Residual predicate In SQL Server SQL Server中的残留谓词
Trace flags 3604 and 跟踪标志3604和8606 8606

翻译自: https://www.sqlshack.com/parallel-nested-loop-joins-the-inner-side-of-nested-loop-joins-and-residual-predicates/

并行循环和嵌套循环