First of all, a quick recap on what a recursive query is.


Recursive queries are useful for building hierarchies, traversing datasets, generating arbitrary rowsets and so on. The recursive part simply means joining a rowset with itself an arbitrary number of times.


A recursive query is defined by an anchor set (the base rowset of the recursion) and a recursive part (the operation that should be done over the previous rowset).


This blog post covers some of the basics of recursive CTEs and explains the approach taken by the SQL Server engine.

The basics

A recursive query helps in a lot of scenarios. One example is a dataset built as a parent-child relationship, where the requirement is to “unfold” the dataset and show the hierarchy in a ragged format.

A recursive CTE has a defined syntax and can be written in general terms like this. Do not run away because of the general syntax; a lot of examples (in real code) will follow:

select result_from_previous.*
from result_from_previous
union all
select result_from_current.*
from set_operation(result_from_previous, mytable) as result_from_current

Or rewritten in another way:


select result_from_previous.*
from result_from_previous
union all
select result_from_current.*
from result_from_previous
join mytable
on condition(result_from_previous)

Another way to write the query (using cross apply):


select result_from_current.*
from result_from_previous
cross apply (
select result_from_previous.*
union all
select *
from mytable
where condition(result_from_previous.*)
) as result_from_current

The last one, with the cross apply, is row-based and a lot slower than the other two. It iterates over every row from the previous result and evaluates the scalar condition (which returns true or false), so each row in mytable is compared against the current row of result_from_previous. When these conditions hold, the query can be rewritten as a join, which is why you should not use cross apply for recursive queries.

The reverse rewrite, from join to cross apply, is not always valid. To see why, we need to look at the algebra of distributivity.

Distributivity algebra

Most of us have already learned that the following identity holds:


X x (Y + Z) = (X x Y) + (X x Z)

But the following is not always true:


X ^ (Y x Z) = (X ^ Z) x (X ^ Y)

In words, distributivity means that the order of the two operations does not matter. The multiplication can be done after the addition, or the addition after the multiplication, and the result is the same either way.

The same rule carries over to relational algebra. It is pretty straightforward:

set_operation(A union all B, C) = set_operation(A, C) union all set_operation(B, C)

The identity above holds in the same way as the first arithmetic identity.


So the union all over the operations is the same as the operations over the union all. This also implies that you cannot use operators like top, distinct or outer join in the recursive part (there are more exceptions). Top over union all does not give the same result as union all over top. Microsoft has done a lot of good thinking in the recursive approach to reach one ultimate goal: forbid operators that do not distribute over union all.
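To make the point about top concrete, here is a minimal sketch that is not part of the original article. A and B are hypothetical inline rowsets built with table value constructors. The first statement applies top over the union all, the second applies union all over top, and the two do not return the same rowset.

```sql
-- top over (A union all B): one row from the combined set
select top (1) ab.n
from (
    select n from (values (1), (2)) as A(n)
    union all
    select n from (values (3), (4)) as B(n)
) as ab
order by ab.n;

-- (top over A) union all (top over B): one row from each side, two rows in total
select topA.n
from (select top (1) n from (values (1), (2)) as A(n) order by n) as topA
union all
select topB.n
from (select top (1) n from (values (3), (4)) as B(n) order by n) as topB;
```

Because the two forms differ, top cannot be pushed through union all the way a plain inner join can.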

With this information and knowledge our baseline for building a recursive CTE is now in place.


The first recursive query

Based on the intro and the above algebra we can now begin to build our first recursive CTE.


Consider a sample rowset (sampletree):
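The original table is not reproduced here, so the script below is a reconstruction under assumptions. The five names and their relationships come from the text that follows; the id values, the column types and the column name `name` are hypothetical. No index or primary key is created, so the table stays a heap.

```sql
-- Hypothetical reconstruction of the sample rowset (ids and types are assumptions)
create table sampletree
(
    id       int         not null,
    name     varchar(50) not null,
    parentId int         null      -- null marks a top-level row
);

insert into sampletree (id, name, parentId)
values (1, 'Ditlev', null),
       (2, 'Claus',  null),
       (3, 'Jane',   1),     -- Jane refers to Ditlev
       (4, 'John',   2),     -- John refers to Claus
       (5, 'Brian',  3);     -- Brian refers to Jane
```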


From above we can see that Brian refers to Jane who refers to Ditlev. And John refers to Claus. This is fairly easy to read from this rowset – but what if the hierarchy is more complex and unreadable?

A sample requirement could be to “unfold” the data into a ragged hierarchy so it is directly readable.


The anchor


We start with the anchor set (Ditlev and Claus). In this dataset the anchor rows are the ones where parentId is null.

This gives us an anchor-query like below:
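A minimal sketch of such an anchor query, assuming the table is called sampletree and has the columns id, name and parentId (only parentId is named in the text; the others are assumptions):

```sql
-- Anchor set: the rows without a parent
select id, name, parentId
from sampletree
where parentId is null;
```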


Now on to the next part.


The recursive


After the anchor part, we are ready to build the recursive part of the query.


The recursive part is actually the same query with small differences. The main select is the same as the anchor part. We need to make a self join in the select statement for the recursive part.

Before we dive more into the total statement – I’ll show the statement below. Then I’ll run through the details.
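The original code listing is not available here, so what follows is a sketch of what the full statement could look like. The CTE name `tree`, the aliases `child` and `parent`, and the column types are assumptions; the columns id, name, parentId and parentName follow the description in the text. A CTE cannot be run on its own, so the final select is included.

```sql
with tree as
(
    -- anchor part: top-level rows, with a blank parentName
    select id, name, parentId, cast('' as varchar(50)) as parentName
    from sampletree
    where parentId is null

    union all

    -- recursive part: same select, self-joined to the CTE by name
    select child.id, child.name, child.parentId, parent.name as parentName
    from sampletree as child
    inner join tree as parent          -- the self-reference
        on child.parentId = parent.id
)
select id, name, parentId, parentName
from tree;
```

The cast in the anchor part is there because SQL Server requires the column types of the anchor and recursive parts to match.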

Back to the self-reference. The CTE's name appears twice in the code: once where the CTE is defined, and once inside the recursive part, which is the self-reference. The CTE is joined directly in the recursive part so that the join logic is applied to the previous result. The join is done between the parentId of the recursive part and the id of the anchor result. This makes it possible to get the name column from the anchor statement.

Notice that I’ve also put in another blank field in the anchor statement and added the parentName field in the recursive statement. This gives us the “human readable” output where I can find the hierarchy directly by reading from left to right.

To get data from the above CTE I just have to make a select statement from this:


And the results:


I can now directly read that Jane refers to Ditlev and Brian refers to Jane.


But how is this done when the SQL engine executes the query? The next part tries to explain that.


The SQL engine's handling

Given the full CTE statement above I’ll try to explain what the SQL engine does to handle this.


The documented semantics are as follows:


  1. Split the CTE into anchor and recursive parts


  2. Run the anchor member creating the first base result set (T0)

  3. Run the recursive member with Ti as an input and Ti+1 as an output

  4. Repeat step 3 until an empty result set is returned


  5. Return the result set. This is a union all set of T0 to Tn
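The steps above can be sketched as an explicit loop. This is only an illustration of the documented semantics, not what the engine physically does. The temp table `#result`, the variables `@lvl` and `@rows`, and the sampletree columns are assumptions made for the example.

```sql
declare @lvl int = 0;    -- index i of the current input set Ti
declare @rows int;       -- number of rows produced by the last step

create table #result
(
    id int, name varchar(50), parentId int, parentName varchar(50), lvl int
);

-- Step 2: run the anchor member to create T0
insert into #result (id, name, parentId, parentName, lvl)
select id, name, parentId, cast('' as varchar(50)), 0
from sampletree
where parentId is null;

set @rows = @@rowcount;

-- Steps 3 and 4: run the recursive member on Ti until it returns no rows
while @rows > 0
begin
    insert into #result (id, name, parentId, parentName, lvl)
    select child.id, child.name, child.parentId, parent.name, @lvl + 1
    from sampletree as child
    inner join #result as parent
        on child.parentId = parent.id
    where parent.lvl = @lvl;         -- only the previous set Ti is the input

    set @rows = @@rowcount;          -- captured right after the insert
    set @lvl = @lvl + 1;
end;

-- Step 5: the result is the union all of T0 to Tn
select id, name, parentId, parentName
from #result;

drop table #result;
```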

So let me try to rewrite the above query to match this sequence.


The anchor statement we already know:


First recursive query:


Second recursive query:


The nth recursive query:


The union all statement:
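The individual listings are not available here, so the sketch below combines the whole sequence into one statement using ordinary (non-recursive) CTEs. The names T0 to T3 and the sampletree columns are assumptions. With a three-level sample such as Ditlev, Jane and Brian, T3 would play the role of Tn and return no rows.

```sql
with T0 as
(
    -- the anchor statement
    select id, name, parentId, cast('' as varchar(50)) as parentName
    from sampletree
    where parentId is null
),
T1 as
(
    -- first recursive query: input is T0
    select child.id, child.name, child.parentId, parent.name as parentName
    from sampletree as child
    inner join T0 as parent on child.parentId = parent.id
),
T2 as
(
    -- second recursive query: input is T1
    select child.id, child.name, child.parentId, parent.name as parentName
    from sampletree as child
    inner join T1 as parent on child.parentId = parent.id
),
T3 as
(
    -- nth recursive query: input is T2, expected to be empty
    select child.id, child.name, child.parentId, parent.name as parentName
    from sampletree as child
    inner join T2 as parent on child.parentId = parent.id
)
-- the union all statement
select * from T0
union all
select * from T1
union all
select * from T2
union all
select * from T3;
```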


This rewrite gives us exactly the same result as we saw before:


Notice that the statement I named Tn above is actually empty. It illustrates the empty result that makes the SQL engine stop executing the recursive CTE.

This is how I would describe the SQL engine's handling of a recursive CTE.


Based on this very simple example, I guess you already can think of ways to use this in your projects and daily tasks.


But what about the performance and execution plan?




The execution plan for the original recursive CTE looks like this:


The top part of this execution plan is the anchor statement and the bottom part is the recursive statement.


Notice that I have not created any indexes on the table, so we are reading from a heap here.


But what if the data is more complex in structure and depth? Let us base the answer on an example:

In the attached SQL code you will find a script that generates more than 20,000 rows in a new table called complextree. This data is from a live solution and contains medical procedure names in a hierarchy. The data is used to show the relationships between medical procedures done by the Danish hospital system. It is both deep and complex in structure. (Sorry for the Danish letters in the data.)

When we run a recursive CTE on this data, we get exactly the same execution plan:


This is also what I would expect, as the amount of data read from a heap very seldom has an impact on the generated execution plan.


The query takes 25 seconds to run on my PC.


Now let me put an index in the table and let’s see the performance and execution plan.


The index is only put on parentDwId because, as we know from earlier in this article, that is the join column of the recursive part.
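A minimal sketch of such an index. The table complextree and the column parentDwId come from the text; the index name is hypothetical.

```sql
-- Non-clustered index on the join column of the recursive part
create nonclustered index IX_complextree_parentDwId
    on complextree (parentDwId);
```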


The query now completes in 1 second and generates this execution plan:


The top line is still the anchor and the bottom part is the recursive part. Notice that the SQL engine now uses the non-clustered index to execute the query, and the performance gain is noticeable.

Conclusion

I hope that you’ve now become more familiar with the recursive CTE statement and are willing to try it on your own projects and tasks.


The basics are fairly straightforward, but beware that the query can become complex and hard to debug as the demands on data and output grow. But do not be scared. As I always say: “Don’t do a complex query all at once, start small and build it up as you go along”.

Happy coding.


Complete script can be downloaded here.


Translated from: https://www.sqlshack.com/ready-set-go-sql-server-handle-recursive-ctes/

