Mapping Schema and Recursively Managing Data

Introduction

In a typical OLTP environment, we want to maintain an acceptable level of data integrity. The easiest way to do this is through the use of foreign keys, which ensure that every non-NULL value in a given column matches a key value in another table, usually that table’s primary key.

Over time, as the number of tables, columns, and foreign keys increases, the structure of that database can become unwieldy. A single table could easily link to thirty others, a table could have a parent-child relationship with itself, or a circular relationship could occur between a set of many tables.

A common request that comes up is to somehow report on, or modify a set of data in a given table. In a denormalized OLAP environment, this would be trivially easy, but in our OLTP scenario above, we could be dealing with many relationships, each of which needs to be considered prior to taking action. As DBAs, we are committed to maintaining large amounts of data, but need to ensure that our maintenance doesn’t break the applications that rely on that data.

How do we map out a database in such a way as to ensure that our work considers all relationships? How can we quickly determine every row of data that relates to a given row? That is the adventure we are embarking upon here!

Problem

It is possible to represent the table relationships in a database using an entity-relationship diagram (ERD), which shows each primary key & foreign key for the set of tables we are analyzing. For this example, we will use AdventureWorks, focusing on the Production.Product table and relationships that can affect that table. If we generate a complete ERD for AdventureWorks, we get a somewhat unwieldy result:

Not terribly pretty, but it’s a good overview that shows the “hot spots” in the database, where many relationships exist, as well as outliers, which have no dependencies defined. One observation that becomes clear is that nearly every table is somehow related. Five tables (at the top) stand alone, but otherwise every table has at least one relationship with another table. Removing those five tables leaves 68 behind, which is small by many standards, but for visualizing relationships, is still rather clunky. Generating an ERD on very large databases can yield what I fondly refer to as “Death Stars”, where there are hundreds or thousands of tables, and the diagram puts them in a huge set of concentric circles:

Whether it is a Spirograph or database is up to the viewer, but as a tool, it is more useful as wall art than as science.
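
The same overview can be pulled from the catalog views without drawing anything. The following is a minimal sketch, assuming only the standard sys.tables and sys.foreign_keys views; the column aliases are illustrative. For each table it counts the foreign keys the table declares and the foreign keys that point at it, so the “hot spots” sort to the top and the stand-alone tables show zero in both columns:

```sql
-- Count foreign keys per table, in both directions.
-- outgoing: keys declared on the table; incoming: keys in other tables that reference it.
SELECT
    OBJECT_SCHEMA_NAME(t.object_id) AS table_schema,
    t.name AS table_name,
    (SELECT COUNT(*)
     FROM sys.foreign_keys fk
     WHERE fk.parent_object_id = t.object_id) AS outgoing_foreign_keys,
    (SELECT COUNT(*)
     FROM sys.foreign_keys fk
     WHERE fk.referenced_object_id = t.object_id) AS incoming_foreign_keys
FROM sys.tables t
ORDER BY incoming_foreign_keys DESC, outgoing_foreign_keys DESC, table_schema, table_name;
```

A table with a high incoming count is one where a delete is likely to touch many other tables.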

To simplify our problem, let’s take a small segment of AdventureWorks that relates to the Product table:

This ERD illustrates 13 tables and their dependencies. If we wanted to delete rows from Production.Product for any products that are silver, we would immediately need to consider all dependencies shown in that diagram. To do this, we could manually write the following queries:


SELECT COUNT(*) FROM Production.Product WHERE Color = 'Silver' -- 43 rows
SELECT COUNT(*) FROM Production.ProductCostHistory -- 45 rows
INNER JOIN Production.Product ON Production.Product.ProductID = Production.ProductCostHistory.ProductID
WHERE Production.Product.Color = 'Silver'
SELECT COUNT(*) FROM Production.WorkOrder -- 6620 rows
INNER JOIN Production.Product ON Production.Product.ProductID = Production.WorkOrder.ProductID
WHERE Production.Product.Color = 'Silver'
SELECT COUNT(*) FROM Production.TransactionHistory -- 10556 rows
INNER JOIN Production.Product ON Production.Product.ProductID = Production.TransactionHistory.ProductID
WHERE Production.Product.Color = 'Silver'
SELECT COUNT(*) FROM Production.ProductProductPhoto -- 43 rows
INNER JOIN Production.Product ON Production.Product.ProductID = Production.ProductProductPhoto.ProductID
WHERE Production.Product.Color = 'Silver'
SELECT COUNT(*) FROM Production.BillOfMaterials -- 400 rows
INNER JOIN Production.Product ON Production.Product.ProductID = Production.BillOfMaterials.ProductAssemblyID
WHERE Production.Product.Color = 'Silver'
SELECT COUNT(*) FROM Production.BillOfMaterials -- 567 rows
INNER JOIN Production.Product ON Production.Product.ProductID = Production.BillOfMaterials.ComponentID
WHERE Production.Product.Color = 'Silver'
SELECT COUNT(*) FROM Production.ProductListPriceHistory -- 45 rows
INNER JOIN Production.Product ON Production.Product.ProductID = Production.ProductListPriceHistory.ProductID
WHERE Production.Product.Color = 'Silver'
SELECT COUNT(*) FROM Production.ProductInventory -- 86 rows
INNER JOIN Production.Product ON Production.Product.ProductID = Production.ProductInventory.ProductID
WHERE Production.Product.Color = 'Silver'
SELECT COUNT(*) FROM Production.WorkOrderRouting -- 9467 rows
INNER JOIN Production.WorkOrder ON Production.WorkOrder.WorkOrderID = Production.WorkOrderRouting.WorkOrderID
INNER JOIN Production.Product ON Production.Product.ProductID = Production.WorkOrder.ProductID
WHERE Production.Product.Color = 'Silver'

While these queries are helpful, they took a very long time to write. For a larger database, this exercise would take an even longer amount of time and, due to the tedious nature of the task, be very prone to human error. In addition, order is critical—deleting from the wrong table in the hierarchy first could result in foreign key violations. The row counts provided are total rows generated through the join statements, and are not necessarily the counts in any one table. If we are ready to delete the data above, then we can convert those SELECT queries into DELETE statements, run them, and be happy with a job well done:


DELETE [WorkOrderRouting]
FROM [Production].[WorkOrderRouting]
INNER JOIN [Production].[WorkOrder]ON  [Production].[WorkOrder].[WorkOrderID] =  [Production].[WorkOrderRouting].[WorkOrderID]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[WorkOrder].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductInventory]
FROM [Production].[ProductInventory]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[ProductInventory].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductListPriceHistory]
FROM [Production].[ProductListPriceHistory]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[ProductListPriceHistory].[ProductID]
WHERE (Product.Color = 'Silver')
DELETE [BillOfMaterials]
FROM [Production].[BillOfMaterials]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[BillOfMaterials].[ComponentID]
WHERE (Product.Color = 'Silver')
DELETE [BillOfMaterials]
FROM [Production].[BillOfMaterials]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[BillOfMaterials].[ProductAssemblyID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductProductPhoto]
FROM [Production].[ProductProductPhoto]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[ProductProductPhoto].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [TransactionHistory]
FROM [Production].[TransactionHistory]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[TransactionHistory].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductVendor]
FROM [Purchasing].[ProductVendor]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Purchasing].[ProductVendor].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [WorkOrder]
FROM [Production].[WorkOrder]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[WorkOrder].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [PurchaseOrderDetail]
FROM [Purchasing].[PurchaseOrderDetail]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Purchasing].[PurchaseOrderDetail].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductCostHistory]
FROM [Production].[ProductCostHistory]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[ProductCostHistory].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE FROM [Production].[Product]
WHERE Product.Color = 'Silver'

Unfortunately, the result of running this TSQL is an error:

The DELETE statement conflicted with the REFERENCE constraint “FK_SpecialOfferProduct_Product_ProductID”. The conflict occurred in database “AdventureWorks2012”, table “Sales.SpecialOfferProduct”, column ‘ProductID’.
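
A failure like this is less painful when the script is rehearsed inside a transaction first, because the script above runs in separate batches and every batch before the failing one has already committed. The following is a minimal sketch of such a rehearsal, not part of the original script: it assumes all statements are placed in a single batch (GO cannot appear inside a TRY block) and it always rolls back, so nothing is kept either way.

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    -- The DELETE statements for the child tables would go here,
    -- most distant tables first.

    DELETE FROM Production.Product
    WHERE Color = 'Silver';

    -- Rehearsal only: undo everything even when no error occurred.
    ROLLBACK TRANSACTION;
    PRINT 'Rehearsal succeeded; no changes were kept.';
END TRY
BEGIN CATCH
    -- A foreign key violation lands here; undo any partial work.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    PRINT 'Rehearsal failed: ' + ERROR_MESSAGE();
END CATCH;
```

On large tables the rehearsal still takes locks and writes to the transaction log, so it is best tried on a copy of the database.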

It turns out there are relationships to tables outside of the Production schema in both Purchasing and Sales. Using the full ERD above, we can add some additional statements to our delete script that will handle them:


DELETE [WorkOrderRouting]
FROM [Production].[WorkOrderRouting]
INNER JOIN [Production].[WorkOrder]ON  [Production].[WorkOrder].[WorkOrderID] =  [Production].[WorkOrderRouting].[WorkOrderID]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[WorkOrder].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [SalesOrderDetail]
FROM [Sales].[SalesOrderDetail]
INNER JOIN [Sales].[SpecialOfferProduct]ON  [Sales].[SpecialOfferProduct].[ProductID] =  [Sales].[SalesOrderDetail].[ProductID]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Sales].[SpecialOfferProduct].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [SalesOrderDetail]
FROM [Sales].[SalesOrderDetail]
INNER JOIN [Sales].[SpecialOfferProduct]ON  [Sales].[SpecialOfferProduct].[SpecialOfferID] =  [Sales].[SalesOrderDetail].[SpecialOfferID]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Sales].[SpecialOfferProduct].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductInventory]
FROM [Production].[ProductInventory]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[ProductInventory].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductListPriceHistory]
FROM [Production].[ProductListPriceHistory]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[ProductListPriceHistory].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [SpecialOfferProduct]
FROM [Sales].[SpecialOfferProduct]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Sales].[SpecialOfferProduct].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [BillOfMaterials]
FROM [Production].[BillOfMaterials]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[BillOfMaterials].[ComponentID]
WHERE (Product.Color = 'Silver')
GO
DELETE [BillOfMaterials]
FROM [Production].[BillOfMaterials]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[BillOfMaterials].[ProductAssemblyID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductProductPhoto]
FROM [Production].[ProductProductPhoto]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[ProductProductPhoto].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [TransactionHistory]
FROM [Production].[TransactionHistory]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[TransactionHistory].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductVendor]
FROM [Purchasing].[ProductVendor]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Purchasing].[ProductVendor].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [WorkOrder]
FROM [Production].[WorkOrder]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[WorkOrder].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [PurchaseOrderDetail]
FROM [Purchasing].[PurchaseOrderDetail]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Purchasing].[PurchaseOrderDetail].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE [ProductCostHistory]
FROM [Production].[ProductCostHistory]
INNER JOIN [Production].[Product]ON  [Production].[Product].[ProductID] =  [Production].[ProductCostHistory].[ProductID]
WHERE (Product.Color = 'Silver')
GO
DELETE FROM [Production].[Product]
WHERE Product.Color = 'Silver'

That executed successfully, but I feel quite exhausted from all the roundabout effort that went into the deletion of 43 rows from a table. Clearly this manual solution will not be scalable in any large database environment. What we need is a tool that can intelligently and quickly map these relationships for us.
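
The tables we missed on the first attempt could have been found with a catalog query rather than by reading the diagram. The following is a minimal sketch that lists every foreign key column pointing directly at Production.Product, using the standard sys.foreign_keys and sys.foreign_key_columns views; the column aliases are illustrative.

```sql
-- List each foreign key column that references Production.Product directly.
SELECT
    fk.name AS foreign_key_name,
    OBJECT_SCHEMA_NAME(fkc.parent_object_id) AS referencing_schema_name,
    OBJECT_NAME(fkc.parent_object_id) AS referencing_table_name,
    COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS referencing_column_name,
    COL_NAME(fkc.referenced_object_id, fkc.referenced_column_id) AS referenced_column_name
FROM sys.foreign_keys fk
INNER JOIN sys.foreign_key_columns fkc
ON fkc.constraint_object_id = fk.object_id
WHERE fk.referenced_object_id = OBJECT_ID(N'Production.Product')
ORDER BY referencing_schema_name, referencing_table_name, fk.name, fkc.constraint_column_id;
```

This only covers the first level. Tables that are further removed, such as Production.WorkOrderRouting (which reaches Product through Production.WorkOrder), require the same lookup to be repeated for every table found, which is why a recursive approach is needed.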

Solution

We want to build a stored procedure that will take some inputs for the table we wish to act on, and any criteria we want to attach to it, and return actionable data on the structure of this schema. In an effort to prevent this article from becoming unwieldy, I’ll refrain from a detailed explanation of every bit of SQL, and focus on overall function and utility.

Our first task is to define our stored procedure, build parameters, and gather some basic data about the table we wish to act on (called the “target table” going forward). Deletion will be the sample action as it is the most destructive example that we can use. We will build our solution with 3 basic parameters:

@schema_name: The name of the schema we wish to report on.
@table_name: The name of the table we wish to report on (target table).
@where_clause: The filter that we will apply when analyzing our data.


CREATE PROCEDURE dbo.atp_schema_mapping
    @schema_name SYSNAME,
    @table_name SYSNAME,
    @where_clause VARCHAR(MAX) = ''
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql_command VARCHAR(MAX) = ''; -- Used for many dynamic SQL statements

    SET @where_clause = ISNULL(LTRIM(RTRIM(@where_clause)), ''); -- Clean up WHERE clause, to simplify future SQL

    DECLARE @row_counts TABLE -- Temporary table to dump dynamic SQL output into
        (row_count INT);

    DECLARE @base_table_row_count INT; -- This will hold the row count of the base entity.

    SELECT @sql_command = 'SELECT COUNT(*) FROM [' + @schema_name + '].[' + @table_name + ']' + -- Build COUNT statement
        CASE
            WHEN @where_clause <> '' -- Add WHERE clause, if provided
                THEN CHAR(10) + 'WHERE ' + @where_clause
            ELSE ''
        END;

    INSERT INTO @row_counts
        (row_count)
    EXEC (@sql_command);

    SELECT
        @base_table_row_count = row_count -- Extract count from temporary location.
    FROM @row_counts;

    -- If there are no matching rows to the input provided, exit immediately with an error message.
    IF @base_table_row_count = 0
    BEGIN
        PRINT '-- There are no rows to process based on the input table and where clause.  Execution aborted.';
        RETURN;
    END
END
GO

For step one, we have also added a row count check. In the event that the filter we apply to the target table results in no rows returned, then we’ll exit immediately and provide an informational message to let the user know that no further work is needed. As a test of this, we can execute the following SQL, using a color that is surely not found in AdventureWorks:


EXEC dbo.atp_schema_mapping
    @schema_name = 'Production',
    @table_name = 'Product',
    @where_clause = 'Product.Color = ''Flurple''';

The result is exactly as we expected:

There are no rows to process based on the input table and where clause. Execution aborted.

There is no other output or action from the stored proc, so far, but this provides a framework to begin our work.
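
As an aside, the base row count can also be captured without the intermediate table variable. The following is a minimal sketch of that alternative, not the approach the procedure takes: it uses sp_executesql with an OUTPUT parameter (the name @cnt is an assumption) and QUOTENAME to bracket the schema and table names. The three input values are hard-coded here only so the sketch runs on its own.

```sql
DECLARE @schema_name SYSNAME = N'Production';
DECLARE @table_name SYSNAME = N'Product';
DECLARE @where_clause NVARCHAR(MAX) = N'Product.Color = ''Silver''';
DECLARE @sql NVARCHAR(MAX);
DECLARE @base_table_row_count INT;

-- QUOTENAME brackets the identifiers and escapes any closing bracket inside them.
SET @sql = N'SELECT @cnt = COUNT(*) FROM '
    + QUOTENAME(@schema_name) + N'.' + QUOTENAME(@table_name)
    + CASE
          WHEN @where_clause <> N'' THEN N' WHERE ' + @where_clause
          ELSE N''
      END;

-- The count comes back through the OUTPUT parameter instead of a result set.
EXEC sys.sp_executesql
    @sql,
    N'@cnt INT OUTPUT',
    @cnt = @base_table_row_count OUTPUT;

SELECT @base_table_row_count AS base_table_row_count;
```

The WHERE clause is still free text that is concatenated into the statement, so either version should only be exposed to trusted callers.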

The first hurdle to overcome is collecting data on our schema and organizing it in a meaningful fashion. To process table data effectively, we need to turn an ERD into rows of metadata that describe a specific relationship, as well as how it relates to our target table. A critical part of this task is to emphasize that we are not just interested in relationships between tables. A set of relationships is not enough to completely map all data paths within a database. What we are truly interested in are data paths: each set of relationships that leads from a given column back to our target table.

A table can be related to another via many different sets of paths, and it is important that we define all of these paths, so as not to miss any important relationships. The following shows a single example of two tables that are related in multiple ways:

If we wanted to delete from the account table, we would need to examine the following relationships:

account_contract – – – > account (via account_id)
account_contract – – – > employee_resource (via contract_owner_resource_id)
account – – – > account_resource (via account_primary_resource_id)
account_contract – – – > employee_resource (via account_id and account_primary_resource_id)

The last relationship is very important—it illustrates a simple example of how it is possible for two tables to relate through any number of paths in between. It’s even possible for two tables to relate through the same intermediary tables, but using different key columns. Either way, we must consider all of these relationships in our work.
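
To make the two routes concrete, here is a minimal sketch of what such a schema could look like. The table definitions are assumptions reconstructed from the relationship list above: the key columns resource_id and account_contract_id are hypothetical, and the sketch assumes that account_primary_resource_id points at employee_resource.

```sql
-- Hypothetical tables; only the foreign key columns are named in the text above.
CREATE TABLE dbo.employee_resource
(   resource_id INT NOT NULL PRIMARY KEY);

CREATE TABLE dbo.account
(   account_id INT NOT NULL PRIMARY KEY,
    account_primary_resource_id INT NOT NULL
        REFERENCES dbo.employee_resource (resource_id));

CREATE TABLE dbo.account_contract
(   account_contract_id INT NOT NULL PRIMARY KEY,
    account_id INT NOT NULL
        REFERENCES dbo.account (account_id),
    contract_owner_resource_id INT NOT NULL
        REFERENCES dbo.employee_resource (resource_id));

-- Path 1: account_contract reaches employee_resource directly.
SELECT COUNT(*)
FROM dbo.account_contract ac
INNER JOIN dbo.employee_resource er
ON er.resource_id = ac.contract_owner_resource_id;

-- Path 2: account_contract reaches employee_resource through account.
SELECT COUNT(*)
FROM dbo.account_contract ac
INNER JOIN dbo.account a
ON a.account_id = ac.account_id
INNER JOIN dbo.employee_resource er
ON er.resource_id = a.account_primary_resource_id;
```

Both queries join the same pair of tables, yet they can match different rows, which is why each path has to be tracked separately.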

In order to map these relationships, we will need to gather the appropriate schema metadata from a variety of system views and recursively relate that data back to itself, building a useful set of data with which to move forward:


-- This table will hold all foreign key relationships
DECLARE @foreign_keys TABLE
(   foreign_key_id INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    referencing_object_id INT NULL,
    referencing_schema_name SYSNAME NULL,
    referencing_table_name SYSNAME NULL,
    referencing_column_name SYSNAME NULL,
    primary_key_object_id INT NULL,
    primary_key_schema_name SYSNAME NULL,
    primary_key_table_name SYSNAME NULL,
    primary_key_column_name SYSNAME NULL,
    level INT NULL,
    object_id_hierarchy_rank VARCHAR(MAX) NULL,
    referencing_column_name_rank VARCHAR(MAX) NULL);

-- Insert all foreign key relational data into the table variable using a recursive CTE over system tables.
WITH fkey
(   referencing_object_id,
    referencing_schema_name,
    referencing_table_name,
    referencing_column_name,
    primary_key_object_id,
    primary_key_schema_name,
    primary_key_table_name,
    primary_key_column_name,
    level,
    object_id_hierarchy_rank,
    referencing_column_name_rank) AS
(   SELECT
        parent_table.object_id AS referencing_object_id,
        parent_schema.name AS referencing_schema_name,
        parent_table.name AS referencing_table_name,
        CONVERT(SYSNAME, NULL) AS referencing_column_name,
        CONVERT(INT, NULL) AS referenced_table_object_id,
        CONVERT(SYSNAME, NULL) AS referenced_schema_name,
        CONVERT(SYSNAME, NULL) AS referenced_table_name,
        CONVERT(SYSNAME, NULL) AS referenced_key_column_name,
        0 AS level,
        CONVERT(VARCHAR(MAX), parent_table.object_id) AS object_id_hierarchy_rank,
        CAST('' AS VARCHAR(MAX)) AS referencing_column_name_rank
    FROM sys.objects parent_table
    INNER JOIN sys.schemas parent_schema
    ON parent_schema.schema_id = parent_table.schema_id
    WHERE parent_table.name = @table_name
    AND parent_schema.name = @schema_name
    UNION ALL
    SELECT
        child_object.object_id AS referencing_object_id,
        child_schema.name AS referencing_schema_name,
        child_object.name AS referencing_table_name,
        referencing_column.name AS referencing_column_name,
        referenced_table.object_id AS referenced_table_object_id,
        referenced_schema.name AS referenced_schema_name,
        referenced_table.name AS referenced_table_name,
        referenced_key_column.name AS referenced_key_column_name,
        f.level + 1 AS level,
        f.object_id_hierarchy_rank + '-' + CONVERT(VARCHAR(MAX), child_object.object_id) AS object_id_hierarchy_rank,
        f.referencing_column_name_rank + '-' + CAST(referencing_column.name AS VARCHAR(MAX)) AS referencing_column_name_rank
    FROM sys.foreign_key_columns sfc
    INNER JOIN sys.objects child_object
    ON sfc.parent_object_id = child_object.object_id
    INNER JOIN sys.schemas child_schema
    ON child_schema.schema_id = child_object.schema_id
    INNER JOIN sys.columns referencing_column
    ON referencing_column.object_id = child_object.object_id
    AND referencing_column.column_id = sfc.parent_column_id
    INNER JOIN sys.objects referenced_table
    ON sfc.referenced_object_id = referenced_table.object_id
    INNER JOIN sys.schemas referenced_schema
    ON referenced_schema.schema_id = referenced_table.schema_id
    INNER JOIN sys.columns AS referenced_key_column
    ON referenced_key_column.object_id = referenced_table.object_id
    AND referenced_key_column.column_id = sfc.referenced_column_id
    INNER JOIN fkey f
    ON f.referencing_object_id = sfc.referenced_object_id
    WHERE ISNULL(f.primary_key_object_id, 0) <> f.referencing_object_id -- Exclude self-referencing keys
    AND f.object_id_hierarchy_rank NOT LIKE '%' + CAST(child_object.object_id AS VARCHAR(MAX)) + '%')
INSERT INTO @foreign_keys
(   referencing_object_id,
    referencing_schema_name,
    referencing_table_name,
    referencing_column_name,
    primary_key_object_id,
    primary_key_schema_name,
    primary_key_table_name,
    primary_key_column_name,
    level,
    object_id_hierarchy_rank,
    referencing_column_name_rank)
SELECT DISTINCT
    referencing_object_id,
    referencing_schema_name,
    referencing_table_name,
    referencing_column_name,
    primary_key_object_id,
    primary_key_schema_name,
    primary_key_table_name,
    primary_key_column_name,
    level,
    object_id_hierarchy_rank,
    referencing_column_name_rank
FROM fkey;

UPDATE FKEYS
    SET referencing_column_name_rank = SUBSTRING(referencing_column_name_rank, 2, LEN(referencing_column_name_rank)) -- Remove extra leading dash leftover from the top-level column, which has no referencing column relationship.
FROM @foreign_keys FKEYS;

SELECT
    *
FROM @foreign_keys;

The TSQL above builds a set of data, centered on the target table provided (in the anchor section of the CTE), and recursively maps each level of relationships via each table’s foreign keys. The result set includes the following columns:

foreign_key_id: An auto-numbering primary key.
referencing_object_id: The object_id of the referencing table
referencing_schema_name: The name of the referencing schema
referencing_table_name: The name of the referencing table
referencing_column_name: The name of the specific referencing column for the referencing table above
primary_key_object_id: The object_id of the table referenced by the referencing table above
primary_key_schema_name: The schema name of the primary key table.
primary_key_table_name: The table name of the primary key table.
primary_key_column_name: The name of the primary key column referenced by the referencing column.
level: How many steps does this relationship path trace from the target table to the referencing table? This provides us the ability to logically order any operations from most removed to least removed. For delete or update statements, this is crucial.
object_id_hierarchy_rank: A list of each table’s object_id within the relationship tree. The target table is on the left, whereas the referencing table for each relationship is on the right. This will be used when constructing TSQL statements and optimizing unused TSQL.
referencing_column_name_rank: A list of the names of the referencing columns. This will be used later on for optimizing and removing irrelevant statements.

There are 2 WHERE clauses that are worth explaining further:

 AND f.object_id_hierarchy_rank NOT LIKE '%' + CAST(child_object.object_id AS VARCHAR(MAX)) + '%'

This ensures that we don’t loop around in circles forever. If a relationship exists that is circular (such as our account example earlier), then an unchecked recursive CTE would continue to increment the level and add to the relationship tree until the recursion limit was reached. We want to enumerate each relationship path only once, and this guards against infinite loops and repeated data.

 WHERE ISNULL(f.primary_key_object_id, 0) <> f.referencing_object_id

There is a single caveat that was explicitly avoided above: self-referencing foreign keys. In an effort to avoid infinite loops, we remove any foreign keys that reference their own table. If the referencing and referenced tables are the same, then we will filter them out of our result set immediately and deal with them separately.
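
To see the first guard (the NOT LIKE check against the hierarchy string) in isolation, here is a minimal sketch that runs the same kind of recursion over a small, made-up edge list instead of the system views. The ids and the @edges table are hypothetical. One difference from the procedure is deliberate: the sketch wraps both the path and the id in dashes before comparing, because a bare substring match can in principle also fire when a short id happens to appear inside a longer one (2 inside 12 in this data).

```sql
-- Hypothetical edge list: child_id has a foreign key to parent_id.
DECLARE @edges TABLE (child_id INT NOT NULL, parent_id INT NOT NULL);

INSERT INTO @edges (child_id, parent_id)
VALUES (12, 1), -- 12 references 1
       (2, 12), -- 2 references 12
       (1, 2);  -- 1 references 2, closing a cycle: 1 <- 12 <- 2 <- 1

WITH paths (node_id, level, path_rank) AS
(   -- Anchor: the target table.
    SELECT 1, 0, CONVERT(VARCHAR(MAX), 1)
    UNION ALL
    -- Recursive step: add every child of the current node that is not already on this path.
    SELECT
        e.child_id,
        p.level + 1,
        p.path_rank + '-' + CONVERT(VARCHAR(MAX), e.child_id)
    FROM @edges e
    INNER JOIN paths p
    ON p.node_id = e.parent_id
    WHERE '-' + p.path_rank + '-' NOT LIKE '%-' + CONVERT(VARCHAR(MAX), e.child_id) + '-%')
SELECT
    node_id,
    level,
    path_rank
FROM paths
ORDER BY level;
-- Paths produced: 1, then 1-12, then 1-12-2.
-- The edge from 2 back to 1 is rejected because 1 is already on the path.
```

Without the WHERE clause in the recursive step, the same data would keep extending the path until the default recursion limit stopped the query.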

We’ve explicitly excluded relationships from a table to itself, and are now obligated to do something about that. To collect this data, we do not need a recursive CTE. A set of joins between parent & child data will suffice:


DECLARE @self_referencing_keys TABLE
(   self_referencing_keys_id INT NOT NULL IDENTITY(1,1),
    referencing_primary_key_name SYSNAME NULL,
    referencing_schema_name SYSNAME NULL,
    referencing_table_name SYSNAME NULL,
    referencing_column_name SYSNAME NULL,
    primary_key_schema_name SYSNAME NULL,
    primary_key_table_name SYSNAME NULL,
    primary_key_column_name SYSNAME NULL);

INSERT INTO @self_referencing_keys
(   referencing_primary_key_name,
    referencing_schema_name,
    referencing_table_name,
    referencing_column_name,
    primary_key_schema_name,
    primary_key_table_name,
    primary_key_column_name)
SELECT
    (SELECT COL_NAME(SIC.OBJECT_ID, SIC.column_id)
     FROM sys.indexes SI
     INNER JOIN sys.index_columns SIC
     ON SIC.index_id = SI.index_id AND SIC.object_id = SI.object_id
     WHERE SI.is_primary_key = 1
     AND OBJECT_NAME(SIC.OBJECT_ID) = child_object.name) AS referencing_primary_key_name,
    child_schema.name AS referencing_schema_name,
    child_object.name AS referencing_table_name,
    referencing_column.name AS referencing_column_name,
    referenced_schema.name AS primary_key_schema_name,
    referenced_table.name AS primary_key_table_name,
    referenced_key_column.name AS primary_key_column_name
FROM sys.foreign_key_columns sfc
INNER JOIN sys.objects child_object
ON sfc.parent_object_id = child_object.object_id
INNER JOIN sys.schemas child_schema
ON child_schema.schema_id = child_object.schema_id
INNER JOIN sys.columns referencing_column
ON referencing_column.object_id = child_object.object_id
AND referencing_column.column_id = sfc.parent_column_id
INNER JOIN sys.objects referenced_table
ON sfc.referenced_object_id = referenced_table.object_id
INNER JOIN sys.schemas referenced_schema
ON referenced_schema.schema_id = referenced_table.schema_id
INNER JOIN sys.columns AS referenced_key_column
ON referenced_key_column.object_id = referenced_table.object_id
AND referenced_key_column.column_id = sfc.referenced_column_id
WHERE child_object.name = referenced_table.name
AND child_object.name IN -- Only consider self-referencing relationships for tables somehow already referenced above, otherwise they are irrelevant.
    (SELECT referencing_table_name FROM @foreign_keys);

We can return data from this table (if needed) with one additional query:


IF (SELECT COUNT(*) FROM @self_referencing_keys) > 0
BEGIN
    SELECT
        *
    FROM @self_referencing_keys;
END
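
As a quick cross-check on the output above, self-referencing keys can also be listed straight from sys.foreign_keys. The following is a minimal sketch, separate from the procedure, that covers the whole database rather than only the tables found earlier. It compares object ids instead of table names, which avoids treating two tables with the same name in different schemas as one table.

```sql
-- A foreign key is self-referencing when the table that declares it
-- is also the table it points to.
SELECT
    fk.name AS foreign_key_name,
    OBJECT_SCHEMA_NAME(fk.parent_object_id) AS table_schema,
    OBJECT_NAME(fk.parent_object_id) AS table_name
FROM sys.foreign_keys fk
WHERE fk.parent_object_id = fk.referenced_object_id
ORDER BY table_schema, table_name, fk.name;
```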

We now have all of the data needed in order to begin analysis. We have a total of 3 goals to achieve here:

Get counts of data that fit each relationship.
If zero rows are found for a relationship, then we can disregard it for the sake of deleting data. This will greatly improve execution speed and efficiency on larger databases.
Generate DELETE statements for the relevant data identified above.

Collecting row counts will require dynamic SQL in order to query an unknown list of tables and columns. For our example here, I use SELECT COUNT(*) FROM in order to return row counts. If you are working in tables with significant row counts, then you may find this approach to be slow, so please do not run the research portion of this stored procedure in a production environment without some level of caution (using a READ UNCOMMITTED isolation level removes contention, though it won’t speed things up much).
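
Where the exact count is not needed, and the only question is whether a relationship has any matching rows at all (the second goal above), an existence check can stop at the first match instead of counting everything. The following is a minimal sketch of that idea for one relationship, written by hand rather than generated by the procedure; it also shows the isolation level being set and then restored.

```sql
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Does any WorkOrderRouting row trace back to a silver product?
IF EXISTS
(   SELECT 1
    FROM Production.WorkOrderRouting wor
    INNER JOIN Production.WorkOrder wo
    ON wo.WorkOrderID = wor.WorkOrderID
    INNER JOIN Production.Product p
    ON p.ProductID = wo.ProductID
    WHERE p.Color = 'Silver')
    PRINT 'Relationship has matching rows; keep it.';
ELSE
    PRINT 'Relationship has no matching rows; it can be skipped.';

SET TRANSACTION ISOLATION LEVEL READ COMMITTED; -- Restore the default level
```

Keep in mind that READ UNCOMMITTED can read rows from transactions that later roll back, so results gathered this way are best treated as estimates.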

The following TSQL defines some new variables and iterates through each relationship until row counts have been collected for each relationship:


DECLARE @count_sql_command VARCHAR(MAX) = ''; -- Used for dynamic SQL for count calculations
DECLARE @row_count INT; -- Temporary holding place for relationship row count
DECLARE @object_id_hierarchy_sql VARCHAR(MAX);
DECLARE @process_schema_name SYSNAME = '';
DECLARE @process_table_name SYSNAME = '';
DECLARE @referencing_column_name SYSNAME = '';
DECLARE @join_sql VARCHAR(MAX) = '';
DECLARE @object_id_hierarchy_rank VARCHAR(MAX) = '';
DECLARE @referencing_column_name_rank VARCHAR(MAX) = '';
DECLARE @old_schema_name SYSNAME = '';
DECLARE @old_table_name SYSNAME = '';
DECLARE @foreign_key_id INT;
DECLARE @has_same_object_id_hierarchy BIT; -- Will be used if this foreign key happens to share a hierarchy with other keys
DECLARE @level INT;

WHILE EXISTS (SELECT * FROM @foreign_keys WHERE processed = 0 AND level > 0)
BEGIN
    -- Reset the working variables for this iteration
    SELECT @count_sql_command = '';
    SELECT @join_sql = '';
    SELECT @old_schema_name = '';
    SELECT @old_table_name = '';

    CREATE TABLE #inner_join_tables
    (   id INT NOT NULL IDENTITY(1,1),
        object_id INT);

    -- Pick the next unprocessed relationship, lowest level first
    SELECT TOP 1
        @process_schema_name = FKEYS.referencing_schema_name,
        @process_table_name = FKEYS.referencing_table_name,
        @object_id_hierarchy_rank = FKEYS.object_id_hierarchy_rank,
        @referencing_column_name_rank = FKEYS.referencing_column_name_rank,
        @foreign_key_id = FKEYS.foreign_key_id,
        @referencing_column_name = FKEYS.referencing_column_name,
        @has_same_object_id_hierarchy = CASE WHEN (SELECT COUNT(*) FROM @foreign_keys FKEYS2 WHERE FKEYS2.object_id_hierarchy_rank = FKEYS.object_id_hierarchy_rank) > 1 THEN 1 ELSE 0 END,
        @level = FKEYS.level
    FROM @foreign_keys FKEYS
    WHERE FKEYS.processed = 0
    AND FKEYS.level > 0
    ORDER BY FKEYS.level ASC;

    -- Split the dash-delimited hierarchy into one object_id per row
    SELECT @object_id_hierarchy_sql = 'SELECT ' + REPLACE(@object_id_hierarchy_rank, '-', ' UNION ALL SELECT ');

    INSERT INTO #inner_join_tables
    EXEC (@object_id_hierarchy_sql);

    SET @count_sql_command = 'SELECT COUNT(*) FROM [' + @process_schema_name + '].[' + @process_table_name + ']' + CHAR(10);

    -- Build the chain of INNER JOINs back to the target table,
    -- skipping any table that was just joined
    SELECT
        @join_sql = @join_sql +
            CASE
                WHEN (@old_table_name <> FKEYS.primary_key_table_name OR @old_schema_name <> FKEYS.primary_key_schema_name)
                    THEN 'INNER JOIN [' + FKEYS.primary_key_schema_name + '].[' + FKEYS.primary_key_table_name + '] ' + CHAR(10) + ' ON ' +
                         ' [' + FKEYS.primary_key_schema_name + '].[' + FKEYS.primary_key_table_name + '].[' + FKEYS.primary_key_column_name + '] =  [' + FKEYS.referencing_schema_name + '].[' + FKEYS.referencing_table_name + '].[' + FKEYS.referencing_column_name + ']' + CHAR(10)
                ELSE ''
            END,
        @old_table_name =
            CASE
                WHEN (@old_table_name <> FKEYS.primary_key_table_name OR @old_schema_name <> FKEYS.primary_key_schema_name)
                    THEN FKEYS.primary_key_table_name
                ELSE @old_table_name
            END,
        @old_schema_name =
            CASE
                WHEN (@old_table_name <> FKEYS.primary_key_table_name OR @old_schema_name <> FKEYS.primary_key_schema_name)
                    THEN FKEYS.primary_key_schema_name
                ELSE @old_schema_name
            END
    FROM @foreign_keys FKEYS
    INNER JOIN #inner_join_tables join_details
    ON FKEYS.referencing_object_id = join_details.object_id
    -- Do not allow cyclical joins through the same table we are originating from
    WHERE CHARINDEX(FKEYS.object_id_hierarchy_rank + '-', @object_id_hierarchy_rank + '-') <> 0
    AND FKEYS.level > 0
    AND ((@has_same_object_id_hierarchy = 0)
        OR (@has_same_object_id_hierarchy = 1 AND FKEYS.referencing_column_name = @referencing_column_name)
        OR (@has_same_object_id_hierarchy = 1 AND @level > FKEYS.level))
    ORDER BY join_details.ID DESC;

    SELECT @count_sql_command = @count_sql_command + @join_sql;

    IF @where_clause <> ''
    BEGIN
        SELECT @count_sql_command = @count_sql_command + ' WHERE (' + @where_clause + ')';
    END

    -- Run the count and capture its single-row result
    INSERT INTO @row_counts
        (row_count)
    EXEC (@count_sql_command);

    SELECT @row_count = row_count FROM @row_counts;

    -- Record the result and cache the join text for later use
    UPDATE FKEYS
        SET processed = 1,
            row_count = @row_count,
            join_condition_sql = @join_sql
    FROM @foreign_keys FKEYS
    WHERE FKEYS.foreign_key_id = @foreign_key_id;

    DELETE FROM @row_counts;
    DROP TABLE #inner_join_tables;
END

Three new columns have been added to our @foreign_keys table:

processed: A bit used to flag a relationship once it has been analyzed.
row_count: The row count that results from our work above.
join_condition_sql: The sequence of INNER JOIN statements generated above is cached here so that we do not need to perform all of this work again in the future.

The basic process followed is to:

Collect all relevant information about a single foreign key relationship.
Build all of the INNER JOINs that relate this foreign key back to the target table via the specific relationship defined in step 1.
Execute the count TSQL.
Store the output of the count TSQL in our @foreign_keys table for use later.
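
The four steps above can be tried in isolation with a small, self-contained sketch. Everything prefixed with demo is hypothetical and exists only for this illustration: two temp tables stand in for a parent and child table, and @demo_foreign_keys is a heavily simplified stand-in for @foreign_keys with the join text already written out by hand (so step 2 is reduced to string concatenation rather than the full join-building logic).

```sql
-- Hypothetical parent/child tables with one relationship between them
CREATE TABLE #demo_parent (parent_id INT NOT NULL PRIMARY KEY);
CREATE TABLE #demo_child  (child_id INT NOT NULL PRIMARY KEY, parent_id INT NOT NULL);

INSERT INTO #demo_parent (parent_id) VALUES (1), (2);
INSERT INTO #demo_child (child_id, parent_id) VALUES (10, 1), (11, 1), (12, 2);

-- Simplified stand-in for @foreign_keys: one row per relationship
DECLARE @demo_foreign_keys TABLE
(   foreign_key_id INT NOT NULL PRIMARY KEY,
    referencing_table_name SYSNAME NOT NULL,
    join_condition_sql NVARCHAR(MAX) NOT NULL,
    processed BIT NOT NULL DEFAULT 0,
    row_count INT NULL);

INSERT INTO @demo_foreign_keys (foreign_key_id, referencing_table_name, join_condition_sql)
VALUES (1, '#demo_child', 'INNER JOIN #demo_parent ON #demo_parent.parent_id = #demo_child.parent_id');

DECLARE @demo_row_counts TABLE (row_count INT NOT NULL);
DECLARE @demo_where_clause NVARCHAR(MAX) = '#demo_parent.parent_id = 1';
DECLARE @demo_sql NVARCHAR(MAX);
DECLARE @demo_key_id INT;
DECLARE @demo_row_count INT;

WHILE EXISTS (SELECT * FROM @demo_foreign_keys WHERE processed = 0)
BEGIN
    -- Steps 1 and 2: pick one relationship and assemble its count query
    SELECT TOP 1
        @demo_key_id = foreign_key_id,
        @demo_sql = 'SELECT COUNT(*) FROM ' + referencing_table_name + ' '
                  + join_condition_sql + ' WHERE (' + @demo_where_clause + ')'
    FROM @demo_foreign_keys
    WHERE processed = 0
    ORDER BY foreign_key_id;

    -- Step 3: execute the count
    INSERT INTO @demo_row_counts (row_count)
    EXEC (@demo_sql);

    -- Step 4: store the result against the relationship
    SELECT @demo_row_count = row_count FROM @demo_row_counts;

    UPDATE @demo_foreign_keys
        SET processed = 1,
            row_count = @demo_row_count
    WHERE foreign_key_id = @demo_key_id;

    DELETE FROM @demo_row_counts;
END

-- Two child rows reference parent_id = 1, so row_count should be 2
SELECT foreign_key_id, referencing_table_name, row_count FROM @demo_foreign_keys;

DROP TABLE #demo_child;
DROP TABLE #demo_parent;
```

The whole script needs to run as a single batch, since the table variables go out of scope at the end of the batch. The temp tables remain visible inside EXEC because they were created in the calling session.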

Conclusion (Until Part 2)

We’ve built a framework for traversing a hierarchy of foreign keys, and are well on our way towards our goal of effective schema research. In Part 2, we’ll apply some optimization to our stored procedure in order to speed up execution on larger, more complex databases. We’ll then put all the pieces together and demo the result of all of this work. Thanks for reading, and I hope you’re enjoying this adventure so far!

Translated from: https://www.sqlshack.com/mapping-schema-and-recursively-managing-data-part-1/