Clustered Index vs. Heap in SQL Server

Summary

Few topics are as widely misunderstood, or generate as much bad advice, as the decision of how to index a table. In particular, misinformation spreads frequently about choosing a heap over a clustered index.

This article dives into SQL Server internals, performance testing, temporary objects, and the other topics that relate to the choice between a heap and a clustered index.

The Common Misconceptions

Have you ever heard any of these statements?

A heap is faster than a clustered index because there is less overhead to write data to it
Clustered indexes take resources to create and maintain and may not be worth it
Heaps allow for faster access to random data
Heaps are smaller and take up less storage and memory resources

The internet is full of these and many other statements that are either false or apply only to edge cases so narrow that they should never be stated without that qualification. Many of these ideas seep into our development teams and become topics of debate, ultimately influencing how we design database objects and access them.

What Is a Heap? What Is a Clustered Index?

When we discuss these terms, we are referring to the underlying logical structure of a table. This has little impact on our ability to query a table and return results. Ignoring the impact of latency, we can access data in a table successfully, regardless of how we index it. The focus of this article will be on query performance and how these choices will make our queries faster or slower, as this will often be the key metric of success when reviewing application speed.

Heaps

A heap is a table that is stored without any underlying order. When rows are inserted into a heap, there is no way to control which pages they are written to, and there is no guarantee that those pages stay in any particular order as the table is written to or maintained.

Logically, a heap consists of data pages tracked by an Index Allocation Map (IAM), which points to all pages within the heap. Each page holds as many rows as will fit as they are written. Within the heap, there is no linking or organization between the pages, so scanning a heap means consulting the IAM first to locate all of its pages. The following illustration shows a simplified model of a heap:

While there are many other considerations about how a heap is stored and how its data is managed, its most important characteristic is its lack of order. Heaps behave as they do primarily because their rows are stored without any specified order, and this generally has negative implications for read and write operations.
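
Because a heap has no ordering structure, SQL Server records it in the sys.indexes catalog view as a row with index_id = 0 rather than as a real index. The sketch below, which assumes the two test tables defined later in this article already exist, is one way to confirm which structure a table uses and to list the heaps in a database; it is a diagnostic query, not part of the original tests.

```sql
-- Assumes dbo.heap_test and dbo.clustered_index_test (created later in this
-- article) exist in the current database.
-- A heap appears in sys.indexes as index_id = 0 with type_desc = 'HEAP' and no
-- name; a clustered index always has index_id = 1.
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.index_id,
    i.type_desc,
    i.name AS index_name
FROM sys.indexes AS i
WHERE i.object_id IN (OBJECT_ID(N'dbo.heap_test'),
                      OBJECT_ID(N'dbo.clustered_index_test'))
  AND i.index_id IN (0, 1)
ORDER BY table_name;

-- Every user table in the current database that is stored as a heap.
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.indexes AS i ON i.object_id = t.object_id
WHERE i.index_id = 0
ORDER BY s.name, t.name;
```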

Clustered Index

The alternative to an unordered heap is to define a table with a clustered index. This index gives the table a logical ordering based on the columns the index is defined on, in the order they are listed. When rows are inserted, updated, or deleted, that ordering is maintained.

A clustered index is stored as a B-tree (a balanced tree, not a binary tree). This structure starts with a single root page and branches out to many child pages per level until the lowest level covers the table’s entire range of index key values. In addition to ordering the data, the pages at each level of the B-tree hold pointers to the previous and next pages at that level. We can visualize a clustered index as follows:

The rows of data are stored only in the leaf nodes at the lowest level of the index. These are the data pages that contain all the columns in the table, ordered by the clustered index columns. The remaining index nodes organize data based on the values of the indexed column(s). In the diagram above, we are indexing numbers from 1 to 1000. At each level of index nodes, the range is broken down into smaller and smaller ranges. If we wanted to return all rows with the value 698, we would traverse the tree as follows:

Start at the root node
Move to the index node 501-1000
Move to the index node 501-750
Move to the leaf level and locate the pages that contain rows with a value of 698, if any exist
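
To see how many levels this traversal crosses on the real test table, one option is sys.dm_db_index_physical_stats in DETAILED mode, which returns one row per level of the B-tree. This is a sketch assuming dbo.clustered_index_test exists; DETAILED mode reads every page, so it is best run on test systems or modest tables.

```sql
-- One row per B-tree level of the clustered index (index_id = 1).
-- index_level 0 is the leaf level (the data pages); the highest level is the
-- root, which should report a page_count of 1.
SELECT
    ps.index_depth,        -- number of levels a seek has to traverse
    ps.index_level,
    ps.page_count,
    ps.record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.clustered_index_test'), 1, NULL, N'DETAILED') AS ps
ORDER BY ps.index_level DESC;   -- root first, leaf last, like the traversal
```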

This is a far faster way to return rows, as we can locate them in a handful of steps (one page per level of the tree) rather than reading every row in the table before returning results. Note that the pages at each level of the index contain pointers to the previous and next pages at that level. This structure also keeps rows logically ordered by the clustered index columns, which lets the optimizer satisfy an ORDER BY on those columns without a separate sort operation. Results are still only guaranteed to come back in order when the query includes ORDER BY.
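
Both effects can be observed with IO statistics and actual execution plans. The sketch below assumes the article's two tables and that the heap has no index on heap_test_id yet; read counts depend on your data, so compare the Messages tab and the plans rather than expecting specific numbers.

```sql
SET STATISTICS IO ON;

-- Singleton lookup: the clustered table should use a Clustered Index Seek,
-- with logical reads close to the index depth (roughly one page per level).
-- A heap with no index on heap_test_id has to scan every page instead.
SELECT * FROM dbo.clustered_index_test WHERE heap_test_id = 698;
SELECT * FROM dbo.heap_test            WHERE heap_test_id = 698;

-- Ordered read: the clustered index leaf level is already in heap_test_id
-- order, so the optimizer can usually avoid a Sort operator here; the heap
-- typically needs a Top N Sort. ORDER BY is what guarantees the order.
SELECT TOP (100) * FROM dbo.clustered_index_test ORDER BY heap_test_id;
SELECT TOP (100) * FROM dbo.heap_test            ORDER BY heap_test_id;

SET STATISTICS IO OFF;
```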

Logical vs. Physical Storage

It is important to note that a clustered index does not describe a physical structure on disk. Pages are 8 KB units of storage allocated to hold index and row data. While pages within the index reference each other through linked-list pointers, they do not have to be stored in any particular order on disk. How data is physically laid out depends on the type of storage used, SQL Server configuration settings, and how often data is written.

As data is inserted, updated, and deleted, the amount of data in each page shifts, growing or shrinking. In a clustered index, if a row must go onto a page that is already full, SQL Server splits the page, moving roughly half of its rows to a newly allocated page. Heaps do not split pages; instead, when an updated row no longer fits on its page, the row is moved to another page and a forwarding pointer is left in its original location, so reads must follow an extra hop. Page splits and forwarded records both contribute to fragmentation, and due to their unordered nature, heaps tend to accumulate it faster in most common use cases.
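
Both kinds of fragmentation can be measured. A minimal sketch, assuming the article's two test tables: sys.dm_db_index_physical_stats reports logical fragmentation, page fullness, and, for heaps, forwarded_record_count (which is NULL in LIMITED mode, hence DETAILED here). The rebuild statements at the end show the usual remedies; whether and when to run them is a judgment call for your own workload.

```sql
-- Leaf-level fragmentation for both tables, including any nonclustered indexes.
SELECT
    t.table_name,
    ps.index_id,                         -- 0 = heap, 1 = clustered, >1 = nonclustered
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.avg_page_space_used_in_percent,   -- how full the pages are
    ps.forwarded_record_count,           -- heaps only; NULL for other index types
    ps.page_count
FROM (VALUES (N'dbo.heap_test'), (N'dbo.clustered_index_test')) AS t (table_name)
CROSS APPLY sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(t.table_name), NULL, NULL, N'DETAILED') AS ps
WHERE ps.index_level = 0;                -- leaf / data pages only

-- Rebuilding a heap removes its forwarded records (SQL Server 2008 and later).
-- It also rebuilds every nonclustered index on the heap, because RIDs change.
ALTER TABLE dbo.heap_test REBUILD;

-- The clustered table is defragmented by rebuilding its clustered index.
ALTER INDEX CI_clustered_index_test ON dbo.clustered_index_test REBUILD;
```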

Notes on Temporary and In-Memory Objects

All demos in this article will be based on permanent, physical tables. In general, these results will mirror the performance you see on temporary tables or table variables when considering solely the impact of using a heap.

Memory-optimized objects are a different implementation altogether and should not be designed in the same fashion as standard tables. In-memory objects do not require a clustered index, though they must have at least one nonclustered index defined on them, which behaves similarly to how a clustered index would normally behave. Additional indexes, such as hash indexes, may be added as needed to support other filtering, sorting, or aggregation needs. Since data is not stored on pages, fragmentation is not the concern it is with disk-based tables.

To summarize: This article is not a discussion of memory-optimized tables and the advice and ideas here should not be applied to any in-memory objects.

Performance Comparison

The remainder of our work will compare tables with clustered indexes to heaps and draw some conclusions about their behavior and performance. This process inspects INSERT, UPDATE, SELECT, and DELETE operations, using data from AdventureWorks to populate these tables quickly. The end result is a comparison of reads, writes, table size, and query performance for each table with and without indexes.
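
The article does not show how its read counts were collected. One common approach, assumed here, is to enable SET STATISTICS IO and SET STATISTICS TIME for the session and run each statement once per table, with SSMS's "Include Actual Execution Plan" option turned on for the plans. A minimal sketch:

```sql
-- Per-statement logical/physical reads and CPU/elapsed time are printed
-- to the Messages tab in SSMS for everything run while these are ON.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run each statement being compared once per table, for example:
SELECT COUNT(*) FROM dbo.heap_test;
SELECT COUNT(*) FROM dbo.clustered_index_test;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running each test several times and ignoring the first (cold cache) run tends to give more stable numbers.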

Let’s begin by creating two identical tables, except that one has a clustered index and the other does not:

CREATE TABLE dbo.heap_test
(
    heap_test_id INT NOT NULL IDENTITY(1,1),
    person_first_name VARCHAR(100) NOT NULL,
    person_last_name VARCHAR(100) NOT NULL,
    person_location_id INT NOT NULL,
    person_birth_date DATE NULL,
    last_activity_time DATETIMEOFFSET NOT NULL,
    created_time DATETIMEOFFSET NOT NULL
);
CREATE TABLE dbo.clustered_index_test
(
    heap_test_id INT NOT NULL IDENTITY(1,1),
    person_first_name VARCHAR(100) NOT NULL,
    person_last_name VARCHAR(100) NOT NULL,
    person_location_id INT NOT NULL,
    person_birth_date DATE NULL,
    last_activity_time DATETIMEOFFSET NOT NULL,
    created_time DATETIMEOFFSET NOT NULL
);
CREATE CLUSTERED INDEX CI_clustered_index_test ON dbo.clustered_index_test (heap_test_id);

INSERT Operations

Let’s insert some data into these tables from Person.Person:

INSERT INTO dbo.heap_test
    (person_first_name, person_last_name, person_location_id, person_birth_date, last_activity_time, created_time)
SELECT Person.FirstName, Person.LastName, 1, Person.ModifiedDate, Person.ModifiedDate, GETUTCDATE()
FROM Person.Person;
INSERT INTO dbo.clustered_index_test
    (person_first_name, person_last_name, person_location_id, person_birth_date, last_activity_time, created_time)
SELECT Person.FirstName, Person.LastName, 1, Person.ModifiedDate, Person.ModifiedDate, GETUTCDATE()
FROM Person.Person;

At first glance, these statements seem identical, and reviewing the execution plans confirms their similarities:

Aside from the insert operator itself, the row counts and query costs are identical. Now, let’s look at the IO statistics for these operations:

On our first insert, the heap required 20 times as many reads as the clustered index. However, as we insert more rows, the reads needed to write to the heap stay about the same, while the reads on the clustered index table increase:

If we keep inserting, reads settle around this level: still more efficient than inserting into a heap, but less dramatically so than when the table was first created. In terms of query duration, the clustered index performs about as well as the heap in this scenario. If we were to add a nonclustered index to the heap, we would see its reads go up even further.
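
The article does not show the nonclustered index it later adds to the heap. Assuming it is a single-column index on heap_test_id (which the later seek examples on the heap rely on), it might look like this; the index name is hypothetical.

```sql
-- Hypothetical index name; assumes the heap's index is on heap_test_id.
-- Each row in a nonclustered index on a heap carries the key plus an 8-byte
-- row identifier (RID: file, page, slot) that points back into the heap.
CREATE NONCLUSTERED INDEX NCI_heap_test_heap_test_id
    ON dbo.heap_test (heap_test_id);
```

Every insert into the heap now also has to write to this index, which is where the additional reads and writes come from.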

Repeated test runs return similar results, which shows that in this scenario a heap is not more efficient by any metric and is a clear loser with regard to logical IO.

Disk Space Usage

After inserting many rows and adding a nonclustered index to the heap, let’s review disk usage:

We can see that the data space is similar, with the clustered index table using slightly more space than the heap. On the other hand, the clustered index adds almost no space on top of the table’s data, whereas the nonclustered index on the heap costs about 30% extra space to maintain. If a query needs an index and that index can be the clustered index, then from the perspective of disk utilization, a clustered index is vastly preferable to a nonclustered index on a heap.
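
These space figures can be reproduced with the sp_spaceused system procedure, or broken down per index with sys.dm_db_partition_stats. A sketch, assuming both test tables exist; used_page_count is counted in 8 KB pages.

```sql
-- Rows, reserved, data, index_size, and unused space per table.
EXEC sys.sp_spaceused @objname = N'dbo.heap_test';
EXEC sys.sp_spaceused @objname = N'dbo.clustered_index_test';

-- The same information per index, converted from 8 KB pages to MB.
SELECT
    OBJECT_NAME(ps.object_id)               AS table_name,
    ps.index_id,                            -- 0 = heap, 1 = clustered, >1 = nonclustered
    SUM(ps.row_count)                       AS row_count,
    SUM(ps.used_page_count) * 8 / 1024.0    AS used_mb
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id IN (OBJECT_ID(N'dbo.heap_test'),
                       OBJECT_ID(N'dbo.clustered_index_test'))
GROUP BY ps.object_id, ps.index_id
ORDER BY table_name, ps.index_id;
```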

Otherwise, the tables are identical, with about 300k rows apiece after running the insert queries above many times.

UPDATE Operations

UPDATE dbo.heap_test
SET person_location_id = 2
WHERE person_first_name = 'Terri'
AND person_last_name = 'Duffy';
UPDATE dbo.clustered_index_test
SET person_location_id = 2
WHERE person_first_name = 'Terri'
AND person_last_name = 'Duffy';

When we execute the update statements above, we can review the execution plan and IO operations:

We can see that overall query performance is similar, with minor differences in query cost (52% vs. 48%) and IO (2136 reads vs. 2220 reads). Let’s now test an update of a row found by searching on an indexed column:

UPDATE dbo.heap_test
SET person_location_id = 2
WHERE heap_test_id = 2;
UPDATE dbo.clustered_index_test
SET person_location_id = 2
WHERE heap_test_id = 2;

The resulting performance is as follows:

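The execution plans are basically equivalent: they use different operators, but at a similar cost. Updating a single value takes more effort on the heap than on the clustered index table, because the heap must first seek its nonclustered index and then locate the row. This will generally be true regardless of how much data is in the table or how much we want to update. Searching for data via a nonclustered index on a heap requires a RID lookup to reach the rest of the row, a step the clustered index avoids because its leaf level is the data itself.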

SELECT Operations

Let’s start with a table scan based on a filter on an unindexed column:

SELECT *
FROM dbo.heap_test
WHERE person_location_id = 2;
SELECT *
FROM dbo.clustered_index_test
WHERE person_location_id = 2;

The results are as follows:

The heap’s execution plan and reads were similar to those of the clustered index table, though its reads were slightly lower. Reading an entire table is not an unusual use case, but it is an undesirable one for most large data sets. A more common scenario that we would care about is a seek based on an indexed value, as in this example:

SELECT *
FROM dbo.heap_test
WHERE heap_test_id = 2;
SELECT *
FROM dbo.clustered_index_test
WHERE heap_test_id = 2;

Here are the execution plans and IO stats for these queries:

In this case, a seek on the heap’s nonclustered index requires a RID lookup to retrieve the additional columns requested by the query, making this a significantly more expensive option. While an important index on a heap could be converted into a covering index, that would add storage and maintenance overhead on top of what we have already allocated. Reads are also slightly higher, and will generally be higher for most seek operations compared with using a clustered index.
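
To remove the RID lookup, the heap’s index would have to cover every column the query returns. The sketch below, with a hypothetical index name, does that with INCLUDE. For a SELECT * query it effectively stores a second copy of the table, which is exactly the storage and maintenance overhead described above.

```sql
-- Hypothetical covering index: heap_test_id is the key, and every other
-- column is stored at the leaf level via INCLUDE, so
-- SELECT * ... WHERE heap_test_id = 2 can be answered without a RID Lookup.
CREATE NONCLUSTERED INDEX NCI_heap_test_covering
    ON dbo.heap_test (heap_test_id)
    INCLUDE (person_first_name, person_last_name, person_location_id,
             person_birth_date, last_activity_time, created_time);
```

Re-running the seek against dbo.heap_test afterward should show an Index Seek on this index with no RID Lookup, while the clustered table gets the same result with no extra structure at all.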

DELETE Operations

Deleting rows takes similar effort to a SELECT operation, but we will find that deletes against a heap require more IO than deletes against a clustered index. Consider the following example, where we remove the rows we were just returning:

DELETE
FROM dbo.heap_test
WHERE heap_test_id = 2;
DELETE
FROM dbo.clustered_index_test
WHERE heap_test_id = 2;

Here are the performance results:

The cost to delete rows from a heap is significantly higher than from a clustered index, in terms of both query cost and reads. These costs vary with the indexing on the table, and more indexes typically increase write costs, but overall, maintaining indexes and deleting rows costs more when the table is a heap than when it has a clustered index.
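
Deletes from heaps carry one more cost worth knowing about: according to Microsoft’s DELETE documentation, pages emptied by a delete that uses row or page locks can remain allocated to the heap. Taking a table lock with the TABLOCK hint allows those pages to be deallocated. A sketch against the article’s heap, reusing the person_location_id = 2 filter from the UPDATE examples; note that it locks the whole table while it runs.

```sql
-- Pages currently used by the heap's data (index_id = 0).
SELECT SUM(used_page_count) AS heap_used_pages
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(N'dbo.heap_test')
  AND index_id = 0;

-- TABLOCK takes a table-level lock, which lets SQL Server deallocate pages
-- that the delete leaves empty instead of keeping them assigned to the heap.
DELETE FROM dbo.heap_test WITH (TABLOCK)
WHERE person_location_id = 2;

-- Re-run the page count to compare with the figure above.
SELECT SUM(used_page_count) AS heap_used_pages
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(N'dbo.heap_test')
  AND index_id = 0;
```

The page count only drops when the delete empties whole pages, so a filter that matches a handful of rows may show no visible change.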

Notes on Statistics and Execution Plan Quality

The query optimizer has the challenging task of producing a good execution plan in a short time from statistics and other query metrics. Lacking any ordered index, queries against heaps will occasionally get poor execution plans. This is not common, but it most often shows up when a heap is joined to other tables.

If you run into a query that is producing a poor plan and involves a heap, consider testing the addition of a clustered index to the table. Even if the table is scanned, an ordered data set can perform significantly better in some circumstances, either by removing an extra sort operation or by giving the optimizer more metadata with which to make a better plan decision. This is especially true when working with temporary objects, where we often neglect indexing needs. This leads us to an additional topic that is worth introducing…
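
If testing shows that a heap should become a clustered table, the conversion is a single CREATE CLUSTERED INDEX statement. A sketch using the article’s heap and a hypothetical index name. Any nonclustered indexes on the heap locate rows by RID, so SQL Server rebuilds them to use the clustering key, which takes time and log space on large tables.

```sql
-- Hypothetical index name. Converts dbo.heap_test from a heap into a
-- clustered table ordered by heap_test_id; existing nonclustered indexes
-- are rebuilt so that they reference the clustering key instead of RIDs.
-- On Enterprise or Developer edition, adding WITH (ONLINE = ON) keeps the
-- table available to queries during the build.
CREATE CLUSTERED INDEX CI_heap_test
    ON dbo.heap_test (heap_test_id);
```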

Temporary Objects

We often create temporary tables or table variables in SQL Server to facilitate the collection or transformation of data via intermediary steps, prior to moving that data into a permanent data store or reporting target. Due to their transient nature, temporary objects rarely get the same level of architecture scrutiny that standard tables receive. As a result, indexes, statistics, and constraints are ignored in favor of expediency.

All of the performance considerations and experiments discussed thus far apply to temporary objects as well. If you are troubleshooting a poorly performing query against an unindexed temporary table or table variable, experiment with adding indexes that support your queries. As always, these indexes incur creation and maintenance costs, so they should be added when query performance is poor enough to justify those costs.

For temporary objects that are written to repeatedly before their final consumption, indexes can benefit each of those operations greatly. In general, if you are unsure whether a temporary object should be indexed, err on the side of caution and add at least a clustered index if there will be any filters or joins against it.
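
Indexing a temporary object takes only a line or two. A sketch with hypothetical object names, assuming AdventureWorks’ Person.Person as the source: a temporary table can be given a clustered index after it is created, while a table variable must declare its index inline, here as a clustered primary key.

```sql
-- Temporary table: create it, then cluster it on the column used for
-- filters and joins.
CREATE TABLE #person_staging
(
    person_id         INT          NOT NULL,
    person_first_name NVARCHAR(50) NOT NULL,
    person_last_name  NVARCHAR(50) NOT NULL
);
CREATE CLUSTERED INDEX CI_person_staging ON #person_staging (person_id);

-- Table variable: indexes must be declared inline; a clustered primary key
-- works on every supported version of SQL Server.
DECLARE @person_staging TABLE
(
    person_id         INT          NOT NULL PRIMARY KEY CLUSTERED,
    person_first_name NVARCHAR(50) NOT NULL,
    person_last_name  NVARCHAR(50) NOT NULL
);

INSERT INTO #person_staging (person_id, person_first_name, person_last_name)
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person;

INSERT INTO @person_staging (person_id, person_first_name, person_last_name)
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person;
```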

For scenarios in which temporary table performance is critical, an even better solution is to use a memory-optimized table variable, which allows temporary data to be stored in memory, rather than in TempDB.
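
A memory-optimized table variable is declared from a memory-optimized table type. A sketch with a hypothetical type name, assuming the database already has a MEMORY_OPTIMIZED_DATA filegroup (required for In-Memory OLTP, which is available from SQL Server 2014, and in all editions from SQL Server 2016 SP1). Memory-optimized table types must declare at least one index.

```sql
-- Hypothetical type name. Requires a MEMORY_OPTIMIZED_DATA filegroup in the
-- current database. The type must declare at least one index.
CREATE TYPE dbo.person_staging_type AS TABLE
(
    person_id         INT          NOT NULL,
    person_first_name NVARCHAR(50) NOT NULL,
    person_last_name  NVARCHAR(50) NOT NULL,
    INDEX IX_person_id NONCLUSTERED (person_id)
)
WITH (MEMORY_OPTIMIZED = ON);
GO  -- batch separator (SSMS/sqlcmd): the type must exist before it is used

DECLARE @person_staging dbo.person_staging_type;

INSERT INTO @person_staging (person_id, person_first_name, person_last_name)
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person;

SELECT COUNT(*) AS staged_rows FROM @person_staging;
```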

The Exceptions

In general, heaps will perform worse than tables with clustered indexes. There is a very limited set of scenarios in which a heap will offer superior performance. These scenarios are typically those in which table scans are desired and there are few joins being made against the object. For example, our UPDATE operations above showed a heap as performing marginally better, with the reason primarily being that a scan was required to locate the data needed for the update.

Our job as technology experts is to identify common use cases and code for them, and to treat exceptions as one-offs that we handle as needed. Creating heaps without a well-thought-out, documented purpose is therefore likely to create more problems than it solves. We should use heaps only when we truly know what we are doing, have tested, and have proven that they will indeed perform better than a table with a clustered index.

This methodology should be used throughout our work in general, which will help ensure that we do not fixate on an exception, turn it into a rule, and make poor decisions based on it.

Conclusion

The only true test of performance will be the tests we perform ourselves using our own data and schema. Only after rigorous testing should we even consider using heaps when designing a database schema. Heaps will typically perform worse than tables with clustered indexes, and sometimes those performance problems won’t become apparent until a future time when data has grown or app/code complexity has increased.

This generalization applies to temporary objects as well. We should give table variables and temporary tables the same level of architectural rigor that we apply to our permanent objects. Heaps should be used sparingly and only in scenarios where we have a high level of certainty that they will not hinder performance.

This topic attracts a wide array of misinformation across the internet. Words such as “always” and “never” are thrown around with little thought as to their meaning. Use heaps with caution and consider how an application will grow over time. A project to migrate billions of rows from a heap into a clustered index will be a hassle for anyone tasked with it, so thinking ahead and including the clustered index up front will save your future self that headache.

Translated from: https://www.sqlshack.com/clustered-index-vs-heap/