Lock configurations with SQL bulk insert

One challenge we may face when using SQL bulk insert is whether we want to allow access during the operation or prevent it, and how we coordinate this with possible subsequent transactions. We’ll look at working with a few configurations of this tool and how we may apply them in OLAP, OLTP, and mixed environments where we may want to use the tool’s flexibility for our data import needs.

Considerations

The first point to consider is whether our environment should lock transactions during SQL bulk inserts and loads, or whether we should still allow access. OLAP and OLTP environments may have different requirements, with the former allowing us to use locks more than the latter, because OLTP environments tend to have a “live” requirement. If we have a schedule where we must load data over a period before reporting, we will have more flexibility to load data with hints that can increase performance. Also, for hybrid environments, we may have tables where we can use these hints on the table level or even during the actual insert.

The next consideration is whether we want to lock transactions on the transaction level or the table level. If we use a table solely for bulk loads, we may have more flexibility here. If our table is involved in both SQL bulk inserts and data feeds, where data is fed to the table frequently, we may want to avoid locks. With data feeds, we also expect frequent reads to pick up the new data, making locks a possible problem for live or delayed reporting.

Experiments with Load Configurations

We’ll look at an example by loading a file’s data into a table and experimenting with various lock techniques for the table. In our example, we will have a file of over 2 million lines with the sentence “The quick brown fox jumped over the lazy dogs” starting on line 2 and repeating. The first line (which we’ll skip in our SQL bulk insert) will have “Data” and we can create this file using any scripting language that allows a loop, or we can copy and paste the lines ourselves in batches. While this example uses this full sentence, we could also use one letter on each line or one word on each line.

As an alternative, if you already have a large custom text file for testing imports, you can use that file, provided the mapping involves one column or you have a table with identical mappings for the import. The timed load results shown in the examples below may differ depending on your system and data.

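The article notes that any scripting language with a loop can build this file. Below is a minimal sketch in Python; the function name, the default line count, and the Windows path in the usage comment are assumptions to adjust for your own system.

```python
# Minimal sketch (assumed names and path): build a test import file with a
# header line ("Data") followed by repeated copies of the test sentence.

SENTENCE = "The quick brown fox jumped over the lazy dogs"

def write_test_file(path: str, lines: int = 2_000_000) -> None:
    """Write one header line, then `lines` copies of SENTENCE, one per line."""
    with open(path, "w", newline="\n") as f:
        f.write("Data\n")
        for _ in range(lines):
            f.write(SENTENCE + "\n")

# Usage (assumed path, matching the BULK INSERT examples in this article):
# write_test_file(r"C:\Import\import.txt")
```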
Example: the first five lines of our test import file

Once our file is large enough (in my example, over 2 million lines), we SQL bulk insert the file’s data into a table created with one column. Since we’re mapping each line of data to one column, we do not specify a field terminator. The image below shows the first five results selected from our bulk load into the table we created. In the code, I include a commented-out DROP TABLE that can be run when testing is finished.

CREATE TABLE tblImport (FileData VARCHAR(MAX))

BULK INSERT tblImport
FROM 'C:\Import\import.txt'
WITH (
    ROWTERMINATOR = '\n',
    FIRSTROW = 2
)

SELECT *
FROM tblImport

---- Remove table when finished
--DROP TABLE tblImport

The first five results of our SQL bulk insert.

The load consumed 7 seconds for 2.45 million rows.

When I ran this SQL bulk insert, the execution time from start to finish was 7 seconds (within the 5 to 10 second range). Now, we’ll run the same insert, adding one specification of TABLOCK for our bulk load operation, locking the table during the load:

BULK INSERT tblImport
FROM 'C:\Import\import.txt'
WITH (
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,
    TABLOCK
)

The result is a reduced time to insert the same amount of data.

The same transaction with a table lock on the actual insert, consuming 4 seconds.

The advantage of specifying a table lock on the bulk load level is that if this table were used for both a reporting schedule and a feed, we wouldn’t be locking the table on every load; only the large insert would take a table lock, if that’s the specification. This means that if we ran 100 SQL bulk inserts on the table throughout the day and 1 of those loads required a performance boost, along with locked access to the table due to the nature of the load, we could use the TABLOCK specification for that 1 load while the other 99 loads remain unaffected. This is useful in these mixed contexts.

According to Microsoft’s notes on this option, the lock is held only for the duration of the actual bulk load; in other words, if further transforms followed in the same transaction, the lock would not apply to them (we would want to specify lock hints for those statements as well, if that is the desired behavior). Likewise, we can bulk load the same table simultaneously even if this option is specified, provided the destination table of the load has no indexes (columnstore indexes being the exception here).

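To illustrate that note, here is a hedged sketch of holding a lock past the bulk load by wrapping the load and a follow-up transform in one transaction. The UPDATE statement and its TABLOCKX hint are illustrative assumptions, not part of the original example.

```sql
-- Hedged sketch: the TABLOCK taken by the bulk load ends when the load ends,
-- so a follow-up transform in the same transaction needs its own lock hint
-- (TABLOCKX here) if we want the table to stay inaccessible until commit.
BEGIN TRANSACTION;

BULK INSERT tblImport
FROM 'C:\Import\import.txt'
WITH (ROWTERMINATOR = '\n', FIRSTROW = 2, TABLOCK);

-- Hypothetical transform; the exclusive table lock is held until COMMIT.
UPDATE tblImport WITH (TABLOCKX)
SET FileData = UPPER(FileData);

COMMIT TRANSACTION;
```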
What about the scenario where the table is used only for a reporting schedule, where any SQL bulk insert must lock the table during the load? We could still specify the TABLOCK option in our code during the actual insert or on the transaction level, but we can also set this option on the table level. In the code below, we set the lock on the table level using the Microsoft procedure sp_tableoption and run a check to ensure the option was saved successfully.

SELECT [name] TableName,
       CASE WHEN lock_on_bulk_load = 0 THEN 'False' ELSE 'True' END AS SBILock
FROM sys.tables
WHERE [name] = 'tblImport'

EXEC sp_tableoption 'tblImport', 'table lock on bulk load', '1'

SELECT [name] TableName,
       CASE WHEN lock_on_bulk_load = 0 THEN 'False' ELSE 'True' END AS SBILock
FROM sys.tables
WHERE [name] = 'tblImport'

The results of our query checking the lock on bulk load option for the specific table.

Now, when I run the bulk load with this option set on the table level by the code above, and with the TABLOCK option removed from the insert, I get a similar time to the run that specified the lock on the insert:

BULK INSERT tblImport
FROM 'C:\Import\import.txt'
WITH (
    ROWTERMINATOR = '\n',
    FIRSTROW = 2
)

With a lock on the table level, we see a similar result to the TABLOCK option specified on the SQL bulk insert.

The advantage here, in appropriate development contexts, is that we wouldn’t need to specify the TABLOCK option on each of our SQL bulk insert calls. It also means that the table would be locked during every load.

As a note, to disable this option on the table, we would run the below call to the Microsoft stored procedure sp_tableoption:

EXEC sp_tableoption 'tblImport', 'table lock on bulk load', '0'

Final Thoughts

  • Do we have regular reports with live data, or scheduled reports fed by SQL bulk inserts? We may use locks where scheduled reports must be completed by a set time, whereas bulk loads against live data may require consistent access
  • With some exceptions, we may find it most appropriate to lock a table during a bulk load on the load itself, or on the transaction level if transforms immediately follow and we want no access granted during this time
  • While we’ve looked at adding locks on a table when we don’t want anyone to access it while we SQL bulk insert data (increasing performance), we can also apply additional performance measures, such as removing indexes, or dropping and re-creating the table if we do not require the table’s existence before loading data

Translated from: https://www.sqlshack.com/lock-configurations-with-sql-bulk-insert/
