PostgreSQL locks

Earlier articles covered object-level locks (specifically, relation-level locks), as well as row-level locks and their connection to object-level locks, and also explored wait queues, which are not always fair.

We have a hodgepodge this time. We'll start with deadlocks (I had planned to discuss them last time, but that article was already excessively long), then briefly review the remaining object-level locks, and finally discuss predicate locks.

Deadlocks

When using locks, we can confront a deadlock. It occurs when one transaction tries to acquire a resource that is already in use by another transaction, while the second transaction tries to acquire a resource that is in use by the first. The figure on the left below illustrates this: solid-line arrows indicate acquired resources, while dashed-line arrows show attempts to acquire a resource that is already in use.

To visualize a deadlock, it is convenient to build the wait-for graph. To do this, we remove specific resources, leave only transactions, and indicate which transaction waits for which other one. If the graph contains a cycle (that is, following the arrows from some vertex leads back to that same vertex), this is a deadlock.

A deadlock can certainly occur not only for two transactions, but for any larger number of them.
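
The cycle search in a wait-for graph can be sketched in a few lines of Python. This is only an illustration of the idea, not PostgreSQL's actual detector: the function name find_cycle and the dictionary representation of the graph (transaction name mapped to the transactions it waits for) are assumptions made for this example.

```python
from typing import Dict, List, Optional


def find_cycle(wait_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph as a list of vertices, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {v: WHITE for v in wait_for}
    path: List[str] = []

    def visit(v: str) -> Optional[List[str]]:
        color[v] = GRAY
        path.append(v)
        for w in wait_for.get(v, []):
            state = color.get(w, WHITE)
            if state == GRAY:
                # w is already on the current path, so the walk returned to it
                return path[path.index(w):] + [w]
            if state == WHITE:
                found = visit(w)
                if found:
                    return found
        path.pop()
        color[v] = BLACK
        return None

    for v in list(wait_for):
        if color[v] == WHITE:
            found = visit(v)
            if found:
                return found
    return None


# T1 waits for T2, T2 waits for T3, T3 waits for T1: a deadlock of three
deadlocked = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}
# T1 waits for T2, T2 waits for T3, T3 waits for nobody: an ordinary queue
queued = {"T1": ["T2"], "T2": ["T3"], "T3": []}
```

For the first graph the function returns the cycle T1, T2, T3, T1; for the second it returns None, because the chain of waits ends at a transaction that is not blocked.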

If a deadlock occurs, the involved transactions can do nothing but wait indefinitely. Therefore, all DBMSs, including PostgreSQL, detect deadlocks automatically.

The check, however, requires a certain effort, and it's undesirable to perform it each time a new lock is requested (deadlocks are pretty infrequent after all). So, when a process tries to acquire a lock but cannot, it queues and «falls asleep», setting a timer to the value of the deadlock_timeout parameter (1 second by default). If the resource is freed earlier, fine: the check is skipped. But if the wait is still going on when deadlock_timeout expires, the waiting process wakes up and initiates the check.

If the check (which consists of building the wait-for graph and searching it for cycles) does not detect a deadlock, the process goes back to sleep, this time «until final victory».

Earlier, I was fairly reproached in the comments for not mentioning the lock_timeout parameter, which affects any statement and makes it possible to avoid an infinitely long wait: if a lock cannot be acquired within the specified time, the statement terminates with a lock_not_available error. Do not confuse this parameter with statement_timeout, which limits the total execution time of a statement, regardless of whether it is waiting for a lock or doing regular work.

But if a deadlock is detected, one of the transactions (in most cases, the one that initiated the check) is forced to abort. This releases the locks it acquired and enables other transactions to continue.

Deadlocks usually mean that the application is designed incorrectly. There are two ways to detect such situations: first, messages appear in the server log, and second, the value of pg_stat_database.deadlocks increases.

Example of deadlocking

Usually deadlocks are caused by an inconsistent order of locking table rows.

Let's consider a simple example. The first transaction is going to transfer 100 rubles from the first account to the second one. To this end, the transaction reduces the first account:

=> BEGIN;
=> UPDATE accounts SET amount = amount - 100.00 WHERE acc_no = 1;

UPDATE 1

At the same time, the second transaction is going to transfer 10 rubles from the second account to the first one. And it starts with reducing the second account:

|  => BEGIN;
|  => UPDATE accounts SET amount = amount - 10.00 WHERE acc_no = 2;

|  UPDATE 1

Now the first transaction tries to increase the second account, but detects a lock on the row.

=> UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 2;

Then the second transaction tries to increase the first account, but also gets blocked.

|  => UPDATE accounts SET amount = amount + 10.00 WHERE acc_no = 1;

So a circular wait arises, which won't end on its own. In a second, the first transaction, which cannot access the resource yet, initiates a check for a deadlock and is forced to abort by the server.

ERROR:  deadlock detected
DETAIL:  Process 16477 waits for ShareLock on transaction 530695; blocked by process 16513.
Process 16513 waits for ShareLock on transaction 530694; blocked by process 16477.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (0,2) in relation "accounts"

Now the second transaction can continue.

|  UPDATE 1

|  => ROLLBACK;

=> ROLLBACK;

The correct way to perform such operations is to lock resources in the same order. For example: in this case, accounts can be locked in ascending order of their numbers.
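
On the application side, the ordering rule might look like the following Python sketch. It only builds the two UPDATE statements of a transfer so that the account with the smaller number is always touched first; the function name transfer_statements and the %s placeholder style (as used by some PostgreSQL drivers) are assumptions for this example, and actually executing the statements in one transaction is left to the caller.

```python
from decimal import Decimal
from typing import List, Tuple

Statement = Tuple[str, Tuple[Decimal, int]]


def transfer_statements(src: int, dst: int, amount: Decimal) -> List[Statement]:
    """Build the two UPDATEs of a transfer, ordered by ascending acc_no."""
    if src == dst:
        raise ValueError("source and destination accounts must differ")
    sql = "UPDATE accounts SET amount = amount + %s WHERE acc_no = %s"
    # The source account is decreased and the destination is increased
    changes = [(src, -amount), (dst, amount)]
    # Always touch (and therefore lock) the lower account number first
    changes.sort(key=lambda change: change[0])
    return [(sql, (delta, acc_no)) for acc_no, delta in changes]


first = transfer_statements(1, 2, Decimal("100.00"))
second = transfer_statements(2, 1, Decimal("10.00"))
```

With this ordering both transfers from the example start with account 1, so the second transaction simply waits for the first one to complete instead of forming a cycle.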

Deadlock of two UPDATE commands

Sometimes we can get a deadlock in situations where, seemingly, it could never occur. For example: it is convenient and usual to treat SQL commands as atomic, but the UPDATE command locks rows as they are updated. This does not happen instantaneously. Therefore, if the order in which a command updates rows is inconsistent with the order in which another command does this, a deadlock can occur.

Although such a situation is unlikely, it can still occur. To reproduce it, we will create an index on the amount column in descending order of amount:

=> CREATE INDEX ON accounts(amount DESC);

To be able to watch what happens, let's create a function that increases the passed value, but very-very slowly, for as long as an entire second:

=> CREATE FUNCTION inc_slow(n numeric) RETURNS numeric AS $$
  SELECT pg_sleep(1);
  SELECT n + 100.00;
$$ LANGUAGE SQL;

We will also need the pgrowlocks extension.

=> CREATE EXTENSION pgrowlocks;

The first UPDATE command will update the entire table. The execution plan is evident: it is a sequential scan.

|  => EXPLAIN (costs off)
|  UPDATE accounts SET amount = inc_slow(amount);

|           QUERY PLAN
|  ----------------------------
|   Update on accounts
|     ->  Seq Scan on accounts
|  (2 rows)

Since tuples on the table page are located in ascending order of the amount (exactly how we added them), they will also be updated in the same order. Let the update start.

|  => UPDATE accounts SET amount = inc_slow(amount);

At the same time, in another session we'll forbid sequential scans:

||     => SET enable_seqscan = off;

In this case, for the next UPDATE statement, the planner decides to use an index scan:

||     => EXPLAIN (costs off)
||     UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100.00;

||                            QUERY PLAN
||     --------------------------------------------------------
||      Update on accounts
||        ->  Index Scan using accounts_amount_idx on accounts
||              Index Cond: (amount > 100.00)
||     (3 rows)

The second and third rows meet the condition, and since the index is built in descending order of the amount, the rows will be updated in a reverse order.

Let's run the next update.

||     => UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100.00;

A quick look into the table page shows that the first statement has already managed to update the first row (0,1) and the second statement has updated the last row (0,3):

=> SELECT * FROM pgrowlocks('accounts') \gx

-[ RECORD 1 ]-----------------
locked_row | (0,1)
locker     | 530699            <- the first
multi      | f
xids       | {530699}
modes      | {"No Key Update"}
pids       | {16513}
-[ RECORD 2 ]-----------------
locked_row | (0,3)
locker     | 530700            <- the second
multi      | f
xids       | {530700}
modes      | {"No Key Update"}
pids       | {16549}

One more second elapses. The first statement has updated the second row, and the second statement would like to do the same, but cannot.

=> SELECT * FROM pgrowlocks('accounts') \gx

-[ RECORD 1 ]-----------------
locked_row | (0,1)
locker     | 530699            <- the first
multi      | f
xids       | {530699}
modes      | {"No Key Update"}
pids       | {16513}
-[ RECORD 2 ]-----------------
locked_row | (0,2)
locker     | 530699            <- the first was quicker
multi      | f
xids       | {530699}
modes      | {"No Key Update"}
pids       | {16513}
-[ RECORD 3 ]-----------------
locked_row | (0,3)
locker     | 530700            <- the second
multi      | f
xids       | {530700}
modes      | {"No Key Update"}
pids       | {16549}

Now the first statement would like to update the last table row, but it is already locked by the second statement. Hence a deadlock.

One of the transactions aborts:

||     ERROR:  deadlock detected
||     DETAIL:  Process 16549 waits for ShareLock on transaction 530699; blocked by process 16513.
||     Process 16513 waits for ShareLock on transaction 530700; blocked by process 16549.
||     HINT:  See server log for query details.
||     CONTEXT:  while updating tuple (0,2) in relation "accounts"

And the second one continues:

|  UPDATE 3

Details on detecting and preventing deadlocks can be found in the lock manager README.

This completes the discussion of deadlocks, and we proceed to the remaining object-level locks.

Locks on non-relations

When we need to lock a resource that is not a relation in the PostgreSQL sense, locks of the object type are used. Almost anything we can think of can be such a resource: tablespaces, subscriptions, schemas, enumerated data types and so on. Roughly, this is everything that can be found in the system catalog.

Let's illustrate this with a simple example. We start a transaction and create a table in it:

=> BEGIN;
=> CREATE TABLE example(n integer);

Now let's see what locks of the object type appeared in pg_locks:

=> SELECT database,
  (SELECT datname FROM pg_database WHERE oid = l.database) AS dbname,
  classid,
  (SELECT relname FROM pg_class WHERE oid = l.classid) AS classname,
  objid, mode, granted
FROM pg_locks l
WHERE l.locktype = 'object' AND l.pid = pg_backend_pid();

 database | dbname | classid |  classname   | objid |      mode       | granted
----------+--------+---------+--------------+-------+-----------------+---------
        0 |        |    1260 | pg_authid    | 16384 | AccessShareLock | t
    16386 | test   |    2615 | pg_namespace |  2200 | AccessShareLock | t
(2 rows)

To figure out what in particular is locked here, we need to look at three fields: database, classid and objid. We start with the first line.

database is the OID of the database that the resource being locked relates to. In this case, this column contains zero. It means that we deal with a global object, which is not specific to any database.
classid contains the OID, from pg_class, of the system catalog table that determines the resource type. In this case, it is pg_authid, that is, the resource is a role (user).
objid contains the OID from the system catalog table indicated by classid.

=> SELECT rolname FROM pg_authid WHERE oid = 16384;

 rolname
---------
 student
(1 row)

We work as student, and this is exactly the role locked.

Now let's clarify the second line. Here the database is specified: it is test, the database we are connected to.

classid indicates the pg_namespace table, which contains schemas.

=> SELECT nspname FROM pg_namespace WHERE oid = 2200;

 nspname
---------
 public
(1 row)

This shows that the public schema is locked.

So, we've seen that when an object is created, the owner role and schema in which the object is created get locked (in a shared mode). And this is reasonable: otherwise, someone could drop the role or schema while the transaction is not completed yet.

=> ROLLBACK;

Lock on relation extension

When the number of rows in a relation (table, index or materialized view) increases, PostgreSQL can use free space in available pages for inserts, but evidently, sooner or later new pages have to be added. Physically they are added at the end of the appropriate file. This is what is meant by relation extension.

To ensure that two processes do not rush to add pages simultaneously, the extension process is protected by a specialized lock of the extend type. The same lock is used when vacuuming indexes, so that other processes cannot add pages during the scan.

This lock is certainly released without waiting for completion of the transaction.

Page lock

Page-level locks of the page type are used in only one case (aside from predicate locks, to be discussed later).

GIN indexes enable us to accelerate search in compound values, for instance, words in text documents (or array elements). To a first approximation, these indexes can be represented as a regular B-tree that stores separate words from the documents rather than the documents themselves. Therefore, when a new document is added, the index has to be restructured considerably in order to add each new word from the document.

For better performance, a GIN index has a postponed insert feature, which is turned on by the fastupdate storage parameter. New words are quickly added to an unordered pending list first, and after a while, everything accumulated is moved to the main index structure. The gains are due to a high probability of the same words occurring in different documents.

To prevent several processes from moving entries from the pending list to the main index simultaneously, the index metapage is locked in an exclusive mode for the duration of the move. This does not hinder regular use of the index.

Advisory locks

Unlike other locks (such as relation-level locks), advisory locks are never acquired automatically — the application developer controls them. They are useful when, for instance, an application for some reason needs a locking logic that is not in line with the standard logic of regular locks.

Assume we have a hypothetical resource that does not match any database object (which we could lock using commands such as SELECT FOR or LOCK TABLE). We need to devise a numeric identifier for it. If a resource has a unique name, a simple option is to use its hash code:

=> SELECT hashtext('resource1');

 hashtext
-----------
 991601810
(1 row)
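
If the identifier has to be computed on the application side rather than in SQL, the same idea can be sketched in Python. This is an assumption-laden illustration: the helper name advisory_key is made up, SHA-256 is swapped in for hashtext, and the resulting numbers do not match what hashtext returns. The only requirement is that every process derives the key in the same way.

```python
import hashlib


def advisory_key(name: str) -> int:
    """Map a resource name to a signed 64-bit integer usable as a bigint key."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # The first 8 bytes, read as a signed big-endian integer, fit into bigint
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


key = advisory_key("resource1")
```

The value can then be passed as a parameter to the single-argument advisory lock functions, which accept a bigint key.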

This is how we acquire the lock:

=> BEGIN;
=> SELECT pg_advisory_lock(hashtext('resource1'));

As usual, information on locks is available in pg_locks:

=> SELECT locktype, objid, mode, granted
FROM pg_locks WHERE locktype = 'advisory' AND pid = pg_backend_pid();

 locktype |   objid   |     mode      | granted
----------+-----------+---------------+---------
 advisory | 991601810 | ExclusiveLock | t
(1 row)

For locking to be really effective, other processes must also acquire a lock on the resource prior to accessing it. Evidently the application must ensure that this rule is observed.

In the above example, the lock is held until the end of the session, rather than until the end of the transaction as is usual for other locks.

=> COMMIT;
=> SELECT locktype, objid, mode, granted
FROM pg_locks WHERE locktype = 'advisory' AND pid = pg_backend_pid();

 locktype |   objid   |     mode      | granted
----------+-----------+---------------+---------
 advisory | 991601810 | ExclusiveLock | t
(1 row)

And we need to explicitly release it:

=> SELECT pg_advisory_unlock(hashtext('resource1'));

A rich collection of functions is available for working with advisory locks:

pg_advisory_lock_shared acquires a shared lock.
pg_advisory_xact_lock (and pg_advisory_xact_lock_shared) acquires an exclusive (respectively, shared) lock that is held until the end of the transaction.
pg_try_advisory_lock (as well as pg_try_advisory_xact_lock and pg_try_advisory_xact_lock_shared) does not wait for a lock, but returns false if the lock cannot be acquired immediately.

A collection of try_ functions is one more technique to avoid waiting for a lock, in addition to those listed in the last article.

Predicate locks

The term predicate lock appeared long ago, when early DBMSs made the first attempts to implement complete isolation based on locks (the Serializable level, although there was no SQL standard at that time). The issue they confronted then was that even locking all read and updated rows did not ensure complete isolation: new rows that meet the same selection conditions can appear in the table, which causes phantoms to arise (see the article on isolation).

The idea of predicate locks was to lock predicates rather than rows. If, while executing a query with the condition a > 10, we lock the a > 10 predicate, nobody can add new rows that meet the condition to the table, and phantoms are avoided. The issue is that this problem is computationally complicated; in practice, it can be solved only for very simple predicates.

In PostgreSQL, the Serializable level is implemented differently, on top of the available isolation based on data snapshots. Although the predicate lock term is still used, its meaning drastically changed. Actually these «locks» block nothing; they are used to track data dependencies between transactions.

It has been proved that snapshot isolation permits an inconsistent write (write skew) anomaly and a read-only transaction anomaly, but no other anomalies are possible. To figure out whether we are dealing with one of these two anomalies, we can analyze dependencies between transactions and look for certain patterns there.

Dependencies of two kinds are of interest to us:

One transaction reads a row that is then updated by the second transaction (RW dependency).
One transaction updates a row that is then read by the second transaction (WR dependency).

We can track WR dependencies using already available regular locks, but RW dependencies have to be tracked specially.

To reiterate, despite the name, predicate locks block nothing. Instead, a check is performed at transaction commit, and if a suspicious sequence of dependencies that may indicate an anomaly is discovered, the transaction aborts.

Let's look at how predicate locks are handled. To do this, we'll create a table with a pretty large number of rows and an index on it.

=> CREATE TABLE pred(n integer);
=> INSERT INTO pred(n) SELECT g.n FROM generate_series(1,10000) g(n);
=> CREATE INDEX ON pred(n) WITH (fillfactor = 10);
=> ANALYZE pred;

If a query is executed using sequential scan of the entire table, a predicate lock on the entire table gets acquired (even if not all rows meet the filtering condition).

|  => SELECT pg_backend_pid();

|   pg_backend_pid
|  ----------------
|            12763
|  (1 row)

|  => BEGIN ISOLATION LEVEL SERIALIZABLE;
|  => EXPLAIN (analyze, costs off)
|    SELECT * FROM pred WHERE n > 100;

|                             QUERY PLAN
|  ----------------------------------------------------------------
|   Seq Scan on pred (actual time=0.047..12.709 rows=9900 loops=1)
|     Filter: (n > 100)
|     Rows Removed by Filter: 100
|   Planning Time: 0.190 ms
|   Execution Time: 15.244 ms
|  (5 rows)

All predicate locks are acquired in one special mode — SIReadLock (Serializable Isolation Read):

=> SELECT locktype, relation::regclass, page, tuple
FROM pg_locks WHERE mode = 'SIReadLock' AND pid = 12763;

 locktype | relation | page | tuple
----------+----------+------+-------
 relation | pred     |      |
(1 row)

|  => ROLLBACK;

But if a query is executed using an index scan, the situation changes for the better. For a B-tree, it is sufficient to acquire locks on the rows read and on the leaf index pages traversed; this allows tracking not only specific values, but the whole range read.

|  => BEGIN ISOLATION LEVEL SERIALIZABLE;
|  => EXPLAIN (analyze, costs off)
|    SELECT * FROM pred WHERE n BETWEEN 1000 AND 1001;

|                                       QUERY PLAN
|  ------------------------------------------------------------------------------------
|   Index Only Scan using pred_n_idx on pred (actual time=0.122..0.131 rows=2 loops=1)
|     Index Cond: ((n >= 1000) AND (n <= 1001))
|     Heap Fetches: 2
|   Planning Time: 0.096 ms
|   Execution Time: 0.153 ms
|  (5 rows)

=> SELECT locktype, relation::regclass, page, tuple
FROM pg_locks WHERE mode = 'SIReadLock' AND pid = 12763;

 locktype |  relation  | page | tuple
----------+------------+------+-------
 tuple    | pred       |    3 |   236
 tuple    | pred       |    3 |   235
 page     | pred_n_idx |   22 |
(3 rows)

Note a few complexities.

First, a separate lock is created for each read tuple, and the number of such tuples can potentially be very large. The total number of predicate locks in the system is limited by the product of parameter values: max_pred_locks_per_transaction × max_connections (the default values are 64 and 100, respectively). The memory for these locks is allocated at the server start; an attempt to exceed this limit will result in errors.

Therefore, escalation is used for predicate locks (and only for them!). Prior to PostgreSQL 10, the limits were hard-coded, but starting with this version, we can control the escalation through parameters. If the number of tuple locks related to one page exceeds max_pred_locks_per_page, these locks are replaced with one page-level lock. Consider an example:

=> SHOW max_pred_locks_per_page;

 max_pred_locks_per_page
-------------------------
 2
(1 row)

|  => EXPLAIN (analyze, costs off)
|    SELECT * FROM pred WHERE n BETWEEN 1000 AND 1002;

|                                       QUERY PLAN
|  ------------------------------------------------------------------------------------
|   Index Only Scan using pred_n_idx on pred (actual time=0.019..0.039 rows=3 loops=1)
|     Index Cond: ((n >= 1000) AND (n <= 1002))
|     Heap Fetches: 3
|   Planning Time: 0.069 ms
|   Execution Time: 0.057 ms
|  (5 rows)

We see one lock of the page type instead of three locks of the tuple type:

=> SELECT locktype, relation::regclass, page, tuple
FROM pg_locks WHERE mode = 'SIReadLock' AND pid = 12763;

 locktype |  relation  | page | tuple
----------+------------+------+-------
 page     | pred       |    3 |
 page     | pred_n_idx |   22 |
(2 rows)

Likewise, if the number of locks on pages related to one relation exceeds max_pred_locks_per_relation, these locks are replaced with one relation-level lock.

There are no other levels: predicate locks are acquired only for relations, pages and tuples and always in the SIReadLock mode.
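
The escalation rules can be made concrete with a toy Python model. It is a simplified sketch under stated assumptions, not the server's implementation: the class PredicateLocks and its methods are invented for this example, it tracks one transaction and one table only, ignores index pages, and treats both thresholds as plain positive numbers.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class PredicateLocks:
    """Toy model of predicate lock escalation for one transaction and table."""

    def __init__(self, max_per_page: int, max_per_relation: int) -> None:
        self.max_per_page = max_per_page
        self.max_per_relation = max_per_relation
        self.tuple_locks: Dict[int, Set[int]] = defaultdict(set)  # page -> tuples
        self.page_locks: Set[int] = set()
        self.relation_lock = False

    def read_tuple(self, page: int, tup: int) -> None:
        if self.relation_lock or page in self.page_locks:
            return  # already covered by a coarser lock
        self.tuple_locks[page].add(tup)
        if len(self.tuple_locks[page]) > self.max_per_page:
            self.read_page(page)  # tuple locks give way to one page lock

    def read_page(self, page: int) -> None:
        if self.relation_lock:
            return
        self.tuple_locks.pop(page, None)
        self.page_locks.add(page)
        if len(self.page_locks) > self.max_per_relation:
            # page locks give way to one relation lock
            self.page_locks.clear()
            self.tuple_locks.clear()
            self.relation_lock = True

    def held(self) -> Set[Tuple]:
        if self.relation_lock:
            return {("relation",)}
        locks: Set[Tuple] = {("page", p) for p in self.page_locks}
        for p, tuples in self.tuple_locks.items():
            locks |= {("tuple", p, t) for t in tuples}
        return locks


locks = PredicateLocks(max_per_page=2, max_per_relation=2)
locks.read_tuple(3, 235)
locks.read_tuple(3, 236)
two_tuples = locks.held()    # two tuple locks on page 3
locks.read_tuple(3, 237)     # a third tuple on the same page (hypothetical number)
one_page = locks.held()      # replaced with one page lock
```

With a threshold of 2, two tuple locks on a page coexist, and the third one triggers the replacement, which mirrors the pg_locks output in the example above.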

Certainly, escalation of locks inevitably increases the number of transactions that falsely terminate with a serialization error, and eventually, the system throughput will decrease. Here you need to balance RAM consumption and performance.

The second complexity is that different operations with an index (for instance, due to splits of index pages when new rows are inserted) change the number of leaf pages that cover the range read. But the implementation takes this into account:

=> INSERT INTO pred SELECT 1001 FROM generate_series(1,1000);
=> SELECT locktype, relation::regclass, page, tuple
FROM pg_locks WHERE mode = 'SIReadLock' AND pid = 12763;

 locktype |  relation  | page | tuple
----------+------------+------+-------
 page     | pred       |    3 |
 page     | pred_n_idx |  211 |
 page     | pred_n_idx |  212 |
 page     | pred_n_idx |   22 |
(4 rows)

|  => ROLLBACK;

By the way, predicate locks are not always released immediately on completion of the transaction since they are needed to track dependencies between several transactions. But anyway, they are controlled automatically.

By no means all types of indexes in PostgreSQL support predicate locks. Before PostgreSQL 11, only B-trees could boast of this, but that version improved the situation: hash, GiST and GIN indexes were added to the list. If index access is used, but the index does not support predicate locks, a lock on the entire index is acquired. This, certainly, also increases the number of false aborts of transactions.

Finally, note that relying on predicate locks means that all transactions must work at the Serializable level for complete isolation to be ensured. If a certain transaction uses a different level, it just won't acquire (and check) predicate locks.

The predicate locking README is a good place to start exploring the source code.

Read on.

Translated from: https://habr.com/en/company/postgrespro/blog/504498/