A translation of a TNNLS brief, for learning purposes only.
Link to the original article.
The translation (done with the help of Youdao) keeps the first person of the original. Wherever the translation and the original disagree, the original is right.

Title: Adaptive Propagation Graph Convolutional Network

Abstract:

Graph convolutional networks (GCNs) are a family of neural network models that perform inference on graph data by interleaving vertexwise operations and message-passing exchanges across nodes. Concerning the latter, two key questions arise: 1) how to design a differentiable exchange protocol (e.g., a one-hop Laplacian smoothing in the original GCN) and 2) how to characterize the tradeoff in complexity with respect to the local updates. In this brief, we show that state-of-the-art results can be achieved by adapting the number of communication steps independently at every node. In particular, we endow each node with a halting unit (inspired by Graves' adaptive computation time [1]) that after every exchange decides whether to continue communicating or not. We show that the proposed adaptive propagation GCN (AP-GCN) achieves superior or similar results to the best proposed models so far on a number of benchmarks, while requiring a small overhead in terms of additional parameters. We also investigate a regularization term to enforce an explicit tradeoff between communication and accuracy. The code for the AP-GCN experiments is released as an open-source library.

Introduction:

Deep learning has achieved remarkable success on a number of high-dimensional inputs by properly designing architectural biases that can exploit their properties. This includes images (through convolutional filters) [2], text [3], biomedical sequences [4], and videos [5]. A major research question, then, is how to replicate this success on other types of data through the implementation of novel differentiable blocks adequate to them. Among the possibilities, graphs represent one of the largest sources of data in the world, ranging from recommender systems [6] to biomedical applications [7], social networks [8], computer programs [9], knowledge bases [10], and many others.

In its most general form, a graph is composed of a set of vertices connected by a series of edges representing, e.g., social connections, citations, or any form of relation. Graph neural networks (GNNs) [11]-[13] can then be designed by interleaving local operations (defined on either individual nodes or edges) with communication steps, exploiting the graph topology to combine the local outputs. These architectures can be exploited for a variety of tasks, ranging from node classification to edge prediction and path computation.

Among the different families of GNN models proposed over the last years, graph convolutional networks (GCNs) [14] have become a sort of de facto standard for node and graph classification, representing one of the simplest (yet efficient) building blocks in the context of graph processing. GCNs are built by interleaving vertexwise operations, implemented via a single fully connected layer, with a communication step exploiting the so-called Laplacian matrix of the graph. In practice, a single GCN layer provides a weighted combination of information across neighbors, representing a localized one-hop exchange of information.
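As a concrete illustration, a minimal sketch of such a layer (not the authors' released code; `A_hat` denotes the renormalized adjacency matrix used in [14]) could look as follows:

```python
import torch

def gcn_layer(A_hat: torch.Tensor, X: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """One GCN layer [14]: a vertexwise fully connected map (X @ W)
    interleaved with a one-hop communication step (A_hat @ ...), where
    A_hat = D~^{-1/2} (A + I) D~^{-1/2} replaces each node's representation
    with a weighted combination over its neighborhood."""
    return torch.relu(A_hat @ X @ W)
```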

Taking the GCN layer as a fundamental building block, several research questions have received vast attention lately, most notably: 1) how to design more effective communication protocols, able to improve the accuracy of the GCN and potentially better leverage the structure of the graph [15]-[17], and 2) how to trade off the amount of local (vertexwise) operations with the communication steps [18]. While we defer a complete overview of related works to Section II, we briefly mention two key results here. First, Li et al. [19] showed that the use of the Laplacian (a smoothing operator) causes a repeated application of standard GCN layers to oversmooth the data, disallowing the possibility of naively stacking GCN layers to obtain extremely deep networks. Second, Klicpera et al. [18] showed that state-of-the-art results can be obtained by replacing the Laplacian communication step with a PageRank variation, provided that the communication between nodes is completely separated from the vertexwise operations. We exploit both of these key results later on.

Contributions of This Brief:

We note that the vast majority of proposals to improve point 1) mentioned before consists in selecting a certain maximum number of communication steps T and iterating a simple protocol for T steps in order to diffuse the information across T-hop neighbors. In this brief, we ask the following research question: can the performance of GCN layers be improved if the number of communication steps is allowed to vary independently for each vertex?

To answer this question, we propose a variation of GCN that we call adaptive propagation GCN (AP-GCN). In the AP-GCN (see Fig. 1), every vertex is endowed with an additional unit that outputs a value controlling whether the communication should continue for another step (hence combining the information from neighbors farther away) or should stop, keeping the final value for further processing. In order to implement this adaptive unit, we leverage previous work on adaptive computation time in recurrent neural networks [1] to design a differentiable method to learn this propagation strategy. On an extensive set of comparisons and benchmarks, we show that AP-GCN can reach state-of-the-art results, while the number of communication steps can vary significantly not only across data sets but also across individual vertexes. This is achieved with an extremely small overhead in terms of computational time and additional trainable parameters. In addition, we perform a large hyperparameter analysis, showing that our method can provide a simple way to balance the accuracy of the GCN with the number of propagation steps.

Related Work:

GCNs belong to the class of spectral GNNs, which are based on graph signal processing (GSP) tools [20]-[23]. GSP allows one to define a Fourier transform over graphs by exploiting the eigendecomposition of the so-called graph Laplacian. The first application of this theory to graph NNs was in [24]. This approach, however, was both computationally heavy and not spatially localized, meaning that each nodewise update depended on the entire graph structure. Later proposals [25] showed that by properly restricting the class of filters applied in the frequency domain, one could obtain a simpler formulation that was also spatially localized in the graph domain. Polynomial filters [25] can be implemented via T-hop exchanges on the graph, but they require selecting a priori a valid T for all the vertices. The GCN, introduced in [14], showed that state-of-the-art results could be obtained even with simpler linear (i.e., one-hop) operations. However, it failed to scale to deeper architectures (i.e., >2 GCN layers) in practice.

Li et al. [19] formally analyzed the properties of the GCN, showing that the difficulty of building deeper networks could depend on the oversmoothing of the data due to a repeated application of the Laplacian operator. Further analyses and the need to consider higher-order structures in GNNs were provided in [26], showing that GCNs are equivalent to the so-called 1-D Weisfeiler-Leman graph isomorphism heuristic. Several recent papers have proposed to avoid some of these shortcomings by using different types of propagation methods, most notably PageRank variations [17], [18].

In this brief, we explore an orthogonal idea: we hypothesize that performance can be improved not only by modifying the existing propagation method but also by allowing each node to vary the amount of communication independently from the others, in an adaptive fashion. Jumping knowledge (JK) networks [27] and GeniePath [28] achieve something similar by exploiting an additional network aggregation component (e.g., an LSTM network); after multiple diffusion steps, however, they fail to reach state-of-the-art results [17].

Finally, we underline that we focus on the GCN in this brief, but alternative models for GNNs have been devised, including those from [29], graph attention networks [30], graph embeddings, and others. We refer to multiple recent surveys on the topic for more information [12], [13].

Graph Convolutional Networks:
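For reference, a sketch of the basic GCN layer that (1) refers to throughout this section, assuming the standard form from [14] (the displayed equations did not survive into this translation):

$$ H^{(l+1)} = \mathrm{ReLU}\big(\hat{A}\, H^{(l)}\, W^{(l)}\big), \qquad \hat{A} = \tilde{D}^{-1/2}(A + I)\,\tilde{D}^{-1/2} \quad (1) $$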

More generally, the Laplacian matrix can be renormalized in different ways (see [14]) or replaced by any appropriate shift operator defined on the graph. As mentioned in Section II, the name GCN derives from a GSP [20] interpretation of (1). A Fourier transform for graphs can be defined by considering the eigendecomposition of the Laplacian matrix [23]. In this case, (1) can be shown to be equivalent to a graph convolution implemented with a linear filter [14]. Because its implementation only requires a one-hop exchange among neighbors, the GCN is also an example of a message-passing neural network (MPNN) [13].

These two interpretations suggest two classes of extensions of the basic model in (1), which we comment on because of their relation to our proposed method. First, under the GSP interpretation, it makes sense to replace the linear filtering operation with more complex filters. In particular, polynomial filters can be implemented by combining information from higher-order neighborhoods of each node, up to an order that depends on the degree of the polynomial [32]. For example, Chebyshev filters [25] lead to the following layer (bias omitted for simplicity):
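A hedged reconstruction of this layer (presumably (2) in the original brief), in the standard Chebyshev-polynomial form of [25]:

$$ H' = \mathrm{ReLU}\!\left(\sum_{k=0}^{K} T_k(\tilde{L})\, X\, W_k\right), \quad (2) $$

where $T_k(\cdot)$ is the Chebyshev polynomial of order $k$ and $\tilde{L}$ is a rescaled graph Laplacian; a degree-$K$ filter combines information from up to $K$-hop neighborhoods.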

However, interpreting (1) more generally as an MPNN, we are not restricted to considering filtering operations. In fact, the most general extension of (1) is given by (simplifying the notation for a single node i) [13]:
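A sketch of this generic extension (presumably (3)), written with a nodewise operation $\psi$ and a propagation step $\Psi$ to match the notation used later in this brief (a reconstruction, not copied from the paper):

$$ h_i = \Psi\big(\psi(x_i),\ \{\psi(x_j)\}_{j \in \mathcal{N}(i)}\big), \quad (3) $$

where $\mathcal{N}(i)$ denotes the set of neighbors of node $i$.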

The functions $\psi$ and $\Psi$ can be implemented as generic neural networks or any other differentiable mechanism. Most notably, Klicpera et al. [18] proposed to use the propagation steps of an (approximate) personalized PageRank protocol to counteract the oversmoothing effect of a repeated application of the Laplacian matrix [19], although the maximum number of propagation steps must still be chosen a priori by the user.
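As an illustration, a minimal sketch of this PageRank-style propagation in its iterative (power-iteration) form, with the teleport probability `alpha` and the step count `K` as user-chosen hyperparameters:

```python
import torch

def pagerank_propagate(A_hat: torch.Tensor, H: torch.Tensor,
                       alpha: float = 0.1, K: int = 10) -> torch.Tensor:
    """Approximate personalized-PageRank propagation in the spirit of [18]:
    Z_{k+1} = (1 - alpha) * A_hat @ Z_k + alpha * H.
    Note that K, the number of communication steps, is fixed a priori and
    shared by all nodes -- exactly the limitation that AP-GCN removes."""
    Z = H
    for _ in range(K):
        Z = (1.0 - alpha) * (A_hat @ Z) + alpha * H
    return Z
```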

Interestingly, PageRank propagation [18] and the closely related ARMA models [16] can be understood as approximating rational filters on the graph [33], which are in general more expressive than linear or polynomial filters.

Designing and Training Deep GCNs:

In the spirit of classical deep networks, the basic building blocks described in Section III-B can be composed to design deeper architectures. For example, a binary classification network with a single hidden layer and one output layer, implemented according to (1), is defined as follows.
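A hedged reconstruction of this network (presumably (4)), assuming a ReLU hidden layer and a sigmoid output, and consistent with the weights W, v, b, and c named below:

$$ f(X) = \sigma\big(\hat{A}\,\mathrm{ReLU}(\hat{A}\,X\,W + b)\,v + c\big) \quad (4) $$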

where the adaptable weights are W, v, b, and c. A more recent line of reasoning, popularized in [17], is to implement architectures in the form (3), making both $\psi$ and $\Psi$ deeper networks, but without interleaving multiple nodewise and propagation steps. We follow this design principle here, as we have found it to perform better empirically.
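A minimal sketch of this decoupled design, assuming a two-layer MLP for $\psi$ and the PageRank-style propagation above for $\Psi$ (names and sizes are illustrative, not the authors' implementation):

```python
import torch
import torch.nn as nn

class DecoupledGCN(nn.Module):
    """'Predict then propagate' design principle popularized in [17]:
    a deeper nodewise network psi, followed by a separate propagation
    step Psi, with no interleaving between the two."""
    def __init__(self, in_dim: int, hidden: int, n_classes: int,
                 alpha: float = 0.1, K: int = 10):
        super().__init__()
        self.psi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))
        self.alpha, self.K = alpha, K

    def forward(self, X: torch.Tensor, A_hat: torch.Tensor) -> torch.Tensor:
        H = self.psi(X)                      # vertexwise operations only
        Z = H
        for _ in range(self.K):              # communication steps only
            Z = (1.0 - self.alpha) * (A_hat @ Z) + self.alpha * H
        return Z
```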

Once a specific network f has been designed, its optimization follows the same strategies as for other deep networks. For example, for node classification (as described in Section III-A), we optimize the network with a cross-entropy loss on the known node labels.
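A hedged reconstruction of this objective (presumably (5)), with $\mathcal{T}$ the set of nodes whose one-hot labels $y_i$ are known:

$$ \mathcal{L} = -\sum_{i \in \mathcal{T}} y_i^\top \log f(x_i) \quad (5) $$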

Note, however, that differently from standard neural networks, the output f(x_i) will depend on several other nodes, depending on the specific architecture. For this reason, (5) is harder to solve efficiently in a stochastic fashion [34].
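To make this concrete, here is a sketch of a full-batch training step (a hypothetical helper, assuming a model with the `forward(X, A_hat)` signature used above): because each output depends on other nodes, the loss is computed over the whole graph and masked to the labeled training nodes, rather than over independent minibatches.

```python
import torch
import torch.nn.functional as F

def train_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
               X: torch.Tensor, A_hat: torch.Tensor,
               y: torch.Tensor, train_mask: torch.Tensor) -> float:
    """One full-batch gradient step with cross-entropy on the known labels."""
    model.train()
    optimizer.zero_grad()
    logits = model(X, A_hat)          # every output depends on the whole graph
    loss = F.cross_entropy(logits[train_mask], y[train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()
```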

Proposed Adaptive Propagation Protocol:

In Sections II and III, we analyzed the motivation for having graph modules with complex diffusion steps across the graph. However, the vast majority of proposals have considered a single, maximum number of communication steps shared by all the nodes in the graph [e.g., the number K in (2)]. In this section, we introduce a novel variation of the GCN in which the number of communication steps is selected independently for every node, and this number is adapted and computed on-the-fly during training. To the best of our knowledge, our proposed AP-GCN is the only model in the literature combining these two properties.

Our AP-GCN framework is shown in Fig. 1. Considering the notation in (3), we separate the nodewise operation $\psi$ from the propagation step $\Psi$. The former is implemented by applying a generic neural network on the individual nodes, $z_j = \psi(x_j)$, as shown on the left-hand side of Fig. 1. This embedding is then used as the starting seed for an iterative execution of the propagation step $\Psi$.

The key element of the scheme is that the number of propagation steps depends on the node index i and is computed adaptively during the propagation itself. The mechanism takes inspiration from adaptive computation time for recurrent neural networks (RNNs) [1].
First, we endow each node with a linear binary classifier that acts as a halting unit for the propagation process. After a generic iteration k of the propagation, we compute, nodewise:
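Following the ACT formulation of [1], a hedged reconstruction of this halting value, where $z_i^k$ is the embedding of node $i$ after $k$ propagation steps and $\mathbf{q}$, $b$ are the halting unit's weights:

$$ h_i^k = \sigma\big(\mathbf{q}^\top z_i^k + b\big) $$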

To ensure that the number of propagation steps remains reasonable, following [1], we adopt two techniques. First, we fix a maximum number of iterations T. Second, we use the running sum of the halting values to define a budget for the propagation process.
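A sketch of this budget, again in the ACT style of [1], with a small hyperparameter $\varepsilon$:

$$ K_i = \min\Big\{ k' \le T \,:\, \sum_{k=1}^{k'} h_i^k \ge 1 - \varepsilon \Big\} $$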

When k = K_i, the budget is reached, and node i stops its propagation at iteration k. We combine the halting probabilities as follows.
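A hedged reconstruction of this combination, where the last step receives the remainder $R_i$ so that the values sum to one:

$$ p_i^k = \begin{cases} h_i^k & \text{if } k < K_i \\ R_i = 1 - \sum_{k=1}^{K_i - 1} h_i^k & \text{if } k = K_i \\ 0 & \text{if } k > K_i \end{cases} $$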


In this way, the sequence $\{p_i^k\}$ forms a valid cumulative distribution from the halting probabilities $\{h_i^k\}$. Exploiting it, instead of using only the latest value of the propagation, we can adaptively combine the information from every step at no additional cost.
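The per-node output is then the halting-weighted combination of the intermediate embeddings (a reconstruction consistent with the definitions above):

$$ \tilde{z}_i = \sum_{k=1}^{K_i} p_i^k\, z_i^k $$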

The number of propagation steps can be controlled by defining a propagation cost $S_i$, similar to [1], which represents the number of propagation steps needed to update the i-th node.
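In ACT [1], this cost takes the form

$$ S_i = K_i + R_i, $$

which can be added to the classification loss with a propagation penalty weight, e.g., $\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \alpha \sum_i S_i$ (the symbol $\alpha$ here is illustrative).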

The propagation penalty is responsible for the tradeoff between computation time and accuracy. In addition, it regulates how easily information propagates over the graph. In practice, we alternate the optimization of the halting units with that of the main network every L steps (L = 5 in our experiments).
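Putting the pieces together, here is a minimal sketch of the adaptive propagation step (my reading of the mechanism above, not the authors' released implementation; the tensor layout and the shared linear halting unit are assumptions):

```python
import torch
import torch.nn as nn

class AdaptivePropagation(nn.Module):
    """Adaptive propagation with a halting unit, in the ACT style of [1]:
    each node keeps exchanging messages until the running sum of its
    halting values reaches 1 - eps (or the cap T), and its output is the
    p_i^k-weighted combination of the intermediate embeddings."""

    def __init__(self, n_features: int, T: int = 10, eps: float = 0.05):
        super().__init__()
        self.halt = nn.Linear(n_features, 1)   # shared linear halting unit
        self.T, self.eps = T, eps

    def forward(self, A_hat: torch.Tensor, Z0: torch.Tensor):
        n = Z0.shape[0]
        Z, out = Z0, torch.zeros_like(Z0)
        budget = torch.zeros(n, device=Z0.device)     # running sum of h_i^k
        remainder = torch.zeros(n, device=Z0.device)  # R_i
        steps = torch.zeros(n, device=Z0.device)      # K_i
        active = torch.ones(n, dtype=torch.bool, device=Z0.device)
        for t in range(self.T):
            Z = A_hat @ Z                             # one communication step
            h = torch.sigmoid(self.halt(Z)).squeeze(-1)
            halting = active & (budget + h >= 1.0 - self.eps)
            if t == self.T - 1:                       # force halting at the cap T
                halting = active.clone()
            # halting nodes receive the remainder R_i = 1 - sum_{k<K_i} h_i^k
            p = torch.where(halting, 1.0 - budget, h) * active.float()
            out = out + p.unsqueeze(-1) * Z
            remainder = torch.where(halting, 1.0 - budget, remainder)
            budget = budget + h * active.float()
            steps = steps + active.float()
            active = active & ~halting
            if not active.any():
                break
        cost = steps + remainder                      # S_i = K_i + R_i
        return out, cost
```

The returned `cost` can be averaged and added to the cross-entropy loss to penalize long propagations, while `out` feeds the final classifier; under the alternating scheme described above, the halting unit's parameters would be updated only every L steps of the main network.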
