paper reading:[Renormalization Trick] Semi-supervised classification with graph convolutional networks

Table of Contents

  • paper reading:[Renormalization Trick] Semi-supervised classification with graph convolutional networks
    • The traditional approach to semi-supervised learning on graphs:
      • Core formula:
      • Notation:
    • Discussion of when the formula holds:
      • assumption:
      • Limitation:
    • Model:
      • Model summary:
        • spectral graph convolutions:
        • localized first-order:
      • Applicable scenarios (problem solved):
      • Model idea:
      • Information used by the model:
    • fast approximate convolution:
      • layer-wise propagation rule:
      • spectral graph convolutions:
        • Spectral form of the convolution:
        • Truncated Chebyshev expansion of $g_{\theta}(\Lambda)$ and the simplified convolution:
      • layer-wise linear model:
        • Linearity of the spectral convolution:
        • Eq. 5 with $\lambda_{max} = 2$:
        • $\theta = \theta_0' = -\theta_1'$:
        • renormalization trick:
    • semi-supervised node classification:
      • Method and scope of applicability:
      • Forward propagation:
        • Core formula:
        • Loss function:
        • Training method:

The traditional approach to semi-supervised learning on graphs:

label information is smoothed over the graph via some form of explicit graph-based regularization, e.g. by using a graph Laplacian regularization term in the loss function:

Core formula:

$$L = L_0 + \lambda L_{reg}, \qquad L_{reg} = \sum_{i,j} A_{ij} \left\| f(X_i) - f(X_j) \right\|^2 = f(X)^\top \Delta f(X)$$

Notation:

  • $L_0$: the supervised loss w.r.t. the labeled part of the graph
  • $\lambda$: a weighing factor
  • $X$: matrix of node feature vectors $X_i$
  • $\Delta = D - A$: unnormalized graph Laplacian of an undirected graph
  • $A$: adjacency matrix, $A \in \mathbb{R}^{N \times N}$
  • $D$: degree matrix, $D_{ii} = \sum_j A_{ij}$
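As a quick sanity check of the quadratic form, here is a minimal NumPy sketch (toy graph and embedding values are made up) comparing the pairwise smoothness sum with $f(X)^\top \Delta f(X)$; note the standard identity carries a factor of $\tfrac{1}{2}$, which Eq. 1 can absorb into the constants:

```python
import numpy as np

# Toy undirected graph: a 3-node path. `f_X` is a made-up scalar embedding.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # adjacency matrix
D = np.diag(A.sum(axis=1))               # degree matrix, D_ii = sum_j A_ij
Delta = D - A                            # unnormalized graph Laplacian

f_X = np.array([0.2, 0.9, 0.4])          # f(X_i) for each node i

# Pairwise penalty vs. the quadratic form; the identity is
# (1/2) * sum_ij A_ij (f_i - f_j)^2 = f^T (D - A) f.
pairwise = 0.5 * sum(A[i, j] * (f_X[i] - f_X[j]) ** 2
                     for i in range(3) for j in range(3))
quadratic = f_X @ Delta @ f_X
print(np.isclose(pairwise, quadratic))   # True
```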

Discussion of when the formula holds:

assumption:

The formulation of Eq. 1 relies on the assumption that connected nodes in the graph are likely to share the same label.

That is, connected nodes tend to have similar labels.

Limitation:

This assumption, however, might restrict modeling capacity, as graph edges need not necessarily encode node similarity, but could contain additional information.

That is, an edge between nodes does not necessarily indicate label similarity; it may encode other information.


Model:

Model summary:

localized first-order approximation of spectral graph convolutions

That is, a localized first-order approximation of spectral graph convolutions.

spectral graph convolutions:

Graph convolutions fall into two kinds, spatial-domain and spectral-domain, classified according to how the convolution is computed. The spectral kind is used here.

localized first-order:

That is, only the first-order neighborhood of each node is considered, which makes the operation localized.


Applicable scenarios (problem solved):

Labels are only available for a small subset of nodes.

This problem can be framed as graph-based semi-supervised learning.

Only a subset of the samples in the dataset have known labels, i.e. a semi-supervised problem.

We expect this setting to be especially powerful in scenarios where the adjacency matrix $A$ contains information not present in the data $X$, such as citation links between documents in a citation network or relations in a knowledge graph.

That is, the method applies when the information in $A$ is not contained in $X$, e.g. a citation network or a knowledge graph.

We consider here the task of transductive node classification within networks of significantly larger scale.


Model idea:

We encode the graph structure directly using a neural network model $f(X, A)$ and train on a supervised target $L_0$ for all nodes with labels, thereby avoiding explicit graph-based regularization in the loss function.

Conditioning $f(\cdot)$ on the adjacency matrix of the graph will allow the model to distribute gradient information from the supervised loss $L_0$ and will enable it to learn representations of nodes both with and without labels.

For labeled nodes, the neural network $f(X, A)$ is trained with supervision $\Longrightarrow$ the loss function drops $\lambda L_{reg}$ and keeps only $L_0$.

Concretely, this is implemented as:

  • the forward pass computes outputs for all nodes;
  • the loss is evaluated only on training-set nodes, i.e. optimization uses only the training data.

Information used by the model:

Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes.

That is, the network learns two kinds of information:

  • local graph structure, i.e. the edges of the graph
  • features of nodes, i.e. the nodes of the graph

fast approximate convolution:

layer-wise propagation rule:

$$H^{(l+1)} = \sigma\left( \widetilde{D}^{-1/2} \widetilde{A} \widetilde{D}^{-1/2} H^{(l)} W^{(l)} \right)$$

  • $\widetilde{A} = A + I_N$: adjacency matrix of the undirected graph $G$ with added self-connections
  • $\widetilde{D}$: degree matrix of $\widetilde{A}$, $\widetilde{D}_{ii} = \sum_j \widetilde{A}_{ij}$
  • $W^{(l)}$: layer-specific trainable weight matrix
  • $\sigma(\cdot)$: activation function
  • $H^{(l)} \in \mathbb{R}^{N \times D}$: matrix of activations in the $l^{th}$ layer, with $H^{(0)} = X$
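A minimal NumPy sketch of one propagation step on a toy graph (the dimensions and weights are made up for illustration, with ReLU standing in for $\sigma$):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One step of H^(l+1) = sigma(D~^{-1/2} A~ D~^{-1/2} H^(l) W^(l))."""
    A_tilde = A + np.eye(A.shape[0])              # A~ = A + I_N (self-connections)
    d_tilde = A_tilde.sum(axis=1)                 # D~_ii = sum_j A~_ij
    D_inv_sqrt = np.diag(d_tilde ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)         # ReLU as the activation sigma

# Toy graph: N=3 nodes, D=2 input features, 4 hidden units.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H0 = np.random.randn(3, 2)                        # H^(0) = X
W0 = np.random.randn(2, 4)                        # trainable weight matrix
H1 = gcn_layer(A, H0, W0)                         # shape (3, 4)
```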

spectral graph convolutions:

Spectral form of the convolution:

$$g_{\theta} * x = U g_{\theta} U^{\top} x$$

  • $x$: a signal $x \in \mathbb{R}^N$ (a scalar for every node)
  • $g_{\theta}$: a filter $g_{\theta} = \mathrm{diag}(\theta)$
  • $U$: matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top}$

Here:

  • $g_{\theta}$ can be understood as a function of the eigenvalues of $L$, i.e. $g_{\theta}(\Lambda)$
  • $U^{\top} x$ is the graph Fourier transform of $x$
  • $U \Lambda U^{\top}$ is the eigendecomposition of the normalized graph Laplacian $L$; $U g_{\theta} U^{\top}$ applies the filter in the spectral domain
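A minimal NumPy sketch of this spectral filtering on a toy graph (the filter parameters $\theta$ are random placeholders): build $L$, eigendecompose it, transform $x$, filter, and transform back.

```python
import numpy as np

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized graph Laplacian

Lam, U = np.linalg.eigh(L)                    # L = U diag(Lam) U^T

x = np.random.randn(3)                        # a signal on the nodes
theta = np.random.randn(3)                    # one free parameter per eigenvalue
g_theta = np.diag(theta)                      # g_theta = diag(theta)

x_hat = U.T @ x                               # graph Fourier transform of x
out = U @ g_theta @ x_hat                     # filter, then inverse transform
```

Multiplying with $U$ costs $O(N^2)$, and computing the eigendecomposition of $L$ in the first place is prohibitively expensive for large graphs; this is what the Chebyshev approximation below avoids.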

Truncated Chebyshev expansion of $g_{\theta}(\Lambda)$ and the simplified convolution:

$$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta_k' \, T_k(\widetilde{\Lambda})$$

  • $\widetilde{\Lambda} = \frac{2}{\lambda_{max}} \Lambda - I_N$: the rescaled eigenvalue matrix; $\lambda_{max}$ denotes the largest eigenvalue of $L$
  • $\theta' \in \mathbb{R}^K$: a vector of Chebyshev coefficients

This reduces the spectral expansion $U g_{\theta} U^{\top} x$ to a polynomial in $\widetilde{L}$, giving:

$$g_{\theta'} * x \approx \sum_{k=0}^{K} \theta_k' \, T_k(\widetilde{L}) x$$

where:

  • $\sum_{k=0}^K \theta_k' T_k(\widetilde{L})$ is the simplified form of the spectral filter $U g_{\theta}(\Lambda) U^{\top}$
  • $\widetilde{L} = \frac{2}{\lambda_{max}} L - I_N$
  • the reduction uses $(U \Lambda U^{\top})^k = U \Lambda^k U^{\top}$, so a polynomial in $\Lambda$ becomes the same polynomial in $L$

Note that this expression is now $K$-localized since it is a $K^{th}$-order polynomial in the Laplacian, i.e. it depends only on nodes that are at maximum $K$ steps away from the central node ($K^{th}$-order neighborhood).

That is, Eq. 5 is $K$-localized (a $K^{th}$-order polynomial in $L$): geometrically, the convolution reaches at most the $K^{th}$-order neighborhood of a node, which is what makes it locally connected.
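A minimal NumPy sketch of the truncated expansion using the Chebyshev recurrence $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$ (toy graph and random coefficients with $K = 2$; a real implementation would use sparse matrix-vector products):

```python
import numpy as np

def cheb_conv(L, x, theta, lam_max):
    """Approximate g_theta * x by sum_k theta'_k T_k(L~) x."""
    N = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(N)        # rescale spectrum into [-1, 1]
    Tx = [x, L_tilde @ x]                            # T_0(L~) x and T_1(L~) x
    for _ in range(2, len(theta)):
        Tx.append(2.0 * L_tilde @ Tx[-1] - Tx[-2])   # Chebyshev recurrence
    return sum(t * T for t, T in zip(theta, Tx))

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt

out = cheb_conv(L, np.random.randn(3), theta=np.random.randn(3), lam_max=2.0)
```

Only products with $\widetilde{L}$ are needed, and each $T_k(\widetilde{L})x$ reaches at most the $k^{th}$-order neighborhood, which is exactly the $K$-localization above.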


layer-wise linear model:

Linearity of the spectral convolution:

With $K = 1$, Eq. 5 becomes a function that is linear w.r.t. $L$ and therefore a linear function on the graph Laplacian spectrum.

That is, the per-layer convolution in the spectral domain is now a linear function.


Eq. 5 with $\lambda_{max} = 2$:

Further approximating $\lambda_{max} \approx 2$ (the network parameters can adapt to this change in scale during training), we obtain:

$$g_{\theta'} * x \approx \theta_0' x + \theta_1' (L - I_N) x = \theta_0' x - \theta_1' D^{-1/2} A D^{-1/2} x$$

  • $\theta_0'$ and $\theta_1'$: two free filter parameters

The filter parameters can be shared over the whole graph. Successive application of filters of this form then effectively convolves the $k^{th}$-order neighborhood of a node, where $k$ is the number of successive filtering operations or convolutional layers in the neural network model.

That is, Eq. 6 shares the filter parameters across the whole graph and makes the convolution more efficient.


$\theta = \theta_0' = -\theta_1'$:

$$g_{\theta} * x \approx \theta \left( I_N + D^{-1/2} A D^{-1/2} \right) x$$

  • $I_N + D^{-1/2} A D^{-1/2}$ now has eigenvalues in the range $[0, 2]$.

Repeated application of this operator can therefore lead to numerical instabilities and exploding/vanishing gradients when used in a deep neural network model.

That is, applying Eq. 7 repeatedly (in a deep network) leads to vanishing or exploding gradients.
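A small NumPy check of both claims on a toy connected graph: the spectrum of $I_N + D^{-1/2} A D^{-1/2}$ sits in $[0, 2]$ with $2$ attained, so stacking the operator can scale the signal by up to a factor of $2$ per layer:

```python
import numpy as np

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
op = np.eye(3) + D_inv_sqrt @ A @ D_inv_sqrt

print(np.linalg.eigvalsh(op))     # approx. [0., 1., 2.] -- within [0, 2]

x = np.random.randn(3)
for _ in range(20):               # stacking 20 such layers, no renormalization
    x = op @ x
print(np.linalg.norm(x))          # ~2^20 growth along the top eigenvector
```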


renormalization trick:

Let $I_N + D^{-1/2} A D^{-1/2} \longrightarrow \widetilde{D}^{-1/2} \widetilde{A} \widetilde{D}^{-1/2}$, where $\widetilde{A} = A + I_N$ and $\widetilde{D}_{ii} = \sum_j \widetilde{A}_{ij}$. Then:

$$Z = \widetilde{D}^{-1/2} \widetilde{A} \widetilde{D}^{-1/2} X \Theta$$

  • $X \in \mathbb{R}^{N \times C}$: the input signal with $C$ channels (i.e. a $C$-dimensional feature vector for every node)
  • $\Theta \in \mathbb{R}^{C \times F}$: matrix of filter parameters
  • $Z \in \mathbb{R}^{N \times F}$: the convolved signal matrix

$\widetilde{A}X$ can be efficiently implemented as a product of a sparse matrix with a dense matrix.

That is, $\widetilde{A}X$ is a sparse-dense matrix product, which makes the computation efficient.
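A minimal sketch of the renormalized operator using scipy.sparse, so that the graph side of the computation stays a sparse-dense product (toy sizes, random inputs):

```python
import numpy as np
import scipy.sparse as sp

def renormalized_adj(A):
    """A_hat = D~^{-1/2} (A + I_N) D~^{-1/2}, kept as a sparse matrix."""
    A_tilde = sp.csr_matrix(A) + sp.eye(A.shape[0])    # A~ = A + I_N
    d_tilde = np.asarray(A_tilde.sum(axis=1)).ravel()  # D~_ii = sum_j A~_ij
    D_inv_sqrt = sp.diags(d_tilde ** -0.5)
    return (D_inv_sqrt @ A_tilde @ D_inv_sqrt).tocsr()

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.random.randn(3, 2)        # N x C input signal
Theta = np.random.randn(2, 4)    # C x F filter parameters

A_hat = renormalized_adj(A)      # sparse N x N
Z = A_hat @ X @ Theta            # sparse-dense product, then dense matmul; N x F
```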


semi-supervised node classification:

Method and scope of applicability:

We can relax certain assumptions typically made in graph-based semi-supervised learning by conditioning our model $f(X, A)$ both on the data $X$ and on the adjacency matrix $A$ of the underlying graph structure.

That is, conditioning the model $f(X, A)$ on both the data $X$ and the adjacency matrix $A$ relaxes the assumption that connected nodes in the graph are likely to share the same label.


Forward propagation:

Core formula:

$$Z = f(X, A) = \mathrm{softmax}\!\left( \widehat{A} \; \mathrm{ReLU}\!\left( \widehat{A} X W^{(0)} \right) W^{(1)} \right)$$

  • $\widehat{A} = \widetilde{D}^{-1/2} \widetilde{A} \widetilde{D}^{-1/2}$, computed in a pre-processing step
  • The model has two weight matrices, trained using gradient descent:
    • $W^{(0)} \in \mathbb{R}^{C \times H}$ is an input-to-hidden weight matrix for a hidden layer with $H$ feature maps
    • $W^{(1)} \in \mathbb{R}^{H \times F}$ is a hidden-to-output weight matrix
  • $\mathrm{ReLU}(\widehat{A} X W^{(0)})$ has shape $N \times H$
  • $Z$ has shape $N \times F$
  • The softmax activation function is applied row-wise
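A minimal NumPy sketch of the full forward pass (toy dimensions $N=3$, $C=2$, $H=4$, $F=2$; the weights are random rather than trained):

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax, as in the paper."""
    e = np.exp(Z - Z.max(axis=1, keepdims=True))     # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A_hat, X, W0, W1):
    """Z = softmax(A_hat ReLU(A_hat X W^(0)) W^(1))."""
    H = np.maximum(A_hat @ X @ W0, 0.0)              # hidden layer, N x H
    return softmax(A_hat @ H @ W1)                   # output, N x F

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_tilde = A + np.eye(3)
D_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt            # pre-processing step

X = np.random.randn(3, 2)
W0, W1 = np.random.randn(2, 4), np.random.randn(4, 2)
Z = gcn_forward(A_hat, X, W0, W1)                    # each row of Z sums to 1
```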

Loss function:

We evaluate the cross-entropy error over all labeled examples:

$$L = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$$

  • $\mathcal{Y}_L$ is the set of node indices that have labels.

Training method:

We perform batch gradient descent using the full dataset for every training iteration.
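A minimal full-batch training sketch in PyTorch (toy data, masks, and hyperparameters are made up; the paper's reference implementations are in TensorFlow and PyTorch). The loss is the cross-entropy above, evaluated only on the labeled nodes $\mathcal{Y}_L$:

```python
import torch
import torch.nn.functional as F

N, C, H, F_cls = 3, 2, 4, 2
A_hat = torch.eye(N)                           # placeholder for D~^{-1/2} A~ D~^{-1/2}
X = torch.randn(N, C)
labels = torch.tensor([0, 1, 0])
labeled = torch.tensor([True, False, True])    # mask for y_L, the labeled nodes

W0 = torch.randn(C, H, requires_grad=True)
W1 = torch.randn(H, F_cls, requires_grad=True)
opt = torch.optim.SGD([W0, W1], lr=0.1)

for epoch in range(200):                       # one full-batch step per iteration
    opt.zero_grad()
    logits = A_hat @ torch.relu(A_hat @ X @ W0) @ W1
    # cross-entropy over labeled examples only (softmax folded into the loss)
    loss = F.cross_entropy(logits[labeled], labels[labeled])
    loss.backward()
    opt.step()
```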

