paper reading:[Renormalization Trick] Semi-supervised classification with graph convolutional networks

Table of Contents

  • paper reading:[Renormalization Trick] Semi-supervised classification with graph convolutional networks
    • The traditional approach to semi-supervised learning on graphs:
      • Core formula:
      • Notation:
    • Discussion of when the formula holds:
      • assumption:
      • Limitation:
    • Model:
      • Model summary:
        • spectral graph convolutions:
        • localized first-order:
      • Applicable scenarios (problem solved):
      • Model idea:
      • Information used by the model:
    • fast approximate convolution:
      • layer-wise propagation rule:
      • spectral graph convolutions:
        • Spectral form of the convolution:
        • Truncated Chebyshev expansion of $g_{\theta}(\Lambda)$ and the simplified convolution:
      • layer-wise linear model:
        • Linearity of the spectral convolution:
        • Eq. 5 with $\lambda_{max} = 2$:
        • $\theta = \theta_0' = -\theta_1'$:
        • renormalization trick:
    • semi-supervised node classification:
      • Method and scope of applicability:
      • Forward propagation:
        • Core formula:
        • Loss function:
        • Training method:

The traditional approach to semi-supervised learning on graphs:

label information is smoothed over the graph via some form of explicit graph-based regularization, e.g. by using a graph Laplacian regularization term in the loss function:

Core formula:

$$L = L_0 + \lambda L_{reg}, \qquad L_{reg} = \sum_{i,j} A_{ij} \left\| f(X_i) - f(X_j) \right\|^2 = f(X)^\top \Delta f(X)$$

Notation:

  • $L_0$: the supervised loss w.r.t. the labeled part of the graph
  • $\lambda$: a weighing factor
  • $X$: matrix of node feature vectors $X_i$
  • $\Delta = D - A$: unnormalized graph Laplacian of an undirected graph
  • $A$: adjacency matrix, $A \in \mathbb{R}^{N \times N}$
  • $D$: degree matrix, $D_{ii} = \sum_j A_{ij}$
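As a quick sanity check of the quadratic form, here is a minimal NumPy sketch (toy graph and embedding values are made up) comparing the pairwise smoothness sum with $f(X)^\top \Delta f(X)$; note the standard identity carries a factor of $\tfrac{1}{2}$, which Eq. 1 can absorb into the constants:

```python
import numpy as np

# Toy undirected graph: a 3-node path. `f_X` is a made-up scalar embedding.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # adjacency matrix
D = np.diag(A.sum(axis=1))               # degree matrix, D_ii = sum_j A_ij
Delta = D - A                            # unnormalized graph Laplacian

f_X = np.array([0.2, 0.9, 0.4])          # f(X_i) for each node i

# Pairwise penalty vs. the quadratic form; the identity is
# (1/2) * sum_ij A_ij (f_i - f_j)^2 = f^T (D - A) f.
pairwise = 0.5 * sum(A[i, j] * (f_X[i] - f_X[j]) ** 2
                     for i in range(3) for j in range(3))
quadratic = f_X @ Delta @ f_X
print(np.isclose(pairwise, quadratic))   # True
```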

Discussion of when the formula holds:

assumption:

The formulation of Eq. 1 relies on the assumption that connected nodes in the graph are likely to share the same label.

That is, connected nodes tend to have similar labels.

Limitation:

This assumption, however, might restrict modeling capacity, as graph edges need not necessarily encode node similarity, but could contain additional information.

That is, an edge between nodes does not necessarily indicate label similarity; it may encode other information.


Model:

Model summary:

localized first-order approximation of spectral graph convolutions

That is, a localized first-order approximation of spectral graph convolutions.

spectral graph convolutions:

Graph convolutions fall into two kinds, spatial-domain and spectral-domain, classified according to how the convolution is computed. The spectral kind is used here.

localized first-order:

That is, only the first-order neighborhood of each node is considered, which makes the operation localized.


Applicable scenarios (problem solved):

Labels are only available for a small subset of nodes.

This problem can be framed as graph-based semi-supervised learning.

Only a subset of the samples in the dataset have known labels, i.e. a semi-supervised problem.

We expect this setting to be especially powerful in scenarios where the adjacency matrix $A$ contains information not present in the data $X$, such as citation links between documents in a citation network or relations in a knowledge graph.

That is, the method applies when the information in $A$ is not contained in $X$, e.g. a citation network or a knowledge graph.

We consider here the task of transductive node classification within networks of significantly larger scale.


Model idea:

We encode the graph structure directly using a neural network model $f(X, A)$ and train on a supervised target $L_0$ for all nodes with labels, thereby avoiding explicit graph-based regularization in the loss function.

Conditioning $f(\cdot)$ on the adjacency matrix of the graph will allow the model to distribute gradient information from the supervised loss $L_0$ and will enable it to learn representations of nodes both with and without labels.

For labeled nodes, the neural network $f(X, A)$ is trained with supervision $\Longrightarrow$ the loss function drops $\lambda L_{reg}$ and keeps only $L_0$.

Concretely, this is implemented as:

  • the forward pass computes outputs for all nodes;
  • the loss is evaluated only on training-set nodes, i.e. optimization uses only the training data.

Information used by the model:

Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes.

That is, the network learns two kinds of information:

  • local graph structure, i.e. the edges of the graph
  • features of nodes, i.e. the nodes of the graph

fast approximate convolution:

layer-wise propagation rule:

$$H^{(l+1)} = \sigma\left( \widetilde{D}^{-1/2} \widetilde{A} \widetilde{D}^{-1/2} H^{(l)} W^{(l)} \right)$$

  • $\widetilde{A} = A + I_N$: adjacency matrix of the undirected graph $G$ with added self-connections
  • $\widetilde{D}$: degree matrix of $\widetilde{A}$, $\widetilde{D}_{ii} = \sum_j \widetilde{A}_{ij}$
  • $W^{(l)}$: layer-specific trainable weight matrix
  • $\sigma(\cdot)$: activation function
  • $H^{(l)} \in \mathbb{R}^{N \times D}$: matrix of activations in the $l^{th}$ layer, with $H^{(0)} = X$
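A minimal NumPy sketch of one propagation step on a toy graph (the dimensions and weights are made up for illustration, with ReLU standing in for $\sigma$):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One step of H^(l+1) = sigma(D~^{-1/2} A~ D~^{-1/2} H^(l) W^(l))."""
    A_tilde = A + np.eye(A.shape[0])              # A~ = A + I_N (self-connections)
    d_tilde = A_tilde.sum(axis=1)                 # D~_ii = sum_j A~_ij
    D_inv_sqrt = np.diag(d_tilde ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)         # ReLU as the activation sigma

# Toy graph: N=3 nodes, D=2 input features, 4 hidden units.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H0 = np.random.randn(3, 2)                        # H^(0) = X
W0 = np.random.randn(2, 4)                        # trainable weight matrix
H1 = gcn_layer(A, H0, W0)                         # shape (3, 4)
```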

spectral graph convolutions:

Spectral form of the convolution:

$$g_{\theta} * x = U g_{\theta} U^{\top} x$$

  • $x$: a signal $x \in \mathbb{R}^N$ (a scalar for every node)
  • $g_{\theta}$: a filter $g_{\theta} = \mathrm{diag}(\theta)$
  • $U$: matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top}$

Here:

  • $g_{\theta}$ can be understood as a function of the eigenvalues of $L$, i.e. $g_{\theta}(\Lambda)$
  • $U^{\top} x$ is the graph Fourier transform of $x$
  • $U \Lambda U^{\top}$ is the eigendecomposition of the normalized graph Laplacian $L$; $U g_{\theta} U^{\top}$ applies the filter in the spectral domain
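A minimal NumPy sketch of this spectral filtering on a toy graph (the filter parameters $\theta$ are random placeholders): build $L$, eigendecompose it, transform $x$, filter, and transform back.

```python
import numpy as np

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized graph Laplacian

Lam, U = np.linalg.eigh(L)                    # L = U diag(Lam) U^T

x = np.random.randn(3)                        # a signal on the nodes
theta = np.random.randn(3)                    # one free parameter per eigenvalue
g_theta = np.diag(theta)                      # g_theta = diag(theta)

x_hat = U.T @ x                               # graph Fourier transform of x
out = U @ g_theta @ x_hat                     # filter, then inverse transform
```

Multiplying with $U$ costs $O(N^2)$, and computing the eigendecomposition of $L$ in the first place is prohibitively expensive for large graphs; this is what the Chebyshev approximation below avoids.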

Truncated Chebyshev expansion of $g_{\theta}(\Lambda)$ and the simplified convolution:

$$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta_k' \, T_k(\widetilde{\Lambda})$$

  • $\widetilde{\Lambda} = \frac{2}{\lambda_{max}} \Lambda - I_N$: the rescaled eigenvalue matrix; $\lambda_{max}$ denotes the largest eigenvalue of $L$
  • $\theta' \in \mathbb{R}^K$: a vector of Chebyshev coefficients

This reduces the spectral expansion $U g_{\theta} U^{\top} x$ to a polynomial in $\widetilde{L}$, giving:

$$g_{\theta'} * x \approx \sum_{k=0}^{K} \theta_k' \, T_k(\widetilde{L}) x$$

where:

  • $\sum_{k=0}^K \theta_k' T_k(\widetilde{L})$ is the simplified form of the spectral filter $U g_{\theta}(\Lambda) U^{\top}$
  • $\widetilde{L} = \frac{2}{\lambda_{max}} L - I_N$
  • the reduction uses $(U \Lambda U^{\top})^k = U \Lambda^k U^{\top}$, so a polynomial in $\Lambda$ becomes the same polynomial in $L$

Note that this expression is now $K$-localized since it is a $K^{th}$-order polynomial in the Laplacian, i.e. it depends only on nodes that are at maximum $K$ steps away from the central node ($K^{th}$-order neighborhood).

That is, Eq. 5 is $K$-localized (a $K^{th}$-order polynomial in $L$): geometrically, the convolution reaches at most the $K^{th}$-order neighborhood of a node, which is what makes it locally connected.
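A minimal NumPy sketch of the truncated expansion using the Chebyshev recurrence $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$ (toy graph and random coefficients with $K = 2$; a real implementation would use sparse matrix-vector products):

```python
import numpy as np

def cheb_conv(L, x, theta, lam_max):
    """Approximate g_theta * x by sum_k theta'_k T_k(L~) x."""
    N = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(N)        # rescale spectrum into [-1, 1]
    Tx = [x, L_tilde @ x]                            # T_0(L~) x and T_1(L~) x
    for _ in range(2, len(theta)):
        Tx.append(2.0 * L_tilde @ Tx[-1] - Tx[-2])   # Chebyshev recurrence
    return sum(t * T for t, T in zip(theta, Tx))

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt

out = cheb_conv(L, np.random.randn(3), theta=np.random.randn(3), lam_max=2.0)
```

Only products with $\widetilde{L}$ are needed, and each $T_k(\widetilde{L})x$ reaches at most the $k^{th}$-order neighborhood, which is exactly the $K$-localization above.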


layer-wise linear model:

Linearity of the spectral convolution:

With $K = 1$, Eq. 5 becomes a function that is linear w.r.t. $L$ and therefore a linear function on the graph Laplacian spectrum.

That is, the per-layer convolution in the spectral domain is now a linear function.


Eq. 5 with $\lambda_{max} = 2$:

Further approximating $\lambda_{max} \approx 2$ (the network parameters can adapt to this change in scale during training), we obtain:

$$g_{\theta'} * x \approx \theta_0' x + \theta_1' (L - I_N) x = \theta_0' x - \theta_1' D^{-1/2} A D^{-1/2} x$$

  • $\theta_0'$ and $\theta_1'$: two free filter parameters

The filter parameters can be shared over the whole graph. Successive application of filters of this form then effectively convolves the $k^{th}$-order neighborhood of a node, where $k$ is the number of successive filtering operations or convolutional layers in the neural network model.

That is, Eq. 6 shares the filter parameters across the whole graph and makes the convolution more efficient.


$\theta = \theta_0' = -\theta_1'$:

$$g_{\theta} * x \approx \theta \left( I_N + D^{-1/2} A D^{-1/2} \right) x$$

  • $I_N + D^{-1/2} A D^{-1/2}$ now has eigenvalues in the range $[0, 2]$.

Repeated application of this operator can therefore lead to numerical instabilities and exploding/vanishing gradients when used in a deep neural network model.

That is, applying Eq. 7 repeatedly (in a deep network) leads to vanishing or exploding gradients.
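A small NumPy check of both claims on a toy connected graph: the spectrum of $I_N + D^{-1/2} A D^{-1/2}$ sits in $[0, 2]$ with $2$ attained, so stacking the operator can scale the signal by up to a factor of $2$ per layer:

```python
import numpy as np

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
op = np.eye(3) + D_inv_sqrt @ A @ D_inv_sqrt

print(np.linalg.eigvalsh(op))     # approx. [0., 1., 2.] -- within [0, 2]

x = np.random.randn(3)
for _ in range(20):               # stacking 20 such layers, no renormalization
    x = op @ x
print(np.linalg.norm(x))          # ~2^20 growth along the top eigenvector
```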


renormalization trick:

Let $I_N + D^{-1/2} A D^{-1/2} \longrightarrow \widetilde{D}^{-1/2} \widetilde{A} \widetilde{D}^{-1/2}$, where $\widetilde{A} = A + I_N$ and $\widetilde{D}_{ii} = \sum_j \widetilde{A}_{ij}$. Then:

$$Z = \widetilde{D}^{-1/2} \widetilde{A} \widetilde{D}^{-1/2} X \Theta$$

  • $X \in \mathbb{R}^{N \times C}$: the input signal with $C$ channels (i.e. a $C$-dimensional feature vector for every node)
  • $\Theta \in \mathbb{R}^{C \times F}$: matrix of filter parameters
  • $Z \in \mathbb{R}^{N \times F}$: the convolved signal matrix

$\widetilde{A}X$ can be efficiently implemented as a product of a sparse matrix with a dense matrix.

That is, $\widetilde{A}X$ is a sparse-dense matrix product, which makes the computation efficient.
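A minimal sketch of the renormalized operator using scipy.sparse, so that the graph side of the computation stays a sparse-dense product (toy sizes, random inputs):

```python
import numpy as np
import scipy.sparse as sp

def renormalized_adj(A):
    """A_hat = D~^{-1/2} (A + I_N) D~^{-1/2}, kept as a sparse matrix."""
    A_tilde = sp.csr_matrix(A) + sp.eye(A.shape[0])    # A~ = A + I_N
    d_tilde = np.asarray(A_tilde.sum(axis=1)).ravel()  # D~_ii = sum_j A~_ij
    D_inv_sqrt = sp.diags(d_tilde ** -0.5)
    return (D_inv_sqrt @ A_tilde @ D_inv_sqrt).tocsr()

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.random.randn(3, 2)        # N x C input signal
Theta = np.random.randn(2, 4)    # C x F filter parameters

A_hat = renormalized_adj(A)      # sparse N x N
Z = A_hat @ X @ Theta            # sparse-dense product, then dense matmul; N x F
```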


semi-supervised node classification:

Method and scope of applicability:

We can relax certain assumptions typically made in graph-based semi-supervised learning by conditioning our model $f(X, A)$ both on the data $X$ and on the adjacency matrix $A$ of the underlying graph structure.

That is, conditioning the model $f(X, A)$ on both the data $X$ and the adjacency matrix $A$ relaxes the assumption that connected nodes in the graph are likely to share the same label.


Forward propagation:

Core formula:

$$Z = f(X, A) = \mathrm{softmax}\!\left( \widehat{A} \; \mathrm{ReLU}\!\left( \widehat{A} X W^{(0)} \right) W^{(1)} \right)$$

  • $\widehat{A} = \widetilde{D}^{-1/2} \widetilde{A} \widetilde{D}^{-1/2}$, computed in a pre-processing step
  • The model has two weight matrices, trained using gradient descent:
    • $W^{(0)} \in \mathbb{R}^{C \times H}$ is an input-to-hidden weight matrix for a hidden layer with $H$ feature maps
    • $W^{(1)} \in \mathbb{R}^{H \times F}$ is a hidden-to-output weight matrix
  • $\mathrm{ReLU}(\widehat{A} X W^{(0)})$ has shape $N \times H$
  • $Z$ has shape $N \times F$
  • The softmax activation function is applied row-wise
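A minimal NumPy sketch of the full forward pass (toy dimensions $N=3$, $C=2$, $H=4$, $F=2$; the weights are random rather than trained):

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax, as in the paper."""
    e = np.exp(Z - Z.max(axis=1, keepdims=True))     # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A_hat, X, W0, W1):
    """Z = softmax(A_hat ReLU(A_hat X W^(0)) W^(1))."""
    H = np.maximum(A_hat @ X @ W0, 0.0)              # hidden layer, N x H
    return softmax(A_hat @ H @ W1)                   # output, N x F

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_tilde = A + np.eye(3)
D_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt            # pre-processing step

X = np.random.randn(3, 2)
W0, W1 = np.random.randn(2, 4), np.random.randn(4, 2)
Z = gcn_forward(A_hat, X, W0, W1)                    # each row of Z sums to 1
```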

Loss function:

We evaluate the cross-entropy error over all labeled examples:

$$L = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$$

  • $\mathcal{Y}_L$ is the set of node indices that have labels.

Training method:

We perform batch gradient descent using the full dataset for every training iteration.
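A minimal full-batch training sketch in PyTorch (toy data, masks, and hyperparameters are made up; the paper's reference implementations are in TensorFlow and PyTorch). The loss is the cross-entropy above, evaluated only on the labeled nodes $\mathcal{Y}_L$:

```python
import torch
import torch.nn.functional as F

N, C, H, F_cls = 3, 2, 4, 2
A_hat = torch.eye(N)                           # placeholder for D~^{-1/2} A~ D~^{-1/2}
X = torch.randn(N, C)
labels = torch.tensor([0, 1, 0])
labeled = torch.tensor([True, False, True])    # mask for y_L, the labeled nodes

W0 = torch.randn(C, H, requires_grad=True)
W1 = torch.randn(H, F_cls, requires_grad=True)
opt = torch.optim.SGD([W0, W1], lr=0.1)

for epoch in range(200):                       # one full-batch step per iteration
    opt.zero_grad()
    logits = A_hat @ torch.relu(A_hat @ X @ W0) @ W1
    # cross-entropy over labeled examples only (softmax folded into the loss)
    loss = F.cross_entropy(logits[labeled], labels[labeled])
    loss.backward()
    opt.step()
```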

