Table of Contents

  • Overview
  • Intuition
  • Previous Work
  • Major Contribution
  • Graph Search Neural Network (GSNN)
    • GSNN Explanation
    • Three networks
    • Diagram visualization
    • Advantage
  • Incorporate the graph network into an image pipeline
  • Dataset
  • Conclusion

Overview

Note: this earlier post of mine may be helpful background for this paper summary:

  • Introduction to Graph Neural Network (GNN)

This paper investigates the use of structured prior knowledge in the form of knowledge graphs and shows that using this knowledge improves performance on image classification.

It introduces the Graph Search Neural Network (GSNN) as a way of efficiently incorporating large knowledge graphs into a vision classification pipeline; GSNN outperforms standard neural network baselines for multi-label classification.

Intuition

While modern learning-based approaches can recognize some categories with high accuracy, they usually require thousands of labeled examples for each category. Building large labeled datasets for every concept does not scale. One way around this is to use structured knowledge and reasoning, i.e., prior knowledge, which humans routinely exploit but current approaches do not.

For example, when people try to identify an animal shown in a figure, they first recognize it, then recall relevant knowledge, and finally reason about it. With this information, even after seeing only one or two pictures of the animal, we can classify it. The hope is that a model can follow a similar reasoning process.

Previous Work

There has been a lot of work on end-to-end learning on graphs, or neural networks trained on graphs. Most of these approaches either extract features from the graph or learn a propagation model that transfers evidence between nodes conditioned on the type of edge. An example is the Gated Graph Neural Network (GGNN), which takes an arbitrary graph as input. Given some initialization specific to the task, it learns how to propagate information and predict the output for every node in the graph.

Previous work focused on building and then querying knowledge bases, rather than using existing knowledge bases as side information for a vision task.

This work not only uses the attribute relationships that appear in the knowledge graph but also the relationships between objects, and it reasons directly on the graph rather than on object-attribute pairs.

Major Contribution

  1. The introduction of the GSNN as a way of incorporating potentially large knowledge graphs into an end-to-end learning system that is computationally feasible for large graphs;

  2. A framework for using noisy knowledge graphs for image classification (in vision problems, graphs encode contextual and common-sense relationships and are significantly larger and noisier);

  3. The ability to explain image classifications by using the propagation model (Interpretability).

Graph Search Neural Network (GSNN)

GSNN Explanation

The idea is that rather than performing the recurrent update over all of the nodes of the graph at once, GSNN starts from some initial nodes based on the input and only expands nodes that are useful for the final output. The model thus computes the update steps over only a subset of the graph.

Steps in GSNN (a minimal code sketch of the full loop follows this list):

  • Determine Initial Nodes

The initial nodes in the graph are chosen based on the likelihood that a visual concept is present, as determined by an object detector or classifier. In the experiments, they use Faster R-CNN detections for each of the 80 COCO categories. For scores over a chosen threshold, the corresponding graph nodes become the initial set of active nodes. The nodes adjacent to the initial nodes are then also added to the active set.

  • Propagation

Given the initial nodes, the beliefs about them are first propagated to all of the adjacent nodes (propagation network). This process is similar to GGNN.

  • Decide which nodes to expand next

After the first time step, the model needs a way of deciding which nodes to expand next, so a per-node scoring function is learned to estimate how "important" each node is. After each propagation step, for every node in the current graph, the model predicts an importance score

$$i_v^{(t)} = g_i\left(h_v, x_v\right)$$

where $g_i$ is a learned network, the importance network. Once we have the values $i_v$, we take the top $P$ scoring nodes that have never been expanded, add them to the expanded set, and add all nodes adjacent to them to the active set.

The above two steps (propagation and deciding which nodes to expand) are repeated $T$ times, where $T$ is a hyper-parameter.

Lastly, at the final time step $T$, the model computes the per-node outputs (output network) and re-orders and zero-pads them for the final classification net: the outputs are re-ordered so that nodes always appear in the same order, and any nodes that were never expanded are zero-padded.

The entire process is shown in the figure above.
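
To make the control flow concrete, here is a minimal PyTorch sketch of the expansion loop under stated assumptions: the three networks are stubbed with simple layers, `adjacency` maps a node id to the set of its neighbor ids, and `active`/`expanded` are Python sets seeded as described under "Determine Initial Nodes". None of the names or dimensions come from the authors' code.

```python
import torch
import torch.nn as nn

class GSNNSketch(nn.Module):
    """Minimal sketch of the GSNN expansion loop (not the authors' code)."""

    def __init__(self, hidden_dim, annotation_dim, out_dim, top_p=5, steps=3):
        super().__init__()
        self.top_p, self.steps = top_p, steps
        # Stand-ins for the three networks described in the next section.
        self.propagate = nn.GRUCell(hidden_dim, hidden_dim)            # propagation net
        self.importance = nn.Linear(hidden_dim + annotation_dim, 1)    # importance net g_i
        self.output = nn.Linear(hidden_dim + annotation_dim, out_dim)  # output net g

    def forward(self, h, x, adjacency, active, expanded):
        # h: [num_nodes, hidden] zero-initialized except for detected nodes;
        # x: [num_nodes, annotation]; adjacency: node id -> set of neighbor ids;
        # active/expanded: Python sets seeded as in "Determine Initial Nodes".
        for _ in range(self.steps):
            # 1) Propagation: each active node averages its active neighbors'
            #    hidden states as a message, then a GRU-style update runs.
            nodes = sorted(active)
            msg = torch.stack([
                h[list(adjacency[v] & active)].mean(0)
                if adjacency[v] & active else torch.zeros_like(h[0])
                for v in nodes])
            h = h.clone()
            h[nodes] = self.propagate(msg, h[nodes])
            # 2) Expansion: score i_v = g_i(h_v, x_v) for active nodes that were
            #    never expanded, keep the top P, and activate their neighbors.
            cand = [v for v in active if v not in expanded]
            if cand:
                scores = self.importance(torch.cat([h[cand], x[cand]], 1)).squeeze(1)
                k = min(self.top_p, len(cand))
                top = [cand[i] for i in scores.topk(k).indices.tolist()]
                expanded.update(top)
                for v in top:
                    active.update(adjacency[v])
        # 3) Per-node outputs o_v = g(h_v^{(T)}, x_v) over the final active set.
        nodes = sorted(active)
        return self.output(torch.cat([h[nodes], x[nodes]], 1)), nodes
```

Newly activated nodes keep their zero rows of `h` until the next propagation step, matching the zero initialization described in the diagram walkthrough below.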

Three networks

  • Propagation network: a standard Gated Graph Neural Network (GGNN). GGNN is a fully end-to-end network that takes a directed graph as input and outputs either a classification over the entire graph or an output for each node. See my previous post for more details on GGNN and how propagation works.

  • Output network: After $T$ time steps, we have the final hidden states. The node-level outputs can then be computed as
    $$o_v = g\left(h_v^{(T)}, x_v\right)$$
    where $g$ is a fully connected network, the output network, and $x_v$ is the original annotation for the node.

  • Importance network: learns a per-node scoring function that estimates how "important" each node is. To train the importance net, they assign a target importance value to each node in the graph for a given image: nodes corresponding to ground-truth concepts in the image get an importance value of 1, the neighbors of these nodes get $\gamma$, nodes two hops away get $\gamma^2$, and so on (see the sketch below). The idea is that nodes closest to the final output are the most important to expand. After each propagation step, for each node in the current graph, they predict an importance score
    $$i_v^{(t)} = g_i\left(h_v, x_v\right)$$
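
The target assignment amounts to a breadth-first pass from the ground-truth nodes. Here is a minimal sketch; `adjacency`, `gamma=0.3`, and `max_hops` are my own placeholders, not values from the paper.

```python
from collections import deque

def importance_targets(adjacency, ground_truth_nodes, gamma=0.3, max_hops=3):
    """Target importance gamma**d, where d is the hop distance to the
    nearest ground-truth node; gamma=0.3 is an arbitrary placeholder."""
    targets = {v: 1.0 for v in ground_truth_nodes}   # ground-truth concepts get 1
    queue = deque((v, 0) for v in ground_truth_nodes)
    while queue:
        v, d = queue.popleft()
        if d >= max_hops:
            continue
        for u in adjacency[v]:
            if u not in targets:                      # BFS keeps the shortest distance
                targets[u] = gamma ** (d + 1)
                queue.append((u, d + 1))
    return targets

# Tiny example: node 0 is ground truth; 1 is its neighbor, 2 is two hops away.
print(importance_targets({0: [1], 1: [0, 2], 2: [1]}, [0]))  # {0: 1.0, 1: 0.3, 2: ~0.09}
```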

Diagram visualization

First, $x_{init}$, the detection confidences, initialize $h_{init}^{(1)}$, the hidden states of the initially detected nodes (each visual concept in the knowledge graph, e.g., person, horse, cat, is represented by a hidden state). We then initialize $h_{adj1}^{(1)}$, the hidden states of the adjacent nodes, to $0$, and update all hidden states using the propagation net. The values $h^{(2)}$ are then used to predict the importance scores $i^{(1)}$, which pick the next nodes to add, $adj2$. These nodes are initialized with $h_{adj2}^{(2)} = 0$, and the hidden states are updated again through the propagation net. After $T$ steps, we take all of the accumulated hidden states $h^{(T)}$ to predict the GSNN outputs for all the active nodes. During backpropagation, the binary cross-entropy (BCE) loss is fed backward through the output layer, and the importance losses are fed through the importance networks, to update the network parameters.
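
In code, the two loss paths above could be combined roughly as follows; the MSE form for the importance loss and the weighting `lam` are my assumptions, not specified in this summary.

```python
import torch.nn.functional as F

def gsnn_loss(class_logits, labels, importance_preds, importance_tgts, lam=1.0):
    """Sketch of the combined objective: multi-label BCE on the final outputs,
    plus a per-step loss on the predicted importance scores. `lam` and the
    MSE form are placeholders, not taken from the paper."""
    bce = F.binary_cross_entropy_with_logits(class_logits, labels)
    imp = sum(F.mse_loss(p, t) for p, t in zip(importance_preds, importance_tgts))
    return bce + lam * imp
```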

One final detail is the addition of a "node bias" to GSNN. In GGNN, the per-node output function $g\left(h_v^{(T)}, x_v\right)$ takes in the hidden state and the initial annotation of node $v$ to compute its output. The output equation becomes $g\left(h_v^{(T)}, x_v, n_v\right)$, where $n_v$ is a bias term tied to the particular node $v$ in the overall graph. This value is stored in a table, and its entries are updated by backpropagation.
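
A natural way to realize the table of $n_v$ values is a learned embedding indexed by node id; a minimal sketch (my own formulation, assuming one bias vector per node):

```python
import torch
import torch.nn as nn

class OutputNetWithNodeBias(nn.Module):
    """Output net g(h_v, x_v, n_v): n_v is a learned per-node bias, stored in
    an embedding table and updated by backpropagation like any other weight."""

    def __init__(self, num_nodes, hidden_dim, annotation_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(hidden_dim + annotation_dim, out_dim)
        self.node_bias = nn.Embedding(num_nodes, out_dim)   # the n_v table

    def forward(self, h_v, x_v, node_ids):
        # node_ids: LongTensor of graph node indices for the rows in h_v/x_v.
        return self.fc(torch.cat([h_v, x_v], dim=1)) + self.node_bias(node_ids)
```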

Advantage

  • This architecture mitigates the computational issues of Gated Graph Neural Networks on large graphs, which allows the model to be trained efficiently for image tasks using large knowledge graphs.

  • Importantly, the GSNN model is also able to provide explanations for its classifications by following how information is propagated in the graph.

Incorporate the graph network into an image pipeline

We take the output of the graph network, re-order it so that nodes always appear in the same order, and zero-pad any nodes that were not expanded. With a graph of 316 node outputs and a 5-dim output per node, this gives a 1580-dim feature vector. We concatenate this with the fc7 layer (4096-dim) of a fine-tuned VGG-16 network and the top score for each COCO category predicted by Faster R-CNN (80-dim). The resulting 5756-dim feature vector is fed into a 1-layer final classification network trained with dropout.

You can think of this as a form of feature engineering in which you concatenate the outputs of models with different structures.
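
A sketch of the fusion step using the dimensions given above (316 × 5 = 1580 graph features, 4096 fc7 features, 80 detector scores, 5756 in total); the dropout rate and the 316-way label space are my placeholders:

```python
import torch
import torch.nn as nn

NUM_NODES, NODE_OUT = 316, 5                  # 316 graph nodes, 5-dim output each
FC7_DIM, NUM_COCO, NUM_LABELS = 4096, 80, 316

classifier = nn.Sequential(                   # the 1-layer final classification net
    nn.Dropout(0.5),                          # dropout rate is a placeholder
    nn.Linear(NUM_NODES * NODE_OUT + FC7_DIM + NUM_COCO, NUM_LABELS),
)

def fuse(node_outputs, active_ids, fc7, detector_scores):
    """Re-order/zero-pad the graph outputs into a fixed node order, then
    concatenate with VGG-16 fc7 features and Faster R-CNN top scores."""
    graph_feat = torch.zeros(NUM_NODES, NODE_OUT)
    graph_feat[active_ids] = node_outputs     # unexpanded nodes stay zero
    feat = torch.cat([graph_feat.flatten(), fc7, detector_scores])   # 5756-dim
    return classifier(feat.unsqueeze(0))
```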

Dataset

COCO: COCO is a large-scale object detection, segmentation, and captioning dataset, which includes 80 object categories.

Visual Genome: a dataset that represents the complex, noisy visual world, with many different kinds of objects, labels that are potentially ambiguous and overlapping, and categories that fall into a long-tail distribution (skewed to the right).

Visual Genome contains over 100,000 natural images from the Internet. Each image is labeled with objects, attributes, and relationships between objects, entered by human annotators. The authors create a subset of Visual Genome, which they call the Visual Genome multi-label dataset, or VGML, covering 316 visual concepts.

Using only the train split, they build a knowledge graph connecting the concepts using the most common object-attribute and object-object relationships in the dataset. Specifically, they count how often an object/object relationship or object/attribute pair occurs in the training set and prune any edges with fewer than 200 instances. This leaves a graph over all of the images in which each edge is a common relationship.
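
A minimal sketch of that counting-and-pruning step, assuming the annotations are available as (concept, relation, concept) triples; all names here are placeholders:

```python
from collections import Counter

MIN_COUNT = 200   # prune edges seen fewer than 200 times in the train split

def build_graph(train_triples):
    """train_triples: iterable of (concept_a, relation, concept_b) tuples from
    the object-object and object-attribute annotations of the train split."""
    counts = Counter(train_triples)
    edges = [edge for edge, n in counts.items() if n >= MIN_COUNT]
    adjacency = {}
    for a, rel, b in edges:               # keep only the common relationships
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return edges, adjacency
```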

Visual Genome + WordNet: adds outside semantic knowledge from WordNet, since the Visual Genome graphs do not contain useful semantic relationships. For instance, it might be helpful to know that a dog is an animal if the visual system sees a dog and one of the labels is animal. See the original paper for how WordNet is merged into the graph.
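
For a flavor of the kind of relationship WordNet contributes, here is a hypernym lookup with NLTK's WordNet interface (my illustration of the "dog is an animal" example; the paper's actual merging procedure differs):

```python
# Requires: pip install nltk && python -m nltk.downloader wordnet
from nltk.corpus import wordnet as wn

dog = wn.synsets("dog")[0]                     # Synset('dog.n.01')
print(dog.hypernyms())                         # direct parents, e.g. canine.n.02
# Walk the full hypernym closure to confirm that a dog is an animal:
print(any(s.name() == "animal.n.01"
          for s in dog.closure(lambda s: s.hypernyms())))   # True
```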

Conclusion

In this paper, they present the Graph Search Neural Network (GSNN) as a way of efficiently using knowledge graphs as extra information to improve image classification.

Next Steps:

The GSNN and the framework they use for vision problems are completely general. The next steps would be to apply the GSNN to other vision tasks, such as detection, visual question answering, and image captioning.

Another interesting direction would be to combine the procedure of this work with a system such as NEIL to create a system which builds knowledge graphs and then prunes them to get a more accurate, useful graph for image tasks.


Reference:

  • Visual Genome: https://visualgenome.org
  • The More You Know: Using Knowledge Graphs for Image Classification: https://arxiv.org/abs/1612.04844
