1. Visual reasoning [from RAVEN, CVPR19]

Early attempts were made from the 1940s to the 1970s in the field of logic-based AI. Newell argued that one of the potential solutions to AI was “to construct a single program that would take a standard intelligence test” [42]. There were two important trials: (i) Evans presented an AI algorithm that solved a type of geometric analogy task in the Wechsler Adult Intelligence Scale (WAIS) test [10, 11], and (ii) Simon and Kotovsky devised a program that solved Thurstone letter series completion problems [54]. However, these early attempts were heuristic-based with hand-crafted rules, making them difficult to apply to other problems.
The reasoning ability of modern vision systems was first systematically analyzed in the CLEVR dataset [22]. By carefully controlling inductive bias and slicing the vision systems’ reasoning ability into several axes, Johnson et al. successfully identified major drawbacks of existing models. A subsequent work [23] on this dataset achieved good performance by introducing a program generator in a structured space and combining it with a program execution engine. A similar work that also leveraged language-guided structured reasoning was proposed in [18]. Modules with special attention mechanisms were later proposed in an end-to-end manner to solve this visual reasoning task [19, 49, 59]. However, superior performance gains were observed in very recent works [6, 36, 58] that fell back to structured representations by using primitives, dependency trees, or logic. These works also inspire us to incorporate structure information into solving the RPM problem.

More generally, Bisk et al. [4] studied visual reasoning in a 3D block world. Perez et al. [46] introduced a conditional layer for visual reasoning. Aditya et al. [1] proposed a probabilistic soft logic in an attention module to increase model interpretability. Barrett et al. [3] measured abstract reasoning in neural networks.


