Table of Contents

  • 0. Preface
  • 1. ACM MM
  • 2. CVPR
  • 3. ICCV
  • 4. AAAI

Last updated: 2019.12 (first draft)


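0. Preface

The first step in learning VQA is a preliminary literature survey. Here I review the papers published at major conferences over the past few years to get a sense of how the field has progressed, mainly covering CVPR, ICCV, ECCV, ACM MM, and AAAI. Afterwards I plan to summarize the commonly used datasets and the classic methods.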

1. ACM MM

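ACM MM is a major international conference in the multimedia area of computer science. It focuses on integrating and processing multi-perspective information from different digital media. VQA falls under the Vision and Language branch of its Understanding Multimedia Content theme.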

1.1 ACM MM 2019

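  • Incomplete count: 5 papers (including Video / Visual Question Answering)
Paper title | Affiliation
Multi-interaction Network with Object Relation for VideoQA | Zhejiang University
Learnable Aggregating Net with Divergent Loss for VideoQA | University of Electronic Science and Technology of China
Question-Aware Tube-Switch Network for VideoQA | University of Science and Technology of China
CRA-Net: Composed Relation Attention Network for Visual QA | University of Electronic Science and Technology of China
Erasing-based Attention Learning for Visual QA | Institute of Automation, Chinese Academy of Sciences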

1.2 ACM MM 2018

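  • Incomplete count: 4 papers (including Video / Visual Question Answering)
Paper title | Affiliation
Explore Multi-Step Reasoning in Video Question Answering | Tianjin University
Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering | Southern University of Science and Technology
Object-Difference Attention: A Simple Relational Attention for Visual Question Answering | Beijing University of Posts and Telecommunications
Enhancing Visual Question Answering Using Dropout | Institute of Automation, Chinese Academy of Sciences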

1.3 ACM MM 2017

  • Incomplete count: 4 papers (including Video / Visual Question Answering); 2 are listed below
Paper title | Affiliation
VideoQA via Hierarchical Dual-Level Attention Network Learning | Zhejiang University
VideoQA via Gradually Refined Attention over Appearance and Motion | Zhejiang University

2. CVPR

CVPR stands for Conference on Computer Vision and Pattern Recognition. It is usually held around June each year.

2.1 CVPR 2019

  • Incomplete count: 12 papers (including Video / Visual Question Answering), though only one appears to be video-based
Paper title | Affiliation
Heterogeneous Memory Enhanced Multimodal Attention Model for VideoQA | JD Research Institute
MUREL: Multimodal Relational Reasoning for Visual Question Answering
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Deep Modular Co-Attention Networks for Visual Question Answering
Visual Question Answering as Reading Comprehension
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
Cycle-Consistency for Robust Visual Question Answering
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Progressive Attention Memory Network for Movie Story Question Answering
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Explicit Bias Discovery in Visual Question Answering Models
Answer Them All! Toward Universal Visual Question Answering Models

2.2 CVPR 2018

  • Incomplete count: 15 papers (including Video / Visual Question Answering), though only one appears to be video-based
Paper title | Affiliation
Motion-Appearance Co-Memory Networks for Video Question Answering
* Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge
Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Learning Answer Embeddings for Visual Question Answering
Cross-Dataset Adaptation for Visual Question Answering
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Visual Question Generation as Dual Task of Visual Question Answering
Focal Visual-Text Attention for Visual Question Answering
Visual Question Answering With Memory-Augmented Networks
Visual Question Reasoning on General Dependency Tree
Differential Attention for Visual Question Answering
Learning Visual Knowledge Memory Networks for Visual Question Answering
IVQA: Inverse Visual Question Answering
Customized Image Narrative Generation via Interactive Visual Question Generation and Answering

2.3 CVPR 2017

  • Incomplete count: 9 papers (including Video / Visual Question Answering); a few are video-based (e.g., TGIF-QA)
Paper title | Affiliation
Graph-Structured Representations for Visual Question Answering
Knowledge Acquisition for Visual Question Answering via Iterative Querying
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering
Empirical Evaluation of Visual Question Answering for Novel Objects
Multi-Level Attention Networks for Visual Question Answering
A Dataset and Exploration of Models for Understanding Video Data Through Fill-In-The-Blank Question-Answering
Making the v in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2.4 CVPR 2016

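  • Incomplete count: 8 papers (including Video / Visual Question Answering); none are video-based apart from MovieQA, and the field looks like it was just getting started
Paper title | Affiliation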
Stacked Attention Networks for Image Question Answering
Image Question Answering Using Convolutional Neural Network With Dynamic Parameter Prediction
Where to Look: Focus Regions for Visual Question Answering
Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources
MovieQA: Understanding Stories in Movies Through Question-Answering
Answer-Type Prediction for Visual Question Answering
Visual7W: Grounded Question Answering in Images
Yin and Yang: Balancing and Answering Binary Visual Questions

3. ICCV

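ICCV stands for International Conference on Computer Vision. It is held every two years at locations around the world. Its acceptance rate is relatively low, so it is highly regarded in the field and is widely considered the most prestigious of the three top CV conferences.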

3.1 ICCV 2019

  • Incomplete count: 5 papers (including Video / Visual Question Answering)
Paper title | Affiliation
Compact Trilinear Interaction for Visual Question Answering
Why Does a Visual Question Have Different Answers?
Scene Text Visual Question Answering
Multi-Modality Latent Interaction Network for Visual Question Answering
Relation-Aware Graph Attention Network for Visual Question Answering

3.2 ICCV 2017

  • Incomplete count: 6 papers (including Video / Visual Question Answering)
Paper title | Affiliation
Learning to Reason: End-To-End Module Networks for Visual Question Answering
Structured Attentions for Visual Question Answering
Multi-Modal Factorized Bilinear Pooling With Co-Attention Learning for Visual Question Answering
An Analysis of Visual Question Answering Algorithms
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
MarioQA: Answering Questions by Watching Gameplay Videos

3.3 ICCV 2015

  • Judging by the title, this looks like the first paper on the topic
Paper title | Affiliation
VQA: Visual Question Answering

4. AAAI
