Subjects: cs.CV

1. Spatiotemporal Deformation Perception for Fisheye Video Rectification

Authors: Shangrong Yang, Chunyu Lin, Kang Liao, Yao Zhao

Paper: https://arxiv.org/abs/2302.03934v1

Code: https://github.com/uof1745-cmd/sdp

Abstract:

Although the distortion correction of fisheye images has been extensively studied, the correction of fisheye videos remains an elusive challenge. Existing image correction methods ignore the correlation between frames of a fisheye video, resulting in temporal jitter in the corrected video. To solve this problem, we propose a temporal weighting scheme that obtains a plausible global optical flow and mitigates the jitter effect by progressively reducing the weights of frames. Subsequently, we observe that the inter-frame optical flow of a video helps perceive the local spatial deformation of the fisheye video. Therefore, we derive the spatial deformation from the flows of the fisheye and distortion-free videos, thereby improving the local accuracy of the predicted result. However, correcting each frame independently disrupts the temporal correlation. Owing to the nature of fisheye video, a distorted moving object may find its distortion-free pattern at another moment. To this end, we design a temporal deformation aggregator to reconstruct the deformation correlation between frames and provide a reliable global feature. Our method achieves end-to-end correction and demonstrates superiority in correction quality and stability compared with SOTA correction methods.
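The abstract does not give the exact weighting formula, but the core idea of the temporal weighting scheme (progressively down-weighting older frames when aggregating per-frame optical flows into one global flow) can be sketched as follows; the geometric `decay` factor and the plain weighted average are illustrative assumptions, not the authors' exact design:

```python
import numpy as np

def temporally_weighted_flow(flows, decay=0.8):
    """Combine per-frame optical flows into one global flow.

    flows: array of shape (T, H, W, 2), the flow field for each of T frames.
    decay: factor by which the weights of older frames are progressively
           reduced (illustrative; the paper may use a different schedule).
    Returns an (H, W, 2) weighted-average flow.
    """
    T = flows.shape[0]
    # The most recent frame gets weight 1; earlier frames decay
    # geometrically, so an outlier (jittery) old frame has little influence.
    weights = decay ** np.arange(T - 1, -1, -1)  # oldest ... newest
    weights = weights / weights.sum()            # normalize to sum to 1
    # Weighted sum over the time axis: (T,) x (T, H, W, 2) -> (H, W, 2)
    return np.tensordot(weights, flows, axes=(0, 0))
```

With flows that are constant across frames, the output equals that flow; when one distant frame jitters, its small weight bounds its contribution to the global estimate.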

2. Convolutional Neural Networks Trained to Identify Words Provide a Good Account of Visual Form Priming Effects

Authors: Dong Yin, Valerio Biscione, Jeffrey Bowers

Paper: https://arxiv.org/abs/2302.03992v1

Code: https://github.com/don-yin/orthographic-dnn

Abstract:

A wide variety of orthographic coding schemes and models of visual word identification have been developed to account for masked priming data that provide a measure of orthographic similarity between letter strings. These models tend to include hand-coded orthographic representations with single-unit coding for specific forms of knowledge (e.g., units coding for a letter in a given position or a letter sequence). Here we assess how well a range of these coding schemes and models account for the pattern of form priming effects taken from the Form Priming Project, and compare these findings to results observed with 11 standard deep neural network models (DNNs) developed in computer science. We find that deep convolutional networks perform as well as or better than the coding schemes and word recognition models, whereas transformer networks did less well. The success of convolutional networks is remarkable because their architectures were not developed to support word recognition (they were designed to perform well at object recognition) and they classify pixel images of words (rather than artificial encodings of letter strings). These findings add to recent work (Hannagan et al., 2021) suggesting that convolutional networks may capture key aspects of visual word identification.
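One way to read the modeling setup: a DNN's predicted priming strength for a prime-target pair is the similarity of the network's internal representations of the two strings rendered as pixel images. A minimal sketch, assuming feature vectors have already been extracted from some hidden layer; the cosine measure and the function name are illustrative assumptions, not necessarily the similarity measure the paper uses:

```python
import numpy as np

def priming_score(prime_features, target_features):
    """Predicted form-priming strength for a prime-target pair,
    taken here as the cosine similarity between the hidden-layer
    activations a word-trained network produces for the two stimuli.
    """
    p = np.ravel(prime_features).astype(float)
    t = np.ravel(target_features).astype(float)
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))
```

Under this reading, an identity prime (the same string) scores at ceiling, while an unrelated prime with non-overlapping activations scores near zero, and the model's account of priming is evaluated by how well these scores track the human priming effects.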

3. Cross-Layer Retrospective Retrieving via Layer Attention (ICLR 2023)

Authors: Yanwen Fang, Yuxi Cai, Jintai Chen, Jingyu Zhao, Guangjian Tian, Guodong Li

Paper: https://arxiv.org/abs/2302.03985v2

Code: https://github.com/joyfang1106/mrla

Abstract:

More and more evidence has shown that strengthening layer interactions can enhance the representation power of a deep neural network, while self-attention excels at learning interdependencies by retrieving query-activated information. Motivated by this, we devise a cross-layer attention mechanism, called multi-head recurrent layer attention (MRLA), that sends a query representation of the current layer to all previous layers to retrieve query-related information from different levels of receptive fields. A lightweight version of MRLA is also proposed to reduce the quadratic computation cost. The proposed layer attention mechanism can enrich the representation power of many state-of-the-art vision networks, including CNNs and vision transformers. Its effectiveness has been extensively evaluated on image classification, object detection, and instance segmentation tasks, where improvements can be consistently observed. For example, our MRLA can improve Top-1 accuracy on ResNet-50 by 1.6%, while introducing only 0.16M parameters and 0.07B FLOPs. Surprisingly, it can boost performance by a large margin of 3-4% box AP and mask AP on dense prediction tasks. Our code is available at https://github.com/joyfang1106/MRLA.
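The retrieval step of MRLA can be pictured as standard scaled dot-product attention where the "sequence" dimension runs over previous layers rather than tokens. A minimal single-vector sketch, assuming spatial dimensions are already pooled away and reusing the layer features as both keys and values; the paper learns separate projections and a lightweight recurrent variant, so this is an illustration of the idea, not the authors' implementation:

```python
import numpy as np

def layer_attention(query, layer_feats, num_heads=2):
    """Multi-head cross-layer attention: the current layer's query
    retrieves query-related information from all previous layers.

    query:       (d,)   query vector of the current layer
    layer_feats: (L, d) one pooled feature vector per previous layer,
                 reused here as both keys and values for simplicity
    """
    d = query.shape[0]
    dh = d // num_heads                            # per-head width
    q = query.reshape(num_heads, dh)               # (H, dh)
    kv = layer_feats.reshape(-1, num_heads, dh)    # (L, H, dh)
    out = np.empty_like(q, dtype=float)
    for h in range(num_heads):
        scores = kv[:, h, :] @ q[h] / np.sqrt(dh)  # attention over layers
        w = np.exp(scores - scores.max())
        w /= w.sum()                               # softmax weights, sum to 1
        out[h] = w @ kv[:, h, :]                   # weighted sum of "values"
    return out.reshape(d)
```

A quick sanity check on the mechanics: when every previous layer carries the same feature vector, the softmax-weighted sum reduces to that vector regardless of the query, while differing layer features are mixed according to their dot-product affinity with the current query.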
