Subjects: cs.CV

1. Spatiotemporal Deformation Perception for Fisheye Video Rectification

Authors: Shangrong Yang, Chunyu Lin, Kang Liao, Yao Zhao

Paper: https://arxiv.org/abs/2302.03934v1

Code: https://github.com/uof1745-cmd/sdp

Abstract:

Although the distortion correction of fisheye images has been extensively studied, the correction of fisheye videos remains an elusive challenge. Existing image correction methods ignore the correlation between frames of a fisheye video, resulting in temporal jitter in the corrected video. To solve this problem, we propose a temporal weighting scheme that obtains a plausible global optical flow and mitigates the jitter effect by progressively reducing the weights of frames. Subsequently, we observe that the inter-frame optical flow of a video helps perceive the local spatial deformation of the fisheye video. Therefore, we derive the spatial deformation from the flows of the fisheye and distortion-free videos, thereby improving the local accuracy of the predicted result. However, correcting each frame independently disrupts the temporal correlation. Owing to the nature of fisheye video, a distorted moving object may find its distortion-free pattern at another moment. To this end, we design a temporal deformation aggregator to reconstruct the deformation correlation between frames and provide a reliable global feature. Our method achieves end-to-end correction and demonstrates superiority in correction quality and stability compared with SOTA correction methods.
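The abstract does not give the exact weighting formula, but the core idea of the temporal weighting scheme (progressively down-weighting older frames when aggregating per-frame optical flows into one global flow) can be sketched as follows; the geometric `decay` factor and the plain weighted average are illustrative assumptions, not the authors' exact design:

```python
import numpy as np

def temporally_weighted_flow(flows, decay=0.8):
    """Combine per-frame optical flows into one global flow.

    flows: array of shape (T, H, W, 2), the flow field for each of T frames.
    decay: factor by which the weights of older frames are progressively
           reduced (illustrative; the paper may use a different schedule).
    Returns an (H, W, 2) weighted-average flow.
    """
    T = flows.shape[0]
    # The most recent frame gets weight 1; earlier frames decay
    # geometrically, so an outlier (jittery) old frame has little influence.
    weights = decay ** np.arange(T - 1, -1, -1)  # oldest ... newest
    weights = weights / weights.sum()            # normalize to sum to 1
    # Weighted sum over the time axis: (T,) x (T, H, W, 2) -> (H, W, 2)
    return np.tensordot(weights, flows, axes=(0, 0))
```

With flows that are constant across frames, the output equals that flow; when one distant frame jitters, its small weight bounds its contribution to the global estimate.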

2. Convolutional Neural Networks Trained to Identify Words Provide a Good Account of Visual Form Priming Effects

Authors: Dong Yin, Valerio Biscione, Jeffrey Bowers

Paper: https://arxiv.org/abs/2302.03992v1

Code: https://github.com/don-yin/orthographic-dnn

Abstract:

A wide variety of orthographic coding schemes and models of visual word identification have been developed to account for masked priming data that provide a measure of orthographic similarity between letter strings. These models tend to include hand-coded orthographic representations with single-unit coding for specific forms of knowledge (e.g., units coding for a letter in a given position or a letter sequence). Here we assess how well a range of these coding schemes and models account for the pattern of form priming effects taken from the Form Priming Project, and compare these findings to results observed with 11 standard deep neural network models (DNNs) developed in computer science. We find that deep convolutional networks perform as well as or better than the coding schemes and word recognition models, whereas transformer networks did less well. The success of convolutional networks is remarkable because their architectures were not developed to support word recognition (they were designed to perform well at object recognition) and they classify pixel images of words (rather than artificial encodings of letter strings). These findings add to recent work (Hannagan et al., 2021) suggesting that convolutional networks may capture key aspects of visual word identification.
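One way to read the modeling setup: a DNN's predicted priming strength for a prime-target pair is the similarity of the network's internal representations of the two strings rendered as pixel images. A minimal sketch, assuming feature vectors have already been extracted from some hidden layer; the cosine measure and the function name are illustrative assumptions, not necessarily the similarity measure the paper uses:

```python
import numpy as np

def priming_score(prime_features, target_features):
    """Predicted form-priming strength for a prime-target pair,
    taken here as the cosine similarity between the hidden-layer
    activations a word-trained network produces for the two stimuli.
    """
    p = np.ravel(prime_features).astype(float)
    t = np.ravel(target_features).astype(float)
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))
```

Under this reading, an identity prime (the same string) scores at ceiling, while an unrelated prime with non-overlapping activations scores near zero, and the model's account of priming is evaluated by how well these scores track the human priming effects.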

3. Cross-Layer Retrospective Retrieving via Layer Attention (ICLR 2023)

Authors: Yanwen Fang, Yuxi Cai, Jintai Chen, Jingyu Zhao, Guangjian Tian, Guodong Li

Paper: https://arxiv.org/abs/2302.03985v2

Code: https://github.com/joyfang1106/mrla

Abstract:

More and more evidence has shown that strengthening layer interactions can enhance the representation power of a deep neural network, while self-attention excels at learning interdependencies by retrieving query-activated information. Motivated by this, we devise a cross-layer attention mechanism, called multi-head recurrent layer attention (MRLA), that sends a query representation of the current layer to all previous layers to retrieve query-related information from different levels of receptive fields. A lightweight version of MRLA is also proposed to reduce the quadratic computation cost. The proposed layer attention mechanism can enrich the representation power of many state-of-the-art vision networks, including CNNs and vision transformers. Its effectiveness has been extensively evaluated on image classification, object detection, and instance segmentation tasks, where improvements can be consistently observed. For example, our MRLA can improve Top-1 accuracy on ResNet-50 by 1.6%, while introducing only 0.16M parameters and 0.07B FLOPs. Surprisingly, it can boost performance by a large margin of 3-4% box AP and mask AP on dense prediction tasks. Our code is available at https://github.com/joyfang1106/MRLA.
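The retrieval step of MRLA can be pictured as standard scaled dot-product attention where the "sequence" dimension runs over previous layers rather than tokens. A minimal single-vector sketch, assuming spatial dimensions are already pooled away and reusing the layer features as both keys and values; the paper learns separate projections and a lightweight recurrent variant, so this is an illustration of the idea, not the authors' implementation:

```python
import numpy as np

def layer_attention(query, layer_feats, num_heads=2):
    """Multi-head cross-layer attention: the current layer's query
    retrieves query-related information from all previous layers.

    query:       (d,)   query vector of the current layer
    layer_feats: (L, d) one pooled feature vector per previous layer,
                 reused here as both keys and values for simplicity
    """
    d = query.shape[0]
    dh = d // num_heads                            # per-head width
    q = query.reshape(num_heads, dh)               # (H, dh)
    kv = layer_feats.reshape(-1, num_heads, dh)    # (L, H, dh)
    out = np.empty_like(q, dtype=float)
    for h in range(num_heads):
        scores = kv[:, h, :] @ q[h] / np.sqrt(dh)  # attention over layers
        w = np.exp(scores - scores.max())
        w /= w.sum()                               # softmax weights, sum to 1
        out[h] = w @ kv[:, h, :]                   # weighted sum of "values"
    return out.reshape(d)
```

A quick sanity check on the mechanics: when every previous layer carries the same feature vector, the softmax-weighted sum reduces to that vector regardless of the query, while differing layer features are mixed according to their dot-product affinity with the current query.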
