Click to star 我爱计算机视觉 for faster access to the latest CV/ML techniques.


CV Jun has rounded up the CVPR 2019 papers posted over the past two days, covering: metric learning, video object segmentation, GAN-based image generation, object surface mesh generation from RGB images, depth completion, efficient convolutional network design, and efficient semantic segmentation.

Reply "cvpr314" in the chat interface of the 我爱计算机视觉 WeChat public account to receive downloads of all the papers listed in this article.

[Oral] Hardness-Aware Deep Metric Learning

Wenzhao Zheng, Zhaodong Chen, Jiwen Lu, Jie Zhou

Abstract: This paper presents a hardness-aware deep metric learning (HDML) framework. Most previous deep metric learning methods employ the hard negative mining strategy to alleviate the lack of informative samples for training. However, this mining strategy only utilizes a subset of training data, which may not be enough to characterize the global geometry of the embedding space comprehensively. To address this problem, we perform linear interpolation on embeddings to adaptively manipulate their hard levels and generate corresponding label-preserving synthetics for recycled training, so that information buried in all samples can be fully exploited and the metric is always challenged with proper difficulty. Our method achieves very competitive performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
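The central move in this abstract, interpolating embeddings to adjust their hard levels, is simple to sketch. Below is a minimal PyTorch illustration, not the authors' released code: the fixed `hardness` value and the unit-norm projection are assumptions (the paper adapts the hard level to the state of the learned metric).

```python
import torch
import torch.nn.functional as F

def synthesize_harder_negative(anchor, negative, hardness):
    # Pull the negative toward the anchor in embedding space: shrinking the
    # anchor-negative distance yields a harder, label-preserving synthetic.
    # hardness = 0 keeps the original negative; hardness -> 1 makes it
    # maximally hard (the paper adapts this level during training).
    synthetic = negative + hardness * (anchor - negative)
    # Re-project onto the unit hypersphere, a common metric-learning
    # convention (an assumption, not stated in the abstract).
    return F.normalize(synthetic, dim=-1)

anchor = F.normalize(torch.randn(8, 128), dim=-1)
negative = F.normalize(torch.randn(8, 128), dim=-1)
harder = synthesize_harder_negative(anchor, negative, hardness=0.5)
```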

https://arxiv.org/abs/1903.05503

https://github.com/wzzheng/HDML

[Oral] A Skeleton-bridged Deep Learning Approach for Generating Meshes of Complex Topologies from Single RGB Images

Jiapeng Tang, Xiaoguang Han, Junyi Pan, Kui Jia, Xin Tong

Abstract: This paper focuses on the challenging task of learning 3D object surface reconstructions from single RGB images. Existing methods achieve varying degrees of success by using different geometric representations. However, they all have their own drawbacks and cannot reconstruct surfaces of complex topologies well. To this end, we propose in this paper a skeleton-bridged, stage-wise learning approach to address the challenge. Our use of the skeleton is due to its nice property of topology preservation, while being of lower complexity to learn. To learn the skeleton from an input image, we design a deep architecture whose decoder is based on a novel design of parallel streams, respectively for the synthesis of curve-like and surface-like skeleton points. We use the different shape representations of point cloud, volume, and mesh in our stage-wise learning, in order to take their respective advantages. We also propose multi-stage use of the input image to correct prediction errors that may accumulate in each stage. We conduct intensive experiments to investigate the efficacy of our proposed approach. Qualitative and quantitative results on representative object categories of both simple and complex topologies demonstrate the superiority of our approach over existing ones. We will make our ShapeNet-Skeleton dataset publicly available.
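As a rough picture of the decoder described above (parallel streams that synthesize curve-like and surface-like skeleton points), here is a speculative two-branch sketch. The branch widths, point counts, and plain-MLP form are all assumptions for illustration; the paper's streams and stage-wise pipeline are considerably more elaborate.

```python
import torch
import torch.nn as nn

class ParallelSkeletonDecoder(nn.Module):
    # Two parallel streams regress curve-like and surface-like skeleton
    # points from a shared image encoding; widths and point counts are
    # illustrative assumptions.
    def __init__(self, feat_dim=256, n_curve=600, n_surface=2000):
        super().__init__()
        def stream(n_points):
            return nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                 nn.Linear(512, n_points * 3))
        self.curve_stream = stream(n_curve)
        self.surface_stream = stream(n_surface)

    def forward(self, image_code):                      # (N, feat_dim)
        n = image_code.size(0)
        curve = self.curve_stream(image_code).view(n, -1, 3)
        surface = self.surface_stream(image_code).view(n, -1, 3)
        return torch.cat([curve, surface], dim=1)       # skeletal points

points = ParallelSkeletonDecoder()(torch.randn(2, 256))  # (2, 2600, 3)
```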

https://arxiv.org/abs/1903.04704

Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis

Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang

Abstract: Most conditional generation tasks expect diverse outputs given a single conditional context. However, conditional generative adversarial networks (cGANs) often focus on the prior conditional information and ignore the input noise vectors, which contribute to the output variations. Recent attempts to resolve the mode collapse issue for cGANs are usually task-specific and computationally expensive. In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs. The proposed method explicitly maximizes the ratio of the distance between generated images with respect to the corresponding latent codes, thus encouraging the generators to explore more minor modes during training. This mode seeking regularization term is readily applicable to various conditional generation tasks without imposing training overhead or modifying the original network structures. We validate the proposed algorithm on three conditional image synthesis tasks including categorical generation, image-to-image translation, and text-to-image synthesis with different baseline models. Both qualitative and quantitative results demonstrate the effectiveness of the proposed regularization method for improving diversity without loss of quality.
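The regularization term is concrete enough in the abstract to sketch directly: maximize the distance between two images generated from the same condition relative to the distance between their latent codes. The L1 distances and inverted-ratio form below are assumptions about the exact formulation, offered as a sketch rather than the released code.

```python
import torch

def mode_seeking_loss(img1, img2, z1, z2, eps=1e-5):
    # Ratio of image-space distance to latent-space distance for two
    # samples generated under the same condition. Maximizing the ratio
    # pushes distinct latent codes toward distinct images, so we minimize
    # its inverse alongside the usual cGAN losses.
    d_img = torch.mean(torch.abs(img1 - img2))
    d_z = torch.mean(torch.abs(z1 - z2))
    return 1.0 / (d_img / (d_z + eps) + eps)

# Usage sketch (G and the condition c are placeholders): generate twice
# with the same condition but different noise vectors, then add the term.
#   z1, z2 = torch.randn(n, d), torch.randn(n, d)
#   loss = cgan_loss + lambda_ms * mode_seeking_loss(G(c, z1), G(c, z2), z1, z2)
```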

https://arxiv.org/abs/1903.05628

https://github.com/HelenMao/MSGAN

RVOS: End-to-End Recurrent Network for Video Object Segmentation

Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques, Xavier Giro-i-Nieto

Abstract: Multiple-object video object segmentation is a challenging task, especially in the zero-shot case, when no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. In our work, we propose a Recurrent network for multiple-object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence on two different domains: (i) the spatial, which allows it to discover the different object instances within a frame, and (ii) the temporal, which allows it to keep the coherence of the segmented objects over time. We train RVOS for zero-shot video object segmentation and are the first to report quantitative results on the DAVIS-2017 and YouTube-VOS benchmarks. Further, we adapt RVOS for one-shot video object segmentation by using the masks obtained in previous time steps as inputs to be processed by the recurrent module. Our model reaches results comparable to state-of-the-art techniques on the YouTube-VOS benchmark and outperforms all previous video object segmentation methods not using online learning on the DAVIS-2017 benchmark. Moreover, our model achieves faster inference runtimes than previous methods, reaching 44 ms/frame on a P100 GPU.
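To make the two recurrence domains concrete, here is a heavily simplified sketch: an outer loop over frames (temporal) and an inner loop over instances (spatial), with hidden states carried across both. The single-scale ConvRNN-style update, the `max_instances` cap, and the mask head are assumptions; RVOS itself uses a multi-scale recurrent decoder over encoder features.

```python
import torch
import torch.nn as nn

class DoublyRecurrentDecoder(nn.Module):
    # Outer loop over frames (temporal recurrence), inner loop over object
    # instances (spatial recurrence); hidden states persist across frames
    # and each instance is conditioned on the previous one.
    def __init__(self, channels, max_instances):
        super().__init__()
        self.max_instances = max_instances
        self.update = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.mask_head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, frame_feats):                    # list of (N, C, H, W)
        hidden = [torch.zeros_like(frame_feats[0])
                  for _ in range(self.max_instances)]
        all_masks = []
        for feat in frame_feats:                       # temporal recurrence
            masks, prev = [], torch.zeros_like(feat)
            for i in range(self.max_instances):        # spatial recurrence
                inp = torch.cat([feat + prev, hidden[i]], dim=1)
                hidden[i] = torch.tanh(self.update(inp))
                prev = hidden[i]
                masks.append(torch.sigmoid(self.mask_head(hidden[i])))
            all_masks.append(torch.stack(masks, dim=1))  # (N, I, 1, H, W)
        return all_masks

frames = [torch.randn(1, 32, 16, 16) for _ in range(3)]
masks = DoublyRecurrentDecoder(channels=32, max_instances=4)(frames)
```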

https://arxiv.org/abs/1903.05612

https://imatge-upc.github.io/rvos/

Depth Coefficients for Depth Completion

Saif Imran, Yunfei Long, Xiaoming Liu, Daniel Morris

Abstract: Depth completion involves estimating a dense depth image from sparse depth measurements, often guided by a color image. While linear upsampling is straightforward, it results in artifacts, including depth pixels being interpolated in empty space across discontinuities between objects. Current methods use deep networks to upsample and "complete" the missing depth pixels. Nevertheless, depth smearing between objects remains a challenge. We propose a new representation for depth called Depth Coefficients (DC) to address this problem. It enables convolutions to more easily avoid inter-object depth mixing. We also show that the standard Mean Squared Error (MSE) loss function can promote depth mixing, and thus propose instead to use a cross-entropy loss for DC. With quantitative and qualitative evaluation on benchmarks, we show that replacing the sparse depth input and MSE loss with our DC representation and cross-entropy loss is a simple way to improve depth completion performance and reduce pixel depth mixing, which leads to improved depth-based object detection.
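The abstract's recipe, representing depth over discrete bins and training with cross-entropy instead of MSE, can be sketched as follows. The uniform bin layout and triangular soft assignment are assumptions for illustration, not the paper's exact Depth Coefficients construction.

```python
import torch
import torch.nn.functional as F

def depth_to_soft_bins(depth, num_bins=80, max_depth=80.0):
    # Represent metric depth as weights over nearby fixed depth bins,
    # so the network can be trained with cross-entropy per pixel instead
    # of regressing a scalar (where MSE can average across objects and
    # smear depth between them).
    bin_width = max_depth / num_bins
    centers = (torch.arange(num_bins) + 0.5) * bin_width          # (B,)
    dist = torch.abs(depth.unsqueeze(-1) - centers)               # (..., B)
    weights = torch.clamp(1.0 - dist / bin_width, min=0.0)        # triangular
    return weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-8)

# Soft cross-entropy between predicted per-bin logits and the target.
logits = torch.randn(4, 80)                    # hypothetical network output
target = depth_to_soft_bins(torch.rand(4) * 80.0)
loss = torch.mean(torch.sum(-target * F.log_softmax(logits, dim=-1), dim=-1))
```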

https://arxiv.org/abs/1903.05421

All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification

Weijie Chen, Di Xie, Yuan Zhang, Shiliang Pu

Abstract: The shift operation is an efficient alternative to depthwise separable convolution. However, it is still bottlenecked by its implementation, namely memory movement. To push this direction forward, this paper introduces a novel basic component named the Sparse Shift Layer (SSL) to construct efficient convolutional neural networks. In this family of architectures, the basic block is composed only of 1x1 convolutional layers, with only a few shift operations applied to the intermediate feature maps. To make this idea feasible, we introduce a shift-operation penalty during optimization and further propose a quantization-aware shift learning method to make the learned displacements friendlier for inference. Extensive ablation studies indicate that only a few shift operations are sufficient to provide spatial information communication. Furthermore, to maximize the role of SSL, we redesign an improved network architecture to Fully Exploit the limited capacity of the neural Network (FE-Net). Equipped with SSL, this network can achieve 75.0% top-1 accuracy on ImageNet with only 563M M-Adds. It surpasses other counterparts constructed with depthwise separable convolution, as well as networks found by NAS, in terms of accuracy and practical speed.
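The shift operation at the heart of SSL is easy to illustrate: each channel is spatially translated by a small offset, and the surrounding 1x1 convolutions do all the channel mixing. The sketch below uses circular shifts via `torch.roll` for brevity (a real implementation would zero-pad at the borders), and the hard-coded offsets stand in for the learned, mostly-zero displacements.

```python
import torch

def sparse_shift(x, shifts):
    # Translate each channel by a small spatial offset; all channel mixing
    # is left to the surrounding 1x1 convolutions. torch.roll (circular)
    # is used for brevity; a real shift would zero-pad at the borders.
    out = x.clone()
    for c, (dy, dx) in enumerate(shifts):
        if dy or dx:  # most channels keep a (0, 0) shift: the sparse part
            out[:, c] = torch.roll(x[:, c], shifts=(dy, dx), dims=(-2, -1))
    return out

x = torch.randn(1, 4, 8, 8)
# Hypothetical learned displacements: only one channel actually moves.
y = sparse_shift(x, [(0, 0), (1, 0), (0, 0), (0, 0)])
```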

https://arxiv.org/abs/1903.05285

Dense Classification and Implanting for Few-Shot Learning

Yann Lifchitz, Yannis Avrithis, Sylvaine Picard, Andrei Bursuc

Abstract: Training deep neural networks from few examples is a highly challenging and key problem for many computer vision tasks. In this context, we are targeting knowledge transfer from a set with abundant data to other sets with few available examples. We propose two simple and effective solutions: (i) dense classification over feature maps, which for the first time studies local activations in the domain of few-shot learning, and (ii) implanting, that is, attaching new neurons to a previously trained network to learn new, task-specific features. On miniImageNet, we improve the prior state of the art on few-shot classification, i.e., we achieve 62.5%, 79.8% and 83.8% on the 5-way 1-shot, 5-shot and 10-shot settings respectively.
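A minimal sketch of solution (i), dense classification over feature maps: a shared 1x1 classifier scores every local activation, and each position is supervised with the image-level label. The conv form and per-position cross-entropy are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseClassifier(nn.Module):
    # A shared classifier applied at every spatial position (a 1x1 conv),
    # so local activations are supervised directly instead of being
    # pooled away before classification.
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feat):            # (N, C, H, W)
        return self.classifier(feat)    # per-position logits: (N, K, H, W)

dense = DenseClassifier(in_channels=64, num_classes=5)
feat = torch.randn(2, 64, 5, 5)
logits = dense(feat)
# Each position shares the image-level label during few-shot training
# (an assumption about the supervision; the paper gives the details).
labels = torch.randint(0, 5, (2,))
loss = F.cross_entropy(logits, labels[:, None, None].expand(-1, 5, 5))
```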

https://arxiv.org/abs/1903.05050

Knowledge Adaptation for Efficient Semantic Segmentation

Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan

Abstract: Both accuracy and efficiency are of significant importance to the task of semantic segmentation. Existing deep FCNs suffer from heavy computations due to a series of high-resolution feature maps for preserving the detailed knowledge in dense estimation. Although reducing the feature map resolution (i.e., applying a large overall stride) via subsampling operations (e.g., pooling and convolution striding) can instantly increase efficiency, it dramatically decreases estimation accuracy. To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of compact FCNs with a large overall stride. To handle the inconsistency between the features of the student and teacher networks, we optimize the feature similarity in a transferred latent domain formulated by utilizing a pre-trained autoencoder. Moreover, an affinity distillation module is proposed to capture long-range dependencies by computing the non-local interactions across the whole image. To validate the effectiveness of our proposed method, extensive experiments have been conducted on three popular benchmarks: Pascal VOC, Cityscapes and Pascal Context. Built upon a highly competitive baseline, our proposed method can improve the performance of a student network by 2.5% (mIoU improves from 70.2 to 72.7 on the Cityscapes test set) and can train a better compact model with only 8% of the floating-point operations (FLOPs) of a model that achieves comparable performance.
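The affinity distillation module can be sketched as matching pairwise non-local similarities between student and teacher feature maps. The cosine-similarity affinity and L2 penalty below are assumptions; the paper's feature-similarity term additionally works in a latent domain given by a pretrained autoencoder.

```python
import torch
import torch.nn.functional as F

def affinity_distillation_loss(feat_student, feat_teacher):
    # Build an HW x HW affinity matrix of normalized feature similarities
    # for each network, then penalize the student-teacher discrepancy.
    # Assumes both feature maps share the same spatial size (in practice
    # the student's features would be resized or projected first).
    def affinity(feat):
        f = F.normalize(feat.flatten(2).transpose(1, 2), dim=-1)  # (N, HW, C)
        return torch.bmm(f, f.transpose(1, 2))                    # (N, HW, HW)

    return F.mse_loss(affinity(feat_student), affinity(feat_teacher))

# Channel counts may differ; only the spatial grids must match.
loss = affinity_distillation_loss(torch.randn(2, 64, 16, 16),
                                  torch.randn(2, 256, 16, 16))
```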

https://arxiv.org/abs/1903.04688

Paper Downloads

Reply "cvpr314" in the chat interface of the 我爱计算机视觉 public account to receive Baidu Cloud download links for all of the above papers.

Join the Group

If you are interested in computer vision and machine learning, you are welcome to join the 52CV group: scan the QR code to add 52CV Jun, who will invite you into the group.

(Please be sure to include the note: 52CV)

If you prefer to chat on QQ, you can join the official 52CV QQ group: 702781905.

(We are not online at all times; please bear with us if your request is not approved promptly.)


Long-press to follow 我爱计算机视觉.
