The 10 coolest papers from CVPR 2018

The 2018 Conference on Computer Vision and Pattern Recognition (CVPR) took place last week in Salt Lake City, USA. It’s the world’s top conference in the field of computer vision. This year, CVPR received 3,300 main conference paper submissions and accepted 979. Over 6,500 people attended the conference, and boy was it epic! All 6,500 were packed into this room:

CVPR 2018 Grand Ballroom

Every year, CVPR brings in great people and their great research; there’s always something new to see and learn. Of course, there are always papers that publish groundbreaking results and bring valuable new knowledge into the field. These papers often shape the new state of the art across many sub-domains of computer vision.

Lately, though, what’s been really fun to see are the out-of-the-box, creative papers! With the fairly recent rush of deep learning in computer vision, we’re still discovering all the possibilities. Many papers present totally new applications of deep networks in vision. They may not be the most fundamentally groundbreaking works, but they’re fun to see and offer a creative and enlightening perspective on the field, often sparking new ideas from the new angle they present. All in all, they’re pretty cool!

Here, I’m going to show you what I thought were the 10 coolest papers at CVPR 2018. We’ll see new applications that have only recently been made possible by deep networks, and others that offer a new twist on how to use them. You might just pick up some new ideas yourself along the way ;). Without further ado, let’s dive in!

Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization

This paper comes from Nvidia and goes full throttle on using synthetic data to train Convolutional Neural Networks (CNNs). The authors created a plugin for Unreal Engine 4 that generates synthetic training data. The real key is that they randomize many of the variables that training data can have, including:

number and types of objects
number, types, colors, and scales of distractors
texture on the object of interest, and background photograph
location of the virtual camera with respect to the scene
angle of the camera with respect to the scene
number and locations of point lights

They showed some pretty promising results that demonstrate the effectiveness of pre-training with synthetic data, a result that had not previously been achieved. It may shed some light on how to go about generating and using synthetic data if you’re short on real training data.
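
To make the randomization idea concrete, here is a minimal sketch that samples one random scene configuration covering the variables listed above. It is not the authors' Unreal Engine 4 plugin: the `SceneConfig` class, the `sample_scene` function, and every numeric range are hypothetical choices made purely for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SceneConfig:
    """One randomly sampled scene description (all fields are illustrative)."""
    num_objects: int
    num_distractors: int
    distractor_scales: List[float]
    object_texture_id: int
    background_id: int
    camera_position: Vec3
    camera_pitch_deg: float
    light_positions: List[Vec3]


def sample_scene(rng: random.Random) -> SceneConfig:
    """Draw every scene variable independently from a hypothetical range."""
    num_distractors = rng.randint(0, 10)
    num_lights = rng.randint(1, 4)
    return SceneConfig(
        num_objects=rng.randint(1, 5),
        num_distractors=num_distractors,
        distractor_scales=[rng.uniform(0.5, 2.0) for _ in range(num_distractors)],
        object_texture_id=rng.randrange(100),   # index into an assumed texture pool
        background_id=rng.randrange(1000),      # index into an assumed photo pool
        camera_position=(rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(1, 3)),
        camera_pitch_deg=rng.uniform(-30, 30),
        light_positions=[
            (rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(2, 6))
            for _ in range(num_lights)
        ],
    )


# Each call yields a different scene; a renderer would turn it into an image.
scene = sample_scene(random.Random(0))
```

Seeding the generator makes a given synthetic dataset reproducible, which helps when comparing training runs.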

Figure from the paper: Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization

WESPE: Weakly Supervised Photo Enhancer for Digital Cameras

This one’s clever! They train a Generative Adversarial Network (GAN) to automatically enhance photographs aesthetically. The cool part is that it is weakly supervised; you don’t need input-output image pairs! All you need to train the network is a set of “good” looking images (for the output ground truth) and a set of “bad” looking images that you want to enhance (for the input images). The GAN is then trained to generate an aesthetically enhanced version of the input, often greatly improving the color and contrast of the image.

It’s quick and easy to use because you don’t need exact pairs of images, and you end up with a “generic” image enhancer at the end. I also like that it’s a weakly supervised approach. Unsupervised learning seems quite far away. But for many sub-domains in computer vision, weak supervision seems like a promising and profitable direction.
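
The unpaired setup described above boils down to drawing inputs and targets from two independent pools. The sketch below shows only that data-sampling step, not the WESPE network or its losses; the function name `sample_unpaired_batch` and the file names in the usage lines are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sample_unpaired_batch(
    low_quality: Sequence[str],
    high_quality: Sequence[str],
    batch_size: int,
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Draw inputs and targets independently, so inputs[i] and targets[i]
    are NOT two versions of the same scene."""
    inputs = [rng.choice(low_quality) for _ in range(batch_size)]
    targets = [rng.choice(high_quality) for _ in range(batch_size)]
    return inputs, targets


# Hypothetical file names standing in for the two image collections.
bad_photos = ["phone_001.jpg", "phone_002.jpg", "phone_003.jpg"]
good_photos = ["dslr_a.jpg", "dslr_b.jpg"]
inputs, targets = sample_unpaired_batch(bad_photos, good_photos, 4, random.Random(0))
```

In a GAN trained this way, the generator would receive `inputs`, while the discriminator would compare the generator's outputs against `targets`. Because the two collections never need to be aligned, they can be gathered separately.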

Figure from the paper: WESPE: Weakly Supervised Photo Enhancer for Digital Cameras

Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++

One of the main reasons deep networks work so well is the availability of large and fully annotated datasets. For many computer vision tasks, however, such data is both time consuming and expensive to acquire. In particular, segmentation data requires a class label for each and every pixel in the images. As you can imagine... this can take forever for big datasets!

Polygon-RNN++ allows you to set rough polygon points around each object in the image, and then the network will automatically generate the segmentation annotation! The paper shows that this method actually generalizes quite well and can be used to create quick and easy annotations for segmentation tasks!
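
Part of why polygons make such a cheap annotation format is that a handful of vertices can be expanded into a full per-pixel mask. The sketch below shows only that final rasterization step, using a standard ray-casting point-in-polygon test on pixel centers. It is not the Polygon-RNN++ model, and the names `point_in_polygon` and `polygon_to_mask` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon: Sequence[Point], width: int, height: int) -> List[List[int]]:
    """Label a pixel 1 if its center lies inside the polygon, else 0."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Four vertices are enough to label every pixel of a 5x5 image.
mask = polygon_to_mask([(1, 1), (4, 1), (4, 4), (1, 4)], width=5, height=5)
```

For real images, a library rasterizer would be much faster than this pure-Python loop; the point here is only that the polygon fully determines the mask.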

Figure from the paper: Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++

Creating Capsule Wardrobes from Fashion Images

Hmm, what should I wear today? Wouldn’t it be great if someone or something could answer that question for you each morning, so that you wouldn’t have to? Well then, say hello to Capsule Wardrobes!

In this paper, the authors design a model that, given an inventory of candidate garments and accessories, can assemble a minimal set of items that provides maximal mix-and-match outfits. It’s basically trained using objective functions that are designed to capture the key ingredients of visual compatibility, versatility, and user-specific preference. With capsule wardrobes, it’s easy to get the best looking outfit that fits your taste from your wardrobe!
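
As a toy illustration of the "small set, many outfits" goal, the sketch below brute-forces the best capsule from a tiny inventory by counting compatible outfits. It does not reproduce the authors' learned compatibility model or their optimization: the exhaustive search, the `CLASHES` rule, and all item names are hypothetical stand-ins.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Tuple

Outfit = Tuple[str, ...]


def count_outfits(capsule: Dict[str, Tuple[str, ...]],
                  compatible: Callable[[Outfit], bool]) -> int:
    """Count outfits (one item per layer) that the compatibility rule accepts."""
    return sum(1 for outfit in product(*capsule.values()) if compatible(outfit))


def best_capsule(inventory: Dict[str, List[str]], per_layer: int,
                 compatible: Callable[[Outfit], bool]):
    """Try every way of keeping `per_layer` items in each layer and return
    the capsule with the most compatible outfits (fine for toy sizes only)."""
    names = list(inventory)
    choices = [list(combinations(inventory[name], per_layer)) for name in names]
    best, best_score = None, -1
    for picks in product(*choices):
        capsule = dict(zip(names, picks))
        score = count_outfits(capsule, compatible)
        if score > best_score:
            best, best_score = capsule, score
    return best, best_score


# Hypothetical inventory and a hand-written clash rule standing in for a
# learned visual-compatibility score.
INVENTORY = {
    "tops": ["white tee", "neon shirt", "grey knit"],
    "bottoms": ["jeans", "plaid pants", "chinos"],
}
CLASHES = {("neon shirt", "plaid pants"), ("neon shirt", "chinos")}


def no_clash(outfit: Outfit) -> bool:
    return not any(a in outfit and b in outfit for a, b in CLASHES)


capsule, score = best_capsule(INVENTORY, per_layer=2, compatible=no_clash)
```

Exhaustive search grows combinatorially with the inventory, so a real inventory would need a more scalable selection strategy.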

Figure from the paper: Creating Capsule Wardrobes from Fashion Images
