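About #Today's arXiv Picks (今日arXiv精选)

This is a column from 「AI 学术前沿」 (AI Academic Frontier). Each day the editors select high-quality papers from arXiv and share them with readers.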

The Power of Points for Modeling Humans in Clothing

Comment: In ICCV 2021. Project page: https://qianlim.github.io/POP

Link: http://arxiv.org/abs/2109.01137

Abstract

Currently it requires an artist to create 3D human avatars with realistic clothing that can move naturally. Despite progress on 3D scanning and modeling of human bodies, there is still no technology that can easily turn a static scan into an animatable avatar. Automating the creation of such avatars would enable many applications in games, social networking, animation, and AR/VR to name a few. The key problem is one of representation. Standard 3D meshes are widely used in modeling the minimally-clothed body but do not readily capture the complex topology of clothing. Recent interest has shifted to implicit surface models for this task but they are computationally heavy and lack compatibility with existing 3D tools. What is needed is a 3D representation that can capture varied topology at high resolution and that can be learned from data. We argue that this representation has been with us all along -- the point cloud. Point clouds have properties of both implicit and explicit representations that we exploit to model 3D garment geometry on a human body. We train a neural network with a novel local clothing geometric feature to represent the shape of different outfits. The network is trained from 3D point clouds of many types of clothing, on many bodies, in many poses, and learns to model pose-dependent clothing deformations. The geometry feature can be optimized to fit a previously unseen scan of a person in clothing, enabling the scan to be reposed realistically. Our model demonstrates superior quantitative and qualitative results in both multi-outfit modeling and unseen outfit animation. The code is available for research purposes.

NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

Comment: To appear in ICCV 2021 (Oral). Project page: https://weiyithu.github.io/NerfingMVS/

Link: http://arxiv.org/abs/2109.01129

Abstract

In this work, we present a new multi-view depth estimation method that utilizes both conventional SfM reconstruction and learning-based priors over the recently proposed neural radiance fields (NeRF). Unlike existing neural network based optimization method that relies on estimated correspondences, our method directly optimizes over implicit volumes, eliminating the challenging step of matching pixels in indoor scenes. The key to our approach is to utilize the learning-based priors to guide the optimization process of NeRF. Our system firstly adapts a monocular depth network over the target scene by finetuning on its sparse SfM reconstruction. Then, we show that the shape-radiance ambiguity of NeRF still exists in indoor environments and propose to address the issue by employing the adapted depth priors to monitor the sampling process of volume rendering. Finally, a per-pixel confidence map acquired by error computation on the rendered image can be used to further improve the depth quality. Experiments show that our proposed framework significantly outperforms state-of-the-art methods on indoor scenes, with surprising findings presented on the effectiveness of correspondence-based optimization and NeRF-based optimization over the adapted depth priors. In addition, we show that the guided optimization scheme does not sacrifice the original synthesis capability of neural radiance fields, improving the rendering quality on both seen and novel views. Code is available at https://github.com/weiyithu/NerfingMVS.
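
Editor's sketch: the abstract says the adapted depth priors are used to guide the sampling process of volume rendering. One plausible reading, shown below purely as an illustration, is to restrict each ray's sample depths to a band around the depth prior instead of the full near/far range. The function name `guided_sample_depths`, the relative error bound `rel_error`, and the uniform spacing are all assumptions made for this example; they are not taken from the paper or its code.

```python
def guided_sample_depths(depth_prior, rel_error, num_samples, near, far):
    """Return depths sampled uniformly in a band around a per-ray depth prior.

    Assumption: the band is [d * (1 - e), d * (1 + e)], clamped to [near, far].
    """
    lo = max(near, depth_prior * (1.0 - rel_error))
    hi = min(far, depth_prior * (1.0 + rel_error))
    if num_samples == 1:
        return [0.5 * (lo + hi)]
    step = (hi - lo) / (num_samples - 1)
    return [lo + i * step for i in range(num_samples)]


if __name__ == "__main__":
    # A ray whose prior depth is 2.0 with a 25% error bound.
    print(guided_sample_depths(2.0, 0.25, 5, near=0.1, far=10.0))
```

Concentrating samples near the prior is what would reduce the chance of the network placing density at the wrong depth; a larger `rel_error` falls back toward ordinary full-range sampling.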

The Functional Correspondence Problem

Comment: Accepted to ICCV 2021

Link: http://arxiv.org/abs/2109.01097

Abstract

The ability to find correspondences in visual data is the essence of most computer vision tasks. But what are the right correspondences? The task of visual correspondence is well defined for two different images of same object instance. In case of two images of objects belonging to same category, visual correspondence is reasonably well-defined in most cases. But what about correspondence between two objects of completely different category -- e.g., a shoe and a bottle? Does there exist any correspondence? Inspired by humans' ability to: (a) generalize beyond semantic categories and; (b) infer functional affordances, we introduce the problem of functional correspondences in this paper. Given images of two objects, we ask a simple question: what is the set of correspondences between these two images for a given task? For example, what are the correspondences between a bottle and shoe for the task of pounding or the task of pouring. We introduce a new dataset: FunKPoint that has ground truth correspondences for 10 tasks and 20 object categories. We also introduce a modular task-driven representation for attacking this problem and demonstrate that our learned representation is effective for this task. But most importantly, because our supervision signal is not bound by semantics, we show that our learned representation can generalize better on few-shot classification problem. We hope this paper will inspire our community to think beyond semantics and focus more on cross-category generalization and learning representations for robotics tasks.

SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting

Comment: ICCV 2021 (Oral); Project page: https://varunjampani.github.io/slide ; Video: https://www.youtube.com/watch?v=RQio7q-ueY8

Link: http://arxiv.org/abs/2109.01068

Abstract

Single image 3D photography enables viewers to view a still image from novel viewpoints. Recent approaches combine monocular depth networks with inpainting networks to achieve compelling results. A drawback of these techniques is the use of hard depth layering, making them unable to model intricate appearance details such as thin hair-like structures. We present SLIDE, a modular and unified system for single image 3D photography that uses a simple yet effective soft layering strategy to better preserve appearance details in novel views. In addition, we propose a novel depth-aware training strategy for our inpainting module, better suited for the 3D photography task. The resulting SLIDE approach is modular, enabling the use of other components such as segmentation and matting for improved layering. At the same time, SLIDE uses an efficient layered depth formulation that only requires a single forward pass through the component networks to produce high quality 3D photos. Extensive experimental analysis on three view-synthesis datasets, in combination with user studies on in-the-wild image collections, demonstrate superior performance of our technique in comparison to existing strong baselines while being conceptually much simpler. Project page: https://varunjampani.github.io/slide

4D-Net for Learned Multi-Modal Alignment

Comment: ICCV 2021

Link: http://arxiv.org/abs/2109.01066

Abstract

We present 4D-Net, a 3D object detection approach, which utilizes 3D Point Cloud and RGB sensing information, both in time. We are able to incorporate the 4D information by performing a novel dynamic connection learning across various feature representations and levels of abstraction, as well as by observing geometric constraints. Our approach outperforms the state-of-the-art and strong baselines on the Waymo Open Dataset. 4D-Net is better able to use motion cues and dense image information to detect distant objects more successfully.

Adversarial Robustness for Unsupervised Domain Adaptation

Comment: Accepted by ICCV 2021

Link: http://arxiv.org/abs/2109.00946

Abstract

Extensive Unsupervised Domain Adaptation (UDA) studies have shown great success in practice by learning transferable representations across a labeled source domain and an unlabeled target domain with deep models. However, previous works focus on improving the generalization ability of UDA models on clean examples without considering the adversarial robustness, which is crucial in real-world applications. Conventional adversarial training methods are not suitable for the adversarial robustness on the unlabeled target domain of UDA since they train models with adversarial examples generated by the supervised loss function. In this work, we leverage intermediate representations learned by multiple robust ImageNet models to improve the robustness of UDA models. Our method works by aligning the features of the UDA model with the robust features learned by ImageNet pre-trained models along with domain adaptation training. It utilizes both labeled and unlabeled domains and instills robustness without any adversarial intervention or label requirement during domain adaptation training. Experimental results show that our method significantly improves adversarial robustness compared to the baseline while keeping clean accuracy on various UDA benchmarks.

Generative Models for Multi-Illumination Color Constancy

Comment: Accepted in International Conference on Computer Vision Workshop (ICCVW) 2021

Link: http://arxiv.org/abs/2109.00863

Abstract

In this paper, the aim is multi-illumination color constancy. However, most of the existing color constancy methods are designed for single light sources. Furthermore, datasets for learning multiple illumination color constancy are largely missing. We propose a seed (physics driven) based multi-illumination color constancy method. GANs are exploited to model the illumination estimation problem as an image-to-image domain translation problem. Additionally, a novel multi-illumination data augmentation method is proposed. Experiments on single and multi-illumination datasets show that our methods outperform sota methods.

SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos

Comment: Accepted to EPIC@ICCV 2021

Link: http://arxiv.org/abs/2109.00829

Abstract

Action anticipation in egocentric videos is a difficult task due to the inherently multi-modal nature of human actions. Additionally, some actions happen faster or slower than others depending on the actor or surrounding context which could vary each time and lead to different predictions. Based on this idea, we build upon RULSTM architecture, which is specifically designed for anticipating human actions, and propose a novel attention-based technique to evaluate, simultaneously, slow and fast features extracted from three different modalities, namely RGB, optical flow, and extracted objects. Two branches process information at different time scales, i.e., frame-rates, and several fusion schemes are considered to improve prediction accuracy. We perform extensive experiments on EpicKitchens-55 and EGTEA Gaze+ datasets, and demonstrate that our technique systematically improves the results of RULSTM architecture for Top-5 accuracy metric at different anticipation times.
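
Editor's sketch: the abstract reports results with the Top-5 accuracy metric. For readers unfamiliar with it, the snippet below shows the standard definition of top-k accuracy: a prediction counts as correct when the true class is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are made up for this illustration and are not from the paper.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    toy_scores = [[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]]  # two samples, three classes
    toy_labels = [2, 1]
    print(top_k_accuracy(toy_scores, toy_labels, k=1))
    print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

Top-k metrics are a common choice for anticipation tasks because several future actions can be plausible at once, so requiring only the single top guess to match can understate a model's usefulness.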

Self-Calibrating Neural Radiance Fields

Comment: Accepted in ICCV21, Project Page: https://postech-cvlab.github.io/SCNeRF/

Link: http://arxiv.org/abs/2108.13826

Abstract

In this work, we propose a camera self-calibration algorithm for generic cameras with arbitrary non-linear distortions. We jointly learn the geometry of the scene and the accurate camera parameters without any calibration objects. Our camera model consists of a pinhole model, a fourth order radial distortion, and a generic noise model that can learn arbitrary non-linear camera distortions. While traditional self-calibration algorithms mostly rely on geometric constraints, we additionally incorporate photometric consistency. This requires learning the geometry of the scene, and we use Neural Radiance Fields (NeRF). We also propose a new geometric loss function, viz., projected ray distance loss, to incorporate geometric consistency for complex non-linear camera models. We validate our approach on standard real image datasets and demonstrate that our model can learn the camera intrinsics and extrinsics (pose) from scratch without COLMAP initialization. Also, we show that learning accurate camera models in a differentiable manner allows us to improve PSNR over baselines. Our module is an easy-to-use plugin that can be applied to NeRF variants to improve performance. The code and data are currently available at https://github.com/POSTECH-CVLab/SCNeRF.
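
Editor's sketch: the abstract describes a camera model built from a pinhole model plus a fourth order radial distortion. The snippet below illustrates the textbook form of such a model, where normalized coordinates are scaled by 1 + k1·r² + k2·r⁴ before the intrinsics are applied. The exact parameterization used in the paper is not given in the abstract, so the function names, the coefficient names `k1` and `k2`, and the omission of the generic noise model are assumptions made for this example.

```python
def apply_radial_distortion(x, y, k1, k2):
    """Scale normalized image coordinates by 1 + k1*r^2 + k2*r^4."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def project_pinhole(point, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3D camera-frame point to pixel coordinates."""
    X, Y, Z = point
    x, y = X / Z, Y / Z  # perspective division to normalized coordinates
    xd, yd = apply_radial_distortion(x, y, k1, k2)
    return fx * xd + cx, fy * yd + cy


if __name__ == "__main__":
    # Illustrative intrinsics; with k1 = k2 = 0 this is a plain pinhole camera.
    print(project_pinhole((1.0, 2.0, 2.0), 500.0, 500.0, 320.0, 240.0))
    print(project_pinhole((1.0, 2.0, 2.0), 500.0, 500.0, 320.0, 240.0, k1=0.1))
```

Because every step is a smooth function of the focal lengths, principal point, and distortion coefficients, a model of this kind can be optimized by gradient descent alongside the scene representation, which is the property the abstract relies on.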
