
Taylor Guo, September 24, 2017

Matterport3D: Learning from RGB-D Data in Indoor Environments

Matterport3D Home Page

mpview is a C++ application for parsing and viewing houses in the Matterport3D dataset.




3.1 Data Acquisition Process


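For each environment in the dataset, an operator captures a set of panoramas spaced uniformly, about 2.5 m apart, across the walkable area of the floor plan. The user marks the locations of windows and mirrors in an iPad app and uploads the data to Matterport. Matterport then processes the raw data as follows: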





3.2 Semantic Annotation







3.3 Properties of the Dataset





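Figure 6: Point cloud visualizations (left to right: color, diffuse shading, normals). These images show the pixels of all RGB-D images back-projected into world space according to the camera poses. Note the accuracy of the global registration and the relatively low noise of the surface normals, even without depth fusion techniques.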
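The back-projection described in the Figure 6 caption can be sketched as follows. This is a minimal illustration, not the dataset's own tooling: it assumes a pinhole camera with a 3x3 intrinsic matrix `K`, depth measured in metres along the optical axis (0 meaning "no measurement"), and a 4x4 camera-to-world pose. The function name `backproject_depth` and these conventions are assumptions made for the example.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Back-project a depth image into world-space 3D points.

    depth:        (H, W) array, depth along the optical axis; 0 marks missing pixels
    K:            (3, 3) pinhole intrinsics (assumed convention)
    cam_to_world: (4, 4) camera-to-world pose (assumed convention)
    Returns an (N, 3) array with one point per valid pixel, in row-major pixel order.
    """
    h, w = depth.shape
    # u is the column index, v is the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    # Invert the pinhole projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous, (N, 4)
    # Apply the camera pose to move points from camera space into world space.
    return (pts_cam @ cam_to_world.T)[:, :3]

# Tiny example: a 2x2 depth image with one missing pixel,
# viewed by a camera shifted 1 m along the world x axis.
depth = np.array([[2.0, 0.0],
                  [2.0, 2.0]])
K = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[0, 3] = 1.0
points = backproject_depth(depth, K, pose)
```

Concatenating the points produced for every RGB-D image, each with its own pose, gives a merged point cloud of the kind shown in the figure; how well the individual clouds line up depends directly on the accuracy of the camera poses.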







4 Learning from the Data


4.1 Keypoint Matching



4.2 View Overlap Prediction




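A convolutional neural network (ResNet-50) is trained to extract a feature from each image, such that a smaller L2 distance between two features indicates a higher overlap between the two views. The model is trained with a distance ratio loss [19] on triplets. Because the overlap function takes values between 0 and 1, a regression loss is added on top of the triplet network to regress the overlap for matching image pairs (overlap ratio greater than 0.1).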
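To make the two loss terms concrete, here is a minimal NumPy sketch operating on feature vectors that are already extracted. The distance ratio term is written in the softmax-over-distances form, -log(e^{-d_pos} / (e^{-d_pos} + e^{-d_neg})); the regression term is purely illustrative, since the text does not say how overlap is predicted from the features. The mapping `exp(-distance)` and all function names here are assumptions for the example, not the authors' implementation.

```python
import numpy as np

def l2_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def distance_ratio_loss(anchor, positive, negative):
    """Triplet loss that is small when the anchor is closer to the positive.

    -log(e^{-d_pos} / (e^{-d_pos} + e^{-d_neg})) simplifies to
    log(1 + exp(d_pos - d_neg)), computed stably with logaddexp.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return float(np.logaddexp(0.0, d_pos - d_neg))

def overlap_regression_loss(feat_a, feat_b, true_overlap, min_overlap=0.1):
    """Squared error between a predicted and a true overlap in [0, 1].

    Only pairs with true overlap above min_overlap contribute.
    ASSUMPTION: overlap is predicted as exp(-distance), a hypothetical
    mapping from distance to [0, 1] chosen for illustration only.
    """
    if true_overlap <= min_overlap:
        return 0.0
    predicted = np.exp(-l2_distance(feat_a, feat_b))
    return float((predicted - true_overlap) ** 2)

# Toy features: the positive view is near the anchor, the negative is far away.
anchor = np.array([0.0, 0.0])
positive = np.array([0.0, 1.0])
negative = np.array([3.0, 4.0])

good = distance_ratio_loss(anchor, positive, negative)  # well-ordered triplet
bad = distance_ratio_loss(anchor, negative, positive)   # swapped roles
total = good + overlap_regression_loss(anchor, positive, true_overlap=0.5)
```

A well-ordered triplet gives a loss near zero, while swapping the positive and negative gives a large one, which is what pushes overlapping views together in feature space.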

4.3 Surface Normal Estimation


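The normals in the Matterport3D dataset can be used to train better models for normal prediction. We adopt the model from [48], which achieves strong results on the NYUv2 dataset. The model is a fully convolutional neural network consisting of an encoder, which has exactly the same architecture as VGG-16 from the first layer up to the fully connected layers, and a purely symmetric decoder.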
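The encoder and decoder layout can be illustrated with a small, framework-free sketch that only tracks layer order and tensor shapes. The encoder list is the standard VGG-16 convolutional configuration; mirroring it by replacing each pooling step with a 2x upsampling step is an assumption about what "purely symmetric" means here, and the 224x224 input size is chosen only for the example.

```python
# VGG-16 convolutional layout: integers are the output channels of a padded
# 3x3 convolution, "M" is a 2x2 max pooling step.
VGG16_ENCODER = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                 512, 512, 512, "M", 512, 512, 512, "M"]

def mirror_decoder(encoder):
    """Build a symmetric decoder by reversing the encoder.

    ASSUMPTION: each pooling step "M" becomes a 2x upsampling step "U".
    """
    return ["U" if layer == "M" else layer for layer in reversed(encoder)]

def trace_shapes(layers, channels, height, width):
    """Return the (channels, height, width) shape after every layer."""
    shapes = []
    for layer in layers:
        if layer == "M":
            height, width = height // 2, width // 2
        elif layer == "U":
            height, width = height * 2, width * 2
        else:
            channels = layer  # padded 3x3 conv keeps the spatial size
        shapes.append((channels, height, width))
    return shapes

decoder = mirror_decoder(VGG16_ENCODER)
encoder_shapes = trace_shapes(VGG16_ENCODER, 3, 224, 224)
decoder_shapes = trace_shapes(decoder, *encoder_shapes[-1])

print("bottleneck:", encoder_shapes[-1])
print("decoder output:", decoder_shapes[-1])
```

Because the decoder restores the input resolution, a final convolution with three output channels (one per normal component) would yield a per-pixel normal map; that output head is likewise an assumption of this sketch rather than a detail given in the text.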


Screened Poisson Surface Reconstruction

ScanNet Dataset

ScanNet Home Page

3DMatch: RGB-D Local Geometric Descriptors


[1] I. Armeni, S. Sax, A. R. Zamir, and S. Savarese. Joint 2D-3D-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105, 2017.
[2] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese. 3D semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1534–1543, 2016.
[3] A. Bansal, B. Russell, and A. Gupta. Marr revisited: 2D-3D alignment via surface normal prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5965–5974, 2016.
[4] S. Bell, K. Bala, and N. Snavely. Intrinsic images in the wild. ACM Trans. on Graphics (SIGGRAPH), 33(4), 2014.
[5] S. Choi, Q.-Y. Zhou, S. Miller, and V. Koltun. A large dataset of object scans. arXiv preprint arXiv:1602.02481, 2016.
[6] M. Chuang and M. Kazhdan. Interactive and anisotropic geometry processing using the screened poisson equation. ACM Transactions on Graphics (TOG), 30(4):57, 2011.
[7] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. 2017.
[8] A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface re-integration. ACM Transactions on Graphics (TOG), 2017.
[9] D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, pages 2650–2658, 2015.
[10] M. Firman. RGBD datasets: Past, present and future. In CVPR Workshop on Large Scale 3D Data: Acquisition, Modelling and Analysis, 2016.
[11] D. F. Fouhey, A. Gupta, and M. Hebert. Data-driven 3D primitives for single image understanding. In ICCV, 2013.
[12] D. F. Fouhey, A. Gupta, and M. Hebert. Unfolding an indoor origami world. In European Conference on Computer Vision, pages 687–702. Springer, 2014.
[13] S. Gupta, P. Arbeláez, and J. Malik. Perceptual organization and recognition of indoor scenes from RGB-D images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 564–571, 2013.
[14] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik. Learning rich features from RGB-D images for object detection and segmentation: Supplementary material, 2014.
[15] M. Halber and T. Funkhouser. Structured global registration of rgb-d scans in indoor environments. 2017.
[16] X. Han, T. Leung, Y. Jia, R. Sukthankar, and A. C. Berg. MatchNet: Unifying feature and metric learning for patch-based matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3279–3286, 2015.
[17] A. Handa, V. Patraucean, V. Badrinarayanan, S. Stent, and R. Cipolla. SceneNet: Understanding real world indoor scenes with synthetic data. arXiv preprint arXiv:1511.07041, 2015.
[18] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[19] E. Hoffer, I. Hubara, and N. Ailon. Deep unsupervised learning through spatial contrasting. arXiv preprint arXiv:1610.00243, 2016.
[20] B.-S. Hua, Q.-H. Pham, D. T. Nguyen, M.-K. Tran, L.-F. Yu, and S.-K. Yeung. SceneNN: A scene meshes dataset with annotations. In International Conference on 3D Vision (3DV), volume 1, 2016.
[21] M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. In Proceedings of the fourth Eurographics symposium on Geometry processing, volume 7, 2006.
[22] A. Knapitsch, J. Park, Q.-Y. Zhou, and V. Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36(4), 2017.
[23] B. Li, C. Shen, Y. Dai, A. van den Hengel, and M. He. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1119–1127, 2015.
[24] D. Lin, S. Fidler, and R. Urtasun. Holistic scene understanding for 3D object detection with rgbd cameras. In Proceedings of the IEEE International Conference on Computer Vision, pages 1417–1424, 2013.
[25] M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger. Real-time 3D reconstruction at scale using voxel hashing. ACM Transactions on Graphics (TOG), 2013.
[26] K. Rematas, T. Ritschel, M. Fritz, E. Gavves, and T. Tuytelaars. Deep reflectance maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4508–4516, 2016.
[27] X. Ren, L. Bo, and D. Fox. RGB-(D) scene labeling: Features and algorithms. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2759–2766. IEEE, 2012.
[28] M. Savva, A. X. Chang, P. Hanrahan, M. Fisher, and M. Nießner. PiGraphs: Learning Interaction Snapshots from Observations. ACM Transactions on Graphics (TOG), 35(4), 2016.
[29] T. Schmidt, R. Newcombe, and D. Fox. Self-supervised visual descriptor learning for dense correspondence. IEEE Robotics and Automation Letters, 2(2):420–427, 2017.
[30] J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, and A. Fitzgibbon. Scene coordinate regression forests for camera relocalization in RGB-D images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2930–2937, 2013.
[31] A. Shrivastava and A. Gupta. Building part-based object detectors via 3D geometry. In Proceedings of the IEEE International Conference on Computer Vision, pages 1745–1752, 2013.
[32] N. Silberman and R. Fergus. Indoor scene segmentation using a structured light sensor. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, 2011.
[33] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In European Conference on Computer Vision, 2012.
[34] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In Proceedings of the IEEE International Conference on Computer Vision, pages 118–126, 2015.
[35] S. Song, S. P. Lichtenberg, and J. Xiao. SUN RGB-D: A RGB-D scene understanding benchmark suite. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 567–576, 2015.
[36] S. Song and J. Xiao. Sliding shapes for 3D object detection in depth images. In European conference on computer vision, pages 634–651. Springer, 2014.
[37] S. Song and J. Xiao. Deep sliding shapes for amodal 3D object detection in RGB-D images. 2016.
[38] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene completion from a single depth image. arXiv preprint arXiv:1611.08974, 2016.
[39] J. Valentin, A. Dai, M. Nießner, P. Kohli, P. Torr, S. Izadi, and C. Keskin. Learning to navigate the energy landscape. arXiv preprint arXiv:1603.05772, 2016.
[40] J. Valentin, V. Vineet, M.-M. Cheng, D. Kim, J. Shotton, P. Kohli, M. Nießner, A. Criminisi, S. Izadi, and P. Torr. SemanticPaint: Interactive 3D labeling and learning at your fingertips. ACM Transactions on Graphics (TOG), 34(5):154, 2015.
[41] X. Wang, D. Fouhey, and A. Gupta. Designing deep networks for surface normal estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 539–547, 2015.
[42] J. Xiao, K. A. Ehinger, A. Oliva, and A. Torralba. Recognizing scene viewpoint using panoramic place representation. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2695–2702. IEEE, 2012.
[43] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pages 3485–3492. IEEE, 2010.
[44] J. Xiao, A. Owens, and A. Torralba. SUN3D: A database of big spaces reconstructed using SFM and object labels. In Proceedings of the IEEE International Conference on Computer Vision, pages 1625–1632, 2013.
[45] K. M. Yi, E. Trulls, V. Lepetit, and P. Fua. LIFT: Learned invariant feature transform. In European Conference on Computer Vision, pages 467–483. Springer, 2016.
[46] A. Zeng, S. Song, M. Niessner, M. Fisher, J. Xiao, and T. Funkhouser. 3DMatch: Learning local geometric descriptors from RGB-D reconstructions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[47] J. Zhang, C. Kan, A. G. Schwing, and R. Urtasun. Estimating the 3D layout of indoor scenes and its clutter from depth sensors. In Proceedings of the IEEE International Conference on Computer Vision, pages 1273–1280, 2013.
[48] Y. Zhang, S. Song, E. Yumer, M. Savva, J.-Y. Lee, H. Jin, and T. Funkhouser. Physically-based rendering for indoor scene understanding using convolutional neural networks. arXiv preprint arXiv:1612.07429, 2016.
[49] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using places database. In Advances in neural information processing systems, pages 487–495, 2014.

